# Notebook 12.b: Finding Linear Patterns

> Data! Data! Data! I can't make bricks without clay.
>
> — [Arthur Conan Doyle](https://en.wikipedia.org/wiki/Arthur_Conan_Doyle)

## 🎯 Learning Objectives

By the end of this notebook, you will be able to:
- Use the "method of first differences" to identify a linear pattern in a sequence of data.
- Write a Python function that calculates the differences between elements in a list.
- Determine the equation of a line (`y = mx + b`) from a sequence of numbers.

## 📚 Prerequisites

This notebook builds on concepts from the previous lesson. Before you begin, make sure you are comfortable with:
- Concepts from [Notebook 12.a: Functions, Sequences, and Plots](https://colab.research.google.com/github/sguy/programming-and-problem-solving/blob/main/notebooks/12.a-functions-sequences-and-plots.ipynb), including writing functions and using lists.

*Estimated Time: 45 minutes*

---

[Return to Table of Contents](https://colab.research.google.com/github/sguy/programming-and-problem-solving/blob/main/notebooks/table-of-contents.ipynb)

In [None]:
import matplotlib.pyplot as plt

## Introduction: The Detective's Work

In the last notebook, we started with a rule (a function) and generated a sequence of data. Now, we're going to do the reverse. Like a detective finding clues, we'll start with a sequence of numbers and work backward to find the secret rule that created it.

Our first case is a taxi service. We have a list of fares for trips of different lengths, but we don't know the pricing rule. Can we figure it out?

## 🐍 New Concept: The Method of First Differences

Let's start with the data. Here are the costs for taxi rides of 1, 2, 3, 4, and 5 miles.

In [None]:
# miles:    1,   2,   3,    4,    5
fares = [4.5, 7.0, 9.5, 12.0, 14.5]

In [None]:
plt.figure(figsize=(8, 6))
plt.plot([1, 2, 3, 4, 5], fares, marker='o', linestyle='-')
plt.title('Taxi Fare vs. Miles Traveled')
plt.xlabel('Miles Traveled')
plt.ylabel('Fare ($)')
plt.grid(True)
plt.show()

Let's look at the data and see if we can spot a pattern. How much does the price change between each one-mile increase?

- From 1 to 2 miles: `7.00 - 4.50 = 2.50`
- From 2 to 3 miles: `9.50 - 7.00 = 2.50`
- From 3 to 4 miles: `12.00 - 9.50 = 2.50`
- From 4 to 5 miles: `14.50 - 12.00 = 2.50`

This is our first big clue! The price increases by the exact same amount for every extra mile. This is called a **constant difference**.

### 💡 Tip: Slope vs. Gradient

The rate of change of a linear function goes by two different names, which can be a bit confusing:

- **Slope**: This is very common in algebra classes.
- **Gradient**: This is very common in higher-level mathematics, science, and fields like machine learning.

They both mean the exact same thing: the 'steepness' or rate of change of the line. In these notebooks, we will use the term **gradient**.

### The Rule of Constant Differences

Here is a fundamental rule in mathematics:
> If the difference between consecutive terms in a sequence is constant, the sequence is generated by a **linear function**.

A linear function is the mathematical name for the equation of a straight line, which you might have seen before as `y = mx + b`.

- `m` is the **gradient** (or slope), which represents the rate of change. In our case, it's the cost per mile. Because the distance goes up by exactly 1 mile between fares, the constant difference we found, `$2.50`, is our gradient!
- `b` is the **y-intercept**, a starting value or flat fee. In our case, it would be the pickup fee for the taxi.

### Finding the Full Equation

So we know our rule is something like:

`cost = 2.50 * miles + b`

How do we find `b`? We can just take any point from our data and plug it in. Let's use the first data point: when `miles = 1`, the `cost = 4.50`.

\begin{aligned}
m x + b &= y\\
2.50 \times 1 + b &= 4.50\\
2.50 + b &= 4.50\\
b &= 4.50 - 2.50\\
&= 2.00\\
\end{aligned}

So the flat pickup fee is $2.00! Our final rule is:

**`cost = 2.50 * miles + 2.00`**
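The text above says that any data point will do for finding `b`. Here is a small sketch that checks this claim, assuming the same `fares` list and the gradient of `2.50` found earlier. The names `gradient` and `intercepts` are introduced here only for illustration.

```python
# Taxi fares for trips of 1, 2, 3, 4, and 5 miles (same data as above)
fares = [4.5, 7.0, 9.5, 12.0, 14.5]
gradient = 2.5  # the constant difference found earlier

# Rearrange cost = gradient * miles + b to get b = cost - gradient * miles,
# then solve for b at every data point.
intercepts = []
for miles in range(1, len(fares) + 1):
    cost = fares[miles - 1]  # the list starts at index 0, miles start at 1
    intercepts.append(cost - gradient * miles)

print(intercepts)
# Expected output: [2.0, 2.0, 2.0, 2.0, 2.0]
```

Every point gives the same value for `b`, which is what we would expect if all the points lie on one straight line.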

Let's test it with another point, say 3 miles:

\begin{aligned}
\text{cost} &= 2.50 \times 3 + 2.00\\
&= 7.50 + 2.00\\
&= 9.50\\
\end{aligned}

It works!
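Checking one point by hand is a good start, but we can also ask Python to check all of them. The sketch below wraps the rule in a function and compares its prediction with each fare in the data. The function name `taxi_cost` is a hypothetical choice made for this example.

```python
def taxi_cost(miles):
    """Return the fare predicted by the rule cost = 2.50 * miles + 2.00."""
    return 2.50 * miles + 2.00

# Taxi fares for trips of 1, 2, 3, 4, and 5 miles (same data as above)
fares = [4.5, 7.0, 9.5, 12.0, 14.5]

for miles in range(1, len(fares) + 1):
    predicted = taxi_cost(miles)
    actual = fares[miles - 1]  # the list starts at index 0, miles start at 1
    print(str(miles) + " miles: predicted " + str(predicted) + ", actual " + str(actual))
```

If the rule is right, the predicted and actual values match on every line.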

### 🎯 Mini-Challenge: Automate the Detective Work

Manually calculating the differences is fine for a short list, but what if we had hundreds of data points? That would be tedious and error-prone. This is a perfect job for a computer!

Your goal is to complete the function `calculate_differences` that takes a list of numbers (a `sequence`) and returns a new list containing the difference between each pair of consecutive elements.

<details>
  <summary>Hint 1: How do you get the first difference?</summary>
  The first difference is calculated by subtracting the first element from the second element.
</details>

<details>
  <summary>Hint 2: How do you iterate through a list to get pairs of numbers?</summary>
  You can use a `for` loop with `range()` and indexing. Remember that to get the difference between two numbers, you need to access `sequence[i+1]` and `sequence[i]`. Be careful not to go out of bounds!
</details>

<details>
  <summary>Hint 3: What should be the length of the new list?</summary>
  If the original sequence has `n` elements, the list of differences will have `n-1` elements.
</details>

In [None]:
def calculate_differences(sequence):
    # YOUR CODE HERE
    pass

# Test your function
test_sequence = [4.5, 7.0, 9.5, 12.0, 14.5]
expected_differences = [2.5, 2.5, 2.5, 2.5]
calculated_differences = calculate_differences(test_sequence)

print("Original Sequence: " + str(test_sequence))
print("Calculated Differences: " + str(calculated_differences))
print("Expected Differences: " + str(expected_differences))

assert calculated_differences == expected_differences, "Your function did not return the correct differences!"
print("Success! Your function works for the taxi fare example.")

test_sequence_2 = [1, 3, 6, 10, 15]
expected_differences_2 = [2, 3, 4, 5]
calculated_differences_2 = calculate_differences(test_sequence_2)

print("Original Sequence 2: " + str(test_sequence_2))
print("Calculated Differences 2: " + str(calculated_differences_2))
print("Expected Differences 2: " + str(expected_differences_2))

assert calculated_differences_2 == expected_differences_2, "Your function did not return the correct differences for the second test case!"
print("Success! Your function works for the second test case.")

<details>
  <summary>Click to see a possible solution</summary>

```python
def calculate_differences(sequence):
    differences = []
    for i in range(len(sequence) - 1):
        diff = sequence[i+1] - sequence[i]
        differences.append(diff)
    return differences

# Test your function
test_sequence = [4.5, 7.0, 9.5, 12.0, 14.5]
expected_differences = [2.5, 2.5, 2.5, 2.5]
calculated_differences = calculate_differences(test_sequence)

print("Original Sequence: " + str(test_sequence))
# Expected output: Original Sequence: [4.5, 7.0, 9.5, 12.0, 14.5]
print("Calculated Differences: " + str(calculated_differences))
# Expected output: Calculated Differences: [2.5, 2.5, 2.5, 2.5]
print("Expected Differences: " + str(expected_differences))
# Expected output: Expected Differences: [2.5, 2.5, 2.5, 2.5]

assert calculated_differences == expected_differences, "Your function did not return the correct differences!"
print("\nSuccess! Your function works for the taxi fare example.")
# Expected output (after a blank line): Success! Your function works for the taxi fare example.

test_sequence_2 = [1, 3, 6, 10, 15]
expected_differences_2 = [2, 3, 4, 5]
calculated_differences_2 = calculate_differences(test_sequence_2)

print("\nOriginal Sequence 2: " + str(test_sequence_2))
# Expected output (after a blank line): Original Sequence 2: [1, 3, 6, 10, 15]
print("Calculated Differences 2: " + str(calculated_differences_2))
# Expected output: Calculated Differences 2: [2, 3, 4, 5]
print("Expected Differences 2: " + str(expected_differences_2))
# Expected output: Expected Differences 2: [2, 3, 4, 5]

assert calculated_differences_2 == expected_differences_2, "Your function did not return the correct differences for the second test case!"
print("\nSuccess! Your function works for the second test case.")
# Expected output (after a blank line): Success! Your function works for the second test case.
```
</details>
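Once you can calculate differences, a natural follow-up is to let Python decide whether they are constant. The sketch below is one possible way to do that. The function `has_constant_difference` is a hypothetical helper, not part of the original exercise, and it uses `math.isclose` so that tiny floating-point rounding errors do not cause a false alarm.

```python
import math

def calculate_differences(sequence):
    """Return the differences between consecutive elements."""
    differences = []
    for i in range(len(sequence) - 1):
        differences.append(sequence[i + 1] - sequence[i])
    return differences

def has_constant_difference(sequence):
    """Return True if every first difference matches the first one."""
    differences = calculate_differences(sequence)
    if len(differences) == 0:
        return False  # fewer than two values, so there is nothing to compare
    for diff in differences:
        # math.isclose tolerates tiny floating-point rounding errors
        if not math.isclose(diff, differences[0]):
            return False
    return True

print(has_constant_difference([4.5, 7.0, 9.5, 12.0, 14.5]))
# Expected output: True
print(has_constant_difference([1, 3, 6, 10, 15]))
# Expected output: False
```

The taxi fares pass the check, so they follow a linear rule. The second test sequence fails it, so a straight line cannot describe it.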

### 🎯 Mini-Challenge: The Cupcake Conundrum

You're at a bakery, and you want to buy some cupcakes. The bakery sells cupcakes in boxes. Every box costs the same amount, but each order also carries a one-time fixed cost for the fancy packaging, so the average price per box drops as you buy more boxes.

You have the following data for the total cost of buying different numbers of boxes:

- 1 box: $10.00
- 2 boxes: $18.00
- 3 boxes: $26.00
- 4 boxes: $34.00
- 5 boxes: $42.00

Your task is to:
1. Use the `calculate_differences` function you just wrote to find the constant difference in the total cost. This will be the cost per additional box.
2. Using this constant difference and one of the data points, deduce the fixed cost of the fancy packaging (the `b` in `y = mx + b`).
3. Print the cost per additional box and the fixed packaging cost.

In [None]:
# Cupcake costs for 1, 2, 3, 4, and 5 boxes
cupcake_costs = [10.00, 18.00, 26.00, 34.00, 42.00]

# YOUR CODE HERE
pass

# Expected output:
# Cost per additional box: 8.0
# Fixed packaging cost: 2.0

<details>
  <summary>Click to see a possible solution</summary>

```python
# Cupcake costs for 1, 2, 3, 4, and 5 boxes
cupcake_costs = [10.00, 18.00, 26.00, 34.00, 42.00]

# 1. Use calculate_differences to find the constant difference
cost_differences = calculate_differences(cupcake_costs)
cost_per_additional_box = cost_differences[0]  # every difference is the same, so take the first

# 2. Deduce the fixed packaging cost (b)
# Using the first data point: 1 box costs $10.00
# total_cost = cost_per_additional_box * num_boxes + fixed_packaging_cost
# 10.00 = cost_per_additional_box * 1 + fixed_packaging_cost
fixed_packaging_cost = cupcake_costs[0] - (cost_per_additional_box * 1)

# 3. Print the results
print("Cost per additional box: " + str(cost_per_additional_box))
# Expected output: Cost per additional box: 8.0
print("Fixed packaging cost: " + str(fixed_packaging_cost))
# Expected output: Fixed packaging cost: 2.0
```
</details>
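The taxi and cupcake cases follow the same two steps, so they can be bundled into one reusable function. The sketch below is one possible way to do it. The name `find_linear_rule` and its `first_x` parameter are hypothetical, and the function assumes that the inputs go up by 1 each step and that the sequence really does have a constant first difference.

```python
def find_linear_rule(sequence, first_x=1):
    """Return (m, b) for the rule y = m * x + b.

    Assumes x increases by 1 between terms, that the first term belongs
    to x = first_x, and that the first differences are constant.
    """
    m = sequence[1] - sequence[0]   # the constant difference is the gradient
    b = sequence[0] - m * first_x   # rearranged from y = m * x + b
    return m, b

# Cupcake costs for 1, 2, 3, 4, and 5 boxes
m, b = find_linear_rule([10.00, 18.00, 26.00, 34.00, 42.00])
print("Cost per additional box: " + str(m))
# Expected output: Cost per additional box: 8.0
print("Fixed packaging cost: " + str(b))
# Expected output: Fixed packaging cost: 2.0
```

Calling the same function on the taxi fares, `find_linear_rule([4.5, 7.0, 9.5, 12.0, 14.5])`, gives back the gradient `2.5` and the pickup fee `2.0` that we found by hand.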

In [None]:
num_boxes = [1, 2, 3, 4, 5]
cupcake_costs = [10.00, 18.00, 26.00, 34.00, 42.00]

plt.figure(figsize=(8, 6))
plt.plot(num_boxes, cupcake_costs, marker='o', linestyle='-')
plt.title('Cupcake Cost vs. Number of Boxes')
plt.xlabel('Number of Boxes')
plt.ylabel('Total Cost ($)')
plt.grid(True)
plt.show()

## Summary and Next Steps

In this notebook, you became a data detective! You learned how to identify linear patterns in sequences of numbers using the **Method of First Differences**. By calculating the constant difference between consecutive terms, you were able to determine the **gradient** (`m`) of a linear function. Then, by plugging in a known data point, you found the **y-intercept** (`b`), thus revealing the hidden `y = mx + b` rule. You also practiced automating this process with Python functions.

### Key Takeaways:
- The **Method of First Differences** helps identify linear relationships in sequences.
- A **constant first difference** indicates a linear function.
- When the inputs increase by 1 each step, the constant first difference is the **gradient** (`m`) of the linear function.
- You can find the **y-intercept** (`b`) by using a known data point and the calculated gradient.

### Next Up: Notebook 12.c: Cracking the Quadratic Code 🕵️‍♀️

In our next notebook, [Notebook 12.c: Cracking the Quadratic Code](https://colab.research.google.com/github/sguy/programming-and-problem-solving/blob/main/notebooks/12.c-cracking-the-quadratic-code.ipynb), we'll tackle even more complex patterns. What happens when the first differences aren't constant? Get ready to uncover the secrets of quadratic functions!

[Return to Table of Contents](https://colab.research.google.com/github/sguy/programming-and-problem-solving/blob/main/notebooks/table-of-contents.ipynb)