# 12.b: Finding Linear Patterns

> Data! Data! Data! I can't make bricks without clay.
>
> — [Arthur Conan Doyle](https://en.wikipedia.org/wiki/Arthur_Conan_Doyle)

## 🎯 Learning Objectives

By the end of this notebook, you will be able to:
- Analyze the differences of a constant function (degree 0).
- Use the Method of First Differences to identify linear patterns (degree 1).
- Visualize a sequence and its differences using plots.

## 📚 Prerequisites

This notebook builds on concepts from the previous lesson. Before you begin, make sure you are comfortable with:
- Concepts from [Notebook 12.a: Functions, Sequences, and Plots](https://colab.research.google.com/github/sguy/programming-and-problem-solving/blob/main/notebooks/12.a-functions-sequences-and-plots.ipynb), including passing functions as arguments and basic plotting.

*Estimated Time: 30 minutes*

---

[Return to Table of Contents](https://colab.research.google.com/github/sguy/programming-and-problem-solving/blob/main/notebooks/table-of-contents.ipynb)

## Introduction: From Data to Rules

In the last notebook, we started with a rule (a Python function) and used it to generate a sequence of data, which we then plotted. We acted like engineers building a model.

Now, we're going to put on our detective hats and do the reverse. We will start with a sequence of data and try to uncover the hidden mathematical rule that created it. This is a fundamental skill in science, finance, and engineering, where the goal is to find a model that explains the data you observe.

Our primary tool will be the **Method of Differences**. By calculating the difference between consecutive data points, we can reveal the underlying pattern.

## 🛠️ Setup: Our Detective Kit

Before we start our investigation, let's prepare our tools. We'll use a helper function from our previous lesson and create a new one for calculating differences. We will also import `numpy` and `matplotlib`.

- **`get_function_values(func, domain)`**: Our helper from last time. It takes a function and a list of x-values and returns the list of y-values.
- **`calculate_differences(sequence)`**: Our new tool. It takes a list of numbers and returns a new list containing the differences between consecutive values.
- **`np.nan`**: A special value from `numpy` meaning "Not a Number." Why do we need this? When we calculate the difference between items in a list, the *first* item has nothing before it to compare against. We use `np.nan` as a placeholder for this first calculated difference, which is undefined. This also ensures our lists of differences stay the same length as our original list, which is useful for plotting.
- **`matplotlib.pyplot`**: Our go-to library for plotting our data and our differences, so we can see the patterns.
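
To make the role of `np.nan` concrete, here is a small sketch using a made-up three-item list (the values are illustrative only and do not come from the lesson). It shows that the placeholder keeps the list of differences the same length as the original, and that `np.isnan` is the reliable way to test for it, because `nan` never compares equal to itself.

```python
import numpy as np

# Hypothetical example data, chosen only for illustration.
sequence = [3, 7, 11]

# The first entry has no predecessor, so its difference is undefined.
differences = [np.nan, sequence[1] - sequence[0], sequence[2] - sequence[1]]

print(differences)                         # [nan, 4, 4]
print(len(differences) == len(sequence))   # True
print(differences[0] == differences[0])    # False: nan is never equal to itself
print(np.isnan(differences[0]))            # True
```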

### 🔗 External Resources:
- **Documentation for `numpy.nan`**: [https://numpy.org/doc/stable/reference/constants.html#numpy.nan](https://numpy.org/doc/stable/reference/constants.html#numpy.nan)

In [None]:
import numpy as np
import matplotlib.pyplot as plt

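def get_function_values(func, domain):
    """Apply func to each x in domain and return the list of outputs."""
    codomain = []
    for x in domain:
        codomain.append(func(x))
    return codomain

def calculate_differences(sequence):
    """Return the consecutive differences, padded with a leading np.nan."""
    # With fewer than two values there is nothing to subtract.
    if len(sequence) < 2:
        return [np.nan] * len(sequence)

    # The first value has no predecessor, so its difference is undefined.
    differences = [np.nan]
    for i in range(1, len(sequence)):
        differences.append(sequence[i] - sequence[i-1])
    return differences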
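
As a quick sanity check, the sketch below compares our helper against NumPy's built-in `np.diff`. The helper is repeated here only so the block runs on its own, and the `sample` list is made-up test data. The one visible difference is that `np.diff` returns one fewer value than the input, while our helper pads the front with `np.nan`.

```python
import numpy as np

def calculate_differences(sequence):
    # Mirrors the helper defined above, repeated so this block is self-contained.
    if len(sequence) < 2:
        return [np.nan] * len(sequence)
    differences = [np.nan]
    for i in range(1, len(sequence)):
        differences.append(sequence[i] - sequence[i - 1])
    return differences

sample = [1, 4, 9, 16]  # hypothetical test data

ours = calculate_differences(sample)
numpys = np.diff(sample).tolist()

print(ours)     # [nan, 3, 5, 7]
print(numpys)   # [3, 5, 7]
```

Keeping the `nan` padding is a design choice for this lesson: it lets us plot the differences against the same x-values as the original sequence.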

## A Clue for Our Detective: The Polynomial's "Degree"

The secret to the Method of Differences lies in a concept from algebra called the **degree** of a polynomial. A polynomial is just an expression built from variables (like `x`), numbers, and whole-number exponents. For example:
$$y = 4x^2 - 5x + 10$$
The **degree** is simply the highest exponent of the variable. In the example above, the exponents are 2 (from $4x^2$), 1 (from $-5x^1$), and 0 (from $10x^0$), so the degree is **2**.
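
If you would like to check a degree with code, one option (not used elsewhere in this notebook) is NumPy's `Polynomial` class. The sketch below builds the example polynomial from its coefficients and asks for its degree. Note that the coefficients are listed from the lowest power to the highest.

```python
from numpy.polynomial import Polynomial

# Coefficients from the lowest power to the highest:
# 10 (x^0), -5 (x^1), 4 (x^2)  ->  y = 4x^2 - 5x + 10
p = Polynomial([10, -5, 4])

print(p.degree())   # 2

# Evaluate the polynomial at x = 2: 4*2**2 - 5*2 + 10
print(p(2))
```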

Why is this our big clue? Because a polynomial's degree tells us exactly how many times we need to calculate the differences to find a constant pattern. It's the key that unlocks the secret of the sequence!

- A **degree 0** polynomial (like `y = 15`) will have a constant sequence to begin with.
- A **degree 1** polynomial (like `y = 10x - 2`) will have a constant *first* difference.
- A **degree 2** polynomial (like `y = 4x^2 - 5x + 10`) will have a constant *second* difference. (We'll see this in the next notebook!)
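
The three claims above can be previewed with a short sketch. It assumes evenly spaced inputs (`x = 0, 1, ..., 5`) and uses NumPy's `np.diff`, which returns the differences without the leading `nan` that our own helper adds. The `n=2` argument asks `np.diff` to take the difference twice.

```python
import numpy as np

x = np.arange(6)                      # evenly spaced inputs 0, 1, ..., 5

constant = np.full(6, 15)             # y = 15
linear = 10 * x - 2                   # y = 10x - 2
quadratic = 4 * x**2 - 5 * x + 10     # y = 4x^2 - 5x + 10

print(np.diff(constant).tolist())         # [0, 0, 0, 0, 0]
print(np.diff(linear).tolist())           # [10, 10, 10, 10, 10]
print(np.diff(quadratic).tolist())        # [-1, 7, 15, 23, 31]
print(np.diff(quadratic, n=2).tolist())   # [8, 8, 8, 8]
```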

### ✅ Check Your Understanding

What is the degree of each of the following polynomials?

- a) $y = 5x^3 + 2x^2 - 7$
- b) $y = 10x - 2$
- c) $y = 15$
- d) $y = x^4 + x$

<details><summary>Click for the answers</summary>

- a) **Degree 3** (the highest exponent is 3)
- b) **Degree 1** (since $10x$ is the same as $10x^1$)
- c) **Degree 0** (since $15$ is the same as $15x^0$)
- d) **Degree 4** (the highest exponent is 4)

</details>

## Case 1: The Constant Function (Degree 0)

Let's start with the simplest possible rule: a constant function. No matter what the input is, the output is always the same. This is like a subway system where the ticket price is the same for every trip.

What do you expect to see when we calculate the differences between each step?

In [None]:
# 1. Define the rule as a function
def calc_subway_fare(station_number):
    return 2.75

# 2. Define the domain (the inputs to the function)
x_stations = list(range(5))

# 3. Generate the sequence (the outputs)
y_fares = get_function_values(calc_subway_fare, x_stations)

# 4. Calculate the first difference
d1_fares = calculate_differences(y_fares)

print(f"Original Sequence: {y_fares}")
print(f"First Difference:  {d1_fares}")

In [None]:
# Let's visualize the constant sequence and its differences
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(8, 6), sharex=True)
fig.suptitle('Analysis of a Constant Function', fontsize=16)

# Plot Original Sequence
ax1.plot(x_stations, y_fares, 'o-')
ax1.set_title("Original Sequence (Constant)")
ax1.set_ylabel("Fare ($)")
ax1.grid(True)

# Plot First Difference
ax2.plot(x_stations, d1_fares, 'o-', color='r')
ax2.set_title("First Difference")
ax2.set_ylabel("Change in Fare ($)")
ax2.set_xlabel("Station Number")
ax2.grid(True)

plt.show()

As the plots clearly show, the first difference is a sequence of zeros (after the initial `nan`). This makes sense—since the value isn't changing, the difference between consecutive values is always zero.

This connects to the idea of polynomial degrees. A constant function like `y = 2.75` can be written mathematically as:
$$y = 2.75 \cdot x^0$$
Since anything to the power of 0 is 1, the $x^0$ term is equal to 1 and we don't usually write it. But seeing it this way shows that the highest power of x is 0, which is why we call a constant function a **degree 0** polynomial.

> 💡 A sequence is **constant (degree 0)** if and only if its first difference is zero.

## Case 2: The Linear Function (Degree 1)

Now, let's look at a linear function. Imagine a taxi fare that starts at \$5 (the flag-drop fee) and increases by \$2 for every kilometer driven. We can model this rule with a function.

In [None]:
# 1. Define the rule as a function
def calc_taxi_fare(kilometers):
    return 5 + 2 * kilometers

# 2. Define the domain
x_kilometers = list(range(6))

# 3. Generate the sequence
y_taxi_fares = get_function_values(calc_taxi_fare, x_kilometers)

# 4. Calculate the first difference
d1_taxi_fares = calculate_differences(y_taxi_fares)

print(f"Original Sequence: {y_taxi_fares}")
print(f"First Difference:  {d1_taxi_fares}")

In [None]:
# Let's visualize the linear sequence and its differences
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(8, 6), sharex=True)
fig.suptitle('Analysis of a Linear Function', fontsize=16)

# Plot Original Sequence
ax1.plot(x_kilometers, y_taxi_fares, 'o-')
ax1.set_title("Original Sequence (Linear)")
ax1.set_ylabel("Fare ($)")
ax1.grid(True)

# Plot First Difference
ax2.plot(x_kilometers, d1_taxi_fares, 'o-', color='r')
ax2.set_title("First Difference")
ax2.set_ylabel("Change in Fare ($)")
ax2.set_xlabel("Kilometer")
ax2.grid(True)

plt.show()

This time, the plot of the first difference is a horizontal line at `2` (after the initial `nan`). This constant value is the **rate of change**, or the slope of the line. In our taxi example, it's the \$2 cost per kilometer.
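
A short derivation shows why the first difference must equal the slope. Write a general linear rule as $f(x) = mx + b$ (the names $f$, $m$, and $b$ are notation introduced here for illustration) and assume the inputs are one unit apart:

$$f(x+1) - f(x) = \bigl(m(x+1) + b\bigr) - (mx + b) = m$$

The $mx$ and $b$ terms cancel, leaving only $m$. For the taxi fare, $m = 2$ and $b = 5$, so every first difference is 2 regardless of where we are in the sequence.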

> 💡 A sequence with evenly spaced inputs is **linear** if and only if its first difference is a non-zero constant.
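
The two rules we have found so far can be combined into a small test. The sketch below is one possible way to do it, not part of the lesson's toolkit: `classify_sequence` is a hypothetical helper name, it assumes evenly spaced inputs, and it uses `np.allclose` and `np.isclose` so that tiny floating-point rounding errors do not spoil the comparison.

```python
import numpy as np

def classify_sequence(sequence):
    """Hypothetical helper: label a sequence 'constant', 'linear', or 'neither'.

    Assumes the values were sampled at evenly spaced inputs.
    """
    if len(sequence) < 3:
        return "neither"  # need at least two differences to compare
    d1 = np.diff(sequence)
    # All first differences must match (within floating-point tolerance).
    if not np.allclose(d1, d1[0]):
        return "neither"
    # A constant difference of zero means the sequence itself is constant.
    return "constant" if np.isclose(d1[0], 0) else "linear"

print(classify_sequence([2.75, 2.75, 2.75, 2.75]))   # constant
print(classify_sequence([5, 7, 9, 11, 13]))          # linear
print(classify_sequence([1, 4, 9, 16]))              # neither
```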

### ✅ Check Your Understanding

You are given a sequence of numbers and after calculating its first differences, you get `[nan, 4, 4, 4, 4]`. What can you conclude about the original sequence?

- a) It is a constant sequence.
- b) It is a linear sequence.
- c) It is a quadratic sequence.
- d) You cannot determine the pattern from the first differences.

<details><summary>Click for the answer</summary>

The answer is **b) It is a linear sequence**. A constant, non-zero first difference indicates a linear pattern.

</details>

### 🎯 Mini-Challenge: The Leaky Bucket

A bucket is leaking water at a steady rate. The volume of water (in liters) is measured every minute. The data is recorded in the `water_volume` list below.

Your task is to:
1. Calculate the first differences for this sequence.
2. Based on the result, print whether the leak is linear or not.
3. Explain what the constant difference represents in this context.

In [None]:
water_volume = [10.0, 9.5, 9.0, 8.5, 8.0]

# 1. Calculate the first differences
d1_water = [] # YOUR CODE HERE

print(f"First differences: {d1_water}")

# 2. Check if the result is constant and print your conclusion
# YOUR CODE HERE

# 3. Explain what the difference means in a print statement
# YOUR CODE HERE

<details><summary>Click to see a possible solution</summary>

```python
water_volume = [10.0, 9.5, 9.0, 8.5, 8.0]

# 1. Calculate the first differences
d1_water = calculate_differences(water_volume)
print(f"First differences: {d1_water}")
# Expected output: First differences: [nan, -0.5, -0.5, -0.5, -0.5]

# 2. Check if the result is constant and print your conclusion
# We can see by looking that the difference is a constant -0.5.
print("The leak is linear because the first difference is constant.")

# 3. Explain what the difference means in a print statement
print("The constant difference of -0.5 means the bucket is losing 0.5 liters per minute.")
```

</details>

## 🎉 Well Done!

In this notebook, we used the Method of First Differences to test if a sequence of data was generated by a constant or linear rule. By calculating and plotting the differences, we can quickly identify the nature of the underlying function.

### Key Takeaways
- The first difference of a **constant** sequence is **zero**.
- The first difference of a **linear** sequence is a **non-zero constant**.
- This constant difference is the function's **rate of change** (or slope).
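
Putting these takeaways to work, the sketch below recovers the rule behind the taxi-fare sequence from the data alone. It is a minimal illustration under stated assumptions: `recover_linear_rule` is a hypothetical helper, and it assumes the inputs are `x = 0, 1, 2, ...`, so the first value in the sequence is the value at `x = 0`.

```python
import numpy as np

def recover_linear_rule(sequence):
    """Hypothetical helper: return (slope, intercept) for a linear sequence.

    Assumes the inputs are x = 0, 1, 2, ...
    """
    d1 = np.diff(sequence)
    # We need at least two differences, and they must all agree.
    if len(d1) < 2 or not np.allclose(d1, d1[0]):
        raise ValueError("the first differences are not constant")
    slope = d1[0]             # rate of change per step
    intercept = sequence[0]   # value at x = 0
    return slope, intercept

slope, intercept = recover_linear_rule([5, 7, 9, 11, 13, 15])
print(f"y = {intercept} + {slope} * x")   # y = 5 + 2 * x
```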

### Next Up: Notebook 12.c: Cracking the Quadratic Code 🚀

But what happens if the first difference *isn't* constant? In our next notebook, [Notebook 12.c: Cracking the Quadratic Code](https://colab.research.google.com/github/sguy/programming-and-problem-solving/blob/main/notebooks/12.c-cracking-the-quadratic-code.ipynb), we'll become data detectives again and see what happens when we take the difference of the differences.

---
[Return to Table of Contents](https://colab.research.google.com/github/sguy/programming-and-problem-solving/blob/main/notebooks/table-of-contents.ipynb)