# 📘 Introduction to Univariate Linear Regression & Gradient Descent

## Welcome, Future AI Expert! 👋

In this 2-hour interactive session, we will dive into one of the most fundamental algorithms in machine learning: **Univariate Linear Regression**. We'll learn how a computer can find relationships in data and make predictions. We'll also uncover the magic behind how it 'learns' using an algorithm called **Gradient Descent**.

### 🎯 Our Learning Objectives:

1.  **Understand Linear Regression**: What it is and why it's useful.
2.  **The Hypothesis & Cost Function**: Learn how the model makes predictions and measures its own error.
3.  **Grasp Gradient Descent**: Discover the iterative process of learning from data.
4.  **Code with Loops**: Implement Gradient Descent the intuitive, step-by-step way.
5.  **Code with Vectorization**: Learn the fast and efficient way using the NumPy library.
6.  **Apply Your Knowledge**: Solve a final assignment to solidify your new skills!

Let's get started! 🚀

## Topic 1: What is Univariate Linear Regression?

Imagine you have data and you want to predict a value, for example a student's exam score based on the hours they studied. *Univariate* means we use just one input feature (here, hours studied) to make the prediction. Univariate Linear Regression finds the **straight line** that best fits our data.

This line is represented by a simple equation:

`y = mx + c`

In Machine Learning, we often write it like this:

**`hθ(x) = θ₀ + θ₁x`**

*   `hθ(x)` is our **prediction** (hypothesis).
*   `x` is our **input feature** (e.g., hours studied).
*   `θ₀` (theta-zero) is the **y-intercept**.
*   `θ₁` (theta-one) is the **slope** of the line.

Our goal is to find the best values for `θ₀` and `θ₁` that make our line fit the data as closely as possible!
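As a quick sketch, the hypothesis can be written as a tiny Python function. The name `predict` and the guess `θ₀ = 1`, `θ₁ = 1` are illustrative choices only, not values learned from the data.

```python
import numpy as np

def predict(theta0, theta1, x):
    """Hypothesis h(x) = theta0 + theta1 * x (illustrative helper name)."""
    # NumPy applies the formula to every element of x at once
    return theta0 + theta1 * x

hours = np.array([1, 2, 3, 4, 5])

# A hand-picked guess: intercept 1, slope 1
print(predict(1, 1, hours))  # [2 3 4 5 6]
```

Changing `theta0` moves the whole line up or down; changing `theta1` makes it steeper or flatter.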

In [None]:
# Let's import the libraries we'll need
import numpy as np
import matplotlib.pyplot as plt

# Here's our simple dataset: Hours Studied vs. Exam Score
hours_studied = np.array([1, 2, 3, 4, 5])
exam_score = np.array([2, 4, 5, 4, 5])

# Let's visualize the data!
plt.scatter(hours_studied, exam_score)
plt.xlabel("Hours Studied")
plt.ylabel("Exam Score")
plt.title("Student Data")
plt.show()

### 🎯 Practice Task 1: Observe the Data

Look at the plot above. Do you see a general trend? As the 'Hours Studied' increase, what tends to happen to the 'Exam Score'?

## Topic 2: How Good is Our Prediction? - The Cost Function (MSE)

Before we can find the 'best' line, we need a way to measure how 'bad' a line is. We do this with a **Cost Function**.

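The cost function measures how far our model's predictions are from the actual data points. We use the **Mean Squared Error (MSE)**, which sounds complex but is quite simple:

1.  For each data point, calculate the **error**: the predicted value minus the actual value.
2.  **Square** each error. This makes every error non-negative (so positive and negative errors can't cancel out) and penalizes big misses more than small ones.
3.  Calculate the **average** of all the squared errors.

In the code below we also divide by 2, so the cost is `(1 / (2m)) × sum of (prediction - actual)²`, where `m` is the number of data points. This extra ½ is a common convention that makes the gradient formulas used later a little simpler; it doesn't change which `θ₀` and `θ₁` give the lowest cost.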

**Our goal is to find `θ₀` and `θ₁` that make this cost as low as possible!** A lower cost means our line is a better fit.

In [None]:
# Let's define the cost function in Python
def calculate_cost(theta0, theta1, x, y):
    m = len(y) # Number of data points
    predictions = theta0 + theta1 * x
    squared_errors = (predictions - y) ** 2
    cost = (1 / (2 * m)) * np.sum(squared_errors)
    return cost

# Let's test it with a guess: theta0 = 0 and theta1 = 0
initial_cost = calculate_cost(0, 0, hours_studied, exam_score)
print(f"The initial cost with our first guess is: {initial_cost}")
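To check the function by hand: with `θ₀ = 0` and `θ₁ = 0`, every prediction is 0, so each error is just minus the actual score. Plugging the five exam scores into the same formula the code uses, with `m = 5`:

$$
J(0, 0) = \frac{1}{2 \cdot 5}\left[(0-2)^2 + (0-4)^2 + (0-5)^2 + (0-4)^2 + (0-5)^2\right] = \frac{4 + 16 + 25 + 16 + 25}{10} = \frac{86}{10} = 8.6
$$

So the cell above should report a cost of 8.6.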

### 🎯 Practice Task 2: Experiment with Cost

Let's try a better guess for `theta1`. We can see from the plot that the slope should be positive. 

Copy the code from the cell above and calculate the cost for `theta0 = 0` and `theta1 = 1`. Is the cost higher or lower than before? What does this tell you about your new guess?

🧪 **Try changing the values of `theta0` and `theta1` to see if you can get the cost even lower!**

## Topic 3: The Learning Algorithm - Gradient Descent

Guessing `θ₀` and `θ₁` randomly is not very efficient. We need an algorithm that can find the best values for us. That algorithm is **Gradient Descent**!

**💡 Analogy:** Imagine you are standing on a foggy mountain and want to get to the lowest point. You can't see the bottom, but you can feel the slope under your feet. You would take a small step in the steepest downhill direction. You repeat this process until you reach the valley floor.

That's exactly what Gradient Descent does!
*   The **'mountain'** is our cost function.
*   The **'lowest point'** is the minimum cost.
*   The **'step size'** is controlled by the **Learning Rate (alpha, `α`)**.

It iteratively adjusts `θ₀` and `θ₁` in small steps to minimize the cost.
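Written out for our cost function $J(\theta_0, \theta_1) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$, one step of gradient descent uses the partial derivatives of the cost (the 'slope under your feet') and moves both parameters against them:

$$
\begin{aligned}
\frac{\partial J}{\partial \theta_0} &= \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right) \\
\frac{\partial J}{\partial \theta_1} &= \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x^{(i)} \\[4pt]
\theta_0 &:= \theta_0 - \alpha\,\frac{\partial J}{\partial \theta_0} \\
\theta_1 &:= \theta_1 - \alpha\,\frac{\partial J}{\partial \theta_1}
\end{aligned}
$$

The ½ in the cost cancels the 2 produced by differentiating the square, which is why the gradients have `1/m` rather than `2/m`. The minus sign makes each step go downhill, and both gradients are computed from the old `θ₀` and `θ₁` before either parameter is updated.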

## Topic 4: The Step-by-Step Way - Gradient Descent with Loops (Non-Vectorized)

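One way to implement Gradient Descent is with `for` loops. We pass over our dataset many times (each pass is called an 'iteration' or 'epoch'), and in each iteration we update our `theta` values slightly.

This approach is intuitive because the inner loop follows the calculation one data point at a time, adding each point's error to a running total. Note that `theta` itself is updated only once per iteration, after every data point has been visited; this version is called **batch** gradient descent.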

In [11]:
# Data (same as before)
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

# Parameters
alpha = 0.01          # Learning Rate
iterations = 1000     # Number of steps to take
m = len(y)            # Number of data points
theta0 = 0            # Initial guess for intercept
theta1 = 0            # Initial guess for slope

# Gradient Descent Loop
for _ in range(iterations):
    sum_error0 = 0
    sum_error1 = 0
    
    # Inner loop through each data point
    for i in range(m):
        prediction = theta0 + theta1 * x[i]
        error = prediction - y[i]
        sum_error0 += error
        sum_error1 += error * x[i]
    
    # Calculate the gradients (the direction of the step)
    grad0 = (1/m) * sum_error0
    grad1 = (1/m) * sum_error1
    
    # Update the parameters (take the step downhill)
    theta0 = theta0 - alpha * grad0
    theta1 = theta1 - alpha * grad1

print(f"✅ Training Complete!")
print(f"Final Theta0 (intercept): {theta0}")
print(f"Final Theta1 (slope): {theta1}")

✅ Training Complete!
Final Theta0 (intercept): 1.8521278749603411
Final Theta1 (slope): 0.6963550004885218
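It's worth checking how close 1000 iterations actually got. For a straight-line fit, NumPy's `np.polyfit` computes the exact least-squares line directly (no gradient descent involved), so we can use it as a reference. This is only a sanity check, not part of the algorithm itself.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

# A degree-1 polyfit returns [slope, intercept] (highest power first)
slope, intercept = np.polyfit(x, y, 1)
print(f"Exact intercept: {intercept:.4f}")  # 2.2000
print(f"Exact slope:     {slope:.4f}")      # 0.6000
```

Compared with the loop's result (about 1.85 and 0.70), gradient descent is heading toward the right line but hasn't fully arrived yet. More iterations, or a larger but still stable `alpha`, would bring it closer.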


### 🎯 Practice Task 3: The Learning Rate

The learning rate `alpha` is a very important parameter.

*   **Too small:** The algorithm will learn very slowly.
*   **Too large:** It might overshoot the minimum and never find it!

🧪 **Experiment:** Copy the code above. Change `alpha` to `0.5` and re-run. What happens to the final `theta` values? (You might see `inf` or `NaN`, which means it failed!). Now, try a very small value like `0.0001`.
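Rather than editing the cell by hand each time, you can wrap the loop in a function and compare several learning rates side by side. This is a minimal sketch: `run_gradient_descent` is a hypothetical helper name, it computes the same two gradients as the loop above (using `np.sum` instead of the inner loop), and it records the cost with the same `1/(2m)` formula as `calculate_cost` after every step.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)
m = len(y)

def run_gradient_descent(alpha, iterations=50):
    """Hypothetical helper: returns the cost after each iteration."""
    theta0, theta1 = 0.0, 0.0
    costs = []
    for _ in range(iterations):
        errors = (theta0 + theta1 * x) - y
        # Same gradients as the loop version, summed with np.sum
        grad0 = np.sum(errors) / m
        grad1 = np.sum(errors * x) / m
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
        # Cost of the updated line, using the same 1/(2m) formula
        new_errors = (theta0 + theta1 * x) - y
        costs.append(np.sum(new_errors ** 2) / (2 * m))
    return costs

for alpha in [0.0001, 0.01, 0.5]:
    costs = run_gradient_descent(alpha)
    print(f"alpha={alpha}: cost after 1 step = {costs[0]:.3g}, "
          f"after 50 steps = {costs[-1]:.3g}")
```

With `alpha = 0.5` the cost grows with every step instead of shrinking, which is the overshooting described above. Run it long enough (like the 1000 iterations in the exercise) and the numbers overflow to `inf` and then `NaN`.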

## Topic 5: The Speedy Way - Vectorized Gradient Descent

Using loops in Python can be slow, especially with large datasets. **Vectorization** is the process of using libraries like **NumPy** to perform operations on entire arrays (or 'vectors') at once.

It's like calculating the predictions for all students at the same time, instead of one by one. This is much faster because the looping still happens, but inside NumPy's pre-compiled, highly optimized C code rather than in the much slower Python interpreter.
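Here's a small sketch of the difference. Both versions compute the same predictions for a set of made-up inputs. The array size, the random data, and the parameter values are assumptions chosen just for this demo, and the `time.perf_counter` timings will vary from machine to machine, but the vectorized version is typically much faster once there are many data points.

```python
import time
import numpy as np

rng = np.random.default_rng(0)
x_big = rng.uniform(0, 10, size=100_000)  # made-up inputs, for timing only
theta0, theta1 = 2.0, 0.5                  # arbitrary example parameters

# Loop version: one prediction at a time in Python
start = time.perf_counter()
loop_preds = [theta0 + theta1 * xi for xi in x_big]
loop_time = time.perf_counter() - start

# Vectorized version: NumPy computes every prediction in one operation
start = time.perf_counter()
vec_preds = theta0 + theta1 * x_big
vec_time = time.perf_counter() - start

print("Same results:", np.allclose(loop_preds, vec_preds))
print(f"Loop: {loop_time:.4f} s, vectorized: {vec_time:.4f} s")
```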

In [12]:
import numpy as np
# Data
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

# Parameters
alpha = 0.01
iterations = 1000
m = len(y)


In [4]:
# Prepare the data for vectorization
# We add a column of ones to 'x' to handle the theta0 term. This is a common trick!
X = np.c_[np.ones(m), x]
X

array([[1., 1.],
       [1., 2.],
       [1., 3.],
       [1., 4.],
       [1., 5.]])
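Why does a column of ones help? Each row of `X` is `[1, xᵢ]`, so multiplying a row by `theta = [θ₀, θ₁]` gives `1·θ₀ + xᵢ·θ₁`, which is exactly our hypothesis. A quick check, using arbitrary example values for `θ₀` and `θ₁`:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
X = np.c_[np.ones(len(x)), x]   # rows look like [1, x_i]
theta = np.array([1.5, 0.8])     # arbitrary example [theta0, theta1]

matrix_preds = X.dot(theta)                # one matrix-vector product
formula_preds = theta[0] + theta[1] * x    # h(x) = theta0 + theta1 * x

print(matrix_preds)
print(np.allclose(matrix_preds, formula_preds))  # True
```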

In [7]:
X.shape

(5, 2)

In [6]:
theta = np.zeros(2) # theta is now a vector [theta0, theta1]
theta

array([0., 0.])

In [8]:
theta.shape

(2,)

In [None]:
# Vectorized Gradient Descent
for _ in range(iterations):
    # Predictions for ALL examples at once!
    predictions = X.dot(theta)
    
    # Errors for ALL examples at once!
    errors = predictions - y
    
    # Gradients for BOTH thetas at once!
    gradient = (1/m) * X.T.dot(errors)
    
    # Update BOTH thetas at once!
    theta = theta - alpha * gradient

print(f"✅ Vectorized Training Complete!")
print(f"Final Theta vector [theta0, theta1]: {theta}")
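The single line `X.T.dot(errors)` replaces the whole inner loop from Topic 4. To convince yourself, here is a sketch that computes one gradient both ways at the starting point `theta = [0, 0]` and compares them:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])
m = len(y)
X = np.c_[np.ones(m), x]
theta = np.zeros(2)

# Vectorized gradient (one line)
errors = X.dot(theta) - y
grad_vec = (1 / m) * X.T.dot(errors)

# Loop gradient (same running sums as the Topic 4 code)
sum_error0, sum_error1 = 0.0, 0.0
for i in range(m):
    error = (theta[0] + theta[1] * x[i]) - y[i]
    sum_error0 += error
    sum_error1 += error * x[i]
grad_loop = np.array([sum_error0 / m, sum_error1 / m])

print(grad_vec)
print(np.allclose(grad_vec, grad_loop))  # True
```

Row 0 of `X.T` is all ones, so it sums the errors (the `θ₀` gradient); row 1 is `x`, so it sums `error * x` (the `θ₁` gradient).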

### 🎯 Practice Task 4: Understand the Shapes

In vectorization, the shape (or dimensions) of your matrices is very important. Add a `print()` statement in the code cell above to check the `.shape` of `X`, `theta`, and `y`. Understanding these dimensions is key to working with matrix operations.
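For reference, here is how the shapes line up through each step of the vectorized loop, written as a sketch with `assert` statements so it fails loudly if a shape is not what we expect:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])
m = len(y)
X = np.c_[np.ones(m), x]
theta = np.zeros(2)

predictions = X.dot(theta)               # (5, 2) dot (2,) -> (5,)
errors = predictions - y                 # (5,) - (5,)     -> (5,)
gradient = (1 / m) * X.T.dot(errors)     # (2, 5) dot (5,) -> (2,)

assert X.shape == (m, 2)
assert predictions.shape == y.shape == (m,)
assert gradient.shape == theta.shape == (2,)
print(X.shape, theta.shape, y.shape, gradient.shape)
```

The gradient comes out with the same shape as `theta`, which is what lets `theta - alpha * gradient` update both parameters in one go.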

## 🚀 Final Revision Assignment (Time for Practice!)

Congratulations on making it this far! Now it's time to put everything you've learned into practice with a final assignment. These tasks combine all the topics we've covered.

### Task 1: Short Question (Theory)

In your own words, explain the role of the learning rate (`α`) in gradient descent. What happens if it's too large or too small?

### Task 2: New Dataset - Ice Cream Sales!

Let's use a new dataset! We want to predict ice cream sales based on the temperature.

**Data:**
*   `temperature` = [20, 22, 25, 30, 33, 35]
*   `sales` = [15, 20, 28, 40, 45, 52]

Complete the code below to train a linear regression model on this new data using the **vectorized** approach.

In [None]:
# Data for the assignment
temperature = np.array([20, 22, 25, 30, 33, 35])
sales = np.array([15, 20, 28, 40, 45, 52])

# --- YOUR CODE HERE --- #

# 1. Set Parameters (alpha, iterations)
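# Hint: temperatures (20 to 35) are much larger than hours (1 to 5), so the cost
# surface is much steeper; alpha = 0.01 would overshoot and diverge on this data!
alpha = 0.001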
iterations = 10000
m = len(sales)

# 2. Prepare the feature matrix 'X' with a column of ones
# X = ...

# 3. Initialize the theta vector with zeros
# theta = ...

# 4. Write the gradient descent loop
# for _ in range(iterations):
    # ... (calculate predictions, errors, gradient, and update theta)

# 5. Print the final trained theta values
# print(f"Final Theta for ice cream sales: {theta}")

### Task 3: Visualize the Cost Function

It's very useful to see if our cost is actually decreasing. Modify your vectorized code from Task 2 to store the cost at each iteration and then plot it.

You will need to:
1. Create an empty list `cost_history = []` before the loop.
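2. Inside the loop, calculate the cost with the `calculate_cost` function we wrote earlier. It takes the two parameters separately, so call it as `cost = calculate_cost(theta[0], theta[1], temperature, sales)`, then append the result to the list: `cost_history.append(cost)`.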
3. After the loop, use `matplotlib.pyplot` to plot the `cost_history`.
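If you'd rather pass the whole `theta` vector, here is a sketch of a vectorized cost function that works directly with `X` and `theta`. The name `compute_cost_vectorized` is our own hypothetical helper, not a NumPy function; it uses the same `1/(2m)` formula as `calculate_cost`.

```python
import numpy as np

def compute_cost_vectorized(X, y, theta):
    """Hypothetical helper: same 1/(2m) cost as calculate_cost, using X and theta."""
    m = len(y)
    errors = X.dot(theta) - y
    # errors.dot(errors) is the sum of the squared errors
    return errors.dot(errors) / (2 * m)

# Check on the hours-studied data at theta = [0, 0];
# this should match calculate_cost(0, 0, hours_studied, exam_score)
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])
X = np.c_[np.ones(len(x)), x]
print(compute_cost_vectorized(X, y, np.zeros(2)))  # 8.6
```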

In [None]:
# Complete this code to plot the cost history.
# You will need to combine your vectorized gradient descent code
# with the cost calculation at each step.

cost_history = []

# --- YOUR GRADIENT DESCENT LOOP HERE --- #


# --- PLOTTING CODE --- #
# plt.plot(cost_history)
# plt.xlabel("Iterations")
# plt.ylabel("Cost")
# plt.title("Cost Decrease Over Time")
# plt.show()

### Task 4: Make a Prediction!

Using your final, trained `theta` values from the ice cream sales model, write a line of code to predict the number of sales on a day when the temperature is **28** degrees.

In [None]:
# Your final theta values (replace with your trained values)
final_theta0 = 0 # Replace with your value
final_theta1 = 0 # Replace with your value

temperature_to_predict = 28

# Calculate the predicted sales
predicted_sales = final_theta0 + final_theta1 * temperature_to_predict

print(f"Predicted sales for {temperature_to_predict} degrees: {predicted_sales}")

## 🎉 You Did It! 🎉

Excellent work! You have successfully built, trained, and used your very first machine learning model from scratch. You've learned about linear regression, how to measure its performance with a cost function, and how to train it with gradient descent using two different methods.

This is a huge step in your AI journey. Keep experimenting and happy learning!