- You are required to implement the complete *gradient descent* technique by hand.
- Download and load the file `housing-data.txt`.

1. Load the data and create a scatter plot. As you will see, there seems to be a nice linear relationship between the size (on the horizontal axis) and the price (on the vertical axis).
- As we have discussed, the general formula for the cost of a model is as follows:

$$
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} ( h_\theta(x^{(i)}) - y^{(i)} ) ^2 
$$

2. Write a function `compute_cost` that receives a matrix `X` (of size $m \times n$), a vector `y` (of size $m \times 1$) and a vector `theta` (of size $(n+1) \times 1$) and returns the total cost based on the formula above. For this to work correctly, you need to add a column of 1's to the original `X` matrix, so that it becomes $m \times (n+1)$ and $\theta_0$ acts as the intercept.
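A minimal sketch of the cost computation, assuming a tiny made-up dataset (`x_raw` and `y_toy` are hypothetical, not taken from `housing-data.txt`) whose points lie exactly on $y = 2x$. It adds the column of 1's with `np.column_stack` and writes the sum of squared errors as a dot product, which should agree with the formula above.

```python
import numpy as np

def compute_cost(X, y, theta):
    # J(theta) = 1/(2m) * sum of squared residuals, written as a dot product
    m = len(y)
    residuals = X @ theta - y
    return residuals @ residuals / (2 * m)

# Hypothetical toy data: three points on the line y = 2x
x_raw = np.array([1.0, 2.0, 3.0])
y_toy = np.array([2.0, 4.0, 6.0])

# Prepend a column of 1's so theta[0] acts as the intercept
X_toy = np.column_stack([np.ones_like(x_raw), x_raw])

cost_zero = compute_cost(X_toy, y_toy, np.array([0.0, 0.0]))   # (4 + 16 + 36) / 6
cost_exact = compute_cost(X_toy, y_toy, np.array([0.0, 2.0]))  # perfect fit
print(cost_zero, cost_exact)
```

At $\theta = [0, 0]$ the cost is $(2^2 + 4^2 + 6^2)/(2 \cdot 3) = 56/6 \approx 9.33$; at $\theta = [0, 2]$ the line passes through every point, so the cost is exactly 0.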

- When we call `compute_cost` with `theta` set to `[0,0]`, the total cost will probably be extremely high. In the next step, you need to update the values of this vector to minimize $J(\theta)$.

- As we have discussed, the technique we use for this is *gradient descent*: at every step of the descent, we update all values $\theta_j$ simultaneously as follows:

$$
\theta_j := \theta_j - \alpha \frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})x^{(i)}_j
$$
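Stacking the update for all $j$ gives an equivalent vectorized form, assuming $X$ is the $m \times (n+1)$ matrix with the column of 1's, $y$ is the $m \times 1$ vector of targets, and $h_\theta(x^{(i)}) = (X\theta)_i$. The $j$-th entry of $X^\top (X\theta - y)$ is exactly $\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})x^{(i)}_j$, so one matrix product updates every $\theta_j$ at once:

$$
\theta := \theta - \frac{\alpha}{m} X^\top (X\theta - y)
$$

In the same notation, the cost is $J(\theta) = \frac{1}{2m}(X\theta - y)^\top (X\theta - y)$.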

3. Write a function `gradient_descent` that receives the matrix `X`, the vectors `y` and `theta`, the learning rate `alpha` and the number of iterations `num_iters`. The function performs `num_iters` steps of gradient descent, computing the cost $J(\theta)$ at every step and storing it in a list. After `num_iters` steps, it returns the final value of `theta` and the list of all the costs.

4. Create a plot of the values of $J(\theta)$ that were computed with `compute_cost` during gradient descent. Does the total cost decrease?

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


In [None]:
# Read the file
with open('housing-data.txt', 'r') as file:
    content = file.readlines()

# Extract column names from the first row
column_names = content[0].strip().split(',')

# Create a list of dictionaries for each row (excluding the first row)
data = []
for line in content[1:]:

    # Split the line into columns
    columns = line.strip().split(',')

    # Create a dictionary for the row
    row = {}
    for i, col_name in enumerate(column_names): 

        # Convert column value to numeric type
        row[col_name] = pd.to_numeric(columns[i])
        
    # Add the row to the data list
    data.append(row)

# Create the DataFrame
df = pd.DataFrame(data)
df.head()


In [None]:
df.info()

In [None]:
plt.scatter(df['size'], df['price'], s=10, c='b', marker='o', alpha=0.5)
plt.xlabel('Size')
plt.ylabel('Price')
plt.title('Price vs Size of houses')
plt.show()

### Implement the `compute_cost` function:

In [None]:
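def compute_cost(X, y, theta):
    """Return the linear-regression cost J(theta) = 1/(2m) * sum((X @ theta - y) ** 2)."""
    m = len(y)                          # number of training examples
    predictions = X.dot(theta)          # h_theta(x^(i)) for every example
    square_err = (predictions - y) ** 2
    return 1 / (2 * m) * np.sum(square_err)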


### Let's calculate the cost with `theta` set to `[0, 0]`:

In [None]:
df['ones'] = 1
X = df[['ones', 'size']].values
y = df['price'].values
theta = np.array([0.0, 0.0])
cost = compute_cost(X, y, theta)
print(f"Cost: {cost}")
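Since every prediction $h_\theta(x^{(i)})$ is 0 when `theta` is `[0, 0]`, the cost printed above reduces to half the mean of the squared prices:

$$
J([0, 0]) = \frac{1}{2m} \sum_{i=1}^{m} \left(y^{(i)}\right)^2
$$

As a hypothetical illustration, a single price of 300000 contributes $300000^2 = 9 \times 10^{10}$ to this sum, so the starting cost is very large whenever prices are in the hundreds of thousands.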

### Implement the `gradient_descent` function:


In [None]:
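def gradient_descent(X, y, theta, alpha, num_iters):
    """Run num_iters steps of batch gradient descent; return the final theta and the cost history."""
    m = len(y)
    theta = np.array(theta, dtype=float)  # work on a copy so the caller's array is not modified in place
    J_history = []

    for _ in range(num_iters):
        predictions = X.dot(theta)                    # h_theta(x^(i)) for all examples
        gradient = X.T.dot(predictions - y) / m       # (1/m) * sum_i (h - y) * x_j for every j
        theta = theta - alpha * gradient              # simultaneous update of all theta_j
        J_history.append(compute_cost(X, y, theta))   # cost after this step

    return theta, J_history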


### Create a plot of the values of $J(\theta)$ that `compute_cost` has computed.


The plot of the $J(\theta)$ values should ideally show a downward trend: the cost decreases steadily and then flattens out once the algorithm has converged to the optimal values of `theta`.
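One way to make "flattens out" concrete is to stop when the relative change in cost between two consecutive iterations falls below a tolerance. The helper `has_converged` and its default `tol` below are hypothetical choices for illustration, not part of the assignment; it could be called on the notebook's `J_history` list.

```python
def has_converged(costs, tol=1e-6):
    """Return True if the last iteration changed the cost by less than tol (relative)."""
    if len(costs) < 2:
        return False
    previous, current = costs[-2], costs[-1]
    # abs() so that a large increase (divergence) is never mistaken for convergence;
    # any comparison involving nan is False, so nan costs never count as converged
    return abs(previous - current) <= tol * abs(previous)

print(has_converged([10.0, 5.0, 2.5]))    # False: the cost still halves each step
print(has_converged([1.0, 0.9999999]))    # True: relative change of about 1e-7
print(has_converged([1.0, 2.0]))          # False: the cost increased
```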

In [None]:
alpha = 0.01
num_iters = 1000

theta, J_history = gradient_descent(X, y, theta, alpha, num_iters)

plt.plot(range(1, num_iters + 1), J_history, color='blue')
plt.xlabel('Number of iterations')
plt.ylabel('Cost J')
plt.title('Cost function using Gradient Descent')
plt.show()



But that is not what we see: the cost increases instead of decreasing. This suggests that the learning rate is too high for the unscaled data and that gradient descent is overshooting the minimum.

When the learning rate is too high, each update is so large that the algorithm jumps past the minimum and lands at a point where the cost is higher than before. Repeating this at every iteration makes the cost grow instead of shrink, so the algorithm diverges.
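One way to quantify "too high": for this squared-error cost, the curvature of $J(\theta)$ is the matrix $\frac{1}{m}X^\top X$, and batch gradient descent with a fixed learning rate converges for every starting point precisely when $0 < \alpha < 2/\lambda_{\max}$, where $\lambda_{\max}$ is the largest eigenvalue of that matrix. The sketch below computes this bound with `np.linalg.eigvalsh`; the helper `max_stable_alpha` and the house sizes are hypothetical (made-up values in the thousands, not the contents of `housing-data.txt`).

```python
import numpy as np

def max_stable_alpha(X):
    """Upper bound 2 / lambda_max on the learning rate for J(theta) = 1/(2m) ||X theta - y||^2."""
    m = X.shape[0]
    curvature = X.T @ X / m                       # Hessian of J; independent of y and theta
    lambda_max = np.linalg.eigvalsh(curvature).max()
    return 2.0 / lambda_max

# Hypothetical house sizes (made-up values in the thousands)
sizes = np.array([850.0, 1200.0, 1600.0, 2100.0, 3000.0])
sizes_std = (sizes - sizes.mean()) / sizes.std()

X_raw = np.column_stack([np.ones_like(sizes), sizes])
X_std = np.column_stack([np.ones_like(sizes_std), sizes_std])

print(max_stable_alpha(X_raw))   # roughly 5.5e-07, far below the notebook's 0.01
print(max_stable_alpha(X_std))   # about 2.0, since the curvature matrix is (nearly) the identity
```

In this made-up example, the raw sizes allow a learning rate of only about $5.5 \times 10^{-7}$, so $\alpha = 0.01$ is far outside the stable range, while after standardization the bound is about 2.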



In [None]:
print(J_history)

The printed cost values (`J_history`) contain `inf` and `nan` entries. There are two likely reasons:

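1. The learning rate: a learning rate that is too high makes gradient descent take steps that are too large. Instead of shrinking, the parameters grow in magnitude at every step until the floating-point values overflow to `inf`; further arithmetic on those values (for example `inf - inf`) then produces `nan`.


2. Data scaling: gradient descent is sensitive to the scale of the features. The partial derivative with respect to $\theta_j$ is multiplied by $x^{(i)}_j$, so when a feature takes large values (such as house sizes), its gradient is large as well, and a learning rate that would be safe for small values becomes far too large. It is therefore important to normalize the data first and then try again.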
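Both causes can be reproduced on a small made-up dataset (the sizes, prices and the helpers `run_gd` and `add_ones` below are hypothetical, not the assignment's data or code). The sketch runs the same batch gradient descent with $\alpha = 0.01$ for 1000 iterations, once on the raw values and once after z-score standardization, and wraps the raw run in `np.errstate` so the overflow warnings are silenced.

```python
import numpy as np

def add_ones(x):
    """Prepend a column of 1's for the intercept."""
    return np.column_stack([np.ones_like(x), x])

def run_gd(X, y, alpha, num_iters):
    """Plain batch gradient descent from theta = 0; returns the cost after every iteration."""
    m = len(y)
    theta = np.zeros(X.shape[1])
    costs = []
    for _ in range(num_iters):
        theta = theta - alpha * (X.T @ (X @ theta - y)) / m
        residuals = X @ theta - y
        costs.append(residuals @ residuals / (2 * m))
    return np.array(costs)

# Hypothetical sizes and prices on the scale of typical housing data (made-up values)
sizes = np.array([850.0, 1200.0, 1600.0, 2100.0, 3000.0])
prices = np.array([180000.0, 230000.0, 310000.0, 390000.0, 540000.0])

# Raw data: the updates overshoot, grow every step, and eventually overflow
with np.errstate(all="ignore"):
    costs_raw = run_gd(add_ones(sizes), prices, alpha=0.01, num_iters=1000)

# Standardized data: same learning rate, same number of iterations
x_s = (sizes - sizes.mean()) / sizes.std()
y_s = (prices - prices.mean()) / prices.std()
costs_std = run_gd(add_ones(x_s), y_s, alpha=0.01, num_iters=1000)

print(np.isfinite(costs_raw).all())      # False: the raw run ends in inf/nan
print(np.all(np.diff(costs_std) <= 0))   # True: the standardized cost never increases
```

The raw run ends up with non-finite costs, while the standardized run decreases at every step.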

In [None]:
# Standardize (z-score) both columns; note that this overwrites the original values
df['size'] = (df['size'] - df['size'].mean()) / df['size'].std()
df['price'] = (df['price'] - df['price'].mean()) / df['price'].std()
df['ones'] = 1
X = df[['ones', 'size']].values
y = df['price'].values
theta = np.array([0.0, 0.0])
cost = compute_cost(X, y, theta)
print(f"Cost: {cost}")
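Because this cell overwrites `df['size']` and `df['price']`, the parameters found later are in standardized units. To make predictions in the original units, you would need to store each column's mean and standard deviation before standardizing. Substituting $x_s = (x - \mu_x)/\sigma_x$ and $y_s = (y - \mu_y)/\sigma_y$ into $y_s = \theta_0 + \theta_1 x_s$ gives slope $\sigma_y \theta_1 / \sigma_x$ and intercept $\mu_y + \sigma_y \theta_0 - \text{slope} \cdot \mu_x$. Here is a sketch with a hypothetical helper `to_original_units` on made-up data lying exactly on a line:

```python
import numpy as np

def to_original_units(theta_std, x_mean, x_std, y_mean, y_std):
    """Map [intercept, slope] fitted on standardized x and y back to the original units."""
    t0, t1 = theta_std
    slope = y_std * t1 / x_std
    intercept = y_mean + y_std * t0 - slope * x_mean
    return np.array([intercept, slope])

# Hypothetical data lying exactly on price = 50 + 3 * size
size = np.array([1.0, 2.0, 3.0, 4.0])
price = 50.0 + 3.0 * size

# Record the statistics before overwriting anything, then standardize
x_mean, x_std = size.mean(), size.std()
y_mean, y_std = price.mean(), price.std()
x_s = (size - x_mean) / x_std
y_s = (price - y_mean) / y_std

# Fit on the standardized data (least squares here; gradient descent with a stable
# learning rate and enough iterations would approach the same values)
X_s = np.column_stack([np.ones_like(x_s), x_s])
theta_std, *_ = np.linalg.lstsq(X_s, y_s, rcond=None)

theta_orig = to_original_units(theta_std, x_mean, x_std, y_mean, y_std)
print(theta_orig)  # approximately [50.  3.]
```

The formula works with whichever standard deviation was used (pandas' `.std()` defaults to the sample standard deviation), as long as the same values are used to undo the scaling.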

In [None]:
alpha = 0.01  # same learning rate as before; only the scaling of the data changed
num_iters = 2000 

theta, J_history = gradient_descent(X, y, theta, alpha, num_iters)

plt.plot(range(1, num_iters + 1), J_history, color='blue')
plt.xlabel('Number of iterations')
plt.ylabel('Cost J')
plt.title('Cost function using Gradient Descent')
plt.show()


Finally, with the same learning rate of 0.01, $J(\theta)$ decreases with each iteration. This decrease shows the algorithm getting "closer" to the optimal parameters for my linear regression model.
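To check that the parameters really are close to optimal, one option is to compare them with the closed-form least-squares solution from `np.linalg.lstsq`. The sketch below does this on made-up noisy data (not `housing-data.txt`), standardized with `np.std`, using the notebook's settings of $\alpha = 0.01$ and 2000 iterations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical noisy data (made-up values), z-scored as in the notebook
size = rng.uniform(800, 3500, size=50)
price = 100.0 * size + rng.normal(0, 40000, size=50)
x_s = (size - size.mean()) / size.std()
y_s = (price - price.mean()) / price.std()
X = np.column_stack([np.ones_like(x_s), x_s])

# Batch gradient descent with the notebook's settings: alpha = 0.01, 2000 iterations
theta = np.zeros(2)
for _ in range(2000):
    theta = theta - 0.01 * (X.T @ (X @ theta - y_s)) / len(y_s)

# Closed-form least-squares solution of the same problem, for comparison
theta_ls, *_ = np.linalg.lstsq(X, y_s, rcond=None)

print(theta)
print(theta_ls)
# With z-scored x and y, the optimal slope equals Pearson's correlation coefficient
print(np.corrcoef(size, price)[0, 1])
```

For standardized data, the intercept should come out near 0 and the slope should match the correlation coefficient between size and price.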

----


Fatemeh and I collaborated on this assignment.