In [1]:
import numpy as np
import pandas as pd
from sklearn.datasets import load_boston
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# QUIZ - Regression models

### QUIZ 1 - Linear Regression Functions
When building the linear regression model, we came across several new functions. One of these functions is shown below. What is the name of this function?


* gradient
* sum function
* distance
* residual
* gradient descent
* cost function
* least squares function
* optimization function
* derivative
* modelling function
* partial derivative of the error function with respect to a


**Answer**: Cost function

### QUIZ 2 - Income, Part 1
We have collected data from an ice cream shop. We modelled the income as a function of the outside temperature (shown below). Which of the following is / are true, based on this research only?


income[$] = 20.67*T[°C] - 30.12


* Decreasing temperature increases ice cream sales
* Increased temperature is correlated with increased ice cream sales
* Temperature has no effect on ice cream sales
* When the temperature is around 20 degrees, the income is greater than $400
* Increasing temperature increases ice cream sales
* Decreased temperature is correlated with increased ice cream sales
* Increasing temperature decreases ice cream sales
* Decreased temperature is correlated with decreased ice cream sales

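**Answer**:
1. Increased temperature is correlated with increased ice cream sales.
2. Decreased temperature is correlated with decreased ice cream sales.

The positive slope shows that income and temperature rise and fall together. A regression on observed data cannot by itself show that temperature causes the change, so "Increasing temperature increases ice cream sales" does not follow from this research alone.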
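
The numeric options can be checked directly from the fitted line. This is a minimal sketch using the rounded coefficients from the formula above; `predicted_income` is a hypothetical helper name, not part of the original notebook.

```python
def predicted_income(temperature_c: float) -> float:
    """Income in dollars predicted by the fitted line from the quiz."""
    return 20.67 * temperature_c - 30.12


income_at_20 = predicted_income(20)
print(round(income_at_20, 2))   # 383.28
print(income_at_20 > 400)       # False: the "greater than $400" option does not hold

# The slope is positive, so a higher temperature goes with a higher predicted income.
print(predicted_income(25) > predicted_income(15))  # True
```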

### QUIZ 3 - Income, Part 2
In some cases we need to augment (extend) the model to return valid results. What income (in dollars) will our current model predict when the temperature is 1.2 degrees? Round your answer to 2 decimal places.

In [2]:
temperature = 1.2
income = 20.67449411 * temperature - 30.12047857

print(round(income, 2))

-5.31


### QUIZ 3 - Income, Part 3
The specification states that "income" is non-negative. The model does not account for operational costs or similar deductions, so a negative prediction is invalid and must be adjusted to satisfy the specification. What income (in dollars) should an augmented model predict for T = 1.2 °C? Round your answer to 2 decimal places.

In [3]:
temperature = 1.2
income = 20.67449411 * temperature - 30.12047857

# Ensure the predicted income is non-negative
predicted_income = max(income, 0)

rounded_income = round(predicted_income, 2)

print("Predicted income: {:.2f}$".format(rounded_income))

Predicted income: 0.00$


### QUIZ 4 - Local Minima
When performing gradient descent on a linear regression, the choice of starting point is really important. If we choose a starting point which is far away from the global minimum of the error function, we can get stuck in a local minimum.

a) True

b) False

**Answer:** False.

In linear regression, the error function (cost function) is convex: every local minimum is also a global minimum, so there are no separate local minima to get stuck in. With a suitably small learning rate, gradient descent converges to a global minimum regardless of the starting point.

The starting point therefore affects only how many iterations the algorithm needs, not the minimal error it finally reaches.
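
The claim can be illustrated with a small experiment. This is only a sketch under stated assumptions: the data are synthetic points on the line y = 2x + 1, the cost is assumed to be the mean squared error, and `gradient_descent`, the learning rate, and the step count are illustrative choices rather than anything taken from the course material.

```python
# Synthetic data on the exact line y = 2x + 1 (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]


def gradient_descent(a, b, learning_rate=0.05, steps=5000):
    """Minimize the mean squared error of y = a*x + b, starting from (a, b)."""
    n = len(xs)
    for _ in range(steps):
        errors = [a * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error with respect to a and b.
        grad_a = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        a -= learning_rate * grad_a
        b -= learning_rate * grad_b
    return a, b


# Very different starting points end up at the same parameters.
for start in [(0.0, 0.0), (100.0, -50.0), (-1000.0, 1000.0)]:
    a, b = gradient_descent(*start)
    print(start, "->", round(a, 4), round(b, 4))
```

Every run should end near a = 2 and b = 1. A learning rate that is too large would make the updates diverge, which is a step-size problem and not a local minimum.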

### QUIZ 5 - Multiple Regression, Part 1
As we already saw, we can do linear regression on many variables. The Boston housing dataset is really famous and is often used for this purpose. You can download it online or - better - load it using scikit-learn (look up how). Note: This dataset is cleaned and prepared for modelling. If you want to download the original one and prepare it yourself, you're in for quite a challenge :). Now, perform linear regression on all features. What is the coefficient related to the number of rooms? Round your answer to two decimal places.


In [4]:
boston_data = load_boston()


    The Boston housing prices dataset has an ethical problem. You can refer to
    the documentation of this function for further details.

    The scikit-learn maintainers therefore strongly discourage the use of this
    dataset unless the purpose of the code is to study and educate about
    ethical issues in data science and machine learning.

    In this special case, you can fetch the dataset from the original
    source::

        import pandas as pd
        import numpy as np


        data_url = "http://lib.stat.cmu.edu/datasets/boston"
        raw_df = pd.read_csv(data_url, sep="\s+", skiprows=22, header=None)
        data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
        target = raw_df.values[1::2, 2]

    Alternative datasets include the California housing dataset (i.e.
    :func:`~sklearn.datasets.fetch_california_housing`) and the Ames housing
    dataset. You can load the datasets as follows::

        from sklearn.datasets import fetch_california_h

In [5]:
print(boston_data.keys())

dict_keys(['data', 'target', 'feature_names', 'DESCR', 'filename', 'data_module'])
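
`load_boston` was deprecated in scikit-learn 1.0 and removed in 1.2, so the cells above only run on older versions. The warning printed above suggests reading the data from the original source instead. The sketch below wraps that recipe in two functions; `split_raw_boston`, `fetch_boston`, and `FEATURE_NAMES` are hypothetical names introduced here, and the download step assumes network access and that the URL from the warning is still reachable.

```python
import numpy as np

# Names of the 13 features, in the order used by the dataset.
FEATURE_NAMES = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
                 "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT"]


def split_raw_boston(values):
    """Split the raw two-rows-per-record layout into features and target.

    Each record spans two consecutive rows: 11 feature values on the first
    row, then 2 more features followed by the target on the second row.
    """
    values = np.asarray(values)
    data = np.hstack([values[::2, :], values[1::2, :2]])
    target = values[1::2, 2]
    return data, target


def fetch_boston(data_url="http://lib.stat.cmu.edu/datasets/boston"):
    """Download and parse the dataset (needs network access and pandas)."""
    import pandas as pd

    raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)
    return split_raw_boston(raw_df.values)


# Usage (not executed here because it downloads data):
# X, y = fetch_boston()
# rooms_index = FEATURE_NAMES.index("RM")
```

With `X`, `y`, and `FEATURE_NAMES` in place of `boston_data.data`, `boston_data.target`, and `boston_data.feature_names`, the remaining cells should work in the same way.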


In [6]:
# Extract the features (X) and target (y)
X = boston_data.data
y = boston_data.target

# Create a Linear Regression model
model = LinearRegression()

# Fit the model to the data
model.fit(X, y)

# Find the coefficient related to the number of rooms
coefficient_rooms = round(model.coef_[boston_data.feature_names.tolist().index('RM')], 2)

print(f"Coefficient related to the number of rooms: {coefficient_rooms}")

Coefficient related to the number of rooms: 3.81


### QUIZ 6 - Multiple Regression, Part 2
What is the price of a hypothetical house with all variables set to zero? Round your answer to two decimal places.

In [7]:
# Get the intercept (price when all features are set to zero)
intercept = model.intercept_

# Round the intercept to two decimal places
price_hypothetical_house = round(intercept, 2)

print("Price of the hypothetical house:", price_hypothetical_house)

Price of the hypothetical house: 36.46


### QUIZ 7 - Multiple Regression, Part 3
It's good to have a model of the data but it means nothing if we have no way of testing it. A way to test regression algorithms involves the so-called "coefficient of determination" (R^2). Research how to compute it and apply it to the regression model you just created. What is the coefficient of determination for this model? Round your answer to two decimal places. (Note: Compute the coefficient of determination using all the data. Technically, this is not correct but at least gives a good idea of how this model performs. If you're more interested, look up "training and testing set".)

In [8]:
# Predict the target variable using the fitted model
y_pred = model.predict(X)

# Compute the R-squared value
r_squared = r2_score(y, y_pred)

# Round the R-squared value to two decimal places
r_squared = round(r_squared, 2)

print("Coefficient of determination (R^2):", r_squared)

Coefficient of determination (R^2): 0.74
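
For reference, the coefficient of determination can also be computed by hand as 1 - SS_res / SS_tot, which is the quantity `r2_score` reports by default. The sketch below uses made-up numbers, not the housing data, and `r_squared` is a hypothetical helper name.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: how far the predictions are from the data.
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    # Total sum of squares: how far the data are from their own mean.
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1 - ss_res / ss_tot


y_true = [3.0, 5.0, 7.0, 9.0]
print(r_squared(y_true, y_true))                 # 1.0, perfect predictions
print(r_squared(y_true, [6.0, 6.0, 6.0, 6.0]))   # 0.0, always predicting the mean
print(r_squared(y_true, [2.5, 5.5, 7.0, 9.5]))   # close to 1, small errors
```

A value of 1 means the predictions match the data exactly, and 0 means the model does no better than always predicting the mean of `y`.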


### QUIZ 8
In a CPU factory, a camera takes a picture of every single manufactured chip. After that, it sends the picture to an algorithm. The algorithm outputs whether the CPU is defective or not.

What type of algorithm is that?

**Answer:** Classification

### QUIZ 9
When building the linear regression model, we came across several new functions. What is the name of this function?

**Answer:** Cost function. It sums the error distances between the model's predictions and the observed values.
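
As an illustration, a cost function for a line y = a*x + b might look like the sketch below. The quiz image is not reproduced here, so the mean-squared-error form is an assumption, and `cost` with its parameter names is hypothetical.

```python
def cost(a, b, xs, ys):
    """Mean squared error of the line y = a*x + b over the data points."""
    n = len(xs)
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys)) / n


xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]   # points on y = 2x + 1

print(cost(2.0, 1.0, xs, ys))   # 0.0, the line passes through every point
print(cost(1.0, 0.0, xs, ys))   # larger, this line misses every point
```

Fitting the model means searching for the values of `a` and `b` that make this number as small as possible.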