## Exercise: Predicting the Per Capita Income of Nigeria from Historical Data (1970-2016)

In [None]:
# import libraries
from sklearn.linear_model import LinearRegression
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [None]:
# read the data
df = pd.read_csv('nigeria_per_capita_income.csv')

In [None]:
df.head()

In [None]:
# plot our data and see
%matplotlib inline
plt.title("Per Capita Income of Nigeria")
plt.scatter(df['year'], df['per capita income (NGN)'])
plt.xlabel('Year')
plt.ylabel('Per Capita Income (NGN)')

In [None]:
# call our model for training
model = LinearRegression()

In [None]:
# prepare the feature matrix X (2-D) and the target y (1-D)
X = df[['year']]
y = df['per capita income (NGN)']

In [None]:
# fit our data into the model
model.fit(X, y)

In [None]:
# year to predict; 2030 lies outside the 1970-2016 data, so this is an extrapolation
year = pd.DataFrame({'year': [2030]})

In [None]:
predicted_capita = model.predict(year)

In [None]:
print(predicted_capita)
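
For a single-feature linear model, the prediction above is the fitted intercept plus the fitted slope times the year. The sketch below checks this on synthetic data. The `toy` DataFrame, its `income` column, and its numbers are made up for illustration and are not the Nigerian dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: income grows by exactly 50 per year, starting at 1000 in 1970
toy = pd.DataFrame({'year': np.arange(1970, 2017)})
toy['income'] = 1000.0 + 50.0 * (toy['year'] - 1970)

toy_model = LinearRegression()
toy_model.fit(toy[['year']], toy['income'])

slope = toy_model.coef_[0]        # fitted change in income per year
intercept = toy_model.intercept_  # fitted value at year 0

# Manual prediction: y = intercept + slope * year
manual_2030 = intercept + slope * 2030
model_2030 = toy_model.predict(pd.DataFrame({'year': [2030]}))[0]

print(manual_2030, model_2030)
```

Both numbers should agree up to floating-point rounding. Because 2030 is beyond the last year in the data, the result is only as trustworthy as the assumption that the linear trend continues.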

In [None]:
# show the best fit line our model created
%matplotlib inline
plt.title("Per Capita Income of Nigeria")
plt.scatter(df['year'], df['per capita income (NGN)'], marker='*', color='red')
plt.plot(df['year'], model.predict(df[['year']]), color='blue')
plt.xlabel('Year')
plt.ylabel('Per Capita Income (NGN)')

In [None]:
# model evaluation
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

In [None]:
# make predictions
predictions = model.predict(df[['year']])

## Model Evaluation
There are several ways to evaluate how well a linear regression model performs. Visual inspection of the best-fit line gives a first impression, but quantitative metrics are needed to assess performance rigorously. Common evaluation methods for linear regression models include:

1. **Mean Absolute Error (MAE)**:
   - MAE measures the average absolute difference between the predicted values and the actual values. Lower MAE indicates better performance.

2. **Mean Squared Error (MSE)**:
   - MSE is the average of the squared differences between predicted and actual values. Squaring penalizes large errors more heavily than MAE does, and the result is in squared units of the target variable.

3. **Root Mean Squared Error (RMSE)**:
   - RMSE is the square root of MSE and provides a measure of the model's prediction error in the same units as the target variable. Lower RMSE is better.

4. **R-squared (R²) Score**:
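   - R² measures the proportion of the variance in the dependent variable that is explained by the independent variables. For a least-squares model with an intercept, evaluated on its own training data, it ranges from 0 to 1, with higher values indicating a better fit and 1 meaning a perfect fit. On held-out data it can be negative when the model predicts worse than simply using the mean.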

5. **Adjusted R-squared (Adjusted R²)**:
   - Adjusted R² adjusts the R² score based on the number of predictors in the model. It penalizes the inclusion of irrelevant variables.

6. **Residual Plots**:
   - Visual inspection of residual plots can reveal patterns or heteroscedasticity in the residuals. A well-fitted model should have residuals that appear random and evenly distributed around zero.

7. **Cross-Validation**:
   - Cross-validation techniques like k-fold cross-validation can assess how well your model generalizes to unseen data. It helps detect overfitting.
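
For reference, the first five metrics in the list can be written as formulas. The notation is an assumption made here, not something the notebook defines: $n$ is the number of observations, $y_i$ the actual value, $\hat{y}_i$ the predicted value, $\bar{y}$ the mean of the actual values, and $p$ the number of predictors (here $p = 1$, the year).

$$
\begin{aligned}
\text{MAE} &= \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert \\
\text{MSE} &= \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \\
\text{RMSE} &= \sqrt{\text{MSE}} \\
R^2 &= 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \\
R^2_{\text{adj}} &= 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
\end{aligned}
$$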

You can use scikit-learn to calculate these metrics easily. For example, to calculate MAE, MSE, RMSE, and R² in scikit-learn, you can do the following:

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Make predictions using your model
predictions = model.predict(df[['year']])

# Calculate metrics
mae = mean_absolute_error(df['per capita income (NGN)'], predictions)
mse = mean_squared_error(df['per capita income (NGN)'], predictions)
rmse = np.sqrt(mse)
r2 = r2_score(df['per capita income (NGN)'], predictions)

print(f"MAE: {mae}")
print(f"MSE: {mse}")
print(f"RMSE: {rmse}")
print(f"R²: {r2}")
```

By examining these metrics, you can get a more quantitative understanding of how well your model is performing and whether it meets your desired level of accuracy.
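
Adjusted R² and k-fold cross-validation from the list above are not covered by the example, so here is a minimal sketch of both. It runs on synthetic data, since the `demo` DataFrame, its `income` column, and the noise level are hypothetical stand-ins for the real CSV. The choice of 5 shuffled folds and MAE as the score is also an assumption, not something the notebook prescribes.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold, cross_val_score

# Hypothetical synthetic data: a linear trend plus random noise
rng = np.random.default_rng(0)
demo = pd.DataFrame({'year': np.arange(1970, 2017)})
demo['income'] = 1000.0 + 50.0 * (demo['year'] - 1970) + rng.normal(0, 100, len(demo))

X_demo = demo[['year']]
y_demo = demo['income']
demo_model = LinearRegression().fit(X_demo, y_demo)

# Adjusted R² = 1 - (1 - R²) * (n - 1) / (n - p - 1)
n, p = X_demo.shape  # n observations, p predictors
r2_demo = r2_score(y_demo, demo_model.predict(X_demo))
adj_r2 = 1 - (1 - r2_demo) * (n - 1) / (n - p - 1)

# 5-fold cross-validation; shuffling mixes early and late years across folds
cv = KFold(n_splits=5, shuffle=True, random_state=0)
neg_mae = cross_val_score(LinearRegression(), X_demo, y_demo,
                          cv=cv, scoring='neg_mean_absolute_error')
cv_mae = -neg_mae  # scikit-learn negates errors so that higher scores are better

print(f"R²: {r2_demo:.4f}, adjusted R²: {adj_r2:.4f}")
print(f"Cross-validated MAE per fold: {cv_mae}")
```

With a single predictor, adjusted R² is only slightly below R². For data ordered in time, scikit-learn's `TimeSeriesSplit` may be a better fit than shuffled folds, because it always tests on years later than those used for training.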

In [None]:
# Calculate metrics
mae = mean_absolute_error(df['per capita income (NGN)'], predictions)
mse = mean_squared_error(df['per capita income (NGN)'], predictions)
rmse = np.sqrt(mse)
r2 = r2_score(df['per capita income (NGN)'], predictions)

In [None]:
print(f"MAE: {mae}")
print(f"MSE: {mse}")
print(f"RMSE: {rmse}")
print(f"R2: {r2}")
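
The residual check mentioned in the evaluation list can be sketched as follows. The data is again synthetic: the `sample` DataFrame and its `income` column are hypothetical and stand in for the real dataset. The residuals are computed numerically, and the commented lines at the end show one way to plot them.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical synthetic data: a linear trend plus random noise
rng = np.random.default_rng(1)
sample = pd.DataFrame({'year': np.arange(1970, 2017)})
sample['income'] = 1000.0 + 50.0 * (sample['year'] - 1970) + rng.normal(0, 100, len(sample))

fit = LinearRegression().fit(sample[['year']], sample['income'])

# Residual = actual value minus predicted value
residuals = sample['income'] - fit.predict(sample[['year']])

# With an intercept, least-squares residuals average to zero on the training data
print(f"Mean residual: {residuals.mean():.6f}")
print(f"Largest absolute residual: {residuals.abs().max():.2f}")

# To draw the residual plot in a notebook (requires matplotlib):
# plt.scatter(sample['year'], residuals)
# plt.axhline(0, color='gray')
```

A curve or a funnel shape in the plotted residuals would suggest that a straight line is too simple for the data or that the error variance changes over time.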