
# **Lab 1: Part II - Review of Linear Regression with sklearn**
---
### **Description**
This lab provides a comprehensive overview of implementing and evaluating Linear Regression with sklearn.

<br>

### **Lab Structure**
**Part 1**: [Predicting Wine Quality](#p1)

**Part 2**: [Predicting CO2 Emissions](#p2)



<br>

### **Learning Objectives**
By the end of this lab, we will:
* Understand basic pandas commands for EDA.

* Understand basic matplotlib commands for Data Visualization.

* Build, evaluate, and apply linear regression models with sklearn.


<br>


### **Resources**
* [EDA with pandas Cheat Sheet](https://docs.google.com/document/d/1FFoqw45P-kuoq912ARP4qfdGeLTqoq73_qjZThPp2_8/edit?usp=drive_link)

* [Data Visualization with matplotlib Cheat Sheet](https://docs.google.com/document/d/1YlUp6ll81qOyDpU1OWzE-SPxQ3hnF5C9ukLRL_6PYKE/edit?usp=drive_link)

* [Linear Regression with sklearn Cheat Sheet](https://docs.google.com/document/d/1iVieBynTpoKq1LA0kR-4pqDo6evoW5wvbNyE0wOGhYY/edit?usp=drive_link)


<br>

**Before starting, run the code below to import all necessary functions and libraries.**


In [None]:
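import warnings
warnings.filterwarnings('ignore')  # hide library warnings to keep the output readable

import pandas as pd                # data loading and manipulation
import matplotlib.pyplot as plt    # plotting

from sklearn import model_selection  # provides train_test_split
# The three regression metrics used in this lab
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error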

<a name="p1"></a>

---
## **Part 1: Predicting Wine Quality**
---

In this part, we will implement a linear regression model aimed at predicting the quality rating of wines based on their chemical properties and characteristics.

<br>

This dataset (the white-wine portion of the UCI Wine Quality dataset) contains 11 physicochemical measurements for each wine, such as acidity, pH, and alcohol content. The target variable (label), `quality`, is a score between 0 and 10 assigned to each wine.




#### **Step #1: Load in Data**

**Run the code below to load the data.**

In [None]:
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv"
# The UCI file is semicolon-separated, so pass sep=';'
df = pd.read_csv(url, sep=';')
df.head()

#### **Step #2: Choose your Variables**



In [None]:
inputs = df.drop("quality", axis = 1)
output = df[# COMPLETE THIS CODE
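Selecting features and a label comes down to two pandas operations: dropping the label column to get the features, and selecting the label column by name. The sketch below shows both on a tiny hypothetical DataFrame (`toy_df` and its values are made up for illustration, not taken from the wine data).

```python
import pandas as pd

# Hypothetical toy rows that mimic a few columns of the wine data
toy_df = pd.DataFrame({
    "pH": [3.0, 3.2, 3.1],
    "alcohol": [9.5, 11.0, 12.5],
    "quality": [5, 6, 7],
})

# Features: every column except the label (axis=1 means "drop a column, not a row")
toy_inputs = toy_df.drop("quality", axis=1)
# Label: selecting a single column by name returns a pandas Series
toy_output = toy_df["quality"]

print(toy_inputs.columns.tolist())  # ['pH', 'alcohol']
print(toy_output.tolist())          # [5, 6, 7]
```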

#### **Step #3: Split your Data**


In [None]:
X_train, X_test, y_train, y_test = # COMPLETE THIS CODE
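`model_selection.train_test_split` returns four objects in a fixed order: training features, testing features, training labels, testing labels. A minimal sketch on hypothetical toy data (`X` and `y` stand in for `inputs` and `output`; the 20% test size and `random_state=42` are illustrative choices, not values prescribed by the lab):

```python
import pandas as pd
from sklearn import model_selection

# Hypothetical toy features/labels standing in for `inputs` and `output`
X = pd.DataFrame({"alcohol": [9.0, 9.5, 10.0, 10.5, 11.0, 11.5, 12.0, 12.5, 13.0, 13.5]})
y = pd.Series([5, 5, 5, 6, 6, 6, 6, 7, 7, 7], name="quality")

# Hold out 20% of the rows for testing; random_state makes the split repeatable
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(len(X_train), len(X_test))  # 8 2
```

Rows are shuffled before splitting, but each feature row stays paired with its label, so `X_train` and `y_train` share the same index.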

#### **Step #4: Import an ML Algorithm**




In [None]:
# COMPLETE THIS CODE

#### **Step #5: Initialize the Model**


In [None]:
model = # COMPLETE THIS CODE
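Steps #4 and #5 usually take two lines. A minimal sketch, assuming the algorithm is scikit-learn's `LinearRegression` (the name `toy_model` is hypothetical, chosen so it does not clash with your own `model`):

```python
from sklearn.linear_model import LinearRegression

# With default settings the model learns an intercept plus one coefficient per feature
toy_model = LinearRegression()
print(toy_model.get_params()["fit_intercept"])  # True
```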

#### **Step #6: Fit, Test, and Visualize**


In [None]:
model.fit(X_train, # COMPLETE THIS CODE

In [None]:
predictions = # COMPLETE THIS CODE
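To see what `fit` and `predict` do, here is a sketch on hypothetical toy data where the label follows $y = 2x + 1$ exactly. The model is fitted on the training rows only and then asked to predict unseen test rows; all `*_toy` names are made up for illustration.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical toy data with an exact linear relationship: y = 2 * x + 1
X_train_toy = pd.DataFrame({"x": [0.0, 1.0, 2.0, 3.0]})
y_train_toy = pd.Series([1.0, 3.0, 5.0, 7.0])
X_test_toy = pd.DataFrame({"x": [4.0, 5.0]})

toy_model = LinearRegression()
toy_model.fit(X_train_toy, y_train_toy)          # learn coefficients from training rows only
toy_predictions = toy_model.predict(X_test_toy)  # one prediction per test row

print(toy_model.coef_, toy_model.intercept_)  # approximately [2.] and 1.0
print(toy_predictions)                        # approximately [9. 11.]
```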

In [None]:
plt.figure(figsize=(8, 8))
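plt.scatter(y_test, predictions)  # one point per test wine: (true, predicted)
# Points lying on this diagonal would be perfect predictions
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], color = 'black', label="Correct prediction")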


plt.xlabel('True Quality', fontsize = 'x-large')
plt.ylabel('Predicted Quality', fontsize = 'x-large')
plt.title("Real vs. Predicted Wine Quality", fontsize = 'x-large')
plt.legend()

plt.show()

#### **Step #7: Evaluate**

Let's put this model to the test by evaluating it with our standard regression metrics: $R^2$ (coefficient of determination), MSE (mean squared error), and MAE (mean absolute error).


In [None]:
# COMPLETE THIS CODE
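Each metric takes the true labels first and the predictions second. The sketch below uses hypothetical values small enough to check by hand: the errors are 0.5, 0, 0.5, and 0, so MAE $= 1/4 = 0.25$, MSE $= 0.5/4 = 0.125$, and, since the true values have mean 6 and total sum of squares 20, $R^2 = 1 - 0.5/20 = 0.975$.

```python
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

# Hypothetical true labels and model predictions
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 9.0]

r2 = r2_score(y_true, y_pred)              # 1 - (sum of squared errors) / (total sum of squares)
mse = mean_squared_error(y_true, y_pred)   # mean of squared errors (squared units)
mae = mean_absolute_error(y_true, y_pred)  # mean of absolute errors (same units as the label)

print(f"R^2: {r2:.4f}, MSE: {mse:.4f}, MAE: {mae:.4f}")  # R^2: 0.9750, MSE: 0.1250, MAE: 0.2500
```

MAE is in the same units as the label (quality points here), which often makes it the easiest of the three to interpret; MSE penalizes large errors more heavily.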

#### **Step #8: Apply your Model**

You are given data for two new wines and want to compare their predicted quality ratings. The goal is to determine which of the two wines the model predicts will receive the higher quality rating.

Here is the data for the two wines:

**Wine 1:**
* Fixed Acidity = 12.5
* Volatile Acidity = 0.3
* Citric Acid = 0.6
* Residual Sugar = 1.2
* Chlorides = 0.07
* Free Sulfur Dioxide = 15.0
* Total Sulfur Dioxide = 50.0
* Density = 0.998
* pH = 3.2
* Sulphates = 0.68
* Alcohol Content = 11.5

<br>

**Wine 2:**

* Fixed Acidity = 13.2
* Volatile Acidity = 0.28
* Citric Acid = 0.45
* Residual Sugar = 2.0
* Chlorides = 0.09
* Free Sulfur Dioxide = 12.0
* Total Sulfur Dioxide = 65.0
* Density = 0.995
* pH = 3.3
* Sulphates = 0.55
* Alcohol Content = 12.0

You will use your linear regression model to predict the quality ratings for these wines and assess their relative quality based on the predictions.
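The model expects new data in the same shape and column order as the training features. A sketch that places both wines in one DataFrame, assuming the column names below match the header of `winequality-white.csv` (they are written from memory here, so compare them with `inputs.columns` in your own notebook):

```python
import pandas as pd

# Assumed column names from the header of winequality-white.csv (label column excluded)
wine_columns = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol",
]

# One row per wine, values listed in the same order as wine_columns
new_wines = pd.DataFrame(
    [
        [12.5, 0.30, 0.60, 1.2, 0.07, 15.0, 50.0, 0.998, 3.2, 0.68, 11.5],  # Wine 1
        [13.2, 0.28, 0.45, 2.0, 0.09, 12.0, 65.0, 0.995, 3.3, 0.55, 12.0],  # Wine 2
    ],
    columns=wine_columns,
)

print(new_wines.shape)  # (2, 11)
```

With the model fitted in Step #6, `model.predict(new_wines.iloc[[0]])` would predict Wine 1 alone (double brackets keep a one-row DataFrame), and `model.predict(new_wines)` would return both predictions at once. Matching column names matters because recent scikit-learn versions check them against the names seen during `fit`.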

##### **1. Predict the quality of Wine 1**


In [None]:
# COMPLETE THIS CODE

##### **2. Predict the quality of Wine 2**

In [None]:
# COMPLETE THIS CODE

<a name="p2"></a>

---
## **Part 2: Predicting CO2 Emissions**
---

Using the CO2 Emissions dataset, do the following:
* Build a model that will predict the CO2 emissions of a car;
* Predict the CO2 emissions of a car with a specific volume and weight.

<br>

Since 1970, CO2 emissions have increased by nearly 90%. These elevated CO2 levels cause poor air quality and contribute to climate change. Globally, cars and other transportation vehicles are responsible for about 29% of overall CO2 emissions. This CO2 emissions dataset is a collection of data from cars that contains information on the car's make, model, volume, weight, and how much CO2 it emits.

The features are as follows:
* `Car`: name of car brand
* `Model`: name of car model
* `Volume`: engine size (in cm³)
* `Weight`: weight of car (in kg)
* `CO2`: amount of CO2 emitted (in g/km)

#### **Step #1: Load the data**

In [None]:
url = "https://raw.githubusercontent.com/the-codingschool/TRAIN/main/emissions/car_emissions.csv"

cars_df = pd.read_csv(url)
cars_df.head()

#### **Step #2: Decide independent and dependent variables**

We are going to use `Volume` and `Weight` as our independent variables for predicting `CO2` emissions.



In [None]:
features = # COMPLETE THIS CODE
labels = # COMPLETE THIS CODE
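Selecting more than one feature column uses a *list* of names inside the brackets, which keeps a 2-D DataFrame; a single name returns a 1-D Series. A sketch on hypothetical rows laid out like `car_emissions.csv` (the values in `toy_cars` are invented):

```python
import pandas as pd

# Hypothetical rows in the same layout as car_emissions.csv
toy_cars = pd.DataFrame({
    "Car": ["A", "B", "C"],
    "Model": ["x1", "y2", "z3"],
    "Volume": [1000, 1600, 2000],
    "Weight": [900, 1200, 1500],
    "CO2": [95, 105, 115],
})

# A list of names (double brackets) keeps a 2-D DataFrame of features
toy_features = toy_cars[["Volume", "Weight"]]
# A single name (single brackets) returns a 1-D Series for the label
toy_labels = toy_cars["CO2"]

print(toy_features.shape, toy_labels.shape)  # (3, 2) (3,)
```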

#### **Step #3: Split data into training and testing data**


In [None]:
# COMPLETE THIS CODE

#### **Step #4: Import your algorithm**


In [None]:
# COMPLETE THIS CODE

#### **Step #5: Initialize your model and set hyperparameters**



In [None]:
# COMPLETE THIS CODE

#### **Step #6: Fit, Test, and Visualize**


In [None]:
model.fit(X_train, # COMPLETE THIS CODE

In [None]:
predictions = # COMPLETE THIS CODE

In [None]:
# VISUALIZE THE TRUE VS. PREDICTED VALUES

#### **Step #7: Evaluate**

Let's put this model to the test by evaluating it with our standard regression metrics: $R^2$ (coefficient of determination), MSE (mean squared error), and MAE (mean absolute error).

In [None]:
# COMPLETE THIS CODE


#### **Step #8: Use the model**

Using the model we created, predict the CO2 emissions of two cars:

* **Car 1:** Volume is 800 cm³ and weight is 1020 kg

* **Car 2:** Volume is 1020 cm³ and weight is 800 kg

<br>

**NOTE**: You must create a DataFrame containing the information for the new cars:

```python
new_car_data = pd.DataFrame(new_car_data_here, columns = ["Volume", "Weight"])
```

In [None]:
# COMPLETE THIS CODE
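One way to fill in `new_car_data_here` from the note above is a list of rows, one inner list per car, in the same `["Volume", "Weight"]` order used for training. A minimal sketch:

```python
import pandas as pd

# Each inner list is one car: [Volume (cm^3), Weight (kg)]
new_car_data = pd.DataFrame(
    [[800, 1020],   # Car 1
     [1020, 800]],  # Car 2
    columns=["Volume", "Weight"],
)
print(new_car_data)
# With the model fitted in Step #6, model.predict(new_car_data) returns
# one CO2 estimate (g/km) per row, in the same row order as above.
```

Because the two cars simply swap their values, the gap between the two predictions depends only on the coefficients: prediction(Car 1) − prediction(Car 2) $= 220\,(b_{\text{Weight}} - b_{\text{Volume}})$, where `model.coef_` holds $[b_{\text{Volume}}, b_{\text{Weight}}]$. The intercept cancels out.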

---

# End of Notebook

© 2023 The Coding School, All rights reserved