# **ICS5110 Notebook**

View the web page for this project [here](https://mkenely.com/ics5110).

- [Feature Reference](https://mkenely.com/ics5110/features)
- [Feature Distributions](https://mkenely.com/ics5110/distributions)
- [Correlation Matrix](https://mkenely.com/ics5110/correlation_matrix)
- [Feature vs G3 Scatter Plots](https://mkenely.com/ics5110/scatter_plots)


### **Imports**

In [None]:
import os
import sys

import numpy as np
import pandas as pd

from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

import pickle

from gradio_implementations import pca_gradio
from gradio_implementations import ensemble_gradio
from gradio_implementations import kmc_gradio
from gradio_implementations import lr_gradio

import matplotlib.pyplot as plt

### **Data**

In [2]:
portugese_df = pd.read_csv('./data/Portuguese.csv')

le = LabelEncoder()
encoding_mappings = {}

for column in portugese_df.columns:
    if portugese_df[column].dtype == 'object':
        portugese_df[column] = le.fit_transform(portugese_df[column])
        encoding_mappings[column] = {index: label for index, label in enumerate(le.classes_)}

# Features: every column except the three grade columns
X = portugese_df.drop(['G1', 'G2', 'G3'], axis=1)

y = portugese_df['G3']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
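
The encoding loop above stores, for each text column, a dictionary from integer code back to the original label. The sketch below shows how such a mapping could be used to decode a column again. It runs on a small made-up frame rather than `Portuguese.csv`; the names `toy_df`, `toy_mappings`, and `decoded_school` are illustrative assumptions, and it creates a fresh encoder per column and tests for non-numeric columns instead of comparing against `'object'`.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy stand-in for the real dataset (made-up values, for illustration only).
toy_df = pd.DataFrame({
    "school": ["GP", "MS", "GP", "MS"],
    "sex": ["F", "M", "M", "F"],
    "age": [15, 16, 17, 18],
})

toy_mappings = {}
for column in toy_df.columns:
    # Encode only the non-numeric (text) columns.
    if not pd.api.types.is_numeric_dtype(toy_df[column]):
        encoder = LabelEncoder()  # fresh encoder for each column
        toy_df[column] = encoder.fit_transform(toy_df[column])
        # LabelEncoder sorts the labels, so code i corresponds to classes_[i].
        toy_mappings[column] = dict(enumerate(encoder.classes_))

# Translate the integer codes back into the original labels.
decoded_school = toy_df["school"].map(toy_mappings["school"])
```

Keeping the mapping next to the encoded frame makes it possible to show readable labels in plots or in an interface while the models still receive integers.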

### **Models**

#### **Linear Regression (Jan)**

**Imports**

In [3]:
from sklearn.linear_model import LinearRegression

**Data**

In [4]:
X_all = portugese_df.drop('G3', axis=1)
X_no_grades = X_all.drop(['G1', 'G2'], axis=1)
X_only_grades = X_all[['G1', 'G2']]

# Reorder so G1 and G2 are first
X_all = pd.concat([X_only_grades, X_no_grades], axis=1)

y = portugese_df['G3']

**Models**

In [5]:
# Three linear regression models trained on different sets of features
linear_regression_all = LinearRegression()
linear_regression_no_grades = LinearRegression()
linear_regression_only_grades = LinearRegression()

In [6]:
linear_regression_all.fit(X_all, y)
linear_regression_no_grades.fit(X_no_grades, y)
linear_regression_only_grades.fit(X_only_grades, y)

# Save to pickle for gradio
with open('../gradio/lr_gradio/models/linear_regression_all.pkl', 'wb') as f:
    pickle.dump(linear_regression_all, f)

with open('../gradio/lr_gradio/models/linear_regression_no_grades.pkl', 'wb') as f:
    pickle.dump(linear_regression_no_grades, f)

with open('../gradio/lr_gradio/models/linear_regression_only_grades.pkl', 'wb') as f:
    pickle.dump(linear_regression_only_grades, f)
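
The cell above writes each fitted model to a `.pkl` file so that another process can reuse it. As a rough sketch of the reading side, the block below saves a tiny model and loads it back. It uses made-up data and a temporary directory, and the file name `linear_regression_demo.pkl` is a hypothetical placeholder, not one of the project's paths.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import LinearRegression

# Fit a tiny model on made-up data that follows y = 2*x + 1.
X_demo = np.array([[0.0], [1.0], [2.0], [3.0]])
y_demo = np.array([1.0, 3.0, 5.0, 7.0])
model = LinearRegression().fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name, used only for this demonstration.
    model_path = os.path.join(tmp_dir, "linear_regression_demo.pkl")

    # Write the fitted model to disk.
    with open(model_path, "wb") as f:
        pickle.dump(model, f)

    # Read it back, as a separate application would.
    with open(model_path, "rb") as f:
        restored = pickle.load(f)

# The restored model predicts without being refitted.
restored_prediction = restored.predict(np.array([[4.0]]))
```

Pickled scikit-learn models are best loaded with the same scikit-learn version that saved them, and pickle files should only be loaded from trusted sources.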

In [14]:
y_pred_all = linear_regression_all.predict(X_all)
y_pred_no_grades = linear_regression_no_grades.predict(X_no_grades)
y_pred_only_grades = linear_regression_only_grades.predict(X_only_grades)

In [15]:
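# Note: LinearRegression.score returns the coefficient of determination (R^2),
# not a classification accuracy. It is computed here on the data used for fitting.
accuracy_all = linear_regression_all.score(X_all, y)
accuracy_no_grades = linear_regression_no_grades.score(X_no_grades, y)
accuracy_only_grades = linear_regression_only_grades.score(X_only_grades, y)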

In [16]:
results_df = pd.DataFrame({
    'model': ['All Features', 'No Grades', 'Only Grades'],
    'accuracy': [accuracy_all, accuracy_no_grades, accuracy_only_grades]
    })

results_df['accuracy'] = results_df['accuracy'].apply(lambda x: round(x, 3))

results_df.set_index('model', inplace=True)

In [17]:
results_df

| model | accuracy |
|---|---|
| All Features | 0.858 |
| No Grades | 0.345 |
| Only Grades | 0.848 |
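
The values in this table come from calling `score` on the same rows the models were fitted on, so they describe fit quality on the training data. A possible complement is to fit on a training split and report R² and mean absolute error on held-out rows. The sketch below illustrates this on synthetic data. Every name in it (`X_syn`, `y_syn`, `r2_test`, `mae_test`) and the fixed `random_state` are assumptions made for the example, and it does not reproduce the numbers above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression data: a linear signal plus a little noise.
rng = np.random.default_rng(0)
X_syn = rng.normal(size=(200, 3))
y_syn = X_syn @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

# A fixed random_state makes the split reproducible between runs.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.2, random_state=0
)

# Fit on the training rows only, then evaluate on the held-out rows.
model = LinearRegression().fit(X_tr, y_tr)
y_hat = model.predict(X_te)

r2_test = r2_score(y_te, y_hat)              # same quantity as model.score(X_te, y_te)
mae_test = mean_absolute_error(y_te, y_hat)  # average absolute error in target units
```

Reporting an error in target units alongside R² can be easier to interpret for grades, since it states directly how many points a typical prediction is off.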


### **Gradio**

#### **Linear Regression**

In [7]:
lr_gradio.make_gradio(
    [linear_regression_all, linear_regression_no_grades, linear_regression_only_grades],
)

* Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.


