<h1 style="text-align:center"> Predicting Grades for the School Year (Math Subject) </h1>

# Introduction: 
In this small project I implement a simple model that predicts an individual student's score at the end of the year. "G3", the final grade, will be our label (output), and the rest of the columns will be our features (inputs). First, I will explore the data to get a grasp of the story behind it. I'm not looking to just implement a linear regression or random forest regression algorithm to get a score; my aim is to understand what the data is telling us through visualizations (plotly, matplotlib, seaborn). One last thing: I'm new to plotly, so please excuse my simple graphs while I learn this visualization tool. Have fun! I'm open to constructive criticism that will make this project more effective and interesting. 

# Outline of the Project: <br>
1) **Extract the Data and Gather General Information of the Dataset** <br>
2) **Visualize the three outputs ["G1", "G2" and "G3"]** <br>
a) We will see how the data is distributed.<br>
b) Gain some insights about the three grades. (Distribution Plots)<br>
c) We will finally use "G3" as our output for our linear regression algorithm since it is the final and most important grade.<br>
3) **Data Structuring:** <br>
a) Drop the G1 and G2 columns. <br>
b) Transform some of the columns into binary columns for future analysis. <br>
c) Split the data (Training and Testing sets.)<br>
d) Apply StratifiedShuffleSplit to the two features most correlated with G3. <br>
e) Create a binary column and a criterion determining which scores are a Failing Grade and which are a Passing Grade. <br>
4) **Data Analysis and Visualization** <br>
a) Students that passed and failed the course (Using Plotly) (%). <br>
b) How did students perform by gender (Using Plotly). (%) <br>
c) Correlation Analysis.  <br>
d) Number of Absences throughout the Course (Using Plotly) <br>
e) Further Analysis. <br>
5) **Data Cleaning (Preparing the Data for our Algorithm)** <br>
a) Split the data into Numeric and Categorical Values.<br>
b) Scale the Numeric and Categorical columns using StandardScaler in order to fit the data into our algorithms. <br>
c) Select the best algorithm (better score and accuracy). Make sure it's not overfitting! <br>
d) Select the best hyperparameters by using GridSearchCV <br>
6) **Conclusion**

In [None]:
import pandas as pd
import numpy as np
import plotly
from plotly import tools


df = pd.read_csv("../input/student-mat.csv")
df.head()


# Information about the Variables: 
Attribute Information:

# Attributes for both student-mat.csv (Math course) and student-por.csv (Portuguese language course) datasets: 

1 school - student's school (binary: 'GP' - Gabriel Pereira or 'MS' - Mousinho da Silveira) <br>
2 sex - student's sex (binary: 'F' - female or 'M' - male) <br>
3 age - student's age (numeric: from 15 to 22) <br>
4 address - student's home address type (binary: 'U' - urban or 'R' - rural)<br> 
5 famsize - family size (binary: 'LE3' - less or equal to 3 or 'GT3' - greater than 3) <br>
6 Pstatus - parent's cohabitation status (binary: 'T' - living together or 'A' - apart) <br>
7 Medu - mother's education (numeric: 0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary education or 4 - higher education) <br>
8 Fedu - father's education (numeric: 0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary education or 4 - higher education)<br> 
9 Mjob - mother's job (nominal: 'teacher', 'health' care related, civil 'services' (e.g. administrative or police), 'at_home' or 'other') <br>
10 Fjob - father's job (nominal: 'teacher', 'health' care related, civil 'services' (e.g. administrative or police), 'at_home' or 'other') <br>
11 reason - reason to choose this school (nominal: close to 'home', school 'reputation', 'course' preference or 'other') <br>
12 guardian - student's guardian (nominal: 'mother', 'father' or 'other')<br> 
13 traveltime - home to school travel time (numeric: 1 - <15 min., 2 - 15 to 30 min., 3 - 30 min. to 1 hour, or 4 - >1 hour) <br>
14 studytime - weekly study time (numeric: 1 - <2 hours, 2 - 2 to 5 hours, 3 - 5 to 10 hours, or 4 - >10 hours) <br>
15 failures - number of past class failures (numeric: n if 1<=n<3, else 4) <br>
16 schoolsup - extra educational support (binary: yes or no) <br>
17 famsup - family educational support (binary: yes or no) <br>
18 paid - extra paid classes within the course subject (Math or Portuguese) (binary: yes or no) <br>
19 activities - extra-curricular activities (binary: yes or no) <br>
20 nursery - attended nursery school (binary: yes or no) <br>
21 higher - wants to take higher education (binary: yes or no) <br>
22 internet - Internet access at home (binary: yes or no) <br>
23 romantic - with a romantic relationship (binary: yes or no) <br>
24 famrel - quality of family relationships (numeric: from 1 - very bad to 5 - excellent) <br>
25 freetime - free time after school (numeric: from 1 - very low to 5 - very high) <br>
26 goout - going out with friends (numeric: from 1 - very low to 5 - very high) <br>
27 Dalc - workday alcohol consumption (numeric: from 1 - very low to 5 - very high) <br>
28 Walc - weekend alcohol consumption (numeric: from 1 - very low to 5 - very high) <br>
29 health - current health status (numeric: from 1 - very bad to 5 - very good) <br>
30 absences - number of school absences (numeric: from 0 to 93) <br>

# These grades are related with the course subject, Math or Portuguese: <br>
31 G1 - first period grade (numeric: from 0 to 20) <br>
32 G2 - second period grade (numeric: from 0 to 20) <br>
33 G3 - final grade (numeric: from 0 to 20, output target)<br>

In [None]:
%%javascript
IPython.OutputArea.prototype._should_scroll = function(lines) {
    return false;
}

# Exploring the Information of the DataFrame
## Why Explore the DataFrame Information?
1) To see if we have any null values in the dataset. As you can see, every column has 395 non-null entries, which matches the number of rows, so there are no null values. <br>
2) It is important to know whether we have null values because, if we do, we have to find a way to handle them. Normally we fill them with the mean, median or mode of the column, or we simply drop the rows that contain null values. I normally do this with scikit-learn's imputer (`Imputer` in older releases, `SimpleImputer` in current ones).
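
As a minimal sketch of the null check and the imputation step described above, the snippet below uses a small made-up frame (`toy`, not the student data) and scikit-learn's `SimpleImputer`. The column values are illustrative assumptions only.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data with two missing entries (not the student dataset).
toy = pd.DataFrame({
    "age": [15, 16, np.nan, 18],
    "absences": [0, 4, 2, np.nan],
})

# Count the missing values per column; all zeros would mean nothing to impute.
null_counts = toy.isnull().sum()
print(null_counts)

# Fill each missing entry with the median of its column.
imputer = SimpleImputer(strategy="median")
filled = pd.DataFrame(imputer.fit_transform(toy), columns=toy.columns)
print(filled)
```

On the real dataset the same `isnull().sum()` call should report zero for every column, in which case the imputation step can be skipped.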

<img src="https://media.giphy.com/media/1pF0wNxRjHbk4/giphy.gif">

In [None]:
df.info()

In [None]:
# Descriptive Data
df.describe()

<h1 style="text-align:center">Let's Explore some of Our Main Data:</h1><br>
Now we will look at some of our main data and see what insights we can get from the exploration. However, it is important to state that for our linear regression algorithm we will drop G1 and G2, since they have a high impact on what the result of G3 will be. We want our model to discover which other features have a high impact in determining whether a student will pass or fail.

<img src="https://media.giphy.com/media/l8LTENNory6PK/giphy.gif">

# Let's See How all our Data is Distributed:

In [None]:
import matplotlib.pyplot as plt

df.hist(bins=50, figsize=(20,15), color='r')
plt.show()

## Distribution Plots with Plotly: 
Distribution plots are a great visual way to see how many students passed in G1 [First Period], G2 [Second Period], and <br>
G3 [Last Period]. 

## Failing Grade: 
In our case we assume that any grade lower than 12 is a "Failing" grade. If the grade is greater than or equal to 12, it is a "Passing" grade. 

## What can we gather from this Data?
1) Did students who failed in the first period improve in the second and third periods? <br>
2) How many students got each grade, and which grade did most students get?<br>
3) In which period did students fail the most?
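
The three questions above can also be answered numerically. The sketch below is only an illustration: it uses a hypothetical five-student frame (`toy`) instead of the real data, and `PASS_MARK` is a name introduced here for the threshold of 12 assumed in this notebook.

```python
import pandas as pd

PASS_MARK = 12  # grades below this value count as failing, as assumed above

# Hypothetical grades for five students (not the real dataset).
toy = pd.DataFrame({
    "G1": [8, 11, 12, 15, 10],
    "G2": [9, 12, 13, 14, 9],
    "G3": [10, 13, 13, 15, 0],
})

# Number of failing students in each period.
fails_per_period = (toy < PASS_MARK).sum()
print(fails_per_period)

# Students who failed the first period but passed the final one.
improved = toy[(toy["G1"] < PASS_MARK) & (toy["G3"] >= PASS_MARK)]
print(len(improved), "student(s) recovered after failing G1")

# Most common final grade.
most_common_g3 = toy["G3"].mode().tolist()
print(most_common_g3)
```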

In [None]:
# import cufflinks as cf
from plotly.offline import download_plotlyjs, init_notebook_mode, iplot
from plotly.graph_objs import *
init_notebook_mode()

# cf.set_config_file(offline=False, world_readable=True, theme='pearl')

x0 = df["G1"]
x1 = df["G2"]
x2 = df["G3"]

First_Period = Histogram(
    x=x0,
    name="First Semester",
    text="Grades",
    marker= dict(
        color='#F79F81',
    )
)

Second_Period = Histogram(
    x=x1,
    name="Second Semester",
    text="Grades",
    marker= dict(
        color='#9FF781',
    )
)

Third_Period = Histogram(
    x=x2,
    name="Third Semester",
    text="Grades",
    marker= dict(
        color='#CED8F6',

    )
)

data = [First_Period, Second_Period, Third_Period]
layout = Layout(barmode='stack',
                  title="Distribution of Student's Grades",
                   font=dict(size=16),
                  xaxis=dict(
                  title="Grades"
                  ),
                  yaxis=dict(
                  title="Number of Students"))

fig = dict(data=data, layout=layout)
iplot(fig)



In [None]:
# plotly.plotly is not used in this cell (it moved to the separate chart-studio package in plotly 4), so it is not imported.
import plotly.figure_factory as ff
from plotly.offline import download_plotlyjs, init_notebook_mode, iplot
from plotly.graph_objs import *
import numpy as np
init_notebook_mode()



# Add histogram data
x1 = df['G1'].values.tolist() 
x2 = df['G2'].values.tolist()  
x3 = df['G3'].values.tolist()    


# Group data together
hist_data = [x1, x2, x3]

group_labels = ['First Semester', 'Second Semester', 'Third Semester']

colors = ["#F79F81", "#9FF781", "#2E64FE"]

# # Create distplot with custom bin_size
fig = ff.create_distplot(hist_data, group_labels, bin_size=3, curve_type='normal', colors=colors)
fig['layout'].update(title='Distplot with Normal Distribution', font=dict(size=16))

# # Plot!
iplot(fig)

<h1 style="text-align:center">Data Cleaning</h1><br>
<img src="https://media.giphy.com/media/ffd0F6WNcRJMQ/giphy.gif">

In [None]:
# We drop these columns to make the prediction harder and to see which attributes impact the final grade the most.
df.drop(['G1', 'G2'], axis=1, inplace=True)

In [None]:
# Let's not forget there is also a famsup column in our dataframe. It tells us whether the student receives family educational support.
df.columns
df['famsup'].head()

# Dropping Columns that are not highly correlated <br>
<font size='4'>
<b>1)</b> Here we will find the columns that are highly correlated with G3 (either positively or negatively) in order to keep them in our dataframe. <br><br>
<b>2)</b> We need a criterion for retaining columns. Let's say that if the absolute correlation between a column and G3 is greater than 0.08 (8%), we retain the column for further analysis.<br><br>
<b>3)</b> Some columns that are not highly correlated might stay, to see if we can develop new features by combining two or more columns.<br><br>
<b>4)</b> We drop the columns G1 and G2 since we already know they are highly indicative of what G3 will be. We want to find other features that could have a positive impact. <br><br>
<b>5)</b> For the columns in binary format, we create a new numeric column from each one to see if it has a high correlation with G3. </font>
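
The retention rule in point 2 can be sketched in a few lines. This is an illustration on made-up numbers, not the selection actually performed in this notebook (where columns are dropped by hand); `THRESHOLD` and the `noise` column are names introduced here.

```python
import pandas as pd

THRESHOLD = 0.08  # the 8% criterion from the list above

# Hypothetical numeric columns (values are made up for illustration).
toy = pd.DataFrame({
    "G3":        [6, 8, 10, 12, 14, 16],
    "failures":  [3, 2, 1, 1, 0, 0],
    "studytime": [1, 1, 2, 2, 3, 4],
    "noise":     [1, 2, 2, 2, 2, 1],
})

# Correlation of every feature with the label, excluding the label itself.
corr_with_g3 = toy.corr()["G3"].drop("G3")
print(corr_with_g3.sort_values(ascending=False))

# Keep a column when its absolute correlation exceeds the threshold,
# so strong negative correlations are retained as well.
keep = corr_with_g3[corr_with_g3.abs() > THRESHOLD].index.tolist()
print(keep)
```

Using the absolute value matters here: a feature such as `failures` is informative precisely because its correlation is strongly negative.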

In [None]:
# 0 stands for U and 1 stands for R. [U=Urban, R=Rural]
# Here we will convert all the binary columns to integers.
df['b_address'] = df['address'].apply(lambda x: 0 if x == 'U' else 1)
df['b_address'].value_counts()
# Interestingly, there are more students in families with more than 3 members.
# Could it be that several family members attend the same school? This might be a reason why it is higher.
# LE3 = less than or equal to 3 [0], GT3 = greater than 3 [1]
df['b_famsize'] = df['famsize'].apply(lambda x: 0 if x == 'LE3' else 1)
df['b_famsize'].value_counts()
# T = Parents are living together [0], A = Parents living apart. [1]
df['b_Pstatus'] = df['Pstatus'].apply(lambda x: 0 if x == 'T' else 1)
df['b_Pstatus'].value_counts()
# 0 = no and 1 = yes
df['b_famsup'] = df['famsup'].apply(lambda x: 0 if x == 'no' else 1)
df['b_famsup'].value_counts()
# 0 = no and 1 = yes
# This is an interesting column when it comes to having a positive effect on G3.
df['b_paidxtraclasses'] = df['paid'].apply(lambda x: 0 if x == 'no' else 1)
df['b_paidxtraclasses'].value_counts()
# 0 = no and 1 = yes
df['b_xtraactivities'] = df['activities'].apply(lambda x: 0 if x == 'no' else 1)
df['b_xtraactivities'].value_counts()
# 0 = no and 1 = yes
# It has a high correlation. However, only 20 students are not interested in higher education, 
# so this column should be treated with caution.
df['b_higher_education'] = df['higher'].apply(lambda x: 0 if x == 'no' else 1)
# Continue with the analysis.
df['b_internet'] = df['internet'].apply(lambda x: 0 if x == 'no' else 1)
# Interestingly when people are not in a romantic relationship they tend to get better grades.
df['b_romantic'] = df['romantic'].apply(lambda x: 0 if x == 'no' else 1)

df['b_nursery'] = df['nursery'].apply(lambda x: 0 if x == 'no' else 1)

df['b_guardian'] = df['guardian'].apply(lambda x: 0 if x == 'mother' else (1 if x=='father' else 2))
# Does not have any effect on G3. Low correlation.
df['b_reason'] = df['reason'].apply(lambda x: 0 if x == 'home' else (1 if x=='reputation' else (3 if x=='course' else 4)))
# Does not have any effect on G3. Low correlation.
df['b_school'] = df['school'].apply(lambda x: 0 if x == 'GP' else 1)

df['b_schoolsup'] = df['schoolsup'].apply(lambda x: 0 if x == 'no' else 1)

# What binary columns influenced the G3 grade:<br>
<font size='4'> 
<b>1)</b> Higher education [higher]: Students who were aiming for higher education tended to get better results in the final grade. However, we have to take into consideration that only 20 students were not aiming for higher education. Nevertheless, this feature can give us a hint as to whether a student will get higher grades or not. <br><br>
<b>2)</b> Paid extra classes [paid]: These are the extra paid classes within the course subject. The column has a positive correlation with G3, which means that students who took the extra classes tended to get higher final grades. <br><br>
<b>3)</b> People in a relationship [romantic]: Students who were not in a romantic relationship tended to get higher grades than those who were.</font>


In [None]:
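# Correlate only the numeric columns; the remaining text columns cannot be correlated.
new_corr = df.select_dtypes(include='number').corr()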
new_corr['G3'].sort_values(ascending=False)

# Now let's drop the Columns that we don't need or that don't have a high correlation with G3. 
## Note: I didn't drop absences from our dataframe because I believe it has a big impact on G3: the more absences, the more likely you are to fail. I dropped the other columns because they tend to be redundant, since there are other columns that are somewhat similar. 

In [None]:
df = df.drop(['school', 'b_school', 'address', 'famsize', 'b_famsize', 'Pstatus', 'reason', 'b_reason',
        'guardian', 'b_guardian', 'famsup', 'b_famsup', 'paid', 'activities', 'b_xtraactivities', 'nursery', 'b_nursery',
        'higher', 'internet', 'romantic', 'famrel', 'freetime'], axis=1)

In [None]:
# We have reduced the number of columns from 33 to 23, making our DataFrame much simpler, which will hopefully help us avoid overfitting.
df.shape

In [None]:
# Rename the columns
df = df.rename(columns={'b_address': 'address', 'b_paidxtraclasses': 'paid_classes', 'b_higher_education': 'higher_education',
          'b_internet': 'internet_availability', 'b_romantic': 'relationship', 'b_schoolsup': 'educational_support'})

df.columns

<h1 style="text-align:center"> Split the Data: </h1>
<img src="https://media.giphy.com/media/xTiTnxpQ3ghPiB2Hp6/giphy.gif">

In [None]:
# Simple way to split our data into training and testing sets.
# We use random_state so that every time we run the code we get the exact same split. We don't want the training and
# testing sets to change every time we run the notebook.
from sklearn.model_selection import train_test_split

train_set, test_set = train_test_split(df, test_size=0.2, random_state=42)

print(len(train_set), 'Train', len(test_set), 'Test')

In [None]:
correlations = df.select_dtypes(include='number').corr()
correlations['G3'].sort_values(ascending=False)

# StratifiedShuffleSplit:<br>
<font size='4'> We will use this sklearn class so that the variables that impact the "G3" label the most are distributed in the same proportions in our train and test sets. </font><br>
Here we learn that the attributes with the highest correlations with the final grade (G3) are "Medu" (mother's education) [positive correlation] and "failures" [negative correlation]. 
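
To make the idea of "equally distributed" concrete, here is a small sketch on a hypothetical frame of 100 students (`toy`, not the real data). It stratifies on a single column and then compares the proportions in the full data with those in each subset.

```python
import pandas as pd
from sklearn.model_selection import StratifiedShuffleSplit

# Hypothetical frame: 100 students, 80% with no past failures and 20% with one.
toy = pd.DataFrame({
    "failures": [0] * 80 + [1] * 20,
    "G3": list(range(100)),
})

split = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(split.split(toy, toy["failures"]))

# split() yields positional indices, so iloc is the matching accessor.
strat_train = toy.iloc[train_idx]
strat_test = toy.iloc[test_idx]

# Compare the share of each "failures" value in the full data and in both subsets.
print(toy["failures"].value_counts(normalize=True))
print(strat_train["failures"].value_counts(normalize=True))
print(strat_test["failures"].value_counts(normalize=True))
```

A plain `train_test_split` gives no such guarantee, which matters for rare values: with few students in a category, a random split can leave the test set with almost none of them.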

In [None]:
# Code from Hands-On Machine Learning with Scikit-Learn and TensorFlow by Aurélien Géron.
from sklearn.model_selection import StratifiedShuffleSplit

split = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
# split() yields positional indices, so we select rows with iloc.
for train_index, test_index in split.split(df, df["Medu"]):
    strat_train_set = df.iloc[train_index]
    strat_test_set = df.iloc[test_index]

# Note: this second split overwrites the first one, so the final sets are stratified by "failures" only.
for train_index, test_index in split.split(df, df["failures"]):
    strat_train_set = df.iloc[train_index]
    strat_test_set = df.iloc[test_index]

In [None]:
# We want the failures values to be distributed in the same proportions in our training and test sets, since failures has the highest correlation with G3.
# StratifiedShuffleSplit is most useful on columns that are highly correlated with the label: keeping their distribution equal
# across the sets makes the results vary less.
print(df['Medu'].value_counts()/len(df))
print(df['failures'].value_counts()/len(df))

# Let's Create a Column that tells if the Student Passed or Fail 
## Our criterion is x < 12 = Failed, x >= 12 = Passed

In [None]:
df['grade_status'] = df['G3'].apply(lambda x: 'Fail' if x < 12 else 'Pass')
df['grade_status'].value_counts()

<h1 style="text-align:center">Now we will Discover and Visualize Data </h1>
<img src="https://media.giphy.com/media/rsii9v51Eq6eQ/giphy.gif">

In [None]:

# Can we combine any further attributes in order to add it to our dataframe?
# Move the grade_status and G3 columns to the 1st and 2nd columns just for visualization.
lst_c = df.columns.tolist()

df = df[
['G3',
 'grade_status',
 'sex',
 'age',
 'Medu',
 'Fedu',
 'Mjob',
 'Fjob',
 'traveltime',
 'studytime',
 'failures',
 'schoolsup',
 'goout',
 'Dalc',
 'Walc',
 'health',
 'absences',
 'address',
 'paid_classes',
 'higher_education',
 'internet_availability',
 'b_Pstatus',
 'relationship',
 'educational_support']
]

# What percentage of students failed and passed?

## Why would we want to know how many students passed or fail?
1) We want to know whether the method of education at both Portuguese schools needs improvement. <br>
2) How wide is the gap between the students that passed and the students that failed?
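
Before drawing the pie chart, the same percentages can be read directly from pandas. The sketch below uses ten hypothetical final grades (not the real data) and the same threshold of 12 used in this notebook.

```python
import pandas as pd

# Hypothetical final grades (not the real dataset).
toy = pd.DataFrame({"G3": [5, 9, 11, 12, 14, 16, 10, 8, 13, 0]})
toy["grade_status"] = toy["G3"].apply(lambda g: "Fail" if g < 12 else "Pass")

# Share of each outcome, in percent.
status_pct = toy["grade_status"].value_counts(normalize=True) * 100
print(status_pct)

# Gap between the two groups, in percentage points.
gap = abs(status_pct["Fail"] - status_pct["Pass"])
print(gap)
```

Looking values up by label (`"Fail"`, `"Pass"`) avoids depending on the order in which `value_counts` happens to return them.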

In [None]:
from plotly.offline import download_plotlyjs, init_notebook_mode, iplot
from plotly.graph_objs import *
init_notebook_mode()

labels = ['Failed','Passed']
F = df['grade_status'].value_counts()['Fail']
P = df['grade_status'].value_counts()['Pass']
values = [F, P]
colors = ['#FA5858', '#01DF3A']

trace = Pie(labels=labels, values=values,
               hoverinfo='label+percent', textinfo='value', 
               textfont=dict(size=20),
               marker=dict(colors=colors, 
                           line=dict(color='#282828', width=3)))

data = [trace]
layout = Layout(
    title='Students that Failed and Passed the Course',
    font=dict(size=20)
)


fig = dict(data=data, layout=layout)
iplot(fig)

In [None]:
cross_sex = pd.crosstab(df['sex'], df['grade_status']).apply(lambda x: x/x.sum() * 100, axis=1)
cross_sex.loc['M', 'Fail']

In [None]:
from plotly.offline import download_plotlyjs, init_notebook_mode, iplot
from plotly.graph_objs import *
init_notebook_mode()

F_Failed = cross_sex.loc['F', 'Fail']
F_Passed = cross_sex.loc['F', 'Pass']
M_Failed = cross_sex.loc['M', 'Fail']
M_Passed = cross_sex.loc['M', 'Pass']

colors = ["#FA5858", "#81F781"]

fig = {
  "data": [
    {
      "values": [F_Failed, F_Passed],
      "labels": [
          "Failed",
          "Passed"
      ],
      "domain": {"x": [0, .48]},
      "name": "Females",
        "marker": {"colors": colors},
      "hoverinfo":"label+percent+name",
      "hole": .4,
      "type": "pie"
    },     
    {
      "values": [M_Failed, M_Passed],
      "labels": [
          "Failed",
          "Passed"
      ],
      "text":"Males",
      "textposition":"inside",
      "domain": {"x": [.52, 1]},
      "name": "Males",
        "marker": {"colors": colors},
      "hoverinfo":"label+percent+name",
      "hole": .4,
      "type": "pie"
    }],
  "layout": {
        "title":"Passing School by Gender",
      "font": dict(size=20),
        "annotations": [
            {
                "font": {
                    "size": 16
                },
                "showarrow": False,
                "text": "Females",
                "x": 0.185,
                "y": 0.5
            },
            {
                "font": {
                    "size": 16
                },
                "showarrow": False,
                "text": "Males",
                "x": 0.8,
                "y": 0.5
            }
        ]
    }
}

iplot(fig)

# What can we learn from both Charts?:
1) More students failed than passed.<br>
2) The gap between failing and passing was wider for females than for males.<br>

# What to do Next?
1) Maybe we should think about which features might be associated with female students failing. Could it be that being in a relationship is somehow negatively impacting the performance of female students?

In [None]:
# 166 students are female.
df['sex'].value_counts()

# Number of female students in a relationship
# 67 female students are in a relationship
# Roughly 40% of the total female students are in a relationship.
# Each condition needs its own parentheses because & binds more tightly than ==.
df.loc[(df['sex'] == 'F') & (df['relationship'] == 1)]

df.head()

In [None]:
from plotly.offline import download_plotlyjs, init_notebook_mode, iplot
from plotly.graph_objs import *
init_notebook_mode()

relationship_females = df.loc[(df['sex'] =='F') & (df['relationship'] == 1)]
relationship_males = df.loc[(df['sex'] =='M') & (df['relationship'] == 1)]

class Relationship:
    """Pass/fail percentages for students in a relationship, split by gender."""

    def __init__(self, relationship_females, relationship_males):
        # Both groups are passed in explicitly instead of relying on a global variable.
        self.relationship_females = relationship_females
        self.relationship_males = relationship_males

    @staticmethod
    def _percent(group, status):
        # Share (in %) of the students in `group` whose grade_status equals `status`.
        return (group['grade_status'] == status).mean() * 100

    def females_failed(self):
        return self._percent(self.relationship_females, 'Fail')

    def females_passed(self):
        return self._percent(self.relationship_females, 'Pass')

    def males_failed(self):
        return self._percent(self.relationship_males, 'Fail')

    def males_passed(self):
        return self._percent(self.relationship_males, 'Pass')


rel = Relationship(relationship_females, relationship_males)
rel.females_failed()

females_passed = rel.females_passed()
males_passed = rel.males_passed()
females_failed = rel.females_failed()
males_failed = rel.males_failed()

# Let's Start graphing in plotly.
table_data = [['Gender','% Passed', '% Failed'],
             ['Males in<br>relationships', males_passed, males_failed],
             ['Females in<br>relationships', females_passed, females_failed]]


figure = ff.create_table(table_data, height_constant=60)

genders = ['Male', 'Female']
passed = [males_passed, females_passed]
failed = [males_failed, females_failed]
trace1 = Bar(x=genders, y=passed, xaxis='x2', yaxis='y2',
                marker=dict(color='#819FF7'),
                text='%',
                name='Passed')
trace2 = Bar(x=genders, y=failed, xaxis='x2', yaxis='y2',
                marker=dict(color='#FA5882'),
                text='%',
                name='Failed')
# Add trace data to the figure.
figure.add_traces([trace1, trace2])

# Edit layout for subplots
figure.layout.yaxis.update({'domain': [0, .45]})
figure.layout.yaxis2.update({'domain': [.6, 1]})
# The graph's yaxis2 MUST BE anchored to the graph's xaxis2 and vice versa
figure.layout.yaxis2.update({'anchor': 'x2'})
figure.layout.xaxis2.update({'anchor': 'y2'})
figure.layout.yaxis2.update({'title': '% Result'})
# Update the margins to add a title and see graph x-labels. 
figure.layout.margin.update({'t':75, 'l':50})
figure.layout.update({'title': 'Results of Students that are in a Relationship by Gender'})
# Update the height because adding a graph vertically will interact with
# the plot height calculated for the table
figure.layout.update({'height':800})

iplot(figure)


# What are our conclusions from this Analysis:
1) Of course, relationships are not the only attribute that could influence female students to perform badly in the math course. However, such a gap between females that failed and females that passed could indicate that relationships are a huge factor in how women perform in the future.<br>
# Let's ask ourselves some new questions:
1) What other factors could give female students a higher probability of failing the class than male students?<br>
2) What other variables could be associated with the relationship factor?



In [None]:
corr_matrix = df.select_dtypes(include='number').corr()
corr_matrix['G3'].sort_values(ascending=False)

In [None]:
import seaborn as sns

fig, ax = plt.subplots(figsize=(12,6))
sns.heatmap(corr_matrix,
           xticklabels=corr_matrix.columns.values,
           yticklabels = corr_matrix.columns.values,
            ax = ax,
           ).set_title("Correlation Between Columns")


plt.show()

# What insights can we gain from the Heatmap:
1) Medu (mother's education) has the highest positive correlation with G3, which is why it is the "reddest" square in the G3 column. <br>
2) Let's see if "Medu" was a significant factor in why females failed the course.

## Positive Correlations:
1) Here we can see some interesting insights: Medu (mother's education) seems to be the feature most strongly associated with better grades, and Fedu (father's education) is the second.<br>
2) Studytime is also positively correlated with G3. <br>


## Negative Correlations: 
1) G3 (final grade) is negatively correlated with both Walc (weekend alcohol consumption) and Dalc (workday alcohol consumption).<br>
2) There is also some negative correlation between how often students go out (goout) and the final grade.<br>
3) The highest negative correlation is with "failures", the number of past classes failed. This indicates that the more past classes a student has failed, the more likely the student is to get a lower grade and thus fail.

In [None]:
cross_medu = pd.crosstab(df['Medu'], df['sex']).apply(lambda x: x/x.sum() * 100)
cross_medu

In [None]:
colors = ["#FF4000", "#2E64FE"]
ct = pd.crosstab(df.Medu, df.grade_status).apply(lambda x: x/x.sum() * 100)

# ct.plot.bar(stacked=True, color=colors)
# fig, ax =  plt.subplots(figsize=(14,10))
ax = ct.plot.bar(stacked=True, color=colors, figsize=(12,6))
ax.set(xlabel="Mom's Education (from lowest to highest)", ylabel='% of Passes and Fails', title="How Mom's Education influenced Performance")
plt.show()

In [None]:
cross_fedu = pd.crosstab(df['Fedu'], df['grade_status']).apply(lambda x: x/x.sum() * 100)
cross_fedu

In [None]:
colors = ["#DF0101","#585858"]
cf = pd.crosstab(df.Fedu, df.grade_status).apply(lambda x: x/x.sum() * 100)

# ct.plot.bar(stacked=True, color=colors)
# fig, ax =  plt.subplots(figsize=(14,10))
ax = cf.plot.bar(stacked=True, color=colors, figsize=(12,6))
ax.set(xlabel="Dad's Education (from lowest to highest)", ylabel='% of Passes and Fails', title="How Dad's Education influenced Performance")
plt.show()

In [None]:
# What percentage of female and male students receive educational support?
cross_edus = pd.crosstab(df['educational_support'], df['sex']).apply(lambda x: x/x.sum() * 100)
cross_edus

In [None]:
# Let's see how effective the paid classes are.
# Roughly the same % of students fail whether or not they pay for classes, which suggests the paid classes are not very effective.
cross_paid = pd.crosstab(df['grade_status'], df['paid_classes']).apply(lambda x: x/x.sum() * 100)
cross_paid

In [None]:
# More females pay for classes, yet more males pass the class, so the paid classes do not appear to be helping students 
# get better grades.
cross_paids = pd.crosstab(df['sex'], df['paid_classes']).apply(lambda x: x/x.sum() * 100)
cross_paids

## Analysis of the Crosstab: 
1) Let's say that Medu values of 3 and 4 mean the mother's education is high, while the other values mean it is low.<br>
2) Around 53% of the female students' mothers have a high education, while the remaining 47% have a low education. <br>
3) Around 64% of the male students' mothers have a high education, while the remaining 36% have a low education.<br>
4) 59% of females pay for classes, yet 65.7% of female students fail the course. <br>
5) Overall, 59% of the students that pay for classes fail the course, while only 41% of them pass. 
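
A note on reading these tables: some crosstab cells above divide by column totals (the default `axis=0` of `apply`) and others by row totals (`axis=1`), and the two answer different questions. The sketch below shows the difference on six hypothetical students (not the real data), using the `normalize` argument of `pd.crosstab` as an alternative to the `apply` calls used in this notebook.

```python
import pandas as pd

# Hypothetical students (not the real dataset).
toy = pd.DataFrame({
    "sex": ["F", "F", "F", "F", "M", "M"],
    "paid_classes": [1, 1, 0, 0, 1, 0],
})

# Row percentages: of all students of a given sex, what share pays for classes?
by_sex = pd.crosstab(toy["sex"], toy["paid_classes"], normalize="index") * 100

# Column percentages: of all students who pay (or do not pay), what share is female or male?
by_paid = pd.crosstab(toy["sex"], toy["paid_classes"], normalize="columns") * 100

print(by_sex)
print(by_paid)
```

In this toy example half of the female students pay for classes, while two thirds of the paying students are female, so it is worth checking which of the two a given percentage refers to before drawing conclusions.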



## Possible Solutions: 
1) Provide more educational support to the female students whose parents have a low education. This could help those students improve their grades at school.<br>
2) We can also see that a low % of students receive educational support: 81% of females and 92% of males don't receive any sort of educational support! <br>
3) The school's educational support program does not appear to be effective. It needs to be improved or perhaps replaced entirely. <br>
4) If the paid classes are provided by the school, the way the classes are given should change. Among students that take the paid classes, a higher percentage fail the course than pass it.


In [None]:
failures_matrix = df.select_dtypes(include='number').corr()
failures_matrix['failures'].sort_values(ascending=False)

In [None]:
df['failures'].value_counts()

In [None]:
# We will do further analysis and visualize the data we consider useful to have in a graph.
# 65% of females failed and 34% passed.

# The higher the mother's education, the more likely students are to pass. [We might visualize this data.]
cross_Medu = pd.crosstab(df['Medu'], df['grade_status']).apply(lambda x: x/x.sum() * 100, axis=1)
cross_Medu
# Interestingly, the number of past failures tells us whether a student is more likely to fail.
# failures is the number of past class failures.
cross_failures = pd.crosstab(df['failures'], df['grade_status']).apply(lambda x: x/x.sum() * 100, axis=1)
cross_walc = pd.crosstab(df['Walc'], df['grade_status']).apply(lambda x: x/x.sum() * 100, axis=1)
cross_walc


# Let's analyze whether the number of past Failures determines who will Fail the course.

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

# Set the style before creating the figure so that it applies to this plot.
sns.set(style="darkgrid")
fig, ax = plt.subplots(figsize=(12,6))
colors = ["#DF0101", "#088A08"]
# Draw on the axes we just created.
g = sns.countplot(x="failures", hue="grade_status", palette=colors, data=df, ax=ax)
g.axes.set_title("Students with Previous Failed Classes [Pass vs Fail]",fontsize=24)
g.set_xlabel("# of Failures",fontsize=16)
g.set_ylabel("# of Students",fontsize=16)
plt.show()

In [None]:
# Now we look at the results as percentages for each number of past failures.
# The graph above suggests that students who have failed previous classes tend to fail this class more often.
from plotly.offline import init_notebook_mode, iplot
from plotly.graph_objs import Bar, Layout
init_notebook_mode()

# Percentage of students who failed / passed with 0, 1, 2 and 3 past failures.
# Column 0 is 'Fail' and column 1 is 'Pass', assuming those are the two grade_status labels.
nof_f = cross_failures.iloc[0, 0]
nof_p = cross_failures.iloc[0, 1]
onef_f = cross_failures.iloc[1, 0]
onef_p = cross_failures.iloc[1, 1]
twof_f = cross_failures.iloc[2, 0]
twof_p = cross_failures.iloc[2, 1]
threef_f = cross_failures.iloc[3, 0]
threef_p = cross_failures.iloc[3, 1]

trace1 = Bar(
    y=['No Failures', 'One Failure', 'Two Failures', 'Three Failures'],
    x=[nof_f, onef_f, twof_f, threef_f],
    text='Percent Failed (%)',
    name='Failed',
    orientation='h',
    marker= dict(
        color='#FA5858',
        line = dict(color = '#DF0101',
            width = 3)
    )
)

trace2 = Bar(
    y=['No Failures', 'One Failure', 'Two Failures', 'Three Failures'],
    x=[nof_p, onef_p, twof_p, threef_p],
    text='Percent Passed (%)',
    name='Passed',
    orientation = 'h',
    marker= dict(
        color='#BEF781',
        line = dict(color = '#0B610B',
            width = 3)
    )
)

data = [trace1, trace2]
layout = Layout(
    title = '% of Students Who Passed or Failed, by Number of Past Failures',
    barmode='stack',
)

fig = dict(data=data, layout=layout)
iplot(fig)


In [None]:
df.columns

In [None]:
# color_name = grade_status
# y = G3
# x = absences
import plotly.figure_factory as ff
from plotly.offline import download_plotlyjs, init_notebook_mode, iplot
from plotly.graph_objs import *
init_notebook_mode()

fig = ff.create_facet_grid(
    df,
    x='absences',
    y='G3',
    color_name='grade_status',
    show_boxes=False,
    marker={'size': 8, 'opacity': 0.75},
    colormap={'Pass': '#0B610B', 'Fail': '#DF0101'},
)

fig.layout.update({'title': 'Absences of Students'})
fig.layout.update({'autosize': False, 'width': 800, 'height': 600})

iplot(fig)

The more absences a student has, the more likely the student is to fail the class, which is what we would expect.

In [None]:
# Mother's education (Medu) vs. grade status, as column percentages.
ct = pd.crosstab(df.Medu, df.grade_status).apply(lambda x: x/x.sum() * 100)
ct

## Conclusion: 
1) We can determine that the mother's education has a significant impact on whether a female student is likely to pass or fail the course. <br>
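
The two `Medu` cross-tabulations in this notebook answer different questions: the one with `axis=1` gives row percentages, while the one without `axis` gives column percentages. As a minimal sketch on invented toy data (the values below are hypothetical, only the column names mirror the notebook), the same two views can be produced with the `normalize` argument of `pd.crosstab`:

```python
import pandas as pd

# Hypothetical toy data; the column names mirror the notebook, the values do not.
toy = pd.DataFrame({
    "Medu": [1, 1, 2, 2, 2, 3],
    "grade_status": ["Fail", "Pass", "Fail", "Fail", "Pass", "Pass"],
})

# Row percentages: within each Medu level, what share failed or passed?
row_pct = pd.crosstab(toy["Medu"], toy["grade_status"], normalize="index") * 100

# Column percentages: among failing (or passing) students, what share has each Medu level?
col_pct = pd.crosstab(toy["Medu"], toy["grade_status"], normalize="columns") * 100

print(row_pct.round(1))
print(col_pct.round(1))
```

Row percentages are usually the ones to read when asking "how likely is a student with this level of mother's education to pass?".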

In [None]:
# Check if we have any null values.
# It looks like we don't have any null values.
df_rows_with_null = df[df.isnull().any(axis=1)].head()
df_rows_with_null

<img src="https://media.giphy.com/media/VG7gGGBzhBSP6/giphy.gif">

# Let's Start with the Algorithms:
1) We don't have any null values, so we don't need to fill in missing values.

In [None]:
grades = df.drop(["G3", "grade_status"], axis=1) #we drop the labels for the training set.
grades_labels = df["G3"].copy()

print(grades.shape, grades_labels.shape)

In [None]:
# The categorical columns need preprocessing before modeling.
# sex, Mjob, Fjob and schoolsup are object columns (grade_status was already dropped above).
grades.dtypes

In [None]:
# Let's split the numerical and categorical values to fit it into our Linear Regression Model.

grades_num = grades.drop(["sex", "Mjob", "Fjob", "schoolsup"], axis=1)
grades_cat = grades[['sex', 'Mjob', 'Fjob', 'schoolsup']]

# Reference to the Class Below:
The class below encodes the categorical values in the feature columns. <br>
It is copied from scikit-learn Pull Request #9151: https://github.com/scikit-learn/scikit-learn/pull/9151
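
As a side note, scikit-learn 0.20 and later ship `OneHotEncoder` and `OrdinalEncoder` versions that accept string columns directly, so the copied class is mainly needed on older installations. A minimal sketch, assuming a recent scikit-learn and using invented sample rows:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical sample rows for two string-valued columns (sex, schoolsup).
X = np.array([["F", "yes"], ["M", "no"], ["F", "no"]], dtype=object)

encoder = OneHotEncoder(handle_unknown="ignore")
X_onehot = encoder.fit_transform(X).toarray()  # sparse by default, so densify for display

print(encoder.categories_)  # the categories found in each column
print(X_onehot)
```

Each input column becomes one binary column per category, which is the same idea as `encoding="onehot-dense"` in the class below.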

In [None]:
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.utils import check_array
from sklearn.preprocessing import LabelEncoder
from scipy import sparse

class CategoricalEncoder(BaseEstimator, TransformerMixin):
    """Encode categorical features as a numeric array.
    The input to this transformer should be a matrix of integers or strings,
    denoting the values taken on by categorical (discrete) features.
    The features can be encoded using a one-hot aka one-of-K scheme
    (``encoding='onehot'``, the default) or converted to ordinal integers
    (``encoding='ordinal'``).
    This encoding is needed for feeding categorical data to many scikit-learn
    estimators, notably linear models and SVMs with the standard kernels.
    Read more in the :ref:`User Guide <preprocessing_categorical_features>`.
    Parameters
    ----------
    encoding : str, 'onehot', 'onehot-dense' or 'ordinal'
        The type of encoding to use (default is 'onehot'):
        - 'onehot': encode the features using a one-hot aka one-of-K scheme
          (or also called 'dummy' encoding). This creates a binary column for
          each category and returns a sparse matrix.
        - 'onehot-dense': the same as 'onehot' but returns a dense array
          instead of a sparse matrix.
        - 'ordinal': encode the features as ordinal integers. This results in
          a single column of integers (0 to n_categories - 1) per feature.
    categories : 'auto' or a list of lists/arrays of values.
        Categories (unique values) per feature:
        - 'auto' : Determine categories automatically from the training data.
        - list : ``categories[i]`` holds the categories expected in the ith
          column. The passed categories are sorted before encoding the data
          (used categories can be found in the ``categories_`` attribute).
    dtype : number type, default np.float64
        Desired dtype of output.
    handle_unknown : 'error' (default) or 'ignore'
        Whether to raise an error or ignore if an unknown categorical feature is
        present during transform (default is to raise). When this parameter
        is set to 'ignore' and an unknown category is encountered during
        transform, the resulting one-hot encoded columns for this feature
        will be all zeros.
        Ignoring unknown categories is not supported for
        ``encoding='ordinal'``.
    Attributes
    ----------
    categories_ : list of arrays
        The categories of each feature determined during fitting. When
        categories were specified manually, this holds the sorted categories
        (in order corresponding with output of `transform`).
    Examples
    --------
    Given a dataset with three features and two samples, we let the encoder
    find the maximum value per feature and transform the data to a binary
    one-hot encoding.
    >>> from sklearn.preprocessing import CategoricalEncoder
    >>> enc = CategoricalEncoder(handle_unknown='ignore')
    >>> enc.fit([[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]])
    ... # doctest: +ELLIPSIS
    CategoricalEncoder(categories='auto', dtype=<... 'numpy.float64'>,
              encoding='onehot', handle_unknown='ignore')
    >>> enc.transform([[0, 1, 1], [1, 0, 4]]).toarray()
    array([[ 1.,  0.,  0.,  1.,  0.,  0.,  1.,  0.,  0.],
           [ 0.,  1.,  1.,  0.,  0.,  0.,  0.,  0.,  0.]])
    See also
    --------
    sklearn.preprocessing.OneHotEncoder : performs a one-hot encoding of
      integer ordinal features. The ``OneHotEncoder`` assumes that input
      features take on values in the range ``[0, max(feature)]`` instead of
      using the unique values.
    sklearn.feature_extraction.DictVectorizer : performs a one-hot encoding of
      dictionary items (also handles string-valued features).
    sklearn.feature_extraction.FeatureHasher : performs an approximate one-hot
      encoding of dictionary items or strings.
    """

    def __init__(self, encoding='onehot', categories='auto', dtype=np.float64,
                 handle_unknown='error'):
        self.encoding = encoding
        self.categories = categories
        self.dtype = dtype
        self.handle_unknown = handle_unknown

    def fit(self, X, y=None):
        """Fit the CategoricalEncoder to X.
        Parameters
        ----------
        X : array-like, shape [n_samples, n_features]
            The data to determine the categories of each feature.
        Returns
        -------
        self
        """

        if self.encoding not in ['onehot', 'onehot-dense', 'ordinal']:
            template = ("encoding should be either 'onehot', 'onehot-dense' "
                        "or 'ordinal', got %s")
            raise ValueError(template % self.encoding)

        if self.handle_unknown not in ['error', 'ignore']:
            template = ("handle_unknown should be either 'error' or "
                        "'ignore', got %s")
            raise ValueError(template % self.handle_unknown)

        if self.encoding == 'ordinal' and self.handle_unknown == 'ignore':
            raise ValueError("handle_unknown='ignore' is not supported for"
                             " encoding='ordinal'")

        X = check_array(X, dtype=object, accept_sparse='csc', copy=True)
        n_samples, n_features = X.shape

        self._label_encoders_ = [LabelEncoder() for _ in range(n_features)]

        for i in range(n_features):
            le = self._label_encoders_[i]
            Xi = X[:, i]
            if self.categories == 'auto':
                le.fit(Xi)
            else:
                valid_mask = np.in1d(Xi, self.categories[i])
                if not np.all(valid_mask):
                    if self.handle_unknown == 'error':
                        diff = np.unique(Xi[~valid_mask])
                        msg = ("Found unknown categories {0} in column {1}"
                               " during fit".format(diff, i))
                        raise ValueError(msg)
                le.classes_ = np.array(np.sort(self.categories[i]))

        self.categories_ = [le.classes_ for le in self._label_encoders_]

        return self

    def transform(self, X):
        """Transform X using one-hot encoding.
        Parameters
        ----------
        X : array-like, shape [n_samples, n_features]
            The data to encode.
        Returns
        -------
        X_out : sparse matrix or a 2-d array
            Transformed input.
        """
        # The np.object / np.int / np.bool aliases were removed in NumPy 1.24; use the builtins.
        X = check_array(X, accept_sparse='csc', dtype=object, copy=True)
        n_samples, n_features = X.shape
        X_int = np.zeros_like(X, dtype=int)
        X_mask = np.ones_like(X, dtype=bool)

        for i in range(n_features):
            valid_mask = np.in1d(X[:, i], self.categories_[i])

            if not np.all(valid_mask):
                if self.handle_unknown == 'error':
                    diff = np.unique(X[~valid_mask, i])
                    msg = ("Found unknown categories {0} in column {1}"
                           " during transform".format(diff, i))
                    raise ValueError(msg)
                else:
                    # Set the problematic rows to an acceptable value and
                    # continue. The rows are marked in `X_mask` and will be
                    # removed later.
                    X_mask[:, i] = valid_mask
                    X[:, i][~valid_mask] = self.categories_[i][0]
            X_int[:, i] = self._label_encoders_[i].transform(X[:, i])

        if self.encoding == 'ordinal':
            return X_int.astype(self.dtype, copy=False)

        mask = X_mask.ravel()
        n_values = [cats.shape[0] for cats in self.categories_]
        n_values = np.array([0] + n_values)
        indices = np.cumsum(n_values)

        column_indices = (X_int + indices[:-1]).ravel()[mask]
        row_indices = np.repeat(np.arange(n_samples, dtype=np.int32),
                                n_features)[mask]
        data = np.ones(n_samples * n_features)[mask]

        out = sparse.csc_matrix((data, (row_indices, column_indices)),
                                shape=(n_samples, indices[-1]),
                                dtype=self.dtype).tocsr()
        if self.encoding == 'onehot-dense':
            return out.toarray()
        else:
            return out

# Feature Scaling and Transformation Pipelines:

In [None]:
# Standardize the numeric values with StandardScaler for better performance of the linear regression model.
# We use a pipeline to automate the steps for the linear regression model.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

numeric_pipeline = Pipeline([
    ('standard_scaler', StandardScaler())
])

# Fit the scaler and transform the numeric values into standardized form.
grading_numeric_tr = numeric_pipeline.fit_transform(grades_num)
grading_numeric_tr

# Referencing the Code Below: 
## Hands-On Machine Learning with Scikit-Learn & TensorFlow by Aurélien Géron <br>
1) I recommend this book to anyone who wants to learn more about machine learning; it is totally worth it! I haven't finished it yet, but I learn more from it every day. The insights are super interesting! <br><br>
2) The main purpose of the DataFrameSelector class is to feed our numerical and categorical attributes from the DataFrame into the Pipeline. We then scale the numerical attributes, encode the categorical ones, and finally join the two sets of features.
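
For comparison, here is a minimal sketch of the same select, scale, encode and join idea using `ColumnTransformer`, which scikit-learn provides from version 0.20 on. This is an alternative to the DataFrameSelector approach used in this notebook, not what the notebook runs; the toy values are invented and only the column names follow the dataset.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy frame; column names follow the notebook, values are invented.
toy = pd.DataFrame({
    "age": [15, 16, 17, 18],
    "absences": [0, 4, 10, 2],
    "sex": ["F", "M", "F", "M"],
    "schoolsup": ["yes", "no", "no", "no"],
})
num_cols = ["age", "absences"]
cat_cols = ["sex", "schoolsup"]

preprocess = ColumnTransformer(
    transformers=[
        ("num", StandardScaler(), num_cols),                        # scale numeric columns
        ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),  # one-hot encode strings
    ],
    sparse_threshold=0,  # always return a dense array
)

toy_prepared = preprocess.fit_transform(toy)
print(toy_prepared.shape)  # 2 scaled columns plus 2 + 2 one-hot columns
```

The column selection happens by name inside the transformer, so no separate selector class is needed.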


In [None]:
from sklearn.base import BaseEstimator, TransformerMixin
# Create a class to select numerical or categorical columns,
# since the scikit-learn transformers used here don't select DataFrame columns themselves.
# BaseEstimator provides get_params/set_params (as long as __init__ avoids *args and **kwargs), which helps later with hyperparameter tuning.
# TransformerMixin provides fit_transform, so we don't have to implement it here.
class DataFrameSelector(BaseEstimator, TransformerMixin):
    def __init__(self, attribute_names):
        self.attribute_names = attribute_names
    def fit (self, X, y=None):
        return self
    def transform(self, X):
        return X[self.attribute_names].values

In [None]:
numerical_features = list(grades_num)
categorical_features = ['sex', 'Mjob', 'Fjob', 'schoolsup']

numeric_pipeline = Pipeline([
    ('selector', DataFrameSelector(numerical_features)),
    ('standard_scaler', StandardScaler())
])

categorical_pipeline = Pipeline([
    ('selector', DataFrameSelector(categorical_features)),
    ('cat_encoder', CategoricalEncoder(encoding="onehot-dense"))
])

# Now let's join both pipelines into one.

In [None]:
from sklearn.pipeline import FeatureUnion

full_pipeline = FeatureUnion(transformer_list=[
    ("numeric_pipeline", numeric_pipeline),
    ("categorical_pipeline", categorical_pipeline),
])

# Now let's run the full pipeline: numeric columns are scaled and categorical columns are one-hot encoded, ready for the different algorithms.
grades_scaled = full_pipeline.fit_transform(grades)
grades_scaled

In [None]:
grades_scaled.shape

In [None]:
from sklearn.linear_model import LinearRegression

lin_reg = LinearRegression()
lin_reg.fit(grades_scaled, grades_labels)

In [None]:
top_data = grades.iloc[:5]
top_labels = grades_labels[:5]
top_data_ready = full_pipeline.transform(top_data)
print("Predictions:", lin_reg.predict(top_data_ready))

In [None]:
# We got 1/5 correct.
print("Labels", list(top_labels))

In [None]:
top_data_ready

In [None]:
from sklearn.metrics import mean_squared_error
# Let's see what the prediction error of our model is.
# We have an RMSE of 3.96 points, measured on the same data the model was trained on.

grades_predictions = lin_reg.predict(grades_scaled)
lin_mse = mean_squared_error(grades_labels, grades_predictions)
lin_rmse = np.sqrt(lin_mse)
lin_rmse

In [None]:
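# Let's see how much of the variance in G3 our model explains.
from sklearn import metrics
accuracy = metrics.r2_score(grades_labels, grades_predictions)
accuracy
# R^2 of about 0.24: the model explains roughly 24% of the variance in G3 (this is not a classification accuracy).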
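
The value returned by `r2_score` is the coefficient of determination, which compares the model's squared error with that of always predicting the mean grade. A small worked sketch with invented numbers (not taken from the dataset) shows the calculation by hand:

```python
import numpy as np

# Hypothetical true grades and predictions, for illustration only.
y_true = np.array([10.0, 12.0, 8.0, 15.0, 5.0])
y_pred = np.array([11.0, 11.0, 9.0, 13.0, 7.0])

ss_res = np.sum((y_true - y_pred) ** 2)         # squared error left after the model
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # squared error of always predicting the mean
r2 = 1 - ss_res / ss_tot

print(round(r2, 3))
```

A value of 1 means perfect predictions, 0 means no better than predicting the mean, and negative values are possible for models that do worse than the mean.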

In [None]:
from sklearn.metrics import mean_absolute_error
# The mean absolute error is lower: 3.033 points.
lin_mae = mean_absolute_error(grades_labels, grades_predictions)
lin_mae

In [None]:
from sklearn.tree import DecisionTreeRegressor

tree_reg = DecisionTreeRegressor(random_state=42)
tree_reg.fit(grades_scaled, grades_labels)

## The reason why we use (-) in tree_rmse_scores:
Scikit-learn's cross-validation expects a score where higher is better, unlike a cost function such as the mean squared error, where lower is better. With scoring="neg_mean_squared_error" it therefore returns the negative MSE, so we flip the sign back before taking the square root. [Cost Function]<br>
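
A minimal sketch of this sign convention, on synthetic data rather than the student dataset (the coefficients and noise level are arbitrary assumptions):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic data, used only to show the sign convention.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.5, size=60)

neg_mse = cross_val_score(LinearRegression(), X, y,
                          scoring="neg_mean_squared_error", cv=5)
rmse = np.sqrt(-neg_mse)  # flip the sign back before taking the square root

print(neg_mse)  # every entry is <= 0
print(rmse)     # every entry is >= 0
```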

## Avoiding overfitting: 
We use cross-validation on the training set to detect overfitting. Overfitting means that the model fits the training set well because it has memorized it, but does not perform well on new data.
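
To make the idea concrete, here is a small sketch on synthetic noisy data (an assumption for illustration, not the student data). An unconstrained decision tree reproduces its training rows almost perfectly, yet its error on held-out folds is much larger:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic noisy data, for illustration only.
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 2))
y = 2.0 * X[:, 0] + rng.normal(scale=3.0, size=200)

tree = DecisionTreeRegressor(random_state=42).fit(X, y)

# Error on the rows the tree was trained on.
train_rmse = np.sqrt(np.mean((y - tree.predict(X)) ** 2))

# Error on held-out folds.
cv_mse = -cross_val_score(DecisionTreeRegressor(random_state=42), X, y,
                          scoring="neg_mean_squared_error", cv=10)
cv_rmse = np.sqrt(cv_mse).mean()

print(train_rmse, cv_rmse)
```

The gap between the two numbers is the symptom of overfitting that cross-validation makes visible.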

In [None]:
# Let's evaluate the Decision Tree Regressor with cross-validation.
from sklearn.model_selection import cross_val_score 
scores = cross_val_score(tree_reg, grades_scaled, grades_labels,
                        scoring="neg_mean_squared_error", cv=10)

tree_rmse_scores = np.sqrt(-scores)
# This gives us 10 different scores, one per held-out fold, which shows how the model behaves with
# data it was not trained on.

In [None]:
def show_result(scores):
    print("Scores:", scores)
    print("Mean: ", scores.mean())
    
show_result(tree_rmse_scores)
# The model gives us worse results than the linear regression model.


In [None]:
# The scores improve slightly when we use the linear regression model.
lin_scores = cross_val_score(lin_reg, grades_scaled, grades_labels,
                             scoring="neg_mean_squared_error", cv=10)

lin_rmse_scores = np.sqrt(-lin_scores)

show_result(lin_rmse_scores)


In [None]:
# Let's try RandomForestRegressor.
from sklearn.ensemble import RandomForestRegressor

forest_reg = RandomForestRegressor()
forest_reg.fit(grades_scaled, grades_labels)

In [None]:
grades_predictions = forest_reg.predict(grades_scaled)
forest_mse = mean_squared_error(grades_labels, grades_predictions)
forest_rmse = np.sqrt(forest_mse)
forest_rmse
# 1.69 points of error on the training data, by far the lowest of all the algorithms we have tried.

In [None]:
# R^2 for the random forest regressor
accuracy = metrics.r2_score(grades_labels, grades_predictions)
accuracy
# R^2 of about 0.86 on the training data.

In [None]:
from sklearn.model_selection import cross_val_score
# Best score so far.
forest_scores = cross_val_score(forest_reg, grades_scaled, grades_labels,
                               scoring="neg_mean_squared_error", cv=10)

forest_rmse_scores = np.sqrt(-forest_scores)
show_result(forest_rmse_scores)

In [None]:
# Implement a Support Vector Machine regressor.
# 4.09 points of error; this does not look good.
from sklearn.svm import SVR
svm_reg = SVR(kernel="linear")
svm_reg.fit(grades_scaled, grades_labels)
grades_predictions = svm_reg.predict(grades_scaled)
svm_mse = mean_squared_error(grades_labels, grades_predictions)
svm_rmse = np.sqrt(svm_mse)
svm_rmse

In [None]:
# R^2 for SVR is about 0.20.
accuracy = metrics.r2_score(grades_labels, grades_predictions)
accuracy

In [None]:
# Now we will implement GridSearchCV, which tells us which hyperparameters
# give the lowest error rate.
from sklearn.model_selection import GridSearchCV

param_grid = [
    # 4 x 5 = 20 combinations of hyperparameters.
    {'n_estimators': [3, 10, 30, 50], 'max_features': [2, 4, 6, 8, 10]},
    # then we try 3 x 4 = 12 combinations with bootstrap set to False.
    {'bootstrap': [False], 'n_estimators': [3, 10, 30], 'max_features': [2,3,4,5]}
]

forest_reg = RandomForestRegressor(random_state=42)
# Let's train across 5 folds; that's a total of (20 + 12) * 5 = 160 rounds of training.
grid_search = GridSearchCV(forest_reg, param_grid, cv=5,
                          scoring='neg_mean_squared_error')
grid_search.fit(grades_scaled, grades_labels)
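
The number of candidates in a grid like this can be checked without training anything. The sketch below uses `ParameterGrid` from scikit-learn on the same two sub-grids; the variable names `n_candidates` and `n_fits` are only illustrative.

```python
from sklearn.model_selection import ParameterGrid

param_grid = [
    {"n_estimators": [3, 10, 30, 50], "max_features": [2, 4, 6, 8, 10]},
    {"bootstrap": [False], "n_estimators": [3, 10, 30], "max_features": [2, 3, 4, 5]},
]

n_candidates = len(ParameterGrid(param_grid))  # 4*5 from the first grid + 1*3*4 from the second
n_fits = n_candidates * 5                      # one fit per candidate per fold (cv=5)
print(n_candidates, n_fits)
```

With the default `refit=True`, `GridSearchCV` performs one additional fit of the best candidate on the full data after the search.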

In [None]:
# Best hyperparameters found by the grid search.
# When the best value sits at the upper edge of the grid, you can extend the grid with slightly higher values
# in order to see by how much the error rate decreases.
grid_search.best_params_

In [None]:
cvres = grid_search.cv_results_

for mean_score, params in zip(cvres["mean_test_score"], cvres["params"]):
    print(np.sqrt(-mean_score), params)

In [None]:
feature_importances = grid_search.best_estimator_.feature_importances_
feature_importances

In [None]:
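cat_encoder = categorical_pipeline.named_steps["cat_encoder"]
# Collect the one-hot column names of every categorical feature (not just the first one),
# prefixed with the feature name because Mjob and Fjob share the same categories.
cat_one_hot_attribs = [f"{feature}_{category}"
                       for feature, categories in zip(categorical_features, cat_encoder.categories_)
                       for category in categories]
qualities = numerical_features + cat_one_hot_attribs
sorted(zip(feature_importances, qualities), reverse=True)
# This shows the relative importance of each attribute for making accurate predictions.
# Absences is the most important feature: it gives guidance on what score the student will obtain in G3.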
In [None]:
# Select the columns without "grade_status"; we already know it would be the strongest indicator of whether a student passes or fails
# the course.

df = df[
['G3',
 'sex',
 'age',
 'Medu',
 'Fedu',
 'Mjob',
 'Fjob',
 'traveltime',
 'studytime',
 'failures',
 'schoolsup',
 'goout',
 'Dalc',
 'Walc',
 'health',
 'absences',
 'address',
 'paid_classes',
 'higher_education',
 'internet_availability',
 'b_Pstatus',
 'relationship',
 'educational_support']
]

lst_y = df.columns.tolist()
lst_y

In [None]:
# Time to test the model. Note: strat_test_set is taken from the same df that the model was fit on, so this is not yet a test on unseen data.
df.head()
strat_test_set = df[lst_y]
strat_test_set.head()

In [None]:
# Let's do the final test.
# We have an RMSE of 1.50 points.
final_model = grid_search.best_estimator_

X_test = strat_test_set.drop("G3", axis=1)
y_test = strat_test_set["G3"].copy()

X_test_prepared = full_pipeline.transform(X_test)
final_predictions = final_model.predict(X_test_prepared)
final_mse = mean_squared_error(y_test, final_predictions)
final_rmse = np.sqrt(final_mse)
final_rmse

In [None]:
from sklearn import metrics
# R^2 of about 0.89 (the model explains roughly 89% of the variance in G3). Not bad!
accuracy = metrics.r2_score(y_test, final_predictions)
accuracy
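
Because the final check above evaluates on rows drawn from the same `df` that was used for fitting, its score is likely optimistic. A minimal sketch of a held-out evaluation follows; it uses synthetic data as a stand-in for the prepared features and G3 labels, so the numbers it prints say nothing about the student dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the prepared features and G3 labels.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 5))
y = 10 + 2 * X[:, 0] - X[:, 1] + rng.normal(scale=2.0, size=300)

# Hold out 20% of the rows; the model never sees them during fitting.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestRegressor(n_estimators=50, random_state=42).fit(X_train, y_train)

train_rmse = np.sqrt(mean_squared_error(y_train, model.predict(X_train)))
test_rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
print("train RMSE:", train_rmse, "R^2:", r2_score(y_train, model.predict(X_train)))
print("test  RMSE:", test_rmse, "R^2:", r2_score(y_test, model.predict(X_test)))
```

The test figures are the ones that estimate how the model would behave on students it has never seen.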

In [None]:
# Finally let's create a full pipeline with both the preparation phase and prediction.
full_pipeline_with_predictor = Pipeline([
    ("preparation", full_pipeline),
    ("forest", RandomForestRegressor())  # the step name now matches the estimator
])

full_pipeline_with_predictor.fit(grades, grades_labels)
full_pipeline_with_predictor.predict(top_data)


In [None]:
# Comparing our predictions with our labels, we can say that our Random Forest Regressor model is
# quite accurate at predicting the math scores of these students!
print("Labels", list(top_labels))

<img src="https://media.giphy.com/media/LfNYfVwk0ICbu/giphy.gif">

# Conclusion:
In this project we did a deep analysis of the possible factors that determine whether a student is likely to get a high or a low score. The data does not contain that much information, but we were still able to build a fairly precise RandomForest Regressor model that predicts what score a student will get in the future by analyzing the features. This is my first Kernel, so I am open to constructive criticism.

## Reference:
I would like to reference the book Hands-On Machine Learning with Scikit-Learn and TensorFlow by Aurélien Géron. It really helped me grasp the general concept of how regression models work.

## What would we use a Linear Regression Model for: 
As I understand it, a linear regression model predicts a numeric value from a given set of features. Here we also learned how to tune hyperparameters in a more automatic way, so that the model makes better predictions when data for new students comes in and lets us know whether a student is likely to pass the class or not.
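
As a rough illustration of what "predicting a value from features" means for a linear model, the sketch below fits weights by least squares with NumPy. The feature values and grades are invented, and `new_student` is a hypothetical input.

```python
import numpy as np

# Hypothetical features (study time, absences) and final grades.
X = np.array([[1.0, 10.0], [2.0, 4.0], [3.0, 2.0], [4.0, 0.0]])
y = np.array([8.0, 11.0, 13.0, 16.0])

# Add a column of ones so the model can learn an intercept.
X_b = np.column_stack([np.ones(len(X)), X])

# Least-squares solution of X_b @ w ≈ y.
w, *_ = np.linalg.lstsq(X_b, y, rcond=None)

new_student = np.array([1.0, 2.5, 3.0])  # [intercept term, study time, absences]
print(w, new_student @ w)
```

The prediction is a weighted sum of the features plus an intercept, which is the same form `LinearRegression` fits in the cells above.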