
In [3]:
import pandas as pd
df = pd.read_csv('/content/drive/MyDrive/datasets/student-data.csv')
df.head()

,school,sex,age,address,famsize,Pstatus,Medu,Fedu,Mjob,Fjob,...,internet,romantic,famrel,freetime,goout,Dalc,Walc,health,absences,passed
0,GP,F,18,U,GT3,A,4,4,at_home,teacher,...,no,no,4,3,4,1,1,3,6,no
1,GP,F,17,U,GT3,T,1,1,at_home,other,...,yes,no,5,3,3,1,1,3,4,no
2,GP,F,15,U,LE3,T,1,1,at_home,other,...,yes,no,4,3,2,2,3,3,10,yes
3,GP,F,15,U,GT3,T,4,2,health,services,...,yes,yes,3,2,2,1,1,5,2,yes
4,GP,F,16,U,GT3,T,3,3,other,other,...,no,no,4,3,2,1,2,5,4,yes


Attributes for student-data.csv:

1. school - student's school (binary: "GP" or "MS")
2. sex - student's sex (binary: "F" - female or "M" - male)
3. age - student's age (numeric: from 15 to 22)
4. address - student's home address type (binary: "U" - urban or "R" - rural)
5. famsize - family size (binary: "LE3" - less than or equal to 3 or "GT3" - greater than 3)
6. Pstatus - parent's cohabitation status (binary: "T" - living together or "A" - apart)
7. Medu - mother's education (numeric: 0 - none, 1 - primary education (4th grade), 2 – 5th to 9th grade, 3 – secondary education or 4 – higher education)
8. Fedu - father's education (numeric: 0 - none, 1 - primary education (4th grade), 2 – 5th to 9th grade, 3 – secondary education or 4 – higher education)
9. Mjob - mother's job (nominal: "teacher", "health" care related, civil "services" (e.g. administrative or police), "at_home" or "other")
10. Fjob - father's job (nominal: "teacher", "health" care related, civil "services" (e.g. administrative or police), "at_home" or "other")
11. reason - reason to choose this school (nominal: close to "home", school "reputation", "course" preference or "other")
12. guardian - student's guardian (nominal: "mother", "father" or "other")
13. traveltime - home to school travel time (numeric: 1 - <15 min., 2 - 15 to 30 min., 3 - 30 min. to 1 hour, or 4 - >1 hour)
14. studytime - weekly study time (numeric: 1 - <2 hours, 2 - 2 to 5 hours, 3 - 5 to 10 hours, or 4 - >10 hours)
15. failures - number of past class failures (numeric: n if 1<=n<3, else 4)
16. schoolsup - extra educational support (binary: yes or no)
17. famsup - family educational support (binary: yes or no)
18. paid - extra paid classes within the course subject (Math or Portuguese) (binary: yes or no)
19. activities - extra-curricular activities (binary: yes or no)
20. nursery - attended nursery school (binary: yes or no)
21. higher - wants to take higher education (binary: yes or no)
22. internet - Internet access at home (binary: yes or no)
23. romantic - with a romantic relationship (binary: yes or no)
24. famrel - quality of family relationships (numeric: from 1 - very bad to 5 - excellent)
25. freetime - free time after school (numeric: from 1 - very low to 5 - very high)
26. goout - going out with friends (numeric: from 1 - very low to 5 - very high)
27. Dalc - workday alcohol consumption (numeric: from 1 - very low to 5 - very high)
28. Walc - weekend alcohol consumption (numeric: from 1 - very low to 5 - very high)
29. health - current health status (numeric: from 1 - very bad to 5 - very good)
30. absences - number of school absences (numeric: from 0 to 93)
31. passed - did the student pass the final exam (binary: yes or no)

Loading the student data from a CSV file using the pandas library
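After loading, a quick sanity check of the frame's shape and of the balance of the target column can be useful. The sketch below is only an illustration: instead of the real file it reads a small inline CSV built from a few columns of the five rows shown in the df.head() output, and the names sample_csv and sample are hypothetical.

```python
import io
import pandas as pd

# Inline stand-in for student-data.csv: five rows and a handful of columns
# copied from the df.head() output (the real file has more of both).
sample_csv = io.StringIO(
    "school,sex,age,absences,passed\n"
    "GP,F,18,6,no\n"
    "GP,F,17,4,no\n"
    "GP,F,15,10,yes\n"
    "GP,F,15,2,yes\n"
    "GP,F,16,4,yes\n"
)
sample = pd.read_csv(sample_csv)

print(sample.shape)                     # (rows, columns)
print(sample["passed"].value_counts())  # how many "yes" vs "no" labels
```

On the full dataset the same two calls show how many students there are and whether one class dominates the target.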

Removing columns from the dataframe (df) that we consider irrelevant to the prediction

In [4]:
df = df.drop(['sex', 'age', 'address', 'famsize', 'Pstatus', 'Mjob', 'Fjob', 'romantic', 'famrel', 'Dalc', 'Walc', 'nursery', 'school', 'reason', 'guardian'], axis='columns')
df.head()

,Medu,Fedu,traveltime,studytime,failures,schoolsup,famsup,paid,activities,higher,internet,freetime,goout,health,absences,passed
0,4,4,2,2,0,yes,no,no,no,yes,no,3,4,3,6,no
1,1,1,1,2,0,no,yes,no,no,yes,yes,3,3,3,4,no
2,1,1,1,2,3,yes,no,yes,no,yes,yes,3,2,3,10,yes
3,4,2,1,3,0,no,yes,yes,yes,yes,yes,2,2,5,2,yes
4,3,3,1,2,0,no,yes,yes,no,yes,no,3,2,5,4,yes
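The same kind of column removal can also be written with the columns= keyword of DataFrame.drop, which some readers find clearer than axis='columns'. A minimal sketch on a hypothetical three-column frame (demo and trimmed are illustrative names; the values come from the first rows shown above):

```python
import pandas as pd

demo = pd.DataFrame({
    "sex": ["F", "F", "F"],
    "Medu": [4, 1, 1],
    "passed": ["no", "no", "yes"],
})

# drop returns a new frame; demo itself is left unchanged
trimmed = demo.drop(columns=["sex"])

print(list(trimmed.columns))
```

Because drop does not modify the frame in place, the notebook assigns the result back to df.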


Converting string data into numeric form using LabelEncoder (this is label encoding, not standardisation).
Many columns, such as schoolsup and higher, hold the strings "yes" and "no", so we convert them to integers that the model can work with.

In [5]:
from sklearn.preprocessing import LabelEncoder

# One encoder per yes/no column, so each mapping can be inspected later
le_schoolsup = LabelEncoder()
le_famsup = LabelEncoder()
le_paid = LabelEncoder()
le_activities = LabelEncoder()
le_higher = LabelEncoder()
le_internet = LabelEncoder()
le_passed = LabelEncoder()


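Replacing each string column in our original dataframe (df) with its numeric encoding. LabelEncoder numbers the labels in sorted order, so "no" becomes 0 and "yes" becomes 1.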

In [6]:
df['schoolsup'] = le_schoolsup.fit_transform(df['schoolsup'])
df['famsup'] = le_famsup.fit_transform(df['famsup'])
df['paid'] = le_paid.fit_transform(df['paid'])
df['higher'] = le_higher.fit_transform(df['higher'])
df['internet'] = le_internet.fit_transform(df['internet'])
df['activities'] = le_activities.fit_transform(df['activities'])
df['passed'] = le_passed.fit_transform(df['passed'])
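The seven near-identical lines above could also be written as a loop that keeps the fitted encoders in a dictionary. The sketch below is one possible arrangement, not the notebook's method: demo and encoders are hypothetical names, and the frame holds only the first five rows of three columns from the output shown earlier.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# First five rows of three yes/no columns, copied from the earlier output
demo = pd.DataFrame({
    "schoolsup": ["yes", "no", "yes", "no", "no"],
    "famsup": ["no", "yes", "no", "yes", "yes"],
    "passed": ["no", "no", "yes", "yes", "yes"],
})

encoders = {}  # column name -> fitted LabelEncoder
for col in demo.columns:
    encoders[col] = LabelEncoder()
    demo[col] = encoders[col].fit_transform(demo[col])

print(demo)
print(list(encoders["passed"].classes_))             # labels behind codes 0, 1
print(encoders["passed"].inverse_transform([0, 1]))  # codes back to strings
```

Keeping the fitted encoders makes it possible to translate a predicted 0 or 1 back into "no" or "yes" with inverse_transform.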

Separating the target values from the input information in our dataframe (df). The passed column is the target: based on the input information, we want to predict whether a student will pass or not.

In [7]:
inputs = df.drop(['passed'],axis='columns')
target = df['passed']

Printing the first 5 rows of inputs

In [8]:
inputs.head()

,Medu,Fedu,traveltime,studytime,failures,schoolsup,famsup,paid,activities,higher,internet,freetime,goout,health,absences
0,4,4,2,2,0,1,0,0,0,1,0,3,4,3,6
1,1,1,1,2,0,0,1,0,0,1,1,3,3,3,4
2,1,1,1,2,3,1,0,1,0,1,1,3,2,3,10
3,4,2,1,3,0,0,1,1,1,1,1,2,2,5,2
4,3,3,1,2,0,0,1,1,0,1,0,3,2,5,4


Printing the first 5 rows of target

In [9]:
target.head()

0    0
1    0
2    1
3    1
4    1
Name: passed, dtype: int64

We divide the whole dataset into training and testing sets. The training set is used to train our model, and the testing set is used to check its accuracy. Because no test_size is passed, train_test_split falls back to its default and holds out 25% of the rows for testing.

In [10]:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(inputs, target, random_state=20)
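To see what the split sizes look like, the sketch below runs train_test_split on stand-in arrays. The row count of 395 is an assumption (it is not stated in this notebook, but it is consistent with the 99 test predictions printed later), and X_demo and y_demo are hypothetical placeholders for inputs and target.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data: 395 rows is an assumed dataset size, values are arbitrary
X_demo = np.arange(395 * 2).reshape(395, 2)
y_demo = np.arange(395) % 2

# No test_size given: scikit-learn holds out 25% of the rows by default
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=20)
print(len(X_tr), len(X_te))

# Passing test_size explicitly gives the same sizes and documents the intent
X_tr2, X_te2, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=20
)
print(len(X_tr2), len(X_te2))
```

Fixing random_state, as the notebook does, makes the same rows land in the same set on every run.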

Printing the first 5 rows of x_train, y_train, x_test and y_test

In [11]:
x_train.head()

,Medu,Fedu,traveltime,studytime,failures,schoolsup,famsup,paid,activities,higher,internet,freetime,goout,health,absences
326,3,3,1,1,0,0,0,0,1,1,1,3,5,5,3
200,4,3,1,2,0,0,1,0,1,1,1,3,5,2,2
53,4,4,1,1,0,1,1,1,0,1,1,3,4,5,0
147,1,2,1,2,0,0,1,1,0,1,1,3,2,5,2
16,4,4,1,3,0,0,1,1,1,1,1,2,3,2,6


In [20]:
y_train.head()

326    1
200    1
53     1
147    1
16     1
Name: passed, dtype: int64

In [21]:
x_test.head()

,Medu,Fedu,traveltime,studytime,failures,schoolsup,famsup,paid,activities,higher,internet,freetime,goout,health,absences
10,4,4,1,2,0,0,1,1,0,1,1,3,3,2,0
261,4,3,1,2,0,0,1,1,0,1,1,3,2,3,2
353,1,1,3,1,1,0,1,0,0,1,1,4,4,5,4
276,3,2,2,2,0,0,0,0,0,0,1,1,1,5,75
17,3,3,3,2,0,1,1,0,1,1,0,3,2,4,4


In [22]:
y_test.head()

10     0
261    0
353    0
276    0
17     1
Name: passed, dtype: int64

Since this is a classification problem, we use a decision tree model.

In [12]:
from sklearn import tree

Creating a decision tree model and training it on the training set



In [13]:
model = tree.DecisionTreeClassifier()
model.fit(x_train,y_train)
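A decision tree with no depth limit can fit its training data perfectly, so comparing training and testing accuracy is a useful check. The sketch below illustrates this on synthetic data from make_classification, not on the student data; the max_depth=3 setting and all variable names are assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in with 15 features, like the notebook's inputs
X, y = make_classification(n_samples=400, n_features=15, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=20)

# random_state makes the fitted tree reproducible between runs
full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

for name, clf in [("unconstrained", full), ("max_depth=3", shallow)]:
    print(name, clf.get_depth(), clf.score(X_tr, y_tr), clf.score(X_te, y_te))
```

A large gap between the two scores of the unconstrained tree suggests it is memorising the training rows; whether a depth limit helps on the student data would have to be checked there.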

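Checking the accuracy of our model on the testing set. Because DecisionTreeClassifier is created without a random_state, the score may differ slightly from run to run.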

In [14]:
model.score(x_test,y_test)

0.6868686868686869
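For a classifier, score returns plain accuracy: the fraction of test rows whose predicted label equals the true label. The sketch below spells that out with a hypothetical helper named accuracy and made-up example lists, and checks that the value reported above is what 68 correct predictions out of 99 rows would give.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Made-up example: three of four predictions are right
print(accuracy([0, 0, 1, 1], [0, 1, 1, 1]))

# 68 correct out of 99 matches the score reported by model.score
print(68 / 99)
```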

Predicting the label for every row of x_test

In [15]:
model.predict(x_test)

array([1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1,
       1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1,
       1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1,
       1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
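Accuracy alone does not show which kind of mistake the model makes. A confusion matrix does, and the sketch below builds one from the only labels visible in this notebook: the five true values printed by y_test.head() and the first five predictions in the array above. Five rows are far too few to draw conclusions from; the point is only to show the layout.

```python
from sklearn.metrics import confusion_matrix

y_true_head = [0, 0, 0, 0, 1]  # from y_test.head()
y_pred_head = [1, 1, 0, 1, 1]  # first five entries of model.predict(x_test)

# Rows are true labels (0, 1); columns are predicted labels (0, 1)
cm = confusion_matrix(y_true_head, y_pred_head)
print(cm)
```

On the full test set the call would be confusion_matrix(y_test, model.predict(x_test)).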

In [16]:
x_test.head()

,Medu,Fedu,traveltime,studytime,failures,schoolsup,famsup,paid,activities,higher,internet,freetime,goout,health,absences
10,4,4,1,2,0,0,1,1,0,1,1,3,3,2,0
261,4,3,1,2,0,0,1,1,0,1,1,3,2,3,2
353,1,1,3,1,1,0,1,0,0,1,1,4,4,5,4
276,3,2,2,2,0,0,0,0,0,0,1,1,1,5,75
17,3,3,3,2,0,1,1,0,1,1,0,3,2,4,4


Predicting whether a student will pass based on the given information: mother's education = 4, father's education = 4, traveltime = 1, studytime = 2, failures = 0, schoolsup = 0, famsup = 1, paid = 1, activities = 0, higher = 1, internet = 1, freetime = 3, goout = 3, health = 2, absences = 0. These values match row 10 of x_test, whose actual label in y_test is 0 (did not pass).

In [18]:
model.predict([[4,4,1,2,0,0,1,1,0,1,1,3,3,2,0]])



array([1])
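Passing a bare list of 15 numbers relies on getting the column order exactly right. One way to make the call less error-prone is to build a one-row DataFrame with named columns. In the sketch below, demo_model is a stand-in tree trained on random integers so that the example runs on its own; it is not the notebook's model, and its prediction carries no meaning.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

feature_names = [
    "Medu", "Fedu", "traveltime", "studytime", "failures", "schoolsup",
    "famsup", "paid", "activities", "higher", "internet", "freetime",
    "goout", "health", "absences",
]

# Stand-in training data (random integers), only to have a fitted model
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.integers(0, 5, size=(40, 15)), columns=feature_names)
y_demo = rng.integers(0, 2, size=40)
demo_model = DecisionTreeClassifier(random_state=0).fit(X_demo, y_demo)

# The student from the cell above, with every value tied to its column name
student = pd.DataFrame([{
    "Medu": 4, "Fedu": 4, "traveltime": 1, "studytime": 2, "failures": 0,
    "schoolsup": 0, "famsup": 1, "paid": 1, "activities": 0, "higher": 1,
    "internet": 1, "freetime": 3, "goout": 3, "health": 2, "absences": 0,
}])[feature_names]  # reorder to match the training columns

pred = demo_model.predict(student)
print(pred)
```

With the notebook's own model, the equivalent call would be model.predict(student).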