# Individualized Progress Tracking

This notebook is part of BCA / Sem 5 / Machine Learning / Project / Individualized Progress Tracking

#### Objective
 
Develop a tool that allows users to track their progress in various learning modules. Include visualizations and statistics to help users understand their strengths and areas that need improvement.

Dataset: https://www.statlect.com/datasets/SimpleR-Pre-loaded-Student-performance.csv

### Setting up the environment

```bash
# after creating a virtual environment and activating it
pip install -r requirements.txt
# also rename the dataset to the file name the notebook reads
mv SimpleR-Pre-loaded-Student-performance.csv StudentsPerformance.csv
```


In [37]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [38]:
df=pd.read_csv('StudentsPerformance.csv')

In [39]:
df.head()

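|   | gender | race/ethnicity | parental level of education | lunch | test preparation course | math score | reading score | writing score |
|---|--------|----------------|-----------------------------|-------|-------------------------|------------|---------------|---------------|
| 0 | female | group B | bachelor's degree | standard | none | 72 | 72 | 74 |
| 1 | female | group C | some college | standard | completed | 69 | 90 | 88 |
| 2 | female | group B | master's degree | standard | none | 90 | 95 | 93 |
| 3 | male | group A | associate's degree | free/reduced | none | 47 | 57 | 44 |
| 4 | male | group C | some college | standard | none | 76 | 78 | 75 |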


In [40]:
print(df.shape)
df.info()

(1000, 8)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 8 columns):
 #   Column                       Non-Null Count  Dtype 
---  ------                       --------------  ----- 
 0   gender                       1000 non-null   object
 1   race/ethnicity               1000 non-null   object
 2   parental level of education  1000 non-null   object
 3   lunch                        1000 non-null   object
 4   test preparation course      1000 non-null   object
 5   math score                   1000 non-null   int64 
 6   reading score                1000 non-null   int64 
 7   writing score                1000 non-null   int64 
dtypes: int64(3), object(5)
memory usage: 62.6+ KB


##### `df.info()` reports a memory usage of about 62.6 KB, which is not bad, but we can do better.

- Scores never go below 0 or above 100, so they fit in `int8` (range -128 to 127) instead of `int64`.

- `gender` holds a small set of repeated labels, so it is converted to `category` as well.

- `race/ethnicity` takes one of 5 values (`'group A', 'group B', 'group C', 'group D', 'group E'`), so we can use `category` instead of `object`.

- `parental level of education` takes one of 6 values (`["bachelor's degree", 'some college', "master's degree",
       "associate's degree", 'high school', 'some high school']`), so we can use `category` instead of `object`.

- `lunch` takes one of 2 values (`['standard', 'free/reduced']`), so we can use `category` instead of `object` there too.
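The reasoning in the list above can be checked before any column is cast. The following is a minimal sketch, not part of the original notebook: the helper names `fits_in_int8` and `is_low_cardinality`, the `max_unique` threshold, and the small `sample` frame are all illustrative assumptions.

```python
import numpy as np
import pandas as pd


def fits_in_int8(series: pd.Series) -> bool:
    """Return True if every value lies inside the int8 range (-128..127)."""
    info = np.iinfo(np.int8)
    return bool(series.min() >= info.min and series.max() <= info.max)


def is_low_cardinality(series: pd.Series, max_unique: int = 10) -> bool:
    """Return True if the column has few enough distinct values to suit 'category'.

    The threshold of 10 is an arbitrary assumption for this sketch.
    """
    return series.nunique() <= max_unique


# Tiny hand-written sample in the same shape as the dataset (hypothetical data)
sample = pd.DataFrame({
    "math score": [72, 69, 90, 47, 76],
    "lunch": ["standard", "standard", "standard", "free/reduced", "standard"],
})

print(fits_in_int8(sample["math score"]))
print(is_low_cardinality(sample["lunch"]))
```

Running the same two checks on the real `df` before casting would guard against a silent overflow if a score ever fell outside the expected 0 to 100 range.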


In [51]:
# Scores lie in 0..100, so int8 (-128..127) is wide enough
df['math score'] = df['math score'].astype(np.int8)
df['reading score'] = df['reading score'].astype(np.int8)
df['writing score'] = df['writing score'].astype(np.int8)
# Normalize any 'F'/'M' labels first, then convert to category
df['gender'] = df['gender'].replace({'F': 'female', 'M': 'male'}).astype('category')
# Low-cardinality text columns become categoricals
df['race/ethnicity'] = df['race/ethnicity'].astype('category')
df['parental level of education'] = df['parental level of education'].astype('category')
df['lunch'] = df['lunch'].astype('category')
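As an alternative to casting after loading, the compact dtypes could be declared when the file is read. This is only a sketch under assumptions: the two CSV rows are inlined so the example runs without the dataset, and the names `csv_text`, `compact_dtypes`, and `compact_df` are illustrative rather than taken from the notebook.

```python
import io

import numpy as np
import pandas as pd

# Two rows in the same layout as the dataset, inlined so the sketch runs on its own
csv_text = io.StringIO(
    "gender,race/ethnicity,parental level of education,lunch,"
    "test preparation course,math score,reading score,writing score\n"
    "female,group B,bachelor's degree,standard,none,72,72,74\n"
    "male,group A,associate's degree,free/reduced,none,47,57,44\n"
)

# Same dtype choices as the casts in the notebook, passed to read_csv up front
compact_dtypes = {
    "gender": "category",
    "race/ethnicity": "category",
    "parental level of education": "category",
    "lunch": "category",
    "math score": np.int8,
    "reading score": np.int8,
    "writing score": np.int8,
}

compact_df = pd.read_csv(csv_text, dtype=compact_dtypes)
print(compact_df.dtypes)
```

With the real file, `csv_text` would be replaced by the path to the CSV. Note that this variant skips the label normalization step applied to `gender`.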


In [52]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 8 columns):
 #   Column                       Non-Null Count  Dtype   
---  ------                       --------------  -----   
 0   gender                       1000 non-null   category
 1   race/ethnicity               1000 non-null   category
 2   parental level of education  1000 non-null   category
 3   lunch                        1000 non-null   category
 4   test preparation course      1000 non-null   object  
 5   math score                   1000 non-null   int8    
 6   reading score                1000 non-null   int8    
 7   writing score                1000 non-null   int8    
dtypes: category(4), int8(3), object(1)
memory usage: 15.4+ KB


##### Memory usage is now just over 15 KB (15.4 KB, down from 62.6 KB), roughly a **75% reduction**
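The percentage follows directly from the two figures printed by `df.info()`. The sketch below reproduces that arithmetic and then measures the same effect on a synthetic frame, since the `+` in the `df.info()` output means `object` columns are not counted in full. The helper `memory_kb` and the random data are assumptions made for illustration, so the measured numbers will differ from the notebook's.

```python
import numpy as np
import pandas as pd

# Arithmetic on the figures reported by df.info(): 62.6 KB before, 15.4 KB after
reported_reduction = (62.6 - 15.4) / 62.6
print(f"reported reduction: {reported_reduction:.1%}")


def memory_kb(frame: pd.DataFrame) -> float:
    """Total memory in KB, counting the actual size of string objects (deep=True)."""
    return frame.memory_usage(deep=True).sum() / 1024


# Synthetic stand-in for the dataset (hypothetical values, fixed seed)
rng = np.random.default_rng(0)
n_rows = 1000
before = pd.DataFrame({
    "lunch": rng.choice(["standard", "free/reduced"], size=n_rows),
    "math score": rng.integers(0, 101, size=n_rows),
})

after = before.copy()
after["lunch"] = after["lunch"].astype("category")
after["math score"] = after["math score"].astype(np.int8)

measured_reduction = 1 - memory_kb(after) / memory_kb(before)
print(f"measured reduction on synthetic data: {measured_reduction:.1%}")
```

Calling `memory_kb(df)` on the real frame before and after the casts would give a deep measurement to compare against the shallow figures above.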

**Let's look for missing values now**

In [59]:
df.isnull().sum()

gender                         0
race/ethnicity                 0
parental level of education    0
lunch                          0
test preparation course        0
math score                     0
reading score                  0
writing score                  0
dtype: int64

> No missing values, which is great!
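If a later version of the data did contain gaps, the raw counts are easier to read alongside percentages. A minimal sketch, assuming a hypothetical helper `missing_summary` and a small made-up `demo` frame with deliberate gaps:

```python
import numpy as np
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Per-column count and percentage of missing values."""
    counts = frame.isnull().sum()
    return pd.DataFrame({
        "missing": counts,
        "percent": 100 * counts / len(frame),
    })


# Made-up frame with gaps, only to show what the summary looks like
demo = pd.DataFrame({
    "math score": [72, np.nan, 90, 47],
    "lunch": ["standard", "standard", None, None],
})

summary = missing_summary(demo)
print(summary)
```

For the dataset in this notebook, `missing_summary(df)` would show zeros in both columns for every row, matching the output above.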