# Mock Analysis

This is a mock analysis of the ASUU Strike Project. Its purpose is to work out how to handle the survey data and to optimise the analysis process.

### Import packages

```python
import pandas as pd
import numpy as np
import seaborn as sns
```

### Load data into dataframe

```python
sample_df = pd.read_csv("sample.csv")

sample_df.head()
```

| | Timestamp | Are you a student of UNILAG? | Current level of study | How old are you? | What is your gender? | Marital Status | What's your faculty? | What is your department? | How has the just concluded strike affected you ? | Did you acquire any skill relevant to your course of study? | ... | How would you rate your preparation for the exams ? [After Strike] | How would you rate the quality of your lectures? [Before Strike] | How would you rate the quality of your lectures? [After Strike] | How confident are you in the significance of your academics [Before Strike] | How confident are you in the significance of your academics [After Strike] | What was your CGPA before the strike? | What is your current CGPA? Don't be shy | Were you employed during the strike? | Did you quit the job when the strike was called-off | Would you like to be contacted for follow-up questions? If so, kindly drop your email. Only drop your email if you're comfortable doing so! |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2/9/2023 10:18:58 | Yes | 400level | 22–24 | Female | Single | Management Science | Finance | 1. It has increased the amount of years I’m to... | Yes | ... | Poor | Neutral | Poor | Good | Neutral | 4.3 | 4.09 | Yes | No | |
| 1 | 2/9/2023 10:39:25 | Yes | 300level | 19–21 | Male | Single | Science | Statistics | Positively | Yes | ... | Good | Good | Good | Good | Good | 4.51 | 4.51 | Yes | No | |
| 2 | 2/9/2023 10:59:26 | Yes | 300level | 25–27 | Male | Single | Science | Botany | It has affected my reading culture and habits | Yes | ... | Neutral | Neutral | Poor | Good | Neutral | 4.09 | 4.08 | No | No | |
| 3 | 2/9/2023 23:01:53 | Yes | 400level | 22–24 | Male | Single | Science | Geophysics | NIL | No | ... | Neutral | Neutral | Poor | Neutral | Poor | 3.88 | 3.56 | No | No | |
| 4 | 2/9/2023 23:03:22 | Yes | 200level | 16–18 | Female | Married | Engineering | Civil Engineering | Made me a millionaire | No | ... | Neutral | Neutral | Neutral | Neutral | Neutral | 2.96 | 2.02 | Yes | No | |


```python
sample_df.info()
```

```text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14 entries, 0 to 13
Data columns (total 22 columns):
 #   Column                          Non-Null Count  Dtype
---  ------                          --------------  -----
 0   Timestamp                       14 non-null     object
 1   Are you a student of UNILAG?    14 non-null     object
 2   Current level of study \n       14 non-null     object
 3   How old are you?
```

## Clean data

Drop all non-UNILAG students

```python
# Keep only respondents who say they are UNILAG students
sample_df = sample_df[sample_df["Are you a student of UNILAG?"] == "Yes"].copy()
```
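The filter above matches the exact string `"Yes"`. The sample only contains that spelling, but free-text survey answers can pick up stray whitespace or different casing. Assuming that might happen in the full export, a slightly more forgiving version could look like this; `keep_unilag_students` is a hypothetical helper name, not part of the original notebook.

```python
import pandas as pd


def keep_unilag_students(
    df: pd.DataFrame, col: str = "Are you a student of UNILAG?"
) -> pd.DataFrame:
    """Keep rows whose answer is 'yes', ignoring case and surrounding whitespace."""
    # astype(str) turns missing answers into "nan", which is then excluded
    answers = df[col].astype(str).str.strip().str.lower()
    # .copy() returns an independent frame, so later edits don't raise SettingWithCopyWarning
    return df[answers == "yes"].copy()


demo = pd.DataFrame({"Are you a student of UNILAG?": ["Yes", " yes", "No", "YES "]})
print(len(keep_unilag_students(demo)))  # 3
```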

Drop columns that are not needed for the analysis

```python
# Open-ended answers, the timestamp, the UNILAG flag (constant after filtering)
# and contact emails are not used in this analysis
cols_to_drop = [
    "Timestamp",
    "Are you a student of UNILAG?",
    "How has the just concluded strike affected you ?",
    "What challenges did you experience during resumption after the long strike?",
    "Would you like to be contacted for follow-up questions? If so, kindly drop your email.\nOnly drop your email if you're comfortable doing so!",
]
sample_df = sample_df.drop(columns=cols_to_drop)
sample_df.info()
```

```text
<class 'pandas.core.frame.DataFrame'>
Int64Index: 13 entries, 0 to 13
Data columns (total 17 columns):
 #   Column                        Non-Null Count  Dtype
---  ------                        --------------  -----
 0   Current level of study \n     13 non-null     object
 1   How old are you?              13 non-null     object
 2   What is your gender?          13 non-null     object
 3   Marital  Status               13 non-null     object
 4   What's your faculty?          13 non-null     object
 5   What is your department?      13 non-null     object
 6   Did you acquire any skill relevant to your c
```
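The `info()` output shows headers with hidden whitespace: `Current level of study` ends in a newline, and `Marital  Status` appears to contain a double space. Exact-match lookups such as `cols_to_drop` break if the form wording shifts even slightly. One option, sketched here with a hypothetical `normalise_headers` helper, is to collapse all whitespace in the headers right after loading. If this were adopted, the strings in `cols_to_drop` would need the same treatment (for example, the `\n` in the email question would become a single space).

```python
import re

import pandas as pd


def normalise_headers(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse every run of whitespace (spaces, tabs, newlines) in headers to one space."""
    out = df.copy()
    out.columns = [re.sub(r"\s+", " ", str(c)).strip() for c in out.columns]
    return out


demo = pd.DataFrame(columns=["Current level of study \n", "Marital  Status"])
print(list(normalise_headers(demo).columns))
# ['Current level of study', 'Marital Status']
```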

Rename columns

```python
# Short names, in the same order as the 17 columns left after the drop step
new_col_names = [
    "level", "age", "gender", "marital_status", "faculty", "department", "skill",
    "prep_before", "prep_after",
    "quality_before", "quality_after",
    "confidence_before", "confidence_after",
    "cgpa_before", "cgpa_after", "employment", "quit_job",
]

sample_df.columns = new_col_names

sample_df.head()
```

| | level | age | gender | marital_status | faculty | department | skill | prep_before | prep_after | quality_before | quality_after | confidence_before | confidence_after | cgpa_before | cgpa_after | employment | quit_job |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 400level | 22–24 | Female | Single | Management Science | Finance | Yes | Neutral | Poor | Neutral | Poor | Good | Neutral | 4.3 | 4.09 | Yes | No |
| 1 | 300level | 19–21 | Male | Single | Science | Statistics | Yes | Good | Good | Good | Good | Good | Good | 4.51 | 4.51 | Yes | No |
| 2 | 300level | 25–27 | Male | Single | Science | Botany | Yes | Good | Neutral | Neutral | Poor | Good | Neutral | 4.09 | 4.08 | No | No |
| 3 | 400level | 22–24 | Male | Single | Science | Geophysics | No | Poor | Neutral | Neutral | Poor | Neutral | Poor | 3.88 | 3.56 | No | No |
| 4 | 200level | 16–18 | Female | Married | Engineering | Civil Engineering | No | Neutral | Neutral | Neutral | Neutral | Neutral | Neutral | 2.96 | 2.02 | Yes | No |
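Renaming by position silently mislabels data if the column order ever changes (for example, if the form gains a question). A small guard, sketched here with a hypothetical `rename_with_audit` helper, checks the column count and prints each old/new pair so the mapping can be eyeballed before moving on.

```python
import pandas as pd


def rename_with_audit(df: pd.DataFrame, new_names: list[str]) -> pd.DataFrame:
    """Rename columns by position and print each old -> new pair for checking."""
    if len(df.columns) != len(new_names):
        raise ValueError(f"expected {len(new_names)} columns, got {len(df.columns)}")
    for old, new in zip(df.columns, new_names):
        # repr() makes hidden characters such as a trailing "\n" visible
        print(f"{new:<18} <- {old!r}")
    return df.set_axis(new_names, axis=1)


demo = pd.DataFrame(columns=["What is your gender?", "Marital  Status"])
renamed = rename_with_audit(demo, ["gender", "marital_status"])
```

In the notebook this would be called as `rename_with_audit(sample_df, new_col_names)` in place of the direct assignment to `sample_df.columns`.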


**Remove Outliers**

Outliers include all 100-level students, since they have no pre-strike CGPA to compare against. Some students may also have entered invalid CGPA values, such as 0, so those rows are dropped as well.

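```python
# Remove 100-level students (no pre-strike CGPA to compare against)
sample_df = sample_df[sample_df["level"] != "100level"]

# Remove invalid CGPA entries (e.g. CGPA = 0); .copy() avoids SettingWithCopyWarning later
sample_df = sample_df[
    (sample_df["cgpa_before"] > 0) & (sample_df["cgpa_after"] > 0)
].copy()
sample_df.describe()
```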

| | cgpa_before | cgpa_after |
|---|---|---|
| count | 11.0 | 11.0 |
| mean | 3.999091 | 3.820909 |
| std | 0.557018 | 0.807235 |
| min | 2.96 | 2.02 |
| 25% | 3.64 | 3.58 |
| 50% | 4.09 | 4.08 |
| 75% | 4.39 | 4.295 |
| max | 4.71 | 4.79 |
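The current filter only catches zeros. Other "wrong values" could include non-numeric entries or numbers outside the grading scale. The sketch below assumes CGPA is reported on a 5.0 scale (the sample maximum of 4.79 is consistent with this but does not confirm it); `valid_cgpa_rows` and `MAX_CGPA` are hypothetical names.

```python
import pandas as pd

# Assumption: CGPA is on a 5.0 scale; change this if the survey used another scale
MAX_CGPA = 5.0


def valid_cgpa_rows(
    df: pd.DataFrame, cols=("cgpa_before", "cgpa_after"), max_cgpa: float = MAX_CGPA
) -> pd.DataFrame:
    """Keep rows whose CGPA values are numeric and lie in (0, max_cgpa]."""
    mask = pd.Series(True, index=df.index)
    for col in cols:
        # Non-numeric entries such as "n/a" become NaN and fail both comparisons
        values = pd.to_numeric(df[col], errors="coerce")
        mask &= values.gt(0) & values.le(max_cgpa)
    return df[mask].copy()


demo = pd.DataFrame({
    "cgpa_before": [4.3, 0.0, 3.9, 45.1],
    "cgpa_after": [4.09, 3.5, "n/a", 4.2],
})
print(valid_cgpa_rows(demo).index.tolist())  # [0]
```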


Compute the target variable: the change in CGPA after the strike

```python
# Negative values mean the CGPA fell after the strike
sample_df["cgpa_change"] = sample_df["cgpa_after"] - sample_df["cgpa_before"]

sample_df.describe()
```

| | cgpa_before | cgpa_after | cgpa_change |
|---|---|---|---|
| count | 11.0 | 11.0 | 11.0 |
| mean | 3.999091 | 3.820909 | -0.178182 |
| std | 0.557018 | 0.807235 | 0.348965 |
| min | 2.96 | 2.02 | -0.94 |
| 25% | 3.64 | 3.58 | -0.31 |
| 50% | 4.09 | 4.08 | -0.01 |
| 75% | 4.39 | 4.295 | 0.03 |
| max | 4.71 | 4.79 | 0.28 |
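`cgpa_change` is the numeric target, but the before/after rating columns (`prep_*`, `quality_*`, `confidence_*`) could be turned into change scores the same way. This sketch assumes the answers are ordinal with `Poor < Neutral < Good`; only those three labels appear in the sample, so the real scale may have more levels. The score values and the `*_change` column names are illustrative assumptions.

```python
import pandas as pd

# Assumed ordering of the rating answers; unknown labels map to NaN
RATING_SCORES = {"Poor": 0, "Neutral": 1, "Good": 2}
RATED_ITEMS = ("prep", "quality", "confidence")


def add_rating_changes(
    df: pd.DataFrame, scores: dict = RATING_SCORES, items=RATED_ITEMS
) -> pd.DataFrame:
    """Add an <item>_change column (after minus before) for each rated item."""
    out = df.copy()
    for item in items:
        before = out[f"{item}_before"].map(scores)
        after = out[f"{item}_after"].map(scores)
        # Positive = rating improved after the strike; NaN = unrecognised label
        out[f"{item}_change"] = after - before
    return out


demo = pd.DataFrame({
    "prep_before": ["Neutral", "Good"], "prep_after": ["Poor", "Good"],
    "quality_before": ["Neutral", "Good"], "quality_after": ["Poor", "Good"],
    "confidence_before": ["Good", "Good"], "confidence_after": ["Neutral", "Good"],
})
print(add_rating_changes(demo)[["prep_change", "quality_change", "confidence_change"]])
```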


## Exploratory Data Analysis
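A natural first EDA step is to compare `cgpa_change` across groups such as `employment` or `level`. A minimal sketch, using a hypothetical `change_by_group` helper and the four `cgpa_change` values implied by the first rows of the `head()` output; the `count` column is kept because with only 11 usable rows some group means rest on very few students.

```python
import pandas as pd


def change_by_group(df: pd.DataFrame, group_col: str, target: str = "cgpa_change") -> pd.DataFrame:
    """Count, mean and median of the target for each value of group_col, sorted by mean."""
    return (
        df.groupby(group_col)[target]
        .agg(["count", "mean", "median"])
        .sort_values("mean")
    )


demo = pd.DataFrame({
    "employment": ["Yes", "Yes", "No", "No"],
    "cgpa_change": [-0.21, 0.0, -0.01, -0.32],
})
summary = change_by_group(demo, "employment")
print(summary)
```

Since seaborn is already imported, `sns.boxplot(data=sample_df, x="employment", y="cgpa_change")` would show the same comparison visually.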

## Build Model

### Baseline
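One common baseline for a numeric target like `cgpa_change` is to always predict the mean and measure the error any real model must beat. With only 11 usable rows, a leave-one-out estimate seems reasonable. Both that choice and mean absolute error as the metric are assumptions here, not decisions from the notebook; `loo_mean_baseline_mae` is a hypothetical name.

```python
import numpy as np


def loo_mean_baseline_mae(y) -> float:
    """Leave-one-out MAE of a model that predicts the mean of the other rows."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    if n < 2:
        raise ValueError("need at least two observations")
    # Mean of all rows except row i, computed for every i at once
    loo_means = (y.sum() - y) / (n - 1)
    return float(np.mean(np.abs(y - loo_means)))


print(loo_mean_baseline_mae([-0.21, 0.0, -0.01, -0.32, -0.94]))
```

On the notebook data this would be `loo_mean_baseline_mae(sample_df["cgpa_change"])`.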

### Iterate

### Evaluate

## Communicate