# DATA ANALYSIS PROJECT: ONLINE EDUCATION SYSTEM

## Determining the Problems

1. What are the demographics and backgrounds of the students who use the online education system?
2. How do students' performance scores vary across economic status and home location (urban vs. rural)?
3. How does the quality of internet facilities affect students' overall performance and satisfaction?

## Importing Libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import seaborn as sns

## Data Wrangling

### Loading the Data

In [2]:
df = pd.read_csv('Dataset/ONLINE EDUCATION SYSTEM REVIEW.csv')

In [3]:
df.head()

Unnamed: 0,Gender,Home Location,Level of Education,Age(Years),Number of Subjects,Device type used to attend classes,Economic status,Family size,Internet facility in your locality,Are you involved in any sports?,...,Time spent on social media (Hours),Interested in Gaming?,Have separate room for studying?,Engaged in group studies?,Average marks scored before pandemic in traditional classroom,Your interaction in online mode,Clearing doubts with faculties in online mode,Interested in?,Performance in online,Your level of satisfaction in Online Education
0,Male,Urban,Under Graduate,18,11,Laptop,Middle Class,4,5,No,...,1,No,No,No,91-100,1,1,Practical,6,Average
1,Male,Urban,Under Graduate,19,7,Laptop,Middle Class,4,1,Yes,...,1,Yes,Yes,No,91-100,1,1,Theory,3,Bad
2,Male,Rural,Under Graduate,18,5,Laptop,Middle Class,5,2,No,...,1,No,Yes,No,71-80,1,1,Both,6,Bad
3,Male,Urban,Under Graduate,18,5,Laptop,Middle Class,4,4,Yes,...,2,No,No,yes,91-100,1,2,Theory,4,Bad
4,Male,Rural,Under Graduate,18,5,Laptop,Middle Class,4,3,No,...,2,Yes,Yes,yes,81-90,3,3,Both,6,Average


### Assessing the Data

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1033 entries, 0 to 1032
Data columns (total 23 columns):
 #   Column                                                         Non-Null Count  Dtype 
---  ------                                                         --------------  ----- 
 0   Gender                                                         1033 non-null   object
 1   Home Location                                                  1033 non-null   object
 2   Level of Education                                             1033 non-null   object
 3   Age(Years)                                                     1033 non-null   int64 
 4   Number of Subjects                                             1033 non-null   int64 
 5   Device type used to attend classes                             1033 non-null   object
 6   Economic status                                                1033 non-null   object
 7   Family size                                                    1033 n

The "Average marks scored before pandemic in traditional classroom" column does not yet have the appropriate data type: it is stored as object, but it should be float.
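One way to confirm this observation is to check whether pandas treats the column as numeric and to list its distinct values. The sketch below is only illustrative: it uses a small hand-made frame (`sample`, a hypothetical stand-in whose values mirror the first five rows shown by `df.head()`) instead of the real CSV.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

col = "Average marks scored before pandemic in traditional classroom"

# Hypothetical sample mirroring the first five rows of the dataset
sample = pd.DataFrame({col: ["91-100", "91-100", "71-80", "91-100", "81-90"]})

# Range strings such as "91-100" are text, so the column is not numeric
marks_is_numeric = is_numeric_dtype(sample[col])
print(marks_is_numeric)

# Listing the distinct values shows the range format that has to be converted
distinct_ranges = sorted(sample[col].unique())
print(distinct_ranges)
```

On the real data, the same two checks could be run on `df[col]` directly.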

In [5]:
df.nunique()

Gender                                                            2
Home Location                                                     2
Level of Education                                                3
Age(Years)                                                       24
Number of Subjects                                               20
Device type used to attend classes                                3
Economic status                                                   3
Family size                                                       9
Internet facility in your locality                                5
Are you involved in any sports?                                   2
Do elderly people monitor you?                                    2
Study time (Hours)                                               10
Sleep time (Hours)                                               10
Time spent on social media (Hours)                               10
Interested in Gaming?                           

In [6]:
df.isna().sum()

Gender                                                           0
Home Location                                                    0
Level of Education                                               0
Age(Years)                                                       0
Number of Subjects                                               0
Device type used to attend classes                               0
Economic status                                                  0
Family size                                                      0
Internet facility in your locality                               0
Are you involved in any sports?                                  0
Do elderly people monitor you?                                   0
Study time (Hours)                                               0
Sleep time (Hours)                                               0
Time spent on social media (Hours)                               0
Interested in Gaming?                                         

In [7]:
print("Sum of duplicated rows: ", df.duplicated().sum())

Sum of duplicated rows:  0
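For readers unfamiliar with `DataFrame.duplicated()`, the following minimal sketch shows what the count above measures. The frame `toy` is a hypothetical example, not part of the dataset.

```python
import pandas as pd

# Hypothetical toy frame; the last row repeats the first one exactly
toy = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "Home Location": ["Urban", "Rural", "Urban"],
})

# duplicated() flags a row only if an identical row appeared earlier
dup_mask = toy.duplicated()
n_duplicates = int(dup_mask.sum())
print("Number of duplicated rows:", n_duplicates)

# keep=False flags every member of a duplicate group, which helps inspection
all_copies = toy[toy.duplicated(keep=False)]
print(all_copies)
```

A result of 0 on the survey data therefore means that no two respondents gave identical answers across all columns.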


In [8]:
df.describe()

Unnamed: 0,Age(Years),Number of Subjects,Family size,Internet facility in your locality,Study time (Hours),Sleep time (Hours),Time spent on social media (Hours),Your interaction in online mode,Clearing doubts with faculties in online mode,Performance in online
count,1033.0,1033.0,1033.0,1033.0,1033.0,1033.0,1033.0,1033.0,1033.0,1033.0
mean,19.798645,7.03485,4.413359,3.586641,4.325266,6.947725,2.63698,2.9303,2.833495,6.696031
std,3.199158,2.81034,1.23675,1.026063,2.134233,1.324039,1.859625,1.105387,1.163629,1.920048
min,9.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,1.0,2.0
25%,18.0,6.0,4.0,3.0,3.0,6.0,1.0,2.0,2.0,6.0
50%,19.0,7.0,4.0,4.0,4.0,7.0,2.0,3.0,3.0,7.0
75%,20.0,8.0,5.0,4.0,6.0,8.0,3.0,4.0,4.0,8.0
max,40.0,20.0,10.0,5.0,10.0,10.0,10.0,5.0,5.0,10.0


### Cleaning the Data

In [9]:
# Inspect the entries of "Average marks scored before pandemic in traditional classroom"
df['Average marks scored before pandemic in traditional classroom'].head()

0    91-100
1    91-100
2     71-80
3    91-100
4     81-90
Name: Average marks scored before pandemic in traditional classroom, dtype: object

- The data is still stored as object.
- Each value is a range written with a '-' (for example, 91-100) rather than a single number, so each range needs to be replaced with the mean of its two endpoints.

In [10]:
def calculate_mean(value):
    # Split the string into two parts on the '-' character
    values = value.split('-')
    # Convert the parts to integers and calculate the mean
    mean_value = (int(values[0]) + int(values[1])) / 2
    return mean_value

In [11]:
# Apply the function to that column
df['Average marks scored before pandemic in traditional classroom_mean'] = df['Average marks scored before pandemic in traditional classroom'].apply(calculate_mean)

In [12]:
# Inspect the result of the change
df['Average marks scored before pandemic in traditional classroom_mean'].head()

0    95.5
1    95.5
2    75.5
3    95.5
4    85.5
Name: Average marks scored before pandemic in traditional classroom_mean, dtype: float64
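The same conversion can also be written without a row-wise `apply`. The sketch below is an alternative, not the approach used in this notebook: it splits each range with `Series.str.split` and averages the two bounds. It assumes every entry has the form `low-high`, and `sample` is a hypothetical stand-in for the real column.

```python
import pandas as pd

col = "Average marks scored before pandemic in traditional classroom"

# Hypothetical sample mirroring the first five rows of the dataset
sample = pd.DataFrame({col: ["91-100", "91-100", "71-80", "91-100", "81-90"]})

# Split "low-high" into two columns, convert both to float, average row-wise
bounds = sample[col].str.split("-", expand=True).astype(float)
sample[col + "_mean"] = bounds.mean(axis=1)

midpoints = sample[col + "_mean"].tolist()
print(midpoints)
```

For the five sample rows this yields the same midpoints as `calculate_mean` (95.5, 95.5, 75.5, 95.5, 85.5).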

In [13]:
# Drop the column that is no longer needed
df.drop('Average marks scored before pandemic in traditional classroom', axis=1, inplace=True)

In [14]:
df.head()

Unnamed: 0,Gender,Home Location,Level of Education,Age(Years),Number of Subjects,Device type used to attend classes,Economic status,Family size,Internet facility in your locality,Are you involved in any sports?,...,Time spent on social media (Hours),Interested in Gaming?,Have separate room for studying?,Engaged in group studies?,Your interaction in online mode,Clearing doubts with faculties in online mode,Interested in?,Performance in online,Your level of satisfaction in Online Education,Average marks scored before pandemic in traditional classroom_mean
0,Male,Urban,Under Graduate,18,11,Laptop,Middle Class,4,5,No,...,1,No,No,No,1,1,Practical,6,Average,95.5
1,Male,Urban,Under Graduate,19,7,Laptop,Middle Class,4,1,Yes,...,1,Yes,Yes,No,1,1,Theory,3,Bad,95.5
2,Male,Rural,Under Graduate,18,5,Laptop,Middle Class,5,2,No,...,1,No,Yes,No,1,1,Both,6,Bad,75.5
3,Male,Urban,Under Graduate,18,5,Laptop,Middle Class,4,4,Yes,...,2,No,No,yes,1,2,Theory,4,Bad,95.5
4,Male,Rural,Under Graduate,18,5,Laptop,Middle Class,4,3,No,...,2,Yes,Yes,yes,3,3,Both,6,Average,85.5


## Exploratory Data Analysis (EDA)

What are the demographics and backgrounds of the students who use the online education system?

In [None]:
# Inspect 
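One possible starting point for the demographic question is to compute the share of respondents in each category and to cross-tabulate two background variables. This is a sketch under assumptions, not the notebook's actual analysis: the column names are taken from the dataset, but `sample` is a hypothetical five-row frame and the choice of `demographic_cols` is illustrative.

```python
import pandas as pd

# Hypothetical sample using column names from the dataset
sample = pd.DataFrame({
    "Gender": ["Male", "Male", "Male", "Male", "Male"],
    "Home Location": ["Urban", "Urban", "Rural", "Urban", "Rural"],
    "Level of Education": ["Under Graduate"] * 5,
    "Economic status": ["Middle Class"] * 5,
})

# Illustrative choice of background columns to summarise
demographic_cols = ["Gender", "Home Location", "Level of Education", "Economic status"]

# Share of respondents per category, one column at a time
shares = {c: sample[c].value_counts(normalize=True) for c in demographic_cols}
for share in shares.values():
    print(share.round(2), end="\n\n")

# Cross-tabulation of two background variables
location_by_status = pd.crosstab(sample["Home Location"], sample["Economic status"])
print(location_by_status)
```

Applied to the full `df`, the same calls would summarise all 1033 respondents.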

## Data Visualization

## Conclusion