# Educational Dataset Analysis

This repository contains an analysis of the Educational Dataset of Pakistan. The dataset provides information on various educational indicators and infrastructure in different provinces of Pakistan.

## Dataset Description

The dataset is in CSV format and consists of 580 rows and 51 columns. Each row represents a specific entry related to educational data, while each column represents a specific attribute or variable. Some of the important columns include:

% Boys Enrolled: The percentage of boys enrolled in schools.

% Complete Primary Schools: The percentage of primary schools that are considered complete.

% Girls Enrolled: The percentage of girls enrolled in schools.

% Primary Schools with single classroom: The percentage of primary schools with only one classroom.

% Primary Schools with single teacher: The percentage of primary schools with only one teacher.

All Four Facilities: The presence of all four facilities (boundary wall, building condition satisfactory, drinking water, and more).

Area (km²): The area in square kilometers.

Terrorist Attacks Affectees: The number of people affected by terrorist attacks.

Toilet: The availability of toilets in schools.

Total number of schools: The total number of schools.


## Analysis

The analysis of the dataset includes exploring the distribution of various variables, identifying patterns, and examining the relationship between different factors. It aims to provide insights into the educational landscape of Pakistan and highlight areas that require attention and improvement.

# Import Library

In [1]:
import pandas as pd

## Read Dataset

In [2]:
ed = pd.read_csv("Educational Dataset of Pakistan.csv")

In [30]:
ed.head()

Unnamed: 0,% Boys Enrolled,% Complete Primary Schools,% Girls Enrolled,% Primary Schools with single classroom,% Primary Schools with single teacher,All Four Facilities,Any One Facility,Any Three Facilities,Any Two Facilities,Area (km²),...,Primary Schools with single classroom,Primary Schools with single teacher,Province,Retention score,School infrastructure score,Terrorist Attacks Affectees,Toilet,Total number of schools,Year,Year on Year Change
0,,0.936599,,0.034582,0.028818,,16.20%,24.73%,24.95%,768.0,...,12.0,10.0,AJK,14.8,24.929178,5379,37.393768,601,2013,
1,,0.594203,,0.359903,0.045894,,20.21%,17.38%,15.43%,1516.0,...,149.0,19.0,AJK,75.2,28.883827,5379,20.728929,595,2013,9.704002
2,,0.939068,,0.021505,0.039427,,4.55%,16.31%,20.05%,,...,6.0,11.0,AJK,0.0,22.463768,5379,43.84058,379,2013,-17.313724
3,,0.827225,,0.041885,0.13089,,22.44%,7.09%,12.20%,600.0,...,8.0,25.0,AJK,0.0,13.048128,5379,26.203209,262,2013,-4.762458
4,,0.592348,,0.346966,0.060686,,23.06%,9.78%,13.60%,2162.0,...,263.0,46.0,AJK,73.2,20.184453,5379,14.888011,1036,2013,8.122142


## Information Of Dataset

In [4]:
ed.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 580 entries, 0 to 579
Data columns (total 51 columns):
 #   Column                                                                                Non-Null Count  Dtype  
---  ------                                                                                --------------  -----  
 0   % Boys Enrolled                                                                       579 non-null    object 
 1   % Complete Primary Schools                                                            576 non-null    float64
 2   % Girls Enrolled                                                                      579 non-null    object 
 3   % Primary Schools with single classroom                                               576 non-null    float64
 4   % Primary Schools with single teacher                                                 576 non-null    float64
 5   All Four Facilities                                                                  

In [5]:
ed.ndim

2

In [6]:
ed.shape

(580, 51)

## Null Values in Dataset

In [7]:
ed.isnull()

Unnamed: 0,% Boys Enrolled,% Complete Primary Schools,% Girls Enrolled,% Primary Schools with single classroom,% Primary Schools with single teacher,All Four Facilities,Any One Facility,Any Three Facilities,Any Two Facilities,Area (km²),...,Population,Primary Schools with single classroom,Primary Schools with single teacher,Province,Retention score,School infrastructure score,Terrorist Attacks Affectees,Toilet,Total number of schools,Year
0,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
1,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False,False,True,...,True,False,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
575,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
576,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
577,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
578,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False


## Sum of all Null Values

In [8]:
ed.isnull().sum(axis = 0)

% Boys Enrolled                                                                          1
% Complete Primary Schools                                                               4
% Girls Enrolled                                                                         1
% Primary Schools with single classroom                                                  4
% Primary Schools with single teacher                                                    4
All Four Facilities                                                                      4
Any One Facility                                                                         4
Any Three Facilities                                                                     4
Any Two Facilities                                                                       4
Area (km²)                                                                              68
Bomb Blasts Occurred                                                                     0

## Null Values in Area km²

In [9]:
ed["Area (km²)"].isnull()

0      False
1      False
2       True
3      False
4      False
       ...  
575    False
576    False
577    False
578    False
579    False
Name: Area (km²), Length: 580, dtype: bool

## Now Find Index oF Area km²

In [10]:
ed.columns.get_loc("Area (km²)")

9

In [11]:
for index, row in ed.iterrows():
    if len(row) >= 10:
        print(row[9])
    else:
        print("Index out of range")
            

768.0
1516.0
nan
600.0
2162.0
2310.0
2496.0
nan
855.0
569.0
12510.0
3514.0
44748.0
10160.0
12637.0
4096.0
2445.0
3615.0
7499.0
6622.0
22539.0
8958.0
35380.0
3293.0
6831.0
7610.0
15153.0
9830.0
5896.0
5728.0
3387.0
5797.0
16891.0
7819.0
2653.0
0.0
7796.0
29510.0
20297.0
1489.0
nan
1227.0
nan
nan
nan
nan
nan
nan
nan
nan
nan
nan
nan
nan
nan
6400.0
9635.0
3800.0
15700.0
15000.0
906.0
1967.0
nan
1301.0
1865.0
996.0
14850.0
7326.0
1597.0
1725.0
3372.0
2545.0
7492.0
3164.0
1582.0
952.0
4579.0
1632.0
1748.0
1257.0
1586.0
1543.0
5337.0
1679.0
497.0
3699.0
6858.0
8878.0
24830.0
8153.0
6524.0
0.0
11922.0
5856.0
3622.0
3192.0
2367.0
8809.0
3587.0
4796.0
4349.0
6511.0
1772.0
6291.0
2778.0
2673.0
5840.0
3720.0
8249.0
2960.0
2337.0
3004.0
2724.0
11880.0
12319.0
5286.0
3201.0
5854.0
15960.0
3016.0
3252.0
4364.0
6726.0
19070.0
6083.0
5519.0
5278.0
0.0
0.0
3527.0
2592.0
15910.0
7423.0
1417.0
2925.0
2945.0
10720.0
4502.0
0.0
2512.0
5165.0
2310.0
19638.0
19638.0
17355.0
768.0
1516.0
nan
600.0
2162.0
2310.

## Now Convert Null Values of Area (km²) into Area Unknown

In [12]:
for index, row in ed.iterrows():
    Area = row['Area (km²)']
    if pd.isna(Area) or Area == "":
        ed.at[index, 'Area (km²)'] = "Area Unknown"
ed        

Unnamed: 0,% Boys Enrolled,% Complete Primary Schools,% Girls Enrolled,% Primary Schools with single classroom,% Primary Schools with single teacher,All Four Facilities,Any One Facility,Any Three Facilities,Any Two Facilities,Area (km²),...,Population,Primary Schools with single classroom,Primary Schools with single teacher,Province,Retention score,School infrastructure score,Terrorist Attacks Affectees,Toilet,Total number of schools,Year
0,54.77%,0.936599,45.23%,0.034582,0.028818,17.06%,16.20%,24.73%,24.95%,768.0,...,351415.0,12.0,10.0,AJK,14.800000,24.929178,5379,37.393768,601,2013
1,62.50%,0.594203,37.50%,0.359903,0.045894,15.25%,20.21%,17.38%,15.43%,1516.0,...,301633.0,149.0,19.0,AJK,75.200000,28.883827,5379,20.728929,595,2013
2,86.63%,0.939068,13.37%,0.021505,0.039427,5.35%,4.55%,16.31%,20.05%,Area Unknown,...,,6.0,11.0,AJK,0.000000,22.463768,5379,43.840580,379,2013
3,60.77%,0.827225,39.23%,0.041885,0.130890,1.57%,22.44%,7.09%,12.20%,600.0,...,150000.0,8.0,25.0,AJK,0.000000,13.048128,5379,26.203209,262,2013
4,60.75%,0.592348,39.25%,0.346966,0.060686,7.12%,23.06%,9.78%,13.60%,2162.0,...,834094.0,263.0,46.0,AJK,73.200000,20.184453,5379,14.888011,1036,2013
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
575,63.98%,0.214286,36.02%,0.215633,0.570081,31.77%,12.64%,23.71%,20.58%,5165.0,...,908373.0,160.0,423.0,Sindh,59.000000,55.390836,1803,62.533693,831,2016
576,73.89%,0.000000,26.11%,0.440000,0.743158,14.16%,22.81%,25.17%,19.08%,2310.0,...,550000.0,418.0,706.0,Sindh,43.668667,43.010526,1803,50.947368,1017,2016
577,75.73%,0.024200,24.27%,0.417585,0.558215,2.10%,15.12%,10.10%,19.79%,19638.0,...,955812.0,1553.0,2076.0,Sindh,26.000000,26.071525,1803,33.019629,4008,2016
578,71.06%,0.018284,28.94%,0.514768,0.466948,2.71%,22.57%,4.82%,25.48%,19638.0,...,914291.0,732.0,664.0,Sindh,18.000000,19.774965,1803,35.161744,1515,2016


In [13]:
ed['Area (km²)']

0             768.0
1            1516.0
2      Area Unknown
3             600.0
4            2162.0
           ...     
575          5165.0
576          2310.0
577         19638.0
578         19638.0
579         17355.0
Name: Area (km²), Length: 580, dtype: object

## Now Find Index of Population

In [14]:
ed.columns.get_loc("Population")

41

## Now Convert Null Values of Population into Unknown

In [15]:
for index, row in ed.iterrows():
    Area = row['Population']
    if pd.isna(Area) or Area == "":
        ed.at[index, 'Population'] = "Unknown"
ed       

Unnamed: 0,% Boys Enrolled,% Complete Primary Schools,% Girls Enrolled,% Primary Schools with single classroom,% Primary Schools with single teacher,All Four Facilities,Any One Facility,Any Three Facilities,Any Two Facilities,Area (km²),...,Population,Primary Schools with single classroom,Primary Schools with single teacher,Province,Retention score,School infrastructure score,Terrorist Attacks Affectees,Toilet,Total number of schools,Year
0,54.77%,0.936599,45.23%,0.034582,0.028818,17.06%,16.20%,24.73%,24.95%,768.0,...,351415.0,12.0,10.0,AJK,14.800000,24.929178,5379,37.393768,601,2013
1,62.50%,0.594203,37.50%,0.359903,0.045894,15.25%,20.21%,17.38%,15.43%,1516.0,...,301633.0,149.0,19.0,AJK,75.200000,28.883827,5379,20.728929,595,2013
2,86.63%,0.939068,13.37%,0.021505,0.039427,5.35%,4.55%,16.31%,20.05%,Area Unknown,...,Unknown,6.0,11.0,AJK,0.000000,22.463768,5379,43.840580,379,2013
3,60.77%,0.827225,39.23%,0.041885,0.130890,1.57%,22.44%,7.09%,12.20%,600.0,...,150000.0,8.0,25.0,AJK,0.000000,13.048128,5379,26.203209,262,2013
4,60.75%,0.592348,39.25%,0.346966,0.060686,7.12%,23.06%,9.78%,13.60%,2162.0,...,834094.0,263.0,46.0,AJK,73.200000,20.184453,5379,14.888011,1036,2013
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
575,63.98%,0.214286,36.02%,0.215633,0.570081,31.77%,12.64%,23.71%,20.58%,5165.0,...,908373.0,160.0,423.0,Sindh,59.000000,55.390836,1803,62.533693,831,2016
576,73.89%,0.000000,26.11%,0.440000,0.743158,14.16%,22.81%,25.17%,19.08%,2310.0,...,550000.0,418.0,706.0,Sindh,43.668667,43.010526,1803,50.947368,1017,2016
577,75.73%,0.024200,24.27%,0.417585,0.558215,2.10%,15.12%,10.10%,19.79%,19638.0,...,955812.0,1553.0,2076.0,Sindh,26.000000,26.071525,1803,33.019629,4008,2016
578,71.06%,0.018284,28.94%,0.514768,0.466948,2.71%,22.57%,4.82%,25.48%,19638.0,...,914291.0,732.0,664.0,Sindh,18.000000,19.774965,1803,35.161744,1515,2016


In [16]:
ed["Population"] 

0       351415.0
1       301633.0
2        Unknown
3       150000.0
4       834094.0
         ...    
575     908373.0
576     550000.0
577     955812.0
578     914291.0
579    1113194.0
Name: Population, Length: 580, dtype: object

# The average enrollment percentage for boys and girls

In [18]:
ed['% Boys Enrolled'] = pd.to_numeric(ed['% Boys Enrolled'], errors='coerce')
ed['% Girls Enrolled'] = pd.to_numeric(ed['% Girls Enrolled'], errors='coerce')

boys_enrollment_avg = ed['% Boys Enrolled'].mean()
girls_enrollment_avg = ed['% Girls Enrolled'].mean()

print("Average enrollment percentage for boys: ", str(boys_enrollment_avg))
print("Average enrollment percentage for girls: ", str(girls_enrollment_avg))

Average enrollment percentage for boys:  nan
Average enrollment percentage for girls:  nan


# Count the number of primary schools with a single classroom

In [20]:
single_classroom_count = ed['% Primary Schools with single classroom'].sum()
print("Number of primary schools with a single classroom:", single_classroom_count)

Number of primary schools with a single classroom: 77.837307044


# the percentage of primary schools with all four facilities

In [24]:
ed['All Four Facilities'] = pd.to_numeric(ed['All Four Facilities'], errors='coerce')
ed['Area (km²)'] = pd.to_numeric(ed['Area (km²)'], errors='coerce')

all_facilities_percentage = (ed['All Four Facilities'] / ed['Area (km²)']) * 100

print("Percentage of primary schools with all four facilities:")
print(all_facilities_percentage)

Percentage of primary schools with all four facilities:
0     NaN
1     NaN
2     NaN
3     NaN
4     NaN
       ..
575   NaN
576   NaN
577   NaN
578   NaN
579   NaN
Length: 580, dtype: float64


# the year-on-year change in boundary wall condition

In [25]:
ed['Year on Year Change'] = ed['Boundary wall'].diff()
print("Year-on-year change in boundary wall condition:")
print(['Year on Year Change'])

Year-on-year change in boundary wall condition:
0            NaN
1       9.704002
2     -17.313724
3      -4.762458
4       8.122142
         ...    
575    11.267582
576   -17.884807
577   -22.071156
578     8.518710
579     6.097993
Name: Year on Year Change, Length: 580, dtype: float64
