# Loading the dataset (co-curricular activities)

In [None]:
import numpy as np
import pandas as pd
df=pd.read_csv('/kaggle/input/schools-information-directory-singapore/co-curricular-activities-ccas.csv')

In [None]:
df.head()

As the output above shows, the dataset contains a 'cca_customized_name' column. It holds each school's own name for a CCA, which this analysis does not use, so it can be removed. The column also contains many 'NA' values, which is another reason to drop it.
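
To put a number on "many NA values", you can check the share of missing values in each column before dropping it. The sketch below uses a small hypothetical toy frame instead of the Kaggle CSV. The 0.5 threshold is an arbitrary assumption, not something the notebook defines.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows mimicking the CCA file's layout;
# the real data is read from the Kaggle CSV above.
toy = pd.DataFrame({
    "school_name": ["SCHOOL A", "SCHOOL A", "SCHOOL B", "SCHOOL C"],
    "cca_generic_name": ["BASKETBALL", "CHOIR", "SCOUTS", "BADMINTON"],
    "cca_customized_name": [np.nan, np.nan, "SCOUTS (CUBS)", np.nan],
})

# isna() marks missing cells as True; mean() turns that into a fraction per column
missing_share = toy.isna().mean()
print(missing_share)

# Drop every column whose missing share exceeds the (assumed) threshold
threshold = 0.5
toy_clean = toy.drop(columns=missing_share[missing_share > threshold].index)
print(toy_clean.columns.tolist())
```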

In [None]:
df.drop('cca_customized_name',axis=1,inplace=True)

In [None]:
df.head()

In [None]:
df.shape

# Counting the unique values in key columns

In [None]:
df['school_section'].value_counts()

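The output above gives the number of CCA records for each school section, such as SECONDARY, PRIMARY and JUNIOR COLLEGE. Each school offers many CCAs, so these are counts of CCA entries, not of schools.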
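
Each school appears once for every CCA it offers, so counting rows measures CCA records per section. Counting distinct school names measures schools per section. The sketch below contrasts the two on hypothetical toy rows that stand in for `df`.

```python
import pandas as pd

# Hypothetical toy rows: one row per (school, CCA) pair, as in the CCA file
toy = pd.DataFrame({
    "school_name": ["SCHOOL A", "SCHOOL A", "SCHOOL A", "SCHOOL B", "SCHOOL C"],
    "school_section": ["PRIMARY", "PRIMARY", "PRIMARY", "PRIMARY", "SECONDARY"],
    "cca_generic_name": ["BASKETBALL", "CHOIR", "SCOUTS", "CHOIR", "BADMINTON"],
})

# Rows per section = number of CCA records
cca_records = toy["school_section"].value_counts()

# Distinct school names per section = number of schools
schools = toy.groupby("school_section")["school_name"].nunique()

print(cca_records)  # PRIMARY 4, SECONDARY 1
print(schools)      # PRIMARY 2, SECONDARY 1
```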

In [None]:
df['cca_grouping_desc'].value_counts()

Among the CCA groupings, 'PHYSICAL SPORTS' has the most entries, followed by 'VISUAL AND PERFORMING ARTS'.

# Percentage share of each school section in the CCA records

In [None]:
df['school_section'].value_counts(normalize=True)*100

The output above shows that SECONDARY accounts for the largest share of CCA records (~47%) and JUNIOR COLLEGE the smallest (~8%). The same calculation gives the percentage share of each CCA grouping:

In [None]:
df.groupby('cca_grouping_desc').count()['school_name']/df.groupby('cca_grouping_desc').count()['school_name'].sum()*100
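
The cell above divides each group's row count by the total. `value_counts(normalize=True)` does the same in one call. The two match as long as 'school_name' has no missing values, because `count()` skips NaN. The sketch below checks this on hypothetical toy rows.

```python
import pandas as pd

# Hypothetical toy rows standing in for the CCA DataFrame
toy = pd.DataFrame({
    "school_name": ["A", "A", "B", "C", "C", "C", "D", "E"],
    "cca_grouping_desc": ["PHYSICAL SPORTS", "PHYSICAL SPORTS", "PHYSICAL SPORTS",
                          "VISUAL AND PERFORMING ARTS", "UNIFORMED GROUPS",
                          "PHYSICAL SPORTS", "CLUBS AND SOCIETIES", "OTHERS"],
})

# The notebook's approach: count rows per group, divide by the total count
counts = toy.groupby("cca_grouping_desc").count()["school_name"]
pct_groupby = counts / counts.sum() * 100

# Shorter equivalent: value_counts returns fractions when normalize=True
pct_value_counts = toy["cca_grouping_desc"].value_counts(normalize=True) * 100

print(pct_value_counts)
```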

To see which sports make up the 'PHYSICAL SPORTS' grouping, count the generic CCA names within it:

In [None]:
df[df['cca_grouping_desc']=='PHYSICAL SPORTS']['cca_generic_name'].value_counts()

Within 'PHYSICAL SPORTS', basketball is the most commonly offered CCA, followed by badminton and then football. Karate and sea sports are among the least offered.
The same approach lists the CCAs in the 'OTHERS' grouping:

In [None]:
df[df['cca_grouping_desc']=='OTHERS']['cca_generic_name'].value_counts()

This shows that leadership-related activities, which fall under 'OTHERS', make up a far smaller share than the other CCA groupings: about 0.7% of all CCA records, as seen in the grouping percentages computed earlier.

Similarly, here is the detailed breakdown of the 'VISUAL AND PERFORMING ARTS' grouping:

In [None]:
df[df['cca_grouping_desc']=='VISUAL AND PERFORMING ARTS']['cca_generic_name'].value_counts()

This suggests that schools in Singapore most often offer 'CHOIR' and 'ART AND CRAFTS'. They least often offer 'INDIAN ORCHESTRA', 'NEW MEDIA ARTS ' and 'ENSEMBLE -MIXED INSTRUMENT', among others.

Next, the 'UNIFORMED GROUPS' grouping:

In [None]:
df[df['cca_grouping_desc']=='UNIFORMED GROUPS']['cca_generic_name'].value_counts()

Among uniformed groups, SCOUTS and NATIONAL POLICE CADET CORPS are offered most often, followed by NATIONAL CADET CORPS(LAND). This suggests that uniformed-group CCAs lean towards land-based cadet programmes with a national-security focus. These figures count school offerings, not the number of students taking part.
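
The four filtered cells above repeat one pattern: a boolean mask on 'cca_grouping_desc' followed by `value_counts()` on 'cca_generic_name'. A single `groupby` can produce every breakdown at once. The sketch below runs on hypothetical toy rows, and 'LEADERSHIP' is a placeholder name, not necessarily a value in the real file.

```python
import pandas as pd

# Hypothetical toy rows; the real notebook uses the CCA DataFrame `df`
toy = pd.DataFrame({
    "cca_grouping_desc": ["PHYSICAL SPORTS", "PHYSICAL SPORTS", "PHYSICAL SPORTS",
                          "UNIFORMED GROUPS", "UNIFORMED GROUPS", "UNIFORMED GROUPS",
                          "OTHERS"],
    "cca_generic_name": ["BASKETBALL", "BASKETBALL", "BADMINTON",
                         "SCOUTS", "SCOUTS", "NATIONAL POLICE CADET CORPS",
                         "LEADERSHIP"],  # 'LEADERSHIP' is a placeholder name
})

# Count every generic CCA name within its grouping in one pass;
# the result is a Series indexed by (grouping, generic name)
breakdown = toy.groupby("cca_grouping_desc")["cca_generic_name"].value_counts()

# Select a single grouping, equivalent to one of the boolean-mask cells
print(breakdown.loc["PHYSICAL SPORTS"])

# Most frequent CCA per grouping: sort by count, keep the first row of each group
top_per_group = breakdown.sort_values(ascending=False).groupby(level=0).head(1)
print(top_per_group)
```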

# Digging deeper into the 'JUNIOR COLLEGE' school section

In [None]:
data = df[df['school_section']=='JUNIOR COLLEGE']
data

Junior colleges act as a bridge between school life and students' professional lives, so it is worth looking at how CCA offerings are distributed at this level:

In [None]:
data['cca_grouping_desc'].value_counts()/data.shape[0]*100

At junior colleges, as across all schools, 'PHYSICAL SPORTS' accounts for the largest share of CCAs, followed by 'VISUAL AND PERFORMING ARTS' and 'CLUBS AND SOCIETIES'. The 'OTHERS' grouping, which includes leadership training, makes up just 1.35% of junior-college CCAs.
One possible reason is that leadership is taught in separate classes rather than as a CCA.
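
To compare junior colleges with all sections directly, you can place the two percentage breakdowns side by side in one table. The sketch below uses hypothetical toy rows. Filling missing groupings with 0 is a presentation choice: a grouping absent at JC level would otherwise show as NaN.

```python
import pandas as pd

# Hypothetical toy rows standing in for the CCA DataFrame `df`
toy = pd.DataFrame({
    "school_section": ["PRIMARY", "PRIMARY", "SECONDARY", "SECONDARY",
                       "JUNIOR COLLEGE", "JUNIOR COLLEGE",
                       "JUNIOR COLLEGE", "JUNIOR COLLEGE"],
    "cca_grouping_desc": ["PHYSICAL SPORTS", "UNIFORMED GROUPS",
                          "PHYSICAL SPORTS", "OTHERS",
                          "PHYSICAL SPORTS", "PHYSICAL SPORTS",
                          "VISUAL AND PERFORMING ARTS", "CLUBS AND SOCIETIES"],
})

# Percentage share of each grouping across all sections
overall = toy["cca_grouping_desc"].value_counts(normalize=True) * 100

# Percentage share of each grouping within junior colleges only
jc_rows = toy["school_section"] == "JUNIOR COLLEGE"
jc = toy.loc[jc_rows, "cca_grouping_desc"].value_counts(normalize=True) * 100

# Dict keys become column names; the index is the union of groupings
comparison = pd.concat({"all_sections_%": overall, "junior_college_%": jc}, axis=1).fillna(0)
print(comparison.round(1))
```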

# Analysis of MoE Programme Data

In [None]:
data_moe=pd.read_csv('/kaggle/input/schools-information-directory-singapore/moe-programmes.csv')

In [None]:
data_moe.head()

Checking for null values and listing the unique programme descriptions:

In [None]:
data_moe.isnull().sum()

In [None]:
data_moe['moe_programme_desc'].unique()

The programme list includes Language Elective Programmes for Chinese, Malay and Tamil. Filtering on each one shows which schools offer it:

In [None]:
data_moe[data_moe['moe_programme_desc']=='LANGUAGE ELECTIVE PROGRAMME (CHINESE)']

In [None]:
data_moe[data_moe['moe_programme_desc']=='LANGUAGE ELECTIVE PROGRAMME (TAMIL)']

In [None]:
data_moe[data_moe['moe_programme_desc']=='LANGUAGE ELECTIVE PROGRAMME (MALAY)']

The outputs above show that the Chinese Language Elective Programme is offered by the most schools, followed by Malay. Tamil is offered by the fewest schools in Singapore.
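
The three filter cells display matching rows but leave the counting to the reader. A sketch that counts distinct schools per language elective in one step is shown below. It assumes the MoE file has a 'school_name' column alongside 'moe_programme_desc', and it uses hypothetical toy rows, including a placeholder non-language programme.

```python
import pandas as pd

# Hypothetical toy rows; assumes columns 'school_name' and 'moe_programme_desc'
toy = pd.DataFrame({
    "school_name": ["SCHOOL A", "SCHOOL B", "SCHOOL C", "SCHOOL D", "SCHOOL A"],
    "moe_programme_desc": [
        "LANGUAGE ELECTIVE PROGRAMME (CHINESE)",
        "LANGUAGE ELECTIVE PROGRAMME (CHINESE)",
        "LANGUAGE ELECTIVE PROGRAMME (MALAY)",
        "LANGUAGE ELECTIVE PROGRAMME (TAMIL)",
        "SOME OTHER PROGRAMME",  # placeholder non-language programme
    ],
})

# Keep only the language elective rows
is_lep = toy["moe_programme_desc"].str.startswith("LANGUAGE ELECTIVE PROGRAMME")
lep = toy[is_lep]

# Distinct schools per programme (nunique guards against duplicate rows)
schools_per_lep = (
    lep.groupby("moe_programme_desc")["school_name"].nunique().sort_values(ascending=False)
)
print(schools_per_lep)
```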

# Analysis of General Information of Schools Data

In [None]:
info=pd.read_csv('/kaggle/input/schools-information-directory-singapore/general-information-of-schools.csv')

In [None]:
info.head()    

Dropping the columns below, which hold contact details, staff names and indicator flags that are not used in this analysis:

In [None]:
info.drop(['url_address','address','telephone_no','telephone_no_2','fax_no','fax_no_2','email_address','bus_desc','principal_name','first_vp_name','second_vp_name','third_vp_name','fourth_vp_name','fifth_vp_name','sap_ind','autonomous_ind','gifted_ind','ip_ind','sixth_vp_name'],axis=1,inplace=True)
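
`DataFrame.drop` raises a `KeyError` if any listed column is absent, for example when the CSV's schema changes between versions. A more defensive variant reports missing names and passes `errors="ignore"`. It is sketched below on a hypothetical toy frame with placeholder values.

```python
import pandas as pd

# Hypothetical toy frame; the telephone number is a placeholder value
toy = pd.DataFrame({
    "school_name": ["SCHOOL A"],
    "zone_code": ["NORTH"],
    "telephone_no": ["00000000"],
})

cols_to_drop = ["telephone_no", "fax_no"]  # 'fax_no' is absent from this toy frame

# Report which listed columns are not present before dropping
missing = [c for c in cols_to_drop if c not in toy.columns]
print("not found:", missing)

# errors="ignore" skips absent labels instead of raising KeyError
toy = toy.drop(columns=cols_to_drop, errors="ignore")
print(toy.columns.tolist())
```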

In [None]:
info.head()

In [None]:
info.shape

In [None]:
info.isnull().sum()

To count the number of schools in each zone:

In [None]:
info['zone_code'].value_counts()

The NORTH zone has the highest number of schools, followed by the SOUTH and WEST zones. By this measure alone, the north offers families the widest choice of schools.

To count co-educational, girls' and boys' schools:

In [None]:
info['nature_code'].value_counts()

Breaking the CO-ED SCHOOL count down by school level (mainlevel_code), as a percentage of all schools at each level:

In [None]:
info[info['nature_code']=='CO-ED SCHOOL']['mainlevel_code'].value_counts()/info['mainlevel_code'].value_counts()*100

About 88% of PRIMARY schools are co-educational, while all (100%) JUNIOR COLLEGE and CENTRALISED INSTITUTE schools are co-educational.
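
The cell above computes the co-ed share only. A row-normalised cross-tabulation gives the percentage of every school nature at each level in one table. It is sketched below on hypothetical toy rows. The girls' and boys' school labels are placeholders for whatever 'nature_code' actually contains.

```python
import pandas as pd

# Hypothetical toy rows; nature labels other than 'CO-ED SCHOOL' are placeholders
toy = pd.DataFrame({
    "mainlevel_code": ["PRIMARY", "PRIMARY", "PRIMARY", "PRIMARY",
                       "SECONDARY", "SECONDARY", "JUNIOR COLLEGE"],
    "nature_code": ["CO-ED SCHOOL", "CO-ED SCHOOL", "CO-ED SCHOOL", "GIRLS' SCHOOL",
                    "CO-ED SCHOOL", "BOYS' SCHOOL", "CO-ED SCHOOL"],
})

# normalize="index" divides each row by its total, so each level sums to 100%
pct = pd.crosstab(toy["mainlevel_code"], toy["nature_code"], normalize="index") * 100
print(pct.round(1))
```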

To see the types of schools in Singapore:

In [None]:
info['type_code'].value_counts()

The counts above show that more than 90% of the schools are GOVERNMENT SCHOOL.
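
`value_counts()` returns raw counts. To read the "over 90%" figure directly, pass `normalize=True`. The sketch below uses a hypothetical toy column of 20 schools. 'INDEPENDENT SCHOOL' is a placeholder label.

```python
import pandas as pd

# Hypothetical toy column: 19 government schools and 1 of another (placeholder) type
toy_types = pd.Series(["GOVERNMENT SCHOOL"] * 19 + ["INDEPENDENT SCHOOL"], name="type_code")

type_counts = toy_types.value_counts()                   # raw counts
type_pct = toy_types.value_counts(normalize=True) * 100  # percentages

print(type_pct.round(1))
print("Government share above 90%:", type_pct["GOVERNMENT SCHOOL"] > 90)
```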

# Thank you for reading this analysis.