# Communicate Data Findings Project

## Part 1 - PISA Data

The data set comprises selected demographic attributes, student and parent questionnaire responses, and quantitative scores produced from a variety of questionnaire booklets for China (Shanghai), Singapore, and the United States of America. The student questionnaire responses cover topics including school, learning resources, attitudes towards math and school, and student opinions of their teachers.

<strong>Store data attributes into one CSV file called pisa2012_data.csv</strong>

- Download the PISA 2012 data file
- Load the PISA 2012 data into a SQLite database table
- Export a subset of the PISA 2012 data to a CSV file
- Map PISA column codes to readable column names for the selected attributes

In [1]:
import pandas as pd
import numpy as np
from pisa_data_load import load_pisa_database

### Download raw PISA CSV file
The PISA raw dataset can be downloaded from https://www.oecd.org/pisa/pisaproducts/database-cbapisa2012.htm

### Load raw PISA CSV file into sqlite database
There are about 500,000 records in the pisa2012.csv file. The code below loads the raw PISA CSV file into a new SQLite database table called pisa. The raw CSV file is stored in the data/pisa2012.zip archive.

In [8]:
# load the pisa 2012 data file into sqlite3 database table
file = 'pisa2012.csv'
load_pisa_database(file)
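
The source of `load_pisa_database` is not shown in this notebook. As a rough sketch of what such a loader might do, the function below streams a CSV file into a SQLite table in chunks so that the whole file never has to fit in memory. The function name `load_csv_to_sqlite`, the default database file `pisa.db`, and the chunk size are assumptions made for illustration, not the project's actual implementation.

```python
import sqlite3
from contextlib import closing

import pandas as pd


def load_csv_to_sqlite(csv_path, db_path="pisa.db", table="pisa", chunksize=50_000):
    """Hypothetical stand-in for load_pisa_database: stream a CSV into SQLite.

    Returns the number of rows written.
    """
    rows_written = 0
    with closing(sqlite3.connect(db_path)) as conn:
        # Read the CSV in chunks instead of loading it all at once.
        for i, chunk in enumerate(pd.read_csv(csv_path, chunksize=chunksize)):
            # Replace the table on the first chunk, append afterwards.
            mode = "replace" if i == 0 else "append"
            chunk.to_sql(table, conn, if_exists=mode, index=False)
            rows_written += len(chunk)
        conn.commit()
    return rows_written


# Example call (paths are assumptions):
# load_csv_to_sqlite("pisa2012.csv", db_path="pisa.db", table="pisa")
```

Chunked loading trades some speed for a bounded memory footprint, which matters for a file with roughly half a million rows and several hundred columns.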

### Create new PISA CSV file with new column names
- Execute the sql query below in a sqlite terminal 
- Save the results to a new CSV file called pisa2012_data.csv
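
The export step could also be scripted from Python rather than run in a SQLite terminal. The sketch below builds a `SELECT ... AS ...` query from a mapping of raw column codes to readable names and filters by country. The mapping entries (`CNT`, `ST04Q01`), the `country_col` default, and the function name `export_subset` are assumptions for illustration; the real query would list every selected attribute.

```python
import sqlite3

import pandas as pd

# Hypothetical mapping from raw PISA column codes to readable names.
COLUMN_MAP = {"CNT": "country", "ST04Q01": "gender"}


def export_subset(conn, column_map, countries, country_col="CNT", table="pisa"):
    """Select and rename columns for the given countries (illustrative sketch)."""
    # Quote identifiers so codes and names with unusual characters still work.
    select_list = ", ".join(f'"{raw}" AS "{new}"' for raw, new in column_map.items())
    # Use placeholders so country values are passed as parameters.
    placeholders = ", ".join("?" for _ in countries)
    query = (
        f'SELECT {select_list} FROM "{table}" '
        f'WHERE "{country_col}" IN ({placeholders})'
    )
    return pd.read_sql_query(query, conn, params=list(countries))


# Example usage (database path and country labels are assumptions):
# conn = sqlite3.connect("pisa.db")
# df = export_subset(conn, COLUMN_MAP, ["China-Shanghai"])
# df.to_csv("./data/pisa2012_data.csv")
```

Keeping the rename mapping in one dictionary makes it easy to review which raw code feeds each readable column.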

### Read new pisa2012_data file into data frame

In [4]:
df_pisa = pd.read_csv('./data/pisa2012_data.csv')
df_pisa.head()

Unnamed: 0,country,sub_nation_code,stratum,OECD,natnl_center,school_id,mother_curr_job_status,father_curr_job_status,parents_believe_math_study,parents_believe_math_career,...,teacher_listen_student,teacher_treat_student_fair,math_self_concept,math_work_ethic,math_work_ethic_anchored,math_teacher_support_anchored,math_self_concept_anchored,math_literacy_score,senate_weight,math_literacy_score_wt
0,China-Shanghai,1560000,QCN0003,Non-OECD,China (Shanghai),1,"Other (e.g. home duties, retired)","Not working, but looking for a job",Agree,Strongly agree,...,Agree,Strongly agree,0.65,2.0389,2.4243,1.9408,1.0014,661.4815,0.1897,125.483041
1,China-Shanghai,1560000,QCN0003,Non-OECD,China (Shanghai),1,"Other (e.g. home duties, retired)",Working full-time <for pay>,Agree,Agree,...,Strongly agree,Strongly agree,0.88,2.7167,1.1311,0.5831,0.7562,676.4371,0.1897,128.320118
2,China-Shanghai,1560000,QCN0003,Non-OECD,China (Shanghai),1,Working full-time <for pay>,Working full-time <for pay>,,,...,Strongly agree,Strongly agree,-0.52,,,1.5224,0.2649,639.0481,0.1897,121.227425
3,China-Shanghai,1560000,QCN0003,Non-OECD,China (Shanghai),1,Working full-time <for pay>,Working full-time <for pay>,,,...,Strongly agree,Strongly agree,-0.76,,,0.4052,0.2052,740.9332,0.1897,140.555028
4,China-Shanghai,1560000,QCN0003,Non-OECD,China (Shanghai),1,"Other (e.g. home duties, retired)","Other (e.g. home duties, retired)",Strongly agree,Agree,...,,,,2.0389,,,,735.169,0.1897,139.461559
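
In the preview above, `math_literacy_score_wt` matches `math_literacy_score` multiplied by `senate_weight` (for example, 661.4815 × 0.1897 ≈ 125.4830). If `senate_weight` is meant to be used as a per-student weight, a weighted mean score per country could be computed along the following lines. The helper name `weighted_mean_by_group` is an assumption for illustration, and this is only a sketch of one possible use of the weights.

```python
import pandas as pd


def weighted_mean_by_group(df, value_col, weight_col, group_col):
    """Weighted mean of value_col per group, ignoring rows with missing data."""
    valid = df.dropna(subset=[value_col, weight_col, group_col])
    weighted = valid[value_col] * valid[weight_col]
    # Sum of weighted values divided by sum of weights, per group.
    numerator = weighted.groupby(valid[group_col]).sum()
    denominator = valid[weight_col].groupby(valid[group_col]).sum()
    return numerator / denominator


# Two rows copied from the preview above.
sample = pd.DataFrame({
    "country": ["China-Shanghai", "China-Shanghai"],
    "math_literacy_score": [661.4815, 676.4371],
    "senate_weight": [0.1897, 0.1897],
})
result = weighted_mean_by_group(
    sample, "math_literacy_score", "senate_weight", "country"
)
```

On the full data the call would be `weighted_mean_by_group(df_pisa, 'math_literacy_score', 'senate_weight', 'country')`.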


### Prepare CSV file for Data Visualization phase
The next cell selects the subset of categorical and quantitative variables used in this project. The resulting data frame is written to a new CSV file for the data visualization phase.

In [5]:
# Columns kept for the data visualization phase
selected_columns = [
    'country', 'gender', 'mother_curr_job_status', 'father_curr_job_status',
    'parents_believe_math_career', 'parents_believe_math_study', 'parents_like_math',
    'highest_parent_education_yrs', 'behaviour_max_math_sci_classes',
    'behaviour_math_sci_college_major', 'behaviour_pursue_mathsci_career',
    'math_interested', 'belonging_feel_lonely_sch', 'belonging_feel_happy_sch',
    'belonging_feel_outsider', 'belonging_satisfied_atsch',
    'familiar_arith_mean', 'familiar_complex_numb', 'familiar_congruent_fig',
    'familiar_cosine', 'familiar_divisor', 'familiar_exp_function',
    'familiar_linear_eq', 'familiar_polygon', 'familiar_probability', 'familiar_quadratic',
    'familiar_radicals', 'familiar_rational_numb', 'familiar_vectors',
    'mteacher_shows_interest', 'mteacher_extra_help', 'mteacher_helps',
    'mteacher_continues', 'mteacher_express_opinion',
    'outof_school_study_time', 'outof_school_study_guided_hw', 'outof_school_study_tutor',
    'outof_school_study_parent', 'no_of_math_classes_wk', 'no_of_all_classes_wk', 'class_size',
    'math_self_concept', 'math_work_ethic', 'math_work_ethic_anchored',
    'math_teacher_support_anchored', 'math_self_concept_anchored',
    'math_literacy_score', 'senate_weight',
]

# Keep only the selected columns, in the order listed
df_pisa_new = pd.DataFrame(df_pisa, columns=selected_columns)

df_pisa_new.head()

Unnamed: 0,country,gender,mother_curr_job_status,father_curr_job_status,parents_believe_math_career,parents_believe_math_study,parents_like_math,highest_parent_education_yrs,behaviour_max_math_sci_classes,behaviour_math_sci_college_major,...,no_of_math_classes_wk,no_of_all_classes_wk,class_size,math_self_concept,math_work_ethic,math_work_ethic_anchored,math_teacher_support_anchored,math_self_concept_anchored,math_literacy_score,senate_weight
0,China-Shanghai,Male,"Other (e.g. home duties, retired)","Not working, but looking for a job",Strongly agree,Agree,Agree,12.0,Maximum classes Math,Major in college Math,...,,,,0.65,2.0389,2.4243,1.9408,1.0014,661.4815,0.1897
1,China-Shanghai,Male,"Other (e.g. home duties, retired)",Working full-time <for pay>,Agree,Agree,Agree,16.0,Maximum classes Math,Major in college Math,...,,,,0.88,2.7167,1.1311,0.5831,0.7562,676.4371,0.1897
2,China-Shanghai,Female,Working full-time <for pay>,Working full-time <for pay>,,,,12.0,,,...,5.0,43.0,37.0,-0.52,,,1.5224,0.2649,639.0481,0.1897
3,China-Shanghai,Female,Working full-time <for pay>,Working full-time <for pay>,,,,15.0,,,...,5.0,,42.0,-0.76,,,0.4052,0.2052,740.9332,0.1897
4,China-Shanghai,Male,"Other (e.g. home duties, retired)","Other (e.g. home duties, retired)",Agree,Strongly agree,Agree,9.0,Maximum classes Math,Major in college Math,...,6.0,51.0,43.0,,2.0389,,,,735.169,0.1897
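
One caveat with the selection above: to my understanding, `pd.DataFrame(df, columns=[...])` does not raise an error for a misspelled column name and instead returns an all-NaN column under that name. A small guard like the one below can catch such typos early. The helper name `select_columns` and its `strict` flag are assumptions for illustration.

```python
import pandas as pd


def select_columns(df, columns, strict=True):
    """Return df restricted to columns, optionally failing on unknown names."""
    missing = [c for c in columns if c not in df.columns]
    if missing and strict:
        raise KeyError(f"columns not found: {missing}")
    # reindex keeps the requested order; unknown names become all-NaN columns.
    return df.reindex(columns=columns)


# Example usage with a tiny frame.
demo = pd.DataFrame({"country": ["China-Shanghai"], "gender": ["Male"]})
subset = select_columns(demo, ["gender", "country"])
```

With `strict=False` the helper mirrors the lenient behavior, which can be useful when some columns are optional.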


In [6]:
df_pisa_new.info(max_cols=125)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15701 entries, 0 to 15700
Data columns (total 48 columns):
country                             15701 non-null object
gender                              15701 non-null object
mother_curr_job_status              15476 non-null object
father_curr_job_status              15102 non-null object
parents_believe_math_career         10363 non-null object
parents_believe_math_study          10374 non-null object
parents_like_math                   10355 non-null object
highest_parent_education_yrs        15536 non-null float64
behaviour_max_math_sci_classes      10190 non-null object
behaviour_math_sci_college_major    10179 non-null object
behaviour_pursue_mathsci_career     10143 non-null object
math_interested                     10361 non-null object
belonging_feel_lonely_sch           10307 non-null object
belonging_feel_happy_sch            10304 non-null object
belonging_feel_outsider             10321 non-null object
belonging_satisfied_
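
The non-null counts above show that many questionnaire columns are only partly filled; for instance, `parents_believe_math_career` has 10363 values out of 15701 rows, so roughly 34% are missing. A per-column summary of missing values can make this easier to scan. The helper below is a minimal sketch, and its name `missing_summary` is an assumption for illustration.

```python
import pandas as pd


def missing_summary(df):
    """Count non-null and missing values per column, with a missing percentage."""
    summary = pd.DataFrame({
        "non_null": df.notna().sum(),
        "missing": df.isna().sum(),
    })
    summary["missing_pct"] = (summary["missing"] / len(df) * 100).round(1)
    # Columns with the most missing data come first.
    return summary.sort_values("missing_pct", ascending=False)


# Example usage with a tiny frame.
demo = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "class_size": [37.0, None, 42.0, 43.0],
})
report = missing_summary(demo)
```

On the project data the call would be `missing_summary(df_pisa_new)`.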

In [7]:
df_pisa_new.to_csv('./data/pisa2012_data_clean.csv', index=False)