# SHAP Report - PISA 2022 Amazon SageMaker XGBoost
_**This notebook runs a SHAP report to explain the variables in our model**_

---

## Contents

1. [Background](#Background)
1. [Preparation](#Preparation)
1. [Data](#Data)
    1. [Exploration](#Exploration)
    1. [Transformation](#Transformation)
1. [Training](#Training)
1. [Hosting](#Hosting)
1. [Evaluation](#Evaluation)
1. [Extensions](#Extensions)

---

## Background

This notebook runs an Amazon SageMaker pipeline to predict whether students will fall behind in Math, using the PISA 2022 dataset. The steps include:

* Preparing your Amazon SageMaker notebook
* Downloading data from the internet into Amazon SageMaker
* Investigating and transforming the data so that it can be fed to Amazon SageMaker algorithms
* Estimating a model using the Gradient Boosting algorithm
* Evaluating the effectiveness of the model
* Setting the model up to make on-going predictions

---

## Preparation

_This notebook was created and tested on an ml.m4.xlarge notebook instance._

Let's start by specifying:

- The S3 bucket and prefix that you want to use for training and model data.  This should be within the same region as the Notebook Instance, training, and hosting.
- The IAM role ARN used to give training and hosting access to your data. See the documentation for how to create these. Note that if more than one role is required for notebook instances, training, and/or hosting, you should replace the boto regexp with the appropriate full IAM role ARN string(s).

In [2]:
# cell 02
import sagemaker
bucket=sagemaker.Session().default_bucket()
prefix = 'sagemaker/DEMO-xgboost-dm-1500-samples'
 
# Define IAM role
import boto3
import re
from sagemaker import get_execution_role

role = get_execution_role()
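
If `get_execution_role()` does not return the role you need, the role can be supplied as a literal ARN string instead. The sketch below only checks that a string has the general shape of an IAM role ARN before it is used. The account ID and role name are made-up placeholders, `looks_like_role_arn` is a hypothetical helper, and the pattern is a simplified assumption rather than the full IAM grammar.

```python
import re

# Simplified shape of an IAM role ARN (an assumption, not the full grammar):
# arn:<partition>:iam::<12-digit account id>:role/<optional path and role name>
ROLE_ARN_PATTERN = re.compile(r"arn:aws[a-z\-]*:iam::\d{12}:role/[\w+=,.@\-/]+")


def looks_like_role_arn(value: str) -> bool:
    """Return True when `value` has the general shape of an IAM role ARN."""
    return ROLE_ARN_PATTERN.fullmatch(value) is not None


# Hypothetical placeholder; substitute the ARN of a real role in your account.
role = "arn:aws:iam::123456789012:role/ExampleSageMakerExecutionRole"
if not looks_like_role_arn(role):
    raise ValueError(f"Not an IAM role ARN: {role}")
```

A shape check like this catches copy-and-paste mistakes early, but it does not confirm that the role exists or has the right permissions.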

Now let's bring in the Python libraries that we'll use throughout the analysis.

In [3]:
# cell 03
import numpy as np                                # For matrix operations and numerical processing
import pandas as pd                               # For munging tabular data
import matplotlib.pyplot as plt                   # For charts and visualizations
from IPython.display import Image                 # For displaying images in the notebook
from IPython.display import display               # For displaying outputs in the notebook
from time import gmtime, strftime                 # For labeling SageMaker models, endpoints, etc.
import sys                                        # For writing outputs to notebook
import math                                       # For ceiling function
import json                                       # For parsing hosting outputs
import os                                         # For manipulating filepath names
import sagemaker                                  # Amazon SageMaker's Python SDK provides many helper functions
import zipfile                                    # For working with zip archives
import mlflow

In [4]:
# cell 04
pd.__version__

'2.2.3'

Make sure the pandas version is 1.2.4 or later. If it is not, update pandas and restart the kernel before going further.
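
The version requirement can also be checked in code rather than by eye. This is a minimal sketch: `version_tuple` and `MINIMUM_PANDAS` are names introduced here for illustration, and the version string is hard-coded to the value printed above so the block runs on its own (in the notebook it would be `pd.__version__`).

```python
import re


def version_tuple(version: str) -> tuple:
    """Turn a version string such as '2.2.3' into a comparable tuple (2, 2, 3)."""
    # Only the leading numeric components are used; suffixes such as 'rc1' are ignored.
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version}")
    return tuple(int(part) for part in match.groups(default="0"))


MINIMUM_PANDAS = (1, 2, 4)

# In the notebook this would be pd.__version__; the literal is the value shown above.
installed = "2.2.3"
if version_tuple(installed) < MINIMUM_PANDAS:
    raise RuntimeError(f"pandas {installed} is older than the required 1.2.4")
```

Comparing tuples of integers avoids the pitfall of comparing version strings lexically, where `'1.10.0'` would sort before `'1.2.4'`.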

---

## Download PISA 2022 Prepared Dataset

This is our dataset output from our cleaning notebook [here](https://7z4vtvpqcoxouiu.studio.us-west-2.sagemaker.aws/jupyterlab/default/lab/tree/RTC%3Amids-capstone/notebooks/eda/Data_merging.ipynb)


In [5]:
%%time 

# cell 06

# Define local file path
local_file_path = "PISA_cleaned_dataset.csv"  # Change as needed

# Define S3 details
bucket_name = "sagemaker-us-west-2-986030204467"
file_key = "capstone/testfiles/PISA_cleaned_dataset.csv"

# Check if the file exists locally
if os.path.exists(local_file_path):
    print("📂 Loading data from local file...")
    data = pd.read_csv(local_file_path, usecols=None)
else:
    print("☁️ Downloading data from S3...")
    
    # Create S3 client
    s3_client = boto3.client("s3")

    # Download the file from S3
    response = s3_client.get_object(Bucket=bucket_name, Key=file_key)

    # Read the file into pandas DataFrame
    data = pd.read_csv(response["Body"], usecols=None)

    # Save a local copy for future use
    data.to_csv(local_file_path, index=False)
    print(f"✅ File saved locally as {local_file_path}")

# Display first few rows
#data.head()

pd.set_option('display.max_columns', 500)     # Make sure we can see all of the columns
pd.set_option('display.max_rows', 20)         # Keep the output on one page
data

📂 Loading data from local file...
CPU times: user 23 s, sys: 4.13 s, total: 27.1 s
Wall time: 29.8 s


Output preview, condensed to the leading columns and the last three `LANGN_*` indicator columns; the intermediate columns are elided.

| | CNT | CNTSCHID | CNTSTUID | MATH_Proficient | SISCO | ST347Q01JA | ST347Q02JA | ... | LANGN_920 | LANGN_921 | LANGN_922 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Albania | 800282 | 800001 | 0 | NaN | NaN | NaN | ... | 0 | 0 | 0 |
| 1 | Albania | 800115 | 800002 | 0 | NaN | NaN | NaN | ... | 0 | 0 | 0 |
| 2 | Albania | 800242 | 800003 | 0 | NaN | NaN | NaN | ... | 0 | 0 | 0 |
| 3 | Albania | 800245 | 800005 | 0 | 1.0 | 6.0 | 1.0 | ... | 0 | 0 | 0 |
| 4 | Albania | 800285 | 800006 | 1 | 1.0 | 4.0 | 6.0 | ... | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 591852 | Uzbekistan | 86000120 | 86007488 | 0 | 1.0 | 2.0 | 1.0 | ... | 0 | 0 | 0 |
| 591853 | Uzbekistan | 86000140 | 86007489 | 0 | NaN | NaN | NaN | ... | 0 | 0 | 0 |
| 591854 | Uzbekistan | 86000024 | 86007490 | 0 | 1.0 | 1.0 | 1.0 | ... | 0 | 0 | 0 |
| 591855 | Uzbekistan | 86000174 | 86007491 | 0 | NaN | NaN | NaN | ... | 0 | 0 | 0 |
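
The load cell above reads a local copy when one exists and otherwise downloads the file and saves it for next time. The same pattern can be factored into a small helper, sketched here under the assumption that the download step is passed in as a function. `load_with_cache` and `fake_fetch` are hypothetical names, and the stand-in fetch replaces the real S3 call so the block runs without AWS access.

```python
import os
import tempfile


def load_with_cache(local_path, fetch):
    """Return the bytes at `local_path`; on a miss, call `fetch()` and cache the result."""
    if os.path.exists(local_path):
        with open(local_path, "rb") as handle:
            return handle.read()
    payload = fetch()  # e.g. a function that wraps the S3 download
    with open(local_path, "wb") as handle:
        handle.write(payload)
    return payload


# Demonstration with a stand-in fetch function instead of a real S3 call.
calls = []


def fake_fetch():
    calls.append(1)
    return b"CNT,MATH_Proficient\nAlbania,0\n"


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cached.csv")
    first = load_with_cache(path, fake_fetch)   # cache miss: fetch runs
    second = load_with_cache(path, fake_fetch)  # cache hit: fetch is skipped
```

Keeping the fetch step as a parameter makes the caching logic easy to exercise without network access.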


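Let's talk about the data.  At a high level, we can see:

* There are 591,857 rows, and the preview spans countries (`CNT`) from Albania to Uzbekistan.
* Each row carries school and student identifiers (`CNTSCHID`, `CNTSTUID`), questionnaire items and derived indices, and indicator columns such as `LANGN_*`.
* Many values are missing (`NaN`), so the model has to cope with gaps in the features.

_**Specifics on each of the features:**_

*Target variable:*
* `MATH_Proficient`: Is the student proficient in Math per PISA statistics? (binary: 1 for yes, 0 for no)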
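
Two quick summaries are useful before modeling: the share of each class in the target, and the fraction of missing values per column. The sketch below uses a tiny stand-in frame with illustrative values so that it runs on its own; in the notebook the same calls would be applied to `data`.

```python
import numpy as np
import pandas as pd

# Tiny stand-in frame with illustrative values; in the notebook use `data` instead.
toy = pd.DataFrame(
    {
        "MATH_Proficient": [0, 0, 0, 1, 1],
        "SISCO": [np.nan, np.nan, 1.0, 1.0, 1.0],
        "HOMEPOS": [1.6, -3.8, 0.2, -2.6, -0.6],
    }
)

# Share of each class in the target column.
class_share = toy["MATH_Proficient"].value_counts(normalize=True)

# Fraction of missing values per column, largest first.
missing_share = toy.isna().mean().sort_values(ascending=False)

print(class_share)
print(missing_share)
```

A strongly imbalanced target or columns that are mostly missing are both worth knowing about before interpreting model accuracy or SHAP values.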

### Exploration
Let's start exploring the data in our data prep widget.  First, let's understand how the features are distributed.

In [6]:
print(data['MATH_Proficient'].shape)

(591857,)


In [7]:
print(data.columns.duplicated().any()) 

False


In [8]:
# cell 10
# The cleaned dataset already contains indicator columns (e.g. LANGN_*), so one-hot encoding is skipped
# model_data = pd.get_dummies(data, dtype=float)
model_data = data

Another question to ask yourself before building a model is whether certain features will add value in your final use case.  For example, if your goal is to deliver the best prediction, then will you have access to that data at the moment of prediction?  Knowing it's raining is highly predictive for umbrella sales, but forecasting weather far enough out to plan inventory on umbrellas is probably just as difficult as forecasting umbrella sales without knowledge of the weather.  So, including this in your model may give you a false sense of precision.

Following this logic, the same question applies to the PISA questionnaire variables: a feature only helps if it would be known for a student at the time a prediction is needed. In this notebook no columns are removed on these grounds; `model_data` is the cleaned dataset as loaded.

When building a model whose primary goal is to predict a target value on new data, it is important to understand overfitting.  Supervised learning models are designed to minimize error between their predictions of the target value and actuals, in the data they are given.  This last part is key, as frequently in their quest for greater accuracy, machine learning models bias themselves toward picking up on minor idiosyncrasies within the data they are shown.  These idiosyncrasies then don't repeat themselves in subsequent data, so the gain in training accuracy comes at the cost of less accurate predictions on new data.

The most common way of preventing this is to judge a model not only on its fit to the data it was trained on, but also on "new" data.  There are several ways of operationalizing this: holdout validation, cross-validation, leave-one-out validation, etc.  For our purposes, we'll simply randomly split the data into 3 uneven groups.  The model will be trained on 70% of the data, then evaluated on 20% of the data to give us an estimate of the accuracy we hope to have on "new" data, and the remaining 10% will be held back as a final testing dataset to be used later on.

In [9]:
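# cell 12
# Shuffle the rows, then split out the first 70%, the next 20%, and the last 10%.
# Slicing with .iloc gives the same split as np.split without its DataFrame deprecation warning.
shuffled = model_data.sample(frac=1, random_state=1729)
n_train, n_val = int(0.7 * len(shuffled)), int(0.9 * len(shuffled))
train_data, validation_data, test_data = shuffled.iloc[:n_train], shuffled.iloc[n_train:n_val], shuffled.iloc[n_val:]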
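
A split like the one above can be sanity-checked by confirming the group sizes and that every row lands in exactly one group. This is a minimal sketch on a 100-row stand-in frame; `frame` and `row_id` are names introduced here for illustration, not columns of the PISA data.

```python
import pandas as pd

# Stand-in for `model_data`: 100 rows with a unique id column.
frame = pd.DataFrame({"row_id": range(100), "MATH_Proficient": [0, 1] * 50})

# Shuffle once with a fixed seed, then cut at the 70% and 90% marks.
shuffled = frame.sample(frac=1, random_state=1729)
train_end = int(0.7 * len(shuffled))
validation_end = int(0.9 * len(shuffled))

train = shuffled.iloc[:train_end]
validation = shuffled.iloc[train_end:validation_end]
test = shuffled.iloc[validation_end:]

# Collect the ids from all three groups to check that no row is lost or repeated.
all_ids = pd.concat([train, validation, test])["row_id"]
print(len(train), len(validation), len(test), all_ids.is_unique)
```

Because the shuffle uses a fixed `random_state`, rerunning the cell reproduces the same three groups.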


Amazon SageMaker's XGBoost container expects data in the libSVM or CSV data format.  For this example, we'll stick to CSV.  Note that the first column must be the target variable and the CSV should not include headers.  Also, notice that although repetitive it's easiest to do this after the train|validation|test split rather than before.  This avoids any misalignment issues due to random reordering.

In [10]:
# cell 13
# Drop non-numeric columns (e.g., country names or IDs that are not numeric)
non_numeric_columns = train_data.select_dtypes(exclude=['number']).columns
train_data = train_data.drop(columns=non_numeric_columns)
validation_data = validation_data.drop(columns=non_numeric_columns)

# Save train dataset with MATH_Proficient as the first column
train_data[['MATH_Proficient'] + [col for col in train_data.columns if col != 'MATH_Proficient']].to_csv('train.csv', index=False, header=False)

# Save validation dataset with MATH_Proficient as the first column
validation_data[['MATH_Proficient'] + [col for col in validation_data.columns if col != 'MATH_Proficient']].to_csv('validation.csv', index=False, header=False)
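
The layout the XGBoost container expects (target first, no header row) can be wrapped in a helper and verified on a small frame. This is a sketch only: `to_xgboost_csv` is a hypothetical name, and the toy values are illustrative.

```python
import pandas as pd


def to_xgboost_csv(frame: pd.DataFrame, target: str) -> str:
    """Serialize `frame` as header-less CSV text with `target` as the first column."""
    ordered = [target] + [col for col in frame.columns if col != target]
    return frame[ordered].to_csv(index=False, header=False)


# Toy frame with the target deliberately placed in the middle.
toy = pd.DataFrame({"HOMEPOS": [0.5, -1.25], "MATH_Proficient": [0, 1], "GRADE": [0.0, -1.0]})

csv_text = to_xgboost_csv(toy, "MATH_Proficient")
rows = [line.split(",") for line in csv_text.splitlines()]
print(rows)
```

Reading the first field of each row back confirms that the label, not a feature or an index, occupies the first column.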


In [11]:
train_data = pd.concat([train_data["MATH_Proficient"], train_data.drop(["MATH_Proficient"], axis=1)], axis=1)
validation_data = pd.concat([validation_data["MATH_Proficient"], validation_data.drop(["MATH_Proficient"], axis=1)], axis=1)
test_data = pd.concat([test_data["MATH_Proficient"], test_data.drop(["MATH_Proficient"], axis=1)], axis=1)

In [12]:
train_data.head(5)

Unnamed: 0,MATH_Proficient,CNTSCHID,CNTSTUID,SISCO,ST347Q01JA,ST347Q02JA,ST349Q01JA_0,ST349Q01JA_1,ST349Q01JA_2,ST349Q01JA_3,ST349Q01JA_4,ST350Q01JA,ST356Q01JA,ST322Q01JA,ST322Q02JA,ST322Q03JA,ST322Q04JA,ST322Q06JA,ST322Q07JA,DURECEC,EFFORT1,EFFORT2,ST259Q01JA,WB164Q01HA,HOMEPOS,ST004D01T,GRADE,REPEAT,EXPECEDU,ICTAVSCH,ICTAVHOM,ICTDISTR,IMMIG,TARDYSD,ST226Q01JA,ST016Q01NA,MISSSC,Option_UH,OECD,PAREDINT,BMMJ1,BFMJ2,WB163Q06HA,WB163Q07HA,ST230Q01JA,SKIPPING,IC180Q01JA,IC180Q08JA,ST059Q02JA,ST296Q04JA,WB176Q01HA,STUDYHMW,IC184Q01JA,IC184Q02JA,IC184Q03JA,IC184Q04JA,ST059Q01TA,ST296Q01JA,ST272Q01JA,ST268Q01JA,ST268Q04JA,ST268Q07JA,ST293Q04JA,ST297Q01JA,ST297Q03JA,ST297Q05JA,ST297Q06JA,ST297Q07JA,ST297Q09JA,WB165Q01HA,WB166Q01HA,WB166Q02HA,WB166Q03HA,WB166Q04HA,ST258Q01JA,ST294Q01JA,ST295Q01JA,WB150Q01HA,WB156Q01HA,WB158Q01HA,WB160Q01HA,WB161Q01HA,WB171Q01HA,WB171Q02HA,WB171Q03HA,WB171Q04HA,WB172Q01HA,WB173Q01HA,WB173Q02HA,WB173Q03HA,WB173Q04HA,WB177Q01HA,WB177Q02HA,WB177Q03HA,WB177Q04HA,WB032Q01NA,WB032Q02NA,WB031Q01NA,EXERPRAC,STUBMI,RELATST,BELONG,BULLIED,FEELSAFE,SCHRISK,PERSEVAGR,CURIOAGR,COOPAGR,EMPATAGR,ASSERAGR,STRESAGR,EMOCOAGR,GROSAGR,INFOSEEK,FAMSUP,DISCLIM,TEACHSUP,COGACRCO,COGACMCO,EXPOFA,EXPO21ST,MATHEFF,MATHEF21,FAMCON,ANXMAT,MATHPERS,CREATEFF,CREATSCH,CREATFAM,CREATAS,CREATOOS,CREATOP,OPENART,IMAGINE,SCHSUST,LEARRES,PROBSELF,FAMSUPSL,FEELLAH,SDLEFF,ICTRES,ESCS,FLSCHOOL,FLMULTSB,FLFAMILY,ACCESSFP,FLCONFIN,FLCONICT,ACCESSFA,ATTCONFM,FRINFLFM,ICTSCH,ICTHOME,ICTQUAL,ICTSUBJ,ICTENQ,ICTFEED,ICTOUT,ICTWKDY,ICTWKEND,ICTREG,ICTINFO,ICTEFFIC,BODYIMA,SOCONPA,LIFESAT,PSYCHSYM,SOCCON,EXPWB,CURSUPP,PQMIMP,PQMCAR,PARINVOL,PQSCHOOL,PASCHPOL,ATTIMMP,CREATHME,CREATACT,CREATOPN,CREATOR,WORKPAY,WORKHOME,SC001Q01TA,SC211Q01JA,SC211Q02JA,SC211Q03JA,SC211Q04JA,SC211Q05JA,SC211Q06JA,SC209Q04JA,SC209Q05JA,SC209Q06JA,SC037Q11JA,SC183Q02JA,SC183Q03JA,SC183Q04JA,SC175Q01JA,SC177Q01JA_1,SC177Q01JA_2,SC177Q01JA_3,SC177Q02JA_1,SC177Q02JA_2,SC177Q02JA_3,SC177Q03JA_1,SC177Q03JA_2,SC177Q03
JA_3,SC188Q01JA,SC188Q02JA,SC188Q03JA,SC188Q04JA,SC188Q05JA,SC188Q06JA,SC188Q07JA,SC188Q08JA,SC188Q09JA,SC188Q10JA,SC188Q11JA,SC198Q01JA,SC198Q02JA,SC198Q03JA,SC178Q01JA,SC178Q02JA,SC180Q01JA,SC189Q02WA,SC189Q03WA,SC189Q04WA,SMRATIO,MCLSIZE,MACTIV,MATHEXC_0,MATHEXC_1,MATHEXC_2,MATHEXC_3,ABGMATH,SC064Q05WA,SC064Q06WA,SC064Q01TA,SC064Q02TA,SC064Q04NA,SC064Q03TA,SC064Q07WA,SC213Q01JA,SC213Q02JA,SC037Q01TA,SC037Q02TA,SC037Q03TA,SC037Q04TA,SC037Q05NA,SC037Q06NA,SC037Q07TA,...,DIGDVPOL,TEAFDBK,MTTRAIN,DMCVIEWS,NEGSCLIM,STAFFSHORT,EDUSHORT,STUBEHA,TEACHBEHA,STDTEST,TDTEST,ALLACTIV,BCREATSC,CREENVSC,ACTCRESC,OPENCUL,PROBSCRI,SCPREPBP,SCPREPAP,DIGPREP,LANGN_105,LANGN_108,LANGN_112,LANGN_113,LANGN_118,LANGN_121,LANGN_130,LANGN_133,LANGN_137,LANGN_140,LANGN_147,LANGN_148,LANGN_150,LANGN_154,LANGN_156,LANGN_160,LANGN_170,LANGN_195,LANGN_200,LANGN_202,LANGN_204,LANGN_232,LANGN_237,LANGN_244,LANGN_246,LANGN_254,LANGN_258,LANGN_263,LANGN_264,LANGN_266,LANGN_272,LANGN_273,LANGN_275,LANGN_286,LANGN_301,LANGN_313,LANGN_316,LANGN_317,LANGN_322,LANGN_325,LANGN_327,LANGN_329,LANGN_338,LANGN_340,LANGN_344,LANGN_351,LANGN_358,LANGN_363,LANGN_369,LANGN_371,LANGN_375,LANGN_379,LANGN_381,LANGN_382,LANGN_383,LANGN_404,LANGN_409,LANGN_415,LANGN_420,LANGN_422,LANGN_428,LANGN_434,LANGN_442,LANGN_449,LANGN_451,LANGN_463,LANGN_465,LANGN_467,LANGN_471,LANGN_472,LANGN_474,LANGN_492,LANGN_493,LANGN_494,LANGN_495,LANGN_496,LANGN_500,LANGN_503,LANGN_514,LANGN_517,LANGN_520,LANGN_523,LANGN_527,LANGN_529,LANGN_531,LANGN_540,LANGN_547,LANGN_555,LANGN_561,LANGN_562,LANGN_563,LANGN_565,LANGN_566,LANGN_567,LANGN_600,LANGN_601,LANGN_602,LANGN_605,LANGN_606,LANGN_607,LANGN_608,LANGN_611,LANGN_614,LANGN_615,LANGN_616,LANGN_618,LANGN_619,LANGN_621,LANGN_622,LANGN_623,LANGN_624,LANGN_625,LANGN_626,LANGN_627,LANGN_628,LANGN_630,LANGN_631,LANGN_634,LANGN_635,LANGN_639,LANGN_640,LANGN_641,LANGN_642,LANGN_648,LANGN_650,LANGN_661,LANGN_662,LANGN_663,LANGN_665,LANGN_666,LANGN_667,LANGN_668,LANGN_669,LANGN_670,LANGN_673
,LANGN_674,LANGN_675,LANGN_676,LANGN_677,LANGN_678,LANGN_800,LANGN_801,LANGN_802,LANGN_804,LANGN_805,LANGN_806,LANGN_807,LANGN_808,LANGN_809,LANGN_810,LANGN_811,LANGN_812,LANGN_813,LANGN_814,LANGN_815,LANGN_816,LANGN_817,LANGN_818,LANGN_819,LANGN_821,LANGN_823,LANGN_824,LANGN_825,LANGN_826,LANGN_827,LANGN_828,LANGN_829,LANGN_831,LANGN_832,LANGN_833,LANGN_836,LANGN_837,LANGN_838,LANGN_839,LANGN_840,LANGN_841,LANGN_842,LANGN_843,LANGN_844,LANGN_845,LANGN_846,LANGN_849,LANGN_850,LANGN_851,LANGN_852,LANGN_854,LANGN_855,LANGN_857,LANGN_859,LANGN_860,LANGN_861,LANGN_865,LANGN_866,LANGN_868,LANGN_870,LANGN_872,LANGN_873,LANGN_877,LANGN_879,LANGN_881,LANGN_885,LANGN_890,LANGN_892,LANGN_895,LANGN_896,LANGN_897,LANGN_898,LANGN_899,LANGN_900,LANGN_901,LANGN_902,LANGN_903,LANGN_904,LANGN_905,LANGN_906,LANGN_907,LANGN_908,LANGN_909,LANGN_910,LANGN_911,LANGN_912,LANGN_913,LANGN_914,LANGN_916,LANGN_917,LANGN_918,LANGN_919,LANGN_920,LANGN_921,LANGN_922
150693,0,21400051,21407678,1.0,3.0,3.0,0,0,1,0,0,1.0,3.0,5.0,5.0,5.0,5.0,,5.0,,9.0,9.0,5.0,,0.1762,2.0,0.0,0.0,9.0,7.0,6.0,8.0,1.0,1.0,3.0,8.0,0.0,0,0,16.0,85.85,51.5,,,4.0,0.0,3.0,3.0,5.0,2.0,,7.0,4.0,4.0,4.0,5.0,4.0,2.0,8.0,2.0,3.0,3.0,4.0,1.0,1.0,1.0,1.0,1.0,1.0,,,,,,1.0,6.0,6.0,,,,,,,,,,,,,,,,,,,,,,8.0,,1.0229,-0.5294,0.1361,0.3308,1.18,,-0.7671,,,,,-0.205,0.2929,2.5175,-0.5003,-0.1361,-0.5635,0.109,0.4899,0.0338,-0.0418,0.1188,0.0692,0.4693,0.3025,-0.3179,-0.3994,-0.6108,-0.9716,1.46,2.1024,-0.7465,0.0695,-1.35,-0.2772,0.261,0.9816,0.1991,,-0.5139,0.4565,0.9877,,,,,,,,,,0.4062,0.3346,0.3623,0.7732,0.2466,2.1365,-0.0426,1.2041,1.0735,0.7874,0.1614,-0.2067,,,,,,,1.2128,-0.4256,1.7522,1.3712,,,-1.1773,-0.1759,2.0979,-0.0483,0.1257,7.0,7.0,5.0,5.0,6.0,80.0,2.0,0.0,0.0,1.0,1.0,1.0,2.0,1.0,1.0,1.0,50.0,0,0,1,0,0,1,0,0,1,4.0,4.0,4.0,4.0,4.0,4.0,3.0,4.0,3.0,4.0,3.0,2.0,1.0,1.0,85.0,15.0,1.0,2.0,1.0,1.0,100.0,38.0,5.0,0,0,0,1,3.0,58.0,100.0,13.0,99.0,83.0,100.0,68.0,10.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.2275,1.5946,1.0982,0.3708,-0.6534,-0.467,-0.2376,-1.4329,-0.5079,1.6312,2.0131,1.5599,1.4387,1.578,0.7239,1.4448,0.0436,1.3156,0.8462,0.4339,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
120002,0,18800072,18804079,1.0,5.0,2.0,0,1,0,0,0,1.0,1.0,1.0,1.0,,5.0,3.0,3.0,,8.0,10.0,,,0.6583,1.0,0.0,0.0,9.0,7.0,6.0,4.0,1.0,1.0,1.0,8.0,0.0,1,1,,,,1.0,1.0,4.0,1.0,1.0,1.0,60.0,6.0,2.0,2.0,1.0,1.0,1.0,1.0,4.0,1.0,2.0,1.0,1.0,4.0,,1.0,0.0,1.0,0.0,0.0,0.0,2.0,3.0,2.0,1.0,1.0,,,6.0,,4.0,7.0,2.0,3.0,4.0,1.0,1.0,4.0,2.0,1.0,4.0,1.0,4.0,2.0,2.0,1.0,1.0,1.0,1.0,2.0,0.0,,-0.2699,0.195,-1.228,-1.2649,-0.6386,0.3594,-0.4795,0.4156,,1.0255,0.2862,-0.1211,-1.0261,-0.0052,,1.0945,-0.1002,-0.591,-1.0058,-0.4958,-0.3435,-1.6189,-1.2748,-0.0061,2.5078,-0.1911,0.6074,0.9025,-0.6532,0.876,0.5336,0.6142,1.052,2.4644,-0.288,-0.164,0.4118,0.0615,-2.4279,1.4238,,,-1.5638,-1.6481,1.8952,0.5847,-0.3983,1.2419,0.8627,1.1176,-0.8748,0.4062,0.3346,-0.5359,-1.0124,-0.2535,1.544,0.7883,1.122,0.4184,-2.672,-1.1989,0.2523,,0.4636,,0.8209,-0.3565,0.1262,0.1363,1.2955,0.4708,-0.1007,0.914,1.5845,,1.7914,-0.9143,0.5735,-0.4188,4.0,4.0,2.0,0.0,49.0,60.0,14.0,53.0,0.0,1.0,1.0,1.0,1.0,2.0,2.0,2.0,40.0,0,0,0,0,0,0,0,0,0,1.0,1.0,4.0,1.0,1.0,1.0,1.0,2.0,4.0,4.0,4.0,2.0,2.0,1.0,95.0,5.0,2.0,2.0,1.0,1.0,100.0,33.0,0.0,0,0,0,0,3.0,47.0,17.0,9.0,13.0,22.0,3.0,16.0,0.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,...,-0.1783,-0.5043,-0.2052,-0.2451,2.2029,4.0442,2.4611,3.441,3.7879,-1.6402,-0.829,-0.5327,1.2301,-1.7095,-1.5369,0.4555,,,,-1.8306,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
327356,1,45800120,45809473,1.0,5.0,1.0,0,0,1,0,0,1.0,3.0,,1.0,,1.0,,5.0,3.0,10.0,10.0,7.0,,-0.4027,2.0,0.0,0.0,6.0,7.0,6.0,14.0,1.0,0.0,1.0,8.0,0.0,0,0,16.0,82.41,82.41,,,4.0,0.0,3.0,1.0,8.0,4.0,,10.0,4.0,4.0,1.0,1.0,5.0,6.0,7.0,2.0,2.0,3.0,,0.0,0.0,1.0,1.0,0.0,0.0,,,,,,1.0,6.0,6.0,,,,,,,,,,,,,,,,,,,,,,4.0,,,-0.4425,0.5767,-0.756,-0.6386,,-0.0497,,,,,-0.2353,-0.3061,-1.405,0.6139,0.1997,0.4357,-1.2505,0.1899,0.3206,-0.6058,-1.251,-0.6116,,0.7429,-0.5537,-0.9074,0.0048,-0.0326,0.4667,-0.8105,-0.4635,-0.1977,,-0.4972,0.6522,-0.3551,0.895,0.7659,0.1885,1.1731,0.6776,-0.4197,1.8332,-0.4442,-0.4783,0.6582,-0.1077,-0.3802,0.2372,0.2614,0.4062,0.3346,0.3623,,-0.6375,0.0753,0.4974,0.963,0.85,0.2629,0.6984,-0.4661,,,,,,,,,,,,,,,,,,0.0,10.0,4.0,45.0,0.0,2.0,0.0,0.0,0.0,1.0,1.0,1.0,2.0,1.0,1.0,1.0,35.0,0,0,0,0,0,1,0,0,0,4.0,3.0,3.0,4.0,4.0,3.0,3.0,4.0,3.0,4.0,1.0,2.0,1.0,1.0,35.0,65.0,1.0,2.0,1.0,2.0,100.0,48.0,4.0,0,0,0,1,3.0,10.0,50.0,10.0,50.0,10.0,10.0,10.0,309.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,0.5837,1.8558,1.0982,,0.6294,,0.1,1.8887,1.8674,1.3326,0.7755,0.8255,-0.6554,-0.2686,0.0506,-1.3522,1.2328,-0.8314,,-0.2189,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
81627,1,12400760,12413119,1.0,2.0,1.0,0,1,0,0,0,1.0,4.0,,5.0,5.0,2.0,,1.0,0.0,10.0,10.0,,,-0.1872,1.0,0.0,0.0,8.0,,,,1.0,0.0,1.0,,0.0,1,1,12.0,,13.35,,,1.0,0.0,,,4.0,3.0,,5.0,,,,,4.0,1.0,8.0,4.0,3.0,4.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,,,,,,1.0,6.0,6.0,,,,,,,,,,,,,,,,,,,,,,5.0,,,2.7562,-1.228,1.1246,,0.5139,0.7629,3.7863,0.4142,-1.148,-0.9078,-0.1006,-0.3061,-0.6426,1.554,-0.1006,,0.0702,0.9568,-0.0388,0.5506,-0.4923,0.5356,-0.3589,-0.2704,0.8088,-0.2399,0.1117,-0.0326,1.4299,0.279,-0.3661,1.2543,0.3733,0.1542,0.3135,0.7194,0.5727,0.1178,0.1697,0.1083,-1.212,-0.4197,1.1994,-2.4821,1.0407,-1.3593,-1.1759,1.8694,0.3515,0.0098,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0.0,0.0,2.0,,2.0,,,,,2.0,1.0,1.0,3.0,2.0,2.0,2.0,60.0,0,0,1,0,0,0,0,0,1,4.0,4.0,1.0,4.0,2.0,2.0,1.0,1.0,1.0,4.0,1.0,2.0,1.0,1.0,99.0,1.0,1.0,2.0,1.0,1.0,,,3.0,0,0,0,1,1.0,47.0,5.0,38.0,28.0,7.0,3.0,5.0,100.0,20.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,,-1.9591,-1.6751,,-0.9368,,-1.4797,0.1661,-2.0409,-0.4484,0.2342,-0.3064,,0.8144,,-1.0918,-0.2084,0.7337,0.8462,-0.2207,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
210388,0,32000023,32002783,,,,0,0,0,0,0,,,,,,,,,,,,4.0,,0.8679,1.0,-2.0,1.0,,,,,1.0,0.0,,10.0,0.0,0,0,9.0,23.53,30.34,,,4.0,0.0,,,6.0,4.0,,10.0,,,,,4.0,1.0,9.0,,,,,0.0,0.0,0.0,0.0,0.0,0.0,,,,,,5.0,6.0,6.0,,,,,,,,,,,,,,,,,,,,,,2.0,,2.715,-0.1128,-1.228,1.1246,,,1.2273,,,,,0.5387,,,,1.8876,1.5558,1.6688,,-0.2632,,,,,,,,,,,,,,,,,,,,,-0.1529,-0.9521,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,5.0,10.0,2.0,15.0,0.0,70.0,0.0,35.0,0.0,1.0,1.0,1.0,2.0,1.0,1.0,1.0,45.0,0,0,1,0,0,1,0,0,1,4.0,3.0,4.0,4.0,1.0,3.0,2.0,4.0,3.0,4.0,3.0,2.0,1.0,1.0,30.0,70.0,1.0,2.0,2.0,1.0,39.3333,13.0,2.0,0,0,1,0,3.0,85.0,65.0,75.0,70.0,0.0,0.0,0.0,0.0,,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.2103,,1.0982,0.3981,-1.1403,-0.4314,-0.3878,,-1.419,,0.8207,0.37,1.4387,1.0793,0.5312,1.7884,,,,1.9471,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


Now we'll copy the files to S3 for Amazon SageMaker's managed training to pick up.

In [13]:
# cell 14
boto3.Session().resource('s3').Bucket(bucket).Object(os.path.join(prefix, 'train/train.csv')).upload_file('train.csv')
boto3.Session().resource('s3').Bucket(bucket).Object(os.path.join(prefix, 'validation/validation.csv')).upload_file('validation.csv')
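
The upload calls above build their S3 keys with `os.path.join`, which yields forward slashes on a Linux notebook instance but would not on Windows. As a minimal sketch, assuming you want keys that do not depend on the local OS, the helpers below join key parts with `posixpath` instead. The names `s3_key`, `s3_uri`, and `example-bucket` are hypothetical and not part of this notebook; the prefix is the one defined in the Preparation section.

```python
import posixpath


def s3_key(prefix, *parts):
    """Join S3 key components with forward slashes, regardless of the local OS."""
    return posixpath.join(prefix, *parts)


def s3_uri(bucket, key):
    """Format a bucket name and key as an s3:// URI."""
    return "s3://{}/{}".format(bucket, key)


# Hypothetical bucket name, used only for illustration.
example_bucket = "example-bucket"
example_prefix = "sagemaker/DEMO-xgboost-dm-1500-samples"

train_key = s3_key(example_prefix, "train", "train.csv")
validation_key = s3_key(example_prefix, "validation", "validation.csv")

print(s3_uri(example_bucket, train_key))
print(s3_uri(example_bucket, validation_key))
```

The resulting key could be passed to `Object(...)` in place of the `os.path.join(...)` expression.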

---

## End of Lab 1


---

## Training
Now we know that most of our features have skewed distributions, some are highly correlated with one another, and some appear to have non-linear relationships with our target variable.  Also, for identifying students who are likely to fall behind in math, good predictive accuracy is preferred to being able to explain directly why a given student was flagged.  Taken together, these aspects make gradient boosted trees a good candidate algorithm.

There are several intricacies to understanding the algorithm, but at a high level, gradient boosted trees work by combining predictions from many simple models, each of which tries to address the weaknesses of the previous models.  By doing this, the collection of simple models can actually outperform large, complex models.  Other Amazon SageMaker notebooks elaborate further on gradient boosted trees and how they differ from similar algorithms.
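
To make the idea of "each model addresses the weaknesses of the previous models" concrete, here is a minimal pure-Python sketch of boosting. It is not how `xgboost` is implemented: it uses one-split trees (stumps), a single feature, and squared-error loss rather than the `binary:logistic` objective used later, and the toy data is made up for illustration. The parameter names `eta` and `num_round` are borrowed from the hyperparameters set below.

```python
def fit_stump(xs, residuals):
    """Find the one-threshold split on xs that minimises squared error on the residuals."""
    best = None
    # Exclude the largest value so the right-hand side is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1], best[2], best[3]


def stump_predict(stump, x):
    """Return the leaf value of the stump for a single input."""
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def boost(xs, ys, num_round=20, eta=0.2):
    """Fit stumps sequentially, each one to the residuals left by the ensemble so far."""
    base = sum(ys) / len(ys)            # start from the mean prediction
    preds = [base] * len(ys)
    stumps, history = [], []
    for _ in range(num_round):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Shrink each stump's contribution by the learning rate eta.
        preds = [p + eta * stump_predict(stump, x) for x, p in zip(xs, preds)]
        history.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))
    return base, stumps, history


# Toy data, invented for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [0.1, 0.3, 0.2, 1.1, 1.9, 2.2, 3.8, 4.1]

base, stumps, history = boost(xs, ys)
print("MSE after round 1: {:.4f}".format(history[0]))
print("MSE after round {}: {:.4f}".format(len(history), history[-1]))
```

Each round fits a new stump to what the current ensemble still gets wrong, so the training error cannot increase from one round to the next in this squared-error setting.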

`xgboost` is an extremely popular, open-source package for gradient boosted trees.  It is computationally powerful, fully featured, and has been successfully used in many machine learning competitions.  Let's start with a simple `xgboost` model, trained using Amazon SageMaker's managed, distributed training framework.

First we'll need to specify the ECR container location for Amazon SageMaker's implementation of XGBoost.

In [14]:
# cell 15
container = sagemaker.image_uris.retrieve(region=boto3.Session().region_name, framework='xgboost', version='latest')

Then, because we're training with the CSV file format, we'll create `TrainingInput`s that our training function can use as a pointer to the files in S3, which also specify that the content type is CSV.

In [15]:
# cell 16
s3_input_train = sagemaker.inputs.TrainingInput(s3_data='s3://{}/{}/train/'.format(bucket, prefix), content_type='csv')
s3_input_validation = sagemaker.inputs.TrainingInput(s3_data='s3://{}/{}/validation/'.format(bucket, prefix), content_type='csv')
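
For CSV input, the built-in XGBoost container expects no header row and the label in the first column (the training log further down reports `label_column=0`). As a minimal sketch of that layout using only the standard library, the helper below writes rows with the label first and no header. The function name `to_xgboost_csv` and the toy values are hypothetical, and treating `MATH_Proficient` as the label is an assumption based on the column names shown in this notebook.

```python
import csv
import io


def to_xgboost_csv(rows, columns, label):
    """Return CSV text with the label column first and no header row."""
    ordered = [label] + [c for c in columns if c != label]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        writer.writerow([row[c] for c in ordered])
    return buffer.getvalue()


# Toy rows with invented values; column names are taken from the data preview.
columns = ["HOMEPOS", "MATH_Proficient", "ESCS"]
rows = [
    {"HOMEPOS": 0.5, "MATH_Proficient": 0, "ESCS": -1.25},
    {"HOMEPOS": -0.75, "MATH_Proficient": 1, "ESCS": 2.0},
]

csv_text = to_xgboost_csv(rows, columns, label="MATH_Proficient")
print(csv_text)
```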

Next we'll need to specify training parameters to the estimator.  This includes:
1. The `xgboost` algorithm container
1. The IAM role to use
1. Training instance type and count
1. S3 location for output data
1. Algorithm hyperparameters

And then a `.fit()` function which specifies:
1. S3 location for input data.  In this case we have both a training and validation set which are passed in.

In [16]:
mlflow_arn = "arn:aws:sagemaker:us-west-2:986030204467:mlflow-tracking-server/capstone-ml-flow"
mlflow.set_tracking_uri(mlflow_arn)

In [21]:
# cell 17
sess = sagemaker.Session()

with mlflow.start_run() as run:
    xgb = sagemaker.estimator.Estimator(container,
                                        role, 
                                        instance_count=1, 
                                        instance_type='ml.m4.xlarge',
                                        output_path='s3://{}/{}/output'.format(bucket, prefix),
                                        sagemaker_session=sess)
    xgb.set_hyperparameters(max_depth=5,
                            eta=0.2,
                            gamma=4,
                            min_child_weight=6,
                            subsample=0.8,
                            silent=0,
                            objective='binary:logistic',
                            num_round=100,
                            seed=42,  # Set fixed seed
                            seed_per_iteration=42,  # Ensures same randomness per iteration
                            early_stopping_rounds=10
                           )
    
    xgb.fit({'train': s3_input_train, 'validation': s3_input_validation})
    # Log hyperparameters to MLflow
    mlflow.log_params(xgb.hyperparameters())

INFO:sagemaker:Creating training-job with name: xgboost-2025-02-19-03-39-29-938


2025-02-19 03:39:31 Starting - Starting the training job...
2025-02-19 03:39:46 Starting - Preparing the instances for training...
2025-02-19 03:40:10 Downloading - Downloading input data...
2025-02-19 03:40:50 Downloading - Downloading the training image......
2025-02-19 03:41:46 Training - Training image download completed. Training in progress..
Arguments: train
[2025-02-19:03:41:58:INFO] Running standalone xgboost training.
[2025-02-19:03:41:58:INFO] File size need to be processed in the node: 865.39mb. Available memory size in the node: 8547.92mb
[2025-02-19:03:41:58:INFO] Determined delimiter of CSV input is ','
[03:41:58] S3DistributionType set as FullyReplicated
[03:42:04] 414299x568 matrix with 235321832 entries loaded from /opt/ml/input/data/train?format=csv&label_column=0&delimiter=,
[2025-02-19:03:42:04:INFO] Determined delimiter of CSV input is ','
[03:42:04] S3DistributionType set as FullyReplicated

In [22]:

# Log the trained model
#mlflow.sagemaker.log_model(xgb, artifact_path="model", registered_model_name="PISA_XGBoost")

print("MLflow Run ID:", run.info.run_id)

MLflow Run ID: f2637e08b05a4f218b751f61d7420d49


## Explain the Model using Clarify

In [23]:
from datetime import datetime

session = sagemaker.Session()
model_name = "DEMO-pisa-2022-clarify-model-{}".format(datetime.now().strftime("%d-%m-%Y-%H-%M-%S"))
model = xgb.create_model(name=model_name)
container_def = model.prepare_container_def()
session.create_model(model_name, role, container_def)

INFO:sagemaker:Creating model with name: DEMO-pisa-2022-clarify-model-19-02-2025-03-50-24


'DEMO-pisa-2022-clarify-model-19-02-2025-03-50-24'
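
SageMaker Clarify attributes a prediction to individual features using SHAP values, which it approximates with Kernel SHAP. As a minimal sketch of what those values mean, the code below computes exact Shapley values by enumerating every feature coalition, which is only feasible for a handful of features. It is not Clarify's implementation; `toy_model`, `instance`, and `baseline` are hypothetical stand-ins for the trained model, a student record, and a reference record.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values; features outside a coalition take their baseline value."""
    n = len(instance)

    def value(coalition):
        mixed = [instance[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                total += weight * (value(coalition | {i}) - value(coalition))
        phi.append(total)
    return phi


def toy_model(features):
    """Hypothetical scoring function with one additive term and one interaction."""
    x0, x1, x2 = features
    return 2.0 * x0 + x1 * x2


instance = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]

phi = shapley_values(toy_model, instance, baseline)
print("Contributions:", phi)
print("Sum of contributions:", sum(phi))
print("Prediction minus baseline prediction:", toy_model(instance) - toy_model(baseline))
```

The contributions always sum to the difference between the prediction for the instance and the prediction for the baseline, which is the property that makes SHAP values readable as a per-feature breakdown of a single prediction.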

## Explaining predictions

In [24]:
train_data.head(5)

Unnamed: 0,MATH_Proficient,CNTSCHID,CNTSTUID,SISCO,ST347Q01JA,ST347Q02JA,ST349Q01JA_0,ST349Q01JA_1,ST349Q01JA_2,ST349Q01JA_3,ST349Q01JA_4,ST350Q01JA,ST356Q01JA,ST322Q01JA,ST322Q02JA,ST322Q03JA,ST322Q04JA,ST322Q06JA,ST322Q07JA,DURECEC,EFFORT1,EFFORT2,ST259Q01JA,WB164Q01HA,HOMEPOS,ST004D01T,GRADE,REPEAT,EXPECEDU,ICTAVSCH,ICTAVHOM,ICTDISTR,IMMIG,TARDYSD,ST226Q01JA,ST016Q01NA,MISSSC,Option_UH,OECD,PAREDINT,BMMJ1,BFMJ2,WB163Q06HA,WB163Q07HA,ST230Q01JA,SKIPPING,IC180Q01JA,IC180Q08JA,ST059Q02JA,ST296Q04JA,WB176Q01HA,STUDYHMW,IC184Q01JA,IC184Q02JA,IC184Q03JA,IC184Q04JA,ST059Q01TA,ST296Q01JA,ST272Q01JA,ST268Q01JA,ST268Q04JA,ST268Q07JA,ST293Q04JA,ST297Q01JA,ST297Q03JA,ST297Q05JA,ST297Q06JA,ST297Q07JA,ST297Q09JA,WB165Q01HA,WB166Q01HA,WB166Q02HA,WB166Q03HA,WB166Q04HA,ST258Q01JA,ST294Q01JA,ST295Q01JA,WB150Q01HA,WB156Q01HA,WB158Q01HA,WB160Q01HA,WB161Q01HA,WB171Q01HA,WB171Q02HA,WB171Q03HA,WB171Q04HA,WB172Q01HA,WB173Q01HA,WB173Q02HA,WB173Q03HA,WB173Q04HA,WB177Q01HA,WB177Q02HA,WB177Q03HA,WB177Q04HA,WB032Q01NA,WB032Q02NA,WB031Q01NA,EXERPRAC,STUBMI,RELATST,BELONG,BULLIED,FEELSAFE,SCHRISK,PERSEVAGR,CURIOAGR,COOPAGR,EMPATAGR,ASSERAGR,STRESAGR,EMOCOAGR,GROSAGR,INFOSEEK,FAMSUP,DISCLIM,TEACHSUP,COGACRCO,COGACMCO,EXPOFA,EXPO21ST,MATHEFF,MATHEF21,FAMCON,ANXMAT,MATHPERS,CREATEFF,CREATSCH,CREATFAM,CREATAS,CREATOOS,CREATOP,OPENART,IMAGINE,SCHSUST,LEARRES,PROBSELF,FAMSUPSL,FEELLAH,SDLEFF,ICTRES,ESCS,FLSCHOOL,FLMULTSB,FLFAMILY,ACCESSFP,FLCONFIN,FLCONICT,ACCESSFA,ATTCONFM,FRINFLFM,ICTSCH,ICTHOME,ICTQUAL,ICTSUBJ,ICTENQ,ICTFEED,ICTOUT,ICTWKDY,ICTWKEND,ICTREG,ICTINFO,ICTEFFIC,BODYIMA,SOCONPA,LIFESAT,PSYCHSYM,SOCCON,EXPWB,CURSUPP,PQMIMP,PQMCAR,PARINVOL,PQSCHOOL,PASCHPOL,ATTIMMP,CREATHME,CREATACT,CREATOPN,CREATOR,WORKPAY,WORKHOME,SC001Q01TA,SC211Q01JA,SC211Q02JA,SC211Q03JA,SC211Q04JA,SC211Q05JA,SC211Q06JA,SC209Q04JA,SC209Q05JA,SC209Q06JA,SC037Q11JA,SC183Q02JA,SC183Q03JA,SC183Q04JA,SC175Q01JA,SC177Q01JA_1,SC177Q01JA_2,SC177Q01JA_3,SC177Q02JA_1,SC177Q02JA_2,SC177Q02JA_3,SC177Q03JA_1,SC177Q03JA_2,SC177Q03
JA_3,SC188Q01JA,SC188Q02JA,SC188Q03JA,SC188Q04JA,SC188Q05JA,SC188Q06JA,SC188Q07JA,SC188Q08JA,SC188Q09JA,SC188Q10JA,SC188Q11JA,SC198Q01JA,SC198Q02JA,SC198Q03JA,SC178Q01JA,SC178Q02JA,SC180Q01JA,SC189Q02WA,SC189Q03WA,SC189Q04WA,SMRATIO,MCLSIZE,MACTIV,MATHEXC_0,MATHEXC_1,MATHEXC_2,MATHEXC_3,ABGMATH,SC064Q05WA,SC064Q06WA,SC064Q01TA,SC064Q02TA,SC064Q04NA,SC064Q03TA,SC064Q07WA,SC213Q01JA,SC213Q02JA,SC037Q01TA,SC037Q02TA,SC037Q03TA,SC037Q04TA,SC037Q05NA,SC037Q06NA,SC037Q07TA,...,DIGDVPOL,TEAFDBK,MTTRAIN,DMCVIEWS,NEGSCLIM,STAFFSHORT,EDUSHORT,STUBEHA,TEACHBEHA,STDTEST,TDTEST,ALLACTIV,BCREATSC,CREENVSC,ACTCRESC,OPENCUL,PROBSCRI,SCPREPBP,SCPREPAP,DIGPREP,LANGN_105,LANGN_108,LANGN_112,LANGN_113,LANGN_118,LANGN_121,LANGN_130,LANGN_133,LANGN_137,LANGN_140,LANGN_147,LANGN_148,LANGN_150,LANGN_154,LANGN_156,LANGN_160,LANGN_170,LANGN_195,LANGN_200,LANGN_202,LANGN_204,LANGN_232,LANGN_237,LANGN_244,LANGN_246,LANGN_254,LANGN_258,LANGN_263,LANGN_264,LANGN_266,LANGN_272,LANGN_273,LANGN_275,LANGN_286,LANGN_301,LANGN_313,LANGN_316,LANGN_317,LANGN_322,LANGN_325,LANGN_327,LANGN_329,LANGN_338,LANGN_340,LANGN_344,LANGN_351,LANGN_358,LANGN_363,LANGN_369,LANGN_371,LANGN_375,LANGN_379,LANGN_381,LANGN_382,LANGN_383,LANGN_404,LANGN_409,LANGN_415,LANGN_420,LANGN_422,LANGN_428,LANGN_434,LANGN_442,LANGN_449,LANGN_451,LANGN_463,LANGN_465,LANGN_467,LANGN_471,LANGN_472,LANGN_474,LANGN_492,LANGN_493,LANGN_494,LANGN_495,LANGN_496,LANGN_500,LANGN_503,LANGN_514,LANGN_517,LANGN_520,LANGN_523,LANGN_527,LANGN_529,LANGN_531,LANGN_540,LANGN_547,LANGN_555,LANGN_561,LANGN_562,LANGN_563,LANGN_565,LANGN_566,LANGN_567,LANGN_600,LANGN_601,LANGN_602,LANGN_605,LANGN_606,LANGN_607,LANGN_608,LANGN_611,LANGN_614,LANGN_615,LANGN_616,LANGN_618,LANGN_619,LANGN_621,LANGN_622,LANGN_623,LANGN_624,LANGN_625,LANGN_626,LANGN_627,LANGN_628,LANGN_630,LANGN_631,LANGN_634,LANGN_635,LANGN_639,LANGN_640,LANGN_641,LANGN_642,LANGN_648,LANGN_650,LANGN_661,LANGN_662,LANGN_663,LANGN_665,LANGN_666,LANGN_667,LANGN_668,LANGN_669,LANGN_670,LANGN_673
,LANGN_674,LANGN_675,LANGN_676,LANGN_677,LANGN_678,LANGN_800,LANGN_801,LANGN_802,LANGN_804,LANGN_805,LANGN_806,LANGN_807,LANGN_808,LANGN_809,LANGN_810,LANGN_811,LANGN_812,LANGN_813,LANGN_814,LANGN_815,LANGN_816,LANGN_817,LANGN_818,LANGN_819,LANGN_821,LANGN_823,LANGN_824,LANGN_825,LANGN_826,LANGN_827,LANGN_828,LANGN_829,LANGN_831,LANGN_832,LANGN_833,LANGN_836,LANGN_837,LANGN_838,LANGN_839,LANGN_840,LANGN_841,LANGN_842,LANGN_843,LANGN_844,LANGN_845,LANGN_846,LANGN_849,LANGN_850,LANGN_851,LANGN_852,LANGN_854,LANGN_855,LANGN_857,LANGN_859,LANGN_860,LANGN_861,LANGN_865,LANGN_866,LANGN_868,LANGN_870,LANGN_872,LANGN_873,LANGN_877,LANGN_879,LANGN_881,LANGN_885,LANGN_890,LANGN_892,LANGN_895,LANGN_896,LANGN_897,LANGN_898,LANGN_899,LANGN_900,LANGN_901,LANGN_902,LANGN_903,LANGN_904,LANGN_905,LANGN_906,LANGN_907,LANGN_908,LANGN_909,LANGN_910,LANGN_911,LANGN_912,LANGN_913,LANGN_914,LANGN_916,LANGN_917,LANGN_918,LANGN_919,LANGN_920,LANGN_921,LANGN_922
150693,0,21400051,21407678,1.0,3.0,3.0,0,0,1,0,0,1.0,3.0,5.0,5.0,5.0,5.0,,5.0,,9.0,9.0,5.0,,0.1762,2.0,0.0,0.0,9.0,7.0,6.0,8.0,1.0,1.0,3.0,8.0,0.0,0,0,16.0,85.85,51.5,,,4.0,0.0,3.0,3.0,5.0,2.0,,7.0,4.0,4.0,4.0,5.0,4.0,2.0,8.0,2.0,3.0,3.0,4.0,1.0,1.0,1.0,1.0,1.0,1.0,,,,,,1.0,6.0,6.0,,,,,,,,,,,,,,,,,,,,,,8.0,,1.0229,-0.5294,0.1361,0.3308,1.18,,-0.7671,,,,,-0.205,0.2929,2.5175,-0.5003,-0.1361,-0.5635,0.109,0.4899,0.0338,-0.0418,0.1188,0.0692,0.4693,0.3025,-0.3179,-0.3994,-0.6108,-0.9716,1.46,2.1024,-0.7465,0.0695,-1.35,-0.2772,0.261,0.9816,0.1991,,-0.5139,0.4565,0.9877,,,,,,,,,,0.4062,0.3346,0.3623,0.7732,0.2466,2.1365,-0.0426,1.2041,1.0735,0.7874,0.1614,-0.2067,,,,,,,1.2128,-0.4256,1.7522,1.3712,,,-1.1773,-0.1759,2.0979,-0.0483,0.1257,7.0,7.0,5.0,5.0,6.0,80.0,2.0,0.0,0.0,1.0,1.0,1.0,2.0,1.0,1.0,1.0,50.0,0,0,1,0,0,1,0,0,1,4.0,4.0,4.0,4.0,4.0,4.0,3.0,4.0,3.0,4.0,3.0,2.0,1.0,1.0,85.0,15.0,1.0,2.0,1.0,1.0,100.0,38.0,5.0,0,0,0,1,3.0,58.0,100.0,13.0,99.0,83.0,100.0,68.0,10.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.2275,1.5946,1.0982,0.3708,-0.6534,-0.467,-0.2376,-1.4329,-0.5079,1.6312,2.0131,1.5599,1.4387,1.578,0.7239,1.4448,0.0436,1.3156,0.8462,0.4339,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
120002,0,18800072,18804079,1.0,5.0,2.0,0,1,0,0,0,1.0,1.0,1.0,1.0,,5.0,3.0,3.0,,8.0,10.0,,,0.6583,1.0,0.0,0.0,9.0,7.0,6.0,4.0,1.0,1.0,1.0,8.0,0.0,1,1,,,,1.0,1.0,4.0,1.0,1.0,1.0,60.0,6.0,2.0,2.0,1.0,1.0,1.0,1.0,4.0,1.0,2.0,1.0,1.0,4.0,,1.0,0.0,1.0,0.0,0.0,0.0,2.0,3.0,2.0,1.0,1.0,,,6.0,,4.0,7.0,2.0,3.0,4.0,1.0,1.0,4.0,2.0,1.0,4.0,1.0,4.0,2.0,2.0,1.0,1.0,1.0,1.0,2.0,0.0,,-0.2699,0.195,-1.228,-1.2649,-0.6386,0.3594,-0.4795,0.4156,,1.0255,0.2862,-0.1211,-1.0261,-0.0052,,1.0945,-0.1002,-0.591,-1.0058,-0.4958,-0.3435,-1.6189,-1.2748,-0.0061,2.5078,-0.1911,0.6074,0.9025,-0.6532,0.876,0.5336,0.6142,1.052,2.4644,-0.288,-0.164,0.4118,0.0615,-2.4279,1.4238,,,-1.5638,-1.6481,1.8952,0.5847,-0.3983,1.2419,0.8627,1.1176,-0.8748,0.4062,0.3346,-0.5359,-1.0124,-0.2535,1.544,0.7883,1.122,0.4184,-2.672,-1.1989,0.2523,,0.4636,,0.8209,-0.3565,0.1262,0.1363,1.2955,0.4708,-0.1007,0.914,1.5845,,1.7914,-0.9143,0.5735,-0.4188,4.0,4.0,2.0,0.0,49.0,60.0,14.0,53.0,0.0,1.0,1.0,1.0,1.0,2.0,2.0,2.0,40.0,0,0,0,0,0,0,0,0,0,1.0,1.0,4.0,1.0,1.0,1.0,1.0,2.0,4.0,4.0,4.0,2.0,2.0,1.0,95.0,5.0,2.0,2.0,1.0,1.0,100.0,33.0,0.0,0,0,0,0,3.0,47.0,17.0,9.0,13.0,22.0,3.0,16.0,0.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,...,-0.1783,-0.5043,-0.2052,-0.2451,2.2029,4.0442,2.4611,3.441,3.7879,-1.6402,-0.829,-0.5327,1.2301,-1.7095,-1.5369,0.4555,,,,-1.8306,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
327356,1,45800120,45809473,1.0,5.0,1.0,0,0,1,0,0,1.0,3.0,,1.0,,1.0,,5.0,3.0,10.0,10.0,7.0,,-0.4027,2.0,0.0,0.0,6.0,7.0,6.0,14.0,1.0,0.0,1.0,8.0,0.0,0,0,16.0,82.41,82.41,,,4.0,0.0,3.0,1.0,8.0,4.0,,10.0,4.0,4.0,1.0,1.0,5.0,6.0,7.0,2.0,2.0,3.0,,0.0,0.0,1.0,1.0,0.0,0.0,,,,,,1.0,6.0,6.0,,,,,,,,,,,,,,,,,,,,,,4.0,,,-0.4425,0.5767,-0.756,-0.6386,,-0.0497,,,,,-0.2353,-0.3061,-1.405,0.6139,0.1997,0.4357,-1.2505,0.1899,0.3206,-0.6058,-1.251,-0.6116,,0.7429,-0.5537,-0.9074,0.0048,-0.0326,0.4667,-0.8105,-0.4635,-0.1977,,-0.4972,0.6522,-0.3551,0.895,0.7659,0.1885,1.1731,0.6776,-0.4197,1.8332,-0.4442,-0.4783,0.6582,-0.1077,-0.3802,0.2372,0.2614,0.4062,0.3346,0.3623,,-0.6375,0.0753,0.4974,0.963,0.85,0.2629,0.6984,-0.4661,,,,,,,,,,,,,,,,,,0.0,10.0,4.0,45.0,0.0,2.0,0.0,0.0,0.0,1.0,1.0,1.0,2.0,1.0,1.0,1.0,35.0,0,0,0,0,0,1,0,0,0,4.0,3.0,3.0,4.0,4.0,3.0,3.0,4.0,3.0,4.0,1.0,2.0,1.0,1.0,35.0,65.0,1.0,2.0,1.0,2.0,100.0,48.0,4.0,0,0,0,1,3.0,10.0,50.0,10.0,50.0,10.0,10.0,10.0,309.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,0.5837,1.8558,1.0982,,0.6294,,0.1,1.8887,1.8674,1.3326,0.7755,0.8255,-0.6554,-0.2686,0.0506,-1.3522,1.2328,-0.8314,,-0.2189,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
81627,1,12400760,12413119,1.0,2.0,1.0,0,1,0,0,0,1.0,4.0,,5.0,5.0,2.0,,1.0,0.0,10.0,10.0,,,-0.1872,1.0,0.0,0.0,8.0,,,,1.0,0.0,1.0,,0.0,1,1,12.0,,13.35,,,1.0,0.0,,,4.0,3.0,,5.0,,,,,4.0,1.0,8.0,4.0,3.0,4.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,,,,,,1.0,6.0,6.0,,,,,,,,,,,,,,,,,,,,,,5.0,,,2.7562,-1.228,1.1246,,0.5139,0.7629,3.7863,0.4142,-1.148,-0.9078,-0.1006,-0.3061,-0.6426,1.554,-0.1006,,0.0702,0.9568,-0.0388,0.5506,-0.4923,0.5356,-0.3589,-0.2704,0.8088,-0.2399,0.1117,-0.0326,1.4299,0.279,-0.3661,1.2543,0.3733,0.1542,0.3135,0.7194,0.5727,0.1178,0.1697,0.1083,-1.212,-0.4197,1.1994,-2.4821,1.0407,-1.3593,-1.1759,1.8694,0.3515,0.0098,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0.0,0.0,2.0,,2.0,,,,,2.0,1.0,1.0,3.0,2.0,2.0,2.0,60.0,0,0,1,0,0,0,0,0,1,4.0,4.0,1.0,4.0,2.0,2.0,1.0,1.0,1.0,4.0,1.0,2.0,1.0,1.0,99.0,1.0,1.0,2.0,1.0,1.0,,,3.0,0,0,0,1,1.0,47.0,5.0,38.0,28.0,7.0,3.0,5.0,100.0,20.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,,-1.9591,-1.6751,,-0.9368,,-1.4797,0.1661,-2.0409,-0.4484,0.2342,-0.3064,,0.8144,,-1.0918,-0.2084,0.7337,0.8462,-0.2207,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
210388,0,32000023,32002783,,,,0,0,0,0,0,,,,,,,,,,,,4.0,,0.8679,1.0,-2.0,1.0,,,,,1.0,0.0,,10.0,0.0,0,0,9.0,23.53,30.34,,,4.0,0.0,,,6.0,4.0,,10.0,,,,,4.0,1.0,9.0,,,,,0.0,0.0,0.0,0.0,0.0,0.0,,,,,,5.0,6.0,6.0,,,,,,,,,,,,,,,,,,,,,,2.0,,2.715,-0.1128,-1.228,1.1246,,,1.2273,,,,,0.5387,,,,1.8876,1.5558,1.6688,,-0.2632,,,,,,,,,,,,,,,,,,,,,-0.1529,-0.9521,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,5.0,10.0,2.0,15.0,0.0,70.0,0.0,35.0,0.0,1.0,1.0,1.0,2.0,1.0,1.0,1.0,45.0,0,0,1,0,0,1,0,0,1,4.0,3.0,4.0,4.0,1.0,3.0,2.0,4.0,3.0,4.0,3.0,2.0,1.0,1.0,30.0,70.0,1.0,2.0,2.0,1.0,39.3333,13.0,2.0,0,0,1,0,3.0,85.0,65.0,75.0,70.0,0.0,0.0,0.0,0.0,,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.2103,,1.0982,0.3981,-1.1403,-0.4314,-0.3878,,-1.419,,0.8207,0.37,1.4387,1.0793,0.5312,1.7884,,,,1.9471,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [25]:
validation_data.head(5)

Unnamed: 0,MATH_Proficient,CNTSCHID,CNTSTUID,SISCO,ST347Q01JA,ST347Q02JA,ST349Q01JA_0,ST349Q01JA_1,ST349Q01JA_2,ST349Q01JA_3,ST349Q01JA_4,ST350Q01JA,ST356Q01JA,ST322Q01JA,ST322Q02JA,ST322Q03JA,ST322Q04JA,ST322Q06JA,ST322Q07JA,DURECEC,EFFORT1,EFFORT2,ST259Q01JA,WB164Q01HA,HOMEPOS,ST004D01T,GRADE,REPEAT,EXPECEDU,ICTAVSCH,ICTAVHOM,ICTDISTR,IMMIG,TARDYSD,ST226Q01JA,ST016Q01NA,MISSSC,Option_UH,OECD,PAREDINT,BMMJ1,BFMJ2,WB163Q06HA,WB163Q07HA,ST230Q01JA,SKIPPING,IC180Q01JA,IC180Q08JA,ST059Q02JA,ST296Q04JA,WB176Q01HA,STUDYHMW,IC184Q01JA,IC184Q02JA,IC184Q03JA,IC184Q04JA,ST059Q01TA,ST296Q01JA,ST272Q01JA,ST268Q01JA,ST268Q04JA,ST268Q07JA,ST293Q04JA,ST297Q01JA,ST297Q03JA,ST297Q05JA,ST297Q06JA,ST297Q07JA,ST297Q09JA,WB165Q01HA,WB166Q01HA,WB166Q02HA,WB166Q03HA,WB166Q04HA,ST258Q01JA,ST294Q01JA,ST295Q01JA,WB150Q01HA,WB156Q01HA,WB158Q01HA,WB160Q01HA,WB161Q01HA,WB171Q01HA,WB171Q02HA,WB171Q03HA,WB171Q04HA,WB172Q01HA,WB173Q01HA,WB173Q02HA,WB173Q03HA,WB173Q04HA,WB177Q01HA,WB177Q02HA,WB177Q03HA,WB177Q04HA,WB032Q01NA,WB032Q02NA,WB031Q01NA,EXERPRAC,STUBMI,RELATST,BELONG,BULLIED,FEELSAFE,SCHRISK,PERSEVAGR,CURIOAGR,COOPAGR,EMPATAGR,ASSERAGR,STRESAGR,EMOCOAGR,GROSAGR,INFOSEEK,FAMSUP,DISCLIM,TEACHSUP,COGACRCO,COGACMCO,EXPOFA,EXPO21ST,MATHEFF,MATHEF21,FAMCON,ANXMAT,MATHPERS,CREATEFF,CREATSCH,CREATFAM,CREATAS,CREATOOS,CREATOP,OPENART,IMAGINE,SCHSUST,LEARRES,PROBSELF,FAMSUPSL,FEELLAH,SDLEFF,ICTRES,ESCS,FLSCHOOL,FLMULTSB,FLFAMILY,ACCESSFP,FLCONFIN,FLCONICT,ACCESSFA,ATTCONFM,FRINFLFM,ICTSCH,ICTHOME,ICTQUAL,ICTSUBJ,ICTENQ,ICTFEED,ICTOUT,ICTWKDY,ICTWKEND,ICTREG,ICTINFO,ICTEFFIC,BODYIMA,SOCONPA,LIFESAT,PSYCHSYM,SOCCON,EXPWB,CURSUPP,PQMIMP,PQMCAR,PARINVOL,PQSCHOOL,PASCHPOL,ATTIMMP,CREATHME,CREATACT,CREATOPN,CREATOR,WORKPAY,WORKHOME,SC001Q01TA,SC211Q01JA,SC211Q02JA,SC211Q03JA,SC211Q04JA,SC211Q05JA,SC211Q06JA,SC209Q04JA,SC209Q05JA,SC209Q06JA,SC037Q11JA,SC183Q02JA,SC183Q03JA,SC183Q04JA,SC175Q01JA,SC177Q01JA_1,SC177Q01JA_2,SC177Q01JA_3,SC177Q02JA_1,SC177Q02JA_2,SC177Q02JA_3,SC177Q03JA_1,SC177Q03JA_2,SC177Q03
JA_3,SC188Q01JA,SC188Q02JA,SC188Q03JA,SC188Q04JA,SC188Q05JA,SC188Q06JA,SC188Q07JA,SC188Q08JA,SC188Q09JA,SC188Q10JA,SC188Q11JA,SC198Q01JA,SC198Q02JA,SC198Q03JA,SC178Q01JA,SC178Q02JA,SC180Q01JA,SC189Q02WA,SC189Q03WA,SC189Q04WA,SMRATIO,MCLSIZE,MACTIV,MATHEXC_0,MATHEXC_1,MATHEXC_2,MATHEXC_3,ABGMATH,SC064Q05WA,SC064Q06WA,SC064Q01TA,SC064Q02TA,SC064Q04NA,SC064Q03TA,SC064Q07WA,SC213Q01JA,SC213Q02JA,SC037Q01TA,SC037Q02TA,SC037Q03TA,SC037Q04TA,SC037Q05NA,SC037Q06NA,SC037Q07TA,...,DIGDVPOL,TEAFDBK,MTTRAIN,DMCVIEWS,NEGSCLIM,STAFFSHORT,EDUSHORT,STUBEHA,TEACHBEHA,STDTEST,TDTEST,ALLACTIV,BCREATSC,CREENVSC,ACTCRESC,OPENCUL,PROBSCRI,SCPREPBP,SCPREPAP,DIGPREP,LANGN_105,LANGN_108,LANGN_112,LANGN_113,LANGN_118,LANGN_121,LANGN_130,LANGN_133,LANGN_137,LANGN_140,LANGN_147,LANGN_148,LANGN_150,LANGN_154,LANGN_156,LANGN_160,LANGN_170,LANGN_195,LANGN_200,LANGN_202,LANGN_204,LANGN_232,LANGN_237,LANGN_244,LANGN_246,LANGN_254,LANGN_258,LANGN_263,LANGN_264,LANGN_266,LANGN_272,LANGN_273,LANGN_275,LANGN_286,LANGN_301,LANGN_313,LANGN_316,LANGN_317,LANGN_322,LANGN_325,LANGN_327,LANGN_329,LANGN_338,LANGN_340,LANGN_344,LANGN_351,LANGN_358,LANGN_363,LANGN_369,LANGN_371,LANGN_375,LANGN_379,LANGN_381,LANGN_382,LANGN_383,LANGN_404,LANGN_409,LANGN_415,LANGN_420,LANGN_422,LANGN_428,LANGN_434,LANGN_442,LANGN_449,LANGN_451,LANGN_463,LANGN_465,LANGN_467,LANGN_471,LANGN_472,LANGN_474,LANGN_492,LANGN_493,LANGN_494,LANGN_495,LANGN_496,LANGN_500,LANGN_503,LANGN_514,LANGN_517,LANGN_520,LANGN_523,LANGN_527,LANGN_529,LANGN_531,LANGN_540,LANGN_547,LANGN_555,LANGN_561,LANGN_562,LANGN_563,LANGN_565,LANGN_566,LANGN_567,LANGN_600,LANGN_601,LANGN_602,LANGN_605,LANGN_606,LANGN_607,LANGN_608,LANGN_611,LANGN_614,LANGN_615,LANGN_616,LANGN_618,LANGN_619,LANGN_621,LANGN_622,LANGN_623,LANGN_624,LANGN_625,LANGN_626,LANGN_627,LANGN_628,LANGN_630,LANGN_631,LANGN_634,LANGN_635,LANGN_639,LANGN_640,LANGN_641,LANGN_642,LANGN_648,LANGN_650,LANGN_661,LANGN_662,LANGN_663,LANGN_665,LANGN_666,LANGN_667,LANGN_668,LANGN_669,LANGN_670,LANGN_673
,LANGN_674,LANGN_675,LANGN_676,LANGN_677,LANGN_678,LANGN_800,LANGN_801,LANGN_802,LANGN_804,LANGN_805,LANGN_806,LANGN_807,LANGN_808,LANGN_809,LANGN_810,LANGN_811,LANGN_812,LANGN_813,LANGN_814,LANGN_815,LANGN_816,LANGN_817,LANGN_818,LANGN_819,LANGN_821,LANGN_823,LANGN_824,LANGN_825,LANGN_826,LANGN_827,LANGN_828,LANGN_829,LANGN_831,LANGN_832,LANGN_833,LANGN_836,LANGN_837,LANGN_838,LANGN_839,LANGN_840,LANGN_841,LANGN_842,LANGN_843,LANGN_844,LANGN_845,LANGN_846,LANGN_849,LANGN_850,LANGN_851,LANGN_852,LANGN_854,LANGN_855,LANGN_857,LANGN_859,LANGN_860,LANGN_861,LANGN_865,LANGN_866,LANGN_868,LANGN_870,LANGN_872,LANGN_873,LANGN_877,LANGN_879,LANGN_881,LANGN_885,LANGN_890,LANGN_892,LANGN_895,LANGN_896,LANGN_897,LANGN_898,LANGN_899,LANGN_900,LANGN_901,LANGN_902,LANGN_903,LANGN_904,LANGN_905,LANGN_906,LANGN_907,LANGN_908,LANGN_909,LANGN_910,LANGN_911,LANGN_912,LANGN_913,LANGN_914,LANGN_916,LANGN_917,LANGN_918,LANGN_919,LANGN_920,LANGN_921,LANGN_922
387609,0,60000072,60003929,1.0,6.0,1.0,0,0,0,1,0,1.0,,,,,,,,1.0,8.0,9.0,3.0,,-1.1691,1.0,0.0,0.0,9.0,,,,1.0,0.0,1.0,6.0,1.0,0,0,16.0,68.88,64.44,,,4.0,1.0,,,6.0,1.0,,5.0,,,,,5.0,1.0,10.0,,,,1.0,0.0,0.0,0.0,0.0,0.0,1.0,,,,,,1.0,6.0,6.0,,,,,,,,,,,,,,,,,,,,,,0.0,,-0.0054,-0.07,1.1008,1.1246,,,0.1972,,,,,0.0689,,,-1.7508,0.6896,-0.1002,-0.4814,,-0.2095,0.5561,-0.8915,,,1.6218,0.3951,,,,,,,,,2.7117,-1.2426,-0.1857,,,,-2.6528,0.0975,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,10.0,10.0,2.0,,,,,,,1.0,1.0,1.0,3.0,1.0,1.0,1.0,,0,0,1,0,0,1,0,0,1,1.0,1.0,3.0,1.0,1.0,1.0,1.0,1.0,1.0,,1.0,2.0,2.0,1.0,90.0,10.0,2.0,2.0,1.0,1.0,56.5,38.0,0.0,0,0,0,0,1.0,100.0,100.0,96.0,80.0,20.0,30.0,30.0,255.0,3.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,...,-1.5514,0.4685,-1.6751,0.4874,-0.8905,-2.3871,-0.0672,-1.0607,-1.0504,1.8798,0.6734,-1.0515,0.6355,1.1513,0.1122,1.2303,-0.5687,-0.8314,-0.1008,-1.1093,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
275645,1,39800264,39801502,1.0,5.0,1.0,0,0,1,0,0,1.0,3.0,5.0,5.0,4.0,3.0,,,,10.0,10.0,8.0,,-0.6762,1.0,0.0,0.0,9.0,7.0,6.0,,,1.0,1.0,10.0,0.0,0,0,14.5,70.09,51.01,,,2.0,0.0,1.0,1.0,37.0,6.0,,10.0,5.0,1.0,5.0,5.0,5.0,4.0,7.0,3.0,3.0,4.0,,1.0,1.0,1.0,0.0,0.0,0.0,,,,,,1.0,6.0,6.0,,,,,,,,,,,,,,,,,,,,,,8.0,,2.5017,0.6428,-1.228,1.1246,-0.6386,,-0.163,,,,,0.5207,2.0294,-0.2976,0.8416,1.6526,1.5558,-1.1303,2.3189,2.5556,3.1677,,0.7581,,-2.3558,2.4282,0.3356,2.7508,2.2134,0.3161,0.5825,2.3384,1.3218,,1.5406,2.8782,-2.2306,2.2223,-0.2361,0.1731,,0.0753,,,,,,,,,,0.4062,0.3346,0.3623,1.95,1.0759,2.942,2.9804,0.4074,,0.1227,0.6984,0.3238,,,,,,,,,,,,,,,,,,0.0,10.0,4.0,15.0,2.0,13.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,45.0,0,1,0,0,0,1,0,0,1,1.0,1.0,4.0,4.0,1.0,1.0,4.0,2.0,1.0,2.0,1.0,1.0,1.0,1.0,100.0,0.0,1.0,1.0,2.0,1.0,100.0,23.0,5.0,0,0,0,1,3.0,3.0,10.0,6.0,20.0,100.0,10.0,0.0,100.0,7.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,0.5145,0.8109,1.0982,-0.9555,0.3378,1.3305,0.6191,,2.0157,0.8581,1.676,1.3763,-0.6554,-0.2686,1.4379,0.2964,0.3285,-0.8314,,1.3156,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
42756,1,5600198,5606778,,4.0,1.0,0,0,1,0,0,1.0,4.0,5.0,5.0,5.0,,5.0,5.0,2.0,4.0,10.0,,,2.571,2.0,0.0,0.0,9.0,0.0,0.0,7.0,1.0,2.0,1.0,,0.0,1,1,16.0,16.38,,,,3.0,0.0,3.0,3.0,33.0,4.0,,10.0,4.0,4.0,5.0,4.0,6.0,3.0,10.0,1.0,2.0,4.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,,,,,,,6.0,6.0,,,,,,,,,,,,,,,,,,,,,,10.0,,-1.8033,2.2967,-0.5168,1.1246,3.6492,0.2996,-0.3602,,-0.6072,3.7254,5.6515,,0.2652,-2.0011,-1.1017,0.3884,1.5558,2.1557,2.4415,-0.4329,0.2198,0.6543,2.2349,3.0457,-0.2885,2.3863,2.4989,-0.0927,,0.3831,-0.8105,-0.1767,0.1083,,0.9302,0.8756,1.2266,1.597,,-0.4895,,0.6903,2.3166,0.4872,-1.2421,1.9096,2.2716,2.0781,0.4645,0.6912,-1.8695,-5.6242,-6.3015,2.8889,1.8553,1.2049,-1.6286,0.1581,3.9236,3.8168,1.0594,-0.5855,-0.1904,,,,,,,,-1.5646,0.4708,-0.8126,-1.0697,-0.475,,-1.0475,-0.9143,-1.3206,-0.8907,3.0,6.0,4.0,3.0,5.0,3.0,3.0,5.0,1.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,45.0,0,1,0,0,1,0,0,1,0,4.0,4.0,,1.0,1.0,1.0,,3.0,1.0,3.0,1.0,2.0,2.0,1.0,85.0,15.0,1.0,1.0,1.0,1.0,78.8571,23.0,2.0,0,0,1,0,3.0,5.0,5.0,10.0,2.0,2.0,5.0,0.0,40.0,0.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,...,,-0.7467,-1.6751,-1.1856,0.6029,,-1.4212,,1.6124,-1.6402,0.3452,,0.3711,-1.3101,-0.638,-1.0681,0.8027,-0.8314,-1.1181,-0.9707,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
204254,1,30000170,30002998,1.0,4.0,2.0,0,0,1,0,0,1.0,3.0,1.0,5.0,5.0,,4.0,1.0,2.0,7.0,7.0,6.0,,0.3906,2.0,0.0,0.0,7.0,7.0,6.0,1.0,1.0,0.0,4.0,8.0,0.0,0,1,16.0,51.25,56.35,,,1.0,1.0,1.0,2.0,43.0,2.0,,2.0,1.0,3.0,1.0,1.0,7.0,1.0,6.0,3.0,2.0,3.0,2.0,1.0,0.0,0.0,0.0,0.0,0.0,,,,,,,6.0,6.0,,,,,,,,,,,,,,,,,,,,,,5.0,,0.3594,2.1143,-1.228,1.1246,-0.6386,-0.3755,0.6547,1.069,0.7465,1.0916,0.1257,-0.2148,-0.16,-0.2869,0.4604,1.0945,0.1475,-0.0912,0.4201,-1.4793,-1.0831,0.4989,-0.1412,0.3386,-0.638,-1.3657,0.8181,-0.8493,1.0426,,0.403,0.6579,1.4432,-0.9483,-2.522,-2.6502,-2.0067,0.8716,0.6049,-0.6434,,0.5215,,,,,,,,,,0.4062,0.3346,1.5648,-0.0595,-0.5588,-0.9048,-0.8236,-0.7931,-0.5597,0.8135,0.5686,0.2405,,,,,,,,,,,,,,,,,,0.0,0.0,5.0,4.0,9.0,,6.0,,,1.0,2.0,1.0,3.0,2.0,2.0,2.0,40.0,0,0,1,0,0,1,0,0,0,1.0,4.0,4.0,4.0,1.0,1.0,4.0,3.0,2.0,1.0,1.0,2.0,1.0,1.0,90.0,10.0,2.0,2.0,1.0,1.0,66.8571,23.0,1.0,0,0,0,0,1.0,1.0,5.0,40.0,1.0,,1.0,,140.0,20.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,...,-0.7601,-0.5446,-0.3559,-0.8274,0.3056,0.6203,0.6299,0.2408,1.4277,-0.4283,-0.4293,-1.1437,0.1019,-0.2686,,0.4971,0.9309,1.4133,-1.4425,-1.5058,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
555565,0,80700060,80702001,,,,0,0,0,0,0,,4.0,,,,,,,,6.0,6.0,10.0,,1.0723,2.0,0.0,0.0,,,,,1.0,0.0,4.0,10.0,0.0,0,0,12.0,,,,,3.0,0.0,,,32.0,2.0,,,,,,,2.0,1.0,10.0,,,,,0.0,0.0,0.0,0.0,0.0,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0.4967,-0.338,-1.228,1.1246,-0.6386,,,,,,,,0.6737,,,0.7029,,,,,,,,,,,,,,1.3753,,,,,,,,,,-0.1462,,0.3048,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,3.0,24.0,2.0,6.0,2.0,5.0,1.0,1.0,1.0,1.0,2.0,1.0,1.0,1.0,,0,1,0,0,1,0,0,1,0,,,3.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0,2.0,1.0,1.0,,,1.0,1.0,1.0,1.0,,,5.0,0,0,0,1,3.0,,,,,,,,,,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,,1.068,1.0982,0.3828,-1.6916,,1.2478,,,,,2.4383,-0.6554,-0.2686,2.4315,0.4217,-1.6218,3.3255,0.8087,0.2816,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [26]:
test_data.head(5)

Unnamed: 0,MATH_Proficient,CNT,CNTSCHID,CNTSTUID,SISCO,ST347Q01JA,ST347Q02JA,ST349Q01JA_0,ST349Q01JA_1,ST349Q01JA_2,ST349Q01JA_3,ST349Q01JA_4,ST350Q01JA,ST356Q01JA,ST322Q01JA,ST322Q02JA,ST322Q03JA,ST322Q04JA,ST322Q06JA,ST322Q07JA,DURECEC,EFFORT1,EFFORT2,ST259Q01JA,WB164Q01HA,HOMEPOS,ST004D01T,GRADE,REPEAT,EXPECEDU,ICTAVSCH,ICTAVHOM,ICTDISTR,IMMIG,TARDYSD,ST226Q01JA,ST016Q01NA,MISSSC,Option_UH,OECD,PAREDINT,BMMJ1,BFMJ2,WB163Q06HA,WB163Q07HA,ST230Q01JA,SKIPPING,IC180Q01JA,IC180Q08JA,ST059Q02JA,ST296Q04JA,WB176Q01HA,STUDYHMW,IC184Q01JA,IC184Q02JA,IC184Q03JA,IC184Q04JA,ST059Q01TA,ST296Q01JA,ST272Q01JA,ST268Q01JA,ST268Q04JA,ST268Q07JA,ST293Q04JA,ST297Q01JA,ST297Q03JA,ST297Q05JA,ST297Q06JA,ST297Q07JA,ST297Q09JA,WB165Q01HA,WB166Q01HA,WB166Q02HA,WB166Q03HA,WB166Q04HA,ST258Q01JA,ST294Q01JA,ST295Q01JA,WB150Q01HA,WB156Q01HA,WB158Q01HA,WB160Q01HA,WB161Q01HA,WB171Q01HA,WB171Q02HA,WB171Q03HA,WB171Q04HA,WB172Q01HA,WB173Q01HA,WB173Q02HA,WB173Q03HA,WB173Q04HA,WB177Q01HA,WB177Q02HA,WB177Q03HA,WB177Q04HA,WB032Q01NA,WB032Q02NA,WB031Q01NA,EXERPRAC,STUBMI,RELATST,BELONG,BULLIED,FEELSAFE,SCHRISK,PERSEVAGR,CURIOAGR,COOPAGR,EMPATAGR,ASSERAGR,STRESAGR,EMOCOAGR,GROSAGR,INFOSEEK,FAMSUP,DISCLIM,TEACHSUP,COGACRCO,COGACMCO,EXPOFA,EXPO21ST,MATHEFF,MATHEF21,FAMCON,ANXMAT,MATHPERS,CREATEFF,CREATSCH,CREATFAM,CREATAS,CREATOOS,CREATOP,OPENART,IMAGINE,SCHSUST,LEARRES,PROBSELF,FAMSUPSL,FEELLAH,SDLEFF,ICTRES,ESCS,FLSCHOOL,FLMULTSB,FLFAMILY,ACCESSFP,FLCONFIN,FLCONICT,ACCESSFA,ATTCONFM,FRINFLFM,ICTSCH,ICTHOME,ICTQUAL,ICTSUBJ,ICTENQ,ICTFEED,ICTOUT,ICTWKDY,ICTWKEND,ICTREG,ICTINFO,ICTEFFIC,BODYIMA,SOCONPA,LIFESAT,PSYCHSYM,SOCCON,EXPWB,CURSUPP,PQMIMP,PQMCAR,PARINVOL,PQSCHOOL,PASCHPOL,ATTIMMP,CREATHME,CREATACT,CREATOPN,CREATOR,WORKPAY,WORKHOME,SC001Q01TA,SC211Q01JA,SC211Q02JA,SC211Q03JA,SC211Q04JA,SC211Q05JA,SC211Q06JA,SC209Q04JA,SC209Q05JA,SC209Q06JA,SC037Q11JA,SC183Q02JA,SC183Q03JA,SC183Q04JA,SC175Q01JA,SC177Q01JA_1,SC177Q01JA_2,SC177Q01JA_3,SC177Q02JA_1,SC177Q02JA_2,SC177Q02JA_3,SC177Q03JA_1,SC177Q03JA_2,SC17
7Q03JA_3,SC188Q01JA,SC188Q02JA,SC188Q03JA,SC188Q04JA,SC188Q05JA,SC188Q06JA,SC188Q07JA,SC188Q08JA,SC188Q09JA,SC188Q10JA,SC188Q11JA,SC198Q01JA,SC198Q02JA,SC198Q03JA,SC178Q01JA,SC178Q02JA,SC180Q01JA,SC189Q02WA,SC189Q03WA,SC189Q04WA,SMRATIO,MCLSIZE,MACTIV,MATHEXC_0,MATHEXC_1,MATHEXC_2,MATHEXC_3,ABGMATH,SC064Q05WA,SC064Q06WA,SC064Q01TA,SC064Q02TA,SC064Q04NA,SC064Q03TA,SC064Q07WA,SC213Q01JA,SC213Q02JA,SC037Q01TA,SC037Q02TA,SC037Q03TA,SC037Q04TA,SC037Q05NA,SC037Q06NA,...,DIGDVPOL,TEAFDBK,MTTRAIN,DMCVIEWS,NEGSCLIM,STAFFSHORT,EDUSHORT,STUBEHA,TEACHBEHA,STDTEST,TDTEST,ALLACTIV,BCREATSC,CREENVSC,ACTCRESC,OPENCUL,PROBSCRI,SCPREPBP,SCPREPAP,DIGPREP,LANGN_105,LANGN_108,LANGN_112,LANGN_113,LANGN_118,LANGN_121,LANGN_130,LANGN_133,LANGN_137,LANGN_140,LANGN_147,LANGN_148,LANGN_150,LANGN_154,LANGN_156,LANGN_160,LANGN_170,LANGN_195,LANGN_200,LANGN_202,LANGN_204,LANGN_232,LANGN_237,LANGN_244,LANGN_246,LANGN_254,LANGN_258,LANGN_263,LANGN_264,LANGN_266,LANGN_272,LANGN_273,LANGN_275,LANGN_286,LANGN_301,LANGN_313,LANGN_316,LANGN_317,LANGN_322,LANGN_325,LANGN_327,LANGN_329,LANGN_338,LANGN_340,LANGN_344,LANGN_351,LANGN_358,LANGN_363,LANGN_369,LANGN_371,LANGN_375,LANGN_379,LANGN_381,LANGN_382,LANGN_383,LANGN_404,LANGN_409,LANGN_415,LANGN_420,LANGN_422,LANGN_428,LANGN_434,LANGN_442,LANGN_449,LANGN_451,LANGN_463,LANGN_465,LANGN_467,LANGN_471,LANGN_472,LANGN_474,LANGN_492,LANGN_493,LANGN_494,LANGN_495,LANGN_496,LANGN_500,LANGN_503,LANGN_514,LANGN_517,LANGN_520,LANGN_523,LANGN_527,LANGN_529,LANGN_531,LANGN_540,LANGN_547,LANGN_555,LANGN_561,LANGN_562,LANGN_563,LANGN_565,LANGN_566,LANGN_567,LANGN_600,LANGN_601,LANGN_602,LANGN_605,LANGN_606,LANGN_607,LANGN_608,LANGN_611,LANGN_614,LANGN_615,LANGN_616,LANGN_618,LANGN_619,LANGN_621,LANGN_622,LANGN_623,LANGN_624,LANGN_625,LANGN_626,LANGN_627,LANGN_628,LANGN_630,LANGN_631,LANGN_634,LANGN_635,LANGN_639,LANGN_640,LANGN_641,LANGN_642,LANGN_648,LANGN_650,LANGN_661,LANGN_662,LANGN_663,LANGN_665,LANGN_666,LANGN_667,LANGN_668,LANGN_669,LANGN_670,LANGN_673,LANGN_
674,LANGN_675,LANGN_676,LANGN_677,LANGN_678,LANGN_800,LANGN_801,LANGN_802,LANGN_804,LANGN_805,LANGN_806,LANGN_807,LANGN_808,LANGN_809,LANGN_810,LANGN_811,LANGN_812,LANGN_813,LANGN_814,LANGN_815,LANGN_816,LANGN_817,LANGN_818,LANGN_819,LANGN_821,LANGN_823,LANGN_824,LANGN_825,LANGN_826,LANGN_827,LANGN_828,LANGN_829,LANGN_831,LANGN_832,LANGN_833,LANGN_836,LANGN_837,LANGN_838,LANGN_839,LANGN_840,LANGN_841,LANGN_842,LANGN_843,LANGN_844,LANGN_845,LANGN_846,LANGN_849,LANGN_850,LANGN_851,LANGN_852,LANGN_854,LANGN_855,LANGN_857,LANGN_859,LANGN_860,LANGN_861,LANGN_865,LANGN_866,LANGN_868,LANGN_870,LANGN_872,LANGN_873,LANGN_877,LANGN_879,LANGN_881,LANGN_885,LANGN_890,LANGN_892,LANGN_895,LANGN_896,LANGN_897,LANGN_898,LANGN_899,LANGN_900,LANGN_901,LANGN_902,LANGN_903,LANGN_904,LANGN_905,LANGN_906,LANGN_907,LANGN_908,LANGN_909,LANGN_910,LANGN_911,LANGN_912,LANGN_913,LANGN_914,LANGN_916,LANGN_917,LANGN_918,LANGN_919,LANGN_920,LANGN_921,LANGN_922
110214,1,Colombia,17000028,17001050,1.0,5.0,1.0,0,1,0,0,0,1.0,3.0,5.0,1.0,5.0,,1.0,1.0,,9.0,9.0,4.0,,-0.0552,2.0,0.0,0.0,9.0,,,,1.0,1.0,1.0,4.0,0.0,0,1,12.0,,28.48,,,3.0,0.0,,,7.0,5.0,,9.0,,,,,4.0,4.0,4.0,1.0,1.0,3.0,5.0,0.0,0.0,0.0,0.0,0.0,1.0,,,,,,1.0,4.0,6.0,,,,,,,,,,,,,,,,,,,,,,0.0,,-0.7187,0.1426,-0.1325,-0.756,-0.6386,-1.4171,-0.6745,1.7516,,,,0.852,-0.0742,-0.3275,1.5442,-2.4425,1.5558,2.0852,1.2664,0.3579,-0.1194,-0.0276,-0.0316,0.8233,1.1606,-2.6006,1.3048,-1.2337,2.1897,0.056,0.105,0.3014,-0.2817,2.2091,-0.027,1.4315,0.4823,0.262,0.9437,0.2409,0.0325,-0.8705,,,,,,,,,,,,,,,,,,,,,,,,,,,,0.1637,-0.4256,-1.2159,0.3164,-0.2402,1.4474,0.5488,-0.3099,-0.9143,0.3629,-0.9733,0.0,6.0,2.0,0.0,0.0,10.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,55.0,0,0,1,0,0,1,0,0,1,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,2.0,4.0,2.0,1.0,1.0,1.0,80.0,20.0,2.0,2.0,1.0,1.0,100.0,33.0,1.0,0,0,0,0,3.0,60.0,70.0,60.0,100.0,0.0,100.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.2275,1.0108,0.6955,0.2248,-1.1403,-1.4551,-1.4212,-1.4072,-2.0409,1.3356,0.8207,0.5027,1.4387,2.1631,0.8786,1.82,,,,0.7639,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
377739,1,Norway,57800095,57807535,1.0,,,0,0,0,0,0,,,5.0,1.0,5.0,4.0,,1.0,,7.0,9.0,,,0.7769,1.0,,,8.0,,,,1.0,0.0,1.0,,,0,1,16.0,,80.75,,,3.0,0.0,,,,1.0,,2.0,,,,,3.0,1.0,6.0,1.0,2.0,3.0,2.0,0.0,0.0,0.0,0.0,1.0,0.0,,,,,,,6.0,6.0,,,,,,,,,,,,,,,,,,,,,,5.0,,0.2316,0.5274,-1.228,1.1246,-1.8677,0.2365,-0.1866,0.183,,,,,-0.0144,-0.6436,-0.6831,-0.1219,-0.3322,-0.0496,0.1002,-0.2949,0.4113,-0.6921,0.5219,0.5229,0.1931,-0.4627,1.4964,-0.2924,2.1897,0.101,-0.8105,0.2185,-1.1804,,,,,,,,,1.1458,0.4019,1.0869,-0.4064,0.3895,0.0602,2.0781,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0.0,5.0,3.0,5.0,16.0,15.0,4.0,30.0,2.0,1.0,1.0,1.0,2.0,2.0,1.0,2.0,60.0,0,0,0,0,0,0,0,0,0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,3.0,4.0,1.0,1.0,1.0,1.0,,,2.0,1.0,1.0,1.0,,,2.0,0,0,0,0,3.0,15.0,35.0,10.0,25.0,0.0,2.0,0.0,,,1.0,1.0,1.0,1.0,1.0,1.0,...,,0.3154,1.0982,,1.0196,0.592,0.8339,1.1806,-0.7955,0.8921,-0.4556,-0.2918,,,0.3246,,,,,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
443428,1,Serbia,68800063,68806366,1.0,5.0,1.0,0,1,0,0,0,2.0,,,5.0,,2.0,1.0,2.0,4.0,,,9.0,,1.209,2.0,0.0,0.0,8.0,,,,1.0,2.0,4.0,7.0,0.0,0,0,16.0,74.66,74.66,,,2.0,1.0,,,34.0,1.0,,10.0,,,,,5.0,1.0,4.0,1.0,2.0,4.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,,,,,,1.0,6.0,6.0,,,,,,,,,,,,,,,,,,,,,,10.0,,-0.7135,0.0977,-0.5168,-1.2649,0.4456,-0.5744,2.6275,,-0.9228,0.3598,0.7155,0.0483,0.8823,0.7247,-0.3254,-0.9635,-1.0693,0.3384,-0.5235,1.008,0.1738,0.5278,0.6667,0.433,0.6566,-0.3059,2.1556,-1.1824,0.7957,-1.117,-0.8105,0.8626,1.8261,1.4457,1.3851,0.2239,0.7028,0.5361,1.5956,,,1.2139,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0.0,0.0,4.0,3.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,1.0,2.0,1.0,1.0,1.0,45.0,0,0,0,0,0,0,0,0,0,,,4.0,4.0,1.0,1.0,3.0,3.0,3.0,3.0,1.0,2.0,1.0,1.0,100.0,0.0,1.0,1.0,1.0,1.0,43.2083,23.0,5.0,0,0,0,1,3.0,20.0,40.0,10.0,20.0,0.0,5.0,0.0,45.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,...,0.4425,0.1411,1.0982,-0.0242,-1.4018,-1.4551,,-1.1977,-2.0409,0.3979,-0.1757,,0.3621,-0.3292,0.4063,-0.8952,-1.6218,0.4388,-0.4247,0.788,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
481811,1,Spain,72400779,72415562,1.0,3.0,2.0,0,0,0,1,0,1.0,2.0,5.0,5.0,5.0,1.0,,4.0,2.0,,,7.0,3.0,-1.9237,1.0,0.0,0.0,8.0,5.0,6.0,4.0,1.0,0.0,2.0,6.0,,0,1,16.0,71.45,,1.0,1.0,4.0,0.0,2.0,1.0,,3.0,2.0,6.0,5.0,4.0,1.0,1.0,4.0,1.0,9.0,2.0,2.0,3.0,2.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,3.0,2.0,3.0,2.0,,1.0,6.0,2.0,4.0,3.0,2.0,3.0,2.0,1.0,1.0,2.0,1.0,1.0,4.0,1.0,3.0,2.0,3.0,4.0,2.0,6.0,4.0,2.0,4.0,16.44,-1.3742,-0.4375,-0.5168,,,-0.6428,-0.1364,0.1385,3.5114,-1.1377,-0.7449,-0.8534,-1.0364,,-0.2583,0.1765,-1.0693,0.3871,-0.7816,-0.8253,-0.9675,-0.0462,-0.5104,1.4718,1.4278,-0.1679,-0.5302,-1.0086,0.591,0.1366,0.2102,-0.5753,0.0056,,-0.8745,-0.5293,0.8904,0.34,,-0.5849,,-0.1736,-1.0699,,0.487,-0.3956,0.8366,0.4248,-0.1013,0.3973,0.7753,-1.9088,0.3346,-0.191,-0.5855,-0.2062,-0.5523,-0.1304,-0.9773,-0.3662,0.6683,0.8965,-0.2929,0.2455,0.7894,-1.0029,0.7989,-0.192,-1.9592,,,,,,,,,,,,0.0,2.0,3.0,1.0,31.0,35.0,1.0,7.0,0.0,1.0,2.0,1.0,,2.0,2.0,2.0,55.0,0,1,0,0,0,1,0,0,0,,4.0,,,,,,,,,,2.0,1.0,1.0,51.0,49.0,2.0,2.0,,,,48.0,1.0,0,0,0,0,1.0,20.0,20.0,10.0,100.0,0.0,100.0,0.0,42.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,,,-0.395,0.6267,,-2.0582,,-0.1603,-1.9767,-0.8172,0.5155,-0.7568,,,-0.2957,-0.2246,0.3285,1.5063,-0.1421,-0.648,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
26226,1,Australia,3600731,3611959,1.0,3.0,1.0,0,1,0,0,0,2.0,3.0,5.0,5.0,5.0,3.0,1.0,,1.0,8.0,9.0,,,1.7922,1.0,-1.0,0.0,4.0,7.0,6.0,10.0,1.0,0.0,2.0,,0.0,0,1,16.0,76.49,28.48,,,3.0,1.0,3.0,1.0,30.0,3.0,,2.0,3.0,1.0,1.0,1.0,4.0,2.0,,3.0,3.0,4.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,,,,,,,6.0,6.0,,,,,,,,,,,,,,,,,,,,,,5.0,,0.132,-0.2482,0.4749,0.6942,,0.2303,0.0279,0.8793,1.2989,0.416,-0.023,0.1138,-0.145,-1.8228,-0.614,0.6764,1.5558,-0.0324,-0.2543,1.128,0.8848,0.93,0.4395,0.6501,-0.508,0.1034,-0.2129,-0.1046,-0.0504,-0.0025,-0.1676,-0.4634,0.1775,-0.2065,0.0753,-0.0728,0.0996,-0.0989,0.5967,0.4581,1.3284,1.4954,,,,,,,,,,0.4062,0.3346,0.0498,0.5942,0.0991,0.128,-0.19,-0.6881,-0.0322,0.173,-1.0473,0.398,,,,,,,,,,,,,,,,,,2.0,10.0,3.0,2.0,10.0,18.0,2.0,5.0,0.0,1.0,1.0,1.0,2.0,1.0,1.0,1.0,50.0,1,0,0,1,0,0,0,0,0,,4.0,4.0,2.0,,2.0,2.0,3.0,2.0,3.0,1.0,1.0,1.0,2.0,,,2.0,1.0,1.0,1.0,100.0,28.0,1.0,0,0,0,0,3.0,5.0,5.0,4.0,66.0,2.0,1.0,1.0,50.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,...,,0.1564,1.0982,0.3387,0.0698,0.411,-1.4212,0.1795,1.0446,-1.6402,0.3009,0.3942,,1.0791,1.2896,0.0911,-0.264,0.0805,0.8462,0.8282,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [27]:
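# Drop the country name column, which is not among the training columns
test_data = test_data.drop(["CNT"], axis=1)
# Separate the features from the label
test_features = test_data.drop(["MATH_Proficient"], axis=1)
test_target = test_data["MATH_Proficient"]
# Write the features as values only (no index column, no header row)
test_features.to_csv("test_features.csv", index=False, header=False)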

In [28]:
from sagemaker import clarify

# Processor that runs the Clarify analysis as a SageMaker Processing job
clarify_processor = clarify.SageMakerClarifyProcessor(
    role=role, instance_count=1, instance_type="ml.m5.2xlarge", sagemaker_session=session
)
# Model that Clarify deploys to a temporary endpoint to request predictions
model_config = clarify.ModelConfig(
    model_name=model_name,
    instance_type="ml.m5.2xlarge",
    instance_count=1,
    accept_type="text/csv",
    content_type="text/csv",
)

INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: 1.0.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.


In [30]:
from sagemaker.s3 import S3Downloader
# Download data from S3 to local instance
local_path = S3Downloader.download('s3://{}/{}/train'.format(bucket, prefix), './tmp/train_data')

In [31]:
local_path

['./tmp/train_data/train.csv']

In [32]:
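from sagemaker.s3 import S3Uploader

# Load the headerless training file and draw a random sample of rows
full_data = pd.read_csv('./tmp/train_data/train.csv', header=None)
sampled_data = full_data.sample(n=1500)  # Adjust the sample size as needed

# Save the sample locally without index or header row, matching train.csv,
# so the integer column labels are not written as a first data row
sampled_path = 'sampled_train_data.csv'
sampled_data.to_csv(sampled_path, index=False, header=False)

# Upload the sampled file to S3
sampled_s3_uri = S3Uploader.upload(sampled_path, 's3://{}/{}/sampled_train'.format(bucket, prefix))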
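
A sample drawn for Clarify is easier to work with when it is repeatable and contains data rows only. The sketch below is a minimal illustration on a small synthetic frame, not on the PISA data: the frame `toy`, the seed `42`, and the in-memory buffer are assumptions introduced here. It shows that passing `random_state` to `DataFrame.sample` fixes the draw, and that `header=False` keeps pandas from writing the integer column labels as the first CSV line.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for a headerless training file: label in column 0, features after it.
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.integers(0, 2, size=(20, 4)))

# A fixed random_state makes the sample repeatable across runs.
sample_a = toy.sample(n=5, random_state=42)
sample_b = toy.sample(n=5, random_state=42)

# header=False and index=False write values only, one CSV line per sampled row.
buffer = io.StringIO()
sample_a.to_csv(buffer, index=False, header=False)
csv_lines = buffer.getvalue().splitlines()

# Reading the text back with header=None recovers the sampled values.
round_trip = pd.read_csv(io.StringIO(buffer.getvalue()), header=None)
```

With the default `header=True`, the same call would write one extra line holding the column labels `0,1,2,3` ahead of the data.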

In [33]:
sampled_data.head(5)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,240,241,242,243,244,245,246,247,248,249,...,319,320,321,322,323,324,325,326,327,328,329,330,331,332,333,334,335,336,337,338,339,340,341,342,343,344,345,346,347,348,349,350,351,352,353,354,355,356,357,358,359,360,361,362,363,364,365,366,367,368,369,370,371,372,373,374,375,376,377,378,379,380,381,382,383,384,385,386,387,388,389,390,391,392,393,394,395,396,397,398,399,400,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,416,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,469,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486,487,488,489,490,491,492,493,494,495,496,497,498,499,500,501,502,503,504,505,506,507,508,509,510,511,512,513,514,515,516,517,518,519,520,521,522,523,524,525,526,527,528,529,530,531,532,533,534,535,536,537,538,539,540,541,542,543,544,545,546,547,548,549,550,551,552,553,554,555,556,557,558,559,560,561,562,563,564,565,566,567,568
237938,0,60000115,60003704,1.0,1.0,1.0,0,0,0,0,0,,,,,,,,,0.0,10.0,10.0,5.0,,-1.0134,2.0,-1.0,0.0,9.0,,,,1.0,,4.0,6.0,1.0,0,0,16.0,16.5,71.39,,,4.0,1.0,,,55.0,6.0,,6.0,,,,,6.0,1.0,9.0,,,,1.0,0.0,0.0,0.0,0.0,1.0,0.0,,,,,,1.0,5.0,6.0,,,,,,,,,,,,,,,,,,,,,,7.0,,-1.6679,-0.3724,-1.228,1.1246,,,-0.0035,,,,,0.4567,,,2.1939,-0.66,0.8211,0.1242,,0.5211,0.9312,-0.4995,,,-2.457,0.5702,,,,,,,,,,,,,,,-1.1715,0.2108,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,8.0,10.0,1.0,80.0,60.0,80.0,1.0,0.0,0.0,1.0,2.0,2.0,1.0,2.0,2.0,2.0,40.0,1,0,0,0,0,1,0,0,1,1.0,1.0,4.0,3.0,1.0,1.0,3.0,1.0,4.0,4.0,1.0,2.0,2.0,1.0,70.0,30.0,2.0,2.0,1.0,1.0,45.5,13.0,0.0,0,0,0,0,3.0,60.0,20.0,10.0,5.0,0.0,80.0,80.0,320.0,,1.0,0.0,1.0,1.0,,1.0,1.0,...,-1.268,-1.9591,-0.1996,0.353,0.1278,1.455,0.0574,0.9917,-0.2901,0.4732,0.5814,-1.6856,-0.6554,-0.2686,-1.9258,0.4217,1.6287,-0.8314,-1.5778,1.0613,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
155368,0,39800298,39806799,,,1.0,0,0,1,0,0,1.0,4.0,1.0,1.0,5.0,1.0,,1.0,,8.0,10.0,10.0,,-0.2316,2.0,0.0,0.0,,3.0,5.0,,1.0,2.0,1.0,10.0,0.0,0,0,12.0,41.22,51.5,,,4.0,1.0,1.0,1.0,38.0,4.0,,6.0,1.0,1.0,1.0,1.0,4.0,2.0,6.0,2.0,2.0,2.0,2.0,0.0,0.0,0.0,0.0,0.0,1.0,,,,,,1.0,6.0,6.0,,,,,,,,,,,,,,,,,,,,,,10.0,,-1.6706,-0.47,-0.5168,0.1413,0.4437,,-0.0046,,,,,0.8561,0.7798,-2.1245,-0.0556,1.8372,-0.1002,-0.1216,0.3009,-1.2535,-0.7234,,-0.5935,,-0.3744,-0.2154,0.3369,-0.3713,-0.0504,-1.117,,0.0855,-0.0827,,-0.8239,-0.161,0.4751,-0.0283,-0.7658,-2.2824,,-0.5105,,,,,,,,,,-1.7775,-1.1402,-1.0009,-2.0101,-2.3763,-1.4165,-2.6018,-3.6359,,-2.672,-2.7256,-1.1559,,,,,,,,,,,,,,,,,,0.0,4.0,4.0,10.0,0.0,7.0,0.0,0.0,0.0,1.0,1.0,1.0,2.0,2.0,1.0,1.0,45.0,1,0,0,0,0,0,1,0,0,1.0,1.0,3.0,4.0,2.0,2.0,1.0,2.0,2.0,2.0,1.0,1.0,1.0,1.0,92.0,8.0,1.0,1.0,1.0,1.0,100.0,23.0,3.0,0,0,0,1,3.0,8.0,38.0,13.0,23.0,5.0,2.0,0.0,55.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,0.4793,-0.334,1.0982,-0.049,-1.6916,-1.1673,-0.2488,,-1.966,1.2671,1.5961,-1.4875,-1.9078,-0.2686,-2.0239,-0.1694,-1.6218,1.1005,,-0.1567,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
397501,1,75200221,75202217,1.0,2.0,1.0,0,0,0,0,1,1.0,,2.0,,5.0,3.0,1.0,2.0,5.0,7.0,10.0,,,1.6708,1.0,,0.0,9.0,5.0,6.0,1.0,2.0,0.0,,8.0,0.0,0,1,16.0,75.5,59.35,,,3.0,0.0,3.0,2.0,27.0,1.0,,2.0,3.0,1.0,1.0,1.0,5.0,1.0,7.0,2.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,,,,,,,6.0,6.0,,,,,,,,,,,,,,,,,,,,,,4.0,,0.41,0.6428,0.6302,-0.039,0.7868,,,,,,,,0.2859,1.31,0.3492,0.6764,0.8211,0.209,0.4572,0.9129,1.0661,0.8136,1.0031,0.9111,-1.6297,1.3636,,,,,,,,,0.0268,,,,,,0.9862,1.4253,,,,,,,,,,-1.3717,0.3346,1.4913,0.3747,0.7371,2.942,0.2081,-0.8141,-0.8195,-0.1114,0.4191,-0.6177,,,,,,,,,,,,,,,,,,0.0,2.0,,,,,,,,1.0,1.0,1.0,2.0,2.0,2.0,1.0,50.0,0,0,1,0,0,1,0,0,1,4.0,4.0,4.0,4.0,1.0,1.0,2.0,4.0,3.0,3.0,1.0,2.0,2.0,1.0,90.0,10.0,1.0,1.0,2.0,1.0,44.375,18.0,2.0,0,0,0,1,3.0,9.0,29.0,22.0,27.0,,,,25.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,,-1.5866,1.0164,-0.0384,-0.4666,-1.4551,-1.4212,0.4614,-1.2356,0.1023,0.3413,-0.6901,,,,,,1.4801,,-3.1832,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
254077,1,15800096,15807101,1.0,1.0,1.0,0,0,0,0,0,,,1.0,1.0,3.0,1.0,1.0,,3.0,6.0,8.0,5.0,,-1.0155,2.0,0.0,0.0,7.0,7.0,6.0,8.0,1.0,0.0,4.0,8.0,0.0,0,0,12.0,17.0,25.26,,,4.0,1.0,3.0,1.0,35.0,1.0,,0.0,4.0,1.0,1.0,1.0,4.0,1.0,5.0,2.0,2.0,1.0,,0.0,0.0,0.0,0.0,0.0,1.0,,,,,,2.0,1.0,6.0,,,,,,,,,,,,,,,,,,,,,,10.0,,-0.0722,0.1686,-1.228,-0.756,-0.6386,-0.9702,0.5711,0.5429,0.3417,0.6068,-0.4405,0.3948,0.7798,-0.0773,-0.9008,-1.007,-1.3627,-2.4977,-1.8818,0.1064,0.6634,0.8748,-1.0765,,-1.2286,-2.7876,0.7363,-0.1166,-0.4924,0.2929,,0.2712,0.1775,0.464,,,,,,,-0.4681,-1.338,,,,,,,,,,0.4062,0.3346,-0.6217,,-0.7651,0.1528,-0.7869,-0.1099,0.441,-1.0564,0.4191,-0.468,,,,,,,,,,,,,,,,,,6.0,3.0,4.0,0.0,10.0,6.0,0.0,0.0,0.0,1.0,2.0,1.0,2.0,2.0,1.0,2.0,50.0,0,0,1,0,0,1,0,0,1,,,4.0,4.0,4.0,,4.0,3.0,3.0,3.0,3.0,2.0,2.0,1.0,52.0,48.0,2.0,2.0,1.0,1.0,100.0,33.0,2.0,0,0,0,0,2.0,11.0,11.0,12.0,14.0,14.0,15.0,7.0,101.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.4765,0.9284,0.0585,,0.7111,,1.4653,1.4092,0.585,-0.522,-1.151,0.7408,-0.8312,-0.2686,0.6839,-0.5428,0.9235,0.4563,,0.99,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
105072,0,35200115,35200093,,,,0,0,0,0,0,,,5.0,5.0,5.0,2.0,,3.0,3.0,8.0,10.0,7.0,,-1.711,1.0,,0.0,,5.0,5.0,,1.0,0.0,1.0,7.0,0.0,1,1,14.5,55.25,29.16,,,,0.0,,,36.0,5.0,,6.0,1.0,1.0,1.0,1.0,7.0,3.0,9.0,4.0,3.0,4.0,3.0,0.0,0.0,0.0,0.0,0.0,1.0,,,,,,1.0,2.0,6.0,,,,,,,,,,,,,,,,,,,,,,8.0,,1.2727,-0.3397,0.2988,0.8637,-0.6386,1.6981,0.2753,0.5146,-0.0911,-0.0046,-0.3503,-0.1858,-1.3386,,,-1.1901,0.8211,-0.0973,0.5489,-0.0612,-0.3316,-0.191,-0.8185,0.6804,0.7774,0.6355,-0.7152,-0.3796,-0.0065,-1.117,-0.804,-1.2188,,,,,,,,,,-0.6432,,,,,,,,,,0.1884,-1.5285,-0.6613,0.2271,-1.7331,0.0673,0.4435,-0.2455,0.151,-0.9342,,,,,,,,,,,,,,,,,,,,0.0,8.0,2.0,15.0,20.0,20.0,10.0,10.0,1.0,1.0,2.0,1.0,2.0,2.0,1.0,1.0,40.0,0,0,1,0,0,1,0,0,1,1.0,,4.0,4.0,1.0,,3.0,4.0,4.0,3.0,1.0,2.0,2.0,1.0,30.0,70.0,2.0,1.0,2.0,2.0,21.3636,13.0,1.0,0,0,0,0,3.0,15.0,70.0,15.0,97.0,0.0,4.0,0.0,6.0,11.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,,-1.013,0.0464,-0.1831,1.33,-1.4551,0.1423,0.4147,0.6015,0.3223,-0.5926,0.1042,1.4387,-0.8885,0.465,0.4217,0.6141,-0.8314,-2.1057,0.2816,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [34]:
shap_config = clarify.SHAPConfig(
    baseline=[test_features.iloc[0].values.tolist()],
    num_samples=3000,
    agg_method="mean_abs",
    save_local_shap_values=True
)

explainability_output_path = "s3://{}/{}/clarify-explainability".format(bucket, prefix)
explainability_data_config = clarify.DataConfig(
    #s3_data_input_path='s3://{}/{}/train'.format(bucket, prefix),
    s3_data_input_path=sampled_s3_uri,
    s3_output_path=explainability_output_path,
    label='MATH_Proficient',
    headers=train_data.columns.to_list(),
    dataset_type="text/csv",
)
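
`agg_method="mean_abs"` asks Clarify to summarize the per-row (local) SHAP values into one global importance per feature by averaging their absolute values. A minimal sketch of that aggregation follows. The matrix of local SHAP values and the feature names `f0` to `f2` are made-up assumptions for illustration, not output from this job.

```python
import numpy as np

# Hypothetical local SHAP values: one row per explained record, one column per feature.
feature_names = ["f0", "f1", "f2"]
local_shap = np.array([
    [0.5, -1.0, 0.0],
    [-0.5, 3.0, 0.25],
    [1.0, -2.0, -0.25],
    [-1.0, 2.0, 0.5],
])

# mean_abs: average the absolute contribution of each feature over all rows.
global_shap = np.abs(local_shap).mean(axis=0)

# Rank features from most to least important.
order = np.argsort(global_shap)[::-1]
ranking = [(feature_names[i], float(global_shap[i])) for i in order]
```

Taking absolute values first keeps positive and negative contributions from cancelling. In this toy matrix a plain mean would give `f0` a score of 0, even though it moves every prediction.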

In [35]:
clarify_processor.run_explainability(
    data_config=explainability_data_config,
    model_config=model_config,
    explainability_config=shap_config
)

INFO:sagemaker.clarify:Analysis Config: {'dataset_type': 'text/csv', 'headers': ['MATH_Proficient', 'CNTSCHID', 'CNTSTUID', 'SISCO', 'ST347Q01JA', 'ST347Q02JA', 'ST349Q01JA_0', 'ST349Q01JA_1', 'ST349Q01JA_2', 'ST349Q01JA_3', 'ST349Q01JA_4', 'ST350Q01JA', 'ST356Q01JA', 'ST322Q01JA', 'ST322Q02JA', 'ST322Q03JA', 'ST322Q04JA', 'ST322Q06JA', 'ST322Q07JA', 'DURECEC', 'EFFORT1', 'EFFORT2', 'ST259Q01JA', 'WB164Q01HA', 'HOMEPOS', 'ST004D01T', 'GRADE', 'REPEAT', 'EXPECEDU', 'ICTAVSCH', 'ICTAVHOM', 'ICTDISTR', 'IMMIG', 'TARDYSD', 'ST226Q01JA', 'ST016Q01NA', 'MISSSC', 'Option_UH', 'OECD', 'PAREDINT', 'BMMJ1', 'BFMJ2', 'WB163Q06HA', 'WB163Q07HA', 'ST230Q01JA', 'SKIPPING', 'IC180Q01JA', 'IC180Q08JA', 'ST059Q02JA', 'ST296Q04JA', 'WB176Q01HA', 'STUDYHMW', 'IC184Q01JA', 'IC184Q02JA', 'IC184Q03JA', 'IC184Q04JA', 'ST059Q01TA', 'ST296Q01JA', 'ST272Q01JA', 'ST268Q01JA', 'ST268Q04JA', 'ST268Q07JA', 'ST293Q04JA', 'ST297Q01JA', 'ST297Q03JA', 'ST297Q05JA', 'ST297Q06JA', 'ST297Q07JA', 'ST297Q09JA', 'WB165Q01HA'

................sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /root/.config/sagemaker/config.yaml
We are not in a supported iso region, /bin/sh exiting gracefully with no changes.
INFO:sagemaker-clarify-processing:Starting SageMaker Clarify Processing job
INFO:analyzer.data_loading.data_loader_util:Analysis config path: /opt/ml/processing/input/config/analysis_config.json
INFO:analyzer.data_loading.data_loader_util:Analysis result path: /opt/ml/processing/output
INFO:analyzer.data_loading.data_loader_util:This host is algo-1.
INFO:analyzer.data_loading.data_loader_util:This host is the leader.
INFO:analyzer.data_loading.data_loader_util:Number of hosts in the cluster is 1.
INFO:sagemaker-clarify-processing:Running Python / Pandas based analyzer.
[34mINFO:analyzer.data_loading.data_lo