# Working with Numeric Data

In [1]:
import pandas as pd

## Sample data: student attributes and exam scores

In [7]:
exam_data = pd.read_csv('data/exams.csv', quotechar='"')
exam_data.head()

Unnamed: 0,gender,race/ethnicity,parental level of education,lunch,test preparation course,math score,reading score,writing score
0,male,group E,associate's degree,standard,completed,79,75,81
1,female,group C,associate's degree,free/reduced,none,56,65,64
2,male,group D,bachelor's degree,standard,none,86,68,74
3,female,group A,bachelor's degree,standard,none,68,78,76
4,female,group D,high school,free/reduced,none,49,68,61


## Check the average score for each exam

In [3]:
math_average = exam_data['math score'].mean()
reading_average = exam_data['reading score'].mean()
writing_average = exam_data['writing score'].mean()

print('Math Avg: ', math_average)
print('Reading Avg: ', reading_average)
print('Writing Avg: ', writing_average)

Math Avg:  65.06
Reading Avg:  67.28
Writing Avg:  66.47


## Data Standardization

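`StandardScaler()` scales the test scores so that each one is expressed as a <b>z-score</b>: the number of standard deviations a value lies from the mean. The result is a score relative to the distribution of values in that column: z = (x - u) / s, where u is the column mean and s is the column standard deviation. The scaled column has mean 0 and variance 1. Standardization does not change the shape of the distribution, so the result is N(0,1) only if the original scores were normally distributed.

`scale()` centers each column on its mean and scales it component-wise to unit variance, using the same formula: z = (x - u) / s.

Both produce exactly the same values. The difference is that `StandardScaler` stores the fitted mean and standard deviation, so the same transformation can be reapplied to other data (for example a test set), whereas `scale()` is a one-shot function.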
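
As a rough check of that equivalence, the sketch below compares a hand-computed z-score with `scale()` and `StandardScaler`. The `scores` array is a hypothetical stand-in, not data read from `exams.csv`.

```python
import numpy as np
from sklearn import preprocessing

# Hypothetical scores for illustration only (not loaded from exams.csv)
scores = np.array([[79.0], [56.0], [86.0], [68.0], [49.0]])

# Manual z-score using the population standard deviation (ddof=0)
manual = (scores - scores.mean(axis=0)) / scores.std(axis=0)

# Function form: standardizes in a single call
via_scale = preprocessing.scale(scores)

# Estimator form: fit() learns the mean and std, transform() applies them
scaler = preprocessing.StandardScaler().fit(scores)
via_scaler = scaler.transform(scores)

# All three approaches should agree up to floating-point error
same = np.allclose(manual, via_scale) and np.allclose(via_scale, via_scaler)
print('All three agree:', same)
```

After fitting, the learned parameters are available as `scaler.mean_` and `scaler.scale_`, which is what makes the estimator form reusable on new data.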

In [5]:
from sklearn import preprocessing

# Estimator form, for when separate train/test sets exist
# (X_train and X_test are not defined in this notebook):
# scaler = preprocessing.StandardScaler().fit(X_train)
# X_train_scaled = scaler.transform(X_train)
# X_test_scaled = scaler.transform(X_test)

# Standardize each score column in place
score_columns = ['math score', 'reading score', 'writing score']
exam_data[score_columns] = preprocessing.scale(exam_data[score_columns])
exam_data.head()

Unnamed: 0,gender,race/ethnicity,parental level of education,lunch,test preparation course,math score,reading score,writing score
0,male,group E,associate's degree,standard,completed,0.994557,0.574138,1.049901
1,female,group C,associate's degree,free/reduced,none,-0.646391,-0.169564,-0.178476
2,male,group D,bachelor's degree,standard,none,1.493976,0.053547,0.544099
3,female,group A,bachelor's degree,standard,none,0.209756,0.797248,0.688613
4,female,group D,high school,free/reduced,none,-1.14581,0.053547,-0.395248
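
The `StandardScaler` fit/transform pattern referenced in the cell above can be sketched as follows. This is a minimal illustration under stated assumptions: the matrix `X` is randomly generated as a hypothetical stand-in for the three score columns, and the 80/20 split is an arbitrary choice.

```python
import numpy as np
from sklearn import preprocessing
from sklearn.model_selection import train_test_split

# Hypothetical feature matrix standing in for the three score columns
rng = np.random.default_rng(0)
X = rng.normal(loc=65, scale=15, size=(100, 3))

# Hold out 20% of the rows as a test set
X_train, X_test = train_test_split(X, test_size=0.2, random_state=0)

# Learn the mean and std from the training rows only
scaler = preprocessing.StandardScaler().fit(X_train)

# Apply the same learned parameters to both sets
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.mean(axis=0))  # close to 0 for every column
print(X_test_scaled.mean(axis=0))   # near 0, but not exactly
```

Fitting on the training rows alone keeps information about the test set out of the scaling parameters, which is why the test-set means are only approximately zero.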


## Explore averages after scaling

In [6]:
math_average = exam_data['math score'].mean()  # approximately 0 after scaling
reading_average = exam_data['reading score'].mean()
writing_average = exam_data['writing score'].mean()

print('Math Avg: ', math_average)
print('Reading Avg: ', reading_average)
print('Writing Avg: ', writing_average)

Math Avg:  -2.1510571102112408e-18
Reading Avg:  -1.1102230246251566e-17
Writing Avg:  -2.6645352591003756e-17
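
The averages above are not exactly zero because of floating-point rounding. A tolerance-based comparison is one way to confirm the result, sketched below on a small hypothetical frame `df` rather than the real `exam_data`. It also checks the standard deviation: scikit-learn standardizes with the population standard deviation (`ddof=0`), while pandas `std()` defaults to the sample version (`ddof=1`).

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing

# Hypothetical scores for illustration only (not loaded from exams.csv)
df = pd.DataFrame({'math score': [79.0, 56.0, 86.0, 68.0, 49.0]})
scaled = pd.DataFrame(preprocessing.scale(df), columns=df.columns)

# Compare against zero with a tolerance instead of using ==
mean_is_zero = np.isclose(scaled['math score'].mean(), 0.0)

# Population std (ddof=0) matches what scikit-learn normalizes to
pop_std = scaled['math score'].std(ddof=0)

# pandas defaults to the sample std (ddof=1), so this is slightly above 1
sample_std = scaled['math score'].std()

print('Mean is ~0:', mean_is_zero)
print('Population std:', pop_std)
print('Sample std:', sample_std)
```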
