# T81-558: Applications of Deep Neural Networks
* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), School of Engineering and Applied Science, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)
* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/).

**Module 2 Assignment: Creating Columns in Pandas**

**Student Name: Julia Huang**

# Assignment Instructions

For this assignment you will use the **reg-33-data.csv** dataset.  This is a dataset that I generated specifically for this class.  You can find the CSV file on my data site, at this location: [reg-33-data.csv](http://data.heatonresearch.com/data/t81-558/datasets/reg-33-data.csv).

For this assignment, load and modify the data set.  You will submit this modified dataset to the **submit** function.  See [Assignment #1](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/assignments/assignment_yourname_class1.ipynb) for details on how to submit an assignment or check that one was submitted.

Modify the dataset as follows:

* Add a column named *ratio* that is *max* divided by *number*.  Leave *max* and *number* in the dataframe.
* Replace the *cat2* column with dummy variables. e.g. 'cat2_CA-0', 'cat2_CA-1',
       'cat2_CA-10', 'cat2_CA-11', 'cat2_CA-12', ...
* Replace the *item* column with dummy variables, e.g. 'item_IT-0', 'item_IT-1',
       'item_IT-10', 'item_IT-11', 'item_IT-12', ...
* For field *length* replace missing values with the median of *length*.
* For field *height* remove missing (with median) and convert to zscore.
* Remove all other columns.
* Your submitted dataframe will have these columns: 'height', 'max', 'number', 'length', 'ratio', 'cat2_CA-0', 'cat2_CA-1',
       'cat2_CA-10', 'cat2_CA-11', 'cat2_CA-12', 'cat2_CA-13', 'cat2_CA-14',
       'cat2_CA-15', 'cat2_CA-16', 'cat2_CA-17', 'cat2_CA-18', 'cat2_CA-19',
       'cat2_CA-1A', 'cat2_CA-1B', 'cat2_CA-1C', 'cat2_CA-1D', 'cat2_CA-1E',
       'cat2_CA-1F', 'cat2_CA-2', 'cat2_CA-20', 'cat2_CA-21', 'cat2_CA-22',
       'cat2_CA-23', 'cat2_CA-24', 'cat2_CA-25', 'cat2_CA-26', 'cat2_CA-27',
       'cat2_CA-3', 'cat2_CA-4', 'cat2_CA-5', 'cat2_CA-6', 'cat2_CA-7',
       'cat2_CA-8', 'cat2_CA-9', 'cat2_CA-A', 'cat2_CA-B', 'cat2_CA-C',
       'cat2_CA-D', 'cat2_CA-E', 'cat2_CA-F', 'item_IT-0', 'item_IT-1',
       'item_IT-10', 'item_IT-11', 'item_IT-12', 'item_IT-13', 'item_IT-14',
       'item_IT-15', 'item_IT-16', 'item_IT-17', 'item_IT-18', 'item_IT-19',
       'item_IT-1A', 'item_IT-1B', 'item_IT-1C', 'item_IT-1D', 'item_IT-1E',
       'item_IT-2', 'item_IT-3', 'item_IT-4', 'item_IT-5', 'item_IT-6',
       'item_IT-7', 'item_IT-8', 'item_IT-9', 'item_IT-A', 'item_IT-B',
       'item_IT-C', 'item_IT-D', 'item_IT-E', 'item_IT-F'.

# Assignment Submit Function

You will submit the 10 programming assignments electronically.  The following submit function can be used to do this.  My server will perform a basic check of each assignment and let you know if it sees any basic problems. 

**It is unlikely that should need to modify this function.**

In [0]:
import base64
import os
import numpy as np
import pandas as pd
import requests

# This function submits an assignment.  You can submit an assignment as much as you like, only the final
# submission counts.  The paramaters are as follows:
# data - Pandas dataframe output.
# key - Your student key that was emailed to you.
# no - The assignment class number, should be 1 through 1.
# source_file - The full path to your Python or IPYNB file.  This must have "_class1" as part of its name.  
# .             The number must match your assignment number.  For example "_class2" for class assignment #2.
def submit(data,key,no,source_file=None):
    if source_file is None and '__file__' not in globals(): raise Exception('Must specify a filename when a Jupyter notebook.')
    if source_file is None: source_file = __file__
    suffix = '_class{}'.format(no)
    if suffix not in source_file: raise Exception('{} must be part of the filename.'.format(suffix))
    with open(source_file, "rb") as image_file:
        encoded_python = base64.b64encode(image_file.read()).decode('ascii')
    ext = os.path.splitext(source_file)[-1].lower()
    if ext not in ['.ipynb','.py']: raise Exception("Source file is {} must be .py or .ipynb".format(ext))
    r = requests.post("https://api.heatonresearch.com/assignment-submit",
        headers={'x-api-key':key}, json={'csv':base64.b64encode(data.to_csv(index=False).encode('ascii')).decode("ascii"),
        'assignment': no, 'ext':ext, 'py':encoded_python})
    if r.status_code == 200:
        print("Success: {}".format(r.text))
    else: print("Failure: {}".format(r.text))

# Google CoLab Instructions

If you are using Google CoLab, it will be necessary to mount your GDrive so that you can send your notebook during the submit process.  Running the following code will map your GDrive to /content/drive.

In [0]:
from google.colab import drive
drive.mount('/content/drive')

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=email%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdocs.test%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive.photos.readonly%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fpeopleapi.readonly&response_type=code

Enter your authorization code:
··········
Mounted at /content/drive


In [0]:
!ls /content/drive/My\ Drive/Colab\ Notebooks

# Assignment #2 Sample Code

The following code provides a starting point for this assignment.

In [0]:
import os
import pandas as pd
from scipy.stats import zscore

# This is your student key that I emailed to you at the beginnning of the semester.
key = "Yg3Uc8sn118A6HaWAFSKG5g1Th1nOyw34jLD5Uh8"  # This is an example key and will not work.

# You must also identify your source file.  (modify for your local setup)
file='/content/drive/My Drive/Colab Notebooks/assignment_jhuang_class2.ipynb'  # Google CoLab
#file='/Users/jheaton/projects/t81_558_deep_learning/assignments/assignment_jhuang_class2.ipynb'  # Mac/Linux
# file = "C:\\Users\\jeffh\\Dropbox\\school\\teaching\\wustl\\classes\\T81_558_deep_learning\\solutions\\assignment_solution_class2.ipynb" # Windows

# Begin assignment
df = pd.read_csv("http://data.heatonresearch.com/data/t81-558/datasets/reg-33-data.csv")

df.drop('id',1,inplace=True)

# Your code goes here!
df['ratio'] = df['max'] / df['number']

dummies_1 = pd.get_dummies(['CA-0','CA-1','CA-10','CA-11','CA-12','CA-13','CA-14','CA-15','CA-16','CA-17','CA-18','CA-19','CA-1A','CA-1B','CA-1C','CA-1D','CA-1E','CA-1F','CA-2','CA-20','CA-21','CA-22','CA-23','CA-24','CA-25','CA-26','CA-27','CA-3','CA-4','CA-5','CA-6','CA-7','CA-8','CA-9','CA-A','CA-B','CA-C','CA-D','CA-E','CA-F'], prefix = 'cat2')
dummies_1 = pd.get_dummies(df['cat2'], prefix = 'cat2')
df = pd.concat([df,dummies_1],axis=1)

dummies_2 = pd.get_dummies(['CA-10','CA-11','CA-12','CA-13','CA-14','CA-15','CA-16','CA-17','CA-18','CA-19','CA-1A','CA-1B','CA-1C','CA-1D','CA-1E','CA-2','CA-3','CA-4','5','6','7','8','9','A','B','C','D','E','F'], prefix = 'item_IT')
dummies_2 = pd.get_dummies(df['item'], prefix = 'item')
df = pd.concat([df,dummies_2],axis=1)

med_1 = df['length'].median()
df['length'] = df['length'].fillna(med_1)

med_2 = df['length'].median()
df['height'] = df['height'].fillna(med_2)
df['height'] = zscore(df['height'])

df.drop(['convention','cat2','item','usage','region','code','power','weight','country','target'], axis = 1, inplace = True)
# Submit assignment
submit(source_file=file,data=df,key=key,no=2)

Success: Submitted Assignment #2 for hjulia:
This is your first submission of this assignment.



In [0]:
from scipy.stats import zscore
df = pd.read_csv("http://data.heatonresearch.com/data/t81-558/datasets/reg-33-data.csv")

df.drop('id',1,inplace=True)
df['ratio'] = df['max'] / df['number']

dummies_1 = pd.get_dummies(['CA-E'], prefix = 'cat2')
#dummies_1 = pd.get_dummies(['CA-0','CA-1','CA-10','CA-11','CA-12','CA-13','CA-14','CA-15','CA-16','CA-17','CA-18','CA-19','CA-1A','CA-1B','CA-1C','CA-1D','CA-1E','CA-1F','CA-2','CA-20','CA-21','CA-22','CA-23','CA-24','CA-25','CA-26','CA-27','CA-3','CA-4','CA-5','CA-6','CA-7','CA-8','CA-9','CA-A','CA-B','CA-C','CA-D','CA-E','CA-F'], prefix = 'cat2')
dummies_1 = pd.get_dummies(df['cat2'], prefix = 'cat2')
df = pd.concat([df,dummies_1],axis=1)
dummies_2 = pd.get_dummies(['CA-10','CA-11','CA-12','CA-13','CA-14','CA-15','CA-16','CA-17','CA-18','CA-19','CA-1A','CA-1B','CA-1C','CA-1D','CA-1E','CA-2','CA-3','CA-4','5','6','7','8','9','A','B','C','D','E','F'], prefix = 'item_IT')
dummies_2 = pd.get_dummies(df['item'], prefix = 'item')
df = pd.concat([df,dummies_2],axis=1)
med_1 = df['length'].median()
df['length'] = df['length'].fillna(med_1)
med_2 = df['length'].median()
df['height'] = df['height'].fillna(med_2)
df['height'] = zscore(df['height'])
df.drop(['convention','cat2','item','usage','region','code','power','weight','country','target'], axis = 1, inplace = True)
df.head(61)


Unnamed: 0,height,max,number,length,ratio,cat2_CA-0,cat2_CA-1,cat2_CA-10,cat2_CA-11,cat2_CA-12,cat2_CA-13,cat2_CA-14,cat2_CA-15,cat2_CA-16,cat2_CA-17,cat2_CA-18,cat2_CA-19,cat2_CA-1A,cat2_CA-1B,cat2_CA-1C,cat2_CA-1D,cat2_CA-1E,cat2_CA-1F,cat2_CA-2,cat2_CA-20,cat2_CA-21,cat2_CA-22,cat2_CA-23,cat2_CA-24,cat2_CA-25,cat2_CA-26,cat2_CA-27,cat2_CA-3,cat2_CA-4,cat2_CA-5,cat2_CA-6,cat2_CA-7,cat2_CA-8,cat2_CA-9,cat2_CA-A,cat2_CA-B,cat2_CA-C,cat2_CA-D,cat2_CA-E,cat2_CA-F,item_IT-0,item_IT-1,item_IT-10,item_IT-11,item_IT-12,item_IT-13,item_IT-14,item_IT-15,item_IT-16,item_IT-17,item_IT-18,item_IT-19,item_IT-1A,item_IT-1B,item_IT-1C,item_IT-1D,item_IT-1E,item_IT-2,item_IT-3,item_IT-4,item_IT-5,item_IT-6,item_IT-7,item_IT-8,item_IT-9,item_IT-A,item_IT-B,item_IT-C,item_IT-D,item_IT-E,item_IT-F
0,-0.213449,44907,16669,12471.11270,2.694043,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,-1.044834,48831,8652,10035.70850,5.643897,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,-0.554050,40760,23103,14442.65660,1.764273,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0.154560,33597,17680,15121.49370,1.900283,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
4,1.995044,29848,24136,18093.91470,1.236659,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0
5,-0.752334,46624,14122,13522.49225,3.301515,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
6,0.388953,41022,21974,19150.63430,1.866843,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0
7,1.995044,32791,24311,16803.25720,1.348813,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0
8,0.164498,64563,23250,17532.00300,2.776903,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0
9,0.104794,50865,13579,14460.49020,3.745858,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
