# T81-558: Applications of Deep Neural Networks
* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), School of Engineering and Applied Science, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)
* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/).

**Module 1 Assignment: How to Submit an Assignment**

**Student Name: Your Name**

# Assignment Instructions

Assignments are submitted using the **submit** function that is defined in the Helpful Functions section of this document.  When you submit an assignment, you are sending both your source code and data.  Your data will automatically be checked, and you will be informed of how closely it matches my solution file.  You may submit as many times as you like, so if you see some issues with your first submission, you can make changes and resubmit.  Only your final submission will be counted towards your grade.
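To make "sending both your source code and data" concrete, here is a minimal sketch of the JSON body that **submit** builds: the dataframe is written to CSV and base64-encoded, and the raw bytes of the source file are base64-encoded as well. The helper name `build_submission_payload` is hypothetical and only mirrors what `submit` does; it sends nothing over the network.

```python
import base64

import pandas as pd


def build_submission_payload(data, source_bytes, no, ext):
    """Hypothetical helper mirroring the JSON body that submit() posts."""
    # The dataframe is serialized to CSV (without the index) and base64-encoded.
    csv_b64 = base64.b64encode(data.to_csv(index=False).encode("ascii")).decode("ascii")
    # The raw bytes of the source file are base64-encoded too.
    py_b64 = base64.b64encode(source_bytes).decode("ascii")
    return {"csv": csv_b64, "assignment": no, "ext": ext, "py": py_b64}


df = pd.DataFrame({"a": [0, 0, 1, 1], "b": [0, 1, 0, 1], "c": [0, 1, 1, 0]})
payload = build_submission_payload(df, b"print('hello')\n", no=1, ext=".py")

# Decoding the "csv" field recovers the CSV text of the dataframe.
print(base64.b64decode(payload["csv"]).decode("ascii"))
```

Because the data travels as ASCII-encoded CSV, the dataframe's index is not sent, and text columns with non-ASCII characters would make the encoding step fail.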

When you first signed up for the course, you were emailed a student key.  You can see a sample key below; it is not a current key, so if you use it, you will get an error.  **Use YOUR student key, which I provided by email.**

You must also provide a filename and an assignment number.  The filename is simply the path to the source file that you wish to submit.  Your data is a Pandas dataframe.

**Assignment 1 is very easy!** To complete assignment one, all you have to do is add your student key and make sure that the **file** variable contains the path to your source file.  Your source file will most likely end in **.ipynb** if you are using a Jupyter notebook; however, it might also end in **.py** if you are simply using a Python script.

Run the code below, and if you are successful, you should see something similar to:

```
Success: Submitted assignment 1 for jheaton:
You have submitted this assignment 2 times. (this is fine)
No warnings on your data. You will probably do well, but no guarantee. :-)
```

If there is an issue with your data, you will get a warning.


**Common Problem #1: Bad student key**

If you use an invalid student key, you will see:

```
Failure: {"message":"Forbidden"}
```

**Common Problem #2: Must have class1 (or other number) as part of the filename**

You should also make sure that **_class#** appears somewhere in your filename. For example, for assignment 1, you must have **_class1** somewhere in your filename.  This check makes sure that you do not submit the wrong file for an assignment.  If the filename does not match, you will get an error such as:

```
Exception: _class1 must be part of the filename.
```
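The filename check is a plain substring test. The sketch below reproduces it in a hypothetical helper, `has_class_suffix`, so you can test a path before calling **submit**.

```python
def has_class_suffix(source_file, no):
    """Hypothetical helper reproducing the filename check that submit() performs."""
    suffix = "_class{}".format(no)
    # submit() searches the whole path string, not just the file name.
    return suffix in source_file


print(has_class_suffix("assignment_yourname_class1.ipynb", 1))  # True
print(has_class_suffix("assignment_yourname.ipynb", 1))  # False
```

Because it is only a substring test, `assignment_yourname_class10.ipynb` would also pass for assignment 1, as would any file inside a folder whose name contains `_class1`. The check catches most mix-ups, but not every one.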

The following video covers assignment submission: [assignment submission video](http://www.yahoo.com).

**Common Problem #3: Can't find source file**

You might get an error similar to this:

```
FileNotFoundError: [Errno 2] No such file or directory: '/Users/jeffh/projects/t81_558_deep_learning/t81_558_class1_intro_python.ipynb'
```

This means your **file** path is wrong.  Make sure the path matches where your file actually is.  See my hints in the comments of the sample code below for paths in different environments.
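One way to catch a bad path before submitting is a small pre-flight check. This is a minimal sketch; `check_source_file` is a hypothetical helper, not part of the course code. It expands `~` and raises a `FileNotFoundError` that names the folder it looked in.

```python
import os


def check_source_file(path):
    """Hypothetical pre-flight check: fail early if the source file cannot be found."""
    # Expand "~" so paths such as "~/projects/..." work on Mac/Linux.
    path = os.path.expanduser(path)
    if not os.path.isfile(path):
        folder = os.path.dirname(path) or "."
        raise FileNotFoundError(
            f"No file at {path!r}. Check the folder {folder!r} and the file name.")
    # Return an absolute path so that submit() receives an unambiguous location.
    return os.path.abspath(path)
```

You could run `file = check_source_file(file)` just before calling `submit`. If you are unsure which folder a relative path is resolved against, `os.getcwd()` shows the notebook's current working directory.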

**Common Problem #4: ???**

If you run into a problem not listed here, just let me know.

# Helpful Functions

You will see these at the top of every module and assignment.  These are simply a set of reusable functions that we will make use of.  Each of them will be explained in greater detail as the semester progresses, and Class 4 contains a complete overview of these functions.
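As a preview of two of these helpers, the sketch below applies the same formulas as `encode_numeric_zscore` and `encode_numeric_range` directly with pandas on a made-up column. Both helpers accept precomputed statistics (`mean`/`sd` and `data_low`/`data_high`). That lets you scale new data with the training set's numbers, which the sketch does for the z-score.

```python
import pandas as pd

# Made-up data for illustration.
train = pd.DataFrame({"x": [2.0, 4.0, 6.0, 8.0]})
test = pd.DataFrame({"x": [5.0, 10.0]})

# z-score: subtract the mean, divide by the standard deviation.
# pandas' .std() is the sample standard deviation (ddof=1), as in encode_numeric_zscore.
mean, sd = train["x"].mean(), train["x"].std()
train["z"] = (train["x"] - mean) / sd
# Reuse the *training* statistics for new data so both sets share one scale.
test["z"] = (test["x"] - mean) / sd

# Range scaling to [-1, 1], the default output range of encode_numeric_range.
low, high, new_low, new_high = train["x"].min(), train["x"].max(), -1, 1
train["r"] = (train["x"] - low) / (high - low) * (new_high - new_low) + new_low

print(train)
print(test)
```

A test value of 5.0 maps to a z-score of 0 because it equals the training mean. The training minimum and maximum map to -1 and 1.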

In [1]:
import base64
import os

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import requests
from sklearn import preprocessing


# Encode text values to dummy variables(i.e. [1,0,0],[0,1,0],[0,0,1] for red,green,blue)
def encode_text_dummy(df, name):
    dummies = pd.get_dummies(df[name])
    for x in dummies.columns:
        dummy_name = f"{name}-{x}"
        df[dummy_name] = dummies[x]
    df.drop(name, axis=1, inplace=True)


# Encode text values to a single dummy variable.  The new columns (which do not replace the old) will have a 1
# at every location where the original column (name) matches each of the target_values.  One column is added for
# each target value.
def encode_text_single_dummy(df, name, target_values):
    for tv in target_values:
        l = list(df[name].astype(str))
        l = [1 if str(x) == str(tv) else 0 for x in l]
        name2 = f"{name}-{tv}"
        df[name2] = l


# Encode text values to indexes (i.e. [0],[1],[2] for blue,green,red; LabelEncoder numbers the values in sorted order).
def encode_text_index(df, name):
    le = preprocessing.LabelEncoder()
    df[name] = le.fit_transform(df[name])
    return le.classes_


# Encode a numeric column as zscores
def encode_numeric_zscore(df, name, mean=None, sd=None):
    if mean is None:
        mean = df[name].mean()

    if sd is None:
        sd = df[name].std()

    df[name] = (df[name] - mean) / sd


# Convert all missing values in the specified column to the median
def missing_median(df, name):
    med = df[name].median()
    df[name] = df[name].fillna(med)


# Convert all missing values in the specified column to the default
def missing_default(df, name, default_value):
    df[name] = df[name].fillna(default_value)


# Convert a Pandas dataframe to the x,y inputs that TensorFlow needs
def to_xy(df, target):
    result = []
    for x in df.columns:
        if x != target:
            result.append(x)
    # find out the type of the target column.  Is it really this hard? :(
    target_type = df[target].dtypes
    target_type = target_type[0] if hasattr(
        target_type, '__iter__') else target_type
    # Encode to int for classification, float otherwise. TensorFlow likes 32 bits.
    if target_type in (np.int64, np.int32):
        # Classification: one-hot encode the integer class labels
        dummies = pd.get_dummies(df[target])
        return df[result].values.astype(np.float32), dummies.values.astype(np.float32)
    # Regression: keep the target as a single float column
    return df[result].values.astype(np.float32), df[[target]].values.astype(np.float32)


# Nicely formatted time string
def hms_string(sec_elapsed):
    h = int(sec_elapsed / (60 * 60))
    m = int((sec_elapsed % (60 * 60)) / 60)
    s = sec_elapsed % 60
    return f"{h}:{m:>02}:{s:>05.2f}"


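# Regression chart.
def chart_regression(pred, y, sort=True):
    # Flatten both arrays so that (n, 1) model outputs line up with the targets.
    t = pd.DataFrame({'pred': np.asarray(pred).flatten(), 'y': np.asarray(y).flatten()})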
    if sort:
        t.sort_values(by=['y'], inplace=True)
    plt.plot(t['y'].tolist(), label='expected')
    plt.plot(t['pred'].tolist(), label='prediction')
    plt.ylabel('output')
    plt.legend()
    plt.show()


# Remove all rows where the specified column is at least sd standard deviations away from its mean
def remove_outliers(df, name, sd):
    drop_rows = df.index[(np.abs(df[name] - df[name].mean())
                          >= (sd * df[name].std()))]
    df.drop(drop_rows, axis=0, inplace=True)


# Encode a column to a range between normalized_low and normalized_high.
def encode_numeric_range(df, name, normalized_low=-1, normalized_high=1,
                         data_low=None, data_high=None):
    # Default to the observed range of the column when no data range is given.
    if data_low is None:
        data_low = min(df[name])
    if data_high is None:
        data_high = max(df[name])

    df[name] = ((df[name] - data_low) / (data_high - data_low)) \
        * (normalized_high - normalized_low) + normalized_low


# This function submits an assignment.  You can submit an assignment as many times as you like; only the final
# submission counts.  The parameters are as follows:
# data - Pandas dataframe output.
# key - Your student key that was emailed to you.
# no - The assignment class number, which should be 1 for this assignment.
# source_file - The full path to your Python or IPYNB file.  This must have "_class1" as part of its name.
#               The number must match your assignment number.  For example, "_class2" for class assignment #2.
def submit(data, key, no, source_file=None):
    # Inside a Jupyter notebook __file__ is not defined, so the path must be given explicitly.
    if source_file is None and '__file__' not in globals():
        raise Exception('Must specify a filename when using a Jupyter notebook.')
    if source_file is None:
        source_file = __file__
    # Guard against submitting the wrong file for this assignment number.
    suffix = '_class{}'.format(no)
    if suffix not in source_file:
        raise Exception('{} must be part of the filename.'.format(suffix))
    # Read the source file and base64-encode it.
    with open(source_file, "rb") as source:
        encoded_python = base64.b64encode(source.read()).decode('ascii')
    ext = os.path.splitext(source_file)[-1].lower()
    if ext not in ['.ipynb', '.py']:
        raise Exception("Source file extension is {}; it must be .py or .ipynb".format(ext))
    # Send the CSV-encoded data and the source code together.
    r = requests.post("https://api.heatonresearch.com/assignment-submit",
                      headers={'x-api-key': key},
                      json={'csv': base64.b64encode(data.to_csv(index=False).encode('ascii')).decode("ascii"),
                            'assignment': no, 'ext': ext, 'py': encoded_python})
    if r.status_code == 200:
        print("Success: {}".format(r.text))
    else:
        print("Failure: {}".format(r.text))
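To see what `to_xy` hands to TensorFlow, here is a standalone copy run on a tiny made-up dataframe. The copy uses `.values` because the older `DataFrame.as_matrix` was removed in pandas 1.0. An integer target is treated as a class label and one-hot encoded, and any other target is returned as a single float column.

```python
import numpy as np
import pandas as pd


def to_xy(df, target):
    """Split df into float32 feature and target arrays (standalone copy of the helper)."""
    result = [x for x in df.columns if x != target]
    if df[target].dtype in (np.int64, np.int32):
        # Classification: one-hot encode the integer class labels.
        dummies = pd.get_dummies(df[target])
        return df[result].values.astype(np.float32), dummies.values.astype(np.float32)
    # Regression: keep the target as a single float column.
    return df[result].values.astype(np.float32), df[[target]].values.astype(np.float32)


df = pd.DataFrame({"a": [0.1, 0.5, 0.9], "b": [1.0, 2.0, 3.0], "label": [0, 2, 1]})
x, y = to_xy(df, "label")
print(x.shape, y.shape)  # (3, 2) (3, 3)
print(y[1])              # [0. 0. 1.]
```

The one-hot columns follow the sorted label values (0, 1, 2), so the second row, whose label is 2, gets a 1 in the last column.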

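The two text encoders above differ in shape. `encode_text_dummy` adds one column per distinct value, while `encode_text_index` replaces the column with a single integer code. The sketch below shows both on a made-up color column, using the same pandas and scikit-learn calls the helpers rely on. `LabelEncoder` assigns codes in sorted order, so "blue" becomes 0, not "red".

```python
import pandas as pd
from sklearn import preprocessing

colors = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# One-hot, as encode_text_dummy builds it: one "name-value" column per distinct value.
dummies = pd.get_dummies(colors["color"], prefix="color", prefix_sep="-")
print(list(dummies.columns))  # ['color-blue', 'color-green', 'color-red']

# Label index, as encode_text_index builds it: one integer code per value.
le = preprocessing.LabelEncoder()
codes = le.fit_transform(colors["color"])
print(le.classes_.tolist())  # ['blue', 'green', 'red']
print(codes.tolist())        # [2, 1, 0, 1]
```

The index form is compact but implies an order between categories. The one-hot form avoids that at the cost of more columns.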
# Assignment #1 Sample Code

For assignment #1, you only need to change two things: the key must be your own student key, and the file path must be the path to your local file.  Once you have made those changes, just run the cell and your assignment is submitted.

In [4]:
# This is your student key that I emailed to you at the beginning of the semester.
key = "1uiMs0QRy5wN9nZ7euvc3dreAEwQQ0w3H0P7qzc7"  # This is an example key and will not work.

# You must also identify your source file.  (modify for your local setup)
# file='/resources/t81_558_deep_learning/assignment_yourname_class1.ipynb'  # IBM Data Science Workbench
# file='C:\\Users\\jeffh\\projects\\t81_558_deep_learning\\t81_558_class1_intro_python.ipynb'  # Windows
file='/Users/jheaton/projects/t81_558_deep_learning/assignments/assignment_yourname_class1.ipynb'  # Mac/Linux

df = pd.DataFrame({'a' : [0, 0, 1, 1], 'b' : [0, 1, 0, 1], 'c' : [0, 1, 1, 0]})

submit(source_file=file,data=df,key=key,no=1)

Success: Submitted Assignment #1 for heaton-jeff:
You have submitted this assignment 2 times. (this is fine)
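The sample dataframe `df` submitted above is the truth table of exclusive OR: column `c` is 1 exactly when `a` and `b` differ. The short sketch below rebuilds the same frame from that rule. It illustrates the data only; the assignment does not require building it this way.

```python
import pandas as pd

# Rebuild the sample frame from the XOR rule instead of typing column c by hand.
a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
df = pd.DataFrame({"a": a, "b": b, "c": [x ^ y for x, y in zip(a, b)]})
print(df["c"].tolist())  # [0, 1, 1, 0]
```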



# Checking Your Submission

You can always double-check that your submission actually went through.  The following utility code will help with that.

In [6]:
import requests
import pandas as pd
import base64
import os

def list_submits(key):
    r = requests.post("https://api.heatonresearch.com/assignment-submit",
                      headers={'x-api-key': key},
                      json={})
    if r.status_code == 200:
        print("Success: \n{}".format(r.text))
    else:
        print("Failure: {}".format(r.text))

def display_submit(key,no):
    r = requests.post("https://api.heatonresearch.com/assignment-submit",
                      headers={'x-api-key': key},
                      json={'assignment':no})
    if r.status_code == 200:
        print("Success: \n{}".format(r.text))
    else:
        print("Failure: {}".format(r.text))
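`list_submits` and `display_submit` differ only in the JSON body they send: an empty object lists every submission, and `{'assignment': no}` asks about one assignment. A minimal sketch of folding them into one function follows. The name `query_submissions` and the injectable `post` parameter are hypothetical, added so the function can be exercised without contacting the server.

```python
import requests

SUBMIT_URL = "https://api.heatonresearch.com/assignment-submit"


def query_submissions(key, no=None, post=requests.post):
    """Hypothetical combined helper: list all submissions, or one when no is given."""
    # An empty body lists everything; {'assignment': no} asks about one assignment,
    # matching what list_submits and display_submit send.
    body = {} if no is None else {"assignment": no}
    r = post(SUBMIT_URL, headers={"x-api-key": key}, json=body)
    if r.status_code == 200:
        print("Success: \n{}".format(r.text))
    else:
        print("Failure: {}".format(r.text))
    return r
```

With this sketch, `query_submissions(key)` behaves like `list_submits(key)`, and `query_submissions(key, 1)` behaves like `display_submit(key, 1)`.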


In [7]:
# Show a listing of all submitted assignments.

key = "1uiMs0QRy5wN9nZ7euvc3dreAEwQQ0w3H0P7qzc7"

list_submits(key)

Success: 
Assignment #1: Submitted 2 times, last on: 2018-08-20T04:13:48.334Z



In [8]:
# Show one assignment, by number.

display_submit(key,1)

Success: 
Assignment #1: Submitted 2 times, last on: 2018-08-20T04:13:48.334Z
*** Check ***

