<a href="https://colab.research.google.com/github/jeffheaton/t81_558_deep_learning/blob/master/assignments/assignment_yourname_class3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# T81-558: Applications of Deep Neural Networks
* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), School of Engineering and Applied Science, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)
* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/).

**Module 3 Assignment: Data Preparation in Pandas**

**Student Name: Elizabeth Orrico**

# Assignment Instructions

For this assignment, you will use the **series-31** dataset.  This file contains a dataset that I generated explicitly for this semester.  You can find the CSV file on my data site, at this location: [series-31](https://data.heatonresearch.com/data/t81-558/datasets/series-31.csv). Load and summarize the data set.  You will submit this summarized dataset to the **submit** function.  See [Assignment #1](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/assignments/assignment_yourname_class1.ipynb) for details on how to submit an assignment or check that one was submitted.

The RAW datafile looks something like the following:


|time|value|
|----|-----|
|8/22/19 12:51|19.19535862|
|9/19/19 9:44|13.51954348|
|8/26/19 14:05|9.191413297|
|8/19/19 16:37|18.34659762|
|9/5/19 9:18|1.349778007|
|9/2/19 10:23|8.462216832|
|8/23/19 15:05|17.2471252|
|...|...|

Summarize the dataset as follows:

|date|starting|max|min|ending|
|---|---|---|---|---|
|8/19/19|17.57352208|18.46883497|17.57352208|18.46883497|
|8/20/19|19.49660945|19.84883044|19.49660945|19.84883044|
|8/21/19|20.0339169|20.0339169|19.92099707|19.92099707|
|...|...|...|...|...|

* There should be one row for each unique date in the data set.
* Think of the **value** as a stock price.  You only have values during certain hours and certain days.
* The **date** column is each of the different dates in the file.
* The **starting** column is the first **value** of that date (has the earliest time).
* The **max** column is the maximum **value** for that day.
* The **min** column is the minimum **value** for that day.
* The **ending** column is the final **value** for that day (has the latest time).
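
The column rules above can be sketched on a tiny, made-up frame. This is only one possible approach, not the required solution: the `toy` values are hypothetical (they are not taken from series-31), and the explicit `format` string assumes every timestamp follows the `M/D/YY H:MM` layout shown in the raw table.

```python
import pandas as pd

# Hypothetical toy data in the same shape as the raw file (not real series-31 values).
toy = pd.DataFrame({
    "time": ["8/19/19 16:37", "8/19/19 9:30", "8/19/19 12:00",
             "8/20/19 15:59", "8/20/19 9:31"],
    "value": [3.0, 1.0, 5.0, 2.0, 4.0],
})

# Parse with an explicit format so the two-digit year is unambiguous.
toy["time"] = pd.to_datetime(toy["time"], format="%m/%d/%y %H:%M")

# Sort chronologically so "first" and "last" mean earliest and latest of each day.
toy = toy.sort_values("time")

# One row per calendar date, with the four summary columns.
summary = (
    toy.groupby(toy["time"].dt.date)["value"]
       .agg(starting="first", max="max", min="min", ending="last")
       .rename_axis("date")
       .reset_index()
)
print(summary)
```

Sorting before grouping matters here: `first` and `last` return values in row order within each group, so they only correspond to the earliest and latest times once the rows are in chronological order.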

You can process the **time** column either as strings or as Python **datetime**.  It may be necessary to use Pandas functions beyond those given in the class lecture.
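
As a rough illustration of the two routes, the sketch below pulls the date out of a couple of sample strings both ways. The sample values and the `format` string are assumptions based on the raw table shown earlier.

```python
import pandas as pd

# Hypothetical sample strings in the raw file's "M/D/YY H:MM" layout.
raw = pd.Series(["8/22/19 12:51", "9/5/19 9:18"], name="time")

# String route: the date is everything before the first space.
date_as_text = raw.str.split(" ").str[0]

# Datetime route: parse, then keep only the calendar date.
parsed = pd.to_datetime(raw, format="%m/%d/%y %H:%M")
date_as_obj = parsed.dt.date

print(date_as_text.tolist())
print(date_as_obj.tolist())
```

One caveat with the string route: times such as `9:18` and `12:51` do not sort chronologically as text, so finding the earliest and latest value of a day is usually easier after parsing to datetime.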

Note, you might get the following warning on the date field from the API.  You can safely ignore this warning:

* Warning: The mean of column date differs from the solution file by 2010.4. (might not matter if small)

Your submission triggers this warning due to the method you use to convert the time/date.  Your code is correct, whether you get this warning or not.

# Google CoLab Instructions

If you are using Google CoLab, it will be necessary to mount your GDrive so that you can send your notebook during the submit process. Running the following code will map your GDrive to ```/content/drive```.

In [None]:
try:
    from google.colab import drive
    drive.mount('/content/drive', force_remount=True)
    COLAB = True
    print("Note: using Google CoLab")
    %tensorflow_version 2.x
except:
    print("Note: not using Google CoLab")
    COLAB = False

# Assignment Submit Function

You will submit the 10 programming assignments electronically.  The following submit function can be used to do this.  My server will perform a basic check of each assignment and let you know if it sees any basic problems. 

**It is unlikely that you will need to modify this function.**

In [None]:
import base64
import os
import numpy as np
import pandas as pd
import requests

# This function submits an assignment.  You can submit an assignment as many times as you like; only the final
# submission counts.  The parameters are as follows:
# data - Pandas dataframe output.
# key - Your student key that was emailed to you.
# no - The assignment class number, should be 1 through 10.
# source_file - The full path to your Python or IPYNB file.  This must have "_class1" as part of its name.
#               The number must match your assignment number.  For example "_class2" for class assignment #2.
def submit(data,key,no,source_file=None):
    if source_file is None and '__file__' not in globals(): raise Exception('Must specify a filename when running in a Jupyter notebook.')
    if source_file is None: source_file = __file__
    suffix = '_class{}'.format(no)
    if suffix not in source_file: raise Exception('{} must be part of the filename.'.format(suffix))
    with open(source_file, "rb") as image_file:
        encoded_python = base64.b64encode(image_file.read()).decode('ascii')
    ext = os.path.splitext(source_file)[-1].lower()
    if ext not in ['.ipynb','.py']: raise Exception("Source file is {} must be .py or .ipynb".format(ext))
    r = requests.post("https://api.heatonresearch.com/assignment-submit",
        headers={'x-api-key':key}, json={'csv':base64.b64encode(data.to_csv(index=False).encode('ascii')).decode("ascii"),
        'assignment': no, 'ext':ext, 'py':encoded_python})
    if r.status_code == 200:
        print("Success: {}".format(r.text))
    else: print("Failure: {}".format(r.text))

# Assignment #3 Sample Code

The following code provides a starting point for this assignment.

In [1]:
import os
import pandas as pd
from scipy.stats import zscore

# You must also identify your source file.  (modify for your local setup)
# file='/content/drive/My Drive/Colab Notebooks/assignment_yourname_class3.ipynb'  # Google CoLab
# file='C:\\Users\\jeffh\\projects\\t81_558_deep_learning\\assignments\\assignment_yourname_class3.ipynb'  # Windows
file='/Users/jheaton/projects/t81_558_deep_learning/assignments/assignment_yourname_class3.ipynb'  # Mac/Linux

# Begin assignment
# The time column is loaded as plain strings here and parsed explicitly below.
df = pd.read_csv("https://data.heatonresearch.com/data/t81-558/datasets/series-31.csv")

# Make 2 columns, one for date and one for time
datesAndTimes = pd.to_datetime(df['time'], errors='coerce')

# Build the working table directly from the parsed series
result = pd.DataFrame({
    'date': datesAndTimes.dt.date,
    'time': datesAndTimes.dt.time,
    'value': df['value'],
})
display(result)



# Figure out starting value, ending value (sort by time so first/last are earliest/latest)
starting_val = result.sort_values(['date','time']).groupby('date',as_index=False).first()[['date','value','time']]
ending_val = result.sort_values(['date','time']).groupby('date',as_index=False).last()[['date', 'value', 'time']]

# Figure out max per day, min per day
max_val = result.groupby('date',as_index=False)['value'].max()[['value']]
min_val = result.groupby('date',as_index=False)['value'].min()[['value', 'date']]

# construct answer data table
col_names = ['date', 'starting', 'max', 'min', 'ending']
answer = pd.DataFrame( columns = col_names)
answer['date'] = starting_val['date']
answer['starting'] = starting_val['value']
answer['max'] = max_val['value']
answer['min'] = min_val['value']
answer['ending'] = ending_val['value']

display(answer)

# Save the summarized table (not the raw data) next to the notebook
answer.to_csv('class3_output.csv', index=False)


# Handy way to inspect the contents of a GroupBy object
# thing = result.groupby('date')[['time','value']]
# display(list(thing))



| |date|time|value|
|---|---|---|---|
|0|2019-08-22|12:51:00|19.195359|
|1|2019-09-19|09:44:00|13.519543|
|2|2019-08-26|14:05:00|9.191413|
|3|2019-08-19|16:37:00|18.346598|
|4|2019-09-05|09:18:00|1.349778|
|...|...|...|...|
|13495|2019-09-09|12:50:00|19.046697|
|13496|2019-09-09|10:00:00|17.984843|
|13497|2019-09-18|14:16:00|15.827774|
|13498|2019-09-19|16:16:00|13.070049|

| |date|starting|max|min|ending|
|---|---|---|---|---|---|
|0|2019-08-19|17.573522|18.468835|17.573522|18.468835|
|1|2019-08-20|19.496609|19.84883|19.496609|19.84883|
|2|2019-08-21|20.033917|20.033917|19.920997|19.920997|
|3|2019-08-22|19.3937|19.3937|18.89359|18.89359|
|4|2019-08-23|17.784213|17.784213|16.974865|16.974865|
|5|2019-08-26|9.717627|9.717627|8.814028|8.814028|
|6|2019-08-27|7.630234|7.630234|7.175677|7.175677|
|7|2019-08-28|6.940731|7.091136|6.940731|7.091136|
|8|2019-08-29|7.677676|8.150454|7.677676|8.150454|
|9|2019-08-30|8.989132|9.441207|8.989132|9.441207|
