![MLA Logo](https://drive.corp.amazon.com/view/mrruckma@/MLA_headerv2.png?download=true)

# [Machine Learning Accelerator](https://w.amazon.com/index.php/MLSciences/Community/Education/MLAccelerator)

## Thank You
A special thank you to [Petar Butkovic, InTech](https://phonetool.amazon.com/users/petarb) and [Nisha Menon, Hyd Ops](https://phonetool.amazon.com/users/nisham) for building this dataset.


## Loading the Data into Eider
Since some people have asked about the best ways to import data into Eider, I thought we'd go one step further and make it trivial to import! Below is a snippet for loading the dataset from S3 and taking a look at it. We highly recommend this method because it avoids a needless local import.

* Here's some documentation from the Eider team in case you want to import from local on a separate project: [UserFileUpload](https://w.amazon.com/index.php/Eider/Documentation/UserFileUpload)

# Computer Vision I

## Day 1 Final Project Goals - [CV I Curriculum](https://w.amazon.com/bin/view/MLSciences/Community/Education/MLAccelerator/Curriculum/#HComputerVisionI) and [MLA Learning Platform Walkthrough](https://mla.corp.amazon.com/computer-vision-i/day-1-computer-vision/final-project/)
* Hands-on development with Eider, and learning about NumPy, SciPy, scikit-learn and Keras.
* ML Life-cycle
* Data pre-processing
* Class imbalance fix
* Data Augmentation
* Image representation
* Image Filtering and convolution
* Neural Networks
* Convolutional Neural Networks
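Two of the goals above, fixing class imbalance and data augmentation, can be combined. One simple approach is to oversample the minority classes with flipped copies of their own images. The sketch below makes several assumptions: the images are already stacked into a NumPy array `X` of shape `(n, height, width[, channels])`, the labels are integers in `y`, and a horizontal flip preserves the label. The last assumption seems reasonable for luggage wheel counts, but you should check it. `oversample_with_flips` is a hypothetical helper, not part of any library.

```python
import numpy as np

def oversample_with_flips(X, y, seed=0):
    """Hypothetical helper: balance classes by adding horizontally flipped
    copies of minority-class images until every class matches the largest one.

    X: array of shape (n, height, width) or (n, height, width, channels)
    y: integer labels of shape (n,)
    """
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, count in zip(classes, counts):
        deficit = target - count
        if deficit == 0:
            continue
        idx = np.flatnonzero(y == cls)
        # Pick (with replacement) which images of this class to augment
        picks = rng.choice(idx, size=deficit, replace=True)
        # Axis 2 of the batch is the image width, so this is a horizontal flip
        X_parts.append(X[picks][:, :, ::-1])
        y_parts.append(np.full(deficit, cls, dtype=y.dtype))
    return np.concatenate(X_parts), np.concatenate(y_parts)

# Tiny usage example: class 0 has 4 images, classes 1 and 2 have 1 each
X = np.zeros((6, 4, 4, 3), dtype=np.uint8)
y = np.array([0, 0, 0, 0, 1, 2])
X_bal, y_bal = oversample_with_flips(X, y)
print(np.bincount(y_bal))  # [4 4 4]
```

Flipped duplicates are only one option. Class weights, which leave the data unchanged and reweight the loss instead, are a common alternative.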

## Deliverables
* **Submit your model's .csv output to the class Leaderboard before the start of the next class' lecture. For Day 3, you can submit as long as the Leaderboard is open (usually it closes at 5pm PST the day after). A full description of the business problem is available [here on the MLA Learning Platform Final Project Walkthrough](https://mla.corp.amazon.com/computer-vision-i/day-1-computer-vision/final-project/)**

## Resources
Here are some resources that might be of interest while working on this project. Of particular note is [Eider Expo](https://eider.corp.amazon.com/expo), which lets you search for notebooks that demonstrate particular libraries or concepts.

* Scikit-learn
    * [Quick-Start tutorial](http://scikit-learn.org/stable/tutorial/basic/tutorial.html)
    * [User Guide](http://scikit-learn.org/stable/user_guide.html)
    * [Choosing the right estimator](http://scikit-learn.org/stable/tutorial/machine_learning_map/index.html)
    * [API](http://scikit-learn.org/stable/modules/classes.html)
* SciPy
    * [Getting Started](https://www.scipy.org/)
    * [Documentation](https://www.scipy.org/docs.html)
* matplotlib
    * [Plotting API](http://matplotlib.org/api/pyplot_summary.html)
    * [Gallery](http://matplotlib.org/gallery.html)
* NumPy
    * [API](https://docs.scipy.org/doc/numpy/reference/routines.html)
* Eider
    * [User guide documentation](https://w.amazon.com/index.php/Eider/Documentation)
    * [Expo](https://eider.corp.amazon.com/expo)

In [0]:
# Let's ensure we're using Eider's permissions with Eider's Credential cell below:

In [0]:
# Read in the training data. The ASINs correspond to the IDs used in Leaderboard.
# `eider` is provided by the Eider notebook environment, so it needs no import.
import pandas as pd
eider.s3.download("s3://eider-datasets/mlu/projects/MLACVIFinalProject/training_data.pkl", "/tmp/training_data.pkl")

df = pd.read_pickle("/tmp/training_data.pkl")
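Most NumPy and Keras code expects a single array rather than a DataFrame column of images. The sketch below converts the data into that form. It assumes each entry of `df['data']` is an image array, that all images share the same shape, and that `'label'` is the target column (absent from the test set). `to_arrays` is a hypothetical helper, and the demo frame is synthetic.

```python
import numpy as np
import pandas as pd

def to_arrays(frame):
    """Hypothetical helper: stack frame['data'] into one array X.

    Returns (X, y); y is None when the frame has no 'label' column.
    Raises ValueError if the images do not all share one shape.
    """
    shapes = {np.asarray(img).shape for img in frame['data']}
    if len(shapes) != 1:
        raise ValueError(f"Images have differing shapes: {sorted(shapes)[:5]}")
    X = np.stack([np.asarray(img) for img in frame['data']])
    y = frame['label'].to_numpy() if 'label' in frame.columns else None
    return X, y

# Synthetic stand-in for df: an object column holding one image per row
imgs = np.empty(4, dtype=object)
for i in range(4):
    imgs[i] = np.zeros((8, 8, 3), dtype=np.uint8)
demo = pd.DataFrame({'data': imgs, 'label': [0, 1, 2, 1]})

X, y = to_arrays(demo)
print(X.shape, y.shape)  # (4, 8, 8, 3) (4,)
```

On the real data you would call `to_arrays(df)`. If it raises because the image shapes differ, resize the images to a common size first.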

### Onboarding MLA's GPU Cluster

Are your models taking too long to train? MLA provides a single [p2.xlarge](https://aws.amazon.com/ec2/instance-types/p2/) instance to students for the duration of the course. See the [MLA Learning Platform](https://mla.corp.amazon.com/computer-vision-i/day-1-computer-vision/libraries-and-frameworks/) for directions on onboarding to our GPU cluster. Please exercise frugality when using the cluster.

In [0]:
#Let's see what kind of data we're working with
import matplotlib.pyplot as plt

plt.imshow(df['data'][90])

Our labels correspond to the following:

* Class 0: *Inconclusive*
* Class 1: *Two wheels*
* Class 2: *Four wheels*
* Class 3: *Not luggage*
* Class 4: *Zero wheels*
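To check how imbalanced the classes are, it helps to map the label ids above to their names. You can then compute per-class weights that give rare classes more influence during training. The sketch below uses the formula `n_samples / (n_classes * count)`, which is the heuristic behind scikit-learn's `class_weight='balanced'`. It assumes integer labels 0 to 4 as listed above. `label_counts` and `balanced_class_weights` are hypothetical helpers, and the labels in `y_demo` are made up.

```python
import numpy as np

# Label ids and names as listed above
LABEL_NAMES = {0: 'Inconclusive', 1: 'Two wheels', 2: 'Four wheels',
               3: 'Not luggage', 4: 'Zero wheels'}

def label_counts(y):
    """Hypothetical helper: number of examples per named class (0 if absent)."""
    counts = np.bincount(y, minlength=len(LABEL_NAMES))
    return {LABEL_NAMES[cls]: int(n) for cls, n in enumerate(counts)}

def balanced_class_weights(y):
    """Hypothetical helper: {class id: n_samples / (n_present_classes * count)}.

    Classes that never occur in y are skipped, as in scikit-learn's
    'balanced' heuristic.
    """
    counts = np.bincount(y, minlength=len(LABEL_NAMES))
    n_present = np.count_nonzero(counts)
    return {cls: len(y) / (n_present * int(c))
            for cls, c in enumerate(counts) if c > 0}

y_demo = np.array([0, 1, 1, 2, 2, 2, 2, 3])
print(label_counts(y_demo))
weights = balanced_class_weights(y_demo)
print(weights)  # {0: 2.0, 1: 1.0, 2: 0.5, 3: 2.0}
```

A dictionary keyed by class index is the form that Keras's `Model.fit` accepts for its `class_weight` argument. With the real data, that would be something like `balanced_class_weights(df['label'].to_numpy())`.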

In [0]:
# Let's take a look at this data in more detail and then start working. Remember 'label' is our target variable/column
df.loc[90]

### You're all set to get started!
Remember that you can refer back to the class lectures and the Learning Platform. Below you'll find the dataset that you need to predict on and submit, along with directions for exporting your predictions from Eider and importing them into Leaderboard.

In [0]:
# If you're unsure of how to submit to Leaderboard, no problemo. You'll use the training file loaded above to build your ML model and then predict on the test file below:
eider.s3.download("s3://eider-datasets/mlu/projects/MLACVIFinalProject/test_data.pkl", "/tmp/test_data.pkl")
test = pd.read_pickle("/tmp/test_data.pkl")
plt.imshow(test['data'][90])
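Before building a CNN, a quick baseline gives you a score to beat and shows that the preprocessing is consistent between the training and test sets. The sketch below is a nearest-centroid classifier on flattened, rescaled pixels. This is a deliberately simple technique swapped in for illustration, not the course's model. It assumes 8-bit pixel values and stacked image arrays. `preprocess`, `fit_centroids` and `predict_centroids` are hypothetical helpers, and the tiny arrays are synthetic.

```python
import numpy as np

def preprocess(X):
    """Flatten each image and scale pixels to [0, 1] (assumes 8-bit pixels)."""
    return X.reshape(len(X), -1).astype(np.float32) / 255.0

def fit_centroids(X, y):
    """Mean feature vector of each class."""
    classes = np.unique(y)
    return classes, np.stack([X[y == c].mean(axis=0) for c in classes])

def predict_centroids(classes, centroids, X):
    """Assign each row of X to the class whose centroid is closest."""
    # Squared Euclidean distance to one centroid at a time keeps memory at (n, d)
    dists = np.stack([((X - c) ** 2).sum(axis=1) for c in centroids], axis=1)
    return classes[dists.argmin(axis=1)]

# Synthetic 1x1 "images": dark ones are class 0, bright ones are class 2
X_train = np.array([[[0]], [[10]], [[250]], [[240]]], dtype=np.uint8)
y_train = np.array([0, 0, 2, 2])
classes, centroids = fit_centroids(preprocess(X_train), y_train)

X_test = np.array([[[5]], [[255]]], dtype=np.uint8)
pred = predict_centroids(classes, centroids, preprocess(X_test))
print(pred)  # [0 2]
```

Whatever model you use, apply exactly the same preprocessing to `test` as to the training data.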

In [0]:
# Below is an example submission of a very poor model
eider.s3.download("s3://eider-datasets/mlu/projects/MLACVIFinalProject/MLA CV I Sample Model Output.csv", "/tmp/MLA CV I Sample Model Output.csv")
test_submission = pd.read_csv('/tmp/MLA CV I Sample Model Output.csv', header=0)
test_submission.head(5)
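The safest way to produce your own submission is probably to reuse the sample file's layout and replace only its predictions. The sketch below makes two hypothetical assumptions that you should confirm by inspecting `test_submission.head()`: the sample's last column holds the predicted class, and its rows are in the same order as `test`. `make_submission` is a hypothetical helper, and the `ID` and `label` column names in the demo are made up.

```python
import numpy as np
import pandas as pd

def make_submission(sample, predictions):
    """Hypothetical helper: copy the sample submission and overwrite its
    last column (assumed to be the predicted class) with integer predictions.
    """
    if len(sample) != len(predictions):
        raise ValueError(f"Expected {len(sample)} predictions, got {len(predictions)}")
    submission = sample.copy()  # leave the original sample untouched
    submission[submission.columns[-1]] = np.asarray(predictions).astype(int)
    return submission

# Synthetic stand-in for test_submission (column names are hypothetical)
sample = pd.DataFrame({'ID': ['A1', 'A2', 'A3'], 'label': [0, 0, 0]})
sub = make_submission(sample, [1, 4, 2])
print(sub)
```

In the notebook, this would be `make_submission(test_submission, your_predictions)`, where `your_predictions` holds one class id per test row.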

In [0]:
# Now let's write the dataframe to a new CSV file in /tmp so we can upload it to Leaderboard as a prediction
test_submission.to_csv("/tmp/MLA CV I Sample Model Output.csv", encoding='utf-8', index=False)
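Before exporting, it can be worth reading the written file back and running a few sanity checks: the row count matches the test set, nothing is missing, and every prediction is one of the five class ids listed earlier. The sketch below assumes, hypothetically, that the last column holds the predicted class. `check_submission_file` is a hypothetical helper, and the demo file is synthetic.

```python
import os
import tempfile

import pandas as pd

VALID_LABELS = {0, 1, 2, 3, 4}  # class ids listed earlier in this notebook

def check_submission_file(path, expected_rows):
    """Hypothetical helper: return a list of problems found in a written
    submission CSV (an empty list means the basic checks passed)."""
    written = pd.read_csv(path)
    problems = []
    if len(written) != expected_rows:
        problems.append(f"{len(written)} rows, expected {expected_rows}")
    if written.isna().any().any():
        problems.append("missing values present")
    # Assumes the last column holds the predicted class
    bad = set(written.iloc[:, -1].dropna().unique().tolist()) - VALID_LABELS
    if bad:
        problems.append(f"unexpected labels: {sorted(bad)}")
    return problems

# Synthetic demo file with one invalid label (7)
demo = pd.DataFrame({'ID': ['A1', 'A2'], 'label': [1, 7]})
path = os.path.join(tempfile.gettempdir(), 'demo_submission.csv')
demo.to_csv(path, index=False)
problems = check_submission_file(path, expected_rows=2)
print(problems)  # ['unexpected labels: [7]']
```

For the real file, `check_submission_file("/tmp/MLA CV I Sample Model Output.csv", len(test))` should return an empty list.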

You should now be able to see the file in your Eider TMP Files section [here](https://eider.corp.amazon.com/file). To save it locally, follow the steps below, or simply select 'Save' next to the file to store it permanently in the Eider 'Files' section, from which you can then download it.

## Getting our model output out of Eider and into Leaderboard
Great. Now we have a dummy sample submission in Eider that we need to export locally so that we can upload it to Leaderboard. Here are the steps:
1. Within the Eider console top bar, select [Files](https://eider.corp.amazon.com/file)
2. You should now see 'Files', 'TMP' and 'Exported notebooks' tabs. 
3. Select 'TMP' then select 'Connect to workspace'. You should now see any files from your last run of your workspace. If there was no 'Connect to workspace' option, your files from the last run should already be present. *Files in the 'TMP' should be considered temporary as they will expire after an hour's worth of idle time.*
4. Go to the 'MLA CV I Sample Model Output.csv' file and select Save
5. This file will now be permanently saved to your Eider account and available for local download from the 'Files' tab via the download button.

We now have our model's output .csv and are ready to upload it to Leaderboard:
1. Search for your class [Leaderboard instance](https://leaderboard.corp.amazon.com/) and go to the 'Make a Submission' section
2. Upload your local file and include your notebook version URL for tracking
3. Your score on the public leaderboard should now appear. Marvel at how much room for improvement there is!

## Eider Notice - Notebook Commenting 
Eider has launched a new experience for adding comments in read-only notebooks, and it's currently in Limited Beta. You'll be able to discuss the final project and comment on your classmates' work! To try out this experience, share a [*versioned*](https://w.amazon.com/index.php/Eider/Documentation#Notebook_Versions) notebook with your classmates via the Chime group or some other medium.

### Not seeing the "SHOW COMMENTS" option in your read-only notebook?
If you don't see the "SHOW COMMENTS" option in your read-only notebook, please [SIGN UP HERE](https://quip-amazon.com/ggIlANowXeHD) and we will add you into the Beta group ASAP.

For more information and updates on this feature, check out the Notebook Commenting Wiki [HERE](https://w.amazon.com/bin/view/Eider/Docs/NotebookCommenting/).