<img src='notebooks/img/logo.png'>

<big><big><big><b>Anaconda: the Open-Source Platform for Data Science</b></big></big></big>
<br/><br/>
<big><big><i>Introduction to Machine Learning with Python</i></big></big>

# Machine Learning Fundamentals

This course covers the theoretical *basics* of machine learning, with a focus on the tools in **scikit-learn**.  We assume a passing familiarity with statistical techniques, but concentrate on the concrete APIs available in Python machine learning libraries.

A specialized area within machine learning is often called *deep learning*, and focuses on the use of neural networks.  Basic support for neural networks is available in scikit-learn, but for more sophisticated tasks and faster processing, you will want to use `tensorflow` or `theano` (or the `keras` and `lasagne` libraries built on top of them).

# Table of Contents
* [Machine Learning Fundamentals](#Machine-Learning-Fundamentals)
	* [Classical scientific discovery](#Classical-scientific-discovery)
	* [Statistical Learning](#Statistical-Learning)
* [Model Selection](#Model-Selection)
	* [Model structure](#Model-structure)
	* [... and the next steps](#...-and-the-next-steps)
* [Supervised Learning](#Supervised-Learning)
* [Unsupervised Learning](#Unsupervised-Learning)


## Classical scientific discovery

1. Observe and record data linked to outcome;
1. Analyze data for deeper physical connection;
1. Devise a **simple theory** to explain the outcome from observed data;
1. Predict new outcomes with the theory.


Advantages of simple theories:

* Minimal assumptions about underlying physics;
* Generally built on other accepted background theories.

Disadvantages:

* Gross simplifications: massless ropes, point particles, ideal gases;
* Simple theories can be difficult or impossible to design;
* Difficulty choosing between multiple rival theories.

## Statistical Learning

**What if we didn't care about the underlying theory?**

A **statistical model** is an algorithm that uses *previous* knowledge of observations and outcomes to predict outcomes over *new* observations.

1. Gather a set of observations and known outcomes;
1. Prepare a statistical model based on *proximity to previous observations with known outcomes*;
1. Evaluate the effectiveness of the model using a set of *previously hidden observations and outcomes*;
1. Make changes to statistical model and repeat until satisfied;
1. Predict outcomes given new observations.
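
The five steps above can be sketched with scikit-learn. This is a minimal illustration, not a recommended recipe: the iris dataset, the k-nearest-neighbors classifier (chosen because it predicts by proximity to previous observations), the candidate values of `n_neighbors`, and the two "new" observations are all assumptions made for the example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Step 1: gather observations (X) and known outcomes (y).
X, y = load_iris(return_X_y=True)

# Hold back a quarter of the data so it stays hidden while the model is fit.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Steps 2-4: fit a proximity-based model, score it on the hidden data,
# change the model (here, the number of neighbors) and repeat.
scores = {}
for k in (1, 5, 15):
    model = KNeighborsClassifier(n_neighbors=k)
    model.fit(X_train, y_train)
    scores[k] = model.score(X_test, y_test)  # mean accuracy on hidden data

best_k = max(scores, key=scores.get)

# Step 5: predict outcomes for new observations (made-up measurements).
final_model = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
new_observations = [[5.1, 3.5, 1.4, 0.2], [6.7, 3.0, 5.2, 2.3]]
predictions = final_model.predict(new_observations)
print(scores, best_k, predictions)
```

Choosing `best_k` on the same hidden set that is used to report accuracy makes the reported score optimistic; a separate validation set or cross-validation is the usual remedy.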


Advantages:

* Reduced time to *build the model* and start predicting;
* Ability to re-build the model with new data;
* Ability to link previously unrelated sets of observations into the model;
* Rigorous statistical methods to evaluate effectiveness and compare multiple models;
* Trade *complexity* against *generality*.
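
The last two advantages can be made concrete with cross-validation. The sketch below is only an illustration under assumed choices: the breast cancer dataset, a decision tree as the model family, and `max_depth` as the complexity knob are not prescribed by this course outline.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# max_depth caps model complexity; None lets the tree grow until leaves are pure.
results = {}
for depth in (1, 3, None):
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    # 5-fold cross-validation: each fold is scored by a model that never saw it.
    cv_scores = cross_val_score(model, X, y, cv=5)
    results[depth] = cv_scores.mean()

for depth, mean_accuracy in results.items():
    print(f"max_depth={depth}: mean accuracy {mean_accuracy:.3f}")
```

Comparing the mean scores shows how well each level of complexity generalizes to data the model was not fit on.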

Disadvantages:

* Large amounts of observation and outcome data are required;
* Observation data must be *cleaned* and *processed* to match assumptions of the model algorithm;
* *Very many* model algorithms to pick from;
* Reduced ability to *explain* why the model works.

# Model Selection

http://scikit-learn.org/stable/tutorial/machine_learning_map/

<img src='notebooks/img/ml_map.png'>

## Model structure

<table style="border:None">
<tr style="border:None; font-size:20px; padding:10px;"><th colspan=2>``model.fit(X_train, [y_train])``</th></tr>
<tr style="border:None; font-size:20px; padding:10px;"><th>``model.predict(X_test)``</th><th>``model.transform(X_test)``</th></tr>
<tr style="border:None; font-size:20px; padding:10px;"><td>Classification</td><td>Preprocessing</td></tr>
<tr style="border:None; font-size:20px; padding:10px;"><td>Regression</td><td>Dimensionality Reduction</td></tr>
<tr style="border:None; font-size:20px; padding:10px;"><td>Clustering</td><td>Feature Extraction</td></tr>
<tr style="border:None; font-size:20px; padding:10px;"><td>&nbsp;</td><td>Feature selection</td></tr>
</table>
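
The table above can be read as two families of estimators that share `fit`. The following sketch, which assumes the iris dataset and picks `StandardScaler`, `PCA`, and `LogisticRegression` purely as representatives of preprocessing, dimensionality reduction, and classification, shows both call patterns side by side.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Preprocessing: fit learns per-feature mean and variance; no y_train needed.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Dimensionality reduction: same fit/transform pattern, again without y_train.
pca = PCA(n_components=2).fit(X_train_scaled)
X_train_2d = pca.transform(X_train_scaled)
X_test_2d = pca.transform(X_test_scaled)

# Classification: fit requires y_train, and predict returns estimated labels.
clf = LogisticRegression().fit(X_train_2d, y_train)
y_pred = clf.predict(X_test_2d)
print(X_test_2d.shape, y_pred[:5])
```

Every transformer is fit on the training data only and then applied unchanged to the test data, so no information from the held-out observations leaks into the model.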

## ... and the next steps

Credit: https://xkcd.com/1838/

<img src="notebooks/img/xkcd-machine_learning.png" width="40%" align="left"/>

# Supervised Learning

<a href='notebooks/Supervised.ipynb' class='btn btn-lg btn-primary'>Overview of Supervised Learning</a>

<div class="row">
    <a href='notebooks/Pipelines.ipynb' class='btn btn-primary btn-lg' style="float:left">Pipelines</a>
</div>

<div class="row">
    <a href='notebooks/Evaluation_Metrics.ipynb' class='btn btn-primary btn-lg' style="float:left">Classification and Evaluation</a>
    <div class="row" style="float:right">
        <a href='notebooks/Evaluation_Metrics_Exercises.ipynb' class='btn btn-primary btn-lg' style="float:left">Exercises</a>
    </div>
</div>

<div class="row">
    <a href='notebooks/Cross_Validation_and_Grid_Searches.ipynb' class='btn btn-primary btn-lg' style="float:left">Cross Validation and Grid Searches</a>
    <div class="row" style="float:right">
        <a href='notebooks/Cross_Validation_and_Grid_Searches_Exercises.ipynb' class='btn btn-primary btn-lg' style="float:left">Exercises</a>
    </div>
</div>

<div class="row">
    <a href='notebooks/Pipelines - GridSearch.ipynb' class='btn btn-primary btn-lg' style="float:left">Pipelines - GridSearch</a>
    <div class="row" style="float:right">
        <a href='notebooks/Pipelines_Exercises.ipynb' class='btn btn-primary btn-lg' style="float:left">Exercises</a>
    </div>
</div>

# Unsupervised Learning

<div class="row">
    <a href='notebooks/Unsupervised_Feature_Extraction.ipynb' class='btn btn-primary btn-lg' style="float:left">Feature Extraction</a>
    <div class="row" style="float:right">
        <a href='notebooks/Unsupervised_Feature_Extraction_Exercises.ipynb' class='btn btn-primary btn-lg' style="float:left">Exercises</a>
    </div>
</div>

<div class='row'>
    <a href='notebooks/Clustering.ipynb' class='btn btn-primary btn-lg'>Clustering</a>
</div>

<div class='row'>
    <a href='notebooks/Outliers.ipynb' class='btn btn-primary btn-lg'>Outlier Detection</a>
</div>

<img src='notebooks/img/copyright.png'>