# Table of Contents

* [1. Keynote: Scaling Human Learning](#scaling_human_learning)
* [2. Deconstructing Feather](#feather_deconstruction)
* [3. Microsoft AI](#microsoft_ai)
* [4. Explaining Classification Algorithms](#explaining_classification)

<a id='scaling_human_learning'></a>

# Keynote: Scaling Human Learning

[Katy Huff](https://twitter.com/katyhuff)

Nuclear Engineer @ UIUC

- [The Hacker Within](http://www.thehackerwithin.org/)
    * Models that have worked
    * Models that have failed
        * single point-of-failure

- [Software Carpentry](https://swcarpentry.github.io/python-novice-inflammation/)

- [Data Carpentry](http://www.datacarpentry.org/)

- Tools
    * [Binder](http://mybinder.org/)
    * [Data Science Textbook](http://www.inferentialthinking.com/)
    * Effective Computation in Physics

- Why do it?
   * community
   * travel
   * teaching experience
   * felt need

- What should we do?
    * Lower the barrier, not the standards
    * New Dev instructions
    * Document well
    * Curate low hanging fruit
    * Targeted sprints
    * Appoint ambassador
    * Consider Users Conferences

<a id='feather_deconstruction'></a>

# Feather

[Bill Lattner](https://twitter.com/wlattner)

[Feather](https://github.com/wesm/feather)

- Exchange tabular data between Python, R, and others
- Fast read/write
- Represent categorical features
- it's about the metadata

- memory access cost depends on both location + predictability
- sequential access FTW

**Idea**
- on-disk representation should be similar to in-memory representation
- columnar layout is good fit for analytic workflows
    * columnar layout from [arrow](http://arrow.apache.org/)
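A minimal sketch of why a columnar layout suits analytic scans, using NumPy only. The two-column schema and the names `rows` and `columns` are illustrative assumptions, not Feather's actual on-disk format. Reading one column out of row-oriented records means striding over the other fields, while a column stored on its own is contiguous and can be read sequentially.

```python
import numpy as np

n = 1_000

# Row-oriented: each record stores (price, qty) side by side.
rows = np.zeros(n, dtype=[("price", "f8"), ("qty", "i8")])
rows["price"] = np.arange(n, dtype="f8")

# Column-oriented: each column is its own contiguous array.
columns = {
    "price": np.arange(n, dtype="f8"),
    "qty": np.zeros(n, dtype="i8"),
}

row_view = rows["price"]     # strided view: skips over qty in every record
col_view = columns["price"]  # contiguous: values sit back to back

print(row_view.strides, row_view.flags["C_CONTIGUOUS"])
print(col_view.strides, col_view.flags["C_CONTIGUOUS"])
```

Both views hold the same numbers; only the memory access pattern differs.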
    
    
**Simplicity**
```python
feather.read_dataframe(path, columns=None)
```

- each dataframe column is serialized separately
    * bitmask of nulls
    * values
- looks like a data frame in R
- no in-place concatenation currently
- open question: how does it handle escape characters?
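The "bitmask of nulls plus values" idea can be sketched with NumPy alone. This is an illustration in the spirit of Arrow's validity bitmap (one bit per entry, least-significant bit first, 1 meaning "present"), not Feather's actual serialization code; the variable names are made up for the example.

```python
import numpy as np

# A column with missing entries, as it might arrive from Python objects.
raw = [1.5, None, 3.0, 4.25, None]

# Validity mask: True where a value is present.
valid = np.array([v is not None for v in raw])

# Values buffer: nulls get a placeholder that readers must ignore.
values = np.array([v if v is not None else 0.0 for v in raw], dtype="f8")

# Pack the mask into bits, least-significant bit first.
bitmask = np.packbits(valid, bitorder="little")

# Reading back: unpack the bits and mask out the placeholders.
restored_valid = np.unpackbits(bitmask, count=len(raw), bitorder="little").astype(bool)
restored = [float(v) if ok else None for v, ok in zip(values, restored_valid)]
```

Five entries fit into a single byte of mask, so the overhead of tracking nulls stays small next to the values buffer.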

**Future of Feather**

- in-place operations
- zero parsing or copying to Pandas memory
- [mmap](https://pymotw.com/2/mmap/) the feather file
- input to sklearn or statsmodels
- output from PostgreSQL
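To illustrate the "zero parsing" goal, here is a rough sketch of memory-mapping a single column with `numpy.memmap`. The headerless raw file and the file name `column.bin` are assumptions for the example; a real Feather file carries metadata and would need its own reader.

```python
import os
import tempfile

import numpy as np

# Write one column of float64 values to disk as raw bytes (no header).
column = np.arange(1_000_000, dtype="f8")
path = os.path.join(tempfile.mkdtemp(), "column.bin")
column.tofile(path)

# Map the file instead of reading it: pages are loaded lazily on access.
mapped = np.memmap(path, dtype="f8", mode="r")

# Touch only a small slice; the rest of the file is never pulled into memory.
window_mean = float(mapped[500_000:500_010].mean())
```

Because the on-disk bytes already match the in-memory layout, there is nothing to parse or copy before the slice can be used.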

**Use case**
- passing data between R and Python in computation
- As part of a Luigi Pipeline

<a id='microsoft_ai'></a>

# Microsoft Cognitive Services

[David Giard](http://www.DavidGiard.com)

**Examples**
- [What Dog](https://www.what-dog.net/)
- [How-Old](https://how-old.net/)


[Cognitive Services API](https://www.microsoft.com/cognitive-services)
- Limits on calls per month
- Limits on calls per minute


[Jupyter Notebook Demo](https://github.com/Microsoft/Cognitive-Vision-Python)

[Demos](https://github.com/DavidGiard/CognitiveSvcsDemos)

<a id='explaining_classification'></a>

# Explaining classification algorithms

[Brian Lange](https://twitter.com/bjlange)

## Popular examples
- Sorting Hat
- Spam filter

## Notes
- Need labelled data for training
- feature = dimension = column = attribute
- class = category

**Caveat**
- Choosing good features or getting more data will help more than changing algorithms

*How do we find the least terrible line using gradient descent?*
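One possible answer, sketched with NumPy: treat "terrible" as log loss and walk downhill on it. The synthetic clusters, the learning rate, and the step count are arbitrary assumptions chosen so the example converges quickly; they are not from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two well-separated clusters: class 0 around (-2, -2), class 1 around (2, 2).
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
y = np.array([0] * 20 + [1] * 20)

w = np.zeros(2)  # orientation of the line
b = 0.0          # offset of the line
learning_rate = 0.1

for _ in range(500):
    # p is the predicted probability of class 1 for every point.
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    # Gradient of the mean log loss with respect to w and b.
    grad_w = X.T @ (p - y) / len(y)
    grad_b = np.mean(p - y)
    # Step downhill.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

# Points on the positive side of the line are labelled class 1.
predictions = (X @ w + b > 0).astype(int)
accuracy = float(np.mean(predictions == y))
```

Each pass nudges the line in the direction that reduces the loss; the different classifiers in these notes mostly differ in how they define that loss.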

## Implementing a spam filter

```python
import numpy as np
import sklearn

# Toy training set: three points with two features each, plus class labels
x = np.array([[0, 0.1], [3, 0.2], [5, 0.1]])
y = np.array([1, 2, 1])

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
model = LinearDiscriminantAnalysis()
model.fit(x, y)
```

```
LinearDiscriminantAnalysis(n_components=None, priors=None, shrinkage=None,
              solver='svd', store_covariance=False, tol=0.0001)
```

```python
# predict() expects a 2D array of shape (n_samples, n_features)
new_point = np.array([[1, .3]])
print(model.predict(new_point))
```

```
[1]
```




Most of the time, we don't use a linear discriminant classifier. Other models include logistic regression, which fits when there is a gradient between the two classes that can be described with the logistic function.

```python
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
```
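A small sketch of logistic regression on made-up data, to show the probability output that the logistic function provides. The single feature (imagine a count of suspicious words) and the labels are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# One hypothetical feature per message; label 1 = spam, 0 = not spam.
X = np.array([[0.0], [1.0], [2.0], [7.0], [8.0], [9.0]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)

new_points = np.array([[0.5], [8.5]])
labels = model.predict(new_points)
# One row per point; columns follow the order of model.classes_.
probabilities = model.predict_proba(new_points)
```

`predict` gives the hard label, while `predict_proba` shows how far along the gradient between the classes each point sits.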

**SVM** = better definition of 'terrible'
 - lines can turn into non-linear shapes if you transform your data
 - the kernel trick: transform the features (e.g., take the square of each number)
 - **RBF SVM**: radial basis function SVM. Creates more complex shapes. Most popular kernel
 - SVM also tries to maximize the margin
 


```python
from sklearn.svm import LinearSVC
model = LinearSVC()
```
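The "transform your data" point can be illustrated on a one-dimensional toy problem. The data below is invented, and the `C` and `gamma` values are arbitrary choices for the sketch. Squaring is done explicitly here; a kernel such as RBF achieves a similar effect without building the transformed features by hand.

```python
import numpy as np
from sklearn.svm import SVC

# 1-D toy data: class 1 sits in the middle, class 0 on both sides,
# so no single threshold on x separates them.
x = np.array([-3.0, -2.5, -2.0, -0.5, 0.0, 0.5, 2.0, 2.5, 3.0])
y = np.array([0, 0, 0, 1, 1, 1, 0, 0, 0])

# Explicit transform: after squaring, one threshold is enough.
x_squared = x ** 2
separable_after_squaring = bool(x_squared[y == 1].max() < x_squared[y == 0].min())

# An RBF kernel yields a nonlinear boundary on the raw feature directly.
rbf_model = SVC(kernel="rbf", C=10, gamma=1.0)
rbf_model.fit(x.reshape(-1, 1), y)
rbf_accuracy = rbf_model.score(x.reshape(-1, 1), y)
```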

**KNN**
- What do similar cases look like?
- k = how many? 
- Tie-breaking 


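```python
# KNeighborsClassifier is the KNN classifier; NearestNeighbors only does unsupervised neighbor search
from sklearn.neighbors import KNeighborsClassifier
```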
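A from-scratch sketch of the KNN idea, including one possible tie-breaking rule (prefer the label of the closest neighbor among the tied classes). The function name `knn_predict`, the data, and the tie-breaking rule are assumptions for illustration; libraries may break ties differently.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, point, k=3):
    """Classify `point` by majority vote among its k nearest training points."""
    distances = np.linalg.norm(X_train - point, axis=1)
    nearest = np.argsort(distances, kind="stable")[:k]  # closest first
    votes = Counter(y_train[i] for i in nearest)
    top_count = max(votes.values())
    tied = {label for label, count in votes.items() if count == top_count}
    # Tie-break: among the tied labels, pick the one whose member is closest.
    for i in nearest:
        if y_train[i] in tied:
            return y_train[i]


X_train = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
y_train = np.array(["ham", "ham", "spam", "spam"])

clear_case = knn_predict(X_train, y_train, np.array([0.2, 0.1]), k=3)
tied_case = knn_predict(X_train, y_train, np.array([3.0, 2.5]), k=2)
```

With `k=2` the second point gets one vote for each class, so the answer comes from the tie-breaking rule alone, which is why the choice of `k` and of the rule matters.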

**Decision Tree learners**
- Make a flow chart of it
- In higher dimensions
- Prone to overfitting
- Use [Pydot](https://github.com/erocarrera/pydot)

```python
from sklearn.tree import DecisionTreeClassifier
model = DecisionTreeClassifier()
```

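``` python
import pydot
from sklearn.tree import export_graphviz
# `model` must be a fitted tree; with out_file=None, export_graphviz returns DOT source
graphs = pydot.graph_from_dot_data(export_graphviz(model, out_file=None))
graphs[0].write_png("tree.png")  # requires Graphviz to be installed
```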
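A sketch of both the flow-chart view and the overfitting risk, using scikit-learn's text export so no Graphviz install is needed. The data is pure noise by construction, and the feature names are made up.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)

# Pure noise: the labels have nothing to do with the features.
X = rng.normal(size=(200, 2))
y = rng.integers(0, 2, size=200)

# An unrestricted tree keeps splitting until it has memorized every point.
deep_tree = DecisionTreeClassifier(random_state=0).fit(X, y)
# Capping the depth limits how much noise the tree can memorize.
shallow_tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

deep_train_accuracy = deep_tree.score(X, y)
shallow_train_accuracy = shallow_tree.score(X, y)

# The shallow tree's flow chart as indented text.
flow_chart = export_text(shallow_tree, feature_names=["feature_a", "feature_b"])
print(flow_chart)
```

A perfect training score on random labels is the overfitting symptom: the deep tree has learned the noise, which is what the ensemble methods try to counter.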

**Ensemble models**: deal with overfitting problems

Bagging:
- Split training set
- Train one model each
- Models 'vote'
- Sum of the decision boundaries of its components

Random Forest:
- Like bagging
- At each split randomly constrain features to choose from

Extra trees:
- Choose each split at random instead of searching for the optimal one
- Compensate by making a ton of trees

Voting:
- Combine a bunch of different models of your design, have them 'vote' on the correct answer
- For example (KNN, SVM, Decision Tree) 

Boosting:
- Train models in order, making the later ones focus on the points the earlier ones missed


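``` python
# Swap in BaggingClassifier, ExtraTreesClassifier, VotingClassifier, or AdaBoostClassifier as needed
from sklearn.ensemble import RandomForestClassifier
```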
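A rough side-by-side of the ensemble families listed above, on synthetic data. The dataset, the default hyperparameters, and the choice of AdaBoost as the boosting example are assumptions for the sketch. Note that scikit-learn's `BaggingClassifier` draws bootstrap samples rather than splitting the training set into disjoint parts.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    ExtraTreesClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class data with a curved boundary.
X, y = make_moons(n_samples=400, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "bagging": BaggingClassifier(random_state=0),  # decision trees by default
    "random forest": RandomForestClassifier(random_state=0),
    "extra trees": ExtraTreesClassifier(random_state=0),
    # Hard voting across three different model types.
    "voting": VotingClassifier([
        ("knn", KNeighborsClassifier()),
        ("svm", SVC()),
        ("tree", DecisionTreeClassifier(random_state=0)),
    ]),
    "boosting": AdaBoostClassifier(random_state=0),
}

# Fit each model and record its accuracy on the held-out split.
scores = {
    name: model.fit(X_train, y_train).score(X_test, y_test)
    for name, model in models.items()
}
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

All five share the same `fit` / `score` interface, so trying a different ensemble is mostly a one-line change.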

**How Do I pick?**
- Nonlinear decision boundary
- Providing probability estimates
- Tell how important a feature is to the model

Model | Nonlinear boundary | Probability estimate | Feature importance
--- | --- | --- | ---
Logistic Regression | No | Yes | Kinda (coefficients, if features are scaled)
KNN | Yes | Sort of (% nearby points) | No
Naive Bayes | Yes | Yes | No
Decision Tree | Yes | No | Kinda
Ensemble | Yes | Kinda (% agreement) | Kinda
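Two of the table's entries can be checked directly in scikit-learn. This is a sketch on invented data in which only the first of three features matters; the sample sizes and model settings are arbitrary.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Three features, but only the first one determines the label.
X = rng.normal(size=(300, 3))
y = (X[:, 0] > 0).astype(int)

# KNN "probabilities" are the fraction of the k nearest neighbors in each class.
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
knn_proba = knn.predict_proba(X[:10])

# Tree ensembles expose a per-feature importance score that sums to 1.
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
importances = forest.feature_importances_
```

With `n_neighbors=5`, every KNN probability is a multiple of 0.2, which is the "% nearby points" caveat in the table.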

Model | Can I update? | Easy to parallelize?
--- | --- | ---
Logistic Regression | Kinda | Kinda
SVM | Kinda, depending on kernel | Yes for some kernels, no for others
KNN | Yes | Yes
Naive Bayes | Yes | Yes
Decision Tree | No | No (but it's really fast)
Ensemble | Kinda, by adding new models | Yes
Boosted | Kinda, by adding new models | No
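As an example of a model that can be updated, here is a sketch of incremental training with Gaussian Naive Bayes through scikit-learn's `partial_fit`. The `make_batch` helper and its data are hypothetical stand-ins for batches arriving over time.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)


def make_batch(n):
    """Hypothetical data stream: the label depends on the sign of the first feature."""
    X = rng.normal(size=(n, 2))
    y = (X[:, 0] > 0).astype(int)
    return X, y


model = GaussianNB()

# First batch: every class label must be declared up front.
X1, y1 = make_batch(100)
model.partial_fit(X1, y1, classes=np.array([0, 1]))

# Later batch: the model is updated in place, without retraining on X1.
X2, y2 = make_batch(50)
model.partial_fit(X2, y2)

samples_seen = int(model.class_count_.sum())
```

Models without a `partial_fit` method, such as a single decision tree in scikit-learn, have to be refit from scratch when new data arrives.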

**Other quirks:**

SVM: Pick a kernel

KNN: need to define what 'similarity' is in a good way. Fast to train, slow to classify

Naive Bayes: Have to choose the distribution. Can deal with missing data

Decision Tree: Can provide literal flow charts, sensitive to outliers

Ensemble: Less prone to overfitting

Boosted: More parameters to tweak, more prone to overfitting than normal ensembles