## Sidenotes (definitions, code snippets, resources, etc.)
- Note on data structure: list
    - empty list has a truth value of false
- [Feature Selection with scikit-learn for intro_to_ml](http://napitupulu-jon.appspot.com/posts/feature-selection-ud120.html)
    - Looks very helpful for copying notes, course materials
    - Investigate meaning of `# %%writefile new_enron_feature.py` inserted at top of edited studentMain.py module
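
The list note above can be checked directly. A minimal sketch (the helper name `describe` is made up for illustration):

```python
def describe(items):
    """Return 'empty' or 'non-empty' based on the list's truth value."""
    # An empty list is falsy, so `not items` is True only when the list has no elements.
    return "empty" if not items else "non-empty"

print(describe([]))    # empty
print(describe([0]))   # non-empty: a list holding a falsy item is still truthy
```

This is why `if my_list:` is the usual idiom for "the list has at least one element".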

### Latex
To use Python to display Latex equations etc.:
```python
from IPython.display import display, Math, Latex
display(Math(r'F(k) = \int_{-\infty}^{\infty} f(x) e^{2\pi i k} dx'))
```

# Evaluation Metrics
## Accuracy
__formula:__ 

$\text{accuracy} = \frac{\text{no. of data points labeled correctly}}{ \text{all data points}}$

Shortcoming of accuracy measurement:
- Not good for skewed classes (i.e. most of the data falls under one label)
    - because the minority class contributes very few data points, so a high accuracy says little about how well that class is predicted
- Not suited to problems where one kind of error matters more, i.e. where you need to err toward one label over the other.
    - different performance metrics can focus on different types of errors (false positives, false negatives).
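
To make the second shortcoming concrete, here is a small sketch with made-up labels (the two "classifiers" are just hand-written prediction lists, and the function names are illustrative). Both reach the same accuracy while making opposite kinds of errors:

```python
def accuracy(actual, predicted):
    """Fraction of points whose predicted label equals the actual label."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / float(len(actual))

def error_counts(actual, predicted, positive=1):
    """Return (false_positives, false_negatives) for the given positive label."""
    fp = sum(1 for a, p in zip(actual, predicted) if p == positive and a != positive)
    fn = sum(1 for a, p in zip(actual, predicted) if p != positive and a == positive)
    return fp, fn

# Made-up, skewed data: 10 points, only 2 of them positive (1).
actual     = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
cautious   = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # never predicts the positive class
aggressive = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # finds both positives, plus 2 false alarms

# cautious:   accuracy 0.8, (FP, FN) = (0, 2)
# aggressive: accuracy 0.8, (FP, FN) = (2, 0)
for name, pred in [("cautious", cautious), ("aggressive", aggressive)]:
    print(name, accuracy(actual, pred), error_counts(actual, pred))
```

Accuracy alone cannot tell these two apart, which is the motivation for looking at false positives and false negatives separately.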
    
## Confusion Matrices
Example of confusion matrix analysis with a decision tree:
![confusion matrix example](lesson_14_images/confusion_matrix_example.png)

__formulas:__ 
- $\text{accuracy} = \frac{\text{true positives + true negatives}}{ \text{all data points}}$

in contrast to:
- $\text{Recall(x)} = \frac{\text{data points correctly labeled as x}}{ \text{total data points actually x}} = \frac{\text{true positives}}{ \text{false negatives + true positives}}$
    - i.e. of all the data points that are actually x, how many are predicted as x?
    - this gives us greater confidence in negative predictions (high recall means few false negatives).
    - i.e. the probability that a data point that is actually x gets labeled as x
    - i.e. given a data point whose actual label is x, the chance that the alg assigns it label x
- $\text{Precision(x)} = \frac{\text{data points correctly labeled as x}}{ \text{total data points predicted x}} = \frac{\text{true positives}}{ \text{false positives + true positives}}$
    - i.e. of all the predicted positives, how many are true?
    - this gives us greater confidence in positive predictions (high precision means few false positives).
    - i.e. the probability that an assigned label x is correct
    - i.e. when the alg assigns label x, the chance that the data point is actually x
- $F_{1} = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$
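
The formulas above can be sketched in plain Python from the four confusion-matrix counts. The counts below are made up for illustration, the function names are not from any library, and $F_1$ is taken in its standard form as the harmonic mean of precision and recall, $2PR/(P+R)$. Returning 0.0 when a denominator is zero is a choice made for this sketch.

```python
def precision(tp, fp):
    """TP / (TP + FP); 0.0 here if nothing was predicted positive."""
    return tp / float(tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """TP / (TP + FN); 0.0 here if there are no actual positives."""
    return tp / float(tp + fn) if (tp + fn) else 0.0

def f1(p, r):
    """Harmonic mean of precision and recall; 0.0 here if both are zero."""
    return 2.0 * p * r / (p + r) if (p + r) else 0.0

# Made-up confusion-matrix counts for 100 data points.
tp, fp, fn, tn = 6, 2, 4, 88

acc = (tp + tn) / float(tp + fp + fn + tn)  # 94 / 100
p = precision(tp, fp)                       # 6 / 8
r = recall(tp, fn)                          # 6 / 10
f = f1(p, r)                                # 2 * 0.75 * 0.6 / 1.35

print(acc, p, r, f)
```

With these counts the accuracy is 0.94 while recall is only 0.6, which shows how a skewed class can hide missed positives. `sklearn.metrics` offers `precision_score`, `recall_score` and `f1_score` for computing the same quantities from label arrays.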


## Mini-project! Applying Metrics to Your POI Identifier
[not labeled mini-project in course]

Go back to your code from the last lesson, where you built a simple first iteration of a POI identifier using a decision tree and one feature. Copy the POI identifier that you built into the skeleton code in evaluation/evaluate_poi_identifier.py. Recall that at the end of that project, your identifier had an accuracy (on the test set) of 0.724. Not too bad, right? Let’s dig into your predictions a little more carefully.

From Python 3.3 forward, a change to the order in which dictionary keys are processed was made such that the orders are randomized each time the code is run. This will cause some compatibility problems with the graders and project code, which were run under Python 2.7. To correct for this, add the following argument to the featureFormat call on line 25 of evaluate_poi_identifier.py:

```python
sort_keys = '../tools/python2_lesson14_keys.pkl'
```

This will open up a file in the tools folder with the Python 2 key order.

```python
# Final model from L13
from evaluate_poi_identifier import *

### it's all yours from here forward!
# Note: sklearn.cross_validation was removed in scikit-learn 0.20;
# newer versions provide train_test_split in sklearn.model_selection.
from sklearn.cross_validation import train_test_split
features_train, features_test, labels_train, labels_test = \
    train_test_split(features, labels, test_size=0.3, random_state=42)

from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
clf = DecisionTreeClassifier()
clf.fit(features_train, labels_train)
pred = clf.predict(features_test)
print "Accuracy score: ", accuracy_score(labels_test, pred)
print "No. POIs predicted in test set: ", len([x for x in pred if x == 1])
# A true positive is a point that is actually a POI and is also predicted as one.
print "No. of true positives: ", (
    len([i for i, j in zip(labels_test, pred) if i == 1 and j == 1]))
```

```
Accuracy score:  0.724137931034
No. POIs predicted in test set:  4
No. of true positives:  0
```


As you may now see, having imbalanced classes like we have in the Enron dataset (many more non-POIs than POIs) introduces some special challenges, namely that you can just guess the more common class label for every point, not a very insightful strategy, and still get pretty good accuracy!
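
A minimal sketch of that "always guess the most common label" strategy, using made-up labels (18 negatives and 2 positives; these are not the Enron counts, and the function name is illustrative):

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Return (most common label, accuracy of predicting it for every point)."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / float(len(labels))

# Made-up imbalanced labels: 0.0 = non-POI, 1.0 = POI.
labels = [0.0] * 18 + [1.0] * 2

label, acc = majority_baseline_accuracy(labels)
print(label, acc)  # the baseline predicts 0.0 everywhere and is right 18 times out of 20
```

Any classifier evaluated by accuracy on such data should be compared against this baseline rather than against 50%.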

Precision and recall can help illuminate your performance better. Use the precision_score and recall_score available in sklearn.metrics to compute those quantities.

What’s the precision?

```python
from sklearn.metrics import precision_score, recall_score
print "Precision score: ", precision_score(labels_test, pred)
print "Recall score: ", recall_score(labels_test, pred)
```

```
Precision score:  0.0
Recall score:  0.0
```
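
Both zeros follow from the counts printed earlier. The rough reconstruction below assumes the test set holds 29 points. That size is inferred, not stated in these notes: it is the smallest one consistent with an accuracy of 0.724137931034 = 21/29.

```python
n_test = 29                  # assumption: inferred from accuracy = 21/29
accuracy = 0.724137931034    # reported above
predicted_pois = 4           # reported above
tp = 0                       # reported above (true positives)

correct = int(round(accuracy * n_test))  # correctly labeled points (TP + TN)
fp = predicted_pois - tp                 # predicted POI but actually non-POI
tn = correct - tp                        # correctly labeled non-POIs
fn = n_test - tp - fp - tn               # actual POIs that were missed

precision = tp / float(tp + fp)          # 0 / 4
recall = tp / float(tp + fn)             # 0 / 4
baseline = (tn + fp) / float(n_test)     # accuracy of labeling everyone non-POI

print(tp, fp, fn, tn)
print(precision, recall, baseline)
```

Under that assumption the confusion matrix is TP = 0, FP = 4, FN = 4, TN = 21, so every predicted POI was wrong and every actual POI was missed. Guessing "non-POI" for every point would have scored 25/29, about 0.862, which is higher than the tree's accuracy.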
