## Sidenotes (definitions, code snippets, resources, etc.)
- Note on data structure: list
    - An empty list has a truth value of `False`, so `if some_list:` only runs when the list has at least one item
- [Feature Selection with scikit-learn for intro_to_ml](http://napitupulu-jon.appspot.com/posts/feature-selection-ud120.html)
    - Looks very helpful for copying notes, course materials
    - Investigate meaning of `# %%writefile new_enron_feature.py` inserted at top of edited studentMain.py module
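
The note on list truthiness at the top of this section can be checked directly. This is a minimal sketch; `to_emails` and `status` are placeholder names used only for illustration.

```python
# Hypothetical empty recipient list, for illustration only.
to_emails = []

# An empty list is falsy, so the else branch runs here.
if to_emails:
    status = "has recipients"
else:
    status = "no recipients"

print(bool(to_emails), status)
```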

### LaTeX
To display LaTeX equations from Python in a notebook:
```python
from IPython.display import display, Math, Latex
display(Math(r'F(k) = \int_{-\infty}^{\infty} f(x) e^{2\pi i k} dx'))
```


### ML Order of Operations
![order of operations](lesson_13_images/ml_order_of_operations.png)

### Python 3 change
- From Python 3.3, hash randomization is on by default, so dict keys can be iterated in a different order on each run of the interpreter (this can alter GridSearchCV's output). From Python 3.7, dicts preserve insertion order.
    - See note with validation mini-project for info on converting code from 2.7 to 3.3.
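
One way to avoid depending on dict iteration order is to iterate over sorted keys. This is a minimal sketch, not the course's approach; `param_grid` and its contents are made up for illustration.

```python
# Hypothetical parameter dictionary, for illustration only.
param_grid = {"kernel": ["linear", "rbf"], "C": [1, 10], "gamma": [0.1, 0.01]}

# sorted() returns the keys in the same order on every run,
# whatever order the dict itself would iterate in.
ordered_keys = sorted(param_grid)
for key in ordered_keys:
    print(key, param_grid[key])
```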


### Python 2
```python
### there can be many "to" emails, but only one "from", so the
### "to" processing needs to be a little more complicated
# uses counter for iterating through, duplicates process for cc_emails
#   does not seem very pythonic, but maybe clearest method
if to_emails:
    ctr = 0  # counter for iterating through, perhaps not pythonic
    while not to_poi and ctr < len(to_emails):
        if to_emails[ctr] in poi_email_list:
            to_poi = True
        ctr += 1
```
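
The counter loop above can probably be written more idiomatically with the built-in `any()`, which also stops at the first match. This is a sketch under assumptions: the variable names follow the snippet above, but the sample addresses are invented.

```python
# Hypothetical sample data, not taken from the course dataset.
poi_email_list = ["poi.one@example.com", "poi.two@example.com"]
to_emails = ["someone@example.com", "poi.two@example.com"]

# any() short-circuits on the first address found in poi_email_list.
# For an empty to_emails it returns False, so no separate "if to_emails:" is needed.
to_poi = any(email in poi_email_list for email in to_emails)
print(to_poi)
```

The same expression would work for `cc_emails`, which avoids duplicating the loop.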

### Useful git code snippets
- `git reset --soft HEAD~`
    - Undoes the last commit but leaves the working tree and index untouched, so the committed changes stay staged

# Evaluation Metrics
## Accuracy
__formula:__ 

$\text{accuracy} = \frac{\text{no. of data points labeled correctly}}{ \text{all data points}}$

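Shortcomings of accuracy as a measurement:
- Not good for skewed classes (i.e. most of the data under one label)
    - because the minority label contributes only a small number of data points, a classifier can mislabel most of them and still score high accuracy, so the measurement is not trustworthy
- Not suited to particular labeling requirements, e.g. when you need to err on the side of one label over the other.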
    - different performance metrics can focus on different types of errors (false positives, false negatives).
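
The skewed-class problem can be illustrated with a small sketch. The labels below are made up, and the "classifier" simply predicts the majority label every time.

```python
# Made-up skewed labels: 95 points of class 0 and 5 points of class 1.
true_labels = [0] * 95 + [1] * 5

# A trivial "classifier" that always predicts the majority class.
predictions = [0] * len(true_labels)

# accuracy = correctly labeled points / all points
correct = sum(1 for t, p in zip(true_labels, predictions) if t == p)
accuracy = correct / len(true_labels)
print(accuracy)  # 0.95, although no class-1 point was found
```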
    
## Confusion Matrices
Example of Confusion Matrix analysis with Decision Tree:
![confusion matrix example](lesson_14_images/confusion_matrix_example.png)

__formulas:__ 
- $\text{Recall(x)} = \frac{\text{data points correctly labeled as x}}{ \text{total data points actually x}} = \frac{\text{true positives}}{ \text{false negatives + true positives}}$
    - i.e. the probability that a data point that is actually x gets labeled x
    - low recall means the alg is often saying it's not x, when in fact it is (false negatives)
- $\text{Precision(x)} = \frac{\text{data points correctly labeled as x}}{ \text{total data points labeled x}} = \frac{\text{true positives}}{ \text{false positives + true positives}}$
    - i.e. the probability that a data point the alg labels x is actually x
    - low precision means the alg is often saying it is x, when in fact it's not (false positives)
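
The two formulas can be sketched in plain Python. The function name `precision_recall` and the labels are hypothetical, and the sketch assumes the label is predicted and present at least once (otherwise the divisions fail).

```python
def precision_recall(true_labels, predictions, label):
    """Return (precision, recall), treating `label` as the positive class."""
    pairs = list(zip(true_labels, predictions))
    true_pos = sum(1 for t, p in pairs if t == label and p == label)
    false_pos = sum(1 for t, p in pairs if t != label and p == label)
    false_neg = sum(1 for t, p in pairs if t == label and p != label)
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall


# Made-up labels for illustration only.
true_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predictions = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

precision, recall = precision_recall(true_labels, predictions, label=1)
print(precision, recall)
```

Here there are 2 true positives, 1 false positive and 2 false negatives, so precision is 2/3 and recall is 2/4.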

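- To get the names of the variables in the current scope:
```python
# list() takes a snapshot first; at module level the loop variable is itself
# added to vars(), which would otherwise change the dict during iteration
for name in list(vars().keys()):
    print(name)
```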
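- To get the values:
```python
# list() takes a snapshot first, so adding the loop variable to vars()
# does not change the dict while it is being iterated
for value in list(vars().values()):
    print(value)
```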