In [None]:
import pandas as pd
from sklearn import dummy
import numpy as np

#### Pandas dataframe .copy demo
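When we use .copy() to duplicate a dataframe, we make sure that later changes do not touch the source data frame. A plain slice such as df[0:1] can be a view on the original data, so assigning to it may change the source data frame and trigger a SettingWithCopyWarning, as the second cell below shows. Calling .copy() on the slice gives an independent object, which is the safer default when you're throwing things together interactively.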

In [1]:
df = pd.DataFrame({'x': [1,2]})
print(df)

   x
0  1
1  2


In [2]:
df_sub = df[0:1]
df_sub.x = -1
print(df)

   x
0 -1
1  2


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self[name] = value


In [3]:
df = pd.DataFrame({'x': [1,2]})
df_sub_copy = df[0:1].copy()
df_sub_copy.x = -1
print(df)

   x
0  1
1  2
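
The warning above suggests .loc for assignments. As a minimal sketch (the values -1 and 99 and the name df_sub_copy are only illustrative), the cell below separates the two intents: edit the source on purpose through .loc, or copy a subset first and edit the copy.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2]})

# Intentional edit of the source: index the original frame directly with .loc
df.loc[0, 'x'] = -1

# Independent subset: copy the selected rows, then modify the copy freely
df_sub_copy = df.loc[[1]].copy()
df_sub_copy['x'] = 99

print(df)           # row 0 changed on purpose, row 1 untouched
print(df_sub_copy)  # only the copy holds 99
```

Because the second assignment goes to an explicit copy, the source frame keeps its value in row 1.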


#### Understanding predict_proba() output
Assume, for example, that your target takes values in the set {0, 1}. The classifier then outputs a probability matrix of shape (N, 2), with one row per sample. In each row, the first value is the probability that the sample belongs to class 0, and the second value is the probability that it belongs to class 1. The two values in a row sum to 1.

To get the probability of class 1 for every sample, select the second column:
```python
probability = model.predict_proba(X)[:,1]
```
If you have k classes, the output has shape (N, k), and you have to select the column of the class whose probability you want. The columns follow the order of the labels in model.classes_.
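
As a rough sketch of the k-class case, the cell below fits a LogisticRegression (a stand-in model chosen for illustration) on synthetic data with three classes and looks up the column for one label through model.classes_. The data, the seed, and the names proba, col and p_class_2 are assumptions made for this example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))   # 30 synthetic samples, 4 features
y = np.arange(30) % 3          # labels 0, 1, 2, each guaranteed to appear

model = LogisticRegression(max_iter=1000).fit(X, y)
proba = model.predict_proba(X)  # one row per sample, one column per class

# Columns follow the order of model.classes_, so look the column up by label
col = list(model.classes_).index(2)
p_class_2 = proba[:, col]

print(proba.shape)
print(np.allclose(proba.sum(axis=1), 1.0))  # each row sums to 1
```

Looking the column up by label avoids hard-coding an index, which matters when the labels are not simply 0, 1, 2.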

In [4]:
df = pd.read_csv('./datasets/flight_delays.csv')

# strategy='stratified' makes predict_proba() return one-hot rows sampled from the class frequencies
model_binary = dummy.DummyClassifier(strategy='stratified')
model_multi = dummy.DummyClassifier(strategy='stratified')

# Below we fit 2 models on random data: a binary (2 classes) classifier and a multi-class (3 classes) classifier
# The targets are passed as 1-D arrays of length 20
model_binary.fit(np.random.randint(2, size=(20, 4)), np.random.randint(2, size=20))
model_multi.fit(np.random.randint(2, size=(20, 4)), np.random.randint(3, size=20))

# Compare the predict_proba() row for the first sample of each model: its length equals the number of classes
print(model_binary.predict_proba(np.random.randint(2, size=(20, 4)))[0])
print(model_multi.predict_proba(np.random.randint(2, size=(20, 4)))[0])

[0 1]
[1 0 0]


#### Understanding enumerate()

In [5]:
choices = ['pizza', 'pasta', 'salad', 'nachos']
list(enumerate(choices))

[(0, 'pizza'), (1, 'pasta'), (2, 'salad'), (3, 'nachos')]
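
enumerate() is most often used directly in a loop or comprehension, where each (index, item) pair is unpacked into two names. A small sketch, with start=1 so the counter begins at 1 instead of 0 (the name menu is only illustrative):

```python
choices = ['pizza', 'pasta', 'salad', 'nachos']

# Unpack each (counter, item) pair; start=1 shifts the counter to begin at 1
menu = [f"{number}. {choice}" for number, choice in enumerate(choices, start=1)]

for line in menu:
    print(line)
```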