<a href="https://colab.research.google.com/github/jirvingphd/my_data_science_notes/blob/master/My_Flatiron_Bootcamp_Notes_Mod_3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<img src="https://www.dropbox.com/s/fchpltm5rnwd5ce/Flatiron%20Logo%202Wordmark.png?raw=1" width=100 >

# My Flatiron Bootcamp Notes  - Mod 3
- James M. Irving, Ph.D.
- james.irving.phd@gmail.com
- Repo:  https://github.com/jirvingphd/my_data_science_notes

- Previous Notebook
    - [My Flatiron Bootcamp Notes - Mod 1 & 2.ipynb](https://drive.google.com/file/d/1Kd4jD0pzpaN2bsR7EkrXsvUMRLwS5a3l/view?usp=sharing)
    


## Time Series 



### Time Series with Pandas
- Converting to datetime, setting index

```python
import pandas as pd
import numpy as np
# Assumes temp_data is a DataFrame with a 'Date' column of day/month/year strings

temp_data.Date = pd.to_datetime(temp_data.Date, format='%d/%m/%y')
temp_data.set_index('Date', inplace = True)

```
- Downsampling or upsampling time series

```python

# Downsampling (to larger time unit); aggregate each bin, e.g. with the mean:
temp_monthly= temp_data.resample('MS').mean() # MS = month start

# Upsampling (to smaller time unit, may cause NaN)
temp_bidaily= temp_data.resample('12h').asfreq() # '12H' in older pandas

# Fill in empty time indices:
temp_bidaily_fill= temp_data.resample('12h').ffill() # Forward fill
temp_bidaily_fill= temp_data.resample('12h').bfill() # Backward fill

```
- Slicing time series


```python
temp_1985_onwards = temp_data['1985':]
```
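
A minimal end-to-end sketch of the steps above, assuming a synthetic daily temperature series (the `demo` frame, its `Temp` column, and the date range are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical data: 4 years of daily "temperatures" stored with string dates
dates = pd.date_range("1984-01-01", "1987-12-31", freq="D")
demo = pd.DataFrame({
    "Date": dates.strftime("%d/%m/%y"),
    "Temp": np.linspace(10.0, 20.0, len(dates)),
})

# Convert the strings to datetimes and use them as the index
demo["Date"] = pd.to_datetime(demo["Date"], format="%d/%m/%y")
demo = demo.set_index("Date")

# Downsample: one mean value per month
demo_monthly = demo.resample("MS").mean()

# Upsample: two rows per day; the new 12:00 rows are NaN until filled
demo_12h = demo.resample("12h").asfreq()
demo_12h_filled = demo.resample("12h").ffill()

# Slice by partial date string (.loc makes the row selection explicit)
demo_1985_onwards = demo.loc["1985":]
```

Comparing `demo_12h` with `demo_12h_filled` shows exactly which rows the forward fill created.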

- Plotting time series: (ts=time series dataframe)
```python
import matplotlib.pyplot as plt

# Line plot (subplots=True draws one panel per column; False shares one axis)
ts.plot(subplots=True)
# Dot plot
ts.plot(style='.b')
# Histogram
ts.hist()
# KDE
ts.plot(kind='kde')
# Box & Whiskers
ts.boxplot()
# Heat maps
year_matrix = nyse_annual.T  # First must transpose.
plt.matshow(year_matrix, interpolation=None, aspect='auto', cmap=plt.cm.Spectral_r)
plt.show()
```

### Types of Time Series Trends

- Stationary vs Non-Stationary 

<img src="https://www.dropbox.com/s/utn0m1ry9raefx0/Mean_nonstationary.png?raw=1" width=400>

<img src="https://www.dropbox.com/s/d5o899hhus5ppxx/Var_nonstationary.png?raw=1" width=400>

- Trends can be:
    - Linear
    - Exponential
    - Periodic/seasonal
    - Trends with Increasing/Decreasing Variance
    
<img src="https://www.dropbox.com/s/pfpygr22gnrdz6m/trendseasonal.png?raw=1" width=500>

#### Trend detection: Rolling statistics:

- Moving average/variance calculations using `.rolling()`

```python
rolmean = ts.rolling(window = 8, center = False).mean()
rolstd = ts.rolling(window = 8, center = False).std()
fig = plt.figure(figsize=(12,7))
orig = plt.plot(ts, color='blue',label='Original')
mean = plt.plot(rolmean, color='red', label='Rolling Mean')
std = plt.plot(rolstd, color='black', label = 'Rolling Std')
```
<img src="https://www.dropbox.com/s/n6pfycjt0jntk1l/index_38_0.png?raw=1" width=400>

#### Trend detection: Dickey Fuller Test
- [adfuller from statsmodels.tsa.stattools](http://www.statsmodels.org/dev/generated/statsmodels.tsa.stattools.adfuller.html)
- The Dickey Fuller Test null hypothesis is that the series is NOT stationary, so a significant result means that it IS stationary. 

```python
from statsmodels.tsa.stattools import adfuller

dftest = adfuller(ts)

# Extract and display test results in a user friendly manner
dfoutput = pd.Series(dftest[0:4], index=['Test Statistic','p-value','#Lags Used','Number of Observations Used'])
for key,value in dftest[4].items():
    dfoutput['Critical Value (%s)'%key] = value
print(dfoutput)

```

#### def stationarity_check(): from lessons
```python
def stationarity_check(TS):
    
    # Import adfuller
    from statsmodels.tsa.stattools import adfuller
    
    # Calculate rolling statistics
    rolmean = TS.rolling(window = 8, center = False).mean()
    rolstd = TS.rolling(window = 8, center = False).std()
    
    # Perform the Dickey Fuller Test
    dftest = adfuller(TS['#Passengers']) # change the passengers column as required 
    
    #Plot rolling statistics:
    fig = plt.figure(figsize=(12,6))
    orig = plt.plot(TS, color='blue',label='Original')
    mean = plt.plot(rolmean, color='red', label='Rolling Mean')
    std = plt.plot(rolstd, color='black', label = 'Rolling Std')
    plt.legend(loc='best')
    plt.title('Rolling Mean & Standard Deviation')
    plt.show(block=False)
    
    # Print Dickey-Fuller test results
    print ('Results of Dickey-Fuller Test:')

    dfoutput = pd.Series(dftest[0:4], index=['Test Statistic','p-value','#Lags Used','Number of Observations Used'])
    for key,value in dftest[4].items():
        dfoutput['Critical Value (%s)'%key] = value
    print (dfoutput)
    
    return None
```
- [Article on testing for non-stationary](https://machinelearningmastery.com/time-series-data-stationary-python/)

#### Eliminating trends
There are several methods for eliminating different trends:

- **Taking the log transformation (or square root, cube root)**
    - `np.log(ts)` or `np.sqrt(ts)`
    - Makes the time series more "uniform" over time.
    - Higher values are penalized more than lower ones.
- **Subtracting the rolling mean**
    - Calculate the rolling mean (using `.rolling()`) and subtract it from the ts.

```python

rolmean = ts.rolling(window = 4).mean()
ts_diff = ts - rolmean

```
        
- **Weighted rolling mean.**
    - Pandas has an Exponentially Weighted Moving Average (`ts.ewm()`)
    - The halflife parameter determines the exponential decay. Other parameters, such as span and center of mass, can also be used to define the decay.
        - Discussed in [documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.ewm.html)
        
```python
# Use Pandas ewm() to calculate the exponentially weighted moving average of ts
exp_rolmean = ts.ewm(halflife = 2).mean()
data_minus_exp_rolmean = ts - exp_rolmean

```
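
To make the `halflife` parameter concrete: per the pandas documentation it corresponds to a smoothing factor $\alpha = 1 - \exp(-\ln 2 / \text{halflife})$. The sketch below checks that the two parameterizations agree, assuming a made-up series `demo` (not data from the lessons):

```python
import numpy as np
import pandas as pd

# Hypothetical series: an upward trend plus a little noise
rng = np.random.default_rng(0)
demo = pd.Series(np.arange(50, dtype=float) + rng.normal(size=50))

# Exponentially weighted mean defined by its half-life...
ewm_halflife = demo.ewm(halflife=2).mean()

# ...and the same thing expressed through the smoothing factor alpha
alpha = 1 - np.exp(-np.log(2) / 2)
ewm_alpha = demo.ewm(alpha=alpha).mean()

# Subtracting the weighted mean removes most of the trend
detrended = demo - ewm_halflife
```

A shorter half-life gives recent observations more weight, so the weighted mean tracks the series more closely.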

- **Differencing**
    - A common way of dealing with both trends and seasonality is differencing.
    - Take the difference between one instant and the previous instant (a 1-period lag, i.e. first-order differencing). The lag is the # of time periods between the two instants (`periods` in pandas); differencing the result again gives second-order, third-order, etc. differences.
    
```python
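data_diff = data.diff(periods=1)             # difference with the previous period
data_diff_yearly = data.diff(periods=365)    # seasonal difference for daily data (1-year lag)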
```
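
A small sketch of what the lag does, assuming a synthetic monthly series with a linear trend and a 12-month cycle (the series and its coefficients are invented for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical monthly series: linear trend + 12-month seasonal cycle
t = np.arange(48)
demo = pd.Series(2.0 * t + 10.0 * np.sin(2 * np.pi * t / 12))

# Lag-1 differencing removes the linear trend but keeps the seasonal pattern
diff_1 = demo.diff(periods=1)

# Lag-12 differencing removes the seasonal cycle; only the trend's
# 12-step increase (2.0 * 12 = 24) remains
diff_12 = demo.diff(periods=12).dropna()
```

Note that `.diff()` leaves `NaN` in the first `periods` rows, which usually need to be dropped before testing for stationarity.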

### Time Series Decomposition
- Turns a time series into multiple different time series. Most often in 3 parts:
    1. Seasonal 
    2. Trend
    3. Random (noise/irregular/remainder/residuals)
    
- Must pick between additive or multiplicative decomposition:
    - Must analyze the time series to help decide:
        - Does the magnitude of seasonality increase or decrease when the time series increases?
    - Statsmodels has seasonal_decompose function. 

```python
# import seasonal_decompose
from statsmodels.tsa.seasonal import seasonal_decompose
decomposition = seasonal_decompose(np.log(ts))

# Gather the trend, seasonality and noise of decomposed object
trend = decomposition.trend
seasonal = decomposition.seasonal
residual = decomposition.resid

# Plot the original series and each decomposed component
plt.plot(np.log(ts), label='Original', color="blue")
plt.plot(trend, label='Trend', color="red")
plt.plot(seasonal,label='Seasonality', color="green")
plt.plot(residual, label='Residuals', color="black")
plt.legend(loc='best')
```
<img src="https://www.dropbox.com/s/6dh8ogkytzjreky/index_4_0.png?raw=1" width=500>


- Article on [decomposing time series](https://machinelearningmastery.com/decompose-time-series-data-trend-seasonality/)


### Section 25 Recap
The key takeaways from this section include:
* When you import time series data into Pandas, make sure to use the time/date information as index values using either a Pandas Timestamp or Python DateTime data type
* There are a range of built in functions in Pandas for easily downsampling or upsampling time series data
* Line plots and dot plots can be useful for getting a sense of how a time series data set has changed over time
* Histograms and density plots can be useful for getting a sense of the time independent distribution of a time series data set
* Box and whisker plots per year (or other seasonality period - day, week, month, etc) can be a great way to easily see trends in the distribution of time series data over time
* Heat maps can also be useful for comparing changes of time series data across a couple of dimensions. For example, with months on one axis and years on another they can be a great way to see both seasonality and year on year trends
* A time series is said to be stationary if its statistical properties such as mean and variance remain constant over time
* Most time series models work on the assumption that the time series are stationary (assumption of homoscedasticity)
* Many time series data sets *do* have trends, violating the assumption of homoscedasticity
* Common examples of trends include linear (straight line over time), exponential and periodic. Some data sets also have increasing (or decreasing) variance over time
* Any given data set may exhibit multiple trends (e.g. linear, periodic and reduction in variance)
* Rolling statistics can be used to test for trends to see whether the centrality and/or dispersion of the data set changes over time
* The Dickey Fuller Test is a common test for determining whether a data set contains trends
* Common approaches for removing trends and seasonality include taking a log transform, subtracting the rolling mean and differencing
* Decomposing allows you to separately view seasonality (which could be daily, weekly, annual, etc), trend and "random" which is the variability in the data set after removing the effects of the seasonality and trend




## Time Series Models
- Note: for almost all models you need to make time series stationary first.
### White Noise Model
- The white noise model has three properties:
    - Fixed and constant mean
    - Fixed and constant variance
    - No correlation over time 

<img src="https://www.dropbox.com/s/jk1pf891qfs2l4d/index_10_0.png?raw=1" width=400>
    
- Special case is Gaussian White Noise
    - Constant mean = 0
    - Constant variance =1
- [Article on white noise series in python](https://machinelearningmastery.com/white-noise-time-series-python/)
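
The three properties can be checked numerically. A minimal sketch, assuming NumPy's random generator as the noise source (the seed and sample size are arbitrary choices):

```python
import numpy as np

# Gaussian white noise: independent draws from N(0, 1)
rng = np.random.default_rng(42)
noise = rng.normal(loc=0.0, scale=1.0, size=5000)

sample_mean = noise.mean()          # should be close to 0
sample_var = noise.var()            # should be close to 1
# Correlation between the series and itself shifted by one step
lag1_corr = np.corrcoef(noise[:-1], noise[1:])[0, 1]  # should be close to 0
```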

### Random walk model
- Very common in finance (e.g. exchange rates)
    - Tomorrow's rate is heavily influenced by today's
- Contrary to the white noise model, random walk has:
    - No specific mean or variance.
    - A strong dependence over time.

- The changes over time are basically a white noise model

$$Y_t = Y_{t-1} + \epsilon_t$$
where $\epsilon_t$ is a *mean zero* white noise model!

<img src="https://www.dropbox.com/s/cnlyxoos54ztlbx/index_12_0.png?raw=1" width=400>


#### Random Walk with a drift
- The drift (c) steers the model in a certain direction.
$$Y_t = c+ Y_{t-1} + \epsilon_t$$
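
Both equations can be simulated as cumulative sums. A minimal sketch, assuming a starting value of 0 and a made-up drift `c = 0.5`:

```python
import numpy as np

rng = np.random.default_rng(1)
eps = rng.normal(size=500)      # mean-zero white noise
c = 0.5                         # hypothetical drift value

# Y_t = Y_{t-1} + eps_t  ->  cumulative sum of the noise
random_walk = np.cumsum(eps)

# Y_t = c + Y_{t-1} + eps_t  ->  cumulative sum of (c + noise)
random_walk_drift = np.cumsum(c + eps)

# Differencing a random walk gives back the white noise steps
recovered = np.diff(random_walk)
```

This is why first-order differencing is the usual way to make a random walk stationary.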


### Correlation & Autocorrelation

- [Article: "A Gentle Introduction to Autocorrelation and Partial Autocorrelations"](https://machinelearningmastery.com/gentle-introduction-autocorrelation-partial-autocorrelation/)
#### Autocorrelation Function (ACF)
- Autocorrelation examines a time series against itself over increasing values of lag.
- Pandas has autocorrelation_plot

```python
pd.plotting.autocorrelation_plot(diet)
```

<img src="https://www.dropbox.com/s/e2uknvwydcijqnl/index_33_0%20%282%29.png?raw=1" width=500 >

- Same data, but after removing trends with differencing:

<img src="https://www.dropbox.com/s/88u9r2cvnqrob2n/index_37_1.png?raw=1" width=500>

- Can also plot with statsmodels:
```python
from statsmodels.graphics.tsaplots import plot_acf
plot_acf(diet, lags = 100);
```
<img src="https://www.dropbox.com/s/33llhqo96t8sh3j/index_45_0.png?raw=1" width=500>
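
To see what these plots compute, here is a hand-rolled sketch of the sample autocorrelation using only NumPy. The `acf` helper and the sine-wave input are illustrative assumptions, not the plotting functions' internals:

```python
import numpy as np

def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    denom = np.sum(x ** 2)
    return np.array([np.sum(x[lag:] * x[:len(x) - lag]) / denom
                     for lag in range(max_lag + 1)])

# Hypothetical series with a period of 12 steps
t = np.arange(240)
seasonal = np.sin(2 * np.pi * t / 12)
acf_values = acf(seasonal, max_lag=24)
```

For this periodic input the autocorrelation is strongly positive at lag 12 (one full cycle) and strongly negative at lag 6 (half a cycle).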

#### Partial Autocorrelation Function (PACF)

- Similar to ACF, but it controls for values at shorter lags (which ACF does not).
    - "Summary of the relationship between a time series element and observations at a lag, _with the relationships of intervening observations removed_."
    - Can be interpreted as a regression of the series against its PAST lags.
    - Can be used to help pick what order of AR model to use.

Plotted from statsmodels tsaplots:

```python
from statsmodels.graphics.tsaplots import plot_pacf
from matplotlib.pylab import rcParams
plot_pacf(diet, lags = 100);

```
<img src="https://www.dropbox.com/s/cr7p0o3prwnmqrs/index_42_0.png?raw=1" width=500>

### ARMA Models
- Combination of Autoregressive (AR) model and Moving Average (MA) model.
    - AR: $Y_t = \mu + \phi * Y_{t-1}+\epsilon_t$
    - MA: $Y_t = \mu +\epsilon_t + \theta * \epsilon_{t-1}$

#### The Autoregressive Model
- A value from a time series is regressed on previous values from the same time series.

$$ \text{Today = constant + slope} \times \text{yesterday + noise} $$

Or, mathematically:
$$Y_t = \mu + \phi * Y_{t-1}+\epsilon_t$$

$\phi$ is slope. 


- Notes on this formula:
    - If the slope is 0, the ts is a white noise model with mean $\mu$
    - If slope is not 0, the ts is autocorrelated.
    - Bigger slope means bigger autocorrelation
    - Negative slope =  time series follows oscillatory process. 
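
The notes above can be checked by simulation. A rough sketch, assuming standard normal noise and a starting value of 0 (the helper names `simulate_ar1` and `lag1_autocorr` are made up here):

```python
import numpy as np

def simulate_ar1(phi, mu=0.0, n=2000, seed=0):
    """Simulate Y_t = mu + phi * Y_{t-1} + eps_t with standard normal noise."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(size=n)
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = mu + phi * y[t - 1] + eps[t]
    return y

def lag1_autocorr(y):
    """Correlation between the series and itself shifted by one step."""
    return np.corrcoef(y[:-1], y[1:])[0, 1]

for phi in (0.0, 0.5, 0.9, -0.9):
    y = simulate_ar1(phi)
    print(f"phi={phi:+.1f}  lag-1 autocorrelation={lag1_autocorr(y):+.2f}")
```

For a stationary AR(1) process the lag-1 autocorrelation is approximately $\phi$ itself, so the printed values should sit close to each slope.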



##### AR Model Time Series (at varying $\phi$)
**AR time series:**

<img src="https://www.dropbox.com/s/k9mnam1wv4eltp2/AR_model.png?raw=1" width =500>

**AR series' ACF:**


<img src="https://www.dropbox.com/s/5ucfnsrlxjev7k8/AR_ACF.png?raw=1" width =500>
> The oscillatory process of the time series with $\phi=-0.9$ is clearly reflected in the autocorrelation function, returning an oscillatory autocorrelation function as well. $\phi=0.2$ leads to a very low, insignificant, autocorrelation. $\phi=0.8$ leads to a strong autocorrelation for the first few lags, and then incurs a steep decline. Having a $\phi=1.02$ (just slightly bigger than 1) leads to strong and long-lasting autocorrelation.


**AR series' PACF:**


<img src="https://www.dropbox.com/s/joazuyts1xmqhzh/AR_PACF.png?raw=1" width=500>


> For each of these PACFs, we notice a high value for 1 lag, then autocorrelations of 0, except for the second one. This is no big surprise, as the slope parameter is fairly small, so the relationship between a value and the next one is fairly limited.

#### The Moving Average Model
- The weighted sum of today's and yesterday's noise 

$$ \text{Today = Mean + Noise + Slope} \times \text{yesterday's noise} $$

Or, mathematically:
$$Y_t = \mu +\epsilon_t + \theta * \epsilon_{t-1}$$

- Some notes based on this formula:
    - If the slope is 0, the time series is a white noise model with mean $\mu$
    - If the slope is not 0, the time series is autocorrelated and depends on the previous white noise process
    - Bigger slope means bigger autocorrelation
    - When there is a negative slope, the time series follow an oscillatory process
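
A simulation sketch of the MA(1) formula, assuming standard normal noise (the helper names are invented for illustration). It compares the sample lag-1 autocorrelation with the standard theoretical value $\theta / (1 + \theta^2)$:

```python
import numpy as np

def simulate_ma1(theta, mu=0.0, n=5000, seed=0):
    """Simulate Y_t = mu + eps_t + theta * eps_{t-1} with standard normal noise."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(size=n + 1)
    return mu + eps[1:] + theta * eps[:-1]

def autocorr(y, lag):
    """Correlation between the series and itself shifted by `lag` steps."""
    return np.corrcoef(y[:-lag], y[lag:])[0, 1]

theta = 0.8
y = simulate_ma1(theta)
theoretical_lag1 = theta / (1 + theta ** 2)   # standard MA(1) result
lag1, lag2 = autocorr(y, 1), autocorr(y, 2)
```

`lag1` should land near `theoretical_lag1`, while `lag2` should be near zero because an MA(1) series only shares a noise term with its immediate neighbor.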

##### MA Model Time Series (at varying $\theta$)
**MA time series:**

<img src="https://www.dropbox.com/s/ic7uzmgtuhdoqu4/MA_model.png?raw=1"  width=500>

>When there is a positive $\theta$ there is a certain persistence in level, meaning that each observation is generally close to its neighbors. This is more pronounced for higher values of $\theta$. MA series with negative coefficients, however, show oscillatory patterns. Recall that when $\theta=0$, the process is a true White Noise Process!


**MA ACF:**

<img src="https://www.dropbox.com/s/fv7sryfxyazve82/MA_ACF.png?raw=1" width=500>

> MA processes have autocorrelations, but because of the structure of the MA formula (regressing it on the noise term of the previous observation) there is **only a dependence for one period, and the autocorrelation is zero for lags 2 and higher.**

> If $\theta >0$ the lag one autocorrelation is positive, if $\theta <0$ the lag one autocorrelation is negative.


**MA PACF:**

<img src="https://www.dropbox.com/s/fsijauyvae9hj2v/MA_PACF.png?raw=1" width=500>

> Typically a strong correlation with the 1-period lag (strength depending on $\theta$), and then the PACF gradually tails off.

### Higher Order AR(p) and MA(q) Models 
- First order:
    - AR: $Y_t = \mu + \phi * Y_{t-1}+\epsilon_t$
    - MA: $Y_t = \mu +\epsilon_t + \theta * \epsilon_{t-1}$
- Second order:
    - AR(2): $Y_t = \mu + \phi_1 * Y_{t-1}+\phi_2 * Y_{t-2}+\epsilon_t$
    - MA(2): $Y_t = \mu +\epsilon_t + \theta_1 * \epsilon_{t-1}+ \theta_2 * \epsilon_{t-2}$


- AR(p):
    - ACF for AR(p) would be strong until lag of p, then stagnant, then trail off. 
    - PACF for AR(p): Generally no correlation for lag values beyond p.
- MA(q):
    - ACF for MA(q) would show strong correlation up to a lag of q, then immediately decline to minimal/no correlation.
    - PACF would show a strong relationship to the lag and tail off to no correlation afterwards.
    
    
### ARMA Models:
- An ARMA model is a regression on past values (AR part), with the error term modeled as a linear combo of error terms in the recent past (MA part).
- Notation is generally ARMA(p,q)
    - Example: ARMA(2,1) model equation
     $$Y_t = \mu + \phi_1 Y_{t-1}+\phi_2 Y_{t-2}+ \theta \epsilon_{t-1}+\epsilon_t$$

| | AR(p)   |   MA(q)  | ARMA(p,q)|
|------|------|------|------|
|   ACF | Tails off   |  Cuts off after lag q |  Tails off   |
|   PACF | Cuts off after lag p  |   Tails off  |  Tails off  |


#### General process when modeling with a time series:

- Detrend your time series using differencing. ARMA models represent stationary processes, so we have to make sure there are no trends in our time series
- Look at ACF and PACF of the time series
- Decide on the AR, MA and order of these models
- Fit the model to get the correct parameters and use for prediction


[Additional Information on ARMA can be found here  in lessons 1 and 2.](https://newonlinecourses.science.psu.edu/stat510/node/41/)
    

### sARIMA Models [BOOKMARK]
- Integrated ARMA models. 

[BOOKMARK]

### Section 26: Key Takeaways

The key takeaways from this section include:
* A White Noise model has a fixed and constant mean and variance, and no correlation over time
* A Random Walk model has no specified mean or variance, but has a strong dependence over time
* The Pandas `corr()` function can be used to return the correlation between various time series data sets
* Autocorrelation allows us to identify how strongly each time series observation is related to previous observations
* The autocorrelation function (ACF) is a function that represents autocorrelation of a time series as a function of the time lag
* The Partial Autocorrelation Function (or PACF) gives the partial correlation of a time series with its own lagged values, controlling for the values of the time series at all shorter lags
* ARMA (AutoRegressive and Moving Average) modeling is a tool for forecasting time series values by regressing the variable on its own lagged (past) values
* ARMA models assume that you've already detrended your data and that there is no seasonality
* ARIMA (Integrated ARMA) models allow for detrending as part of the modeling process and work well for data sets with trends but no seasonality
* SARIMA (Seasonal ARIMA) models allow for both detrending and seasonality as part of the modeling process
* Facebook Prophet enables data analysts and developers alike to perform forecasting at scale in Python
* Prophet uses Additive Synthesis for time series forecasting


## Distance Metrics & k-Nearest Neighbors
### Distance Metrics:
Distance helps us quantify similarity.
Distance can be measured in different metrics.
1. Manhattan Distance
    - Movement by X/Y blocks.
    - $d(x,y) =  \sum_{k=1}^n |x_k - y_k|$
        
    <img src="https://www.dropbox.com/s/0q217qlbc9xtb7t/manhattan-distance.png?raw=1" width=200>
2. Euclidean/Pythagorean Distance
    - Straight line (as the crow flies)
    - $d(x,y) = \sqrt{ \sum_{k=1}^n  (x_k - y_k)^2}$
    
    <img src="https://www.dropbox.com/s/h3ogtkukgp6pwin/euclidean-distance.png?raw=1" width=250>
    
3. Minkowski distance
    - Generalized distance metric across a _Normed Vector Space_. 
        - Meaning each point has been through the same function.  Can be any function as long as:
            - A zero vector (just a vector of zeros) will output length=0
            - Every other vector has positive length.
        - Both Manhattan and Euclidean are actually special cases of Minkowski
    - $d(p,q) = (\sum_{i=1}^n (|p_i - q_i|)^c)^{1/c}$
    
    
    
```python 
import numpy as np

# Example points (made up); each side length is the absolute difference along one dimension
a = np.array([1, 2, 3])
b = np.array([4, 0, 7])
side_lengths = np.abs(a - b)

# Manhattan Distance is the sum of all side lengths to the first power
manhattan_distance = np.sum(side_lengths**1)

# Euclidean Distance is the square root of the sum of all squared side lengths
euclidean_distance = np.sqrt(np.sum(side_lengths**2))

# Minkowski Distance with a value of 3 is the cube root of the sum of all cubed side lengths
minkowski_distance_3 = np.cbrt(np.sum(side_lengths**3))

# Minkowski Distance with a value of 5
mink_distance_5 = np.power(np.sum(side_lengths**5), 1./5)
```
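
Since Manhattan and Euclidean are special cases of Minkowski, one function covers all three. A minimal sketch, where the function name and the example points are assumptions for illustration:

```python
import numpy as np

def minkowski_distance(p, q, c):
    """Minkowski distance of order c between two equal-length points."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sum(np.abs(p - q) ** c) ** (1.0 / c)

p, q = (0, 0), (3, 4)                       # hypothetical points
manhattan = minkowski_distance(p, q, c=1)   # 3 + 4 = 7
euclidean = minkowski_distance(p, q, c=2)   # sqrt(9 + 16) = 5
```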


### K-Nearest Neighbors (KNN)
<img src="https://www.dropbox.com/s/77747858h369yzx/knn.gif?raw=1" width=500>

- **KNN is a supervised learning algorithm that can be used for both classification and regression.**
    - Distance-based: the smaller the distance between two points, the more similar they are considered.
        - Each column acts as a dimension.
        - Can use any of the distance metrics discussed
    - Since it's supervised, it must be given labeled training data.
    
- **Fitting**
    - KNN does very little during the fit step, just stores the data and labels.
- **Predicting**
    - For each point, KNN calculates the distances to _every single point_ in the training set.
    - It then finds the ```k``` closest neighbors, and examines their labels.
        - It's 'democratic', in that each of the nearest points submits a vote as to which group the new point should belong to.
        - The group with the largest # of votes wins.
- **Evaluating Model Performance**
    - Evaluation differs depending on whether KNN is used for a classification or a regression task.
    - For classification, need a test set of labeled data to compare its predictions against in order to calculate:
        - Precision
        - Recall
        - Accuracy
        - F1-Score
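
The fit/predict description above can be sketched from scratch for classification. This is an illustrative toy version with Euclidean distance and made-up data, not how a library such as scikit-learn implements it:

```python
from collections import Counter
import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    """Predict the label of x_new by majority vote among its k nearest neighbors."""
    X_train = np.asarray(X_train, dtype=float)
    # Euclidean distance from x_new to every training point
    distances = np.sqrt(np.sum((X_train - np.asarray(x_new, dtype=float)) ** 2, axis=1))
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Each neighbor "votes" with its label; the most common label wins
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training data: two small clusters
X_train = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]
y_train = ["A", "A", "A", "B", "B", "B"]

pred_near_a = knn_predict(X_train, y_train, [2, 2])
pred_near_b = knn_predict(X_train, y_train, [7, 9])
```

There is no separate fit step here, which mirrors the note above: "fitting" amounts to keeping `X_train` and `y_train` around for prediction time.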

#### Confusion Matrices - to Evaluate Classification
For example, using simple binary classification (0 or 1):
<img src="https://www.dropbox.com/s/1kt3vniy7h1vodw/rf-conf-matrix.png?raw=1" width=300>

    
- **Confusion Matrices tell us 4 things:**
    - True Positives (TP): The model predicted the person has the disease (1), and they actually have the disease (1).

    - True Negatives (TN): The model predicted the person is healthy (0), and they are actually healthy (0).

    - False Positives (FP): The model predicted the person has the disease (1), but they are actually healthy (0). 

    - False Negatives (FN): The model predicted the person is healthy (0), but they actually have the disease (1).

- **To construct a confusion matrix, we need:**
    - Predictions for each data point in the training or test set
    - Labels for the same data points in that set.
    
- To create a Confusion Matrix from scratch, we:
    1. Iterate through both lists and grab the label and the corresponding prediction at the same index.
        - Note that `enumerate` is great here, since it gives us both an item and the index of that item from a list.
    2. Use some control flow to determine if it's a TP, TN, FP, or FN.
    3. Store our results in a dictionary or 2-dimensional array. 
    4. Return our results once we've checked every prediction against its corresponding label. 
    
```python
def confusion_matrix(labels, predictions):
    conf_matrix = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for ind, label in enumerate(labels):
        pred = predictions[ind]
        if label == 1:
            # CASE: True Positive
            if label == pred:
                conf_matrix['TP'] += 1
            # CASE: False Negative 
            else:
                conf_matrix['FN'] += 1
        else:
            # CASE: True Negative
            if label == pred:
                conf_matrix['TN'] += 1
            # CASE: False Positive
            else:
                conf_matrix['FP'] += 1
    
    return conf_matrix
```

- **Confusion Matrices for Multi-Categorical Classifications:**
    - Diagonal represents true positives

<img src="https://www.dropbox.com/s/qgy3t90fyxztjni/cm2.png?raw=1" width=400>


#### Confusion Matrices with sklearn
- A nice positive of sklearn's implementation:
    - It automatically adjusts to the # of categories present in the labels.
    
```python
# Calculate confusion matrix
from sklearn.metrics import confusion_matrix
cf = confusion_matrix(example_labels, example_preds)

# Plot confusion matrix with matplotlib
import numpy as np
import itertools
import matplotlib.pyplot as plt
# Jupyter magic (notebook only; remove when running as a plain script)
%matplotlib inline

def show_cf(y_true, y_pred, class_names=None, model_name=None):
    cf = confusion_matrix(y_true, y_pred)
    plt.imshow(cf, cmap=plt.cm.Blues)
    
    if model_name:
        plt.title("Confusion Matrix: {}".format(model_name))
    else:
        plt.title("Confusion Matrix")
    plt.ylabel('True Label')
    plt.xlabel('Predicted Label')
    
    # Default to the sorted unique labels (assumes every class appears in y_true)
    if class_names is None:
        class_names = sorted(set(y_true))
    tick_marks = np.arange(len(class_names))
    plt.xticks(tick_marks, class_names)
    plt.yticks(tick_marks, class_names)
    
    thresh = cf.max() / 2.
    
    for i, j in itertools.product(range(cf.shape[0]), range(cf.shape[1])):
        plt.text(j, i, cf[i, j], horizontalalignment='center', color='white' if cf[i, j] > thresh else 'black')

    plt.colorbar()

show_cf(example_labels, example_preds)
```




### Evaluation Metrics [BOOKMARK]
- **Precision**
$$Precision = \frac{\text{Number of True Positives}}{\text{Number of Predicted Positives}}$$

- **Recall**
$$Recall = \frac{\text{Number of True Positives}}{\text{Number of Actual Total Positives}}$$ 

Precision and recall typically trade off against each other: for a given model, changes that raise recall (e.g. labeling more points as positive) tend to lower precision, and vice versa. The figures below illustrate this.
 
<img src="https://www.dropbox.com/s/p7yy1t34lx9k82j/Precisionrecall.png?raw=1" width=400>

<img src="https://www.dropbox.com/s/ij75yic63m32x5z/performance-comparisons.png?raw=1" width =400>


- **Accuracy**

$$Accuracy = \frac{\text{Number of True Positives + True Negatives}}{\text{Total Observations}}$$

- **F-1 Score**

$$\text{F1-Score} = 2 \times \frac{Precision \times Recall}{Precision + Recall}$$
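
The four formulas translate directly into code. A minimal sketch, assuming the TP/FP/TN/FN counts are already known (the counts below are invented, and the function does not guard against division by zero):

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute precision, recall, accuracy and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}

# Hypothetical counts: 100 observations in total
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```

With these counts precision is 40/50 and recall is 40/45, so the model misses fewer true positives than it raises false alarms.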


## Graph Theory

>A "Graph" in mathematical and computer science terms consists of "Nodes" or "Vertices". Nodes/Vertices may or may not be connected with one another. The connecting line between two nodes is called an "edge". 

- Linked Refs:
    - [Graph Theory Basics](https://www.geeksforgeeks.org/mathematics-graph-theory-basics-set-1/)
    - [A Gentle Intro to Graph Theory](https://medium.com/basecs/a-gentle-introduction-to-graph-theory-77969829ead8)

### Graph Components  and Characteristics

<img src="https://www.dropbox.com/s/s73t4jezg0wz6ml/Nodes%20and%20Edges.png?raw=1" width =250>

- **Basic Pieces of a Graph**
    - __Node / Vertex__: The entity of analysis which has a relationship. 
        Node is used in the network context, vertex is used in the graph theory context, but commonly interchanged.

    - __Link / Edge / Relationship__: The connections between the nodes.
        Link is used in the network context, edge is used in the graph theory context, and all words are used interchangeably with *relationship*.

    - __Attributes__: Both nodes and edges can store attributes, which contain additional data about that object.

    - __Weight__: A common *attribute* of edges, used to indicate *strength* or *value* of a relationship.

- **Terminology**
    - Adjacent Nodes:
        - Node v is adjacent to node u if and only if there exists an edge between u and v.

    - Path:
        - A path of length n from node u to node v is defined as a sequence of n+1 nodes.
        $$P(u,v)=(v_0,v_1,v_2,v_3,\ldots,v_n)$$

        
    - Degree of a node:
        - In undirected graph
            - A node's **degree** is the # of edges 'incident upon' the node (AKA connected to it).
        - In a directed graph,
            - a node's **Indegree** is the # of arriving edges to the node
            - **Outdegree** is the # of departing edges. 
    - Isolated Nodes
        - Have no connection (degree=0)
        - Isolated nodes cannot be found by _breadth first search_ (BFS)
        
<img src="https://www.dropbox.com/s/6ssw6smhwsntktw/deg.png?raw=1" width=500>    
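
The terminology above can be made concrete in plain Python. A small sketch, assuming a made-up directed graph stored as an adjacency dict (node names and edges are purely illustrative):

```python
from collections import deque

# Hypothetical directed graph as an adjacency dict: node -> nodes it points to
graph = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": [],          # isolated: no arriving or departing edges
}

# Outdegree: number of departing edges
outdegree = {node: len(targets) for node, targets in graph.items()}

# Indegree: number of arriving edges
indegree = {node: 0 for node in graph}
for targets in graph.values():
    for target in targets:
        indegree[target] += 1

isolated = [n for n in graph if indegree[n] == 0 and outdegree[n] == 0]

def bfs(graph, start):
    """Return the nodes reachable from start, in breadth-first order."""
    visited, queue = [start], deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.append(neighbor)
                queue.append(neighbor)
    return visited

reachable_from_a = bfs(graph, "A")
```

The isolated node `D` never appears in `reachable_from_a`, which illustrates why a breadth first search started elsewhere cannot find it.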

- **Graph Theory Summary**

<img src="https://www.dropbox.com/s/g5sxt0udv7wkbck/summary.png?raw=1" width=700>


|                | Absent     | Present  |
|----------------|------------|----------|
| __Weights__ | Unweighted | Weighted |
| __Directionality__ | Undirected | Directed |


### Graphs in Python with NetworkX
