In [1]:
import warnings
warnings.filterwarnings("ignore")

In this notebook we'll go over what logistic regression is and how to implement it in Python with [RAPIDS](https://medium.com/future-vision/what-is-rapids-ai-7e552d80a1d2?source=friends_link&sk=64b79c363beeffb9923e16482f3977cc) cuML.

This Notebook can be run with a free GPU at [app.blazingsql.com](http://bit.ly/intro_ds_notebooks): `git clone https://github.com/Dropout-Analytics/cuml_logistic_regression`

# Beginner's Guide to Logistic Regression with cuML

Logistic regression is a model for predicting the probability of an event, given some other measurements. It is used when the dependent variable ("target") is categorical.

For example,
- Will the team win (1) or lose (0) this game?
- Are users going to stop using our app (1) or not (0)?

Logistic regression can also be used in non-binary situations, but let's cover that in a later post and stick to binary logistic regression for now.
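
To make "predicting the probability of an event" concrete, here is a minimal sketch of the calculation a binary logistic regression performs for one sample: a weighted sum of the measurements, passed through the logistic (sigmoid) function. The function names, weights, bias, and sample values are hypothetical, chosen only for illustration; they are not fitted to the dog/horse data.

```python
import math

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(features, weights, bias):
    """Probability of the positive class (1) for a single sample."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

# hypothetical weights and bias, chosen only for illustration
weights = [0.8, -0.4]
bias = -0.2

p = predict_probability([1.5, 0.5], weights, bias)
print(f"probability of class 1: {p:.3f}")
```

A weighted sum of exactly 0 maps to a probability of 0.5; larger sums push the probability toward 1 and smaller ones toward 0.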

![Logistic Regression gif (University of Toronto)](https://cdn-images-1.medium.com/max/800/0*JgBI4I1QeTYQRj8j.gif)

[Read more on Medium](https://medium.com/dropout-analytics/beginners-guide-to-logistic-regression-with-cuml-5061086d8694?source=friends_link&sk=2d8d0f7ddd43ccaaf264afcbadeea231)

In [None]:
import cudf

df = cudf.read_csv('https://raw.githubusercontent.com/gumdropsteve/datasets/master/dog_or_horse.csv')

In [None]:
df

## EDA - What's the data look like?
Before jumping in, let's explore our dataset. By converting the cuDF DataFrame with `.to_pandas()`, we can use Matplotlib to visualize how the two animals overlap in height and weight.

In [None]:
import matplotlib.pyplot as plt

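# convert to pandas for Matplotlib
# (assumes `height` and `weight` columns, and `type` values 'dog' / 'horse')
pdf = df.to_pandas()
dogs, horses = pdf[pdf['type'] == 'dog'], pdf[pdf['type'] == 'horse']

# scatter dogs
plt.scatter(dogs['height'], dogs['weight'], color='purple', alpha=0.5, label='dog')

# scatter horses
plt.scatter(horses['height'], horses['weight'], color='teal', alpha=0.5, label='horse')

# add plot details
plt.xlabel('height')
plt.ylabel('weight')
plt.legend()
plt.show()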


#### Height Histogram
First, let's plot just the heights. A histogram also shows how the heights of our samples are distributed for each animal.

In [None]:
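# convert to pandas for Matplotlib (assumes a `height` column and `type` values 'dog' / 'horse')
pdf = df.to_pandas()
# histogram dog heights in purple
plt.hist(pdf[pdf['type'] == 'dog']['height'], color='purple', alpha=0.5, label='dog')

# histogram horse heights in teal
plt.hist(pdf[pdf['type'] == 'horse']['height'], color='teal', alpha=0.5, label='horse')

# add plot details
plt.xlabel('height')
plt.legend()
plt.show()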


#### Weight Scatter Plot
And now let's do something similar for weight. Since the distributions looked pretty uneven, let's focus in on their overlap when it comes to weight. We can do this by scattering the weights on the x-axis with a common y value of 0.

In [None]:
import numpy as np

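# convert to pandas for Matplotlib (assumes a `weight` column and `type` values 'dog' / 'horse')
pdf = df.to_pandas()
dog_weights = pdf[pdf['type'] == 'dog']['weight']
horse_weights = pdf[pdf['type'] == 'horse']['weight']
# scatter dog weights along y = 0
plt.scatter(dog_weights, np.zeros(len(dog_weights)), color='purple', alpha=0.5, label='dog')

# scatter horse weights along y = 0
plt.scatter(horse_weights, np.zeros(len(horse_weights)), color='teal', alpha=0.5, label='horse')

# add plot details
plt.xlabel('weight')
plt.legend()
plt.show()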


#### Data Prep
Using cuML's `train_test_split()` we can split our dataset into smaller training (`train`) and testing (`test`) datasets. This allows us to test our model with real data that it has never seen before. We'll drop the `type` column as the model will use `target` to differentiate between dogs and horses.
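
As a rough picture of what a train/test split does, the sketch below shuffles row indices and cuts them 80/20 in plain Python. It is an illustration only, not cuML's implementation: `holdout_split`, its parameters, and the placeholder rows are hypothetical names and data made up for this example.

```python
import random

def holdout_split(rows, train_size=0.8, seed=0):
    """Shuffle row indices, then cut them into train and test partitions."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable
    cut = round(len(rows) * train_size)
    train = [rows[i] for i in indices[:cut]]
    test = [rows[i] for i in indices[cut:]]
    return train, test

# 200 synthetic placeholder rows standing in for the real dataset
rows = [{"row_id": i} for i in range(200)]

train, test = holdout_split(rows)
print(len(train), len(test))
```

Shuffling before cutting matters when the file is sorted by class; without it, the test set could end up containing only one animal.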

In [None]:
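from cuml.model_selection import train_test_split
# 200 rows (160 train / 40 test); after dropping `type`: 2 feature columns and 1 `target` column
X_train, X_test, y_train, y_test = train_test_split(df.drop(columns=['type', 'target']), df['target'], train_size=0.8)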


## Logistic Regression with cuML

![Dog rides her Horse](https://cdn-images-1.medium.com/max/800/0*ChQFw9yu7BD6Fz7g.gif)

`.fit()` the model to train it.
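
To give a feel for what training involves, here is a minimal sketch that fits a logistic regression by batch gradient descent in pure Python. It is an illustration under stated assumptions, not cuML's implementation: cuML's `LogisticRegression` runs on the GPU and uses a quasi-Newton solver, while the function `fit_logistic`, the tiny dataset, `learning_rate`, and `epochs` below are all made up for this example.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, learning_rate=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_samples, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for features, target in zip(X, y):
            z = sum(w * x for w, x in zip(weights, features)) + bias
            error = sigmoid(z) - target  # gradient of the log loss w.r.t. z
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        # step against the average gradient
        weights = [w - learning_rate * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias

# tiny synthetic, already-scaled dataset: class 1 has the larger feature values
X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.7, 0.8], [0.8, 0.7], [0.9, 0.9]]
y = [0, 0, 0, 1, 1, 1]

weights, bias = fit_logistic(X, y)
print("weights:", weights, "bias:", bias)
```

The learned weights come out positive and the bias negative, so the weighted sum crosses zero somewhere between the two groups of points.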

Make predictions.
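
Conceptually, a binary prediction is just the model's probability for class 1 compared against a cutoff. The sketch below shows that step in plain Python; `to_labels`, the 0.5 threshold, and the probabilities are hypothetical values for illustration, not output from the model in this notebook.

```python
def to_labels(probabilities, threshold=0.5):
    """Map each positive-class probability to a 0/1 label."""
    return [1 if p >= threshold else 0 for p in probabilities]

# hypothetical positive-class probabilities for five test rows
probabilities = [0.03, 0.48, 0.50, 0.71, 0.99]

labels = to_labels(probabilities)
print(labels)
```

With cuML, `predict()` returns class labels directly and `predict_proba()` returns the underlying probabilities, so you rarely need to threshold by hand unless you want a cutoff other than the default.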

#### How'd we do?
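
One simple way to answer that question is accuracy: the share of test rows where the predicted label matches the true one. The sketch below computes it, along with the four confusion-matrix counts, in plain Python. The label lists are hypothetical, made up for illustration, and the helper names are not part of cuML (which ships its own `accuracy_score` in `cuml.metrics`).

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            counts["tp"] += 1
        elif t == 0 and p == 0:
            counts["tn"] += 1
        elif t == 0 and p == 1:
            counts["fp"] += 1
        else:
            counts["fn"] += 1
    return counts

# hypothetical true labels and predictions for eight test rows
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print("accuracy:", accuracy(y_true, y_pred))
print("confusion:", confusion_counts(y_true, y_pred))
```

Accuracy alone can hide which class the model struggles with, which is why the confusion counts are worth a look as well.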

# Continued Learning
Here are some resources to help fill in any gaps and provide a more complete understanding of Logistic Regression.

### **Reading**

#### CSC 411: Lecture 04: Logistic Regression
- University of Toronto: [04_prob_classif_handout.pdf](https://www.cs.toronto.edu/~urtasun/courses/CSC411_Fall16/04_prob_classif_handout.pdf)
- by Richard Zemel, Raquel Urtasun and Sanja Fidler

#### Logistic Regression 
- Wikipedia: [wikipedia.org/wiki/Logistic_regression](https://wikipedia.org/wiki/Logistic_regression)

### **Videos**

#### StatQuest: Logistic Regression
- Watch on YouTube: [https://youtu.be/yIYKR4sgzI8](https://youtu.be/yIYKR4sgzI8)
- Channel: StatQuest with Josh Starmer ([Subscribe](https://www.youtube.com/channel/UCtYLUTtgS3k1Fg4y5tAhLbw?sub_confirmation=1))

In [None]:
from IPython.display import YouTubeVideo
YouTubeVideo('yIYKR4sgzI8', width=(1280*0.667), height=(720*0.667))

#### Laplace Transform: First Order Equation
- Watch on YouTube: [https://youtu.be/9RJml41PFnc](https://youtu.be/9RJml41PFnc)
- Channel: MIT OpenCourseWare ([Subscribe](https://www.youtube.com/channel/UCEBb1b_L6zDS3xTUrIALZOw?sub_confirmation=1))

In [None]:
from IPython.display import YouTubeVideo
YouTubeVideo('9RJml41PFnc', width=(1280*0.667), height=(720*0.667))

#### Lecture 6.1 — Logistic Regression | Classification — — [ Machine Learning | Andrew Ng]
- Watch on YouTube: [https://youtu.be/-la3q9d7AKQ](https://youtu.be/-la3q9d7AKQ)
- Channel: Artificial Intelligence - All in One ([Subscribe](https://www.youtube.com/channel/UC5zx8Owijmv-bbhAK6Z9apg?sub_confirmation=1))
  - **Note**: I'd recommend the [whole 6.x Lecture](https://www.youtube.com/playlist?list=PLNeKWBMsAzboR8vvhnlanxCNr2V7ITuxy) (6.1 - 6.7) if you want to understand the math behind logistic regression.

In [None]:
from IPython.display import YouTubeVideo
YouTubeVideo('-la3q9d7AKQ', width=(854), height=(480))