### Fitting Logistic Regression

In this first notebook, you will fit a logistic regression model that predicts whether a transaction is fraudulent.

To get started, let's import the libraries and take a quick look at the dataset.

In [1]:
import numpy as np
import pandas as pd
import statsmodels.api as sm


df = pd.read_csv('./fraud_dataset.csv')
df.head()


Unnamed: 0,transaction_id,duration,day,fraud
0,28891,21.3026,weekend,False
1,61629,22.932765,weekend,False
2,53707,32.694992,weekday,False
3,47812,32.784252,weekend,False
4,43455,17.756828,weekend,False


`1.` As you can see, two columns need to be converted to dummy variables.  Replace each of them with its dummy version, using 1 for `weekday` and `True`, and 0 otherwise.  Use the first quiz to answer a few questions about the dataset.

In [None]:
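# get_dummies returns its columns in sorted order: ('weekday', 'weekend') and (False, True).
# dtype=int keeps the dummies as 0/1 integers rather than booleans.
df[['weekday', 'No_weekday']] = pd.get_dummies(df['day'], dtype=int)
df[['No_fraud', 'fraud']] = pd.get_dummies(df['fraud'], dtype=int)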
df.head()
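
The positional assignment above relies on `pd.get_dummies` returning its columns in sorted order of the category values, so `weekday` comes before `weekend` and `False` before `True`. The following is a minimal sketch on a made-up three-row frame (the `toy` data is hypothetical and not taken from `fraud_dataset.csv`) that shows this ordering and a way to pick the 0/1 column by label instead of by position:

```python
import pandas as pd

# Hypothetical toy rows, used only to illustrate the column order.
toy = pd.DataFrame({
    'day': ['weekend', 'weekday', 'weekend'],
    'fraud': [False, True, False],
})

day_dummies = pd.get_dummies(toy['day'], dtype=int)

# Columns come back in sorted order of the category values.
day_columns = list(day_dummies.columns)

# Selecting by label avoids depending on that order.
toy['weekday'] = day_dummies['weekday']

# For a boolean column, astype(int) gives the same 0/1 coding directly.
toy['fraud'] = toy['fraud'].astype(int)
```

Selecting by label is slightly more verbose, but it keeps working if a new category is added to the column.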

In [None]:
df.describe()

In [None]:
print(df[df['fraud']==1]['duration'].mean())
print(df[df['fraud']==0]['duration'].mean())
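
The two filtered means above can also be computed in one step with a group-by. This is a small sketch using made-up durations rather than the real dataset; it also shows that the mean of a 0/1 column is the proportion of ones, which is how the `mean` row of `df.describe()` can be read for the dummy columns:

```python
import pandas as pd

# Hypothetical values, chosen so the means are easy to check by hand.
toy = pd.DataFrame({
    'fraud': [0, 0, 1, 1],
    'duration': [30.0, 34.0, 4.0, 6.0],
})

# Average duration for each value of the fraud dummy.
mean_duration = toy.groupby('fraud')['duration'].mean()

# The mean of a 0/1 column is the share of rows equal to 1.
fraud_share = toy['fraud'].mean()
```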

`2.` Now that you have dummy variables, fit a logistic regression model that predicts whether a transaction is fraud using both day and duration.  Don't forget an intercept!  Use the second quiz below to confirm that you fit the model correctly.

In [None]:
df['intercept'] = 1
logistic_model = sm.Logit(df['fraud'], df[['intercept','weekday','duration']])
res = logistic_model.fit()
res.summary()
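
For reference, the model fit above can be written out as follows. The symbols $\beta_0$, $\beta_1$, $\beta_2$ are notation introduced here for the intercept, `weekday`, and `duration` coefficients, and $p$ is the probability that a transaction is fraud:

$$
\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 \cdot \text{weekday} + \beta_2 \cdot \text{duration}
$$

The left-hand side is the log-odds of fraud, so exponentiating a coefficient gives the multiplicative change in the odds for a one-unit increase in that variable, holding the other variable constant.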

In [None]:
# Coefficients in the order of the design matrix: intercept, weekday, duration
a, b, c = res.params
a, b, c

In [None]:
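# Weekday vs. not weekday, holding duration fixed.
# np.exp of the linear predictor gives the odds of fraud, not the probability,
# so this ratio is the odds ratio for weekday and equals np.exp(b).
odds_weekday = np.exp(a + b*1 + c*0)
odds_not_weekday = np.exp(a + b*0 + c*0)
odds_weekday / odds_not_weekday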
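
The ratio in the cell above simplifies to `np.exp(b)`, because the intercept and duration terms cancel. The sketch below checks this with hypothetical coefficient values (they are not the fitted values from `res.params`) and shows that the result does not depend on the duration that is held fixed:

```python
import math

# Hypothetical coefficients, chosen only for illustration.
a, b, c = -1.0, 0.5, -0.2

# Odds of fraud on a weekday vs. not a weekday, with duration fixed at 0.
odds_weekday = math.exp(a + b * 1 + c * 0)
odds_not_weekday = math.exp(a + b * 0 + c * 0)
ratio = odds_weekday / odds_not_weekday

# Same comparison with duration fixed at 10 instead.
ratio_at_10 = math.exp(a + b * 1 + c * 10) / math.exp(a + b * 0 + c * 10)
```

Both `ratio` and `ratio_at_10` equal `math.exp(b)` up to floating-point error, which is why the exponentiated coefficient can be read directly as an odds ratio.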

In [None]:
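# Duration: one unit less, holding day fixed.
# The ratio of the odds at duration 1 to the odds at duration 2 equals np.exp(-c).
odds_1min = np.exp(a + b*0 + c*1)
odds_2min = np.exp(a + b*0 + c*2)
odds_1min / odds_2min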
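
The exponentiated quantities in the cells above are odds rather than probabilities. This short sketch, again with hypothetical coefficients rather than the fitted ones, shows that the duration ratio equals `exp(-c)` and how odds can be converted to a probability. The helper `odds_to_probability` is a name introduced here for illustration:

```python
import math

def odds_to_probability(odds):
    """Convert odds p / (1 - p) back to the probability p."""
    return odds / (1 + odds)

# Hypothetical coefficients, chosen only for illustration.
a, b, c = -1.0, 0.5, -0.2

# Odds at duration 1 relative to odds at duration 2, day held fixed.
duration_ratio = math.exp(a + c * 1) / math.exp(a + c * 2)

# Probability of fraud for a weekday transaction with duration 2.
log_odds = a + b * 1 + c * 2
prob = odds_to_probability(math.exp(log_odds))

# The same probability from the logistic (sigmoid) function.
prob_sigmoid = 1 / (1 + math.exp(-log_odds))
```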