# Logistic Regression Example

Logistic regression predicts a binary target variable, although variations (for example multinomial or one-vs-rest schemes) can be used to predict multiclass targets.

The logistic (sigmoid) function maps any real-valued input to a value between 0 and 1, which the model interprets as the probability of the positive class.
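
As a minimal sketch of the logistic function described above, the snippet below evaluates it on a few inputs. The name `sigmoid` and the sample values are illustrative and are not part of the original notebook.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real input to the open interval (0, 1)."""
    z = np.asarray(z, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative inputs: a negative value, zero, and a positive value
z = np.array([-5.0, 0.0, 5.0])
probabilities = sigmoid(z)
print(probabilities)
```

An input of 0 maps to exactly 0.5, negative inputs map below 0.5, and positive inputs map above it.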

Dataset source: https://archive.ics.uci.edu/ml/datasets/Occupancy+Detection+

Parameter C: Inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values specify stronger regularization.
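
One way to see the effect of C is to watch the size of the fitted coefficients. The sketch below uses synthetic data from `make_classification` as a stand-in (an assumption, not the occupancy dataset) and relies on scikit-learn's default L2 penalty. The names `X_demo`, `y_demo`, and `coef_norms` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data, NOT the occupancy dataset
X_demo, y_demo = make_classification(n_samples=500, n_features=5, random_state=0)

coef_norms = {}
for c in [1e-4, 1e-2, 1.0, 100.0]:
    # Smaller C means a stronger penalty on the coefficients
    model = LogisticRegression(C=c, max_iter=1000)
    model.fit(X_demo, y_demo)
    coef_norms[c] = float(np.linalg.norm(model.coef_))
    print("C = {:g}: coefficient norm = {:.4f}".format(c, coef_norms[c]))
```

With strong regularization (small C) the coefficients are pulled toward zero, so the printed norm should be much smaller for C = 1e-4 than for C = 100.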

In [13]:
import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

In [7]:
occupancy_test1 = pd.read_csv('occupancy_data/datatest.txt')

In [8]:
occupancy_test2 = pd.read_csv('occupancy_data/datatest2.txt')

In [23]:
occupancy_training = pd.read_csv('occupancy_data/datatraining.txt')

In [24]:
occupancy_test1.shape

(2665, 7)

In [25]:
occupancy_test2.shape

(9752, 7)

In [26]:
occupancy_training.shape

(8143, 7)

In [27]:
occupancy_training.head()

| | date | Temperature | Humidity | Light | CO2 | HumidityRatio | Occupancy |
|---|---|---|---|---|---|---|---|
| 1 | 2015-02-04 17:51:00 | 23.18 | 27.272 | 426.0 | 721.25 | 0.004793 | 1 |
| 2 | 2015-02-04 17:51:59 | 23.15 | 27.2675 | 429.5 | 714.0 | 0.004783 | 1 |
| 3 | 2015-02-04 17:53:00 | 23.15 | 27.245 | 426.0 | 713.5 | 0.004779 | 1 |
| 4 | 2015-02-04 17:54:00 | 23.15 | 27.2 | 426.0 | 708.25 | 0.004772 | 1 |
| 5 | 2015-02-04 17:55:00 | 23.1 | 27.2 | 426.0 | 704.5 | 0.004757 | 1 |


In [28]:
occupancy_training.describe()

| | Temperature | Humidity | Light | CO2 | HumidityRatio | Occupancy |
|---|---|---|---|---|---|---|
| count | 8143.0 | 8143.0 | 8143.0 | 8143.0 | 8143.0 | 8143.0 |
| mean | 20.619084 | 25.731507 | 119.519375 | 606.546243 | 0.003863 | 0.21233 |
| std | 1.016916 | 5.531211 | 194.755805 | 314.320877 | 0.000852 | 0.408982 |
| min | 19.0 | 16.745 | 0.0 | 412.75 | 0.002674 | 0.0 |
| 25% | 19.7 | 20.2 | 0.0 | 439.0 | 0.003078 | 0.0 |
| 50% | 20.39 | 26.2225 | 0.0 | 453.5 | 0.003801 | 0.0 |
| 75% | 21.39 | 30.533333 | 256.375 | 638.833333 | 0.004352 | 0.0 |
| max | 23.18 | 39.1175 | 1546.333333 | 2028.5 | 0.006476 | 1.0 |
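
The mean of `Occupancy` (0.21233) shows that the classes are imbalanced, so accuracy figures are easier to judge against a majority-class baseline. The sketch below rebuilds a stand-in label vector from the summary statistics only. The count of 1729 occupied rows is an assumption inferred from 0.21233 × 8143, and `y_stand_in` is a hypothetical name.

```python
import numpy as np

n_samples = 8143   # row count reported by occupancy_training.shape
n_occupied = 1729  # ASSUMPTION: inferred from mean 0.21233 * 8143, not read from the data

# Stand-in labels with the same class balance as the summary statistics
y_stand_in = np.concatenate([
    np.ones(n_occupied, dtype=int),
    np.zeros(n_samples - n_occupied, dtype=int),
])

occupied_fraction = y_stand_in.mean()
# Accuracy of always predicting the more frequent class
baseline_accuracy = max(occupied_fraction, 1 - occupied_fraction)
print("Occupied fraction: {:.5f}".format(occupied_fraction))
print("Majority-class baseline accuracy: {:.5f}".format(baseline_accuracy))
```

Under this assumption, a model that always predicts "unoccupied" would already score roughly 0.79. On the real data, `occupancy_training.Occupancy.value_counts(normalize=True)` gives the class fractions directly.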


In [49]:
y = occupancy_training.Occupancy
X = occupancy_training.drop(columns=['Occupancy', 'date'])

In [50]:
# Default split holds out 25% of the rows; random_state fixes the shuffle for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
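
When no `test_size` is given, `train_test_split` holds out 25% of the rows. The sketch below checks the resulting sizes on placeholder arrays with the same number of rows and feature columns as the training data. The arrays hold zeros rather than real measurements, and the `*_stand_in`, `X_tr`, and `X_te` names are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the same shape as X and y (8143 rows, 5 features)
X_stand_in = np.zeros((8143, 5))
y_stand_in = np.zeros(8143, dtype=int)

X_tr, X_te, y_tr, y_te = train_test_split(X_stand_in, y_stand_in, random_state=0)
print(X_tr.shape, X_te.shape)  # (6107, 5) (2036, 5)
```

Because the classes are imbalanced, passing `stratify=y` is an option worth considering to keep the occupied fraction the same in both parts.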

In [51]:
X_train.head()

| | Temperature | Humidity | Light | CO2 | HumidityRatio |
|---|---|---|---|---|---|
| 3706 | 19.7 | 19.39 | 0.0 | 452.666667 | 0.002744 |
| 5187 | 19.26 | 31.1 | 14.0 | 434.0 | 0.004294 |
| 759 | 20.89 | 23.1 | 0.0 | 448.5 | 0.003523 |
| 5111 | 19.2 | 31.39 | 0.0 | 436.333333 | 0.004318 |
| 4195 | 21.5 | 18.79 | 37.0 | 440.0 | 0.002973 |


In [56]:
# Sweep C from very strong regularization (1e-6) up to scikit-learn's default (1)
for c in [0.000001, 0.00001, 0.0001, 0.1, 1]:
    regressor = LogisticRegression(C=c)
    regressor.fit(X_train, y_train)
    print("Logistic Regression with C = {:.6f}".format(c))
    # score() returns the mean accuracy on the given data
    print("Training Accuracy: {}".format(regressor.score(X_train, y_train)))
    print("Test Accuracy: {}".format(regressor.score(X_test, y_test)))
    print("----")

Logistic Regression with C = 0.000001
Training Accuracy: 0.9210741771737351
Test Accuracy: 0.9263261296660118
----
Logistic Regression with C = 0.000010
Training Accuracy: 0.9575896512199116
Test Accuracy: 0.9587426326129665
----
Logistic Regression with C = 0.000100
Training Accuracy: 0.9772392336662846
Test Accuracy: 0.9744597249508841
----
Logistic Regression with C = 0.100000
Training Accuracy: 0.9883739970525627
Test Accuracy: 0.9882121807465619
----
Logistic Regression with C = 1.000000
Training Accuracy: 0.9883739970525627
Test Accuracy: 0.9882121807465619
----


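A higher value of C implies weaker regularization. In this run, test accuracy rises from about 0.926 at C = 0.000001 to about 0.988 at C = 0.1 and does not change between C = 0.1 and C = 1, which suggests that the strongest regularization settings underfit the data.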
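
The scores above come from a split of `datatraining.txt`; the two separately loaded test files are not scored in this notebook. A hedged sketch of how a fitted model could be scored on such a frame follows. The helper `score_on_frame`, the `FEATURES` list, and the synthetic frames are hypothetical stand-ins, not the original data or the author's method.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Feature columns, in the same order as the training features in this notebook
FEATURES = ['Temperature', 'Humidity', 'Light', 'CO2', 'HumidityRatio']

def score_on_frame(model, frame):
    """Return the accuracy of a fitted model on a frame with the occupancy columns."""
    return model.score(frame[FEATURES], frame['Occupancy'])

rng = np.random.default_rng(0)

def make_stand_in(n_rows):
    """Build a synthetic frame (NOT real sensor data) with the expected columns."""
    frame = pd.DataFrame(rng.normal(size=(n_rows, len(FEATURES))), columns=FEATURES)
    # Synthetic rule: label depends only on the Light column
    frame['Occupancy'] = (frame['Light'] > 0).astype(int)
    return frame

train_frame = make_stand_in(400)
holdout_frame = make_stand_in(200)

model = LogisticRegression(C=1.0)
model.fit(train_frame[FEATURES], train_frame['Occupancy'])
holdout_accuracy = score_on_frame(model, holdout_frame)
print("Held-out accuracy: {:.4f}".format(holdout_accuracy))
```

With the real data, the same helper could be called on `occupancy_test1` and `occupancy_test2` with the model fitted in the loop, which would give an estimate on rows that played no part in the train/test split.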