# Training a model using Logistic Regression

## Preprocessing

We start by importing the necessary libraries and loading the cleaned airline delay dataset.


```python
import pandas as pd

df = pd.read_csv('../data/airlines_delay_cleaned.csv')
df.head()
```


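| Unnamed: 0 | Time | Length | Airline | AirportFrom | AirportTo | DayOfWeek | Delayed |
|---|---|---|---|---|---|---|---|
| 0 | 1296.0 | 141.0 | DL | ATL | HOU | 1 | 0 |
| 1 | 360.0 | 146.0 | OO | COS | ORD | 4 | 0 |
| 2 | 1170.0 | 143.0 | B6 | BOS | CLT | 3 | 0 |
| 3 | 692.0 | 98.0 | FL | BMI | ATL | 4 | 0 |
| 4 | 580.0 | 60.0 | WN | MSY | BHM | 4 | 0 |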
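The `Unnamed: 0` column looks like a row index that was saved into the CSV. This is an assumption: it is what pandas produces when a DataFrame is written with `to_csv()` (which includes the index by default) and read back with `read_csv()`. Whether `preprocess_data` drops this column is not shown here. The sketch below reproduces the effect on a tiny hand-made frame and shows how `index_col=0` avoids it:

```python
import io

import pandas as pd

# A tiny frame standing in for the cleaned dataset (values from the first two rows above).
original = pd.DataFrame(
    {"Time": [1296.0, 360.0], "Airline": ["DL", "OO"], "Delayed": [0, 0]}
)

# to_csv() writes the index as an unnamed first column by default...
csv_text = original.to_csv()

# ...which read_csv() then exposes as a regular column called "Unnamed: 0".
with_extra = pd.read_csv(io.StringIO(csv_text))
print(list(with_extra.columns))  # ['Unnamed: 0', 'Time', 'Airline', 'Delayed']

# index_col=0 turns that first column back into the index instead of a feature.
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(restored.columns))  # ['Time', 'Airline', 'Delayed']
```

Keeping the saved index out of the feature set matters because it carries no information about delays, yet a model would otherwise treat it as a numerical input.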


Next, we apply the `preprocess_data` function from `preprocessing.py` to the data. It performs the following transformations:
- Separate the data into features and target
- Scale the numerical features and encode the categorical features
- Split the data into train and test sets
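The contents of `preprocessing.py` are not shown, so the following is only a hypothetical sketch of the three steps listed above using scikit-learn. The function name `preprocess_data_sketch`, the column groups, treating `DayOfWeek` as categorical, and `test_size=0.2` / `random_state=42` are all assumptions. One deliberate difference from the listed order: this sketch splits before fitting the scaler and encoder, so statistics from test rows do not leak into training.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

NUMERICAL = ["Time", "Length"]
# Assumption: DayOfWeek is treated as a category rather than a number.
CATEGORICAL = ["Airline", "AirportFrom", "AirportTo", "DayOfWeek"]


def preprocess_data_sketch(df, test_size=0.2, random_state=42):
    """Hypothetical stand-in for preprocess_data; not the code in preprocessing.py."""
    # 1. Separate features and target, dropping the saved index column if present.
    X = df.drop(columns=["Delayed", "Unnamed: 0"], errors="ignore")
    y = df["Delayed"]

    # 2. Split first so the transformers below only learn from training rows.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )

    # 3. Standardize numerical columns and one-hot encode categorical ones;
    #    handle_unknown="ignore" encodes categories unseen in training as all zeros.
    transformer = ColumnTransformer(
        [
            ("num", StandardScaler(), NUMERICAL),
            ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
        ]
    )
    X_train = transformer.fit_transform(X_train)
    X_test = transformer.transform(X_test)
    return X_train, X_test, y_train, y_test


# Usage on the five rows shown in the table above.
sample = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2, 3, 4],
    "Time": [1296.0, 360.0, 1170.0, 692.0, 580.0],
    "Length": [141.0, 146.0, 143.0, 98.0, 60.0],
    "Airline": ["DL", "OO", "B6", "FL", "WN"],
    "AirportFrom": ["ATL", "COS", "BOS", "BMI", "MSY"],
    "AirportTo": ["HOU", "ORD", "CLT", "ATL", "BHM"],
    "DayOfWeek": [1, 4, 3, 4, 4],
    "Delayed": [0, 0, 0, 0, 0],
})
X_train, X_test, y_train, y_test = preprocess_data_sketch(sample)
print(X_train.shape, X_test.shape)
```

With five rows and `test_size=0.2`, four rows go to training and one to testing; both matrices end up with the same number of columns because the test set is transformed with the encoder fitted on the training set.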

```python
from preprocessing import preprocess_data

X_train, X_test, y_train, y_test = preprocess_data(df)
```

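With the data preprocessed, we can train a `LogisticRegression` classifier. We set `max_iter=1000` (the default is 100) to give the solver more iterations to converge.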

```python
from sklearn.linear_model import LogisticRegression

# Initialize the logistic regression model with a higher iteration limit
lr = LogisticRegression(max_iter=1000)

# Fit the model to the training data
lr.fit(X_train, y_train)
```


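Under the hood, logistic regression scores each sample with a linear function $z = w^\top x + b$ and maps that score to a probability with the sigmoid $\sigma(z) = 1 / (1 + e^{-z})$. The sketch below checks this against scikit-learn on synthetic data (an assumption: `make_classification` stands in for the preprocessed airline features, which are not available here):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary data standing in for the preprocessed airline features.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Linear score z = w.x + b for every sample.
z = X @ model.coef_.ravel() + model.intercept_[0]

# The sigmoid turns the score into P(y = 1 | x).
p_manual = 1.0 / (1.0 + np.exp(-z))

# Both match scikit-learn's own decision function and probability estimates.
print(np.allclose(z, model.decision_function(X)))           # True
print(np.allclose(p_manual, model.predict_proba(X)[:, 1]))  # True

# predict() returns class 1 where the score is positive, i.e. where P(y = 1 | x) > 0.5.
print(np.array_equal(model.predict(X), (model.decision_function(X) > 0).astype(int)))
```

This is why `lr.predict` in the next step effectively applies a 0.5 probability threshold; `lr.predict_proba` exposes the underlying probabilities if a different threshold is wanted.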

After training, we evaluate the model on the test set by measuring its accuracy.

```python
from sklearn.metrics import accuracy_score

# Predict delay labels for the test set
y_pred = lr.predict(X_test)

# Accuracy: fraction of test flights whose label is predicted correctly
print('Accuracy:', accuracy_score(y_test, y_pred))
```


```text
Accuracy: 0.5892268230120924
```

The model labels roughly 59% of the test flights correctly.
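An accuracy figure is easier to judge next to a trivial baseline, such as always predicting the most frequent class. The class balance of the airline test set is not reported here, so the sketch below uses synthetic, mildly imbalanced data (an assumption) to show the comparison with scikit-learn's `DummyClassifier`, plus a confusion matrix to see which class the errors fall on:

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with roughly 60% class 0; not the airline dataset.
X, y = make_classification(
    n_samples=1000, n_features=5, weights=[0.6, 0.4], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Baseline: always predict the class that is most frequent in the training set.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
baseline_acc = accuracy_score(y_test, baseline.predict(X_test))

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)

print("Baseline accuracy:", baseline_acc)
print("Model accuracy:   ", accuracy_score(y_test, y_pred))
# Rows are true classes, columns are predicted classes.
print(confusion_matrix(y_test, y_pred))
```

If the baseline comes close to the model's score on the real data, most of the apparent accuracy comes from the class balance rather than from the features.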


Finally, we serialize the trained model with `pickle` and save it to the `../Models` folder.

```python
import pickle

# Serialize the fitted model to a binary file ('wb' = write binary)
with open('../Models/logistic_regression_model.pkl', 'wb') as f:
    pickle.dump(lr, f)
```
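To confirm that a saved file is usable, it can be loaded back with `pickle.load` and its predictions compared with the original model. The sketch below does this with a model trained on synthetic data and a temporary directory instead of `../Models` (both assumptions, so the example runs on its own):

```python
import os
import pickle
import tempfile

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, n_features=4, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Temporary stand-in for ../Models/logistic_regression_model.pkl
    path = os.path.join(tmp_dir, "logistic_regression_model.pkl")
    with open(path, "wb") as f:
        pickle.dump(model, f)

    # Only unpickle files from trusted sources: loading can execute arbitrary code.
    with open(path, "rb") as f:
        loaded = pickle.load(f)

# The reloaded model should reproduce the original predictions exactly.
print((loaded.predict(X) == model.predict(X)).all())  # True
```

Pickled scikit-learn models are best loaded with the same scikit-learn version that saved them; loading under a different version may emit a warning or behave differently.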

