In this notebook, we create two machine learning models: a regression model and a classification model.

The first is a regression model that predicts the price of new and used phones.

The second is a classification model that predicts whether a person has diabetes based on certain characteristics.

We train both models and save them to files for later use.

The training process is similar for both models:

1. Import the data
2. Separate the features from the outcomes (X and Y)
3. Clean the data (convert from string to float, remove NaNs etc.)
4. Create and fit the model to the features and outcomes
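
The four steps above can be sketched on a tiny made-up table before touching the real data. Everything here (the column names `size`, `age`, `price`, the values, and the name `toy_model`) is a hypothetical placeholder and not part of the phone or diabetes datasets.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# 1. "Import" the data (a tiny made-up table stands in for a CSV file)
df = pd.DataFrame({
    "size": ["1.0", "2.0", "3.0", "4.0", None],  # strings, with one missing value
    "age": [1.0, 2.0, np.nan, 4.0, 5.0],
    "price": [3.0, 6.0, 9.0, 12.0, 15.0],
})

# 2. Separate the features (X) from the outcome (y)
X = df[["size", "age"]].copy()
y = df["price"]

# 3. Clean the data: convert strings to floats, then fill NaNs with column means
X["size"] = pd.to_numeric(X["size"], errors="coerce")
X = X.fillna(X.mean())

# 4. Create the model and fit it to the features and outcomes
toy_model = LinearRegression()
toy_model.fit(X, y)
print(toy_model.predict(X.head(2)))
```

Filling gaps with the column mean is only one option; dropping incomplete rows or using the median are common alternatives.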

In [14]:
# Imports

import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

from matplotlib.widgets import TextBox

import pickle

# Task 1: Phone price prediction, regression model

In [16]:
# 1. import the data
phone_data = pd.read_csv("used_device_data.csv")

# Print first 5 rows and data types
print(phone_data.dtypes)
print(phone_data.head())

device_brand              object
os                        object
screen_size              float64
4g                        object
5g                        object
rear_camera_mp           float64
front_camera_mp          float64
internal_memory          float64
ram                      float64
battery                  float64
weight                   float64
release_year               int64
days_used                  int64
normalized_used_price    float64
normalized_new_price     float64
dtype: object
  device_brand       os  screen_size   4g   5g  rear_camera_mp  \
0        Honor  Android        14.50  yes   no            13.0   
1        Honor  Android        17.30  yes  yes            13.0   
2        Honor  Android        16.69  yes  yes            13.0   
3        Honor  Android        25.50  yes  yes            13.0   
4        Honor  Android        15.32  yes   no            13.0   

   front_camera_mp  internal_memory  ram  battery  weight  release_year  \
0              5.0 

In [27]:
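# Encode the categorical columns (4g, 5g, os, device_brand) as integer codes.
# pd.factorize numbers values in order of first appearance, so "yes" is not
# guaranteed to become 1 and "no" is not guaranteed to become 0.
phone_data['4g'], unique_4g = pd.factorize(phone_data['4g'])
phone_data['5g'], unique_5g = pd.factorize(phone_data['5g'])
phone_data['os'], unique_os = pd.factorize(phone_data['os'])
phone_data['device_brand'], unique_brand = pd.factorize(phone_data['device_brand'])

# Fill missing values with the column mean (rows are not dropped)
phone_data = phone_data.apply(lambda x: x.fillna(x.mean()))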

print(f"OSs: {unique_os}, brands: {unique_brand}")
print(phone_data.head(10))

OSs: Index([0, 1, 2, 3], dtype='int64'), brands: Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33],
      dtype='int64')
   device_brand  os  screen_size  4g  5g  rear_camera_mp  front_camera_mp  \
0             0   0        14.50   0   0            13.0              5.0   
1             0   0        17.30   0   0            13.0             16.0   
2             0   0        16.69   0   0            13.0              8.0   
3             0   0        25.50   0   0            13.0              8.0   
4             0   0        15.32   0   0            13.0              8.0   
5             0   0        16.23   0   0            13.0              8.0   
6             0   0        13.84   0   0             8.0              5.0   
7             0   0        15.77   0   0            13.0              8.0   
8             0   0        15.32   0   0            13.0             16.0   
9        
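
To make the encoding step easier to reason about, here is a small sketch of how `pd.factorize` assigns codes, next to an explicit mapping. The `flags` series is made up for illustration. Note also that the `unique_os` and `unique_brand` values printed above are integers, not names, which suggests the cell may have been run on data that was already encoded; reloading the CSV before re-running the cell would avoid that.

```python
import pandas as pd

# A made-up yes/no column, similar in spirit to the 4g and 5g columns
flags = pd.Series(["yes", "no", "yes", "no"])

# pd.factorize numbers values in order of first appearance,
# so "yes" (seen first) becomes 0 and "no" becomes 1
codes, uniques = pd.factorize(flags)
print(codes)          # [0 1 0 1]
print(list(uniques))  # ['yes', 'no']

# An explicit mapping guarantees yes -> 1 and no -> 0
explicit = flags.map({"yes": 1, "no": 0})
print(explicit.tolist())  # [1, 0, 1, 0]
```

For a linear model the choice of 0 or 1 for a binary column only flips the sign of its coefficient, but an explicit mapping keeps the meaning of the column unambiguous.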

In [28]:
# Separate features from targets
price_columns = ["normalized_used_price", "normalized_new_price"]
feature_columns = [column for column in phone_data.columns if column not in price_columns]

X = phone_data[feature_columns]
y = phone_data['normalized_new_price']

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train the model
model = LinearRegression()
model.fit(X_train, y_train)

# Evaluate the model (for a regressor, score() returns R^2, not accuracy)
y_pred = model.predict(X_test)
r2 = model.score(X_test, y_test)
print(f'Model R^2: {r2:.2f}')

# Saving the model to a file
with open('phone_model.pkl', 'wb') as f:
    pickle.dump(model, f)

Model R^2: 0.55
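
For a regressor such as `LinearRegression`, `score` returns the coefficient of determination (R^2), not a classification accuracy. The sketch below shows this on synthetic data and adds the mean absolute error, which is expressed in the same units as the target. The data and the names `X_demo`, `y_demo`, and `demo_model` are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression data: a linear signal plus noise
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.5, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3, random_state=42)
demo_model = LinearRegression().fit(X_tr, y_tr)
y_hat = demo_model.predict(X_te)

# r2_score gives the same number as demo_model.score(X_te, y_te)
r2 = r2_score(y_te, y_hat)
mae = mean_absolute_error(y_te, y_hat)
print(f"R^2: {r2:.2f}, MAE: {mae:.2f}")
```

An R^2 of 1.0 means the predictions match the targets exactly, and 0.0 means the model does no better than always predicting the mean.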


# Task 2: Diabetes prediction, classification model

In [4]:
diabetes_data = pd.read_csv("diabetes.csv")

# Print first 5 rows and data types
print(diabetes_data.dtypes)
print(diabetes_data.head())

Pregnancies                   int64
Glucose                       int64
BloodPressure                 int64
SkinThickness                 int64
Insulin                       int64
BMI                         float64
DiabetesPedigreeFunction    float64
Age                           int64
Outcome                       int64
dtype: object
   Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  \
0            6      148             72             35        0  33.6   
1            1       85             66             29        0  26.6   
2            8      183             64              0        0  23.3   
3            1       89             66             23       94  28.1   
4            0      137             40             35      168  43.1   

   DiabetesPedigreeFunction  Age  Outcome  
0                     0.627   50        1  
1                     0.351   31        0  
2                     0.672   32        1  
3                     0.167   21        0  
4        
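
The rows printed above contain zeros in `SkinThickness` and `Insulin`. One possible cleaning step, assuming those zeros stand for missing measurements (an assumption about this dataset, not something the notebook states), is to replace them with the column median. The small `demo` table below is made up, and which columns to treat this way is a judgment call.

```python
import numpy as np
import pandas as pd

# A made-up table with the same kind of columns as the diabetes data
demo = pd.DataFrame({
    "Glucose": [148, 85, 0, 89],
    "SkinThickness": [35, 29, 0, 23],
    "Insulin": [0, 0, 94, 168],
    "Pregnancies": [6, 1, 0, 1],  # zero is a valid value here, so it is left alone
})

# Assumption: a zero in these columns means "not measured"
zero_as_missing = ["Glucose", "SkinThickness", "Insulin"]

for col in zero_as_missing:
    # Mark zeros as missing, then fill them with the median of the remaining values
    cleaned = demo[col].replace(0, np.nan)
    demo[col] = cleaned.fillna(cleaned.median())

print(demo)
```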

In [13]:
X = diabetes_data[["Pregnancies", "Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI", "DiabetesPedigreeFunction", "Age"]]
y = diabetes_data["Outcome"]

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train the model
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# Evaluate the model
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f'Model Accuracy: {accuracy:.2f}')

# Saving the model to a file
with open('diabetes_model.pkl', 'wb') as f:
    pickle.dump(model, f)

Model Accuracy: 0.74
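
Since both models are saved for later use, here is a minimal sketch of the full round trip: saving a fitted classifier with `pickle`, loading it back, and predicting on new rows. It uses synthetic data and a temporary file (`demo_model.pkl`, a hypothetical name) so that it does not overwrite the real model files.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic two-feature data: the class is 1 when the features sum to more than 0
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 2))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(int)
clf = LogisticRegression(max_iter=200).fit(X_demo, y_demo)

# Save the fitted model to a temporary file
path = os.path.join(tempfile.mkdtemp(), "demo_model.pkl")
with open(path, "wb") as f:
    pickle.dump(clf, f)

# Load it back, as a separate application would
with open(path, "rb") as f:
    loaded = pickle.load(f)

# New rows must have the same columns, in the same order, as the training data
new_rows = np.array([[2.0, 2.0], [-2.0, -2.0]])
print(loaded.predict(new_rows))
print(loaded.predict_proba(new_rows))
```

Unpickling can execute arbitrary code, so only load pickle files from sources you trust, and load them with a scikit-learn version compatible with the one used for training.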


# Task 3: Using the models in applications