Step 1: Import the necessary libraries

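We'll start by importing the libraries used throughout: pandas and NumPy for data manipulation, Matplotlib for visualization, and scikit-learn for splitting the data, fitting the models, and scoring them.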

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.metrics import mean_squared_error, r2_score, accuracy_score
  

Step 2: Load the dataset

Next, we'll load the red-wine portion of the "Wine Quality" dataset from the UCI Machine Learning Repository. The file is semicolon-separated, so we pass delimiter=';' to pd.read_csv.

In [2]:
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv"
data = pd.read_csv(url, delimiter=';')
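To see why the delimiter matters, here is a small self-contained sketch that parses a made-up, simplified two-row excerpt in the same semicolon-separated layout (the values are illustrative, not read from the real file). With pandas' default comma separator, each line collapses into a single column.

```python
import io

import pandas as pd

# Hypothetical, simplified excerpt in the UCI file's semicolon-separated layout
sample = "fixed acidity;alcohol;quality\n7.4;9.4;5\n7.8;9.8;5\n"

# Default comma separator: there are no commas, so each line becomes one field
wrong = pd.read_csv(io.StringIO(sample))
# Semicolon separator: one column per feature
right = pd.read_csv(io.StringIO(sample), delimiter=";")

print(wrong.shape)  # (2, 1)
print(right.shape)  # (2, 3)
print(list(right.columns))
```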


Step 3: Exploratory Data Analysis (EDA)

We'll now inspect the dataset: its first rows, its shape, missing values per column, summary statistics, and the pairwise correlations between variables.

In [3]:
# Check the first few rows of the dataset
print(data.head())

# Check the shape of the dataset
print(data.shape)

# Check for missing values
print(data.isnull().sum())

# Check the summary statistics of the dataset
print(data.describe())

# Check the correlation between variables
print(data.corr())


   fixed acidity  volatile acidity  citric acid  residual sugar  chlorides  \
0            7.4              0.70         0.00             1.9      0.076   
1            7.8              0.88         0.00             2.6      0.098   
2            7.8              0.76         0.04             2.3      0.092   
3           11.2              0.28         0.56             1.9      0.075   
4            7.4              0.70         0.00             1.9      0.076   

   free sulfur dioxide  total sulfur dioxide  density    pH  sulphates  \
0                 11.0                  34.0   0.9978  3.51       0.56   
1                 25.0                  67.0   0.9968  3.20       0.68   
2                 15.0                  54.0   0.9970  3.26       0.65   
3                 17.0                  60.0   0.9980  3.16       0.58   
4                 11.0                  34.0   0.9978  3.51       0.56   

   alcohol  quality  
0      9.4        5  
1      9.8        5  
2      9.8        5 
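data.corr() prints the full correlation matrix. When the question is which features move with quality, a single sorted column can be easier to read. A minimal sketch: the helper name target_correlations is hypothetical, and the tiny frame below is made-up data used only to show the call (in the notebook you would pass data and "quality").

```python
import pandas as pd


def target_correlations(df: pd.DataFrame, target: str) -> pd.Series:
    """Pearson correlation of every other column with `target`, strongest first."""
    corr = df.corr()[target].drop(target)
    # Rank by absolute value so strong negative correlations sit next to strong positive ones
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


# Made-up example values, not taken from the wine dataset
toy = pd.DataFrame({
    "alcohol": [9.0, 10.0, 11.0, 12.0],
    "volatile acidity": [0.9, 0.7, 0.6, 0.3],
    "residual sugar": [2.0, 1.5, 2.5, 2.0],
    "quality": [4, 5, 6, 7],
})
print(target_correlations(toy, "quality"))
```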

Step 4: Data Preparation

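Next, we'll prepare the data for modeling. Besides the numeric quality score used for regression, we add a binary target, good_wine, which is True for wines rated 7 or higher. Both targets use the same 11 physicochemical features, and each gets its own 70/30 train/test split with random_state=42.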

In [4]:
# Convert the quality column into a binary classification problem
data["good_wine"] = data["quality"] >= 7

# Split the dataset into training and testing sets
X = data.drop(['quality', 'good_wine'], axis=1)
y = data['quality']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Split the binary classification dataset into training and testing sets
X_c = data.drop(['quality', 'good_wine'], axis=1)
y_c = data['good_wine']
X_train_c, X_test_c, y_train_c, y_test_c = train_test_split(X_c, y_c, test_size=0.3, random_state=42)
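Only wines rated 7 or higher count as good_wine, so the positive class is a minority, and a plain random split can leave the training and test sets with somewhat different class ratios. Passing stratify to train_test_split keeps the ratio the same in both. A sketch on a made-up imbalanced label vector (not the wine data); in the notebook the equivalent change would be adding stratify=y_c to the classification split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up labels: 30 positives out of 200 (15%), mimicking a minority "good wine" class
labels = np.array([True] * 30 + [False] * 170)
features = np.arange(len(labels)).reshape(-1, 1)  # placeholder single feature column

f_tr, f_te, l_tr, l_te = train_test_split(
    features, labels, test_size=0.3, random_state=42, stratify=labels
)

# With stratify=labels, both splits keep the 15% positive rate
print(l_tr.mean(), l_te.mean())
```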


Step 5: Modeling and Evaluation

We'll now train and evaluate three models: linear regression and a decision tree regressor, which predict the quality score directly, and logistic regression, which predicts the binary good_wine label.

Linear Regression

In [5]:
# Train the model
lr_model = LinearRegression()
lr_model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = lr_model.predict(X_test)

# Evaluate the model
print("Linear Regression Model")
print("Mean Squared Error:", mean_squared_error(y_test, y_pred))
print("R^2 Score:", r2_score(y_test, y_pred))


Linear Regression Model
Mean Squared Error: 0.41123487175042034
R^2 Score: 0.3513885332505231
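The two metrics are linked: R^2 = 1 - MSE / Var(y_test), where Var is the population (divide-by-n) variance of the test targets. A small check with made-up numbers that this formula matches scikit-learn's r2_score:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up true scores and predictions, for illustration only
y_true_demo = np.array([5, 6, 5, 7, 4, 6], dtype=float)
y_hat_demo = np.array([5.2, 5.8, 5.5, 6.4, 4.9, 6.1])

mse = mean_squared_error(y_true_demo, y_hat_demo)
# np.var defaults to ddof=0 (population variance), matching r2_score's denominator
r2_manual = 1 - mse / np.var(y_true_demo)

print(r2_manual, r2_score(y_true_demo, y_hat_demo))
```

Read this way, the linear model's R^2 of about 0.35 means its test MSE is roughly 65% of the variance of the test quality scores.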


Decision Tree Regression

In [6]:
# Train the model
dt_model = DecisionTreeRegressor(random_state=42)
dt_model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = dt_model.predict(X_test)

# Evaluate the model
print("Decision Tree Regression Model")
print("Mean Squared Error:", mean_squared_error(y_test, y_pred))
print("R^2 Score:", r2_score(y_test, y_pred))


Decision Tree Regression Model
Mean Squared Error: 0.625
R^2 Score: 0.014232025137083437
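An unconstrained DecisionTreeRegressor keeps splitting until its leaves are pure (or cannot be split further), which tends to overfit; an R^2 near zero means it does little better on the test set than a constant prediction of the mean. One common remedy is to cap max_depth and choose it by cross-validation on the training data. A hedged sketch of that search: the helper depth_scores, the candidate depths, and the synthetic data from make_regression are illustrative assumptions, and no result on the wine data is implied.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor


def depth_scores(X, y, depths, cv=5):
    """Mean cross-validated R^2 of a DecisionTreeRegressor for each max_depth."""
    scores = {}
    for depth in depths:
        model = DecisionTreeRegressor(max_depth=depth, random_state=42)
        # With no scoring argument, cross_val_score uses the regressor's default score (R^2)
        scores[depth] = cross_val_score(model, X, y, cv=cv).mean()
    return scores


# Synthetic stand-in data; in the notebook you would pass X_train, y_train
X_syn, y_syn = make_regression(n_samples=300, n_features=5, noise=20.0, random_state=0)
for depth, score in depth_scores(X_syn, y_syn, [2, 4, 6, None]).items():
    print(depth, round(score, 3))
```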


Logistic Regression

In [7]:
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Train the model; standardizing the features lets the default lbfgs solver converge
log_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
log_model.fit(X_train_c, y_train_c)

# Make predictions on the test set and evaluate
y_pred_c = log_model.predict(X_test_c)
print("Logistic Regression Model")
print("Accuracy:", accuracy_score(y_test_c, y_pred_c))


Without feature scaling, the default lbfgs solver hits its iteration limit on this data and scikit-learn raises a ConvergenceWarning ("STOP: TOTAL NO. of ITERATIONS REACHED LIMIT."), suggesting either a larger max_iter or scaled inputs (see https://scikit-learn.org/stable/modules/preprocessing.html and https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression). The pipeline above does both.
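Because good_wine is the minority class, accuracy alone can look respectable even for a model that never predicts a good wine. Comparing against a majority-class baseline and checking precision and recall per class gives a fuller picture. A sketch using scikit-learn's DummyClassifier and classification_report on synthetic imbalanced data from make_classification (an assumption; in the notebook you would use X_train_c, y_train_c, X_test_c, y_test_c):

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in data (roughly 85% class 0, 15% class 1)
X_syn, y_syn = make_classification(
    n_samples=1000, n_features=8, weights=[0.85], random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.3, random_state=42, stratify=y_syn
)

# Baseline that always predicts the most frequent class
baseline = DummyClassifier(strategy="most_frequent").fit(X_tr, y_tr)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X_tr, y_tr)

print("Baseline accuracy:", accuracy_score(y_te, baseline.predict(X_te)))
print("Model accuracy:   ", accuracy_score(y_te, model.predict(X_te)))
# Per-class precision and recall; class 1 plays the role of good_wine here
print(classification_report(y_te, model.predict(X_te)))
```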