## Importing the Dataset

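```python
import pandas as pd

# URL of the Auto MPG dataset in the UCI Machine Learning Repository
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data'

# The file has no header row, so column names are supplied explicitly.
# Missing values are encoded as '?', and each line ends with a tab followed
# by the car name, which comment='\t' discards.
column_names = ['MPG', 'Cylinders', 'Displacement', 'Horsepower', 'Weight', 'Acceleration', 'Model Year', 'Origin']
df = pd.read_csv(url, names=column_names, na_values='?', comment='\t',
                 sep=' ', skipinitialspace=True)
```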
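
To see what the `read_csv` options do without downloading the file, here is a minimal sketch that parses two hand-written lines laid out like `auto-mpg.data`. The sample text and the `demo_columns` name are assumptions made for illustration: numeric fields separated by runs of spaces, `?` for a missing horsepower, and a tab before the quoted car name.

```python
import io

import pandas as pd

# Two hand-written lines in the auto-mpg.data layout (illustrative sample, not the real file):
# space-separated numeric fields, '?' for a missing value, then a tab and the car name
sample = (
    '18.0   8   307.0      130.0      3504.      12.0   70  1\t"chevrolet chevelle malibu"\n'
    '25.0   4   98.00      ?          2046.      19.0   71  1\t"ford pinto"\n'
)

demo_columns = ['MPG', 'Cylinders', 'Displacement', 'Horsepower',
                'Weight', 'Acceleration', 'Model Year', 'Origin']

# skipinitialspace collapses runs of spaces; comment='\t' drops everything after the tab
sample_df = pd.read_csv(io.StringIO(sample), names=demo_columns, na_values='?',
                        comment='\t', sep=' ', skipinitialspace=True)

print(sample_df.shape)                          # (2, 8): the car name is not kept
print(sample_df['Horsepower'].isna().tolist())  # [False, True]: '?' became NaN
```

Dropping the car name this way keeps every remaining column numeric, which is what the regression below needs.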

## Preprocessing the Data

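```python
# Drop rows with missing values (in this file, only Horsepower has any)
df = df.dropna()

# Origin is a categorical code (1 = USA, 2 = Europe, 3 = Japan). Map it to labels
# and one-hot encode it so the model does not treat the codes as ordered numbers.
df['Origin'] = df['Origin'].map({1: 'USA', 2: 'Europe', 3: 'Japan'})
df = pd.get_dummies(df, columns=['Origin'])
```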
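
The encoding step can be checked on a toy frame. This sketch uses a hypothetical four-row `demo` frame to show which indicator columns `get_dummies` creates and that exactly one is set per row. Because the three indicators always sum to 1, they are collinear with the intercept that `LinearRegression` fits by default (the "dummy variable trap"); the rank check makes this visible. `drop_first=True` is shown only as one common alternative, not as what the notebook does.

```python
import numpy as np
import pandas as pd

# Hypothetical four-row frame using the same integer coding as Origin
demo = pd.DataFrame({'MPG': [18.0, 26.0, 24.0, 15.0], 'Origin': [1, 2, 3, 1]})
demo['Origin'] = demo['Origin'].map({1: 'USA', 2: 'Europe', 3: 'Japan'})

# One indicator column per category, named <column>_<label> and ordered by label
full = pd.get_dummies(demo, columns=['Origin'])
print(list(full.columns))  # ['MPG', 'Origin_Europe', 'Origin_Japan', 'Origin_USA']

# Exactly one indicator is set in each row, so the three columns sum to 1
indicators = full[['Origin_Europe', 'Origin_Japan', 'Origin_USA']].astype(int)
print(indicators.sum(axis=1).tolist())  # [1, 1, 1, 1]

# An intercept column of ones plus the three indicators has rank 3, not 4
design = np.column_stack([np.ones(len(indicators)), indicators.to_numpy()])
print(np.linalg.matrix_rank(design))  # 3

# Optional alternative: drop the first level (Europe), which becomes the baseline
reduced = pd.get_dummies(demo, columns=['Origin'], drop_first=True)
print(list(reduced.columns))  # ['MPG', 'Origin_Japan', 'Origin_USA']
```

With all three indicators kept, least squares still produces well-defined predictions, but the individual Origin coefficients are not uniquely identified, so they should not be interpreted in isolation.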

## Splitting the Data

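```python
from sklearn.model_selection import train_test_split

# Features are every column except the target, MPG
X = df.drop('MPG', axis=1)
y = df['MPG']

# Hold out 20% of the rows for testing; random_state fixes the shuffle so the split is reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```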
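
A quick way to confirm how `train_test_split` behaves is to run it on a tiny synthetic frame (the `X_demo` and `y_demo` names are illustrative). With `test_size=0.2`, 10 rows split into 8 training and 2 test rows, the same `random_state` reproduces the same split, and features and targets keep matching indices.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Illustrative 10-row dataset; X_demo and y_demo are hypothetical names
X_demo = pd.DataFrame({'feature': np.arange(10)})
y_demo = pd.Series(np.arange(10) * 2.0)

# Same arguments as the notebook's split
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
print(len(X_tr), len(X_te))  # 8 2

# Re-running with the same random_state reproduces the split
_, X_te_again, _, _ = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
print(X_te.index.equals(X_te_again.index))  # True

# Features and targets are split together, so their indices stay aligned
print(X_te.index.equals(y_te.index))  # True
```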

## Performing Regression Analysis

```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Create a linear regression model
model = LinearRegression()

# Train the model on the training set
model.fit(X_train, y_train)

# Predict on the test set
y_pred = model.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f'Mean Squared Error: {mse}')
print(f'R2 Score: {r2}')
```

```
Mean Squared Error: 10.60227901168834
R2 Score: 0.7922774714022587
```
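
`LinearRegression` fits an intercept plus one weight per feature by ordinary least squares. As a sanity check of that behavior, this sketch fits it to synthetic, noise-free data generated from an assumed rule, y = 3 + 2·x0 − 0.5·x1, and recovers those parameters; it also confirms that `predict` is simply `intercept_ + X @ coef_`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic features; the generating rule below is an assumption for illustration
rng = np.random.default_rng(0)
X_syn = rng.normal(size=(50, 2))
y_syn = 3.0 + 2.0 * X_syn[:, 0] - 0.5 * X_syn[:, 1]  # noise-free: y = 3 + 2*x0 - 0.5*x1

syn_model = LinearRegression().fit(X_syn, y_syn)
print(syn_model.intercept_)  # close to 3.0
print(syn_model.coef_)       # close to [2.0, -0.5]

# predict() applies intercept_ + X @ coef_
manual = syn_model.intercept_ + X_syn @ syn_model.coef_
print(np.allclose(manual, syn_model.predict(X_syn)))  # True
```

On the notebook's own model, `pd.Series(model.coef_, index=X_train.columns)` lists the learned weight for each feature, and `model.intercept_` holds the intercept.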


The model fits reasonably well: an R² of 0.79 on the held-out test set means it explains about 79% of the variance in MPG. The Mean Squared Error of 10.60 is measured in squared MPG; its square root, about 3.26 MPG, is a more interpretable measure of the typical prediction error and suggests there is still room for improvement.
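
Both metrics can be computed by hand to make their meaning concrete. MSE is the mean of the squared residuals (in squared MPG), its square root (RMSE) is back in MPG, and R² = 1 − SS_res / SS_tot compares the residual sum of squares with the spread of the true values around their mean. The arrays below are made up for illustration and are not the notebook's predictions; the hand-computed values are checked against scikit-learn's `mean_squared_error` and `r2_score`.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up targets and predictions in MPG (not the notebook's results)
y_true_demo = np.array([20.0, 25.0, 30.0, 35.0])
y_pred_demo = np.array([22.0, 24.0, 29.0, 37.0])

residuals = y_true_demo - y_pred_demo                     # [-2, 1, 1, -2]
mse_manual = np.mean(residuals ** 2)                      # 2.5, in squared MPG
rmse_manual = np.sqrt(mse_manual)                         # about 1.58, back in MPG

ss_res = np.sum(residuals ** 2)                           # 10.0
ss_tot = np.sum((y_true_demo - y_true_demo.mean()) ** 2)  # 125.0
r2_manual = 1 - ss_res / ss_tot                           # about 0.92

print(np.isclose(mse_manual, mean_squared_error(y_true_demo, y_pred_demo)))  # True
print(np.isclose(r2_manual, r2_score(y_true_demo, y_pred_demo)))            # True
```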