# Auto MPG — Quick Exploration

Load `data/auto-mpg.tab` and run initial checks: `head()`, `info()`, `describe()`, missing values, and simple visualizations.

In [None]:
# Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

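# Load the Auto MPG data (path is relative to the notebook; forward slashes work on every platform)
cols = ['mpg','cylinders','displacement','horsepower','weight','acceleration','model_year','origin','car_name']
# '?' marks missing values in the raw file, so read it as NaN
df = pd.read_csv('../data/auto-mpg.tab', sep='\t', names=cols, comment='#', na_values='?')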

df.head()

In [None]:
# Quick checks
print('Shape:', df.shape)
print('\nInfo:\n')
df.info()  # info() prints its report directly and returns None, so do not wrap it in print()

# Describe numeric columns
print('\nDescribe:\n', df.describe())

# Missing values
print('\nMissing values:\n', df.isnull().sum())

# Simple plots
sns.histplot(df['mpg'].dropna(), kde=True)
plt.title('MPG Distribution')
plt.show()

sns.scatterplot(data=df, x='horsepower', y='mpg')
plt.title('MPG vs Horsepower')
plt.show()


In [None]:
# Grouped plots inside the notebook
grp = df.groupby('cylinders')
avg_hp = grp['horsepower'].mean()
avg_mpg = grp['mpg'].mean()

plt.figure(figsize=(6,4))
avg_hp.plot(kind='bar', color='C1')
plt.xlabel('Cylinders')
plt.ylabel('Average Horsepower')
plt.title('Average Horsepower by Cylinder Count')
plt.show()

plt.figure(figsize=(6,4))
avg_mpg.plot(kind='bar', color='C2')
plt.xlabel('Cylinders')
plt.ylabel('Average MPG')
plt.title('Average MPG by Cylinder Count')
plt.show()

plt.figure(figsize=(6,5))
sns.scatterplot(data=df, x='horsepower', y='mpg', hue='cylinders', palette='tab10')
plt.title('MPG vs Horsepower (colored by cylinders)')
plt.show()

## Rationale

- Average horsepower by cylinders: the bar chart shows how horsepower rises with engine size, using cylinder count as a proxy. More cylinders typically mean a larger engine and higher horsepower, which is why 8-cylinder cars have the highest average horsepower.
- Average MPG by cylinders: the bar chart shows that fuel efficiency tends to decrease as cylinder count increases; smaller engines (4 cylinders) tend to have higher MPG on average.
- MPG vs Horsepower scatter: shows the negative relationship between horsepower and MPG, and the cylinder coloring highlights the main clusters (4, 6, 8 cylinders). This supports the insight that higher horsepower (usually with more cylinders) is associated with lower MPG.
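
The patterns described above can also be checked numerically. The following is a minimal sketch, assuming `df` has the `cylinders`, `horsepower`, and `mpg` columns loaded earlier; the helper names `summarize_by_cylinders` and `hp_mpg_correlation` are hypothetical and not part of the notebook.

```python
import pandas as pd


def summarize_by_cylinders(df: pd.DataFrame) -> pd.DataFrame:
    """Group size, mean horsepower, and mean MPG per cylinder count (hypothetical helper)."""
    return (
        df.groupby("cylinders")
        .agg(
            n_cars=("mpg", "size"),                 # rows per group, including NaN rows
            avg_horsepower=("horsepower", "mean"),  # mean() skips missing horsepower
            avg_mpg=("mpg", "mean"),
        )
        .sort_index()
    )


def hp_mpg_correlation(df: pd.DataFrame) -> float:
    """Pearson correlation between horsepower and MPG over complete rows (hypothetical helper)."""
    pair = df[["horsepower", "mpg"]].dropna()
    return pair["horsepower"].corr(pair["mpg"])
```

In a notebook cell, `summarize_by_cylinders(df)` would give the numbers behind the two bar charts, and a negative value from `hp_mpg_correlation(df)` would be consistent with the scatter plot. The group sizes are worth a look because averages over very small groups can be misleading.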

Next steps: run the notebook cells to render the plots. Optionally, export the images into `results/`.
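
One possible way to export the images is a small helper around Matplotlib's `Figure.savefig`. This is a sketch, not part of the notebook: the function name `save_figure`, the PNG format, and the `dpi` default are assumptions, and `results/` is taken to be relative to the working directory.

```python
from pathlib import Path


def save_figure(fig, name, out_dir="results", dpi=150):
    """Save a Matplotlib figure as <out_dir>/<name>.png and return the path (hypothetical helper)."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)  # create the output folder if it is missing
    target = out_path / f"{name}.png"
    fig.savefig(target, dpi=dpi, bbox_inches="tight")
    return target
```

In the plotting cells this would be called before `plt.show()`, for example `save_figure(plt.gcf(), 'mpg_distribution')`, because the current figure may no longer be available once it has been shown.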