# Sales Prediction using Python

This notebook uses Python and a multiple linear regression model to predict product sales from advertising spend on three channels: TV, Radio, and Newspaper. The workflow covers data loading, exploration, visualization, model training, evaluation, and interpretation of the fitted coefficients.

## 1. Import Required Libraries
Import pandas and NumPy for data handling, matplotlib and seaborn for plotting, and the scikit-learn utilities used to split the data, fit a linear regression, and score it.

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
```

## 2. Load the Dataset
Load the 'advertising.csv' dataset into a pandas DataFrame and display the first few rows.

```python
df = pd.read_csv('advertising.csv')
df.head()
```
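To catch a wrong file or a renamed column early, the load step could be wrapped in a small validation helper. The sketch below assumes the four column names used later in this notebook; `load_advertising` and `EXPECTED_COLUMNS` are hypothetical names, and the two CSV rows are placeholder values, not data from `advertising.csv`.

```python
import io

import pandas as pd

# Columns that later cells in this notebook rely on.
EXPECTED_COLUMNS = ["TV", "Radio", "Newspaper", "Sales"]


def load_advertising(source) -> pd.DataFrame:
    """Hypothetical helper: read a CSV and fail early if an expected column is missing."""
    df = pd.read_csv(source)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"advertising data is missing columns: {missing}")
    return df


# Two placeholder rows for illustration only; they are not taken from advertising.csv.
sample_csv = io.StringIO(
    "TV,Radio,Newspaper,Sales\n"
    "100.0,20.0,30.0,12.0\n"
    "50.0,10.0,5.0,7.5\n"
)
sample = load_advertising(sample_csv)
print(sample.shape)  # (2, 4)
```

In the notebook itself, the call would be `df = load_advertising('advertising.csv')`, since `pd.read_csv` accepts either a path or a file-like object.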

## 3. Explore the Data
Use `info()` to list the column names, non-null counts, and dtypes, and `describe()` to show summary statistics (count, mean, standard deviation, minimum, quartiles, maximum) for each numeric column.

```python
df.info()
df.describe()
```

## 4. Visualize Feature Relationships
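Use seaborn's `pairplot` to plot Sales against each advertising channel (TV, Radio, Newspaper). With `kind='reg'`, each panel also draws a fitted regression line, which makes the strength and direction of each relationship easier to judge.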

```python
sns.pairplot(df, x_vars=['TV', 'Radio', 'Newspaper'], y_vars='Sales', height=5, aspect=0.7, kind='reg')
plt.show()
```
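The pairplot can be complemented with a numeric summary: the Pearson correlation of each channel with Sales, which measures linear association only. This sketch runs on a synthetic stand-in for `advertising.csv` that assumes the same column names; the generating coefficients are made up, so the printed values say nothing about the real dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for advertising.csv; the coefficients below are invented.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "Radio": rng.uniform(0, 50, n),
    "Newspaper": rng.uniform(0, 100, n),
})
df["Sales"] = 3 + 0.05 * df["TV"] + 0.2 * df["Radio"] + rng.normal(0, 1, n)

# Correlation of each channel with Sales, strongest first.
sales_corr = (
    df.corr()["Sales"]
    .drop("Sales")
    .sort_values(ascending=False)
)
print(sales_corr)
```

On the real data, the same two expressions (`df.corr()["Sales"]` and the sort) apply unchanged.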

## 5. Preprocess the Data
Check for missing values and non-numeric (categorical) columns. This dataset has neither, so no imputation or encoding is needed before modeling.

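```python
# Count missing values per column; all zeros means no imputation is needed.
print(df.isnull().sum())
# Every column should be numeric, so no categorical encoding is required.
print(df.dtypes)
```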
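If a future version of the data did need preprocessing, the check and a common fix could look like the sketch below. `preprocessing_report` is a hypothetical helper, the frames hold made-up values, and median imputation plus one-hot encoding is just one reasonable choice, not a step this notebook performs.

```python
import numpy as np
import pandas as pd


def preprocessing_report(df: pd.DataFrame) -> dict:
    """Hypothetical helper: count missing cells and list non-numeric columns."""
    return {
        "missing_values": int(df.isna().sum().sum()),
        "non_numeric_columns": df.select_dtypes(exclude="number").columns.tolist(),
    }


# Placeholder frame with the notebook's column names (values are made up).
clean = pd.DataFrame({
    "TV": [100.0, 50.0, 80.0],
    "Radio": [20.0, 10.0, 15.0],
    "Newspaper": [30.0, 5.0, 12.0],
    "Sales": [12.0, 7.5, 10.0],
})
print(preprocessing_report(clean))
# {'missing_values': 0, 'non_numeric_columns': []}

# A hypothetical messier variant: one missing Radio value and a categorical column.
messy = clean.assign(Region=["North", "South", "North"])
messy.loc[0, "Radio"] = np.nan
print(preprocessing_report(messy))
# {'missing_values': 1, 'non_numeric_columns': ['Region']}

# One common fix: median-impute numeric gaps, then one-hot encode the category as floats.
fixed = messy.fillna(messy.median(numeric_only=True))
fixed = pd.get_dummies(fixed, columns=["Region"], dtype=float)
print(preprocessing_report(fixed))
# {'missing_values': 0, 'non_numeric_columns': []}
```

Passing `dtype=float` to `get_dummies` keeps the indicator columns numeric; recent pandas versions otherwise return boolean columns, which `select_dtypes(exclude="number")` would still report.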

## 6. Split Data into Training and Test Sets
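Split the features (TV, Radio, Newspaper) and the target (Sales) into training and test sets with `train_test_split`, holding out 20% of the rows for testing. Fixing `random_state=42` makes the split reproducible across runs.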

```python
X = df[['TV', 'Radio', 'Newspaper']]
y = df['Sales']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```
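A quick way to confirm what the split produces is to check the shapes and rerun it with the same seed. This sketch uses a synthetic 200-row frame with the notebook's column names (an assumption for illustration; the row count of `advertising.csv` is not stated here).

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic 200-row frame with the notebook's column names; values are random.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.uniform(0, 100, size=(200, 4)),
                  columns=["TV", "Radio", "Newspaper", "Sales"])

X = df[["TV", "Radio", "Newspaper"]]
y = df["Sales"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# test_size=0.2 holds out 20% of rows: 40 of 200 here, leaving 160 for training.
print(X_train.shape, X_test.shape)  # (160, 3) (40, 3)

# Re-running with the same random_state reproduces exactly the same test rows.
_, X_test2, _, _ = train_test_split(X, y, test_size=0.2, random_state=42)
print(X_test.index.equals(X_test2.index))  # True
```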

## 7. Build and Train the Linear Regression Model
Create a `LinearRegression` model and fit it to the training data. The model learns an intercept and one coefficient per channel by ordinary least squares.

```python
model = LinearRegression()
model.fit(X_train, y_train)
```
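`LinearRegression` fits the model Sales ≈ β₀ + β₁·TV + β₂·Radio + β₃·Newspaper by minimizing the sum of squared errors. As a sanity check, the sketch below solves the same least-squares problem directly with NumPy and compares the result with scikit-learn's. The data are synthetic with invented coefficients, not estimates from `advertising.csv`.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic data with invented coefficients; not estimates from advertising.csv.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "Radio": rng.uniform(0, 50, n),
    "Newspaper": rng.uniform(0, 100, n),
})
y = 3 + 0.05 * X["TV"] + 0.2 * X["Radio"] + rng.normal(0, 1, n)

model = LinearRegression().fit(X, y)

# Ordinary least squares by hand: prepend a column of ones for the intercept,
# then solve min ||A @ beta - y||^2 with NumPy's least-squares solver.
A = np.column_stack([np.ones(n), X.to_numpy()])
beta, *_ = np.linalg.lstsq(A, y.to_numpy(), rcond=None)

print(beta[0], model.intercept_)  # intercepts agree
print(beta[1:], model.coef_)      # slopes agree, in the column order TV, Radio, Newspaper
```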

## 8. Evaluate the Model
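Predict sales on the held-out test set and compute two metrics: the root mean squared error (RMSE), which is expressed in the same units as Sales, and the coefficient of determination (R²), the fraction of the variance in test-set Sales that the model explains.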

```python
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)

print(f"Root Mean Squared Error: {rmse:.2f}")
print(f"R^2 Score: {r2:.2f}")
```
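To make the two metrics concrete, this sketch computes RMSE = √(mean of squared residuals) and R² = 1 − SS_res / SS_tot by hand on four made-up values, then checks them against the scikit-learn functions used above. Here the residuals are −1, 1, 0, 2, so SS_res = 6; the mean of `y_true` is 11, so SS_tot = 20 and R² = 1 − 6/20 = 0.7.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up vectors chosen so the arithmetic is easy to follow.
y_true = np.array([10.0, 12.0, 8.0, 14.0])
y_pred = np.array([11.0, 11.0, 8.0, 12.0])

residuals = y_true - y_pred
rmse = np.sqrt(np.mean(residuals ** 2))          # sqrt(6 / 4)
ss_res = np.sum(residuals ** 2)                  # 6
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # 20
r2 = 1 - ss_res / ss_tot                         # 0.7

print(f"RMSE: {rmse:.4f}")  # RMSE: 1.2247
print(f"R^2: {r2:.2f}")     # R^2: 0.70
```

An R² of 0 corresponds to always predicting the mean of `y_true`, and R² can go negative when a model does worse than that baseline.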

## 9. Visualize Actual vs Predicted Sales
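Create a scatter plot of actual versus predicted sales. The red dashed diagonal marks perfect predictions (predicted = actual); the farther a point lies from it, the larger that prediction's error.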

```python
plt.figure(figsize=(8, 6))
plt.scatter(y_test, y_pred)
plt.xlabel("Actual Sales")
plt.ylabel("Predicted Sales")
plt.title("Actual vs Predicted Sales")
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--')
plt.show()
```
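A residual plot (error against prediction) often makes systematic patterns, such as curvature or errors that grow with the predicted value, easier to spot than the actual-versus-predicted view. The sketch below uses made-up stand-ins for `y_test` and `y_pred`; the `Agg` backend line is an assumption so it runs as a headless script and should be dropped in Jupyter.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for running as a script; omit in Jupyter
import matplotlib.pyplot as plt
import numpy as np

# Made-up values standing in for the notebook's y_test and y_pred.
rng = np.random.default_rng(0)
y_test = rng.uniform(5, 25, 40)
y_pred = y_test + rng.normal(0, 1.5, 40)

residuals = y_test - y_pred

fig, ax = plt.subplots(figsize=(8, 6))
ax.scatter(y_pred, residuals)
ax.axhline(0, color="r", linestyle="--")  # zero-error reference line
ax.set_xlabel("Predicted Sales")
ax.set_ylabel("Residual (Actual - Predicted)")
ax.set_title("Residuals vs Predicted Sales")
# In a notebook, call plt.show() here to display the figure.
```

Residuals scattered evenly around the dashed line, with no visible trend or funnel shape, are consistent with the linear model's assumptions.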

## 10. Show Model Coefficients
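Display the learned coefficients for each advertising channel in a DataFrame. Each coefficient is the estimated change in Sales for a one-unit increase in that channel's spend, holding the other two channels fixed.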

```python
coeff_df = pd.DataFrame(model.coef_, index=X.columns, columns=['Coefficient'])
coeff_df
```
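The coefficient table can be read directly as a prediction formula. This sketch shows that a prediction equals the intercept plus each coefficient times its channel's spend, and that adding one unit of TV spend shifts the prediction by the TV coefficient. The training data are synthetic with invented coefficients, and `budget` is a hypothetical spend plan; neither comes from `advertising.csv`.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic training data with invented coefficients; not advertising.csv results.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "Radio": rng.uniform(0, 50, n),
    "Newspaper": rng.uniform(0, 100, n),
})
y = 3 + 0.05 * X["TV"] + 0.2 * X["Radio"] + rng.normal(0, 1, n)
model = LinearRegression().fit(X, y)
coeff_df = pd.DataFrame(model.coef_, index=X.columns, columns=["Coefficient"])

# Hypothetical spend plan; column names must match the training features.
budget = pd.DataFrame({"TV": [150.0], "Radio": [25.0], "Newspaper": [20.0]})

# Prediction = intercept + sum of (coefficient * spend), aligned by channel name.
manual = model.intercept_ + (budget.iloc[0] * coeff_df["Coefficient"]).sum()
predicted = model.predict(budget)[0]
print(manual, predicted)  # the two values agree

# One extra unit of TV spend (others fixed) moves the prediction by the TV
# coefficient, up to floating-point rounding.
bumped = budget.assign(TV=budget["TV"] + 1)
delta = model.predict(bumped)[0] - predicted
print(delta, coeff_df.loc["TV", "Coefficient"])
```

Because the coefficients are in Sales units per unit of spend, comparing their sizes across channels is only meaningful if all three channels are measured in the same units.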