# Analyzing Ad Budgets for Different Media Channels
The given dataset contains ad budgets for different media channels and the corresponding ad sales of a firm.

### I'll perform the following tasks here:
1. Find the features (media channels) used by the firm
2. Find the sales figures that correspond to the channel budgets
3. Split the dataset into training and testing datasets for the model
4. Create a model to predict the sales outcome
5. Finally, calculate the Root Mean Squared Error (RMSE) of the predictions

#### Importing the dataset

In [1]:
#Importing the required libraries
import pandas as pd

In [2]:
#Importing the advertising dataset
df = pd.read_csv('Advertising Budget and Sales.csv')

#### Analyzing the dataset

In [3]:
#Viewing the initial few records of the dataset
df.head()

| | Unnamed: 0 | TV Ad Budget ($) | Radio Ad Budget ($) | Newspaper Ad Budget ($) | Sales ($) |
|---|---|---|---|---|---|
| 0 | 1 | 230.1 | 37.8 | 69.2 | 22.1 |
| 1 | 2 | 44.5 | 39.3 | 45.1 | 10.4 |
| 2 | 3 | 17.2 | 45.9 | 69.3 | 9.3 |
| 3 | 4 | 151.5 | 41.3 | 58.5 | 18.5 |
| 4 | 5 | 180.8 | 10.8 | 58.4 | 12.9 |


In [4]:
#Checking the total number of elements in the dataset
df.size

1000

#### Finding the features or media channels used by the firm

In [5]:
#Checking the number of observations (rows) and attributes (columns) in the dataset
df.shape

(200, 5)

In [6]:
#Viewing the names of each of the attributes
df.columns

Index(['Unnamed: 0', 'TV Ad Budget ($)', 'Radio Ad Budget ($)',
       'Newspaper Ad Budget ($)', 'Sales ($)'],
      dtype='object')
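
The `Unnamed: 0` column is what pandas typically produces when the first column of a CSV has no header, which usually means it is a saved row index rather than a media channel. A minimal sketch, assuming that is the case here, reads that column as the index with `index_col=0`. The inline CSV text reuses the first three records shown by `df.head()` and only stands in for the real file.

```python
import io
import pandas as pd

# Stand-in for 'Advertising Budget and Sales.csv': the first three records shown
# by df.head(), with an unnamed leading column (assumed to be a saved row index).
csv_text = """,TV Ad Budget ($),Radio Ad Budget ($),Newspaper Ad Budget ($),Sales ($)
1,230.1,37.8,69.2,22.1
2,44.5,39.3,45.1,10.4
3,17.2,45.9,69.3,9.3
"""

# Default read: the header-less first column is loaded as 'Unnamed: 0'
raw = pd.read_csv(io.StringIO(csv_text))
print(list(raw.columns))

# Treating the first column as the index leaves only the budgets and sales
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(clean.columns))
print(clean.shape)
```

With the index handled this way, the remaining columns are the three media channels plus the sales column.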

#### Creating feature and target objects to train and test the model; finding the sales figures

In [7]:
#Creating a feature object from the columns
x_features=df[['TV Ad Budget ($)','Radio Ad Budget ($)','Newspaper Ad Budget ($)']]

In [8]:
#Viewing the feature object
x_features.head()

| | TV Ad Budget ($) | Radio Ad Budget ($) | Newspaper Ad Budget ($) |
|---|---|---|---|
| 0 | 230.1 | 37.8 | 69.2 |
| 1 | 44.5 | 39.3 | 45.1 |
| 2 | 17.2 | 45.9 | 69.3 |
| 3 | 151.5 | 41.3 | 58.5 |
| 4 | 180.8 | 10.8 | 58.4 |


In [9]:
#Creating a target object (Hint: use the sales column as it is the response of the dataset)
y_target=df['Sales ($)']

In [10]:
#Viewing the target object
y_target.head()

0    22.1
1    10.4
2     9.3
3    18.5
4    12.9
Name: Sales ($), dtype: float64

In [11]:
#Verifying if all the observations have been captured in the feature object
x_features.shape

(200, 3)

In [12]:
#Verifying if all the observations have been captured in the target object
y_target.shape

(200,)

#### Splitting the original dataset into training and testing datasets for the model

In [13]:
#Splitting the dataset (by default, 75% is the training data and 25% is the testing data)
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x_features, y_target, random_state=1)

In [14]:
#Verifying that the training and testing datasets are split correctly (Hint: use the shape attribute)
print(x_train.shape)
print(y_train.shape)
print(x_test.shape)
print(y_test.shape)

(150, 3)
(150,)
(50, 3)
(50,)
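
The 150/50 split follows from the default `test_size` of 0.25 applied to 200 rows. The sketch below illustrates the idea in plain Python: shuffle the row positions with a fixed seed, then set a quarter of them aside for testing. `simple_split` is a hypothetical helper written for illustration. It does not reproduce scikit-learn's shuffling, so the rows it selects will differ from those chosen with `random_state=1`.

```python
import math
import random

def simple_split(n_rows, test_fraction=0.25, seed=1):
    """Return (train_positions, test_positions) for n_rows shuffled rows."""
    positions = list(range(n_rows))
    random.Random(seed).shuffle(positions)  # fixed seed gives a repeatable split
    n_test = math.ceil(n_rows * test_fraction)  # test set size, rounded up
    return positions[n_test:], positions[:n_test]

train_pos, test_pos = simple_split(200)
print(len(train_pos), len(test_pos))
```

Fixing the seed matters for the same reason `random_state=1` does in the notebook: the error metric depends on which rows land in the test set.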


#### Creating a model to predict the sales outcome

In [15]:
#Creating a linear regression model
from sklearn.linear_model import LinearRegression
linreg=LinearRegression()
linreg.fit(x_train,y_train)

LinearRegression()

In [16]:
#Printing the intercept and coefficients 
print(linreg.intercept_)
print(linreg.coef_)

2.8769666223179318
[0.04656457 0.17915812 0.00345046]
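
The fitted model is a linear equation: predicted sales equal the intercept plus each coefficient times the matching budget, with the coefficients in the same order as the feature columns (TV, radio, newspaper). As a quick check, the sketch below applies the printed values to the first record from `df.head()`. `predict_sales` is a hypothetical helper for illustration, not part of scikit-learn.

```python
# Values printed by linreg.intercept_ and linreg.coef_ above
intercept = 2.8769666223179318
coefficients = [0.04656457, 0.17915812, 0.00345046]  # TV, radio, newspaper

def predict_sales(budgets):
    """Intercept plus the coefficient-weighted sum of the three budgets."""
    return intercept + sum(c * b for c, b in zip(coefficients, budgets))

# First record of the dataset: TV=230.1, radio=37.8, newspaper=69.2 (actual sales 22.1)
prediction = predict_sales([230.1, 37.8, 69.2])
print(round(prediction, 2))
```

The result is roughly 20.6, against recorded sales of 22.1 for that row.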


In [17]:
#Predicting the outcome for the testing dataset
y_predict=linreg.predict(x_test)
y_predict

array([21.70910292, 16.41055243,  7.60955058, 17.80769552, 18.6146359 ,
       23.83573998, 16.32488681, 13.43225536,  9.17173403, 17.333853  ,
       14.44479482,  9.83511973, 17.18797614, 16.73086831, 15.05529391,
       15.61434433, 12.42541574, 17.17716376, 11.08827566, 18.00537501,
        9.28438889, 12.98458458,  8.79950614, 10.42382499, 11.3846456 ,
       14.98082512,  9.78853268, 19.39643187, 18.18099936, 17.12807566,
       21.54670213, 14.69809481, 16.24641438, 12.32114579, 19.92422501,
       15.32498602, 13.88726522, 10.03162255, 20.93105915,  7.44936831,
        3.64695761,  7.22020178,  5.9962782 , 18.43381853,  8.39408045,
       14.08371047, 15.02195699, 20.35836418, 20.57036347, 19.60636679])

#### Calculating the Root Mean Squared Error (RMSE)

In [18]:
#Importing required libraries for calculating the error metric
from sklearn import metrics
import numpy as np

In [19]:
#Calculating the RMSE (the square root of the mean squared error, MSE)
print(np.sqrt(metrics.mean_squared_error(y_test,y_predict)))

1.4046514230328955
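
Because the cell above wraps `mean_squared_error` in `np.sqrt`, the printed value is the root mean squared error (RMSE), which is in the same units as sales; the MSE itself is its square. The sketch below spells out both quantities in plain Python on a few made-up numbers (the `actual` and `predicted` lists are illustrative, not taken from the dataset) and then recovers the MSE from the value printed above.

```python
import math

# Illustrative values only; not rows from the advertising dataset
actual = [10.0, 12.0, 15.0, 20.0]
predicted = [11.0, 12.0, 13.0, 20.0]

# MSE: average of the squared differences
mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
# RMSE: square root of the MSE
rmse = math.sqrt(mse)
print(mse, rmse)

# Squaring the RMSE reported by the notebook gives the MSE on the test set
reported_rmse = 1.4046514230328955
reported_mse = reported_rmse ** 2
print(round(reported_mse, 3))
```

Squaring the reported value gives an MSE of about 1.973 on the 50 test rows.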


The task is done. Thank you for checking out my notebook. Regards,
* Rachit Shukla :)