Predicting the future sales of a product helps a business manage the manufacturing and advertising cost of the product.

In [2]:
# let's read the dataset using pandas
# dataset download link:
# https://www.kaggle.com/bumba5341/advertisingcsv?select=Advertising.csv

import pandas as pd
df = pd.read_csv('Advertising.csv')
df

     Unnamed: 0     TV  Radio  Newspaper  Sales
0             1  230.1   37.8       69.2   22.1
1             2   44.5   39.3       45.1   10.4
2             3   17.2   45.9       69.3    9.3
3             4  151.5   41.3       58.5   18.5
4             5  180.8   10.8       58.4   12.9
..          ...    ...    ...        ...    ...
195         196   38.2    3.7       13.8    7.6
196         197   94.2    4.9        8.1    9.7
197         198  177.0    9.3        6.4   12.8
198         199  283.6   42.0       66.2   25.5
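The `Unnamed: 0` column is only the row number that was saved into the CSV file, not an advertising feature. A minimal sketch, assuming `Advertising.csv` starts with an unnamed row-number column (as the `Unnamed: 0` column above suggests): passing `index_col=0` to `pd.read_csv` turns that column into the row index instead of a data column. To stay self-contained, it reads the first five rows shown above from an in-memory string rather than from the downloaded file.

```python
import io

import pandas as pd

# The first five rows shown above, written back out as CSV text.
# The header line starts with an empty field, which pandas names "Unnamed: 0".
csv_text = """,TV,Radio,Newspaper,Sales
1,230.1,37.8,69.2,22.1
2,44.5,39.3,45.1,10.4
3,17.2,45.9,69.3,9.3
4,151.5,41.3,58.5,18.5
5,180.8,10.8,58.4,12.9
"""

# Default: the row-number column is loaded as an ordinary data column
df_default = pd.read_csv(io.StringIO(csv_text))
print(list(df_default.columns))

# index_col=0: the first column becomes the row index instead of a data column
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(df_indexed.columns))
```

Loading the file this way would keep the row number out of both the correlation table and any feature matrix built by dropping only `Sales`.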


In [1]:
# TV: advertising budget spent on TV
# Radio: advertising budget spent on radio
# Newspaper: advertising budget spent on newspaper ads
# Sales: units of the product sold
# (In the original ISLR Advertising data, the budgets are in thousands of dollars and Sales is in thousands of units.)

# The task is to predict Sales from the advertising budget spent on each medium.
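The dependence of sales on the three advertising budgets is what the linear regression later in this notebook assumes. Written out as a multiple linear regression model (the coefficients β, the error term ε, and the row index i are notation introduced here, not names from the notebook):

```latex
\text{Sales} = \beta_0 + \beta_1\,\text{TV} + \beta_2\,\text{Radio} + \beta_3\,\text{Newspaper} + \varepsilon,
\qquad
\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_{i,\text{TV}} - \beta_2 x_{i,\text{Radio}} - \beta_3 x_{i,\text{Newspaper}} \right)^2
```

Ordinary least squares, which scikit-learn's `LinearRegression` implements, picks the β values that minimize this sum of squared errors over the n training rows.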

In [5]:
# let's check whether this dataset contains any null values:
df.isnull().sum()

# So this dataset doesn’t have any null values. 

Unnamed: 0    0
TV            0
Radio         0
Newspaper     0
Sales         0
dtype: int64
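The check above works because `isnull()` marks each missing cell as `True` and `sum()` counts each `True` as 1. A small sketch on a made-up three-row frame (the `demo` values are illustrative only, with one missing value inserted on purpose) shows what a non-zero count looks like:

```python
import numpy as np
import pandas as pd

# A tiny made-up frame with one deliberately missing value (NaN) in "Radio"
demo = pd.DataFrame({
    "TV": [230.1, 44.5, 17.2],
    "Radio": [37.8, np.nan, 45.9],
    "Sales": [22.1, 10.4, 9.3],
})

# isnull() returns a Boolean frame (True where a value is missing);
# .sum() then counts the True values column by column
null_counts = demo.isnull().sum()
print(null_counts)

# One yes/no answer for the whole frame
has_any_nulls = bool(demo.isnull().values.any())
print(has_any_nulls)
```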

In [14]:
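# let’s visualize the relationship between "the amount spent on advertising on TV" and "units sold".
# The spend goes on the x-axis and Sales on the y-axis, so the OLS trendline models Sales as a function of TV.
# (trendline = 'ols' requires the statsmodels package.)

import plotly.express as px
figure = px.scatter(data_frame = df,
                   x = 'TV',
                   y = 'Sales',
                   size = 'TV',
                   trendline = 'ols')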
figure.show()

In [15]:
# let’s visualize the relationship between "the amount spent on advertising on newspapers" and "units sold"
# (spend on the x-axis, Sales on the y-axis):

import plotly.express as px
figure = px.scatter(data_frame = df,
                   x = 'Newspaper',
                   y = 'Sales',
                   #size = 'Newspaper',
                   trendline = 'ols')
figure.show()

In [16]:
# let’s visualize the relationship between "the amount spent on advertising on radio" and "units sold"
# (spend on the x-axis, Sales on the y-axis):

import plotly.express as px
figure = px.scatter(data_frame = df,
                   x = 'Radio',
                   y = 'Sales',
                   #size = 'Radio',
                   trendline = 'ols')
figure.show()
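Plotly Express's `trendline='ols'` option (which needs the statsmodels package) draws an ordinary least-squares line of the y column on the x column. A minimal numpy sketch of that single-variable fit for Sales on TV, using only the nine rows visible in the first output (so the numbers will not match the full-data trendline):

```python
import numpy as np

# TV spend and Sales for the nine rows shown in the first output above
tv = np.array([230.1, 44.5, 17.2, 151.5, 180.8, 38.2, 94.2, 177.0, 283.6])
sales = np.array([22.1, 10.4, 9.3, 18.5, 12.9, 7.6, 9.7, 12.8, 25.5])

# Degree-1 least-squares fit of Sales on TV: sales ≈ slope * tv + intercept
slope, intercept = np.polyfit(tv, sales, deg=1)
print(f"Sales ≈ {slope:.4f} * TV + {intercept:.4f}")

# The same line from the closed-form formulas:
# slope = cov(TV, Sales) / var(TV), intercept = mean(Sales) - slope * mean(TV)
slope_closed = np.cov(tv, sales)[0, 1] / np.var(tv, ddof=1)
intercept_closed = sales.mean() - slope_closed * tv.mean()
print(f"Sales ≈ {slope_closed:.4f} * TV + {intercept_closed:.4f}")
```

Regressing Sales on TV and regressing TV on Sales give different lines, so the axis choice matters when the trendline is read as "extra sales per unit of spend".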

In [29]:
# Observation:
# Of the three advertising channels, the amount spent on TV shows the strongest relationship with units sold.

In [30]:
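# let’s check the (Pearson) correlation of every column with the Sales column.
# "Unnamed: 0" is only the row number from the CSV file, so its correlation with Sales carries no meaning.
correlation = df.corr()
correlation.Sales.sort_values(ascending=False)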

Sales         1.000000
TV            0.782224
Radio         0.576223
Newspaper     0.228299
Unnamed: 0   -0.051616
Name: Sales, dtype: float64
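`DataFrame.corr()` computes Pearson's correlation coefficient by default (`method="pearson"`). A short sketch that recomputes it by hand for TV vs Sales, using only the nine rows visible in the first output (so the values differ from the full-data table above). If a DataFrame ever contains non-numeric columns, pandas 2.0 and later raise an error in `corr()` unless `numeric_only=True` is passed.

```python
import numpy as np
import pandas as pd

# The nine rows shown in the first output above
sample = pd.DataFrame({
    "TV": [230.1, 44.5, 17.2, 151.5, 180.8, 38.2, 94.2, 177.0, 283.6],
    "Radio": [37.8, 39.3, 45.9, 41.3, 10.8, 3.7, 4.9, 9.3, 42.0],
    "Newspaper": [69.2, 45.1, 69.3, 58.5, 58.4, 13.8, 8.1, 6.4, 66.2],
    "Sales": [22.1, 10.4, 9.3, 18.5, 12.9, 7.6, 9.7, 12.8, 25.5],
})

# DataFrame.corr() uses Pearson's r by default
pandas_r = sample.corr()["Sales"].sort_values(ascending=False)
print(pandas_r)

# Pearson's r for TV vs Sales by hand: the sum of co-deviations divided by
# the square root of the product of the summed squared deviations
x = sample["TV"].to_numpy()
y = sample["Sales"].to_numpy()
dx, dy = x - x.mean(), y - y.mean()
manual_r = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())
print(manual_r)
```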

In [31]:
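# Let's split the data into independent (X) columns.
# Dropping only "Sales" still keeps the "Unnamed: 0" row-number column, which is not a real feature:
X = df.drop(columns = ["Sales"])
X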

     Unnamed: 0     TV  Radio  Newspaper
0             1  230.1   37.8       69.2
1             2   44.5   39.3       45.1
2             3   17.2   45.9       69.3
3             4  151.5   41.3       58.5
4             5  180.8   10.8       58.4
..          ...    ...    ...        ...
195         196   38.2    3.7       13.8
196         197   94.2    4.9        8.1
197         198  177.0    9.3        6.4
198         199  283.6   42.0       66.2


In [32]:
# So instead, keep only the three advertising-budget columns as the independent (X) features:
X = df[["TV", "Radio", "Newspaper"]]
X

        TV  Radio  Newspaper
0    230.1   37.8       69.2
1     44.5   39.3       45.1
2     17.2   45.9       69.3
3    151.5   41.3       58.5
4    180.8   10.8       58.4
..     ...    ...        ...
195   38.2    3.7       13.8
196   94.2    4.9        8.1
197  177.0    9.3        6.4
198  283.6   42.0       66.2


In [33]:
# And the dependent (Y) column, the target we want to predict:
Y = df[["Sales"]]
Y

     Sales
0     22.1
1     10.4
2      9.3
3     18.5
4     12.9
..     ...
195    7.6
196    9.7
197   12.8
198   25.5
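Using double brackets, `df[["Sales"]]`, makes Y a one-column DataFrame rather than a Series, which is why the model's prediction comes back as a nested array such as `array([[...]])`. A small sketch on the nine rows visible in the first output (the names `sample`, `X_sample`, `Y_frame`, and `Y_series` are illustrative and chosen so they do not overwrite the notebook's `X` and `Y`):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# The nine rows shown in the first output above
sample = pd.DataFrame({
    "TV": [230.1, 44.5, 17.2, 151.5, 180.8, 38.2, 94.2, 177.0, 283.6],
    "Radio": [37.8, 39.3, 45.9, 41.3, 10.8, 3.7, 4.9, 9.3, 42.0],
    "Newspaper": [69.2, 45.1, 69.3, 58.5, 58.4, 13.8, 8.1, 6.4, 66.2],
    "Sales": [22.1, 10.4, 9.3, 18.5, 12.9, 7.6, 9.7, 12.8, 25.5],
})
X_sample = sample[["TV", "Radio", "Newspaper"]]

# Double brackets select a one-column DataFrame (2-D); single brackets select a Series (1-D)
Y_frame = sample[["Sales"]]
Y_series = sample["Sales"]
print(Y_frame.shape, Y_series.shape)

# The target's shape carries through to the predictions
pred_frame = LinearRegression().fit(X_sample, Y_frame).predict(X_sample.iloc[:1])
pred_series = LinearRegression().fit(X_sample, Y_series).predict(X_sample.iloc[:1])
print(pred_frame.shape, pred_series.shape)
```

Both targets give the same fitted model; only the shape of the output differs.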


In [35]:
# Let's split the data into training (67%) and test (33%) sets; random_state=42 makes the shuffle reproducible:

from sklearn.model_selection import train_test_split
Xtrain, Xtest, Ytrain, Ytest = train_test_split(X,Y,
                                               test_size=0.33,
                                               random_state=42)
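With a float `test_size`, scikit-learn puts `ceil(test_size * n_samples)` rows in the test set. A sketch assuming 200 rows (the size of the original Advertising dataset), using placeholder numbers rather than the real data, shows the resulting sizes, that X and y rows stay paired, and that a fixed `random_state` reproduces the same split:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with 200 rows (not the real dataset): row i of X is [3i, 3i+1, 3i+2], y[i] = i
X_demo = np.arange(200 * 3).reshape(200, 3)
y_demo = np.arange(200)

# test_size=0.33 -> ceil(0.33 * 200) = 66 test rows; the remaining 134 are for training
X_demo_train, X_demo_test, y_demo_train, y_demo_test = train_test_split(
    X_demo, y_demo, test_size=0.33, random_state=42
)
print(len(X_demo_train), len(X_demo_test))

# Rows are shuffled together, so each X row stays paired with its y value
print(np.array_equal(X_demo_test[:, 0], 3 * y_demo_test))

# Same random_state -> the same split on every run
_, _, _, y_demo_test_again = train_test_split(
    X_demo, y_demo, test_size=0.33, random_state=42
)
print(np.array_equal(y_demo_test, y_demo_test_again))
```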

In [36]:
# let’s train the Linear Regression model to predict future sales:

from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(Xtrain, Ytrain)

LinearRegression()
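`LinearRegression` fits ordinary least squares: it finds the intercept and one coefficient per feature that minimize the squared error. A minimal sketch on the nine rows visible in the first output, checking the fitted values against numpy's own least-squares solver (`sketch_model` is an illustrative name that avoids overwriting the notebook's `model`):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# [TV, Radio, Newspaper] and Sales for the nine rows shown in the first output above
X_sample = np.array([
    [230.1, 37.8, 69.2],
    [44.5, 39.3, 45.1],
    [17.2, 45.9, 69.3],
    [151.5, 41.3, 58.5],
    [180.8, 10.8, 58.4],
    [38.2, 3.7, 13.8],
    [94.2, 4.9, 8.1],
    [177.0, 9.3, 6.4],
    [283.6, 42.0, 66.2],
])
y_sample = np.array([22.1, 10.4, 9.3, 18.5, 12.9, 7.6, 9.7, 12.8, 25.5])

sketch_model = LinearRegression().fit(X_sample, y_sample)
print("intercept:", sketch_model.intercept_)
print("coefficients [TV, Radio, Newspaper]:", sketch_model.coef_)

# The same least-squares problem solved directly with numpy:
# a column of ones is prepended so that beta[0] plays the role of the intercept
X_with_ones = np.column_stack([np.ones(len(X_sample)), X_sample])
beta, *_ = np.linalg.lstsq(X_with_ones, y_sample, rcond=None)
print("lstsq [intercept, TV, Radio, Newspaper]:", beta)
```

On the full training set the same attributes, `model.intercept_` and `model.coef_`, show how much each medium's budget contributes to the predicted sales.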

In [37]:
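# Let's evaluate the "Linear Regression" model on the test set.
# For regressors, model.score() returns the coefficient of determination (R^2), not a classification accuracy:
# 1.0 is a perfect fit, and 0.0 is no better than always predicting the mean of Sales.
r2 = model.score(Xtest, Ytest)
print(r2)

print("R^2 on the test set is", "%.2f" % r2)
# "%.2f" % limits the float to 2 decimal places

0.8555568430680087
R^2 on the test set is 0.86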
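A regressor's `score()` computes R² = 1 − SS_res / SS_tot. A sketch that recomputes it by hand and checks it against `sklearn.metrics.r2_score`, using the nine rows visible in the first output. It scores the same rows the sketch model was fit on, purely to show the formula; the notebook's held-out test-set score is the meaningful number.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# [TV, Radio, Newspaper] and Sales for the nine rows shown in the first output above
X_sample = np.array([
    [230.1, 37.8, 69.2], [44.5, 39.3, 45.1], [17.2, 45.9, 69.3],
    [151.5, 41.3, 58.5], [180.8, 10.8, 58.4], [38.2, 3.7, 13.8],
    [94.2, 4.9, 8.1], [177.0, 9.3, 6.4], [283.6, 42.0, 66.2],
])
y_sample = np.array([22.1, 10.4, 9.3, 18.5, 12.9, 7.6, 9.7, 12.8, 25.5])

sketch_model = LinearRegression().fit(X_sample, y_sample)
y_hat = sketch_model.predict(X_sample)

# R^2 = 1 - (residual sum of squares) / (total sum of squares around the mean)
ss_res = np.sum((y_sample - y_hat) ** 2)
ss_tot = np.sum((y_sample - y_sample.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(r2_manual)
print(r2_score(y_sample, y_hat))
print(sketch_model.score(X_sample, y_sample))
```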


### Prediction

In [38]:
Xtest

        TV  Radio  Newspaper
95   163.3   31.6       52.9
15   195.4   47.7       52.9
30   292.9   28.3       43.2
158   11.7   36.9       45.2
128  220.3   49.0        3.2
..     ...    ...        ...
97   184.9   21.0       22.0
31   112.9   17.4       38.6
12    23.8   35.1       65.9
35   290.7    4.1        8.5


In [39]:
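# let’s predict Sales for a new set of advertising budgets.
# The feature order must match the training columns: [TV, Radio, Newspaper].
# Passing a DataFrame with the same column names avoids scikit-learn's "feature names" warning.
new_budget = pd.DataFrame([[19.4, 16.0, 22.3]], columns = ["TV", "Radio", "Newspaper"])
model.predict(new_budget)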

array([[6.90611478]])
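A linear model's prediction is just the intercept plus the coefficient-weighted sum of the inputs. A sketch that checks this for the same budgets (19.4, 16.0, 22.3), with a model fit only on the nine rows visible in the first output, so the predicted value differs from the 6.906 above (`sketch_model` and `budget_row` are illustrative names):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

features = ["TV", "Radio", "Newspaper"]

# The nine rows shown in the first output above
sample = pd.DataFrame({
    "TV": [230.1, 44.5, 17.2, 151.5, 180.8, 38.2, 94.2, 177.0, 283.6],
    "Radio": [37.8, 39.3, 45.9, 41.3, 10.8, 3.7, 4.9, 9.3, 42.0],
    "Newspaper": [69.2, 45.1, 69.3, 58.5, 58.4, 13.8, 8.1, 6.4, 66.2],
    "Sales": [22.1, 10.4, 9.3, 18.5, 12.9, 7.6, 9.7, 12.8, 25.5],
})
sketch_model = LinearRegression().fit(sample[features], sample["Sales"])

# The budgets from the prediction cell, as a one-row DataFrame with matching column names
budget_row = pd.DataFrame([[19.4, 16.0, 22.3]], columns=features)
predicted = sketch_model.predict(budget_row)[0]

# Intercept plus the weighted sum of the inputs
by_hand = sketch_model.intercept_ + np.dot(sketch_model.coef_, budget_row.iloc[0].to_numpy())
print(predicted, by_hand)
```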

In [40]:
Ytest

     Sales
95    16.9
15    22.4
30    21.4
158    7.3
128   24.7
..     ...
97    15.5
31    11.9
12     9.2
35    12.8
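Besides R², predictions can be compared with the actual `Ytest` values using error metrics in the same units as Sales. A sketch of mean absolute error (MAE) and root mean squared error (RMSE); `y_true` holds the first five test values shown above, while `y_pred` is made up purely for illustration and is not the model's output:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# First five actual test-set values shown above
y_true = np.array([16.9, 22.4, 21.4, 7.3, 24.7])
# Hypothetical predictions, invented only to illustrate the metrics
y_pred = np.array([17.5, 21.0, 22.0, 8.0, 23.9])

# MAE: average size of the errors
mae = mean_absolute_error(y_true, y_pred)
# RMSE: square root of the mean squared error; penalizes large errors more
rmse = np.sqrt(mean_squared_error(y_true, y_pred))

print(f"MAE:  {mae:.3f}")
print(f"RMSE: {rmse:.3f}")
```

In the notebook, the same calls would take `Ytest` and `model.predict(Xtest)` as arguments.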


So this is how we can train a linear regression model to predict the future sales of a product from its advertising budgets.
