<a href="https://colab.research.google.com/github/flaviarbatista/Assignments/blob/main/Lab_6.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Lab 6: Introduction to Artificial Intelligence**
### **Name:** Flavia Batista
### **Course:** Data Analytics and Business Intelligence Analyst
### **Institution:** Willis College

## **Walkthrough of a Simple Machine Learning Project Cycle**

### **_Future Sales Prediction_**
This dataset provides information about the sales of a product and the advertising expenditures incurred by the business across various platforms. The following is a description of each column in the dataset:

- **TV**: Advertising spend on TV, in dollars;
- **radio**: Advertising spend on radio, in dollars;
- **newspaper**: Advertising spend on newspapers, in dollars;
- **sales**: Number of units sold.


In this dataset, the sales of the product are influenced by the advertising expenditures across different platforms. With this understanding of the dataset, the following section will guide you through the process of predicting future sales using machine learning techniques in Python.

## Set Up Git


In [1]:
!apt-get install -y git
!git config --global user.email "flavia.bi.progress@gmail.com"
!git config --global user.name "flaviarbatista"

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
git is already the newest version (1:2.34.1-1ubuntu1.15).
0 upgraded, 0 newly installed, 0 to remove and 41 not upgraded.


In [2]:
import getpass, os
token = getpass.getpass('Token')
os.environ['GHTOKEN'] = token

Token··········


## Mount Google Drive

In [3]:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive


## Inspect

In [4]:
!ls "/content/drive/MyDrive/Colab Notebooks"

 advertising.csv  'Lab 6.ipynb'


## Clone the Repository

In [5]:
!git clone https://github.com/flaviarbatista/Assignments.git

Cloning into 'Assignments'...
remote: Enumerating objects: 118, done.
remote: Counting objects: 100% (11/11), done.
remote: Compressing objects: 100% (11/11), done.
remote: Total 118 (delta 4), reused 0 (delta 0), pack-reused 107 (from 2)
Receiving objects: 100% (118/118), 2.25 MiB | 16.71 MiB/s, done.
Resolving deltas: 100% (66/66), done.


### **Data Collection or Loading**

In [6]:
# Import libraries and load the dataset
# You might need to load the data into the same directory

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

data = pd.read_csv("/content/drive/MyDrive/Colab Notebooks/advertising.csv")
data.head()

|   | TV | radio | newspaper | sales |
|---|---|---|---|---|
| 0 | 230.1 | 37.8 | 69.2 | 22.1 |
| 1 | 44.5 | 39.3 | 45.1 | 10.4 |
| 2 | 17.2 | 45.9 | 69.3 | 9.3 |
| 3 | 151.5 | 41.3 | 58.5 | 18.5 |
| 4 | 180.8 | 10.8 | 58.4 | 12.9 |


### **EDA and Data Preprocessing**

In [7]:
# Let’s have a look at whether this dataset contains any null values or not

data.isnull().sum()

| Column | Null values |
|---|---|
| TV | 0 |
| radio | 0 |
| newspaper | 0 |
| sales | 0 |


In [8]:
# Let's visualise the data
# Visualise the relationship between the amount spent on each platform and the corresponding units sold

import plotly.express as px
import plotly.graph_objects as go

# TV

figure = px.scatter(data_frame=data, x="sales",
                    y="TV", size="TV", trendline="ols")
figure.show()

In [9]:
# Newspaper
figure = px.scatter(data_frame=data, x="sales",
                    y="newspaper", size="newspaper", trendline="ols")
figure.show()

In [10]:
# Radio
figure = px.scatter(data_frame = data, x="sales",
                    y="radio", size="radio", trendline="ols")
figure.show()

**Exercise**:
You have just repeated the same plotting code for each of the three platforms, which is an opportunity to write a reusable function.

Create a function that produces the plot, taking the column name as a parameter.

In [11]:
# @title Create a function to visualise the data
# exercise

# def <call-it-a-suitable-name(camelCase)>:
  #the rest of the function goes here
import plotly.express as px

def visualizeSalesData(data, platform):
  """
  Visualizes the relationship between advertising expenditure on a given platform and sales.

  Args:
    data: The pandas DataFrame containing the advertising data.
    platform: The name of the advertising platform (e.g., "TV", "radio", "newspaper").

  Returns:
    None. Displays the generated scatter plot.
  """

  # The platform name must match the DataFrame column name exactly (column names are case-sensitive)
  figure = px.scatter(data_frame=data,
                      x="sales",
                      y=platform,
                      size=platform,
                      trendline="ols")
  figure.show()

# Example usage:
visualizeSalesData(data, "TV")
visualizeSalesData(data, "radio")
visualizeSalesData(data, "newspaper")

In [12]:
# let’s have a look at the correlation of all the columns with the sales column:

correlation = data.corr()

correlation["sales"].sort_values(ascending=False)

|   | sales |
|---|---|
| sales | 1.0 |
| TV | 0.782224 |
| radio | 0.576223 |
| newspaper | 0.228299 |
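
As far as I know, `data.corr()` computes the Pearson correlation coefficient by default. The sketch below spells that formula out in plain Python, using only the five rows displayed by `data.head()` earlier. The helper name `pearson` is made up for illustration, and because it sees only five rows its result will differ from the full-dataset values reported by `data.corr()`.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (covariance numerator)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (variance numerators)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# The five rows shown by data.head(); a tiny sample, not the full dataset
tv = [230.1, 44.5, 17.2, 151.5, 180.8]
sales = [22.1, 10.4, 9.3, 18.5, 12.9]
print(round(pearson(tv, sales), 3))
```

A coefficient close to 1 indicates a strong positive linear relationship, which is why TV, with the highest value in the output above, looks like the most informative single predictor of sales.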


### **ML Model Training**

In [13]:
# Before training, we need to split the data into training and test sets

x = np.array(data.drop(columns=["sales"]))  # features: TV, radio, newspaper
y = np.array(data["sales"])  # target: units sold
xtrain, xtest, ytrain, ytest = train_test_split(x, y,
                                                test_size=0.2,
                                                random_state=42)
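
To make the idea of a hold-out split concrete, here is a minimal plain-Python sketch: shuffle the row indices with a fixed seed, then set aside a fraction of them for testing. The function `holdout_split` and the ten placeholder rows are hypothetical, and the sketch will not reproduce the exact rows that `train_test_split` selects, since scikit-learn uses its own random number generator.

```python
import math
import random

def holdout_split(rows, test_size=0.2, seed=42):
    """Shuffle rows reproducibly and split them into (train, test) lists."""
    if not 0 < test_size < 1:
        raise ValueError("test_size must be between 0 and 1")
    indices = list(range(len(rows)))
    # A seeded generator gives the same shuffle on every run
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(len(rows) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]

# Ten placeholder rows (hypothetical), standing in for the real dataset
rows = list(range(10))
train, test = holdout_split(rows)
print(len(train), len(test))
```

Fixing the seed, as `random_state=42` does in the cell above, keeps the split identical between runs so that scores are comparable.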

In [14]:
# Now let’s train the model to predict future sales

model = LinearRegression() # this is a model we earlier imported from sklearn.linear_model
model.fit(xtrain, ytrain)

model.score(xtest, ytest)

0.899438024100912
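
`LinearRegression` fits an ordinary least squares model. For intuition, the sketch below shows the single-feature case, where the slope and intercept have a simple closed form. It is an illustration only, not how scikit-learn is implemented, and the name `fit_simple_ols` is hypothetical. It uses just the five rows shown by `data.head()`, so its coefficients will not match the model trained above on three features.

```python
def fit_simple_ols(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # The fitted line always passes through the point of means
    intercept = mean_y - slope * mean_x
    return slope, intercept

# TV spend and sales from the five rows shown by data.head()
tv = [230.1, 44.5, 17.2, 151.5, 180.8]
sales = [22.1, 10.4, 9.3, 18.5, 12.9]
slope, intercept = fit_simple_ols(tv, sales)
print(f"sales ~ {intercept:.3f} + {slope:.4f} * TV")
```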

### **Model Evaluation**

**Exercise**

Discuss methods to evaluate the performance of this trained ML model and implement one of them.

Is this model reliable for predicting sales? Why or why not?

In [15]:
# Model evaluation using R-squared on the held-out test set
r_squared = model.score(xtest, ytest)
f"R-squared: {r_squared}"

'R-squared: 0.899438024100912'
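
R-squared is only one way to judge a regression model. Mean absolute error (MAE) and root mean squared error (RMSE) report the typical error in the same units as `sales`, which is often easier to interpret. The sketch below computes all three in plain Python. The function name `regression_metrics` is hypothetical, and the predictions are made-up values for illustration, not output from the model above. In the notebook itself, `sklearn.metrics` offers `mean_absolute_error`, `mean_squared_error` and `r2_score` for the same purpose.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE and R-squared for paired actual/predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)   # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when y_true is constant")
    return {"mae": mae, "rmse": rmse, "r2": 1 - ss_res / ss_tot}

# Actual sales from data.head(); predictions are hypothetical placeholders
y_true = [22.1, 10.4, 9.3, 18.5, 12.9]
y_pred = [20.6, 12.0, 10.1, 17.5, 13.4]
print(regression_metrics(y_true, y_pred))
```

An R-squared of 1 means perfect predictions, while 0 means the model does no better than always predicting the mean of the actual values.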

### **Model Prediction**

In [16]:
# features = [[TV, radio, newspaper]]

features = np.array([[230.1, 37.8, 69.2]])
print(model.predict(features))

[20.61397147]


## Commit and Push Changes

In [17]:
%cd /content/Assignments

/content/Assignments


In [18]:
!git add --all

In [19]:
!git status

On branch main
Your branch is up to date with 'origin/main'.

nothing to commit, working tree clean


In [20]:
!git commit -m "Complete Lab_6"

On branch main
Your branch is up to date with 'origin/main'.

nothing to commit, working tree clean


In [21]:
!git checkout

Your branch is up to date with 'origin/main'.


In [22]:
!git fetch origin
!git pull origin main

From https://github.com/flaviarbatista/Assignments
 * branch            main       -> FETCH_HEAD
Already up to date.


In [23]:
!git push https://$GHTOKEN@github.com/flaviarbatista/Assignments.git main

Everything up-to-date


In [24]:
os.environ.pop('GHTOKEN', None)
print("GHTOKEN removed from the session.")

GHTOKEN removed from the session.
