<a href="https://colab.research.google.com/github/sadansabo/AI/blob/main/module_1_mini_project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Mini Project: Perform Multiple Linear Regression

## Overview
This project focuses on using multiple linear regression to analyze the relationship between sales and various marketing promotion strategies. By leveraging multiple independent variables, such as TV, social media, radio, and influencer promotions, we aim to build a predictive model that estimates sales based on these factors. This project will cover the complete data science pipeline, including data exploration, preprocessing, model building, evaluation, and interpretation.

## Objective
Develop a multiple linear regression model to predict sales based on marketing promotion data. The project will involve exploring the dataset, selecting relevant independent variables, fitting the model, checking assumptions, and interpreting the results to provide actionable insights.

## Learning Outcomes
1. Understand the concept of multiple linear regression and its applications.
2. Learn to preprocess and explore data for regression analysis.
3. Gain experience in fitting and evaluating a multiple linear regression model.
4. Check and validate regression assumptions.
5. Interpret model coefficients and communicate results to stakeholders.

---


## Step 1: Define the Problem
### Task:
Understand the problem and its real-world implications. The goal is to predict sales based on various marketing promotion strategies, which can help the business optimize its marketing efforts and allocate resources effectively.

### Mini-task:
Write a brief paragraph on how predicting sales using multiple linear regression can benefit the business.

---

## Step 2: Data Collection
### Task:
Collect the dataset required for building the regression model. The dataset used in this project is `marketing_sales_data.csv`, which contains information about TV, social media, radio, and influencer promotions, along with sales data.

### Mini-task:
Load the dataset and inspect the first few rows to understand its structure.

#### Hint:
Use the `pandas` library to load the dataset and display the first five rows.

```python
import pandas as pd

# Load the dataset
data = pd.read_csv('marketing_sales_data.csv')

# Display the first five rows
### YOUR CODE HERE ###
```

---

In [None]:
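# Import data files from Google Drive

from io import StringIO

import pandas as pd
import requests


def read_gd(sharingurl):
    """Download a publicly shared Google Drive CSV into an in-memory text buffer."""
    # Sharing links look like .../file/d/<file_id>/view?..., so the file ID
    # is the second-to-last segment when splitting on '/'.
    file_id = sharingurl.split('/')[-2]
    download_url = 'https://drive.google.com/uc?export=download&id=' + file_id
    response = requests.get(download_url)
    response.raise_for_status()  # stop on an HTTP error instead of parsing an error page
    return StringIO(response.text)


url = "https://drive.google.com/file/d/1WChLou3qt_JaPjYLZBMHJ8WxSssQ-enL/view?usp=drive_link"
gdd = read_gd(url)

df = pd.read_csv(gdd)

df.head()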

| Unnamed: 0 | TV | Radio | Social Media | Influencer | Sales |
|---|---|---|---|---|---|
| 0 | Low | 3.51807 | 2.29379 | Micro | 55.261284 |
| 1 | Low | 7.756876 | 2.572287 | Mega | 67.574904 |
| 2 | High | 20.348988 | 1.22718 | Micro | 272.250108 |
| 3 | Medium | 20.108487 | 2.728374 | Mega | 195.102176 |
| 4 | High | 31.6532 | 7.776978 | Nano | 273.960377 |


## Step 3: Exploratory Data Analysis (EDA)
### Task:
Analyze the dataset to understand the distribution of variables, identify relationships between variables, and prepare the data for modeling.

### Mini-task:
Create a pairplot to visualize the relationships between continuous variables in the dataset.

#### Hint:
Use `seaborn.pairplot()` to create a pairplot of the continuous variables.

```python
import seaborn as sns

# Create a pairplot of the continuous variables
### YOUR CODE HERE ###
```

---


## Step 4: Data Preparation
### Task:
Prepare the data for regression analysis by handling missing values, encoding categorical variables, and splitting the data into training and testing sets.

### Mini-task:
Drop rows with missing values and encode categorical variables if necessary.

#### Hint:
Use `data.dropna()` to remove rows with missing values (it returns a new DataFrame, so assign the result) and `pd.get_dummies()` to encode categorical variables such as `TV` and `Influencer`.

```python
# Drop rows with missing values
### YOUR CODE HERE ###

# Encode categorical variables (if needed)
### YOUR CODE HERE ###
```
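A minimal sketch of both preparation steps, assuming a small made-up frame (`demo`) that reuses the column names from the dataset preview. The missing values are planted deliberately to show what `dropna()` removes.

```python
import numpy as np
import pandas as pd

# Small invented frame; two rows contain a missing value on purpose.
demo = pd.DataFrame({
    "TV": ["Low", "High", None, "Medium"],
    "Radio": [3.5, 20.3, 12.0, np.nan],
    "Social Media": [2.3, 1.2, 4.0, 2.7],
    "Influencer": ["Micro", "Mega", "Nano", "Mega"],
    "Sales": [55.3, 272.3, 150.0, 195.1],
})

# dropna() returns a new DataFrame, so the result must be assigned.
clean = demo.dropna().reset_index(drop=True)

# One-hot encode the categorical columns. drop_first=True omits one level per
# variable so the dummy columns are not perfectly collinear with the intercept.
encoded = pd.get_dummies(clean, columns=["TV", "Influencer"], drop_first=True)
print(encoded.columns.tolist())
```

Manual encoding is only needed when the model takes a numeric design matrix. If the model is fitted through a `statsmodels` formula, wrapping a column as `C(TV)` in the formula handles the encoding instead.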

---

## Step 5: Model Building
### Task:
Fit a multiple linear regression model to predict sales using the selected independent variables.

### Mini-task:
Define the OLS formula and fit the model using the `statsmodels` library.

#### Hint:
Use `statsmodels.formula.api.ols()` to define the model and fit it to the data.

```python
import statsmodels.formula.api as smf

# Define the OLS formula
### YOUR CODE HERE ###

# Fit the model
### YOUR CODE HERE ###

# Display the model summary
### YOUR CODE HERE ###
```
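As a sketch of the formula interface, the example below fits a model on synthetic data. The choice of `TV` and `Radio` as predictors and the simulated coefficients are assumptions made for illustration, not results from the project dataset.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data with a known structure (all values are invented).
rng = np.random.default_rng(42)
n = 200
demo = pd.DataFrame({
    "TV": rng.choice(["Low", "Medium", "High"], size=n),
    "Radio": rng.uniform(0, 40, size=n),
})
tv_effect = demo["TV"].map({"Low": 0.0, "Medium": 100.0, "High": 200.0})
demo["Sales"] = 50 + tv_effect + 3 * demo["Radio"] + rng.normal(0, 5, size=n)

# C(TV) marks TV as categorical, so no manual dummy encoding is required.
ols_formula = "Sales ~ C(TV) + Radio"
model = smf.ols(formula=ols_formula, data=demo).fit()
print(model.summary())
```

Formula terms must be valid Python names, so a column such as `Social Media` would likely need renaming first (for example to `Social_Media`) before it can appear in the formula.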

---

## Step 6: Model Evaluation
### Task:
Evaluate the model by checking the regression assumptions, including linearity, independence, normality, constant variance, and multicollinearity.

### Mini-task:
Create scatterplots to check the linearity assumption and calculate the residuals to check the normality assumption.

#### Hint:
Use `seaborn.scatterplot()` to create scatterplots and `statsmodels.api.qqplot()` (called as `sm.qqplot()` with the import below) to check the normality of residuals.

```python
import matplotlib.pyplot as plt
import statsmodels.api as sm

# Create scatterplots for linearity
### YOUR CODE HERE ###

# Calculate residuals and create a Q-Q plot
### YOUR CODE HERE ###
```

---


## Step 7: Results and Interpretation
### Task:
Interpret the model results, including the R-squared value, coefficients, and their statistical significance.

### Mini-task:
Display the model summary and interpret the coefficients.

#### Hint:
Use `model.summary()` to display the model results and interpret the coefficients.

```python
# Display the model summary
### YOUR CODE HERE ###

# Interpret the coefficients
### YOUR CODE HERE ###
```
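For interpretation, it can help to pull the key numbers out of the fitted model into one table. The sketch below uses synthetic data with known effects, so the printed coefficients describe the simulation, not the marketing dataset. The table layout (`coef_table`) is an illustrative choice.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data with known effects (invented values).
rng = np.random.default_rng(7)
n = 200
demo = pd.DataFrame({
    "TV": rng.choice(["Low", "Medium", "High"], size=n),
    "Radio": rng.uniform(0, 40, size=n),
})
tv_effect = demo["TV"].map({"Low": 0.0, "Medium": 100.0, "High": 200.0})
demo["Sales"] = 50 + tv_effect + 3 * demo["Radio"] + rng.normal(0, 5, size=n)

model = smf.ols("Sales ~ C(TV) + Radio", data=demo).fit()

# Collect coefficients, p-values, and 95% confidence bounds in one table.
ci = model.conf_int()  # columns 0 and 1 hold the lower and upper bounds
coef_table = pd.DataFrame({
    "coef": model.params,
    "p_value": model.pvalues,
    "ci_lower": ci[0],
    "ci_upper": ci[1],
})
print(f"R-squared: {model.rsquared:.3f}")
print(coef_table.round(3))
```

The `Radio` coefficient is the expected change in `Sales` for a one-unit increase in `Radio`, holding the `TV` category fixed. Each `TV` row is the expected difference in `Sales` between that level and the reference level, which is the level left out of the table.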

---

## Considerations
**What are some key takeaways you learned from this project?**
- Reflect on the importance of checking regression assumptions and interpreting model coefficients.
- Consider how the model's insights can be used to optimize marketing strategies.

**How would you share your findings with a team?**
- Prepare a presentation with visualizations and key metrics.
- Discuss the impact of different marketing strategies on sales.

**What would you share with and recommend to stakeholders?**
- Highlight the key factors that contribute most to sales.
- Recommend strategies to optimize marketing efforts based on the model's insights.

# Task
Explain the error in the selected code. If possible, fix the error and incorporate the changes into the existing code. Otherwise, try to diagnose the error. Link this project to my GitHub repository.

## Install and configure git

### Subtask:
Ensure Git is installed and configured with your username and email.


**Reasoning**:
Initialize a Git repository in the working directory. A successful `git init` also confirms that Git is installed.



In [2]:
!git init

hint: Using 'master' as the name for the initial branch. This default branch name
hint: is subject to change. To configure the initial branch name to use in all
hint: 
hint: 	git config --global init.defaultBranch <name>
hint: 
hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
hint: 'development'. The just-created branch can be renamed via this command:
hint: 
hint: 	git branch -m <name>
Initialized empty Git repository in /content/.git/


In [4]:
!git commit -m "Initial commit of project files"

Author identity unknown

*** Please tell me who you are.

Run

  git config --global user.email "you@example.com"
  git config --global user.name "Your Name"

to set your account's default identity.
Omit --global to set the identity only in this repository.

fatal: unable to auto-detect email address (got 'root@235200cfe521.(none)')


In [6]:
!git config --global user.email "sadansabo@gmail.com"
!git config --global user.name "sadansabo"

In [9]:
!git push -u origin master

error: src refspec master does not match any
error: failed to push some refs to 'origin'

In [10]:
!git status
!git branch

On branch master

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	.config/
	sample_data/

nothing added to commit but untracked files present (use "git add" to track)


In [11]:
!git add .
!git commit -m "Initial commit"

[master (root-commit) aa0c1af] Initial commit
 21 files changed, 51025 insertions(+)
 create mode 100644 .config/.last_opt_in_prompt.yaml
 create mode 100644 .config/.last_survey_prompt.yaml
 create mode 100644 .config/.last_update_check.json
 create mode 100644 .config/active_config
 create mode 100644 .config/config_sentinel
 create mode 100644 .config/configurations/config_default
 create mode 100644 .config/default_configs.db
 create mode 100644 .config/gce
 create mode 100644 .config/hidden_gcloud_config_universe_descriptor_data_cache_configs.db
 create mode 100644 .config/logs/2025.08.14/13.34.48.261592.log
 create mode 100644 .config/logs/2025.08.14/13.35.24.373035.log
 create mode 100644 .config/logs/2025.08.14/13.35.33.045372.log
 create mode 100644 .config/logs/2025.08.14/13.35.36.983070.log
 create mode 100644 .config/logs/2025.08.14/13.35.45.872550.log
 create mode 100644 .config/logs/2025.08.14/13.35.46.562476.log
 create mode 100755 sample_data/README.md
 create mode 1007
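
The commit above contains only Colab runtime files (`.config/` and `sample_data/`, the two untracked entries reported by `git status`), not project work. One way to keep them out is to write a `.gitignore` before running `git add .`. The sketch below is a minimal illustration: `write_gitignore` is a hypothetical helper, and a temporary directory stands in for the repository root (`/content` in this session).

```python
import tempfile
from pathlib import Path


def write_gitignore(repo_dir, patterns):
    """Write one ignore pattern per line to <repo_dir>/.gitignore (hypothetical helper)."""
    path = Path(repo_dir) / ".gitignore"
    path.write_text("\n".join(patterns) + "\n")
    return path


# A temporary directory stands in for the repository root (/content in Colab).
repo_dir = tempfile.mkdtemp()
gitignore = write_gitignore(repo_dir, [".config/", "sample_data/"])
print(gitignore.read_text())
```

A `.gitignore` only affects files that are not yet tracked, so it has to exist before the first `git add .` to have any effect on these directories.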

## Push changes to GitHub

### Subtask:
Push your committed changes from your local repository to the GitHub repository.

#### Hint:
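Use `git push origin master` to push your changes to the `master` branch. The push only works once a remote named `origin` has been added, which `git remote -v` can confirm.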

In [12]:
!git push origin master

fatal: 'origin' does not appear to be a git repository
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.


In [13]:
!git remote -v

In [14]:
!git remote add origin <repository_url>

/bin/bash: -c: line 1: syntax error near unexpected token `newline'
/bin/bash: -c: line 1: `git remote add origin <repository_url>'


In [16]:
!git remote add origin https://github.com/sadansabo/MULTIPLE-LINEAR-REGRESSION.git

In [17]:
!git push origin master

fatal: could not read Username for 'https://github.com': No such device or address


## Generate an SSH Key Pair in Colab

### Subtask:
Generate an SSH key pair using `ssh-keygen`.

#### Hint:
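Run the `ssh-keygen` command in a code cell. At the file prompt, press Enter to accept the default location (`/root/.ssh/id_rsa`) rather than typing the path of an existing file, which would be overwritten. Setting a passphrase is optional.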

In [19]:
!ssh-keygen -t rsa -b 4096 -C "sadansabo@gmail.com"

Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): /content/sample_data/README.md
/content/sample_data/README.md already exists.
Overwrite (y/n)? y
Enter passphrase (empty for no passphrase): 3mtt
Enter same passphrase again: 3mtt
Your identification has been saved in /content/sample_data/README.md
Your public key has been saved in /content/sample_data/README.md.pub
The key fingerprint is:
SHA256:m5HbxwXzBX2sLkODBxNxG8FnJlwihKhDKr7zaXCkTAQ sadansabo@gmail.com
The key's randomart image is:
+---[RSA 4096]----+
|E      . o===ooo |
| .  . . . oo+++.+|
|.  o .     ++= .o|
|....o    .. ++.. |
|+.o  .  S  o oo  |
| = .     * .o..  |
|  +     + . oo   |
| o ..      .     |
|  +o             |
+----[SHA256]-----+


## Copy your Public SSH Key

### Subtask:
Display the content of your public SSH key file to copy it.

#### Hint:
Use the `cat` command to display the content of the `.pub` file. Remember to use the path where your key was saved.

In [20]:
!cat /content/sample_data/README.md.pub

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQC+fjLLNimue/LUX91Ru4DkaeiBRWcbAFLE3ExuVbJW0uL5v33rhc2UpwuWlUbCPGhGRzlb4I9EwG3j6Q1tAZgWsRTdxs8iF287pi0dxNCdyKfA5aN5OIwCQ4XsFtAxa3sxKr1yu5pfQh6RvVTLJG0Tf3c7ls5yZ8Ee89exgvDFxWDEp5GeurGzD2kwjOOymDcgb2Vc+F6Cm77wKemlN/Z9VkhkvwdnYtUTGAH1D/qHfA5uoTF+0wIl4GFXsW8ST2UF4pVm+9jMatuWdEgc21L5Q1WCNBqoHLA8JmQ7QzEkXxHxSnSALvkG4W+d7Hi1g2FYPXgG41dDXPDHbONCrqw+cXUU0PmS10wA48SMGo7jiAVoyUU7rgEvzZUtCVeOIlnM+NxQy5DtqLI0UC/7hhQO2QxtXPqRyY1EIJyLp1e/kA0dGcs55ChYJPp3sUl2dNfcGU8c/XNTSVfxUfA8praZJeNWuiOYX56RPt5TbOrQEAi0KuE8kllXp2z0oYCL+1qYzPmEPNk/SApNDFMYKiV8SLQV2ll5W9q1K8+TePZbh/TKVhJiiLmWUcUtllbDIrmAq8WenYpGifDG8vZAT841VuBLbEDXLjOuP5RwCR/RGNl6kt/hMOiP8oZNVKu0q1+f0Oc0jVgDOulsWZnejoqoa43FHYHomdQnIkM+p14enw== sadansabo@gmail.com


## Update Remote URL to SSH

### Subtask:
Remove the existing HTTPS remote origin.

#### Hint:
Use `git remote remove origin`.

In [21]:
!git remote remove origin

### Subtask:
Add the remote origin again using the SSH URL format.

#### Hint:
Use `git remote add origin git@github.com:your_username/your_repository_name.git`. Replace `your_username` and `your_repository_name` with your GitHub username and repository name.

In [23]:
!git remote add origin git@github.com:sadansabo/MULTIPLE-LINEAR-REGRESSION.git

error: remote origin already exists.


## Push changes to GitHub

### Subtask:
Push your committed changes from your local repository to the GitHub repository.

#### Hint:
Use `git push origin master` to push your changes to the `master` branch.

In [24]:
!git push origin master

Host key verification failed.
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
