## Dataset Features:
* Daily Time Spent on Site: Consumer's time on the site in minutes.
* Age: Age of the consumer in years.
* Area Income: Average income of the geographical area of the consumer.
* Daily Internet Usage: Average minutes per day the consumer is on the internet.
* Ad Topic Line: Headline of the advertisement.
* City: City of the consumer.
* Male: Binary indicator of whether the consumer is male (1 for male, 0 for female).
* Country: Country of the consumer.
* Timestamp: Time at which the consumer clicked on the ad or closed the window.
* Clicked on Ad: Binary label (0 or 1) indicating whether the consumer clicked on the advertisement.

## Project Tasks:

* Data Analysis and Preprocessing: Explore and preprocess the dataset, handling missing values, encoding categorical features, and scaling numerical features.
* Exploratory Data Analysis (EDA): Gain insights into the relationships between different features, identify patterns, and understand the distribution of the target variable.
* Feature Engineering: Extract relevant features and create new ones that might enhance the predictive performance of the model.
* Model Development: Build a machine learning model to predict whether a user will click on an ad. Evaluate the model's performance using appropriate metrics and fine-tune if necessary.
* Deployment: Deploy the trained model, making it accessible for predictions in a real-world scenario. Choose an appropriate deployment platform and ensure that it integrates seamlessly.
* Monitoring and Maintenance: Implement monitoring mechanisms to keep track of the model's performance over time. Regularly update the model as needed to maintain its effectiveness.
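
The preprocessing task listed above (missing values, categorical encoding, numeric scaling) could be sketched with scikit-learn's `ColumnTransformer`. This is only an illustration under stated assumptions, not the pipeline this notebook uses: the `toy_df` frame and its values are made up, and only the column names are borrowed from the dataset.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up rows for illustration only; not taken from advertising.csv.
toy_df = pd.DataFrame({
    "Age": [25, 40, np.nan, 33],
    "Area Income": [52000.0, 61000.0, 48000.0, np.nan],
    "Country": ["Peru", "Italy", "Peru", "Italy"],
})

numeric_cols = ["Age", "Area Income"]
categorical_cols = ["Country"]

# Numeric columns: fill gaps with the median, then standardise.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical columns: one-hot encode, ignoring categories unseen at fit time.
preprocessor = ColumnTransformer([
    ("num", numeric_steps, numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

features = preprocessor.fit_transform(toy_df)
print(features.shape)
```

Wrapping the steps in one transformer means the same imputation and scaling parameters learned on the training split are reused on the test split, which helps avoid leakage.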

In [None]:
# %pip install -r "requirements.txt"



In [None]:
import numpy as np
import pandas as pd


In [None]:
data_df  = pd.read_csv(filepath_or_buffer="./Artifacts/advertising.csv")

In [None]:
data_df.head()



In [None]:
data_df.shape

In [None]:
data_df.isna().sum()  # Number of missing values per column


In [None]:
data_df.head()

In [None]:
data_df.info()

In [None]:
data_df.describe()  # Summary statistics for the numeric columns

In [None]:
print(data_df["Ad Topic Line"].head(15))

In [None]:
import sklearn as sk
from sklearn.feature_extraction.text import TfidfVectorizer

In [None]:
tfidf_vectorizer = TfidfVectorizer()

In [None]:
# %pip install spacy

In [None]:
import gensim
from gensim.models import word2vec
from gensim.models.word2vec import Word2Vec
import spacy
import string

In [None]:
import gensim.downloader as api

In [None]:
wv = api.load('glove-twitter-50')

In [None]:
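def sent_vec(sent):
    """Return the mean GloVe vector of the words in `sent` that are in `wv`."""
    wv_res = np.zeros(wv.vector_size)
    ctr = 0
    # Split into lowercase word tokens; iterating over the raw string
    # would yield single characters instead of words.
    for w in sent.lower().split():
        w = w.strip(string.punctuation)
        if w in wv:
            ctr += 1
            wv_res += wv[w]
    # Avoid dividing by zero when no word is in the vocabulary.
    return wv_res / ctr if ctr else wv_res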

In [None]:

data_df['vec'] = data_df['Ad Topic Line'].apply(sent_vec)
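
The idea behind `sent_vec` is mean pooling: a headline is represented by the average of the embedding vectors of its words. The sketch below shows that averaging without downloading GloVe. The `toy_vectors` dictionary, its two-dimensional values, and the `mean_vector` helper are hypothetical and exist only for illustration.

```python
import numpy as np

# Hypothetical 2-dimensional "embeddings"; the GloVe model loaded above uses 50.
toy_vectors = {
    "big": np.array([1.0, 0.0]),
    "cat": np.array([3.0, 2.0]),
}

def mean_vector(sentence, vectors, dim=2):
    """Average the vectors of the words in `sentence` that appear in `vectors`."""
    tokens = sentence.lower().split()
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        # No known words: fall back to a zero vector.
        return np.zeros(dim)
    return np.mean(known, axis=0)

print(mean_vector("Big cat unknownword", toy_vectors))
```

Words missing from the vocabulary are skipped, so they do not pull the average towards zero.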


In [None]:
from sklearn.metrics.pairwise import cosine_similarity


In [None]:
data_df.head()


In [None]:
data_df.shape

In [None]:
data_df['City'].nunique()

In [None]:
data_df['Time'] = pd.to_datetime(data_df['Timestamp'])
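
Once the timestamp is parsed, time-based features can be derived from it before the column is discarded. The following is a sketch of that feature-engineering step, assuming timestamps in a `YYYY-MM-DD HH:MM:SS` style. The two rows in `ts_df` and the new column names (`Hour`, `DayOfWeek`, `IsWeekend`) are made up for illustration.

```python
import pandas as pd

# Made-up timestamps for illustration; not rows from the dataset.
ts_df = pd.DataFrame({"Timestamp": ["2016-03-27 00:53:11", "2016-04-04 01:39:02"]})

parsed = pd.to_datetime(ts_df["Timestamp"])
ts_df["Hour"] = parsed.dt.hour
ts_df["DayOfWeek"] = parsed.dt.dayofweek  # Monday=0 ... Sunday=6
ts_df["IsWeekend"] = (parsed.dt.dayofweek >= 5).astype(int)

print(ts_df)
```

Whether such features help click prediction would need to be checked against the actual data.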

In [None]:
data_df.drop(columns=['City','Country','Timestamp','Time'],inplace=True)


In [None]:
data_df = data_df.drop(columns='Ad Topic Line')

In [None]:
data_df.head()

In [None]:
# Reduce each headline vector to a single number: its L2 norm.
data_df['Ad Topic Line'] = data_df['vec'].apply(np.linalg.norm)


In [None]:
data_df.drop(columns=['vec'],inplace=True)
data_df.head()

In [None]:
import seaborn as sns
print(data_df.corr())
dataplt = sns.heatmap(data_df.corr(),cmap='YlGnBu',annot=True)

In [None]:
target_col = data_df.pop('Clicked on Ad')
data_df['Clicked on Ad']=target_col

In [None]:
data_df

In [None]:
X = data_df[["Daily Time Spent on Site","Age","Area Income","Daily Internet Usage","Male","Ad Topic Line"]]
y = data_df['Clicked on Ad']

In [None]:
X

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report,accuracy_score

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)
# max_features should not exceed the 6 columns in X; "sqrt" is the classifier default.
rf = RandomForestClassifier(max_features="sqrt",random_state=0)
rf.fit(X_train,y_train)



In [None]:
predictions = rf.predict(X_test)


In [None]:
print(classification_report(y_test, predictions))  # Argument order is (y_true, y_pred)

In [None]:
accuracy_score(y_test, predictions)
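
A single train/test split yields one accuracy figure, which can vary with the split. A k-fold cross-validation run gives a rough sense of that variance. The sketch below uses synthetic data from `make_classification` as a stand-in for the six model features; `X_demo`, `y_demo`, and the chosen hyperparameters are assumptions for illustration, not results from this dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the six model features; not the advertising data.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=6, n_informative=4, random_state=0
)

model = RandomForestClassifier(n_estimators=50, random_state=0)

# Five-fold cross-validation: one accuracy score per held-out fold.
scores = cross_val_score(model, X_demo, y_demo, cv=5, scoring="accuracy")
print(scores.mean(), scores.std())
```

Replacing `X_demo` and `y_demo` with the notebook's `X` and `y` would apply the same check to the real features.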