In [1]:
pip --version python

pip 25.1 from C:\Users\302sy\anaconda3\Lib\site-packages\pip (python 3.13)

Note: you may need to restart the kernel to use updated packages.


# AdEase Time Series - Forecasting Webpage Views for Smarter Ad Placements
**Leveraging AI and time series forecasting to optimize digital advertising**

## Introduction
In today's digital-first world, advertising is everywhere -- but success depends on putting the right message in front of the right audience at the right time.

This project explores how AdEase, an advertising and marketing company, can use data science to make ad placements smarter and more cost-effective.

We are provided with a fascinating dataset:
- 145,000+ webpages
- Daily page views tracked over 550 days
- Additional information about special events or campaigns that might influence traffic (for English pages)

The core challenge is to: 
- Understand past page view patterns
- Forecast future views
- Use these predictions to optimize ad placements, ensuring our clients achieve maximum visibility at minimum cost.

What's in it for you:
- Sudden spikes in page views (e.g., during elections, sports events, or cultural moments) create opportunities for targeted ads.
- Different regions and languages have unique trends, so ads must adapt to the right audience.
- With accurate forecasting, businesses can place ads before the wave hits, riding trends instead of chasing them.


In [2]:
# Import libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

import warnings
warnings.filterwarnings("ignore")

## Load and Understand the Data

In [8]:
# Adjust DATA_DIR to the folder that holds the CSV files
from pathlib import Path
DATA_DIR = Path('data')

# Load the page-view data and the campaign (exogenous) flags
original_train_data = pd.read_csv(DATA_DIR / "train_1.csv")
original_exog_data = pd.read_csv(DATA_DIR / "Exog_Campaign_eng.csv")

# Work on deep copies so the originals stay untouched
exog_df = original_exog_data.copy(deep=True)
df = original_train_data.copy(deep=True)
df.head()

Unnamed: 0,Page,2015-07-01,2015-07-02,2015-07-03,2015-07-04,2015-07-05,2015-07-06,2015-07-07,2015-07-08,2015-07-09,...,2016-12-22,2016-12-23,2016-12-24,2016-12-25,2016-12-26,2016-12-27,2016-12-28,2016-12-29,2016-12-30,2016-12-31
0,2NE1_zh.wikipedia.org_all-access_spider,18.0,11.0,5.0,13.0,14.0,9.0,9.0,22.0,26.0,...,32.0,63.0,15.0,26.0,14.0,20.0,22.0,19.0,18.0,20.0
1,2PM_zh.wikipedia.org_all-access_spider,11.0,14.0,15.0,18.0,11.0,13.0,22.0,11.0,10.0,...,17.0,42.0,28.0,15.0,9.0,30.0,52.0,45.0,26.0,20.0
2,3C_zh.wikipedia.org_all-access_spider,1.0,0.0,1.0,1.0,0.0,4.0,0.0,3.0,4.0,...,3.0,1.0,1.0,7.0,4.0,4.0,6.0,3.0,4.0,17.0
3,4minute_zh.wikipedia.org_all-access_spider,35.0,13.0,10.0,94.0,4.0,26.0,14.0,9.0,11.0,...,32.0,10.0,26.0,27.0,16.0,11.0,17.0,19.0,10.0,11.0
4,52_Hz_I_Love_You_zh.wikipedia.org_all-access_s...,,,,,,,,,,...,48.0,9.0,25.0,13.0,3.0,11.0,27.0,13.0,36.0,10.0


In [9]:
exog_df.head()

Unnamed: 0,Exog
0,0
1,0
2,0
3,0
4,0


In [17]:
dr, dc = df.shape
er, ec = exog_df.shape

# Report the shape of both DataFrames
print(f"Number of rows in views data main: {dr}")
print(f"Number of columns in views data main: {dc}\n")

print(f"Number of rows in exog data: {er}")
print(f"Number of columns in exog data: {ec}")

Number of rows in views data main: 145063
Number of columns in views data main: 551

Number of rows in exog data: 550
Number of columns in exog data: 1
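
The exogenous file has 550 rows and the views table has 550 date columns (551 minus `Page`), which suggests one campaign flag per day. Below is a minimal sketch of that pairing. It assumes row *i* of `Exog` corresponds to the *i*-th date column, starting at 2015-07-01; the source does not state this outright. `label_exog` is a hypothetical helper, not part of the notebook, and the flags are toy values rather than real data.

```python
from datetime import date, timedelta

def label_exog(flags, start):
    """Pair each flag with consecutive calendar days beginning at `start`.

    Assumption: flags are ordered by day with no gaps.
    """
    return {start + timedelta(days=i): int(flag) for i, flag in enumerate(flags)}

# Toy flags standing in for exog_df["Exog"]; not real data.
toy_flags = [0, 0, 1, 0]
flag_by_day = label_exog(toy_flags, date(2015, 7, 1))

# Days on which the (toy) campaign flag is set.
campaign_days = [day.isoformat() for day, flag in flag_by_day.items() if flag == 1]
print(campaign_days)
```

On the real data the call would look like `label_exog(exog_df["Exog"], date(2015, 7, 1))`, provided the alignment assumption holds.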


In [11]:
# Each row of df is one page; each column after "Page" is one date.
# The exog data has 550 rows, matching the number of date columns.
exog_df.shape


(550, 1)
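
Each `Page` value shown in `df.head()` appears to pack several fields into one string: the article title, the domain, the access type, and the agent. A minimal sketch of splitting them apart follows, assuming every identifier has the form `title_domain_access_agent`; `parse_page` is a hypothetical helper, not part of the notebook.

```python
def parse_page(page):
    """Split a page identifier into title, domain, access type and agent.

    Assumption: the identifier has the form title_domain_access_agent.
    Splitting from the right keeps underscores inside the title intact.
    """
    title, domain, access, agent = page.rsplit("_", 3)
    return {
        "title": title,
        "domain": domain,
        # First label of the domain, e.g. "zh" in zh.wikipedia.org
        "language": domain.split(".")[0],
        "access": access,
        "agent": agent,
    }

parts = parse_page("2NE1_zh.wikipedia.org_all-access_spider")
print(parts)
```

Applied to the whole table, `pd.DataFrame(df["Page"].map(parse_page).tolist())` would give one column per field. An identifier with fewer than three underscores raises a `ValueError`, and the first domain label is only a language code if the domain follows the `xx.wikipedia.org` pattern.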

In [12]:
# Do we have exactly 550 days of data? Are any days missing?
df.tail()

Unnamed: 0,Page,2015-07-01,2015-07-02,2015-07-03,2015-07-04,2015-07-05,2015-07-06,2015-07-07,2015-07-08,2015-07-09,...,2016-12-22,2016-12-23,2016-12-24,2016-12-25,2016-12-26,2016-12-27,2016-12-28,2016-12-29,2016-12-30,2016-12-31
145058,Underworld_(serie_de_películas)_es.wikipedia.o...,,,,,,,,,,...,,,,,13.0,12.0,13.0,3.0,5.0,10.0
145059,Resident_Evil:_Capítulo_Final_es.wikipedia.org...,,,,,,,,,,...,,,,,,,,,,
145060,Enamorándome_de_Ramón_es.wikipedia.org_all-acc...,,,,,,,,,,...,,,,,,,,,,
145061,Hasta_el_último_hombre_es.wikipedia.org_all-ac...,,,,,,,,,,...,,,,,,,,,,
145062,Francisco_el_matemático_(serie_de_televisión_d...,,,,,,,,,,...,,,,,,,,,,
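
The column headers run from 2015-07-01 to 2016-12-31. One way to answer the missing-days question is to compare the date labels against a full calendar range. The sketch below does this with the standard library on toy labels; `missing_days` is a hypothetical helper, and it assumes the labels are ISO-formatted dates (`YYYY-MM-DD`), as the headers suggest.

```python
from datetime import date, timedelta

def missing_days(date_labels):
    """Return calendar days absent between the earliest and latest label."""
    present = {date.fromisoformat(str(label)) for label in date_labels}
    first, last = min(present), max(present)
    span = (last - first).days + 1
    all_days = (first + timedelta(days=n) for n in range(span))
    return [day for day in all_days if day not in present]

# Inclusive day count for the range shown in the column headers.
expected_days = (date(2016, 12, 31) - date(2015, 7, 1)).days + 1
print(expected_days)  # 550

# Toy labels with 2015-07-03 deliberately left out.
toy_labels = ["2015-07-01", "2015-07-02", "2015-07-04"]
print(missing_days(toy_labels))
```

On the notebook's data, `missing_days(df.columns[1:])` would run the same check, assuming `Page` is the first column. An empty list, together with 550 date columns, would indicate that no day is missing.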
