# 📈 Time Series Forecasting of Wikipedia Pageviews for Figma (software)

This notebook explores daily pageviews for the [Figma Wikipedia article](https://en.wikipedia.org/wiki/Figma_(software)) from January 2022 onwards.  
The objective is to build a time series model that forecasts daily pageviews until mid-2026.  
This task was developed as part of a Data Science Internship technical challenge at SearchPilot.


## 📚 Table of Contents
1. [Introduction](#introduction)
2. [Importing Libraries](#import)
3. [Data Loading](#load)
4. [Exploratory Data Analysis](#eda)
5. [Time Series Decomposition](#decomp)
6. [Forecasting with Prophet](#prophet)
7. [Forecast Visualization](#viz)
8. [Conclusion](#conclusion)

In [10]:
!pip install prophet

Collecting prophet
  Downloading prophet-1.1.7-py3-none-macosx_10_11_x86_64.whl.metadata (3.5 kB)
Collecting cmdstanpy>=1.0.4 (from prophet)
  Downloading cmdstanpy-1.2.5-py3-none-any.whl.metadata (4.0 kB)
Collecting holidays<1,>=0.25 (from prophet)
  Downloading holidays-0.74-py3-none-any.whl.metadata (39 kB)
Collecting stanio<2.0.0,>=0.4.0 (from cmdstanpy>=1.0.4->prophet)
  Downloading stanio-0.5.1-py3-none-any.whl.metadata (1.6 kB)
Downloading prophet-1.1.7-py3-none-macosx_10_11_x86_64.whl (8.8 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 8.8/8.8 MB 2.4 MB/s eta 0:00:00
Downloading cmdstanpy-1.2.5-py3-none-any.whl (94 kB)
Downloading holidays-0.74-py3-none-any.whl (990 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 991.0/991.0 kB 2.7 MB/s eta 0:00:00
Downloading stanio-0.5.1-py3-none-any.whl (8.1 kB)
Installing collected packages: stanio, holidays, cmd

In [14]:
# Data manipulation
import pandas as pd
import numpy as np

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Time series forecasting
from prophet import Prophet

# Date handling
from datetime import datetime

# Suppress warnings
import warnings
warnings.filterwarnings('ignore')

%matplotlib inline

## Introduction
This notebook forecasts the **daily number of pageviews** for the Wikipedia page "*Figma (software)*" from June 2025 until **mid-2026**.

To do so, we apply **time series forecasting techniques** to historical data collected from the public Wikimedia REST API. The series starts in January 2022, so it covers multiple years of daily data, enough to capture **seasonal trends**, **long-term patterns**, and **periodic fluctuations** in user interest.
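
A forecast "until mid-2026" needs a concrete horizon in days. The sketch below computes it with plain date arithmetic. Two things here are assumptions rather than facts from this notebook's specification: "mid-2026" is read as 30 June 2026, and the last observed day is taken to be 3 June 2025, the maximum date in the fetched data. `forecast_horizon_days` is a hypothetical helper name.

```python
from datetime import date

def forecast_horizon_days(last_observed: date, forecast_end: date) -> int:
    """Number of daily steps after last_observed, up to and including forecast_end."""
    if forecast_end <= last_observed:
        raise ValueError("forecast_end must be after last_observed")
    return (forecast_end - last_observed).days

# Assumption: 'mid-2026' means 30 June 2026.
# Assumption: the last observed day is 3 June 2025 (maximum date in the data).
last_observed = date(2025, 6, 3)
forecast_end = date(2026, 6, 30)

horizon = forecast_horizon_days(last_observed, forecast_end)
print(horizon)  # 392
```

A value like this could be passed as `periods` to Prophet's `make_future_dataframe` with daily frequency, if that is the route taken for the forecast.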


In [18]:
import requests
import pandas as pd
from datetime import datetime

def fetch_wikipedia_pageviews(article, start_date, end_date, project='en.wikipedia', access='all-access', agent='user'):
    """
    Fetches daily pageviews for a Wikipedia article using the Wikimedia REST API.

    start_date and end_date are 'YYYYMMDD' strings. Returns a DataFrame with
    one row per day and the columns 'date' (datetime) and 'views' (int).
    """
    url = f'https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/{project}/{access}/{agent}/{article}/daily/{start_date}/{end_date}'
    # Identify the client to the API through the User-Agent header
    headers = {
        'User-Agent': 'SamiraDSInternProject/1.0 (samira.yousefzadeh@example.com)'  # replace with your real email if needed
    }
    # A timeout keeps the cell from hanging if the API does not respond
    response = requests.get(url, headers=headers, timeout=30)

    if response.status_code != 200:
        raise RuntimeError(f"API request failed with status code {response.status_code}")

    data = response.json()
    records = []
    for item in data['items']:
        # Timestamps arrive as 'YYYYMMDDHH' strings, e.g. '2022010100'
        date = datetime.strptime(str(item['timestamp']), '%Y%m%d%H')
        views = item['views']
        records.append({'date': date, 'views': views})

    df = pd.DataFrame(records)
    return df

# Parameters
article_title = 'Figma_(software)'
start = '20220101'
end = datetime.today().strftime('%Y%m%d')

# Fetch data
df = fetch_wikipedia_pageviews(article_title, start, end)
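
The parsing step inside `fetch_wikipedia_pageviews` can be checked without a network call. The following is a minimal sketch, not part of the original notebook: `parse_pageview_items` is a hypothetical helper, and `sample_payload` is a hand-written dictionary that mimics only the two response fields the notebook reads (`timestamp` and `views`), using the first two rows of the fetched data as values.

```python
from datetime import datetime

def parse_pageview_items(payload: dict) -> list:
    """Convert the 'items' list of a pageviews response into date/views records."""
    records = []
    for item in payload.get("items", []):
        # Timestamps are 'YYYYMMDDHH' strings; daily data carries hour '00'
        day = datetime.strptime(str(item["timestamp"]), "%Y%m%d%H")
        records.append({"date": day, "views": int(item["views"])})
    return records

# Hand-written sample (assumption): only the fields used by the notebook
sample_payload = {
    "items": [
        {"timestamp": "2022010100", "views": 632},
        {"timestamp": "2022010200", "views": 742},
    ]
}

records = parse_pageview_items(sample_payload)
print(records[0]["date"].date(), records[0]["views"])  # 2022-01-01 632
```

Separating parsing from the HTTP request in this way makes it possible to test the timestamp format in isolation.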


In [20]:
df.head()

|   | date | views |
|---|------------|-------|
| 0 | 2022-01-01 | 632 |
| 1 | 2022-01-02 | 742 |
| 2 | 2022-01-03 | 1134 |
| 3 | 2022-01-04 | 1217 |
| 4 | 2022-01-05 | 1378 |


In [22]:
df.describe()

|       | date | views |
|-------|---------------------|-------------|
| count | 1250 | 1250.0 |
| mean  | 2023-09-17 12:00:00 | 948.308 |
| min   | 2022-01-01 00:00:00 | 13.0 |
| 25%   | 2022-11-09 06:00:00 | 51.25 |
| 50%   | 2023-09-17 12:00:00 | 985.0 |
| 75%   | 2024-07-25 18:00:00 | 1563.75 |
| max   | 2025-06-03 00:00:00 | 36996.0 |
| std   |  | 1697.810192 |


In [24]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1250 entries, 0 to 1249
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   date    1250 non-null   datetime64[ns]
 1   views   1250 non-null   int64         
dtypes: datetime64[ns](1), int64(1)
memory usage: 19.7 KB
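
`df.info()` confirms there are no null values, but a non-null count does not by itself show whether a calendar day is absent from the series. Here the 1250 rows equal the number of days from 2022-01-01 to 2025-06-03 inclusive, which suggests the series is complete, provided there are no duplicate dates. A small sketch of an explicit check follows. `missing_days` is a hypothetical helper, and the `toy` frame is made-up input with one day deliberately removed.

```python
import pandas as pd

def missing_days(frame: pd.DataFrame, date_col: str = "date") -> pd.DatetimeIndex:
    """Return calendar days between the first and last date that have no row."""
    observed = pd.DatetimeIndex(frame[date_col]).normalize()
    expected = pd.date_range(observed.min(), observed.max(), freq="D")
    return expected.difference(observed)

# Toy frame (assumption) with 2022-01-03 deliberately left out
toy = pd.DataFrame({
    "date": pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-04"]),
    "views": [632, 742, 1217],
})

gaps = missing_days(toy)
print(list(gaps.strftime("%Y-%m-%d")))  # ['2022-01-03']
```

Running `missing_days(df)` on the fetched data should return an empty index if every day is present.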


In [26]:
df.to_csv("figma_wikipedia_pageviews_2022_onward.csv", index=False)
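
When the saved CSV is read back, the `date` column comes in as plain strings unless it is parsed explicitly, and Prophet expects its input columns to be named `ds` (dates) and `y` (values). The sketch below shows one way to handle both. `load_for_prophet` is a hypothetical helper, and an in-memory string stands in for the file so the example runs anywhere.

```python
import io
import pandas as pd

# Stand-in for the saved file (assumption): same header and first rows as the data
csv_text = "date,views\n2022-01-01,632\n2022-01-02,742\n2022-01-03,1134\n"

def load_for_prophet(source) -> pd.DataFrame:
    """Read a date/views CSV and return it with Prophet's 'ds' and 'y' column names."""
    frame = pd.read_csv(source, parse_dates=["date"])
    frame = frame.rename(columns={"date": "ds", "views": "y"})
    # Sort chronologically so later steps can rely on row order
    return frame.sort_values("ds").reset_index(drop=True)

prophet_df = load_for_prophet(io.StringIO(csv_text))
print(list(prophet_df.columns))  # ['ds', 'y']
```

With the real file, the call would be `load_for_prophet("figma_wikipedia_pageviews_2022_onward.csv")`.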

In [None]:
import os
print(os.getcwd())