Author: Marco Pellegrino\
Year: 2024

These SQL queries analyze how page language, day of the week, and month affect web traffic. The queries run in BigQuery, and the results are visualized in Looker Studio.

# Language Analysis

In [None]:
CREATE OR REPLACE TABLE `web-traffic-time-series.web_traffic_dataset.web_traffic_language_analysis` AS (
  SELECT 
    date,
    language,
    total_traffic,
    avg_traffic,
    n_pages
  FROM (
      SELECT
          SUM(traffic) AS total_traffic,
          AVG(traffic) AS avg_traffic,
          COUNT(*) AS n_pages,
          date,
          -- Two-letter language code preceding ".wikipedia.org" (dots escaped so
          -- they match literally); 'na' for pages on any other domain
          COALESCE(
            REGEXP_EXTRACT(page, r'([a-z]{2})\.wikipedia\.org'),
            'na'
          ) AS language
      FROM 
          `web-traffic-time-series.web_traffic_dataset.web_traffic_tb`
      GROUP BY 
        date,
        language
  )
)
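
The language extraction can be checked in isolation before building the table. The following is a minimal sketch that uses a variant of the pattern with escaped dots and runs it against literal strings. The sample page names are hypothetical and assume the `<title>_<lang>.wikipedia.org_<access>_<agent>` naming pattern; they are not taken from the dataset.

```sql
-- Sanity check of the language extraction on hypothetical page names.
SELECT
  page,
  -- REGEXP_EXTRACT returns NULL when there is no match, so COALESCE maps
  -- non-Wikipedia domains to 'na'.
  COALESCE(REGEXP_EXTRACT(page, r'([a-z]{2})\.wikipedia\.org'), 'na') AS language
FROM UNNEST([
  '2NE1_zh.wikipedia.org_all-access_spider',                      -- expected: zh
  'Main_Page_en.wikipedia.org_desktop_all-agents',                -- expected: en
  'File:Example.jpg_commons.wikimedia.org_all-access_all-agents'  -- expected: na
]) AS page;
```

Because the query reads only from an inline array, it scans no table data.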

Data visualization in Looker Studio:

![Plot language analysis](../plots_looker/plot_language_analysis.png)

Insights:
- English-language pages receive considerably more traffic than pages in other languages
- English and Russian traffic showed very large spikes around August 2016
  - Possibly due to the 2016 Summer Olympics and the US presidential election campaign
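
To inspect the spike more closely, the analysis table can be filtered to the two languages and the surrounding months. This is a sketch only: it assumes that `date` is stored as a `DATE` and that the extracted language codes are `'en'` and `'ru'`. The date window is an arbitrary choice for illustration.

```sql
-- Days with the highest traffic for English and Russian pages around August 2016.
SELECT
  date,
  language,
  total_traffic
FROM `web-traffic-time-series.web_traffic_dataset.web_traffic_language_analysis`
WHERE language IN ('en', 'ru')
  AND date BETWEEN DATE '2016-07-01' AND DATE '2016-09-30'  -- assumed window
ORDER BY language, total_traffic DESC;
```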

# Weekday Analysis

In [None]:
CREATE OR REPLACE TABLE `web-traffic-time-series.web_traffic_dataset.web_traffic_weekday_analysis` AS
(
  SELECT
    FORMAT_DATE('%A', date) AS day_of_week,
    SUM(traffic) AS total_traffic,
    AVG(traffic) AS avg_traffic,
    CASE FORMAT_DATE('%A', date)
      WHEN 'Monday' THEN 1
      WHEN 'Tuesday' THEN 2
      WHEN 'Wednesday' THEN 3
      WHEN 'Thursday' THEN 4
      WHEN 'Friday' THEN 5
      WHEN 'Saturday' THEN 6
      WHEN 'Sunday' THEN 7
    END AS day_number
  FROM `web-traffic-time-series.web_traffic_dataset.web_traffic_tb`
  GROUP BY
    day_of_week, day_number
  ORDER BY
    day_number
);
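
As an alternative to the `CASE` expression, the day number can be derived arithmetically. In BigQuery, `EXTRACT(DAYOFWEEK FROM date)` returns 1 for Sunday through 7 for Saturday, so a modulo shift yields Monday = 1 through Sunday = 7. The sketch below compares both forms on one generated week; the dates are chosen only for illustration.

```sql
-- Compare the weekday name with an arithmetic Monday-first day number.
SELECT
  d AS date,
  FORMAT_DATE('%A', d) AS day_of_week,
  -- DAYOFWEEK: Sunday = 1 ... Saturday = 7; shift so that Monday = 1 ... Sunday = 7
  MOD(EXTRACT(DAYOFWEEK FROM d) + 5, 7) + 1 AS day_number
FROM UNNEST(GENERATE_DATE_ARRAY(DATE '2016-08-01', DATE '2016-08-07')) AS d
ORDER BY d;
```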

Data visualization in Looker Studio:

![Weekday analysis](../plots_looker/plot_weekday_analysis.png)

Insights:
- Pages receive more views on Mondays and Sundays than on other days of the week
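
`SUM(traffic)` per weekday depends on how many times each weekday occurs in the date range, so a weekday that appears once more than the others gets a slightly inflated total. One way to cross-check the insight is to average the daily totals per weekday, as in this sketch (the CTE name `daily` and the output column names are illustrative):

```sql
-- Average of daily total traffic per weekday, independent of how many
-- times each weekday occurs in the data.
WITH daily AS (
  SELECT
    date,
    SUM(traffic) AS daily_traffic
  FROM `web-traffic-time-series.web_traffic_dataset.web_traffic_tb`
  GROUP BY date
)
SELECT
  FORMAT_DATE('%A', date) AS day_of_week,
  COUNT(*) AS n_days,
  AVG(daily_traffic) AS avg_daily_traffic
FROM daily
GROUP BY day_of_week
ORDER BY avg_daily_traffic DESC;
```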

# Monthly Analysis

In [None]:
CREATE OR REPLACE TABLE `web-traffic-time-series.web_traffic_dataset.web_traffic_monthly_analysis` AS
(
  SELECT
    EXTRACT(MONTH FROM date) AS month_number,
    FORMAT_DATE('%B', date) AS month_name,
    SUM(traffic) AS total_traffic,
    AVG(traffic) AS avg_traffic
  FROM `web-traffic-time-series.web_traffic_dataset.web_traffic_tb`
  GROUP BY
    month_number,
    month_name
  ORDER BY
    month_number
);

Data visualization in Looker Studio:

![Monthly analysis](../plots_looker/plot_monthly_analysis.png)

Insights:
- Traffic is lower during the warmer months, but it peaks in August, possibly due to elections and sports events
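
Grouping by month number pools the same calendar month across all years in the data. If the data span more than one year, a single unusual month (such as one August) can dominate that month's aggregate. A minimal sketch that keeps years separate, assuming `date` is a `DATE` column:

```sql
-- Traffic per calendar month, with each year kept separate.
SELECT
  DATE_TRUNC(date, MONTH) AS month_start,  -- first day of each month
  SUM(traffic) AS total_traffic,
  AVG(traffic) AS avg_traffic
FROM `web-traffic-time-series.web_traffic_dataset.web_traffic_tb`
GROUP BY month_start
ORDER BY month_start;
```

Comparing this output with the pooled table shows whether the August peak recurs or comes from a single year.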