# 01 – GA4 Schema & Session-Level Feasibility

## Objective
Explore the GA4 sample dataset to understand event structure, session
reconstruction, and feasibility of building a session-level dataset
for customer-journey diagnostics and modeling.

## Key questions
- What events exist and how frequent are they?
- How are sessions identified?
- How many sessions end in a purchase?
- Which session-level features are feasible?


In [1]:
from google.cloud import bigquery
import pandas as pd

PROJECT_ID = "wide-origin-477108-i9"

client = bigquery.Client(project=PROJECT_ID)
client.project

'wide-origin-477108-i9'

In [2]:
dataset_id = "bigquery-public-data.ga4_obfuscated_sample_ecommerce"

# Confirm the GA4 sample dataset is visible from this client.
datasets = [d.dataset_id for d in client.list_datasets("bigquery-public-data")]
"ga4_obfuscated_sample_ecommerce" in datasets

True

In [4]:
import sys
!{sys.executable} -m pip install db-dtypes



Collecting db-dtypes
  Using cached db_dtypes-1.5.0-py3-none-any.whl.metadata (3.4 kB)
Using cached db_dtypes-1.5.0-py3-none-any.whl (18 kB)
Installing collected packages: db-dtypes
Successfully installed db-dtypes-1.5.0


In [3]:
# Count events by name across all daily tables (events_* wildcard).
query = """
SELECT
  event_name,
  COUNT(*) AS n_events
FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`
GROUP BY event_name
ORDER BY n_events DESC
LIMIT 30
"""

df_events = client.query(query).to_dataframe()
df_events



index,event_name,n_events
0,page_view,1350428
1,user_engagement,1058721
2,scroll,493072
3,view_item,386068
4,session_start,354970
5,first_visit,257462
6,view_promotion,190104
7,add_to_cart,58543
8,begin_checkout,38757
9,select_item,31007
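
As a rough check on funnel depth, the counts in the table above can be expressed relative to `session_start`. This is a minimal sketch, not part of the original analysis: the numbers are copied from the output above, and `events_per_session_start` is a hypothetical helper name. The ratios are event-level rather than session-level, so they only hint at how rare the deeper funnel steps are.

```python
# Event counts copied from the query output above.
event_counts = {
    "page_view": 1350428,
    "user_engagement": 1058721,
    "scroll": 493072,
    "view_item": 386068,
    "session_start": 354970,
    "first_visit": 257462,
    "view_promotion": 190104,
    "add_to_cart": 58543,
    "begin_checkout": 38757,
    "select_item": 31007,
}


def events_per_session_start(counts):
    """Return each event count divided by the session_start count.

    Hypothetical helper: this is an event-level ratio, not a true
    session-level rate, because one session can fire an event many times.
    """
    n_session_starts = counts["session_start"]
    return {name: n / n_session_starts for name, n in counts.items()}


ratios = events_per_session_start(event_counts)
for name in ("view_item", "add_to_cart", "begin_checkout"):
    print(f"{name}: {ratios[name]:.3f} events per session_start")
```

A proper session-level rate would need events grouped by session first, which is what the rest of the notebook works toward.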


### Observations
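- `view_item`, `add_to_cart`, and `begin_checkout` outline the potential purchase funnel.
- Purchase events exist and can be used as a session-level target (y).
- Event counts suggest strong class imbalance (expected in e-commerce): `begin_checkout` fires 38,757 times against 354,970 `session_start` events, roughly 11%.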
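
One possible way to turn the purchase event into a session-level target is sketched below. It rests on assumptions this notebook has not verified yet: that a session is identified by `user_pseudo_id` together with the `ga_session_id` event parameter (the usual GA4 export convention), and that purchases are logged under the event name `purchase`. The function name `build_session_target_query` and the column names `purchased`, `n_sessions`, `n_purchase_sessions`, and `purchase_rate` are illustrative.

```python
EVENTS_TABLE = "bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*"


def build_session_target_query(table=EVENTS_TABLE):
    """Build SQL that flags each session with purchased = 1 or 0.

    Assumption: a session is (user_pseudo_id, ga_session_id), with
    ga_session_id read from the int_value of the matching event parameter.
    """
    return f"""
WITH events AS (
  SELECT
    user_pseudo_id,
    (SELECT value.int_value
     FROM UNNEST(event_params)
     WHERE key = 'ga_session_id') AS ga_session_id,
    event_name
  FROM `{table}`
),
sessions AS (
  SELECT
    user_pseudo_id,
    ga_session_id,
    MAX(IF(event_name = 'purchase', 1, 0)) AS purchased
  FROM events
  WHERE ga_session_id IS NOT NULL
  GROUP BY user_pseudo_id, ga_session_id
)
SELECT
  COUNT(*) AS n_sessions,
  SUM(purchased) AS n_purchase_sessions,
  SAFE_DIVIDE(SUM(purchased), COUNT(*)) AS purchase_rate
FROM sessions
"""


session_target_query = build_session_target_query()
print(session_target_query)

# With the `client` created earlier in this notebook, the query could be run as:
# df_target = client.query(session_target_query).to_dataframe()
```

The resulting `purchase_rate` would quantify the class imbalance directly, rather than inferring it from raw event counts.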

In [4]:
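# Peek at a few raw rows to inspect the nested event_params field.
# LIMIT without ORDER BY returns arbitrary rows.
query = """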
SELECT
  event_name,
  event_timestamp,
  user_pseudo_id,
  event_params
FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`
LIMIT 5
"""

client.query(query).to_dataframe()




index,event_name,event_timestamp,user_pseudo_id,event_params
0,page_view,1609568188059459,1005484.1092567296,"[{'key': 'gclsrc', 'value': {'string_value': N..."
1,user_engagement,1609568195189041,1005484.1092567296,"[{'key': 'gclid', 'value': {'string_value': No..."
2,first_visit,1609568182969088,1005484.1092567296,"[{'key': 'page_title', 'value': {'string_value..."
3,page_view,1609568182969088,1005484.1092567296,"[{'key': 'all_data', 'value': {'string_value':..."
4,session_start,1609568182969088,1005484.1092567296,"[{'key': 'ga_session_number', 'value': {'strin..."
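
The output above shows `event_params` as a list of key/value records, each holding several typed value fields. The sketch below shows one way to read a parameter out of that structure in Python and combine it with `user_pseudo_id` into a session key. It is illustrative only: `get_event_param` and `session_key` are hypothetical helpers, the sample row is made up to mimic the structure shown above, and it assumes the session identifier lives under the `ga_session_id` key, which is not visible in the truncated output.

```python
VALUE_FIELDS = ("string_value", "int_value", "float_value", "double_value")


def get_event_param(event_params, key):
    """Return the first non-null typed value stored under `key`, else None."""
    for param in event_params:
        if param["key"] == key:
            value = param["value"]
            for field in VALUE_FIELDS:
                if value.get(field) is not None:
                    return value[field]
    return None


def session_key(user_pseudo_id, event_params):
    """Build a hypothetical session key '<user_pseudo_id>-<ga_session_id>'."""
    session_id = get_event_param(event_params, "ga_session_id")
    if session_id is None:
        return None
    return f"{user_pseudo_id}-{session_id}"


# Made-up parameters mimicking the nested structure (values are illustrative).
sample_params = [
    {"key": "ga_session_number",
     "value": {"string_value": None, "int_value": 1,
               "float_value": None, "double_value": None}},
    {"key": "ga_session_id",
     "value": {"string_value": None, "int_value": 1234567890,
               "float_value": None, "double_value": None}},
]

print(session_key("1005484.1092567296", sample_params))
```

On a real DataFrame the same helper could be applied row by row, although extracting the parameter in SQL with `UNNEST(event_params)` is likely to scale better on the full dataset.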
