![ga4](https://www.google-analytics.com/collect?v=2&tid=G-6VDTYWLKX6&cid=1&en=page_view&sid=1&dl=statmike%2Fvertex-ai-mlops%2Farchitectures%2Ftracking%2Fsetup%2Fga4&dt=GA4+Setup.ipynb)

## GA4 Setup

**Goal** 

Count how many times each document is viewed without tracking anything more than the view - no user information!

**Constraints**

This repository is primarily Markdown documents (`.md`) and Jupyter notebooks (`.ipynb`), both of which are static when viewed. Viewers include IDEs like VS Code, JupyterLab, GitHub.com, and Colab.

**Approach**

Include a tracking pixel as an image at the top of each document.  When a viewer renders the markdown it loads each image, `![](path/to/image)`, and that request is what gets counted.

The pixel's path is a GA4 Measurement Protocol URL that includes the tracking information.  The only information passed to the tracking pixel is the document name and its path within the repository.  **Use dummy values for session and user.**

**Storage**

Google Analytics can automatically export to BigQuery, either daily or by streaming.

---
## Google Analytics Setup (GA4)

First, create a Google Analytics Account and a Property:
- Go to [Google Analytics](https://analytics.google.com) and log in
- Go to `Admin` (lower left corner)
- `+ Create Account`
    1. Account name = vertex-ai-mlops, click `Next`
    2. Property name = github, click `Next`
    3. Optionally fill out business info, click `Create`
- Select the Account and Property, then on Property menu:
    - Click `Data Streams`
    - Click `Add stream` and select `Web`
        - Website URL is https://www.github.com, but can be anything!
        - Stream name = github
        - Make sure `Enhanced measurement` is selected
        - Click `Create stream`
        - You may be prompted to install your Google tag; dismiss this setup by clicking `X`
    - Note the Measurement ID for this stream
    - Retrieve the API secret for this Measurement ID
        - Under `Web stream details` select `Measurement Protocol API secrets`
            - Navigation if needed: Admin > Account > Property > Data Streams > github (name assigned above) > Measurement Protocol API secrets
        - Select `Create` (upper right)
        - Nickname = vertex-ai-mlops, Click `Create`
        - Note the Secret value (but do not store it in the notebook!)
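
One way to keep the secret out of the notebook is to read it from the environment at run time. The following is a minimal sketch, not part of the original setup: the environment variable name `GA4_API_SECRET` and the helper `get_api_secret` are hypothetical names chosen for illustration.

```python
import os
from getpass import getpass


def get_api_secret(env_var="GA4_API_SECRET"):
    """Return the secret from the environment, prompting only if it is unset.

    The variable name is an assumption for this sketch; use whatever name
    fits your environment.
    """
    secret = os.environ.get(env_var)
    if not secret:
        # getpass hides the typed value so it never appears in cell output.
        secret = getpass(f"Enter value for {env_var}: ")
    return secret


# Usage (left commented out so the cell never prompts unexpectedly):
# api_secret = get_api_secret()
```

Because the value only lives in memory, saving or sharing the notebook does not expose it.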

---
## Create Tracking Pixels

Tracking pixels are URLs built from the information to record plus the Measurement ID created during Google Analytics Setup.

>A seemingly not well documented version of the Measurement Protocol, version 2 (`&v=2`), exists.  I found these blogs and tips online for it:
>- https://www.optimizesmart.com/what-is-measurement-protocol-in-google-analytics-4-ga4/
>- https://stackoverflow.com/questions/59264782/analytics-track-custom-events-in-new-webapp




The main URL is: https://www.google-analytics.com/collect

Options are added to this URL as query parameters. The first parameter follows a `?` and each later parameter is joined with `&`:
- `v=2` - specifies version 2 of the Measurement Protocol
    - [Reference](https://developers.google.com/analytics/devguides/collection/protocol/v1/parameters#v)
- `tid=<value here>` - the Measurement ID, which points the information to the property created above
    - [Reference](https://developers.google.com/analytics/devguides/collection/protocol/v1/parameters#tid)
- `cid=1` - the user's client id, required when not sending `uid` (user id); here it is set to a dummy value of 1 for all users/clients
    - [Reference](https://developers.google.com/analytics/devguides/collection/protocol/v1/parameters#cid)
- `en=page_view` - the event name
    - [Reference](https://support.google.com/analytics/answer/9216061#)
- `sid=1` - the session id, which is required but set to a dummy value of 1 (no cookies)
    - [Reference](https://developers.google.com/analytics/devguides/collection/protocol/v1/parameters)
- `dt=` - the name of the file; make sure it is URL encoded (a space becomes `%20` or `+`)
    - [Reference](https://developers.google.com/analytics/devguides/collection/protocol/v1/parameters#dt)
- `dl=` - the path to the file; make sure it is URL encoded
    - [Reference](https://developers.google.com/analytics/devguides/collection/protocol/v1/parameters#dl)
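
The parameters above can be assembled with the standard library, which also takes care of the URL encoding. This is a minimal sketch: the function name `build_pixel_url` and the placeholder Measurement ID `G-XXXXXXXXXX` are assumptions for illustration, and the parameter order simply follows the pixel at the top of this document.

```python
from urllib.parse import urlencode

GA4_COLLECT_URL = "https://www.google-analytics.com/collect"


def build_pixel_url(measurement_id, doc_path, doc_title):
    """Build a tracking pixel URL from the parameters described above."""
    params = {
        "v": 2,                 # Measurement Protocol version
        "tid": measurement_id,  # Measurement ID of the data stream
        "cid": 1,               # dummy client id shared by all viewers
        "en": "page_view",      # event name
        "sid": 1,               # dummy session id
        "dl": doc_path,         # path to the file within the repository
        "dt": doc_title,        # name of the file
    }
    # urlencode percent-encodes "/" as %2F and encodes spaces as "+".
    return f"{GA4_COLLECT_URL}?{urlencode(params)}"


url = build_pixel_url(
    "G-XXXXXXXXXX",  # placeholder, replace with your own Measurement ID
    "statmike/vertex-ai-mlops/architectures/tracking/setup/ga4",
    "GA4 Setup.ipynb",
)
# The markdown image line that would sit at the top of a document:
print(f"![ga4]({url})")
```

Using `urlencode` rather than string concatenation avoids hand-encoding mistakes in file names that contain spaces or other reserved characters.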


The tracking pixels are automatically added to all `.md` and `.ipynb` files in this repository using the notebook [tracking_ga4_add.ipynb](./tracking_ga4_add.ipynb).
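
The contents of that notebook are not shown here, so the following is only a sketch of how such an insertion could work, not the repository's actual implementation. The helper names are hypothetical, and placing the pixel in its own leading markdown cell is an assumption made for this example.

```python
import json


def add_pixel_to_markdown(text, pixel_md):
    """Prepend the pixel line unless the document already starts with it."""
    if text.startswith(pixel_md):
        return text
    return f"{pixel_md}\n\n{text}"


def add_pixel_to_notebook(nb_json, pixel_md):
    """Insert a markdown cell holding the pixel as the first notebook cell.

    nb_json is the notebook file content as a JSON string. The notebook is
    left unchanged if its first cell already contains the pixel.
    """
    nb = json.loads(nb_json)
    cells = nb.get("cells", [])
    already_present = (
        bool(cells)
        and cells[0].get("cell_type") == "markdown"
        and pixel_md in "".join(cells[0].get("source", []))
    )
    if not already_present:
        cells.insert(
            0, {"cell_type": "markdown", "metadata": {}, "source": [pixel_md]}
        )
        nb["cells"] = cells
    return json.dumps(nb, indent=1)
```

Both helpers are idempotent, so running them over the repository more than once does not stack up duplicate pixels.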



---
## GA4 Export To BigQuery

This export is set up once and then runs continuously; it is not a one-time export.

**References**
- [GA4 BigQuery Export](https://support.google.com/analytics/answer/9358801?hl=en&utm_id=ad)
- [GA4 Setup BigQuery Export](https://support.google.com/analytics/answer/9823238?hl=en&ref_topic=9359001#zippy=%2Cin-this-article)

Setup Process:
- Note: Use the same login for Google Analytics and GCP.  This login needs owner access to the BigQuery project that will be used and the editor role on the Google Analytics Property created above.
- Go to [Google Analytics](https://analytics.google.com) and log in
- Go to `Admin` (lower left corner)
- Select Account = vertex-ai-mlops (created above)
- Select Property = github (created above)
    - Select `BigQuery Links` under `Product Links`
    - Click `Link`
    - Click `Choose a BigQuery project`
    - Select a project from the list, click `Confirm`
    - Select a location from the list, such as the US multi-region, click `Next`
    - Select `Configure data streams and events` and select the data stream named github (created above). No need to exclude any events.
    - Click `Done`
    - Select Frequency - both `Daily` and `Streaming`
    - Click `Next`
    - Click `Submit`


---
## Data in BigQuery

**Dataset**

The process above creates a dataset in the chosen BigQuery project that is named `analytics_########`, where `########` is the property id.

**Tables**

The daily tables are named `events_YYYYMMDD`.  
The streaming tables are named `events_intraday_YYYYMMDD`.

These are sharded tables that can be read individually or together using a wildcard.  The read can be filtered with a `WHERE` clause that uses `_TABLE_SUFFIX`. [Reference](https://cloud.google.com/bigquery/docs/querying-wildcard-tables)
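
As a sketch of the wildcard pattern, the helper below builds a query that counts `page_view` events per day across a range of daily tables. The function name and the example project and dataset values are hypothetical. With the `events_*` wildcard the suffix of an intraday table is `intraday_YYYYMMDD`, so a `BETWEEN` filter on plain date strings should leave those tables out.

```python
def daily_views_query(project, dataset, start, end):
    """Build a wildcard query over daily tables between two YYYYMMDD dates."""
    for value in (start, end):
        # Guard the interpolated values: only 8-digit date strings are allowed.
        if not (len(value) == 8 and value.isdigit()):
            raise ValueError(f"expected YYYYMMDD, got {value!r}")
    return (
        "SELECT event_date, COUNT(*) AS views\n"
        f"FROM `{project}.{dataset}.events_*`\n"
        f"WHERE _TABLE_SUFFIX BETWEEN '{start}' AND '{end}'\n"
        "  AND event_name = 'page_view'\n"
        "GROUP BY event_date\n"
        "ORDER BY event_date"
    )


# Hypothetical project and dataset names, for illustration only.
query = daily_views_query("my-project", "analytics_123456789", "20221124", "20221125")
print(query)
# With a BigQuery client named bq, the query could then be run with:
# bq.query(query = query).to_dataframe()
```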

---
## Review Data In BigQuery

The following subsections set up a Python session to interact with BigQuery and retrieve the GA4 datasets and tables for review within this notebook.

---
### Colab Setup

To run this notebook in Colab click [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/statmike/vertex-ai-mlops/blob/main/architectures/tracking/setup/ga4/GA4%20Setup.ipynb) and run the cells in this section.  Otherwise, skip this section.

This cell will authenticate to GCP (follow prompts in the popup). 

In [2]:
PROJECT_ID = 'vertex-ai-mlops-369716' # replace with project ID

In [3]:
# Runs only in Colab: outside Colab the import fails and the cell does nothing.
try:
    from google.colab import auth
    auth.authenticate_user()  # follow the prompts in the popup
    !gcloud config set project {PROJECT_ID}
except Exception:
    pass

Updated property [core/project].


---
### Setup

In [4]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'vertex-ai-mlops-369716'

In [5]:
BQ_PROJECT = PROJECT_ID

In [6]:
from google.cloud import bigquery

In [7]:
bq = bigquery.Client(project = PROJECT_ID)

### List BigQuery Datasets with GA4 Tables

GA4 exports create datasets whose names start with `analytics_` followed by the property id.

In [31]:
for d in list(bq.list_datasets()):
  if d.dataset_id.startswith('analytics_'):
    dataset = d
    print(dataset.dataset_id)
    break

analytics_343629755


In [32]:
dataset.dataset_id

'analytics_343629755'

In [33]:
dataset.full_dataset_id

'vertex-ai-mlops-369716:analytics_343629755'

### List BigQuery GA4 Tables in Dataset

In [34]:
for t in list(bq.list_tables(dataset)):
  print(t.full_table_id)

vertex-ai-mlops-369716:analytics_343629755.events_20221124
vertex-ai-mlops-369716:analytics_343629755.events_20221125
vertex-ai-mlops-369716:analytics_343629755.events_intraday_20221127
vertex-ai-mlops-369716:analytics_343629755.events_intraday_20221128
vertex-ai-mlops-369716:analytics_343629755.events_intraday_20221129
vertex-ai-mlops-369716:analytics_343629755.events_intraday_20221130
vertex-ai-mlops-369716:analytics_343629755.events_intraday_20221201
vertex-ai-mlops-369716:analytics_343629755.events_intraday_20221202
vertex-ai-mlops-369716:analytics_343629755.events_intraday_20221203
vertex-ai-mlops-369716:analytics_343629755.events_intraday_20221204
vertex-ai-mlops-369716:analytics_343629755.events_intraday_20221205
vertex-ai-mlops-369716:analytics_343629755.events_intraday_20221206
vertex-ai-mlops-369716:analytics_343629755.events_intraday_20221207
vertex-ai-mlops-369716:analytics_343629755.events_intraday_20221208
vertex-ai-mlops-369716:analytics_343629755.events_intraday_2022120

In [35]:
table = list(bq.list_tables(dataset))[-1]
table.full_table_id

'vertex-ai-mlops-369716:analytics_343629755.events_intraday_20230218'

In [36]:
bq.query(query = f"SELECT * FROM `{table.full_table_id.replace(':', '.')}` LIMIT 10").to_dataframe()

Unnamed: 0,event_date,event_timestamp,event_name,event_params,event_previous_timestamp,event_value_in_usd,event_bundle_sequence_id,event_server_timestamp_offset,user_id,user_pseudo_id,...,user_ltv,device,geo,app_info,traffic_source,stream_id,platform,event_dimensions,ecommerce,items
0,20230218,1676761164969530,page_view,"[{'key': 'page_title', 'value': {'string_value...",,,1637643834,,,1,...,"{'revenue': 0.0, 'currency': 'USD'}","{'category': 'desktop', 'mobile_brand_name': '...","{'continent': '(not set)', 'country': '', 'reg...",,,4308611918,WEB,,,[]
1,20230218,1676761166453183,page_view,"[{'key': 'page_title', 'value': {'string_value...",,,1639127487,,,1,...,"{'revenue': 0.0, 'currency': 'USD'}","{'category': 'desktop', 'mobile_brand_name': '...","{'continent': '(not set)', 'country': '', 'reg...",,,4308611918,WEB,,,[]
2,20230218,1676761530360230,page_view,"[{'key': 'page_title', 'value': {'string_value...",,,2003034534,,,1,...,"{'revenue': 0.0, 'currency': 'USD'}","{'category': 'desktop', 'mobile_brand_name': '...","{'continent': '(not set)', 'country': '', 'reg...",,,4308611918,WEB,,,[]
3,20230218,1676761536396303,page_view,"[{'key': 'ga_session_id', 'value': {'string_va...",,,2009070607,,,1,...,"{'revenue': 0.0, 'currency': 'USD'}","{'category': 'desktop', 'mobile_brand_name': '...","{'continent': '(not set)', 'country': '', 'reg...",,,4308611918,WEB,,,[]
4,20230218,1676762044808846,page_view,"[{'key': 'page_location', 'value': {'string_va...",,,-1777484146,,,1,...,"{'revenue': 0.0, 'currency': 'USD'}","{'category': 'desktop', 'mobile_brand_name': '...","{'continent': '(not set)', 'country': '', 'reg...",,,4308611918,WEB,,,[]
5,20230218,1676762483104543,page_view,"[{'key': 'page_location', 'value': {'string_va...",,,-1339188449,,,1,...,"{'revenue': 0.0, 'currency': 'USD'}","{'category': 'desktop', 'mobile_brand_name': '...","{'continent': '(not set)', 'country': '', 'reg...",,,4308611918,WEB,,,[]
6,20230218,1676762496533447,page_view,"[{'key': 'page_title', 'value': {'string_value...",,,-1325759545,,,1,...,"{'revenue': 0.0, 'currency': 'USD'}","{'category': 'desktop', 'mobile_brand_name': '...","{'continent': '(not set)', 'country': '', 'reg...",,,4308611918,WEB,,,[]
7,20230218,1676732542535220,page_view,"[{'key': 'page_location', 'value': {'string_va...",,,-1214986700,,,1,...,"{'revenue': 0.0, 'currency': 'USD'}","{'category': 'desktop', 'mobile_brand_name': N...","{'continent': '(not set)', 'country': '', 'reg...",,,4308611918,WEB,,,[]
8,20230218,1676732547057879,page_view,"[{'key': 'page_location', 'value': {'string_va...",,,-1210464041,,,1,...,"{'revenue': 0.0, 'currency': 'USD'}","{'category': 'desktop', 'mobile_brand_name': N...","{'continent': '(not set)', 'country': '', 'reg...",,,4308611918,WEB,,,[]
9,20230218,1676732738486707,page_view,"[{'key': 'page_title', 'value': {'string_value...",,,-1019035213,,,1,...,"{'revenue': 0.0, 'currency': 'USD'}","{'category': 'desktop', 'mobile_brand_name': N...","{'continent': '(not set)', 'country': '', 'reg...",,,4308611918,WEB,,,[]
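
The `event_params` column holds a list of key/value records, as the preview above shows, so the document name has to be pulled out of it before views can be counted. The following is a minimal sketch with hypothetical helper names and made-up sample rows. It assumes each record has the `key` and `value` fields seen above, with the value stored under one of `string_value`, `int_value`, `float_value`, or `double_value`.

```python
from collections import Counter


def get_param(event_params, key):
    """Return the value stored for key in one row's event_params, or None."""
    for param in event_params:
        if param["key"] == key:
            value = param["value"]
            # Only one of the typed fields is expected to be populated.
            for field in ("string_value", "int_value", "float_value", "double_value"):
                if value.get(field) is not None:
                    return value[field]
    return None


def count_views_by_title(rows):
    """Count rows per page_title, given an iterable of event_params lists."""
    return Counter(get_param(params, "page_title") for params in rows)


# Made-up sample rows shaped like the event_params column, for illustration only.
sample_rows = [
    [{"key": "page_title", "value": {"string_value": "readme.md", "int_value": None}}],
    [{"key": "page_title", "value": {"string_value": "GA4 Setup.ipynb", "int_value": None}}],
    [{"key": "page_title", "value": {"string_value": "readme.md", "int_value": None}}],
]
print(count_views_by_title(sample_rows))
```

With a dataframe like the one returned above, the same call could be made on the column directly, for example `count_views_by_title(df["event_params"])`, where `df` is a hypothetical variable holding the query result.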


### Review Table In Console

This creates a hyperlink directly to the table retrieved above:

In [37]:
print(f"Review Table In Console:\nhttps://console.cloud.google.com/bigquery?project={PROJECT_ID}&ws=!1m5!1m4!4m3!1s{BQ_PROJECT}!2s{dataset.dataset_id}!3s{table.table_id}")

Review Table In Console:
https://console.cloud.google.com/bigquery?project=vertex-ai-mlops-369716&ws=!1m5!1m4!4m3!1svertex-ai-mlops-369716!2sanalytics_343629755!3sevents_intraday_20230218
