![tracker](https://us-central1-vertex-ai-mlops-369716.cloudfunctions.net/pixel-tracking?path=statmike%2Fvertex-ai-mlops%2Farchitectures%2Ftracking%2Fsetup%2Fpixel&file=prod-tracking-pixel.ipynb)
<!--- header table --->
<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/statmike/vertex-ai-mlops/blob/main/architectures/tracking/setup/pixel/prod-tracking-pixel.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Google Colaboratory logo">
      <br>Run in<br>Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https%3A%2F%2Fraw.githubusercontent.com%2Fstatmike%2Fvertex-ai-mlops%2Fmain%2Farchitectures%2Ftracking%2Fsetup%2Fpixel%2Fprod-tracking-pixel.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo">
      <br>Run in<br>Colab Enterprise
    </a>
  </td>      
  <td style="text-align: center">
    <a href="https://github.com/statmike/vertex-ai-mlops/blob/main/architectures/tracking/setup/pixel/prod-tracking-pixel.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      <br>View on<br>GitHub
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/statmike/vertex-ai-mlops/main/architectures/tracking/setup/pixel/prod-tracking-pixel.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      <br>Open in<br>Vertex AI Workbench
    </a>
  </td>
</table>

# Production: Custom Tracking With Data Privacy

This builds on the learning and experimentation in:
- [developing-tracking-pixel](../../pixel/developing-tracking-pixel.ipynb)

**Note:** This was developed to replace and migrate away from the GA4 method covered in [../ga4/readme.md](../ga4/readme.md)

Read more at: [readme](./readme.md)


## Production

Create BigQuery Table:
- `pixel-tracking-data`

Create PubSub Topic:
- `pixel-tracking-data`

Create two Cloud Functions:
- `pixel-tracking`
    - receives a request, validates that it comes from the repository, and returns a generic tracking pixel
    - sends the validated request to the Pub/Sub topic `pixel-tracking-data`
- `pixel-tracking-data`
    - receives data via the Pub/Sub topic `pixel-tracking-data`
    - appends records to the BigQuery table `pixel-tracking-data`
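
The division of work between the two functions can be sketched without any cloud services. The following is a minimal in-memory illustration, not the deployed code: `collect`, `load`, `queue`, and `table` are hypothetical stand-ins for the `pixel-tracking` function, the `pixel-tracking-data` function, the Pub/Sub topic, and the BigQuery table, and the repository check is simplified to a prefix test.

```python
import json
from collections import deque

queue = deque()   # stand-in for the Pub/Sub topic (assumption for illustration)
table = []        # stand-in for the BigQuery table (assumption for illustration)

def collect(path: str, file: str) -> bool:
    """Stand-in for `pixel-tracking`: validate the request, then publish it."""
    if not path.startswith("statmike"):   # simplified repository check
        return False
    message = json.dumps({"file_path": path, "file_name": file}).encode("utf-8")
    queue.append(message)
    return True

def load() -> int:
    """Stand-in for `pixel-tracking-data`: append each queued message as a row."""
    count = 0
    while queue:
        table.append(json.loads(queue.popleft().decode("utf-8")))
        count += 1
    return count

accepted = collect("statmike/vertex-ai-mlops", "example.ipynb")
rejected = collect("someone-else/repo", "example.ipynb")
loaded = load()
print(accepted, rejected, loaded, table)
```

Keeping the collector and the loader apart means the public endpoint only needs permission to publish, while the loader alone writes to the table.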

---
## Colab Setup

When running this notebook in [Colab](https://colab.google/) or [Colab Enterprise](https://cloud.google.com/colab/docs/introduction), this section will authenticate to GCP (follow prompts in the popup) and set the current project for the session.

In [1]:
PROJECT_ID = 'statmike-mlops-349915' # replace with project ID

In [2]:
try:
    from google.colab import auth
    auth.authenticate_user(project_id = PROJECT_ID)
    print('Colab authorized to GCP')
except Exception:
    print('Not a Colab Environment')
    pass

Not a Colab Environment


---
## Installs

The list `packages` contains tuples of package import names and install names. If the import name is not found, then the install name is used to install the package quietly for the current user.

In [3]:
# tuples of (import name, install name[, minimum version])
packages = [
    ('google.cloud.functions', 'google-cloud-functions'),
    ('google.cloud.storage', 'google-cloud-storage'),
    ('google.cloud.pubsub', 'google-cloud-pubsub'),
    ('google.cloud.bigquery', 'google-cloud-bigquery'),
    ('flask', 'flask'),
    ('PIL', 'Pillow')
]

# submodules must be imported explicitly to be available as attributes
import importlib.util
import importlib.metadata

def version_key(version):
    # compare dotted versions numerically so that '10.0' sorts after '9.0'
    return tuple(int(part) for part in version.split('.') if part.isdigit())

install = False
for package in packages:
    if not importlib.util.find_spec(package[0]):
        print(f'installing package {package[1]}')
        install = True
        !pip install {package[1]} -U -q --user
    elif len(package) == 3:
        # installed versions are looked up by distribution (install) name
        if version_key(importlib.metadata.version(package[1])) < version_key(package[2]):
            print(f'updating package {package[1]}')
            install = True
            !pip install {package[1]} -U -q --user

## API Enablement

In [4]:
!gcloud services enable cloudfunctions.googleapis.com
!gcloud services enable run.googleapis.com
!gcloud services enable pubsub.googleapis.com
!gcloud services enable eventarc.googleapis.com

### Restart Kernel (If Installs Occurred)

After a kernel restart, execution can resume from the cell that follows this one.

In [5]:
if install:
    import IPython
    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)

---
## Setup

Inputs

In [6]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'statmike-mlops-349915'

In [9]:
REGION = 'us-central1'

SERIES = 'tracking'
EXPERIMENT = 'pixel'

GCS_BUCKET = PROJECT_ID

Packages

In [10]:
import os
import json
import datetime

import PIL, PIL.Image
import zipfile

from google.cloud import functions_v2
from google.cloud import pubsub_v1
from google.cloud import storage
from google.cloud import bigquery

Clients

In [11]:
# cloud functions
functions_client = functions_v2.FunctionServiceClient()

# pubsub client
pubsub_pubclient = pubsub_v1.PublisherClient()

# gcs
gcs = storage.Client(project = PROJECT_ID)
bucket = gcs.bucket(GCS_BUCKET)

# bigquery
bq = bigquery.Client(project = PROJECT_ID)

Parameters

In [12]:
DIR = f"cloud-functions"

In [13]:
SERVICE_ACCOUNT = !gcloud config list --format='value(core.account)' 
SERVICE_ACCOUNT = SERVICE_ACCOUNT[0]
SERVICE_ACCOUNT

'1026793852137-compute@developer.gserviceaccount.com'

Environment

In [14]:
if not os.path.exists(DIR):
    os.makedirs(DIR)

---
## Create BigQuery Table: `pixel-tracking-data`

### Create A BigQuery Dataset

Create a new [BigQuery Dataset](https://cloud.google.com/bigquery/docs/datasets) as a working location for this workflow:

In [16]:
job = bq.query(f'''
    CREATE SCHEMA IF NOT EXISTS `{PROJECT_ID}.pixel_tracking`
        OPTIONS(
            location = 'US'
        )''')
job.result()

<google.cloud.bigquery.table._EmptyRowIterator at 0x7f503b6fe860>

### Create A BigQuery Table

The table uses a simple schema and is partitioned by day on `event_timestamp`.

In [17]:
job = bq.query(f'''
CREATE OR REPLACE TABLE `{PROJECT_ID}.pixel_tracking.pixel-tracking-data` (event_timestamp TIMESTAMP, file_path STRING, file_name STRING, client STRING, source STRING)
PARTITION BY TIMESTAMP_TRUNC(event_timestamp, DAY)
''')
job.result()

<google.cloud.bigquery.table._EmptyRowIterator at 0x7f501a4a67a0>
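
Daily partitioning groups rows by the UTC calendar day of `event_timestamp`. As a small illustration, assuming the timestamp string format that the functions in this notebook send (`%Y-%m-%d %H:%M:%S.%f %Z`), the hypothetical helper `partition_day` shows which daily partition a given event would fall into. It mirrors the day truncation locally and does not query BigQuery.

```python
import datetime

# format assumed from the functions later in this notebook
FORMAT = "%Y-%m-%d %H:%M:%S.%f %Z"

def partition_day(event_timestamp: str) -> datetime.date:
    """Hypothetical helper: the UTC day an event timestamp string belongs to."""
    parsed = datetime.datetime.strptime(event_timestamp, FORMAT)
    return parsed.date()

# build an example timestamp string the same way the functions do
stamp = datetime.datetime(
    2024, 3, 29, 15, 45, 32, 867219, tzinfo=datetime.timezone.utc
).strftime(FORMAT)

print(stamp)
print(partition_day(stamp))
```

Queries that filter on `event_timestamp` can then be limited to the partitions for the days they need.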

## Create Pubsub Topic: `pixel-tracking-data`

The main concepts:
- Topic - a feed of messages
     - Publish - send a new message to a topic
     - Subscription - receive messages that arrive on a topic
          - Push - the subscriber has new messages pushed to it
          - Pull - the subscriber requests new messages by pulling them

In this example, a topic will be set up for receiving new event entries for the tracking pixel. Publishing a new message to this topic will trigger a data load to BigQuery by the Cloud Function (set up below). The Cloud Function will have a push subscription to the topic.
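
The difference between push and pull delivery can be illustrated with a toy in-memory model. This is only a sketch of the concepts: `ToyTopic` and its methods are hypothetical and do not reflect the Pub/Sub client API, acknowledgements, or retries.

```python
from collections import deque
from typing import Callable, List

class ToyTopic:
    """In-memory illustration of a topic with push and pull subscriptions."""

    def __init__(self) -> None:
        self.push_handlers: List[Callable[[bytes], None]] = []
        self.pull_backlog: deque = deque()

    def subscribe_push(self, handler: Callable[[bytes], None]) -> None:
        # register a handler that is called for every new message
        self.push_handlers.append(handler)

    def publish(self, message: bytes) -> None:
        # push: each handler is called as soon as the message arrives
        for handler in self.push_handlers:
            handler(message)
        # pull: the message waits until a subscriber asks for it
        self.pull_backlog.append(message)

    def pull(self) -> List[bytes]:
        # return and clear everything that has accumulated so far
        messages = list(self.pull_backlog)
        self.pull_backlog.clear()
        return messages

received_by_push = []
topic_demo = ToyTopic()
topic_demo.subscribe_push(received_by_push.append)
topic_demo.publish(b"event-1")
topic_demo.publish(b"event-2")
pulled = topic_demo.pull()
print(received_by_push, pulled)
```

With push delivery the handler runs once per message without polling, which is the behavior relied on for the data-loading function.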

In [18]:
PUBSUB_TOPIC = 'pixel-tracking-data'
PUBSUB_TOPIC

'pixel-tracking-data'

In [19]:
try:
    topic = pubsub_pubclient.get_topic(
        topic = pubsub_pubclient.topic_path(PROJECT_ID, PUBSUB_TOPIC)
    )
    print(topic.name)
except Exception:
    topic = pubsub_pubclient.create_topic(
        name = pubsub_pubclient.topic_path(PROJECT_ID, PUBSUB_TOPIC)
    )
    print(topic.name)   

projects/statmike-mlops-349915/topics/pixel-tracking-data


## Create Cloud Function: `pixel-tracking-data`

In [23]:
function_name = 'pixel-tracking-data'

In [24]:
if not os.path.exists(DIR + f'/{function_name}'):
    os.makedirs(DIR + f'/{function_name}')

In [43]:
%%writefile {DIR}/{function_name}/main.py
import base64
import json
from google.cloud import bigquery

PROJECT_ID = 'statmike-mlops-349915'
bq = bigquery.Client(project = PROJECT_ID)

# Triggered by a message on the Pub/Sub topic, which is sent by the Cloud Function that collects events
def pixel_tracking_data(event, context):
    
    # decode the data input, convert to python dictionary
    function_input = json.loads(
        base64.b64decode(event['data']).decode('utf-8')
    )
    
    load_job = bq.load_table_from_json(
        json_rows = [function_input],
        destination = bigquery.TableReference(
            dataset_ref = bigquery.DatasetReference(PROJECT_ID, 'pixel_tracking'),
            table_id = f'pixel-tracking-data'
        ),
        job_config = bigquery.LoadJobConfig(
            source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
            write_disposition = bigquery.WriteDisposition.WRITE_APPEND
        )
    )
    load_job.result()

Overwriting cloud-functions/pixel-tracking-data/main.py
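
The decoding step can be checked locally before deploying. The sketch below builds a fake event in the shape the function reads (a dictionary whose `data` value is base64-encoded JSON) and decodes it the same way. `decode_event` and `fake_event` are hypothetical names used only for this illustration, and no BigQuery call is made.

```python
import base64
import json

def decode_event(event: dict) -> dict:
    """Decode the base64 payload and parse it as JSON, as the function does."""
    return json.loads(base64.b64decode(event["data"]).decode("utf-8"))

record = {
    "event_timestamp": "2024-03-29 15:45:32.867219 UTC",
    "file_path": "statmike/path/to/file",
    "file_name": "my notebook.ipynb",
    "client": "test",
    "source": "test",
}

# a published message body arrives base64-encoded under the 'data' key
fake_event = {
    "data": base64.b64encode(json.dumps(record).encode("utf-8")).decode("utf-8")
}

decoded = decode_event(fake_event)
print(decoded)
```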


In [44]:
%%writefile {DIR}/{function_name}/requirements.txt
google-cloud-bigquery

Overwriting cloud-functions/pixel-tracking-data/requirements.txt


In [45]:
with zipfile.ZipFile(f'{DIR}/{function_name}/{function_name}.zip', mode = 'w') as archive:
    archive.write(f'{DIR}/{function_name}/main.py', 'main.py')
    archive.write(f'{DIR}/{function_name}/requirements.txt', 'requirements.txt')

In [46]:
with zipfile.ZipFile(f'{DIR}/{function_name}/{function_name}.zip', mode = 'r') as zip:
    zip.printdir()

File Name                                             Modified             Size
main.py                                        2024-03-29 15:15:38          935
requirements.txt                               2024-03-29 15:15:40           22


In [47]:
blob = bucket.blob(f'architectures/{SERIES}/{EXPERIMENT}/{function_name}.zip')
blob.upload_from_filename(f'{DIR}/{function_name}/{function_name}.zip')

In [56]:
try:
    function = functions_client.get_function(
        name = f"projects/{PROJECT_ID}/locations/{REGION}/functions/{function_name.replace('_', '-')}"
    )
except Exception:
    function = ''
    
function

name: "projects/statmike-mlops-349915/locations/us-central1/functions/pixel-tracking-data"
build_config {
  build: "projects/1026793852137/locations/us-central1/builds/cc1f5a20-5118-4c7e-bb54-c232f8ef99f8"
  runtime: "python312"
  entry_point: "pixel_tracking_data"
  source {
    storage_source {
      bucket: "gcf-v2-sources-1026793852137-us-central1"
      object_: "pixel-tracking-data/function-source.zip"
      generation: 1711725415878589
    }
  }
  docker_repository: "projects/statmike-mlops-349915/locations/us-central1/repositories/gcf-artifacts"
  source_provenance {
    resolved_storage_source {
      bucket: "gcf-v2-sources-1026793852137-us-central1"
      object_: "pixel-tracking-data/function-source.zip"
      generation: 1711725415878589
    }
  }
  docker_registry: ARTIFACT_REGISTRY
}
service_config {
  service: "projects/statmike-mlops-349915/locations/us-central1/services/pixel-tracking-data"
  timeout_seconds: 60
  max_instance_count: 50
  ingress_settings: ALLOW_ALL
 

In [50]:
functionDef = functions_v2.Function(
    name = f"projects/{PROJECT_ID}/locations/{REGION}/functions/{function_name.replace('_', '-')}",
    build_config = functions_v2.BuildConfig(
        runtime = 'python312',
        entry_point = f"{function_name.replace('-', '_')}",
        source = functions_v2.Source(
            storage_source = functions_v2.StorageSource(
                bucket = bucket.name,
                object_ = blob.name
            )
        ),
    ),
    service_config = functions_v2.ServiceConfig(
        timeout_seconds = 60,
        available_memory = '128Mi',
        max_instance_count = 50,
        max_instance_request_concurrency = 1
    ),
    event_trigger = functions_v2.EventTrigger(
        event_type = 'google.cloud.pubsub.topic.v1.messagePublished',
        pubsub_topic = topic.name
    ),
    environment = functions_v2.Environment(2) 
)

In [51]:
topic.name

'projects/statmike-mlops-349915/topics/pixel-tracking-data'

In [52]:
function_name

'pixel-tracking-data'

In [53]:
if function:
    operation = functions_client.update_function(
        function = functionDef
    )
else:
    operation = functions_client.create_function(
        parent = f"projects/{PROJECT_ID}/locations/{REGION}",
        function = functionDef,
        function_id = function_name.replace('_', '-')
    )

In [54]:
response = operation.result()
print(response)

name: "projects/statmike-mlops-349915/locations/us-central1/functions/pixel-tracking-data"
build_config {
  build: "projects/1026793852137/locations/us-central1/builds/cc1f5a20-5118-4c7e-bb54-c232f8ef99f8"
  runtime: "python312"
  entry_point: "pixel_tracking_data"
  source {
    storage_source {
      bucket: "gcf-v2-sources-1026793852137-us-central1"
      object_: "pixel-tracking-data/function-source.zip"
      generation: 1711725415878589
    }
  }
  docker_repository: "projects/statmike-mlops-349915/locations/us-central1/repositories/gcf-artifacts"
  source_provenance {
    resolved_storage_source {
      bucket: "gcf-v2-sources-1026793852137-us-central1"
      object_: "pixel-tracking-data/function-source.zip"
      generation: 1711725415878589
    }
  }
  docker_registry: ARTIFACT_REGISTRY
}
service_config {
  service: "projects/statmike-mlops-349915/locations/us-central1/services/pixel-tracking-data"
  timeout_seconds: 60
  max_instance_count: 50
  ingress_settings: ALLOW_ALL
 

In [55]:
response.name

'projects/statmike-mlops-349915/locations/us-central1/functions/pixel-tracking-data'

## Create Cloud Function: `pixel-tracking`

In [161]:
function_name = 'pixel-tracking'

In [168]:
if not os.path.exists(DIR + f'/{function_name}'):
    os.makedirs(DIR + f'/{function_name}')

### Create A Pixel

An RGB value of (0, 0, 0) is black, and an alpha channel of 0 makes the pixel fully transparent.

In [162]:
pixel = PIL.Image.new(mode = 'RGBA', size = (1,1), color = (0,0,0,0))

In [163]:
pixel.save(f'{DIR}/{function_name}/pixel.png')
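
To make the file contents concrete, the sketch below assembles the bytes of a 1x1 fully transparent RGBA PNG using only the standard library and reads the header back. It illustrates the file layout and is not a replacement for the Pillow call above; `png_chunk` and `transparent_pixel_png` are hypothetical helper names, and the bytes Pillow writes may differ in compression details.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def png_chunk(kind: bytes, data: bytes) -> bytes:
    """A PNG chunk: 4-byte length, 4-byte type, data, CRC-32 over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)

def transparent_pixel_png() -> bytes:
    # IHDR: width 1, height 1, bit depth 8, color type 6 (RGBA),
    # then compression, filter, and interlace methods all 0
    header = struct.pack(">IIBBBBB", 1, 1, 8, 6, 0, 0, 0)
    # one scanline: filter byte 0 followed by R, G, B, A all zero
    scanline = b"\x00" + b"\x00\x00\x00\x00"
    return (
        PNG_SIGNATURE
        + png_chunk(b"IHDR", header)
        + png_chunk(b"IDAT", zlib.compress(scanline))
        + png_chunk(b"IEND", b"")
    )

png_bytes = transparent_pixel_png()

# the IHDR data starts after the 8-byte signature, 4-byte length, and 4-byte type
width, height, bit_depth, color_type = struct.unpack(">IIBB", png_bytes[16:26])
print(width, height, bit_depth, color_type)
```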

### Create Custom Service Account

This will be a public-facing Cloud Function, so reduce its permissions to the bare minimum: the ability to publish to Pub/Sub. The binding below grants `roles/pubsub.publisher` at the project level.

In [152]:
!gcloud iam service-accounts create pixel-tracking --display-name='pixel-tracking'

Created service account [pixel-tracking].


In [164]:
!gcloud iam service-accounts list --filter='display_name=pixel-tracking'

DISPLAY NAME    EMAIL                                                         DISABLED
pixel-tracking  pixel-tracking@statmike-mlops-349915.iam.gserviceaccount.com  False


In [165]:
!gcloud projects add-iam-policy-binding $PROJECT_ID \
--member="serviceAccount:pixel-tracking@{PROJECT_ID}.iam.gserviceaccount.com" \
--role="roles/pubsub.publisher"

Updated IAM policy for project [statmike-mlops-349915].
bindings:
- members:
  - serviceAccount:service-1026793852137@gcp-sa-aiplatform-cc.iam.gserviceaccount.com
  role: roles/aiplatform.customCodeServiceAgent
- members:
  - serviceAccount:service-1026793852137@gcp-sa-aiplatform-vm.iam.gserviceaccount.com
  role: roles/aiplatform.notebookServiceAgent
- members:
  - serviceAccount:service-1026793852137@gcp-sa-aiplatform.iam.gserviceaccount.com
  role: roles/aiplatform.serviceAgent
- members:
  - serviceAccount:bqcx-1026793852137-bmph@gcp-sa-bigquery-condel.iam.gserviceaccount.com
  - serviceAccount:bqcx-1026793852137-d2h9@gcp-sa-bigquery-condel.iam.gserviceaccount.com
  - serviceAccount:bqcx-1026793852137-dyw1@gcp-sa-bigquery-condel.iam.gserviceaccount.com
  - serviceAccount:bqcx-1026793852137-pdxa@gcp-sa-bigquery-condel.iam.gserviceaccount.com
  - serviceAccount:bqcx-1026793852137-te86@gcp-sa-bigquery-condel.iam.gserviceaccount.com
  - serviceAccount:bqcx-1026793852137-tqpc@gcp-sa-big

In [166]:
!gcloud projects get-iam-policy $PROJECT_ID \
--filter="bindings.members:pixel-tracking@{PROJECT_ID}.iam.gserviceaccount.com" \
--format='table(bindings.role)' --flatten="bindings[].members"

ROLE
roles/pubsub.publisher


In [167]:
# delete the service account:
#!gcloud iam service-accounts delete 'pixel-tracking@{PROJECT_ID}.iam.gserviceaccount.com' -q

### Create The Function

In [169]:
%%writefile {DIR}/{function_name}/main.py
import datetime
import json
import flask
from flask import abort
from google.cloud import pubsub_v1

PROJECT_ID = 'statmike-mlops-349915'
pubsub_pubclient = pubsub_v1.PublisherClient() 

def pixel_tracking(request: flask.Request) -> flask.Response:
    
    repo_path = request.args.get('path', 'direct')
    repo_file = request.args.get('file', 'direct')
    application = request.headers.get('User-Agent')
    source = 'custom'
    
    if repo_path.startswith('statmike') and len(repo_path) < 500:
        if len(repo_file) > 5 and len(repo_file) < 500:
            data = dict(
                event_timestamp = datetime.datetime.now(datetime.timezone.utc).strftime("%Y-%m-%d %H:%M:%S.%f %Z"),
                file_path = repo_path,
                file_name = repo_file,
                client = application,
                source = source
            )
            message = json.dumps(data).encode('utf-8')
            future = pubsub_pubclient.publish(
                f'projects/{PROJECT_ID}/topics/pixel-tracking-data',
                message,
                trigger = 'manual'
            )
        else:
            return abort(406) # not acceptable
    else:
        return abort(404) # not found
    
    return flask.send_file('pixel.png', max_age=0)

Overwriting cloud-functions/pixel-tracking/main.py
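
The validation rules inside `pixel_tracking` can be exercised without Flask or Pub/Sub. The sketch below restates them as a pure function that returns the HTTP status the request would receive. `validation_status` is a hypothetical helper for illustration and is not part of the deployed code.

```python
def validation_status(repo_path: str = "direct", repo_file: str = "direct") -> int:
    """Return the HTTP status implied by the path and file checks.

    The defaults mirror the 'direct' fallback used when a query
    parameter is missing from the request.
    """
    if repo_path.startswith("statmike") and len(repo_path) < 500:
        if 5 < len(repo_file) < 500:
            return 200   # accepted: the event is published and the pixel returned
        return 406       # file name too short or too long
    return 404           # path does not belong to the repository

print(validation_status("statmike", "tester"))
print(validation_status("notstatmike", "tester"))
print(validation_status("statmike", "a.py"))
print(validation_status())
```

Note that a request with no query parameters falls through to the 404 branch, because the default path `direct` does not start with `statmike`.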


In [170]:
%%writefile {DIR}/{function_name}/requirements.txt
google-cloud-pubsub

Overwriting cloud-functions/pixel-tracking/requirements.txt


In [171]:
with zipfile.ZipFile(f'{DIR}/{function_name}/{function_name}.zip', mode = 'w') as archive:
    archive.write(f'{DIR}/{function_name}/main.py', 'main.py')
    archive.write(f'{DIR}/{function_name}/requirements.txt', 'requirements.txt')
    archive.write(f'{DIR}/{function_name}/pixel.png', 'pixel.png')

In [172]:
with zipfile.ZipFile(f'{DIR}/{function_name}/{function_name}.zip', mode = 'r') as zip:
    zip.printdir()

File Name                                             Modified             Size
main.py                                        2024-03-29 16:50:06         1272
requirements.txt                               2024-03-29 16:50:14           20
pixel.png                                      2024-03-29 16:49:46           70


In [173]:
blob = bucket.blob(f'architectures/{SERIES}/{EXPERIMENT}/{function_name}.zip')
blob.upload_from_filename(f'{DIR}/{function_name}/{function_name}.zip')

In [174]:
try:
    function = functions_client.get_function(
        name = f"projects/{PROJECT_ID}/locations/{REGION}/functions/{function_name.replace('_', '-')}"
    )
except Exception:
    function = ''
    
function

name: "projects/statmike-mlops-349915/locations/us-central1/functions/pixel-tracking"
build_config {
  build: "projects/1026793852137/locations/us-central1/builds/1a8a0406-6287-42ce-abbf-2cab4ea8f2ef"
  runtime: "python312"
  entry_point: "pixel_tracking"
  source {
    storage_source {
      bucket: "gcf-v2-sources-1026793852137-us-central1"
      object_: "pixel-tracking/function-source.zip"
      generation: 1711728461470537
    }
  }
  docker_repository: "projects/statmike-mlops-349915/locations/us-central1/repositories/gcf-artifacts"
  source_provenance {
    resolved_storage_source {
      bucket: "gcf-v2-sources-1026793852137-us-central1"
      object_: "pixel-tracking/function-source.zip"
      generation: 1711728461470537
    }
  }
  docker_registry: ARTIFACT_REGISTRY
}
service_config {
  service: "projects/statmike-mlops-349915/locations/us-central1/services/pixel-tracking"
  timeout_seconds: 10
  max_instance_count: 10
  ingress_settings: ALLOW_ALL
  uri: "https://pixel-trac

In [175]:
functionDef = functions_v2.Function(
    name = f"projects/{PROJECT_ID}/locations/{REGION}/functions/{function_name.replace('_', '-')}",
    build_config = functions_v2.BuildConfig(
        runtime = 'python312',
        entry_point = f"{function_name.replace('-', '_')}",
        source = functions_v2.Source(
            storage_source = functions_v2.StorageSource(
                bucket = bucket.name,
                object_ = blob.name
            )
        ),
    ),
    service_config = functions_v2.ServiceConfig(
        timeout_seconds = 10,
        available_memory = '128Mi',
        max_instance_count = 10,
        max_instance_request_concurrency = 1,
        service_account_email = f'pixel-tracking@{PROJECT_ID}.iam.gserviceaccount.com'
    ),
    environment = functions_v2.Environment(2) 
)

In [176]:
topic.name

'projects/statmike-mlops-349915/topics/pixel-tracking-data'

In [177]:
function_name

'pixel-tracking'

In [178]:
if function:
    operation = functions_client.update_function(
        function = functionDef
    )
else:
    operation = functions_client.create_function(
        parent = f"projects/{PROJECT_ID}/locations/{REGION}",
        function = functionDef,
        function_id = function_name.replace('_', '-')
    )

In [179]:
response = operation.result()
print(response)

name: "projects/statmike-mlops-349915/locations/us-central1/functions/pixel-tracking"
build_config {
  build: "projects/1026793852137/locations/us-central1/builds/14497353-bc5f-4cdc-b3cb-947e09670685"
  runtime: "python312"
  entry_point: "pixel_tracking"
  source {
    storage_source {
      bucket: "gcf-v2-sources-1026793852137-us-central1"
      object_: "pixel-tracking/function-source.zip"
      generation: 1711731227681419
    }
  }
  docker_repository: "projects/statmike-mlops-349915/locations/us-central1/repositories/gcf-artifacts"
  source_provenance {
    resolved_storage_source {
      bucket: "gcf-v2-sources-1026793852137-us-central1"
      object_: "pixel-tracking/function-source.zip"
      generation: 1711731227681419
    }
  }
  docker_registry: ARTIFACT_REGISTRY
}
service_config {
  service: "projects/statmike-mlops-349915/locations/us-central1/services/pixel-tracking"
  timeout_seconds: 10
  max_instance_count: 10
  ingress_settings: ALLOW_ALL
  uri: "https://pixel-trac

In [180]:
response.name

'projects/statmike-mlops-349915/locations/us-central1/functions/pixel-tracking'

In [181]:
response.url

'https://us-central1-statmike-mlops-349915.cloudfunctions.net/pixel-tracking'

### Make Public

- https://cloud.google.com/functions/docs/securing/managing-access-iam#allowing_unauthenticated_http_function_invocation

In [182]:
!gcloud run services get-iam-policy {function_name.replace('_', '-')} --region=$REGION #--format=json

bindings:
- members:
  - allUsers
  role: roles/run.invoker
etag: BwYUztvsC90=
version: 1


In [183]:
#json.loads(''.join(test))

If `allUsers` with `roles/run.invoker` is missing, then add it:

In [184]:
!gcloud run services add-iam-policy-binding {function_name.replace('_', '-')} --region=$REGION --member="allUsers" --role="roles/run.invoker"

Updated IAM policy for service [pixel-tracking].
bindings:
- members:
  - allUsers
  role: roles/run.invoker
etag: BwYUz4AEwpM=
version: 1


To remove open access use the following:

Note: it can take a minute before requests stop being accepted.

In [185]:
#!gcloud run services remove-iam-policy-binding {function_name.replace('_', '-')} --region=$REGION --member="allUsers" --role="roles/run.invoker"

## Test Tracking

- directly test cloud function `pixel-tracking-data`
- directly test cloud function `pixel-tracking`
- remove test data from BigQuery table

### Test `pixel-tracking-data` directly

In [130]:
data = dict(
    event_timestamp = datetime.datetime.now(datetime.timezone.utc).strftime("%Y-%m-%d %H:%M:%S.%f %Z"),
    file_path = 'statmike/path/to/file',
    file_name = f'my notebook.ipynb',
    client = 'test',
    source = 'test'
)

In [131]:
future = pubsub_pubclient.publish(
    topic.name,
    json.dumps(data).encode('utf-8'),
    trigger = 'manual'
)
future.result()

'10808208139134898'

Note - it can take a few seconds for the insert to happen

In [134]:
bq.query(f'SELECT * FROM `{PROJECT_ID}.pixel_tracking.pixel-tracking-data`').to_dataframe()

Unnamed: 0,event_timestamp,file_path,file_name,client,source
0,2024-03-29 15:45:32.867219+00:00,statmike/path/to/file,my notebook.ipynb,test,test
1,2024-03-29 15:45:32.867219+00:00,statmike/path/to/file,my notebook.ipynb,test,test
2,2024-03-29 16:09:10.197998+00:00,statmike/path/to/file,my notebook.ipynb,test,test


## Test System Via HTTPS

In [186]:
response.url

'https://us-central1-statmike-mlops-349915.cloudfunctions.net/pixel-tracking'

In [191]:
test_response = !curl -s '{response.url}?path=statmike&file=newtest.ipynb'
test_response

['�PNG',
 '\x1a',
 '\x00\x00\x00',
 'IHDR\x00\x00\x00\x01\x00\x00\x00\x01\x08\x06\x00\x00\x00\x1f\x15ĉ\x00\x00\x00',
 'IDATx�c````\x00\x00\x00\x05\x00\x01��E@\x00\x00\x00\x00IEND�B`�']

In [188]:
print(f"open this link:\n{response.url}?path=statmike&file=tester")

open this link:
https://us-central1-statmike-mlops-349915.cloudfunctions.net/pixel-tracking?path=statmike&file=tester


In [189]:
print(f"open this link:\n{response.url}?path=notstatmike&file=tester")

open this link:
https://us-central1-statmike-mlops-349915.cloudfunctions.net/pixel-tracking?path=notstatmike&file=tester
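
When the pixel is embedded in a notebook, the `path` and `file` values need to be URL-encoded. A minimal sketch using the standard library follows; `tracking_url` is a hypothetical helper and the base URL is the function URL printed above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://us-central1-statmike-mlops-349915.cloudfunctions.net/pixel-tracking"

def tracking_url(path: str, file: str, base_url: str = BASE_URL) -> str:
    """Hypothetical helper: build a pixel URL with encoded query parameters."""
    return f"{base_url}?{urlencode({'path': path, 'file': file})}"

url = tracking_url(
    "statmike/vertex-ai-mlops/architectures/tracking/setup/pixel",
    "prod-tracking-pixel.ipynb",
)
print(url)

# decoding the query string recovers the original values
query = parse_qs(urlsplit(url).query)
print(query["path"][0], query["file"][0])
```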


Note - it can take a few seconds for the record inserts to happen

In [193]:
bq.query(f'SELECT * FROM `{PROJECT_ID}.pixel_tracking.pixel-tracking-data`').to_dataframe()

Unnamed: 0,event_timestamp,file_path,file_name,client,source
0,2024-03-29 15:45:32.867219+00:00,statmike/path/to/file,my notebook.ipynb,test,test
1,2024-03-29 16:10:17.951834+00:00,statmike,test.ipynb,curl/7.74.0,custom
2,2024-03-29 16:56:05.100753+00:00,statmike,newtest.ipynb,curl/7.74.0,custom
3,2024-03-29 16:10:48.995500+00:00,statmike,tester,Mozilla/5.0 (X11; CrOS x86_64 14541.0.0) Apple...,custom
4,2024-03-29 15:45:32.867219+00:00,statmike/path/to/file,my notebook.ipynb,test,test
5,2024-03-29 16:55:14.154529+00:00,statmike,test.ipynb,curl/7.74.0,custom
6,2024-03-29 16:09:10.197998+00:00,statmike/path/to/file,my notebook.ipynb,test,test


In [194]:
bq.query(f"DELETE FROM `{PROJECT_ID}.pixel_tracking.pixel-tracking-data` WHERE file_path IN ('statmike', 'statmike/path/to/file')")

QueryJob<project=statmike-mlops-349915, location=US, id=622b7609-197f-434c-ae14-b8a1eaf22ce5>

In [195]:
bq.query(f'SELECT * FROM `{PROJECT_ID}.pixel_tracking.pixel-tracking-data`').to_dataframe()

Unnamed: 0,event_timestamp,file_path,file_name,client,source
