![tracker](https://us-central1-vertex-ai-mlops-369716.cloudfunctions.net/pixel-tracking?path=statmike%2Fvertex-ai-mlops%2Farchitectures%2Ftracking%2Fpixel&file=developing-tracking-pixel.ipynb)
<!--- header table --->
<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/statmike/vertex-ai-mlops/blob/main/architectures/tracking/pixel/developing-tracking-pixel.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Google Colaboratory logo">
      <br>Run in<br>Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https%3A%2F%2Fraw.githubusercontent.com%2Fstatmike%2Fvertex-ai-mlops%2Fmain%2Farchitectures%2Ftracking%2Fpixel%2Fdeveloping-tracking-pixel.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo">
      <br>Run in<br>Colab Enterprise
    </a>
  </td>      
  <td style="text-align: center">
    <a href="https://github.com/statmike/vertex-ai-mlops/blob/main/architectures/tracking/pixel/developing-tracking-pixel.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      <br>View on<br>GitHub
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/statmike/vertex-ai-mlops/main/architectures/tracking/pixel/developing-tracking-pixel.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      <br>Open in<br>Vertex AI Workbench
    </a>
  </td>
</table>

# Developing: Custom Tracking With Data Privacy

Create a custom pixel tracker that only tracks loads and does not collect any personal information.

**Goal:** An HTTP-triggered Cloud Function that returns a tracking pixel and receives parameters for the path and file (document). Write the timestamp, path, and file to a BigQuery table.

---
## Colab Setup

When running this notebook in [Colab](https://colab.google/) or [Colab Enterprise](https://cloud.google.com/colab/docs/introduction), this section will authenticate to GCP (follow prompts in the popup) and set the current project for the session.

In [1]:
PROJECT_ID = 'statmike-mlops-349915' # replace with project ID

In [2]:
try:
    from google.colab import auth
    auth.authenticate_user(project_id = PROJECT_ID)
    print('Colab authorized to GCP')
except Exception:
    print('Not a Colab Environment')
    pass

Not a Colab Environment


---
## Installs

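The list `packages` contains tuples of package import names, install names, and an optional minimum version. If the import name is not found, the install name is used to install the package quietly for the current user; if a minimum version is given and the installed version is older, the package is upgraded.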

In [565]:
# tuples of (import name, install name, optional min_version)
packages = [
    ('google.cloud.functions', 'google-cloud-functions'),
    ('google.cloud.storage', 'google-cloud-storage'),
    ('google.cloud.pubsub', 'google-cloud-pubsub'),
    ('google.cloud.bigquery', 'google-cloud-bigquery'),
    ('google.cloud.bigquery_storage', 'google-cloud-bigquery-storage'),
    ('flask', 'flask'),
    ('PIL', 'Pillow')
    # zipfile is part of the Python standard library, so it is not listed
]

import importlib.util
import importlib.metadata
from packaging.version import Version

install = False
for package in packages:
    try:
        spec = importlib.util.find_spec(package[0])
    except ModuleNotFoundError:
        # find_spec raises when a parent package (for example google.cloud) is missing
        spec = None
    if not spec:
        print(f'installing package {package[1]}')
        install = True
        !pip install {package[1]} -U -q --user
    elif len(package) == 3:
        # compare versions numerically, looked up by distribution (install) name
        if Version(importlib.metadata.version(package[1])) < Version(package[2]):
            print(f'updating package {package[1]}')
            install = True
            !pip install {package[1]} -U -q --user

## API Enablement

In [609]:
!gcloud services enable cloudfunctions.googleapis.com
!gcloud services enable run.googleapis.com
!gcloud services enable pubsub.googleapis.com
!gcloud services enable eventarc.googleapis.com

### Restart Kernel (If Installs Occurred)

After a kernel restart, resume running the notebook from the next cell after this one.

In [5]:
if install:
    import IPython
    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)

---
## Setup

Inputs

In [6]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'statmike-mlops-349915'

In [23]:
REGION = 'us-central1'

SERIES = 'tracking'
EXPERIMENT = 'pixel'

GCS_BUCKET = PROJECT_ID

Packages

In [568]:
import os
import json
import datetime

import PIL, PIL.Image
import zipfile

from google.cloud import functions_v2
from google.cloud import pubsub_v1
from google.cloud import storage
from google.cloud import bigquery
from google.cloud import bigquery_storage

Clients

In [569]:
# cloud functions
functions_client = functions_v2.FunctionServiceClient()

# pubsub client
pubsub_pubclient = pubsub_v1.PublisherClient()

# gcs
gcs = storage.Client(project = PROJECT_ID)
bucket = gcs.bucket(GCS_BUCKET)

# bigquery
bq = bigquery.Client(project = PROJECT_ID)
bqstorage_write = bigquery_storage.BigQueryWriteClient()

parameters:

In [10]:
DIR = f"cloud_function"

In [11]:
SERVICE_ACCOUNT = !gcloud config list --format='value(core.account)' 
SERVICE_ACCOUNT = SERVICE_ACCOUNT[0]
SERVICE_ACCOUNT

'1026793852137-compute@developer.gserviceaccount.com'

environment:

In [12]:
if not os.path.exists(DIR):
    os.makedirs(DIR)

---
## Local Development

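Write the function as a Flask request handler; HTTP functions in the Cloud Functions Python runtime receive a `flask.Request` object: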
- https://flask.palletsprojects.com/en/3.0.x/api/

In [59]:
function_name = f'{SERIES}_{EXPERIMENT}'
function_name

'tracking_pixel'

### Create A Pixel

The color (0,0,0) is black, and an alpha channel value of 0 makes the pixel fully transparent, so the color itself does not matter:

In [18]:
pixel = PIL.Image.new(mode = 'RGBA', size = (1,1), color = (0,0,0,0))

In [19]:
pixel.save(f'{DIR}/pixel.png')
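For reference, the same pixel can be built without Pillow. This is a minimal sketch, assuming only the Python standard library, that writes the PNG chunks by hand (signature, `IHDR`, `IDAT`, `IEND`, each framed with a length and a CRC-32). The helper names `png_chunk` and `transparent_pixel_png` are illustrative and not part of this notebook:

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def png_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Frame one chunk: 4-byte length, 4-byte type, data, then CRC-32 of type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)

def transparent_pixel_png() -> bytes:
    """Return the bytes of a 1x1 RGBA PNG whose single pixel is (0, 0, 0, 0)."""
    # IHDR: width=1, height=1, bit depth=8, color type=6 (RGBA),
    # compression=0, filter=0, interlace=0
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 6, 0, 0, 0)
    # image data: one scanline = filter-type byte 0 followed by one RGBA pixel
    scanline = b"\x00" + bytes([0, 0, 0, 0])
    return (
        PNG_SIGNATURE
        + png_chunk(b"IHDR", ihdr)
        + png_chunk(b"IDAT", zlib.compress(scanline))
        + png_chunk(b"IEND", b"")
    )

pixel_bytes = transparent_pixel_png()
```

Writing `pixel_bytes` to `f'{DIR}/pixel.png'` with `open(..., 'wb')` would be an alternative to the Pillow cell; the Pillow-generated file is the one packaged in this notebook.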

### Create Code

In [83]:
%%writefile {DIR}/main.py
import flask
from flask import abort

def tracking_pixel(request: flask.Request) -> flask.Response:
    
    repo_path = request.args.get('path', 'direct')
    repo_file = request.args.get('file', 'direct')
    
    if repo_path.startswith('statmike') and len(repo_path) < 500:
        if len(repo_file) < 500:
            print('This is where the values can be streamed to BQ with timestamp')
        else:
            return abort(406) # not acceptable
    else:
        return abort(404) # not found
    
    return flask.send_file('pixel.png')

Overwriting cloud_function/main.py


In [84]:
%%writefile {DIR}/requirements.txt
flask

Overwriting cloud_function/requirements.txt


### Zip Files

In [85]:
with zipfile.ZipFile(f'{DIR}/{function_name}.zip', mode = 'w') as archive:
    archive.write(f'{DIR}/main.py', 'main.py')
    archive.write(f'{DIR}/requirements.txt', 'requirements.txt')
    archive.write(f'{DIR}/pixel.png', 'pixel.png')

In [86]:
with zipfile.ZipFile(f'{DIR}/{function_name}.zip', mode = 'r') as archive:
    archive.printdir()

File Name                                             Modified             Size
main.py                                        2024-03-23 14:17:58          550
requirements.txt                               2024-03-23 14:18:00            6
pixel.png                                      2024-03-23 07:45:14           70
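Cloud Functions for Python reads pip dependencies from `requirements.txt`, so a file with any other name is ignored rather than reported. A small pre-upload check can catch that. This is a sketch assuming the archive should contain the files packaged in this notebook; `EXPECTED_FILES` and `missing_files` are hypothetical names:

```python
import io
import zipfile

# Files this notebook packages for the pixel function (an assumption drawn from the cells above)
EXPECTED_FILES = {"main.py", "requirements.txt", "pixel.png"}

def missing_files(archive_bytes: bytes, expected=EXPECTED_FILES) -> set:
    """Return the expected names that are not present at the root of the zip archive."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return set(expected) - set(archive.namelist())

# Demonstrate with an in-memory archive whose requirements file is misnamed
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w") as archive:
    archive.writestr("main.py", "def tracking_pixel(request): ...\n")
    archive.writestr("requirements.py", "flask\n")
    archive.writestr("pixel.png", b"")
print(missing_files(buffer.getvalue()))  # → {'requirements.txt'}
```

Against the real archive, the check would be `missing_files(pathlib.Path(f'{DIR}/{function_name}.zip').read_bytes())`, run before the upload to GCS.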


---
## Deploy Function

### Copy To GCS

In [87]:
blob = bucket.blob(f'architectures/{SERIES}/{EXPERIMENT}/{function_name}.zip')
blob.upload_from_filename(f'{DIR}/{function_name}.zip')

### Create/Update Function

In [94]:
try:
    function = functions_client.get_function(
        name = f"projects/{PROJECT_ID}/locations/{REGION}/functions/{function_name.replace('_', '-')}"
    )
except Exception:
    function = ''
    
function

name: "projects/statmike-mlops-349915/locations/us-central1/functions/tracking-pixel"
build_config {
  build: "projects/1026793852137/locations/us-central1/builds/69aecb52-b4d3-4ade-b36b-6c15ba69e7f0"
  runtime: "python312"
  entry_point: "tracking_pixel"
  source {
    storage_source {
      bucket: "gcf-v2-sources-1026793852137-us-central1"
      object_: "tracking-pixel/function-source.zip"
      generation: 1711203261535891
    }
  }
  docker_repository: "projects/statmike-mlops-349915/locations/us-central1/repositories/gcf-artifacts"
  source_provenance {
    resolved_storage_source {
      bucket: "gcf-v2-sources-1026793852137-us-central1"
      object_: "tracking-pixel/function-source.zip"
      generation: 1711203261535891
    }
  }
  docker_registry: ARTIFACT_REGISTRY
}
state: FAILED
update_time {
  seconds: 1711203330
  nanos: 233330017
}
state_messages {
  severity: ERROR
  type_: "CloudRunServiceNotFound"
  message: "Cloud Run service projects/statmike-mlops-349915/location

In [95]:
functionDef = functions_v2.Function(
    name = f"projects/{PROJECT_ID}/locations/{REGION}/functions/{function_name.replace('_', '-')}",
    build_config = functions_v2.BuildConfig(
        runtime = 'python312',
        entry_point = f'{function_name}',
        source = functions_v2.Source(
            storage_source = functions_v2.StorageSource(
                bucket = bucket.name,
                object_ = blob.name
            )
        ),
    ),
    service_config = functions_v2.ServiceConfig(
        timeout_seconds = 10,
        available_memory = '128Mi',
        max_instance_count = 5,
        max_instance_request_concurrency = 1
    ),
    environment = functions_v2.Environment.GEN_2  # 2nd gen Cloud Functions
)

In [96]:
#!gcloud functions runtimes list

In [97]:
#!gcloud functions event-types list

In [98]:
if function:
    operation = functions_client.update_function(
        function = functionDef
    )
else:
    operation = functions_client.create_function(
        parent = f"projects/{PROJECT_ID}/locations/{REGION}",
        function = functionDef,
        function_id = function_name.replace('_', '-')
    )
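The create/update cells above treat *any* exception from `get_function` as "the function does not exist", which also hides permission or naming errors. Below is a sketch of the same upsert pattern that reacts only to a not-found error, with the client calls passed in as callables. `create_or_update` is a hypothetical helper, and a local fake stands in for the API:

```python
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")

def create_or_update(
    get: Callable[[], object],
    create: Callable[[], T],
    update: Callable[[], T],
    not_found: Type[BaseException] = LookupError,
) -> Tuple[str, T]:
    """Run update() if get() succeeds, or create() if get() raises `not_found`.

    Any other exception raised by get() (for example a permission error)
    propagates instead of being treated as "the resource does not exist".
    """
    try:
        get()
    except not_found:
        return "created", create()
    return "updated", update()

# Local demonstration: a fake lookup in place of functions_client.get_function
existing = {"tracking-pixel"}

def fake_get(function_id: str) -> str:
    if function_id not in existing:
        raise LookupError(function_id)
    return function_id

print(create_or_update(lambda: fake_get("tracking-pixel"), lambda: "new", lambda: "patched"))
# → ('updated', 'patched')
```

With the real client, `not_found` would be `google.api_core.exceptions.NotFound`, `get` would wrap `functions_client.get_function(name = ...)`, and `create`/`update` would wrap the `create_function`/`update_function` calls above.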

In [99]:
response = operation.result()
print(response)

name: "projects/statmike-mlops-349915/locations/us-central1/functions/tracking-pixel"
build_config {
  build: "projects/1026793852137/locations/us-central1/builds/9045b356-1ad7-42c1-a778-dfda9691211d"
  runtime: "python312"
  entry_point: "tracking_pixel"
  source {
    storage_source {
      bucket: "gcf-v2-sources-1026793852137-us-central1"
      object_: "tracking-pixel/function-source.zip"
      generation: 1711203824108304
    }
  }
  docker_repository: "projects/statmike-mlops-349915/locations/us-central1/repositories/gcf-artifacts"
  source_provenance {
    resolved_storage_source {
      bucket: "gcf-v2-sources-1026793852137-us-central1"
      object_: "tracking-pixel/function-source.zip"
      generation: 1711203824108304
    }
  }
  docker_registry: ARTIFACT_REGISTRY
}
service_config {
  service: "projects/statmike-mlops-349915/locations/us-central1/services/tracking-pixel"
  timeout_seconds: 10
  max_instance_count: 5
  ingress_settings: ALLOW_ALL
  uri: "https://tracking-pi

In [113]:
response.url

'https://us-central1-statmike-mlops-349915.cloudfunctions.net/tracking-pixel'

### Make Public

- https://cloud.google.com/functions/docs/securing/managing-access-iam#allowing_unauthenticated_http_function_invocation

In [150]:
!gcloud run services get-iam-policy {function_name.replace('_', '-')} --region=$REGION #--format=json

etag: BwYUVYrEWQs=
version: 1


In [151]:
#json.loads(''.join(test))
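With `--format=json`, `gcloud` prints the policy as JSON. Captured with `policy = !gcloud run services get-iam-policy ... --format=json`, the output is a list of lines, so `'\n'.join(policy)` yields a string that can be parsed. A minimal sketch of the check follows; `grants_public_invoke` is a hypothetical helper, and the sample policies restate the outputs shown in this section as JSON:

```python
import json

def grants_public_invoke(policy_json: str, role: str = "roles/run.invoker") -> bool:
    """True if the IAM policy JSON binds `role` to the special member allUsers."""
    policy = json.loads(policy_json) if policy_json.strip() else {}
    return any(
        binding.get("role") == role and "allUsers" in binding.get("members", [])
        for binding in policy.get("bindings", [])
    )

# Policies before and after adding the binding
before = '{"etag": "BwYUVYrEWQs=", "version": 1}'
after = '{"bindings": [{"members": ["allUsers"], "role": "roles/run.invoker"}], "etag": "BwYUVZOnbwM=", "version": 1}'
print(grants_public_invoke(before), grants_public_invoke(after))  # → False True
```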

If the policy is missing a binding of `allUsers` to `roles/run.invoker`, add it:

In [152]:
!gcloud run services add-iam-policy-binding {function_name.replace('_', '-')} --region=$REGION --member="allUsers" --role="roles/run.invoker"

Updated IAM policy for service [tracking-pixel].
bindings:
- members:
  - allUsers
  role: roles/run.invoker
etag: BwYUVZOnbwM=
version: 1


To remove open access, use the following command.

Note: it can take about a minute before requests stop being accepted.

In [153]:
#!gcloud run services remove-iam-policy-binding {function_name.replace('_', '-')} --region=$REGION --member="allUsers" --role="roles/run.invoker"

### Test Function

In [156]:
test_response = !curl -s 'https://us-central1-statmike-mlops-349915.cloudfunctions.net/tracking-pixel?path=statmike&file=tester'

In [157]:
test_response

['�PNG',
 '\x1a',
 '\x00\x00\x00',
 'IHDR\x00\x00\x00\x01\x00\x00\x00\x01\x08\x06\x00\x00\x00\x1f\x15ĉ\x00\x00\x00',
 'IDATx�c````\x00\x00\x00\x05\x00\x01��E@\x00\x00\x00\x00IEND�B`�']
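The `!curl` output is captured as text and split into lines, so the binary PNG appears in pieces (the PNG signature itself contains `\r\n` and `\n` bytes). A sketch of a cleaner check that reports the HTTP status and whether the body is a PNG, assuming only the standard library; `fetch_pixel` and `looks_like_png` are hypothetical helpers:

```python
import urllib.error
import urllib.request

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def looks_like_png(body: bytes) -> bool:
    """A PNG file always starts with the same 8-byte signature."""
    return body.startswith(PNG_SIGNATURE)

def fetch_pixel(url: str, timeout: float = 10.0) -> tuple:
    """GET `url` and return (HTTP status code, whether the body is a PNG).

    urlopen raises HTTPError for 4xx/5xx responses, such as the 404 and 406
    produced by abort() in the function, so those are caught and reported.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status, looks_like_png(response.read())
    except urllib.error.HTTPError as err:
        return err.code, False
```

In this notebook, `fetch_pixel(f"{response.url}?path=statmike&file=tester")` would be expected to return `(200, True)`, and the `notstatmike` URL `(404, False)`. From a shell, `curl -s -o /dev/null -w '%{http_code}' '<url>'` prints only the status code.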

In [161]:
print(f"{response.url}?path=statmike&file=tester")

https://us-central1-statmike-mlops-349915.cloudfunctions.net/tracking-pixel?path=statmike&file=tester


In [162]:
print(f"{response.url}?path=notstatmike&file=tester")

https://us-central1-statmike-mlops-349915.cloudfunctions.net/tracking-pixel?path=notstatmike&file=tester


This is a markdown cell.  The tracking link is tested below with an image inclusion in markdown with `![]()`:

Begin positive test:

![](https://us-central1-statmike-mlops-349915.cloudfunctions.net/tracking-pixel?path=statmike&file=tester)

End positive test

Begin negative test:

![](https://us-central1-statmike-mlops-349915.cloudfunctions.net/tracking-pixel?path=notstatmike&file=tester)

End negative test

---
## Integrate With BigQuery

Write accepted tracking entries to a BigQuery table that is partitioned by date.


### Create A BigQuery Dataset

Create a new [BigQuery Dataset](https://cloud.google.com/bigquery/docs/datasets) as a working location for this workflow:

In [169]:
job = bq.query(f'''
    CREATE SCHEMA IF NOT EXISTS `{PROJECT_ID}.{SERIES}`
        OPTIONS(
            location = 'US'
        )''')
job.result()

<google.cloud.bigquery.table._EmptyRowIterator at 0x7fb130f014e0>

### Create A BigQuery Table

Simple Schema

Partitioned By Timestamp

In [658]:
job = bq.query(f'''
CREATE OR REPLACE TABLE `{PROJECT_ID}.{SERIES}.{EXPERIMENT}_event_capture` (event_timestamp TIMESTAMP, file_path STRING, file_name STRING, client STRING)
PARTITION BY TIMESTAMP_TRUNC(event_timestamp, DAY)
''')
job.result()

<google.cloud.bigquery.table._EmptyRowIterator at 0x7fb130ec2e60>

### Create Example Records

Create JSON rows - Python list of dictionaries.

- [BQ Loading JSON Data](https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-json#data_types) 

In [458]:
datetime.datetime.now()

datetime.datetime(2024, 3, 25, 21, 35, 22, 585933)

In [459]:
datetime.datetime.now(datetime.timezone.utc)

datetime.datetime(2024, 3, 25, 21, 35, 23, 279067, tzinfo=datetime.timezone.utc)

In [460]:
datetime.datetime.now(datetime.timezone.utc).strftime("%Y-%m-%d %H:%M:%S.%f %Z")

'2024-03-25 21:35:23.845114 UTC'
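The rows are loaded as newline-delimited JSON, the format the load job is configured for (`NEWLINE_DELIMITED_JSON`). A small sketch of that shape, assuming the same timestamp format string as the cells above; `event_row` and `to_ndjson` are hypothetical helpers. Note that `%Z` renders `UTC` only for a timezone-aware datetime; for a naive `datetime.datetime.now()` it becomes an empty string:

```python
import datetime
import json

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f %Z"  # same format string as the cells above

def event_row(path: str, name: str, when: datetime.datetime) -> dict:
    """One row for the event table; `when` should be timezone-aware (UTC)."""
    return dict(
        event_timestamp=when.strftime(TIMESTAMP_FORMAT),
        file_path=path,
        file_name=name,
    )

def to_ndjson(rows: list) -> str:
    """Newline-delimited JSON: one JSON object per line."""
    return "\n".join(json.dumps(row) for row in rows)

fixed = datetime.datetime(2024, 3, 25, 21, 35, 23, 845114, tzinfo=datetime.timezone.utc)
print(to_ndjson([event_row("statmike/path/to/file", "1 notebook.ipynb", fixed)]))
# → {"event_timestamp": "2024-03-25 21:35:23.845114 UTC", "file_path": "statmike/path/to/file", "file_name": "1 notebook.ipynb"}

# %Z is empty for a naive datetime, leaving a trailing space and no zone
print(repr(datetime.datetime(2024, 3, 25, 21, 35).strftime(TIMESTAMP_FORMAT)))
# → '2024-03-25 21:35:00.000000 '
```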

In [461]:
examples = []
for i in range(10):
    basetime = datetime.datetime.now(datetime.timezone.utc)
    data = dict(
        event_timestamp = (basetime - datetime.timedelta(days = i)).strftime("%Y-%m-%d %H:%M:%S.%f %Z"),
        file_path = 'statmike/path/to/file',
        file_name = f'{i+1} notebook.ipynb'
    )
    examples.append(data)

In [462]:
examples

[{'event_timestamp': '2024-03-25 21:35:24.672250 UTC',
  'file_path': 'statmike/path/to/file',
  'file_name': '1 notebook.ipynb'},
 {'event_timestamp': '2024-03-24 21:35:24.672289 UTC',
  'file_path': 'statmike/path/to/file',
  'file_name': '2 notebook.ipynb'},
 {'event_timestamp': '2024-03-23 21:35:24.672299 UTC',
  'file_path': 'statmike/path/to/file',
  'file_name': '3 notebook.ipynb'},
 {'event_timestamp': '2024-03-22 21:35:24.672306 UTC',
  'file_path': 'statmike/path/to/file',
  'file_name': '4 notebook.ipynb'},
 {'event_timestamp': '2024-03-21 21:35:24.672311 UTC',
  'file_path': 'statmike/path/to/file',
  'file_name': '5 notebook.ipynb'},
 {'event_timestamp': '2024-03-20 21:35:24.672317 UTC',
  'file_path': 'statmike/path/to/file',
  'file_name': '6 notebook.ipynb'},
 {'event_timestamp': '2024-03-19 21:35:24.672323 UTC',
  'file_path': 'statmike/path/to/file',
  'file_name': '7 notebook.ipynb'},
 {'event_timestamp': '2024-03-18 21:35:24.672328 UTC',
  'file_path': 'statmike/pat

### Load Records To BigQuery

- https://cloud.google.com/bigquery/docs/batch-loading-data#loading_data_from_local_files
- https://cloud.google.com/python/docs/reference/bigquery/latest/google.cloud.bigquery.client.Client#google_cloud_bigquery_client_Client_load_table_from_json



In [463]:
load_job = bq.load_table_from_json(
    json_rows = examples,
    destination = bigquery.TableReference(
        dataset_ref = bigquery.DatasetReference(PROJECT_ID, SERIES),
        table_id = f'pixel_event_capture'
    ),
    job_config = bigquery.LoadJobConfig(
        source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        write_disposition = bigquery.WriteDisposition.WRITE_APPEND
    )
)

In [464]:
load_job.result()

LoadJob<project=statmike-mlops-349915, location=US, id=712ac0cc-c3e5-4c20-96ee-c363f12042b3>

In [219]:
review = bq.query(f'SELECT * FROM `{PROJECT_ID}.{SERIES}.{EXPERIMENT}_event_capture`').to_dataframe()
review

|   | event_timestamp | file_path | file_name |
|---|---|---|---|
| 0 | 2024-03-20 13:02:33.860302+00:00 | statmike/path/to/file | 5 notebook.ipynb |
| 1 | 2024-03-18 13:02:33.860315+00:00 | statmike/path/to/file | 7 notebook.ipynb |
| 2 | 2024-03-19 13:02:33.860309+00:00 | statmike/path/to/file | 6 notebook.ipynb |
| 3 | 2024-03-15 13:02:33.860332+00:00 | statmike/path/to/file | 10 notebook.ipynb |
| 4 | 2024-03-17 13:02:33.860321+00:00 | statmike/path/to/file | 8 notebook.ipynb |
| 5 | 2024-03-23 13:02:33.860279+00:00 | statmike/path/to/file | 2 notebook.ipynb |
| 6 | 2024-03-24 13:02:33.860245+00:00 | statmike/path/to/file | 1 notebook.ipynb |
| 7 | 2024-03-22 13:02:33.860290+00:00 | statmike/path/to/file | 3 notebook.ipynb |
| 8 | 2024-03-16 13:02:33.860327+00:00 | statmike/path/to/file | 9 notebook.ipynb |
| 9 | 2024-03-21 13:02:33.860297+00:00 | statmike/path/to/file | 4 notebook.ipynb |


### Async BigQuery Load

Design the function to trigger the load to BigQuery and then return the result (the pixel image in this case) without waiting for the load to finish. This local experiment starts the load in a background thread:

In [236]:
import threading
import time

In [365]:
def loadBQ(rows):
    load_job = bq.load_table_from_json(
        json_rows = rows,
        destination = bigquery.TableReference(
            dataset_ref = bigquery.DatasetReference(PROJECT_ID, SERIES),
            table_id = f'pixel_event_capture'
        ),
        job_config = bigquery.LoadJobConfig(
            source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
            write_disposition = bigquery.WriteDisposition.WRITE_APPEND
        )
    )
    load_job.result()
    print('finished loading')
    time.sleep(10)
    print('now returning')

In [366]:
thread = threading.Thread(target = loadBQ, args = (examples,))
thread.start()

In [367]:
print('does this work')

does this work
finished loading
now returning


---
## 2nd Cloud Function: Write To BigQuery

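The write to BigQuery can take a few seconds. Waiting for it inside the HTTP function delays the pixel response, and a background thread is not a reliable alternative because Cloud Functions may throttle CPU once the response has been returned. Instead, make a second function that handles the write via a Pub/Sub trigger. The first function sends the record to the Pub/Sub topic and immediately returns the response to the client; the second function then appends the record to the BigQuery table asynchronously.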

---
### Create Pub/Sub Topic

The main concepts:
- Topic - a feed of messages
     - Publish - send a new message to a topic
     - Subscription - receive messages that arrive on topic
          - Push - the subscriber has new messages pushed to it
          - Pull - the subscriber requests new messages by pulling them
          
In this example, a topic will be set up to receive new event entries for the tracking pixel. Publishing a new message to this topic will trigger a data load to BigQuery by the Cloud Function (set up below). The Cloud Function will have a push subscription to the topic.
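The concepts above can be illustrated with a toy, in-memory model. This is only a sketch: the `ToyTopic` class is hypothetical and is not the `google.cloud.pubsub_v1` API. Publishing hands the message to every push endpoint right away, while pull subscriptions hold messages until the subscriber asks for them:

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

Message = Tuple[bytes, Dict[str, str]]

class ToyTopic:
    """In-memory illustration of a topic with push and pull subscriptions.

    Real Pub/Sub adds acknowledgements, redelivery, and durable storage.
    """

    def __init__(self, name: str):
        self.name = name
        self.push_endpoints: List[Callable[[bytes, Dict[str, str]], None]] = []
        self.pull_queues: Dict[str, Deque[Message]] = {}

    def subscribe_push(self, endpoint: Callable[[bytes, Dict[str, str]], None]) -> None:
        self.push_endpoints.append(endpoint)

    def subscribe_pull(self, subscription: str) -> None:
        self.pull_queues[subscription] = deque()

    def publish(self, data: bytes, **attributes: str) -> None:
        # push: each push endpoint is called as the message arrives
        for endpoint in self.push_endpoints:
            endpoint(data, attributes)
        # pull: the message waits in each pull subscription until requested
        for queue in self.pull_queues.values():
            queue.append((data, attributes))

    def pull(self, subscription: str, max_messages: int = 10) -> List[Message]:
        queue = self.pull_queues[subscription]
        return [queue.popleft() for _ in range(min(max_messages, len(queue)))]

received: List[bytes] = []
toy_topic = ToyTopic("tracking_pixel_data")
toy_topic.subscribe_push(lambda data, attributes: received.append(data))  # plays the Cloud Function
toy_topic.subscribe_pull("audit")  # hypothetical pull subscriber
toy_topic.publish(b'{"file_name": "my notebook.ipynb"}', trigger="manual")
pulled = toy_topic.pull("audit")
```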

In [659]:
PUBSUB_TOPIC = function_name + '_data'
PUBSUB_TOPIC

'tracking_pixel_data'

In [660]:
try:
    topic = pubsub_pubclient.get_topic(
        topic = pubsub_pubclient.topic_path(PROJECT_ID, PUBSUB_TOPIC)
    )
    print(topic.name)
except Exception:
    topic = pubsub_pubclient.create_topic(
        name = pubsub_pubclient.topic_path(PROJECT_ID, PUBSUB_TOPIC)
    )
    print(topic.name)   

projects/statmike-mlops-349915/topics/tracking_pixel_data


### Create Code For Cloud Function

In [661]:
if not os.path.exists(DIR + '/loadBQ'):
    os.makedirs(DIR + '/loadBQ')

In [596]:
%%writefile {DIR}/loadBQ/main.py
import base64
import json
from google.cloud import bigquery

PROJECT_ID = 'statmike-mlops-349915'
bq = bigquery.Client(project = PROJECT_ID)

# Triggered from a message on pubsub topic, which is sent by cloud function that collects events
def tracking_pixel_data(event, context):
    
    # decode the data input, convert to python dictionary
    function_input = json.loads(
        base64.b64decode(event['data']).decode('utf-8')
    )
    
    load_job = bq.load_table_from_json(
        json_rows = [function_input],
        destination = bigquery.TableReference(
            dataset_ref = bigquery.DatasetReference(PROJECT_ID, 'tracking'),
            table_id = f'pixel_event_capture'
        ),
        job_config = bigquery.LoadJobConfig(
            source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
            write_disposition = bigquery.WriteDisposition.WRITE_APPEND
        )
    )
    load_job.result()

Overwriting cloud_function/loadBQ/main.py
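The decoding inside `tracking_pixel_data` can be exercised locally without deploying. A sketch, assuming the event dict shape the function reads (a base64-encoded JSON string under `event['data']`); `decode_pubsub_data` repeats the function's decode steps and `fake_pubsub_event` is a hypothetical test helper:

```python
import base64
import json

def decode_pubsub_data(event: dict) -> dict:
    """Same steps as tracking_pixel_data: base64 text -> bytes -> UTF-8 JSON -> dict."""
    return json.loads(base64.b64decode(event["data"]).decode("utf-8"))

def fake_pubsub_event(record: dict, **attributes: str) -> dict:
    """Hypothetical helper: wrap a record the way the function expects to receive it."""
    payload = json.dumps(record).encode("utf-8")  # what the publisher sends
    return {"data": base64.b64encode(payload).decode("ascii"), "attributes": attributes}

record = dict(
    event_timestamp="2024-03-26 17:30:25.454065 UTC",
    file_path="statmike/path/to/file",
    file_name="my notebook.ipynb",
)
test_event = fake_pubsub_event(record, trigger="manual")
print(decode_pubsub_data(test_event) == record)  # → True
```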


In [597]:
%%writefile {DIR}/loadBQ/requirements.txt
google-cloud-bigquery

Overwriting cloud_function/loadBQ/requirements.txt


### Zip Files

In [598]:
with zipfile.ZipFile(f'{DIR}/loadBQ/{function_name}_data.zip', mode = 'w') as archive:
    archive.write(f'{DIR}/loadBQ/main.py', 'main.py')
    archive.write(f'{DIR}/loadBQ/requirements.txt', 'requirements.txt')

In [599]:
with zipfile.ZipFile(f'{DIR}/loadBQ/{function_name}_data.zip', mode = 'r') as zip:
    zip.printdir()

File Name                                             Modified             Size
main.py                                        2024-03-26 15:56:18          929
requirements.txt                               2024-03-26 15:56:18           22


### Copy To GCS

In [600]:
blob = bucket.blob(f'architectures/{SERIES}/{EXPERIMENT}/{function_name}_data.zip')
blob.upload_from_filename(f'{DIR}/loadBQ/{function_name}_data.zip')

### Create/Update Function

In [602]:
try:
    function2 = functions_client.get_function(
        name = f"projects/{PROJECT_ID}/locations/{REGION}/functions/{function_name.replace('_', '-')}-data"
    )
except Exception:
    function2 = ''
    
function2

''
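The get, create, and update calls must agree on one function ID, including the `_` to `-` conversion. A sketch with hypothetical helpers that derive the ID and the full resource name in one place:

```python
def function_id(base_name: str, suffix: str = "") -> str:
    """Function IDs in this notebook replace underscores with hyphens."""
    return (base_name + suffix).replace("_", "-")

def function_resource_name(project: str, region: str, base_name: str, suffix: str = "") -> str:
    """Full resource name as passed to get_function and set on the Function definition."""
    return f"projects/{project}/locations/{region}/functions/{function_id(base_name, suffix)}"

print(function_resource_name("statmike-mlops-349915", "us-central1", "tracking_pixel", "_data"))
# → projects/statmike-mlops-349915/locations/us-central1/functions/tracking-pixel-data
```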

In [588]:
function_name

'tracking_pixel'

In [589]:
topic.name

'projects/statmike-mlops-349915/topics/tracking_pixel_data'

In [611]:
functionDef2 = functions_v2.Function(
    name = f"projects/{PROJECT_ID}/locations/{REGION}/functions/{function_name.replace('_', '-')}-data",
    build_config = functions_v2.BuildConfig(
        runtime = 'python312',
        entry_point = f'{function_name}_data',
        source = functions_v2.Source(
            storage_source = functions_v2.StorageSource(
                bucket = bucket.name,
                object_ = blob.name
            )
        ),
    ),
    service_config = functions_v2.ServiceConfig(
        timeout_seconds = 60,
        available_memory = '128Mi',
        max_instance_count = 50,
        max_instance_request_concurrency = 1
    ),
    event_trigger = functions_v2.EventTrigger(
        event_type = 'google.cloud.pubsub.topic.v1.messagePublished',
        pubsub_topic = topic.name
    ),
    environment = functions_v2.Environment.GEN_2  # 2nd gen Cloud Functions
)

In [612]:
#!gcloud functions runtimes list

In [613]:
#!gcloud functions event-types list

In [614]:
if function2:
    operation = functions_client.update_function(
        function = functionDef2
    )
else:
    operation = functions_client.create_function(
        parent = f"projects/{PROJECT_ID}/locations/{REGION}",
        function = functionDef2,
        function_id = function_name.replace('_', '-') + '-data'
    )

In [615]:
response = operation.result()
print(response)

name: "projects/statmike-mlops-349915/locations/us-central1/functions/tracking-pixel-data"
build_config {
  build: "projects/1026793852137/locations/us-central1/builds/01ad80d3-9711-42b5-a49f-f8fb4c7bf2bd"
  runtime: "python312"
  entry_point: "tracking_pixel_data"
  source {
    storage_source {
      bucket: "gcf-v2-sources-1026793852137-us-central1"
      object_: "tracking-pixel-data/function-source.zip"
      generation: 1711468838039649
    }
  }
  docker_repository: "projects/statmike-mlops-349915/locations/us-central1/repositories/gcf-artifacts"
  source_provenance {
    resolved_storage_source {
      bucket: "gcf-v2-sources-1026793852137-us-central1"
      object_: "tracking-pixel-data/function-source.zip"
      generation: 1711468838039649
    }
  }
  docker_registry: ARTIFACT_REGISTRY
}
service_config {
  service: "projects/statmike-mlops-349915/locations/us-central1/services/tracking-pixel-data"
  timeout_seconds: 60
  max_instance_count: 50
  ingress_settings: ALLOW_ALL
 

In [616]:
response.name

'projects/statmike-mlops-349915/locations/us-central1/functions/tracking-pixel-data'

### Test Function

In [662]:
data = dict(
    event_timestamp = datetime.datetime.now(datetime.timezone.utc).strftime("%Y-%m-%d %H:%M:%S.%f %Z"),
    file_path = 'statmike/path/to/file',
    file_name = f'my notebook.ipynb'
)

In [663]:
message = json.dumps(data).encode('utf-8')

In [664]:
message

b'{"event_timestamp": "2024-03-26 17:30:25.454065 UTC", "file_path": "statmike/path/to/file", "file_name": "my notebook.ipynb"}'

In [665]:
future = pubsub_pubclient.publish(topic.name, message, trigger = 'manual')

In [666]:
future.result()

'10789273693857203'

---
## Update Function - Add BQ Integration

### Update Code For Cloud Function

In [667]:
%%writefile {DIR}/main.py
import datetime
import json
import flask
from flask import abort
from google.cloud import pubsub_v1

pubsub_pubclient = pubsub_v1.PublisherClient() 

def tracking_pixel(request: flask.Request) -> flask.Response:
    
    repo_path = request.args.get('path', 'direct')
    repo_file = request.args.get('file', 'direct')
    application = request.headers.get('User-Agent')
    
    if repo_path.startswith('statmike') and len(repo_path) < 500:
        if len(repo_file) > 5 and len(repo_file) < 500:
            data = dict(
                event_timestamp = datetime.datetime.now(datetime.timezone.utc).strftime("%Y-%m-%d %H:%M:%S.%f %Z"),
                file_path = repo_path,
                file_name = repo_file,
                client = application
            )
            message = json.dumps(data).encode('utf-8')
            future = pubsub_pubclient.publish(
                'projects/statmike-mlops-349915/topics/tracking_pixel_data',
                message,
                trigger = 'manual'
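            )
            # wait briefly for the publish to complete; CPU may be throttled once the response is sent
            future.result(timeout = 5)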
        else:
            return abort(406) # not acceptable
    else:
        return abort(404) # not found
    
    return flask.send_file('pixel.png', max_age=0)

Overwriting cloud_function/main.py
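The request checks can be tested without deploying by moving them into a plain function. This sketch mirrors the rules in `main.py` above (`check_and_build` is hypothetical and not part of the deployed code). It also shows a consequence of the `'direct'` defaults: a request without `path` fails the prefix check (404), while a request without `file` passes the length check because `'direct'` has six characters:

```python
import datetime
from typing import Optional, Tuple

def check_and_build(
    path: str, file: str, user_agent: Optional[str], now: datetime.datetime
) -> Tuple[int, Optional[dict]]:
    """Apply the same checks as tracking_pixel; return (HTTP status, record to publish or None)."""
    if not (path.startswith("statmike") and len(path) < 500):
        return 404, None
    if not (5 < len(file) < 500):
        return 406, None
    record = dict(
        event_timestamp=now.strftime("%Y-%m-%d %H:%M:%S.%f %Z"),
        file_path=path,
        file_name=file,
        client=user_agent,
    )
    return 200, record

fixed_now = datetime.datetime(2024, 3, 26, 17, 30, tzinfo=datetime.timezone.utc)
for path, file in [("statmike", "tester"), ("notstatmike", "tester"), ("statmike", "short"), ("direct", "direct")]:
    print(path, file, check_and_build(path, file, "curl/8.4.0", fixed_now)[0])
# → statmike tester 200
# → notstatmike tester 404
# → statmike short 406
# → direct direct 404
```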


In [668]:
%%writefile {DIR}/requirements.txt
google-cloud-pubsub

Overwriting cloud_function/requirements.txt


### Zip Files

In [669]:
with zipfile.ZipFile(f'{DIR}/{function_name}.zip', mode = 'w') as archive:
    archive.write(f'{DIR}/main.py', 'main.py')
    archive.write(f'{DIR}/requirements.txt', 'requirements.txt')
    archive.write(f'{DIR}/pixel.png', 'pixel.png')

In [670]:
with zipfile.ZipFile(f'{DIR}/{function_name}.zip', mode = 'r') as zip:
    zip.printdir()

File Name                                             Modified             Size
main.py                                        2024-03-26 17:31:30         1188
requirements.txt                               2024-03-26 17:31:30           20
pixel.png                                      2024-03-23 07:45:14           70


### Copy To GCS

In [671]:
blob = bucket.blob(f'architectures/{SERIES}/{EXPERIMENT}/{function_name}.zip')
blob.upload_from_filename(f'{DIR}/{function_name}.zip')

### Create/Update Function

In [672]:
try:
    function = functions_client.get_function(
        name = f"projects/{PROJECT_ID}/locations/{REGION}/functions/{function_name.replace('_', '-')}"
    )
except Exception:
    function = ''
    
function

name: "projects/statmike-mlops-349915/locations/us-central1/functions/tracking-pixel"
build_config {
  build: "projects/1026793852137/locations/us-central1/builds/824889aa-bf29-42af-bd91-ee93cf3e867c"
  runtime: "python312"
  entry_point: "tracking_pixel"
  source {
    storage_source {
      bucket: "gcf-v2-sources-1026793852137-us-central1"
      object_: "tracking-pixel/function-source.zip"
      generation: 1711472737156041
    }
  }
  docker_repository: "projects/statmike-mlops-349915/locations/us-central1/repositories/gcf-artifacts"
  source_provenance {
    resolved_storage_source {
      bucket: "gcf-v2-sources-1026793852137-us-central1"
      object_: "tracking-pixel/function-source.zip"
      generation: 1711472737156041
    }
  }
  docker_registry: ARTIFACT_REGISTRY
}
service_config {
  service: "projects/statmike-mlops-349915/locations/us-central1/services/tracking-pixel"
  timeout_seconds: 10
  max_instance_count: 10
  ingress_settings: ALLOW_ALL
  uri: "https://tracking-p

In [673]:
functionDef = functions_v2.Function(
    name = f"projects/{PROJECT_ID}/locations/{REGION}/functions/{function_name.replace('_', '-')}",
    build_config = functions_v2.BuildConfig(
        runtime = 'python312',
        entry_point = f'{function_name}',
        source = functions_v2.Source(
            storage_source = functions_v2.StorageSource(
                bucket = bucket.name,
                object_ = blob.name
            )
        ),
    ),
    service_config = functions_v2.ServiceConfig(
        timeout_seconds = 10,
        available_memory = '128Mi',
        max_instance_count = 10,
        max_instance_request_concurrency = 1
    ),
    environment = functions_v2.Environment.GEN_2  # 2nd gen Cloud Functions
)

In [674]:
#!gcloud functions runtimes list

In [675]:
#!gcloud functions event-types list

In [676]:
if function:
    operation = functions_client.update_function(
        function = functionDef
    )
else:
    operation = functions_client.create_function(
        parent = f"projects/{PROJECT_ID}/locations/{REGION}",
        function = functionDef,
        function_id = function_name.replace('_', '-')
    )

In [677]:
response = operation.result()
print(response)

name: "projects/statmike-mlops-349915/locations/us-central1/functions/tracking-pixel"
build_config {
  build: "projects/1026793852137/locations/us-central1/builds/ae54cd87-dd29-482b-a9d1-7d2b6f1b480a"
  runtime: "python312"
  entry_point: "tracking_pixel"
  source {
    storage_source {
      bucket: "gcf-v2-sources-1026793852137-us-central1"
      object_: "tracking-pixel/function-source.zip"
      generation: 1711474296978591
    }
  }
  docker_repository: "projects/statmike-mlops-349915/locations/us-central1/repositories/gcf-artifacts"
  source_provenance {
    resolved_storage_source {
      bucket: "gcf-v2-sources-1026793852137-us-central1"
      object_: "tracking-pixel/function-source.zip"
      generation: 1711474296978591
    }
  }
  docker_registry: ARTIFACT_REGISTRY
}
service_config {
  service: "projects/statmike-mlops-349915/locations/us-central1/services/tracking-pixel"
  timeout_seconds: 10
  max_instance_count: 10
  ingress_settings: ALLOW_ALL
  uri: "https://tracking-p

In [678]:
response.url

'https://us-central1-statmike-mlops-349915.cloudfunctions.net/tracking-pixel'

### Make Public

The public access setting applied earlier persists when the function is updated.

### Test Function

In [638]:
test_response = !curl -s 'https://us-central1-statmike-mlops-349915.cloudfunctions.net/tracking-pixel?path=statmike&file=tester'

In [639]:
test_response

['�PNG',
 '\x1a',
 '\x00\x00\x00',
 'IHDR\x00\x00\x00\x01\x00\x00\x00\x01\x08\x06\x00\x00\x00\x1f\x15ĉ\x00\x00\x00',
 'IDATx�c````\x00\x00\x00\x05\x00\x01��E@\x00\x00\x00\x00IEND�B`�']

In [640]:
print(f"{response.url}?path=statmike&file=tester")

https://us-central1-statmike-mlops-349915.cloudfunctions.net/tracking-pixel?path=statmike&file=tester


In [641]:
print(f"{response.url}?path=notstatmike&file=tester")

https://us-central1-statmike-mlops-349915.cloudfunctions.net/tracking-pixel?path=notstatmike&file=tester


This is a markdown cell.  The tracking link is tested below with an image inclusion in markdown with `![]()`:

Begin positive test:

![](https://us-central1-statmike-mlops-349915.cloudfunctions.net/tracking-pixel?path=statmike&file=tester)

End positive test

Begin negative test:

![](https://us-central1-statmike-mlops-349915.cloudfunctions.net/tracking-pixel?path=notstatmike&file=tester)

End negative test

---
## Prepare For Repo Test

### Remove test records from BigQuery Table

In [642]:
job = bq.query(f'TRUNCATE TABLE `{PROJECT_ID}.{SERIES}.{EXPERIMENT}_event_capture`;')
job.result()

<google.cloud.bigquery.table._EmptyRowIterator at 0x7fb1136f0970>

## Update Repository

See updates to [../../headers/add_headers.ipynb](../../headers/add_headers.ipynb)

## TODO

- Tasks
    - run as new service account with minimum roles
    - add testing to function

- Notes
    - GA4 data timestamps are UTC