# Custom Tracking With Data Privacy

Create a custom pixel tracker that only tracks loads and does not collect any personal information.

**Goal:** An HTTP-triggered Cloud Function that returns a tracking pixel and accepts query parameters for the path and the document (`file`). Each load is to be written as timestamp|path|document to a BigQuery table.

---
## Colab Setup

When running this notebook in [Colab](https://colab.google/) or [Colab Enterprise](https://cloud.google.com/colab/docs/introduction), this section will authenticate to GCP (follow prompts in the popup) and set the current project for the session.

In [1]:
PROJECT_ID = 'statmike-mlops-349915' # replace with project ID

In [2]:
try:
    from google.colab import auth
    auth.authenticate_user(project_id = PROJECT_ID)
    print('Colab authorized to GCP')
except Exception:
    print('Not a Colab Environment')
    pass

Not a Colab Environment


---
## Installs

The list `packages` contains tuples of package import names and install names, optionally followed by a minimum version.  If the import name is not found, the install name is used to install the package quietly for the current user.

In [26]:
# tuples of (import name, install name[, min_version])
# zipfile is part of the standard library, so it is not listed here
packages = [
    ('google.cloud.functions', 'google-cloud-functions'),
    ('google.cloud.storage', 'google-cloud-storage'),
    ('flask', 'flask'),
    ('PIL', 'Pillow')
]

# import the submodules explicitly; a bare `import importlib` does not guarantee them
import importlib.util
import importlib.metadata
install = False
for package in packages:
    if not importlib.util.find_spec(package[0]):
        print(f'installing package {package[1]}')
        install = True
        !pip install {package[1]} -U -q --user
    elif len(package) == 3:
        # metadata is keyed by the install (distribution) name, not the import name
        # note: this is a plain string comparison of versions
        if importlib.metadata.version(package[1]) < package[2]:
            print(f'updating package {package[1]}')
            install = True
            !pip install {package[1]} -U -q --user
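
Comparing version strings directly orders `'10.0'` before `'9.0'`, so a minimum-version check can misfire. The following is a minimal sketch of a numeric comparison that uses only the standard library. The helper names `version_tuple` and `needs_install` are hypothetical, and the parsing assumes simple dotted numeric versions (pre-release suffixes are ignored).

```python
import importlib.metadata
import importlib.util
import re


def version_tuple(version: str) -> tuple:
    """Turn '2.10.1' into (2, 10, 1), keeping the leading digits of each dotted part."""
    parts = []
    for piece in version.split('.'):
        match = re.match(r'\d+', piece)
        if not match:
            break  # stop at the first part with no leading digits
        parts.append(int(match.group()))
    return tuple(parts)


def needs_install(import_name: str, dist_name: str, min_version: str = None) -> bool:
    """True when the module is missing or older than min_version."""
    try:
        found = importlib.util.find_spec(import_name) is not None
    except ModuleNotFoundError:
        # raised when a parent package (e.g. 'google.cloud') is itself missing
        found = False
    if not found:
        return True
    if min_version is None:
        return False
    try:
        installed = importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        return True
    return version_tuple(installed) < version_tuple(min_version)
```

In the loop above, `needs_install(*package)` could replace the two separate checks, since each tuple already carries the arguments in this order.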

## API Enablement

In [4]:
!gcloud services enable cloudfunctions.googleapis.com

### Restart Kernel (If Installs Occurred)

After a kernel restart, execution can resume with the cell after this one.

In [5]:
if install:
    import IPython
    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)

---
## Setup

Inputs

In [6]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'statmike-mlops-349915'

In [23]:
REGION = 'us-central1'

SERIES = 'tracking'
EXPERIMENT = 'pixel'

GCS_BUCKET = PROJECT_ID

Packages

In [158]:
import os
import json

import PIL, PIL.Image
import zipfile

from google.cloud import functions_v2
from google.cloud import storage

Clients

In [37]:
functions_client = functions_v2.FunctionServiceClient()
gcs = storage.Client(project = PROJECT_ID)
bucket = gcs.bucket(GCS_BUCKET)

Parameters

In [10]:
DIR = "cloud_function"

In [11]:
SERVICE_ACCOUNT = !gcloud config list --format='value(core.account)' 
SERVICE_ACCOUNT = SERVICE_ACCOUNT[0]
SERVICE_ACCOUNT

'1026793852137-compute@developer.gserviceaccount.com'

Environment

In [12]:
if not os.path.exists(DIR):
    os.makedirs(DIR)

---
## Local Development

In [59]:
function_name = f'{SERIES}_{EXPERIMENT}'
function_name

'tracking_pixel'

### Create A Pixel

The RGBA color (0,0,0,0) is black with an alpha channel of 0, which makes the pixel fully transparent.

In [18]:
pixel = PIL.Image.new(mode = 'RGBA', size = (1,1), color = (0,0,0,0))

In [19]:
pixel.save(f'{DIR}/pixel.png')
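
To see what such a file contains, here is a minimal sketch that builds a 1x1 transparent RGBA PNG by hand with only the standard library. It is not how the notebook creates `pixel.png` (that is done with Pillow above); the function names are hypothetical, and the bytes may differ from Pillow's output even though the image is the same.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """A PNG chunk is: 4-byte length, 4-byte type, data, CRC32 of type+data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def transparent_pixel_png() -> bytes:
    # IHDR: width=1, height=1, bit depth=8, color type=6 (RGBA),
    # compression=0, filter=0, interlace=0
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 6, 0, 0, 0)
    # one scanline: filter byte 0 followed by R, G, B, A all set to 0
    scanline = b"\x00" + bytes(4)
    return (
        PNG_SIGNATURE
        + png_chunk(b"IHDR", ihdr)
        + png_chunk(b"IDAT", zlib.compress(scanline))
        + png_chunk(b"IEND", b"")
    )
```

Writing `transparent_pixel_png()` to a file in binary mode should give an image any PNG reader can open as a single transparent pixel.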

### Create Code

In [83]:
%%writefile {DIR}/main.py
import flask
from flask import abort

def tracking_pixel(request: flask.Request) -> flask.Response:
    
    # query parameters; both default to 'direct' when missing
    repo_path = request.args.get('path', 'direct')
    repo_file = request.args.get('file', 'direct')
    
    # only accept paths that start with 'statmike' and cap both parameter lengths
    if repo_path.startswith('statmike') and len(repo_path) < 500:
        if len(repo_file) < 500:
            print('This is where the values can be streamed to BQ with timestamp')
        else:
            return abort(406) # not acceptable
    else:
        return abort(404) # not found
    
    return flask.send_file('pixel.png')

Overwriting cloud_function/main.py
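
The validation rules in `tracking_pixel` can be checked without deploying by mirroring them in a pure function. This is a minimal sketch: `status_for` is a hypothetical helper that is not part of `main.py`, and it only reproduces the status codes, not the Flask response.

```python
from http import HTTPStatus

MAX_LEN = 500        # same length cap as main.py
PREFIX = 'statmike'  # same required path prefix as main.py


def status_for(path: str = 'direct', file: str = 'direct') -> HTTPStatus:
    """Return the HTTP status that tracking_pixel responds with for these parameters."""
    if path.startswith(PREFIX) and len(path) < MAX_LEN:
        if len(file) < MAX_LEN:
            return HTTPStatus.OK
        return HTTPStatus.NOT_ACCEPTABLE
    return HTTPStatus.NOT_FOUND


# a few representative requests
print(status_for('statmike', 'tester'))
print(status_for('notstatmike', 'tester'))
print(status_for('statmike', 'x' * 500))
```

Note that a request with no `path` parameter falls back to `'direct'`, which does not start with `statmike`, so it is rejected with 404.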


In [84]:
%%writefile {DIR}/requirements.txt
flask

Overwriting cloud_function/requirements.txt


### Zip Files

In [85]:
with zipfile.ZipFile(f'{DIR}/{function_name}.zip', mode = 'w') as archive:
    archive.write(f'{DIR}/main.py', 'main.py')
    archive.write(f'{DIR}/requirements.txt', 'requirements.txt')
    archive.write(f'{DIR}/pixel.png', 'pixel.png')

In [86]:
with zipfile.ZipFile(f'{DIR}/{function_name}.zip', mode = 'r') as archive:
    archive.printdir()

File Name                                             Modified             Size
main.py                                        2024-03-23 14:17:58          550
requirements.txt                               2024-03-23 14:18:00            6
pixel.png                                      2024-03-23 07:45:14           70


## Deploy Function

### Copy To GCS

In [87]:
blob = bucket.blob(f'architectures/{SERIES}/{EXPERIMENT}/{function_name}.zip')
blob.upload_from_filename(f'{DIR}/{function_name}.zip')

### Create/Update Function

In [94]:
try:
    function = functions_client.get_function(
        name = f"projects/{PROJECT_ID}/locations/{REGION}/functions/{function_name.replace('_', '-')}"
    )
except Exception:
    function = ''
    
function

name: "projects/statmike-mlops-349915/locations/us-central1/functions/tracking-pixel"
build_config {
  build: "projects/1026793852137/locations/us-central1/builds/69aecb52-b4d3-4ade-b36b-6c15ba69e7f0"
  runtime: "python312"
  entry_point: "tracking_pixel"
  source {
    storage_source {
      bucket: "gcf-v2-sources-1026793852137-us-central1"
      object_: "tracking-pixel/function-source.zip"
      generation: 1711203261535891
    }
  }
  docker_repository: "projects/statmike-mlops-349915/locations/us-central1/repositories/gcf-artifacts"
  source_provenance {
    resolved_storage_source {
      bucket: "gcf-v2-sources-1026793852137-us-central1"
      object_: "tracking-pixel/function-source.zip"
      generation: 1711203261535891
    }
  }
  docker_registry: ARTIFACT_REGISTRY
}
state: FAILED
update_time {
  seconds: 1711203330
  nanos: 233330017
}
state_messages {
  severity: ERROR
  type_: "CloudRunServiceNotFound"
  message: "Cloud Run service projects/statmike-mlops-349915/location

In [95]:
functionDef = functions_v2.Function(
    name = f"projects/{PROJECT_ID}/locations/{REGION}/functions/{function_name.replace('_', '-')}",
    build_config = functions_v2.BuildConfig(
        runtime = 'python312',
        entry_point = f'{function_name}',
        source = functions_v2.Source(
            storage_source = functions_v2.StorageSource(
                bucket = bucket.name,
                object_ = blob.name
            )
        ),
    ),
    service_config = functions_v2.ServiceConfig(
        timeout_seconds = 10,
        available_memory = '128Mi',
        max_instance_count = 5,
        max_instance_request_concurrency = 1
    ),
    environment = functions_v2.Environment.GEN_2
)

In [96]:
#!gcloud functions runtimes list

In [97]:
#!gcloud functions event-types list

In [98]:
if function:
    operation = functions_client.update_function(
        function = functionDef
    )
else:
    operation = functions_client.create_function(
        parent = f"projects/{PROJECT_ID}/locations/{REGION}",
        function = functionDef,
        function_id = function_name.replace('_', '-')
    )

In [99]:
response = operation.result()
print(response)

name: "projects/statmike-mlops-349915/locations/us-central1/functions/tracking-pixel"
build_config {
  build: "projects/1026793852137/locations/us-central1/builds/9045b356-1ad7-42c1-a778-dfda9691211d"
  runtime: "python312"
  entry_point: "tracking_pixel"
  source {
    storage_source {
      bucket: "gcf-v2-sources-1026793852137-us-central1"
      object_: "tracking-pixel/function-source.zip"
      generation: 1711203824108304
    }
  }
  docker_repository: "projects/statmike-mlops-349915/locations/us-central1/repositories/gcf-artifacts"
  source_provenance {
    resolved_storage_source {
      bucket: "gcf-v2-sources-1026793852137-us-central1"
      object_: "tracking-pixel/function-source.zip"
      generation: 1711203824108304
    }
  }
  docker_registry: ARTIFACT_REGISTRY
}
service_config {
  service: "projects/statmike-mlops-349915/locations/us-central1/services/tracking-pixel"
  timeout_seconds: 10
  max_instance_count: 5
  ingress_settings: ALLOW_ALL
  uri: "https://tracking-pi

In [113]:
response.url

'https://us-central1-statmike-mlops-349915.cloudfunctions.net/tracking-pixel'

### Make Public

- https://cloud.google.com/functions/docs/securing/managing-access-iam#allowing_unauthenticated_http_function_invocation

In [150]:
!gcloud run services get-iam-policy {function_name.replace('_', '-')} --region=$REGION #--format=json

etag: BwYUVYrEWQs=
version: 1


In [151]:
#json.loads(''.join(test))

If `allUsers` with `roles/run.invoker` is missing, then add it:

In [152]:
!gcloud run services add-iam-policy-binding {function_name.replace('_', '-')} --region=$REGION --member="allUsers" --role="roles/run.invoker"

Updated IAM policy for service [tracking-pixel].
bindings:
- members:
  - allUsers
  role: roles/run.invoker
etag: BwYUVZOnbwM=
version: 1
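
Whether the binding is present can also be checked in code by requesting the policy with `--format=json` and parsing it. A minimal sketch follows, assuming the policy has already been loaded into a dict (for example with `json.loads(''.join(policy))` on the captured command output). `is_public` is a hypothetical helper, and the two sample policies mirror the outputs shown in this section.

```python
import json


def is_public(policy: dict, role: str = 'roles/run.invoker') -> bool:
    """True when allUsers is bound to the given role in an IAM policy dict."""
    # a policy with no bindings omits the 'bindings' key entirely
    for binding in policy.get('bindings', []):
        if binding.get('role') == role and 'allUsers' in binding.get('members', []):
            return True
    return False


# sample policies shaped like the two outputs in this section
before = json.loads('{"etag": "BwYUVYrEWQs=", "version": 1}')
after = json.loads(
    '{"bindings": [{"members": ["allUsers"], "role": "roles/run.invoker"}],'
    ' "etag": "BwYUVZOnbwM=", "version": 1}'
)
print(is_public(before), is_public(after))
```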


To remove public access, use the following:

Note: it can take a minute before requests stop being served.

In [153]:
#!gcloud run services remove-iam-policy-binding {function_name.replace('_', '-')} --region=$REGION --member="allUsers" --role="roles/run.invoker"

### Test Function

In [156]:
test_response = !curl -s "https://us-central1-statmike-mlops-349915.cloudfunctions.net/tracking-pixel?path=statmike&file=tester"

In [157]:
test_response

['�PNG',
 '\x1a',
 '\x00\x00\x00',
 'IHDR\x00\x00\x00\x01\x00\x00\x00\x01\x08\x06\x00\x00\x00\x1f\x15ĉ\x00\x00\x00',
 'IDATx�c````\x00\x00\x00\x05\x00\x01��E@\x00\x00\x00\x00IEND�B`�']

In [161]:
print(f"{response.url}?path=statmike&file=tester")

https://us-central1-statmike-mlops-349915.cloudfunctions.net/tracking-pixel?path=statmike&file=tester


In [162]:
print(f"{response.url}?path=notstatmike&file=tester")

https://us-central1-statmike-mlops-349915.cloudfunctions.net/tracking-pixel?path=notstatmike&file=tester
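
When the path or file name contains characters such as `/` or spaces, the query string needs to be URL-encoded. The sketch below builds the tracking URL with the standard library and wraps it in a Markdown image tag. Embedding the pixel as a Markdown image is an assumption about how it would be used, and `tracking_url` and `markdown_pixel` are hypothetical helper names.

```python
from urllib.parse import urlencode

# the deployed function URL shown above
BASE_URL = 'https://us-central1-statmike-mlops-349915.cloudfunctions.net/tracking-pixel'


def tracking_url(path: str, file: str, base_url: str = BASE_URL) -> str:
    """Build the pixel URL with properly encoded query parameters."""
    return f"{base_url}?{urlencode({'path': path, 'file': file})}"


def markdown_pixel(path: str, file: str) -> str:
    """Markdown image tag that loads the pixel when the document is rendered."""
    return f"![tracking pixel]({tracking_url(path, file)})"


print(tracking_url('statmike', 'tester'))
print(markdown_pixel('statmike/repo', 'notebook name.ipynb'))
```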


## TODO

- run as a new service account with minimum roles
- log file + path to a BigQuery table, including the datetime
- add tests for the function
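
As a starting point for the BigQuery item, here is a minimal sketch of how a load could be recorded. It is not part of the deployed function: the table ID is a hypothetical placeholder, the table is assumed to already exist with `timestamp`, `path`, and `file` columns, and `google-cloud-bigquery` would need to be added to the function's requirements.

```python
from datetime import datetime, timezone

# hypothetical destination table: project.dataset.table
TABLE_ID = 'my-project.tracking.pixel_loads'


def build_row(path: str, file: str, now: datetime = None) -> dict:
    """Build the timestamp|path|file record; no personal data is included."""
    now = now or datetime.now(timezone.utc)
    return {'timestamp': now.isoformat(), 'path': path, 'file': file}


def log_load(path: str, file: str, table_id: str = TABLE_ID) -> None:
    """Stream one row to BigQuery; errors are printed rather than raised."""
    # imported lazily so build_row can be used without the BigQuery client installed
    from google.cloud import bigquery
    client = bigquery.Client()
    errors = client.insert_rows_json(table_id, [build_row(path, file)])
    if errors:
        print(f'BigQuery insert errors: {errors}')
```

Inside `tracking_pixel`, a call such as `log_load(repo_path, repo_file)` would take the place of the placeholder `print`. Printing errors instead of raising keeps the pixel response from failing when logging does.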