- Enable the Google Analytics API: In the Cloud Console, go to APIs & Services > Library and search for the Google Analytics API (not the Google Analytics Reporting API).
- Create a service account key: In the Cloud Console, go to IAM & Admin > Service Accounts > Create Service Account. Fill out the necessary details (such as the service account name), then create and download the key file; no additional permissions are needed. Store the service account email and the key file safely. (A gcloud alternative is sketched after this list.)
- Ensure that you have the right permissions to create the workflow: If using Compute Engine - BigQuery Data Editor, BigQuery Job User, Compute Admin, Service Account User, Storage Object Creator & Viewer, ML Engine Developer & Viewer. Also, ensure that your Compute Engine instance allows 'Allow full access to all Cloud APIs' (Compute Engine > select the VM instance while it is stopped > Edit > Cloud API access scopes > Allow full access to all Cloud APIs). Cheapest instance type: f1-micro.
- Add the service account email as a user to Google Analytics: Using the service account email generated in the previous step, create a user under Admin > Account > Account User Management with Edit and Read & Analyze permissions.
- Set up GA: Create the custom dimensions. If using Data Import, set up the schema.
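If you prefer the command line for Step 2, the service account and its key can also be created with the gcloud CLI. A minimal sketch, assuming a hypothetical project ID (my-project) and service account name (modem-svc):

```bash
# Create the service account (my-project and modem-svc are placeholders).
gcloud iam service-accounts create modem-svc \
    --project=my-project \
    --display-name="Modem service account"

# Create and download the JSON key file; store it safely.
gcloud iam service-accounts keys create svc_key.json \
    --iam-account=modem-svc@my-project.iam.gserviceaccount.com
```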
To view a complete end-to-end example from model creation to upload, click here.
If you have already trained your model, ensure that the RUNTIME_VERSION,
FRAMEWORK (version), and PYTHON_VERSION are compatible. E.g.,

```bash
RUNTIME_VERSION=1.15
FRAMEWORK='SCIKIT-LEARN'   # framework version 0.20.4
PYTHON_VERSION=3.7
```
- Download the model as a pickle file (model.pkl).

```python
import pickle
from sklearn import linear_model

# Example model object; substitute your own trained model.
my_model_obj = linear_model.LogisticRegression()

# Serialize the model to model.pkl.
with open('model.pkl', 'wb') as f:
    pickle.dump(my_model_obj, f)
```
- Upload the model to AI Platform (after updating the parameters):

```bash
gsutil cp model.pkl $GCS_MODEL_DIR

gcloud ai-platform models create $MODEL_NAME --regions $REGION

gcloud ai-platform versions create $VERSION_NAME \
  --model $MODEL_NAME --origin $GCS_MODEL_DIR \
  --runtime-version=$RUNTIME_VERSION --framework $FRAMEWORK \
  --python-version=$PYTHON_VERSION
```
- Ensure that you have noted the following params for automation:
- AI Platform model [AI_PLATFORM_MODEL_NAME]
- AI Platform version name [AI_PLATFORM_VERSION_NAME]
- BQ columns to be used as feature inputs to the model [MODEL_INPUT_COL_NAMES]
- Mapping between BQ columns/AI Platform model outputs & import format (Data Import schema or Measurement Protocol hit parameters) [COLUMN_MAP].
Detailed instructions for MODEL_INPUT_COL_NAMES & COLUMN_MAP can be found in this Colab here, under the 'Automation with Modem - Parameter Specification' section. A hypothetical sketch of these two parameters follows.
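By way of illustration only (the Colab is the authoritative reference), these two parameters might look like the following, assuming a hypothetical model with two BQ feature columns and a Data Import schema keyed on ga:clientId:

```python
# Hypothetical values for illustration; follow the Colab for the exact format.
MODEL_INPUT_COL_NAMES = ['session_count', 'days_since_last_visit']

# Maps BQ columns / AI Platform model outputs to the import format,
# here a Data Import schema with a custom dimension for the score.
COLUMN_MAP = {
    'clientId': 'ga:clientId',
    'predicted_score': 'ga:dimension5',
}
```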
- Download the code: SSH into the Compute Engine VM and run the following command:

```bash
sudo apt install git && rm -rf modem && git clone https://github.com/google/modem.git && cd modem/pythonML/pipeline
```
- If needed, edit the 'main.py' file to add pre- and post-processing logic: Add the respective logic at the start of the preprocess_features() and postprocess_output() functions. Check the comments in the file and/or the Colab example under the 'Automation with Modem - Modifying code to add pre and post processing logic' section; a rough sketch follows.
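As a hypothetical sketch (the actual function signatures and fields in main.py may differ), pre-processing might normalize a feature before prediction and post-processing might bucket the model's raw output:

```python
def preprocess_features(features):
    # Hypothetical example: scale one input column before prediction.
    for row in features:
        row['session_count'] = row['session_count'] / 100.0
    return features

def postprocess_output(predictions):
    # Hypothetical example: convert raw probabilities into decile buckets.
    return [int(p * 10) for p in predictions]
```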
- Edit the 'svc_key.json' file: Update svc_key.json with the details from the downloaded service key file (see Prerequisites, Step 2).
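For reference, a downloaded service account key file generally has the following shape (values redacted; project and email shown as placeholders):

```json
{
  "type": "service_account",
  "project_id": "my-project",
  "private_key_id": "...",
  "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
  "client_email": "modem-svc@my-project.iam.gserviceaccount.com",
  "client_id": "...",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "..."
}
```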
- Edit the 'params.py' file: Fill in the correct Google Analytics account ID, property ID, and dataset ID. Also, update the GA-BQ dataset/table names, the model input columns, and the mapping between columns. Finally, update the AI Platform model and version names from the upload step above. A hypothetical sketch follows.
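A minimal, hypothetical sketch of such values (the real variable names live in params.py and may differ):

```python
# Hypothetical names/values for illustration; match them to params.py.
GA_ACCOUNT_ID = '12345678'
GA_PROPERTY_ID = 'UA-12345678-1'
GA_DATASET_ID = 'aBcDeFgHiJ'   # Data Import dataset ID

BQ_DATASET_NAME = 'my_ga_dataset'
BQ_TABLE_NAME = 'scored_users'

AI_PLATFORM_MODEL_NAME = 'my_model'
AI_PLATFORM_VERSION_NAME = 'v1'
```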
- Test the code by running the following command:

```bash
sh test.sh
```
- Edit the 'deploy.sh' file: Edit the cron schedule to your desired frequency; an example follows.
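For instance, assuming deploy.sh takes a standard crontab schedule, a run every Monday at 06:00 would be:

```bash
# Standard crontab syntax: minute hour day-of-month month day-of-week.
0 6 * * 1
```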
- Set up the cron job by running the following command:

```bash
sh deploy.sh
```
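Assuming deploy.sh installs a standard crontab entry, you can confirm the job was registered by listing the current user's cron table:

```bash
crontab -l
```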