This project can be run from the Cloud Shell of your Google Cloud project.
You will need a Google Cloud project with owner permissions, and the Google Cloud SDK configured to use that project. The Cloud Shell in your Google Cloud project meets both requirements, since it comes with the Google Cloud SDK configured by default.
This repository contains some Terraform code in the terraform directory to set up Vertex AI and all the required APIs and permissions in the Google Cloud project.
Please check the README.md in the terraform/ directory for more details. You only need to run the Terraform code once.
Then set your project id, copy the dataset to Cloud Storage, and load it into BigQuery:
PROJECT_ID=<PROJECT_ID>
gcloud storage cp data/creditcard.csv.gz gs://$PROJECT_ID/data/
bq load --project_id $PROJECT_ID --autodetect --source_format=CSV --replace=true data_playground.transactions gs://$PROJECT_ID/data/creditcard.csv.gz
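Both commands above expand $PROJECT_ID, so an unset variable would silently produce a path such as gs:///data/. As a minimal sketch, a small guard can catch that before anything is uploaded. The function name check_project_id is a hypothetical helper, not part of this repository:

```shell
# Hypothetical helper (not part of this repository): fail early if PROJECT_ID
# is unset, empty, or still the literal placeholder from the instructions.
check_project_id() {
  case "${PROJECT_ID:-}" in
    ""|"<PROJECT_ID>")
      echo "PROJECT_ID is not set to a real project id" >&2
      return 1
      ;;
  esac
  # Print the destination that the upload command would use.
  echo "gs://${PROJECT_ID}/data/"
}

# Example use, left commented out so this snippet has no side effects:
# check_project_id >/dev/null && gcloud storage cp data/creditcard.csv.gz "gs://$PROJECT_ID/data/"
```

The guard only checks the variable itself. It does not verify that the project or the bucket exists.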
Please use Python 3.7, 3.8, or 3.9. Older versions (e.g. 3.6) and newer versions (e.g. 3.10) will not work with TFX. For more details, please check:
At the time of writing, the Cloud Shell has Python 3.9. You can check your Python version by running the following command:
python --version
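If you prefer an automated check over reading the version by eye, the range stated above (3.7 to 3.9) can be tested in the shell. This is only a sketch, and the function name is_supported_python is illustrative, not something the repository provides:

```shell
# Illustrative helper: succeeds if a "major.minor[.patch]" version string
# falls in the 3.7 to 3.9 range stated above.
is_supported_python() {
  major="${1%%.*}"     # text before the first dot
  minor="${1#*.}"      # text after the first dot
  minor="${minor%%.*}" # drop a patch component, if any
  [ "$major" -eq 3 ] && [ "$minor" -ge 7 ] && [ "$minor" -le 9 ]
}

# Ask the interpreter for its own version, if python is on the PATH.
if command -v python >/dev/null 2>&1; then
  version="$(python -c 'import sys; print("%d.%d" % sys.version_info[:2])')"
  if is_supported_python "$version"; then
    echo "Python $version should work with TFX"
  else
    echo "Python $version is outside the 3.7-3.9 range" >&2
  fi
fi
```

Comparing the minor version as an integer matters here: a plain string comparison would rank 3.10 below 3.9.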
Once you have made sure you have the correct Python version, create a virtualenv:
python -m venv tfxenv
Activate it:
source ./tfxenv/bin/activate
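To confirm that the activation took effect before installing anything, one option is to inspect the VIRTUAL_ENV variable that the activate script exports. A minimal sketch follows. The function name tfxenv_is_active is illustrative:

```shell
# Illustrative helper: succeeds only when a virtualenv whose directory is
# named tfxenv is active in the current shell.
tfxenv_is_active() {
  case "${VIRTUAL_ENV:-}" in
    */tfxenv) return 0 ;;
    *) return 1 ;;
  esac
}

if tfxenv_is_active; then
  echo "tfxenv is active: $VIRTUAL_ENV"
else
  echo "tfxenv is not active; run: source ./tfxenv/bin/activate" >&2
fi
```

This avoids accidentally installing the dependencies into the system Python.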
Then install the dependencies in the file requirements.txt by running:
pip install -r requirements.txt
Edit the scripts in the directory scripts to point to your project id and region of choice.
The playground branch of this repository contains incomplete code that you need to finish, as an exercise to learn the ropes of TFX pipelines.
To run the pipeline in Google Cloud, run the provided scripts from the top-level directory of the repository:
./scripts/launch_google_cloud.sh