This repository is for a project that does the following:
- Extracts data from Salesforce.com using AWS Glue (an ETL job)
- Lands that data into S3
- Builds a series of Athena tables by running Athena queries from AWS Step Functions
- Exposes dashboards in Amazon QuickSight that visualize the data
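
To make the Step Functions stage concrete, here is a minimal sketch of a state machine definition that runs Athena queries one after another. It is an illustration only, not this repo's actual definition: the function name `build_athena_state_machine`, the state names, the `salesforce_example` database, and the query text are all hypothetical. It relies on the Step Functions Athena integration (`athena:startQueryExecution.sync`) and assumes the chosen workgroup already has a query result location configured.

```python
import json


def build_athena_state_machine(queries, workgroup="primary"):
    """Chain Athena queries into sequential Step Functions Task states.

    `queries` maps a state name to the SQL that state should run.
    States execute in the mapping's insertion order.
    """
    if not queries:
        raise ValueError("at least one query is required")

    names = list(queries)
    states = {}
    for i, name in enumerate(names):
        state = {
            "Type": "Task",
            # The .sync suffix makes the state wait for the query to finish.
            "Resource": "arn:aws:states:::athena:startQueryExecution.sync",
            "Parameters": {
                "QueryString": queries[name],
                # Assumption: this workgroup has an output location set.
                "WorkGroup": workgroup,
            },
        }
        if i + 1 < len(names):
            state["Next"] = names[i + 1]
        else:
            state["End"] = True
        states[name] = state

    return {"StartAt": names[0], "States": states}


# Hypothetical queries; the real table definitions live in this repo.
example_queries = {
    "CreateDatabase": "CREATE DATABASE IF NOT EXISTS salesforce_example",
    "BuildAccountSummary": (
        "CREATE TABLE salesforce_example.account_summary AS "
        "SELECT industry, COUNT(*) AS accounts "
        "FROM salesforce_example.account_raw GROUP BY industry"
    ),
}

print(json.dumps(build_athena_state_machine(example_queries), indent=2))
```

Running the queries sequentially keeps each table build dependent on the previous one finishing, which matters when later tables select from earlier ones.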
Here are some helpful resources that this repo is patterned after:
- Salesforce/PySpark tutorial by JJ's world
- An AWS blog post on developing Glue jobs locally (because running notebooks in the console gets expensive quickly and... local unit tests are good)
- An AWS blog post about extracting Salesforce data into Athena using Glue
Note: This repo has a justfile instead of a makefile. See the justfile target glue-start-jupyter to understand how to set up the local Glue environment and run docker-compose.
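Install just with the package manager for your platform:
brew install just                            # macOS (Homebrew)
apt-get update && apt-get install -y just    # Debian/Ubuntu
choco install just                           # Windows (Chocolatey)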
Provide your Salesforce credentials through the following variables:
SF_USERNAME=username
SF_PASSWORD=password
# How to generate a security token: https://docs.idalko.com/exalate/display/ED/Salesforce%3A+How+to+generate+a+security+token
SF_SECURITY_TOKEN=security-token
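
A job that uses these settings has to find all three before it can connect. The sketch below shows one way to check for them early; it assumes the variables are exported into the process environment, which this README does not confirm, and the helper name `load_salesforce_credentials` is hypothetical rather than something defined in this repo.

```python
import os

REQUIRED_VARS = ("SF_USERNAME", "SF_PASSWORD", "SF_SECURITY_TOKEN")


def load_salesforce_credentials(env=None):
    """Return the Salesforce settings as a dict, or fail listing what is missing.

    `env` defaults to os.environ; pass a plain dict to test without
    touching the real environment.
    """
    env = os.environ if env is None else env
    # Treat unset and empty values the same way: both are unusable.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing Salesforce settings: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Failing at startup with the names of the missing variables is usually easier to debug than an authentication error raised partway through an extract.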
Set up a Python virtual environment, then run the just targets:
python -m venv ./venv/
source ./venv/bin/activate
just install
just glue-start-jupyter
just deploy-glue