To run the data pipeline, set up your environment with the following:
- EC2 instance with RHEL 9 operating system.
- Install `jq` via `sudo yum install jq`.
- Install `miniconda3` via the installer.
- Install `pandas` and `vaderSentiment` via `pip`.

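Before running the pipeline, it can help to confirm that the tools listed above are actually visible from your shell. The following is a minimal sketch, not part of the repository: the helper name `find_missing` is hypothetical, and it assumes that the miniconda3 installer has put `conda` on your `PATH` and that `vaderSentiment` is importable under that same name.

```python
import importlib.util
import shutil

# Names taken from the setup list above. Assumption: the miniconda3
# installer has added `conda` to PATH (this depends on installer options).
REQUIRED_COMMANDS = ["jq", "conda"]
REQUIRED_MODULES = ["pandas", "vaderSentiment"]


def find_missing(commands, modules):
    """Return (missing_commands, missing_modules) for the given names."""
    # shutil.which returns None when an executable is not found on PATH.
    missing_commands = [c for c in commands if shutil.which(c) is None]
    # find_spec returns None when a top-level module cannot be located.
    missing_modules = [m for m in modules if importlib.util.find_spec(m) is None]
    return missing_commands, missing_modules


if __name__ == "__main__":
    cmds, mods = find_missing(REQUIRED_COMMANDS, REQUIRED_MODULES)
    for name in cmds:
        print(f"missing command: {name}")
    for name in mods:
        print(f"missing Python module: {name}")
    if not cmds and not mods:
        print("environment looks ready")
```

Run it with the same Python interpreter that the pipeline will use, so that the module check reflects the right conda environment.
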
To execute the Yelp data pipeline found in the `generate_trecs_data.sh` bash script, run `./generate_trecs_data.sh`.

This script contains the following components:
- Preprocess Oncologist Dataset
- Preprocess Yelp Dataset
- Execute VADER Sentiment Analysis
- Collate Yelp Location and Sentiment Data
- Integrate Placekeys API
- Generate T-Recs Dataset
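
To give a feel for what the VADER sentiment step involves, here is a rough sketch of scoring a single review. It is not taken from `generate_trecs_data.sh`: the function names `label_from_compound` and `score_review` are hypothetical, and the ±0.05 cutoffs are the conventional thresholds suggested in the vaderSentiment documentation, which the actual pipeline may or may not use.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a coarse sentiment label."""
    # Assumption: the conventional +/-0.05 cutoffs; the pipeline may differ.
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def score_review(text):
    """Return (compound, label) for one review text."""
    # Imported lazily so label_from_compound works without the package.
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    # polarity_scores returns a dict with "neg", "neu", "pos" and "compound".
    compound = analyzer.polarity_scores(text)["compound"]
    return compound, label_from_compound(compound)
```

For many reviews, creating one `SentimentIntensityAnalyzer` and reusing it would avoid reloading the lexicon on every call.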