T140 path to extracts: #140 #153
…k extracts; tweetset_loader copies JSON (instead of re-generating)
Would you add to … And then could you also add something to the README about the …
…ies/TweetSets into t140-path-to-extracts
Reviewed the updated documentation and looks good!
Features
- … full_datasets path (assuming this is or can be configured as an NFS mount)
- … dataset_loading to tweetsets-data/full_datasets (or equivalent paths as defined in .env).
- … .env variable

Setup
- The full_datasets folder must be a shared NFS mount available to all nodes in the Spark cluster. (On my VM, I moved the tweetsets_data folder to /storage on both the primary and secondary nodes, then mapped the full_datasets folder on the primary to the same location on the secondary VM.)
  - … tweetsets_data folder, as that will likely cause problems for Elasticsearch.
  - … /storage/tweetsets_data/full_datasets (VM 1) to a folder on VM 2 in /home/dsmith, but that did not seem to work.
- Update the .env files accordingly with the new paths, if necessary.
- Update docker-compose.yml as follows:
  - In the spark-worker section, under volumes: ${TWEETSETS_DATA_PATH}/full_datasets:/tweetsets_data/full_datasets
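As a rough sketch, the spark-worker change could look like the fragment below. It assumes the compose file uses a top-level services key and omits every other key of the service; only the service name and the volume mapping come from the description above.

```yaml
# Sketch only: all other keys of the spark-worker service are omitted.
# Assumes a top-level "services" key (not confirmed by the PR description).
services:
  spark-worker:
    volumes:
      # <host path on the shared NFS mount>:<path inside the container>
      - ${TWEETSETS_DATA_PATH}/full_datasets:/tweetsets_data/full_datasets
```

TWEETSETS_DATA_PATH is interpolated by Docker Compose from .env, so the same mapping should resolve on every node as long as each node's .env points at the shared location.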
- Update loader.docker-compose.yml as follows:
  - Under volumes, add ${TWEETSETS_DATA_PATH}/full_datasets:/tweetsets_data/full_datasets
  - Under environment, add SPARK_MAX_FILE_SIZE and SPARK_PARTITION_SIZE
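A possible shape for the loader changes is sketched below. The service name loader and the mapping style used for environment are assumptions; the actual file may use the list form or simply pass the variables through by name.

```yaml
# Sketch only: all other keys of the service are omitted.
# The service name "loader" and the mapping form of "environment" are assumptions.
services:
  loader:
    volumes:
      # Same shared extract folder that the Spark workers mount
      - ${TWEETSETS_DATA_PATH}/full_datasets:/tweetsets_data/full_datasets
    environment:
      # Values are interpolated from .env when Compose parses the file
      SPARK_MAX_FILE_SIZE: ${SPARK_MAX_FILE_SIZE}
      SPARK_PARTITION_SIZE: ${SPARK_PARTITION_SIZE}
```

Explicit interpolation is used here so that the container fails visibly (with a Compose warning about an unset variable) if the two values are missing from .env.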
- In .env, add the following:
  SPARK_MAX_FILE_SIZE=2g
  SPARK_PARTITION_SIZE=128m
- The server-flaskrun and loader containers should be built locally. Make sure you rebuild the images before restarting the containers.

Testing
- … tweet-ids extract matches the number of tweets in the UI.

Benchmarks
The following metrics were obtained using a subset of the Summer Olympics collection.