A collection of tools and scripts for exploring the ListenBrainz data using Apache Spark.


Complete the following setup steps so that the scripts run correctly:

Set the PYSPARK_PYTHON environment variable so that Spark uses Python 3:

export PYSPARK_PYTHON=$(which python3)
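
As a quick sanity check before submitting a job, the interpreter that this variable points at can be inspected from Python. The following is a minimal sketch, not part of this repository: the helper name `resolve_pyspark_python` is hypothetical, and the fallback to the first `python3` on the PATH is an assumption that mirrors the `which python3` lookup.

```python
import os
import shutil


def resolve_pyspark_python(env=None):
    """Return the interpreter path taken from PYSPARK_PYTHON, or a fallback.

    Hypothetical helper for illustration only; it is not part of the repo.
    """
    env = os.environ if env is None else env
    configured = env.get("PYSPARK_PYTHON")
    if configured:
        return configured
    # Assumption: fall back to the first python3 found on PATH,
    # which is what `which python3` would report (None if absent).
    return shutil.which("python3")


if __name__ == "__main__":
    print(resolve_pyspark_python())
```

If this prints nothing useful (for example `None`), python3 is probably not installed or not on the PATH.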

Install required modules:

pip3 install -r requirements.txt

Install Java and Scala:

apt-get install default-jdk scala

Install Spark (download the 2.3.0 tgz built for Hadoop and extract it into /usr/local/spark).

To run the scripts:

spark-submit --master spark://195.201.112.36:7077 --executor-memory=29g $(pwd)/<script>

For example, to run train_models.py with the arguments df and models:

spark-submit --master spark://195.201.112.36:7077 --executor-memory=29g $(pwd)/train_models.py df models
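
When the same command is launched repeatedly with different scripts, it can help to assemble the argument list in one place. The sketch below is an illustration under stated assumptions, not something shipped in this repository: the function name `build_spark_submit` and its parameters are hypothetical, and the defaults simply reuse the master URL and executor memory from the commands above.

```python
import os


def build_spark_submit(script, *script_args,
                       master="spark://195.201.112.36:7077",
                       executor_memory="29g",
                       cwd=None):
    """Build the spark-submit argument list for a script in `cwd`.

    Hypothetical helper; the defaults mirror the commands in this readme.
    """
    # Resolve the script against the working directory, like `pwd`/<script>.
    cwd = os.getcwd() if cwd is None else cwd
    return [
        "spark-submit",
        "--master", master,
        f"--executor-memory={executor_memory}",
        os.path.join(cwd, script),
        *script_args,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(" ".join(build_spark_submit("train_models.py", "df", "models")))
```

The returned list can be passed to `subprocess.run` unchanged; keeping it as a list avoids shell quoting problems if a path contains spaces.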
