Use Apache Spark to analyze Wikipedia data.
This notebook parses the Wikipedia dump for MediaWiki {{coord}} templates and loads them into Spark (a rough sketch of the idea is shown below).
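A minimal sketch of that parsing step, not the notebook's exact code: it assumes PySpark, a locally decompressed dump file (the path is a placeholder), one {{coord}} template per line, and a deliberately simple regex.

```python
import re
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, StringType

spark = SparkSession.builder.appName("wiki-coords").getOrCreate()

# Read the decompressed dump as plain text; one record per line.
pages = spark.read.text("enwiki-latest-pages-articles.xml")  # hypothetical path

# Naive pattern: does not handle nested templates, good enough for a sketch.
coord_re = re.compile(r"\{\{[Cc]oord\|[^}]*\}\}")

@udf(returnType=ArrayType(StringType()))
def extract_coords(line):
    return coord_re.findall(line or "")

coords = (pages
          .select(extract_coords("value").alias("templates"))
          .selectExpr("explode(templates) as template"))
coords.show(5, truncate=False)
```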
It can be used to produce maps visualizing every {{coord ...}} template in the English Wikipedia. The resulting map looks like this:
Notebooks in this repository were tested using the Jupyter all-spark-notebook Docker image, launched with the script below.
# Port 8888 serves Jupyter; port 4040 exposes the Spark UI.
# Set $reasonable_password before running to use it as the notebook token.
docker run --dns=8.8.8.8 \
  --rm -p 0.0.0.0:8888:8888 \
  -p 0.0.0.0:4040:4040 \
  -e JUPYTER_ENABLE_LAB=yes \
  -v `pwd`:/home/jovyan/work \
  jupyter/all-spark-notebook \
  start-notebook.sh --NotebookApp.token=$reasonable_password
Your mileage may vary when getting the deck.gl and/or kepler.gl cells to render correctly in Google Colab or Databricks notebooks.
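As a rough illustration of the rendering step (not the notebook's exact cells), a kepler.gl map can be built from already-parsed coordinates; the DataFrame, column names, and sample points here are assumptions.

```python
import pandas as pd
from keplergl import KeplerGl

# Hypothetical result of the parsing step: numeric latitude/longitude columns.
points = pd.DataFrame({
    "latitude": [51.4779, 48.8584],
    "longitude": [-0.0015, 2.2945],
})

m = KeplerGl(height=600)
m.add_data(data=points, name="wikipedia_coords")
m  # displays the map widget when the Jupyter extension is installed and enabled
```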
