Python tools for Big Data
Notebooks for Master of Data Science Rennes
Run Jupyter notebooks with Docker
Get the Docker app
You can run these notebooks with Docker. The following commands clone the repository and start a container with the Notebook server listening for HTTP connections on ports 8888 and 4040, with no authentication configured.
git clone https://github.com/pnavaro/big-data.git
docker run --rm -v $PWD/big-data:/home/jovyan/ -p 8888:8888 -p 4040:4040 pnavaro/big-data
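Once the container is running, it can be handy to confirm that the published ports actually accept connections before opening a browser. The snippet below is a minimal sketch rather than part of the repository: the helper name `port_is_open` is hypothetical, and it assumes the container is reachable on `localhost`. It only tests whether a TCP connection succeeds, not which service answers.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        # create_connection raises OSError on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Ports published by the docker run command above.
    for port in (8888, 4040):
        state = "open" if port_is_open("localhost", port) else "closed"
        print(f"localhost:{port} is {state}")
```

A port may be reported as closed until the service behind it has started inside the container, so a closed result right after startup is not necessarily an error.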
- Analyzing and Manipulating Data with Pandas Beginner: SciPy 2016 Tutorial by Jonathan Rocher.
- Dask Examples
- Parallel Data Analysis with Dask, Dask tutorial at PyCon 2018 by Tom Augspurger.
- Parallelizing Scientific Python with Dask, SciPy 2018 Tutorial by James Crist and Martin Durant.
- Parallelizing Scientific Python with Dask, SciPy 2017 Tutorial by James Crist.
- Parallel Python: Analyzing Large Datasets Intermediate, SciPy 2016 Tutorial by Matthew Rocklin.
- Parallel Data Analysis in Python, SciPy 2017 Tutorial by Matthew Rocklin, Ben Zaitlen & Aron Ahmadia.
- Writing an Hadoop MapReduce Program in Python by Michael G. Noll.
- Don't use Hadoop - your data isn't that big
- Format Wars: From VHS and Beta to Avro and Parquet, an overview of Hadoop file formats.
- Should you replace Hadoop with your laptop? by Vicki Boykis.
- Implementing MapReduce with multiprocessing by Doug Hellmann.
- Deploying Dask on YARN by Jim Crist.
- Native Hadoop file system (HDFS) connectivity in Python by Wes McKinney.
- Working Notes from Matthew Rocklin (must read)
- DataCamp Cheat Sheets
- Outils pour le Big Data by Pierre Nerzic.
- wikistat - Ateliers Big Data by Philippe Besse.
- Data Science and Big Data with Python by Steve Phelps.
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.