Presentation I gave in June 2018 for the PyBCN meetup

How does that PySpark thing work? And why Arrow makes it faster?

Presentation I (@berenguel) gave at the PyBCN meetup in June 2018, at Spark London in September 2018, and at Spark Barcelona. It explains how PySpark works and how Spark 2.3 optimised UDFs for use with Pandas. A recording of this talk (the one given at Python Barcelona, in English) is available here; you can find the slides on Slideshare or here. I recommend checking the version with presenter notes, which is only available here.

If you want additional information about Spark in general, I gave an introduction to Spark talk with Carlos Peña that you can find here.

This presentation is written in Markdown and prepared for use with Deckset. The drawings were done on an iPad Pro using Procreate. Only the final PDF and the source Markdown are available here. Sadly, the animated GIFs appear as static images in the PDF.
