Internals of Speeding Up Pyspark with Arrow

Presentation I (@berenguel) gave at the PyBCN meetup in June 2018, Spark London in September 2018, Spark Barcelona, and Spark Summit Europe 2019 to explain how Spark 2.3/2.4 has optimised UDFs for Pandas use, as well as how PySpark works. A recording of this talk (the one given at Python Barcelona, in English) is available here; the recording from Spark Summit is available here. You can find the slides here (some images might look slightly blurry). I recommend you check the version with presenter notes, which is only available here.

If you want additional information about Spark in general, I gave an introductory talk on Spark with Carlos Peña that you can find here.

This presentation is formatted in Markdown and prepared to be used with Deckset. The drawings were done on an iPad Pro using Procreate. Only the final PDF and the source Markdown are available here. Sadly, the animated GIFs are just static images in the PDF.

You can find a reveal.js export of the version given at Spark Summit here. It is not 100% faithful to the PDF/Deckset version, but it is close enough (and the animated GIFs play). The export was generated with this and tweaked to add a footer.


