Internals of Speeding Up Pyspark with Arrow

Presentation I (@berenguel) gave at the PyBCN meetup in June 2018, Spark London in September 2018, Spark Barcelona, and Spark Summit Europe 2019. It explains how Spark 2.3/2.4 optimised UDFs for Pandas use, as well as how PySpark works. A recording of the talk given at Python Barcelona (in English) is available here, and the recording from Spark Summit is available here. You can find the slides here (some images might look slightly blurry). I recommend the version with presenter notes, which is only available here.

If you want additional information about Spark in general, I gave an introductory talk on Spark with Carlos Peña, which you can find here.


This presentation is formatted in Markdown and prepared to be used with Deckset. The drawings were done on an iPad Pro using Procreate. Only the final PDF and the source Markdown are available here. Sadly, the animated GIFs are just static images in the PDF.


You can find a reveal.js export of the version given at Spark Summit here. It is not 100% faithful to the PDF/Deckset version, but it is close enough (and the animated GIFs play). The export was generated with this and tweaked to add a footer.



