How does that PySpark thing work? And why Arrow makes it faster?
Presentation I (@berenguel) gave at the PyBCN meetup in June 2018, at Spark London in September 2018, and at Spark Barcelona, explaining how Spark 2.3 optimised UDFs for Pandas use as well as how PySpark works. A recording of this talk (the one given at Python Barcelona, in English) is available here, and you can find the slides on Slideshare or here. I recommend checking the version with presenter notes, which is only available here.
This presentation is written in Markdown and prepared for use with Deckset. The drawings were done on an iPad Pro using Procreate. Only the final PDF and the source Markdown are available here. Sadly, the animated GIFs appear as static images in the PDF.