jx1226/pyspark-tutorial

pyspark-tutorial

A diary of my learning journey into the world of Apache Spark (PySpark) from a developer (Data Engineering) perspective.

To follow my journey you will need:

  • Azure Account
  • Azure Databricks
  • Azure Data Lake Gen 2
  • Python packages (ref. Pipfile)
    • findspark
    • jupyter
    • numpy
    • pandas
    • pypandoc
    • pyspark 2.4.5
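
To confirm that the Python packages listed above are available before starting, a minimal sketch like the following may help. It uses only the standard library and assumes each package's import name matches the name in the list. The helper `missing_packages` is a hypothetical name introduced here for illustration and is not part of the repository.

```python
from importlib.util import find_spec

# Package names copied from the list above.
# Assumption: each import name is the same as the listed package name.
REQUIRED = ["findspark", "jupyter", "numpy", "pandas", "pypandoc", "pyspark"]


def missing_packages(names):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```

The check only reports whether a package can be found. It does not verify versions, so the pinned pyspark version still has to be checked against the Pipfile.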

My learning path:

  • Day 1: Installing a local Spark environment
  • Day 2: My first Spark application and some basic concepts
  • Day 3: Taking a deeper look at DataFrames
  • Day 4: Getting an overview of the pyspark.sql module
  • Day 5: Doing some math and aggregations
  • Day 6: Tackling the date and time challenge
  • Day 7: Handling NULL values
  • Day 8: JSON and complex data types to analyse semi-/unstructured data
  • Day 9: Joins
  • Day 10: Connectors and I/O performance
