hikouki-gumo/Spark-Training

Spark-Training

This repository stores my training coursework on the Databricks platform. It has two parts:

  • DataFrame Lab:
    • Practiced reading and writing data with DataFrameReader and DataFrameWriter.
    • Used the DataFrame API to apply transformations and actions to analyze data.
  • Structured Streaming:
    • Practiced reading and writing streams from files and from the Kafka messaging system with DataStreamReader and DataStreamWriter.
    • Used the DataFrame API to perform ETL jobs.
    • Built a real-time Twitter data pipeline that reports information such as the most-tweeted hashtags in the last 5 minutes and where those tweets came from.
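
The windowed hashtag count behind the Twitter pipeline can be illustrated with a minimal plain-Python sketch. This is not the notebook code from this repository and it does not use Spark. It only shows the tumbling-window logic that a Structured Streaming job would typically express by grouping on `window(...)` over an event-time column. The function names (`window_start`, `top_hashtags`) and the `(timestamp, hashtag, location)` tuple layout are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta, timezone

# Assumed window size, matching the "last 5 minutes" in the description above.
WINDOW = timedelta(minutes=5)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_start(ts, size=WINDOW):
    """Floor a timezone-aware timestamp to the start of its tumbling window."""
    # timedelta // timedelta yields the integer number of whole windows since the epoch.
    return EPOCH + ((ts - EPOCH) // size) * size


def top_hashtags(tweets, n=1):
    """Return the n most frequent hashtags per window with per-location counts.

    tweets: iterable of (timestamp, hashtag, location) tuples (hypothetical layout).
    """
    counts = defaultdict(Counter)                       # window -> hashtag -> count
    places = defaultdict(lambda: defaultdict(Counter))  # window -> hashtag -> location -> count
    for ts, tag, loc in tweets:
        w = window_start(ts)
        counts[w][tag] += 1
        places[w][tag][loc] += 1
    return {
        w: [(tag, cnt, dict(places[w][tag])) for tag, cnt in c.most_common(n)]
        for w, c in counts.items()
    }
```

Calling `top_hashtags` with a list of such tuples returns a dictionary keyed by window start time. A streaming engine would compute the same aggregates incrementally as events arrive, and would also have to handle late data, which this sketch ignores.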
