jeganthirumeni/Spark-API-Guide

Spark-Practical Reference Guide

This project contains Databricks notebooks that explain the functions available in the Spark Scala API in Spark 2.x. Each API is introduced with a short explanation of the concept behind it, followed by the actual code and a walkthrough of how to use it.

| S.No. | Topic | Contents |
|---|---|---|
| 1 | Spark Session | This notebook contains the basic functions available with the Spark API, such as configuration, reading data, and metadata functions. |
| 2 | Dataframe Vs Dataset | This notebook compares the two structured APIs, DataFrames and Datasets, and tries to explain the difference between them programmatically. |
| 3 | Catalogue Functions | This notebook explains the standard API for accessing metadata, such as temp tables and registered UDFs on the SQL context, or permanent metadata such as the Hive metastore or HCatalog. |
| 4 | Basic DataSet Functions | This notebook explains some of the basic methods available in the Dataset API, such as schema, explain, and view creation. |
| 5 | Datasets -Typed Transformation Functions | This notebook explains some of the transformation functions: map, filter, flatMap, randomSplit, repartition, groupByKey, sample, etc. |
| 6 | Basic SQL Functions | This notebook explains basic SQL functions such as select, filter, where, orderBy, sort, limit, na, stat, etc. |
| 7 | Aggregate Functions | This notebook covers the aggregate functions available in Spark, i.e. groupBy, window, pivot, rollup, and cube. |
| 8 | Join Functions | This notebook covers the join functions available in Spark, i.e. Inner, Outer, LeftOuter, RightOuter, LeftSemi, LeftAnti, Cross & Neutral Joins. |
| 9 | Datasources | This notebook covers reading and writing data from/to various data sources, i.e. CSV, JSON, ORC, Parquet, Avro, Hive tables, SQL tables, and XML files. |
| 10 | DatasetActions | This notebook covers the available actions in Dataset. |
| 11 | ColumnFunctions | This notebook covers some of the functions that work on columns in a Dataframe/Dataset. |
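
To give a flavor of what the notebooks listed above deal with, here is a minimal sketch (not taken from the notebooks themselves) that touches topics 1 to 3: creating a Spark session, contrasting an untyped DataFrame filter with a typed Dataset filter, and listing a temp view through the catalog. The `Person` case class, the `SparkApiSketch` object, the `adults` helper, and the sample rows are all hypothetical, and the `local[*]` master is an assumption for running outside Databricks, where a `spark` session is normally already provided.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Hypothetical record type, used only for this illustration.
case class Person(name: String, age: Int)

object SparkApiSketch {
  // Typed filter: the lambda receives Person objects, so `age` is
  // checked by the Scala compiler rather than resolved at runtime.
  def adults(people: Dataset[Person]): Dataset[Person] =
    people.filter(p => p.age >= 18)

  def main(args: Array[String]): Unit = {
    // Assumption: local mode; on Databricks getOrCreate() returns the
    // session that the notebook already has.
    val spark = SparkSession.builder()
      .appName("spark-api-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data.
    val ds: Dataset[Person] = Seq(Person("Ann", 34), Person("Bob", 12)).toDS()
    val df: DataFrame = ds.toDF()

    // Untyped (DataFrame) filter: the column is referenced by name,
    // so a misspelled name would only fail when the query is analyzed.
    df.filter($"age" >= 18).select("name").show()

    // Typed (Dataset) filter followed by a typed map.
    adults(ds).map(_.name).show()

    // Register a temp view and inspect it through the catalog API.
    df.createOrReplaceTempView("people")
    spark.catalog.listTables().show()

    spark.stop()
  }
}
```

The typed variant trades some optimizer visibility (the lambda is opaque to Spark) for compile-time checking of field names, which is the kind of trade-off a DataFrame versus Dataset comparison usually centers on.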

About

Repository of Databricks notebooks
