Home
Rohit edited this page Apr 2, 2017 · 4 revisions
Spark Scala API: https://spark.apache.org/docs/latest/api/scala/index.html#package
Week 1
- Introduction
- Data Parallel to Distributed Data Parallel
- Latency
- RDDs: Spark's Distributed Collection
- RDDs: Transformation and Action
- Evaluation in Spark: Unlike Scala Collections!
- Cluster Topology Matters!
Week 2
- Reduction Operations (fold, foldLeft, aggregate)
- Pair RDDs
- Pair RDDs: Transformations and Actions
- Pair RDDs: Joins
Week 3
- Shuffling: What it is and why it's important
- Partitioning
- Optimizing with Partitioners
- Wide vs Narrow Dependencies
Week 4