
sumitarora/awesome-spark


spark

A curated list of Apache Spark resources that developers may find useful, covering a range of use cases. Entries are ordered alphabetically within each category.

Inspired by the Awesome thing.



What is Spark?

Apache Spark is a cluster computing platform designed to be a fast, general-purpose engine for large-scale data processing.

Why Spark?

  • Spark supports a wide range of workloads, including MapReduce-style batch processing, machine learning, and graph processing.
  • The basic abstraction in Spark is the RDD (Resilient Distributed Dataset).
  • An RDD is an immutable, partitioned collection of elements that can be operated on in parallel.
  • Spark ships with a rich standard library.
  • Spark provides APIs in several programming languages (Scala, Java, Python, and R) and a unified development and deployment environment for all of them.
  • Whichever of these languages you are comfortable with, you can use the same clustered runtime environment for prototyping.
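
To make the RDD idea above more concrete, here is a minimal sketch in plain Python of an immutable, partitioned collection whose partitions are processed in parallel. It is a toy model only and does not use Spark or its API: the class `ToyRDD` and its methods are hypothetical names chosen for illustration, and a local thread pool stands in for a cluster.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


class ToyRDD:
    """Hypothetical stand-in for an RDD: an immutable, partitioned collection."""

    def __init__(self, partitions):
        # Tuples make each partition, and the set of partitions, read-only.
        self._partitions = tuple(tuple(p) for p in partitions)

    @classmethod
    def parallelize(cls, data, num_partitions):
        data = list(data)
        # Ceiling division gives contiguous chunks of roughly equal size.
        size = max(1, -(-len(data) // num_partitions))
        return cls(data[i:i + size] for i in range(0, len(data), size))

    def map(self, fn):
        # A transformation never changes self; it returns a new ToyRDD.
        with ThreadPoolExecutor() as pool:
            parts = list(pool.map(lambda p: [fn(x) for x in p], self._partitions))
        return ToyRDD(parts)

    def reduce(self, fn):
        # Reduce each non-empty partition in parallel, then combine the partial results.
        non_empty = [p for p in self._partitions if p]
        with ThreadPoolExecutor() as pool:
            partials = list(pool.map(lambda p: reduce(fn, p), non_empty))
        return reduce(fn, partials)

    def collect(self):
        # Flatten the partitions back into a single list, preserving order.
        return [x for p in self._partitions for x in p]


numbers = ToyRDD.parallelize(range(1, 11), num_partitions=3)
squares = numbers.map(lambda x: x * x)   # new collection; numbers is unchanged
total = squares.reduce(lambda a, b: a + b)
print(total)
```

Because `map` returns a new collection rather than modifying the old one, `numbers` can be reused safely after `squares` is built. Real Spark adds fault tolerance, lazy evaluation, and distribution across machines, none of which this sketch attempts.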

Books


Courses


Links & Tutorials


Tools


Videos
