Spark

A Spark course from scratch, hosted on GitHub.

What is Apache Spark?

Apache Spark™ is a unified analytics engine for large-scale data processing.
It is a powerful tool for data scientists, allowing them to analyze large-scale data.

Speed

Run workloads up to 100x faster than Hadoop MapReduce.
Apache Spark achieves high performance for both batch and streaming data, using a state-of-the-art DAG scheduler, a query optimizer, and a physical execution engine.

Ease of Use

Write applications quickly in Java, Scala, Python, R, and SQL.
Spark offers over 80 high-level operators that make it easy to build parallel apps. You can also use it interactively from the Scala, Python, R, and SQL shells.

Generality

Combine SQL, streaming, and complex analytics.
Spark powers a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming. You can combine these libraries seamlessly in the same application.

Runs Everywhere

Spark runs on Hadoop, Apache Mesos, Kubernetes, standalone, or in the cloud, and it can access diverse data sources.
You can run Spark using its standalone cluster mode, on EC2, on Hadoop YARN, on Mesos, or on Kubernetes. Access data in HDFS, Alluxio, Apache Cassandra, Apache HBase, Apache Hive, and hundreds of other data sources.

Example

We can write a Spark application that classifies information in real time using Spark's machine learning library (MLlib),
while the information is aggregated from streaming sources through Spark Streaming.
At the same time, data scientists can query the resulting data in real time through Spark SQL.

Repository contents

advanced
commons
in
pairRdd
rdd
sparkSql
README.md

Repository: emunozlorenzo/Spark