Retail-Sales-Analytics-Using-Apache-Spark

This project analyzes the Retail-Dataset using Apache Spark. Apache Spark enables large-scale data analysis by processing data in parallel, making optimal use of the available threads and cores. It can therefore improve performance both on a cluster and on a single machine. The analysis is implemented in PySpark.
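
A minimal sketch of how such an analysis typically starts: creating a local SparkSession and loading the dataset into a DataFrame. The file name, CSV format, and schema inference here are assumptions for illustration, not taken from this repository's code.

```python
from pyspark.sql import SparkSession

# Start a local SparkSession; "local[*]" lets Spark parallelize work
# across all cores of the machine.
spark = (
    SparkSession.builder
    .appName("RetailSalesAnalytics")
    .master("local[*]")
    .getOrCreate()
)

# Load the retail dataset from CSV (file name and schema inference are
# assumptions for illustration).
retail_df = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("retail_dataset.csv")
)

retail_df.printSchema()
print(f"Rows: {retail_df.count()}")
```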

PYSPARK

PySpark was released to support the collaboration between Apache Spark and Python; it is the Python API for Spark. It lets you work with Resilient Distributed Datasets (RDDs) in Apache Spark from the Python programming language. Along with writing Spark applications using the Python API, it also provides the PySpark shell for interactively analyzing data in a distributed environment. PySpark supports most of Spark's features, such as Spark SQL, DataFrames, Streaming, MLlib (machine learning), and Spark Core.
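
The short example below illustrates the PySpark features mentioned above: the DataFrame API, Spark SQL, and the lower-level RDD interface. The column names and sample rows are hypothetical and are not drawn from the Retail-Dataset.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("PySparkFeaturesDemo").getOrCreate()

# DataFrame API: a small in-memory example (columns are hypothetical).
sales = spark.createDataFrame(
    [("Books", 120.0), ("Books", 80.0), ("Toys", 45.5)],
    ["category", "amount"],
)
sales.groupBy("category").agg(F.sum("amount").alias("total_sales")).show()

# Spark SQL: the same aggregation expressed as a SQL query.
sales.createOrReplaceTempView("sales")
spark.sql(
    "SELECT category, SUM(amount) AS total_sales FROM sales GROUP BY category"
).show()

# RDD API: the lower-level interface PySpark also exposes.
totals = (
    sales.rdd
    .map(lambda row: (row["category"], row["amount"]))
    .reduceByKey(lambda a, b: a + b)
)
print(totals.collect())
```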
