This project analyzes a retail dataset using Apache Spark. Spark enables large-scale data analysis by processing data in parallel, making optimal use of the available threads and cores. It can therefore improve performance on a cluster as well as on a single machine. The analysis is implemented in PySpark.
PySpark is the Python API for Apache Spark, released to enable Spark and Python to work together. It lets you work with Resilient Distributed Datasets (RDDs) in Apache Spark from the Python programming language. Besides letting you write Spark applications with the Python API, it also provides the PySpark shell for interactively analyzing your data in a distributed environment. PySpark supports most of Spark's features, including Spark SQL, DataFrames, Streaming, MLlib (machine learning), and Spark Core.