Analysis of a large IPL dataset (data through 2017) using PySpark.

raghul3/IPL_Data_Analysis

IPL Data Analysis Using Apache Spark

Overview

This project analyzes Indian Premier League (IPL) cricket data using Apache Spark, an open-source unified analytics engine for large-scale data processing. The primary objective is to identify insights and trends in the IPL datasets.

Project Objectives

  • Data Ingestion and Cleaning: Efficiently load and preprocess raw IPL data.
  • Exploratory Data Analysis (EDA): Generate descriptive statistics and visualizations to understand the underlying patterns in the data.
  • Advanced Analytics: Implement advanced analytical techniques to derive meaningful insights from the data.
  • Visualization: Create interactive and static visualizations to present the findings effectively.
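
As an illustration of the ingestion and cleaning objective, the following is a minimal sketch and not the project's actual code. The column names (`match_id`, `season_year`, `team1`, `team2`, `match_winner`) and the helper names are assumptions made for the example; adjust them to the real CSV header. PySpark is imported only inside the function that needs it, so the module can be loaded without a Spark installation.

```python
# Hypothetical column layout for a match-level CSV; not taken from the project.
MATCH_COLUMNS = [
    ("match_id", "INT"),
    ("season_year", "INT"),
    ("team1", "STRING"),
    ("team2", "STRING"),
    ("match_winner", "STRING"),
]


def ddl_schema(columns):
    """Build a DDL schema string such as 'match_id INT, team1 STRING'."""
    return ", ".join(f"{name} {dtype}" for name, dtype in columns)


def load_matches(spark, path):
    """Read the CSV with an explicit schema instead of inferring column types."""
    return spark.read.csv(path, header=True, schema=ddl_schema(MATCH_COLUMNS))


def clean_matches(df):
    """Drop rows without a winner, trim team names, and remove duplicate matches."""
    from pyspark.sql import functions as F  # deferred: requires PySpark at call time

    return (
        df.dropna(subset=["match_winner"])
        .withColumn("team1", F.trim(F.col("team1")))
        .withColumn("team2", F.trim(F.col("team2")))
        .dropDuplicates(["match_id"])
    )
```

With an active `SparkSession` named `spark`, a call such as `clean_matches(load_matches(spark, "matches.csv"))` would return the cleaned DataFrame; the file name here is a placeholder. Passing an explicit schema avoids the extra pass over the data that schema inference needs.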

Datasets

The datasets used in this project can be found at the following link:

Technologies Used

  • Apache Spark (PySpark)
  • Databricks
  • SparkSQL
  • Pandas
  • Matplotlib
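
One way these tools might fit together is sketched below: the aggregation runs in SparkSQL, the small result is converted to a Pandas DataFrame, and Matplotlib draws the chart. This is an assumption about the workflow, not the project's confirmed code. The view name `matches`, the columns `season_year` and `match_winner`, and both function names are hypothetical.

```python
# Hypothetical query: assumes a view named "matches" with these two columns.
WINS_PER_SEASON_SQL = """
    SELECT season_year, match_winner, COUNT(*) AS wins
    FROM matches
    GROUP BY season_year, match_winner
    ORDER BY season_year, wins DESC
"""


def wins_per_season(spark, matches_df):
    """Aggregate in Spark SQL, then collect the small result as a pandas DataFrame."""
    matches_df.createOrReplaceTempView("matches")
    return spark.sql(WINS_PER_SEASON_SQL).toPandas()


def plot_season(wins_pdf, season_year):
    """Draw a bar chart of wins per team for one season and return the figure."""
    import matplotlib.pyplot as plt  # deferred: requires Matplotlib at call time

    season = wins_pdf[wins_pdf["season_year"] == season_year]
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.bar(season["match_winner"], season["wins"])
    ax.set_xlabel("Team")
    ax.set_ylabel("Wins")
    ax.set_title(f"Wins per team, {season_year}")
    ax.tick_params(axis="x", labelrotation=90)
    fig.tight_layout()
    return fig
```

Only the aggregated rows are moved to the driver with `toPandas()`, which keeps the heavy computation in Spark and leaves Pandas and Matplotlib to handle a result small enough to fit in memory.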

Architecture Diagram

Below is the architecture diagram that illustrates the data flow and components used in this IPL data analysis project:

[Figure: IPL Analysis Architecture]
