Skip to content

itzrv19/Data-Cleaning-and-Analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Data-Cleaning-and-Analysis

Overview

This repository contains a set of SQL scripts designed to perform data cleaning and analysis on layoff-related data. The project aims to process raw data by cleaning it, handling duplicates, standardizing values, and dealing with missing data. The analysis part explores the cleaned data, generating insights such as the total number of layoffs by company, industry, and country, along with other key metrics.

Project Structure

  • DATA_CLEANING.sql: Contains SQL scripts for cleaning and preparing the data.
  • DATA_ANALYSIS.sql: Contains SQL scripts for performing exploratory data analysis (EDA) and generating insights.
  • layoffs.csv: Contains RAW data .

Requirements

  • Database: MySQL
  • Input Data: The project works with a table named layoffs, which contain raw layoff data.

Data Cleaning Overview

The cleaning process involves several key steps to ensure that the data is ready for analysis:

1. Removing Duplicates

We begin by identifying and removing duplicate records. Duplicates are identified based on several columns such as company, location, industry, and other relevant fields. Once identified, duplicate rows are deleted to ensure unique data.

2. Standardizing Data

Next, we standardize the data to ensure consistency across the dataset:

  • Whitespace is trimmed from text fields such as company, location, and industry.
  • Certain columns, like industry, are corrected for known inconsistencies (e.g., grouping similar values like "Crypto" into a single category).
  • The date column is converted into a proper date format to facilitate time-based analysis.

3. Handling Missing and Blank Values

Rows with missing or blank values in key columns are identified. If appropriate, missing values are filled or replaced by information from other records. In cases where a value cannot be determined, rows with crucial missing data are deleted to maintain data integrity.

4. Removing Unnecessary Rows and Columns

Any rows that contain irrelevant or non-essential data (e.g., rows with no layoffs recorded) are removed. Additionally, columns that are not necessary for analysis are dropped to streamline the dataset.

Data Analysis Overview

Once the data is cleaned, we proceed with exploratory data analysis (EDA) to generate meaningful insights. The analysis includes:

1. Maximum Layoffs, Percentage, and Funds Raised

We calculate the maximum number of layoffs, the highest percentage of layoffs, and the maximum amount of funds raised across all records. This gives a quick overview of the most significant layoff events in the dataset.

2. Monthly Layoff Trends

The total number of layoffs per month is calculated. Additionally, we calculate a rolling total of layoffs over time, which helps visualize trends in layoff occurrences across different months.

3. Company-wise Layoffs

We aggregate the data by company to determine which companies have had the highest number of layoffs. This helps to identify trends or patterns specific to certain companies.

4. Industry-wise and Country-wise Layoffs

The dataset is analyzed by industry and country to understand the distribution of layoffs across different sectors and geographical locations.

5. Year-wise Layoffs

The data is grouped by year to analyze the overall trend of layoffs over time. This provides insights into how layoffs have evolved from year to year.

How to Run the Project

  1. Set Up Your Database: Ensure that you have a MySQL database set up.
  2. Import the Raw Data: Import the layoffs table containing the raw layoff data.
  3. Execute Data Cleaning: Run the DATA_CLEANING.sql script to clean and prepare the data.
  4. Execute Data Analysis: Once the data is cleaned, run the DATA_ANALYSIS.sql script to analyze the data and generate insights.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published