Data-Cleaning-and-Analysis

Overview

This repository contains a set of SQL scripts that clean and analyze layoff-related data. The cleaning scripts process the raw data by removing duplicates, standardizing values, and handling missing data. The analysis scripts then explore the cleaned data, generating insights such as the total number of layoffs by company, industry, and country, along with other key metrics.

Project Structure

DATA_CLEANING.sql: Contains SQL scripts for cleaning and preparing the data.
DATA_ANALYSIS.sql: Contains SQL scripts for performing exploratory data analysis (EDA) and generating insights.
layoffs.csv: Contains the raw data.

Requirements

Database: MySQL
Input Data: The project works with a table named layoffs, which contains the raw layoff data.

Data Cleaning Overview

The cleaning process involves several key steps to ensure that the data is ready for analysis:

1. Removing Duplicates

We begin by identifying and removing duplicate records. Duplicates are identified based on several columns such as company, location, industry, and other relevant fields. Once identified, duplicate rows are deleted to ensure unique data.
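
One way this step can be sketched in MySQL 8.0 or later (window functions are required) is shown below. It is an illustration, not necessarily what DATA_CLEANING.sql does. The working table layoffs_staging, the helper column row_num, and the columns total_laid_off, percentage_laid_off, stage, country, and funds_raised_millions are assumptions about the schema.

```sql
-- Work on a copy so the raw layoffs table stays untouched.
-- layoffs_staging and row_num are hypothetical names.
CREATE TABLE layoffs_staging LIKE layoffs;
ALTER TABLE layoffs_staging ADD COLUMN row_num INT;

-- Number the rows inside each group of identical values.
-- The first row of a group gets 1; every further copy gets 2, 3, ...
INSERT INTO layoffs_staging
SELECT *,
       ROW_NUMBER() OVER (
           PARTITION BY company, location, industry, total_laid_off,
                        percentage_laid_off, `date`, stage, country,
                        funds_raised_millions
       ) AS row_num
FROM layoffs;

-- Keep only the first row of each group.
DELETE FROM layoffs_staging
WHERE row_num > 1;
```

Partitioning on every column means only exact duplicates are removed. Partitioning on fewer columns would also drop rows that differ in the omitted fields.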

2. Standardizing Data

Next, we standardize the data to ensure consistency across the dataset:

Whitespace is trimmed from text fields such as company, location, and industry.
Certain columns, like industry, are corrected for known inconsistencies (e.g., grouping variants of "Crypto" into a single category).
The date column is converted into a proper date format to facilitate time-based analysis.
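
The three standardization steps above could look roughly like this. The table name layoffs_staging and the source date format '%m/%d/%Y' are assumptions. The date conversion also assumes that every non-NULL value matches that format.

```sql
-- Trim stray whitespace from the text fields.
UPDATE layoffs_staging
SET company  = TRIM(company),
    location = TRIM(location),
    industry = TRIM(industry);

-- Collapse every industry value that starts with 'Crypto' into one category.
UPDATE layoffs_staging
SET industry = 'Crypto'
WHERE industry LIKE 'Crypto%';

-- Parse the text dates (assumed format: month/day/year),
-- then change the column type so date functions work on it.
UPDATE layoffs_staging
SET `date` = STR_TO_DATE(`date`, '%m/%d/%Y');

ALTER TABLE layoffs_staging
MODIFY COLUMN `date` DATE;
```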

3. Handling Missing and Blank Values

Rows with missing or blank values in key columns are identified. If appropriate, missing values are filled or replaced by information from other records. In cases where a value cannot be determined, rows with crucial missing data are deleted to maintain data integrity.
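
As a hedged sketch of filling a value from other records, the queries below copy a known industry onto rows of the same company where it is missing. Matching on company alone and the table name layoffs_staging are assumptions. The actual script may match on more columns.

```sql
-- Treat blank strings as missing so one check (IS NULL) covers both cases.
UPDATE layoffs_staging
SET industry = NULL
WHERE industry = '';

-- Self-join: t1 is the row with the gap, t2 is another row
-- for the same company that has the value.
UPDATE layoffs_staging AS t1
JOIN layoffs_staging AS t2
  ON t1.company = t2.company
SET t1.industry = t2.industry
WHERE t1.industry IS NULL
  AND t2.industry IS NOT NULL;
```

Rows whose industry is still NULL afterwards have no sibling record to borrow from.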

4. Removing Unnecessary Rows and Columns

Any rows that contain irrelevant or non-essential data (e.g., rows with no layoffs recorded) are removed. Additionally, columns that are not necessary for analysis are dropped to streamline the dataset.
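
A minimal sketch of this step, assuming that "no layoffs recorded" means both total_laid_off and percentage_laid_off are NULL, and that a helper column named row_num was added during deduplication (all of these names are assumptions):

```sql
-- Rows with neither a count nor a percentage carry no layoff information.
DELETE FROM layoffs_staging
WHERE total_laid_off IS NULL
  AND percentage_laid_off IS NULL;

-- Drop the helper column once it is no longer needed.
ALTER TABLE layoffs_staging
DROP COLUMN row_num;
```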

Data Analysis Overview

Once the data is cleaned, we proceed with exploratory data analysis (EDA) to generate meaningful insights. The analysis includes:

1. Maximum Layoffs, Percentage, and Funds Raised

We calculate the maximum number of layoffs, the highest percentage of layoffs, and the maximum amount of funds raised across all records. This gives a quick overview of the most significant layoff events in the dataset.
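
A query along these lines would return the three maxima in one row. The column and table names are assumptions, and the columns are assumed to be numeric (on text columns MAX compares strings instead).

```sql
SELECT MAX(total_laid_off)        AS max_laid_off,
       MAX(percentage_laid_off)   AS max_percentage_laid_off,
       MAX(funds_raised_millions) AS max_funds_raised_millions
FROM layoffs_staging;
```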

2. Monthly Layoff Trends

The total number of layoffs per month is calculated. Additionally, we calculate a rolling total of layoffs over time, which helps visualize trends in layoff occurrences across different months.
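
The monthly totals and the rolling total can be sketched with a common table expression and a window function (MySQL 8.0 or later). The table name layoffs_staging, the column total_laid_off, and the aliases are assumptions. The query assumes the date column has already been converted to DATE.

```sql
WITH monthly AS (
    -- One row per calendar month, e.g. '2022-11'.
    SELECT DATE_FORMAT(`date`, '%Y-%m') AS layoff_month,
           SUM(total_laid_off)          AS monthly_total
    FROM layoffs_staging
    WHERE `date` IS NOT NULL
    GROUP BY DATE_FORMAT(`date`, '%Y-%m')
)
SELECT layoff_month,
       monthly_total,
       -- Running sum of all months up to and including the current one.
       SUM(monthly_total) OVER (ORDER BY layoff_month) AS rolling_total
FROM monthly
ORDER BY layoff_month;
```

Formatting the month as year-month text keeps the alphabetical order identical to the chronological order, which the rolling sum relies on.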

3. Company-wise Layoffs

We aggregate the data by company to determine which companies have had the highest number of layoffs. This helps to identify trends or patterns specific to certain companies.
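
For illustration, the aggregation by company might be written as follows. The table name, the column total_laid_off, and the cutoff of ten rows are assumptions.

```sql
-- Total layoffs per company, largest first.
SELECT company,
       SUM(total_laid_off) AS company_total
FROM layoffs_staging
GROUP BY company
ORDER BY company_total DESC
LIMIT 10;
```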

4. Industry-wise and Country-wise Layoffs

The dataset is analyzed by industry and country to understand the distribution of layoffs across different sectors and geographical locations.
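
The same pattern applies to the other two dimensions. This sketch assumes the columns industry, country, and total_laid_off in a working table named layoffs_staging.

```sql
-- Distribution of layoffs across sectors.
SELECT industry,
       SUM(total_laid_off) AS industry_total
FROM layoffs_staging
GROUP BY industry
ORDER BY industry_total DESC;

-- Distribution of layoffs across countries.
SELECT country,
       SUM(total_laid_off) AS country_total
FROM layoffs_staging
GROUP BY country
ORDER BY country_total DESC;
```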

5. Year-wise Layoffs

The data is grouped by year to analyze the overall trend of layoffs over time. This provides insights into how layoffs have evolved from year to year.
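
A yearly breakdown could be obtained as sketched below, assuming the date column is already of type DATE and the table and column names match the assumptions used elsewhere in this README.

```sql
-- Total layoffs per calendar year, in chronological order.
SELECT YEAR(`date`)        AS layoff_year,
       SUM(total_laid_off) AS yearly_total
FROM layoffs_staging
WHERE `date` IS NOT NULL
GROUP BY YEAR(`date`)
ORDER BY layoff_year;
```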

How to Run the Project

Set Up Your Database: Ensure that you have a MySQL database set up.
Import the Raw Data: Import layoffs.csv into a table named layoffs, which holds the raw layoff data.
Execute Data Cleaning: Run the DATA_CLEANING.sql script to clean and prepare the data.
Execute Data Analysis: Once the data is cleaned, run the DATA_ANALYSIS.sql script to analyze the data and generate insights.
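
In the mysql command-line client, started from the repository directory, the steps above might look like this. The database name layoffs_db is hypothetical, and the sketch assumes the scripts do not select a different database themselves.

```sql
-- Create and select a database to work in (name is an assumption).
CREATE DATABASE IF NOT EXISTS layoffs_db;
USE layoffs_db;

-- Import layoffs.csv into a table named layoffs at this point,
-- for example with MySQL Workbench's Table Data Import Wizard.

-- Then run the scripts in order. SOURCE is a mysql client command.
SOURCE DATA_CLEANING.sql
SOURCE DATA_ANALYSIS.sql
```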
