Skip to content

Dự án đầu tiên, làm quen với github và Dbeaver. Sử dụng SQL để làm sạch dữ liệu

Notifications You must be signed in to change notification settings

hackbathien/SQL-Data-Cleaning

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 

Repository files navigation

SQL-Data-Cleaning

Dự án đầu tiên, làm quen với github và Dbeaver. Sử dụng SQL để làm sạch dữ liệu

tải xuống

This is an educational project on data cleaning and preparation using SQL. The original database in CSV format is located in the file club_member_info.csv. Here, we will explore the steps that need to be applied to obtain a cleansed version of the dataset.

1. View Data: Use query to view data

FROM club_member_info cmi
limit 5;

Result:

full_name age martial_status email phone full_address job_title membership_date
addie lush 40 married alush0@shutterfly.com 254-389-8708 3226 Eastlawn Pass,Temple,Texas Assistant Professor 7/31/2013
ROCK CRADICK 46 married rcradick1@newsvine.com 910-566-2007 4 Harbort Avenue,Fayetteville,North Carolina Programmer III 5/27/2018
Sydel Sharvell 46 divorced ssharvell2@amazon.co.jp 702-187-8715 4 School Place,Las Vegas,Nevada Budget/Accounting Analyst I 10/6/2017
Constantin de la cruz 35 co3@bloglines.com 402-688-7162 6 Monument Crossing,Omaha,Nebraska Desktop Support Technician 10/20/2015
Gaylor Redhole 38 married gredhole4@japanpost.jp 917-394-6001 88 Cherokee Pass,New York City,New York Legal Assistant 5/29/2019

2. Create a Cleaned table

Create a new table

	full_name VARCHAR(50),
	age INTEGER,
	martial_status VARCHAR(50),
	email VARCHAR(50),
	phone NVARCHAR(50),
	full_address NVARCHAR(50),
	job_title VARCHAR(50),
	membership_date NVARCHAR(50)
);

Copy Data to new table

INSERT INTO club_member_info_CLEANED 
select * from club_member_info ;

3. Check the duplicate

Write the sript to check the dup;icate

SELECT 
	full_name
	, age
	, martial_status 
	, email
	, phone
	, full_address
	, job_title
	, membership_date
	, COUNT(*) as count_dulicate
FROM club_member_info_CLEANED cmic 
GROUP BY 
	full_name
	, age
	, martial_status 
	, email
	, phone
	, full_address
	, job_title
	, membership_date
HAVING COUNT (*) >1;

About

Dự án đầu tiên, làm quen với github và Dbeaver. Sử dụng SQL để làm sạch dữ liệu

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published