An ETL Pipeline for NYPD Arrest Data using Python, Snowflake, and Power BI

sarthakgirdhar/NYPD-Arrests

NYPD Arrests

In this project, I build an ETL pipeline for NYPD arrest data using Python, Snowflake, and Power BI.

I start by extracting the data from the NYC Open Data API, then apply some transformations, validate the data, and finally load it into Snowflake.
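The extraction step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes the dataset is served through a Socrata-style endpoint that pages with `$limit` and `$offset`, and `BASE_URL`, `http_get_json`, and `fetch_all` are hypothetical names. The dataset ID in the URL is a placeholder to fill in from NYC Open Data.

```python
from __future__ import annotations

import json
from typing import Callable, Iterator
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder, not the project's real endpoint: substitute the dataset's
# resource URL from NYC Open Data.
BASE_URL = "https://data.cityofnewyork.us/resource/<dataset-id>.json"


def http_get_json(url: str) -> list[dict]:
    """Fetch one page over HTTP and decode the JSON array of records."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def fetch_all(
    base_url: str = BASE_URL,
    page_size: int = 50_000,
    get_page: Callable[[str], list[dict]] = http_get_json,
) -> Iterator[dict]:
    """Yield every record, requesting pages until one comes back short."""
    offset = 0
    while True:
        # A fixed sort order keeps pages from overlapping between requests.
        query = urlencode({"$limit": page_size, "$offset": offset, "$order": ":id"})
        page = get_page(f"{base_url}?{query}")
        yield from page
        if len(page) < page_size:
            break  # a short (or empty) page means there is nothing left
        offset += page_size
```

Passing the page fetcher in as `get_page` keeps the paging logic testable without network access; in a real run the default `http_get_json` would be used.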

In Snowflake, I perform exploratory data analysis (EDA) to answer a set of questions. I then connect Snowflake to Power BI to gather more detailed insights.
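As an illustration of the kind of question EDA can answer, here is a small sketch that counts arrests per value of one column. The question itself, the `arrest_boro` column name, the `arrests` table name in the comment, and the `arrests_by` helper are all assumptions made for the example, not taken from the project.

```python
from collections import Counter


def arrests_by(records, field):
    """Count records per distinct value of `field`, most frequent first."""
    counts = Counter(record.get(field, "UNKNOWN") for record in records)
    return counts.most_common()


# Roughly the same aggregation in SQL, against a hypothetical `arrests` table:
#   SELECT arrest_boro, COUNT(*) AS n
#   FROM arrests
#   GROUP BY arrest_boro
#   ORDER BY n DESC;
```

Running the Python version on a local sample is a cheap way to sanity-check the numbers a warehouse query returns.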

About the data

The 2023 arrest data contains about 227,000 rows. Each record represents an arrest effected in NYC by the NYPD and includes information about the type of crime, the location, the time of enforcement, and suspect demographics.
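A record-level validation check for this data could look something like the sketch below. The column names in `REQUIRED_FIELDS` and the ISO-style date format are assumptions to be confirmed against the dataset's data dictionary, and `validate_record` is a hypothetical helper rather than the project's own code.

```python
from __future__ import annotations

from datetime import datetime

# Assumed column names; check them against the dataset's documentation.
REQUIRED_FIELDS = ("arrest_key", "arrest_date", "ofns_desc", "arrest_boro")


def validate_record(record: dict, year: int = 2023) -> list[str]:
    """Return the problems found in one record (an empty list means valid)."""
    problems = [f"missing {name}" for name in REQUIRED_FIELDS if not record.get(name)]
    raw_date = record.get("arrest_date")
    if raw_date:
        try:
            # Assumes ISO-like timestamps such as "2023-03-15T00:00:00.000".
            arrest_date = datetime.fromisoformat(raw_date)
        except (TypeError, ValueError):
            problems.append(f"unparseable arrest_date: {raw_date!r}")
        else:
            if arrest_date.year != year:
                problems.append(f"arrest_date outside {year}: {raw_date}")
    return problems
```

Returning a list of problems instead of raising on the first one makes it easy to log every issue in a batch before deciding whether to load it.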

Skills demonstrated

  • Understanding the dataset and reading its supporting documentation.

  • Writing Python code to extract data from the NYC Open Data API.

  • Checking data quality and applying transformations.

  • Creating a data warehouse in Snowflake so that downstream users can analyze the data more easily.

  • Data analysis: answering stakeholders' questions.

  • Visualizations in Power BI.

Read the detailed story in two parts: part 1 and part 2.
