Crowdfunding-ETL

Overview of Project

In this module, we were introduced to a crowdfunding platform called Independent Funding. Once the platform became more popular for funding independent projects, data needed to be stored in a database. Hence, we were tasked with helping, Britta, a junior SQL developer, move the data to a database and generate reports for stakeholders. This way, we also learned about the extract, transform, and load (ETL) process used to create a data pipeline. However, towards the end of the ETL process, new data was received and had to be analyzed.

Purpose

The purpose of this project was to carry out the ETL process and data analysis on the new dataset from the backers info that had pledged to the live projects.

Results

The extract and transform phases were done using Python, Pandas, and Jupyter notebook. Once the data was collected and cleaned, an entity relationship diagram (ERD) and table schema were created as a guide for storing the data in a database that was created using PostgressSQL.

When updating the ERD and table schema, the primary and foreign keys from the backers csv file had to be carefully checked to confirm that they referenced the relevant table. As seen in Figure 1, the Backers foreign key (cf_id) references the Campaign table's primary key (cf_id).

Figure 1

As a visual piece of information, the ERD came in handy when loading the data into the tables in pgAdmin 4. This is because as mentioned in the module tables referenced by a foreign key constraint on another table, must be imported first or we get an error message. There are other things to consider as well. For example, it took me a while to figure out how to import the data for the campaign table because there was an extra column that went unnoticed. Once I went back and corrected the mistake, data importing was a success. Once all the data was imported into the corresponding tables, an SQL Analysis was conducted to create a final table with information about the remaining goal amounts.

Summary

Overall, ETL is a very long process that requires a lot of attention to detail. It is crucial to clearly understand the end goal and process needed to reach such end goal. Otherwise, you can easily find yourself retracing your steps to fix the problem. This can be very time consuming when dealing with massive amounts of data.

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
Resources		Resources
data		data
.gitignore		.gitignore
Extract-Transform_final_code.ipynb.ipynb		Extract-Transform_final_code.ipynb.ipynb
README.md		README.md
backers.csv		backers.csv
crowdfunding_SQL_Analysis.sql.sql		crowdfunding_SQL_Analysis.sql.sql

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Resources

Resources

data

data

.gitignore

.gitignore

Extract-Transform_final_code.ipynb.ipynb

Extract-Transform_final_code.ipynb.ipynb

README.md

README.md

backers.csv

backers.csv

crowdfunding_SQL_Analysis.sql.sql

crowdfunding_SQL_Analysis.sql.sql

Repository files navigation

Crowdfunding-ETL

Overview of Project

Purpose

Results

Figure 1

Summary

About

Releases

Packages

Languages

LLudivina/Crowdfunding-ETL

Folders and files

Latest commit

History

Repository files navigation

Crowdfunding-ETL

Overview of Project

Purpose

Results

Figure 1

Summary

About

Topics

Resources

Stars

Watchers

Forks

Languages