Amazon-SQL-Data-Warehouse-Project

The purpose of the Amazon-SQL-Data-Warehouse-Project is to build knowledge in data warehousing and analytics.
The project incorporates data modelling, performing ETL processes through SQL, normalization of data, creating schemas, databases, and tables in SQL Server. The objective is to get the data from the data sources to the end users (data analyst, business users, management).

Project Requirements

Building The Data Warehouse (Data Engineering)

Objective

Create a data warehouse using SQL Server to centralize data from Amazon and provide production ready data to end-users.

Specifications

Data Sources: Import data from several different sources in excel files.
Data Quality: Cleanse and resolve data quality issues prior to loading data to final stage.
Integration: Combine all sources into a single, user_friendly data model designed for analytical queries.
Scope: Historization is not needed for this project, only updated data is wanted.
Documentation: Provide clear documentation of the data model to support both business stakeholders and analytics teams.

BI: Analytics & Reporting

Objective

Develop SQL-Based analytics reporting for detailed insights of:

Sales Trends
Product Performance
Intelligence on Customer buying trends

Data Architechture

The data architecture for this project is using the Medallion model. Going by Bronze, Silver, and Gold layers:

Bronze Lyaer: The first layer where the extracted data is loaded too. The bronze layer consist of raw data, extracted from a csv file, into SQL server database.
Silver Layer: In this layer is where data is normalized, cleansed, standardized, and prepared for the final destination for analysis.
Gold Layer: During this phase, the data is production ready and ready to be used by data analyst and business users. This data can be used for reporting or ad hoc queries.

Top 5 products by # of orders

SELECT * 
FROM
(
SELECT  [product_name], count(order_id) count_of_orders, dense_rank() over(order by count(order_id) desc) rank_by_orders
FROM [gold].[Amazon_Data_Set_products] p
LEFT JOIN [gold].[Amazon_Data_Set_orders_items] oi
ON p.product_id = oi.product_id
GROUP BY [product_name]
) product_by_orders

WHERE rank_by_orders <=5

Top ten products with lowest profit by margin

select *
from
(
SELECT product_name, sum([quantity]*[price_per_unit])/ sum(cogs) profit_margin_by_product, 
dense_rank() over(order by sum([quantity]*[price_per_unit])/ sum(cogs) desc) rank_by_profit_margin
FROM [gold].[Amazon_Data_Set_orders_items] oi INNER JOIN 
[gold].[Amazon_Data_Set_products] p ON 
oi.product_id = p.product_id
GROUP BY product_name
) m
where rank_by_profit_margin <=10

Name		Name	Last commit message	Last commit date
Latest commit History 73 Commits
Datasets		Datasets
Docs		Docs
scripts		scripts
Ad Hoc Analysis		Ad Hoc Analysis
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Amazon-SQL-Data-Warehouse-Project

Project Requirements

Building The Data Warehouse (Data Engineering)

Objective

Specifications

BI: Analytics & Reporting

Objective

About

Uh oh!

Releases

Packages

Languages

License

niampython/Amazon-SQL-Data-Warehouse-Project

Folders and files

Latest commit

History

Repository files navigation

Amazon-SQL-Data-Warehouse-Project

Project Requirements

Building The Data Warehouse (Data Engineering)

Objective

Specifications

BI: Analytics & Reporting

Objective

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages