The purpose of the Amazon-SQL-Data-Warehouse-Project is to build knowledge in data warehousing and analytics.
The project incorporates data modelling, performing ETL processes through SQL, normalization of data, creating schemas, databases, and tables in SQL Server.
The objective is to get the data from the data sources to the end users (data analyst, business users, management).
Create a data warehouse using SQL Server to centralize data from Amazon and provide production ready data to end-users.
- Data Sources: Import data from several different sources in excel files.
- Data Quality: Cleanse and resolve data quality issues prior to loading data to final stage.
- Integration: Combine all sources into a single, user_friendly data model designed for analytical queries.
- Scope: Historization is not needed for this project, only updated data is wanted.
- Documentation: Provide clear documentation of the data model to support both business stakeholders and analytics teams.
Develop SQL-Based analytics reporting for detailed insights of:
- Sales Trends
- Product Performance
- Intelligence on Customer buying trends
Data Architechture
The data architecture for this project is using the Medallion model. Going by Bronze, Silver, and Gold layers:
- Bronze Lyaer: The first layer where the extracted data is loaded too. The bronze layer consist of raw data, extracted from a csv file, into SQL server database.
- Silver Layer: In this layer is where data is normalized, cleansed, standardized, and prepared for the final destination for analysis.
- Gold Layer: During this phase, the data is production ready and ready to be used by data analyst and business users. This data can be used for reporting or ad hoc queries.
Top 5 products by # of orders
SELECT *
FROM
(
SELECT [product_name], count(order_id) count_of_orders, dense_rank() over(order by count(order_id) desc) rank_by_orders
FROM [gold].[Amazon_Data_Set_products] p
LEFT JOIN [gold].[Amazon_Data_Set_orders_items] oi
ON p.product_id = oi.product_id
GROUP BY [product_name]
) product_by_orders
WHERE rank_by_orders <=5
Top ten products with lowest profit by margin
select *
from
(
SELECT product_name, sum([quantity]*[price_per_unit])/ sum(cogs) profit_margin_by_product,
dense_rank() over(order by sum([quantity]*[price_per_unit])/ sum(cogs) desc) rank_by_profit_margin
FROM [gold].[Amazon_Data_Set_orders_items] oi INNER JOIN
[gold].[Amazon_Data_Set_products] p ON
oi.product_id = p.product_id
GROUP BY product_name
) m
where rank_by_profit_margin <=10