# Milestone 4

Your boss is excited that you now have the schema for the database and all the sales data is in one location.
Since you've done such a great job, they would like you to get some up-to-date metrics from the data.
The business can then start making more data-driven decisions and get a better understanding of its sales.
In this milestone, you will be tasked with answering business questions and extracting the data from the database using SQL.

1. How many stores does the business have and in which countries?

The Operations team would like to know which countries we currently operate in and which country now has the most stores. Perform a query on the database to get this information. It should return the following:

    +----------+-----------------+
    | country  | total_no_stores |
    +----------+-----------------+
    | GB       |             265 |
    | DE       |             141 |
    | US       |              34 |
    +----------+-----------------+

Note: DE is short for Deutschland (Germany).

The information for this is in the dim_stores_details table. We want to SELECT the country code and COUNT the number of stores, grouped by country.

In [None]:
SELECT 
    country_code AS country, COUNT(country_code) AS total_no_stores
FROM 
    dim_stores_details 
GROUP BY 
    country
ORDER BY total_no_stores DESC;

My result has one more store in GB than expected. This must be the web store, which had a GB country value. I have now amended this.

2. Which locations have the most stores?

The business stakeholders would like to know which locations currently have the most stores.

They would like to close some stores before opening more in other locations.

Find out which locations have the most stores currently. The query should return the following:

In [None]:
SELECT 
    locality, 
    COUNT(locality) AS total_no_stores
FROM
    dim_stores_details
GROUP BY
    locality
ORDER BY 
    total_no_stores DESC
LIMIT 7;

3. Which months typically produce the highest cost of sales?

Query the database to find out which months typically have the most sales. Your query should return the following information:

    +-------------+-------+
    | total_sales | month |
    +-------------+-------+
    |   673295.68 |     8 |
    |   668041.45 |     1 |
    |   657335.84 |    10 |
    |   650321.43 |     5 |
    |   645741.70 |     7 |
    |   645463.00 |     3 |
    +-------------+-------+

I think I'm going to have to go back and not amalgamate my dates - or I need to extract the month as a number.

We need to join date_uuid in orders_table with date_uuid in dim_date_times to get the dates. We then need to join product_code in orders_table with product_code in dim_products to get the product_price. Multiplying product_price by product_quantity gives the amount of each individual transaction, which we then sum per month.

MONTH() is not available in PostgreSQL. Use EXTRACT instead, e.g. EXTRACT('MONTH' FROM published_date). See https://www.commandprompt.com/education/how-to-group-by-month-in-postgresql/

In [None]:
--SELECT EXTRACT('MONTH' FROM date) AS month from dim_date_times;

SELECT 
    ROUND(CAST(SUM(product_price * product_quantity) AS numeric), 2) AS total_sales, 
    --SUM(product_price * product_quantity) AS total_sales, 
    EXTRACT('MONTH' FROM date) AS month
FROM
    orders_table 
    INNER JOIN dim_date_times ON orders_table.date_uuid = dim_date_times.date_uuid
    INNER JOIN dim_products ON orders_table.product_code = dim_products.product_code
GROUP BY EXTRACT('MONTH' FROM date)
ORDER BY total_sales DESC
LIMIT 6;
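
The join-then-aggregate logic of this query can be checked on a tiny dataset. The following is only a sketch: it uses Python's built-in sqlite3 module rather than PostgreSQL, every row is made up for illustration, and since SQLite has no EXTRACT, `strftime('%m', ...)` stands in for `EXTRACT('MONTH' FROM date)`.

```python
import sqlite3

# Toy stand-ins for the three tables; all rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_date_times (date_uuid TEXT PRIMARY KEY, date TEXT);
    CREATE TABLE dim_products   (product_code TEXT PRIMARY KEY, product_price REAL);
    CREATE TABLE orders_table   (date_uuid TEXT, product_code TEXT, product_quantity INTEGER);

    INSERT INTO dim_date_times VALUES
        ('d1', '2021-01-15'), ('d2', '2021-01-20'), ('d3', '2021-08-03');
    INSERT INTO dim_products VALUES ('p1', 2.50), ('p2', 10.00);
    INSERT INTO orders_table VALUES ('d1', 'p1', 4), ('d2', 'p2', 1), ('d3', 'p2', 3);
""")

# Assumption: strftime('%m', date) replaces PostgreSQL's EXTRACT('MONTH' FROM date),
# because SQLite does not support EXTRACT.
rows = conn.execute("""
    SELECT
        ROUND(SUM(product_price * product_quantity), 2) AS total_sales,
        CAST(strftime('%m', date) AS INTEGER) AS month
    FROM orders_table
    INNER JOIN dim_date_times ON orders_table.date_uuid = dim_date_times.date_uuid
    INNER JOIN dim_products ON orders_table.product_code = dim_products.product_code
    GROUP BY month
    ORDER BY total_sales DESC
""").fetchall()

for total_sales, month in rows:
    print(month, total_sales)
```

With these toy rows, the two January orders are summed into a single row and the August order forms its own row, which is the per-month behaviour the real query relies on.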

4. How many sales are coming from online?

The company is looking to increase its online sales.

They want to know how many sales are happening online vs offline.

Calculate how many products were sold and the amount of sales made for online and offline purchases.

You should get the following information:

    +------------------+-------------------------+----------+
    | numbers_of_sales | product_quantity_count  | location |
    +------------------+-------------------------+----------+
    |            26957 |                  107739 | Web      |
    |            93166 |                  374047 | Offline  |
    +------------------+-------------------------+----------+

Web is any transaction where the store_code begins with WEB.

A "sale" is an entry in the orders table, so we count date_uuid, as I think this is the only unique value in that table, and sum product_quantity. The "location" could come from the store information via an inner join on dim_stores_details, but store_code is already present in orders_table, so the location can be derived from it directly.

In [None]:
SELECT
    COUNT(date_uuid) AS number_of_sales,
    SUM(product_quantity) AS product_quantity_count,
    CASE
        WHEN store_code ILIKE 'WEB%' THEN 'Web'
        ELSE 'Offline'
    END AS location
FROM
    orders_table  -- store_code is in orders_table, so no join to dim_stores_details is needed
GROUP BY
    location
ORDER BY
    number_of_sales;
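
As a rough check of the CASE-based grouping, here is a minimal sketch on invented rows. It assumes Python's built-in sqlite3 module as a stand-in for PostgreSQL. SQLite has no ILIKE, so plain LIKE is used instead (it is case-insensitive for ASCII by default in SQLite). The store codes are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders_table (date_uuid TEXT, store_code TEXT, product_quantity INTEGER);
    -- Hypothetical rows: two web orders and two in-store orders.
    INSERT INTO orders_table VALUES
        ('d1', 'WEB-0001',    2),
        ('d2', 'XX-00000001', 5),
        ('d3', 'WEB-0001',    1),
        ('d4', 'YY-00000002', 3);
""")

# Assumption: LIKE replaces PostgreSQL's ILIKE, which SQLite does not provide.
rows = conn.execute("""
    SELECT
        COUNT(date_uuid) AS number_of_sales,
        SUM(product_quantity) AS product_quantity_count,
        CASE
            WHEN store_code LIKE 'WEB%' THEN 'Web'
            ELSE 'Offline'
        END AS location
    FROM orders_table
    GROUP BY location
    ORDER BY location
""").fetchall()

for number_of_sales, product_quantity_count, location in rows:
    print(location, number_of_sales, product_quantity_count)
```

Each order row counts as one sale, while product_quantity_count adds up the units, so the two columns can differ considerably, as they do in the expected output.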


5. What percentage of sales comes through each type of store?

The sales team wants to know which of the different store types is generating the most revenue so they know where to focus.

Find out the total and percentage of sales coming from each of the different store types.

The query should return:

    +-------------+-------------+---------------------+
    | store_type  | total_sales | percentage_total(%) |
    +-------------+-------------+---------------------+
    | Local       |  3440896.52 |               44.87 |
    | Web portal  |  1726547.05 |               22.44 |
    | Super Store |  1224293.65 |               15.63 |
    | Mall Kiosk  |   698791.61 |                8.96 |
    | Outlet      |   631804.81 |                8.10 |
    +-------------+-------------+---------------------+

So, store_type is in dim_stores_details, and total sales is product_quantity (in orders_table) multiplied by product_price (in dim_products).

My percentage figures differ from the expected ones, which is odd, as mine appear to be the correct percentages when each store type's total is taken as a proportion of the overall total.
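
The arithmetic behind this observation can be reproduced directly from the expected table. The sketch below only uses the total_sales figures printed above. It recomputes each store type's share of the grand total, so it says nothing about how the expected percentage column was originally produced.

```python
# total_sales figures copied from the expected output table above
expected_totals = {
    "Local": 3440896.52,
    "Web portal": 1726547.05,
    "Super Store": 1224293.65,
    "Mall Kiosk": 698791.61,
    "Outlet": 631804.81,
}

grand_total = sum(expected_totals.values())

# Share of the grand total for each store type, rounded like the SQL query does
percentages = {
    store_type: round(total / grand_total * 100, 2)
    for store_type, total in expected_totals.items()
}

for store_type, pct in percentages.items():
    print(f"{store_type:<12} {pct:>6.2f}")
```

Computed this way, Local comes out at about 44.56% rather than the 44.87% listed, so the expected percentage column is not simply total_sales divided by the sum of total_sales.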


In [None]:
SELECT 
    store_type, 
    -- PostgreSQL's two-argument ROUND only accepts numeric, so cast before rounding (:: is shorthand for CAST)
    ROUND(SUM(product_quantity::numeric * product_price::numeric), 2) AS total_sales,
    ROUND(SUM(product_quantity::numeric * product_price::numeric) / (
        SELECT 
            SUM(product_quantity * product_price)
        FROM 
            orders_table
        INNER JOIN dim_products ON orders_table.product_code = dim_products.product_code
    )::numeric * 100, 2) AS "percentage_total(%)"
FROM 
    orders_table
INNER JOIN dim_stores_details ON orders_table.store_code = dim_stores_details.store_code 
INNER JOIN dim_products ON orders_table.product_code = dim_products.product_code
GROUP BY store_type
ORDER BY total_sales DESC;