# **1\. Data Cleansing Steps**

In a single query, perform the following operations and generate a new table in the `data_mart` schema named `clean_weekly_sales`:

- Convert the `week_date` column to the `DATE` type
    
- Add a `week_number` as the second column for each `week_date` value; for example, any value from the 1st to the 7th of January is 1, the 8th to the 14th is 2, etc.
    
- Add a `month_number` with the calendar month for each `week_date` value as the 3rd column
    
- Add a `calendar_year` column as the 4th column containing either 2018, 2019 or 2020 values
    
- Add a new column called `age_band` after the original `segment` column, using the following mapping on the number inside the `segment` value:
    

| segment | age\_band |
| --- | --- |
| 1 | Young Adults |
| 2 | Middle Aged |
| 3 or 4 | Retirees |
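
The mapping above can be sketched as a `CASE` on the last character of `segment`. This is only an illustration: the sample values (`C1`, `F2`, ...) are made up, and it assumes every real segment is a letter followed by a single digit, with missing segments stored as the string `null`.

```sql
-- Illustrative rows only; the real values live in weekly_sales.
SELECT
    segment,
    CASE RIGHT(segment, 1)
        WHEN '1' THEN 'Young Adults'
        WHEN '2' THEN 'Middle Aged'
        WHEN '3' THEN 'Retirees'
        WHEN '4' THEN 'Retirees'
        ELSE 'unknown'          -- anything without a trailing 1-4, e.g. 'null'
    END AS age_band
FROM (VALUES ('C1'), ('F2'), ('C3'), ('F4'), ('null')) AS t(segment);
```
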

- Add a new `demographic` column using the following mapping for the first letter in the `segment` values:

| segment | demographic |
| --- | --- |
| C | Couples |
| F | Families |
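
In the same spirit, the demographic can be read off the first character. The sketch below uses hypothetical sample rows and assumes that any value not starting with `C` or `F` should end up as `unknown`.

```sql
-- Illustrative rows only; 'null' stands for the string the raw data uses for missing segments.
SELECT
    segment,
    CASE LEFT(segment, 1)
        WHEN 'C' THEN 'Couples'
        WHEN 'F' THEN 'Families'
        ELSE 'unknown'
    END AS demographic
FROM (VALUES ('C1'), ('F2'), ('null')) AS t(segment);
```
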

- Ensure all `null` string values are replaced with an `"unknown"` string value in the original `segment` column as well as in the new `age_band` and `demographic` columns
    
- Generate a new `avg_transaction` column as the `sales` value divided by `transactions` rounded to 2 decimal places for each record
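
The week-number rule in the list can be read in two ways, so it may help to compare them. The sketch below puts PostgreSQL's `DATE_PART('week', ...)`, which returns the ISO 8601 week, next to a literal reading of the brief (days 1 to 7 of the year are week 1, days 8 to 14 are week 2, and so on). The sample dates and the column names `iso_week` and `brief_week` are chosen purely for illustration.

```sql
-- Illustrative dates; brief_week counts 7-day blocks from 1 January.
SELECT
    d                                        AS week_date,
    DATE_PART('week', d)::INT                AS iso_week,
    (DATE_PART('doy', d)::INT - 1) / 7 + 1   AS brief_week,
    DATE_PART('month', d)::INT               AS month_number,
    DATE_PART('year', d)::INT                AS calendar_year
FROM (VALUES (DATE '2020-08-31'), (DATE '2019-01-07')) AS t(d);
-- For 2020-08-31: iso_week = 36, brief_week = 35
```

The two definitions can differ by one depending on which weekday the year starts on, so it is worth deciding which one downstream analysis expects.
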

In [3]:
-- Note: this defines a view, whereas the brief asks for a table in data_mart.
CREATE VIEW clean_weekly_sales AS
WITH parsed AS (
	-- Convert the text week_date into a DATE
	SELECT
		*,
		TO_DATE(week_date, 'DD MM YY') AS date
	FROM weekly_sales
),
enriched AS (
	SELECT
		*,
		DATE_PART('week', date) AS week_number,  -- ISO 8601 week number
		DATE_PART('month', date) AS month_number,
		DATE_PART('year', date) AS calendar_year,
		-- Replace the string 'null' with 'unknown'
		CASE WHEN segment = 'null' THEN 'unknown' ELSE segment END AS new_segment
	FROM parsed
)
SELECT
	date,
	week_number,
	month_number,
	calendar_year,
	-- Age band from the trailing digit; 'unknown' falls through to the ELSE
	CASE
		WHEN new_segment LIKE '%1' THEN 'Young Adults'
		WHEN new_segment LIKE '%2' THEN 'Middle Aged'
		WHEN new_segment LIKE '%3' OR new_segment LIKE '%4' THEN 'Retirees'
		ELSE new_segment
	END AS age_band,
	-- Demographic from the leading letter
	CASE
		WHEN new_segment LIKE 'C%' THEN 'Couples'
		WHEN new_segment LIKE 'F%' THEN 'Families'
		ELSE new_segment
	END AS demographic,
	-- Cast to NUMERIC so the division keeps its decimals before rounding
	ROUND(sales::NUMERIC / transactions, 2) AS avg_transaction,
	region,
	platform,
	customer_type,
	transactions,
	sales
FROM enriched;

In [4]:
SELECT
	*
FROM clean_weekly_sales;

| date | week\_number | month\_number | calendar\_year | age\_band | demographic | avg\_transaction | region | platform | customer\_type | transactions | sales |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2020-08-31 | 36.0 | 8.0 | 2020.0 | Retirees | Couples | 30.31 | ASIA | Retail | New | 120631 | 3656163 |
| 2020-08-31 | 36.0 | 8.0 | 2020.0 | Young Adults | Families | 31.56 | ASIA | Retail | New | 31574 | 996575 |
| 2020-08-31 | 36.0 | 8.0 | 2020.0 | unknown | unknown | 31.20 | USA | Retail | Guest | 529151 | 16509610 |
| 2020-08-31 | 36.0 | 8.0 | 2020.0 | Young Adults | Couples | 31.42 | EUROPE | Retail | New | 4517 | 141942 |
| 2020-08-31 | 36.0 | 8.0 | 2020.0 | Middle Aged | Couples | 30.29 | AFRICA | Retail | New | 58046 | 1758388 |
| 2020-08-31 | 36.0 | 8.0 | 2020.0 | Middle Aged | Families | 182.54 | CANADA | Shopify | Existing | 1336 | 243878 |
| 2020-08-31 | 36.0 | 8.0 | 2020.0 | Retirees | Families | 206.64 | AFRICA | Shopify | Existing | 2514 | 519502 |
| 2020-08-31 | 36.0 | 8.0 | 2020.0 | Young Adults | Families | 172.11 | ASIA | Shopify | Existing | 2158 | 371417 |
| 2020-08-31 | 36.0 | 8.0 | 2020.0 | Middle Aged | Families | 155.84 | AFRICA | Shopify | New | 318 | 49557 |
| 2020-08-31 | 36.0 | 8.0 | 2020.0 | Retirees | Couples | 35.02 | AFRICA | Retail | New | 111032 | 3888162 |
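
The `avg_transaction` values depend on how PostgreSQL divides integers: when both operands are integers, `/` discards the fractional part, so rounding afterwards cannot bring the decimals back. A quick check using the `sales` and `transactions` figures from the first row above (the column aliases are illustrative):

```sql
SELECT
    3656163 / 120631                      AS integer_division,   -- 30
    ROUND(3656163::NUMERIC / 120631, 2)   AS numeric_division;   -- 30.31
```

Casting one operand to `NUMERIC` before dividing is one way to get the two-decimal average the brief describes.
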
