*Source of the case study:* https://8weeksqlchallenge.com/case-study-8/

---

## Case Study #8 - Fresh Segments

**Introduction**
<br>Danny created Fresh Segments, a digital marketing agency that helps other businesses analyze trends in online ad click behaviour for their unique customer base.

Clients share their customer lists with the Fresh Segments team, who then aggregate interest metrics and generate a single dataset worth of metrics for further analysis.

In particular, the composition and rankings for different interests are provided for each client, showing the proportion of their customer list who interacted with online assets related to each interest for each month.

Danny has asked for your assistance to analyze aggregated metrics for an example client and provide some high level insights about the customer list and their interests.

**Available Data**
<br>For this case study there is a total of 2 datasets which you will need to use to solve the questions.

**Interest Metrics**
<br>This table contains information about aggregated interest metrics for a specific major client of Fresh Segments which makes up a large proportion of their customer base.

Each record in this table represents the performance of a specific `interest_id` based on the client’s customer base interest measured through clicks and interactions with specific targeted advertising content.

For example - let’s interpret the first row of the `interest_metrics` table together:

|_month	|_year	|month_year	|interest_id	|composition	|index_value	|ranking	|percentile_ranking|
| ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- |
|7	|2018	|07-2018	|32486	|11.89	|6.19	|1	|99.86|

In July 2018, the `composition` metric is 11.89, meaning that 11.89% of the client’s customer list interacted with the interest `interest_id` = 32486. We can link `interest_id` to a separate mapping table to find the segment name, “Vacation Rental Accommodation Researchers”.

The `index_value` is 6.19, meaning that the `composition` value is 6.19x the average composition value for all Fresh Segments clients’ customers for this particular interest in the month of July 2018.

The `ranking` and `percentile_ranking` columns relate to the order of `index_value` records in each month year.

**Interest Map**
<br>This mapping table links the `interest_id` with their relevant interest information. You will need to join this table onto the previous `interest_metrics` table to obtain the `interest_name` as well as any details about the summary information.

**Case Study Questions**

The following questions can be considered key business questions that are required to be answered for the Fresh Segments team.

Most questions can be answered using a single query, however, some questions are more open ended and require additional thought and not just a coded solution!

**1. Data Exploration and Cleansing**
1. Update the `fresh_segments.interest_metrics` table by modifying the `month_year` column to be a date data type with the start of the month
1. What is count of records in the `fresh_segments.interest_metrics` for each `month_year` value sorted in chronological order (earliest to latest) with the null values appearing first?
1. What do you think we should do with these null values in the `fresh_segments.interest_metrics`
1. How many `interest_id` values exist in the `fresh_segments.interest_metrics` table but not in the `fresh_segments.interest_map` table? What about the other way around?
1. Summarise the `id` values in the `fresh_segments.interest_map` by its total record count in this table
1. What sort of table join should we perform for our analysis and why? Check your logic by checking the rows where `interest_id` = 21246 in your joined output and include all columns from `fresh_segments.interest_metrics` and all columns from `fresh_segments.interest_map` except from the `id` column.
1. Are there any records in your joined table where the `month_year` value is before the `created_at` value from the `fresh_segments.interest_map` table? Do you think these values are valid and why?

**2. Interest Analysis**
1. Which interests have been present in all `month_year` dates in our dataset?
1. Using this same `total_months` measure - calculate the cumulative percentage of all records starting at 14 months - which `total_months` value passes the 90% cumulative percentage value?
1. If we were to remove all `interest_id` values which are lower than the `total_months` value we found in the previous question - how many total data points would we be removing?
1. Does this decision make sense to remove these data points from a business perspective? Use an example where there are all 14 months present to a removed interest example for your arguments - think about what it means to have less months present from a segment perspective.
1. After removing these interests - how many unique interests are there for each month?

**3. Segment Analysis**
1. Using our filtered dataset by removing the interests with less than 6 months worth of data, which are the top 10 and bottom 10 interests which have the largest composition values in any `month_year`? Only use the maximum composition value for each interest but you must keep the corresponding `month_year`
1. Which 5 interests had the lowest average `ranking` value?
1. Which 5 interests had the largest standard deviation in their `percentile_ranking` value?
1. For the 5 interests found in the previous question - what was minimum and maximum `percentile_ranking` values for each interest and its corresponding `month_year` value? Can you describe what is happening for these 5 interests?
1. How would you describe our customers in this segment based off their `composition` and `ranking` values? What sort of products or services should we show to these customers and what should we avoid?

**4. Index Analysis**
<br>The `index_value` is a measure which can be used to reverse calculate the average composition for Fresh Segments’ clients.

Average composition can be calculated by dividing the `composition` column by the `index_value` column rounded to 2 decimal places.

1. What is the top 10 interests by the average composition for each month?
1. For all of these top 10 interests - which interest appears the most often?
1. What is the average of the average composition for the top 10 interests for each month?
1. What is the 3 month rolling average of the max average composition value from September 2018 to August 2019 and include the previous top ranking interests in the same output shown below.
1. Provide a possible reason why the max average composition might change from month to month? Could it signal something is not quite right with the overall business model for Fresh Segments?

Required output for question 4:

|month_year	|interest_name	|max_index_composition	|3_month_moving_avg	|1_month_ago	|2_months_ago|
| ----- | ----- | ----- | ----- | ----- | ----- |
|2018-09-01	|Work Comes First Travelers	|8.26	|7.61	|Las Vegas Trip Planners: 7.21	|Las Vegas Trip Planners: 7.36|
|2018-10-01	|Work Comes First Travelers	|9.14	|8.20	|Work Comes First Travelers: 8.26	|Las Vegas Trip Planners: 7.21|
|2018-11-01	|Work Comes First Travelers	|8.28	|8.56	|Work Comes First Travelers: 9.14	|Work Comes First Travelers: 8.26|
|2018-12-01	|Work Comes First Travelers	|8.31	|8.58	|Work Comes First Travelers: 8.28	|Work Comes First Travelers: 9.14|
|2019-01-01	|Work Comes First Travelers	|7.66	|8.08	|Work Comes First Travelers: 8.31	|Work Comes First Travelers: 8.28|
|2019-02-01	|Work Comes First Travelers	|7.66	|7.88	|Work Comes First Travelers: 7.66	|Work Comes First Travelers: 8.31|
|2019-03-01	|Alabama Trip Planners	|6.54	|7.29	|Work Comes First Travelers: 7.66	|Work Comes First Travelers: 7.66|
|2019-04-01	|Solar Energy Researchers	|6.28	|6.83	|Alabama Trip Planners: 6.54	|Work Comes First Travelers: 7.66|
|2019-05-01	|Readers of Honduran Content	|4.41	|5.74	|Solar Energy Researchers: 6.28	|Alabama Trip Planners: 6.54|
|2019-06-01	|Las Vegas Trip Planners	|2.77	|4.49	|Readers of Honduran Content: 4.41	|Solar Energy Researchers: 6.28|
|2019-07-01	|Las Vegas Trip Planners	|2.82	|3.33	|Las Vegas Trip Planners: 2.77	|Readers of Honduran Content: 4.41|
|2019-08-01	|Cosmetics and Beauty Shoppers	|2.73	|2.77	|Las Vegas Trip Planners: 2.82	|Las Vegas Trip Planners: 2.77|

---

**1. Data Exploration and Cleansing**

**1.1. Update the `fresh_segments.interest_metrics` table by modifying the `month_year` column to be a date data type with the start of the month**

**Query #1**

    ALTER TABLE fresh_segments.interest_metrics
    ALTER COLUMN month_year TYPE DATE 
    USING to_date(month_year, 'MM-YYYY');

There are no results to be displayed.
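
To confirm that the conversion took effect, one option is to look the column up in PostgreSQL's `information_schema.columns` view. This is only a sanity-check sketch and assumes the table still lives in the `fresh_segments` schema.

```sql
-- Sanity check (sketch): report the data type now recorded for month_year.
SELECT column_name,
       data_type
FROM information_schema.columns
WHERE table_schema = 'fresh_segments'
  AND table_name = 'interest_metrics'
  AND column_name = 'month_year';
```

After Query #1 has run, the `data_type` value should read `date`.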

---

**1.2. What is count of records in the `fresh_segments.interest_metrics` for each `month_year` value sorted in chronological order (earliest to latest) with the null values appearing first?**

**Query #2**

    SELECT month_year,
    	   COUNT(*)
    FROM fresh_segments.interest_metrics
    GROUP BY 1
    ORDER BY 1 NULLS FIRST;

| month_year               | count |
| ------------------------ | ----- |
|                          | 1194  |
| 2018-07-01T00:00:00.000Z | 729   |
| 2018-08-01T00:00:00.000Z | 767   |
| 2018-09-01T00:00:00.000Z | 780   |
| 2018-10-01T00:00:00.000Z | 857   |
| 2018-11-01T00:00:00.000Z | 928   |
| 2018-12-01T00:00:00.000Z | 995   |
| 2019-01-01T00:00:00.000Z | 973   |
| 2019-02-01T00:00:00.000Z | 1121  |
| 2019-03-01T00:00:00.000Z | 1136  |
| 2019-04-01T00:00:00.000Z | 1099  |
| 2019-05-01T00:00:00.000Z | 857   |
| 2019-06-01T00:00:00.000Z | 824   |
| 2019-07-01T00:00:00.000Z | 864   |
| 2019-08-01T00:00:00.000Z | 1149  |

---

**1.3. What do you think we should do with these null values in the `fresh_segments.interest_metrics`**

The 1,194 rows with a null `month_year` (about 8% of the 14,273 records counted above) cannot be assigned to any month, so we exclude them from the analysis:


**Query #3**

    CREATE TEMP TABLE interest_metrics_clean AS
    (SELECT *
     FROM fresh_segments.interest_metrics
     WHERE month_year IS NOT NULL);

There are no results to be displayed.
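
The share of rows that this filter drops can be checked directly against the original table. The query below is a small sketch using PostgreSQL's `FILTER` clause; the column aliases are illustrative names, not part of the dataset.

```sql
-- Sketch: how many rows have a null month_year, and what share of the table is that?
SELECT COUNT(*) FILTER (WHERE month_year IS NULL) AS null_month_year_cnt,
       COUNT(*) AS total_cnt,
       ROUND(100.0 * COUNT(*) FILTER (WHERE month_year IS NULL) / COUNT(*), 2) AS null_pct
FROM fresh_segments.interest_metrics;
```

If the share were much larger, imputing or investigating the rows might be preferable to dropping them.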

---

**1.4. How many `interest_id` values exist in the `fresh_segments.interest_metrics` table but not in the `fresh_segments.interest_map` table? What about the other way around?**

Let's change the datatype of the `interest_id` column to `integer`:

**Query #4**

    ALTER TABLE interest_metrics_clean
    ALTER COLUMN interest_id TYPE INT
    USING interest_id::INTEGER;

There are no results to be displayed.

Now we can count the `interest_id` values that exist in the `interest_metrics_clean` table but not in the `fresh_segments.interest_map` table:

**Query #5**

    SELECT COUNT(DISTINCT interest_id) AS id_cnt_not_in_map
    FROM interest_metrics_clean
    WHERE interest_id NOT IN (SELECT id FROM fresh_segments.interest_map);

| id_cnt_not_in_map |
| ----------------- |
| 0                 |

Therefore, all the ids from the `interest_metrics_clean` table are present in the `fresh_segments.interest_map` table.

Let's check how many `id` values exist in the `fresh_segments.interest_map` table but not in the `interest_metrics_clean` table:

**Query #6**

    SELECT COUNT(id) AS id_cnt_not_in_metrics
    FROM fresh_segments.interest_map
    WHERE id NOT IN (SELECT interest_id FROM interest_metrics_clean);

| id_cnt_not_in_metrics |
| --------------------- |
| 7                     |

So there are 7 `id`s that are in the `fresh_segments.interest_map` but not in the `interest_metrics_clean` table.
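
One caveat with `NOT IN`: if the subquery returns even a single `NULL`, the predicate is never true and the count silently becomes 0. A possible alternative, sketched here under the assumption that the same two tables are used, is an anti-join with `NOT EXISTS`, which is not affected by nulls.

```sql
-- Sketch: count map ids with no matching row in the cleaned metrics table.
SELECT COUNT(*) AS id_cnt_not_in_metrics
FROM fresh_segments.interest_map m
WHERE NOT EXISTS (SELECT 1
                  FROM interest_metrics_clean c
                  WHERE c.interest_id = m.id);
```

The same pattern with the tables swapped covers the other direction of the question.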

---

**1.5. Summarise the `id` values in the `fresh_segments.interest_map` by its total record count in this table**

**Query #7**

    SELECT COUNT(DISTINCT id) AS id_cnt
    FROM fresh_segments.interest_map;

| id_cnt |
| ------ |
| 1209   |
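
The question can also be read as asking how many `id` values occur once, twice, and so on. A sketch of that reading, where `record_cnt` and `id_cnt` are illustrative aliases:

```sql
-- Sketch: group ids by how many records each one has in interest_map.
WITH id_counts AS (
    SELECT id,
           COUNT(*) AS record_cnt
    FROM fresh_segments.interest_map
    GROUP BY id
)
SELECT record_cnt,
       COUNT(*) AS id_cnt
FROM id_counts
GROUP BY record_cnt
ORDER BY record_cnt;
```

A single output row with `record_cnt` = 1 would indicate that every `id` is unique in the mapping table.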

---

**1.6. What sort of table join should we perform for our analysis and why? Check your logic by checking the rows where `interest_id` = 21246 in your joined output and include all columns from `fresh_segments.interest_metrics` and all columns from `fresh_segments.interest_map` except from the `id` column.**

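We can use an INNER JOIN: Query #5 showed that every `interest_id` in `interest_metrics_clean` has a match in `fresh_segments.interest_map`, so no metrics rows are lost, and the `id`s that exist only in the map (and therefore have no metrics to analyze) are dropped: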

**Query #8**

    CREATE TEMP TABLE interest_metrics_map_joined AS
    (SELECT c.*, 
    	    m.interest_name, 
            m.interest_summary, 
            m.created_at, 
            m.last_modified
     FROM fresh_segments.interest_map m
     INNER JOIN interest_metrics_clean c ON m.id = c.interest_id);

There are no results to be displayed.
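
The question also asks to check the join on `interest_id` = 21246. A minimal check, assuming the temp table created by Query #8 is still available in the session (`interest_id` is an integer after Query #4):

```sql
-- Sketch: inspect the joined rows for a single interest.
SELECT *
FROM interest_metrics_map_joined
WHERE interest_id = 21246
ORDER BY month_year;
```

Each returned row should carry the metrics columns plus `interest_name`, `interest_summary`, `created_at` and `last_modified` from the map.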

---

**1.7. Are there any records in your joined table where the `month_year` value is before the `created_at` value from the `fresh_segments.interest_map` table? Do you think these values are valid and why?**

**Query #9**

    SELECT COUNT(*) AS date_before_created_at_cnt
    FROM interest_metrics_map_joined
    WHERE month_year < created_at;

| date_before_created_at_cnt |
| -------------------------- |
| 188                        |

There are 188 records where the `month_year` value is before the `created_at` value.

Because `month_year` was set to the first day of each month, a record can appear to precede `created_at` even when both fall in the same month. We should therefore check whether any `month_year` falls in an earlier month than `created_at`:

**Query #10**

    SELECT COUNT(*) AS date_before_created_at_cnt
    FROM interest_metrics_map_joined
    -- Truncate created_at to the first day of its month so the year is compared too
    WHERE month_year < DATE_TRUNC('month', created_at);

| date_before_created_at_cnt |
| -------------------------- |
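| 0                          |

No record falls in an earlier month than its `created_at` value, so the 188 records above are valid: they only look earlier because `month_year` is truncated to the start of the month.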

---

**2. Interest Analysis**

**2.1 Which interests have been present in all `month_year` dates in our dataset?**

**Query #11**

    SELECT COUNT(interest_id) AS interest_date_cnt
    FROM (SELECT interest_id,
    	   		 COUNT(DISTINCT month_year) AS dates_cnt
    	  FROM interest_metrics_map_joined
    	  GROUP BY 1
    	  HAVING COUNT(DISTINCT month_year) = (SELECT COUNT(DISTINCT month_year) FROM interest_metrics_map_joined)) t;

| interest_date_cnt |
| ----------------- |
| 480               |
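
Query #11 returns only the number of such interests. To see which interests they are, a possible variant lists the names instead; it assumes each `interest_id` maps to a single `interest_name` in the joined table.

```sql
-- Sketch: list the interests that appear in every month_year of the dataset.
SELECT interest_id,
       interest_name,
       COUNT(DISTINCT month_year) AS total_months
FROM interest_metrics_map_joined
GROUP BY interest_id, interest_name
HAVING COUNT(DISTINCT month_year) = (SELECT COUNT(DISTINCT month_year)
                                     FROM interest_metrics_map_joined)
ORDER BY interest_name;
```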

---

**2.2. Using this same `total_months` measure - calculate the cumulative percentage of all records starting at 14 months - which `total_months` value passes the 90% cumulative percentage value?**

**Query #12**

    WITH temp AS
    (SELECT total_months,
    	    COUNT(interest_id) AS interest_id_cnt
     FROM (SELECT interest_id,
                  COUNT(DISTINCT month_year) AS total_months
    	   FROM interest_metrics_map_joined
    	   GROUP BY 1) t
     GROUP BY 1)
     
    SELECT total_months,
    	   ROUND(100*SUM(interest_id_cnt) OVER (ORDER BY total_months DESC) / SUM(interest_id_cnt) OVER(), 2) AS cum_percentage
    FROM temp;

| total_months | cum_percentage |
| ------------ | -------------- |
| 14           | 39.93          |
| 13           | 46.76          |
| 12           | 52.16          |
| 11           | 59.98          |
| 10           | 67.14          |
| 9            | 75.04          |
| 8            | 80.62          |
| 7            | 88.10          |
| 6            | 90.85          |
| 5            | 94.01          |
| 4            | 96.67          |
| 3            | 97.92          |
| 2            | 98.92          |
| 1            | 100.00         |

Counting down from 14 months, `total_months` = 6 is the first value at which the cumulative percentage passes 90% (90.85%).
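
Instead of reading the threshold off the table, the same calculation can return it directly. This is a sketch that reuses the logic of Query #12; the CTE names are illustrative.

```sql
-- Sketch: return the first total_months value (counting down from 14)
-- whose cumulative percentage exceeds 90%.
WITH months_per_interest AS (
    SELECT interest_id,
           COUNT(DISTINCT month_year) AS total_months
    FROM interest_metrics_map_joined
    GROUP BY interest_id
),
interests_per_total AS (
    SELECT total_months,
           COUNT(*) AS interest_id_cnt
    FROM months_per_interest
    GROUP BY total_months
),
cumulative AS (
    SELECT total_months,
           ROUND(100 * SUM(interest_id_cnt) OVER (ORDER BY total_months DESC)
                 / SUM(interest_id_cnt) OVER (), 2) AS cum_percentage
    FROM interests_per_total
)
SELECT total_months,
       cum_percentage
FROM cumulative
WHERE cum_percentage > 90
ORDER BY total_months DESC
LIMIT 1;
```

Given the figures in the table above, this would return `total_months` = 6.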

---

**2.3. If we were to remove all `interest_id` values which are lower than the `total_months` value we found in the previous question - how many total data points would we be removing?**

**Query #13**

    SELECT COUNT(*) AS rows_cnt_to_remove
    FROM interest_metrics_map_joined
    WHERE interest_id IN (SELECT interest_id
    					  FROM (SELECT interest_id,
                                       COUNT(DISTINCT month_year) AS dates_cnt
                                FROM interest_metrics_map_joined
                                GROUP BY 1
                                HAVING COUNT(DISTINCT month_year) < 6) t);

| rows_cnt_to_remove |
| ------------------ |
| 400                |

---

**2.4. Does this decision make sense to remove these data points from a business perspective? Use an example where there are all 14 months present to a removed interest example for your arguments - think about what it means to have less months present from a segment perspective.**

Yes, it makes sense: interests with fewer than 6 months of data may simply be new, and their short history could skew our conclusions. It is therefore better to analyze only the `id`s that are present in at least 6 months.

**Query #14**

    CREATE TEMP TABLE interest_removed AS
    (SELECT *
     FROM interest_metrics_map_joined
     WHERE interest_id NOT IN (SELECT interest_id
    					   FROM (SELECT interest_id,
    					  	   		    COUNT(DISTINCT month_year) AS dates_cnt
    					  	     FROM interest_metrics_map_joined
    					  	     GROUP BY 1
    					  	     HAVING COUNT(DISTINCT month_year) < 6) t));

There are no results to be displayed.

---

**2.5. After removing these interests - how many unique interests are there for each month?**

**Query #15**

    SELECT month_year,
           COUNT(DISTINCT interest_id) AS interest_cnt
    FROM interest_removed
    GROUP BY 1
    ORDER BY 1;

| month_year               | interest_cnt |
| ------------------------ | ------------ |
| 2018-07-01T00:00:00.000Z | 709          |
| 2018-08-01T00:00:00.000Z | 752          |
| 2018-09-01T00:00:00.000Z | 774          |
| 2018-10-01T00:00:00.000Z | 853          |
| 2018-11-01T00:00:00.000Z | 925          |
| 2018-12-01T00:00:00.000Z | 986          |
| 2019-01-01T00:00:00.000Z | 966          |
| 2019-02-01T00:00:00.000Z | 1072         |
| 2019-03-01T00:00:00.000Z | 1078         |
| 2019-04-01T00:00:00.000Z | 1035         |
| 2019-05-01T00:00:00.000Z | 827          |
| 2019-06-01T00:00:00.000Z | 804          |
| 2019-07-01T00:00:00.000Z | 836          |
| 2019-08-01T00:00:00.000Z | 1062         |

---

**3. Segment Analysis**

**3.1. Using our filtered dataset by removing the interests with less than 6 months worth of data, which are the top 10 and bottom 10 interests which have the largest composition values in any `month_year`? Only use the maximum composition value for each interest but you must keep the corresponding `month_year`**

**Query #16**

    SELECT month_year,
    	   interest_id,
    	   MAX(composition) AS top10interests_composition
    FROM interest_removed
    GROUP BY 1,2
    ORDER BY 3 DESC
    LIMIT 10;

| month_year               | interest_id | top10interests_composition |
| ------------------------ | ----------- | -------------------------- |
| 2018-12-01T00:00:00.000Z | 21057       | 21.2                       |
| 2018-10-01T00:00:00.000Z | 21057       | 20.28                      |
| 2018-11-01T00:00:00.000Z | 21057       | 19.45                      |
| 2019-01-01T00:00:00.000Z | 21057       | 18.99                      |
| 2018-07-01T00:00:00.000Z | 6284        | 18.82                      |
| 2019-02-01T00:00:00.000Z | 21057       | 18.39                      |
| 2018-09-01T00:00:00.000Z | 21057       | 18.18                      |
| 2018-07-01T00:00:00.000Z | 39          | 17.44                      |
| 2018-07-01T00:00:00.000Z | 77          | 17.19                      |
| 2018-10-01T00:00:00.000Z | 12133       | 15.15                      |

**Query #17**

    SELECT month_year,
    	   interest_id,
    	   MAX(composition) AS bottom10interests_composition
    FROM interest_removed
    GROUP BY 1,2
    ORDER BY 3
    LIMIT 10;

| month_year               | interest_id | bottom10interests_composition |
| ------------------------ | ----------- | ----------------------------- |
| 2019-05-01T00:00:00.000Z | 45524       | 1.51                          |
| 2019-05-01T00:00:00.000Z | 4918        | 1.52                          |
| 2019-06-01T00:00:00.000Z | 34083       | 1.52                          |
| 2019-06-01T00:00:00.000Z | 35742       | 1.52                          |
| 2019-05-01T00:00:00.000Z | 20768       | 1.52                          |
| 2019-04-01T00:00:00.000Z | 44449       | 1.52                          |
| 2019-05-01T00:00:00.000Z | 39336       | 1.52                          |
| 2019-05-01T00:00:00.000Z | 36877       | 1.53                          |
| 2019-05-01T00:00:00.000Z | 6127        | 1.53                          |
| 2019-06-01T00:00:00.000Z | 6314        | 1.53                          |
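
Query #16 and Query #17 group by both `month_year` and `interest_id`, so the same interest can appear more than once (21057 appears six times in the top 10 above). If the goal is one row per interest, holding its maximum composition and the month in which it occurred, PostgreSQL's `DISTINCT ON` is one way to sketch it. The CTE name is illustrative, and no output is shown because this variant was not run against the data.

```sql
-- Sketch: one row per interest with its maximum composition and that month.
WITH max_per_interest AS (
    SELECT DISTINCT ON (interest_id)
           interest_id,
           interest_name,
           month_year,
           composition
    FROM interest_removed
    -- For each interest_id, keep the row with the highest composition
    -- (the earliest month wins if there is a tie).
    ORDER BY interest_id, composition DESC, month_year
)
SELECT *
FROM max_per_interest
ORDER BY composition DESC   -- use ASC here for the bottom 10
LIMIT 10;
```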

---

**3.2. Which 5 interests had the lowest average `ranking` value?**

**Query #18**

    SELECT interest_id,
    	   ROUND(AVG(ranking), 2) AS avg_ranking
    FROM interest_removed
    GROUP BY 1
    ORDER BY 2
    LIMIT 5;

| interest_id | avg_ranking |
| ----------- | ----------- |
| 41548       | 1.00        |
| 42203       | 4.11        |
| 115         | 5.93        |
| 171         | 9.36        |
| 4           | 11.86       |

---

**3.3. Which 5 interests had the largest standard deviation in their `percentile_ranking` value?**

**Query #19**

    SELECT interest_id,
    	   ROUND(STDDEV(percentile_ranking)::numeric, 2) AS std_dev_percentile_ranking
    FROM interest_removed
    GROUP BY 1
    ORDER BY 2 DESC
    LIMIT 5;

| interest_id | std_dev_percentile_ranking |
| ----------- | -------------------------- |
| 23          | 30.18                      |
| 20764       | 28.97                      |
| 38992       | 28.32                      |
| 43546       | 26.24                      |
| 10839       | 25.61                      |

---

**3.4. For the 5 interests found in the previous question - what was minimum and maximum `percentile_ranking` values for each interest and its corresponding `month_year` value? Can you describe what is happening for these 5 interests?**

**Query #20**

    WITH t AS
    (SELECT interest_id,
    	    MIN(percentile_ranking) AS min_percentile_ranking,
            MAX(percentile_ranking) AS max_percentile_ranking
     FROM interest_removed
     WHERE interest_id IN (SELECT interest_id
                           FROM interest_removed
    					   GROUP BY 1
    					   ORDER BY STDDEV(percentile_ranking) DESC
    					   LIMIT 5)
     GROUP BY interest_id)
     
    SELECT t.interest_id,
    	   t.min_percentile_ranking,
           MIN(t1.month_year) AS month_year_of_min_percentile_ranking,
           t.max_percentile_ranking,
           MIN(t2.month_year) AS month_year_of_max_percentile_ranking
    FROM t
    LEFT JOIN interest_removed t1 ON t.min_percentile_ranking = t1.percentile_ranking
    LEFT JOIN interest_removed t2 ON t.max_percentile_ranking = t2.percentile_ranking
    GROUP BY 1,2,4
    ORDER BY 1;

| interest_id | min_percentile_ranking | month_year_of_min_percentile_ranking | max_percentile_ranking | month_year_of_max_percentile_ranking |
| ----------- | ---------------------- | ------------------------------------ | ---------------------- | ------------------------------------ |
| 23          | 7.92                   | 2019-08-01T00:00:00.000Z             | 86.69                  | 2018-07-01T00:00:00.000Z             |
| 10839       | 4.84                   | 2019-03-01T00:00:00.000Z             | 75.03                  | 2018-07-01T00:00:00.000Z             |
| 20764       | 11.23                  | 2019-08-01T00:00:00.000Z             | 86.15                  | 2018-07-01T00:00:00.000Z             |
| 38992       | 2.2                    | 2019-03-01T00:00:00.000Z             | 82.44                  | 2018-11-01T00:00:00.000Z             |
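| 43546       | 5.7                    | 2019-06-01T00:00:00.000Z             | 73.15                  | 2019-03-01T00:00:00.000Z             |

In this output, every interest reaches its maximum `percentile_ranking` before its minimum, so all five moved from a high to a low percentile over the period.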
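
Note that Query #20 joins back to `interest_removed` on `percentile_ranking` alone, so a month belonging to a different interest with the same value could in principle be picked up. A possible variant that keeps every lookup within the same interest uses `ROW_NUMBER()`; the CTE and alias names are illustrative, and no output is shown because this variant was not run against the data.

```sql
-- Sketch: per-interest minimum and maximum percentile_ranking with their own months.
WITH top5 AS (
    SELECT interest_id
    FROM interest_removed
    GROUP BY interest_id
    ORDER BY STDDEV(percentile_ranking) DESC
    LIMIT 5
),
ranked AS (
    SELECT r.interest_id,
           r.month_year,
           r.percentile_ranking,
           -- rn_min = 1 marks the lowest value, rn_max = 1 the highest;
           -- ties are broken by the earliest month.
           ROW_NUMBER() OVER (PARTITION BY r.interest_id
                              ORDER BY r.percentile_ranking, r.month_year) AS rn_min,
           ROW_NUMBER() OVER (PARTITION BY r.interest_id
                              ORDER BY r.percentile_ranking DESC, r.month_year) AS rn_max
    FROM interest_removed r
    JOIN top5 t ON t.interest_id = r.interest_id
)
SELECT interest_id,
       MAX(percentile_ranking) FILTER (WHERE rn_min = 1) AS min_percentile_ranking,
       MAX(month_year) FILTER (WHERE rn_min = 1) AS month_year_of_min_percentile_ranking,
       MAX(percentile_ranking) FILTER (WHERE rn_max = 1) AS max_percentile_ranking,
       MAX(month_year) FILTER (WHERE rn_max = 1) AS month_year_of_max_percentile_ranking
FROM ranked
GROUP BY interest_id
ORDER BY interest_id;
```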

---

**3.5. How would you describe our customers in this segment based off their `composition` and `ranking` values? What sort of products or services should we show to these customers and what should we avoid?**

To be continued...

---

**4. Index Analysis**

The `index_value` is a measure which can be used to reverse calculate the average composition for Fresh Segments’ clients.

Average composition can be calculated by dividing the `composition` column by the `index_value` column rounded to 2 decimal places.
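
As a worked example, the first row of `interest_metrics` shown in the introduction has `composition` = 11.89 and `index_value` = 6.19. The arithmetic can be sketched with literal values:

```sql
-- Worked example with the literal values from the first interest_metrics row.
SELECT ROUND((11.89 / 6.19)::numeric, 2) AS avg_composition;
```

This evaluates to 1.92, meaning that on average about 1.92% of a Fresh Segments client's customer list interacted with that interest in July 2018.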

**4.1. What is the top 10 interests by the average composition for each month?**

**Query #21**

    WITH t AS
    (SELECT month_year,
    	    interest_id,
    	    ROUND((composition/index_value)::numeric, 2) AS avg_composition,
            RANK() OVER (PARTITION BY month_year ORDER BY ROUND((composition/index_value)::numeric, 2) DESC)
     FROM interest_removed)
     
    SELECT *
    FROM t
    WHERE rank <= 10
    ORDER BY month_year, rank;

| month_year               | interest_id | avg_composition | rank |
| ------------------------ | ----------- | --------------- | ---- |
| 2018-07-01T00:00:00.000Z | 6324        | 7.36            | 1    |
| 2018-07-01T00:00:00.000Z | 6284        | 6.94            | 2    |
| 2018-07-01T00:00:00.000Z | 4898        | 6.78            | 3    |
| 2018-07-01T00:00:00.000Z | 77          | 6.61            | 4    |
| 2018-07-01T00:00:00.000Z | 39          | 6.51            | 5    |
| 2018-07-01T00:00:00.000Z | 18619       | 6.10            | 6    |
| 2018-07-01T00:00:00.000Z | 6208        | 5.72            | 7    |
| 2018-07-01T00:00:00.000Z | 21060       | 4.85            | 8    |
| 2018-07-01T00:00:00.000Z | 21057       | 4.80            | 9    |
| 2018-07-01T00:00:00.000Z | 82          | 4.71            | 10   |
| 2018-08-01T00:00:00.000Z | 6324        | 7.21            | 1    |
| 2018-08-01T00:00:00.000Z | 6284        | 6.62            | 2    |
| 2018-08-01T00:00:00.000Z | 77          | 6.53            | 3    |
| 2018-08-01T00:00:00.000Z | 39          | 6.30            | 4    |
| 2018-08-01T00:00:00.000Z | 4898        | 6.28            | 5    |
| 2018-08-01T00:00:00.000Z | 21057       | 5.70            | 6    |
| 2018-08-01T00:00:00.000Z | 18619       | 5.68            | 7    |
| 2018-08-01T00:00:00.000Z | 6208        | 5.58            | 8    |
| 2018-08-01T00:00:00.000Z | 7541        | 4.83            | 9    |
| 2018-08-01T00:00:00.000Z | 5969        | 4.72            | 10   |
| 2018-09-01T00:00:00.000Z | 21057       | 8.26            | 1    |
| 2018-09-01T00:00:00.000Z | 21245       | 7.60            | 2    |
| 2018-09-01T00:00:00.000Z | 7541        | 7.27            | 3    |
| 2018-09-01T00:00:00.000Z | 5969        | 7.04            | 4    |
| 2018-09-01T00:00:00.000Z | 18783       | 6.70            | 5    |
| 2018-09-01T00:00:00.000Z | 10981       | 6.59            | 6    |
| 2018-09-01T00:00:00.000Z | 34          | 6.53            | 7    |
| 2018-09-01T00:00:00.000Z | 10977       | 6.47            | 8    |
| 2018-09-01T00:00:00.000Z | 13497       | 6.25            | 9    |
| 2018-09-01T00:00:00.000Z | 6065        | 6.24            | 10   |
| 2018-10-01T00:00:00.000Z | 21057       | 9.14            | 1    |
| 2018-10-01T00:00:00.000Z | 7541        | 7.10            | 2    |
| 2018-10-01T00:00:00.000Z | 21245       | 7.02            | 3    |
| 2018-10-01T00:00:00.000Z | 18783       | 7.02            | 3    |
| 2018-10-01T00:00:00.000Z | 5969        | 6.94            | 5    |
| 2018-10-01T00:00:00.000Z | 10981       | 6.91            | 6    |
| 2018-10-01T00:00:00.000Z | 34          | 6.78            | 7    |
| 2018-10-01T00:00:00.000Z | 10977       | 6.72            | 8    |
| 2018-10-01T00:00:00.000Z | 12133       | 6.53            | 9    |
| 2018-10-01T00:00:00.000Z | 6065        | 6.50            | 10   |
| 2018-11-01T00:00:00.000Z | 21057       | 8.28            | 1    |
| 2018-11-01T00:00:00.000Z | 21245       | 7.09            | 2    |
| 2018-11-01T00:00:00.000Z | 6065        | 7.05            | 3    |
| 2018-11-01T00:00:00.000Z | 7541        | 6.69            | 4    |
| 2018-11-01T00:00:00.000Z | 18783       | 6.65            | 5    |
| 2018-11-01T00:00:00.000Z | 5969        | 6.54            | 6    |
| 2018-11-01T00:00:00.000Z | 10981       | 6.31            | 7    |
| 2018-11-01T00:00:00.000Z | 10977       | 6.08            | 8    |
| 2018-11-01T00:00:00.000Z | 34          | 5.95            | 9    |
| 2018-11-01T00:00:00.000Z | 13497       | 5.59            | 10   |
| 2018-12-01T00:00:00.000Z | 21057       | 8.31            | 1    |
| 2018-12-01T00:00:00.000Z | 18783       | 6.96            | 2    |
| 2018-12-01T00:00:00.000Z | 7541        | 6.68            | 3    |
| 2018-12-01T00:00:00.000Z | 5969        | 6.63            | 4    |
| 2018-12-01T00:00:00.000Z | 21245       | 6.58            | 5    |
| 2018-12-01T00:00:00.000Z | 6065        | 6.55            | 6    |
| 2018-12-01T00:00:00.000Z | 10981       | 6.48            | 7    |
| 2018-12-01T00:00:00.000Z | 34          | 6.38            | 8    |
| 2018-12-01T00:00:00.000Z | 10977       | 6.09            | 9    |
| 2018-12-01T00:00:00.000Z | 21237       | 5.86            | 10   |
| 2019-01-01T00:00:00.000Z | 21057       | 7.66            | 1    |
| 2019-01-01T00:00:00.000Z | 6065        | 7.05            | 2    |
| 2019-01-01T00:00:00.000Z | 21245       | 6.67            | 3    |
| 2019-01-01T00:00:00.000Z | 5969        | 6.46            | 4    |
| 2019-01-01T00:00:00.000Z | 18783       | 6.46            | 4    |
| 2019-01-01T00:00:00.000Z | 7541        | 6.44            | 6    |
| 2019-01-01T00:00:00.000Z | 10981       | 6.16            | 7    |
| 2019-01-01T00:00:00.000Z | 34          | 5.96            | 8    |
| 2019-01-01T00:00:00.000Z | 10977       | 5.65            | 9    |
| 2019-01-01T00:00:00.000Z | 21237       | 5.48            | 10   |
| 2019-01-01T00:00:00.000Z | 15878       | 5.48            | 10   |
| 2019-02-01T00:00:00.000Z | 21057       | 7.66            | 1    |
| 2019-02-01T00:00:00.000Z | 18783       | 6.84            | 2    |
| 2019-02-01T00:00:00.000Z | 5969        | 6.76            | 3    |
| 2019-02-01T00:00:00.000Z | 7541        | 6.65            | 4    |
| 2019-02-01T00:00:00.000Z | 6065        | 6.58            | 5    |
| 2019-02-01T00:00:00.000Z | 10981       | 6.56            | 6    |
| 2019-02-01T00:00:00.000Z | 34          | 6.29            | 7    |
| 2019-02-01T00:00:00.000Z | 21245       | 6.24            | 8    |
| 2019-02-01T00:00:00.000Z | 19620       | 6.23            | 9    |
| 2019-02-01T00:00:00.000Z | 10977       | 5.98            | 10   |
| 2019-03-01T00:00:00.000Z | 7541        | 6.54            | 1    |
| 2019-03-01T00:00:00.000Z | 18783       | 6.52            | 2    |
| 2019-03-01T00:00:00.000Z | 5969        | 6.47            | 3    |
| 2019-03-01T00:00:00.000Z | 6065        | 6.40            | 4    |
| 2019-03-01T00:00:00.000Z | 21245       | 6.21            | 5    |
| 2019-03-01T00:00:00.000Z | 10981       | 6.21            | 5    |
| 2019-03-01T00:00:00.000Z | 19620       | 6.06            | 7    |
| 2019-03-01T00:00:00.000Z | 34          | 6.01            | 8    |
| 2019-03-01T00:00:00.000Z | 15878       | 5.65            | 9    |
| 2019-03-01T00:00:00.000Z | 10977       | 5.61            | 10   |
| 2019-03-01T00:00:00.000Z | 13497       | 5.61            | 10   |
| 2019-04-01T00:00:00.000Z | 6065        | 6.28            | 1    |
| 2019-04-01T00:00:00.000Z | 7541        | 6.21            | 2    |
| 2019-04-01T00:00:00.000Z | 5969        | 6.05            | 3    |
| 2019-04-01T00:00:00.000Z | 21245       | 6.02            | 4    |
| 2019-04-01T00:00:00.000Z | 18783       | 6.01            | 5    |
| 2019-04-01T00:00:00.000Z | 10981       | 5.65            | 6    |
| 2019-04-01T00:00:00.000Z | 19620       | 5.52            | 7    |
| 2019-04-01T00:00:00.000Z | 34          | 5.39            | 8    |
| 2019-04-01T00:00:00.000Z | 15878       | 5.30            | 9    |
| 2019-04-01T00:00:00.000Z | 13497       | 5.07            | 10   |
| 2019-05-01T00:00:00.000Z | 21245       | 4.41            | 1    |
| 2019-05-01T00:00:00.000Z | 15878       | 4.08            | 2    |
| 2019-05-01T00:00:00.000Z | 6065        | 3.92            | 3    |
| 2019-05-01T00:00:00.000Z | 19620       | 3.55            | 4    |
| 2019-05-01T00:00:00.000Z | 7541        | 3.34            | 5    |
| 2019-05-01T00:00:00.000Z | 2           | 3.29            | 6    |
| 2019-05-01T00:00:00.000Z | 5969        | 3.25            | 7    |
| 2019-05-01T00:00:00.000Z | 15884       | 3.19            | 8    |
| 2019-05-01T00:00:00.000Z | 10981       | 3.19            | 8    |
| 2019-05-01T00:00:00.000Z | 18783       | 3.15            | 10   |
| 2019-06-01T00:00:00.000Z | 6324        | 2.77            | 1    |
| 2019-06-01T00:00:00.000Z | 6284        | 2.55            | 2    |
| 2019-06-01T00:00:00.000Z | 4898        | 2.55            | 2    |
| 2019-06-01T00:00:00.000Z | 18619       | 2.52            | 4    |
| 2019-06-01T00:00:00.000Z | 77          | 2.46            | 5    |
| 2019-06-01T00:00:00.000Z | 39          | 2.39            | 6    |
| 2019-06-01T00:00:00.000Z | 6253        | 2.35            | 7    |
| 2019-06-01T00:00:00.000Z | 6208        | 2.27            | 8    |
| 2019-06-01T00:00:00.000Z | 7535        | 2.21            | 9    |
| 2019-06-01T00:00:00.000Z | 107         | 2.20            | 10   |
| 2019-07-01T00:00:00.000Z | 6324        | 2.82            | 1    |
| 2019-07-01T00:00:00.000Z | 77          | 2.81            | 2    |
| 2019-07-01T00:00:00.000Z | 6284        | 2.79            | 3    |
| 2019-07-01T00:00:00.000Z | 39          | 2.79            | 3    |
| 2019-07-01T00:00:00.000Z | 18619       | 2.78            | 5    |
| 2019-07-01T00:00:00.000Z | 4898        | 2.78            | 5    |
| 2019-07-01T00:00:00.000Z | 6253        | 2.77            | 7    |
| 2019-07-01T00:00:00.000Z | 7535        | 2.73            | 8    |
| 2019-07-01T00:00:00.000Z | 6208        | 2.72            | 9    |
| 2019-07-01T00:00:00.000Z | 7536        | 2.66            | 10   |
| 2019-08-01T00:00:00.000Z | 4898        | 2.73            | 1    |
| 2019-08-01T00:00:00.000Z | 6284        | 2.72            | 2    |
| 2019-08-01T00:00:00.000Z | 6324        | 2.70            | 3    |
| 2019-08-01T00:00:00.000Z | 18619       | 2.68            | 4    |
| 2019-08-01T00:00:00.000Z | 6065        | 2.66            | 5    |
| 2019-08-01T00:00:00.000Z | 39          | 2.59            | 6    |
| 2019-08-01T00:00:00.000Z | 77          | 2.59            | 6    |
| 2019-08-01T00:00:00.000Z | 4931        | 2.56            | 8    |
| 2019-08-01T00:00:00.000Z | 6253        | 2.55            | 9    |
| 2019-08-01T00:00:00.000Z | 6208        | 2.53            | 10   |

---

**4.2. For all of these top 10 interests - which interest appears the most often?**

**Query #22**

    -- Rank interests within each month by average composition (composition / index_value)
    WITH t AS
    (SELECT month_year,
            interest_id,
            ROUND((composition/index_value)::numeric, 2) AS avg_composition,
            RANK() OVER (PARTITION BY month_year ORDER BY ROUND((composition/index_value)::numeric, 2) DESC) AS avg_composition_rank
     FROM interest_removed),

    -- Count how many monthly top 10 lists each interest appears in
    temp AS
    (SELECT interest_id,
            COUNT(*) AS interest_cnt,
            RANK() OVER (ORDER BY COUNT(*) DESC) AS interest_cnt_rank
     FROM t
     WHERE avg_composition_rank <= 10
     GROUP BY 1)

    -- Keep every interest tied for the highest number of appearances
    SELECT interest_id,
           interest_cnt
    FROM temp
    WHERE interest_cnt_rank = 1;

| interest_id | interest_cnt |
| ----------- | ------------ |
| 7541        | 10           |
| 6065        | 10           |
| 5969        | 10           |

---

**4.3. What is the average of the average composition for the top 10 interests for each month?**

**Query #23**

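    -- Rank interests within each month by average composition (composition / index_value)
    WITH t AS
    (SELECT month_year,
            interest_id,
            ROUND((composition/index_value)::numeric, 2) AS avg_composition,
            RANK() OVER (PARTITION BY month_year ORDER BY ROUND((composition/index_value)::numeric, 2) DESC) AS avg_composition_rank
     FROM interest_removed)

    -- Average the top 10 values per month (RANK keeps ties, so a month can contribute more than 10 rows)
    SELECT month_year,
           ROUND(AVG(avg_composition), 2) AS avg_of_avg_composition
    FROM t
    WHERE avg_composition_rank <= 10
    GROUP BY 1
    ORDER BY 1;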

| month_year               | avg_of_avg_composition |
| ------------------------ | ---------------------- |
| 2018-07-01T00:00:00.000Z | 6.04                   |
| 2018-08-01T00:00:00.000Z | 5.95                   |
| 2018-09-01T00:00:00.000Z | 6.90                   |
| 2018-10-01T00:00:00.000Z | 7.07                   |
| 2018-11-01T00:00:00.000Z | 6.62                   |
| 2018-12-01T00:00:00.000Z | 6.65                   |
| 2019-01-01T00:00:00.000Z | 6.32                   |
| 2019-02-01T00:00:00.000Z | 6.58                   |
| 2019-03-01T00:00:00.000Z | 6.12                   |
| 2019-04-01T00:00:00.000Z | 5.75                   |
| 2019-05-01T00:00:00.000Z | 3.54                   |
| 2019-06-01T00:00:00.000Z | 2.43                   |
| 2019-07-01T00:00:00.000Z | 2.77                   |
| 2019-08-01T00:00:00.000Z | 2.63                   |
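
Because the query filters on `RANK()`, interests tied at rank 10 are all kept, so a month can average more than 10 rows. The sketch below illustrates this with the eleven March 2019 rows copied from the 4.1 output; the inline `VALUES` list and the `ROW_NUMBER()` comparison (with `interest_id` as an arbitrary tie-breaker) are assumptions added for illustration, not part of the original solution.

```sql
-- March 2019 rows from the 4.1 output (ranks 1 to 10, with two interests tied at rank 10)
WITH march AS
(SELECT *
 FROM (VALUES (7541, 6.54), (18783, 6.52), (5969, 6.47), (6065, 6.40),
              (21245, 6.21), (10981, 6.21), (19620, 6.06), (34, 6.01),
              (15878, 5.65), (10977, 5.61), (13497, 5.61)) AS v(interest_id, avg_composition)),

ranked AS
(SELECT interest_id,
        avg_composition,
        -- RANK gives tied values the same rank, so both 5.61 rows get rank 10
        RANK() OVER (ORDER BY avg_composition DESC) AS rnk,
        -- ROW_NUMBER breaks ties arbitrarily (here by interest_id) and keeps exactly 10 rows
        ROW_NUMBER() OVER (ORDER BY avg_composition DESC, interest_id) AS rn
 FROM march)

SELECT ROUND(AVG(avg_composition) FILTER (WHERE rnk <= 10), 2) AS avg_with_rank,       -- 11 rows averaged
       ROUND(AVG(avg_composition) FILTER (WHERE rn <= 10), 2)  AS avg_with_row_number  -- 10 rows averaged
FROM ranked;
```

With these rows, `avg_with_rank` is 6.12, which matches the March 2019 figure above, while the `ROW_NUMBER()` variant drops one of the tied 5.61 rows and gives 6.17. Which behaviour is preferable depends on how "top 10" should treat ties.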

---

**4.4. What is the 3 month rolling average of the max average composition value from September 2018 to August 2019 and include the previous top ranking interests in the same output shown below.**

**Query #24**

    -- Store the top 10 interests per month by average composition for reuse in the next query
    CREATE TEMP TABLE month_interest_composition AS
    (SELECT *
     FROM (SELECT month_year,
                  interest_id,
                  ROUND((composition/index_value)::numeric, 2) AS avg_composition,
                  RANK() OVER (PARTITION BY month_year ORDER BY ROUND((composition/index_value)::numeric, 2) DESC) AS avg_composition_rank
           FROM interest_removed) t
     WHERE avg_composition_rank <= 10);

There are no results to be displayed.

**Query #25**

    -- Highest average composition in each month
    WITH t AS
    (SELECT month_year,
            MAX(avg_composition) AS max_index_composition
     FROM month_interest_composition
     GROUP BY 1)

    SELECT *
    FROM (SELECT t.month_year,
                 m.interest_id,
                 t.max_index_composition,
                 ROUND(AVG(t.max_index_composition) OVER (ORDER BY t.month_year ROWS BETWEEN 2 PRECEDING AND CURRENT ROW), 2) AS three_month_moving_avg,
                 CONCAT(LAG(m.interest_id) OVER (ORDER BY t.month_year), ': ', LAG(t.max_index_composition) OVER (ORDER BY t.month_year)) AS one_month_ago,
                 CONCAT(LAG(m.interest_id, 2) OVER (ORDER BY t.month_year), ': ', LAG(t.max_index_composition, 2) OVER (ORDER BY t.month_year)) AS two_months_ago
          FROM t
          -- Join on the month as well as the value: matching on the value alone also pairs a
          -- month's maximum with rows from other months that share the same avg_composition,
          -- which duplicates months and distorts the rolling average and LAG columns
          LEFT JOIN month_interest_composition m
                 ON t.month_year = m.month_year
                AND t.max_index_composition = m.avg_composition) temp
    WHERE month_year BETWEEN '2018-09-01' AND '2019-08-01'
    ORDER BY month_year;

| month_year               | interest_id | max_index_composition | three_month_moving_avg | one_month_ago | two_months_ago |
| ------------------------ | ----------- | --------------------- | ---------------------- | ------------- | -------------- |
| 2018-09-01T00:00:00.000Z | 21057       | 8.26                  | 7.61                   | 6324: 7.21    | 6324: 7.36     |
| 2018-10-01T00:00:00.000Z | 21057       | 9.14                  | 8.20                   | 21057: 8.26   | 6324: 7.21     |
| 2018-11-01T00:00:00.000Z | 21057       | 8.28                  | 8.56                   | 21057: 9.14   | 21057: 8.26    |
| 2018-12-01T00:00:00.000Z | 21057       | 8.31                  | 8.58                   | 21057: 8.28   | 21057: 9.14    |
| 2019-01-01T00:00:00.000Z | 21057       | 7.66                  | 8.08                   | 21057: 8.31   | 21057: 8.28    |
| 2019-02-01T00:00:00.000Z | 21057       | 7.66                  | 7.88                   | 21057: 7.66   | 21057: 8.31    |
| 2019-03-01T00:00:00.000Z | 7541        | 6.54                  | 7.29                   | 21057: 7.66   | 21057: 7.66    |
| 2019-04-01T00:00:00.000Z | 6065        | 6.28                  | 6.83                   | 7541: 6.54    | 21057: 7.66    |
| 2019-05-01T00:00:00.000Z | 21245       | 4.41                  | 5.74                   | 6065: 6.28    | 7541: 6.54     |
| 2019-06-01T00:00:00.000Z | 6324        | 2.77                  | 4.49                   | 21245: 4.41   | 6065: 6.28     |
| 2019-07-01T00:00:00.000Z | 6324        | 2.82                  | 3.33                   | 6324: 2.77    | 21245: 4.41    |
| 2019-08-01T00:00:00.000Z | 4898        | 2.73                  | 2.77                   | 6324: 2.82    | 6324: 2.77     |
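
The query computes the window functions over every month and only then filters to September 2018 onwards. A minimal sketch of why that ordering matters is shown below, using the monthly maxima for July to October 2018 taken from the output above; the inline `VALUES` list, the named window `w`, and the `months_in_window` column are assumptions added for illustration.

```sql
-- Monthly maxima for July to October 2018, taken from the output above
SELECT month_year,
       max_index_composition,
       -- Number of rows that actually fall inside the frame for this month
       COUNT(*) OVER w AS months_in_window,
       ROUND(AVG(max_index_composition) OVER w, 2) AS three_month_moving_avg
FROM (VALUES (DATE '2018-07-01', 7.36),
             (DATE '2018-08-01', 7.21),
             (DATE '2018-09-01', 8.26),
             (DATE '2018-10-01', 9.14)) AS v(month_year, max_index_composition)
-- The frame covers the current row and up to two rows before it
WINDOW w AS (ORDER BY month_year ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
ORDER BY month_year;
```

July and August 2018 have only one and two rows in their frames, so their averages are not true 3 month figures. September 2018 is the first month with a full window and returns 7.61, as in the table above. Filtering out July and August before the window functions run would leave September with a one-row frame and empty `LAG` values.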

---