### Segment Analysis
---
1. Using our filtered dataset (with interests that have fewer than 6 months of data removed), which are the top 10 and bottom 10 interests by their largest composition value in any `month_year`? Only use the maximum composition value for each interest, but keep the corresponding `month_year`.

- Create table `sub_metrics` by removing interests with fewer than 6 months of data.

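```sql
-- T-SQL: copy interest_metrics into sub_metrics, dropping every interest
-- that has fewer than 6 distinct month_year values
DROP TABLE IF EXISTS sub_metrics;

WITH tbl AS (
    -- interests with fewer than 6 months of data
    SELECT
        interest_id,
        COUNT(DISTINCT month_year) AS total_months
    FROM interest_metrics
    WHERE month_year IS NOT NULL
    GROUP BY interest_id
    HAVING COUNT(DISTINCT month_year) < 6
)
SELECT *
INTO sub_metrics
FROM interest_metrics
WHERE interest_id NOT IN (SELECT interest_id FROM tbl);
```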
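
To check the filter logic outside SQL Server, here is a minimal sketch. It uses Python's built-in `sqlite3` module on a made-up three-column `interest_metrics` table. The toy rows are hypothetical, and the SQL is SQLite rather than T-SQL. It keeps interests with at least 6 distinct months by using `IN` with `>= 6` instead of `NOT IN` with `< 6`.

```python
import sqlite3

# Hypothetical toy rows: interest 1 has 6 distinct months, interest 2 has only 2.
rows = [(1, f"2018-{m:02d}-01", 5.0 + m) for m in range(7, 13)]
rows += [(2, "2018-07-01", 3.0), (2, "2018-08-01", 4.0)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE interest_metrics (interest_id INTEGER, month_year TEXT, composition REAL)"
)
conn.executemany("INSERT INTO interest_metrics VALUES (?, ?, ?)", rows)

# SQLite version of the filter: keep only interests with at least
# 6 distinct non-NULL month_year values.
conn.execute("""
    CREATE TABLE sub_metrics AS
    SELECT *
    FROM interest_metrics
    WHERE interest_id IN (
        SELECT interest_id
        FROM interest_metrics
        WHERE month_year IS NOT NULL
        GROUP BY interest_id
        HAVING COUNT(DISTINCT month_year) >= 6
    )
""")

kept = conn.execute("SELECT DISTINCT interest_id FROM sub_metrics").fetchall()
print(kept)  # [(1,)]
```

The `IN` form is not row-for-row identical to the T-SQL query. Interests whose rows all have a NULL `month_year` are dropped here but kept by the `NOT IN` version. The `IN` form also avoids a `NOT IN` pitfall: if the subquery returns a single NULL, the predicate matches no rows.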


- Top 10 rows by `composition` across all `month_year` values (one row per interest and month, so an interest can appear more than once)

```sql
-- T-SQL: 10 highest composition rows, with names from interest_map
SELECT TOP 10
    month_year,
    interest_name,
    composition
FROM sub_metrics s
JOIN interest_map i ON i.id = s.interest_id
ORDER BY composition DESC;
```

| month_year | interest_name | composition |
|---|---|---|
| 2018-12-01 | Work Comes First Travelers | 21.2 |
| 2018-10-01 | Work Comes First Travelers | 20.28 |
| 2018-11-01 | Work Comes First Travelers | 19.45 |
| 2019-01-01 | Work Comes First Travelers | 18.99 |
| 2018-07-01 | Gym Equipment Owners | 18.82 |
| 2019-02-01 | Work Comes First Travelers | 18.39 |
| 2018-09-01 | Work Comes First Travelers | 18.18 |
| 2018-07-01 | Furniture Shoppers | 17.44 |
| 2018-07-01 | Luxury Retail Shoppers | 17.19 |
| 2018-10-01 | Luxury Boutique Hotel Researchers | 15.15 |


- Bottom 10 rows by `composition` across all `month_year` values (smallest values first)

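```sql
-- T-SQL: 10 lowest composition rows; tied values (e.g. 1.52) come back
-- in no guaranteed order
SELECT TOP 10
    month_year,
    interest_name,
    composition
FROM sub_metrics s
JOIN interest_map i ON i.id = s.interest_id
ORDER BY composition;
```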

| month_year | interest_name | composition |
|---|---|---|
| 2019-05-01 | Mowing Equipment Shoppers | 1.51 |
| 2019-05-01 | Beer Aficionados | 1.52 |
| 2019-05-01 | Gastrointestinal Researchers | 1.52 |
| 2019-04-01 | United Nations Donors | 1.52 |
| 2019-05-01 | Philadelphia 76ers Fans | 1.52 |
| 2019-06-01 | New York Giants Fans | 1.52 |
| 2019-06-01 | Disney Fans | 1.52 |
| 2019-06-01 | Online Directory Searchers | 1.53 |
| 2019-05-01 | Crochet Enthusiasts | 1.53 |
| 2019-05-01 | LED Lighting Shoppers | 1.53 |
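
The question asks for each interest's maximum composition together with the `month_year` in which it occurred. The two queries above rank individual rows instead, so one interest (Work Comes First Travelers) fills six of the top 10 slots. A minimal sketch of the per-interest approach follows. It assumes SQLite 3.25 or later for window functions and runs through Python's `sqlite3` on a hypothetical three-interest table named `metrics`. The same `ROW_NUMBER() OVER (PARTITION BY ...)` pattern works in SQL Server, with `TOP n` in place of `LIMIT`.

```python
import sqlite3

# Hypothetical toy data: (interest_name, month_year, composition).
rows = [
    ("Travelers", "2018-10-01", 20.3), ("Travelers", "2018-12-01", 21.2),
    ("Gym Owners", "2018-07-01", 18.8), ("Gym Owners", "2018-08-01", 12.0),
    ("Beer Fans", "2019-05-01", 1.5), ("Beer Fans", "2018-09-01", 2.9),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (interest_name TEXT, month_year TEXT, composition REAL)")
conn.executemany("INSERT INTO metrics VALUES (?, ?, ?)", rows)

# One row per interest: its maximum composition plus the month it occurred in.
# ROW_NUMBER keeps exactly one row per interest; month_year breaks ties.
# {direction} is only ever filled with the literals "ASC" or "DESC".
PER_INTEREST_MAX = """
    WITH ranked AS (
        SELECT interest_name, month_year, composition,
               ROW_NUMBER() OVER (
                   PARTITION BY interest_name
                   ORDER BY composition DESC, month_year
               ) AS rn
        FROM metrics
    )
    SELECT month_year, interest_name, composition
    FROM ranked
    WHERE rn = 1
    ORDER BY composition {direction}, interest_name
    LIMIT ?
"""

top = conn.execute(PER_INTEREST_MAX.format(direction="DESC"), (2,)).fetchall()
bottom = conn.execute(PER_INTEREST_MAX.format(direction="ASC"), (2,)).fetchall()
print(top)     # [('2018-12-01', 'Travelers', 21.2), ('2018-07-01', 'Gym Owners', 18.8)]
print(bottom)  # [('2018-09-01', 'Beer Fans', 2.9), ('2018-07-01', 'Gym Owners', 18.8)]
```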


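2. Which 5 interests had the lowest average ranking value?

- A `ranking` of 1 is the best position, so "lowest" is read here as the lowest-placed interests. Those are the interests with the largest average `ranking`, which is why the query sorts in descending order.
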
```sql
-- T-SQL: if ranking is an INT column, AVG returns an INT (fraction truncated)
SELECT TOP 5
    interest_name,
    AVG(ranking) AS avg_ranking_value
FROM sub_metrics s
JOIN interest_map i ON i.id = s.interest_id
GROUP BY interest_name
ORDER BY avg_ranking_value DESC;
```

| interest_name | avg_ranking_value |
|---|---|
| League of Legends Video Game Fans | 1037 |
| Computer Processor and Data Center Decision Makers | 974 |
| Astrology Enthusiasts | 968 |
| Budget Mobile Phone Researchers | 961 |
| Medieval History Enthusiasts | 961 |
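
The sketch below makes the sort direction concrete on hypothetical rankings, again using Python's `sqlite3` and a made-up `metrics` table. Sorting the average `ranking` in descending order surfaces the worst-placed interests. The `CAST` to `REAL` plays the role that `AVG(CAST(ranking AS FLOAT))` would play in SQL Server, keeping fractional averages. SQLite's own `AVG` already returns a floating-point value.

```python
import sqlite3

# Hypothetical toy rankings: 1 is the best position, larger numbers are worse.
rows = [("A", 1), ("A", 3), ("B", 900), ("B", 1001), ("C", 50), ("C", 60)]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (interest_name TEXT, ranking INTEGER)")
conn.executemany("INSERT INTO metrics VALUES (?, ?)", rows)

# Largest average ranking first = interests placed furthest down on average.
worst = conn.execute("""
    SELECT interest_name, AVG(CAST(ranking AS REAL)) AS avg_ranking
    FROM metrics
    GROUP BY interest_name
    ORDER BY avg_ranking DESC
    LIMIT 2
""").fetchall()
print(worst)  # [('B', 950.5), ('C', 55.0)]
```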


3. Which 5 interests had the largest standard deviation in their `percentile_ranking` values?

```sql
-- T-SQL: STDEV is the sample standard deviation (STDEVP is the population version)
SELECT TOP 5
    interest_name,
    ROUND(STDEV(percentile_ranking), 2) AS std_percentile_ranking
FROM sub_metrics s
JOIN interest_map i ON i.id = s.interest_id
GROUP BY interest_name
ORDER BY std_percentile_ranking DESC;
```

| interest_name | std_percentile_ranking |
|---|---|
| Techies | 30.18 |
| Entertainment Industry Decision Makers | 28.97 |
| Oregon Trip Planners | 28.32 |
| Personalized Gift Shoppers | 26.24 |
| Tampa and St Petersburg Trip Planners | 25.61 |
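
SQLite has no built-in standard deviation aggregate, so this sketch computes the per-interest spread in plain Python on hypothetical `(interest_name, percentile_ranking)` pairs. `statistics.stdev` uses the sample (n - 1) formula, which is what T-SQL `STDEV` computes.

```python
import statistics
from collections import defaultdict

# Hypothetical toy (interest_name, percentile_ranking) rows.
rows = [
    ("Steady", 50.0), ("Steady", 52.0), ("Steady", 51.0),
    ("Volatile", 90.0), ("Volatile", 10.0), ("Volatile", 50.0),
]

# Group percentile rankings by interest.
by_interest = defaultdict(list)
for name, pct in rows:
    by_interest[name].append(pct)

# Sample standard deviation per interest (statistics.pstdev would match STDEVP),
# then the 5 largest, rounded to 2 decimals like the query above.
std = {name: round(statistics.stdev(vals), 2) for name, vals in by_interest.items()}
top_std = sorted(std.items(), key=lambda kv: kv[1], reverse=True)[:5]
print(top_std)  # [('Volatile', 40.0), ('Steady', 1.0)]
```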


4. For the 5 interests found in the previous question, what were the minimum and maximum `percentile_ranking` values for each interest and the corresponding `month_year` values? Can you describe what is happening for these 5 interests?

```sql
-- T-SQL: min and max percentile_ranking (with month) for the question 3 interests
WITH tbl AS (
    -- same top 5 as question 3, grouped by interest_id
    SELECT TOP 5
        interest_id,
        ROUND(STDEV(percentile_ranking), 2) AS std_percentile_ranking
    FROM sub_metrics
    GROUP BY interest_id
    ORDER BY std_percentile_ranking DESC
),
tbl2 AS (
    -- rank each interest's months by percentile_ranking in both directions
    SELECT
        interest_name,
        month_year,
        percentile_ranking,
        RANK() OVER (PARTITION BY interest_name ORDER BY percentile_ranking DESC) AS rank_max,
        RANK() OVER (PARTITION BY interest_name ORDER BY percentile_ranking) AS rank_min
    FROM tbl
    JOIN interest_metrics me ON me.interest_id = tbl.interest_id
    JOIN interest_map ma ON ma.id = tbl.interest_id
)
-- rank 1 in either direction is the maximum or minimum (ties return extra rows)
SELECT
    interest_name,
    month_year,
    percentile_ranking
FROM tbl2
WHERE rank_max = 1 OR rank_min = 1;
```

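| interest_name | month_year | percentile_ranking |
|---|---|---|
| Entertainment Industry Decision Makers | 2019-08-01 | 11.23 |
| Entertainment Industry Decision Makers | 2018-07-01 | 86.15 |
| Oregon Trip Planners | 2019-07-01 | 2.2 |
| Oregon Trip Planners | 2018-11-01 | 82.44 |
| Personalized Gift Shoppers | 2019-06-01 | 5.7 |
| Personalized Gift Shoppers | 2019-03-01 | 73.15 |
| Tampa and St Petersburg Trip Planners | 2019-03-01 | 4.84 |
| Tampa and St Petersburg Trip Planners | 2018-07-01 | 75.03 |
| Techies | 2019-08-01 | 7.92 |
| Techies | 2018-07-01 | 86.69 |

- For all five interests, the maximum `percentile_ranking` came before the minimum. The peaks fall between 2018-07 and 2019-03 and the lows between 2019-03 and 2019-08. Each interest lost most of its percentile ranking over the period; for example, Techies fell from 86.69 in 2018-07 to 7.92 in 2019-08.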
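
The two-`RANK` pattern in the query above can be tried on toy data with the sketch below. It uses Python's `sqlite3`, assumes SQLite 3.25 or later for window functions, and runs on hypothetical rows loosely modeled on the results. The extra `which` column is an illustrative addition that labels each row as the maximum or minimum. Ordering by `month_year` makes it easy to see whether the peak came before the low.

```python
import sqlite3

# Hypothetical toy rows: (interest_name, month_year, percentile_ranking).
rows = [
    ("Techies", "2018-07-01", 86.7), ("Techies", "2018-12-01", 40.0),
    ("Techies", "2019-08-01", 7.9),
    ("Oregon", "2018-11-01", 82.4), ("Oregon", "2019-07-01", 2.2),
]
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metrics (interest_name TEXT, month_year TEXT, percentile_ranking REAL)"
)
conn.executemany("INSERT INTO metrics VALUES (?, ?, ?)", rows)

# Rank each interest's months in both directions and keep the rows ranked 1
# either way: the maximum and the minimum (ties would return extra rows).
extremes = conn.execute("""
    WITH ranked AS (
        SELECT interest_name, month_year, percentile_ranking,
               RANK() OVER (PARTITION BY interest_name
                            ORDER BY percentile_ranking DESC) AS rank_max,
               RANK() OVER (PARTITION BY interest_name
                            ORDER BY percentile_ranking) AS rank_min
        FROM metrics
    )
    SELECT interest_name, month_year, percentile_ranking,
           CASE WHEN rank_max = 1 THEN 'max' ELSE 'min' END AS which
    FROM ranked
    WHERE rank_max = 1 OR rank_min = 1
    ORDER BY interest_name, month_year
""").fetchall()

for row in extremes:
    print(row)
```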


5. How would you describe our customers in this segment based on their composition and ranking values? What sort of products or services should we show to these customers, and what should we avoid?

- Based on composition, these customers are most interested in travel (Work Comes First Travelers, Luxury Boutique Hotel Researchers), fitness equipment (Gym Equipment Owners), and furniture and luxury retail (Furniture Shoppers, Luxury Retail Shoppers). We should increase advertising for these types of products and services.
- In contrast, interests with the worst average ranking, such as League of Legends Video Game Fans and Astrology Enthusiasts, should be excluded from promotions.
