# Week 2 Fact Data Modeling
The homework this week uses the `devices` and `events` datasets.

Construct the following eight queries:

- A query to deduplicate `game_details` from Day 1 so that there are no duplicates

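```sql
-- deduplicate_game_details.sql
-- Keep one row per (game_id, team_id, player_id). Without an ORDER BY the kept row is
-- arbitrary, which is fine when the duplicates are identical copies.
WITH game_details_deduped AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY game_id, team_id, player_id) AS row_num
    FROM game_details
)
SELECT *
FROM game_details_deduped
WHERE row_num = 1;
```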
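You can sanity-check the dedup pattern without the real `game_details` table. The minimal sketch below runs the same `ROW_NUMBER()` filter against an in-memory SQLite table. The four-column table and its rows are made up for illustration. SQLite 3.25 or newer is assumed, since that release added window functions.

```python
import sqlite3

# Hypothetical miniature of game_details: the (1, 10, 100) row appears twice.
rows = [
    (1, 10, 100, 25),
    (1, 10, 100, 25),
    (1, 10, 101, 12),
    (2, 20, 100, 30),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE game_details (game_id, team_id, player_id, pts)")
conn.executemany("INSERT INTO game_details VALUES (?, ?, ?, ?)", rows)

# Number the rows within each (game_id, team_id, player_id) key and keep row 1.
deduped = conn.execute("""
    WITH game_details_deduped AS (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY game_id, team_id, player_id) AS row_num
        FROM game_details
    )
    SELECT game_id, team_id, player_id, pts
    FROM game_details_deduped
    WHERE row_num = 1
    ORDER BY game_id, team_id, player_id
""").fetchall()

print(deduped)  # [(1, 10, 100, 25), (1, 10, 101, 12), (2, 20, 100, 30)]
```

Filtering on `row_num > 1` instead returns only the surplus copies. That is handy for inspecting duplicates, but it does not produce the deduplicated table.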

- A DDL for a `user_devices_cumulated` table that has:
  - a `device_activity_datelist` that tracks a user's active days by `browser_type`
  - data type here should look similar to `MAP<STRING, ARRAY[DATE]>`
    - or you could have `browser_type` as a column with multiple rows for each user (either way works, just be consistent!)


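```sql
-- user_devices_cumulated.sql
-- browser_type is a column (one row per user and browser per snapshot date)
-- rather than a MAP key; the assignment allows either layout.
CREATE TABLE user_devices_cumulated (
    user_id TEXT,
    browser_type TEXT,
    device_activity_datelist DATE[],  -- dates the user was active on this browser
    snapshot_date DATE,               -- date this snapshot row represents
    PRIMARY KEY (user_id, browser_type, snapshot_date)
);
```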
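The two layouts the assignment allows carry the same information. Here is a small Python sketch that converts a `MAP<STRING, ARRAY[DATE]>`-shaped value into one row per `(user_id, browser_type)` and back. The user ID, browser names and dates are invented for illustration.

```python
from datetime import date

# Hypothetical MAP-style value for one user: browser_type -> active dates.
user_id = "u1"
device_activity_datelist = {
    "Chrome": [date(2023, 1, 1), date(2023, 1, 3)],
    "Safari": [date(2023, 1, 2)],
}

# Row layout: one (user_id, browser_type, dates) tuple per map entry.
rows = [
    (user_id, browser_type, dates)
    for browser_type, dates in sorted(device_activity_datelist.items())
]

# Folding the rows back into a map recovers the original value.
rebuilt = {browser_type: dates for _, browser_type, dates in rows}

for row in rows:
    print(row)
```

PostgreSQL has no `MAP` type, so the row layout is the natural fit there. In Spark or Hive, either layout works.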

- A cumulative query to generate `device_activity_datelist` from `events`

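```sql
-- cumulative_device_activity_datelist.sql
-- One daily step (example dates); assumes events.device_id joins devices.device_id.
INSERT INTO user_devices_cumulated (user_id, browser_type, device_activity_datelist, snapshot_date)
SELECT COALESCE(t.user_id, y.user_id), COALESCE(t.browser_type, y.browser_type),
       COALESCE(y.device_activity_datelist, ARRAY[]::DATE[])
         || CASE WHEN t.user_id IS NULL THEN ARRAY[]::DATE[] ELSE ARRAY[DATE '2023-01-31'] END,
       DATE '2023-01-31'
FROM (SELECT * FROM user_devices_cumulated WHERE snapshot_date = DATE '2023-01-30') y
FULL OUTER JOIN (SELECT DISTINCT CAST(e.user_id AS TEXT) AS user_id, d.browser_type
                 FROM events e JOIN devices d ON d.device_id = e.device_id
                 WHERE e.event_date = DATE '2023-01-31' AND e.user_id IS NOT NULL) t
  ON t.user_id = y.user_id AND t.browser_type = y.browser_type;
```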
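The cumulative pattern is a full outer join between yesterday's snapshot and today's activity. Keys seen on either side survive, and today's date is appended only for keys that were active today. Below is a plain-Python sketch of one daily step, with dictionaries standing in for tables. The keys and dates are hypothetical.

```python
from datetime import date

Key = tuple[str, str]  # (user_id, browser_type)


def cumulate(yesterday: dict[Key, list[date]],
             active_today: set[Key],
             today: date) -> dict[Key, list[date]]:
    """One daily step of the cumulative table (full outer join semantics)."""
    snapshot = {}
    for key in yesterday.keys() | active_today:  # keys from either side survive
        dates = list(yesterday.get(key, []))     # like COALESCE(yesterday, empty array)
        if key in active_today:
            dates.append(today)                  # append only when active today
        snapshot[key] = dates
    return snapshot


day1 = cumulate({}, {("u1", "Chrome")}, date(2023, 1, 1))
day2 = cumulate(day1, {("u1", "Safari")}, date(2023, 1, 2))
print(day2)
```

After the second step, the Chrome row still carries its Jan 1 date even though the user was inactive on Chrome that day. That carry-forward is what the full outer join provides.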

- A `datelist_int` generation query that converts the `device_activity_datelist` column into a `datelist_int` column

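```sql
-- datelist_int_generation.sql
-- Leftmost of 32 bits = active on snapshot_date; each bit to the right is one day earlier.
SELECT user_id, browser_type,
       CAST(CAST(SUM(DISTINCT CAST(POW(2, 31 - (snapshot_date - active_date)) AS BIGINT)) AS BIGINT)
            AS BIT(32)) AS datelist_int
FROM user_devices_cumulated,
     UNNEST(device_activity_datelist) AS active_date
WHERE snapshot_date = DATE '2023-01-31'  -- example snapshot date
  AND snapshot_date - active_date BETWEEN 0 AND 31
GROUP BY user_id, browser_type;
```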
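A `datelist_int` packs the last 32 days into one integer, with one bit per day. The sketch below assumes that the highest of the 32 bits is the snapshot date and each lower bit is one day earlier. It shows how the bitmask answers a question like "active in the last 7 days?" with a single AND. The dates are illustrative.

```python
from datetime import date

WINDOW = 32  # days encoded per integer


def datelist_to_int(active_dates: list[date], snapshot: date) -> int:
    """Set bit (WINDOW - 1 - days_ago) for each active date inside the window."""
    value = 0
    for d in set(active_dates):  # duplicate dates must not set a bit twice
        days_ago = (snapshot - d).days
        if 0 <= days_ago < WINDOW:
            value |= 1 << (WINDOW - 1 - days_ago)
    return value


snapshot = date(2023, 1, 31)
dates = [date(2023, 1, 31), date(2023, 1, 29), date(2023, 1, 1)]
bits = datelist_to_int(dates, snapshot)

# Mask covering the 7 most recent days (the top 7 of the 32 bits).
last_7_mask = ((1 << 7) - 1) << (WINDOW - 7)

print(format(bits, "032b"))                              # 10100000000000000000000000000010
print("active in last 7 days:", (bits & last_7_mask) != 0)  # True
print("active days in window:", bin(bits).count("1"))       # 3
```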

- A DDL for the `hosts_cumulated` table that has:
  - a `host_activity_datelist` that records the dates on which each host had any activity
  

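```sql
-- hosts_cumulated.sql
CREATE TABLE hosts_cumulated (
    host_id TEXT,
    host_activity_datelist DATE[],  -- every date on which the host had any activity
    snapshot_date DATE,             -- date this snapshot row represents
    PRIMARY KEY (host_id, snapshot_date)
);
```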

- The incremental query to generate `host_activity_datelist`


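```sql
-- incremental_host_activity_datelist.sql
-- Roll yesterday's snapshot forward by one day (example dates).
INSERT INTO hosts_cumulated (host_id, host_activity_datelist, snapshot_date)
SELECT COALESCE(t.host_id, y.host_id),
       COALESCE(y.host_activity_datelist, ARRAY[]::DATE[])
         || CASE WHEN t.host_id IS NULL THEN ARRAY[]::DATE[] ELSE ARRAY[DATE '2023-01-31'] END,
       DATE '2023-01-31'
FROM (SELECT * FROM hosts_cumulated WHERE snapshot_date = DATE '2023-01-30') y
FULL OUTER JOIN (SELECT DISTINCT host_id FROM events
                 WHERE event_date = DATE '2023-01-31') t
  ON t.host_id = y.host_id;
```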
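A useful check for an incremental load: replaying it day by day should reproduce what a one-shot aggregation over all history gives. Below is a Python sketch of that check for `host_activity_datelist`, using a made-up list of `(host_id, event_date)` events.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical events: (host_id, event_date)
events = [
    ("h1", date(2023, 1, 1)),
    ("h1", date(2023, 1, 1)),
    ("h2", date(2023, 1, 2)),
    ("h1", date(2023, 1, 3)),
]


def incremental_step(snapshot: dict[str, list[date]], day: date) -> dict[str, list[date]]:
    """Carry every host forward and append `day` for hosts active on it."""
    active = {host for host, d in events if d == day}
    return {
        host: snapshot.get(host, []) + ([day] if host in active else [])
        for host in snapshot.keys() | active
    }


snapshot: dict[str, list[date]] = {}
day = date(2023, 1, 1)
while day <= date(2023, 1, 3):
    snapshot = incremental_step(snapshot, day)
    day += timedelta(days=1)

# One-shot equivalent: the distinct event dates per host over all history.
full = defaultdict(set)
for host, d in events:
    full[host].add(d)

print(snapshot == {h: sorted(ds) for h, ds in full.items()})  # True
```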

- A monthly, reduced fact table DDL `host_activity_reduced` with:
  - `month`
  - `host`
  - `hit_array` - think COUNT(1)
  - `unique_visitors` array - think COUNT(DISTINCT user_id)

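```sql
-- host_activity_reduced.sql
-- One row per host per month; element i of each array is day i of the month (1-based).
CREATE TABLE host_activity_reduced (
    month DATE,                -- first day of the month
    host TEXT,
    hit_array BIGINT[],        -- daily COUNT(1)
    unique_visitors BIGINT[],  -- daily COUNT(DISTINCT user_id)
    PRIMARY KEY (month, host)
);
```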
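With one array element per day, monthly metrics become reductions over a short array instead of scans over raw events. Below is a Python sketch with a hypothetical row for one host. It assumes element `i` (0-based in Python) holds day `i + 1` of the month.

```python
from datetime import date

# Hypothetical host_activity_reduced row for January 2023 (first 5 days loaded).
row = {
    "month": date(2023, 1, 1),
    "host": "h1",
    "hit_array": [12, 0, 7, 3, 9],
    "unique_visitors": [4, 0, 2, 1, 5],
}

monthly_hits = sum(row["hit_array"])
busiest_day = max(range(len(row["hit_array"])), key=row["hit_array"].__getitem__) + 1

# Daily distinct counts cannot be summed into a monthly distinct count
# (the same visitor may return on several days), so they stay per day.
print(monthly_hits, busiest_day)  # 31 1
```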

- An incremental query that loads `host_activity_reduced`
  - day-by-day

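```sql
-- incremental_host_activity_reduced.sql
-- Load one day (example: 2023-01-03, month 2023-01-01) into its month row, zero-padding missed days.
INSERT INTO host_activity_reduced (month, host, hit_array, unique_visitors)
SELECT DATE '2023-01-01', COALESCE(d.host_id, r.host),
       COALESCE(r.hit_array, ARRAY_FILL(0::BIGINT, ARRAY[DATE '2023-01-03' - DATE '2023-01-01']))
         || COALESCE(d.hits, 0),
       COALESCE(r.unique_visitors, ARRAY_FILL(0::BIGINT, ARRAY[DATE '2023-01-03' - DATE '2023-01-01']))
         || COALESCE(d.visitors, 0)
FROM (SELECT host_id, COUNT(1) AS hits, COUNT(DISTINCT user_id) AS visitors
      FROM events WHERE event_date = DATE '2023-01-03' GROUP BY host_id) d
FULL OUTER JOIN (SELECT * FROM host_activity_reduced WHERE month = DATE '2023-01-01') r
  ON r.host = d.host_id
ON CONFLICT (month, host)
DO UPDATE SET hit_array = EXCLUDED.hit_array, unique_visitors = EXCLUDED.unique_visitors;
```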
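Loading day by day means each run appends exactly one element to every host's arrays. Days on which a host had no traffic are padded with zeros, so array position always matches day of month. Below is a Python sketch of one daily step over in-memory rows. The hosts, users and dates are hypothetical.

```python
from datetime import date

Row = dict[str, list[int]]  # {"hit_array": [...], "unique_visitors": [...]}


def load_day(month_rows: dict[str, Row],
             events: list[tuple[str, str]],
             day: date) -> dict[str, Row]:
    """Append `day`'s metrics to each host's arrays; events are (host, user_id)."""
    hits: dict[str, int] = {}
    visitors: dict[str, set[str]] = {}
    for host, user in events:
        hits[host] = hits.get(host, 0) + 1            # COUNT(1)
        visitors.setdefault(host, set()).add(user)    # COUNT(DISTINCT user_id)

    offset = day.day - 1  # days of the month loaded before `day`
    result = {}
    for host in month_rows.keys() | hits.keys():
        prev = month_rows.get(host, {"hit_array": [0] * offset,
                                     "unique_visitors": [0] * offset})
        result[host] = {
            "hit_array": prev["hit_array"] + [hits.get(host, 0)],
            "unique_visitors": prev["unique_visitors"] + [len(visitors.get(host, ()))],
        }
    return result


rows: dict[str, Row] = {}
rows = load_day(rows, [("h1", "u1"), ("h1", "u1"), ("h1", "u2")], date(2023, 1, 1))
rows = load_day(rows, [("h2", "u3")], date(2023, 1, 2))
print(rows)
```

If a day is skipped or loaded twice, the arrays drift out of alignment with the calendar. Runs should therefore go strictly in date order.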

Please add these queries into a folder, zip them up and submit [here](https://bootcamp.techcreator.io)