## Self Review Questions

- ✅ Were you able to answer the data question asked i.e. What is our user repeat rate?
- ✅ Were you able to create a marts folder for the three business lines?
- ✅ Were you able to create at least 1 intermediate model and 1 dimension/fact model within each marts model?
- ✅ Were you able to apply dbt tests to your week 1 or week 2 models?

### Question 1: Repeat Customers
1. What is our user repeat rate? Repeat rate is defined as (users who purchased 2 or more times) / (users who purchased).
2. What are good indicators of a user who will likely purchase again? 
3. What about indicators of users who are likely NOT to purchase again? 
4. If you had more data, what features would you want to look into to answer this question?

In [2]:
%load_ext sql
%sql postgresql://corise:corise@localhost:5432/dbt
%config SqlMagic.displaylimit=100
%config SqlMagic.displaycon = False
%config SqlMagic.feedback = False

In [5]:
%%sql
WITH nb_orders_by_user AS (
SELECT user_id,
       COUNT(DISTINCT order_id) AS nb_orders
  FROM dbt_ramnath_v.stg_greenery__orders
 GROUP BY 1
)

SELECT ROUND(SUM(CASE WHEN nb_orders > 1 THEN 1 ELSE 0 END)::NUMERIC / COUNT(*), 2) AS pct_users_repeat_purchase
  FROM nb_orders_by_user 

pct_users_repeat_purchase
0.8


About 80% of users who placed an order purchased more than once. To identify likely repeat purchasers, I would build a model that uses features such as the recency, frequency, and monetary value of purchases to predict the probability that a customer purchases again. I would also develop a model to predict customer LTV and apply it to a broader customer dataset to identify valuable customers to target.
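As a rough illustration of the recency, frequency, and monetary features mentioned above, here is a minimal sketch in plain Python. The `orders` rows, the `rfm_features` helper, and the `as_of` date are all hypothetical and are not taken from the Greenery data; in the project itself this logic would more naturally live in a dbt model.

```python
from datetime import date

# Hypothetical toy orders: (user_id, order_date, order_total). Not Greenery data.
orders = [
    ("u1", date(2021, 2, 10), 40.0),
    ("u1", date(2021, 2, 11), 25.0),
    ("u2", date(2021, 2, 10), 100.0),
    ("u3", date(2021, 2, 9), 15.0),
    ("u3", date(2021, 2, 11), 30.0),
    ("u3", date(2021, 2, 12), 5.0),
]


def rfm_features(orders, as_of):
    """Return {user_id: (recency_days, frequency, monetary)} as of a given date."""
    running = {}
    for user_id, order_date, total in orders:
        # Track the latest order date, the order count, and the total spend per user.
        last_order, frequency, monetary = running.get(user_id, (order_date, 0, 0.0))
        running[user_id] = (max(last_order, order_date), frequency + 1, monetary + total)
    return {
        user_id: ((as_of - last_order).days, frequency, monetary)
        for user_id, (last_order, frequency, monetary) in running.items()
    }


features = rfm_features(orders, as_of=date(2021, 2, 13))

# Same definition as the repeat rate query: users with 2+ orders / users with any order.
repeat_rate = sum(1 for _, frequency, _ in features.values() if frequency >= 2) / len(features)
```

Each user's tuple could then feed a classifier as input features, with "purchased again within some window" as the label; the choice of window is left open here.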

### Question 2: Create Mart Models

Create a marts folder to organize models for business units (core, marketing, and product), and within each marts folder, create at least 1-2 intermediate models and 1-2 dimension/fact models.

1. Explain the marts models you added. 
2. Why did you organize the models in the way you did?
3. Use the dbt docs to visualize your model DAGs to ensure the model layers make sense
4. Paste in an image of your DAG from the docs

I added a folder for `core` models, and broke it down into three groups.

1. `dimensions` folder to hold `dim_***` models.
2. `facts` folder to hold `fct_***` models.
3. `intermediate` folder to hold intermediate models (`agg_***`, `int_***`).

The idea is that these core models should constitute the single source of truth from which the marts and metrics should be built. 

In [2]:
!tree /workspace/dbt-explore/dbt-greenery/models/marts/core

/workspace/dbt-explore/dbt-greenery/models/marts/core
├── dimensions
│   ├── dim_address.sql
│   ├── dim_event.sql
│   ├── dim_order.sql
│   ├── dim_product.sql
│   ├── dim_promo.sql
│   ├── dim_tracking.sql
│   ├── dim_user.sql
│   └── README.md
├── facts
│   ├── fct_place_order_product.sql
│   ├── fct_place_order.sql
│   ├── fct_register_event.sql
│   └── fct_user.sql
└── intermediate
    ├── agg_events_by_user.sql
    ├── agg_order_items_by_order.sql
    └── agg_orders_by_user.sql

3 directories, 15 files


![dbt-greenery-core-dim-fact.png](dbt-greenery-core-dim-fact.png)

I started creating `mart_***` and `metric_***` tables from the dimensional models. The idea is for a `mart_***` table to be a very wide table that combines multiple `dim_***` and `fct_***` tables, so that metrics can eventually be derived from it automatically. I ran into some issues with the differing time ranges of the fact tables, which limited my ability to combine them.

In [4]:
!tree /workspace/dbt-explore/dbt-greenery/models/marts --filelimit=3

/workspace/dbt-explore/dbt-greenery/models/marts
├── core
│   ├── dimensions [8 entries exceeds filelimit, not opening dir]
│   ├── facts [4 entries exceeds filelimit, not opening dir]
│   └── intermediate
│       ├── agg_events_by_user.sql
│       ├── agg_order_items_by_order.sql
│       └── agg_orders_by_user.sql
├── marketing
└── product
    ├── mart_event.sql
    └── metric_event.sql

6 directories, 5 files


![dbt-greenery-dag-week-2.png](dbt-greenery-dag-week-2.png)
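The layering described above (facts kept at their grain, a wide mart joining facts to dimensions, and a metric layer that does the aggregating) can be sketched with an in-memory SQLite database. This is only an illustration under assumptions: the column names are hypothetical, and `mart_order` and `metric_order` are made-up names that mirror the `mart_***` / `metric_***` convention rather than models that exist in the project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical columns; the real dim_user and fct_place_order models may differ.
    CREATE TABLE dim_user (user_id TEXT PRIMARY KEY, state TEXT);
    CREATE TABLE fct_place_order (order_id TEXT PRIMARY KEY, user_id TEXT, order_total REAL);

    INSERT INTO dim_user VALUES ('u1', 'CA'), ('u2', 'NY');
    INSERT INTO fct_place_order VALUES
        ('o1', 'u1', 40.0), ('o2', 'u1', 25.0), ('o3', 'u2', 100.0);

    -- Wide mart: still one row per order (the fact grain), with user attributes attached.
    CREATE VIEW mart_order AS
    SELECT f.order_id, f.user_id, f.order_total, u.state
      FROM fct_place_order AS f
      JOIN dim_user AS u ON u.user_id = f.user_id;

    -- Metric layer: aggregation happens here, not in the fact table.
    CREATE VIEW metric_order AS
    SELECT state, COUNT(*) AS nb_orders, SUM(order_total) AS revenue
      FROM mart_order
     GROUP BY state;
""")

metric_rows = conn.execute(
    "SELECT state, nb_orders, revenue FROM metric_order ORDER BY state"
).fetchall()
```

Because the mart keeps the fact grain, a different metric (for example revenue per user) would only need another `GROUP BY` over the same view rather than a new fact model.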

__What was most challenging/surprising in completing this week’s project?__

The most challenging aspect of this week's project was thinking through the architecture of the models and the naming conventions. I think I was able to arrive at a solid set of conventions that I can already apply at work. The biggest learning for me was that breaking the `mart` layer into three sub-layers (`dim` + `fact`, `mart`, and `metric`) makes it really easy to compute arbitrary metrics of interest, while also making it BI-tool friendly by providing ready-to-use wide datasets.

__Is there a particular part of the project where you want focused feedback from your reviewers?__

I have tried to keep my facts "pure" at the grain level, leaving most aggregations to a metrics layer. I would love to get feedback on this approach, especially around its pros and cons and scalability.

__What are you most proud of about your project?__

One of the useful utility tables I use at work is `date_periods`, which is a `date_spine` on steroids and makes it easy to compute aggregates over multiple time periods in one shot. I was happy to recreate it as a macro, using the `date_spine` macro to generate calendar dates and then adding custom logic to derive `date_periods`.
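To give a feel for the idea, here is a minimal Python sketch of a `date_periods`-style table. It is not the macro itself: the row shape `(date_day, period_type, period_start)`, the choice of day/week/month periods, the Monday week start, and the `daily_orders` data are all assumptions made for illustration.

```python
from datetime import date, timedelta


def date_spine(start, end):
    """Yield every calendar date in the half-open range [start, end)."""
    current = start
    while current < end:
        yield current
        current += timedelta(days=1)


def date_periods(start, end):
    """Return one row per (date, period type): (date_day, period_type, period_start)."""
    rows = []
    for day in date_spine(start, end):
        rows.append((day, "day", day))
        rows.append((day, "week", day - timedelta(days=day.weekday())))  # Monday start
        rows.append((day, "month", day.replace(day=1)))
    return rows


periods = date_periods(date(2021, 2, 8), date(2021, 2, 15))

# Hypothetical daily order counts, aggregated over every period type in one pass.
daily_orders = {date(2021, 2, 10): 3, date(2021, 2, 11): 5}
totals = {}
for day, period_type, period_start in periods:
    key = (period_type, period_start)
    totals[key] = totals.get(key, 0) + daily_orders.get(day, 0)
```

In SQL terms, the loop corresponds to joining a daily fact table to `date_periods` on the date and grouping by `period_type` and `period_start`, which is what lets one query return daily, weekly, and monthly aggregates together.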