# GMV Forecast Solution Introduction

## Problem Transform
- Reframe GMV forecasting, either per user or as a daily total, as the following three steps:
   1. Estimate the daily probability that a user places an order at a store:
      - Build a binary classification (CTR-like) model that predicts, for each user-store pair, the probability that the user places an order at that store on a given date in January 2022. Denote this probability as \(P(\text{Order}_{\text{user, store}} \mid \text{features}_\text{date})\), where \(\text{features}_\text{date}\) are the context features for that date.
   2. For forecasting, use the trained model to predict \(\hat{P}(\text{Order}_{\text{user, store}} \mid \text{features}_\text{date})\) for each user, store, and date in the forecast horizon.
   3. Calculate the expected GMV for each user and date as the probability-weighted sum of the expected order value at each store:
     $$
     \hat{\text{GMV}}_{\text{user}, \text{date}} = \sum_{\text{store}} \hat{P}(\text{Order}_{\text{user, store}}| \text{features}_\text{date}) \cdot \hat{\text{GMV}}_{\text{store}}
     $$
   - We assume \(\hat{\text{GMV}}_{\text{store}}\), the expected value of a single order at the store, can be estimated by averaging the store's previous transactions, since exploratory data analysis showed low variation in order values. Alternatively, it can be estimated with a separate ML model.
   - Context features \(\text{features}_\text{date}\) can be updated over time.
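
The expected-GMV formula in step 3 can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the production pipeline: predicted probabilities are assumed to already be available as a dict keyed by `(user, store, date)`, past transactions as `(store, order_value)` pairs, and the function names `estimate_store_gmv` and `expected_user_gmv` are hypothetical. Stores with no transaction history are simply skipped here; falling back to a global average would be an equally reasonable choice.

```python
from collections import defaultdict
from statistics import mean


def estimate_store_gmv(transactions):
    """Estimate GMV_store as the mean of each store's past order values.

    `transactions` is an iterable of (store, order_value) pairs.
    """
    values = defaultdict(list)
    for store, amount in transactions:
        values[store].append(amount)
    return {store: mean(amounts) for store, amounts in values.items()}


def expected_user_gmv(probabilities, store_gmv):
    """Return {(user, date): sum over stores of P_hat(order) * GMV_store}.

    `probabilities` maps (user, store, date) -> predicted order probability.
    Assumption: stores missing from `store_gmv` (no history) are skipped.
    """
    expected = defaultdict(float)
    for (user, store, date), p in probabilities.items():
        if store in store_gmv:
            expected[(user, date)] += p * store_gmv[store]
    return dict(expected)
```

For example, with a store average of 12.0 at `s1` and 30.0 at `s2`, predicted probabilities of 0.5 and 0.25 for one user on one date give an expected GMV of 0.5 × 12.0 + 0.25 × 30.0 = 13.5.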

- Main Forecast Problems:
   - **User-Level Forecast in a Month:**
      - To forecast each user's GMV for the month, aggregate that user's expected daily GMV across all dates in the month.
      - Mathematically:
      $$
      \text{GMV}_{\text{user}} = \sum_{\text{date}} \hat{\text{GMV}}_{\text{user}, \text{date}}
      $$
      where the sum is over all dates in the month.
   - **YayYay-Level Daily Forecast in a Month:**
      - To forecast YayYay's total GMV on each date, aggregate the expected GMV amounts across all users.
      - Mathematically:
      $$
      \text{GMV}_{\text{YayYay}}(\text{date}) = \sum_{\text{user}}  \hat{\text{GMV}}_{\text{user}, \text{date}} 
      $$
      where the sum is over all users.
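
Both aggregations are plain sums of \(\hat{\text{GMV}}_{\text{user}, \text{date}}\) over one index. A minimal sketch, assuming the expected values are held in a dict keyed by `(user, date)` that covers exactly one month; the function names are hypothetical:

```python
from collections import defaultdict


def monthly_gmv_per_user(expected):
    """GMV_user = sum over dates of GMV_hat[user, date]."""
    totals = defaultdict(float)
    for (user, _date), gmv in expected.items():
        totals[user] += gmv
    return dict(totals)


def daily_gmv_platform(expected):
    """GMV_YayYay(date) = sum over users of GMV_hat[user, date]."""
    totals = defaultdict(float)
    for (_user, date), gmv in expected.items():
        totals[date] += gmv
    return dict(totals)
```

Because both functions sum the same table, the monthly totals across all users and the daily totals across all dates add up to the same overall figure.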

## Data Setup
1. **Label Definition:**
   - The label is binary: it indicates whether the user placed an order at a specific store on a given date in January 2022.
   - Generate negative examples by negative sampling, drawing user-store pairs uniformly at random from the global pool of users and stores (global random sampling).

2. **Features:**
   - Use historical features for each user and store, such as order history, user demographics, and store characteristics.
   - Include context features, especially those describing the time context of the date being predicted; these must be known in advance for future dates.
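
The label construction with global random negatives can be sketched as follows. This is a hypothetical helper under stated assumptions, not the document's exact procedure: observed orders are `(user, store, date)` tuples, negatives are drawn on the same date as each positive, `users` and `stores` are lists, and `neg_per_pos` (negatives per positive) and `seed` are illustrative parameters. Feature joining is left out.

```python
import random
from collections import Counter


def build_training_examples(orders, users, stores, neg_per_pos=4, seed=0):
    """Return (user, store, date, label) rows: observed orders plus random negatives.

    Hypothetical helper. Each (user, store, date) in `orders` is a positive.
    For every positive, `neg_per_pos` negatives are drawn on the same date by
    sampling a user and a store uniformly at random from the global pools,
    rejecting pairs that actually ordered on that date. Sampling is with
    replacement, so a negative pair may appear more than once.
    """
    rng = random.Random(seed)
    positives = set(orders)
    # Guard against dates where every user-store pair is a positive.
    n_pairs = len(users) * len(stores)
    per_date = Counter(date for (_user, _store, date) in positives)
    if any(count >= n_pairs for count in per_date.values()):
        raise ValueError("no negative user-store pairs available for some date")

    rows = [(user, store, date, 1) for (user, store, date) in sorted(positives)]
    for (_user, _store, date) in sorted(positives):
        drawn = 0
        while drawn < neg_per_pos:
            user = rng.choice(users)
            store = rng.choice(stores)
            if (user, store, date) not in positives:  # keep only true negatives
                rows.append((user, store, date, 0))
                drawn += 1
    return rows
```

Sampling uniformly over all users and stores is the simplest reading of "globally random"; it yields mostly easy negatives, so the negative ratio is worth tuning against the evaluation metrics.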

## Evaluation
- Assess the performance of the CTR-like model using standard binary classification metrics (recall, ROC-AUC, etc.).
- Evaluate the GMV estimates against actual GMV using regression metrics (MAE, MSE, etc.), at both the user-month level and the daily YayYay level.
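
The metrics named above can be written out in plain Python to make their definitions concrete. This is an illustrative sketch only: the quadratic ROC-AUC loop is meant for clarity, not scale, and the 0.5 recall threshold is an assumed default. In practice, scikit-learn's `sklearn.metrics` module provides `roc_auc_score`, `recall_score`, `mean_absolute_error`, and `mean_squared_error`.

```python
def roc_auc(labels, scores):
    """ROC-AUC: probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:  # O(n_pos * n_neg): fine for illustration, slow at scale
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def recall(labels, scores, threshold=0.5):
    """Share of actual orders whose predicted probability reaches `threshold`."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    if not pos:
        raise ValueError("need at least one positive label")
    return sum(s >= threshold for s in pos) / len(pos)


def _check(actual, predicted):
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")


def mae(actual, predicted):
    """Mean absolute error between actual and forecast GMV."""
    _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean squared error between actual and forecast GMV."""
    _check(actual, predicted)
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
```

For the GMV metrics, `actual` and `predicted` would be aligned lists of per-user monthly totals or per-date YayYay totals.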

This approach uses the predicted order probabilities to estimate the expected GMV for each user and date, then aggregates these estimates per user or for YayYay as a whole. Adjust the model complexity and features to the characteristics of the data and the business requirements.
