![Crisp](img/logo.png)
# Store Closure Demand Redistribution

[![Open in Colab](https://img.shields.io/badge/Open%20in-Colab-orange?logo=google-colab&style=for-the-badge)](https://colab.research.google.com/github/gocrisp/blueprints/blob/main/notebooks/crisp_demand_redistribution.ipynb)
[![Open in Vertex AI](https://img.shields.io/badge/Open%20in-Vertex%20AI%20Workbench-brightgreen?logo=google-cloud&style=for-the-badge)](https://console.cloud.google.com/vertex-ai/notebooks/deploy-notebook?download_url=https://raw.githubusercontent.com/gocrisp/blueprints/main/notebooks/crisp_demand_redistribution.ipynb)
[![Open in Databricks](https://img.shields.io/badge/databricks-red?logo=databricks&style=for-the-badge)](https://www.databricks.com/try-databricks)
[![View on GitHub](https://img.shields.io/badge/View%20on-GitHub-lightgrey?logo=github&style=for-the-badge)](https://github.com/gocrisp/blueprints/blob/main/notebooks/crisp_demand_redistribution.ipynb)

> To deploy a notebook in Databricks:
> 1. Open your workspace and navigate to the folder where you want to import the notebook.
> 2. Click the triple-dot icon (next to the Share button).
> 3. Select **Import**, then choose URL as the import method.
> 4. Paste the notebook's URL, then click **Import** to complete the process.

This notebook demonstrates how to analyze the impact of store closures on surrounding stores using historical sales data. It uses each store's sales volume as a measure of store size and estimates how much of a closing store's demand would shift to nearby locations based on distance. The analysis helps suppliers and retailers plan supply chain adjustments in response to store closures, including inventory allocation and distribution decisions.

## Set the required environment variables


In [1]:
import os

os.environ["ACCOUNT_ID"] = "999999"
# os.environ["CONNECTOR_ID"] = "7240"

### Run Crisp common notebook

This notebook runs the [crisp_common.ipynb](./crisp_common.ipynb) notebook, which defines the common functions and variables shared across the Crisp notebooks.

In [None]:
import os

if not os.path.exists("crisp_common.ipynb"):
    print("Downloading crisp_common.ipynb")
    !wget https://raw.githubusercontent.com/gocrisp/blueprints/main/notebooks/crisp_common.ipynb -O crisp_common.ipynb
else:
    print("crisp_common.ipynb already exists")

%run crisp_common.ipynb

## Import dependencies

In [3]:
import pandas as pd
import numpy as np
import folium
from IPython.display import display
from math import radians
from sklearn.metrics.pairwise import haversine_distances

### Define important tables

We will use the following tables in this tutorial.

In [4]:
src_project = project
src_dataset = dataset
dim_store = "exp_harmonized_retailer_dim_store"
fact_sales = "exp_harmonized_retailer_fact_sales"
dim_calendar = "exp_harmonized_retailer_dim_calendar"

retailer = "target"

## Load necessary data

We will load store locations and each store's total sales quantity over the most recent 365 days of available sales data.

In [None]:
%%load store_df
SELECT
    store_id,
    retailer,
    store_state,
    store_longitude,
    store_latitude
FROM
    `{src_project}`.`{src_dataset}`.`{dim_store}`
WHERE
    store_longitude IS NOT NULL
    AND store_latitude IS NOT NULL
    AND retailer = '{retailer}'

In [None]:
%%load sales_df
SELECT
    store_id,
    SUM(sales_quantity) as sales_quantity
FROM
    `{src_project}`.`{src_dataset}`.`{fact_sales}`
WHERE
   retailer = '{retailer}'
   AND CAST(date_key AS DATE) >= DATE_SUB((SELECT MAX(CAST(date_key AS DATE)) FROM `{src_project}`.`{src_dataset}`.`{fact_sales}`), INTERVAL 365 DAY)
GROUP BY
    store_id

## Perform simple preprocessing steps

In [None]:
# Merge store_df and sales_df
store_sales_df = pd.merge(store_df, sales_df, on="store_id", how="inner")

# Remove records with zero or negative sales quantity
store_sales_df = store_sales_df[store_sales_df["sales_quantity"] > 0]
store_sales_df

### Geospatial Analysis - Calculating store proximity

In this section, we analyze the spatial relationships between stores to understand their proximity to each other. The analysis has two steps:

1. **Distance Calculation**: We use the Haversine formula to calculate great-circle distances between store locations.
2. **Proximity Matrix**: We create a distance matrix where each cell represents the distance (in miles) between two stores.

This analysis will be used later to model how demand might be redistributed when stores are closed, by identifying nearby stores that could absorb the displaced customer base.
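
For intuition, the Haversine calculation for a single pair of stores can be sketched with only the standard library. This is a minimal illustration rather than the notebook's implementation, which uses scikit-learn's `haversine_distances`. The function name `haversine_miles` and the sample coordinates are hypothetical; the Earth radius of 3959 miles matches the constant used in this notebook.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3959  # same constant the notebook uses


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two (lat, lon) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    # Haversine term: squared half-chord length between the two points
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


# Hypothetical coordinates for two stores (illustrative only)
store_a = (44.9778, -93.2650)
store_b = (44.9537, -93.0900)
distance = haversine_miles(*store_a, *store_b)
print(f"{distance:.1f} miles")
```

The formula treats the Earth as a sphere, so results are approximate, but the error is small relative to the distance thresholds used in this analysis.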


In [8]:
def calculate_distance_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """
    Calculates a matrix of distances between all stores using the Haversine formula.

    Args:
        df: A DataFrame with store_id, store_latitude, and store_longitude.

    Returns:
        A DataFrame where rows and columns are store_ids and each cell is the
        distance in miles between the two stores.
    """
    # Create a unique store list to avoid issues with duplicate store_id rows if any
    unique_stores_df = (
        df[["store_id", "store_latitude", "store_longitude"]]
        .drop_duplicates(subset="store_id")
        .set_index("store_id")
    )

    unique_stores_df["lat_rad"] = unique_stores_df["store_latitude"].apply(radians)
    unique_stores_df["lon_rad"] = unique_stores_df["store_longitude"].apply(radians)

    distance_matrix = (
        haversine_distances(unique_stores_df[["lat_rad", "lon_rad"]].values) * 3959
    )  # Convert to miles (Earth radius in miles)

    distance_df = pd.DataFrame(
        distance_matrix, index=unique_stores_df.index, columns=unique_stores_df.index
    )
    return distance_df


print("\n--- Calculating Distances ---")
distance_matrix_df = calculate_distance_matrix(store_sales_df)
print("Distance Matrix created successfully. Shape:", distance_matrix_df.shape)
distance_matrix_df


--- Calculating Distances ---
Distance Matrix created successfully. Shape: (177, 177)


store_id,180897736809331718,226514034944246402,469289875586161863,531990793994649118,404285947561983633,883122303178476779,181099760259230758,259368195531910635,994991929642472607,532335007403805132,...,206433912630488013,184519704393700464,291324709886545615,141001838959905750,785998873441078696,582523980333159077,97000210939998383,312246388847531086,63231312250550556,721838660008808118
180897736809331718,0.000000,12.989590,12.921086,7.679203,13.741709,10.403630,36.383085,42.490888,22.479526,47.072002,...,188.801729,177.997540,190.839903,185.528161,193.816761,189.356288,201.229572,201.250047,205.016360,244.447086
226514034944246402,12.989590,0.000000,21.777932,13.678289,3.965956,5.103851,39.221297,30.377631,11.234749,35.773512,...,175.890144,165.066249,177.895536,172.632248,180.827537,176.384015,188.273554,188.268431,192.056348,231.597166
469289875586161863,12.921086,21.777932,0.000000,8.101897,24.278432,16.947848,48.240830,52.047298,32.871993,57.545904,...,195.666796,184.724329,197.428180,192.506575,199.581490,195.608177,206.089457,206.528332,209.919789,248.362020
531990793994649118,7.679203,13.678289,8.101897,0.000000,16.317360,8.880885,43.984799,43.961219,24.808839,49.450280,...,188.196723,177.290440,190.039952,185.003218,192.455617,188.322635,199.284894,199.572647,203.100690,241.937351
404285947561983633,13.741709,3.965956,24.278432,16.317360,0.000000,8.575768,35.651391,28.752139,8.803858,33.549437,...,175.317316,164.556053,177.429129,172.017112,180.671685,176.048125,188.441546,188.285340,192.205269,232.059416
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
582523980333159077,189.356288,176.384015,195.608177,188.322635,176.048125,179.954968,192.770659,148.691290,168.041508,146.989293,...,10.593325,13.667508,6.027112,12.953450,9.295240,0.000000,26.135832,20.205526,27.552801,68.895127
97000210939998383,201.229572,188.273554,206.089457,199.284894,188.441546,191.352356,208.665008,162.518111,180.969707,161.864637,...,36.318330,38.254145,31.165373,39.062799,16.873963,26.135832,0.000000,7.173683,3.882861,46.168864
312246388847531086,201.250047,188.268431,206.528332,199.572647,188.285340,191.498878,207.436414,161.874016,180.638222,160.890870,...,29.913183,33.175494,24.712499,32.904571,11.009783,20.205526,7.173683,0.000000,7.410862,49.485079
63231312250550556,205.016360,192.056348,209.919789,203.100690,192.205269,195.152874,212.248020,166.205974,184.708538,165.493127,...,37.322918,40.277867,32.121418,40.306585,18.288775,27.552801,3.882861,7.410862,0.000000,43.074837


### Modeling Demand Redistribution
  
This section implements a demand redistribution model, based on the Huff model, that estimates how sales volume is redistributed when stores close. The model weights each candidate receiving store by both distance and store size:

1. **Store Attractiveness**: Uses store size (measured here by sales quantity) as a proxy for attractiveness, assuming larger stores have a higher probability of capturing redistributed demand
2. **Distance Decay**: Models the decreasing probability of store selection as distance increases, using an inverse-square relationship
3. **Volume Conservation**: Assumes all sales from a closed store are redistributed to remaining stores within the distance threshold
  
The model generates a report showing:
- Which stores would receive redistributed sales
- Estimated sales volume increases for receiving stores
- Distance between the closing store and each receiving store (in miles)
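
The weighting can be illustrated with a small worked example. This is a sketch with made-up numbers: the helper `huff_shares` and the receiver data are hypothetical. Each receiving store's weight is its size divided by the squared distance to the closing store, and the shares are the weights normalized to sum to 1.

```python
def huff_shares(receivers: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Map each receiver to its demand share, given (size, distance_miles) per receiver."""
    weights = {
        store: size / distance**2 for store, (size, distance) in receivers.items()
    }
    total = sum(weights.values())
    return {store: weight / total for store, weight in weights.items()}


# Hypothetical receivers: (annual sales quantity, distance in miles from the closing store)
receivers = {
    "A": (10_000, 1.0),
    "B": (40_000, 2.0),
    "C": (10_000, 5.0),
}
closing_store_quantity = 26_000

shares = huff_shares(receivers)
lifts = {store: share * closing_store_quantity for store, share in shares.items()}

for store in receivers:
    print(f"{store}: share={shares[store]:.3f}, lift={lifts[store]:,.0f}")
```

In this example A and B receive equal shares: B is four times larger but twice as far away, so the inverse-square decay exactly cancels its size advantage. C is the same size as A but five times farther, so its weight is 25 times smaller.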

In [9]:
def generate_quantity_lift_report(
    stores_to_close_ids: list,
    store_sales_df: pd.DataFrame,
    distances_df: pd.DataFrame,
    distance_threshold_miles: int,
) -> pd.DataFrame:
    """
    Generates a report estimating how sales volume would be redistributed when stores are closed.

    This function uses a combined weighting model that considers both distance and store size to estimate
    how much of a closing store's sales volume would be captured by nearby stores. The model assumes that:
    1. Customers will prefer closer stores
    2. The probability of a customer choosing a store decreases with the square of the distance
    3. Larger stores have higher probability of capturing redistributed demand
    4. All sales volume from closed stores must be redistributed

    Args:
        stores_to_close_ids: List of store IDs that will be closed
        store_sales_df: DataFrame containing store sales data with columns [store_id, sales_quantity]
        distances_df: DataFrame containing distances between all stores
        distance_threshold_miles: Maximum distance to consider for redistribution

    Returns:
        DataFrame with columns:
        - closing_store_id: ID of the store being closed
        - receiving_store_id: ID of the store receiving redistributed sales
        - closing_store_quantity: Original sales volume of closing store
        - distance_miles: Distance between closing and receiving stores
        - estimated_quantity_lift: Estimated additional sales volume for receiving store
    """

    closing_store_ids_set = set(stores_to_close_ids)

    # Create lookup map of store quantities
    quantity_map = store_sales_df.set_index("store_id")["sales_quantity"].astype(float)

    all_results = []
    for closing_store_id in closing_store_ids_set:
        # Skip stores with no sales volume
        closing_store_quantity = float(quantity_map.get(closing_store_id, 0))
        if closing_store_quantity == 0:
            continue

        # Find all stores within the distance threshold, excluding the store itself
        nearby_series = distances_df[closing_store_id][
            (distances_df[closing_store_id] > 0)
            & (distances_df[closing_store_id] <= distance_threshold_miles)
        ]

        # Filter out stores that are also closing
        valid_receiver_ids = [
            store_id
            for store_id in nearby_series.index
            if store_id not in closing_store_ids_set
        ]

        if not valid_receiver_ids:
            continue

        # Create DataFrame for potential receiving stores
        receivers_df = pd.DataFrame(nearby_series[valid_receiver_ids]).rename(
            columns={closing_store_id: "distance_miles"}
        )

        # Add receiving store quantities
        receivers_df["receiving_store_quantity"] = receivers_df.index.map(quantity_map)

        # Calculate combined weights: inverse square distance * store size
        receivers_df["distance_weight"] = 1 / (receivers_df["distance_miles"] ** 2)
        receivers_df["size_weight"] = receivers_df["receiving_store_quantity"]
        receivers_df["weight"] = (
            receivers_df["distance_weight"] * receivers_df["size_weight"]
        )

        total_weight = receivers_df["weight"].sum()

        # Calculate proportional demand share and estimated quantity lift
        receivers_df["demand_share"] = receivers_df["weight"] / total_weight
        receivers_df["estimated_quantity_lift"] = (
            receivers_df["demand_share"] * closing_store_quantity
        )
        receivers_df["closing_store_quantity"] = closing_store_quantity

        # Restructure DataFrame for final report
        receivers_df = receivers_df.reset_index().rename(
            columns={"store_id": "receiving_store_id"}
        )
        receivers_df["closing_store_id"] = closing_store_id

        all_results.append(
            receivers_df[
                [
                    "closing_store_id",
                    "receiving_store_id",
                    "closing_store_quantity",
                    "distance_miles",
                    "estimated_quantity_lift",
                ]
            ]
        )

    # Handle case where no valid redistributions were found
    if not all_results:
        return pd.DataFrame(
            columns=[
                "closing_store_id",
                "receiving_store_id",
                "closing_store_quantity",
                "distance_miles",
                "estimated_quantity_lift",
            ]
        )

    # Combine all results and format final report
    final_report_df = pd.concat(all_results, ignore_index=True)

    # Round numeric columns for readability
    final_report_df["distance_miles"] = final_report_df["distance_miles"].round(1)
    final_report_df["estimated_quantity_lift"] = (
        final_report_df["estimated_quantity_lift"].round(0).astype(int)
    )
    final_report_df["closing_store_quantity"] = (
        final_report_df["closing_store_quantity"].round(0).astype(int)
    )

    # Sort by closing store ID and estimated lift (descending)
    return final_report_df.sort_values(
        by=["closing_store_id", "estimated_quantity_lift"], ascending=[True, False]
    )

### Execute Analysis and Review Recommendations
This section demonstrates how to:
1. Get a list of stores to close
2. Set analysis parameters (the maximum distance within which nearby stores are considered as receivers)
3. Run the quantity lift report
4. Display the actionable recommendations showing how sales would redistribute

In [10]:
# Supply the list of store_ids to close here. As an example, we use a random sample of 5 store_ids.
# list_of_stores_to_close = pd.read_csv('list_of_stores_to_close.csv')['store_id'].tolist()
list_of_stores_to_close = store_sales_df["store_id"].sample(n=5).tolist()

In [11]:
# --- Define Your Inputs Here ---
STORES_TO_CLOSE_IDS = list_of_stores_to_close
# Maximum distance in miles within which redistribution is considered plausible
DISTANCE_THRESHOLD = 10

# --- Run the Report Generator ---
actionable_report = generate_quantity_lift_report(
    stores_to_close_ids=STORES_TO_CLOSE_IDS,
    store_sales_df=store_sales_df,
    distances_df=distance_matrix_df,
    distance_threshold_miles=DISTANCE_THRESHOLD,
)

# --- Display the Final Report ---
print("\n--- Quantity Lift Report ---")
if not actionable_report.empty:
    display(actionable_report.head())
else:
    print("No report generated based on the inputs.")


--- Quantity Lift Report ---


| | closing_store_id | receiving_store_id | closing_store_quantity | distance_miles | estimated_quantity_lift |
|---|---|---|---|---|---|
| 12 | 30138213318230882 | 482949351573850102 | 28440 | 1.7 | 5923 |
| 13 | 30138213318230882 | 219477234820598877 | 28440 | 1.8 | 5899 |
| 18 | 30138213318230882 | 476505392417838405 | 28440 | 2.1 | 4307 |
| 14 | 30138213318230882 | 985850273381907509 | 28440 | 2.4 | 3038 |
| 17 | 30138213318230882 | 623164924151517121 | 28440 | 2.6 | 2675 |
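
Because the model assumes volume conservation, one quick sanity check is that the lifts for each closing store sum back to that store's quantity, up to rounding. The sketch below runs that check on a toy report; the rows are hypothetical, not output from this notebook, and the column names mirror the quantity lift report.

```python
import pandas as pd

# Toy report with the same columns as the quantity lift report (hypothetical values)
toy_report = pd.DataFrame(
    {
        "closing_store_id": [1, 1, 1, 2, 2],
        "receiving_store_id": [10, 11, 12, 11, 13],
        "closing_store_quantity": [1000, 1000, 1000, 500, 500],
        "estimated_quantity_lift": [600, 300, 100, 350, 150],
    }
)

# Summarize each closing store: its own quantity, the total lift, and the receiver count
check = toy_report.groupby("closing_store_id").agg(
    closing_store_quantity=("closing_store_quantity", "first"),
    redistributed=("estimated_quantity_lift", "sum"),
    receivers=("receiving_store_id", "count"),
)
# Each lift is rounded to a whole unit, so allow a drift of up to 0.5 per receiver
check["within_tolerance"] = (
    (check["redistributed"] - check["closing_store_quantity"]).abs()
    <= 0.5 * check["receivers"]
)
print(check)
```

The same aggregation could be applied to `actionable_report`. Note that a closing store with no eligible receiver within the distance threshold does not appear in the report at all, so its volume is not redistributed by this model.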


### Map Visualization
This section creates a map that visualizes the store closure analysis results. The map displays:
- Closing stores marked with red X markers
- Receiving stores shown as colored circles
- Dashed gray lines connecting each closing store to its receiving stores

In [12]:
def create_static_report_map(report_df: pd.DataFrame, store_sales_df: pd.DataFrame):
    """
    Generates a static map visualizing the results of the batch sales lift report.
    """
    if report_df.empty:
        print("Report is empty, cannot generate map.")
        return

    # Aggregate the total lift for each unique receiving store
    aggregated_lift = (
        report_df.groupby("receiving_store_id")["estimated_quantity_lift"]
        .sum()
        .reset_index()
    )
    aggregated_lift.rename(
        columns={"estimated_quantity_lift": "total_quantity_lift"}, inplace=True
    )

    # Get all stores that need to be on the map
    closing_store_ids = report_df["closing_store_id"].unique()
    receiving_store_ids = report_df["receiving_store_id"].unique()
    all_involved_ids = np.union1d(closing_store_ids, receiving_store_ids)

    plot_data = store_sales_df[store_sales_df["store_id"].isin(all_involved_ids)].copy()
    plot_data = pd.merge(
        plot_data,
        aggregated_lift,
        left_on="store_id",
        right_on="receiving_store_id",
        how="left",
    )

    # Create retailer color map
    unique_retailers = plot_data["retailer"].unique()
    retailer_colors = [
        "blue",
        "green",
        "purple",
        "orange",
        "darkred",
        "lightcoral",
    ]  # CSS color names, since CircleMarker colors are rendered by the browser; add more if needed
    retailer_color_map = {
        retailer: color for retailer, color in zip(unique_retailers, retailer_colors)
    }

    # Create the base map
    map_center = [
        plot_data["store_latitude"].mean(),
        plot_data["store_longitude"].mean(),
    ]
    m = folium.Map(location=map_center, zoom_start=5)

    # Plot closing stores
    closing_stores_data = plot_data[plot_data["store_id"].isin(closing_store_ids)]
    for _, store in closing_stores_data.iterrows():
        folium.Marker(
            location=[store["store_latitude"], store["store_longitude"]],
            tooltip=f"CLOSING: Store {store['store_id']}<br>Yearly Quantity: {store['sales_quantity']:,.0f}",
            icon=folium.Icon(color="red", icon="times-circle", prefix="fa"),
        ).add_to(m)

    # Plot receiving stores and connecting lines
    receiving_stores_data = plot_data[plot_data["store_id"].isin(receiving_store_ids)]

    for _, store in receiving_stores_data.iterrows():
        folium.CircleMarker(
            location=[store["store_latitude"], store["store_longitude"]],
            radius=8,
            tooltip=f"RECEIVER: Store {store['store_id']}<br>Total Lift (Units): {store['total_quantity_lift']:,.0f}",
            color=retailer_color_map.get(store["retailer"], "gray"),
            fill=True,
            fill_color=retailer_color_map.get(store["retailer"], "gray"),
            fill_opacity=0.5,
            weight=0.5,
        ).add_to(m)

    for _, row in report_df.iterrows():
        closing_info = plot_data[plot_data["store_id"] == row["closing_store_id"]].iloc[
            0
        ]
        receiver_info = plot_data[
            plot_data["store_id"] == row["receiving_store_id"]
        ].iloc[0]
        folium.PolyLine(
            locations=[
                [closing_info["store_latitude"], closing_info["store_longitude"]],
                [receiver_info["store_latitude"], receiver_info["store_longitude"]],
            ],
            color="gray",
            weight=1.5,
            opacity=0.7,
            dash_array="5, 5",
        ).add_to(m)

    display(m)


# --- Execute and Display the Static Map ---
print("\n--- Report Visualization ---")
create_static_report_map(actionable_report, store_sales_df)


--- Report Visualization ---
