%md
# Wrap-up Summary: Interactive Monthly Sales App with Databricks Widgets

This notebook demonstrates how to build an interactive monthly sales analysis app using Databricks widgets and PySpark. The workflow includes:

1. **Aggregating daily sales** from the `globalretail_silver.silver_transactions` table into a gold table (`gold_daily_sales`).
2. **Defining Python functions** to:
   * Retrieve available months from the data.
   * Calculate total sales for a selected month.
   * Display daily sales for the selected month.
3. **Creating a Databricks dropdown widget** for month selection, allowing users to interactively choose a month.
4. **Displaying results**:
   * The total sales for the selected month (e.g., December 2023: $111,872.53).
   * A table of daily sales for each day in the selected month.

This approach provides a user-friendly, interactive experience directly within the Databricks notebook, without requiring external web app hosting. Users can easily explore sales trends by month and day using the provided widgets and tables.
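
The three helper functions in the workflow above can be illustrated without Spark. The following is a minimal in-memory sketch, not the notebook's implementation: `SAMPLE_ROWS` and its values are made up for illustration, and the function names `available_months`, `monthly_total`, and `daily_sales` are hypothetical stand-ins for the PySpark functions defined later.

```python
from datetime import date

# Hypothetical sample rows standing in for gold_daily_sales:
# (transaction_date, daily_total_sales). The values are invented.
SAMPLE_ROWS = [
    (date(2023, 11, 30), 100.0),
    (date(2023, 12, 2), 49.5),
    (date(2023, 12, 1), 250.5),
]

def available_months(rows):
    """Distinct 'YYYY-MM' strings present in the data, sorted ascending."""
    return sorted({d.strftime("%Y-%m") for d, _ in rows})

def monthly_total(rows, month):
    """Sum of daily totals for one month; 0.0 when the month has no rows."""
    return float(sum(t for d, t in rows if d.strftime("%Y-%m") == month))

def daily_sales(rows, month):
    """Rows for one month, ordered by date."""
    return sorted((d, t) for d, t in rows if d.strftime("%Y-%m") == month)
```

The notebook performs the same three steps against the gold table with PySpark, so the data never has to be collected in full on the driver.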

---


In [0]:
%sql
select *
from globalretail_gold.gold_daily_sales
limit 10;

In [0]:
%sql
select table_name 
from workspace.information_schema.tables 
where table_schema = 'globalretail_gold'

In [0]:
%sql
describe globalretail_gold.gold_daily_sales;

%md
## Step 1: Refactor Notebook Logic into a Python Function
We need a Python function that takes a month (e.g., '2024-06') as input, queries the `gold_daily_sales` table, and returns the total sales for that month. We'll use PySpark to filter rows by month and sum `daily_total_sales`. If the selected month has no rows, the function returns `0.0` instead of failing.

In [0]:
from pyspark.sql.functions import col, sum as spark_sum, date_format

def get_monthly_sales(selected_month):
    """Return total sales for a month given as 'YYYY-MM', or 0.0 if the month has no rows."""
    # Read the daily sales table
    df = spark.table('globalretail_gold.gold_daily_sales')
    # Extract year-month from transaction_date and filter
    df_month = df.withColumn('year_month', date_format(col('transaction_date'), 'yyyy-MM')) \
                 .filter(col('year_month') == selected_month)
    # Aggregate total sales; the sum is NULL (None) when no rows matched
    row = df_month.agg(spark_sum('daily_total_sales').alias('monthly_total_sales')).first()
    total = row['monthly_total_sales'] if row is not None else None
    return float(total) if total is not None else 0.0
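
%md
As an alternative to formatting every `transaction_date` as a string, the month can be turned into a half-open date range and compared against the column directly, which may let Spark push the filter down to the table scan. The helper below is a hypothetical sketch (`month_bounds` is not part of the notebook) and assumes `transaction_date` is a date-typed column.

```python
from datetime import date

def month_bounds(selected_month):
    """Return (first day of the month, first day of the next month) for 'YYYY-MM'."""
    year, month = map(int, selected_month.split("-"))
    start = date(year, month, 1)
    # Roll over to January of the following year after December
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start, end

start, end = month_bounds("2024-06")
# In the notebook, the bounds could drive a range filter such as:
#   df.filter((col("transaction_date") >= start) & (col("transaction_date") < end))
```

Using an exclusive upper bound avoids having to know how many days the month has.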

%md
## Step 2: Create a Gradio App for Month Selection
Now that we have a function to get monthly sales, we'll build a Gradio app. The app will let the user select a month (from available months in the data) and display the total sales for that month. We'll also display a table of daily sales for the selected month for more detail. We'll need to install Gradio if not present, and fetch available months from the data for the dropdown.

In [0]:
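# Install Gradio as a notebook-scoped library.
# Note: on Databricks, %pip can reset the notebook's Python state, so cells
# that define functions earlier in the notebook may need to be re-run afterwards.
%pip install --upgrade gradio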

In [0]:
from pyspark.sql.functions import date_format

def get_available_months():
    df = spark.table('globalretail_gold.gold_daily_sales')
    months_rows = df.select(date_format('transaction_date', 'yyyy-MM').alias('year_month')) \
        .distinct().orderBy('year_month').collect()
    months = [row['year_month'] for row in months_rows]
    return months

In [0]:
from pyspark.sql.functions import col, date_format

def get_daily_sales_for_month(selected_month):
    """Return the daily sales rows for 'YYYY-MM' as a pandas DataFrame, ordered by date."""
    df = spark.table('globalretail_gold.gold_daily_sales')
    df_month = df.withColumn('year_month', date_format(col('transaction_date'), 'yyyy-MM')) \
                 .filter(col('year_month') == selected_month)
    return df_month.select('transaction_date', 'daily_total_sales').orderBy('transaction_date').toPandas()

In [0]:
import gradio as gr

def app(selected_month):
    total_sales = get_monthly_sales(selected_month)
    daily_sales_df = get_daily_sales_for_month(selected_month)
    return f"Total sales for {selected_month}: ${total_sales:,.2f}", daily_sales_df

months = get_available_months()

demo = gr.Interface(
    fn=app,
    inputs=gr.Dropdown(choices=months, label="Select Month (YYYY-MM)"),
    outputs=[gr.Textbox(label="Total Sales"), gr.Dataframe(label="Daily Sales Table")],
    title="Monthly Sales Viewer",
    description="Select a month to view total and daily sales."
)
demo.launch(share=True)

%md
## Step 3: Use Databricks Widgets for Month Selection and Display Results
Because the Gradio app above cannot be hosted from a Databricks notebook on AWS in this setup, we switch to Databricks widgets for month selection. The total sales and the table of daily sales for the selected month are displayed directly in the notebook, so the whole interaction stays inside Databricks and needs no extra libraries.

In [0]:
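# Create a dropdown widget for available months
months = get_available_months()
if not months:
    # A dropdown needs a default that is one of its choices, so stop early on empty data
    raise ValueError("No months found in globalretail_gold.gold_daily_sales")
# Remove any stale widget so the choices reflect the current data.
# If the dropdown does not refresh on re-run, move this removal into its own cell.
try:
    dbutils.widgets.remove('selected_month')
except Exception:
    pass
# Default to the most recent month (the list is sorted ascending)
dbutils.widgets.dropdown('selected_month', months[-1], months)
selected_month = dbutils.widgets.get('selected_month')
print(f"Selected month: {selected_month}")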

In [0]:
# Display total sales for the selected month
monthly_total = get_monthly_sales(selected_month)
print(f"Total sales for {selected_month}: ${monthly_total:,.2f}")

# Display daily sales table for the selected month
daily_sales_df = get_daily_sales_for_month(selected_month)
display(daily_sales_df)
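
%md
The summary line is built the same way in the Gradio `app` function and in the widget cell above, so it could be factored into one small helper. This is only a sketch: `format_total_line` is a hypothetical name, not something defined elsewhere in the notebook.

```python
def format_total_line(selected_month, total_sales):
    """Build the summary line with thousands separators and two decimals."""
    return f"Total sales for {selected_month}: ${total_sales:,.2f}"

# Using the December 2023 figure quoted in the summary at the top of the notebook
line = format_total_line("2023-12", 111872.53)
```

Both front ends could then call `format_total_line(selected_month, get_monthly_sales(selected_month))` and stay consistent if the wording changes.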