# 1251. Average Selling Price

Table: `Prices`

| Column Name | Type |
|-------------|------|
| product_id  | int  |
| start_date  | date |
| end_date    | date |
| price       | int  |

(product_id, start_date, end_date) is the primary key (combination of columns with unique values) for this table.
Each row of this table indicates the price of the product_id in the period from start_date to end_date.
For each product_id there will be no two overlapping periods. That means there will be no two intersecting periods for the same product_id.

Table: `UnitsSold`

| Column Name   | Type |
|---------------|------|
| product_id    | int  |
| purchase_date | date |
| units         | int  |

This table may contain duplicate rows.
Each row of this table indicates the date, units, and product_id of each product sold.

Write a solution to find the average selling price for each product. `average_price` should be rounded to 2 decimal places. If a product does not have any sold units, its average selling price is assumed to be 0.

Return the result table in any order.

The result format is in the following example.
**Example 1:**

Input:

Prices table:

| product_id | start_date | end_date   | price |
|------------|------------|------------|-------|
| 1          | 2019-02-17 | 2019-02-28 | 5     |
| 1          | 2019-03-01 | 2019-03-22 | 20    |
| 2          | 2019-02-01 | 2019-02-20 | 15    |
| 2          | 2019-02-21 | 2019-03-31 | 30    |

UnitsSold table:

| product_id | purchase_date | units |
|------------|---------------|-------|
| 1          | 2019-02-25    | 100   |
| 1          | 2019-03-01    | 15    |
| 2          | 2019-02-10    | 200   |
| 2          | 2019-03-22    | 30    |

Output:

| product_id | average_price |
|------------|---------------|
| 1          | 6.96          |
| 2          | 16.96         |

Explanation:

* Average selling price = Total Price of Product / Number of products sold.
* Average selling price for product 1 = ((100 * 5) + (15 * 20)) / 115 = 6.96
* Average selling price for product 2 = ((200 * 15) + (30 * 30)) / 230 = 16.96
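The arithmetic in the explanation can be checked directly. The following is a minimal sketch in plain Python; the `sales` dictionary and the `weighted_average_price` helper are illustrative names, not part of the problem statement.

```python
# (price, units) pairs for each product, copied from Example 1
sales = {
    1: [(5, 100), (20, 15)],
    2: [(15, 200), (30, 30)],
}


def weighted_average_price(pairs):
    """Total revenue divided by total units, rounded to 2 decimals."""
    total_revenue = sum(price * units for price, units in pairs)
    total_units = sum(units for _, units in pairs)
    # A product with no sold units is assigned an average price of 0
    return round(total_revenue / total_units, 2) if total_units else 0.0


averages = {pid: weighted_average_price(pairs) for pid, pairs in sales.items()}
print(averages)  # {1: 6.96, 2: 16.96}
```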

## Solution Explanation
This problem requires us to calculate the average selling price for each product. The average selling price is defined as the total price of all units sold divided by the total number of units sold.

To solve this problem, we need to:

1. Join the `Prices` and `UnitsSold` tables to match each sale with its corresponding price.
2. For each product sale, find the price that was active on the purchase date.
3. Calculate the total revenue (price * units) for each product.
4. Calculate the total units sold for each product.
5. Compute the average price as total revenue / total units sold.
6. Handle the case where a product has no sales by setting its average price to 0.

The key insight is that we need to join the tables on `product_id` and ensure that `purchase_date` falls within the price's validity period (between `start_date` and `end_date`, inclusive).
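To make the join-then-filter idea concrete, here is a small sketch restricted to the product 1 rows from Example 1. The variable names `candidates`, `in_period`, and `matched` are illustrative only.

```python
import pandas as pd

# Product 1 rows from Example 1
prices = pd.DataFrame({
    "product_id": [1, 1],
    "start_date": pd.to_datetime(["2019-02-17", "2019-03-01"]),
    "end_date": pd.to_datetime(["2019-02-28", "2019-03-22"]),
    "price": [5, 20],
})
units_sold = pd.DataFrame({
    "product_id": [1, 1],
    "purchase_date": pd.to_datetime(["2019-02-25", "2019-03-01"]),
    "units": [100, 15],
})

# Joining on product_id alone pairs every sale with every price period
candidates = units_sold.merge(prices, on="product_id", how="left")

# The date condition keeps only the period that covers each purchase
in_period = (
    (candidates["purchase_date"] >= candidates["start_date"])
    & (candidates["purchase_date"] <= candidates["end_date"])
)
matched = candidates[in_period]

print(len(candidates), len(matched))  # 4 2
print(matched[["purchase_date", "price", "units"]])
```

Two sales and two price periods give four candidate pairs, and the date filter leaves exactly one price per sale because the periods of a product never overlap.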

In [None]:
import pandas as pd

def average_selling_price(prices: pd.DataFrame, units_sold: pd.DataFrame) -> pd.DataFrame:
    # Merge the two dataframes on product_id; the date range is applied next
    merged_df = pd.merge(
        units_sold,
        prices,
        on='product_id',
        how='left'
    )

    # Filter rows where purchase_date is within the price validity period
    merged_df = merged_df[
        (merged_df['purchase_date'] >= merged_df['start_date']) &
        (merged_df['purchase_date'] <= merged_df['end_date'])
    ]

    # Calculate revenue for each sale
    merged_df['revenue'] = merged_df['price'] * merged_df['units']

    # Group by product_id and calculate total revenue and total units
    result = merged_df.groupby('product_id').agg({
        'revenue': 'sum',
        'units': 'sum'
    }).reset_index()

    # Calculate average price
    result['average_price'] = result['revenue'] / result['units']

    # Round to 2 decimal places
    result['average_price'] = result['average_price'].round(2)

    # Get all unique product_ids from prices table
    all_products = prices['product_id'].unique()

    # Create a dataframe with all products
    all_products_df = pd.DataFrame({'product_id': all_products})

    # Left join with result to include products with no sales
    final_result = pd.merge(all_products_df, result, on='product_id', how='left')

    # Fill NaN values with 0 for products with no sales
    final_result['average_price'] = final_result['average_price'].fillna(0)

    # Select only required columns
    final_result = final_result[['product_id', 'average_price']]

    return final_result
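As a point of comparison, the same result can also be reached by starting the join from `Prices`, so that products without sales survive the merge and no second join is needed. This is only a sketch of one possible variant, not the solution above; the function name `average_selling_price_from_prices` and the `matched_units` column are hypothetical.

```python
import pandas as pd


def average_selling_price_from_prices(prices: pd.DataFrame,
                                      units_sold: pd.DataFrame) -> pd.DataFrame:
    # Start from prices so every product survives the join, even with no sales
    merged = prices.merge(units_sold, on="product_id", how="left")

    # True only when the sale falls inside this price period (NaT compares False)
    in_period = (
        (merged["purchase_date"] >= merged["start_date"])
        & (merged["purchase_date"] <= merged["end_date"])
    )

    # Count units only for matching rows; every other row contributes 0
    merged["matched_units"] = merged["units"].where(in_period, 0)
    merged["revenue"] = merged["price"] * merged["matched_units"]

    totals = merged.groupby("product_id", as_index=False)[
        ["revenue", "matched_units"]
    ].sum()

    # 0 / 0 yields NaN for products without matched sales, which becomes 0
    totals["average_price"] = (
        (totals["revenue"] / totals["matched_units"]).fillna(0).round(2)
    )
    return totals[["product_id", "average_price"]]


# Example 1 data plus a hypothetical product 3 that was never sold
prices = pd.DataFrame({
    "product_id": [1, 1, 2, 2, 3],
    "start_date": pd.to_datetime(
        ["2019-02-17", "2019-03-01", "2019-02-01", "2019-02-21", "2019-01-01"]),
    "end_date": pd.to_datetime(
        ["2019-02-28", "2019-03-22", "2019-02-20", "2019-03-31", "2019-12-31"]),
    "price": [5, 20, 15, 30, 99],
})
units_sold = pd.DataFrame({
    "product_id": [1, 1, 2, 2],
    "purchase_date": pd.to_datetime(
        ["2019-02-25", "2019-03-01", "2019-02-10", "2019-03-22"]),
    "units": [100, 15, 200, 30],
})

variant_result = average_selling_price_from_prices(prices, units_sold)
print(variant_result)
```

The trade-off is that the date mask has to zero out non-matching rows instead of dropping them, since dropping would remove unsold products again.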

## Time and Space Complexity
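Let P be the number of rows in the `Prices` table and U the number of rows in the `UnitsSold` table. The bounds below assume hash-based joins and grouping.

**Time Complexity**:

* Merging on `product_id` alone pairs every sale with every price period of the same product. If M is the number of rows in the merged dataframe (the sum, over products, of sales count times price-period count), the merge takes O(P + U + M) time. In the worst case, when all rows belong to one product, M = P · U.
* Filtering the merged dataframe takes O(M) time and leaves at most U rows, because the price periods of a product do not overlap.
* Calculating revenue and grouping by `product_id` takes O(U) time.
* The remaining operations (calculating average price, rounding, etc.) take O(P) time.
* Overall, the time complexity is O(P + U + M), which is O(P · U) in the worst case and close to O(P + U) when each product has only a few price periods.

**Space Complexity**:

* The merged dataframe requires O(M) space before filtering and O(U) space afterwards.
* The result dataframe requires O(P) space (in the worst case, each product has at least one sale).
* The `all_products_df` requires O(P) space.
* The `final_result` dataframe requires O(P) space.
* Overall, the space complexity is O(P + U + M).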
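The intermediate frame produced by merging on `product_id` alone holds one row per (sale, price period) pair of the same product, so its size is worth measuring. A minimal sketch with synthetic data follows; the helper `merged_row_count` and its parameters are hypothetical and exist only for this illustration.

```python
import pandas as pd


def merged_row_count(periods_per_product: int,
                     sales_per_product: int,
                     products: int = 1) -> int:
    """Build synthetic tables and count rows of the product_id-only merge."""
    prices = pd.DataFrame({
        "product_id": [p for p in range(products)
                       for _ in range(periods_per_product)],
        "price": 1,
    })
    units_sold = pd.DataFrame({
        "product_id": [p for p in range(products)
                       for _ in range(sales_per_product)],
        "units": 1,
    })
    # Same join as the solution, before the date filter is applied
    return len(units_sold.merge(prices, on="product_id", how="left"))


small = merged_row_count(periods_per_product=3, sales_per_product=4)
larger = merged_row_count(periods_per_product=10, sales_per_product=50, products=2)
print(small, larger)  # 12 1000
```

The count grows with the product of periods and sales per product, which is why the date filter matters for keeping the later steps small.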

## Test Cases


In [None]:
import pandas as pd
import numpy as np

# Test Case 1: Example from the problem statement
def test_example_case():
    prices_data = {
        'product_id': [1, 1, 2, 2],
        'start_date': pd.to_datetime(['2019-02-17', '2019-03-01', '2019-02-01', '2019-02-21']),
        'end_date': pd.to_datetime(['2019-02-28', '2019-03-22', '2019-02-20', '2019-03-31']),
        'price': [5, 20, 15, 30]
    }

    units_sold_data = {
        'product_id': [1, 1, 2, 2],
        'purchase_date': pd.to_datetime(['2019-02-25', '2019-03-01', '2019-02-10', '2019-03-22']),
        'units': [100, 15, 200, 30]
    }

    prices = pd.DataFrame(prices_data)
    units_sold = pd.DataFrame(units_sold_data)

    result = average_selling_price(prices, units_sold)

    expected = pd.DataFrame({
        'product_id': [1, 2],
        'average_price': [6.96, 16.96]
    })

    pd.testing.assert_frame_equal(result.sort_values('product_id').reset_index(drop=True),
                                  expected.sort_values('product_id').reset_index(drop=True),
                                  check_dtype=False)
    print("Test Case 1 passed!")

# Test Case 2: Product with no sales
def test_no_sales():
    prices_data = {
        'product_id': [1, 2],
        'start_date': pd.to_datetime(['2019-02-17', '2019-02-01']),
        'end_date': pd.to_datetime(['2019-02-28', '2019-02-20']),
        'price': [5, 15]
    }

    units_sold_data = {
        'product_id': [1],
        'purchase_date': pd.to_datetime(['2019-02-25']),
        'units': [100]
    }

    prices = pd.DataFrame(prices_data)
    units_sold = pd.DataFrame(units_sold_data)

    result = average_selling_price(prices, units_sold)

    expected = pd.DataFrame({
        'product_id': [1, 2],
        'average_price': [5.0, 0.0]
    })

    pd.testing.assert_frame_equal(result.sort_values('product_id').reset_index(drop=True),
                                  expected.sort_values('product_id').reset_index(drop=True),
                                  check_dtype=False)
    print("Test Case 2 passed!")

# Test Case 3: Multiple price periods for the same product
def test_multiple_price_periods():
    prices_data = {
        'product_id': [1, 1, 1],
        'start_date': pd.to_datetime(['2019-01-01', '2019-02-01', '2019-03-01']),
        'end_date': pd.to_datetime(['2019-01-31', '2019-02-28', '2019-03-31']),
        'price': [10, 20, 30]
    }

    units_sold_data = {
        'product_id': [1, 1, 1],
        'purchase_date': pd.to_datetime(['2019-01-15', '2019-02-15', '2019-03-15']),
        'units': [50, 100, 150]
    }

    prices = pd.DataFrame(prices_data)
    units_sold = pd.DataFrame(units_sold_data)

    result = average_selling_price(prices, units_sold)

    # Calculate expected average price: (50*10 + 100*20 + 150*30) / (50+100+150) = 23.33
    expected = pd.DataFrame({
        'product_id': [1],
        'average_price': [23.33]
    })

    pd.testing.assert_frame_equal(result.sort_values('product_id').reset_index(drop=True),
                                  expected.sort_values('product_id').reset_index(drop=True),
                                  check_dtype=False)
    print("Test Case 3 passed!")

# Run the tests
test_example_case()
test_no_sales()
test_multiple_price_periods()