
In [None]:
# Import required libraries
import numpy as np
import pandas as pd
import pathway as pw
from datetime import datetime, timedelta
import math
from bokeh.plotting import figure, show
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.layouts import gridplot
from bokeh.io import output_notebook
output_notebook()

## Data Loading and Preprocessing

In [None]:
# Load the dataset (assuming it's available as 'dataset.csv')
# In a real implementation, this would be replaced with Pathway's streaming input
df = pd.read_csv('dataset.csv')

# Display basic info about the dataset
print(f"Dataset shape: {df.shape}")
print("\nColumns:", df.columns.tolist())
print("\nSample data:")
df.head()

Dataset shape: (18368, 12)

Columns: ['ID', 'SystemCodeNumber', 'Capacity', 'Latitude', 'Longitude', 'Occupancy', 'VehicleType', 'TrafficConditionNearby', 'QueueLength', 'IsSpecialDay', 'LastUpdatedDate', 'LastUpdatedTime']

Sample data:


| | ID | SystemCodeNumber | Capacity | Latitude | Longitude | Occupancy | VehicleType | TrafficConditionNearby | QueueLength | IsSpecialDay | LastUpdatedDate | LastUpdatedTime |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | BHMBCCMKT01 | 577 | 26.144536 | 91.736172 | 61 | car | low | 1 | 0 | 04-10-2016 | 07:59:00 |
| 1 | 1 | BHMBCCMKT01 | 577 | 26.144536 | 91.736172 | 64 | car | low | 1 | 0 | 04-10-2016 | 08:25:00 |
| 2 | 2 | BHMBCCMKT01 | 577 | 26.144536 | 91.736172 | 80 | car | low | 2 | 0 | 04-10-2016 | 08:59:00 |
| 3 | 3 | BHMBCCMKT01 | 577 | 26.144536 | 91.736172 | 107 | car | low | 2 | 0 | 04-10-2016 | 09:32:00 |
| 4 | 4 | BHMBCCMKT01 | 577 | 26.144536 | 91.736172 | 150 | bike | low | 2 | 0 | 04-10-2016 | 09:59:00 |


## Helper Functions

# Task
Generate price predictions using either the `BaselineLinearModel` or the `DemandBasedModel` and visualize the results.

## Select a model

### Subtask:
Choose either the `BaselineLinearModel` or the `DemandBasedModel` to generate predictions.


**Reasoning**:
The `DemandBasedModel` uses six features from the dataset (`Occupancy`, `Capacity`, `QueueLength`, `TrafficConditionNearby`, `IsSpecialDay`, `VehicleType`), whereas the `BaselineLinearModel` uses only `Occupancy` and `Capacity`. Because it draws on more of the information available in the data, the `DemandBasedModel` is the better fit for generating price predictions, so it is initialized here.
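
The class definitions are not shown in this notebook, so the following is only a minimal sketch of what two such models could look like. The class names `SketchBaselineLinearModel` and `SketchDemandBasedModel`, the weights, the base price, and the `tanh` squashing are all assumptions made for illustration; only the method names and keyword arguments follow the calls used later in the notebook.

```python
import math


class SketchBaselineLinearModel:
    """Hypothetical stand-in: price grows linearly with the occupancy rate."""

    def __init__(self, base_price=10.0, alpha=5.0):
        self.base_price = base_price  # assumed starting price
        self.alpha = alpha            # assumed price increase at full occupancy

    def calculate_price(self, occupancy, capacity):
        rate = occupancy / capacity if capacity > 0 else 0.0
        return self.base_price + self.alpha * rate


class SketchDemandBasedModel:
    """Hypothetical stand-in: combine several features into a bounded demand score."""

    # Assumed weights; vehicle types not listed here fall back to 1.0.
    VEHICLE_WEIGHTS = {"car": 1.0, "bike": 0.5}

    def __init__(self, base_price=10.0, sensitivity=1.0):
        self.base_price = base_price
        self.sensitivity = sensitivity

    def calculate_demand(self, occupancy, capacity, queue_length,
                         traffic_level, is_special_day, vehicle_type):
        rate = occupancy / capacity if capacity > 0 else 0.0
        # Assumed coefficients for the queue, traffic, and special-day terms.
        raw = rate + 0.1 * queue_length + 0.2 * traffic_level + 0.5 * is_special_day
        raw *= self.VEHICLE_WEIGHTS.get(vehicle_type, 1.0)
        # tanh keeps the score below 1 so a single extreme input cannot dominate.
        return math.tanh(raw)

    def calculate_price(self, demand_score):
        price = self.base_price * (1.0 + self.sensitivity * demand_score)
        # Clamp to an assumed band of 0.5x to 2x the base price.
        return min(max(price, 0.5 * self.base_price), 2.0 * self.base_price)


sketch_model = SketchDemandBasedModel()
quiet = sketch_model.calculate_price(
    sketch_model.calculate_demand(61, 577, 1, 0, 0, "car"))
busy = sketch_model.calculate_price(
    sketch_model.calculate_demand(500, 577, 8, 2, 1, "car"))
print(f"quiet lot: {quiet:.2f}, busy lot: {busy:.2f}")
```

The sketch will not reproduce the prices printed later in this notebook, since the real coefficients are unknown; it only illustrates how extra features can feed a demand score that the baseline ignores.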



In [13]:
# Choose the DemandBasedModel for generating predictions
# (the class must already be defined before this cell runs)
chosen_model_name = "DemandBasedModel"
model = DemandBasedModel()

print(f"Chosen model: {chosen_model_name}")

Chosen model: DemandBasedModel


## Prepare data for prediction

### Subtask:
Select the columns the chosen model needs from the dataframe and apply any required transformations or feature engineering.


**Reasoning**:
Select the six columns the `DemandBasedModel` uses, encode `TrafficConditionNearby` as an integer, and cast `IsSpecialDay` to an integer type. `VehicleType` stays a string and is passed to the model as-is.



In [14]:
# Select the necessary columns for the DemandBasedModel
df_prediction_data = df[['Occupancy', 'Capacity', 'QueueLength', 'TrafficConditionNearby', 'IsSpecialDay', 'VehicleType']].copy()

# Encode 'TrafficConditionNearby' as an integer level
traffic_mapping = {'low': 0, 'average': 1, 'high': 2}
# Labels missing from the mapping become NaN; fill them with 0 (treated as 'low')
df_prediction_data['TrafficConditionNearby'] = (
    df_prediction_data['TrafficConditionNearby'].map(traffic_mapping).fillna(0).astype(int)
)

# Ensure 'IsSpecialDay' is of integer type
df_prediction_data['IsSpecialDay'] = df_prediction_data['IsSpecialDay'].astype(int)

# Display the first few rows of the new DataFrame and its info
print("Prediction data sample:")
display(df_prediction_data.head())
print("\nPrediction data info:")
df_prediction_data.info()

Prediction data sample:


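| | Occupancy | Capacity | QueueLength | TrafficConditionNearby | IsSpecialDay | VehicleType |
|---|---|---|---|---|---|---|
| 0 | 61 | 577 | 1 | 0 | 0 | car |
| 1 | 64 | 577 | 1 | 0 | 0 | car |
| 2 | 80 | 577 | 2 | 0 | 0 | car |
| 3 | 107 | 577 | 2 | 0 | 0 | car |
| 4 | 150 | 577 | 2 | 0 | 0 | bike |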



Prediction data info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18368 entries, 0 to 18367
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   Occupancy               18368 non-null  int64 
 1   Capacity                18368 non-null  int64 
 2   QueueLength             18368 non-null  int64 
 3   TrafficConditionNearby  18368 non-null  int64 
 4   IsSpecialDay            18368 non-null  int64 
 5   VehicleType             18368 non-null  object
dtypes: int64(5), object(1)
memory usage: 861.1+ KB
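
Filling unmapped labels with 0 means a misspelled or unexpected traffic label is silently treated as `low`. A minimal sketch of how such labels could be surfaced before they are filled is shown here; the example labels and the whitespace and case normalization are assumptions for illustration and are not taken from the dataset.

```python
import pandas as pd

traffic_mapping = {"low": 0, "average": 1, "high": 2}

# Hypothetical labels, including one with stray casing/whitespace and one unknown value.
labels = pd.Series(["low", "high", "AVERAGE ", "unknown"])

# Normalize before mapping so trivial formatting differences still match.
normalized = labels.str.strip().str.lower()
mapped = normalized.map(traffic_mapping)

# Anything still NaN after mapping is a label the mapping does not cover.
unmapped = sorted(normalized[mapped.isna()].unique())
print("Unmapped labels:", unmapped)

# Only now fall back to 0, with the unmapped labels already reported.
codes = mapped.fillna(0).astype(int)
print(codes.tolist())
```

Reporting the unmapped labels first makes the fallback to 0 a visible decision rather than a hidden one.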


## Generate predictions

### Subtask:
Apply the chosen model's prediction method to the prepared data to generate price predictions.


**Reasoning**:
Iterate through the prediction data, calculate demand, and then predict the price using the chosen model.



In [15]:
predicted_prices = []

# itertuples() avoids building a Series per row, which iterrows() does
for row in df_prediction_data.itertuples(index=False):
    # Score demand from the row's features
    demand_score = model.calculate_demand(
        occupancy=row.Occupancy,
        capacity=row.Capacity,
        queue_length=row.QueueLength,
        traffic_level=row.TrafficConditionNearby,
        is_special_day=row.IsSpecialDay,
        vehicle_type=row.VehicleType
    )
    # Convert the demand score to a price
    predicted_price = model.calculate_price(demand_score)
    predicted_prices.append(predicted_price)

print(f"Generated {len(predicted_prices)} price predictions.")

Generated 18368 price predictions.


## Combine data and predictions

### Subtask:
Add the generated predictions as a new column to the original dataframe or a new dataframe for easier comparison and visualization.


**Reasoning**:
Add the generated predictions as a new column to the original dataframe and display the head of the updated dataframe.



In [16]:
df['PredictedPrice'] = predicted_prices

print("DataFrame with PredictedPrice column:")
display(df.head())

DataFrame with PredictedPrice column:


| | ID | SystemCodeNumber | Capacity | Latitude | Longitude | Occupancy | VehicleType | TrafficConditionNearby | QueueLength | IsSpecialDay | LastUpdatedDate | LastUpdatedTime | PredictedPrice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | BHMBCCMKT01 | 577 | 26.144536 | 91.736172 | 61 | car | low | 1 | 0 | 04-10-2016 | 07:59:00 | 12.893219 |
| 1 | 1 | BHMBCCMKT01 | 577 | 26.144536 | 91.736172 | 64 | car | low | 1 | 0 | 04-10-2016 | 08:25:00 | 12.898419 |
| 2 | 2 | BHMBCCMKT01 | 577 | 26.144536 | 91.736172 | 80 | car | low | 2 | 0 | 04-10-2016 | 08:59:00 | 12.963648 |
| 3 | 3 | BHMBCCMKT01 | 577 | 26.144536 | 91.736172 | 107 | car | low | 2 | 0 | 04-10-2016 | 09:32:00 | 13.010442 |
| 4 | 4 | BHMBCCMKT01 | 577 | 26.144536 | 91.736172 | 150 | bike | low | 2 | 0 | 04-10-2016 | 09:59:00 | 13.009965 |


## Visualize results

### Subtask:
Create visualizations that compare the predicted prices with other relevant features. The dataset has no actual-price column, so a predicted-versus-actual comparison is not possible here.


**Reasoning**:
Create a Bokeh scatter plot of the predicted prices and occupancy over time, using the row index as a proxy for time.



In [17]:
# Create a ColumnDataSource; the DataFrame's unnamed index is exposed as the 'index' column
source = ColumnDataSource(df)

# Create a Bokeh figure
p = figure(
    height=300,
    width=800,
    title="Predicted Price and Occupancy over Time",
    x_axis_label="Index",
    y_axis_label="Value"
)

# Add scatter glyph for PredictedPrice
p.scatter(
    x='index',
    y='PredictedPrice',
    source=source,
    legend_label="Predicted Price",
    color="navy",
    alpha=0.6
)

# Add scatter glyph for Occupancy
p.scatter(
    x='index',
    y='Occupancy',
    source=source,
    legend_label="Occupancy",
    color="firebrick",
    alpha=0.6
)

# Add tooltips
p.add_tools(HoverTool(
    tooltips=[
        ("Index", "@index"),
        ("Predicted Price", "@PredictedPrice"),
        ("Occupancy", "@Occupancy")
    ]
))

# Place the legend and let a click on an entry hide that series
p.legend.location = "top_left"
p.legend.click_policy = "hide"

# Show the plot
show(p)
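
In the sample rows the predicted prices sit near 13 while occupancy runs from 61 to 150, so on a shared y-axis the price series appears almost flat. One possible alternative, sketched here under assumptions, is to give occupancy its own right-hand axis using Bokeh's `extra_y_ranges`. The `demo` frame holds a few values rounded from the sample rows above, and the names `demo`, `source_demo`, and `p_dual` are introduced only for this illustration.

```python
import pandas as pd
from bokeh.models import ColumnDataSource, LinearAxis, Range1d
from bokeh.plotting import figure

# A few values rounded from the sample rows shown earlier (illustration only).
demo = pd.DataFrame({
    "PredictedPrice": [12.89, 12.90, 12.96, 13.01],
    "Occupancy": [61, 64, 80, 107],
})
source_demo = ColumnDataSource(demo)  # the unnamed index becomes the 'index' column

p_dual = figure(
    height=300,
    width=800,
    title="Predicted Price and Occupancy (separate y-axes)",
    x_axis_label="Index",
    y_axis_label="Predicted Price",
)

# Left axis: a tight range around the prices so small changes stay visible.
p_dual.y_range = Range1d(
    start=float(demo["PredictedPrice"].min()) - 0.1,
    end=float(demo["PredictedPrice"].max()) + 0.1,
)

# Right axis: a separate named range for occupancy.
p_dual.extra_y_ranges = {
    "occupancy": Range1d(start=0, end=float(demo["Occupancy"].max()) * 1.1)
}
p_dual.add_layout(LinearAxis(y_range_name="occupancy", axis_label="Occupancy"), "right")

p_dual.scatter(x="index", y="PredictedPrice", source=source_demo,
               color="navy", alpha=0.6, legend_label="Predicted Price")
p_dual.scatter(x="index", y="Occupancy", source=source_demo,
               color="firebrick", alpha=0.6, legend_label="Occupancy",
               y_range_name="occupancy")

# Call show(p_dual) in a notebook to render the figure.
```

With the full DataFrame, the same pattern would apply by passing `df` to `ColumnDataSource` and sizing both ranges from the complete columns.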

## Summary:

### Data Analysis Key Findings

*   The `DemandBasedModel` was chosen for price prediction because it uses six dataset features, whereas the `BaselineLinearModel` uses only `Occupancy` and `Capacity`.
*   The data was prepared for the `DemandBasedModel` by selecting the required columns, encoding `TrafficConditionNearby` as integers (low: 0, average: 1, high: 2), and casting `IsSpecialDay` to an integer type.
*   Price predictions were generated row by row using the `DemandBasedModel`'s `calculate_demand` and `calculate_price` methods.
*   The resulting `PredictedPrice` column was added to the original DataFrame for comparison and visualization.
*   A Bokeh scatter plot shows `PredictedPrice` and `Occupancy` over time (represented by the row index).

### Insights or Next Steps

*   The current visualization compares predicted price with occupancy. The dataset has no actual-price column, so the model's performance cannot be evaluated directly; if observed prices become available, plotting them against the predicted prices would show how well the model performs.
*   Consider evaluating the model's accuracy using relevant metrics (e.g., Mean Absolute Error, Mean Squared Error) to quantify its performance.
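
As a minimal sketch of the metrics mentioned above, the following computes Mean Absolute Error and Mean Squared Error with NumPy. The `observed` prices are invented for illustration, since the dataset in this notebook contains no actual prices, and `regression_errors` is a hypothetical helper name.

```python
import numpy as np


def regression_errors(actual, predicted):
    """Return MAE and MSE for two equally sized sequences of prices."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if actual.shape != predicted.shape:
        raise ValueError("actual and predicted must have the same shape")
    residuals = predicted - actual
    return {
        "mae": float(np.mean(np.abs(residuals))),  # average absolute miss
        "mse": float(np.mean(residuals ** 2)),     # penalizes large misses more
    }


# Hypothetical observed prices; no such column exists in this dataset.
observed = [12.0, 13.0, 14.0]
predicted = [12.5, 13.0, 13.0]

errors = regression_errors(observed, predicted)
print(errors)
```

With real observed prices, `predicted` would be the `PredictedPrice` column and `observed` the matching price column.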


In [19]:


# Create a Bokeh figure for QueueLength vs PredictedPrice
p_queue = figure(
    height=300,
    width=800,
    title="Predicted Price and Queue Length over Time",
    x_axis_label="Index",
    y_axis_label="Value"
)

# Add scatter glyph for PredictedPrice
p_queue.scatter(
    x='index',
    y='PredictedPrice',
    source=source,
    legend_label="Predicted Price",
    color="navy",
    alpha=0.6
)

# Add scatter glyph for QueueLength
p_queue.scatter(
    x='index',
    y='QueueLength',
    source=source,
    legend_label="Queue Length",
    color="green",
    alpha=0.6
)

# Add tooltips
p_queue.add_tools(HoverTool(
    tooltips=[
        ("Index", "@index"),
        ("Predicted Price", "@PredictedPrice"),
        ("Queue Length", "@QueueLength")
    ]
))

# Place the legend and let a click on an entry hide that series
p_queue.legend.location = "top_left"
p_queue.legend.click_policy = "hide"

# Show the plot
show(p_queue)