# Dynamic Pricing Strategy using Python

## What is Dynamic Pricing?

Dynamic Pricing is an application of Data Science that involves adjusting product or service prices based on various factors in real time. It is employed by businesses to optimize their revenue and profitability by setting flexible prices that respond to market demand, customer behaviour, and competitor pricing.

Using data-driven insights and algorithms, businesses can dynamically modify prices to achieve the most favourable outcomes.

For example, consider a ride-sharing company operating in a metropolitan area. The company wants to optimize its pricing strategy to maximize revenue and improve customer satisfaction. The traditional pricing model used by the business is based on fixed rates per kilometre, which does not account for fluctuations in supply and demand.

By implementing a dynamic pricing strategy, the company can leverage data science techniques to analyze various factors such as historical trip data, real-time demand, traffic patterns, and events happening in the area.

Using Machine Learning algorithms, the company can analyze this data and adjust its prices in real time. When demand is high, such as during rush hours or major events, the algorithm can increase the cost of rides to incentivize more drivers to be available and bring supply back in line with demand. Conversely, during periods of low demand, the algorithm can lower prices to attract more customers.

## Dynamic Pricing Strategy: Overview

In a dynamic pricing strategy, the aim is to maximize revenue and profitability by pricing items at a level that balances supply and demand. It allows businesses to adjust prices based on factors like time of day, day of the week, customer segments, inventory levels, seasonal fluctuations, competitor pricing, and market conditions.

To implement a data-driven dynamic pricing strategy, businesses typically require data that provides insights into customer behaviour, market trends, and other influencing factors. Such a dataset usually draws on:

<ul>
 <li>historical sales data</li>
 <li>customer purchase patterns</li>
 <li>market demand forecasts</li>
 <li>cost data</li>
 <li>customer segmentation data</li>
 <li>real-time market data</li>
 </ul>

### Importing the necessary Python libraries and the dataset:

In [1]:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
data=pd.read_csv("C:\\Users\\megha\\dynamic_pricing.csv")

In [2]:
data.head()

| | Number_of_Riders | Number_of_Drivers | Location_Category | Customer_Loyalty_Status | Number_of_Past_Rides | Average_Ratings | Time_of_Booking | Vehicle_Type | Expected_Ride_Duration | Historical_Cost_of_Ride |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 90 | 45 | Urban | Silver | 13 | 4.47 | Night | Premium | 90 | 284.257273 |
| 1 | 58 | 39 | Suburban | Silver | 72 | 4.06 | Evening | Economy | 43 | 173.874753 |
| 2 | 42 | 31 | Rural | Silver | 0 | 3.99 | Afternoon | Premium | 76 | 329.795469 |
| 3 | 89 | 28 | Rural | Regular | 67 | 4.31 | Afternoon | Premium | 134 | 470.201232 |
| 4 | 78 | 22 | Rural | Regular | 74 | 3.77 | Afternoon | Economy | 149 | 579.681422 |


### Let’s have a look at the descriptive statistics of the data:

In [3]:
data.describe()

| | Number_of_Riders | Number_of_Drivers | Number_of_Past_Rides | Average_Ratings | Expected_Ride_Duration | Historical_Cost_of_Ride |
|---|---|---|---|---|---|---|
| count | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 |
| mean | 60.372 | 27.076 | 50.031 | 4.25722 | 99.588 | 372.502623 |
| std | 23.701506 | 19.068346 | 29.313774 | 0.435781 | 49.16545 | 187.158756 |
| min | 20.0 | 5.0 | 0.0 | 3.5 | 10.0 | 25.993449 |
| 25% | 40.0 | 11.0 | 25.0 | 3.87 | 59.75 | 221.365202 |
| 50% | 60.0 | 22.0 | 51.0 | 4.27 | 102.0 | 362.019426 |
| 75% | 81.0 | 38.0 | 75.0 | 4.6325 | 143.0 | 510.497504 |
| max | 100.0 | 89.0 | 100.0 | 5.0 | 180.0 | 836.116419 |


### Now let’s have a look at the relationship between expected ride duration and the historical cost of the ride:

In [28]:
fig=px.scatter(data,x='Expected_Ride_Duration',y='Historical_Cost_of_Ride',
               title='Expected Ride Duration vs. Historical Cost of Ride',trendline='ols')
fig.show()
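
Note that `trendline='ols'` in Plotly Express relies on the `statsmodels` package being installed. If you want to put a number on the fitted line without it, a least-squares fit with `numpy.polyfit` is one option. The sketch below uses only the five rows shown by `data.head()` as illustrative input, so its coefficients are not those of the full dataset:

```python
import numpy as np

# The five rows shown by data.head(); illustrative only, not the full dataset
duration = np.array([90, 43, 76, 134, 149], dtype=float)
cost = np.array([284.257273, 173.874753, 329.795469, 470.201232, 579.681422])

# Degree-1 least-squares fit: cost ~ slope * duration + intercept
slope, intercept = np.polyfit(duration, cost, 1)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

On the full data you would pass `data['Expected_Ride_Duration']` and `data['Historical_Cost_of_Ride']` instead of the two sample arrays.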

### Now let’s have a look at the distribution of the historical cost of rides based on the vehicle type:

In [5]:
fig=px.box(data,x='Vehicle_Type',y='Historical_Cost_of_Ride',title='Historical Cost of Ride Distribution by Vehicle Type')
fig.show()
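
A box plot is easier to read alongside the numbers behind it. As a minimal sketch, the same comparison can be tabulated with `groupby`; the sample below contains only the five rows from `data.head()`, so the figures are illustrative rather than representative:

```python
import pandas as pd

# Vehicle type and historical cost for the five rows shown by data.head()
sample = pd.DataFrame({
    "Vehicle_Type": ["Premium", "Economy", "Premium", "Premium", "Economy"],
    "Historical_Cost_of_Ride": [284.257273, 173.874753, 329.795469,
                                470.201232, 579.681422],
})

# Count, median and mean cost per vehicle type
summary = (sample.groupby("Vehicle_Type")["Historical_Cost_of_Ride"]
                 .agg(["count", "median", "mean"]))
print(summary)
```

Replacing `sample` with the full `data` frame gives the per-type statistics that the box plot summarizes visually.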

### Now let’s have a look at the correlation matrix:

In [6]:
corr_matrix=data.corr(numeric_only = True)
corr_matrix

| | Number_of_Riders | Number_of_Drivers | Number_of_Past_Rides | Average_Ratings | Expected_Ride_Duration | Historical_Cost_of_Ride |
|---|---|---|---|---|---|---|
| Number_of_Riders | 1.0 | 0.627016 | 0.029265 | 0.008572 | -0.015856 | 0.005826 |
| Number_of_Drivers | 0.627016 | 1.0 | 0.03966 | 0.041204 | -0.024418 | 0.017082 |
| Number_of_Past_Rides | 0.029265 | 0.03966 | 1.0 | -0.064734 | 0.030679 | 0.035859 |
| Average_Ratings | 0.008572 | 0.041204 | -0.064734 | 1.0 | -0.016968 | -0.001063 |
| Expected_Ride_Duration | -0.015856 | -0.024418 | 0.030679 | -0.016968 | 1.0 | 0.927547 |
| Historical_Cost_of_Ride | 0.005826 | 0.017082 | 0.035859 | -0.001063 | 0.927547 | 1.0 |


In [7]:
fig=go.Figure(data=go.Heatmap(z=corr_matrix.values,x=corr_matrix.columns,y=corr_matrix.columns,colorscale='Viridis'))
fig.update_layout(title='Correlation Matrix')
fig.show()
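
To read the matrix with the pricing question in mind, it can help to rank the features by the strength of their correlation with the historical cost. The sketch below copies the `Historical_Cost_of_Ride` column from the correlation matrix shown earlier into a Series; the variable names are illustrative:

```python
import pandas as pd

# Correlations with Historical_Cost_of_Ride, copied from the correlation matrix
corr_with_cost = pd.Series({
    "Number_of_Riders": 0.005826,
    "Number_of_Drivers": 0.017082,
    "Number_of_Past_Rides": 0.035859,
    "Average_Ratings": -0.001063,
    "Expected_Ride_Duration": 0.927547,
})

# Rank by absolute correlation, strongest first
ranked = corr_with_cost.abs().sort_values(ascending=False)
print(ranked)
```

With the notebook's own `corr_matrix`, the equivalent input would be `corr_matrix['Historical_Cost_of_Ride'].drop('Historical_Cost_of_Ride')`. Expected ride duration stands out, while the other numeric features are close to uncorrelated with the historical cost.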

### Implementing a dynamic pricing strategy:

The data provided by the company states that the company uses a pricing model that only takes the expected ride duration as a factor to determine the price for a ride. Now, we will implement a dynamic pricing strategy aiming to adjust the ride costs dynamically based on the demand and supply levels observed in the data. It will capture high-demand periods and low-supply scenarios to increase prices, while low-demand periods and high-supply situations will lead to price reductions.

In [8]:
import numpy as np

# Calculate demand_multiplier based on percentile for high and low demand
high_demand_percentile=75
low_demand_percentile=25

data['demand_multiplier']=np.where(data['Number_of_Riders']>np.percentile(data['Number_of_Riders'],high_demand_percentile),
                                   data['Number_of_Riders']/np.percentile(data['Number_of_Riders'],high_demand_percentile),
                                   data['Number_of_Riders']/np.percentile(data['Number_of_Riders'],low_demand_percentile))

# Calculate supply_multiplier based on percentile for high and low supply
high_supply_percentile=75
low_supply_percentile=25

data['supply_multiplier']=np.where(data['Number_of_Drivers']>np.percentile(data['Number_of_Drivers'],low_supply_percentile),
                                   np.percentile(data['Number_of_Drivers'], high_supply_percentile)/data['Number_of_Drivers'],
                                   np.percentile(data['Number_of_Drivers'],low_supply_percentile)/data['Number_of_Drivers'])

# Define thresholds for the multipliers. Only demand_threshold_low and
# supply_threshold_high are used below, and both act as floors of 0.8.
demand_threshold_high = 1.2  # Higher demand threshold (not used below)
demand_threshold_low = 0.8  # Floor applied to demand_multiplier
supply_threshold_high = 0.8  # Floor applied to supply_multiplier
supply_threshold_low = 1.2  # Lower supply threshold (not used below)

# Calculate adjusted_ride_cost for dynamic pricing: scale the historical cost
# by both multipliers, each floored at its threshold via np.maximum
data['adjusted_ride_cost'] = data['Historical_Cost_of_Ride'] * (np.maximum(data['demand_multiplier'], demand_threshold_low) *
                             np.maximum(data['supply_multiplier'], supply_threshold_high))

In [9]:
data['demand_multiplier']

0      1.111111
1      1.450000
2      1.050000
3      1.098765
4      1.950000
         ...   
995    0.825000
996    1.037037
997    1.100000
998    1.325000
999    1.950000
Name: demand_multiplier, Length: 1000, dtype: float64

In [10]:
data['supply_multiplier']

0      0.844444
1      0.974359
2      1.225806
3      1.357143
4      1.727273
         ...   
995    1.652174
996    1.310345
997    1.833333
998    1.407407
999    0.603175
Name: supply_multiplier, Length: 1000, dtype: float64

In [11]:
data['adjusted_ride_cost']

0       266.710528
1       245.653817
2       424.478684
3       701.155452
4      1952.472427
          ...     
995     124.567897
996     576.375440
997     317.352408
998     520.460581
999    1021.901565
Name: adjusted_ride_cost, Length: 1000, dtype: float64

In the above code, we first calculated the demand multiplier by comparing the number of riders to the 75th percentile, which represents high demand. If the number of riders exceeds that percentile, the demand multiplier is the number of riders divided by the 75th percentile. Otherwise, for every remaining ride and not only those below the low-demand percentile, the demand multiplier is the number of riders divided by the 25th percentile.

Next, we calculated the supply multiplier by comparing the number of drivers to the 25th percentile, which represents low supply. If the number of drivers exceeds the low-supply percentile, the supply multiplier is the high-supply (75th) percentile divided by the number of drivers. If the number of drivers is at or below the low-supply percentile, the supply multiplier is the low-supply percentile divided by the number of drivers.

Finally, we calculated the adjusted ride cost for dynamic pricing. It multiplies the historical cost of the ride by the maximum of the demand multiplier and demand_threshold_low, and also by the maximum of the supply multiplier and supply_threshold_high. Both thresholds are set to 0.8, so they act as floors: neither multiplier can pull the price down by more than 20%, while price increases are not capped. The other two thresholds, demand_threshold_high and supply_threshold_low, are defined but not used in the calculation.
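
To make the arithmetic concrete, here is a minimal sketch that replays the same logic for a single ride in plain Python. The function name `adjusted_cost` and its parameters are illustrative, and the default percentiles (40 and 81 riders, 11 and 38 drivers) are taken from the `data.describe()` output shown earlier:

```python
def adjusted_cost(riders, drivers, historical_cost,
                  riders_p25=40.0, riders_p75=81.0,
                  drivers_p25=11.0, drivers_p75=38.0,
                  demand_floor=0.8, supply_floor=0.8):
    """Replay the multiplier logic for one ride (illustrative helper)."""
    # Demand: divide by the 75th percentile above it, by the 25th otherwise
    if riders > riders_p75:
        demand_multiplier = riders / riders_p75
    else:
        demand_multiplier = riders / riders_p25

    # Supply: 75th percentile / drivers when drivers exceed the 25th
    # percentile, 25th percentile / drivers otherwise
    if drivers > drivers_p25:
        supply_multiplier = drivers_p75 / drivers
    else:
        supply_multiplier = drivers_p25 / drivers

    # Both multipliers are floored before scaling the historical cost
    return (historical_cost
            * max(demand_multiplier, demand_floor)
            * max(supply_multiplier, supply_floor))


# Row 0 of the dataset: 90 riders, 45 drivers, historical cost 284.257273
print(adjusted_cost(90, 45, 284.257273))
```

For row 0 the multipliers are 90/81 and 38/45, so the result should agree with the adjusted cost of about 266.71 shown in the output above. The small loss on that ride comes from the supply multiplier falling below 1.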

### Now let’s calculate the profit percentage we got after implementing this dynamic pricing strategy:

In [12]:
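# Calculate the profit percentage for each ride, i.e. the percentage change of
# the adjusted cost relative to the historical cost (no operating costs involved)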
data['profit_percentage'] = ((data['adjusted_ride_cost']-data['Historical_Cost_of_Ride'])/data['Historical_Cost_of_Ride'])*100

# Identify profitable rides where profit percentage is positive
profitable_rides = data[data['profit_percentage'] > 0]

# Identify loss rides where profit percentage is negative
loss_rides = data[data['profit_percentage'] < 0]

In [13]:
data['profit_percentage']

0       -6.172840
1       41.282051
2       28.709677
3       49.118166
4      236.818182
          ...    
995     36.304348
996     35.887612
997    101.666667
998     86.481481
999     56.000000
Name: profit_percentage, Length: 1000, dtype: float64

In [14]:
profitable_rides

| | Number_of_Riders | Number_of_Drivers | Location_Category | Customer_Loyalty_Status | Number_of_Past_Rides | Average_Ratings | Time_of_Booking | Vehicle_Type | Expected_Ride_Duration | Historical_Cost_of_Ride | demand_multiplier | supply_multiplier | adjusted_ride_cost | profit_percentage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 58 | 39 | Suburban | Silver | 72 | 4.06 | Evening | Economy | 43 | 173.874753 | 1.450000 | 0.974359 | 245.653817 | 41.282051 |
| 2 | 42 | 31 | Rural | Silver | 0 | 3.99 | Afternoon | Premium | 76 | 329.795469 | 1.050000 | 1.225806 | 424.478684 | 28.709677 |
| 3 | 89 | 28 | Rural | Regular | 67 | 4.31 | Afternoon | Premium | 134 | 470.201232 | 1.098765 | 1.357143 | 701.155452 | 49.118166 |
| 4 | 78 | 22 | Rural | Regular | 74 | 3.77 | Afternoon | Economy | 149 | 579.681422 | 1.950000 | 1.727273 | 1952.472427 | 236.818182 |
| 5 | 59 | 35 | Urban | Silver | 83 | 3.51 | Night | Economy | 128 | 339.955361 | 1.475000 | 1.085714 | 544.414227 | 60.142857 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 995 | 33 | 23 | Urban | Gold | 24 | 4.21 | Morning | Premium | 11 | 91.389526 | 0.825000 | 1.652174 | 124.567897 | 36.304348 |
| 996 | 84 | 29 | Urban | Regular | 92 | 4.55 | Morning | Premium | 94 | 424.155987 | 1.037037 | 1.310345 | 576.375440 | 35.887612 |
| 997 | 44 | 6 | Suburban | Gold | 80 | 4.13 | Night | Premium | 40 | 157.364830 | 1.100000 | 1.833333 | 317.352408 | 101.666667 |
| 998 | 53 | 27 | Suburban | Regular | 78 | 3.63 | Night | Premium | 58 | 279.095048 | 1.325000 | 1.407407 | 520.460581 | 86.481481 |


In [15]:
loss_rides

| | Number_of_Riders | Number_of_Drivers | Location_Category | Customer_Loyalty_Status | Number_of_Past_Rides | Average_Ratings | Time_of_Booking | Vehicle_Type | Expected_Ride_Duration | Historical_Cost_of_Ride | demand_multiplier | supply_multiplier | adjusted_ride_cost | profit_percentage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 90 | 45 | Urban | Silver | 13 | 4.47 | Night | Premium | 90 | 284.257273 | 1.111111 | 0.844444 | 266.710528 | -6.172840 |
| 21 | 22 | 11 | Suburban | Silver | 79 | 4.48 | Night | Economy | 15 | 64.071173 | 0.550000 | 1.000000 | 51.256938 | -20.000000 |
| 27 | 97 | 75 | Urban | Silver | 76 | 4.70 | Night | Economy | 158 | 652.617297 | 1.197531 | 0.506667 | 625.223485 | -4.197531 |
| 36 | 28 | 11 | Suburban | Gold | 52 | 4.33 | Afternoon | Premium | 94 | 456.180947 | 0.700000 | 1.000000 | 364.944757 | -20.000000 |
| 42 | 97 | 81 | Suburban | Regular | 52 | 4.56 | Evening | Economy | 125 | 530.759994 | 1.197531 | 0.469136 | 508.481179 | -4.197531 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 965 | 93 | 59 | Rural | Regular | 30 | 4.11 | Afternoon | Premium | 89 | 281.052129 | 1.148148 | 0.644068 | 258.151585 | -8.148148 |
| 967 | 95 | 73 | Urban | Regular | 20 | 3.59 | Evening | Premium | 138 | 666.824206 | 1.172840 | 0.520548 | 625.662218 | -6.172840 |
| 975 | 89 | 54 | Suburban | Gold | 48 | 4.38 | Afternoon | Economy | 24 | 74.788327 | 1.098765 | 0.703704 | 65.739863 | -12.098765 |
| 976 | 84 | 62 | Urban | Silver | 52 | 4.61 | Afternoon | Economy | 29 | 126.230812 | 1.037037 | 0.612903 | 104.724821 | -17.037037 |


In [16]:
import plotly.graph_objects as go

# Calculate the count of profitable and loss rides
profitable_count = len(profitable_rides)
loss_count = len(loss_rides)

profitable_count,loss_count

(826, 173)
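
The two counts add up to 999 rather than 1000, which suggests that one ride has a profit percentage of exactly zero and so falls through both the `> 0` and `< 0` filters. A minimal sketch of a three-way split that makes such rides visible is shown below; the zero value in the sample is a hypothetical ride added for illustration, while the other two values are rows 0 and 1 from the output above:

```python
import numpy as np
import pandas as pd

# Rows 0 and 1 from the profit_percentage output, plus a hypothetical
# ride whose adjusted cost equals its historical cost
profit_percentage = pd.Series([-6.172840, 41.282051, 0.0])

# np.sign gives -1.0, 0.0 or 1.0, which we map to readable labels
labels = np.sign(profit_percentage).map(
    {1.0: "profitable", -1.0: "loss", 0.0: "unchanged"}
)
counts = labels.value_counts()
print(counts)
```

Applied to `data['profit_percentage']`, the same mapping would report the unchanged rides as a third group instead of silently dropping them from the totals.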

In [17]:
# Create a donut chart to show the distribution of profitable and loss rides
labels = ['Profitable Rides', 'Loss Rides']
values = [profitable_count, loss_count]

fig = go.Figure(data=[go.Pie(labels=labels, values=values, hole=0.4)])
fig.update_layout(title='Profitability of Rides (Dynamic Pricing vs. Historical Pricing)')
fig.show()

### Now let’s have a look at the relationship between the expected ride duration and the cost of the ride based on the dynamic pricing strategy:

In [18]:
fig = px.scatter(data, 
                 x='Expected_Ride_Duration', 
                 y='adjusted_ride_cost',
                 title='Expected Ride Duration vs. Cost of Ride', 
                 trendline='ols')
fig.show()

## Training a Predictive Model

### Now, as we have implemented a dynamic pricing strategy, let’s train a Machine Learning model. Before training the model, let’s preprocess the data:

In [19]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler

In [20]:
def data_preprocessing_pipeline(data):
    
    #Identify numeric and categorical features
    numeric_features=data.select_dtypes(include=['float','int']).columns
    categorical_features=data.select_dtypes(include=['object']).columns
    
    #Handle missing values in numeric features
    data[numeric_features]=data[numeric_features].fillna(data[numeric_features].mean())
    
    #Detect and handle outliers in numeric features using IQR
    for feature in numeric_features:
        Q1=data[feature].quantile(0.25)
        Q3=data[feature].quantile(0.75)
        IQR=Q3-Q1
        lower_bound=Q1-(1.5*IQR)
        upper_bound=Q3+(1.5*IQR)
        data[feature]=np.where((data[feature]<lower_bound)|(data[feature]>upper_bound),data[feature].mean(),data[feature])
    
    #Handle missing values in categorical features
    data[categorical_features]=data[categorical_features].fillna(data[categorical_features].mode().iloc[0])
    return data 
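
To see what the IQR rule in this function does, here is a small sketch on toy data (the values are made up, with 100 as an obvious outlier). It applies the same bounds and the same mean replacement as the loop above:

```python
import numpy as np
import pandas as pd

# Toy data; 100 is the intended outlier
values = pd.Series([10, 12, 11, 13, 12, 100], dtype=float)

q1, q3 = values.quantile(0.25), values.quantile(0.75)
iqr = q3 - q1
lower_bound, upper_bound = q1 - 1.5 * iqr, q3 + 1.5 * iqr
is_outlier = (values < lower_bound) | (values > upper_bound)

# Same replacement as the pipeline: the mean is taken over all values,
# including the outlier itself
cleaned = pd.Series(np.where(is_outlier, values.mean(), values))
print(lower_bound, upper_bound)
print(cleaned.tolist())
```

One thing this toy case highlights is that the replacement value is the mean of the column including the outlier, so it can still sit outside the IQR bounds. Replacing with the median, or clipping to the bounds, are possible alternatives if that matters for your data.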

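### In the above code, we have defined a data preprocessing pipeline, `data_preprocessing_pipeline`, which fills missing values and replaces IQR outliers with the column mean. Note that the function is only defined here and is not called in the cells that follow. As vehicle type is a valuable factor, let’s convert it into a numerical feature before moving forward: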

In [21]:
data["Vehicle_Type"]=data["Vehicle_Type"].map({"Premium":1,"Economy":0})
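
One caveat with `Series.map` and a dictionary is that any value missing from the dictionary becomes `NaN`, which also happens if the cell above is run a second time on the already encoded column. A quick check like the sketch below can catch that before training; the "Luxury" category is made up for the demonstration:

```python
import pandas as pd

# "Luxury" is a made-up category that the mapping does not cover
vehicle_type = pd.Series(["Premium", "Economy", "Luxury"])
encoded = vehicle_type.map({"Premium": 1, "Economy": 0})

# Values missing from the mapping come back as NaN
unmapped = vehicle_type[encoded.isna()]
print(unmapped.tolist())
```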

### Now let’s split the data and train a Machine Learning model to predict the cost of a ride:

In [22]:
# splitting data
from sklearn.model_selection import train_test_split
x=np.array(data[["Number_of_Riders","Number_of_Drivers","Vehicle_Type","Expected_Ride_Duration"]])
y=np.array(data[["adjusted_ride_cost"]])

x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=42)

# Reshape y to 1D array
y_train=y_train.ravel()
y_test=y_test.ravel()

# Training a random forest regression model
from sklearn.ensemble import RandomForestRegressor
model=RandomForestRegressor()
model.fit(x_train,y_train)

### Now let’s test this Machine Learning model using some input values:

In [23]:
def get_vehicle_type_numeric(vehicle_type):
    vehicle_type_mapping={"Premium":1,"Economy":0}
    vehicle_type_numeric=vehicle_type_mapping.get(vehicle_type)
    return vehicle_type_numeric

In [24]:
# Predicting using user input values
def predict_price(number_of_riders,number_of_drivers,vehicle_type,Expected_Ride_Duration):
    vehicle_type_numeric=get_vehicle_type_numeric(vehicle_type)
    if vehicle_type_numeric is None:
        raise ValueError("Invalid vehicle type")
    input_data=np.array([[number_of_riders,number_of_drivers,vehicle_type_numeric,Expected_Ride_Duration]])
    predicted_price=model.predict(input_data)
    return predicted_price

In [25]:
# Example prediction using user input values
user_number_of_riders=50
user_number_of_drivers=25
user_vehicle_type="Economy"
Expected_Ride_Duration=30
predicted_price=predict_price(user_number_of_riders,user_number_of_drivers,user_vehicle_type,Expected_Ride_Duration)
print("Predicted price:",predicted_price)

Predicted price: [243.78968854]


### Here’s a comparison of the actual and predicted results:

In [26]:
import plotly.graph_objects as go

# Predict on the test set
y_pred=model.predict(x_test)

# Create a scatter plot with actual vs predicted values
fig=go.Figure()
fig.add_trace(go.Scatter(x=y_test.flatten(),y=y_pred,mode='markers',name='Actual vs Predicted'))

# Add a line representing the ideal case
fig.add_trace(go.Scatter(x=[min(y_test.flatten()),max(y_test.flatten())],y=[min(y_test.flatten()),max(y_test.flatten())],
                         mode='lines',name='Ideal',line=dict(color='red',dash='dash')))

fig.update_layout(title='Actual vs Predicted values',xaxis_title='Actual Values',yaxis_title='Predicted Values',showlegend=True)
fig.show()
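
A scatter plot shows the fit visually, but it is often useful to add numeric metrics as well. The sketch below computes the mean absolute error and R² with scikit-learn. Because it has to run on its own, it trains on synthetic data generated with an assumed cost formula, so the printed numbers say nothing about the model trained above; the `_syn` names are illustrative:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the four features used above
rng = np.random.default_rng(0)
n = 400
riders = rng.integers(20, 101, size=n)
drivers = rng.integers(5, 90, size=n)
vehicle = rng.integers(0, 2, size=n)
duration = rng.integers(10, 181, size=n)
x_syn = np.column_stack([riders, drivers, vehicle, duration])

# Assumed target: cost grows with duration and is higher for vehicle type 1
y_syn = 3.5 * duration * (1 + 0.2 * vehicle) + rng.normal(0, 15, size=n)

x_tr, x_te, y_tr, y_te = train_test_split(x_syn, y_syn, test_size=0.2,
                                          random_state=42)
model_syn = RandomForestRegressor(random_state=42)
model_syn.fit(x_tr, y_tr)
y_hat = model_syn.predict(x_te)

mae = mean_absolute_error(y_te, y_hat)
r2 = r2_score(y_te, y_hat)
print(f"MAE: {mae:.2f}, R^2: {r2:.3f}")
```

In the notebook itself, the same two calls would be `mean_absolute_error(y_test, y_pred)` and `r2_score(y_test, y_pred)`. Passing `random_state` to `RandomForestRegressor` is optional, but it makes the fitted model and its predictions reproducible between runs.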

So this is how you can use Machine Learning to implement a data-driven dynamic pricing strategy using Python.

### Summary
In a dynamic pricing strategy, the aim is to maximize revenue and profitability by pricing items at a level that balances supply and demand. In this walkthrough, we explored the ride data, derived demand and supply multipliers from the percentiles of the number of riders and drivers, used them to compute an adjusted ride cost, and trained a random forest regression model to predict that cost from the number of riders, the number of drivers, the vehicle type, and the expected ride duration.