Tool: Python, MySQL
Presentation Deck: PowerPoint
This project showcases a typical time series forecasting pipeline, consisting of three major phases:
- Feature engineering and Data preparation
- Exploratory data analysis (time-series analysis)
- Forecasting
The pharmaceutical forecasting model is intended to mitigate drug shortages. The project collects historical sales data, together with inventory levels, usage patterns, and other relevant factors, and analyzes it to identify patterns and trends in drug supply and demand. Based on this analysis, we developed a forecasting model that predicts the demand for medical supplies.
Business Problem:
- Drug shortages in stock
- Increase in bounce rate
Business Objective:
- Minimize drug shortages.
- Maximize drug availability, customer satisfaction, and profits.
Data Dictionary:
Based on the formal definition of the problem and objective, the data from the sales transaction information system were cleaned, a feature engineering approach was defined, and all data were transformed into monthly time series consisting of aggregate sales for the different classes of pharmaceutical drugs.
- Data Cleaning: Identify errors, inconsistencies, and missing values in the dataset.
- Data Transformation: Convert data into an appropriate format or scale for analysis or modeling.
- Feature Engineering: Create new relevant features or variables from the existing data to improve the performance of machine learning models.
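As a rough illustration of the transformation to monthly series described above, the sketch below aggregates transaction-level sales by drug class and month with pandas. The column names (`date`, `drug_class`, `quantity`) and the sample rows are hypothetical and are not taken from the project's dataset.

```python
import pandas as pd

# Hypothetical transaction-level sales records (column names are assumptions).
sales = pd.DataFrame({
    "date": ["2022-01-05", "2022-01-20", "2022-02-03", "2022-02-15", "2022-02-28"],
    "drug_class": ["Analgesic", "Antibiotic", "Analgesic", "Analgesic", "Antibiotic"],
    "quantity": [10, 5, 7, 3, 8],
})

# Type casting: parse the date strings into proper datetime values.
sales["date"] = pd.to_datetime(sales["date"])

# Aggregate to one row per month and one column per drug class.
monthly = (
    sales
    .groupby([sales["date"].dt.to_period("M"), "drug_class"])["quantity"]
    .sum()
    .unstack(fill_value=0)
)
print(monthly)
```

Each row of `monthly` is one calendar month and each column is one drug class, which is the shape most forecasting models expect as input.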
The following data preprocessing and descriptive statistics methods were implemented:
- Type Casting
- Handling missing values
- Removing duplicates
- One-Hot encoding
- Data manipulation
- First Moment Business Decision: Measure of Central Tendency
- Second Moment Business Decision: Measure of Dispersion
- Third Moment Business Decision: Skewness
- Fourth Moment Business Decision: Kurtosis
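The steps listed above could look roughly like the following in pandas. This is a minimal sketch on made-up data: the column names, the sample values, and the choice of median imputation are assumptions, not the project's actual code.

```python
import pandas as pd

# Hypothetical raw sales records; column names and values are illustrative only.
raw = pd.DataFrame({
    "quantity": ["2", "4", "4", None, "6", "40"],
    "drug_class": ["A", "B", "B", "A", "C", "A"],
})

# Type casting: convert quantity from text to a numeric type.
raw["quantity"] = pd.to_numeric(raw["quantity"])

# Handling missing values: fill with the median (one of several options).
raw["quantity"] = raw["quantity"].fillna(raw["quantity"].median())

# Removing duplicates: drop fully identical rows.
clean = raw.drop_duplicates().reset_index(drop=True)

# One-hot encoding of the categorical column.
encoded = pd.get_dummies(clean, columns=["drug_class"])

# The four moments of the quantity column.
moments = {
    "mean": clean["quantity"].mean(),      # first moment: central tendency
    "variance": clean["quantity"].var(),   # second moment: dispersion
    "skewness": clean["quantity"].skew(),  # third moment: asymmetry
    "kurtosis": clean["quantity"].kurt(),  # fourth moment: excess kurtosis
}
print(encoded.columns.tolist())
print(moments)
```

A positive skewness value indicates a long right tail, which is the pattern reported for the Quantity column in the analysis that follows.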
Click here to view full documentation of data preprocessing and descriptive statistics
After data preprocessing and descriptive statistics, the next step is exploratory data analysis (EDA) to examine the trends and patterns in the dataset.
- Identify and analyze patterns and trends within the data.
Figure 5 - Distribution of Data. The Quantity data is not normally distributed, since the points do not fall along the straight reference line.
Figure 6 - Log Transformation. After log transformation, the data is approximately normally distributed.
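The effect of a log transformation on a right-skewed variable can be checked numerically as well as visually. The sketch below uses invented quantities and `numpy.log1p`. The project's exact transformation is not stated, so this is only an assumption about how it might be done.

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed quantities: many small sales, a few very large ones.
quantity = pd.Series([1, 2, 2, 3, 3, 4, 5, 8, 20, 150, 900], dtype=float)

# log1p computes log(1 + x), which stays defined when a quantity is zero.
log_quantity = np.log1p(quantity)

print("skewness before:", quantity.skew())
print("skewness after: ", log_quantity.skew())
```

A log transformation usually reduces right skew, but it does not guarantee normality, so the transformed data is still worth checking against a normal reference plot.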
Figure 7 - Quantity of drugs sold by month
- Max. = 39490
- Right-skewed
- Most transactions involve lower sales amounts, but a few have significantly higher sales amounts
Figure 8 - Quantity of drugs sold by month. March, April, July, and September have the highest quantities of medicines sold, at approximately the same level.
Figure 9 - Quantity Trend. December has the highest quantity of medicines sold, while February and June have the lowest.
Figure 10 - AutoEDA using D-Tale.
Based on the analysis of the data, the project develops a forecasting model that predicts the demand for medical supplies.
- To accurately predict future sales based on historical data, minimizing forecasting errors and improving the model's predictive performance.
- Select Models:
- Two types of model are developed in this project: a Random Forest Regression model and a Linear Regression model.
- Train and Test Models:
- Split historical data into training and testing sets.
- The training set is used to train the models, while the testing set is used to assess their performance.
- Each selected model is trained on the training set using appropriate parameters.
- Generate Forecasts:
- The trained models are used to generate forecasts for the required period.
- For each forecasted point, the predicted value is compared with the actual value from the testing set.
- Calculation of Evaluation Metrics:
- Common evaluation metrics are calculated to assess the accuracy of each model.
- Metrics for time series forecasting include:
i. Mean Squared Error (MSE)
ii. Root Mean Squared Error (RMSE)
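The two metrics named above can be computed directly from their definitions: MSE is the mean of the squared forecast errors, and RMSE is its square root, which puts the error back in the units of the target. The following is a minimal sketch in plain Python. The function names and sample values are illustrative only.

```python
import math

def mse(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Square root of the MSE, expressed in the same units as the target."""
    return math.sqrt(mse(actual, predicted))

# Hypothetical actual and forecast quantities for four test periods.
actual = [100, 120, 90, 110]
predicted = [98, 125, 85, 113]

print("MSE: ", mse(actual, predicted))
print("RMSE:", rmse(actual, predicted))
```

Because the errors are squared before averaging, both metrics penalize a few large misses more heavily than many small ones.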
Figure 12 - Forecast for the next 12 periods.
- MAPE = nan
- RMSE = 5032.711977063249
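The source does not say why MAPE is reported as `nan`. One common cause is a missing or zero actual value in the test period, because MAPE divides each error by the actual value. The sketch below reproduces that behaviour on invented numbers and shows a masked variant. Both function names are hypothetical, and this may not be the cause in this project.

```python
import numpy as np

def mape(actual, forecast):
    """Plain MAPE in percent; returns nan if any actual value is missing."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.mean(np.abs((actual - forecast) / actual)) * 100

def mape_nonzero(actual, forecast):
    """MAPE over periods whose actual value is present and non-zero."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    mask = np.isfinite(actual) & np.isfinite(forecast) & (actual != 0)
    return np.mean(np.abs((actual[mask] - forecast[mask]) / actual[mask])) * 100

# Hypothetical test window with one missing actual value.
actual = [100.0, np.nan, 80.0, 50.0]
forecast = [90.0, 95.0, 88.0, 50.0]

print(mape(actual, forecast))
print(mape_nonzero(actual, forecast))
```

If a masked metric is reported, the number of excluded periods should be stated alongside it, since dropping periods changes what the metric measures.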
Figure 13 - Random Forest Regression Model
- MSE = 111.60992195041523
Figure 14 - Linear Regression Model
- MSE = 159.57892788366377
- Considering the lower MSE value of the Random Forest Model (111.61) compared to the Linear Regression Model (159.58), the Random Forest Model appears to be the better performer in terms of prediction accuracy.
- A lower MSE indicates that the Random Forest Model's predictions are, on average, closer to the actual values, resulting in better overall model performance.
- The Random Forest Model is capable of capturing complex non-linear relationships in the data, which can be important when dealing with diverse and intricate patterns in pharmaceutical sales and demand.
- It is less sensitive to outliers compared to Linear Regression, potentially resulting in more reliable predictions.
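The comparison above can be sketched with scikit-learn as follows. The data here is synthetic, and the features (a time index and the month of the year), the chronological 48/12 split, and the hyperparameters are all assumptions, so the scores will not match the figures reported for this project.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Synthetic monthly demand with a trend and yearly seasonality (illustrative only).
rng = np.random.default_rng(0)
months = np.arange(60)
demand = 200 + 1.5 * months + 30 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 5, 60)

# Features: time index and month of year (hypothetical feature choice).
X = np.column_stack([months, months % 12])
y = demand

# Chronological split: train on the first 48 months, test on the last 12.
X_train, X_test = X[:48], X[48:]
y_train, y_test = y[:48], y[48:]

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=200, random_state=0),
}

# Fit each model on the training window and score it on the held-out window.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = mean_squared_error(y_test, model.predict(X_test))

for name, score in scores.items():
    print(f"{name}: MSE = {score:.2f}")
```

Which model scores lower depends on the data. Tree ensembles cannot extrapolate beyond the range of target values seen in training, so on a strongly trending series the ranking may differ from the one reported here.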
- It is difficult to predict stock that includes returned quantities when the model is built on quantity alone.
- Restrictions on patients' private information, together with differing health conditions, lead to other drugs being used in the current environment, which in turn contributes to drug shortages.
- Limited knowledge of new drug development.
- Ensuring the forecasting model aligns with pharmaceutical regulations and patient privacy laws.
- Data integration with existing inventory systems is complex and time-consuming.
- External factors such as sudden market shifts or unexpected events (e.g., pandemics) could reduce the accuracy of forecasting models and inventory plans.
1. Enhanced Forecasting Model: Continuously improve and refine the pharmaceutical forecasting model by incorporating more advanced machine learning techniques, considering additional factors like seasonal trends, public health events, and external influences on medicine demand.
2. Supply Chain Optimization: Collaborate with suppliers and distributors to optimize the entire supply chain. This might involve streamlining delivery routes and reducing lead times.
3. Machine Learning for Bounce Rate Reduction: Utilize machine learning to predict and mitigate factors causing high bounce rates in pharmacies, offering insights into optimizing store layouts and reducing product returns.