## Step-by-Step Solution

- **Step 0: Import Required Libraries**
  - [X] Import the necessary libraries for data processing, forecasting, optimization, and visualization.

- **Step 1: Data Preprocessing**
  - [X] Load the provided datasets: `Biomass_History.csv` and `Distance_Matrix.csv`.
  - [X] Extract information on biomass availability, distance matrix, and other parameters.
  - [X] Ensure all values are in the correct format and dimensionless.

- **Step 2: Biomass Forecasting for 2018 and 2019**
  - [X] Apply forecasting techniques to estimate biomass availability for the years 2018 and 2019.
  - [X] Create a new dataset, `Biomass_Forecast.csv`, containing forecasted biomass values for each harvesting site.
  - [X] Save the dataset `Biomass_Forecast.csv`.


- **Step 3: Define Variables and Constants**
  - [ ] Define the following variables:
    - `Biorefinery`: Binary variable indicating whether a grid block is a biorefinery location.
    - `Depot`: Binary variable indicating whether a grid block is a preprocessing depot location.
    - `Biomass_Demand_Supply[i, j]`: Amount of biomass transported from harvesting site 'i' to depot 'j'.
    - `Pellet_Demand_Supply[j, k]`: Amount of pellets transported from depot 'j' to biorefinery 'k'.
  - [ ] Set constants:
    - `a`, `b`, `c`: Constants for cost computation.
    - `Max_Depots`: Maximum number of preprocessing depots (<= 25).
    - `Max_Refineries`: Maximum number of biorefineries (<= 5).
    - `Max_Depot_Capacity`: Maximum yearly processing capacity of a depot (20,000).
    - `Max_Refinery_Capacity`: Maximum yearly processing capacity of a biorefinery (100,000).
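
The constants listed in Step 3 can be kept in one immutable container so that later steps read them from a single place. This is a minimal sketch: the class name `SupplyChainParams` is hypothetical, and the weights `a = 0.001`, `b = 1`, `c = 1` are taken from the commented-out cells under "Rough" at the end of this notebook, not from a confirmed problem statement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SupplyChainParams:
    """Constants from Step 3 (illustrative container, name is an assumption)."""
    a: float = 0.001                        # weight on transportation cost (assumed)
    b: float = 1.0                          # weight on forecast mismatch (assumed)
    c: float = 1.0                          # weight on underutilization (assumed)
    max_depots: int = 25                    # at most 25 preprocessing depots
    max_refineries: int = 5                 # at most 5 biorefineries
    max_depot_capacity: float = 20_000.0    # yearly capacity of one depot
    max_refinery_capacity: float = 100_000.0  # yearly capacity of one biorefinery


params = SupplyChainParams()

# Upper bound on the biomass that all depots together can process in a year
total_depot_capacity = params.max_depots * params.max_depot_capacity
print(total_depot_capacity)
```

Freezing the dataclass guards against accidentally overwriting a limit while experimenting with the optimizer.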

- **Step 4: Formulate the Objective Function**
  - [ ] Define the objective function that minimizes the overall cost of the supply chain.
  - [ ] The objective function should consider transportation costs, biomass forecast mismatch costs, and underutilization costs.
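
One way to read the objective is as a weighted sum of the three cost terms. The sketch below rests on several assumptions that should be checked against the official problem statement: transport cost is distance times quantity summed over both legs, underutilization is the unused capacity of every facility that receives flow, and forecast mismatch is the mean absolute error between forecast and actual biomass. The function name `total_cost` and its argument layout are hypothetical.

```python
def total_cost(biomass_flow, pellet_flow, dist, forecast, actual,
               a=0.001, b=1.0, c=1.0,
               depot_cap=20_000.0, refinery_cap=100_000.0):
    """Weighted cost of one year's plan.

    biomass_flow: {(site, depot): quantity}
    pellet_flow:  {(depot, refinery): quantity}
    dist:         dist[i][j] is the distance from grid block i to grid block j
    """
    # Transport term: distance times quantity on both legs of the chain
    transport = sum(dist[i][j] * q for (i, j), q in biomass_flow.items())
    transport += sum(dist[j][k] * q for (j, k), q in pellet_flow.items())

    # Underutilization term: unused capacity at every facility that receives flow
    depot_in, refinery_in = {}, {}
    for (_, j), q in biomass_flow.items():
        depot_in[j] = depot_in.get(j, 0.0) + q
    for (_, k), q in pellet_flow.items():
        refinery_in[k] = refinery_in.get(k, 0.0) + q
    underuse = sum(depot_cap - q for q in depot_in.values())
    underuse += sum(refinery_cap - q for q in refinery_in.values())

    # Forecast mismatch term: mean absolute error (assumed definition)
    mismatch = sum(abs(f - t) for f, t in zip(forecast, actual)) / len(forecast)

    return a * transport + b * mismatch + c * underuse


# Toy data: one site (0), one depot (1), one refinery (2)
toy_dist = [[0, 10, 20], [10, 0, 5], [20, 5, 0]]
cost = total_cost({(0, 1): 100.0}, {(1, 2): 100.0}, toy_dist,
                  forecast=[100.0], actual=[90.0])
print(cost)
```

In the toy case the underutilization term dominates, which suggests that opening a facility that stays mostly empty is expensive under these weights.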

- **Step 5: Formulate the Constraints**
  - [ ] Create constraints to ensure that all quantities are greater than or equal to zero.
  - [ ] Enforce that the amount of biomass procured does not exceed the forecasted biomass at each harvesting site.
  - [ ] Limit the total biomass reaching each preprocessing depot to its yearly processing capacity.
  - [ ] Limit the total pellets reaching each biorefinery to its yearly processing capacity.
  - [ ] Enforce the number of depots to be less than or equal to `Max_Depots`.
  - [ ] Enforce the number of refineries to be less than or equal to `Max_Refineries`.
  - [ ] Ensure that at least 80% of the total forecasted biomass is processed by the refineries each year.
  - [ ] Balance the total biomass entering and exiting each preprocessing depot (within a tolerance limit of 1e-03).
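
The constraints above can be turned into a checker that validates a candidate plan before submission. This is a sketch under stated assumptions: flows are stored as dictionaries keyed by index pairs, a facility counts as "opened" when it receives any flow, and the function name `check_constraints` is hypothetical.

```python
def check_constraints(biomass_flow, pellet_flow, forecast,
                      depot_cap=20_000.0, refinery_cap=100_000.0,
                      max_depots=25, max_refineries=5,
                      min_processed=0.8, tol=1e-3):
    """Return the names of violated constraints (empty list means feasible)."""
    violations = []

    # All transported quantities must be non-negative
    if any(q < 0 for q in biomass_flow.values()) or \
            any(q < 0 for q in pellet_flow.values()):
        violations.append("non_negativity")

    # Aggregate flows per site, depot, and refinery
    site_out, depot_in, depot_out, refinery_in = {}, {}, {}, {}
    for (i, j), q in biomass_flow.items():
        site_out[i] = site_out.get(i, 0.0) + q
        depot_in[j] = depot_in.get(j, 0.0) + q
    for (j, k), q in pellet_flow.items():
        depot_out[j] = depot_out.get(j, 0.0) + q
        refinery_in[k] = refinery_in.get(k, 0.0) + q

    if any(q > forecast[i] + tol for i, q in site_out.items()):
        violations.append("site_supply")
    if any(q > depot_cap + tol for q in depot_in.values()):
        violations.append("depot_capacity")
    if any(q > refinery_cap + tol for q in refinery_in.values()):
        violations.append("refinery_capacity")
    if len(depot_in) > max_depots:
        violations.append("max_depots")
    if len(refinery_in) > max_refineries:
        violations.append("max_refineries")
    if sum(refinery_in.values()) < min_processed * sum(forecast) - tol:
        violations.append("min_processed")

    # Biomass entering each depot must equal pellets leaving it, within tol
    for j in set(depot_in) | set(depot_out):
        if abs(depot_in.get(j, 0.0) - depot_out.get(j, 0.0)) > tol:
            violations.append("depot_balance")
            break

    return violations


toy_forecast = [100.0, 50.0]  # sites 0 and 1; depot is block 2, refinery is block 3
ok = check_constraints({(0, 2): 90.0, (1, 2): 40.0}, {(2, 3): 130.0}, toy_forecast)
bad = check_constraints({(0, 2): 120.0}, {(2, 3): 100.0}, toy_forecast)
print(ok, bad)
```

Returning the names of the violated constraints, rather than a single boolean, makes it easier to see which part of a plan needs repair.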

- **Step 6: Optimization**
  - [ ] Utilize appropriate optimization techniques (e.g., mathematical programming, linear programming) to solve the formulated problem.
  - [ ] Find the optimal locations for preprocessing depots and biorefineries.
  - [ ] Allocate biomass quantities and pellets to minimize the objective function while satisfying all constraints.
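
For intuition, the allocation part of the problem becomes a plain linear program once the facility locations are fixed. The sketch below solves a toy site-to-depot allocation with `scipy.optimize.linprog`; the three sites, two depots, distances, and capacities are made-up numbers, and the binary location decisions from Step 3 are deliberately left out (they would require a mixed-integer solver).

```python
import numpy as np
from scipy.optimize import linprog

# Toy data (assumed values, not from the competition datasets)
supply = np.array([10.0, 20.0, 30.0])       # forecasted biomass per site
cap = np.array([25.0, 40.0])                # yearly capacity per depot
dist = np.array([[1.0, 4.0],
                 [2.0, 3.0],
                 [5.0, 1.0]])               # dist[i, j]: site i to depot j
n_sites, n_depots = dist.shape

# Decision variables x[i, j] are flattened row by row; cost is distance * quantity
c = dist.ravel()

# Each site ships at most its forecasted biomass
A_supply = np.kron(np.eye(n_sites), np.ones((1, n_depots)))
# Each depot receives at most its capacity
A_cap = np.kron(np.ones((1, n_sites)), np.eye(n_depots))
# At least 80% of total biomass is shipped (written as -sum(x) <= -0.8 * total)
A_cover = -np.ones((1, n_sites * n_depots))

A_ub = np.vstack([A_supply, A_cap, A_cover])
b_ub = np.concatenate([supply, cap, [-0.8 * supply.sum()]])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
flows = res.x.reshape(n_sites, n_depots)
print(res.success, res.fun)
print(flows)
```

Note that `linprog` takes inequality constraints as a matrix `A_ub` and a vector `b_ub`, so "greater than or equal" constraints are expressed by negating both sides.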

- **Step 7: Post-processing and Visualization**
  - [ ] Analyze the optimized supply chain to visualize the optimal locations of depots and refineries on the map of Gujarat.
  - [ ] Display the transportation routes for biomass and pellets between harvesting sites, depots, and refineries.
  - [ ] Summarize the total cost and other relevant information about the optimized supply chain.

- **Step 8: Output Submission**
  - [ ] Generate the final output file (`solution.csv`) in the required format with columns for year, data type, source index, destination index, and corresponding values.
  - [ ] Submit the solution file on the HackerEarth portal for evaluation.
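
The output file can be assembled with the standard `csv` module. In this sketch the exact header names (`year`, `data_type`, `source_index`, `destination_index`, `value`) and the `data_type` labels are assumptions derived from the column description above; the official submission template should be used to confirm them. The helper name `write_solution` is hypothetical.

```python
import csv
import io


def write_solution(rows, fh):
    """Write (year, data_type, source_index, destination_index, value) rows as CSV."""
    writer = csv.writer(fh, lineterminator="\n")
    # Header names are assumed; confirm against the official template
    writer.writerow(["year", "data_type", "source_index", "destination_index", "value"])
    for row in rows:
        writer.writerow(row)


# Illustrative rows only; the labels and numbers are placeholders
rows = [
    (2018, "biomass_demand_supply", 0, 5, 6.74),
    (2018, "pellet_demand_supply", 5, 9, 6.74),
]

buffer = io.StringIO()          # swap for open("solution.csv", "w", newline="") to write a file
write_solution(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```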


# Step 0: Import Required Libraries

In [1]:
!pip install pandas numpy matplotlib statsmodels



In [2]:
import pandas as pd
import numpy as np
import warnings
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA



In [3]:
warnings.filterwarnings("ignore")

# Step 1: Data Preparation and Exploration


In [4]:
# Load the datasets
biomass_history_df = pd.read_csv('/kaggle/input/biomasshistory/dataset/Biomass_History.csv')
distance_matrix_df = pd.read_csv('/kaggle/input/biomasshistory/dataset/Distance_Matrix.csv')

In [5]:
# Check for any missing values in the datasets
print("\nMissing Values in Biomass History Data:")
print(biomass_history_df.isnull().sum())

print("\nMissing Values in Distance Matrix Data:")
print(distance_matrix_df.isnull().sum())


Missing Values in Biomass History Data:
Index        0
Latitude     0
Longitude    0
2010         0
2011         0
2012         0
2013         0
2014         0
2015         0
2016         0
2017         0
dtype: int64

Missing Values in Distance Matrix Data:
Unnamed: 0    0
0             0
1             0
2             0
3             0
             ..
2413          0
2414          0
2415          0
2416          0
2417          0
Length: 2419, dtype: int64


In [6]:
# Display the first few rows of each dataset to inspect the data
print("Biomass History Data:")
print(biomass_history_df.head())

print("\nDistance Matrix Data:")
print(distance_matrix_df.head())

Biomass History Data:
   Index  Latitude  Longitude       2010       2011       2012       2013  \
0      0  24.66818   71.33144   8.475744   8.868568   9.202181   6.023070   
1      1  24.66818   71.41106  24.029778  28.551348  25.866415  21.634459   
2      2  24.66818   71.49069  44.831635  66.111168  56.982258  53.003735   
3      3  24.66818   71.57031  59.974419  80.821304  78.956543  63.160561   
4      4  24.66818   71.64994  14.653370  19.327524  21.928144  17.899586   

        2014       2015       2016        2017  
0  10.788374   6.647325   7.387925    5.180296  
1  34.419411  27.361908  40.431847   42.126945  
2  70.917908  42.517117  59.181629   73.203232  
3  93.513924  70.203171  74.536720  101.067352  
4  19.534035  19.165791  16.531315   26.086885  

Distance Matrix Data:
   Unnamed: 0        0        1        2        3        4        5        6  \
0           0   0.0000  11.3769  20.4557  38.1227  45.3810  54.9915  78.6108   
1           1  11.3769   0.0000   9.07

In [7]:
# Display column data types and non-null counts
print("Biomass History Info:")
print(biomass_history_df.info())

print("\nDistance Matrix Info:")
print(distance_matrix_df.info())

Biomass History Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2418 entries, 0 to 2417
Data columns (total 11 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Index      2418 non-null   int64  
 1   Latitude   2418 non-null   float64
 2   Longitude  2418 non-null   float64
 3   2010       2418 non-null   float64
 4   2011       2418 non-null   float64
 5   2012       2418 non-null   float64
 6   2013       2418 non-null   float64
 7   2014       2418 non-null   float64
 8   2015       2418 non-null   float64
 9   2016       2418 non-null   float64
 10  2017       2418 non-null   float64
dtypes: float64(10), int64(1)
memory usage: 207.9 KB
None

Distance Matrix Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2418 entries, 0 to 2417
Columns: 2419 entries, Unnamed: 0 to 2417
dtypes: float64(2418), int64(1)
memory usage: 44.6 MB
None
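
The `info()` output shows that the distance matrix has 2419 columns for 2418 rows: the extra `Unnamed: 0` column is a saved row index, not a distance. A possible cleanup, shown here on a small made-up frame with the same layout, drops that column and converts the rest to a square NumPy array. The variable names and the toy values are assumptions for illustration.

```python
import pandas as pd

# Toy stand-in for Distance_Matrix.csv (values are made up)
raw = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],
    "0": [0.0, 11.4, 20.5],
    "1": [11.4, 0.0, 9.1],
    "2": [20.5, 9.1, 0.0],
})

# Drop the saved index column so that dist[i, j] maps block i to block j
dist = raw.drop(columns=["Unnamed: 0"]).to_numpy(dtype=float)

# The matrix should now be square: one row and one column per grid block
n_rows, n_cols = dist.shape
print(n_rows, n_cols)
```

With the real file, the same two steps would be applied to `distance_matrix_df`, giving a 2418 by 2418 array that can be indexed directly by site, depot, or refinery index.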


# Step 2: Biomass Forecasting (Time Series Forecasting)

In [8]:
# Prepare the data for ARIMA model
biomass_availability = biomass_history_df.iloc[:, 3:].values

# Number of historical time periods (years 2010 to 2017) available for fitting
num_periods = biomass_availability.shape[1]

# Initialize the forecast arrays for 2018 and 2019
biomass_forecast_2018 = np.zeros(biomass_availability.shape[0])
biomass_forecast_2019 = np.zeros(biomass_availability.shape[0])


# Loop through each harvesting site and forecast biomass availability for 2018 and 2019
for i in range(biomass_availability.shape[0]):
    # Fit an ARIMA model to this site's 2010-2017 history
    model = ARIMA(biomass_availability[i, :], order=(1, 1, 0))  # (p, d, q) order
    model_fit = model.fit()

    # Forecast two steps ahead in a single call: index 0 is 2018, index 1 is 2019
    forecast = model_fit.forecast(steps=2)

    biomass_forecast_2018[i] = forecast[0]
    biomass_forecast_2019[i] = forecast[1]

In [9]:
# Display the forecasted biomass for 2018 and 2019 for all harvesting sites
print("Forecasted Biomass for 2018:")
print(biomass_forecast_2018)
print("Forecasted Biomass for 2019:")
print(biomass_forecast_2019)

Forecasted Biomass for 2018:
[6.73930404e+00 4.13465270e+01 6.56475406e+01 ... 4.28597348e-02
 1.15729622e+00 1.98485730e-01]
Forecasted Biomass for 2019:
[5.63834683e+00 4.17058294e+01 6.97190060e+01 ... 4.09060133e-02
 1.07786273e+00 1.81365820e-01]
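
ARIMA forecasts are not constrained to be non-negative, while biomass quantities cannot fall below zero. A cautious post-processing step, sketched here on made-up numbers, clips any negative forecast to zero and reports how many values were affected. Whether any of the 2418 real forecasts are negative is not shown in the output above, so this is a safeguard rather than a confirmed fix.

```python
import numpy as np

# Made-up forecast vector containing one negative value
raw_forecast = np.array([6.74, -0.35, 41.35, 0.04])

# Replace negative forecasts with zero; positive values are left unchanged
clean_forecast = np.clip(raw_forecast, 0.0, None)

# Count how many sites needed clipping
n_clipped = int((raw_forecast < 0).sum())
print(clean_forecast, n_clipped)
```

The same call could be applied to `biomass_forecast_2018` and `biomass_forecast_2019` before they are written to the forecast file.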


In [10]:
# Create a new DataFrame for the forecasted biomass data
biomass_forecast_df = biomass_history_df.copy()

# Add the forecasted biomass for 2018 and 2019 to the new DataFrame
biomass_forecast_df['2018'] = biomass_forecast_2018
biomass_forecast_df['2019'] = biomass_forecast_2019

# Display the new DataFrame
print(biomass_forecast_df.head())

# Save the new DataFrame as a new CSV file
biomass_forecast_df.to_csv('Biomass_Forecast_Data.csv', index=False)

   Index  Latitude  Longitude       2010       2011       2012       2013  \
0      0  24.66818   71.33144   8.475744   8.868568   9.202181   6.023070   
1      1  24.66818   71.41106  24.029778  28.551348  25.866415  21.634459   
2      2  24.66818   71.49069  44.831635  66.111168  56.982258  53.003735   
3      3  24.66818   71.57031  59.974419  80.821304  78.956543  63.160561   
4      4  24.66818   71.64994  14.653370  19.327524  21.928144  17.899586   

        2014       2015       2016        2017       2018       2019  
0  10.788374   6.647325   7.387925    5.180296   6.739304   5.638347  
1  34.419411  27.361908  40.431847   42.126945  41.346527  41.705829  
2  70.917908  42.517117  59.181629   73.203232  65.647541  69.719006  
3  93.513924  70.203171  74.536720  101.067352  86.576600  94.491297  
4  19.534035  19.165791  16.531315   26.086885  21.108108  23.702220  


# Rough

In [11]:
# # Create an empty DataFrame to store the forecasted biomass data for all sites
# biomass_forecast_data = pd.DataFrame()

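# # Create time indexes for the historical years (2010-2017) and the forecasted years
# time_index = pd.date_range(start="2010", periods=8, freq="A")
# forecast_time_index = pd.date_range(start="2018", periods=2, freq="A")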

# # Iterate through all harvesting sites
# for harvesting_site_index in range(len(biomass_history_df)):
#     # Extract the biomass data for the selected harvesting site
#     biomass_timeseries = biomass_history_df.iloc[harvesting_site_index, 3:].values

#     # Convert biomass_timeseries to numeric values
#     biomass_timeseries = pd.to_numeric(biomass_timeseries, errors='coerce')

#     # Create a time series object
#     biomass_series = pd.Series(biomass_timeseries, index=time_index)

#     # Fit the ARIMA model
#     model = ARIMA(biomass_series, order=(3, 1, 1))
#     model_fit = model.fit()

#     # Forecast biomass availability for the years 2018 and 2019
#     forecasted_values = model_fit.forecast(steps=2)

#     # Create a DataFrame to store the forecasted biomass data for the current site
#     site_forecast_data = pd.DataFrame({
#         'harvesting_site_index': [harvesting_site_index] * 2,
#         'year': forecast_time_index.year,
#         'biomass_forecast': forecasted_values
#     })

#     # Append the site_forecast_data to the overall biomass_forecast_data
#     # (DataFrame.append was removed in pandas 2.0, so use pd.concat)
#     biomass_forecast_data = pd.concat([biomass_forecast_data, site_forecast_data], ignore_index=True)


In [12]:
# # Save the Biomass Forecast data to the Kaggle output directory
# biomass_forecast_data.to_csv("Biomass_Forecast.csv", index=False)

In [13]:
# print(biomass_forecast_data)

In [14]:
# # Load the biomass forecast data for the years 2018 and 2019
# biomass_forecast_data = pd.read_csv('/kaggle/input/biomassforecast/Biomass_Forecast (1).csv')

# # Determine the number of clusters for K-means (e.g., 2 clusters for 2 potential locations)
# num_clusters = 2

# # List to store cluster centers for all sites
# all_cluster_centers = []

# # Loop over each harvesting site
# for harvesting_site_index in range(2418):  # Assuming there are 2418 harvesting sites
#     # Select the forecasted biomass availability for the current harvesting site
#     forecasted_biomass = biomass_forecast_data.loc[biomass_forecast_data['harvesting_site_index'] == harvesting_site_index, 'biomass_forecast'].values
    
#     # Convert the forecasted biomass to numeric values
#     forecasted_biomass = pd.to_numeric(forecasted_biomass, errors='coerce')
    
#     # Reshape the forecasted biomass data for K-means clustering
#     X = forecasted_biomass.reshape(-1, 1)

#     # Perform K-means clustering (requires: from sklearn.cluster import KMeans)
#     kmeans = KMeans(n_clusters=num_clusters, random_state=0)
#     kmeans.fit(X)

#     # Get the cluster centers (potential locations for Depots and Biorefineries) for the current site
#     cluster_centers = kmeans.cluster_centers_
    
#     # Add the cluster centers to the list for all sites
#     all_cluster_centers.append(cluster_centers)

# # Convert the list to a numpy array for easier handling
# all_cluster_centers = np.array(all_cluster_centers)

# # Display the cluster centers for all sites
# print(all_cluster_centers)


In [15]:
# # Load the biomass forecast file for forecasted biomass values
# biomass_forecast_df = pd.read_csv('/kaggle/input/biomassforecast/Biomass_Forecast (1).csv')

# # Load Distance_Matrix.csv for distance data
# distance_matrix_df = pd.read_csv('/kaggle/input/biomasshistory/dataset/Distance_Matrix.csv')


In [16]:
# # Constants
# a = 0.001
# b = 1
# c = 1

# # Maximum yearly processing capacity of depot and refinery
# max_capacity_depot = 20000
# max_capacity_refinery = 100000

# # Number of depots and refineries (you can adjust these based on the constraints)
# num_depots = 5
# num_refineries = 3

In [17]:
# def biomass_lp_forecast(biomass_forecast, distance_matrix):
#     # Extract relevant data from dataframes
#     num_harvesting_sites = len(biomass_forecast)
#     num_years = len(biomass_forecast.columns) - 1

#     # Convert DataFrame to numpy array for LP optimization
#     biomass_forecast_values = biomass_forecast.drop(columns=['Index', 'Latitude', 'Longitude']).values

#     # Define LP variables and constraints
#     biomass_transport_bounds = [(0, None) for _ in range(num_harvesting_sites * num_depots)]
#     pellet_transport_bounds = [(0, None) for _ in range(num_depots * num_refineries)]
    
#     # Constraints lists
#     biomass_constraints = []
#     pellet_constraints = []

#     # Biomass demand-supply constraints
#     for i in range(num_harvesting_sites):
#         biomass_constraints.append({'type': 'ineq', 'fun': lambda x, i=i: biomass_forecast_values[i, 1:].sum() - x[i*num_years:(i+1)*num_years].sum()})

#     # Depot processing capacity constraints
#     for j in range(num_depots):
#         biomass_constraints.append({'type': 'ineq', 'fun': lambda x, j=j: max_capacity_depot - x[j*num_years*num_harvesting_sites:(j+1)*num_years*num_harvesting_sites].sum()})

#     # Refinery processing capacity constraints
#     for k in range(num_refineries):
#         pellet_constraints.append({'type': 'ineq', 'fun': lambda x, k=k: max_capacity_refinery - x[k*num_years*num_depots:(k+1)*num_years*num_depots].sum()})

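#     # Solve the LP optimization problem (requires: from scipy.optimize import linprog)
#     # NOTE: linprog expects A_ub and b_ub as numeric arrays; the constraint dicts
#     # built above follow the scipy.optimize.minimize convention and must be
#     # converted to matrix form before this call can run.
#     c = np.hstack([distance_matrix.flatten() for _ in range(num_years)])
#     res = linprog(c, A_ub=biomass_constraints + pellet_constraints, bounds=biomass_transport_bounds + pellet_transport_bounds, method='highs')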

#     return res.x.reshape(num_years, num_harvesting_sites, num_depots), res.x.reshape(num_years, num_depots, num_refineries)


In [18]:
# # Call the LP formulation function with the forecasted values and distance matrix
# biomass_transport, pellet_transport = biomass_lp_forecast(biomass_forecast_df, distance_matrix_df)