<a href="https://colab.research.google.com/github/sanjanabayya30/Proj/blob/main/13_07_2005.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [3]:
!pip install stable-baselines3 gym



In [10]:
# Green Hydrogen Site Selection in India using PPO

import pandas as pd
import numpy as np
import gymnasium as gym # Changed import
from gymnasium import spaces # Changed import
from sklearn.preprocessing import MinMaxScaler
from stable_baselines3 import PPO
from stable_baselines3.common.env_checker import check_env

# ------------------------
# Step 1: Load Real Dataset (Simulated Structure)
# ------------------------

df = pd.read_csv("/content/synthetic_india_site_selection_dataset (1).csv")  # Replace with real CSV

# Map earthquake zones to numeric risk
zone_map = {'Zone II': 0.1, 'Zone III': 0.3, 'Zone IV': 0.6, 'Zone V': 1.0}
df['Earthquake_Risk'] = df['Earthquake_Zone'].map(zone_map)

# Normalize selected features
selected_cols = [
    'Solar_Radiation', 'Slope', 'Water_Availability_Index',
    'Land_Cost_INR_per_sqm', 'Population_Density', 'Flood_Risk_Index', 'Earthquake_Risk'
]

scaler = MinMaxScaler()
scaled_features = scaler.fit_transform(df[selected_cols])

# ------------------------
# Step 2: Define Custom PPO Environment
# ------------------------

class SiteSelectionEnv(gym.Env): # Changed inheritance
    def __init__(self, features):
        super(SiteSelectionEnv, self).__init__()
        self.features = features
        self.n = features.shape[0]
        self.current_idx = 0

        self.observation_space = spaces.Box(low=0.0, high=1.0, shape=(features.shape[1],), dtype=np.float32)
        self.action_space = spaces.Discrete(2)  # 0: Skip, 1: Select

    def reset(self, seed=None, options=None): # Changed reset return
        super().reset(seed=seed) # Added this line
        self.current_idx = 0
        return self.features[self.current_idx].astype(np.float32), {} # Changed reset return

    def step(self, action):
        terminated = False # Changed done to terminated
        truncated = False # Added truncated
        reward = 0.0

        if action == 1:
            feature = self.features[self.current_idx]
            reward = (
                + 2.0 * feature[0]       # Solar Radiation
                + 1.5 * feature[2]       # Water Availability
                - 1.0 * feature[3]       # Land Cost
                - 1.0 * feature[4]       # Population
                - 2.0 * feature[5]       # Flood Risk
                - 2.0 * feature[6]       # Earthquake Risk
                - 1.0 * feature[1]       # Slope
            )

        self.current_idx += 1
        if self.current_idx >= self.n:
            terminated = True # Changed done to terminated
            obs = np.zeros_like(self.features[0]).astype(np.float32) # Ensure dtype consistency
        else:
            obs = self.features[self.current_idx].astype(np.float32) # Ensure dtype consistency

        return obs, reward, terminated, truncated, {} # Changed step return

# ------------------------
# Step 3: Train PPO Agent
# ------------------------

env = SiteSelectionEnv(scaled_features)
check_env(env)

model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10000)

# ------------------------
# Step 4: Evaluate Trained Model
# ------------------------

selected_indices = []
obs, info = env.reset() # Changed reset call

for i in range(env.n):
    action, _ = model.predict(obs)
    # The index should be the current_idx before incrementing in the step method
    if action == 1:
        selected_indices.append(env.current_idx) # Corrected index
    obs, _, terminated, truncated, _ = env.step(action) # Changed step call
    if terminated or truncated: # Added truncated
        break

# ------------------------
# Step 5: Output Selected Sites
# ------------------------

selected_sites = df.iloc[selected_indices]
print("\nBest Selected Sites for Green Hydrogen:")
print(selected_sites[['Latitude', 'Longitude', 'Solar_Radiation', 'Land_Cost_INR_per_sqm', 'Flood_Risk_Index', 'Earthquake_Zone']])

Using cpu device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 1e+03    |
|    ep_rew_mean     | -793     |
| time/              |          |
|    fps             | 1121     |
|    iterations      | 1        |
|    time_elapsed    | 1        |
|    total_timesteps | 2048     |
---------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 1e+03       |
|    ep_rew_mean          | -734        |
| time/                   |             |
|    fps                  | 797         |
|    iterations           | 2           |
|    time_elapsed         | 5           |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.017839184 |
|    clip_fraction        | 0.306       |
|    clip_range           | 0.2         |
|    entropy_loss   

In [12]:
# Green Hydrogen Site Selection in India using PPO with Place Names

import pandas as pd
import numpy as np
import gymnasium as gym # Changed import
from gymnasium import spaces # Changed import
from sklearn.preprocessing import MinMaxScaler
from stable_baselines3 import PPO
from stable_baselines3.common.env_checker import check_env
from geopy.geocoders import Nominatim
from tqdm import tqdm

# ------------------------
# Step 1: Load Real Dataset
# ------------------------

df = pd.read_csv("/content/synthetic_india_site_selection_dataset (1).csv")  # Replace with your CSV

# Map earthquake zones to numeric risk
zone_map = {'Zone II': 0.1, 'Zone III': 0.3, 'Zone IV': 0.6, 'Zone V': 1.0}
df['Earthquake_Risk'] = df['Earthquake_Zone'].map(zone_map)

# Normalize selected features
selected_cols = [
    'Solar_Radiation', 'Slope', 'Water_Availability_Index',
    'Land_Cost_INR_per_sqm', 'Population_Density', 'Flood_Risk_Index', 'Earthquake_Risk'
]

scaler = MinMaxScaler()
scaled_features = scaler.fit_transform(df[selected_cols])

# ------------------------
# Step 2: Define Custom PPO Environment
# ------------------------

class SiteSelectionEnv(gym.Env): # Changed inheritance
    def __init__(self, features):
        super(SiteSelectionEnv, self).__init__()
        self.features = features
        self.n = features.shape[0]
        self.current_idx = 0

        self.observation_space = spaces.Box(low=0.0, high=1.0, shape=(features.shape[1],), dtype=np.float32)
        self.action_space = spaces.Discrete(2)  # 0: Skip, 1: Select

    def reset(self, seed=None, options=None): # Changed reset return
        super().reset(seed=seed) # Added this line
        self.current_idx = 0
        return self.features[self.current_idx].astype(np.float32), {} # Changed reset return

    def step(self, action):
        terminated = False # Changed done to terminated
        truncated = False # Added truncated
        reward = 0.0

        if action == 1:
            feature = self.features[self.current_idx]
            reward = (
                + 2.0 * feature[0]       # Solar Radiation
                + 1.5 * feature[2]       # Water Availability
                - 1.0 * feature[3]       # Land Cost
                - 1.0 * feature[4]       # Population
                - 2.0 * feature[5]       # Flood Risk
                - 2.0 * feature[6]       # Earthquake Risk
                - 1.0 * feature[1]       # Slope
            )

        self.current_idx += 1
        if self.current_idx >= self.n:
            terminated = True # Changed done to terminated
            obs = np.zeros_like(self.features[0]).astype(np.float32) # Ensure dtype consistency
        else:
            obs = self.features[self.current_idx].astype(np.float32) # Ensure dtype consistency

        return obs, reward, terminated, truncated, {} # Changed step return

# ------------------------
# Step 3: Train PPO Agent
# ------------------------

env = SiteSelectionEnv(scaled_features)
check_env(env)

model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10000)

# ------------------------
# Step 4: Evaluate Trained Model
# ------------------------

selected_indices = []
obs, info = env.reset() # Changed reset call

for i in range(env.n):
    action, _ = model.predict(obs)
    # The index should be the current_idx before incrementing in the step method
    if action == 1:
        selected_indices.append(env.current_idx) # Corrected index
    obs, _, terminated, truncated, _ = env.step(action) # Changed step call
    if terminated or truncated: # Added truncated
        break

# ------------------------
# Step 5: Output Selected Sites with Place Names
# ------------------------

selected_sites = df.iloc[selected_indices].copy()

# Reverse geocode selected sites
geolocator = Nominatim(user_agent="green_hydrogen_selector")

def get_place_name(lat, lon):
    try:
        location = geolocator.reverse((lat, lon), language='en', timeout=10)
        if location and 'address' in location.raw:
            address = location.raw['address']
            return address.get('village') or address.get('town') or address.get('city') or \
                   address.get('county') or address.get('state_district') or address.get('state')
        return "Unknown"
    except:
        return "Error"

tqdm.pandas()
selected_sites['Place_Name'] = selected_sites.progress_apply(lambda row: get_place_name(row['Latitude'], row['Longitude']), axis=1)

# Display final selected sites with place names
print("\nBest Selected Sites for Green Hydrogen:")
print(selected_sites[['Latitude', 'Longitude', 'Place_Name', 'Solar_Radiation', 'Water_Availability_Index', 'Flood_Risk_Index', 'Earthquake_Zone']])

Using cpu device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 1e+03    |
|    ep_rew_mean     | -810     |
| time/              |          |
|    fps             | 944      |
|    iterations      | 1        |
|    time_elapsed    | 2        |
|    total_timesteps | 2048     |
---------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 1e+03       |
|    ep_rew_mean          | -718        |
| time/                   |             |
|    fps                  | 691         |
|    iterations           | 2           |
|    time_elapsed         | 5           |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.018803611 |
|    clip_fraction        | 0.33        |
|    clip_range           | 0.2         |
|    entropy_loss   

100%|██████████| 128/128 [02:07<00:00,  1.00it/s]


Best Selected Sites for Green Hydrogen:
      Latitude  Longitude           Place_Name  Solar_Radiation  \
8    24.834008  75.365206                Kojya         4.163415   
16   15.779388  73.326973                 None         4.868358   
17   22.505071  84.754384              Jaldega         6.249583   
34   35.951777  70.270119               Warsaj         5.521318   
48   23.174664  93.667917       Falam District         5.358846   
..         ...        ...                  ...              ...   
959  29.010025  91.176723                Gyiru         6.017677   
963  18.373516  79.323072       Konadapalakala         6.446314   
976  24.649877  75.649254  Gandhi Sagar Colony         6.097493   
979  24.703707  92.837040              Silchar         6.425643   
983   8.791702  82.247582              Unknown         6.042820   

     Water_Availability_Index  Flood_Risk_Index Earthquake_Zone  
8                    0.056626          0.407739        Zone III  
16                   0




In [16]:
import pandas as pd
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from sklearn.preprocessing import MinMaxScaler
from stable_baselines3 import PPO
from stable_baselines3.common.env_checker import check_env
from geopy.geocoders import Nominatim
from tqdm import tqdm
import folium
import plotly.express as px
import os

# ------------------------
# Step 1: Load and Preprocess Dataset
# ------------------------
def load_and_preprocess_data(file_path):
    df = pd.read_csv("/content/synthetic_india_site_selection_dataset (1).csv")

    # Map earthquake zones to numeric risk
    zone_map = {'Zone II': 0.1, 'Zone III': 0.3, 'Zone IV': 0.6, 'Zone V': 1.0}
    df['Earthquake_Risk'] = df['Earthquake_Zone'].map(zone_map)

    # Normalize selected features
    selected_cols = [
        'Solar_Radiation', 'Slope', 'Water_Availability_Index',
        'Land_Cost_INR_per_sqm', 'Population_Density', 'Flood_Risk_Index', 'Earthquake_Risk'
    ]

    scaler = MinMaxScaler()
    scaled_features = scaler.fit_transform(df[selected_cols])

    return df, scaled_features, selected_cols

# ------------------------
# Step 2: Define Custom PPO Environment
# ------------------------
class SiteSelectionEnv(gym.Env):
    def __init__(self, features):
        super(SiteSelectionEnv, self).__init__()
        self.features = features
        self.n = features.shape[0]
        self.current_idx = 0

        self.observation_space = spaces.Box(
            low=0.0, high=1.0, shape=(features.shape[1],), dtype=np.float32
        )
        self.action_space = spaces.Discrete(2)  # 0: Skip, 1: Select

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.current_idx = 0
        return self.features[self.current_idx].astype(np.float32), {}

    def step(self, action):
        terminated = False
        truncated = False
        reward = 0.0

        if action == 1:
            feature = self.features[self.current_idx]
            reward = (
                + 2.0 * feature[0]       # Solar Radiation
                + 1.5 * feature[2]       # Water Availability
                - 1.0 * feature[3]       # Land Cost
                - 1.0 * feature[4]       # Population
                - 2.0 * feature[5]       # Flood Risk
                - 2.0 * feature[6]       # Earthquake Risk
                - 1.0 * feature[1]       # Slope
            )

        self.current_idx += 1
        if self.current_idx >= self.n:
            terminated = True
            obs = np.zeros_like(self.features[0]).astype(np.float32)
        else:
            obs = self.features[self.current_idx].astype(np.float32)

        return obs, reward, terminated, truncated, {}

# ------------------------
# Step 3: Geocoding Function
# ------------------------
def get_place_name(lat, lon, geolocator):
    try:
        location = geolocator.reverse((lat, lon), language='en', timeout=10)
        if location and 'address' in location.raw:
            address = location.raw['address']
            return address.get('village') or address.get('town') or address.get('city') or \
                   address.get('county') or address.get('state_district') or address.get('state')
        return "Unknown"
    except:
        return "Error"

# ------------------------
# Step 4: Create HTML Map
# ------------------------
def create_map(selected_sites, output_file="top_green_hydrogen_sites_map.html"):
    # Create map centered on India
    india_map = folium.Map(location=[20.5937, 78.9629], zoom_start=5)

    # Add markers: green for selected sites, red for others
    for _, row in selected_sites.iterrows():
        folium.Marker(
            location=[row['Latitude'], row['Longitude']],
            popup=f"{row['Place_Name']}<br>Solar: {row['Solar_Radiation']:.2f}<br>Water: {row['Water_Availability_Index']:.2f}",
            icon=folium.Icon(color='green')
        ).add_to(india_map)

    # Save map
    india_map.save(output_file)
    return output_file

# ------------------------
# Step 5: Create State-wise Bar Chart
# ------------------------
def create_state_bar_chart(selected_sites, output_file="state_distribution.html"):
    # Count sites per state
    state_counts = selected_sites['Place_Name'].value_counts().reset_index()
    state_counts.columns = ['State', 'Count']

    # Create bar chart
    fig = px.bar(state_counts, x='State', y='Count', title='State-wise Distribution of Selected Sites')
    fig.update_layout(xaxis_title="State", yaxis_title="Number of Sites") # Changed to update_layout
    fig.update_traces(marker_color='green') # Changed to update_traces
    fig.update_layout(showlegend=False) # Changed to update_layout

    # Save chart
    fig.write_html(output_file)
    return output_file

# ------------------------
# Main Execution
# ------------------------
def main():
    # Load and preprocess data
    file_path = "synthetic_india_site_selection_dataset.csv"
    df, scaled_features, selected_cols = load_and_preprocess_data(file_path)

    # Initialize environment and train PPO
    env = SiteSelectionEnv(scaled_features)
    check_env(env)

    model = PPO("MlpPolicy", env, verbose=1)
    model.learn(total_timesteps=10000)

    # Evaluate model
    selected_indices = []
    rewards = []
    obs, info = env.reset()

    for i in range(env.n):
        action, _ = model.predict(obs)
        if action == 1:
            selected_indices.append(env.current_idx)
            feature = scaled_features[env.current_idx]
            reward = (
                + 2.0 * feature[0] + 1.5 * feature[2] - 1.0 * feature[3]
                - 1.0 * feature[4] - 2.0 * feature[5] - 2.0 * feature[6] - 1.0 * feature[1]
            )
            rewards.append(reward)
        obs, reward, terminated, truncated, _ = env.step(action)
        if terminated or truncated:
            break

    # Get selected sites
    selected_sites = df.iloc[selected_indices].copy()
    selected_sites['Reward'] = rewards

    # Reverse geocode
    geolocator = Nominatim(user_agent="green_hydrogen_selector")
    tqdm.pandas()
    selected_sites['Place_Name'] = selected_sites.progress_apply(
        lambda row: get_place_name(row['Latitude'], row['Longitude'], geolocator), axis=1
    )

    # Display top 30 sites
    top_30 = selected_sites.sort_values('Reward', ascending=False).head(30)
    print("\nTop 30 Selected Sites for Green Hydrogen:")
    print(top_30[['Latitude', 'Longitude', 'Place_Name', 'Solar_Radiation',
                  'Water_Availability_Index', 'Flood_Risk_Index', 'Earthquake_Zone', 'Reward']])

    # Create visualizations
    map_file = create_map(selected_sites)
    chart_file = create_state_bar_chart(selected_sites)

    print(f"\nGenerated HTML map: {map_file}")
    print(f"Generated state distribution chart: {chart_file}")

if __name__ == "__main__":
    main()

Using cpu device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 1e+03    |
|    ep_rew_mean     | -822     |
| time/              |          |
|    fps             | 1049     |
|    iterations      | 1        |
|    time_elapsed    | 1        |
|    total_timesteps | 2048     |
---------------------------------
----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 1e+03      |
|    ep_rew_mean          | -726       |
| time/                   |            |
|    fps                  | 760        |
|    iterations           | 2          |
|    time_elapsed         | 5          |
|    total_timesteps      | 4096       |
| train/                  |            |
|    approx_kl            | 0.01903437 |
|    clip_fraction        | 0.323      |
|    clip_range           | 0.2        |
|    entropy_loss         | -0.67

100%|██████████| 154/154 [02:33<00:00,  1.00it/s]



Top 30 Selected Sites for Green Hydrogen:
      Latitude  Longitude                 Place_Name  Solar_Radiation  \
925   7.185314  70.743224                    Unknown         6.254850   
142  21.666079  88.001477                       None         5.775607   
165  31.976726  92.130027                    Serchen         6.447762   
938  10.091408  82.698543                    Unknown         6.380554   
479  22.454484  80.849708             Bichhya Tahsil         5.985527   
802  24.076795  78.420917                      Golni         6.359254   
699  21.174773  84.211121                   Charamal         5.932304   
872  26.141986  82.922654                    Phulpur         6.231396   
205   6.780510  90.727019                    Unknown         6.295191   
286  20.700740  70.592062                     Amreli         6.428832   
744  13.582487  80.276543                       None         5.556034   
432  25.355651  82.420974                    Gyanpur         6.460033   
290   7.

In [18]:
from google.colab import files

# Download link will appear to download and open manually
files.download('/content/top_green_hydrogen_sites_map.html')


<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

In [20]:
!pip install streamlit

Collecting streamlit
  Downloading streamlit-1.46.1-py3-none-any.whl.metadata (9.0 kB)
Collecting watchdog<7,>=2.1.5 (from streamlit)
  Downloading watchdog-6.0.0-py3-none-manylinux2014_x86_64.whl.metadata (44 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m44.3/44.3 kB[0m [31m1.8 MB/s[0m eta [36m0:00:00[0m
Collecting pydeck<1,>=0.8.0b4 (from streamlit)
  Downloading pydeck-0.9.1-py2.py3-none-any.whl.metadata (4.1 kB)
Downloading streamlit-1.46.1-py3-none-any.whl (10.1 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m10.1/10.1 MB[0m [31m46.4 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading pydeck-0.9.1-py2.py3-none-any.whl (6.9 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m6.9/6.9 MB[0m [31m64.8 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading watchdog-6.0.0-py3-none-manylinux2014_x86_64.whl (79 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m79.1/79.1 kB[0m [31m5.5 MB/s[0m eta [36m0:00:00[0m
[?25hI

In [25]:
import pandas as pd
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from sklearn.preprocessing import MinMaxScaler
from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import EvalCallback
import geopandas as gpd # Import geopandas
from shapely.geometry import Point # Import Point
from geopy.distance import geodesic
from joblib import Parallel, delayed
import folium
import plotly.express as px
import argparse
import logging
import streamlit as st
import ee
import os
from tqdm import tqdm
import sys

# Set up logging
logging.basicConfig(filename='site_selection.log', level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

# ------------------------
# Step 1: Data Validation and Preprocessing
# ------------------------
def validate_and_preprocess_data(file_path):
    try:
        df = pd.read_csv("/content/synthetic_india_site_selection_dataset (1).csv")
        logging.info(f"Loaded dataset: {file_path}")
    except FileNotFoundError:
        logging.error(f"Dataset file not found: {file_path}")
        raise

    # Removed 'Wind_Speed' as it's not in the dataset
    required_cols = ['Latitude', 'Longitude', 'Solar_Radiation', 'Water_Availability_Index',
                     'Land_Cost_INR_per_sqm', 'Population_Density', 'Flood_Risk_Index', 'Earthquake_Zone']
    for col in required_cols:
        if col not in df.columns:
            raise ValueError(f"Missing required column: {col}")

    # Validate coordinates (India bounds: 8-35N, 68-97E)
    df = df[(df['Latitude'].between(8, 35)) & (df['Longitude'].between(68, 97))]
    if df.empty:
        raise ValueError("No valid coordinates within India bounds")

    # Map earthquake zones to numeric risk
    zone_map = {'Zone II': 0.1, 'Zone III': 0.3, 'Zone IV': 0.6, 'Zone V': 1.0}
    df['Earthquake_Risk'] = df['Earthquake_Zone'].map(zone_map).fillna(0.3)  # Default to Zone III if missing

    # Add infrastructure distance (placeholder for ports data)
    ports = pd.DataFrame({'lat': [21.0, 22.0], 'lon': [72.0, 88.0]})  # Replace with actual ports dataset
    df['Distance_to_Port'] = df.apply(
        lambda row: min(geodesic((row['Latitude'], row['Longitude']), (p['lat'], p['lon'])).km
                        for _, p in ports.iterrows()), axis=1)

    # Removed 'Wind_Speed' from selected_cols
    selected_cols = ['Solar_Radiation', 'Slope', 'Water_Availability_Index', 'Land_Cost_INR_per_sqm',
                     'Population_Density', 'Flood_Risk_Index', 'Earthquake_Risk', 'Distance_to_Port']
    scaler = MinMaxScaler()
    scaled_features = scaler.fit_transform(df[selected_cols])

    return df, scaled_features, selected_cols

# ------------------------
# Step 2: Land Cover Filtering (Google Earth Engine)
# ------------------------
def filter_land_cover(df):
    try:
        ee.Initialize()
        def check_land_cover(lat, lon):
            point = ee.Geometry.Point(lon, lat)
            land_cover = ee.Image('ESA/WorldCover/v100').select('Map').reduceRegion(
                reducer=ee.Reducer.first(), geometry=point, scale=10).get('Map')
            return land_cover.getInfo() not in [40, 50, 80]  # Exclude forests, wetlands, water
        df['Suitable_Land'] = Parallel(n_jobs=-1)(
            delayed(check_land_cover)(row['Latitude'], row['Longitude']) for _, row in df.iterrows())
        return df[df['Suitable_Land']].drop(columns=['Suitable_Land'])
    except Exception as e:
        logging.warning(f"Land cover filtering failed: {e}. Proceeding without filtering.")
        return df

# ------------------------
# Step 3: Geocoding with Shapefile
# ------------------------
def assign_state(df, shapefile_path):
    try:
        gdf = gpd.GeoDataFrame(df, geometry=[Point(xy) for xy in zip(df.Longitude, df.Latitude)], crs="EPSG:4326")
        states = gpd.read_file(shapefile_path)
        df['State'] = gdf.geometry.apply(
            lambda x: states[states.geometry.contains(x)]['NAME_1'].iloc[0] if any(states.geometry.contains(x)) else 'Unknown')
        return df
    except Exception as e:
        logging.error(f"Geocoding failed: {e}. Using Nominatim as fallback.")
        geolocator = Nominatim(user_agent="green_hydrogen_selector")
        df['State'] = df.progress_apply(
            lambda row: get_place_name(row['Latitude'], row['Longitude'], geolocator), axis=1)
        return df

# ------------------------
# Step 4: PPO Environment
# ------------------------
class SiteSelectionEnv(gym.Env):
    def __init__(self, features, weights):
        super().__init__()
        self.features = features
        self.weights = weights
        self.n = features.shape[0]
        self.current_idx = 0
        self.observation_space = spaces.Box(low=0, high=1, shape=(features.shape[1],), dtype=np.float32)
        self.action_space = spaces.Discrete(2)  # 0: Skip, 1: Select

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.current_idx = 0
        return self.features[self.current_idx].astype(np.float32), {}

    def step(self, action):
        terminated = False
        truncated = False
        reward = 0.0
        if action == 1:
            feature = self.features[self.current_idx]
            reward = sum(w * f for w, f in zip(self.weights, feature))
            # Non-linear penalty for low solar radiation
            if feature[0] < 0.3:  # Below 30% of max solar
                reward -= 0.5
        self.current_idx += 1
        if self.current_idx >= self.n:
            terminated = True
            obs = np.zeros_like(self.features[0]).astype(np.float32)
        else:
            obs = self.features[self.current_idx].astype(np.float32)
        return obs, reward, terminated, truncated, {}

# ------------------------
# Step 5: Visualizations
# ------------------------
def create_map(df, selected_sites, output_file="top_green_hydrogen_sites_map.html"):
    india_map = folium.Map(location=[20.5937, 78.9629], zoom_start=5)
    for _, row in selected_sites.iterrows():
        folium.Marker(
            location=[row['Latitude'], row['Longitude']],
            popup=f"{row['State']}<br>Solar: {row['Solar_Radiation']:.2f}<br>Water: {row['Water_Availability_Index']:.2f}<br>Reward: {row['Reward']:.2f}",
            icon=folium.Icon(color='green')
        ).add_to(india_map)
    for _, row in df[~df.index.isin(selected_sites.index)].iterrows():
        folium.Marker(
            location=[row['Latitude'], row['Longitude']],
            popup=f"{row.get('State', 'Unknown')}<br>Not Selected",
            icon=folium.Icon(color='red')
        ).add_to(india_map)
    india_map.save(output_file)
    return output_file

def create_state_bar_chart(selected_sites, output_file="state_distribution.html"):
    state_counts = selected_sites['State'].value_counts().reset_index()
    state_counts.columns = ['State', 'Count']
    fig = px.bar(state_counts, x='State', y='Count', title='State-wise Distribution of Selected Sites')
    fig.update_traces(marker_color='green')
    fig.update_layout(xaxis_title="State", yaxis_title="Number of Sites", showlegend=False)
    fig.write_html(output_file)
    return output_file

def create_scatter_plot(selected_sites, output_file="scatter_plot.html"):
    fig = px.scatter(selected_sites, x='Solar_Radiation', y='Land_Cost_INR_per_sqm', color='Reward',
                     hover_data=['State', 'Water_Availability_Index'], title='Solar Radiation vs. Land Cost')
    fig.write_html(output_file)
    return output_file

# ------------------------
# Step 6: Streamlit GUI
# ------------------------
def run_gui():
    st.title("Green Hydrogen Site Selection")
    # Use default values or provide inputs in the Streamlit app
    file_path = st.text_input("Dataset Path", 'synthetic_india_site_selection_dataset.csv')
    solar_weight = st.slider("Solar Radiation Weight", 0.0, 1.0, 0.3)
    # Removed wind_weight as Wind_Speed is not in dataset
    water_weight = st.slider("Water Availability Weight", 0.0, 1.0, 0.25)
    top_n = st.number_input("Number of Top Sites", min_value=1, value=30)
    shapefile = st.text_input("Shapefile Path", 'india_states.shp')

    if st.button("Run Analysis"):
        # Adjusted weights list to match remaining features
        main(file_path, [solar_weight, water_weight, -0.15, -0.1, -0.1, -0.1, -0.1], top_n, shapefile)

# ------------------------
# Main Execution
# ------------------------
def main(file_path, weights, top_n, shapefile):
    # Load and preprocess data
    df, scaled_features, selected_cols = validate_and_preprocess_data(file_path)

    # Filter land cover (requires Google Earth Engine setup)
    df = filter_land_cover(df)

    # Assign states
    df = assign_state(df, shapefile)

    # Train PPO
    env = SiteSelectionEnv(scaled_features, weights)
    check_env(env)
    model = PPO("MlpPolicy", env, verbose=1)
    eval_callback = EvalCallback(env, eval_freq=1000, deterministic=True)
    model.learn(total_timesteps=10000, callback=eval_callback)

    # Evaluate model
    selected_indices, rewards = [], []
    obs, _ = env.reset()
    for i in range(env.n):
        action, _ = model.predict(obs)
        if action == 1:
            selected_indices.append(env.current_idx)
            feature = scaled_features[env.current_idx]
            reward = sum(w * f for w, f in zip(weights, env.features[env.current_idx]))
            if env.features[env.current_idx][0] < 0.3:
                reward -= 0.5
            rewards.append(reward)
        obs, reward, terminated, truncated, _ = env.step(action)
        if terminated or truncated:
            break

    # Process results
    selected_sites = df.iloc[selected_indices].copy()
    selected_sites['Reward'] = rewards

    # Reverse geocode
    geolocator = Nominatim(user_agent="green_hydrogen_selector")
    tqdm.pandas()
    selected_sites['Place_Name'] = selected_sites.progress_apply(
        lambda row: get_place_name(row['Latitude'], row['Longitude'], geolocator), axis=1
    )
    # Update State column with Place_Name if State is Unknown
    selected_sites['State'] = selected_sites.apply(lambda row: row['Place_Name'] if row['State'] == 'Unknown' else row['State'], axis=1)


    # Outputs
    top_n_sites = selected_sites.sort_values('Reward', ascending=False).head(top_n)
    top_n_sites.to_csv("top_green_hydrogen_sites.csv", index=False)

    # Convert to GeoDataFrame before saving to GeoJSON
    top_n_sites_gdf = gpd.GeoDataFrame(
        top_n_sites, geometry=gpd.points_from_xy(top_n_sites.Longitude, top_n_sites.Latitude)
    )
    top_n_sites_gdf.to_file("top_green_hydrogen_sites.geojson", driver='GeoJSON')


    map_file = create_map(df, top_n_sites)
    chart_file = create_state_bar_chart(top_n_sites)
    scatter_file = create_scatter_plot(top_n_sites)

    print(f"\nTop {top_n} Selected Sites for Green Hydrogen:")
    print(top_n_sites[['Latitude', 'Longitude', 'State', 'Solar_Radiation',
                       'Water_Availability_Index', 'Flood_Risk_Index', 'Earthquake_Zone', 'Reward']])
    print(f"Generated HTML map: {map_file}")
    print(f"Generated state distribution chart: {chart_file}")
    print(f"Generated scatter plot: {scatter_file}")
    print("Exported top sites to top_green_hydrogen_sites.csv and top_green_hydrogen_sites.geojson")

# ------------------------
# Unit Tests
# ------------------------
import unittest
class TestSiteSelection(unittest.TestCase):
    def test_reward_calculation(self):
        feature = np.array([0.8, 0.9, 0.3, 0.1, 0.1, 0.1, 0.2]) # Adjusted feature array
        weights = [0.3, 0.25, -0.15, -0.1, -0.1, -0.1, -0.1] # Adjusted weights
        reward = sum(w * f for w, f in zip(weights, feature))
        self.assertAlmostEqual(reward, 0.215, places=3) # Adjusted expected reward

# ------------------------
# Execution in Colab or Script
# ------------------------
if __name__ == "__main__":
    # Check if running in Colab
    if 'COLAB_GPU' in os.environ or 'COLAB_TPU_ADDR' in os.environ or 'google.colab' in sys.modules:
        # In Colab, call main directly with desired parameters
        # Removed wind_weight from weights
        main(
            file_path='/content/synthetic_india_site_selection_dataset (1).csv',
            weights=[0.3, 0.25, -0.15, -0.1, -0.1, -0.1, -0.1], # Example weights
            top_n=30, # Example top_n
            shapefile='india_states.shp' # Example shapefile path
        )
        # Optionally run tests after execution in Colab
        # unittest.main(argv=['first-arg-is-ignored'], exit=False)
    else:
        # When running as a script, use argparse
        parser = argparse.ArgumentParser(description='Green Hydrogen Site Selection')
        parser.add_argument('--file-path', default='synthetic_india_site_selection_dataset.csv', help='Dataset file path')
        parser.add_argument('--solar-weight', type=float, default=0.3, help='Weight for solar radiation')
        parser.add_argument('--water-weight', type=float, default=0.25, help='Weight for water availability') # Removed wind_weight argument
        parser.add_argument('--top-n', type=int, default=30, help='Number of top sites')
        parser.add_argument('--shapefile', default='india_states.shp', help='Path to India states shapefile')
        args = parser.parse_args()
        # Adjusted weights list for command-line execution
        main(args.file_path, [args.solar_weight, args.water_weight, -0.15, -0.1, -0.1, -0.1, -0.1], args.top_n, args.shapefile)
        unittest.main(argv=['first-arg-is-ignored'], exit=False)


earthengine authenticate

in your command line, or ee.Authenticate() in Python, and then retry.. Proceeding without filtering.
ERROR:root:Geocoding failed: india_states.shp: No such file or directory. Using Nominatim as fallback.
100%|██████████| 859/859 [14:18<00:00,  1.00it/s]


Using cpu device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.



Evaluation environment is not wrapped with a ``Monitor`` wrapper. This may result in reporting modified episode lengths and rewards, if other wrappers happen to modify these. Consider wrapping environment first with ``Monitor`` wrapper.



Eval num_timesteps=1000, episode_reward=-38.93 +/- 0.00
Episode length: 859.00 +/- 0.00
---------------------------------
| eval/              |          |
|    mean_ep_length  | 859      |
|    mean_reward     | -38.9    |
| time/              |          |
|    total_timesteps | 1000     |
---------------------------------
New best mean reward!



Evaluation environment is not wrapped with a ``Monitor`` wrapper. This may result in reporting modified episode lengths and rewards, if other wrappers happen to modify these. Consider wrapping environment first with ``Monitor`` wrapper.



Eval num_timesteps=2000, episode_reward=-38.93 +/- 0.00
Episode length: 859.00 +/- 0.00
---------------------------------
| eval/              |          |
|    mean_ep_length  | 859      |
|    mean_reward     | -38.9    |
| time/              |          |
|    total_timesteps | 2000     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 930      |
|    ep_rew_mean     | -58.6    |
| time/              |          |
|    fps             | 290      |
|    iterations      | 1        |
|    time_elapsed    | 7        |
|    total_timesteps | 2048     |
---------------------------------



Evaluation environment is not wrapped with a ``Monitor`` wrapper. This may result in reporting modified episode lengths and rewards, if other wrappers happen to modify these. Consider wrapping environment first with ``Monitor`` wrapper.



Eval num_timesteps=3000, episode_reward=41.84 +/- 0.00
Episode length: 859.00 +/- 0.00
----------------------------------------
| eval/                   |            |
|    mean_ep_length       | 859        |
|    mean_reward          | 41.8       |
| time/                   |            |
|    total_timesteps      | 3000       |
| train/                  |            |
|    approx_kl            | 0.01012981 |
|    clip_fraction        | 0.124      |
|    clip_range           | 0.2        |
|    entropy_loss         | -0.684     |
|    explained_variance   | -0.0332    |
|    learning_rate        | 0.0003     |
|    loss                 | 0.173      |
|    n_updates            | 10         |
|    policy_gradient_loss | -0.0197    |
|    value_loss           | 0.478      |
----------------------------------------
New best mean reward!



Evaluation environment is not wrapped with a ``Monitor`` wrapper. This may result in reporting modified episode lengths and rewards, if other wrappers happen to modify these. Consider wrapping environment first with ``Monitor`` wrapper.



Eval num_timesteps=4000, episode_reward=41.84 +/- 0.00
Episode length: 859.00 +/- 0.00
---------------------------------
| eval/              |          |
|    mean_ep_length  | 859      |
|    mean_reward     | 41.8     |
| time/              |          |
|    total_timesteps | 4000     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 965      |
|    ep_rew_mean     | -56.1    |
| time/              |          |
|    fps             | 247      |
|    iterations      | 2        |
|    time_elapsed    | 16       |
|    total_timesteps | 4096     |
---------------------------------



Evaluation environment is not wrapped with a ``Monitor`` wrapper. This may result in reporting modified episode lengths and rewards, if other wrappers happen to modify these. Consider wrapping environment first with ``Monitor`` wrapper.



Eval num_timesteps=5000, episode_reward=41.33 +/- 0.00
Episode length: 859.00 +/- 0.00
-----------------------------------------
| eval/                   |             |
|    mean_ep_length       | 859         |
|    mean_reward          | 41.3        |
| time/                   |             |
|    total_timesteps      | 5000        |
| train/                  |             |
|    approx_kl            | 0.016723003 |
|    clip_fraction        | 0.206       |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.65       |
|    explained_variance   | -0.00673    |
|    learning_rate        | 0.0003      |
|    loss                 | 0.143       |
|    n_updates            | 20          |
|    policy_gradient_loss | -0.0277     |
|    value_loss           | 0.381       |
-----------------------------------------



Evaluation environment is not wrapped with a ``Monitor`` wrapper. This may result in reporting modified episode lengths and rewards, if other wrappers happen to modify these. Consider wrapping environment first with ``Monitor`` wrapper.



Eval num_timesteps=6000, episode_reward=41.33 +/- 0.00
Episode length: 859.00 +/- 0.00
---------------------------------
| eval/              |          |
|    mean_ep_length  | 859      |
|    mean_reward     | 41.3     |
| time/              |          |
|    total_timesteps | 6000     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 976      |
|    ep_rew_mean     | -45.2    |
| time/              |          |
|    fps             | 237      |
|    iterations      | 3        |
|    time_elapsed    | 25       |
|    total_timesteps | 6144     |
---------------------------------



Evaluation environment is not wrapped with a ``Monitor`` wrapper. This may result in reporting modified episode lengths and rewards, if other wrappers happen to modify these. Consider wrapping environment first with ``Monitor`` wrapper.



Eval num_timesteps=7000, episode_reward=43.51 +/- 0.00
Episode length: 859.00 +/- 0.00
-----------------------------------------
| eval/                   |             |
|    mean_ep_length       | 859         |
|    mean_reward          | 43.5        |
| time/                   |             |
|    total_timesteps      | 7000        |
| train/                  |             |
|    approx_kl            | 0.020511404 |
|    clip_fraction        | 0.155       |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.584      |
|    explained_variance   | 0.0191      |
|    learning_rate        | 0.0003      |
|    loss                 | 0.146       |
|    n_updates            | 30          |
|    policy_gradient_loss | -0.0217     |
|    value_loss           | 0.29        |
-----------------------------------------
New best mean reward!



Evaluation environment is not wrapped with a ``Monitor`` wrapper. This may result in reporting modified episode lengths and rewards, if other wrappers happen to modify these. Consider wrapping environment first with ``Monitor`` wrapper.



Eval num_timesteps=8000, episode_reward=43.51 +/- 0.00
Episode length: 859.00 +/- 0.00
---------------------------------
| eval/              |          |
|    mean_ep_length  | 859      |
|    mean_reward     | 43.5     |
| time/              |          |
|    total_timesteps | 8000     |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 982      |
|    ep_rew_mean     | -34.9    |
| time/              |          |
|    fps             | 233      |
|    iterations      | 4        |
|    time_elapsed    | 35       |
|    total_timesteps | 8192     |
---------------------------------



Evaluation environment is not wrapped with a ``Monitor`` wrapper. This may result in reporting modified episode lengths and rewards, if other wrappers happen to modify these. Consider wrapping environment first with ``Monitor`` wrapper.



Eval num_timesteps=9000, episode_reward=45.03 +/- 0.00
Episode length: 859.00 +/- 0.00
-----------------------------------------
| eval/                   |             |
|    mean_ep_length       | 859         |
|    mean_reward          | 45          |
| time/                   |             |
|    total_timesteps      | 9000        |
| train/                  |             |
|    approx_kl            | 0.019072965 |
|    clip_fraction        | 0.12        |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.509      |
|    explained_variance   | 0.0229      |
|    learning_rate        | 0.0003      |
|    loss                 | 0.099       |
|    n_updates            | 40          |
|    policy_gradient_loss | -0.0186     |
|    value_loss           | 0.236       |
-----------------------------------------
New best mean reward!



Evaluation environment is not wrapped with a ``Monitor`` wrapper. This may result in reporting modified episode lengths and rewards, if other wrappers happen to modify these. Consider wrapping environment first with ``Monitor`` wrapper.



Eval num_timesteps=10000, episode_reward=45.03 +/- 0.00
Episode length: 859.00 +/- 0.00
---------------------------------
| eval/              |          |
|    mean_ep_length  | 859      |
|    mean_reward     | 45       |
| time/              |          |
|    total_timesteps | 10000    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 986      |
|    ep_rew_mean     | -25.1    |
| time/              |          |
|    fps             | 233      |
|    iterations      | 5        |
|    time_elapsed    | 43       |
|    total_timesteps | 10240    |
---------------------------------


100%|██████████| 356/356 [05:55<00:00,  1.00it/s]



Top 30 Selected Sites for Green Hydrogen:
      Latitude  Longitude             State  Solar_Radiation  \
802  24.076795  78.420917             Golni         6.359254   
738  19.893615  78.848901      Pilkiwadhona         6.406921   
636  22.744827  79.779820          Madopani         6.415119   
394  16.894508  77.941594       Kanmankalva         6.225243   
506  34.283279  85.115085             Chabu         6.134905   
618  31.089139  84.454611              Pama         6.106780   
326  23.407799  73.372479     Modasa Taluka         6.052996   
527  14.838916  77.065054          Udegolam         6.393620   
590  10.734229  88.426866           Unknown         6.056337   
200  26.081965  90.368764          Balijana         6.442001   
75   28.734719  80.619307         Dhangadhi         6.176446   
853  33.890656  92.972173     Zhidoi County         6.295598   
311   8.892920  94.397091           Unknown         6.185556   
831  20.299405  85.755507          Chandaka         5.656085 