# ***Building RAYA: The Architect of Type 1 Civilization***

 "Driving humanity toward a sustainable and intelligent civilization."


 ### **My Responsibility**
 *As the ***Predictive Analytics*** lead, I am leading the creation of RAYA's ***Human Sustainability Index***. Think of it as the brain that combines all other modules (water, energy, climate) to measure how sustainable human civilization is at the ***city level***.*


---

### 🌍 **Problem Statement: "The Human Sustainability Index (HSI): AI for a Livable Future"**

Humanity's next challenge isn't just about progress; it's about survival.
Imagine an intelligent mirror that reveals how sustainable your city truly is, not just today but decades into the future.

The **Human Sustainability Index (HSI)** uses AI to measure and forecast the delicate balance between humans and nature by tracking water, energy, climate, and resource stability.
By identifying vulnerable regions and emerging trends, HSI empowers smarter policies, resilient communities, and a more livable planet for generations to come.



In [None]:
import pandas as pd
import numpy as np

data = pd.read_csv('HSI data.csv')

data.tail()

Unnamed: 0,State,City,WaterAvailabilityIndex,EnergyStabilityIndex,PopulationDensity,TemperatureVariation,PollutionLevel
995,Gujarat,Surat,0,1,395854,7,153
996,Gujarat,Ahmedabad,0,1,577808,6,68
997,West Bengal,Kolkata,1,1,658398,10,18
998,Karnataka,Bengaluru,1,0,489858,7,72
999,Uttar Pradesh,Lucknow,0,0,658293,12,177


In [None]:
df_test = data.copy()

In [None]:
from sklearn.preprocessing import MinMaxScaler

# Scale the continuous columns to the [0, 1] range
scaler = MinMaxScaler()
data[["PopulationDensity", "TemperatureVariation", "PollutionLevel"]] = scaler.fit_transform(
    data[["PopulationDensity", "TemperatureVariation", "PollutionLevel"]]
)
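
For reference, `MinMaxScaler` with its default settings rescales each column to the [0, 1] range using (x - min) / (max - min). The sketch below reproduces that arithmetic in plain Python on the five `PopulationDensity` values visible in the `data.tail()` output. The helper name `min_max_scale` is illustrative, and the results will differ from the notebook's scaled values because the real scaler uses the minimum and maximum of the full dataset, not of these five rows.

```python
def min_max_scale(values):
    """Rescale values to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; this sketch maps everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# PopulationDensity for the five rows shown by data.tail()
population_density = [395854, 577808, 658398, 489858, 658293]
scaled = min_max_scale(population_density)

for raw, s in zip(population_density, scaled):
    print(f"{raw:>7} -> {s:.4f}")
```

The smallest value maps to 0 and the largest to 1, which puts features measured in very different units on a comparable scale before they are weighted.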


In [None]:
data["ClimateIndex"] = (data["PollutionLevel"]) * 0.5 + (data["TemperatureVariation"]) * 0.5


In [None]:
data["HSI_score"] = (
    0.3 * data["WaterAvailabilityIndex"] +
    0.3 * data["EnergyStabilityIndex"] +
    0.2 * data["ClimateIndex"] +
    0.2 * data["PopulationDensity"]
)

data["HSI"] = np.where(data["HSI_score"] >= 0.5, 1, 0)


In [None]:
data.head()

Unnamed: 0,State,City,WaterAvailabilityIndex,EnergyStabilityIndex,PopulationDensity,TemperatureVariation,PollutionLevel,ClimateIndex,HSI_score,HSI
0,Telangana,Warangal,0,0,0.579064,0.294118,0.195767,0.244942,0.164801,0
1,Tamil Nadu,Coimbatore,1,1,0.551183,0.0,0.301587,0.150794,0.740395,1
2,West Bengal,Howrah,1,1,0.423944,0.588235,0.058201,0.323218,0.749432,1
3,Rajasthan,Jaipur,1,0,0.934924,0.235294,0.666667,0.45098,0.577181,1
4,West Bengal,Durgapur,1,1,0.136263,0.529412,0.597884,0.563648,0.739982,1
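
As a sanity check, the score formula can be reproduced by hand. The sketch below recomputes `ClimateIndex`, `HSI_score`, and `HSI` for three rows of the `data.head()` output above, using the same weights and the 0.5 threshold. The function name `hsi_score` and the tolerance are illustrative choices.

```python
def hsi_score(water, energy, temp_var, pollution, pop_density):
    """Return (climate_index, score, label) using the notebook's weights."""
    climate_index = 0.5 * pollution + 0.5 * temp_var
    score = (0.3 * water + 0.3 * energy
             + 0.2 * climate_index + 0.2 * pop_density)
    label = 1 if score >= 0.5 else 0
    return climate_index, score, label

# (city, water, energy, scaled temp variation, scaled pollution,
#  scaled population density, HSI_score shown in the output)
rows = [
    ("Warangal",   0, 0, 0.294118, 0.195767, 0.579064, 0.164801),
    ("Coimbatore", 1, 1, 0.0,      0.301587, 0.551183, 0.740395),
    ("Jaipur",     1, 0, 0.235294, 0.666667, 0.934924, 0.577181),
]

results = {}
for city, water, energy, temp, poll, dens, shown in rows:
    climate, score, label = hsi_score(water, energy, temp, poll, dens)
    results[city] = (climate, score, label)
    # The displayed inputs are rounded to six decimals, so allow a small tolerance.
    assert abs(score - shown) < 1e-5
    print(f"{city}: ClimateIndex={climate:.6f}, HSI_score={score:.6f}, HSI={label}")
```

Because water and energy each carry a weight of 0.3, a city with both indices equal to 1 already scores 0.6 and is labeled 1 regardless of the other two terms.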


In [None]:
#df.drop(['TemperatureVariation', 'PollutionLevel', 'HSI_score'], axis=1, inplace=True)

In [None]:

corr = data[["ClimateIndex", "WaterAvailabilityIndex", "EnergyStabilityIndex", "PopulationDensity", "HSI"]].corr()

import plotly.express as px

fig = px.imshow(
    corr,
    text_auto=".2f",
    color_continuous_scale='RdBu_r',
    title="Correlation Heatmap (To check the correlation between features)"
)

fig.update_layout(
    width=700,
    height=500
)

fig.show()


In [None]:
data.HSI.value_counts()

Unnamed: 0_level_0,count
HSI,Unnamed: 1_level_1
1,511
0,489


In [None]:
from sklearn.preprocessing import LabelEncoder

le_state = LabelEncoder()
le_city = LabelEncoder()

data['State_encoded'] = le_state.fit_transform(data['State'])
data['City_encoded'] = le_city.fit_transform(data['City'])
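
`LabelEncoder` assigns each distinct label an integer according to the sorted order of the labels, and `inverse_transform` maps the integers back to the labels. The plain-Python sketch below mimics that behavior on the state names from the `data.tail()` output. `fit_labels`, `encode`, and `decode` are hypothetical helpers written for illustration, not scikit-learn functions.

```python
def fit_labels(labels):
    """Return the sorted list of distinct labels (the 'classes')."""
    return sorted(set(labels))

def encode(labels, classes):
    """Map each label to its position in the sorted class list."""
    index = {label: i for i, label in enumerate(classes)}
    return [index[label] for label in labels]

def decode(codes, classes):
    """Map integer codes back to the original labels."""
    return [classes[c] for c in codes]

states = ["Gujarat", "Gujarat", "West Bengal", "Karnataka", "Uttar Pradesh"]
classes = fit_labels(states)
codes = encode(states, classes)

print(classes)
print(codes)
print(decode(codes, classes) == states)
```

The codes carry no ordinal meaning; they only reflect alphabetical order. Keeping the fitted encoders around is what makes it possible to recover the state and city names later.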


In [None]:
data.drop(['State', 'City'], axis=1, inplace=True)


In [None]:
from sklearn.model_selection import train_test_split

X = data[['State_encoded', 'City_encoded','WaterAvailabilityIndex', 'EnergyStabilityIndex', 'PopulationDensity', 'ClimateIndex']]
y = data['HSI']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


In [None]:
import numpy as np
import pandas as pd
from lightgbm import LGBMClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
import plotly.figure_factory as ff

# ✅ Train LightGBM model (no SMOTE)
lgb_model = LGBMClassifier(
    n_estimators=300,
    learning_rate=0.05,
    max_depth=-1,
    random_state=42
)

lgb_model.fit(X_train, y_train)

# ✅ Predictions
y_pred = lgb_model.predict(X_test)

# ✅ Accuracy
print('----' * 16)
print(f"✅ LightGBM Classifier Accuracy: {accuracy_score(y_test, y_pred):.2f}")
print('----' * 16)

# ✅ Confusion matrix
cm = confusion_matrix(y_test, y_pred)
labels = ['Less Sustainable', 'Highly Sustainable']
z_text = [[str(y) for y in x] for x in cm]

# ✅ Plotly confusion matrix
fig = ff.create_annotated_heatmap(
    z=cm,
    x=labels,
    y=labels,
    annotation_text=z_text,
    colorscale='teal',
    showscale=True
)

fig.update_layout(
    title_text='Confusion Matrix - LightGBM Classifier',
    width=550,
    height=500
)

fig['data'][0]['showscale'] = True
fig.show()


[LightGBM] [Info] Number of positive: 406, number of negative: 394
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000260 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 552
[LightGBM] [Info] Number of data points in the train set: 800, number of used features: 6
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.507500 -> initscore=0.030002
[LightGBM] [Info] Start training from score 0.030002
----------------------------------------------------------------
✅ LightGBM Classifier Accuracy: 0.98
----------------------------------------------------------------


**LightGBM Classifier: Confusion Matrix Results**

Here "Less Sustainable" is treated as the positive class.

   * *True positive (actual "Less Sustainable", predicted "Less Sustainable"), correct prediction: 104*

   * *True negative (actual "Highly Sustainable", predicted "Highly Sustainable"), correct prediction: 92*

   * *False positive (actual "Highly Sustainable", predicted "Less Sustainable"), Type I error: 0*

   * *False negative (actual "Less Sustainable", predicted "Highly Sustainable"), Type II error: 4*
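
The four counts above are enough to recompute the headline accuracy and a few related metrics. The sketch below treats "Less Sustainable" as the positive class, as the list does. The variable names are illustrative.

```python
# Counts from the confusion matrix, with "Less Sustainable" as the positive class
tp, tn, fp, fn = 104, 92, 0, 4

total = tp + tn + fp + fn
accuracy = (tp + tn) / total
precision = tp / (tp + fp)   # share of "Less Sustainable" predictions that were right
recall = tp / (tp + fn)      # share of actual "Less Sustainable" rows that were found
f1 = 2 * precision * recall / (precision + recall)

print(f"Test rows: {total}")
print(f"Accuracy:  {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall:    {recall:.2f}")
print(f"F1 score:  {f1:.2f}")
```

The counts sum to the 200 test rows, and accuracy works out to 196/200 = 0.98, matching the value printed by the training cell.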

In [None]:
new_df = df_test[['PopulationDensity', 'TemperatureVariation', 'PollutionLevel']]

# ✅ Repeat the same split (same random_state) so the raw, unscaled columns line up with X_test
X_train, X_test, y_train, y_test, new_df_train, new_df_test = train_test_split(
    X, y, new_df, test_size=0.2, random_state=42
)
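
Passing several arrays to one `train_test_split` call keeps their rows aligned because the same shuffled row positions are applied to every array. The sketch below illustrates that idea with the standard library only. It is not scikit-learn's implementation, and `aligned_split` and the `_demo` variables are hypothetical names used for illustration.

```python
import random

def aligned_split(*arrays, test_size=0.2, seed=42):
    """Split equally long sequences using one shared shuffled index."""
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all inputs must have the same length")
    positions = list(range(n))
    random.Random(seed).shuffle(positions)   # one permutation for every input
    n_test = int(round(n * test_size))
    test_pos, train_pos = positions[:n_test], positions[n_test:]
    out = []
    for a in arrays:
        out.append([a[i] for i in train_pos])
        out.append([a[i] for i in test_pos])
    return out

# Toy data: a feature, a label, and a raw value for the same ten rows
feature_demo = list(range(10))
label_demo = [i % 2 for i in range(10)]
raw_demo = [i * 100 for i in range(10)]

f_train, f_test, l_train, l_test, r_train, r_test = aligned_split(
    feature_demo, label_demo, raw_demo
)

# Each test row still lines up across the three outputs.
print(all(r == f * 100 for f, r in zip(f_test, r_test)))
```

This alignment is what allows the raw columns in `new_df_test` to be attached to `X_test` row by row.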



In [None]:
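X_test = X_test.copy()  # work on an explicit copy to avoid pandas' SettingWithCopyWarning
# Overwrite the scaled PopulationDensity and attach the raw TemperatureVariation and PollutionLevel
X_test[['PopulationDensity', 'TemperatureVariation', 'PollutionLevel']] = new_df_test.values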

In [None]:
X_test['ClimateIndex'] = 0.5 * X_test['PollutionLevel'] + 0.5 * X_test['TemperatureVariation']

In [None]:
X_test['State_encoded'] = le_state.inverse_transform(X_test['State_encoded'])
X_test['City_encoded'] = le_city.inverse_transform(X_test['City_encoded'])


In [None]:
X_test['Predicted_HSI'] = y_pred

In [None]:
# HSI mapping
HSI_map = {
    1: 'Highly Sustainable',
    0: 'Less Sustainable'
}

X_test['Predicted_HSI'] = X_test['Predicted_HSI'].map(HSI_map)

#### **Final Test Data**: Ready for semi-deployment

In [None]:
X_test

Unnamed: 0,State_encoded,City_encoded,WaterAvailabilityIndex,EnergyStabilityIndex,PopulationDensity,ClimateIndex,TemperatureVariation,PollutionLevel,Predicted_HSI
521,Gujarat,Surat,1,0,478623,53.5,3,104,Less Sustainable
737,Uttar Pradesh,Varanasi,1,1,566283,46.5,7,86,Highly Sustainable
740,Madhya Pradesh,Gwalior,0,1,710874,99.5,10,189,Highly Sustainable
660,West Bengal,Kolkata,0,1,390758,33.0,17,49,Less Sustainable
411,Gujarat,Vadodara,0,1,420925,40.0,2,78,Less Sustainable
...,...,...,...,...,...,...,...,...,...
408,Karnataka,Bengaluru,0,0,770345,62.0,8,116,Less Sustainable
332,Uttar Pradesh,Kanpur,0,0,617268,58.5,2,115,Less Sustainable
208,Telangana,Warangal,0,0,468828,11.5,4,19,Less Sustainable
613,West Bengal,Durgapur,0,0,522314,50.5,10,91,Less Sustainable


In [None]:
import plotly.express as px
import pandas as pd

# --------------------------------------------
# 1️⃣ Group data for visualization
# --------------------------------------------
df_sunburst = (
    X_test.groupby(['State_encoded', 'City_encoded', 'Predicted_HSI'])
    .size()
    .reset_index(name='Count')
)

# Predicted_HSI already holds the text labels from HSI_map, so reuse it as the display label
df_sunburst['HSI_Label'] = df_sunburst['Predicted_HSI']

# --------------------------------------------
# 2️⃣ Build Sunburst Chart
# --------------------------------------------
fig = px.sunburst(
    df_sunburst,
    path=['State_encoded', 'City_encoded', 'HSI_Label'],  # hierarchy
    values='Count',
    color='State_encoded',  # color by State
    color_discrete_sequence=px.colors.sequential.Tealgrn,
    title="🌍 Human Sustainability Index (HSI) by State and City"
)

# --------------------------------------------
# 3️⃣ Style
# --------------------------------------------
fig.update_traces(
    textinfo='label+percent entry+value'
)

fig.update_layout(
    width=850,
    height=700,
    title_x=0.5
)

fig.show()


#### *Dropping features that are no longer needed*

In [None]:
X_test.drop(['TemperatureVariation', 'PollutionLevel'], axis=1, inplace=True)


#### **Renaming Columns**

In [None]:
X_test.columns = [
    'State', 'City', 'Water Availability Index', 'Energy Stability Index',
    'Population Density', 'Climate Index', 'Predicted HSI'
]

### *Semi-Deployment*

In [None]:
import pandas as pd
from IPython.display import display
from ipywidgets import widgets, VBox


# ------------------------------------------
# 🔹 Dropdown widgets
# ------------------------------------------
state_dropdown = widgets.Dropdown(
    options=sorted(X_test['State'].unique().tolist()),
    description='Select State:',
    style={'description_width': 'initial'},
    layout=widgets.Layout(width='50%')
)

city_dropdown = widgets.Dropdown(
    options=[],
    description='Select City:',
    style={'description_width': 'initial'},
    layout=widgets.Layout(width='50%')
)

output = widgets.Output()

# ------------------------------------------
# 🔁 Update cities dynamically
# ------------------------------------------
def update_cities(*args):
    selected_state = state_dropdown.value
    filtered_cities = X_test[X_test['State'] == selected_state]['City'].unique().tolist()
    city_dropdown.options = sorted(filtered_cities)
    if filtered_cities:
        city_dropdown.value = city_dropdown.options[0]  # Default to the first city in the sorted list
        show_city_info(None)  # ✅ Show results immediately after state change

state_dropdown.observe(update_cities, 'value')

# ------------------------------------------
# 📊 Display selected city data
# ------------------------------------------
def show_city_info(change):
    with output:
        output.clear_output()
        selected_state = state_dropdown.value
        selected_city = city_dropdown.value

        row = X_test[(X_test['State'] == selected_state) & (X_test['City'] == selected_city)]
        if row.empty:
            print("No data found for this city.")
            return

        row = row.iloc[0]
        print(f"🏙️ City: {selected_city}, State: {selected_state}\n")
        print(f"💧 Water Availability Index: {row['Water Availability Index']}")
        print(f"⚡ Energy Stability Index: {row['Energy Stability Index']}")
        print(f"👥 Population Density: {row['Population Density']}")
        print(f"🌡️ Climate Index: {row['Climate Index']}")
        print(f"🌍 Predicted HSI Category: {row['Predicted HSI']}")

city_dropdown.observe(show_city_info, 'value')

# ------------------------------------------
# 🚀 Initialize and display
# ------------------------------------------
update_cities()  # ✅ Run once to initialize first state + city data
display(VBox([state_dropdown, city_dropdown, output]))




---

### 🌍 **Future Work: Towards Intelligent Sustainability for Human Life**

The next evolution of HSI prediction goes beyond numbers: it's about **empowering decisions for a sustainable future**.

Future work will expand **feature diversity**, build **impact-driven models** leveraging **Machine Learning, Deep Learning, NLP, AI/LLM and GenAI**, and quantify the **real-world uncertainty** behind sustainability trends.

At the final stage, an **LLM-powered intelligence layer** will transform complex data into **human-readable insights and localized recommendations**, helping **states and cities** enhance livability, align human progress with nature, and take smarter steps toward a **truly sustainable civilization**. ✨



---

