# **Project Samarth: Intelligent Cross-Domain Q&A Agent**

# Project Setup and Dependencies
This section installs the necessary Python libraries and loads the Gemini API key from Colab Secrets rather than hardcoding it in the notebook, a best practice for handling secrets. The key is also exported as an environment variable for any code that reads it from there.

Project Introduction:

This Colab notebook demonstrates Project Samarth, an intelligent Q&A system built to synthesize insights across Indian Government datasets. We use the Retrieval-Augmented Generation (RAG) architecture to connect the Gemini model with two distinct data sources:

- Climate Data (IMD): Sourced from the uploaded CSV (Min_Max_Seasonal_IMD_2017.csv).
- Agricultural Data (Ministry of Agriculture): We use simulated data to demonstrate the cross-domain reasoning, as the live data.gov.in API sometimes fails in specific network environments.

Our goal is to answer complex, cross-domain questions about the relationship between climate and agricultural outcomes, ensuring Accuracy & Traceability by citing data sources for every claim.

In [66]:
# Install the Google GenAI SDK, pandas for data handling, tabulate (required by DataFrame.to_markdown), and streamlit for deployment
!pip install google-genai streamlit pandas tabulate



In [67]:
import os
import pandas as pd
from google import genai
from google.genai import types
from google.colab import userdata  # Access to Colab Secrets

# Read the API key from Colab Secrets instead of hardcoding it (best practice)
client = None
try:
    # userdata.get raises if the secret is missing or notebook access is not granted,
    # so the lookup lives inside the try block
    GEMINI_API_KEY = userdata.get('GOOGLE_API_KEY')
    if not GEMINI_API_KEY:
        raise ValueError("API Key not set in Colab Secrets.")

    # Also expose the key as an environment variable for code that reads it from there
    os.environ['GEMINI_API_KEY'] = GEMINI_API_KEY

    # Initialize the Gemini client with the retrieved key
    client = genai.Client(api_key=GEMINI_API_KEY)
    print("Gemini client initialized successfully.")
except Exception as e:
    print(f"ERROR: Failed to initialize Gemini Client. Check your setup and Colab Secrets. Details: {e}")
    client = None

Gemini client initialized successfully.
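
The cell above imports `google.colab`, so it only runs inside Colab. A minimal sketch of a more portable lookup is shown below, assuming the key may also be supplied through an environment variable. The helper name `load_api_key`, its parameters, and the lookup order are hypothetical and not part of the original notebook.

```python
import os
from typing import Optional


def load_api_key(env_var: str = "GEMINI_API_KEY",
                 secret_name: str = "GOOGLE_API_KEY") -> Optional[str]:
    """Return the API key from the environment, falling back to Colab Secrets.

    Hypothetical helper: the names and the lookup order are assumptions.
    """
    # 1. Prefer an environment variable, which works in any Python runtime.
    key = os.environ.get(env_var)
    if key:
        return key

    # 2. Fall back to Colab Secrets when the notebook runs inside Colab.
    try:
        from google.colab import userdata  # only importable inside Colab
        return userdata.get(secret_name) or None
    except Exception:
        # Not running in Colab, secret missing, or notebook access denied.
        return None
```

A caller would then treat `None` as "no key available" and skip client initialization, which is the same outcome the original cell reaches through its `except` branch.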


# Data Ingestion and Preparation (Phase 1)

This section loads and processes the IMD and Agricultural data, establishing the dataframes and insight summaries (our Knowledge Base).

IMD Climate Data Ingestion and Processing

The IMD Seasonal and Annual Min/Max Temp Series (1901-2017) is loaded. We calculate the ANNUAL - MEAN temperature and extract key decadal trend insights.

In [68]:
# Load the IMD temperature data
try:
    imd_df = pd.read_csv("/content/Min_Max_Seasonal_IMD_2017.csv")

    # Calculate the Annual Mean Temperature
    imd_df['ANNUAL - MEAN'] = (imd_df['ANNUAL - MIN'] + imd_df['ANNUAL - MAX']) / 2

    # Calculate Decadal Averages for trend analysis
    imd_df['Decade'] = (imd_df['YEAR'] // 10) * 10
    decadal_avg = imd_df.groupby('Decade')['ANNUAL - MEAN'].mean().reset_index()

    # --- Extract Key Climate Insights (for RAG Knowledge Base) ---
    first_decade_mean = decadal_avg.iloc[0]['ANNUAL - MEAN']
    last_decade_mean = decadal_avg.iloc[-1]['ANNUAL - MEAN']
    warming_increase = last_decade_mean - first_decade_mean
    baseline_mean = imd_df['ANNUAL - MEAN'].mean()

    imd_insights = {
        "warming_trend": f"The Annual Mean Temperature has increased by {warming_increase:.2f}°C from the early 1900s to the 2010s.",
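        # NOTE: assumes the final (partial, 2010-2017) decade has the highest average; confirm against decadal_avg
        "hottest_decade": f"The decade from 2010-2017 recorded the highest decadal average mean temperature at {last_decade_mean:.2f}°C.",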
        "baseline_temp": f"The 117-year (1901-2017) historical baseline average mean temperature for India is {baseline_mean:.2f}°C.",
        "source": "India Meteorological Department (IMD) - Seasonal and Annual Min/Max Temp Series (data.gov.in)."
    }

    print("IMD Data Processed. First 5 rows:")
    print(imd_df.head())

except FileNotFoundError:
    print("ERROR: 'Min_Max_Seasonal_IMD_2017.csv' not found.")
    imd_df = pd.DataFrame()
    imd_insights = {"source": "IMD Data Missing."}

IMD Data Processed. First 5 rows:
   YEAR  ANNUAL - MIN  ANNUAL - MAX  JAN-FEB - MIN  JAN-FEB - MAX  \
0  1901         19.51         28.96          14.16          23.27   
1  1902         19.44         29.22          13.64          25.75   
2  1903         19.25         28.47          13.87          24.24   
3  1904         19.22         28.49          13.72          23.62   
4  1905         19.03         28.30          12.81          22.25   

   MAR-MAY - MIN  MAR-MAY - MAX  JUN-SEP - MIN  JUN-SEP - MAX  OCT-DEC - MIN  \
0          20.67          31.46          23.38          31.27          16.59   
1          21.12          31.76          23.28          31.09          16.50   
2          20.25          30.71          23.40          30.92          16.29   
3          20.72          30.95          22.96          30.66          16.44   
4          19.97          30.00          23.43          31.33          16.39   

   OCT-DEC - MAX  ANNUAL - MEAN  Decade  
0          27.25         24.235    1900  
1          26.49         24.330    1900  
2          26.26         23.860    1900  
3          26.40         23.855    1900  
4          26.57         23.665    1900  

In [80]:
imd_df.head()

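|   | YEAR | ANNUAL - MIN | ANNUAL - MAX | JAN-FEB - MIN | JAN-FEB - MAX | MAR-MAY - MIN | MAR-MAY - MAX | JUN-SEP - MIN | JUN-SEP - MAX | OCT-DEC - MIN | OCT-DEC - MAX | ANNUAL - MEAN | Decade |
|---|------|--------------|--------------|---------------|---------------|---------------|---------------|---------------|---------------|---------------|---------------|---------------|--------|
| 0 | 1901 | 19.51 | 28.96 | 14.16 | 23.27 | 20.67 | 31.46 | 23.38 | 31.27 | 16.59 | 27.25 | 24.235 | 1900 |
| 1 | 1902 | 19.44 | 29.22 | 13.64 | 25.75 | 21.12 | 31.76 | 23.28 | 31.09 | 16.50 | 26.49 | 24.330 | 1900 |
| 2 | 1903 | 19.25 | 28.47 | 13.87 | 24.24 | 20.25 | 30.71 | 23.40 | 30.92 | 16.29 | 26.26 | 23.860 | 1900 |
| 3 | 1904 | 19.22 | 28.49 | 13.72 | 23.62 | 20.72 | 30.95 | 22.96 | 30.66 | 16.44 | 26.40 | 23.855 | 1900 |
| 4 | 1905 | 19.03 | 28.30 | 12.81 | 22.25 | 19.97 | 30.00 | 23.43 | 31.33 | 16.39 | 26.57 | 23.665 | 1900 |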


In [81]:
imd_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 117 entries, 0 to 116
Data columns (total 13 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   YEAR           117 non-null    int64  
 1   ANNUAL - MIN   117 non-null    float64
 2   ANNUAL - MAX   117 non-null    float64
 3   JAN-FEB - MIN  117 non-null    float64
 4   JAN-FEB - MAX  117 non-null    float64
 5   MAR-MAY - MIN  117 non-null    float64
 6   MAR-MAY - MAX  117 non-null    float64
 7   JUN-SEP - MIN  117 non-null    float64
 8   JUN-SEP - MAX  117 non-null    float64
 9   OCT-DEC - MIN  117 non-null    float64
 10  OCT-DEC - MAX  117 non-null    float64
 11  ANNUAL - MEAN  117 non-null    float64
 12  Decade         117 non-null    int64  
dtypes: float64(11), int64(2)
memory usage: 12.0 KB
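
The decadal grouping above uses `(YEAR // 10) * 10`, which means the first and last buckets of a 1901-2017 series are partial. The sketch below counts the years per bucket using only the year range stated in the notebook. It does not read the CSV, and the variable names are illustrative.

```python
from collections import Counter

# Years covered by the IMD series, per the notebook: 1901-2017 inclusive.
years = range(1901, 2018)

# Same bucketing rule as the notebook: integer-divide by 10, then scale back up.
years_per_decade = Counter((year // 10) * 10 for year in years)

for decade in sorted(years_per_decade):
    print(decade, years_per_decade[decade])
```

The 1900 bucket holds nine years (1901-1909) and the 2010 bucket holds eight (2010-2017), so the "first decade" and "last decade" averages compared in `imd_insights` each rest on fewer than ten years of data.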


Agricultural Data Simulation

Since live API access can be unreliable, we are using a simulated dataset to represent the "District-wise, Season-wise Crop Production Statistics." This simulation is crucial for testing the cross-domain reasoning logic required by the challenge.

In [69]:
# --- SIMULATE CROP PRODUCTION DATA (MOCK DATA) ---
crop_data = {
    'Year': [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2017, 2017],
    'State': ['Andhra Pradesh', 'Telangana', 'Andhra Pradesh', 'Telangana', 'Andhra Pradesh', 'Andhra Pradesh', 'Telangana', 'Andhra Pradesh', 'Telangana', 'Andhra Pradesh'],
    'District': ['Guntur', 'Kurnool', 'Guntur', 'Kurnool', 'Guntur', 'Guntur', 'Kurnool', 'Guntur', 'Kurnool', 'Nellore'],
    'Season': ['Rabi', 'Kharif', 'Rabi', 'Kharif', 'Rabi', 'Kharif', 'Kharif', 'Rabi', 'Kharif', 'Kharif'],
    'Crop': ['Rice', 'Cotton', 'Rice', 'Cotton', 'Maize', 'Rice', 'Rice', 'Maize', 'Cotton', 'Rice'],
    'Crop_Type': ['Water-Intensive', 'Drought-Tolerant', 'Water-Intensive', 'Drought-Tolerant', 'Drought-Tolerant', 'Water-Intensive', 'Water-Intensive', 'Drought-Tolerant', 'Drought-Tolerant', 'Water-Intensive'],
    'Production_Tonnes': [150000, 50000, 145000, 48000, 30000, 100000, 95000, 40000, 55000, 120000]
}
crop_df = pd.DataFrame(crop_data)
crop_insights = {"source": "Ministry of Agriculture & Farmers Welfare - Crop Production Statistics (data.gov.in - SIMULATED)."}

print("Simulated Crop Data Head:")
print(crop_df.head())

Simulated Crop Data Head:
   Year           State District  Season    Crop         Crop_Type  \
0  2010  Andhra Pradesh   Guntur    Rabi    Rice   Water-Intensive   
1  2011       Telangana  Kurnool  Kharif  Cotton  Drought-Tolerant   
2  2012  Andhra Pradesh   Guntur    Rabi    Rice   Water-Intensive   
3  2013       Telangana  Kurnool  Kharif  Cotton  Drought-Tolerant   
4  2014  Andhra Pradesh   Guntur    Rabi   Maize  Drought-Tolerant   

   Production_Tonnes  
0             150000  
1              50000  
2             145000  
3              48000  
4              30000  


# System Architecture: Retrieval-Augmented Generation (RAG)

This section defines the core logic: specialized retrieval functions for each dataset and the central Gemini-based agent for synthesizing the final answer.

RAG Core Functions

The RAG architecture is implemented via two data retrieval functions (query_imd_data and query_crop_data). These functions act as keyword-driven query tools: they match terms in the user's natural-language question and return either the relevant tabular slice or the precomputed summary insights. The central function (ask_samarth_agent) then sends the question plus the retrieved context to Gemini for reasoning and answer generation.

In [70]:
# --- RAG Tool 1: IMD Data Retrieval ---
def query_imd_data(query: str, df: pd.DataFrame, insights: dict) -> str:
    """Retrieves climate data based on query focus (e.g., trend or recent)."""
    if "warming" in query.lower() or "baseline" in query.lower() or "decade" in query.lower():
        summary_context = "\n".join([f"- {k}: {v}" for k, v in insights.items()])
        return f"CLIMATE SUMMARY INSIGHTS:\n{summary_context}"

    # Default: provide last 5 years data
    if not df.empty:
        data_markdown = df[['YEAR', 'ANNUAL - MEAN', 'ANNUAL - MIN', 'ANNUAL - MAX']].tail(5).to_markdown(index=False)
        return f"RECENT CLIMATE DATA SNIPPET:\n{data_markdown}"
    return "IMD data not available."

# --- RAG Tool 2: Crop Data Retrieval ---
def query_crop_data(query: str, df: pd.DataFrame) -> str:
    """Retrieves specific crop/production data based on the query."""

    if any(k in query.lower() for k in ['crop_type', 'drought', 'water-intensive', 'andhra pradesh']):
        # Policy question focus: filter for specific state/crop types
        state = 'Andhra Pradesh'
        df_filtered = df[df['State'] == state]

        # Aggregate production by crop type for direct comparison
        aggregated = df_filtered.groupby('Crop_Type')['Production_Tonnes'].sum().sort_values(ascending=False).reset_index()
        aggregated['Production_Tonnes'] = aggregated['Production_Tonnes'].apply(lambda x: f"{x:,.0f} Tonnes")

        return f"CROP DATA SNIPPET (Production by Crop Type in {state}):\n{aggregated.to_markdown(index=False)}"

    # Default: List overall top production areas
    top_producers = df.groupby(['State', 'Crop'])['Production_Tonnes'].sum().nlargest(3).reset_index()
    top_producers['Production_Tonnes'] = top_producers['Production_Tonnes'].apply(lambda x: f"{x:,.0f} Tonnes")
    return f"TOP PRODUCTION OVERVIEW:\n{top_producers.to_markdown(index=False)}"
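
Because both retrieval functions route on keywords, it helps to see which branch a given question will take before calling the model. The sketch below mirrors the keyword lists from the two functions above in a standalone helper. The name `route_query` and the branch labels are hypothetical and exist only for illustration.

```python
# Keyword lists copied from query_imd_data and query_crop_data.
IMD_SUMMARY_KEYWORDS = ("warming", "baseline", "decade")
CROP_POLICY_KEYWORDS = ("crop_type", "drought", "water-intensive", "andhra pradesh")


def route_query(query: str) -> dict:
    """Report which branch each retrieval function would take for a query."""
    q = query.lower()
    return {
        # Summary insights if a trend keyword is present, else the last five rows.
        "imd": "summary_insights" if any(k in q for k in IMD_SUMMARY_KEYWORDS) else "recent_rows",
        # Andhra Pradesh crop-type totals if a policy keyword is present, else top producers.
        "crop": "state_crop_type_totals" if any(k in q for k in CROP_POLICY_KEYWORDS) else "top_producers",
    }


trend_question = "What is the overall warming trend in India?"
policy_question = (
    "A policy advisor is proposing a scheme to promote drought-tolerant crops "
    "over water-intensive crops in Andhra Pradesh."
)

print(route_query(trend_question))
print(route_query(policy_question))
```

The policy question contains none of the climate keywords, so it receives only the five most recent years of IMD data rather than the long-run summary insights. Adding a term such as "climate" to the IMD keyword list would be one way to change that, at the cost of a broader match.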

In [71]:
def ask_samarth_agent(user_question: str) -> str:
    """
    The central function that orchestrates the RAG process (Retrieval -> Generation).
    """
    # `client` is only read here, so no `global` declaration is needed
    if client is None:
        return "ERROR: Gemini client is not initialized."

    # Step 1: Retrieval (R) - Gather Context
    climate_context = query_imd_data(user_question, imd_df, imd_insights)
    crop_context = query_crop_data(user_question, crop_df)

    # Step 2: Generation (G) - Craft the synthesis prompt
    system_prompt = (
        "You are 'Project Samarth,' an intelligent Q&A agent for the Indian Government. "
        "Your mission is to synthesize data from two separate, disparate sources: IMD (Climate) and Ministry of Agriculture (Crop Production). "
        "Your response must be professional, highly accurate, and data-backed. "
        "For every claim or data point, **cite the specific source dataset** (IMD or CROP) "
        "and mention the specific numerical values from the context provided below."
    )

    prompt = f"""
    SYSTEM PROMPT: {system_prompt}

    --- DATA CONTEXT 1: CLIMATE DATA (IMD) ---
    Source: {imd_insights['source']}
    {climate_context}
    ------------------------------------------

    --- DATA CONTEXT 2: AGRICULTURAL DATA (CROP PRODUCTION) ---
    Source: {crop_insights['source']}
    {crop_context}
    ----------------------------------------------------------

    USER QUESTION: {user_question}
    """

    # Step 3: Call the Gemini API
    try:
        response = client.models.generate_content(
            model='gemini-2.5-flash',
            contents=prompt
        )
        return response.text
    except Exception as e:
        return f"API ERROR: Could not get response from Gemini. Details: {e}"
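
The retrieval-then-generation flow can be exercised without network access by swapping in a stand-in client. The sketch below is an assumption-laden illustration, not the notebook's code: `StubModels` and `build_prompt` are hypothetical names, the stub only imitates the `generate_content(model=..., contents=...)` call shape and the `.text` attribute used above, and the context strings are placeholders.

```python
from types import SimpleNamespace


class StubModels:
    """Stand-in for `client.models` that records calls instead of using the network."""

    def __init__(self):
        self.calls = []

    def generate_content(self, model: str, contents: str):
        # Record the request and return an object exposing `.text`, like the real response.
        self.calls.append({"model": model, "contents": contents})
        return SimpleNamespace(text=f"[stub reply to a {len(contents)}-character prompt]")


def build_prompt(system_prompt: str, climate_context: str,
                 crop_context: str, user_question: str) -> str:
    """Hypothetical helper mirroring the prompt layout used by ask_samarth_agent."""
    return (
        f"SYSTEM PROMPT: {system_prompt}\n\n"
        f"--- DATA CONTEXT 1: CLIMATE DATA (IMD) ---\n{climate_context}\n\n"
        f"--- DATA CONTEXT 2: AGRICULTURAL DATA (CROP PRODUCTION) ---\n{crop_context}\n\n"
        f"USER QUESTION: {user_question}"
    )


stub_client = SimpleNamespace(models=StubModels())
prompt = build_prompt(
    system_prompt="Cite the source dataset for every claim.",
    climate_context="(placeholder climate context)",
    crop_context="(placeholder crop context)",
    user_question="What is the overall warming trend in India?",
)
response = stub_client.models.generate_content(model="gemini-2.5-flash", contents=prompt)
print(response.text)
```

Inspecting `stub_client.models.calls` after a run shows exactly what context would have been sent to the model, which is useful when checking that both data sources reached the prompt.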

# Agent Testing (Phase 2)

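We test the system with three questions: a climate-only question, a crop-only question, and a complex, cross-domain question from the challenge.

Run Agent Tests: Single-Source Checks and Policy Advisor Question
The first two tests confirm that each retrieval function returns usable context on its own. The third requires combining the national warming trend (IMD data) with state-level crop production data to form policy recommendations.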

In [76]:
test_question_1 = "What is the overall warming trend in India? Compare the average mean temperature of the most recent decade to the historical 117-year baseline, citing the IMD data source."

print(f"\n--- User Question --- \n{test_question_1}")

response_1 = ask_samarth_agent(test_question_1)

print(f"\n--- Project Samarth Answer --- \n{response_1}")


--- User Question --- 
What is the overall warming trend in India? Compare the average mean temperature of the most recent decade to the historical 117-year baseline, citing the IMD data source.

--- Project Samarth Answer --- 
As per the data from the India Meteorological Department (IMD) - Seasonal and Annual Min/Max Temp Series (data.gov.in), the overall warming trend in India indicates a significant increase in temperature.

Specifically:
*   The Annual Mean Temperature has increased by **1.26°C** from the early 1900s to the 2010s (Source: IMD).
*   Comparing the most recent decade to the historical baseline, the decade from 2010-2017 recorded the highest decadal average mean temperature at **25.22°C**. This is higher than the 117-year (1901-2017) historical baseline average mean temperature for India, which stands at **24.28°C** (Source: IMD).
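
The answer reports the recent-decade mean and the baseline but leaves the difference implicit. The arithmetic can be sketched as follows, using only the rounded figures quoted in the answer. Because the inputs are already rounded to two decimals, the results may differ from the unrounded values by about 0.01°C.

```python
# Rounded figures copied from the answer above.
last_decade_mean = 25.22   # °C, 2010-2017 decadal average
baseline_mean = 24.28      # °C, 1901-2017 average
warming_increase = 1.26    # °C, first decade to last decade

# How far the most recent decade sits above the long-run baseline.
above_baseline = round(last_decade_mean - baseline_mean, 2)

# First-decade mean implied by the reported increase.
implied_first_decade_mean = round(last_decade_mean - warming_increase, 2)

print(f"Recent decade vs. baseline: +{above_baseline:.2f} °C")
print(f"Implied first-decade mean: {implied_first_decade_mean:.2f} °C")
```

On these figures the 2010-2017 average is roughly 0.94°C above the 117-year baseline, and the first-decade mean implied by the reported increase is roughly 23.96°C.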


In [73]:
test_question_2 = "List the total production (in Tonnes) for Water-Intensive crops versus Drought-Tolerant crops in Andhra Pradesh, based on the Ministry of Agriculture's data."

print(f"\n--- User Question --- \n{test_question_2}")

response_2 = ask_samarth_agent(test_question_2)

print(f"\n--- Project Samarth Answer --- \n{response_2}")


--- User Question --- 
List the total production (in Tonnes) for Water-Intensive crops versus Drought-Tolerant crops in Andhra Pradesh, based on the Ministry of Agriculture's data.

--- Project Samarth Answer --- 
As per the Ministry of Agriculture & Farmers Welfare data (CROP), the crop production in Andhra Pradesh is as follows:

*   **Water-Intensive Crops:** 515,000 Tonnes (CROP)
*   **Drought-Tolerant Crops:** 70,000 Tonnes (CROP)
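
The totals in the answer can be cross-checked directly against the simulated rows. The sketch below repeats the Andhra Pradesh filter and crop-type sum in plain Python, using the state, crop type, and production values from the simulated dataset defined earlier. The variable names are illustrative.

```python
from collections import defaultdict

# (State, Crop_Type, Production_Tonnes) for the ten simulated rows, in order.
rows = [
    ("Andhra Pradesh", "Water-Intensive", 150000),
    ("Telangana", "Drought-Tolerant", 50000),
    ("Andhra Pradesh", "Water-Intensive", 145000),
    ("Telangana", "Drought-Tolerant", 48000),
    ("Andhra Pradesh", "Drought-Tolerant", 30000),
    ("Andhra Pradesh", "Water-Intensive", 100000),
    ("Telangana", "Water-Intensive", 95000),
    ("Andhra Pradesh", "Drought-Tolerant", 40000),
    ("Telangana", "Drought-Tolerant", 55000),
    ("Andhra Pradesh", "Water-Intensive", 120000),
]

# Sum production per crop type for Andhra Pradesh only.
totals = defaultdict(int)
for state, crop_type, tonnes in rows:
    if state == "Andhra Pradesh":
        totals[crop_type] += tonnes

for crop_type, tonnes in sorted(totals.items(), key=lambda item: -item[1]):
    print(f"{crop_type}: {tonnes:,} Tonnes")
```

The sums match the figures the agent cited, which indicates the model reproduced the retrieved context rather than inventing values.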


In [75]:
test_question_3 = "A policy advisor is proposing a scheme to promote drought-tolerant crops over water-intensive crops in Andhra Pradesh. Based on historical climate and production data, what are the three most compelling data-backed arguments to support this policy?"

print(f"\n--- User Question --- \n{test_question_3}")

response_3 = ask_samarth_agent(test_question_3)

print(f"\n--- Project Samarth Answer --- \n{response_3}")


--- User Question --- 
A policy advisor is proposing a scheme to promote drought-tolerant crops over water-intensive crops in Andhra Pradesh. Based on historical climate and production data, what are the three most compelling data-backed arguments to support this policy?

--- Project Samarth Answer --- 
Project Samarth is pleased to provide a data-backed analysis supporting the proposal to promote drought-tolerant crops over water-intensive crops in Andhra Pradesh. Based on the provided IMD Climate Data and Ministry of Agriculture Crop Production Data, here are the three most compelling arguments:

1.  **Rising Temperatures Indicate Increased Climate Stress on Water-Intensive Crops:**
    IMD data reveals a significant warming trend over recent years, which directly increases the water requirements for crops and the risk of water scarcity. Specifically, the **Annual Mean Temperature** in India rose from **24.93°C in 2015** to **26.455°C in 2016**, and remained high at **26.295°C in 20