# TheGoodPolicyAI

TheGoodPolicyAI is a Retrieval-Augmented Generation (RAG) application that combines Google's Gemini AI with economic data analysis to generate evidence-based policy recommendations for a country. It is aimed at policy stakeholders and government officials, helping them analyze historical statistics and make more precise decisions for the welfare of citizens.

This application integrates World Bank indicators (GDP, tax rates, business climate) with strategic documents from sources like the Atlantic Council through a hybrid vector/keyword search system. It follows UN Sustainable Development Goals, analyzing real-time economic trends and grounding responses in verified data.

TheGoodPolicyAI applies GenAI to public policy analysis and combines several elements:

- hybrid architecture: vector/keyword retrieval finds the relevant documents for context, and Google Search grounding supplies real-time context.
- policy optimization: the SDGs are used as constraint filters, so LLM outputs are shaped by legal/economic thresholds rather than free-form generation.
- multi-document synthesis: RAG merges Atlantic Council strategies with World Bank data.
- numbers alongside text: economic indicators across time are analyzed to generate more comprehensive responses.
- system instructions that ask the model for a non-partisan, evidence-based approach.

Policymakers currently lack tools that combine (i) historical economic trends, (ii) legal/regulatory constraints, and (iii) geopolitical strategies. Generative AI can recognize complex patterns across large datasets and produce appropriate, creative outputs quickly, which can help stabilize and improve the situation of citizens under ongoing geopolitical pressures, at least to some extent. These models sometimes notice patterns in data and history that humans miss, and can suggest relevant solutions we might not have thought of.

# Step1: Install necessary libraries

In [None]:
!pip uninstall -qqy jupyterlab  # Remove unused conflicting packages
!pip install -U -q "google-genai" "chromadb" "langchain" "langchain-community" "langchain-google-genai" "pandas" "wbdata" "google"
!pip install "google-api-core>=2.11.0"  # quoted so the shell does not treat > as an output redirect

# Step2: Import google.genai and check version

In [None]:
from google import genai

genai.__version__

# Step3: Create a genai Client using the Google API Key

In [None]:
from kaggle_secrets import UserSecretsClient

GOOGLE_API_KEY = UserSecretsClient().get_secret("GOOGLE_API_KEY")

client = genai.Client(api_key=GOOGLE_API_KEY)
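
The cell above reads the key from Kaggle secrets. Outside Kaggle, one possible approach is to read it from an environment variable instead. The sketch below assumes the key is stored in a variable named GOOGLE_API_KEY; the helper name load_api_key is hypothetical and not part of any library.

```python
import os


def load_api_key(env_var: str = "GOOGLE_API_KEY") -> str:
    """Return the API key stored in an environment variable (hypothetical helper)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before creating the client")
    return key


# Usage outside Kaggle (assumes google-genai is installed and the variable is set):
# client = genai.Client(api_key=load_api_key())
```

Failing early with a clear message avoids a less obvious authentication error at the first API call.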

# Step4: Extract World Bank Data 

- Import wbdata and other necessary libraries.
- List indicators needed to be fetched.
- Fetch using wbdata.get_dataframe to get a pandas dataframe.
- Impute the missing values using sklearn.impute.SimpleImputer.
- Return the dataframe.

In [None]:
import wbdata
from datetime import datetime
import pandas as pd
import numpy as np
from sklearn.impute import SimpleImputer

def init_data():
    
    # fetches world bank data

    try:
    
        indicators = {
            "NY.GDP.MKTP.CD": "GDP", 
            "IC.CRED.ACC.LGL.RGHT.010.XD.DB0514.DFRN": "STRENGTH_OF_LEGAL_RIGHTS", 
            "IC.ELC.ACS.COST.DFRN": "ELECTRICITY_COST", 
            "TRD.ACRS.BRDR.EXPT.COST.BRDR.COMP.CD.DB1619.DFRN": "TRADING_ACROSS_BORDERS", 
            "IC.BUS.EASE.DFRN.DB1014": "EASE_OF_DOING_BUSINESS",
            "PAY.TAX.TOT.TAX.RT.ZS": "TAX"
        }
        
        data = wbdata.get_dataframe(indicators, country=['all'], date=(datetime(2010, 1, 1), datetime(2025, 1, 1))).reset_index()

        
        imp = SimpleImputer(missing_values=np.nan, strategy='mean')
        data = pd.concat([pd.DataFrame(data['country']), pd.DataFrame(imp.fit_transform(data[list(indicators.values())]), columns = list(indicators.values()))], axis=1)

        return data

    except Exception as e:
        
        return f"Error Loading World Bank Data: {e}"

wb_data = init_data()
if isinstance(wb_data, pd.DataFrame):  # init_data returns an error string on failure
    print(wb_data.head())
else:
    print(wb_data)
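
To make the imputation step concrete, here is a minimal plain-Python sketch of what a mean strategy does: each missing value is replaced by the mean of its column, computed over all rows (here, all countries). The function name impute_column_means and the toy numbers are assumptions for illustration; they are not World Bank figures and this is not scikit-learn's implementation.

```python
from statistics import mean


def impute_column_means(rows, columns):
    """Fill None values in each listed column with that column's mean over all rows."""
    filled = [dict(row) for row in rows]  # copy so the input rows stay unchanged
    for col in columns:
        observed = [row[col] for row in rows if row[col] is not None]
        if not observed:
            continue  # nothing to average; this sketch leaves the column as it is
        col_mean = mean(observed)
        for row in filled:
            if row[col] is None:
                row[col] = col_mean
    return filled


# Toy rows with made-up numbers
rows = [
    {"country": "A", "GDP": 100.0, "TAX": 30.0},
    {"country": "B", "GDP": None, "TAX": 40.0},
    {"country": "C", "GDP": 300.0, "TAX": None},
]
filled = impute_column_means(rows, ["GDP", "TAX"])
```

Because the mean is taken across every country, a small economy with a missing GDP value receives a world-average figure. Imputing within each country's own history is one alternative worth considering.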

# Step5: Create an Embedding Function

The embedding function converts the documents extracted from the web into vectors so that the retrieval step can compare them by meaning.

- Import the necessary libraries (chromadb, google).
- Define an embedding function class that inherits from chromadb's EmbeddingFunction class (so that it can be passed to a chromadb collection).
- Define the embedding mode: i) document ii) query.
- Decorate the call method with a Python retry decorator, which retries when the error is retriable (the is_retriable predicate decides this from the error code returned by the API).
- Create a response object that embeds the content. The model (here text-embedding-004) can be changed to compare embedding outputs.
- Return the embedding values.
In [None]:
from chromadb import Documents, EmbeddingFunction, Embeddings

from google.genai import types
from google.api_core import retry

is_retriable = lambda e: (isinstance(e, genai.errors.APIError) and e.code in {429, 503})

if not hasattr(genai.models.Models.generate_content, '__wrapped__'):
  genai.models.Models.generate_content = retry.Retry(
      predicate = is_retriable)(genai.models.Models.generate_content)

class GeminiEmbeddingFunction(EmbeddingFunction):
    
    # specifies whether to generate embeddings for documents, or queries
    document_mode = True

    @retry.Retry(predicate=is_retriable)
    def __call__(self, input: Documents) -> Embeddings:
        if self.document_mode:
            embedding_task = "retrieval_document"
        else:
            embedding_task = "retrieval_query"

        response = client.models.embed_content(
            model="models/text-embedding-004",
            contents=input,
            config=types.EmbedContentConfig(
                task_type=embedding_task,
            ),
        )
        return [e.values for e in response.embeddings]
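
The retry behaviour can be illustrated without any Google libraries. The sketch below is a hand-written decorator with exponential backoff, not the google.api_core implementation; FakeAPIError, retry_if and flaky_embed are hypothetical names, and the delays are kept tiny so the example runs quickly.

```python
import time
from functools import wraps


class FakeAPIError(Exception):
    """Hypothetical stand-in for an API error that carries an HTTP status code."""

    def __init__(self, code):
        super().__init__(f"API error {code}")
        self.code = code


def is_retriable(exc):
    # Same rule as the predicate above: retry only on 429 (rate limit) or 503 (unavailable)
    return isinstance(exc, FakeAPIError) and exc.code in {429, 503}


def retry_if(predicate, attempts=3, initial_delay=0.01):
    """Retry the wrapped function with exponential backoff while predicate(exc) is true."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            delay = initial_delay
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception as exc:
                    # Give up on the last attempt or on errors that are not retriable
                    if attempt == attempts or not predicate(exc):
                        raise
                    time.sleep(delay)
                    delay *= 2
        return wrapper
    return decorator


calls = {"count": 0}


@retry_if(is_retriable)
def flaky_embed(text):
    # Fails twice with a rate-limit error, then succeeds
    calls["count"] += 1
    if calls["count"] < 3:
        raise FakeAPIError(429)
    return [0.0, 1.0]  # placeholder vector


result = flaky_embed("hello")
```

Errors with other codes (for example 400) are re-raised immediately, since repeating an invalid request would not help.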

# Step6: Load Web Documents and Create Word Embeddings

- Import necessary libraries (langchain document loader, chromadb)
- Create a loader object and then load it to get a langchain Document.
- Extract text out of the document.
- Create a chroma client and give name of the database and select the embedding function as coded above.
- Add the documents downloaded from the web to the database.
- Return the chroma database object.

In [None]:
from langchain_community.document_loaders import WebBaseLoader
import chromadb

DB_NAME = "thegoodpolicyaidb"

embed_fn = GeminiEmbeddingFunction()
embed_fn.document_mode = True

# initializes chroma db with data

def init_economic_db():

    global embed_fn, DB_NAME
    
    # fetches document

    try:
    
        loader = WebBaseLoader(web_paths=["https://www.atlanticcouncil.org/content-series/atlantic-council-strategy-paper-series/three-worlds-in-2035/"])
        docs = loader.load()

        texts = [doc.page_content for doc in docs]
        
    except Exception as e:
        
        return f"Error Fetching Documents: {e}"
    
    # creates vector store
    
    try:
    
        chroma_client = chromadb.PersistentClient()
        db = chroma_client.get_or_create_collection(name=DB_NAME, embedding_function=embed_fn)
    
        db.add(documents=texts, ids=[str(i) for i in range(len(texts))])
    
        return db
    
    except Exception as e:
        
        return f"Error Embedding Content: {e}"

vector_db = init_economic_db()
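
The cell above stores each downloaded page as a single document. A long page may be more than an embedding model handles well in one piece, so a possible refinement is to split the text into overlapping chunks first. This is only a sketch: chunk_text, the chunk sizes, and the ID scheme are assumptions, not part of the original pipeline.

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into chunks of at most chunk_size characters, each overlapping the previous one."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


page = "abcdefghij" * 5  # 50 characters standing in for a downloaded page
chunks = chunk_text(page, chunk_size=20, overlap=5)
ids = [f"doc0-chunk{i}" for i in range(len(chunks))]

# The chunks could then be stored instead of the whole page:
# db.add(documents=chunks, ids=ids)
```

The overlap keeps sentences that straddle a boundary retrievable from at least one chunk.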

# Step7: Define instructions for the agent.

These instructions ask the model to stay unbiased and to give correct, relevant, precise, and fair opinions.

In [None]:
instructions = """
You are a policy expert analyzing global development. Use these guidelines:
1. Prioritize UN Sustainable Development Goals
2. Consider latest economic indicators (GDP, HDI)
3. Incorporate real-time news via Google Search grounding
4. Suggest data-driven policy recommendations
5. Maintain non-partisan, evidence-based approach
"""

# Step8: Set the document_mode = False.

Now we are going to retrieve data from the chroma db for response generation, so the embedding function must embed queries rather than documents.

In [None]:
embed_fn.document_mode = False
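
Because the mode is a mutable flag, forgetting to switch it back would cause any documents added later to be embedded as queries. One possible safeguard is a context manager that switches to query mode only for the duration of a block. ModeFlag below is a hypothetical stand-in for the embedding function, and query_mode is an illustrative helper name.

```python
from contextlib import contextmanager


class ModeFlag:
    """Hypothetical stand-in for the embedding function; only the mode flag is modelled."""

    document_mode = True

    def task_type(self):
        return "retrieval_document" if self.document_mode else "retrieval_query"


@contextmanager
def query_mode(embed_fn):
    """Temporarily switch embed_fn to query mode, restoring the previous mode afterwards."""
    previous = embed_fn.document_mode
    embed_fn.document_mode = False
    try:
        yield embed_fn
    finally:
        embed_fn.document_mode = previous


fn = ModeFlag()
with query_mode(fn):
    inside = fn.task_type()   # query mode while searching
outside = fn.task_type()      # document mode restored
```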

# Step9: Define the search function

This function searches the database for documents relevant to the query, both by vector similarity (embeddings) and by keyword, and returns the combined documents as a list of lists.

In [None]:
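def hybrid_search(query, vector_db):
    
    # vector search: query() returns one list of documents per query text
    vector_docs = vector_db.query(query_texts=[query], n_results=3)["documents"][0]
    
    # keyword search: get() returns a dict, so read its "documents" entry;
    # a document matches only if it contains the whole query string
    stored_docs = vector_db.get()["documents"] or []
    keyword_docs = [doc for doc in stored_docs if query.lower() in doc.lower()]
    
    # combine and deduplicate (order preserved), keeping the list-of-lists shape
    all_docs = [list(dict.fromkeys(vector_docs + keyword_docs))]
    return all_docs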
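
Matching the whole query as a substring rarely finds anything for natural-language questions. A term-based keyword match is one possible alternative, sketched here on toy data without chromadb. The helper names, the minimum term length, and the sample documents are all assumptions for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and return its alphanumeric words as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_search(query, documents, min_term_length=3):
    """Return documents sharing at least one query word of min_term_length or more characters."""
    terms = {t for t in tokenize(query) if len(t) >= min_term_length}
    return [doc for doc in documents if terms & tokenize(doc)]


def merge_results(vector_hits, keyword_hits):
    """Concatenate both result lists, dropping duplicates while keeping first-seen order."""
    return list(dict.fromkeys(vector_hits + keyword_hits))


documents = [
    "Trade policy shapes export growth.",
    "Electricity access remains uneven.",
    "Tax reform can widen the revenue base.",
]
vector_hits = [documents[2]]  # pretend this came from the vector store
keyword_hits = keyword_search("How should tax policy change?", documents)
combined = merge_results(vector_hits, keyword_hits)
```

Here the keyword step adds the trade-policy document that the pretend vector search missed, and the tax document appears only once in the merged result.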

# Step10: Define a function to analyze the World Bank data collected.

This cell uses basic pandas operations to give the model more context for a query. More analysis steps can be added, depending on how extensive we want the model's context to be. Currently, it retrieves the latest values of the main indicators and a trend figure for GDP. Other options include an analysis over a longer time range or a correlation analysis of two different factors.

In [None]:
def analyze_country_data(country):
    
    country_df = wb_data[wb_data['country'] == country]
    
    if country_df.empty:
        return "No economic data available"
    
    analysis = []
    latest = country_df.iloc[-1]
    analysis.append(f"GDP: ${latest['GDP']/1e9:.1f}B")
    analysis.append(f"Tax Burden: {latest['TAX']}% of profit")
    analysis.append(f"Business Climate: {latest['EASE_OF_DOING_BUSINESS']}/100")
    analysis.append(f"Electricity Cost: {latest['ELECTRICITY_COST']}")
    analysis.append(f"Trading Across Borders: {latest['TRADING_ACROSS_BORDERS']}")
    analysis.append(f"Strength of Legal Rights: {latest['STRENGTH_OF_LEGAL_RIGHTS']}")
    
    # trend analysis
    if len(country_df) > 1:
        gdp_growth = (latest['GDP'] - country_df.iloc[-2]['GDP'])/country_df.iloc[-2]['GDP']*100
        analysis.append(f"Recent GDP Growth: {gdp_growth:.1f}%")
    
    return "\n".join(analysis)
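
The function above treats the last row as the latest observation, which depends on the row order of the fetched data, and the imputed frame built in Step4 no longer carries a date column. A date-aware variant would sort explicitly before computing growth as (latest - previous) / previous * 100. The sketch below uses plain dictionaries with a hypothetical year field and made-up values.

```python
def latest_and_growth(records, value_key="GDP"):
    """Return (latest_value, growth_percent) from yearly records.

    Records are sorted by year first; growth is None when it cannot be computed.
    """
    ordered = sorted(records, key=lambda r: r["year"])
    if not ordered:
        return None, None
    latest = ordered[-1][value_key]
    if len(ordered) < 2 or ordered[-2][value_key] == 0:
        return latest, None  # no previous year, or division by zero
    previous = ordered[-2][value_key]
    return latest, (latest - previous) / previous * 100


# Made-up values, deliberately out of order
records = [
    {"year": 2023, "GDP": 125.0},
    {"year": 2021, "GDP": 80.0},
    {"year": 2022, "GDP": 100.0},
]
latest, growth = latest_and_growth(records)
```

With these toy values the latest figure is the 2023 one and growth is measured against 2022, regardless of the order in which the rows arrive.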


# Step11: Define a query function.

The query is sent to this function, which uses the search and analysis functions to build context for the model. The response is generated by gemini-2.0-flash-001 (any other model can be used). The system_instruction is set as described above, and temperature (which controls the randomness of the output) and top_p are adjusted accordingly. Google Search grounding is enabled for real-time analysis and more accurate results, and the sources of these searches are returned along with the response.

In [None]:
# rag-enhanced generation function

def policy_query(query):
    
    # retrieves relevant context
    docs = hybrid_search(query, vector_db)
    
    countries = wb_data['country'].unique()
    country = next((c for c in countries if c.lower() in query.lower()), None)
    if not country:
        return "Please specify a country in your query (e.g. 'India')", []

    analysis = analyze_country_data(country)

    docs.append([analysis])

    context = "\n".join([d for doc in docs for d in doc])
    
    # generates with google-search grounding
    response = client.models.generate_content(
        model="gemini-2.0-flash-001",
        contents = f"Context: {context}, Query: {query}",
        config=types.GenerateContentConfig(
            system_instruction=instructions,
            temperature=0.3,
            top_p=0.95,
            tools=[types.Tool(google_search=types.GoogleSearch())]
        ),
    )

    sources = set()

    for candidate in response.candidates:
        grounding_metadata = getattr(candidate, "grounding_metadata", None)
        if grounding_metadata and hasattr(grounding_metadata, "grounding_chunks"):
            for chunk in grounding_metadata.grounding_chunks:
                web = getattr(chunk, "web", None)
                if web and getattr(web, "uri", None):
                    sources.add(f"{web.title}: {web.uri}")
        
    return response, list(sources)
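
The country lookup above uses a plain substring test, so a query about "Nigeria" can also match "Niger", and "Romania" contains "Oman". One possible refinement is to match whole words and prefer the longest name. The helper find_country and the short country list are assumptions for illustration.

```python
import re


def find_country(query, countries):
    """Return the longest country name that appears in query as a whole word or phrase, else None."""
    matches = [
        name for name in countries
        if re.search(rf"\b{re.escape(name)}\b", query, flags=re.IGNORECASE)
    ]
    return max(matches, key=len) if matches else None


countries = ["Niger", "Nigeria", "Oman", "Romania", "Guinea", "Equatorial Guinea"]

first = find_country("Tax reform options for Nigeria", countries)
second = find_country("What should Romania prioritise?", countries)
third = find_country("Energy access in Equatorial Guinea", countries)
missing = find_country("How to lower electricity costs?", countries)
```

World Bank country names sometimes carry qualifiers (for example "Egypt, Arab Rep."), which users are unlikely to type verbatim, so an alias table would probably still be needed in practice.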

# Step12: Describe the user interface and take queries.



In [None]:
print("Welcome to the TheGoodPolicyAI command line interface! (type 'exit' to quit)")

print("It is an interactive Google GenAI LLM + RAG agent that processes your question along with your country's current situation and suggests good policies for the country with respect to the problem you mention. These policies are meant to be reviewed with policy stakeholders for better implementation.")

print("Type your question as $ prompts.")

while True:
    
    query = input("\n$ ")
    
    if query.lower() == "exit":
        break
    
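    response, sources = policy_query(query)
 
    print("\nRecommendation:")
    # policy_query returns a plain string when no country is found in the query
    print(response if isinstance(response, str) else response.text)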

    print("\nSources:")
    for source in sources:
        print(source)
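
The loop above is tied to input() and print(), which makes it hard to exercise without a live model. As a sketch, the same loop can take its dependencies as parameters so that it runs against a stub. The names run_cli and fake_ask and the stub answer are hypothetical.

```python
def run_cli(ask, read_line, write_line):
    """Read queries until 'exit'; ask(query) must return (answer_text, sources)."""
    while True:
        query = read_line().strip()
        if query.lower() == "exit":
            break
        if not query:
            continue  # ignore empty input instead of calling the model
        answer, sources = ask(query)
        write_line("Recommendation:")
        write_line(answer)
        if sources:
            write_line("Sources:")
            for source in sources:
                write_line(source)


def fake_ask(query):
    # Stub standing in for the real query function
    return f"(stub answer for: {query})", ["Example source: https://example.org"]


inputs = iter(["", "Tax policy for India", "EXIT"])
output = []
run_cli(fake_ask, lambda: next(inputs), output.append)
```

In the notebook, the real query function could be wrapped to return response.text and the sources, with input and print passed in place of the stubs.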

# Limitations:

- Static World Bank update cycles (quarterly) vs real-time crises
- Black box policy prioritization (weights for SDGs not transparent)
- Limited multi-stakeholder perspective integration

# Future Projections

- inclusion of more economic, political, and social factors for analysis.
- more dedicated Google Search functionality, and implementation of Google News Alert API for real-time updates on a country.
- use of more extensive machine learning techniques (regression, exploratory data analysis, etc.), along with helping the model understand how past actions shaped the resulting regression curves.
- since the tool would be operated by high-priority executives, a comprehensive security system should be put in place, along with history recording and caching of similar context.
- digital twin for policy outcome modelling.