# Summary 

This notebook tests LangChain's WikipediaRetriever.
Official documentation: https://python.langchain.com/v0.2/docs/integrations/retrievers/wikipedia/


## Step 1 - Load the environment variables

In [5]:
import os
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv())
openai_api_key = os.environ["OPENAI_API_KEY"]
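
Indexing `os.environ` directly raises a bare `KeyError` when the variable is missing, which can be confusing if the `.env` file was not found. A minimal sketch of a friendlier check follows; `require_env` is a hypothetical helper written for this notebook, not part of LangChain or python-dotenv.

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value, or fail with a clear message.

    Hypothetical helper for this notebook; treats an empty value as missing.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or export it in the shell."
        )
    return value
```

After `load_dotenv(...)` has run, the lookup could then be written as `openai_api_key = require_env("OPENAI_API_KEY")`.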


## Step 2 - Instantiate the retriever

In [1]:
from langchain_community.retrievers import WikipediaRetriever

retriever = WikipediaRetriever()

## Step 3 - Basic usage

In [3]:
docs = retriever.invoke("Malaysia")
# Print first 400 chars of the content
print(docs[0].page_content[:400])

Malaysia is a country in Southeast Asia. The federal constitutional monarchy consists of 13 states and three federal territories, separated by the South China Sea into two regions: Peninsular Malaysia and Borneo's East Malaysia. Peninsular Malaysia shares a land and maritime border with Thailand and maritime borders with Singapore, Vietnam, and Indonesia. East Malaysia shares land and maritime bor
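
The retriever returns a list of documents, so it can help to glance at all of them rather than only `docs[0]`. The sketch below shows one way to do that offline. `FakeDoc` is a hypothetical stand-in for the real document objects, the sample texts are made up for illustration, and the `title` metadata key is an assumption about what the retriever attaches to each document.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDoc:
    """Hypothetical stand-in for a retrieved document (offline illustration only)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def preview(docs, width=80):
    """Return one line per document: index, title (if any), start of the text."""
    lines = []
    for i, doc in enumerate(docs):
        title = doc.metadata.get("title", "<no title>")  # assumed metadata key
        snippet = doc.page_content[:width].replace("\n", " ")
        lines.append(f"[{i}] {title}: {snippet}")
    return lines


# Made-up sample data; in the notebook, pass the real `docs` list instead.
sample_docs = [
    FakeDoc("Malaysia is a country in Southeast Asia.", {"title": "Malaysia"}),
    FakeDoc("Kuala Lumpur is the capital city of Malaysia.", {"title": "Kuala Lumpur"}),
]
for line in preview(sample_docs):
    print(line)
```

Because `preview` only reads `page_content` and `metadata`, it should also work on the `docs` list from the cell above, provided those attributes are present.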


To ask more specific questions, we link the retriever into a chain.
We will need:
1. An LLM
2. A prompt template
3. A function that processes the documents returned by WikipediaRetriever

## Step 4.1 - Instantiate the LLM

In [6]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

## Step 4.2 - Prepare the required components and chain them

In [7]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

# This prompt tells the model to answer using only the supplied context
prompt = ChatPromptTemplate.from_template(
    """
    Answer the question based only on the context provided.
    Context: {context}
    Question: {question}
    """
)

# Helper function that joins the retrieved documents into a single string
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Piping the retriever into format_docs fills "context" with the
# text of the documents retrieved for the question
chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
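
The dictionary at the start of the chain can be puzzling at first: both of its values receive the same input string, and the resulting dict is what fills the prompt template. The plain-Python sketch below mimics that data flow without calling Wikipedia or an LLM. `fake_retriever`, `join_texts`, and `build_prompt_inputs` are hypothetical names used only for illustration, and the canned texts are made up.

```python
def fake_retriever(question):
    """Stand-in for the retriever; returns canned text instead of calling Wikipedia."""
    return [
        "Malaysia is a country in Southeast Asia.",
        "Kuala Lumpur is its capital.",
    ]


def join_texts(texts):
    """Same joining rule as format_docs, applied to plain strings."""
    return "\n\n".join(texts)


def build_prompt_inputs(question):
    """Mimic the dict step: every branch receives the same input."""
    return {
        # corresponds to: retriever | format_docs
        "context": join_texts(fake_retriever(question)),
        # corresponds to: RunnablePassthrough()
        "question": question,
    }


inputs = build_prompt_inputs("Which city is the capital of Malaysia?")
print(inputs["question"])
print(inputs["context"])
```

In the real chain, the keys of this dict match the `{context}` and `{question}` placeholders in the prompt template, which is why the names have to line up.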

## Step 5 - Test some sample questions

In [8]:
chain.invoke("Which city is the capital of Malaysia?")

'The capital city of Malaysia is Kuala Lumpur.'

In [9]:
chain.invoke("Which city is the capital of Malaysia and how does the city get the name?")

'The capital of Malaysia is Kuala Lumpur, and the name "Kuala Lumpur" is derived from the Malay words "kuala," meaning "confluence," and "lumpur," meaning "mud." The name refers to the city\'s location at the confluence of the Gombak and Klang rivers, where muddy rivers meet.'

In [10]:
chain.invoke("What industries do Malaysia has competitive advantage?")

"Malaysia has a competitive advantage in several industries, including:\n\n1. **Semiconductor Industry**: Malaysia is an important nexus in the global semiconductor market and is the third largest exporter of semiconductor devices in the world. The country has plans to target US$100 billion in investment to further establish itself as a global semiconductor hub.\n\n2. **Manufacturing**: The manufacturing sector has a large influence on Malaysia's economy, accounting for over 40% of the GDP. Malaysia has developed vertical and horizontal integration across several export-linked industries, capturing significant global market share for manufactured products.\n\n3. **High-Tech Products**: The export value of high-tech products from Malaysia was around US$66 billion in 2022, making it the third highest in ASEAN.\n\n4. **Palm Oil**: Malaysia exports the second largest volume and value of palm oil products globally, after Indonesia.\n\n5. **Artificial Intelligence and Cloud Computing**: The 