# Fuzzy Match URL Param to Surah:Ayah Format Using a Large Language Model (GPT-3.5)

A user can type a Surah:Ayah reference in the URL in a number of ways. We need to extract the surah and ayah numbers from the request.

We do not use a regular expression here because users may enter the surah and ayah numbers in many different formats. Instead, we let the large language model infer them.

In [10]:
#!pip3 install python-dotenv

In [18]:
#!pip3 install openai

In [19]:
#!pip3 install langchain

In [89]:
from dotenv import load_dotenv
import os

load_dotenv(dotenv_path="keys/.env")

openai_api_key = os.getenv('OPEN_API_KEY_FOR_FUZZY_MATCH_SURAH_AYAH')
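
If the variable is missing from `keys/.env`, `os.getenv` returns `None` and the problem only surfaces later, when the API is called. A minimal sketch of a fail-fast check follows; `require_env` is a hypothetical helper name, not part of python-dotenv:

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set")
    return value
```

It could be used in place of the plain `os.getenv` call above, for example `openai_api_key = require_env('OPEN_API_KEY_FOR_FUZZY_MATCH_SURAH_AYAH')`.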

Suppose a user enters the following surah and ayah reference in the URL. We need to extract the surah and ayah numbers from it.

In [81]:
# This is the URL param entered by the user. It can be changed to other formats and the LLM will
# still be able to infer it. For example, change 2/36 to 2-36 or 2:36.
user_prompt = "2/36"
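
The notebook hard-codes the value, but in an application it would come from the request URL. As a rough sketch, assuming the reference is the whole URL path (the `example.com` host and the `url_param_from` helper are placeholders, not part of the original project), the standard library can pull it out:

```python
from urllib.parse import urlparse, unquote


def url_param_from(url: str) -> str:
    """Return the percent-decoded path of `url` without leading or trailing slashes."""
    path = urlparse(url).path
    return unquote(path).strip("/")


# example.com is a placeholder host used only for illustration
print(url_param_from("https://example.com/2/36"))    # 2/36
print(url_param_from("https://example.com/2%3A36"))  # 2:36
```

The returned string would then take the place of `user_prompt`, whatever separator the user typed.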

In [40]:
# To help construct our Chat Messages
from langchain.prompts import ChatPromptTemplate, HumanMessagePromptTemplate

# We will be using a chat model, defaults to gpt-3.5-turbo
from langchain.chat_models import ChatOpenAI

# To parse outputs and get structured data back
from langchain.output_parsers import StructuredOutputParser, ResponseSchema

chat_model = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo', openai_api_key=openai_api_key)

In [63]:
# The schema I want out
response_schemas = [
    ResponseSchema(name="surah_number", description="Surah number from the Quran"),
    ResponseSchema(name="ayah_number", description="Ayah number from the Quran")
]

# The parser that will look for the LLM output in my schema and return it back to me
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)

In [64]:
# The format instructions that LangChain makes. Let's look at them
format_instructions = output_parser.get_format_instructions()
print(format_instructions)

The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

```json
{
	"surah_number": string  // Surah number from the Quran
	"ayah_number": string  // Ayah number from the Quran
}
```


In [65]:
# The prompt template that brings it all together

prompt = ChatPromptTemplate(
    messages=[
        HumanMessagePromptTemplate.from_template("You will be given a surah number and ayah number for the Quran. \n \
                                                  Extract the surah number and ayah number. If no match is found, \n \
                                                  just output surah number as 0 and ayah number as 0. \n \
                                                  {format_instructions}\n{user_prompt}")
    ],
    input_variables=["user_prompt"],
    partial_variables={"format_instructions": format_instructions}
)

In [80]:
query = prompt.format_prompt(user_prompt=user_prompt)
print(query.messages[0].content)

You will be given a surah number and ayah number for the Quran. 
                                                   Extract the surah number and ayah number. If no match is found, 
                                                   just output surah number as 0 and ayah number as 0. 
                                                   The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

```json
{
	"surah_number": string  // Surah number from the Quran
	"ayah_number": string  // Ayah number from the Quran
}
```
2/36


In [82]:
prompt_output = chat_model(query.to_messages())
formatted_output = output_parser.parse(prompt_output.content)

print(formatted_output)
print(type(formatted_output))

{'surah_number': '2', 'ayah_number': '36'}
<class 'dict'>
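
To make the parsing step less of a black box, here is a rough standard-library sketch of the idea: find the fenced JSON block in the model's reply and decode it. This illustrates the technique only and is not LangChain's actual implementation; `parse_fenced_json` is a hypothetical name and `reply` is a hand-written stand-in for a model response.

```python
import json
import re

# Built at runtime so the fence marker does not appear literally in this example
FENCE = "`" * 3


def parse_fenced_json(text: str) -> dict:
    """Extract and decode the first fenced JSON block found in `text`."""
    pattern = re.escape(FENCE) + r"json\s*(.*?)" + re.escape(FENCE)
    match = re.search(pattern, text, re.DOTALL)
    if match is None:
        raise ValueError("no fenced json block found in model output")
    return json.loads(match.group(1))


# Hand-written stand-in for a model reply, not real API output
reply = f'{FENCE}json\n{{"surah_number": "2", "ayah_number": "36"}}\n{FENCE}'
print(parse_fenced_json(reply))
```

A reply without the fence raises `ValueError`, which is a reminder that the model is only asked, not guaranteed, to follow the format instructions.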


In [83]:
formatted_output

{'surah_number': '2', 'ayah_number': '36'}
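
The parser returns both values as strings, and the prompt asks the model to answer 0 and 0 when nothing matches, so the result still needs checking before use. One possible sketch is shown below; `to_surah_ayah` and `SURAH_COUNT` are hypothetical names, and the check covers only the surah range (the Quran has 114 surahs), not the number of ayahs in each surah.

```python
from typing import Optional, Tuple

SURAH_COUNT = 114  # the Quran has 114 surahs


def to_surah_ayah(parsed: dict) -> Optional[Tuple[int, int]]:
    """Convert parser output to a (surah, ayah) pair of ints, or None if unusable."""
    try:
        surah = int(parsed["surah_number"])
        ayah = int(parsed["ayah_number"])
    except (KeyError, TypeError, ValueError):
        return None
    # 0/0 is the "no match" answer requested in the prompt; it fails this range check
    if not 1 <= surah <= SURAH_COUNT or ayah < 1:
        return None
    return surah, ayah


print(to_surah_ayah({'surah_number': '2', 'ayah_number': '36'}))  # (2, 36)
print(to_surah_ayah({'surah_number': '0', 'ayah_number': '0'}))   # None
```

Returning `None` instead of raising keeps the caller's handling of a bad URL to a single check.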