## Extracting information from a text using an LLM

Because an LLM "understands" natural language, it is well suited to tasks such as sentiment analysis and information extraction.

In this notebook, we are going to use our LLM to analyze the claims and determine the state of mind of the person writing, as well as the location and time of the accident.

### Requirements and Imports

If you selected the right workbench image as per the Lab's instructions, you should already have all the needed libraries. If not, uncomment the first line in the next cell to install the required packages.

In [None]:
# Uncomment the following line only if you have not selected the right workbench image, or are using this notebook outside of the workshop environment.
# !pip install --no-cache-dir --no-dependencies --disable-pip-version-check -r requirements.txt

import json
import os
from os import listdir
from os.path import isfile, join

from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain_community.llms import VLLMOpenAI

### LangChain pipeline

Again, we are going to use LangChain to define our task pipeline.

In [None]:
# LLM Inference Server URL
inference_server_url = "http://granite-7b-instruct-predictor.ic-shared-llm.svc.cluster.local:8080"

# LLM definition
llm = VLLMOpenAI(           # We are using the vLLM OpenAI-compatible API client, but the model is running on OpenShift AI, not OpenAI.
    openai_api_key="EMPTY",   # And that is why we don't need an OpenAI key for this.
    openai_api_base=f"{inference_server_url}/v1",
    model_name="granite-7b-instruct",
    top_p=0.92,
    temperature=0.01,
    max_tokens=512,
    presence_penalty=1.03,
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()]
)

We will now define a **template** that we can use for the different tasks.

Notice that this time we have **two placeholders**, so that we can reuse the same template for different questions.

In [None]:
template="""<|system|>
You are a helpful, respectful and honest assistant.
Always assist with care, respect, and truth. Respond with utmost utility yet securely.
Avoid harmful, unethical, prejudiced, or negative content. Ensure replies promote fairness and positivity.
I will give you a text, then ask a question about it. Give a precise and as concise as possible answer to this question.

<|user|>
### TEXT:
{text}

### QUESTION:
{query}

### ANSWER:
<|assistant|>
"""
prompt = PromptTemplate(input_variables=["text", "query"], template=template)
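
To make the role of the two placeholders more concrete, here is a minimal sketch that fills a shortened version of the template using plain Python string formatting. It does not use LangChain; as far as we know, `PromptTemplate` performs an equivalent substitution with its default settings. The shortened template and the sample claim text are made up for illustration.

```python
# Shortened stand-in for the notebook's template (illustration only).
demo_template = """### TEXT:
{text}

### QUESTION:
{query}

### ANSWER:
"""

# Hypothetical claim text, not taken from the `claims` folder.
demo_text = "Subject: Car accident\nContent:\nMy car was hit in a parking lot."

# The same text is reused with two different questions.
sentiment_prompt = demo_template.format(
    text=demo_text,
    query="What is the sentiment of the person sending this claim?",
)
location_prompt = demo_template.format(
    text=demo_text,
    query="Where does the event the claim is related to happen?",
)

print(sentiment_prompt)
print(location_prompt)
```

Only the `{query}` part changes between the two prompts, which is what lets a single template serve several extraction tasks.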

And we can now create the **conversation** object that we will use to query the model.

In [None]:
conversation = prompt | llm

We are now ready to query the model!

In the `claims` folder, we have JSON files with examples of claims that could be received. We are going to read those files, display their content, and then display the analysis made by the LLM.

In [None]:
# Read the claims and populate a dictionary

claims_path = 'claims'
onlyfiles = [f for f in listdir(claims_path) if isfile(join(claims_path, f))]

claims = {}

for filename in onlyfiles:
    # Opening JSON file
    with open(os.path.join(claims_path, filename), 'r') as file:
        data = json.load(file)
    claims[filename] = data
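
The analysis step relies on each claim having a `subject` and a `content` field. The following sketch shows one possible way to load the files while skipping any that lack those fields. The helper name `load_claims`, the `.json` filter, and the skip-on-missing-fields behavior are assumptions for illustration, not part of the workshop code.

```python
import json
import os

REQUIRED_KEYS = {"subject", "content"}  # fields read later when analyzing a claim


def load_claims(claims_path):
    """Return {filename: claim_dict} for every valid JSON claim in claims_path."""
    claims = {}
    for filename in sorted(os.listdir(claims_path)):
        full_path = os.path.join(claims_path, filename)
        if not os.path.isfile(full_path) or not filename.endswith(".json"):
            continue  # ignore sub-folders and non-JSON files
        with open(full_path, "r", encoding="utf-8") as file:
            data = json.load(file)
        if not isinstance(data, dict) or not REQUIRED_KEYS.issubset(data):
            print(f"Skipping {filename}: missing 'subject' or 'content'")
            continue
        claims[filename] = data
    return claims
```

Calling `load_claims('claims')` would then return a dictionary with the same shape as the `claims` dictionary built in the cell above.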

In [None]:
# Analyze the claims

for filename in onlyfiles:
    print("***************************")
    print(f"* Claim: {filename}")
    print("***************************")
    print("Original content:")
    print("-----------------")
    print(f"Subject: {claims[filename]['subject']}\nContent:\n{claims[filename]['content']}\n\n")
    print('Analysis:')
    print("--------")
    text_input = f"Subject: {claims[filename]['subject']}\nContent:\n{claims[filename]['content']}"
    sentiment_query = "What is the sentiment of the person sending this claim?"
    location_query = "Where does the event the claim is related to happen?"
    time_query = "When does the event the claim is related to happen? If possible, specify the date and the time."
    print("- Sentiment: ")
    conversation.invoke(input={"text": text_input, "query": sentiment_query});
    print("\n- Location: ")
    conversation.invoke(input={"text": text_input, "query": location_query});
    print("\n- Time: ")
    conversation.invoke(input={"text": text_input, "query": time_query});
    print("\n\n                          ----====----\n")
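
The loop above streams each answer straight to the output. If you would rather keep the answers for later processing, a sketch along these lines could collect them in a dictionary. It assumes that the chain's `invoke` method returns the generated text (the result is converted with `str()` to be safe). `analyze_claim` and `EchoChain` are hypothetical names, and `EchoChain` is only a stand-in so that the example runs without an inference server.

```python
QUERIES = {
    "sentiment": "What is the sentiment of the person sending this claim?",
    "location": "Where does the event the claim is related to happen?",
    "time": "When does the event the claim is related to happen? "
            "If possible, specify the date and the time.",
}


def analyze_claim(chain, claim):
    """Ask every question in QUERIES about one claim and return the answers."""
    text_input = f"Subject: {claim['subject']}\nContent:\n{claim['content']}"
    return {
        label: str(chain.invoke({"text": text_input, "query": query})).strip()
        for label, query in QUERIES.items()
    }


class EchoChain:
    """Stand-in for the real chain: echoes the question instead of calling an LLM."""

    def invoke(self, inputs):
        return f" (answer to: {inputs['query']}) "


# Hypothetical claim, used only to exercise the helper.
demo_claim = {"subject": "Car accident", "content": "Hypothetical claim text."}
answers = analyze_claim(EchoChain(), demo_claim)
for label, answer in answers.items():
    print(f"- {label}: {answer}")
```

In the notebook, you would pass `conversation` instead of `EchoChain()`. Note that with the streaming callback configured on the LLM, the answers would still be printed as they are generated.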

You can come back to this notebook at section 3.7 for some optional exercises if you want.