## Information Extraction Using an LLM
This example demonstrates how to extract information from text using an OpenAI-compatible LLM endpoint.

### Step 0: Set the working directory for the project

In [9]:
# set the working directory for the project
%cd /home/vcap/app/cf-jupyterlab-workshop

/home/vcap/app/cf-jupyterlab-workshop


### Step 1: Import the dependencies

In [10]:
import sys, os
import requests
import json
import httpx
import warnings
from langchain_core.prompts import ChatPromptTemplate
from langchain_classic.chains import LLMChain
from langchain_openai import ChatOpenAI
from tanzu_utils import CFGenAIService
from os import listdir
from os.path import isfile, join
warnings.filterwarnings('ignore')

### Step 2: Set up the OpenAI API credentials

In [11]:
# Load your service details. Replace the name with that of your Gen AI service, which must be bound to the app.
chat_service = CFGenAIService("tanzu-gpt-oss-120b")

# List available models
models = chat_service.list_models()
for m in models:
    print(f"- {m['name']} (capabilities: {', '.join(m['capabilities'])})")

# construct chat_credentials
chat_credentials = {
    "api_base": chat_service.api_base + "/openai/v1",
    "api_key": chat_service.api_key,
    "model_name": models[0]["name"]
}

- openai/gpt-oss-120b (capabilities: CHAT, TOOLS)
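
Taking `models[0]` works here because the service exposes a single model. If a service listed several, selecting by capability would be more robust. The sketch below assumes each entry carries the `name` and `capabilities` keys seen in the output above; `pick_model` and the sample list are hypothetical and not part of `tanzu_utils`.

```python
def pick_model(models, capability="CHAT"):
    """Return the name of the first model that advertises the given capability."""
    for m in models:
        if capability in m.get("capabilities", []):
            return m["name"]
    raise ValueError(f"no model with capability {capability!r}")


# Hypothetical sample shaped like the list_models() output printed above.
sample_models = [
    {"name": "example/embedding-model", "capabilities": ["EMBEDDING"]},
    {"name": "openai/gpt-oss-120b", "capabilities": ["CHAT", "TOOLS"]},
]
print(pick_model(sample_models))
```

In the notebook, `"model_name": pick_model(models)` could then replace `models[0]["name"]`.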


### Step 3: Initialize the LLM

In [12]:
# HTTP client (optional; used here only to customize TLS handling)
httpx_client = httpx.Client(verify=False)  # verify=False disables TLS certificate verification (like curl --insecure)

# Initialize the LLM
llm = ChatOpenAI(
    temperature=0.9,
    model=chat_credentials["model_name"],   # model name from CF service
    base_url=chat_credentials["api_base"],  # OpenAI-compatible endpoint
    api_key=chat_credentials["api_key"],    # Bearer token
    http_client=httpx_client
)

### Step 4: Create a prompt template

In [13]:
template="""<s>[INST]
You are a helpful, respectful and honest assistant.
Always assist with care, respect, and truth. Respond with utmost utility yet securely.
Avoid harmful, unethical, prejudiced, or negative content. Ensure replies promote fairness and positivity.
I will give you a text, then ask a question about it. Give a precise and as concise as possible answer to this question.

### TEXT:
{text}

### QUESTION:
{query}

### ANSWER:
[/INST]
"""
PROMPT = ChatPromptTemplate.from_template(template)
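
`ChatPromptTemplate.from_template` uses f-string style placeholders by default, so the text sent to the model can be approximated with plain `str.format`. The shortened template and the sample values below are illustrative only, not the notebook's data.

```python
# Shortened, illustrative copy of the template's variable section.
preview_template = """### TEXT:
{text}

### QUESTION:
{query}

### ANSWER:
"""

# Fill both placeholders the same way the chain does on each call.
rendered = preview_template.format(
    text="Subject: Example\nContent:\nExample claim body.",
    query="Where does the event happen?",
)
print(rendered)
```

Printing the rendered prompt this way is a quick check that both `{text}` and `{query}` are filled before any LLM call is made.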

### Step 5: Create a chain

In [14]:
conversation = LLMChain(llm=llm,
                        prompt=PROMPT,
                        verbose=False
                        )
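
The chain's `predict` method takes the template variables (`text`, `query`) as keyword arguments and returns the answer as a string. Asking several questions about the same text can be sketched with a small helper. `ask_all`, `fake_predict`, and the sample questions are hypothetical; `fake_predict` only stands in for `conversation.predict` so the sketch runs without an endpoint.

```python
def ask_all(predict, text, questions):
    """Run every question against the same text and collect the stripped answers."""
    return {label: predict(text=text, query=q).strip() for label, q in questions.items()}


def fake_predict(text, query):
    # Hypothetical stand-in for conversation.predict so the sketch runs offline.
    return f" answer to: {query} "


questions = {
    "location": "Where does the event happen?",
    "time": "When does the event happen?",
}
answers = ask_all(fake_predict, "Subject: Example\nContent:\nExample body.", questions)
print(answers)
```

Passing `conversation.predict` as `predict` should behave the same way, at the cost of one LLM call per question.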

### Step 6: Read the claims from the directory and populate a dictionary

In [15]:
# Read the claims and populate a dictionary
claims_path = '/home/vcap/app/cf-jupyterlab-workshop/workshop/aircraft-claims'
onlyfiles = [f for f in listdir(claims_path) if isfile(join(claims_path, f))]

claims = {}

for filename in onlyfiles:
    # Opening JSON file
    with open(os.path.join(claims_path, filename), 'r') as file:
        data = json.load(file)
    claims[filename] = data
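
The loop above assumes every file in the directory is a JSON document with `subject` and `content` fields. A slightly more defensive loader might filter on the `.json` extension and skip files that lack those keys. This is a minimal sketch; `load_claims` and the temporary demo files are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def load_claims(claims_dir, required=("subject", "content")):
    """Load every *.json file in claims_dir, skipping files that lack required keys."""
    claims = {}
    for path in sorted(Path(claims_dir).glob("*.json")):
        with path.open("r", encoding="utf-8") as fh:
            data = json.load(fh)
        if all(key in data for key in required):
            claims[path.name] = data
        else:
            print(f"skipping {path.name}: missing one of {required}")
    return claims


# Demonstration on a throwaway directory with made-up files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "claim1.json").write_text(json.dumps({"subject": "s", "content": "c"}))
    Path(tmp, "broken.json").write_text(json.dumps({"subject": "only subject"}))
    Path(tmp, "notes.txt").write_text("not a claim")
    loaded = load_claims(tmp)

print(sorted(loaded))
```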

### Step 7: Run the chain to display each original claim and extract its sentiment, location, and time

In [16]:
# The three questions are the same for every claim, so define them once.
sentiment_query = "What is the sentiment of the person sending this claim?"
location_query = "Where does the event the claim is related to happen?"
time_query = "When does the event the claim is related to happen? If possible, specify the date and the time."

for filename in onlyfiles:
    claim = claims[filename]
    # Text passed to the prompt's {text} placeholder
    text_input = f"Subject: {claim['subject']}\nContent:\n{claim['content']}"

    print("***************************")
    print(f"* Claim: {filename}")
    print("***************************")
    print("Original content:")
    print("-----------------")
    print(f"{text_input}\n\n")
    print("Analysis:")
    print("--------")

    # One LLM call per question
    print("- Sentiment: ")
    sentiment = conversation.predict(text=text_input, query=sentiment_query)
    print(sentiment)
    print("\n- Location: ")
    loc = conversation.predict(text=text_input, query=location_query)
    print(loc)
    print("\n- Time: ")
    t = conversation.predict(text=text_input, query=time_query)
    print(t)
    print("\n\n                          ----====----\n")

***************************
* Claim: claim1.json
***************************
Original content:
-----------------
Subject: Maintenance Discrepancy Report - Landing Gear B Hydraulic Failure
Content:
Dear Maintenance Control/Engineering Team,

I am submitting a Maintenance Discrepancy Report (MDR) for an issue noted on **August 2nd, 2025**, during a pre-flight check on **Landing Gear B**. 

The defect is logged under the operational system as **MAINT-LG-20250802-002**.

**Observed Discrepancy (Equipment: Landing Gear B):**

The primary concern is a **Hydraulic failure** resulting in a slow response time during the gear swing test. Initial observation shows a noticeable drop in hydraulic system pressure specifically impacting the operation of the Landing Gear B retraction/extension sequence.

**Action Taken/Request (Required Procedure: Inspect hydraulic lines and refill fluid):**

The aircraft is currently grounded pending repair. I have applied a **red maintenance tag** to the landing gea