## **Entity Extraction from IBM's Quarterly Earnings Call Transcript using Granite-8B**

##### This notebook uses two approaches to extract entities from the transcript. The first approach defines each entity and its description directly in the prompt. The second approach defines the entities in pydantic classes, converts them into an OpenAI-style function definition, and passes that definition to the LLM along with the prompt.

##### The model used in this notebook is IBM's Granite-8b-preview-4k (model ID `ibm/granite-8b-instruct-preview-4k`).

----

In [136]:
import os

Importing the LangChain and pydantic classes used for the prompt template, output parsing, and entity definitions.

In [137]:
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from pydantic import BaseModel, Field

As this model is currently available only on BAM, we have to call the BAM API.  
The imports below are required for the BAM API call.

In [138]:
from genai import Client, Credentials
from genai.schema import TextGenerationParameters
from genai.extensions.langchain import LangChainInterface

In [None]:
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv()) 

Fetching the BAM credentials that were loaded from the `.env` file.

In [140]:
bam_url = os.environ['BAM_URL']
bam_api = os.environ['BAM_API_KEY']
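
`os.environ[...]` raises a bare `KeyError` when a variable is missing, which can be hard to interpret inside a notebook. A minimal sketch of a stricter lookup is shown below; the helper name `require_env` and the variable `EXAMPLE_SETTING` are hypothetical and not part of the original notebook.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; add it to your .env file."
        )
    return value


# Demonstration with a placeholder variable (not a real credential).
os.environ["EXAMPLE_SETTING"] = "placeholder-value"
print(require_env("EXAMPLE_SETTING"))
```

The same helper could be used as `require_env("BAM_URL")` and `require_env("BAM_API_KEY")` in place of the direct dictionary lookups.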

Setting the generation parameters so that the model's output is as deterministic as possible. The model used here is IBM's granite-8b-preview-4k.

> **Important**: You need access to the granite-8b-preview-4k model on BAM to run this notebook.

In [153]:
params = TextGenerationParameters(
    decoding_method="sampling",
    max_new_tokens=100,
    min_new_tokens=50,
    repetition_penalty=1,  # a value of 1 applies no repetition penalty
    temperature=0,
    top_k=1,  # keep only the most likely token at each step
    # random_seed=999
)
creds = Credentials(api_endpoint=bam_url, api_key=bam_api)
client = Client(credentials=creds)
model_id = 'ibm/granite-8b-instruct-preview-4k'

llm = LangChainInterface(model_id=model_id, client=client, params=params)
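
With `top_k=1`, sampling can only ever pick the single most likely token, so these settings behave like greedy decoding even though `decoding_method` is `"sampling"`. The toy sampler below illustrates the idea; it is a simplified sketch with made-up scores, not BAM's actual decoding implementation.

```python
import math
import random
from typing import Dict


def sample_top_k(logits: Dict[str, float], top_k: int, rng: random.Random) -> str:
    """Sample one token after keeping only the top_k highest-scoring tokens."""
    # Keep the top_k candidates by score.
    kept = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # Softmax over the surviving candidates (shifted for numerical stability).
    max_logit = max(score for _, score in kept)
    weights = [math.exp(score - max_logit) for _, score in kept]
    tokens = [token for token, _ in kept]
    return rng.choices(tokens, weights=weights, k=1)[0]


toy_logits = {"IBM": 2.1, "Apple": 1.9, "Oracle": 0.3}  # made-up scores
rng = random.Random(0)
draws = {sample_top_k(toy_logits, top_k=1, rng=rng) for _ in range(100)}
print(draws)  # → {'IBM'}
```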

Reading and displaying the transcript from a pre-loaded .txt file located in the same folder as this notebook.

In [None]:
with open('ibm_transcript.txt', 'r') as file:
    page_content = file.read()
print(page_content)

---

### Method 1 - All entities defined in the prompt

Every entity to be extracted is defined in the prompt itself, along with its description.

In [108]:
def extract_entities_prompt_only():
    template = """
    <|start_of_role|>user<|end_of_role|>
    -You are an AI Entity Extractor. You help extract entities from the given transcript: {page_content}
    -Analyze this transcript and extract the following entities:
    1) `company_name` : This is the name of the company for which the transcript is given.
    2) `ceo_name`: This is the name of the CEO of IBM.
    3) `pre_tax_profit_percentage`: This is the operating pre-tax profit in percentage.
    4) `pre_tax_profit_number`: This is the operating pre-tax profit in numbers.
    5) `total_revenue_transaction_processing`: This is the total revenue growth for Transaction Processing sector in percentage.
    6) `total_revenue_data_ai`: This is the total revenue growth for Data and AI sector in percentage.
    7) `total_revenue_security`: This is the total revenue growth/decline for security sector in percentage.
    8) `total_revenue_automation`: This is total revenue growth/decline for automation sector in percentage.
    9) `names_of_people`: All the names of the people that were mentioned in the transcript. Do not assume the last name. Only output the names which you find in the transcript.
    
    -Your output should strictly be in a json format.
    -If any entity is not found, your output should be `data not available`. Do not make up your own entities if they are not present.
    -Strictly do only what is asked of you. Do not give any explanations for your output.
    <|end_of_text|>
    <|start_of_role|>assistant<|end_of_role|>   
    """
    extract_prompt = PromptTemplate(
        input_variables = ["page_content"],
        template = template,
    )

    entities = extract_prompt | llm | StrOutputParser()
    return entities

Calling the model with the prompt to get the output.

In [109]:
chain = extract_entities_prompt_only()
response = chain.invoke({"page_content": page_content})

Showing the response from the LLM.

In [None]:
print(response)
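
The response is a plain string, so it still has to be parsed before the entities can be used programmatically. The sketch below pulls the first JSON object out of a response that may contain extra text around it. The helper name `parse_json_response` and the sample string are hypothetical; the sample is not real model output.

```python
import json


def parse_json_response(text: str) -> dict:
    """Parse the first JSON object found in a model response."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores any trailing text after it.
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            start = text.find("{", start + 1)
    raise ValueError("No complete JSON object found in the response.")


# Made-up sample response for illustration only.
sample = 'Output:\n{"company_name": "IBM", "ceo_name": "data not available"}\nDone.'
parsed = parse_json_response(sample)
print(parsed["company_name"])  # → IBM
```

If generation stops at `max_new_tokens` before the closing brace, the JSON is incomplete and the helper raises `ValueError` instead of returning partial data.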

---

### Method 2 - Entities defined in pydantic classes, converted to an OpenAI function definition, and passed into the prompt

This import converts a pydantic model into an OpenAI-style function definition (a JSON schema). Here it is used to describe the expected structured output to the model through the prompt.

In [143]:
from langchain.utils.openai_functions import convert_pydantic_to_openai_function

Defining the entities in classes, along with their descriptions.

In [144]:
class PreTaxProfit(BaseModel):
    pre_tax_profit_percentage: str = Field(description="Operating pre-tax profit in percentage.")
    pre_tax_profit_numbers: str = Field(description="Operating pre-tax profit in numbers.")


class RevenueGrowth(BaseModel):
    total_revenue_transaction_processing: str = Field(description="Total revenue growth for Transaction Processing sector in percentage.")
    total_revenue_data_ai: str = Field(description="Total revenue growth for Data and AI sector in percentage.")
    total_revenue_security: str = Field(description="Total revenue growth/decline for security sector in percentage.")
    total_revenue_automation: str = Field(description="Total revenue growth/decline for automation sector in percentage.")

Wrapping all the classes into one parent class, which is then converted into a function definition.

In [157]:
class EarningCallReport(BaseModel):
    company_name: str = Field(description="The public company name.")
    ceo_name: str = Field(description="Name of the CEO of the company.")
    pre_tax_profit: PreTaxProfit = Field(description="Operating pre-tax profit.")
    revenue: RevenueGrowth = Field(description="All revenue growth details for all sectors.")
    names_of_people: str = Field(description="All the names of the people that were mentioned in the transcript.")

In [158]:
overview_tagging_function = convert_pydantic_to_openai_function(EarningCallReport)


In [None]:
overview_tagging_function

Same prompt as before, but here the function definition generated from the pydantic classes is passed in instead of defining each entity in the prompt.

In [160]:
def extract_entities():
    template = """
    <|start_of_role|>user<|end_of_role|>
    -You are an AI Entity Extractor. You help extract entities from the given transcript: {page_content}
    -Analyze this transcript and extract the following entities as per the following function definition: {function_content}
    -Your output should strictly be in a json format.
    -Do not generate random entities on your own. If it is not present or you are unable to find any specified entity, you strictly have to output it as `Data not available`.
    -Only do what is asked of you. Do not give any explanations for your output and do not hallucinate.
    <|end_of_text|>
    <|start_of_role|>assistant<|end_of_role|>   
    """
    extract_prompt = PromptTemplate(
        input_variables = ["page_content", "function_content"],
        template = template,
    )

    entities = extract_prompt | llm | StrOutputParser()
    return entities

Calling the model with the prompt to get the output.

In [161]:
chain = extract_entities()
response = chain.invoke({"page_content": page_content, "function_content": overview_tagging_function})
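
When a dictionary is substituted into an f-string style template, it is rendered with Python's `str()`, which produces single-quoted keys rather than valid JSON. Serializing it with `json.dumps` first gives the model a well-formed JSON schema to read. The dictionary below is a hypothetical, heavily trimmed stand-in for the real function definition.

```python
import json

# Hypothetical stand-in for the function definition dictionary.
toy_function = {
    "name": "EarningCallReport",
    "parameters": {"properties": {"company_name": {"type": "string"}}},
}

as_repr = str(toy_function)  # what string formatting produces for a dict
as_json = json.dumps(toy_function, indent=2)  # valid, readable JSON text

print(as_repr)
print(as_json)
```

In the notebook, this would amount to passing `json.dumps(overview_tagging_function, indent=2)` as the value of `function_content`. Whether this improves the extraction for this model is untested here.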

Showing the response from the LLM.

In [None]:
print(response)
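
Because the schema is only embedded in the prompt as text, nothing forces the model to return every field. A minimal sketch of a completeness check is shown below. The field names mirror the `EarningCallReport` classes; the helper `find_missing_fields` and the sample JSON are hypothetical.

```python
import json

# Top-level fields and, where applicable, their nested fields,
# mirroring EarningCallReport, PreTaxProfit, and RevenueGrowth.
EXPECTED_FIELDS = {
    "company_name": [],
    "ceo_name": [],
    "pre_tax_profit": ["pre_tax_profit_percentage", "pre_tax_profit_numbers"],
    "revenue": [
        "total_revenue_transaction_processing",
        "total_revenue_data_ai",
        "total_revenue_security",
        "total_revenue_automation",
    ],
    "names_of_people": [],
}


def find_missing_fields(report: dict) -> list:
    """Return dotted names of expected fields that are absent from the report."""
    missing = []
    for field, children in EXPECTED_FIELDS.items():
        if field not in report:
            missing.append(field)
            continue
        # Treat a non-dict value as having no nested fields at all.
        nested = report[field] if isinstance(report[field], dict) else {}
        missing.extend(f"{field}.{child}" for child in children if child not in nested)
    return missing


# Made-up sample for illustration only.
sample = '{"company_name": "IBM", "pre_tax_profit": {"pre_tax_profit_percentage": "Data not available"}}'
print(find_missing_fields(json.loads(sample)))
# → ['ceo_name', 'pre_tax_profit.pre_tax_profit_numbers', 'revenue', 'names_of_people']
```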

---