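# Context

We want to extract relevant information from a text, such as product brand and model names. The notebook below demonstrates the approach on a simpler domain: pet names and the number of tricks each pet performed.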

In [2]:
import os

from kor.extraction import create_extraction_chain
from kor.nodes import Object, Text, Number

from langchain.chat_models import ChatOpenAI
from langchain.llms import OpenAI
from langchain.callbacks import get_openai_callback

# Load the OpenAI API key

To avoid accidentally committing the key to our repository, we read it from an environment variable.
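A slightly more defensive way to read the key is a small helper that fails with a clear message when the variable is missing or empty, instead of passing an empty key to the client. This is a sketch: `load_api_key` is a hypothetical name, and `CHAT_GPT_API_KEY` is simply the variable name this notebook uses.

```python
import os


def load_api_key(var_name: str = "CHAT_GPT_API_KEY") -> str:
    """Return the API key stored in `var_name`, failing loudly if it is unset or blank."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Environment variable {var_name} is not set; export it before running the notebook."
        )
    return key
```

With this helper, the key would be loaded as `openai_api_key = load_api_key()`.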

In [3]:
# Read the key from an environment variable instead of hard-coding it
openai_api_key = os.environ["CHAT_GPT_API_KEY"]

# Create a schema object

The schema lets us declare which information we want to extract and what type each field has.

In [22]:
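schema = Object(
    id="pet_name_tricks",
    description="pet's names with the number of tricks performed",
    attributes=[
        # Numeric attribute: how many tricks the pet performed
        Number(
            id="number_of_tricks",
            description="pet tricks performed",
        ),
        # Text attribute: the pet's name
        Text(
            id="pet_name",
            description="name of the pet",
        ),
    ],
    # Few-shot example: (input text, expected extracted records)
    examples=[
        (
            "cleopatra performed 3 rollbacks in a row and 2 flip backs!",
            [{"pet_name": "Cleopatra", "number_of_tricks": [3, 2]}],
        ),
    ],
    many=True,  # extract a list of objects rather than a single one
)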

# Create the LangChain chain

`create_extraction_chain` combines the LLM and the schema into a LangChain chain that performs the extraction.

In [23]:
llm = ChatOpenAI(
    model_name="gpt-4",
    temperature=0,
    max_tokens=2000,
    openai_api_key=openai_api_key
)

chain = create_extraction_chain(llm, schema)

# Run the chain inside a callback that tracks token usage and cost

In [24]:
with get_openai_callback() as cb:
    result = chain.run(text="sussie was performing well and could manage to achieve 2 flip backs with 6 roll overs in a row!")
    
    print(f"Total Tokens: {cb.total_tokens}")
    print(f"Prompt Tokens: {cb.prompt_tokens}")
    print(f"Completion Tokens: {cb.completion_tokens}")
    print(f"Successful Requests: {cb.successful_requests}")
    print(f"Total Cost (USD): ${cb.total_cost}")

Total Tokens: 234
Prompt Tokens: 213
Completion Tokens: 21
Successful Requests: 1
Total Cost (USD): $0.00765
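The reported cost can be checked by hand. Assuming the GPT-4 (8K context) list prices in effect at the time, $0.03 per 1K prompt tokens and $0.06 per 1K completion tokens (prices change, so treat both as assumptions), the numbers above add up: 213 × 0.03 / 1000 + 21 × 0.06 / 1000 = 0.00639 + 0.00126 = 0.00765 USD. A minimal sketch of the same calculation:

```python
# Assumed GPT-4 (8K) prices in USD per 1K tokens; check current pricing before relying on them.
PROMPT_PRICE_PER_1K = 0.03
COMPLETION_PRICE_PER_1K = 0.06


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the USD cost of one call from its prompt and completion token counts."""
    return (
        prompt_tokens * PROMPT_PRICE_PER_1K / 1000
        + completion_tokens * COMPLETION_PRICE_PER_1K / 1000
    )


cost = estimate_cost(prompt_tokens=213, completion_tokens=21)
print(f"Estimated cost (USD): ${cost:.5f}")
```

Most of the cost comes from the prompt, which carries the instructions, the schema, and every few-shot example on each call.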


# Let's inspect the result

In [25]:
result

'number_of_tricks|pet_name\n[2]|Sussie\n[6]|Sussie'
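`chain.run` returns the model's raw text: a `|`-delimited CSV in which list-valued fields such as `number_of_tricks` appear as bracketed strings. Note that the model produced two rows for Sussie instead of one row with `[2, 6]`. The sketch below parses that text with the standard library and merges rows by pet name; `parse_pipe_csv` and `merge_by_pet` are hypothetical helpers (not part of kor), and it assumes the list values are valid JSON arrays.

```python
import csv
import io
import json
from collections import defaultdict
from typing import Dict, List


def parse_pipe_csv(raw: str) -> List[dict]:
    """Parse the pipe-delimited CSV returned by the chain into a list of dicts."""
    rows = []
    for row in csv.DictReader(io.StringIO(raw), delimiter="|"):
        # Assumption: list values such as "[2]" are valid JSON arrays.
        row["number_of_tricks"] = json.loads(row["number_of_tricks"])
        rows.append(row)
    return rows


def merge_by_pet(rows: List[dict]) -> Dict[str, List[int]]:
    """Combine rows that refer to the same pet into a single list of trick counts."""
    merged: Dict[str, List[int]] = defaultdict(list)
    for row in rows:
        merged[row["pet_name"]].extend(row["number_of_tricks"])
    return dict(merged)


raw = "number_of_tricks|pet_name\n[2]|Sussie\n[6]|Sussie"
records = parse_pipe_csv(raw)
print(merge_by_pet(records))
```

In the notebook itself, `result` would be passed instead of the hard-coded `raw` string.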

# What are we actually sending to the model?

In [17]:
print(chain.prompt.format_prompt(text="sussie was performing well and could manage to achieve 2 flip backs with 6 roll overs in a row!").to_string())

Your goal is to extract structured information from the user's input that matches the form described below. When extracting information please make sure it matches the type information exactly. Do not add any attributes that do not appear in the schema shown below.

```TypeScript

name_tricks: Array<{ // pet's names with the number of tricks performed
 number_of_tricks: number // pet tricks performed
 pet_name: string // 
}>
```


Please output the extracted information in CSV format in Excel dialect. Please use a | as the delimiter. 
 Do NOT add any clarifying information. Output MUST follow the schema above. Do NOT add any additional columns that do not appear in the schema.



Input: cleopatra performed 3 rollbacks in a row and 2 flip backs!
Output: number_of_tricks|pet_name
[3, 2]|Cleopatra

Input: sussie was performing well and could manage to achieve 2 flip backs with 6 roll overs in a row!
Output:
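The few-shot section at the end of the prompt pairs each example input with its expected rows, encoded in the same `|`-delimited CSV the model is asked to produce. The hypothetical `format_example` below reproduces that layout with the standard `csv` module. It is a sketch to illustrate the format, not kor's internal code, and it assumes the columns follow the order of the schema attributes, which is what the printed header (`number_of_tricks|pet_name`) suggests even though the example dict lists `pet_name` first.

```python
import csv
import io
from typing import List


def format_example(text: str, records: List[dict], columns: List[str]) -> str:
    """Render one few-shot example as an Input/Output pair in pipe-delimited CSV."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="|", lineterminator="\n")
    writer.writerow(columns)  # header row, in schema attribute order
    for record in records:
        # Non-string values (e.g. the list [3, 2]) are written using str().
        writer.writerow([record.get(column, "") for column in columns])
    return f"Input: {text}\nOutput: {buf.getvalue()}"


example = format_example(
    "cleopatra performed 3 rollbacks in a row and 2 flip backs!",
    [{"pet_name": "Cleopatra", "number_of_tricks": [3, 2]}],
    columns=["number_of_tricks", "pet_name"],
)
print(example)
```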
