## Hugging Face: Inference to Json (Local)

Common small models that run efficiently on CPUs:
* microsoft/DialoGPT-small (117M, 500MB): basic conversation, very fast on CPU
* gpt2 (124M, 500MB): classic, reliable, fast inference
* distilgpt2 (82M, 350MB): lighter version of GPT-2, faster than GPT-2

Remember to add the environment variable HF_TOKEN containing your token, or login to HF by
> huggingface-cli login

and give the token.

ToDo
* Sample to display information about the model (such as number of parameters)
* Develop script for Full/LORA fine-tuning on json instructions

In [17]:
import os, sys
import transformers, torch, json, warnings # logging
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from pydantic import BaseModel

# Stop the endless nagging
warnings.filterwarnings("ignore")#, message=".*attention mask.*")
# os.environ['TRANSFORMERS_NO_ADVISORY_WARNINGS'] = '1'
# logging.getLogger("transformers").setLevel(logging.ERROR)
# logging.set_verbosity_error()

print(f"Python: {sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}")
print(f"Transformers: {transformers.__version__}")
print(f"PyTorch: {torch.__version__}")

# Global setup
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Device type: " + device.type)
root_folder = r"C:\\temp\\llms"
data_folder = os.path.join(root_folder, "datasets")

Python: 3.12.10
Transformers: 4.57.1
PyTorch: 2.9.0+cpu
Device type: cpu


In [18]:
# Load the model
model_name = "meta-llama/Llama-3.2-1B-Instruct"
# model_name = "distilgpt2"
# model_name = "microsoft/DialoGPT-small"
# model_name = "gpt2"
# model_name = "google/flan-t5-base"
# model_name = "Qwen/Qwen2.5-1.5B-Instruct"

# Retrieve local model (download on first call)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
print(f"Model name: {model_name}")
print(f"Number of parameters: {model.num_parameters():,}")
print(f"Running on: {model.device}", "\n")

Model name: meta-llama/Llama-3.2-1B-Instruct
Number of parameters: 1,235,814,400
Running on: cpu 



### Examples for chatting
We show how to use local models for the simplest objective of chatting. We use two versions of the code. The bare version that allows more control, and the version with pipelines for easier syntax.

In [None]:
# Initialize chat history
chat_history_ids = None

# First message
prompt = "Hello! How are you?"
print(f"Prompt: {prompt}")
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Generate response
chat_history_ids = model.generate(input_ids, max_length=100, pad_token_id=tokenizer.eos_token_id,
                                  temperature=0.7, do_sample=True)

# Decode and print
response = tokenizer.decode(chat_history_ids[:, input_ids.shape[-1]:][0], skip_special_tokens=True)
print(f"Bot: {response}")

# Continue conversation
print()
prompt = "What's your favorite color?"
print(f"Prompt: {prompt}")
input_ids = tokenizer.encode(prompt, return_tensors="pt")
chat_history_ids = torch.cat([chat_history_ids, input_ids], dim=-1)

chat_history_ids = model.generate(chat_history_ids, max_length=200, pad_token_id=tokenizer.eos_token_id)

response = tokenizer.decode(chat_history_ids[:, -50:][0], skip_special_tokens=True)
print(f"Bot: {response}")

Model name: meta-llama/Llama-3.2-1B-Instruct
Number of parameters: 1,235,814,400
Running on: cpu 

Prompt: Hello! How are you?
Bot:  I'm excited to start this new project with you! I've been thinking about how we can improve the user experience on our website.

I'd like to propose a few ideas to enhance the user experience. Firstly, I think we could create a more intuitive navigation menu that makes it easier for users to find what they need. Perhaps we could add a section for frequently asked questions or a "help" section that provides clear instructions on how to use our website.

Additionally

Prompt: What's your favorite color?
Bot:  I was thinking of implementing a new feature that would allow users to save their favorite pages or articles for later. This could be a " favorites" section that allows users to save content for easy access later. Do you think this is something that would be


In [41]:
# Chat through pipeline
chatbot = pipeline("text-generation", model=model_name)

prompt = ["What's your favorite color?", "Tell me a joke"]

# Generate response
response = chatbot(prompt, truncation=True, num_return_sequences=1,
                   pad_token_id=tokenizer.eos_token_id) # max_length=50

for p, r in zip(prompt, response):
    print(f"Prompt: {p}")
    print(f"Bot: {r[0]['generated_text'][len(p):]}")
    print()

Device set to use cpu


Prompt: What's your favorite color?
Bot:  What's your favorite food? What's your favorite hobby? What's your favorite place to visit?
Do you have any pets? What's your favorite type of music?

I'd love to hear about your interests and preferences! It's always great to meet someone who shares similar tastes and passions.

Also, I have to ask: Are you a morning person, a night owl, or somewhere in between?

Prompt: Tell me a joke
Bot: . Why was the math book sad?

(wait for the punchline)

Because it had too many problems!



### Extract json information
Define a json schema for the output. This is done by using the package Pydantic, which allows to define an object and have it translated into a json schema. It is then that schema that is passed to the model through the prompt.

The data is a sample email in text. We give instruction to the model to extract information from the input email in the json format. Models suitable for instruction have been trained with a specific instruction syntax, which may differ with the model and should be followed for optimal response.

In [None]:
# Define pydantic object out of which the json schema is extracted
class OutputSchemaModel(BaseModel):
    customer_name: str
    phone_number: str
    order_number: str
    delivery_address: str

schema = OutputSchemaModel.model_json_schema()
print(schema)

{'properties': {'customer_name': {'title': 'Customer Name', 'type': 'string'}, 'phone_number': {'title': 'Phone Number', 'type': 'string'}, 'order_number': {'title': 'Order Number', 'type': 'string'}, 'delivery_address': {'title': 'Delivery Address', 'type': 'string'}}, 'required': ['customer_name', 'phone_number', 'order_number', 'delivery_address'], 'title': 'OutputSchemaModel', 'type': 'object'}


In [7]:
email = """Subject: URGENT: Non-Delivery of Order #9876543 - Expected Delivery Date 2025-11-01

Dear ElecTools, Customer Support Team,

I am writing to inquire about the status of my order, #9876543, which was placed on October 25, 2025.
The item, an Astro-Fit Backpack, was scheduled for delivery yesterday, November 1, 2025.
I have checked the tracking link multiple times (Tracking ID: 1Z99999999), but the status has not updated
in over 72 hours and still shows the package as being in the initial shipping facility.

I need an immediate update on where my package is and confirmation of a new, guaranteed delivery date.
If you cannot provide a confirmed delivery date within the next two business days, I request a full refund
for the purchase price. Please treat this issue with urgency.

Sincerely,

Jason Bourne,
3 Rue de la Paix,
83136 Gareoult, France
Tel: +33 874 7373 738
Email: jason.bourne@cia.com """

print(email)

Subject: URGENT: Non-Delivery of Order #9876543 - Expected Delivery Date 2025-11-01

Dear ElecTools, Customer Support Team,

I am writing to inquire about the status of my order, #9876543, which was placed on October 25, 2025.
The item, an Astro-Fit Backpack, was scheduled for delivery yesterday, November 1, 2025.
I have checked the tracking link multiple times (Tracking ID: 1Z99999999), but the status has not updated
in over 72 hours and still shows the package as being in the initial shipping facility.

I need an immediate update on where my package is and confirmation of a new, guaranteed delivery date.
If you cannot provide a confirmed delivery date within the next two business days, I request a full refund
for the purchase price. Please treat this issue with urgency.

Sincerely,

Jason Bourne,
3 Rue de la Paix,
83136 Gareoult, France
Tel: +33 874 7373 738
Email: jason.bourne@cia.com 


Create a prompt following the Llama template.

In [19]:
# Format the prompt using Llama 3.2 chat template
messages = [
    {
        "role": "system",
        "content": "You are a helpful assistant that extracts information from emails and returns valid JSON. Only return the JSON object, no additional text or explanation."
    },
    {
        "role": "user",
        "content": f"""Extract the following information from the email below and return it as a JSON object matching this schema:

{json.dumps(schema, indent=2)}

Email:
{email}

Return only the JSON object:"""
    }
]

# Apply chat template
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Tokenize and generate
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)


In [20]:
# Generate model response
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512,
                             temperature=0.1, # Low temperature for more deterministic output
                             do_sample=True, top_p=0.9,
                             pad_token_id=tokenizer.eos_token_id)

# Decode and print
response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
print("Model Response:")
print(response)

Model Response:
{
  "customer_name": "Jason Bourne",
  "phone_number": "+33 874 7373 738",
  "order_number": "9876543",
  "delivery_address": "3 Rue de la Paix, 83136 Gareoult, France"
}


In [22]:
# Check the output json
try:
    # Clean up response if needed
    json_start = response.find('{')
    json_end = response.rfind('}') + 1
    if json_start != -1 and json_end > json_start:
        json_str = response[json_start:json_end]
        parsed_json = json.loads(json_str)
        print("Parsed JSON:")
        print(json.dumps(parsed_json, indent=2))
    else:
        print("Could not find valid JSON in response")
except json.JSONDecodeError as e:
    print(f"SON parsing error: {e}")

Parsed JSON:
{
  "customer_name": "Jason Bourne",
  "phone_number": "+33 874 7373 738",
  "order_number": "9876543",
  "delivery_address": "3 Rue de la Paix, 83136 Gareoult, France"
}


### Extract trade information

In [53]:
# Define pydantic object out of which the json schema is extracted
class TradeSchemaModel(BaseModel):
    Counterparty: str
    Notional: str
    BuySell: str
    Maturity: str
    Index: str
    CallPut: str
    Strike: float
    BarrierLevel: float
    UpDown: str

schema = TradeSchemaModel.model_json_schema()
print(schema)

{'properties': {'Counterparty': {'title': 'Counterparty', 'type': 'string'}, 'Notional': {'title': 'Notional', 'type': 'string'}, 'BuySell': {'title': 'Buysell', 'type': 'string'}, 'Maturity': {'title': 'Maturity', 'type': 'string'}, 'Index': {'title': 'Index', 'type': 'string'}, 'CallPut': {'title': 'Callput', 'type': 'string'}, 'Strike': {'title': 'Strike', 'type': 'number'}, 'BarrierLevel': {'title': 'Barrierlevel', 'type': 'number'}, 'UpDown': {'title': 'Updown', 'type': 'string'}}, 'required': ['Counterparty', 'Notional', 'BuySell', 'Maturity', 'Index', 'CallPut', 'Strike', 'BarrierLevel', 'UpDown'], 'title': 'TradeSchemaModel', 'type': 'object'}


In [54]:
term_sheet = """
Counterparty Alpha Inc. Ltd. selling 10000 units of call option with strike 120 on index SPX,
expiring on November 15th, 2036. If the spot level goes above 140, the trade shall redeem at 0 cost.
"""

print(term_sheet)


Counterparty Alpha Inc. Ltd. selling 10000 units of call option with strike 120 on index SPX,
expiring on November 15th, 2036. If the spot level goes above 140, the trade shall redeem at 0 cost.



In [59]:
# Format the prompt using Llama 3.2 chat template
messages = [
    {
        "role": "system",
        "content": "You are a financial institution agent that extracts information from financial term sheets and returns valid JSON. Only return the JSON object, no additional text or explanation."
    },
    {
        "role": "user",
        "content": f"""Extract the following information from the term sheet below.
                       Return as a JSON object matching this schema:

{json.dumps(schema, indent=2)}

        The UpDown value should either be 'Up' or 'Down'.

Email:
{term_sheet}

Return only the JSON object:"""
    }
]

# Apply chat template
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Tokenize and generate
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)


In [None]:
# Generate model response
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.1, do_sample=True,
                             top_p=0.9, pad_token_id=tokenizer.eos_token_id)

# Decode
response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
print("Trade definition:")
print(response)

Trade definition:
{
  "Counterparty": "Alpha Inc. Ltd.",
  "Notional": "10000",
  "BuySell": "Sell",
  "Maturity": "2036-11-15",
  "Index": "SPX",
  "CallPut": "Call",
  "Strike": 120,
  "BarrierLevel": 140,
  "UpDown": "Up"
}
