## Hugging Face: Inference to JSON (Local)

Common small models that run efficiently on CPUs:
* microsoft/DialoGPT-small (117M parameters, ~500 MB): basic conversation, very fast on CPU
* gpt2 (124M parameters, ~500 MB): classic, reliable, fast inference
* distilgpt2 (82M parameters, ~350 MB): distilled version of GPT-2, lighter and faster
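
The download sizes in the list follow roughly from the parameter counts: float32 weights take 4 bytes per parameter. The sketch below makes that arithmetic explicit. `estimate_size_mb` is a hypothetical helper, not part of transformers, and it ignores tokenizer files, buffers, and file-format overhead, so real checkpoints are somewhat larger.

```python
# Rough float32 weight size from parameter count (illustrative helper only).
BYTES_PER_PARAM_FP32 = 4


def estimate_size_mb(num_params: int, bytes_per_param: int = BYTES_PER_PARAM_FP32) -> float:
    """Return the approximate weight size in megabytes (1 MB = 1e6 bytes)."""
    return num_params * bytes_per_param / 1e6


# Parameter counts as quoted in the list above
models = {
    "microsoft/DialoGPT-small": 117_000_000,
    "gpt2": 124_000_000,
    "distilgpt2": 82_000_000,
}

for name, num_params in models.items():
    print(f"{name}: ~{estimate_size_mb(num_params):.0f} MB in float32")
```

Passing `bytes_per_param=2` gives the corresponding estimate for float16 or bfloat16 weights.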

Remember to set the environment variable HF_TOKEN to your access token, or log in to Hugging Face with
> huggingface-cli login

and enter the token when prompted.
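
A quick check before loading a model can save a failed download. The sketch below only inspects the HF_TOKEN environment variable; `get_hf_token` is a hypothetical helper, and it cannot see a token stored by a previous `huggingface-cli login`, so a missing variable is a hint rather than an error.

```python
import os


def get_hf_token(env=None):
    """Return the token from HF_TOKEN, or None if the variable is unset or blank."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    return token or None


if get_hf_token() is None:
    # Not necessarily a problem: a prior `huggingface-cli login` also stores a token.
    print("HF_TOKEN is not set; set it or log in before loading gated models.")
else:
    print("HF_TOKEN found.")
```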

To do:
* Sample to display information about the model (such as the number of parameters)
* Develop a script for full/LoRA fine-tuning on JSON instructions

In [2]:
import os
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline # For local models
import torch
import warnings
# import logging

# Stop the endless flow of nagging
warnings.filterwarnings("ignore")#, message=".*attention mask.*")
# os.environ['TRANSFORMERS_NO_ADVISORY_WARNINGS'] = '1'
# logging.getLogger("transformers").setLevel(logging.ERROR)
# logging.set_verbosity_error()

# Global setup
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Device type: " + device.type)

Device type: cpu


### Example for chatting
This section shows how to chat with a local model, the simplest use case. The code comes in two versions: a bare version that calls the tokenizer and model directly and allows more control, and a pipeline version with simpler syntax.

In [37]:
# Single-turn chat through bare interface
model_name = "meta-llama/Llama-3.2-1B-Instruct"  # smaller alternatives are listed below
# model_name = "distilgpt2"
# model_name = "microsoft/DialoGPT-small"
# model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Prompt
prompt = "Who wrote the Bible?"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Generate response
outputs = model.generate(input_ids, max_length=25, num_return_sequences=1,
                         pad_token_id=tokenizer.eos_token_id, temperature=0.7, do_sample=True)

# Decode
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

Who wrote the Bible? The Bible is a book of sacred scripture that is written by many different authors, including both Old


In [24]:
# Multi-turn chat through bare interface
chat_history_ids = None

# First message
prompt = "Hello! How are you?"
print(f"Prompt: {prompt}")
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Generate response
chat_history_ids = model.generate(input_ids, max_length=100, pad_token_id=tokenizer.eos_token_id,
                                  temperature=0.7, do_sample=True)

# Decode and print
response = tokenizer.decode(chat_history_ids[:, input_ids.shape[-1]:][0], skip_special_tokens=True)
print(f"Bot: {response}")

# Continue conversation
print()
prompt = "What's your favorite color?"
print(f"Prompt: {prompt}")
input_ids = tokenizer.encode(prompt, return_tensors="pt")
chat_history_ids = torch.cat([chat_history_ids, input_ids], dim=-1)

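# Remember where the new reply starts so that only newly generated tokens are decoded
history_len = chat_history_ids.shape[-1]
chat_history_ids = model.generate(chat_history_ids, max_length=200, pad_token_id=tokenizer.eos_token_id)

response = tokenizer.decode(chat_history_ids[0, history_len:], skip_special_tokens=True)
print(f"Bot: {response}")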

Prompt: Hello! How are you?
Bot:  (pause) This is lovely! I haven't seen you for a while but I do like you. (pause) You're always nice to me, you know. (pause) I'm sure you've found a lot of interesting things with your cock. I mean, you've been looking for something like that before, but this is the first time. You've been feeling pretty good. So, I guess, here it is. (pause) I'm just

Prompt: What's your favorite color?
Bot:  I like it or not. I mean, I'm not sure if I like it or not. I mean, I'm not sure if I like it or not. I mean, I'm not sure if I like it or not. I mean


In [39]:
# Chat through pipeline
chatbot = pipeline("text-generation", model=model_name)

prompt = ["What's your favorite color?", "Tell me a joke"]

# Generate response
response = chatbot(prompt, truncation=True, num_return_sequences=1,
                   pad_token_id=tokenizer.eos_token_id) # max_length=50

for p, r in zip(prompt, response):
    print(f"Prompt: {p}")
    print(f"Bot: {r[0]['generated_text']}")
    print()

Device set to use cpu


Prompt: What's your favorite color?
Bot: What's your favorite color? Do you have a favorite color that you like to wear or use in your daily life?
I have to say, I'm a big fan of blue. It's such a calming color, and it always makes me feel happy and relaxed. I've been known to wear blue in my daily life, whether it's my shirt or my scarf. I also love how blue looks against a white background - it's just so striking and visually appealing. Have you ever had a favorite color that you've always stuck with, or do you like to mix and match different colors in your life?

Prompt: Tell me a joke
Bot: Tell me a joke. Why did the chicken go to the doctor?

I don't know, why?

Because it had fowl breath!

I hope that made you cluck with laughter!



### Extract json information from email
Define a JSON schema for the output, read a sample email from a text file, and instruct the model to extract the information from the email in that JSON format.
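
One way to turn a schema and an email into an instruction is to embed both in the prompt. The sketch below is an assumption about how such a prompt could look, not the notebook's method: `build_extraction_prompt` and `example_schema` are hypothetical names, and the schema is a shortened stand-in with the same shape as the output of Pydantic's `model_json_schema()`.

```python
import json


def build_extraction_prompt(schema: dict, email_text: str) -> str:
    """Combine the JSON schema and the email into a single extraction instruction."""
    field_names = ", ".join(schema.get("properties", {}))
    return (
        "Extract the following fields from the email and reply with a single JSON object only.\n"
        f"Fields: {field_names}\n"
        f"JSON schema:\n{json.dumps(schema, indent=2)}\n\n"
        f"Email:\n{email_text}\n\n"
        "JSON:"
    )


# Shortened stand-in schema (hypothetical), shaped like model_json_schema() output
example_schema = {
    "type": "object",
    "properties": {
        "customer_name": {"type": "string"},
        "order_number": {"type": "string"},
    },
    "required": ["customer_name", "order_number"],
}

print(build_extraction_prompt(example_schema, "Hi, this is Emma Johnson about Order #48291."))
```

Ending the prompt with "JSON:" nudges a text-completion model to start its answer with the object. Small models may still ignore the instruction, so the reply should be validated.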

In [53]:
# Define JSON schema
from pydantic import BaseModel  # needed for the schema class below


class OutputSchemaModel(BaseModel):
    customer_name: str
    phone_number: str
    order_number: str
    delivery_address: str

output_schema = OutputSchemaModel.model_json_schema()
print(output_schema)

{'properties': {'customer_name': {'title': 'Customer Name', 'type': 'string'}, 'phone_number': {'title': 'Phone Number', 'type': 'string'}, 'order_number': {'title': 'Order Number', 'type': 'string'}, 'delivery_address': {'title': 'Delivery Address', 'type': 'string'}}, 'required': ['customer_name', 'phone_number', 'order_number', 'delivery_address'], 'title': 'OutputSchemaModel', 'type': 'object'}


In [54]:
data_folder = "data"  # assumption: folder holding the sample email; adjust to your setup
file = os.path.join(data_folder, "customer_support.txt")
with open(file, "r", encoding='utf-8') as f:
    email = f.read()

print(email)

Subject: Issue with Recent Order #48291

From: emma.johnson@example.com

To: support@shopfast.com

Date: October 26, 2025

Hi ShopFast team,

I placed an order (Order #48291) on October 20, but the package hasn’t arrived yet at 456 Kennedy Ave, 121489 Atlanta, even though the tracking page says “Delivered” since October 23. Could you please check what happened?

Also, I was charged twice for this order on my credit card. Please confirm if I’ll get a refund for the duplicate charge.

Thanks,
Emma Johnson
+44 7911 123456


The first time AutoModelForCausalLM.from_pretrained() is called, it downloads the model to the local drive, typically under C:\Users\YourUserName\.cache\huggingface\hub\ on Windows or ~/.cache/huggingface/hub/ on Linux and macOS. Later calls load the model from this cache.
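
To see where downloads are likely to land on a given machine, the default location can be approximated from the documented HF_HUB_CACHE and HF_HOME environment variables. This is a simplified sketch: `default_hub_cache` is a hypothetical helper, and it ignores other settings the library may honor (for example XDG_CACHE_HOME or a `cache_dir` argument).

```python
import os
from pathlib import Path


def default_hub_cache(env=None) -> Path:
    """Approximate the Hugging Face Hub cache directory from environment variables."""
    env = os.environ if env is None else env
    # HF_HUB_CACHE, when set, points directly at the hub cache
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"])
    # Otherwise the cache sits in the "hub" folder under HF_HOME
    hf_home = env.get("HF_HOME") or str(Path.home() / ".cache" / "huggingface")
    return Path(hf_home) / "hub"


print(default_hub_cache())
```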

In [2]:
# Re-import so that this cell also runs right after a kernel restart
from transformers import AutoModelForCausalLM, AutoTokenizer

# model_name = "google/flan-t5-base"  # seq2seq model: needs AutoModelForSeq2SeqLM instead
# model_name = "Qwen/Qwen2.5-1.5B-Instruct"
model_name = "meta-llama/Llama-3.2-1B-Instruct"

# Downloads the model to your computer (one-time download)
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
print(f"Device: {model.device}")  # Should show cuda or cpu
print(f"Parameters in memory: {model.num_parameters():,}")
print(f"Memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")

In [None]:
# Chat using direct call
inputs = tokenizer("Hello", return_tensors="pt")
outputs = model.generate(**inputs)
result = tokenizer.decode(outputs[0])
print(result)

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


<|begin_of_text|>Hello and welcome to my home, I'm so glad you could join me for a cup of tea and


In [15]:
# Chat using pipeline
# Use a distinct name so the imported pipeline() function is not shadowed
generator = transformers.pipeline("text-generation", model=model_name)
response = generator("Hey how are you doing today?")
print(response)

Device set to use cpu
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


[{'generated_text': "Hey how are you doing today? I'm doing well, thanks! I was just thinking about my old college days and how much I loved learning about different types of plants and flowers. I'm sure you're aware that there are a lot of species that are just as beautiful and unique as any other, but there are also some that are more...well, let's just say they're not so great.\n\nI think one of the things that really stood out to me about my college days was the diversity of plant species. There were so many different types of flowers, trees, and shrubs that I never would have seen otherwise. It was amazing to see how different the world was to me, and how much I was learning about the natural world.\n\nI'm sure you have a similar experience, don't you? I mean, I'm sure you've had your own experiences of learning about different species and plants, and seeing how unique and beautiful they are. Am I right?\n\nAnd I have to ask, what do you think is the most interesting or unique pla

In [10]:
# JSON inference
def extract_json(text):
    prompt = f"Extract as JSON: {text}"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=256)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

In [11]:
# Works offline!
result = extract_json("John Doe is 30 years old, email john@example.com")
print(result)

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Extract as JSON: John Doe is 30 years old, email john@example.com, and has a degree in Computer Science. John is married to Jane Doe, who is 28 years old, has a degree in Business, and has a degree in Psychology. John and Jane are both married and have two children, Emily (10) and Michael (7). John has a job at XYZ Corporation, which is a large and well-established company. The company has over 1,000 employees and is headquartered in New York City. John's salary is $120,000 per year. He is also a member of the New York City Police Department, which is responsible for maintaining law and order in the city. John is a member of the local community center and participates in the annual charity event for children's health and education. John has a car and drives a Honda Civic. He enjoys playing basketball and hiking in his free time. John is a big fan of the New York Yankees and attends their games whenever he can. He is a big fan of the New England Patriots and attends their games whenever
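
As the output above shows, a small model may continue the text instead of returning JSON, so the reply is worth checking before it is used. The sketch below scans generated text for the first decodable JSON object and reports which required keys are missing. `first_json_object` and `missing_keys` are hypothetical helpers, and `reply` is a made-up string for illustration, not actual model output.

```python
import json


def first_json_object(text: str):
    """Return the first decodable JSON object found in text, or None if there is none."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            obj, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        # Try the next opening brace
        start = text.find("{", start + 1)
    return None


def missing_keys(obj: dict, required: list) -> list:
    """List the required keys that are absent from obj."""
    return [key for key in required if key not in obj]


# Made-up reply for illustration only
reply = 'Sure! {"customer_name": "Emma Johnson", "order_number": "48291"} Hope that helps.'
data = first_json_object(reply)
print(data)
print(missing_keys(data, ["customer_name", "phone_number", "order_number"]))
```

When `first_json_object` returns None or keys are missing, the caller can retry with a stricter prompt or fall back to a larger model.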