<a href="https://colab.research.google.com/github/rickqiu/jsonformer/blob/main/Demo%20of%20NL2JSON%20Using%20LLM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Demo of NL2JSON Using LLM

**Problem**

- Getting an LLM to turn a natural language query into a structured JSON completion is hard.

- A pre-trained LLM handles general tasks well but often performs poorly on tasks in your domain.

- Instruction tuning alone does not guarantee that a model outputs valid JSON for a natural language query.

- LLMs can hallucinate.


**Solution**

- Pair an instruction-tuned LLM with jsonformer to generate syntactically valid JSON completions for natural language queries.
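
To make the idea concrete, here is a toy sketch of schema-guided generation: the code emits the JSON structure itself and asks a generator only for leaf values, which is the general approach jsonformer describes. This is not jsonformer's implementation. `fill_schema` and `stub_leaf` are hypothetical names, and `stub_leaf` returns canned values in place of model output.

```python
import json
from typing import Any, Callable

# Hypothetical stand-in for the LLM: given a field path and a type,
# return a value for that single leaf field.
LeafGenerator = Callable[[str, str], Any]


def fill_schema(schema: dict, generate_leaf: LeafGenerator, path: str = "$") -> Any:
    """Walk the schema and ask the generator only for leaf values."""
    kind = schema["type"]
    if kind == "object":
        # Keys come from the schema, never from the generator.
        return {
            key: fill_schema(sub, generate_leaf, f"{path}.{key}")
            for key, sub in schema["properties"].items()
        }
    if kind == "array":
        # Simplification: this sketch always emits exactly one item.
        return [fill_schema(schema["items"], generate_leaf, f"{path}[0]")]
    if kind in ("string", "number", "boolean"):
        return generate_leaf(path, kind)
    raise ValueError(f"Unsupported schema type at {path}: {kind}")


def stub_leaf(path: str, kind: str) -> Any:
    """Canned values used here instead of real model output."""
    return {"string": "example", "number": 1.0, "boolean": True}[kind]


toy_schema = {
    "type": "object",
    "properties": {
        "make": {"type": "string"},
        "year": {"type": "number"},
        "colors": {"type": "array", "items": {"type": "string"}},
    },
}

result = fill_schema(toy_schema, stub_leaf)
print(json.dumps(result, indent=2))
```

Because the braces, keys, and commas never pass through the generator, the result is always serializable JSON, whatever the leaf values turn out to be.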


**Supported Schema Types**

jsonformer supports the following schema types:

- number

- boolean

- string

- array

- object
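
A schema that uses a type outside this list is easier to catch before generation starts. The following is a minimal sketch of such a check. `find_unsupported_types` is a hypothetical helper, not part of jsonformer, and it assumes that anything not in the list above is unsupported.

```python
# The five types listed above.
SUPPORTED_TYPES = {"number", "boolean", "string", "array", "object"}


def find_unsupported_types(schema: dict, path: str = "$") -> list[str]:
    """Return a list of 'path: type' entries for types outside SUPPORTED_TYPES."""
    problems = []
    kind = schema.get("type")
    if kind not in SUPPORTED_TYPES:
        problems.append(f"{path}: {kind!r}")
    if kind == "object":
        # Recurse into each declared property.
        for key, sub in schema.get("properties", {}).items():
            problems.extend(find_unsupported_types(sub, f"{path}.{key}"))
    elif kind == "array" and "items" in schema:
        # Recurse into the item schema.
        problems.extend(find_unsupported_types(schema["items"], f"{path}[]"))
    return problems


sample = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},  # not in the supported list
        "tags": {"type": "array", "items": {"type": "string"}},
    },
}

print(find_unsupported_types(sample))
```

An empty list means every node in the schema uses one of the five listed types.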

In [None]:
# Install the required libraries if they are not already installed.
!pip install transformers accelerate jsonformer

Collecting transformers
  Downloading transformers-4.33.2-py3-none-any.whl (7.6 MB)
Collecting accelerate
  Downloading accelerate-0.23.0-py3-none-any.whl (258 kB)
Collecting jsonformer
  Downloading jsonformer-0.12.0-py3-none-any.whl (6.6 kB)
Collecting huggingface-hub<1.0,>=0.15.1 (from transformers)
  Downloading huggingface_hub-0.17.2-py3-none-any.whl (294 kB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1 (from transformers)
  Downloading tokenizers-0.13.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)

In [None]:
# Import required modules
from transformers import AutoModelForCausalLM, AutoTokenizer
from jsonformer.format import highlight_values
from jsonformer.main import Jsonformer
import json

In [None]:
# Load Databricks instruct dolly-v2-3b and its tokenizer
# For more details, see https://huggingface.co/databricks
print("Loading model and tokenizer...")
model_name = "databricks/dolly-v2-3b"
model = AutoModelForCausalLM.from_pretrained(model_name, use_cache=True, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)  # use_cache is a model option, not a tokenizer option
print("Loaded model and tokenizer")

Loading model and tokenizer...


Loaded model and tokenizer


In [None]:
# Declare the car schema
car = {
  "type": "object",
  "properties": {
    "car": {
      "type": "object",
      "properties": {
        "make": {"type": "string"},
        "model": {"type": "string"},
        "year": {"type": "number"},
        "colors": {
          "type": "array",
          "items": {"type": "string"}
        },
        "features": {
          "type": "object",
          "properties": {
            "audio": {
              "type": "object",
              "properties": {
                "brand": {"type": "string"},
                "speakers": {"type": "number"},
                "hasBluetooth": {"type": "boolean"}
              }
            },
            "safety": {
              "type": "object",
              "properties": {
                "airbags": {"type": "number"},
                "parkingSensors": {"type": "boolean"},
                "laneAssist": {"type": "boolean"}
              }
            },
            "performance": {
              "type": "object",
              "properties": {
                "engine": {"type": "string"},
                "horsepower": {"type": "number"},
                "topSpeed": {"type": "number"}
              }
            }
          }
        }
      }
    },
    "owner": {
      "type": "object",
      "properties": {
        "firstName": {"type": "string"},
        "lastName": {"type": "string"},
        "age": {"type": "number"},
      }
    }
  }
}

In [None]:
# Given an instruction in the prompt, generate a JSON completion.
builder = Jsonformer(
    model=model,
    tokenizer=tokenizer,
    json_schema=car,
    prompt="Generate an example car",
)

print("Generating...")
output = builder()

highlight_values(output)

Generating...
{
  car: {
    make: "Chevrolet",
    model: "Corvette",
    year: 2016.0,
    colors: [
      "Red"
    ],
    features: {
      audio: {
        brand: "Sony",
        speakers: 2.0,
        hasBluetooth: True
      },
      safety: {
        airbags: 2.0,
        parkingSensors: True,
        laneAssist: True
      },
      performance: {
        engine: "4.0",
        horsepower: 220.0,
        topSpeed: 220.0
      }
    }
  },
  owner: {
    firstName: "John",
    lastName: "Doe",
    age: 40.0
  }
}
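
Every field declared as `number` comes back as a float in the output above (`year: 2016.0`, `speakers: 2.0`). If downstream code expects whole numbers, one option is to convert integral floats after generation. The sketch below uses a hypothetical helper, `integral_floats_to_ints`, and a hand-written sample that mirrors part of the output rather than a real model run.

```python
from typing import Any


def integral_floats_to_ints(value: Any) -> Any:
    """Recursively convert floats such as 2016.0 to ints; leave everything else alone."""
    if isinstance(value, dict):
        return {k: integral_floats_to_ints(v) for k, v in value.items()}
    if isinstance(value, list):
        return [integral_floats_to_ints(v) for v in value]
    if isinstance(value, float) and value.is_integer():
        return int(value)
    # Strings, booleans, and non-integral floats pass through unchanged.
    return value


# Hand-written sample shaped like the generated output.
sample = {
    "car": {
        "year": 2016.0,
        "features": {"audio": {"speakers": 2.0, "hasBluetooth": True}},
    },
    "owner": {"age": 40.0},
}

cleaned = integral_floats_to_ints(sample)
print(cleaned)
```

Whether this conversion is appropriate depends on the field: it suits `year` or `airbags`, but a field that is genuinely fractional should stay a float.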


In [None]:
# Request a specific color ("grey") in the prompt.
builder = Jsonformer(
    model=model,
    tokenizer=tokenizer,
    json_schema=car,
    prompt="Generate an example car in grey color",
)

print("Generating...")
output = builder()

# Write the output to a file as pretty-printed JSON
json_object = json.dumps(output, indent=4)
file_path = "example.txt"
with open(file_path, "w") as file:
    file.write(json_object)

highlight_values(output)

Generating...
{
  car: {
    make: "Honda",
    model: "Civic",
    year: 2016.0,
    colors: [
      "grey"
    ],
    features: {
      audio: {
        brand: "Apple",
        speakers: 2.0,
        hasBluetooth: True
      },
      safety: {
        airbags: 2.0,
        parkingSensors: True,
        laneAssist: True
      },
      performance: {
        engine: "2.0",
        horsepower: 220.0,
        topSpeed: 220.0
      }
    }
  },
  owner: {
    firstName: "John",
    lastName: "Doe",
    age: 40.0
  }
}
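
A saved file can be read back and checked against the schema it was generated from. The sketch below does this for a hand-written sample shaped like the `owner` object, written to a temporary directory so it does not touch `example.txt`. `conforms` is a hypothetical helper that covers only the five schema types used in this notebook, and it assumes every declared property must be present.

```python
import json
import os
import tempfile
from typing import Any


def conforms(value: Any, schema: dict) -> bool:
    """Check a parsed JSON value against a schema limited to the five types used here."""
    kind = schema["type"]
    if kind == "object":
        # Assumption: every declared property must be present.
        return isinstance(value, dict) and all(
            key in value and conforms(value[key], sub)
            for key, sub in schema.get("properties", {}).items()
        )
    if kind == "array":
        return isinstance(value, list) and all(
            conforms(item, schema["items"]) for item in value
        )
    if kind == "string":
        return isinstance(value, str)
    if kind == "boolean":
        return isinstance(value, bool)
    if kind == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    return False


owner_schema = {
    "type": "object",
    "properties": {
        "firstName": {"type": "string"},
        "lastName": {"type": "string"},
        "age": {"type": "number"},
    },
}

# Hand-written sample, not a real model run.
sample_output = {"firstName": "John", "lastName": "Doe", "age": 40.0}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")
    with open(path, "w") as f:
        json.dump(sample_output, f, indent=4)
    with open(path) as f:
        loaded = json.load(f)

print(conforms(loaded, owner_schema))
```

This check only confirms the shape and types of the data. It says nothing about whether the values themselves are plausible, so hallucinated content still needs separate review.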
