<a href="https://colab.research.google.com/github/rickqiu/jsonformer/blob/main/Demo%20of%20NL2JSON%20Using%20LLM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Demo of NL2JSON Using LLM

**Problem**

It is hard to get an LLM to reliably output valid JSON that matches a given schema.

**Challenges**

- Foundation models handle general tasks well but often perform poorly on specific tasks in your domain.

- Fine-tuned models don't always generate the JSON you need for your natural language query.

- LLMs hallucinate, i.e., they generate text that sounds natural but is factually incorrect.


**Solution**

Use an instruction-tuned LLM with Jsonformer to generate syntactically valid JSON completions for natural language queries.
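
The idea can be sketched without a real model: the code walks the schema and emits the JSON structure itself, and only the leaf values come from the generator. This is a toy illustration, not Jsonformer's actual implementation. `fake_model` and `fill` are hypothetical names, and `fake_model` returns canned values in place of an LLM call.

```python
import json

def fake_model(prompt, field_type):
    """Hypothetical stand-in for an LLM call: returns a canned value per type."""
    canned = {"string": "example", "number": 1, "boolean": True}
    return canned[field_type]

def fill(schema, prompt, generate=fake_model):
    """Walk the schema; build the structure here and ask `generate` only for leaf values."""
    kind = schema["type"]
    if kind == "object":
        return {key: fill(sub, prompt, generate)
                for key, sub in schema["properties"].items()}
    if kind == "array":
        # Simplification: this toy always emits exactly one item.
        return [fill(schema["items"], prompt, generate)]
    return generate(prompt, kind)

schema = {
    "type": "object",
    "properties": {
        "make": {"type": "string"},
        "year": {"type": "number"},
        "colors": {"type": "array", "items": {"type": "string"}},
    },
}

result = fill(schema, "Generate an example car")
text = json.dumps(result)  # always parses, because the structure never came from the model
print(text)
```

Because the braces, keys, and commas are never left to the generator, the result is valid JSON by construction, even when the generated values themselves are wrong.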


**Supported Schema Types**

Jsonformer supports the following JSON Schema types:

- number

- boolean

- string

- array

- object
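
To make the five types concrete, here is a small sketch of a schema that uses each of them, together with a hand-written check that a Python value matches it. `conforms` is a hypothetical helper written for this notebook, not part of Jsonformer, and it covers only the five types listed above.

```python
def conforms(value, schema):
    """Return True if `value` matches `schema` (only the five types listed above)."""
    kind = schema["type"]
    if kind == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if kind == "boolean":
        return isinstance(value, bool)
    if kind == "string":
        return isinstance(value, str)
    if kind == "array":
        return isinstance(value, list) and all(
            conforms(item, schema["items"]) for item in value
        )
    if kind == "object":
        return isinstance(value, dict) and all(
            key in value and conforms(value[key], sub)
            for key, sub in schema["properties"].items()
        )
    raise ValueError(f"unsupported schema type: {kind}")

# One property per scalar type, plus an array, wrapped in an object.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "seats": {"type": "number"},
        "electric": {"type": "boolean"},
        "colors": {"type": "array", "items": {"type": "string"}},
    },
}

good = {"name": "demo", "seats": 5, "electric": False, "colors": ["red", "blue"]}
bad = {"name": "demo", "seats": "five", "electric": False, "colors": ["red"]}

print(conforms(good, schema))
print(conforms(bad, schema))
```

A check like this only verifies types and structure. It says nothing about whether the values are factually correct.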

In [2]:
# Check GPU spec
!nvidia-smi

Wed Sep 20 05:48:28 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla V100-SXM2...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   34C    P0    25W / 300W |      0MiB / 16384MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               

In [4]:
# Install the required libraries if they are not already installed.
!pip install transformers accelerate jsonformer

In [5]:
# Import required modules
from transformers import AutoModelForCausalLM, AutoTokenizer
from jsonformer.format import highlight_values
from jsonformer.main import Jsonformer
import json

In [6]:
# Load the Databricks dolly-v2-3b instruction-tuned model and its tokenizer
# For more details, see https://huggingface.co/databricks
print("Loading model and tokenizer...")
model_name = "databricks/dolly-v2-3b"
model = AutoModelForCausalLM.from_pretrained(model_name, use_cache=True, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True, use_cache=True)
print("Loaded model and tokenizer")

Loading model and tokenizer...



Loaded model and tokenizer


In [7]:
# Declare the car schema
car = {
  "type": "object",
  "properties": {
    "car": {
      "type": "object",
      "properties": {
        "make": {"type": "string"},
        "model": {"type": "string"},
        "year": {"type": "number"},
        "colors": {
          "type": "array",
          "items": {"type": "string"}
        },
        "features": {
          "type": "object",
          "properties": {
            "audio": {
              "type": "object",
              "properties": {
                "brand": {"type": "string"},
                "speakers": {"type": "number"},
                "hasBluetooth": {"type": "boolean"}
              }
            },
            "safety": {
              "type": "object",
              "properties": {
                "airbags": {"type": "number"},
                "parkingSensors": {"type": "boolean"},
                "laneAssist": {"type": "boolean"}
              }
            },
            "performance": {
              "type": "object",
              "properties": {
                "engine": {"type": "string"},
                "horsepower": {"type": "number"},
                "topSpeed": {"type": "number"}
              }
            }
          }
        }
      }
    },
    "owner": {
      "type": "object",
      "properties": {
        "firstName": {"type": "string"},
        "lastName": {"type": "string"},
        "age": {"type": "number"},
      }
    }
  }
}
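
With a schema this deeply nested, it can help to flatten it and see exactly which values the model will be asked to produce. The sketch below uses a hypothetical helper, `list_leaf_fields`, and a small stand-in schema so that it runs on its own; passing the `car` schema from the cell above should work the same way.

```python
def list_leaf_fields(schema, prefix=""):
    """Return (path, type) pairs for every non-container field in the schema."""
    kind = schema["type"]
    if kind == "object":
        fields = []
        for key, sub in schema["properties"].items():
            path = f"{prefix}.{key}" if prefix else key
            fields.extend(list_leaf_fields(sub, path))
        return fields
    if kind == "array":
        # Mark array items with a trailing "[]" in the path.
        return list_leaf_fields(schema["items"], prefix + "[]")
    return [(prefix, kind)]

# Small stand-in schema for illustration only.
demo_schema = {
    "type": "object",
    "properties": {
        "owner": {
            "type": "object",
            "properties": {
                "firstName": {"type": "string"},
                "age": {"type": "number"},
            },
        },
        "colors": {"type": "array", "items": {"type": "string"}},
    },
}

fields = list_leaf_fields(demo_schema)
for path, kind in fields:
    print(f"{path}: {kind}")
```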

In [8]:
# Given an instruction in the prompt, generate a JSON completion.
builder = Jsonformer(
    model=model,
    tokenizer=tokenizer,
    json_schema=car,
    prompt="Generate an example car",
)

print("Generating...")
output = builder()

# Serialize the output as indented JSON and write it to a file
json_object = json.dumps(output, indent=4)
file_path = "example.txt"
with open(file_path, "w") as file:
    file.write(json_object)

Generating...
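
A quick way to confirm that the saved file is valid JSON is to read it back with `json.load`, which raises `json.JSONDecodeError` on malformed input. The sketch below uses placeholder data and a temporary directory so that it runs without the model. The `sample` dictionary is made up for illustration and is not model output.

```python
import json
import os
import tempfile

# Placeholder data standing in for the generated output.
sample = {"car": {"make": "example", "year": 2020}}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.txt")

    # Write the data the same way as the cell above.
    with open(path, "w") as file:
        file.write(json.dumps(sample, indent=4))

    # Read it back; json.load raises json.JSONDecodeError if the file is not valid JSON.
    with open(path) as file:
        loaded = json.load(file)

print(loaded == sample)
```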


In [None]:
# Display the generated response
highlight_values(output)