# Natural Language Based APIs

Kor can extract structured information from text according to a schema.

If we have the schema of an API (for example an OpenAPI endpoint or an HTML form), we can drive that API with natural language.

In [1]:
%load_ext autoreload
%autoreload 2

import sys

sys.path.insert(0, "../../")

In [2]:
from kor.extraction import Extractor
from kor.nodes import Object, Text, Number
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI

In [3]:
llm = ChatOpenAI(model_name="gpt-3.5-turbo")
model = Extractor(llm)

## Music Player

In [4]:
form = Object(
    id="player",
    description=(
        "User is controlling a music player to select songs, pause or start them or play"
        " music by a particular artist."
    ),
    attributes=[
        Text(id="song", description="User wants to play this song", examples=[]),
        Text(id="album", description="User wants to play this album", examples=[]),
        Text(
            id="artist",
            description="Music by the given artist",
            examples=[("Songs by paul simon", "paul simon")],
        ),
        Text(
            id="action",
            description="Action to take one of: `play`, `stop`, `next`, `previous`.",
            examples=[
                ("Please stop the music", "stop"),
                ("play something", "play"),
                ("next song", "next"),
            ],
        ),
    ],
)

In [5]:
%%time
model("stop the music now", form)

NotImplementedError: Not implemented yet

In [26]:
%%time
model("i want to hear a song", form)

CPU times: user 3.49 ms, sys: 709 µs, total: 4.2 ms
Wall time: 854 ms


{'player': [{'action': ['play']}]}

In [27]:
%%time
model("can you play the album lion king from the movie", form)

CPU times: user 3.82 ms, sys: 0 ns, total: 3.82 ms
Wall time: 1.06 s


{'player': [{'album': ['lion king']}]}

In [28]:
%%time
model("can you play all the songs from paul simon and led zepplin", form)

CPU times: user 4.09 ms, sys: 0 ns, total: 4.09 ms
Wall time: 1.72 s


{'player': [{'artist': ['paul simon', 'led zepplin']}]}

In [29]:
%%time
model("the previous song", form)

CPU times: user 3.68 ms, sys: 657 µs, total: 4.34 ms
Wall time: 1.01 s


{'player': [{'action': ['previous']}]}
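
The extracted dictionaries can then be turned into calls against a player API. The following is only a minimal sketch: `to_player_commands` is a hypothetical helper (not part of Kor), and defaulting to `play` when no action was extracted is an assumption made for illustration.

```python
from typing import Any, Dict, List


def first(values: List[str], default: str = "") -> str:
    """Return the first extracted value, or a default if nothing was extracted."""
    return values[0] if values else default


def to_player_commands(result: Dict[str, Any]) -> List[Dict[str, str]]:
    """Flatten output shaped like the results above into simple command dicts."""
    commands: List[Dict[str, str]] = []
    for obj in result.get("player", []):
        # Assumption: a request that names music but no action means "play".
        action = first(obj.get("action", []), default="play")
        targets = [
            (field, value)
            for field in ("song", "album", "artist")
            for value in obj.get(field, [])
        ]
        if not targets:
            commands.append({"action": action})
        for field, value in targets:
            # One command per extracted song, album, or artist.
            commands.append({"action": action, field: value})
    return commands


commands = to_player_commands(
    {"player": [{"artist": ["paul simon", "led zepplin"]}]}
)
```

Each resulting command dict could then be passed to whatever function actually controls playback.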

## Ticket ordering

Let's hook into an imaginary search API for ordering tickets.

In [30]:
form = Object(
    id="action",
    description="User is looking for sports tickets",
    attributes=[
        Text(
            id="sport",
            description="which sports do you want to buy tickets for?",
            examples=[
                (
                    "I want to buy tickets to basketball and football games",
                    ["basketball", "football"],
                )
            ],
        ),
        Text(
            id="location",
            description="where would you like to watch the game?",
            examples=[
                ("in boston", "boston"),
                ("in france or italy", ["france", "italy"]),
            ],
        ),
        Object(
            id="price_range",
            description="how much do you want to spend?",
            attributes=[],
            examples=[
                ("no more than $100", {"price_max": "100", "currency": "$"}),
                (
                    "between 50 and 100 dollars",
                    {"price_max": "100", "price_min": "50", "currency": "$"},
                ),
            ],
        ),
    ],
)

In [31]:
%%time
model("I want to buy tickets for a baseball game in LA area under $100", form)

CPU times: user 4.13 ms, sys: 0 ns, total: 4.13 ms
Wall time: 2.06 s


{'action': [{'sport': ['baseball'],
   'location': ['LA'],
   'price_range': [{'currency': ['$'], 'price_max': ['100']}]}]}

In [32]:
%%time
model(
    (
        "I want to see a celtics game in boston somewhere between 20 and 40 dollars per"
        " ticket"
    ),
    form,
)

CPU times: user 4.61 ms, sys: 88 µs, total: 4.7 ms
Wall time: 2.31 s


{'action': [{'sport': ['basketball'],
   'location': ['boston'],
   'price_range': [{'currency': ['$'],
     'price_max': ['40'],
     'price_min': ['20']}]}]}
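
Since the search API here is imaginary, the last step is left open. As a rough sketch, the extracted result could be flattened into a URL query string. The parameter names simply mirror the schema ids above, and `to_search_query` is a hypothetical helper, not part of Kor.

```python
from typing import Any, Dict, List, Tuple
from urllib.parse import urlencode


def to_search_query(result: Dict[str, Any]) -> str:
    """Build a query string from output shaped like the results above."""
    params: List[Tuple[str, str]] = []
    for obj in result.get("action", []):
        for sport in obj.get("sport", []):
            params.append(("sport", sport))
        for location in obj.get("location", []):
            params.append(("location", location))
        for price in obj.get("price_range", []):
            # The currency is ignored here to keep the sketch short.
            for key in ("price_min", "price_max"):
                for value in price.get(key, []):
                    params.append((key, value))
    return urlencode(params)


query = to_search_query(
    {
        "action": [
            {
                "sport": ["basketball"],
                "location": ["boston"],
                "price_range": [
                    {"currency": ["$"], "price_max": ["40"], "price_min": ["20"]}
                ],
            }
        ]
    }
)
print(query)
```

Repeated values (for example two sports) become repeated query parameters, which is one common convention; a real API may expect something different.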

## Company Search

**ATTENTION** This is a demo that shows how to build up complexity. This particular package is actually *NOT* good at dealing with complex database queries (e.g., nested filters), yet it can still get one pretty far.

There's a better way to run these kinds of queries, and I may add it to this package in the future.

In [33]:
company_name = Text(
    id="company_name",
    description="what is the name of the company you want to find",
    examples=[
        ("Apple inc", "Apple inc"),
        ("largest 10 banks in the world", ""),
        ("microsoft and apple", "microsoft,apple"),
    ],
)

industry_name = Text(
    id="industry_name",
    description="what is the name of the company's industry",
    examples=[
        ("companies in the steel manufacturing industry", "steel manufacturing"),
        ("large banks", "banking"),
        ("military companies", "defense"),
        ("chinese companies", ""),
        ("companies that sell cigars", "cigars"),
    ],
)

geography_name = Text(
    id="geography_name",
    description="where is the company based?",
    examples=[
        ("chinese companies", "china"),
        ("companies based in france", "france"),
        ("LaMaple was based in france, italy", ["france", "italy"]),
        ("italy", ""),
    ],
)

foundation_date = Text(
    id="foundation_date",
    description="Foundation date of the company",
    examples=[("companies founded in 2023", "2023")],
)

attribute_filter = Text(
    id="attribute_filter",
    description=(
        "Filter by a value of an attribute using a binary expression. Specify the"
        " attribute's name, an operator (>, <, =, !=, >=, <=, in, not in) and a value."
    ),
    examples=[
        (
            "Companies with revenue > 100",
            {
                "attribute": "revenue",
                "op": ">",
                "value": "100",
            },
        ),
        (
            "number of employees between 50 and 1000",
            {"attribute": "employees", "op": "in", "value": ["50", "1000"]},
        ),
        (
            "blue or green color",
            {
                "attribute": "color",
                "op": "in",
                "value": ["blue", "green"],
            },
        ),
        (
            "companies that do not sell in california",
            {
                "attribute": "geography_sales",
                "op": "not in",
                "value": "california",
            },
        ),
    ],
)

sales_geography = Text(
    id="geography_sales",
    description="where is the company doing sales? Please use a single country name.",
    examples=[
        ("companies with sales in france", "france"),
        ("companies that sell their products in germany", "germany"),
        ("france, italy", ""),
    ],
)

attribute_selection_block = Text(
    id="attribute_selection",
    description="Asking to see the value of one or more attributes",
    examples=[
        ("What is the revenue of tech companies?", "revenue"),
        ("market cap of apple?", "market cap"),
        ("number of employees of largest company", "number of employees"),
        ("what are the revenue and market cap of apple", ["revenue", "market cap"]),
        (
            "share price and number of shares of indian companies",
            ["share price", "number of shares"],
        ),
    ],
)

sort_by_attribute_block = Object(
    id="sort_block",
    description=(
        "Use to request to sort the results by a particular attribute. "
        "Can specify the direction"
    ),
    attributes=[
        Text(id="direction", description="The direction of the sort"),
        Text(id="attribute", description="The sort attribute"),
    ],
    examples=[
        (
            "Largest by market-cap tech companies",
            {"direction": "descending", "attribute": "market-cap"},
        ),
        (
            "sort by companies with smallest revenue ",
            {"direction": "ascending", "attribute": "revenue"},
        ),
    ],
)

form = Object(
    id="search_for_companies",
    description="Search for companies matching the following criteria.",
    attributes=[
        company_name,
        geography_name,
        foundation_date,
        industry_name,
        sales_geography,
        attribute_filter,
        attribute_selection_block,
        sort_by_attribute_block,
    ],
)

**ATTENTION** Some of the queries below fail. One common reason is that more examples could be useful to show the model how to group objects together. Pay attention to failures!

First, confirm that we're not getting false positives on text that has nothing to do with companies:

In [34]:
%%time
model(
    (
        "Today Alice MacDonald is turning sixty days old. She had blue eyes. "
        "Bob is turning 10 years old. His eyes were bright red."
    ),
    form,
)

CPU times: user 53.6 ms, sys: 3.27 ms, total: 56.9 ms
Wall time: 1.27 s


{'search_for_companies': [{}]}

In [40]:
%%time
model(
    (
        "revenue, eps of indian companies that have market cap of over 1 million, and"
        " and between 20-50 employees"
    ),
    form,
)

CPU times: user 3.16 ms, sys: 401 µs, total: 3.56 ms
Wall time: 4.33 s


{'search_for_companies': [{'attribute_filter': [{'attribute': ['market cap'],
     'op': ['>'],
     'value': ['1 million']},
    {'attribute': ['employees'], 'op': ['in'], 'value': ['20', '50']}],
   'attribute_selection': ['revenue', 'eps']}]}

In [41]:
%%time
model("companies that own red and blue buildings", form)

CPU times: user 4.32 ms, sys: 192 µs, total: 4.51 ms
Wall time: 2.51 s


{'search_for_companies': [{'attribute_filter': [{'attribute': ['building-colors'],
     'op': ['in'],
     'value': ['red', 'blue']}]}]}

In [42]:
%%time
model("revenue of largest german companies sorted by number of employees", form)

CPU times: user 0 ns, sys: 3.84 ms, total: 3.84 ms
Wall time: 3.28 s


{'search_for_companies': [{'geography_name': ['germany'],
   'sort_block': [{'attribute': ['number of employees'],
     'direction': ['descending']}],
   'attribute_selection': ['revenue']}]}
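
To make the `sort_block` concrete, here is a small sketch that applies it to in-memory records. It assumes each record is a dict keyed by the same attribute names the model extracts, which a real backend would not guarantee; `apply_sort` is a hypothetical helper, not part of Kor.

```python
from typing import Any, Dict, List


def apply_sort(
    rows: List[Dict[str, Any]], result: Dict[str, Any]
) -> List[Dict[str, Any]]:
    """Sort rows according to every sort_block found in the extraction result."""
    for obj in result.get("search_for_companies", []):
        for block in obj.get("sort_block", []):
            attributes = block.get("attribute", [])
            if not attributes:
                continue  # nothing to sort by
            attribute = attributes[0]
            # Assumption: ascending order when no direction was extracted.
            direction = block.get("direction", ["ascending"])[0]
            rows = sorted(
                rows,
                key=lambda row: row[attribute],
                reverse=(direction == "descending"),
            )
    return rows


companies = [
    {"name": "a", "number of employees": 10},
    {"name": "b", "number of employees": 30},
]
extracted = {
    "search_for_companies": [
        {
            "geography_name": ["germany"],
            "sort_block": [
                {"attribute": ["number of employees"], "direction": ["descending"]}
            ],
            "attribute_selection": ["revenue"],
        }
    ]
}
sorted_companies = apply_sort(companies, extracted)
```

In practice the extracted attribute name would first have to be mapped onto a real column name, since the model is free to phrase it however the user did.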

In [44]:
%%time
model(
    (
        "revenue, eps of indian companies that have market cap of over 1 million, "
        "but less than 50 employees and own red and blue buildings"
    ),
    form,
)

CPU times: user 4.73 ms, sys: 0 ns, total: 4.73 ms
Wall time: 5.52 s


{'search_for_companies': [{'attribute_filter': [{'attribute': ['market cap',
      'employees',
      'color'],
     'op': ['>', '<', 'in'],
     'value': ['1 million', '50', 'red', 'blue']}],
   'attribute_selection': ['revenue', 'eps']}]}

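**ATTENTION** The query above fails to group things correctly: instead of three separate filters, the model returns a single `attribute_filter` object whose `attribute`, `op`, and `value` lists are merged together and no longer line up.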
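
Grouping failures like this one can be caught before the result reaches a backend. The check below is a minimal sketch, assuming that a well-formed filter carries exactly one attribute and exactly one operator; `find_misgrouped_filters` is a hypothetical helper, not part of Kor.

```python
from typing import Any, Dict, List


def find_misgrouped_filters(result: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return every attribute_filter that does not have one attribute and one op."""
    problems: List[Dict[str, Any]] = []
    for obj in result.get("search_for_companies", []):
        for flt in obj.get("attribute_filter", []):
            has_one_attribute = len(flt.get("attribute", [])) == 1
            has_one_op = len(flt.get("op", [])) == 1
            if not (has_one_attribute and has_one_op):
                problems.append(flt)
    return problems


# The mis-grouped output shown above.
bad = {
    "search_for_companies": [
        {
            "attribute_filter": [
                {
                    "attribute": ["market cap", "employees", "color"],
                    "op": [">", "<", "in"],
                    "value": ["1 million", "50", "red", "blue"],
                }
            ],
            "attribute_selection": ["revenue", "eps"],
        }
    ]
}
problems = find_misgrouped_filters(bad)
```

A non-empty result could be used to reject the extraction, retry with a rephrased query, or signal that the schema needs more examples showing how filters should be grouped.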