# LLMs: Completion

The awesome thing about langchain is the wide range of LLMs it supports, and the fact that this support is actively maintained by the community. It is one of langchain's superpowers. In this notebook we'll look at the completion interface, but langchain also has a Chat interface.

We'll get started with OpenAI's LLM as usual.

In [17]:
from langchain.llms import OpenAI

llm = OpenAI()

There are three main ways to interact with the LLM:

1. `__call__()`, which takes a string and returns a string.

In [10]:
llm("tell me a joke")

'\n\nQ: What did the fish say when it hit the wall?\nA: Dam!'
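For plain string-in, string-out calls, langchain's `BaseLLM` roughly treats `llm(prompt)` as a batch of one: it calls `generate([prompt])` and returns the text of the first candidate. The stdlib-only sketch below mimics that relationship with hypothetical `ToyLLM`, `ToyResult` and `ToyGeneration` classes. It "completes" a prompt by upper-casing it and is not the library's code.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGeneration:
    """Hypothetical stand-in for langchain's Generation."""
    text: str


@dataclass
class ToyResult:
    """Hypothetical stand-in for LLMResult: one list of candidates per prompt."""
    generations: list[list[ToyGeneration]] = field(default_factory=list)


class ToyLLM:
    """Hypothetical model that 'completes' a prompt by upper-casing it."""

    def generate(self, prompts: list[str]) -> ToyResult:
        # One inner list per prompt; this toy returns a single candidate each.
        return ToyResult([[ToyGeneration(p.upper())] for p in prompts])

    def __call__(self, prompt: str) -> str:
        # A single call is just a batch of one: take the first candidate's text.
        return self.generate([prompt]).generations[0][0].text


llm = ToyLLM()
print(llm("tell me a joke"))  # TELL ME A JOKE
```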

2. `generate()`, which is great for batches of requests. It returns an `LLMResult` object that holds the generated outputs along with provider metadata.

In [11]:
llm_result = llm.generate(["Tell me a joke", "Tell me a poem"]*15)

In [5]:
len(llm_result.generations)

30
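The `30` comes from list repetition: multiplying a 2-element list by 15 repeats its elements in order, so the batch alternates between the joke prompt and the poem prompt. Each prompt gets its own entry in `generations`, in the same order. A small illustration:

```python
# Multiplying a list repeats its elements in order: 2 prompts x 15 = 30 prompts.
prompts = ["Tell me a joke", "Tell me a poem"] * 15

print(len(prompts))   # 30
print(prompts[:3])    # ['Tell me a joke', 'Tell me a poem', 'Tell me a joke']

# Even indices are jokes and odd indices are poems, so generations[i]
# answers prompts[i] in the same alternating order.
jokes = prompts[0::2]
poems = prompts[1::2]
print(len(jokes), len(poems))  # 15 15
```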

In [13]:
# results are under the generations
llm_result.generations[0]

[Generation(text='\n\nQ: What did the fish say when he hit the wall?\nA: Dam!', generation_info={'finish_reason': 'stop', 'logprobs': None})]

In [14]:
# llm_output has metadata about the results
llm_result.llm_output

{'token_usage': {'completion_tokens': 1063,
  'prompt_tokens': 120,
  'total_tokens': 1183},
 'model_name': 'text-davinci-003'}
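The token counts in `llm_output` cover the whole batch, not a single prompt. `total_tokens` is the sum of prompt and completion tokens, and dividing by the 30 prompts gives rough per-prompt averages. The sketch below copies the dictionary from the output above, so it runs without calling the API.

```python
# llm_output copied from the cell above.
llm_output = {
    "token_usage": {"completion_tokens": 1063, "prompt_tokens": 120, "total_tokens": 1183},
    "model_name": "text-davinci-003",
}

usage = llm_output["token_usage"]
# total_tokens is the sum of the prompt and completion tokens for the whole batch.
assert usage["prompt_tokens"] + usage["completion_tokens"] == usage["total_tokens"]

n_prompts = 30  # the batch size from the generate() call above
print(usage["prompt_tokens"] / n_prompts)                # 4.0
print(round(usage["completion_tokens"] / n_prompts, 1))  # 35.4
```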

We'll explore `LLMResult` in more detail below.

3. The async API (e.g. `agenerate()`), which is great for concurrency. Most of these calls are network-bound, so while one request waits on the provider, others can already be in flight.

In [16]:
import time
import asyncio

from langchain.llms import OpenAI


def generate_serially():
    llm = OpenAI(temperature=0.9)
    for _ in range(10):
        resp = llm.generate(["Hello, how are you?"])


async def async_generate(llm):
    resp = await llm.agenerate(["Hello, how are you?"])


async def generate_concurrently():
    llm = OpenAI(temperature=0.9)
    tasks = [async_generate(llm) for _ in range(10)]
    await asyncio.gather(*tasks)


s = time.perf_counter()
# If running this outside of Jupyter, use asyncio.run(generate_concurrently())
await generate_concurrently()
elapsed = time.perf_counter() - s
print("\033[1m" + f"Concurrent executed in {elapsed:0.2f} seconds." + "\033[0m")

s = time.perf_counter()
generate_serially()
elapsed = time.perf_counter() - s
print("\033[1m" + f"Serial executed in {elapsed:0.2f} seconds." + "\033[0m")

[1mConcurrent executed in 1.99 seconds.[0m
[1mSerial executed in 13.72 seconds.[0m

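The speed-up comes from overlapping the waits, not from faster requests. The stdlib-only sketch below makes that visible without an API key. The hypothetical `fake_call` coroutine stands in for a network-bound LLM request by sleeping. Awaiting the calls one by one takes about `n * latency`, while `asyncio.gather` schedules them together so the total is close to a single latency. It uses `asyncio.run`, so run it as a script. Inside Jupyter, `await` the coroutines directly instead.

```python
import asyncio
import time


async def fake_call(prompt: str, latency: float = 0.05) -> str:
    """Hypothetical stand-in for a network-bound LLM request."""
    await asyncio.sleep(latency)  # waiting on I/O yields control to the event loop
    return f"reply to {prompt!r}"


async def run_serially(n: int) -> float:
    start = time.perf_counter()
    for _ in range(n):
        await fake_call("Hello, how are you?")  # each call waits for the previous one
    return time.perf_counter() - start


async def run_concurrently(n: int) -> float:
    start = time.perf_counter()
    # gather() schedules all calls at once, so their waits overlap.
    await asyncio.gather(*(fake_call("Hello, how are you?") for _ in range(n)))
    return time.perf_counter() - start


serial = asyncio.run(run_serially(10))
concurrent = asyncio.run(run_concurrently(10))
print(f"serial: {serial:.2f}s, concurrent: {concurrent:.2f}s")
```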

### A note about internals

All language model wrappers inherit from `BaseLanguageModel`.

It exposes three main methods:
- `generate_prompt()`: generate language model outputs for a sequence of prompt
    values. A prompt value is a model input that can be converted to any language
    model input format (string or messages).
- `predict()`: pass in a single string to a language model and return a string
    prediction.
- `predict_messages()`: pass in a sequence of BaseMessages (corresponding to a single
    model call) to a language model and return a BaseMessage prediction.
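To make the shape of that interface concrete, here is a simplified, stdlib-only mirror of the three methods as an abstract base class. This is an assumption-laden sketch rather than langchain's code. The real methods take `PromptValue` and `BaseMessage` objects plus options such as stop words and callbacks. Here plain strings and a hypothetical `Message` dataclass stand in for them, and `ShoutingModel` is a hypothetical implementation.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical simplified stand-in for langchain's BaseMessage."""
    role: str
    content: str


class LanguageModelSketch(ABC):
    """Simplified mirror of the three methods BaseLanguageModel exposes."""

    @abstractmethod
    def generate_prompt(self, prompts: list[str]) -> list[list[str]]:
        """Batch: one list of candidate outputs per prompt."""

    @abstractmethod
    def predict(self, text: str) -> str:
        """Single string in, single string out."""

    @abstractmethod
    def predict_messages(self, messages: list[Message]) -> Message:
        """One model call over a sequence of messages, returning one message."""


class ShoutingModel(LanguageModelSketch):
    """Hypothetical model: replies with the input upper-cased."""

    def generate_prompt(self, prompts: list[str]) -> list[list[str]]:
        return [[p.upper()] for p in prompts]

    def predict(self, text: str) -> str:
        return self.generate_prompt([text])[0][0]

    def predict_messages(self, messages: list[Message]) -> Message:
        # Answer the last message in the conversation.
        return Message("ai", self.predict(messages[-1].content))


model = ShoutingModel()
print(model.predict("hi"))                                          # HI
print(model.predict_messages([Message("human", "hello")]).content)  # HELLO
```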


These are the methods every language model wrapper must implement.

There is also `BaseLLM`, which implements the interface we used above with the `OpenAI` LLM.

If you want to build your own custom LLM, the recommended approach is to subclass `LLM` from `langchain.llms.base`.
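In langchain, a custom `LLM` subclass mainly overrides `_call` (one prompt in, one string out) and the `_llm_type` property, and the base class builds the rest of the interface on top. The stdlib-only sketch below imitates that pattern so it runs without langchain installed. `MiniLLM` and `TruncatingLLM` are hypothetical, and the real `_call` also accepts extra arguments such as a callback run manager.

```python
from abc import ABC, abstractmethod
from typing import Optional


class MiniLLM(ABC):
    """Hypothetical base class: subclasses supply _call and _llm_type."""

    @property
    @abstractmethod
    def _llm_type(self) -> str: ...

    @abstractmethod
    def _call(self, prompt: str, stop: Optional[list[str]] = None) -> str: ...

    def generate(self, prompts: list[str], stop: Optional[list[str]] = None) -> list[list[str]]:
        # Batch support for free: call _call once per prompt.
        return [[self._call(p, stop=stop)] for p in prompts]


class TruncatingLLM(MiniLLM):
    """Hypothetical custom model: returns the first n characters of the prompt."""

    def __init__(self, n: int) -> None:
        self.n = n

    @property
    def _llm_type(self) -> str:
        return "truncating"

    def _call(self, prompt: str, stop: Optional[list[str]] = None) -> str:
        if stop is not None:
            raise ValueError("stop sequences are not supported")
        return prompt[: self.n]


llm = TruncatingLLM(n=5)
print(llm.generate(["hello world", "xyz"]))  # [['hello'], ['xyz']]
```

Keeping the subclass surface to a single method means batching and any other shared behavior live in one place, in the base class.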

### `LLMResult` and `Generation`

`LLMResult` is the object that stores the results from the LLM. It has
- `generations` - a list of lists of `Generation` objects, each holding one output from the LLM. The type is `list[list[Generation]]` because each input can have multiple candidate responses.
- `llm_output` - LLM provider-specific output. You can get `token_usage`, `model_name`, etc. from this object.
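The nested `list[list[Generation]]` shape is easiest to see with more than one candidate per prompt. The sketch below uses simplified, hypothetical mirrors of the two classes (`GenerationSketch`, `LLMResultSketch`), not langchain's own, with made-up texts. The outer index selects the prompt and the inner index selects the candidate.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class GenerationSketch:
    """Simplified, hypothetical mirror of langchain's Generation."""
    text: str
    generation_info: Optional[dict[str, Any]] = None


@dataclass
class LLMResultSketch:
    """Simplified, hypothetical mirror of langchain's LLMResult."""
    generations: list[list[GenerationSketch]]
    llm_output: Optional[dict[str, Any]] = None


# Two prompts, each with two candidate responses
# (e.g. when asking the provider for two completions per prompt).
result = LLMResultSketch(
    generations=[
        [GenerationSketch("joke A"), GenerationSketch("joke B")],  # prompt 0
        [GenerationSketch("poem A"), GenerationSketch("poem B")],  # prompt 1
    ],
    llm_output={"model_name": "hypothetical-model"},
)

# Outer index = prompt, inner index = candidate.
print(result.generations[1][0].text)                    # poem A
print([len(cands) for cands in result.generations])     # [2, 2]
best = [cands[0].text for cands in result.generations]  # first candidate per prompt
print(best)                                             # ['joke A', 'poem A']
```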

In [20]:
llm_result.generations[:2]

[[Generation(text='\n\nQ: What did the fish say when he hit the wall?\nA: Dam!', generation_info={'finish_reason': 'stop', 'logprobs': None})],
 [Generation(text="\n\nHere's a beautiful poem by Robert Frost: \n\nThe Road Not Taken\n\nTwo roads diverged in a yellow wood,\nAnd sorry I could not travel both\nAnd be one traveler, long I stood\nAnd looked down one as far as I could\nTo where it bent in the undergrowth;\n\nThen took the other, as just as fair,\nAnd having perhaps the better claim,\nBecause it was grassy and wanted wear;\nThough as for that the passing there\nHad worn them really about the same,\n\nAnd both that morning equally lay\nIn leaves no step had trodden black.\nOh, I kept the first for another day!\nYet knowing how way leads on to way,\nI doubted if I should ever come back.\n\nI shall be telling this with a sigh\nSomewhere ages and ages hence:\nTwo roads diverged in a wood, and I—\nI took the one less traveled by,\nAnd that has made all the difference.", generation_i

In [21]:
llm_result.llm_output

{'token_usage': {'completion_tokens': 1063,
  'prompt_tokens': 120,
  'total_tokens': 1183},
 'model_name': 'text-davinci-003'}

`Generation` has
- `text` - generated text output
- `generation_info` - raw response from the provider, which may include things like the reason for finishing or token log probabilities
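One practical use of `generation_info` is spotting outputs the provider cut off. The hypothetical helper below assumes OpenAI-style metadata, where `finish_reason` is `"stop"` for a natural end and `"length"` when the token limit was hit. Other providers may use different keys or leave `generation_info` as `None`, which the helper tolerates. It relies only on the `text` and `generation_info` attributes, so it should accept `llm_result.generations` directly. Here it runs on made-up data built with `SimpleNamespace`.

```python
from types import SimpleNamespace


def truncated_outputs(generations):
    """Return (prompt_index, text) pairs for candidates that hit the length limit.

    Assumes OpenAI-style generation_info with a "finish_reason" key.
    """
    hits = []
    for i, candidates in enumerate(generations):
        for g in candidates:
            info = g.generation_info or {}  # some providers return None here
            if info.get("finish_reason") == "length":
                hits.append((i, g.text))
    return hits


# Hypothetical data shaped like llm_result.generations.
gens = [
    [SimpleNamespace(text="Dam!", generation_info={"finish_reason": "stop", "logprobs": None})],
    [SimpleNamespace(text="Two roads diverged", generation_info={"finish_reason": "length", "logprobs": None})],
    [SimpleNamespace(text="no info", generation_info=None)],
]
print(truncated_outputs(gens))  # [(1, 'Two roads diverged')]
```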

In [24]:
g = llm_result.generations[0][0]

g.text

'\n\nQ: What did the fish say when he hit the wall?\nA: Dam!'

In [25]:
g.generation_info

{'finish_reason': 'stop', 'logprobs': None}