# Cleanlab

This notebook shows how to use Cleanlab's Trustworthy Language Model (TLM) and Trustworthiness score.

TLM is a more reliable LLM that gives high-quality outputs and indicates when it is unsure of the answer to a question, making it suitable for applications where unchecked hallucinations are a show-stopper.
The trustworthiness score quantifies how confident you can be that a response is good (higher values indicate greater trustworthiness). It combines estimates of both aleatoric and epistemic uncertainty into a single overall gauge of trustworthiness.

Learn about using TLM via Cleanlab's [quickstart tutorial](https://help.cleanlab.ai/tutorials/tlm/), [blog](https://cleanlab.ai/blog/trustworthy-language-model/), and [API documentation](https://help.cleanlab.ai/reference/python/trustworthy_language_model/).

Visit https://app.cleanlab.ai and sign up to get a free API key.


## Setup

If you're opening this notebook on Colab, you will probably need to install the `langchain-community` package to use the integration.

In [None]:
%pip install -qU langchain-community

## Imports

In [None]:
import os

from langchain.chains import LLMChain
from langchain_community.llms import CleanlabTLM
from langchain_core.prompts import PromptTemplate

## Set the Environment API Key
Make sure to get your free API key from Cleanlab. 

In [None]:
# set api key in env or in llm
# import os
# os.environ["CLEANLAB_API_KEY"] = "your api key"

llm = CleanlabTLM(api_key="your_api_key")
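
To avoid hard-coding the key in the notebook, one option is to read it from the environment. The sketch below assumes the `CLEANLAB_API_KEY` variable named in the comment above; `get_cleanlab_api_key` is a hypothetical helper written for illustration, not part of LangChain or Cleanlab.

```python
import os


def get_cleanlab_api_key(env_var: str = "CLEANLAB_API_KEY") -> str:
    """Return the API key stored in `env_var`, or raise a clear error."""
    # Treat an unset variable and a blank value the same way.
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} before creating the LLM.")
    return key
```

The result could then be passed along as `CleanlabTLM(api_key=get_cleanlab_api_key())`.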

In [None]:
resp = llm.generate(["Who is Paul Graham?"])

In [None]:
resp.generations[0][0].text

You also get the trustworthiness score of the above response under `trustworthiness_score` in its `generation_info`. TLM automatically computes this score for every <prompt, response> pair.

In [None]:
resp.generations[0][0].generation_info

A score of **~0.86** indicates that the LLM's response can be trusted. Let's look at another example.

In [None]:
# generate() expects a list of prompts, so wrap the single prompt in a list
resp = llm.generate(
    [
        "What was the horsepower of the first automobile engine used in a commercial truck in the United States?"
    ]
)

In [None]:
resp.generations[0][0].text

In [None]:
resp.generations[0][0].generation_info

A low score of **~0.58** indicates that the LLM's response shouldn't be trusted.

From these two straightforward examples, we can observe that the LLM's responses with the highest scores are direct, accurate, and appropriately detailed.<br />
On the other hand, responses with low trustworthiness scores convey unhelpful or factually inaccurate answers, sometimes referred to as hallucinations.
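
One way to act on these scores is to compare them against a cutoff before using a response. The following is a minimal sketch, assuming `generation_info` is a dictionary with a `trustworthiness_score` key; the `is_trustworthy` helper and the `0.7` threshold are hypothetical and should be tuned for your application.

```python
from typing import Any, Mapping, Optional

# Hypothetical cutoff chosen for illustration only.
TRUST_THRESHOLD = 0.7


def is_trustworthy(
    generation_info: Optional[Mapping[str, Any]],
    threshold: float = TRUST_THRESHOLD,
) -> bool:
    """Return True when the trustworthiness score meets the threshold.

    Missing info or a missing score is treated as untrusted.
    """
    if not generation_info:
        return False
    score = generation_info.get("trustworthiness_score")
    return score is not None and score >= threshold
```

With this cutoff, the first example above (~0.86) would pass and the second (~0.58) would be flagged for review, for instance via `is_trustworthy(resp.generations[0][0].generation_info)`.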

### Async

We can also use TLM asynchronously to allow non-blocking concurrent operations.

In [None]:
# agenerate() returns a coroutine; `stop` is passed as a list of stop sequences
resp = llm.agenerate(["explain why saturn is round in only 100 words?"], stop=["\t"])

In [None]:
await resp
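
The benefit of the async interface shows up when several requests are in flight at once. The sketch below illustrates the pattern with `asyncio.gather`; `run_concurrently` is a hypothetical helper, and `fake_ask` is a stand-in that sleeps briefly instead of calling TLM, so the block runs without an API key.

```python
import asyncio
from typing import Awaitable, Callable, List


async def run_concurrently(
    prompts: List[str], ask: Callable[[str], Awaitable[str]]
) -> List[str]:
    """Start one request per prompt and return the answers in prompt order."""
    return await asyncio.gather(*(ask(prompt) for prompt in prompts))


async def fake_ask(prompt: str) -> str:
    # Stand-in for a real async LLM call; no network access happens here.
    await asyncio.sleep(0.01)
    return f"answer to: {prompt}"
```

In a notebook you can `await run_concurrently(prompts, fake_ask)` directly; in a script, wrap the call in `asyncio.run(...)`. Replacing `fake_ask` with a coroutine that awaits the LLM would follow the same shape.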

## Advanced use of TLM

TLM can be configured with the following options:
- **model**: underlying LLM to use
- **max_tokens**: maximum number of tokens to generate in the response
- **num_candidate_responses**: number of alternative candidate responses internally generated by TLM
- **num_consistency_samples**: amount of internal sampling to evaluate LLM-response-consistency
- **use_self_reflection**: whether the LLM is asked to self-reflect upon the response it generated and self-evaluate this response

These options are passed as a dictionary to the `CleanlabTLM` object during initialization. <br />
More details about these options can be found in [Cleanlab's API documentation](https://help.cleanlab.ai/reference/python/trustworthy_language_model/#class-tlmoptions), and a few use cases are explored in [this notebook](https://help.cleanlab.ai/tutorials/tlm/#advanced-tlm-usage).
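
Because the options are a plain dictionary, a typo in a key name is easy to miss. As a rough sketch, a small helper can check names against the list above before the dictionary is handed to `CleanlabTLM`; `build_options` and `KNOWN_OPTIONS` are hypothetical names introduced here, and the set only mirrors the options listed in this notebook, which may not be exhaustive.

```python
from typing import Any, Dict

# Option names copied from the list above (may not be exhaustive).
KNOWN_OPTIONS = {
    "model",
    "max_tokens",
    "num_candidate_responses",
    "num_consistency_samples",
    "use_self_reflection",
}


def build_options(**overrides: Any) -> Dict[str, Any]:
    """Return an options dict, rejecting names that are not listed above."""
    unknown = sorted(set(overrides) - KNOWN_OPTIONS)
    if unknown:
        raise ValueError(f"Unknown TLM option(s): {', '.join(unknown)}")
    # Leave out options explicitly set to None.
    return {name: value for name, value in overrides.items() if value is not None}
```

For example, `build_options(model="gpt-4", max_tokens=128)` yields the same dictionary as the one written by hand in the next cell.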

Let's consider an example where the application requires the `gpt-4` model with a maximum of `128` output tokens.

In [None]:
options = {
    "model": "gpt-4",
    "max_tokens": 128,
}
llm = CleanlabTLM(api_key="your_api_key", options=options)

In [None]:
print(llm)

In [None]:
# generate() expects a list of prompts
resp = llm.generate(["Who is Paul Graham?"])