# Quickstart


The Trustworthy Language Model **scores the trustworthiness** of every LLM response in real-time, automatically flagging when the model's response may be incorrect.
TLM can detect incorrect outputs from *any* LLM and can score *any* type of model output (natural language response, classification decision, structured output, tool call, etc.).

This tutorial demonstrates how to quickly make *any* LLM application more reliable with TLM; other tutorials demonstrate how to better utilize TLM in specific applications.

## Setup

This tutorial requires an API key for an LLM provider.
Some possibilities include: `OPENAI_API_KEY`, `GEMINI_API_KEY`, `DEEPSEEK_API_KEY`, `AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY`, etc.

The TLM Python client can be installed using pip:

In [None]:
%pip install --upgrade trustworthy-llm

In [None]:
# Set your API key
import os

os.environ["OPENAI_API_KEY"] = "<API key>"  # or other LLM provider API key

## Using TLM

You can use TLM much like any other LLM API:

In [4]:
from tlm import TLM

tlm = TLM()  # See Advanced Tutorial for optional TLM configurations to get better/faster results

openai_kwargs = {"model": "gpt-4.1-mini", "messages": [{"role": "user", "content": "What is the capital of France?"}]}
tlm_result = tlm.create(**openai_kwargs)

tlm_result

{'response': ModelResponse(id='chatcmpl-Cvp7cQSXi0AYaYHhLMtY6Pr1pVDkf', created=1767897244, model='gpt-4.1-mini-2025-04-14', object='chat.completion', system_fingerprint='fp_376a7ccef1', choices=[Choices(finish_reason='stop', index=0, message=Message(content='The capital of France is Paris.', role='assistant', tool_calls=None, function_call=None, provider_specific_fields={'refusal': None}, annotations=[]), logprobs=ChoiceLogprobs(content=[ChatCompletionTokenLogprob(token='Response', bytes=[82, 101, 115, 112, 111, 110, 115, 101], logprob=0.0, top_logprobs=[TopLogprob(token='Response', bytes=[82, 101, 115, 112, 111, 110, 115, 101], logprob=0.0), TopLogprob(token=' Response', bytes=[32, 82, 101, 115, 112, 111, 110, 115, 101], logprob=-24.5), TopLogprob(token='\tResponse', bytes=[9, 82, 101, 115, 112, 111, 110, 115, 101], logprob=-24.5), TopLogprob(token='<Response', bytes=[60, 82, 101, 115, 112, 111, 110, 115, 101], logprob=-25.5), TopLogprob(token='_Response', bytes=[95, 82, 101, 115, 11

The returned `tlm_result` is a dict with the following fields:

```
{
  "response": ModelResponse(...),  # Full model response object (like OpenAI's ChatCompletion)
  "confidence_score": 0.87,  # Numerical value between 0 and 1
  "usage": {},  # Token usage info
  "metadata": {},  # Additional metadata dict
  "evals": {},  # Additional evaluation results dict (if evals specified)
  "explanation": "Did not find a reason to doubt trustworthiness.",  # String explanation
}
```

The **response** is a full model response object (e.g., OpenAI's `ChatCompletion` or similar) containing the generated text, model info, token usage, and other standard LLM response fields. You can access the text content via `tlm_result["response"].choices[0].message.content` (or similar, depending on the provider).

The **confidence_score** quantifies how *confident* you can be that the response is *correct* (higher values indicate greater trustworthiness). These scores are computed via [state-of-the-art](https://cleanlab.ai/blog/trustworthy-language-model/) uncertainty estimation for LLMs.

The **usage** field provides token usage information, **metadata** contains additional metadata, **evals** contains optional evaluation results, and **explanation** provides a human-readable explanation of the trustworthiness assessment.

Boost the *reliability* of any LLM application by adding contingency plans to handle LLM responses whose trustworthiness score is low (e.g. escalate to human, append disclaimer, revert to a fallback answer, request more information from user, ...).

In [5]:
print("LLM response: ", tlm_result["response"].choices[0].message.content)
print("Trustworthiness score: ", tlm_result["confidence_score"])

LLM response:  The capital of France is Paris.
Trustworthiness score:  0.9997975077878962


## Scoring the trustworthiness of a given response

TLM can also score the trustworthiness of *any* response to a given prompt. The response could be from *any* LLM you're using, or even be human-written.

In [6]:
import openai

openai_kwargs = {"model": "gpt-4.1-mini", "messages": [{"role": "user", "content": "What is the capital of France?"}]}

openai_response = openai.chat.completions.create(**openai_kwargs)
openai_response

ChatCompletion(id='chatcmpl-Cvp7hhWjbfLS3mSjCjOcDXyWhuyMF', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='The capital of France is Paris.', refusal=None, role='assistant', annotations=[], audio=None, function_call=None, tool_calls=None))], created=1767897249, model='gpt-4.1-mini-2025-04-14', object='chat.completion', service_tier='default', system_fingerprint='fp_376a7ccef1', usage=CompletionUsage(completion_tokens=7, prompt_tokens=14, total_tokens=21, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0)))

You can then pass the response from your LLM directly into TLM (alongside the original arguments used to generate the response) for confidence scoring:

In [7]:
tlm = TLM()
tlm_result = tlm.score(**openai_kwargs, response=openai_response)

tlm_result

{'response': {'chat_completion': {'id': 'chatcmpl-Cvp7hhWjbfLS3mSjCjOcDXyWhuyMF',
   'choices': [{'finish_reason': 'stop',
     'index': 0,
     'logprobs': None,
     'message': {'content': 'The capital of France is Paris.',
      'refusal': None,
      'role': 'assistant',
      'annotations': [],
      'audio': None,
      'function_call': None,
      'tool_calls': None}}],
   'created': 1767897249,
   'model': 'gpt-4.1-mini-2025-04-14',
   'object': 'chat.completion',
   'service_tier': 'default',
   'system_fingerprint': 'fp_376a7ccef1',
   'usage': {'completion_tokens': 7,
    'prompt_tokens': 14,
    'total_tokens': 21,
    'completion_tokens_details': {'accepted_prediction_tokens': 0,
     'audio_tokens': 0,
     'reasoning_tokens': 0,
     'rejected_prediction_tokens': 0},
    'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}}},
 'confidence_score': np.float64(1.0),
 'usage': {},
 'metadata': {},
 'evals': {},
 'explanation': 'Did not find a reason to doubt tru

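The output dictionary has the same fields as the one returned by `create()` above, except that here `response` is a plain dict that nests the chat completion under a `chat_completion` key. You can similarly extract the confidence score and response from the output.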

In [8]:
print("LLM response: ", tlm_result["response"]["chat_completion"]["choices"][0]["message"]["content"])
print("Trustworthiness score: ", tlm_result["confidence_score"])

LLM response:  The capital of France is Paris.
Trustworthiness score:  1.0
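
Because the two outputs above hold the response in different shapes (an object with attribute access from `create()`, a nested dict from `score()`), a small helper can hide the difference. The following is a minimal sketch, not part of the TLM client: `extract_text` is a hypothetical name, and it assumes only the two shapes printed in this tutorial.

```python
from types import SimpleNamespace


def extract_text(tlm_result):
    """Return the response text from either result shape shown in this tutorial."""
    response = tlm_result["response"]
    if isinstance(response, dict):
        # Shape printed for tlm.score(): a dict nested under "chat_completion"
        return response["chat_completion"]["choices"][0]["message"]["content"]
    # Shape printed for tlm.create(): an object with attribute access
    return response.choices[0].message.content


# Stand-in results that mimic the two shapes above (not real TLM outputs)
created = {
    "response": SimpleNamespace(
        choices=[SimpleNamespace(message=SimpleNamespace(content="Paris."))]
    )
}
scored = {"response": {"chat_completion": {"choices": [{"message": {"content": "Paris."}}]}}}

print(extract_text(created))
print(extract_text(scored))
```

If your provider returns a different response structure, adjust the two access paths accordingly.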


TLM returns a high score when your LLM's response is confidently accurate:

In [9]:
openai_kwargs = {
    "model": "gpt-4.1-mini",
    "messages": [{"role": "user", "content": "What's the first month of the year?"}],
}

openai_response = openai.chat.completions.create(**openai_kwargs)
openai_response.choices[0].message.content

'The first month of the year is January.'

In [10]:
tlm_result = tlm.score(**openai_kwargs, response=openai_response)

print("LLM response: ", tlm_result["response"]["chat_completion"]["choices"][0]["message"]["content"])
print("Trustworthiness score: ", tlm_result["confidence_score"])

LLM response:  The first month of the year is January.
Trustworthiness score:  0.9916666343978949


And TLM returns a low score when your LLM's response is untrustworthy, either because it is incorrect/unhelpful or because the model is highly uncertain:

In [11]:
# Manually edit the response to be incorrect
openai_response.choices[0].message.content = "The first month of the year is February."

tlm_result = tlm.score(**openai_kwargs, response=openai_response)

print("LLM response: ", tlm_result["response"]["chat_completion"]["choices"][0]["message"]["content"])
print("Trustworthiness score: ", tlm_result["confidence_score"])

LLM response:  The first month of the year is February.
Trustworthiness score:  3.226877178426809e-08


`TLM.score()` helps you add trustworthiness scoring to any LLM application without changing your existing code.
`TLM.create()` helps you simultaneously generate and score LLM responses.

TLM runs on top of a **base LLM model** (OpenAI's gpt-4.1-mini by default). For faster/better TLM results, specify a faster/better base model than the default. 

## How to use these trust scores for reliable AI?

Offline, you can manually review the lowest-trust LLM responses across a dataset and discover insights to [improve your LLM prompts](https://www.promptingguide.ai/).
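
As a rough illustration of this offline review, the sketch below sorts already-scored records so the least trustworthy responses surface first. The `records` list and the `lowest_trust` helper are hypothetical; the scores are made up, and in practice each one would come from `tlm.score()` or `tlm.create()`.

```python
def lowest_trust(records, k=5):
    """Return the k records with the lowest confidence_score, least trustworthy first."""
    return sorted(records, key=lambda record: record["confidence_score"])[:k]


# Made-up example data standing in for a dataset you have already scored
records = [
    {"prompt": "What is the capital of France?", "response": "Paris.", "confidence_score": 0.99},
    {"prompt": "What's the first month of the year?", "response": "February.", "confidence_score": 0.01},
    {"prompt": "Who wrote Hamlet?", "response": "Shakespeare.", "confidence_score": 0.97},
]

for record in lowest_trust(records, k=2):
    print(f"{record['confidence_score']:.2f}  {record['prompt']!r} -> {record['response']!r}")
```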

In real time, you can automatically determine *which* LLM responses are untrustworthy by comparing trustworthiness scores against a fixed threshold (say 0.7).
The overall magnitude of trust scores may differ between applications, so select *application-specific* thresholds.
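
One possible way to pick an application-specific threshold is to score a small sample of responses whose correctness you have checked by hand, then choose the lowest threshold at which the accepted responses are accurate enough. This is only a sketch of that idea, not a TLM feature: `pick_threshold`, the target precision, and the sample data are all assumptions for illustration.

```python
def pick_threshold(scores, is_correct, target_precision=0.95):
    """Return the smallest observed score usable as a threshold such that
    responses with score >= threshold meet the target precision, or None."""
    for threshold in sorted(set(scores)):
        accepted = [ok for score, ok in zip(scores, is_correct) if score >= threshold]
        if accepted and sum(accepted) / len(accepted) >= target_precision:
            return threshold
    return None


# Made-up trust scores and hand-checked correctness labels
scores = [0.99, 0.95, 0.80, 0.60, 0.40, 0.10]
is_correct = [True, True, True, False, True, False]

threshold = pick_threshold(scores, is_correct, target_precision=0.9)
print("Chosen threshold:", threshold)
```

A larger labeled sample gives a more stable estimate; with only a handful of examples the chosen value can shift a lot.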

For maximally reliable AI applications, you can **escalate untrustworthy LLM responses for human review**.
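
A minimal sketch of such an escalation path is shown below. The in-memory `review_queue` and the `deliver_or_escalate` function are hypothetical stand-ins for whatever ticketing or review system your application uses; the threshold of 0.7 is just the example value from above.

```python
from collections import deque

# Hypothetical stand-in for a real human-review system
review_queue = deque()


def deliver_or_escalate(prompt, response_text, confidence_score, threshold=0.7):
    """Return the response if it is trusted; otherwise queue it for a human and return None."""
    if confidence_score >= threshold:
        return response_text
    review_queue.append(
        {"prompt": prompt, "response": response_text, "confidence_score": confidence_score}
    )
    return None


# Scores similar to the ones printed earlier in this tutorial
answer = deliver_or_escalate("What's the first month of the year?", "January.", 0.99)
held = deliver_or_escalate("What's the first month of the year?", "February.", 0.00000003)

print("Delivered:", answer)
print("Held for review:", len(review_queue))
```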

Here are other **strategies to automatically handle untrustworthy LLM responses** without a human-in-the-loop:
   1. Append a warning message/disclaimer to the response.
   2. Replace your LLM response with a fallback message such as: "*Sorry I am unsure. Try rephrasing your request, or contact us*".
   3. In RAG, the fallback message might include raw retrieved context or search-results, for example: "*Sorry I am unsure. Here's some potentially relevant information: ...*".
   4. Replace your original LLM response with a re-generated response.
   5. Escalate to a more expensive AI system (e.g. DeepResearch API).

Below, we showcase example implementations of these strategies. 

### Append disclaimer to untrustworthy responses

One straightforward strategy is to still present untrustworthy LLM responses to your user, but first edit them to make them less misleading. You could append a cautionary warning after the response:

```python
if confidence_score < threshold:  # say 0.7
    response = response + "\n\n CAUTION: This answer was flagged as potentially untrustworthy."
```

Or you could append a *hedging statement* before the response, making it sound less confident:

```python
if confidence_score < threshold:  # say 0.7
    response = "I'm not sure, but I'd guess:\n\n" + response
```