
## Use case

Large Language Models (LLMs) can be used to extract structured information from unstructured text. They can enhance or replace rule-based systems and custom-trained models, which can simplify the extraction workflow.

## Approaches

There are 3 broad approaches for information extraction using LLMs:

- **Tool/Function Calling**: Some LLMs support a *tool or function calling* mode. These LLMs can produce structured output that conforms to a given **schema**. This approach is generally the easiest to work with and is expected to yield good results, since the models have been fine-tuned for function calling.

- **JSON Mode**: Some LLMs can be forced to output valid JSON. This is similar to the **Tool/Function Calling** approach, except that the schema is provided as part of the prompt. This approach is generally expected to perform worse than **Tool/Function Calling**.

- **Prompt Based**: LLMs that follow instructions well can be instructed to generate text in a desired format. This text can then be parsed into a structured format such as JSON using existing [Output Parsers](/docs/modules/model_io/output_parsers/) or [custom parsers](/docs/modules/model_io/output_parsers/custom). Because this approach works with LLMs that **do not support** JSON mode or tool/function calling, it is more broadly applicable.
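The prompt-based approach can be sketched in plain Python: describe the schema in the prompt, then parse the model's text reply into typed objects. This is a minimal illustration under stated assumptions, not LangChain's output-parser API. The `Person` schema and the `format_instructions`, `build_prompt`, and `parse_people` helpers are hypothetical, and a hard-coded string stands in for the model's reply so the example runs without an LLM. In practice, the Output Parsers linked above fill the role of `parse_people`.

```python
import json
import re
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class Person:
    """Hypothetical extraction schema: one person mentioned in the text."""
    name: str
    height_in_meters: Optional[float] = None


def format_instructions(schema) -> str:
    """Describe the schema's fields so the LLM knows which JSON keys to emit."""
    keys = ", ".join(f'"{f.name}"' for f in fields(schema))
    return (
        f"Return a JSON list of objects with the keys {keys}. "
        "Use null for unknown values. Output only JSON."
    )


def build_prompt(text: str, schema) -> str:
    """Combine the format instructions with the text to extract from."""
    return f"{format_instructions(schema)}\n\nText:\n{text}"


# Matches a markdown code fence (three backticks, optional "json" tag).
_FENCE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)


def parse_people(llm_output: str) -> List[Person]:
    """Parse the LLM's raw text reply into Person objects."""
    # Models often wrap JSON in a markdown fence; strip it if present.
    match = _FENCE.search(llm_output)
    payload = match.group(1) if match else llm_output
    records = json.loads(payload)
    allowed = {f.name for f in fields(Person)}
    # Drop keys the schema does not define; Person(**...) would reject them.
    return [Person(**{k: v for k, v in r.items() if k in allowed}) for r in records]


# Stand-in for a model reply; in a real pipeline this string would come from
# an LLM called with build_prompt(...).
fence = "`" * 3
fake_records = [
    {"name": "Alan Smith", "height_in_meters": 1.83},
    {"name": "Jane", "height_in_meters": None, "age": 30},
]
fake_reply = fence + "json\n" + json.dumps(fake_records) + "\n" + fence

people = parse_people(fake_reply)
print(people)
```

Stripping a markdown fence and ignoring unexpected keys are two common defenses against imperfect model output; a production parser would typically also validate field types and handle malformed JSON.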

## Table of Contents

- [Quickstart](/docs/use_cases/extraction/quickstart): We recommend starting here. Many of the following guides assume you fully understand the architecture shown in the Quickstart.
- [Improving Performance](/docs/use_cases/extraction/improving_performance): Learn how to use few-shot examples to improve performance.
- [Working With Long Text](/docs/use_cases/extraction/long_text): What should you do if the text does not fit into the context window of the LLM?
- [Working with Files](/docs/use_cases/extraction/working_with_files): Examples of using LangChain document loaders and parsers to extract from files like PDFs.
- [Guidelines](/docs/use_cases/extraction/guidelines): Guidelines for getting good performance on extraction tasks.

## Use Case Accelerant

[langchain-extract](https://github.com/langchain-ai/langchain-extract) is a starter repo that implements a simple web server for information extraction from text and files using LLMs. It is built using **FastAPI**, **LangChain**, and **PostgreSQL**. Feel free to adapt it to your own use cases.

## Other Resources

* The [output parser](/docs/modules/model_io/output_parsers/) documentation includes various parser examples for specific types (e.g., lists, datetime, enum, etc.).
* LangChain [document loaders](/docs/modules/data_connection/document_loaders/) load content from files. See the list of [integrations](/docs/integrations/document_loaders).
* The experimental [Anthropic function calling](https://python.langchain.com/docs/integrations/chat/anthropic_functions) support provides similar functionality for Anthropic chat models.
* [LlamaCPP](https://python.langchain.com/docs/integrations/llms/llamacpp#grammars) natively supports constrained decoding using custom grammars, making it easy to output structured content using local LLMs.
* [JSONFormer](/docs/integrations/llms/jsonformer_experimental) offers another way for structured decoding of a subset of the JSON Schema.
* [Kor](https://eyurtsev.github.io/kor/) is another library for extraction where schema and examples can be provided to the LLM.
* [OpenAI's function and tool calling](https://platform.openai.com/docs/guides/function-calling) guide.
* [OpenAI's JSON mode](https://platform.openai.com/docs/guides/text-generation/json-mode) documentation.