# How to return citations

:::info Prerequisites

This guide assumes familiarity with the following:

- [Retrieval-augmented generation](/docs/tutorials/rag/)
- [Returning structured data from a model](/docs/how_to/structured_output/)

:::

How can we get a model to cite which parts of the source documents it referenced in its response?

To explore some techniques for extracting citations, let's first create a simple RAG chain. To start, we'll retrieve documents from the web using the [`TavilySearchAPIRetriever`](https://api.js.langchain.com/classes/langchain_community_retrievers_tavily_search_api.TavilySearchAPIRetriever.html).

## Setup
### Dependencies

We’ll use an OpenAI chat model and embeddings and a Memory vector store in this walkthrough, but everything shown here works with any [ChatModel](/docs/concepts/chat_models) or [LLM](/docs/concepts/text_llms), [Embeddings](/docs/concepts/embedding_models/), and [VectorStore](/docs/concepts/vectorstores/) or [Retriever](/docs/concepts/retrievers).

We’ll use the following packages:

```bash
npm install --save langchain @langchain/community @langchain/openai @langchain/core zod
```

We need to set environment variables for Tavily Search & OpenAI:

```bash
export OPENAI_API_KEY=YOUR_KEY
export TAVILY_API_KEY=YOUR_KEY
```

### LangSmith

Many of the applications you build with LangChain will contain multiple steps with multiple invocations of LLM calls. As these applications get more and more complex, it becomes crucial to be able to inspect what exactly is going on inside your chain or agent. The best way to do this is with [LangSmith](https://smith.langchain.com/).

Note that LangSmith is not needed, but it is helpful. If you do want to use LangSmith, after you sign up at the link above, make sure to set your environment variables to start logging traces:


```bash
export LANGSMITH_TRACING=true
export LANGSMITH_API_KEY=YOUR_KEY

# Reduce tracing latency if you are not in a serverless environment
# export LANGCHAIN_CALLBACKS_BACKGROUND=true
```

### Initial setup

```typescript
import { TavilySearchAPIRetriever } from "@langchain/community/retrievers/tavily_search_api";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { ChatOpenAI } from "@langchain/openai";

const llm = new ChatOpenAI({
  model: "gpt-3.5-turbo",
  temperature: 0,
});

const retriever = new TavilySearchAPIRetriever({
  k: 6,
});

const prompt = ChatPromptTemplate.fromMessages([
  ["system", "You're a helpful AI assistant. Given a user question and some web article snippets, answer the user question. If none of the articles answer the question, just say you don't know.\n\nHere are the web articles:{context}"],
  ["human", "{question}"],
]);
```

Now that we've got a model, retriever and prompt, let's chain them all together. We'll need to add some logic for formatting our retrieved `Document`s to a string that can be passed to our prompt. We'll make it so our chain returns both the answer and the retrieved Documents.

```typescript
import { Document } from "@langchain/core/documents";
import { StringOutputParser } from "@langchain/core/output_parsers";
import { RunnableMap, RunnablePassthrough } from "@langchain/core/runnables";

/**
 * Format the documents into a readable string.
 */
const formatDocs = (input: Record<string, any>): string => {
  const { docs } = input;
  return "\n\n" + docs.map((doc: Document) => `Article title: ${doc.metadata.title}\nArticle Snippet: ${doc.pageContent}`).join("\n\n");
};

// subchain for generating an answer once we've done retrieval
const answerChain = prompt.pipe(llm).pipe(new StringOutputParser());
const map = RunnableMap.from({
  question: new RunnablePassthrough(),
  docs: retriever,
});
// complete chain that calls the retriever -> formats docs to string -> runs answer subchain -> returns just the answer and retrieved docs.
const chain = map.assign({ context: formatDocs }).assign({ answer: answerChain }).pick(["answer", "docs"]);

await chain.invoke("How fast are cheetahs?");
```

```text
{
  answer: "Cheetahs are the fastest land animals on Earth. They can reach speeds as high as 75 mph or 120 km/h."... 124 more characters,
  docs: [
    Document {
      pageContent: "Contact Us − +\n" +
        "Address\n" +
        "Smithsonian's National Zoo & Conservation Biology Institute  3001 Connecticut"... 1343 more characters,
      metadata: {
        title: "Cheetah | Smithsonian's National Zoo and Conservation Biology Institute",
        source: "https://nationalzoo.si.edu/animals/cheetah",
        score: 0.96283,
        images: null
      }
    },
    Document {
      pageContent: "Now, their only hope lies in the hands of human conservationists, working tirelessly to save the che"... 880 more characters,
      metadata: {
        title: "How Fast Are Cheetahs, and Other Fascinating Facts About the World's ...",
        source: "https://www.discovermagazine.com/planet-
```

See a LangSmith trace [here](https://smith.langchain.com/public/bb0ed37e-b2be-4ae9-8b0d-ce2aff0b4b5e/r) that shows off the internals.

## Tool calling

### Cite documents
Let's try using [tool calling](/docs/how_to/tool_calling) to make the model specify which of the provided documents it's actually referencing when answering. LangChain has some utils for converting objects or [Zod](https://zod.dev) objects to the JSONSchema format expected by providers like OpenAI. We'll use the [`.withStructuredOutput()`](/docs/how_to/structured_output/) method to get the model to output data matching our desired schema:

```typescript
import { z } from "zod";

const llmWithTool1 = llm.withStructuredOutput(
  z.object({
    answer: z.string().describe("The answer to the user question, which is based only on the given sources."),
    citations: z.array(z.number()).describe("The integer IDs of the SPECIFIC sources which justify the answer.")
  }).describe("A cited source from the given text"),
  {
    name: "cited_answers"
  }
);

const exampleQ = `What is Brian's height?

Source: 1
Information: Suzy is 6'2"

Source: 2
Information: Jeremiah is blonde

Source: 3
Information: Brian is 3 inches shorter than Suzy`;

await llmWithTool1.invoke(exampleQ);
```

```text
{
  answer: `Brian is 6'2" - 3 inches = 5'11" tall.`,
  citations: [ 1, 3 ]
}
```

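See a LangSmith trace [here](https://smith.langchain.com/public/28736c75-122e-4deb-9916-55c73eea3167/r) that shows off the internals.

Now we're ready to put together our chain. Note that the source IDs passed to the model are the zero-based positions of the documents in the retrieved list: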

```typescript
import { Document } from "@langchain/core/documents";

const formatDocsWithId = (docs: Array<Document>): string => {
  return "\n\n" + docs.map((doc: Document, idx: number) => `Source ID: ${idx}\nArticle title: ${doc.metadata.title}\nArticle Snippet: ${doc.pageContent}`).join("\n\n");
};

// subchain for generating an answer once we've done retrieval
const answerChain1 = prompt.pipe(llmWithTool1);
const map1 = RunnableMap.from({
  question: new RunnablePassthrough(),
  docs: retriever,
});
// complete chain that calls the retriever -> formats docs to string -> runs answer subchain -> returns just the answer and retrieved docs.
const chain1 = map1
  .assign({ context: (input: { docs: Array<Document> }) => formatDocsWithId(input.docs) })
  .assign({ cited_answer: answerChain1 })
  .pick(["cited_answer", "docs"]);

await chain1.invoke("How fast are cheetahs?");
```

```text
{
  cited_answer: {
    answer: "Cheetahs can reach speeds as high as 75 mph or 120 km/h.",
    citations: [ 1, 2, 5 ]
  },
  docs: [
    Document {
      pageContent: "One of two videos from National Geographic's award-winning multimedia coverage of cheetahs in the ma"... 60 more characters,
      metadata: {
        title: "The Science of a Cheetah's Speed | National Geographic",
        source: "https://www.youtube.com/watch?v=icFMTB0Pi0g",
        score: 0.97858,
        images: null
      }
    },
    Document {
      pageContent: "The maximum speed cheetahs have been measured at is 114 km (71 miles) per hour, and they routinely r"... 1048 more characters,
      metadata: {
        title: "Cheetah | Description, Speed, Habitat, Diet, Cubs, & Facts",
        source: "https://www.britannica.com/animal/cheetah-mammal",
        score: 0.97213,
        images
```

See a LangSmith trace [here](https://smith.langchain.com/public/86814255-b9b0-4c4f-9463-e795c9961451/r) that shows off the internals.
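
Because `formatDocsWithId` numbers sources by array index, the IDs in `citations` are zero-based positions in `docs`. The sketch below shows one way to map them back to the cited documents. The `resolveCitations` helper and the `RetrievedDoc` and `CitedAnswer` types are hypothetical stand-ins (not LangChain APIs), and the snippet text is placeholder data.

```typescript
// Hypothetical stand-ins for the chain's output types (not LangChain APIs).
interface RetrievedDoc {
  pageContent: string;
  metadata: { title: string; source: string };
}

interface CitedAnswer {
  answer: string;
  citations: number[];
}

// Map each cited ID to the document at that index, skipping IDs that are
// out of range, non-integer, or repeated (models can return invalid IDs).
function resolveCitations(cited: CitedAnswer, docs: RetrievedDoc[]): RetrievedDoc[] {
  const seen = new Set<number>();
  const resolved: RetrievedDoc[] = [];
  for (const id of cited.citations) {
    const inRange = Number.isInteger(id) && id >= 0 && id < docs.length;
    if (!inRange || seen.has(id)) continue;
    seen.add(id);
    resolved.push(docs[id]);
  }
  return resolved;
}

// Illustrative data: titles and URLs taken from the output above; snippets are placeholders.
const sampleDocs: RetrievedDoc[] = [
  {
    pageContent: "placeholder snippet 0",
    metadata: {
      title: "The Science of a Cheetah's Speed | National Geographic",
      source: "https://www.youtube.com/watch?v=icFMTB0Pi0g",
    },
  },
  {
    pageContent: "placeholder snippet 1",
    metadata: {
      title: "Cheetah | Description, Speed, Habitat, Diet, Cubs, & Facts",
      source: "https://www.britannica.com/animal/cheetah-mammal",
    },
  },
];

const sampleAnswer: CitedAnswer = {
  answer: "Cheetahs can reach speeds as high as 75 mph or 120 km/h.",
  citations: [1, 1, 7], // a duplicate and an out-of-range ID, to exercise the guards
};

for (const doc of resolveCitations(sampleAnswer, sampleDocs)) {
  console.log(`${doc.metadata.title} (${doc.metadata.source})`);
}
```

Dropping invalid IDs silently is a design choice for this sketch; depending on your application, you may prefer to log them or retry the model call.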

### Cite snippets

What if we want to cite actual text spans? We can try to get our model to return these, too.

**Note**: If we break up our documents so that we have many documents of only a sentence or two each instead of a few long documents, citing documents becomes roughly equivalent to citing snippets. This may also be easier for the model, because it only needs to return an identifier for each snippet instead of the actual text. We recommend trying both approaches and evaluating.

```typescript
import { Document } from "@langchain/core/documents";

const citationSchema = z.object({
  sourceId: z.number().describe("The integer ID of a SPECIFIC source which justifies the answer."),
  quote: z.string().describe("The VERBATIM quote from the specified source that justifies the answer.")
});

const llmWithTool2 = llm.withStructuredOutput(
  z.object({
    answer: z.string().describe("The answer to the user question, which is based only on the given sources."),
    citations: z.array(citationSchema).describe("Citations from the given sources that justify the answer.")
  }), {
    name: "quoted_answer",
  });

const answerChain2 = prompt.pipe(llmWithTool2);
const map2 = RunnableMap.from({
  question: new RunnablePassthrough(),
  docs: retriever,
});
// complete chain that calls the retriever -> formats docs to string -> runs answer subchain -> returns just the answer and retrieved docs.
const chain2 = map2
  .assign({ context: (input: { docs: Array<Document> }) => formatDocsWithId(input.docs) })
  .assign({ quoted_answer: answerChain2 })
  .pick(["quoted_answer", "docs"]);

await chain2.invoke("How fast are cheetahs?");
```

```text
{
  quoted_answer: {
    answer: "Cheetahs can reach speeds of up to 120kph or 75mph, making them the world’s fastest land animals.",
    citations: [
      {
        sourceId: 5,
        quote: "Cheetahs can reach speeds of up to 120kph or 75mph, making them the world’s fastest land animals."
      },
      {
        sourceId: 1,
        quote: "The cheetah (Acinonyx jubatus) is the fastest land animal on Earth, capable of reaching speeds as hi"... 25 more characters
      },
      {
        sourceId: 3,
        quote: "The maximum speed cheetahs have been measured at is 114 km (71 miles) per hour, and they routinely r"... 72 more characters
      }
    ]
  },
  docs: [
    Document {
      pageContent: "Contact Us − +\n" +
        "Address\n" +
        "Smithsonian's National Zoo & Conservation Biology Institute  3001 Connecticut"... 1343 more characters,
      metadata: {
        titl
```

You can check out a LangSmith trace [here](https://smith.langchain.com/public/f0588adc-1914-45e8-a2ed-4fa028cea0e1/r) that shows off the internals.
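
The schema asks for VERBATIM quotes, but nothing guarantees the model complies, so it can be worth checking each quote against the source it claims to come from. The following is a minimal sketch of such a check, assuming zero-based `sourceId` values as produced by `formatDocsWithId`. The `checkQuotes` helper, the `normalize` function, and the types are hypothetical, and the sample texts are illustrative.

```typescript
// Hypothetical stand-ins for the chain's output types (not LangChain APIs).
interface Citation {
  sourceId: number;
  quote: string;
}

interface SourceDoc {
  pageContent: string;
}

type CheckedCitation = Citation & { verified: boolean };

// Collapse whitespace runs so line breaks in the source don't cause false mismatches.
const normalize = (text: string): string => text.replace(/\s+/g, " ").trim();

// Mark a citation as verified only when its (non-empty) quote appears in the cited source.
function checkQuotes(citations: Citation[], docs: SourceDoc[]): CheckedCitation[] {
  return citations.map((citation) => {
    const doc = docs[citation.sourceId];
    const quote = normalize(citation.quote);
    const verified =
      doc !== undefined && quote.length > 0 && normalize(doc.pageContent).includes(quote);
    return { ...citation, verified };
  });
}

// Illustrative data only.
const quoteDocs: SourceDoc[] = [
  { pageContent: "Cheetahs can reach speeds of up to 120kph\nor 75mph, making them the world's fastest land animals." },
  { pageContent: "The maximum speed cheetahs have been measured at is 114 km (71 miles) per hour." },
];

const quoteCitations: Citation[] = [
  { sourceId: 0, quote: "speeds of up to 120kph or 75mph" }, // present (after whitespace normalization)
  { sourceId: 1, quote: "Cheetahs run at 200 mph" },          // not in the source
  { sourceId: 5, quote: "114 km (71 miles) per hour" },       // source ID out of range
];

for (const checked of checkQuotes(quoteCitations, quoteDocs)) {
  console.log(checked.sourceId, checked.verified, checked.quote);
}
```

Exact substring matching is deliberately strict. If the model lightly paraphrases or changes punctuation, a fuzzy match may suit your use case better.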

## Direct prompting

Not all models support tool calling. We can achieve similar results with direct prompting. Let's see what this looks like using an older Anthropic chat model that is particularly proficient in working with XML.

### Setup

Install the LangChain Anthropic integration package:

```bash
npm install @langchain/anthropic
```

Add your Anthropic API key to your environment:

```bash
export ANTHROPIC_API_KEY=YOUR_KEY
```

```typescript
import { ChatAnthropic } from "@langchain/anthropic";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { XMLOutputParser } from "@langchain/core/output_parsers";
import { Document } from "@langchain/core/documents";
import { RunnableLambda, RunnablePassthrough, RunnableMap } from "@langchain/core/runnables";

const anthropic = new ChatAnthropic({
  model: "claude-instant-1.2",
  temperature: 0,
});
const system = `You're a helpful AI assistant. Given a user question and some web article snippets,
answer the user question and provide citations. If none of the articles answer the question, just say you don't know.

Remember, you must return both an answer and citations. A citation consists of a VERBATIM quote that
justifies the answer and the ID of the quote article. Return a citation for every quote across all articles
that justify the answer. Use the following format for your final output:

<cited_answer>
    <answer></answer>
    <citations>
        <citation><source_id></source_id><quote></quote></citation>
        <citation><source_id></source_id><quote></quote></citation>
        ...
    </citations>
</cited_answer>

Here are the web articles:{context}`;

const anthropicPrompt = ChatPromptTemplate.fromMessages([
  ["system", system],
  ["human", "{question}"]
]);

const formatDocsToXML = (docs: Array<Document>): string => {
  const formatted: Array<string> = [];
  docs.forEach((doc, idx) => {
    const docStr = `<source id="${idx}">
  <title>${doc.metadata.title}</title>
  <article_snippet>${doc.pageContent}</article_snippet>
</source>`;
    formatted.push(docStr);
  });
  return `\n\n<sources>${formatted.join("\n")}</sources>`;
};

const format3 = new RunnableLambda({
  func: (input: { docs: Array<Document> }) => formatDocsToXML(input.docs)
});
// Note: this redefines `answerChain` from the initial setup (notebook-style redeclaration).
const answerChain = anthropicPrompt
  .pipe(anthropic)
  .pipe(new XMLOutputParser())
  .pipe(
    new RunnableLambda({ func: (input: { cited_answer: any }) => input.cited_answer })
  );
const map3 = RunnableMap.from({
  question: new RunnablePassthrough(),
  docs: retriever,
});
const chain3 = map3.assign({ context: format3 }).assign({ cited_answer: answerChain }).pick(["cited_answer", "docs"]);

const res = await chain3.invoke("How fast are cheetahs?");

console.log(JSON.stringify(res, null, 2));
```

```text
{
  "cited_answer": [
    {
      "answer": "Cheetahs can reach top speeds of around 75 mph, but can only maintain bursts of speed for short distances before tiring."
    },
    {
      "citations": [
        {
          "citation": [
            {
              "source_id": "1"
            },
            {
              "quote": "Scientists calculate a cheetah's top speed is 75 mph, but the fastest recorded speed is somewhat slower."
            }
          ]
        },
        {
          "citation": [
            {
              "source_id": "3"
            },
            {
              "quote": "The maximum speed cheetahs have been measured at is 114 km (71 miles) per hour, and they routinely reach velocities of 80–100 km (50–62 miles) per hour while pursuing prey."
            }
          ]
        }
      ]
    }
  ],
  "docs": [
    {
      "pageContent": "One of two videos from National Geographic's award-winning multimedia coverage of cheetahs in the magazine's November 2012 
```

Check out this LangSmith trace [here](https://smith.langchain.com/public/e2e938e8-f847-4ea8-bc84-43d4eaf8e524/r) for more on the internals.
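
The parsed XML above is a nested array of single-key objects, which is awkward to consume downstream. Below is a minimal sketch of flattening it into the same `{ answer, citations }` shape used in the tool-calling examples, assuming the parsed value always has the structure shown in the output above. The `flattenCitedAnswer` and `findChild` helpers and the `XmlNode` type are hypothetical, not part of LangChain.

```typescript
// Hypothetical type mirroring the nested structure printed above:
// each node is an object whose value is either text or a list of child nodes.
type XmlNode = { [tag: string]: string | XmlNode[] };

interface FlatCitation {
  sourceId: number;
  quote: string;
}

interface FlatAnswer {
  answer: string;
  citations: FlatCitation[];
}

// Return the value of the first node in the list that carries the given tag.
function findChild(nodes: XmlNode[], tag: string): string | XmlNode[] | undefined {
  for (const node of nodes) {
    if (tag in node) return node[tag];
  }
  return undefined;
}

// Flatten the parsed <cited_answer> into a plain object, skipping malformed
// citations (missing fields or a non-numeric source ID).
function flattenCitedAnswer(nodes: XmlNode[]): FlatAnswer {
  const answer = findChild(nodes, "answer");
  const citations = findChild(nodes, "citations");
  const flat: FlatCitation[] = [];
  if (Array.isArray(citations)) {
    for (const entry of citations) {
      const fields = entry["citation"];
      if (!Array.isArray(fields)) continue;
      const sourceId = findChild(fields, "source_id");
      const quote = findChild(fields, "quote");
      if (typeof sourceId !== "string" || typeof quote !== "string") continue;
      const parsedId = Number.parseInt(sourceId, 10);
      if (Number.isNaN(parsedId)) continue;
      flat.push({ sourceId: parsedId, quote });
    }
  }
  return { answer: typeof answer === "string" ? answer.trim() : "", citations: flat };
}

// Input copied from the `cited_answer` value in the output above.
const parsedCitedAnswer: XmlNode[] = [
  { answer: "Cheetahs can reach top speeds of around 75 mph, but can only maintain bursts of speed for short distances before tiring." },
  {
    citations: [
      {
        citation: [
          { source_id: "1" },
          { quote: "Scientists calculate a cheetah's top speed is 75 mph, but the fastest recorded speed is somewhat slower." },
        ],
      },
      {
        citation: [
          { source_id: "3" },
          { quote: "The maximum speed cheetahs have been measured at is 114 km (71 miles) per hour, and they routinely reach velocities of 80–100 km (50–62 miles) per hour while pursuing prey." },
        ],
      },
    ],
  },
];

console.log(JSON.stringify(flattenCitedAnswer(parsedCitedAnswer), null, 2));
```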

## Retrieval post-processing

Another approach is to post-process our retrieved documents to compress the content, so that the source content is already minimal enough that we don't need the model to cite specific sources or spans. For example, we could break up each document into a sentence or two, embed those and keep only the most relevant ones. LangChain has some built-in components for this. Here we'll use a [`RecursiveCharacterTextSplitter`](/docs/how_to/recursive_text_splitter), which creates chunks of a specified size by splitting on separator substrings, and an [`EmbeddingsFilter`](/docs/how_to/contextual_compression), which keeps only the texts with the most relevant embeddings.

```typescript
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { EmbeddingsFilter } from "langchain/retrievers/document_compressors/embeddings_filter";
import { OpenAIEmbeddings } from "@langchain/openai";
import { DocumentInterface } from "@langchain/core/documents";
import { RunnableMap, RunnablePassthrough } from "@langchain/core/runnables";

const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 400,
  chunkOverlap: 0,
  separators: ["\n\n", "\n", ".", " "],
  keepSeparator: false,
});

const compressor = new EmbeddingsFilter({
  embeddings: new OpenAIEmbeddings(),
  k: 10,
});

// Split the retrieved documents into small chunks, then keep only the chunks most relevant to the question.
const splitAndFilter = async (input: Record<string, any>): Promise<Array<DocumentInterface>> => {
  const { docs, question } = input;
  const splitDocs = await splitter.splitDocuments(docs);
  const statefulDocs = await compressor.compressDocuments(splitDocs, question);
  return statefulDocs;
};

const retrieveMap = RunnableMap.from({
  question: new RunnablePassthrough(),
  docs: retriever,
});

// Named `compressionRetriever` so it does not redeclare the Tavily `retriever` used in `retrieveMap`.
const compressionRetriever = retrieveMap.pipe(splitAndFilter);
const docs = await compressionRetriever.invoke("How fast are cheetahs?");
for (const doc of docs) {
  console.log(doc.pageContent, "\n\n");
}
```

```text
The maximum speed cheetahs have been measured at is 114 km (71 miles) per hour, and they routinely reach velocities of 80–100 km (50–62 miles) per hour while pursuing prey.
cheetah,
(Acinonyx jubatus), 


The science of cheetah speed
The cheetah (Acinonyx jubatus) is the fastest land animal on Earth, capable of reaching speeds as high as 75 mph or 120 km/h. Cheetahs are predators that sneak up on their prey and sprint a short distance to chase and attack.
 Key Takeaways: How Fast Can a Cheetah Run?
Fastest Cheetah on Earth 


Built for speed, the cheetah can accelerate from zero to 45 in just 2.5 seconds and reach top speeds of 60 to 70 mph, making it the fastest land mammal! Fun Facts
Conservation Status
Cheetah News
Taxonomic Information
Animal News
NZCBI staff in Front Royal, Virginia, are mourning the loss of Walnut, a white-naped crane who became an internet sensation for choosing one of her keepers as her mate. 


The speeds attained by the cheetah may be only slightly greater th
```

See the LangSmith trace [here](https://smith.langchain.com/public/ae6b1f52-c1fe-49ec-843c-92edf2104652/r) to see the internals.
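
For intuition about the filtering step, the sketch below ranks chunks by cosine similarity to a query vector and keeps the top `k`. It illustrates the general idea only and is not `EmbeddingsFilter`'s actual implementation. The helper names and the two-dimensional vectors are made up for illustration; a real pipeline would obtain vectors from an embeddings model.

```typescript
// A chunk of text paired with its (here: hand-written, toy) embedding vector.
interface EmbeddedChunk {
  text: string;
  vector: number[];
}

// Cosine similarity between two equal-length vectors; returns 0 for zero vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every chunk against the query, sort by descending score, keep the first k.
function topKBySimilarity(query: number[], chunks: EmbeddedChunk[], k: number): EmbeddedChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}

// Toy data: the vectors are invented so that the first and third chunks point
// roughly in the query's direction and the second does not.
const toyQuery = [1, 0];
const toyChunks: EmbeddedChunk[] = [
  { text: "Cheetahs reach speeds as high as 75 mph.", vector: [0.9, 0.1] },
  { text: "NZCBI staff are mourning the loss of Walnut, a white-naped crane.", vector: [0, 1] },
  { text: "Cheetahs sprint a short distance to chase prey.", vector: [0.5, 0.5] },
];

for (const chunk of topKBySimilarity(toyQuery, toyChunks, 2)) {
  console.log(chunk.text);
}
```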

```typescript
const chain4 = retrieveMap
  // Replace the raw retrieved docs with the split-and-filtered chunks before formatting them.
  .assign({ docs: splitAndFilter })
  .assign({ context: formatDocs })
  .assign({ answer: answerChain })
  .pick(["answer", "docs"]);

// Note that each compressed chunk keeps the metadata (such as the title) of the article it was split from,
// and that `formatDocs` does not number the sources, so the model cites them by title here.
const res = await chain4.invoke("How fast are cheetahs?");

console.log(JSON.stringify(res, null, 2));
```

```text
{
  "answer": [
    {
      "answer": "\nCheetahs are the fastest land animals. They can reach top speeds between 75-81 mph (120-130 km/h). \n"
    },
    {
      "citations": [
        {
          "citation": [
            {
              "source_id": "Article title: How Fast Can a Cheetah Run? - ThoughtCo"
            },
            {
              "quote": "The science of cheetah speed\nThe cheetah (Acinonyx jubatus) is the fastest land animal on Earth, capable of reaching speeds as high as 75 mph or 120 km/h."
            }
          ]
        },
        {
          "citation": [
            {
              "source_id": "Article title: Cheetah - Wikipedia"
            },
            {
              "quote": "Scientists calculate a cheetah's top speed is 75 mph, but the fastest recorded speed is somewhat slower."
            }
          ]
        }
      ]
    }
  ],
  "docs": [
    {
      "pageContent": "The science of cheetah speed\nThe cheetah (Acinonyx jubatus) is the fastest l
```

Check out the LangSmith trace [here](https://smith.langchain.com/public/b767cca0-6061-4208-99f2-7f522b94a587/r) to see the internals.

## Next steps

You've now learned a few ways to return citations from your QA chains.

Next, check out some of the other guides in this section, such as [how to add chat history](/docs/how_to/qa_chat_history_how_to).