# Document loaders as retrievers

## Concept

A retriever is very similar to a document loader.

The biggest difference is scale: document loaders are usually used for downloading many documents, whereas retrievers are used for downloading a small number of documents.

Retrievers take a `query` in every retrieve function. Document loaders use a `query` only optionally because they target a large number of downloaded documents.

That is why a document loader also defines `query` as an attribute, rather than only as an argument to every function. If both are set, the `query` passed as a function argument takes priority over the `query` class attribute.
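The priority rule can be sketched in a few lines. This is a minimal illustration, not LangChain code: `SketchLoader` and `resolve_query` are hypothetical names introduced only to show the argument overriding the attribute.

```python
from typing import Optional


class SketchLoader:
    """Hypothetical loader used only for illustration; not a LangChain class."""

    def __init__(self, query: Optional[str] = None) -> None:
        # Default query stored on the instance.
        self.query = query

    def resolve_query(self, query: Optional[str] = None) -> str:
        # The per-call argument wins; otherwise fall back to the attribute.
        effective = query if query is not None else self.query
        if effective is None:
            raise ValueError("no query given as argument or as attribute")
        return effective


loader = SketchLoader(query="1605.08386")
print(loader.resolve_query())                   # falls back to the attribute
print(loader.resolve_query("Caprice Stanley"))  # the argument takes priority
```

Falling back only when the argument is `None` keeps the attribute useful as a default for bulk loading while still allowing a different query on each retrieval call.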

## Any Document loader works as a Retriever

If we add a new integration to LangChain, we don't have to implement the `get_relevant_documents` function for the new document loader. We just implement the `load` function, and this allows us to use `get_relevant_documents`.

Any document loader can be used as a retriever. We just have to make sure that `get_relevant_documents` returns a small number of documents.

**Note:** Make sure that the document loader implements the query semantics, that is, that the document loader class really uses the `query` parameter.
Otherwise, the `query` argument of `get_relevant_documents` is just a placeholder and does not change the downloaded dataset.
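The following sketch shows one way this delegation could look, and why the note matters. It is an assumption-laden illustration rather than LangChain's actual implementation: `Doc`, `LoaderAsRetriever`, `QueryAwareLoader`, `QueryBlindLoader`, and the small in-memory `CORPUS` (including its placeholder third entry) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a document: text plus metadata."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


# Tiny in-memory corpus used instead of a remote service (illustrative data).
CORPUS = [
    Doc("Heat-bath random walks with Markov bases",
        {"Authors": "Caprice Stanley, Tobias Windisch"}),
    Doc("On Mixing Behavior of a Family of Random Walks Determined by a Linear Recurrence",
        {"Authors": "Caprice Stanley, Seth Sullivant"}),
    Doc("Placeholder document", {"Authors": "Placeholder Author"}),
]


class LoaderAsRetriever:
    """Hypothetical base class: a subclass implements only `load`."""

    def __init__(self, query: Optional[str] = None, load_max_docs: int = 2) -> None:
        self.query = query
        self.load_max_docs = load_max_docs

    def load(self, query: Optional[str] = None) -> List[Doc]:
        raise NotImplementedError

    def get_relevant_documents(self, query: str) -> List[Doc]:
        # Retrieval is just `load` with the per-call query passed through.
        return self.load(query=query)


class QueryAwareLoader(LoaderAsRetriever):
    def load(self, query: Optional[str] = None) -> List[Doc]:
        # Argument first, attribute second; no query means "everything".
        effective = query if query is not None else self.query
        hits = [
            d for d in CORPUS
            if effective is None or effective.lower() in d.metadata["Authors"].lower()
        ]
        return hits[: self.load_max_docs]  # keep the result set small


class QueryBlindLoader(LoaderAsRetriever):
    def load(self, query: Optional[str] = None) -> List[Doc]:
        # Ignores `query`, so the argument is only a placeholder here.
        return CORPUS[: self.load_max_docs]


aware = QueryAwareLoader()
blind = QueryBlindLoader()
print([d.page_content for d in aware.get_relevant_documents("Seth Sullivant")])
print([d.page_content for d in blind.get_relevant_documents("Seth Sullivant")])
```

The query-aware loader narrows its result to the matching paper, while the query-blind loader returns the same documents for any query, which is exactly the situation the note warns about.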

## Example

In [16]:
from langchain.document_loaders import ArxivLoader

Let's first use `ArxivLoader` directly as a document loader.

**Note:** `ArxivLoader` uses the `query` parameter for downloading documents.

In [21]:
loader = ArxivLoader(query="1605.08386", load_max_docs=2)

In [22]:
docs = loader.load()
len(docs)

1

In [9]:
docs[0].metadata  # meta-information of the Document

{'Published': '2016-05-26',
 'Title': 'Heat-bath random walks with Markov bases',
 'Authors': 'Caprice Stanley, Tobias Windisch',
 'Summary': 'Graphs on lattice points are studied whose edges come from a finite set of\nallowed moves of arbitrary length. We show that the diameter of these graphs on\nfibers of a fixed integer matrix can be bounded from above by a constant. We\nthen study the mixing behaviour of heat-bath random walks on these graphs. We\nalso state explicit conditions on the set of moves so that the heat-bath random\nwalk, a generalization of the Glauber dynamics, is an expander in fixed\ndimension.'}

In [28]:
docs[0].page_content[:400]  # the Document content

'arXiv:1605.08386v1  [math.CO]  26 May 2016\nHEAT-BATH RANDOM WALKS WITH MARKOV BASES\nCAPRICE STANLEY AND TOBIAS WINDISCH\nAbstract. Graphs on lattice points are studied whose edges come from a ﬁnite set of\nallowed moves of arbitrary length. We show that the diameter of these graphs on ﬁbers of a\nﬁxed integer matrix can be bounded from above by a constant. We then study the mixing\nbehaviour of heat-b'

We can also use `ArxivLoader` as a retriever.

For that, we have to pass a `query` in every call of the `get_relevant_documents` function:

In [23]:
retrieved_docs = loader.get_relevant_documents(query='Caprice Stanley')
len(retrieved_docs)

2

In [26]:
retrieved_docs[0].metadata  # meta-information of the Document

{'Published': '2017-10-10',
 'Title': 'On Mixing Behavior of a Family of Random Walks Determined by a Linear Recurrence',
 'Authors': 'Caprice Stanley, Seth Sullivant',
 'Summary': 'We study random walks on the integers mod $G_n$ that are determined by an\ninteger sequence $\\{ G_n \\}_{n \\geq 1}$ generated by a linear recurrence\nrelation. Fourier analysis provides explicit formulas to compute the\neigenvalues of the transition matrices and we use this to bound the mixing time\nof the random walks.'}

In [27]:
retrieved_docs[1].metadata  # meta-information of the Document

{'Published': '2016-05-26',
 'Title': 'Heat-bath random walks with Markov bases',
 'Authors': 'Caprice Stanley, Tobias Windisch',
 'Summary': 'Graphs on lattice points are studied whose edges come from a finite set of\nallowed moves of arbitrary length. We show that the diameter of these graphs on\nfibers of a fixed integer matrix can be bounded from above by a constant. We\nthen study the mixing behaviour of heat-bath random walks on these graphs. We\nalso state explicit conditions on the set of moves so that the heat-bath random\nwalk, a generalization of the Glauber dynamics, is an expander in fixed\ndimension.'}