Improve the Easy RAG documentation #530

Merged (1 commit) on Apr 30, 2024
docs/modules/ROOT/pages/easy-rag.adoc (29 additions, 15 deletions)
include::./includes/attributes.adoc[]

The Quarkus LangChain4j project offers a separate extension named
`quarkus-langchain4j-easy-rag` which provides an extremely easy way to get a
RAG pipeline up and running. After adding this extension to your
application, all that is needed to ingest documents into an embedding store
is to add a dependency that provides an embedding model and to set a
single configuration property, `quarkus.langchain4j.easy-rag.path`, which
denotes a path in the local filesystem where your documents are stored. On
startup, Quarkus will automatically scan all files in the directory and
ingest them into an in-memory embedding store.
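
As a sketch, a minimal `application.properties` could contain just this one
property (the path shown matches the layout used by the sample project
mentioned below; point it at wherever your documents actually live):

[source,properties]
----
# Directory scanned at startup; every supported file found here
# is parsed and ingested into the embedding store
quarkus.langchain4j.easy-rag.path=src/main/resources/documents
----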

Apache Tika, a library for parsing various file formats, is used under the
hood, so your documents can be in any of its supported formats (plain text,
PDF, DOCX, HTML, etc.), including images containing text, which are parsed
using OCR (OCR requires the Tesseract library to be installed on your
system - see https://cwiki.apache.org/confluence/display/TIKA/TikaOCR).
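
For example, a documents directory freely mixing formats is fine; all of the
following would be picked up by the same pipeline (file names are made up
for illustration):

----
src/main/resources/documents/
├── handbook.pdf
├── faq.html
├── release-notes.txt
└── scanned-invoice.png   <1>
----
<1> Parsed via OCR, provided Tesseract is installed.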

You don't even need to add a persistent embedding store - the
`quarkus-langchain4j-easy-rag` extension will automatically register a
simple in-memory store for you if no other store is detected. You also don't
have to provide an implementation of `RetrievalAugmentor`, a basic default
[…]

still possible to use a persistent embedding store, such as Redis, by adding
the relevant extension (like `quarkus-langchain4j-redis`) to your
application.
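
For instance, switching to Redis amounts to adding the extension to your
`pom.xml` - a sketch, assuming the usual `io.quarkiverse.langchain4j` group
id and a hypothetical version property; check the extension documentation
for the exact coordinates:

[source,xml]
----
<!-- Replaces the default in-memory store with a Redis-backed embedding store -->
<dependency>
    <groupId>io.quarkiverse.langchain4j</groupId>
    <artifactId>quarkus-langchain4j-redis</artifactId>
    <version>${quarkus-langchain4j.version}</version>
</dependency>
----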

You have to add an extension that provides an embedding model. For that, you
can choose from a plethora of extensions such as
`quarkus-langchain4j-openai` and `quarkus-langchain4j-ollama`, or import an
xref:in-process-embedding.adoc[in-process embedding model] - in-process
models have the advantage of not having to send data over the wire.
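
For example, to get an embedding model from OpenAI, you could add the
corresponding extension - a sketch, assuming the usual
`io.quarkiverse.langchain4j` group id and a hypothetical version property:

[source,xml]
----
<!-- Provides an OpenAI-backed embedding model (and chat model) -->
<dependency>
    <groupId>io.quarkiverse.langchain4j</groupId>
    <artifactId>quarkus-langchain4j-openai</artifactId>
    <version>${quarkus-langchain4j.version}</version>
</dependency>
----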

NOTE: If you add two or more artifacts that provide embedding models,
Quarkus will ask you to choose one of them using the
`quarkus.langchain4j.embedding-model.provider` property.
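
In that case, the selection could look like the following - the value
`openai` here is only an assumption for illustration; use whichever provider
name Quarkus reports as available:

[source,properties]
----
# Hypothetical value - pick one of the provider names Quarkus suggests
quarkus.langchain4j.embedding-model.provider=openai
----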

== Getting started with a ready-to-use example

To see Easy RAG in action, use the project `samples/chatbot-easy-rag` in the
https://github.com/quarkiverse/quarkus-langchain4j[quarkus-langchain4j
repository]. Simply clone the repository, navigate to
`samples/chatbot-easy-rag`, declare the `QUARKUS_LANGCHAIN4J_OPENAI_API_KEY`
environment variable containing your OpenAI API key, and then start the
project using `mvn quarkus:dev`. When the application starts, navigate to
`localhost:8080` and ask any questions that the bot can answer (look into
the files in `samples/chatbot-easy-rag/src/main/resources/documents` to see
what documents the bot has ingested). For example, ask:

----
What are the benefits of a standard savings account?
----
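
The sample workflow described above can be condensed into a few commands
(a sketch; substitute your own OpenAI API key for the placeholder):

[source,shell]
----
git clone https://github.com/quarkiverse/quarkus-langchain4j.git
cd quarkus-langchain4j/samples/chatbot-easy-rag
export QUARKUS_LANGCHAIN4J_OPENAI_API_KEY=<your-api-key>   # placeholder
mvn quarkus:dev   # then open http://localhost:8080 in a browser
----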