# 🥅 Goal

In this notebook, we'll undertake a series of experiments to understand and explore the concepts and techniques behind Retrieval-Augmented Generation.

To get started, make a copy of this notebook in your Google Drive and add your own code to capture your explorations and findings.

As you work through the experiments, choose one topic that particularly interests you and dive deeper into it. You do not have to explore every experiment in depth. Prepare a concise lightning talk on your chosen topic to present during Week 3.

Prioritize finishing and deploying the assignment at the bottom of the notebook so you get more experience deploying a RAG-based LLM application.

The goal is to gain a solid understanding of Retrieval-Augmented Generation and share your insights with others in Discord. Happy exploring!

# 🧪 Experiment 1: How do I choose my text splitter?

LangChain supports a number of [text splitters](https://python.langchain.com/docs/modules/data_connection/document_transformers/). As the documentation notes, the recursive splitter (`RecursiveCharacterTextSplitter`) is the recommended starting point for splitting generic text.

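`RecursiveCharacterTextSplitter` takes an ordered list of separators. By default this is `["\n\n", "\n", " ", ""]`: paragraphs, then lines, then words, then individual characters. It first splits on the first separator. Any piece that is still larger than `chunk_size` is split again with the next separator, and so on down the list.

Because it tries the coarsest natural boundaries first, the splitter keeps paragraphs, then lines, then words together for as long as possible. The resulting chunks are more likely to be self-contained and semantically meaningful, and that matters for how well a RAG system retrieves relevant information.

Consecutive chunks can also share text, controlled by `chunk_overlap`. Chunks are assembled from these natural pieces, so the overlapping region tends to consist of whole words or lines rather than arbitrary fragments. The overlap preserves continuity across chunk boundaries. Context that would otherwise be cut off at the end of one chunk is repeated at the start of the next, and this can improve the quality of generated responses compared with simpler splitting methods.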
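The recursion is easier to follow in code. What follows is a minimal, simplified re-implementation of the idea in plain Python. It is not LangChain's code. `recursive_split` is a hypothetical helper, and it makes two simplifications: it ignores `chunk_overlap`, and it drops the separator at chunk boundaries.

```python
def recursive_split(text: str, chunk_size: int,
                    separators: tuple[str, ...] = ("\n\n", "\n", " ", "")) -> list[str]:
    """Split text into chunks of at most chunk_size characters,
    preferring earlier (coarser) separators over later (finer) ones."""
    if len(text) <= chunk_size:
        return [text] if text else []
    # Out of separators (or reached ""): fall back to a hard character split.
    if not separators or separators[0] == "":
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, rest = separators[0], separators[1:]
    chunks: list[str] = []
    current = ""
    for part in text.split(sep):
        candidate = part if not current else current + sep + part
        if len(candidate) <= chunk_size:
            current = candidate  # keep packing pieces into the current chunk
            continue
        if current:
            chunks.append(current)  # current chunk is full; emit it
            current = ""
        if len(part) <= chunk_size:
            current = part  # start a new chunk with this piece
        else:
            # This piece is too big on its own: recurse with the next, finer separator.
            chunks.extend(recursive_split(part, chunk_size, rest))
    if current:
        chunks.append(current)
    return chunks


text = "First paragraph here.\n\nSecond paragraph is a bit longer than the first one."
for chunk in recursive_split(text, chunk_size=30):
    print(repr(chunk))
# 'First paragraph here.'
# 'Second paragraph is a bit'
# 'longer than the first one.'
```

The first paragraph fits in one chunk. The second is 52 characters, too long for a 30-character chunk, so it falls through to the space separator and is split between words. For real work, use LangChain's `RecursiveCharacterTextSplitter(chunk_size=..., chunk_overlap=...)` and its `split_text` method.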



# 🧪 Experiment 2: How do I know what `chunk_size` and `chunk_overlap` to choose?

[ChunkViz](https://chunkviz.up.railway.app/) is a nifty tool you can use to experiment with different `chunk_size` and `chunk_overlap`.

Questions

Experiment with different types of textual data (e.g., prose, HTML, Markdown, JSON).
* What do you notice?
* When do you notice a difference in performance based on how you set these parameters?

Experiment with different types of non-textual data (e.g., audio, video, images).
* How do you chunk non-textual data? What approaches and libraries exist?
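The simplest strategy is fixed-size chunking with overlap, and it helps build intuition for what ChunkViz visualizes. What follows is a minimal sketch of it. `fixed_size_chunks` is a hypothetical helper, not a LangChain API. It works on any sliceable sequence, so the same sliding-window idea is also one possible way to chunk non-textual data such as a list of audio samples.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def fixed_size_chunks(seq: Sequence[T], chunk_size: int, chunk_overlap: int) -> list[Sequence[T]]:
    """Slide a window of chunk_size items forward by (chunk_size - chunk_overlap) items."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    if len(seq) == 0:
        return []
    step = chunk_size - chunk_overlap
    # A window starting at or beyond len(seq) - chunk_overlap would only repeat
    # items the previous window already covers, so stop there.
    last_start = max(len(seq) - chunk_overlap, 1)
    return [seq[i:i + chunk_size] for i in range(0, last_start, step)]


print(fixed_size_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']

samples = list(range(10))  # stand-in for audio samples
print(fixed_size_chunks(samples, chunk_size=4, chunk_overlap=2))
# [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```

For audio sampled at 16 kHz, 1-second windows with a 0.25-second overlap would be `chunk_size=16_000` and `chunk_overlap=4_000` (0.25 × 16,000 samples).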

# 🧪 Experiment 3: How do I choose an embedding model?

Research and experiment with different types of embedding models.

* How much do they cost?
* How do open-source embedding models compare with paid embedding models?
* Are there differences in stability? In latency?
* Do some embedding models perform better on a specific type of data?
* What types of customization do different embedding models offer?
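A small retrieval test on your own data is one way to compare embedding models. Embed a handful of documents and of queries whose correct answers you have labeled, then measure how often the right document lands in the top k results (recall@k). In the sketch below, `toy_embed` is a bag-of-words stand-in, not a real embedding model. In practice you would pass a function that calls the model you are evaluating. All names here are hypothetical.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recall_at_k(embed: Callable[[str], Vector], queries: dict[str, str],
                docs: dict[str, str], relevant: dict[str, str], k: int = 1) -> float:
    """Fraction of queries whose relevant doc id appears in the top-k by cosine similarity."""
    doc_vecs = {doc_id: embed(text) for doc_id, text in docs.items()}
    hits = 0
    for query_id, query_text in queries.items():
        q = embed(query_text)
        ranked = sorted(doc_vecs, key=lambda d: cosine(q, doc_vecs[d]), reverse=True)
        if relevant[query_id] in ranked[:k]:
            hits += 1
    return hits / len(queries)


# Toy stand-in embedder: counts of a few vocabulary words (NOT a real embedding model).
VOCAB = ["space", "alien", "love", "romance", "robot", "wedding"]


def toy_embed(text: str) -> list[float]:
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]


docs = {
    "d1": "an alien invasion in deep space",
    "d2": "a romance that ends in a wedding",
    "d3": "a robot learns about love",
}
queries = {"q1": "space alien movie", "q2": "wedding romance"}
relevant = {"q1": "d1", "q2": "d2"}

print(recall_at_k(toy_embed, queries, docs, relevant, k=1))  # 1.0
```

To compare models, run the same labeled queries through each model's embedding function. Wrapping those calls with a timer gives you a rough latency comparison as well.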

# 🧪 Experiment 4: How much time does using `CacheBackedEmbeddings` save you?

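`CacheBackedEmbeddings` wraps an embedding model and stores each computed embedding in a key-value store, keyed by a hash of the text. Once an embedding is computed, any subsequent request for the same text retrieves it directly from the cache and skips the computation step. By default, only document embeddings (`embed_documents`) are cached. Query embeddings are cached only if you also configure a `query_embedding_cache`.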

* How do we measure that this caching is working?
* Does it truly reduce the time for returning matching documents?
* What happens if we ask the question a different way?
* Do we still benefit from caching?
* How much do we have to change the query for it to miss the cache?
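You can approach the questions above by wrapping a slow embedding function in a hash-keyed cache, then recording hits and misses alongside wall-clock timings. `CachedEmbedder` and `slow_fake_embed` are hypothetical stand-ins that mimic the idea behind `CacheBackedEmbeddings`. They are not LangChain's implementation.

```python
import hashlib
import time
from typing import Callable


class CachedEmbedder:
    """Hypothetical stand-in for a cache-backed embedder: memoizes embeddings by text hash."""

    def __init__(self, embed_fn: Callable[[str], list[float]]):
        self.embed_fn = embed_fn
        self.store: dict[str, list[float]] = {}  # in-memory key-value store
        self.hits = 0
        self.misses = 0

    def embed(self, text: str) -> list[float]:
        key = hashlib.sha1(text.encode("utf-8")).hexdigest()
        if key in self.store:
            self.hits += 1
        else:
            self.misses += 1
            self.store[key] = self.embed_fn(text)
        return self.store[key]


def slow_fake_embed(text: str) -> list[float]:
    time.sleep(0.1)  # simulate model or network latency
    return [float(len(text)), float(text.count(" "))]


def timed(fn: Callable[[str], list[float]], text: str) -> float:
    """Return the wall-clock seconds taken by fn(text)."""
    start = time.perf_counter()
    fn(text)
    return time.perf_counter() - start


embedder = CachedEmbedder(slow_fake_embed)
first = timed(embedder.embed, "What movies feature time travel?")
repeat = timed(embedder.embed, "What movies feature time travel?")
reworded = timed(embedder.embed, "Which films involve time travel?")
print(f"first={first:.3f}s repeat={repeat:.3f}s reworded={reworded:.3f}s")
print(f"hits={embedder.hits} misses={embedder.misses}")  # hits=1 misses=2
```

In this sketch the key is a hash of the exact string, so only an identical text hits the cache. You can time real `CacheBackedEmbeddings` calls, or a full retriever query, in the same way to see how much of the end-to-end latency the cache actually removes.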

# 🧪 Experiment 5: Improve the performance of `Mr-TD/RAG-PDF-QnA-ChatBot`

Clone the [Hugging Face space](https://huggingface.co/spaces/Mr-TD/RAG-PDF-QnA-ChatBot/tree/main) `Mr-TD/RAG-PDF-QnA-ChatBot` referenced in the Week 2 lecture and try to make it more performant.
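Before changing anything, record a baseline so you can tell whether a change actually made the Space faster. The sketch below times a question-answering function over a few questions. `answer_question` is a hypothetical placeholder for the Space's own retrieve-and-generate call, and `benchmark` is a hypothetical helper. It reports median and max latency because a single timing can be noisy.

```python
import statistics
import time
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class LatencyStats:
    runs: int
    median_s: float
    max_s: float


def benchmark(answer_fn: Callable[[str], object], questions: Sequence[str],
              repeats: int = 3) -> LatencyStats:
    """Call answer_fn on every question `repeats` times and summarize wall-clock latency."""
    latencies: list[float] = []
    for _ in range(repeats):
        for question in questions:
            start = time.perf_counter()
            answer_fn(question)
            latencies.append(time.perf_counter() - start)
    return LatencyStats(runs=len(latencies),
                        median_s=statistics.median(latencies),
                        max_s=max(latencies))


def answer_question(question: str) -> str:
    # Hypothetical placeholder: replace with the Space's actual QA function.
    time.sleep(0.05)
    return f"(answer to: {question})"


baseline = benchmark(answer_question, ["What is this PDF about?", "Who is the author?"])
print(baseline)
```

The first call may include one-time costs such as model loading. Consider making a warm-up call before timing, and rerun the same benchmark after each change.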

# 🧪 Experiment 6: Implement RAG for Movie Recommendations using Haystack 2.0 or another library

There are several libraries available for implementing a Retrieval-Augmented Generation (RAG) system. Each has its own strengths and weaknesses.

[Haystack 2.0](https://docs.haystack.deepset.ai/docs/intro) is designed for building production-ready large language model (LLM) applications, with features like:

- Support for a variety of document stores, retrieval methods, and LLMs
- Scalability to handle large datasets and high query volumes
- Customizable pipelines for query processing, retrieval, and generation
- Detailed documentation and active community support

On the other hand, libraries like LangChain are often considered easier to get started with for rapid prototyping, but may be less robust for production-scale LLM applications.

For this experiment, try implementing your movie recommendation RAG system using [Haystack 2.0](https://docs.haystack.deepset.ai/docs/intro) or another library of your choice.
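Before picking a library, it may help to see the retrieve-then-augment flow on its own. Below is a library-agnostic sketch of it. Simple keyword-overlap scoring stands in for the embedding or BM25 retrieval a real library would provide. The movie data, `tokenize`, `retrieve`, and `build_prompt` are all hypothetical, and the final LLM call is left out.

```python
from dataclasses import dataclass

STOPWORDS = {"a", "an", "the", "and", "in", "on", "to", "for", "of", "about", "me", "with"}


@dataclass
class Movie:
    title: str
    overview: str


MOVIES = [
    Movie("Interstellar", "Astronauts travel through a wormhole to find a new home for humanity."),
    Movie("The Notebook", "A poor young man and a rich young woman fall in love in 1940s South Carolina."),
    Movie("WALL-E", "A lonely trash-compacting robot on an abandoned Earth falls in love."),
]


def tokenize(text: str) -> set[str]:
    """Lowercase, strip punctuation at word edges, and drop stopwords."""
    words = {w.strip(".,!?'\"").lower() for w in text.split()}
    return words - STOPWORDS - {""}


def retrieve(query: str, movies: list[Movie], k: int = 2) -> list[Movie]:
    """Rank movies by how many query words appear in their title or overview."""
    query_words = tokenize(query)
    ranked = sorted(movies,
                    key=lambda m: len(query_words & tokenize(f"{m.title} {m.overview}")),
                    reverse=True)
    return ranked[:k]


def build_prompt(query: str, candidates: list[Movie]) -> str:
    """Augment the user's request with the retrieved movies as context."""
    context = "\n".join(f"- {m.title}: {m.overview}" for m in candidates)
    return (
        "Recommend movies using only the candidates below.\n"
        f"Candidates:\n{context}\n\n"
        f"Request: {query}\nRecommendation:"
    )


query = "a space movie about astronauts and a wormhole"
prompt = build_prompt(query, retrieve(query, MOVIES))
print(prompt)
# The prompt would then be sent to the LLM (generator) of your chosen library.
```

In Haystack 2.0, these steps map roughly onto a retriever component and a prompt builder component connected in a `Pipeline`, followed by a generator component. Check the Haystack documentation for the exact component names and parameters.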

As you work with the selected library, ask yourself the following questions:

1. How easy is the library to set up and install?
2. What is the quality and completeness of the documentation?
3. Is the API well-designed, intuitive, and easy to learn?
4. How active and supportive is the library's community?
5. Does the library offer flexibility and customization options?
6. How well does it integrate with your existing data and models?
7. Are all essential aspects of usage covered in the documentation?

# 💻 Assignment: Deploy a RAG chatbot to production

## Deploying your project as a chatbot

Now that you have developed and tested your movie question answering system, the next step is to deploy it so that it can be accessed by users anywhere.

You will deploy to Hugging Face Spaces using [Chainlit](https://chainlit.io), an open-source Python framework for building chat interfaces for LLM applications.


## Getting started with [Chainlit](https://chainlit.io) and Hugging Face Spaces

[Chainlit](https://chainlit.io) is designed to make it straightforward to build and deploy chat-based LLM applications in Python.

Here are some resources to help you get started with deploying your project:

1. **Explore Chainlit documentation**: Begin by exploring the [Chainlit Documentation](https://chainlit.io/docs). This resource provides comprehensive guidance on how to use Chainlit, from installation to deployment.

2. **Learn about Hugging Face Spaces**: Hugging Face Spaces is a platform for hosting ML models and apps. Learn more by visiting [Hugging Face Spaces](https://huggingface.co/spaces). Chainlit apps can be hosted there (typically as a Docker Space), which makes it a practical choice for deploying your chatbot.

3. **Browse example projects**: It can also be helpful to look at example projects on Hugging Face Spaces to see how others have structured their deployments. This can provide inspiration and practical insights for your own deployment.


## Deploying your chatbot

To deploy your RAG question answering system as a chatbot, follow these steps:

1. **Prepare your project**: Create a folder for your project and make sure it is organized and well documented. This includes clear code comments, a `requirements.txt` file listing your dependencies, and a README with instructions.

2. **Set up Chainlit**: Install Chainlit on your local machine or development environment (for example, with `pip install chainlit`), and check that it runs with `chainlit hello`.

3. **Build and deploy your chatbot**: Using the resources above, figure out how to deploy your chatbot to a Hugging Face Space.

# 📝 Submission

Submit your Week 2 experiment notebook using [this form](https://docs.google.com/forms/d/1l935d2L3YN3Kj3ovNf3CKWB2EyxvDMkYY_sYte-NYWI/edit).

Please make sure the notebook's sharing settings allow anyone with the link to view it.

Note: In Week 3, we'll do lightning presentations where each student will share the results of their experiment.
