# Welcome
This workshop walks through building a Retrieval-Augmented Generation (RAG) application with Microsoft Fabric, Azure AI Services, and Azure OpenAI. We'll explore how Microsoft Fabric and SynapseML use Large Language Models (LLMs) to perform question-and-answer (Q&A) tasks on PDF documents. Together, Microsoft Fabric and SynapseML provide a suite of LLM tools that supports integration and in-depth data analysis, so you can extract insights from unstructured documents such as PDFs. Although this workshop works with PDF documents specifically, the techniques also apply to other document types. The workshop offers a guided approach inspired by https://moaw.dev/workshop/fabric-e2e-rag.

## Goals
In this workshop, you'll learn how to create a question-answering system through the following steps:
1.	**Preprocessing PDF documents:** Utilise [Azure AI Document Intelligence](https://azure.microsoft.com/en-us/products/ai-services/ai-document-intelligence), part of [Azure AI Services](https://azure.microsoft.com/en-us/products/ai-services), to prepare PDFs for analysis.
2.	**Text segmentation:** Break down the text into manageable sections using [SynapseML](https://microsoft.github.io/SynapseML/).
3.	**Creating embeddings:** Generate numerical representations for text chunks with [SynapseML](https://microsoft.github.io/SynapseML/) and [Azure AI Services](https://azure.microsoft.com/en-us/products/ai-services).
4.	**Storing embeddings:** Save these embeddings in [Azure AI Search](https://azure.microsoft.com/en-us/products/ai-services/ai-search) for easy access.
5.	**Query processing:** Generate embeddings for user queries.
6.	**Document retrieval:** Identify relevant documents based on query embeddings.
7.	**Answer generation:** Answer user questions using information from the relevant documents.
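
To make the flow of steps 2 to 7 concrete, here is a minimal, self-contained sketch in plain Python. It is not the workshop's implementation: `toy_embed` is a hypothetical stand-in for an embedding model (a hashed bag of words), an in-memory list stands in for Azure AI Search, and the sample text, function names, and chunk size are all assumptions chosen for illustration.

```python
import math
import zlib

EMBEDDING_DIM = 64  # toy size for illustration only


def split_into_chunks(text: str, chunk_size: int = 60) -> list[str]:
    """Step 2: greedily pack whole words into chunks of about chunk_size characters."""
    chunks: list[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if current and len(candidate) > chunk_size:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def toy_embed(text: str) -> list[float]:
    """Steps 3 and 5: hypothetical stand-in for an embedding model (hashed bag of words)."""
    vector = [0.0] * EMBEDDING_DIM
    for word in text.lower().split():
        token = word.strip(".,?!")
        if token:
            vector[zlib.crc32(token.encode("utf-8")) % EMBEDDING_DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vector)) or 1.0  # avoid dividing by zero
    return [v / norm for v in vector]


def build_index(chunks: list[str]) -> list[tuple[str, list[float]]]:
    """Steps 3 and 4: embed each chunk and keep (chunk, vector) pairs in memory."""
    return [(chunk, toy_embed(chunk)) for chunk in chunks]


def retrieve(query: str, index: list[tuple[str, list[float]]], top_k: int = 1) -> list[str]:
    """Step 6: rank stored chunks by cosine similarity to the query embedding."""
    query_vec = toy_embed(query)
    # Vectors are unit length, so the dot product equals the cosine similarity.
    scored = [(sum(q * c for q, c in zip(query_vec, vec)), chunk) for chunk, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Step 7: assemble the prompt that would be sent to a chat model."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Sample text invented for this sketch; a real run would use text extracted from PDFs.
document = (
    "Microsoft Fabric notebooks run on Apache Spark. "
    "Azure AI Search stores vectors for retrieval. "
    "Embeddings map text to numeric vectors."
)
chunks = split_into_chunks(document)
index = build_index(chunks)
question = "Where are vectors stored for retrieval?"
prompt = build_prompt(question, retrieve(question, index, top_k=1))
print(prompt)
```

In the workshop itself, each of these stand-ins is replaced by a managed service: Document Intelligence for text extraction, an Azure OpenAI embedding model for the vectors, Azure AI Search for storage and retrieval, and a chat model for the final answer.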

Throughout the workshop, we'll highlight the unique features of [Microsoft Fabric](https://app.fabric.microsoft.com/) and [SynapseML](https://microsoft.github.io/SynapseML/) that distinguish them in question-answering over PDF documents. These include:

* **Scalability and Performance:** We'll demonstrate how Microsoft Fabric and SynapseML handle large-scale data processing through simple, composable, distributed APIs built on top of the Apache Spark distributed computing framework, which makes them suitable for enterprise-level applications.
* **Seamless Integration:** We'll show how Microsoft Fabric and SynapseML work together, so you can combine the strengths of both tools for Q&A over PDF documents.

### RAG Overview
![rag overview](https://github.com/luisdza/rag-fabric-workbook/raw/main/images/rag-overview.png)

![image-alt-text](https://miro.medium.com/v2/resize:fit:720/format:webp/1*sAJdxEsDjsPMioHyzlN3_A.png)

OpenAI's text-embedding-ada-002 model (ada-002) generates embeddings of length 1536. This means it transforms input text into vectors that are 1536 elements long. Each element is a number, and the entire vector represents the input in a high-dimensional space. This numerical representation captures characteristics and patterns of the input, which makes it useful for computational tasks such as similarity comparison, clustering, and machine learning model training.
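
As a rough illustration of similarity comparison on vectors of this size, the sketch below computes cosine similarity between 1536-element vectors. The vectors are randomly generated stand-ins, not real ada-002 output, and names such as `cosine_similarity`, `doc_vector`, and `near_vector` are assumptions made for this example.

```python
import math
import random

EMBEDDING_DIM = 1536  # vector length produced by ada-002


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Random stand-ins for embeddings; a real pipeline would get these from the model.
doc_vector = [rng.uniform(-1, 1) for _ in range(EMBEDDING_DIM)]
near_vector = [x + rng.uniform(-0.05, 0.05) for x in doc_vector]  # slightly perturbed copy
unrelated_vector = [rng.uniform(-1, 1) for _ in range(EMBEDDING_DIM)]

print("near:     ", cosine_similarity(doc_vector, near_vector))
print("unrelated:", cosine_similarity(doc_vector, unrelated_vector))
```

The perturbed copy scores close to 1, while the independent random vector scores close to 0. Retrieval relies on the same idea: text chunks whose embeddings point in nearly the same direction as the query embedding are treated as the most relevant.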

# Prerequisites
* An active Azure account.
* A Microsoft Fabric license.
* A workspace within Microsoft Fabric.
* Permission to use the Azure OpenAI API.
* A web browser.
* Familiarity with Python programming.
