# Example of RAG over a GitHub repository's code

In this example, we implement a RAG system where you can chat with your GitHub repository code.

**Please complete the `example_rag.ipynb` notebook first for more background.**

Let's go!

<p align="center">
  <img src="https://images.contentstack.io/v3/assets/bltac01ee6daa3a1e14/blt45d9c451c9a70269/6542d10b8b3f8e001b7aeead/img_blog_image_inline.png?width=1120&disable=upscale&auto=webp" alt="LlamaIndex Logo" width="20%">
  <img src="https://cdn.analyticsvidhya.com/wp-content/uploads/2023/07/langchain3.png" alt="Langchain Logo" width="20%">
  <img src="https://bookface-images.s3.amazonaws.com/logos/ee60f430e8cb6ae769306860a9c03b2672e0eaf2.png" alt="Ollama Logo" width="20%">
  <img src="https://cdn-icons-png.flaticon.com/256/25/25231.png" alt="GitHub Logo" width="20%">
</p>

Sources:

* https://lightning.ai/lightning-ai/studios/chat-with-your-code-using-rag

# Requirements

* Ollama installed locally




#### Install dependencies

In [None]:
!pip3 install llama_index
!pip3 install llama-index-readers-github
!pip3 install llama-index-embeddings-langchain
!pip3 install llama-index-llms-ollama

#### This cell prevents the kernel from crashing while downloading the GitHub repo content

In [1]:
# The GitHub reader downloads files through asyncio (loop.run_until_complete).
# Since the Jupyter kernel itself already runs on an event loop,
# nest_asyncio is needed to allow the nested call
import nest_asyncio

nest_asyncio.apply()

#### Let's download the repo code!

This example targets the repository configured below. Change the `owner` and `repo` values to target a different one.

In [2]:
GITHUB_ACCESS_TOKEN = "GITHUB_ACCESS_TOKEN"  # Replace with your own GitHub access token
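
Rather than pasting the token into the notebook, it can be read from an environment variable so it never ends up in version control. The following is a minimal sketch, not part of the original notebook: the helper name `read_github_token` and the sample token value are hypothetical, and the variable name simply mirrors the one used above.

```python
import os

def read_github_token(env_var="GITHUB_ACCESS_TOKEN", environ=None):
    """Return the token stored in `env_var`, or raise if it is missing or empty."""
    # `environ` defaults to the real process environment; a dict can be
    # passed instead, which keeps the helper easy to try out.
    source = os.environ if environ is None else environ
    token = source.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return token

# Demonstration with a fake environment (hypothetical value, not a real token).
demo_token = read_github_token(environ={"GITHUB_ACCESS_TOKEN": "example-token"})
print(demo_token)
```

In the notebook itself you would call `read_github_token()` with no arguments and assign the result to `GITHUB_ACCESS_TOKEN`.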

In [3]:
from llama_index.readers.github import GithubRepositoryReader, GithubClient

def initialize_github_client(github_token):
    return GithubClient(github_token)

github_client = initialize_github_client(GITHUB_ACCESS_TOKEN)

loader = GithubRepositoryReader(
    github_client,
    owner='sergiopaniego', # CHANGE
    repo='RAG_local_tutorial', # CHANGE
    filter_file_extensions=(
        [".ipynb"],
        GithubRepositoryReader.FilterType.INCLUDE,
    ),
    verbose=False,
    concurrent_requests=5,
)

docs = loader.load_data(branch="main")
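
To make the effect of the `filter_file_extensions` argument concrete, here is a small standard-library sketch of an include/exclude filter by file extension. It is only an illustration of the idea and not the reader's actual implementation; the function `filter_by_extension` and the sample file list are hypothetical.

```python
from pathlib import PurePosixPath

def filter_by_extension(paths, extensions, include=True):
    """Keep (include=True) or drop (include=False) paths whose suffix is listed."""
    wanted = {ext.lower() for ext in extensions}
    return [
        path
        for path in paths
        if (PurePosixPath(path).suffix.lower() in wanted) == include
    ]

# Hypothetical repository listing, used only for illustration.
repo_files = [
    "README.md",
    "example_rag.ipynb",
    "src/utils.py",
    "notebooks/github_rag.ipynb",
]

notebooks_only = filter_by_extension(repo_files, [".ipynb"], include=True)
everything_else = filter_by_extension(repo_files, [".ipynb"], include=False)
print(notebooks_only)
print(everything_else)
```

With the INCLUDE filter used above, only notebook files would be passed on to the index; switching to an exclude-style filter would keep everything except them.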

#### Load the embedding model

In [None]:
from llama_index.embeddings.langchain import LangchainEmbedding
from langchain.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name='BAAI/bge-base-en-v1.5')
embed_model = LangchainEmbedding(embeddings)

#### Store the code in a VectorStoreIndex using the loaded embedding model

In [5]:
from llama_index.core import Settings
from llama_index.core import VectorStoreIndex

# Create vector store and upload indexed data
Settings.embed_model = embed_model # Set the embedding model to be used
index = VectorStoreIndex.from_documents(docs)
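
Conceptually, a vector index stores one embedding per chunk and, at query time, returns the chunks whose embeddings are closest to the query embedding. The sketch below shows that retrieval step with plain Python and cosine similarity. It assumes cosine similarity as the distance measure and uses made-up three-dimensional vectors and chunk names; real embeddings have hundreds of dimensions, and this is not how the library is implemented internally.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunk_vecs, k):
    """Return the names of the k chunks most similar to the query."""
    scored = [
        (cosine_similarity(query_vec, vec), name)
        for name, vec in chunk_vecs.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]

# Toy embeddings (hypothetical values chosen for illustration).
toy_chunks = {
    "install cell": [0.9, 0.1, 0.0],
    "loader cell": [0.1, 0.9, 0.1],
    "query cell": [0.0, 0.2, 0.9],
}

best = top_k([0.0, 0.1, 1.0], toy_chunks, k=2)
print(best)
```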

#### Load the LLM to make the requests

In [6]:
from llama_index.llms.ollama import Ollama
from llama_index.core import Settings

# Select the LLM. You can change the model name below.
llm = Ollama(model="llama3", request_timeout=500.0)

# Generate a query engine from the previously created vector store index
Settings.llm = llm # Set the LLM to be used
query_engine = index.as_query_engine(streaming=True, similarity_top_k=4)
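
The query engine combines two steps: it retrieves the most similar chunks (four of them here, because of `similarity_top_k=4`) and then asks the LLM to answer using those chunks as context. The sketch below illustrates the second step, assembling a prompt from retrieved chunks. The function `build_prompt`, the prompt wording, and the sample chunks are hypothetical and are not the template LlamaIndex actually uses.

```python
def build_prompt(question, chunks):
    """Combine retrieved chunks and the user question into one prompt string."""
    context = "\n---\n".join(chunks)
    return (
        "Context information is below.\n"
        f"{context}\n"
        "Using only the context above, answer the question.\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Hypothetical retrieved chunks, used only for illustration.
retrieved_chunks = [
    "index = VectorStoreIndex.from_documents(docs)",
    "llm = Ollama(model=\"llama3\", request_timeout=500.0)",
]

prompt = build_prompt("Which LLM does the notebook use?", retrieved_chunks)
print(prompt)
```

Raising `similarity_top_k` gives the model more context at the cost of a longer prompt; lowering it keeps the prompt short but risks missing the relevant chunk.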

#### Let's chat with the code! :)

In [7]:
response = query_engine.query('What is this repository about?')
print(response)

This repository is likely about using language models, specifically the LLaMA model, to analyze and generate text related to agriculture and sustainability. The code in this repository appears to be focused on creating a query engine that can answer questions about agricultural practices, sustainable farming methods, and the integration of technologies into these systems. The goal seems to be to develop more efficient and environmentally friendly ways to produce food, while also providing better options for consumers and those in need.
