# Chat with Code
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Y7fr7_CCprlW8yb7BTQttUGAxEfLAAd0)

This notebook demonstrates how to build a **Retrieval-Augmented Generation (RAG)** system for interacting with GitHub repositories using Nebius AI and LlamaIndex.

Nebius AI provides access to many state-of-the-art LLMs. See the full list of models in [Nebius AI Studio](https://studio.nebius.ai/).

Visit https://studio.nebius.ai/ and sign up to get an API key.





## Install Dependencies

In [None]:
%pip install llama-index-llms-nebius llama-index-embeddings-nebius llama-index-readers-github

In [None]:
!pip install -U llama-index

## Setting Up Environment Variables

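For security reasons, store API keys in environment variables instead of hardcoding them. The strings in the next cell are placeholders: replace them with your own keys, and avoid sharing a notebook that contains real keys.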

In [None]:
# Set the API keys as environment variables (replace the placeholders).
import os

os.environ["GITHUB_TOKEN"] = "Your Github Token"
os.environ["NEBIUS_API_KEY"] = "Your Nebius API Key"
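
The cell above still writes placeholder strings into the notebook. As a minimal sketch of a stricter alternative, assuming the keys were already exported before the notebook started, a small helper can read a variable and fail early when it is missing or still a placeholder. The name `require_env` is hypothetical and not part of LlamaIndex or any Nebius package.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if unusable.

    Hypothetical helper for illustration only.
    """
    value = os.environ.get(name, "").strip()
    # Treat empty values and this notebook's "Your ..." placeholders as missing.
    if not value or value.startswith("Your "):
        raise RuntimeError(f"Environment variable {name} is not set.")
    return value
```

With this in place, `github_token = require_env("GITHUB_TOKEN")` would stop the notebook with a clear message instead of sending a placeholder token to GitHub.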

In [None]:
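import nest_asyncio

# Allow nested event loops so async calls work inside Jupyter/Colab,
# which already runs its own event loop.
nest_asyncio.apply()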

## Import Required Modules

We import essential modules from `llama-index` to work with Nebius AI models and GitHub repositories.

In [None]:
from llama_index.core import SimpleDirectoryReader, Settings, VectorStoreIndex, PromptTemplate
from llama_index.embeddings.nebius import NebiusEmbedding
from llama_index.llms.nebius import NebiusLLM
from llama_index.readers.github import GithubRepositoryReader, GithubClient

## Initializing GitHub Client and Loading Repository Data

Replace `owner` and `repo` with your target repository details.

In [None]:
github_token = os.environ.get("GITHUB_TOKEN")


def initialize_github_client(github_token):
    """Create a GitHub API client authenticated with the given token."""
    return GithubClient(github_token)


github_client = initialize_github_client(github_token)

owner = "firoz786"
repo = "logo-ai"

# Load only source, notebook, and Markdown files from the repository.
loader = GithubRepositoryReader(
    github_client,
    owner=owner,
    repo=repo,
    filter_file_extensions=(
        [".py", ".ipynb", ".js", ".ts", ".md"],
        GithubRepositoryReader.FilterType.INCLUDE,
    ),
    verbose=False,
    concurrent_requests=5,
)

branch = "main"

docs = loader.load_data(branch=branch)

if not docs:
    raise ValueError("No documents were loaded. Check your GitHub repo and filters.")

print(f"Loaded {len(docs)} documents from GitHub.")

Loaded 13 documents from GitHub.
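
Before indexing, it can help to check that the extension filter picked up the files you expected. The sketch below counts documents per file extension. It assumes each loaded document exposes a `metadata` dict with a `"file_path"` entry; that key is an assumption about the reader's output, so inspect `docs[0].metadata` first. The helper name `count_by_extension` is hypothetical.

```python
from collections import Counter
from pathlib import PurePosixPath


def count_by_extension(documents) -> Counter:
    """Count documents per file extension.

    Assumes each document has a `metadata` dict with a "file_path" key.
    Documents with no path or no extension are counted under "<none>".
    """
    counts = Counter()
    for doc in documents:
        path = doc.metadata.get("file_path")
        suffix = PurePosixPath(path).suffix if path else ""
        counts[suffix or "<none>"] += 1
    return counts
```

Calling `count_by_extension(docs)` after the loading cell gives a quick summary of what will be embedded.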


## Defining RAG Completion Function

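The function `run_rag_completion` initializes the Nebius LLM and embedding models, builds a vector index over the loaded documents, and runs retrieval-augmented generation (RAG) to answer a query from the most relevant parts of the repository. The index is rebuilt on every call, so for repeated queries consider building it once and reusing it.
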
Parameters:
  - query_text (str): The user query for retrieving information.
  - embedding_model (str): The embedding model to use.
  - generative_model (str): The generative model to use.
    
Returns:
  - str: The generated response.
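
To make the retrieval step concrete, here is a conceptual sketch of what "retrieve the top-k most similar chunks" means: rank stored embedding vectors by cosine similarity to the query vector and keep the best `k`. This is a plain-Python illustration with hypothetical helper names, not LlamaIndex's actual implementation, and the vectors in any usage would normally come from the embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=5):
    """Return the indices of the k chunks most similar to the query, best first."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]
```

The text of the selected chunks is what ends up in the prompt as context for the generative model.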

In [None]:
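# Helper that wraps a completion in Llama-2/Mistral-style instruction tags.
# It is not called by run_rag_completion below.
def completion_to_prompt(completion: str) -> str:
    return f"<s>[INST] {completion} [/INST] </s>\n"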


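def run_rag_completion(
    query_text: str,
    embedding_model: str = "BAAI/bge-en-icl",
    generative_model: str = "deepseek-ai/DeepSeek-V3",
) -> str:
    # Configure the Nebius-hosted generative and embedding models.
    llm = NebiusLLM(
        model=generative_model,
        api_key=os.getenv("NEBIUS_API_KEY"),
    )

    embed_model = NebiusEmbedding(
        model_name=embedding_model,
        api_key=os.getenv("NEBIUS_API_KEY"),
    )
    Settings.llm = llm
    Settings.embed_model = embed_model

    # Embed the loaded documents and build an in-memory vector index.
    index = VectorStoreIndex.from_documents(docs)

    # Retrieve the 5 most similar chunks per query. With streaming=True the
    # answer arrives as a token stream, which str(response) later consumes.
    query_engine = index.as_query_engine(similarity_top_k=5, streaming=True)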

    qa_prompt_tmpl_str = (
        "Context information is below.\n"
        "---------------------\n"
        "{context_str}\n"
        "---------------------\n"
        "Given the context information above, I want you to think step by step to answer the query in a crisp manner. In case you don't know the answer, say 'I don't know!'.\n"
        "Query: {query_str}\n"
        "Answer: "
    )

    qa_prompt_tmpl = PromptTemplate(qa_prompt_tmpl_str)
    query_engine.update_prompts({"response_synthesizer:text_qa_template": qa_prompt_tmpl})

    response = query_engine.query(query_text)
    return str(response)
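
To see what the model actually receives, the sketch below fills a shortened version of the template by hand. It uses plain `str.format` as a stand-in for `PromptTemplate`, and joining chunks with a blank line is an assumption made for illustration; the names `QA_TEMPLATE` and `fill_qa_prompt` are hypothetical.

```python
# Shortened version of the QA template, for illustration only.
QA_TEMPLATE = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Query: {query_str}\n"
    "Answer: "
)


def fill_qa_prompt(chunks, query):
    """Join retrieved chunk texts and substitute them into the template."""
    context = "\n\n".join(chunks)  # separator is an assumption
    return QA_TEMPLATE.format(context_str=context, query_str=query)
```

Printing `fill_qa_prompt(["chunk one", "chunk two"], "What is this repository about?")` shows how the retrieved context and the user query end up in a single prompt string.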


## Running a Query on the GitHub Repository

We will now ask a question about the repository's purpose.

In [None]:
res = run_rag_completion('What is this repository about?')

print(res)

This repository is about an AI-powered logo generator. It uses advanced algorithms to create unique logo designs tailored to user preferences. Users can customize fonts, colors, and layouts, and they gain full commercial rights to the logos they download. The project includes a user-friendly interface, supports Docker deployment, and is built with technologies like Next.js, Nebius AI, and Drizzle ORM.
