<a href="https://colab.research.google.com/github/quartermaine/LLMs_Open/blob/main/Code_Base_Anlyzer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# -- START NOTEBOOK

## GitHub Repository Analyzer 🤖

### **Description**

The GitHub Repository Analyzer is a Gradio-powered application designed to analyze the software architecture of any GitHub repository. The analysis is powered by models deployed on Azure OpenAI.

### **Features**

✅ User-Friendly Interface – Provide a GitHub repository URL and receive a structured architectural analysis.

✅ Codebase Parsing – Automatically converts the repository's code into a structured format for analysis.

✅ LLM-Powered Analysis – Uses Azure OpenAI models to generate insights into the repository’s software architecture.

✅ Markdown Output – Presents the analysis in a clear, well-formatted markdown format.

### **How to Use**

1️⃣ Start the Tool – Run the provided code in your Google Colab notebook to launch the GitHub Repository Analyzer.

2️⃣ Enter Repository URL – Input the GitHub repository URL into the text box.

3️⃣ Submit Your Query – Click the "Submit" button. The agent will clone the repository, analyze its structure, and generate an architectural breakdown.

4️⃣ Review the Output – The structured markdown analysis will be displayed in the output field.
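
The notebook's `gradio_analyze` function only checks that the input starts with `http`. As a minimal sketch of a stricter check for step 2, the hypothetical helper below (`is_github_repo_url` is not part of the notebook) uses the standard library to confirm that the URL points at `github.com` and has both an owner and a repository name in its path.

```python
from urllib.parse import urlparse


def is_github_repo_url(url: str) -> bool:
    """Return True if url looks like https://github.com/<owner>/<repo>.

    Hypothetical helper for illustration; the notebook itself only
    checks that the URL starts with "http".
    """
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https"):
        return False
    if parsed.netloc.lower() not in ("github.com", "www.github.com"):
        return False
    # Expect at least an owner and a repository name in the path.
    parts = [part for part in parsed.path.split("/") if part]
    return len(parts) >= 2


print(is_github_repo_url("https://github.com/username/repository"))
print(is_github_repo_url("username/repository"))
```

A check like this would only catch malformed input early; a well-formed URL can still fail to clone if the repository is private or does not exist.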

### **Handling Errors & Limitations**

🚨 Repository Size Limitation – The whole codebase is sent to the LLM in a single prompt, so a very large repository can exceed the model's prompt size limit and lead to an incomplete analysis or an error. Consider analyzing a smaller repository or breaking the analysis into smaller parts.

🚨 Other Errors – If you encounter an issue (e.g., cloning failures or API timeouts), verify your environment variables and try again with a different repository.
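
One way to fail more gracefully on large repositories is to check the size of the generated text before it is sent to the model. The sketch below is an assumption, not something the notebook does: `truncate_code_base` and the `max_chars` default are hypothetical, and the right limit depends on the context window of your deployment.

```python
from typing import Tuple


def truncate_code_base(code_base: str, max_chars: int = 200_000) -> Tuple[str, bool]:
    """Cut the code base text to at most max_chars characters.

    Returns the (possibly shortened) text and a flag that is True when
    truncation happened. The default limit is an arbitrary placeholder,
    not a documented model limit.
    """
    if len(code_base) <= max_chars:
        return code_base, False
    return code_base[:max_chars], True


text, was_truncated = truncate_code_base("def main():\n    pass\n" * 5, max_chars=40)
if was_truncated:
    print("Warning: the code base was truncated; the analysis may be incomplete.")
```

Truncating by characters is crude because it can cut a file in half; it is shown here only to illustrate where a size guard could sit.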

### **Adding Environment Variables**

Define the following environment variables in the secrets section of your Google Colab notebook:

🔑 OPENAI_API_KEY – Your Azure OpenAI API key.

🔗 AZURE_OPENAI_ENDPOINT – Your Azure OpenAI deployment endpoint URL.

📌 DEPLOYMENT_NAME – The name of your Azure OpenAI deployment.

⚙️ OPENAI_API_TYPE – The API type (for this notebook, it is "azure").

📅 OPENAI_API_VERSION – The Azure OpenAI API version.

Setting these secrets lets the notebook connect the GitHub Repository Analyzer to your Azure OpenAI deployment.
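
The notebook copies each Colab secret into an environment variable with a different name (for example, the `DEPLOYMENT_NAME` secret becomes `MODEL_NAME`). The sketch below makes that mapping explicit and reports all missing secrets at once. `export_secrets` is a hypothetical helper, and the getter passed to it is assumed to return `None` or an empty string for a missing secret; a getter that raises instead would need its own handling.

```python
import os
from typing import Callable, Dict, Optional

# Colab secret name -> environment variable name, as used in this notebook.
SECRET_TO_ENV: Dict[str, str] = {
    "OPENAI_API_KEY": "AZURE_OPENAI_API_KEY",
    "DEPLOYMENT_NAME": "MODEL_NAME",
    "OPENAI_API_TYPE": "AZURE_OPENAI_API_TYPE",
    "OPENAI_API_VERSION": "AZURE_OPENAI_API_VERSION",
    "AZURE_OPENAI_ENDPOINT": "AZURE_OPENAI_ENDPOINT",
}


def export_secrets(get_secret: Callable[[str], Optional[str]]) -> None:
    """Copy every secret into os.environ, or raise listing the missing ones."""
    values = {name: get_secret(name) for name in SECRET_TO_ENV}
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise ValueError("Missing secrets: " + ", ".join(missing))
    for name, value in values.items():
        os.environ[SECRET_TO_ENV[name]] = value


# Demonstration with placeholder values instead of real Colab secrets.
placeholder_secrets = {
    "OPENAI_API_KEY": "placeholder-key",
    "DEPLOYMENT_NAME": "placeholder-deployment",
    "OPENAI_API_TYPE": "azure",
    "OPENAI_API_VERSION": "placeholder-version",
    "AZURE_OPENAI_ENDPOINT": "https://example.invalid/",
}
export_secrets(placeholder_secrets.get)
print(os.environ["MODEL_NAME"])
```

Taking the getter as a parameter keeps the helper usable outside Colab, where the secrets store is not available.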


In [2]:
%%capture
#@title Install libraries
# %%capture must be the first line of the cell to be recognized as a cell magic.
!pip install py-llm-core llm-components gitpython gradio


In [3]:
#@title Import modules

import os
import tempfile
from pathlib import Path
from textwrap import fill
from dataclasses import dataclass
from google.colab import userdata
from typing import List
import gradio as gr
from git import Repo
import json
# map_codebase_to_text converts a codebase directory into structured markdown text.
from llm_components.loaders.code_base import map_codebase_to_text
# Import AzureOpenAIAssistant from py-llm-core.
from llm_core.assistants import AzureOpenAIAssistant


In [10]:
#@title Set env variables

# Set Azure OpenAI configuration
os.environ['AZURE_OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')
os.environ['MODEL_NAME'] = userdata.get('DEPLOYMENT_NAME')
os.environ['AZURE_OPENAI_API_TYPE'] = userdata.get('OPENAI_API_TYPE')
os.environ['AZURE_OPENAI_API_VERSION'] = userdata.get('OPENAI_API_VERSION')
os.environ['AZURE_OPENAI_ENDPOINT'] = userdata.get('AZURE_OPENAI_ENDPOINT')


In [11]:
#@title Define helper functions


# -----------------------------
# Data classes definition
# -----------------------------
@dataclass
class LowLevelModule:
    name: str
    description: str

@dataclass
class HighLevelModule:
    name: str
    description: str
    sub_modules: List[LowLevelModule]

@dataclass
class SoftwareArchitecture:
    # System prompt and main prompt templates.
    system_prompt: str = "You are a software architect"
    prompt: str = (
        "Code base:\n{code_base}\n----\n\n"
        "Analyze this code base and carefully write a description of the software architecture."
    )
    name: str = ""
    description: str = ""
    modules: List[HighLevelModule] = None

    def to_markdown(self) -> str:
        lines = [
            f"# {self.name}\n\n",
            f"{fill(self.description, width=60)}\n\n",
        ]
        if self.modules:
            for module in self.modules:
                lines.append(f"## {module.name}\n\n")
                lines.append(f"**Description:** {fill(module.description, width=60)}\n\n")
                lines.append("### Sub-modules\n\n")
                for sub_module in module.sub_modules:
                    lines.append(f"**{sub_module.name}**\n\n")
                    lines.append(f"{fill(sub_module.description, width=60)}\n\n")
        return "".join(lines)

# -----------------------------
# Utility Functions
# -----------------------------
def clone_repository(repo_url: str, clone_dir: Path):
    """
    Clone the given GitHub repository URL into the specified directory.
    """
    Repo.clone_from(repo_url, str(clone_dir), depth=1)

def analyze_repository(repo_url: str) -> str:
    """
    Clones the GitHub repository, converts its code base to text, then uses an LLM to analyze the software architecture.
    Returns the analysis as markdown.
    """
    # Create a temporary directory for cloning the repo.
    with tempfile.TemporaryDirectory() as temp_dir:
        clone_dir = Path(temp_dir) / "repo"
        try:
            clone_repository(repo_url, clone_dir)
        except Exception as e:
            return f"Error cloning repository: {e}"

        try:
            # Convert the code base into a structured markdown text.
            code_base = map_codebase_to_text(clone_dir)
        except Exception as e:
            return f"Error processing code base: {e}"

        # Use AzureOpenAIAssistant to analyze the code base.
        try:
            # The model name here should match your environment variable or chosen model.
            model_name = os.environ.get("MODEL_NAME", "gpt-4")
            # You can also pass other configurations using environment variables as needed.
            with AzureOpenAIAssistant(SoftwareArchitecture, model=model_name) as assistant:
                software_architecture = assistant.process(code_base=code_base)
                markdown_output = software_architecture.to_markdown()
                return markdown_output
        except Exception as e:
            return f"Error analyzing code base with LLM: {e}"



In [None]:
#@title --

# def analyze_repository(repo_url: str) -> str:
#     """
#     Clones the GitHub repository, converts its code base to text, then uses an LLM to analyze the software architecture.
#     Returns the analysis as markdown.
#     """
#     # Create a temporary directory for cloning the repo.
#     with tempfile.TemporaryDirectory() as temp_dir:
#         clone_dir = Path(temp_dir) / "repo"
#         try:
#             clone_repository(repo_url, clone_dir)
#         except Exception as e:
#             return f"Error cloning repository: {e}"

#         try:
#             # Convert the code base into a structured markdown text.
#             code_base = map_codebase_to_text(clone_dir)
#         except Exception as e:
#             return f"Error processing code base: {e}"

#         # Use OpenAI's API to analyze the code base.
#         try:
#             # Set your Azure OpenAI configuration
#             openai.api_type = "azure"
#             openai.api_base = os.environ.get("AZURE_OPENAI_ENDPOINT")
#             openai.api_version = os.environ.get("AZURE_OPENAI_API_VERSION")
#             openai.api_key = os.environ.get("AZURE_OPENAI_API_KEY")

#             # Define the prompt for the model
#             prompt = (
#                 "You are a software architect.\n\n"
#                 "Analyze the following code base and provide a detailed description of the software architecture, "
#                 "including high-level modules and their sub-modules:\n\n"
#                 f"{code_base}\n\n"
#                 "Provide the analysis in a structured markdown format."
#             )

#             # Call the OpenAI API
#             response = openai.Completion.create(
#                 engine=os.environ.get("MODEL_NAME", "gpt-4"),  # Replace with your model deployment name
#                 prompt=prompt,
#                 max_tokens=1000,  # Adjust based on your needs
#                 temperature=0.5
#             )

#             # Extract the generated text
#             analysis = response.choices[0].text.strip()
#             return analysis
#         except Exception as e:
#             return f"Error analyzing code base with LLM: {e}"


In [12]:
#@title Gradio app

# -----------------------------
# Gradio Interface
# -----------------------------
def gradio_analyze(repo_url: str) -> str:
    """
    Gradio interface function: Given a GitHub repo URL, return the analysis.
    """
    if not repo_url.startswith("http"):
        return "Please provide a valid GitHub repository URL."
    return analyze_repository(repo_url)

# Create and launch the Gradio interface.
iface = gr.Interface(
    fn=gradio_analyze,
    inputs=gr.Textbox(label="GitHub Repository URL", placeholder="https://github.com/username/repository"),
    outputs=gr.Textbox(label="Architecture Analysis (Markdown)"),
    title="Codebase Analyzer with LLMs",
    description=(
        "Set your secrets (OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT, DEPLOYMENT_NAME, etc.) in the Colab secrets. "
        "Then provide the URL of a GitHub repository to analyze its code base and extract the software architecture."
    )
)

iface.launch(debug=True)


Running Gradio in a Colab notebook requires sharing enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
* Running on public URL: https://f3680193d6689ac05d.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


Keyboard interruption in main thread... closing server.
Killing tunnel 127.0.0.1:7860 <> https://f3680193d6689ac05d.gradio.live




# -- END NOTEBOOK