# **Code Documentation System**

## **Abstract**

This AI-powered Code Documentation System automates the generation of readable explanations and structured documentation for code snippets. Using the Gemini API with the `gemini-2.0-flash` model, it summarizes code, identifies functions, suggests missing docstrings, and recommends improvements. Embeddings and a RAG-style retrieval step add context from previously analyzed snippets, and results are available either as structured JSON or as a narrative "story" explanation. The tool simplifies code understanding, helping developers and learners write and maintain better software.

## **Introduction**

The Code Documentation Assistant is an interactive AI-powered tool that helps developers generate explanations, summaries, and improvement suggestions for code snippets. It combines a large language model with few-shot prompting, structured JSON outputs, and retrieval-augmented generation (RAG), and it supports two explanation formats: formal structured summaries and engaging, story-like descriptions.

It integrates code embeddings, vector search with FAISS, and the long context window of the Gemini API, so it can retrieve similar, previously stored code examples and include them in the prompt. The system also applies a basic, length-based check of explanation quality, which makes it useful for learning, onboarding, and improving code maintainability.

Whether you're a junior developer trying to understand unfamiliar code or a senior engineer documenting a large codebase, this assistant streamlines the process with intelligent, context-aware outputs.

## **Code**

### Installing dependencies and importing libraries

In [None]:
# Install dependencies
!pip install -q google-generativeai faiss-cpu

# Imports
import os
import json
import faiss
import numpy as np
import google.generativeai as genai
import re

### Setting Gemini API key and initializing the Gemini model

In [None]:
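# Read the Gemini API key from the environment instead of hardcoding it in the notebook.
# The variable name GEMINI_API_KEY is an assumption; set it before running this cell.
GEMINI_API_KEY = os.environ["GEMINI_API_KEY"]
genai.configure(api_key=GEMINI_API_KEY)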

# Initialize Gemini Model
model = genai.GenerativeModel('gemini-2.0-flash')

### Few-shot template

In [None]:
FEW_SHOT_EXAMPLES = """
Examples:

Code:
def add(a, b):
    return a + b

Response:
{
  "summary": "This function performs addition of two numbers.",
  "functions": [
    {
      "name": "add",
      "description": "Returns the sum of two input values a and b."
    }
  ],
  "missing_docstrings": "def add(a, b):\\n    '''Adds two numbers and returns the result.'''",
  "potential_improvements": "Add type hints for better clarity."
}
"""

### Generate structured explanation with few-shot prompting

In [None]:
def generate_code_explanation(code_snippet: str, output_format="json"):
    """Explain a code snippet with Gemini, as structured JSON or as a story."""
    if output_format.lower() == "json":
        prompt = f"""
You are a code documentation assistant. Respond only in JSON with these keys:
- summary
- functions (a list of objects with name and description)
- missing_docstrings
- potential_improvements

{FEW_SHOT_EXAMPLES}

Now analyze this code:

Code:
{code_snippet}
"""
    else:  # Story format
        prompt = f"""
You are a code documentation assistant. Analyze the following code and explain it as a story
in a creative, engaging way. Make the explanation accessible while still being technically accurate.
Include information about:
- What the code does
- The functions and their purpose
- Any missing documentation
- Potential improvements

Code:
{code_snippet}
"""

    response = model.generate_content(prompt)

    if output_format.lower() == "json":
        return extract_json_from_response(response.text)
    else:
        return {"story": response.text}

### JSON extraction from response function

In [None]:
def extract_json_from_response(text):
    """Parse the model reply as JSON, falling back to the first Markdown code block."""
    try:
        # First try direct parsing
        return json.loads(text)
    except json.JSONDecodeError:
        # If that fails, try to extract JSON from markdown code blocks
        json_pattern = r'```(?:json)?\s*([\s\S]*?)\s*```'
        match = re.search(json_pattern, text)
        if match:
            try:
                return json.loads(match.group(1))
            except json.JSONDecodeError:
                pass

        # If still no valid JSON, return error with raw output
        return {"error": "Model output not valid JSON", "raw_output": text}
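
Even when parsing succeeds, the reply may omit some of the fields requested in the prompt. A minimal sketch of a follow-up check is shown below. `EXPECTED_KEYS` and `missing_keys` are hypothetical names that are not part of the notebook, and the key list is assumed from the few-shot example.

```python
import json

# Assumed from the few-shot example; adjust if the prompt changes
EXPECTED_KEYS = ("summary", "functions", "missing_docstrings", "potential_improvements")

def missing_keys(explanation: dict) -> list:
    """Return the expected top-level keys that are absent from a parsed reply."""
    return [key for key in EXPECTED_KEYS if key not in explanation]

# A reply that parsed correctly but lacks two of the requested fields
reply = json.loads('{"summary": "Adds two numbers.", "functions": []}')
print(missing_keys(reply))  # ['missing_docstrings', 'potential_improvements']
```

A non-empty result could be used to retry the request or to flag the explanation as incomplete.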

### Embedding generation

In [None]:
def get_code_embedding(code_snippet: str):
    """Return the Gemini embedding vector (a list of floats) for a code snippet."""
    response = genai.embed_content(
        model="models/embedding-001",
        content=code_snippet,
        task_type="retrieval_document"
    )
    return response['embedding']
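
The FAISS index in this notebook is created with a fixed dimension of 768, so an embedding of any other length cannot be added to it. As a small sketch, a guard like the one below could catch a mismatch early, for example after switching embedding models. `check_embedding_dimension` is a hypothetical helper, not part of the notebook or of any library.

```python
def check_embedding_dimension(embedding, expected_dim=768):
    """Raise ValueError if the embedding length differs from the index dimension."""
    if len(embedding) != expected_dim:
        raise ValueError(f"Expected {expected_dim} values, got {len(embedding)}")
    return embedding

# A placeholder vector standing in for a real API response
vector = check_embedding_dimension([0.0] * 768)
print(len(vector))  # 768
```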

### Embed storing and implementing FAISS index

In [None]:
embedding_dimension = 768  # models/embedding-001 returns 768-dimensional vectors
embedding_index = faiss.IndexFlatL2(embedding_dimension)
code_snippets = []

def store_code_snippet(code_snippet: str):
    embedding = np.array([get_code_embedding(code_snippet)]).astype("float32")
    embedding_index.add(embedding)
    code_snippets.append(code_snippet)

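def search_similar_code(query_snippet: str, top_k=1):
    # Nothing stored yet: FAISS would only return -1 placeholders
    if embedding_index.ntotal == 0:
        return []
    query_embedding = np.array([get_code_embedding(query_snippet)]).astype("float32")
    # Request one extra neighbour so an already-stored copy of the query can be dropped
    k = min(top_k + 1, embedding_index.ntotal)
    _, indices = embedding_index.search(query_embedding, k)
    # FAISS pads missing results with -1; skip those and exact copies of the query
    matches = [code_snippets[i] for i in indices[0]
               if i != -1 and code_snippets[i] != query_snippet]
    return matches[:top_k]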
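
`faiss.IndexFlatL2` performs an exhaustive search and ranks stored vectors by squared Euclidean distance to the query. The pure-Python sketch below mirrors that behaviour on toy 2-dimensional vectors to show what a search returns. `brute_force_l2_search` is a hypothetical helper for illustration only and says nothing about how FAISS is implemented internally.

```python
def brute_force_l2_search(stored, query, top_k=1):
    """Return (distances, indices) of the top_k stored vectors closest to query."""
    # Squared L2 distance from the query to every stored vector
    scored = [
        (sum((a - b) ** 2 for a, b in zip(vec, query)), idx)
        for idx, vec in enumerate(stored)
    ]
    scored.sort()  # smallest distance first
    nearest = scored[:top_k]
    return [d for d, _ in nearest], [i for _, i in nearest]

stored_vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
distances, indices = brute_force_l2_search(stored_vectors, [0.9, 1.2], top_k=2)
print(indices)  # [1, 0]
```

The returned indices map back to positions in the `code_snippets` list, which is why snippets and embeddings have to be appended in the same order.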

### RAG-style loop

In [None]:
def rag_enhanced_explanation(query_code: str, output_format="json"):
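    similar_snippets = search_similar_code(query_code)
    # Use a placeholder so the prompt stays well-formed when nothing is retrieved
    context = "\n\n".join(similar_snippets) or "(no similar code stored yet)"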

    if output_format.lower() == "json":
        combined_prompt = f"""
You are a code explanation assistant. Use the context below to help generate a better explanation.

Context:
{context}

New Code:
{query_code}

Return your response in this structured JSON format:
{{
  "summary": "...",
  "functions": [...],
  "missing_docstrings": "...",
  "potential_improvements": "..."
}}
"""
    else:  # Story format
        combined_prompt = f"""
You are a code explanation assistant. Use the context below to help generate a better explanation.

Context:
{context}

New Code:
{query_code}

Explain this code as an engaging story that a junior developer would find both entertaining and educational.
Include information about what the code does, its functions, any missing documentation, and potential improvements.
"""

    response = model.generate_content(combined_prompt)

    if output_format.lower() == "json":
        return extract_json_from_response(response.text)
    else:
        return {"story": response.text}
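
Retrieval only helps if the prompt clearly separates the retrieved examples from the code under analysis. As a rough sketch, the helper below assembles a numbered context section and falls back to a placeholder when nothing was retrieved. `build_context_block` and the placeholder text are assumptions for illustration and are not part of the notebook.

```python
def build_context_block(similar_snippets, placeholder="(no similar code available)"):
    """Format retrieved snippets as a numbered context section for the prompt."""
    if not similar_snippets:
        return placeholder
    # Number each snippet so the model can tell separate examples apart
    parts = [
        f"Example {number}:\n{snippet.strip()}"
        for number, snippet in enumerate(similar_snippets, start=1)
    ]
    return "\n\n".join(parts)

context = build_context_block(["def add(a, b):\n    return a + b\n"])
print(context)
```

The resulting string could stand in for the plain `"\n\n".join(...)` context when more than one snippet is retrieved.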

### Explanation quality evaluation

In [None]:
def evaluate_explanation_quality(explanation: dict):
    """Apply a simple length-based heuristic and return a verdict string."""
    if "error" in explanation:
        return f"Error in explanation: {explanation['error']}"
    elif "story" in explanation:
        word_count = len(explanation["story"].split())
        if word_count > 100:
            return f"Good story explanation with {word_count} words"
        return f"Story explanation too short: {word_count} words"
    elif "summary" in explanation and len(explanation["summary"].split()) > 3:
        return "Good summary"
    return "Summary is too short or missing"

def print_explanation(explanation, output_format):
    if output_format.lower() == "json":
        print(json.dumps(explanation, indent=2))
    else:  # Story format
        if "story" in explanation:
            print("\n--- CODE STORY ---\n")
            print(explanation["story"])
            print("\n-----------------\n")
        else:
            print("Error generating story format")
            print(explanation)

### Main Function

In [None]:
def main():
    # Check if model is properly initialized
    if model is None:
        print("Error: Model is not initialized. Please configure your AI model first.")
        return

    while True:
        print("\n=== CODE DOCUMENTATION ASSISTANT ===")
        print("1. Analyze code")
        print("2. Exit")
        choice = input("Enter your choice (1-2): ").strip()

        if choice == "2":
            print("Goodbye!")
            break

        if choice == "1":
            # Get code input
            print("\nEnter or paste your code (type 'DONE' on a new line when finished):")
            code_lines = []
            while True:
                line = input()
                if line == "DONE":
                    break
                code_lines.append(line)

            user_code = "\n".join(code_lines)

            if not user_code.strip():
                print("No code provided. Please try again.")
                continue

            # Get format preference
            format_choice = input("\nChoose output format (json/story): ").strip().lower()
            # Anything other than "json" falls back to the story format
            output_format = "json" if format_choice == "json" else "story"

            # Determine analysis method
            analysis_method = input("\nUse RAG enhancement? (y/n): ").strip().lower()

            print("\nAnalyzing code...")

            # Store for future RAG comparisons
            try:
                store_code_snippet(user_code)
                print("Code stored in vector database.")
            except Exception as e:
                print(f"Warning: Could not store code in vector database. Error: {e}")

            # Generate explanation
            try:
                if analysis_method == "y":
                    explanation = rag_enhanced_explanation(user_code, output_format)
                    print("\n=== RAG-ENHANCED EXPLANATION ===")
                else:
                    explanation = generate_code_explanation(user_code, output_format)
                    print("\n=== BASIC EXPLANATION ===")

                print_explanation(explanation, output_format)
                print("\nEvaluation:", evaluate_explanation_quality(explanation))

            except Exception as e:
                print(f"Error generating explanation: {e}")
        else:
            print("Invalid choice. Please try again.")

if __name__ == "__main__":
    main()