# Day 3 - Lab 2: Refactoring & Documentation

**Objective:** Use an LLM to refactor a complex Python function to improve its readability and maintainability, and then generate comprehensive, high-quality documentation for the project.

**Estimated Time:** 60 minutes

**Introduction:**
Writing code is only the first step; writing *good* code is what makes a project successful in the long run. In this lab, you will use an LLM as a code quality expert. You will refactor a poorly written function to make it cleaner and then generate professional-grade documentation, including docstrings and a README file. These are high-value tasks that AI can significantly accelerate.

For definitions of key terms used in this lab, please refer to the [GLOSSARY.md](../../GLOSSARY.md).

## Step 1: Setup

We will set up our environment and define a sample of poorly written code that we will use as the target for our refactoring and documentation efforts.

**Model Selection:**
Models with strong coding and reasoning abilities are best for this task. `gpt-4.1`, `o3`, or `codex-mini` are great choices. You can also try more general models like `gemini-2.5-pro`.

**Helper Functions Used:**
- `setup_llm_client()`: To configure the API client.
- `get_completion()`: To send prompts to the LLM.
- `load_artifact()`: To load the PRD and API source code used as context for the README.
- `save_artifact()`: To save the generated README file.
- `clean_llm_output()`: To clean up the generated code and documentation.
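
The implementation of `clean_llm_output()` lives in `utils` and is not shown in this lab. As a rough illustration of the kind of cleanup such a helper might perform, here is a minimal sketch that assumes the model sometimes wraps its answer in a Markdown code fence. The name `strip_code_fences` is hypothetical and is not part of `utils`.

```python
import re

# Build the fence marker programmatically so this example can itself sit inside a fenced block.
FENCE = "`" * 3
_FENCED_BLOCK = re.compile(FENCE + r"[\w+-]*[ \t]*\n(.*?)\n?" + FENCE, re.DOTALL)


def strip_code_fences(text: str) -> str:
    """Return the body of the first fenced block in `text`, or the stripped text if none is found.

    Hypothetical stand-in for illustration only; the real helper may behave differently.
    """
    match = _FENCED_BLOCK.search(text)
    if match:
        return match.group(1).strip()
    return text.strip()


raw_reply = "Here is the code:\n" + FENCE + "python\nprint('hello')\n" + FENCE + "\n"
print(strip_code_fences(raw_reply))
```

If the reply contains no fence at all, the sketch simply returns the text with surrounding whitespace removed.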

In [2]:
import sys
import os

# Add the project's root directory (two levels up) to the Python path so that 'utils' can be imported.
project_root = os.path.abspath(os.path.join(os.getcwd(), '..', '..'))

if project_root not in sys.path:
    sys.path.insert(0, project_root)

from utils import setup_llm_client, get_completion, save_artifact, clean_llm_output, load_artifact

client, model_name, api_provider = setup_llm_client(model_name="gemini-2.5-pro")

2025-10-29 17:39:42,982 ag_aisoftdev.utils INFO LLM Client configured provider=google model=gemini-2.5-pro latency_ms=None artifacts_path=None


## Step 2: The Code to Improve

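Here is a sample Python function that is functional but poorly written. It's hard to read, has no comments or type hints, and mixes multiple responsibilities. It also silently returns `None` for an unrecognized `operation` and fails on an empty list for `'average'` and `'max'`. This is the code we will improve.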

In [3]:
bad_code = """
def process_data(data, operation):
    if operation == 'sum':
        total = 0
        for i in data:
            total += i
        return total
    elif operation == 'average':
        total = 0
        for i in data:
            total += i
        return total / len(data)
    elif operation == 'max':
        max_val = data[0]
        for i in data:
            if i > max_val:
                max_val = i
        return max_val
"""
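
Before refactoring, it can help to record what the original function actually does, including its edge cases, so there is a concrete baseline to compare against. The sketch below copies `process_data` from the cell above; `outcome` is a hypothetical helper added only for this illustration.

```python
def process_data(data, operation):
    if operation == 'sum':
        total = 0
        for i in data:
            total += i
        return total
    elif operation == 'average':
        total = 0
        for i in data:
            total += i
        return total / len(data)
    elif operation == 'max':
        max_val = data[0]
        for i in data:
            if i > max_val:
                max_val = i
        return max_val


def outcome(data, operation):
    """Return the result of the call, or the name of the exception it raises."""
    try:
        return process_data(data, operation)
    except Exception as exc:
        return type(exc).__name__


print(outcome([1, 2, 3], 'average'))  # 2.0
print(outcome([1, 2, 3], 'median'))   # None (unknown operations fall through)
print(outcome([], 'average'))         # ZeroDivisionError
print(outcome([], 'max'))             # IndexError
```

A refactor that raises a clearer error for these cases is changing behavior on purpose, which is worth stating in the prompt.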

## Step 3: The Challenges

### Challenge 1 (Foundational): Refactoring the Code

**Task:** Use the LLM to refactor the `bad_code` to be more readable, efficient, and maintainable.

**Instructions:**
1.  Create a prompt that instructs the LLM to act as a senior Python developer.
2.  Provide the `bad_code` as context.
3.  Ask the LLM to refactor the code. Be specific about the improvements you want, such as:
    * Breaking the single function into multiple, smaller functions.
    * Using built-in Python functions where appropriate (e.g., `sum()`, `max()`).
    * Adding clear type hints and return types.

> **Tip:** When you ask the AI to refactor, give it a principle to follow. For example, ask it to apply the 'Single Responsibility Principle,' which means each function should do only one thing. This guides the AI to create cleaner, more modular code.

**Expected Quality:** A block of Python code that is functionally identical to the original but is significantly cleaner, more modular, and easier to understand.
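
As a point of comparison, one possible shape for the refactored code is sketched here. The names `Operation`, `calculate_sum`, and `calculate_average` follow the sample output in this lab; `calculate_max`, the `_HANDLERS` table, and the new `process_data` signature are assumptions made for illustration. Your model's output will likely differ.

```python
from enum import Enum, auto
from typing import Callable, Dict, List, Union

Numeric = Union[int, float]


class Operation(Enum):
    """Supported operations, replacing the original string literals."""
    SUM = auto()
    AVERAGE = auto()
    MAX = auto()


def calculate_sum(data: List[Numeric]) -> Numeric:
    return sum(data)


def calculate_average(data: List[Numeric]) -> float:
    # Deliberate change: the original raised ZeroDivisionError on empty input.
    if not data:
        raise ValueError("Cannot average an empty list.")
    return sum(data) / len(data)


def calculate_max(data: List[Numeric]) -> Numeric:
    # Deliberate change: the original raised IndexError on empty input.
    if not data:
        raise ValueError("Cannot take the maximum of an empty list.")
    return max(data)


# Dispatch table (assumed design): one small function per operation.
_HANDLERS: Dict[Operation, Callable[[List[Numeric]], Numeric]] = {
    Operation.SUM: calculate_sum,
    Operation.AVERAGE: calculate_average,
    Operation.MAX: calculate_max,
}


def process_data(data: List[Numeric], operation: Operation) -> Numeric:
    """Route the data to the function that implements the requested operation."""
    return _HANDLERS[operation](data)


print(process_data([1, 5, 3], Operation.MAX))  # 5
```

The dispatch table keeps `process_data` free of branching, so adding an operation means adding one enum member and one function.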

In [4]:
refactor_prompt = f"""
You are a senior Python developer and clean code expert. Refactor the Python code provided below to improve its structure, readability, and efficiency without changing its functionality. Apply best practices such as removing redundancy, improving variable names, and simplifying logic.

# INPUT CODE:
{bad_code}

Follow these guidelines:
 1. Always use SOLID principles.
 2. Write additional helper methods if necessary to ensure Single Responsibility Principle.
 3. Include comments and docstrings for function and class declarations.
 4. Use built-in functions and libraries where applicable.
 5. Add clear type hints and return types.
 6. Do not include starting or ending markdown syntax.
 7. Avoid nesting code too deeply to improve readability.
 8. Avoid string literals for operations; use Enums where applicable.
"""

print("--- Refactoring Code ---")
refactored_code = get_completion(refactor_prompt, client, model_name, api_provider)
cleaned_code = clean_llm_output(refactored_code, language='python')
print(cleaned_code)

--- Refactoring Code ---
import collections.abc
from enum import Enum, auto
from typing import Callable, Dict, List, Union

# Use a type alias for clarity and reusability.
Numeric = Union[int, float]
DataList = List[Numeric]


class Operation(Enum):
    """Defines the supported data processing operations in a type-safe way."""
    SUM = auto()
    AVERAGE = auto()
    MAX = auto()


def calculate_sum(data: DataList) -> Numeric:
    """
    Calculates the sum of a list of numbers using the built-in sum() function.

    Args:
        data: A list of numbers (integers or floats).

    Returns:
        The sum of the numbers in the list.
    """
    return sum(data)


def calculate_average(data: DataList) -> float:
    """
    Calculates the average of a list of numbers.

    Args:
        data: A list of numbers (integers or floats).

    Returns:
        The average of the numbers.

    Raises:
        ValueError: If the input list is empty.
    """
    if not data:
        raise ValueEr
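
Generated code is worth checking before it is fed into the next prompt. A minimal sketch, assuming only the standard library: `ast.parse` raises `SyntaxError` when a reply is cut off inside a statement or string. The name `is_valid_python` is hypothetical and is not part of `utils`.

```python
import ast


def is_valid_python(source: str) -> bool:
    """Return True if `source` parses as Python, False otherwise."""
    try:
        ast.parse(source)
    except (SyntaxError, ValueError):
        # ValueError covers inputs such as source containing null bytes on some versions.
        return False
    return True


complete = "def f(data):\n    return sum(data)\n"
cut_off = 'def f(data):\n    """Calculates the sum'

print(is_valid_python(complete))  # True
print(is_valid_python(cut_off))   # False
```

This only catches syntactic damage. A reply that stops cleanly at a statement boundary still parses, so the check complements a read-through rather than replacing it.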

### Challenge 2 (Intermediate): Generating Docstrings

**Task:** Prompt the LLM to generate high-quality docstrings for the newly refactored code.

**Instructions:**
1.  Create a new prompt.
2.  Provide the refactored code from the previous step (the `cleaned_code` variable) as context.
3.  Instruct the LLM to generate Google-style Python docstrings for each function.
4.  The docstrings should include a description of the function, its arguments (`Args:`), and what it returns (`Returns:`).
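
For reference, a Google-style docstring places a one-line summary first, followed by indented `Args:`, `Returns:`, and (where relevant) `Raises:` sections. The function here is an illustrative sketch modeled on `calculate_average` from the refactoring step; the error message text is an assumption.

```python
from typing import List, Union

Numeric = Union[int, float]


def calculate_average(data: List[Numeric]) -> float:
    """Calculates the average of a list of numbers.

    Args:
        data (List[Numeric]): The numbers to average.

    Returns:
        float: The arithmetic mean of the numbers.

    Raises:
        ValueError: If `data` is empty.
    """
    if not data:
        raise ValueError("Cannot average an empty list.")
    return sum(data) / len(data)


print(calculate_average([2, 4]))  # 3.0
```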

**Expected Quality:** The refactored Python code, now with complete and professional-looking docstrings for each function.

In [5]:
docstring_prompt = f"""
You are a senior Python developer and documentation expert. Add Google-style docstrings to all functions and classes in the provided Python code. Each docstring should include a brief description, the arguments with their types (`Args:`), and the return value with its type (`Returns:`).


# INPUT CODE:
{cleaned_code}

# IMPORTANT:
1. Do not change any code; only add or edit docstrings.
2. Add docstrings where they are missing.
3. Where a docstring already exists, revise it to follow the format specified above.
"""

print("--- Generating Docstrings ---")
code_with_docstrings = get_completion(docstring_prompt, client, model_name, api_provider)
cleaned_code_with_docstrings = clean_llm_output(code_with_docstrings, language='python')
print(cleaned_code_with_docstrings)

--- Generating Docstrings ---
import collections.abc
from enum import Enum, auto
from typing import Callable, Dict, List, Union

# Use a type alias for clarity and reusability.
Numeric = Union[int, float]
DataList = List[Numeric]


class Operation(Enum):
    """Defines the supported data processing operations.

    This enumeration provides a type-safe way to specify which data processing
    operation should be performed.

    Attributes:
        SUM: Represents the summation of all numbers in a list.
        AVERAGE: Represents the calculation of the average of numbers in a list.
        MAX: Represents finding the maximum number in a list.
    """
    SUM = auto()
    AVERAGE = auto()
    MAX = auto()


def calculate_sum(data: DataList) -> Numeric:
    """Calculates the sum of a list of numbers using the built-in sum() function.

    Args:
        data (DataList): A list of numbers (integers or floats).

    Returns:
        Numeric: The sum of the numbers in the list.
    """
    r
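
The prompt asks the model not to change any code, and that instruction can be checked mechanically. The sketch below, which assumes Python 3.8 or later, parses both versions, drops docstrings from each syntax tree, and compares what remains. The names `_strip_docstrings` and `only_docstrings_changed` are hypothetical helpers, not part of `utils`.

```python
import ast

_DOCSTRING_OWNERS = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def _strip_docstrings(tree: ast.AST) -> ast.AST:
    """Remove the leading docstring from every module, class, and function in `tree`."""
    for node in ast.walk(tree):
        if not isinstance(node, _DOCSTRING_OWNERS) or not node.body:
            continue
        first = node.body[0]
        if (isinstance(first, ast.Expr)
                and isinstance(first.value, ast.Constant)
                and isinstance(first.value.value, str)):
            # Keep the body non-empty so docstring-only bodies compare equal to `pass`.
            node.body = node.body[1:] or [ast.Pass()]
    return tree


def only_docstrings_changed(before: str, after: str) -> bool:
    """Return True if the two sources are identical once docstrings are ignored."""
    # ast.dump omits line numbers by default, so shifted lines do not matter.
    return (ast.dump(_strip_docstrings(ast.parse(before)))
            == ast.dump(_strip_docstrings(ast.parse(after))))


before = "def add_one(x):\n    return x + 1\n"
after = 'def add_one(x):\n    """Adds one to x."""\n    return x + 1\n'
print(only_docstrings_changed(before, after))  # True
```

Comments are not part of the syntax tree, so this check ignores added or removed comments as well.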

### Challenge 3 (Advanced): Generating a Project README

**Task:** Generate a comprehensive `README.md` file for the entire Onboarding Tool project.

**Instructions:**
1.  Create a final prompt that instructs the LLM to act as a technical writer.
2.  This time, you will provide multiple pieces of context: the `day1_prd.md` and the `app/main.py` source code. (You will need to load these files).
3.  Ask the LLM to generate a `README.md` file with the following sections:
    * Project Title
    * Overview (based on the PRD)
    * Features
    * API Endpoints (with `curl` examples)
    * Setup and Installation instructions
4.  Save the final output to `README.md` in the project's root directory.

**Expected Quality:** A complete, professional `README.md` file that provides a comprehensive overview of the project for other developers.
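
Because the prompt lists the sections it expects, the generated README can be checked for them before it is saved. A minimal sketch, assuming the model emits each section as an ATX-style Markdown heading (`#` through `######`); `REQUIRED_SECTIONS` and `missing_sections` are hypothetical names, not part of `utils`.

```python
import re
from typing import Iterable, List

# The project title is omitted because its text is not known in advance.
REQUIRED_SECTIONS = ["Overview", "Features", "API Endpoints", "Setup and Installation"]

_HEADING = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*#*[ \t]*$", re.MULTILINE)


def missing_sections(markdown: str, required: Iterable[str] = REQUIRED_SECTIONS) -> List[str]:
    """Return the required section names that do not appear in any heading."""
    headings = [match.group(1).lower() for match in _HEADING.finditer(markdown)]
    return [name for name in required
            if not any(name.lower() in heading for heading in headings)]


sample = "# Demo Project\n\n## Overview\n\nText.\n\n## Features\n\n- One\n"
print(missing_sections(sample))  # ['API Endpoints', 'Setup and Installation']
```

The match is a case-insensitive substring test, so a heading such as `## 🚀 Features` still counts. Lines starting with `#` inside fenced shell examples are also treated as headings, which is a known limitation of this sketch.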

In [7]:
# Load the necessary context files
prd_content = load_artifact("artifacts/day1_prd_2025-10-28_14-03-52.md")
api_code = load_artifact("app/main.py")

readme_prompt = f"""
You are a technical writer creating a comprehensive README.md file for a new open-source project.

Use the provided Product Requirements Document (PRD) for high-level context and the FastAPI source code for technical implementation details.

**PRD Context:**
<prd>
{prd_content}
</prd>

**API Source Code:**
<code>
{api_code}
</code>

Generate a complete README.md file in GitHub-flavored Markdown format with the following sections:

1. **Project Title** - Extract the project name from the PRD and create an engaging title
2. **Overview** - Summarize the project's purpose, problem it solves, and target users based on the PRD content
3. **Features** - List the key functionalities and capabilities derived from both the PRD and the API endpoints
4. **API Endpoints** - Document each available API route with:
   - HTTP method and path
   - Brief description of functionality
   - Practical `curl` examples for testing each endpoint
   - Include JSON request bodies for POST/PUT endpoints using the Pydantic models from the code
5. **Setup and Installation** - Provide step-by-step instructions including:
   - Prerequisites (Python version, dependencies)
   - Virtual environment setup
   - Installing requirements
   - Database setup (if applicable)
   - Running the FastAPI server with uvicorn

Ensure the README is professional, well-structured, and provides everything a developer needs to understand and run the project locally. Use proper Markdown formatting including code blocks, headers, and lists.

Output only the raw Markdown content for the README.md file.
"""

print("--- Generating Project README ---")
if prd_content and api_code:
    readme_content = get_completion(readme_prompt, client, model_name, api_provider)
    cleaned_readme = clean_llm_output(readme_content, language='markdown')
    print(cleaned_readme)
    save_artifact(cleaned_readme, "README.md")
else:
    print("Skipping README generation because PRD or API code is missing.")

--- Generating Project README ---
# Ascend Onboarding Platform API

Welcome to the Ascend Onboarding Platform API, the backend service powering a revolutionary new hire experience. This project aims to transform a typically fragmented process into a structured, engaging, and efficient journey.

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python Version](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![Framework](https://img.shields.io/badge/Framework-FastAPI-green)](https://fastapi.tiangolo.com/)

---

## Overview

The Ascend Onboarding Platform is designed to solve the common challenges of new hire onboarding. New employees often face an overwhelming amount of information, while HR and managers lack centralized tools to track progress and provide support.

This API provides the core infrastructure to:
-   **For New Hires (`Eager Contributors`):** Create personalized learning pat

## Lab Conclusion

Well done! You have used an LLM to perform two of the most valuable code quality tasks: refactoring and documentation. You've seen how AI can help transform messy code into a clean, maintainable structure and how it can generate comprehensive documentation from high-level project artifacts and source code. Applied routinely, these skills can save a development team substantial time.

> **Key Takeaway:** LLMs excel at understanding and generating structured text, whether that structure is code or documentation. Providing a clear 'before' state (the bad code) and a clear goal (the refactoring principles) allows the AI to perform complex code transformation and documentation tasks efficiently.