# Day 3 - Lab 2: Refactoring & Documentation

**Objective:** Use an LLM to refactor a complex Python function to improve its readability and maintainability, and then generate comprehensive, high-quality documentation for the project.

**Estimated Time:** 60 minutes

**Introduction:**
Writing code is only the first step; writing *good* code is what makes a project successful in the long run. In this lab, you will use an LLM as a code quality expert. You will refactor a poorly written function to make it cleaner and then generate professional-grade documentation, including docstrings and a README file. These are high-value tasks that AI can significantly accelerate.

For definitions of key terms used in this lab, please refer to the [GLOSSARY.md](../../GLOSSARY.md).

## Step 1: Setup

We will set up our environment and define a sample of poorly written code that we will use as the target for our refactoring and documentation efforts.

**Model Selection:**
Models with strong coding and reasoning abilities are best for this task. `gpt-4.1`, `o3`, or `codex-mini` are great choices. You can also try more general models like `gemini-2.5-pro`.

**Helper Functions Used:**
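- `setup_llm_client()`: To configure the API client.
- `get_completion()`: To send prompts to the LLM.
- `prompt_enhancer()`: To expand a draft prompt into a more structured one before sending it.
- `load_artifact()`: To load the PRD and source files used as context.
- `save_artifact()`: To save the generated README file.
- `clean_llm_output()`: To clean up the generated code and documentation.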
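
To make "cleaning up" LLM output concrete, here is a minimal sketch of one thing such a helper might do: pull the code out of a fenced block and drop the surrounding chatter. The function `strip_code_fence` is hypothetical and written only for illustration; the actual `clean_llm_output()` in `utils` is not shown in this lab and may behave differently.

```python
import re

# Build the fence marker programmatically so this example can sit inside a fenced block.
FENCE = "`" * 3
FENCED_BLOCK = re.compile(FENCE + r"[\w+-]*[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def strip_code_fence(text: str) -> str:
    """Hypothetical helper: return the body of the first fenced block,
    or the trimmed text if there is no fenced block."""
    match = FENCED_BLOCK.search(text)
    return (match.group(1) if match else text).strip()


raw = "Here is the code:\n" + FENCE + "python\nprint('hi')\n" + FENCE + "\nHope that helps!"
print(strip_code_fence(raw))  # print('hi')
```

Only the first fenced block is kept in this sketch; a response with several blocks would need a different strategy.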

In [6]:
import sys
import os

# Add the project's root directory (two levels above this notebook) to the Python path so 'utils' can be imported.
project_root = os.path.abspath(os.path.join(os.getcwd(), '..', '..'))

if project_root not in sys.path:
    sys.path.insert(0, project_root)

from utils import setup_llm_client, get_completion, save_artifact, load_artifact, clean_llm_output, prompt_enhancer

client, model_name, api_provider = setup_llm_client(model_name="gemini-2.5-pro")

2025-10-30 16:12:26,415 ag_aisoftdev.utils INFO LLM Client configured provider=google model=gemini-2.5-pro latency_ms=None artifacts_path=None


## Step 2: The Code to Improve

Here is a sample Python function that is functional but poorly written. It's hard to read, has no comments or type hints, and mixes multiple responsibilities. This is the code we will improve.

In [7]:
bad_code = """
def process_data(data, operation):
    if operation == 'sum':
        total = 0
        for i in data:
            total += i
        return total
    elif operation == 'average':
        total = 0
        for i in data:
            total += i
        return total / len(data)
    elif operation == 'max':
        max_val = data[0]
        for i in data:
            if i > max_val:
                max_val = i
        return max_val
"""
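
Before refactoring, it can help to see where the function misbehaves. The sketch below copies `process_data` out of the `bad_code` string and probes two edge cases: an unsupported operation and an empty list. The `failures` dictionary is introduced here purely for illustration.

```python
# Copy of the original function from the `bad_code` string.
def process_data(data, operation):
    if operation == 'sum':
        total = 0
        for i in data:
            total += i
        return total
    elif operation == 'average':
        total = 0
        for i in data:
            total += i
        return total / len(data)
    elif operation == 'max':
        max_val = data[0]
        for i in data:
            if i > max_val:
                max_val = i
        return max_val


# An unsupported operation falls through every branch and silently returns None.
print(process_data([1, 2, 3], 'median'))  # None

# An empty list breaks 'average' (division by zero) and 'max' (indexing data[0]).
failures = {}
for op in ('average', 'max'):
    try:
        process_data([], op)
    except (ZeroDivisionError, IndexError) as exc:
        failures[op] = type(exc).__name__

print(failures)  # {'average': 'ZeroDivisionError', 'max': 'IndexError'}
```

These are good behaviors to mention explicitly in a refactoring prompt, since the model has to decide whether to preserve them or replace them with clearer errors.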

## Step 3: The Challenges

### Challenge 1 (Foundational): Refactoring the Code

**Task:** Use the LLM to refactor the `bad_code` to be more readable, efficient, and maintainable.

**Instructions:**
1.  Create a prompt that instructs the LLM to act as a senior Python developer.
2.  Provide the `bad_code` as context.
3.  Ask the LLM to refactor the code. Be specific about the improvements you want, such as:
    * Breaking the single function into multiple, smaller functions.
    * Using built-in Python functions where appropriate (e.g., `sum()`, `max()`).
    * Adding clear type hints and return types.

> **Tip:** When you ask the AI to refactor, give it a principle to follow. For example, ask it to apply the 'Single Responsibility Principle,' which means each function should do only one thing. This guides the AI to create cleaner, more modular code.

**Expected Quality:** A block of Python code that is functionally identical to the original but is significantly cleaner, more modular, and easier to understand.

In [8]:
# Write a prompt to refactor the 'bad_code'.
refactor_prompt = f"""
You are a senior Python developer.

Based on the bad_code provided below, refactor the code
with the following improvements:

- Break the single function into multiple, smaller
functions.
- Make sure these functions are significantly
cleaner and easier to understand.
- You can include some comments (but not too many) to
explain the logic throughout the code.
- Use built-in Python functions where appropriate
(e.g.: sum(), max())
- Add clear type hints and return types
- Apply the Single Responsibility Principle
(i.e.: each function should do only one thing) so
that you create cleaner, more modular code.

# bad_code #:
{bad_code}

**OUTPUT REQUIREMENTS**:
- Produce a block of Python code.
- Ensure the refactored code is syntactically correct.
"""

print("--- Refactoring Code ---")
enhanced_refactor_prompt = prompt_enhancer(refactor_prompt)
print("enhanced_refactor_prompt:", enhanced_refactor_prompt)
enhanced_refactored_code = get_completion(enhanced_refactor_prompt, client, model_name, api_provider)
cleaned_code = clean_llm_output(enhanced_refactored_code, language='python')
print(cleaned_code)

--- Refactoring Code ---


2025-10-30 16:12:26,769 ag_aisoftdev.utils INFO LLM Client configured provider=openai model=o3 latency_ms=None artifacts_path=None


enhanced_refactor_prompt: <prompt>

  <persona>
    You are an expert Python software engineer specializing in clean-code refactoring, modular design, and PEP 8 compliance.
  </persona>

  <context>
    Original code to refactor:
    ```python
    def process_data(data, operation):
        if operation == 'sum':
            total = 0
            for i in data:
                total += i
            return total
        elif operation == 'average':
            total = 0
            for i in data:
                total += i
            return total / len(data)
        elif operation == 'max':
            max_val = data[0]
            for i in data:
                if i > max_val:
                    max_val = i
            return max_val
    ```

    Refactoring goals and constraints:
    • Break the single function into multiple, smaller functions.  
    • Ensure each function is clean, focused, and easy to understand.  
    • Add concise comments only where they improve clarity.  
    

### Challenge 2 (Intermediate): Generating Docstrings

**Task:** Prompt the LLM to generate high-quality docstrings for the newly refactored code.

**Instructions:**
1.  Create a new prompt.
2.  Provide the refactored code from the previous step (`cleaned_code` in the cell below) as context.
3.  Instruct the LLM to generate Google-style Python docstrings for each function.
4.  The docstrings should include a description of the function, its arguments (`Args:`), and what it returns (`Returns:`).

**Expected Quality:** The refactored Python code, now with complete and professional-looking docstrings for each function.
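
LLM output is not guaranteed to document every function, so a quick automated check can be useful. The sketch below uses the standard-library `ast` module to list functions whose docstrings are missing or lack the `Args:` and `Returns:` sections. The helper `find_docstring_gaps` and the `sample_code` string are illustrative assumptions, not part of the lab's `utils` package.

```python
import ast

# Illustrative input: one documented function and one without a docstring.
sample_code = '''
def calculate_sum(data: list[int | float]) -> int | float:
    """Calculates the total sum of a list of numbers.

    Args:
        data (list[int | float]): A list of numbers (integers or floats).

    Returns:
        int | float: The total sum of the numbers in the list.
    """
    return sum(data)


def find_maximum(data):
    return max(data)
'''


def find_docstring_gaps(source: str, required=("Args:", "Returns:")) -> dict[str, list[str]]:
    """Map each function name to the docstring parts it is missing."""
    gaps: dict[str, list[str]] = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node)
            if doc is None:
                gaps[node.name] = ["docstring"]
            else:
                missing = [section for section in required if section not in doc]
                if missing:
                    gaps[node.name] = missing
    return gaps


print(find_docstring_gaps(sample_code))  # {'find_maximum': ['docstring']}
```

The check is purely textual: it confirms the sections exist, not that their descriptions are accurate, so a human review of the generated docstrings is still needed.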

In [9]:
# Write a prompt to add Google-style docstrings to the refactored code.
docstring_prompt = f"""
You are a senior Python developer.

Based on the refactored code provided below,
generate high-quality Google-style Python docstrings
for each function in the code:

# Refactored Code #:
{cleaned_code}

The docstrings should include:
- A description of the function
- A description of its arguments (Args:)
- A description of what it returns (Returns:)

**OUTPUT REQUIREMENTS**:
- Complete and professional-looking docstrings for
each function in the code.
- Ensure the docstrings are syntactically correct.
"""

print("--- Generating Docstrings ---")
code_with_docstrings = get_completion(docstring_prompt, client, model_name, api_provider)
cleaned_code_with_docstrings = clean_llm_output(code_with_docstrings, language='python')
print(cleaned_code_with_docstrings)

--- Generating Docstrings ---
from typing import Callable, Dict


def calculate_sum(data: list[int | float]) -> int | float:
    """Calculates the total sum of a list of numbers.

    Args:
        data (list[int | float]): A list of numbers (integers or floats).

    Returns:
        int | float: The total sum of the numbers in the list.
    """
    return sum(data)


def calculate_average(data: list[int | float]) -> float:
    """Calculates the average of a list of numbers.

    Args:
        data (list[int | float]): A list of numbers to be averaged.

    Returns:
        float: The average of the numbers.

    Raises:
        ValueError: If the input list `data` is empty.
    """
    if not data:
        raise ValueError("Cannot calculate the average of an empty list.")
    return sum(data) / len(data)


def find_maximum(data: list[int | float]) -> int | float:
    """Finds the maximum value in a list of numbers.

    Args:
        data (list[int | float]): A list of numbers to sea

### Challenge 3 (Advanced): Generating a Project README

**Task:** Generate a comprehensive `README.md` file for the entire Onboarding Tool project.

**Instructions:**
1.  Create a final prompt that instructs the LLM to act as a technical writer.
2.  This time, you will provide multiple pieces of context: the `day1_prd.md` and the `app/main.py` source code. (You will need to load these files).
3.  Ask the LLM to generate a `README.md` file with the following sections:
    * Project Title
    * Overview (based on the PRD)
    * Features
    * API Endpoints (with `curl` examples)
    * Setup and Installation instructions.
4.  Save the final output to `README.md` in the project's root directory.

**Expected Quality:** A complete, professional `README.md` file that provides a comprehensive overview of the project for other developers.
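
Since the README must contain specific sections, a small check on the generated Markdown can catch omissions before you save the file. The sketch below scans heading lines for the required section names; `REQUIRED_SECTIONS`, `missing_sections`, and the abbreviated `sample_readme` are assumptions made for illustration.

```python
import re

# Section keywords taken from the instructions above (the title is the top-level heading).
REQUIRED_SECTIONS = ["Overview", "Features", "API Endpoints", "Setup"]


def missing_sections(markdown: str, required=REQUIRED_SECTIONS) -> list[str]:
    """Return the required section names that do not appear in any heading."""
    headings = [
        match.group(1).strip().lower()
        for match in re.finditer(r"^#{1,6}\s+(.+)$", markdown, re.MULTILINE)
    ]
    return [
        name for name in required
        if not any(name.lower() in heading for heading in headings)
    ]


# Abbreviated stand-in for a generated README.
sample_readme = """# WelcomePath: New Hire Onboarding Platform API

## Overview

...

## Features

...
"""

print(missing_sections(sample_readme))  # ['API Endpoints', 'Setup']
```

This sketch treats any line starting with `#` as a heading, so shell comments inside fenced `curl` examples could be counted as headings; treat the result as a hint rather than a guarantee.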

In [10]:
# Load the necessary context files
prd_content = load_artifact("artifacts/day1_prd.md")
api_code = load_artifact("app/main.py")

# Write a prompt to generate a complete README.md file.
readme_prompt = f"""
You are a technical writer.

Based on the Product Requirements Document (PRD) for the
new hiring onboarding tool and the FastAPI codebase
provided below, generate a complete, professional
README.md file that includes a thorough description
of each of the following sections:
- Project Title
- Overview (based on the PRD)
- Features
- API Endpoints (with curl examples)
- Setup and Installation instructions

# Product Requirements Document (PRD) #:
{prd_content}

# FastAPI Codebase #:
{api_code}

**OUTPUT REQUIREMENTS**:
- Ensure that the README.md file provides a 
comprehensive overview of the entire
Onboarding Tool project for other developers.
- Save the final output to README.md in the
project's root directory.
- Ensure the README.md is formatted in valid
Markdown syntax.
"""

print("--- Generating Project README ---")
if prd_content and api_code:
    readme_content = get_completion(readme_prompt, client, model_name, api_provider)
    cleaned_readme = clean_llm_output(readme_content, language='markdown')
    print(cleaned_readme)
    save_artifact(cleaned_readme, "README.md", overwrite=True)
else:
    print("Skipping README generation because PRD or API code is missing.")

--- Generating Project README ---
# WelcomePath: New Hire Onboarding Platform API

WelcomePath is a centralized, digital onboarding platform designed to streamline the new hire experience. This repository contains the backend API service, built with FastAPI, which powers the core functionality of the platform.

## Overview

Based on the WelcomePath Product Requirements Document, this API aims to solve the problem of fragmented, manual, and inconsistent onboarding processes. New hires often face a disconnected and paper-heavy experience, leading to first-day anxiety and reduced productivity. Simultaneously, managers and HR teams are burdened with significant administrative overhead.

The vision for WelcomePath is to become the single source of truth for onboarding, fostering a welcoming and engaging environment that accelerates new hire productivity and integration into the company culture. This API provides the necessary backend infrastructure to manage users, roles, tasks, and their a

## Lab Conclusion

Well done! You have used an LLM to perform two of the most valuable code quality tasks: refactoring and documentation. You've seen how AI can help transform messy code into a clean, maintainable structure and how it can generate comprehensive documentation from high-level project artifacts and source code. These skills are a massive productivity multiplier for any development team.

> **Key Takeaway:** LLMs excel at understanding and generating structured text, whether that structure is code or documentation. Providing a clear 'before' state (the bad code) and a clear goal (the refactoring principles) allows the AI to perform complex code transformation and documentation tasks efficiently.