# PyRepo-Pal: AI-Assisted Python Environment Management

This document outlines a brainstorming idea for PyRepo-Pal, a tool using a multi-tiered, AI-assisted approach to analyze Python repositories, determine their system and dependency requirements, and help create tailored Conda environments or `requirements.txt` files. The goal is to provide an optimal and compatible setup, adapting to general or specialized hardware (like GPUs) as inferred from the repository's needs.

## High-Level PyRepo-Pal Workflow

This outlines the general sequence of operations envisioned for PyRepo-Pal. It's a high-level overview and subject to refinement as the project evolves.

1.  **Collect Data:** Gather raw data from primary sources. This involves:
    *   **User System Information:** Profiling the user's local machine (OS, CPU, RAM, GPU, disk, Python versions) using the `/src/business/sys/sys_profiler.py` script. The output (`hardware_specs.json`) is stored in `/src/data/system_info/`.
    *   **Target Repository Data:** Reading key files from the specified Python project repository (`README.md`, `requirements.txt`, `environment.yaml`, any existing explicit requirements files). This will be handled by a new script (e.g., `/src/business/ai/repo_data_collector.py`). The collected raw text data will be processed and potentially stored temporarily or passed directly to the AI prompt generation stage.
    *   **(Implicit) User Configuration/Intent:** Parameters provided by the user when running PyRepo-Pal (e.g., target repo path, desired output format, specific flags).
    *(Note: External APIs for package metadata might be queried in later analysis steps rather than initial bulk collection.)*
2.  **Process Data:** Clean, transform, and structure the collected raw data into a usable format.
3.  **Analyze Data - Level 1:** Perform initial analysis on the processed data to extract basic insights and identify key characteristics (e.g., primary language, presence of dependency files).
4.  **Persist Data as Information:** Store the results of the initial analysis and processed data in a structured way (e.g., `hardware_specs.json`, parsed repository file contents).
5.  **Process Information:** Further refine and combine the persisted information, preparing it for more in-depth analysis.
6.  **Analyze Information (AI-Assisted):** Leverage AI (e.g., Gemini) to perform deeper analysis on the processed information. This includes:
    *   Determining likely minimum system requirements (Step 1, Phase A of the Multi-Tiered Approach).
    *   Characterizing the environment and high-level dependencies (Step 4, Stage 1 of the Multi-Tiered Approach).
    *   Resolving detailed and compatible dependency versions (Step 4, Stage 2 of the Multi-Tiered Approach).
7.  **Display Deliverables (Edit Mode):** Present the AI-generated findings and proposed environment configurations to the user in an editable format, allowing for review and modification.
8.  **Re-Analyze if Necessary:** If the user makes significant changes or requests adjustments, re-engage the AI or processing steps with the new input.
9.  **Persist Deliverables:** Save the final, user-approved environment specifications (e.g., `environment.yml`, `requirements.txt`, final system requirements summary).
10. **Provide User Final Deliverables:** Make the persisted deliverables available to the user for setting up their environment or for documentation.
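The repository half of step 1 (the new collector script, e.g. `repo_data_collector.py`) could start as small as the sketch below. The function name `collect_repo_files`, the `KEY_FILES` mapping, and the `min-sys-requirements.txt` filename are assumptions for illustration. The dictionary keys reuse the placeholder names from the prompt template later in this document, so the result can be passed straight to prompt generation.

```python
from pathlib import Path
from typing import Dict, Optional, Tuple

# Hypothetical mapping: placeholder name -> candidate filenames, checked in order.
# environment.yml is accepted as a common alternative spelling of environment.yaml.
KEY_FILES: Dict[str, Tuple[str, ...]] = {
    "readme_content": ("README.md",),
    "requirements_txt_content": ("requirements.txt",),
    "environment_yaml_content": ("environment.yaml", "environment.yml"),
    "existing_min_sys_reqs_content": ("min-sys-requirements.txt",),
}


def collect_repo_files(repo_path: str) -> Dict[str, Optional[str]]:
    """Return {placeholder: file text}, using None for missing or empty files."""
    root = Path(repo_path)
    collected: Dict[str, Optional[str]] = {}
    for key, candidates in KEY_FILES.items():
        text: Optional[str] = None
        for filename in candidates:
            path = root / filename
            if path.is_file():
                # errors="replace" keeps a stray non-UTF-8 byte from aborting collection.
                content = path.read_text(encoding="utf-8", errors="replace").strip()
                if content:
                    text = content
                    break
        collected[key] = text
    return collected
```

Filenames are matched exactly, so a `readme.md` would be missed on case-sensitive filesystems; a fuller collector would also search for other explicit requirements files.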

## Proposed Multi-Tiered Approach

The envisioned process involves the following steps:

1.  **Analyze Repository & User's System:**
    *   **Action (Phase A - AI-Driven Requirement Analysis):** Use Gemini (with the `generate_min_sys_reqs_from_repo_prompt.md` template) to analyze the target repository's files (`README.md`, `requirements.txt`, `environment.yaml`, existing requirements docs). Goal: Have AI determine the *likely* minimum system requirements (OS, CPU, RAM, Python version, and importantly, whether specialized hardware like GPUs seems necessary).
    *   **Action (Phase B - User Hardware Profiling):** Programmatically collect details about the user's system using `sys_profiler.py` (OS, CPU, RAM, disk, Python versions, and if available, GPU details). This profile (`hardware_specs.json`) provides the 'actual' state of the user's machine.
    *   **Action (Phase C - Prerequisite Check):** Compare the AI-determined requirements (from Phase A) against the user's actual hardware profile (from Phase B). If fundamental requirements (e.g., OS, minimum RAM, or a required GPU type if AI indicated one) are not met, inform the user and potentially halt or offer guidance.
    *   **Rationale:** This approach first uses AI to understand the *repository's* needs, then checks if the user's system is a potential match before proceeding to detailed environment configuration. It avoids assuming specialized hardware upfront.

2.  **Generate Hardware Information File:**
    *   **Action:** Store the hardware details collected in Phase B in a structured, intermediate file (`hardware_specs.json`, written to `/src/data/system_info/`).
    *   **Rationale:** Creates a clear, inspectable artifact for debugging, logging, and as input for subsequent steps. It also allows for potential caching.

3.  **Utilize Prompt Templates with Placeholders:**
    *   **Action:** Develop pre-defined prompt templates for interacting with Gemini. These templates will include placeholders for the hardware information gathered in step 2.
    *   **Rationale:** Ensures consistent and manageable AI interactions. The specific hardware details can be injected into these templates to customize the queries.

4.  **AI-Powered Dependency Resolution & Environment Specification:**
    *   **Stage 1: Environment Characterization & High-Level Dependencies Query:**
        *   **Input:** Content from the repository's `requirements.txt`, `environment.yaml`, and potentially key phrases from `README.md` indicating project type or critical dependencies. User's Python version(s) from `hardware_specs.json`.
        *   **Goal with Gemini:** 
            *   Identify the primary Python version suitable for the project.
            *   Determine if the project requires specialized hardware (e.g., NVIDIA GPU, specific accelerators) based on its dependencies (e.g., `tensorflow-gpu`, CUDA builds of `torch`, `jax[cuda12]`).
            *   If specialized hardware is indicated, identify the type (e.g., NVIDIA CUDA) and any version constraints mentioned or implied by packages.
            *   List core dependencies and their high-level version constraints.
        *   **Rationale:** To understand the nature of the required environment (general Python, or specialized like CUDA-enabled) before diving into exact package versions.
    *   **Stage 2: Detailed & Compatible Dependency Versioning Query (Conditional):**
        *   **Input:** Insights from Stage 1, the user's detailed hardware specs from `hardware_specs.json` (especially GPU model, driver version if Stage 1 indicated GPU needs), and the list of packages from the repository.
        *   **Goal with Gemini:**
            *   **If specialized environment (e.g., CUDA):** Determine exact compatible versions for all dependencies, including specific builds (e.g., `torch==X.Y.Z+cuABC`, `cudatoolkit=U.V`, `cudnn=S.T`). The AI should consider the user's *actual* driver and GPU capabilities.
            *   **If general Python environment:** Resolve a compatible set of versions for all listed Python packages, respecting any version specifiers in the original files and ensuring they work with the chosen Python version.
        *   **Output:** A precise list of packages and versions suitable for generating an `environment.yml` (for Conda) or a `requirements.txt` file.
        *   **Rationale:** To leverage AI for the complex task of finding a fully compatible set of dependencies, tailored to the user's system if specialized hardware is involved, or a robust general Python environment otherwise.

5.  **(Post-Generation) Verify with Post-Install Script:**
    *   **Action:** After the environment is created using the Gemini-generated files, run a verification script (e.g., `verify_installation.py`, potentially adapted to use dynamically generated expected versions).
    *   **Rationale:** Confirms that the environment was created as specified and that all components are functioning correctly, providing a final quality assurance check.
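The Phase C prerequisite check in step 1 can be sketched as a comparison between the AI's JSON (using the keys requested by the prompt template below) and the hardware profile. The layout of `hardware_specs.json` is not specified in this document, so the `ram_total_gb`, `disk_free_gb`, and `gpus[].vendor` / `gpus[].vram_gb` keys used here are hypothetical and would need to be mapped to what `sys_profiler.py` actually writes.

```python
from typing import Any, Dict, List, Optional


def _number(value: Any) -> Optional[float]:
    """Return value as float, or None for null, "Not specified", and other non-numbers."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return None
    return float(value)


def check_prerequisites(reqs: Dict[str, Any], specs: Dict[str, Any]) -> List[str]:
    """Compare AI-determined requirements (Phase A) with the hardware profile (Phase B).

    Returns human-readable problems; an empty list means no blocker was found.
    NOTE: the `specs` keys are HYPOTHETICAL placeholders for sys_profiler.py output.
    """
    problems: List[str] = []

    ram_min, ram = _number(reqs.get("ram_min_gb")), _number(specs.get("ram_total_gb"))
    if ram_min is not None and ram is not None and ram < ram_min:
        problems.append(f"RAM: {ram:g} GB available, {ram_min:g} GB required")

    disk_min, disk = _number(reqs.get("disk_space_min_gb")), _number(specs.get("disk_free_gb"))
    if disk_min is not None and disk is not None and disk < disk_min:
        problems.append(f"Disk: {disk:g} GB free, {disk_min:g} GB required")

    if reqs.get("gpu_required") is True:
        gpus = specs.get("gpus") or []
        vendor = reqs.get("gpu_type_preference")
        # Only filter by vendor when the AI named a specific one.
        specific = isinstance(vendor, str) and vendor.strip().lower() not in ("", "any", "not specified")
        wanted = vendor if specific else None
        if wanted:
            gpus = [g for g in gpus if wanted.lower() in str(g.get("vendor", "")).lower()]
        if not gpus:
            problems.append(f"GPU: {wanted or 'a'} GPU required, none detected")
        else:
            vram_min = _number(reqs.get("gpu_vram_min_gb"))
            vrams = [v for v in (_number(g.get("vram_gb")) for g in gpus) if v is not None]
            # Skip the VRAM check when the profiler could not report VRAM.
            if vram_min is not None and vrams and max(vrams) < vram_min:
                problems.append(f"GPU VRAM: {max(vrams):g} GB on best matching GPU, {vram_min:g} GB required")
    return problems
```

Only numeric fields and GPU presence are checked here. The OS, CPU architecture, and CUDA/compute-capability fields come back as free text (e.g. "Linux (e.g., Ubuntu 20.04+)"), so comparing them reliably would need stricter parsing or an explicit user confirmation.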

## AI-Driven Minimum System Requirement Determination

This section explores using Gemini's analytical capabilities to synthesize information from common repository files and infer a project's minimum system requirements. This automates a task that usually requires manual reading and interpretation.

The core idea is to:

1.  Gather data from available sources within a target repository (`README.md`, `requirements.txt`, `environment.yaml`, and any existing explicit requirements file).
2.  Feed this collated information into a carefully crafted prompt.
3.  Have Gemini analyze this information and output a structured list of minimum system requirements.

### Prompt Template for Gemini

A new prompt template file will be created in `src/business/ai/prompt_templates/` named `generate_min_sys_reqs_from_repo_prompt.md`.

**Content of `generate_min_sys_reqs_from_repo_prompt.md`:**
````markdown
# Prompt to Determine Minimum System Requirements for a Software Repository

You are an expert system analyst AI. Your task is to analyze the provided information extracted from a software repository and determine its minimum system requirements.

## Repository Context
*(Optional: If available, provide a brief description of the repository's purpose or name. Placeholder: {{repository_description}})*

## Information Extracted from Repository Files:

### 1. Content from README.md:
```text
{{readme_content}}
```
*(If README.md was not found or is empty, this section will state: "README.md not found or empty.")*

### 2. Content from requirements.txt (if available):
```text
{{requirements_txt_content}}
```
*(If requirements.txt was not found or is empty, this section will state: "requirements.txt not found or empty.")*

### 3. Content from environment.yaml (if available):
```text
{{environment_yaml_content}}
```
*(If environment.yaml was not found or is empty, this section will state: "environment.yaml not found or empty.")*

### 4. Content from Existing Minimum System Requirements Definition (if available):
```text
{{existing_min_sys_reqs_content}}
```
*(If no existing min-sys-requirements file was found or it is empty, this section will state: "No existing min-sys-requirements file found or empty.")*

## Task:
Based on all the provided information above, please synthesize and list the minimum system requirements for this repository.
Focus on identifying requirements for the following categories if the information allows:

- **Operating System(s):** (e.g., Linux, Windows, macOS; specific versions or distributions if inferable)
- **CPU:** (e.g., minimum cores, architecture like x86_64, arm64, if inferable)
- **System RAM:** (e.g., minimum GB)
- **GPU:** (e.g., "NVIDIA GPU required", "AMD GPU required", "Any modern GPU"; minimum VRAM in GB; specific series like "NVIDIA RTX 20-series or newer"; minimum CUDA version if NVIDIA and inferable; minimum Compute Capability if inferable)
- **Disk Space:** (e.g., minimum GB for installation, dependencies, and typical datasets)
- **Python Version:** (e.g., "3.8+", "==3.9.x")
- **Key Software Dependencies or Runtimes:** (e.g., "CUDA Toolkit 11.8", "cuDNN 8.x", "Node.js 16+", specific compilers, database versions)

If information for a category is not clearly available or cannot be reasonably inferred from the provided text, please state "Not specified" or "Cannot be determined from provided information" for that category. Be cautious about making assumptions if the text is not explicit.

## Desired Output Format:
Please provide the minimum system requirements as a JSON object. Use the following keys if applicable, and add others if necessary. If a value cannot be determined, use `null` or "Not specified".

```json
{
  "operating_system": "Linux (e.g., Ubuntu 20.04+)",
  "cpu_architecture": "x86_64",
  "cpu_cores_min": 4,
  "ram_min_gb": 8,
  "gpu_required": true,
  "gpu_type_preference": "NVIDIA",
  "gpu_vram_min_gb": 6,
  "gpu_cuda_version_min": "11.8",
  "gpu_compute_capability_min": "6.1",
  "disk_space_min_gb": 50,
  "python_version_min": "3.9",
  "key_software_dependencies": ["CUDA Toolkit 11.8", "cuDNN 8.x"]
}
```

Analyze the provided text carefully and generate the JSON output.
````

**Explanation of the Template:**

- Placeholders (`{{...}}`): These mark where the content extracted from the repository files will be injected by the Python script before the prompt is sent to Gemini.
- Context for Missing Files: The template includes notes on how to represent missing information (e.g., "README.md not found or empty"). This helps Gemini understand the completeness of the provided data.
- Clear Task Definition: It explicitly tells Gemini what to do and what categories of system requirements to focus on.
- Guidance on Uncertainty: It instructs Gemini on how to handle cases where information isn't available.
- Desired Output Format (JSON): Specifying JSON output makes Gemini's response programmatically parsable, which is crucial for using these determined requirements later (e.g., comparing them with actual system specs from `sys_profiler.py`).
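Filling the `{{...}}` placeholders, including the "not found or empty" messages the template specifies for missing files, might look like the sketch below. The names `fill_template` and `MISSING_MESSAGES`, and the "Not provided." fallback for other empty placeholders such as `{{repository_description}}`, are assumptions for illustration.

```python
import re
from typing import Dict, Optional

# Messages the template says to use when a source file is missing or empty.
MISSING_MESSAGES = {
    "readme_content": "README.md not found or empty.",
    "requirements_txt_content": "requirements.txt not found or empty.",
    "environment_yaml_content": "environment.yaml not found or empty.",
    "existing_min_sys_reqs_content": "No existing min-sys-requirements file found or empty.",
}

# Matches {{name}} and tolerates inner whitespace such as {{ name }}.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def fill_template(template: str, values: Dict[str, Optional[str]]) -> str:
    """Replace every {{name}} with values[name], or a 'missing' message if empty."""

    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        value = values.get(name)
        if value:  # non-empty string
            return value
        return MISSING_MESSAGES.get(name, "Not provided.")

    return _PLACEHOLDER.sub(substitute, template)
```

Passing a function to `re.sub` means its return value is inserted literally, so backslashes in file content (e.g. Windows paths in a README) are not treated as escape sequences, and the single pass means a `{{...}}` inside injected file content is never expanded again.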

**Next Steps (Conceptual):**

1.  Develop a Python script (perhaps in `src/business/ai/` or `src/business/sys/`) that:
    *   Takes a path to a target repository as input.
    *   Reads the `README.md`, `requirements.txt`, `environment.yaml`, and any existing `min-sys-requirements.txt` file from that repository.
    *   Populates this `generate_min_sys_reqs_from_repo_prompt.md` template with the extracted content.
    *   Uses the project's `AIGenerator` (or a similar service) to send the filled prompt to the Gemini API.
    *   Receives and parses the JSON response from Gemini.
2.  This script would implement "Phase A - AI-Driven Requirement Analysis" from the `pyrepopal.ipynb` plan, using AI to *determine* the requirements first rather than only checking against a predefined file; the comparison against the user's hardware then happens in "Phase C - Prerequisite Check".
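Parsing the response is worth making tolerant, because an LLM may wrap the JSON in a Markdown code fence or surround it with prose even when asked for a bare object. A minimal sketch, assuming the keys from the template's example output; `parse_requirements_response` is a hypothetical name, and mapping "Not specified" to `None` is a design choice here so that downstream checks only have to handle one "unknown" value.

```python
import json
import re
from typing import Any, Dict

# Keys shown in the prompt template's example output.
EXPECTED_KEYS = (
    "operating_system", "cpu_architecture", "cpu_cores_min", "ram_min_gb",
    "gpu_required", "gpu_type_preference", "gpu_vram_min_gb",
    "gpu_cuda_version_min", "gpu_compute_capability_min",
    "disk_space_min_gb", "python_version_min", "key_software_dependencies",
)

# First fenced block, optionally tagged json (any case).
_FENCE = re.compile(r"```(?:json)?\s*(.*?)\s*```", re.DOTALL | re.IGNORECASE)


def parse_requirements_response(text: str) -> Dict[str, Any]:
    """Extract and normalize the JSON object from a model response."""
    match = _FENCE.search(text)
    if match:
        payload = match.group(1)
    else:
        # No fence: fall back to the outermost {...} span.
        start, end = text.find("{"), text.rfind("}")
        if start == -1 or end < start:
            raise ValueError("no JSON object found in response")
        payload = text[start : end + 1]

    data = json.loads(payload)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    # Treat "Not specified" like null, and make every expected key present.
    result = {
        k: None if isinstance(v, str) and v.strip().lower() == "not specified" else v
        for k, v in data.items()
    }
    for key in EXPECTED_KEYS:
        result.setdefault(key, None)
    return result
```

Extra keys the model adds (the template allows this) are kept as-is.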

This prompt template is a key component of PyRepo-Pal's AI-driven approach to understanding and setting up repository environments.
