# Documentation Troubleshooting Guide

This notebook explains how to fix the warnings emitted by your Sphinx documentation build. The documentation built successfully and is available at http://localhost:8000, but addressing these warnings will improve its quality and maintainability.

The warnings fall into six categories, addressed one by one below.

## 1. Title Underline Warnings

Many warnings look like this:
```
/workspace/docs/source/api/config/index.rst:4: WARNING: Title underline too short.

Configuration API
==============
```

**Problem**: The underline is shorter than the title text.

**Solution**: Make the underline (a run of `=`, `-`, `~`, or another adornment character) at least as long as the title.

In [None]:
# Example of how to fix title underlines in RST files
import os

# Punctuation commonly used for RST section adornments
ADORNMENT_CHARS = set('=-~^"\'`#*+:._')


def is_adornment(text):
    """Return True if text is a run of one adornment character."""
    stripped = text.strip()
    # Very short runs are skipped so markers such as '..' are not mistaken for underlines
    return (len(stripped) >= 4 and len(set(stripped)) == 1
            and stripped[0] in ADORNMENT_CHARS)


def fix_title_underlines(filepath):
    """Lengthen title underlines in an RST file that are shorter than the title."""
    if not os.path.exists(filepath):
        print(f"File not found: {filepath}")
        return False

    with open(filepath, 'r', encoding='utf-8') as file:
        content = file.readlines()

    fixed_content = []
    i = 0
    while i < len(content):
        line = content[i]
        fixed_content.append(line)
        title = line.rstrip()

        # A title is a non-blank, non-directive, non-adornment line followed by an underline
        if (i < len(content) - 1 and title and not line.startswith('.. ')
                and not is_adornment(line) and is_adornment(content[i + 1])):
            underline = content[i + 1].strip()
            # Only lengthen underlines that are too short; leave valid ones untouched
            if len(underline) < len(title):
                underline = underline[0] * len(title)
            fixed_content.append(underline + '\n')
            i += 2  # Skip the original underline
            continue
        i += 1

    with open(filepath, 'w', encoding='utf-8') as file:
        file.writelines(fixed_content)

    return True

# Example usage (not executed):
# fix_title_underlines('/workspace/docs/source/api/config/index.rst')
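
Before rewriting files in place, it can help to list the affected titles first. The following is a minimal read-only sketch: `find_short_underlines` is a hypothetical helper name, and the four-character minimum for an underline is an assumption chosen so that markers such as `..` are not matched.

```python
def find_short_underlines(text, min_length=4):
    """Return (line_number, title, underline) for underlines shorter than their title."""
    lines = text.splitlines()
    problems = []
    for index in range(len(lines) - 1):
        title = lines[index].rstrip()
        underline = lines[index + 1].strip()
        # An underline is a run of one punctuation character (assumed >= min_length long)
        is_underline = (
            len(underline) >= min_length
            and len(set(underline)) == 1
            and not underline[0].isalnum()
        )
        # Skip blank lines, directives, and lines that are themselves adornments
        is_title = (
            bool(title.strip())
            and not title.startswith('.. ')
            and len(set(title.strip())) > 1
        )
        if is_title and is_underline and len(underline) < len(title):
            # index is 0-based and the underline is one line below the title
            problems.append((index + 2, title, underline))
    return problems


# Sample built from the warning quoted above: a 17-character title, 14-character underline
sample = "Configuration API\n" + "=" * 14 + "\n\nOverview\n" + "-" * 8 + "\n"
for line_number, title, underline in find_short_underlines(sample):
    print(f"line {line_number}: '{title}' needs {len(title)} characters, has {len(underline)}")
```

Because the function takes text and does not touch the file system, it can be run over every `.rst` file as a dry run before any file is modified.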

## 2. Import Failures

Many warnings come from failed module imports, most of them caused by an incompatibility between wandb and NumPy 2.0:

```
AttributeError: `np.float_` was removed in the NumPy 2.0 release. Use `np.float64` instead.
```

**Problem**: The installed wandb version references `np.float_`, an alias that was removed in NumPy 2.0.

**Solution**: A mocking system already handles this. Running `serve_docs.sh` with the `--mock-modules` flag creates mock versions of problematic modules such as wandb.

For a more permanent solution, you could:
1. Pin NumPy to a version before 2.0
2. Update wandb to a version compatible with NumPy 2.0
3. Continue using the mocking solution

In [None]:
# Option 1: Pin NumPy to an earlier version
# !poetry add numpy@"<2.0" --group dev

# Option 2: Update wandb (if a compatible version exists)
# !poetry add wandb@latest --group dev

# These commands are commented out as they should be run only if you decide to use one of these approaches
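
If you keep the mocking approach (option 3), note that Sphinx's autodoc extension has a built-in setting for this purpose, `autodoc_mock_imports`. The fragment below is a sketch of how that could look in `conf.py`. Whether `serve_docs.sh --mock-modules` relies on this setting internally is not shown in this notebook, so treat the equivalence as an assumption.

```python
# conf.py (fragment). Requires 'sphinx.ext.autodoc' in the `extensions` list.
# autodoc replaces the listed modules with mock objects, so importing code
# that depends on them no longer fails during the docs build.

# Assumption: wandb is the only module that needs mocking in this project.
autodoc_mock_imports = ['wandb']
```

Mocked modules only satisfy the import. Docstrings of objects that live inside the mocked package itself are not available to autodoc.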

## 3. Missing Document References

Warnings like these indicate references to documents that don't exist yet:

```
/workspace/docs/source/index.rst:33: WARNING: toctree contains reference to nonexisting document 'guides/installation'
```

**Problem**: Your documentation references files that haven't been created yet.

**Solution**: Either create these missing files or remove references to them from your toctree directives.

In [None]:
# Let's see what missing documents are referenced most frequently
import re
from collections import Counter

def analyze_missing_docs(log_text):
    """Extract and count missing document references from sphinx build log."""
    pattern = r"WARNING: toctree contains reference to nonexisting document '([^']+)'"
    missing_docs = re.findall(pattern, log_text)
    
    # Count occurrences
    counter = Counter(missing_docs)
    
    # Group by directory
    by_directory = {}
    for doc in counter:
        directory = doc.split('/')[0] if '/' in doc else 'root'
        if directory not in by_directory:
            by_directory[directory] = []
        by_directory[directory].append((doc, counter[doc]))
    
    return by_directory

# Example output (based on your log):
missing_docs = {
    'guides': [
        ('guides/installation', 1),
        ('guides/quickstart', 1),
        ('guides/configuration', 1),
        ('guides/distributed', 1),
        ('guides/monitoring', 1),
        ('guides/optimization', 1),
        ('guides/scaling', 2)
    ],
    'examples': [
        ('examples/pretraining_llama', 2),
        ('examples/tokenizer_custom', 2),
        ('examples/distributed_training', 1)
    ],
    'development': [
        ('development/contributing', 1),
        ('development/testing', 1),
        ('development/release_notes', 1)
    ]
}

for directory, docs in missing_docs.items():
    print(f"Directory: {directory}")
    for doc, count in docs:
        print(f"  - {doc} (referenced {count} times)")
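
If you decide to create the missing files rather than remove the references, placeholder pages are enough to silence the toctree warnings until the real content is written. The sketch below is one way to do that. `create_stub` is a hypothetical helper, the title derived from the file name is an assumption, and the demonstration writes into a temporary directory rather than `/workspace/docs/source`.

```python
import os
import tempfile


def create_stub(source_dir, doc_name):
    """Write a placeholder .rst file for doc_name and return its path."""
    path = os.path.join(source_dir, *doc_name.split('/')) + '.rst'
    if os.path.exists(path):
        return path  # Never overwrite an existing document
    os.makedirs(os.path.dirname(path), exist_ok=True)

    # Assumption: derive a readable title from the file name
    title = os.path.basename(path)[:-len('.rst')].replace('_', ' ').title()
    with open(path, 'w', encoding='utf-8') as f:
        # The underline is generated from the title, so it always has the right length
        f.write(f"{title}\n{'=' * len(title)}\n\nThis page has not been written yet.\n")
    return path


# Demonstration in a temporary directory rather than the real docs tree
with tempfile.TemporaryDirectory() as tmp:
    stub_path = create_stub(tmp, 'guides/installation')
    with open(stub_path, encoding='utf-8') as f:
        stub_text = f.read()
    print(stub_text)
```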

## 4. Docstring Indentation Issues

Messages like this one indicate indentation problems in docstrings:

```
/workspace/src/config/config_loader.py:docstring of src.config.config_loader.ConfigValidator:11: ERROR: Unexpected indentation.
```

**Problem**: Incorrect indentation in Python docstrings can cause Sphinx to misinterpret the structure.

**Solution**: Fix the indentation in the docstrings to follow a consistent pattern. For Google and NumPy style docstrings, make sure indentation is consistent within sections.

In [None]:
def example_function(param1, param2):
    """Example of a properly formatted Google-style docstring.

    This is the extended description, which can span multiple lines.

    Args:
        param1 (type): Description of param1.
        param2 (type): Description of param2.

    Returns:
        type: Description of return value.

    Raises:
        ExceptionType: Description of when this exception is raised.

    Example:
        >>> example_function(1, 2)
        3
    """
    return param1 + param2

# Common indentation issues:
# 1. Inconsistent indentation under section headers
# 2. Tabs mixed with spaces
# 3. Improper section headers (missing colon)
# 4. Additional indentation in examples or code blocks

# Let's create a function to print docstrings for examination
import inspect

def print_docstring(obj):
    """Print a docstring with line numbers for easier debugging."""
    # inspect.getdoc strips the common leading indentation from the docstring
    docstring = inspect.getdoc(obj)
    if not docstring:
        print("No docstring available")
        return

    lines = docstring.split('\n')
    for i, line in enumerate(lines):
        print(f"{i+1:3d} | {line}")

# Example usage
# print_docstring(example_function)
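
The number after the object name in the message (`:11` in the example above) is a line within the docstring, so it is often enough to look at a few lines around it. The sketch below does that. `docstring_context` and `sample` are hypothetical names used for illustration, and the reported line may be offset from what you see here if an extension rewrites the docstring before it is parsed.

```python
import inspect


def docstring_context(obj, lineno, radius=2):
    """Return the docstring lines around a 1-based line number, marking that line."""
    doc = inspect.getdoc(obj) or ""
    lines = doc.split('\n')
    start = max(lineno - 1 - radius, 0)
    end = min(lineno + radius, len(lines))
    return [
        f"{'>' if n == lineno else ' '} {n:3d} | {lines[n - 1]}"
        for n in range(start + 1, end + 1)
    ]


def sample():
    """Summary line.

    Args:
        value (int): A number.
    """


# Show line 3 of the sample docstring with one line of context on each side
print('\n'.join(docstring_context(sample, 3, radius=1)))
```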

## 5. Duplicate Object Descriptions

Warnings like:

```
WARNING: duplicate object description of src.utils.logging.VerboseLevel.NONE
```

**Problem**: The same object is documented in more than one place.

**Solution**: Add the `:no-index:` option (spelled `:noindex:` before Sphinx 7.2) to all but one of the directives that document the object. This tells Sphinx not to index those instances.
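
To see which objects are affected, you can pull the names out of the build log. This is a minimal sketch: `find_duplicate_objects` is a hypothetical helper, and the sample log simply repeats the warning quoted above for illustration.

```python
import re
from collections import Counter


def find_duplicate_objects(log_text):
    """Count 'duplicate object description' warnings per object name."""
    pattern = r"duplicate object description of ([\w.]+)"
    return Counter(re.findall(pattern, log_text))


# Illustrative sample: the warning quoted above, repeated twice
sample_log = (
    "WARNING: duplicate object description of src.utils.logging.VerboseLevel.NONE\n"
    "WARNING: duplicate object description of src.utils.logging.VerboseLevel.NONE\n"
)
for name, count in find_duplicate_objects(sample_log).most_common():
    print(f"{name}: {count}")
```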

## 6. Fixing the Transformers Intersphinx Link

There was an error with the transformers intersphinx link:

```
intersphinx inventory 'https://huggingface.co/transformers/master/objects.inv' not fetchable
```

This can be fixed by updating the URL in `conf.py`:

```python
intersphinx_mapping = {
    # ...
    'transformers': ('https://huggingface.co/docs/transformers/main', None),
    # ...
}
```

This change has already been applied.
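
When the second element of a mapping entry is `None`, Sphinx looks for a file named `objects.inv` under the base URL. If the warning comes back, a quick check like the sketch below can confirm whether that file is reachable. Both helper names are hypothetical, and the fetch itself is left commented out because it needs network access.

```python
import urllib.error
import urllib.request


def inventory_url(base_url):
    """Return the objects.inv URL that corresponds to a base URL."""
    return base_url.rstrip('/') + '/objects.inv'


def is_fetchable(url, timeout=10):
    """Return True if the URL answers with HTTP 200 (requires network access)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        return False


print(inventory_url('https://huggingface.co/docs/transformers/main'))
# Not executed here because it needs network access:
# print(is_fetchable(inventory_url('https://huggingface.co/docs/transformers/main')))
```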

## Action Plan

Here's a prioritized list of actions to improve your documentation:

1. **Fix title underlines**: These issues are cosmetic and easy to fix.
2. **Create essential missing documents**: Start with the most referenced missing documents.
3. **Fix docstring indentation**: Ensure consistent formatting in Python docstrings.
4. **Add `:no-index:` where needed**: Fix duplicate object warnings.
5. **Review the mock system**: Ensure it mocks all required modules.

For a more robust long-term solution:
1. Consider pinning NumPy to a version compatible with your dependencies
2. Update wandb to a version compatible with NumPy 2.0 when available
3. Create all the missing documentation files or remove references to them
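
To track progress against this plan, it can be useful to count how many warnings of each kind remain after a build. The sketch below groups log lines using the message fragments quoted in this notebook. The category names and `categorize_warnings` are hypothetical, and matching import failures on the `np.float_` text alone is an assumption that only covers the wandb case.

```python
import re
from collections import Counter

# Patterns taken from the messages quoted in this notebook
CATEGORIES = {
    'title underline': r"Title underline too short",
    'import failure': r"np\.float_",
    'missing document': r"toctree contains reference to nonexisting document",
    'docstring indentation': r"Unexpected indentation",
    'duplicate object': r"duplicate object description",
    'intersphinx': r"intersphinx inventory .* not fetchable",
}


def categorize_warnings(log_text):
    """Count log lines per category; each line counts toward the first match only."""
    counts = Counter()
    for line in log_text.splitlines():
        for category, pattern in CATEGORIES.items():
            if re.search(pattern, line):
                counts[category] += 1
                break
    return counts


# Sample lines copied from the warnings shown earlier in this notebook
sample_log = "\n".join([
    "/workspace/docs/source/api/config/index.rst:4: WARNING: Title underline too short.",
    "/workspace/docs/source/index.rst:33: WARNING: toctree contains reference to nonexisting document 'guides/installation'",
    "WARNING: duplicate object description of src.utils.logging.VerboseLevel.NONE",
])
for category, count in categorize_warnings(sample_log).most_common():
    print(f"{category}: {count}")
```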

## Example: Creating a Missing Guide

Let's create one of the missing guide files as an example.

In [None]:
# Create a basic installation guide file
installation_guide = """
Installation Guide
==================

This guide covers how to install the ML Training Framework and set up your environment for effective development and training workflows.

Prerequisites
-------------

Before installing, ensure you have the following:

* Python 3.8 or newer
* Git
* CUDA-compatible GPU (for training, optional for development)
* 64-bit operating system (Linux recommended for training)

Installation Methods
--------------------

The ML Training Framework offers several installation options to suit different workflows.

Using Poetry (Recommended)
~~~~~~~~~~~~~~~~~~~~~~~~~~

Poetry provides deterministic dependency management and ensures consistent environments:

.. code-block:: bash

    # Clone the repository
    git clone https://github.com/yourusername/workspace.git
    cd workspace

    # Install with Poetry
    make install-poetry

    # Or just use the default install target (defaults to Poetry)
    make install

Using pip
~~~~~~~~~

For a standard Python workflow with pip:

.. code-block:: bash

    # Clone the repository
    git clone https://github.com/yourusername/workspace.git
    cd workspace

    # Install with pip in a virtual environment
    make install-pip

    # Or specify pip as the installation method
    make INSTALL_METHOD=pip install

Direct pip install
~~~~~~~~~~~~~~~~~~

You can also install directly with pip:

.. code-block:: bash

    # Create and activate a virtual environment (recommended)
    python -m venv .venv
    source .venv/bin/activate  # On Windows: .venv\\Scripts\\activate

    # Install the package
    pip install -e .  # Regular installation
    pip install -e ".[dev]"  # With development dependencies

Docker Installation
-------------------

For containerized workflows:

.. code-block:: bash

    # Build development image (with Poetry)
    make build-dev

    # Run development container
    make container

Verification
------------

Verify your installation:

.. code-block:: bash

    # Show the current version
    make version-show

    # Run a simple test
    make test-quick

Next Steps
----------

Once installed, you can:

* Read the :ref:`quickstart` guide to begin using the framework
* Explore the :ref:`configuration` documentation to learn about customization options
* Set up your :ref:`development environment <contributing>` for contributing
"""

# Create the guides directory if it does not exist yet
import os

guides_dir = '/workspace/docs/source/guides'
os.makedirs(guides_dir, exist_ok=True)

# Example only - don't execute this yet
# with open(os.path.join(guides_dir, 'installation.rst'), 'w', encoding='utf-8') as f:
#     f.write(installation_guide.lstrip())

## Conclusion

Your Sphinx documentation is building successfully despite the warnings, and it should be accessible at http://localhost:8000. The issues identified are mostly related to formatting, missing documents, and import problems that we've addressed with mocking.

By systematically addressing the warnings outlined in this notebook, you can improve the quality and completeness of your documentation. Remember that documentation is an ongoing process - as your project evolves, your documentation should evolve with it.

Happy documenting!