# Tutorial: Using the Diario Dataclass

This tutorial introduces the `Diario` dataclass, a unified representation for judicial diaries from various tribunals within the CausaGanha project. It's designed to simplify how diario information is handled throughout the system.

## 1. Understanding the `Diario` Dataclass

The `Diario` dataclass is defined in `src/models/diario.py`. Its main purpose is to provide a consistent structure for diario data, regardless of the source tribunal.

Key fields include:
- `tribunal`: Short identifier for the tribunal (e.g., 'tjro', 'tjsp').
- `data`: The publication date of the diario (`datetime.date` object).
- `url`: The URL where the diario can be found.
- `filename`: Optional filename for the diario PDF.
- `hash`: Optional hash of the PDF content.
- `pdf_path`: Optional local path to the downloaded PDF (`pathlib.Path` object).
- `ia_identifier`: Optional Internet Archive identifier.
- `status`: Current processing status (e.g., 'pending', 'downloaded').
- `metadata`: A dictionary for any additional tribunal-specific metadata.
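
To make the field list concrete, here is a minimal sketch of how such a dataclass could be declared. It is not the code in `src/models/diario.py`: the class name `DiarioSketch`, the split between required and optional fields, and every default value (including `status="pending"`) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from pathlib import Path
from typing import Any, Dict, Optional


@dataclass
class DiarioSketch:
    """Illustrative stand-in for Diario; the real definition may differ."""

    # Fields the tutorial describes without the word "Optional"
    tribunal: str
    data: date
    url: str
    # Fields described as optional (the None defaults are assumptions)
    filename: Optional[str] = None
    hash: Optional[str] = None
    pdf_path: Optional[Path] = None
    ia_identifier: Optional[str] = None
    # Assumed default status
    status: str = "pending"
    # default_factory gives each instance its own dictionary
    metadata: Dict[str, Any] = field(default_factory=dict)


sketch = DiarioSketch(
    tribunal="tjro",
    data=date(2024, 7, 15),
    url="https://www.tjro.jus.br/novodiario/2024/20240715N132.pdf",
)
print(sketch)
```

Using `field(default_factory=dict)` rather than a literal `{}` avoids sharing one mutable dictionary between instances, which is the usual reason dataclasses reject mutable defaults.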

## 2. Creating a `Diario` Instance

You can create a `Diario` instance by providing the required fields. Let's assume the necessary modules are in our Python path.

```python
from datetime import date
from pathlib import Path
# Assuming src is in PYTHONPATH or you are running from project root
from src.models.diario import Diario

# Example: Creating a Diario instance for a TJRO diario
tjro_diario = Diario(
    tribunal='tjro',
    data=date(2024, 7, 15),
    url='https://www.tjro.jus.br/novodiario/2024/20240715N132.pdf',
    filename='20240715N132.pdf',
    status='pending'
)

print(f"Created Diario: {tjro_diario.display_name}")
print(f"Tribunal: {tjro_diario.tribunal}")
print(f"Date: {tjro_diario.data}")
print(f"URL: {tjro_diario.url}")
print(f"Status: {tjro_diario.status}")
```

## 3. Key Properties

The `Diario` dataclass exposes two convenience properties, `display_name` and `queue_item`:

```python
# Using the tjro_diario instance from the previous cell

# display_name: A human-readable identifier
print(f"Display Name: {tjro_diario.display_name}")

# queue_item: Converts the Diario instance to a dictionary format suitable for the job_queue database table
diario_queue_item = tjro_diario.queue_item
print("\nQueue Item Format:")
for key, value in diario_queue_item.items():
    print(f"  {key}: {value}")
```

## 4. Creating from a Queue Item

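You can also create a `Diario` instance from a dictionary that represents a row from the `job_queue` database table, using the `from_queue_item` class method. In the example row, the publication date is stored under the key `date` as an ISO-formatted string, whereas the dataclass field is named `data`.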

```python
# Example queue item (simulating a database row)
sample_queue_row = {
    'url': 'https://www.tjsp.jus.br/diario/2024/20240716_100.pdf',
    'date': '2024-07-16',
    'tribunal': 'tjsp',
    'filename': '20240716_100.pdf',
    'metadata': {'page_count': 150},
    'ia_identifier': 'tjsp-diario-20240716',
    'status': 'downloaded'
}

tjsp_diario = Diario.from_queue_item(sample_queue_row)

print(f"Restored Diario: {tjsp_diario.display_name}")
print(f"Tribunal: {tjsp_diario.tribunal}")
print(f"Date: {tjsp_diario.data}")
print(f"Status: {tjsp_diario.status}")
print(f"IA Identifier: {tjsp_diario.ia_identifier}")
print(f"Metadata: {tjsp_diario.metadata}")
```
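
The round trip between an object and its queue row can be sketched without the project installed. The class `MiniDiario` below is a hypothetical, trimmed-down stand-in. Its key names mirror the sample row above, but the ISO date conversion and the fallback defaults are assumptions about how `queue_item` and `from_queue_item` might behave, not a description of the real implementation.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Any, Dict, Optional


@dataclass
class MiniDiario:
    """Hypothetical stand-in for Diario, used only to show the round trip."""

    tribunal: str
    data: date
    url: str
    filename: Optional[str] = None
    ia_identifier: Optional[str] = None
    status: str = "pending"
    metadata: Dict[str, Any] = field(default_factory=dict)

    @property
    def queue_item(self) -> Dict[str, Any]:
        # Assumption: the date is serialized as an ISO-8601 string under 'date'.
        return {
            "url": self.url,
            "date": self.data.isoformat(),
            "tribunal": self.tribunal,
            "filename": self.filename,
            "metadata": self.metadata,
            "ia_identifier": self.ia_identifier,
            "status": self.status,
        }

    @classmethod
    def from_queue_item(cls, row: Dict[str, Any]) -> "MiniDiario":
        # Assumption: missing optional keys fall back to the field defaults.
        return cls(
            tribunal=row["tribunal"],
            data=date.fromisoformat(row["date"]),
            url=row["url"],
            filename=row.get("filename"),
            ia_identifier=row.get("ia_identifier"),
            status=row.get("status", "pending"),
            metadata=row.get("metadata") or {},
        )


row = {
    "url": "https://www.tjsp.jus.br/diario/2024/20240716_100.pdf",
    "date": "2024-07-16",
    "tribunal": "tjsp",
    "filename": "20240716_100.pdf",
    "metadata": {"page_count": 150},
    "ia_identifier": "tjsp-diario-20240716",
    "status": "downloaded",
}

restored = MiniDiario.from_queue_item(row)
print(restored.data)               # a datetime.date, not a string
print(restored.queue_item == row)  # the row survives the round trip
```

Keeping the serialization and deserialization logic on the same class means the key names are defined in one place, so a renamed column only has to be changed once.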

## 5. Related Interfaces

The `Diario` dataclass is designed to work with a set of abstract interfaces defined in `src/models/interfaces.py`:
- `DiarioDiscovery`: For discovering diario URLs.
- `DiarioDownloader`: For downloading diario PDFs and archiving to Internet Archive.
- `DiarioAnalyzer`: For extracting content from diarios.

Implementations of these interfaces for specific tribunals (e.g., `TJRODiscovery`) will typically consume or produce `Diario` objects, ensuring a consistent workflow.
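
As a rough illustration of this pattern, the sketch below defines an abstract discovery interface and one concrete implementation. The class name `DiarioDiscoverySketch`, the `tribunal_code` property, the `get_diario_url` method, and the `tribunal.example` URL are all hypothetical; the actual signatures in `src/models/interfaces.py` may look different.

```python
from abc import ABC, abstractmethod
from datetime import date
from typing import Optional


class DiarioDiscoverySketch(ABC):
    """Hypothetical discovery interface; method names are assumptions."""

    @property
    @abstractmethod
    def tribunal_code(self) -> str:
        """Short identifier of the tribunal this implementation covers."""

    @abstractmethod
    def get_diario_url(self, target_date: date) -> Optional[str]:
        """Return the diario URL for target_date, or None if none exists."""


class ExampleDiscovery(DiarioDiscoverySketch):
    """Toy implementation that builds a URL on a placeholder domain."""

    @property
    def tribunal_code(self) -> str:
        return "example"

    def get_diario_url(self, target_date: date) -> Optional[str]:
        # Placeholder URL pattern, not a real tribunal endpoint.
        return f"https://tribunal.example/diario/{target_date:%Y%m%d}.pdf"


discovery = ExampleDiscovery()
print(discovery.tribunal_code)
print(discovery.get_diario_url(date(2024, 7, 15)))
```

Because the base class declares abstract members, Python refuses to instantiate it directly, so a tribunal implementation that forgets a required method fails when it is constructed rather than later in the pipeline.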

## 6. CLI Integration

The plan in `diario-class.md` includes integrating this dataclass into the existing CLI (`src/cli.py`).
A flag `--as-diario` will be added to commands like `get-urls` to enable the new workflow using the `Diario` dataclass.

**Example (conceptual, based on the `diario-class.md` plan):**
```bash
# This command would use the new Diario-based workflow
causaganha get-urls --date 2024-07-15 --tribunal tjro --as-diario --to-queue
```
This command would leverage a `DiarioDiscovery` implementation for 'tjro' to find the URL for the given date, create a `Diario` object, and then add its `queue_item` representation to the job queue.
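
The flag handling for such a command could be prototyped as below. This is a sketch using the standard-library `argparse` module. It does not reflect how `src/cli.py` is actually built (it may use a different CLI framework), and the `build_parser` helper and the option defaults are assumptions.

```python
import argparse
from datetime import date


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical parser mirroring the planned get-urls options."""
    parser = argparse.ArgumentParser(prog="causaganha")
    subcommands = parser.add_subparsers(dest="command", required=True)

    get_urls = subcommands.add_parser("get-urls")
    # Parse the date eagerly so invalid input fails at the command line.
    get_urls.add_argument("--date", type=date.fromisoformat, required=True)
    get_urls.add_argument("--tribunal", required=True)
    # Boolean switches: absent means False, present means True.
    get_urls.add_argument("--as-diario", action="store_true")
    get_urls.add_argument("--to-queue", action="store_true")
    return parser


args = build_parser().parse_args(
    ["get-urls", "--date", "2024-07-15", "--tribunal", "tjro",
     "--as-diario", "--to-queue"]
)
print(args.command, args.date, args.tribunal, args.as_diario, args.to_queue)
```

Note that `argparse` converts the dashes in `--as-diario` to an underscore, so the value is read as `args.as_diario`.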

## Conclusion

The `Diario` dataclass provides a standardized way to handle judicial diary information, facilitating easier integration of multiple tribunals and more robust data processing pipelines. As the CausaGanha project evolves, this dataclass will be central to managing diario-related operations.