## PDF Page Rotation Angle Detection Task

Objective:
Implement the `determine_rotation_angle` function within the given code structure to detect the rotation angle of each page in a PDF file.

Code Structure:
The main function `rotate_all_pages_upright` is already implemented, but if necessary you are allowed to change its implementation. Your task is to complete the `determine_rotation_angle` function.

Input:
- A PDF file path (the function should be able to handle various PDF files)

Output:
- A list of integers, where each integer represents the rotation angle needed for a page in the PDF

Rotation Angle:
- The rotation angle should be in degrees, normalized to the range [0, 359].
- 0 means the page is already upright
- 90 means the page needs to be rotated 90 degrees clockwise to be upright
- Other values follow the same clockwise convention (e.g., 180 means the page is upside down)
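
The normalization to [0, 359] can be sketched as follows. The helper name `normalize_angle` is hypothetical and not part of the given code structure; it only illustrates that Python's modulo operator maps any integer angle, including negative ones, into the required range.

```python
def normalize_angle(angle: int) -> int:
    """Map any integer angle in degrees to the range [0, 359]."""
    # Python's % returns a non-negative result when the divisor is positive
    return angle % 360


print(normalize_angle(360))  # 0
print(normalize_angle(-90))  # 270
print(normalize_angle(450))  # 90
```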

Task:
1. Implement the `determine_rotation_angle` function:
   - Input: A single page object (PdfReader.PageObject)
   - Output: An integer representing the rotation angle in degrees

2. The function should analyze the content of the page and determine the angle needed to make the page upright.

Requirements:
1. The function should work with different PDF files, not just a specific one.
2. Implement robust methods to determine the correct rotation angle.
3. Handle potential exceptions or edge cases (e.g., pages with mixed orientations, complex layouts).
4. Optimize for both accuracy and processing speed, as the function will be called for each page in the PDF.
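
One way to read requirement 3 is that a failure on a single page should not abort the whole document. The sketch below assumes a hypothetical wrapper `collect_rotation_angles` that takes the detector as a parameter, so it runs without any PDF library; `fake_detector` is a stand-in for `determine_rotation_angle` and exists only for illustration.

```python
from typing import Callable, Iterable, List


def collect_rotation_angles(pages: Iterable, detect: Callable[[object], int]) -> List[int]:
    """Run the detector on every page, falling back to 0 when a page fails."""
    angles = []
    for index, page in enumerate(pages):
        try:
            angles.append(int(detect(page)))
        except Exception as exc:
            # Assumption: an undetectable page is treated as already upright
            print(f"Page {index}: detection failed ({exc}); assuming 0")
            angles.append(0)
    return angles


def fake_detector(page):
    """Stand-in detector: strings play the role of page objects."""
    if page == "broken":
        raise ValueError("cannot render page")
    return {"upright": 0, "sideways": 90}[page]


result = collect_rotation_angles(["upright", "sideways", "broken"], fake_detector)
print(result)  # [0, 90, 0]
```

Passing the detector in as an argument keeps the fallback policy in one place and makes it easy to test without a trained model.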

Additional Considerations:
- You are allowed to use up to 40GB of GPU VRAM if necessary for your implementation.
- You may create as many additional functions as needed to support your implementation.
- You may use additional libraries if required, but ensure they are imported properly.
- Provide clear comments in your code to explain your rotation detection logic.

Testing:
- Test your implementation with various types of PDFs to ensure its robustness and generalizability.
- The main script provides a way to test your implementation on a file named "grouped_documents.pdf".

Note:
The task involves determining the rotation angle only. The actual rotation of the pages is not required in this implementation.

In [None]:
### Deep-learning approach. The logic behind it:

# 1. Train a model for orientation detection:
# - Training data is needed for a deep learning model that classifies the orientation, e.g., exemplary scans available at IDM or publicly available data sets labeled by orientation
# - Build on a pre-trained model such as ResNet18 with a modified output layer, because this is a multiclass classification problem (4 categories: the page may be upright, turned right, upside down, or turned left)
# - Define loss and optimizer
# - Train the model, i.e., initialize, forward pass, backward pass, and parameter update until convergence or the maximum number of epochs is reached
# - Save the model
#
# 2. Use the trained model to classify the orientation of the PDF input
# - PDF page as input
# - Render the page as an image and feed it to the trained model, which maps it to one of 4 categories corresponding to the angles


import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, models, transforms
from torch.utils.data import DataLoader
import os

# Define data transforms for training and validation
data_transforms = {
    'train': transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.RandomRotation(30),
        # No RandomHorizontalFlip here: mirroring a sideways page makes 90° look like 270° and corrupts the orientation labels
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

# Define dataset paths
data_dir = 'path_to_dataset'  # placeholder: labeled data may already exist at IDM; otherwise, look for publicly available data sets labeled by orientation
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x])
                  for x in ['train', 'val']}

# Data loaders
dataloaders = {x: DataLoader(image_datasets[x], batch_size=32, shuffle=True, num_workers=4)
               for x in ['train', 'val']}

# Use a pre-trained ResNet model and modify the final layer
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)  # requires torchvision >= 0.13
num_ftrs = model.fc.in_features
model.fc = nn.Linear(num_ftrs, 4)  # 4 classes: 0°, 90°, 180°, 270°

# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

# Training loop
def train_model(model, criterion, optimizer, num_epochs=25):
    best_model_wts = {k: v.detach().clone() for k, v in model.state_dict().items()}
    best_acc = 0.0

    for epoch in range(num_epochs):
        print(f'Epoch {epoch}/{num_epochs - 1}')
        print('-' * 10)

        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            for inputs, labels in dataloaders[phase]:
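                # Move the batch to whichever device the model lives on
                device = next(model.parameters()).device
                inputs = inputs.to(device)
                labels = labels.to(device)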

                # Zero the parameter gradients
                optimizer.zero_grad()

                # Forward
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)

                    # Backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # Statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels)

            epoch_loss = running_loss / len(image_datasets[phase])
            epoch_acc = running_corrects.double() / len(image_datasets[phase])

            print(f'{phase} Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f}')

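            # Keep a copy of the best weights (state_dict() alone returns references that later updates overwrite)
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = {k: v.detach().clone() for k, v in model.state_dict().items()}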

    print(f'Best val Acc: {best_acc:.4f}')
    model.load_state_dict(best_model_wts)
    return model

# Move model to GPU if available, otherwise stay on CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)

# Train the model
model = train_model(model, criterion, optimizer, num_epochs=10)

# Save the trained model
torch.save(model.state_dict(), 'orientation_model.pth')
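
One detail worth checking before predictions are mapped to angles: `datasets.ImageFolder` assigns class indices by sorting the class folder names as strings, not numerically. The sketch below assumes the folders are named after the angles (`0`, `90`, `180`, `270`); the training code above does not specify the names, and `build_angle_map` is a hypothetical helper.

```python
def build_angle_map(class_names):
    """Map ImageFolder-style class indices to rotation angles in degrees."""
    # ImageFolder sorts folder names lexicographically, so "180" comes before "90"
    ordered = sorted(class_names)
    return {index: int(name) % 360 for index, name in enumerate(ordered)}


angle_map = build_angle_map(["0", "90", "180", "270"])
print(angle_map)  # {0: 0, 1: 180, 2: 270, 3: 90}
```

With a real dataset, the sorted names are available as `image_datasets['train'].classes`, so the map can be built from that list instead of being hard-coded.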


In [None]:
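# Classify the PDF pages using the trained model

import io
from functools import lru_cache

import fitz  # PyMuPDF; assumption: used here only to rasterize the page
import torch
import torch.nn as nn
from PIL import Image
from pypdf import PdfWriter  # assumption: pages come from pypdf's PdfReader
from torchvision import models, transforms

# Same preprocessing as the validation transform used during training
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

def render_pdf_page_to_image(page, dpi=72):
    """Rasterizes a single page object to an RGB PIL image (sketch using pypdf + PyMuPDF)."""
    buffer = io.BytesIO()
    writer = PdfWriter()
    writer.add_page(page)
    writer.write(buffer)  # one-page PDF held in memory
    doc = fitz.open(stream=buffer.getvalue(), filetype="pdf")
    pix = doc[0].get_pixmap(dpi=dpi)
    image = Image.frombytes("RGB", (pix.width, pix.height), pix.samples)
    doc.close()
    return image

@lru_cache(maxsize=1)
def load_orientation_model(weights_path="orientation_model.pth"):
    """Rebuilds the ResNet18 classifier and loads the trained weights once, not once per page."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = models.resnet18(weights=None)
    model.fc = nn.Linear(model.fc.in_features, 4)  # same head as in training
    model.load_state_dict(torch.load(weights_path, map_location=device))
    return model.to(device).eval(), device

def determine_rotation_angle(page):
    """
    Determines the rotation angle of a given PDF page object using the trained deep learning model.
    
    Args:
        page (PdfReader.PageObject): A single page object from the PDF.
    
    Returns:
        int: The rotation angle in degrees needed to make the page upright (0, 90, 180, 270).
    """
    try:
        # Render the page to a PIL image and prepare it as model input
        page_image = render_pdf_page_to_image(page)
        input_image = transform(page_image).unsqueeze(0)  # Add batch dimension

        # Fetch the cached model and the device it lives on
        model, device = load_orientation_model()

        # Predict the orientation using the deep learning model
        with torch.no_grad():
            outputs = model(input_image.to(device))
            predicted = torch.argmax(outputs, dim=1)

        # Map the prediction to the corresponding rotation angle.
        # Assumption: class index i means "rotate i * 90 degrees clockwise to be upright";
        # with ImageFolder the indices follow the sorted class folder names, so verify this mapping.
        angle_map = {0: 0, 1: 90, 2: 180, 3: 270}
        return angle_map[predicted.item()]
    except Exception as e:
        print(f"Error processing page: {e}")
        return 0  # Default to no rotation if an error occurs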
