# AlphaFold Job Dialect Converter

This notebook converts AlphaFold jobs from the `alphafoldserver` dialect to the `alphafold3` dialect. It reads the jobs from an input directory, converts each one with the `rewrite_af_job` function, and saves the converted jobs to an output directory.

## 1. Import Required Libraries

Import the standard-library modules (`sys`, `os`, `json`) and the custom functions from the `functions_job_creation.py` module.

In [1]:
import sys
import os
import json
from typing import List, Dict, Any

# Add the path to the functions_job_creation module
sys.path.append('/home/markus/MPI_local/src/production1/')

# Import custom functions for AlphaFold job processing
from functions_job_creation import (
    collect_created_jobs,
    rewrite_af_job,
    write_af_jobs_to_individual_files
)

print("Libraries imported successfully!")

Libraries imported successfully!


## 2. Set Input and Output Directory Paths

Define the input directory containing alphafoldserver dialect jobs and the output directory for alphafold3 dialect jobs.

In [2]:
# Define input and output directories
input_dir = "/home/markus/MPI_local/production1/AF_job_batches/batch_43"  # Directory with alphafoldserver dialect jobs
output_dir = "/home/markus/MPI_local/production1/AF_job_batches/batch_43_new"      # Directory for converted alphafold3 dialect jobs

print(f"Input directory: {input_dir}")
print(f"Output directory: {output_dir}")

# Check if input directory exists
if not os.path.exists(input_dir):
    print(f"WARNING: Input directory '{input_dir}' does not exist!")
    print("Please update the input_dir variable with the correct path.")
else:
    print(f"Input directory exists with {len([f for f in os.listdir(input_dir) if f.endswith('.json')])} JSON files.")

Input directory: /home/markus/MPI_local/production1/AF_job_batches/batch_43
Output directory: /home/markus/MPI_local/production1/AF_job_batches/batch_43_new
Input directory exists with 100 JSON files.


## 3. Read AlphaFold Jobs from Input Directory

Use the `collect_created_jobs` function to read all .json files from the input directory and load the AlphaFold jobs.

In [3]:
# Read all AlphaFold jobs from the input directory
try:
    alphafoldserver_jobs = collect_created_jobs(input_dir)
    print(f"Successfully loaded {len(alphafoldserver_jobs)} jobs from {input_dir}")
    
    # Display sample job structure if jobs exist
    if alphafoldserver_jobs:
        print("\nSample job structure (first job):")
        print(f"Job name: {alphafoldserver_jobs[0]['name']}")
        print(f"Dialect: {alphafoldserver_jobs[0]['dialect']}")
        print(f"Number of sequences: {len(alphafoldserver_jobs[0]['sequences'])}")
        
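        # Show first sequence info; skip it if the first entry is not a protein chain,
        # so a missing key does not land in the except block and discard the loaded jobs
        first_seq = alphafoldserver_jobs[0]['sequences'][0].get('proteinChain')
        if first_seq is not None:
            print(f"First sequence length: {len(first_seq['sequence'])}")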
    else:
        print("No jobs found in the input directory.")
        
except Exception as e:
    print(f"Error reading jobs from input directory: {e}")
    alphafoldserver_jobs = []

Successfully loaded 100 jobs from /home/markus/MPI_local/production1/AF_job_batches/batch_43

Sample job structure (first job):
Job name: O14980_1-1071_O15119_1-743
Dialect: alphafoldserver
Number of sequences: 2
First sequence length: 1071
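
The implementation of `collect_created_jobs` is not shown in this notebook. As a rough illustration of what such a loader could look like, here is a minimal sketch. The name `load_jobs_from_dir` is hypothetical, and the sketch assumes that each `.json` file holds either a single job (a dict) or a list of jobs; the real function may behave differently.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def load_jobs_from_dir(directory: str) -> List[Dict[str, Any]]:
    """Hypothetical stand-in for collect_created_jobs: load every *.json file."""
    jobs: List[Dict[str, Any]] = []
    # Sort the paths so the job order is reproducible across runs
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            content = json.load(handle)
        # Assumption: a file holds either one job (dict) or a list of jobs
        if isinstance(content, list):
            jobs.extend(content)
        else:
            jobs.append(content)
    return jobs
```

Sorting the file names is a design choice of the sketch: it keeps the "first job" shown in the sample output stable from run to run.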


## 4. Convert Jobs from alphafoldserver to alphafold3 Dialect

Iterate through each job and use the `rewrite_af_job` function to convert from alphafoldserver dialect to alphafold3 dialect.

In [4]:
# Convert jobs from alphafoldserver to alphafold3 dialect
alphafold3_jobs = []
conversion_errors = []

if alphafoldserver_jobs:
    print("Converting jobs from alphafoldserver to alphafold3 dialect...")
    
    for i, job in enumerate(alphafoldserver_jobs):
        try:
            # Convert the job using the rewrite_af_job function
            converted_job = rewrite_af_job(job)
            alphafold3_jobs.append(converted_job)
            
            # Print progress every 10 jobs
            if (i + 1) % 10 == 0:
                print(f"Converted {i + 1}/{len(alphafoldserver_jobs)} jobs...")
                
        except Exception as e:
            error_msg = f"Error converting job '{job.get('name', 'unknown')}': {e}"
            print(error_msg)
            conversion_errors.append(error_msg)
    
    print("\nConversion completed!")
    print(f"Successfully converted: {len(alphafold3_jobs)} jobs")
    print(f"Conversion errors: {len(conversion_errors)} jobs")
    
    if conversion_errors:
        print("\nConversion errors:")
        for error in conversion_errors:
            print(f"  - {error}")
else:
    print("No jobs to convert.")

Converting jobs from alphafoldserver to alphafold3 dialect...
Converted 10/100 jobs...
Converted 20/100 jobs...
Converted 30/100 jobs...
Converted 40/100 jobs...
Converted 50/100 jobs...
Converted 60/100 jobs...
Converted 70/100 jobs...
Converted 80/100 jobs...
Converted 90/100 jobs...
Converted 100/100 jobs...

Conversion completed!
Successfully converted: 100 jobs
Conversion errors: 0 jobs
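
The body of `rewrite_af_job` is not part of this notebook, so the following is only a sketch of the kind of transformation involved. It relies on the field names printed in section 6 (`proteinChain` with `sequence` and `count` becomes `protein` with `id`, `sequence`, and `modifications`). The function name `sketch_convert_job` is hypothetical. The sketch assumes that `count` expands into one chain ID per copy, that only protein chains occur, that modifications are empty, and that all other top-level keys can be copied unchanged.

```python
import string
from typing import Any, Dict, List


def sketch_convert_job(job: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical sketch of an alphafoldserver -> alphafold3 conversion."""
    sequences: List[Dict[str, Any]] = []
    next_id = 0  # index of the next unused chain letter
    for entry in job["sequences"]:
        chain = entry["proteinChain"]  # assumption: only protein chains occur
        count = chain.get("count", 1)
        if next_id + count > len(string.ascii_uppercase):
            raise ValueError("This sketch supports at most 26 chains per job")
        ids = list(string.ascii_uppercase[next_id:next_id + count])
        next_id += count
        sequences.append({
            "protein": {
                # A single copy gets one ID; several copies get a list of IDs
                "id": ids[0] if count == 1 else ids,
                "sequence": chain["sequence"],
                "modifications": [],  # assumption: no modifications to carry over
            }
        })
    # Copy the remaining top-level keys into a new dict, leaving the input untouched
    converted = {key: value for key, value in job.items() if key != "sequences"}
    converted["sequences"] = sequences
    converted["dialect"] = "alphafold3"
    converted.setdefault("version", 1)
    return converted
```

Returning a new dict instead of modifying `job` in place keeps the original `alphafoldserver` jobs available for the before/after comparison later in the notebook.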


## 5. Write Converted Jobs to Output Directory

Use the `write_af_jobs_to_individual_files` function to save the converted jobs as individual JSON files in the output directory.

In [5]:
# Write converted jobs to output directory
if alphafold3_jobs:
    try:
        print(f"Writing {len(alphafold3_jobs)} converted jobs to {output_dir}...")
        
        # Use the write_af_jobs_to_individual_files function to save jobs
        write_af_jobs_to_individual_files(alphafold3_jobs, output_dir)
        
        print("Successfully wrote all converted jobs to output directory!")
        
        # Verify files were created
        if os.path.exists(output_dir):
            output_files = [f for f in os.listdir(output_dir) if f.endswith('.json')]
            print(f"Created {len(output_files)} JSON files in {output_dir}")
        
    except Exception as e:
        print(f"Error writing jobs to output directory: {e}")
        print("This might happen if the output directory already exists.")
        print("Please either delete the existing directory or choose a different output path.")
else:
    print("No converted jobs to write.")

Writing 100 converted jobs to /home/markus/MPI_local/production1/AF_job_batches/batch_43_new...
Successfully wrote all converted jobs to output directory!
Created 100 JSON files in /home/markus/MPI_local/production1/AF_job_batches/batch_43_new
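
The error hint in the cell above suggests that writing may fail when the output directory already exists. A minimal sketch of a writer with that behavior is shown below. The name `write_jobs_sketch` is hypothetical, and both the refusal to reuse an existing directory and the `<job name>.json` file naming are assumptions rather than documented behavior of `write_af_jobs_to_individual_files`.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def write_jobs_sketch(jobs: List[Dict[str, Any]], output_dir: str) -> List[Path]:
    """Hypothetical writer: one JSON file per job in a freshly created directory."""
    target = Path(output_dir)
    # exist_ok=False raises FileExistsError instead of mixing into an old batch
    target.mkdir(parents=True, exist_ok=False)
    written: List[Path] = []
    for job in jobs:
        # Assumption: the job name is unique and usable as a file name
        path = target / f"{job['name']}.json"
        with path.open("w", encoding="utf-8") as handle:
            json.dump(job, handle, indent=2)
        written.append(path)
    return written
```

Failing early on an existing directory avoids silently combining files from two different conversion runs in one batch folder.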


## 6. Verify Conversion Results

Display summary statistics for the conversion and inspect the structure of one converted job to verify that the dialect conversion succeeded.

In [6]:
# Verify conversion results
print("=== CONVERSION SUMMARY ===")
print(f"Original jobs (alphafoldserver): {len(alphafoldserver_jobs)}")
print(f"Converted jobs (alphafold3): {len(alphafold3_jobs)}")
print(f"Conversion errors: {len(conversion_errors)}")
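# Compute the rate separately so the "Success rate:" label is printed even when no jobs were loaded
success_rate = f"{len(alphafold3_jobs) / len(alphafoldserver_jobs) * 100:.1f}%" if alphafoldserver_jobs else "N/A"
print(f"Success rate: {success_rate}")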

# Display sample converted job structure
if alphafold3_jobs:
    print("\n=== SAMPLE CONVERTED JOB STRUCTURE ===")
    sample_job = alphafold3_jobs[0]
    print(f"Job name: {sample_job['name']}")
    print(f"Dialect: {sample_job['dialect']}")
    print(f"Version: {sample_job['version']}")
    print(f"Number of sequences: {len(sample_job['sequences'])}")
    
    print("\nSequence structure comparison:")
    print("BEFORE (alphafoldserver):")
    if alphafoldserver_jobs:
        orig_seq = alphafoldserver_jobs[0]['sequences'][0]
        print("  - Key: 'proteinChain'")
        print(f"  - Fields: {list(orig_seq['proteinChain'].keys())}")
    
    print("AFTER (alphafold3):")
    conv_seq = sample_job['sequences'][0]
    print("  - Key: 'protein'")
    print(f"  - Fields: {list(conv_seq['protein'].keys())}")
    
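    print(f"\nFirst sequence length: {len(sample_job['sequences'][0]['protein']['sequence'])}")
    # A job may contain a single sequence, so guard the second lookup
    if len(sample_job['sequences']) > 1:
        print(f"Second sequence length: {len(sample_job['sequences'][1]['protein']['sequence'])}")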

print("\n=== CONVERSION COMPLETE ===")
if alphafold3_jobs and os.path.exists(output_dir):
    print(f"✅ Successfully converted {len(alphafold3_jobs)} jobs to alphafold3 dialect")
    print(f"✅ Output files saved to: {output_dir}")
else:
    print("❌ Conversion failed or no jobs to convert")

=== CONVERSION SUMMARY ===
Original jobs (alphafoldserver): 100
Converted jobs (alphafold3): 100
Conversion errors: 0
Success rate: 100.0%

=== SAMPLE CONVERTED JOB STRUCTURE ===
Job name: O14980_1-1071_O15119_1-743
Dialect: alphafold3
Version: 1
Number of sequences: 2

Sequence structure comparison:
BEFORE (alphafoldserver):
  - Key: 'proteinChain'
  - Fields: ['sequence', 'count']
AFTER (alphafold3):
  - Key: 'protein'
  - Fields: ['id', 'sequence', 'modifications']

First sequence length: 1071
Second sequence length: 743

=== CONVERSION COMPLETE ===
✅ Successfully converted 100 jobs to alphafold3 dialect
✅ Output files saved to: /home/markus/MPI_local/production1/AF_job_batches/batch_43_new
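
The verification above inspects only the first converted job. A check over the whole batch could look like the following sketch. The helper name `find_unconverted_jobs` is hypothetical, and the two criteria it applies (the `dialect` value and the absence of `proteinChain` keys) are assumptions drawn from the sample output, not a full validation of the alphafold3 format.

```python
from typing import Any, Dict, List


def find_unconverted_jobs(jobs: List[Dict[str, Any]]) -> List[str]:
    """Return the names of jobs that do not look like alphafold3-dialect jobs."""
    problems: List[str] = []
    for job in jobs:
        has_new_dialect = job.get("dialect") == "alphafold3"
        # No sequence entry should still use the alphafoldserver key
        has_old_keys = any("proteinChain" in seq for seq in job.get("sequences", []))
        if not has_new_dialect or has_old_keys:
            problems.append(job.get("name", "unknown"))
    return problems
```

Calling `find_unconverted_jobs(alphafold3_jobs)` would be expected to return an empty list for a batch in which every job was converted.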
