# BEAK Remote Functions - Simple Test Notebook

This notebook demonstrates the core remote functions in the BEAK package for protein evolutionary analysis.

## Core Functions Overview:
1. **`search()`** - Find homologous protein sequences using MMseqs2
2. **`align()`** - Multiple sequence alignment using Clustal Omega
3. **`compute_tree()`** - Phylogenetic tree inference using IQ-TREE 2 *(NEW)*
4. **`status()`** - Monitor job progress and completion

## Quick Start:
Run cells in order for a complete analysis workflow, or jump to specific functions as needed.

## Setup and Authentication

In [1]:
# Import BEAK remote functions
from beak.remote import authenticate, search, align, compute_tree, status, retrieve_results

print("✅ BEAK remote functions imported successfully!")

✅ BEAK remote functions imported successfully!




In [2]:
# Authenticate with the remote server (run once per session)
# Replace "mbolivas" with your own username
authenticate("mbolivas")

print("🔑 Authentication complete!")

🔑 Authentication complete!


## 1. Protein Sequence Search

Search the UniRef90 database for homologous protein sequences using MMseqs2.

In [19]:
# Example protein sequence (replace with your sequence of interest)
protein_sequence = "MSTAQSLKSVDYEVFGRVQGVCFRMYTEDEARKIGVVGWVKNTSKGTVTGQVQGPEDKVNSMKSWLSKVGSPSSRIDRTNFSNEKTISKLEYSNFSIRY"

# Start asynchronous search
print("🔍 Starting protein sequence search...")
search_job = search(
    query=protein_sequence,
    db="uniref90",
    sensitivity=3.0,
    job_id="AcyP_UniRef90",
    user_id="hAcyP2_query",  # Custom identifier for easy tracking
    verbose=True
)

🔍 Starting protein sequence search...
Connecting to shr-zion.stanford.edu...
🔑 Using stored credentials for: mbolivas
Connection successful

🔧 Using sensitivity: 3.0
🔍 Checking for existing identical searches...
🔍 Checking available databases...
🔍 Scanning remote server for available databases...
   Raw ls result: '(base) mbolivas@zion:~$ ls -1 ~/beak_tmp/ 2>/dev/null | grep '^beak_' || echo 'no_jobs' && echo 'BEAK_END_1754429214_9770' || echo 'BEAK_END_1754...'
   Cleaned ls result: 'beak_acyp_uniref90...'
   Found 1 files/directories in database directory
🗄️  Found 0 available databases:
⚠️  No databases detected, using fallback list
🔍 Starting mmseqs sequence search with database: uniref90
📁 Creating project directory: beak_acyp_uniref90
📂 Search operation directory: beak_tmp/beak_acyp_uniref90/search
📝 Creating query FASTA file...
   ✍️  Writing sequence to remote FASTA file...
   Write result:
BEAK_END_1754429214_7192
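
Because the search runs remotely, it can help to catch a malformed query locally before submitting it. The sketch below is not part of BEAK: `validate_protein_sequence` is a hypothetical helper, and the accepted alphabet (the 20 standard residues plus the ambiguity codes B, J, O, U, X, Z) is an assumption about what the remote search tolerates.

```python
# Hypothetical pre-flight check; not part of the BEAK API.
# Assumption: the server accepts the 20 standard residues plus B, J, O, U, X, Z.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY" "BJOUXZ")


def validate_protein_sequence(seq: str) -> str:
    """Return the sequence upper-cased with whitespace removed, or raise ValueError."""
    cleaned = "".join(seq.split()).upper()
    if not cleaned:
        raise ValueError("sequence is empty")
    bad = sorted(set(cleaned) - VALID_RESIDUES)
    if bad:
        raise ValueError(f"unexpected characters in sequence: {''.join(bad)}")
    return cleaned


# Whitespace and lower-case letters are normalized before the check.
query = validate_protein_sequence("msta qslk\nSVDYEVFGRV")
print(len(query), query)
```

The cleaned string could then be passed as `query=` to `search()` in place of the raw input.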

## 2. Job Status Monitoring

Check the progress of your search job.

In [21]:
# Check status of specific job
job_id = 'beak_acyp_uniref90'  # Project name the search log reported for job_id "AcyP_UniRef90"

job_status = status(job_id, verbose=True)

Connecting to shr-zion.stanford.edu...
🔑 Using stored credentials for: mbolivas
Connection successful

🔍 Scanning and updating job manifest...
📋 Project: beak_acyp_uniref90
🔄 Search: In progress
   Recent: BEAK_END_1754429520_6724


In [14]:
all_jobs = status()

Connecting to shr-zion.stanford.edu...
🔑 Using stored credentials for: mbolivas
Connection successful

📋 Found 5 project(s) (sorted by recency)

📋 Project: beak_acyp_uniref90
🔄 Search: In progress

📋 Project: beak_acyp_uniref50
🔄 Search: In progress

📋 Project: beak_acyp_alignment_swissprot
✅ Search: Complete

📋 Project: beak_acyp_alignment_2
✅ Search: Complete

📋 Project: beak_fluiem_zeg
🔄 Search: In progress
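
Rather than re-running the status cell by hand, a small polling loop can wait for a job to finish. This is a minimal sketch, not BEAK functionality: `wait_for_job` is a hypothetical helper, and it assumes the status dictionary exposes an `is_complete` key, as the single-project return value printed later in this notebook suggests. How a failed job is reported is not shown in this notebook, so the sketch only distinguishes complete from not complete. A stand-in status function is used so the example runs without a server.

```python
import time


def wait_for_job(status_fn, job_id, poll_seconds=30.0, timeout_seconds=3600.0,
                 sleep=time.sleep):
    """Poll status_fn(job_id) until it reports completion or the timeout elapses."""
    waited = 0.0
    while True:
        info = status_fn(job_id)
        if info.get("is_complete"):
            return info
        if waited >= timeout_seconds:
            raise TimeoutError(f"{job_id} still not complete after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds


# Stand-in for beak.remote.status so the sketch runs offline:
# it reports "running" twice, then "completed".
_replies = iter([
    {"status": "running", "is_running": True, "is_complete": False},
    {"status": "running", "is_running": True, "is_complete": False},
    {"status": "completed", "is_running": False, "is_complete": True},
])


def fake_status(job_id):
    return {"job_id": job_id, **next(_replies)}


final = wait_for_job(fake_status, "beak_acyp_uniref90", poll_seconds=0.01)
print(final["status"])
```

With the real function, the call would look like `wait_for_job(status, job_id)`; a longer `poll_seconds` keeps the number of SSH connections down.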



## 3. Retrieve Search Results

Download the search results once the job is complete.

In [20]:
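# job_id holds whatever the most recently executed cell assigned; set it explicitly before retrieving
results = retrieve_results(job_id)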

Connecting to shr-zion.stanford.edu...
🔑 Using stored credentials for: mbolivas
Connection successful

❌ No valid job configuration found for 'beak_acyp_alignment_swissprot'
   Checked: ~/beak_tmp/beak_acyp_alignment_swissprot/{search,align,tree,taxonomy}/config.json
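
The failed retrieval above targeted `beak_acyp_alignment_swissprot`, not the search project, because `job_id` had been reassigned by another cell. One way to avoid this is to pass a fixed name and retrieve only when the job reports completion. The sketch below is an illustration under assumptions: `retrieve_if_complete` is a hypothetical wrapper, the `is_complete` key is taken from the status output printed elsewhere in this notebook, and the stand-in functions (including the `files` key) exist only so the example runs offline.

```python
def retrieve_if_complete(status_fn, retrieve_fn, job_id):
    """Call retrieve_fn(job_id) only when status_fn reports the job as complete."""
    info = status_fn(job_id)
    if not info.get("is_complete"):
        print(f"{job_id}: status is {info.get('status')!r}; nothing retrieved")
        return None
    return retrieve_fn(job_id)


# Offline stand-ins; swap in beak.remote.status and retrieve_results.
def demo_status(job_id):
    done = job_id == "beak_acyp_alignment_2"
    return {"job_id": job_id,
            "status": "completed" if done else "running",
            "is_complete": done}


def demo_retrieve(job_id):
    # The 'files' key is made up for this demo.
    return {"job_id": job_id, "files": []}


SEARCH_JOB = "beak_acyp_uniref90"  # fixed name instead of the shared job_id variable
pending = retrieve_if_complete(demo_status, demo_retrieve, SEARCH_JOB)
ready = retrieve_if_complete(demo_status, demo_retrieve, "beak_acyp_alignment_2")
print(pending, ready)
```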


## 4. Multiple Sequence Alignment

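Align the sequences from the search results using Clustal Omega. The first cells in this section exercise the updated `status()` output; the `align()` call follows them.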

In [18]:
# TEST: New status() output format - shows all operations per project
print("🧪 TESTING: New status() output format")
print("="*50)

job_id = 'beak_acyp_alignment_swissprot'

print(f"📋 Checking status of project: {job_id}")
print("💡 The new format shows all operations under a single project header")
print()

try:
    job_status = status(job_id, verbose=True)
    
    print(f"\n📊 Return Value Summary:")
    print(f"   🆔 Job ID: {job_status['job_id']}")
    print(f"   📋 Overall Status: {job_status['status']}")
    print(f"   🔄 Is Running: {job_status['is_running']}")
    print(f"   ✅ Is Complete: {job_status['is_complete']}")
    
    print(f"\n🔍 Operations Overview:")
    for operation, op_status in job_status['operations'].items():
        print(f"   {operation}: {op_status}")
    
    print(f"\n💡 Benefits of new format:")
    print(f"   ✅ All operations for a project shown together")
    print(f"   ✅ Clear project-level organization")
    print(f"   ✅ Easy to see workflow progress")
    print(f"   ✅ Consistent formatting across single/multiple projects")
    
except Exception as e:
    print(f"❌ Error checking status: {e}")
    import traceback
    traceback.print_exc()

🧪 TESTING: New status() output format
📋 Checking status of project: beak_acyp_alignment_swissprot
💡 The new format shows all operations under a single project header

Connecting to shr-zion.stanford.edu...
🔑 Using stored credentials for: mbolivas
Connection successful

🔍 Scanning and updating job manifest...
📋 Project: beak_acyp_alignment_swissprot
✅ Search: Complete

📊 Return Value Summary:
   🆔 Job ID: beak_acyp_alignment_swissprot
   📋 Overall Status: completed
   🔄 Is Running: False
   ✅ Is Complete: True

🔍 Operations Overview:
   search: completed

💡 Benefits of new format:
   ✅ All operations for a project shown together
   ✅ Clear project-level organization
   ✅ Easy to see workflow progress
   ✅ Consistent formatting across single/multiple projects


In [None]:
# TEST: New status() output for ALL projects
print("🧪 TESTING: New status() output for all projects")
print("="*50)

print("📋 Checking status of all projects...")
print("💡 Shows all projects with their operations grouped together")
print()

try:
    all_status = status()  # No job_id = show all
    
    print(f"\n📊 Summary:")
    if 'jobs' in all_status:
        print(f"   Found {len(all_status['jobs'])} projects")
        print(f"   Each project shows all its operations")
        print(f"   Operations are displayed in logical order: Search → Align → Tree → Taxonomy")
    else:
        print("   No projects found")
    
except Exception as e:
    print(f"❌ Error checking all status: {e}")
    import traceback
    traceback.print_exc()

In [None]:
job_status = status(job_id, verbose=True)

In [None]:
# TEST: Updated debug test for new asynchronous behavior

from beak.remote import align

job_id = 'beak_acyp_alignment_swissprot_debug'  # Use a different job ID for debug test

print(f"🔧 DEBUG TEST: Testing alignment with new asynchronous behavior")
print(f"🆔 Job ID: {job_id}")
print("📋 This should start the job and return immediately...")

try:
    alignment_result = align(
        input_fasta='beak_acyp_alignment_swissprot',  # Use existing search results
        job_id=job_id,       # Use unique job_id for this test
        verbose=True,
        debug=True           # Enable debug mode to see raw command outputs
    )
    
    if alignment_result:
        print("\n✅ Alignment job started successfully!")
        print(f"🆔 Job ID: {alignment_result['job_id']}")
        print(f"📊 Status: {alignment_result['status']}")
        print(f"🔢 Process ID: {alignment_result['process_id']}")
        print(f"📁 Remote directory: {alignment_result['align_dir']}")
        print(f"📋 Expected output: {alignment_result['expected_output']}")
        print(f"🔍 Log file: {alignment_result['log_file']}")
        print(f"\n💡 Monitor with: status('{job_id}')")
        
        # Immediately check status (status is already imported in the setup cell)
        print("\n🔍 Checking job status immediately after starting...")
        immediate_status = status(job_id, verbose=True)
        print(f"   Status: {immediate_status['status']}")
        print(f"   Running: {immediate_status['is_running']}")
    else:
        print("\n❌ Alignment job failed to start - check debug output above")
    
except Exception as e:
    print(f"❌ Error during alignment: {e}")
    import traceback
    print("Full traceback:")
    traceback.print_exc()

In [None]:
# TEST: Enhanced status() output format with emojis and improved formatting
print("🧪 TESTING: Enhanced status() formatting")
print("="*60)

job_id = 'beak_acyp_alignment_swissprot'

print("📋 Testing single project status with new formatting:")
print("💡 New features:")
print("   ✅ Emojis to indicate completion/progress status")
print("   ✅ Removed asterisks, using colons instead")
print("   ✅ Better visual hierarchy")
print()

try:
    job_status = status(job_id, verbose=True)
    
    print(f"\n📊 Return Value Summary:")
    print(f"   🆔 Job ID: {job_status['job_id']}")
    print(f"   📋 Overall Status: {job_status['status']}")
    print(f"   🔄 Is Running: {job_status['is_running']}")
    print(f"   ✅ Is Complete: {job_status['is_complete']}")
    
    print(f"\n🔍 Operations Overview:")
    for operation, op_status in job_status['operations'].items():
        emoji = "✅" if op_status == "completed" else "🔄" if op_status == "running" else "❌"
        print(f"   {emoji} {operation}: {op_status}")
    
except Exception as e:
    print(f"❌ Error checking status: {e}")
    import traceback
    traceback.print_exc()

In [None]:
# TEST: Enhanced status() output for ALL projects with recency ranking
print("🧪 TESTING: Enhanced status() for all projects")
print("="*60)

print("📋 Testing all projects status with new features:")
print("💡 Enhancements:")
print("   📅 Projects sorted by recency (most recent first)")
print("   ✅ Emojis to indicate completion/progress status")
print("   ✅ Clean formatting without asterisks")
print("   ✅ Consistent visual hierarchy")
print()

try:
    all_status = status()  # No job_id = show all
    
    print(f"📊 Summary:")
    if 'jobs' in all_status:
        print(f"   Found {len(all_status['jobs'])} projects")
        print(f"   Projects are now sorted by recency")
        print(f"   Each operation shows clear status with emojis")
        
        # Show which projects have which operations
        print(f"\n📋 Project Operations Summary:")
        for job in all_status['jobs'][:3]:  # Show first 3 projects
            operations = job.get('operations', {})
            # Operation values may be plain strings (as in the single-project output) or dicts
            states = [op.get('status') if isinstance(op, dict) else op for op in operations.values()]
            running_count = states.count('running')
            completed_count = states.count('completed')

            print(f"   📋 {job['job_id']}: {len(states)} operations")
            print(f"      🔄 {running_count} running, ✅ {completed_count} completed")
        
        if len(all_status['jobs']) > 3:
            print(f"   ... and {len(all_status['jobs']) - 3} more projects")
    else:
        print("   No projects found")
    
except Exception as e:
    print(f"❌ Error checking all status: {e}")
    import traceback
    traceback.print_exc()
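
The per-project counts in the cell above can also be aggregated across all projects. The following is a rough sketch, assuming `status()` returns a dictionary with a `jobs` list whose entries carry `job_id` and `operations`, as the test cells access them. `tally_operations` is a hypothetical helper, and the sample data is hand-written for illustration, not real server output.

```python
from collections import Counter


def tally_operations(all_status):
    """Count operation states across every project in a status()-style dict."""
    counts = Counter()
    for job in all_status.get("jobs", []):
        for state in job.get("operations", {}).values():
            # Accept either a plain string or a dict with a 'status' key.
            if isinstance(state, dict):
                state = state.get("status", "unknown")
            counts[state] += 1
    return counts


# Hand-written sample in the assumed shape; the states are illustrative only.
sample = {
    "jobs": [
        {"job_id": "beak_acyp_uniref90", "operations": {"search": "running"}},
        {"job_id": "beak_acyp_alignment_2",
         "operations": {"search": "completed", "align": {"status": "completed"}}},
    ]
}
print(dict(tally_operations(sample)))
```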

## 6. Final Status Check

Verify all operations completed successfully.

In [None]:
# Final status check
print(f"📋 Final status check for job: {job_id}")
final_status = status(job_id, verbose=True)

print(f"\n🎉 Complete workflow status: {final_status['status']}")
print(f"📁 All results saved in: beak_results/{job_id}/")

## 7. Working with Local Files (Alternative)

You can also use BEAK functions with local FASTA files instead of search results.

In [None]:
# Example: Align a local FASTA file
# local_fasta_file = "path/to/your/sequences.fasta"
# 
# alignment_result = align(
#     input_fasta=local_fasta_file,
#     user_id="local_alignment_test",
#     verbose=True
# )
# 
# print(f"Aligned sequences: {alignment_result['aligned_fasta']}")

print("💡 Uncomment the code above to test with local FASTA files")
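
To try the local-file path without an existing dataset, a small FASTA file can be written first. This is a minimal sketch using only the standard library: `write_fasta` is a hypothetical helper rather than a BEAK function, and the second record is a made-up fragment included only so the file holds more than one sequence.

```python
import tempfile
from pathlib import Path


def write_fasta(records, path, width=60):
    """Write (header, sequence) pairs to path in FASTA format, wrapping sequence lines."""
    lines = []
    for header, seq in records:
        lines.append(f">{header}")
        # Split the sequence into fixed-width chunks.
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    out = Path(path)
    out.write_text("\n".join(lines) + "\n")
    return out


records = [
    ("query_1", "MSTAQSLKSVDYEVFGRVQGVCFRMYTEDEARKIGVVGWVKNTSKGTVTGQVQGPEDKVN"
                "SMKSWLSKVGSPSSRIDRTNFSNEKTISKLEYSNFSIRY"),
    ("toy_2", "MSTAQSLKSVDYEVFGRV"),  # made-up fragment for illustration
]
fasta_path = write_fasta(records, Path(tempfile.mkdtemp()) / "sequences.fasta")
print(fasta_path.read_text())
```

The resulting path could then be supplied as `input_fasta=str(fasta_path)` in the commented `align()` call above.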

In [None]:
# Example: Compute tree from local alignment file
# local_alignment_file = "path/to/your/alignment.fasta"
# 
# tree_result = compute_tree(
#     input_source=local_alignment_file,
#     user_id="local_tree_test",
#     verbose=True
# )
# 
# print(f"Tree file: {tree_result['tree_file']}")

print("💡 Uncomment the code above to test tree computation with local alignment files")

## Summary

This notebook walked through the BEAK remote workflow with **NEW asynchronous behavior**:

1. **Search** → Find homologous sequences in protein databases
2. **Align** → Create multiple sequence alignments *(NOW ASYNCHRONOUS!)*
3. **Compute Tree** → Build phylogenetic trees
4. **Monitor** → Track job progress throughout with `status()`

### Key Features:
- **🚀 NEW: Asynchronous align()**: Jobs start immediately and run in background
- **📋 Enhanced monitoring**: Use `status()` to track progress of background jobs  
- **🔄 Real-time feedback**: Check job status at any time without blocking
- **📁 Organized output**: All results saved in structured directories
- **🆔 Job tracking**: Monitor progress and reuse results across operations
- **📊 Flexible input**: Works with search results, local files, or FASTA content
- **🐛 Debugging support**: Verbose output and debug modes for troubleshooting

### NEW Workflow Pattern:
```python
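# Start alignment job (returns immediately)
# search_results: project name of a completed search, or a local FASTA path
search_results = "beak_acyp_alignment_swissprot"
job = align(input_fasta=search_results, job_id="my_job", verbose=True)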

# Monitor progress
status("my_job", verbose=True)  # Check anytime

# When complete, retrieve results
retrieve_results("my_job")
```

### Next Steps:
- Use `beak.alignments.utils` for alignment analysis and visualization
- Use `beak.viz` for creating sequence logos and conservation plots  
- View tree files in phylogenetic software like FigTree or ggtree

### Benefits of Asynchronous Processing:
- ✅ **No blocking**: Start multiple jobs without waiting
- ✅ **Better resource usage**: CPU-intensive tasks run remotely
- ✅ **Flexible workflow**: Check progress when convenient
- ✅ **Error resilience**: Jobs continue even if connection drops
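
Since jobs do not block, several can be started and then watched together. The sketch below shows one possible way to do that. `wait_for_all` is a hypothetical helper, the job names are placeholders, and the `is_complete` key is assumed from the status output shown earlier. A stand-in status function replaces the remote call so the example runs offline.

```python
import time


def wait_for_all(status_fn, job_ids, poll_seconds=30.0, max_polls=120,
                 sleep=time.sleep):
    """Poll every pending job in turn; return {job_id: final status dict}."""
    pending = list(job_ids)
    finished = {}
    for _ in range(max_polls):
        still_pending = []
        for job_id in pending:
            info = status_fn(job_id)
            if info.get("is_complete"):
                finished[job_id] = info
            else:
                still_pending.append(job_id)
        pending = still_pending
        if not pending:
            return finished
        sleep(poll_seconds)
    raise TimeoutError(f"still running after {max_polls} polls: {pending}")


# Offline stand-in: each job completes after a fixed number of status checks.
_checks_left = {"job_a": 1, "job_b": 3}


def demo_status(job_id):
    _checks_left[job_id] -= 1
    done = _checks_left[job_id] <= 0
    return {"job_id": job_id,
            "status": "completed" if done else "running",
            "is_complete": done}


results = wait_for_all(demo_status, ["job_a", "job_b"], poll_seconds=0.01)
print(sorted(results))
```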