onecode-rs

Rust bindings for ONEcode, a simple and efficient data representation format for genomic data.

Overview

ONEcode is a data representation framework designed primarily for genomic data, providing both human-readable ASCII and compressed binary file versions with strongly typed data.

This library provides safe, idiomatic Rust bindings to the ONEcode C library.

Features

  • ✅ Read and write ONE files in both ASCII and binary formats
  • ✅ Schema validation and creation
  • ✅ Provenance and reference tracking
  • ✅ Type-safe access to fields (integers, reals, characters, strings, lists)
  • ✅ File navigation and statistics
  • ✅ Sequence name extraction from embedded GDB in alignment files
  • ✅ RAII-based resource management
  • ✅ Fully thread-safe: concurrent operations supported

Installation

Add this to your Cargo.toml:

[dependencies]
onecode = { git = "https://github.com/pangenome/onecode-rs" }

Usage

Reading a ONE file

use onecode::OneFile;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut file = OneFile::open_read("data.1seq", None, None, 1)?;

    // Read through the file
    loop {
        let line_type = file.read_line();
        if line_type == '\0' {
            break; // End of file
        }

        match line_type {
            'S' => {
                // Access DNA sequence data
                println!("Sequence line");
            },
            'I' => {
                // Access the first field as an integer
                println!("ID: {}", file.int(0));
            },
            _ => {}
        }
    }

    Ok(())
}

Writing a ONE file

use onecode::{OneFile, OneSchema};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create a schema
    let schema_text = "P 3 tst\nO T 1 3 INT\n";
    let schema = OneSchema::from_text(schema_text)?;

    // Open file for writing
    let mut writer = OneFile::open_write_new(
        "output.1tst",
        &schema,
        "tst",
        false,  // ASCII format
        1       // single-threaded
    )?;

    // Add provenance
    writer.add_provenance("myprogram", "1.0", "example command")?;

    // Write data
    writer.set_int(0, 42);
    writer.write_line('T', 0, None);

    // File is automatically closed on drop
    Ok(())
}

Creating schemas from text

use onecode::OneSchema;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Define schema inline
    let schema_text = r#"
P 3 seq
O S 1 3 DNA
D I 1 3 INT
    "#;

    let schema = OneSchema::from_text(schema_text)?;
    // Use schema for file operations
    Ok(())
}

Getting file statistics

use onecode::OneFile;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let file = OneFile::open_read("data.1seq", None, None, 1)?;

    // Get statistics for a line type
    let (count, max_length, total_length) = file.stats('S')?;
    println!("Sequences: {}, Max length: {}, Total: {}",
             count, max_length, total_length);

    Ok(())
}
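
To make the three values concrete, here is a minimal standalone sketch. It assumes the tuple means "number of lines of that type, longest list on any such line, and the sum of all list lengths"; that reading is inferred from the variable names above, and `summarize` is a hypothetical helper, not part of the crate.

```rust
/// Hypothetical helper: derives (count, max_length, total_length)
/// from the list lengths of every line of one type.
fn summarize(lengths: &[i64]) -> (i64, i64, i64) {
    let count = lengths.len() as i64;
    // An empty input has no longest list, so fall back to 0.
    let max_length = lengths.iter().copied().max().unwrap_or(0);
    let total_length: i64 = lengths.iter().sum();
    (count, max_length, total_length)
}

fn main() {
    // Three sequences with lengths 4, 10 and 7.
    let (count, max_length, total_length) = summarize(&[4, 10, 7]);
    println!(
        "Sequences: {}, Max length: {}, Total: {}",
        count, max_length, total_length
    );
}
```

Under that assumption, values like these are enough to size a buffer for the longest sequence before reading any data.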

Working with alignment files (.1aln) and sequence names

Alignment files can contain embedded genome database (GDB) information, mapping sequence IDs to names:

use onecode::OneFile;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut file = OneFile::open_read("alignments.1aln", None, None, 1)?;

    // Get all sequence names (efficient for multiple lookups)
    let seq_names = file.get_all_sequence_names();
    println!("Found {} sequences", seq_names.len());

    // Read alignments and resolve sequence names
    loop {
        let line_type = file.read_line();
        if line_type == '\0' { break; }

        if line_type == 'A' {
            let query_id = file.int(0);
            let target_id = file.int(3);

            if let (Some(query_name), Some(target_name)) =
                (seq_names.get(&query_id), seq_names.get(&target_id)) {
                println!("Alignment: {} vs {}", query_name, target_name);
            }
        }
    }

    Ok(())
}

Or look up individual names on-demand:

// Fragment: assumes an enclosing function that returns Result, as in the examples above
let mut file = OneFile::open_read("alignments.1aln", None, None, 1)?;

// Get a specific sequence name by ID
if let Some(name) = file.get_sequence_name(5) {
    println!("Sequence 5: {}", name);
}
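
The alignment loop above silently skips records whose IDs have no name. If a placeholder is preferable, a small fallback helper is one option. This is a sketch only: it assumes the name map behaves like a `HashMap<i64, String>` (the README does not state the exact type), and `display_name` is a hypothetical function, not part of the crate.

```rust
use std::collections::HashMap;

/// Hypothetical helper: returns the mapped name, or a placeholder
/// built from the numeric ID when no name is known.
fn display_name(names: &HashMap<i64, String>, id: i64) -> String {
    names
        .get(&id)
        .cloned()
        .unwrap_or_else(|| format!("seq#{id}"))
}

fn main() {
    // Stand-in for the map returned by the library (assumed shape).
    let mut names = HashMap::new();
    names.insert(0, "chr1".to_string());
    names.insert(1, "chr2".to_string());

    // ID 7 is not in the map, so the placeholder is used.
    println!("{} vs {}", display_name(&names, 0), display_name(&names, 7));
}
```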

API Documentation

Full API documentation is available via cargo doc:

cargo doc --open

Key types:

  • OneFile - Main file handle for reading/writing ONE files
  • OneSchema - Schema definition and validation
  • OneError - Error types
  • OneType - Field type enumeration

Building

The library uses bindgen to automatically generate bindings from the C headers and cc to compile the C library.

cargo build --release

Testing

All tests pass when run in parallel, which is cargo's default:

cargo test

Test suite includes:

  • 9 basic functionality tests
  • 3 sequence name extraction tests
  • 4 thread-safety stress tests (10-50 concurrent threads)
  • 2 doc tests

Thread Safety

Fully thread-safe! The library supports concurrent operations without any restrictions.

The upstream ONEcode C library has been updated with thread-local storage for all global state, making it safe for concurrent use from multiple threads. All operations including schema creation, file reading, and error handling work correctly under concurrent load.
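
The idea behind thread-local global state can be illustrated with Rust's own `thread_local!` macro. This is an analogy in standard Rust, not the C library's implementation; `LAST_ERROR`, `set_last_error`, `last_error`, and `run_workers` are hypothetical names chosen for the sketch.

```rust
use std::cell::RefCell;
use std::thread;

thread_local! {
    // Each thread gets its own independent copy of this "global" slot.
    static LAST_ERROR: RefCell<String> = RefCell::new(String::new());
}

/// Records a message in the calling thread's slot only.
fn set_last_error(msg: &str) {
    LAST_ERROR.with(|slot| *slot.borrow_mut() = msg.to_string());
}

/// Returns the calling thread's most recent message.
fn last_error() -> String {
    LAST_ERROR.with(|slot| slot.borrow().clone())
}

/// Spawns `n` threads; each writes its own message and reads it back.
fn run_workers(n: usize) -> Vec<String> {
    let handles: Vec<_> = (0..n)
        .map(|i| {
            thread::spawn(move || {
                set_last_error(&format!("error from worker {i}"));
                last_error()
            })
        })
        .collect();
    // Join in spawn order, so results are ordered by worker index.
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    for msg in run_workers(4) {
        println!("{msg}");
    }
    // The main thread never wrote to its slot, so it is still empty.
    assert!(last_error().is_empty());
}
```

Because no thread can see another thread's slot, no lock is needed around reads or writes, which is the property the paragraph above relies on.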

Architecture

The library is organized into several modules:

  • ffi - Raw FFI bindings generated by bindgen
  • error - Rust error types and Result wrapper
  • types - Rust-friendly type definitions
  • file - Safe OneFile wrapper with RAII resource management
  • schema - OneSchema management and validation
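
The RAII pattern mentioned for the file module can be sketched in a few lines. The types here are hypothetical stand-ins (`RawHandle` plays the role of a C file handle and `SafeFile` the role of a safe wrapper); the crate's real internals may differ.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical stand-in for a raw C handle; counts how often it is closed.
struct RawHandle {
    close_count: Rc<Cell<u32>>,
}

impl RawHandle {
    /// Stand-in for the C close call.
    fn close(&mut self) {
        self.close_count.set(self.close_count.get() + 1);
    }
}

/// Safe wrapper: owns the raw handle and closes it when dropped.
struct SafeFile {
    raw: RawHandle,
}

impl SafeFile {
    fn open(close_count: Rc<Cell<u32>>) -> Self {
        SafeFile { raw: RawHandle { close_count } }
    }
}

impl Drop for SafeFile {
    fn drop(&mut self) {
        // Runs automatically when the wrapper goes out of scope.
        self.raw.close();
    }
}

/// Opens a wrapper in an inner scope and returns how many closes happened.
fn closes_after_scope() -> u32 {
    let counter = Rc::new(Cell::new(0));
    {
        let _file = SafeFile::open(Rc::clone(&counter));
        // Still open inside the scope: nothing has been closed yet.
        assert_eq!(counter.get(), 0);
    }
    counter.get()
}

fn main() {
    println!("closes after scope: {}", closes_after_scope());
}
```

Tying cleanup to `Drop` means early returns and `?` propagation cannot leak the handle, since the wrapper is dropped on every exit path.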

Integration with ONEcode

The C library is included as a git subtree in the ONEcode/ directory and compiled automatically during the build process.

To update the ONEcode subtree:

git subtree pull --prefix ONEcode https://github.com/thegenemyers/ONEcode.git main --squash

Performance

  • Zero-copy access to data where possible
  • Supports parallel reading/writing with configurable thread count
  • Binary format provides efficient compression
  • Thread-safe without synchronization overhead

License

This Rust wrapper is licensed under MIT OR Apache-2.0.

The ONEcode C library has its own license - see ONEcode/ for details.

Contributing

Contributions are welcome! Please ensure tests pass before submitting PRs:

cargo test
cargo clippy
cargo fmt

Acknowledgments

ONEcode was developed by Gene Myers and Richard Durbin. This Rust wrapper builds on their excellent work to provide safe, idiomatic Rust bindings.
