Rust bindings for ONEcode, a simple and efficient data representation format for genomic data.
ONEcode is a data representation framework designed primarily for genomic data, providing both human-readable ASCII and compressed binary file versions with strongly typed data.
This library provides safe, idiomatic Rust bindings to the ONEcode C library.
- ✅ Read and write ONE files in both ASCII and binary formats
- ✅ Schema validation and creation
- ✅ Provenance and reference tracking
- ✅ Type-safe access to fields (integers, reals, characters, strings, lists)
- ✅ File navigation and statistics
- ✅ Sequence name extraction from embedded GDB in alignment files
- ✅ RAII-based resource management
- ✅ Fully thread-safe - concurrent operations supported
Add this to your Cargo.toml:
[dependencies]
onecode = { git = "https://github.com/pangenome/onecode-rs" }
use onecode::OneFile;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut file = OneFile::open_read("data.1seq", None, None, 1)?;

    // Read through the file one line at a time
    loop {
        let line_type = file.read_line();
        if line_type == '\0' {
            break; // End of file
        }
        match line_type {
            'S' => {
                // A DNA sequence line was read
                println!("Sequence line");
            },
            'I' => {
                // Access the integer identifier (field 0)
                println!("ID: {}", file.int(0));
            },
            _ => {}
        }
    }
    Ok(())
}
use onecode::{OneFile, OneSchema};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create a schema
    let schema_text = "P 3 tst\nO T 1 3 INT\n";
    let schema = OneSchema::from_text(schema_text)?;

    // Open file for writing
    let mut writer = OneFile::open_write_new(
        "output.1tst",
        &schema,
        "tst",
        false, // ASCII format
        1      // single-threaded
    )?;

    // Add provenance
    writer.add_provenance("myprogram", "1.0", "example command")?;

    // Write data
    writer.set_int(0, 42);
    writer.write_line('T', 0, None);

    // File is automatically closed on drop
    Ok(())
}
use onecode::OneSchema;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Define schema inline
    let schema_text = r#"
P 3 seq
O S 1 3 DNA
D I 1 3 INT
"#;
    let schema = OneSchema::from_text(schema_text)?;

    // Use schema for file operations
    Ok(())
}
use onecode::OneFile;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let file = OneFile::open_read("data.1seq", None, None, 1)?;

    // Get statistics for a line type
    let (count, max_length, total_length) = file.stats('S')?;
    println!("Sequences: {}, Max length: {}, Total: {}",
             count, max_length, total_length);
    Ok(())
}
Alignment files can contain embedded genome database (GDB) information, mapping sequence IDs to names:
use onecode::OneFile;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut file = OneFile::open_read("alignments.1aln", None, None, 1)?;

    // Get all sequence names (efficient for multiple lookups)
    let seq_names = file.get_all_sequence_names();
    println!("Found {} sequences", seq_names.len());

    // Read alignments and resolve sequence names
    loop {
        let line_type = file.read_line();
        if line_type == '\0' { break; }
        if line_type == 'A' {
            let query_id = file.int(0);
            let target_id = file.int(3);
            if let (Some(query_name), Some(target_name)) =
                (seq_names.get(&query_id), seq_names.get(&target_id)) {
                println!("Alignment: {} vs {}", query_name, target_name);
            }
        }
    }
    Ok(())
}
Or look up individual names on demand:
let mut file = OneFile::open_read("alignments.1aln", None, None, 1)?;

// Get a specific sequence name by ID
if let Some(name) = file.get_sequence_name(5) {
    println!("Sequence 5: {}", name);
}
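The alignment loop above silently skips records whose IDs have no name. Where a placeholder is preferable, a small helper can supply one. The sketch below is self-contained and does not call onecode: `display_name` is a hypothetical helper, and the `HashMap<i64, String>` is an assumed stand-in for the map returned by `get_all_sequence_names`, inferred from the `seq_names.get(&query_id)` lookups rather than confirmed against the API.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not part of onecode): returns the name for `id`,
/// or a placeholder when the ID is missing from the map.
fn display_name(names: &HashMap<i64, String>, id: i64) -> String {
    match names.get(&id) {
        Some(name) => name.clone(),
        None => format!("<unknown:{id}>"),
    }
}

fn main() {
    // Stand-in data; a real map would come from the alignment file.
    let mut names: HashMap<i64, String> = HashMap::new();
    names.insert(0, "chr1".to_string());
    names.insert(1, "chr2".to_string());

    // ID 7 is absent, so the placeholder is used for the target.
    println!("Alignment: {} vs {}", display_name(&names, 0), display_name(&names, 7));
}
```

Whether to skip or label unresolved IDs depends on the downstream consumer; the placeholder keeps every alignment record visible in the output.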
Full API documentation is available via cargo doc:
cargo doc --open
Key types:
- OneFile - Main file handle for reading/writing ONE files
- OneSchema - Schema definition and validation
- OneError - Error types
- OneType - Field type enumeration
The library uses bindgen to automatically generate bindings from the C headers and cc to compile the C library.
cargo build --release
All tests pass with full concurrent execution:
cargo test
Test suite includes:
- 9 basic functionality tests
- 3 sequence name extraction tests
- 4 thread-safety stress tests (10-50 concurrent threads)
- 2 doc tests
✅ Fully thread-safe! The library supports concurrent operations without any restrictions.
The upstream ONEcode C library has been updated with thread-local storage for all global state, making it safe for concurrent use from multiple threads. All operations including schema creation, file reading, and error handling work correctly under concurrent load.
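To illustrate why thread-local storage removes interference between threads, here is a minimal Rust sketch of the idea. It is not the C library's implementation: `LAST_ERROR`, `set_error`, `get_error`, and `run_workers` are hypothetical names, and the per-thread "last error" slot merely stands in for the kind of global state described above.

```rust
use std::cell::RefCell;
use std::thread;

thread_local! {
    // Hypothetical per-thread slot standing in for formerly global C state.
    static LAST_ERROR: RefCell<String> = RefCell::new(String::new());
}

/// Store a message in the calling thread's own slot.
fn set_error(msg: &str) {
    LAST_ERROR.with(|e| *e.borrow_mut() = msg.to_string());
}

/// Read back the calling thread's own slot.
fn get_error() -> String {
    LAST_ERROR.with(|e| e.borrow().clone())
}

/// Spawn `n` workers; each writes its own message and then reads it back.
fn run_workers(n: usize) -> Vec<String> {
    let handles: Vec<_> = (0..n)
        .map(|i| {
            thread::spawn(move || {
                set_error(&format!("error from worker {i}"));
                thread::yield_now(); // give other threads a chance to run
                get_error()
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    for line in run_workers(4) {
        println!("{line}");
    }
    // The main thread never called set_error, so its slot is still empty.
    assert!(get_error().is_empty());
}
```

Each worker reads back only the value it wrote, with no lock involved, because every thread owns a separate copy of the slot.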
The library is organized into several modules:
- ffi - Raw FFI bindings generated by bindgen
- error - Rust error types and Result wrapper
- types - Rust-friendly type definitions
- file - Safe OneFile wrapper with RAII resource management
- schema - OneSchema management and validation
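The RAII pattern mentioned for the file module can be sketched in a few lines. This is an illustration under stated assumptions, not the crate's actual code: `SafeFile` and `demo` are hypothetical, and a shared boolean flag stands in for the C library's open and close calls so the example runs without any FFI.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical wrapper (not the real OneFile): owns a resource whose
/// "open" state is tracked by a shared flag instead of a C handle.
struct SafeFile {
    open: Rc<Cell<bool>>,
}

impl SafeFile {
    /// Stands in for the C "open" call.
    fn open(flag: Rc<Cell<bool>>) -> Self {
        flag.set(true);
        SafeFile { open: flag }
    }
}

impl Drop for SafeFile {
    /// Stands in for the C "close" call; runs automatically at scope exit.
    fn drop(&mut self) {
        self.open.set(false);
    }
}

/// Returns the flag's value while the wrapper is alive and after it is dropped.
fn demo() -> (bool, bool) {
    let flag = Rc::new(Cell::new(false));
    let during;
    {
        let _file = SafeFile::open(Rc::clone(&flag));
        during = flag.get();
    } // `_file` is dropped here, which "closes" the resource
    (during, flag.get())
}

fn main() {
    let (during, after) = demo();
    println!("open inside scope: {during}, open after scope: {after}");
}
```

Tying cleanup to `Drop` means early returns and `?` propagation release the resource as well, which is why the writing example needs no explicit close call.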
The C library is included as a git subtree in the ONEcode/ directory and compiled automatically during the build process.
To update the ONEcode subtree:
git subtree pull --prefix ONEcode https://github.com/thegenemyers/ONEcode.git main --squash
- Zero-copy access to data where possible
- Supports parallel reading/writing with configurable thread count
- Binary format provides efficient compression
- Thread-safe without synchronization overhead
This Rust wrapper is licensed under MIT OR Apache-2.0.
The ONEcode C library has its own license; see ONEcode/ for details.
Contributions are welcome! Please ensure tests pass before submitting PRs:
cargo test
cargo clippy
cargo fmt
ONEcode was developed by Gene Myers and Richard Durbin. This Rust wrapper builds on their excellent work to provide safe, idiomatic Rust bindings.