This repository has been archived by the owner on Dec 13, 2017. It is now read-only.
Concise Run-Length Format for small alphabets; DEPRECATED

lh3/crlf

CRLF, or Concise Run-Length Format, is a lightweight format for storing run-length encoded strings, typically over small alphabets such as DNA. This repository contains the spec, a library and a tool to convert between encodings. The library is implemented in two files, crlf.h and crlf.c, and has no dependencies on other libraries.
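
For readers new to run-length encoding, the following is a minimal sketch of the idea in plain C. It does not use crlf.h and says nothing about the on-disk CRLF layout; the names toy_run_t and toy_rle_encode are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One run: a symbol code and how many times it repeats (hypothetical type). */
typedef struct {
    uint8_t sym;
    uint64_t len;
} toy_run_t;

/*
 * Run-length encode s[0..n) into runs[] and return the number of runs.
 * runs[] must have room for n entries, which is the worst case when no
 * two adjacent symbols are equal.
 */
static size_t toy_rle_encode(const uint8_t *s, size_t n, toy_run_t *runs)
{
    size_t i, n_runs = 0;
    assert(n == 0 || (s != NULL && runs != NULL));
    for (i = 0; i < n; ++i) {
        if (n_runs > 0 && runs[n_runs - 1].sym == s[i]) {
            ++runs[n_runs - 1].len;   /* same symbol: extend the current run */
        } else {
            runs[n_runs].sym = s[i];  /* new symbol: start a new run */
            runs[n_runs].len = 1;
            ++n_runs;
        }
    }
    return n_runs;
}
```

With $ACGTN encoded as 012345, the string AAACC$TTTT (codes 1,1,1,2,2,0,4,4,4,4) becomes four runs: (1,3), (2,2), (0,1), (4,4). Long runs like these are common in the BWT of a DNA sequence, which is why a compact run representation pays off for small alphabets.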

The minimal code to write a CRLF to stdout from a BWT string:

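uint32_t dectab[256], l_BWT, i; // l_BWT: length of BWT; the caller must set it
uint8_t *BWT; // $ACGTN encoded as 012345; the caller must point this at the string
crlf_t *crlf;
crlf_dectab_RL53(dectab); // generate the decoding table
crlf = crlf_create(0, 6, dectab, crlf_write_RL53, 0, 0);
for (i = 0; i < l_BWT; ++i)
    crlf_write(crlf, BWT[i], 1); // one symbol per call; consecutive equal symbols are merged into one run
crlf_close(crlf);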

The minimal code to read a CRLF from stdin:

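int c; // symbol code of the current run
uint64_t i, l; // l: length of the current run
crlf_t *crlf;
crlf = crlf_open(0);
while ((c = crlf_read(crlf, &l)) >= 0) // a negative return value ends the loop
    for (i = 0; i < l; ++i)
        putchar("$ACGTN"[c]); // expand the run back into characters
crlf_close(crlf);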

Basic APIs (see crlf.h for details):

  • crlf_create() creates a CRLF and writes the header, with a user-provided decoding table and a function pointer for encoding a run.

  • crlf_open() opens an existing CRLF for reading.

  • crlf_close() closes a CRLF.

  • crlf_write() writes a run to a CRLF. If this function is called consecutively on runs of the same symbol, these runs will be merged.

  • crlf_read() reads a run from a CRLF. It keeps reading until it meets a run of a different symbol from the current one, so consecutive runs of the same symbol come back as a single run.
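
The merging behaviour described for crlf_write() and crlf_read() can be sketched with a small in-memory stand-in. This is not the library's implementation and does not touch the CRLF file format; toy_rl_t, toy_init, toy_write, toy_read and TOY_MAX_RUNS are hypothetical names, and the -1 end marker is an assumption chosen to match the `>= 0` loop condition in the reading example above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_RUNS 64 /* fixed capacity, enough for a demonstration */

/* In-memory list of runs (hypothetical; not the crlf_t structure). */
typedef struct {
    int sym[TOY_MAX_RUNS];
    uint64_t len[TOY_MAX_RUNS];
    size_t n_runs; /* runs stored so far */
    size_t next;   /* index of the next run handed out by toy_read() */
} toy_rl_t;

static void toy_init(toy_rl_t *t)
{
    t->n_runs = 0;
    t->next = 0;
}

/* Append a run; if the previous run has the same symbol, extend it instead. */
static void toy_write(toy_rl_t *t, int sym, uint64_t len)
{
    assert(sym >= 0);
    if (len == 0) return; /* an empty run changes nothing */
    if (t->n_runs > 0 && t->sym[t->n_runs - 1] == sym) {
        t->len[t->n_runs - 1] += len; /* merge with the previous run */
        return;
    }
    assert(t->n_runs < TOY_MAX_RUNS);
    t->sym[t->n_runs] = sym;
    t->len[t->n_runs] = len;
    ++t->n_runs;
}

/* Return the next symbol and store its run length in *len; -1 when done. */
static int toy_read(toy_rl_t *t, uint64_t *len)
{
    if (t->next >= t->n_runs) return -1;
    *len = t->len[t->next];
    return t->sym[t->next++];
}
```

Calling toy_write() three times with symbol 1 and lengths 1, 1 and 3, then once with symbol 2 and length 2, leaves two stored runs; toy_read() then yields (1,5), (2,2) and finally -1. Merging at write time keeps the caller's loop trivial: it can emit one symbol per call, as the writing example does, and still get maximal runs.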
