Sasquatch - SAS dataset library for Java

This Java library provides a reader for SAS datasets.
It also provides a command-line tool and a desktop application.

Key points:

  • lightweight library designed as a facade
  • Java 8 minimum requirement
  • has a module-info that makes it compatible with JPMS

Features:

  • reads metadata and data from SAS datasets (*.sas7bdat)
  • browses data with 3 types of cursor: forward-only, scrollable and splittable
  • is compatible with the Java Stream API
  • provides a simple facade that allows plugging in any implementation at deployment time
  • requires only a single mandatory dependency

Java library

API overview

Sasquatch is instantiated by a factory:

Sasquatch sasquatch = Sasquatch.ofServiceLoader();

It provides 3 ways of browsing the data:

  • forward-only: row by row from the first to the last
  • scrollable: any row by its position
  • splittable: rows as a (parallel) stream

Path file = ...;

// forward-only cursor
try (SasForwardCursor cursor = sasquatch.readForward(file)) {
    while (cursor.next()) {
        // process the current row here
    }
}

// scrollable cursor
try (SasScrollableCursor cursor = sasquatch.readScrollable(file)) {
    for (int i = 0; i < cursor.getRowCount(); i++) {
        cursor.moveTo(i);
        // process the row at position i here
    }
}

// splittable cursor
try (SasSplittableCursor cursor = sasquatch.readSplittable(file)) {
    // false = sequential stream; pass true to request a parallel stream
    Stream<SasRow> stream = StreamSupport.stream(cursor.getSpliterator(), false);
}
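
The second argument of StreamSupport.stream selects sequential or parallel execution. The following is a minimal, self-contained sketch of that choice; it uses the spliterator of an in-memory list as a stand-in for a splittable cursor, and the class SplittableSketch and its method are hypothetical names, not part of Sasquatch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical illustration; an in-memory list stands in for a SAS dataset.
final class SplittableSketch {

    // Sums the lengths of all rows; the flag selects sequential or parallel execution.
    static int totalLength(Spliterator<String> rows, boolean parallel) {
        try (Stream<String> stream = StreamSupport.stream(rows, parallel)) {
            return stream.mapToInt(String::length).sum();
        }
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("alpha", "beta", "gamma");
        // A spliterator can be consumed only once, so each call gets a fresh one.
        System.out.println(totalLength(rows.spliterator(), false)); // 14
        System.out.println(totalLength(rows.spliterator(), true));  // 14
    }
}
```

Both calls return the same result because the reduction does not depend on encounter order; whether parallel execution pays off presumably depends on the dataset size and on how well the underlying implementation can split its input.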

Some shortcuts are also available:

// sample factory that extracts the first field as a string
SasRow.Factory<String> factory = cursor -> row -> row.getString(0);

// stream shortcut
try (Stream<String> stream = sasquatch.rows(file, factory)) {
}

// list shortcut
List<String> rows = sasquatch.getAllRows(file, factory);
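
The shape of the lambda above (cursor -> row -> value) suggests a two-stage factory: the first stage sees the cursor once, the second stage maps each row. The sketch below illustrates why that shape is convenient, for instance to resolve a column index once instead of once per row. All types in it (Row, Cursor, RowFactory) are simplified, hypothetical stand-ins and not the real SasRow, SasCursor or SasRow.Factory.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical illustration of a two-stage row factory.
final class RowFactorySketch {

    // Simplified stand-ins for a row and a cursor.
    interface Row { String getString(int columnIndex); }
    interface Cursor { List<String> getColumnNames(); }

    // Stage one receives the cursor once; stage two maps every row.
    interface RowFactory<T> { Function<Row, T> create(Cursor cursor); }

    // Resolves the column index once, then reuses it for each row.
    static RowFactory<String> byColumnName(String name) {
        return cursor -> {
            int index = cursor.getColumnNames().indexOf(name);
            if (index < 0) {
                throw new IllegalArgumentException("Unknown column: " + name);
            }
            return row -> row.getString(index);
        };
    }

    // Applies the factory to all rows, similar in spirit to a "get all rows" shortcut.
    static <T> List<T> getAllRows(Cursor cursor, List<Row> rows, RowFactory<T> factory) {
        Function<Row, T> mapper = factory.create(cursor);
        return rows.stream().map(mapper).collect(Collectors.toList());
    }

    static Row rowOf(String... values) {
        return index -> values[index];
    }

    public static void main(String[] args) {
        Cursor cursor = () -> Arrays.asList("id", "name");
        List<Row> rows = Arrays.asList(rowOf("1", "Ann"), rowOf("2", "Bob"));
        System.out.println(getAllRows(cursor, rows, byColumnName("name"))); // [Ann, Bob]
    }
}
```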

Metadata can be retrieved directly or through a cursor:

// direct
SasMetaData meta = sasquatch.readMetaData(file);

// through a cursor
try (SasCursor cursor = sasquatch.read...(file)) {
    cursor.getMetaData();
}

Implementations

At least one implementation must be available at runtime (on the classpath or module path) in order to read datasets. If no implementation is available, read operations throw an IOException.
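
As a rough illustration of this behavior, the sketch below shows how a facade built on java.util.ServiceLoader can be created without any provider and fail only when a read is attempted. It is not Sasquatch's actual code: DatasetReaderSpi, FacadeSketch and countRows are hypothetical names chosen for the example.

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.ServiceLoader;

// Hypothetical service interface; not part of the Sasquatch API.
interface DatasetReaderSpi {
    int countRows(String file) throws IOException;
}

// Hypothetical facade that delegates to the first provider found at runtime.
final class FacadeSketch {

    private final Iterable<DatasetReaderSpi> providers;

    FacadeSketch(Iterable<DatasetReaderSpi> providers) {
        this.providers = providers;
    }

    // Discovers providers registered on the classpath or module path.
    static FacadeSketch ofServiceLoader() {
        return new FacadeSketch(ServiceLoader.load(DatasetReaderSpi.class));
    }

    int countRows(String file) throws IOException {
        Iterator<DatasetReaderSpi> iterator = providers.iterator();
        if (!iterator.hasNext()) {
            // Creating the facade succeeded; only the read operation fails.
            throw new IOException("No reader implementation available");
        }
        return iterator.next().countRows(file);
    }

    public static void main(String[] args) {
        try {
            System.out.println(FacadeSketch.ofServiceLoader().countRows("somedata.sas7bdat"));
        } catch (IOException ex) {
            System.out.println("Read failed: " + ex.getMessage());
        }
    }
}
```

With no provider registered for the hypothetical interface, running the sketch prints the failure message; deferring the error to read time keeps the factory call itself free of checked exceptions.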

Sasquatch supports the following implementations:

artifactId             description                          support
sasquatch-ri           native reference implementation      advanced
sasquatch-parso        wrapper around parso library         advanced
sasquatch-sassy        wrapper around sassy library         basic
sasquatch-biostatmatt  Java version of biostatmatt R code   basic

Feature matrix:

feature            ri   parso   sassy   biostatmatt
BIG_ENDIAN_32      x    x       -       -
LITTLE_ENDIAN_32   x    x       x       x
BIG_ENDIAN_64      x    x       -       -
LITTLE_ENDIAN_64   x    x       -       x
ATTRIBUTES         x    x       -       x
LABEL_META         x    x       -       -
FIELD_ENCODING     x    x       -       -
COLUMN_ENCODING    x    x       -       -
CHAR_COMP          x    x       -       -
BIN_COMP           x    x       -       -
DATE_TYPE          x    x       -       -
DATE_TIME_TYPE     x    x       -       -
TIME_TYPE          x    x       -       -
CUSTOM_NUMERIC     x    x       x       -
COLUMN_FORMAT      x    x       -       -
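
One way to use the matrix is to check which implementations cover a given set of required features. The sketch below simply transcribes the matrix into EnumSets and filters it; the class FeatureMatrixSketch, its Feature enum and the method supporting are hypothetical and only mirror the table, they are not an API offered by the library.

```java
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical illustration; the data is copied from the feature matrix above.
final class FeatureMatrixSketch {

    enum Feature {
        BIG_ENDIAN_32, LITTLE_ENDIAN_32, BIG_ENDIAN_64, LITTLE_ENDIAN_64,
        ATTRIBUTES, LABEL_META, FIELD_ENCODING, COLUMN_ENCODING,
        CHAR_COMP, BIN_COMP, DATE_TYPE, DATE_TIME_TYPE, TIME_TYPE,
        CUSTOM_NUMERIC, COLUMN_FORMAT
    }

    // Insertion order is kept so that results follow the column order of the table.
    static final Map<String, Set<Feature>> MATRIX = new LinkedHashMap<>();

    static {
        MATRIX.put("sasquatch-ri", EnumSet.allOf(Feature.class));
        MATRIX.put("sasquatch-parso", EnumSet.allOf(Feature.class));
        MATRIX.put("sasquatch-sassy",
                EnumSet.of(Feature.LITTLE_ENDIAN_32, Feature.CUSTOM_NUMERIC));
        MATRIX.put("sasquatch-biostatmatt",
                EnumSet.of(Feature.LITTLE_ENDIAN_32, Feature.LITTLE_ENDIAN_64, Feature.ATTRIBUTES));
    }

    // Returns the artifactIds whose feature set contains every required feature.
    static List<String> supporting(Set<Feature> required) {
        return MATRIX.entrySet().stream()
                .filter(entry -> entry.getValue().containsAll(required))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // prints [sasquatch-ri, sasquatch-parso, sasquatch-biostatmatt]
        System.out.println(supporting(EnumSet.of(Feature.LITTLE_ENDIAN_64, Feature.ATTRIBUTES)));
    }
}
```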

Dependencies setup

<dependencies>
  <dependency>
    <groupId>com.github.nbbrd.sasquatch</groupId>
    <artifactId>sasquatch-api</artifactId>
    <version>LATEST_VERSION</version>
  </dependency>
  <dependency>
    <groupId>com.github.nbbrd.sasquatch</groupId>
    <artifactId>sasquatch-ri</artifactId>
    <version>LATEST_VERSION</version>
    <scope>runtime</scope>
  </dependency>
</dependencies>

Command-line tool

The command-line tool (sasquatch in sasquatch-cli project) exports a SAS dataset to a CSV or SQL file.

$ sasquatch csv somedata.sas7bdat -o somedata.csv
$ sasquatch sql somedata.sas7bdat -o somedata.sql

Desktop application

The desktop application (sasquatchw in sasquatch-desktop project) is a basic dataset viewer.