# Sci-F: The Scientific Filesystem

> Paul Gierz  
> HPC and Data Processing  
> Group Meeting 15. August, 2024

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/pgierz/scif-demo/HEAD)

## The Scientific Filesystem
`scif` provides an abstraction for organizing programs and their metadata in a user-friendly, discoverable way. From [the handbook](https://sci-f.github.io) [1]:
> ...the Scientific Filesystem (SCIF), [is] an organizational format that supports exposure of executables and metadata for discoverability.

Primarily, it consists of:
* a known *filesystem structure*

* a definition for a set of *environment variables* describing it

* and *functions* for generation of the variables and interaction with the libraries, metadata, and executables located within.

This is particularly useful for containerized applications, where we want to bundle multiple parts together!

## The Filesystem Structure
Two main parts:
1. `/scif/apps`: Location for each application
2. `/scif/data`: Data used by each application

### `/scif/apps`
Each app directory under `/scif/apps` can be thought of as its own copy of the standard Linux `$PREFIX` directory. This is very similar to how `spack` handles things. For each app you have (at a minimum):

* a directory for binaries: `/scif/apps/foo/bin`. This is added to your `$PATH` when running the app `foo`.

* a directory for libraries: `/scif/apps/foo/lib`. This is added to `$LD_LIBRARY_PATH`.

* metadata describing the app under `/scif/apps/foo/scif`. This contains labels, environment variables, the runscript, and help.
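
As a rough illustration of the layout above, the following sketch creates the three per-app directories for a hypothetical app named `foo`. The helper `create_app_layout` is made up for this example and is not part of the `scif` client, which sets up these directories itself when installing a recipe. A temporary directory stands in for `/scif` so that no root access is needed.

```python
import tempfile
from pathlib import Path


def create_app_layout(base: Path, name: str) -> dict[str, Path]:
    """Create the per-app directories described above under `base`.

    Hypothetical helper for illustration only; not part of scif.
    """
    root = base / "apps" / name
    layout = {
        "bin": root / "bin",    # executables, added to PATH
        "lib": root / "lib",    # libraries, added to LD_LIBRARY_PATH
        "scif": root / "scif",  # metadata: labels, environment, runscript, help
    }
    for path in layout.values():
        path.mkdir(parents=True, exist_ok=True)
    return layout


# A temporary directory stands in for /scif
with tempfile.TemporaryDirectory() as tmp:
    layout = create_app_layout(Path(tmp), "foo")
    created = sorted(p.relative_to(tmp).as_posix() for p in layout.values())
    print(created)
```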

### `/scif/data`
The `/scif/data` path is also separated for each app, and can be as complex as you like. Generally you'll want to store input and output data here.

There is no strict rule here, but you could do something like this:
* `/scif/data/foo/input`: Input data needed for app `foo`
* `/scif/data/foo`: General data needed for `foo`
* `/scif/data`: Data that you might need across various applications.
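
The convention suggested above can be written down as a small helper that lists the data locations for an app, from most specific to most general. This is only a sketch of one possible convention: the function `data_locations` is hypothetical, and `scif` itself does not enforce or search these paths.

```python
from pathlib import PurePosixPath

SCIF_DATA = PurePosixPath("/scif/data")


def data_locations(app: str) -> list[PurePosixPath]:
    """List data directories for `app`, most specific first.

    Hypothetical helper mirroring the convention sketched in the text.
    """
    return [
        SCIF_DATA / app / "input",  # input data for this app only
        SCIF_DATA / app,            # general data for this app
        SCIF_DATA,                  # data shared across apps
    ]


locations = [str(p) for p in data_locations("foo")]
print(locations)
```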

## The Environment Variables
You get a set of environment variables that should be defined when using a scientific filesystem.
> **Table 1.** Default environment variables used by `scif`.

| Variable            | Default Setting | Meaning                                                                                                            |
|---------------------|-----------------|--------------------------------------------------------------------------------------------------------------------|
| `SCIF_BASE`         | `/scif`         | the root location for SCIF                                                                                         |
| `SCIF_DATA`         | `/scif/data`    | the root location for apps data                                                                                    |
| `SCIF_APPS`         | `/scif/apps`    | the root location for installed apps                                                                               |
| `SCIF_SHELL`        | `/bin/bash`     | shell to use for the `shell` command                                                                               |
| `SCIF_PYSHELL`      | `ipython`       | interactive Python shell for the `pyshell` command                                                                 |
| `SCIF_ENTRYPOINT`   | `/bin/bash`     | the command to run given no runscript or app defined                                                               |
| `SCIF_ENTRYFOLDER`  | `${SCIF_BASE}`  | the entry folder to run the entrypoint command                                                                     |
| `SCIF_MESSAGELEVEL` | `INFO`          | a client level of verbosity. Must be one of CRITICAL, ABORT, ERROR, WARNING, LOG, INFO, QUIET, VERBOSE, DEBUG      |
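
To make the idea of "default setting" concrete, here is a minimal sketch of how a tool could resolve these settings: use the value from the environment if it is set, otherwise fall back to the default from Table 1. The names `SCIF_DEFAULTS` and `resolve_settings` are assumptions made for this example and do not describe how the `scif` client is actually implemented.

```python
import os

# Defaults copied from Table 1
SCIF_DEFAULTS = {
    "SCIF_BASE": "/scif",
    "SCIF_DATA": "/scif/data",
    "SCIF_APPS": "/scif/apps",
    "SCIF_SHELL": "/bin/bash",
    "SCIF_PYSHELL": "ipython",
    "SCIF_ENTRYPOINT": "/bin/bash",
    "SCIF_ENTRYFOLDER": "/scif",  # Table 1 gives this as ${SCIF_BASE}
    "SCIF_MESSAGELEVEL": "INFO",
}


def resolve_settings(environ=None) -> dict[str, str]:
    """Return each setting, preferring a value in `environ` over the default.

    Hypothetical helper for illustration only; not part of scif.
    """
    environ = os.environ if environ is None else environ
    return {key: environ.get(key, default) for key, default in SCIF_DEFAULTS.items()}


# Pass a plain dict instead of os.environ to keep the example reproducible
settings = resolve_settings({"SCIF_MESSAGELEVEL": "DEBUG"})
print(settings["SCIF_BASE"], settings["SCIF_MESSAGELEVEL"])
```

Calling `resolve_settings()` without arguments reads from the real process environment instead.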


There is also a set of variables that describe the currently active application:
> **Table 2.** Currently active variables

| Variable          | Default Setting                          | Meaning                                                           |
|-------------------|------------------------------------------|-------------------------------------------------------------------|
| `SCIF_APPNAME`    | `example`                                | the active software app                                           |
| `SCIF_APPDATA`    | `/scif/data/example`                     | the data root for the active software app                         |
| `SCIF_APPROOT`    | `/scif/apps/example`                     | the install root for the active software app                      |
| `SCIF_APPBIN`     | `/scif/apps/example/bin`                 | the app bin, which is automatically added to the path when active |
| `SCIF_APPLIB`     | `/scif/apps/example/lib`                 | the app lib, which is added to `LD_LIBRARY_PATH` when active      |
| `SCIF_APPMETA`    | `/scif/apps/example/scif`                | the metadata folder                                               |
| `SCIF_APPHELP`    | `/scif/apps/example/scif/runscript.help` | a text file with help to print for the user to the terminal       |
| `SCIF_APPRUN`     | `/scif/apps/example/scif/runscript`      | the commands to run as the app entrypoint                         |
| `SCIF_APPSTART`   | `/scif/apps/example/scif/startscript`    | the start script (if provided) for an app                         |
| `SCIF_APPTEST`    | `/scif/apps/example/scif/test`           | the commands to run to test the app                               |
| `SCIF_APPLABELS`  | `/scif/apps/example/scif/labels.json`    | a key:value json lookup dictionary of labels                      |
| `SCIF_APPENV`     | `/scif/apps/example/scif/environment.sh` | a shell script to source for the software app environment         |

The same variables are also defined for every other installed app, with the app name appended to the variable name (for example, `SCIF_APPROOT_foo` for an app named `foo`).
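
All of the values in Table 2 follow from just two inputs: the SCIF base and the app name. The sketch below derives them for the app `example`. The function `app_variables` is hypothetical and written only for this illustration; the `scif` client generates these variables itself.

```python
from pathlib import PurePosixPath


def app_variables(name: str, base: str = "/scif") -> dict[str, str]:
    """Derive the Table 2 variables for the app `name`.

    Hypothetical helper for illustration only; not part of scif.
    """
    root = PurePosixPath(base) / "apps" / name
    meta = root / "scif"
    values = {
        "SCIF_APPNAME": name,
        "SCIF_APPDATA": PurePosixPath(base) / "data" / name,
        "SCIF_APPROOT": root,
        "SCIF_APPBIN": root / "bin",
        "SCIF_APPLIB": root / "lib",
        "SCIF_APPMETA": meta,
        "SCIF_APPHELP": meta / "runscript.help",
        "SCIF_APPRUN": meta / "runscript",
        "SCIF_APPSTART": meta / "startscript",
        "SCIF_APPTEST": meta / "test",
        "SCIF_APPLABELS": meta / "labels.json",
        "SCIF_APPENV": meta / "environment.sh",
    }
    # Environment variables are strings, so convert the paths
    return {key: str(value) for key, value in values.items()}


variables = app_variables("example")
print(variables["SCIF_APPRUN"])
```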

## The `scif` command line tool

The Sci-F framework provides a command line tool, `scif`, for interacting with the environment. You can install it with pip:

In [None]:
%%bash
pip install scif

Let's see what this can do:

In [None]:
%%bash
which scif

In [None]:
%%bash
scif --help

## References
1. The Scientific Filesystem handbook, https://sci-f.github.io