docs: 📝 add naming scheme for output files/folders #497

Merged: 3 commits, Jul 9, 2024
Changes from 2 commits
63 changes: 59 additions & 4 deletions docs/design/naming.qmd
@@ -6,10 +6,13 @@ title: "Naming scheme"

The naming scheme for Sprout is guided by our [style
guide](https://design.seedcase-project.org/style/). The below naming
scheme *only* applies to file paths, functions, URLs, API endpoints, and
command line interfaces that are exposed to or associated with
user-facing content. It does *not* apply to internal content; see the
style guide for details on naming internal (developer-facing) content.
scheme *only* applies to file paths, data objects, functions, URLs, API
endpoints, and command line interfaces that are exposed to or associated
with user-facing content. It does *not* apply to internal content; see
the style guide for details on naming internal (developer-facing)
content.

## Actions and objects

Guided by our style guide, we'll compose names based on the objects we
and our users interact with as well as the actions taken on those
@@ -68,3 +71,55 @@ projects <id> metadata <id> delete
projects <id> metadata <id> data update
projects <id> metadata <id> data delete
```

## Project file paths

Sprout is designed to store and structure data in a coherent and
consistent way. Part of that is how the data are stored in files and
folders. If Sprout runs in a server environment with Postgres
installed, the original input from the user is stored in the Postgres
database under the file path shown below and can be exported to actual
files if desired.

When constructing the project paths, we use our [style
guide](https://design.seedcase-project.org/style/) as well as these
additional guidelines:

- Follow the [Frictionless
Framework](https://v4.framework.frictionlessdata.io/) and [Data
Package](https://datapackage.org/) structure

The types of files and folders that make up the project are:

- Types of files: metadata, raw, database (as a SQLite database),
parquet (as a backup of the database)
- Types of folders: raw, data

Following a pattern similar to the one described in the actions and
objects section above, the base folders are:

- `project <id>`: The related set of data specific to, for instance,
one data collection study.
- `data <id>`: The files related to a specific table or file. Uses
`data` instead of `metadata` like above because the metadata is
contained in a separate file (see below), while the individual files
are themselves the data.
- `ROOT`: The default location that projects are stored in. On a
server, this is the path to the server space given to Sprout. On a
local computer, it is a default path provided by Sprout for the
specific operating system.

```
# Compress all raw data files
ROOT/projects/<id>/data/<id>/raw/<timestamp>-<uuid>.<extension>.gzip
ROOT/projects/<id>/data/<id>/data.parquet

# Full database
ROOT/projects/<id>/database.sqlite

# Machine-readable metadata file
ROOT/projects/<id>/datapackage.json

# A human readable, auto-generated 'metadata' file
ROOT/projects/<id>/README.md
```
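
The layout above could be composed in code roughly as follows. This is
only a minimal sketch, not Sprout's actual implementation: the function
names (`project_dir`, `data_dir`, `raw_file_path`), the use of integer
IDs, and the timestamp format are all assumptions made for illustration.

```python
from datetime import datetime, timezone
from pathlib import Path
from uuid import uuid4


def project_dir(root: Path, project_id: int) -> Path:
    """Return ROOT/projects/<id> (hypothetical helper)."""
    return root / "projects" / str(project_id)


def data_dir(root: Path, project_id: int, data_id: int) -> Path:
    """Return ROOT/projects/<id>/data/<id> (hypothetical helper)."""
    return project_dir(root, project_id) / "data" / str(data_id)


def raw_file_path(
    root: Path, project_id: int, data_id: int, extension: str
) -> Path:
    """Return a path of the form raw/<timestamp>-<uuid>.<extension>.gzip.

    The timestamp format used here is an assumption; the naming scheme
    only says that a timestamp and a UUID make up the file name.
    """
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    file_name = f"{timestamp}-{uuid4()}.{extension}.gzip"
    return data_dir(root, project_id, data_id) / "raw" / file_name


# Example usage with a placeholder root folder.
root = Path("ROOT")
database_path = project_dir(root, 1) / "database.sqlite"
metadata_path = project_dir(root, 1) / "datapackage.json"
parquet_path = data_dir(root, 1, 2) / "data.parquet"
raw_path = raw_file_path(root, 1, 2, "csv")
```

Building every path from the same two helpers keeps the `projects` and
`data` folder names defined in one place, so a change to the scheme
would only need to be made once.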