Merge pull request #12 from mbsantiago/feat/terms

Feat/terms: Introduce Term data model for standardized tags and features

mbsantiago committed Aug 29, 2024
2 parents 6e49d19 + b00261a commit df5aee7

Showing 64 changed files with 4,203 additions and 1,200 deletions.
51 changes: 38 additions & 13 deletions .github/workflows/test.yml
@@ -6,26 +6,51 @@ on:
     branches: ["main"]
 jobs:
   test:
-    runs-on: ubuntu-latest
+    env:
+      UV_CACHE_DIR: /tmp/.uv-cache
     strategy:
       fail-fast: false
       matrix:
-        python-version: ["3.8", "3.9", "3.10", "3.11", "3.12"]
+        python-version:
+          - "3.9"
+          - "3.10"
+          - "3.11"
+          - "3.12"
+        os:
+          - ubuntu-latest
+          - windows-latest
+          - macos-latest
+    runs-on: ${{ matrix.os }}
     steps:
-      - uses: actions/checkout@v3
-      - name: Set up Python ${{ matrix.python-version }}
-        uses: actions/setup-python@v3
-        with:
-          python-version: ${{ matrix.python-version }}
+      - uses: actions/checkout@v4
       - name: Install dependencies
+        if: ${{ matrix.os == 'ubuntu-latest' }}
         run: |
           sudo apt-get update && sudo apt-get install libsndfile1
-          python -m pip install --upgrade pip
-          python -m pip install pytest pytest-xdist hypothesis ruff pyright html5lib
-          python -m pip install ".[all]"
+      - name: Set up uv
+        if: ${{ matrix.os == 'ubuntu-latest' || matrix.os == 'macos-latest' }}
+        run: curl -LsSf https://astral.sh/uv/install.sh | sh
+      - name: Set up uv
+        if: ${{ matrix.os == 'windows-latest' }}
+        run: irm https://astral.sh/uv/install.ps1 | iex
+        shell: powershell
+      - name: Set up Python ${{ matrix.python-version }}
+        run: uv python install ${{ matrix.python-version }}
+      - name: Restore uv cache
+        uses: actions/cache@v4
+        with:
+          path: /tmp/.uv-cache
+          key: uv-${{ runner.os }}-${{ hashFiles('uv.lock') }}
+          restore-keys: |
+            uv-${{ runner.os }}-${{ hashFiles('uv.lock') }}
+            uv-${{ runner.os }}
+      - name: Install the project
+        run: uv sync --all-extras --dev
       - name: Make sure types are consistent
-        run: pyright src
+        run: uv run pyright src
       - name: Lint with ruff
-        run: ruff check src
+        run: uv run ruff check src
       - name: Test with pytest
-        run: pytest tests -n auto
+        run: uv run pytest tests -n auto
+      - name: Minimize uv cache
+        run: uv cache prune --ci
1 change: 1 addition & 0 deletions .python-version
@@ -0,0 +1 @@
+3.9
3 changes: 1 addition & 2 deletions Makefile
@@ -111,5 +111,4 @@ docs: ## Build the documentation.
 .PHONY: docs-serve
 docs-serve: ## Build the documentation and watch for changes.
 	@echo "building documentation ..."
-	URL="http://localhost:8000/soundevent/"; xdg-open $$URL || sensible-browser $$URL || x-www-browser $$URL || gnome-open $$URL
-	@$(ENV_PREFIX)mkdocs serve
+	@$(ENV_PREFIX)mkdocs serve --open
2 changes: 1 addition & 1 deletion README.md
@@ -4,7 +4,7 @@
[![PyPI version](https://badge.fury.io/py/soundevent.svg)](https://badge.fury.io/py/soundevent)
![tests](https://github.com/mbsantiago/soundevent/actions/workflows/test.yml/badge.svg)
[![docs](https://github.com/mbsantiago/soundevent/actions/workflows/docs.yml/badge.svg)](https://mbsantiago.github.io/soundevent/)
-![Python 3.8 +](https://img.shields.io/badge/python->=_3.8-blue.svg)
+![Python 3.9 +](https://img.shields.io/badge/python->=_3.9-blue.svg)
![Static Badge](https://img.shields.io/badge/formatting-black-black)
[![codecov](https://codecov.io/gh/mbsantiago/soundevent/branch/main/graph/badge.svg?token=42kVE87avA)](https://codecov.io/gh/mbsantiago/soundevent)

94 changes: 39 additions & 55 deletions docs/data_schemas/descriptors.md
@@ -1,22 +1,16 @@
 # Data Description

-Let's delve into **users**, **tags**, **features**, and **notes** – the tools
-that add depth to our bioacoustic research. Categorical tags, numerical
-features, and freeform notes bring an extra layer of understanding to our
-research objects, while user information provides adequate attribution
-to the contribution of all involved.
+Let's explore **users**, **terms**, **tags**, **features**, and **notes** – essential tools for enriching bioacoustic research.
+Controlled vocabularies (terms), categorical tags, numerical features, and free-form notes provide deeper context and insights into your research objects.
+User information ensures proper attribution for everyone involved.

## Users

-Collaboration is at the heart of most bioacoustic analyses, involving data
-collectors, annotators, reviewers, administrators, developers, and researchers.
-To ensure proper attribution of work, soundevent introduces a
-[**Users**][soundevent.data.User] data schema, holding minimal information about
-each individual involved. The **User** object can optionally include a _name_,
-_email_, _username_ (a commonly known alias), and _institution_. Recognizing the
-sensitivity of this information, it's important to ensure that individuals are
-comfortable sharing these details. If privacy concerns persist, User objects can
-be omitted altogether.
+Bioacoustic analysis often involves collaboration between data collectors, annotators, reviewers, administrators, developers, and researchers.
+To acknowledge contributions, soundevent introduces a [**Users**][soundevent.data.User] data schema, storing basic information about each individual.
+The **User** object can optionally include _name_, _email_, _username_, and _institution_.
+It's crucial to respect privacy and ensure individuals are comfortable sharing this information.
+If concerns remain, User objects can be omitted entirely.

```mermaid
erDiagram
@@ -29,31 +23,35 @@ erDiagram
    }
```
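
To make this concrete, a **User** might be created like so (a minimal sketch; the optional fields mirror those listed above, and the values are invented):

```python
from soundevent import data

# All fields are optional: include only what the person
# is comfortable sharing.
user = data.User(
    name="Ada Lovelace",
    email="ada@example.com",
    username="ada",
    institution="University of Example",
)
```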

+## Terms
+
+[**Terms**][soundevent.data.Term] ensure everyone's on the same page.
+Inconsistent naming like "species" vs. "Species" wastes time.
+**Terms** provide a controlled vocabulary for common properties used in annotations and descriptions.
+
+We've selected terms from established vocabularies like [Darwin Core](https://dwc.tdwg.org/list/) and [Audiovisual Core](https://ac.tdwg.org/termlist/), aligning your work with best practices.
+See the [terms][soundevent.terms] module for the terms defined in soundevent.
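
For illustration, a custom **Term** could be defined as follows (a hedged sketch: the `Term` field names shown here are assumptions, so consult the [terms][soundevent.terms] reference for the actual schema):

```python
from soundevent import data

# A term borrowed from Darwin Core; the fields used here
# (name, label, definition) are illustrative assumptions.
scientific_name = data.Term(
    name="dwc:scientificName",
    label="Scientific Name",
    definition="The full scientific name of the organism.",
)
```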

## Tags

-[**Tags**][soundevent.data.Tag] within the `soundevent` package are like
-categorical variables that add specific meaning to the objects they adorn—be it
-recordings, clips, or sound events. Serving as informative labels, **Tags**
-offer a way to organize and contextualize data.
+[**Tags**][soundevent.data.Tag] are informative labels within the `soundevent` package.
+They add meaning to recordings, clips, or sound events, helping organize and contextualize data.

-A **Tag** comprises two essential components: a _key_ and a _value_, both in the
-form of simple text. While in many computational contexts, a **Tag** might be
-considered just a text, we find it exceptionally beneficial to introduce a
-_"namespace"_—the _key_—for each tag. This _key_ refines the meaning of the
-**Tag** and establishes the context in which it is employed.
+A **Tag** has two parts: a _term_ and a _value_.
+The term acts as a namespace, refining the Tag's meaning and context.

```mermaid
erDiagram
-    Tag{
-        string key
+    Tag {
        string value
    }
+    Term
+    Tag ||--o| Term: term
```

-The beauty lies in the flexibility offered – there are no restrictions on what
-can be employed as a _key_ or _value_. This flexibility accommodates
-project-specific requirements, allowing researchers to tailor **Tags** to their
-unique needs and objectives.
+Attaching a term to a Tag is optional.
+We strongly recommend it, but it's not mandatory.
+This adaptability allows you to tailor Tags to your specific project needs.
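
For example, a Tag could be built like this (a sketch assuming `Tag` takes a `term` and a `value` as in the diagram above, with the term optional):

```python
from soundevent import data

species = data.Term(
    name="dwc:scientificName",
    label="Scientific Name",
    definition="The full scientific name of the organism.",
)

# Recommended: pair the value with a term that pins down its meaning.
tag = data.Tag(term=species, value="Myotis myotis")
```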

??? Note "What is a namespace?"

@@ -72,42 +70,28 @@

## Features

-[**Features**][soundevent.data.Feature] serve as numerical descriptions,
-providing valuable information to the objects they enhance. They can encompass a
-range of nature – from measurements of environmental sensors to attributes of
-individuals creating a sound, even extending to abstract features extracted by
-general-purpose deep learning models. When multiple **Features** accompany sound
-events, clips, or recordings, they become tools for understanding similarities
-and differences, allowing comparison and visualization in feature space.
-**Features** play a pivotal role in outlier identification, gaining insights
-into characteristic distribution, and enabling statistical analyses.
-
-A **Feature** comprises a textual _name_ and a floating _value_. In
-`soundevent`, lists of **Features** can be attached to various objects without
-restrictions on the name or value. This flexibility allows for tailoring
-features to specific project needs
+[**Features**][soundevent.data.Feature] are numerical descriptions.
+They can include measurements from environmental sensors, attributes of sound-producing individuals, or even abstract features extracted by deep learning models.
+**Features** enable comparison, visualization, outlier identification, understanding characteristic distributions, and statistical analysis.
+
+A **Feature** consists of a **Term** and a numerical _value_.

```mermaid
erDiagram
    Feature {
-        string name
        float value
    }
+    Term
+    Feature ||--o| Term: term
```
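
Concretely, a Feature might be created as follows (same caveats as above; the `soundevent:peakFrequency` term is purely hypothetical):

```python
from soundevent import data

# A hypothetical term describing an acoustic measurement.
peak_frequency = data.Term(
    name="soundevent:peakFrequency",  # illustrative, not an official term
    label="Peak Frequency",
    definition="Frequency (Hz) at which the sound event has maximum energy.",
)

feature = data.Feature(term=peak_frequency, value=45000.0)
```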

## Notes

-[**Notes**][soundevent.data.Note] serve as textual companions, allowing
-communication among researchers and providing nuanced context to the objects
-they accompany. Whether conveying vital information, engaging in discussions
-about specific aspects of the attached objects, or flagging potential data
-issues, **Notes** play an indispensable role in promoting collaboration and
-enriching the overall understanding of audio data.
-
-These textual _messages_, varying in length, also capture essential details such
-as the note's _creator_ and the _time of creation_, ensuring proper recognition.
-Beyond their informative role, **Notes** can be marked as _issues_ when
-highlighting significant points requiring external review.
+[**Notes**][soundevent.data.Note] are free-form textual additions, facilitating communication and providing context.
+They can convey information, enable discussions, or flag data issues.
+
+**Notes** can have any length and include the note's _creator_ and _time of creation_.
+Notes can also be marked as issues to highlight points needing review.
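
A note might then look like this (a minimal sketch; `message`, `created_by`, and `is_issue` are assumed field names based on the attributes described above):

```python
from soundevent import data

reviewer = data.User(name="Ada Lovelace", username="ada")

note = data.Note(
    message="High background noise; the species label needs review.",
    created_by=reviewer,  # credit the note's creator
    is_issue=True,        # flag the note for external review
)
```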

67 changes: 24 additions & 43 deletions docs/data_schemas/index.md
@@ -1,79 +1,60 @@
 # Data Schemas

-Welcome to the data schemas tour with the `soundevent` package! In this
-overview, we'll break down the various data schemas provided by the package into
-the following sections:
+Welcome to the data schemas tour with the `soundevent` package! In this overview, we'll break down the various data schemas provided by the package into the following sections:

## Describing the Data

-`soundevent` equips you with tools to attach crucial information to diverse
-objects encountered in bioacoustic analysis. These include:
+`soundevent` provides tools to attach essential information to various objects in bioacoustic analysis:

 - [Users](descriptors.md#users): Keeping reference of everyone's contribution.
+- [Terms](descriptors.md#terms): Standardized vocabularies ensure consistent language.
 - [Tags](descriptors.md#tags): Attaching semantic context to objects.
-- [Features](descriptors.md#features): Numerical descriptors capturing
-  continuously varying attributes.
+- [Features](descriptors.md#features): Numerical descriptors capturing continuously varying attributes.
 - [Notes](descriptors.md#notes): User-written free-text annotations.

## Audio Content

-Delving into the core of acoustic analysis, we have schemas for:
+At the core of acoustic analysis, we have schemas for:

 - [Recordings](audio_content.md#recordings): Complete audio files.
-- [Dataset](audio_content.md#datasets): A collection of recordings from a common
-  source.
+- [Dataset](audio_content.md#datasets): A collection of recordings from a common source.

## Acoustic Objects

-Identifying distinctive sound elements within audio content, we have:
+Identifying distinctive sound elements within audio content:

-- [Geometric Objects](acoustic_objects.md#geometries): Defining Regions of
-  Interest (RoI) in the temporal-frequency plane.
-- [Sound Events](acoustic_objects.md#sound_events): Individual sonic occurrences.
+- [Geometric Objects](acoustic_objects.md#geometries): Defining Regions of Interest (RoI) in the temporal-frequency plane.
+- [Sound Events](acoustic_objects.md#sound_events): Individual sonic occurrences.
 - [Sequences](acoustic_objects.md#sequences): Patterns of connected sound events.
 - [Clips](acoustic_objects.md#clips): Fragments extracted from recordings.

## Annotation

`soundevent` places emphasis on human annotation processes, covering:

-- [Sound Event Annotations](annotation.md#sound_event_annotation): Expert-created
-  markers for relevant sound events.
-- [Sequence Annotations](annotation.md#sequence_annotation): User provided
-  annotations of sequences of sound events.
-- [Clip Annotations](annotation.md#clip_annotations): Annotations and notes at the
-  clip level.
-- [Annotation Task](annotation.md#annotation_task): Descriptions of tasks and the
-  status of annotation.
-- [Annotation Project](annotation.md#annotation_project): The collective
-  description of tasks and annotations.
+- [Sound Event Annotations](annotation.md#sound_event_annotation): Expert-created markers for relevant sound events.
+- [Sequence Annotations](annotation.md#sequence_annotation): User-provided annotations of sequences of sound events.
+- [Clip Annotations](annotation.md#clip_annotations): Annotations and notes at the clip level.
+- [Annotation Task](annotation.md#annotation_task): Descriptions of tasks and the status of annotation.
+- [Annotation Project](annotation.md#annotation_project): The collective description of tasks and annotations.

## Prediction

Automated processing methods also play a role, generating:

-- [Sound Event Predictions](prediction.md#sound_event_predictions): Predictions
-  made during automated processing.
-- [Sequence Predictions](prediction.md#sequence_predictions): Predictions of
-  sequences of sound events.
-- [Clip Predictions](prediction.md#clip_predictions): Collections of predictions
-  and additional information at the clip level.
-- [Model Runs](prediction.md#model_runs): Sets of clip predictions generated in a
-  single run by a specific model.
+- [Sound Event Predictions](prediction.md#sound_event_predictions): Predictions made during automated processing.
+- [Sequence Predictions](prediction.md#sequence_predictions): Predictions of sequences of sound events.
+- [Clip Predictions](prediction.md#clip_predictions): Collections of predictions and additional information at the clip level.
+- [Model Runs](prediction.md#model_runs): Sets of clip predictions generated in a single run by a specific model.

## Evaluation

-Assessing the accuracy of predictions is crucial, and `soundevent` provides
-schemas for:
-
-- [Matches](evaluation.md#matches): Predicted sound events overlapping with ground
-  truth.
-- [Clip Evaluation](evaluation.md#clip_evaluation): Information about matches and
-  performance metrics at the clip level.
-- [Evaluation](evaluation.md#evaluation_1): Comprehensive details on model
-  performance across the entire evaluation set.
-- [Evaluation Set](evaluation.md#evaluation_set): Human annotations serving as
-  ground truth.
+Assessing the accuracy of predictions is crucial, and `soundevent` provides schemas for:
+
+- [Matches](evaluation.md#matches): Predicted sound events overlapping with ground truth.
+- [Clip Evaluation](evaluation.md#clip_evaluation): Information about matches and performance metrics at the clip level.
+- [Evaluation](evaluation.md#evaluation_1): Comprehensive details on model performance across the entire evaluation set.
+- [Evaluation Set](evaluation.md#evaluation_set): Human annotations serving as ground truth.

Want to know more? Dive in for a closer look at each of these schemas.

2 changes: 2 additions & 0 deletions docs/javascripts/jquery-3.3.1.min.js

Large diffs are not rendered by default.
