Merge pull request #12 from mbsantiago/feat/terms

Feat/terms: Introduce Term data model for standardized tags and features

mbsantiago committed Aug 29, 2024
2 parents 6e49d19 + b00261a commit df5aee7

Showing 64 changed files with 4,203 additions and 1,200 deletions.
51 changes: 38 additions & 13 deletions .github/workflows/test.yml
@@ -6,26 +6,51 @@ on:
     branches: ["main"]
 jobs:
   test:
-    runs-on: ubuntu-latest
+    env:
+      UV_CACHE_DIR: /tmp/.uv-cache
     strategy:
       fail-fast: false
       matrix:
-        python-version: ["3.8", "3.9", "3.10", "3.11", "3.12"]
+        python-version:
+          - "3.9"
+          - "3.10"
+          - "3.11"
+          - "3.12"
+        os:
+          - ubuntu-latest
+          - windows-latest
+          - macos-latest
+    runs-on: ${{ matrix.os }}
     steps:
-      - uses: actions/checkout@v3
-      - name: Set up Python ${{ matrix.python-version }}
-        uses: actions/setup-python@v3
-        with:
-          python-version: ${{ matrix.python-version }}
+      - uses: actions/checkout@v4
       - name: Install dependencies
+        if: ${{ matrix.os == 'ubuntu-latest' }}
         run: |
           sudo apt-get update && sudo apt-get install libsndfile1
-          python -m pip install --upgrade pip
-          python -m pip install pytest pytest-xdist hypothesis ruff pyright html5lib
-          python -m pip install ".[all]"
+      - name: Set up uv
+        if: ${{ matrix.os == 'ubuntu-latest' || matrix.os == 'macos-latest' }}
+        run: curl -LsSf https://astral.sh/uv/install.sh | sh
+      - name: Set up uv
+        if: ${{ matrix.os == 'windows-latest' }}
+        run: irm https://astral.sh/uv/install.ps1 | iex
+        shell: powershell
+      - name: Set up Python ${{ matrix.python-version }}
+        run: uv python install ${{ matrix.python-version }}
+      - name: Restore uv cache
+        uses: actions/cache@v4
+        with:
+          path: /tmp/.uv-cache
+          key: uv-${{ runner.os }}-${{ hashFiles('uv.lock') }}
+          restore-keys: |
+            uv-${{ runner.os }}-${{ hashFiles('uv.lock') }}
+            uv-${{ runner.os }}
+      - name: Install the project
+        run: uv sync --all-extras --dev
       - name: Make sure types are consistent
-        run: pyright src
+        run: uv run pyright src
       - name: Lint with ruff
-        run: ruff check src
+        run: uv run ruff check src
       - name: Test with pytest
-        run: pytest tests -n auto
+        run: uv run pytest tests -n auto
+      - name: Minimize uv cache
+        run: uv cache prune --ci
1 change: 1 addition & 0 deletions .python-version
@@ -0,0 +1 @@
+3.9
3 changes: 1 addition & 2 deletions Makefile
@@ -111,5 +111,4 @@ docs: ## Build the documentation.
 .PHONY: docs-serve
 docs-serve: ## Build the documentation and watch for changes.
 	@echo "building documentation ..."
-	URL="http://localhost:8000/soundevent/"; xdg-open $$URL || sensible-browser $$URL || x-www-browser $$URL || gnome-open $$URL
-	@$(ENV_PREFIX)mkdocs serve
+	@$(ENV_PREFIX)mkdocs serve --open
2 changes: 1 addition & 1 deletion README.md
@@ -4,7 +4,7 @@
[![PyPI version](https://badge.fury.io/py/soundevent.svg)](https://badge.fury.io/py/soundevent)
![tests](https://github.com/mbsantiago/soundevent/actions/workflows/test.yml/badge.svg)
[![docs](https://github.com/mbsantiago/soundevent/actions/workflows/docs.yml/badge.svg)](https://mbsantiago.github.io/soundevent/)
-![Python 3.8 +](https://img.shields.io/badge/python->=_3.8-blue.svg)
+![Python 3.9 +](https://img.shields.io/badge/python->=_3.9-blue.svg)
![Static Badge](https://img.shields.io/badge/formatting-black-black)
[![codecov](https://codecov.io/gh/mbsantiago/soundevent/branch/main/graph/badge.svg?token=42kVE87avA)](https://codecov.io/gh/mbsantiago/soundevent)

94 changes: 39 additions & 55 deletions docs/data_schemas/descriptors.md
@@ -1,22 +1,16 @@
 # Data Description

-Let's delve into **users**, **tags**, **features**, and **notes** – the tools
-that add depth to our bioacoustic research. Categorical tags, numerical
-features, and freeform notes bring an extra layer of understanding to our
-research objects, while user information provides adequate attribution
-to the contribution of all involved.
+Let's explore **users**, **terms**, **tags**, **features**, and **notes** – essential tools for enriching bioacoustic research.
+Controlled vocabularies (terms), categorical tags, numerical features, and free-form notes provide deeper context and insights into your research objects.
+User information ensures proper attribution for everyone involved.

## Users

-Collaboration is at the heart of most bioacoustic analyses, involving data
-collectors, annotators, reviewers, administrators, developers, and researchers.
-To ensure proper attribution of work, soundevent introduces a
-[**Users**][soundevent.data.User] data schema, holding minimal information about
-each individual involved. The **User** object can optionally include a _name_,
-_email_, _username_ (a commonly known alias), and _institution_. Recognizing the
-sensitivity of this information, it's important to ensure that individuals are
-comfortable sharing these details. If privacy concerns persist, User objects can
-be omitted altogether.
+Bioacoustic analysis often involves collaboration between data collectors, annotators, reviewers, administrators, developers, and researchers.
+To acknowledge contributions, soundevent introduces a [**Users**][soundevent.data.User] data schema, storing basic information about each individual.
+The **User** object can optionally include _name_, _email_, _username_, and _institution_.
+It's crucial to respect privacy and ensure individuals are comfortable sharing this information.
+If concerns remain, User objects can be omitted entirely.

```mermaid
erDiagram
@@ -29,31 +23,35 @@ erDiagram
    }
```
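
To make this concrete, a **User** might be created like so (a minimal sketch; the optional fields mirror those listed above, and the values are invented):

```python
from soundevent import data

# All fields are optional: include only what the person
# is comfortable sharing.
user = data.User(
    name="Ada Lovelace",
    email="ada@example.com",
    username="ada",
    institution="University of Example",
)
```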

+## Terms
+
+[**Terms**][soundevent.data.Term] ensure everyone's on the same page.
+Inconsistent naming like "species" vs. "Species" wastes time.
+**Terms** provide a controlled vocabulary for common properties used in annotations and descriptions.
+
+We've selected terms from established vocabularies like [Darwin Core](https://dwc.tdwg.org/list/) and [Audiovisual Core](https://ac.tdwg.org/termlist/), aligning your work with best practices.
+See the [terms][soundevent.terms] module for the terms defined in soundevent.
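
For illustration, a custom **Term** could be defined as follows (a hedged sketch: the `Term` field names shown here are assumptions, so consult the [terms][soundevent.terms] reference for the actual schema):

```python
from soundevent import data

# A term borrowed from Darwin Core; the fields used here
# (name, label, definition) are illustrative assumptions.
scientific_name = data.Term(
    name="dwc:scientificName",
    label="Scientific Name",
    definition="The full scientific name of the organism.",
)
```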

## Tags

-[**Tags**][soundevent.data.Tag] within the `soundevent` package are like
-categorical variables that add specific meaning to the objects they adorn—be it
-recordings, clips, or sound events. Serving as informative labels, **Tags**
-offer a way to organize and contextualize data.
+[**Tags**][soundevent.data.Tag] are informative labels within the `soundevent` package.
+They add meaning to recordings, clips, or sound events, helping organize and contextualize data.

-A **Tag** comprises two essential components: a _key_ and a _value_, both in the
-form of simple text. While in many computational contexts, a **Tag** might be
-considered just a text, we find it exceptionally beneficial to introduce a
-_"namespace"_—the _key_—for each tag. This _key_ refines the meaning of the
-**Tag** and establishes the context in which it is employed.
+A **Tag** has two parts: a _term_ and a _value_.
+The term acts as a namespace, refining the Tag's meaning and context.

```mermaid
erDiagram
-    Tag{
-        string key
+    Tag {
        string value
    }
+    Term
+    Tag ||--o| Term: term
```

-The beauty lies in the flexibility offered – there are no restrictions on what
-can be employed as a _key_ or _value_. This flexibility accommodates
-project-specific requirements, allowing researchers to tailor **Tags** to their
-unique needs and objectives.
+Attaching a term to a Tag is optional.
+We strongly recommend it, but it's not mandatory.
+This adaptability allows you to tailor Tags to your specific project needs.
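
For example, a Tag could be built like this (a sketch assuming `Tag` takes a `term` and a `value` as in the diagram above, with the term optional):

```python
from soundevent import data

species = data.Term(
    name="dwc:scientificName",
    label="Scientific Name",
    definition="The full scientific name of the organism.",
)

# Recommended: pair the value with a term that pins down its meaning.
tag = data.Tag(term=species, value="Myotis myotis")
```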

??? Note "What is a namespace?"

@@ -72,42 +70,28 @@

## Features

-[**Features**][soundevent.data.Feature] serve as numerical descriptions,
-providing valuable information to the objects they enhance. They can encompass a
-range of nature – from measurements of environmental sensors to attributes of
-individuals creating a sound, even extending to abstract features extracted by
-general-purpose deep learning models. When multiple **Features** accompany sound
-events, clips, or recordings, they become tools for understanding similarities
-and differences, allowing comparison and visualization in feature space.
-**Features** play a pivotal role in outlier identification, gaining insights
-into characteristic distribution, and enabling statistical analyses.
-
-A **Feature** comprises a textual _name_ and a floating _value_. In
-`soundevent`, lists of **Features** can be attached to various objects without
-restrictions on the name or value. This flexibility allows for tailoring
-features to specific project needs
+[**Features**][soundevent.data.Feature] are numerical descriptions.
+They can include measurements from environmental sensors, attributes of sound-producing individuals, or even abstract features extracted by deep learning models.
+**Features** enable comparison, visualization, outlier identification, understanding characteristic distributions, and statistical analysis.
+
+A **Feature** consists of a **Term** and a numerical _value_.

```mermaid
erDiagram
    Feature {
-        string name
        float value
    }
+    Term
+    Feature ||--o| Term: term
```
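
Concretely, a Feature might be created as follows (same caveats as above; the `soundevent:peakFrequency` term is purely hypothetical):

```python
from soundevent import data

# A hypothetical term describing an acoustic measurement.
peak_frequency = data.Term(
    name="soundevent:peakFrequency",  # illustrative, not an official term
    label="Peak Frequency",
    definition="Frequency (Hz) at which the sound event has maximum energy.",
)

feature = data.Feature(term=peak_frequency, value=45000.0)
```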

## Notes

-[**Notes**][soundevent.data.Note] serve as textual companions, allowing
-communication among researchers and providing nuanced context to the objects
-they accompany. Whether conveying vital information, engaging in discussions
-about specific aspects of the attached objects, or flagging potential data
-issues, **Notes** play an indispensable role in promoting collaboration and
-enriching the overall understanding of audio data.
-
-These textual _messages_, varying in length, also capture essential details such
-as the note's _creator_ and the _time of creation_, ensuring proper recognition.
-Beyond their informative role, **Notes** can be marked as _issues_ when
-highlighting significant points requiring external review.
+[**Notes**][soundevent.data.Note] are free-form textual additions, facilitating communication and providing context.
+They can convey information, enable discussions, or flag data issues.
+
+**Notes** can have any length and include the note's _creator_ and _time of creation_.
+Notes can also be marked as issues to highlight points needing review.
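
A note might then look like this (a minimal sketch; `message`, `created_by`, and `is_issue` are assumed field names based on the attributes described above):

```python
from soundevent import data

reviewer = data.User(name="Ada Lovelace", username="ada")

note = data.Note(
    message="High background noise; the species label needs review.",
    created_by=reviewer,  # credit the note's creator
    is_issue=True,        # flag the note for external review
)
```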

67 changes: 24 additions & 43 deletions docs/data_schemas/index.md
@@ -1,79 +1,60 @@
 # Data Schemas

-Welcome to the data schemas tour with the `soundevent` package! In this
-overview, we'll break down the various data schemas provided by the package into
-the following sections:
+Welcome to the data schemas tour with the `soundevent` package! In this overview, we'll break down the various data schemas provided by the package into the following sections:

## Describing the Data

-`soundevent` equips you with tools to attach crucial information to diverse
-objects encountered in bioacoustic analysis. These include:
+`soundevent` provides tools to attach essential information to various objects in bioacoustic analysis:

 - [Users](descriptors.md#users): Keeping reference of everyone's contribution.
+- [Terms](descriptors.md#terms): Standardized vocabularies ensure consistent language.
 - [Tags](descriptors.md#tags): Attaching semantic context to objects.
-- [Features](descriptors.md#features): Numerical descriptors capturing
-  continuously varying attributes.
+- [Features](descriptors.md#features): Numerical descriptors capturing continuously varying attributes.
 - [Notes](descriptors.md#notes): User-written free-text annotations.

## Audio Content

-Delving into the core of acoustic analysis, we have schemas for:
+At the core of acoustic analysis, we have schemas for:

 - [Recordings](audio_content.md#recordings): Complete audio files.
-- [Dataset](audio_content.md#datasets): A collection of recordings from a common
-  source.
+- [Dataset](audio_content.md#datasets): A collection of recordings from a common source.

## Acoustic Objects

-Identifying distinctive sound elements within audio content, we have:
+Identifying distinctive sound elements within audio content:

-- [Geometric Objects](acoustic_objects.md#geometries): Defining Regions of
-  Interest (RoI) in the temporal-frequency plane.
-- [Sound Events](acoustic_objects.md#sound_events): Individual sonic occurrences.
+- [Geometric Objects](acoustic_objects.md#geometries): Defining Regions of Interest (RoI) in the temporal-frequency plane.
+- [Sound Events](acoustic_objects.md#sound_events): Individual sonic occurrences.
 - [Sequences](acoustic_objects.md#sequences): Patterns of connected sound events.
 - [Clips](acoustic_objects.md#clips): Fragments extracted from recordings.

## Annotation

`soundevent` places emphasis on human annotation processes, covering:

-- [Sound Event Annotations](annotation.md#sound_event_annotation): Expert-created
-  markers for relevant sound events.
-- [Sequence Annotations](annotation.md#sequence_annotation): User provided
-  annotations of sequences of sound events.
-- [Clip Annotations](annotation.md#clip_annotations): Annotations and notes at the
-  clip level.
-- [Annotation Task](annotation.md#annotation_task): Descriptions of tasks and the
-  status of annotation.
-- [Annotation Project](annotation.md#annotation_project): The collective
-  description of tasks and annotations.
+- [Sound Event Annotations](annotation.md#sound_event_annotation): Expert-created markers for relevant sound events.
+- [Sequence Annotations](annotation.md#sequence_annotation): User-provided annotations of sequences of sound events.
+- [Clip Annotations](annotation.md#clip_annotations): Annotations and notes at the clip level.
+- [Annotation Task](annotation.md#annotation_task): Descriptions of tasks and the status of annotation.
+- [Annotation Project](annotation.md#annotation_project): The collective description of tasks and annotations.

## Prediction

Automated processing methods also play a role, generating:

-- [Sound Event Predictions](prediction.md#sound_event_predictions): Predictions
-  made during automated processing.
-- [Sequence Predictions](prediction.md#sequence_predictions): Predictions of
-  sequences of sound events.
-- [Clip Predictions](prediction.md#clip_predictions): Collections of predictions
-  and additional information at the clip level.
-- [Model Runs](prediction.md#model_runs): Sets of clip predictions generated in a
-  single run by a specific model.
+- [Sound Event Predictions](prediction.md#sound_event_predictions): Predictions made during automated processing.
+- [Sequence Predictions](prediction.md#sequence_predictions): Predictions of sequences of sound events.
+- [Clip Predictions](prediction.md#clip_predictions): Collections of predictions and additional information at the clip level.
+- [Model Runs](prediction.md#model_runs): Sets of clip predictions generated in a single run by a specific model.

## Evaluation

-Assessing the accuracy of predictions is crucial, and `soundevent` provides
-schemas for:
-
-- [Matches](evaluation.md#matches): Predicted sound events overlapping with ground
-  truth.
-- [Clip Evaluation](evaluation.md#clip_evaluation): Information about matches and
-  performance metrics at the clip level.
-- [Evaluation](evaluation.md#evaluation_1): Comprehensive details on model
-  performance across the entire evaluation set.
-- [Evaluation Set](evaluation.md#evaluation_set): Human annotations serving as
-  ground truth.
+Assessing the accuracy of predictions is crucial, and `soundevent` provides schemas for:
+
+- [Matches](evaluation.md#matches): Predicted sound events overlapping with ground truth.
+- [Clip Evaluation](evaluation.md#clip_evaluation): Information about matches and performance metrics at the clip level.
+- [Evaluation](evaluation.md#evaluation_1): Comprehensive details on model performance across the entire evaluation set.
+- [Evaluation Set](evaluation.md#evaluation_set): Human annotations serving as ground truth.

Want to know more? Dive in for a closer look at each of these schemas.

2 changes: 2 additions & 0 deletions docs/javascripts/jquery-3.3.1.min.js

Large diffs are not rendered by default.
