Merged
33 changes: 20 additions & 13 deletions models/index.qmd
@@ -21,26 +21,33 @@ listing:
number-sections: false
---

see [description of model](https://isamplesorg.github.io/metadata/) at https://isamplesorg.github.io/metadata/
See the [iSamples Metadata Model](https://isamplesorg.github.io/metadata/) for the full schema documentation.

## Taxonomies
::: {.callout-tip}
### Vocabulary Source of Truth
The authoritative versions of iSamples vocabularies are maintained as RDF/SKOS files in the [iSamples GitHub repositories](https://github.com/isamplesorg/). Vocabulary terms are also registered with the [Australian Research Data Commons (ARDC) Research Vocabularies](https://vocabs.ardc.edu.au/).
:::

One of the foundations for interoperability of iSamples material sample descriptions is the definition of vocabularies for the categorization of sample type. There are three core vocabularies for different aspects of sample type: material sample type, material type, and sampled feature type. Each vocabulary is maintained as an RDF file using the SKOS vocabulary, with hierarchical relationships using [`SKOS:broader`](https://www.w3.org/2009/08/skos-reference/skos.html#broader). In order to be domain agnostic, these core taxonomies cover a small set of top level terms. The taxonomies may be extended as necessary to support more specialized domains by relating additional terms using `SKOS:broader` and `SKOS:narrower`.
## Taxonomies {.unnumbered}

The iSamples core taxonomies are controlled vocabularies with terms related by [`SKOS:broader`](https://www.w3.org/2009/08/skos-reference/skos.html#broader) and [`SKOS:narrower`](https://www.w3.org/2009/08/skos-reference/skos.html#narrower). In order to be domain agnostic, the core taxonomies cover a small set of top level terms. The taxonomies may be extended as necessary to support more specialized domains by relating additional terms using `SKOS:broader` and `SKOS:narrower`.
One of the foundations for interoperability of iSamples material sample descriptions is the definition of vocabularies for the categorization of sample type. There are three core vocabularies for different aspects of sample type: material sample type, material type, and sampled feature type. Each vocabulary is maintained as an RDF file using the SKOS vocabulary, with hierarchical relationships using [`SKOS:broader`](https://www.w3.org/2009/08/skos-reference/skos.html#broader). In order to be domain agnostic, these core taxonomies cover a small set of top level terms. The taxonomies may be extended as necessary to support more specialized domains by relating additional terms using `SKOS:broader` and `SKOS:narrower`.
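
As a rough illustration of how a domain extension can hang off a core term through `SKOS:broader`, the sketch below models the hierarchy as a plain Python mapping. All term identifiers (`ext:basalt`, `core:rock`, and so on) are hypothetical placeholders, not actual iSamples URIs, and the sketch assumes each term has a single broader term, which SKOS itself does not require.

```python
# Hypothetical terms for illustration only; these are not real iSamples URIs.
# Each key has exactly one broader term (a simplification: SKOS allows several).
BROADER = {
    "ext:basalt": "ext:igneous_rock",   # domain-specific extension term
    "ext:igneous_rock": "core:rock",    # extension attaches to a core term
    "core:rock": "core:material",       # core term under a top-level term
}


def broader_chain(term, broader=BROADER):
    """Follow broader links upward and return the ancestors in order."""
    chain = []
    seen = {term}
    while term in broader:
        term = broader[term]
        if term in seen:  # guard against accidental cycles
            break
        seen.add(term)
        chain.append(term)
    return chain


def narrower_index(broader=BROADER):
    """Invert the broader mapping to get the narrower terms of each parent."""
    inverse = {}
    for child, parent in broader.items():
        inverse.setdefault(parent, []).append(child)
    return inverse


if __name__ == "__main__":
    print(broader_chain("ext:basalt"))
    print(narrower_index()["core:rock"])
```

Because every extension term eventually resolves to a core term, a consumer that only understands the core vocabulary can still classify a record tagged with the more specialized term.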

The iSamples taxonomies are used to characterize three fundamental concepts pertaining to physical samples:

1. The "iSamples Materials vocabulary" is a taxonomy of terms used to categorize the composition of a physical sample, that is "What material is the sample composed of?"
2. The "Sampled Feature Type Vocabulary" is a taxonomy of terms used to indicate what the sample is representative of.
3. The "iSamples Specimen Type Vocabulary" is a taxonomy of broad categories that classify what type of spcimen the physical sample record represents.
1. The **Materials Vocabulary** categorizes the composition of a physical sample ("What material is the sample composed of?")
2. The **Sampled Feature Type Vocabulary** indicates what the sample is representative of
3. The **Specimen Type Vocabulary** classifies what type of specimen the physical sample record represents

Three taxonomies are currently defined :
::: {.callout-note collapse="true"}
## Core Vocabularies

[Material Sample (specimen) Type Vocabulary](generated/vocabularies/material_sample_object_type.html)

[Materials Vocabulary](generated/vocabularies/material_type.html)

[Sampled Feature (context) Type vocabulary](generated/vocabularies/sampled_feature_type.html)
- [Material Sample (specimen) Type Vocabulary](generated/vocabularies/material_sample_object_type.html)
- [Materials Vocabulary](generated/vocabularies/material_type.html)
- [Sampled Feature (context) Type vocabulary](generated/vocabularies/sampled_feature_type.html)
:::

## Related Pages {.unnumbered}

- [Architecture Overview](../design/index.qmd) — system principles and architecture
- [Requirements](../design/requirements.html) — 18 use cases and requirements
- [Metadata Model](https://isamplesorg.github.io/metadata/) — schema and data model documentation
54 changes: 38 additions & 16 deletions tutorials/index.qmd
@@ -1,30 +1,52 @@
---
title: "Tutorials"
subtitle: "Learn to explore 6.7 million physical samples from scientific collections worldwide using modern browser-based tools."
number-sections: false
---

Learn to explore **6.7 million physical samples** from scientific collections worldwide using modern browser-based tools.

## Start Here
## Start Here {.unnumbered}

| Tutorial | What You'll Learn |
|----------|-------------------|
| [**Interactive Explorer**](isamples_explorer.qmd) | Search and filter samples with faceted search, view on 3D globe |
| [**Deep-Dive Analysis**](zenodo_isamples_analysis.qmd) | Comprehensive DuckDB-WASM analysis with Observable JS |
| [**3D Globe Visualization**](parquet_cesium_isamples_wide.qmd) | Cesium-based visualization of all iSamples data |
| [**Technical: Narrow vs Wide**](narrow_vs_wide_performance.qmd) | Schema comparison and performance benchmarks |
| [**Interactive Explorer**](isamples_explorer.qmd) | Search and filter samples with faceted search, view results on a 3D globe |
| [**Deep-Dive Analysis**](zenodo_isamples_analysis.qmd) | Comprehensive DuckDB-WASM analysis with Observable JS — charts, maps, statistics |
| [**3D Globe Visualization**](parquet_cesium_isamples_wide.qmd) | Cesium-based progressive visualization with H3 spatial clustering |
| [**Technical: Narrow vs Wide**](narrow_vs_wide_performance.qmd) | Schema comparison and performance benchmarks for the PQG data formats |

## What's in the Data? {.unnumbered}

| Source | Samples | Focus |
|--------|---------|-------|
| **SESAR** | 4.6M | Earth science — rocks, minerals, sediments, soils |
| **OpenContext** | 1M | Archaeology — artifacts, excavation materials |
| **GEOME** | 605K | Biology — genomic and tissue specimens |
| **Smithsonian** | 322K | Natural history — museum collections |

## Data Sources
## Data Files {.unnumbered}

All tutorials use **geoparquet files** - no server required:
All data is hosted on [`data.isamples.org`](https://data.isamples.org) with HTTP range request support — DuckDB-WASM only downloads the bytes it needs.

- **iSamples Full Dataset**: ~280 MB wide format, 6.7M samples from SESAR, OpenContext, GEOME, Smithsonian
- **Available via**: Cloudflare R2 with HTTP range requests
| File | Size | Description |
|------|------|-------------|
| [Wide format](https://data.isamples.org/isamples_202601_wide.parquet) | 278 MB | One row per entity, all sources — primary file for tutorials |
| [Wide + H3](https://data.isamples.org/isamples_202601_wide_h3.parquet) | 292 MB | Wide format with H3 spatial indices for globe visualizations |
| [Facet summaries](https://data.isamples.org/isamples_202601_facet_summaries.parquet) | 2 KB | Pre-computed filter counts — loads instantly |
| [H3 clusters (res4)](https://data.isamples.org/isamples_202601_h3_summary_res4.parquet) | 0.6 MB | Zoomed-out globe view |
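
To give a feel for why range requests keep transfers small, here is a minimal sketch of the generic Parquet-over-HTTP pattern: read the 8-byte file tail, parse the footer length from it, then request only the footer metadata. It illustrates the file format, not DuckDB-WASM's actual request sequence; the helper names are hypothetical and no network call is made.

```python
import struct

PARQUET_MAGIC = b"PAR1"  # a Parquet file ends with this 4-byte magic
TAIL_SIZE = 8            # 4-byte little-endian footer length + 4-byte magic


def tail_range_header(n_bytes=TAIL_SIZE):
    """Range header asking the server for only the last n_bytes of the file."""
    return {"Range": f"bytes=-{n_bytes}"}


def footer_length(tail):
    """Parse the 8-byte tail and return the footer metadata length in bytes."""
    if len(tail) != TAIL_SIZE or tail[4:] != PARQUET_MAGIC:
        raise ValueError("not a Parquet file tail")
    (length,) = struct.unpack("<I", tail[:4])
    return length


def footer_range_header(file_size, length):
    """Range header covering just the footer metadata (byte range is inclusive)."""
    start = file_size - TAIL_SIZE - length
    end = file_size - TAIL_SIZE - 1
    return {"Range": f"bytes={start}-{end}"}


if __name__ == "__main__":
    # Simulated tail for a file whose footer metadata is 1234 bytes long.
    fake_tail = struct.pack("<I", 1234) + PARQUET_MAGIC
    size = footer_length(fake_tail)
    print(tail_range_header())
    print(footer_range_header(10_000, size))
```

With the footer in hand, a reader knows the byte offsets of each column chunk and can issue further range requests for just the columns a query touches.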

## Why Browser-Based?
## Why Browser-Based? {.unnumbered}

Our approach using **geoparquet + DuckDB-WASM** provides:

- ✅ **Universal access** - No installation, works in any browser
- ✅ **Fast analysis** - 5-10x faster than downloading full datasets
- ✅ **Memory efficient** - Analyze 300MB using <100MB browser memory
- ✅ **Minimal transfer** - Only download the columns/rows you need
- **Universal access** — No installation, works in Chrome, Firefox, Edge, Safari, and Brave
- **Fast analysis** — 5-10x faster than downloading full datasets
- **Memory efficient** — Analyze 300MB datasets using <100MB browser memory
- **Minimal transfer** — HTTP range requests download only the columns and rows you need (typically <1 MB to start)
- **Reproducible** — All code is visible and foldable on tutorial pages

## For Developers {.unnumbered}

All tutorial source code is on [GitHub](https://github.com/isamplesorg/isamplesorg.github.io/tree/main/tutorials). Want to build your own analysis? Fork the repo, modify a `.qmd` file, and run `quarto preview`.

- [GitHub repositories](https://github.com/isamplesorg/) — all source code and data pipelines
- [Zenodo community](https://zenodo.org/communities/isamples) — archived datasets for reproducible research
- [Query architecture](https://github.com/isamplesorg/isamplesorg.github.io/issues/82) — how the Explorer queries work under the hood