# ISA-API Getting Started Guide

This notebook demonstrates the basic usage of the ISA-API for creating, manipulating, and converting ISA metadata.

## What is ISA?

The ISA (Investigation-Study-Assay) framework helps manage metadata for life science, environmental, and biomedical experiments. The ISA-API provides tools to:

- **Create** ISA objects programmatically
- **Validate** ISA datasets
- **Convert** between ISA-Tab, ISA-JSON, and other formats
- **Read and manipulate** existing ISA datasets

## Installation

```bash
pip install isatools
```

## 1. Creating a Simple ISA Investigation

In [1]:
from isatools.model import (
    Investigation,
    Study,
    Assay,
    Source,
    Sample,
    Material,
    Process,
    Protocol,
    DataFile,
    OntologyAnnotation,
    OntologySource,
    Person,
    Publication,
    Characteristic,
    batch_create_materials
)

# Create an Investigation
investigation = Investigation()
investigation.identifier = "INV001"
investigation.title = "My First ISA Investigation"
investigation.description = "A simple example investigation using ISA-API"
investigation.submission_date = "2025-10-01"
investigation.public_release_date = "2025-12-01"

print(f"Created investigation: {investigation.title}")

Created investigation: My First ISA Investigation
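
The date fields above are set as plain strings, so nothing in this cell checks their format. As a minimal sketch, assuming you want ISO 8601 (`YYYY-MM-DD`) dates and a public release date no earlier than the submission date, a standard-library check might look like the following. `check_dates` is a hypothetical helper written for this guide, not part of the ISA-API.

```python
from datetime import date


def check_dates(submission: str, public_release: str) -> bool:
    """Return True if both strings are ISO 8601 dates in chronological order.

    Hypothetical helper for illustration; not part of isatools.
    """
    try:
        submitted = date.fromisoformat(submission)
        released = date.fromisoformat(public_release)
    except ValueError:
        # At least one string is not a valid YYYY-MM-DD date
        return False
    return submitted <= released


print(check_dates("2025-10-01", "2025-12-01"))  # dates used in this notebook
print(check_dates("2025-12-01", "2025-10-01"))  # release before submission
```

Running a check like this before assigning the values catches swapped or mistyped dates early.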


## 2. Adding Ontology Sources

Ontologies provide controlled vocabularies for describing experimental metadata.

In [2]:
# Define ontology sources
ncbitaxon = OntologySource(
    name='NCBITaxon',
    description="NCBI Taxonomy",
    file="http://purl.bioontology.org/ontology/NCBITAXON"
)

obi = OntologySource(
    name='OBI',
    description="Ontology for Biomedical Investigations",
    file="http://purl.obolibrary.org/obo/obi.owl"
)

# Add to investigation
investigation.ontology_source_references.extend([ncbitaxon, obi])

print(f"Added {len(investigation.ontology_source_references)} ontology sources")

Added 2 ontology sources


## 3. Creating a Study with Contacts and Publications

In [3]:
# Create a study
study = Study(filename="s_study.txt")
study.identifier = "STUDY001"
study.title = "Metabolomics Study of Plant Stress Response"
study.description = "Investigating metabolic changes in plants under drought stress"
study.submission_date = "2025-10-01"
study.public_release_date = "2025-12-01"

# Add study design descriptor
intervention_design = OntologyAnnotation(
    term="intervention design",
    term_accession="http://purl.obolibrary.org/obo/OBI_0000115",
    term_source=obi
)
study.design_descriptors.append(intervention_design)

# Add contact person
contact = Person(
    first_name="Jane",
    last_name="Scientist",
    affiliation="Research Institute",
    email="jane.scientist@example.com",
    roles=[OntologyAnnotation(term="principal investigator")]
)
study.contacts.append(contact)

# Add publication
publication = Publication(
    title="Plant Stress Response Study",
    author_list="Scientist J, Researcher A",
    pubmed_id="12345678",
    doi="10.1234/example.doi"
)
publication.status = OntologyAnnotation(term="published")
study.publications.append(publication)

# Add study to investigation
investigation.studies.append(study)

print(f"Created study: {study.title}")
print(f"  Contact: {contact.first_name} {contact.last_name}")
print(f"  Publication: {publication.title}")

Created study: Metabolomics Study of Plant Stress Response
  Contact: Jane Scientist
  Publication: Plant Stress Response Study
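
The design descriptor above pairs a `term_accession` URL with a `term_source`. As a rough sketch, assuming the accession is an OBO-style PURL whose last path segment looks like `PREFIX_ID`, the prefix can be extracted and compared with the ontology source name. `split_obo_accession` is a hypothetical helper, not an ISA-API function, and it does not handle other URL layouts such as the NCBITaxon one used later.

```python
from urllib.parse import urlparse


def split_obo_accession(accession: str) -> tuple[str, str]:
    """Split an OBO-style PURL into (ontology prefix, local identifier).

    Hypothetical helper; assumes the last path segment looks like PREFIX_ID.
    """
    last_segment = urlparse(accession).path.rsplit("/", 1)[-1]
    prefix, sep, local_id = last_segment.partition("_")
    if not sep or not prefix or not local_id:
        raise ValueError(f"Not an OBO-style accession: {accession!r}")
    return prefix, local_id


prefix, local_id = split_obo_accession("http://purl.obolibrary.org/obo/OBI_0000115")
print(prefix, local_id)

# The prefix should match the name of the OntologySource it is attached to
print(prefix == "OBI")
```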


## 4. Creating Source Materials and Samples

Source materials represent the biological material before any processing.

In [4]:
# Create a source material
source = Source(name='plant_source')

# Add organism characteristic
organism_characteristic = Characteristic(
    category=OntologyAnnotation(term="Organism"),
    value=OntologyAnnotation(
        term="Arabidopsis thaliana",
        term_source=ncbitaxon,
        term_accession="http://purl.bioontology.org/ontology/NCBITAXON/3702"
    )
)
source.characteristics.append(organism_characteristic)
study.sources.append(source)
study.characteristic_categories.append(organism_characteristic.category)

# Create sample prototype
prototype_sample = Sample(name='sample', derives_from=[source])

# Add characteristics to sample
treatment_characteristic = Characteristic(
    category=OntologyAnnotation(term="Treatment"),
    value=OntologyAnnotation(term="drought stress")
)
prototype_sample.characteristics.append(treatment_characteristic)
study.characteristic_categories.append(treatment_characteristic.category)

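# Create a batch of six samples copied from the prototype
# (every copy inherits the prototype's characteristics)
study.samples = batch_create_materials(prototype_sample, n=6)

# Rename samples into control and treated groups. Only the names change here:
# the Treatment characteristic copied from the prototype is left as-is.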
for i, sample in enumerate(study.samples):
    if i < 3:
        sample.name = f"control_sample_{i+1}"
    else:
        sample.name = f"treated_sample_{i-2}"

print(f"Created {len(study.samples)} samples:")
for sample in study.samples:
    print(f"  - {sample.name}")

Created 6 samples:
  - control_sample_1
  - control_sample_2
  - control_sample_3
  - treated_sample_1
  - treated_sample_2
  - treated_sample_3


## 5. Creating Protocols and Processes

Protocols describe the experimental procedures, and Processes are instances of protocol execution.

In [5]:
# Create sample collection protocol
sample_collection_protocol = Protocol(
    name="sample collection",
    protocol_type=OntologyAnnotation(term="sample collection")
)
study.protocols.append(sample_collection_protocol)

# Create sample collection process
sample_collection_process = Process(executes_protocol=sample_collection_protocol)
sample_collection_process.inputs.append(source)
sample_collection_process.outputs.extend(study.samples)
study.process_sequence.append(sample_collection_process)

print(f"Created protocol: {sample_collection_protocol.name}")
print(f"Process: {len(sample_collection_process.inputs)} input -> {len(sample_collection_process.outputs)} outputs")

Created protocol: sample collection
Process: 1 input -> 6 outputs


## 6. Creating an Assay with Data Files

Assays represent the analytical measurements performed on samples.

In [6]:
# Create an assay
assay = Assay(filename="a_metabolomics.txt")
assay.measurement_type = OntologyAnnotation(term="metabolite profiling")
assay.technology_type = OntologyAnnotation(term="mass spectrometry")

# Create extraction protocol
extraction_protocol = Protocol(
    name='metabolite extraction',
    protocol_type=OntologyAnnotation(term="extraction")
)
study.protocols.append(extraction_protocol)

# Create mass spectrometry protocol
ms_protocol = Protocol(
    name='mass spectrometry',
    protocol_type=OntologyAnnotation(term="mass spectrometry")
)
study.protocols.append(ms_protocol)

# Create processes for each sample
for i, sample in enumerate(study.samples):
    # Extraction process
    extraction_process = Process(executes_protocol=extraction_protocol)
    extraction_process.inputs.append(sample)
    
    extract = Material(name=f"extract_{i}")
    extract.type = "Extract Name"
    extraction_process.outputs.append(extract)
    
    # MS analysis process
    ms_process = Process(executes_protocol=ms_protocol)
    ms_process.inputs.append(extract)
    
    # Create data file
    data_file = DataFile(
        filename=f"ms_data_{sample.name}.mzML",
        label="Raw Data File"
    )
    ms_process.outputs.append(data_file)
    
    # Add to assay
    assay.samples.append(sample)
    assay.other_material.append(extract)
    assay.data_files.append(data_file)
    assay.process_sequence.append(extraction_process)
    assay.process_sequence.append(ms_process)

# Add assay to study
study.assays.append(assay)

print(f"Created assay: {assay.filename}")
print(f"  Measurement type: {assay.measurement_type.term}")
print(f"  Technology type: {assay.technology_type.term}")
print(f"  Data files: {len(assay.data_files)}")

Created assay: a_metabolomics.txt
  Measurement type: metabolite profiling
  Technology type: mass spectrometry
  Data files: 6
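
Each loop iteration above builds a small chain: sample, extraction, extract, mass spectrometry, data file. The sketch below shows how such a chain can be walked backwards from a data file to the sample it came from. It uses plain dataclasses as stand-ins (`Node` and `Step` are hypothetical names, not isatools classes) so that it runs without the library; the same idea should apply to the `inputs` and `outputs` lists of real `Process` objects.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Stand-in for a material or data file (not an isatools class)."""
    name: str


@dataclass
class Step:
    """Stand-in for a Process: one protocol execution with inputs and outputs."""
    protocol: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)


def trace_back(node: Node, steps: list) -> list:
    """Return the chain of node names from `node` back to its earliest input."""
    chain = [node.name]
    current = node
    while True:
        # Find the step that produced the current node, if any
        producer = next(
            (s for s in steps if any(o is current for o in s.outputs)), None
        )
        if producer is None or not producer.inputs:
            return chain
        current = producer.inputs[0]  # this sketch follows the first input only
        chain.append(current.name)


sample = Node("control_sample_1")
extract = Node("extract_0")
data_file = Node("ms_data_control_sample_1.mzML")
steps = [
    Step("metabolite extraction", inputs=[sample], outputs=[extract]),
    Step("mass spectrometry", inputs=[extract], outputs=[data_file]),
]

print(" <- ".join(trace_back(data_file, steps)))
```

The sketch assumes an acyclic chain with a single input per step, which matches the assay built in this section.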


## 7. Exporting to ISA-JSON

In [7]:
import json
from isatools.isajson import ISAJSONEncoder

# Convert to JSON string
isa_json = json.dumps(investigation, cls=ISAJSONEncoder, sort_keys=True, indent=2)

# Display first 1000 characters
print("ISA-JSON output (first 1000 characters):")
print(isa_json[:1000])
print("\n... (output truncated)")

# Save to file
with open('example_isa.json', 'w') as f:
    f.write(isa_json)

print("\nSaved ISA-JSON to: example_isa.json")

ISA-JSON output (first 1000 characters):
{
  "comments": [],
  "description": "A simple example investigation using ISA-API",
  "identifier": "INV001",
  "ontologySourceReferences": [
    {
      "comments": [],
      "description": "NCBI Taxonomy",
      "file": "http://purl.bioontology.org/ontology/NCBITAXON",
      "name": "NCBITaxon",
      "version": ""
    },
    {
      "comments": [],
      "description": "Ontology for Biomedical Investigations",
      "file": "http://purl.obolibrary.org/obo/obi.owl",
      "name": "OBI",
      "version": ""
    }
  ],
  "people": [],
  "publicReleaseDate": "2025-12-01",
  "publications": [],
  "studies": [
    {
      "assays": [
        {
          "characteristicCategories": [],
          "comments": [],
          "dataFiles": [
            {
              "@id": "#data_file/f9d80419-4738-478d-9fbc-7fa91430e55c",
              "comments": [],
              "name": "ms_data_control_sample_1.mzML",
              "type": "Raw Data File"

... (output truncated)

Saved ISA-JSON to: example_isa.json

## 8. Exporting to ISA-Tab Format

In [8]:
from isatools import isatab
import os

# Create output directory
output_dir = './isa_tab_output'
os.makedirs(output_dir, exist_ok=True)

# Write ISA-Tab files
isatab.dump(investigation, output_dir)

# List created files
created_files = os.listdir(output_dir)
print(f"Created ISA-Tab files in '{output_dir}':")
for file in sorted(created_files):
    print(f"  - {file}")

['Sample Name', 'Protocol REF.0']
['Sample Name', 'Protocol REF.0', 'Extract Name', 'Protocol REF.1', 'MS Assay Name.0']
Created ISA-Tab files in './isa_tab_output':
  - a_metabolomics.txt
  - i_investigation.txt
  - s_study.txt
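
The files listed above are tab-delimited text, so they can also be inspected without the ISA-API. The following is a minimal sketch using only the standard library. The inline table is a made-up, simplified stand-in for an assay file (a real `a_*.txt` file has more columns); to read a real file, pass an open file handle to `csv.reader` instead of the `io.StringIO` object.

```python
import csv
import io

# Hypothetical, simplified assay table for illustration only
sample_table = (
    "Sample Name\tProtocol REF\tExtract Name\n"
    "control_sample_1\tmetabolite extraction\textract_0\n"
    "treated_sample_1\tmetabolite extraction\textract_3\n"
)

# csv.reader (rather than DictReader) addresses columns by position, so
# headers that repeat in ISA-Tab, such as "Protocol REF", are not collapsed
rows = list(csv.reader(io.StringIO(sample_table), delimiter="\t"))
header, records = rows[0], rows[1:]

sample_col = header.index("Sample Name")
extract_col = header.index("Extract Name")
sample_to_extract = {row[sample_col]: row[extract_col] for row in records}

print(header)
print(sample_to_extract)
```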



## 9. Reading Existing ISA-Tab Files

In [9]:
# Read back the ISA-Tab we just created
with open(os.path.join(output_dir, 'i_investigation.txt')) as f:
    loaded_investigation = isatab.load(f)

print(f"Loaded investigation: {loaded_investigation.identifier}")
print(f"  Title: {loaded_investigation.title}")
print(f"  Number of studies: {len(loaded_investigation.studies)}")

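# Use distinct loop names so the `study` and `assay` objects built earlier
# in the notebook are not overwritten
for loaded_study in loaded_investigation.studies:
    print(f"\n  Study: {loaded_study.identifier}")
    print(f"    Title: {loaded_study.title}")
    print(f"    Sources: {len(loaded_study.sources)}")
    print(f"    Samples: {len(loaded_study.samples)}")
    print(f"    Assays: {len(loaded_study.assays)}")
    
    for loaded_assay in loaded_study.assays:
        print(f"      Assay: {loaded_assay.filename}")
        print(f"        Data files: {len(loaded_assay.data_files)}")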

Loaded investigation: INV001
  Title: My First ISA Investigation
  Number of studies: 1

  Study: STUDY001
    Title: Metabolomics Study of Plant Stress Response
    Sources: 1
    Samples: 6
    Assays: 1
      Assay: a_metabolomics.txt
        Data files: 6


## 10. Validating ISA-Tab Files

In [10]:
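import os

from isatools import isatab

# Validate the ISA-Tab files, starting from the investigation file
try:
    # Open the file in a context manager so the handle is always closed
    with open(os.path.join(output_dir, 'i_investigation.txt')) as f:
        validation_report = isatab.validate(f)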
    
    print("Validation Report:")
    print(f"  Errors: {len(validation_report.get('errors', []))}")
    print(f"  Warnings: {len(validation_report.get('warnings', []))}")
    print(f"  Info: {len(validation_report.get('info', []))}")
    
    if validation_report.get('errors'):
        print("\nErrors found:")
        for error in validation_report['errors'][:5]:  # Show first 5 errors
            print(f"  - {error}")
    else:
        print("\n✓ Validation successful! No errors found.")
        
except Exception as e:
    print(f"Validation error: {e}")

Validation Report:
  Errors: 0
  Info: 2

✓ Validation successful! No errors found.
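
The cell above treats the report as a dictionary with `errors` and `warnings` lists. As a sketch under that assumption, the same summary logic can be put in a small reusable function. `summarize_report` is a hypothetical helper, and the shape of each entry (a string, or a dict with a `message` key) is assumed here, not taken from the ISA-API documentation.

```python
def summarize_report(report: dict, limit: int = 5) -> list:
    """Build human-readable summary lines from a validation report dict.

    Hypothetical helper; assumes 'errors' and 'warnings' hold lists whose
    entries are either strings or dicts with a 'message' key.
    """
    lines = []
    for level in ("errors", "warnings"):
        entries = report.get(level, [])
        lines.append(f"{level}: {len(entries)}")
        # Show at most `limit` entries per level
        for entry in entries[:limit]:
            message = entry.get("message", entry) if isinstance(entry, dict) else entry
            lines.append(f"  - {message}")
    return lines


# Made-up report for illustration only
example_report = {
    "errors": [],
    "warnings": [{"message": "Example warning text"}],
}
print("\n".join(summarize_report(example_report)))
```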


## 11. Converting ISA-Tab to ISA-JSON

In [11]:
import json
import os

from isatools import isatab
from isatools.isajson import ISAJSONEncoder

# Read ISA-Tab
with open(os.path.join(output_dir, 'i_investigation.txt')) as f:
    inv = isatab.load(f)

# Convert to JSON
json_output = json.dumps(inv, cls=ISAJSONEncoder, indent=2)

# Save JSON
with open('converted_isa.json', 'w') as f:
    f.write(json_output)

print("Converted ISA-Tab to ISA-JSON")
print("Output saved to: converted_isa.json")
print(f"JSON size: {len(json_output)} characters")

Converted ISA-Tab to ISA-JSON
Output saved to: converted_isa.json
JSON size: 26338 characters
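
Because the result is ordinary JSON, it can be read back with the standard `json` module. The sketch below walks the `studies`, `assays`, and `dataFiles` keys that appear in the ISA-JSON output in section 7. The inline document is a heavily trimmed, hypothetical stand-in; for the real file, load it with `json.load` on an open file handle instead. `list_data_files` is an illustrative helper, not part of the ISA-API.

```python
import json

# Hypothetical, heavily trimmed ISA-JSON document for illustration only
sample_json = """
{
  "identifier": "INV001",
  "studies": [
    {
      "assays": [
        {
          "dataFiles": [
            {"name": "ms_data_control_sample_1.mzML", "type": "Raw Data File"},
            {"name": "ms_data_control_sample_2.mzML", "type": "Raw Data File"}
          ]
        }
      ]
    }
  ]
}
"""


def list_data_files(document: dict) -> list:
    """Collect data file names from every assay of every study."""
    return [
        data_file["name"]
        for study_entry in document.get("studies", [])
        for assay_entry in study_entry.get("assays", [])
        for data_file in assay_entry.get("dataFiles", [])
    ]


document = json.loads(sample_json)
print(document["identifier"])
print(list_data_files(document))
```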


## Summary

This notebook demonstrated:

1. ✓ Creating ISA Investigation, Study, and Assay objects
2. ✓ Adding ontology annotations and controlled vocabularies
3. ✓ Creating source materials, samples, and processes
4. ✓ Defining protocols and linking them to processes
5. ✓ Creating assays with data files
6. ✓ Exporting to ISA-JSON format
7. ✓ Exporting to ISA-Tab format
8. ✓ Reading existing ISA-Tab files
9. ✓ Validating ISA metadata
10. ✓ Converting between ISA-Tab and ISA-JSON

## Additional Resources

- **Documentation**: https://isa-tools.org/isa-api/
- **GitHub**: https://github.com/ISA-tools/isa-api
- **ISA Community**: https://www.isacommons.org
- **ISA Cookbook**: More advanced examples in the `isa-cookbook/` directory