# Dataset

A `Dataset` is a collection of `Projects` that contain molecular dynamics simulations or related data, with some shared metadata and characteristics due to how they were generated. For each `Project`, in the context of the **MDDB Workflow**, we are refering to a set of simulations/replicas, with one or more trajectory files and a common topology file. To complete the definitions, individual simulations or replicas are referred to as `MD`.

The main functionality of this class is keeping track of the state of many Projects: if they are still running, if they are done or if they fail and what caused the error. For this the only adjustment we have to do is adding the path where our main SQLite storage file will be kept. We can do this by using the `dataset_path` flag during the workflow execution:

`mwf run ... --dataset_path path/to/our_dataset.db`

Or, if we do no want to write the flag everytime, by using the field `dataset_path` in the input.yaml config file:
```yaml
- dataset_path: path/to/our_dataset.db
```


## Creating a new Dataset

However, having to modify the inputs file for every project of the dataset may be very cumbersome, as Datasets can be form by hundreds or thousand projects. For this we can make use of another feature of this class: automatic inputs file generation.

### Directory Structure

For this, we part from a root folder, that every person may be organize on its own ways, but they normally follow a hierarchical structure with all its project that may look something like this:

``` bash
new_dataset/
├── project_1/
├── project_2/
├── project_3/
├── project_4/ 
├── ...
├──── special_cases/
├────── case_1/
├────── case_2/
├────── ...
├──── wrong_cases/
├────── case_1/
├────── case_2/
├────── ...
├── scripts/
├── project_logs/
└── ...
```
Note of we do not specify nothing about `MDs` as we will take care of that later.

In [1]:
import os

# Create directory structure
dataset_dir = "new_dataset"
dirs = [
    dataset_dir+"/project_1",
    dataset_dir+"/project_2",
    dataset_dir+"/project_3",
    dataset_dir+"/project_4",
    dataset_dir+"/special_cases/case_1",
    dataset_dir+"/special_cases/case_2",
    dataset_dir+"/wrong_cases/case_1",
    dataset_dir+"/wrong_cases/case_2",
    dataset_dir+"/scripts",
    dataset_dir+"/project_logs",
]

for dir_path in dirs:
    os.makedirs(dir_path, exist_ok=True)

In [2]:
%load_ext autoreload
%autoreload 2
from mddb_workflow.core.dataset import Dataset

# Create test directory structure
dataset_dir = "new_dataset"
# Initialize the Dataset
db_path = dataset_dir+"/new_dataset.db"
# Remove database in case the notebook is re-run
if os.path.exists(db_path):
    os.remove(db_path)

# Create dataset and scan for projects and MDs
ds = Dataset(dataset_path=db_path)

In [None]:
# CLI: mwf dataset add new_dataset.db -p project_* special_cases/* --ignore-dirs */logs
ds.add_entries([dataset_dir+'/project_* ',
                dataset_dir+'/special_cases/*',],
                ignore_dirs=[dataset_dir+'/*logs'],
                verbose=True)

Adding project: special_cases/case_1 (UUID: b891ec30-42b5-4a4a-bafa-72ac8479fbf8)
Adding project: special_cases/case_2 (UUID: 6ec4235c-6b30-471e-a8d3-4cb6de133353)


In CLI this can be done by using the `mwf dataset add` command, specifying the root folders and some optional arguments to ignore some subfolders that do not contain projects. For example:

`mwf dataset add new_dataset.db -p project_* special_cases/* --ignore-dirs */logs`

Some useful glob patterns:

- `*`: matches all the folders.
- `**/*`: matches all subfolders.
- `**/[0-9]*`: matches subfolders starting with a digit.

In [None]:
ds.dataframe

Unnamed: 0_level_0,project_uuid,scope,rel_path,num_mds,state,message,last_modified
uuid,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
b891ec30,,projects,special_cases/case_1,0,new,No information recorded yet.,12:27:05 27-01-2026
6ec4235c,,projects,special_cases/case_2,0,new,No information recorded yet.,12:27:05 27-01-2026


In [16]:
# CLI
!mwf dataset show {dataset_dir}/new_dataset.db

[3m                                  MDDB Dataset                                  [0m
┏━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃[1m [0m[1mproject_u…[0m[1m [0m┃[1m [0m[1mscope   [0m[1m [0m┃[1m [0m[1mrel_path [0m[1m [0m┃[1m [0m[1mnum_mds[0m[1m [0m┃[1m [0m[1mstate[0m[1m [0m┃[1m [0m[1mmessage   [0m[1m [0m┃[1m [0m[1mlast_mod…[0m[1m [0m┃
┡━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━┩
│            │ projects │ ../speci… │ 0       │ new   │ No         │ 12:27:05  │
│            │          │           │         │       │ informati… │ 27-01-20… │
│            │          │           │         │       │ recorded   │           │
│            │          │           │         │       │ yet.       │           │
│            │ projects │ ../speci… │ 0       │ new   │ No         │ 12:27:05  │
│            │          │           │         │       │ informati… │ 27-01-20… │
│            │

In [6]:
ds.get_status(dataset_dir+'/special_cases/case_1')

{'uuid': 'b891ec30-42b5-4a4a-bafa-72ac8479fbf8',
 'rel_path': 'special_cases/case_1',
 'num_mds': 0,
 'state': 'new',
 'message': 'No information recorded yet.',
 'last_modified': '12:27:05 27-01-2026',
 'scope': 'Project'}

In [23]:
!mwf dataset status {dataset_dir}/new_dataset.db -p {dataset_dir}'/special_cases/case_1'

UUID:          b891ec30-42b5-4a4a-bafa-72ac8479fbf8
Path:          special_cases/case_1
State:         new
Scope:         Project
MDs:           0
Last Modified: 12:27:05 27-01-2026
Message:       No information recorded yet.


### Generating inputs files programmatically

#### Jinja2 templates

In [11]:
inputs_template_str = """
authors:
- Rubén Chaves
collections:
- mdbind
contact: For any questions please send a mail to ruben.chaves@irbbarcelona.org
dataset_path: {{DATASET}}
description: 10 ns simulation of {{DIR}} pdb structure
linkcense: https://creativecommons.org/licenses/by/4.0/
name: Project {{DIR}}
"""

inputs_template = dataset_dir+'/inputs_template.yaml'
with open(inputs_template, 'w') as f:
    f.write(inputs_template_str)

In [None]:
# CLI: mwf dataset inputs new_dataset.db -it inputs_template.yaml -o
ds.generate_inputs_yaml(inputs_template, overwrite=True)

In [17]:
!cat {dataset_dir}/project_1/inputs.yaml


authors:
- Rubén Chaves
collections:
- mdbind
contact: For any questions please send a mail to ruben.chaves@irbbarcelona.org
dataset_path: ../new_dataset.db
description: 10 ns simulation of project_1 pdb structure
linkcense: https://creativecommons.org/licenses/by/4.0/
name: Project project_1

#### Adding custom fields

In [18]:
inputs_template_str = """
authors:
- Rubén Chaves
collections:
- mdbind
contact: For any questions please send a mail to ruben.chaves@irbbarcelona.org
dataset_path: {{DATASET}}
description: 10 ns simulation of {{DIR}} pdb structure
linkcense: https://creativecommons.org/licenses/by/4.0/
name: Project {{DIR}}
{%- if is_special_case %}
description: Special case description for {{DIR}}
{% endif %}
"""

inputs_template = dataset_dir+'/inputs_template.yaml'
with open(inputs_template, 'w') as f:
    f.write(inputs_template_str)

def input_generator(project_dir: str):
    if "special_cases" in project_dir:
        return {'is_special_case': True}

In [None]:
# CLI: mwf dataset inputs new_dataset.db -it inputs_template.yaml -ig inputs_generator.py -o
ds.generate_inputs_yaml(inputs_template, overwrite=True,
                        input_generator=input_generator)

In [20]:
# Now we add a field only for special cases
!cat {dataset_dir}/special_cases/case_1/inputs.yaml


authors:
- Rubén Chaves
collections:
- mdbind
contact: For any questions please send a mail to ruben.chaves@irbbarcelona.org
dataset_path: ../../new_dataset.db
description: 10 ns simulation of case_1 pdb structure
linkcense: https://creativecommons.org/licenses/by/4.0/
name: Project case_1
description: Special case description for case_1


In [21]:
# While for regular projects the file remains unchanged
!cat {dataset_dir}/project_1/inputs.yaml


authors:
- Rubén Chaves
collections:
- mdbind
contact: For any questions please send a mail to ruben.chaves@irbbarcelona.org
dataset_path: ../new_dataset.db
description: 10 ns simulation of project_1 pdb structure
linkcense: https://creativecommons.org/licenses/by/4.0/
name: Project project_1

#### Handling multiple and variable number of MDs

In [22]:
# Create directory structure
dirs = [
    dataset_dir+"/many_mds",
    dataset_dir+"/many_mds/project_1",
    dataset_dir+"/many_mds/project_2",
]
files = [
    # A project with 3 equilibration and 3 MD replicas
    dataset_dir+"/many_mds/project_1/equil_1.traj",
    dataset_dir+"/many_mds/project_1/equil_2.traj",
    dataset_dir+"/many_mds/project_1/equil_3.traj",
    dataset_dir+"/many_mds/project_1/prod_1.traj",
    dataset_dir+"/many_mds/project_1/prod_2.traj",
    dataset_dir+"/many_mds/project_1/prod_3.traj",
    # A project with 2 equilibration and 2 MD replicas
    dataset_dir+"/many_mds/project_2/equil_1.traj",
    dataset_dir+"/many_mds/project_2/equil_2.traj",
    dataset_dir+"/many_mds/project_2/prod_1.traj",
    dataset_dir+"/many_mds/project_2/prod_2.traj",
]
for dir_path in dirs:
    os.makedirs(dir_path, exist_ok=True)

for file_path in files:
    with open(file_path, 'w') as f:
        f.write("DUMMY TRAJ FILE\n")

In [23]:
ds.add_entries([dataset_dir+'/many_mds/*'], verbose=True)

Adding project: many_mds/project_1 (UUID: 452bf26e-02d5-4a65-bd47-ccdd4993abdf)
Adding project: many_mds/project_2 (UUID: 366a2d30-d171-4eab-b112-2107646af15d)


In [24]:
ds.dataframe

Unnamed: 0_level_0,project_uuid,scope,rel_path,num_mds,state,message,last_modified
uuid,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
452bf26e,,projects,many_mds/project_1,0,new,No information recorded yet.,11:09:44 15-01-2026
366a2d30,,projects,many_mds/project_2,0,new,No information recorded yet.,11:09:44 15-01-2026
74fee8ef,,projects,project_1,0,new,No information recorded yet.,10:54:38 15-01-2026
367ec4ba,,projects,project_2,0,new,No information recorded yet.,10:54:38 15-01-2026
28e2773f,,projects,project_3,0,new,No information recorded yet.,10:54:38 15-01-2026
ed4f1fcc,,projects,project_4,0,new,No information recorded yet.,10:54:38 15-01-2026
228a3bb7,,projects,special_cases/case_1,0,new,No information recorded yet.,10:54:38 15-01-2026
b6a4f0d9,,projects,special_cases/case_2,0,new,No information recorded yet.,10:54:38 15-01-2026


In [25]:
inputs_template_str = """
authors:
- Rubén Chaves
collections:
- mdbind
contact: For any questions please send a mail to ruben.chaves@irbbarcelona.org
dataset_path: {{DATASET}}
description: 10 ns simulation of {{DIR}} pdb structure
linkcense: https://creativecommons.org/licenses/by/4.0/
name: Project {{DIR}}
mds:
{% for md in mds %}
  -
    mdir: {{ md.mdir }}
    input_trajectory_filepaths: {{ md.traj }}
{% endfor %}
"""

inputs_template = dataset_dir+'/inputs_template.yaml'
with open(inputs_template, 'w') as f:
    f.write(inputs_template_str)

In [26]:
from pathlib import Path


def input_generator(project_dir: str):
    """Generate a list of MD replicas based on the traj files in the project directory."""
    mds = []
    project_path = Path(project_dir)
    prod_trajs = sorted(project_path.glob('prod_*.traj'))
    num_replicas = len(prod_trajs)
    for i in range(num_replicas):
        mds.append({
            'mdir': f'md_replica_{i+1}',
            'traj': prod_trajs[i].relative_to(project_path).as_posix(),
        })
    return {'mds': mds}

In [27]:
input_generator(dataset_dir+'/many_mds/project_1')

{'mds': [{'mdir': 'md_replica_1', 'traj': 'prod_1.traj'},
  {'mdir': 'md_replica_2', 'traj': 'prod_2.traj'},
  {'mdir': 'md_replica_3', 'traj': 'prod_3.traj'}]}

In [28]:
ds.generate_inputs_yaml(inputs_template, overwrite=True,
                        input_generator=input_generator)

In [29]:
# Generated inputs.yaml for project with 3 replicas
!cat new_dataset/many_mds/project_1/inputs.yaml


authors:
- Rubén Chaves
collections:
- mdbind
contact: For any questions please send a mail to ruben.chaves@irbbarcelona.org
dataset_path: ../../new_dataset.db
description: 10 ns simulation of project_1 pdb structure
linkcense: https://creativecommons.org/licenses/by/4.0/
name: Project project_1
mds:

  -
    mdir: md_replica_1
    input_trajectory_filepaths: prod_1.traj

  -
    mdir: md_replica_2
    input_trajectory_filepaths: prod_2.traj

  -
    mdir: md_replica_3
    input_trajectory_filepaths: prod_3.traj


In [30]:
# Generated inputs.yaml for project with 2 replicas
!cat new_dataset/many_mds/project_2/inputs.yaml


authors:
- Rubén Chaves
collections:
- mdbind
contact: For any questions please send a mail to ruben.chaves@irbbarcelona.org
dataset_path: ../../new_dataset.db
description: 10 ns simulation of project_2 pdb structure
linkcense: https://creativecommons.org/licenses/by/4.0/
name: Project project_2
mds:

  -
    mdir: md_replica_1
    input_trajectory_filepaths: prod_1.traj

  -
    mdir: md_replica_2
    input_trajectory_filepaths: prod_2.traj


# Already run projects

In [35]:
import os
import json
from uuid import uuid4
from contextlib import chdir

# Create test directory structure
test_dir = "old_dataset"
if test_dir not in os.getcwd():
    os.makedirs(test_dir, exist_ok=True)

# Define the directory structure
projects = {
    'project1': ['replica1', 'replica2'],
    'project2': ['replica1', 'replica2', 'replica3'],
    'project3': [],  # Project with no MDs
}
with chdir(test_dir):
    # Create directories and cache files
    for project_name, md_dirs in projects.items():
        # Create project directory
        os.makedirs(project_name, exist_ok=True)

        # Create cache file for project with UUID
        project_uuid = str(uuid4())
        project_cache = {
            'uuid': project_uuid,
            # Project cache does NOT have project_uuid
        }
        cache_path = os.path.join(project_name, '.mwf_cache.json')
        with open(cache_path, 'w') as f:
            json.dump(project_cache, f, indent=4)

        print(f"Created project: {project_name} (UUID: {project_uuid})")

        # Create MD directories with their cache files
        for md_dir in md_dirs:
            md_path = os.path.join(project_name, md_dir)
            os.makedirs(md_path, exist_ok=True)

            # Create cache file for MD with its own UUID and parent project_uuid
            md_uuid = str(uuid4())
            md_cache = {
                'uuid': md_uuid,
                'project_uuid': project_uuid,  # MD cache HAS project_uuid
            }
            md_cache_path = os.path.join(md_path, '.mwf_cache.json')
            with open(md_cache_path, 'w') as f:
                json.dump(md_cache, f, indent=4)

            print(f"  Created MD: {md_dir} (UUID: {md_uuid})")

print("\nTest directory structure created with cache files!")

Created project: project1 (UUID: f3d6c8cc-c795-49d3-b3b2-acd9228a8c5e)
  Created MD: replica1 (UUID: ef62c188-944c-457b-bb29-b3643c5a60a6)
  Created MD: replica2 (UUID: f89caec2-56c2-4d32-9992-cb3f55f27dad)
Created project: project2 (UUID: adef845e-3bee-40c8-bb53-69bd8869ce5d)
  Created MD: replica1 (UUID: 3bb86ed2-cc7c-4e60-919a-1758d3cfbcf1)
  Created MD: replica2 (UUID: 45e5df52-1216-43d9-afdc-05073a3ff293)
  Created MD: replica3 (UUID: bee0004b-6f25-44e5-9427-009d2198aadb)
Created project: project3 (UUID: 27556625-e806-42cf-96e4-267eb5fc2151)

Test directory structure created with cache files!


In [36]:
%load_ext autoreload
%autoreload 2
from mddb_workflow.core.dataset import Dataset

# Initialize the Dataset
db_path = test_dir+"/dataset.db"
# Remove database in case the notebook is re-run
if os.path.exists(db_path):
    os.remove(db_path)

# Create dataset and scan for projects and MDs
ds = Dataset(dataset_path=db_path)
ds.scan(verbose=True)

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
Adding project: project3 (UUID: 27556625-e806-42cf-96e4-267eb5fc2151)
Adding project: project1 (UUID: f3d6c8cc-c795-49d3-b3b2-acd9228a8c5e)
Adding project: project2 (UUID: adef845e-3bee-40c8-bb53-69bd8869ce5d)
  Adding MD: project1/replica1 (UUID: ef62c188-944c-457b-bb29-b3643c5a60a6, Project UUID: f3d6c8cc-c795-49d3-b3b2-acd9228a8c5e)
  Adding MD: project1/replica2 (UUID: f89caec2-56c2-4d32-9992-cb3f55f27dad, Project UUID: f3d6c8cc-c795-49d3-b3b2-acd9228a8c5e)
  Adding MD: project2/replica1 (UUID: 3bb86ed2-cc7c-4e60-919a-1758d3cfbcf1, Project UUID: adef845e-3bee-40c8-bb53-69bd8869ce5d)
  Adding MD: project2/replica2 (UUID: 45e5df52-1216-43d9-afdc-05073a3ff293, Project UUID: adef845e-3bee-40c8-bb53-69bd8869ce5d)
  Adding MD: project2/replica3 (UUID: bee0004b-6f25-44e5-9427-009d2198aadb, Project UUID: adef845e-3bee-40c8-bb53-69bd8869ce5d)


In [37]:
# uuid are shortened and the paths are shown relative to the dataset
# To display full uuids and absolute paths, use ds.get_dataframe()
ds.dataframe

Unnamed: 0_level_0,project_uuid,scope,rel_path,num_mds,state,message,last_modified
uuid,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
f3d6c8cc,,projects,project1,2.0,new,No information recorded yet.,11:11:38 15-01-2026
adef845e,,projects,project2,3.0,new,No information recorded yet.,11:11:38 15-01-2026
27556625,,projects,project3,0.0,new,No information recorded yet.,11:11:38 15-01-2026
ef62c188,f3d6c8cc,mds,project1/replica1,,new,No information recorded yet.,11:11:38 15-01-2026
f89caec2,f3d6c8cc,mds,project1/replica2,,new,No information recorded yet.,11:11:38 15-01-2026
3bb86ed2,adef845e,mds,project2/replica1,,new,No information recorded yet.,11:11:38 15-01-2026
45e5df52,adef845e,mds,project2/replica2,,new,No information recorded yet.,11:11:38 15-01-2026
bee0004b,adef845e,mds,project2/replica3,,new,No information recorded yet.,11:11:38 15-01-2026


In [38]:
# Test adding a new MD to an existing project
# Get UUID of project1
with open(test_dir+'/project1/.mwf_cache.json', 'r') as f:
    project1_data = json.load(f)
    project1_uuid = project1_data['uuid']

# Create new MD directory
new_md_dir = test_dir+'/project1/replica3'
os.makedirs(new_md_dir, exist_ok=True)

# Create cache for new MD with project_uuid
new_md_uuid = str(uuid4())
with open(os.path.join(new_md_dir, '.mwf_cache.json'), 'w') as f:
    json.dump({
        'uuid': new_md_uuid,
        'project_uuid': project1_uuid
    }, f, indent=4)

print(f"Created new MD: {new_md_dir} (UUID: {new_md_uuid}, Project: {project1_uuid})")

# Add it to dataset
ds.add_project(new_md_dir, verbose=True)

Created new MD: old_dataset/project1/replica3 (UUID: fbb396c7-045f-4406-aa5f-85cbbda84eec, Project: f3d6c8cc-c795-49d3-b3b2-acd9228a8c5e)
Adding project: project1/replica3 (UUID: fbb396c7-045f-4406-aa5f-85cbbda84eec)


In [39]:
ds.dataframe

Unnamed: 0_level_0,project_uuid,scope,rel_path,num_mds,state,message,last_modified
uuid,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
fbb396c7,,projects,project1/replica3,0.0,new,No information recorded yet.,11:12:06 15-01-2026
f3d6c8cc,,projects,project1,2.0,new,No information recorded yet.,11:11:38 15-01-2026
adef845e,,projects,project2,3.0,new,No information recorded yet.,11:11:38 15-01-2026
27556625,,projects,project3,0.0,new,No information recorded yet.,11:11:38 15-01-2026
ef62c188,f3d6c8cc,mds,project1/replica1,,new,No information recorded yet.,11:11:38 15-01-2026
f89caec2,f3d6c8cc,mds,project1/replica2,,new,No information recorded yet.,11:11:38 15-01-2026
3bb86ed2,adef845e,mds,project2/replica1,,new,No information recorded yet.,11:11:38 15-01-2026
45e5df52,adef845e,mds,project2/replica2,,new,No information recorded yet.,11:11:38 15-01-2026
bee0004b,adef845e,mds,project2/replica3,,new,No information recorded yet.,11:11:38 15-01-2026


In [40]:
# Test removing an MD by directory path
ds.remove_entry(test_dir+'/project2/replica1', verbose=True)
ds.dataframe

Deleted MD with UUID '3bb86ed2-cc7c-4e60-919a-1758d3cfbcf1'


Unnamed: 0_level_0,project_uuid,scope,rel_path,num_mds,state,message,last_modified
uuid,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
fbb396c7,,projects,project1/replica3,0.0,new,No information recorded yet.,11:12:06 15-01-2026
f3d6c8cc,,projects,project1,2.0,new,No information recorded yet.,11:11:38 15-01-2026
adef845e,,projects,project2,2.0,new,No information recorded yet.,11:11:38 15-01-2026
27556625,,projects,project3,0.0,new,No information recorded yet.,11:11:38 15-01-2026
ef62c188,f3d6c8cc,mds,project1/replica1,,new,No information recorded yet.,11:11:38 15-01-2026
f89caec2,f3d6c8cc,mds,project1/replica2,,new,No information recorded yet.,11:11:38 15-01-2026
45e5df52,adef845e,mds,project2/replica2,,new,No information recorded yet.,11:11:38 15-01-2026
bee0004b,adef845e,mds,project2/replica3,,new,No information recorded yet.,11:11:38 15-01-2026


In [41]:
# Test removing an entire project (will cascade delete all MDs)
ds.remove_entry(test_dir+'/project2/', verbose=True)
ds.dataframe

Deleted project with UUID 'adef845e-3bee-40c8-bb53-69bd8869ce5d'


Unnamed: 0_level_0,project_uuid,scope,rel_path,num_mds,state,message,last_modified
uuid,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
fbb396c7,,projects,project1/replica3,0.0,new,No information recorded yet.,11:12:06 15-01-2026
f3d6c8cc,,projects,project1,2.0,new,No information recorded yet.,11:11:38 15-01-2026
27556625,,projects,project3,0.0,new,No information recorded yet.,11:11:38 15-01-2026
ef62c188,f3d6c8cc,mds,project1/replica1,,new,No information recorded yet.,11:11:38 15-01-2026
f89caec2,f3d6c8cc,mds,project1/replica2,,new,No information recorded yet.,11:11:38 15-01-2026


In [42]:
# Test get_status by directory path
ds.get_status(test_dir+'/project1')

{'uuid': 'f3d6c8cc-c795-49d3-b3b2-acd9228a8c5e',
 'rel_path': 'project1',
 'num_mds': 2,
 'state': 'new',
 'message': 'No information recorded yet.',
 'last_modified': '11:11:38 15-01-2026',
 'scope': 'Project'}