# What's going on?

Please do not take anything I say as criticism of the PyBids code! I barely know what I'm doing, and I still haven't wrapped my head around the logic of the code. Also, bear with me regarding the terminology I'm using; it may be a little off.

Working with Michael, we discovered an issue related to a `PaddedInt` variable (note: the variable was from a `.json` sidecar for a `bold.nii.gz` file). Essentially, the error arises because the code is trying to check whether a list contains all of the same value, and `==` doesn't work for this purpose in `PaddedInt`. If you go to the very end of this notebook, you can find the error.

After poking around a bit, I think there are one or two different problems occurring here. I will illustrate these below. 


### Question 1:
The problem arises because of `PaddedInt` and I was curious, why not treat `run` as a string instead of an integer? I notice that it is the only entity with a dtype (`int`) specified in `/bids/layout/config/bids.json`. Presumably, something breaks if this isn't handled correctly, and I was wondering what that was.

Assuming the goal is to ensure that `run` is (and remains) a `PaddedInt` throughout all the processes, that leads us to **Problem #1**.

### Problem #1: `run` flips back to `int` and currently the output files do not have zero-padded run numbers
Additionally, here's **Problem #2**:

### Problem #2: Values from the `.json` sidecar files are `PaddedInt`, but this causes the error that started this investigation

---

# What is in this notebook

I have traced the entities through various steps to verify that all `int`s start off as `PaddedInt` (we probably only want this to be the case for `run`, though?). At one point, a level of the entities still has `PaddedInt`, while another level has `int`. Presumably, the `int` version is used to generate a file name, while the `PaddedInt` version throws the error that started this exploration.

To cut to the chase, `int`s are converted to `PaddedInt`s in `BidsVariableCollection._index_entities()`, specifically in [this line](https://github.com/bids-standard/pybids/blob/85eacf24c345abf72413381c53e6a7847c1cd932/src/bids/variables/collections.py#L248), because it is converting the data into a `pandas` DataFrame, which automatically converts them to `int`. We won't discuss how long it took me to figure that out, but hopefully, this explanation is helpful!

The point at which these changes occur is shown below in the code.  Ignore that I manually removed session information from the data, I initially thought that was involved in the error.  So the filenames differ from what you'll find in OpenNeuro.

---

# My thoughts on a solution

I don't know the history of `PaddedInt`. What happens if it is omitted and the `run` dtype isn't pre-specified as an `int` in the `config/bids.json`? If it is treated as a `str`, does that break something further along the line?  Changing `PaddedInt` to `int` does allow the code to run without error.  Although this ensures the run numbers are not zero padded in the output files, they weren't zero padded anyhow (e.g., in datasets where there are not .json sidecar files).

In [1]:
import json
from pathlib import Path

from bids.layout import BIDSLayout
from bids.modeling import BIDSStatsModelsGraph

In [2]:
def print_layout_entities(layout):
    print('Run')
    for file in layout.entities['run'].files:
        print(
            f'file: {file.split('bids_sm_transformation', 1)[-1]}\n'
            f'Value: {layout.entities['run'].files[file]} Data Type: {type(layout.entities['run'].files[file])}'
        )

    print('\n AccelFactPE')
    for file in layout.entities['AccelFactPE'].files:
        print(
            f'file: {file.split('bids_sm_transformation', 1)[-1]}\n'
            f'Value: {layout.entities['run'].files[file]} Data Type: {type(layout.entities['run'].files[file])}'
        )

In [3]:
root = 'data/ds003425_no_ses'
db_path = 'data/ds003425_no_ses/dbcache'
reset_database = True
spec_path = 'model_specs/ds003425_spec_no_ses.json'

In [4]:
layout = BIDSLayout(
    root=root,
    database_path=db_path,
    reset_database=reset_database,
)

### Entity check
This is the first time entities have been set up and all `ints` are `PaddedInts`.  I'm tracking both `run` and `AccelFactPE`, since they are entities from different sources (filenames vs .json). (note, using run as an example, these are from `layout.entities['run'].files[file]` across all `file` values)

In [5]:
print_layout_entities(layout)

Run
file: /data/ds003425_no_ses/sub-01/func/sub-01_task-learning_run-01_events.tsv
Value: 01 Data Type: <class 'bids.layout.utils.PaddedInt'>
file: /data/ds003425_no_ses/sub-01/func/sub-01_task-learning_run-01_bold.json
Value: 01 Data Type: <class 'bids.layout.utils.PaddedInt'>
file: /data/ds003425_no_ses/sub-01/func/sub-01_task-learning_run-01_bold.nii.gz
Value: 01 Data Type: <class 'bids.layout.utils.PaddedInt'>

 AccelFactPE
file: /data/ds003425_no_ses/sub-01/func/sub-01_task-learning_run-01_bold.nii.gz
Value: 01 Data Type: <class 'bids.layout.utils.PaddedInt'>


In [6]:
spec = json.loads(Path(spec_path).read_text())
graph = BIDSStatsModelsGraph(layout, spec)

### Entity check
Still `PaddedInt`

In [7]:
print_layout_entities(layout)

Run
file: /data/ds003425_no_ses/sub-01/func/sub-01_task-learning_run-01_events.tsv
Value: 01 Data Type: <class 'bids.layout.utils.PaddedInt'>
file: /data/ds003425_no_ses/sub-01/func/sub-01_task-learning_run-01_bold.json
Value: 01 Data Type: <class 'bids.layout.utils.PaddedInt'>
file: /data/ds003425_no_ses/sub-01/func/sub-01_task-learning_run-01_bold.nii.gz
Value: 01 Data Type: <class 'bids.layout.utils.PaddedInt'>

 AccelFactPE
file: /data/ds003425_no_ses/sub-01/func/sub-01_task-learning_run-01_bold.nii.gz
Value: 01 Data Type: <class 'bids.layout.utils.PaddedInt'>


In [8]:
graph.load_collections(scan_length=250)  # scan_length is in seconds

### Entity check
Now `PaddedInt` has changed to `int` for `bold.nii.gz` and `AccelFactPE`.  Presumably this is due the the conversion to a Pandas dataframe, which I mentioned earlier. (`BidsVariableCollection._index_entities()`, specifically in [this line](https://github.com/bids-standard/pybids/blob/85eacf24c345abf72413381c53e6a7847c1cd932/src/bids/variables/collections.py#L248))  I didn't work my way through the code enough to connect the entities in layout to what is manipulated in `_index_entities()`.

In [9]:
print_layout_entities(layout)

Run
file: /data/ds003425_no_ses/sub-01/func/sub-01_task-learning_run-01_events.tsv
Value: 01 Data Type: <class 'bids.layout.utils.PaddedInt'>
file: /data/ds003425_no_ses/sub-01/func/sub-01_task-learning_run-01_bold.json
Value: 01 Data Type: <class 'bids.layout.utils.PaddedInt'>
file: /data/ds003425_no_ses/sub-01/func/sub-01_task-learning_run-01_bold.nii.gz
Value: 1 Data Type: <class 'int'>

 AccelFactPE
file: /data/ds003425_no_ses/sub-01/func/sub-01_task-learning_run-01_bold.nii.gz
Value: 1 Data Type: <class 'int'>


### But wait, how was `AccelFactPE` an issue later on because it was a `PaddedInt`?
What is interesting about this observation is that I *know* that `AccelFactPE` is a trouble maker because it is a `PaddedInt` later on (the cause of the error at the end of this notebook).  As you're more familiar with the logic of this code base, you probably already know, but this is because there are, what I'm calling, individual level entities and collection level entities.  I currently don't have the bandwidth to dig further to see how each is used.

This is where I lost steam.  No clue how `run` now is now universally an `int` and `AccelFactPE` is split.  I also don't understand which of these entity sets (if either) is involved in the error that is thrown below (with `AccelFactPE`) and which entity set (if either) is used to generate the run number in the output filenames.  If you could share that with me, I'd love to know.  Maybe I need to spend more time poking around the code.

In [10]:
print('run types')
print('---------')
print('Collection level')
print(type(graph.root_node._collections[0]['other_type'].entities['run']))
print('Individual level')
print(type(graph.root_node._collections[0].entities['run']))

print('\n AccelFactPE types')
print('--------------------')
print('Collection level')
print(type(graph.root_node._collections[0]['other_type'].entities['AccelFactPE']))
print('Individual level')
print(type(graph.root_node._collections[0].entities['AccelFactPE']))


run types
---------
Collection level
<class 'numpy.int64'>
Individual level
<class 'numpy.int64'>

 AccelFactPE types
--------------------
Collection level
<class 'bids.layout.utils.PaddedInt'>
Individual level
<class 'numpy.int64'>


# The Error that started it all

Again, if the `PaddedInt` class is replaced with `int`, this error disappears.


In [11]:
root_node = graph.root_node
outputs = root_node.run(
    group_by=root_node.group_by,
    force_dense=False,
    transformation_history=True,
)

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()