Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RF: Support "flat" ASCII-encoded GIFTI DataArrays #1298

Merged
merged 5 commits into from
Feb 26, 2024

Conversation

pauldmccarthy
Copy link
Contributor

Hi @effigies I hope you're well!

I had a report from a FSLeyes user who has used a MATLAB GIfTI library to generate some .gii files that were failing to load in FSLeyes. The GIfTI files that this library generates are ASCII-encoded, but without any newlines to separate elements along the first dimension, e.g.. instead of the conventional:

<DataArray Dimensionality="2"
           Dim0="3"
           Dim1="3"
           Encoding="ASCII">
  <Data>155.17539978 135.58103943 98.30715179
        140.33973694 190.0491333  73.24776459
        157.3598938  196.97969055 83.65809631
  </Data>
</DataArray>

these files instead contain

<DataArray Dimensionality="2"
           Dim0="3"
           Dim1="3"
           Encoding="ASCII">
  <Data>155.17539978 135.58103943 98.30715179 140.33973694 190.0491333 73.24776459 157.3598938 196.97969055 83.65809631</Data>
</DataArray>

Once again, the GIfTI specification is light on details as to the expected format of ASCII-encoded data arrays - all it says is (from Section 2.3.4.5 Encoding):

The Data element contains ASCII text with each value separated by whitespace

But every ASCII-encoded GIfTI file I've encountered has contained newlines to separate elements along the first dimension.

In any case, when using nibabel to load such a file (e.g. the ascii_flat_data.gii test file in this PR), the loaded data arrays are one dimensional, e.g.:

import nibabel as nib
img = nib.load('ascii_flat_data.gii')
print('Expected shape:', img.darrays[0].dims)
print('Actual shape:  ', img.darrays[0].data.shape)
>> Expected shape: [10, 3]
>> Actual shape:   (30,)

It turns out that, for non-ASCII-encoded data arrays, the nibabel.gifit.parse_gifti_fast.read_data_block function will ensure that the loaded array has the shape specified in the DataArray attributes. But this is not the case for ASCII-encoded arrays.

This PR contains a small patch which ensures that ASCII-encoded arrays are reshaped to the expected dimensionality.

There is a slight complexity in that one of the existing unit tests is expecting the result to be 1D when loading a file that contains an array of shape (1, 3).

I added the .squeeze() call in to preserve this behaviour, but I'm not sure whether a better option is to always return the shape specified in the DataArray attributes. I'm not sure how widely used ASCII-encoded GIFTIs, so am unsure of the potential impact of such a change. It's obviously unlikely that one would encounter a POINTSET or TRIANGLE array with a dimension of length 1, but it could occur with SHAPE or TIMESERIES arrays.

Copy link

codecov bot commented Feb 23, 2024

Codecov Report

Attention: Patch coverage is 86.66667% with 2 lines in your changes are missing coverage. Please review.

Project coverage is 92.26%. Comparing base (27c2427) to head (5bcf012).
Report is 6 commits behind head on master.

Files Patch % Lines
nibabel/gifti/parse_gifti_fast.py 86.66% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1298   +/-   ##
=======================================
  Coverage   92.25%   92.26%           
=======================================
  Files          99       99           
  Lines       12467    12457   -10     
  Branches     2563     2560    -3     
=======================================
- Hits        11502    11493    -9     
+ Misses        642      641    -1     
  Partials      323      323           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@effigies
Copy link
Member

We need to consider the ArrayIndexingOrder. MATLAB seems likely to ravel from column-major, while numpy will default to row-major. We do consider it below:

sh = tuple(darray.dims)
if len(newarr.shape) != len(sh):
newarr = newarr.reshape(sh, order=array_index_order_codes.npcode[darray.ind_ord])

I think if we uniformize the dtype at the top of the function, then we can safely get rid of the byteswap at the end and reuse the existing reshape.

byteorder = gifti_endian_codes.npcode[darray.endian]
dtype = data_type_codes.type[darray.datatype].newbyteorder(endian)

I'm also noticing that the memmap created here does not respect index order:

newarr = np.memmap(
ext_fname,
dtype=dtype,
mode=mmap,
offset=darray.ext_offset,
shape=tuple(darray.dims),
)

So perhaps we want to go ahead and determine the shape and order at the top as well:

shape = tuple(darray.dims)
order = array_index_order_codes.npcode[darray.ind_ord]

Then we can use these consistently.

@effigies
Copy link
Member

As to shape, I think we probably want to respect the actual shape as specified in the XML, not squeeze. We should update tests if this is not the case currently.

@effigies
Copy link
Member

Applied and tested my suggestions in pauldmccarthy#1

RF: Consistently apply data type, shape and index order
@pauldmccarthy
Copy link
Contributor Author

@effigies Thanks! You evidently also noticed that the row/column ordering was not being applied as well 👍

@effigies effigies merged commit 0683b66 into nipy:master Feb 26, 2024
51 of 52 checks passed
effigies pushed a commit that referenced this pull request Feb 27, 2024
TEST: Unit test for loading ASCII-encoded "flat" GIFTI data array. Currently failing

RF: Make sure that ASCII-encoded DataArrays are returned with expected shape

RF: Consistently apply data type, shape and index order in GIFTI data blocks

TEST: Expect data arrays to be the advertised shapes
effigies added a commit that referenced this pull request Feb 27, 2024
5.2.1 (Monday 26 February 2024)

Bug-fix release in the 5.2.x series.

Enhancements
------------
* Support "flat" ASCII-encoded GIFTI DataArrays (pr/1298) (PM, reviewed by CM)

Bug fixes
---------
* Tolerate missing ``git`` when reporting version info (pr/1286) (CM, reviewed by
  Yuri Victorovich)
* Handle Siemens XA30 derived DWI DICOMs (pr/1296) (CM, reviewed by YOH and
  Mathias Goncalves)

Maintenance
-----------
* Add tool for generating GitHub-friendly release notes (pr/1284) (CM)
* Accommodate pytest 8 changes (pr/1297) (CM)

* tag '5.2.1':
  REL: 5.2.1
  Backport gh-1296: Conditionally drop derived volumes from DWI sequences
  Backport gh-1298: Support "flat" ASCII-encoded GIFTI DataArrays
  Build(deps): Bump codecov/codecov-action from 3 to 4
  Build(deps): Bump the actions-infrastructure group with 3 updates
  Backport gh-1297: Accommodate pytest 8 changes
  Backport gh-1286: Tolerate missing git
  MNT: Advertise Python 3.12 support
  DOC: Fix intersphinx mapping and reference type
  Backport gh-1284: Add tool for generating GitHub-friendly release notes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants