RF: Support "flat" ASCII-encoded GIFTI DataArrays #1298

pauldmccarthy · 2024-02-23T10:37:10Z

Hi @effigies I hope you're well!

I had a report from a FSLeyes user who has used a MATLAB GIfTI library to generate some .gii files that were failing to load in FSLeyes. The GIfTI files that this library generates are ASCII-encoded, but without any newlines to separate elements along the first dimension, e.g.. instead of the conventional:

<DataArray Dimensionality="2"
           Dim0="3"
           Dim1="3"
           Encoding="ASCII">
  <Data>155.17539978 135.58103943 98.30715179
        140.33973694 190.0491333  73.24776459
        157.3598938  196.97969055 83.65809631
  </Data>
</DataArray>

these files instead contain

<DataArray Dimensionality="2"
           Dim0="3"
           Dim1="3"
           Encoding="ASCII">
  <Data>155.17539978 135.58103943 98.30715179 140.33973694 190.0491333 73.24776459 157.3598938 196.97969055 83.65809631</Data>
</DataArray>

Once again, the GIfTI specification is light on details as to the expected format of ASCII-encoded data arrays - all it says is (from Section 2.3.4.5 Encoding):

The Data element contains ASCII text with each value separated by whitespace

But every ASCII-encoded GIfTI file I've encountered has contained newlines to separate elements along the first dimension.

In any case, when using nibabel to load such a file (e.g. the ascii_flat_data.gii test file in this PR), the loaded data arrays are one dimensional, e.g.:

import nibabel as nib
img = nib.load('ascii_flat_data.gii')
print('Expected shape:', img.darrays[0].dims)
print('Actual shape:  ', img.darrays[0].data.shape)

>> Expected shape: [10, 3]
>> Actual shape:   (30,)

It turns out that, for non-ASCII-encoded data arrays, the nibabel.gifit.parse_gifti_fast.read_data_block function will ensure that the loaded array has the shape specified in the DataArray attributes. But this is not the case for ASCII-encoded arrays.

This PR contains a small patch which ensures that ASCII-encoded arrays are reshaped to the expected dimensionality.

There is a slight complexity in that one of the existing unit tests is expecting the result to be 1D when loading a file that contains an array of shape (1, 3).

I added the .squeeze() call in to preserve this behaviour, but I'm not sure whether a better option is to always return the shape specified in the DataArray attributes. I'm not sure how widely used ASCII-encoded GIFTIs, so am unsure of the potential impact of such a change. It's obviously unlikely that one would encounter a POINTSET or TRIANGLE array with a dimension of length 1, but it could occur with SHAPE or TIMESERIES arrays.

…rrently failing

…d shape

codecov · 2024-02-23T10:40:15Z

Codecov Report

Attention: Patch coverage is 86.66667% with 2 lines in your changes are missing coverage. Please review.

Project coverage is 92.26%. Comparing base (27c2427) to head (5bcf012).
Report is 6 commits behind head on master.

Files	Patch %	Lines
nibabel/gifti/parse_gifti_fast.py	86.66%	1 Missing and 1 partial ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #1298   +/-   ##
=======================================
  Coverage   92.25%   92.26%           
=======================================
  Files          99       99           
  Lines       12467    12457   -10     
  Branches     2563     2560    -3     
=======================================
- Hits        11502    11493    -9     
+ Misses        642      641    -1     
  Partials      323      323

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

effigies · 2024-02-23T13:26:10Z

We need to consider the ArrayIndexingOrder. MATLAB seems likely to ravel from column-major, while numpy will default to row-major. We do consider it below:

nibabel/nibabel/gifti/parse_gifti_fast.py

Lines 135 to 137 in 27c2427

    
           sh = tuple(darray.dims) 
        
           if len(newarr.shape) != len(sh): 
        
               newarr = newarr.reshape(sh, order=array_index_order_codes.npcode[darray.ind_ord])

I think if we uniformize the dtype at the top of the function, then we can safely get rid of the byteswap at the end and reuse the existing reshape.

byteorder = gifti_endian_codes.npcode[darray.endian]
dtype = data_type_codes.type[darray.datatype].newbyteorder(endian)

I'm also noticing that the memmap created here does not respect index order:

nibabel/nibabel/gifti/parse_gifti_fast.py

Lines 97 to 103 in 27c2427

    
           newarr = np.memmap( 
        
               ext_fname, 
        
               dtype=dtype, 
        
               mode=mmap, 
        
               offset=darray.ext_offset, 
        
               shape=tuple(darray.dims), 
        
           )

So perhaps we want to go ahead and determine the shape and order at the top as well:

shape = tuple(darray.dims)
order = array_index_order_codes.npcode[darray.ind_ord]

Then we can use these consistently.

effigies · 2024-02-23T13:38:30Z

As to shape, I think we probably want to respect the actual shape as specified in the XML, not squeeze. We should update tests if this is not the case currently.

… blocks

effigies · 2024-02-23T15:11:54Z

Applied and tested my suggestions in pauldmccarthy#1

RF: Consistently apply data type, shape and index order

pauldmccarthy · 2024-02-26T08:55:44Z

@effigies Thanks! You evidently also noticed that the row/column ordering was not being applied as well 👍

TEST: Unit test for loading ASCII-encoded "flat" GIFTI data array. Currently failing RF: Make sure that ASCII-encoded DataArrays are returned with expected shape RF: Consistently apply data type, shape and index order in GIFTI data blocks TEST: Expect data arrays to be the advertised shapes

5.2.1 (Monday 26 February 2024) Bug-fix release in the 5.2.x series. Enhancements ------------ * Support "flat" ASCII-encoded GIFTI DataArrays (pr/1298) (PM, reviewed by CM) Bug fixes --------- * Tolerate missing ``git`` when reporting version info (pr/1286) (CM, reviewed by Yuri Victorovich) * Handle Siemens XA30 derived DWI DICOMs (pr/1296) (CM, reviewed by YOH and Mathias Goncalves) Maintenance ----------- * Add tool for generating GitHub-friendly release notes (pr/1284) (CM) * Accommodate pytest 8 changes (pr/1297) (CM) * tag '5.2.1': REL: 5.2.1 Backport gh-1296: Conditionally drop derived volumes from DWI sequences Backport gh-1298: Support "flat" ASCII-encoded GIFTI DataArrays Build(deps): Bump codecov/codecov-action from 3 to 4 Build(deps): Bump the actions-infrastructure group with 3 updates Backport gh-1297: Accommodate pytest 8 changes Backport gh-1286: Tolerate missing git MNT: Advertise Python 3.12 support DOC: Fix intersphinx mapping and reference type Backport gh-1284: Add tool for generating GitHub-friendly release notes

manifest-rules added 2 commits February 23, 2024 09:57

TEST: Unit test for loading ASCII-encoded "flat" GIFTI data array. Cu…

79792de

…rrently failing

RF: Make sure that ASCII-encoded DataArrays are returned with expecte…

6ffeeac

…d shape

effigies added 2 commits February 23, 2024 10:05

RF: Consistently apply data type, shape and index order in GIFTI data…

b46c829

… blocks

TEST: Expect data arrays to be the advertised shapes

afbcc88

effigies mentioned this pull request Feb 23, 2024

RF: Consistently apply data type, shape and index order pauldmccarthy/nibabel#1

Merged

Merge pull request #1 from effigies/mnt/reshape-gifti-ascii-data

5bcf012

RF: Consistently apply data type, shape and index order

effigies merged commit 0683b66 into nipy:master Feb 26, 2024
51 of 52 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RF: Support "flat" ASCII-encoded GIFTI DataArrays #1298

RF: Support "flat" ASCII-encoded GIFTI DataArrays #1298

pauldmccarthy commented Feb 23, 2024

codecov bot commented Feb 23, 2024 •

edited

Loading

effigies commented Feb 23, 2024

effigies commented Feb 23, 2024

effigies commented Feb 23, 2024

pauldmccarthy commented Feb 26, 2024

RF: Support "flat" ASCII-encoded GIFTI DataArrays #1298

RF: Support "flat" ASCII-encoded GIFTI DataArrays #1298

Conversation

pauldmccarthy commented Feb 23, 2024

codecov bot commented Feb 23, 2024 • edited Loading

Codecov Report

effigies commented Feb 23, 2024

effigies commented Feb 23, 2024

effigies commented Feb 23, 2024

pauldmccarthy commented Feb 26, 2024

codecov bot commented Feb 23, 2024 •

edited

Loading