Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TestFileSet test failures on i686 architecture #1743

Closed
ghost opened this issue Dec 7, 2022 · 2 comments · Fixed by #1864
Closed

TestFileSet test failures on i686 architecture #1743

ghost opened this issue Dec 7, 2022 · 2 comments · Fixed by #1864

Comments

@ghost
Copy link

ghost commented Dec 7, 2022

Describe the bug

2 TestFileSet tests fail on i686 (32-bit x86) architecture with an OverflowError.
FAILED pydicom/tests/test_fileset.py::TestFileSet_Modify::test_write_file_id
FAILED pydicom/tests/test_fileset.py::TestFileSet_Copy::test_file_id

Expected behavior

Test behavior should be consistent across architectures.

Steps To Reproduce

Run

pytest pydicom/tests/test_fileset.py 

in the pydicom directory on an i686 machine.
Virtualizing i686 using QEMU also reproduces the issue.

=================================== FAILURES ===================================
____________________ TestFileSet_Modify.test_write_file_id _____________________

self = <pydicom.tests.test_fileset.TestFileSet_Modify object at 0xe87f1430>
tiny = Dataset.file_meta -------------------------------
(0002, 0000) File Meta Information Group Length  UL: 200
(0002, 0001...Syntax UID in F UI: Explicit VR Little Endian
   (0020, 0013) Instance Number                     IS: '49'
   ---------

    def test_write_file_id(self, tiny):
        """Test that the File IDs character sets switch correctly."""
        tdir, ds = temporary_fs(tiny)
    
        def my_len(self):
            return 10**6 + 1
    
        FileSet.__len__ = my_len
        fs = FileSet(ds)
        assert 10**6 + 1 == len(fs)
        ds, paths = write_fs(fs)
        instance = fs._instances[-1]
        # Was written with alphanumeric File IDs
        assert "IM00001D" in instance.path
    
        def my_len(self):
            return 36**6 + 1
    
        FileSet.__len__ = my_len
        fs = FileSet(ds)
>       assert 36**6 + 1 == len(fs)
E       OverflowError: cannot fit 'int' into an index-sized integer

pydicom/tests/test_fileset.py:2249: OverflowError
________________________ TestFileSet_Copy.test_file_id _________________________

self = <pydicom.tests.test_fileset.TestFileSet_Copy object at 0xe837f370>
tiny = Dataset.file_meta -------------------------------
(0002, 0000) File Meta Information Group Length  UL: 200
(0002, 0001...Syntax UID in F UI: Explicit VR Little Endian
   (0020, 0013) Instance Number                     IS: '49'
   ---------
tdir = <TemporaryDirectory '/tmp/guix-build-python-pydicom-2.3.1.drv-0/tmpcki9joul'>

    def test_file_id(self, tiny, tdir):
        """Test that the File IDs character sets switch correctly."""
        def my_len(self):
            return 10**6 + 1
    
        FileSet.__len__ = my_len
        fs = FileSet(tiny)
        assert 10**6 + 1 == len(fs)
        fs, ds, paths = copy_fs(fs, tdir.name)
        instance = fs._instances[-1]
        # Was written with alphanumeric File IDs
        assert "IM00001D" in instance.path
    
        def my_len(self):
            return 36**6 + 1
    
        FileSet.__len__ = my_len
        fs = FileSet(tiny)
>       assert 36**6 + 1 == len(fs)
E       OverflowError: cannot fit 'int' into an index-sized integer

pydicom/tests/test_fileset.py:2580: OverflowError

Your environment

module       | version
------       | -------
platform     | Linux-6.0.7-i686-with-glibc2.33
Python       | 3.9.9 (main, Jan  1 1970, 00:00:01)  [GCC 10.3.0]
pydicom      | 2.3.1
gdcm         | _module not found_
jpeg_ls      | _module not found_
numpy        | 1.21.6
PIL          | 9.2.0
pylibjpeg    | _module not found_
openjpeg     | _module not found_
libjpeg      | _module not found_
@darcymason
Copy link
Member

This is strange - I don't understand why, in Python 3 (which uses unlimited ints), any int value should cause an overflow. Although I see that 36**6 is just over half of 2**32, so if a 2-byte pointer were used, then one would not be able to index into a list/iterable with a len that long. (... But then why is it two bytes (16-bit) per pointer...?) Also, I don't know why the limit for FileSets is this 36**6 number to begin with.

Can you try just removing the assert statements? I don't understand why they are even there, as clearly the setting of __len__ just before ensures that number. Removing them will probably just move the error into FileSet.write and FileSet.copy. If so, as a workaround, I'd suggest just removing the checks for > 36**6 in those functions, unless you suspect that you may actually reach that kind of number.

If we can understand the underlying reasons, I'd support adding a check for 32-bit platform to the FileSet code and/or tests. Perhaps the 36**6 limit needs to be reduced on 32-bit platform, or at least on this particular one.

@scaramallion
Copy link
Member

scaramallion commented Aug 8, 2023

36**6 is [A-Z][0-9] for the 6 characters (out of 8 total) available for filenames. The asserts just check that the expected number of instances is correct, the real test is after that (I tend to do this in tests to make sure I'm testing what I expect to test since experience has shown that I sometimes get a passing test for the wrong reason).

sys.maxsize for a 32-bit system is 2**31 - 1, as int is signed.

The upper limit on file-set size is very unlikely to be reached even for just [0-9], but I thought it would be a good idea to at least make some effort at testing it. In practice you'd probably see memory issues long before the theoretical maximum got hit.

I could just change it to be in the range [A-Z] if [0-9] isn't enough.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants