Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Find dtype in genfromtxt for single column of data #16908

Closed
wants to merge 1 commit into from

Conversation

millen1m
Copy link

BUG: Find dtype in genfromtxt for single column of data

When single column of data is given to genfromtxt and names=True,
StringConverter was not given the right dtype.
The bug caused version 1.19 to fail to load single column text files.
However, a fix for the failure to load has already been provided in
the master branch, so that the dtype is found.
The fix proposed here is consistent with the 2D behaviour.

…, StringConverter was not given the right dtype.

The bug caused version 1.19 to fail to load single column text files. However, a fix for the failure to load has already been provided in the master branch, so that the dtype is found. The fix proposed here is consistent with the 2D behaviour.
mtest = np.genfromtxt(TextIO(data), skip_header=1, delimiter=",", names=True, usecols=0)
# Note that squeeze doesn't work when specifying names
ctrl = np.array([(5,), (6,)],
dtype=[('a_header', int)])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without commenting on the nature of the regression, this passed for me locally with this minor adjustment:

         ctrl = np.array([(5,), (6,)],
-                        dtype=[('a_header', int)])
+                        dtype=[('a_header', float)])
         assert_equal(mtest, ctrl)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sorry the correction you have applied is needed. Actually the test passes without the change that I have made. But at line 1971
converters = [StringConverter(dtype, locked=True, parameter dtype has the header name in it and therefore the StringConverter does not find the right type and the converter[0].type is void, whereas it should be float. This only happens when you have a single column. While for multiple columns of data the dtype_flat parameter is passed in, which has the header names removed.
The correction I have provided makes it consistent with the multiple column approach and means the data type is found within the StringConverter object.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this test is incorrect: note that the default value for dtype is float, so you should expect a float type here.

The dtype-inference-machinery is triggered with dtype=None, which gives the expected result:

>>> np.genfromtxt(
...     StringIO(data), skip_header=1, delimiter=",", names=True, usecols=0, dtype=None
... )
array([(5,), (6,)], dtype=[('a_header', '<i8')])

AFAICT the regression has been fixed and the current behavior is correct --- I think this can be closed (unless I'm missing something @millen1m )

Base automatically changed from master to main March 4, 2021 02:05
@InessaPawson InessaPawson added the triage review Issue/PR to be discussed at the next triage meeting label Feb 23, 2022
@millen1m
Copy link
Author

Yes - the regression has been fixed in another way - can close. Cheers.

@millen1m millen1m closed this Feb 23, 2022
@InessaPawson InessaPawson added triaged Issue/PR that was discussed in a triage meeting and removed triage review Issue/PR to be discussed at the next triage meeting labels Feb 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
00 - Bug component: numpy.lib triaged Issue/PR that was discussed in a triage meeting
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants