BUG: Find dtype in genfromtxt for single column of data #16908

millen1m · 2020-07-20T01:33:46Z

BUG: Find dtype in genfromtxt for single column of data

When a single column of data is given to genfromtxt with names=True,
StringConverter was not given the right dtype.
This bug caused version 1.19 to fail to load single-column text files.
However, a fix for the load failure has already been provided in
the master branch, so the dtype is now found.
The fix proposed here is consistent with the 2D behaviour.

tylerjereddy · 2020-07-23T02:41:00Z

numpy/lib/tests/test_io.py

+        mtest = np.genfromtxt(TextIO(data), skip_header=1, delimiter=",", names=True, usecols=0)
+        # Note that squeeze doesn't work when specifying names
+        ctrl = np.array([(5,), (6,)],
+                        dtype=[('a_header', int)])


Without commenting on the nature of the regression, this passed for me locally with this minor adjustment:

         ctrl = np.array([(5,), (6,)],
-                        dtype=[('a_header', int)])
+                        dtype=[('a_header', float)])
         assert_equal(mtest, ctrl)

millen1m · 2020-07-23

Yes, sorry, the correction you have applied is needed. Actually, the test passes even without the change that I have made. But at line 1971, which begins
converters = [StringConverter(dtype, locked=True,
the dtype parameter has the header name in it, so the StringConverter does not find the right type: converter[0].type is void, whereas it should be float. This only happens when you have a single column. For multiple columns of data, the dtype_flat parameter is passed in instead, which has the header names removed.
The correction I have provided makes the single-column case consistent with the multiple-column approach, so the data type is found within the StringConverter object.
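
To illustrate the void-versus-float distinction described above, here is a minimal sketch that uses only public NumPy dtype attributes. It does not reproduce the internals of genfromtxt or StringConverter, and the variable names (named_dtype, flat_dtypes) are made up for this example.

```python
import numpy as np

# A structured dtype that carries the header name, similar in shape to the
# dtype built for a single named column.
named_dtype = np.dtype([("a_header", float)])

# The scalar type of a structured dtype is np.void, which says nothing
# about what the column actually contains.
print(named_dtype.type)

# Taking each field's own dtype drops the name and exposes the real
# scalar type of the column.
flat_dtypes = [named_dtype[name] for name in named_dtype.names]
print(flat_dtypes[0].type)
```

A converter that only looks at the scalar type of the named dtype would see void, while one given the per-field dtype would see a float type.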

rossbar · 2022-02-23

I think this test is incorrect: note that the default value for dtype is float, so you should expect a float type here.

The dtype-inference machinery is triggered with dtype=None, which gives the expected result:

>>> np.genfromtxt(
...     StringIO(data), skip_header=1, delimiter=",", names=True, usecols=0, dtype=None
... )
array([(5,), (6,)], dtype=[('a_header', '<i8')])

AFAICT the regression has been fixed and the current behavior is correct --- I think this can be closed (unless I'm missing something, @millen1m).
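
For comparison, a small self-contained sketch of the two behaviours discussed in this thread. The contents of data are an assumption (the test's actual input is not shown here), chosen only so that the first column is named a_header and holds the values 5 and 6. It assumes a NumPy version in which the regression is already fixed.

```python
from io import StringIO

import numpy as np

# Hypothetical input: one line to skip, a header line, then two data rows.
data = "this line is skipped\na_header,b_header\n5,7\n6,8\n"

# Default dtype (float): the selected column is read as floating point.
as_float = np.genfromtxt(
    StringIO(data), skip_header=1, delimiter=",", names=True, usecols=0
)

# dtype=None: the type of the column is inferred from its contents.
inferred = np.genfromtxt(
    StringIO(data), skip_header=1, delimiter=",", names=True, usecols=0, dtype=None
)

print(as_float.dtype)
print(inferred.dtype)
```

In both cases the result is a structured array with a single a_header field; only the field's type differs.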

millen1m · 2022-02-23T20:48:50Z

Yes - the regression has been fixed in another way - can close. Cheers.

tylerjereddy added the component: numpy.lib label Jul 23, 2020

tylerjereddy reviewed Jul 23, 2020

Base automatically changed from master to main March 4, 2021 02:05

github-actions bot added the 00 - Bug label Mar 4, 2021

InessaPawson added the triage review Issue/PR to be discussed at the next triage meeting label Feb 23, 2022

millen1m closed this Feb 23, 2022

InessaPawson added triaged Issue/PR that was discussed in a triage meeting and removed triage review Issue/PR to be discussed at the next triage meeting labels Feb 24, 2022

millen1m commented Jul 20, 2020

tylerjereddy Jul 23, 2020

millen1m Jul 23, 2020

rossbar Feb 23, 2022

millen1m commented Feb 23, 2022
