ENH: Allow genfromtxt to unpack structured arrays #16650

AndrewEckart · 2020-06-20T22:50:38Z

Previously, genfromtxt failed to transpose its output when
unpack=True and dtype was structured (or dtype=None and the
inferred dtype was structured). This patch resolves the issue by
returning a list of arrays, as in loadtxt. Includes tests and updates
the docstring to match loadtxt.

Fixes #4341

numpy/lib/npyio.py

rossbar

LGTM, just a couple minor things. Thanks @AndrewEckart !

numpy/lib/tests/test_io.py

numpy/lib/npyio.py

eric-wieser · 2020-06-26T17:06:55Z

numpy/lib/npyio.py

    if unpack:
-        return output.squeeze().T
-    return output.squeeze()
+        if names is not None and len(names) > 1:


Why the len(names) > 1 condition?

If the goal is to match squeeze, then len(names) != 1 would work better for when names == [].

I can change it, but I think names == [] necessarily implies output is empty (i.e. the input was an empty file).

Consider the following two calls:

np.genfromtxt(StringIO(''), dtype=None, unpack=True) np.genfromtxt(StringIO(''), dtype={'names': (), 'formats': ()}, unpack=True)

With len(names > 1), both will return an empty ndarray , but with len(names) != 1), both will return an empty list.

anirudh2290 · 2020-07-08T00:34:03Z

numpy/lib/tests/test_io.py

+    def test_unpack_auto_dtype(self):
+        # Regression test for gh-4341
+        # Unpacking should work when dtype=None
+        txt = TextIO("21 72\n35 58")


Overall, LGTM! would be also useful to add a test with a mix of integer and string to make sure that the field array dtypes are being discerned correctly.

WarrenWeckesser · 2020-07-11T18:17:21Z

numpy/lib/tests/test_io.py

+        # even if dtype is structured
+        txt = TextIO("1")
+        dt = {'names': ('a',), 'formats': ('<i4',)}
+        x = np.genfromtxt(txt, dtype=dt, unpack=True)


Here's what I get with this example (using StringIO instead of the internal TextIO):

In [37]: dt = np.dtype({'names': ('a',), 'formats': ('<i4',)}) In [38]: x = np.genfromtxt(StringIO('1'), dtype=dt, unpack=True) In [39]: x Out[39]: array((1,), dtype=[('a', '<i4')])

I don't think the dtype of the return value should still be the structured dtype in this case--unpack=True is supposed to unpack the fields. So I expect this output to be array(1, dtype=int32). In other words, the structure should still be pulled apart even when len(names) == 1.

Good point. I modified the logic to return the single unpacked ndarray when len(names) == 1 and a list of ndarrays when len(names > 1), which I think is a little bit cleaner. Technically, len(names) == 0) is still possible with an empty input file and a structured dtype, but this already gives a UserWarning.

numpy/lib/npyio.py

Previously, `genfromtxt` failed to transpose its output when `unpack=True` and `dtype` was structured (or `dtype=None` and the inferred `dtype` was structured). This patch resolves the issue by returning a list of arrays, as in `loadtxt`. Includes tests and updates the docstring to match `loadtxt`. Fixes numpy#4341

Co-authored-by: Eric Wieser <wieser.eric@gmail.com>

numpy/lib/npyio.py

seberg · 2020-07-15T19:11:01Z

I think this is probably fine, although did not look too careful. Maybe @WarrenWeckesser can push this through?

It seems that this does change behaviour fairly strongly? Probably the old behaviour can be defined as a clear bug here? @AndrewEckart could you comment briefly on that for me, and add a release note unless it was a loud error before? The release notes go into numpy/doc/release/upcoming_changes/ there is a Readme there to explain how to create one.

AndrewEckart · 2020-07-15T21:08:40Z

I think this is probably fine, although did not look too careful. Maybe @WarrenWeckesser can push this through?

It seems that this does change behaviour fairly strongly? Probably the old behaviour can be defined as a clear bug here? @AndrewEckart could you comment briefly on that for me, and add a release note unless it was a loud error before? The release notes go into numpy/doc/release/upcoming_changes/ there is a Readme there to explain how to create one.

I'm not sure how to characterize it. In #4341, it's described as a bug, and the old behavior is contrasted with loadtxt which correctly unpacks structured arrays. It wasn't a loud error, it just failed to unpack and returned as if you had passed unpack=False. I don't think it's a major change to make it behave as expected (and in parity with loadtxt), but I'm happy to update the release notes if you like.

seberg · 2020-07-15T21:12:27Z

Definitely needs a release note IMO. Someones code will probably break somewhere. We have to make a call that it is very few people...

AndrewEckart · 2020-07-15T21:25:46Z

Definitely needs a release note IMO. Someones code will probably break somewhere. We have to make a call that it is very few people...

Would you consider this compatibility or improvement?

seberg · 2020-07-15T21:32:32Z

Lets put it under compat, since the actual improvement is pretty straight forward (and arguably even a bug fix).

numpy/lib/tests/test_io.py

seberg · 2020-08-03T17:24:55Z

doc/release/upcoming_changes/16650.compatibility.rst

+one for each column::
+
+    >>> np.genfromtxt(StringIO("1 2.0 \n 3 4.0"), unpack=True)
+    [array([1, 3]), array([2., 4.])]]


I think this example requires a structured dtype= being passed in? It may be good to point/show an example which triggers the changed behaviour without passing in dtype= (if that is possible).

(i.e. I tried the example on this branch, and the behaviour is not as written here – also there is an additional closing bracket)

Sorry about that, it looks like the numbers in my example were too small so the inferred dtype was '<u4'. I replaced it with the test case for dtype=None.

Maybe I have to take a closer look at the tests, or am I running the wrong code? But it seems to me that this gives:

In [2]: from io import StringIO In [3]: np.genfromtxt(StringIO("21 58.0 \n 35 72.0"), unpack=True) Out[3]: array([[21., 35.], [58., 72.]])

OTOH, this one changed behaviour as expected:

In [5]: np.genfromtxt(StringIO("21 58.0 \n 35 72.0"), unpack=True, dtype="i,d")

so, right now I think the example is still incorrect, and if there are tests for this, they should probably be refined to check that the result is (correctly, I assume) a single array in the first case, since it should be a single array when there is no structured dtype detected (which seems to be the default?).

Unless this should change and I am missing something?

Nope, you were running the right code; it turns out that the example was incorrect. I forgot that detection of a structured dtype relies on passing dtype=None to override the default dtype=float (otherwise, a single array is the correct return value). My example in the release note failed to specify the dtype argument. Should be correct now.

mattip · 2020-08-10T07:34:26Z

Unfortunately CircleCI does not merge-from-master before building. Could you either merge from master or rebase to clear the non-related failure?

rossbar

I took the liberty of expanding the release note a bit. There are a couple things that need addressing in the updated tests

numpy/lib/tests/test_io.py

Changes introduced after initial approval

rossbar

LGTM - thanks for sticking with this @AndrewEckart !

emirkmo · 2022-01-19T15:05:59Z

Hi, this change means that genfromtxt now ignores names=('column1','column2') when unpack=True is passed. Instead of giving a named/rec array as before, a list of arrays with no names is given. Is this the correct behavior? It means that passing the names explicitly into genfromtxt does nothing and the output is not an array of arrays/columns, which was very useful. Lists can't be indexed the same way. Can I be guaranteed that the returned list is in the right order so I can manually assign the names? Should I open an issue about this?

The example I am thinking of is, say we have a simple csv or space delimited document with 4 columns:

val1 val2 valignore1 valignore2
val3 val4 valignore3 valignore4
val5 val6 valignore5 valignore6

I read this in using np.genfromtxt(data.dat, unpack=True, names=('column1', 'column2'), usecols=(0,1)), the result is a list and the names are ignored. Previously, the names would be attached and the result would be a two column array that can be indexed with the given names. The list obviously can't be.

Now we get [array(val1, val3, val5), array(val1, val3, val5)], previously it would be an array of arrays with names (used to be called recarray I think but maybe that name has changed).

eric-wieser reviewed Jun 21, 2020

View reviewed changes

numpy/lib/npyio.py Outdated Show resolved Hide resolved

AndrewEckart force-pushed the fix/4341 branch from 6b33b2b to 886c386 Compare June 21, 2020 12:07

charris changed the title ~~BUG: Allow genfromtxt to unpack structured arrays~~ ENH: Allow genfromtxt to unpack structured arrays Jun 21, 2020

charris added 01 - Enhancement component: numpy.lib labels Jun 21, 2020

rossbar previously approved these changes Jun 26, 2020

View reviewed changes

numpy/lib/tests/test_io.py Outdated Show resolved Hide resolved

numpy/lib/npyio.py Outdated Show resolved Hide resolved

eric-wieser reviewed Jun 26, 2020

View reviewed changes

AndrewEckart force-pushed the fix/4341 branch from 886c386 to 357d8cc Compare June 26, 2020 18:03

anirudh2290 reviewed Jul 8, 2020

View reviewed changes

WarrenWeckesser reviewed Jul 11, 2020

View reviewed changes

AndrewEckart force-pushed the fix/4341 branch from 357d8cc to f5b57c3 Compare July 11, 2020 21:46

eric-wieser reviewed Jul 12, 2020

View reviewed changes

numpy/lib/npyio.py Show resolved Hide resolved

AndrewEckart force-pushed the fix/4341 branch from f5b57c3 to 189ad50 Compare July 12, 2020 14:58

mattip and others added 2 commits July 14, 2020 10:37

MAINT: add comment

798fa60

Co-authored-by: Eric Wieser <wieser.eric@gmail.com>

comment

44d26af

seberg reviewed Jul 15, 2020

View reviewed changes

numpy/lib/npyio.py Outdated Show resolved Hide resolved

Apply suggestions from code review

03924a1

Merge branch 'fix/4341' of github.com:AndrewEckart/numpy into fix/4341

1f6d7e6

Add release note

ee35155

rossbar reviewed Jul 29, 2020

View reviewed changes

numpy/lib/tests/test_io.py Outdated Show resolved Hide resolved

rossbar self-requested a review July 29, 2020 20:30

Remove byteorder from dtypes in tests

fec9b64

seberg reviewed Aug 3, 2020

View reviewed changes

Fix release note

bcf12b4

Must specify dtype=None for automatic unpacking

f9e245c

AndrewEckart and others added 2 commits August 10, 2020 10:30

Merge branch 'master' into fix/4341

596c1bf

DOC: minor touchups to release note.

aaf78bd

rossbar reviewed Aug 10, 2020

View reviewed changes

numpy/lib/tests/test_io.py Outdated Show resolved Hide resolved

numpy/lib/tests/test_io.py Outdated Show resolved Hide resolved

AndrewEckart added 2 commits August 10, 2020 15:18

Feedback from code review

2e809b7

Merge branch 'fix/4341' of github.com:AndrewEckart/numpy into fix/4341

6b81583

AndrewEckart requested a review from rossbar September 5, 2020 22:05

rossbar approved these changes Sep 11, 2020

View reviewed changes

rossbar merged commit 3329d26 into numpy:master Sep 11, 2020

charris mentioned this pull request Oct 10, 2020

REL: PRs that may need release notes #17533

Closed

20 tasks

seberg mentioned this pull request Jan 19, 2022

BUG: genfromtxt now ignores names=('column1','column2') when unpack=True #20855

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH: Allow genfromtxt to unpack structured arrays #16650

ENH: Allow genfromtxt to unpack structured arrays #16650

AndrewEckart commented Jun 20, 2020

rossbar left a comment

eric-wieser Jun 26, 2020

eric-wieser Jun 26, 2020

AndrewEckart Jun 26, 2020

anirudh2290 Jul 8, 2020

WarrenWeckesser Jul 11, 2020

AndrewEckart Jul 11, 2020 •

edited

seberg commented Jul 15, 2020

AndrewEckart commented Jul 15, 2020

seberg commented Jul 15, 2020

AndrewEckart commented Jul 15, 2020

seberg commented Jul 15, 2020

seberg Aug 3, 2020

seberg Aug 3, 2020

AndrewEckart Aug 3, 2020

seberg Aug 3, 2020

AndrewEckart Aug 3, 2020

mattip commented Aug 10, 2020

rossbar left a comment

rossbar left a comment

emirkmo commented Jan 19, 2022 •

edited

ENH: Allow genfromtxt to unpack structured arrays #16650

ENH: Allow genfromtxt to unpack structured arrays #16650

Conversation

AndrewEckart commented Jun 20, 2020

rossbar left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

AndrewEckart Jul 11, 2020 • edited

Choose a reason for hiding this comment

seberg commented Jul 15, 2020

AndrewEckart commented Jul 15, 2020

seberg commented Jul 15, 2020

AndrewEckart commented Jul 15, 2020

seberg commented Jul 15, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mattip commented Aug 10, 2020

rossbar left a comment

Choose a reason for hiding this comment

rossbar left a comment

Choose a reason for hiding this comment

emirkmo commented Jan 19, 2022 • edited

AndrewEckart Jul 11, 2020 •

edited

emirkmo commented Jan 19, 2022 •

edited