Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix np.genfromtxt field name handling when dtype != None #4649

Closed
wants to merge 2 commits into from
Closed

Fix np.genfromtxt field name handling when dtype != None #4649

wants to merge 2 commits into from

Conversation

alanbriolat
Copy link
Contributor

np.genfromtxt validates field names twice: once in genfromtxt and once in easy_dtype. Whilst the arguments to genfromtxt are used in the first validation, they aren't passed to easy_dtype (which is used only when dtype != None) and therefore in this case the default validation (strip non-alphanum, replace spaces) gets confusingly applied, ignoring genfromtxt's arguments.

This patch adds failing tests for the issue and fixes genfromtxt by passing the appropriate arguments onwards to easy_dtype.

This is probably the least invasive way to fix the issue. In my opinion, the whole thing is a nest of poorly-defined responsibilities between genfromtxt, easy_dtype and NameValidator, especially since the latter two are only used in the former. I'm willing to take the time to try and clean that up if it's likely to be well-received.

@jaimefrio
Copy link
Member

It took me some time to realize that, if the current behavior was some form of intentional safeguard, it is very easily circumvented by simply setting dtype=None. If no one has any strong opposition, I would like to merge this, will give it a day or two for people to complain.

My only nitpick is whether the tests, rather than comparing against hardcoded values for the array and its dtype, should compare against the result of running np.genfromtxt on the same input, and with all arguments set identically, except for dtype=None.

matt-kendall added a commit to cedadev/cis that referenced this pull request Jan 15, 2015
… aeronet variable names (e.g. 'Datetime(dd-mm-yy)' was becoming 'Datetimeddmmyy'). See this incredibly frustrating bug:

numpy/numpy#4649
@charris
Copy link
Member

charris commented Jan 23, 2015

Squashed commits and rewrote commit message in #5459. Thanks @alanbriolat..

@charris charris closed this Jan 23, 2015
@charris
Copy link
Member

charris commented Jan 23, 2015

As you say, genfromtxt is a bit of a mess. Any work you want to do cleaning it up is welcome.

charris referenced this pull request Jan 24, 2015
BUG: Fix genfromtext NameValidator arguments passed to easy_dtype.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants