ERR: don't allow ambiguous usecols #12678

jreback · 2016-03-20T20:46:05Z

usecols=['a',1] should raise


gfyoung · 2016-03-20T20:50:01Z

@jreback : I'll tackle this one as a follow-up in terms of fixing documentation and enforcement. In the meantime, I'll remove the tests I added that had the ambiguous columns.

jorisvandenbossche · 2016-03-20T20:59:33Z

Shouldn't we also just raise on duplicates in usecols? E.g. ['a', 'a']

gfyoung · 2016-03-20T21:04:27Z

@jorisvandenbossche : usecols is a set, so you won't ever have such a problem.

jreback · 2016-03-20T21:06:00Z

#11822 and #11823 converts it to a list.

though if we allow only non-duplicated and make it a set operation against names...

though the current docstring says it's array-like

jorisvandenbossche · 2016-03-20T21:08:28Z

@gfyoung aha, I didn't know that, but then adding tests for the behaviour of duplicate values in usecols, as we are doing in #11882, makes no sense?

jreback · 2016-03-20T21:12:49Z

what we could do is:

1. Don't allow mixed integers and strings (e.g. 'a', 1), as these are ambiguous.
2. Still use a set for usecols, so duplicates are gone, but handle this as a set-selection operation against .names/columns (IOW, if names is passed or a header is read in). If there are duplicates there, that's OK; usecols just sub-selects. Duplicate handling will happen solely there.
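
The first point can be sketched as a small standalone check. This is only an illustration of the rule proposed above, not the code that ended up in pandas: the function name `validate_usecols` and the error message are hypothetical.

```python
def validate_usecols(usecols):
    """Return usecols as a set, rejecting a mix of integers and strings.

    Hypothetical helper: the name and the error text are assumptions
    made for illustration, not pandas API.
    """
    cols = set(usecols)  # duplicates such as ['a', 'a'] collapse here
    has_int = any(isinstance(c, int) for c in cols)
    has_str = any(isinstance(c, str) for c in cols)
    if has_int and has_str:
        # ['a', 1] could mean "column named 'a' plus position 1" or
        # "columns labelled 'a' and 1", so refuse to guess.
        raise ValueError(
            "usecols must be all integers or all strings, not a mixture"
        )
    return cols


print(validate_usecols(['a', 'a']))  # {'a'}
```

Calling `validate_usecols(['a', 1])` raises `ValueError` under this sketch, which is the behaviour requested at the top of the issue.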

gfyoung · 2016-03-20T21:19:59Z

@jreback : What do you mean by "duplicate" handling? Is that what #11823 will be doing?

jreback · 2016-03-20T21:23:17Z

This is on master.

In [12]: pd.read_csv(StringIO("""1,2,3"""), engine='c', header=None, 
   ....:             names=['a', 'b', 'a'], usecols=['a','a'])
Out[12]: 
   a  a
0  1  1

In [13]: pd.read_csv(StringIO("""1,2,3"""), engine='python', header=None, 
   ....:             names=['a', 'b', 'a'], usecols=['a','a'])
Out[13]: 
   a
0  1

but I think that these should BOTH output

   a   a
0  1   3

IOW, usecols is just a filter applied as a set (so it's 'a' in this case).

Then names takes over and you get the 0th and 2nd columns (that are named 'a')
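
A minimal pure-Python sketch of that proposed selection, assuming a single parsed row and no pandas at all; `select_columns` is a hypothetical name used only to illustrate the idea:

```python
def select_columns(row, names, usecols):
    """Filter one parsed row by treating usecols as a set of names.

    Hypothetical helper illustrating the proposal: every position whose
    name is in the set is kept, so duplicated names all survive.
    """
    wanted = set(usecols)  # ['a', 'a'] becomes {'a'}
    keep = [i for i, name in enumerate(names) if name in wanted]
    return [names[i] for i in keep], [row[i] for i in keep]


header, values = select_columns([1, 2, 3], ['a', 'b', 'a'], ['a', 'a'])
print(header, values)  # ['a', 'a'] [1, 3]
```

Here the duplicate in usecols has no effect, while the duplicate in names selects both the 0th and 2nd columns, matching the expected output above.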

jreback · 2016-03-20T21:24:13Z

cc @sxwang

gfyoung · 2016-03-20T21:26:51Z

@jreback : AFAICT, such behaviour will be fixed in #11882 right?

jorisvandenbossche · 2016-03-20T21:29:16Z

@jreback I agree

@gfyoung Well, the PR for that issue is #11882, and it does not implement this behaviour yet (it is still under review)

gfyoung · 2016-03-20T21:48:38Z

Okay, but in terms of allocation, that issue should be tackled there, and I could just handle enforcing non-ambiguous usecols, right?

jorisvandenbossche · 2016-03-20T21:48:59Z

@gfyoung yes, that's right!

jreback · 2016-03-20T21:49:33Z

yep that sounds right. let's restrict this issue to ambiguous errors

Enforces the fact that 'usecols' must either be all integers (indexing) or strings (column names), as mixtures of the two are ambiguous. Closes gh-12678.

jreback added the "Error Reporting" (incorrect or improved errors from pandas) and "IO CSV" (read_csv, to_csv) labels Mar 20, 2016

jreback added this to the 0.18.1 milestone Mar 20, 2016

jreback mentioned this issue Mar 20, 2016

BUG: Fix parse_dates processing with usecols and C engine #12512

Closed

jreback mentioned this issue Mar 20, 2016

BUG: dataframe loading with duplicated columns and usecols #11823 #11882

Closed

gfyoung added a commit to forking-repos/pandas that referenced this issue Apr 6, 2016

BUG: Prevent mixed-typed usecols

f0543a4

Enforces the fact that 'usecols' must either be all integers (indexing) or strings (column names), as mixtures of the two are ambiguous. Closes gh-12678.

jreback closed this as completed in c6c201e Apr 6, 2016
