BUG: Fix of handle missing CSV MI column names #23484

gfyoung · 2018-11-04T09:06:22Z

De-hackifies this hack:

Lines 3206 to 3211 in d78bd7a

    
    # hack
    if (isinstance(index_names[0], compat.string_types) and
            'Unnamed' in index_names[0]):
        index_names[0] = None

    return index_names, columns, index_col

Setup:

from pandas.compat import StringIO
from pandas import read_csv

data = ",,col\na,c,1\na,d,2\nb,c,3\nb,d,4"
print(read_csv(StringIO(data), index_col=[0, 1]))

data = "NotReallyUnnamed,Unnamed: 0,col\na,c,1\na,d,2\nb,c,3\nb,d,4"
print(read_csv(StringIO(data), index_col=[0, 1]))

Before:

# Why is only `index_names[0]` replaced with `None`?
              col
  Unnamed: 1
a c             1
  d             2
b c             3
  d             4

# Having "Unnamed" in the name doesn't make it replace-able ?
# (this is also surfacing the `index_names[0]` bug too)
              col
  Unnamed: 0
a c             1
  d             2
b c             3
  d             4

After:

# All placeholder names get dropped.
              col
a c             1
  d             2
b c             3
  d             4

# Non-placeholder names never get dropped.
                             col
NotReallyUnnamed Unnamed: 0
a                c             1
                 d             2
b                c             3
                 d             4

pep8speaks · 2018-11-04T09:06:25Z

Hello @gfyoung! Thanks for submitting the PR.

There are no PEP8 issues in the file pandas/io/parsers.py !
There are no PEP8 issues in the file pandas/tests/io/parser/index_col.py !

pandas/io/parsers.py

codecov · 2018-11-04T22:10:39Z

Codecov Report

Merging #23484 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #23484      +/-   ##
==========================================
+ Coverage   92.23%   92.23%   +<.01%     
==========================================
  Files         161      161              
  Lines       51197    51204       +7     
==========================================
+ Hits        47220    47227       +7     
  Misses       3977     3977

Flag         Coverage Δ
#multiple    90.61% <100%> (ø)     ⬆️
#single      42.27% <64.7%> (ø)    ⬆️

Impacted Files          Coverage Δ
pandas/io/parsers.py    95.62% <100%> (+0.01%)    ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Before, only the first index name got replaced with `None` so long as it had the string "Unnamed" in it. Now we replace all index names with `None` if they were deliberately set with placeholders.

gfyoung · 2018-11-05T07:12:30Z

@jreback : I simplified the logic a little bit, but I still needed unnamed_count in some cases because it's computed on a per-iteration basis of a for-loop, whereas unnamed_cols is a global collection. Everything is still green though. PTAL.

jreback · 2018-11-06T03:22:38Z

pandas/_libs/parsers.pyx

@@ -786,6 +793,9 @@ cdef class TextReader:
                            name = '%s.%d' % (name, count)
                            count = counts.get(name, 0)

+                    if old_name == '':


It seems like you could just add `unnamed_cols.add(name)` at line 774 (e.g. after the `if name == ''`)?

gfyoung · 2018-11-06

Not quite. You need to add the name as it stands after mangling (e.g. if there are duplicates). That's why the tracking has to happen after the logic ending at line 794.
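
Why the placeholder has to be recorded after mangling can be shown with a small pure-Python sketch. It is loosely modelled on the loop quoted in the diff above but is not the Cython code in `pandas/_libs/parsers.pyx`; the function name `build_header_names` is hypothetical.

```python
def build_header_names(raw_names):
    """Name empty header cells, de-duplicate, and track placeholders.

    Hypothetical sketch: returns the final column names together with the
    set of names that were generated for empty header cells.
    """
    counts = {}
    names = []
    unnamed_cols = set()
    for i, old_name in enumerate(raw_names):
        # An empty header cell gets a positional placeholder.
        name = old_name if old_name != "" else "Unnamed: %d" % i

        # Mangle duplicates: "x" becomes "x.1", "x.2", ...
        count = counts.get(name, 0)
        while count > 0:
            counts[name] = count + 1
            name = "%s.%d" % (name, count)
            count = counts.get(name, 0)

        # Record the placeholder only now, using the post-mangling name.
        if old_name == "":
            unnamed_cols.add(name)

        counts[name] = count + 1
        names.append(name)
    return names, unnamed_cols


# The file literally names its first column "Unnamed: 1"; the second cell
# is empty, so its placeholder collides and is mangled.
names, unnamed_cols = build_header_names(["Unnamed: 1", "", "col"])
print(names)         # → ['Unnamed: 1', 'Unnamed: 1.1', 'col']
print(unnamed_cols)  # → {'Unnamed: 1.1'}
```

Had the placeholder been recorded before the de-duplication loop, the set would contain "Unnamed: 1", which in this input is the name the file itself supplied for the first column.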

jreback · 2018-11-06T13:08:16Z

thanks!

gfyoung added Bug IO CSV read_csv, to_csv MultiIndex labels Nov 4, 2018

gfyoung added this to the 0.24.0 milestone Nov 4, 2018

jreback requested changes Nov 4, 2018

gfyoung force-pushed the multi-index-column-names branch from 41ef255 to 2668351 Compare November 4, 2018 22:10

BUG: Fix of handle missing CSV MI column names

e18fef9

Before, only the first index name got replaced with `None` so long as it had the string "Unnamed" in it. Now we replace all index names with `None` if they were deliberately set with placeholders.

gfyoung force-pushed the multi-index-column-names branch from 2668351 to e18fef9 Compare November 4, 2018 23:41

jreback requested changes Nov 6, 2018

jreback approved these changes Nov 6, 2018

jreback merged commit 819ee75 into pandas-dev:master Nov 6, 2018

gfyoung deleted the multi-index-column-names branch November 7, 2018 06:15

gfyoung mentioned this pull request Nov 14, 2018

BUG: 'Unnamed' != unnamed column in CSV #23687

Merged

JustinZhengBC pushed a commit to JustinZhengBC/pandas that referenced this pull request Nov 14, 2018

BUG: Fix of handle missing CSV MI column names (pandas-dev#23484)

079f632

gfyoung mentioned this pull request Nov 19, 2018

BUG: Don't warn if default conflicts with dialect #23775

Merged

tm9k1 pushed a commit to tm9k1/pandas that referenced this pull request Nov 19, 2018

BUG: Fix of handle missing CSV MI column names (pandas-dev#23484)

d2f34db

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

BUG: Fix of handle missing CSV MI column names (pandas-dev#23484)

efe6065

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

BUG: Fix of handle missing CSV MI column names (pandas-dev#23484)

9b8294b
