BUG: DataFrame.insert with allow_duplicates=True fails when already duplicates present #14291

mbochk · 2016-09-23T14:04:17Z (edited by jorisvandenbossche)

With `DataFrame.insert`, the `allow_duplicates` option works, but only once.
When I already have 2 columns with the same name, adding a third one throws

ValueError: Wrong number of items passed 2, placement implies 1

Code Sample, a copy-pastable example if possible

```python
import pandas as pd

a = pd.DataFrame()
a.insert(0, "qwe", [1, 2, 3, 4], allow_duplicates=True)
a.insert(0, "qwe", [1, 2, 3, 4], allow_duplicates=True)
a.insert(0, "qwe", [1, 2, 3, 4], allow_duplicates=True)  # raises the ValueError above
```

Expected Output

```
   qwe  qwe  qwe
0    1    1    1
1    2    2    2
2    3    3    3
3    4    4    4
```
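With a pandas release that contains the fix (the issue was milestoned for 0.19.1 and closed by commit 2e77536), the reproducer should yield exactly the frame above. The snippet below is a small self-check along those lines; it assumes such a fixed pandas version is installed and is not the regression test that was added upstream.

```python
import pandas as pd

# Reproducer from the report: insert the same column name three times.
a = pd.DataFrame()
a.insert(0, "qwe", [1, 2, 3, 4], allow_duplicates=True)
a.insert(0, "qwe", [1, 2, 3, 4], allow_duplicates=True)
a.insert(0, "qwe", [1, 2, 3, 4], allow_duplicates=True)  # failed on 0.18.1

# On a fixed version, every insert adds exactly one column.
print(a.shape)          # (4, 3)
print(list(a.columns))  # ['qwe', 'qwe', 'qwe']
```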

output of `pd.show_versions()`

```
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: ru_RU

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 25.1.6
Cython: 0.24.1
numpy: 1.11.1
scipy: 0.18.0
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: 1.4.6
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.2
lxml: 3.6.4
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: 0.7.6.None
psycopg2: None
jinja2: 2.8
boto: 2.40.0
pandas_datareader: None
```


jorisvandenbossche · 2016-09-23T14:20:44Z

@mbochk That looks like a bug indeed. Thanks for reporting.

shawnheide · 2016-10-04T03:02:43Z

I looked into this and discovered that the problem is in `frame.py`, in the `_sanitize_column` method. Here's the relevant code:

```python
# broadcast across multiple columns if necessary
if key in self.columns and value.ndim == 1:
    if (not self.columns.is_unique or
            isinstance(self.columns, MultiIndex)):
        existing_piece = self[key]
        if isinstance(existing_piece, DataFrame):
            value = np.tile(value, (len(existing_piece.columns), 1))
```

On the third call to `insert`, `existing_piece` is a DataFrame holding the two previously inserted columns, so `value` is tiled into a 2-D array with one row per existing column. I'm not sure how to fix this, though, as I don't understand why the values are being broadcast in the first place. Any thoughts?
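The broadcast step can be reproduced with NumPy alone. The sketch below mimics what the quoted branch does on the third insert, assuming (as in the report) that two `qwe` columns of length 4 already exist: `np.tile` turns the 1-D value into a 2-D array with one row per existing duplicate, so two items are handed over where the insert position has room for one, which lines up with "Wrong number of items passed 2, placement implies 1". On the second insert the column index is still unique, so the branch is skipped, which is why the first duplicate works.

```python
import numpy as np

value = np.array([1, 2, 3, 4])  # the 1-D column passed to insert()
n_existing = 2                  # "qwe" columns already present before the third insert

# Same call as in the _sanitize_column excerpt: repeat the values once per duplicate.
tiled = np.tile(value, (n_existing, 1))

print(value.ndim, tiled.shape)  # 1 (2, 4)
print(tiled)
# [[1 2 3 4]
#  [1 2 3 4]]
```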

jorisvandenbossche · 2016-10-04T14:12:34Z

This broadcasting is needed when you are setting a column (e.g. `df[key] = value`). If `key` is a duplicated column name, the value has to be broadcast to fit all of those columns.
But of course this part of `_sanitize_column` is not needed for an insert operation.
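One way to express this distinction is to make the broadcast optional and let an insert opt out of it. The function below is a hypothetical, standalone NumPy sketch of that idea: the name `prepare_column` and its `broadcast` flag are illustrative and are not pandas internals, and this is not necessarily how commit 2e77536 implements the fix. With `broadcast=True` it behaves like the column-setting path; with `broadcast=False` it leaves the 1-D value alone, as an insert needs.

```python
import numpy as np


def prepare_column(value, n_matching, broadcast=True):
    """Hypothetical helper mirroring the broadcasting branch of _sanitize_column.

    value: 1-D array-like of new column values
    n_matching: how many existing columns share the target name
    broadcast: tile value across all matching columns (setting a column)
               or keep it as a single column (inserting a new one)
    """
    value = np.asarray(value)
    if broadcast and value.ndim == 1 and n_matching > 1:
        # df[key] = value with a duplicated key: one copy per existing column
        return np.tile(value, (n_matching, 1))
    # insert: exactly one new column, however many duplicates already exist
    return value


setting = prepare_column([1, 2, 3, 4], n_matching=2, broadcast=True)
inserting = prepare_column([1, 2, 3, 4], n_matching=2, broadcast=False)
print(setting.shape)    # (2, 4)
print(inserting.shape)  # (4,)
```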


…olumns in a dataframe closes pandas-dev#14291 closes pandas-dev#14431 (cherry picked from commit 2e77536)

jorisvandenbossche changed the title ~~allow_duplicates doesn't work while several duplicates present~~ BUG: DataFrame.insert with allow_duplicates=True fails when already duplicates present Sep 23, 2016

jorisvandenbossche added the Bug label Sep 23, 2016

jreback added the Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) and Difficulty Intermediate labels Oct 5, 2016

jreback added this to the Next Major Release milestone Oct 5, 2016

paul-mannino mentioned this issue Oct 10, 2016

BUG: Fix issue with inserting duplicate columns in a dataframe (GH14291) #14384

Closed


paul-mannino added a commit to paul-mannino/pandas that referenced this issue Oct 11, 2016

BUG: Fix issue with inserting duplicate columns in a dataframe (pandas-dev#14291)

a00f0fe

paul-mannino mentioned this issue Oct 15, 2016

BUG: Fix issue with inserting duplicate columns in a dataframe (#14291) #14431

Closed


jreback modified the milestones: 0.19.1, Next Major Release Oct 19, 2016

paul-mannino added a commit to paul-mannino/pandas that referenced this issue Oct 19, 2016

BUG: Fix issue with inserting duplicate columns in a dataframe (pandas-dev#14291)

ad06cb4

paul-mannino added a commit to paul-mannino/pandas that referenced this issue Oct 22, 2016

BUG: Fix issue with inserting duplicate columns in a dataframe (pandas-dev#14291)

2698005

jreback closed this as completed in 2e77536 Oct 24, 2016
