Raise ValueError when writing DataFrame with duplicate columns #18

Merged 1 commit on Oct 7, 2019

Conversation

bclayman
Collaborator

@bclayman bclayman commented Oct 4, 2019

Previously, the call to self._check_dtypes within ParquetDataFrameProtocol#write failed on DataFrames with duplicate column names because it expects df[col] to always be a pd.Series. If col appears more than once, df[col] produces a pd.DataFrame instead. We do not support writing DataFrames with duplicate columns, so this commit makes write raise a ValueError when it encounters one.
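
For readers unfamiliar with the pandas behavior described above, here is a minimal sketch of the kind of check involved. It is not the code from this commit: the helper name check_no_duplicate_columns and the error message are hypothetical, chosen only for illustration.

```python
import pandas as pd


def check_no_duplicate_columns(df: pd.DataFrame) -> None:
    """Hypothetical helper: raise ValueError if any column name repeats."""
    # Index.duplicated() marks the second and later occurrences of each name.
    duplicated = df.columns[df.columns.duplicated()].unique().tolist()
    if duplicated:
        raise ValueError(
            f"Cannot write DataFrame with duplicate columns: {duplicated}"
        )


# A frame in which the column name "a" appears twice.
df = pd.DataFrame([[1, 2, 3]], columns=["a", "b", "a"])

# With a repeated name, column selection returns a DataFrame, not a Series,
# which is what breaks code that assumes df[col] is always a Series.
assert isinstance(df["a"], pd.DataFrame)
assert isinstance(df["b"], pd.Series)
```

Running a check like this before any per-column dtype inspection means the caller sees a clear ValueError rather than a confusing failure later on.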

@bclayman bclayman requested a review from jqmp October 4, 2019 23:02
Collaborator

@jqmp jqmp left a comment

Looks great!

@bclayman bclayman merged commit 6a80952 into master Oct 7, 2019
@bclayman bclayman deleted the bclayman/check_for_duplicate_df_columns branch October 7, 2019 16:03