ENH: Improve error message for empty object array #23718

Merged
jreback merged 2 commits into pandas-dev:master from the update-warning-object-array-stata branch on Nov 17, 2018

Conversation

bashtage
Contributor

@bashtage bashtage commented Nov 15, 2018

Improve the error message shown when an object array is empty

closes #23572

No what's new entry since the change is too small, and no test is needed.
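
For context, here is a minimal sketch of the failure this PR addresses. It assumes a recent pandas in which `DataFrame.to_stata` accepts a binary buffer. The frame and the `try_export` helper are made up for illustration, and the exact wording of the message depends on the pandas version.

```python
import io

import pandas as pd

# Hypothetical frame: "none" is an object column that holds only None.
df = pd.DataFrame(
    {
        "number": [0, 1],
        "none": pd.Series([None, None], dtype=object),
    }
)


def try_export(frame):
    """Return the message of the ValueError raised by to_stata, or None on success."""
    try:
        frame.to_stata(io.BytesIO())
    except ValueError as err:
        return str(err)
    return None


message = try_export(df)
print(message)
```

The helper only captures the message so it can be inspected. In real code the exception would normally be left to propagate.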

@pep8speaks

Hello @bashtage! Thanks for submitting the PR.

@gfyoung added the Error Reporting (incorrect or improved errors from pandas) and IO Stata (read_stata, to_stata) labels Nov 15, 2018
'exported. Object arrays containing only null '
'values are prohibited. Other object types '
'cannot be exported and must first be converted '
'to one of the supported types.')
Member

@gfyoung gfyoung Nov 15, 2018

  • By one of the supported types, do you just mean string?
  • Given that you didn't check the box yet in the PR, I presume a test for this error message is coming as well?

Contributor Author

I added a test, for good measure

Contributor Author

By one of the supported types, do you just mean string?

It could be any supported type, int8-32, float, double, string or datetime. The user has to make a choice as to what best expresses the object.
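
As a rough illustration of that choice, the sketch below converts three object columns to types the Stata writer can handle. The frame, the column names and the chosen target types are assumptions for the example, not a recommendation from the PR.

```python
import decimal

import pandas as pd

# Hypothetical frame with three object columns that cannot be written as-is.
df = pd.DataFrame(
    {
        "price": pd.Series(
            [decimal.Decimal("1.50"), decimal.Decimal("2.25")], dtype=object
        ),
        "code": pd.Series([1, "two"], dtype=object),
        "empty": pd.Series([None, None], dtype=object),
    }
)

converted = df.assign(
    price=df["price"].astype("float64"),  # Decimal values become doubles
    code=df["code"].map(str),             # mixed values become strings
    empty=df["empty"].astype("float64"),  # all None becomes a float column of NaN
)
print(converted.dtypes)
```

Which target type is right depends on what the column is meant to hold. An all-None column could equally well become an empty string column.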

@codecov

codecov bot commented Nov 15, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@a23f901).
The diff coverage is n/a.

@@            Coverage Diff            @@
##             master   #23718   +/-   ##
=========================================
  Coverage          ?   92.25%           
=========================================
  Files             ?      161           
  Lines             ?    51384           
  Branches          ?        0           
=========================================
  Hits              ?    47405           
  Misses            ?     3979           
  Partials          ?        0
Flag Coverage Δ
#multiple 90.64% <ø> (?)
#single 42.32% <ø> (?)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a23f901...527a1e3. Read the comment docs.

@bashtage
Contributor Author

I also extended the test in my previous PR to ensure that the case with all None, when specifically requested to be written as a strl, passes. I hope this is OK, but I could move it to a new PR if needed.

@jtkiley
Contributor

jtkiley commented Nov 15, 2018

@bashtage Consider adding the offending column to the error like in #23646. The all None case isn't hard to handle, but most of the work for me is identifying the column(s) that are the problem in datasets with hundreds of columns.

Since it's throwing on a particular column, it may be that the first encountered column is the only one that can be named, but that's still helpful. An alternative would be providing a brief code example that will identify these all None columns (and perhaps an example of fixing it with convert_strl).

I could imagine relatively novice pandas users (i.e. Stata switchers) hitting this error, so I'd lean toward a little more hand-holding/direction. Happy to hear counterarguments, though.
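
The kind of snippet suggested above might look like the following sketch. The helpers `all_null_object_columns` and `export_with_strl` are hypothetical names, not pandas API. It assumes `DataFrame.to_stata` with `version=117` and the `convert_strl` argument, which the thread says accepts an all-None column.

```python
import pandas as pd


def all_null_object_columns(df):
    """Return the names of object-dtype columns that contain only nulls."""
    return [
        name
        for name in df.columns
        if df[name].dtype == object and df[name].isna().all()
    ]


def export_with_strl(df, path):
    """Write df as a Stata 117 file, storing all-null object columns as strL."""
    problem_columns = all_null_object_columns(df)
    df.to_stata(path, version=117, convert_strl=problem_columns, write_index=False)
    return problem_columns


# Hypothetical data: "notes" is the kind of column that triggers the error.
df = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "name": pd.Series(["a", None, "c"], dtype=object),
        "notes": pd.Series([None, None, None], dtype=object),
    }
)
print(all_null_object_columns(df))
```

Calling `export_with_strl(df, "out.dta")` would then write the file and return the columns that were routed through strL. Dropping or converting those columns before export is an equally valid fix.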

@bashtage bashtage force-pushed the update-warning-object-array-stata branch from 03b1cb4 to a1c3f48 Compare November 15, 2018 16:35
Improve the error message shown when an object array is empty

closes pandas-dev#23572
Test exception is hit when all values in an object column are None
Extend the test for strl conversion to ensure this case passes (as expected)
@bashtage bashtage force-pushed the update-warning-object-array-stata branch from a1c3f48 to 527a1e3 Compare November 15, 2018 16:36
@bashtage
Contributor Author

@jtkiley I added this.

@jtkiley
Contributor

jtkiley commented Nov 15, 2018

@bashtage Thanks for this (looks great) and all you do on the Stata format. It's a big help for those of us working with both, and it helps smooth the work I do to train Stata users to use pandas.

@jreback jreback added this to the 0.24.0 milestone Nov 17, 2018
@jreback jreback merged commit e9aee5b into pandas-dev:master Nov 17, 2018
@jreback
Contributor

jreback commented Nov 17, 2018

thanks @bashtage

thoo added a commit to thoo/pandas that referenced this pull request Nov 19, 2018
…fixed

* upstream/master: (46 commits)
  DEPS: bump xlrd min version to 1.0.0 (pandas-dev#23774)
  BUG: Don't warn if default conflicts with dialect (pandas-dev#23775)
  BUG: Fixing memory leaks in read_csv (pandas-dev#23072)
  TST: Extend datetime64 arith tests to array classes, fix several broken cases (pandas-dev#23771)
  STYLE: Specify bare exceptions in pandas/tests (pandas-dev#23370)
  ENH: between_time, at_time accept axis parameter (pandas-dev#21799)
  PERF: Use is_utc check to improve performance of dateutil UTC in DatetimeIndex methods (pandas-dev#23772)
  CLN: io/formats/html.py: refactor (pandas-dev#22726)
  API: Make Categorical.searchsorted returns a scalar when supplied a scalar (pandas-dev#23466)
  TST: Add test case for GH14080 for overflow exception (pandas-dev#23762)
  BUG: Don't extract header names if none specified (pandas-dev#23703)
  BUG: Index.str.partition not nan-safe (pandas-dev#23558) (pandas-dev#23618)
  DEPR: tz_convert in the Timestamp constructor (pandas-dev#23621)
  PERF: Datetime/Timestamp.normalize for timezone naive datetimes (pandas-dev#23634)
  TST: Use new arithmetic fixtures, parametrize many more tests (pandas-dev#23757)
  REF/TST: Add more pytest idiom to parsers tests (pandas-dev#23761)
  DOC: Add ignore-deprecate argument to validate_docstrings.py (pandas-dev#23650)
  ENH: update pandas-gbq to 0.8.0, adds credentials arg (pandas-dev#23662)
  DOC: Improve error message to show correct order (pandas-dev#23652)
  ENH: Improve error message for empty object array (pandas-dev#23718)
  ...
tm9k1 pushed a commit to tm9k1/pandas that referenced this pull request Nov 19, 2018
* ENH: Improve error message for empty object array

Improve the error message shown when an object array is empty

closes pandas-dev#23572

* TST: Add tests for all None

Test exception is hit when all values in an object column are None
Extend the test for strl conversion to ensure this case passes (as expected)
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
* ENH: Improve error message for empty object array

Improve the error message shown when an object array is empty

closes pandas-dev#23572

* TST: Add tests for all None

Test exception is hit when all values in an object column are None
Extend the test for strl conversion to ensure this case passes (as expected)
@bashtage bashtage deleted the update-warning-object-array-stata branch March 21, 2019 13:27
Labels
Error Reporting (incorrect or improved errors from pandas), IO Stata (read_stata, to_stata)
Development

Successfully merging this pull request may close these issues.

df.to_stata fails when a column of type object contains only None
5 participants