DOC: Validation errors on doctests with output too long for a single line #27308

Open
mpmoran opened this issue Jul 9, 2019 · 1 comment

mpmoran commented Jul 9, 2019

RE: PR #27280

Multiple doctests are failing because their expected output is wrapped over several indented lines for readability. Here is an example from read_json() in pandas/io/json/_json.py:500:

Encoding/decoding a Dataframe using ``'split'`` formatted JSON:

>>> df.to_json(orient='split')
'{"columns":["col 1","col 2"],
  "index":["row 1","row 2"],
  "data":[["a","b"],["c","d"]]}'

Running the validation script results in the following error:

Line 153, in pandas.read_json
Failed example:
    df.to_json(orient='split')
Expected:
    '{"columns":["col 1","col 2"],
      "index":["row 1","row 2"],
      "data":[["a","b"],["c","d"]]}'
Got:
    '{"columns":["col 1","col 2"],"index":["row 1","row 2"],"data":[["a","b"],["c","d"]]}'

Another example in this function's doctests fails the same way, and Series.to_json has similar failures.

Neither the +NORMALIZE_WHITESPACE directive nor ellipses help. Removing the line breaks does fix the failures, but the resulting single-line output looks nasty.
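A minimal, standard-library-only sketch of why NORMALIZE_WHITESPACE cannot rescue this example. It assumes the two strings below are copied from the error output above and calls doctest.OutputChecker directly instead of going through the pandas validation script. CPython's doctest implements the flag by comparing " ".join(s.split()) on both sides, so each line break in the expected output becomes a single space, and the actual output has no space after those commas.

```python
import doctest

# Actual output from the failure report: the repr of to_json's result,
# on one line with no whitespace after the commas.
GOT = '\'{"columns":["col 1","col 2"],"index":["row 1","row 2"],"data":[["a","b"],["c","d"]]}\'\n'

# Expected output as written in the docstring, wrapped over three lines.
WANT_WRAPPED = (
    '\'{"columns":["col 1","col 2"],\n'
    '  "index":["row 1","row 2"],\n'
    '  "data":[["a","b"],["c","d"]]}\'\n'
)

checker = doctest.OutputChecker()
flags = doctest.NORMALIZE_WHITESPACE

# Fails: after normalization the wrapped version reads '], "index"'
# while the actual output reads '],"index"'.
print(checker.check_output(WANT_WRAPPED, GOT, flags))  # False

# The two normalized forms, showing the extra spaces after the commas.
print(" ".join(WANT_WRAPPED.split()))
print(" ".join(GOT.split()))

# Putting the expected output back on a single line passes.
print(checker.check_output(GOT, GOT, flags))  # True
```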

INSTALLED VERSIONS
------------------
commit : c64c9cb
python : 3.7.3.final.0
python-bits : 64
OS : Darwin
OS-release : 18.6.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 0.25.0rc0+23.gc64c9cb44
numpy : 1.16.4
pytz : 2019.1
dateutil : 2.8.0
pip : 19.1.1
setuptools : 41.0.1
Cython : 0.29.11
pytest : 5.0.0
hypothesis : 4.23.6
sphinx : 1.8.5
blosc : None
feather : None
xlsxwriter : 1.1.8
lxml.etree : 4.3.4
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.1
IPython : 7.6.0
pandas_datareader: None
bs4 : 4.7.1
bottleneck : 1.2.1
fastparquet : 0.3.0
gcsfs : None
lxml.etree : 4.3.4
matplotlib : 3.1.0
numexpr : 2.6.9
odfpy : None
openpyxl : 2.6.2
pandas_gbq : None
pyarrow : 0.11.1
pytables : None
s3fs : 0.2.1
scipy : 1.2.1
sqlalchemy : 1.3.5
tables : 3.5.2
xarray : 0.12.1
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.1.8

datapythonista changed the title from "multiple doctests failing where expected output contains whitespace formatting" to "DOC: Validation errors on doctests with output too long for a single line" on Jul 9, 2019

mpmoran commented Jul 10, 2019

See https://stackoverflow.com/questions/17640416/doctest-normalize-whitespace-does-not-work. It appears that to_json returns a str with no whitespace after the commas, which is why NORMALIZE_WHITESPACE doesn't work: the directive only treats different runs of whitespace as equal, so it cannot match the whitespace introduced by a line break against no whitespace at all. A few ideas off the top of my head: wrapping the calls in pprint, or adding an option to to_json for pretty-printed output.
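Both ideas can be tried without pandas. This is a sketch under stated assumptions: to_json_split is a hypothetical stand-in that returns the compact string from the issue, and run_examples is a hypothetical helper built on doctest.DocTestFinder and doctest.DocTestRunner. The second example swaps in the standard library's json.dumps, whose default separators put a space after each comma, as a substitute for a pretty-print option; it does not assume such an option exists on to_json.

```python
import doctest
import json
import pprint


def to_json_split():
    """Hypothetical stand-in for ``df.to_json(orient='split')``.

    Idea 1: parse the JSON and let pprint wrap the resulting dict.
    (pprint sorts dict keys, so 'data' comes before 'index'.)

    >>> pprint.pprint(json.loads(to_json_split()))
    {'columns': ['col 1', 'col 2'],
     'data': [['a', 'b'], ['c', 'd']],
     'index': ['row 1', 'row 2']}

    Idea 2: re-serialise with json.dumps, which puts a space after each
    comma, so NORMALIZE_WHITESPACE can match a wrapped expected output.

    >>> print(json.dumps(json.loads(to_json_split())))  # doctest: +NORMALIZE_WHITESPACE
    {"columns": ["col 1", "col 2"],
     "index": ["row 1", "row 2"],
     "data": [["a", "b"], ["c", "d"]]}
    """
    # Same compact string the issue reports: no whitespace after commas.
    return '{"columns":["col 1","col 2"],"index":["row 1","row 2"],"data":[["a","b"],["c","d"]]}'


def run_examples(func):
    """Run func's docstring examples and return doctest's TestResults."""
    globs = {"json": json, "pprint": pprint, func.__name__: func}
    runner = doctest.DocTestRunner(verbose=False)
    for test in doctest.DocTestFinder().find(func, globs=globs):
        runner.run(test)
    return runner.summarize(verbose=False)


if __name__ == "__main__":
    print(run_examples(to_json_split))
```

The pprint route changes the key order, which may be confusing next to a to_json call; the json.dumps route keeps the original order but still relies on +NORMALIZE_WHITESPACE.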

If y'all have any ideas as to how to fix this, I would be willing to take a crack at 'em.
