Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added errors{'raise','ignore'} for keys not found in meta for json_normalize #14583

Closed
wants to merge 5 commits into from
Closed

Added errors{'raise','ignore'} for keys not found in meta for json_normalize #14583

wants to merge 5 commits into from

Conversation

dickreuter
Copy link
Contributor

@dickreuter dickreuter commented Nov 4, 2016

Continuation of #14505

Added errors{'raise','ignore'} for keys not found in meta for json_normalize

@jorisvandenbossche jorisvandenbossche added Error Reporting Incorrect or improved errors from pandas IO JSON read_json, to_json, json_normalize labels Nov 4, 2016
@codecov-io
Copy link

codecov-io commented Nov 4, 2016

Current coverage is 85.26% (diff: 100%)

Merging #14583 into master will decrease coverage by <.01%

@@             master     #14583   diff @@
==========================================
  Files           144        144          
  Lines         50980      50985     +5   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43470      43474     +4   
- Misses         7510       7511     +1   
  Partials          0          0          

Powered by Codecov. Last update 14e4815...701c140

@dickreuter
Copy link
Contributor Author

Cleaning it up, should be better now. Is there a new whatsnew file so I can add the comments? What's the next version number?

@dickreuter
Copy link
Contributor Author

Why is nobody merging this?

@@ -740,6 +742,8 @@ def json_normalize(data, record_path=None, meta=None,
If True, prefix records with dotted (?) path, e.g. foo.bar.field if
path to records is ['foo', 'bar']
meta_prefix : string, default None
error: {'raise', 'ignore'}, default 'raise'
Copy link
Contributor

@jreback jreback Nov 17, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need a version in 0.20.0 tag

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need a version added tag

@@ -80,3 +80,4 @@ Performance Improvements

Bug Fixes
~~~~~~~~~
- Enhancement in ``pandas.io.json.json_normalize``If meta keys are not always present a new option to set errors='ignore' has been implemented (:issue:`14505`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can put in enhancements section

@@ -80,3 +80,4 @@ Performance Improvements

Bug Fixes
~~~~~~~~~
- Enhancement in ``pandas.io.json.json_normalize``If meta keys are not always present a new option to set errors='ignore' has been implemented (:issue:`14505`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space after quotes and before If. put errors='ignore' in double-backticks.

meta_val = np.nan
else:
raise KeyError(
"Try running with errors='ignore' as the following key may not always be present: " + str(e))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this won't pass linting

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not? What do I need to change?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We check for PEP8, so this line will be too long. Normally you can set your editor to check for those things, but can also see the output of the check on travis (3d build): https://travis-ci.org/pandas-dev/pandas/jobs/174180562 (at the bottom)

pandas/io/json.py:746:80: E501 line too long (85 > 79 characters)
pandas/io/json.py:853:80: E501 line too long (129 > 79 characters)
pandas/io/tests/json/test_json_norm.py:229:5: E303 too many blank lines (2)
pandas/io/tests/json/test_json_norm.py:271:80: E501 line too long (81 > 79 characters)
pandas/io/tests/json/test_json_norm.py:272:80: E501 line too long (104 > 79 characters)
pandas/io/tests/json/test_json_norm.py:273:17: E225 missing whitespace around operator
pandas/io/tests/json/test_json_norm.py:274:10: E128 continuation line under-indented for visual indent
pandas/io/tests/json/test_json_norm.py:275:10: E128 continuation line under-indented for visual indent
pandas/io/tests/json/test_json_norm.py:276:10: E128 continuation line under-indented for visual indent
pandas/io/tests/json/test_json_norm.py:277:10: E128 continuation line under-indented for visual indent

@@ -225,6 +225,59 @@ def test_nested_flattens(self):

self.assertEqual(result, expected)


def test_json_normalise_fix(self):
# issue 14505
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put a comment explaining what this is. rename test -> test_json_normalize_errors

'price': {0: '0', 1: '0', 2: '0', 3: '0'},
'symbol': {0: 'AAPL', 1: 'GOOG', 2: 'AAPL', 3: 'GOOG'}}

self.assertEqual(j.fillna('').to_dict(), expected)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test the same with errors='raise'

@jreback
Copy link
Contributor

jreback commented Nov 17, 2016

@dickreuter we have 107 open PR's. very few people do code review. simply ping if you want someone to look.

@jorisvandenbossche
Copy link
Member

@dickreuter just so you know: we don't get notifications when you push new code, only from comments. So it's always good to put a comment after you pushed new code to indicated that you updated. And as @jreback says, pinging like you now did always helps

@dickreuter
Copy link
Contributor Author

dickreuter commented Nov 17, 2016

Ok great. Please have a look. I think this can now be merged. 

@jorisvandenbossche
Copy link
Member

@dickreuter Did you push the changes? As there is still a merge conflict and there is no new commit since @jreback 's comments

@dickreuter
Copy link
Contributor Author

Just pushed the changes as instructed.

@dickreuter
Copy link
Contributor Author

Anything else I need to fix in this or can this be merged?

@dickreuter
Copy link
Contributor Author

Just did another rebase and pushed again. Can this be merged now? thanks

@@ -49,6 +49,7 @@ Other enhancements
^^^^^^^^^^^^^^^^^^

- ``pd.read_excel`` now preserves sheet order when using ``sheetname=None`` (:issue:`9930`)
- ``pandas.io.json.json_normalize`` If meta keys are not always present a new option to set errors="ignore" (:issue:`14583`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In pandas.io.json.json_normalize gained the option errors='ignore'|raise (in double-backticks); the default is raise and is backward compatible.

@@ -104,4 +105,4 @@ Performance Improvements
.. _whatsnew_0200.bug_fixes:

Bug Fixes
~~~~~~~~~
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert this

@@ -740,6 +742,8 @@ def json_normalize(data, record_path=None, meta=None,
If True, prefix records with dotted (?) path, e.g. foo.bar.field if
path to records is ['foo', 'bar']
meta_prefix : string, default None
error: {'raise', 'ignore'}, default 'raise'
* ignore: will ignore keyErrors if keys listed in meta are not always present
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add an entry for raise

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will ignore KeyError, if the keys passed in meta are not all present

meta_val = np.nan
else:
raise \
KeyError(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better comment please

@@ -225,6 +225,65 @@ def test_nested_flattens(self):

self.assertEqual(result, expected)


def test_json_normalize_errors(self):
# If meta keys are not always present a new option to set errors='ignore' has been implemented (:issue:`14583`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GH14583

@jreback
Copy link
Contributor

jreback commented Dec 1, 2016

just some doc comments, otherwise lgtm.

@dickreuter
Copy link
Contributor Author

Implemented doc amendments as suggested. Let me know if anything else needs to be changed. thanks

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some PEP8 style issues, otherwise looks good!

@@ -740,6 +742,9 @@ def json_normalize(data, record_path=None, meta=None,
If True, prefix records with dotted (?) path, e.g. foo.bar.field if
path to records is ['foo', 'bar']
meta_prefix : string, default None
error: {'raise', 'ignore'}, default 'raise'
* ignore: will ignore KeyError if keys listed in meta are not always present
* raise: will raise KeyError if keys listed in meta are not always present
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect those two lines above are too long (> 80, we check for PEP8, which will let travis fail)

if errors == 'ignore':
meta_val = np.nan
else:
raise KeyError("Try running with errors='ignore' as key %s is not always present.", e)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, for PEP8 line too long

@@ -225,6 +225,65 @@ def test_nested_flattens(self):

self.assertEqual(result, expected)


def test_json_normalize_errors(self):
# GH14583: If meta keys are not always present a new option to set errors='ignore' has been implemented
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here and in other places below line length is also a problem

@jorisvandenbossche
Copy link
Member

@dickreuter You still have some style issues (that's the reason that travis is failing). You can always look in the log of the third travis build to see the issues (https://travis-ci.org/pandas-dev/pandas/jobs/182143127, scroll to the bottom):

Linting  *.py
pandas/io/json.py:750:1: W293 blank line contains whitespace
pandas/io/json.py:857:80: E501 line too long (81 > 79 characters)
pandas/io/json.py:858:80: E501 line too long (85 > 79 characters)
pandas/io/tests/json/test_json_norm.py:229:5: E303 too many blank lines (2)
pandas/io/tests/json/test_json_norm.py:266:80: E501 line too long (81 > 79 characters)
pandas/io/tests/json/test_json_norm.py:267:80: E501 line too long (87 > 79 characters)
pandas/io/tests/json/test_json_norm.py:269:17: E225 missing whitespace around operator
pandas/io/tests/json/test_json_norm.py:270:10: E128 continuation line under-indented for visual indent
pandas/io/tests/json/test_json_norm.py:271:10: E128 continuation line under-indented for visual indent
pandas/io/tests/json/test_json_norm.py:272:10: E128 continuation line under-indented for visual indent
pandas/io/tests/json/test_json_norm.py:273:10: E128 continuation line under-indented for visual indent
pandas/io/tests/json/test_json_norm.py:278:17: E128 continuation line under-indented for visual indent
pandas/io/tests/json/test_json_norm.py:278:80: E501 line too long (86 > 79 characters)
pandas/io/tests/json/test_json_norm.py:279:28: E127 continuation line over-indented for visual indent
pandas/io/tests/json/test_json_norm.py:279:80: E501 line too long (87 > 79 characters)
Linting *.py DONE

(it also a good idea to set up your IDE to warn for this)

@jorisvandenbossche jorisvandenbossche added this to the 0.20.0 milestone Dec 10, 2016
@dickreuter
Copy link
Contributor Author

I believe this is fixed now. Some checks are still not successful but I can't find any reason why. Any suggestions why? All it says is:

The command "ci/lint.sh" exited with 1.
0.00s$ echo "script done"
script done

@jreback
Copy link
Contributor

jreback commented Dec 11, 2016

git diff master | flake8 -diff

@jorisvandenbossche
Copy link
Member

@dickreuter You can also always look at the third travis build (https://travis-ci.org/pandas-dev/pandas/jobs/183014087, scroll to bottom), which says "pandas/io/json.py:857:45: E211 whitespace before '('"

@dickreuter
Copy link
Contributor Author

Linting still seems to fail, even though when I do "git diff master | flake8 --diff" I only get problems in v0.20.0.txt, all the rest is fine. Any suggestions how I can find the remaining issues?

@jorisvandenbossche
Copy link
Member

@dickreuter It was not linting this time, the mac build failed.

@dickreuter
Copy link
Contributor Author

And it is related to my code? I can't find any details.

@jreback
Copy link
Contributor

jreback commented Dec 13, 2016

@dickreuter you need to rebase on master

git rebase -i origin/master then
git push -f

dickreuter added 5 commits December 13, 2016 21:50
… meta parameter that don't always occur in every item of the list

Added documentation and test for issue #14505
Added keyword errors {'raise'|'ignore}
Shortened what's new
Removed commas in dictionary for linting compatibility
Updated doc
@dickreuter
Copy link
Contributor Author

Now all checked have passed.

@jreback jreback closed this in 7d8bc0d Dec 13, 2016
@jreback
Copy link
Contributor

jreback commented Dec 13, 2016

thanks!

ischurov pushed a commit to ischurov/pandas that referenced this pull request Dec 19, 2016
…on_normalize

Author: dickreuter <dickreuter@yahoo.com>

Closes pandas-dev#14583 from dickreuter/json_normalize_enhancement and squashes the following commits:

701c140 [dickreuter] adjusted formatting
3c94206 [dickreuter] shortened lines to pass linting
2028924 [dickreuter] doc changes
d298588 [dickreuter] Fixed as instructed in pull request page
bcfbf18 [dickreuter] Avoids exception when pandas.io.json.json_normalize
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Error Reporting Incorrect or improved errors from pandas IO JSON read_json, to_json, json_normalize
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants