Added errors{'raise','ignore'} for keys not found in meta for json_normalize #14583

Closed
wants to merge 5 commits into
from

Conversation

Projects
None yet
4 participants
Contributor

dickreuter commented Nov 4, 2016 edited by jorisvandenbossche

Continuation of #14505

Added errors{'raise','ignore'} for keys not found in meta for json_normalize

codecov-io commented Nov 4, 2016 edited

Current coverage is 85.26% (diff: 100%)

Merging #14583 into master will decrease coverage by <.01%

@@             master     #14583   diff @@
==========================================
  Files           144        144          
  Lines         50980      50985     +5   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43470      43474     +4   
- Misses         7510       7511     +1   
  Partials          0          0          

Powered by Codecov. Last update 14e4815...701c140

Contributor

dickreuter commented Nov 4, 2016

Cleaning it up, should be better now. Is there a new whatsnew file so I can add the comments? What's the next version number?

Contributor

dickreuter commented Nov 17, 2016

Why is nobody merging this?

@@ -740,6 +742,8 @@ def json_normalize(data, record_path=None, meta=None,
If True, prefix records with dotted (?) path, e.g. foo.bar.field if
path to records is ['foo', 'bar']
meta_prefix : string, default None
+ error: {'raise', 'ignore'}, default 'raise'
@jreback

jreback Nov 17, 2016 edited

Contributor

need a version in 0.20.0 tag

@jreback

jreback Dec 1, 2016

Contributor

need a version added tag

doc/source/whatsnew/v0.20.0.txt
@@ -80,3 +80,4 @@ Performance Improvements
Bug Fixes
~~~~~~~~~
+- Enhancement in ``pandas.io.json.json_normalize``If meta keys are not always present a new option to set errors='ignore' has been implemented (:issue:`14505`)
@jreback

jreback Nov 17, 2016

Contributor

you can put in enhancements section

doc/source/whatsnew/v0.20.0.txt
@@ -80,3 +80,4 @@ Performance Improvements
Bug Fixes
~~~~~~~~~
+- Enhancement in ``pandas.io.json.json_normalize``If meta keys are not always present a new option to set errors='ignore' has been implemented (:issue:`14505`)
@jreback

jreback Nov 17, 2016

Contributor

space after quotes and before If. put errors='ignore' in double-backticks.

pandas/io/json.py
+ meta_val = np.nan
+ else:
+ raise KeyError(
+ "Try running with errors='ignore' as the following key may not always be present: " + str(e))
@jreback

jreback Nov 17, 2016

Contributor

this won't pass linting

@dickreuter

dickreuter Nov 18, 2016

Contributor

Why not? What do I need to change?

@jorisvandenbossche

jorisvandenbossche Nov 18, 2016

Owner

We check for PEP8, so this line will be too long. Normally you can set your editor to check for those things, but can also see the output of the check on travis (3d build): https://travis-ci.org/pandas-dev/pandas/jobs/174180562 (at the bottom)

pandas/io/json.py:746:80: E501 line too long (85 > 79 characters)
pandas/io/json.py:853:80: E501 line too long (129 > 79 characters)
pandas/io/tests/json/test_json_norm.py:229:5: E303 too many blank lines (2)
pandas/io/tests/json/test_json_norm.py:271:80: E501 line too long (81 > 79 characters)
pandas/io/tests/json/test_json_norm.py:272:80: E501 line too long (104 > 79 characters)
pandas/io/tests/json/test_json_norm.py:273:17: E225 missing whitespace around operator
pandas/io/tests/json/test_json_norm.py:274:10: E128 continuation line under-indented for visual indent
pandas/io/tests/json/test_json_norm.py:275:10: E128 continuation line under-indented for visual indent
pandas/io/tests/json/test_json_norm.py:276:10: E128 continuation line under-indented for visual indent
pandas/io/tests/json/test_json_norm.py:277:10: E128 continuation line under-indented for visual indent
pandas/io/tests/json/test_json_norm.py
@@ -225,6 +225,59 @@ def test_nested_flattens(self):
self.assertEqual(result, expected)
+
+ def test_json_normalise_fix(self):
+ # issue 14505
@jreback

jreback Nov 17, 2016

Contributor

put a comment explaining what this is. rename test -> test_json_normalize_errors

pandas/io/tests/json/test_json_norm.py
+ 'price': {0: '0', 1: '0', 2: '0', 3: '0'},
+ 'symbol': {0: 'AAPL', 1: 'GOOG', 2: 'AAPL', 3: 'GOOG'}}
+
+ self.assertEqual(j.fillna('').to_dict(), expected)
@jreback

jreback Nov 17, 2016

Contributor

test the same with errors='raise'

Contributor

jreback commented Nov 17, 2016

@dickreuter we have 107 open PR's. very few people do code review. simply ping if you want someone to look.

@dickreuter just so you know: we don't get notifications when you push new code, only from comments. So it's always good to put a comment after you pushed new code to indicated that you updated. And as @jreback says, pinging like you now did always helps

Contributor

dickreuter commented Nov 17, 2016 edited by jorisvandenbossche

Ok great. Please have a look. I think this can now be merged. 

@dickreuter Did you push the changes? As there is still a merge conflict and there is no new commit since @jreback 's comments

Contributor

dickreuter commented Nov 18, 2016

Just pushed the changes as instructed.

Contributor

dickreuter commented Nov 28, 2016

Anything else I need to fix in this or can this be merged?

Contributor

dickreuter commented Dec 1, 2016

Just did another rebase and pushed again. Can this be merged now? thanks

doc/source/whatsnew/v0.20.0.txt
@@ -49,6 +49,7 @@ Other enhancements
^^^^^^^^^^^^^^^^^^
- ``pd.read_excel`` now preserves sheet order when using ``sheetname=None`` (:issue:`9930`)
+- ``pandas.io.json.json_normalize`` If meta keys are not always present a new option to set errors="ignore" (:issue:`14583`)
@jreback

jreback Dec 1, 2016

Contributor

In pandas.io.json.json_normalize gained the option errors='ignore'|raise (in double-backticks); the default is raise and is backward compatible.

doc/source/whatsnew/v0.20.0.txt
@@ -104,4 +105,4 @@ Performance Improvements
.. _whatsnew_0200.bug_fixes:
Bug Fixes
-~~~~~~~~~
@jreback

jreback Dec 1, 2016

Contributor

revert this

pandas/io/json.py
@@ -740,6 +742,8 @@ def json_normalize(data, record_path=None, meta=None,
If True, prefix records with dotted (?) path, e.g. foo.bar.field if
path to records is ['foo', 'bar']
meta_prefix : string, default None
+ error: {'raise', 'ignore'}, default 'raise'
+ * ignore: will ignore keyErrors if keys listed in meta are not always present
@jreback

jreback Dec 1, 2016

Contributor

add an entry for raise

@jreback

jreback Dec 1, 2016

Contributor

will ignore KeyError, if the keys passed in meta are not all present

pandas/io/json.py
+ meta_val = np.nan
+ else:
+ raise \
+ KeyError(
@jreback

jreback Dec 1, 2016

Contributor

better comment please

pandas/io/tests/json/test_json_norm.py
@@ -225,6 +225,65 @@ def test_nested_flattens(self):
self.assertEqual(result, expected)
+
+ def test_json_normalize_errors(self):
+ # If meta keys are not always present a new option to set errors='ignore' has been implemented (:issue:`14583`)
@jreback

jreback Dec 1, 2016

Contributor

GH14583

Contributor

jreback commented Dec 1, 2016

just some doc comments, otherwise lgtm.

Contributor

dickreuter commented Dec 2, 2016

Implemented doc amendments as suggested. Let me know if anything else needs to be changed. thanks

@jorisvandenbossche

Some PEP8 style issues, otherwise looks good!

pandas/io/json.py
@@ -740,6 +742,9 @@ def json_normalize(data, record_path=None, meta=None,
If True, prefix records with dotted (?) path, e.g. foo.bar.field if
path to records is ['foo', 'bar']
meta_prefix : string, default None
+ error: {'raise', 'ignore'}, default 'raise'
+ * ignore: will ignore KeyError if keys listed in meta are not always present
+ * raise: will raise KeyError if keys listed in meta are not always present
@jorisvandenbossche

jorisvandenbossche Dec 2, 2016

Owner

I suspect those two lines above are too long (> 80, we check for PEP8, which will let travis fail)

pandas/io/json.py
+ if errors == 'ignore':
+ meta_val = np.nan
+ else:
+ raise KeyError("Try running with errors='ignore' as key %s is not always present.", e)
@jorisvandenbossche

jorisvandenbossche Dec 2, 2016

Owner

same here, for PEP8 line too long

pandas/io/tests/json/test_json_norm.py
@@ -225,6 +225,65 @@ def test_nested_flattens(self):
self.assertEqual(result, expected)
+
+ def test_json_normalize_errors(self):
+ # GH14583: If meta keys are not always present a new option to set errors='ignore' has been implemented
@jorisvandenbossche

jorisvandenbossche Dec 2, 2016

Owner

here and in other places below line length is also a problem

@dickreuter You still have some style issues (that's the reason that travis is failing). You can always look in the log of the third travis build to see the issues (https://travis-ci.org/pandas-dev/pandas/jobs/182143127, scroll to the bottom):

Linting  *.py
pandas/io/json.py:750:1: W293 blank line contains whitespace
pandas/io/json.py:857:80: E501 line too long (81 > 79 characters)
pandas/io/json.py:858:80: E501 line too long (85 > 79 characters)
pandas/io/tests/json/test_json_norm.py:229:5: E303 too many blank lines (2)
pandas/io/tests/json/test_json_norm.py:266:80: E501 line too long (81 > 79 characters)
pandas/io/tests/json/test_json_norm.py:267:80: E501 line too long (87 > 79 characters)
pandas/io/tests/json/test_json_norm.py:269:17: E225 missing whitespace around operator
pandas/io/tests/json/test_json_norm.py:270:10: E128 continuation line under-indented for visual indent
pandas/io/tests/json/test_json_norm.py:271:10: E128 continuation line under-indented for visual indent
pandas/io/tests/json/test_json_norm.py:272:10: E128 continuation line under-indented for visual indent
pandas/io/tests/json/test_json_norm.py:273:10: E128 continuation line under-indented for visual indent
pandas/io/tests/json/test_json_norm.py:278:17: E128 continuation line under-indented for visual indent
pandas/io/tests/json/test_json_norm.py:278:80: E501 line too long (86 > 79 characters)
pandas/io/tests/json/test_json_norm.py:279:28: E127 continuation line over-indented for visual indent
pandas/io/tests/json/test_json_norm.py:279:80: E501 line too long (87 > 79 characters)
Linting *.py DONE

(it also a good idea to set up your IDE to warn for this)

jorisvandenbossche added this to the 0.20.0 milestone Dec 10, 2016

Contributor

dickreuter commented Dec 11, 2016

I believe this is fixed now. Some checks are still not successful but I can't find any reason why. Any suggestions why? All it says is:

The command "ci/lint.sh" exited with 1.
0.00s$ echo "script done"
script done

Contributor

jreback commented Dec 11, 2016

git diff master | flake8 -diff

@dickreuter You can also always look at the third travis build (https://travis-ci.org/pandas-dev/pandas/jobs/183014087, scroll to bottom), which says "pandas/io/json.py:857:45: E211 whitespace before '('"

Contributor

dickreuter commented Dec 12, 2016

Linting still seems to fail, even though when I do "git diff master | flake8 --diff" I only get problems in v0.20.0.txt, all the rest is fine. Any suggestions how I can find the remaining issues?

@dickreuter It was not linting this time, the mac build failed.

Contributor

dickreuter commented Dec 12, 2016

And it is related to my code? I can't find any details.

Contributor

jreback commented Dec 13, 2016

@dickreuter you need to rebase on master

git rebase -i origin/master then
git push -f

dickreuter added some commits Oct 26, 2016

@dickreuter dickreuter Avoids exception when pandas.io.json.json_normalize contains items in…
… meta parameter that don't always occur in every item of the list

Added documentation and test for issue #14505
Added keyword errors {'raise'|'ignore}
Shortened what's new
Removed commas in dictionary for linting compatibility
Updated doc
bcfbf18
@dickreuter dickreuter Fixed as instructed in pull request page d298588
@dickreuter dickreuter doc changes 2028924
@dickreuter dickreuter shortened lines to pass linting 3c94206
@dickreuter dickreuter adjusted formatting
701c140
Contributor

dickreuter commented Dec 13, 2016

Now all checked have passed.

jreback closed this in 7d8bc0d Dec 13, 2016

Contributor

jreback commented Dec 13, 2016

thanks!

@ischurov ischurov added a commit to ischurov/pandas that referenced this pull request Dec 19, 2016

@dickreuter @ischurov dickreuter + ischurov ENH: Added errors{'raise','ignore'} for keys not found in meta for js…
…on_normalize

Author: dickreuter <dickreuter@yahoo.com>

Closes #14583 from dickreuter/json_normalize_enhancement and squashes the following commits:

701c140 [dickreuter] adjusted formatting
3c94206 [dickreuter] shortened lines to pass linting
2028924 [dickreuter] doc changes
d298588 [dickreuter] Fixed as instructed in pull request page
bcfbf18 [dickreuter] Avoids exception when pandas.io.json.json_normalize
4d40d6c
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment