[MRG+1] Added JmesSelect #1016
try:
    import jmespath,ast,json
except:
    print "JsonProcessor module requires the jmespath library to function. No processing will happen in its abscence"
nramirezuy
Jan 19, 2015
Contributor
When I said `required`, I meant for the Scrapy library. So it should raise the exception on the import of this Processor.
except:
    print "JsonProcessor module requires the jmespath library to function. No processing will happen in its abscence"
    return values
compiledPath = jmespath.compile(self.json_path)
nramirezuy
Jan 19, 2015
Contributor
This compilation should happen on `__init__`, it will improve performance.
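A minimal sketch of the suggestion above: do the expensive preparation once in `__init__` and reuse it on every call. To stay runnable with only the standard library, the hypothetical `compile_path` helper stands in for `jmespath.compile` and only understands dotted keys; `CompiledPath` and `PathSelect` are illustrative names, not the classes from this PR.

```python
class CompiledPath(object):
    """Stand-in for a compiled jmespath expression (dotted keys only)."""

    def __init__(self, expression):
        # The "expensive" parsing step happens once, here.
        self.keys = expression.split(".")

    def search(self, data):
        # Walk the nested dicts; return None when a key is missing.
        for key in self.keys:
            if not isinstance(data, dict) or key not in data:
                return None
            data = data[key]
        return data


def compile_path(expression):
    """Hypothetical stand-in for jmespath.compile."""
    return CompiledPath(expression)


class PathSelect(object):
    def __init__(self, json_path):
        self.json_path = json_path
        # Compile once at construction time instead of on every call.
        self.compiled_path = compile_path(json_path)

    def __call__(self, value):
        # Each call only runs the search against the stored expression.
        return self.compiled_path.search(value)


proc = PathSelect("foo.bar")
print(proc({"foo": {"bar": "baz"}}))
print(proc({"foo": {}}))
```

With the real library, the same shape would presumably keep the result of `jmespath.compile(json_path)` on the instance and call its `search` method in `__call__`.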
Regarding the storage of the ast module as self.astModule in the JsonProcessor class, I looked up the following document: https://wiki.python.org/moin/PythonSpeed/PerformanceTips (Import statement overhead topic). It seems that the current method should be acceptable. Kindly let me know if there are any other issues to be handled.
try:
    import jmespath
    import ast
    self.astModule = ast
nramirezuy
Jan 23, 2015
Contributor
Just save the function you are going to use: `self.ast_literal_eval = ast.literal_eval`.
Also fix the naming; we use camel case only for classes.
raise ImportError("You need to have jmespath(https://github.com/boto/jmespath)"
    + " and AST modules before being able to use this processor")

def __call__(self, currentString):
nramirezuy
Jan 23, 2015
Contributor
Rename `currentString` to `value`.
    + " and AST modules before being able to use this processor")

def __call__(self, currentString):
    if currentString != "": # Empty strings will return None
nramirezuy
Jan 23, 2015
Contributor
Be prepared for `None` values. I think `if value:` is enough.
        self.astModule.literal_eval(currentString)
    )
    return jsonValue
return None
nramirezuy
Jan 23, 2015
Contributor
The `return None` line can be removed.
@@ -63,6 +63,31 @@ def __call__(self, values):
return values


class JsonProcessor(object):
    """
nramirezuy
Jan 23, 2015
Contributor
Declare that it expects a string in `__call__`.
We are also missing documentation.
    self.assertEqual(test, expected,
        msg='test "{}" got {} expected {}'.format(l, test, expected)
    )
except: # in case jmespath isn't installed
nramirezuy
Jan 30, 2015
Contributor
If jmespath isn't installed just let it fail.
You can add dependencies for tests: https://github.com/scrapy/scrapy/blob/master/tests/requirements.txt
except: # in case jmespath isn't installed
    pass

def test_errors(self):
nramirezuy
Jan 30, 2015
Contributor
Not sure what this tests.
def test_equals(self):
    try:
        import jmespath
nramirezuy
Jan 30, 2015
Contributor
Why are you importing jmespath?
2bdf817
to
d096792
self.json_path = json_path
try:
    import jmespath
    import ast
nramirezuy
Feb 2, 2015
Contributor
`ast` is from the standard library so we can import it at the top. So storing `literal_eval` in an attribute won't be needed.
@@ -675,3 +675,18 @@ Here is a list of all built-in processors:
constructor keyword arguments are used as default context values. See
:class:`Compose` processor for more info.

.. class:: JsonProcessor(json_path)
nramirezuy
Feb 2, 2015
Contributor
Let's rename it to `JmesProcessor`.
d0a1a2e
to
5c64b3a
@kmike Can you review this? I'm looking for py3 and documentation feedback.
+1 LGTM. It needs to be rebased into a single commit before merging.
Queries the value using the json path provided to the constructor and returns the output.
Requires jmespath (https://github.com/boto/jmespath) to run.

This processor takes only one string at a time and will return a python string/dictionary/None as answer
kmike
Feb 2, 2015
Member
- Can it return a list? A number? A boolean?
- What is a "python string"? Does this processor return bytes or unicode?
.. class:: JmesProcessor(json_path)

Queries the value using the json path provided to the constructor and returns the output.
Requires jmespath (https://github.com/boto/jmespath) to run.

"""
if value: # Empty strings will return None
    return_value = self.compiled_path.search(
        literal_eval(value)
kmike
Feb 2, 2015
Member
Why is literal_eval needed?
kmike
Feb 2, 2015
Member
I mean, why not use json?
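To make the question concrete, here is a small standard-library sketch of how the two parsers differ on real JSON input. The sample string is made up for illustration; the point is only that `ast.literal_eval` parses Python literals, while `json.loads` parses JSON.

```python
import ast
import json

text = '{"foo": {"bar": true, "baz": null}}'

# json.loads understands JSON literals such as true, false and null.
parsed = json.loads(text)
print(parsed)

# ast.literal_eval only accepts Python literals, so JSON's true/null
# (bare names in Python syntax) are rejected with a ValueError.
try:
    ast.literal_eval(text)
    literal_eval_ok = True
except ValueError as exc:
    literal_eval_ok = False
    print("literal_eval failed:", exc)
```

The two happen to agree on inputs that are valid in both syntaxes, such as `{"foo": "bar"}`, which may be why `literal_eval` appeared to work in simple tests.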
def __call__(self, value):
    """Query value for the jmespath query and return answer
    Input : string
    Output : string / dict / None
kmike
Feb 2, 2015
Member
What about other Python types?
Could you please also use Sphinx-compatible docstrings, like
:param str value: a string with JSON data to extract from
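As an illustration of the docstring style requested above, here is a sketch using Sphinx info field lists. `SelectKey` is a hypothetical processor invented for this example (a plain dict lookup), not the class under review.

```python
class SelectKey(object):
    """Hypothetical processor that looks up a single key in a dict."""

    def __init__(self, key):
        self.key = key

    def __call__(self, value):
        """Look up ``self.key`` in ``value`` and return the result.

        :param dict value: mapping to extract from
        :return: the value stored under the key, or ``None`` if absent
        :rtype: object
        """
        return value.get(self.key)


proc = SelectKey("foo")
print(proc({"foo": "bar"}))
```

The `:param <type> <name>:`, `:return:` and `:rtype:` fields are rendered by Sphinx as a structured parameter list, which also answers the "what types can it return" question in one place.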
Query the input string for the jmespath (given at instantiation), and return the answer
Requires : jmespath(https://github.com/boto/jmespath), ast
Note: JmesProcessor accepts only a single string at a time and returns string/dict/None based on the jmespath query.
"""
kmike
Feb 2, 2015
Member
- Could you please fix the indentation?
- According to PEP8, docstrings should be wrapped at the 72nd column.
- It would be nice for it to follow http://sphinx-doc.org/domains.html#info-field-lists
SudShekhar
Feb 2, 2015
Author
Contributor
Thanks for your feedback. I will update the things you have mentioned.
kmike
Feb 2, 2015
Member
Thanks!
The Sphinx link I posted here is not relevant, but it is relevant for the `__call__` docstring.
    self.compiled_path = jmespath.compile(self.json_path)
except:
    raise ImportError("You need to have jmespath(https://github.com/boto/jmespath)"
        + " and AST modules before being able to use this processor")
kmike
Feb 2, 2015
Member
This is bad for several reasons:
- If "import jmespath" fails, it hides the original exception, which can provide more information about why jmespath isn't available. See e.g. #902.
- If jmespath.compile fails, the code raises an obscure ImportError, hiding the original exception again.
- A bare `except:` shouldn't be used; it catches e.g. KeyboardInterrupt and SystemExit exceptions.
I suggest removing the try-except entirely. It will fail with "ImportError" pointing to jmespath, which is self-explanatory.
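The point about bare `except:` can be checked with a short standard-library sketch. The function names are made up for illustration, and `risky` simply simulates a Ctrl+C arriving inside the guarded block.

```python
def risky():
    # Simulates the user pressing Ctrl+C in the middle of an operation.
    raise KeyboardInterrupt()


def with_bare_except():
    try:
        risky()
    except:  # deliberately bare, to show the problem
        return "swallowed"


def with_except_exception():
    try:
        risky()
    except Exception:
        return "swallowed"


# The bare except silently eats the interrupt.
print(with_bare_except())

try:
    with_except_exception()
    propagated = False
except KeyboardInterrupt:
    # KeyboardInterrupt derives from BaseException, not Exception,
    # so it passes straight through "except Exception".
    propagated = True
print(propagated)
```

Removing the try-except altogether, as suggested, avoids both problems and leaves the original traceback intact.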
@@ -3,7 +3,7 @@

from scrapy.contrib.loader import ItemLoader
from scrapy.contrib.loader.processor import Join, Identity, TakeFirst, \
    Compose, MapCompose
    Compose, MapCompose,JmesProcessor
kmike
Feb 2, 2015
Member
a nitpick - space is missing after a comma
class JmesProcessor(object):
    def __init__(self, path, process=None):
        self.process = process or self._process
    def _process(self, data):
        if isinstance(data, basestring):
            return json.loads(data)
        return data

But I guess doing the

# Load JSON and process
Compose(json.loads, JmesProcessor('foo'))
# Process several JSON objects: '[{"foo":"bar"}, {"baz":"tar"}]'
Compose(json.loads, MapCompose(JmesProcessor('foo')))
# Just process
JmesProcessor('foo')
# Chained processors with load
Compose(json.loads, JmesProcessor('foo'), JmesProcessor('bar'))

I like the second option more.

EDIT: We should also add examples to the doc.
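A rough, standard-library-only sketch of how the second option composes. The `compose`, `map_compose` and `select` helpers below are simplified stand-ins written for this example; they are not Scrapy's `Compose` / `MapCompose` or the processor from this PR, and they skip details such as loader contexts.

```python
import json


def compose(*functions):
    """Simplified stand-in for Compose: pipe one value through each function."""
    def run(value):
        for func in functions:
            value = func(value)
        return value
    return run


def map_compose(*functions):
    """Simplified stand-in for MapCompose: apply the functions to every
    item of a list and drop items that come back as None."""
    def run(values):
        for func in functions:
            values = [func(v) for v in values]
            values = [v for v in values if v is not None]
        return values
    return run


def select(key):
    """Hypothetical stand-in for the processor: plain dict lookup."""
    def run(value):
        return value.get(key) if isinstance(value, dict) else None
    return run


# Load JSON and process
single = compose(json.loads, select("foo"))
print(single('{"foo": "bar"}'))

# Process several JSON objects
several = compose(json.loads, map_compose(select("foo")))
print(several('[{"foo": "bar"}, {"baz": "tar"}]'))

# Chained processors with load
chained = compose(json.loads, select("foo"), select("bar"))
print(chained('{"foo": {"bar": "baz"}}'))
```

Keeping `json.loads` outside the processor means the same selector works on already-parsed Python objects and on JSON strings, depending only on how it is composed.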
I like not doing json.loads in the JmesProcessor, and the Compose / MapCompose examples are good. @nramirezuy you should really create a few nice images/diagrams which show how Compose and MapCompose work :) What are the inputs, what are the outputs, how they can be used together. And maybe we should rename the processors to verbs - instead of JmesProcessor write SelectJmes -
@nramirezuy a related question: is it possible to use the processors without the item loaders?
is rather nice. If it works then I think it is worth documenting. And maybe splitting this micro-framework from the item loaders, if it is not only about item loaders.
Well every
@nramirezuy : The second option looks much better and it gives users more control. So, should I remove the json.loads from inside the processor? Regarding the documentation, the examples you have given look pretty comprehensive to me. EDIT: In the documentation, I didn't add the chaining example because I felt that it fit better in the Compose examples list (the same can be said for the processor handling a list of json strings, I guess). Do let me know your views on this.
>>> proc({'foo':{'bar':'baz'}})
{'bar': 'baz'}
>>> import json
>>> procSingleJsonStr = Compose(json.loads,SelectJmes("foo"))
nramirezuy
Feb 13, 2015
Contributor
We just use CamelCase for classes; blame pep8 👅
`procSingleJsonStr` -> `proc_single_json_str`
Space after commas please 😄
>>> procSingleJsonStr = Compose(json.loads,SelectJmes("foo"))
>>> procSingleJsonStr('{"foo":"bar"}')
u'bar'
>>> procJsonList = Compose(json.loads, MapCompose(SelectJmes('foo')))
nramirezuy
Feb 13, 2015
Contributor
same here
:return: Element extracted according to jmespath query
"""
return_value = self.compiled_path.search(value)
return return_value
nramirezuy
Feb 13, 2015
Contributor
make it one line
@@ -579,5 +579,44 @@ def test_replace_css_re(self):
self.assertEqual(l.get_output_value('url'), [u'scrapy.org'])


class SelectJmesTestCase(unittest.TestCase):
nramirezuy
Feb 13, 2015
Contributor
Gotta test those python types. Json isn't relevant here.
'bar'
>>> proc({'foo':{'bar':'baz'}})
{'bar': 'baz'}
>>> import json
nramirezuy
Feb 13, 2015
Contributor
Make it 2 blocks, with a little title in between: `Working with Json`.
'simple': ('foo.bar', '{"foo": {"bar": "baz"}}', "baz"),
'invalid': ('foo.bar.baz', '{"foo": {"bar": "baz"}}', None),
'top_level': ('foo', '{"foo": {"bar": "baz"}}', {"bar": "baz"}),
'double_vs_single_quoteString': ('foo.bar', '{"foo":{"bar":"baz"}}', "baz"),
nramirezuy
Feb 13, 2015
Contributor
That camel case 👅
    msg='test "{}" got {} expected {}'.format(l, test, expected)
)

def test_dict(self):
nramirezuy
Feb 13, 2015
Contributor
Use the tests configuration above.
SudShekhar
Feb 13, 2015
Author
Contributor
Can you please clarify what you mean by this? Json's used to load the input. Do you just want me to directly use the dict/list/simple strings to check?
nramirezuy
Feb 13, 2015
Contributor
Yes, use python types and forget about json, since we don't load json anymore inside the class. Also use the `test_list_equals` method, so it is easy to add more comparisons if needed.
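One way to read this request is a single table of cases built from plain Python objects, so adding a comparison is one more row. The sketch below assumes that reading; `select` is a hypothetical dotted-key stand-in for the processor, and `cases` and `SelectTestCase` are illustrative names rather than the ones in the PR's test module.

```python
import unittest


def select(path, data):
    """Hypothetical stand-in for the processor: dotted-key lookup."""
    for key in path.split("."):
        if not isinstance(data, dict) or key not in data:
            return None
        data = data[key]
    return data


class SelectTestCase(unittest.TestCase):
    # name -> (path, input as plain Python objects, expected output)
    cases = {
        "simple": ("foo.bar", {"foo": {"bar": "baz"}}, "baz"),
        "invalid": ("foo.bar.baz", {"foo": {"bar": "baz"}}, None),
        "top_level": ("foo", {"foo": {"bar": "baz"}}, {"bar": "baz"}),
        "list": ("foo", {"foo": [1, 2]}, [1, 2]),
    }

    def test_output(self):
        for name, (path, value, expected) in self.cases.items():
            result = select(path, value)
            self.assertEqual(
                result, expected,
                msg="test %r got %r expected %r" % (name, result, expected))


# Run the suite programmatically so the outcome can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(SelectTestCase)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

No JSON strings appear in the table, which matches the decision to keep `json.loads` out of the processor.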
There is something wrong with the markup but I don't know what it is. /cc @kmike
I created the documentation locally but was unable to figure out the error. Can you please point out the issue? Thanks.
I think you are missing
Utilizes jmespath. Also, added tests and documentation for the same.
Hi,
I think this PR is fine, +1 to merge it.
Thanks
Sorry I couldn't chime in before merge but, if we're gonna add
@pablohoffman why does it matter? jmespath is not added to
Imports are very fast after the first successful import (a lookup in a dict) - by moving import to module level we won't get any speed benefits, but the exception may become less clear if jmespath is absent.
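The claim about repeated imports can be observed with the standard library alone. In this sketch `json` plays the role of the optional dependency, and `module_that_is_not_installed` is a made-up name used only to trigger the error path.

```python
import sys


def load_json_module():
    # The import statement runs on every call, but after the first
    # successful import it is only a lookup in the sys.modules dict.
    import json
    return json


first = load_json_module()
second = load_json_module()

# Both calls return the very same cached module object.
print(first is second)
print(sys.modules["json"] is first)


def load_missing_module():
    # "module_that_is_not_installed" is a made-up name for illustration.
    import module_that_is_not_installed
    return module_that_is_not_installed


try:
    load_missing_module()
    message = None
except ImportError as exc:
    # The error names the missing module and is raised at the point of use.
    message = str(exc)
print(message)
```

So a function-level import costs little after the first call, and a missing dependency surfaces only when the code that needs it actually runs.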
@pablohoffman @kmike
Created the first implementation of Json processor, any comments/edits are welcome.
I am currently using the jmespath (https://github.com/boto/jmespath) module to search for paths in the given list of values. I have added some sample test cases too.
The processor will return the list of values unchanged in case the jmespath module isn't installed.
I wasn't sure how to show a warning message in such a case and thus, have used the python print statement for now.