Feature: adds 'json rows' serialisation #56

lsh-0 · 2018-11-12T04:59:23Z

research-article-report-index is the first to get it

lsh-0 · 2018-11-12T05:06:06Z

Output of Observer is being pushed into BigQuery.
Conversion to CSV loses type information, so obj-to-csv and then csv-to-json means integers become strings, etc.
Pushing this back into Observer spares NiFi some boilerplate wrangling.

There are ways and means to do this in NiFi, I just haven't fully explored them yet and it's something observer could use
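
To illustrate the type loss described above, here is a minimal sketch using only the standard library. The row contents and field names are made up for the example and are not taken from Observer's reports.

```python
import csv
import io
import json

# Hypothetical report row; the field names are illustrative only.
rows = [{"id": 9560, "poa": True, "days": 12.5}]

# obj -> csv: every value is written out as text.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "poa", "days"])
writer.writeheader()
writer.writerows(rows)

# csv -> json: the reader only ever yields strings, so the types are gone.
buf.seek(0)
via_csv = json.loads(json.dumps(list(csv.DictReader(buf))))

# obj -> json directly: integers, booleans and floats survive.
direct = json.loads(json.dumps(rows))

print(via_csv)  # [{'id': '9560', 'poa': 'True', 'days': '12.5'}]
print(direct)   # [{'id': 9560, 'poa': True, 'days': 12.5}]
```

Serialising straight to JSON in Observer avoids having to re-declare the column types downstream.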

giorgiosironi · 2018-11-12T09:59:35Z

src/observer/reports.py

@@ -92,7 +92,7 @@ def upcoming_articles():
 @report(article_meta(
    title='published research article index',
    description='The dates and times of publication for all _research_ articles published at eLife. If an article had a POA version, the date and time of the POA version is included.',
-    serialisations=[CSV],
+    serialisations=[CSV, JSON],


very easy to add to a particular report

giorgiosironi · 2018-11-12T10:02:18Z

src/observer/tests/test_json_rows.py

+from django.test import Client
+from django.core.urlresolvers import reverse
+
+class One(base.BaseCase):


Rename to TestJsonResponse or TestJsonRowsIntegration, with a test_published_research_article_index method? That would support adding more examples for reproducing bugs.

there are tests that go through every article and every serialisation of an article.

I was planning on switching to pytest for its network isolation, but I'll update the names for you

giorgiosironi · 2018-11-12T10:03:19Z

src/observer/utils.py

+            return lu[type(val)](val)
+
+        #raise TypeError('Object of type %s with value of %s is not JSON serializable' % (type(obj), repr(obj)))
+        return "[unserialisable]"


Why not an explicit error? This string will propagate through reports (potentially into NiFi) as incorrect data.
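
A minimal sketch of the explicit-error approach, assuming a type lookup table similar to the `lu` in the diff. The table contents and the helper name `_default` are assumptions for illustration, not Observer's actual code.

```python
import json
from datetime import date, datetime


def _default(val):
    # Hypothetical lookup of converters keyed by exact type.
    lu = {
        datetime: datetime.isoformat,
        date: date.isoformat,
    }
    if type(val) in lu:
        return lu[type(val)](val)
    # Fail loudly instead of emitting a placeholder string that would
    # flow into downstream reports as if it were real data.
    raise TypeError(
        "Object of type %s with value of %s is not JSON serializable"
        % (type(val), repr(val))
    )


def json_dumps(obj):
    # json.dumps only calls `default` for values it cannot encode itself.
    return json.dumps(obj, default=_default)
```

With this shape, `json_dumps({"published": date(2018, 11, 12)})` produces an ISO date string, while an unhandled value such as a `set` raises a `TypeError` at serialisation time.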

giorgiosironi · 2018-11-12T10:04:56Z

src/observer/json_rows.py

+    body = []
+    body.append((utils.safe_json_dumps(row) + "\n") for row in rows)
+    response = StreamingHttpResponse(itertools.chain.from_iterable(body), content_type="application/json")
+    response['Content-Disposition'] = 'attachment; filename="%s.rows.json"' % filename


~~nothing strange at the HTTP headers level~~

Observer isn't following the REST model like the other services, it was never integrated properly. It might even be replaced entirely if BQ or something feeding off of BQ can return RSS feeds. Observer is very nice for querying article data but it just never caught on

giorgiosironi · 2018-11-12T10:17:57Z

src/observer/reports.py

@@ -1,5 +1,5 @@
 import copy
-from . import models, rss, csv, logic
+from . import models, rss, csv, logic, json_rows


I recently discovered there is a de facto name for this format, JSON Lines, so we should rename to that. There is also a Python library for it, which is not needed here but suggests the concept is reasonably popular.

(and a Stack Overflow tag)

ah good to know, thanks. I wanted to leave room for a 'proper' json response where the entire body could be read in rather than per-row. The library looks a little pointless but I'll rename the module.

giorgiosironi · 2018-11-12T10:18:48Z

src/observer/json_rows.py

+def streaming_response(filename, rows):
+    body = []
+    body.append((utils.safe_json_dumps(row) + "\n") for row in rows)
+    response = StreamingHttpResponse(itertools.chain.from_iterable(body), content_type="application/json")


Since it's multiple lines, the whole response is not valid JSON, hence content_type needs to be something else. Trying to find a valid value.

wardi/jsonlines#9 says application/x-ndjson which is from a "competing" standard but better than inventing a new value. The Python library does support both "standards".
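
To show why the body as a whole is not valid JSON even though each line is, here is a small standard-library sketch. The generator name `iter_json_lines` and the sample rows are hypothetical; in Observer the generator would presumably be handed to `StreamingHttpResponse` with the content type below.

```python
import json

# Content type suggested in the thread for newline-delimited JSON.
NDJSON_CONTENT_TYPE = "application/x-ndjson"


def iter_json_lines(rows):
    # Yield one JSON document per line so the response can be streamed.
    for row in rows:
        yield json.dumps(row) + "\n"


# Illustrative rows only.
rows = [{"id": 1, "title": "a"}, {"id": 2, "title": "b"}]
body = "".join(iter_json_lines(rows))

# Parsing the whole body fails: a second document follows the first.
try:
    json.loads(body)
    whole_body_is_json = True
except json.JSONDecodeError:
    whole_body_is_json = False

# Parsing line by line recovers the rows with their types intact.
parsed = [json.loads(line) for line in body.splitlines()]
```

A consumer therefore has to read the response line by line, which is what a distinct content type signals.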

it now raises a TypeError on unhandled types

added 'json rows' serialisation.

e848a7f

research-article-report-index is the first to get it

lsh-0 requested a review from giorgiosironi November 12, 2018 05:02

giorgiosironi reviewed Nov 12, 2018


lsh-0 added 4 commits November 13, 2018 10:22

renamed utils.safe_json_dumps to utils.json_dumps

47f58ac

it now raises a TypeError on unhandled types

renamed json_rows to json_lines, updated references

67cb09c

renamed test cases

7b98463

changed mimetype of json lines type response

363027c

lsh-0 merged commit 8a27a86 into develop Nov 13, 2018

lsh-0 deleted the feat-json-rows branch November 13, 2018 00:14
