New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature: adds 'json rows' serialisation #56
Conversation
research-article-report-index is the first to get it
@giorgiosironi , context of PR is:
There are ways and means to do this in NiFi, I just haven't fully explored them yet and it's something observer could use |
@@ -92,7 +92,7 @@ def upcoming_articles(): | |||
@report(article_meta( | |||
title='published research article index', | |||
description='The dates and times of publication for all _research_ articles published at eLife. If an article had a POA version, the date and time of the POA version is included.', | |||
serialisations=[CSV], | |||
serialisations=[CSV, JSON], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very easy to add to a particular report
src/observer/tests/test_json_rows.py
Outdated
from django.test import Client | ||
from django.core.urlresolvers import reverse | ||
|
||
class One(base.BaseCase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TestJsonResponse
|TestJsonRowsIntegration
with a test_published_research_article_index
method? Supports adding more examples for reproducing bugs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are tests that go through every article and every serialisation of an article.
I was planning on switching to pytest for it's network isolation, but I'll update the names for you
src/observer/utils.py
Outdated
return lu[type(val)](val) | ||
|
||
#raise TypeError('Object of type %s with value of %s is not JSON serializable' % (type(obj), repr(obj))) | ||
return "[unserialisable]" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not an explicit error? This string will propagate through reports (potentially in Nifi) as incorrect data
src/observer/json_rows.py
Outdated
body = [] | ||
body.append((utils.safe_json_dumps(row) + "\n") for row in rows) | ||
response = StreamingHttpResponse(itertools.chain.from_iterable(body), content_type="application/json") | ||
response['Content-Disposition'] = 'attachment; filename="%s.rows.json"' % filename |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nothing strange at the HTTP headers level
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Observer isn't following the REST model like the other services, it was never integrated properly. It might even be replaced entirely if BQ or something feeding off of BQ can return RSS feeds. Observer is very nice for querying article data but it just never caught on
src/observer/reports.py
Outdated
@@ -1,5 +1,5 @@ | |||
import copy | |||
from . import models, rss, csv, logic | |||
from . import models, rss, csv, logic, json_rows |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recently discovered there is a de-facto name for this format, JSON Lines
, should rename to that. It also has a Python library that is not needed here, but validates that it is somewhat popular as a concept.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(and a Stack Overflow tag)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah good to know, thanks. I wanted to leave room for a 'proper' json response where the entire body could be read in rather than per-row. The library looks a little pointless but I'll rename the module.
src/observer/json_rows.py
Outdated
def streaming_response(filename, rows): | ||
body = [] | ||
body.append((utils.safe_json_dumps(row) + "\n") for row in rows) | ||
response = StreamingHttpResponse(itertools.chain.from_iterable(body), content_type="application/json") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since it's multiple lines, the whole response is not valid JSON hence content_type
needs to be something else. Trying to find a valid value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wardi/jsonlines#9 says application/x-ndjson
which is from a "competing" standard but better than inventing a new value. The Python library does support both "standards".
it now raises a TypeError on unhandled types
research-article-report-index is the first to get it