Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: External secondary instances (as in CommCare) #50

Closed
MartijnR opened this issue Dec 12, 2016 · 11 comments
Closed

Proposal: External secondary instances (as in CommCare) #50

MartijnR opened this issue Dec 12, 2016 · 11 comments

Comments

@MartijnR
Copy link
Contributor

MartijnR commented Dec 12, 2016

Also described here from an ODK Collect perspective and in an old wider discussion here.

As in https://dimagi.github.io/xform-spec/#secondary-instances---external, this feature just adds support for src attribute on a secondary instance and a jr://file connector for the src value.

<instance src="jr://file/countries.xml" />

I propose to start with only supporting the jr://file connector for now, and then build upon this feature in the future (e.g. by adding CSV support). This means the same resource retrieval method as jr://image, jr://audio and jr://video can be used.

Until now there is no proper agreement to add external data. This addition would not require re-inventing the wheel and as a bonus brings us closer to CommCare, raising the prospect for future compatibility.

@lognaturel
Copy link
Member

The one thing that makes me uncomfortable about this is the name of the protocol (jr://). It implies some connection to JavaRosa but of course this specification aims to be implementation-agnostic. orx:// would probably be a better choice. I talked to @MartijnR about this and the big value to having the name be jr:// is that it matches the Dimagi spec which is also important.

@MartijnR
Copy link
Contributor Author

MartijnR commented Jan 18, 2017

Sorry, I didn't see your response!

it matches the Dimagi spec which is also important

In addition, in this case jr: is not used as a namespace prefix but a URI Scheme (yeah, I just looked that up - I used to call it protocol), so it doesn't actually refer to http://javarosa.org/xforms and it doesn't change if you change the https://javarosa.org/xforms prefix from jr to something else in an XForm document (which is a totally valid thing to do). It's just unfortunately and 'coincidentally' exactly the same as the most commonly used namespace prefix for the JavaRosa namespace. This makes its use a lot more palatable in my opinion.

The biggest reason to use it for external data is to make it consistent with jr://image, jr://audio and jr://video.

Definitely or:// or something like that (for all of them) would have been nicer in hindsight.

With respect to agreeing to add this to the spec, I just want to make sure we're all on the same page and are voting on the its worthiness and technical merit. It's not some sort of commitment for a particular time when it is implemented in the various tools. Normally, the spec is ahead of the implementation, and should be.

@lognaturel
Copy link
Member

lognaturel commented Jan 19, 2017

Right, right, it's unfortunate in the jr://image case, too. But so be it, inconsistency would be even more confusing. Let's add a little bit about the URI Scheme somewhere in the spec to provide context.

getodk/collect#275 also has a good discussion related to this including Dimagi's experience.

This all seems like a good idea from an XForms / spec design standpoint but the more I learn, the more I find it difficult to divorce this question of how external data should be handled from some discussion of implementation and from how people are writing forms today.

@chrislrobert
Copy link

chrislrobert commented Jan 19, 2017 via email

@MartijnR
Copy link
Contributor Author

find it difficult to divorce this question of how external data should be handled from some discussion of implementation and from how people are writing forms today

I understand. In that respect I think we're lucky that XLSForm is so popular, as we could change things behind the scenes if we wanted when it's safe to do so.

However, I think we can really leave that as a separate discussion and take it one step at a time. The important thing to realize, also in relation to @chrislrobert's comment, is that there should be no implication to break existing XForms or XLSForms at all if they are currently working in a particular client.

This is about providing a flexible path forward for extensions to external data that doesn't involve going further off the deep end (re. search() appearance and query attribute). We need external data in this spec. An organization such as Enketo wouldn't want to follow a spec that didn't have such a crucial feature.

If the concern is with actually creating external XML documents, I think that's a concern that can be dealt with by the form/data servers. E.g. the real power of external data documents comes from the ability to dynamically create them (e.g. from previous surveys). I believe some are already doing this. They could be created in any format. Querying this data can be done in the same way. E.g. it's easy to use pulldata() for these documents without backwards-compatibility issues.

I am trying to be careful not to increase the scope of this issue, but if there is a specific concern, I think there may be some ideas to address them satisfactorily. Fyi, this is the future CSV support, I was referring to as a follow-up.

@chrislrobert
Copy link

chrislrobert commented Jan 23, 2017 via email

@MartijnR
Copy link
Contributor Author

MartijnR commented Jan 23, 2017

can be implemented such that external instances can be queried dynamically from
some kind of local DB and the actual options don't all need to be loaded
into memory

Exactly. If I understand correctly (though they lost me when they tried to explain), that is exactly what CommCare has done. So the language is XPath, but the underlying implementation could do whatever they want such as storing the data in an a db and transforming an XPath query into a db query (and simpler, converting a pulldata function call to a db query). That's no concern of the specification.

@chrislrobert
Copy link

chrislrobert commented Jan 23, 2017 via email

@MartijnR
Copy link
Contributor Author

MartijnR commented Jan 26, 2017

I think they do something more clever than a one-to-one transformation of XPath into SQL. Something involving virtual DOM (fragments?) and static analysis of the expressions and indexing to keep lookups very fast. They definitely don't deal with a huge XML documents in memory. Would be worth digging into their code I think.

So ready for a PR for this issue? (and a new proposal for CSV support)

@lognaturel
Copy link
Member

I had to give this some more thought but I am now sold. As @MartijnR says, it's a syntax that we can all agree on and that can be a path to greater interoperability. It also enables nice things like an alternative to preloads for session data. There's likely a way to match the current Collect "external itemset" functionality to this syntax and even if that's not done for a while, it's fine as long as we document the differences somewhere (I'll create a separate issue for that on the Collect side).

I had a fun adventure through the CommCare code. And then I realized that unless I'm massively missing something, there's already some support for external secondary instances in ODK JavaRosa. Additionally, it is a part of the original OpenRosa XForms spec that I believe this spec is based off of. This is kind of a separate question but it'd be nice to know how/why this spec differs from that one.

This is where secondary instances are identified in the XForm. In ODK's JavaRosa fork, they are stored in the FormDef in a map from instance name to FormInstance objects here. In Dimagi's fork, they are represented as ExternalDataInstance objects and stored in a map from instance ID to ExternalDataInstances also in the FormDef. In both, the EvaluationContext is set based on the main instance and the secondary instances. XPath expressions are evaluated within that context.

Dimagi has done a lot of (good) refactoring so it's time consuming to compare the two javarosa forks. I believe the way that they've gotten some better performance is by using that ExternalDataInstance class. In ODK javarosa, non-main instances are parsed and their data is loaded in XFormParser. On the other hand, I believe the Dimagi implementation does some selective parsing. @ctsims, I'm seeing your name on a lot of that Dimagi-specific code. Could you please let me know if I'm sort of on the right track?

I thought I could use the syntax described in the OpenRosa XForms spec and at least print something from an external instance in ODK Collect but I wasn't able to get anything working (it just fails silently). I'm not sure whether that's because something is missing in the implementation or just because I'm building the form wrong (I don't really know where to put the external xml file, for example).

I also see that this is actually already in XLSForm. It seems really strange to me that it would be there and not here. Certainly not to say that @MartijnR did the wrong thing at the time since this spec has been neglected but I'm wondering whether we should try to move things here first in the future (I don't know).

@MartijnR
Copy link
Contributor Author

MartijnR commented Feb 3, 2017

Great! I'll create a PR and a new proposal for CSV data.

Funny you found some code ODK Javarosa already. Indeed there is no inventing involved for this feature.

should try to move things here first in the future

I agree. I'm glad we're having great cross-organization commitment to managing an XForms spec now. I'm very optimistic we will completely avoid (all) these rogue pyxform/xforms additions now!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants