Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow nested 'objects' (and lists) in webstore columns via jsonification #25

Open
rufuspollock opened this issue Aug 8, 2011 · 4 comments

Comments

@rufuspollock
Copy link
Member

Suppose I have data like:

{
   'geolocation': {
      'long': ...
      'lat': ...
   }
}

At the moment no way to save this into the webstore. We should have webstore auto json encode the geolocation field and save as a string. Obviously need to unencode on way out so suggest doing this via setting type of field to 'json' (if that is possible) and deserializing on the way out. Suggest doing this for hash and list type.

@pudo
Copy link
Contributor

pudo commented Aug 10, 2011

The internals of this are clear: adapt the JSONType column type from CKAN to automatically wrap and un-wrap data into the database when the value is a list or mapping type in the source data.

Exposing this data via JSON is also trivial, but exporting it into any other type (CSV, XLS, HTML) would require some additional magic. Two options here include:

  • Wrapping the inner value as JSON irrespective of the desired representation. Breaks REST and types can only be inferred via out-of-band schema metadata (is this a string or a JSON object encoded as a string?).
  • Using a convention to wrap objects into fields and applying this to all incoming CSV or Form data. Example: https://github.com/okfn/datahub/blob/master/datahub/util.py#L40 (this is used to handle "extras"-style arbitrary dictionaries in datahub).

This also raises the larger question of how broad the use case for webstore is: do we focus on mostly tabular data or try to extend the system to fully support more complex document-like data? IMO there is a large use case for tabular data alone and the constraint allows us to be specific: things like distinct, re-using parts of SQL as query syntax etc. Plus its simple to understand, which is not true of a mixed tabular/document store.

@rufuspollock
Copy link
Member Author

Definitely against option 1. I think option 2 (bullet point 2) would be better and would work particularly well in context of #27 (setting column types). Remember with csv we already have problem of guessing column types (which is pretty important).

On last question not sure i agree. Distinct can just be ignored for columns that are json (cf #27 again) and there will in any case be columns we do not want distinct run on (e.g. free text).

For me this is a very important feature for e.g. use for community dashboard (which seems reasonable). I hear your point about focusing on doing one thing well but I think if we had column types (#27) and had additional json types like hash / dict (or just json) we'd could have best of both worlds (i.e. tabular/sql functionality but extensible to nested / complex data).

@goatchurchprime
Copy link

Putting a compound structure into a database column destroys the ability to do anything with it. We might as well have a single column in every table called "value" which is an encoded dict of the data and lock out every ordering and filtering function from happening.

This particular example of a geolocation item wrapping lat and long values is egregious as it prevents the use of the r*tree index I have been looking forward to exercising:
http://www.sqlite.org/rtree.html

@frabcus
Copy link

frabcus commented Oct 12, 2011

Rufus - are you no longer thinking about a relational SQL store? What query language and indexing would you want on these recursive JSON objects?

There are really two choices I can think of - 1) SQL, or an SQL like language, for which you can't have the recursion, 2) recursive documents, and some other query language like MongoDB or something new and higher level above it.

One of the principles I think that we should definitively agree or disagree on is that Webstore is about SQL. If it is not about SQL, then we're doing something much harder and very different - creating a new query language. That's an interesting project, but I'd hoped was outside the scope of a practical datastore for people to use now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants