This repository has been archived by the owner on Nov 9, 2022. It is now read-only.

JSON Extensions

hunterhacker edited this page Nov 21, 2011 · 7 revisions

The simplicity of JSON is truly one of its strengths. However, with the only datatypes being objects, arrays, numbers, strings, booleans and nulls, some things are left to be desired. One of the most obvious and painful omissions is dates. Because MarkLogic is an XML database at its core, having support for XML as a datatype is also quite desirable. Corona has added some conventions to alleviate these pains.

Casting Convention

Because JSON data and JSON parsers are quite prevalent, there is a strong desire to not modify its syntax. However, there is also a strong desire to tell a database like MarkLogic to "treat this value as a date" for indexing and searching purposes.

As a compromise, Corona uses a convention where the key name in an object can inform the parser to treat its value as either a date or as XML instead of simply a string. This convention is simply <name>::<type>.
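
As a rough illustration of the naming convention (not Corona's actual implementation), a client could split a key into its name and type parts like this. The helper name split_cast_key is hypothetical, and leaving keys with an unrecognized suffix untouched is an assumption of this sketch.

```python
# Hypothetical helper: split "<name>::<type>" into its two parts.
# Only "date" and "xml" are treated as cast types, since those are the
# two types this page describes.
KNOWN_TYPES = ("date", "xml")

def split_cast_key(key):
    """Return (name, type) for a cast key, or (key, None) for a plain key."""
    name, sep, type_name = key.rpartition("::")
    if sep and name and type_name in KNOWN_TYPES:
        return name, type_name
    # Assumption: anything else is an ordinary key and is left as-is.
    return key, None

print(split_cast_key("message::date"))
print(split_cast_key("title"))
```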

There are, of course, shortcomings to this approach, which are covered below.

Casting XML

To inform MarkLogic that a key's value is XML, simply quote the XML and append ::xml to the key's name. For example:

{"message::xml": "<div><h1 class='subject'>Hello World</h1><div class='body'>XML inside JSON!!</div></div>"}

Internally, the XML will be unquoted and stored in its structured format. This means that keyword searches on the message::xml key will only search the text nodes of the XML. In the future, the structure of the XML can also be queried just like, and along with, the structure of the JSON.

If the XML in the string isn't well formed, an error will be thrown.
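
Since malformed XML is rejected, it can be handy to check well-formedness on the client before sending a document. The following is a minimal sketch using only Python's standard library; the function name xml_member is hypothetical, and this check is independent of whatever validation Corona performs itself.

```python
import json
import xml.etree.ElementTree as ET

def xml_member(name, xml_text):
    """Build a JSON document with a single <name>::xml key.

    Raises xml.etree.ElementTree.ParseError if xml_text is not well formed.
    """
    ET.fromstring(xml_text)  # parse only to verify well-formedness
    return json.dumps({name + "::xml": xml_text})

doc = xml_member("message", "<div><h1 class='subject'>Hello World</h1></div>")
print(doc)
```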

Casting Dates

Having MarkLogic interpret a string as a date functions in the same way as casting to XML:

{"message::date": "Fri Jul 08 2011 14:08:18 GMT-0700 (PDT)"}

The only hitch is that there are many different date formats. The date parser in Corona has support for a number of popular formats in a handful of languages but obviously can't support them all. If the date cannot be parsed, an error is thrown.

Examples of supported date formats

  • Fri Jul 08 2011 14:08:18 GMT-0700 (PDT)
  • Fri, 08 Jul 2011 14:08:18 -0700
  • 08 Jul 2011 14:08:18 -0700
  • 08-Jul-2011 14:08:18 -0700
  • Jul 08 14:08:18 2011 -0700
  • Jul 08 14:08:18 PDT 2011
  • 4 Jan 11 14:08 PDT
  • 07-08-2011
  • 2011/07/08
  • 07/08/2011
  • July 8th, 2011

All month names are supported in either their short form (Jul) or long form (July), and in English, Spanish, French, German and Italian.
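
To give a feel for how a multi-format date parser can work, here is a minimal sketch that tries a handful of the formats listed above in turn and raises an error when none match. It is not Corona's parser: the name parse_date is hypothetical, only English month names are handled (assuming the default C/English locale), and reading the all-numeric forms as month-first is an assumption.

```python
import re
from datetime import datetime

# A subset of the formats listed above, expressed as strptime patterns.
FORMATS = [
    "%a, %d %b %Y %H:%M:%S %z",  # Fri, 08 Jul 2011 14:08:18 -0700
    "%d %b %Y %H:%M:%S %z",      # 08 Jul 2011 14:08:18 -0700
    "%d-%b-%Y %H:%M:%S %z",      # 08-Jul-2011 14:08:18 -0700
    "%b %d %H:%M:%S %Y %z",      # Jul 08 14:08:18 2011 -0700
    "%m-%d-%Y",                  # 07-08-2011 (assumed month-first)
    "%Y/%m/%d",                  # 2011/07/08
    "%m/%d/%Y",                  # 07/08/2011 (assumed month-first)
    "%B %d, %Y",                 # July 8, 2011 (ordinal suffix stripped)
]

def parse_date(text):
    """Try each known format; raise ValueError if none of them match."""
    # Drop ordinal suffixes so "July 8th, 2011" becomes "July 8, 2011".
    cleaned = re.sub(r"(\d+)(?:st|nd|rd|th)\b", r"\1", text.strip())
    for fmt in FORMATS:
        try:
            return datetime.strptime(cleaned, fmt)
        except ValueError:
            continue
    raise ValueError("unrecognized date format: %r" % text)

print(parse_date("Fri, 08 Jul 2011 14:08:18 -0700").isoformat())
print(parse_date("July 8th, 2011").date())
```

Trying formats in a fixed order keeps the sketch simple, but it also means ambiguous inputs such as 07/08/2011 are resolved by whichever pattern comes first.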

Details and Shortcomings

The approach of using key name conventions to indicate a value's type has some obvious shortcomings. First off, it pollutes the key's name and is cosmetically less appealing. Functionally it has its limitations as well, mainly that it can only cast the values in objects. In other words, if you simply have a JSON array of dates, there is no way to inform MarkLogic to treat the values in the array as such.

To partially mitigate this second problem, casting is carried through into arrays. This is best described via an example:

{"possibleDates::date": [
    "March 31st, 2011",
    "June 30th, 2011",
    "August 31st, 2011",
    "December 31st 2011"
]}

In the above example, the possibleDates array will be treated as an array of dates.
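
The carry-through behavior can be sketched as a recursive walk over the parsed JSON, where a cast named on a key is inherited by every string inside that key's array. This is only an illustration under stated assumptions: apply_casts and the casters mapping are hypothetical names, key names are left unchanged, and a nested object resets the inherited cast.

```python
def apply_casts(value, casters, active=None):
    """Walk parsed JSON, applying the cast named by the nearest '::<type>' key.

    `casters` maps a type name (e.g. "date") to a function of one string.
    `active` is the cast inherited from the enclosing key, if any.
    """
    if isinstance(value, dict):
        out = {}
        for key, child in value.items():
            type_name = key.rpartition("::")[2] if "::" in key else None
            # Each key decides the cast for its own value; plain keys get none.
            out[key] = apply_casts(child, casters, casters.get(type_name))
        return out
    if isinstance(value, list):
        # The cast is carried through to every item of the array.
        return [apply_casts(item, casters, active) for item in value]
    if active is not None and isinstance(value, str):
        return active(value)
    return value

# Stand-in caster that just tags the value, to show which strings were cast.
doc = {"possibleDates::date": ["March 31st, 2011", "June 30th, 2011"], "notes": ["draft"]}
print(apply_casts(doc, {"date": lambda s: ("date", s)}))
```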

Discussion

It would be good to get user feedback on this ::date technique.

We should make sure to support the standard ISO 8601 date format, with and without milliseconds. Besides being common in XQuery, it is supported in ECMAScript 5.

If the XML text contains double quotes, how should they be escaped?
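
At the JSON layer, at least, standard string escaping applies: a double quote inside a string value is written as a backslash followed by the quote, and any JSON serializer will do this automatically. The sketch below shows the round trip with Python's json module; whether Corona needs anything beyond this is not settled here.

```python
import json

# XML that uses double quotes for its attribute value.
xml_text = '<div class="body">XML inside JSON!!</div>'

# json.dumps escapes the inner double quotes with backslashes.
encoded = json.dumps({"message::xml": xml_text})
print(encoded)

# Parsing the JSON restores the original XML text unchanged.
decoded = json.loads(encoded)
print(decoded["message::xml"] == xml_text)
```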
