1. Documents

Robert Heath edited this page Mar 1, 2016 · 2 revisions

The OJAI API library is centered around documents that follow a JSON-like data model. Documents can describe entities such as products, people, and places more easily and in greater detail than tables in relational databases can. This flexibility and richness come from the lack of a fixed schema, the ability to nest data within other data, the ability to create arrays, and the freedom to mix simple and very complex data within a single document.

An OJAI document is a tree of fields. Each field has a type and a value, and also has either a name or an array index. Field names are strings. The root of each document is a map.

For example, an online retailer of sports equipment might have this OJAI document for storing data about a set of bicycle pedals:

{
  "_id" : "2DT3201",
  "product_ID" : "2DT3201",
  "name" : " Allegro SPD-SL 6800",
  "brand" : "Careen",
  "category" : "Pedals",
  "type" : "Components",
  "price" : 112.99,

  "features" : [
	"Low-profile design",
	"Floating SH11 cleats included"
  ],

  "specifications" : {
	"weight_per_pair" : "260g",
	"color" : "black"
  }
}
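
The tree structure described above can be made concrete with a short traversal. The following is a minimal sketch in plain Python, not the OJAI API: it loads an abbreviated copy of the sample document with the standard `json` module and lists every field with its path, type, and value. The `walk` helper and the dotted/bracketed path notation are assumptions made for illustration only.

```python
import json

# Abbreviated copy of the sample document above.
doc = json.loads("""
{
  "_id": "2DT3201",
  "price": 112.99,
  "features": ["Low-profile design", "Floating SH11 cleats included"],
  "specifications": {"weight_per_pair": "260g", "color": "black"}
}
""")

def walk(value, path=""):
    """Yield (path, type_name, value) for every field below the root map."""
    if isinstance(value, dict):
        # Map: children are addressed by field name.
        if path:
            yield path, "map", None
        for name, child in value.items():
            yield from walk(child, f"{path}.{name}" if path else name)
    elif isinstance(value, list):
        # Array: children are addressed by index.
        yield path, "array", None
        for index, child in enumerate(value):
            yield from walk(child, f"{path}[{index}]")
    else:
        # Scalar leaf: report the Python type that json chose for it.
        yield path, type(value).__name__, value

fields = list(walk(doc))
for path, type_name, value in fields:
    print(path, type_name, value)
```

Every field reached this way has either a name (inside a map) or an index (inside an array), which matches the description of the document tree.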

Field names can include any UTF-8 characters. _id is a special field name used in some document stores, such as MapR-DB and MongoDB, for a field that holds the unique identifier for a document. Field names that begin with the dollar-sign character ($) are discouraged because some document stores, such as MongoDB, use this character to designate keywords.
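
As a rough illustration of the naming guidance, the sketch below scans a document for field names that begin with a dollar sign. It is plain Python rather than the OJAI API, and the `discouraged_field_names` helper and the sample field names are hypothetical.

```python
def discouraged_field_names(value):
    """Return every field name, at any depth, that starts with '$'."""
    found = []
    if isinstance(value, dict):
        for name, child in value.items():
            if name.startswith("$"):
                found.append(name)
            found.extend(discouraged_field_names(child))
    elif isinstance(value, list):
        for child in value:
            found.extend(discouraged_field_names(child))
    return found

# Hypothetical document: non-ASCII names are fine, '$' prefixes are flagged.
doc = {
    "_id": "2DT3201",
    "$price": 112.99,
    "specifications": {"$color": "black", "größe": "M"},
}

flagged = discouraged_field_names(doc)
print(flagged)
```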

Data Types

OJAI documents can contain scalar data, nested documents (maps), and arrays.

Scalar Data

These fields hold a single value, such as a string or a number. In the sample document below, the top-level scalar fields are _id, product_ID, name, brand, category, type, and price.

{
   "_id" : "2DT3201",
   "product_ID" : "2DT3201",
   "name" : " Allegro SPD-SL 6800",
   "brand" : "Careen",
   "category" : "Pedals",
   "type" : "Components",
   "price" : 112.99,

   "features" : [
     "Low-profile design",
     "Floating SH11 cleats included"
   ],

   "specifications" : {
     "weight_per_pair" : "260g",
     "color" : "black"
   }
}

Scalar fields can contain the following data types:

| Data Type | Description |
|-----------|-------------|
| Binary | An uninterpreted sequence of bytes. |
| Boolean | A data type of two possible values that are typically denoted by true and false. |
| Byte | An 8-bit signed integer. |
| Date | A 32-bit integer representing the number of days since epoch, i.e. January 1, 1970 00:00:00 UTC. The value is absolute and is time-zone independent. |
| Double | A double-precision 64-bit floating-point number. |
| Float | A single-precision 32-bit floating-point number. |
| Int | A 32-bit signed integer. |
| Long | A 64-bit signed integer. |
| Short | A 16-bit signed integer. |
| String | A sequence of characters. |
| Time | A 32-bit integer representing time of the day in milliseconds. The value is absolute and is time-zone independent. |
| Timestamp | A 64-bit integer representing the number of milliseconds since epoch, i.e. January 1, 1970 00:00:00 UTC. Negative values represent dates before epoch. |
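
The integer encodings of Date, Time, and Timestamp can be worked through with ordinary arithmetic. The sketch below uses Python's standard `datetime` module to derive the three integers as described in the list above. It is not the OJAI API, and the function names are made up for this example.

```python
from datetime import date, datetime, time, timezone

EPOCH_DATE = date(1970, 1, 1)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def date_to_days(d):
    """Date: whole days since January 1, 1970."""
    return (d - EPOCH_DATE).days

def time_to_millis(t):
    """Time: milliseconds since the start of the day."""
    seconds = (t.hour * 60 + t.minute) * 60 + t.second
    return seconds * 1000 + t.microsecond // 1000

def timestamp_to_millis(dt):
    """Timestamp: milliseconds since epoch; negative before 1970."""
    delta = dt - EPOCH
    return delta.days * 86_400_000 + delta.seconds * 1000 + delta.microseconds // 1000

print(date_to_days(date(2016, 3, 1)))                                   # 16861
print(time_to_millis(time(13, 45, 30, 250_000)))                        # 49530250
print(timestamp_to_millis(datetime(2016, 3, 1, tzinfo=timezone.utc)))   # 1456790400000
print(timestamp_to_millis(datetime(1969, 12, 31, tzinfo=timezone.utc))) # -86400000
```

The Date and Time values fit comfortably in 32 bits, while the Timestamp value for 2016 already exceeds the 32-bit signed range, which is why it is described as a 64-bit integer.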

Nested Documents (Maps)

These fields can contain documents that themselves contain scalar data, nested documents, and arrays. In the sample document below, the nested document is the value of the specifications field.

{
  "_id" : "2DT3201",
  "product_ID" : "2DT3201",
  "name" : " Allegro SPD-SL 6800",
  "brand" : "Careen",
  "category" : "Pedals",
  "type" : "Components",
  "price" : 112.99,

  "features" : [
	"Low-profile design",
	"Floating SH11 cleats included"
  ],

  "specifications" : {
	"weight_per_pair" : "260g",
	"color" : "black"
  }
}

It is up to OJAI implementations to determine the order in which fields are stored within maps.
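
Because field order within a map is left to the implementation, code that reads documents should look fields up by name and should not depend on position. The sketch below shows the idea with plain Python dictionaries standing in for OJAI maps. It assumes nothing about any particular OJAI implementation.

```python
import json

# The same nested document, serialized with its fields in two different orders.
a = json.loads('{"specifications": {"weight_per_pair": "260g", "color": "black"}}')
b = json.loads('{"specifications": {"color": "black", "weight_per_pair": "260g"}}')

# Reach into the nested document one field name at a time.
color = a["specifications"]["color"]

# Compare by content: dictionaries are equal regardless of key order.
same_content = (a == b)

# If a stable listing is needed, sort the names explicitly.
names = sorted(a["specifications"])

print(color, same_content, names)
```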

Arrays

These fields contain lists of values that are accessible by means of index numbers. The values can be scalars, documents, arrays, or a combination of any of these types. For example, the features array in the sample document below contains scalar values.

{
  "_id" : "2DT3201",
  "product_ID" : "2DT3201",
  "name" : " Allegro SPD-SL 6800",
  "brand" : "Careen",
  "category" : "Pedals",
  "type" : "Components",
  "price" : 112.99,

  "features" : [
	"Low-profile design",
	"Floating SH11 cleats included"
  ],

  "specifications" : {
	"weight_per_pair" : "260g",
	"color" : "black"
  }
}
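
Index-based access and mixed element types can be sketched with Python lists standing in for OJAI arrays. The `mixed` array is a hypothetical example and is not part of the sample document.

```python
# The features array from the sample document: scalar values only.
features = ["Low-profile design", "Floating SH11 cleats included"]
first_feature = features[0]      # elements are addressed by index, starting at 0
feature_count = len(features)

# Hypothetical array mixing a scalar, a nested document, and a nested array.
mixed = [112.99, {"color": "black"}, ["S", "M", "L"]]
nested_color = mixed[1]["color"]  # index first, then field name
second_size = mixed[2][1]         # index into the inner array

print(first_feature, feature_count, nested_color, second_size)
```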

Schema Flexibility

The structure of each document, called the document's schema, is easy to change. Simply add new fields. For example, if the online retailer wanted to allow customers to review products, it would be simple to add the reviews to any document for a product.

In this example, the reviews are added as a comments field that holds an array of documents:

{
  "_id" : "2DT3201",
  "product_ID" : "2DT3201",
  "name" : " Allegro SPD-SL 6800",
  "brand" : "Careen",
  "category" : "Pedals",
  "type" : "Components",
  "price" : 112.99,

  "features" : [
	"Low-profile design",
	"Floating SH11 cleats included"
  ],

  "specifications" : {
	"weight_per_pair" : "260g",
	"color" : "black"
  },
  
  "comments" : [
    {
      "username" : "hlmencken",
      "comment" : "Best money I ever spent!"
    },
    {
      "username" : "vwoolf",
      "comment" : "What hlmencken said!"
    }    
  ]
}
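
The same change can be sketched in code. The example below uses a plain Python dictionary as a stand-in for an OJAI document and adds the comments field to an abbreviated product. It does not use the OJAI API, and `other_product` is a hypothetical second document included only to show that documents need not share a structure.

```python
import json

# Abbreviated product document, as it looked before reviews existed.
product = {
    "_id": "2DT3201",
    "brand": "Careen",
    "price": 112.99,
}

# No migration step: the new field is simply added to this one document.
product["comments"] = [
    {"username": "hlmencken", "comment": "Best money I ever spent!"},
]
product["comments"].append(
    {"username": "vwoolf", "comment": "What hlmencken said!"}
)

# A different product can keep the old shape.
other_product = {"_id": "hypothetical-id", "brand": "Careen"}

print(json.dumps(product, indent=2))
```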