Inserting Documents

isubiker edited this page Jan 11, 2012 · 44 revisions

Documents are inserted using the /store endpoint. If a document already exists at the given URI, it is overwritten and loses all of its previous properties, permissions, collections and quality.

To insert a document, send a PUT request to the /store endpoint. The request must include a uri parameter. As the name implies, URIs must be valid Uniform Resource Identifiers. That said, URIs that mimic a directory structure are usually a good idea and can be leveraged at query time. The body of the request should be the XML, JSON, text or binary document itself.
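As a minimal sketch of the request described above, the following Python builds (but does not send) a PUT request to /store using only the standard library. The base URL `http://localhost:8123` and the helper name `build_store_request` are assumptions for illustration, not part of Corona.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed location of a Corona instance; replace with your own host and port.
BASE_URL = "http://localhost:8123"


def build_store_request(uri, body, base_url=BASE_URL):
    """Build (but do not send) a PUT request that inserts `body` at `uri`.

    `body` must be bytes: the XML, JSON, text or binary document itself.
    """
    # urlencode percent-encodes the document URI so it is safe in a query string.
    query = urlencode({"uri": uri})
    return Request(f"{base_url}/store?{query}", data=body, method="PUT")


request = build_store_request("/foo/bar.json", b'{"title": "Hello"}')
print(request.get_method(), request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send it. Note that the slashes in the document URI are percent-encoded here, which is equivalent to the unencoded form once the server decodes the query string.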

Request details

  • Endpoint: /store - For inserting XML, JSON, text and binary documents
  • Request type: PUT
  • Request body should be the XML, JSON, text or binary document
  • Parameters:
    • uri - A valid Uniform Resource Identifier for the document
    • permission (optional, multiple) - Must follow a <role>:<capability> pattern where capability is one of read, update or execute.
    • property (optional, multiple) - Must follow a <key>:<value> pattern where the key is alphanumeric, starts with a letter and can contain underscores and dashes
    • collection (optional, multiple) - A collection can be any string; collections can be thought of as tags
    • quality (optional) - An integer indicating this document's inherent relevance in search (see Glossary)
    • txid (optional) - A pre-existing transaction ID in which the insert should be performed. See creating transactions for more info. Note: If running in a clustered environment, document insert requests must be issued to the same host that created the ticket.
    • contentType (optional) - If no contentType is specified, the extension in the URI is used to sniff the content type. The extension-to-type mapping is available to and controlled by MarkLogic administrators. The contentType parameter can be provided explicitly to override the detected type. Valid values are: xml, json, text or binary.
    • contentForBinary (optional) - When inserting a binary document, this parameter can be used to augment that document with a searchable representation (in either XML or JSON formats).
    • extractMetadata (binary, optional) - When inserting a binary document, an attempt is made to extract any metadata in the file. Setting this value to false turns this behavior off.
    • extractContent (binary, optional) - When inserting a binary document, an attempt is made to extract any text content in the file. Setting this value to false turns this behavior off.
    • applyTransform (optional) - Applies a transformer to the supplied content before insertion. If inserting a binary document, the transformer will be handed whatever text content could be extracted from the binary.
    • respondWithContent (boolean, default: false) - If set to true, the response body will contain the content that was inserted. Useful in conjunction with applyTransform.
    • repair (boolean, default: false) - If set to true, MarkLogic will attempt to repair any malformed content that has been sent in the request body.
  • Returns
    • On success, a 204 is returned with an empty body if respondWithContent is set to false (the default). If respondWithContent is set to true, a 200 is returned along with the content that was inserted.
    • If inserting a document that can't be parsed as the specified contentType, a 400 is returned.
    • If a permission is specified that doesn't exist, a 400 is returned.
    • If a property name isn't valid, a 400 is returned.
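Because a malformed permission or property yields a 400, it can help to check these values on the client before sending. The sketch below encodes the patterns from the parameter list. It makes several assumptions that this page does not confirm: "alphanumeric" is read as ASCII letters and digits, a permission is split on its last colon, and a property on its first. The helper names are hypothetical, and the check cannot tell whether a role actually exists on the server.

```python
import re

# Capabilities listed for the permission parameter.
CAPABILITIES = {"read", "update", "execute"}

# Assumption: ASCII only. Starts with a letter, then letters, digits, "_" or "-".
PROPERTY_KEY = re.compile(r"[A-Za-z][A-Za-z0-9_-]*")


def is_valid_permission(value):
    """Check the <role>:<capability> shape (split on the last colon)."""
    role, sep, capability = value.rpartition(":")
    return bool(role) and sep == ":" and capability in CAPABILITIES


def is_valid_property(value):
    """Check the <key>:<value> shape (split on the first colon)."""
    key, sep, _ = value.partition(":")
    return sep == ":" and PROPERTY_KEY.fullmatch(key) is not None


for candidate in ["public:read", "admin:write", "read"]:
    print(candidate, is_valid_permission(candidate))
for candidate in ["published:false", "1key:value", "key"]:
    print(candidate, is_valid_property(candidate))
```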

Examples

  • /store?uri=/foo/bar.json - Will insert a JSON document in the database with a URI (see Glossary) of "/foo/bar.json"
  • /store?uri=/foo/bar.xml - Will insert an XML document in the database with a URI of "/foo/bar.xml"
  • /store?uri=/foo/bar.xml&permission=public:read&permission=admin:update
  • /store?uri=/foo/bar.xml&property=key:value&property=published:false
  • /store?uri=/foo/bar.xml&collection=public&collection=published
  • /store?uri=/foo/bar.xml&quality=10
  • /store?uri=/foo/bar.xml&permission=public:read&collection=public&quality=10
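Request paths like the ones above can be assembled programmatically. The following sketch uses a hypothetical helper, `store_path`, that repeats the parameters marked "multiple" once per value. Passing `safe="/:"` to `urlencode` leaves slashes and colons unescaped so the output matches the examples; that is a readability choice, not something Corona is known to require.

```python
from urllib.parse import urlencode


def store_path(uri, permissions=(), properties=(), collections=(), quality=None):
    """Build the path and query string for a /store request."""
    # A list of pairs (rather than a dict) allows a parameter name to repeat.
    params = [("uri", uri)]
    params += [("permission", p) for p in permissions]
    params += [("property", p) for p in properties]
    params += [("collection", c) for c in collections]
    if quality is not None:
        params.append(("quality", str(quality)))
    # "/" and ":" are legal in a query string, so leave them readable.
    return "/store?" + urlencode(params, safe="/:")


path = store_path(
    "/foo/bar.xml",
    permissions=["public:read"],
    collections=["public"],
    quality=10,
)
print(path)
```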

Why Use a URI Parameter?

Why is the document name specified as a uri parameter (/store?uri=/foo/bar.xml) rather than embedded in the request path (/store/foo/bar.xml)? Because this design is much simpler for complex URIs. Consider the legal document URI http://myapp.com/boy//am/I?# ugly.xml. That's trivial to transmit as an encoded query string parameter but inelegant and unreliable (given proxies) to encode directly within the request path. Why not disallow such complex URIs? Because you can load data without using Corona, so documents with such URIs may already exist in the database.
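To illustrate the point, this short sketch encodes the awkward URI from the paragraph above as a query string parameter and confirms that it survives a round trip through standard query string parsing. It uses only Python's standard library and does not contact a server.

```python
from urllib.parse import parse_qs, urlencode

ugly_uri = "http://myapp.com/boy//am/I?# ugly.xml"

# Percent-encode the whole document URI as the value of the uri parameter.
query = urlencode({"uri": ugly_uri})
print("/store?" + query)

# Decoding the query string recovers the original URI exactly.
decoded = parse_qs(query)["uri"][0]
print(decoded == ugly_uri)
```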

Discussion

What's returned if MarkLogic security won't allow the document to be written?

What if the collection isn't legal? And can it really be any string or must it be a URI?

Are unknown parameters ignored or do they generate errors?

To conform more closely to HTTP, a successful creation should return a 201 status code with the proper response headers.