Document the Metakit extension

1 parent 4ece406 commit c76804182231ccfe08e3b7b9435e1b498bf83f59 @alex-shpilkin alex-shpilkin committed with Aug 30, 2011
Showing with 316 additions and 0 deletions.
  1. +316 −0 README.metakit
@@ -0,0 +1,316 @@
+title: Metakit
+Metakit Extension for Jim Tcl
+The mk extension provides an interface to the Metakit small-footprint
+embeddable database library (<>). The underlying
+library is efficient at manipulating not-so-large amounts of data and takes a
+different approach to composing database operations than common SQL-based
+relational databases.
+Both the Metakit core library and the mk package can be linked either
+statically or dynamically and loaded using
+ package require mk
+A database (called a "storage" in Metakit terms) may either reside totally in
+memory or be backed by a file. To open or create a database, call the
+`storage` command with an optional filename parameter:
+ set db [storage]
+The returned handle can be used as a command name to access the database. When
+you are done, execute the `close` method, that is, run
+ $db close
+A lost handle won't be found by GC but will be closed when the interpreter
+exits. Note that by default Metakit will only record changes to the database
+when you close the handle. Use the `commit` method to record the current
+state of the database to disk.
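+
+A minimal sketch of the handle lifecycle described above. The file name
+`example.mk` is made up for illustration:
+
+```tcl
+package require mk
+
+# Open (or create) a file-backed storage; with no argument the storage
+# would live purely in memory.
+set db [storage example.mk]
+
+# ... create views and modify rows here ...
+
+# Record the current state to disk without closing the handle.
+$db commit
+
+# Release the handle explicitly; a lost handle is only closed when the
+# interpreter exits.
+$db close
+```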
+*Views* in Metakit are what are called "tables" in conventional databases. A view
+may have several typed *properties*, or columns, and contains homogeneous *rows*, or
+records. New properties may be added to a view as needed; however, new properties
+are not stored in the database file by default. The `structure` method specifies
+the stored properties of a view, creating a new view or restructuring an old one
+as needed:
+ $db structure viewName description
+The view description must be a list of the form `{propName type propName type ...}`.
+The supported property types include:
+`string`
+: A NUL-terminated string, stored as an array of bytes (without any encoding
+ assumptions).
+`binary`
+: **Not yet supported by the `mk` extension.**
+ Blob of binary data that may contain embedded NULs (zero bytes). Stored
+ as-is. This is more efficient than `string` when storing large blocks of
+ data (e.g. images) and will adjust the storage strategy as needed.
+`integer`
+: A signed integer value occupying a maximum of 32 bits. If all values
+ stored in a column can fit in a smaller range (16, 8, or even 4 or 2 bits),
+ they are packed automatically.
+: Like `integer`, but is required to fit into 64 bits.
+`float` and `double`
+: 32-bit and 64-bit IEEE floating-point values respectively.
+`subview`
+: This type is not usually specified directly; instead, a structure
+ description of a nested view is given. `subview` properties store complete
+ views as their value, creating hierarchical data structures. When retrieved
+ from a view, a value of a subview property is a normal view handle.
+Without a `description` parameter, the `structure` method returns the current
+structure of the named view; without any parameters, it returns a dictionary
+containing structure descriptions of all views stored in the database.
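+
+As a rough illustration of the three forms of `structure`, assuming an open
+storage handle in `$db`. The view name `people` and its properties are
+hypothetical:
+
+```tcl
+# Create the view (or restructure an existing one) with two stored properties.
+$db structure people {name string age integer}
+
+# Query the current structure of a single view.
+set desc [$db structure people]
+
+# Query the structures of all stored views as a dictionary.
+set all [$db structure]
+```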
+After specifying the properties you expect to see in the view, call
+ [$db view $viewName] as viewHandle
+to obtain a view handle. These handles are also commands, but are
+garbage-collected and also destroy themselves after a single method call; the
+`as viewHandle` call assigns the view handle to the specified variable and also
+tells the view not to destroy itself until all the references to it are gone.
+View handles may also be made permanent by giving them a global command name,
+ rename [$db view data] newCommandName
+However, such view handles are not managed automatically at all and must be
+destroyed using the `destroy` method, or by renaming them to `""`.
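+
+The three lifetimes described above might look as follows. This is a sketch
+assuming a stored view called `people` (a hypothetical name); the command name
+`people.view` is likewise only an example:
+
+```tcl
+# One-shot handle: the view destroys itself after the single `size` call.
+set n [[$db view people] size]
+
+# Pinned handle: stays alive while the variable `people` refers to it.
+[$db view people] as people
+puts [$people size]
+puts [$people properties]
+
+# Permanent handle under a global command name; must be destroyed by hand.
+rename [$db view people] people.view
+puts [people.view size]
+people.view destroy
+```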
+The value of a particular property is obtained using
+ cursor get $cur propName
+where `$cur` is a string of the form `viewHandle!index`. Row indices are zero-based
+and may also be specified relative to the last row of the view using the
+`end[+-]integer` notation.
+A dictionary containing all property name and value pairs can be retrieved by
+omitting the `propName` argument:
+ cursor get $cur
+Setting property values is also performed either individually, using
+ cursor set $cur propName value ?propName value ...?
+or via a dictionary with
+ cursor set $cur dictValue
+In the first form of the command, property names may also be preceded by a
+-_typeName_ option. In this case, a new property of the specified type will be
+created if it doesn't already exist; note that this will cause *all* the rows
+in the view to have the property (but see **A NOTE ON NULL** below).
+If the row index points after the end of the view, an appropriate number of
+fresh rows will be inserted first. So, for example, you can use `end+1`
+to append a new row. (Note that you then have to set it all at once, though.)
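+
+A short sketch of reading and writing rows through cursors, assuming a pinned
+view handle in `$people` with `name` and `age` properties. All names and
+values here are illustrative:
+
+```tcl
+# Append a row. Everything is set in one call, because a second `end+1`
+# would address yet another new row.
+cursor set $people!end+1 name Alice age 30
+
+# The same, passing the values as a dictionary.
+cursor set $people!end+1 {name Bob age 25}
+
+# Read one property of the first row, then the whole last row as a dictionary.
+set firstName [cursor get $people!0 name]
+set lastRow [cursor get $people!end]
+
+# Create a string property on the fly; every row in the view gains it.
+cursor set $people!0 -string email alice@example.org
+```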
+The total number of rows can be obtained using
+ $viewHandle size
+and set manually with
+ $viewHandle resize newSize
+For example, you can use `$viewHandle resize 0` to clear a view.
+New rows may also be inserted at an arbitrary position in a view with
+ cursor insert $cur ?count?
+This will insert _count_ fresh rows into the view so that _$cur_ points to
+the first one. The inverse of this operation is
+ cursor remove $cur ?count?
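+
+Putting the size and insertion commands together, again assuming a pinned
+view handle in the hypothetical variable `$people`:
+
+```tcl
+set before [$people size]
+
+# Insert two fresh rows so that they become rows 0 and 1.
+cursor insert $people!0 2
+
+# Remove the same two rows again.
+cursor remove $people!0 2
+
+# Drop all rows, leaving the structure in place.
+$people resize 0
+```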
+The real power of Metakit lies in the way existing views are combined to create
+new ones to obtain a particular perspective on the stored data. A single
+operation takes one or more views and possibly additional options and produces a
+new view, usually tracking notifications to the underlying views and sometimes
+even supporting modification.
+Binary operations are left-biased when there are conflicting property values;
+that is, they always prefer the values from the left view.
+### Unary operations ###
+*view* `unique`
+: Derived view with duplicate rows removed.
+*view* `sort` *crit ?crit ...?*
+: Derived view sorted on the specified criteria, in order. A single _crit_
+ is either a property name or a property name preceded by a dash; the latter
+ specifies that the sorting is to be performed in reverse order.
+### Binary operations ###
+The operations taking _set_ arguments require that the given views have no
+duplicate rows. The `unique` method can be used to ensure this.
+*view1* `concat` *view2*
+: Vertical concatenation; that is, all the rows of _view1_ and then all rows
+ of _view2_.
+*view1* `pair` *view2*
+: Pairing, or horizontal concatenation: every row in _view1_ is matched with
+ a row with the same index in _view2_; the result has all the properties of
+ _view1_ and all the properties of _view2_.
+*view1* `product` *view2*
+: Cartesian product: each row in _view1_ horizontally concatenated with every
+ row in _view2_.
+*set1* `union` *set2*
+: Set union. Unlike `concat`, this operation removes duplicates from the
+ result. A row is in the result if it is in _set1_ **or** in _set2_.
+*set1* `intersect` *set2*
+: Set intersection. A row is in the result if it is in _set1_ **and** in
+ _set2_.
+*set1* `different` *set2*
+: Symmetric difference. A row is in the result if it is in _set1_ **xor** in
+ _set2_, that is, in _set1_ or in _set2_, but not in both.
+*set1* `minus` *set2*
+: Set minus. A row is in the result if it is in _set1_ **and not** in _set2_.
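+
+A sketch of the unary and binary operations, assuming two pinned views
+`$people` and `$staff` with the same structure (both names are hypothetical).
+Each result is pinned with `as` so that it survives more than one call:
+
+```tcl
+# Sort by descending age, then by name.
+[$people sort -age name] as byAge
+
+# All rows of both views, duplicates included.
+[$people concat $staff] as everything
+
+# The set operations require duplicate-free inputs, so apply `unique` first.
+[$people unique] as a
+[$staff unique] as b
+[$a union $b] as everyone
+[$a minus $b] as nonStaff
+```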
+### Relational operations ###
+*view1* `join` *view2* ?`-outer`? *prop ?prop ...?*
+: Relational join on the specified properties: the rows from _view1_ and
+ _view2_ with all the specified properties equal are concatenated to form a
+ new row. If the `-outer` option is specified, the rows from _view1_ that do
+ not have a corresponding one in _view2_ are also left in the view, with the
+ properties existing only in _view2_ filled with default values.
+*view* `group` *subviewName prop ?prop ...?*
+: Groups the rows with all the specified properties equal; moves all the
+ remaining properties into a newly created subview property called
+ _subviewName_.
+*view* `flatten` *subviewProp*
+: The inverse of `group`.
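+
+For illustration, assume two pinned views: `$orders` with properties `custId`
+and `item`, and `$customers` with properties `custId` and `name`. These views
+and properties are invented for the example:
+
+```tcl
+# Rows of orders combined with the matching customer rows.
+[$orders join $customers custId] as matched
+
+# Also keep orders with no matching customer; `name` is then a default value.
+[$orders join $customers -outer custId] as withOrphans
+
+# One row per customer, with the remaining properties moved into the
+# subview property `items`.
+[$orders group items custId] as byCustomer
+
+# Undo the grouping.
+[$byCustomer flatten items] as flat
+```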
+### Projections and selections ###
+*view* `project` *prop ?prop ...?*
+: Projection: a derived view with only the specified properties left.
+*view* `without` *prop ?prop ...?*
+: The opposite of `project`: a derived view with the specified properties
+ removed.
+*view* `range` *start end ?step?*
+: A slice or a segment of _view_: rows at _start_, _start+step_, and so on,
+ until the row number becomes larger than _end_. The usual `end[+-]integer`
+ notation is supported, but the indices don't change if the underlying view
+ is resized.
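+
+A brief sketch of the operations in this section, assuming a pinned view
+`$people` with hypothetical properties `name`, `age` and `email`:
+
+```tcl
+# Keep only the name column.
+[$people project name] as names
+
+# Keep everything except the email column.
+[$people without email] as noEmail
+
+# Rows 0 through 9.
+[$people range 0 9] as firstTen
+
+# Every second row, from the first row up to the current last one.
+[$people range 0 end 2] as everyOther
+```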
+**(!) select etc. should go here**
+### Search and storage optimization ###
+*view* `blocked`
+: Invokes an optimization designed for storing large amounts of data. _view_
+ must have a single subview property called `_B` with the desired structure
+ inside. This additional level of indirection is used by `blocked` to create
+ a view that looks like a usual one, but can store much more data
+ efficiently. As a result, indexing into the view becomes a bit slower. Once
+ this method is invoked, all access to _view_ must go through the returned
+ view.
+*view* `ordered` *prop ?prop ...?*
+: Does not transform the structure of the view in any way, but signals that
+ the view should be considered ordered on a unique key consisting of the
+ specified properties, enabling some optimizations. Note that duplicate keys
+ are not allowed in an ordered view.
+**(!) TODO: hash, indexed(?) -- these make no sense until searches are implemented**
+### Pipelines ###
+Because constructs like `[[view op1 ...] op2 ...] op3 ...` tend to be common in
+programs using Metakit, a shorthand syntax is introduced: such expressions may
+also be written as `view op1 ... | op2 ... | op3 ...`.
+Note though that this syntax is not in any way magically wired into the
+interpreter: it is understood only by the view handles and the two commands that
+can possibly return a view: `$db view` and `cursor get`. If you want to support
+this syntax in Tcl procedures, you'll need to do this yourself, or you may want
+to create a custom view method and have the view handle work out the syntax for
+you (see **USER-DEFINED METHODS** below).
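+
+The two spellings below are meant to be equivalent. This is a sketch assuming
+a stored view `people` with `name` and `age` properties (hypothetical names):
+
+```tcl
+# Nested form.
+[[[$db view people] sort -age] project name] as names1
+
+# Pipeline form, understood by `$db view` and by the view handles.
+[$db view people | sort -age | project name] as names2
+```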
+*view* `copy`
+: Creates a copy of _view_ with the same data.
+*view* `clone`
+: Creates a view with the same structure, but no data.
+*view* `pin`
+: Specifies that the view should not be destroyed after a single method call.
+ Returns _view_.
+*view* `as` *varName*
+: In addition to the actions performed by `pin`, assigns the view handle to
+ the variable named _varName_ in the caller's scope.
+*view* `properties`
+: Returns the names of all properties in the view.
+*view* `type` *prop*
+: Returns the type of the specified property.
+Note that Metakit does not have a special `NULL` value like conventional
+relational databases do. Instead, it defines _default_ property values: `""` for
+`string` and `binary` types, `0` for all numeric types and a view with no rows
+for subviews. These defaults are used when a fresh row is inserted and when
+a new property is added to the view to fill in the missing values.
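+
+The defaults can be observed by creating a fresh row and reading it back. A
+sketch, assuming an open storage in `$db`; the view `samples` and its
+properties are made up for the example:
+
+```tcl
+$db structure samples {label string count integer}
+[$db view samples] as samples
+
+# Growing the view inserts one fresh row filled with default values.
+$samples resize 1
+
+# Going by the defaults described above, `label` should be "" and
+# `count` should be 0 in the returned dictionary.
+set row [cursor get $samples!0]
+```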
+The storage and view handles support custom methods defined in Tcl: to define
+_methodName_ on every storage or view handle, create a procedure called
+{`` *methodName*} or {`mk.view` *methodName*} respectively. These
+procedures will receive the handle as the first argument and all the remaining
+arguments. Remember to `pin` the view handle in view methods if you call more
+than one method of it!
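+
+A custom view method might be sketched as follows. The method name
+`firstrow` is hypothetical and not part of the extension:
+
+```tcl
+# Returns the first row as a dictionary, or an empty dictionary if the
+# view has no rows.
+proc {mk.view firstrow} {view} {
+    # The handle is used more than once below, so pin it first.
+    $view pin
+    if {[$view size] == 0} {
+        return {}
+    }
+    return [cursor get $view!0]
+}
+
+# The new method is then available on every view handle.
+set row [[$db view people] firstrow]
+```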
+Custom `cursor` subcommands may also be defined by creating a procedure called
+{`cursor` *methodName*}. These receive all the arguments without any
