Skip to content

migae/datastore

Repository files navigation

migae - clojurizing the appengine datastore

Very much a work in progress. Works well enough to have some fun playing around. Not packaged on clojars, to experiment, clone the repo and run the tests. Some recent documentation is at Entities, Keychains and Datastore. Lots of simple tests that demonstrate the semantics.

At this point, until I can get more complete documentation organized, you’ll probably want to at least review Google’s official Datastore documentation.

In no particular order:

getting started

NOTE: you do not need an App Engine account to explore this library in a development environment. The boot build system will download and install everything you need.

  1. Fork/clone

  2. $ boot repl

WARNING Tested only on Mac OS X 10.9.5

A demo servlet is in the works; keep an eye on migae and dev-trainer. For now, you can explore the lib by repl or by running unit tests.

CAVEAT Unstable. If the doco here is off, take a look at the unit tests to see what runs.

Fork/clone the library. The project is configured to load dev/user.clj on repl startup. That file sets up the GAE datastore local test environment and defines a few vars to save typing during experimentation. So from the root directory, fire up a repl and start experimenting:

Note
use boot -C if your terminal colors are hard to read
$ boot testing repl
;; the testing task puts test/clj on the class path, which exposes the GAE dev services
nREPL server started on port 57767 on host 127.0.0.1 - nrepl://127.0.0.1:57767
...etc...
boot.user=> (require '[migae.datastore :as em] :reload-all)
boot.user=> (require '[test.rig :as ds] :reload)
boot.user=> (ds/setup)   ;; initialize GAE datastore service emulation
;; if try to use the ds without running ds/setup, you'll get:
;; java.lang.NullPointerException: No API environment is registered for this thread.
boot.user=> (def em1 (em/entity-map! [:A/B] {:a 1}))
boot.user=> em1
{:a 1}
boot.user=> (em/entity-map? em1)
true
boot.user=> (type em1)
migae.datastore.PersistentEntityMap
boot.user=> (:migae/keychain em1)
[:A/B]
boot.user=> (:a em1)
1
boot.user=> (meta em1)
#:migae{:keychain [:A/B]}
boot.user=> (.content em1)
#object[com.google.appengine.api.datastore.Entity 0x5bb99523 "<Entity [A(\"B\")]:\n\ta = 1\n>\n"]
;; use println to get a formatted representation
boot.user=> (em/keychain? (em/keychain em1))
true

;; pull constructor fetches Entity from datastore
boot.user=> (em/entity-map* [:A/B])
{:a 1}

;; refresh test datastore
boot.user=> (ds/teardown)
boot.user=> (ds/setup)

;; run some tests from the repl
boot.user=> (require 'test.ctor_push)

...etc...

user=> (key em)
[:A/B]
user=> (val em)
{:a 1}

== [[types]]types

* `migae.datastore.PersistentEntityMap` - migae representation of underlying `com.google.appengine.api.datastore.Entity`.  Implements IPersistentCollection, IPersistentMap, IMapEntry, etc.  See link:doc/Entities.adoc[Entities].
* `migae.datastore.Keychain`  - migae representation of underlying `com.google.appengine.api.datastore.Key`.  A vector of Clojure keywords.  See link:doc/Keychains.adoc[Keychains].  (**Not yet implemented**)

These are represented in migae as `entity-map` and `keychain`, respectively.

== [[ctors]]constructors and co-constructors

A _constructor_ of type T starts from some (possibly non-T) values and
builds a value of type T; dually, a _co-constructor_ of type T starts
from a T value and "co-builds" something else (which may be another T
value).  In producing its result, a co-constructor depends on the work
of a constructor, which provides it with the "raw materials" from
which its result must be (co-) constructed.  What I am calling a
co-constructor is often thought of as "undoing" or "eliminating" the
work of a constructor, or "deconstructing" a constructed value; for
this reason in logic and Type Theory it is often called an
_eliminator_, or sometimes a _projection function_; in Object-Oriented
parlance, co-constructor corresponds to "gettter".

"Co-constructor" and "eliminator" as terms just represent different
perspectives on the same thing.  In fact it will often be most
intuitive to use "eliminator": one constructs a value of type T, and
then eliminates it.

For example, `ds/entity-map` is a constructor for the type
`PersistentEntityMap`; `ds/keychain` is an eliminator/co-constructor.

Native Java class definitions do not always support ctors and
co-ctors; sometimes so-called "Factory" classes are used to
instantiate objects.  For example, class `Key`
(`com.google.appengine.api.datastore.Key`) has no constructors; to
create a key, one uses the `createKey` method of class `KeyFactory`.
The migae datastore library hides this complexity by providing
corresponding constructors and co-constructors:

[source,clojure]

(= (ds/entity-key [:A/B]) (KeyFactory/createKey "A" "B")) (= (.getKind (KeyFactory/createKey "A" "B")) "A") ;; native kinds are Strings (= (ds/kind (ds/entity-key [:A/B]) :A)) ;; migae kinds are keywords (let [k (KeyFactory/createKey "A" "B")] (= (ds/kind k) (keyword (.getKind k))))

The migae operators also work on entity-maps:
[source,clojure]
(= (ds/kind (ds/entity-key [:A/B]) :A))

migae kinds are keywords (let [k (KeyFactory/createKey "A" "B")] (= (ds/kind k) (keyword (.getKind k))))

See below, <<keys,Keys, keychains, keylinks, dogtags>> for more
information on the Key API.

=== PersistentEntityMap ctors

We have three ways to construct PersistentEntityMap objects:

* local constructor:  `(entity-map <keychain> <map>)`
* push constructor:   `(entity-map! <keychain> <map>)` - construct locally and push to datastore
* pull constructor:   `(entity-map* <keychain> <map>)` - pull matching entities from datastore and construct corresponding PersistentEntityMap objects locally

Push: by default the push ctor `entity-map!` first checks to see if an entity with that key already exists, and throws an exception if so; otherwise it constructs the PersistentEntityMap and saves the underlying Entity to the datastore.  This default behavior can be overriden by using the `:force` key, which will make the ctor save the construction absolutely, thus overwriting anything that might have previously been stored with that key.

[[app-listing]]
[source,clojure]
(entity-map! [:A/B] {:a 1 :b 2})

std local ctor expression

(entity-map! [:A/B C/D] {:a 1 :b 2})

ditto

(entity-map! [:A] {:a 1 :b 2})

kinded ctor (see below)

(entity-map! [:A/B :C] {:a 1 :b 2})

kinded ctor (see below)

(entity-map! :force [:A/B] {:a 9 :b 10})

force replacement

==== [[reader]] reader syntax

tagged literals?  I can never manage to get them to work, but maybe:

[[app-listing]]
[source,clojure]

#entity-map [[:A/B] {:a 1}] #emap [[:A/B] {:a 1}]

=== co-constructors (aka eliminators)

Constructors and co-constructors must "harmonize":

[[app-listing]]
[source,clojure]

(is (= (ds/keychain (ds/entity-map k m)) k)) (is (= (ds/entity-map (ds/keychain (ds/entity-map k m)) m)))

== [[axioms]] structural axioms

[[app-listing]]
[source,clojure]

(def em (entity-map [:A/B C/D] {:a 1 :b 2})) (coll? em) (map? em) (entity-map? em) (= (key em) [:A/B :C/D]) (= (val em) {:a 1 :b 2}) (= (keys em) [:a :b]) (= (vals em) [1 2]) (= (:a (into em {:a 3}) (:a em)))

etc.  More to come.


== [[keys]] keys, keychains, keylinks, and dogtags

A keylink is a namespaced keyword, e.g. `:Foo/Bar`.  A vector of
keylinks is a keychain, which corresponds to a datastore Key, which
has a Kind of type String, an Identifier (either a String name or a
long Id), and (optionally) a parent Key.  See
link:doc/Keychains.adoc[Keychains] for more detail.

**_Caveat_**: note the difference between a datastore Key and a
  clojure key.  The former is a type (class), the latter is a
  structural role (first element of a mapentry).

A proper keychain is a vector of namespaced keywords.  To use numeric
Ids, include a notational prefix, 'd' for decimal and 'x' for
hexadecimal.  E.g. `[:Foo/d11]` or `[:Foo/x0B]`.

The last link in the chain is the _dogtag_, so named because it serves
as a (quasi-) identifier for its entity-map.  A dogtag is just a
Clojure keyword with namespace (e.g. :A/B); it corresponds to the
datastore Key of the underlying datastore Entity.  The Key of an
Entity does identify it, because it contains a link to its parent
key; but a dogtag does not completely identify its entity-map, since
it contains no link to its predecessor.  In migae, the "key" of an
entity-map is the entire keychain.  However, the kind and identifier
(name or id) of the dogtag do characterize the entity-map.

Note that a dogtag predicate `(dogtag? x)` doesn't make sense - it's
not a type.  What makes a keyword a dogtag is its position in a
keychain.

[source,clojure]

user⇒ (ds/to-ekey :A/b) ; migae keylink to datastore entity Key (ekey) #object[com.google.appengine.api.datastore.Key 0x6c4f881d "A(\"b\")"] (ds/ekey? (ds/to-ekey :A/B)) (= (ds/dogtag [:A/B]) (ds/dogtag [:X/Y :A/B])) ;; dogtag is last link in chain :A/B (= (ds/keychain (ds/to-ekey :A/B)) [:A/B]) (= (ds/kind [:A/B]) (ds/kind [:X/Y :A/B])) (= (ds/name e1) (ds/name e2) (ds/name e3))

== [[kinds]] kinds

In the datastore, every Entity has a "Kind", which is a string.  A
Kind is effectively a tag that you attach to an Entity in order to
categorize it; a Kind is not a class.  Two objects of the same Kind
may have absolutely nothing in common except for their Kind.

The datastore supports what I'm calling "kinded construction": you
specify a Kind in your constructor, and the datastore autogens an Id.
You can also retrieve entities by Kind; querying for Kind "Foo" will
return all Entities of Kind "Foo".  You can narrow this by specifying
an "ancestor key", so only kinded Entities having that key as parent
will be fetched.

The migae api makes both of these operations simple and transparent.
To do a kinded construction, just use an improper keychain with the
push constructor, like so: `(entity-map! [:A] {:a 1})`; to fetch
Entities by kind, do the same with the pull constructor: `(entity-map*
[:A])`.  Kinded construction is not supported for the local
constructor (`entity-map`); the datastore can only generate Ids for
stored entities.

[source,clojure]

(= (kind (entity-map [:Foo/Bar] {:a 1})) :Foo) (= (kind (entity-map [:Foo/Bar :X/d3] {:a 1})) :X)

== [[autogen]] autogenned ids

Use a partial ("improper") keychain to have the datastore autogen Id
values.  All but the last links in the vector must be namespaced;
e.g. `[:A/B :C/D :E]`.  Only valid for push ctor, since the datastore
can only autogen an Id on saved entities.

[source,clojure]
(def em1 (entity-map! [:Foo] m))

kind="Foo", id is autogenned.

(def em2 (entity-map! [:Foo] m))

em1 and em2 have different key ids

(def em2 (entity-map! [:A/B :C/D :Foo] m))

long keychains ok too

.TODO: documentation
****
*Key interface:*

* ctors:
** ds/entity-key
** ds/keychain
* co-ctors
** ds/kind
** ds/identifier
** ds/name
** ds/id
****

== [[fields]] field types

The value part of an entity-map is just a map.  The datastore
restricts the permissible value types; see  link:https://cloud.google.com/appengine/docs/java/datastore/entities#Java_Properties_and_value_types[Properties and value types].

[source,clojure]
(entity-map [:Foo/Bar] {:a 1})

java.lang.Long

(entity-map [:Foo/Bar] {:a 1.0})

java.lang.Double

(entity-map [:Foo/Bar] {:a true})

java.lang.Boolean

(entity-map [:Foo/Bar] {:a "baz"})

java.lang.String

(entity-map [:Foo/Bar] {:a :b})

keywords (stored as String)

(entity-map [:Foo/Bar] {:a 'b})

symbols (stored as String)

(entity-map [:Foo/Bar] {:a [1 2 3]})

vectors

(entity-map [:Foo/Bar] {:a '(1 2)})

lists

(entity-map [:Foo/Bar] {:a {:b :c}})

maps

(entity-map [:Foo/Bar] {:a #{1 'b "c"}})

sets

(entity-map [:Foo/Bar] {:a {:b [1 {:c true}]})

mixed, nested (entity-map [:Foo] {:a {:b :c} :b [1 2] :c '(foo bar) :d #{1 'x :y "z"}})

Datastore field types:
[source,clojure]
(entity-map [:Foo/bar] {:int 1

BigInt and BigDecimal not supported :float 1.1 :bool true :string "I’m a string" :today (java.util.Date.) :email (Email. "foo@example.org")

:dskey [:A/B :C/D]

foreign key :link (Link. "http://example.org") ;; TODO: EmbeddedEntity (not same as map value) ;; TODO: Blob, ShortBlob, Text, GeoPt, PostalAddress, PhoneNumber, etc. })

TODO: support all datastore property types.

== [[metadata]] metadata

Meatadata works:

[source,clojure]

(meta (with-meta (ds/entity-map [:A/B] {:a 1}) {:foo "metadata here"})) ⇒ {:foo "metadata here"}

"Literal syntax" doesn't work: `(meta '^{:metastuff "o boy!"} (entity-map [:A/B] {:a 2}))`. Nope.


== [[retrieval]] retrieval

We treat the datastore as just another map: `(get datastore k)`
retrieves the entity-map whose keychain is `k`.  Since there is only
one datastore, we sugar this to `(get-ds k)`.

== [[co-ctors]] experimental:  co-construction

[source,clojure]
(entity-map* [:A/B])

"co-constructs" (retrieves) entity with key [:A/B] if it exists, otherwise throws exception

== [[queries]] queries

**NB**: these query patterns are experimental and very likely to change.

Query syntax looks like constructor syntax; the difference is we treat
the map part as a pattern.

The pull constructor:

[[app-listing]]
[source,clojure]
(entity-map* [])

fetch all entities

(entity-map* [:A/B])

fetch unique entity with key :A/B

(entity-map* [:A])

fetch all entities with kind :A

(entity-map* [:A/B :C])

fetch all entities with kind :C and ancestor :A/B

(entity-map* :pfx [:A/B :C/D])

fetch all entities with keychain prefix (i.e. ancestor) [:A/B :C/D]

== [[mutation]] mutation

[[app-listing]]
[source,clojure]
(entity-map! [:A/B] {:a 1})

push to datastore; throw exception if already exists

(entity-map! :force [:A/B] {:a 1})

same, but ignore existing and overwrite

(into-ds (entity-map [:A/B] {:a 1}))

non-destructive: fail if already exists

(into-ds! (entity-map [:A/B] {:a 1}))

destructive: replace existing

TODO.  A hybrid push/pull ctor: pull entity if it exists, otherwise
construct and push it.  Not sure what keyword is most appropriate.
For now, ":maybe": maybe push, otherwise pull.  Since this combines
push and pull, maybe we should combine the decorations:
`entity-map*!`.  Or maybe not.  Usage expectation: probably would be
when constructing and pushing entities, rather than searching and
then, on search failure, deciding to construct and push.

[[app-listing]]
[source,clojure]
(entity-map! :maybe [:A/B] {:a 1})

if [:A/B] in ds, return it; otherwise construct and push

Patterns:

* augmentation: add a field, or add a value to a field
* replacement:  replace value of a field, replace entire entity
* removal:  delete a field or entity

Note that datastore fields may be singletons or collections.  So for
example you can start by storing an int, and then you can add another
value to the field, effectively converting it from type int to type
collection.  So there are three kinds of change that can apply to a
field: change the value, or augment it by adding another value, or
remove it.

This clashes a little bit with Clojure abstractions.  For example,
`into` replaces stuff.  That's fine, but we also need a way to
augment, so we'll have to spell that out - call it `onto`?.

[source,clojure]

(let [e (entity-map [:A/B]) e2 (into e {:foo "bar"})] ;; std clojure.core/into: replace val at :foo, or add if not present (into-ds! e2)) ;; replace e

TODO: augmentation.  Maybe something like:

[source,clojure]

(let [e1 (entity-map [:A/B] {:foo "bar"}) (e2 (ds/entity-map! :augment e1 {:foo 27}))] ;; turn {:foo "bar"} into {:foo ["bar" 27]} ⇒ e2 == (entity-map [:A/B] {:foo ["bar" 27]}) …​)

Of course, there are various ways to accomplish this sort of thing
using standard Clojure facilities, so we don't really need to define
an "augmentation" op.


== [[testing]] testing

See the code in link:/test/clojure/tests[tests] for lots of examples of how to use
the library.  Be **be forewarned**, the test cases are in flux; some
of them are outdated, and the api is changing.

The testing env:

==== unit testing:

Run the `testing` task before the `test` task; it puts test/clj on the
class path, which exposes the GAE dev services. Use -n to specify the
namespace to test (omit to run all tests):

[[app-listing]]
[source,clojure]

$ boot testing test -n test.local-ctor

==== testing at the repl
dev/user.clj:
(require '[clojure.test :refer [run-tests test-var]])

:all (require '[clojure.tools.namespace.repl :refer [refresh refresh-all]])

[source,clojure]
$ lein repl

runs dev/user.clj

(refresh-all)

reload everything

(clojure.test/run-tests 'test.ctor-local)

run all tests in test/ctor_local.clj

;

edit test/ctor_local.clj …​

(refresh)

reload changed test/ctor_local.clj only

(ds-reset)

reinitialize appengine datastore local test env

(clojure.test/test-var #'test.ctor-local/hashmap-axioms)

run one test

;

edit migae datastore lib source

(refresh-all)

reloads everything

(ds-reset)

local datastore test env times out relatively rapidly, so reinitialize it ;; or: (do (refresh-all) (ds-reset)) (clojure.test/test-var #'test.ctor-local/hashmap-axioms)

(clojure.test/run-tests 'test.tutorial)

run everything in test.tutorial

You can also use leiningen to run tests, but the repl is faster by
orders of magnitude.

= [[comparisons]] comparisons

* link:http://redis.io/topics/data-types-intro[redis]
* link:http://clojuremongodb.info/[monger] clojure client for link:https://www.mongodb.org/[mongodb]
* link:http://www.datomic.com/[datomic]

test