Archie L. Cobbs edited this page Apr 20, 2018 · 20 revisions

Architecture

Is Permazen a database?

Permazen is a persistence layer that sits between your Java application and some other, underlying key/value database. The underlying database is responsible for providing transactions and durably storing information. Permazen provides all of the remaining features you expect from a "database" and more, including indexes, a command line tool, auto-generated Vaadin GUI, etc.

With this design Permazen can make persistence simple, natural, and completely type safe for a Java application, without sacrificing scalability or practical convenience.

Almost every database in existence is, at its heart, just some form of key/value store. Permazen lets the database do what it’s really good at - storing key/value pairs - and takes over from there with the goal of providing an optimal experience for Java programmers.

Having said that, Permazen also provides several key/value store implementations.

What does the overall design of Permazen look like?

Permazen has these layers (from top to bottom):

  • The Java model (or Permazen) layer

  • The core API layer

  • The key/value store API layer

At the bottom layer is a simple byte[] array key/value database. Transactions are supported at this layer and several implementations are included; see io.permazen.kv and sub-packages.

On top of that sits the core API layer, which provides a rigorous database abstraction on top of the key/value store. It supports simple fields of any atomic Java type, as well as list, set, and map complex fields, tightly controlled schema versioning, simple and composite indexes, and lifecycle and change notifications. It is not Java-specific or explicitly object-oriented. The core API is provided via the Database class.

The Java model layer is a Java-centric, type safe, object-oriented persistence layer for Java applications. It sits on top of the core API layer and provides a fully type-safe, Java-centric view of a core API database. All data access is through user-supplied Java model classes. Database types and fields, as well as listener methods, are all inferred from a simple set of Java annotations. This layer also provides automatic incremental JSR 303 validation. The Permazen class represents an instance of the top layer.

This top layer is what Java programmers normally deal with. Its main job is mapping an object-oriented, Java-centric presentation onto the simpler structure/field world of the core API. In turn, the core API relies on the key/value store API to provide a basic sorted, transactional key/value store.

Key/Value Store API

The key/value store API is very simple. Keys and values are arbitrary byte[] arrays, and keys are sorted lexicographically (with unsigned byte values). A key/value store supports transactions.
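To make "sorted lexicographically with unsigned byte values" concrete, here is a minimal sketch using only the JDK (Java 9 or later). The class and method names are made up for illustration and are not part of Permazen.

```java
import java.util.Arrays;

// Illustrative only: UnsignedKeyOrderDemo and compareKeys() are hypothetical names.
class UnsignedKeyOrderDemo {

    // Compare two keys lexicographically, treating each byte as a value in 0..255.
    // Returns a negative, zero, or positive result, like Comparator.compare().
    static int compareKeys(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    public static void main(String[] args) {
        final byte[] low = { 0x7f };
        final byte[] high = { (byte)0x80 };

        // Unsigned order: 0x7f (127) sorts before 0x80 (128)
        System.out.println(compareKeys(low, high) < 0);

        // A key that is a prefix of another key sorts first
        System.out.println(compareKeys(low, new byte[] { 0x7f, 0x00 }) < 0);

        // Java's default signed comparison disagrees: 0x80 is -128 as a signed byte
        System.out.println(Arrays.compare(low, high) < 0);
    }
}
```

The difference matters whenever a key contains bytes of 0x80 or above: a store that compares bytes as signed values would return keys in a different order.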

Core API Layer

The core API layer provides basic types, and concepts roughly analogous to tables, columns, and rows. The core API calls these concepts object type, field, and object, respectively. However, this analogy is loose and there are some subtle but important differences. For example, two different object types can contain the same field. This allows you, for example, to index a field across multiple object types - even if the types are not in the same type hierarchy. For example, you can index a field corresponding to a bean property declared by a Java interface, and then query the index for objects having any type that implements the interface.

All core API fields have a strictly well-defined type, sort ordering, and serialized encoding as byte[] values (see FieldType). Along with "atomic" field types, the core API includes support for a few special field types, including reference types (i.e., "pointers"), identifier list "enum" types, and lock-free counter types. The core API also supports user-defined types.

In addition to the aforementioned simple field types, the core API layer also provides support for complex field types: List, Set, and Map.

Indexes on both simple and complex fields are supported, and composite indexes on multiple simple fields are supported.

The set of all object types and their fields defines a schema. With certain restrictions, the core API allows multiple different schemas to exist at the same time in the same database; each schema has a unique integer version. As a consequence, all objects in the database are versioned. An object type may have different fields in different schema versions.

All core API layer stored types (objects, fields, indexes, etc.) are identified by an integer storage ID, not by name. This allows names to change at a higher level without affecting the core API schema structure.

Although written in Java, there’s nothing inherently Java specific about the core API layer. The "objects" in the core API layer are just data structures: there is no explicit notion of class, inheritance, or methods (it is the job of the Permazen layer to perform that mapping).

Java Layer

The Java layer sits on top of the core API layer. It provides the developer-friendly, Java-centric view of the core API layer. You normally only need to deal with the Permazen layer.

At the Java layer, the "schema" is implicitly defined by your Java model classes, which are identified by the @PermazenType annotation. To restate that: your set of Java model classes is your Permazen schema; there is no separate schema "configuration" required. Under the covers of course, the Permazen layer generates an appropriate core API schema from your model classes and provides this to the core API layer.

The Java layer also does any necessary translation of core API values. For example, in the core API layer a "reference" is described by a 64-bit object identifier (see ObjId), whereas in the Java layer a reference is a Java model object.

Why is the core API layer / Java layer split important?

First, it allows complete flexibility in your Java model classes, while still providing well-defined semantics, strict type safety, and easy version migration, even in the face of arbitrary code refactoring (no small feat).

Secondly, sometimes you want to inspect or modify data directly, without any "object orientedness", i.e., without the possibility of any Java model class methods being invoked as listeners or whatever. The core API lets you do this.

The Permazen command line interface (CLI) utility also supports this notion: it can run in either core API mode or Java mode (a.k.a. "Permazen" mode).

Fields and Types

What simple types are supported?

Permazen supports the following simple types out of the box:

  • Primitive types

  • Primitive wrapper types

  • References to Java model classes (or any wider type)

  • Enum types

  • Arrays of any simple type up to 255 dimensions (passed by value)

  • java.lang.String

  • java.util.Date

  • java.util.UUID

  • java.io.File

  • java.util.regex.Pattern

  • java.time.*

Can I create my own simple types?

Yes, by writing a class that subclasses FieldType, annotating it with @JFieldType, and putting it on the classpath.

An easy way to create a custom type for any type that can be encoded as a String is to pass an appropriate Converter<T, String> to a new instance of StringEncodedType. Database equality and sort order then derives from the string representation.

How are Enum values stored?

In the core API layer, Enum values are represented by EnumValue objects which serialize (usually) into a single byte. In the Permazen layer, they are represented by instances of the appropriate Enum Java model class.

At the core layer, two enum types are considered equivalent if and only if they have the same (ordered) identifier list. This means you can move an Enum model class to a different package without requiring a schema change. However, if you add or change an Enum value, that forces a schema change, because the field’s type has effectively changed.
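The equivalence rule can be pictured with plain Java. This is only a sketch of the rule as stated above, not Permazen's implementation; the class, the enum types, and the helper methods are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: models "same ordered identifier list" equivalence for enum types.
class EnumIdentifierListDemo {

    enum ColorV1 { RED, LIGHT_GREEN, DARK_GREEN, BLUE }
    enum ColorV2 { RED, GREEN, BLUE }
    enum MovedColor { RED, GREEN, BLUE }    // same identifiers as ColorV2, but a different class

    // Extract the ordered identifier list of an enum class
    static List<String> identifiers(Class<? extends Enum<?>> type) {
        final List<String> names = new ArrayList<>();
        for (Enum<?> constant : type.getEnumConstants())
            names.add(constant.name());
        return names;
    }

    // Two enum types are equivalent if and only if their ordered identifier lists match
    static boolean equivalent(Class<? extends Enum<?>> a, Class<? extends Enum<?>> b) {
        return identifiers(a).equals(identifiers(b));
    }
}
```

Here ColorV2 and MovedColor are equivalent even though they are different Java classes, while ColorV1 and ColorV2 are not, which is why changing the identifier list forces a schema change.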

By default, when a field’s type changes during a schema change, the field is reset to its default value (which is null for non-primitive types). However, you have the option of telling Permazen to automatically map the old Enum value to the new Enum type if its identifier still exists.

This is part of a more general mechanism for automatic conversion of field values when a field’s type changes during a schema upgrade; see @JField.upgradeConversion() for details. In short, your options are: reset the field or try to automatically convert it.

Or for complete control, provide an @OnVersionChange method to map between the old and new field values. Permazen will supply the old Enum field values as EnumValue objects, which are just an int, String pair.

Here’s an example showing an original model class:

// Schema version #1
@PermazenType
public abstract class Vehicle {
  public enum Color {
    RED,
    LIGHT_GREEN,
    DARK_GREEN,
    BLUE
  }

  public abstract Color getColor();
  public abstract void setColor(Color color);
}

and a new model class with the renamed field and schema "fixup":

// Schema version #2
@PermazenType
public abstract class Vehicle {
  public enum Color {
    RED,
    GREEN,   // was LIGHT_GREEN or DARK_GREEN
    BLUE
  }

  @JField(name = "color2")
  public abstract Color getColor();
  public abstract void setColor(Color color);

  // Semantic update for version 1 -> 2
  @OnVersionChange(oldVersion = 1, newVersion = 2)
  private void update(Map<String, Object> prev) {
    String colorName = ((EnumValue)prev.get("color")).getName();
    if (colorName.endsWith("_GREEN"))
      colorName = "GREEN";
    this.setColor(Color.valueOf(colorName));
  }
}

Unlike with JPA, because Permazen takes care to not mix incompatible types, it’s not possible to read an Enum value that doesn’t exist from the database, even after schema changes, and you have total control of whether and how fields are converted during a schema change.

What collection types are supported?

Lists, Sets, and Maps.

The element, key, and value can have any simple type. In the case of primitive types, null values will be disallowed.

Sets actually implement NavigableSet, and Maps actually implement NavigableMap.

Lists have performance characteristics similar to ArrayList.

Querying Data

Does Permazen have a query language?

No. All queries are done using normal Java.

Do I need a DAO layer?

No.

For a few operations such as creating a new instance and querying an index, you invoke methods on the current JTransaction.

Everything else can be normal Java, and all access methods can be either instance or static methods in your Java model classes.

Let’s take a simple example Java model with Account and User model classes. We have these requirements:

  • Every user must have an account

  • Usernames must be unique

  • We must be able to efficiently find users by username

  • We must be able to efficiently find all users associated with an account

Here’s what those classes might look like, including all the "DAO" methods you would need:

@PermazenType
public abstract class User implements JObject {

  // Fields

    // Get this user's username
    @JField(indexed = true, unique = true)
    @NotNull
    public abstract String getUsername();
    public abstract void setUsername(String username);

    // Get this user's account
    @NotNull
    public abstract Account getAccount();
    public abstract void setAccount(Account account);

  // "DAO" methods

    // Create new user
    public static User create() {
        return JTransaction.getCurrent().create(User.class);
    }

    // Find user by username
    public static User getByUsername(String username) {
        final NavigableSet<User> users = JTransaction.getCurrent().queryIndex(
          String.class, "username", User.class).asMap().get(username);
        return users != null ? users.first() : null;
    }
}

@PermazenType
public abstract class Account implements JObject {

  // Fields

    // Get the name of this account
    @NotNull
    public abstract String getName();
    public abstract void setName(String name);

  // "DAO" methods

    // Create new account
    public static Account create() {
        return JTransaction.getCurrent().create(Account.class);
    }

    // Get all users associated with this account
    public NavigableSet<User> getUsers() {
        final NavigableSet<User> users = this.getTransaction().queryIndex(
          User.class, "account", Account.class).asMap().get(this);
        return users != null ? users : NavigableSets.<User>empty();
    }

    // Get all accounts
    public static NavigableSet<Account> getAll() {
        return JTransaction.getCurrent().getAll(Account.class);
    }
}

Congratulations, you’re done! You’ve just configured an entire Java application persistence layer.

How do I do aggregate queries like AVG() and SUM() and things like GROUP BY?

You write them yourself in Java.
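As a rough illustration, here is what SUM(), AVG(), and GROUP BY might look like with Java streams. Order is a hypothetical stand-in for one of your model classes, and a plain List stands in for a set of objects you would normally get from a transaction.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Illustrative only: Order is a hypothetical stand-in for a Java model class.
class AggregateDemo {

    static final class Order {
        private final String customer;
        private final double amount;

        Order(String customer, double amount) {
            this.customer = customer;
            this.amount = amount;
        }

        String getCustomer() { return this.customer; }
        double getAmount() { return this.amount; }
    }

    // Like SELECT SUM(amount) FROM orders
    static double sum(List<Order> orders) {
        return orders.stream().mapToDouble(Order::getAmount).sum();
    }

    // Like SELECT AVG(amount) FROM orders (returns 0.0 when there are no orders)
    static double average(List<Order> orders) {
        return orders.stream().mapToDouble(Order::getAmount).average().orElse(0.0);
    }

    // Like SELECT customer, SUM(amount) FROM orders GROUP BY customer
    static Map<String, Double> sumByCustomer(List<Order> orders) {
        return orders.stream().collect(Collectors.groupingBy(
          Order::getCustomer, TreeMap::new, Collectors.summingDouble(Order::getAmount)));
    }
}
```

Each helper visibly iterates over every order, so its cost is obvious from the code.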

Isn’t that inconvenient?

Yes and no.

Permazen believes that having everything done in maintainable Java code is worth the trade-off of having to write a few helper methods. Code is only written once, but it’s maintained forever.

Also, and perhaps more importantly, Permazen makes it impossible to write a poorly performing query unless you explicitly write it that way yourself.

For example, in SQL a query like SELECT * FROM USER WHERE LOWER(USERNAME) = 'fred' will require examining every row of the USER table even if the USERNAME column is indexed, because of the use of LOWER() in the WHERE clause.

The problem is that it’s not obvious that this query is going to be slow just by looking at it. Of course this is just a simple example; in the real world, query performance problems can be much harder to spot.

In Permazen, to implement that query, you’d have to write a loop that iterates over every User in the database. This makes the performance reality obvious.

The more "correct" thing to do would be to add a new private field that contained the lowercase version of the user’s name, somehow always keep it up to date, and then index that field. Permazen makes this easy using the @OnChange annotation:

@PermazenType
public abstract class User implements JObject {

  // Fields

    // Get this user's username
    @JField(indexed = true, unique = true)
    @NotNull
    public abstract String getUsername();
    public abstract void setUsername(String username);

  // Derived fields

    // Get this user's lower case username - automatically kept in sync
    @JField(indexed = true)
    public abstract String getLowercaseUsername();
    protected abstract void setLowercaseUsername(String username);   // not public

    @OnChange("username")
    private void onUsernameChange(SimpleFieldChange<User, String> change) {
        final String username = change.getNewValue();
        this.setLowercaseUsername(username != null ? username.toLowerCase() : null);
    }

  // "DAO" methods

    // Find users by lowercase username
    public static NavigableSet<User> getByLowercaseUsername(String lowername) {
        return JTransaction.getCurrent().queryIndex(
          String.class, "lowercaseUsername", User.class).asMap().get(lowername);
    }
}

Now you’ve got a fast query by lowercase username, and all the details are contained in one place and hidden from other classes.

Can I query by any Java type? What about interface types?

Yes and yes.

How do I do database joins?

Instead of thinking in terms dictated by the database technology, Permazen lets you think in more natural terms of sets, specifically NavigableSet, which provides efficient range queries, reverse ordering, etc.

Permazen also provides efficient union, intersection, and difference implementations (see NavigableSets). These operations provide the functionality of database joins.
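To show the idea without depending on Permazen itself, the sketch below uses plain java.util sets as stand-ins for index query results. The helper methods are hypothetical and copy their inputs eagerly; they are not the NavigableSets implementations.

```java
import java.util.NavigableSet;
import java.util.TreeSet;

// Illustrative only: plain java.util sets stand in for index query results.
class SetJoinDemo {

    // Elements present in both sets, like AND-ing two indexed conditions
    static <T> NavigableSet<T> intersection(NavigableSet<T> a, NavigableSet<T> b) {
        final NavigableSet<T> result = new TreeSet<>(a);    // copy; inputs are not modified
        result.retainAll(b);
        return result;
    }

    // Elements present in either set, like OR-ing two indexed conditions
    static <T> NavigableSet<T> union(NavigableSet<T> a, NavigableSet<T> b) {
        final NavigableSet<T> result = new TreeSet<>(a);
        result.addAll(b);
        return result;
    }

    // Elements of the first set that are not in the second, like AND NOT
    static <T> NavigableSet<T> difference(NavigableSet<T> a, NavigableSet<T> b) {
        final NavigableSet<T> result = new TreeSet<>(a);
        result.removeAll(b);
        return result;
    }
}
```

For example, intersecting "users belonging to this account" with "users having some role" answers the kind of question that would need a JOIN or a compound WHERE clause in SQL.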

Indexes

How do you query an index?

Using JTransaction.queryIndex().

Index queries are parameterized by the Java types you are interested in and type safe.

These Java types can be arbitrarily wide or narrow.

What happens if I make a schema change that simply adds or removes an index on a field?

Permazen supports schema changes that add or remove indexes. If you do this, only objects whose schema versions have the field indexed will be found in the index.

Key/Value Stores

What requirements must the key/value store satisfy?

The key/value store must support data access via the KVStore interface:

  • Efficiently get, put, and remove keys

  • Efficiently find the next higher or lower key

  • Support transactions; see KVDatabase for details.
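The first two requirements can be sketched with an in-memory sorted map (Java 9 or later). This is not the KVStore interface: the class and its method names are hypothetical, and transactions are left out entirely. The prefix scan shows why sorted keys and "next key" lookups matter: all keys sharing a prefix are adjacent, so they can be found by seeking once and walking forward.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Illustrative only: an in-memory sorted map stands in for a real key/value store.
class PrefixScanDemo {

    private final NavigableMap<byte[], byte[]> data;

    PrefixScanDemo() {
        final Comparator<byte[]> unsignedOrder = Arrays::compareUnsigned;
        this.data = new TreeMap<>(unsignedOrder);
    }

    // Basic get, put, and remove
    void put(byte[] key, byte[] value) { this.data.put(key.clone(), value.clone()); }
    void remove(byte[] key) { this.data.remove(key); }
    byte[] get(byte[] key) {
        final byte[] value = this.data.get(key);
        return value != null ? value.clone() : null;
    }

    // Next higher and next lower key (strictly), or null if there is none
    byte[] higherKey(byte[] key) { return this.data.higherKey(key); }
    byte[] lowerKey(byte[] key) { return this.data.lowerKey(key); }

    // All keys starting with the given prefix: seek to the prefix, then walk while it matches
    List<byte[]> keysWithPrefix(byte[] prefix) {
        final List<byte[]> result = new ArrayList<>();
        for (byte[] key : this.data.tailMap(prefix, true).keySet()) {
            if (!startsWith(key, prefix))
                break;
            result.add(key);
        }
        return result;
    }

    private static boolean startsWith(byte[] key, byte[] prefix) {
        return key.length >= prefix.length
          && Arrays.equals(key, 0, prefix.length, prefix, 0, prefix.length);
    }
}
```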

What key/value databases are supported?

Currently the following key/value stores are supported:

Several other popular NoSQL databases are not compatible because of one or more of the following:

  • Keys are not sorted (only hashed)

  • Keys have limited length (e.g., at most 64 or 128 bits)

Does Permazen require ACID semantics from the key/value store?

Preferred but not required. The philosophy behind Permazen states that simplicity promotes solid, reliable, maintainable code. In particular, if the code is too complicated, it becomes unfeasible for developers to prove to themselves that the code is fully correct — and of course if the developers can’t ensure the code is fully correct, it won’t magically become fully correct by itself. Stated another way, "complexity kills".

A persistence technology that doesn’t provide consistent, ACID-compliant transactions can be too difficult for programmers to reason about. In addition, recently there has been a change in the traditional belief that you can’t have both ACID compliance and scalability: Google Cloud Spanner and FoundationDB are proving this assumption wrong.

In any case, you are welcome to use any key/value store you want to; you just need to make sure you understand how it affects your program logic. In particular, Permazen uses the key/value store to hold both primary object information and secondary (derived) index information. So, for example, if transaction mutations are not applied atomically, it’s possible an index could return results that are inconsistent with the fields that it indexes.

Data Storage and Layout

How does Permazen encode information as keys and values?

See LAYOUT.txt for a basic overview.

Object IDs are 64 bits (8 bytes), with a prefix that indicates the object type.

Simple field values are encoded as self-delimiting byte[] arrays. Because they are self-delimiting, any two simple values and/or object IDs can be concatenated. Integral values are stored using an encoding that requires only one byte for small values (-118 through 119), two bytes for larger values, etc.
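The one-byte range can be checked with a little arithmetic: -118 through 119 is 238 distinct values, which leaves 18 of the 256 possible byte values free for other purposes, such as marking longer encodings. The sketch below shows one way such a range could be mapped onto a single byte while preserving sort order. The offset of 127 is an assumption made for illustration; this is not Permazen's actual encoding, for which LAYOUT.txt and the source are the authority.

```java
// Illustrative only: this is NOT Permazen's actual integer encoding.
class SmallIntEncodingDemo {

    static final long MIN_SINGLE_BYTE = -118;   // smallest one-byte value, per the text above
    static final long MAX_SINGLE_BYTE = 119;    // largest one-byte value, per the text above
    static final int OFFSET = 127;              // hypothetical: maps -118..119 onto 9..246

    static boolean fitsInOneByte(long value) {
        return value >= MIN_SINGLE_BYTE && value <= MAX_SINGLE_BYTE;
    }

    // Encode a small value as one byte whose unsigned value preserves numeric order
    static byte encode(long value) {
        if (!fitsInOneByte(value))
            throw new IllegalArgumentException("needs more than one byte: " + value);
        return (byte)(value + OFFSET);
    }

    // Decode a byte produced by encode()
    static long decode(byte encoded) {
        return (encoded & 0xff) - OFFSET;
    }
}
```

Because the encoded bytes sort (as unsigned values) in the same order as the numbers they represent, values encoded this way can be compared without decoding them, which is what a sorted key/value store needs.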

Configuration

Use the PermazenFactory class to configure your Java model classes and your underlying key/value database, and you’re good to go.

See the Spring package for an example of configuring Permazen in a Spring application.

Schemas and Versioning

Do schema changes affect the whole database?

No. Permazen is designed to avoid any "whole database" operations that might limit scalability.

Schema changes are applied on demand, on a per-object basis, as objects are accessed during normal operation.

What happens if my Java model classes change? Won’t that break the mapping to the core API objects and fields?

The short answer is: Permazen always guarantees Java type safety and correct encoding/decoding of objects, even in the face of arbitrary Java model class refactoring.

Permazen allows arbitrary code refactoring at the Java model layer, but if the generated core API schema changes in a structurally incompatible way, then a new schema version is required. Normally schema version numbers are auto-generated based on the generated core API schema, so this happens automatically.

If you prefer, you can define schema version numbers manually. In that case you’ll need to specify a new schema version number yourself; if you try to use an incompatible schema without changing the schema version number, you’ll get a SchemaMismatchException when trying to open a new transaction.

How does Permazen know that some older version of my code had a different schema?

When you run code with a new schema version for the first time, Permazen records the schema in the database. From that point onward, Permazen will not allow the use of any other, incompatible schema with that same version number.

What happens to objects created by an older schema version after an upgrade to a newer schema version?

After a schema change, your new code will create objects with the new schema version. Objects created by your old code will continue to exist in the database unchanged.

What happens when a new version of my code tries to read an object created by an old version of my code?

When your new code first encounters an object with an older version number, the object will be automatically upgraded to the new schema version. Newly added fields and fields whose types have changed will be initialized to their default values, and removed fields will be deleted.

If that’s good enough for you, you don’t need to do anything else.

For simple fields whose type has changed (e.g., from int to long), you can configure whether they are automatically converted (default) or reset to their default values; see @JField.upgradeConversion().

However, Permazen also gives you an opportunity to perform arbitrary schema change "fixup" logic if necessary, by invoking any @OnVersionChange methods on the object. All of the fields in the old version of the object (including fields that were removed) are made available to this method.
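The default upgrade rules described above (before any @OnVersionChange fixup runs) can be pictured with a toy model in which an object is just a map from field name to value. This is a simplified sketch, not Permazen's implementation: the class and method are hypothetical, and fields whose type changed are not modeled.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a toy model of the default upgrade rules, not Permazen's implementation.
class DefaultUpgradeDemo {

    // oldFields:   field name -> value stored in the old version of the object
    // newDefaults: field name -> default value, for every field in the new schema
    // Returns the field values of the upgraded object; oldFields is left untouched,
    // much as the old values remain available to an @OnVersionChange method.
    static Map<String, Object> upgrade(Map<String, Object> oldFields, Map<String, Object> newDefaults) {
        final Map<String, Object> upgraded = new HashMap<>();
        for (Map.Entry<String, Object> entry : newDefaults.entrySet()) {
            final String name = entry.getKey();
            if (oldFields.containsKey(name))
                upgraded.put(name, oldFields.get(name));    // field exists in both: keep its value
            else
                upgraded.put(name, entry.getValue());       // newly added field: use the default
        }
        return upgraded;                                    // removed fields are simply not copied
    }
}
```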

What happens when an old version of my code tries to read an object created by a new version of my code?

Same thing. Permazen doesn’t really care about the schema version numbers themselves; they are simply unique identifiers. So "upgrades" and "downgrades" are handled exactly the same way.

If you will have multiple versions of your code writing to the same database, then both versions will need to know how to handle an object version change from the other version. In this situation a phased upgrade process is recommended:

  • Upgrade nodes to understand both the old and new schema versions, but disable newer functionality until all nodes are upgraded

  • Once all nodes are upgraded, start using the new schema and associated new functionality

  • (Optional) Force upgrade all remaining database objects, e.g., use CLI command: eval all().forEach(JObject::upgrade)

  • (Optional) Garbage collect the old schema version from your database meta-data, e.g., use CLI command: delete-schema-version 3

  • (Optional) Remove support for the old schema version in your code

This process allows for rolling schema upgrades across multiple nodes with no downtime.

How will newer versions of my code know how to properly decode objects stored by older versions?

The core API layer records all of the schemas ever used in a database (until you garbage collect them) in the database meta-data, so it always knows how to decode any object.

It’s not possible to garbage collect a schema version until no more objects exist with that version.

What happens if a newer schema version removes a Java model class? How can I access those objects?

Objects created by older schema versions whose model class no longer exists are still accessible, but they will have type UntypedJObject. If needed, you can access their fields using the field introspection methods of the JTransaction class. Typically, however, deleting a Java model class means you don’t need or want the data anymore.

You can encounter UntypedJObject instances in the following two situations:

  • As the value of a removed field in an @OnVersionChange schema update callback method when:

    • The older version contained the model class; and

    • The newer version does not

  • In index query results, when:

    • The older version contained the indexed field in a model class that was removed; and

    • UntypedJObject is assignable to the Java type requested by the query (e.g., you request all objects in the index of type Object).

Note that type safety is still preserved in all situations.

What if a new schema changes an object reference to have a narrower Java type? Won’t then older versions of the class violate type safety?

No, because during a schema upgrade Permazen automatically eliminates any references that would no longer be valid due to narrowing Java types.

Of course, you have an opportunity to do something with the old, invalid references in your @OnVersionChange method.

What happens if I change a float field to String, etc.?

Permazen requires a limited amount of consistency between schema versions. Specifically, a field cannot have two different types between schema versions and also be indexed in both schema versions. This restriction is required because indexes can index objects from any schema version, and mixing types would result in an ambiguous encoding of values in the index.

Under your control, Permazen can optionally perform some automatic conversions (e.g., from float to String, Enum values with the same identifier, etc.) for you. See @JField.upgradeConversion().

If you need more control, you can do arbitrary conversions in an @OnVersionChange method.

How would I handle a schema change that splits a class Vehicle into Car and Truck? Or that does the reverse?

These types of schema changes are tricky for any Java persistence framework. For example, there’s no way to avoid visiting every Vehicle at some point to decide whether it needs to be a Car or a Truck.

The easiest way to handle this scenario is to upgrade in two steps. In the first phase, all three classes exist (in the obvious inheritance arrangement), and your code knows how to handle all three. During this phase, a custom background upgrade thread iterates through every instance, deciding what to do with it, updating or replacing it as necessary. In the second phase, all objects have been transitioned to the new classes, so the old class(es) are no longer needed and can be removed.

What if my schema change requires replacing instances of one class with instances of a different class? How do I update incoming references?

In Permazen all reference fields are indexed, so you can simply query the index for each reference field that refers to the instance you are replacing, and then update those references.

What if model class A contains a reference to model class B, and then a schema change deletes class B?

Then the Java type of the reference in class A will also have to change, otherwise your code won’t compile, or schema generation will fail because class B isn’t a model class.

When and how do objects get upgraded to a newer schema version?

Objects are upgraded automatically the first time your code attempts to read or write a field in the object.

Is there a way to forcibly upgrade objects to the current schema version?

Yes, JObject.upgrade() upgrades an object to the current schema version.

From the CLI, you can upgrade every object by invoking eval all().forEach(JObject::upgrade).

How exactly does Permazen prevent me from reading or writing a field incompatibly?

Fields are explicitly typed; each type has an associated FieldType implementation.

Will my database get cluttered up with old schema versions from years gone by?

You can use the CLI command delete-schema-version to remove a recorded schema version from the database.

This operation will fail if any objects with that version still exist - you must upgrade (or delete) them first, e.g., using the CLI command eval all().forEach(JObject::upgrade).

For simplicity, it is recommended to always upgrade objects after a schema change, so your @OnVersionChange methods only have to deal with one version change at a time.

How can I tell what schema versions are in use by objects in my database?

Permazen keeps an internal index on object versions. Therefore, it’s easy to query for which objects of which types have which versions.

For example, in the CLI, to find how many objects of type Vehicle have version four, you could say eval (all(Vehicle) & queryVersion().get(4)).stream().count().

I changed my model classes and now new transactions are failing with SchemaMismatchException… now what do I do?

Avoid this problem by configuring your schema version as -1 to have a version number auto-generated for you by hashing the schema.

Don’t forget to add @OnVersionChange methods as necessary to handle any required schema change fixups.

How do I manually specify the schema version?

The schema version number can be provided explicitly when you configure a Permazen instance, or auto-generated based on hashing your schema (by setting the version to -1). In the latter case, you don’t have to do anything.

However, you need to give Permazen permission to record a new schema version in the database; this is just an extra safety check.
