
QuickGuide: Shared Storage Data Structure Development

Thomas Cujé edited this page Jan 19, 2017 · 1 revision

This guide explains how to organize your service's data in the shared storage. As an example, we develop a message storage structure for the ShortMessageService. We also explain the main differences from regular file system storage and databases.

NEVER USE DATABASES WITH YOUR SERVICES!

  • single point of failure in a redundant environment
  • credentials may be exposed to the public
  • not scalable in an otherwise scaling p2p network
  • possible reachability problems for some network members
  • blocking IO operations may reduce performance

Prerequisites

This guide does not explain how to use the API; it focuses on the conceptual part. The API is described here.

Requirements for the data structure

The storage back-end for our ShortMessageService should store the messages exchanged between two agents. An agent can represent a person, a group or even a service, and is identified by its id. Usually a database would be used to store each message with the recipient id, the sender id and the message text. As shown above, this would lead to several problems, so we want to store the messages in the network instead.

Solution Design

A first idea is to create a Java container class that holds all messages and to store this container in the network. This would solve some problems, but the container instance can't be encrypted: the service runs as multiple service agents on several nodes, and each of them has different encryption keys. The messages inside the container would still be encrypted with their recipients' keys, but the container itself would leak all metadata to the outside.

A second idea is to create a Java container for each conversation between two agents. The container would be readable only by these two agents and would leak no information to the outside. But it is still a single point of failure: even though las2peer provides many techniques to prevent data loss, it is not perfectly safe, as is the case in every p2p network. Furthermore, in a very active long-term chat with thousands of messages, the single container instance would keep growing. Such a big container does not scale and leads to very bad performance when storing or fetching messages.

So for the final solution we have the following requirements:

  • full end-to-end encryption
  • no metadata leakage
  • small parts to fetch and store

To achieve these features, we store each message as a single object. At first this does not sound like a good idea, because it increases overhead and would perform badly on, e.g., a usual file system. But the shared storage is not a usual file system. Thanks to the p2p structure, a service can fetch many objects in parallel without losing performance. Because each object can be stored on a different node, splitting the data into objects even increases performance.
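A minimal sketch of fetching many small objects in parallel, in plain Java. The `fetch` function is an assumption standing in for whatever blocking lookup call your storage API provides; it is not the las2peer API. All requests are started before any result is awaited, so the lookups overlap in time.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.stream.Collectors;

final class ParallelFetch {
    /**
     * Issues one asynchronous fetch per identifier and waits for all of them.
     * fetch is a hypothetical stand-in for the storage's blocking lookup call.
     */
    static List<String> fetchAll(Function<String, String> fetch, List<String> ids) {
        // Collect the futures first: streams are lazy, so this starts every request
        // before we wait on any of them.
        List<CompletableFuture<String>> pending = ids.stream()
                .map(id -> CompletableFuture.supplyAsync(() -> fetch.apply(id)))
                .collect(Collectors.toList());
        // Joining in list order keeps the results aligned with the requested ids.
        return pending.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
    }
}
```

Without an executor argument, `supplyAsync` runs on the common fork-join pool; for blocking network lookups you would likely pass a dedicated `Executor` as its second argument.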

Additionally, each of these message objects is linked to exactly one recipient agent. This makes good end-to-end encryption simple: just encrypt each object with the recipient agent's public key.
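las2peer offers its own encryption facilities, which this guide does not cover. To illustrate only the principle, the following sketch uses the plain JDK to encrypt one message object with a recipient's RSA public key (RSA-OAEP). The class and method names are hypothetical, and the generated key pair merely stands in for an agent's keys.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.Key;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import javax.crypto.Cipher;

/** Plain-JDK sketch of encrypting one message object for its recipient only (not the las2peer API). */
final class RecipientEncryption {
    // RSA-OAEP; with a 2048-bit key and SHA-256 the plaintext is limited to 190 bytes.
    private static final String TRANSFORMATION = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";

    /** Generates a key pair standing in for the recipient agent's keys. */
    static KeyPair newAgentKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("RSA is not available", e);
        }
    }

    /** Encrypts a short message with the recipient's public key. */
    static byte[] encryptFor(PublicKey recipientKey, String message) {
        return apply(Cipher.ENCRYPT_MODE, recipientKey, message.getBytes(StandardCharsets.UTF_8));
    }

    /** Only the holder of the matching private key can recover the text. */
    static String decrypt(PrivateKey recipientKey, byte[] ciphertext) {
        return new String(apply(Cipher.DECRYPT_MODE, recipientKey, ciphertext), StandardCharsets.UTF_8);
    }

    private static byte[] apply(int mode, Key key, byte[] input) {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(mode, key);
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            // Wrap checked crypto exceptions to keep the sketch's call sites simple.
            throw new IllegalStateException("encryption operation failed", e);
        }
    }
}
```

Because RSA-OAEP can only encrypt short inputs, messages of arbitrary length would need hybrid encryption: a random symmetric key encrypts the message, and only that key is encrypted with the recipient's public key.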

Final Solution

The final solution is a method that generates a unique identifier string for each message object. To do so, we concatenate a service-related prefix, the sender's agent id, the recipient's agent id and, at the end, a message index number. The prefix at the beginning makes collisions with other services' identifiers very unlikely.
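A sketch of such an identifier scheme follows. The prefix value and the `-` separator are assumptions chosen for illustration; the separator prevents ambiguous concatenations such as (`ab`, `c`) versus (`a`, `bc`), provided agent ids never contain it.

```java
/** Builds the storage identifier of one message object. */
final class MessageIds {
    // Hypothetical service-specific prefix that keeps these ids apart from other services' ids.
    static final String PREFIX = "shortmessageservice-message";

    static String messageId(String senderId, String recipientId, int index) {
        // Assumes agent ids never contain "-", so the parts cannot run into each other.
        return PREFIX + "-" + senderId + "-" + recipientId + "-" + index;
    }
}
```

For example, `messageId("alice", "bob", 0)` yields `shortmessageservice-message-alice-bob-0`, while Bob's reply to Alice uses the distinct identifier `shortmessageservice-message-bob-alice-0`.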

To find the last used index number, use a binary search for the first unused identifier with the given sender and recipient agent ids, increasing the index number as you go. It's recommended to cache the last index in the service instance's memory to speed up future lookups. However, remember to search again before each store operation, because another instance of the service may have created further objects in the meantime. When there are only a few objects, a linear search may also perform well enough.
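A plain binary search needs an upper bound, which is not known in advance. One way to realize this search, assuming indices are used without gaps (0 to n-1 exist, n and beyond do not), is an exponential (galloping) search that doubles its step until it hits an unused index, followed by a binary search within the last interval. `isUsed` is a hypothetical predicate that checks whether the identifier for an index exists in the storage, and `startHint` is the cached result of an earlier lookup (0 if none); it stays a valid lower bound because objects are only ever added.

```java
import java.util.function.IntPredicate;

final class IndexSearch {
    /**
     * Returns the first unused index, assuming indices 0..n-1 are used and n, n+1, ...
     * are not. startHint must be at most n, e.g. the result of an earlier lookup.
     */
    static int firstUnusedIndex(IntPredicate isUsed, int startHint) {
        int low = Math.max(0, startHint); // every index below low is used
        int high = low;                   // current probe position
        int step = 1;
        // Exponential phase: jump twice as far each time until an unused index is hit.
        while (isUsed.test(high)) {
            low = high + 1;
            high += step;
            step *= 2;
        }
        // Every index below low is used and high is unused: binary search in [low, high].
        while (low < high) {
            int mid = low + (high - low) / 2;
            if (isUsed.test(mid)) {
                low = mid + 1;
            } else {
                high = mid;
            }
        }
        return low;
    }
}
```

For example, `firstUnusedIndex(i -> i < 5, 0)` probes indices 0, 1, 3, 7, then 5 and 4, and returns 5. Both phases need O(log n) storage lookups, compared with n + 1 for a linear scan.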

Using this storage structure is much like using a Java ArrayList: you can access objects by index and append new objects to the list. To check for new objects, you only have to fetch the identifier that was still unused at your most recent lookup.
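The following sketch wraps the identifier scheme in a small list-like class. `SharedStorage` and `InMemoryStorage` are hypothetical stand-ins for the real storage API, and for brevity the size lookup uses the linear search mentioned above. The sketch ignores the case where two service instances append at the same index at the same moment, which a real implementation would have to handle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the shared storage; the real las2peer API differs. */
interface SharedStorage {
    /** Returns the object stored under id, or null if the id is unused. */
    String fetch(String id);

    void store(String id, String content);
}

/** In-memory implementation, only for trying the sketch out locally. */
final class InMemoryStorage implements SharedStorage {
    private final Map<String, String> objects = new ConcurrentHashMap<>();

    @Override
    public String fetch(String id) {
        return objects.get(id);
    }

    @Override
    public void store(String id, String content) {
        objects.put(id, content);
    }
}

/** List-like view over the messages one sender sent to one recipient. */
final class MessageList {
    private final SharedStorage storage;
    private final String idBase;
    private int knownSize = 0; // cached size; objects are only added, so it never shrinks

    MessageList(SharedStorage storage, String prefix, String senderId, String recipientId) {
        this.storage = storage;
        this.idBase = prefix + "-" + senderId + "-" + recipientId + "-";
    }

    private String id(int index) {
        return idBase + index;
    }

    /** Linear search from the cached size, since other instances may have appended. */
    int size() {
        while (storage.fetch(id(knownSize)) != null) {
            knownSize++;
        }
        return knownSize;
    }

    /** Returns the object at index, or null if it does not exist (yet). */
    String get(int index) {
        return storage.fetch(id(index));
    }

    /** Stores content at the first unused index and returns that index. */
    int add(String content) {
        int index = size(); // search again right before storing
        storage.store(id(index), content);
        knownSize = index + 1;
        return index;
    }

    /** Returns all objects from fromIndex up to the current end, e.g. new messages. */
    List<String> since(int fromIndex) {
        List<String> result = new ArrayList<>();
        int end = size();
        for (int i = fromIndex; i < end; i++) {
            result.add(get(i));
        }
        return result;
    }
}
```

A caller that remembers the size it saw last time can pass it to `since` to receive only the messages that arrived in the meantime.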

Conclusion

As a service developer, you should keep the following points in mind:

  • split up your data wherever you can
  • use the data owner's agent public key to provide strong encryption
  • use lengthy identifiers and implement features with them, e.g. lists with indexing
  • you can't simply query all data, so create metadata objects with well-known ids if needed
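As an illustration of the last point, the sketch below keeps a per-agent metadata object that lists conversation partners under an identifier computed from the agent id alone, so any service instance can locate it without a query. The prefix, the class and the in-memory map standing in for the shared storage are all hypothetical; in the real storage this object would also be encrypted with the owning agent's public key.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Metadata object per agent: the ids of the agents it has exchanged messages with. */
final class ContactIndex {
    // Hypothetical prefix; any instance can compute the id from the agent id alone.
    private static final String PREFIX = "shortmessageservice-contacts-";

    // In-memory stand-in for the shared storage: identifier -> stored object.
    private final Map<String, Set<String>> storage = new ConcurrentHashMap<>();

    static String contactsId(String agentId) {
        return PREFIX + agentId;
    }

    /** Adds partnerId to agentId's metadata object, creating the object on first use. */
    void recordConversation(String agentId, String partnerId) {
        storage.computeIfAbsent(contactsId(agentId), key -> ConcurrentHashMap.newKeySet())
                .add(partnerId);
    }

    /** Looks the metadata object up by its well-known id; empty if it was never written. */
    Set<String> partnersOf(String agentId) {
        Set<String> partners = storage.get(contactsId(agentId));
        return partners == null ? Set.of() : Collections.unmodifiableSet(partners);
    }
}
```

In this sketch the service would call `recordConversation` for both the sender and the recipient whenever it stores a message, so each agent can later enumerate its conversations and then walk each one by index.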