HackTheRainforest

Offline Data Sync Proposal

Feb 2015 - Tarapoto, Peru

Camilo @SeguridadBlanca
Josh @jlev

Problem

The environmental monitoring and mapping application requires full functionality in offline scenarios. Monitors will be taking observations several days travel from towns, and should not be required to send their phones to connect to the internet. Reliable peer-to-peer data & media sync between mobile and management applications is essential.

User Scenario

Monitors will be equipped with mobile phones running offline web applications will store observations when recorded in a simple key-value store (indexDB, levelDB, or equivalent builtin to Chrome). Managers will have a laptop or Chromebook running a background data-sync service and analysis application.

The laptop or portable hotspot will create a wifi network, so that data-sync can occur without requiring a plug-in connection.

The manager can view progress and control the data-sync service with a clear web interface, which could also be extended to provide simple filtering and local analysis capabilites (integrating MapFilter and turf.js).

After syncing recent monitor observations, the manager can export their current database to flat json and media files on an attached drive, for transportation to town, further analysis and syncing, and optional upload to the internet.

Communication Protocol

Ideally the user should not have to initiate a sync manually, the application should identify peers, copy records, and resolve conflicts manually.

When the monitor app is open and connected to a wireless network, it will send a multicast UDP packet announcing its presence. If the data-sync service is running, it will receive the packet, respond with a packet directed to the client IP with a url scheme for further communication.

The data-sync service will provide a REST API, to which the monitor app will initiate further communications. The monitor app can POST a list of new observation records, or GET a list of changes to download. PUT and DELETE actions should not be allowed by the monitor app.

Sync Algorithm

To reliably copy observation records with intermittent connections, we have investigated two strategies: use a vector clock to track each node's version history, or compare files solely on a hash of the data. Each has advantages and areas of concern, and more testing is necessary to determine an appropriate solution given our problem constraints.

Vector Clock

A vector clock is a method of tracking document versions between nodes, applying partial ordering to determine which has the most recent edit. On starting a sync action, the sending node increments its own counter and sends it along with the data. On accepting a sync, the receiving node increments its own counter and updates every other counter in the vector with the maximum of the value in its own vector clock and the value in the vector in the received message.

In highly-connected systems, this can maintain consistency between temporarily disconnected nodes. In primarily offline systems, extra care will need to be taken to ensure that versions are not lost if a divergence occurs and a descendent chain cannot be determined. This problem is discussed in detail in why vector clocks are hard.

One solution to the problem of divergent trees is to track modification time as well as version in each clock, and prune the oldest entries on updating. This avoids adding disconnected versions from old entries, at the cost of potentially increased merge conflicts. Since this system will primarily add records instead of modifying them, this should not be a significant burden on the user.

Good example python and javascript implementations of vector clocks exist with GPL licenses, and the technique is widely used in "gossip protocols" and distributed databases.

Hash Comparison

A potentially simpler, but more resource intensive approach is to hash each record on a device, compare each one with every hash on the server, and copy those which are missing in both directions. This will not provide an ordering of edit histories, but that may be possible by comparing file creation and modification times.

Camilo, add more detail here

Database Design

TBD

User Interface

TDB

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
test-client-app		test-client-app
test-data-sync		test-data-sync
.gitignore		.gitignore
README.md		README.md
concept-overview.graffle		concept-overview.graffle
concept-overview.png		concept-overview.png
test-udp.py		test-udp.py
testhttp.py		testhttp.py
user-scenario.graffle		user-scenario.graffle
user-scenario.png		user-scenario.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

HackTheRainforest

Offline Data Sync Proposal

Problem

User Scenario

Communication Protocol

Sync Algorithm

Vector Clock

Hash Comparison

Database Design

User Interface

About

Releases

Packages

Contributors 2

Languages

spacedogXYZ/hacktherainforest-data-sync-proposal

Folders and files

Latest commit

History

Repository files navigation

HackTheRainforest

Offline Data Sync Proposal

Problem

User Scenario

Communication Protocol

Sync Algorithm

Vector Clock

Hash Comparison

Database Design

User Interface

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages