nickel-chrome edited this page Oct 22, 2015 · 10 revisions

Cucumber Sync White Paper

Introduction

As the use of social networks has exploded and, perhaps more importantly, as cloud-based storage for personal information management and syncing (PIMS), such as contact lists and calendars, has become pervasive, the risk to the confidentiality and security of personal information has increased. Although this information is often dismissed as inconsequential, social networks and other cloud-based services arguably hold as much personal information as a person's physical home, if not more.

Of particular concern is the increasing ability and willingness of governments to engage in the mass collection of personal information stored in the cloud. Given the quantity and reach of this information, governments and their security services seem to be attracted like bees to nectar, unable to resist the temptation to consume and store it all under the guise of "national security". Further, it appears that access to information accumulated through mass collection is being granted to a variety of government agencies beyond the traditional security services, and with lower standards of judicial oversight than apply to more conventional surveillance methods such as search warrants, postal intercepts and wiretaps.

Thus the security of our personal information can only be maintained by ourselves. Cucumber Sync (think PIMMS the drink) is an attempt to address the needs of personal information management and syncing across multiple devices while maintaining the confidentiality of the information against targeted attacks by third parties, assuming a hostile systems administrator of the cloud-based storage.

Security

Threat Model

When assessing the threat model, two distinct use cases have been taken into account: a member of the general public who wants to keep their information confidential, and an activist who wants to keep their information secret. In both cases it is assumed that the systems administrator of the cloud-based storage is hostile, is actively engaged in a man-in-the-middle attack against all remote access to the service, and also mounts brute-force attacks against the data held in storage.

For the purposes of this paper confidential information is defined as exhibiting the following attributes:

  • Access restricted to authorised parties

Similarly secret information is defined as exhibiting all attributes of confidential information with the addition of:

  • Only the user and their authorised devices are trusted. Third parties, such as cloud storage services, are specifically not trusted.
  • The user is not identifiable by application metadata

Note: defence against a targeted attack involving physical access to the device and/or remote installation of malicious software is not considered and in this case all bets are off.

Design

To achieve the desired attributes there are three key problems to solve: access control, end-to-end encryption, and key exchange.

Access Control

Although in theory access to the contact list can be restricted using password based authentication, in practice this is fundamentally flawed when the systems administrator is hostile, as is the case in our threat model. In fact no authentication method mediated by a third party can address this threat. However, because of its familiarity to users, a standard authentication method does provide a shared secret that can be leveraged for key exchange and is arguably most suitable to maintain confidentiality.

On the other hand, to effectively maintain secrecy a more secure authentication method is required. To achieve this, the key exchange method needs to protect against an adversary that has full access to the storage and knows the password, i.e. a hostile systems administrator.

Importantly, although password-based authentication alone cannot maintain secrecy, it does prevent unauthorised access by other third parties, allowing us to limit the threat exposure to the systems administrator and those that can influence her.

End-to-end Encryption

As explained earlier, authentication-based access control on its own is not enough to address the threat of a hostile systems administrator. To have confidence that only authorised parties can access the contact list, its contents must be encrypted end-to-end. That is, to synchronise her contact list between devices, Alice must be able to send a message from one of her devices to another such that Eve, even if she intercepts the message in transit, cannot determine its contents.

Importantly, end-to-end encryption requires that information is encrypted both in storage and in transit and is only decrypted at the end point, e.g. a mobile device or other client. This requirement imposes significant constraints: for example, it makes server-side searching impossible and, for the most part, makes it pointless to expose anything more than an ID and a modified timestamp. As a result, existing client-server protocols for sharing contact lists, namely CardDAV, provide little if any benefit over a simple protocol providing create, read, update, and delete (CRUD) functionality. Hence Mozilla Weave Sync, aka Firefox Sync, was chosen as the backend for Cucumber Sync, as it supports simple, REST-based CRUD operations for an arbitrary set of objects grouped into collections. In addition, Weave Sync provides a convention for managing a central key store, allowing different keys to be used for each collection and keys to be rotated if required.
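To make the constraint concrete, the following is a minimal sketch of what a content-neutral store can offer when payloads are opaque. It is an in-memory stand-in, not the Weave Sync server; the class and method names (`CollectionStore`, `ids_newer_than`) are hypothetical and chosen only for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class StoredObject:
    id: str
    modified: float
    payload: str  # opaque to the server; ciphertext in practice


@dataclass
class CollectionStore:
    """Hypothetical stand-in for one server-side collection."""
    objects: Dict[str, StoredObject] = field(default_factory=dict)

    def put(self, obj_id: str, payload: str, now: Optional[float] = None) -> float:
        # Create or update: the store stamps the modification time itself.
        modified = time.time() if now is None else now
        self.objects[obj_id] = StoredObject(obj_id, modified, payload)
        return modified

    def get(self, obj_id: str) -> Optional[StoredObject]:
        return self.objects.get(obj_id)

    def delete(self, obj_id: str) -> bool:
        return self.objects.pop(obj_id, None) is not None

    def ids_newer_than(self, timestamp: float) -> List[str]:
        # The only "query" available filters on metadata, never on content.
        return sorted(o.id for o in self.objects.values() if o.modified > timestamp)
```

Because the store never inspects `payload`, any richer query (by name, by date, by field) has to happen on the client after decryption.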

Key Exchange

Two options immediately present themselves: use a passphrase as input into a key derivation function (KDF), or generate a random secret key and use an appropriate key exchange mechanism.
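The two options can be sketched side by side. PBKDF2-HMAC-SHA256, the iteration count and the 32-byte key length are assumptions made for illustration; this paper does not state which KDF or parameters any Weave version uses, and the function names are hypothetical.

```python
import hashlib
import os


def derive_key(passphrase: str, salt: bytes, iterations: int = 200_000) -> bytes:
    """Option one: derive a 32-byte key from a passphrase (PBKDF2 is an assumed choice)."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=32
    )


def generate_random_key() -> bytes:
    """Option two: a random 32-byte secret that must then be exchanged between devices."""
    return os.urandom(32)
```

The trade-off is visible in the signatures: a derived key can be recreated on any device from the passphrase and salt alone, whereas a random key is stronger but has to be moved between devices by some other mechanism.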

Importantly, the Mozilla Weave project has implemented both of these solutions: a passphrase is used in Weave storage API/data v1.1/v3, and a random but human-readable secret key is used in Weave storage API/data v1.1/v5. Mozilla found significant shortcomings with both and has since moved to a third solution, Weave storage API/data v1.5/v5, also known as the Firefox Accounts onepw protocol, which uses a single password both for authentication and as input into a KDF. This is more convenient for the user but more complicated to implement, as the password must not be known by the server. Flock uses a similar approach, but a significant advantage of the onepw protocol is that it also supports password reset and recovery.

Weave storage API/data v1.5/v5 was chosen as the default key exchange method for Cucumber Sync for the same reasons as Mozilla, namely that at present it provides a good, and possibly the best, compromise between security and usability. For a higher level of security, eXfio Peer v2, a custom P2P key exchange, meets the secrecy requirements of the activist use case by leveraging existing authorised clients as one-time password tokens when registering a new device.

A bonus of eXfio Peer v2 is that it naturally abstracts into a generalised inter-device messaging protocol that can be used to send arbitrary messages. In addition, perfect forward secrecy can be achieved using Axolotl key ratcheting, as used in the TextSecure application, although this is not implemented in Cucumber Sync at this stage.

Data Synchronisation

Weave Sync Storage API

As discussed, the Weave Sync storage API provides a simple, REST-based protocol to manage an arbitrary set of objects grouped into collections. Importantly, the API is also content neutral. It exposes an id, modified timestamp, sortindex and ttl (time to live) value for each object; however, the payload is treated as a text blob, providing the greatest flexibility in how content is used.

Weave Basic Object

{
  "id": "3DFC9ED6-F679-405E-A7EC-149E199E7F0F", //unique within collection
  "modified": 1388635807.41,                    //modified timestamp
  "sortindex": 140,                             //sortindex - sort order within collection
  "payload": "{ \"this is\": \"an example\" }"  //content neutral payload - typically JSON encoded
}
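As a rough sketch of how a client might build and read such an object, the helpers below wrap structured content in the envelope shown above. The function names are hypothetical, and `modified` is left out on the assumption that the server assigns it when the object is stored.

```python
import json
import uuid
from typing import Any, Dict, Optional


def make_wbo(content: Any, sortindex: int = 0, obj_id: Optional[str] = None) -> Dict[str, Any]:
    """Wrap arbitrary content in a Weave Basic Object style envelope."""
    return {
        "id": obj_id or str(uuid.uuid4()).upper(),
        "sortindex": sortindex,
        # The payload is a string, so structured content is JSON encoded twice:
        # once here, and again when the whole object is serialised for transport.
        "payload": json.dumps(content),
    }


def read_payload(wbo: Dict[str, Any]) -> Any:
    """Decode the payload string back into structured content."""
    return json.loads(wbo["payload"])
```

The double encoding is what produces the escaped quotes in the `payload` value of the example object.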

As Weave Sync uses JSON for data exchange, for simplicity Cucumber Sync also uses JSON as the format for contacts and calendar data, specifically jCard (RFC 7095) and jCal (RFC 7265), the JSON representations of vCard 4.0 and iCalendar respectively.

Contacts

Management and synchronisation of a contact list is relatively straightforward. Weave Sync supports the use of a UUID as a record identifier, which, combined with modified timestamps, can be used to determine what needs to be synced. There is also a concept similar to a CardDAV ctag, which can be used to determine whether the contact list has changed since the last sync.
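One way to turn record identifiers and modified timestamps into a sync decision is sketched below. It is a simplified illustration rather than Cucumber Sync's actual algorithm: it assumes each side can produce a map of id to modified timestamp, treats the newer timestamp as the winner, and ignores deletions and conflict resolution. The names `SyncPlan` and `plan_sync` are hypothetical.

```python
from typing import Dict, List, NamedTuple


class SyncPlan(NamedTuple):
    download: List[str]
    upload: List[str]


def plan_sync(local: Dict[str, float], remote: Dict[str, float]) -> SyncPlan:
    """Compare id -> modified maps and decide which records move in which direction."""
    download: List[str] = []
    upload: List[str] = []
    for record_id in sorted(set(local) | set(remote)):
        local_ts = local.get(record_id)
        remote_ts = remote.get(record_id)
        if local_ts is None or (remote_ts is not None and remote_ts > local_ts):
            # Missing locally, or the remote copy is newer.
            download.append(record_id)
        elif remote_ts is None or local_ts > remote_ts:
            # Missing remotely, or the local copy is newer.
            upload.append(record_id)
        # Equal timestamps: the record is already in sync.
    return SyncPlan(download, upload)
```

For example, `plan_sync({"a": 10.0, "c": 5.0}, {"a": 12.0, "d": 1.0})` schedules `a` and `d` for download and `c` for upload.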

Cucumber Sync stores contacts in the exfiocontact collection.

exfiocontact Payload

["vcard",
  [
    ["version", {}, "text", "4.0"],
    ["n", {}, "text", ["Gump", "Forrest", "", "", ""]],
    ["fn", {}, "text", "Forrest Gump"],
    ["org", {}, "text", "Bubba Gump Shrimp Co"],
    ["title", {} ,"text", "Shrimp Man"],
    ["photo", {"mediatype":"image/gif"}, "uri", "http://www.example.com/dir_photos/my_photo.gif"],
    ["tel", {"type":["work", "voice"]}, "uri", "tel:+1-111-555-1212"],
    ["tel", {"type":["home", "voice"]}, "uri", "tel:+1-404-555-1212"],
    ["adr",
      {"label":"100 Waters Edge\nBaytown, LA 30314\nUnited States of America", "type":"work"},
      "text",
      ["", "", "100 Waters Edge", "Baytown", "LA", "30314", "United States of America"]
    ],
    ["adr",
      {"label":"42 Plantation St.\nBaytown, LA 30314\nUnited States of America", "type":"home"},
      "text",
      ["", "", "42 Plantation St.", "Baytown", "LA", "30314", "United States of America"]
    ],
    ["email", {}, "text", "forrestgump@example.com"],
    ["rev", {}, "timestamp", "2008-04-24T19:52:43Z"]
  ]
]

Calendar

Management and synchronisation of a calendar is a little more complicated. Although the same technique used for contacts would work for a full sync, a calendar is constantly growing, so the number of records being synchronised would quickly become unwieldy. To manage this, typically only events within a limited window of time are synchronised, e.g. one month prior to and three months following the current date. However, to maintain confidentiality event dates are not exposed to the server, hence another solution is required.

A possible solution would be to maintain a separate hash table of the calendar events, in which the record identifier is a hash key derived from the date on which an event takes place. The UUIDs of all events that take place on a given date can then be determined by looking up that date's entry in the hash table. Further, synchronisation can be limited to a window of time by looking up the entries for all dates within the window.

Data Model

 --------------                   --------------                   --------------
|              |                 |              |                 |              |
|  datehash    |                 | datehash_cal |                 |     cal      |
|              |                /|              |\                |              |
| date         |-|---------------| datehash     |-|-------------|-| calitemid    |
| datehash     |                \| calitemid    |/                |              |
|              |                 |              |                 |              |
|              |                 |              |                 |              |
 --------------                   --------------                   -------------- 

There may be security implications to hashing a value within such a limited information space, i.e. dates; however, this can be mitigated by using a separate key and rotating it frequently.
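The date hash lookup can be sketched as follows. HMAC-SHA256 over the ISO date string is an assumption made for illustration; the paper only specifies that the hash is keyed with a separate, frequently rotated key. The function names and the in-memory `index` mapping are hypothetical.

```python
import hashlib
import hmac
from datetime import date, timedelta
from typing import Dict, List


def date_hash(key: bytes, day: date) -> str:
    """Keyed hash of a calendar date; the server sees only this opaque value."""
    return hmac.new(key, day.isoformat().encode("ascii"), hashlib.sha256).hexdigest()


def item_ids_in_window(
    key: bytes, index: Dict[str, List[str]], start: date, end: date
) -> List[str]:
    """Collect item ids for every date from start to end inclusive."""
    found: List[str] = []
    day = start
    while day <= end:
        for item_id in index.get(date_hash(key, day), []):
            if item_id not in found:  # an event may be listed under several dates
                found.append(item_id)
        day += timedelta(days=1)
    return found
```

Using a keyed hash rather than a plain one matters here: with only a few thousand plausible dates, an unkeyed hash could be reversed by simply hashing every date and comparing.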

Cucumber Sync stores calendar items in the exfiocalitem collection and maps these using a date hash table stored in the exfiocaldate collection.

exfiocalitem Payload

["vcalendar",
  [
    ["calscale", {}, "text", "GREGORIAN"],
    ["prodid", {}, "text", "-//Example Inc.//Example Calendar//EN"],
    ["version", {}, "text", "2.0"]
  ],
  [
    ["vevent",
      [
        ["dtstamp", {}, "date-time", "2008-02-05T19:12:24Z"],
        ["dtstart", {}, "date", "2008-10-06"],
        ["summary", {}, "text", "Planning meeting"],
        ["uid", {}, "text", "4088E990AD89CB3DBB484909"]
      ],
      []
    ]
  ]
]

exfiocaldate Payload

{
  "datehash": "06990AD89C88EB3DBB9A6B",
  "itemids": [
    "4088E990AD89CB3DBB4849095",
    "9CB3DBB40248490978E990ADF",
    ...
    "2E990ADF4849CB3DBB4090978"
  ]
}

Conclusion

Cucumber Sync provides a working example of a personal information management and syncing solution, supporting multiple devices, while meeting the security needs of the general public and an activist.

To achieve this, Cucumber Sync restricts access to authorised parties using a combination of authentication-based access control, end-to-end encryption and peer-to-peer key exchange. In doing so, Cucumber Sync is able to maintain the confidentiality, and arguably the secrecy, of the information against targeted attacks by third parties and, most importantly, against a hostile systems administrator of the cloud-based storage.