
[Researchers] Beiwe Data Privacy and Security

Eli Jones edited this page Sep 27, 2021 · 1 revision

Key security aspects of the Beiwe Research Platform

  • Participant names are coded with a unique 8-character Beiwe Participant ID.
  • Participants will login to the Beiwe smartphone application with their unique ID and password.
  • All data collection is tied to the 8-character Beiwe Participant ID (no identifiers like participant name or contact information), and only clinical research collaborators will have access to the master key, which will be stored securely.
  • All data is encrypted in transit and at rest. The application will not store data on the participants' mobile device in an unencrypted form.
  • Audio recordings (voice surveys) will be encrypted once recording is complete.
  • Indirect identifiers (telephone numbers and MAC addresses) are permanently anonymized using industry-recognized encryption techniques, which render those identifiers unidentifiable.
  • No identifiable data will be stored on the mobile device. All identifiers are rendered innocuous using an encryption scheme in which every phone generates its own unique cryptographic code during the Beiwe registration process, and then uses that code to permanently encrypt phone numbers and MAC addresses, if the study is configured to collect them.

Types of Data Collected

Data that is generated without any direct involvement from the subject, such as GPS data and accelerometer data.

Data that requires active participation from the subject for its generation, such as surveys and audio samples.

Data Anonymity

Participant Anonymity

Every participant is assigned a randomly generated 8-character participant ID (for example, "d4w192bg"), and all participant data are connected only to that ID.
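As a rough illustration of what "randomly generated 8-character participant ID" could look like in practice, here is a minimal Python sketch. The alphabet (lowercase letters and digits, chosen to resemble "d4w192bg") and the function name `generate_participant_id` are assumptions made for this example, not a description of Beiwe's actual implementation.

```python
import secrets
import string

# Assumed alphabet: lowercase letters and digits, matching the look of "d4w192bg".
ALPHABET = string.ascii_lowercase + string.digits
ID_LENGTH = 8


def generate_participant_id(existing_ids):
    """Return a random 8-character ID that is not already in existing_ids."""
    while True:
        # secrets.choice draws from a cryptographically strong random source.
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(ID_LENGTH))
        if candidate not in existing_ids:
            return candidate
```

Because the ID is drawn at random rather than derived from a name or contact detail, it carries no information about the participant.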

Other Data Anonymity

If collected, the following four types of data are permanently anonymized by Beiwe, which uses the industry-standard PBKDF2 protocol with the SHA-256 algorithm to replace them with surrogate keys:

  • Phone numbers of incoming and outgoing phone calls
  • Phone numbers of incoming and outgoing text messages
  • MAC addresses of nearby Wi-Fi routers
  • MAC addresses of nearby Bluetooth devices

Encrypting these data means that each phone number and MAC address is turned into an 88-character string of seemingly random numbers and letters, but on a given phone a particular phone number is always transformed into the same string.

Imagine that participant D4MAAW called the phone number 617-123-4567 once on Monday, and then received two calls from that same phone number on Tuesday. A researcher analyzing the Beiwe data could see when those three calls happened and could tell that they all involve the same phone number, but couldn't tell what that phone number was.

Encrypting the MAC addresses of Wi-Fi routers has the same effect. A researcher analyzing Beiwe data could tell that a participant was near a certain Wi-Fi router at 10am every morning, Monday through Friday, and could thereby surmise that the participant was probably in the same room at that time each weekday, but the data would not reveal the actual MAC address of the router.

During installation, the Beiwe app generates its own encryption key, which is never uploaded to the server, making it theoretically impossible to reveal the original phone numbers or MAC addresses from the collected data. If you had access to someone's phone, you could steal that person's key and potentially reveal data for them alone, but if you stole someone's phone, you would be able to read their call logs directly from the phone without Beiwe anyway. The surrogate key mapping is not only different on every phone; it is also different for the same participant if that participant uninstalls and then reinstalls the app, because the phone loses the encryption key when the app is uninstalled. Importantly, the encryption cannot be undone without knowing the generated encryption key, which ensures that identifying information remains secure. As such, this data is not directly identifiable.
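The behavior described above (same input gives the same surrogate on one phone, a different surrogate on another phone or after a reinstall) can be sketched in Python with the standard library's PBKDF2 implementation. This is only an illustration under stated assumptions: the iteration count, the 16-byte key size, the use of the per-device key as the PBKDF2 salt, and the base64 encoding of a 64-byte output (which happens to yield 88 characters) are choices made for this example, not confirmed details of the Beiwe app.

```python
import base64
import hashlib
import secrets

ITERATIONS = 1000  # hypothetical iteration count, chosen only for the example
OUTPUT_BYTES = 64  # 64 bytes encode to 88 base64 characters


def generate_device_key():
    """Create a random per-installation key; in this sketch it never leaves the phone."""
    return secrets.token_bytes(16)


def surrogate(identifier, device_key):
    """Map a phone number or MAC address to a fixed-length surrogate string."""
    digest = hashlib.pbkdf2_hmac(
        "sha256",                    # underlying hash algorithm
        identifier.encode("utf-8"),  # the value being anonymized
        device_key,                  # per-device secret, used here as the salt
        ITERATIONS,
        dklen=OUTPUT_BYTES,
    )
    return base64.urlsafe_b64encode(digest).decode("ascii")


phone_key = generate_device_key()
monday_call = surrogate("617-123-4567", phone_key)
tuesday_call = surrogate("617-123-4567", phone_key)
after_reinstall = surrogate("617-123-4567", generate_device_key())
```

Here `monday_call` and `tuesday_call` are identical, so a researcher can link the calls, while `after_reinstall` differs because a new key was generated.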

Potential Gaps in Data Anonymity

Two types of data can potentially contain personally identifiable information: the GPS data (which record location) and the voice recording data (in which a participant could mention personally identifiable information).

Beiwe's GPS data provide enough detail to identify individual buildings or street addresses within some degree of confidence, although a fair amount of analysis would be required to transform a series of GPS coordinates into a home address for a participant. Because data collection in the Beiwe smartphone application is completely customizable, a study can disable GPS data collection if it is not part of the research question.

Two new features were added to Beiwe to enable use of GPS data without viewing the actual Lat/Long coordinates:

  1. When setting up a study, configure it as a "production study". Researchers will then only be able to download processed data output from the Data Analysis Pipeline. This setting provides higher security.
  2. The Fuzz GPS feature is a toggle in a Beiwe study's App Settings. If Fuzz GPS is on for a study, random latitude and longitude offsets are added to the GPS coordinates. This means that for any study participant, you still have their complete movement trace, but you don't know their actual starting point or true coordinates at any time.
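To make the Fuzz GPS idea concrete, the following Python sketch applies one fixed random latitude offset and one fixed random longitude offset to every point in a participant's trace. The offset range, the function names, and the decision to draw the offsets once per participant are assumptions for illustration; the sketch also does not wrap values that leave the valid latitude or longitude range.

```python
import random


def draw_offsets(rng):
    """Draw one (latitude, longitude) offset pair; the range is hypothetical."""
    return rng.uniform(-10.0, 10.0), rng.uniform(-10.0, 10.0)


def fuzz_trace(trace, lat_offset, lon_offset):
    """Shift every (lat, lon) point by the same offsets."""
    return [(lat + lat_offset, lon + lon_offset) for lat, lon in trace]


# Example trace of three made-up points.
trace = [(42.3370, -71.1020), (42.3380, -71.1005), (42.3400, -71.0990)]
lat_off, lon_off = draw_offsets(random.Random(0))
fuzzed = fuzz_trace(trace, lat_off, lon_off)
```

Because every point is shifted by the same amount, the coordinate differences between consecutive points are unchanged, which is why the shape of the movement trace survives while the true starting point does not.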

The app's voice recording feature does not ask for any identifiable information, but it is conceivable that, in the course of describing their day, a participant could speak their own name or reveal other identifying details. To reduce this risk, researchers can add text to the application's voice recording screen asking participants NOT to mention their name, the names of any other people, or any specific locations. An example of user interface text that can be shown on the Voice Recording screen is as follows: "Please describe how you've felt over the last 24 hours in relation to events that have occurred as well as to upcoming events that are on your mind. It is okay to describe situations and people abstractly ('friend', 'restaurant') but avoid specific names. When you are ready, press 'Record' and speak for no more than 4 minutes. Press 'Stop' when you are finished, 'Play' to listen to the recording, and 'Done' to submit the recording."

Participant Authentication

Login Protection

Participants must log in with a password of at least 6 characters to use the app in any capacity. All functions of the app, including filling out surveys and making voice recordings, are protected behind a login wall. The only parts of the app that are not login-protected are the "Call My Clinician" button and the notification reminders that say either "please take a survey" or "please make a voice recording." The app automatically logs out after a configurable number of minutes of inactivity.

If a participant forgets their password, they can have it reset by tapping the "Call the Research Assistant" button in the Forgot Password section of the application. The application will inform the participant that they should not reveal their name to the Research Assistant when requesting a password reset. The Research Assistant will give the participant a temporary password over the phone, and the participant is immediately required to choose a new, permanent password in the application. The server does not store the participant's plaintext password, only the participant's hashed password.
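Storing only a hashed password, together with the 6-character minimum, could be sketched as follows in Python. The source does not say which hash function or parameters the Beiwe server uses, so the salted PBKDF2-HMAC-SHA256 scheme, the iteration count, and the function names here are assumptions for illustration only.

```python
import hashlib
import hmac
import secrets

MIN_PASSWORD_LENGTH = 6  # minimum length stated in this document
ITERATIONS = 100_000     # hypothetical work factor


def hash_password(password):
    """Return (salt, digest); only these values would be stored, never the password."""
    if len(password) < MIN_PASSWORD_LENGTH:
        raise ValueError("password must be at least 6 characters")
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, stored_digest):
    """Re-hash the attempt with the stored salt and compare in constant time."""
    attempt = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(attempt, stored_digest)


salt, stored = hash_password("correct horse")
```

With this arrangement a password reset replaces the stored salt and digest; nobody, including server administrators, can read the old password back.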

When a participant calls the Research Assistant to request a password reset, the caller's identity is verified when the participant reads their Beiwe Patient ID number to the clinical research assistant. The number programmed into the participant's version of the Beiwe app will be the phone number for the clinical collaborator's research assistant (the collaborator's staff, not the Onnela Lab staff). The participant's Beiwe Patient ID number is listed in the Forgot Password section of the application, so it is readily available to the participant when calling the clinical research assistant, as is a prompt for the participant not to reveal their name or any identifying information. Using only the Beiwe Patient ID number to verify the identity of the caller is more secure than alternative methods (e.g., name or email address).

Signup Safeguards

In order for a participant to use the Beiwe app, a study administrator must create a participant ID and temporary password for that participant. Anyone can install the Beiwe app by downloading it from http://studies.beiwe.org/download for Android devices or from the Apple app store for iOS devices, but the app does nothing until the user registers it in a particular study with a valid participant ID and password provided by the study coordinators.

A participant ID can only be connected to one phone at a time; if someone tries to register a second phone with a participant ID that is already registered to an existing phone, the second phone will not be able to register.

In order to upload data, a phone must have a valid username, password, and phone ID number. This is to prevent unauthorized phones from spoofing data.
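The signup safeguards above amount to three checks: the participant must exist, only one phone may be registered per participant ID, and uploads must carry a matching ID, password, and phone ID. A minimal Python sketch of that logic follows; the class and method names are hypothetical, and the plain string comparison of passwords is a simplification (a real server would compare hashes, as described earlier).

```python
class ParticipantRegistry:
    """Toy in-memory model of registration and upload checks."""

    def __init__(self):
        self._passwords = {}  # participant_id -> password (simplified; not hashed here)
        self._phones = {}     # participant_id -> registered phone ID

    def create_participant(self, participant_id, temp_password):
        """Study administrator creates the participant before any phone can register."""
        self._passwords[participant_id] = temp_password

    def register_phone(self, participant_id, password, phone_id):
        """Register a phone; refuse unknown credentials or an already-registered ID."""
        if self._passwords.get(participant_id) != password:
            return False
        if participant_id in self._phones:
            return False  # a second phone cannot register with the same ID
        self._phones[participant_id] = phone_id
        return True

    def may_upload(self, participant_id, password, phone_id):
        """Accept uploads only when ID, password, and phone ID all match."""
        return (
            self._passwords.get(participant_id) == password
            and self._phones.get(participant_id) == phone_id
        )


registry = ParticipantRegistry()
registry.create_participant("d4w192bg", "temp-pass")
first = registry.register_phone("d4w192bg", "temp-pass", "phone-A")
second = registry.register_phone("d4w192bg", "temp-pass", "phone-B")
```

In this run `first` is `True` and `second` is `False`, and an upload claiming to come from "phone-B" would be rejected by `may_upload`.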

Data Encryption

All data on phones, on the server, and in transit are protected using industry-standard encryption techniques. The phone also uses asymmetric encryption, meaning that even the phone cannot read its own data; data recorded on the phone can only be read on the server.

During registration, the device is provided with the public half of a 2048-bit RSA encryption key. With this key the device can encrypt data, but only the server, which has the private key, can decrypt it. The RSA key is then used to encrypt a symmetric AES key for bulk encryption. These AES keys are generated as needed by the app, are not stored, and must be decrypted by the server before any data can be recovered. Data received by the server are then re-encrypted with a master key provided for that study, and then stored on Amazon S3, an industry-standard secure storage platform housed in data centers that are protected by armed guards.

Amazon Web Services has released a whitepaper describing how EC2 and S3, the two Amazon services Beiwe uses, meet HIPAA compliance standards. Encrypted Beiwe data is stored on the Onnela Lab AWS account, which only Jukka-Pekka Onnela and authorized Onnela Lab staff have login credentials to access. All data connections to the web service hosting the study are negotiated over industry-standard SSL/TLS connections, which protects against man-in-the-middle attacks and packet-sniffing data leaks.

Below is a visual of the data encryption system including the phones, Amazon servers, and the separation of participant information behind a collaborator's Firewall (if collaborators choose to store patient information electronically).
