
Open Data 2.0 (Lightning talk): Privacy preserving tech: the tools for safe open data use #18

jsheunis opened this issue May 1, 2020 · 2 comments

jsheunis commented May 1, 2020

Privacy preserving tech: the tools for safe open data use

By Emma Bluemke, University of Oxford

  • Theme: Open Data 2.0
  • Format: Lightning talk

Abstract

In medical imaging, necessary privacy concerns prevent us from fully realizing the benefits of AI in our research. Fortunately, because other industries are also constrained by regulations on private data, three cutting-edge techniques have been developed that have huge potential for the future of machine learning in healthcare: federated learning, differential privacy, and encrypted computation. These modern privacy techniques would allow us to train our models on encrypted data from multiple institutions, hospitals, and clinics without sharing the patient data.
Recently, these techniques have become much easier for researchers to implement, thanks to the efforts of scientists at Google, DeepMind, Apple, OpenAI, and many others.
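As a rough illustration of the first technique, here is a minimal sketch of federated averaging (FedAvg) in plain Python: each site runs gradient descent on its own data, and only the resulting model weight and sample count are shared and averaged. The one-parameter linear model, the site data, and the function names are assumptions made for illustration; this is not PySyft's API, and plain FedAvg on its own adds neither differential privacy nor encryption.

```python
# Minimal federated averaging (FedAvg) sketch in plain Python.
# Assumption: each site fits a one-parameter model y ~ w * x; only (w, n) leaves the site.

def local_update(w, data, lr=0.01, epochs=50):
    """Gradient descent on mean squared error, using only this site's data."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_average(updates):
    """Average site weights, weighting each by its number of samples."""
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total


# Hypothetical private datasets at three sites (all follow y = 3x).
sites = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0)],
    [(0.5, 1.5), (1.5, 4.5), (2.5, 7.5)],
]

w_global = 0.0
for _ in range(20):  # communication rounds
    # Each site starts from the current global model and trains locally.
    updates = [(local_update(w_global, data), len(data)) for data in sites]
    w_global = federated_average(updates)

print(f"global weight after 20 rounds: {w_global:.3f}")
```

Because every site's data follows y = 3x, the global weight converges toward 3 even though no raw (x, y) pair ever leaves its site.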

It's also becoming increasingly important to maintain this data privacy, because true anonymization is difficult to achieve: it's unclear what kind of information machine learning can extract from seemingly innocuous data. For example, it's possible to predict a patient's age and sex from some medical images, and we've seen that in some cases multiple anonymized datasets can be combined to deanonymize them.
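To make the linkage risk concrete, here is a small, entirely hypothetical sketch: two datasets that each look anonymous are joined on shared quasi-identifiers (age, sex, postcode district). Any record that matches exactly one row in the other dataset is effectively re-identified. All records, names, and field names are invented for illustration.

```python
# Hypothetical "anonymized" imaging metadata: names removed, quasi-identifiers kept.
imaging = [
    {"scan_id": "S1", "age": 34, "sex": "F", "zip": "OX1", "finding": "lesion"},
    {"scan_id": "S2", "age": 61, "sex": "M", "zip": "OX4", "finding": "normal"},
]

# Hypothetical second dataset (e.g. a volunteer register) that includes names.
register = [
    {"name": "Alice", "age": 34, "sex": "F", "zip": "OX1"},
    {"name": "Bob", "age": 45, "sex": "M", "zip": "OX2"},
]


def link(records_a, records_b, keys=("age", "sex", "zip")):
    """Join two datasets on shared quasi-identifiers; a unique match re-identifies a row."""
    matches = {}
    for a in records_a:
        candidates = [b for b in records_b if all(a[k] == b[k] for k in keys)]
        if len(candidates) == 1:
            matches[a["scan_id"]] = candidates[0]["name"]
    return matches


print(link(imaging, register))  # {'S1': 'Alice'}
```

Neither table contains both a name and a finding, yet the join ties scan S1 (and its "lesion" finding) to a named person.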

These tools will make it easy for us (imaging scientists) to securely train our models while preserving patient privacy, without being privacy experts ourselves.

It's important that our medical imaging community is aware of these new possibilities. These developments could inspire new collaborations between institutions, enable meta-analyses that were previously considered impossible, and allow us to make rapid improvements in our current AI models as we're able to train them on more data.

Not only will this allow us to have more training data, it will allow us to have more representative training data: if we can train on data from other institutions worldwide, we can properly diversify our datasets to ensure our research better serves our world's population. For example, current volunteer-based datasets often feature a disproportionate number of young university students, which results in training data that is not representative of our patient populations.

https://blog.openmined.org/federated-learning-differential-privacy-and-encrypted-computation-for-medical-imaging/

And I'd like to mention that free, open-source tools like PySyft and PyGrid are, or will soon be, available for this purpose:

https://blog.openmined.org/what-is-pygrid-demo/

PyGrid is a peer-to-peer platform for private data science and federated learning. With PyGrid, data owners can provide, monitor, and manage access to their own private data clusters. The data does not leave the data owner's server. Data scientists can then use PyGrid to perform private statistical analysis on the private dataset, or even perform federated learning across multiple institutions' datasets.
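The abstract does not describe PyGrid's own API, so the sketch below does not use it. As a hedged illustration of what "private statistical analysis" can mean, here is the Laplace mechanism from differential privacy applied to a counting query that a data owner could answer without revealing exact values. The function names and the example ages are hypothetical, and Python's `random` module is used only for readability; a real deployment would need a secure randomness source and care with floating-point effects.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two Exponential(1/scale) draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Release a count with epsilon-differential privacy via the Laplace mechanism.

    A counting query changes by at most 1 when one record is added or removed
    (sensitivity 1), so the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical patient ages held on a data owner's server.
ages = [34, 61, 45, 29, 72, 58]

rng = random.Random(0)  # seeded only so the example is reproducible
noisy = private_count(ages, lambda a: a >= 50, epsilon=1.0, rng=rng)
print(f"noisy count of patients aged 50+: {noisy:.2f}")
```

Smaller values of `epsilon` add more noise and give stronger privacy; the analyst only ever sees the noisy answer, never the underlying records.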

Just to be clear - this has nothing to do with blockchain.

Useful Links

https://www.openmined.org/

Tagging @em-blue

em-blue commented Jun 11, 2020

Since OHBM is international, I'd like to also mention that PySyft tutorials have been translated into 15 languages so far:
https://github.com/OpenMined/PySyft/tree/master/examples/tutorials/translations

em-blue commented Jun 22, 2020

Looks good to me!
