Natural Language Processing models using private and secure data. Powered by OpenMined's tools PySyft and SyferText.

salgadev/private_nlp

private_nlp

Natural Language Processing using private and secure data. Powered by OpenMined's tools PySyft and SyferText.

Blog post

The contents of this repo were featured in the blog post "Encrypted training on medical text data using SyferText and PyTorch" on OpenMined's blog.

Disclaimer

This is a work in progress. Expect coding errors and typos.

Getting Started

Follow the instructions to install:

  • PySyft==0.2.5. Version 0.2.6 has an incompatibility issue with TensorFlow.
  • SyferText

Data

A dataset compiled for Natural Language Processing from a corpus of medical transcriptions, with custom-generated clinical stop words and vocabulary.

  • X.csv. Fully processed dataset obtained from running the Data Modelling notebook.
  • classes.txt. Text file describing the dataset's classes: Surgery, Medical Records, Internal Medicine and Other.
  • train.csv. Training data subset. Contains 90% of the processed X.csv file.
  • test.csv. Test data subset. Contains 10% of the processed X.csv file.
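
The 90/10 split described above can be sketched as follows. This is a minimal illustration, not the procedure used in the Data Modelling notebook: it assumes X.csv has a single header row, that rows are shuffled at random before splitting, and that a fixed seed is acceptable. The helper names `split_rows` and `split_csv` are hypothetical.

```python
import csv
import random


def split_rows(rows, train_fraction=0.9, seed=0):
    """Shuffle rows reproducibly and split them into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def split_csv(source, train_path, test_path, train_fraction=0.9, seed=0):
    """Split a CSV file into train and test files.

    Assumption: the first row of `source` is a header, which is copied
    into both output files.
    """
    with open(source, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)

    train, test = split_rows(rows, train_fraction, seed)

    for path, subset in ((train_path, train), (test_path, test)):
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(subset)
```

Calling `split_csv("X.csv", "train.csv", "test.csv")` would write the two subsets next to the source file. A plain random split ignores class balance; if the four classes are unevenly represented, a stratified split may be a better fit.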

Authors and acknowledgment

Notebooks

Scripts

Holds the script used to download whole datasets from a URL.

Contributing

Issues and pull requests are welcome.

License

GNU GENERAL PUBLIC LICENSE VERSION 3
