Update docs and labels for CXR-3 release
haydengunraj committed Jun 2, 2022
1 parent 6599980 commit 5d4f01c
Showing 6 changed files with 60,451 additions and 28 deletions.
15 changes: 8 additions & 7 deletions README.md
@@ -4,6 +4,7 @@

**Recording of the webinar on [How we built COVID-Net in 7 days with Gensynth](https://darwinai.news/fny)**

**Update 06/02/2022:** We released [COVIDx CXR-3](https://www.kaggle.com/datasets/andyczhao/covidx-cxr2/versions/7), a cleaned version of the dataset from which several hundred training images that were not representative of typical CXRs have been removed. The new dataset contains 29,986 images from 16,648 patients.\
**Update 11/28/2021:** We released a new training dataset with over 30,000 CXR images from a multinational cohort of over 16,400 patients. The dataset contains 16,490 positive COVID-19 images from over 2,800 patients. The COVIDx V9A dataset is for detection of no pneumonia/non-COVID-19 pneumonia/COVID-19 pneumonia, and COVIDx V9B dataset is for COVID-19 positive/negative detection.\
**Update 10/19/2021:** We released a new COVID-Net CXR-3 [model](docs/models.md) for COVID-19 positive/negative detection which was trained and tested on the COVIDx8B dataset leveraging the new MEDUSA (Multi-scale Encoder-Decoder Self-Attention) architecture.\
**Update 04/21/2021:** We released a new COVIDNet CXR-S [model](docs/models.md) and [COVIDxSev](create_COVIDxSev.ipynb) dataset for airspace severity grading in COVID-19 positive patient CXR images. For more information on training, testing and inference please refer to severity [docs](docs/covidnet_severity.md).\
@@ -71,14 +72,14 @@ If you find our work useful, can cite our paper using:
}
```
## Quick Links
1. COVIDNet-CXR models (COVID-19 detection using chest x-rays): https://github.com/lindawangg/COVID-Net/blob/master/docs/models.md
2. COVIDNet-CT models (COVID-19 detection using chest CT scans): https://github.com/haydengunraj/COVIDNet-CT/blob/master/docs/models.md
3. COVIDNet-CXR-S models (COVID-19 airspace severity grading using chest x-rays): https://github.com/lindawangg/COVID-Net/blob/master/docs/models.md
4. COVIDNet-S models (COVID-19 lung severity assessment using chest x-rays): https://github.com/lindawangg/COVID-Net/blob/master/docs/models.md
5. COVIDx-CXR dataset: https://github.com/lindawangg/COVID-Net/blob/master/docs/COVIDx.md
6. COVIDx-CT dataset: https://github.com/haydengunraj/COVIDNet-CT/blob/master/docs/dataset.md
1. COVID-Net CXR models (COVID-19 detection using chest x-rays): https://github.com/lindawangg/COVID-Net/blob/master/docs/models.md
2. COVID-Net CT models (COVID-19 detection using chest CT scans): https://github.com/haydengunraj/COVIDNet-CT/blob/master/docs/models.md
3. COVID-Net CXR-S models (COVID-19 airspace severity grading using chest x-rays): https://github.com/lindawangg/COVID-Net/blob/master/docs/models.md
4. COVID-Net S models (COVID-19 lung severity assessment using chest x-rays): https://github.com/lindawangg/COVID-Net/blob/master/docs/models.md
5. COVIDx CXR dataset: https://github.com/lindawangg/COVID-Net/blob/master/docs/COVIDx.md
6. COVIDx CT dataset: https://github.com/haydengunraj/COVIDNet-CT/blob/master/docs/dataset.md
7. COVIDx-S dataset: https://github.com/lindawangg/COVID-Net/tree/master/annotations
8. COVIDNet-P inference for pneumonia: https://github.com/lindawangg/COVID-Net/blob/master/docs/covidnet_pneumonia.md
8. COVID-Net P inference for pneumonia: https://github.com/lindawangg/COVID-Net/blob/master/docs/covidnet_pneumonia.md
9. CancerNet-SCa models for skin cancer detection: https://github.com/jamesrenhoulee/CancerNet-SCa/blob/main/docs/models.md

Training, inference, and evaluation scripts for COVIDNet-CXR, COVIDNet-CT, COVIDNet-S, and CancerNet-SCa models are available in the respective repos.
44 changes: 23 additions & 21 deletions docs/COVIDx.md
@@ -1,15 +1,16 @@
# COVIDx Dataset
**Update 11/26/2021:Released a new training dataset with over 30,000 CXR images from a multinational cohort of over 16,400 patients. The dataset contains 16,490 positive COVID-19 images from over 2,800 patients. The COVIDx V9A dataset is for detection of no pneumonia/non-COVID-19 pneumonia/COVID-19 pneumonia, and COVIDx V9B dataset is for COVID-19 positive/negative detection.**\
**Update 04/21/2021:Released COVIDxSev, a new airspace severity grading dataset for COVID-19 positive patients for COVIDNet CXR-S model.**\
**Update 03/19/2021:Released new datasets with both over 16,000 CXR images from a multinational cohort of over 15,100 patients from at least 51 countries. The dataset contains over 2,300 positive COVID-19 images from over 1,500 patients. The COVIDx V8A dataset is for detection of no pneumonia/non-COVID-19 pneumonia/COVID-19 pneumonia, and COVIDx V8B dataset is for COVID-19 positive/negative detection.**\
**Update 01/28/2021:Released new datasets with over 15600 CXR images and over 1700 positive COVID-19 images. The COVIDx V7A dataset is for detection of no pneumonia/non-COVID-19 pneumonia/COVID-19 pneumonia, and COVIDx V7B dataset is for COVID-19 positive/negative detection.**\
# COVIDx CXR Dataset
**Update 06/02/2022: Released a cleaned training dataset (COVIDx CXR-3) from which several hundred training images that were not representative of typical CXRs have been removed. The [new dataset](link tbd) contains 29,986 images from 16,648 patients.**\
**Update 11/26/2021: Released a new training dataset with over 30,000 CXR images from a multinational cohort of over 16,400 patients. The dataset contains 16,490 positive COVID-19 images from over 2,800 patients. The COVIDx V9A dataset is for detection of no pneumonia/non-COVID-19 pneumonia/COVID-19 pneumonia, and COVIDx V9B dataset is for COVID-19 positive/negative detection.**\
**Update 04/21/2021: Released COVIDxSev, a new airspace severity grading dataset of COVID-19 positive patients for the COVIDNet CXR-S model.**\
**Update 03/19/2021: Released new datasets with over 16,000 CXR images from a multinational cohort of over 15,100 patients from at least 51 countries. The dataset contains over 2,300 positive COVID-19 images from over 1,500 patients. The COVIDx V8A dataset is for detection of no pneumonia/non-COVID-19 pneumonia/COVID-19 pneumonia, and the COVIDx V8B dataset is for COVID-19 positive/negative detection.**\
**Update 01/28/2021: Released new datasets with over 15,600 CXR images and over 1,700 positive COVID-19 images. The COVIDx V7A dataset is for detection of no pneumonia/non-COVID-19 pneumonia/COVID-19 pneumonia, and the COVIDx V7B dataset is for COVID-19 positive/negative detection.**\
**Update 01/05/2021: Released new dataset for binary classification (COVID-19 positive or COVID-19 negative). Train dataset contains 517 positive and 13794 negative samples. Test dataset contains 100 positive and 100 negative samples.**\
**Update 10/30/2020: Released new dataset containing 517 COVID-19 train samples. Test dataset remains the same for consistency.**\
**Update 06/26/2020: Released new dataset with over 14,000 CXR images containing 473 COVID-19 train samples. Test dataset remains the same for consistency.**\
**Update 05/13/2020: Released new dataset with 258 COVID-19 train and 100 COVID-19 test samples. Because new X-ray images are constantly being added to covid-chestxray-dataset, Figure1, Actualmed, and the COVID-19 Radiography Database, we included train_COVIDx3.txt and test_COVIDx3.txt, which list the X-ray images we used for training and testing the CovidNet-CXR3 models.**

The current COVIDx dataset can be downloaded from the following open source site:
* https://www.kaggle.com/andyczhao/covidx-cxr2?select=competition_test
The current COVIDx CXR dataset can be downloaded from the following open source site:
* https://www.kaggle.com/andyczhao/covidx-cxr2

Alternatively, it can be constructed manually with our dataset scripts from the following open source chest radiography datasets:
* https://github.com/ieee8023/covid-chestxray-dataset
@@ -24,15 +25,13 @@ Or can be manually constructed through our dataset scripts using the following o
<!--We especially thank the Radiological Society of North America, National Institutes of Health, Figure1, Actualmed, M.E.H. Chowdhury et al., Dr. Joseph Paul Cohen and the team at MILA involved in the COVID-19 image data collection project for making data available to the global community.-->

## Steps to download the dataset directly
The latest COVIDx9 training and testing dataset can be downloaded directly from Kaggle using the following steps:
1. Download the complete train and test datasets for Covidx9 from the [COVIDx CXR-2 Kaggle Dataset](https://www.kaggle.com/andyczhao/covidx-cxr2?select=competition_test)
The latest COVIDx CXR-3 training and testing dataset can be downloaded directly from Kaggle using the following steps:
1. Download the complete train and test datasets for COVIDx CXR-3 from the [COVIDx CXR Kaggle Dataset](https://www.kaggle.com/andyczhao/covidx-cxr2?select=competition_test)

The version 5 train and test text files are compatible with the latest [train\_COVIDx9B.txt](../labels/train_COVIDx9B.txt) and [test\_COVIDx9B.txt](../labels/test_COVIDx9B.txt) label files for COVID-19 positive/negative detection, and [train\_COVIDx9A.txt](../labels/train_COVIDx9A.txt) and [test\_COVIDx9A.txt](../labels/test_COVIDx9A.txt) label files for detection of no pneumonia/non-COVID-19 pneumonia/COVID-19 pneumonia.

* [train\_COVIDx9A.txt](../labels/train_COVIDx9A.txt): This file contains the training labels for detection of no pneumonia/non-COVID-19 pneumonia/COVID-19 pneumonia.
* [test\_COVIDx9A.txt](../labels/test_COVIDx9A.txt): This file contains the testing labels for detection of no pneumonia/non-COVID-19 pneumonia/COVID-19 pneumonia.
* [train\_COVIDx9B.txt](../labels/train_COVIDx9B.txt): This file contains the training labels for COVID-19 positive/negative detection.
* [test\_COVIDx9B.txt](../labels/test_COVIDx9B.txt): This file contains the testing labels for COVID-19 positive/negative detection.
* [train\_COVIDx_CXR-3A.txt](../labels/train_COVIDx_CXR-3A.txt): This file contains the training labels for detection of no pneumonia/non-COVID-19 pneumonia/COVID-19 pneumonia.
* [test\_COVIDx_CXR-3A.txt](../labels/test_COVIDx_CXR-3A.txt): This file contains the testing labels for detection of no pneumonia/non-COVID-19 pneumonia/COVID-19 pneumonia.
* [train\_COVIDx_CXR-3B.txt](../labels/train_COVIDx_CXR-3B.txt): This file contains the training labels for COVID-19 positive/negative detection.
* [test\_COVIDx_CXR-3B.txt](../labels/test_COVIDx_CXR-3B.txt): This file contains the testing labels for COVID-19 positive/negative detection.
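As a sketch of how these label files could be used to reproduce per-class counts like those in the distribution tables, the Python below tallies images and unique patients per label. It assumes each non-empty line holds four whitespace-separated fields (patient ID, filename, label, data source); that layout, the function names `parse_label_lines` and `summarize`, and the local path are illustrative assumptions, not part of the repository.

```python
from collections import Counter, defaultdict
from pathlib import Path
from typing import Dict, Iterable, Iterator, Tuple


def parse_label_lines(lines: Iterable[str]) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (patient_id, filename, label, source) from label-file lines.

    ASSUMPTION: four whitespace-separated fields per line, in that order.
    """
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 4:
            raise ValueError(f"line {line_no}: expected 4 fields, got {len(fields)}")
        patient_id, filename, label, source = fields
        yield patient_id, filename, label, source


def summarize(lines: Iterable[str]) -> Tuple[Dict[str, int], Dict[str, int], int]:
    """Return (images per label, unique patients per label, total unique patients)."""
    image_counts: Counter = Counter()
    patients_by_label = defaultdict(set)
    for patient_id, _filename, label, _source in parse_label_lines(lines):
        image_counts[label] += 1
        patients_by_label[label].add(patient_id)
    per_label = {label: len(ids) for label, ids in patients_by_label.items()}
    # Union of patient IDs across all labels.
    all_patients = set().union(*patients_by_label.values())
    return dict(image_counts), per_label, len(all_patients)


if __name__ == "__main__":
    # Hypothetical path relative to the repository root; adjust as needed.
    label_file = Path("labels/train_COVIDx_CXR-3B.txt")
    if label_file.exists():
        with label_file.open() as f:
            images, per_label, total = summarize(f)
        for label in sorted(images):
            print(f"{label}: {images[label]} images, {per_label[label]} patients")
        print(f"total unique patients: {total}")
```

The total is computed as a union of patient IDs, so it can be smaller than the sum of the per-label patient counts when a patient has images under more than one label.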

## Steps to generate the dataset
The older COVIDx8 training and testing dataset can be reconstructed using the following steps:
@@ -42,7 +41,7 @@ The older COVIDx8 training and testing dataset can be reconstructed using the fo
* `git clone https://github.com/agchung/Actualmed-COVID-chestxray-dataset.git`
* go to this [link](https://www.kaggle.com/tawsifurrahman/covid19-radiography-database/version/3) to download the COVID-19 Radiography database. Only the COVID-19 image folder and the metadata file are required. Overlaps with covid-chestxray-dataset are handled in the dataset curation scripts. **Note:** for COVIDx versions 8 & 7 please use Version 3 of the dataset, and for versions COVIDx6 and below please use Version 1.
* go to this [link](https://www.kaggle.com/c/rsna-pneumonia-detection-challenge/data) to download the RSNA pneumonia dataset
* go to this [link] (https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=70230281) to download the RICORD COVID-19 dataset, clinical data csv, and annotations
* go to this [link](https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=70230281) to download the RICORD COVID-19 dataset, clinical data csv, and annotations
2. Create a `data` directory and, within it, create `train` and `test` directories.
3. Use [create\_ricord\_dataset/create\_ricord\_dataset.ipynb](../create_ricord_dataset/create_ricord_dataset.ipynb) to pre-process the RICORD dataset before combining it with the other datasets.
3. Use [create\_COVIDx\_binary.ipynb](../create_COVIDx_binary.ipynb) to combine the three datasets to create COVIDx for binary classification. Remember to change the file paths. Use [create\_COVIDx.ipynb](../create_COVIDx.ipynb) for datasets compatible with COVIDx5 and earlier models (not binary classification).
@@ -52,26 +51,29 @@ The older COVIDx8 training and testing dataset can be reconstructed using the fo
* [train\_COVIDx8B.txt](../labels/train_COVIDx8B.txt): This file contains the samples used for training COVIDNet-CXR for COVID-19 positive/negative detection.
* [test\_COVIDx8B.txt](../labels/test_COVIDx8B.txt): This file contains the samples used for testing COVIDNet-CXR for COVID-19 positive/negative detection.
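Step 2 of the generation steps above (creating `data/train` and `data/test`) can be scripted. The minimal sketch below assumes the notebooks expect that layout relative to a chosen root directory; the helper name `make_covidx_dirs` is hypothetical and not part of the repository.

```python
from pathlib import Path


def make_covidx_dirs(root: str = ".") -> Path:
    """Hypothetical helper: create <root>/data/train and <root>/data/test.

    Returns the path of the `data` directory.
    """
    data_dir = Path(root) / "data"
    for split in ("train", "test"):
        # parents=True also creates `data/`; exist_ok=True makes reruns harmless.
        (data_dir / split).mkdir(parents=True, exist_ok=True)
    return data_dir
```

Using `exist_ok=True` means the helper can be rerun safely before re-executing the notebooks.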

## Latest COVIDx data distribution
COVIDx V9B
## Latest COVIDx CXR data distribution
COVIDx CXR-3 B\
Chest radiography images distribution

| Type | COVID-19 Negative | COVID-19 Positive | Total |
|:-----:|:-----------------:|:-----------------:|:-----:|
| train | 13992 | 16490 | 30482 |
| train | 13992 | 15994 | 29986 |
| test | 200 | 200 | 400 |

Patients distribution

| Type | COVID-19 Negative | COVID-19 Positive | Total |
|:-----:|:-----------------:|:-----------------:|:-----:|
| train | 13850 | 2808 | 16648 |
| test | 200 | 178 | 378 |


COVIDx V9A
COVIDx CXR-3 A\
Chest radiography images distribution

| Type | Normal | Pneumonia | COVID-19 | Total |
|:-----:|:------:|:---------:|:--------:|:-----:|
| train | 8085 | 5555 | 16490 | 30130 |
| train | 8085 | 5555 | 15994 | 29634 |
| test | 100 | 100 | 200 | 400 |

Patients distribution