![logo](https://drive.google.com/uc?id=1VrvlBTHH4D7xsrNp74wtLBamMZygG8Sy)

## Table of Contents
1. [Introduction](#introduction)
2. [Medical Professional Presentation](#presentation)
3. [Task Overview](#task_overview)
4. [Team Overview](#team_overview)
5. [Next Tasks / Datasets (2 weeks)](#task_next)
6. [Daily calls](#task_calls)






# 1. Introduction <a id="introduction"></a>
This notebook was created through a collaborative effort of <a href="https://www.coronawhy.org">CoronaWhy.org</a>, a multi-disciplinary global volunteer initiative.

- Visit our [website](https://www.coronawhy.org) to learn more.
- Read our [story](https://medium.com/@arturkiulian/im-an-ai-researcher-and-here-s-how-i-fight-corona-1e0aa8f3e714).
- Visit our [project page](https://www.coronawhy.org/projects/pulmonary-fibrosis-model).


# 2. Medical Professional Presentation <a id="presentation"></a>


#### Agenda:
- Introduction from Sukhwinder Kaur and her current research/work
- Overview of the pulmonary fibrosis challenge and associated medical imagery examples
- Mapping out problems to specific tasks for #team-pulmonary-fibrosis-model
- Q&A from community

#### About Sukhwinder Kaur:
- Assistant Professor, Biochemistry and Molecular Biology at University of Nebraska Medical Center

https://www.unmc.edu/news.cfm?match=23347

https://www.unmc.edu/biochemistry/faculty/kaur.html


#### Aug 18, 2020 - Call Summary 

09:12 - Brief introduction of the presentation

11:59 - Pulmonary Fibrosis defined as scarring of the lung tissue

16:52 - Comparison of healthy lungs versus fibrotic lungs and the consequences

18:30 - Symptoms and causes of fibrosis (occupational, genetic, drugs and medications)

20:05 - Tools used to diagnose and monitor the disease

22:11 - Risk factors: older age, male gender, smoking, family history, etc.

22:54 - Spirometry as the most common lung function test

24:37 - Aim of the current Kaggle competition: predict, for each week, a patient's FVC (forced vital capacity) together with a confidence value for that prediction
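
To make "FVC prediction and model confidence" concrete, here is a minimal sketch of the modified Laplace log likelihood that, as far as we understand the competition's evaluation description, scores each prediction: the confidence σ is clipped below at 70 ml and the absolute FVC error is capped at 1000 ml. The function names are our own, and the exact rules should be checked on the Kaggle evaluation page.

```python
import math


def laplace_log_likelihood(fvc_true, fvc_pred, sigma):
    """Score one FVC prediction (ml) with confidence sigma; higher (closer to 0) is better."""
    sigma_clipped = max(sigma, 70.0)               # very small sigmas are not rewarded
    delta = min(abs(fvc_true - fvc_pred), 1000.0)  # very large errors are capped
    return -math.sqrt(2) * delta / sigma_clipped - math.log(math.sqrt(2) * sigma_clipped)


def mean_score(rows):
    """Average the score over (fvc_true, fvc_pred, sigma) triples."""
    scores = [laplace_log_likelihood(t, p, s) for t, p, s in rows]
    return sum(scores) / len(scores)
```

A larger σ softens the penalty for a wrong FVC but costs log σ, so the score rewards a confidence value that matches the model's actual error rather than one that is simply small or large.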

25:07 - How to diagnose pulmonary fibrosis

29:20 - Sample of CT scan images and the signs and patterns of the disease

35:13 - HRCT (High-Resolution Computed Tomography), used in diagnosing pulmonary fibrosis

36:20 - Discussed articles/studies related to pulmonary fibrosis and the tests done

45:35 - Discussions and Questions

48:10 - Discussed the standard lung CT window used in the paper under discussion
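
The notes do not record which window that paper used. For illustration only, the sketch below applies a commonly quoted lung window (level -600 HU, width 1500 HU; these defaults are our assumption, not taken from the paper) to an array of Hounsfield units and maps it to the [0, 1] range a viewer or model can consume.

```python
import numpy as np


def apply_window(hu, level=-600.0, width=1500.0):
    """Map Hounsfield units to [0, 1] using a display window.

    Values at or below level - width/2 become 0, values at or above
    level + width/2 become 1, and everything in between is scaled linearly.
    The defaults are an assumed lung window, not the paper's exact setting.
    """
    low = level - width / 2.0
    high = level + width / 2.0
    clipped = np.clip(np.asarray(hu, dtype=np.float32), low, high)
    return (clipped - low) / (high - low)
```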

50:58 - Suggestion: Segment entire CT scans into different regions and use different algorithms to look for characteristic patterns
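
One very simplified way to start on this suggestion, assuming the slices are already sorted from top to bottom: cut the volume into equal slabs along the slice axis (roughly upper, middle and lower zones) and compute a separate statistic for each, here the fraction of voxels in a chosen HU range. The HU range and the number of zones are hypothetical parameters, and a real pipeline would first segment the lungs so that air outside the body and non-lung tissue are excluded.

```python
import numpy as np


def zone_fractions(volume_hu, hu_min, hu_max, n_zones=3):
    """Split a (slices, rows, cols) HU volume into n_zones slabs along the slice axis
    and return, per slab, the fraction of voxels with hu_min <= HU <= hu_max."""
    volume_hu = np.asarray(volume_hu)
    fractions = []
    for slab in np.array_split(volume_hu, n_zones, axis=0):
        in_range = (slab >= hu_min) & (slab <= hu_max)
        fractions.append(float(in_range.mean()))
    return fractions
```

Each zone's value (or the zone itself) could then be handed to a different algorithm, as suggested on the call.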

55:37 - Comments on the difference between Obstructive lung disease (hard to exhale) and Restrictive lung disease (hard to inhale)

59:36 - Concern about long-term effects of COVID-19 in people with mild symptoms, who might develop pulmonary fibrosis in 5-10 years

1:04:18 - Issue of gender bias in the scoring system; suggestion to run the assessment with and without gender to measure the difference
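
A minimal sketch of the "with and without gender" comparison, assuming a simple linear model on tabular features (our choice for brevity, not the team's model): fit ordinary least squares twice, once with all features and once with the sex column dropped, and compare the errors. The errors here are in-sample; a real assessment should use held-out patients and also compare errors separately for men and women.

```python
import numpy as np


def rmse_with_and_without(X, y, feature_idx):
    """Fit OLS with and without column `feature_idx` of X and return both in-sample RMSEs.

    An intercept column is added inside the fit; `feature_idx` would be,
    for example, a 0/1 sex indicator.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    def fit_rmse(features):
        design = np.column_stack([np.ones(len(features)), features])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        residuals = y - design @ coef
        return float(np.sqrt(np.mean(residuals ** 2)))

    with_feature = fit_rmse(X)
    without_feature = fit_rmse(np.delete(X, feature_idx, axis=1))
    return with_feature, without_feature
```

Because adding a column can never increase in-sample least-squares error, only the size of the gap between the two numbers is informative.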


In [None]:
from IPython.display import HTML

HTML('<iframe width="560" height="315" src="https://www.youtube.com/embed/UG4qZMt4t64?rel=0&amp;controls=0&amp;showinfo=0" frameborder="0" allowfullscreen></iframe>')

# 3. Task Overview <a id="task_overview"></a>

#### Open Source Artificial Intelligence Model for Medical Imagery Screening

A recently proposed alternative for COVID-19 screening is AI-powered diagnosis based on chest radiography images such as X-rays or computed tomography (CT) scans.

This is an Open Source, Open Science project for building a semi-supervised model that can be used for any lung CT-based task, including ARDS and other COVID-19-related comorbidities.

Read more on Notion:
https://www.notion.so/Team-Pulmonary-Fibrosis-Model-9bab848371c14a0f9075faf88e454252


# 4. Team <a id="team_overview"></a>

We are a cross-disciplinary team of data scientists, medical professionals and volunteers.

If you are interested in helping, please join our team here:
https://www.coronawhy.org/join-the-fight


In [None]:
from IPython.display import HTML

HTML('<iframe width="560" height="315" src="https://www.youtube.com/embed/x2uJJFmnijc?rel=0&amp;controls=0&amp;showinfo=0" frameborder="0" allowfullscreen></iframe>')

In [None]:
from IPython.display import HTML

HTML('<iframe width="1060" height="615" src="https://docs.google.com/spreadsheets/d/e/2PACX-1vRKGv0H8bXIT9Tfobu-uBUmuEaxD1YiPmzmJfn7WaqAgE9w3vYn1k22kouoNboSXUAMH9FXDDPC3vql/pubhtml?gid=550569074&amp;single=true&amp;widget=true&amp;headers=false" frameborder="0" allowfullscreen></iframe>')



# 5. Next Tasks / Datasets <a id="task_next"></a>



Questions regarding radiology:
-  I have read some of the articles and see that certain features are considered predictors for IPF (idiopathic pulmonary fibrosis): (1) GGO (ground-glass opacities), (2) honeycombing, (3) reticulation and (4) traction bronchiectasis.
-  I was looking for a good explanation, with examples, of what they are and how they appear on HRCT, but did not find one and I am still in the dark. Can anyone in the team shed light on this?


We have the following tasks:

- understanding the data format of the datasets we will use for the semi-supervised model

- write scripts for extracting the data from the datasets. They have different structures, and the root package usually contains nested .zip or .gz packages, so we need a script for each dataset

- script for uploading dicoms to CoronaWhy dataverse

- write scripts for converting all dicom data into normalized images

- uploading images to CoronaWhy dataverse (I think it's a good idea to store both raw and preprocessed data)

- let me know if you'd like to help with any of these or need any help with them. Meanwhile, I will work through the same list from the top

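For the upload tasks, Dataverse exposes a native API endpoint for adding a file to an existing dataset (`POST /api/datasets/:persistentId/add?persistentId=...`, authenticated with an `X-Dataverse-key` header). The sketch below builds that request and sends it with the third-party `requests` library. The server URL, DOI and token in the usage comment are placeholders, and using `directoryLabel` to keep raw and preprocessed files apart is only one hypothetical layout.

```python
import json
from pathlib import Path


def build_add_file_request(server_url, persistent_id, api_token, directory_label="", description=""):
    """Return (url, params, headers, json_data) for the Dataverse native API 'add file' call."""
    url = f"{server_url.rstrip('/')}/api/datasets/:persistentId/add"
    params = {"persistentId": persistent_id}
    headers = {"X-Dataverse-key": api_token}
    json_data = json.dumps({"description": description, "directoryLabel": directory_label})
    return url, params, headers, json_data


def upload_file(server_url, persistent_id, api_token, path, directory_label="", description=""):
    """Upload one local file (e.g. a DICOM slice) to the dataset and return the JSON reply."""
    import requests  # third-party; imported here so the builder above works without it

    url, params, headers, json_data = build_add_file_request(
        server_url, persistent_id, api_token, directory_label, description
    )
    with open(path, "rb") as fh:
        response = requests.post(
            url,
            params=params,
            headers=headers,
            files={"file": (Path(path).name, fh)},
            data={"jsonData": json_data},
            timeout=300,
        )
    response.raise_for_status()
    return response.json()


# Placeholder values; replace with the real CoronaWhy Dataverse URL, dataset DOI and API token:
# upload_file("https://dataverse.example.org", "doi:10.5072/FK2/EXAMPLE", "API_TOKEN",
#             "scan_0001.dcm", directory_label="raw")
```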

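For converting DICOM data into normalized images, a minimal sketch assuming `pydicom` is installed: read each slice, convert stored pixel values to Hounsfield units with the `RescaleSlope` and `RescaleIntercept` tags, sort slices by the z coordinate of `ImagePositionPatient`, then clip to a fixed HU range and rescale to 8-bit. The [-1000, 400] HU range is our assumption (roughly air through soft tissue); the team may prefer another range, or float HU values for the preprocessed copy.

```python
import numpy as np


def to_hounsfield(pixels, slope=1.0, intercept=0.0):
    """Stored pixel values -> Hounsfield units: HU = pixel * slope + intercept."""
    return np.asarray(pixels, dtype=np.float32) * float(slope) + float(intercept)


def normalize_hu(hu, hu_min=-1000.0, hu_max=400.0):
    """Clip HU to [hu_min, hu_max] and rescale linearly to 8-bit grey levels 0-255."""
    scaled = (np.clip(hu, hu_min, hu_max) - hu_min) / (hu_max - hu_min)
    return np.round(scaled * 255).astype(np.uint8)


def load_normalized_volume(paths):
    """Read one DICOM series with pydicom and return a (slices, rows, cols) uint8 volume."""
    import pydicom  # third-party; imported here so the helpers above work without it

    slices = [pydicom.dcmread(p) for p in paths]
    # Sort by the z coordinate of each slice; series without ImagePositionPatient
    # would need another key (e.g. InstanceNumber).
    slices.sort(key=lambda ds: float(ds.ImagePositionPatient[2]))
    return np.stack([
        normalize_hu(to_hounsfield(ds.pixel_array,
                                   getattr(ds, "RescaleSlope", 1.0),
                                   getattr(ds, "RescaleIntercept", 0.0)))
        for ds in slices
    ])
```
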
#### Data exploration and preparation


# Datasets for unsupervised training

---

### 2019 Novel Coronavirus Resource (2019nCoVR)

By the China National Center for Bioinformation. **104,009 CT slices from 1,489 patients**. This is our best bet, but we can extend it by combining it with the other datasets listed below.

[http://ncov-ai.big.ac.cn/download?lang=en](http://ncov-ai.big.ac.cn/download?lang=en)

### MosMedData: Chest CT Scans with COVID-19 Related Findings

**1110 patients, 1110 scans.** Contains scans with various pneumonia severity levels: normal lung tissue with no CT signs of viral pneumonia; several ground-glass opacifications; ground-glass opacifications and regions of consolidation; diffuse ground-glass opacifications and consolidation as well as reticular changes in the lungs.

[http://academictorrents.com/details/f2175c4676e041ea65568bb70c2bcd15c7325fd2](http://academictorrents.com/details/f2175c4676e041ea65568bb70c2bcd15c7325fd2)

### COVID-CTset

[https://github.com/mr7495/COVID-CTset](https://github.com/mr7495/COVID-CTset)

This dataset contains the full original **CT scans of 377 persons**. There are 15589 and 48260 CT scan images belonging to 95 Covid-19 and 282 normal persons, respectively

### SARS-COV-2 Ct-Scan Dataset

[https://www.kaggle.com/plameneduardo/sarscov2-ctscan-dataset](https://www.kaggle.com/plameneduardo/sarscov2-ctscan-dataset)

1252 positive COVID-19 slices and 1230 negative CT slices. These data have been collected from real patients in hospitals from Sao Paulo, Brazil. Format: PNG

### Medicalsegmentation

1000 slices

[https://medium.com/@hbjenssen/covid-19-radiology-data-collection-and-preparation-for-artificial-intelligence-4ecece97bb5b](https://medium.com/@hbjenssen/covid-19-radiology-data-collection-and-preparation-for-artificial-intelligence-4ecece97bb5b)

[http://medicalsegmentation.com/covid19/](http://medicalsegmentation.com/covid19/)


# Datasets with annotation for supervised training

---

### UCSD-AI4H/COVID-CT

**Around 250 slices of COVID and non-COVID cases. Contains CT data with annotation**, including ARDS-related patterns!

### ieee8023/covid-chestxray-dataset

[https://github.com/ieee8023/covid-chestxray-dataset/blob/master/metadata.csv](https://github.com/ieee8023/covid-chestxray-dataset/blob/master/metadata.csv)

**84 CT slices. With annotation!** We can extract the CT slices from the dataset.
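
A minimal sketch for pulling the CT entries out of `metadata.csv`, assuming the file has `modality` and `filename` columns (check the header of the current version, since column names may change between releases). Only the standard library is used.

```python
import csv
import io


def select_ct_images(csv_text, modality_column="modality", filename_column="filename"):
    """Return the filenames of rows whose modality is CT (case-insensitive)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[filename_column] for row in reader
            if (row.get(modality_column) or "").strip().upper() == "CT"]


# Usage with a local copy of the repository:
# with open("metadata.csv", newline="", encoding="utf-8") as f:
#     ct_files = select_ct_images(f.read())
```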

### SIRM

[https://www.sirm.org/en/category/senza-categoria-en/](https://www.sirm.org/en/category/senza-categoria-en/)

Around 100 CT scans are available for download. They look like they are already included in the MedicalSegmentation dataset; we can check later.

### kaggle/osic-pulmonary-fibrosis-progression


### Radiopaedia

[https://radiopaedia.org/search?lang=us&page=6&q=pneumonia&scope=cases](https://radiopaedia.org/search?lang=us&page=6&q=pneumonia&scope=cases)

We can write a scraper to get the publicly available data. There are a couple of hundred CT slices with pneumonia.
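
A minimal starting point for such a scraper: extract case-page links from a search-results page using only the standard library. It assumes case pages are linked under a `/cases/` path, which should be verified against the live site; fetching the pages, pagination and downloading the images are left out. The same parser could in principle be pointed at Eurorad with a different link marker.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CaseLinkParser(HTMLParser):
    """Collect unique absolute URLs of <a> links whose href contains `marker`."""

    def __init__(self, base_url, marker="/cases/"):
        super().__init__()
        self.base_url = base_url
        self.marker = marker
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and self.marker in href:
            url = urljoin(self.base_url, href)  # resolve relative links
            if url not in self.links:
                self.links.append(url)


def extract_case_links(html, base_url, marker="/cases/"):
    """Return case links found in one search-results page, in page order."""
    parser = CaseLinkParser(base_url, marker)
    parser.feed(html)
    return parser.links
```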

### Eurorad

[https://www.eurorad.org/advanced-search?search=pneumonia](https://www.eurorad.org/advanced-search?search=pneumonia)

We can write a scraper to get the publicly available data.


# Cancer-related

---

- **DeepLesion**
- LUNA16
- Data Science Bowl 2017

### Small datasets

- Lung CT Segmentation Challenge 2017


# X-rays

---

### Big dataset of chest x-rays

14 Common Thorax Disease Categories. 112,120 frontal-view X-ray images of 30,805 unique patients.

[http://academictorrents.com/details/557481faacd824c83fbf57dcf7b6da9383b3235a](http://academictorrents.com/details/557481faacd824c83fbf57dcf7b6da9383b3235a)
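
This description matches the NIH ChestX-ray14 release, whose labels file (`Data_Entry_2017.csv`) has one row per image with an `Image Index` and a `Finding Labels` field. Multiple findings are separated by `|`, and `No Finding` marks normal images. Assuming that layout, here is a sketch for turning the labels into multi-hot vectors (Fibrosis is one of the 14 categories):

```python
import csv
import io

LABELS = [
    "Atelectasis", "Cardiomegaly", "Effusion", "Infiltration", "Mass", "Nodule",
    "Pneumonia", "Pneumothorax", "Consolidation", "Edema", "Emphysema",
    "Fibrosis", "Pleural_Thickening", "Hernia",
]


def multi_hot(csv_text):
    """Map each image name to a 14-element 0/1 list; 'No Finding' gives all zeros."""
    index = {name: i for i, name in enumerate(LABELS)}
    result = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        vector = [0] * len(LABELS)
        for label in row["Finding Labels"].split("|"):
            if label in index:
                vector[index[label]] = 1
        result[row["Image Index"]] = vector
    return result
```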

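### CheXpert: huge dataset by Stanford (released alongside MIT's MIMIC-CXR)

224,316 chest radiographs of 65,240 patients; together with MIMIC-CXR, over 500,000 images!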



# 6. Daily Calls <a id="task_calls"></a>

#### #team-pulmonary-fibrosis-model - Aug 14, 2020 - kickoff call

Kaggle Competition: The challenge is to use machine learning techniques to predict the decline in a patient's lung function (FVC), using the CT image, metadata and baseline FVC as input.
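
In the competition's `train.csv` each row is one visit (`Patient`, `Weeks`, `FVC`, `Percent`, `Age`, `Sex`, `SmokingStatus`), with `Weeks` counted relative to the baseline CT scan and possibly negative. A minimal sketch, assuming the baseline FVC is taken as the value at each patient's earliest recorded week (one common convention, not necessarily the one the team will adopt):

```python
import csv
import io


def baseline_fvc(csv_text):
    """Return {patient_id: (week, fvc)} for each patient's earliest recorded visit."""
    baseline = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        patient, week, fvc = row["Patient"], int(row["Weeks"]), float(row["FVC"])
        if patient not in baseline or week < baseline[patient][0]:
            baseline[patient] = (week, fvc)
    return baseline
```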

04:31 - Agenda: application of computer vision to the diagnosis of pulmonary fibrosis

05:21 - Short self-introduction from team members

12:46 - Short intro from Serhiy who introduced this project to the team

13:47 - Goal: To build a model trained in a semi-supervised way that can be used for any Lung CT-related task.

15:44 - Impact of the project: anyone will be able to use the model once it has been trained and released to the public

16:57 - Shared a link to Google's project on supervised learning and gave an overview of the model

18:00 - Brief overview of how the planned model would work

22:00 - Preliminary data exploration is needed to understand what data sets the team would be dealing with

25:10 - Age and smoking status (smoker or non-smoker) are factors that need to be considered when segmenting the image data

28:51 - Discussed how to distribute computing credits and the process


In [None]:
from IPython.display import HTML

HTML('<iframe width="560" height="315" src="https://www.youtube.com/embed/O8jTx985wpc?rel=0&amp;controls=0&amp;showinfo=0" frameborder="0" allowfullscreen></iframe>')

#### #team-pulmonary-fibrosis-model - Aug 25, 2020 - meeting with Keerti Bhogaraju

Video recording: https://www.youtube.com/watch?v=yzzv1AlcfCw

(you can watch at 2x speed)

In [None]:
from IPython.display import HTML

HTML('<iframe width="560" height="315" src="https://www.youtube.com/embed/yzzv1AlcfCw?rel=0&amp;controls=0&amp;showinfo=0" frameborder="0" allowfullscreen></iframe>')