# What's inside the Coswara database 
> Let's take a peek.

- toc: true 
- badges: true
- comments: true
- categories: [coswara]
- image: images/chart-preview.png
- author: Neeraj Sharma

## About the database 

We have released a web application for data collection via crowdsourcing. This way, anyone with a mobile phone and internet connectivity can contribute to the dataset. The website can be accessed [here](https://coswara.iisc.ac.in/). We target participants from the following three populations:
* healthy: these are individuals with no respiratory illness
* unhealthy: these are individuals with respiratory illness
* COVID-19 positive: these are individuals identified as COVID-19 positive after RT-PCR test

## Metadata description
When a user opens the website, s/he is asked to fill in a short questionnaire, which helps us collect metadata to categorize the user into one of the above three populations. The complete metadata is composed of age, gender, location (country, state/province), current health status (healthy / exposed / cured / infected), and the presence of co-morbidity (pre-existing medical conditions). We do not collect any personally identifiable information. Each user gets a unique anonymized ID during data storage. A screenshot of the Coswara webpage is provided below. Here, pages 1 and 2 correspond to the metadata collection.

![](./my_images/webpage_ss_metadata.png "Credit: http://coswara.iisc.ac.in/")


## Sound sample description

In the screenshot shown above, page 3 corresponds to the "audio sample collection" step. We collect audio samples corresponding to the nine categories shown in the figure below. These categories are chosen to capture sound signals which embed in them most of the attributes of the respiratory system associated with speech production. To understand how, let's take a small detour into the speech production system.

![](./my_images/soundtypes_ss.png "Credit: http://coswara.iisc.ac.in/")


### Human speech production system

The human speech production system draws contributions from the diaphragm, lungs, trachea, larynx, pharynx, tongue, nasal cavity, and lips. You may note that many of these organs are not solely dedicated to speech production; for example, we use the mouth for eating and the lungs for breathing! Speech and vocal sound production is an extra feat achieved, thanks to evolution, by these organs.

The lungs are elastic, as they are in some sense repurposed swim bladders. During normal respiration, the diaphragm and the intercostal muscles between the ribs work together to expand the lungs. The elastic recoil of the lungs then provides the force that expels air during expiration. This means that the alveolar air pressure falls below atmospheric pressure when you inhale and rises above it when you exhale. Something different happens when you speak.

Speaking happens during exhaling. The alveolar air pressure is released gradually in a co-ordinated manner via the opening and closing of the vocal cords (in the glottis). Something interesting happens here. For voiced sounds, such as vowels, the vocal folds open and close in a periodic fashion. This rate of opening and closing imparts periodicity to the output sound pressure wave. Further, this periodicity is one of the most easily perceived attributes of speech, and is referred to as the pitch of the speaker. You would have noticed that male speakers usually have a lower pitch than female speakers, and female speakers have a lower pitch than child speakers. Why so? This is related to the mass of the vocal folds. Heavier mass means lower pitch, and male anatomy often shows a higher mass of the vocal folds. But that does not mean you cannot change your pitch. You can, by altering the tension of the vocal folds, and we often do this when we want to emphasize something in our speech.

Another attribute of speech we perceive quite easily is loudness. An increase in airflow from the lungs blows the vocal folds wider apart, resulting in increased strength of the output pressure wave, thus making the sound louder.

Voiced sounds are just one category of speech sounds. For unvoiced sounds, such as fricatives, the vocal folds remain open, and for stop consonant sounds, the vocal folds remain closed. Note that these sounds do not have any perceived pitch associated with them. The figure below shows a schematic of the human speech production system.

![](./my_images/humanSpeechProduction.png "Credit: http://coswara.iisc.ac.in/")
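
The link between periodic vocal fold vibration and perceived pitch described above can be illustrated with a small sketch. The code below is not part of the Coswara tooling; it is a minimal example, assuming a synthetic "voiced" frame built from a 120 Hz fundamental and two harmonics, and it estimates the pitch by finding the strongest autocorrelation peak. The function name `estimate_pitch_hz` and the search range of 60 to 500 Hz are illustrative assumptions.

```python
import numpy as np


def estimate_pitch_hz(signal, sample_rate, fmin=60.0, fmax=500.0):
    """Estimate the fundamental frequency of a voiced frame via autocorrelation.

    fmin and fmax bound the search range; they are illustrative choices.
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    # Full autocorrelation; keep only the non-negative lags.
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]
    # Restrict the search to lags that correspond to a plausible pitch range.
    lag_min = int(sample_rate / fmax)
    lag_max = int(sample_rate / fmin)
    best_lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    return sample_rate / best_lag


# Synthetic 50 ms voiced frame: 120 Hz fundamental plus two weaker harmonics.
sample_rate = 48000
t = np.arange(int(0.05 * sample_rate)) / sample_rate
frame = (np.sin(2 * np.pi * 120 * t)
         + 0.5 * np.sin(2 * np.pi * 240 * t)
         + 0.25 * np.sin(2 * np.pi * 360 * t))

pitch = estimate_pitch_hz(frame, sample_rate)
print(f"Estimated pitch: {pitch:.1f} Hz")
```

Real recordings are noisier than this synthetic frame, so a practical pitch tracker would typically add voicing detection and smoothing across frames.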

What happens during coughing? The textbook [explanation](https://en.wikipedia.org/wiki/Cough_reflex#:~:text=Air%20rushes%20into%20the%20lungs,the%20other%20expiratory%20muscles%20contract.) suggests that cough is a reflex action. The diaphragm contracts, creating a negative pressure around the lung, and the glottis opens. This enables air to rush into the lungs in order to equalise the pressure. The glottis closes and the vocal cords contract to shut the glottis. The abdominal muscles contract to accentuate the action of the relaxing diaphragm; simultaneously, the other expiratory muscles contract. These actions increase the pressure of air within the lungs. The vocal cords relax and the glottis opens, releasing air at over 100 mph. The bronchi and non-cartilaginous portions of the trachea collapse to form slits through which the air is forced, which clears out any irritants attached to the respiratory lining. So a single cough will have no periodic opening and closing of vocal folds, unlike in vowels. However, natural coughing often results in a sequence of 3-4 coughs, and the opening and closing of the glottis then becomes difficult to describe physiologically. An attempt to understand this is made [here](https://erj.ersjournals.com/content/erj/28/1/10.full.pdf).

What happens during breathing? The glottis largely remains open to enhance the free flow of air into and out of the lungs, co-ordinated by the movement of the diaphragm and the elasticity of the lungs. A nice video is shown [here](https://www.ims.uni-stuttgart.de/institut/arbeitsgruppen/ehemalig/ep-dogil/EGG/page5a.htm).

### Why the nine sound categories
As discussed above, we ask every user to record and upload nine sound samples. These can be grouped as follows:
* **breathing (two kinds; shallow and deep)**
* **coughing (two kinds; shallow and heavy)**

The choice of the above two is driven by reports from the WHO and CDC, which have listed dry cough, difficulty in breathing, and chest pain (or pressure) as key symptoms of this viral infection, visible between 2-14 days after exposure to the virus. Also, a recent modeling [study](https://www.nature.com/articles/s41591-020-0916-2) of symptoms data collected from a pool of 7178 COVID-19 positive individuals validated the presence of these symptoms, and proposed a real-time prediction and tracking approach. Repeated coughing can adversely impact the mass and tension in the vocal folds. This can in turn alter the speaking style of the patient. You might have noticed that you can often guess whether your friend has a cold from his/her speaking style over the phone.

* **sustained vowel phonations (three kinds; /ey/ as in made, /i/ as in beet, /u:/ as in cool)**

The chosen vowels have a special place in the [quantal theory of speech](https://en.wikipedia.org/wiki/Quantal_theory_of_speech). These vowels are easy to produce and appear in almost every spoken language. Further, these vowel sounds are perceived as the most distinct among all vowels, and have been argued to capture the vocal tract attributes effectively. For more details see [here](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.24.8682&rep=rep1&type=pdf) and [here](https://www.the-scientist.com/features/why-human-speech-is-special--64351#:~:text=%C2%A9%20laurie%20o%E2%80%99keefe).

* **one to twenty digit counting (two kinds; normal and fast paced)**

Counting a sequence of digits corresponds to continuous speaking for close to 20 seconds. Any breathing difficulty will make this task harder, and we expect this to be reflected in the speaking style, such as loudness, stress and pause patterns, and pace of speaking.
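
As a rough illustration of how loudness and pause patterns could be measured, the sketch below computes short-time energy in 20 ms frames and reports the fraction of frames that fall below a silence threshold. This is only a sketch under stated assumptions, not the analysis used for Coswara: the function names, the 20 ms frame length, and the -40 dB threshold are all hypothetical choices, and the input is a synthetic signal (two tone bursts separated by silence) rather than a real counting recording.

```python
import numpy as np


def frame_energy_db(signal, sample_rate, frame_ms=20.0):
    """Mean energy (in dB) of consecutive non-overlapping frames."""
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    frames = np.reshape(signal[:n_frames * frame_len], (n_frames, frame_len))
    energy = np.mean(frames ** 2, axis=1)
    # A small constant avoids log(0) for perfectly silent frames.
    return 10 * np.log10(energy + 1e-12)


def pause_fraction(signal, sample_rate, threshold_db=-40.0):
    """Fraction of frames whose energy is below threshold_db (assumed value)."""
    energy_db = frame_energy_db(signal, sample_rate)
    return float(np.mean(energy_db < threshold_db))


# Synthetic stand-in for speech: 1 s tone, 0.5 s silence, 1 s tone.
sample_rate = 48000
t = np.arange(sample_rate) / sample_rate
burst = 0.5 * np.sin(2 * np.pi * 200 * t)
silence = np.zeros(sample_rate // 2)
signal = np.concatenate([burst, silence, burst])

fraction = pause_fraction(signal, sample_rate)
print(f"Fraction of paused frames: {fraction:.2f}")
```

With real crowdsourced recordings, a fixed threshold would likely need to be adapted to the background noise level of each file.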


### Visualizing sound as image
In the figure below, we show the waveforms and the corresponding [spectrograms](https://en.wikipedia.org/wiki/Spectrogram) of a few sound samples. The waveforms represent the recorded time-domain signal. The spectrogram depicts the spectral content of the signal in every 10 ms short-time window. We can make some observations from these plots.

![](./my_images/soundSample_ss.png "Credit: http://coswara.iisc.ac.in/")

* The breathing samples are wideband: the spectral energy is distributed over all frequencies. The inhale is lower in energy than the exhale; however, both last for a similar time span, close to 1 sec. The exhale (the center burst) also shows some formant-like structure in the spectrogram. This can be expected as the air travels through the vocal tract.

* For the coughing samples, we can see that these are repeating, and the first cough is a little longer in duration. This often happens, as we usually take a deep breath and release more air in the first cough. We can also see some formant structure in the spectrogram.

* For sustained vowel phonations, we can see a clear, distinct formant structure in the spectrogram, specifically for the second formant.

* For the digit counting, we can see the fluctuating formant structure in the spectrogram. 
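
To make the idea of a short-time spectrogram concrete, here is a minimal NumPy sketch. It assumes a 10 ms Hann window (matching the window length mentioned above) with a 5 ms hop, which is an assumption and not necessarily what was used for the figure. The function name `spectrogram_db` is hypothetical, and the input is a synthetic 1 kHz tone rather than a Coswara recording.

```python
import numpy as np


def spectrogram_db(signal, sample_rate, window_ms=10.0, hop_ms=5.0):
    """Short-time power spectrogram in dB.

    Returns (freqs, times, spec_db), where spec_db has shape
    (n_frequency_bins, n_frames). hop_ms is an illustrative choice.
    """
    win_len = int(sample_rate * window_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    window = np.hanning(win_len)
    n_frames = 1 + (len(signal) - win_len) // hop_len
    # Slice the signal into overlapping windowed frames.
    frames = np.stack([
        signal[i * hop_len: i * hop_len + win_len] * window
        for i in range(n_frames)
    ])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(win_len, d=1.0 / sample_rate)
    times = (np.arange(n_frames) * hop_len + win_len / 2) / sample_rate
    return freqs, times, 10 * np.log10(power.T + 1e-12)


# Synthetic input: 1 second of a 1 kHz tone sampled at 48 kHz.
sample_rate = 48000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

freqs, times, spec_db = spectrogram_db(tone, sample_rate)
peak_bin = int(np.argmax(spec_db.mean(axis=1)))
print(spec_db.shape, freqs[peak_bin])
```

A 10 ms window at 48 kHz spans 480 samples, so the frequency bins are spaced 100 Hz apart. This favors time resolution over frequency resolution, which suits short events such as coughs.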

It should be noted that these recordings are obtained via crowdsourcing and recorded through web browsers. All sound samples are recorded at 48 kHz in WAV file format. Some of these recordings may have ambient noise which cannot be filtered while recording. In another post we will try quantifying the different artifacts we observe in these files. We manually listen to every uploaded file, and will share our opinion on the quality and the curation procedure.
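
If you want to check the format of a WAV file before analysis, Python's standard `wave` module is enough for a quick look at the header. The sketch below is only an illustration: it writes a synthetic 48 kHz file to a temporary directory and reads it back. The helper name `describe_wav`, the file name `example.wav`, and the mono 16-bit format are assumptions made for this example; the text above only states the 48 kHz sampling rate.

```python
import math
import os
import struct
import tempfile
import wave


def describe_wav(path):
    """Return basic header information for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return {
            "sample_rate_hz": rate,
            "channels": wav.getnchannels(),
            "bits_per_sample": 8 * wav.getsampwidth(),
            "duration_s": wav.getnframes() / rate,
        }


# Write a 1 second, 440 Hz synthetic tone as mono 16-bit PCM (assumed format).
rate = 48000
path = os.path.join(tempfile.mkdtemp(), "example.wav")
samples = [int(0.3 * 32767 * math.sin(2 * math.pi * 440 * n / rate))
           for n in range(rate)]
with wave.open(path, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(rate)
    wav.writeframes(struct.pack("<%dh" % len(samples), *samples))

info = describe_wav(path)
print(info)
```

Checking the sampling rate first avoids silent mismatches later, since window lengths expressed in milliseconds translate to different sample counts at different rates.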
