I had a network model up and running, and just wanted to add labels to the unclassified files. While preparing for this I found that I didn't agree with the labels of many of the already classified files, so I made new labels for the whole set.

In [None]:
import numpy as np
from scipy.io import wavfile
import matplotlib.pyplot as plt
%matplotlib inline
import pandas as pd

In [None]:
INPUT_LIB = '../input/'
SAMPLE_RATE = 44100

## Load the data

In [None]:
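def clean_filename(fname, string):
    # set_a.csv lists paths like 'set_a/<name>.wav'; keep only the file name
    file_name = fname.split('/')[1]
    # Unlabelled test files are listed without their prefix (the name starts with '__'),
    # so prepend the prefix passed in as `string`
    if file_name[:2] == '__':
        file_name = string + file_name
    return file_name

def load_wav_file(name, path):
    # wavfile.read returns (sample_rate, samples)
    rate, samples = wavfile.read(path + name)
    assert rate == SAMPLE_RATE, f"unexpected sample rate {rate} for {name}"
    return samples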

In [None]:
file_info = pd.read_csv(INPUT_LIB + 'set_a.csv')
new_info = pd.DataFrame({'file_name' : file_info['fname'].apply(clean_filename, 
                                                                string='Aunlabelledtest'),
                         'target' : file_info['label'].fillna('unclassified')})   
new_info['time_series'] = new_info['file_name'].apply(load_wav_file, 
                                                      path=INPUT_LIB + 'set_a/')    
new_info['len_series'] = new_info['time_series'].apply(len)  

In [None]:
MAX_LEN = new_info['len_series'].max()

## Look (and listen) at the four classes

I will now go through the training set and relabel the data points. For reasons explained below, I will not use the label 'extrahls'; instead, I will classify everything as 0 = artifact, 1 = normal/extrahls, or 2 = murmur.
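As a starting point before any manual corrections, the merge described above can be written as a dictionary that maps the original string labels to the new integer classes. This is a minimal sketch: `LABEL_MAP` and `initial_labels` are hypothetical names, and 'unclassified' recordings are set to -1 so they stand out until they are labelled by hand.

```python
import pandas as pd

# Hypothetical mapping from the original set A labels to the three new classes:
# 0 = artifact, 1 = normal or extrahls (merged), 2 = murmur
LABEL_MAP = {"artifact": 0, "extrahls": 1, "normal": 1, "murmur": 2}

def initial_labels(targets: pd.Series) -> pd.Series:
    """Map original labels to 0/1/2; anything else (e.g. 'unclassified') becomes -1."""
    return targets.map(LABEL_MAP).fillna(-1).astype(int)

print(initial_labels(pd.Series(["artifact", "extrahls", "normal", "murmur", "unclassified"])).tolist())
# [0, 1, 1, 2, -1]
```

In this notebook the input would be `new_info['target']`; the manual corrections in the cells below are then applied on top of this default.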

In [None]:
new_info['target'].value_counts()

In [None]:
new_labels = np.zeros(len(new_info), dtype=int)  # one label per recording in set A (176)

First the artifacts:

In [None]:
print("artifacts:")
fig, ax = plt.subplots(10, 4, figsize = (12, 16))
for i in range(40):
    ax[i//4, i%4].plot(new_info['time_series'][i])
    ax[i//4, i%4].set_title(new_info['file_name'][i][:-4])
    ax[i//4, i%4].get_xaxis().set_ticks([])

These were all correctly classified. Some are obvious, like chatter in Italian or a Bollywood song from the radio. There are actually some heartbeats in 18 and 23, so they could be classified either way. I've tried both, and keeping their artifact label works better with my CNN. (And yes, I know changing data post hoc this way is fishy, but I'm not publishing this.)

In [None]:
new_labels[:40] = 0
# 18 and 23 contain some heartbeats, but they keep their artifact label (see above)
new_labels[18] = 0
new_labels[23] = 0

In [None]:
print("extrahls:")
fig, ax = plt.subplots(5, 4, figsize = (12, 16))
for i in range(19):
    ax[i//4, i%4].plot(new_info['time_series'][i+40])
    ax[i//4, i%4].set_title(new_info['file_name'][i+40][:-4])
    ax[i//4, i%4].get_xaxis().set_ticks([])

## Some medical background

The two *heart sounds* S1 / S2 (or lub / dub) are generated when the valves of the heart close. The first heart sound is generated by the closure of the inflow valves (mitral and tricuspid) and marks the beginning of systole, when blood is pushed out from the heart to the body and lungs. The second heart sound is generated by the closure of the outflow valves (aortic and pulmonary) and marks the beginning of diastole, the slightly longer phase when the heart refills.

A heart *murmur* is a low-frequency sound created by turbulent flow over the valves. Especially during systole, it can be a normal finding, caused by the high-speed flow in the aorta. A murmur can also be a sign of valve malfunction, such as *stenosis*, when the valve is too narrow and tight, or *insufficiency*, when there is leakage across a valve that fails to close.

A *third or fourth heart sound* is thought to be due to abnormal blood flow in the ventricles and is a sign of advanced heart disease. These sounds were described long ago, before the stethoscope was invented, when doctors listened with an ear to the chest.

My experience from med school is that extra heart sounds don't really exist any more, at least not outside of cardiac surgery. When a faint extra sound was found on the wards, it was mandatory for everyone to go and listen, and sometimes the sound we heard was imagined rather than real. Even being more generous than that, most would agree that extra heart sounds are rarer and less important than murmurs by an order of magnitude. I have certainly never based a clinical decision on whether they were present or not.

Some of the recordings labelled "extrahls" do indeed have extra sounds, but I am not sure that they correspond to the classical third and fourth heart sounds, especially as no extra peak is seen on the amplitude graph. Instead, I believe many of them are a split second heart sound, which occurs when the aortic and pulmonary valves don't close at exactly the same time. (This is normal when you breathe in, but can sometimes be a sign of lung disease.) More problematic is that many recordings labelled "normal" had similar extra sounds, and I think that in many cases these are caused by the recording technique (which, after all, differs from the ear-to-chest listening on which third heart sounds were defined).
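One rough way to look for such an extra peak programmatically is to compute an amplitude envelope and count its peaks. The sketch below is an illustration only: it uses synthetic tone bursts instead of real recordings, and the burst shape, the 4 kHz sample rate, the 20 ms smoothing window and the 80 ms minimum peak spacing are all assumptions chosen for the example, not values taken from the dataset. It relies on `scipy.signal.hilbert` and `scipy.signal.find_peaks`; `burst` and `count_envelope_peaks` are hypothetical helpers.

```python
import numpy as np
from scipy.signal import find_peaks, hilbert

SR = 4000  # illustrative sample rate for the synthetic example (the real files are 44.1 kHz)

def burst(t, t0, amp, freq=100.0, width=0.015):
    """Tone burst with a Gaussian envelope, standing in for one heart sound (assumed shape)."""
    return amp * np.sin(2 * np.pi * freq * t) * np.exp(-((t - t0) / width) ** 2)

def count_envelope_peaks(x, sr=SR, win_s=0.02, min_gap_s=0.08, rel_height=0.2):
    """Count peaks in a smoothed amplitude envelope of x."""
    envelope = np.abs(hilbert(x))                                   # instantaneous amplitude
    w = int(win_s * sr)
    envelope = np.convolve(envelope, np.ones(w) / w, mode="same")   # moving-average smoothing
    peaks, _ = find_peaks(envelope,
                          height=rel_height * envelope.max(),       # ignore low-level noise
                          distance=int(min_gap_s * sr))             # at most one peak per 80 ms
    return len(peaks)

t = np.arange(0, 0.8, 1 / SR)
lub_dub = burst(t, 0.10, 1.0) + burst(t, 0.40, 1.0)        # S1 and S2 only
with_s3 = lub_dub + burst(t, 0.55, 0.5)                    # plus a weaker sound 150 ms after S2
split_s2 = lub_dub + burst(t, 0.43, 0.8)                   # S2 split into two parts 30 ms apart

print(count_envelope_peaks(lub_dub), count_envelope_peaks(with_s3), count_envelope_peaks(split_s2))
```

With these settings, the two components of a split S2 merge into a single envelope peak, while a separate sound 150 ms after S2 is counted as a third peak. On real, noisy recordings the window, spacing and height threshold would need tuning.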

I went back and forth between recordings a few times, but the distinction did not become any clearer to me. As the category "extrahls" therefore seems poorly defined, I decided to combine it with "normal".

In addition, 40 and 55 are murmurs. In fact, 55 is identical to 66 below, which was labelled murmur.

In [None]:
new_labels[40:59] = 1
for x in [40, 55]:
    new_labels[x] = 2

In [None]:
print("murmur")
fig, ax = plt.subplots(9, 4, figsize = (12, 16))
for i in range(34):
    ax[i//4, i%4].plot(new_info['time_series'][i+59])
    ax[i//4, i%4].set_title(new_info['file_name'][i+59][:-4])
    ax[i//4, i%4].get_xaxis().set_ticks([])

Most of these are correct, but as mentioned above, 66 is identical to 55, and 62, 63, 65, and 68 seem to be normal rhythms with some irregular noise rather than murmurs.
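Whether two recordings such as 55 and 66 are the same file can also be checked without listening. A minimal sketch, assuming exact copies: the hypothetical helper `find_exact_duplicates` hashes the raw samples of each recording and groups identical ones. It only catches sample-for-sample copies; a trimmed or re-encoded version of the same recording would not be flagged.

```python
import hashlib
from collections import defaultdict

import numpy as np

def find_exact_duplicates(series_list):
    """Group positional indices of recordings whose samples are bit-for-bit identical."""
    groups = defaultdict(list)
    for idx, x in enumerate(series_list):
        arr = np.ascontiguousarray(x)
        # Hash dtype and shape together with the bytes, so e.g. int16 and int32 data never collide
        key = hashlib.sha1(str((arr.dtype, arr.shape)).encode() + arr.tobytes()).hexdigest()
        groups[key].append(idx)
    return [idxs for idxs in groups.values() if len(idxs) > 1]
```

Called as `find_exact_duplicates(new_info['time_series'])`, it returns lists of positions (which match the row numbers used here, since `new_info` has a default index).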

Another problem is that many of these examples are too extreme to be helpful for any practical (screening) applications. A person with a murmur like 83 or 88 either has well-known cardiac disease, or needs to be sent to hospital straight away. (They are called grade 5 or 6 murmurs, when you can feel them on the skin, or hear them anywhere on the chest.) 

We will see that our model struggles to identify the much more common milder murmurs; many more examples of these would have been needed. Similarly, I would have preferred the artifact examples to be less bizarre.

In [None]:
new_labels[59:93] = 2
for x in [62,63,65,68]:
    new_labels[x] = 1

In [None]:
print("normal")
fig, ax = plt.subplots(8, 4, figsize = (12, 16))
for i in range(31):
    ax[i//4, i%4].plot(new_info['time_series'][i+93])
    ax[i//4, i%4].set_title(new_info['file_name'][i+93][:-4])
    ax[i//4, i%4].get_xaxis().set_ticks([])

101, 107, 115, 116, and 122 should be murmurs. If you listen to these normal recordings, you will realize why I felt that the border between normal and "extrahls" is very fuzzy indeed.

In [None]:
new_labels[93:124] = 1
for x in [101, 107, 115, 116, 122]:
    new_labels[x] = 2
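The relabelling cells above all follow the same pattern: give a block of recordings its default class, then list the exceptions. A hypothetical helper `relabel_block` (not part of the original notebook) makes that pattern explicit and also guards against a mistyped index landing outside its block. The ranges and exceptions below are the ones used in this notebook; the artifact block needs no exceptions, since `np.zeros` already sets it to 0.

```python
import numpy as np

def relabel_block(labels, start, stop, default, overrides):
    """Set labels[start:stop] to `default`, then apply per-index exceptions.

    `overrides` maps an index to its corrected class; every index must lie inside the block.
    """
    for idx in overrides:
        if not start <= idx < stop:
            raise IndexError(f"override {idx} is outside block [{start}, {stop})")
    labels[start:stop] = default
    for idx, cls in overrides.items():
        labels[idx] = cls
    return labels

labels = np.zeros(176, dtype=int)
relabel_block(labels, 40, 59, 1, {40: 2, 55: 2})                               # extrahls block
relabel_block(labels, 59, 93, 2, {i: 1 for i in (62, 63, 65, 68)})             # murmur block
relabel_block(labels, 93, 124, 1, {i: 2 for i in (101, 107, 115, 116, 122)})   # normal block
```

The bounds check runs before anything is written, so a bad override leaves the array unchanged.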

In conclusion, I have removed the extrahls (EHS) category and found quite a few mild murmurs among the other classes.
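To see how much the relabelling changed, one option is a small cross-tabulation of original versus new classes for the labelled recordings, after mapping the old labels onto the same 0/1/2 scheme (extrahls counted as normal). A minimal sketch; `relabel_matrix` is a hypothetical helper, and `pd.crosstab` would do the same job.

```python
import numpy as np

def relabel_matrix(old, new, n_classes=3):
    """Count recordings per (old class, new class) pair; rows = old, columns = new."""
    m = np.zeros((n_classes, n_classes), dtype=int)
    # np.add.at accumulates repeated (row, col) pairs, unlike m[rows, cols] += 1
    np.add.at(m, (np.asarray(old), np.asarray(new)), 1)
    return m
```

Here `old` would be the original targets of the first 124 rows mapped to integers and `new` would be `new_labels[:124]`; the off-diagonal cells count the changed labels.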

I thought about going one step further and using the fact that these classes are not really mutually exclusive. Some recordings contain extra heart sounds, murmurs, and artifacts at once, and such a recording could be given all three labels. In the end, I decided that it would not have been worth the effort.
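For reference, the multi-label alternative would replace the single class index with one 0/1 flag per finding: a recording with an extra sound, a murmur and an artifact gets three flags, and a clean normal recording gets none. A minimal sketch with a hypothetical tag vocabulary (`CLASSES`, `multi_hot`); a network trained on such targets would typically use an independent sigmoid output per class with binary cross-entropy rather than a softmax.

```python
import numpy as np

CLASSES = ["artifact", "extrahls", "murmur"]  # hypothetical multi-label vocabulary; normal = no flags

def multi_hot(tags, classes=CLASSES):
    """Encode a collection of tags as a 0/1 vector with one slot per class."""
    vec = np.zeros(len(classes), dtype=np.float32)
    for tag in tags:
        vec[classes.index(tag)] = 1.0  # raises ValueError for an unknown tag
    return vec

print(multi_hot(["murmur", "artifact"]))
```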

In [None]:
print("unclassified")
fig, ax = plt.subplots(13, 4, figsize = (12, 16))
for i in range(52):
    ax[i//4, i%4].plot(new_info['time_series'][i+124])
    ax[i//4, i%4].set_title(new_info['file_name'][i+124][17:-4])
    ax[i//4, i%4].get_xaxis().set_ticks([])

In [None]:
new_labels[124:]= [0,2,2,1,
                   1,1,1,1,
                   0,1,0,1,
                   1,1,2,1,
                   0,1,1,1,
                   1,1,2,0,
                   0,0,0,0,
                   0,0,1,0,
                   0,0,0,0,
                   0,1,0,2,
                   1,2,2,2,
                   2,2,2,2,
                   2,2,2,2]

132, 134, 141, and 150 have heartbeats but also a lot of noise, so they could be classified either way. I chose normal for 141 and artifact for the others. Recordings 168, 169, and 170 are probably from the same patient.

We finish by printing the labels in a format that can easily be imported into another notebook.

In [None]:
print("[" + ", ".join([str(x) for x in new_labels]) + "]")
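The printed list is also valid JSON, so in another notebook it can be pasted into a string and parsed back, for example with the standard-library `json` module. A minimal sketch; `LABELS_TEXT` is a shortened placeholder for the printed output, not the real label list.

```python
import json

import numpy as np

# Placeholder: paste the full printed output from the cell above here
LABELS_TEXT = "[0, 0, 2, 1, 1]"

labels = np.array(json.loads(LABELS_TEXT), dtype=int)
assert set(np.unique(labels)) <= {0, 1, 2}, "unexpected class index"
print(labels.shape, np.bincount(labels, minlength=3))
```

Pasting the list directly as a Python literal works just as well; parsing a string mainly makes it easy to validate the classes before training.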

Please feel free to add corrections or suggestions. I am a doctor, and I still use my stethoscope once in a while, but I'm far from an expert. I have already spent far too much time on this, and I get more confused each time. I think I'll write a rant for the forum about the difficulties of using human perception as a gold standard, and then move on to other datasets.

Please check out my other [notebook][1] where I apply a deep convolutional net to these labels, achieving 80-90% accuracy after a few minutes of training. And upvote if you used the labels yourself.

I'll be back when I've analysed set B. Or maybe I'll move on to PhysioNet.

  [1]: https://www.kaggle.com/toregil/d/kinguistics/heartbeat-sounds/what-s-in-a-heartbeat