Link to application in Coursera: https://cnepxqxy.labs.coursera.org/notebooks/Exercise%202.ipynb

# Introduction

Since the audio samples include negative values, the encode and decode functions are adapted to handle signed integers. How this works is explained below.

# Encoding

Encoding is done using the quotient and remainder concept. 
For example, suppose the number to be encoded is -135. We set the sign aside and divide the absolute value, 135, by M, which is 2 to the power of K. Let us assume K is 4, so M is 16. Then 135/16 gives a quotient of 8 and a remainder of 7. The quotient is known as Q or R2, while the remainder is normally represented as R or R1. R2 (Q) is encoded in unary, while R1 (R) is encoded in binary. Thus, Q = 8 is represented as the unary value 111111110 (eight 1s followed by a terminating 0), while R = 7 is represented as the binary value 111. Since we are encoding signed int16 values, we also add a sign bit: 1 for negative and 0 for positive values. Finally, we concatenate Q + sign + R and get 1111111101111 as the encoded value.

In [53]:
S = -135
K = 4

def encode1(S,K):
    S = int(S)
    sign = ''
    # checking if it is a negative or positive integer
    if S < 0:
        sign = 'neg'
        S = abs(S)
    else:
        sign = 'pos'
    M = 2**K
    Q = S // M
    R = S % M

    counter = Q
    temp = ''
    while counter != 0:
        # generating R2
        temp = temp + '1'
        counter = counter-1

    R2 = temp + '0'
    R1 = format(R, "b")
    # concatenating to form the encoded bitstring
    if sign == 'pos':
        R2R1 = R2 + '0' + R1
    elif sign == 'neg':
        R2R1 = R2 + '1' + R1
    return R2R1
print(encode1(S,K))
print(len(encode1(S,K)))

1111111101111
13
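
Note that `encode1` writes the remainder with `format(R, "b")`, which drops leading zeros, so the code words are only separable because each one is later stored on its own line. As a minimal sketch, assuming we wanted code words that can be concatenated directly, the remainder could be padded to exactly K bits. The names `encode_fixed` and `decode_stream` are hypothetical and not part of the exercise, and K is assumed to be at least 1.

```python
def encode_fixed(s, k):
    """Unary quotient + sign bit + remainder padded to exactly k bits (k >= 1)."""
    q, r = divmod(abs(s), 2 ** k)
    sign = '1' if s < 0 else '0'
    return '1' * q + '0' + sign + format(r, '0{}b'.format(k))


def decode_stream(bits, k):
    """Decode a concatenation of code words produced by encode_fixed."""
    values = []
    pos = 0
    while pos < len(bits):
        q = 0
        while bits[pos] == '1':       # count the unary 1s
            q += 1
            pos += 1
        pos += 1                      # skip the terminating 0
        sign = bits[pos]
        pos += 1
        r = int(bits[pos:pos + k], 2)  # fixed-width remainder
        pos += k
        value = q * 2 ** k + r
        values.append(-value if sign == '1' else value)
    return values


print(encode_fixed(-135, 4))  # 11111111010111
stream = ''.join(encode_fixed(s, 4) for s in [-135, 0, 7, -1])
print(decode_stream(stream, 4))  # [-135, 0, 7, -1]
```

With this layout -135 takes 14 bits instead of 13, but no separator is needed between code words.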


# Decoding

Decoding is similar to the encoding process, but does the opposite. It takes in an encoded value and decodes it. For instance, in our previous example, where S = -135 and K = 4, the encoded value is 1111111101111. In decoding, we will take in the 1111111101111 value as well as the K value, and decode it back into -135. 

In [54]:
# Accounting for neg/pos
# Note: R2R1 has to be a str
def decode1(R2R1, K):
    M = 2**K
    R2R1 = str(R2R1)
    for index in range(0,len(R2R1)):
        # splitting into sign, Q and R
        if R2R1[index] == '0':
            Q = R2R1[:index+1]
            sign = R2R1[index+1]
            R1 = R2R1[index+2:]
            break
    while len(R1) != K and len(R1) < K:
        R1 = '0' + R1

    counter = 0
    for i in Q:
        if i == '1':
            counter = counter + 1

    intR1 = int(R1, 2)
    S_prime = counter*M+intR1
    if sign == '1':
        # apply negative if sign bit is 1
        S_prime = -(S_prime)
    
    return S_prime

R2R1 = '1111111101111'
K = 4
print(decode1(R2R1,K))

-135


# Information about Sound1.wav

In [5]:
import numpy as np
import soundfile as sf

wave_file = sf.SoundFile('Sound1.wav')
print('Sample rate: {}'.format(wave_file.samplerate))
print('Channels: {}'.format(wave_file.channels))
print('Subtype: {}'.format(wave_file.subtype))

# Since we know that subtype is PCM_16, we set dtype to int16
samples, fs = sf.read('Sound1.wav', dtype='int16')

print(np.shape(samples))
print(samples)
print(type(samples))

Sample rate: 44100
Channels: 1
Subtype: PCM_16
(501022,)
[-7 -7 -7 ...  0  2  1]
<class 'numpy.ndarray'>


Here, we have extracted information about Sound1.wav.
Its sample rate is 44100 Hz, it has 1 channel, and its samples are stored as 16-bit PCM.

# Encoding with Sound1.wav

## K = 4

In [7]:
sampleslist = samples.tolist()
newfile = open("Sound1_Enc_k4.ex2", "w")
K = 4
s1totalk4 = 0
for s in sampleslist:
    encoded = encode1(s, K)
    s1totalk4 = s1totalk4 + len(encoded)
    newfile.write(encoded + "\n")
newfile.close()

Now, we have created our new file Sound1_Enc_k4.ex2, which holds the encoded values when K = 4.

# Decoding with Sound1.wav

In [9]:
import numpy as np
encodedfile = open("Sound1_Enc_k4.ex2", "r")
K = 4
s1k4declist = []
for line in encodedfile:
    if line != '\n':
        decoded = decode1(line, K)
        s1k4declist.append(decoded)
s1k4nparr = np.array(s1k4declist)
encodedfile.close()

Each decoded value is appended to a list, which is then converted to a NumPy array, as this is the data type required to write the new encoded-then-decoded WAV file.

# Saving to new .wav file

In [12]:
from scipy.io.wavfile import write
write("Sound1_Enc_Dec_k4.wav", 44100, s1k4nparr.astype(np.int16))

# Checking the files are the same

In [14]:
import numpy as np
import soundfile as sf

wave_file = sf.SoundFile('Sound1_Enc_Dec_k4.wav')
print('Sample rate: {}'.format(wave_file.samplerate))
print('Channels: {}'.format(wave_file.channels))
print('Subtype: {}'.format(wave_file.subtype))

s1k4samples, fs = sf.read('Sound1_Enc_Dec_k4.wav', dtype='int16')
# Ts = 1/fs

print(np.shape(s1k4samples))
print(s1k4samples)

Sample rate: 44100
Channels: 1
Subtype: PCM_16
(501022,)
[-7 -7 -7 ...  0  2  1]


In [15]:
# element-wise comparison; .all() == .all() would only compare two booleans
if np.array_equal(s1k4samples, samples):
    print("All data samples same")

All data samples same


As can be seen above, our Enc_Dec file has the same data as the original, as well as the same file size. Thus, the encoding and decoding round trip is lossless.
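
An element-wise check on a small synthetic range can complement the comparison on the audio file. The sketch below is an assumption-laden restatement, not the notebook's own code: `rice_encode` and `rice_decode` are hypothetical compact versions of `encode1` and `decode1` with the same bit layout, and the test range of -2000 to 2000 is chosen only for illustration.

```python
import numpy as np


def rice_encode(s, k):
    # Same layout as encode1: unary quotient, sign bit, unpadded binary remainder.
    q, r = divmod(abs(int(s)), 2 ** k)
    return '1' * q + '0' + ('1' if s < 0 else '0') + format(r, 'b')


def rice_decode(code, k):
    # Same logic as decode1: the first '0' ends the unary part.
    end = code.index('0')
    magnitude = end * 2 ** k + int(code[end + 2:], 2)
    return -magnitude if code[end + 1] == '1' else magnitude


original = np.arange(-2000, 2001, dtype=np.int16)
for k in (2, 4, 6, 8):
    decoded = np.array(
        [rice_decode(rice_encode(s, k), k) for s in original.tolist()],
        dtype=np.int16,
    )
    # np.array_equal compares shape and every element
    print(k, np.array_equal(decoded, original))
```

`np.array_equal` is used because it compares every element, so a single mismatched sample would be reported.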

# % Compression

In [18]:
# Total number of bits in original sound file
oritotal = 501022*16

# Calculating percentage of compression
print(s1totalk4/oritotal*100)

313.2528985753121


From the calculations above, after applying Rice encoding, the encoded data is about 313% of the original size in bits, i.e. it has grown rather than shrunk. This is calculated by first working out the number of bits in the original WAV file, which is the total number of data samples times 16, the bit depth of each sample. Then, we divide the encoded bit count by this number to get the compression %. The result is not surprising, as Rice encoding places no upper limit on the length of a code word: the unary part grows linearly with the sample magnitude. For example, if S is -2257, then Q is 2257//16 + 1 = 142 bits long, the sign takes 1 bit, and R takes at most 4 bits. This makes up a total of up to 142+1+4=147 bits just for encoding -2257.
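
The length of a code word can be worked out without building the bitstring. The helper below is a sketch with a hypothetical name, `rice_code_length`, and it assumes the remainder always occupies K bits; `encode1` drops leading zeros in the remainder, so its code words can be slightly shorter than this count.

```python
def rice_code_length(s, k):
    """Bits for one sample: unary 1s + terminating 0 + sign bit + k remainder bits."""
    return (abs(s) >> k) + 1 + 1 + k


print(rice_code_length(-2257, 4))  # 147
print(rice_code_length(-2257, 2))  # 568
print(rice_code_length(-7, 4))     # 6
```

Small samples such as -7 need far fewer than 16 bits, while a single large sample can need hundreds, which is why the total depends so strongly on K and on the sample magnitudes.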

If we are more concerned with the file size, we can calculate the compression rate by comparing the original file size with the encoded file size. The compression rate will thus be calculated as such: 

25.6MB (encoded file size) / 1MB (original file size) * 100 = 2560%
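
The file-size figure is roughly eight times the bit-based figure because the .ex2 file stores every encoded bit as one text character. As a rough sketch, assuming one byte per character, a one-byte newline per sample and 1 MB = 10^6 bytes, the file size can be estimated from the numbers printed above (the variable names are hypothetical):

```python
n_samples = 501022                    # samples in Sound1.wav
bit_percentage = 313.2528985753121    # printed above for K = 4

total_bits = bit_percentage / 100 * n_samples * 16
# one ASCII byte per encoded bit, plus one newline byte per sample
estimated_bytes = total_bits + n_samples

print(round(estimated_bytes / 1e6, 1))  # 25.6
```

The estimate agrees with the 25.6MB observed for Sound1_Enc_k4.ex2.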

## K = 2

In [19]:
sampleslist = samples.tolist()
newfile = open("Sound1_Enc_k2.ex2", "w")
K = 2
s1totalk2 = 0
for s in sampleslist:
    encoded = encode1(s, K)
    s1totalk2 = s1totalk2 + len(encoded)
    newfile.write(encoded + "\n")
newfile.close()

Now, we have created our new file Sound1_Enc_k2.ex2, which holds the encoded values when K = 2.

# Decoding with Sound1.wav

In [20]:
import numpy as np
encodedfile = open("Sound1_Enc_k2.ex2", "r")
K = 2
s1k2declist = []
for line in encodedfile:
    if line != '\n':
        decoded = decode1(line, K)
        s1k2declist.append(decoded)
s1k2nparr = np.array(s1k2declist)
encodedfile.close()

Each decoded value is appended to a list, which is then converted to a NumPy array, as this is the data type required to write the new encoded-then-decoded WAV file.

# Saving to new .wav file

In [22]:
from scipy.io.wavfile import write
# write the array decoded with K = 2 (not the K = 4 array)
write("Sound1_Enc_Dec_k2.wav", 44100, s1k2nparr.astype(np.int16))

# Checking the files are the same

In [23]:
import numpy as np
import soundfile as sf

wave_file = sf.SoundFile('Sound1_Enc_Dec_k2.wav')
print('Sample rate: {}'.format(wave_file.samplerate))
print('Channels: {}'.format(wave_file.channels))
print('Subtype: {}'.format(wave_file.subtype))

s1k2samples, fs = sf.read('Sound1_Enc_Dec_k2.wav', dtype='int16')
# Ts = 1/fs

print(np.shape(s1k2samples))
print(s1k2samples)

Sample rate: 44100
Channels: 1
Subtype: PCM_16
(501022,)
[-7 -7 -7 ...  0  2  1]


In [24]:
# element-wise comparison; .all() == .all() would only compare two booleans
if np.array_equal(s1k2samples, samples):
    print("All data samples same")

All data samples same


# % Compression

In [25]:
# Calculating percentage of compression
print(s1totalk2/oritotal*100)

1162.6740193045416


From the calculations above, after applying Rice encoding, the encoded data is about 1163% of the original size in bits. This is calculated by first working out the number of bits in the original WAV file, which is the total number of data samples times 16, the bit depth of each sample. Then, we divide the encoded bit count by this number to get the compression %. The result is not surprising, as Rice encoding places no upper limit on the length of a code word. For example, if S is -2257, then Q is 2257//4 + 1 = 565 bits long, the sign takes 1 bit, and R takes at most 2 bits. This makes up a total of up to 565+1+2=568 bits just for encoding -2257.

If we are more concerned with the file size, we can calculate the compression rate by comparing the original file size with the encoded file size. The compression rate will thus be calculated as such: 

93.7MB (encoded file size) / 1MB (original file size) * 100 = 9370%

We then repeat the same process for the Sound2.wav file, with K = 4 and K = 2.

# Information about Sound2.wav

In [26]:
import numpy as np
import soundfile as sf

wave_file = sf.SoundFile('Sound2.wav')
print('Sample rate: {}'.format(wave_file.samplerate))
print('Channels: {}'.format(wave_file.channels))
print('Subtype: {}'.format(wave_file.subtype))

samples, fs = sf.read('Sound2.wav', dtype='int16')
# Ts = 1/fs

print(np.shape(samples))
print(samples)
print(type(samples))

Sample rate: 44100
Channels: 1
Subtype: PCM_16
(504000,)
[ -999   886 -1325 ...    31  -876   339]
<class 'numpy.ndarray'>


Here, we have extracted information about Sound2.wav.
Its sample rate is 44100 Hz, it has 1 channel, and its samples are stored as 16-bit PCM, much like Sound1.wav; only the sample values differ.

# Encoding with Sound2.wav

## K = 4

In [27]:
sampleslist = samples.tolist()
newfile = open("Sound2_Enc_k4.ex2", "w")
K = 4
s2totalk4 = 0
for s in sampleslist:
    encoded = encode1(s, K)
    s2totalk4 = s2totalk4 + len(encoded)
    newfile.write(encoded + "\n")
newfile.close()

Now, we have created our new file Sound2_Enc_k4.ex2, which holds the encoded values when K = 4.

# Decoding with Sound2.wav

In [28]:
import numpy as np
encodedfile = open("Sound2_Enc_k4.ex2", "r")
K = 4
s2k4declist = []
for line in encodedfile:
    if line != '\n':
        decoded = decode1(line, K)
        s2k4declist.append(decoded)
s2k4nparr = np.array(s2k4declist)
encodedfile.close()

Each decoded value is appended to a list, which is then converted to a NumPy array, as this is the data type required to write the new encoded-then-decoded WAV file.

# Saving to new .wav file

In [30]:
from scipy.io.wavfile import write
write("Sound2_Enc_Dec_k4.wav", 44100, s2k4nparr.astype(np.int16))

# Checking the files are the same

In [31]:
import numpy as np
import soundfile as sf

wave_file = sf.SoundFile('Sound2_Enc_Dec_k4.wav')
print('Sample rate: {}'.format(wave_file.samplerate))
print('Channels: {}'.format(wave_file.channels))
print('Subtype: {}'.format(wave_file.subtype))

s2k4samples, fs = sf.read('Sound2_Enc_Dec_k4.wav', dtype='int16')
# Ts = 1/fs

print(np.shape(s2k4samples))
print(s2k4samples)

Sample rate: 44100
Channels: 1
Subtype: PCM_16
(504000,)
[ -999   886 -1325 ...    31  -876   339]


In [32]:
# element-wise comparison; .all() == .all() would only compare two booleans
if np.array_equal(s2k4samples, samples):
    print("All data samples same")

All data samples same


# % Compression

In [33]:
# Total number of bits in original sound file
oritotal = 504000*16

# Calculating percentage of compression
print(s2totalk4/oritotal*100)

1954.684548611111


From the calculations above, after applying Rice encoding, the encoded data is about 1955% of the original size in bits. This is calculated by first working out the number of bits in the original WAV file, which is the total number of data samples times 16, the bit depth of each sample. Then, we divide the encoded bit count by this number to get the compression %. The result is not surprising, as Rice encoding places no upper limit on the length of a code word.

If we are more concerned with the file size, we can calculate the compression rate by comparing the original file size with the encoded file size. The compression rate will thus be calculated as such: 

158MB (encoded file size) / 1.01MB (original file size) * 100 ≈ 15644%

## K = 2

In [35]:
sampleslist = samples.tolist()
newfile = open("Sound2_Enc_k2.ex2", "w")
K = 2
s2totalk2 = 0
for s in sampleslist:
    encoded = encode1(s, K)
    s2totalk2 = s2totalk2 + len(encoded)
    newfile.write(encoded + "\n")
newfile.close()

Now, we have created our new file Sound2_Enc_k2.ex2, which holds the encoded values when K = 2.

# Decoding with Sound2.wav

In [36]:
import numpy as np
encodedfile = open("Sound2_Enc_k2.ex2", "r")
K = 2
s2k2declist = []
for line in encodedfile:
    if line != '\n':
        decoded = decode1(line, K)
        s2k2declist.append(decoded)
s2k2nparr = np.array(s2k2declist)
encodedfile.close()

# Saving to new .wav file

In [38]:
from scipy.io.wavfile import write
write("Sound2_Enc_Dec_k2.wav", 44100, s2k2nparr.astype(np.int16))

# Checking the files are the same

In [39]:
import numpy as np
import soundfile as sf

wave_file = sf.SoundFile('Sound2_Enc_Dec_k2.wav')
print('Sample rate: {}'.format(wave_file.samplerate))
print('Channels: {}'.format(wave_file.channels))
print('Subtype: {}'.format(wave_file.subtype))

s2k2samples, fs = sf.read('Sound2_Enc_Dec_k2.wav', dtype='int16')
# Ts = 1/fs

print(np.shape(s2k2samples))
print(s2k2samples)

Sample rate: 44100
Channels: 1
Subtype: PCM_16
(504000,)
[ -999   886 -1325 ...    31  -876   339]


In [40]:
# compare the K = 2 round trip (not the K = 4 array) element by element
if np.array_equal(s2k2samples, samples):
    print("All data samples same")

All data samples same


# % Compression

In [41]:
# Calculating percentage of compression
print(s2totalk2/oritotal*100)

7721.826612103175


From the calculations above, after applying Rice encoding, the encoded data is about 7722% of the original size in bits. The result is not surprising, as Rice encoding places no upper limit on the length of a code word.

If we are more concerned with the file size, we can calculate the compression rate by comparing the original file size with the encoded file size. The compression rate will thus be calculated as such: 

623MB (encoded file size) / 1.01MB (original file size) * 100 = 61683%

# Further Development

From the extraction of samples above, we can see that there are values as large as +/- 2000. The main reason the compression % is so large is that the unary part of a Rice code has no upper limit on its length, so large sample values produce very long code words when K is small.

Possible ways to improve the compression rate are to set a larger K value, reduce the range of the samples, reduce the size of Q, or save the file in a different way.

### 1. Larger K

Since Sound2.wav produces a larger compression % than Sound1.wav, we test these ideas for reducing the compression % on Sound2.wav only.

In [49]:
K = 6
sampleslist = samples.tolist()
newfile = open("Sound2_Enc_k6.ex2", "w")
s2totalk6 = 0
for s in sampleslist:
    encoded = encode1(s, K)
    s2totalk6 = s2totalk6 + len(encoded)
    newfile.write(encoded + "\n")
newfile.close()

In [50]:
# Calculating percentage of compression
print(s2totalk6/oritotal*100)

522.243142361111


Here, we can see that the compression % has been significantly reduced. With K = 2 and K = 4, the compression % was more than 1000%. With K = 6, it drops to about 522%. In terms of file size, the compression % is 42.6/1.01*100≈4218%. Let us try a higher K value of 8.

In [51]:
K = 8
sampleslist = samples.tolist()
newfile = open("Sound2_Enc_k8.ex2", "w")
s2totalk8 = 0
for s in sampleslist:
    encoded = encode1(s, K)
    s2totalk8 = s2totalk8 + len(encoded)
    newfile.write(encoded + "\n")
newfile.close()

In [52]:
# Calculating percentage of compression
print(s2totalk8/oritotal*100)

173.45221974206348


Here, we can see that the compression % has once again been significantly reduced. With K = 2, K = 4 and K = 6, the compression % was more than 500%. With K = 8, it drops to about 173%. In terms of file size, the compression % is 14.5/1.01*100≈1436%. This shows that increasing K is effective at reducing the encoded size, although at K = 8 the encoded data is still larger than the original.

However, this may not work for all sound files as there may be different samples. Thus, it is more efficient to use an appropriate K value for different files to achieve the optimal compression rate.
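
One way to pick K per file is to compute the total encoded length for a range of candidate values and keep the smallest. The sketch below is only an illustration: `total_rice_bits` and `best_k` are hypothetical helpers, the remainder is assumed to occupy exactly K bits, and the data is synthetic rather than taken from Sound2.wav.

```python
import numpy as np


def total_rice_bits(samples, k):
    """Total bits for all samples: unary 1s + terminator + sign + k remainder bits."""
    magnitudes = np.abs(samples.astype(np.int64))
    return int(np.sum((magnitudes >> k) + 2 + k))


def best_k(samples, candidates=range(0, 16)):
    """Return the candidate K that gives the smallest total encoded length."""
    return min(candidates, key=lambda k: total_rice_bits(samples, k))


# synthetic stand-in for real audio samples
rng = np.random.default_rng(0)
demo = rng.integers(-1500, 1500, size=10000).astype(np.int16)

k = best_k(demo)
percentage = total_rice_bits(demo, k) / (demo.size * 16) * 100
print(k, percentage)
```

Because only code lengths are summed, this search avoids writing a file for every candidate K.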

### 2. Reducing range of samples

This will not be implemented in code due to the insufficient syllabus coverage but the theory of the idea behind it will be explained.

Rice coding is typically used as residual coding, that is, to compress residuals rather than raw samples. Residuals are the errors between the predictions of a model and the real signal. Thus, to implement this, a prediction algorithm, for example a linear predictor, has to be implemented to obtain the list of residuals. If the predictor fits the signal well, the residuals will be smaller in magnitude than the original samples, which reach values as high as 2000, so they can be encoded with fewer bits. The encoded size can then fall well below that of the original file.
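
As a minimal sketch of the idea, assuming the simplest possible predictor (each sample is predicted to equal the previous one), the residuals are just the differences between consecutive samples. The function names are hypothetical, the input is assumed to be non-empty, and whether the residuals are actually smaller depends on how smooth the signal is.

```python
import numpy as np


def to_residuals(samples):
    """First-order prediction: residual[n] = x[n] - x[n-1], residual[0] = x[0]."""
    x = samples.astype(np.int32)      # widen so differences cannot overflow
    residuals = np.empty_like(x)
    residuals[0] = x[0]
    residuals[1:] = x[1:] - x[:-1]
    return residuals


def from_residuals(residuals):
    """Invert to_residuals with a running sum."""
    return np.cumsum(residuals).astype(np.int16)


demo = np.array([-7, -7, -7, -5, -2, 0, 2, 1], dtype=np.int16)
res = to_residuals(demo)
print(res.tolist())                                 # [-7, 0, 0, 2, 3, 2, 2, -1]
print(np.array_equal(from_residuals(res), demo))    # True
```

The residuals would be Rice-encoded in place of the samples, and the decoder would apply `from_residuals` after decoding, so the scheme stays lossless.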

### 3. Reduce size of Q

The length of Q could be reduced to half of what it was originally. This would improve the compression rate, as Q is of variable length, depending on the sample as well as the K value. For example, if K is 4 and S is -2257, then Q is 142 bits long. We could reduce Q to half of that and end up with a Q of 71 bits instead. Note that halving Q without losing information amounts to doubling M, that is, increasing K by one, so this idea is closely related to using a larger K.

### 4. Saving file

If we are concerned with the file size, since the data was saved as a bitstring, we can try saving it as bytes instead to get a smaller file. However, to be saved as bytes, the encoded bitstring must first be converted to integers. This makes the Rice encoding redundant and increases the time complexity of the application, as the data can be converted into bytes directly, or is already in bytes. Thus, a better alternative would be to use residual coding with an appropriate prediction algorithm, or to compress the audio using other file formats such as zip.

Additionally, we could encode the frames instead of the data samples, as this method also achieves lossless compression. The compression rate will be lower than when compressing the data samples. We first create another function, which is based on Rice encoding but has no sign bit.

In [1]:
def encode(S,K):
    M = 2**K
    Q = S // M
    R = S % M

    counter = Q
    temp = ''
    while counter != 0:    
        temp = temp + '1'
        counter = counter-1

    R2 = temp + '0'
    R1 = format(R, "b")
    R2R1 = R2 + R1
    return str(R2R1)

Then, we get the total number of frames from Sound1.wav and read them. Since readframes returns raw bytes, each in the range 0-255, there is no sign to handle and the encoded file will be smaller. When reading the data samples, by contrast, we have to account for negative and positive values.

In [4]:
import wave

# opens wav file to read frames
audio_file = wave.open("Sound1.wav", mode="rb")
frames = audio_file.readframes(audio_file.getnframes())

# open file to write encoded bits into
newfile = open("Sound1_Enc_k4_frames.ex2", "w")
K = 4

for i in range(len(frames)):
    encoded = encode(frames[i], K)
    newfile.write(encoded + "\n")
print('End of write')

# close files
newfile.close()
audio_file.close()

End of write


We can see that the file compression rate is lower than when applying Rice coding to the data samples. The file compression can be calculated as 12MB/1MB*100 = 1200%, which is lower than when encoding the data samples. However, because Rice coding is meant to be applied to samples, encoding frames is inappropriate even though it gives a lower compression rate. A frame is the quantity of audio samples taken during a video frame interval, while a sample is the smallest usable quantum of digital audio (waveform and Mack, 2022).
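
The relationship between the raw bytes and the int16 samples can be illustrated without a file. The sketch below assumes 16-bit little-endian PCM, as used by WAV files with subtype PCM_16, and uses two made-up sample values.

```python
import numpy as np

# two example samples stored as 16-bit little-endian integers
demo_samples = np.array([-7, 2], dtype='<i2')
raw = demo_samples.tobytes()

# iterating over bytes yields one value in 0-255 per byte,
# so every sample is split into a low byte and a high byte
print(list(raw))                                   # [249, 255, 2, 0]

# reinterpreting the same bytes as int16 recovers the samples
print(np.frombuffer(raw, dtype='<i2').tolist())    # [-7, 2]
```

Encoding byte by byte therefore treats each half of a sample as a separate value, which is why it is not equivalent to applying Rice coding to the samples themselves.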

## References

waveform, D. and Mack, J., 2022. Difference between frame and sample in waveform. [online] Sound Design Stack Exchange. Available at: <https://sound.stackexchange.com/questions/41567/difference-between-frame-and-sample-in-waveform#:~:text=A%20sample%20is%20the%20smallest,during%20a%20video%20frame%20interval.> [Accessed 23 March 2022].