Link to application in Coursera: https://cnepxqxy.labs.coursera.org/notebooks/Exercise%202.ipynb

# Introduction

Rice coding is a lossless entropy-coding method that is often used in audio compression. We first define our encoder and decoder functions based on Rice coding.

# Encoder

The encoding function is built on integer division: the value to encode is split into a quotient and a remainder with respect to M, which is 2 to the power of K.
For example, if the number to be encoded is 129 and K is 4, then M is 16, and 129/16 gives a quotient of 8 and a remainder of 1. The quotient is known as Q or R2, while the remainder is normally represented as R or R1. R2 (Q) is encoded in unary, while R1 (R) is encoded in binary. Thus, Q = 8 is represented by the unary value 111111110 (eight 1s followed by a terminating 0), while R = 1 is represented by the binary value 1. Concatenating Q and R gives the encoded value 1111111101. Note that this implementation writes R without leading zeros, whereas a standard Rice code pads R to exactly K bits (0001 here). The decoder restores the padding, which works because each encoded value is stored on its own line.

In [1]:
S = 129
K = 4

def encode(S,K):
    M = 2**K
    Q = S // M   # quotient, written in unary
    R = S % M    # remainder, written in binary

    # Q ones followed by a terminating zero
    R2 = '1' * Q + '0'
    # binary remainder without leading zeros (not padded to K bits)
    R1 = format(R, "b")
    R2R1 = R2 + R1
    return R2R1
print(encode(S,K))

1111111101
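
For comparison, a textbook Rice code writes the remainder in exactly K bits, so that codewords can be concatenated without a separator. The following is a minimal sketch of that variant, assuming non-negative integer input. The name `encode_fixed` is hypothetical and is not used elsewhere in this notebook.

```python
def encode_fixed(S, K):
    """Rice-encode a non-negative integer with a zero-padded, K-bit remainder."""
    if S < 0:
        raise ValueError("S must be non-negative")
    Q, R = divmod(S, 2 ** K)
    unary = "1" * Q + "0"  # Q ones and a terminating zero
    # "0{K}b" pads the remainder to K binary digits; with K = 0 there is no remainder part
    binary = format(R, "0{}b".format(K)) if K > 0 else ""
    return unary + binary

print(encode_fixed(129, 4))  # 1111111100001
```

With the padded remainder, the same input 129 takes 13 bits instead of the 10 characters produced by `encode`, but the decoder no longer needs a line break to know where a codeword ends.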


# Decoder

Decoding reverses the encoding process: it takes an encoded value together with the K value and recovers the original number. For instance, in our previous example, where S = 129 and K = 4, the encoded value is 1111111101. The decoder counts the leading 1s to recover Q, reads the bits after the first 0 as R, and computes Q * M + R to get back 129.

In [2]:
# this encoder/decoder pair is only used on 8-bit values (0 to 255) in this notebook
def decode(R2R1, K):
    M = 2**K
    # drop the trailing newline when the encoded value is read from a file
    R2R1 = str(R2R1).strip()

    # the first '0' terminates the unary part, so its position equals Q
    Q = R2R1.index('0')
    # the rest is the remainder; restore the leading zeros up to K bits
    R1 = R2R1[Q+1:].zfill(K)

    intR1 = int(R1, 2)
    S_prime = Q*M+intR1

    return str(S_prime)

R2R1 = '1111111101'
K = 4
print(decode(R2R1,K))

129
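
Since the encoder is later applied to byte values only, a quick way to gain confidence in the pair is to round-trip every value from 0 to 255 for each K used in this notebook. The sketch below restates the two functions compactly so that it runs on its own; `round_trip_ok` is a hypothetical helper name.

```python
def encode(S, K):
    # compact restatement of the encoder above (remainder is not zero-padded)
    Q, R = divmod(S, 2 ** K)
    return "1" * Q + "0" + format(R, "b")

def decode(R2R1, K):
    # compact restatement of the decoder above
    R2R1 = str(R2R1).strip()
    Q = R2R1.index("0")  # position of the first '0' equals the quotient
    return str(Q * 2 ** K + int(R2R1[Q + 1:], 2))

def round_trip_ok(K):
    """Check that every byte value survives encode -> write line -> decode."""
    return all(int(decode(encode(v, K) + "\n", K)) == v for v in range(256))

print(all(round_trip_ok(K) for K in (2, 4, 6, 8)))  # True
```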


# Information about Sound1.wav

In [3]:
import numpy as np
import soundfile as sf

wave_file = sf.SoundFile('Sound1.wav')
print('Sample rate: {}'.format(wave_file.samplerate))
print('Channels: {}'.format(wave_file.channels))
print('Subtype: {}'.format(wave_file.subtype))

s1samples, fs = sf.read('Sound1.wav', dtype='int16')

print(np.shape(s1samples))
print(s1samples)

Sample rate: 44100
Channels: 1
Subtype: PCM_16
(501022,)
[-7 -7 -7 ...  0  2  1]


Here, we have extracted information about Sound1.wav.
Its sample rate is 44100 Hz, it has 1 channel, and its samples are 16-bit PCM.
However, the 16-bit samples are signed and can be large in value, so Rice encoding them directly would give a very high compression rate (a very large encoded file), and the encoder above does not handle negative values. Instead, we read the raw frame data as bytes and Rice encode each byte into the .ex2 file. This is still lossless and gives a lower compression rate, since each byte is a value between 0 and 255.
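
To see where the values between 0 and 255 come from: a PCM_16 WAV file stores each sample as two bytes in little-endian two's complement order. The sketch below uses the standard `struct` module to show the two byte values behind a single sample; `sample_to_bytes` is a hypothetical helper, not part of the notebook.

```python
import struct

def sample_to_bytes(sample):
    """Return the two byte values of one 16-bit little-endian PCM sample."""
    return list(struct.pack("<h", sample))

print(sample_to_bytes(-7))  # [249, 255]
print(sample_to_bytes(2))   # [2, 0]
```

Note that a small negative sample such as -7 turns into the large byte values 249 and 255, which produce long unary parts when K is small.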

# Encoding with Sound1.wav

## K = 4

In [4]:
import wave

# open wav file to read frames
audio_file = wave.open("Sound1.wav", mode="rb")
frames = audio_file.readframes(audio_file.getnframes())

# open file to write encoded bits into
newfile = open("Sound1_Enc_k4.ex2", "w")
K = 4

for i in range(len(frames)):
    encoded = encode(frames[i], K)
    newfile.write(encoded + "\n")
print('End of write')

# close files
newfile.close()
audio_file.close()

End of write


Since we store the encoded values as a bitstring in the .ex2 file (one text character per bit, one encoded value per line), we have created our new file Sound1_Enc_k4.ex2, which holds the encoded values when K = 4.

# Decoding with Sound1.wav

In [5]:
# open encoded file to decode
encodedfile = open("Sound1_Enc_k4.ex2", "r")
K = 4
# creating a bytearray variable
newbytes = bytearray()

# decoding process and storing the decoded values into the bytearray
for line in encodedfile:
    if line != '\n':
        decoded = decode(line, K)
        newbytes.append(int(decoded))

# close file
encodedfile.close()
inbytes = bytes(newbytes)

Each decoded value is appended to a bytearray, which we then convert to bytes (bytearray is mutable, while bytes is immutable). The resulting bytes object is what we write into the new encoded-then-decoded wav file and compare against the original frames.

# Saving to new .wav file

In [6]:
# open original wav file and new enc_dec wav file
audio_file = wave.open("Sound1.wav", mode="rb")
newWav = wave.open('Sound1_Enc_Dec_k4.wav', "wb")

# write bytes and set params for the new wav file
newWav.setparams(audio_file.getparams())
newWav.writeframes(inbytes)

# close files
newWav.close()
audio_file.close()

# Checking the files are the same

In [7]:
import numpy as np
import soundfile as sf

wave_file = sf.SoundFile('Sound1_Enc_Dec_k4.wav')
print('Sample rate: {}'.format(wave_file.samplerate))
print('Channels: {}'.format(wave_file.channels))
print('Subtype: {}'.format(wave_file.subtype))

s1k4samples, fs = sf.read('Sound1_Enc_Dec_k4.wav', dtype='int16')

print(np.shape(s1k4samples))
print(s1k4samples)

Sample rate: 44100
Channels: 1
Subtype: PCM_16
(501022,)
[-7 -7 -7 ...  0  2  1]


In [8]:
if frames == inbytes:
    print("Data is the same")

Data is the same


As can be seen above, our Enc_Dec file has the **same data** as well as the same **file size** as the original. Thus, the encode/decode round trip is lossless.

# % Compression

In terms of file size, we calculate the compression rate as the encoded file size expressed as a percentage of the original file size. A value above 100% therefore means that the encoded file is larger than the original. The compression rate is calculated as follows:

12MB (encoded file size) / 1MB (original file size) * 100 = 1200%
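
The same calculation can be done from the files on disk rather than from rounded sizes. This is a minimal sketch using `os.path.getsize`; the helper name `compression_percent` is an assumption, and the example call is left commented out because it requires the files to exist.

```python
import os

def compression_percent(encoded_path, original_path):
    """Encoded file size as a percentage of the original file size."""
    encoded_size = os.path.getsize(encoded_path)
    original_size = os.path.getsize(original_path)
    return encoded_size / original_size * 100

# Example call, assuming both files exist in the working directory:
# print(compression_percent("Sound1_Enc_k4.ex2", "Sound1.wav"))
```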

## K = 2

In [9]:
import wave

# open wav file to read frames
audio_file = wave.open("Sound1.wav", mode="rb")
frames = audio_file.readframes(audio_file.getnframes())

# open file to write encoded bits into
newfile = open("Sound1_Enc_k2.ex2", "w")
K = 2

for i in range(len(frames)):
    encoded = encode(frames[i], K)
    newfile.write(encoded + "\n")
print('End of write')

# close files
newfile.close()
audio_file.close()

End of write


Since we store the encoded values as a bitstring in the .ex2 file, we have created our new file Sound1_Enc_k2.ex2, which holds the encoded values when K = 2.

# Decoding with Sound1.wav

In [10]:
# open encoded file to decode
encodedfile = open("Sound1_Enc_k2.ex2", "r")
K = 2
# creating a bytearray variable
newbytes = bytearray()

# decoding process and storing the decoded values into the bytearray
for line in encodedfile:
    if line != '\n':
        decoded = decode(line, K)
        newbytes.append(int(decoded))

# close file
encodedfile.close()
inbytes = bytes(newbytes)

Each decoded value is again appended to a bytearray, which we then convert to bytes before writing the new encoded-then-decoded wav file.

# Saving to new .wav file

In [11]:
# open original wav file and new enc_dec wav file
audio_file = wave.open("Sound1.wav", mode="rb")
newWav = wave.open('Sound1_Enc_Dec_k2.wav', "wb")

# write bytes and set params for the new wav file
newWav.setparams(audio_file.getparams())
newWav.writeframes(inbytes)

# close files
newWav.close()
audio_file.close()

# Checking the files are the same

In [12]:
import numpy as np
import soundfile as sf

wave_file = sf.SoundFile('Sound1_Enc_Dec_k2.wav')
print('Sample rate: {}'.format(wave_file.samplerate))
print('Channels: {}'.format(wave_file.channels))
print('Subtype: {}'.format(wave_file.subtype))

s1k2samples, fs = sf.read('Sound1_Enc_Dec_k2.wav', dtype='int16')

print(np.shape(s1k2samples))
print(s1k2samples)

Sample rate: 44100
Channels: 1
Subtype: PCM_16
(501022,)
[-7 -7 -7 ...  0  2  1]


In [13]:
if frames == inbytes:
    print("Data is the same")

Data is the same


As can be seen above, our Enc_Dec file has the **same data** as well as the same **file size** as the original. Thus, the encode/decode round trip is lossless for K = 2 as well.

# % Compression

In terms of file size, the compression rate is again the encoded file size as a percentage of the original file size:

33.4MB (encoded file size) / 1MB (original file size) * 100 = 3340%

We then repeat the same process for Sound2.wav, with K = 4 and K = 2.

# Information about Sound2.wav

In [14]:
import numpy as np
import soundfile as sf

wave_file = sf.SoundFile('Sound2.wav')
print('Sample rate: {}'.format(wave_file.samplerate))
print('Channels: {}'.format(wave_file.channels))
print('Subtype: {}'.format(wave_file.subtype))

s2samples, fs = sf.read('Sound2.wav', dtype='int16')

print(np.shape(s2samples))
print(s2samples)

Sample rate: 44100
Channels: 1
Subtype: PCM_16
(504000,)
[ -999   886 -1325 ...    31  -876   339]


Here, we have extracted information about Sound2.wav.
Its sample rate is 44100 Hz, it has 1 channel, and its samples are 16-bit PCM, much like Sound1.wav; only the sample values differ.
However, the 16-bit samples are signed and can be large in value, so Rice encoding them directly would give a very high compression rate. Instead, we read the raw frame data as bytes and Rice encode each byte into the .ex2 file. This is still lossless and gives a lower compression rate, since each byte is a value between 0 and 255.

# Encoding with Sound2.wav

## K = 4

In [15]:
import wave

# opens wav file to read frames
audio_file = wave.open("Sound2.wav", mode="rb")
frames = audio_file.readframes(audio_file.getnframes())

# open file to write encoded bits into
newfile = open("Sound2_Enc_k4.ex2", "w")
K = 4

for i in range(len(frames)):
    encoded = encode(frames[i], K)
    newfile.write(encoded + "\n")
print('End of write')

# close files
newfile.close()
audio_file.close()

End of write


Since we store the encoded values as a bitstring in the .ex2 file, we have created our new file Sound2_Enc_k4.ex2, which holds the encoded values when K = 4.

# Decoding with Sound2.wav

In [16]:
encodedfile = open("Sound2_Enc_k4.ex2", "r")
K = 4
newbytes = bytearray()
for line in encodedfile:
    if line != '\n':
        decoded = decode(line, K)
        newbytes.append(int(decoded))

# close file
encodedfile.close()
inbytes = bytes(newbytes)

Each decoded value is again appended to a bytearray, which we then convert to bytes before writing the new encoded-then-decoded wav file.

# Saving to new .wav file

In [17]:
audio_file = wave.open("Sound2.wav", mode="rb")
newWav = wave.open('Sound2_Enc_Dec_k4.wav', 'wb')

newWav.setparams(audio_file.getparams())
newWav.writeframes(inbytes)
newWav.close()
audio_file.close()

# Checking the files are the same

In [18]:
import numpy as np
import soundfile as sf

wave_file = sf.SoundFile('Sound2_Enc_Dec_k4.wav')
print('Sample rate: {}'.format(wave_file.samplerate))
print('Channels: {}'.format(wave_file.channels))
print('Subtype: {}'.format(wave_file.subtype))

s2k4samples, fs = sf.read('Sound2_Enc_Dec_k4.wav', dtype='int16')

print(np.shape(s2k4samples))
print(s2k4samples)

Sample rate: 44100
Channels: 1
Subtype: PCM_16
(504000,)
[ -999   886 -1325 ...    31  -876   339]


In [19]:
if frames == inbytes:
    print("Data is the same")

Data is the same


# % Compression

In terms of file size, the compression rate is again the encoded file size as a percentage of the original file size:

12.7MB (encoded file size) / 1.01MB (original file size) * 100 = 1257%

## K = 2

In [20]:
import wave

# opens wav file to read frames
audio_file = wave.open("Sound2.wav", mode="rb")
frames = audio_file.readframes(audio_file.getnframes())

# open file to write encoded bits into
newfile = open("Sound2_Enc_k2.ex2", "w")
K = 2

for i in range(len(frames)):
    encoded = encode(frames[i], K)
    newfile.write(encoded + "\n")
print('End of write')

# close files
newfile.close()
audio_file.close()

End of write


Since we store the encoded values as a bitstring in the .ex2 file, we have created our new file Sound2_Enc_k2.ex2, which holds the encoded values when K = 2.

# Decoding with Sound2.wav

In [21]:
encodedfile = open("Sound2_Enc_k2.ex2", "r")
K = 2
newbytes = bytearray()
for line in encodedfile:
    if line != '\n':
        decoded = decode(line, K)
        newbytes.append(int(decoded))

# close file
encodedfile.close()
inbytes = bytes(newbytes)

Each decoded value is again appended to a bytearray, which we then convert to bytes before writing the new encoded-then-decoded wav file.

# Saving to new .wav file

In [22]:
audio_file = wave.open("Sound2.wav", mode="rb")
newWav = wave.open('Sound2_Enc_Dec_k2.wav', 'wb')

newWav.setparams(audio_file.getparams())
newWav.writeframes(inbytes)
newWav.close()
audio_file.close()

# Checking the files are the same

In [23]:
import numpy as np
import soundfile as sf

wave_file = sf.SoundFile('Sound2_Enc_Dec_k2.wav')
print('Sample rate: {}'.format(wave_file.samplerate))
print('Channels: {}'.format(wave_file.channels))
print('Subtype: {}'.format(wave_file.subtype))

s2k2samples, fs = sf.read('Sound2_Enc_Dec_k2.wav', dtype='int16')

print(np.shape(s2k2samples))
print(s2k2samples)

Sample rate: 44100
Channels: 1
Subtype: PCM_16
(504000,)
[ -999   886 -1325 ...    31  -876   339]


In [24]:
if frames == inbytes:
    print("Data is the same")

Data is the same


# % Compression

In terms of file size, the compression rate is again the encoded file size as a percentage of the original file size:

35.3MB (encoded file size) / 1.01MB (original file size) * 100 = 3495%

# Further Development

Possible ways to improve the compression rate are to set a larger K value, to reduce the range of the samples, or to save the file in a different way.

### 1. Larger K

Since Sound2.wav produces a larger compression % than Sound1.wav, we start by testing how far a larger K reduces the compression % on Sound2.wav.

#### Sound2.wav (K = 6)

In [25]:
import wave

# opens wav file to read frames
audio_file = wave.open("Sound2.wav", mode="rb")
frames = audio_file.readframes(audio_file.getnframes())

# open file to write encoded bits into
newfile = open("Sound2_Enc_k6.ex2", "w")
K = 6

for i in range(len(frames)):
    encoded = encode(frames[i], K)
    newfile.write(encoded + "\n")
print('End of write')

# close files
newfile.close()
audio_file.close()

End of write


With K = 6, the compression % is significantly reduced. For K = 2 and K = 4 it was more than 1000%; with K = 6 it falls to roughly 800%. In terms of file size, the compression % is 8.48 MB / 1.01 MB * 100 ≈ 839.6%. Let us try a higher K value of 8.

#### Sound2.wav (K = 8)

In [26]:
import wave

# opens wav file to read frames
audio_file = wave.open("Sound2.wav", mode="rb")
frames = audio_file.readframes(audio_file.getnframes())

# open file to write encoded bits into
newfile = open("Sound2_Enc_k8.ex2", "w")
K = 8

for i in range(len(frames)):
    encoded = encode(frames[i], K)
    newfile.write(encoded + "\n")
print('End of write')

# close files
newfile.close()
audio_file.close()

End of write


With K = 8, the compression % is again roughly 800%, well below the more than 1000% seen for K = 2 and K = 4. In terms of file size, the compression % is 8.56 MB / 1.01 MB * 100 ≈ 847.5%, which is slightly higher than for K = 6.
This shows that increasing K can improve the compression rate. However, the optimal K value differs from one sound file to another. A larger K does not always reduce the compression rate: comparing K = 6 and K = 8 here, K = 6 suits Sound2.wav better than K = 8. The cells below illustrate this further with Sound1.wav.

#### Sound1.wav (K = 6)

In [27]:
import wave

# opens wav file to read frames
audio_file = wave.open("Sound1.wav", mode="rb")
frames = audio_file.readframes(audio_file.getnframes())

# open file to write encoded bits into
newfile = open("Sound1_Enc_k6.ex2", "w")
K = 6

for i in range(len(frames)):
    encoded = encode(frames[i], K)
    newfile.write(encoded + "\n")
print('End of write')

# close files
newfile.close()
audio_file.close()

End of write


With K = 6, the compression % is significantly reduced. For K = 2 and K = 4 it was more than 1000%; with K = 6 it falls to roughly 700%. In terms of file size, using the 1 MB original size of Sound1.wav, the compression % is 7.59 MB / 1 MB * 100 = 759%. Let us try a higher K value of 8.

#### Sound1.wav (K = 8)

In [28]:
import wave

# opens wav file to read frames
audio_file = wave.open("Sound1.wav", mode="rb")
frames = audio_file.readframes(audio_file.getnframes())

# open file to write encoded bits into
newfile = open("Sound1_Enc_k8.ex2", "w")
K = 8

for i in range(len(frames)):
    encoded = encode(frames[i], K)
    newfile.write(encoded + "\n")
print('End of write')

# close files
newfile.close()
audio_file.close()

End of write


With K = 8, the compression % is again roughly 700%, compared with more than 1000% for K = 2 and K = 4. In terms of file size, using the 1 MB original size of Sound1.wav, the compression % is 7.4 MB / 1 MB * 100 = 740%, which is slightly lower than for K = 6.
This supports our point that a larger K can improve the compression rate and that the optimal K value differs between sound files: among the values tested, Sound1.wav does best with K = 8, while Sound2.wav does best with K = 6.
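
Instead of writing one .ex2 file per K, the encoded size can be estimated directly from the byte values. The sketch below counts the characters that the text format used in this notebook would produce, assuming each `"\n"` is written as a single byte. The names `ex2_text_size` and `size_by_k` are hypothetical, and the sample data is purely illustrative.

```python
def ex2_text_size(data, K):
    """Characters written for `data`: one per code bit plus one newline per value."""
    M = 2 ** K
    total = 0
    for value in data:
        Q, R = divmod(value, M)
        # Q unary ones, one terminating zero, unpadded binary remainder, newline
        total += Q + 1 + len(format(R, "b")) + 1
    return total

def size_by_k(data, ks=range(0, 9)):
    """Estimated encoded size for each candidate K."""
    return {K: ex2_text_size(data, K) for K in ks}

sample = bytes([249, 255, 2, 0])  # illustrative bytes, not taken from the files
print(size_by_k(sample))
```

With the real files, `data` would be the `frames` object returned by `readframes`, and the K with the smallest estimate is the best choice for that file under this storage format.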

### 2. Reducing range of samples

This will not be implemented in the notebook, as it goes beyond the syllabus covered so far, but the theory behind the idea is explained here.

Rice coding is commonly used as a residual coder, that is, to compress residuals. Residuals are the errors between the predictions of a model and the real signal. To implement this, a prediction algorithm, for example a linear predictor, is needed to produce the list of residuals. The residuals are typically much smaller in magnitude than the original samples, whose values can be as high as about 2000, so they can be encoded with fewer bits. The encoded file can then become significantly smaller than the original file.

A lower compression rate can be achieved by taking one or both of the following measures:
- Apply Rice encoding to residuals (instead of audio samples directly)
- Scale down audio samples before applying Rice encoding
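
The first measure can be sketched as follows, assuming the simplest possible predictor (each sample is predicted to equal the previous one) and a zigzag mapping to turn signed residuals into the non-negative integers that Rice coding expects. All function names here are hypothetical, and this is only one of many possible predictors, not a method used elsewhere in this notebook.

```python
def residuals(samples):
    """First-order prediction: the residual is the sample minus the previous sample."""
    prev = 0
    out = []
    for s in samples:
        out.append(s - prev)
        prev = s
    return out

def reconstruct(res):
    """Invert residuals() by accumulating the residuals."""
    prev = 0
    out = []
    for r in res:
        prev += r
        out.append(prev)
    return out

def zigzag(n):
    """Map signed to non-negative integers: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return 2 * n if n >= 0 else -2 * n - 1

def unzigzag(u):
    """Invert zigzag()."""
    return u // 2 if u % 2 == 0 else -(u + 1) // 2

samples = [-7, -7, -7, 0, 2, 1]            # illustrative values
mapped = [zigzag(r) for r in residuals(samples)]
print(mapped)                               # [13, 0, 0, 14, 4, 1]
print(reconstruct([unzigzag(u) for u in mapped]) == samples)  # True
```

The mapped values are what would be passed to the Rice encoder. For slowly varying signals they stay small, which keeps the unary part short.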

### 3. Saving file

If we are concerned with the file size, note that the data was saved as a bitstring in a text file, so each bit occupies a full character. We could try saving it as bytes instead to get a smaller file. However, to be saved as bytes, the encoded bitstring must first be converted to integers. This makes the Rice encoding redundant and increases the time complexity of the application, since the data could be converted into bytes directly or is already in bytes. Thus, a better alternative would be to use residual coding with an appropriate prediction algorithm, or to compress the audio using another file format such as FLAC.
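
To make the size difference between the two storage forms concrete, the sketch below packs a string of '0'/'1' characters into bytes. The helper `pack_bits` is hypothetical, and the example codeword assumes a remainder padded to K bits, which differs from the `encode` function above. Decoding such a packed stream would also require knowing how many values it holds, because the last byte is padded with zeros.

```python
def pack_bits(bitstring):
    """Pack '0'/'1' characters into bytes, padding the last byte with zeros."""
    padded = bitstring + "0" * (-len(bitstring) % 8)
    return bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))

codeword = "1111111100001"              # 129 with K = 4 and a 4-bit remainder
print(len(codeword.encode("ascii")))    # 13 bytes when stored as text
print(len(pack_bits(codeword)))         # 2 bytes when packed
```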