# Audio Basics

A walkthrough of the different functions in pyramidman that handle recording, playing, and processing audio. It is mainly for didactic and testing purposes.

In [1]:
%load_ext autoreload
%autoreload 2

from pyramidman.audio_parameters import AudioParameters
from pyramidman.basic_audio_IO import play_audio, record_audio
from pyramidman.audio_utils import get_available_microphones, get_sysdefault_microphone_index, get_all_devices_str
from pyramidman.queue_utils import record_with_queue
from pyramidman.unwrapper import unwrap

from pyramidman.Ihy import get_audio_menu_wav_file

from pyramidman.audio_utils import calibrate_microphone


import speech_recognition as sr
import numpy as np
from scipy import signal
from scipy.io import wavfile

import plotly.graph_objs as go
from IPython.display import display
import ipywidgets as widgets

### Instance of the class AudioParameters 

In [2]:
audio_params = AudioParameters()
unwrap(audio_params)

<AudioParameters>	object has children:
    <int>	chunk:	1024
    <int>	sample_format:	8
    <NoneType>	subtype:	None
    <int>	channels:	1
    <int>	sample_rate:	48000
    <int>	input_device_index:	0
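
To make these fields concrete, the sketch below derives a couple of quantities from them. It assumes that `sample_format: 8` is PyAudio's `paInt16` constant (16-bit samples, 2 bytes each). The `ChunkInfo` class is hypothetical and is not part of pyramidman.

```python
from dataclasses import dataclass

# Assumption: sample_format 8 corresponds to pyaudio.paInt16 (2 bytes per sample).
BYTES_PER_SAMPLE = {8: 2}


@dataclass
class ChunkInfo:
    """Hypothetical helper mirroring the fields printed above."""
    chunk: int = 1024
    sample_format: int = 8
    channels: int = 1
    sample_rate: int = 48000

    @property
    def chunk_duration_ms(self) -> float:
        # Time covered by one chunk of frames.
        return 1000.0 * self.chunk / self.sample_rate

    @property
    def chunk_size_bytes(self) -> int:
        # Frames per chunk x channels x bytes per sample.
        return self.chunk * self.channels * BYTES_PER_SAMPLE[self.sample_format]


info = ChunkInfo()
print(f"{info.chunk_duration_ms:.2f} ms per chunk")
print(f"{info.chunk_size_bytes} bytes per chunk")
```

With these defaults each chunk covers roughly 21 ms of audio, which is the granularity at which the recording and playing loops work.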




In [3]:
audio_params.get_input_device_info()

{'name': 'HDA Intel PCH: ALC295 Analog (hw:0,0)',
 'hostapi': 0,
 'max_input_channels': 2,
 'max_output_channels': 2,
 'default_low_input_latency': 0.005804988662131519,
 'default_low_output_latency': 0.005804988662131519,
 'default_high_input_latency': 0.034829931972789115,
 'default_high_output_latency': 0.034829931972789115,
 'default_samplerate': 44100.0}

## Information on available devices

In [4]:
get_all_devices_str()

   0 HDA Intel PCH: ALC295 Analog (hw:0,0), ALSA (2 in, 2 out)
   1 HDA Intel PCH: HDMI 0 (hw:0,3), ALSA (0 in, 8 out)
   2 HDA Intel PCH: HDMI 1 (hw:0,7), ALSA (0 in, 8 out)
   3 HDA Intel PCH: HDMI 2 (hw:0,8), ALSA (0 in, 8 out)
   4 HDA Intel PCH: HDMI 3 (hw:0,9), ALSA (0 in, 8 out)
   5 HDA Intel PCH: HDMI 4 (hw:0,10), ALSA (0 in, 8 out)
   6 sysdefault, ALSA (128 in, 128 out)
   7 front, ALSA (0 in, 2 out)
   8 surround40, ALSA (0 in, 2 out)
   9 surround51, ALSA (0 in, 2 out)
  10 surround71, ALSA (0 in, 2 out)
  11 hdmi, ALSA (0 in, 8 out)
  12 pulse, ALSA (32 in, 32 out)
  13 dmix, ALSA (0 in, 2 out)
* 14 default, ALSA (32 in, 32 out)

In [5]:
devices_dict = get_available_microphones()
devices_dict

{'0': 'HDA Intel PCH: ALC295 Analog (hw:0,0)',
 '6': 'sysdefault',
 '12': 'pulse',
 '14': 'default'}
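
The dictionary keeps only the devices that can capture audio. One plausible way to build it is to keep the entries whose `max_input_channels` is positive, given device descriptions shaped like the one returned by `get_input_device_info()` above. The helper `filter_input_devices` and the `index` key are assumptions made for illustration and may differ from what pyramidman does.

```python
from typing import Dict, List

# A few entries copied from the device listing above, reduced to the relevant keys.
# The "index" key is an assumption added for this sketch.
devices: List[dict] = [
    {"index": 0, "name": "HDA Intel PCH: ALC295 Analog (hw:0,0)", "max_input_channels": 2},
    {"index": 1, "name": "HDA Intel PCH: HDMI 0 (hw:0,3)", "max_input_channels": 0},
    {"index": 6, "name": "sysdefault", "max_input_channels": 128},
    {"index": 7, "name": "front", "max_input_channels": 0},
    {"index": 12, "name": "pulse", "max_input_channels": 32},
    {"index": 14, "name": "default", "max_input_channels": 32},
]


def filter_input_devices(device_list: List[dict]) -> Dict[str, str]:
    """Keep devices with at least one input channel, keyed by index as a string."""
    return {
        str(d["index"]): d["name"]
        for d in device_list
        if d["max_input_channels"] > 0
    }


microphones = filter_input_devices(devices)
print(microphones)
```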

## Play audio

We pass it an AudioParameters instance, from which it reads the chunk size.

In [6]:
file_to_play = "../audios/standard/english.wav"
play_audio(audio_params, filename=file_to_play)
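
The chunk size determines how many frames are read from the file at a time. The sketch below shows that loop using only the standard `wave` module: it writes one second of silence to a temporary file and reads it back in chunks of 1024 frames. Nothing is sent to a sound card, and `iter_wav_chunks` is a hypothetical helper, not the body of `play_audio`.

```python
import os
import tempfile
import wave


def iter_wav_chunks(path: str, chunk: int = 1024):
    """Yield successive blocks of at most `chunk` frames from a WAV file."""
    with wave.open(path, "rb") as wf:
        while True:
            data = wf.readframes(chunk)
            if not data:
                break
            yield data


# Build a 1 s, mono, 16-bit, 48 kHz file of silence to read back.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "silence.wav")
with wave.open(demo_path, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(48000)
    wf.writeframes(b"\x00\x00" * 48000)

chunks = list(iter_wav_chunks(demo_path))
print(len(chunks), "chunks; last one holds", len(chunks[-1]) // 2, "frames")
```

In a real player, each yielded block would be written to an output stream instead of being collected in a list.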

## Recording audio

First we should select a suitable microphone.

In [7]:
audio_params.set_sysdefault_microphone_index()
audio_params.set_default_input_parameters()
unwrap(audio_params)

<AudioParameters>	object has children:
    <int>	chunk:	1024
    <int>	sample_format:	8
    <NoneType>	subtype:	None
    <int>	channels:	1
    <int>	sample_rate:	48000
    <int>	input_device_index:	6




### Call the recording function with the correct parameters

In [8]:
file_to_record = "../audios/temp/recording.wav"
record_audio(audio_params, seconds = 3, filename = file_to_record)

Recording
Finished recording
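
A fixed-length recording boils down to reading a known number of chunks and writing them to a WAV file. The sketch below assumes 16-bit samples and replaces the microphone with blocks of silence so that it runs without audio hardware. `chunks_for_duration` and `save_frames` are hypothetical names, not pyramidman functions.

```python
import os
import tempfile
import wave


def chunks_for_duration(sample_rate: int, chunk: int, seconds: float) -> int:
    """Number of whole chunks needed to cover `seconds` of audio."""
    return int(sample_rate / chunk * seconds)


def save_frames(path, frames, channels=1, sample_width=2, sample_rate=48000):
    """Write a list of raw PCM byte blocks to a WAV file."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)
        wf.setframerate(sample_rate)
        wf.writeframes(b"".join(frames))


n_chunks = chunks_for_duration(sample_rate=48000, chunk=1024, seconds=3)
# Stand-in for reading from an input stream: one chunk of 16-bit mono silence.
frames = [b"\x00\x00" * 1024 for _ in range(n_chunks)]

out_path = os.path.join(tempfile.mkdtemp(), "recording.wav")
save_frames(out_path, frames)

with wave.open(out_path, "rb") as wf:
    recorded_seconds = wf.getnframes() / wf.getframerate()
print(n_chunks, "chunks,", round(recorded_seconds, 3), "seconds")
```

Because only whole chunks are read, the saved file in this sketch is slightly shorter than the requested 3 seconds.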


Play the recorded audio

In [9]:
play_audio(audio_params,  filename = file_to_record)

### Record using queues

Recording through a queue seems to work better: there are far fewer cuts in the audio, probably because processing the chunks introduces less delay. Still, as the recording gets longer, it takes too much time.

In [10]:
filename_w = "../audios/temp/caec2.wav"
record_with_queue(audio_params, filename_w)


Recording finished: 
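
The idea behind the queue is to decouple capturing from processing: one thread only reads chunks and puts them on a queue, while another takes them off and does the slower work. A minimal sketch with the standard `queue` and `threading` modules follows. The microphone is simulated with numbered byte blocks, and none of these names come from `record_with_queue`.

```python
import queue
import threading

CHUNK_BYTES = 2048      # 1024 frames of 16-bit mono audio
N_CHUNKS = 50           # length of the simulated recording
STOP = None             # sentinel that tells the consumer to finish


def producer(q: queue.Queue) -> None:
    """Stand-in for the capture loop: push chunks as fast as they arrive."""
    for i in range(N_CHUNKS):
        q.put(bytes([i % 256]) * CHUNK_BYTES)
    q.put(STOP)


def consumer(q: queue.Queue, collected: list) -> None:
    """Drain the queue and keep the chunks (e.g. to write them to disk)."""
    while True:
        item = q.get()
        if item is STOP:
            break
        collected.append(item)


audio_queue: queue.Queue = queue.Queue()
received: list = []
threads = [
    threading.Thread(target=producer, args=(audio_queue,)),
    threading.Thread(target=consumer, args=(audio_queue, received)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

in_order = all(chunk[0] == i % 256 for i, chunk in enumerate(received))
print(len(received), "chunks received, in order:", in_order)
```

The capture side never waits for the slower consumer, which is one way to avoid dropped chunks. The queue itself can still grow during a long recording if the consumer cannot keep up.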


In [11]:
play_audio(audio_params, filename_w)

## Using a Microphone

The speech_recognition library has a Microphone class that is helpful for recording data. We can get an instance directly from the AudioParameters object.

In [12]:
mic = audio_params.get_microphone()

In [13]:
mic.device_index

6

In [14]:
mic.list_microphone_names()

['HDA Intel PCH: ALC295 Analog (hw:0,0)',
 'HDA Intel PCH: HDMI 0 (hw:0,3)',
 'HDA Intel PCH: HDMI 1 (hw:0,7)',
 'HDA Intel PCH: HDMI 2 (hw:0,8)',
 'HDA Intel PCH: HDMI 3 (hw:0,9)',
 'HDA Intel PCH: HDMI 4 (hw:0,10)',
 'sysdefault',
 'front',
 'surround40',
 'surround51',
 'surround71',
 'hdmi',
 'pulse',
 'dmix',
 'default']

We can use this instance together with the library's Recognizer to capture audio.

In [15]:
r = sr.Recognizer()
with mic as source:                          # use the selected microphone as the audio source
    audio = r.record(source, duration=3)     # record 3 seconds and store them as AudioData

audio

<speech_recognition.AudioData at 0x7f13a9416a50>

In [16]:
unwrap(audio)

<AudioData>	object has children:
    <bytes>	frame_data
    <int>	sample_rate:	48000
    <int>	sample_width:	2

  <bytes>	frame_data has children:




In [17]:
filename_mic = '../audios/temp/hello_world.wav'

with mic as source:
    audio = r.record(source,duration = 5)

with open(filename_mic, "wb") as f:
    f.write(audio.get_wav_data())

In [18]:
play_audio(audio_params, filename_mic)

### Convert audio to numpy array

In [19]:
audio_raw_data = audio.frame_data      # raw PCM bytes stored in the AudioData object
audio_raw_data = audio.get_raw_data()  # same bytes when no conversion arguments are given
type(audio_raw_data)

bytes

In [20]:
audio_array = np.frombuffer(audio.frame_data, np.int16)
audio_array

array([-1502, -1436, -1600, ..., -4596, -4376, -4170], dtype=int16)
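
Once the samples are in an array, quantities such as duration and level follow directly. The short sketch below uses synthetic 16-bit data in place of `audio.frame_data` and assumes mono audio at the 48000 Hz sample rate shown above.

```python
import numpy as np

sample_rate = 48000

# Synthetic stand-in for audio.frame_data: 0.5 s of a 440 Hz tone as 16-bit PCM.
t = np.arange(int(0.5 * sample_rate)) / sample_rate
tone = (0.5 * 32767 * np.sin(2 * np.pi * 440 * t)).astype(np.int16)
frame_data = tone.tobytes()

# Same conversion as above: interpret the raw bytes as signed 16-bit integers.
samples = np.frombuffer(frame_data, dtype=np.int16)

duration_s = len(samples) / sample_rate
# Scale to floats in [-1, 1) before computing the level.
normalized = samples.astype(np.float32) / 32768.0
rms = float(np.sqrt(np.mean(normalized ** 2)))

print(f"{len(samples)} samples, {duration_s:.2f} s, RMS {rms:.3f}")
```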

## Plotting the audio waveform

In [21]:
tabs = get_audio_menu_wav_file(filename_mic)
display(tabs)

Tab(children=(FigureWidget({
    'data': [{'line': {'color': 'deepskyblue'},
              'name': 'AAPL High'…

## Own listener implementation

Since the listener from the speech_recognition library was not good enough, we have implemented another one.

The main changes are:
- Refactoring of code.
- Removing the snowboy option.
- Adding timestamp of the returned data.
- Adding 

In [22]:
from pyramidman.listener import listen

In [26]:
# Seconds of non-speaking audio kept before and after the phrase.
r.non_speaking_duration

# Seconds of non-speaking audio after which a phrase is considered complete.
r.pause_threshold

# Minimum seconds of speaking audio for it to count as a phrase.
r.phrase_threshold

# Minimum audio energy for a chunk to be considered speech.
r.energy_threshold

# Number of bytes per sample.
mic.SAMPLE_WIDTH

2
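
These parameters interact roughly as follows: a chunk counts as speech when its energy exceeds `energy_threshold`, and a phrase ends once the silence after speech lasts `pause_threshold` seconds. The sketch below illustrates that rule on a list of per-chunk energies. It is a simplified illustration with made-up numbers and hypothetical function names, not the code of speech_recognition or of pyramidman's `listen`.

```python
import math
import struct
from typing import List, Optional


def rms_energy(chunk: bytes) -> float:
    """Root-mean-square amplitude of a block of 16-bit little-endian samples."""
    count = len(chunk) // 2
    samples = struct.unpack(f"<{count}h", chunk)
    return math.sqrt(sum(s * s for s in samples) / count)


def find_phrase_end(energies: List[float], seconds_per_chunk: float,
                    energy_threshold: float, pause_threshold: float) -> Optional[int]:
    """Index of the chunk at which the phrase is considered finished, or None."""
    silent_chunks_needed = math.ceil(pause_threshold / seconds_per_chunk)
    started = False
    silent_run = 0
    for i, energy in enumerate(energies):
        if energy > energy_threshold:
            # Speech: the phrase has started and any pause is reset.
            started = True
            silent_run = 0
        elif started:
            # Silence after speech: count it towards the pause.
            silent_run += 1
            if silent_run >= silent_chunks_needed:
                return i
    return None


loud = struct.pack("<4h", 1000, -1000, 1000, -1000)
quiet = struct.pack("<4h", 10, -10, 10, -10)
# Two quiet chunks, three loud ones, then silence.
stream = [quiet, quiet, loud, loud, loud] + [quiet] * 6
energies = [rms_energy(c) for c in stream]

end = find_phrase_end(energies, seconds_per_chunk=0.25,
                      energy_threshold=300, pause_threshold=1.0)
print("phrase ends at chunk", end)
```

The leading quiet chunks are ignored because the phrase has not started yet. Only silence that follows speech counts towards the pause.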

In [29]:
with mic as source:
    audio = listen(r, source, timeout = 1, phrase_time_limit=5)

SyntaxError: keyword can't be an expression (<ipython-input-29-2126205d9c6b>, line 2)