
Audio manual update

nitslp-ri committed Jan 26, 2019
1 parent 90c557a commit 90da141a0230511c88b345920a6b4c755753e63d
Showing with 104 additions and 5 deletions.
  1. +9 −3 INSTALL.txt
  2. +93 −0 julius/Audio.md
  3. +2 −2 julius/Options.md
@@ -32,11 +32,17 @@ specifying "--prefix=..." to configure script.
Linux (tested on Ubuntu-14.04)

% sudo apt-get install build-essential zlib1g-dev libsdl2-dev
% sudo apt-get libasound2-dev (or libpulse-dev, whichever you like)
% sudo apt-get install libasound2-dev (or libpulse-dev, whichever you want to enable)
% ./configure
% make
% (optional) make install

If you want only one audio interface, use `--with-mictype=xxx`.

% ./configure --with-mictype=pulseaudio
% make
% (optional) make install

-----------------------------------------------
Mingw on Ubuntu (tested on 16.04)

@@ -47,7 +53,7 @@ cannot be build on Ubuntu.
% sudo apt-get install libz-mingw-w64-dev
% ./configure --host=x86_64-w64-mingw32 --disable-sdl
% make

-----------------------------------------------
Cygwin

@@ -115,5 +121,5 @@ your own compiler flags via "CFLAGS" environment value, like this:
% make

(tested on cross-compilation on Ubuntu)

-----------------------------------------------
@@ -0,0 +1,93 @@
# Audio Input

Julius can recognize audio from a file, a live audio device, or a TCP/IP network stream. A single source is chosen with the option [-input](https://github.com/julius-speech/julius/blob/master/julius/Options.md#-input-micfilerawfilemfcfileoutprobadinnetvecnetstdinnetaudioalsaossesdpulseaudio).

Input data must be 16-bit (signed short), monaural (1 channel).

Note that **the sampling rate of the input must match that of the training data** of the acoustic model. Data whose sampling rate differs from the acoustic model's will not be recognized correctly. Julius does no down-sampling or up-sampling of its own.
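Because Julius does not resample, it can help to check a file's header before recognition. The following is a minimal sketch using Python's standard `wave` module; the function name `check_wav` and the default of 16000 Hz for `expected_rate` are illustrative assumptions, not part of Julius.

```python
import wave


def check_wav(path, expected_rate=16000):
    """Return a list of problems found in the WAV header (empty if none).

    `expected_rate` should be the sampling rate of the acoustic model's
    training data; 16000 Hz is only an illustrative default here.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            problems.append(
                f"sample width is {8 * wav.getsampwidth()} bit, expected 16 bit"
            )
        if wav.getnchannels() != 1:
            problems.append(
                f"{wav.getnchannels()} channels, expected 1 (monaural)"
            )
        if wav.getframerate() != expected_rate:
            problems.append(
                f"sampling rate is {wav.getframerate()} Hz, "
                f"expected {expected_rate} Hz"
            )
    return problems
```

An empty list means the header agrees with the three requirements stated above; it says nothing about the audio content itself.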

Julius assumes a sampling rate of 16 kHz by default, so when using an acoustic model trained at another sampling rate, give the rate explicitly with either [-smpPeriod](https://github.com/julius-speech/julius/blob/master/julius/Options.md#-smpperiod-period) or [-smpFreq](https://github.com/julius-speech/julius/blob/master/julius/Options.md#-smpfreq-hz).
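The two options express the same quantity in different units. The sketch below converts a frequency in Hz to a period, assuming that `-smpPeriod` is given in units of 100 nanoseconds as Options.md describes it; the helper name is hypothetical.

```python
def smp_period_from_freq(freq_hz):
    """Convert a sampling frequency in Hz to a period in units of 100 ns.

    Assumes the -smpPeriod unit is 100 ns, so one second is 10,000,000 units.
    Rates that do not divide evenly are rounded to the nearest unit.
    """
    if freq_hz <= 0:
        raise ValueError("sampling frequency must be positive")
    return round(10_000_000 / freq_hz)


print(smp_period_from_freq(16000))  # 625
print(smp_period_from_freq(8000))   # 1250
```

Under that assumption, the default of 16 kHz corresponds to a period of 625.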

The following sections describe how to set up each kind of audio input:

## File Input

File input is chosen when specifying "`-input file`".
Supported file types are:

- WAV format (.wav), Linear PCM
- RAW format (no header), signed short (16bit), Big Endian

Other formats, such as .au and .nist, can be read through `libsndfile`. To enable this, install the `libsndfile` headers and libraries before building Julius.

Notes on RAW format:

- Samples should be in **Big Endian** byte order.
- The sampling rate cannot be checked, because a RAW file has no header information.
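WAV files store 16-bit samples in little-endian order, so producing a RAW file in the form described above means dropping the header and swapping each pair of bytes. The following sketch does this with the standard library only; the function name and file paths are illustrative assumptions.

```python
import wave


def wav_to_raw_big_endian(wav_path, raw_path):
    """Write the samples of a 16-bit mono WAV file as headerless big-endian data.

    Returns the sampling rate read from the WAV header, since the RAW
    file itself cannot carry that information.
    """
    with wave.open(wav_path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected a 16-bit monaural WAV file")
        rate = wav.getframerate()
        little = wav.readframes(wav.getnframes())

    # Swap the two bytes of every sample: little-endian -> big-endian.
    big = bytearray(len(little))
    big[0::2] = little[1::2]
    big[1::2] = little[0::2]

    with open(raw_path, "wb") as raw:
        raw.write(big)
    return rate
```

Keep the returned rate alongside the RAW file, because nothing in the file records it.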

Voice activity detection (VAD) is disabled by default for file input, and each file is treated as a single utterance. VAD can be enabled with the option [-cutsilence](https://github.com/julius-speech/julius/blob/master/julius/Options.md#-cutsilence--nocutsilence) to detect the voiced parts, in the same way as for live audio capture.

Batch processing is possible by giving a list of input files with the option [-filelist](https://github.com/julius-speech/julius/blob/master/julius/Options.md#-filelist-filename).
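The list file passed to `-filelist` holds one input file per line. A minimal sketch for generating one from a directory follows; the function name, the `*.wav` pattern and the choice of absolute paths are assumptions made for illustration.

```python
from pathlib import Path


def write_filelist(audio_dir, list_path, pattern="*.wav"):
    """Write the files in `audio_dir` matching `pattern`, one path per line.

    Paths are written in sorted order as absolute paths.
    Returns the number of entries written.
    """
    files = sorted(Path(audio_dir).glob(pattern))
    lines = "".join(f"{f.resolve()}\n" for f in files)
    Path(list_path).write_text(lines)
    return len(files)
```

The resulting file can then be named as the argument of `-filelist`.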

## Capture Live Audio

Julius can read live audio input from an audio device and perform on-the-fly recognition with low latency.

### On Linux

Available audio APIs are:

- `alsa` - ALSA (Advanced Linux Sound Architecture)
- `oss` - OSS (Open Sound System)
- `pulseaudio` - PulseAudio
- `esd` - ESD (Enlightened Sound Daemon)

ALSA, PulseAudio and ESD each require the corresponding library to be built into Julius. See the installation instructions for how to enable them.

Specifying `-input mic` selects live audio capture. The enabled APIs are searched in the order of the list above, and the first one found is used. You can also choose the audio API explicitly with "[`-input`](https://github.com/julius-speech/julius/blob/master/julius/Options.md#-input-micfilerawfilemfcfileoutprobadinnetvecnetstdinnetaudioalsaossesdpulseaudio) `apiname`". For example, `-input pulseaudio` selects PulseAudio.

Choosing audio devices:

- On ALSA, set the device name in the environment variable "`ALSADEV`". See the [ALSA documentation](https://www.alsa-project.org/main/index.php/DeviceNames#Capture_device_names) for the naming rules.
- On OSS, set the device path in the environment variable "`AUDIODEV`". The default is "`/dev/dsp`".
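When Julius is launched from a script, the device can be selected by adding the variable to the child process environment. The sketch below only builds the command and environment; the function name is hypothetical, and the jconf path and the ALSA device name `plughw:1,0` shown in the usage comment are example values, not defaults.

```python
import os


def julius_alsa_invocation(jconf_path, alsa_device):
    """Build the command line and environment for running Julius on ALSA.

    Returns (command, env). Nothing is executed here; the caller decides
    how to run it.
    """
    env = dict(os.environ)       # copy, so the parent environment is untouched
    env["ALSADEV"] = alsa_device
    command = ["julius", "-C", jconf_path, "-input", "alsa"]
    return command, env


# Example usage (assumes julius is on PATH and main.jconf exists):
#   import subprocess
#   command, env = julius_alsa_invocation("main.jconf", "plughw:1,0")
#   subprocess.run(command, env=env)
```

For OSS the same pattern would apply with `AUDIODEV` and `-input oss`.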

### On Windows

At run time, Julius checks for a supported audio interface in the following order, using the [portaudio library](http://www.portaudio.com/).

- WASAPI
- ASIO
- DirectSound
- MME

The first one found is chosen. On most PCs this is DirectSound.

The default device is opened unless another is specified. To open a different device, set its name in the environment variable "`PORTAUDIO_DEV`" or its index number in "`PORTAUDIO_DEV_NUM`". Julius prints the list of available devices, with their index numbers and names, at startup in the following format:

```text
id [desc1: desc2]
```

You can choose the device either by setting its "`id`" number in `PORTAUDIO_DEV_NUM` or by setting the string "`desc1: desc2`" in `PORTAUDIO_DEV`. If several devices have the same name, the first one is chosen.
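A script that picks a device from the startup listing could split each line into the two values described above. This is only a sketch: it assumes a line consists of exactly `id [desc1: desc2]` with nothing before or after it, which may not hold for real Julius log output, and the sample line in the comment is invented.

```python
import re

# Matches "id [desc1: desc2]"; the bracketed text is kept verbatim.
DEVICE_LINE = re.compile(r"^\s*(\d+)\s*\[(.+)\]\s*$")


def parse_device_line(line):
    """Return (id, "desc1: desc2") for a device line, or None if it does not match."""
    match = DEVICE_LINE.match(line)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


# Hypothetical example line, not real Julius output:
#   parse_device_line("2 [MME: Microphone]")
# The first element would go to PORTAUDIO_DEV_NUM, the second to PORTAUDIO_DEV.
```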

### On Other OS

The default audio interface and the default audio device are used.

## Network

"`-input adinnet`" enables network streamed audio input. Julius will wait for tcp-ip connection from client, and then start receiving audio streams from the client. The tool [adintool](https://github.com/julius-speech/julius/tree/master/adintool) can be run as a sample streaming client.

```shell
% julius ... -input adinnet
% adintool -in mic -out adinnet -server localhost
```

Julius does not check the sampling frequency of network input, so the client must send audio data whose sampling rate matches that of the acoustic model.

## Checking Audio Input

It is strongly recommended to test your audio settings separately from the Julius setup.
Use [adinrec](https://github.com/julius-speech/julius/tree/master/adinrec) or [adintool](https://github.com/julius-speech/julius/tree/master/adintool) to check audio recording and receiving before running Julius. They are simple and use the same audio module as Julius, so "what they record is what Julius hears".

You can also inspect what Julius hears by logging the audio input to files. The option [-record](https://github.com/julius-speech/julius/blob/master/julius/Options.md#-record-dir) saves every segmented input as a file in the specified directory.
@@ -256,7 +256,7 @@ On some OS, instead of `mic`, you can explicitly specify available audio API (al
(With -input rawfile|mfcfile|outprob) perform recognition on
all files listed in the file. The file should contain input
file per line. Engine will end when all of the files are
processed.
processed. See also `-outfile` for per-input result output.

### -48

@@ -320,7 +320,7 @@ Silence margin at the start of speech segment in milliseconds.
Silence margin at the end of speech segment in milliseconds.
(default: 400)

### -chunk_size
### -chunk_size size

Buffer length of the audio input can be set with number of
samples (default number is 1000). If you set small number, you
