
Tidzam: Ambient Sound Analysis

Tidzam is an ambient sound analysis system for outdoor environments. It is a component of the Tidmarsh project, which monitors the environmental evolution of an industrial cranberry farm during its ecological restoration into wetland. Tidzam analyzes the audio streams generated by microphones deployed in the wild in order to detect the sonic events happening on the site, such as bird calls, insects, frogs, rain, storms, car noise, human voices and others. The system is used to cross-validate other weather-monitoring sensors and to identify, geolocate and track the wildlife and bird specimens present on the site over time. It also controls the audio mixers in order to mute noisy microphones or change their gain.

This system uses deep learning to learn its classification tasks. A human-computer interface API provides tools to build a training database from the targeted sonic environment. A new classification task can be bootstrapped from external audio recordings to create weak initial classifiers, which are then refined by adding audio samples that the system automatically extracts from the environment. The system therefore improves its accuracy over several generations of this iterative learning process.


Get Started

Tidzam is composed of several independent processes, which can require substantial CPU, GPU and memory resources depending on the complexity of the classification tasks and the number of processed audio streams. Its processes are multi-threaded and can be deployed on a cluster-based architecture.


Several external components are required by Tidzam and must be installed. The system has currently been tested on a cluster-based Ubuntu 16.04 setup with Titan X GPUs.

tidzam install


Tidzam is implemented in Python 3.x and uses the following external components:

  • JACK server is a low-latency audio mixer server that routes the audio streams between the different components of the system. It is managed by the TidzamStreamManager, which loads the incoming audio sources, configures them and monitors the JACK server. Clients can also be configured manually without conflicting with the TidzamStreamManager.
  • Icecast server pushes the audio streams processed by the system to a Web interface so that clients can listen to each stream independently. This functionality is required when the incoming audio sources contain many channels (as in OPUS encoding), which a classical audio player cannot play. Each channel of the audio input sources is split into an independent mono stream.
  • MPV and FFMPEG are used to load the incoming audio sources into the JACK server and to push them to the Icecast server.
  • TensorFlow is the deep learning framework on which Tidzam is implemented. Its GPU-enabled version is recommended for real-time processing and to reduce CPU load and memory usage.
  • Python Package Dependencies:


The tidzam script can start, stop and restart all the processes of a single-server deployment with one command. The check option verifies that the system is running properly and restarts it if it is not.

tidzam [start | stop | restart | check]




TidzamStreamManager feeds the JACK server with audio streams, called sources, which are routed to the Icecast server and the TidzamAnalyzer. Because MPV is responsible for loading the sources into the JACK server, any audio format supported by MPV can currently be used with Tidzam. A source can be a local file on the server, a Web URL or a LiveStream. A LiveStream is a 16-bit PCM stream pushed to the TidzamStreamManager, for example a recording from the microphone of a mobile device.

Usage: [options]

  -h, --help            show this help message and exit
                        Set the Jack ring buffer size in seconds (default: 100
                        Set the sample rate (default: 44100).
                        Number of available ports for live streams.
                        (default: 10).
  --port=PORT           Socket.IO Web port (default: 8080).
                        Socket.IO address of the tidzam server (default:
  --sources=SOURCES     JSON file containing the list of the initial audio
                        source streams (default: None).
  --debug=DEBUG         Set debug level (Default: 0).

An initial JSON config file can be provided to the manager (--sources option) in order to load some sources automatically at startup. These sources are automatically defined as permanent. If their stream is disconnected, the system will periodically try to reload them.
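The reload behaviour of permanent sources amounts to a periodic supervision loop. The sketch below assumes hypothetical `is_alive` and `reload` callbacks. It does not reproduce TidzamStreamManager's code. `max_checks` exists only so that the example terminates.

```python
import time
from typing import Callable


def supervise_source(
    is_alive: Callable[[], bool],
    reload: Callable[[], None],
    period_s: float = 5.0,
    max_checks: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll a permanent source and reload it whenever it is down.

    Returns the number of reload attempts made. A real supervisor would loop
    forever; `max_checks` only bounds this sketch.
    """
    attempts = 0
    for _ in range(max_checks):
        if not is_alive():
            attempts += 1
            reload()  # a failed reload is simply retried at the next period
        sleep(period_s)
    return attempts
```

The poll period should be kept well above the JACK client timeout. Otherwise a source that is only slow could be reloaded repeatedly.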

JSON config file

JACK Server

The JACK server can handle many different clients, which can drastically increase the CPU load until the system resources are saturated. Several parameters should therefore be adapted to the number of streams that Tidzam is expected to handle and to the resources available on the server.

  • --port-max defines the maximum number of ports that the JACK server will authorize. It is Tidzam's safety lock against system overload: once the limit is reached, the TidzamStreamManager cannot load new sources. In the worst case, the sources consume (3 x #NumberOfSources x #LoadedChannels) ports.

  • -r deactivates the real-time mode, either when the OS does not support it or when the system load cannot sustain it.

  • -t defines the timeout (in milliseconds) after which an unresponsive JACK client is kicked out. When a client is disconnected, the TidzamStreamManager automatically tries to reload it. If the client was disconnected because heavy system load made it too slow to respond to the JACK server, restarting it will cause another client to be disconnected and can trigger a cascade of failures. To avoid this, the timeout should be large enough that system load does not cause disconnections, or else the number of clients MUST be reduced with the --port-max option.

  • -ddummy selects the dummy audio driver for the JACK server so that the hardware sound device is not locked.

  • -r, given after -ddummy, is a driver option that defines the sample rate of the JACK server (it must match the sample rate used by Tidzam).

  • -pXXXX, given after -ddummy, defines the period size XXXX (in frames) of the JACK buffer used by the clients, and therefore their latency. A higher value increases the system latency but decreases the CPU load. It must be a power of two.

jackd  --port-max 2048 -v -r -t50000 -ddummy -r44100 -p8192
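Before choosing the jackd parameters, it helps to check the port budget and the period duration. The sketch below applies the worst-case estimate quoted above (3 x sources x loaded channels). It also computes how long one period of -p frames lasts at the -r sample rate. The function names are illustrative and are not part of Tidzam.

```python
def worst_case_ports(num_sources: int, channels_per_source: int) -> int:
    # Worst-case estimate from this README: 3 x #NumberOfSources x #LoadedChannels.
    return 3 * num_sources * channels_per_source


def fits_port_max(num_sources: int, channels_per_source: int, port_max: int) -> bool:
    # True if the worst case stays within the jackd --port-max limit.
    return worst_case_ports(num_sources, channels_per_source) <= port_max


def period_latency_ms(period_frames: int, sample_rate: int) -> float:
    # -p must be a power of two; one period lasts period_frames / sample_rate seconds.
    if period_frames <= 0 or period_frames & (period_frames - 1):
        raise ValueError("period size must be a positive power of two")
    return 1000.0 * period_frames / sample_rate
```

With the command above, `period_latency_ms(8192, 44100)` gives about 185.8 ms per period. `--port-max 2048` admits 20 sources of 32 channels (1920 ports) but not 22 (2112 ports). The total audio latency is a multiple of this period.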

Icecast Server

The Icecast server receives the audio streams from different FFMPEG clients, which are started and managed by the TidzamStreamManager. Any format supported by FFMPEG can be used as an output stream on the Icecast server. FFMPEG's special copy mode can also be used to copy the input source directly to the Icecast output without any decoding/encoding step (see Source Management - Source Loading). This feature is used when the input source contains more channels (as in OPUS streams) than FFMPEG can decode.

Socket.IO Interface

TidzamStreamManager provides a Socket.IO interface to manage the sources at runtime.

Source Management

A source can be loaded from a remote URL, from a local file or from a LiveStream. A permanent source can also have a local database of its previous recordings, whose filenames are formatted as database_name-YYYY-MM-DD-HH-MM-SS.{ogg | opus | wav} (see JSON config file). A source loading request can therefore select the proper file to load through the date field. If the date is in the future, the URL field is loaded as the online audio stream. By default all channels are loaded, but a list of channels can also be provided through the channels field. If the is_permanent field is set to True (default is False), the source is considered permanent and will be restarted if it terminates. If the format field (default is ogg) is set to copy, the TidzamStreamManager uses the FFMPEG copy option to push the stream to Icecast without any decoding/encoding step.
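The recording filename convention makes date-based selection straightforward. The sketch below parses names of the form database_name-YYYY-MM-DD-HH-MM-SS.{ogg|opus|wav}. One rule is an assumption not stated above: for a past date, it picks the latest recording that started at or before that date. For a future date, it falls back to the live URL, as described.

```python
import re
from datetime import datetime
from typing import Iterable, Optional, Tuple

# database_name-YYYY-MM-DD-HH-MM-SS.{ogg|opus|wav}; the greedy name group may contain "-".
_RECORDING = re.compile(
    r"^(?P<name>.+)-(?P<ts>\d{4}-\d{2}-\d{2}-\d{2}-\d{2}-\d{2})\.(?P<ext>ogg|opus|wav)$"
)


def parse_recording(filename: str) -> Optional[Tuple[str, datetime]]:
    """Return (database_name, start time), or None if the name does not match."""
    m = _RECORDING.match(filename)
    if m is None:
        return None
    return m.group("name"), datetime.strptime(m.group("ts"), "%Y-%m-%d-%H-%M-%S")


def pick_source(files: Iterable[str], url: str, date: datetime, now: datetime) -> str:
    """Live URL for future dates, else the latest recording started at or before `date`."""
    if date > now:
        return url
    best: Optional[Tuple[str, datetime]] = None
    for f in files:
        parsed = parse_recording(f)
        if parsed is None:
            continue  # ignore files that do not follow the naming convention
        start = parsed[1]
        if start <= date and (best is None or start > best[1]):
            best = (f, start)
    if best is None:
        raise LookupError("no recording covers the requested date")
    return best[0]
```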

Source Loading

Request on "sys" event:

      'is_permanent':True | False,
      'format':'ogg | copy'
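Only the is_permanent and format keys of this request survive above. The sketch below assembles the fields named in the Source Management paragraph (url, date, channels, is_permanent, format) and emits them on "sys". Any outer envelope the server may expect is not recoverable here. Treat the exact key layout as an assumption.

```python
from typing import Any, Dict, List, Optional


def build_load_request(
    url: str,
    date: Optional[str] = None,
    channels: Optional[List[int]] = None,
    is_permanent: bool = False,
    fmt: str = "ogg",
) -> Dict[str, Any]:
    """Collect the source-loading fields described in this README (layout assumed)."""
    if fmt not in ("ogg", "copy"):
        raise ValueError("format must be 'ogg' or 'copy'")
    request: Dict[str, Any] = {"url": url, "is_permanent": is_permanent, "format": fmt}
    if date is not None:
        request["date"] = date  # selects a recording from the source database
    if channels is not None:
        request["channels"] = channels  # omitted means all channels are loaded
    return request


def send_sys(client: Any, payload: Dict[str, Any]) -> None:
    # `client` is any Socket.IO client exposing emit(event, data).
    client.emit("sys", payload)
```

With python-socketio, for example, `client` could be a connected `socketio.Client()` instance.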
Source Unloading

Request on "sys" event:

Get the List of Source Databases:

Request on "sys" event:


Response on "sys" event:

          [start_time, end_time],
          [start_time, end_time],
      }, ...

Live Stream Management

New Live Stream

A LiveStream is automatically created and connected to the Icecast server and the TidzamAnalyzer when a client sends its data on the "audio" event. The audio stream MUST be in 16-bit PCM format. The system generates a unique portname identifier based on the client's SID, which the client can request as follows.

Getting the created portname:
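A client producing floating-point audio must convert it to 16-bit PCM before sending it on the "audio" event. The sketch below clips samples to [-1, 1] and packs them as signed 16-bit integers. It assumes little-endian byte order and mono data. Neither is specified in this README, so both should be checked against the server.

```python
import struct
from typing import Sequence


def float_to_pcm16(samples: Sequence[float]) -> bytes:
    """Convert float samples in [-1.0, 1.0] to little-endian signed 16-bit PCM."""
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clip out-of-range values instead of wrapping
        ints.append(int(round(s * 32767)))
    return struct.pack("<%dh" % len(ints), *ints)
```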

Request on event 'sys'


Response on event 'sys'

Close a live stream

Request on event 'sys'



Input Stream Loading

TidzamAnalyzer runs the loaded classifiers on its input streams, which can be a regular audio file (--stream argument) or source channels from the JACK server (--jack argument). TidzamStreamManager does not connect the sources to the TidzamAnalyzer; they must be listed in the --jack argument as a list of portname patterns (for example, impoudment- connects all portnames starting with this prefix).
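Port selection follows from the impoudment- example, which suggests prefix matching. The sketch below assumes the --jack value is comma-separated and that each pattern is a plain prefix. The real analyzer may use another separator or regular expressions. The port names other than impoudment- are hypothetical.

```python
from typing import Iterable, List


def parse_jack_arg(arg: str) -> List[str]:
    """Split a --jack value such as "impoudment-,pond-" into patterns (separator assumed)."""
    return [p.strip() for p in arg.split(",") if p.strip()]


def select_ports(ports: Iterable[str], patterns: Iterable[str]) -> List[str]:
    """Keep the JACK ports whose name starts with any of the patterns."""
    prefixes = tuple(patterns)
    return [p for p in ports if p.startswith(prefixes)]
```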

Hierarchical Expert Architecture

The classifiers to load are specified by the --nn argument. They can be cascaded when a class name of one classifier matches the name of another classifier. The primary classifier must be named selector. If a classifier has a class whose name matches another classifier, the output of that class weights all the classes of the second classifier. (For example, the birds class of the selector classifier weights the classes of the bird specimen classifier named birds.)
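The cascade can be read as a tree walk starting from selector. This sketch assumes that "weights" means multiplication, so each sub-classifier probability is scaled by its parent class score. It also assumes the hierarchy is a tree with unique leaf class names. The function is illustrative and is not taken from TidzamAnalyzer.

```python
from typing import Dict


def cascade(outputs: Dict[str, Dict[str, float]], root: str = "selector") -> Dict[str, float]:
    """Flatten a classifier hierarchy into joint scores.

    `outputs` maps each classifier name to its class probabilities. A class whose
    name matches another classifier is replaced by that classifier's classes,
    each multiplied by the parent class score.
    """
    joint: Dict[str, float] = {}

    def expand(name: str, weight: float) -> None:
        for cls, p in outputs[name].items():
            if cls in outputs and cls != name:
                expand(cls, weight * p)  # descend into the expert classifier
            else:
                joint[cls] = weight * p  # leaf class: store the weighted score

    expand(root, 1.0)
    return joint
```

For example, with selector = {birds: 0.8, rain: 0.2} and birds = {robin: 0.5, jay: 0.5}, the flattened scores are robin 0.4, jay 0.4 and rain 0.2.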

Automatic Recording Extraction

TidzamAnalyzer has an optional module for automatic recording extraction, which stores the recordings in the folder specified by the --out argument. This rule-based engine lets you specify the conditions under which a recording must be extracted. A rule can apply to one or multiple channels and to one or multiple classes. Several rules can be applied simultaneously. Rules can be provided on the command line or through Socket.IO, which offers more flexibility and more functionality.

Usage: --nn=build/test [--stream=stream.wav | --jack=jack-output] [OPTIONS]

  -h, --help            show this help message and exit
  -s STREAM, --stream=STREAM
                        Input audio stream to analyze.
  -c CHANNEL, --channel=CHANNEL
                        Select a particular channel (only with stream option).
  -j JACK, --jack=JACK  List of Jack audio mixer ports to process.
  -n NN, --nn=NN        Neural Network session to load.
  -o OUT, --out=OUT     Output folder for audio sound extraction.
  --extract=EXTRACT     List of classes to extract (--extract=unknown,birds).
                        Specify an id list of particular channels for the
                        sample extraction (Default: ).
  --show                Play the audio samples and show their spectrogram.
  --overlap=OVERLAP     Overlap value (default:0).
  --chainAPI=CHAINAPI   Provide URL for chainAPI username:password@url
                        (default: None).
  --port=PORT           Socket.IO Web port (default: 8080).
  --debug=DEBUG         Set debug level (Default: 0).

Socket.IO Interface

Getting output prediction

Subscription on event 'sys'

      'result':[ classe2 ],
        'classe1': 0.001,
        'classe2': 0.91
Classifier list

Request on event 'sys'

Extraction rules
Getting extraction rules

Request on event 'SampleExtractionRules'

Setting extraction rules

Extraction rules define when a sample must be extracted. Extraction is driven by the rate parameter, which defines the extraction probability when an element of classes is detected. If rate is set to auto, the extraction probability depends on the sample distribution in the database. The length parameter defines the audio file length in seconds (default 0.5 second), and the detected sample is placed in the middle of the audio file. The object_filter parameter skips samples whose spectrogram energy lies on the sample border, in order to extract centered sound objects.

Request on event 'RecorderRules'

      "channels":["channel1", "channel2", etc],
      "classes":["classe1", "classe2", etc],
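A rule's decision can be sketched as a channel and class check followed by a random draw with probability rate. The README does not define the "auto" policy. The sketch uses a hypothetical one that extracts a class less often the more samples it already has in the database. The clip window is centered on the detection, as described for the length parameter. object_filter is not modelled.

```python
import random
from typing import Dict, Iterable, Optional, Tuple, Union

Rate = Union[float, str]  # a probability in [0, 1] or "auto"


def extraction_probability(cls: str, rate: Rate, class_counts: Optional[Dict[str, int]] = None) -> float:
    if rate != "auto":
        return float(rate)
    # Hypothetical "auto" policy: classes with fewer samples are extracted more often.
    counts = class_counts or {}
    smallest = min(counts.values(), default=0)
    return min(1.0, (1 + smallest) / (1 + counts.get(cls, 0)))


def should_extract(
    detected: Iterable[str],
    channel: str,
    rule: Dict,
    class_counts: Optional[Dict[str, int]] = None,
    rng: Optional[random.Random] = None,
) -> bool:
    if rng is None:
        rng = random.Random()
    if channel not in rule["channels"]:
        return False
    for cls in detected:
        if cls in rule["classes"]:
            p = extraction_probability(cls, rule.get("rate", 1.0), class_counts)
            if rng.random() < p:  # random() is in [0, 1), so p = 1.0 always extracts
                return True
    return False


def extraction_window(t_detect: float, length: float = 0.5) -> Tuple[float, float]:
    """Start/end times (s) of a clip of `length` seconds centered on the detection."""
    half = length / 2.0
    return t_detect - half, t_detect + half
```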
Getting rules

Returns the list of extraction rules with the number of recordings extracted by each rule.

Request on event 'RecorderRules'

      "channels":["channel1", "channel2", etc],
      "classes":["classe1", "classe2", etc],
  }, ...]
Getting Information on extracted recording in database

Returns the list of classes that have been extracted by automatic rules, with the number of corresponding recordings.

Request on event 'RecorderRules'


Tidzam Training

The TidzamTrain process has a cluster-based implementation of asynchronous between-graph replication, which allows training to run in parallel on several GPUs distributed over several machines. A parameter server (ps) is responsible for aggregating the weights and sharing them between the distributed workers. If TidzamTrain is executed without an explicit cluster configuration (see --workers, --ps, --task-index and --job-type), only local GPUs are used. The training and testing datasets can be provided in two ways:

  • On the fly: a set of independent workers generates the batches live, directly from the audio file folders specified in --dataset-train. Based on an online file index, some files are used for training and others for validation (hardcoded rate of 80%). A master process manages the sample batch queue and delivers the batches to the different workers of the training process.

  • Compiled dataset: this approach uses datasets that have been processed offline with the Database Editor or the Database Tool. A dataset is composed of several archives containing the audio FFT samples with their labels. The dataset MUST be manually randomized and split into a training dataset and a validation dataset, which are provided to the trainer with the arguments --dataset-train and --dataset-test.
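The on-the-fly file split can be illustrated with a reproducible shuffle. This sketch assumes the hardcoded 80% is the training share and that the split is made per file. It also uses a fixed seed, which the README does not mention. The function name is illustrative.

```python
import random
from typing import List, Sequence, Tuple


def split_files(
    files: Sequence[str], train_ratio: float = 0.8, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Split audio files into (training, validation) lists."""
    shuffled = sorted(files)  # sort first so the result does not depend on input order
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]
```

Splitting whole files rather than individual FFT frames keeps near-identical neighbouring frames from ending up on both sides of the split.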

Usage: --dataset-train=mydataset --dnn=models/ --out=save/ [OPTIONS]

  -h, --help            show this help message and exit
  -d DATASET_TRAIN, --dataset-train=DATASET_TRAIN
                        Define the dataset to train.
  -t DATASET_TEST, --dataset-test=DATASET_TEST
                        Define the dataset for evaluation.
  -o OUT, --out=OUT     Define output folder to store the neural network and
  --dnn=DNN             DNN model to train (Default: ).
                        Number of training iterations (Default: 20000
                        Number of training iterations between each testing
                        step (Default: 10).
                        Size of the training batch (Default:64).
                        Learning rate (default: 0.001).
                        Step period to compute statistics, embeddings and
                        feature maps (Default: 10).
                        Number of embeddings to compute (default: 50)..
  --job-type=JOB_TYPE   Selector the process job: ps or worker
                        Provide the task index to execute (default:0).
  --workers=WORKERS     List of workers
                        (worker1.mynet:2222,worker2.mynet:2222, etc).
  --ps=PS               List of parameter servers
                        (ps1.mynet:2222,ps2.mynet:2222, etc).
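The --workers and --ps options describe a TensorFlow cluster. The sketch below turns their comma-separated values into the job-name to host-list mapping that tf.train.ClusterSpec accepts. It also checks that the (--job-type, task index) pair names an existing task. It is plain Python and makes no claim about TidzamTrain's own parsing code.

```python
from typing import Dict, List


def parse_hosts(arg: str) -> List[str]:
    """Split a --workers/--ps value like "worker1.mynet:2222,worker2.mynet:2222"."""
    return [h.strip() for h in arg.split(",") if h.strip()]


def cluster_dict(workers: str, ps: str) -> Dict[str, List[str]]:
    """Job-name -> host list mapping, the shape accepted by tf.train.ClusterSpec."""
    return {"ps": parse_hosts(ps), "worker": parse_hosts(workers)}


def check_task(cluster: Dict[str, List[str]], job_type: str, task_index: int) -> str:
    """Return the address this process should serve, validating job type and index."""
    if job_type not in cluster:
        raise ValueError("job type must be 'ps' or 'worker'")
    hosts = cluster[job_type]
    if not 0 <= task_index < len(hosts):
        raise IndexError("task index out of range for job %r" % job_type)
    return hosts[task_index]
```

In TensorFlow 1.x, each process would then typically start `tf.train.Server(tf.train.ClusterSpec(spec), job_name=job_type, task_index=task_index)`. Parameter servers call `server.join()`.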

Disable GPU ?


Database Tool

Database Tool is a command-line interface for building compiled databases.

Usage: python src/

  -h, --help            show this help message and exit
  --dataset=DATASET     Open an exisiting dataset.
  --rename=RENAME       Rename the dataset.
  --classe=CLASSE       Create an empty classe.
                        Load the audio file folder in the dataset (as a single
                        classe if --classe is specified).
  --merge=MERGE         Merge the dataset with another one (as a single classe
                        if --classe is specified).
  --split=SPLIT         Extraction proportion of a sub dataset for testing
                        --split in [0...1]
                        Name for the generated dataset.
  --balance             Automatic balance the classe in the dataset (by
                        duplicating samples in small classes).
  --randomize           Randomize the dataset.
  --file-count          Return the number of files which compose the dataset.
  --metadata            Generate metadata information and store them on file
  --info                Return some dataset information.
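The --balance and --split options can be pictured on a simplified in-memory dataset. This sketch assumes a hypothetical layout mapping each class to a list of sample ids. Balancing duplicates samples of smaller classes, as the help text says. The split is stratified per class, which is an assumption.

```python
import random
from typing import Dict, List, Tuple

Dataset = Dict[str, List[str]]  # class name -> sample ids (hypothetical layout)


def balance(dataset: Dataset) -> Dataset:
    """Duplicate samples in smaller classes until every class matches the largest one."""
    target = max((len(v) for v in dataset.values()), default=0)
    out: Dataset = {}
    for cls, samples in dataset.items():
        if not samples:
            out[cls] = []
            continue
        reps, rest = divmod(target, len(samples))
        out[cls] = samples * reps + samples[:rest]
    return out


def split(dataset: Dataset, test_ratio: float, seed: int = 0) -> Tuple[Dataset, Dataset]:
    """Move a `test_ratio` share (0..1) of each class into a separate test dataset."""
    if not 0.0 <= test_ratio <= 1.0:
        raise ValueError("--split must be in [0, 1]")
    rng = random.Random(seed)
    train: Dataset = {}
    test: Dataset = {}
    for cls, samples in dataset.items():
        shuffled = samples[:]
        rng.shuffle(shuffled)
        n_test = int(round(len(shuffled) * test_ratio))
        test[cls], train[cls] = shuffled[:n_test], shuffled[n_test:]
    return train, test
```

Splitting before balancing keeps duplicated samples out of the test set.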

Database Editor

Database Editor is a text-based user interface with a menu to create and manipulate compiled databases.

Usage: python src/

  -h, --help        show this help message and exit
  --dataset=OPEN    Open an exisiting dataset
  --stream=STREAM   Sample extraction from an audio stream [WAV/OGG/MP3].
  --play            Play the dataset content.
  --play-id=PLAYID  Play the dataset content of a particular classe.
  -s, --show        Select a specific classe ID for --play option.

Neural Network visualization

TidzamTrainer periodically generates (see --stats-step) the following summaries for TensorBoard:

  • Accuracy, costs, recall, precision and confusion matrix
  • GraphDef with memory usage and computation distribution over the devices
  • Weight histograms, distributions and feature maps
  • Embeddings for 3D visualization of the distances between output classes

tensorboard --logdir=checkpoints