# VGGish Audio Embedding Colab

This notebook demonstrates how to extract AudioSet embeddings using the VGGish deep neural network (DNN).

# Importing and Testing the VGGish System

Based on the directions at: https://github.com/tensorflow/models/tree/master/research/audioset

In [1]:
!python --version

Python 3.5.5 :: Anaconda custom (64-bit)


In [2]:
!pip install --upgrade pip
!pip install numpy scipy
!pip install resampy tensorflow-gpu six 

Requirement already up-to-date: pip in /anaconda/envs/py35/lib/python3.5/site-packages (18.0)


In [3]:
!pip list | grep tensorflow

tensorflow-gpu                        1.11.0     


In [4]:
!sudo git clone https://github.com/google/youtube-8m.git

fatal: destination path 'youtube-8m' already exists and is not an empty directory.


In [5]:
# Check where we are in the kernel's file system.
!pwd

/home/yvradsmi


In [6]:
# Grab the VGGish model
!sudo curl -O https://storage.googleapis.com/audioset/vggish_model.ckpt
!sudo curl -O https://storage.googleapis.com/audioset/vggish_pca_params.npz

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  277M  100  277M    0     0   139M      0  0:00:01  0:00:01 --:--:--  139M
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 73020  100 73020    0     0   300k      0 --:--:-- --:--:-- --:--:--  300k


In [7]:
# Make sure we got the model data.
!ls

audioset_v1_embeddings
cuda-repo-ubuntu1604_8.0.44-1_amd64.deb
cuda-repo-ubuntu1604_8.0.44-1_amd64.deb.1
cuda-repo-ubuntu1604_8.0.44-1_amd64.deb.2
cuda-repo-ubuntu1604_8.0.44-1_amd64.deb.3
Desktop
features.tar.gz
index.html
model_new
models
notebooks
-p
R
-v
VGGish_Audioset_&_Audio_embedding_Tutorial.ipynb
vggish_model.ckpt
vggish_pca_params.npz
youtube-8m
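Instead of scanning the `ls` output by eye, a small check can confirm that both VGGish files from the download cell are present and non-empty. This is a minimal sketch; `missing_model_files` and `REQUIRED_FILES` are hypothetical names, not part of the AudioSet code.

```python
import os

# File names taken from the curl downloads above. REQUIRED_FILES is a
# hypothetical constant for this sketch, not part of the VGGish code.
REQUIRED_FILES = ("vggish_model.ckpt", "vggish_pca_params.npz")


def missing_model_files(directory=".", required=REQUIRED_FILES):
    """Return the required files that are absent or empty in `directory`."""
    missing = []
    for name in required:
        path = os.path.join(directory, name)
        # An interrupted download can leave a zero-byte file, so treat it as missing.
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            missing.append(name)
    return missing
```

Calling `missing_model_files()` from the directory holding the downloads should return an empty list once both files have arrived intact.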


In [8]:
# Download the AudioSet embedding features (features.tar.gz) to the current directory.
!sudo curl -O http://storage.googleapis.com/us_audioset/youtube_corpus/v1/features/features.tar.gz

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 2468M  100 2468M    0     0   169M      0  0:00:14  0:00:14 --:--:--  193M


In [9]:
!sudo tar -xzf features.tar.gz
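The features archive is downloaded and unpacked more than once in this notebook. A hedged sketch of an idempotent extraction step using Python's standard `tarfile` module follows; it assumes, based on the directory listings, that the archive unpacks into an `audioset_v1_embeddings` directory. `extract_once` is a hypothetical helper.

```python
import os
import tarfile


def extract_once(archive_path, dest_dir, expected_dir):
    """Extract a .tar.gz into dest_dir unless dest_dir/expected_dir already exists.

    Returns True if the archive was extracted, False if extraction was skipped.
    """
    target = os.path.join(dest_dir, expected_dir)
    if os.path.isdir(target):
        return False
    # extractall trusts the member paths, so only use it on trusted archives.
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(path=dest_dir)
    return True
```

For this notebook the call would be `extract_once("features.tar.gz", ".", "audioset_v1_embeddings")`.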

In [10]:
!git clone https://github.com/tensorflow/models

fatal: destination path 'models' already exists and is not an empty directory.


In [11]:
# Make sure the source files got copied correctly.
!ls

audioset_v1_embeddings
cuda-repo-ubuntu1604_8.0.44-1_amd64.deb
cuda-repo-ubuntu1604_8.0.44-1_amd64.deb.1
cuda-repo-ubuntu1604_8.0.44-1_amd64.deb.2
cuda-repo-ubuntu1604_8.0.44-1_amd64.deb.3
Desktop
features.tar.gz
index.html
model_new
models
notebooks
-p
R
-v
VGGish_Audioset_&_Audio_embedding_Tutorial.ipynb
vggish_model.ckpt
vggish_pca_params.npz
youtube-8m


In [12]:
# Verify the location of the AudioSet source files
%cd models/research/audioset
!ls

/home/yvradsmi/models/research/audioset
audioset_v1_embeddings			 vggish_model.ckpt
cuda-repo-ubuntu1604_8.0.44-1_amd64.deb  vggish_params.py
features.tar.gz				 vggish_pca_params.npz
mel_features.py				 vggish_postprocess.py
model_new				 vggish_slim.py
models					 vggish_smoke_test.py
README.md				 vggish_train_demo.py
vggish_inference_demo.py		 youtube-8m
vggish_input.py
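Among these files, `vggish_input.py` and `mel_features.py` turn a waveform into fixed-size log mel examples, and `vggish_params.py` holds the framing constants. The arithmetic below is a sketch of how many examples (and therefore embeddings) a clip produces. The constant values are assumptions believed to match `vggish_params.py`; check them against that file before relying on the counts.

```python
# Assumed VGGish framing parameters (verify against vggish_params.py).
SAMPLE_RATE = 16000            # Hz; input audio is resampled to this rate
STFT_WINDOW_SECONDS = 0.025    # 25 ms analysis window
STFT_HOP_SECONDS = 0.010       # 10 ms hop between spectrogram frames
EXAMPLE_WINDOW_SECONDS = 0.96  # 96 frames per example
EXAMPLE_HOP_SECONDS = 0.96     # examples do not overlap


def count_windows(num_items, window, hop):
    """Number of full windows of length `window` stepped by `hop` (0 if too short)."""
    if num_items < window:
        return 0
    return 1 + (num_items - window) // hop


def num_vggish_examples(duration_seconds):
    """Estimate how many 0.96 s examples (one embedding each) a clip yields."""
    num_samples = int(round(duration_seconds * SAMPLE_RATE))
    window = int(round(SAMPLE_RATE * STFT_WINDOW_SECONDS))  # 400 samples
    hop = int(round(SAMPLE_RATE * STFT_HOP_SECONDS))        # 160 samples
    num_frames = count_windows(num_samples, window, hop)
    frames_per_example = int(round(EXAMPLE_WINDOW_SECONDS / STFT_HOP_SECONDS))  # 96
    example_hop = int(round(EXAMPLE_HOP_SECONDS / STFT_HOP_SECONDS))            # 96
    return count_windows(num_frames, frames_per_example, example_hop)
```

Under these assumptions, the 5-second synthetic sine wave used by `vggish_inference_demo.py` gives 498 spectrogram frames and `num_vggish_examples(5.0) == 5`; clips shorter than one example yield no embeddings.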


# Enabling GPU Device

In [13]:
# Install Docker (start by refreshing the package index)
!sudo -S apt-get update

Get:1 file:/var/nccl-repo-2.1.4-ga-cuda9.0  InRelease
Ign:1 file:/var/nccl-repo-2.1.4-ga-cuda9.0  InRelease
Get:2 file:/var/nv-tensorrt-repo-ga-cuda9.0-trt3.0.2-20180108  InRelease
Ign:2 file:/var/nv-tensorrt-repo-ga-cuda9.0-trt3.0.2-20180108  InRelease
Get:3 file:/var/nvidia-diag-driver-local-repo-390.46  InRelease
Ign:3 file:/var/nvidia-diag-driver-local-repo-390.46  InRelease
Get:4 file:/var/nvinfer-runtime-trt-repo-3.0.4-ga-cuda9.0  InRelease
Ign:4 file:/var/nvinfer-runtime-trt-repo-3.0.4-ga-cuda9.0  InRelease
Get:5 file:/var/nccl-repo-2.1.4-ga-cuda9.0  Release [574 B]
Get:6 file:/var/nv-tensorrt-repo-ga-cuda9.0-trt3.0.2-20180108  Release [574 B]
Get:7 file:/var/nvidia-diag-driver-local-repo-390.46  Release [574 B]          
Get:8 file:/var/nvinfer-runtime-trt-repo-3.0.4-ga-cuda9.0  Release [574 B]     
Get:5 file:/var/nccl-repo-2.1.4-ga-cuda9.0  Release [574 B]                    
Hit:9 http://azure.archive.ubuntu.com/ubuntu xenial InRelease                  
Hit:10 http://azure.a

In [14]:
!sudo -S apt-get -y install \
    apt-transport-https \
    ca-certificates \
    curl \
    software-properties-common

Reading package lists... Done
Building dependency tree       
Reading state information... Done
apt-transport-https is already the newest version (1.2.27).
ca-certificates is already the newest version (20170717~16.04.1).
curl is already the newest version (7.47.0-1ubuntu2.9).
software-properties-common is already the newest version (0.96.20.7).
The following packages were automatically installed and are no longer required:
  bridge-utils ubuntu-fan
Use 'sudo apt autoremove' to remove them.
0 upgraded, 0 newly installed, 0 to remove and 84 not upgraded.


In [15]:
!sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -

OK


In [16]:
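# Double quotes (not single quotes) let the shell expand $(lsb_release -cs) to the Ubuntu codename.
!sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"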

In [17]:
!sudo apt-get update

Get:1 file:/var/nccl-repo-2.1.4-ga-cuda9.0  InRelease
Ign:1 file:/var/nccl-repo-2.1.4-ga-cuda9.0  InRelease
Get:2 file:/var/nv-tensorrt-repo-ga-cuda9.0-trt3.0.2-20180108  InRelease
Ign:2 file:/var/nv-tensorrt-repo-ga-cuda9.0-trt3.0.2-20180108  InRelease
Get:3 file:/var/nvidia-diag-driver-local-repo-390.46  InRelease
Ign:3 file:/var/nvidia-diag-driver-local-repo-390.46  InRelease
Get:4 file:/var/nvinfer-runtime-trt-repo-3.0.4-ga-cuda9.0  InRelease
Ign:4 file:/var/nvinfer-runtime-trt-repo-3.0.4-ga-cuda9.0  InRelease
Get:5 file:/var/nccl-repo-2.1.4-ga-cuda9.0  Release [574 B]
Hit:6 http://azure.archive.ubuntu.com/ubuntu xenial InRelease
Get:5 file:/var/nccl-repo-2.1.4-ga-cuda9.0  Release [574 B]                    
Hit:7 http://azure.archive.ubuntu.com/ubuntu xenial-updates InRelease          
Hit:8 http://azure.archive.ubuntu.com/ubuntu xenial-backports InRelease        
Get:9 file:/var/nv-tensorrt-repo-ga-cuda9.0-trt3.0.2-20180108  Release [574 B] 
Get:9 file:/var/nv-tensorrt-repo-ga-cu

In [18]:
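# Note: on Ubuntu the `docker` package is a system-tray utility, not Docker Engine.
# The engine comes from docker.io (or docker-ce); `pip install docker` adds the Docker SDK for Python.
!sudo apt-get install docker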
!sudo apt-get install -y docker.io
!pip install docker

Reading package lists... Done
Building dependency tree       
Reading state information... Done
docker is already the newest version (1.5-1).
The following packages were automatically installed and are no longer required:
  bridge-utils ubuntu-fan
Use 'sudo apt autoremove' to remove them.
0 upgraded, 0 newly installed, 0 to remove and 84 not upgraded.
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following package was automatically installed and is no longer required:
  pigz
Use 'sudo apt autoremove' to remove it.
Suggested packages:
  debootstrap docker-doc rinse zfs-fuse | zfsutils
The following packages will be REMOVED:
  docker-ce nvidia-docker
The following NEW packages will be installed:
  docker.io
0 upgraded, 1 newly installed, 2 to remove and 84 not upgraded.
Need to get 0 B/17.1 MB of archives.
After this operation, 122 MB disk space will be freed.
Preconfiguring packages ...
(Reading database ... 448267 files and director

In [19]:
!sudo apt-get -f -y install

Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following package was automatically installed and is no longer required:
  pigz
Use 'sudo apt autoremove' to remove it.
0 upgraded, 0 newly installed, 0 to remove and 84 not upgraded.


In [20]:
!sudo docker --version

Docker version 17.03.2-ce, build f5ec1e2


In [21]:
!sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -

OK


In [22]:
# Install NVIDIA drivers (wget is needed to fetch the CUDA repository package)
!sudo apt-get install -y wget

Reading package lists... Done
Building dependency tree       
Reading state information... Done
wget is already the newest version (1.17.1-1ubuntu1.4).
The following package was automatically installed and is no longer required:
  pigz
Use 'sudo apt autoremove' to remove it.
0 upgraded, 0 newly installed, 0 to remove and 84 not upgraded.


In [23]:
!sudo wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_8.0.44-1_amd64.deb

--2018-09-30 23:30:33--  http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_8.0.44-1_amd64.deb
Resolving developer.download.nvidia.com (developer.download.nvidia.com)... 192.229.211.70, 2606:2800:21f:3aa:dcf:37b:1ed6:1fb
Connecting to developer.download.nvidia.com (developer.download.nvidia.com)|192.229.211.70|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2690 (2.6K) [application/x-deb]
Saving to: ‘cuda-repo-ubuntu1604_8.0.44-1_amd64.deb.1’


2018-09-30 23:30:33 (524 MB/s) - ‘cuda-repo-ubuntu1604_8.0.44-1_amd64.deb.1’ saved [2690/2690]



In [24]:
!sudo dpkg -i --force-confdef cuda-repo-ubuntu1604_8.0.44-1_amd64.deb

(Reading database ... 448141 files and directories currently installed.)
Preparing to unpack cuda-repo-ubuntu1604_8.0.44-1_amd64.deb ...
Unpacking cuda-repo-ubuntu1604 (8.0.44-1) over (8.0.44-1) ...
Setting up cuda-repo-ubuntu1604 (8.0.44-1) ...
OK


In [25]:
!sudo apt-get update

Get:1 file:/var/nccl-repo-2.1.4-ga-cuda9.0  InRelease
Ign:1 file:/var/nccl-repo-2.1.4-ga-cuda9.0  InRelease
Get:2 file:/var/nv-tensorrt-repo-ga-cuda9.0-trt3.0.2-20180108  InRelease
Ign:2 file:/var/nv-tensorrt-repo-ga-cuda9.0-trt3.0.2-20180108  InRelease
Get:3 file:/var/nvidia-diag-driver-local-repo-390.46  InRelease
Ign:3 file:/var/nvidia-diag-driver-local-repo-390.46  InRelease
Get:4 file:/var/nvinfer-runtime-trt-repo-3.0.4-ga-cuda9.0  InRelease
Ign:4 file:/var/nvinfer-runtime-trt-repo-3.0.4-ga-cuda9.0  InRelease
Get:5 file:/var/nccl-repo-2.1.4-ga-cuda9.0  Release [574 B]
Get:6 file:/var/nv-tensorrt-repo-ga-cuda9.0-trt3.0.2-20180108  Release [574 B]
Get:7 file:/var/nvidia-diag-driver-local-repo-390.46  Release [574 B]          
Get:5 file:/var/nccl-repo-2.1.4-ga-cuda9.0  Release [574 B]                    
Get:8 file:/var/nvinfer-runtime-trt-repo-3.0.4-ga-cuda9.0  Release [574 B]     
Get:6 file:/var/nv-tensorrt-repo-ga-cuda9.0-trt3.0.2-20180108  Release [574 B] 
Get:7 file:/var/nvidi

In [26]:
!sudo apt-get -y install cuda

Reading package lists... Done
Building dependency tree       
Reading state information... Done
cuda is already the newest version (10.0.130-1).
The following package was automatically installed and is no longer required:
  pigz
Use 'sudo apt autoremove' to remove it.
0 upgraded, 0 newly installed, 0 to remove and 84 not upgraded.


In [27]:
!sudo apt-get -f -y install

Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following package was automatically installed and is no longer required:
  pigz
Use 'sudo apt autoremove' to remove it.
0 upgraded, 0 newly installed, 0 to remove and 84 not upgraded.


In [28]:
!sudo apt-get install -y nvidia-docker

Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following packages were automatically installed and are no longer required:
  bridge-utils ubuntu-fan
Use 'sudo apt autoremove' to remove them.
The following additional packages will be installed:
  docker-ce
The following packages will be REMOVED:
  docker.io
The following NEW packages will be installed:
  docker-ce nvidia-docker
0 upgraded, 2 newly installed, 1 to remove and 84 not upgraded.
Need to get 0 B/42.3 MB of archives.
After this operation, 122 MB of additional disk space will be used.
(Reading database ... 448141 files and directories currently installed.)
Removing docker.io (17.03.2-0ubuntu2~16.04.1) ...
'/usr/share/docker.io/contrib/nuke-graph-directory.sh' -> '/var/lib/docker/nuke-graph-directory.sh'
Processing triggers for man-db (2.7.5-1) ...
Selecting previously unselected package docker-ce.
(Reading database ... 448044 files and directories currently installed.)
Prepar

In [29]:
!sudo apt-get update

Get:1 file:/var/nccl-repo-2.1.4-ga-cuda9.0  InRelease
Ign:1 file:/var/nccl-repo-2.1.4-ga-cuda9.0  InRelease
Get:2 file:/var/nv-tensorrt-repo-ga-cuda9.0-trt3.0.2-20180108  InRelease
Ign:2 file:/var/nv-tensorrt-repo-ga-cuda9.0-trt3.0.2-20180108  InRelease
Get:3 file:/var/nvidia-diag-driver-local-repo-390.46  InRelease
Ign:3 file:/var/nvidia-diag-driver-local-repo-390.46  InRelease
Get:4 file:/var/nvinfer-runtime-trt-repo-3.0.4-ga-cuda9.0  InRelease
Ign:4 file:/var/nvinfer-runtime-trt-repo-3.0.4-ga-cuda9.0  InRelease
Get:5 file:/var/nccl-repo-2.1.4-ga-cuda9.0  Release [574 B]
Hit:6 http://azure.archive.ubuntu.com/ubuntu xenial InRelease
Hit:7 http://azure.archive.ubuntu.com/ubuntu xenial-updates InRelease
Hit:8 http://azure.archive.ubuntu.com/ubuntu xenial-backports InRelease
Get:5 file:/var/nccl-repo-2.1.4-ga-cuda9.0  Release [574 B]                    
Get:9 file:/var/nv-tensorrt-repo-ga-cuda9.0-trt3.0.2-20180108  Release [574 B] 
Get:9 file:/var/nv-tensorrt-repo-ga-cuda9.0-trt3.0.2-201

In [30]:
#!sudo nvidia-docker run --rm nvidia/cuda nvidia-smi
!/usr/bin/nvidia-smi

Sun Sep 30 23:31:49 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.48                 Driver Version: 410.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  Tesla K80           Off  | 00006DE9:00:00.0 Off |                    0 |
| N/A   38C    P8    34W / 149W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage    
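To check the GPU from Python rather than reading the `nvidia-smi` table, its query mode can be parsed. This is a sketch assuming the `--query-gpu` and `--format=csv,noheader` options of `nvidia-smi`; `parse_gpu_csv` and `query_gpus` are hypothetical helpers.

```python
import subprocess


def parse_gpu_csv(text):
    """Parse `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader` output.

    Returns a list of (name, memory_mib) tuples, one per GPU.
    """
    gpus = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        # Split on the last comma so the memory field is always separated cleanly.
        name, memory = [field.strip() for field in line.rsplit(",", 1)]
        # memory.total is reported like "11441 MiB" unless `nounits` is requested.
        gpus.append((name, int(memory.split()[0])))
    return gpus


def query_gpus():
    """Run nvidia-smi and return the parsed GPU list (raises if nvidia-smi is missing)."""
    output = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"])
    return parse_gpu_csv(output.decode("utf-8"))
```

On the machine shown above, `query_gpus()` would be expected to report a single Tesla K80 with about 11441 MiB.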

# AudioSet Embedding Training

In [31]:
!sudo curl -O https://storage.googleapis.com/us_audioset/youtube_corpus/v1/features/features.tar.gz

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 2468M  100 2468M    0     0   133M      0  0:00:18  0:00:18 --:--:--  160M


In [32]:
# Unpack the AudioSet features
!sudo tar -xzf features.tar.gz

In [33]:
cd /home/yvradsmi/notebooks

/home/yvradsmi/notebooks


In [34]:
ls

[0m[34;42maudioset_v1_embeddings[0m/
[34;42mazureml[0m/
[34;42mBatchAI[0m/
[34;42mcaffe2[0m/
[34;42mcatboost[0m/
[34;42mChainer[0m/
[34;42mCNTK[0m/
[01;31mcuda-repo-ubuntu1604_8.0.44-1_amd64.deb[0m
[34;42mdeep_water[0m/
[01;32mDocumentDBSample.ipynb[0m*
[01;32mfeatures.tar.gz[0m*
[34;42mh2o[0m/
[01;32mIDEAR.ipynb[0m*
[01;32mIntroduction to Azure ML R notebooks.ipynb[0m*
[01;32mIntroduction to Microsoft R Operationalization.ipynb[0m*
[01;32mIntroToJupyterPython.ipynb[0m*
[01;32mIntroTutorialinMicrosoftR.ipynb[0m*
[01;32mIntroTutorialinR.ipynb[0m*
[01;32mIrisClassifierPyMLWebService.ipynb[0m*
[34;42mjulia[0m/
[01;32mLoadDataIntoDW.ipynb[0m*
[34;42mMMLSpark[0m/
[01;34mmodel_new[0m/
[34;42mmodels[0m/
[34;42mmxnet[0m/
[01;32mpassword[0m*
[34;42mpytorch[0m/
[01;32mreaders.py[0m*
[34;42mSparkML[0m/
[01;32mSQLDW_Explorations.ipynb[0m*
[34;42mtensorflow[0m/
[01;32mVGGish_Audioset_&_Audio_embedding_Tut

In [35]:
cd youtube-8m

/home/yvradsmi/notebooks/youtube-8m


In [36]:
!sudo chmod -R 777 /home/yvradsmi/notebooks/

In [37]:
!sudo rm -f readers.py

In [38]:
%%writefile readers.py
# Copyright 2016 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS-IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Provides readers configured for different datasets."""

import tensorflow as tf
import utils

from tensorflow import logging


def resize_axis(tensor, axis, new_size, fill_value=0):
  """Truncates or pads a tensor to new_size on a given axis.
  Truncate or extend tensor such that tensor.shape[axis] == new_size. If the
  size increases, the padding will be performed at the end, using fill_value.
  Args:
    tensor: The tensor to be resized.
    axis: An integer representing the dimension to be sliced.
    new_size: An integer or 0d tensor representing the new value for
      tensor.shape[axis].
    fill_value: Value to use to fill any new entries in the tensor. Will be
      cast to the type of tensor.
  Returns:
    The resized tensor.
  """
  tensor = tf.convert_to_tensor(tensor)
  shape = tf.unstack(tf.shape(tensor))

  pad_shape = shape[:]
  pad_shape[axis] = tf.maximum(0, new_size - shape[axis])

  shape[axis] = tf.minimum(shape[axis], new_size)
  shape = tf.stack(shape)

  resized = tf.concat([
      tf.slice(tensor, tf.zeros_like(shape), shape),
      tf.fill(tf.stack(pad_shape), tf.cast(fill_value, tensor.dtype))
  ], axis)

  # Update shape.
  new_shape = tensor.get_shape().as_list()  # A copy is being made.
  new_shape[axis] = new_size
  resized.set_shape(new_shape)
  return resized

class BaseReader(object):
  """Inherit from this class when implementing new readers."""

  def prepare_reader(self, unused_filename_queue):
    """Create a thread for generating prediction and label tensors."""
    raise NotImplementedError()


class YT8MAggregatedFeatureReader(BaseReader):
  """Reads TFRecords of pre-aggregated Examples.
  The TFRecords must contain Examples with a sparse int64 'labels' feature and
  a fixed length float32 feature, obtained from the features in 'feature_name'.
  The float features are assumed to be an average of dequantized values.
  """

  def __init__(self,
               num_classes=527,
               feature_sizes=[1024, 128],
               feature_names=["mean_rgb", "mean_audio"]):
    """Construct a YT8MAggregatedFeatureReader.
    Args:
      num_classes: a positive integer for the number of classes.
      feature_sizes: positive integer(s) for the feature dimensions as a list.
      feature_names: the feature name(s) in the tensorflow record as a list.
    """

    assert len(feature_names) == len(feature_sizes), \
    "length of feature_names (={}) != length of feature_sizes (={})".format( \
    len(feature_names), len(feature_sizes))

    self.num_classes = num_classes
    self.feature_sizes = feature_sizes
    self.feature_names = feature_names

  def prepare_reader(self, filename_queue, batch_size=1024):
    """Creates a single reader thread for pre-aggregated YouTube 8M Examples.
    Args:
      filename_queue: A tensorflow queue of filename locations.
    Returns:
      A tuple of video indexes, features, labels, and padding data.
    """
    reader = tf.TFRecordReader()
    _, serialized_examples = reader.read_up_to(filename_queue, batch_size)

    tf.add_to_collection("serialized_examples", serialized_examples)
    return self.prepare_serialized_examples(serialized_examples)

  def prepare_serialized_examples(self, serialized_examples):
    # set the mapping from the fields to data types in the proto
    num_features = len(self.feature_names)
    assert num_features > 0, "self.feature_names is empty!"
    assert len(self.feature_names) == len(self.feature_sizes), \
    "length of feature_names (={}) != length of feature_sizes (={})".format( \
    len(self.feature_names), len(self.feature_sizes))

    feature_map = {"video_id": tf.FixedLenFeature([], tf.string),
                   "labels": tf.VarLenFeature(tf.int64)}
    for feature_index in range(num_features):
      feature_map[self.feature_names[feature_index]] = tf.FixedLenFeature(
          [self.feature_sizes[feature_index]], tf.float32)

    features = tf.parse_example(serialized_examples, features=feature_map)
    labels = tf.sparse_to_indicator(features["labels"], self.num_classes)
    labels.set_shape([None, self.num_classes])
    concatenated_features = tf.concat([
        features[feature_name] for feature_name in self.feature_names], 1)

    return features["video_id"], concatenated_features, labels, tf.ones([tf.shape(serialized_examples)[0]])

class YT8MFrameFeatureReader(BaseReader):
  """Reads TFRecords of SequenceExamples.
  The TFRecords must contain SequenceExamples with the sparse int64 'labels'
  context feature and a fixed length byte-quantized feature vector, obtained
  from the features in 'feature_names'. The quantized features will be mapped
  back into a range between min_quantized_value and max_quantized_value.
  """

  def __init__(self,
               num_classes=527,
               feature_sizes=[1024, 128],
               feature_names=["rgb", "audio"],
               max_frames=300):
    """Construct a YT8MFrameFeatureReader.
    Args:
      num_classes: a positive integer for the number of classes.
      feature_sizes: positive integer(s) for the feature dimensions as a list.
      feature_names: the feature name(s) in the tensorflow record as a list.
      max_frames: the maximum number of frames to process.
    """

    assert len(feature_names) == len(feature_sizes), \
    "length of feature_names (={}) != length of feature_sizes (={})".format( \
    len(feature_names), len(feature_sizes))

    self.num_classes = num_classes
    self.feature_sizes = feature_sizes
    self.feature_names = feature_names
    self.max_frames = max_frames

  def get_video_matrix(self,
                       features,
                       feature_size,
                       max_frames,
                       max_quantized_value,
                       min_quantized_value):
    """Decodes features from an input string and dequantizes them.
    Args:
      features: raw feature values
      feature_size: length of each frame feature vector
      max_frames: number of frames (rows) in the output feature_matrix
      max_quantized_value: the maximum of the quantized value.
      min_quantized_value: the minimum of the quantized value.
    Returns:
      feature_matrix: matrix of all frame-features
      num_frames: number of frames in the sequence
    """
    decoded_features = tf.reshape(
        tf.cast(tf.decode_raw(features, tf.uint8), tf.float32),
        [-1, feature_size])

    num_frames = tf.minimum(tf.shape(decoded_features)[0], max_frames)
    feature_matrix = utils.Dequantize(decoded_features,
                                      max_quantized_value,
                                      min_quantized_value)
    feature_matrix = resize_axis(feature_matrix, 0, max_frames)
    return feature_matrix, num_frames

  def prepare_reader(self,
                     filename_queue,
                     max_quantized_value=2,
                     min_quantized_value=-2):
    """Creates a single reader thread for YouTube8M SequenceExamples.
    Args:
      filename_queue: A tensorflow queue of filename locations.
      max_quantized_value: the maximum of the quantized value.
      min_quantized_value: the minimum of the quantized value.
    Returns:
      A tuple of video indexes, video features, labels, and padding data.
    """
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)

    return self.prepare_serialized_examples(serialized_example,
        max_quantized_value, min_quantized_value)

  def prepare_serialized_examples(self, serialized_example,
      max_quantized_value=2, min_quantized_value=-2):

    contexts, features = tf.parse_single_sequence_example(
        serialized_example,
        context_features={"video_id": tf.FixedLenFeature(
            [], tf.string),
                          "labels": tf.VarLenFeature(tf.int64)},
        sequence_features={
            feature_name : tf.FixedLenSequenceFeature([], dtype=tf.string)
            for feature_name in self.feature_names
        })

    # read ground truth labels
    labels = (tf.cast(
        tf.sparse_to_dense(contexts["labels"].values, (self.num_classes,), 1,
            validate_indices=False),
        tf.bool))

    # loads (potentially) different types of features and concatenates them
    num_features = len(self.feature_names)
    assert num_features > 0, "No feature selected: feature_names is empty!"

    assert len(self.feature_names) == len(self.feature_sizes), \
    "length of feature_names (={}) != length of feature_sizes (={})".format( \
    len(self.feature_names), len(self.feature_sizes))

    num_frames = -1  # the number of frames in the video
    feature_matrices = [None] * num_features  # an array of different features
    for feature_index in range(num_features):
      feature_matrix, num_frames_in_this_feature = self.get_video_matrix(
          features[self.feature_names[feature_index]],
          self.feature_sizes[feature_index],
          self.max_frames,
          max_quantized_value,
          min_quantized_value)
      if num_frames == -1:
        num_frames = num_frames_in_this_feature
      else:
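        # Note: this assert op is not added as a control dependency, so in
        # graph mode it is never executed and does not enforce the check.
        tf.assert_equal(num_frames, num_frames_in_this_feature)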

      feature_matrices[feature_index] = feature_matrix

    # cap the number of frames at self.max_frames
    num_frames = tf.minimum(num_frames, self.max_frames)

    # concatenate different features
    video_matrix = tf.concat(feature_matrices, 1)

    # convert to batch format.
    # TODO: Do proper batch reads to remove the IO bottleneck.
    batch_video_ids = tf.expand_dims(contexts["video_id"], 0)
    batch_video_matrix = tf.expand_dims(video_matrix, 0)
    batch_labels = tf.expand_dims(labels, 0)
    batch_frames = tf.expand_dims(num_frames, 0)

    return batch_video_ids, batch_video_matrix, batch_labels, batch_frames

Writing readers.py
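In `readers.py`, `get_video_matrix` decodes each frame's bytes to `uint8`, dequantizes them, and pads or truncates the result to `max_frames` with `resize_axis`. The NumPy sketch below mirrors those steps so their shapes and values can be checked without building a TensorFlow graph. `dequantize`, `resize_axis_np` and `decode_feature` are hypothetical names, and the dequantization is a simplified linear map that may differ slightly from youtube-8m's `utils.Dequantize`.

```python
import numpy as np


def dequantize(codes, max_value=2.0, min_value=-2.0):
    """Linearly map uint8 codes 0..255 back onto [min_value, max_value].

    Simplified: youtube-8m's utils.Dequantize may add a small bias term.
    """
    scale = (max_value - min_value) / 255.0
    return codes.astype(np.float32) * scale + min_value


def resize_axis_np(array, axis, new_size, fill_value=0):
    """Truncate or pad `array` along `axis` to exactly `new_size` entries."""
    array = np.asarray(array)
    current = array.shape[axis]
    if current >= new_size:
        # Keep only the first new_size entries along `axis`.
        return np.take(array, np.arange(new_size), axis=axis)
    pad = [(0, 0)] * array.ndim
    pad[axis] = (0, new_size - current)  # pad at the end, like resize_axis
    return np.pad(array, pad, mode="constant", constant_values=fill_value)


def decode_feature(frame_bytes, feature_size, max_frames):
    """Per-frame byte strings -> uint8 matrix -> floats -> exactly max_frames rows."""
    codes = np.frombuffer(b"".join(frame_bytes), dtype=np.uint8)
    codes = codes.reshape(-1, feature_size)
    num_frames = min(codes.shape[0], max_frames)
    matrix = resize_axis_np(dequantize(codes), 0, max_frames)
    return matrix, num_frames
```

Padded rows are filled with zeros after dequantization, which is why the reader also returns `num_frames`: downstream models need it to ignore the padding.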


In [39]:
cd ..

/home/yvradsmi/notebooks


In [40]:
!git clone https://github.com/tensorflow/models.git

fatal: destination path 'models' already exists and is not an empty directory.


In [41]:
cd models/research/audioset

/home/yvradsmi/notebooks/models/research/audioset


In [42]:
!sudo chmod -R 777 audioset_v1_embeddings/

In [43]:
rm vggish_inference_demo.py

In [44]:
%%writefile vggish_inference_demo.py

# Copyright 2017 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

r"""A simple demonstration of running VGGish in inference mode.

This is intended as a toy example that demonstrates how the various building
blocks (feature extraction, model definition and loading, postprocessing) work
together in an inference context.

A WAV file (assumed to contain signed 16-bit PCM samples) is read in, converted
into log mel spectrogram examples, fed into VGGish, the raw embedding output is
whitened and quantized, and the postprocessed embeddings are optionally written
in a SequenceExample to a TFRecord file (using the same format as the embedding
features released in AudioSet).

Usage:
  # Run a WAV file through the model and print the embeddings. The model
  # checkpoint is loaded from vggish_model.ckpt and the PCA parameters are
  # loaded from vggish_pca_params.npz in the current directory.
  $ python vggish_inference_demo.py --wav_file /path/to/a/wav/file

  # Run a WAV file through the model and also write the embeddings to
  # a TFRecord file. The model checkpoint and PCA parameters are explicitly
  # passed in as well.
  $ python vggish_inference_demo.py --wav_file /path/to/a/wav/file \
                                    --tfrecord_file /path/to/tfrecord/file \
                                    --checkpoint /path/to/model/checkpoint \
                                    --pca_params /path/to/pca/params

  # Run a built-in input (a sine wave) through the model and print the
  # embeddings. Associated model files are read from the current directory.
  $ python vggish_inference_demo.py
"""

from __future__ import print_function

import numpy as np
from scipy.io import wavfile
import six
import tensorflow as tf

import vggish_input
import vggish_params
import vggish_postprocess
import vggish_slim

flags = tf.app.flags

flags.DEFINE_string(
    'wav_file', None,
    'Path to a wav file. Should contain signed 16-bit PCM samples. '
    'If none is provided, a synthetic sound is used.')

flags.DEFINE_string(
    'checkpoint', 'vggish_model.ckpt',
    'Path to the VGGish checkpoint file.')

flags.DEFINE_string(
    'pca_params', 'vggish_pca_params.npz',
    'Path to the VGGish PCA parameters file.')

flags.DEFINE_string(
    'tfrecord_file', None,
    'Path to a TFRecord file where embeddings will be written.')

FLAGS = flags.FLAGS


def main(_):
    # In this simple example, we run the examples from a single audio file through
    # the model. If none is provided, we generate a synthetic input.
    if FLAGS.wav_file:
        wav_file = FLAGS.wav_file
    else:
        # Write a WAV of a sine wave into an in-memory file object.
        num_secs = 5
        freq = 1000
        sr = 44100
        t = np.linspace(0, num_secs, int(num_secs * sr))
        x = np.sin(2 * np.pi * freq * t)
        # Convert to signed 16-bit samples.
        samples = np.clip(x * 32768, -32768, 32767).astype(np.int16)
        wav_file = six.BytesIO()
        wavfile.write(wav_file, sr, samples)
        wav_file.seek(0)
    examples_batch = vggish_input.wavfile_to_examples(wav_file)
    print(examples_batch)

    # Prepare a postprocessor to munge the model embeddings.
    pproc = vggish_postprocess.Postprocessor(FLAGS.pca_params)

    # If needed, prepare a record writer to store the postprocessed embeddings.
    writer = tf.python_io.TFRecordWriter(
        FLAGS.tfrecord_file) if FLAGS.tfrecord_file else None

    with tf.Graph().as_default(), tf.Session() as sess:
        # Define the model in inference mode, load the checkpoint, and
        # locate input and output tensors.
        vggish_slim.define_vggish_slim(training=False)
        vggish_slim.load_vggish_slim_checkpoint(sess, FLAGS.checkpoint)
        features_tensor = sess.graph.get_tensor_by_name(
            vggish_params.INPUT_TENSOR_NAME)
        embedding_tensor = sess.graph.get_tensor_by_name(
            vggish_params.OUTPUT_TENSOR_NAME)

        # Run inference and postprocessing.
        [embedding_batch] = sess.run([embedding_tensor],
                                     feed_dict={features_tensor: examples_batch})
        print(embedding_batch)
        postprocessed_batch = pproc.postprocess(embedding_batch)
        print(postprocessed_batch)

        # Write the postprocessed embeddings as a SequenceExample, in a similar
        # format as the features released in AudioSet. Each row of the batch of
        # embeddings corresponds to roughly a second of audio (96 10ms frames), and
        # the rows are written as a sequence of bytes-valued features, where each
        # feature value contains the 128 bytes of the whitened quantized embedding.
        seq_example = tf.train.SequenceExample(
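            context=tf.train.Features(feature={
                # wav_file is a BytesIO object (which has no .encode()) when the
                # synthetic input is used, so fall back to a placeholder ID there.
                # 'synthetic_sine' is an arbitrary name chosen for this case.
                'video_id': tf.train.Feature(bytes_list=tf.train.BytesList(
                    value=[(FLAGS.wav_file or 'synthetic_sine').encode()]))
            }),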
            feature_lists=tf.train.FeatureLists(
                feature_list={
                    vggish_params.AUDIO_EMBEDDING_FEATURE_NAME:
                        tf.train.FeatureList(
                            feature=[
                                tf.train.Feature(
                                    bytes_list=tf.train.BytesList(
                                        value=[embedding.tobytes()]))
                                for embedding in postprocessed_batch
                            ]
                        )
                }
            )
        )
        print(seq_example)
        if writer:
            writer.write(seq_example.SerializeToString())

    if writer:
        writer.close()


if __name__ == '__main__':
    tf.app.run()

Writing vggish_inference_demo.py
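The script builds its synthetic input with NumPy and `scipy.io.wavfile`. As a dependency-light sketch of the same idea, the standard-library `wave` module can write a 16-bit mono sine tone into an in-memory `io.BytesIO` object. The function name `sine_wav_bytes` is hypothetical, and it samples at times `i / sr` instead of the script's `np.linspace` grid, so individual sample values differ very slightly from the script's.

```python
import io
import math
import struct
import wave


def sine_wav_bytes(num_secs=5.0, freq=1000.0, sr=44100):
    """Return an in-memory mono 16-bit PCM WAV file containing a sine tone."""
    n = int(num_secs * sr)
    frames = bytearray()
    for i in range(n):
        x = math.sin(2 * math.pi * freq * i / sr)
        # Scale to the int16 range, clipping +1.0 so it cannot overflow 32767.
        sample = max(-32768, min(32767, int(x * 32768)))
        frames += struct.pack('<h', sample)  # little-endian signed 16-bit
    buf = io.BytesIO()
    with wave.open(buf, 'wb') as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        w.setframerate(sr)
        w.writeframes(bytes(frames))
    buf.seek(0)  # rewind so a reader starts at the RIFF header
    return buf
```

Closing the `wave` writer does not close a file object it did not open itself, so the returned buffer stays readable.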

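`vggish_input.wavfile_to_examples` turns the waveform into a batch of log-mel examples, which the script prints before inference. Assuming the defaults in `vggish_params.py` (resampling to 16 kHz, 25 ms STFT windows with a 10 ms hop, 64 mel bins, and non-overlapping 0.96 s examples), the batch shape can be estimated with plain arithmetic. This is a sketch of the framing arithmetic only, not the library's code; the constants and helper names are assumptions to check against your checkout.

```python
# Values assumed to match vggish_params.py in tensorflow/models/research/audioset.
SAMPLE_RATE = 16000            # VGGish resamples all input to 16 kHz
STFT_WINDOW_SECONDS = 0.025    # 25 ms analysis window
STFT_HOP_SECONDS = 0.010       # 10 ms hop -> 100 log-mel frames per second
NUM_MEL_BINS = 64
EXAMPLE_WINDOW_SECONDS = 0.96  # 96 frames per example
EXAMPLE_HOP_SECONDS = 0.96     # examples do not overlap


def num_frames(length, window, hop):
    """Number of full windows when sliding `window` by `hop` (no padding)."""
    if length < window:
        return 0
    return 1 + (length - window) // hop


def expected_examples_shape(num_samples_16k):
    """Estimate the (num_examples, 96, 64) batch shape for 16 kHz mono audio."""
    win = int(round(SAMPLE_RATE * STFT_WINDOW_SECONDS))  # 400 samples
    hop = int(round(SAMPLE_RATE * STFT_HOP_SECONDS))     # 160 samples
    n_frames = num_frames(num_samples_16k, win, hop)
    frames_per_sec = 1.0 / STFT_HOP_SECONDS
    ex_win = int(round(EXAMPLE_WINDOW_SECONDS * frames_per_sec))  # 96 frames
    ex_hop = int(round(EXAMPLE_HOP_SECONDS * frames_per_sec))     # 96 frames
    return (num_frames(n_frames, ex_win, ex_hop), ex_win, NUM_MEL_BINS)
```

For the 5-second tone in the script (about 80,000 samples after resampling), this gives 498 log-mel frames and a `(5, 96, 64)` batch, one example per roughly one second of audio.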

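`pproc.postprocess` applies the PCA parameters from `vggish_pca_params.npz`, clips, and quantizes each embedding to 8 bits; the script then stores each row as a feature value with `embedding.tobytes()`. The NumPy sketch below reproduces that pipeline under stated assumptions: the clipping range of [-2.0, 2.0] is assumed to match `vggish_params`, `pca_means` is assumed to be reshaped to a (128, 1) column, and the function names are hypothetical. It also shows how a 128-byte feature value decodes back with `np.frombuffer`.

```python
import numpy as np

# Assumed to mirror vggish_params.QUANTIZE_MIN_VAL / QUANTIZE_MAX_VAL.
QUANTIZE_MIN_VAL = -2.0
QUANTIZE_MAX_VAL = 2.0


def quantize_embeddings(embeddings, pca_matrix, pca_means):
    """PCA-project, clip and 8-bit quantize an (N, 128) float embedding batch."""
    # Center each embedding with the (128, 1) mean column, then project.
    pca_applied = np.dot(pca_matrix, (embeddings.T - pca_means)).T
    clipped = np.clip(pca_applied, QUANTIZE_MIN_VAL, QUANTIZE_MAX_VAL)
    # Map [-2, 2] linearly onto [0, 255]; astype truncates toward zero.
    scale = 255.0 / (QUANTIZE_MAX_VAL - QUANTIZE_MIN_VAL)
    return ((clipped - QUANTIZE_MIN_VAL) * scale).astype(np.uint8)


def to_feature_bytes(quantized):
    """One 128-byte string per example, like embedding.tobytes() in the script."""
    return [row.tobytes() for row in quantized]


def from_feature_bytes(values):
    """Decode the byte strings back into an (N, 128) uint8 array."""
    return np.stack([np.frombuffer(v, dtype=np.uint8) for v in values])
```

Passing `np.eye(128)` and zero means isolates the clip-and-quantize step, which is handy for checking the byte layout of the written SequenceExample.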
In [45]:
cd /home/yvradsmi/notebooks

/home/yvradsmi/notebooks


In [46]:
ls

[0m[34;42maudioset_v1_embeddings[0m/
[34;42mazureml[0m/
[34;42mBatchAI[0m/
[34;42mcaffe2[0m/
[34;42mcatboost[0m/
[34;42mChainer[0m/
[34;42mCNTK[0m/
[01;32mcuda-repo-ubuntu1604_8.0.44-1_amd64.deb[0m*
[34;42mdeep_water[0m/
[01;32mDocumentDBSample.ipynb[0m*
[01;32mfeatures.tar.gz[0m*
[34;42mh2o[0m/
[01;32mIDEAR.ipynb[0m*
[01;32mIntroduction to Azure ML R notebooks.ipynb[0m*
[01;32mIntroduction to Microsoft R Operationalization.ipynb[0m*
[01;32mIntroToJupyterPython.ipynb[0m*
[01;32mIntroTutorialinMicrosoftR.ipynb[0m*
[01;32mIntroTutorialinR.ipynb[0m*
[01;32mIrisClassifierPyMLWebService.ipynb[0m*
[34;42mjulia[0m/
[01;32mLoadDataIntoDW.ipynb[0m*
[34;42mMMLSpark[0m/
[34;42mmodel_new[0m/
[34;42mmodels[0m/
[34;42mmxnet[0m/
[01;32mpassword[0m*
[34;42mpytorch[0m/
[01;32mreaders.py[0m*
[34;42mSparkML[0m/
[01;32mSQLDW_Explorations.ipynb[0m*
[34;42mtensorflow[0m/
[01;32mVGGish_Audioset_&_Audio_embedding_Tu

In [47]:
!python youtube-8m/train.py --frame_features --model=LstmModel --feature_names=audio_embedding --feature_sizes=128 --train_data_pattern=audioset_v1_embeddings/bal_train/*.tfrecord --train_dir model_new/dir --start_new_model --base_learning_rate=0.001 --num_epochs=5

  from ._conv import register_converters as _register_converters
INFO:tensorflow:/job:master/task:0: Tensorflow version: 1.11.0.
Instructions for updating:
This class is deprecated, please use tf.nn.rnn_cell.LSTMCell, which supports all the feature this cell currently has. Please replace the existing code with tf.nn.rnn_cell.LSTMCell(name='basic_lstm_cell').
INFO:tensorflow:/job:master/task:0: Removing existing train directory.
INFO:tensorflow:/job:master/task:0: Flag 'start_new_model' is set. Building a new model.
2018-09-30 23:34:32.067971: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-09-30 23:34:32.183779: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1411] Found device 0 with properties: 
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 6de9:00:00.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2018-09-30 23:34:32.183826: I tensorflow/core/common_runti

INFO:tensorflow:training step 38 | Loss: 13.01 Examples/sec: 1323.14
INFO:tensorflow:training step 39 | Loss: 13.46 Examples/sec: 1338.75
INFO:tensorflow:training step 40 | Loss: 13.51 Examples/sec: 1276.57 | Hit@1: 0.27 PERR: 0.21 GAP: 0.10
INFO:tensorflow:training step 41 | Loss: 13.42 Examples/sec: 1494.18
INFO:tensorflow:training step 42 | Loss: 12.82 Examples/sec: 1337.42
INFO:tensorflow:training step 43 | Loss: 12.81 Examples/sec: 1333.97
INFO:tensorflow:training step 44 | Loss: 13.38 Examples/sec: 1336.68
INFO:tensorflow:training step 45 | Loss: 13.25 Examples/sec: 1357.42
INFO:tensorflow:training step 46 | Loss: 13.13 Examples/sec: 1338.30
INFO:tensorflow:training step 47 | Loss: 13.38 Examples/sec: 1336.42
INFO:tensorflow:training step 48 | Loss: 13.35 Examples/sec: 1336.97
INFO:tensorflow:training step 49 | Loss: 13.55 Examples/sec: 1318.87
INFO:tensorflow:training step 50 | Loss: 13.15 Examples/sec: 1338.41 | Hit@1: 0.29 PERR: 0.22 GAP: 0.11
INFO:tensorflow:training step 51 
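The training log mixes TensorFlow warnings with lines of the form `training step N | Loss: ... Examples/sec: ...`, and some steps (40 and 50 in the excerpt above) also report Hit@1, PERR and GAP. To plot or compare runs, a small regular-expression parser is enough. The pattern is inferred from the lines shown here rather than from youtube-8m's source, so other versions may format them differently; `parse_training_log` is a hypothetical helper.

```python
import re

# Matches the step lines printed by youtube-8m/train.py in this run; the
# trailing metrics group is optional because most steps omit it.
STEP_RE = re.compile(
    r'training step (\d+) \| Loss: ([\d.]+) Examples/sec: ([\d.]+)'
    r'(?: \| Hit@1: ([\d.]+) PERR: ([\d.]+) GAP: ([\d.]+))?')


def parse_training_log(lines):
    """Return one dict per 'training step' line; other lines are skipped."""
    records = []
    for line in lines:
        m = STEP_RE.search(line)
        if not m:
            continue  # warnings, device info, truncated lines
        rec = {'step': int(m.group(1)),
               'loss': float(m.group(2)),
               'examples_per_sec': float(m.group(3))}
        if m.group(4) is not None:
            rec.update(hit_at_1=float(m.group(4)),
                       perr=float(m.group(5)),
                       gap=float(m.group(6)))
        records.append(rec)
    return records
```

To use it, capture the output of the training cell (for example by redirecting the `train.py` command to a file) and pass the file's lines in.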

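Hit@1 and PERR in those log lines are standard youtube-8m metrics. Hit@1 is the fraction of examples whose single highest-scoring class is one of the true labels. PERR (precision at equal recall rate) takes, for an example with k true labels, the precision of its top k predictions, and averages that over examples. The pure-Python sketch below follows those definitions on small lists; it is not youtube-8m's `eval_util` code, and counting label-free examples as zero in PERR is an assumption of this sketch that the reference implementation may handle differently.

```python
def hit_at_one(predictions, labels):
    """Fraction of examples whose top-scoring class is a true label.

    predictions: list of per-class score lists.
    labels: list of sets of positive class indices (same order).
    """
    hits = 0
    for scores, positives in zip(predictions, labels):
        top = max(range(len(scores)), key=scores.__getitem__)
        hits += top in positives
    return hits / len(predictions)


def perr(predictions, labels):
    """Precision at equal recall rate, averaged over examples."""
    total = 0.0
    for scores, positives in zip(predictions, labels):
        k = len(positives)
        if k == 0:
            continue  # assumption: examples without labels contribute 0
        ranked = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
        total += sum(1 for c in ranked[:k] if c in positives) / k
    return total / len(predictions)
```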
In [49]:
!ls /home/yvradsmi/notebooks/audioset_v1_embeddings/

bal_train  eval  unbal_train
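Each split directory holds AudioSet embeddings as TFRecord files, which `train.py` and `eval.py` read through `--train_data_pattern` and `--eval_data_pattern`. The TFRecord container itself is simple: each record is a little-endian uint64 length, a 4-byte masked CRC32C of that length, the payload, and a 4-byte masked CRC32C of the payload. The stdlib-only sketch below walks that framing, for example to count records per file without TensorFlow. It deliberately skips CRC verification, and each payload is a serialized `tf.train.SequenceExample` that still needs TensorFlow or protobuf to decode; `iter_tfrecords` is a hypothetical name.

```python
import struct


def iter_tfrecords(fileobj):
    """Yield the raw payload bytes of each record in a binary TFRecord stream.

    CRC fields are read and discarded without being checked.
    """
    while True:
        header = fileobj.read(8)
        if not header:
            return  # clean end of file
        if len(header) != 8:
            raise ValueError('truncated record header')
        (length,) = struct.unpack('<Q', header)
        fileobj.read(4)  # masked CRC32C of the length (not verified)
        data = fileobj.read(length)
        if len(data) != length:
            raise ValueError('truncated record payload')
        fileobj.read(4)  # masked CRC32C of the data (not verified)
        yield data
```

Counting the records in one file then looks like `with open(path, 'rb') as f: n = sum(1 for _ in iter_tfrecords(f))`, where `path` is any of the `*.tfrecord` files listed under these directories.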


In [50]:
!python youtube-8m/eval.py --eval_data_pattern=audioset_v1_embeddings/eval/*.tfrecord --train_dir model_new/dir --run_once

  from ._conv import register_converters as _register_converters
tensorflow version: 1.11.0
INFO:tensorflow:Using batch size of 1024 for evaluation.
INFO:tensorflow:number of evaluation files: 4062
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
Instructions for updating:
This class is deprecated, please use tf.nn.rnn_cell.LSTMCell, which supports all the feature this cell currently has. Please replace the existing code with tf.nn.rnn_cell.LSTMCell(name='basic_lstm_cell').
INFO:tensorflow:built evaluation graph
2018-09-30 23:44:08.314307: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-09-30 23:44:08.434972: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1411] Found device 0 with properties: 
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 
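The evaluation run reports GAP (global average precision) over the `eval` split. In youtube-8m, GAP pools the top-k predictions of every example (k is assumed here to default to 20), sorts them all by confidence, and computes average precision against the total number of true labels, so positives that never reach an example's top k still count against recall. A minimal pure-Python sketch of that computation follows; `global_average_precision` is a hypothetical name, and tie-breaking among equal scores may differ from the reference implementation.

```python
def global_average_precision(predictions, labels, top_k=20):
    """GAP over per-class score lists and sets of positive class indices."""
    pooled = []
    num_positives = 0
    for scores, positives in zip(predictions, labels):
        num_positives += len(positives)
        ranked = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
        # Keep only this example's top_k (score, is_positive) pairs.
        pooled.extend((scores[c], c in positives) for c in ranked[:top_k])
    if num_positives == 0:
        return 0.0
    pooled.sort(key=lambda p: p[0], reverse=True)
    hits = 0
    ap = 0.0
    for rank, (_, is_pos) in enumerate(pooled, start=1):
        if is_pos:
            hits += 1
            ap += hits / rank  # precision at this rank
    # Divide by all true labels, including those outside every top_k.
    return ap / num_positives
```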

In [51]:
!python youtube-8m/inference.py --output_file Bal_SamplePredictions.csv --input_data_pattern=audioset_v1_embeddings/bal_train/a*.tfrecord --train_dir model_new/dir --top_k=3

  from ._conv import register_converters as _register_converters
2018-09-30 23:48:16.204317: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-09-30 23:48:16.316354: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1411] Found device 0 with properties: 
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 6de9:00:00.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2018-09-30 23:48:16.316399: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1490] Adding visible gpu devices: 0
2018-09-30 23:48:16.596182: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-09-30 23:48:16.596244: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977]      0 
2018-09-30 23:48:16.596264: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0:   N 
2018-09-30 23:48:16.596525: I tensorflow/core/common_runtime/gpu/gp
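With `--top_k=3`, `inference.py` writes one CSV line per video: the video ID, then the three highest-scoring label indices, each followed by its confidence, in descending order of confidence. Assuming the per-class scores for one video arrive as a plain list, that formatting step can be sketched with `heapq.nlargest`. The function name is hypothetical, and the `%g` float format is an assumption that happens to match values such as `0.48309` in the output file.

```python
import heapq


def format_prediction_line(video_id, scores, top_k=3):
    """Build one 'VideoId,label conf label conf ...' line, best label first."""
    # nlargest returns its results already sorted from highest to lowest.
    top = heapq.nlargest(top_k, enumerate(scores), key=lambda p: p[1])
    pairs = " ".join("%d %g" % (label, score) for label, score in top)
    return "%s,%s" % (video_id, pairs)
```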

In [53]:
!cat notebooks/Bal_SamplePredictions.csv

VideoId,LabelConfidencePairs
aL2jpZfUGF0,137 0.574492 0 0.483146 300 0.218803
aLnpHdhIqWE,137 0.573679 0 0.470482 300 0.207713
aL6wtF-CqmA,137 0.574365 0 0.483363 300 0.218834
aLiocIeE_A8,137 0.574256 0 0.483674 300 0.219061
aLShWsDr7oQ,137 0.574368 0 0.483129 300 0.218811
aLnBwjLUZao,137 0.574418 0 0.48309 300 0.218628
aL6ij87TUA8,137 0.574551 0 0.482986 300 0.218652
aLHxMaT3uYg,137 0.574506 0 0.483235 300 0.218777
aA0bk6Pnh7A,137 0.574131 0 0.484086 300 0.219288
aAU9NKbaGy4,137 0.57447 0 0.482616 300 0.218318
aA1Z3eeFYm0,137 0.575466 0 0.478522 300 0.215218
aADExWV1bsM,137 0.574436 0 0.483521 300 0.218996
aAJuPyUvHn8,137 0.574185 0 0.48379 300 0.219207
aAtMoktAtVs,137 0.574251 0 0.483607 300 0.218909
amvaj68CwfM,137 0.5745 0 0.482738 300 0.218326
ammKLbpCiwU,137 0.574252 0 0.483638 300 0.219043
au3EcuW7nHw,137 0.57476 0 0.481196 300 0.217086
a4D3uaGlkwg,137 0.574334 0 0.483547 300 0.219042
a4SXfQjkkbk,137 0.574207 0 0.483801 300 0.21909
a4-FJctXu38,137 0.574079 0 
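Every row of `Bal_SamplePredictions.csv` shown above ranks the same three label indices (137, 0 and 300) with nearly identical confidences, which may mean the model is mostly echoing overall label frequencies rather than responding to each clip. A quick way to check this across the whole file is to parse the `LabelConfidencePairs` column and count the top-1 labels. The helper names below are hypothetical, and the parser tolerates a truncated trailing pair like the one on the final line above.

```python
import csv
import io
from collections import Counter


def parse_predictions(text):
    """Parse inference CSV text into {video_id: [(label, confidence), ...]}."""
    result = {}
    for row in csv.DictReader(io.StringIO(text)):
        tokens = row['LabelConfidencePairs'].split()
        # Tokens alternate label, confidence; an unpaired trailing label is dropped.
        pairs = [(int(tokens[i]), float(tokens[i + 1]))
                 for i in range(0, len(tokens) - 1, 2)]
        result[row['VideoId']] = pairs
    return result


def top1_label_counts(predictions):
    """Count how often each label index is the highest-ranked prediction."""
    return Counter(pairs[0][0] for pairs in predictions.values() if pairs)
```

If one label dominates the top-1 counts for the full file, as it does in the rows shown here, per-clip predictions are probably not yet informative.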