##### Copyright 2020 The TensorFlow Authors.


In [12]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Simple audio recognition: Recognizing keywords


<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://www.tensorflow.org/tutorials/audio/simple_audio">
    <img src="https://www.tensorflow.org/images/tf_logo_32px.png" />
    View on TensorFlow.org</a>
  </td>
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/audio/simple_audio.ipynb">
    <img src="https://www.tensorflow.org/images/colab_logo_32px.png" />
    Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/tensorflow/docs/blob/master/site/en/tutorials/audio/simple_audio.ipynb">
    <img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />
    View source on GitHub</a>
  </td>
  <td>
    <a href="https://storage.googleapis.com/tensorflow_docs/docs/site/en/tutorials/audio/simple_audio.ipynb"><img src="https://www.tensorflow.org/images/download_logo_32px.png" />Download notebook</a>
  </td>
</table>


This tutorial demonstrates how to preprocess audio files in the WAV format and build and train a basic [automatic speech recognition](https://en.wikipedia.org/wiki/Speech_recognition) (ASR) model for recognizing eight different words. You will use a portion of the [Speech Commands dataset](https://www.tensorflow.org/datasets/catalog/speech_commands) ([Warden, 2018](https://arxiv.org/abs/1804.03209)), which contains short (one-second or less) audio clips of commands, such as "down", "go", "left", "no", "right", "stop", "up" and "yes".

Real-world speech and audio recognition [systems](https://ai.googleblog.com/search/label/Speech%20Recognition) are complex. But, like [image classification with the MNIST dataset](../quickstart/beginner.ipynb), this tutorial should give you a basic understanding of the techniques involved.


## Setup

Import necessary modules and dependencies. You'll be using `tf.keras.utils.audio_dataset_from_directory` (introduced in TensorFlow 2.10), which helps generate audio classification datasets from directories of `.wav` files. You'll also need [seaborn](https://seaborn.pydata.org) for visualization in this tutorial.


In [13]:
# Step 1: Install TensorFlow and TensorFlow Datasets
%pip install -U -q tensorflow tensorflow_datasets

# Step 2: Pin wrapt to version 1.14.1
%pip install wrapt==1.14.1

# Step 3: Install visualization libraries
%pip install matplotlib seaborn

# Step 4: Install SoundFile (provides the `soundfile` module imported below)
%pip install soundfile

# Step 5: Install TensorFlow I/O
# (to reinstall, first run: %pip uninstall -y tensorflow-io)
%pip install tensorflow-io
# %pip install --upgrade tensorflow

# Step 6: Install notebook tooling: nbformat, ipykernel, ipynb (used below to
# import other notebooks via `ipynb.fs.defs`) and pickleshare
%pip install nbformat ipykernel ipynb pickleshare

Note: you may need to restart the kernel to use updated packages.



In [14]:
import os
import pathlib
import glob
import matplotlib.pyplot as plt 
import numpy as np
import seaborn as sns
import soundfile as sf
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras import models
from IPython import display
import platform
# import tensorflow_io as tfio
import zipfile
import shutil
import sys
import subprocess

import nbformat 
from IPython import get_ipython
import pickle

print(sys.executable)
print(sys.path)

# from shared_vars import load_vars, save_all_vars, label_names

# Set directory paths
# directory_path_database = "../Tutorial/data/train_files"
# directory_path_testfiles = "../Tutorial/data/test_files/"
# DATASET_PATH = 'data/train_files'
TRAIN_DIR = pathlib.Path('data/train_files')
TEST_DIR = pathlib.Path('data/test_files')
DATA_DIR = pathlib.Path('data')

c:\Users\prett\AppData\Local\Programs\Python\Python311\python.exe
['c:\\Users\\prett\\AppData\\Local\\Programs\\Python\\Python311\\python311.zip', 'c:\\Users\\prett\\AppData\\Local\\Programs\\Python\\Python311\\Lib', 'c:\\Users\\prett\\AppData\\Local\\Programs\\Python\\Python311\\DLLs', '', 'C:\\Users\\prett\\AppData\\Roaming\\Python\\Python311\\site-packages', 'C:\\Users\\prett\\AppData\\Roaming\\Python\\Python311\\site-packages\\win32', 'C:\\Users\\prett\\AppData\\Roaming\\Python\\Python311\\site-packages\\win32\\lib', 'C:\\Users\\prett\\AppData\\Roaming\\Python\\Python311\\site-packages\\Pythonwin', 'c:\\Users\\prett\\AppData\\Local\\Programs\\Python\\Python311', 'c:\\Users\\prett\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages']


In [15]:
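def notebook_extract():
    """Extract the audio archives, rename the files and check the training WAV files.

    Relies on helper functions defined in the `audio_extraction` notebook.
    """
    import ipynb.fs.defs.audio_extraction as audio_extraction

    # Unzip the training and test data into DATA_DIR (skipped if already extracted).
    audio_extraction.extract_zip(TRAIN_DIR, DATA_DIR)
    print('-' * 50)
    audio_extraction.extract_zip(TEST_DIR, DATA_DIR)
    print('-' * 50)
    print('-' * 50)
    # Rename the audio files under DATA_DIR.
    audio_extraction.rename_audio_files(DATA_DIR)
    print('-' * 50)
    print('-' * 50)
    # Decode the training WAV files and report their shape and sample rate.
    audio_extraction.process_directory(TRAIN_DIR)

notebook_extract()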

The directory data\train_files already exists. Skipping extraction.
--------------------------------------------------
The directory data\test_files already exists. Skipping extraction.
--------------------------------------------------
--------------------------------------------------
renaming of data/data complete
renaming of data/test_files complete
renaming of data/train_files complete
renaming of data/mp3-128kbit complete
renaming of data/orig-wave complete
renaming of data/upscale-16-48 complete
--------------------------------------------------
--------------------------------------------------
tensorflow_io is not imported
Processing file: data\train_files\orig-wave\orig-wave_1-4 Inch Plug,In,Elec Guitar,Fiddles.wav
File 'data\train_files\orig-wave\orig-wave_1-4 Inch Plug,In,Elec Guitar,Fiddles.wav' successfully decoded!
Audio shape: (16000, 1), Sample rate: 48000
Processing file: data\train_files\orig-wave\orig-wave_1-4 Inch Plug,In,Guitar Amp

Because the data is divided into one directory per class, you can load it with `tf.keras.utils.audio_dataset_from_directory`.
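With the default `labels='inferred'`, `audio_dataset_from_directory` treats each subdirectory as one class and, unless you pass `class_names`, assigns the integer labels in alphanumeric order of the subdirectory names. The standard-library sketch below mimics that mapping on a throwaway directory tree. `infer_class_names` is a hypothetical helper, not a Keras function, and the class folder names are assumptions taken from the extraction log above.

```python
import pathlib
import tempfile

def infer_class_names(root):
    """Hypothetical helper: mimic how classes are inferred from subdirectories.

    Each immediate subdirectory of `root` is one class; classes are sorted
    alphanumerically and labelled 0, 1, 2, ... in that order.
    """
    root = pathlib.Path(root)
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    files_and_labels = []
    for label, name in enumerate(class_names):
        for wav in sorted((root / name).glob("*.wav")):
            files_and_labels.append((wav.name, label))
    return class_names, files_and_labels

# Build a tiny temporary tree shaped like data/train_files.
with tempfile.TemporaryDirectory() as tmp:
    for cls in ["upscale-16-48", "orig-wave", "mp3-128kbit"]:
        (pathlib.Path(tmp) / cls).mkdir()
        (pathlib.Path(tmp) / cls / f"{cls}_example.wav").touch()
    class_names, files_and_labels = infer_class_names(tmp)

print(class_names)       # ['mp3-128kbit', 'orig-wave', 'upscale-16-48']
print(files_and_labels)
```

The creation order of the folders does not matter; only the sorted names determine which integer each class receives.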

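In the Speech Commands dataset, the audio clips are one second or less at 16 kHz. Setting `output_sequence_length=16000` pads the shorter clips to exactly one second (and trims longer ones) so that they can be batched. The cell below uses `output_sequence_length=16000*seconds` with `seconds=20`, so it keeps up to 20 seconds of audio per clip instead (assuming a 16 kHz sample rate).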
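For a single clip, the padding and trimming described above amount to roughly the following NumPy sketch. `pad_or_trim` is a hypothetical helper rather than part of Keras, and it assumes that zeros are appended at the end of short clips.

```python
import numpy as np

SAMPLE_RATE = 16000  # samples per second, as stated above

def pad_or_trim(audio, length):
    """Zero-pad (at the end) or truncate a 1-D clip to exactly `length` samples."""
    audio = np.asarray(audio, dtype=np.float32)
    if len(audio) >= length:
        return audio[:length]
    return np.pad(audio, (0, length - len(audio)))

short_clip = np.ones(12000, dtype=np.float32)  # 0.75 s at 16 kHz
long_clip = np.ones(20000, dtype=np.float32)   # 1.25 s at 16 kHz

# Once every clip has the same length, the clips can be stacked into one batch.
batch = np.stack([pad_or_trim(c, SAMPLE_RATE) for c in (short_clip, long_clip)])
print(batch.shape)  # (2, 16000)
```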


In [16]:
# file_list = tf.data.Dataset.list_files(str(TRAIN_DIR / '**/*.wav'), shuffle=False)

# seconds=20
# train_ds, val_ds = tf.keras.utils.audio_dataset_from_directory(
#     directory=TRAIN_DIR,
#     batch_size=64,
#     validation_split=0.2,
#     seed=0,
#     output_sequence_length=16000*seconds,
#     subset='both'
#     )    
# label_names = np.array(train_ds.class_names)
# # set_label_names(np.array(train_ds.class_names))
# print()
# print("label names:", label_names)


The dataset now contains batches of audio clips and integer labels. The audio clips have a shape of `(batch, samples, channels)`.


In [17]:
# train_ds.element_spec

This dataset only contains single-channel audio, so use the `tf.squeeze` function to drop the extra axis:


In [18]:
# def squeeze(audio, labels):
#   audio = tf.squeeze(audio, axis=-1)
#   return audio, labels

# train_ds = train_ds.map(squeeze, tf.data.AUTOTUNE)
# val_ds = val_ds.map(squeeze, tf.data.AUTOTUNE)

The `utils.audio_dataset_from_directory` function only returns up to two splits. It's a good idea to keep a test set separate from your validation set.
Ideally you'd keep it in a separate directory, but in this case you can use `Dataset.shard` to split the validation set into two halves. Note that iterating over **any** shard loads **all** the data and keeps only its fraction.
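`Dataset.shard(num_shards, index)` keeps the elements whose position `i` satisfies `i % num_shards == index`. Because `val_ds` is already batched at this point, those elements are whole batches, so an odd number of batches gives two halves that differ by one batch. The plain-Python `shard` function below is a hypothetical stand-in that applies the same selection rule to a list; it illustrates the rule and is not the `tf.data` implementation.

```python
def shard(elements, num_shards, index):
    """Keep every `num_shards`-th element, starting at position `index`.

    Like Dataset.shard, this walks over *all* elements to build any one shard.
    """
    return [e for i, e in enumerate(elements) if i % num_shards == index]

# Pretend each element is one batch of the validation set.
val_batches = [f"batch_{i}" for i in range(7)]
test_part = shard(val_batches, num_shards=2, index=0)
val_part = shard(val_batches, num_shards=2, index=1)
print(test_part)  # ['batch_0', 'batch_2', 'batch_4', 'batch_6']
print(val_part)   # ['batch_1', 'batch_3', 'batch_5']
```

The two shards are disjoint and together cover every batch, which is what makes them usable as separate test and validation sets.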


In [19]:
# test_ds = val_ds.shard(num_shards=2, index=0)
# val_ds = val_ds.shard(num_shards=2, index=1)

In [20]:
# for example_audio, example_labels in train_ds.take(1):  
#   print(example_audio.shape)
#   print(example_labels.shape)
  
#   # Extract the file names from the dataset
# example_filenames = []
# for filepath in file_list.take(len(example_audio)):
#     example_filenames.append(pathlib.Path(filepath.numpy().decode('utf-8')).name)

In [21]:
# # Counter for labels that match the file name
# actual_labels_count = 0

# # Console output to check the label assignment
# for i in range(len(example_filenames)):
#     # Get the label from example_labels
#     label_index = example_labels[i].numpy()  
#     expected_label = label_names[label_index]

#     # Extract the label from the file name
#     label_in_filename = example_filenames[i].split('_')[0]  # Adjust here if the separator is different
    
#     # Count the matching labels
#     if label_in_filename == expected_label:
#         actual_labels_count += 1
#         if(label_in_filename =="upscale-16-48"):
#           print('-' * 50)
#           print(f"Index: {i}")
#           print(f"File name: {example_filenames[i]}")
#           print(f"Label in file name: {label_in_filename}, Expected label: {expected_label}")
#     else:
#       print('!' * 50)
#       print(f"Index: {i}")
#       print(f"File name: {example_filenames[i]}")
#       print(f"Label in file name: {label_in_filename}, Expected label: {expected_label}")

# # Print the totals at the end
# print('-' * 50)  
# print('-' * 50)
# print('-' * 50)
# print(f"Number of matching labels: {actual_labels_count}")
# print(f"Number of labels in the dataset: {len(example_labels)}")

In [23]:
import ipynb.fs.defs.build_database as build_database

result = build_database.run()

file_list, example_labels, example_audio, train_ds, val_ds, test_ds, label_names, example_audio, example_labels, example_filenames = result

NameError: name 'label_names' is not defined

Let's plot a few audio waveforms:


In [None]:
plt.figure(figsize=(16, 10))
rows = 4
cols = 2
n = rows * cols
for i in range(n):
  plt.subplot(rows, cols, i+1)
  audio_signal = example_audio[i]
  plt.plot(audio_signal)
  # Title each subplot with its index, class label and source file name.
  label = label_names[example_labels[i].numpy()]
  plt.title(f"{i}_Label: {label} __ {example_filenames[i]}")
  plt.yticks(np.arange(-1.2, 1.2, 0.2))
  plt.ylim([-1.1, 1.1])

plt.tight_layout()
plt.show()

## Convert waveforms to spectrograms

The waveforms in the dataset are represented in the time domain. Next, you'll transform the waveforms from time-domain signals into time-frequency-domain signals by computing the [short-time Fourier transform (STFT)](https://en.wikipedia.org/wiki/Short-time_Fourier_transform), which converts them to [spectrograms](https://en.wikipedia.org/wiki/Spectrogram). Spectrograms show how the frequency content changes over time and can be represented as 2D images. You will feed the spectrogram images into your neural network to train the model.

A Fourier transform (`tf.signal.fft`) converts a signal to its component frequencies, but loses all time information. In comparison, STFT (`tf.signal.stft`) splits the signal into windows of time and runs a Fourier transform on each window, preserving some time information, and returning a 2D tensor that you can run standard convolutions on.
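To see concretely what "loses all time information" means, the NumPy sketch below builds one second of audio that plays a 500 Hz tone followed by a 2000 Hz tone, plus its time-reversed copy. Their full-length FFT magnitudes are identical, while FFTs over just the first half-second (a crude two-window stand-in for an STFT) reveal which tone comes first. The signal, the frequencies and the `dominant_frequency` helper are illustrative assumptions.

```python
import numpy as np

SR = 16000                      # samples per second (assumed, as in the dataset)
t = np.arange(SR) / SR          # one second of timestamps
# 500 Hz for the first half-second, 2000 Hz for the second half.
signal = np.where(t < 0.5, np.sin(2 * np.pi * 500 * t), np.sin(2 * np.pi * 2000 * t))
reversed_signal = signal[::-1]  # same sounds, opposite order

# Whole-signal FFT magnitudes: reversing a real signal does not change them.
same_spectrum = np.allclose(np.abs(np.fft.rfft(signal)),
                            np.abs(np.fft.rfft(reversed_signal)), atol=1e-6)

def dominant_frequency(segment, sr=SR):
    """Frequency (Hz) of the largest FFT magnitude in `segment`."""
    spectrum = np.abs(np.fft.rfft(segment))
    return np.argmax(spectrum) * sr / len(segment)

half = SR // 2
print(same_spectrum)                               # True
print(dominant_frequency(signal[:half]))           # 500.0
print(dominant_frequency(reversed_signal[:half]))  # 2000.0
```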

Create a utility function for converting waveforms to spectrograms:

- The waveforms need to be of the same length, so that when you convert them to spectrograms, the results have similar dimensions. This can be done by simply zero-padding the audio clips that are shorter than one second (using `tf.zeros`).
- When calling `tf.signal.stft`, choose the `frame_length` and `frame_step` parameters such that the generated spectrogram "image" is almost square. For more information on the STFT parameters choice, refer to [this Coursera video](https://www.coursera.org/lecture/audio-signal-processing/stft-2-tjEQe) on audio signal processing and STFT.
- The STFT produces an array of complex numbers representing magnitude and phase. However, in this tutorial you'll only use the magnitude, which you can derive by applying `tf.abs` on the output of `tf.signal.stft`.
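The `get_spectrogram` function used later is not defined on this page; it presumably comes from the `waveforms_to_spectrograms` notebook. As a rough NumPy sketch of the three steps listed above, the hypothetical `get_spectrogram_np` zero-pads to 16,000 samples, takes the STFT magnitude and adds a channel axis. It assumes the upstream TensorFlow tutorial's parameters (`frame_length=255`, `frame_step=128`) together with the defaults of `tf.signal.stft` (a periodic Hann window, an FFT length of 256 and no end padding). Under those assumptions the shape should be `(124, 129, 1)`, but the values are not guaranteed to match TensorFlow bit for bit.

```python
import numpy as np

def stft_magnitude(waveform, frame_length=255, frame_step=128, fft_length=256):
    """Magnitude of a short-time Fourier transform (no end padding)."""
    num_frames = 1 + (len(waveform) - frame_length) // frame_step
    # Periodic Hann window.
    n = np.arange(frame_length)
    window = 0.5 - 0.5 * np.cos(2.0 * np.pi * n / frame_length)
    # Slice the signal into overlapping frames: shape (num_frames, frame_length).
    frames = np.stack([
        waveform[i * frame_step : i * frame_step + frame_length]
        for i in range(num_frames)
    ])
    # One real FFT per windowed frame; keep the magnitude and drop the phase.
    return np.abs(np.fft.rfft(frames * window, n=fft_length, axis=-1))

def get_spectrogram_np(waveform, num_samples=16000):
    """Hypothetical NumPy counterpart of the tutorial's get_spectrogram."""
    waveform = np.asarray(waveform, dtype=np.float32)[:num_samples]
    # Zero-pad shorter clips so every spectrogram has the same size.
    waveform = np.pad(waveform, (0, num_samples - len(waveform)))
    spectrogram = stft_magnitude(waveform)
    # Add a trailing channel axis so Conv2D layers can treat it as an image.
    return spectrogram[..., np.newaxis]

sr = 16000
t = np.arange(sr // 2) / sr          # half a second of audio
tone = np.sin(2 * np.pi * 1000 * t)  # 1 kHz test tone
spec = get_spectrogram_np(tone)
print(spec.shape)  # (124, 129, 1)
# Bin spacing is sr / fft_length = 62.5 Hz, so 1 kHz should peak in bin 16.
print(np.argmax(spec[:50, :, 0].mean(axis=0)))  # 16
```

The frame count follows from `1 + (16000 - 255) // 128 = 124`, and a 256-point real FFT yields `256 // 2 + 1 = 129` frequency bins.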


In [None]:
import ipynb.fs.defs.waveforms_to_spectrograms as wave_to_spec
print(f"label_names before notebook 2: {label_names}")

result = wave_to_spec.run(label_names, example_labels, example_audio, train_ds, val_ds, test_ds, example_filenames, file_list)

label_names, example_labels, example_audio, train_ds, val_ds, test_ds, example_filenames, waveform, file_list, train_spectrogram_ds,val_spectrogram_ds,test_spectrogram_ds, example_spectrograms,example_spect_labels = result

## Build and train the model


Add `Dataset.cache` and `Dataset.prefetch` operations to reduce read latency while training the model:


In [23]:
train_spectrogram_ds = train_spectrogram_ds.cache().shuffle(10000).prefetch(tf.data.AUTOTUNE)
val_spectrogram_ds = val_spectrogram_ds.cache().prefetch(tf.data.AUTOTUNE)
test_spectrogram_ds = test_spectrogram_ds.cache().prefetch(tf.data.AUTOTUNE)

For the model, you'll use a simple convolutional neural network (CNN), since you have transformed the audio files into spectrogram images.

Your `tf.keras.Sequential` model will use the following Keras preprocessing layers:

- `tf.keras.layers.Resizing`: to downsample the input to enable the model to train faster.
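- `tf.keras.layers.Normalization`: to standardize the input using the mean and standard deviation of the training data. With the layer's default `axis=-1` and single-channel spectrograms, this is one mean and one standard deviation shared by all pixels, not separate statistics per pixel.

Before training, call the `Normalization` layer's `adapt` method on the training data to compute these aggregate statistics (that is, the mean and the standard deviation).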
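Conceptually, `adapt` computes the statistics once over the training set, and the layer then standardizes every input with them. The NumPy sketch below illustrates that idea. `adapt` and `normalize` here are hypothetical helpers, the random gamma-distributed arrays merely stand in for spectrogram batches, and the Keras layer's own implementation (per-axis statistics, its own epsilon) differs in detail.

```python
import numpy as np

def adapt(batches):
    """Compute one mean and one variance over every value in the training batches."""
    values = np.concatenate([np.ravel(b) for b in batches])
    return values.mean(), values.var()

def normalize(x, mean, var, epsilon=1e-7):
    # Standardize: subtract the mean, divide by the standard deviation.
    return (x - mean) / np.maximum(np.sqrt(var), epsilon)

rng = np.random.default_rng(0)
# Stand-ins for batches of non-negative spectrogram magnitudes.
train_batches = [rng.gamma(2.0, 3.0, size=(8, 124, 129, 1)) for _ in range(4)]
mean, var = adapt(train_batches)

normalized = normalize(np.concatenate(train_batches), mean, var)
print(normalized.mean(), normalized.std())  # close to 0 and 1
```

The same `mean` and `var` are reused for validation and test inputs, so those sets are only approximately zero-mean and unit-variance.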


In [None]:
input_shape = example_spectrograms.shape[1:]
print('Input shape:', input_shape)
num_labels = len(label_names)

# Instantiate the `tf.keras.layers.Normalization` layer.
norm_layer = layers.Normalization()
# Fit the state of the layer to the spectrograms
# with `Normalization.adapt`.
norm_layer.adapt(data=train_spectrogram_ds.map(map_func=lambda spec, label: spec))

model = models.Sequential([
    layers.Input(shape=input_shape),
    # Downsample the input.
    layers.Resizing(32, 32),
    # Normalize.
    norm_layer,
    layers.Conv2D(32, 3, activation='relu'),
    layers.Conv2D(64, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(num_labels),
])

model.summary()
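You can check the shapes and parameter counts that `model.summary()` reports by hand. The sketch below assumes single-channel spectrograms, the `Conv2D` default of `padding='valid'` with stride 1, and the `MaxPooling2D` default pool size of 2. `num_labels=8` is only an example value, and `summarize` is a hypothetical helper. `Dropout` layers are omitted because they change neither the shape nor the parameter count; the real summary also lists a few non-trainable parameters holding the `Normalization` layer's statistics.

```python
def conv_valid(size, kernel=3):
    """Output length of a stride-1 convolution with padding='valid'."""
    return size - kernel + 1

def summarize(num_labels, height=32, width=32, channels=1):
    """Hand-computed (layer, output shape, trainable parameters) for the CNN above."""
    rows = []
    h, w = conv_valid(height), conv_valid(width)
    rows.append(("Conv2D(32, 3)", (h, w, 32), 3 * 3 * channels * 32 + 32))
    h, w = conv_valid(h), conv_valid(w)
    rows.append(("Conv2D(64, 3)", (h, w, 64), 3 * 3 * 32 * 64 + 64))
    h, w = h // 2, w // 2  # MaxPooling2D with the default pool_size=(2, 2)
    rows.append(("MaxPooling2D", (h, w, 64), 0))
    flat = h * w * 64
    rows.append(("Flatten", (flat,), 0))
    rows.append(("Dense(128)", (128,), flat * 128 + 128))
    rows.append(("Dense(num_labels)", (num_labels,), 128 * num_labels + num_labels))
    return rows

rows = summarize(num_labels=8)
for name, shape, params in rows:
    print(f"{name:18s} {str(shape):15s} {params:>9,}")
print("Trainable parameters:", sum(params for _, _, params in rows))  # 1625608
```

Almost all of the parameters sit in the first `Dense` layer, which is why the `Resizing(32, 32)` step matters: without it, the flattened feature vector would be much larger.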

Configure the Keras model with the Adam optimizer and the cross-entropy loss:


In [25]:
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'],
)
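The final `Dense(num_labels)` layer has no activation, so the model outputs raw scores (logits), which is why the loss is created with `from_logits=True`. For a single example, sparse categorical cross-entropy with logits is the negative log of the softmax probability assigned to the true integer label, and Keras then averages it over the batch. A plain-Python sketch, with a hypothetical helper name:

```python
import math

def sparse_cce_from_logits(logits, label):
    """Cross-entropy for one example given raw scores (logits) and an integer label.

    Equivalent to -log(softmax(logits)[label]), computed with the
    log-sum-exp trick for numerical stability.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

print(sparse_cce_from_logits([0.0, 0.0, 0.0, 0.0], 2))  # log(4) ≈ 1.386: uniform guess
print(sparse_cce_from_logits([5.0, 0.0, 0.0, 0.0], 0))  # ≈ 0.020: confident and correct
```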

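Train the model for up to 200 epochs. The `EarlyStopping` callback monitors the validation loss (`val_loss`, its default) and stops training once it has not improved for two consecutive epochs (`patience=2`):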


In [None]:
EPOCHS = 200
history = model.fit(
    train_spectrogram_ds,
    validation_data=val_spectrogram_ds,
    epochs=EPOCHS,
    callbacks=tf.keras.callbacks.EarlyStopping(verbose=1, patience=2),
)
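The stopping rule can be sketched in plain Python as follows, assuming the `EarlyStopping` defaults of monitoring `val_loss` in `'min'` mode with `min_delta=0`. The helper is hypothetical and ignores options such as `baseline`. Note that `restore_best_weights` defaults to `False`, so after stopping the model keeps the weights from the last epoch it trained, not from the best one.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return the 0-based epoch after which training stops, or None if it never stops."""
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

print(early_stopping_epoch([1.00, 0.80, 0.90, 0.85, 0.70]))  # 3
print(early_stopping_epoch([1.00, 0.80, 0.60]))              # None
```

In the first example, the improvement at the fifth epoch (0.70) is never seen, which is the usual trade-off of a small `patience`.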

Let's plot the training and validation loss curves to check how your model has improved during training:


In [None]:
metrics = history.history
print(metrics)
plt.figure(figsize=(16, 6))

plt.subplot(1, 2, 1)
plt.plot(history.epoch, metrics['loss'], label='loss')
# Only plot the validation curve if it was recorded.
if 'val_loss' in metrics:
  plt.plot(history.epoch, metrics['val_loss'], label='val_loss')
plt.legend()
plt.ylim([0, max(plt.ylim())])
plt.xlabel('Epoch')
plt.ylabel('Loss [CrossEntropy]')

plt.subplot(1, 2, 2)
plt.plot(history.epoch, 100*np.array(metrics['accuracy']), label='accuracy')
if 'val_accuracy' in metrics:
  plt.plot(history.epoch, 100*np.array(metrics['val_accuracy']), label='val_accuracy')
plt.legend()
plt.ylim([0, 100])
plt.xlabel('Epoch')
plt.ylabel('Accuracy [%]')

## Evaluate the model performance

Run the model on the test set and check the model's performance:


In [None]:
model.evaluate(test_spectrogram_ds, return_dict=True)

### Display a confusion matrix

Use a [confusion matrix](https://developers.google.com/machine-learning/glossary#confusion-matrix) to check how well the model did classifying each of the commands in the test set:


In [None]:
y_pred = model.predict(test_spectrogram_ds)

In [30]:
y_pred = tf.argmax(y_pred, axis=1)

In [31]:
y_true = tf.concat(list(test_spectrogram_ds.map(lambda s,lab: lab)), axis=0)

In [None]:
confusion_mtx = tf.math.confusion_matrix(y_true, y_pred)
plt.figure(figsize=(10, 8))
sns.heatmap(confusion_mtx,
            xticklabels=label_names,
            yticklabels=label_names,
            annot=True, fmt='g')
plt.xlabel('Prediction')
plt.ylabel('Label')
plt.show()
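`tf.math.confusion_matrix` places the true labels on the rows and the predictions on the columns, which is why the heatmap above labels its y-axis `Label` and its x-axis `Prediction`. A plain-Python sketch of the same counting, with made-up labels and a hypothetical `confusion_matrix` helper:

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true labels, columns are predictions."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
for row in cm:
    print(row)
# [1, 1, 0]  <- one class-0 example was mistaken for class 1
# [0, 2, 0]
# [1, 0, 1]  <- one class-2 example was mistaken for class 0

# Diagonal entries are correct predictions, so accuracy = trace / total.
accuracy = sum(cm[i][i] for i in range(3)) / len(y_true)
print(accuracy)
```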

## Run inference on an audio file

Finally, verify the model's predictions on the audio files in the test directory (`TEST_DIR`). How well does your model perform?


In [33]:
# # x = data_dir/'no/01bb6a2a_nohash_0.wav'
# # x= "../Tutorial/data/test_files/AMBIENCE_JAPAN_THUNDER_HEAVY_RAIN_SLIGHT_WIND_NOISE_STEREO.wav"
# x="../Tutorial/data/test_files/1-900_Hot_date.wav"
# with open(x, "rb") as f:
#     x = f.read()

#     print(len(x))
# # x = tf.io.read_file(str(x))
# x, sample_rate = tf.audio.decode_wav(x, desired_channels=1, desired_samples=16000,)
# x = tf.squeeze(x, axis=-1)
# waveform = x
# x = get_spectrogram(x)
# x = x[tf.newaxis,...]

# prediction = model(x)
# # x_labels = ['no', 'yes', 'down', 'go', 'left', 'up', 'right', 'stop']
# x_labels=label_names
# plt.bar(x_labels, tf.nn.softmax(prediction[0]))
# plt.title('orig')
# plt.show()

# display.display(display.Audio(waveform, rate=16000))

### Read Test File


In [34]:
# # Path to the WAV file
# file_path = "../Tutorial/data/test_files/1-900_Hot_date.wav"

# # Read the WAV file in binary mode
# with open(file_path, "rb") as f:
#     # wav_data = f.read()
#     wav_data = tf.io.read_file(file_path)

# # Check the length of the data read
# # print(f"Length of the WAV data: {len(wav_data)}")



### Convert to Standard Audio Format if WAVE File Does Not Meet Expectations


In [35]:
# def convert_wav(input_wav, output_wav, sample_rate=16000, channels=1):
#     """
#     Convert WAV file to a standardized format (Mono, 16kHz)
#     """
#     command = [
#         'ffmpeg', '-i', input_wav, '-ar', str(sample_rate), '-ac', str(channels), output_wav
#     ]
#     subprocess.run(command, check=True)

# def load_wav_file(file_path):
#     try:
#         # Try to decode the WAV file
#         with open(file_path, "rb") as f:
#             wav_data = f.read()
        
#         audio, sample_rate = tf.audio.decode_wav(wav_data, desired_channels=1, desired_samples=16000)
#         print(f"File '{file_path}' successfully decoded!")
#         return audio, sample_rate

#     except Exception as e:
#         print(f"Error decoding WAV file '{file_path}': {e}")
#         print("Attempting to convert the file...")

#         # Path for the converted file
#         converted_file = "converted_temp.wav"
        
#         # Convert the WAV file
#         convert_wav(file_path, converted_file)
        
#         # Load the converted file and try decoding again
#         with open(converted_file, "rb") as f:
#             wav_data = f.read()
        
#         audio, sample_rate = tf.audio.decode_wav(wav_data, desired_channels=1, desired_samples=16000)
#         print(f"Converted file '{file_path}' successfully decoded!")
        
#         # Remove temporary converted file
#         os.remove(converted_file)
        
#         return audio, sample_rate

# def process_directory(directory_path):
#     # Find all WAV files in the directory
#     wav_files = glob.glob(os.path.join(directory_path, "*.wav"))

#     if not wav_files:
#         print("No WAV files found in the directory.")
#         return

#     for file_path in wav_files:
#         print(f"Processing file: {file_path}")
#         audio, sample_rate = load_wav_file(file_path)
#         # Example: Output the shape and sample rate of the audio
#         print(f"Audio shape: {audio.shape}, Sample rate: {sample_rate}")

# # Example: Specify the directory with WAV files
# #directory_path = "../Tutorial/data/test_files"
# #process_directory(directory_path)

In [None]:
def visualize_audio(file_path):
    """Decode a WAV file, plot the model's class probabilities and play the audio."""
    try:
        # Decode as mono and zero-pad or trim to 16,000 samples (no resampling).
        audio_binary = tf.io.read_file(file_path)
        audio, sample_rate = tf.audio.decode_wav(
            audio_binary, desired_channels=1, desired_samples=16000)
    except Exception as e:
        print(f"Could not process file '{file_path}': {e}")
        return
    # Remove the last axis, since there is only one channel.
    waveform = tf.squeeze(audio, axis=-1)

    print(f"Audio signal shape: {waveform.shape}")
    print(f"Sample rate: {sample_rate}")

    # Assumes get_spectrogram() is available here (it is not defined in this notebook).
    spectrogram = get_spectrogram(waveform)

    # Add a batch dimension for the model.
    input_tensor = spectrogram[tf.newaxis, ...]

    # Model prediction (logits).
    prediction = model(input_tensor)

    # Labels for the prediction (assumes `label_names` is defined).
    x_labels = label_names

    # Show a bar chart of the predicted class probabilities.
    plt.bar(x_labels, tf.nn.softmax(prediction[0]))
    plt.title(f'Prediction for {os.path.basename(file_path)}')
    plt.show()

    # Play the audio in the notebook at the file's own sample rate.
    display.display(display.Audio(waveform.numpy(), rate=int(sample_rate)))

def process_directory_for_visualization(directory_path):
    wav_files = glob.glob(os.path.join(directory_path, "*.wav"))

    if not wav_files:
        print("No WAV files found in the directory.")
        return

    for file_path in wav_files:
        print(f"Processing file: {file_path}")
        visualize_audio(file_path)

# extract_zip(TEST_DIR, DATA_DIR)
# rename_audio_files(DATA_DIR)

process_directory_for_visualization(TEST_DIR)

## Export the model with preprocessing


The model is not easy to use if you have to apply those preprocessing steps before passing data to it for inference, so build an end-to-end version:


In [37]:
class ExportModel(tf.Module):
  def __init__(self, model):
    self.model = model

    # Accept either a string-filename or a batch of waveforms.
    # You could add additional signatures for a single wave, or a ragged batch.
    self.__call__.get_concrete_function(
        x=tf.TensorSpec(shape=(), dtype=tf.string))
    self.__call__.get_concrete_function(
       x=tf.TensorSpec(shape=[None, 16000], dtype=tf.float32))


  @tf.function
  def __call__(self, x):
    # If they pass a string, load the file and decode it. 
    if x.dtype == tf.string:
      x = tf.io.read_file(x)
      x, _ = tf.audio.decode_wav(x, desired_channels=1, desired_samples=16000,)
      x = tf.squeeze(x, axis=-1)
      x = x[tf.newaxis, :]
    
    x = get_spectrogram(x)  
    result = self.model(x, training=False)
    
    class_ids = tf.argmax(result, axis=-1)
    class_names = tf.gather(label_names, class_ids)
    return {'predictions':result,
            'class_ids': class_ids,
            'class_names': class_names}

Test run the "export" model:


In [None]:
export = ExportModel(model)
# export(tf.constant(str(data_dir/'no/01bb6a2a_nohash_0.wav')))
export(tf.constant(str("../Tutorial/data/test_files/AMBIENCE_JAPAN_THUNDER_HEAVY_RAIN_SLIGHT_WIND_NOISE_STEREO.wav")))

Save and reload the model. The reloaded model gives identical output:


In [None]:
tf.saved_model.save(export, "saved")
imported = tf.saved_model.load("saved")
imported(waveform[tf.newaxis, :])

## Next steps

This tutorial demonstrated how to carry out simple audio classification/automatic speech recognition using a convolutional neural network with TensorFlow and Python. To learn more, consider the following resources:

- The [Sound classification with YAMNet](https://www.tensorflow.org/hub/tutorials/yamnet) tutorial shows how to use transfer learning for audio classification.
- The notebooks from [Kaggle's TensorFlow speech recognition challenge](https://www.kaggle.com/c/tensorflow-speech-recognition-challenge/overview).
- The [TensorFlow.js - Audio recognition using transfer learning codelab](https://codelabs.developers.google.com/codelabs/tensorflowjs-audio-codelab/index.html#0) teaches how to build your own interactive web app for audio classification.
- [A tutorial on deep learning for music information retrieval](https://arxiv.org/abs/1709.04396) (Choi et al., 2017) on arXiv.
- TensorFlow also has additional support for [audio data preparation and augmentation](https://www.tensorflow.org/io/tutorials/audio) to help with your own audio-based projects.
- Consider using the [librosa](https://librosa.org/) library for music and audio analysis.
