## Setup

First, we need to import required libraries and functions.

In [2]:
# render plots directly below the code cell that created them
%matplotlib inline
import json  # for working with json files
import sys  # Python system library needed to load custom functions
import numpy as np  # for performing calculations on numerical arrays
import pandas as pd  # home of the DataFrame construct, _the_ most important object for Data Science
import torch  # library to work with PyTorch tensors and to figure out if we have a GPU available
import os     # for changing the directory

from datasets import load_dataset, Audio  # required tools to create, load and process our audio dataset
from transformers import ASTFeatureExtractor, ASTForAudioClassification, TrainingArguments, Trainer  # required classes to perform the model training

sys.path.append('../..')  # add the source directory to the module search path so that local functions and modules can be imported
from gdsc_utils import download_directory, PROJECT_DIR # function to download the needed files from the official GDSC s3 bucket and our root directory
from config import DEFAULT_BUCKET  # S3 bucket with the GDSC data
from preprocessing import calculate_stats, preprocess_audio_arrays  # functions to calculate dataset statistics and preprocess the dataset with ASTFeatureExtractor
from gdsc_eval import make_predictions, compute_metrics  # functions to create predictions and evaluate them
os.chdir(PROJECT_DIR) # changing our directory to root

In [30]:
import torch.nn as nn

In [3]:
os.chdir('..')

In [4]:
os.getcwd()

'/root/data'

Now that the required libraries are imported, it's time to create a 🤗 dataset object that will let us handle our audio files during preprocessing and training. This will also be the first demonstration of how easy the 🤗 library is to use.

The 🤗 datasets module has a neat way to load audio data. All we need are the paths to the folders containing the audio and metadata files.

In [5]:
# paths for the train and validation datasets
train_path = 'data/train'
val_path = 'data/val'

In [6]:
print(os.getcwd())

/root/data


Let's look at the structure of the metadata files stored in those paths.

In [7]:
train_meta_df = pd.read_csv(f"{train_path}/metadata.csv")
val_meta_df = pd.read_csv(f"{val_path}/metadata.csv")

In [8]:
train_meta_df.head()

Unnamed: 0,file_name,label
0,Roeselianaroeselii_XC751814-dat028-019_edit1.wav,56
1,Roeselianaroeselii_XC752367-dat006-010.wav,56
2,Yoyettacelis_GBIF2465208563_IN36000894_50988.wav,64
3,Gomphocerippusrufus_XC752285-dat001-045.wav,26
4,Phaneropteranana_XC755717-221013-Phaneroptera-...,41


In [9]:
val_meta_df.head()

Unnamed: 0,file_name,label
0,Atrapsaltacorticina_GBIF2901504947_IN62966536_...,3
1,Chorthippusbrunneus_XC751398-dat022-008_edit5.wav,10
2,Psaltodaplaga_GBIF3031797565_IN68469430_159997...,53
3,Omocestusviridulus_XC752267-dat013-003_edit2.wav,39
4,Omocestusviridulus_XC752263-dat012-007_edit1.wav,39
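
The "audiofolder" loader pairs each audio file with the row of metadata.csv whose *file_name* matches it, so it can be worth checking that the two agree before loading. Below is a minimal sketch using only the standard library. The helper name `check_metadata` and the toy folder are hypothetical and not part of the GDSC code.

```python
import csv
import os
import tempfile


def check_metadata(folder):
    """Return (missing, unlisted): files named in metadata.csv that are absent
    from the folder, and .wav files in the folder that metadata.csv omits."""
    with open(os.path.join(folder, "metadata.csv"), newline="") as f:
        listed = {row["file_name"] for row in csv.DictReader(f)}
    on_disk = {name for name in os.listdir(folder) if name.endswith(".wav")}
    return sorted(listed - on_disk), sorted(on_disk - listed)


# Toy folder standing in for data/train (hypothetical file names).
with tempfile.TemporaryDirectory() as folder:
    for name in ("a.wav", "b.wav"):
        open(os.path.join(folder, name), "wb").close()
    with open(os.path.join(folder, "metadata.csv"), "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerows([["file_name", "label"], ["a.wav", 0], ["c.wav", 1]])
    missing, unlisted = check_metadata(folder)

print(missing, unlisted)
```

On the real data you would call `check_metadata(train_path)` and expect two empty lists.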


In [10]:
# our first interaction with Hugging Face datasets!
train_dataset = load_dataset("audiofolder", data_dir=train_path).get('train').shuffle(seed = 42)  # load the dataset and shuffle the examples
val_dataset = load_dataset("audiofolder", data_dir=val_path).get('train')                         # load the validation dataset. But why do we have "get('train')" at the end of the line? :)

Downloading and preparing dataset audiofolder/default to /root/.cache/huggingface/datasets/audiofolder/default-9540d76c2719c5c2/0.0.0/6cbdd16f8688354c63b4e2a36e1585d05de285023ee6443ffd71c4182055c0fc...

Dataset audiofolder downloaded and prepared to /root/.cache/huggingface/datasets/audiofolder/default-9540d76c2719c5c2/0.0.0/6cbdd16f8688354c63b4e2a36e1585d05de285023ee6443ffd71c4182055c0fc. Subsequent calls will reuse this data.

Downloading and preparing dataset audiofolder/default to /root/.cache/huggingface/datasets/audiofolder/default-a29ad9e5c4708aae/0.0.0/6cbdd16f8688354c63b4e2a36e1585d05de285023ee6443ffd71c4182055c0fc...

Dataset audiofolder downloaded and prepared to /root/.cache/huggingface/datasets/audiofolder/default-a29ad9e5c4708aae/0.0.0/6cbdd16f8688354c63b4e2a36e1585d05de285023ee6443ffd71c4182055c0fc. Subsequent calls will reuse this data.

It seems that the dataset was loaded. Let's inspect the train_dataset and val_dataset variables.

In [11]:
train_dataset, val_dataset

(Dataset({
     features: ['audio', 'label'],
     num_rows: 1752
 }),
 Dataset({
     features: ['audio', 'label'],
     num_rows: 579
 }))

So clearly we've created some kind of dataset object. We can see that it has two features: 'audio' and 'label'. Let's unpack this rather opaque object a bit further and see what exactly the data looks like.

In [12]:
train_dataset[0], val_dataset[0]

({'audio': {'path': '/root/data/data/train/Galangalabeculata_GBIF1978446031_IN19058490_28357.wav',
   'array': array([ 0.        ,  0.        ,  0.        , ..., -0.03515625,
          -0.02563477, -0.02597046]),
   'sampling_rate': 44100},
  'label': 24},
 {'audio': {'path': '/root/data/data/val/Achetadomesticus_XC751734-dat001-055_edit1.wav',
   'array': array([0.00054932, 0.00112915, 0.00067139, ..., 0.00094604, 0.00201416,
          0.00128174]),
   'sampling_rate': 44100},
  'label': 0})

In [13]:
MODEL_SAMPLING_RATE = 22050
train_dataset = train_dataset.cast_column("audio", Audio(sampling_rate=MODEL_SAMPLING_RATE))
val_dataset = val_dataset.cast_column("audio", Audio(sampling_rate=MODEL_SAMPLING_RATE))

In [14]:
train_dataset.info.features, val_dataset.info.features

({'audio': Audio(sampling_rate=22050, mono=True, decode=True, id=None),
  'label': Value(dtype='int64', id=None)},
 {'audio': Audio(sampling_rate=22050, mono=True, decode=True, id=None),
  'label': Value(dtype='int64', id=None)})
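
Casting the "audio" column to a new sampling rate makes the dataset resample each recording when it is accessed: the duration stays the same, while the number of samples changes by the ratio of the two rates. The sketch below illustrates only that arithmetic with a naive linear interpolation in plain Python. It is not what the 🤗 library does internally (a proper resampler also filters the signal), and the function name `resample_linear` is hypothetical.

```python
def resample_linear(samples, orig_rate, target_rate):
    """Naive linear-interpolation resampler (illustration only, no anti-aliasing)."""
    n_out = int(len(samples) * target_rate / orig_rate)
    step = orig_rate / target_rate  # distance between output samples, in input samples
    out = []
    for i in range(n_out):
        pos = i * step
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out


one_second = [i / 44100 for i in range(44100)]  # a 1-second ramp at 44.1 kHz
resampled = resample_linear(one_second, 44100, 22050)

# Both values are durations in seconds: original and resampled.
print(len(one_second) / 44100, len(resampled) / 22050)
```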

In [15]:
feature_extractor_stats = ASTFeatureExtractor.from_pretrained("MIT/ast-finetuned-audioset-10-10-0.4593", do_normalize=False)

In [16]:
#train_dataset = train_dataset.map(lambda x: calculate_stats(x, audio_field='audio', array_field='array', feature_extractor=feature_extractor_stats), batched=True, batch_size=32)

In [18]:
dataset_mean = -8.141991150530815
dataset_std = 4.095692486358449

In [19]:
feature_extractor = ASTFeatureExtractor.from_pretrained("MIT/ast-finetuned-audioset-10-10-0.4593", mean=dataset_mean, std=dataset_std)
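
The two constants passed as *mean* and *std* are presumably the statistics of the un-normalized spectrogram values over the training set, which is why the first extractor was created with *do_normalize=False*. Here is a sketch of how such statistics can be accumulated batch by batch without keeping all values in memory. `RunningStats` is a hypothetical class and not the *calculate_stats* helper from the *preprocessing* module. The last step assumes the AST convention of centering and dividing by twice the standard deviation.

```python
import math


class RunningStats:
    """Accumulate mean and std over many batches without storing all values."""

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.total_sq = 0.0

    def update(self, values):
        for v in values:
            self.count += 1
            self.total += v
            self.total_sq += v * v

    @property
    def mean(self):
        return self.total / self.count

    @property
    def std(self):
        # Population std via E[x^2] - E[x]^2, clamped against rounding error.
        return math.sqrt(max(self.total_sq / self.count - self.mean ** 2, 0.0))


stats = RunningStats()
for batch in ([-10.0, -6.0], [-12.0, -4.0]):  # toy "spectrogram" values
    stats.update(batch)

# Assumed AST-style normalization: centre, then divide by 2 * std.
normalized = [(v - stats.mean) / (2 * stats.std) for v in (-10.0, -6.0)]
print(stats.mean, stats.std, normalized)
```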

In [20]:
train_dataset_encoded = train_dataset.map(lambda x: preprocess_audio_arrays(x, audio_field='audio', 
                                                                            array_field='array', 
                                                                            feature_extractor=feature_extractor), remove_columns="audio", batched=True, batch_size=2)
val_dataset_encoded = val_dataset.map(lambda x: preprocess_audio_arrays(x, audio_field='audio', 
                                                                        array_field='array', 
                                                                        feature_extractor=feature_extractor), remove_columns="audio", batched=True, batch_size=2)

In [21]:
train_dataset_encoded, val_dataset_encoded

(Dataset({
     features: ['label', 'input_values'],
     num_rows: 1752
 }),
 Dataset({
     features: ['label', 'input_values'],
     num_rows: 579
 }))

# Fine-tuning the AST model

In [22]:
with open('data/labels.json', 'r') as f:
    labels = json.load(f)

In [23]:
labels

{'Achetadomesticus': 0,
 'Aleetacurvicosta': 1,
 'Atrapsaltacollina': 2,
 'Atrapsaltacorticina': 3,
 'Atrapsaltaencaustica': 4,
 'Barbitistesyersini': 5,
 'Bicoloranabicolor': 6,
 'Chorthippusalbomarginatus': 7,
 'Chorthippusapricarius': 8,
 'Chorthippusbiguttulus': 9,
 'Chorthippusbrunneus': 10,
 'Chorthippusmollis': 11,
 'Chorthippusvagans': 12,
 'Chrysochraondispar': 13,
 'Cicadaorni': 14,
 'Clinopsaltaautumna': 15,
 'Conocephalusdorsalis': 16,
 'Conocephalusfuscus': 17,
 'Cyclochilaaustralasiae': 18,
 'Decticusverrucivorus': 19,
 'Diceroproctaeugraphica': 20,
 'Ephippigerdiurnus': 21,
 'Eumodicogryllusbordigalensis': 22,
 'Eupholidopteraschmidti': 23,
 'Galangalabeculata': 24,
 'Gampsocleisglabra': 25,
 'Gomphocerippusrufus': 26,
 'Gomphocerussibiricus': 27,
 'Gryllusbimaculatus': 28,
 'Grylluscampestris': 29,
 'Leptophyespunctatissima': 30,
 'Melanogryllusdesertus': 31,
 'Metriopterabrachyptera': 32,
 'Myrmeleotettixmaculatus': 33,
 'Nemobiussylvestris': 34,
 'Neotibicenpruinosus'

In [24]:
label2id, id2label = dict(), dict()
for k, v in labels.items():
    label2id[k] = str(v)
    id2label[str(v)] = k

In [25]:
label2id

{'Achetadomesticus': '0',
 'Aleetacurvicosta': '1',
 'Atrapsaltacollina': '2',
 'Atrapsaltacorticina': '3',
 'Atrapsaltaencaustica': '4',
 'Barbitistesyersini': '5',
 'Bicoloranabicolor': '6',
 'Chorthippusalbomarginatus': '7',
 'Chorthippusapricarius': '8',
 'Chorthippusbiguttulus': '9',
 'Chorthippusbrunneus': '10',
 'Chorthippusmollis': '11',
 'Chorthippusvagans': '12',
 'Chrysochraondispar': '13',
 'Cicadaorni': '14',
 'Clinopsaltaautumna': '15',
 'Conocephalusdorsalis': '16',
 'Conocephalusfuscus': '17',
 'Cyclochilaaustralasiae': '18',
 'Decticusverrucivorus': '19',
 'Diceroproctaeugraphica': '20',
 'Ephippigerdiurnus': '21',
 'Eumodicogryllusbordigalensis': '22',
 'Eupholidopteraschmidti': '23',
 'Galangalabeculata': '24',
 'Gampsocleisglabra': '25',
 'Gomphocerippusrufus': '26',
 'Gomphocerussibiricus': '27',
 'Gryllusbimaculatus': '28',
 'Grylluscampestris': '29',
 'Leptophyespunctatissima': '30',
 'Melanogryllusdesertus': '31',
 'Metriopterabrachyptera': '32',
 'Myrmeleotetti

In [26]:
id2label

{'0': 'Achetadomesticus',
 '1': 'Aleetacurvicosta',
 '2': 'Atrapsaltacollina',
 '3': 'Atrapsaltacorticina',
 '4': 'Atrapsaltaencaustica',
 '5': 'Barbitistesyersini',
 '6': 'Bicoloranabicolor',
 '7': 'Chorthippusalbomarginatus',
 '8': 'Chorthippusapricarius',
 '9': 'Chorthippusbiguttulus',
 '10': 'Chorthippusbrunneus',
 '11': 'Chorthippusmollis',
 '12': 'Chorthippusvagans',
 '13': 'Chrysochraondispar',
 '14': 'Cicadaorni',
 '15': 'Clinopsaltaautumna',
 '16': 'Conocephalusdorsalis',
 '17': 'Conocephalusfuscus',
 '18': 'Cyclochilaaustralasiae',
 '19': 'Decticusverrucivorus',
 '20': 'Diceroproctaeugraphica',
 '21': 'Ephippigerdiurnus',
 '22': 'Eumodicogryllusbordigalensis',
 '23': 'Eupholidopteraschmidti',
 '24': 'Galangalabeculata',
 '25': 'Gampsocleisglabra',
 '26': 'Gomphocerippusrufus',
 '27': 'Gomphocerussibiricus',
 '28': 'Gryllusbimaculatus',
 '29': 'Grylluscampestris',
 '30': 'Leptophyespunctatissima',
 '31': 'Melanogryllusdesertus',
 '32': 'Metriopterabrachyptera',
 '33': 'Myrmele

In [27]:
num_labels = len(label2id)
num_labels

66
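
The model's output index is interpreted through these two dictionaries, so they should be exact inverses of each other and the ids should run from 0 without gaps. A small sketch of the same mappings written as comprehensions, with sanity checks, on three labels taken from the listing above:

```python
labels = {"Achetadomesticus": 0, "Aleetacurvicosta": 1, "Atrapsaltacollina": 2}

# Same mappings as the loop above, written as comprehensions.
label2id = {name: str(idx) for name, idx in labels.items()}
id2label = {str(idx): name for name, idx in labels.items()}

# Sanity checks: ids are unique, contiguous from 0, and the maps invert each other.
assert sorted(labels.values()) == list(range(len(labels)))
assert all(id2label[label2id[name]] == name for name in labels)

num_labels = len(label2id)
print(num_labels, id2label["1"])
```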

In [28]:
model = ASTForAudioClassification.from_pretrained("MIT/ast-finetuned-audioset-10-10-0.4593", 
                                                  num_labels=num_labels, 
                                                  label2id=label2id, 
                                                  id2label=id2label,
                                                  ignore_mismatched_sizes=True
                                                 )

Some weights of ASTForAudioClassification were not initialized from the model checkpoint at MIT/ast-finetuned-audioset-10-10-0.4593 and are newly initialized because the shapes did not match:
- classifier.dense.weight: found shape torch.Size([527, 768]) in the checkpoint and torch.Size([66, 768]) in the model instantiated
- classifier.dense.bias: found shape torch.Size([527]) in the checkpoint and torch.Size([66]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
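
The message above is expected: the checkpoint was trained on 527 classes, we asked for 66, so the dense layer of the classifier is created fresh while the rest of the network keeps its pretrained weights. As a rough illustration, the shapes in the warning tell us how many parameters start from scratch (the helper `head_params` is hypothetical):

```python
hidden_size = 768        # from the shapes in the warning
old_classes = 527        # classes in the pretrained checkpoint
new_classes = 66         # our species


def head_params(n_classes, hidden=hidden_size):
    """Weights plus biases of a single dense layer hidden -> n_classes."""
    return n_classes * hidden + n_classes


print(head_params(old_classes), head_params(new_classes))
```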


In [31]:
for i in range(12):
    model.audio_spectrogram_transformer.encoder.layer[i].output.dropout = nn.Dropout(p=0.2, inplace=False)
    model.audio_spectrogram_transformer.encoder.layer[i].attention.output.dropout = nn.Dropout(p=0.2, inplace=False)
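
The loop above swaps the dropout modules in each of the 12 encoder layers for ones with p=0.2. To recall what dropout does: during training each value is zeroed with probability p and the survivors are scaled by 1/(1-p), so the expected activation is unchanged; during evaluation it does nothing. The sketch below shows this in plain Python. It is an illustration of the idea, not the PyTorch implementation.

```python
import random


def dropout(values, p, training=True, rng=random):
    """Inverted dropout: zero each value with probability p, scale the rest by 1/(1-p)."""
    if not training or p == 0.0:
        return list(values)  # evaluation mode: identity
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * scale for v in values]


rng = random.Random(0)
activations = [1.0] * 10_000
dropped = dropout(activations, p=0.2, rng=rng)

kept_fraction = sum(1 for v in dropped if v != 0.0) / len(dropped)
mean_after = sum(dropped) / len(dropped)
print(round(kept_fraction, 2), round(mean_after, 2))
```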

In [33]:
NUM_TRAIN_EPOCHS = 8                        # variable defining number of training epochs

training_args = TrainingArguments(
    output_dir='experiments/models/dropout',                # directory for saving model checkpoints and logs
    num_train_epochs=NUM_TRAIN_EPOCHS,      # number of epochs
    per_device_train_batch_size=4,          # number of examples in batch for training
    gradient_accumulation_steps=8,
    per_device_eval_batch_size=2,           # number of examples in batch for evaluation
    evaluation_strategy="epoch",            # makes evaluation at the end of each epoch
    learning_rate=float(2e-5),              # learning rate
    optim="adamw_torch",                    # optimizer
    logging_steps=1,                        # log after every optimizer step (with gradient accumulation, one step spans several batches)
    load_best_model_at_end=True,            # whether to load or not the best model at the end of the training
    metric_for_best_model="eval_loss",      # claiming that the best model is the one with the lowest loss on the validation set
    save_strategy='epoch'                   # saving is done at the end of each epoch
)
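
With *gradient_accumulation_steps=8*, gradients from 8 batches of 4 examples are accumulated before the weights are updated, so the effective batch size is larger than the per-device value suggests. A quick back-of-the-envelope calculation, assuming a single GPU; the exact handling of the last partial accumulation window depends on the transformers version, so treat the update count as approximate:

```python
import math

num_train_examples = 1752          # rows in train_dataset_encoded
per_device_train_batch_size = 4
gradient_accumulation_steps = 8
num_devices = 1                    # assumption: a single GPU

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_devices
batches_per_epoch = math.ceil(num_train_examples / (per_device_train_batch_size * num_devices))
# Approximate optimizer updates per epoch; the exact rounding of the last
# partial accumulation window depends on the transformers version.
updates_per_epoch = batches_per_epoch // gradient_accumulation_steps

print(effective_batch_size, batches_per_epoch, updates_per_epoch)
```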

In [34]:
# create Trainer instance
trainer = Trainer(
    model=model,                          # passing our model
    args=training_args,                   # passing the above created arguments
    compute_metrics=compute_metrics,      # passing the compute_metrics function that we imported from gdsc_eval module
    train_dataset=train_dataset_encoded,  # passing the encoded train set
    eval_dataset=val_dataset_encoded,     # passing the encoded validation set
    tokenizer=feature_extractor           # passing the feature extractor
)

Amazing! We have now done everything required to fine-tune the model. We can finally run the cell that will give us our own "version" of the AST classifier, capable of distinguishing different species from audio recordings. Let's do it!

In [None]:
# train model
model_history = trainer.train()

Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
0,2.7044,3.070569,0.265976,0.12946,0.149944,0.178152
1,1.5127,2.136324,0.512953,0.337185,0.365311,0.390501
2,1.0496,1.690186,0.583765,0.453687,0.521895,0.490047
3,0.7324,1.407212,0.670121,0.57436,0.665089,0.582178
4,0.727,1.368826,0.668394,0.570206,0.66196,0.5897
5,0.6082,1.26748,0.70639,0.635224,0.716336,0.640188
6,0.3784,1.195891,0.728843,0.671081,0.7413,0.67863
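
Notice that F1, precision and recall are lower than accuracy. That pattern is what you would see with macro averaging, where every class counts equally and a class that is never predicted contributes a score of 0. The sketch below shows the idea in plain Python. It is an assumption about how such metrics behave, not the actual *compute_metrics* function from *gdsc_eval*, and `macro_scores` is a hypothetical name.

```python
def macro_scores(y_true, y_pred, num_classes):
    """Macro-averaged precision, recall and F1; undefined ratios count as 0."""
    precisions, recalls, f1s = [], [], []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = num_classes
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


y_true = [0, 0, 1, 2]
y_pred = [0, 0, 1, 1]   # class 2 is never predicted
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall, f1 = macro_scores(y_true, y_pred, num_classes=3)
print(accuracy, precision, recall, f1)
```

In this toy case three of four predictions are correct, yet the macro scores are pulled down by the class that never appears among the predictions.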


  _warn_prf(average, modifier, msg_start, len(result))


Is it possible? We are performing better than the Random Forest model with only a fraction of the data! Well, yes, that's possible, but remember that the validation set we are using here contains only 579 samples, far fewer than the original set. To really compare the model with the Random Forest, we need to perform inference on the test set and send a submission.

In the next section we will show you how to load the model from a checkpoint and perform inference on the test set data.

**Key insights:**
* The 🤗 model hub offers you a variety of models, BUT you should always remember to adjust them to your task: create an appropriate mapping of labels to integers and specify the number of classes that you are working with
* There are a number of parameters that define a training job. Be mindful of how you set them and iterate over different values; this is called hyperparameter tuning
* Fine-tuning such a big model on such a small sample is almost always a bad idea - big models require big data!

# Loading the model and doing inference on the test set

If you look back at the *TrainingArguments* class, you will see that we passed an *output_dir* argument that tells 🤗 where to put the checkpoints with the training metadata and the model. In the training run above it was set to *experiments/models/dropout*, while the cells below load from *models/AST*, so point the paths at your own *output_dir* and checkpoint number. Loading from a checkpoint is not strictly necessary here: we set *load_best_model_at_end* to *True* in our *TrainingArguments* object, which ensures that the variable *model* already contains the best model according to the metric of choice. We just want to show you how to load the model from other checkpoints in case you'd like to experiment. With the 🤗 library, loading a checkpoint is just a matter of two lines.

In [32]:
feature_extractor = ASTFeatureExtractor.from_pretrained("models/AST/checkpoint-352")
model = ASTForAudioClassification.from_pretrained("models/AST/checkpoint-352")
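
Checkpoints are saved as sub-directories named *checkpoint-&lt;step&gt;* inside *output_dir*, so before picking one it helps to list what is actually there. A minimal sketch with the standard library; `list_checkpoints` is a hypothetical helper and the step numbers in the toy directory are made up:

```python
import os
import re
import tempfile


def list_checkpoints(output_dir):
    """Return checkpoint sub-directories of output_dir sorted by global step."""
    pattern = re.compile(r"^checkpoint-(\d+)$")
    found = []
    for name in os.listdir(output_dir):
        match = pattern.match(name)
        if match and os.path.isdir(os.path.join(output_dir, name)):
            found.append((int(match.group(1)), name))
    return [name for _, name in sorted(found)]


# Toy output directory (hypothetical step numbers).
with tempfile.TemporaryDirectory() as output_dir:
    for step in (44, 352, 88):
        os.mkdir(os.path.join(output_dir, f"checkpoint-{step}"))
    checkpoints = list_checkpoints(output_dir)

print(checkpoints)
```

Sorting by the numeric step rather than by name matters here, because a plain string sort would put *checkpoint-352* before *checkpoint-44*.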

Cool! Now let's get the test set data. We need to preprocess it in the same way as we did for training. Let's start by simply loading the dataset and resampling the audio arrays.

In [33]:
test_path = 'data/test'
test_dataset = load_dataset("audiofolder", data_dir=test_path).get('train')
test_dataset = test_dataset.cast_column("audio", Audio(sampling_rate=MODEL_SAMPLING_RATE))

Downloading and preparing dataset audiofolder/default to /root/.cache/huggingface/datasets/audiofolder/default-7efab534d5cbb83c/0.0.0/6cbdd16f8688354c63b4e2a36e1585d05de285023ee6443ffd71c4182055c0fc...

Dataset audiofolder downloaded and prepared to /root/.cache/huggingface/datasets/audiofolder/default-7efab534d5cbb83c/0.0.0/6cbdd16f8688354c63b4e2a36e1585d05de285023ee6443ffd71c4182055c0fc. Subsequent calls will reuse this data.

In [34]:
test_dataset

Dataset({
    features: ['audio'],
    num_rows: 556
})

In [35]:
test_dataset[0]

{'audio': {'path': '/root/data/data/test/0.wav',
  'array': array([ 4.08071404e-14, -3.05009002e-13,  1.55307760e-13, ...,
         -2.78414413e-03,  6.18211143e-02,  0.00000000e+00]),
  'sampling_rate': 16000}}

As we need the predictions file to have two columns, file_name and predicted_class_id, let's extract the path of each data point and turn it into a feature called "file_name".

For this purpose we'll use the metadata information from the dataset object that we just created.

So let's get the paths of the audio files.

In [36]:
test_paths = list(test_dataset.info.download_checksums.keys())

Let's inspect the variable.

In [37]:
test_paths[:3]

['/root/data/data/test/0.wav',
 '/root/data/data/test/1.wav',
 '/root/data/data/test/10.wav']

Great! We obtained the paths. One thing to note is that the test_paths variable also contains the metadata.csv file with file_names and labels (check it on your own!). We don't need it, so we will use a one-line lambda function to keep only the items related to the audio files.

Furthermore, we don't need the whole path, just the file names, so we will define another one-liner that takes the string after the last "/" character, which is exactly the file name.

We will use the built-in filter and map functions, which apply a function to a Python iterable, to run the lambda functions defined below.

In [38]:
remove_metadata = lambda x: x.endswith(".wav")
extract_file_name = lambda x: x.split('/')[-1]

test_paths = list(filter(remove_metadata, test_paths))
test_paths = list(map(extract_file_name, test_paths))

Let's see if the test_paths variable contains the file names.

In [39]:
test_paths[:3]

['0.wav', '1.wav', '10.wav']
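
The same filtering and renaming can also be written as a single list comprehension with *os.path.basename*, which some readers find easier to follow than filter and map. A small self-contained sketch on a hand-written list of paths:

```python
import os

paths = [
    "/root/data/data/test/0.wav",
    "/root/data/data/test/1.wav",
    "/root/data/data/test/10.wav",
    "/root/data/data/test/metadata.csv",
]

# Keep only audio files and strip the directory part in a single pass.
file_names = [os.path.basename(p) for p in paths if p.endswith(".wav")]
print(file_names)
```

Note that the paths are ordered as strings, which is why "10.wav" comes right after "1.wav".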

Yes, we indeed have just the file names. Let's create a new column with the file names.

In [40]:
test_dataset = test_dataset.add_column("file_name", test_paths)

Let's inspect the newly created "file_name" feature.

In [41]:
test_dataset

Dataset({
    features: ['audio', 'file_name'],
    num_rows: 556
})

In [42]:
test_dataset[0]

{'audio': {'path': '/root/data/data/test/0.wav',
  'array': array([ 4.08071404e-14, -3.05009002e-13,  1.55307760e-13, ...,
         -2.78414413e-03,  6.18211143e-02,  0.00000000e+00]),
  'sampling_rate': 16000},
 'file_name': '0.wav'}

Amazing! We have almost finished preprocessing the data. The last step is to pass the audio arrays through our feature extractor and set the format of the "input_values" column from numpy to torch, so that we can safely pass the spectrogram arrays through the model.

In [43]:
test_dataset_encoded = test_dataset.map(lambda x: preprocess_audio_arrays(x, 'audio', 'array', feature_extractor), remove_columns="audio", batched=True, batch_size = 2)
test_dataset_encoded.set_format(type='torch', columns=['input_values'])

Now let's tell 🤗 that we want to run the predictions on our GPU. To do this we need to define the *device* variable with the help of the *PyTorch* library.

In [44]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

Good, we are set up to perform inference on the test set. Let's use the *make_predictions* function from our *gdsc_eval* module located in the *src* directory. This time we will set the *batch_size* argument to 8, to avoid any out-of-memory issues. We are also dropping the "input_values" column, as we won't need it anymore.

In [45]:
test_dataset_encoded = test_dataset_encoded.map(lambda x: make_predictions(x['input_values'], model, device), batched=True, batch_size=8, remove_columns="input_values")
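
With *batched=True*, the function given to *map* receives a batch of column values and has to return a dictionary of lists with one entry per example. For classification, the core of such a function is taking the index of the largest logit for each example. The sketch below shows just that step in plain Python; `predict_class_ids` is a hypothetical stand-in and not the *make_predictions* function, which also has to run the model on the chosen device.

```python
def predict_class_ids(batch_logits):
    """Return the index of the largest logit for each example in the batch."""
    return [max(range(len(row)), key=row.__getitem__) for row in batch_logits]


# Toy logits for 2 examples and 3 classes (made-up numbers).
logits = [[0.1, 2.3, -1.0],
          [1.5, 0.2, 0.9]]
batch_output = {"predicted_class_id": predict_class_ids(logits)}
print(batch_output)
```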

Let's now create a pandas dataframe from our 🤗 dataset. We should see the columns file_name and predicted_class_id.

In [46]:
test_dataset_encoded_df = test_dataset_encoded.to_pandas()
test_dataset_encoded_df.head()

Unnamed: 0,file_name,predicted_class_id
0,0.wav,14
1,1.wav,60
2,10.wav,9
3,100.wav,17
4,101.wav,56


Great! Now we need to save the dataframe to a CSV file and we are ready to send the predictions. We will save it in the directory of our model, to have everything in one place.

In [47]:
test_dataset_encoded_df.to_csv("models/AST/predictions.csv", index=False)

And done! We have our CSV file with the predictions ready. Let's upload it via the challenge website and see our results!

The score is way better than the one from the Random Forest. Remember that in this tutorial we are using a much more powerful model that was designed to work with audio data. But given that the F1 metric ranges from 0 to 1, there is still some room for improvement. In the next tutorial, we will see how the model performs on the whole dataset. Then you will see what the model is really capable of! In the meantime, you can try to complete the exercises while making a coffee before the final tutorial.

***
**It is important that you name the columns exactly *file_name* and *predicted_class_id*, otherwise your score won't appear on the leaderboard!**
***
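
Since the column names matter this much, a quick check of the saved file's header before uploading can save a failed submission. A minimal sketch with the standard library; `has_expected_header` is a hypothetical helper, and the two strings stand in for the contents of a predictions file:

```python
import csv
import io

EXPECTED_COLUMNS = ["file_name", "predicted_class_id"]


def has_expected_header(csv_text):
    """True if the first row of the CSV is exactly the expected column names."""
    header = next(csv.reader(io.StringIO(csv_text)), None)
    return header == EXPECTED_COLUMNS


good = "file_name,predicted_class_id\n0.wav,14\n1.wav,60\n"
bad = ",file_name,predicted_class_id\n0,0.wav,14\n"  # what index=True would produce
print(has_expected_header(good), has_expected_header(bad))
```

The second string shows why *index=False* is passed to *to_csv* above: without it, pandas writes an extra unnamed index column.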

**Exercise time:**

The last exercises in this notebook are to:
* try to think how we could improve the model further apart from running it on the whole sample. What does your Data Science intuition tell you? Post your thoughts in the Team's channel and gain some recognition for your team! 😃
* try also to use another model from the 🤗 model hub. You will need to import other classes instead of ASTFeatureExtractor and ASTForAudioClassification. You will also need to change the string in the *from_pretrained* method and adjust the preprocessing. Sounds like a lot? Well, this is how we do Data Science! 😃

REMINDER: After finishing your work remember to shut down the instance.