Library for sound processing with Wav2Vec2 models: https://github.com/jonatasgrosman/huggingsound <br>
API of the Russian National Corpus: https://github.com/kunansy/RNC <br>
Large Wav2Vec2 model pretrained (but not fine-tuned) on speech data in 53 languages, including Common Voice: "facebook/wav2vec2-large-xlsr-53"<br>
Wav2Vec2 model fine-tuned for Russian: "jonatasgrosman/wav2vec2-large-xlsr-53-russian". It is most probably useless for us. Its token set consists of Cyrillic characters and has no tokens marking lexical stress, which we need.<br>
Training arguments (potentially useful for ablation studies): https://huggingface.co/transformers/v4.4.2/_modules/transformers/training_args.html

# Mount Google Drive
You need to mount Google Drive to work with the files saved there. I shared the folder with the RNC data with you. To access it, open "Shared with me" on your Google Drive, right-click the RussianNationalCorpus folder, and select "Add shortcut to Drive". Place the shortcut directly in My Drive so that it matches `DATA_PATH` below.

In [None]:
from google.colab import drive
drive.mount("/content/drive")
DATA_PATH = "/content/drive/MyDrive/RussianNationalCorpus"

Mounted at /content/drive
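If the shortcut is missing, `DATA_PATH` will not exist, and the scripts later in the notebook will fail with less obvious errors. Below is a minimal sanity check. The helper `check_data_folder` is hypothetical and not part of the StressDetection repository.

```python
import os


def check_data_folder(path: str) -> list:
    """Return the sorted entries of the RNC data folder, or fail with a hint."""
    if not os.path.isdir(path):
        raise FileNotFoundError(
            f"{path} not found: add a shortcut to the shared "
            "RussianNationalCorpus folder in My Drive and remount."
        )
    return sorted(os.listdir(path))


# In Colab, after mounting:
# print(check_data_folder(DATA_PATH))
```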


In [None]:
! git clone https://github.com/siasio/StressDetection.git

Cloning into 'StressDetection'...
remote: Enumerating objects: 177, done.
remote: Counting objects: 100% (12/12), done.
remote: Compressing objects: 100% (11/11), done.
remote: Total 177 (delta 2), reused 6 (delta 1), pack-reused 165
Receiving objects: 100% (177/177), 177.07 KiB | 1.86 MiB/s, done.
Resolving deltas: 100% (102/102), done.


In [None]:
DOWNLOADING_PATH = '/content/StressDetection/download_examples.py'
CLEANING_PATH = '/content/StressDetection/clean_csv.py'
TRAINING_PATH = '/content/StressDetection/run_training.py'
EVALUATION_PATH = '/content/StressDetection/evaluate_model.py'
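You can confirm that the clone succeeded and that every path variable points at an existing script before running anything. This is a small sketch; the helper name `missing_scripts` is our own.

```python
import os


def missing_scripts(paths: dict) -> list:
    """Return the names of path variables that do not point to an existing file."""
    return [name for name, path in paths.items() if not os.path.isfile(path)]


# In Colab, after the cell above (an empty list means all scripts are in place):
# print(missing_scripts({
#     "DOWNLOADING_PATH": DOWNLOADING_PATH,
#     "CLEANING_PATH": CLEANING_PATH,
#     "TRAINING_PATH": TRAINING_PATH,
#     "EVALUATION_PATH": EVALUATION_PATH,
# }))
```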

# Install the rnc and huggingsound libraries
Attention: after installing the huggingsound library, you will most probably need to modify one of its files, trainer.py. In Colab, it is located at /usr/local/lib/python3.7/dist-packages/huggingsound/trainer.py (adjust python3.7 to the Python version of your runtime). To find the usr folder, open the file browser in the left panel and click the two dots above the sample_data folder to reach the root directory. The change you need to make is replacing self.use_amp with self.use_cuda_amp on lines 434 and 451.

After making the change, you need to reinstall the library: in Colab, restart the runtime first and then run the pip install command again.

The change is not required if you only want to run the evaluation.
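Once huggingsound is installed, the manual edit can also be scripted. This is a minimal sketch under a few assumptions:

- The file is `huggingsound/trainer.py` inside the installed package.
- Only the listed lines should change. The line numbers may differ between huggingsound versions, so check them first.

`patch_lines` and `huggingsound_trainer_path` are hypothetical helpers. `importlib.util.find_spec` locates a top-level package without importing it.

```python
import importlib.util
import os


def patch_lines(path: str, old: str, new: str, line_numbers: list) -> int:
    """Replace `old` with `new` on the given 1-based lines; return how many lines changed."""
    with open(path, encoding="utf-8") as f:
        lines = f.readlines()
    changed = 0
    for n in line_numbers:
        if old in lines[n - 1]:
            lines[n - 1] = lines[n - 1].replace(old, new)
            changed += 1
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(lines)
    return changed


def huggingsound_trainer_path() -> str:
    """Locate huggingsound/trainer.py without importing the package."""
    spec = importlib.util.find_spec("huggingsound")
    if spec is None or not spec.submodule_search_locations:
        raise ModuleNotFoundError("huggingsound is not installed")
    return os.path.join(list(spec.submodule_search_locations)[0], "trainer.py")


# In Colab, after installing huggingsound (then restart the runtime):
# patch_lines(huggingsound_trainer_path(), "self.use_amp", "self.use_cuda_amp", [434, 451])
```

Running the patch twice is harmless. `self.use_amp` is not a substring of `self.use_cuda_amp`, so the second run changes nothing and returns 0.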

In [None]:
!pip install rnc
!pip install huggingsound

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting rnc
  Downloading rnc-0.10.0-py3-none-any.whl (37 kB)
Collecting types-aiofiles<0.9.0,>=0.8.4
  Downloading types_aiofiles-0.8.11-py3-none-any.whl (7.8 kB)
Collecting beautifulsoup4<4.12.0,>=4.11.1
  Downloading beautifulsoup4-4.11.2-py3-none-any.whl (129 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m129.4/129.4 KB[0m [31m15.3 MB/s[0m eta [36m0:00:00[0m
Collecting aiofiles<0.9.0,>=0.8.0
  Downloading aiofiles-0.8.0-py3-none-any.whl (13 kB)
Collecting ujson<5.5.0,>=5.4.0
  Downloading ujson-5.4.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (45 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 45.4/45.4 KB 6.8 MB/s eta 0:00:00
Collecting types-ujson<5.5.0,>=5.4.0
  Downloading types_ujson-5.4.0-py3-none-any.whl (2.2 kB)
Collecting soupsieve>1.2
  Downloading soupsieve-2.4-py3-none-any.whl (37 kB)


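# Download samples (if not downloaded yet)
We probably need the nest_asyncio library. The Colab kernel already runs an asyncio event loop, and asyncio does not allow starting another loop inside a running one. nest_asyncio.apply() patches asyncio to allow such nesting. It only affects code running in the notebook itself, not scripts launched with !python.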

In [None]:
!pip install nest_asyncio
import nest_asyncio
nest_asyncio.apply()

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting nest_asyncio
  Downloading nest_asyncio-1.5.6-py3-none-any.whl (5.2 kB)
Installing collected packages: nest_asyncio
Successfully installed nest_asyncio-1.5.6
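The sketch below reproduces the problem nest_asyncio addresses, using only the standard library. `fetch_sample` is a hypothetical stand-in for an asynchronous corpus request. Calling `asyncio.run()` while a loop is already running, as happens inside a notebook cell, raises a `RuntimeError`. After `nest_asyncio.apply()`, the same nested call is permitted.

```python
import asyncio


async def fetch_sample(word: str) -> str:
    # Hypothetical stand-in for an asynchronous request to the corpus
    await asyncio.sleep(0)
    return f"sample for {word}"


def nested_run_error() -> str:
    """Call asyncio.run() from inside a running loop, as a notebook cell would."""

    async def cell() -> str:
        coro = fetch_sample("замок")
        try:
            return asyncio.run(coro)  # not allowed: a loop is already running
        except RuntimeError as err:
            coro.close()  # the coroutine never started; close it to avoid a warning
            return f"RuntimeError: {err}"

    return asyncio.run(cell())


print(nested_run_error())
# RuntimeError: asyncio.run() cannot be called from a running event loop
```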


In [None]:
%cd StressDetection

/content/StressDetection


In [None]:
!python $DOWNLOADING_PATH --help

usage: download_examples.py [-h] [--config CONFIG] [--eval] words [words ...]

positional arguments:
  words            List of the words to query the Russian National Corpus for.

optional arguments:
  -h, --help       show this help message and exit
  --config CONFIG  Path to the config file (should be located in the config folder).
  --eval           Whether to save the downloaded data in an eval csv
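To show how such a command line is parsed, here is an argparse sketch that re-creates the interface documented by the help text above. It is an illustration, not the actual code of download_examples.py. Two details are assumptions: `--eval` as a `store_true` flag and the `None` default for `--config`. The example words are Russian stress homographs (за́мок/замо́к, му́ка/мука́), chosen here for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Re-create the CLI shown by `download_examples.py --help` (illustrative sketch)."""
    parser = argparse.ArgumentParser(prog="download_examples.py")
    parser.add_argument("words", nargs="+",
                        help="List of the words to query the Russian National Corpus for.")
    parser.add_argument("--config",
                        help="Path to the config file (should be located in the config folder).")
    parser.add_argument("--eval", action="store_true",
                        help="Whether to save the downloaded data in an eval csv")
    return parser


args = build_parser().parse_args(["--config", "colab.yaml", "--eval", "замок", "мука"])
print(args.words, args.config, args.eval)
# ['замок', 'мука'] colab.yaml True
```

In the notebook, the corresponding call would look like `!python $DOWNLOADING_PATH --config colab.yaml --eval замок мука`.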


In [None]:
!python $CLEANING_PATH --help

usage: clean_csv.py [-h] [--config CONFIG]

optional arguments:
  -h, --help       show this help message and exit
  --config CONFIG  Path to the config file (should be located in the config folder).


# Run training
First run the script with --help to see which arguments it expects.

In [None]:
# !pip3 install torch==1.9.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html

In [None]:
from google.colab import files

In [None]:
!python $TRAINING_PATH --config colab.yaml --num_train_epochs 6


2023-02-24 11:52:46.184490: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-02-24 11:52:47.305053: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-02-24 11:52:47.305167: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
INFO:huggingsound.speech_recognition.model:Loading model...
INFO:rnc:Requested to 'htt

In [None]:
import os
os.path.exists('/content/drive/MyDrive/RussianNationalCorpus/big_data/media/karyagina_empatiya_031.mp4')

True

In [None]:
!python $EVALUATION_PATH --config colab.yaml

2023-02-22 21:30:41.348960: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-02-22 21:30:42.238394: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-02-22 21:30:42.238508: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
INFO:huggingsound.speech_recognition.model:Loading model...
Traceback (most recent call last):

In [None]:
from google.colab import files  # re-import in case the runtime was restarted

files.download('/content/model-cyrillic-stress/pytorch_model.bin')
files.download('/content/model-cyrillic-stress/config.json')
files.download('/content/model-cyrillic-stress/checkpoint-2500/pytorch_model.bin')
files.download('/content/model-cyrillic-stress/checkpoint-2500/config.json')

NameError: ignored
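Instead of downloading files one by one, you can pack the whole output folder, including the checkpoint-* subfolders, into a single zip archive. This is a sketch using `shutil.make_archive`; `zip_model_dir` is a hypothetical helper name. Archives of large checkpoints can be slow to create and download.

```python
import shutil


def zip_model_dir(model_dir: str, archive_base: str) -> str:
    """Pack model_dir (with its subfolders) into archive_base + '.zip' and return its path."""
    return shutil.make_archive(archive_base, "zip", root_dir=model_dir)


# In Colab:
# from google.colab import files
# files.download(zip_model_dir("/content/model-cyrillic-stress", "/content/model-cyrillic-stress"))
```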

# Evaluate the model
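First run the script with --help to see which arguments it expects. You may instead see the generic python3 usage message rather than the script's own options. If so, EVALUATION_PATH is not defined in the current session (for example, after a runtime restart), so the shell receives a bare python --help. Rerun the cell that defines the script paths.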

In [None]:
!python $EVALUATION_PATH --help

usage: python3 [option] ... [-c cmd | -m mod | file | -] [arg] ...
Options and arguments (and corresponding environment variables):
         and comparing bytes/bytearray with str. (-bb: issue errors)
-B     : don't write .pyc files on import; also PYTHONDONTWRITEBYTECODE=x
-c cmd : program passed in as string (terminates option list)
-d     : debug output from parser; also PYTHONDEBUG=x
-E     : ignore PYTHON* environment variables (such as PYTHONPATH)
-h     : print this help message and exit (also --help)
-i     : inspect interactively after running script; forces a prompt even
         if stdin does not appear to be a terminal; also PYTHONINSPECT=x
-I     : isolate Python from the user's environment (implies -E and -s)
-m mod : run library module as a script (terminates option list)
-O     : remove assert and __debug__-dependent statements; add .opt-1 before
         .pyc extension; also PYTHONOPTIMIZE=x
-OO    : do -O changes and also discard docstrings; add .opt-2 before
        

In [None]:
!python $EVALUATION_PATH --config colab.yaml

2023-02-24 05:30:26.291206: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-02-24 05:30:27.146357: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-02-24 05:30:27.146467: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
INFO:huggingsound.speech_recognition.model:Loading model...
Is preds a file? False
Tra