
# Making the Most of your Colab Subscription



## Faster GPUs

Users who have purchased one of Colab's paid plans have access to faster GPUs and more memory. To choose from several accelerator options, subject to availability, open `Runtime > Change runtime type` in the menu.

The free-of-charge version of Colab grants access to NVIDIA T4 GPUs, subject to quota restrictions and availability.

You can check which GPU you've been assigned at any time by running the following cell. If it prints "Not connected to a GPU", change the runtime via `Runtime > Change runtime type` in the menu to enable a GPU accelerator, and then re-run the cell.


In [None]:
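# `!cmd` is IPython shell syntax: it runs nvidia-smi and captures its output as a list of lines
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
# On a CPU-only runtime nvidia-smi reports that it "failed" to reach the driver
if gpu_info.find('failed') >= 0:
  print('Not connected to a GPU')
else:
  print(gpu_info)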

To use a GPU with your notebook, select `Runtime > Change runtime type` in the menu, and then set the hardware accelerator to the desired option.
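The `gpu_info = !nvidia-smi` line uses IPython's `!` shell syntax, so that cell only runs inside a notebook. The sketch below runs the same check from a plain Python script using the standard library's `subprocess` module. The function name `gpu_info` and the "failed" string check are assumptions carried over from the cell above; they are not a Colab API.

```python
import shutil
import subprocess


def gpu_info() -> str:
    """Return nvidia-smi's report, or a fallback message when no GPU is visible."""
    # No nvidia-smi binary on PATH: treat the runtime as having no NVIDIA GPU
    if shutil.which("nvidia-smi") is None:
        return "Not connected to a GPU"
    result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    # On a CPU-only runtime the tool may exist but report that it failed to reach the driver
    if result.returncode != 0 or "failed" in result.stdout:
        return "Not connected to a GPU"
    # Fall back to the same message if the tool printed nothing at all
    return result.stdout or "Not connected to a GPU"


print(gpu_info())
```

Checking the return code as well as the text avoids depending only on the exact wording of nvidia-smi's error message.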

## More memory

Users who have purchased one of Colab's paid plans have access to high-memory VMs when they are available. More powerful GPUs are always offered with high-memory VMs.



You can check how much memory you have at any time by running the following cell. If it prints "Not using a high-RAM runtime", enable a high-RAM runtime via `Runtime > Change runtime type` in the menu, select High-RAM in the Runtime shape toggle, and then re-run the cell.


In [None]:
from psutil import virtual_memory
ram_gb = virtual_memory().total / 1e9
print('Your runtime has {:.1f} gigabytes of available RAM\n'.format(ram_gb))

if ram_gb < 20:
  print('Not using a high-RAM runtime')
else:
  print('You are using a high-RAM runtime!')
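The cell above uses `virtual_memory().total`, which is the total RAM provisioned to the VM, not what is currently free (psutil reports that separately as `virtual_memory().available`). It labels any runtime with at least 20 GB as high-RAM.

The following sketch reproduces the same check with only the standard library. It makes two assumptions:

- The runtime is a POSIX system such as Linux, so `os.sysconf` exposes the page size and physical page count.
- The 20 GB cutoff is this notebook's own heuristic, not an official Colab threshold.

```python
import os

HIGH_RAM_THRESHOLD_GB = 20  # same cutoff as the psutil cell above (a heuristic, not a Colab constant)


def total_ram_bytes() -> int:
    # POSIX-only: physical page count times page size
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def is_high_ram(total_bytes: float, threshold_gb: float = HIGH_RAM_THRESHOLD_GB) -> bool:
    # Decimal gigabytes (1e9 bytes), matching the division used in the cell above
    return total_bytes / 1e9 >= threshold_gb


ram = total_ram_bytes()
print(f"Total RAM: {ram / 1e9:.1f} GB, high-RAM runtime: {is_high_ram(ram)}")
```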

## Longer runtimes

All Colab runtimes are reset after some period of time, and sooner if the runtime is idle (not executing code). Colab Pro and Pro+ users have access to longer runtimes than users of the free-of-charge version.

## Background execution

Colab Pro+ users have access to background execution, where notebooks will continue executing even after you've closed a browser tab. This is always enabled in Pro+ runtimes as long as you have compute units available.



## Relaxing resource limits in Colab Pro

Your resources in Colab are not unlimited. To make the most of Colab, avoid using resources when you don't need them: for example, only use a GPU when required, and close Colab tabs when you're finished.



If you hit these limits, you can relax them by purchasing more compute units via Pay As You Go. Anyone can purchase compute units via [Pay As You Go](https://colab.research.google.com/signup); no subscription is required.

## Send us feedback!

If you have any feedback for us, please let us know. The best way to send feedback is through the `Help > Send feedback...` menu. If you encounter usage limits in Colab Pro, consider subscribing to Pro+.

If you encounter errors or other issues with billing (payments) for Colab Pro, Pro+, or Pay As You Go, please email [colab-billing@google.com](mailto:colab-billing@google.com).

## More Resources

### Working with Notebooks in Colab
- [Overview of Colab](/notebooks/basic_features_overview.ipynb)
- [Guide to Markdown](/notebooks/markdown_guide.ipynb)
- [Importing libraries and installing dependencies](/notebooks/snippets/importing_libraries.ipynb)
- [Saving and loading notebooks in GitHub](https://colab.research.google.com/github/googlecolab/colabtools/blob/main/notebooks/colab-github-demo.ipynb)
- [Interactive forms](/notebooks/forms.ipynb)
- [Interactive widgets](/notebooks/widgets.ipynb)

<a name="working-with-data"></a>
### Working with Data
- [Loading data: Drive, Sheets, and Google Cloud Storage](/notebooks/io.ipynb)
- [Charts: visualizing data](/notebooks/charts.ipynb)
- [Getting started with BigQuery](/notebooks/bigquery.ipynb)

### Machine Learning Crash Course
These are a few of the notebooks from Google's online Machine Learning course. See the [full course website](https://developers.google.com/machine-learning/crash-course/) for more.
- [Intro to Pandas DataFrame](https://colab.research.google.com/github/google/eng-edu/blob/main/ml/cc/exercises/pandas_dataframe_ultraquick_tutorial.ipynb)
- [Linear regression with tf.keras using synthetic data](https://colab.research.google.com/github/google/eng-edu/blob/main/ml/cc/exercises/linear_regression_with_synthetic_data.ipynb)


<a name="using-accelerated-hardware"></a>
### Using Accelerated Hardware
- [TensorFlow with GPUs](/notebooks/gpu.ipynb)
- [TensorFlow with TPUs](/notebooks/tpu.ipynb)

<a name="machine-learning-examples"></a>

## Machine Learning Examples

To see end-to-end examples of the interactive machine learning analyses that Colab makes possible, check out these tutorials using models from [TensorFlow Hub](https://tfhub.dev).

A few featured examples:

- [Retraining an Image Classifier](https://tensorflow.org/hub/tutorials/tf2_image_retraining): Build a Keras model on top of a pre-trained image classifier to distinguish flowers.
- [Text Classification](https://tensorflow.org/hub/tutorials/tf2_text_classification): Classify IMDB movie reviews as either *positive* or *negative*.
- [Style Transfer](https://tensorflow.org/hub/tutorials/tf2_arbitrary_image_stylization): Use deep learning to transfer style between images.
- [Multilingual Universal Sentence Encoder Q&A](https://tensorflow.org/hub/tutorials/retrieval_with_tf_hub_universal_encoder_qa): Use a machine learning model to answer questions from the SQuAD dataset.
- [Video Interpolation](https://tensorflow.org/hub/tutorials/tweening_conv3d): Predict what happened in a video between the first and the last frame.


In [None]:
from torch import nn
from transformers import Wav2Vec2Model, AutoModelForAudioClassification
from transformers import AutoFeatureExtractor
from datasets import Dataset, DatasetDict
import torch
import torch.nn.functional as F


def init_weight(m):
    if isinstance(m, torch.nn.Linear):
        torch.nn.init.xavier_normal_(m.weight)
    if isinstance(m, torch.nn.BatchNorm2d):
        m.weight.data.normal_(1.0, 0.02)
        m.bias.data.fill_(0)
    elif isinstance(m, torch.nn.Conv2d):
        m.weight.data.normal_(0.0, 0.02)


class Projection(torch.nn.Module):

    def __init__(self, in_planes, out_planes=None, n_layers=1, layer_type=0):
        super(Projection, self).__init__()
        self.out_planes = out_planes
        if out_planes is None:
            out_planes = in_planes
        self.layers = torch.nn.Sequential()
        _in = None
        _out = None
        for i in range(n_layers):
            # self.layers.add_module(f"{i}fc",
            #                        torch.nn.Linear(_in, _out))
            self.layers.add_module(f"{i}cv1d",
                                   torch.nn.Conv1d(in_planes, in_planes, 3, padding='same'))
            self.layers.add_module(f"{i}bn",
                                    torch.nn.BatchNorm1d(in_planes))

            if layer_type == n_layers - 1:
                self.layers.add_module(f"{i}relu",
                                            torch.nn.LeakyReLU(.2))
            # if i < n_layers - 1:
            #     # if layer_type > 0:
            #     #     self.layers.add_module(f"{i}bn",
            #     #                            torch.nn.BatchNorm1d(_out))
            #     if layer_type > 1:
            #         self.layers.add_module(f"{i}relu",
            #                                torch.nn.LeakyReLU(.2))
        self.apply(init_weight)

    def forward(self, x):
        # x = .1 * self.layers(x) + x
        x = self.layers(x)
        # x = x.reshape(x.shape[0],-1)
        # x = F.adaptive_avg_pool1d(x, self.out_planes)
        x = x.mean(dim=1)
        return x

class Discriminator(torch.nn.Module):
    def __init__(self, in_planes, n_layers=3, hidden=None):
        super(Discriminator, self).__init__()

        _hidden = in_planes if hidden is None else hidden
        self.body = torch.nn.Sequential()
        for i in range(n_layers-1):
            _in = in_planes if i == 0 else _hidden
            _hidden = int(_hidden // 1.5) if hidden is None else hidden
            self.body.add_module('block%d'%(i+1),
                                 torch.nn.Sequential(
                                    #  nn.Dropout(0.2),
                                     torch.nn.Linear(_in, _hidden),
                                     torch.nn.BatchNorm1d(_hidden),
                                     torch.nn.LeakyReLU(0.2)
                                 ))
        self.tail = torch.nn.Linear(_hidden, 1)
        self.apply(init_weight)

    def forward(self,x):
        x = self.body(x)
        x = self.tail(x)
        return x

class Discriminator_Conv(torch.nn.Module):
    def __init__(self, in_planes, n_layers=3, hidden=None):
        super(Discriminator_Conv, self).__init__()

        _hidden = in_planes if hidden is None else hidden
        self.body = torch.nn.Sequential()
        for i in range(n_layers-1):
            _in = in_planes if i == 0 else _hidden
            _hidden = int(_hidden // 1.5) if hidden is None else hidden
            self.body.add_module('block%d'%(i+1),
                                 torch.nn.Sequential(
                                    #  nn.Dropout(0.2),
                                     torch.nn.Linear(_in, _hidden),
                                     torch.nn.BatchNorm1d(_hidden),
                                     torch.nn.LeakyReLU(0.2)
                                 ))
        self.tail = torch.nn.Linear(_hidden, 1, bias=False)
        self.apply(init_weight)

    def forward(self,x):
        x = self.body(x)
        x = self.tail(x)
        return x

class Wave_Network(nn.Module):
    def __init__(self, num_classes=12, model_path='', device='cuda'):
        super().__init__()
        self.device = device
        self.backbone = Wav2Vec2Model.from_pretrained(model_path).to(device=device)

        for param in self.backbone.parameters():
            param.requires_grad = False
        self.backbone.feature_extractor._freeze_parameters()

        self.discriminator = Discriminator(768).to(device=device)
        self.projection = Projection(768, 768, 2).to(device=device)
        self.projector = nn.Sequential(
            nn.Dropout(0.3),
            nn.LazyLinear(256).to(device=device)
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.3),
            # nn.Flatten(),
            # nn.LazyLinear(256).to(device=device),
            nn.LazyLinear(num_classes).to(device=device)
        )

    def get_train_params(self, is_classification=True):
        if is_classification:
            return [
                {'params': self.projector.parameters()},
                {'params': self.classifier.parameters()},
            ]
        else:
            return [
                {'params': self.projection.parameters()},
                {'params': self.discriminator.parameters()},
            ]

    def forward(self, x, nomaly_label=None, std=0.05, is_train=True):
        x = self.backbone(x).last_hidden_state
        x_reshape = x.reshape(x.shape[0],-1)
        x_hidden= F.adaptive_avg_pool1d(x_reshape, 768)
        x_hidden_state = self.projector(x)
        x_hidden_state = x_hidden_state.mean(dim=1)

        if is_train:
            normal_hidden_state = x_hidden_state[nomaly_label==1]
            x_classification = self.classifier(normal_hidden_state)

            x_projector = self.projection(x_hidden)
            # add noise
            # if is_train:
            normal_x_projector = x_projector[nomaly_label==1]
            noise = torch.normal(mean=0, std=std, size=normal_x_projector.shape).to(device=self.device)
            x_noise = normal_x_projector + noise
            x_noise = torch.concat((x_projector, x_noise))
            # else:
            #     x_noise = x_projector
            x_anomaly = self.discriminator(x_noise)
        else:
            x_classification = self.classifier(x_hidden_state)
            x_projector = self.projection(x_hidden)
            x_anomaly = self.discriminator(x_projector)
        return x_anomaly, x_classification

    def save(self, file_path):
        # Custom logic before saving
        print(f"Saving model to {file_path}")

        # Save each sub-module's state dict (including the trainable projector) plus some example metadata
        torch.save({
            'model_state_dict_backbone': self.backbone.state_dict(),
            'model_state_dict_discriminator': self.discriminator.state_dict(),
            'model_state_dict_projection': self.projection.state_dict(),
            'model_state_dict_projector': self.projector.state_dict(),
            'model_state_dict_classifier': self.classifier.state_dict(),
            'custom_metadata': {
                'info': 'This is a custom saved model',
                'epoch': 10,
                'loss': 0.1234
            }
        }, file_path)

        # Custom logic after saving
        print("Model saved successfully!")

    def load(self, file_path):
        # Custom logic before loading
        print(f"Loading model from {file_path}")

        # Restore each sub-module's state dict from a checkpoint written by save()
        temp_model = torch.load(file_path, map_location=self.device)
        self.backbone.load_state_dict(temp_model['model_state_dict_backbone'])
        self.discriminator.load_state_dict(temp_model['model_state_dict_discriminator'])
        self.projection.load_state_dict(temp_model['model_state_dict_projection'])
        # Older checkpoints were saved without the projector
        if 'model_state_dict_projector' in temp_model:
            self.projector.load_state_dict(temp_model['model_state_dict_projector'])
        self.classifier.load_state_dict(temp_model['model_state_dict_classifier'])
        # Custom logic after loading
        print("Model loaded successfully!")

class Wave_Network_Classification(nn.Module):
    def __init__(self, num_classes=12):
        super().__init__()

        self.projector = nn.Sequential(
            # nn.Dropout(0.3),
            # nn.Conv1d(49, 1, 3, padding='same'),
            nn.LazyLinear(256)
        )

        self.squeeze_exhibition_in = nn.LazyLinear(16)
        self.relu = nn.ReLU()
        self.squeeze_exhibition_out = nn.LazyLinear(49)
        self.sigmoid = nn.Sigmoid()

        self.classifier = nn.Sequential(
            # nn.Dropout(0.3),
            nn.LazyLinear(num_classes)
        )

    def forward(self, x, nomaly_label=None, is_train=True):
        if nomaly_label is not None:
            input_x = x[nomaly_label==1]
        else:
            input_x = x
        x_hidden_state = self.projector(input_x) #[n, 49, 256]
        # x_hidden_state = torch.squeeze(x_hidden_state)
        # x_hidden_state_mean = x_hidden_state.mean(dim=1)
        x_hidden_state_mean = x_hidden_state.mean(dim=-1) # [n, 49]
        x_hidden_state_mean = torch.unsqueeze(x_hidden_state_mean, 1) # [n, 1, 49]
        # x_hidden_state_transpose = torch.transpose(x_hidden_state_mean, -1, -2) # [n, 1, 49]
        # Squeeze-and-excitation-style gating: 49 -> 16 -> 49, then a sigmoid weight in (0, 1) per frame
        x_squeeze_exhibition = self.squeeze_exhibition_in(x_hidden_state_mean)
        x_squeeze_exhibition = self.relu(x_squeeze_exhibition)
        x_squeeze_exhibition = self.squeeze_exhibition_out(x_squeeze_exhibition)
        x_squeeze_exhibition = self.sigmoid(x_squeeze_exhibition) # [n, 1, 49]

        # Rescale each frame: [n, 256, 49] * [n, 1, 49] broadcasts over the 256 features
        x_hidden_state = torch.transpose(x_hidden_state, -1, -2)
        x_hidden_state = torch.mul(x_hidden_state, x_squeeze_exhibition)
        x_hidden_state = torch.transpose(x_hidden_state, -1, -2) # [n, 49, 256]

        x_hidden_state = x_hidden_state.mean(dim=1) # [n, 256]

        if is_train:
            normal_hidden_state = x_hidden_state
            x_classification = self.classifier(normal_hidden_state)
        else:
            x_classification = self.classifier(x_hidden_state)
        return x_classification

class Wave_Network_Anomaly_Detection(nn.Module):
    def __init__(self, std=0.05):
        super().__init__()
        self.discriminator = Discriminator(768)
        self.projection = Projection(49, 768, 5)
        self.std = std

    def forward(self, x, nomaly_label=None, is_train=True):
        # x_reshape = x.reshape(x.shape[0],-1)
        # x_hidden= F.adaptive_avg_pool1d(x_reshape, 768)
        # x_hidden = x.mean(dim=1)
        x_hidden = x
        if is_train:
            x_projector = self.projection(x_hidden)
            # add noise
            # if is_train:
            normal_x_projector = x_projector[nomaly_label==1]
            # Gaussian noise N(0, std^2) created on the same device as the projections
            noise = torch.randn_like(normal_x_projector) * self.std
            x_noise = normal_x_projector + noise
            x_noise = torch.concat((x_projector, x_noise))
            # else:
            #     x_noise = x_projector
            x_anomaly = self.discriminator(x_noise)
        else:
            x_projector = self.projection(x_hidden)
            x_anomaly = self.discriminator(x_projector)
        return x_anomaly
# _network =  Wav2Vec2Model.from_pretrained(pretrained_model_name_or_path='./model_1')
# print(_network)
# for name, param in _network.named_parameters():
#     print(f"layer: {name} | Shape: {param.shape}")

# model = AutoModelForAudioClassification.from_pretrained(
#         pretrained_model_name_or_path='./model_1', num_labels=10, ignore_mismatched_sizes=True)
# for name, param in model.named_parameters():
#     print(f"layer: {name} | Shape: {param.shape}")
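When `hidden` is left as `None`, `Discriminator` shrinks the feature width by a factor of 1.5 in each of its `n_layers - 1` blocks before the single-logit `tail`. The hypothetical helper `discriminator_widths` below replays only that width arithmetic; it builds no layers. It is a quick way to check what `Discriminator(768)`, as used by `Wave_Network_Anomaly_Detection`, actually contains.

```python
from typing import List, Optional


def discriminator_widths(in_planes: int, n_layers: int = 3, hidden: Optional[int] = None) -> List[int]:
    """Feature widths through Discriminator: input, each hidden block, then the 1-unit logit."""
    widths = [in_planes]
    _hidden = in_planes if hidden is None else hidden
    for _ in range(n_layers - 1):
        # Each block is Linear -> BatchNorm1d -> LeakyReLU; width shrinks 1.5x unless `hidden` fixes it
        _hidden = int(_hidden // 1.5) if hidden is None else hidden
        widths.append(_hidden)
    widths.append(1)  # mirrors self.tail = Linear(_hidden, 1)
    return widths


print(discriminator_widths(768))  # [768, 512, 341, 1]
```

So the default discriminator maps 768 -> 512 -> 341 -> 1. Passing `hidden=256` instead keeps every block at 256 units.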

In [None]:
from torch import nn
from transformers import Wav2Vec2Model, AutoFeatureExtractor, get_scheduler
from datasets import Dataset, DatasetDict, Audio, concatenate_datasets
from model.pure_model import Wave_Network_Classification, Wave_Network_Anomaly_Detection
from torch.utils.data import DataLoader
import evaluate
accuracy = evaluate.load("accuracy")
import numpy as np
import torch
from torch.nn.utils.rnn import pad_sequence
from datasets import load_from_disk, Audio, VerificationMode, load_dataset
import matplotlib.pyplot as plt
from torchmetrics import F1Score, Precision, Recall, Accuracy
from tqdm.auto import tqdm
from data.audio_augmentation import random_augementation

dict_label = {'yes':0,
              'no':1,
              'up':2,
              'down':3,
              'left':4,
              'right':5,
              'on':6,
              'off':7,
              'stop':8,
              'go':9,
              'unknown':10,
              'silence':11}

def compute_metrics(eval_pred):
    predictions = np.argmax(eval_pred.predictions, axis=1)
    return accuracy.compute(predictions=predictions, references=eval_pred.label_ids)

def preprocess_function(examples):
    audio_arrays = [x['array'] for x in examples["audio"]]
    inputs = feature_extractor(
        audio_arrays, sampling_rate=feature_extractor.sampling_rate, padding="max_length", max_length=16_000, truncation=True,
    )
    # inputs['input_values'] = model_backbone(torch.from_numpy(np.array(inputs['input_values'])).cuda()).last_hidden_state.cpu()
    return inputs

def agumentation_function(examples, i):
    examples["audio"]['array'] = random_augementation(examples["audio"]['array'], i)

    return examples

def edit_label_2(seq):
    if seq['label'] == 11:
        seq['nomaly_label'] = 0
    else:
        seq['nomaly_label'] = 1
    return seq

def collate_fn(batch):
    return_batch = {}
    # Find the max length of sequences in the batch
    max_len = max([len(x['input_values']) for x in batch])

    # Pad sequences with zeros to the max length
    for x in batch:
        x['input_values'] = torch.cat([x['input_values'], torch.zeros(max_len - len(x['input_values']))])

    # Stack the padded sequences and labels into batch tensors
    return_batch['input_values'] = torch.stack([x['input_values'] for x in batch])
    return_batch['label'] = torch.stack([x['label'] for x in batch])
    # The dataset column is 'nomaly_label' (added by edit_label_2)
    return_batch['nomaly_label'] = torch.stack([x['nomaly_label'] for x in batch])
    return return_batch

if __name__ == "__main__":
    model_path = './model_1'
    weight_cls_state_dict = r'D:\1.Project\3.Machine_Learning\Voice\W2Vec\models\w2vec_model_cls_final - best.pth'
    weight_anomaly_state_dict = r'D:\1.Project\3.Machine_Learning\Voice\W2Vec\models\w2vec_model_anomaly_final - best.pth'
    epoch_num = 50
    batch_size = 32
    device = 'cuda'

    label2id = {'yes': '0', 'no': '1', 'up': '2', 'down': '3', 'left': '4', 'right': '5', 'on': '6', 'off': '7', 'stop': '8', 'go': '9', 'unknown': '10', 'silence': '11'}
    id2label = {'0':'yes', '1':'no', '2':'up', '3':'down', '4':'left', '5':'right', '6':'on', '7':'off', '8':'stop', '9':'go', '10':'unknown', '11':'silence'}

    feature_extractor = AutoFeatureExtractor.from_pretrained(model_path)
    model_backbone = Wav2Vec2Model.from_pretrained(model_path).to(device=device)
    model_backbone.post_init()

    for param in model_backbone.parameters():
        param.requires_grad = False
    model_backbone.feature_extractor._freeze_parameters()
    # train_data = load_from_disk('./anomaly/data/dataset/train')
    # valid_data = load_from_disk('./anomaly/data/dataset/validation')
    # test_dataset = load_from_disk('./anomaly/data/dataset/test')
    # dataset = load_dataset("superb", "ks", trust_remote_code=True, verification_mode=VerificationMode.NO_CHECKS)
    dataset = load_from_disk(r'D:\1.Project\3.Machine_Learning\Voice\anomaly\data\dataset\superbs')
    train_data = load_from_disk(r'D:\1.Project\3.Machine_Learning\Voice\anomaly\data\dataset\superbs\train_updated_augmentation_fake')
    valid_data = dataset['validation']
    test_dataset = dataset['test']

    train_data = train_data.map(preprocess_function, remove_columns='audio', batched=True)
    valid_data = valid_data.map(preprocess_function, remove_columns='audio', batched=True)
    train_data = train_data.map(edit_label_2)
    valid_data = valid_data.map(edit_label_2)

    train_dataloader = DataLoader(train_data.with_format('torch'), batch_size, shuffle=True)
    test_dataloader = DataLoader(valid_data.with_format('torch'), batch_size)

    num_class = len(label2id) - 1
    model_cls = Wave_Network_Classification(num_classes=num_class).to(device=device)
    model_anomaly = Wave_Network_Anomaly_Detection().to(device=device)
    model_cls.load_state_dict(torch.load(weight_cls_state_dict))
    model_anomaly.load_state_dict(torch.load(weight_anomaly_state_dict))

    criterion_1 = nn.CrossEntropyLoss()
    criterion_2 = nn.BCEWithLogitsLoss()
    optim_cls = torch.optim.AdamW(model_cls.parameters(), lr=0.001)
    optim_anomaly = torch.optim.AdamW(model_anomaly.parameters(), lr=0.001)
    acc_score_classification = Accuracy('multiclass', num_classes=num_class).to(device='cuda')
    acc_score_anomaly = Accuracy('binary', num_classes=2).to(device='cuda')

    num_training_steps = epoch_num * len(train_dataloader)
    lr_scheduler_cls = get_scheduler(
        name="linear", optimizer=optim_cls, num_warmup_steps=0, num_training_steps=num_training_steps
    )
    lr_scheduler_anomaly = get_scheduler(
        name="linear", optimizer=optim_anomaly, num_warmup_steps=0, num_training_steps=num_training_steps
    )
    progress_bar = tqdm(range(num_training_steps), position=0, leave=True)

    for epoch in range(epoch_num):
        loss_item = 0.
        valid_loss = 0.
        running_loss = 0
        save_loss = 999999
        loss_item_1 = 0.
        loss_item_2 = 0.
        model_cls.train()
        model_anomaly.train()
        acc_score_classification.reset()
        acc_score_anomaly.reset()
        for i, data in enumerate(train_dataloader):
            optim_cls.zero_grad()
            optim_anomaly.zero_grad()
            with torch.no_grad():
                # data['input_values'] = model_backbone(data['input_values'].cuda()).extract_features
                data['input_values'] = model_backbone(data['input_values'].cuda()).last_hidden_state
            inputs, labels = data['input_values'].cuda(), data['nomaly_label'].cuda()
            normal_inputs = data['input_values'][data['nomaly_label']==1].cuda()
            anormal_inputs = data['input_values'][data['nomaly_label']==0].cuda()
            normal_labels = data['label'][data['nomaly_label']==1].cuda()

            if len(normal_inputs) == 0: continue

            # BCEWithLogitsLoss expects float targets
            original_project_label = torch.unsqueeze(data['nomaly_label'], dim=-1).float()
            false_project_label = torch.zeros(size=(normal_inputs.shape[0], 1))
            projection_labels = torch.concat((original_project_label, false_project_label)).cuda()
            o_classification = model_cls(inputs, labels)
            o_nomaly = model_anomaly(inputs, labels)
            loss_1 = criterion_1(o_classification, normal_labels)
            loss_2 = criterion_2(o_nomaly, projection_labels)
            loss = loss_1 + loss_2
            f1_score_cls_metric = acc_score_classification(o_classification, normal_labels)
            acc_score_anomaly_metric = acc_score_anomaly(o_nomaly, projection_labels)

            loss_1.backward()
            optim_cls.step()
            lr_scheduler_cls.step()

            loss_2.backward()
            optim_anomaly.step()
            lr_scheduler_anomaly.step()

            loss_item_1 += loss_1.item()
            loss_item_2 += loss_2.item()
            progress_bar.update(1)
            if i % 500 == 499:  # Print every 500 batches
                tqdm.write(f'[Epoch {epoch + 1}, Batch {i + 1}] loss_cls: {loss_item_1/500:.3f} | loss_anomaly: {loss_item_2/500:.3f} | acc_cls_score: {acc_score_classification.compute():.3f} | acc_anomaly_score: {acc_score_anomaly.compute():.3f}')
                loss_item_1 = 0.0
                loss_item_2 = 0.0

        model_cls.eval()
        model_anomaly.eval()
        acc_score_classification.reset()
        acc_score_anomaly.reset()
        with torch.no_grad():
            for i, data in enumerate(test_dataloader):
                data['input_values'] = model_backbone(data['input_values'].cuda()).last_hidden_state
                # data['input_values'] = model_backbone(data['input_values'].cuda()).extract_features
                inputs, labels = data['input_values'].cuda(), data['nomaly_label'].cuda()
                normal_inputs = data['input_values'][data['nomaly_label']==1].cuda()
                anormal_inputs = data['input_values'][data['nomaly_label']==0].cuda()
                normal_labels = data['label'][data['nomaly_label']==1].cuda()
                if len(normal_inputs) == 0: continue
                # true_project_label = torch.ones(size=(normal_inputs.shape[0], 1))
                original_project_label = torch.unsqueeze(data['nomaly_label'], dim=-1).cuda().float()
                false_project_label = torch.zeros(size=(normal_inputs.shape[0], 1))
                # projection_labels = torch.concat((original_project_label, false_project_label)).cuda()
                o_classification = model_cls(normal_inputs, is_train=False)
                o_nomaly = model_anomaly(inputs, labels, is_train=False)
                loss_1 = criterion_1(o_classification, normal_labels)
                loss_2 = criterion_2(o_nomaly, original_project_label)
                loss = loss_1 + loss_2
                valid_loss += loss.item()
                f1_score_cls_metric = acc_score_classification(o_classification, normal_labels)
                acc_score_anomaly_metric = acc_score_anomaly(o_nomaly, original_project_label)
            valid_loss = valid_loss/len(test_dataloader)
            # Keep the checkpoint with the lowest validation loss seen so far (tracked across epochs)
            if epoch == 0 or valid_loss < best_valid_loss:
                torch.save(model_cls.state_dict(), './w2vec/models/w2vec_model_cls_best.pth')
                torch.save(model_anomaly.state_dict(), './w2vec/models/w2vec_model_anomaly_best.pth')
                best_valid_loss = valid_loss
            tqdm.write(f"Epoch [{epoch+1}/{epoch_num}], Valid Loss: {valid_loss:.3f} | acc_cls_score: {acc_score_classification.compute():.3f} | acc_anomaly_score: {acc_score_anomaly.compute():.3f}")

    torch.save(model_cls.state_dict(), f'./w2vec/models/w2vec_model_cls_final.pth')
    torch.save(model_anomaly.state_dict(), f'./w2vec/models/w2vec_model_anomaly_final.pth')
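During training, `Wave_Network_Anomaly_Detection.forward` concatenates two sets of projections:

1. The projections of the whole batch.
2. Gaussian-noise-perturbed copies of the *normal* projections.

The loop builds `projection_labels` in the same order, so the noisy copies act as synthetic anomalies for the discriminator. The hypothetical helper below sketches only that target layout in plain Python, without tensors. It assumes `nomaly_label` uses 1 for normal and 0 for anomalous clips, as set by `edit_label_2`.

```python
from typing import List


def discriminator_targets(nomaly_labels: List[int]) -> List[float]:
    """Targets for the concatenated discriminator batch.

    First block: every real sample keeps its own label (1 = normal, 0 = anomaly).
    Second block: one noise-perturbed copy per normal sample, labelled 0 (synthetic anomaly).
    """
    real = [float(y) for y in nomaly_labels]
    synthetic = [0.0 for y in nomaly_labels if y == 1]
    return real + synthetic


print(discriminator_targets([1, 0, 1, 1]))  # [1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0]
```

Because the noisy copies are appended after the real samples, the discriminator output and the targets line up only if both concatenations use the same order.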


In [None]:
from torch import nn
from transformers import AutoModelForAudioClassification, TrainingArguments, Trainer
from transformers import AutoFeatureExtractor, Wav2Vec2Model
from datasets import Dataset, DatasetDict, Audio
from model.pure_model import Wave_Network
from torch.utils.data import DataLoader
import evaluate
accuracy = evaluate.load("accuracy")
import numpy as np
import torch
from torch.nn.utils.rnn import pad_sequence
from datasets import load_from_disk, Audio
import matplotlib.pyplot as plt
from model.pure_model import Wave_Network_Classification, Wave_Network_Anomaly_Detection

dict_label = {'yes':0,
              'no':1,
              'up':2,
              'down':3,
              'left':4,
              'right':5,
              'on':6,
              'off':7,
              'stop':8,
              'go':9,
              'silence':10,
              'unknown':11}

def compute_metrics(eval_pred):
    predictions = np.argmax(eval_pred.predictions, axis=1)
    return accuracy.compute(predictions=predictions, references=eval_pred.label_ids)

def preprocess_function(examples):
    audio_arrays = [x['array'] for x in examples["audio"]]
    inputs = feature_extractor(
        audio_arrays, sampling_rate=feature_extractor.sampling_rate, padding="max_length", max_length=16_000, truncation=True,
    )
    # inputs['input_values'] = model_backbone(torch.from_numpy(np.array(inputs['input_values'])).cuda()).last_hidden_state.cpu()
    return inputs

def collate_fn(batch):
    return_batch = {}
    # Find the max length of sequences in the batch
    max_len = max([len(x['input_values']) for x in batch])

    # Pad sequences to the max length
    for x in batch:
        x['input_values'] = torch.cat([x['input_values'], torch.zeros(max_len - len(x['input_values']))])

    return_batch['input_values'] = torch.stack([x['input_values'] for x in batch])
    return_batch['label'] = torch.stack([x['label'] for x in batch])
    return_batch['nomaly_label'] = torch.stack([x['nomaly_label'] for x in batch])
    # return_batch['re_label'] = torch.stack([x['re_label'] for x in batch])
    return return_batch

def edit_label_2(seq):
    if seq['label'] == 11:
        seq['nomaly_label'] = 0
    else:
        seq['nomaly_label'] = 1
    return seq

if __name__ == "__main__":
    model_path = './model_1'
    device = 'cuda'
    weight_state_dict_cls = './w2vec/models/w2vec_model_cls_best.pth'
    weight_state_dict_anomaly = './w2vec/models/w2vec_model_anomaly_best.pth'
    feature_extractor = AutoFeatureExtractor.from_pretrained(model_path)
    model_backbone = Wav2Vec2Model.from_pretrained(model_path).to(device=device)

    model_cls = Wave_Network_Classification(num_classes=11).to(device=device)
    model_anomaly = Wave_Network_Anomaly_Detection().to(device=device)
    epoch_num = 30
    batch_size = 1

    label2id = {'yes': '0', 'no': '1', 'up': '2', 'down': '3', 'left': '4', 'right': '5', 'on': '6', 'off': '7', 'stop': '8', 'go': '9', 'silence': '10', 'unknown': '11'}
    id2label = {'0':'yes', '1':'no', '2':'up', '3':'down', '4':'left', '5':'right', '6':'on', '7':'off', '8':'stop', '9':'go', '10':'silence', '11':'unknown'}

    dataset = load_from_disk(r'D:\1.Project\3.Machine_Learning\Voice\anomaly\data\dataset\superbs')
    test_dataset = dataset['test']

    test_dataset = test_dataset.map(preprocess_function, remove_columns='audio', batched=True)
    test_dataset = test_dataset.map(edit_label_2)
    print(test_dataset)
    test_dataset_loader = DataLoader(test_dataset.with_format('torch'), batch_size, shuffle=True)
    num_class = len(label2id) - 1
    model_cls.load_state_dict(torch.load(weight_state_dict_cls))
    model_anomaly.load_state_dict(torch.load(weight_state_dict_anomaly))

    model_cls.eval()
    model_anomaly.eval()
    acc = 0
    with torch.no_grad():
        for i, data in enumerate(test_dataset_loader):
            # Extract wav2vec2 features once; both heads consume the same hidden states.
            inputs = model_backbone(data['input_values'].to(device)).last_hidden_state
            o_classification = model_cls(inputs, is_train=False)
            o_nomaly = model_anomaly(inputs, is_train=False)
            # .item() below assumes batch_size == 1.
            # The anomaly head returns a logit: sigmoid > 0.5 means "known keyword" (nomaly_label == 1).
            anomaly_prob = torch.sigmoid(o_nomaly).item()
            cls_pred = torch.argmax(o_classification, dim=-1).item()
            true_nomaly = data['nomaly_label'].item()
            true_label = data['label'].item()
            if anomaly_prob > 0.5 and true_nomaly == 1:
                # Keyword accepted: the classifier must also predict the right word.
                if true_label != 11 and cls_pred == true_label:
                    acc += 1
                else:
                    print(anomaly_prob, true_nomaly, cls_pred, true_label)
            elif anomaly_prob <= 0.5 and true_nomaly == 0:
                # 'unknown' clip (label 11) correctly rejected.
                acc += 1
            else:
                print(anomaly_prob, true_nomaly, cls_pred, true_label)
        print(f'acc: {acc/len(test_dataset)}')
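The evaluation loop above scores each clip with a two-stage rule: the anomaly head first decides whether the clip is a known keyword (`nomaly_label == 1`) or `unknown` (label 11), and only accepted clips are then checked against the classifier's prediction. As a sketch, here is the same rule written as pure functions so it can be tested without a model; `is_correct` and `two_stage_accuracy` are hypothetical names, and the 0.5 threshold and unknown id 11 are taken from the loop.

```python
def is_correct(anomaly_prob: float, cls_pred: int, true_label: int,
               unknown_id: int = 11, threshold: float = 0.5) -> bool:
    """Two-stage scoring rule mirroring the evaluation loop (hypothetical helper)."""
    is_known = true_label != unknown_id          # nomaly_label == 1
    predicted_known = anomaly_prob > threshold   # sigmoid(o_nomaly) > 0.5
    if predicted_known and is_known:
        # Accepted as a keyword: the classifier must also name the right word.
        return cls_pred == true_label
    # Otherwise the clip counts only if an 'unknown' clip was rejected.
    return (not predicted_known) and (not is_known)


def two_stage_accuracy(samples):
    """samples: iterable of (anomaly_prob, cls_pred, true_label) tuples."""
    samples = list(samples)
    hits = sum(is_correct(p, c, y) for p, c, y in samples)
    return hits / len(samples)


print(two_stage_accuracy([(0.9, 3, 3), (0.9, 2, 3), (0.1, 0, 11), (0.8, 5, 11)]))  # 0.5
```

Note that a known keyword rejected by the anomaly head counts as an error even if the classifier would have named it correctly, exactly as in the loop.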

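The collate function at the start of this cell right-pads each `input_values` waveform with zeros up to `max_len` (the `torch.cat` with `torch.zeros` step) before stacking. A minimal numpy sketch of the same padding, assuming 1-D waveforms; `pad_and_stack` is a hypothetical name, and it additionally returns a 0/1 mask marking real samples, which the original collate does not build.

```python
import numpy as np


def pad_and_stack(waveforms):
    """Right-pad 1-D waveforms with zeros to the batch maximum and stack them.

    Returns (batch, mask): batch has shape (n, max_len); mask is 1 for real
    samples and 0 for padding.
    """
    max_len = max(len(w) for w in waveforms)
    batch = np.zeros((len(waveforms), max_len), dtype=np.float32)
    mask = np.zeros((len(waveforms), max_len), dtype=np.int64)
    for i, w in enumerate(waveforms):
        batch[i, :len(w)] = w
        mask[i, :len(w)] = 1
    return batch, mask


batch, mask = pad_and_stack([np.ones(3), np.ones(5)])
print(batch.shape)    # (2, 5)
print(mask.tolist())  # [[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]]
```

`Wav2Vec2Model` accepts an `attention_mask`, but the Hugging Face documentation says it should only be passed when the processor's `return_attention_mask` is `True`; for checkpoints such as `wav2vec2-base`, inputs should simply be zero-padded, which is what the collate does.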

In [None]:
from datasets import load_from_disk, concatenate_datasets, Dataset, Audio, ClassLabel
import numpy as np
import scipy.io.wavfile as wav
import random
import soundfile as sf
from typing import Dict, Optional, Union
import scipy, acoustics, pandas
import matplotlib.pyplot as plt
from audio_augmentation import random_augementation
import librosa

# Same order as the ClassLabel names used below: 10 = silence, 11 = unknown
LABEL_IDS = {'yes': '0', 'no': '1', 'up': '2', 'down': '3', 'left': '4', 'right': '5', 'on': '6', 'off': '7', 'stop': '8', 'go': '9', 'silence': '10', 'unknown': '11'}

NOISE_AUDIO_PATH_ARRAYS = [
    r'D:\1.Project\3.Machine_Learning\Voice\anomaly\data\SpeechCommands\speech_commands_v0.01\_background_noise_\doing_the_dishes.wav',
    r'D:\1.Project\3.Machine_Learning\Voice\anomaly\data\SpeechCommands\speech_commands_v0.01\_background_noise_\dude_miaowing.wav',
    r'D:\1.Project\3.Machine_Learning\Voice\anomaly\data\SpeechCommands\speech_commands_v0.01\_background_noise_\exercise_bike.wav',
    r'D:\1.Project\3.Machine_Learning\Voice\anomaly\data\SpeechCommands\speech_commands_v0.01\_background_noise_\running_tap.wav',
]
class Audio_Implement(Audio):
    def decode_example(self, value: dict, token_per_repo_id: Optional[Dict[str, Union[str, bool, None]]] = None) -> dict:
        return {"path": value["path"], "array": value['array'], "sampling_rate": value['sampling_rate']}

def get_noise_data_sources():
    data_bank = []
    for path in NOISE_AUDIO_PATH_ARRAYS:
        audio_data, sample_rate = sf.read(path)
        data_bank.append((audio_data, sample_rate))
    return data_bank

def random_crop_wav(data_bank, idx, crop_duration=1):
    # Fetch a preloaded recording and its sample rate from the bank
    audio_data, sample_rate = data_bank[idx]

    # Calculate total length of audio in seconds
    total_duration = len(audio_data) / sample_rate

    # Ensure that the crop duration is valid
    if crop_duration > total_duration:
        raise ValueError("Crop duration exceeds the audio length.")

    # Convert crop duration to samples
    crop_samples = int(crop_duration * sample_rate)

    # Determine a random start point
    max_start = len(audio_data) - crop_samples
    start_sample = random.randint(0, max_start)

    # Crop the audio
    cropped_audio = audio_data[start_sample:start_sample + crop_samples]

    # Convert to numpy array (if not already)
    cropped_array = np.array(cropped_audio)

    return cropped_array, sample_rate

def create_random_data(data_count=1800):
    out_data = []
    data_bank = get_noise_data_sources()
    for i in range(data_count):
        # Pick a generation method at random: 0 = crop a background-noise recording, 1 = synthetic noise
        rand_method = random.randint(0, 1)
        if rand_method == 0:
            arr, sample_rate = random_crop_wav(data_bank=data_bank, idx=random.randint(0, len(data_bank) - 1))
        else:
            # 1 s of white or pink noise at 16 kHz
            noise_color = 'white' if random.randint(0, 1) == 1 else 'pink'
            noise = acoustics.generator.noise(16000 * 1, color=noise_color) / 3
            # Clip to [-1, 1] before scaling; otherwise peaks beyond ±1 overflow the int16 range
            pcm = (np.clip(noise, -1.0, 1.0) * 32767).astype(np.int16)
            scipy.io.wavfile.write(f'{noise_color}_noise.wav', 16000, pcm)
            arr, sample_rate = sf.read(f'{noise_color}_noise.wav')
        out_data.append((arr, sample_rate))
        print(f'{i} create file..', arr)
    return out_data

def add_data_to_dataset(input_dataset, input_data, label=10):
    # Append (array, sampling_rate) pairs to a copy of the dataset; label 10 is '_silence_'
    input_dict = input_dataset.to_dict()
    for i, data in enumerate(input_data):
        input_dict['file'].append(f'added_audio_file_{i}')
        input_dict['audio'].append({
                    'path': f'added_audio_file_{i}',
                    'array': data[0],
                    'sampling_rate': data[1]
                })
        input_dict['label'].append(label)

    output_dataset = Dataset.from_dict(input_dict)
    return output_dataset

def agumentation_function(examples, i):
    if examples['label'] != 11:
        examples["audio"]['array'] = random_augementation(examples["audio"]['array'], i)
    return examples

import os
def _prepare_file_list(root_dir):
    file_paths = []
    labels = []

    for root, _, files in os.walk(root_dir):
        random.shuffle(files)
        for file in files:
            if file.endswith(".wav"):
                file_paths.append(os.path.join(root, file))
                # The label is the first three characters after the first '_' in the file name
                labels.append(int(file.split('_')[1][:3]))
    return file_paths, labels

def create_new_dataset(input_data, label=10, is_create_dataset_from_folder=False, root_path=None):
    input_dict = {
        'file': [],
        'audio': [],
        'label': []
    }
    for i, data in enumerate(input_data):
        input_dict['file'].append(f'added_audio_file_{i}')
        input_dict['audio'].append({
                    'path': f'added_audio_file_{i}',
                    'array': data[0],
                    'sampling_rate': data[1]
                })
        input_dict['label'].append(label)
        print(i, data[0], data[1])

    if is_create_dataset_from_folder and root_path:
        file_paths, labels = _prepare_file_list(root_path)
        for i, data in enumerate(list(zip(file_paths, labels))):
            audio_arr, sr = sf.read(data[0])
            audio_arr = librosa.resample(audio_arr, orig_sr=sr, target_sr=16_000)
            input_dict['file'].append(f'added_fake_audio_file_{i}')
            input_dict['audio'].append({
                    'path': f'added_fake_audio_file_{i}',
                    'array': audio_arr,
                    'sampling_rate': 16_000
                })
            input_dict['label'].append(data[1])
            print(i, data[0], data[1])
    output_dataset = Dataset.from_dict(input_dict)
    return output_dataset


path = r'D:\1.Project\3.Machine_Learning\Voice\anomaly\data\dataset\superbs\fake_data'
# _prepare_file_list(path)

dataset = load_from_disk(r'D:\1.Project\3.Machine_Learning\Voice\anomaly\data\dataset\superbs\train_updated_augmentation')
# train_dataset = dataset['train']

data = create_random_data(data_count=700)
new_dataset = create_new_dataset(data, is_create_dataset_from_folder=True, root_path=path)
new_dataset = new_dataset.cast_column('audio', Audio(sampling_rate=16_000))
new_dataset = new_dataset.cast_column('label', ClassLabel(names=['yes', 'no', 'up', 'down', 'left', 'right', 'on', 'off', 'stop', 'go', '_silence_', '_unknown_']))
print(new_dataset)
# train_dataset = concatenate_datasets([train_dataset, new_dataset], info=train_dataset.info)

dataset = concatenate_datasets([dataset, new_dataset])
for i in range(6):
    print(f'Round: {i}')
    augementation_train_data = new_dataset.filter(lambda x: x['label'] != 11).map(agumentation_function, fn_kwargs={'i': i})
    dataset = concatenate_datasets([dataset, augementation_train_data])
print(dataset)


# Count examples per label; reading only the 'label' column avoids decoding every audio array
label_counts = pandas.Series(dataset['label']).value_counts()
# Show the result
print(label_counts)
dataset.save_to_disk(r'D:\1.Project\3.Machine_Learning\Voice\anomaly\data\dataset\superbs\train_updated_augmentation_fake')
# print('done')
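`random_crop_wav` picks a uniformly random start so that a `crop_duration`-second window fits inside the recording (`random.randint` includes both endpoints). The sketch below does the same with numpy's seedable `Generator`, which makes crops reproducible; `random_crop` and the seed value are illustrative assumptions, not part of the original pipeline.

```python
import numpy as np


def random_crop(audio, sample_rate, crop_duration=1.0, rng=None):
    """Return a random contiguous crop of `crop_duration` seconds."""
    rng = np.random.default_rng() if rng is None else rng
    crop_samples = int(crop_duration * sample_rate)
    if crop_samples > len(audio):
        raise ValueError("Crop duration exceeds the audio length.")
    # integers() excludes `high`, so +1 lets the crop end exactly at the last sample
    start = int(rng.integers(0, len(audio) - crop_samples + 1))
    return audio[start:start + crop_samples]


audio = np.arange(48_000, dtype=np.float32)  # 3 s at 16 kHz
crop = random_crop(audio, 16_000, rng=np.random.default_rng(0))
print(len(crop))  # 16000
```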

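`create_random_data` relies on `acoustics.generator.noise` for white and pink noise. If that package is unavailable, pink (1/f power) noise can be approximated by shaping white noise in the frequency domain. This is a swapped-in technique, not the `acoustics` implementation, and `pink_noise` is a hypothetical helper. Like the generator above, it divides by 3 for headroom and clips to [-1, 1] so the result can be written safely as 16-bit PCM.

```python
import numpy as np


def pink_noise(n_samples, rng=None):
    """Approximate pink noise by scaling a white-noise spectrum by 1/sqrt(f)."""
    rng = np.random.default_rng() if rng is None else rng
    spectrum = np.fft.rfft(rng.standard_normal(n_samples))
    freqs = np.fft.rfftfreq(n_samples)
    scale = np.ones_like(freqs)
    scale[1:] = 1.0 / np.sqrt(freqs[1:])  # power ∝ 1/f, so amplitude ∝ 1/sqrt(f)
    scale[0] = 0.0                        # drop the DC component
    noise = np.fft.irfft(spectrum * scale, n=n_samples)
    noise /= np.sqrt(np.mean(noise ** 2))  # normalize to unit RMS
    return np.clip(noise / 3, -1.0, 1.0)   # same /3 headroom as create_random_data


x = pink_noise(16_000, rng=np.random.default_rng(0))
pcm = (x * 32767).astype(np.int16)  # safe: |x| <= 1 after clipping
```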
In [None]:
import librosa
import numpy as np
import torchaudio.transforms as T
from scipy.signal import butter, lfilter
import random

def time_stretch(audio, rate=1.2):
    return librosa.effects.time_stretch(audio, rate=rate)

def pitch_shift(audio, sr, n_steps=2):
    return librosa.effects.pitch_shift(audio, sr=sr, n_steps=n_steps)

def add_noise(audio, noise_factor=0.005):
    noise = np.random.randn(len(audio))
    augmented_audio = audio + noise_factor * noise
    return augmented_audio
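
# Sketch (hypothetical helper, not called by random_augementation): add_noise uses an
# absolute noise_factor, so the resulting signal-to-noise ratio depends on how loud
# each clip is. Scaling the noise to a target SNR in dB, where
# SNR_dB = 10 * log10(P_signal / P_noise), keeps the corruption level consistent.
def add_noise_snr(audio, snr_db=20.0, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal(len(audio))
    signal_power = np.mean(audio ** 2)
    noise_power = np.mean(noise ** 2)
    # Gain chosen so that signal_power / (gain**2 * noise_power) == 10 ** (snr_db / 10)
    gain = np.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return audio + gain * noise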

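def shift_time(audio, shift_max=0.2, sr=16000):
    shift = np.random.randint(int(sr * shift_max))  # random shift in samples, up to shift_max seconds
    return np.roll(audio, shift)  # circular shift: samples pushed off the end wrap to the start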
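
# Sketch (hypothetical helper, not used by random_augementation): np.roll in shift_time
# is a circular shift, so the end of the clip wraps around to the start. A zero-filled
# shift, which inserts silence instead, is a common alternative for speech clips.
def shift_time_zero_fill(audio, shift):
    shift = max(-len(audio), min(shift, len(audio)))  # clamp to the clip length
    out = np.zeros_like(audio)
    if shift >= 0:
        out[shift:] = audio[:len(audio) - shift]  # delay: silence at the start
    else:
        out[:shift] = audio[-shift:]              # advance: silence at the end
    return out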

def change_volume(audio, factor=1.5):
    return audio * factor

def spec_augment(mel_spectrogram):
    time_mask = T.TimeMasking(time_mask_param=30)
    freq_mask = T.FrequencyMasking(freq_mask_param=30)
    augmented_spec = time_mask(mel_spectrogram)
    augmented_spec = freq_mask(augmented_spec)
    return augmented_spec

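def reverb(audio, sr=16000):
    # Note: despite the name, this is not a reverb. librosa has no reverb effect;
    # preemphasis is a first-order high-pass filter, y[n] = x[n] - coef * x[n-1].
    coef = 0.5
    return librosa.effects.preemphasis(audio, coef=coef)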
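
# Sketch (hypothetical helper, not wired into random_augementation): a simple convolution
# reverb. The impulse response is an assumption: synthetic, exponentially decaying white
# noise whose amplitude falls by 60 dB (a factor of 1000) over rt60 seconds. A recorded
# room impulse response could be used instead.
def convolution_reverb(audio, sr=16000, rt60=0.3, wet=0.3, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    t = np.arange(int(rt60 * sr)) / sr
    ir = rng.standard_normal(len(t)) * np.exp(-np.log(1000.0) * t / rt60)
    ir /= np.sqrt(np.sum(ir ** 2))  # unit-energy impulse response
    wet_signal = np.convolve(audio, ir)[:len(audio)]  # keep the original length
    return (1 - wet) * audio + wet * wet_signal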

def butter_lowpass(cutoff, fs, order=5):
    nyq = 0.5 * fs
    normal_cutoff = cutoff / nyq
    b, a = butter(order, normal_cutoff, btype='low', analog=False)
    return b, a

def lowpass_filter(audio, cutoff, sr):
    b, a = butter_lowpass(cutoff, sr, order=6)
    y = lfilter(b, a, audio)
    return y
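
# Sketch (hypothetical helper, not used by random_augementation): the scipy.signal.butter
# docs recommend the second-order-sections form (output='sos') over (b, a) coefficients,
# because converting to (b, a) is numerically sensitive even at moderate orders.
# Passing fs lets the cutoff be given directly in Hz instead of as a fraction of Nyquist.
from scipy.signal import sosfilt

def lowpass_filter_sos(audio, cutoff, sr, order=6):
    sos = butter(order, cutoff, btype='low', fs=sr, output='sos')
    return sosfilt(sos, audio)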

def random_augementation(audio, rand_method=None):
    # rand_method selects the augmentation (0-5); None picks one at random
    if rand_method is None:
        rand_method = random.randint(0, 5)
    if rand_method == 0:
        return time_stretch(audio=audio)
    elif rand_method == 1:
        return pitch_shift(audio=audio, sr=16_000)
    elif rand_method == 2:
        noise_fc = random.uniform(0.001, 0.005)
        return add_noise(audio=audio, noise_factor=noise_fc)
    elif rand_method == 3:
        return change_volume(audio=audio)
    elif rand_method == 4:
        return reverb(audio=audio, sr=16_000)
    elif rand_method == 5:
        return shift_time(audio=audio, sr=16_000)
    raise ValueError(f'rand_method must be 0-5 or None, got {rand_method}')

# audioo = np.zeros((16000,))
# x = time_stretch(audio=audioo, rate=1.2)
# print(x)
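`spec_augment` applies torchaudio's `TimeMasking` and `FrequencyMasking`, each of which zeroes one randomly placed band whose width is bounded by `time_mask_param` / `freq_mask_param`. A numpy sketch of that idea for a single spectrogram shaped (freq, time), without torchaudio; `mask_spectrogram` is a hypothetical name, and it fills masked cells with 0, matching torchaudio's default mask value.

```python
import numpy as np


def mask_spectrogram(spec, max_time=30, max_freq=30, rng=None):
    """Zero one random time band and one random frequency band of a (freq, time) array."""
    rng = np.random.default_rng() if rng is None else rng
    out = spec.copy()  # leave the input untouched
    n_freq, n_time = out.shape
    # Time mask: width in [0, max_time], placed so it fits inside the spectrogram
    t_width = int(rng.integers(0, min(max_time, n_time) + 1))
    t0 = int(rng.integers(0, n_time - t_width + 1))
    out[:, t0:t0 + t_width] = 0.0
    # Frequency mask: same idea along the frequency axis
    f_width = int(rng.integers(0, min(max_freq, n_freq) + 1))
    f0 = int(rng.integers(0, n_freq - f_width + 1))
    out[f0:f0 + f_width, :] = 0.0
    return out


spec = np.ones((80, 101))
masked = mask_spectrogram(spec, rng=np.random.default_rng(0))
```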