<a href="https://colab.research.google.com/github/hoa92ng/Homework/blob/main/Making_the_Most_of_your_Colab_Subscription.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Making the Most of your Colab Subscription



## Faster GPUs

Users who have purchased one of Colab's paid plans have access to faster GPUs and more memory. You can upgrade your notebook's GPU settings in `Runtime > Change runtime type` in the menu to select from several accelerator options, subject to availability.

The free-of-charge version of Colab grants access to Nvidia's T4 GPUs, subject to quota restrictions and availability.

You can see which GPU you've been assigned at any time by executing the following cell. If the cell prints "Not connected to a GPU", go to `Runtime > Change runtime type` in the menu, enable a GPU accelerator, and then re-execute the cell.


In [None]:
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
  print('Not connected to a GPU')
else:
  print(gpu_info)

To use a GPU with your notebook, open the `Runtime > Change runtime type` menu and set the hardware accelerator to the desired option.
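
The GPU check above relies on IPython's `!` shell syntax, so it only works inside a notebook cell. As a minimal sketch for plain Python code, roughly the same check could be done with the standard library. The helper name `get_gpu_info` is hypothetical, and the sketch assumes `nvidia-smi` is on the `PATH` whenever a GPU runtime is attached.

```python
import shutil
import subprocess


def get_gpu_info():
    """Return the output of `nvidia-smi`, or None if no GPU is reachable."""
    # nvidia-smi is usually absent on CPU-only machines, so look it up first.
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        result = subprocess.run(
            ["nvidia-smi"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    # A non-zero exit code means the tool could not talk to a GPU.
    if result.returncode != 0:
        return None
    return result.stdout


gpu_info = get_gpu_info()
print(gpu_info if gpu_info is not None else "Not connected to a GPU")
```

Checking the exit code instead of searching the output for the word "failed" is a design choice of this sketch, not something the original cell does.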

## More memory

Users who have purchased one of Colab's paid plans have access to high-memory VMs when they are available. More powerful GPUs are always offered with high-memory VMs.



You can see how much memory your runtime has at any time by running the following code cell. If the cell prints "Not using a high-RAM runtime", you can enable a high-RAM runtime via `Runtime > Change runtime type` in the menu: select High-RAM in the Runtime shape toggle, then re-execute the code cell.


In [None]:
from psutil import virtual_memory
ram_gb = virtual_memory().total / 1e9
print('Your runtime has {:.1f} gigabytes of total RAM\n'.format(ram_gb))

if ram_gb < 20:
  print('Not using a high-RAM runtime')
else:
  print('You are using a high-RAM runtime!')
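
`virtual_memory()` also reports how much memory is currently free, which can be well below the total. The sketch below separates the threshold check from the measurement so the check can be reused. The function name `describe_runtime` is hypothetical, and the 20 GB cutoff is simply the one used in the cell above.

```python
def describe_runtime(total_bytes, threshold_gb=20):
    """Classify a runtime by total RAM, using decimal gigabytes (1e9 bytes)."""
    ram_gb = total_bytes / 1e9
    kind = "high-RAM" if ram_gb >= threshold_gb else "standard"
    return ram_gb, kind


if __name__ == "__main__":
    try:
        from psutil import virtual_memory  # same library as the cell above
    except ImportError:
        virtual_memory = None  # psutil is not installed; skip the live check

    if virtual_memory is not None:
        mem = virtual_memory()
        ram_gb, kind = describe_runtime(mem.total)
        print(f"Total RAM: {ram_gb:.1f} GB ({kind} runtime)")
        print(f"Currently available: {mem.available / 1e9:.1f} GB")
```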

## Longer runtimes

All Colab runtimes are reset after some period of time (sooner if the runtime isn't executing code). Colab Pro and Pro+ users have access to longer runtimes than those who use Colab free of charge.

## Background execution

Colab Pro+ users have access to background execution, where notebooks will continue executing even after you've closed a browser tab. This is always enabled in Pro+ runtimes as long as you have compute units available.



## Relaxing resource limits in Colab Pro

Your resources are not unlimited in Colab. To make the most of Colab, avoid using resources when you don't need them. For example, only use a GPU when required and close Colab tabs when finished.



If you encounter limitations, you can relax those limitations by purchasing more compute units via Pay As You Go. Anyone can purchase compute units via [Pay As You Go](https://colab.research.google.com/signup); no subscription is required.

## Send us feedback!

If you have any feedback for us, please let us know. The best way to send feedback is by using the Help > 'Send feedback...' menu. If you encounter usage limits in Colab Pro, consider subscribing to Pro+.

If you encounter errors or other issues with billing (payments) for Colab Pro, Pro+, or Pay As You Go, please email [colab-billing@google.com](mailto:colab-billing@google.com).

## More Resources

### Working with Notebooks in Colab
- [Overview of Colab](/notebooks/basic_features_overview.ipynb)
- [Guide to Markdown](/notebooks/markdown_guide.ipynb)
- [Importing libraries and installing dependencies](/notebooks/snippets/importing_libraries.ipynb)
- [Saving and loading notebooks in GitHub](https://colab.research.google.com/github/googlecolab/colabtools/blob/main/notebooks/colab-github-demo.ipynb)
- [Interactive forms](/notebooks/forms.ipynb)
- [Interactive widgets](/notebooks/widgets.ipynb)

<a name="working-with-data"></a>
### Working with Data
- [Loading data: Drive, Sheets, and Google Cloud Storage](/notebooks/io.ipynb)
- [Charts: visualizing data](/notebooks/charts.ipynb)
- [Getting started with BigQuery](/notebooks/bigquery.ipynb)

### Machine Learning Crash Course
These are a few of the notebooks from Google's online Machine Learning course. See the [full course website](https://developers.google.com/machine-learning/crash-course/) for more.
- [Intro to Pandas DataFrame](https://colab.research.google.com/github/google/eng-edu/blob/main/ml/cc/exercises/pandas_dataframe_ultraquick_tutorial.ipynb)
- [Linear regression with tf.keras using synthetic data](https://colab.research.google.com/github/google/eng-edu/blob/main/ml/cc/exercises/linear_regression_with_synthetic_data.ipynb)


<a name="using-accelerated-hardware"></a>
### Using Accelerated Hardware
- [TensorFlow with GPUs](/notebooks/gpu.ipynb)
- [TensorFlow with TPUs](/notebooks/tpu.ipynb)

<a name="machine-learning-examples"></a>

## Machine Learning Examples

To see end-to-end examples of the interactive machine learning analyses that Colab makes possible, check out these tutorials using models from [TensorFlow Hub](https://tfhub.dev).

A few featured examples:

- [Retraining an Image Classifier](https://tensorflow.org/hub/tutorials/tf2_image_retraining): Build a Keras model on top of a pre-trained image classifier to distinguish flowers.
- [Text Classification](https://tensorflow.org/hub/tutorials/tf2_text_classification): Classify IMDB movie reviews as either *positive* or *negative*.
- [Style Transfer](https://tensorflow.org/hub/tutorials/tf2_arbitrary_image_stylization): Use deep learning to transfer style between images.
- [Multilingual Universal Sentence Encoder Q&A](https://tensorflow.org/hub/tutorials/retrieval_with_tf_hub_universal_encoder_qa): Use a machine learning model to answer questions from the SQuAD dataset.
- [Video Interpolation](https://tensorflow.org/hub/tutorials/tweening_conv3d): Predict what happened in a video between the first and the last frame.
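
The training cell that follows fine-tunes an audio classifier on twelve keyword classes and maps between label names and string ids. As a stand-alone sketch of that bookkeeping, the snippet below rebuilds the two lookup tables from the same vocabulary. The helper `to_anomaly_label` is a hypothetical name that mirrors the cell's `edit_label_2` function.

```python
# Same keyword vocabulary as `dict_label` in the training cell.
dict_label = {
    "yes": 0, "no": 1, "up": 2, "down": 3, "left": 4, "right": 5,
    "on": 6, "off": 7, "stop": 8, "go": 9, "unknown": 10, "silence": 11,
}

# Build both directions of the mapping, with string ids as in the cell.
label2id = {label: str(i) for i, label in enumerate(dict_label)}
id2label = {str(i): label for i, label in enumerate(dict_label)}


def to_anomaly_label(label_id):
    """Mirror `edit_label_2`: silence (id 11) maps to 0, any other class to 1."""
    return 0 if label_id == dict_label["silence"] else 1


print(label2id["stop"], id2label["11"], to_anomaly_label(11), to_anomaly_label(3))
# prints: 8 silence 0 1
```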


In [None]:
import evaluate
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from datasets import Audio, load_dataset, load_from_disk
from transformers import (
    AutoFeatureExtractor,
    AutoModelForAudioClassification,
    Trainer,
    TrainingArguments,
)

accuracy = evaluate.load("accuracy")

dict_label = {'yes':0,
              'no':1,
              'up':2,
              'down':3,
              'left':4,
              'right':5,
              'on':6,
              'off':7,
              'stop':8,
              'go':9,
              'unknown':10,
              'silence':11}

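def compute_metrics(eval_pred):
    # Pick the highest-scoring class per example and compare against the labels.
    predictions = np.argmax(eval_pred.predictions, axis=1)
    return accuracy.compute(predictions=predictions, references=eval_pred.label_ids)

def preprocess_function(examples):
    # Turn raw waveforms into model inputs, truncating clips to 16,000 samples
    # (one second if the extractor's sampling rate is 16 kHz).
    audio_arrays = [x["array"] for x in examples["audio"]]
    inputs = feature_extractor(
        audio_arrays,
        sampling_rate=feature_extractor.sampling_rate,
        max_length=16_000,
        truncation=True,
    )
    return inputs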


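def edit_label(examples):
    # Remap labels in a batch: "unknown" words and silence clips get the ids
    # defined in dict_label. Relies on a global id2label built from the
    # original dataset's label names (see the commented-out block below).
    for i in range(len(examples['file'])):
        if examples['is_unknown'][i]:
            examples['label'][i] = dict_label['unknown']
        elif id2label[str(examples['label'][i])] == '_silence_':
            examples['label'][i] = dict_label['silence']
    return examples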

def edit_label_2(seq):
    # Binary anomaly target: silence (label 11) becomes 0, every other class 1.
    if seq['label'] == 11:
        seq['anomaly_label'] = 0
    else:
        seq['anomaly_label'] = 1
    return seq

feature_extractor = AutoFeatureExtractor.from_pretrained('./model_1')
# train_dataset_ = load_dataset("google/speech_commands", 'v0.01', split='train', trust_remote_code=True)
# valid_dataset_ = load_dataset("google/speech_commands", 'v0.01', split='validation', trust_remote_code=True)
# test_dataset_ = load_dataset("google/speech_commands", 'v0.01', split='test', trust_remote_code=True)

# labels = valid_dataset_.features["label"].names
# label2id, id2label = dict(), dict()
# for i, label in enumerate(labels):
#     label2id[label] = str(i)
#     id2label[str(i)] = label


# train_dataset_ = train_dataset_.map(edit_label, batched=True)
# valid_dataset_ = valid_dataset_.map(edit_label, batched=True)
# test_dataset_ = test_dataset_.map(edit_label, batched=True)

# df = train_dataset_.to_pandas()
# # Assume the label column in the dataset is named 'label'
# label_counts = df['label'].value_counts()

# # Show the result
# print(label_counts)

# Show the plot
# label_counts.plot(kind='bar')
# plt.xlabel('Class')
# plt.ylabel('Count')
# plt.title('Number of samples per class in the label column')
# plt.show()

dataset = load_from_disk(r'D:\1.Project\3.Machine_Learning\Voice\anomaly\data\dataset\superbs')
train_data = dataset['train']
valid_data = dataset['validation']
test_dataset = dataset['test']

encoded_data_train = train_data.map(preprocess_function, remove_columns='audio', batched=True)
encoded_data_validation = valid_data.map(preprocess_function, remove_columns='audio', batched=True)
encoded_test_validation = test_dataset.map(preprocess_function, remove_columns='audio', batched=True)
# train_data = train_data.map(edit_label_2)
# valid_data = valid_data.map(edit_label_2)

labels = dict_label.keys()
label2id, id2label = dict(), dict()
for i, label in enumerate(labels):
    label2id[label] = str(i)
    id2label[str(i)] = label
print(label2id)

print(encoded_data_train)
print(encoded_data_validation)
print(encoded_test_validation)

num_labels = len(id2label)
model = AutoModelForAudioClassification.from_pretrained(
    "./model_1", num_labels=num_labels, label2id=label2id, id2label=id2label
)
training_args = TrainingArguments(
    output_dir="my_awesome_mind_model",
    evaluation_strategy="epoch",  # renamed to eval_strategy in newer transformers releases
    save_strategy="epoch",
    learning_rate=3e-5,
    per_device_train_batch_size=32,
    gradient_accumulation_steps=4,
    per_device_eval_batch_size=32,
    num_train_epochs=5,
    warmup_ratio=0.1,
    logging_steps=10,
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
    push_to_hub=False,
    report_to='none',
    save_total_limit=3,
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=encoded_data_train.with_format("torch"),
    eval_dataset=encoded_data_validation.with_format("torch"),
    tokenizer=feature_extractor,  # newer transformers releases call this argument processing_class
    compute_metrics=compute_metrics,
)
trainer.train()
# print(model)
eval_results = trainer.evaluate(eval_dataset=encoded_test_validation.with_format("torch"))

print(eval_results)
# print(trainer.log_metrics("eval", eval_results))
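
With `per_device_train_batch_size=32` and `gradient_accumulation_steps=4`, each optimizer step sees 32 × 4 = 128 examples per device, and `warmup_ratio=0.1` spends roughly the first tenth of all optimizer steps on learning-rate warmup. A rough sketch of that arithmetic follows. The dataset size of 50,000 is a made-up figure for illustration, the helper name `training_schedule` is hypothetical, and the step counts `Trainer` actually reports may differ slightly depending on how it rounds partial batches.

```python
import math


def training_schedule(num_examples, per_device_batch=32, grad_accum=4,
                      epochs=5, warmup_ratio=0.1, num_devices=1):
    """Estimate effective batch size, optimizer steps, and warmup steps."""
    effective_batch = per_device_batch * grad_accum * num_devices
    # Approximation: one optimizer step per (possibly partial) effective batch.
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return effective_batch, total_steps, warmup_steps


# 50_000 is a hypothetical dataset size, not taken from the notebook.
print(training_schedule(50_000))
```

Raising `gradient_accumulation_steps` trades more forward and backward passes per update for a larger effective batch without extra GPU memory, which is often the reason to use it on a memory-limited accelerator.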
