
# Advanced Certification in AIML
## A Program by IIIT-H and TalentSprint

In [None]:
#@title Explanation Video
from IPython.display import HTML

HTML("""<video width="854" height="480" controls>
  <source src="https://cdn.iiith.talentsprint.com/aiml/Experiment_related_data/Walkthrough/Hackathon_Voice_based.mp4" type="video/mp4">
</video>
""")

# Hackathon: Voice commands based E-commerce ordering system
The goal of the hackathon is to train your model on different types of voice data (such as studio data and your own team's data) and use it to place orders based on user preferences.

## Grading = 40 Marks

### **Objectives:**

Stage 0 - Obtain Features from Audio samples

Stage 1 (22 Marks) - Define and train a CNN model on Studio data and deploy the model in the server

Stage 2 (18 Marks) - Collect your voice samples (team data) and refine the classifier trained on Studio_data. Deploy the model in the server.

## Dataset Description

The data contains voice samples of six classes: Zero, One, Two, Three, Four, and Five. Each class is denoted by a numerical label from 0 to 5.

The Studio dataset contains very few noise samples, and all files are in WAV format.

The audio files recorded for the studio are saved with the following naming convention:

- Class label + user_id + sample_ID (or noise + sample_ID), joined by underscores

> For example, a voice sample of “Zero” recorded by user b2 is saved as 0_b2_35.wav. Here 35 is the sample ID, b2 is the user ID, and 0 is the label of that sample.
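The naming convention can be parsed by splitting the file stem on underscores. The sketch below uses a hypothetical helper, `parse_studio_filename`, and assumes exactly three underscore-separated fields. Noise files follow a different pattern (noise + sample_ID) and would need separate handling.

```python
import os

def parse_studio_filename(filename):
    """Split a name like '0_b2_35.wav' into (label, user_id, sample_id).

    Hypothetical helper: assumes the '<label>_<user_id>_<sample_id>.wav' pattern.
    """
    stem, _ = os.path.splitext(filename)      # '0_b2_35.wav' -> '0_b2_35'
    label, user_id, sample_id = stem.split("_")
    return int(label), user_id, sample_id

print(parse_studio_filename("0_b2_35.wav"))  # (0, 'b2', '35')
```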




In [1]:
#@title Please run the setup to download the dataset

from IPython import get_ipython
ipython = get_ipython()

notebook= "Hackathon2 - Voice E-commerce Ordering System" #name of the notebook

def setup():
    # run_line_magic replaces the deprecated InteractiveShell.magic()
    ipython.run_line_magic("sx", "wget https://cdn.iiith.talentsprint.com/aiml/Hackathon_data/B17_studio_rev_data.zip")
    ipython.run_line_magic("sx", "unzip B17_studio_rev_data.zip")
    print("Setup completed successfully")

setup()

Setup completed successfully


In [2]:
import os
import sys
import glob
import torch
import librosa
import warnings
import numpy as np
import torch.nn as nn
from time import sleep
from torch import optim
import torch.nn.functional as F
from torch.autograd import Variable
warnings.filterwarnings('ignore')

## **Stage 0:** Obtain Features from Audio samples
---

### Generate features from an audio sample of '.wav' format
- The code to extract the features is provided below

In [3]:
# Caution: Do not change the default parameters
def get_features(filepath, sr=8000, n_mfcc=30, n_mels=128, frames = 15):
    # The following function contains code to produce features of the audio sample.
    y, sr = librosa.load(filepath, sr=sr)

    # Mel spectrogram (power) -> decibels -> MFCCs of shape (n_mfcc, num_frames)
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    log_S = librosa.power_to_db(S,ref=np.max)
    features = librosa.feature.mfcc(S=log_S, n_mfcc=n_mfcc)

    # Zero-pad or truncate along the time axis to exactly `frames` columns
    if features.shape[1] < frames :
        features = np.hstack((features, np.zeros((n_mfcc, frames - features.shape[1]))))
    elif features.shape[1] > frames:
        features = features[:, :frames]

    # Find 1st order delta_mfcc
    delta1_mfcc = librosa.feature.delta(features, order=1)

    # Find 2nd order delta_mfcc
    delta2_mfcc = librosa.feature.delta(features, order=2)

    # Concatenate the flattened delta and delta-delta features into one 1-D vector
    # of 2 * n_mfcc * frames = 900 values (the static MFCCs themselves are not kept)
    features = np.hstack((delta1_mfcc.flatten(), delta2_mfcc.flatten()))

    # Reshape to a column vector of shape (900, 1)
    features = features.flatten()[:,np.newaxis]

    # Convert the numpy.ndarray to a float32 tensor (Variable wrappers are no longer needed since PyTorch 0.4)
    features = torch.from_numpy(features).float()
    return features
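The 900 in the feature shape is 2 × n_mfcc × frames = 2 × 30 × 15. The NumPy-only sketch below reproduces the pad-or-truncate step and the final stacking on dummy arrays, so the shape arithmetic can be checked without librosa. `pad_or_truncate` is a hypothetical helper that mirrors the `if/elif` block in `get_features`. The zero matrices stand in for the real delta features.

```python
import numpy as np

def pad_or_truncate(mfcc, frames=15):
    """Zero-pad or cut the time axis so mfcc has exactly `frames` columns."""
    n_mfcc, n_frames = mfcc.shape
    if n_frames < frames:
        return np.hstack((mfcc, np.zeros((n_mfcc, frames - n_frames))))
    return mfcc[:, :frames]

short_mfcc = pad_or_truncate(np.ones((30, 10)))   # too few frames -> padded
long_mfcc = pad_or_truncate(np.ones((30, 40)))    # too many frames -> truncated
print(short_mfcc.shape, long_mfcc.shape)          # (30, 15) (30, 15)

# Stand-ins for the 1st- and 2nd-order deltas, each (30, 15)
delta1 = np.zeros((30, 15))
delta2 = np.zeros((30, 15))
vec = np.hstack((delta1.flatten(), delta2.flatten()))[:, np.newaxis]
print(vec.shape)                                  # (900, 1)
```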

All the voice samples needed for training are in the `studio_data` folder.

In [4]:
%ls

B17_studio_rev_data.zip  sample_data/  studio_data/


## **Stage 1:** Define and train a CNN model on Studio data and deploy the model in the server

---


### a) Extract features of Studio data (4 Marks)

Load the Studio data and extract MFCC-based features.

**Evaluation Criteria:**

* Complete the code in the load_data function
* The function should take the path of the folder containing the audio samples as input
* It should return the features of all audio samples in that folder as a single array (a list of lists or a 2-D numpy array), together with their respective labels

In [None]:
def load_data(folder_path):
    features = []
    labels = []

    for filename in os.listdir(folder_path):
        if not filename.endswith(".wav"):
            continue

        # Label is the filename prefix before the first underscore, e.g. "0_b2_35.wav" -> 0
        label_str = filename.split("_")[0]
        if not label_str.isdigit():
            # Skip files without a numeric class prefix (e.g. noise samples)
            continue

        filepath = os.path.join(folder_path, filename)

        # Extract features (a (900, 1) tensor) and store them as a flat list
        feature = get_features(filepath)
        features.append(feature.flatten().tolist())
        labels.append(int(label_str))

    features = np.array(features)
    labels = np.array(labels)

    # Reshape to (num_samples, feature_dim, 1) so each feature becomes a Conv1d input channel
    feature_dim = features.shape[1]
    features = features.reshape(features.shape[0], feature_dim, 1)
    print("Shape of the features: ", features.shape)
    print("Shape of the labels: ", labels.shape)
    return features, labels

Load the data from the studio_data folder and extract all features and labels

In [73]:
studio_recorded_features, studio_recorded_labels = load_data('/content/studio_data')

Shape of the features:  (3979, 900, 1)
Shape of the labels:  (3979,)


In [74]:
print("Shape of the features: ", studio_recorded_features.shape)
print("Shape of the labels: ", studio_recorded_labels.shape)

Shape of the features:  (3979, 900, 1)
Shape of the labels:  (3979,)


Use train_test_split to split the data into train and test sets

In [75]:
from sklearn.model_selection import train_test_split
# YOUR CODE HERE
X_train, x_test, y_train, y_test = train_test_split(studio_recorded_features, studio_recorded_labels, test_size=0.2, random_state=42)
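The split above is random, not stratified. If you want every digit to keep the same share in the train and test sets, `train_test_split` also accepts a `stratify` argument. The sketch below runs it on hypothetical toy data with six balanced classes. It is an optional variant, not what the notebook ran.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 60 samples, six balanced classes (0-5), 900x1 features
X = np.random.rand(60, 900, 1)
y = np.repeat(np.arange(6), 10)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(np.bincount(y_te))  # two test samples per class
print(X_tr.shape)         # (48, 900, 1)
```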

Load the dataset with DataLoader
- Refer to [torch.utils.data.TensorDataset](https://pytorch.org/docs/stable/data.html#torch.utils.data.TensorDataset)
- Refer to [torch.utils.data.DataLoader](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader)

In [76]:
# YOUR CODE HERE for the DataLoader
from torch.utils.data import TensorDataset, DataLoader

train_data = TensorDataset(torch.from_numpy(X_train).float(), torch.from_numpy(y_train))
test_data = TensorDataset(torch.from_numpy(x_test).float(), torch.from_numpy(y_test))

batch_size = 64
train_loader = DataLoader(train_data, shuffle=True, batch_size=batch_size)
test_loader = DataLoader(test_data, shuffle=True, batch_size=batch_size)
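Each batch from these loaders has shape `(batch, 900, 1)`, and the last batch holds whatever samples are left over. Labels need an integer (int64) dtype for the classification losses used later. The sketch below checks this on hypothetical random arrays with the same per-sample shape as the studio features.

```python
import numpy as np
import torch
from torch.utils.data import TensorDataset, DataLoader

# Hypothetical stand-ins for X_train / y_train (100 samples, same shapes as the studio data)
X = np.random.rand(100, 900, 1)
y = np.random.randint(0, 6, size=100, dtype=np.int64)

ds = TensorDataset(torch.from_numpy(X).float(), torch.from_numpy(y))
loader = DataLoader(ds, batch_size=64, shuffle=True)

batch_shapes = [(xb.shape, yb.shape) for xb, yb in loader]
print(batch_shapes)      # first batch: 64 samples, last batch: the remaining 36
print(ds[0][1].dtype)    # torch.int64
```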


### b) Define your CNN architecture (4 Marks)

[Hint](https://pytorch.org/docs/stable/generated/torch.nn.Conv1d.html)

In [77]:
# @title Given Model Arch
## Define your CNN Architecture
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()

        # Sample Convolution Layer 1
        self.conv1 = nn.Conv1d(in_channels=900, out_channels=400, kernel_size=1)
        self.bn1 = nn.BatchNorm1d(400)
        self.relu1 = nn.ReLU()

        # Sample Maxpool for the Convolutional Layer 1
        self.maxpool1 = nn.MaxPool1d(1)

        # Sample Dropout Layer
        self.dropout = nn.Dropout(p=0.25)

        # YOUR CODE HERE for defining more number of Convolutional layers with Maxpool as required (Hint: Use at least 2 more convolutional layers for better performance)


        # YOUR CODE HERE for defining the Fully Connected Layer and also define LogSoftmax

    def forward(self, x):
        # Convolution Layer 1, Maxpool and Dropout
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu1(out)
        out = self.maxpool1(out)
        out = self.dropout(out)
        # YOUR CODE HERE for the Convolutional Layers and Maxpool based on the defined Convolutional layers

        # YOUR CODE HERE for flattening the output of the final pooling layer to a vector. Flattening is simply arranging the 3D volume of numbers into a 1D vector

        # YOUR CODE HERE for returning the output of LogSoftmax after applying Fully Connected Layer

In [78]:
import torch.nn as nn
import torch.nn.functional as F

class WavClassifier(nn.Module):
    def __init__(self, num_classes=6):
        super(WavClassifier, self).__init__()

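        # Input shape: (batch, 900, 1). The 900 features are treated as input
        # channels, so every convolution operates on a sequence of length 1.

        # Layer 1: Convolution, Batch Normalization, ReLU, Max Pooling, Dropout
        self.conv1 = nn.Conv1d(in_channels=900, out_channels=400, kernel_size=1, stride=1, padding = 0)  # Output channels: 400, Kernel size: 1
        self.bn1 = nn.BatchNorm1d(400)
        self.pool1 = nn.MaxPool1d(1)  # Kernel size 1: identity on the length-1 sequence
        self.dropout1 = nn.Dropout(p=0.25)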

        # Layer 2: Convolution, Batch Normalization, ReLU, Max Pooling, Dropout
        self.conv2 = nn.Conv1d(in_channels=400, out_channels=256, kernel_size=3, stride=1, padding = 1)
        self.bn2 = nn.BatchNorm1d(256)
        self.pool2 = nn.MaxPool1d(1)
        self.dropout2 = nn.Dropout(p=0.25)

        # Layer 3: Convolution, Batch Normalization, ReLU, Max Pooling, Dropout
        self.conv3 = nn.Conv1d(in_channels=256, out_channels=128, kernel_size=3, stride=1, padding = 1)
        self.bn3 = nn.BatchNorm1d(128)
        self.pool3 = nn.MaxPool1d(1)
        self.dropout3 = nn.Dropout(p=0.25)

        # Fully connected layer: 128 channels x sequence length 1 = 128 input features
        self.fc = nn.Linear(128, num_classes)
        self.log_softmax = nn.LogSoftmax(dim=1)

    def forward(self, x):
        # Pass through convolutional layers
        x = self.pool1(F.relu(self.bn1(self.conv1(x))))
        x = self.dropout1(x)
        x = self.pool2(F.relu(self.bn2(self.conv2(x))))
        x = self.dropout2(x)
        x = self.pool3(F.relu(self.bn3(self.conv3(x))))
        x = self.dropout3(x)

        # Flatten and pass through fully connected layer
        x = x.view(x.size(0), -1)
        x = self.fc(x)
        # Apply LogSoftmax
        x = self.log_softmax(x)
        return x
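Because the input is `(batch, 900, 1)`, the sequence length stays 1 through every layer. With kernel size 3 and padding 1 the output length is (1 + 2·1 − 3) / 1 + 1 = 1, and `MaxPool1d(1)` leaves the tensor unchanged. That is why the fully connected layer takes 128 inputs. The sketch below checks this with standalone layers that have the same hyperparameters as `WavClassifier`. BatchNorm and dropout are omitted because they do not change shapes.

```python
import torch
import torch.nn as nn

x = torch.randn(4, 900, 1)  # (batch, channels = 900 features, length = 1)

conv1 = nn.Conv1d(900, 400, kernel_size=1)
conv2 = nn.Conv1d(400, 256, kernel_size=3, padding=1)
conv3 = nn.Conv1d(256, 128, kernel_size=3, padding=1)
pool = nn.MaxPool1d(1)

h1 = conv1(x)
h2 = conv2(h1)
h3 = conv3(h2)
print(h1.shape, h2.shape, h3.shape)  # lengths stay 1: [4, 400, 1], [4, 256, 1], [4, 128, 1]

# A max-pool with kernel size 1 returns its input unchanged
print(torch.equal(pool(h3), h3))     # True

flat = h3.view(h3.size(0), -1)
print(flat.shape)                    # [4, 128] -> matches nn.Linear(128, num_classes)
```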

In [79]:
# Use the GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

cpu


In [80]:
model = WavClassifier()
model = model.to(device)
print(model)

# Define the loss function and optimizer (used by the training loop below)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)  # Adjust learning rate as needed

WavClassifier(
  (conv1): Conv1d(900, 400, kernel_size=(1,), stride=(1,))
  (bn1): BatchNorm1d(400, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (pool1): MaxPool1d(kernel_size=1, stride=1, padding=0, dilation=1, ceil_mode=False)
  (dropout1): Dropout(p=0.25, inplace=False)
  (conv2): Conv1d(400, 256, kernel_size=(3,), stride=(1,), padding=(1,))
  (bn2): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (pool2): MaxPool1d(kernel_size=1, stride=1, padding=0, dilation=1, ceil_mode=False)
  (dropout2): Dropout(p=0.25, inplace=False)
  (conv3): Conv1d(256, 128, kernel_size=(3,), stride=(1,), padding=(1,))
  (bn3): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (pool3): MaxPool1d(kernel_size=1, stride=1, padding=0, dilation=1, ceil_mode=False)
  (dropout3): Dropout(p=0.25, inplace=False)
  (fc): Linear(in_features=128, out_features=6, bias=True)
  (log_softmax): LogSoftmax(dim=1)
)
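`WavClassifier` already ends in `LogSoftmax`, while the notebook trains with `nn.CrossEntropyLoss`, which applies log-softmax internally. Log-softmax is idempotent: the log-probabilities already log-sum-exp to 0, so applying it again changes nothing. As a result, this combination gives the same loss as `nn.NLLLoss`, the usual pairing for `LogSoftmax` outputs. The sketch below checks this numerically on hypothetical random scores and labels.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(8, 6)            # hypothetical raw scores: 8 samples, 6 classes
targets = torch.randint(0, 6, (8,))   # hypothetical class labels

log_probs = F.log_softmax(logits, dim=1)  # what WavClassifier returns

ce_on_log_probs = nn.CrossEntropyLoss()(log_probs, targets)
nll_on_log_probs = nn.NLLLoss()(log_probs, targets)
ce_on_logits = nn.CrossEntropyLoss()(logits, targets)

print(torch.allclose(ce_on_log_probs, nll_on_log_probs))  # True
print(torch.allclose(ce_on_logits, nll_on_log_probs))     # True
```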


### c) Train and classify on the studio_data (3 Marks)

The goal here is to train the model on the voice samples in the studio data and to track the training loss and accuracy at each epoch.

Iterate over batches of audio features in the train_loader and perform the following steps.

1. First, zero out the gradients using zero_grad()

2. Move the data to the GPU (if available) and pass it to the model

3. Calculate the loss using a loss function

4. Perform the backward pass using backward() to compute the gradients

5. Update the weights with optimizer.step() and obtain the predictions using torch.max()

6. Calculate the accuracy on the train dataset


In [81]:
# @title Train Code Draft
# YOUR CODE HERE. This will take time

# Record loss and accuracy of the train dataset
# Training loop
num_epochs = 10  # Adjust the number of epochs as needed
for epoch in range(num_epochs):
    model.train()  # Set the model to training mode
    running_loss = 0.0
    correct_predictions = 0
    total_samples = 0

    for inputs, labels in train_loader:
        inputs = inputs.to(device)  # Move the batch of features to the device
        labels = labels.to(device)

        optimizer.zero_grad()  # Zero the gradients

        outputs = model(inputs)  # Forward pass
        loss = criterion(outputs, labels)  # Calculate loss
        loss.backward()  # Backward pass
        optimizer.step()  # Update weights

        # Calculate accuracy
        _, predicted = torch.max(outputs, 1)
        total_samples += labels.size(0)
        correct_predictions += (predicted == labels).sum().item()

        running_loss += loss.item()

    epoch_loss = running_loss / len(train_loader)
    epoch_accuracy = 100 * correct_predictions / total_samples

    print(f"Epoch {epoch+1}/{num_epochs}, Loss: {epoch_loss:.4f}, Accuracy: {epoch_accuracy:.2f}%")

Epoch 1/10, Loss: 1.3485, Accuracy: 48.32%
Epoch 2/10, Loss: 0.8167, Accuracy: 71.91%
Epoch 3/10, Loss: 0.6504, Accuracy: 77.79%
Epoch 4/10, Loss: 0.5416, Accuracy: 81.34%
Epoch 5/10, Loss: 0.4635, Accuracy: 84.20%
Epoch 6/10, Loss: 0.4345, Accuracy: 84.35%
Epoch 7/10, Loss: 0.3984, Accuracy: 86.11%
Epoch 8/10, Loss: 0.3578, Accuracy: 87.59%
Epoch 9/10, Loss: 0.3083, Accuracy: 89.26%
Epoch 10/10, Loss: 0.2857, Accuracy: 90.67%


In [101]:
import torch
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

def train_model(model, train_loader, num_epochs=10, learning_rate=0.001, device= device):
    """
    Trains a PyTorch model.

    Args:
        model: The PyTorch model to train.
        train_loader: DataLoader yielding (inputs, labels) batches.
        num_epochs: Number of training epochs.
        learning_rate: Learning rate for the Adam optimizer.
        device: Device to use for training ("cuda" or "cpu").

    Returns:
        The trained model (the same object that was passed in, trained in place).
    """

    # Move model to the specified device
    model.to(device)

    # Define loss function and optimizer
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=learning_rate)

    # Training loop
    for epoch in range(num_epochs):
        model.train()  # Set the model to training mode
        running_loss = 0.0
        correct_predictions = 0
        total_samples = 0

        for inputs, labels in train_loader:
            inputs, labels = inputs.to(device), labels.to(device)

            optimizer.zero_grad()  # Zero the gradients

            outputs = model(inputs)  # Forward pass
            loss = criterion(outputs, labels)  # Calculate loss
            loss.backward()  # Backward pass
            optimizer.step()  # Update weights

            # Calculate accuracy
            _, predicted = torch.max(outputs, 1)
            total_samples += labels.size(0)
            correct_predictions += (predicted == labels).sum().item()

            running_loss += loss.item()

        epoch_loss = running_loss / len(train_loader)
        epoch_accuracy = 100 * correct_predictions / total_samples

        print(f"Epoch {epoch + 1}/{num_epochs}, Loss: {epoch_loss:.4f}, Accuracy: {epoch_accuracy:.2f}%")

    return model  # Return the trained model

In [102]:
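# Note: `model` was already trained for 10 epochs by the draft loop above, so this call continues training it
studio_trained_model = train_model(model, train_loader, num_epochs=10, learning_rate=0.001, device= device)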

Epoch 1/10, Loss: 0.2055, Accuracy: 92.99%
Epoch 2/10, Loss: 0.1693, Accuracy: 94.53%
Epoch 3/10, Loss: 0.1566, Accuracy: 94.63%
Epoch 4/10, Loss: 0.1505, Accuracy: 95.16%
Epoch 5/10, Loss: 0.1476, Accuracy: 94.72%
Epoch 6/10, Loss: 0.1391, Accuracy: 95.22%
Epoch 7/10, Loss: 0.1297, Accuracy: 95.66%
Epoch 8/10, Loss: 0.1066, Accuracy: 96.73%
Epoch 9/10, Loss: 0.1139, Accuracy: 96.01%
Epoch 10/10, Loss: 0.1086, Accuracy: 96.01%


### d) Testing Evaluation for CNN model (3 Marks)

Evaluate the model on the test data:

1. Load the test features using test_loader.

2. Pass the test data through the model (network) to get the outputs.

3. Take the class with the maximum output (log-probability) as the prediction using torch.max.

4. Compare the predictions with the actual labels and count the correct ones.

5. Calculate the accuracy from the count of correct labels.

### **Expected testing accuracy is above 80%**

In [85]:
# YOUR CODE HERE to test the model
def evaluate_model(model, test_loader, device):
    model.eval()  # Set the model to evaluation mode
    correct_predictions = 0
    total_samples = 0

    with torch.no_grad():  # Disable gradient calculations during evaluation
        for inputs, labels in test_loader:
            inputs, labels = inputs.to(device), labels.to(device)

            outputs = model(inputs)
            _, predicted = torch.max(outputs, 1)

            total_samples += labels.size(0)
            correct_predictions += (predicted == labels).sum().item()

    accuracy = 100 * correct_predictions / total_samples
    print(f"Test Accuracy: {accuracy:.2f}%")
    return accuracy



Test Accuracy: 82.91%


In [103]:
# Call the evaluation function
studio_test_accuracy = evaluate_model(studio_trained_model, test_loader, device)

Test Accuracy: 83.67%
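A single overall accuracy can hide which digits are being confused. This matters during deployment, where several consecutive utterances must all be predicted correctly. Below is a minimal sketch of per-class accuracy. It assumes you have collected the true labels and predictions into flat lists, for example by appending `labels` and `predicted` inside a loop like the one in `evaluate_model`. `per_class_accuracy` is a hypothetical helper, and the example values are made up.

```python
import numpy as np

def per_class_accuracy(y_true, y_pred, num_classes=6):
    """Fraction of samples of each class predicted correctly (NaN if a class is absent)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    acc = np.full(num_classes, np.nan)
    for c in range(num_classes):
        mask = y_true == c
        if mask.any():
            acc[c] = (y_pred[mask] == c).mean()
    return acc

# Hypothetical labels and predictions
y_true = [0, 0, 1, 1, 2, 2, 3, 4, 5, 5]
y_pred = [0, 1, 1, 1, 2, 0, 3, 4, 5, 5]
print(per_class_accuracy(y_true, y_pred))  # classes 0 and 2 score 0.5, the rest 1.0
```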


### e) Save and download your model (2 Marks)

**Save your model trained on studio data**

* Save the state dictionary of the classifier (using PyTorch only); it will be needed when integrating the model into the web application

 [Hint](https://pytorch.org/tutorials/beginner/saving_loading_models.html)

In [104]:
### YOUR CODE HERE for saving the CNN model
torch.save(studio_trained_model.state_dict(), 'wav_classifier_model_studio_v1.pth')

# To load the saved state dictionary later (e.g. on the server):
# model.load_state_dict(torch.load('wav_classifier_model_studio_v1.pth', map_location=device))
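A saved `state_dict` holds only the weights, so the loading side must first build the same architecture and then call `load_state_dict`. It should also call `eval()` so that dropout is disabled and BatchNorm uses its running statistics. The round trip below is a minimal sketch. It uses a hypothetical tiny `nn.Sequential` stand-in for `WavClassifier` and an in-memory buffer instead of a `.pth` file. When loading GPU-trained weights on a CPU-only machine, `torch.load(..., map_location="cpu")` is the usual option.

```python
import io
import torch
import torch.nn as nn

def make_model():
    # Hypothetical stand-in for WavClassifier (same input/output shapes)
    return nn.Sequential(nn.Flatten(), nn.Linear(900, 6), nn.LogSoftmax(dim=1))

trained = make_model()
buffer = io.BytesIO()                 # in the notebook this is a .pth file path
torch.save(trained.state_dict(), buffer)
buffer.seek(0)

restored = make_model()               # architecture must exist before loading weights
restored.load_state_dict(torch.load(buffer, map_location="cpu"))
restored.eval()                       # inference mode: no dropout, BatchNorm running stats

x = torch.randn(2, 900, 1)
with torch.no_grad():
    same = torch.equal(trained.eval()(x), restored(x))
print(same)  # True
```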

Download your trained model using the code below
* Pass the path of the model file to download it through the browser

In [105]:
from google.colab import files
files.download('/content/wav_classifier_model_studio_v1.pth')


### f) Deploy and evaluate your model trained on Studio Data on the server (6 Marks)

(This can be done on the day of the Hackathon, once the mentors provide the login username and password in the lab.)

To deploy your model on the server, see the hackathon document (2-Server Access and File transfer For Voice based e-commerce ordering.pdf) for details.

To order a product through the user interface, see the document (3-Hackathon_II Application Interface Documentation.pdf) for details.


**Evaluation Criteria: Four consecutive utterances should be predicted correctly by the model**

- There are two stages in the e-commerce ordering application:
    - Ordering Product
    - Selecting the e-commerce platform
- If both stages are cleared as per the evaluation criteria, you get full marks; otherwise, marks are deducted.
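The server's own inference code is described in the hackathon documents and is not shown here. As a rough illustration of what predicting a single utterance involves, the sketch below adds a batch dimension to one `(900, 1)` feature tensor, runs the model in eval mode, and takes the argmax. Eval mode is required: with a batch of one and sequence length 1, `BatchNorm1d` in training mode raises an error because it has only one value per channel. `predict_label` is a hypothetical helper, and the model is a tiny stand-in for a loaded `WavClassifier`.

```python
import torch
import torch.nn as nn

def predict_label(model, features):
    """Return the predicted class (0-5) for one feature tensor of shape (900, 1)."""
    model.eval()                                  # disable dropout, use BatchNorm running stats
    with torch.no_grad():
        log_probs = model(features.unsqueeze(0))  # add batch dim -> (1, 900, 1)
    return int(log_probs.argmax(dim=1).item())

# Hypothetical stand-in model; in practice a WavClassifier with loaded weights
model = nn.Sequential(nn.Flatten(), nn.Linear(900, 6), nn.LogSoftmax(dim=1))
features = torch.randn(900, 1)                    # in practice: get_features(path_to_wav)
label = predict_label(model, features)
print(label in range(6))  # True
```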

## **Stage 2:** Collect your voice samples and refine the classifier trained on studio_data and Team_data
---

### a) Collect your Team Voice Samples and extract features (6 Marks)

(This can be done on the day of the Hackathon, once the mentors provide the login username and password in the lab.)

* To collect the team data, make sure the server is active (see 2-Server Access and File transfer For Voice based e-commerce ordering.pdf)

* Refer to the document "3-Hackathon_II Application Interface Documentation.pdf" to collect your team's voice samples. They are stored on your server

**Evaluation Criteria:**
* Load 'Team_data' and extract features
* Combine features of team data with the extracted features of studio data
* Split the combined features into train and test data
* Load the dataset with DataLoader

In [88]:
!mkdir -p team_data

In [94]:
# The URL below contains this team's username (b24h3g12); replace it with the username given to you in the lab
!wget -r -A .wav https://aiml-sandbox1.talentsprint.com/audio_recorder/b24h3g12/team_data/ -nH --cut-dirs=100  -P ./team_data

--2025-03-14 11:04:46--  https://aiml-sandbox1.talentsprint.com/audio_recorder/b24h3g12/team_data/
Resolving aiml-sandbox1.talentsprint.com (aiml-sandbox1.talentsprint.com)... 139.162.203.12
Connecting to aiml-sandbox1.talentsprint.com (aiml-sandbox1.talentsprint.com)|139.162.203.12|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘./team_data/index.html.tmp’

index.html.tmp          [ <=>                ]   5.20K  --.-KB/s    in 0s      

2025-03-14 11:04:46 (711 MB/s) - ‘./team_data/index.html.tmp’ saved [5320]

Loading robots.txt; please ignore errors.
--2025-03-14 11:04:46--  https://aiml-sandbox1.talentsprint.com/robots.txt
Reusing existing connection to aiml-sandbox1.talentsprint.com:443.
HTTP request sent, awaiting response... 404 Not Found
2025-03-14 11:04:46 ERROR 404: Not Found.

Removing ./team_data/index.html.tmp since it should be rejected.

--2025-03-14 11:04:46--  https://aiml-sandbox1.talentsprint.com/audio_rec

In [95]:
# YOUR CODE HERE to Load data from teamdata folder for extracting all features and labels
team_data_recorded_features, team_data_recorded_labels = load_data('/content/team_data')

Shape of the features:  (39, 900, 1)
Shape of the labels:  (39,)


In [96]:
# Combine the features of all voice samples (studio_data and teamdata)
# YOUR CODE HERE
# 1. Combine features using numpy.concatenate
combined_features = np.concatenate((studio_recorded_features, team_data_recorded_features), axis=0)

# 2. Combine labels using numpy.concatenate
combined_labels = np.concatenate((studio_recorded_labels, team_data_recorded_labels), axis=0)

# Print shapes to verify
print("Combined features shape:", combined_features.shape)
print("Combined labels shape:", combined_labels.shape)

Combined features shape: (4018, 900, 1)
Combined labels shape: (4018,)


In [97]:
# YOUR CODE HERE to split the combined features into train and test data (Hint: Use train_test_split)
from sklearn.model_selection import train_test_split
# YOUR CODE HERE
X_train_combined, x_test_combined, y_train_combined, y_test_combined = train_test_split(combined_features, combined_labels, test_size=0.2, random_state=42)

In [98]:
# YOUR CODE HERE to load the dataset with DataLoader

train_data_combined = TensorDataset(torch.from_numpy(X_train_combined).float(), torch.from_numpy(y_train_combined))
test_data_combined = TensorDataset(torch.from_numpy(x_test_combined).float(), torch.from_numpy(y_test_combined))

batch_size = 64
train_loader_combined = DataLoader(train_data_combined, shuffle=True, batch_size=batch_size)
test_loader_combined = DataLoader(test_data_combined, shuffle=True, batch_size=batch_size)

### b) Classify and download the model (6 Marks)

The goal here is to train and test your model on all voice samples collected in studio and team data

**Evaluation Criteria:**
* Refine your classifier (if needed)
* Train your model on the extracted train data
* Test your model on the extracted test data
* Save and download the trained model

### **Expected testing accuracy is above 80%**

In [None]:
# YOUR CODE HERE for refining your classifier (if needed)
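`train_model` trains the model it receives in place and returns that same object. As a result, `model`, `studio_trained_model`, and `combined_trained_model` all refer to one network. The studio weights were already saved to disk, but you can also keep them in memory while refining for Stage 2 by working on a `copy.deepcopy`. The sketch below shows the difference between an alias and a deep copy, using a hypothetical `nn.Linear` stand-in for the trained classifier.

```python
import copy
import torch
import torch.nn as nn

model = nn.Linear(900, 6)              # hypothetical stand-in for the trained WavClassifier
alias = model                          # like `studio_trained_model = train_model(model, ...)`
finetune_copy = copy.deepcopy(model)   # independent weights to refine in Stage 2

with torch.no_grad():
    finetune_copy.weight.add_(1.0)     # pretend Stage 2 training changed the copy

print(alias is model)                                   # True
print(torch.equal(model.weight, finetune_copy.weight))  # False: the original is untouched
```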

In [106]:
# YOUR CODE HERE to train your model

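# Record loss and accuracy of the train dataset.
# Note: this continues training `model`, which has already been trained on the studio train split.
# Because the combined split is drawn independently, the combined test set likely contains studio
# samples seen during that earlier training, so the test accuracy below may be optimistic.
combined_trained_model = train_model(model, train_loader_combined, num_epochs=10, learning_rate=0.001, device= device)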

Epoch 1/10, Loss: 0.2585, Accuracy: 92.81%
Epoch 2/10, Loss: 0.1904, Accuracy: 93.40%
Epoch 3/10, Loss: 0.1777, Accuracy: 94.90%
Epoch 4/10, Loss: 0.1498, Accuracy: 95.30%
Epoch 5/10, Loss: 0.1730, Accuracy: 94.34%
Epoch 6/10, Loss: 0.1430, Accuracy: 95.49%
Epoch 7/10, Loss: 0.1650, Accuracy: 94.28%
Epoch 8/10, Loss: 0.1399, Accuracy: 95.58%
Epoch 9/10, Loss: 0.1188, Accuracy: 95.96%
Epoch 10/10, Loss: 0.1088, Accuracy: 96.58%


In [107]:
# YOUR CODE HERE to test your model
combined_test_accuracy = evaluate_model(combined_trained_model, test_loader_combined, device)

Test Accuracy: 92.29%


**Save your trained model**

* Save the state dictionary of the classifier (using PyTorch only); it will be needed when integrating the model into the web application

 [Hint](https://pytorch.org/tutorials/beginner/saving_loading_models.html)

In [108]:
### YOUR CODE HERE for saving the CNN model
torch.save(combined_trained_model.state_dict(), 'wav_classifier_model_combined_v1.pth')

Download your trained model using the code below
* Pass the path of the model file to download it through the browser

In [109]:
from google.colab import files
files.download('/content/wav_classifier_model_combined_v1.pth')


### c) Deploy and evaluate your model trained on Studio Data + Team Data on the server (6 Marks)

(This can be done on the day of the Hackathon, once the mentors provide the login username and password in the lab.)

To deploy your model on the server, see the hackathon document (2-Server Access and File transfer For Voice based e-commerce ordering.pdf) for details.

To order a product through the user interface, see the document (3-Hackathon_II Application Interface Documentation.pdf) for details.


**Evaluation Criteria: Four consecutive utterances should be predicted correctly by the model**

- There are two stages in the e-commerce ordering application:
    - Ordering Product
    - Selecting the e-commerce platform
- If both stages are cleared as per the evaluation criteria, you get full marks; otherwise, marks are deducted.