# Project 3: Text Classification in PyTorch

## Instructions

* All the tasks that you need to complete in this project are either coding tasks (mentioned inside the code cells of the notebook with `#TODO` notations) or theoretical questions that you need to answer by editing the markdown question cells.
* **Please make sure you read the [Notes](#Important-Notes) section carefully before you start the project.**

## Introduction
This project deals with neural text classification using PyTorch. Text classification is the process of assigning tags or categories to text according to its content. It's one of the fundamental tasks in Natural Language Processing (NLP) with broad applications such as sentiment analysis, topic labeling, spam detection, and intent detection.

Text classification algorithms are at the heart of a variety of software systems that process text data at scale. Email software uses text classification to determine whether incoming mail is sent to the inbox or filtered into the spam folder. Discussion forums use text classification to determine whether comments should be flagged as inappropriate.

**_Example:_** A simple example of text classification is spam classification. Consider all the emails that you would receive in your personal inbox if the email service provider did not have a spam filter. Because of the spam filter, spam emails are redirected to the Spam folder, while only non-spam ("_ham_") emails reach your inbox.

![](http://blog.yhat.com/static/img/spam-filter.png)

## Task
Here, we want you to focus on a specific type of text classification task, "Document Classification into Topics". It can be described as classifying text data, or even large documents, into separate discrete topics or genres of interest.


![](https://miro.medium.com/max/700/1*YWEqFeKKKzDiNWy5UfrTsg.png)

In this project, you will classify given text data into discrete topics or genres. You are given a collection of text samples, each of which has a label attached. We ask you to learn, based on the words in the documents, why their contents have been given these labels. You need to create a neural classifier that is trained on this information. Once trained, the classifier should be able to predict the label for any new document or text sample that is fed to it. The labels need not have any meaning to us, nor necessarily to you.

## Data
There are various datasets that we can use for this purpose. This tutorial shows how to use the text classification datasets in the PyTorch library ``torchtext``. There are different datasets in this library like `AG_NEWS`, `SogouNews`, `DBpedia`, and others. This project will deal with training a supervised learning algorithm for classification using one of these datasets. In task 1 of this project, we will work with the `AG_NEWS` dataset.

## Load Data

A bag of **n-grams** feature is applied to capture some partial information about the local word order. In practice, bi-grams or tri-grams are used because groups of words carry more information than single words alone.

**Example:**

*"I love Neural Networks"*
* **Bi-grams:** "I love", "love Neural", "Neural Networks"
* **Tri-grams:** "I love Neural", "love Neural Networks"
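
The n-gram construction above can be sketched in a few lines of plain Python. This is only an illustration: the helper name `word_ngrams` is hypothetical (it is not part of `torchtext`), and it splits on whitespace, whereas the library's own tokenizer may normalize the text differently.

```python
def word_ngrams(text, n):
    """Return the list of word n-grams of `text`, split on whitespace."""
    tokens = text.split()
    # Slide a window of n tokens over the sequence and join each window.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sentence = "I love Neural Networks"
bigrams = word_ngrams(sentence, 2)
trigrams = word_ngrams(sentence, 3)
print(bigrams)   # ['I love', 'love Neural', 'Neural Networks']
print(trigrams)  # ['I love Neural', 'love Neural Networks']
```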

In the code below, we have loaded the `AG_NEWS` dataset from the ``torchtext.datasets.text_classification`` module with bi-gram features. The dataset loader accepts an `ngrams` argument. By setting `ngrams` to 2, each example text in the dataset becomes a list of single words plus bi-gram strings.

In [1]:
"""
Load the AG_NEWS dataset in bi-gram features format.
"""

import torch
import torchtext
from torchtext.datasets import text_classification
import os

NGRAMS = 2

if not os.path.isdir('./.data'):
    os.mkdir('./.data')

train_dataset, test_dataset = text_classification.DATASETS['AG_NEWS'](
    root='./.data', ngrams=NGRAMS, vocab=None)

BATCH_SIZE = 16

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

120000lines [00:17, 6936.55lines/s]
120000lines [00:30, 3963.45lines/s]
7600lines [00:01, 4166.89lines/s]


## Model

Our first simple model is composed of an [`EmbeddingBag`](https://pytorch.org/docs/stable/nn.html?highlight=embeddingbag#torch.nn.EmbeddingBag) layer and a linear layer.

``EmbeddingBag`` computes the mean value of a “bag” of embeddings. The text entries here have different lengths. ``EmbeddingBag`` requires no padding here since the text lengths are saved in offsets. Additionally, since ``EmbeddingBag`` accumulates the average across the embeddings on the fly, ``EmbeddingBag`` can enhance the performance and memory efficiency to process a sequence of tensors.
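
To make the role of the offsets concrete, here is a minimal sketch, assuming toy sizes (a 10-token vocabulary and 4-dimensional embeddings) that have nothing to do with `AG_NEWS`. It pools two texts of different lengths with ``EmbeddingBag`` in `"mean"` mode and checks the result against an average computed by hand.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy sizes chosen for illustration only.
bag = nn.EmbeddingBag(num_embeddings=10, embedding_dim=4, mode="mean")

# Two texts of different lengths, concatenated into one flat tensor.
text = torch.tensor([1, 2, 3, 7, 8])   # text A = [1, 2, 3], text B = [7, 8]
offsets = torch.tensor([0, 3])         # start index of each text in `text`

pooled = bag(text, offsets)            # one pooled row per text

# The same result by hand: average the embedding rows of each text.
manual_a = bag.weight[text[0:3]].mean(dim=0)
manual_b = bag.weight[text[3:5]].mean(dim=0)
matches = torch.allclose(pooled, torch.stack([manual_a, manual_b]), atol=1e-6)
print(pooled.shape, matches)
```

No padding token is needed: the second text is simply shorter, and its boundary is recorded in `offsets`.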

In [2]:
# TODO: Import the necessary libraries
import torch.nn as nn
import torch.nn.functional as F

# TODO: Create a class TextClassifier. Remember that this class will be your model.
class TextClassifier(nn.Module):
    # TODO: Define the __init__() method with proper parameters
    # (vocabulary size, dimensions of the embeddings, number of classes)
    def __init__(self, vocab_size, embed_dim, num_class):
        # TODO: define the embedding layer
        super().__init__()
        self.bag = nn.EmbeddingBag(vocab_size, embed_dim, sparse=True)

        # TODO: define the linear forward layer
        self.fc = nn.Linear(embed_dim, num_class)
        
        # TODO: Initialize weights
        self.init_weights()

    # TODO: Define a method to initialize weights.
    def init_weights(self):
        # The weights should be random in the range of -0.5 to 0.5.
        # You can initialize bias values as zero.
        self.bag.weight.data.uniform_(-0.5, 0.5)
        self.fc.weight.data.uniform_(-0.5, 0.5)
        self.fc.bias.data.zero_()
    
    # TODO: Define the forward function.
    def forward(self, text, offsets):
        # This should calculate the embeddings and return the linear layer
        # with calculated embedding values.
        return self.fc(self.bag(text, offsets))

## Check your data before you proceed!

Okay, so we know that we are using the `AG_NEWS` dataset in this project, but do you know what the data contains? What is the format of the data? How many classes are there in this dataset? We do not know yet. Let's find out!


## Question 1:
Create a new cell in this notebook and try to analyze the dataset that we loaded for you before. Report the following:
* Vocabulary size (VOCAB_SIZE)
* Number of classes (NUM_CLASS)
* Names of the classes


## Answer 1:

In [3]:
vocab_size = len(train_dataset.get_vocab())
print("Vocabulary Size    :", vocab_size)
classes = train_dataset.get_labels()

print("Classes            :", len(classes))
label = {1 : "World",
         2 : "Sports",
         3 : "Business",
         4 : "Sci/Tec"}

for i in classes:
    print("-", i, ":", label[i+1])

Vocabulary Size    : 1308844
Classes            : 4
- 0 : World
- 1 : Sports
- 2 : Business
- 3 : Sci/Tec


## Create an instance for your model

Great! You have successfully completed a basic analysis of the data that you are going to work with. The vocabulary size is equal to the length of the vocabulary (including single words and n-grams). The number of classes is equal to the number of labels. Copy and paste the code statements you used in your analysis to complete the code below. Then, using these parameters, create an instance `model` of your text classifier `TextClassifier`.

In [4]:
'''
Parameters and model instance creation.
'''

# TODO: Instantiate the Vocabulary size and the number of classes
# from the training dataset that we loaded for you.

# Hint: Remember that these are PyTorch datasets. So, there should be 
# readily available functions that you can use to save time. ;)

VOCAB_SIZE = len(train_dataset.get_vocab())
EMBED_DIM = 32
NUM_CLASS = len(classes)

# TODO: Instantiate the model with the parameters you defined above. 
# Remember to allocate it to your 'device' variable.

model = TextClassifier(VOCAB_SIZE, EMBED_DIM, NUM_CLASS).to(device)

## Generate batch

Since the text entries have different lengths, you need to create a custom function to generate data batches and offsets. This function should be passed to the ``collate_fn`` parameter of PyTorch's ``DataLoader``, which you will use to load the data later on. The input to ``collate_fn`` is a list of `batch_size` entries from the dataset, and the ``collate_fn`` function packs them into a mini-batch. Pay attention here and make sure that ``collate_fn`` is declared as a top-level definition. This ensures that the function is available in each worker process, which is why you need to define this custom function before you call ``DataLoader()``.

The text entries in the original data batch are packed into a list and concatenated into a single tensor, which is the input to ``EmbeddingBag``. The offsets tensor holds delimiters that mark the beginning index of each individual sequence in the text tensor. The label tensor stores the labels of the individual text entries.
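
As a small illustration of the offsets described above, the following plain-Python sketch flattens three made-up token lists and records where each one starts. The helper name `flatten_with_offsets` and the token ids are hypothetical; the notebook's own `generate_batch` does the equivalent with tensors.

```python
from itertools import accumulate


def flatten_with_offsets(texts):
    """Concatenate token lists and return the start index of each list."""
    flat = [token for text in texts for token in text]
    lengths = [len(text) for text in texts]
    # Running sum of lengths, starting at 0; the final total is not a start
    # index, so it is dropped.
    offsets = list(accumulate(lengths, initial=0))[:-1]
    return flat, offsets


texts = [[11, 12, 13], [21, 22], [31, 32, 33, 34]]
flat, offsets = flatten_with_offsets(texts)
print(flat)     # [11, 12, 13, 21, 22, 31, 32, 33, 34]
print(offsets)  # [0, 3, 5]
```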

Finish the function definition below. The function should take batch as an input parameter. Each entry in the batch contains a pair of values of the text and the corresponding label.

In [5]:
# TODO: Finish the function definition.

def generate_batch(batch):
    
    label = torch.tensor([i[0] for i in batch])
    text = [i[1] for i in batch]
    offsets = [0] + [len(entry) for entry in text]

    offsets = torch.tensor(offsets[:-1]).cumsum(dim=0)
    text = torch.cat(text)
    
    return text, offsets, label

## Define the train function

Here, you need to define a function that you will use later in the project to train your model. It is very similar to the training steps that you have encountered in previous coding assignments. The outline of the function is roughly as follows:

* load the data as batches
* iterate over the batches
* find the model output for a forward pass
* calculate the loss
* perform backpropagation on the loss (optimize it)
* find the training accuracy

In addition, you need to keep track of the total loss and total training accuracy, and return their average values at the end.
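
The bookkeeping part of this outline, counting correct predictions per batch and normalizing at the end, can be sketched without any model. The scores below are made-up numbers and `count_correct` is a hypothetical helper; in the notebook the same count would come from comparing `output.argmax(1)` with the labels.

```python
def count_correct(scores, labels):
    """Count rows whose highest-scoring class index equals the true label."""
    correct = 0
    for row, label in zip(scores, labels):
        predicted = max(range(len(row)), key=row.__getitem__)  # argmax
        correct += int(predicted == label)
    return correct


# Two toy batches of per-class scores (4 classes) with their true labels.
batches = [
    ([[2.0, 0.1, 0.3, 0.0], [0.2, 1.5, 0.1, 0.4]], [0, 1]),
    ([[0.1, 0.2, 0.9, 0.3], [1.1, 0.2, 0.3, 0.4]], [2, 3]),
]

total_correct = 0
total_samples = 0
for scores, labels in batches:
    total_correct += count_correct(scores, labels)
    total_samples += len(labels)

accuracy = total_correct / total_samples
print(accuracy)  # 0.75
```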

In [6]:
from torch.utils.data import DataLoader

def train(train_data):

    # Initial values of training loss and training accuracy
    
    train_loss = 0
    train_acc = 0

    # TODO: Use the PyTorch DataLoader class to load the data 
    # into shuffled batches of appropriate sizes into the variable 'data'.
    # Remember, this is the place where you need to generate batches.
    data = DataLoader(train_data, batch_size=BATCH_SIZE, shuffle=True, collate_fn=generate_batch)
    
    
    for i, (text, offsets, cls) in enumerate(data):
        
        # TODO: What do you need to do in order to perform backprop on the optimizer?
        optimizer.zero_grad()
        
        
        text, offsets, cls = text.to(device), offsets.to(device), cls.to(device)
        
        # TODO: Store the output of the model in variable 'output'
        output = model(text, offsets)
        
        
        # TODO: Define the 'loss' variable (with respect to 'output' and 'cls').
        # Also calculate the total loss in variable 'train_loss'
        loss = criterion(output, cls)
        train_loss += loss.item()
        
        # TODO: Perform the backward propagation on 'loss' and 
        # optimize it through the 'optimizer' step
        loss.backward()
        optimizer.step()
        
        
        # TODO: Calculate and store the total training accuracy
        # in the variable 'train_acc'.
        # Remember, you need to find the 
        train_acc += (output.argmax(1) == cls).sum().item()        

    # TODO: Adjust the learning rate here using the scheduler step
    scheduler.step()
    

    return train_loss/len(train_data), train_acc/len(train_data)

## Define the test function

Using the framework of the `train()` function in the previous cell, try to figure out the structure of the test function below.

In [7]:
def test(test_data):
    
    # Initial values of test loss and test accuracy
    
    test_loss = 0
    acc = 0
    
    # TODO: Use DataLoader class to load the data
    # into non-shuffled batches of appropriate sizes.
    # Remember, you need to generate batches here too.
    data = DataLoader(test_data, batch_size=BATCH_SIZE, collate_fn=generate_batch)
    
    
    for text, offsets, cls in data:
        
        text, offsets, cls = text.to(device), offsets.to(device), cls.to(device)
        
        # Hint: There is a 'hidden hint' here. Let's see if you can find it :)
        with torch.no_grad():
        
            
            # TODO: Get the model output
            output = model(text, offsets)
            
            
            # TODO: Calculate and add the loss to find total 'loss'
            loss = criterion(output, cls)
            test_loss += loss.item()
            
            
            # TODO: Calculate the accuracy and store it in the 'acc' variable
            acc += (output.argmax(1) == cls).sum().item()
            

    return test_loss / len(test_data), acc / len(test_data)

## Split the dataset and run the model

The original `AG_NEWS` has no validation dataset. For this reason, you need to split the training dataset into training and validation sets with a proper split ratio. The `random_split()` function in PyTorch's `torch.utils.data` package should help you with this. We have already imported it for you. :)

* Consider the initial learning rate as 4.0, number of epochs as 5, training data ratio as 0.9.
* You need to define and use a proper loss function
* Define an Optimization algorithm (Suggestion: SGD)
* Define a scheduler to adjust the learning rate through epochs (gamma parameter = 0.9). (Hint: Look at `StepLR`.)
* Monitor the loss and accuracy values for both training and validation data sets.
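
As a quick sanity check on these settings, the arithmetic can be worked out in plain Python. This sketch assumes the learning rate is multiplied by gamma once per epoch (a step size of 1) and takes the 120000 training rows from the loader's progress output above; it does not use PyTorch itself.

```python
INITIAL_LR = 4.0
GAMMA = 0.9
N_EPOCHS = 5
TRAIN_RATIO = 0.9
N_TRAIN_ROWS = 120000  # from the "120000lines" progress output

# Decaying by gamma once per epoch gives lr_e = INITIAL_LR * GAMMA ** e.
schedule = [INITIAL_LR * GAMMA ** epoch for epoch in range(N_EPOCHS)]

# Sizes of the training and validation parts after the split.
train_n = int(N_TRAIN_ROWS * TRAIN_RATIO)
valid_n = N_TRAIN_ROWS - train_n

for epoch, lr in enumerate(schedule, start=1):
    print(f"epoch {epoch}: lr = {lr:.4f}")
print(train_n, valid_n)  # 108000 12000
```

Under these assumptions the learning rate falls from 4.0 in the first epoch to about 2.62 in the fifth.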

In [8]:
import time
import matplotlib
from torch.utils.data.dataset import random_split

# TODO: Set the number of epochs and the learning rate to 
# their initial values here

N_EPOCHS = 5
LEARNING_RATE = 4
TRAIN_RATIO = 0.9

# TODO: Set the initial validation loss to positive infinity
validation_loss = float('inf')


# TODO: Use the appropriate loss function
criterion = torch.nn.CrossEntropyLoss().to(device)


# TODO: Use the appropriate optimization algorithm with parameters (Suggested: SGD)
optimizer = torch.optim.SGD(model.parameters(), lr=LEARNING_RATE)


# TODO: Use a scheduler function
# gamma parameter = 0.9
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, 1, gamma=0.9)


# TODO: Split the data into train and validation sets using random_split()
# With TRAIN_RATIO = 0.9 this is a 90:10 split
train_n = int(len(train_dataset) * TRAIN_RATIO)
training_data, val_data = random_split(train_dataset, [train_n, len(train_dataset) - train_n])


# TODO: Finish the rest of the code below
for epoch in range(N_EPOCHS):

    start_time = time.time()
    train_loss, train_acc = train(training_data)
    valid_loss, valid_acc = test(val_data)

    secs = int(time.time() - start_time)
    mins = secs // 60
    secs = secs % 60

    print('Epoch: %d' %(epoch + 1), " | time in %d minutes, %d seconds" %(mins, secs))
    print(f'\tLoss: {train_loss:.4f}(train)\t|\tAcc: {train_acc * 100:.1f}%(train)')
    print(f'\tLoss: {valid_loss:.4f}(valid)\t|\tAcc: {valid_acc * 100:.1f}%(valid)')


Epoch: 1  | time in 1 minutes, 18 seconds
	Loss: 0.0265(train)	|	Acc: 84.4%(train)
	Loss: 0.0185(valid)	|	Acc: 90.1%(valid)
Epoch: 2  | time in 1 minutes, 16 seconds
	Loss: 0.0119(train)	|	Acc: 93.6%(train)
	Loss: 0.0185(valid)	|	Acc: 90.2%(valid)
Epoch: 3  | time in 1 minutes, 15 seconds
	Loss: 0.0069(train)	|	Acc: 96.4%(train)
	Loss: 0.0193(valid)	|	Acc: 90.6%(valid)
Epoch: 4  | time in 1 minutes, 15 seconds
	Loss: 0.0038(train)	|	Acc: 98.2%(train)
	Loss: 0.0193(valid)	|	Acc: 91.2%(valid)
Epoch: 5  | time in 1 minutes, 12 seconds
	Loss: 0.0022(train)	|	Acc: 99.1%(train)
	Loss: 0.0215(valid)	|	Acc: 91.1%(valid)


## Let's check the test loss and test accuracy

So you have trained your model and seen how well it performs on the training and validation datasets. Now, you need to check your model's performance against the test dataset. Using the test dataset as input, report the test loss and test accuracy scores of your model.

In [9]:
# TODO: Complete the code below to find 
# the results (loss and accuracy) on the test data

print('Checking the results of test dataset...')
test_loss, test_acc = test(test_dataset)
print(f'\tLoss: {test_loss:.4f}(test)\t|\tAcc: {test_acc * 100:.1f}%(test)')

Checking the results of test dataset...
	Loss: 0.0324(test)	|	Acc: 87.2%(test)


In [10]:
# importing necessary libraries

import re
from torchtext.data.utils import ngrams_iterator
from torchtext.data.utils import get_tokenizer

# labels for the AG_NEWS dataset

ag_news_label = {1 : "World",
                 2 : "Sports",
                 3 : "Business",
                 4 : "Sci/Tec"}

def predict(text, model, vocab, ngrams):
    tokenizer = get_tokenizer("basic_english")
    with torch.no_grad():
        text = torch.tensor([vocab[token]
                            for token in ngrams_iterator(tokenizer(text), ngrams)])
        output = model(text, torch.tensor([0]))
        return output.argmax(1).item() + 1

ex_text_str = "MEMPHIS, Tenn. – Four days ago, Jon Rahm was \
    enduring the season’s worst weather conditions on Sunday at The \
    Open on his way to a closing 75 at Royal Portrush, which \
    considering the wind and the rain was a respectable showing. \
    Thursday’s first round at the WGC-FedEx St. Jude Invitational \
    was another story. With temperatures in the mid-80s and hardly any \
    wind, the Spaniard was 13 strokes better in a flawless round. \
    Thanks to his best putting performance on the PGA Tour, Rahm \
    finished with an 8-under 62 for a three-stroke lead, which \
    was even more impressive considering he’d never played the \
    front nine at TPC Southwind."

vocab = train_dataset.get_vocab()
model = model.to("cpu")

# TODO: Predict the topic of the above given random text (use bigrams)
# Use the proper parameters in the predict() function

print("This is a '%s' news" % ag_news_label[predict(ex_text_str, model, vocab, 2)])

# If you have done everything correctly in this task,
# then the output of this cell should be - "This is a 'Sports' news".

This is a 'Sports' news


# Congratulations! You just designed your first neural classifier!

And probably you have achieved a good accuracy score too. Great job!

## Question 2:
You just tested your model with a new sample text. Try to feed some more random examples of similar text (which you think are related to at least one of the four topics _"World", "Sports", "Business", "Sci/Tec"_ of our problem) to the model and see how your model reacts. Give at least 3 such examples (You are free to include more examples if you wish to).

## Answer 2:

In [11]:
# Ref: https://www.theguardian.com/uk-news/2020/jan/13/queen-gives-reluctant-blessing-to-harry-and-meghans-plans
pred_world = "Queen gives reluctant blessing to Harry and Meghan's plans. \
She agreed to a ‘period of transition’ and stressed the couple remain ‘a valued part of my family’. \
The Queen has given her reluctant blessing to the Duke and Duchess of Sussex to split their time between the \
UK and Canada, making it clear that though she had wanted the couple to remain as full-time working royals, she supported their decision.\
After a historic summit of senior royals at Sandringham, details over exactly how Harry and Meghan will \
carve out the new “progressive” roles they seek remained unclear. The Queen has, however, agreed to a \
“period of transition” and stressed the couple remain “a valued part of my family”.\
But there were “complex matters” still to resolve, and “more work to be done” as she\
said she wants final decisions to be reached in the coming days.\
The Queen’s statement came after 90 minutes of talks, which began\
against the backdrop of Prince William and Prince Harry attempting to stem \
rancorous speculation about their relationship in a joint statement."

print("This is a '%s' news" % ag_news_label[predict(pred_world, model, vocab, 2)])


This is a 'World' news


In [12]:
# Ref: https://www.theguardian.com/football/2020/jan/13/barcelona-quique-setien-to-replace-ernesto-valverde
pred_sport = "Barcelona appoint Quique Setién as head coach to replace Ernesto Valverde\
Barcelona have sacked manager Ernesto Valverde and replaced him with the former Real Betis coach \
Quique Setién. Valverde, who had been in charge since the summer of 2017, becomes the first manager \
at the club to be sacked mid-season since Louis van Gaal 17 years ago. He leaves with Barcelona through \
to the knockout phase of the Champions League as group winners and top of La Liga, which they have won for\
each of the last two years. He had six months left on his contract, plus the option for another year after \
that. Setién has been given a two-and-a-half year contract until June 2022.\
How Barcelona made a right mess of sacking Ernesto Valverde. \
After days of openly pursuing replacements, Valverde was finally informed of the club’s intentions \
on Monday evening. During that time Xavi Hernández and Ronald Koeman, both of whom had expressed their \
desire to coach the club in the future, turned down the opportunity to take over with immediate effect. \
Barcelona had contemplated a series of other names, including Thierry Henry and the former Tottenham manager \
Mauricio Pochettino. Almost a dozen managers had been connected to the club as rumours circulated. The impression was of a club not sure which way to turn."

print("This is a '%s' news" % ag_news_label[predict(pred_sport, model, vocab, 2)])

This is a 'Sports' news


In [13]:
# Ref: https://www.theguardian.com/technology/2020/jan/13/google-parent-company-alphabet-expected-reach-1-trillion-value
pred_tech = "Google parent company Alphabet expected to reach $1tn value soon.\
Alphabet may join Apple, Microsoft and Amazon when it reports latest earnings, another sign of the unstoppable rise of tech.\
Another tech behemoth is poised to join the club of Silicon Valley giants valued at more than $1tn. Alphabet, Google’s parent \
company, reached a value of $993bn on Monday, with analysts expecting it to cross the $1tn mark soon.\
Alphabet would join a select club of tech companies to pass $1tn in value. Apple became the first tech \
company to pass the benchmark in August 2018 and has since risen to be valued at $1.37tn.\
The iPhone company was followed by Microsoft, which passed $1tn in April 2019 and Amazon, \
which joined the club in September. The value of Microsoft has continued to rise but Amazon has slipped back and is now worth $940bn.\
The five most valuable companies in the US are now all tech companies, with Facebook rounding out the pack with a current market capitalization of $631bn."

print("This is a '%s' news" % ag_news_label[predict(pred_tech, model, vocab, 2)])

This is a 'Sci/Tec' news


## Question 3:
Okay, the model probably still works well with the examples you fed to it in the previous question. How about a twist in the plot? Let's feed it some more random text data from completely different genres/topics (not belonging to the four topics mentioned in the previous question). How does your model react now? Give at least 3 such examples (you are free to include more examples if you wish).

Of course the predictions will be limited to the four class labels that your model is trained on. Can you somehow justify the labels that your model predicted now for the given text inputs?
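
Why the prediction is always one of the four labels can be seen from how a class is picked: the model produces one score per class and the largest one wins. The sketch below uses made-up scores (not real model output) and a 0-based label list, and applies a softmax to show that a nearly flat score vector still yields a definite label, only with low confidence.

```python
import math

LABELS = ["World", "Sports", "Business", "Sci/Tec"]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical, nearly flat scores for an off-topic text.
logits = [0.4, 0.1, 0.3, 0.2]
probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)  # argmax
print(LABELS[best], round(probs[best], 3))
```

Inspecting the probability of the winning class, rather than only its label, is one hedged way to judge how much an off-topic prediction should be trusted.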

## Answer 3a:

In [14]:
# Topic: Music, Ref: https://www.theguardian.com/music/2020/jan/11/brit-award-nominations-2020-dave-lewis-capaldi
pred_music = "Brit award nominations 2020: Dave and Lewis Capaldi top pile, with women shut out.\
Only one British woman, Mabel, is nominated across the best album, song and new artist \
categories, which skew heavily in favour of solo men.\
The British music industry’s focus on the male solo artist at the expense of female \
musicians has been thrown into sharp relief by the nominations for the 2020 Brit awards, \
in which only one British woman – and no groups featuring women – was nominated across 25 slots in mixed-gender categories.\
Pop singer Mabel was nominated for best new artist and best song for Don’t Call Me Up – but the nine other song nominees and \
four other new artist nominees are solo British males. (US star Miley Cyrus guests on Mark Ronson’s nominated song Nothing Breaks Like a Heart.)"

print("This is a '%s' news" % ag_news_label[predict(pred_music, model, vocab, 2)])


This is a 'World' news


In [15]:
# Topic: Games, Ref: https://www.theguardian.com/games/2020/jan/09/transport-fever-2-review
pred_game = "Transport Fever 2 review – simple pleasures offer copious fuel for fun. \
There’s much joy to be had building freight networks and watching cities grow … but what about the real-world pitfalls?\
As Britain returns to a daily commute beset with fare hikes and failing rail companies, there is significant appeal to a \
game in which you make the trains run on time. In the same way The Sims allows thirtysomething millennials to experience \
the fantasy of home ownership, so Transport Fever 2 lets you enjoy the thrill of plonking a bullet-train between Brighton and London Victoria.\
The concept of the transport sim is nothing new. Video games have been offering virtual train sets since Sid Meier’s Railroad Tycoon, \
letting players enjoy locomotive logistics without requiring a shed to store all those model networks. But Transport Fever 2 goes \
way beyond laying railroads. Everything from planes to pontoons can be deployed to carry commuters and cargo to your chosen destinations."

print("This is a '%s' news" % ag_news_label[predict(pred_game, model, vocab, 2)])

This is a 'Sci/Tec' news


In [16]:
# Topic: Film, Ref: https://www.theguardian.com/film/2020/jan/13/oscars-2020-nominations-joker-irishman
pred_film = "Joker leads Oscars 2020 pack – but Academy just trumps Baftas for diversity.\
Less than a week since Bafta’s strikingly white and male awards shortlist met \
with widespread criticism – including from the organisation’s own chief executive \
– the Academy of Motion Picture Arts and Sciences has released a set of nominations \
whose small concessions to diversity seem striking by contrast.\
Cynthia Erivo is nominated for best actress for her role in a biopic of abolitionist Harriet Tubman, \
and Parasite – Bong Joon-ho’s acclaimed South Korean black comedy – is up for six awards, including best director and best picture.\
Little Women, Greta Gerwig’s so-far-overlooked take on the Louisa May Alcott classic, also scored six nominations, \
including best picture and best adapted screenplay – but Gerwig was locked out of the all-male best director shortlist."


print("This is a '%s' news" % ag_news_label[predict(pred_film, model, vocab, 2)])

This is a 'World' news


## Answer 3b:
Of course the predictions will be limited to the four class labels that your model is trained on. Can you somehow justify the labels that your model predicted now for the given text inputs?


Answers are in the report

## Question 4:
Your model has probably achieved a good accuracy score. However, there may be many things that you could still try in order to improve your classifier. Can you list some changes that you think would improve the above model's performance?

_(Hint: Maybe think about alternate architectures, number of layers, hyper-parameters, etc., but try not to come up with overly complex ideas! :) )_

## Answer 4:
Answers are in the report




# Important Notes

## NOTE 1:
If you want, you can try out the models on other datasets too for comparison. Although this is not mandatory, it would be really interesting to see how your model performs on data from different domains. Note that you may need to tweak the code a little when you consider other datasets and formats.

## NOTE 2:
Any form of plagiarism is strictly prohibited. If it is found that you have copied sample code from the internet, the entire team will be penalized.

## NOTE 3:
Jupyter Notebooks often stop working or crash when memory is overloaded (many variables, big neural models, memory-intensive training of models, etc.). Moreover, as the number of tasks grows, the number of variables that you use will surely increase. Therefore, it is recommended that you use separate notebooks for each _Task_ in this project.

## NOTE 4:
You are expected to write well-documented code, that is, with proper comments wherever you think they are needed. Make sure you write a comprehensive report for the entire project consisting of data analysis, your model architecture, the methods used, a discussion and comparison of the models against the accuracy and loss metrics, and a final conclusion. If you want to prepare separate reports for each _Task_, you could do this in the Jupyter Notebook itself using Markdown and $\LaTeX$ code if needed. If you want to submit a single report for the entire project, you could submit a PDF file instead (prepared in Word or $\LaTeX$).

All the very best for Project 3. Wishing you happy holidays and a very happy new year in advance! :)