<a href="https://colab.research.google.com/github/sahug/ds-bert/blob/main/BERT%20NLP%20-%20IMDB%20-%20Text%20Classification%20using%20BERT%20and%20Tensorflow.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**BERT NLP - IMDB - Text Classification using BERT**

Here we will see how to use a pretrained **BERT** encoder from TensorFlow Hub to train a classifier on the IMDb dataset that determines whether a movie review is positive or negative.

**Load Dataset**

In [1]:
import numpy as np

In [2]:
%pip install -qq datasets

In [3]:
from datasets import load_dataset
imdb = load_dataset("imdb")

Reusing dataset imdb (/root/.cache/huggingface/datasets/imdb/plain_text/1.0.0/2fdd8b9bcadd6e7055e742a706876ba43f19faee861df134affd7a3f60fc38a1)



In [4]:
imdb

DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })
    unsupervised: Dataset({
        features: ['text', 'label'],
        num_rows: 50000
    })
})

In [5]:
imdb["train"][0]

{'label': 0,
 'text': 'I rented I AM CURIOUS-YELLOW from my video store because of all the controversy that surrounded it when it was first released in 1967. I also heard that at first it was seized by U.S. customs if it ever tried to enter this country, therefore being a fan of films considered "controversial" I really had to see this for myself.<br /><br />The plot is centered around a young Swedish drama student named Lena who wants to learn everything she can about life. In particular she wants to focus her attentions to making some sort of documentary on what the average Swede thought about certain political issues such as the Vietnam War and race issues in the United States. In between asking politicians and ordinary denizens of Stockholm about their opinions on politics, she has sex with her drama teacher, classmates, and married men.<br /><br />What kills me about I AM CURIOUS-YELLOW is that 40 years ago, this was considered pornographic. Really, the sex and nudity scenes are f

In [6]:
imdb["train"]["text"][0]

'I rented I AM CURIOUS-YELLOW from my video store because of all the controversy that surrounded it when it was first released in 1967. I also heard that at first it was seized by U.S. customs if it ever tried to enter this country, therefore being a fan of films considered "controversial" I really had to see this for myself.<br /><br />The plot is centered around a young Swedish drama student named Lena who wants to learn everything she can about life. In particular she wants to focus her attentions to making some sort of documentary on what the average Swede thought about certain political issues such as the Vietnam War and race issues in the United States. In between asking politicians and ordinary denizens of Stockholm about their opinions on politics, she has sex with her drama teacher, classmates, and married men.<br /><br />What kills me about I AM CURIOUS-YELLOW is that 40 years ago, this was considered pornographic. Really, the sex and nudity scenes are few and far between, ev

In [7]:
imdb["train"]["label"][0]

0

In [8]:
np.unique(imdb["train"]["label"])

array([0, 1])
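
Knowing that the labels are 0 and 1 says nothing about how they are distributed. A minimal sketch of a class-balance check follows; `label_distribution` and `toy_labels` are hypothetical names used only for illustration, and the toy list stands in for `imdb["train"]["label"]`.

```python
from collections import Counter

def label_distribution(labels):
    """Return {label: fraction of examples} for a list of integer labels."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in sorted(counts.items())}

# Toy labels (assumption: 0 = neg, 1 = pos, as in the dataset's ClassLabel names).
toy_labels = [0, 1, 1, 0, 1, 0, 0, 1]
distribution = label_distribution(toy_labels)
print(distribution)
```

In the notebook, the same function could be called as `label_distribution(imdb["train"]["label"])`.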

**Look at Dataset**

In [9]:
from datasets import ClassLabel, Sequence
import random
import pandas as pd
from IPython.display import display, HTML

def show_random_elements(dataset, num_examples=10):
    """Display randomly chosen rows, with class ids mapped to their names."""
    assert num_examples <= len(
        dataset
    ), "Can't pick more elements than there are in the dataset."
    # Sample distinct row indices without replacement.
    picks = random.sample(range(len(dataset)), num_examples)

    df = pd.DataFrame(dataset[picks])
    for column, typ in dataset.features.items():
        if isinstance(typ, ClassLabel):
            df[column] = df[column].transform(lambda i: typ.names[i])
        elif isinstance(typ, Sequence) and isinstance(typ.feature, ClassLabel):
            df[column] = df[column].transform(
                lambda x: [typ.feature.names[i] for i in x]
            )
    display(HTML(df.to_html()))

In [10]:
show_random_elements(imdb["train"])

Unnamed: 0,text,label
0,"The film attempts to be a mockumentary--shot in the documentary form but with many obviously scripted parts--but fails in not providing the audience with any characters with which to create the illusion of the mockumentary. Also, the film purports to be about finding real love in Los Angeles, but is nothing more than an uninteresting look at an amateur filmmaker trying to make his first ""big movie.""",neg
1,"Emily Watson's Natalia is absolutely the most loving and romantic lead character I have ever seen on a screen. She is the queen of this film beyond all doubt. Or, is she transmuted to the king? The internecine weaving of the chess games and the families' struggles for control, power, and victory is stunning. Just as the chess masters in the film do, the director is playing many simultaneous games with our mind at once, but all weave into either major or minor patterns. The period, the costumes, and imagery of early 20th century Italy's lake district is captured magnificently. Not a single square of space is wasted.<br /><br />So many brilliant scenes abound, I cannot recount them all. I recommend budgeting enough time to watch this movie twice, possibly a week apart, because you can't possibly capture all the poetry within a 64-square yet multi-dimensional framework in one setting. <br /><br />I did not read Nabakov's book, but to try an analogy of my own, what I am reading reminds of me of another romantically triumphant poetry-as-game movie, Barry Levinson's The Natural. It totally jettisoned the downbeat ending of Bernard Malamud's fatalistic book in favor of a romantic impressionism that was uniquely American. Well, the director did that one better by seamlessly meshing Russian and Italian morals and mores as a backdrop to enlightenment. The true story here is that games are zero-sum; there is a winner and a loser, unless both contestants draw. But, in life, and especially in the context of our immortal souls, we are only limited by those constraints and life's conventions to the extent we let others break our spirit. <br /><br />Pure love, as personified by Emily Watson's Natalia, can transcend and allow all of us to be enhanced by its gifts simultaneously. Only the barriers erected by our fears can cut us off from it.<br /><br />This is a magnificent movie (10/10).",pos
2,"On the day of the California Presidential Primary, between midnight and 1:00 AM, the spy Victor Rovner sends a message from Kuala Lumpur to USA. Meanwhile, in Los Angeles, the Federal Agent Jack Bauer has returned to his family and is having trouble at home with his teenage daughter Kimberly, who blames her mother Teri for putting Jack out of the house. Teri and Jack decide to have a serious conversation with Kim, and they discover that the girl has run away home. While trying to solve his domestic problem, Jack is called to his Counter Terrorist Unit by his colleague Nina Myers for a meeting with their chief Richard Walsh, who discloses a menace against the life of Senator David Palmer, who is running for president, and they need to find the shooter. Later, Walsh has a private conversation with Jack and tells that there is a conspiracy in the agency against David Palmer, and assigns Jack to find the conspirators. When an airplane explodes over the Mojave Desert, Jack has one additional issue to worry about.<br /><br />The first episode of ""24"" is a promising beginning of a successful series, introducing Jack Bauer. This is the first time that I have watched this show and I confess that I liked what I have seen: a complex and dramatic story, with multiple and realistic characters. Kiefer Sutherland is perfect in the role of a family man and a reliable agent in charge of three difficult missions at the same time: find a killer to protect an important politician; find a traitor in his agency; and find his teenage daughter, who is getting in trouble, while trying to save his marriage. My vote is eight.<br /><br />Title (Brazil): ""12:00""",pos
3,"I read several mixed reviews and several of them downright trashed the movie. I originally became interested in this project because it was being directed by Tony Scott and I have become very interested in his work after Man On Fire had such a profound impact on me. Before I start my review, let me first say this...it's wonderful to see that this movie could have been told in a boring and ordinary manner, yet the writers and Scott chose a different approach.<br /><br />Plot:<br /><br />Simply stated, it's not boring. Most Hollywood movies give 'tried and true' plots that they know will connect with people, often ensuring the audiences acceptance of the film and creating a higher probability of profit. This plot was one of the more interesting ones I had seen in a while. Just for reference, I recently watched 'The Weather Man' and 'Lord of War' and while I will say that these movies are excellent, and I enjoyed them both tremendously, both the plots in these movies are boring and they are told exactly how you would expect them to be told. They don't take any chances whatsoever, and they are extremely predictable after you've watched a fair amount of American films. Domino's plot is both interesting and told in a manner that keeps you thinking, ""oh man, they're screwed now"". And I feel that has been lacking in a lot of recent films. It has a lot of depth to it, in my opinion, and gives you plenty of things to question while watching it. Overall, this is what kept me so interested in the movie.<br /><br />Characters:<br /><br />I felt that the characters were accurate. Knightley did a wonderful job of portraying a beautiful woman, who was anything but on the inside and wanted to be viewed as what she was. 
It was obvious that she wanted to prove herself and she took whatever means she had to accomplish that.<br /><br />Choco was also very believable, his use of Spanish in inappropriate situations, his reactions to Domino's lack of affection, as well as his jealousy issues within the team...they all rang true to me, which made me feel that his character was that much more realistic.<br /><br />Rourke's character was the least interesting to me, but it still rang true to me. He seemed like an ordinary guy, trying to make ends meet. I hope that's what the filmmakers were trying to accomplish with him because that's what I got out of it. He did a very good job of showing Ed in an Average Joe kind of way that has made his mistakes, yet is still trying to live.<br /><br />Claremont/Ladies: I believe that they provided much needed 'heart' to the story. They weren't just people who are out getting money to buy a Bentley, these were real people who had a real problem and they sought others mean to accomplish that goal. You could empathize with them because, to them, this child's illness was a problem with no other solution. These characters were supposed to show real people who are less fortunate who got into this mess because they needed help.<br /><br />The mobsters: They made the story seem sinister in a way that only the mob can. And I really liked that part. They also padded the story with small intricacies that made the plot that much more interesting.<br /><br />Christopher Walken/90210 guys:<br /><br />They provided the comic relief in an otherwise very serious movie. From Walken's awkward statements to the ceaseless references to the 90210 guys being has-beens. Their involvement in the movie only made it that much more enjoyable.<br /><br />Cinematography....yes....the cinematography. This is where this movie seems to have lost a lot of potential fans. 
But in my opinion I thought it was genius, the use of the camera to translate the mood of the current situation was extremely effective in my opinion. I considered it a method that was properly realized but could always use improvement, just like anything else. I both applaud and congratulate Scott, the editor, the cinematographer and the director of photography on taking some real chances with this movie. Not only did they go far and above with its presentation, they went that much further. The use of colors, both extremely light and extremely dark provided the 'look' of the film with a sinister and grungy look that accurately depicts the life of the mob, bounty hunters and the less fortunate in a manner that show that their life isn't as peachy or 'clean' as everyone else. If you notice, in times of less stress or conflict, there were very few camera tricks if any at all. This shows that Scott and his crew were trying to achieve something with this look and weren't just doing it for the heck of it. I realize that most people who watched this movie weren't expecting it and it cause many of them to be turned off to this film but I think it was great that Scott took this approach. Hollywood films have grown predictable and bland. Most of them are shot in the same manner with the same twists and turns. And I'm glad that Scott tried to make something different.<br /><br />Granted, this movie isn't for everyone, but to say it's trash and has nothing to offer is completely missing the point. I thoroughly enjoyed this film and I'm glad that I spent the money for it. I would recommend this to all, but I'm sure it will only hit a chord with few. I must agree with an earlier poster when he said that many of those who refuse to see outside the 'sphere of MTV' won't appreciate this movie, but I think many people will. We should all try to enjoy it for the fact that Scott and co. took some chances and tried to deliver something that was different and unique. 
And with that in mind, I think he succeeded tremendously.",pos
4,"Without question, the worst ELVIS film ever made. The movie portrays all Indians as drunk, stupid, and lazy. Watch ELVIS's skin change color throughout the film.",neg
5,"I caught Evening in the cinema with a lady friend. Evening is a chick flick with no apologies for being such, but I can say with some relief that it's not so infused with estrogen that it's painful for a red-blooded male to watch. Except for a single instance at the very end of the movie, I watched with interest and did not have to turn away or roll my eyes at any self-indulgent melodrama. Ladies, for their part, will absolutely love this movie.<br /><br />Ann Lord is elderly, bed-ridden and spending her last few days on Earth as comfortably as possible in her own home with her two grown daughters at her side. Discomfited by the memories of her past, Ann suddenly calls out a man's name her daughters have never heard before: Harris. While both of her daughters silently contemplate the significance of their mother's strong urge to recall and redress her ill-fated affair with this mysterious man at this of all times, Ann lapses back in her head to the fateful day she met Harris - and in doing so, lost the youthful optimism for the future that we all inevitably part ways with.<br /><br />Both Ann and her two daughters - one married with children, one a serial ""commitophobe"" - struggle with the central question of whether true love really exists, and perhaps more importantly, if true love can endure the test of time. Are we all one day fated to realize that love never lasts forever? Will we all realize that settling for the imperfect is the only realistic outcome? The subtle fact that the aged Ann is still wrestling with an answer to these questions on her deathbed is not lost on her two daughters.<br /><br />The cinematography for Evening is interesting - most of the film is spent in Ann's mind as she recalls the past, and for that reason I think the film was shot as if it was all deliberately overexposed, to give everyone an ethereal glow (and thus make it very obvious that all of this is not real, but occurred in the past). 
Claire Danes is beautiful (appearing to be really, really tall, though just 5' 5"" in reality), and is absolutely captivating in one climactic scene where her singing talents are finally put to the test.<br /><br />You can't really talk trash about the cast, which leads off with Claire Danes and doesn't let up from there: Vanessa Redgrave, Patrick Wilson, Meryl Streep and Glenn Close fill out the other major and minor roles in the film.<br /><br />I can't really say anything negative about this film at all, though Hugh Dancy's struggle to have his character emerge from utter one-dimensionality is in the end a total loss. Playing the spoiled, lovable drunk offspring of the obscenely rich who puts up a front of great bravado but is secretly scared stiff of never amounting to anything probably doesn't offer much in the way of character exploration - he had his orders and stuck to them.<br /><br />In the end, gentlemen, your lady friend will most certainly weep, and while you'll likely not feel nearly as affected, the evening will definitely not be a waste for the time spent watching Evening. Catch it in theatres or grab it as a rental to trade off for points for when you want to be accompanied to a viewing of Die Hard 4 or the upcoming Rambo flick. It'll be your little secret that this viewing didn't really cost you much at all.",pos
6,"Typical 90's comedy, situational comedy similar to our modern day ""My Family"". Thatcher being the height of most political jokes, Bill (Belinda Lang) blames Thatcher for anything she can. ""Bloody Thatcher"" possibly shared with most of us. David the typical teenager, cutting up brains with bread knives, Jenny, the moody older teenage child, only interested in boys and more boys. Bill and Ben working as much as they can to keep their family afloat struggling within the economical climate of the early 90's. Granted the first two series were not as successful as the latter however, series 3 onwards is where it all kicks off with more laughs that i care to count. overall this show didn't get the best viewing times and they ought to have held on a bit longer. clearly they couldn't have carried on after Gary Olsen died but i think they should get rid of ""catherine Tate"" ""the office"" ""little Britain"" and bring back the classics!",pos
7,"Movie industry is tricky business - because decisions have to be made and everyone involved has a private life, too. That's the very original thesis of this feeble attempt at making an 'insightful' film about film. And indeed, no better proof of the industry's trickiness than seeing Anouk Aimée and Maximilian Schell trapped in this inanity. The insight consists of talking heads rattle off bullshit like ""should I make a studio movie that pays a lot or should I make an indie item and stay true to my artistic self?"" ""Do the latter, please."" Or: ""our relationship is not only professional, it's private as well. It's a rather complex situation to handle, isn't it?"" ""Yes, it is, my dear."" Between the insipid dialogs one gets glimpses of palm trees, hotel lobbies and American movie posters (no sign of non-American film presence on the Croisette). Recurrent slumber sessions are inevitable, making the 100 minutes of the film feel like ages. Jenny Gabrielle is spectacularly unconvincing in justifying her own presence in the frame.",neg
8,"Part Two picks up... not where the last film left off. As part of the quasi-conventionality of Steven Soderbergh's epic 4+ hour event, Che's two stories are told as classic ""Rise"" and ""Fall"" scenarios. In Part Two, Che Guevara, leaving his post as a bureaucrat in Cuba and after a failed attempt in the Congo (only in passing mentioned in the film), goes down to Bolivia to try and start up another through-the-jungle style revolution. Things don't go quite as well planned, at all, probably because of Che's then notorious stature as a Communist and revolutionary, and in part because of America's involvement on the side of the Bolivian Government, and, of course, that Castro wasn't really around as a back-up for Che.<br /><br />As it goes, the second part of Che is sadder, but in some ways wiser than the first part. Which makes sense, as Guevara has to endure low morale from his men, betrayals from those around him, constant mistakes by grunts and nearby peasants, and by ultimately the enclosing, larger military force. But what's sadder still is that Guevara, no matter what, won't give in. One may see this as an incredible strength or a fatal flaw- maybe both- but it's also clear how one starts to see Che, if not totally more fully rounded, then as something of a more sympathetic character. True, he did kill, and executed, and felt justified all the way. And yet it starts to work on the viewer in the sense of a primal level of pity; the sequence where Guevara's health worsens without medicine, leading up to the shocking stabbing of a horse, marks as one of the most memorable and satisfying of any film this year.<br /><br />Again, Soderbergh's command of narrative is strong, if, on occasion, slightly sluggish (understandable due to the big running time), and one or two scenes just feel totally odd (Matt Damon?), but these are minor liabilities. 
Going this time for the straight color camera approach, this is almost like a pure militia-style war picture, told with a great deal of care for the men in the group, as well as Guevara as the Lord-over this group, and how things dwindle down the final scene. And as always, Del-Toro is at the top of his game, in every scene, every beat knowing this guy so well- for better and for worse- that he comes about as close to embodiment as possible. Overall, the two parts of Che make up an impressive package: history as drama in compelling style, good for an audience even if they don't know Che or, better, if they don't think highly of him. It's that special. 8.5/10",pos
9,For me this is a story that starts with some funny jokes regarding Franks fanatasies when he is travelling with a staircase and when he is sitting in business meetings... The problem is that when you have been watching this movie for an hour you will see the same fantasies/funny situations again and again and again. It is to predictable. It is more done as a TV story where you can go away and come back without missing anything.<br /><br />I like Felix Herngren as Frank but that is not enough even when it is a comedy it has to have more variations and some kind of message to it's audience....<br /><br />,neg


**Train and Test Set**

In [11]:
x_train, y_train = imdb["train"]["text"], imdb["train"]["label"]
x_test, y_test = imdb["test"]["text"], imdb["test"]["label"]

**Preprocess**

In [12]:
%pip install -qq tensorflow_hub
%pip install -qq tensorflow_text

In [13]:
import tensorflow_hub as hub
import tensorflow_text as text

In [14]:
# Maps raw review strings to fixed-length BERT inputs (128 token positions per example).
preprocessor = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3")

def pre_processor(example):
    return preprocessor(example)

pre_processor([imdb["train"]["text"][0]])

{'input_mask': <tf.Tensor: shape=(1, 128), dtype=int32, numpy=
 array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]],
       dtype=int32)>,
 'input_type_ids': <tf.Tensor: shape=(1, 128), dtype=int32, numpy=
 array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 
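
The preprocessing output holds three tensors of shape `(batch, 128)`. The all-ones `input_mask` above indicates that this review fills every one of the 128 positions. The sketch below illustrates what the three fields mean, using plain Python lists. It is not the TF Hub implementation: `pad_to_length` is a hypothetical helper, the token ids are made up, and special-token handling during truncation is ignored.

```python
def pad_to_length(token_ids, seq_length=128, pad_id=0):
    """Truncate or pad token ids to seq_length and build the matching mask."""
    ids = list(token_ids)[:seq_length]
    padding = seq_length - len(ids)
    return {
        # Token ids, padded with pad_id up to seq_length.
        "input_word_ids": ids + [pad_id] * padding,
        # 1 marks a real token, 0 marks padding.
        "input_mask": [1] * len(ids) + [0] * padding,
        # All zeros for single-segment input such as one review.
        "input_type_ids": [0] * seq_length,
    }

# Hypothetical token ids for a short review, padded to 8 positions.
example = pad_to_length([101, 2023, 3185, 102], seq_length=8)
print(example["input_mask"])
```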

In [15]:
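# hub.KerasLayer defaults to trainable=False, so the BERT weights stay frozen
# and only layers added on top of the encoder are updated during training.
encoder = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4")

def encode_input(preprocessed_text):
    return encoder(preprocessed_text)

encode_input(pre_processor([imdb["train"]["text"][0]]))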

{'default': <tf.Tensor: shape=(1, 768), dtype=float32, numpy=
 array([[-0.49452856, -0.49791604, -0.9572977 ,  0.34986803,  0.8465855 ,
         -0.13578144, -0.5889829 ,  0.41979802, -0.83022875, -0.99984914,
         -0.64828956,  0.9143599 ,  0.96066296,  0.46246734,  0.5033797 ,
         -0.28232867,  0.43113717, -0.49907467,  0.3578105 ,  0.91972834,
          0.39635044,  0.9999954 , -0.3181754 ,  0.44896606,  0.26785532,
          0.9436459 , -0.55513763,  0.7681533 ,  0.7290908 ,  0.6581166 ,
          0.2403225 ,  0.27039534, -0.9718417 , -0.16731699, -0.98095185,
         -0.978594  ,  0.3959105 , -0.22986463,  0.03160511,  0.04872655,
         -0.59069663,  0.31907785,  0.9999714 , -0.6494494 ,  0.5693388 ,
         -0.21495499, -0.9999487 ,  0.27358246, -0.41509196,  0.88994277,
          0.866684  ,  0.97687024,  0.24196966,  0.38024858,  0.39471295,
         -0.6547417 , -0.15274712,  0.05646044, -0.27651197, -0.4769802 ,
         -0.53807133,  0.3107337 , -0.7833923 , -0

**Model**

In [16]:
import tensorflow as tf
from tensorflow import keras

inputs = keras.layers.Input(shape=(), dtype=tf.string, name="inputs")
preprocess = pre_processor(inputs)
encode = encode_input(preprocess)

nn1 = keras.layers.Dropout(0.1, name="dropout")(encode["pooled_output"])
nn1 = keras.layers.Dense(1, activation=keras.activations.sigmoid, name="output")(nn1)

model = keras.Model(inputs=[inputs], outputs=[nn1])

model.summary()

Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 inputs (InputLayer)            [(None,)]            0           []                               
                                                                                                  
 keras_layer (KerasLayer)       {'input_word_ids':   0           ['inputs[0][0]']                 
                                (None, 128),                                                      
                                 'input_type_ids':                                                
                                (None, 128),                                                      
                                 'input_mask': (Non                                               
                                e, 128)}                                                      
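
The model ends in a single sigmoid unit, so each prediction is a probability that the review is positive. A minimal sketch of how such an output could be turned into a label is shown below; `sigmoid` and `to_label` are hypothetical helpers, and the 0.5 threshold is an assumption that matches the default used by the Keras binary metrics.

```python
import math

def sigmoid(logit):
    """Squash a real-valued score into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-logit))

def to_label(probability, threshold=0.5):
    """Map a predicted probability to the dataset's class names."""
    return "pos" if probability > threshold else "neg"

# Illustrative scores, not real model outputs.
for score in (-2.0, 0.3, 2.0):
    probability = sigmoid(score)
    print(score, round(probability, 3), to_label(probability))
```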

In [17]:
# Compile the model and track accuracy, precision, and recall during training.
METRICS = [
    tf.keras.metrics.BinaryAccuracy(name="accuracy"),
    tf.keras.metrics.Precision(name="precision"),
    tf.keras.metrics.Recall(name="recall"),
]

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), 
              loss=tf.keras.losses.BinaryCrossentropy(),
              metrics=METRICS)
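
To make the loss and the three metrics concrete, here is a plain-Python sketch of how they could be computed from true labels and predicted probabilities. It is an illustration under a 0.5 threshold, not the Keras implementation; `binary_metrics` and `binary_cross_entropy` are hypothetical names, and the toy values are made up.

```python
import math

def binary_metrics(y_true, y_prob, threshold=0.5):
    """Compute accuracy, precision, and recall for thresholded predictions."""
    y_pred = [1 if p > threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean negative log-likelihood of the true labels."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)

# Toy labels and probabilities, not real model outputs.
y_true = [1, 0, 1, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.7, 0.6]
metrics = binary_metrics(y_true, y_prob)
loss = binary_cross_entropy(y_true, y_prob)
print(metrics, round(loss, 4))
```

A classifier that outputs 0.5 for every example has a cross-entropy of ln 2, about 0.693, which is a useful reference point when reading the loss at the start of training.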

In [None]:
history = model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=2)

Epoch 1/2
  2/782 [..............................] - ETA: 3:03:29 - loss: 0.7197 - accuracy: 0.4844 - precision: 0.5278 - recall: 0.5429
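
The `782` in the progress bar is the number of batches per epoch. Because `model.fit` is called without `batch_size`, Keras uses its default of 32, and the 25,000 training reviews are split accordingly. The arithmetic can be sketched as follows (`steps_per_epoch` is a hypothetical helper):

```python
import math

def steps_per_epoch(num_examples, batch_size=32):
    """Number of batches needed to see every example once."""
    return math.ceil(num_examples / batch_size)

steps = steps_per_epoch(25000)
print(steps)
```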