<a href="https://colab.research.google.com/github/gmdeorozco/NLP_Practice_Rep/blob/main/bert_sentiment_analysis%20/BERT_Sentiment_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install tensorflow_hub
!pip install tensorflow_text
!pip install tf-models-official

Collecting tensorflow_text
  Downloading tensorflow_text-2.16.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (5.2 MB)
Collecting tensorflow<2.17,>=2.16.1 (from tensorflow_text)
  Downloading tensorflow-2.16.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (589.8 MB)
Collecting h5py>=3.10.0 (from tensorflow<2.17,>=2.16.1->tensorflow_text)
  Downloading h5py-3.11.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (5.3 MB)
Collecting ml-dtypes~=0.3.1 (from tensorflow<2.17,>=2.16.1->tensorflow_text)
  Downloading ml_dtypes-0.3.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.2 MB)

# Classify text with BERT
### Learning Objectives


- Learn how to load a pre-trained BERT model from TensorFlow Hub
- Learn how to build your own model by combining with a classifier
- Learn how to train a BERT model by fine-tuning
- Learn how to save your trained model and use it
- Learn how to evaluate a text classification model

### Before you start
Attach a GPU to your Notebook instance (1 x NVIDIA Tesla T4 should be enough) so that training doesn't take too long.

## About BERT
BERT and other Transformer encoder architectures have been wildly successful on a variety of tasks in NLP (natural language processing). They compute vector-space representations of natural language that are suitable for use in deep learning models. The BERT family of models uses the Transformer encoder architecture to process each token of input text in the full context of all tokens before and after, hence the name: Bidirectional Encoder Representations from Transformers.

BERT models are usually pre-trained on a large corpus of text, then fine-tuned for specific tasks.
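
To make "the full context of all tokens before and after" concrete, the minimal sketch below contrasts the attention pattern of a bidirectional encoder with a left-to-right (causal) one. It is plain Python for illustration only, not BERT's actual implementation, and the helper name `attention_mask` is hypothetical.

```python
def attention_mask(num_tokens, bidirectional=True):
    """Return a num_tokens x num_tokens matrix of 0/1 flags.

    mask[i][j] == 1 means token i may attend to token j.
    """
    return [
        [1 if bidirectional or j <= i else 0 for j in range(num_tokens)]
        for i in range(num_tokens)
    ]


tokens = ["the", "movie", "was", "great"]

# Encoder-style (BERT-like): every token sees every other token.
encoder_mask = attention_mask(len(tokens), bidirectional=True)

# Left-to-right: each token sees only itself and earlier tokens.
causal_mask = attention_mask(len(tokens), bidirectional=False)

for token, enc_row, causal_row in zip(tokens, encoder_mask, causal_mask):
    print(f"{token:>6}  encoder={enc_row}  causal={causal_row}")
```

In the encoder rows, the representation of "movie" can draw on "great" even though it comes later in the sentence, which is the sense in which the model is bidirectional.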





In [None]:
import tensorflow as tf
print("TensorFlow version:", tf.__version__)

TensorFlow version: 2.16.1


In [None]:
import os
import warnings

warnings.filterwarnings("ignore")
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"

### Explanation of Code

This cell configures warning and logging behavior for the notebook session. Line by line:

1. `import os`: Imports the Python `os` module, which provides access to operating-system functionality such as file paths and environment variables.

2. `import warnings`: Imports the Python `warnings` module, which is used to issue and filter warnings, for example about deprecated features.

3. `warnings.filterwarnings("ignore")`: Installs a filter that ignores all Python warnings for the rest of the session. This keeps the notebook output uncluttered, at the cost of also hiding warnings that might matter.

4. `os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"`: Sets the environment variable `TF_CPP_MIN_LOG_LEVEL` to `"2"`. TensorFlow reads this variable to decide which messages its C++ backend logs: `"2"` filters out informational and warning messages, so only errors are shown. The variable generally needs to be set before TensorFlow is first imported, and the previous cell has already imported `tensorflow`, so the setting may not take effect in this session.

Overall, this code snippet configures the environment settings to suppress warnings and control the logging behavior of TensorFlow, creating a cleaner output in the Jupyter Notebook environment.
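
A more conservative variant is sketched below, assuming the goal is to quiet TensorFlow's native logs without hiding every Python warning for the whole session. TensorFlow's variable is named `TF_CPP_MIN_LOG_LEVEL`, and the sketch sets it before any `import tensorflow`. The `noisy_call` function is a hypothetical stand-in for library code that emits a warning.

```python
import os
import warnings

# Set this before TensorFlow is imported for the first time:
# 0 = all messages, 1 = hide INFO, 2 = hide INFO and WARNING, 3 = also hide ERROR.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"


def noisy_call():
    """Hypothetical stand-in for library code that emits a warning."""
    warnings.warn("this API is deprecated", DeprecationWarning)
    return 42


# Suppress warnings only inside this block instead of for the whole session.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    result = noisy_call()

print(result)
```

Scoping the filter with `warnings.catch_warnings()` restores the previous warning behavior when the block exits, so warnings raised elsewhere in the notebook remain visible.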


In [None]:
import datetime
import shutil

import matplotlib.pyplot as plt
import tensorflow_hub as hub
import tensorflow_text as text
from google.cloud import aiplatform
from official.nlp import optimization

tf.get_logger().setLevel("ERROR")

### Explanation of Code

The provided code imports various libraries and modules commonly used in machine learning and deep learning tasks. Let's break down each line:

1. `import datetime`: This line imports the Python `datetime` module, which provides classes for manipulating dates and times.

2. `import shutil`: This line imports the Python `shutil` module, which provides functions for file operations, such as copying, moving, and deleting files and directories.

3. `import matplotlib.pyplot as plt`: This line imports the `pyplot` submodule from the `matplotlib` library, which is used for creating plots and visualizations in Python.

4. `import tensorflow_hub as hub`: This line imports the `tensorflow_hub` module, which is a library for reusable machine learning modules. It allows you to load and use pre-trained models from TensorFlow Hub.

5. `import tensorflow_text as text`: This line imports the `tensorflow_text` module, which provides text-processing ops for TensorFlow, such as tokenizers. The module is not called directly in this cell; importing it registers the custom ops that BERT preprocessing models loaded from TensorFlow Hub depend on.

6. `from google.cloud import aiplatform`: This line imports the Vertex AI SDK for Python (the `google-cloud-aiplatform` package), which is used to work with Google Cloud's Vertex AI services, for example to train and deploy machine learning models.

7. `from official.nlp import optimization`: This line imports the `optimization` module from the TensorFlow Model Garden (installed above as `tf-models-official`). It contains optimizer utilities, such as learning-rate warmup schedules, that are commonly used when fine-tuning NLP models.

8. `tf.get_logger().setLevel("ERROR")`: This line sets the logging level of TensorFlow to "ERROR", which means that only error messages will be displayed in the output. This can help suppress unnecessary logging messages and keep the output clean, especially in Jupyter Notebook environments.

Overall, this code snippet imports necessary libraries and sets up the environment for machine learning and deep learning tasks, including text processing and optimization, while configuring TensorFlow's logging level for a cleaner output.
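
Because these imports depend on the packages installed at the top of the notebook, a quick pre-flight check can give a clearer message than an `ImportError` in the middle of a cell. The sketch below uses only the standard library; `missing_modules` is a hypothetical helper and not part of any of the imported packages.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be located."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package is absent or a module has no spec.
            found = False
        if not found:
            missing.append(name)
    return missing


required = [
    "tensorflow_hub",
    "tensorflow_text",
    "official.nlp",
    "google.cloud.aiplatform",
    "matplotlib",
]
print("Missing modules:", missing_modules(required))
```

An empty list means every module was found. Note that checking a dotted name such as `official.nlp` imports its parent package as a side effect.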


In [None]:
print("Num GPUs Available: ", len(tf.config.list_physical_devices("GPU")))

Num GPUs Available:  1


### Sentiment Analysis

This notebook trains a sentiment analysis model to classify movie reviews as positive or negative, based on the text of the review.

You'll use the Large Movie Review Dataset, which contains the text of 50,000 movie reviews from the Internet Movie Database (IMDB).

## Download the IMDB dataset

In [None]:
url = 'https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz'

# Use a folder outside the git repo so that git does not index the data in JupyterLab
path = '/home/jupyter'

# Download the archive into `path` and extract it there
dataset = tf.keras.utils.get_file(
    "aclImdb_v1.tar.gz", url, untar=True, cache_dir=path, cache_subdir=''
)

dataset_dir = os.path.join(os.path.dirname(dataset),'aclImdb')

train_dir = os.path.join(dataset_dir,'train')

# remove unused folders to make it easier to load the data
remove_dir = os.path.join(train_dir, 'unsup')
shutil.rmtree(remove_dir)

Downloading data from https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz
84125825/84125825 ━━━━━━━━━━━━━━━━━━━━ 16s 0us/step

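
After the `unsup` folder is removed, `train` is expected to hold one subfolder per label (`pos` and `neg` in the archive's layout), each containing one `.txt` file per review. The sketch below counts the files per label as a sanity check. `count_reviews` is a hypothetical helper, and the demo builds a tiny throwaway directory so that it runs without the real download.

```python
import tempfile
from pathlib import Path


def count_reviews(split_dir):
    """Map each label subfolder of `split_dir` to its number of .txt files."""
    return {
        label_dir.name: sum(1 for _ in label_dir.glob("*.txt"))
        for label_dir in sorted(Path(split_dir).iterdir())
        if label_dir.is_dir()
    }


# Demo on a throwaway directory that mimics the train/pos and train/neg layout.
with tempfile.TemporaryDirectory() as tmp:
    for label, n_files in [("pos", 3), ("neg", 2)]:
        folder = Path(tmp) / "train" / label
        folder.mkdir(parents=True)
        for i in range(n_files):
            (folder / f"{i}_demo.txt").write_text("sample review text")
    demo_counts = count_reviews(Path(tmp) / "train")

print(demo_counts)
```

On the real data, calling `count_reviews(train_dir)` after the cell above should report an equal number of reviews under each label; loose files that sit directly in `train` are skipped because only subfolders are counted.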