<a href="https://colab.research.google.com/github/qmeng222/transformers-for-NLP/blob/main/Pipeline_Sentiment_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [8]:
# the `!` prefix tells the notebook that this line is a shell command, not Python code
!pip install transformers # install the Hugging Face Transformers library



In [9]:
# Twitter US Airline Sentiment: https://www.kaggle.com/crowdflower/twitter-airline-sentiment
# `wget` is a command-line utility for downloading files from the internet.
# `-nc` ("no-clobber") tells wget not to overwrite a file that already exists in the current directory: if the file is already there, it is not downloaded again.
!wget -nc https://lazyprogrammer.me/course_files/AirlineTweets.csv

File ‘AirlineTweets.csv’ already there; not retrieving.
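Outside a notebook, the same no-clobber behaviour can be sketched in plain Python with `urllib.request.urlretrieve`. The helper name `download_no_clobber` is hypothetical; it simply skips the download when the target file already exists, as `wget -nc` does.

```python
import os
import urllib.request


def download_no_clobber(url: str, filename: str) -> bool:
    """Download `url` to `filename` unless the file already exists.

    Hypothetical helper mimicking `wget -nc`: an existing file is never
    overwritten. Returns True if a download happened, False if skipped.
    """
    if os.path.exists(filename):
        print(f"File '{filename}' already there; not retrieving.")
        return False
    urllib.request.urlretrieve(url, filename)
    return True


# Usage (only touches the network if the file is missing):
# download_no_clobber("https://lazyprogrammer.me/course_files/AirlineTweets.csv",
#                     "AirlineTweets.csv")
```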



In [10]:
# import the `pipeline` function from the HF Transformers library
# `pipeline()` is a convenient factory for running NLP tasks with pre-trained models; it returns a task-specific Pipeline object
from transformers import pipeline

import numpy as np # import the `numpy` library for numerical computations in Python
import pandas as pd # import the `pandas` library for data manipulation and analysis (in DataFrames)
import seaborn as sn # import the `seaborn` library for data visualization

# import functions from the `sklearn.metrics` module within the `scikit-learn` library
# roc_auc_score computes the Area Under the Receiver Operating Characteristic Curve (ROC AUC) for evaluating binary classification models
# f1_score computes the F1 score for evaluating classification models
# confusion_matrix computes a table that summarizes the performance of a classification model
from sklearn.metrics import roc_auc_score, f1_score, confusion_matrix
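To see what the three imported metrics return, here is a small sketch on made-up labels and probabilities (the data is hypothetical, not from the airline dataset). Note that `roc_auc_score` takes scores or probabilities, while `f1_score` and `confusion_matrix` take hard labels.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, f1_score, confusion_matrix

# Hypothetical toy data: 1 = positive, 0 = negative
y_true = np.array([1, 0, 1, 1, 0, 0])
# Hypothetical predicted probabilities of the positive class
y_prob = np.array([0.9, 0.2, 0.4, 0.8, 0.1, 0.6])
# Hard labels from a 0.5 threshold
y_pred = (y_prob >= 0.5).astype(int)  # -> [1, 0, 0, 1, 0, 1]

# ROC AUC: fraction of (positive, negative) pairs ranked correctly by y_prob
auc = roc_auc_score(y_true, y_prob)
# F1: harmonic mean of precision and recall, computed on hard labels
f1 = f1_score(y_true, y_pred)
# Rows = true class, columns = predicted class (label order 0, 1)
cm = confusion_matrix(y_true, y_pred)

print(f"ROC AUC: {auc:.4f}")  # 8 of 9 pairs ranked correctly -> 0.8889
print(f"F1:      {f1:.4f}")   # precision = recall = 2/3 -> 0.6667
print(cm)                     # [[2 1]
                              #  [1 2]]
```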

# Basic usage:

In [11]:
# create a sentiment-analysis pipeline (tokenizer + pre-trained model) & assign it to a variable:
classifier = pipeline("sentiment-analysis")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
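The warning recommends naming the model and revision explicitly. `pipeline` accepts `model` and `revision` keyword arguments; the values below are copied from the warning above. The wrapper function `make_pinned_classifier` is a hypothetical helper that imports `transformers` lazily, so nothing is downloaded until it is called.

```python
# Model ID and revision copied from the warning printed above
MODEL_ID = "distilbert-base-uncased-finetuned-sst-2-english"
REVISION = "af0f99b"


def make_pinned_classifier():
    """Build a sentiment-analysis pipeline pinned to an explicit model and revision."""
    from transformers import pipeline  # lazy import; weights download on first call
    return pipeline("sentiment-analysis", model=MODEL_ID, revision=REVISION)


# classifier = make_pinned_classifier()
```

Pinning the revision keeps results reproducible even if the model on the Hub is later updated.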


In [12]:
type(classifier) # text classification pipeline

transformers.pipelines.text_classification.TextClassificationPipeline

In [13]:
# output is a list containing one dictionary (predicted label and its score):
classifier("This is such a great movie!")

[{'label': 'POSITIVE', 'score': 0.9998759031295776}]
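The `score` is the probability the model assigns to the predicted label. For a two-label model like this one, the pipeline applies a softmax to the model's raw outputs (logits) and reports the label with the highest probability. A numpy sketch of that last step, using made-up logits and assuming the label order `NEGATIVE`, `POSITIVE`:

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


labels = ["NEGATIVE", "POSITIVE"]   # label order assumed for this sketch
logits = np.array([-4.3, 4.7])      # hypothetical raw model outputs
probs = softmax(logits)             # the two probabilities sum to 1
best = int(probs.argmax())
print({"label": labels[best], "score": float(probs[best])})
```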

In [14]:
classifier("This show was not interesting")

[{'label': 'NEGATIVE', 'score': 0.9997871518135071}]

In [15]:
classifier("I can't say that this was a good movie")

[{'label': 'NEGATIVE', 'score': 0.9278441071510315}]
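Because the two label probabilities sum to 1, a `NEGATIVE` prediction with score `s` implies P(POSITIVE) = 1 - s. Mapping every result to P(POSITIVE) gives one number per text, which is the kind of input `roc_auc_score` expects. The helper `positive_probability` is hypothetical; the results are copied from the three cells above.

```python
def positive_probability(result: dict) -> float:
    """Map a binary sentiment pipeline result to P(POSITIVE).

    Assumes exactly two labels whose probabilities sum to 1.
    """
    if result["label"] == "POSITIVE":
        return result["score"]
    return 1.0 - result["score"]


# Outputs copied from the three cells above
results = [
    {"label": "POSITIVE", "score": 0.9998759031295776},
    {"label": "NEGATIVE", "score": 0.9997871518135071},
    {"label": "NEGATIVE", "score": 0.9278441071510315},
]
p_pos = [positive_probability(r) for r in results]
print([round(p, 4) for p in p_pos])  # [0.9999, 0.0002, 0.0722]
```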

In [16]:
# multiple inputs passed in as a list:
classifier([
  "This course is just what I needed.",
  "I can't understand any of this. Instructor kept telling me to meet the "
  "prerequisites. What are prerequisites? Why does he keep saying that?"
])

[{'label': 'POSITIVE', 'score': 0.9991594552993774},
 {'label': 'NEGATIVE', 'score': 0.9966675639152527}]

👆 The output is also a list, with one dictionary per input, in the same order as the inputs.
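Since the outputs come back in input order, they can be paired with their texts using `zip`. A minimal sketch; the helper `pair_results` is hypothetical, and `outputs` is copied from the cell above (in the notebook you would use `outputs = classifier(texts)`).

```python
def pair_results(texts, outputs):
    """Pair each input text with its pipeline output (same order, same length)."""
    if len(texts) != len(outputs):
        raise ValueError("expected one output per input text")
    return [(text, out["label"], out["score"]) for text, out in zip(texts, outputs)]


texts = [
    "This course is just what I needed.",
    "I can't understand any of this. Instructor kept telling me to meet the "
    "prerequisites. What are prerequisites? Why does he keep saying that?",
]
# Outputs copied from the cell above
outputs = [
    {"label": "POSITIVE", "score": 0.9991594552993774},
    {"label": "NEGATIVE", "score": 0.9966675639152527},
]

for text, label, score in pair_results(texts, outputs):
    print(f"{label:<8} {score:.3f}  {text[:40]}")
```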

In [None]:
import torch # import the PyTorch library for tensor computations