<A HREF="https://medium.com/mlearning-ai/using-huggingface-transformers-with-pytorch-for-nlp-tasks-afc430190e22">Following this Medium article</A><P>
<A HREF="https://huggingface.co/docs/transformers/installation">HF installation and environment setup</A>

In [1]:
import csv
import pandas as pd

In [2]:
# install HuggingFace transformers if necessary
!pip -q install transformers

In [3]:
# Test installation (note: downloads model ~260MB)
from transformers import pipeline
print(pipeline('sentiment-analysis')('we love you'))

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)


[{'label': 'POSITIVE', 'score': 0.9998704195022583}]


In [4]:
# Sentiment Analysis
cls = pipeline("sentiment-analysis")
cls("Team Blauwe Kater rules!")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)


[{'label': 'POSITIVE', 'score': 0.9975104331970215}]

In [5]:
# Warning: downloads the facebook/bart-large-mnli model, which is 1.52GB
cls_b = pipeline("zero-shot-classification")
cls_b(["This is related to computers", "I love apples"], ["apples", "computers"])

No model was supplied, defaulted to facebook/bart-large-mnli (https://huggingface.co/facebook/bart-large-mnli)


[{'sequence': 'This is related to computers',
  'labels': ['computers', 'apples'],
  'scores': [0.9986750483512878, 0.001324947108514607]},
 {'sequence': 'I love apples',
  'labels': ['apples', 'computers'],
  'scores': [0.9986883401870728, 0.0013116567861288786]}]
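
The zero-shot output above is a list of dicts, one per input sequence, each with parallel `labels` and `scores` lists. A minimal sketch of reducing that structure to one winning label per sequence follows; `top_labels` is a hypothetical helper name, and the `results` literal is copied from the output above so the snippet runs without downloading the model.

```python
def top_labels(results):
    """Map each input sequence to its highest-scoring (label, score) pair."""
    best = {}
    for result in results:
        # Pair labels with scores and pick the maximum explicitly,
        # so the helper does not rely on the pipeline's sort order.
        pairs = zip(result['labels'], result['scores'])
        best[result['sequence']] = max(pairs, key=lambda pair: pair[1])
    return best

# Copied from the notebook output above (not a live pipeline call)
results = [
    {'sequence': 'This is related to computers',
     'labels': ['computers', 'apples'],
     'scores': [0.9986750483512878, 0.001324947108514607]},
    {'sequence': 'I love apples',
     'labels': ['apples', 'computers'],
     'scores': [0.9986883401870728, 0.0013116567861288786]},
]

best = top_labels(results)
for sequence, (label, score) in best.items():
    print(f"{sequence!r} -> {label} ({score:.3f})")
```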

In [6]:
# Read a CSV-formatted speech using the csv package
filepath = './Data/DataUCSB/address-accepting-the-presidential-nomination-the-democratic-national-convention-denver.csv'
with open(filepath, 'r', newline='') as read_obj:
  csv_reader = csv.reader(read_obj)  # the reader yields one list of fields per row
  # list(csv_reader) is a list of lists; sum(..., []) flattens it into a single flat list
  speechList = sum(list(csv_reader), [])
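
Flattening with `sum(rows, [])` builds a new list at every step, so its cost grows quadratically with the number of rows. A linear alternative is `itertools.chain.from_iterable`. The sketch below shows it on an in-memory sample; the sample text is made up for illustration and is not taken from the speech file.

```python
import csv
import io
from itertools import chain

# Hypothetical two-row CSV standing in for the speech file
sample = io.StringIO('First sentence.,Second sentence.\nThird sentence.\n')

rows = csv.reader(sample)                # yields one list of fields per row
flat = list(chain.from_iterable(rows))   # concatenates the rows lazily, in one pass

print(flat)
```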

In [7]:
# Join the list into a single string (type(speech) is str)
speech = ' '.join(speechList)
# Keep only the first tenth of the entries as a shorter input for summarization
short = len(speechList) // 10
shortspeech = ' '.join(speechList[:short])
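
The cell above keeps only the first tenth of the speech, presumably because transformer summarizers accept a bounded input length. If the whole speech should be covered, one option is to split the list into consecutive chunks and summarize each in turn. The sketch below is a rough illustration under stated assumptions: `chunk_by_words` is a hypothetical helper, and its word budget is only a crude proxy for the model's real token limit.

```python
def chunk_by_words(pieces, max_words=400):
    """Group consecutive text pieces into chunks of at most max_words words.

    A single piece longer than max_words is kept whole as its own chunk.
    """
    chunks, current, count = [], [], 0
    for piece in pieces:
        n = len(piece.split())
        # Close the current chunk if adding this piece would exceed the budget
        if current and count + n > max_words:
            chunks.append(' '.join(current))
            current, count = [], 0
        current.append(piece)
        count += n
    if current:
        chunks.append(' '.join(current))
    return chunks

# Toy input; in the notebook the pieces would be the entries of speechList
demo = ["one two three", "four five", "six seven eight nine", "ten"]
chunks = chunk_by_words(demo, max_words=5)
print(chunks)
```

Each resulting chunk could then be passed to the summarization pipeline separately and the partial summaries joined afterwards.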

In [8]:
# Warning: downloads the 1.14GB sshleifer/distilbart-cnn-12-6 model
# Named `summarizer` rather than `sum` so the built-in sum() used in cell 6 is not shadowed
summarizer = pipeline("summarization")

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 (https://huggingface.co/sshleifer/distilbart-cnn-12-6)


In [9]:
summarizer(shortspeech)

[{'summary_text': ' Barack Obama: With profound gratitude and great humility, I accept your nomination for the presidency of the United States. To Chairman Dean and my great friend Dick Durbin; and to all my fellow citizens of this great nation, I thank you. Let me express my thanks to the historic slate of candidates who accompanied me on this journey, and especially the one who traveled the farthest – Hillary Rodham Clinton .'}]

In [10]:
summarizer(shortspeech)[0]["summary_text"]

' Barack Obama: With profound gratitude and great humility, I accept your nomination for the presidency of the United States. To Chairman Dean and my great friend Dick Durbin; and to all my fellow citizens of this great nation, I thank you. Let me express my thanks to the historic slate of candidates who accompanied me on this journey, and especially the one who traveled the farthest – Hillary Rodham Clinton .'

In [11]:
shortspeech

"To Chairman Dean and my great friend Dick Durbin; and to all my fellow citizens of this great nation; With profound gratitude and great humility, I accept your nomination for the presidency of the United States. Let me express my thanks to the historic slate of candidates who accompanied me on this journey, and especially the one who traveled the farthest – a champion for working Americans and an inspiration to my daughters and to yours -- Hillary Rodham Clinton. To President Clinton, who last night made the case for change as only he can make it; to Ted Kennedy, who embodies the spirit of service; and to the next Vice President of the United States, Joe Biden, I thank you. I am grateful to finish this journey with one of the finest statesmen of our time, a man at ease with everyone from world leaders to the conductors on the Amtrak train he still takes home every night. To the love of my life, our next First Lady, Michelle Obama, and to Sasha and Malia – I love you so much, and I'm s

<A HREF="https://spacy.io/">spaCy</A><BR>

In [13]:
# Named entity recognition (NER) using spaCy
import spacy

In [23]:
!pip3 install -U spacy
# the following model is a large download (460.3 MB wheel for version 3.4.0, per the log below)
!python3 -m spacy download en_core_web_trf

Defaulting to user installation because normal site-packages is not writeable
2022-10-10 15:21:13.819587: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2022-10-10 15:21:13.819609: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2022-10-10 15:21:13.819642: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (muddy-HP-ProDesk-600-G3-SFF): /proc/driver/nvidia/version does not exist
Defaulting to user installation because normal site-packages is not writeable
Collecting en-core-web-trf==3.4.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_trf-3.4.0/en_core_web_trf-3.4.0-py3-none-any.whl (460.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 460.3/460.3 MB

✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_trf')


In [25]:
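# Note: this loads en_core_web_lg, not the en_core_web_trf model downloaded above;
# en_core_web_lg must be installed separately (python3 -m spacy download en_core_web_lg)
nlp = spacy.load("en_core_web_lg")
doc = nlp(speech)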
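
Once `doc` is built, its entities are available as `doc.ents`, each with a `.text` and a `.label_`. A long speech repeats many entities, so a tally can be easier to read than the raw list. The sketch below counts `[text, label]` pairs of that shape; `count_entities` is a hypothetical helper, and the sample pairs are hand-written stand-ins so the snippet runs without loading a spaCy model.

```python
from collections import Counter

def count_entities(pairs):
    """Return (counts per label, counts per (text, label)) for entity pairs."""
    by_label = Counter(label for _, label in pairs)
    by_entity = Counter((text, label) for text, label in pairs)
    return by_label, by_entity

# Stand-in for [[ent.text, ent.label_] for ent in doc.ents]
pairs = [
    ['Dean', 'PERSON'],
    ['Dick Durbin', 'PERSON'],
    ['the United States', 'GPE'],
    ['Americans', 'NORP'],
    ['the United States', 'GPE'],
]

by_label, by_entity = count_entities(pairs)
print(by_label.most_common())
print(by_entity.most_common(1))
```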

In [29]:
[[ent.text, ent.label_] for ent in doc.ents if ent.text]

[['Dean', 'PERSON'],
 ['Dick Durbin', 'PERSON'],
 ['the United States', 'GPE'],
 ['Americans', 'NORP'],
 ['Hillary Rodham Clinton', 'PERSON'],
 ['Clinton', 'PERSON'],
 ['last night', 'TIME'],
 ['Ted Kennedy', 'PERSON'],
 ['the United States', 'GPE'],
 ['Joe Biden', 'PERSON'],
 ['Amtrak', 'ORG'],
 ['every night', 'TIME'],
 ['First', 'ORDINAL'],
 ['Michelle Obama', 'PERSON'],
 ['Malia', 'PERSON'],
 ['Four years ago', 'DATE'],
 ['Kenya', 'GPE'],
 ['Kansas', 'GPE'],
 ['America', 'GPE'],
 ['one', 'CARDINAL'],
 ['American', 'NORP'],
 ['tonight', 'TIME'],
 ['two hundred and thirty two years', 'CARDINAL'],
 ['American', 'NORP'],
 ['Tonight', 'TIME'],
 ['Americans', 'NORP'],
 ['Washington', 'GPE'],
 ['George W. Bush', 'PERSON'],
 ['America', 'GPE'],
 ['these last eight years', 'DATE'],
 ['Ohio', 'GPE'],
 ['one', 'CARDINAL'],
 ['Indiana', 'GPE'],
 ['twenty years', 'DATE'],
 ['China', 'GPE'],
 ['American', 'NORP'],
 ['Tonight', 'TIME'],
 ['American', 'NORP'],
 ['Democrats', 'NORP'],
 ['Republican