# Text Classification

In [1]:
from transformers import pipeline



In [2]:
classifier = pipeline('sentiment-analysis')

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
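
The warning above asks for an explicit model name and revision. A minimal sketch of one way to do that, reusing the model id and revision printed in the warning; the helper name `build_classifier` and the constants are hypothetical, introduced only for illustration:

```python
# Model id and revision are copied from the pipeline warning above.
MODEL_ID = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"
REVISION = "af0f99b"


def build_classifier(model_id=MODEL_ID, revision=REVISION):
    """Create a sentiment pipeline with an explicit model and revision."""
    # Imported lazily so defining this helper does not load any model.
    from transformers import pipeline

    return pipeline("sentiment-analysis", model=model_id, revision=revision)
```

Calling `classifier = build_classifier()` should then load the same checkpoint as the default did here, without depending on whatever default a later library version picks.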


In [3]:
classifier("The food was good overall but Pizza was horrible")

[{'label': 'NEGATIVE', 'score': 0.9829559326171875}]

In [4]:
classifier("I am loving it.")

[{'label': 'POSITIVE', 'score': 0.9998760223388672}]

In [5]:
classifier("I hate waking up early on the weekends.")

[{'label': 'NEGATIVE', 'score': 0.9970771074295044}]

In [7]:
import pandas as pd
data = pd.read_csv('tweets_trump_trudeau.csv')
data.head()

| Unnamed: 0 | id | author | status |
|---|---|---|---|
| 0 | 1 | Donald J. Trump | I will be making a major statement from the @W... |
| 1 | 2 | Donald J. Trump | Just arrived at #ASEAN50 in the Philippines fo... |
| 2 | 3 | Donald J. Trump | After my tour of Asia, all Countries dealing w... |
| 3 | 4 | Donald J. Trump | Great to see @RandPaul looking well and back o... |
| 4 | 5 | Donald J. Trump | Excited to be heading home to see the House pa... |


In [8]:
# Classify the first 10 tweets one at a time.
for status in data['status'][:10]:
    print(classifier(status))

[{'label': 'POSITIVE', 'score': 0.9971588850021362}]
[{'label': 'NEGATIVE', 'score': 0.8045341968536377}]
[{'label': 'NEGATIVE', 'score': 0.9773215651512146}]
[{'label': 'POSITIVE', 'score': 0.9997676014900208}]
[{'label': 'POSITIVE', 'score': 0.9989203214645386}]
[{'label': 'POSITIVE', 'score': 0.9989012479782104}]
[{'label': 'POSITIVE', 'score': 0.9992148876190186}]
[{'label': 'POSITIVE', 'score': 0.9909795522689819}]
[{'label': 'POSITIVE', 'score': 0.9979164004325867}]
[{'label': 'POSITIVE', 'score': 0.9990936517715454}]
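
Reading ten printed lists by eye does not scale, so it may help to pass the texts as one batch and count the labels. The sketch below assumes the classifier accepts a list of strings and returns one `{'label', 'score'}` dict per input, which is how a Hugging Face text-classification pipeline behaves for list input. `tally_labels` and `fake_classifier` are hypothetical names; the stand-in exists only so the example runs without downloading a model.

```python
from collections import Counter


def tally_labels(classifier, texts):
    """Classify a batch of texts and count how often each label appears.

    `classifier` is any callable mapping a list of strings to a list of
    {'label': ..., 'score': ...} dicts.
    """
    results = classifier(list(texts))
    return Counter(result["label"] for result in results)


# Hypothetical stand-in for a real pipeline (keyword rule, not a model).
def fake_classifier(texts):
    return [
        {"label": "NEGATIVE" if "hate" in text else "POSITIVE", "score": 1.0}
        for text in texts
    ]


counts = tally_labels(fake_classifier, ["I am loving it.", "I hate mornings."])
print(counts)
```

With the notebook's own objects the call would be `tally_labels(classifier, data['status'][:10])`; for the ten outputs above that corresponds to 8 POSITIVE and 2 NEGATIVE.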


In [9]:
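# 'sentiment-analysis' is an alias for 'text-classification'; this model predicts emotion labels, not polarity.
classifier2 = pipeline('text-classification', model='cardiffnlp/twitter-roberta-base-emotion')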



In [10]:
# Classify the same 10 tweets with the emotion model.
for status in data['status'][:10]:
    print(classifier2(status))

[{'label': 'anger', 'score': 0.5119701623916626}]
[{'label': 'optimism', 'score': 0.48820018768310547}]
[{'label': 'joy', 'score': 0.4762808680534363}]
[{'label': 'anger', 'score': 0.5054358839988708}]
[{'label': 'optimism', 'score': 0.8555994033813477}]
[{'label': 'anger', 'score': 0.6616511940956116}]
[{'label': 'optimism', 'score': 0.5628583431243896}]
[{'label': 'anger', 'score': 0.744921863079071}]
[{'label': 'optimism', 'score': 0.6141897439956665}]
[{'label': 'optimism', 'score': 0.6337209343910217}]
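
Several of the emotion scores above sit close to 0.5, so the top label alone can overstate how sure the model is. A small sketch that flags such predictions, using the outputs copied from the cell above; the 0.6 cutoff and the name `low_confidence` are arbitrary assumptions for illustration, not a recommended setting.

```python
# Top-label outputs copied from the cell above, one inner list per tweet.
emotion_outputs = [
    [{'label': 'anger', 'score': 0.5119701623916626}],
    [{'label': 'optimism', 'score': 0.48820018768310547}],
    [{'label': 'joy', 'score': 0.4762808680534363}],
    [{'label': 'anger', 'score': 0.5054358839988708}],
    [{'label': 'optimism', 'score': 0.8555994033813477}],
    [{'label': 'anger', 'score': 0.6616511940956116}],
    [{'label': 'optimism', 'score': 0.5628583431243896}],
    [{'label': 'anger', 'score': 0.744921863079071}],
    [{'label': 'optimism', 'score': 0.6141897439956665}],
    [{'label': 'optimism', 'score': 0.6337209343910217}],
]


def low_confidence(outputs, threshold=0.6):
    """Return (index, label, score) for predictions scoring below threshold."""
    flagged = []
    for index, output in enumerate(outputs):
        top = output[0]  # each output is a one-element list holding the top label
        if top["score"] < threshold:
            flagged.append((index, top["label"], round(top["score"], 3)))
    return flagged


flagged = low_confidence(emotion_outputs)
for index, label, score in flagged:
    print(index, label, score)
```

Under this cutoff, five of the ten tweets (indices 0, 1, 2, 3 and 6) are flagged as uncertain.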


# Text Summarization

In [11]:
summarizer = pipeline('summarization')

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [12]:
summarizer("""
MMA 865 2025B: Big Data Analytics
Course Syllabus
Updated August 7, 2024
COURSE DESCRIPTION
This course has two core components:
1.	Natural Language Processing
Natural Language Processing (NLP) is one of the six AI disciplines. We will discuss the major practice areas of NLP and several of its use-cases across many different industries. These key areas include Information Extraction, Document Classification, Sentiment Analysis, Language Generation, Chatbots, and Machine Translation. 
We will thoroughly cover text preprocessing and vectorization, as they are foundations for training NLP models. You will gain comprehensive knowledge of the tools and techniques necessary for effectively managing text data. This includes text preprocessing, text data visualization, and text vectorization / embeddings.
Large Language Models (LLM) like GPT have ushered in a paradigm shift in NLP, making them an indispensable asset for solving a wide range of language-related tasks. We will also briefly cover Transfer Learning and Transformer architecture in NLP, highlighting their significant impact on language models like GPT, Claude, LLaMa, Mistral, BARD, etc. Finally, we will explore how OpenAI's ChatGPT (and other LLM’s) are revolutionizing the operational landscape by studying practical use-cases of LLM’s.
2.	Big Data Engineering
We will explore big data engineering, covering key technologies such as Apache Hadoop, Spark, Relational and Non-Relational Database, and Cloud platforms. You will learn the history of big data, fundamentals, and the current trends.
We will also delve into advanced topics for machine learning system design. This includes an overview of several key technologies used in Machine Learning Operations (MLOps) such as Docker, Kubernetes, Microservices architecture, Experiment Logging, etc. Through a combination of theory and demos, you will gain practical insights into the foundations of big data engineering.
INSTRUCTOR
Moez Ali
moez.ali@queensu.ca
	TEACHING ASSISTANT 
Raghav Gupta
raghav.gupta@queensu.ca
OFFICE HOURS
Office hours schedule TBA in the first class.
RECOMMENDED TEXTBOOKS
Optional Textbooks that you can read at your own time.
•	Transformers for Natural Language Processing: Build, train, and fine-tune deep neural network architectures for NLP with Python, PyTorch, TensorFlow, BERT, and GPT-3, 2nd Edition
•	Learning Spark: Lightning-Fast Data Analytics 2nd Edition
•	Modern Generative AI with ChatGPT and OpenAI Models 1st Edition - Leverage the capabilities of OpenAI's LLM for productivity and innovation with GPT3 and GPT4
COURSE EVALUATION
Item	Value	Due*
Class Participation	20%	On-going
Individual Assignment	30%	September 7, 2024
Team Assignment	30%	September 28, 2024
Team Project	20%	October 5, 2024
* By 11:59pm Eastern Time.
1. Class Participation
Class participation is integral to the learning process and constitutes 20% of the course grade. Effective participation goes beyond mere attendance; it requires active engagement in classroom discussions, demonstrating a thorough understanding of the course materials. Students are encouraged to actively listen, thoughtfully contribute to discussions, and engage with different viewpoints.
A key aspect of class participation is the ability to ask meaningful questions that reflect deep engagement with the course content. Good questions are clear, concise, and stimulate further discussion. They should showcase critical thinking and a genuine curiosity about the subject matter.
Sharing use-cases, real-life experience, a new information on the subject based on your expertise and as it relates to the content is another way to demonstrate strong class participation.
2. Individual Assignment
Please see Individual Assignment under “Assignment” section on the course portal. 
3. Team Assignment
Please see Team Assignment – The Turing Tussle under “Assignment” section on the course portal.
4. Team Project
Please see Team Project - House of Common under “Assignment” section on the course portal.
""")

[{'summary_text': ' Course Syllabus: Natural Language Processing (NLP) is one of the six AI disciplines . We will discuss the major practice areas of NLP and several of its use-cases across many different industries . Big Data Engineering: Apache Hadoop, Spark, Relational and Non-Relational Database, and Cloud platforms .'}]
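
Summarization models have a maximum input length, so a long document like this syllabus can be truncated or rejected depending on settings. One common workaround is to summarize the text in chunks and join the results. The sketch below is an assumption-laden illustration: `chunk_words` and `summarize_long` are hypothetical helpers, the 400-word chunk size is arbitrary, and word count is only a rough proxy for the model's token limit.

```python
def chunk_words(text, max_words=400):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[start:start + max_words])
        for start in range(0, len(words), max_words)
    ]


def summarize_long(summarizer, text, max_words=400):
    """Summarize each chunk separately and join the partial summaries.

    `summarizer` is any callable that returns [{'summary_text': ...}] for a
    string, which matches the output shape shown in this notebook.
    """
    parts = [
        summarizer(chunk)[0]["summary_text"].strip()
        for chunk in chunk_words(text, max_words)
    ]
    return " ".join(parts)


# Hypothetical stand-in so the sketch runs without a model:
# it "summarizes" a chunk by returning its first word.
def first_word_summarizer(chunk):
    return [{"summary_text": chunk.split()[0]}]


print(chunk_words("a b c d e", max_words=2))
print(summarize_long(first_word_summarizer, "a b c d e", max_words=2))
```

With the notebook's pipeline, the call would be `summarize_long(summarizer, syllabus_text)`, where `syllabus_text` is a hypothetical variable holding the string passed above. Splitting on word boundaries can cut sentences in half, so a sentence-aware splitter would likely give cleaner partial summaries.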

In [13]:
summarizer("""
Elon Musk is a South African-born American entrepreneur and businessman who founded X.com in 1999 (which later became PayPal), SpaceX in 2002 and Tesla Motors in 2003. Musk became a multimillionaire in his late 20s when he sold his start-up company, Zip2, to a division of Compaq Computers. 

Musk made headlines in May 2012, when SpaceX launched a rocket that would send the first commercial vehicle to the International Space Station. He bolstered his portfolio with the purchase of SolarCity in 2016 and cemented his standing as a leader of industry by taking on an advisory role in the early days of President Donald Trump's administration.

In January 2021, Musk reportedly surpassed Jeff Bezos as the wealthiest man in the world.

Early Life
Musk was born on June 28, 1971, in Pretoria, South Africa. As a child, Musk was so lost in his daydreams about inventions that his parents and doctors ordered a test to check his hearing.

At about the time of his parents’ divorce, when he was 10, Musk developed an interest in computers. He taught himself how to program, and when he was 12 he sold his first software: a game he created called Blastar.

In grade school, Musk was short, introverted and bookish. He was bullied until he was 15 and went through a growth spurt and learned how to defend himself with karate and wrestling.

Family
Musk’s mother, Maye Musk, is a Canadian model and the oldest woman to star in a Covergirl campaign. When Musk was growing up, she worked five jobs at one point to support her family.

Musk’s father, Errol Musk, is a wealthy South African engineer.

Musk spent his early childhood with his brother Kimbal and sister Tosca in South Africa. His parents divorced when he was 10.

Education
At age 17, in 1989, Musk moved to Canada to attend Queen’s University and avoid mandatory service in the South African military. Musk obtained his Canadian citizenship that year, in part because he felt it would be easier to obtain American citizenship via that path.

In 1992, Musk left Canada to study business and physics at the University of Pennsylvania. He graduated with an undergraduate degree in economics and stayed for a second bachelor’s degree in physics.

After leaving Penn, Musk headed to Stanford University in California to pursue a PhD in energy physics. However, his move was timed perfectly with the Internet boom, and he dropped out of Stanford after just two days to become a part of it, launching his first company, Zip2 Corporation in 1995. Musk became a U.S. citizen in 2002.
""")

[{'summary_text': ' Elon Musk was born on June 28, 1971, in Pretoria, South Africa . He was bullied until he was 15 and learned how to defend himself with karate and wrestling . He moved to Canada in 1989 to attend Queen’s University and avoid mandatory service in the South African military . He dropped out of Stanford University after two days to launch his first company, Zip2 .'}]

In [14]:
summarizer2 = pipeline('summarization', model='facebook/bart-large-xsum')

In [15]:
summarizer2("""
Elon Musk is a South African-born American entrepreneur and businessman who founded X.com in 1999 (which later became PayPal), SpaceX in 2002 and Tesla Motors in 2003. Musk became a multimillionaire in his late 20s when he sold his start-up company, Zip2, to a division of Compaq Computers. 

Musk made headlines in May 2012, when SpaceX launched a rocket that would send the first commercial vehicle to the International Space Station. He bolstered his portfolio with the purchase of SolarCity in 2016 and cemented his standing as a leader of industry by taking on an advisory role in the early days of President Donald Trump's administration.

In January 2021, Musk reportedly surpassed Jeff Bezos as the wealthiest man in the world.

Early Life
Musk was born on June 28, 1971, in Pretoria, South Africa. As a child, Musk was so lost in his daydreams about inventions that his parents and doctors ordered a test to check his hearing.

At about the time of his parents’ divorce, when he was 10, Musk developed an interest in computers. He taught himself how to program, and when he was 12 he sold his first software: a game he created called Blastar.

In grade school, Musk was short, introverted and bookish. He was bullied until he was 15 and went through a growth spurt and learned how to defend himself with karate and wrestling.

Family
Musk’s mother, Maye Musk, is a Canadian model and the oldest woman to star in a Covergirl campaign. When Musk was growing up, she worked five jobs at one point to support her family.

Musk’s father, Errol Musk, is a wealthy South African engineer.

Musk spent his early childhood with his brother Kimbal and sister Tosca in South Africa. His parents divorced when he was 10.

Education
At age 17, in 1989, Musk moved to Canada to attend Queen’s University and avoid mandatory service in the South African military. Musk obtained his Canadian citizenship that year, in part because he felt it would be easier to obtain American citizenship via that path.

In 1992, Musk left Canada to study business and physics at the University of Pennsylvania. He graduated with an undergraduate degree in economics and stayed for a second bachelor’s degree in physics.

After leaving Penn, Musk headed to Stanford University in California to pursue a PhD in energy physics. However, his move was timed perfectly with the Internet boom, and he dropped out of Stanford after just two days to become a part of it, launching his first company, Zip2 Corporation in 1995. Musk became a U.S. citizen in 2002.
""")

[{'summary_text': 'Elon Musk is the wealthiest man in the world, according to Forbes magazine.'}]

In [16]:
summarizer("""
WHEREAS, the solemn covenant made with the people of Lower Canada, and recorded in the Statute Book of the United Kingdom of Great Britain and Ireland, as the thirty-first chapter of the Act passed in the thirty-first year of the Reign of King George III hath been continually violated by the British Government, and our rights usurped. And, whereas our humble petitions, addresses, protests, and remonstrances against this injurious and unconstitutional interference have been made in vain. That the British Government hath disposed of our revenue without the constitutional consent of the local Legislature — pillaged our treasury — arrested great numbers of our citizens, and committed them to prison — distributed through the country a mercenary army, whose presence is accompanied by consternation and alarm — whose track is red with the blood of our people — who have laid our villages in ashes — profaned our temples — and spread terror and waste through the land. And, whereas we can no longer suffer the repeated violations of our dearest rights, and patiently support the multiplied outrages and cruelties of the Government of Lower Canada, we, in the name of the people of Lower Canada, acknowledging the decrees of a Divine Providence, which permits us to put down a Government, which hath abused the object and intention for which it was created, and to make choice of that form of Government which shall re-establish the empire of justice — assure domestic tranquillity — provide for common defence — promote general good, and secure to us and our posterity the advantages of civil and religious liberty,
""")

[{'summary_text': ' The covenant was made with the people of Lower Canada, and recorded in the Statute Book of the United Kingdom of Great Britain and Ireland, as the thirty-first chapter of the Act passed in the 30-first year of the Reign of King George III . And, whereas our humble petitions, addresses, protests, and remonstrances against this injurious and unconstitutional interference have been made in vain. That the British Government has disposed of our revenue without the constitutional consent of the local Legislature .'}]

In [17]:
summarizer2("""
WHEREAS, the solemn covenant made with the people of Lower Canada, and recorded in the Statute Book of the United Kingdom of Great Britain and Ireland, as the thirty-first chapter of the Act passed in the thirty-first year of the Reign of King George III hath been continually violated by the British Government, and our rights usurped. And, whereas our humble petitions, addresses, protests, and remonstrances against this injurious and unconstitutional interference have been made in vain. That the British Government hath disposed of our revenue without the constitutional consent of the local Legislature — pillaged our treasury — arrested great numbers of our citizens, and committed them to prison — distributed through the country a mercenary army, whose presence is accompanied by consternation and alarm — whose track is red with the blood of our people — who have laid our villages in ashes — profaned our temples — and spread terror and waste through the land. And, whereas we can no longer suffer the repeated violations of our dearest rights, and patiently support the multiplied outrages and cruelties of the Government of Lower Canada, we, in the name of the people of Lower Canada, acknowledging the decrees of a Divine Providence, which permits us to put down a Government, which hath abused the object and intention for which it was created, and to make choice of that form of Government which shall re-establish the empire of justice — assure domestic tranquillity — provide for common defence — promote general good, and secure to us and our posterity the advantages of civil and religious liberty,
""")

[{'summary_text': 'A petition signed by the people of Lower Canada, calling on the British Government to repeal the Indian Act.'}]
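
The `facebook/bart-large-xsum` summaries above mention "Forbes magazine" and the "Indian Act", neither of which appears in the input text. A crude way to surface such additions is to list capitalized words in the summary that never occur in the source. This is only a lexical sketch, not a factuality metric; `unsupported_terms` is a hypothetical helper, and the source string below is an abridged excerpt of the proclamation used above.

```python
import re


def unsupported_terms(source, summary):
    """List capitalized summary words that never occur in the source text."""
    source_words = set(re.findall(r"[a-z']+", source.lower()))
    capitalized = re.findall(r"\b[A-Z][a-zA-Z']*", summary)
    missing = {word for word in capitalized if word.lower() not in source_words}
    return sorted(missing)


# Abridged excerpt of the input text (shortened for the example).
source = (
    "the solemn covenant made with the people of Lower Canada, recorded as "
    "the thirty-first chapter of the Act, hath been continually violated by "
    "the British Government, which distributed through the country a "
    "mercenary army"
)
summary = (
    "A petition signed by the people of Lower Canada, calling on the "
    "British Government to repeal the Indian Act."
)

print(unsupported_terms(source, summary))  # → ['Indian']
```

The check misses paraphrased or lowercase fabrications (here, "petition" and "repeal" are also unsupported), so it is best treated as a first-pass filter before reading the summary against the source.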

# Text Generation


In [18]:
generator = pipeline('text-generation')

No model was supplied, defaulted to openai-community/gpt2 and revision 6c0e608 (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [19]:
generator("Adam and Natalie are good friends.")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Adam and Natalie are good friends. They met up at the home of John and the girls are doing a house party now," he said. "For some crazy reason we couldn\'t agree on something, so they decided to take it to school."\n'}]

In [20]:
generator("Since I have joined Queens University.") 

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': "Since I have joined Queens University. I don't know what is going to happen to him, but I know I can do the job better. And if we don't succeed there, then it's that I feel I can just leave. When you"}]

In [21]:
generator("Canada is a country.....")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': "Canada is a country.....\n\nThe US is a nation that doesn't work with you... Do your due diligence, but I'm sure if I had to guess, I would say that Americans believe it is the other way around, and"}]
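
The generated texts above repeat the prompt and stop mid-sentence ("When you", "and"), presumably because generation ends at a length limit. A small post-processing sketch that strips the prompt and cuts the continuation at its last sentence-ending mark; both helper names are hypothetical, and the rule is deliberately simple (it would also drop a closing quote that follows the final period).

```python
def continuation(prompt, generated_text):
    """Return only the text the model added after the prompt."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].lstrip()
    return generated_text


def trim_to_sentence(text):
    """Cut text after its last '.', '!' or '?'; leave it unchanged if none."""
    last = max(text.rfind(mark) for mark in ".!?")
    return text[:last + 1] if last != -1 else text


# Prompt and output copied from cell In [20] above.
prompt = "Since I have joined Queens University."
generated = (
    "Since I have joined Queens University. I don't know what is going to "
    "happen to him, but I know I can do the job better. And if we don't "
    "succeed there, then it's that I feel I can just leave. When you"
)

trimmed = trim_to_sentence(continuation(prompt, generated))
print(trimmed)
```

Here the dangling "When you" is dropped and the result ends at "I can just leave."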