# Let's Try Hugging Face Transformers NLP Pipelines!


In [None]:
!pip install transformers



# Zero-Shot Classification

In [None]:
from transformers import pipeline

classifier = pipeline("zero-shot-classification")
classifier(
    "Today we're going to learn about semiconductors",
    candidate_labels=["electronics", "computer", "university", "laboratory", "physics", "math"],
)

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.

{'sequence': "Today we're going to learn about semiconductors",
 'labels': ['electronics',
  'university',
  'laboratory',
  'computer',
  'physics',
  'math'],
 'scores': [0.9132665991783142,
  0.026060478761792183,
  0.02085280604660511,
  0.020368890836834908,
  0.012660030275583267,
  0.0067911893129348755]}

# Text Generation

In [2]:
from transformers import pipeline

generator = pipeline("text-generation")
generator("We are cooking pizza so we can")

No model was supplied, defaulted to openai-community/gpt2 and revision 6c0e608 (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'We are cooking pizza so we can make it a little bit different every season. That\'s what I came up with this summer, so I can just focus on pizzas."'}]

# Fill Mask

In [4]:
from transformers import pipeline

unmasker = pipeline("fill-mask")
unmasker("We're going to need pepperoni and <mask> to make pizza", top_k=2)

No model was supplied, defaulted to distilbert/distilroberta-base and revision ec58a5b (https://huggingface.co/distilbert/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at distilbert/distilroberta-base were not used when initializing RobertaForMaskedLM: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'score': 0.19257475435733795,
  'token': 7134,
  'token_str': ' cheese',
  'sequence': "We're going to need pepperoni and cheese to make pizza"},
 {'score': 0.09987569600343704,
  'token': 32394,
  'token_str': ' basil',
  'sequence': "We're going to need pepperoni and basil to make pizza"}]

# Named Entity Recognition (NER)

In [6]:
from transformers import pipeline

# aggregation_strategy="simple" replaces the deprecated grouped_entities=True
ner = pipeline("ner", aggregation_strategy="simple")
ner("My name is Roland, I am a student in Indonesia")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'entity_group': 'PER',
  'score': 0.9991634,
  'word': 'Roland',
  'start': 11,
  'end': 17},
 {'entity_group': 'LOC',
  'score': 0.9997918,
  'word': 'Indonesia',
  'start': 37,
  'end': 46}]

# Question Answering

In [14]:
from transformers import pipeline

qa_pipeline = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
context = "Paella is a classic Spanish dish of rice cooked with vegetables, seafood, and meat, and flavored with saffron. It originated in the rice-growing regions of Spain's Mediterranean coast, and is especially associated with the Valencia region."
question = "What is Paella?"

result = qa_pipeline(question=question, context=context)
print(f"Answer: {result['answer']}")

Answer: a classic Spanish dish of rice


# Sentiment Analysis

In [20]:
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
classifier("My phone is 5 years old, i need to replace it")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'NEGATIVE', 'score': 0.9986024498939514}]

# Summarization

In [22]:
from transformers import pipeline

summarizer = pipeline("summarization")
summarizer(
    """
    Rain is water droplets that have condensed from atmospheric water vapor and then fall under gravity. Rain is a major component of the water cycle and is responsible for depositing most of the fresh water on the Earth. It provides water for hydroelectric power plants, crop irrigation, and suitable conditions for many types of ecosystems.
    The major cause of rain production is moisture moving along three-dimensional zones of temperature and moisture contrasts known as weather fronts. If enough moisture and upward motion is present, precipitation falls from convective clouds (those with strong upward vertical motion) such as cumulonimbus (thunder clouds) which can organize into narrow rainbands. In mountainous areas, heavy precipitation is possible where upslope flow is maximized within windward sides of the terrain at elevation which forces moist air to condense and fall out as rainfall along the sides of mountains. On the leeward side of mountains, desert climates can exist due to the dry air caused by downslope flow which causes heating and drying of the air mass. The movement of the monsoon trough, or Intertropical Convergence Zone, brings rainy seasons to savannah climes.
    The urban heat island effect leads to increased rainfall, both in amounts and intensity, downwind of cities. Global warming is also causing changes in the precipitation pattern, including wetter conditions across eastern North America and drier conditions in the tropics. Antarctica is the driest continent. The globally averaged annual precipitation over land is 715 mm (28.1 in), but over the whole Earth, it is much higher at 990 mm (39 in).[1] Climate classification systems such as the Köppen classification system use average annual rainfall to help differentiate between differing climate regimes. Rainfall is measured using rain gauges. Rainfall amounts can be estimated by weather radar.
    """
)

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'summary_text': ' Rain is a major component of the water cycle and is responsible for depositing most of the fresh water on the Earth . It provides water for hydroelectric power plants, crop irrigation, and suitable conditions for many types of ecosystems . Global warming is also causing changes in the precipitation pattern, including wetter conditions across eastern North America and drier conditions in the tropics .'}]

# Translation

In [25]:
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-id-en")

text_to_translate = "Aku bermain game pc hingga subuh"
result = translator(text_to_translate)

print(result[0]['translation_text'])

I play pc games until dawn


# Analysis: Zero-Shot Classification

Zero-shot classification belongs to the "classifying whole sentences" category: it looks for the candidate label that best matches the input and gives that label the highest score.

It is a pipeline built on a pretrained model (by default facebook/bart-large-mnli, as the log shows) that classifies a sentence into labels the model was never explicitly trained on.

For the input I gave, the word "semiconductors" is probably strongly associated with the label "electronics", so the model gave that label the highest score (about 0.91).
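
To make the ranking concrete, here is a minimal sketch that reads the result dictionary recorded in this notebook and picks out the best label. The helper name `top_label` is my own and not part of transformers. The scores also add up to roughly 1, which is consistent with the pipeline's default single-label mode, where scores are normalized across the candidate labels.

```python
# Result copied from the zero-shot cell output above.
result = {
    "sequence": "Today we're going to learn about semiconductors",
    "labels": ["electronics", "university", "laboratory", "computer", "physics", "math"],
    "scores": [
        0.9132665991783142,
        0.026060478761792183,
        0.02085280604660511,
        0.020368890836834908,
        0.012660030275583267,
        0.0067911893129348755,
    ],
}


def top_label(result):
    """Return the (label, score) pair with the highest score (hypothetical helper)."""
    return max(zip(result["labels"], result["scores"]), key=lambda pair: pair[1])


label, score = top_label(result)
print(f"{label}: {score:.3f}")                        # electronics: 0.913
print(f"sum of scores: {sum(result['scores']):.4f}")  # sum of scores: 1.0000
```

Passing `multi_label=True` to the classifier scores each label independently, so in that mode the scores would not be expected to sum to 1.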

# Analysis: Text Generation

As the name suggests, text generation produces a continuation of the input prompt. By default this pipeline uses the gpt2 model (openai-community/gpt2). It falls under the "generating text content" category of NLP tasks.
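
The text-generation cell returns the prompt together with its continuation, and sampled output can differ from run to run. Below is a small sketch of how one might request several continuations without the prompt. `continuations` is a hypothetical helper, and the parameter values are arbitrary choices rather than recommendations.

```python
def continuations(generator, prompt, n=2, max_new_tokens=20):
    """Return n sampled continuations of prompt, without the prompt itself.

    `generator` is expected to be a transformers text-generation pipeline
    (or any callable with the same interface).
    """
    outputs = generator(
        prompt,
        max_new_tokens=max_new_tokens,  # cap on newly generated tokens
        num_return_sequences=n,         # number of alternatives to sample
        do_sample=True,                 # sampling is needed for distinct alternatives
        return_full_text=False,         # drop the prompt from each result
    )
    return [out["generated_text"] for out in outputs]


# Usage with the real pipeline (downloads the model on first run):
# from transformers import pipeline, set_seed
# set_seed(42)  # makes the sampled text repeatable
# generator = pipeline("text-generation", model="openai-community/gpt2")
# print(continuations(generator, "We are cooking pizza so we can"))
```

Naming the model explicitly also avoids the "No model was supplied" warning shown in the log.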

# Analysis: Fill Mask

Fill mask fills in a masked word in a sentence. Using the distilroberta-base model, it scores the candidate words it considers a good fit; in the example, with top_k=2, the output is "cheese" and "basil". It falls under the "generating text content" category of NLP tasks.
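
The mask placeholder is model-specific: distilroberta-base uses `<mask>`, while BERT-style models use `[MASK]`. Here is a sketch that builds the prompt from whichever mask token is supplied. The `{mask}` placeholder and the `build_prompt` helper are my own conventions, not part of the library.

```python
def build_prompt(template, mask_token):
    """Replace the single '{mask}' placeholder with the model's mask token."""
    if template.count("{mask}") != 1:
        raise ValueError("template must contain exactly one {mask} placeholder")
    return template.replace("{mask}", mask_token)


template = "We're going to need pepperoni and {mask} to make pizza"
print(build_prompt(template, "<mask>"))  # RoBERTa-style token, as used in this notebook
print(build_prompt(template, "[MASK]"))  # BERT-style token

# With the real pipeline, the token can be read from the tokenizer:
# from transformers import pipeline
# unmasker = pipeline("fill-mask", model="distilbert/distilroberta-base")
# prompt = build_prompt(template, unmasker.tokenizer.mask_token)
# print(unmasker(prompt, top_k=2))
```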

# Analysis: NER (Named Entity Recognition)

NER classifies words in the given text into one of these labels: PER (person name), LOC (location or place), ORG (organization), and MISC (other entities such as events, brands, or products). In the example, it detects a person and a location in the sentence, namely "Roland" and "Indonesia". NER falls under the "classifying each word" category.
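
The `start` and `end` values in the NER output are character offsets into the input string, so each entity can be recovered by slicing. A small check using the output recorded in this notebook (scores omitted for brevity):

```python
text = "My name is Roland, I am a student in Indonesia"

# Entities copied from the NER cell output (scores omitted).
entities = [
    {"entity_group": "PER", "word": "Roland", "start": 11, "end": 17},
    {"entity_group": "LOC", "word": "Indonesia", "start": 37, "end": 46},
]

for ent in entities:
    span = text[ent["start"]:ent["end"]]  # the end offset is exclusive
    print(f"{ent['entity_group']}: {span}")
# PER: Roland
# LOC: Indonesia
```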

# Analysis: Question Answering

Question answering takes two inputs: a question, and a context containing the information needed to answer it. In the example, we supply a context about a Spanish dish and ask what paella is, and the model answers "a classic Spanish dish of rice". It falls under the "extracting an answer from a text" category.
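
Because this is extractive question answering, the answer is a span copied from the context rather than newly written text. The sketch below confirms that for the recorded answer by locating it in the context with a plain string search. The real pipeline result also carries `score`, `start`, and `end` keys; the notebook does not print them, so no values for them are shown here.

```python
context = (
    "Paella is a classic Spanish dish of rice cooked with vegetables, seafood, "
    "and meat, and flavored with saffron. It originated in the rice-growing "
    "regions of Spain's Mediterranean coast, and is especially associated with "
    "the Valencia region."
)
answer = "a classic Spanish dish of rice"  # answer printed by the notebook

start = context.find(answer)  # -1 would mean the answer is not a literal span
end = start + len(answer)
print(start, end)                    # 10 40
print(context[start:end] == answer)  # True
```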

# Analysis: Sentiment Analysis

As the name suggests, sentiment analysis identifies the sentiment, or feeling, expressed in the input text. The input I gave was labeled NEGATIVE (score about 0.999), possibly because of the words "need" and "replace". It falls under the "classifying whole sentences" category.
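
The guess about "need" and "replace" can be probed rather than assumed. One rough approach, sketched here, is to delete one word at a time and compare the resulting scores. `ablate` and `probe` are hypothetical helpers, and word deletion is only a crude heuristic, not a proper attribution method.

```python
def ablate(sentence, words):
    """Map each word to the sentence with that word removed (whitespace tokens only)."""
    tokens = sentence.split()
    return {w: " ".join(t for t in tokens if t != w) for w in words}


def probe(classifier, sentence, words):
    """Classify the original sentence and each ablated variant.

    `classifier` is expected to behave like a transformers sentiment-analysis
    pipeline: given a list of strings, it returns one {'label', 'score'} dict each.
    """
    variants = {"<original>": sentence, **ablate(sentence, words)}
    results = classifier(list(variants.values()))
    return {name: (res["label"], res["score"]) for name, res in zip(variants, results)}


sentence = "My phone is 5 years old, i need to replace it"
print(ablate(sentence, ["need", "replace"]))

# Usage with the real pipeline:
# from transformers import pipeline
# classifier = pipeline(
#     "sentiment-analysis",
#     model="distilbert/distilbert-base-uncased-finetuned-sst-2-english",
# )
# print(probe(classifier, sentence, ["need", "replace"]))
```

If the score barely changes when a word is removed, that word is probably not what drives the prediction.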

# Analysis: Summarization

As the name suggests, summarization condenses a longer input text while preserving its important information and overall meaning. It falls under the "generating a new sentence from an input text" category.
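
Summary length can be steered with generation arguments. Here is a sketch with a hypothetical `summarize` wrapper that also reports a rough word-count compression ratio. The `max_length` and `min_length` values (counted in tokens, not words) are arbitrary.

```python
def summarize(summarizer, text, max_length=60, min_length=20):
    """Return (summary, ratio), where ratio = summary words / input words."""
    output = summarizer(
        text,
        max_length=max_length,  # upper bound on summary tokens
        min_length=min_length,  # lower bound on summary tokens
        do_sample=False,        # deterministic decoding
    )
    summary = output[0]["summary_text"].strip()
    ratio = len(summary.split()) / max(len(text.split()), 1)
    return summary, ratio


# Usage with the real pipeline:
# from transformers import pipeline
# summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")
# summary, ratio = summarize(summarizer, long_text)  # long_text: your own input string
# print(f"{ratio:.0%} of the original length")
```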

# Analysis: Translation

Translation is the NLP task of converting text from one language into another. In the example, the text is translated from Indonesian to English. It falls under the "generating a new sentence from an input text" category.
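
The translation direction is fixed by the checkpoint: `Helsinki-NLP/opus-mt-id-en` goes from Indonesian (`id`) to English (`en`). Below is a sketch of translating several sentences in one call, using a hypothetical `translate_all` helper.

```python
def translate_all(translator, sentences):
    """Translate a list of sentences and return plain strings.

    `translator` is expected to behave like a transformers translation
    pipeline: given a list of strings, it returns one
    {'translation_text': ...} dict per input.
    """
    results = translator(list(sentences))
    return [res["translation_text"] for res in results]


# Usage with the real pipeline:
# from transformers import pipeline
# translator = pipeline("translation", model="Helsinki-NLP/opus-mt-id-en")
# print(translate_all(translator, ["Aku bermain game pc hingga subuh"]))
```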