In [7]:
!python --version

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Python 3.8.17
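
The warning above most likely appears because the `!python --version` shell call forks the kernel process after `tokenizers` has already been used. A minimal sketch of the second suggestion in the warning, setting the variable from Python before `transformers` is imported. The value `"false"` is an assumption made here; the warning accepts `"true"` as well.

```python
import os

# Set this before importing transformers/tokenizers so it is seen on first use.
# "false" is an assumed choice for this sketch; the warning also accepts "true".
os.environ["TOKENIZERS_PARALLELISM"] = "false"

print(os.environ["TOKENIZERS_PARALLELISM"])
```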


In [2]:
import transformers

print(f'transformers.__version__: {transformers.__version__}')

from transformers import pipeline

transformers.__version__: 4.24.0


In [9]:


classifier = pipeline("sentiment-analysis") # distilbert-base-uncased-finetuned-sst-2-english
classifier("I've been waiting for a HuggingFace course my whole life.")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'POSITIVE', 'score': 0.9598049521446228}]
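
The log above recommends naming the model and revision explicitly. A minimal sketch of that, reusing the checkpoint name and revision hash printed in the log. The names `PINNED_CHECKPOINTS` and `pinned_pipeline` are hypothetical helpers introduced only for this illustration.

```python
# Checkpoint name and revision copied from the log message above.
# PINNED_CHECKPOINTS and pinned_pipeline are hypothetical names for this sketch.
PINNED_CHECKPOINTS = {
    "sentiment-analysis": (
        "distilbert-base-uncased-finetuned-sst-2-english",
        "af0f99b",
    ),
}


def pinned_pipeline(task):
    """Build a pipeline for `task` with an explicit model and revision."""
    # Imported lazily so the mapping can be inspected without loading transformers.
    from transformers import pipeline

    model, revision = PINNED_CHECKPOINTS[task]
    return pipeline(task, model=model, revision=revision)
```

Calling `pinned_pipeline("sentiment-analysis")` should then load the same checkpoint as the default did here, without the "No model was supplied" message.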

In [22]:
classifier("What is the present of the USA?")

[{'label': 'NEGATIVE', 'score': 0.9832007884979248}]

In [19]:
classifier(
    [
        "I've been waiting for a HuggingFace course my whole life.", 
        "I hate this so much!"
    ]
)

[{'label': 'POSITIVE', 'score': 0.9598049521446228},
 {'label': 'NEGATIVE', 'score': 0.9994558691978455}]

In [24]:
clf = pipeline("zero-shot-classification") # facebook/bart-large-mnli
clf("He is trying to get a job after graduation", candidate_labels=["education", "job", "business"])

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'sequence': 'He is trying to get a job after graduation',
 'labels': ['job', 'education', 'business'],
 'scores': [0.8361348509788513, 0.12200606614351273, 0.04185909777879715]}
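
The result is a plain dict whose `labels` and `scores` lists run in parallel. A small sketch of reading it, using the output above as literal data. In the default single-label mode the scores should sum to roughly 1.

```python
# Output copied from the zero-shot cell above.
result = {
    "sequence": "He is trying to get a job after graduation",
    "labels": ["job", "education", "business"],
    "scores": [0.8361348509788513, 0.12200606614351273, 0.04185909777879715],
}

# Pair each label with its score and pick the highest-scoring one.
best_label, best_score = max(
    zip(result["labels"], result["scores"]), key=lambda pair: pair[1]
)
total = sum(result["scores"])

print(best_label, round(best_score, 3))
```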

In [28]:
gen = pipeline("text-generation") # gpt2

gen("Hayat is a great person.", num_return_sequences=2)  # max_length=25 can be added to cap the output length

No model was supplied, defaulted to gpt2 and revision 6c0e608 (https://huggingface.co/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': "Hayat is a great person. I'm sad for her.\n\nAnd we're not going back too much but we want to talk about her.\n\nNow why not start with her. You know, I'm not sure if she cares"},
 {'generated_text': "Hayat is a great person. However, I am also a young woman who will be a victim to all this if it ever happens to you again. If things get out of hand, I don't want you to go on a vacation to Europe"}]

In [30]:
gen = pipeline("text-generation", model="distilgpt2")  # distilgpt2, a distilled version of gpt2

gen("Abul Hayat is a great person.", num_return_sequences=2)  # max_length=25 can be added to cap the output length

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Abul Hayat is a great person. He is the author of several books including: First Novel of Islam and the War on Terror, and Al-Okhbar: a History of American Politics and the War on Terror, in addition to numerous'},
 {'generated_text': 'Abul Hayat is a great person. He is a master in the field.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n'}]
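
The second distilgpt2 sample ends in a long run of newlines. A minimal post-processing sketch that trims trailing whitespace from each generation. The helper name `tidy` is hypothetical, and the number of newlines in the sample data is illustrative rather than an exact copy of the output above.

```python
def tidy(generations):
    """Return copies of the generation dicts with trailing whitespace removed."""
    return [
        {**g, "generated_text": g["generated_text"].rstrip()}
        for g in generations
    ]


# Shaped like the pipeline output above; the newline count is illustrative.
sample = [
    {"generated_text": "Abul Hayat is a great person. He is a master in the field." + "\n" * 32}
]

cleaned = tidy(sample)
print(repr(cleaned[0]["generated_text"]))
```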

In [33]:
unmasker = pipeline("fill-mask") # distilroberta-base

unmasker("Abul Hayat is a <mask> person")

No model was supplied, defaulted to distilroberta-base and revision ec58a5b (https://huggingface.co/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'score': 0.04307474568486214,
  'token': 3346,
  'token_str': ' Muslim',
  'sequence': 'Abul Hayat is a Muslim person'},
 {'score': 0.04200262948870659,
  'token': 7297,
  'token_str': ' decent',
  'sequence': 'Abul Hayat is a decent person'},
 {'score': 0.03854675590991974,
  'token': 588,
  'token_str': ' real',
  'sequence': 'Abul Hayat is a real person'},
 {'score': 0.028026383370161057,
  'token': 7940,
  'token_str': ' transgender',
  'sequence': 'Abul Hayat is a transgender person'},
 {'score': 0.023888390511274338,
  'token': 3458,
  'token_str': ' religious',
  'sequence': 'Abul Hayat is a religious person'}]

In [36]:
unmasker = pipeline('fill-mask', model='bert-base-cased') # bert-base-cased

unmasker("Abul Hayat is a [MASK] person")

Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'score': 0.11941937357187271,
  'token': 2689,
  'token_str': 'religious',
  'sequence': 'Abul Hayat is a religious person'},
 {'score': 0.044306445866823196,
  'token': 2505,
  'token_str': 'famous',
  'sequence': 'Abul Hayat is a famous person'},
 {'score': 0.033201780170202255,
  'token': 4360,
  'token_str': 'Muslim',
  'sequence': 'Abul Hayat is a Muslim person'},
 {'score': 0.03076528199017048,
  'token': 6241,
  'token_str': 'controversial',
  'sequence': 'Abul Hayat is a controversial person'},
 {'score': 0.0204167477786541,
  'token': 3385,
  'token_str': 'notable',
  'sequence': 'Abul Hayat is a notable person'}]
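
The two fill-mask cells use different mask strings: `<mask>` for distilroberta-base and `[MASK]` for bert-base-cased. A small sketch of building the prompt from a template so the mask string is not hard-coded. The helper `masked_prompt` and the `{mask}` placeholder are assumptions made for this illustration.

```python
def masked_prompt(template, mask_token):
    """Substitute the model-specific mask token into a prompt template."""
    return template.replace("{mask}", mask_token)


template = "Abul Hayat is a {mask} person"

# In a live session the token can be read from the loaded pipeline,
# e.g. unmasker.tokenizer.mask_token, instead of being typed by hand.
roberta_prompt = masked_prompt(template, "<mask>")
bert_prompt = masked_prompt(template, "[MASK]")

print(roberta_prompt)
print(bert_prompt)
```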

In [37]:
ner = pipeline("ner", aggregation_strategy="simple") # dbmdz/bert-large-cased-finetuned-conll03-english; replaces the deprecated grouped_entities=True
ner("My name is Md Abul Hayat and I work at JPMorgan Chase and Co. in Brooklyn, NY")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'entity_group': 'PER',
  'score': 0.99162775,
  'word': 'Md Abul Hayat',
  'start': 11,
  'end': 24},
 {'entity_group': 'ORG',
  'score': 0.98313427,
  'word': 'JPMorgan Chase and Co.',
  'start': 39,
  'end': 61},
 {'entity_group': 'LOC',
  'score': 0.9961479,
  'word': 'Brooklyn',
  'start': 65,
  'end': 73},
 {'entity_group': 'LOC',
  'score': 0.96620333,
  'word': 'NY',
  'start': 75,
  'end': 77}]
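
The `start` and `end` values are character offsets into the input string, with `end` exclusive. A short sketch that checks this against the output above, using the entities as literal data.

```python
text = "My name is Md Abul Hayat and I work at JPMorgan Chase and Co. in Brooklyn, NY"

# Entity groups, words and offsets copied from the NER output above (scores omitted).
entities = [
    {"entity_group": "PER", "word": "Md Abul Hayat", "start": 11, "end": 24},
    {"entity_group": "ORG", "word": "JPMorgan Chase and Co.", "start": 39, "end": 61},
    {"entity_group": "LOC", "word": "Brooklyn", "start": 65, "end": 73},
    {"entity_group": "LOC", "word": "NY", "start": 75, "end": 77},
]

# Slice the original text with each pair of offsets.
spans = [text[e["start"]:e["end"]] for e in entities]

for entity, span in zip(entities, spans):
    print(entity["entity_group"], repr(span))
```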

In [42]:
qa = pipeline("question-answering") # distilbert-base-cased-distilled-squad 

qa(question = "Where is your office?", context = "My name is Md Abul Hayat and I work at JPMorgan Chase and Co. in Brooklyn, NY")

No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'score': 0.429169237613678, 'start': 65, 'end': 77, 'answer': 'Brooklyn, NY'}

In [5]:
translator = pipeline("translation", model = 'csebuetnlp/banglat5_nmt_en_bn')
translator("My name is Md Abul Hayat")



ValueError: Couldn't instantiate the backend tokenizer from one of: 
(1) a `tokenizers` library serialization file, 
(2) a slow tokenizer instance to convert or 
(3) an equivalent slow tokenizer class to instantiate and convert. 
You need to have sentencepiece installed to convert a slow tokenizer to a fast one.
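
The error says the `sentencepiece` package is missing from the environment. A minimal sketch of checking for it before building the translation pipeline. The helper `missing_packages` is a hypothetical name; after installing, the kernel usually needs a restart before the cell is run again.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(["sentencepiece"])
if missing:
    # e.g. run `pip install sentencepiece`, then restart the kernel
    print("Install first: pip install " + " ".join(missing))
else:
    print("sentencepiece is available; the translation cell can be re-run")
```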