# pipeline example
* pipeline https://huggingface.co/docs/transformers/main_classes/pipelines
* models https://huggingface.co/models

In [45]:
%pip install --upgrade pip
%pip install transformers

Note: you may need to restart the kernel to use updated packages.


# text-classification task
* tasks and models https://huggingface.co/models?pipeline_tag=text-classification&sort=trending

In [12]:
from transformers import pipeline

pipe = pipeline(task="text-classification", model="distilbert-base-uncased-finetuned-sst-2-english")
text = "Disappointed."
results = pipe(text)
results 

[{'label': 'NEGATIVE', 'score': 0.9997991919517517}]

In [13]:
texts = ["I love it!", "it is cold today."]
results = pipe(texts)
results

[{'label': 'POSITIVE', 'score': 0.9998781681060791},
 {'label': 'NEGATIVE', 'score': 0.9992520213127136}]
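
The pipeline returns one dict per input text, in input order, so each prediction can be paired back with its text. Below is a minimal sketch that reuses the output above as literal data, so no model is loaded. The helper name `pair_predictions` is hypothetical. This SST-2 model only has the labels POSITIVE and NEGATIVE, which may be why the neutral sentence "it is cold today." still gets a confident NEGATIVE.

```python
# Outputs copied from the cell above; no model is loaded here.
texts = ["I love it!", "it is cold today."]
results = [
    {"label": "POSITIVE", "score": 0.9998781681060791},
    {"label": "NEGATIVE", "score": 0.9992520213127136},
]


def pair_predictions(texts, results):
    """Pair each input text with its predicted label and score (hypothetical helper)."""
    if len(texts) != len(results):
        raise ValueError("texts and results must have the same length")
    return [(text, result["label"], result["score"]) for text, result in zip(texts, results)]


for text, label, score in pair_predictions(texts, results):
    print(f"{label:<8} {score:.4f}  {text}")
```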

# summarization task
* SummarizationPipeline https://huggingface.co/docs/transformers/main/en/main_classes/pipelines#transformers.SummarizationPipeline


In [14]:
from transformers import pipeline

summarizer = pipeline(task="summarization")

document = """Deep learning is the subset of machine learning methods based on artificial neural networks 
with representation learning. The adjective "deep" refers to the use of multiple layers in the network. 
Methods used can be either supervised, semi-supervised or unsupervised.

Deep-learning architectures such as deep neural networks, deep belief networks, recurrent neural networks, 
convolutional neural networks and transformers have been applied to fields including computer vision, 
speech recognition, natural language processing, machine translation, bioinformatics, drug design, 
medical image analysis, climate science, material inspection and board game programs, 
where they have produced results comparable to and in some cases surpassing human expert performance.

Artificial neural networks (ANNs) were inspired by information processing and distributed communication nodes in biological systems. 
ANNs have various differences from biological brains. Specifically, artificial neural networks tend to be static and symbolic, 
while the biological brain of most living organisms is dynamic (plastic) and analog. 
ANNs are generally seen as low quality models for brain function."""

results = summarizer(document)  # returns a list of dicts, each with a 'summary_text' key
results


No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'summary_text': ' Deep learning is the subset of machine learning methods based on artificial neural networks . The adjective "deep" refers to the use of multiple layers in the network . The methods used can be either supervised, semi-supervised or unsupervised . They have produced results comparable to and in some cases surpassing human expert performance .'}]
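
The warning above recommends naming the model and revision explicitly. Below is a minimal sketch of one way to do that, using the model name and revision from the warning itself. `SUMMARIZER_KWARGS` and `build_summarizer` are hypothetical names. The import sits inside the function so that nothing is downloaded until it is called.

```python
# Model name and revision are taken from the warning printed above.
SUMMARIZER_KWARGS = {
    "task": "summarization",
    "model": "sshleifer/distilbart-cnn-12-6",
    "revision": "a4f8f3e",
}


def build_summarizer():
    """Create the summarization pipeline with an explicit model and revision."""
    # Imported lazily; the first call downloads the model if it is not cached.
    from transformers import pipeline

    return pipeline(**SUMMARIZER_KWARGS)
```

Calling `summarizer = build_summarizer()` should then give the same pipeline as before without the "No model was supplied" warning.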

# question-answering task
* question-answering https://huggingface.co/docs/transformers/main/en/task_summary#question-answering
* QuestionAnsweringPipeline https://huggingface.co/docs/transformers/main/en/main_classes/pipelines#transformers.QuestionAnsweringPipeline
* question-answering models https://huggingface.co/models?pipeline_tag=question-answering&sort=trending



In [15]:
from transformers import pipeline

# read the context and answer the question
question_answering = pipeline(task="question-answering", model="distilbert-base-cased-distilled-squad")
question = "How many colors are there in a rainbow?"
context = "A rainbow is a meteorological phenomenon that is caused by reflection, refraction, and dispersion of light in water droplets resulting in a spectrum of light appearing in the sky. It takes the form of a multicolored circular arc."
result = question_answering(question=question, context=context)
result

{'score': 0.8451012372970581,
 'start': 203,
 'end': 215,
 'answer': 'multicolored'}
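
The `start` and `end` values are character offsets into `context`, so slicing the context with them recovers the answer. Below is a small sketch that uses the result above as literal data, so no model is loaded. This is an extractive model: it can only return a span of the context, which is why it answers "multicolored" rather than a number.

```python
context = (
    "A rainbow is a meteorological phenomenon that is caused by reflection, "
    "refraction, and dispersion of light in water droplets resulting in a "
    "spectrum of light appearing in the sky. It takes the form of a "
    "multicolored circular arc."
)
# Result copied from the cell above.
result = {"score": 0.8451012372970581, "start": 203, "end": 215, "answer": "multicolored"}

# Slice the context with the reported offsets to recover the answer span.
span = context[result["start"]:result["end"]]
print(span)
```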

# text-to-audio task
* TextToAudioPipeline https://huggingface.co/docs/transformers/v4.36.0/en/main_classes/pipelines#transformers.TextToAudioPipeline
* https://huggingface.co/learn/audio-course/chapter2/tts_pipeline
* models https://huggingface.co/models?pipeline_tag=text-to-audio&sort=trending


In [46]:
%pip install --upgrade pip
%pip install --upgrade transformers scipy

Note: you may need to restart the kernel to use updated packages.

Collecting transformers
  Downloading transformers-4.36.0-py3-none-any.whl.metadata (126 kB)
Downloading transformers-4.36.0-py3-none-any.whl (8.2 MB)

In [1]:
from transformers import pipeline
from scipy.io.wavfile import write

text2speech = pipeline(task="text-to-speech")  # alias of "text-to-audio"; the default model download is about 1.68 GB
type(text2speech)

No model was supplied, defaulted to suno/bark-small and revision 645cfba (https://huggingface.co/suno/bark-small).
Using a pipeline without specifying a model name and revision in production is not recommended.


transformers.pipelines.text_to_audio.TextToAudioPipeline

In [18]:
text = "This text will be converted into audio."
speech = text2speech(text)
speech

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:10000 for open-end generation.


{'audio': array([[ 0.01151444, -0.00313956, -0.00215029, ...,  0.05346299,
          0.1074708 ,  0.09522161]], dtype=float32),
 'sampling_rate': 24000}

In [19]:
speech['audio'].shape

(1, 89280)
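
The clip length follows from the sample count and the sampling rate. Below is a short sketch with the shape and rate from the outputs above as literal values.

```python
shape = (1, 89280)     # speech['audio'].shape from the cell above
sampling_rate = 24000  # speech['sampling_rate']

# The leading axis has length 1; the second axis holds the samples.
_, num_samples = shape
duration_seconds = num_samples / sampling_rate
print(f"{duration_seconds:.2f} s")
```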

Play the audio

* scipy https://docs.scipy.org/doc/scipy/reference/generated/scipy.io.wavfile.write.html

In [20]:
import os 
from IPython.display import Audio

audio_data = speech['audio'].flatten() # 2d array -> 1d array
sampling_rate = speech['sampling_rate']

# play audio
Audio(audio_data, rate=sampling_rate)

Save the audio as a WAV file

In [22]:
filename = "output.wav"
outputfile_path = os.path.join(os.getcwd(), filename)
write(outputfile_path, rate=sampling_rate, data=audio_data)
print(outputfile_path)

c:\Users\t\OneDrive\Documents\python\transformers\output.wav
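
Because the audio array is float32, `scipy.io.wavfile.write` stores it as a 32-bit floating-point WAV, which some players may not open. An alternative is to save 16-bit PCM with the standard-library `wave` module, sketched below. `write_pcm16_wav` is a hypothetical helper, and a synthetic 440 Hz tone stands in for the model output so the example runs on its own.

```python
import math
import os
import struct
import tempfile
import wave


def write_pcm16_wav(path, samples, sampling_rate):
    """Write mono float samples in [-1.0, 1.0] as 16-bit PCM (hypothetical helper)."""
    frames = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, float(sample)))  # guard against overshoot
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)  # mono
        wav_file.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav_file.setframerate(sampling_rate)
        wav_file.writeframes(bytes(frames))


# Stand-in data: a 0.5 s, 440 Hz tone instead of the model's speech output.
sampling_rate = 24000
tone = [0.3 * math.sin(2 * math.pi * 440 * n / sampling_rate) for n in range(sampling_rate // 2)]

pcm_path = os.path.join(tempfile.gettempdir(), "output_pcm16.wav")
write_pcm16_wav(pcm_path, tone, sampling_rate)

# Read the header back to confirm what was written.
with wave.open(pcm_path, "rb") as wav_file:
    print(wav_file.getframerate(), wav_file.getnframes())
```

With the notebook's data, the call would be `write_pcm16_wav(outputfile_path, audio_data, sampling_rate)`.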


# automatic-speech-recognition task
The pipeline uses ffmpeg to decode audio files, so ffmpeg must be installed and available on the search path.

* ffmpeg https://ffmpeg.org/download.html


On this machine, ffmpeg is installed in C:/ffmpeg.

In [60]:
import shutil

# Find "ffmpeg" in the search path
ffmpeg_path = shutil.which("ffmpeg")

if ffmpeg_path:
    print(f"ffmpeg found at: {ffmpeg_path}")
else:
    print("ffmpeg not found in the search path.")


ffmpeg found at: C:\ffmpeg\ffmpeg.EXE


Use AutomaticSpeechRecognitionPipeline
* AutomaticSpeechRecognitionPipeline https://huggingface.co/docs/transformers/v4.35.2/en/main_classes/pipelines#transformers.AutomaticSpeechRecognitionPipeline

In [None]:
from IPython.display import Audio

audio_url = "https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/i-know-kung-fu.mp3"
audio = Audio(audio_url)
audio 


In [None]:
from transformers import pipeline

transcriber = pipeline(task="automatic-speech-recognition", model="openai/whisper-base")
type(transcriber)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


transformers.pipelines.automatic_speech_recognition.AutomaticSpeechRecognitionPipeline

In [None]:
# The clip says "I know kung fu", but whisper-base transcribes it as "I don't come food."
transcriber("https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/i-know-kung-fu.mp3")

{'text': " I don't come food."}
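
One way to quantify how far this transcription is from the spoken phrase is the word error rate (WER): the word-level edit distance divided by the number of reference words. Below is a minimal pure-Python sketch, not part of the transformers API. The helper names are hypothetical, and the normalization (lowercasing, dropping punctuation except apostrophes) is an assumption.

```python
import re


def split_words(text):
    """Lowercase the text and keep only letter/apostrophe runs as words."""
    return re.findall(r"[a-z']+", text.lower())


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = split_words(reference), split_words(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Dynamic-programming edit distance, one row at a time.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost))
        previous = current
    return previous[-1] / len(ref)


wer = word_error_rate("I know kung fu", " I don't come food.")
print(wer)
```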

# image-to-text task
* image to text pipeline https://huggingface.co/docs/transformers/main/en/main_classes/pipelines#transformers.ImageToTextPipeline

In [47]:
%pip install transformers requests Pillow

Note: you may need to restart the kernel to use updated packages.


In [31]:
from transformers import pipeline

captioner = pipeline(task="image-to-text", model="Salesforce/blip-image-captioning-large")
type(captioner)

transformers.pipelines.image_to_text.ImageToTextPipeline

In [32]:
image_url = "https://huggingface.co/spaces/impira/docquery/resolve/2359223c1837a7587402bda0f2643382a6eefeab/invoice.png"
result = captioner(image_url)
print(result)



[{'generated_text': 'a close up of a receipt with a picture of a person in the background'}]


In [35]:
from io import BytesIO

import requests
from PIL import Image

url = "https://upload.wikimedia.org/wikipedia/commons/thumb/2/28/Flag_of_Puerto_Rico.svg/1024px-Flag_of_Puerto_Rico.svg.png"
response = requests.get(url)

# Check if the request was successful (status code 200)
if response.status_code == 200:
    image = Image.open(BytesIO(response.content))
    result = captioner(image, max_new_tokens=100)
    print(result)
else:
    print(f"Failed to download image. Status code: {response.status_code}")
    print("Error message:", response.text)  # Print the error message


[{'generated_text': 'a close up of a flag with a star on it'}]


# document-question-answering task
* DocumentQuestionAnsweringPipeline https://huggingface.co/docs/transformers/main/en/main_classes/pipelines#transformers.DocumentQuestionAnsweringPipeline
* models https://huggingface.co/models?pipeline_tag=document-question-answering&sort=trending
* tesseract https://github.com/UB-Mannheim/tesseract/wiki

In [48]:
%pip install pytesseract

Note: you may need to restart the kernel to use updated packages.


In [59]:
# Find a command in the search path
import shutil

command = "tesseract"
command_path = shutil.which(command)

if command_path:
    print(f'{command} found at: "{command_path}"')
else:
    print(f"{command} not found in the search path.")

tesseract found at: "C:\Program Files\Tesseract-OCR\tesseract.EXE"
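
This notebook depends on two external tools, ffmpeg and tesseract, so both checks can be done in one place. Below is a small sketch with a hypothetical helper, `find_missing_tools`, built on the same `shutil.which` call.

```python
import shutil


def find_missing_tools(tools):
    """Return the names in `tools` that shutil.which cannot locate."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = find_missing_tools(["ffmpeg", "tesseract"])
if missing:
    print("Missing from the search path:", ", ".join(missing))
else:
    print("All required tools were found.")
```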


In [38]:
from transformers import pipeline
document_qa = pipeline(task="document-question-answering", model="impira/layoutlm-document-qa")
type(document_qa)

transformers.pipelines.document_question_answering.DocumentQuestionAnsweringPipeline

In [43]:
# Alias IPython's Image so it does not shadow PIL's Image imported in an earlier cell
from IPython.display import Image as IPyImage, display

image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/b/b6/Paws_notebook_showing_how_to_load_wikidata_item_dictionary.png/836px-Paws_notebook_showing_how_to_load_wikidata_item_dictionary.png"
display(IPyImage(url=image_url))

In [44]:
result = document_qa(
    image=image_url,
    question="What module was imported?",
)

result

[{'score': 0.9995822906494141, 'answer': 'pywikibot', 'start': 40, 'end': 40}]
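
The pipeline returns a list of candidate answers, each with a score. Below is a minimal sketch of picking the top candidate and discarding weak ones, using the result above as literal data. `best_answer` is a hypothetical helper and the 0.5 threshold is an arbitrary assumption. Unlike the question-answering task earlier, `start` and `end` here appear to be word indices into the OCR output rather than character offsets.

```python
# Result copied from the cell above.
result = [{"score": 0.9995822906494141, "answer": "pywikibot", "start": 40, "end": 40}]


def best_answer(candidates, min_score=0.5):
    """Return the top-scoring answer text, or None if it scores below min_score."""
    if not candidates:
        return None
    top = max(candidates, key=lambda candidate: candidate["score"])
    return top["answer"] if top["score"] >= min_score else None


print(best_answer(result))
```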