<a href="https://colab.research.google.com/github/resulcaliskan/articles/blob/master/transformers_pipeline.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Discover the transformers

The 🤗 Transformers library provides the functionality to create and use shared models. The Model Hub contains thousands of pretrained models that anyone can download and use.

In [None]:
!pip install transformers

# Working with pipelines

In [2]:
from transformers import pipeline

Some of the currently available pipelines are:  

feature-extraction (get the vector representation of a text)  
fill-mask  
ner (named entity recognition)  
question-answering  
sentiment-analysis  
summarization  
text-generation  
translation  
zero-shot-classification

Let’s have a look at a few of these!

## Sentiment analysis

In [3]:
classifier = pipeline("sentiment-analysis")

In [4]:
classifier([
    "I like hiking in the greenness and sightseeing along the bosphorus. My best leasure time hobby.",
    "Russell is the founding father of the AI Community.",
])

[{'label': 'POSITIVE', 'score': 0.9977815747261047},
 {'label': 'POSITIVE', 'score': 0.9988616108894348}]

In [5]:
classifier("We have been fighting against very fierce forest fire. We have lost most of the forrest in south of Turkey.")

[{'label': 'NEGATIVE', 'score': 0.9900261759757996}]
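
The classifier returns one dict per input text, in input order, each holding a `label` and a `score`. The sketch below pairs those results back with their texts and flags predictions under a confidence threshold. The helper name `label_texts`, the placeholder texts, and the threshold value are assumptions made for illustration; they are not part of the Transformers API.

```python
def label_texts(texts, results, min_score=0.95):
    """Pair each input text with its predicted label and score.

    `results` is expected to have the shape shown above: one
    {'label': ..., 'score': ...} dict per text, in the same order.
    """
    if len(texts) != len(results):
        raise ValueError("texts and results must have the same length")
    rows = []
    for text, result in zip(texts, results):
        rows.append({
            "text": text,
            "label": result["label"],
            "score": result["score"],
            # True when the score reaches the (assumed) threshold
            "confident": result["score"] >= min_score,
        })
    return rows


# Labels and scores copied from the outputs above; the texts are placeholders.
texts = ["first example text", "second example text"]
results = [
    {"label": "POSITIVE", "score": 0.9977815747261047},
    {"label": "NEGATIVE", "score": 0.9900261759757996},
]
for row in label_texts(texts, results, min_score=0.995):
    print(row["label"], round(row["score"], 4), row["confident"])
```

With a real pipeline, `results` would simply be the return value of `classifier(texts)`.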

## Zero-shot classification

In [6]:
classifier = pipeline("zero-shot-classification")
classifier(
    "This is a course about the Transformers library",
    candidate_labels=["education", "politics", "business"],
)

{'labels': ['education', 'business', 'politics'],
 'scores': [0.844597339630127, 0.11197540909051895, 0.043427303433418274],
 'sequence': 'This is a course about the Transformers library'}
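
The result is a single dict with parallel `labels` and `scores` lists. As a minimal sketch, the winning label can be pulled out as shown below. The helper name `top_label` is an assumption for illustration, and it uses `max` rather than relying on the lists already being sorted.

```python
def top_label(result):
    """Return the (label, score) pair with the highest score.

    `result` is a dict with parallel 'labels' and 'scores' lists,
    as in the zero-shot output above.
    """
    pairs = zip(result["labels"], result["scores"])
    return max(pairs, key=lambda pair: pair[1])


# Values copied from the output above.
result = {
    "labels": ["education", "business", "politics"],
    "scores": [0.844597339630127, 0.11197540909051895, 0.043427303433418274],
    "sequence": "This is a course about the Transformers library",
}
label, score = top_label(result)
print(f"{label} ({score:.3f})")  # prints: education (0.845)
```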

## Text generation

In [7]:
generator = pipeline("text-generation")

In [8]:
generator("If you want to learn transformers, you need to visit hugging face and do more")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'If you want to learn transformers, you need to visit hugging face and do more about it.'}]
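
In the output above, `generated_text` repeats the prompt before the newly generated words. A small sketch for keeping only the continuation follows; the helper name `continuation` is an assumption for illustration.

```python
def continuation(prompt, generated_text):
    """Return only the text that follows the prompt."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):]
    # If the prompt is not a prefix, return the text unchanged.
    return generated_text


# Prompt and output copied from the cells above.
prompt = "If you want to learn transformers, you need to visit hugging face and do more"
outputs = [
    {"generated_text": "If you want to learn transformers, you need to visit "
                       "hugging face and do more about it."}
]
for out in outputs:
    print(repr(continuation(prompt, out["generated_text"])))  # prints: ' about it.'
```

The text-generation pipeline also accepts a `return_full_text` argument that may make this helper unnecessary; check the documentation for your installed version.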

## Mask filling

In [9]:
unmasker = pipeline("fill-mask")

In [10]:
unmasker("This notebook examples teach you all nlp <mask> models.", top_k=2)

[{'score': 0.022073592990636826,
  'sequence': 'This notebook examples teach you all nlp python models.',
  'token': 39825,
  'token_str': ' python'},
 {'score': 0.012379801832139492,
  'sequence': 'This notebook examples teach you all nlp regression models.',
  'token': 39974,
  'token_str': ' regression'}]
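
Each prediction carries the filled-in `sequence`, the token id, the `token_str` (with a leading space here), and a `score`. Both scores above are low, roughly 2% and 1%, so neither candidate is a confident guess. The sketch below reduces the output to ranked `(token, score)` pairs; the helper name `mask_candidates` is an assumption for illustration.

```python
def mask_candidates(predictions):
    """Return (token, score) pairs, best first, with whitespace stripped."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [(p["token_str"].strip(), p["score"]) for p in ranked]


# Values copied from the output above.
predictions = [
    {"score": 0.022073592990636826,
     "sequence": "This notebook examples teach you all nlp python models.",
     "token": 39825,
     "token_str": " python"},
    {"score": 0.012379801832139492,
     "sequence": "This notebook examples teach you all nlp regression models.",
     "token": 39974,
     "token_str": " regression"},
]
for token, score in mask_candidates(predictions):
    print(f"{token}: {score:.4f}")
```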

## Summarization

In [11]:
summarizer = pipeline("summarization")

In [12]:
text = """
One of the biggest concerns in science is bias—that scientists themselves, consciously or unconsciously, may put their thumbs on the scales and influence the outcomes of experiments. Boffins have come up with all sorts of tactics to try to eliminate it, from having their colleagues repeat their work to the “double blinding” common in clinical trials, when even the experimenters do not know which patients are receiving an experimental drug and which are getting a sugar-pill placebo.But gathering the data and running an experiment is not the only part of the process that can go awry. The methods chosen to analyse the data can also influence results. The point was dramatically demonstrated by two recent papers published in a journal called Surgery. Despite being based on the same dataset, they drew opposite conclusions about whether using a particular piece of kit during appendix-removal surgery reduced or increased the chances of infection. 
A new paper, from a large team of researchers headed by Martin Schweinsberg, a psychologist at the European School of Management and Technology, in Berlin, helps shed some light on why. Dr Schweinsberg gathered 49 different researchers by advertising his project on social media. Each was handed a copy of a dataset consisting of 3.9m words of text from nearly 8,000 comments made on Edge.org, an online forum for chatty intellectuals.Dr Schweinsberg asked his guinea pigs to explore two seemingly straightforward hypotheses. The first was that a woman’s tendency to participate would rise as the number of other women in a conversation increased. The second was that high-status participants would talk more than their low-status counterparts. Crucially, the researchers were asked to describe their analysis in detail by posting their methods and workflows to a website called DataExplained. That allowed Dr Schweinsberg to see exactly what they were up to.
In the end, 37 analyses were deemed sufficiently detailed to include. As it turned out, no two analysts employed exactly the same methods, and none got the same results. Some 29% of analysts reported that high-status participants were more likely to contribute. But 21% reported the opposite. (The remainder found no significant difference.) Things were less finely balanced with the first hypothesis, with 64% reporting that women do indeed participate more, if plenty of other women are present. But 21% concluded that the opposite was true.
The problem was not that any of the analyses were “wrong” in any objective sense. The differences arose because researchers chose different definitions of what they were studying, and applied different techniques. When it came to defining how much women spoke, for instance, some analysts plumped for the number of words in each woman’s comment. Others chose the number of characters. Still others defined it by the number of conversations that a woman participated in, irrespective of how much she actually said.
Academic status, meanwhile, was defined variously by job title, the number of citations a researcher had accrued, or their “h-index”, a number beloved by university managers which attempts to combine citation counts with the importance of the journals those citations appear in. The statistical techniques chosen also had an impact, though less than the choice of definitions. Some researchers chose linear-regression analysis; others went for logistic regression or a Kendall correlation.
Truth, in other words, can be a slippery customer, even for simple-sounding questions. What to do? One conclusion is that experimental design is critically important. Dr Schweinsberg hopes that platforms such as DataExplained can help solve the problem as well as revealing it, by allowing scientists to specify exactly how they chose to perform their analysis, allowing those decisions to be reviewed by others. It is probably not practical, he concedes, to check and re-check every result. But if many different analytical approaches point in the same direction, then scientists can be confident that their conclusion is the right one.
"""
# text link: https://www.economist.com/science-and-technology/2021/07/28/data-dont-lie-but-they-can-lead-scientists-to-opposite-conclusions

In [17]:
summarizer(text, min_length=10, max_length=40)

[{'summary_text': ' Berlin psychologist Martin Schweinsberg gathered 49 different researchers by advertising his project on social media . Each was handed a copy of a dataset consisting of 3.9m words of text from nearly'}]
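
The summary above stops mid-sentence ("from nearly"), most likely because `max_length=40` caps the output length. One possible alternative, sketched here under assumptions, is to summarize each paragraph on its own and collect the results. The helper `summarize_paragraphs` and the stand-in `fake_summarizer` are hypothetical; the stand-in only mimics the pipeline's return shape so the sketch runs without downloading a model.

```python
def summarize_paragraphs(summarizer, text, **kwargs):
    """Summarize each non-empty line of `text` separately.

    `summarizer` is any callable with the same call shape as the pipeline
    above: it takes a string plus keyword arguments and returns a list
    holding one {'summary_text': ...} dict.
    """
    paragraphs = [line.strip() for line in text.splitlines() if line.strip()]
    return [summarizer(p, **kwargs)[0]["summary_text"].strip() for p in paragraphs]


def fake_summarizer(paragraph, **kwargs):
    """Hypothetical stand-in: returns the first sentence as the 'summary'."""
    return [{"summary_text": paragraph.split(".")[0] + "."}]


sample = """
First paragraph. It has two sentences.
Second paragraph. It also has two.
"""
print(summarize_paragraphs(fake_summarizer, sample, min_length=10, max_length=40))
```

With the real pipeline, `fake_summarizer` would be replaced by the `summarizer` object created earlier. Whether per-paragraph summaries read better than a single longer summary depends on the text.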

## Translation

In [14]:
# The translator needs the library below to work properly.
!pip install sentencepiece



In [15]:
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-tr-en")

In [16]:
translator("Transformers ile NLP öğrenmek, hiç bu kadar kolay olmamıştı.")

[{'translation_text': "It's never been easier to learn the NLP with Transformers."}]