# Lesson 1: Getting to know the models

**[Hugging Face](https://huggingface.co/)**


Hugging Face is a company focused on artificial intelligence (AI) and natural language processing (NLP). It builds and maintains an open platform that makes it easier to create, train, and use advanced machine learning models, especially for language-processing tasks.

## Getting started with summarizing texts

**[Transformers library](https://huggingface.co/docs/transformers/en/index)**


The Transformers library from Hugging Face is one of the most popular and widely used tools in the NLP community. It offers a simplified interface for working with transformer-based language models such as BERT, GPT, T5, and many others. With support for tasks such as sentiment analysis, machine translation, and text generation, it lets developers and researchers implement NLP solutions efficiently.


**[The pipeline function of the Transformers library](https://huggingface.co/docs/transformers/en/main_classes/pipelines)**

The `pipeline` function of the Transformers library simplifies the use of pre-trained models for a range of natural language processing (NLP) tasks. It provides a high-level interface that lets users carry out complex NLP tasks with only a few lines of code.

It serves as a single entry point for tasks such as sentiment analysis, translation, summarization, and text generation, and it hides the details of model loading and tokenization behind one call.

In [1]:
!pip install transformers



In [2]:
from transformers import pipeline
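
To make concrete what `pipeline` hides, the sketch below runs a summarization by hand: load a tokenizer and a model, tokenize, generate, decode. It is an illustration under assumptions, not the library's internal implementation. The helper name `resumir_manual` is hypothetical, the code assumes the PyTorch backend is installed, and the checkpoint name is the one the library reports as the summarization default in this notebook. The real pipeline also takes care of batching, device placement, and task-specific defaults.

```python
def resumir_manual(texto, nome_modelo="sshleifer/distilbart-cnn-12-6",
                   max_length=120, min_length=70):
    """Summarize `texto` without `pipeline`, spelling out the main steps.

    Hypothetical helper for illustration; assumes PyTorch is installed.
    """
    # Imported inside the function so that defining the helper does not
    # load the library by itself.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    # 1. Load the tokenizer and the model weights from the Hub.
    tokenizer = AutoTokenizer.from_pretrained(nome_modelo)
    modelo = AutoModelForSeq2SeqLM.from_pretrained(nome_modelo)

    # 2. Tokenize: text -> tensors of token ids, truncated to the model limit.
    entradas = tokenizer(texto, return_tensors="pt", truncation=True)

    # 3. Generate the token ids of the summary.
    ids_resumo = modelo.generate(
        **entradas, max_length=max_length, min_length=min_length
    )

    # 4. Decode: token ids -> text, dropping special tokens.
    return tokenizer.decode(ids_resumo[0], skip_special_tokens=True)
```

Calling `resumir_manual("some long English text ...")` downloads the checkpoint on first use, just as `pipeline("summarization")` does.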

## Getting started with translating texts

In [3]:
resumidor_texto = pipeline("summarization")

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


Device set to use cpu
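
The message above warns against relying on the default model. A minimal sketch of the alternative is to pin both the model name and the revision, here using exactly the values the message reports. The constant and function names are hypothetical and chosen only for this example.

```python
# Values copied from the library's own "No model was supplied" message.
MODELO_RESUMO = "sshleifer/distilbart-cnn-12-6"
REVISAO_RESUMO = "a4f8f3e"


def carregar_resumidor_fixo(modelo=MODELO_RESUMO, revisao=REVISAO_RESUMO):
    """Build a summarization pipeline with an explicit model and revision.

    Hypothetical helper: pinning both values keeps the notebook from
    silently switching models if the library's default ever changes.
    """
    # Imported here so the constants can be reused without loading the library.
    from transformers import pipeline

    return pipeline("summarization", model=modelo, revision=revisao)
```

`resumidor_texto = carregar_resumidor_fixo()` would then stand in for the bare `pipeline("summarization")` call.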


In [4]:
texto_exemplo = """
A inteligência artificial (IA) é uma área da ciência da computação que enfatiza a criação de máquinas inteligentes que trabalham e reagem como seres humanos.
Algumas das atividades que os computadores com inteligência artificial são
projetados para fazer incluem: reconhecimento de fala, aprendizado, planejamento e resolução de problemas. A pesquisa associada à inteligência artificial é altamente técnica e especializada.Os principais problemas da inteligência artificial incluem programação de computadores para certos traços como conhecimento,
raciocínio, solução de problemas, percepção, aprendizado, planejamento, habilidade
de manipular e mover objetos.
"""

In [5]:
resumo = resumidor_texto(texto_exemplo, max_length=120, min_length=70)

In [6]:
resumo

[{'summary_text': ' A pesquisa associada to inteligência artificial is altamente técnica e especializada . The IAI enfatiza a criação de máquinas inteligentes that trabalham e reagem como seres humanos . Algumas das atividades que computadores com IAI sãoprojetados para fazer incluem: reconhecimento de fala, aprendizado, planejamento e resolução of problemas'}]

In [7]:
print(resumo[0]['summary_text'])

 A pesquisa associada to inteligência artificial is altamente técnica e especializada . The IAI enfatiza a criação de máquinas inteligentes that trabalham e reagem como seres humanos . Algumas das atividades que computadores com IAI sãoprojetados para fazer incluem: reconhecimento de fala, aprendizado, planejamento e resolução of problemas


In [8]:
texto_ingles = """
Artificial intelligence (AI) is a field of computer science that emphasizes the
creation of intelligent machines that work and react like humans. Some of the
activities that computers with artificial intelligence are designed to perform
include speech recognition, learning, planning, and problem-solving. Research
associated with artificial intelligence is highly technical and specialized.
The main problems of artificial intelligence include programming computers for
certain traits such as knowledge, reasoning, problem-solving, perception,
learning, planning, and the ability to manipulate and move objects.
"""

In [9]:
resumo_ingles = resumidor_texto(texto_ingles, max_length=120, min_length=70)

Your max_length is set to 120, but your input_length is only 115. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=57)


In [10]:
print(resumo_ingles[0]['summary_text'])

 Artificial intelligence (AI) is a field of computer science that emphasizes the creation of intelligent machines that work and react like humans . The main problems of artificial intelligence include programming computers for certain traits such as knowledge, reasoning, problem-solving, perception, learning, planning, and the ability to manipulate and move objects . Research with artificial intelligence is highly technical and specialized .
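
The warning in the previous cell appears because `max_length=120` exceeds the 115 input tokens. One possible way to avoid it is to derive the limits from the token count of the input. The sketch below assumes arbitrary ratios (half of the input for the maximum, a quarter for the minimum); the helper names and ratios are hypothetical, not values recommended by the library. It relies on the `tokenizer` attribute that pipeline objects expose.

```python
def limites_resumo(n_tokens_entrada, fracao_max=0.5, fracao_min=0.25):
    """Return (min_length, max_length) scaled to the input size in tokens.

    The fractions are illustrative assumptions, not library defaults.
    """
    max_length = max(2, int(n_tokens_entrada * fracao_max))
    # Keep min_length at least 1 and strictly below max_length.
    min_length = max(1, min(int(n_tokens_entrada * fracao_min), max_length - 1))
    return min_length, max_length


def resumir_ajustado(resumidor, texto):
    """Summarize `texto` with limits derived from its token count."""
    # Count tokens with the tokenizer attached to the pipeline object.
    n_tokens = len(resumidor.tokenizer(texto)["input_ids"])
    min_length, max_length = limites_resumo(n_tokens)
    saida = resumidor(texto, max_length=max_length, min_length=min_length)
    return saida[0]["summary_text"]
```

For the 115-token input above, `limites_resumo(115)` gives `(28, 57)`, which matches the `max_length=57` suggested by the warning.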


In [11]:
tratutor = pipeline("translation_en_to_fr")

No model was supplied, defaulted to google-t5/t5-base and revision a9723ea (https://huggingface.co/google-t5/t5-base).
Using a pipeline without specifying a model name and revision in production is not recommended.


Device set to use cpu


In [12]:
texto_ingles = """
Artificial intelligence (AI) is a field of computer science that emphasizes the
creation of intelligent machines that work and react like humans. Some of the
activities that computers with artificial intelligence are designed to perform
include speech recognition, learning, planning, and problem-solving.
"""

In [13]:
traducao = tratutor(texto_ingles,  max_length=400, min_length=100)

Both `max_new_tokens` (=256) and `max_length`(=400) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


In [14]:
traducao

[{'translation_text': "L'intelligence artificielle (AI) est un domaine d'informatique qui met l'accent sur la création de machines intelligentes qui fonctionnent et réagissent comme les humains. Parmi les activités que les ordinateurs dotés d'une intelligence artificielle sont conçus pour effectuer figurent la reconnaissance de la parole, l'apprentissage, la planification et la résolution de problèmes."}]

In [15]:
print(traducao[0]['translation_text'])

L'intelligence artificielle (AI) est un domaine d'informatique qui met l'accent sur la création de machines intelligentes qui fonctionnent et réagissent comme les humains. Parmi les activités que les ordinateurs dotés d'une intelligence artificielle sont conçus pour effectuer figurent la reconnaissance de la parole, l'apprentissage, la planification et la résolution de problèmes.
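
The message shown after cell 13 says that `max_new_tokens` takes precedence over `max_length` when both are set. A small sketch of a wrapper that passes only `max_new_tokens` follows. The function name `traduzir` and its default of 256 (the value quoted in the message) are assumptions for illustration, and whether the message disappears may depend on the installed library version.

```python
def traduzir(tradutor, texto, max_novos_tokens=256):
    """Translate `texto` with a translation pipeline, limiting new tokens.

    `tradutor` is any translation pipeline object, for example the result
    of pipeline("translation_en_to_fr"). Hypothetical helper.
    """
    # Only max_new_tokens is passed, so there is no competing max_length
    # coming from this call.
    saida = tradutor(texto, max_new_tokens=max_novos_tokens)
    # The pipeline returns a list with one dict per input text.
    return saida[0]["translation_text"]
```

With the objects defined above, the call would be `traduzir(tratutor, texto_ingles)`.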


# Lesson 2: Text summarization

## Choosing a model

In [16]:
from transformers import pipeline

In [17]:
modelo_resumo = "facebook/bart-large-cnn"

In [18]:
resumidor_eng = pipeline("summarization", model=modelo_resumo)

Device set to use cpu


In [19]:
texto_classificacao = """
Classification in machine learning involves assigning labels to input data based on its features and is fundamental in applications like spam detection, medical diagnosis, and image categorization. The process includes data collection and preparation, feature selection, choosing an algorithm (such as logistic regression, decision trees, SVM, k-NN, or neural networks), training, evaluation, hyperparameter tuning, and deployment. Despite its power, classification faces challenges like imbalanced datasets, ensuring model generalization, and handling noisy data. Successful classification models require careful handling of these steps and challenges to effectively solve real-world problems.
"""

In [20]:
resumo = resumidor_eng(texto_classificacao, max_length=70, min_length=30)

In [21]:
resumo

[{'summary_text': 'Classification in machine learning involves assigning labels to input data based on its features. Despite its power, classification faces challenges like imbalanced datasets. Successful classification models require careful handling of these steps.'}]

In [22]:
print(resumo[0]['summary_text'])

Classification in machine learning involves assigning labels to input data based on its features. Despite its power, classification faces challenges like imbalanced datasets. Successful classification models require careful handling of these steps.
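
When choosing a model, it can help to run several candidates on the same text and read the summaries side by side. The sketch below assumes the pipelines are already loaded and passed in as a dictionary; the function name `comparar_resumos` is hypothetical.

```python
def comparar_resumos(texto, resumidores, max_length=70, min_length=30):
    """Summarize `texto` with each pipeline and return {name: summary}.

    `resumidores` maps a label (for example the model name) to an already
    loaded summarization pipeline. Hypothetical helper for illustration.
    """
    resultados = {}
    for nome, resumidor in resumidores.items():
        saida = resumidor(texto, max_length=max_length, min_length=min_length)
        resultados[nome] = saida[0]["summary_text"]
    return resultados


# Example usage with the two pipelines created in this notebook:
# resumos = comparar_resumos(texto_classificacao, {
#     "sshleifer/distilbart-cnn-12-6": resumidor_texto,
#     "facebook/bart-large-cnn": resumidor_eng,
# })
# for nome, resumo in resumos.items():
#     print(nome, "->", resumo)
```

Passing loaded pipelines rather than model names keeps the function from downloading anything on its own.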


## Applying the summary

In [23]:
texto_powerbi_en = """
Power BI is a data analysis and reporting tool developed by Microsoft. Its main function is to transform raw data into interactive and understandable visual information. Power BI is widely used by companies of all sizes to improve decision-making and business strategies.
One of the main attractions of Power BI is its ability to integrate with a variety of data sources, such as SQL databases, Excel files, cloud services like Azure and Google Analytics, among others. This allows users to centralize all their data on a single platform, facilitating analysis and information sharing.
Power BI is composed of three main components: Power BI Desktop, Power BI Service, and Power BI Mobile. Power BI Desktop is a desktop application that allows users to create detailed reports and dashboards. Power BI Service is an online platform where reports and dashboards can be published and shared with other members of the organization. Finally, Power BI Mobile allows users to access their reports and dashboards from anywhere using mobile devices.
One of the most powerful features of Power BI is the ability to create interactive visualizations. These visualizations allow users to explore data in various ways, identifying patterns and trends that might go unnoticed in traditional data tables. Additionally, Power BI offers a vast library of customizable visualizations, including bar charts, line charts, geographic maps... //Trecho omitido """

In [27]:
len(texto_powerbi_en)

1440

In [24]:
def carregar_modelo(nome_modelo):
    # Build a summarization pipeline for the given Hugging Face model name.
    resumidor = pipeline("summarization", model=nome_modelo)
    return resumidor

In [25]:
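def resumir_texto(texto):
    # Load the model (it is reloaded on every call) and summarize the text.
    resumidor_texto = carregar_modelo("facebook/bart-large-cnn")
    resumo = resumidor_texto(texto, max_length=200, min_length=100)
    # The pipeline returns a list with one dict per input; keep only the text.
    resumo_texto = resumo[0]['summary_text']
    return resumo_texto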

In [26]:
resumir_texto(texto_powerbi_en)

Device set to use cpu


'Power BI is a data analysis and reporting tool developed by Microsoft. Its main function is to transform raw data into interactive and understandable visual information. Power BI is widely used by companies of all sizes to improve decision-making and business strategies. One of the most powerful features of Power BI are the ability to create interactive visualizations. These visualizations allow users to explore data in various ways, identifying patterns and trends that might go unnoticed in traditional data tables. The tool is composed of three main components: Power BI Desktop, Power BI Service, and Power BI Mobile.'
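
As written, `resumir_texto` builds the pipeline again on every call, which is why "Device set to use cpu" is printed each time. A possible variation caches the loaded pipeline with `functools.lru_cache` so that repeated calls reuse it. The names `carregar_modelo_em_cache` and `resumir_texto_em_cache` are hypothetical, and the `carregar` parameter is added here only so the loader can be swapped out.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def carregar_modelo_em_cache(nome_modelo):
    """Load a summarization pipeline once per model name and reuse it."""
    # Imported here so the module can be imported without loading the library.
    from transformers import pipeline

    return pipeline("summarization", model=nome_modelo)


def resumir_texto_em_cache(texto, nome_modelo="facebook/bart-large-cnn",
                           carregar=carregar_modelo_em_cache):
    """Summarize `texto`, reusing the pipeline across calls."""
    resumidor = carregar(nome_modelo)
    resumo = resumidor(texto, max_length=200, min_length=100)
    # The pipeline returns a list with one dict per input text.
    return resumo[0]["summary_text"]
```

The first call to `resumir_texto_em_cache(texto_powerbi_en)` loads the model; later calls with the same model name skip that step.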