
# USING PIPELINES IN HUGGING FACE

In [1]:
from transformers import pipeline

## Sentiment Analysis

In [6]:
nlp_sentiment_analysis = pipeline("sentiment-analysis")
text_sentiment = "We are very sad to include pipeline into the transformers repository"
nlp_sentiment_analysis(text_sentiment)

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'NEGATIVE', 'score': 0.9994685053825378}]
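
The warning above advises against relying on the default model in production. A minimal sketch of pinning the checkpoint explicitly, reusing the model name and revision reported in the log. The names `SENTIMENT_MODEL`, `SENTIMENT_REVISION` and `load_sentiment_pipeline` are illustrative and not part of the original notebook.

```python
# Checkpoint and revision copied from the warning printed by the default pipeline.
SENTIMENT_MODEL = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"
SENTIMENT_REVISION = "af0f99b"


def load_sentiment_pipeline():
    """Build the sentiment pipeline with an explicit model and revision."""
    # Imported here so the constants above can be reused without loading transformers.
    from transformers import pipeline

    return pipeline(
        "sentiment-analysis",
        model=SENTIMENT_MODEL,
        revision=SENTIMENT_REVISION,
    )
```

Calling `load_sentiment_pipeline()` downloads the checkpoint on first use. The result should behave like `nlp_sentiment_analysis` above, without the "No model was supplied" warning.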

## Question Answering

In [8]:
nlp_qa = pipeline("question-answering")
context = "Jim is a new consultant in Google labs since past January"
question = "How long Jim has worked at Google labs?"
nlp_qa(question=question, context=context)

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'score': 0.6414128541946411,
 'start': 39,
 'end': 57,
 'answer': 'since past January'}
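
The `start` and `end` fields in the result above are character offsets into `context`, with `end` exclusive. A small sketch that slices the context with the values copied from that output, so it runs without loading a model:

```python
context = "Jim is a new consultant in Google labs since past January"

# Result copied from the notebook output above.
result = {
    "score": 0.6414128541946411,
    "start": 39,
    "end": 57,
    "answer": "since past January",
}

# Slicing the context with the offsets recovers the answer text.
span = context[result["start"]:result["end"]]
print(span)
```

This is useful when the answer needs to be highlighted in the original text rather than just printed.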

### Several questions about the same text

In [32]:
from transformers import pipeline

# Load the pipeline for the question-answering task
qa_pipeline = pipeline("question-answering", model="MMG/bert-base-spanish-wwm-cased-finetuned-spa-squad2-es-finetuned-sqac")



In [33]:
# Define the context and the questions

text = r"""
El texto científico es aquel que presenta el desarrollo de una investigación o que aborda conocimientos propios de algún área de la ciencia e incorpora resultados, pruebas y argumentos para sustentarlos.
Por ejemplo: El origen de las especies, de Charles Darwin.
El texto científico tiene como objetivo principal transmitir conocimientos de manera rigurosa, por eso para elaborar las hipótesis y teorías que expone, utiliza el método científico. Asimismo, suele presentar un lenguaje técnico, formal y objetivo, ya que es un tipo de texto informativo, que además está destinado a un público con determinada formación en un campo particular de la ciencia.
"""

questions = [
    "Qué presenta o aborda el texto científico?",
    "Cuál es el objetivo principal del texto científico?",
    "Qué tipo de lenguaje suele presentar el texto científico?",
]


In [34]:
# Process each question
for question in questions:
    result = qa_pipeline(question=question, context=text)
    print(f"Question: {question}")
    print(f"Answer: {result['answer']}\n")

Question: Qué presenta o aborda el texto científico?
Answer: el desarrollo de una investigación

Question: Cuál es el objetivo principal del texto científico?
Answer: transmitir conocimientos de manera rigurosa

Question: Qué tipo de lenguaje suele presentar el texto científico?
Answer: técnico, formal y objetivo
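
The loop above prints only the answer, but each result also carries a `score`. A possible sketch that collects the answers and drops low-confidence ones. `answer_all` and `fake_qa` are hypothetical helpers: `fake_qa` is a stand-in with invented scores so the example runs offline, and in the notebook `qa_pipeline` would be passed in its place.

```python
def answer_all(qa, questions, context, min_score=0.0):
    """Ask every question against one context; keep answers scoring at least min_score."""
    rows = []
    for question in questions:
        result = qa(question=question, context=context)
        if result["score"] >= min_score:
            rows.append(
                {
                    "question": question,
                    "answer": result["answer"],
                    "score": result["score"],
                }
            )
    return rows


def fake_qa(question, context):
    # Hypothetical stand-in for qa_pipeline; the scores are made up for illustration.
    score = 0.9 if question.startswith("Qué") else 0.1
    return {"score": score, "start": 0, "end": 2, "answer": context[:2]}


rows = answer_all(
    fake_qa,
    ["Qué presenta?", "Cuál es el objetivo?"],
    "El texto científico",
    min_score=0.5,
)
for row in rows:
    print(f"{row['question']} -> {row['answer']} ({row['score']:.2f})")
```

With the real pipeline the call would be `answer_all(qa_pipeline, questions, text, min_score=...)`. A suitable threshold depends on the model and is left open here.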



## Named Entity Recognition

In [11]:
nlp = pipeline("ner")
text_ner = "European authorities fined Google a record $5.1 billion on Wednesday for abusing its power in the mobile phone market and ordered the company to alter its practices"
nlp(text_ner)

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'entity': 'I-MISC',
  'score': 0.9980934,
  'index': 1,
  'word': 'European',
  'start': 0,
  'end': 8},
 {'entity': 'I-ORG',
  'score': 0.9990614,
  'index': 4,
  'word': 'Google',
  'start': 27,
  'end': 33}]
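
Each entry in the output above describes one token: `start` and `end` are character offsets into the input text, and `index` appears to be the token's position in the tokenized sequence. A brief sketch that reads the entities back from the text, using the values copied from that output so no model is needed:

```python
text_ner = (
    "European authorities fined Google a record $5.1 billion on Wednesday "
    "for abusing its power in the mobile phone market and ordered the company "
    "to alter its practices"
)

# Entities copied from the notebook output above.
entities = [
    {"entity": "I-MISC", "score": 0.9980934, "index": 1, "word": "European", "start": 0, "end": 8},
    {"entity": "I-ORG", "score": 0.9990614, "index": 4, "word": "Google", "start": 27, "end": 33},
]

spans = []
entity_types = []
for ent in entities:
    # Slice the original text with the character offsets.
    spans.append(text_ner[ent["start"]:ent["end"]])
    # Drop the "I-" tagging prefix to keep only the entity type.
    entity_types.append(ent["entity"].split("-", 1)[-1])

for span, entity_type in zip(spans, entity_types):
    print(f"{span}: {entity_type}")
```

The token classification pipeline also accepts an `aggregation_strategy` argument that merges word pieces into whole entities. It is not used here because the notebook runs the default configuration.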

## Feature Extraction

In [12]:
nlp_fe = pipeline("feature-extraction")
text_fe = "We are very happy to include pipeline into the transformers repository"
nlp_fe(text_fe)

No model was supplied, defaulted to distilbert/distilbert-base-cased and revision 935ac13 (https://huggingface.co/distilbert/distilbert-base-cased).
Using a pipeline without specifying a model name and revision in production is not recommended.


[[[0.48452088236808777,
   0.18782423436641693,
   -0.23446080088615417,
   -0.26551875472068787,
   -0.37308642268180847,
   -0.19289185106754303,
   0.2006969302892685,
   0.04009896144270897,
   -0.023532511666417122,
   -0.9145359992980957,
   -0.2768729329109192,
   0.021916253492236137,
   -0.12369903922080994,
   -0.02056872844696045,
   -0.4850296378135681,
   0.009693970903754234,
   0.08380338549613953,
   0.18940851092338562,
   -0.055479928851127625,
   -0.15175722539424896,
   0.09595902264118195,
   -0.23599773645401,
   0.540239691734314,
   -0.23270918428897858,
   0.2062351256608963,
   -0.07206831872463226,
   0.34423646330833435,
   0.21119612455368042,
   -0.24598738551139832,
   0.23761677742004395,
   -0.06334802508354187,
   0.3319413959980011,
   0.006383721251040697,
   0.0114351911470294,
   -0.3018612861633301,
   0.27772241830825806,
   -0.07829474657773972,
   -0.19239696860313416,
   -0.14876027405261993,
   -0.18337702751159668,
   -0.38243240118026733,
 