## Import libraries

In [2]:
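import os

import tensorflow as tf
from dotenv import load_dotenv
from langchain.chains import LLMChain
from langchain_core.prompts import PromptTemplate
from langchain_huggingface import HuggingFaceEndpoint
from transformers import AutoTokenizer, pipeline

# Load variables such as HF_API_KEY from a local .env file into os.environ
load_dotenv()

# Hugging Face access token; None if HF_API_KEY is not defined
hf_api = os.getenv("HF_API_KEY")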

Found Intel OpenMP ('libiomp') and LLVM OpenMP ('libomp') loaded at
the same time. Both libraries are known to be incompatible and this
can cause random crashes or deadlocks on Linux when loaded in the
same Python program.
Using threadpoolctl may cause crashes or deadlocks. For more
information and possible workarounds, please see
    https://github.com/joblib/threadpoolctl/blob/master/multiple_openmp.md
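`os.getenv` returns `None` when `HF_API_KEY` is missing, so the problem only surfaces later, when the endpoint rejects the request. A minimal fail-fast sketch, assuming the token lives in the same `HF_API_KEY` variable; `require_env` is a hypothetical helper, not part of `python-dotenv`:

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set; add it to your .env file")
    return value


# Usage, after load_dotenv():
# hf_api = require_env("HF_API_KEY")
```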



## HF Endpoint API

In [5]:
llm = HuggingFaceEndpoint(
    repo_id="meta-llama/Meta-Llama-3-8B-Instruct",
    task="text-generation",
    max_new_tokens=50,   # cap the length of the completion
    do_sample=False,     # greedy decoding (no sampling)
    huggingfacehub_api_token=hf_api
)

llm.invoke("is victoria a lovely name ?")



The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to /Users/marinneyret/.cache/huggingface/token
Login successful


'?\nRe: is victoria a lovely name??\nI think Victoria is a beautiful and elegant name. It has a regal and sophisticated feel to it, which is fitting given the history of the name (Victoria being the name of the British monarch'

## LangChain PromptTemplate

In [15]:
question = "is victoria a lovely name ?"

template = """
Question: {question}
Answer: answer in french
"""
# input_variables lists the placeholder names in the template, not their values
prompt = PromptTemplate(template=template, input_variables=["question"])
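With its default `template_format="f-string"`, `PromptTemplate` fills `{question}` the way Python's `str.format` does, so the exact text sent to the model can be previewed with the standard library alone. A sketch reproducing the template above; in LangChain itself, `prompt.format(question=question)` should return the same string:

```python
# Same template string as the notebook cell above
template = """
Question: {question}
Answer: answer in french
"""

question = "is victoria a lovely name ?"

# str.format substitutes {question}, mirroring PromptTemplate's f-string mode
rendered = template.format(question=question)
print(rendered)
```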


In [16]:
llm_chain = LLMChain(llm=llm, prompt=prompt)
llm_chain.invoke(question)

{'question': 'is victoria a lovely name ?',
 'text': 'Réponse: Oui, Victoria est un très beau prénom! Il est très populaire et a été porté par plusieurs femmes célèbres, notamment la reine Victoria du Royaume-Uni. Il a également un certain char'}
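`LLMChain.invoke` returns the input variables plus a `text` key holding the completion, which is the dict shape recorded above. A plain-Python sketch of that contract, with a hypothetical `fake_llm` standing in for the endpoint so it runs offline. In recent LangChain releases `LLMChain` is deprecated; the composable equivalent is `prompt | llm`, whose `invoke` returns the completion string directly rather than a dict.

```python
from typing import Callable, Dict


def run_chain(template: str, llm: Callable[[str], str], inputs: Dict[str, str]) -> Dict[str, str]:
    """Format the prompt, call the model, and return the inputs plus a 'text' key."""
    prompt_text = template.format(**inputs)
    return {**inputs, "text": llm(prompt_text)}


def fake_llm(prompt_text: str) -> str:
    # Hypothetical stand-in for HuggingFaceEndpoint: echoes the last prompt line
    return "echo: " + prompt_text.strip().splitlines()[-1]


template = """
Question: {question}
Answer: answer in french
"""
result = run_chain(template, fake_llm, {"question": "is victoria a lovely name ?"})
print(result)
```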

## HF Pipelines

In [31]:
classifier_pipe = pipeline("zero-shot-classification")
result = classifier_pipe(
    "This is a course about the Transformers library",
    candidate_labels=["tech", "business"],
)
result  # display the classification result, not the pipeline object

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


AttributeError: 'ZeroShotClassificationPipeline' object has no attribute 'dtype'
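The default checkpoint here, `facebook/bart-large-mnli`, is a natural language inference model. The zero-shot pipeline pairs the input (the premise) with one hypothesis per candidate label, built from `hypothesis_template` (default `"This example is {}."`), and with `multi_label=False` it softmaxes the entailment logits across the labels. A sketch of that scoring step; the logit values are made up for illustration and are not model output:

```python
import math
from typing import Dict, List


def zero_shot_scores(sequence: str, labels: List[str], entail_logits: List[float]) -> Dict[str, object]:
    """Softmax entailment logits across labels and sort them, as with multi_label=False."""
    # Subtract the max before exponentiating for numerical stability
    m = max(entail_logits)
    exps = [math.exp(x - m) for x in entail_logits]
    total = sum(exps)
    scores = [e / total for e in exps]
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return {
        "sequence": sequence,
        "labels": [label for label, _ in ranked],
        "scores": [score for _, score in ranked],
    }


labels = ["tech", "business"]
# One hypothesis per candidate label, built from the default template
hypotheses = ["This example is {}.".format(label) for label in labels]
print(hypotheses)

# Hypothetical entailment logits, one per (premise, hypothesis) pair
out = zero_shot_scores("This is a course about the Transformers library", labels, [2.0, -1.0])
print(out)
```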

In [30]:
translator_pipe = pipeline("translation_en_to_fr", model="google-t5/t5-small")
translator_pipe("I am english")

[{'translation_text': 'Ich bin Englisch'}]
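The recorded output is German, not French. T5 checkpoints such as `google-t5/t5-small` select the task from a text prefix on the input (the T5 model card uses prefixes like `"translate English to French: "`), so German output may mean the French prefix was not applied in this run. A sketch of building the prefixed input explicitly; `t5_translation_input` is a hypothetical helper, and whether the pipeline also adds its own prefix depends on the `transformers` version, so check the result before relying on it:

```python
def t5_translation_input(text: str, source: str = "English", target: str = "French") -> str:
    """Prepend a T5-style task prefix, e.g. 'translate English to French: '."""
    return f"translate {source} to {target}: {text}"


prefixed = t5_translation_input("I am English")
print(prefixed)
```

With `AutoTokenizer` and `TFAutoModelForSeq2SeqLM`, the prefixed string would be tokenized and passed to `model.generate`, bypassing the pipeline's own prefix handling.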

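## HF Auto classes
- `pipeline()` combines an `AutoTokenizer` and an `AutoModel` under the hood: the tokenizer turns text into tensors, the model produces raw outputs (logits), and the pipeline post-processes them into readable results.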

In [12]:
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = TFAutoModelForSequenceClassification.from_pretrained(model_name)


Some layers from the model checkpoint at nlptown/bert-base-multilingual-uncased-sentiment were not used when initializing TFBertForSequenceClassification: ['dropout_37']
- This IS expected if you are initializing TFBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFBertForSequenceClassification were initialized from the model checkpoint at nlptown/bert-base-multilingual-uncased-sentiment.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertForSequenceClassification for predictions without further training.
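The next cell tokenizes two sentences of different lengths with `padding=True`, which pads every sequence to the longest one in the batch and returns an `attention_mask` marking real tokens with 1 and padding with 0; `truncation=True` with `max_length=512` cuts anything longer than BERT's 512-token limit. A plain-Python sketch of that bookkeeping; the token ids are hypothetical, and a pad id of 0 is an assumption (check `tokenizer.pad_token_id`):

```python
from typing import Dict, List


def pad_batch(batch: List[List[int]], pad_id: int = 0, max_length: int = 512) -> Dict[str, List[List[int]]]:
    """Truncate to max_length, pad to the longest sequence, and build attention masks."""
    truncated = [ids[:max_length] for ids in batch]
    longest = max(len(ids) for ids in truncated)
    input_ids = [ids + [pad_id] * (longest - len(ids)) for ids in truncated]
    attention_mask = [[1] * len(ids) + [0] * (longest - len(ids)) for ids in truncated]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Hypothetical token ids for two sentences of different lengths
encoded = pad_batch([[101, 11, 12, 13, 102], [101, 21, 102]])
print(encoded)
```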


In [28]:
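# French inputs: "I am a child who loves Paris", "and I hate Paris"
tokenized_prompt = tokenizer(
    ["je suis une enfant qui adore Paris", "et moi je deteste Paris"],
    padding=True,          # pad to the longest sentence in the batch
    truncation=True,       # cut sequences longer than max_length
    max_length=512,
    return_tensors="tf",   # return TensorFlow tensors
)

# Forward pass: logits of shape (batch_size, num_classes)
tf_outputs = model(tokenized_prompt)

# Turn logits into probabilities along the class axis
tf_predictions = tf.nn.softmax(tf_outputs.logits, axis=-1)
tf_predictions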

<tf.Tensor: shape=(2, 5), dtype=float32, numpy=
array([[0.00731268, 0.00752774, 0.06288627, 0.24432226, 0.67795104],
       [0.5660091 , 0.24136913, 0.10153452, 0.03901039, 0.05207687]],
      dtype=float32)>
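`tf.nn.softmax(logits, axis=-1)` computes exp(z_i) / sum_j exp(z_j) along the last axis, so each row of `tf_predictions` is a probability distribution over the model's five classes. This checkpoint rates text from 1 to 5 stars; the sketch below assumes the columns are ordered from 1 star to 5 stars (confirm with `model.config.id2label`). The argmax gives the predicted rating and the probability-weighted mean gives a finer-grained score. The numbers are copied from the output above:

```python
from typing import List

# Softmax probabilities copied from the tf_predictions output above
probs = [
    [0.00731268, 0.00752774, 0.06288627, 0.24432226, 0.67795104],
    [0.5660091, 0.24136913, 0.10153452, 0.03901039, 0.05207687],
]


def predicted_stars(row: List[float]) -> int:
    """Argmax over the classes, shifted so index 0 means 1 star (assumed ordering)."""
    return max(range(len(row)), key=lambda i: row[i]) + 1


def expected_stars(row: List[float]) -> float:
    """Probability-weighted mean rating on the 1-5 scale."""
    return sum((i + 1) * p for i, p in enumerate(row))


sentences = ["je suis une enfant qui adore Paris", "et moi je deteste Paris"]
for sentence, row in zip(sentences, probs):
    print(sentence, predicted_stars(row), round(expected_stars(row), 2))
```

Under that label ordering, the first sentence lands at 5 stars (weighted mean about 4.58) and the second at 1 star (about 1.77), which matches their positive and negative wording.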