<a href="https://colab.research.google.com/github/jgleaves7/HuggingFace/blob/main/01_DemoModelos_LLM_PruebaTecnica_IngeniroIA_AsDeporte.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# AI Engineer Technical Test
**AS Deporte**

This program uses the GPT-2 model as a text generator: given an input text, the model continues the preceding sentence or story.

The model's specifications are available at:
[GPT-2 specifications](https://huggingface.co/docs/transformers/model_doc/gpt2?usage=AutoModel#gpt-2)

In [None]:
# Install the libraries in the working environment
# pip install transformers torch bitsandbytes accelerate
# The line above is commented out because the libraries were already installed
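Since the install line is commented out, a quick check of what is actually present in the environment can prevent a confusing import error later. The sketch below uses only the standard library. The helper name `installed_versions` and the `REQUIRED` tuple are illustrative and not part of the original notebook.

```python
from importlib import metadata

# Packages named in the commented-out install line
REQUIRED = ("transformers", "torch", "bitsandbytes", "accelerate")


def installed_versions(packages=REQUIRED):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions
```

Calling `installed_versions()` returns a dictionary; any entry whose value is `None` would need to be installed before running the cells that follow.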

In [1]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and the baseline model (FP32 by default)
tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")

# Load an optimized copy: FP16 weights, automatic device placement, SDPA attention
modelq = AutoModelForCausalLM.from_pretrained(
    "openai-community/gpt2",
    torch_dtype=torch.float16,
    device_map="auto",
    attn_implementation="sdpa",
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



In [2]:
# Memory usage comparison
gb_fp32 = model.get_memory_footprint() / 1e9
gb_fp16 = modelq.get_memory_footprint() / 1e9

print(f"Number of parameters, original model: {model.num_parameters()}")
print(f"Number of parameters, optimized model: {modelq.num_parameters()}")
print(f"Memory footprint FP32: {gb_fp32:.2f} GB")
print(f"Memory footprint FP16: {gb_fp16:.2f} GB")

Number of parameters, original model: 124439808
Number of parameters, optimized model: 124439808
Memory footprint FP32: 0.51 GB
Memory footprint FP16: 0.26 GB
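
The reported footprints follow almost directly from the parameter count: FP32 stores each parameter in 4 bytes and FP16 in 2. The sketch below reproduces that arithmetic using the parameter count printed above; the helper name `param_footprint_gb` is illustrative. It counts parameters only, giving about 0.50 GB and 0.25 GB. `get_memory_footprint()` also includes buffers by default, which would plausibly explain the slightly larger values the notebook reports, although that attribution is an assumption and was not measured here.

```python
NUM_PARAMS = 124_439_808  # parameter count printed by the notebook

# Storage size of one parameter for each floating-point format
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def param_footprint_gb(num_params: int, dtype: str) -> float:
    """Return the size of the parameters alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in ("float32", "float16"):
    print(f"{dtype}: {param_footprint_gb(NUM_PARAMS, dtype):.2f} GB")
```

Halving the bytes per parameter halves the parameter memory while leaving the parameter count unchanged, which matches the identical counts printed for both models.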


In [3]:
# Define the generation function

def Generate(text):
  # Tokenize the input text
  input_ids = tokenizer(text, return_tensors="pt").input_ids
  # Generate with both models, moving the inputs to each model's device
  output_idsq = modelq.generate(input_ids.to(modelq.device), do_sample=True, temperature=0.4, repetition_penalty=1.2, max_new_tokens=40)
  output_ids = model.generate(input_ids.to(model.device), do_sample=True, temperature=0.4, repetition_penalty=1.2, max_new_tokens=40)
  # Decode the generated tokens

  decoded_textq = tokenizer.decode(output_idsq[0])
  decoded_text = tokenizer.decode(output_ids[0])
  return f"Generated text (FP32): {decoded_text}\nGenerated text (FP16): {decoded_textq}"

In [4]:
# Test the function
Generate("I can't believe you did such a ")

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


"Generated text (FP32): I can't believe you did such a !!!\nMy husband and I have been going through this for years. He is an amazing guy, he has always had fun with his wife (and kids) so we've gotten along great! We\nGenerated text (FP16): I can't believe you did such a \xa0long time ago. You're not even from the US, but I'm sure it's possible that your parents were in China or somewhere else if they weren\nthe ones who got to know me"

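The warnings in the output above appear because `generate` received only `input_ids`. One way to address them, sketched below, is to pass the tokenizer's `attention_mask` along with the input and to set `pad_token_id` explicitly. GPT-2 defines no pad token, so the end-of-sequence token is reused, which is the fallback the log reports. The function name `generate_with_mask` is hypothetical, and `model` and `tokenizer` are assumed to be objects like the ones loaded earlier.

```python
def generate_with_mask(model, tokenizer, text, max_new_tokens=40):
    """Generate a continuation, passing the attention mask and pad token explicitly."""
    # The tokenizer returns input_ids and attention_mask; move both to the model's device
    encoded = tokenizer(text, return_tensors="pt").to(model.device)
    output_ids = model.generate(
        input_ids=encoded["input_ids"],
        attention_mask=encoded["attention_mask"],
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no dedicated pad token
        do_sample=True,
        temperature=0.4,
        repetition_penalty=1.2,
        max_new_tokens=max_new_tokens,
    )
    # Drop special tokens such as the end-of-text marker from the decoded string
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

A call such as `generate_with_mask(modelq, tokenizer, "I can't believe you did such a ")` would then run the half-precision model with the same sampling settings as `Generate`.
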
In [5]:
# Import the library for the user interface
import gradio as gr

In [7]:
iface = gr.Interface(
    fn=Generate,
    inputs=gr.Textbox(lines=2, placeholder="Type a sentence here..."),
    outputs=gr.Textbox(placeholder="Generated text will appear here"),
    title="Story Creator",
    description="Enter a sentence and the model will generate text that continues it."
)
iface.launch()

It looks like you are running Gradio on a hosted a Jupyter notebook. For the Gradio app to work, sharing must be enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://c3cd41c704cce609e0.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
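
Because `Generate` returns both continuations in one string, the interface shows them in a single box. A possible variation, sketched below under the assumption that each model is wrapped in its own text-to-text function, returns the two texts as a tuple so that Gradio can display them in separate, labeled boxes. The names `generate_pair`, `build_interface`, `generate_fp32`, and `generate_fp16` are hypothetical and do not appear in the original notebook.

```python
def generate_pair(text, generate_fp32, generate_fp16):
    """Run both generators on the same prompt and return their outputs as a tuple."""
    return generate_fp32(text), generate_fp16(text)


def build_interface(generate_fp32, generate_fp16):
    """Build a Gradio interface with one output box per model."""
    import gradio as gr  # imported here so generate_pair works without Gradio installed

    return gr.Interface(
        fn=lambda text: generate_pair(text, generate_fp32, generate_fp16),
        inputs=gr.Textbox(lines=2, label="Prompt"),
        outputs=[gr.Textbox(label="FP32"), gr.Textbox(label="FP16")],
    )
```

With two such wrappers in hand, `build_interface(fp32_fn, fp16_fn).launch()` would start the app in the same way as `iface.launch()`.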


