In [1]:
import transformers
from transformers import AutoTokenizer
import torch
import os
print(f"number of GPUs: torch.cuda.device_count()")
print(torch.__version__)

number of GPUs: torch.cuda.device_count()
2.6.0+rocm6.1


In [2]:
model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct" 

if  torch.cuda.is_available():
    device = "cuda"
else:
    raise ValueError("No se reconoció GPU.")

pipeline = transformers.pipeline(
	"text-generation", 
	model=model_id,
	model_kwargs={"torch_dtype": torch.bfloat16},
	device=device
)

# Tokenizer necesario para contar tokens
tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

Device set to use cuda


In [None]:
# Directorios de entrada y salida
input_dir = "dev/"
output_dir = "out/"
os.makedirs(output_dir, exist_ok=True)

# Instrucciones para el modelo
prompt = [
    {"role": "system", 
     "content": 
        """ 
        Quiero que identifiques entidades nombradas que requieren ser anonimizadas en el informe clínico que copio entre comillas al final de esta instrucción. Quiero que me des el resultado en formato .xml in-line, donde las entidades sean identificadas por etiquetas en el mismo texto. Quiero que etiquetes con los criterios MEDDOCAN. A continuación, te muestro un ejemplo que contiene:
        - El texto original del informe en formato plano (.txt)
        - La representación estructurada del mismo en XML con etiquetas semánticas detalladas y posiciones de texto (atributos start, end, text, TYPE, etc.).
        Tu tarea será generar un XML con las mismas reglas de estructura y etiquetado a partir de cada texto clínico. Instrucciones:
        - Conserva el formato exacto del XML del ejemplo.
        - Cada etiqueta tiene que tener el tipo de entidad (`TYPE`) del inventario de MEDDOCAN. Los tipos de entidad que puedes usar son los siguientes: 
            NOMBRE_SUJETO_ASISTENCIA
            EDAD_SUJETO_ASISTENCIA
            SEXO_SUJETO_ASISTENCIA
            FAMILIARES_SUJETO_ASISTENCIA
            NOMBRE_PERSONAL_SANITARIO
            FECHAS
            PROFESION
            HOSPITAL
            CENTRO_SALUD
            INSTITUCION
            CALLE
            TERRITORIO
            PAIS
            NUMERO_TELEFONO
            NUMERO_FAX
            CORREO_ELECTRONICO
            ID_SUJETO_ASISTENCIA
            ID_CONTACTO_ASISTENCIAL
            ID_ASEGURAMIENTO
            ID_TITULACION_PERSONAL_SANITARIO
            ID_EMPLEO_PERSONAL_SANITARIO
            IDENTIF_VEHICULOS_NRSERIE_PLACAS
            IDENTIF_DISPOSITIVOS_NRSERIE
            DIREC_PROT_INTERNET
            URL_WEB
            IDENTIF_BIOMETRICOS
            OTRO_NUMERO_IDENTIF
            OTROS_SUJETO_ASISTENCIA
          - y un campo de comentario (`comment`) vacío
        Cuando te dé un nuevo texto, responde solo con el XML, sin explicaciones adicionales.
    
        Ejemplo - Informe en formato .txt: 
        Datos del paciente.
        Nombre: María Soledad Moreno Roca
        DNI: 23556552K
        Fecha de nacimiento: 09/01/1941
        Género: Mujer
        Domicilio: Calle de Almagro 80
        Ciudad: Denia, Valencia, Comunidad Valenciana
        Código postal: 46571
        Email: mariasoledad_roca@ucm.es
        Teléfono fijo: +34 960 66 89 48
        Teléfono móvil: +34 660 57 14 97
        NHC: 2409425
        NASS: 468043486571
        Condición de riesgo: Científico de Investigación
        
        Datos asistenciales.
        Médico: Dr. Juan Ramón Benito Vicente. NC 097900390. Investigador Clínico en Epidemiología. Instituto de Investigación Biomédica en Red de Enfermedades Infecciosas (CIBERINFEC). Avenida Monforte de Lemos 3-5. 28029. Madrid. España.
        Fecha de ingreso: 05/06/1996
        Centro de salud: Centro de Salud Carabanchel
        
        Informe clínico del paciente:
        Paciente sobreviviente de violencia de 55 años de edad, acompañado de su madre. 
        
        Ejemplo - Informe en formato .xml: lo que debes generar
        <?xml version='1.0' encoding='UTF-8'?>
        <MEDDOCAN>
          <TEXT>
        Ejemplo - Informe en formato .txt: 
        Datos del paciente.
        Nombre:  <TAG TYPE="NOMBRE_SUJETO_ASISTENCIA">María Soledad</TAG> <TAG TYPE="NOMBRE_SUJETO_ASISTENCIA">Moreno Roca</TAG>
        DNI: <TAG TYPE="ID_SUJETO_ASISTENCIA">23556552K</TAG>
        Fecha de nacimiento: <TAG TYPE="FECHAS">09/01/1941</TAG>
        Género: <TAG TYPE="SEXO_SUJETO_ASISTENCIA">Mujer</TAG>
        Domicilio: <TAG TYPE="CALLE">Calle de Almagro 80</TAG>.
        Ciudad: <TAG TYPE="TERRITORIO">Denia</TAG>, <TAG TYPE="TERRITORIO">Valencia</TAG>, <TAG TYPE="TERRITORIO">Comunidad Valenciana</TAG>
        Código postal: <TAG TYPE="TERRITORIO">46571</TAG>
        Email: <TAG TYPE="CORREO_ELECTRONICO">mariasoledad_roca@ucm.es</TAG>
        Teléfono fijo: <TAG TYPE="NUMERO_TELEFONO">+34 960 66 89 48</TAG>
        Teléfono móvil: <TAG TYPE="NUMERO_TELEFONO">+34 660 57 14 97</TAG>
        NHC: <TAG TYPE="ID_SUJETO_ASISTENCIA">2409425</TAG>
        NASS: <TAG TYPE="ID_ASEGURAMIENTO">468043486571</TAG>
        Condición de riesgo: <TAG TYPE="PROFESION">Científico de Investigación</TAG>
        
        Datos asistenciales.
        Médico: Dr. <TAG TYPE="NOMBRE_PERSONAL_SANITARIO">Juan Ramón Benito Vicente</TAG>. NC <TAG TYPE="ID_TITULACION_PERSONAL_SANITARIO">097900390</TAG>. <TAG TYPE="ID_EMPLEO_PERSONAL_SANITARIO">Investigador Clínico en Epidemiología</TAG>. <TAG TYPE="INSTITUCION">Instituto de Investigación Biomédica en Red de Enfermedades Infecciosas (CIBERINFEC)</TAG>. <TAG TYPE="CALLE">Avenida Monforte de Lemos 3-5</TAG>. <TAG TYPE="TERRITORIO">28029</TAG>. <TAG TYPE="TERRITORIO">Madrid</TAG>. <TAG TYPE="PAIS">España</TAG>.
        Fecha de ingreso: <TAG TYPE="FECHAS">05/06/1996</TAG>
        Centro de salud: <TAG TYPE="CENTRO_SALUD">Centro de Salud Carabanchel</TAG>
        
        Informe clínico del paciente:
        Paciente <TAG TYPE="OTROS_SUJETO_ASISTENCIA">sobreviviente de violencia</TAG> de <TAG TYPE="EDAD_SUJETO_ASISTENCIA">55 años</TAG> de edad, acompañado de su <TAG TYPE="FAMILIARES_SUJETO_ASISTENCIA">madre</TAG>.
            </TEXT>
        </MEDDOCAN>
    
        Recordá que en ningún caso debes incluir advertencias, explicaciones ni descripciones sobre la tarea, sobre la instrucción que te he dado o sobre cuestiones de funcionamiento del modelo de lenguaje.
        """},
     ]

# Configuración de tokens
MAX_CONTEXT_TOKENS = 8192
MAX_GENERATION_TOKENS = 4000
MAX_INPUT_TOKENS = MAX_CONTEXT_TOKENS - MAX_GENERATION_TOKENS



# Procesar cada archivo .txt
for filename in os.listdir(input_dir):
    if filename.endswith(".txt"):
        filepath = os.path.join(input_dir, filename)
        with open(filepath, "r", encoding="utf-8") as f:
            texto = f.read()

        # Crear mensaje estilo chat
        prompt_text = prompt[0]["content"]
        messages = [
            {"role": "system", "content": prompt_text},
            {"role": "user", "content": texto}
        ]

        # Calcular tokens de entrada
        full_prompt = prompt_text + texto
        total_tokens = len(tokenizer.encode(full_prompt))
        print(f"{filename}: Tokens de entrada: {total_tokens}")

        # Truncar el prompt si se pasa del límite permitido
        if total_tokens > MAX_INPUT_TOKENS:
            print(f"Truncando prompt: {filename}")
            # Calcular los tokens disponibles para el prompt
            max_tokens_prompt = MAX_INPUT_TOKENS - len(tokenizer.encode(texto))
            
            # Truncar el prompt para ajustarlo al límite de tokens
            prompt_tokens = tokenizer.encode(prompt[0]["content"])
            truncated_prompt_tokens = prompt_tokens[:max_tokens_prompt]
            
            # Decodificar los tokens truncados y actualizar el prompt
            truncated_prompt = tokenizer.decode(truncated_prompt_tokens, skip_special_tokens=True)
            messages[0]["content"] = truncated_prompt

        # Generar texto
        output = pipeline(messages, max_new_tokens=MAX_GENERATION_TOKENS)

        # Extraer solo el contenido generado por el modelo
        respuesta = output[0]["generated_text"][2]["content"]

        # Guardar en .xml
        output_filename = os.path.splitext(filename)[0] + ".xml"
        output_path = os.path.join(output_dir, output_filename)
        with open(output_path, "w", encoding="utf-8") as out_f:
            out_f.write(respuesta)

        print(f"Procesado: {filename} → {output_filename}")

print("Proceso completado.")

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


721821248.txt: Tokens de entrada: 1964


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 721821248.txt → 721821248.xml
640351088.txt: Tokens de entrada: 1993


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 640351088.txt → 640351088.xml
660438891.txt: Tokens de entrada: 1953


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 660438891.txt → 660438891.xml
485409220.txt: Tokens de entrada: 1994


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 485409220.txt → 485409220.xml
654170593.txt: Tokens de entrada: 1952


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 654170593.txt → 654170593.xml
632267394.txt: Tokens de entrada: 1966


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 632267394.txt → 632267394.xml
547849267.txt: Tokens de entrada: 1910


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 547849267.txt → 547849267.xml
527978714.txt: Tokens de entrada: 1943


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 527978714.txt → 527978714.xml
624457565.txt: Tokens de entrada: 1974


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 624457565.txt → 624457565.xml
505661421.txt: Tokens de entrada: 1960


You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 505661421.txt → 505661421.xml
597181863.txt: Tokens de entrada: 1914


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 597181863.txt → 597181863.xml
503605741.txt: Tokens de entrada: 1976


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 503605741.txt → 503605741.xml
560374028.txt: Tokens de entrada: 1941


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 560374028.txt → 560374028.xml
705042332.txt: Tokens de entrada: 2040


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 705042332.txt → 705042332.xml
597310456.txt: Tokens de entrada: 1976


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 597310456.txt → 597310456.xml
540108868.txt: Tokens de entrada: 1920


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 540108868.txt → 540108868.xml
709920014.txt: Tokens de entrada: 1967


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 709920014.txt → 709920014.xml
660403240.txt: Tokens de entrada: 1942


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 660403240.txt → 660403240.xml
645172339.txt: Tokens de entrada: 1966


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 645172339.txt → 645172339.xml
690073045.txt: Tokens de entrada: 1911


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 690073045.txt → 690073045.xml
696321479.txt: Tokens de entrada: 2013


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 696321479.txt → 696321479.xml
483248189.txt: Tokens de entrada: 1963


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 483248189.txt → 483248189.xml
604932017.txt: Tokens de entrada: 1972


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 604932017.txt → 604932017.xml
729626002.txt: Tokens de entrada: 1987


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 729626002.txt → 729626002.xml
678788106.txt: Tokens de entrada: 1925


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 678788106.txt → 678788106.xml
598148722.txt: Tokens de entrada: 1959


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 598148722.txt → 598148722.xml
616361252.txt: Tokens de entrada: 1983


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 616361252.txt → 616361252.xml
677503499.txt: Tokens de entrada: 1946


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 677503499.txt → 677503499.xml
497801942.txt: Tokens de entrada: 1910


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 497801942.txt → 497801942.xml
498186154.txt: Tokens de entrada: 1926


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 498186154.txt → 498186154.xml
702865317.txt: Tokens de entrada: 1991


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 702865317.txt → 702865317.xml
563018431.txt: Tokens de entrada: 1970


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 563018431.txt → 563018431.xml
730086383.txt: Tokens de entrada: 1950


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 730086383.txt → 730086383.xml
702219044.txt: Tokens de entrada: 1943


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 702219044.txt → 702219044.xml
655637875.txt: Tokens de entrada: 1969


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 655637875.txt → 655637875.xml
658320128.txt: Tokens de entrada: 1948


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 658320128.txt → 658320128.xml
612699570.txt: Tokens de entrada: 1958


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 612699570.txt → 612699570.xml
732723029.txt: Tokens de entrada: 2018


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 732723029.txt → 732723029.xml
652947676.txt: Tokens de entrada: 1989


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 652947676.txt → 652947676.xml
506050914.txt: Tokens de entrada: 1937


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 506050914.txt → 506050914.xml
547282685.txt: Tokens de entrada: 1960


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 547282685.txt → 547282685.xml
526303461.txt: Tokens de entrada: 1976


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 526303461.txt → 526303461.xml
606063227.txt: Tokens de entrada: 1930


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 606063227.txt → 606063227.xml
513633424.txt: Tokens de entrada: 1927


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 513633424.txt → 513633424.xml
503383766.txt: Tokens de entrada: 1970


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 503383766.txt → 503383766.xml
513665572.txt: Tokens de entrada: 1948


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 513665572.txt → 513665572.xml
688634239.txt: Tokens de entrada: 1946


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 688634239.txt → 688634239.xml
649565923.txt: Tokens de entrada: 2015


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 649565923.txt → 649565923.xml
501581893.txt: Tokens de entrada: 1943


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 501581893.txt → 501581893.xml
575005901.txt: Tokens de entrada: 1929


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 575005901.txt → 575005901.xml
717173872.txt: Tokens de entrada: 1960


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 717173872.txt → 717173872.xml
575867870.txt: Tokens de entrada: 1983


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 575867870.txt → 575867870.xml
506716929.txt: Tokens de entrada: 1959


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 506716929.txt → 506716929.xml
648933728.txt: Tokens de entrada: 1990


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 648933728.txt → 648933728.xml
511664645.txt: Tokens de entrada: 1913


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 511664645.txt → 511664645.xml
521505023.txt: Tokens de entrada: 1930


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 521505023.txt → 521505023.xml
689661570.txt: Tokens de entrada: 2003


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 689661570.txt → 689661570.xml
574995462.txt: Tokens de entrada: 2017


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 574995462.txt → 574995462.xml
645654307.txt: Tokens de entrada: 2014


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 645654307.txt → 645654307.xml
649782247.txt: Tokens de entrada: 1939


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 649782247.txt → 649782247.xml
734284528.txt: Tokens de entrada: 1976


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 734284528.txt → 734284528.xml
584016242.txt: Tokens de entrada: 1976


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 584016242.txt → 584016242.xml
605254894.txt: Tokens de entrada: 1974


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 605254894.txt → 605254894.xml
749550983.txt: Tokens de entrada: 1989


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 749550983.txt → 749550983.xml
515061887.txt: Tokens de entrada: 1960


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 515061887.txt → 515061887.xml
614776161.txt: Tokens de entrada: 1906


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 614776161.txt → 614776161.xml
601035657.txt: Tokens de entrada: 1974


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 601035657.txt → 601035657.xml
711034119.txt: Tokens de entrada: 1997


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 711034119.txt → 711034119.xml
718817047.txt: Tokens de entrada: 1996


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 718817047.txt → 718817047.xml
606432749.txt: Tokens de entrada: 1992


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 606432749.txt → 606432749.xml
589679414.txt: Tokens de entrada: 2006


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 589679414.txt → 589679414.xml
714896030.txt: Tokens de entrada: 1929


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 714896030.txt → 714896030.xml
691741259.txt: Tokens de entrada: 2012


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 691741259.txt → 691741259.xml
703225045.txt: Tokens de entrada: 2013


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 703225045.txt → 703225045.xml
600120072.txt: Tokens de entrada: 1979


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 600120072.txt → 600120072.xml
710987224.txt: Tokens de entrada: 1945


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 710987224.txt → 710987224.xml
694497546.txt: Tokens de entrada: 1969


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 694497546.txt → 694497546.xml
615131443.txt: Tokens de entrada: 1992


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 615131443.txt → 615131443.xml
706134751.txt: Tokens de entrada: 1941


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 706134751.txt → 706134751.xml
620358037.txt: Tokens de entrada: 1969


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 620358037.txt → 620358037.xml
715685977.txt: Tokens de entrada: 2001


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 715685977.txt → 715685977.xml
638583754.txt: Tokens de entrada: 1913


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 638583754.txt → 638583754.xml
538571939.txt: Tokens de entrada: 1942


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 538571939.txt → 538571939.xml
682979808.txt: Tokens de entrada: 2038


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 682979808.txt → 682979808.xml
667913623.txt: Tokens de entrada: 1984


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 667913623.txt → 667913623.xml
579940350.txt: Tokens de entrada: 1961
Procesado: 579940350.txt → 579940350.xml


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


507757764.txt: Tokens de entrada: 1909


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 507757764.txt → 507757764.xml
640702967.txt: Tokens de entrada: 1937


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 640702967.txt → 640702967.xml
645257288.txt: Tokens de entrada: 1984


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 645257288.txt → 645257288.xml
723982399.txt: Tokens de entrada: 1974


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 723982399.txt → 723982399.xml
725904059.txt: Tokens de entrada: 1976


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 725904059.txt → 725904059.xml
718004178.txt: Tokens de entrada: 1928


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 718004178.txt → 718004178.xml
752419591.txt: Tokens de entrada: 1974


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 752419591.txt → 752419591.xml
552845925.txt: Tokens de entrada: 1972


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 552845925.txt → 552845925.xml
642505224.txt: Tokens de entrada: 1961


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 642505224.txt → 642505224.xml
626444047.txt: Tokens de entrada: 1943


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 626444047.txt → 626444047.xml
605727580.txt: Tokens de entrada: 1973


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 605727580.txt → 605727580.xml
580859708.txt: Tokens de entrada: 1936


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 580859708.txt → 580859708.xml
700803973.txt: Tokens de entrada: 1924


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 700803973.txt → 700803973.xml
730875276.txt: Tokens de entrada: 2033


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 730875276.txt → 730875276.xml
573276782.txt: Tokens de entrada: 1937


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 573276782.txt → 573276782.xml
641355006.txt: Tokens de entrada: 1920


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 641355006.txt → 641355006.xml
565857132.txt: Tokens de entrada: 1965


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 565857132.txt → 565857132.xml
484827107.txt: Tokens de entrada: 1963


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 484827107.txt → 484827107.xml
577234398.txt: Tokens de entrada: 1930


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 577234398.txt → 577234398.xml
643938461.txt: Tokens de entrada: 1962


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 643938461.txt → 643938461.xml
654383847.txt: Tokens de entrada: 1986
Procesado: 654383847.txt → 654383847.xml


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


572730498.txt: Tokens de entrada: 1948


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 572730498.txt → 572730498.xml
718221461.txt: Tokens de entrada: 1924


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 718221461.txt → 718221461.xml
525799157.txt: Tokens de entrada: 1988
Procesado: 525799157.txt → 525799157.xml


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


539587419.txt: Tokens de entrada: 1951


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 539587419.txt → 539587419.xml
570856604.txt: Tokens de entrada: 1947


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 570856604.txt → 570856604.xml
728556392.txt: Tokens de entrada: 1957


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 728556392.txt → 728556392.xml
598104802.txt: Tokens de entrada: 1954


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 598104802.txt → 598104802.xml
531776211.txt: Tokens de entrada: 1929


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 531776211.txt → 531776211.xml
521957253.txt: Tokens de entrada: 1955


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 521957253.txt → 521957253.xml
625401701.txt: Tokens de entrada: 2000


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 625401701.txt → 625401701.xml
522115012.txt: Tokens de entrada: 1961


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 522115012.txt → 522115012.xml
505338393.txt: Tokens de entrada: 2028


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 505338393.txt → 505338393.xml
668930177.txt: Tokens de entrada: 1978


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 668930177.txt → 668930177.xml
580244218.txt: Tokens de entrada: 1966


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 580244218.txt → 580244218.xml
542501574.txt: Tokens de entrada: 1926


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 542501574.txt → 542501574.xml
615812605.txt: Tokens de entrada: 1966


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 615812605.txt → 615812605.xml
543538570.txt: Tokens de entrada: 1945


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 543538570.txt → 543538570.xml
668772548.txt: Tokens de entrada: 2018


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 668772548.txt → 668772548.xml
579557539.txt: Tokens de entrada: 1912


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 579557539.txt → 579557539.xml
584744406.txt: Tokens de entrada: 1950


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 584744406.txt → 584744406.xml
702527775.txt: Tokens de entrada: 1975


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 702527775.txt → 702527775.xml
666598978.txt: Tokens de entrada: 1945


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 666598978.txt → 666598978.xml
597185762.txt: Tokens de entrada: 1927


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 597185762.txt → 597185762.xml
560501168.txt: Tokens de entrada: 1956


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 560501168.txt → 560501168.xml
639408303.txt: Tokens de entrada: 1966


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 639408303.txt → 639408303.xml
666697529.txt: Tokens de entrada: 1964


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 666697529.txt → 666697529.xml
645553524.txt: Tokens de entrada: 1912


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 645553524.txt → 645553524.xml
727575879.txt: Tokens de entrada: 1992


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 727575879.txt → 727575879.xml
591177490.txt: Tokens de entrada: 1940


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 591177490.txt → 591177490.xml
594983058.txt: Tokens de entrada: 1992


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 594983058.txt → 594983058.xml
614741885.txt: Tokens de entrada: 1969


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 614741885.txt → 614741885.xml
611045770.txt: Tokens de entrada: 1926


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 611045770.txt → 611045770.xml
623187179.txt: Tokens de entrada: 2067


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 623187179.txt → 623187179.xml
674830774.txt: Tokens de entrada: 1958


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 674830774.txt → 674830774.xml
580199505.txt: Tokens de entrada: 1935


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 580199505.txt → 580199505.xml
502642392.txt: Tokens de entrada: 1923


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 502642392.txt → 502642392.xml
689178818.txt: Tokens de entrada: 1957


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 689178818.txt → 689178818.xml
495194539.txt: Tokens de entrada: 1957


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 495194539.txt → 495194539.xml
647965816.txt: Tokens de entrada: 1999


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 647965816.txt → 647965816.xml
549335221.txt: Tokens de entrada: 1983


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 549335221.txt → 549335221.xml
702273075.txt: Tokens de entrada: 1952


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 702273075.txt → 702273075.xml
648016230.txt: Tokens de entrada: 1993


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 648016230.txt → 648016230.xml
751362924.txt: Tokens de entrada: 1907


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 751362924.txt → 751362924.xml
720753386.txt: Tokens de entrada: 2005


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 720753386.txt → 720753386.xml
548395570.txt: Tokens de entrada: 1934


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 548395570.txt → 548395570.xml
697572931.txt: Tokens de entrada: 1949


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 697572931.txt → 697572931.xml
730122506.txt: Tokens de entrada: 1972


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 730122506.txt → 730122506.xml
676818656.txt: Tokens de entrada: 1973


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 676818656.txt → 676818656.xml
703994487.txt: Tokens de entrada: 1970
Procesado: 703994487.txt → 703994487.xml


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


637084534.txt: Tokens de entrada: 2035


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 637084534.txt → 637084534.xml
721353434.txt: Tokens de entrada: 1932


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 721353434.txt → 721353434.xml
555414714.txt: Tokens de entrada: 2047


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 555414714.txt → 555414714.xml
727997554.txt: Tokens de entrada: 1972


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 727997554.txt → 727997554.xml
734060094.txt: Tokens de entrada: 1974


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 734060094.txt → 734060094.xml
557096587.txt: Tokens de entrada: 1969


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 557096587.txt → 557096587.xml
698590732.txt: Tokens de entrada: 1972


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 698590732.txt → 698590732.xml
529088693.txt: Tokens de entrada: 1946
Procesado: 529088693.txt → 529088693.xml


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


553827609.txt: Tokens de entrada: 2008


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 553827609.txt → 553827609.xml
705162272.txt: Tokens de entrada: 1976


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 705162272.txt → 705162272.xml
574615527.txt: Tokens de entrada: 2028


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 574615527.txt → 574615527.xml
583303247.txt: Tokens de entrada: 1932


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 583303247.txt → 583303247.xml
742756726.txt: Tokens de entrada: 1941


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 742756726.txt → 742756726.xml
540437733.txt: Tokens de entrada: 1928


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 540437733.txt → 540437733.xml
610814709.txt: Tokens de entrada: 1973


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 610814709.txt → 610814709.xml
742323722.txt: Tokens de entrada: 1962


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 742323722.txt → 742323722.xml
586421126.txt: Tokens de entrada: 1985


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 586421126.txt → 586421126.xml
647065314.txt: Tokens de entrada: 1943


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 647065314.txt → 647065314.xml
643328585.txt: Tokens de entrada: 1964


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 643328585.txt → 643328585.xml
644944435.txt: Tokens de entrada: 1989


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 644944435.txt → 644944435.xml
612890741.txt: Tokens de entrada: 1944


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 612890741.txt → 612890741.xml
702978449.txt: Tokens de entrada: 1944


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 702978449.txt → 702978449.xml
723809095.txt: Tokens de entrada: 1949


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 723809095.txt → 723809095.xml
535159032.txt: Tokens de entrada: 1986


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 535159032.txt → 535159032.xml
662748955.txt: Tokens de entrada: 1994


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 662748955.txt → 662748955.xml
591592602.txt: Tokens de entrada: 1952


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 591592602.txt → 591592602.xml
614932894.txt: Tokens de entrada: 1911


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 614932894.txt → 614932894.xml
594060626.txt: Tokens de entrada: 1924


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 594060626.txt → 594060626.xml
618479066.txt: Tokens de entrada: 1972


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 618479066.txt → 618479066.xml
619104507.txt: Tokens de entrada: 1976


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 619104507.txt → 619104507.xml
667643216.txt: Tokens de entrada: 1944


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 667643216.txt → 667643216.xml
498454379.txt: Tokens de entrada: 2001


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 498454379.txt → 498454379.xml
685948375.txt: Tokens de entrada: 1994


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 685948375.txt → 685948375.xml
555112093.txt: Tokens de entrada: 1947


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 555112093.txt → 555112093.xml
521039011.txt: Tokens de entrada: 1958


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 521039011.txt → 521039011.xml
487700934.txt: Tokens de entrada: 1949


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 487700934.txt → 487700934.xml
644180013.txt: Tokens de entrada: 1990


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 644180013.txt → 644180013.xml
533173842.txt: Tokens de entrada: 1988


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 533173842.txt → 533173842.xml
504163138.txt: Tokens de entrada: 1957


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 504163138.txt → 504163138.xml
537238102.txt: Tokens de entrada: 1915


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 537238102.txt → 537238102.xml
496553695.txt: Tokens de entrada: 1917


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 496553695.txt → 496553695.xml
743560436.txt: Tokens de entrada: 1955


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 743560436.txt → 743560436.xml
719716225.txt: Tokens de entrada: 1914


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 719716225.txt → 719716225.xml
653894703.txt: Tokens de entrada: 1948


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 653894703.txt → 653894703.xml
607259377.txt: Tokens de entrada: 1915


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 607259377.txt → 607259377.xml
520095729.txt: Tokens de entrada: 1989


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 520095729.txt → 520095729.xml
702458137.txt: Tokens de entrada: 1973


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 702458137.txt → 702458137.xml
740144638.txt: Tokens de entrada: 1927


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 740144638.txt → 740144638.xml
720864737.txt: Tokens de entrada: 1930


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 720864737.txt → 720864737.xml
551347688.txt: Tokens de entrada: 2063


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 551347688.txt → 551347688.xml
643975816.txt: Tokens de entrada: 2083


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 643975816.txt → 643975816.xml
634032025.txt: Tokens de entrada: 2032


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 634032025.txt → 634032025.xml
511726152.txt: Tokens de entrada: 1955


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 511726152.txt → 511726152.xml
502064612.txt: Tokens de entrada: 1965


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 502064612.txt → 502064612.xml
484663980.txt: Tokens de entrada: 1925


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 484663980.txt → 484663980.xml
689208416.txt: Tokens de entrada: 1989


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 689208416.txt → 689208416.xml
499918490.txt: Tokens de entrada: 1983


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 499918490.txt → 499918490.xml
698991451.txt: Tokens de entrada: 1987


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 698991451.txt → 698991451.xml
648667451.txt: Tokens de entrada: 1973


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 648667451.txt → 648667451.xml
537509925.txt: Tokens de entrada: 2009


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 537509925.txt → 537509925.xml
531425456.txt: Tokens de entrada: 1899


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 531425456.txt → 531425456.xml
643530370.txt: Tokens de entrada: 1932


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 643530370.txt → 643530370.xml
695832037.txt: Tokens de entrada: 2000


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 695832037.txt → 695832037.xml
723319979.txt: Tokens de entrada: 1952


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 723319979.txt → 723319979.xml
551598243.txt: Tokens de entrada: 2001


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 551598243.txt → 551598243.xml
647918065.txt: Tokens de entrada: 1941


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 647918065.txt → 647918065.xml
618794775.txt: Tokens de entrada: 1980


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 618794775.txt → 618794775.xml
559948713.txt: Tokens de entrada: 2021


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 559948713.txt → 559948713.xml
640392927.txt: Tokens de entrada: 1979


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 640392927.txt → 640392927.xml
663935917.txt: Tokens de entrada: 1938


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 663935917.txt → 663935917.xml
483566004.txt: Tokens de entrada: 1958


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 483566004.txt → 483566004.xml
542182389.txt: Tokens de entrada: 1952


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 542182389.txt → 542182389.xml
493033844.txt: Tokens de entrada: 1929


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 493033844.txt → 493033844.xml
578297463.txt: Tokens de entrada: 1948


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 578297463.txt → 578297463.xml
515993781.txt: Tokens de entrada: 1918


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 515993781.txt → 515993781.xml
714352071.txt: Tokens de entrada: 1977


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 714352071.txt → 714352071.xml
637525813.txt: Tokens de entrada: 1915


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 637525813.txt → 637525813.xml
667930335.txt: Tokens de entrada: 2003


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 667930335.txt → 667930335.xml
557927115.txt: Tokens de entrada: 1976


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 557927115.txt → 557927115.xml
558344603.txt: Tokens de entrada: 1976


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 558344603.txt → 558344603.xml
750095317.txt: Tokens de entrada: 1923


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 750095317.txt → 750095317.xml
570109697.txt: Tokens de entrada: 1956


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 570109697.txt → 570109697.xml
491347017.txt: Tokens de entrada: 1956


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 491347017.txt → 491347017.xml
692237489.txt: Tokens de entrada: 1964


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 692237489.txt → 692237489.xml
736336430.txt: Tokens de entrada: 2004


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 736336430.txt → 736336430.xml
628631232.txt: Tokens de entrada: 1972


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 628631232.txt → 628631232.xml
691757573.txt: Tokens de entrada: 1954


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 691757573.txt → 691757573.xml
510314832.txt: Tokens de entrada: 2010
Procesado: 510314832.txt → 510314832.xml


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


484529658.txt: Tokens de entrada: 1936


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 484529658.txt → 484529658.xml
732380020.txt: Tokens de entrada: 1941


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 732380020.txt → 732380020.xml
719529678.txt: Tokens de entrada: 1916


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 719529678.txt → 719529678.xml
551573814.txt: Tokens de entrada: 1948


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 551573814.txt → 551573814.xml
723030912.txt: Tokens de entrada: 1960


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Procesado: 723030912.txt → 723030912.xml
605419284.txt: Tokens de entrada: 1985
Procesado: 605419284.txt → 605419284.xml
Proceso completado.
