## Introduction to Translation with Transformers and MLflow

In [1]:
# Disable tokenizers warnings when constructing pipelines
%env TOKENIZERS_PARALLELISM=false

import warnings

# Silence UserWarnings, which here are mostly unhelpful notices from setuptools and pydantic
# (note: this ignores every UserWarning, not only those from these two libraries)
warnings.filterwarnings("ignore", category=UserWarning)

env: TOKENIZERS_PARALLELISM=false
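The filter above ignores every `UserWarning`, not only those from setuptools and pydantic. If you would rather keep warnings from other libraries visible, `warnings.filterwarnings` also accepts a `module` argument, a regular expression matched against the start of the warning's module name. A minimal stdlib sketch that checks this scoping (the simulated warnings and their module names are illustrative only):

```python
import warnings

# Scope the filter to specific modules instead of silencing every UserWarning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record anything not explicitly ignored
    for noisy_module in ("setuptools", "pydantic"):
        # Inserted at the front of the filter list, so these take precedence
        warnings.filterwarnings("ignore", category=UserWarning, module=noisy_module)

    # Simulate warnings raised from two different modules
    warnings.warn_explicit("from pydantic", UserWarning, "pydantic/main.py", 1, module="pydantic")
    warnings.warn_explicit("from our code", UserWarning, "notebook.py", 1, module="notebook")

print([str(w.message) for w in caught])  # → ['from our code']
```

The `catch_warnings` context restores the previous filters on exit, so the experiment does not leak into the rest of the session.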


In [3]:
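import transformers
import mlflow

model_architecture = "google/flan-t5-base"

# Build an English-to-French translation pipeline from the FLAN-T5 base checkpoint
translation_pipeline = transformers.pipeline(
    task="translation_en_to_fr",
    # Extra keyword arguments such as max_length update the model's configuration
    model=transformers.T5ForConditionalGeneration.from_pretrained(
        model_architecture, max_length=1000
    ),
    # return_tensors is a call-time tokenizer argument, so it is not passed when loading
    tokenizer=transformers.T5TokenizerFast.from_pretrained(model_architecture),
)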

## Testing the Translation Pipeline

In [4]:
# Evaluate the pipeline on a sample sentence prior to logging
translation_pipeline(
    "translate English to French: I enjoyed my slow saunter along the Champs-Élysées."
)

[{'translation_text': "J'ai apprécié mon sajour lente sur les Champs-Élysées."}]
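T5-family models select their task from a plain-text prefix in the input (here, "translate English to French: "), and for a single input string the pipeline returns a list containing one dictionary keyed by `translation_text`. A minimal stdlib sketch of both conventions, using the output above as sample data; `with_t5_prefix` and `extract_translations` are hypothetical helpers, not part of Transformers:

```python
# Hypothetical helpers illustrating T5's text-prefix convention and the
# shape of the translation pipeline's return value (not Transformers APIs).

def with_t5_prefix(text: str, source: str = "English", target: str = "French") -> str:
    """Prepend the task prefix that T5-family models use to select translation."""
    return f"translate {source} to {target}: {text}"


def extract_translations(pipeline_output: list[dict[str, str]]) -> list[str]:
    """Pull the translated strings out of a translation pipeline's return value."""
    return [item["translation_text"] for item in pipeline_output]


# Sample output copied from the cell above
sample_output = [{"translation_text": "J'ai apprécié mon sajour lente sur les Champs-Élysées."}]

print(with_t5_prefix("I enjoyed my slow saunter along the Champs-Élysées."))
print(extract_translations(sample_output))
```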

## Setting Model Parameters and Inferring Signature

In [5]:
# Define the parameters that callers may override at inference time
import mlflow.models

model_params = {'max_length': 1000}

# Generate the model signature by providing an example input and the expected output
signature = mlflow.models.infer_signature(
    "This is a sample input sentence.",
    mlflow.transformers.generate_signature_output(translation_pipeline, "This is another sample."),
    params=model_params,
)

## Reviewing the Model Signature

In [6]:
# Visualize the model signature
signature

inputs: 
  [string (required)]
outputs: 
  [string (required)]
params: 
  ['max_length': integer (default: 1000)]
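The `params` entry declares `max_length` as an optional inference parameter with a default of 1000 (with a pyfunc-loaded model, overrides are passed through the `params` argument of `predict`). Conceptually, call-time values override the declared defaults and omitted ones fall back to them. The sketch below models that merge with a hypothetical `resolve_params` helper and schema; it is a simplification under that assumption, not MLflow's implementation, and its choice to raise on undeclared names is the sketch's own.

```python
from typing import Any, Optional

# Hypothetical schema mirroring the signature above: name -> (type, default)
PARAM_SCHEMA: dict[str, tuple[type, Any]] = {"max_length": (int, 1000)}


def resolve_params(overrides: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """Merge call-time overrides onto declared defaults (simplified sketch)."""
    overrides = overrides or {}
    unknown = set(overrides) - set(PARAM_SCHEMA)
    if unknown:
        raise KeyError(f"Undeclared parameters: {sorted(unknown)}")
    resolved = {}
    for name, (expected_type, default) in PARAM_SCHEMA.items():
        value = overrides.get(name, default)
        if not isinstance(value, expected_type):
            raise TypeError(f"{name} must be {expected_type.__name__}, got {type(value).__name__}")
        resolved[name] = value
    return resolved


print(resolve_params())                     # → {'max_length': 1000}
print(resolve_params({"max_length": 200}))  # → {'max_length': 200}
```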

## Creating an Experiment

In [7]:
mlflow.set_experiment("Translation")

<Experiment: artifact_location='file:///e:/MLFlow/mlruns/410751203367765635', creation_time=1731914900050, experiment_id='410751203367765635', last_update_time=1731914900050, lifecycle_stage='active', name='Translation', tags={}>

## Logging the Model with MLflow

In [8]:
with mlflow.start_run():
    model_info = mlflow.transformers.log_model(
        transformers_model=translation_pipeline,
        artifact_path="french_translator",
        signature=signature,
        model_params=model_params,
    )

Non-default generation parameters: {'max_length': 300, 'early_stopping': True, 'num_beams': 4}

## Ensuring Component Integrity and Functionality
Loading the logged model back as its individual components and inspecting them lets us confirm that:
* The task and model match our translation requirements.
* The `torch_dtype` (numeric precision) is appropriate for the available hardware.
* The tokenizer is the one the model expects for preprocessing text inputs.
* The framework matches the deep learning framework we intend to use (here, PyTorch).

In [9]:
# Load the saved model as a dictionary of its components (task, framework, dtype, model, tokenizer)
translation_components = mlflow.transformers.load_model(
    model_info.model_uri, return_type="components"
)

for key, value in translation_components.items():
    print(f"{key} -> {type(value).__name__}")

2024/11/18 14:33:04 INFO mlflow.transformers: 'runs:/305267d7333e4be28692f096be002c6d/french_translator' resolved as 'file:///e:/MLFlow/mlruns/410751203367765635/305267d7333e4be28692f096be002c6d/artifacts/french_translator'


task -> str
framework -> str
torch_dtype -> dtype
model -> T5ForConditionalGeneration
tokenizer -> T5TokenizerFast
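Checks like these can be partly automated by comparing each component's type name with what the task requires. A minimal sketch, assuming the expected type names shown in the output above; `check_components` is a hypothetical helper, and the stub classes stand in for the real Transformers objects so the example runs without downloading a model (in the notebook you would pass `translation_components` directly).

```python
# Stand-in classes so the sketch runs without Transformers installed
class T5ForConditionalGeneration: ...
class T5TokenizerFast: ...

# Expected type names, taken from the printed output above
EXPECTED_TYPES = {
    "task": "str",
    "framework": "str",
    "model": "T5ForConditionalGeneration",
    "tokenizer": "T5TokenizerFast",
}


def check_components(components: dict, expected: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for key, type_name in expected.items():
        if key not in components:
            problems.append(f"missing component: {key}")
        elif type(components[key]).__name__ != type_name:
            actual = type(components[key]).__name__
            problems.append(f"{key}: expected {type_name}, got {actual}")
    return problems


components = {
    "task": "translation_en_to_fr",
    "framework": "pt",
    "model": T5ForConditionalGeneration(),
    "tokenizer": T5TokenizerFast(),
}
print(check_components(components, EXPECTED_TYPES))  # → []
```

Comparing type names rather than classes keeps the check usable in a script that does not import Transformers.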


## Understanding Model Flavors in MLflow
The `model_info.flavors` attribute records how the logged model can be loaded and served: which loader module to use, which Python version and environment files it was logged with, and library-specific metadata.

Flavors in MLflow represent different ways the model can be loaded and deployed. This model has two:

* Python Function Flavor: Indicates that the model can be loaded as a generic Python function (for example, with `mlflow.pyfunc.load_model`) and records the loader module, Python version, and environment specifications.

* Transformers Flavor: Tailored to models from the Hugging Face Transformers library, covering the transformers version, code dependencies, task, instance type, framework, torch dtype, pipeline model type, source model name, model binary, tokenizer type, and components.

In [10]:
model_info.flavors

{'transformers': {'transformers_version': '4.41.2',
  'code': None,
  'task': 'translation_en_to_fr',
  'instance_type': 'TranslationPipeline',
  'framework': 'pt',
  'torch_dtype': 'torch.float32',
  'pipeline_model_type': 'T5ForConditionalGeneration',
  'source_model_name': 'google/flan-t5-base',
  'model_binary': 'model',
  'tokenizer_type': 'T5TokenizerFast',
  'components': ['tokenizer']},
 'python_function': {'loader_module': 'mlflow.transformers',
  'python_version': '3.12.2',
  'env': {'conda': 'conda.yaml', 'virtualenv': 'python_env.yaml'}}}
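Because `model_info.flavors` is a plain nested dictionary, deployment code can read it directly, for example to confirm which loader module and task a model expects before serving it. A small sketch using an abridged copy of the output above as literal data; `summarize_flavors` is a hypothetical helper:

```python
# Abridged literal copy of the flavors dictionary printed above
flavors = {
    "transformers": {
        "transformers_version": "4.41.2",
        "task": "translation_en_to_fr",
        "framework": "pt",
        "source_model_name": "google/flan-t5-base",
        "tokenizer_type": "T5TokenizerFast",
    },
    "python_function": {
        "loader_module": "mlflow.transformers",
        "python_version": "3.12.2",
        "env": {"conda": "conda.yaml", "virtualenv": "python_env.yaml"},
    },
}


def summarize_flavors(flavors: dict) -> dict[str, str]:
    """Collect the fields most relevant to deployment into one flat dict."""
    pyfunc = flavors.get("python_function", {})
    hf = flavors.get("transformers", {})
    return {
        "flavors": ", ".join(sorted(flavors)),
        "loader_module": pyfunc.get("loader_module", "n/a"),
        "python_version": pyfunc.get("python_version", "n/a"),
        "task": hf.get("task", "n/a"),
        "source_model": hf.get("source_model_name", "n/a"),
    }


for field, value in summarize_flavors(flavors).items():
    print(f"{field}: {value}")
```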

## Evaluating the Translation Output

In [11]:
# Load the logged model back as a ready-to-use translation pipeline
translation_pipeline = mlflow.transformers.load_model(model_info.model_uri)
response = translation_pipeline("I have heard that Nice is nice this time of year.")

print(response)

2024/11/18 14:33:05 INFO mlflow.transformers: 'runs:/305267d7333e4be28692f096be002c6d/french_translator' resolved as 'file:///e:/MLFlow/mlruns/410751203367765635/305267d7333e4be28692f096be002c6d/artifacts/french_translator'


[{'translation_text': "J'ai entendu que Nice est bien cette période de l'année."}]


## Assessing the Reconstructed Pipeline's Translation

In [12]:
# Rebuild a pipeline by unpacking the loaded components as keyword arguments
reconstructed_pipeline = transformers.pipeline(**translation_components)

reconstructed_response = reconstructed_pipeline(
    "Transformers makes using Deep Learning models easy and fun!"
)

print(reconstructed_response)

[{'translation_text': "Transformers simplifie l'utilisation des modèles de l'apprentissage profonde!"}]
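`transformers.pipeline(**translation_components)` works because each key in the components dictionary (`task`, `framework`, `torch_dtype`, `model`, `tokenizer`) matches a keyword parameter of `transformers.pipeline`. The stdlib sketch below shows the same unpacking pattern with a hypothetical stand-in function, `fake_pipeline`, and uses `inspect.signature` in a hypothetical `unbound_keys` helper to report any key that would not bind to a named parameter:

```python
import inspect


# Hypothetical stand-in with the same keyword names used above; it only
# records what it receives instead of building a real pipeline.
def fake_pipeline(task=None, model=None, tokenizer=None, framework=None, torch_dtype=None):
    return {"task": task, "model": model, "tokenizer": tokenizer}


def unbound_keys(func, kwargs: dict) -> list[str]:
    """List keys in kwargs that match no named parameter of func (ignores **kwargs catch-alls)."""
    params = inspect.signature(func).parameters
    return sorted(key for key in kwargs if key not in params)


components = {"task": "translation_en_to_fr", "framework": "pt",
              "torch_dtype": "float32", "model": "<model>", "tokenizer": "<tokenizer>"}

print(unbound_keys(fake_pipeline, components))  # → []
print(fake_pipeline(**components)["task"])      # → translation_en_to_fr
```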


## Listing the Model Components

In [13]:
translation_components.keys()

dict_keys(['task', 'framework', 'torch_dtype', 'model', 'tokenizer'])

## Advanced Usage: Direct Interaction with Model Components

In [17]:
# Access the individual components from the components dictionary
tokenizer = translation_components["tokenizer"]
model = translation_components["model"]

query = "Translate to French: Liberty, equality, fraternity, or death."

# Send the input tensor to whichever device holds the model ("cpu" in this run;
# "cuda" or "mps" on machines with a supported GPU) so the model can read it.
inputs = tokenizer.encode(query, return_tensors="pt").to(model.device)
outputs = model.generate(inputs)

# Without a pipeline, decoding keeps special tokens such as <pad> and </s>;
# skip_special_tokens=True removes them so only the translated text remains.
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)


La liberté, l'égalité, la fraternité ou la mort.
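Decoding the raw `generate` output keeps T5's special tokens: the sequence starts with `<pad>` (which T5 uses as its decoder start token) and ends with the end-of-sequence token `</s>`. Passing `skip_special_tokens=True` to `tokenizer.decode` drops them. The stdlib sketch below does the equivalent by hand for a fixed, assumed set of token strings; `strip_special_tokens` is hypothetical, and a real tokenizer exposes its complete list through `tokenizer.all_special_tokens`.

```python
import re

# Assumed special-token strings for T5; a real tokenizer knows its full list.
T5_SPECIAL_TOKENS = ("<pad>", "</s>", "<unk>")


def strip_special_tokens(decoded: str, tokens=T5_SPECIAL_TOKENS) -> str:
    """Remove special-token strings, then collapse leftover whitespace."""
    pattern = "|".join(re.escape(token) for token in tokens)
    return re.sub(r"\s+", " ", re.sub(pattern, "", decoded)).strip()


# Raw decoded text in the form produced without skip_special_tokens
raw = "<pad> La liberté, l'égalité, la fraternité ou la mort.</s>"
print(strip_special_tokens(raw))  # → La liberté, l'égalité, la fraternité ou la mort.
```

Matching whole token strings with a regular expression avoids the fragility of chained `str.replace` calls that depend on exact spacing around each token.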
