
Add support for audio transcription pipelines in transformers #8464

Merged
merged 9 commits into mlflow:master on May 20, 2023

Conversation

BenWilson2 (Member)

Related Issues/PRs

#xxx

What changes are proposed in this pull request?

Add support for the AutomaticSpeechRecognitionPipeline type in mlflow.transformers

How is this patch tested?

  • Existing unit/integration tests
  • New unit/integration tests
  • Manual tests (describe details, including test results, below)

Manual testing in progress on 13.x runtime.

Does this PR change the documentation?

  • No. You can skip the rest of this section.
  • Yes. Make sure the changed pages / sections render correctly in the documentation preview.

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

Added support for AutomaticSpeechRecognitionPipelines (i.e., Whisper audio transcription) to the transformers flavor and added native support for the bytes type as input to pyfunc signature enforcement.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interface

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Language

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
@pytest.fixture()
def sound_file_for_test():
    url = "https://www.nasa.gov/62282main_countdown_launch.wav"
    response = requests.get(url)
harupy (Member) commented May 19, 2023:

Suggested change
- response = requests.get(url)
+ response = requests.get(url)
+ response.raise_for_status()

to avoid encountering an unclear error when the request fails and the content attribute is not audio bytes.
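A minimal sketch of why the suggested guard matters: without `raise_for_status()`, a failed download silently yields non-audio bytes (typically an HTML error page). `FakeResponse` here is a hypothetical stand-in for `requests.Response`, used only so the failure mode can be shown without a network call.

```python
# FakeResponse is a hypothetical stand-in for requests.Response; the real
# raise_for_status() raises requests.HTTPError on 4xx/5xx status codes.
class FakeResponse:
    def __init__(self, status_code, content):
        self.status_code = status_code
        self.content = content

    def raise_for_status(self):
        # Mirror the requests behavior: raise on error status codes.
        if self.status_code >= 400:
            raise RuntimeError(f"HTTP {self.status_code}")


response = FakeResponse(404, b"<html>Not Found</html>")
try:
    response.raise_for_status()
    audio = response.content  # would otherwise be mistaken for audio bytes
except RuntimeError as exc:
    print("request failed early:", exc)
```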

BenWilson2 (Member Author):

Removed the requests logic; we're just going to load the static .wav file from a relative path resolved to the datasets folder.

@@ -316,6 +320,28 @@ def image_for_test():
    return dataset["test"]["image"][0]


@pytest.fixture()
def sound_file_for_test():
    url = "https://www.nasa.gov/62282main_countdown_launch.wav"
harupy (Member) commented May 19, 2023:

How large is this file? If it's small, can we include it in the repository?

BenWilson2 (Member Author):

It's a few MB. I'll add the raw bytes into tests/datasets and we can just use that instead of calling up NASA ;) Good call.
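The repository-local approach the thread settles on can be sketched like this; the helper name and the exact location of the static file are assumptions, since the merged fixture may be structured differently.

```python
import pathlib


def load_audio_bytes(path):
    """Read a static audio file from disk as raw bytes.

    Replaces the earlier requests-based download so tests never depend
    on an external server being reachable. The file is assumed to live
    under the repository's tests/datasets directory.
    """
    return pathlib.Path(path).read_bytes()
```

A fixture would then call this with a path built relative to the test module, e.g. `pathlib.Path(__file__).parent / "datasets" / "<file>.wav"`.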

@github-actions github-actions bot added area/models MLmodel format, model serialization/deserialization, flavors rn/feature Mention under Features in Changelogs. labels May 19, 2023
mlflow-automation (Collaborator) commented May 19, 2023:

Documentation preview for 7d023f2 will be available here when this CircleCI job completes successfully.


Comment on lines 2236 to 2237
encoded_sound_file = list(data[0].values())[0]
return decode_sound_file(encoded_sound_file)
Member:

Suggested change
- encoded_sound_file = list(data[0].values())[0]
- return decode_sound_file(encoded_sound_file)
+ encoded_audio = list(data[0].values())[0]
+ return decode_sound_file(encoded_audio)

Can we rename this variable because it's not a file?

BenWilson2 (Member Author):

great point. Changed!



# Acquire an audio file
audio_file = requests.get("https://www.nasa.gov/62283main_landing.wav").content
Member:

Suggested change
- audio_file = requests.get("https://www.nasa.gov/62283main_landing.wav").content
+ audio = requests.get("https://www.nasa.gov/62283main_landing.wav").content

nit

BenWilson2 (Member Author):

changed :)

    except binascii.Error:
        return False

def decode_sound_file(encoded):
Member:

Can we replace sound with audio for consistency?

BenWilson2 (Member Author):

refactored for consistency

Comment on lines 327 to 328
with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp_audio:
    tmp_audio.write(response.content)
harupy (Member) commented May 19, 2023:

Can we use pathlib.Path.write_bytes and the tmp_path fixture here so the temp file is deleted after running tests?

Suggested change
- with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp_audio:
-     tmp_audio.write(response.content)
+ tmp_audio = tmp_path / "audio.wav"
+ tmp_audio.write_bytes(response.content)

BenWilson2 (Member Author):

Removed all of this logic and switched to pathlib.Path().read_bytes()

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
@@ -2443,6 +2444,8 @@ expected input to the model to ensure your inference request can be read properl
**** The mask syntax for the model that you've chosen is going to be specific to that model's implementation. Some are '[MASK]', while others are '<mask>'. Verify the expected syntax to avoid failed inference requests.

***** If using MLServer for realtime inference, a raw audio file in bytes format must be base64 encoded prior to submitting to the endpoint.
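The encoding step this note describes can be sketched with the standard library alone; the stub bytes below are illustrative, not a real .wav file, and the exact request payload format depends on the serving endpoint's documented schema.

```python
import base64

# Raw audio bytes as they would be read from a .wav file on disk.
# This is an illustrative stub, not a complete, playable file.
raw_audio = b"RIFF$\x00\x00\x00WAVEfmt "

# Base64-encode before submitting to a serving endpoint, since JSON
# request bodies cannot carry raw bytes directly.
encoded = base64.b64encode(raw_audio).decode("ascii")

# The server side reverses the transformation to recover the original bytes.
assert base64.b64decode(encoded) == raw_audio
```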
Collaborator:

Does this only apply to MLServer (https://pypi.org/project/mlserver-mlflow/ - Seldon), or does it apply to the MLflow Model Server in general?

BenWilson2 (Member Author):

uhhh whoops. Serving in general. Chalk this up to writing this docstring during a meeting in which someone brought up MLServer.

Comment on lines 44 to 52
print(transcription)

# Load the pipeline as a pyfunc with the audio file being encoded as base64
pyfunc_transcriber = mlflow.pyfunc.load_model(model_uri=model_info.model_uri)

pyfunc_transcription = pyfunc_transcriber.predict(base64.b64encode(audio).decode("ascii"))

# Note: if `return_timestamps` is set, the pyfunc return type is a JSON-encoded string.
print(pyfunc_transcription)
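Handling that JSON-encoded return value might look like the following; the payload shape and key names here are assumptions for illustration, not taken from the merged implementation.

```python
import json

# Hypothetical pyfunc output when `return_timestamps` is set: a
# JSON-encoded string rather than a plain transcription. The structure
# shown is an assumption for illustration only.
pyfunc_transcription = (
    '{"text": "10, 9, 8, liftoff", '
    '"chunks": [{"timestamp": [0.0, 2.5], "text": "10, 9, 8, liftoff"}]}'
)

# Callers decode the string before using the transcription or timestamps.
decoded = json.loads(pyfunc_transcription)
print("Pyfunc transcription:", decoded["text"])
```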
dbczumar (Collaborator) commented May 19, 2023:

Nit: Can we add some text in the print() before these transcriptions like "Whisper transcription" and "Pyfunc transcription"?

BenWilson2 (Member Author):

good idea!

Comment on lines 490 to 492
"accelerate",
"librosa",
"ffmpeg",
Collaborator:

If possible, can we add some inline comments explaining which libraries are required for which functionalities?

"`base64.b64encode(<audio data bytes>).decode('ascii')`"
) from e

if isinstance(data, list) and all(isinstance(element, dict) for element in data):
Collaborator:

Can we insert an inline comment displaying the structure of the data that falls into this case?
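A hedged illustration of the kind of inline comment being requested; the key name "audio" and the base64 payload are assumptions about how record-style input reaches this branch, not details from the merged code.

```python
import base64

# Data in this branch is assumed to arrive as a list of single-entry
# dicts, e.g. a pandas DataFrame converted to records:
#   [{"audio": "<base64-encoded audio string>"}]
# The key name is illustrative only.
data = [{"audio": base64.b64encode(b"raw audio bytes").decode("ascii")}]

if isinstance(data, list) and all(isinstance(element, dict) for element in data):
    # Pull the single value out of the first record and decode it back
    # to the raw bytes the pipeline expects.
    encoded_audio = list(data[0].values())[0]
    recovered = base64.b64decode(encoded_audio)
    print(recovered)
```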

dbczumar (Collaborator) left a comment:

LGTM! Thanks @BenWilson2 !

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
harupy (Member) left a comment:

LGTM!

@BenWilson2 BenWilson2 enabled auto-merge (squash) May 20, 2023 01:23
@BenWilson2 BenWilson2 merged commit 205babd into mlflow:master May 20, 2023
35 checks passed
@BenWilson2 BenWilson2 deleted the transformers-audio branch May 20, 2023 01:58
Labels
area/models MLmodel format, model serialization/deserialization, flavors rn/feature Mention under Features in Changelogs.