Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

initpy and test_transformers_model_export #10538

Merged

Conversation

KonakanchiSwathi
Copy link
Contributor

@KonakanchiSwathi KonakanchiSwathi commented Nov 29, 2023

🛠 DevTools 🛠

Open in GitHub Codespaces

Install mlflow from this PR

pip install git+https://github.com/mlflow/mlflow.git@refs/pull/10538/merge

Checkout with GitHub CLI

gh pr checkout 10538

Related Issues/PRs

#xxx

What changes are proposed in this pull request?

How is this PR tested?

  • Existing unit/integration tests
  • New unit/integration tests
  • Manual tests

Does this PR require documentation update?

  • No. You can skip the rest of this section.
  • Yes. I've updated:
    • Examples
    • API references
    • Instructions

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/gateway: AI Gateway service, Gateway client APIs, third-party Gateway integrations
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interface

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Language

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

Copy link

github-actions bot commented Nov 29, 2023

Documentation preview for 3973ffb will be available here when this CircleCI job completes successfully.

More info

@github-actions github-actions bot added area/model-registry Model registry, model registry APIs, and the fluent client calls for model registry area/models MLmodel format, model serialization/deserialization, flavors rn/none List under Small Changes in Changelogs. rn/feature Mention under Features in Changelogs. integrations/azure Azure and Azure ML integrations area/windows Issue is unique to windows. and removed area/models MLmodel format, model serialization/deserialization, flavors rn/none List under Small Changes in Changelogs. area/model-registry Model registry, model registry APIs, and the fluent client calls for model registry labels Nov 29, 2023
Signed-off-by: swathi <konakanchi.swathi@gmail.com>
@KonakanchiSwathi
Copy link
Contributor Author

@BenWilson2 @serena-ruan Please review

assert len(predictions) != 0


@pytest.mark.skipif(RUNNING_IN_GITHUB_ACTIONS, reason=GITHUB_ACTIONS_SKIP_REASON)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this line was added by mistake. Can you remove this please?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BenWilson2 Please review

@BenWilson2
Copy link
Member

Looking good! Once that typo is fixed, we can merge! :D

@@ -2669,7 +2733,7 @@ def decode_audio(encoded):
@staticmethod
def _validate_str_input_uri_or_file(input_str):
"""
Validation of blob references to audio files, if a string is input to the ``predict``
Validation of blob references to audio/image files, if a string is input to the ``predict``
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Validation of blob references to audio/image files, if a string is input to the ``predict``
Validation of blob references to either audio or image files, if a string is input to the ``predict``

Let's not use shorthand in docstrings

Comment on lines 633 to 634
with model_path.joinpath("requirements.txt").open() as file:
requirements = file.read()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
with model_path.joinpath("requirements.txt").open() as file:
requirements = file.read()
requirements = model_path.joinpath("requirements.txt").read_text()

Comment on lines 642 to 643
with model_path.joinpath("model_card.md").open(encoding="utf-8") as file:
card_text = file.read()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
with model_path.joinpath("model_card.md").open(encoding="utf-8") as file:
card_text = file.read()
card_text = model_path.joinpath("model_card.md").read_text(encoding="utf-8")

@@ -1341,6 +1363,46 @@ def test_qa_pipeline_pyfunc_load_and_infer(small_qa_pipeline, model_path, infere
assert all(isinstance(element, str) for element in inference)


def read_image(filename):
image_path = os.path.join(pathlib.Path(__file__).parent.parent, "datasets", filename)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we mixing pathlib.Path and os.path.join() ? Just use pathlib to traverse the paths and read the bytes of the file directly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

"inference_payload",
[
image_url,
os.path.join(pathlib.Path(__file__).parent.parent, "datasets", "cat.png"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does an absolute path reference from the repo root not work here? Why is this evaluation path parsing logic in the parameter?

"inference_payload",
[
[os.path.join(pathlib.Path(__file__).parent.parent, "datasets", "cat.png")],
[image_url, image_url],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why two instances here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wanted to try the list with more than one, but changed it one instance

],
)
def test_vision_pipeline_pyfunc_predict(small_vision_model, inference_payload):
if inference_payload == "base64":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not embed complex logic like this that mutates the parameter based on string matching. Just create another test explicitly for this condition.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, moved this logic inside because if we do this as an input parameter a lengthy base64 string is being printed in the test suite. now we have to duplicate three tests if we want to create a new test for base64.

transformers_loaded_model = mlflow.transformers.load_model(model_uri)
expected_predictions = transformers_loaded_model.predict(inference_payload)

assert list(predictions.to_dict("records")[0].values()) == expected_predictions[0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we only validating the first entry here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed, thanks

@MadhuM02
Copy link
Contributor

Hi @serena-ruan , @BenWilson2 Thanks for reviewing the PR and giving us valuable feedback. IT's been quite some time since this effort has started. can we sort of connect and close this one out??

Comment on lines 3615 to 3624
mlflow.transformers.log_model(
transformers_model=small_vision_model,
artifact_path=artifact_path,
signature=infer_signature(
image_url,
mlflow.transformers.generate_signature_output(small_vision_model, image_url),
params=parameters,
),
)
model_uri = mlflow.get_artifact_uri(artifact_path)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar here, let's use model_info.model_uri

Copy link
Collaborator

@serena-ruan serena-ruan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left few nits. LGTM otherwise. Let's wait for Ben to take another look.

KonakanchiSwathi and others added 5 commits December 21, 2023 10:57
Signed-off-by: swathi <konakanchi.swathi@gmail.com>

Co-authored-by: Serena Ruan <82044803+serena-ruan@users.noreply.github.com>
Signed-off-by: Konakanchi Swathi <98085410+KonakanchiSwathi@users.noreply.github.com>
Signed-off-by: Konakanchi Swathi <98085410+KonakanchiSwathi@users.noreply.github.com>
Copy link
Member

@BenWilson2 BenWilson2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM once the test fix (string concatenation typo with a trailing comma) is applied

Signed-off-by: swathi <konakanchi.swathi@gmail.com>

Co-authored-by: Ben Wilson <39283302+BenWilson2@users.noreply.github.com>
Signed-off-by: Konakanchi Swathi <98085410+KonakanchiSwathi@users.noreply.github.com>
@BenWilson2 BenWilson2 merged commit b929a3e into mlflow:master Dec 21, 2023
59 checks passed
@MadhuM02
Copy link
Contributor

Hi @BenWilson2/ @serena-ruan, is there a tentative date for next mlflow pypi release??

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/windows Issue is unique to windows. integrations/azure Azure and Azure ML integrations rn/bug-fix Mention under Bug Fixes in Changelogs. rn/feature Mention under Features in Changelogs.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants