Correct pyarrow version check #11905
base: master
According to issue mlflow#8213, addressed by PR mlflow#9878, we should use `pyarrow.fs.HadoopFileSystem` with pyarrow GREATER THAN 2.0.0, but the condition was inverted. Signed-off-by: Antonio Bibiano <antbbn@users.noreply.github.com>
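The intended version gate can be sketched as follows. This is a minimal illustration, not the code in `hdfs_artifact_repo.py`: the diff in this PR uses `packaging.version.parse(pyarrow.__version__).major`, while this sketch parses the major version with the standard library so that it runs without extra packages. The helper names are hypothetical, and `pyarrow.hdfs.connect` as the legacy entry point is an assumption that is not stated in this thread. Note that `major >= 2`, as written in the diff, also covers 2.0.0 itself.

```python
def uses_new_hdfs_api(pyarrow_version: str) -> bool:
    """Return True when the non-legacy HDFS API should be used.

    Hypothetical helper: mirrors the `pyarrow_version.major >= 2` check
    from the PR diff, but parses the major version by hand.
    """
    major = int(pyarrow_version.split(".")[0])
    return major >= 2


def select_hdfs_api(pyarrow_version: str) -> str:
    """Return the dotted name of the API that would be called.

    Only names are returned so the sketch runs without pyarrow or a
    Hadoop cluster. The legacy name is an assumption.
    """
    if uses_new_hdfs_api(pyarrow_version):
        return "pyarrow.fs.HadoopFileSystem"
    return "pyarrow.hdfs.connect"  # assumed legacy entry point


if __name__ == "__main__":
    for version in ("1.0.1", "2.0.0", "16.1.0"):
        print(version, "->", select_hdfs_api(version))
```

With the inverted condition described above, the two return values would be swapped, so newer pyarrow versions would end up on the legacy path.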
Signed-off-by: Harutaka Kawamura <hkawamura0130@gmail.com>
@antbbn Could you check and fix test failures?
Sure, might be a few days but I’ll handle those.
…On Wed, 8 May 2024 at 11:20, WeichenXu ***@***.***> wrote:
@antbbn <https://github.com/antbbn> Could you check and fix test failures?
Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
Yeah, that’s what should be used with newer pyarrow versions. The condition
was not correct in the original PRs I mentioned in the description.
…On Sun, 19 May 2024 at 06:57, Corey Zumar ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In mlflow/store/artifact/hdfs_artifact_repo.py
<#11905 (comment)>:
> @@ -192,7 +192,8 @@ def hdfs_system(scheme, host, port):
host = scheme + "://" + host if host else "default"
- if packaging.version.parse(pyarrow.__version__) < packaging.version.parse("2.0.0"):
+ pyarrow_version = packaging.version.parse(pyarrow.__version__)
+ if pyarrow_version.major >= 2:
⬇️ Suggested change
- if pyarrow_version.major >= 2:
+ if pyarrow_version.major < 2:
@harupy <https://github.com/harupy> Was the previous behavior reversed by
accident? The previous logic was to only use pyarrow.fs.HadoopFileSystem
if pyarrow's version is < 2. The proposed change seems to reverse that
logic. Is this intentional?
LGTM - apologies for my failure to read the PR description. Thanks @antbbn!
@antbbn did you have a chance to manually test this?
Yes, and unfortunately it doesn't work: the legacy […] So a bit more refactoring of this module is necessary. I'm working on it and will report back soon.
The last two commits are my attempt at a refactor. At the beginning I started to use the pyarrow API directly, but I realized that we were re-implementing functionality that is already present in the fsspec package. I think it's a worthy addition and it greatly simplifies the code here. Eventually, many of the repos in there could be refactored to be just thin wrappers around fsspec implementations.
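To illustrate the "thin wrapper" idea from the comment above, here is a minimal sketch. It is not the code from the commits in this PR. `ThinArtifactRepo` and `FakeFS` are hypothetical names, and the wrapper only assumes an object exposing fsspec-style `put(lpath, rpath)` and `ls(path, detail=False)` methods. The in-memory `FakeFS` stands in for a real filesystem so that the example runs without fsspec or HDFS.

```python
import os
import posixpath


class ThinArtifactRepo:
    """Hypothetical wrapper that delegates all I/O to a filesystem object."""

    def __init__(self, fs, root):
        self.fs = fs
        self.root = root.rstrip("/")

    def _resolve(self, artifact_path=None):
        # Join the repository root with an optional relative artifact path.
        return posixpath.join(self.root, artifact_path) if artifact_path else self.root

    def log_artifact(self, local_file, artifact_path=None):
        # Upload one local file, keeping its base name at the destination.
        dest = posixpath.join(self._resolve(artifact_path), os.path.basename(local_file))
        self.fs.put(local_file, dest)
        return dest

    def list_artifacts(self, path=None):
        # Return the stored paths under the given directory, sorted.
        return sorted(self.fs.ls(self._resolve(path), detail=False))


class FakeFS:
    """In-memory stand-in for demonstration only.

    Simplification: `ls` matches by prefix, so it also returns files in
    nested directories, which a real filesystem listing would not do.
    """

    def __init__(self):
        self.files = {}

    def put(self, lpath, rpath):
        self.files[rpath] = lpath

    def ls(self, path, detail=False):
        prefix = path.rstrip("/") + "/"
        return [p for p in self.files if p.startswith(prefix)]


if __name__ == "__main__":
    repo = ThinArtifactRepo(FakeFS(), "/mlflow/artifacts")
    print(repo.log_artifact("/tmp/model.pkl", "models"))
    print(repo.list_artifacts("models"))
```

The design point is that the repository class holds no filesystem-specific logic, so swapping the backend means passing a different `fs` object.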
Signed-off-by: Antonio Bibiano <antbbn@gmail.com>
Signed-off-by: Antonio Bibiano <antbbn@gmail.com>
7b5fa0b to 7c7301f
Signed-off-by: Antonio Bibiano <antbbn@gmail.com>
Signed-off-by: Antonio Bibiano <antbbn@gmail.com>
Thank you for the simplification suggestion; we appreciate your effort to improve the code quality! However, we are generally not willing to add a new dependency for the sake of code quality. MLflow is used in a wide range of environments, including critical production services, so dependencies need to be carefully assessed to avoid compatibility issues. Also, introducing a new dependency can increase maintenance cost. Would you mind reverting the refactoring change and using […]? Thank you so much for your contribution!
Related Issues/PRs
Issue #8213
PR #9878
What changes are proposed in this pull request?
According to issue #8213 addressed with PR #9878 we should use pyarrow.fs.HadoopFileSystem with pyarrow GREATER THAN 2.0.0 but condition was inverted.
How is this PR tested?
Does this PR require documentation update?
Release Notes
Is this a user-facing change?
Users will stop seeing the deprecation warning.
What component(s), interfaces, languages, and integrations does this PR affect?
Components
area/artifacts: Artifact stores and artifact logging
area/build: Build and test infrastructure for MLflow
area/deployments: MLflow Deployments client APIs, server, and third-party Deployments integrations
area/docs: MLflow documentation pages
area/examples: Example code
area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
area/models: MLmodel format, model serialization/deserialization, flavors
area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
area/projects: MLproject format, project running backends
area/scoring: MLflow Model server, model deployment tools, Spark UDFs
area/server-infra: MLflow Tracking server backend
area/tracking: Tracking Service, tracking client APIs, autologging
Interface
area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
area/windows: Windows support
Language
language/r: R APIs and clients
language/java: Java APIs and clients
language/new: Proposals for new client languages
Integrations
integrations/azure: Azure and Azure ML integrations
integrations/sagemaker: SageMaker integrations
integrations/databricks: Databricks integrations
How should the PR be classified in the release notes? Choose one:
rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
rn/feature - A new user-facing feature worth mentioning in the release notes
rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
rn/documentation - A user-facing documentation change worth mentioning in the release notes
Should this PR be included in the next patch release?
Yes should be selected for bug fixes, documentation updates, and other small changes. No should be selected for new features and larger changes. If you're unsure about the release classification of this PR, leave this unchecked to let the maintainers decide.
What is a minor/patch release?
Bug fixes, doc updates and new features usually go into minor releases.
Bug fixes and doc updates usually go into patch releases.