[FR] Increase parameter length #3931
Comments
Hey there, what's the status on this?
If the data you're logging is substantial, and even has some structure, then maybe a parameter isn't the ideal choice. You can always log the information as a file, as an artifact. I tend to agree that a higher limit would be nice; 2000 isn't huge, for example.
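A minimal sketch of the artifact workaround described above. The helper name and file layout are made up for illustration; the only MLflow calls involved (shown in comments) are `mlflow.log_artifact` and `mlflow.log_param`, which do exist in the MLflow client API. The idea is to write the long value to a file and keep only a short, param-sized fingerprint as the actual parameter:

```python
# Sketch of the "log it as an artifact" workaround (helper name is hypothetical).
# The long value is written to a file -- in a real run you would then call
# mlflow.log_artifact(path) -- and only a short fingerprint is logged as a
# parameter, so it stays well under the length limit.
import hashlib
import tempfile
from pathlib import Path

def stash_long_param(name: str, value: str, out_dir: str) -> dict:
    """Write `value` to a file and return a short, param-sized summary."""
    path = Path(out_dir) / f"{name}.txt"
    path.write_text(value, encoding="utf-8")
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]
    # In an MLflow run you would log both pieces, e.g.:
    #   mlflow.log_artifact(str(path))
    #   mlflow.log_param(f"{name}_sha", digest)
    return {"file": str(path), f"{name}_sha": digest, f"{name}_len": len(value)}

with tempfile.TemporaryDirectory() as d:
    summary = stash_long_param("prompt_template", "Translate {text} to French." * 100, d)
    print(summary["prompt_template_len"])  # 2700
```

The fingerprint keeps runs comparable in the Tracking UI (identical prompts hash identically), while the full text lives in the artifact store where no length limit applies.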
A comment in support of the current issue. The discussed limitation on parameter length makes MLflow less suited for use cases such as Natural Language Generation. For prompt-based NLG, prompts (strings, often long ones) are hyperparameters that are central to model performance, alongside other numerical/boolean parameters. For now, my team and I are logging these prompts as artifacts. However, doing so lessens the usefulness of the MLflow Tracking UI, since we cannot use the UI components designed to spot the relevant hyperparameter values. More generally, storing some parameters as …
@xavierfontaine What are the longest prompts you've used?
Adding a comment to express support for the feature as well. I have a logged parameter value that's <1000 characters but exceeds the 500-character length limit.
+1, this would be really helpful. Any update on this?
@jinzhang21 I'm sorry for taking so long to respond.
Regarding logged parameters, what interests us most are prompt templates (e.g. …). It might be interesting to note that OpenAI has a version of gpt-4 that supports prompts up to 32,000 tokens long (~124,000 characters). Furthermore, the next generation of open-source/proprietary models might have little to no limitation on prompt length ([1], [2]). Of course, prompt templates will typically remain much shorter than the observed prompts, but we should expect their average size to keep increasing over time nonetheless. I don't remember which storage backend MLflow uses, but I guess the simplest approach would be to store strings in a format that enforces no limitation on length?
cc @dbczumar and @sunishsheth2009, who are leading the LLM efforts in MLflow
+1 for this feature; I can't log parameters such as the selected features from a feature-selection step due to the small maximum param size
I can't even log my class_list. I use MMDetection, which automatically logs it as part of the config in its tracking hook. The only workaround is to replace the class_names with IDs, but then I need to find something else to log the class_names with.
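A hedged sketch of another workaround that several comments circle around: split an over-long value into numbered chunks that each fit the current limit, so they could be logged as separate params (e.g. `class_list_000`, `class_list_001`, …) and re-joined later. The helper names and the 500-character limit are taken from this thread, not from MLflow's code:

```python
# Chunking workaround sketch (helper names are hypothetical): break a long
# value into limit-sized pieces keyed by a zero-padded index, so each piece
# fits under the per-parameter length cap discussed in this thread.
MAX_PARAM_LEN = 500  # the limit mentioned by commenters above

def chunk_param(name: str, value: str, limit: int = MAX_PARAM_LEN) -> dict:
    """Break `value` into limit-sized chunks keyed by a numbered suffix."""
    return {
        f"{name}_{i:03d}": value[start:start + limit]
        for i, start in enumerate(range(0, len(value), limit))
    }

def unchunk_param(name: str, params: dict) -> str:
    """Reassemble the original value from its numbered chunks."""
    keys = sorted(k for k in params if k.startswith(f"{name}_"))
    return "".join(params[k] for k in keys)

class_list = ",".join(f"class_{i}" for i in range(200))  # 1889 chars
chunks = chunk_param("class_list", class_list)
print(len(chunks))  # 4
print(max(map(len, chunks.values())) <= MAX_PARAM_LEN)  # True
```

Each chunk could then be passed to `mlflow.log_param`. The zero-padded index keeps the chunks sortable, though this is clearly a stopgap: the Tracking UI still can't search or compare the reassembled value, which is the core of this feature request.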
Willingness to contribute
Proposal Summary
Increase maximum parameter value length from 250 to 2000 or only limit by request size.
Motivation
We use parameter values longer than 250 characters. These occur in parameters of naturally variable length.
In #1870 it was proposed to alter VARCHAR limits. It would be nice not to have to deal with those alterations in migrations.
Our length distribution has some outliers.
One solution for us would be to break apart outliers of >10k and settle on the somewhat historic URL limit of 2k.
Alternatively, drop the VARCHAR limit entirely while keeping the 1 MB request limit for sanity. As far as I can see, a modern database wouldn't have any problem with that, nor does e.g. wandb.
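To illustrate the claim that an unbounded string column is unproblematic for a modern database, here is a minimal sketch using SQLite's TEXT type. The table and column names are illustrative only, not MLflow's actual tracking schema:

```python
# Minimal illustration that a TEXT column accepts values far beyond any
# VARCHAR(250/500) cap. Table and column names are made up for this sketch,
# not MLflow's real schema.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE params (key TEXT PRIMARY KEY, value TEXT)")

long_value = "p" * 100_000  # far past any limit discussed in this thread
conn.execute("INSERT INTO params VALUES (?, ?)", ("prompt_template", long_value))

(stored,) = conn.execute(
    "SELECT value FROM params WHERE key = 'prompt_template'"
).fetchone()
print(len(stored))  # 100000
conn.close()
```

SQLite ignores declared VARCHAR lengths anyway, and the unbounded text types in other common backends (e.g. TEXT in PostgreSQL and MySQL) behave similarly, which is why the request-size cap rather than the column type would become the effective limit under this proposal.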
What component(s), interfaces, languages, and integrations does this feature affect?
Components
area/artifacts: Artifact stores and artifact logging
area/build: Build and test infrastructure for MLflow
area/docs: MLflow documentation pages
area/examples: Example code
area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
area/models: MLmodel format, model serialization/deserialization, flavors
area/projects: MLproject format, project running backends
area/scoring: Local serving, model deployment tools, spark UDFs
area/server-infra: MLflow server, JavaScript dev server
area/tracking: Tracking Service, tracking client APIs, autologging
Interfaces
area/uiux: Front-end, user experience, JavaScript, plotting
area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
area/windows: Windows support
Languages
language/r: R APIs and clients
language/java: Java APIs and clients
language/new: Proposals for new client languages
Integrations
integrations/azure: Azure and Azure ML integrations
integrations/sagemaker: SageMaker integrations
integrations/databricks: Databricks integrations