
[FR] Increase parameter length #3931

Open
3 of 23 tasks
ahirner opened this issue Jan 1, 2021 · 10 comments
Labels
area/sqlalchemy Use of SQL alchemy in tracking service or model registry area/tracking Tracking service, tracking client APIs, autologging enhancement New feature or request

Comments

@ahirner

ahirner commented Jan 1, 2021

Willingness to contribute

  • Yes. I can contribute this feature independently.
  • Yes. I would be willing to contribute this feature with guidance from the MLflow community. (review migrations)
  • No. I cannot contribute this feature at this time.

Proposal Summary

Increase maximum parameter value length from 250 to 2000 or only limit by request size.

Motivation

We use parameter values longer than 250 characters. These occur in parameters of naturally variable length:

  • URL references
  • Definitions of filtering steps: list of qualifiers
  • Definition of data augmentations: mapping of transformations to kwargs

In #1870 it was proposed to alter VARCHAR limits. It would be nice not to have to deal with those alterations in migrations.
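For reference, the schema change discussed in #1870 amounts to a one-statement migration per backend. A sketch in PostgreSQL syntax (the `params` table and `value` column match the tracking schema queried below):

```sql
-- Sketch only: widen the parameter value column from VARCHAR(250)
-- to the proposed 2000-character limit.
ALTER TABLE params ALTER COLUMN value TYPE VARCHAR(2000);
```

MySQL and SQLite need slightly different DDL, which is exactly the per-backend detail an Alembic migration in MLflow would have to handle.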

Our length distribution has some outliers.

```
> select distinct length(value) from params order by length(value) desc limit 20;
11991
11978
11976
11488
11481
1115
1113
1003
902
431
430
396
394
345
300
296
289
287
280
275
```

One solution for us would be to break apart outliers of >10k and settle on the somewhat historic URL limit of 2k.
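Breaking apart the outliers could even be done client-side, without schema changes. A minimal sketch (the `chunk_param` helper and its `key_0`, `key_1`, … naming scheme are hypothetical, not an MLflow API):

```python
def chunk_param(key: str, value: str, max_len: int = 250) -> dict:
    """Split an over-long parameter value into numbered chunks that each
    fit the current limit, e.g. {"aug_0": ..., "aug_1": ...}."""
    if len(value) <= max_len:
        return {key: value}
    return {
        f"{key}_{i}": value[start:start + max_len]
        for i, start in enumerate(range(0, len(value), max_len))
    }
```

Each chunk could then be logged as an ordinary parameter, but the reader has to reassemble values by hand, which is why a higher server-side limit would still be preferable.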

Alternatively, the VARCHAR limit is dropped while the 1MB request limit stays for sanity. AFAICS a modern database wouldn't have any problem, nor does e.g. wandb.

What component(s), interfaces, languages, and integrations does this feature affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/projects: MLproject format, project running backends
  • area/scoring: Local serving, model deployment tools, spark UDFs
  • area/server-infra: MLflow server, JavaScript dev server
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interfaces

  • area/uiux: Front-end, user experience, JavaScript, plotting
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Languages

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations
@ahirner ahirner added the enhancement New feature or request label Jan 1, 2021
@github-actions github-actions bot added area/sqlalchemy Use of SQL alchemy in tracking service or model registry area/tracking Tracking service, tracking client APIs, autologging labels Jan 1, 2021
@nickresnick

hey there, what's the status on this?

@srowen
Contributor

srowen commented Nov 16, 2021

If the data you're logging is substantial, and even has some 'structure', then maybe a parameter isn't the ideal choice. You can always log information as a file, as an artifact. I tend to agree that a higher limit would be nice. 2000 isn't 'huge' for example.
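The artifact route described above can be kept lightweight. A minimal sketch, assuming the long or structured values are JSON-serializable (the `dump_long_params` helper is hypothetical; `mlflow.log_artifact` in the comment is the real MLflow API it would feed):

```python
import json

def dump_long_params(params: dict, path: str) -> str:
    """Write long or structured parameter values to a JSON file so the
    file can be attached to a run, e.g. via mlflow.log_artifact(path)."""
    with open(path, "w") as f:
        json.dump(params, f, indent=2, sort_keys=True)
    return path
```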

@xavierfontaine

xavierfontaine commented May 9, 2022

A comment to support the current issue.

The discussed limitation on parameter length makes MLflow less suited for use cases such as Natural Language Generation. For prompt-based NLG, prompts (strings, often long ones) are hyperparameters that are central to model performance, alongside other numerical/boolean parameters.

For now, my team and I are logging these prompts as artifacts. However, doing so lessens the usefulness of the MLflow Tracking UI, since we cannot use the UI components designed for comparing hyperparameter values. More generally, storing some parameters as parameters and others as artifacts inevitably complicates codebases.

@jinzhang21
Collaborator

@xavierfontaine What are the longest prompts you've used?

@sabaimran

Adding a comment to express support for the feature as well. I have a logged parameter value that's <1000 characters but exceeds the 500-character limit.

@akshara08

+1 this would be really helpful. Any update on this?

@xavierfontaine

xavierfontaine commented Jun 6, 2023

@jinzhang21 I'm sorry for taking so long to respond.

What are the longest prompts you've used?

Regarding logged parameters, what interests us most are prompt templates (e.g., Please summarize the following text: {text_to_summarize}.) The average template length has been increasing along with the input-length limits of Large Language Models. Although 1,000–2,000-character templates are common, much longer templates are no longer unusual. The longest I have seen in a professional context was ~18,000 characters.

It might be interesting to note that OpenAI has a version of gpt-4 that supports prompts up to 32,000 tokens (~124,000 characters). Furthermore, the next generation of open-source/proprietary models might have little to no limitation on prompt length ([1], [2]). Of course, prompt templates will typically remain much shorter than the observed prompts, but we should expect their average size to keep increasing over time nonetheless.

I don't remember which storage backend MLflow uses, but I guess the simplest solution would be to store strings in a format that enforces no limitation on length?

@jinzhang21
Collaborator

c.c. @dbczumar and @sunishsheth2009 who are leading the LLM efforts in MLflow

@getchebarne

+1 for this feature; I can't log parameters such as the selected features from a feature-selection step due to the small maximum param size.

@GeorgePearse

GeorgePearse commented Aug 3, 2023

I can't even log my class_list. I use MMDetection, which automatically logs it as part of the config in its tracking hook.

The only workaround is to replace the class_names with IDs, but then I need to find somewhere else to log the class_names.
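The workaround described above can be sketched as follows (`compress_class_list` is a hypothetical helper, not part of MMDetection or MLflow): replace the class names with compact integer IDs that fit the parameter limit, and keep the name-to-ID mapping around to log through another channel, such as an artifact.

```python
def compress_class_list(class_names: list) -> tuple:
    """Map each class name to an integer ID; returns the ID list (short
    enough to log as a parameter) and the name->ID mapping to log elsewhere."""
    mapping = {name: idx for idx, name in enumerate(class_names)}
    return [mapping[name] for name in class_names], mapping
```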


9 participants