
Sweep across KV cache layouts #662

Merged
merged 1 commit into from
Jun 25, 2024

Conversation

@yeandy (Contributor) commented May 21, 2024

Sweep across different sharding configurations for the KV cache. This will be used in our automation infra: GoogleCloudPlatform/ml-auto-solutions#288

The user needs to set the config's inference_metadata_file, which is a path to a JSON file.

This JSON file should contain the following keys:

  • two_axis_order_product_id_list: comma-delimited string of two_axis_order_product_id values
  • prefill_cache_axis_order_list: comma-delimited string of prefill_cache_axis_order values
  • ar_cache_axis_order_list: comma-delimited string of ar_cache_axis_order values
  • accelerator: name of the accelerator
  • flatten_microbenchmark_results: whether or not to flatten results; should be true
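As a hedged illustration, a metadata file with those keys could be written from Python as below. The key names come from the description above, but the file path and all values (product ids, axis orders, accelerator name) are hypothetical examples, not taken from the PR:

```python
import json

# Illustrative inference_metadata_file contents. Key names follow the PR
# description; the values here are made-up placeholders.
metadata = {
    "two_axis_order_product_id_list": "1,2",
    "prefill_cache_axis_order_list": "0132,0213",
    "ar_cache_axis_order_list": "0132,0213",
    "accelerator": "v5e-8",
    "flatten_microbenchmark_results": True,
}

with open("inference_metadata.json", "w") as f:
    json.dump(metadata, f, indent=2)
```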

Two review threads on MaxText/configs/base.yml (outdated, resolved)
@morgandu (Collaborator) left a comment:

Thank you Andy for the great work! Overall looks good, and I am happy to see the first pass results.

Some comments/suggestions:

My overall goal is to get rid of MaxText/inference_microbenchmark_sweep.py and make MaxText/inference_microbenchmark.py self-contained.

On the ml_auto_solutions side, any sweeping, now or later, can either use existing flags (base.yml) or introduce new flags as part of the experiment. Have a manual test run, then scale up to more experiments. It'd be great if there were minimal or no extra code required between the manual test and ml_auto_solutions.

@morgandu morgandu force-pushed the mor--kv-cache-layout branch 3 times, most recently from 2c3ecf8 to 185563d Compare May 23, 2024 18:14
@morgandu morgandu force-pushed the mor--kv-cache-layout branch 3 times, most recently from 187cb3d to 000e935 Compare May 30, 2024 21:54
@morgandu (Collaborator) commented:

LGTM on my side!

Adding @patemotter for visibility since he may need to use this soon.

@morgandu morgandu force-pushed the mor--kv-cache-layout branch 2 times, most recently from 58b5b31 to 4b4eaaa Compare May 31, 2024 19:16
Base automatically changed from mor--kv-cache-layout to main June 3, 2024 15:22
Comment on lines 77 to 88
# Manually update the config.
# Don't set key_value_axis_order_product_id; otherwise it will recompute
# ar_key_axis_order and ar_value_axis_order.
quant = 'bf16' if not config.quantization else config.quantization
run_name = (
    f"{inference_metadata['accelerator']}-{config.model_name}-"
    f"{quant}-{key_value_axis_order_product_id}-{prefill_key_axis_order}-"
    f"{ar_key_axis_order}"
)
tensorboard_dir = os.path.join(config.base_output_directory, run_name, "tensorboard", "")
checkpoint_dir = os.path.join(config.base_output_directory, run_name, "checkpoint", "")
metrics_dir = os.path.join(config.base_output_directory, run_name, "metrics", "")
@morgandu (Collaborator):

There are quant and quantize_kvcache, and different combinations of the two; as discussed, we will create a different test_config in xlml for each, and the base_run_name should already carry all the information needed to differentiate the runs.
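The quoted snippet and the comment above concern one run name and set of output directories per swept combination. A minimal sketch of how the comma-delimited lists from the metadata file could be expanded into combinations, and how the per-run directories are derived, is below. The helper names (`expand_sweep`, `run_dirs`) are assumptions for illustration, and the actual pairing logic in the PR may differ (e.g. it may zip the lists rather than take a full product):

```python
import itertools
import os

def expand_sweep(inference_metadata):
    """Sketch: expand the comma-delimited metadata strings into all
    (product_id, prefill_order, ar_order) combinations to benchmark."""
    product_ids = inference_metadata["two_axis_order_product_id_list"].split(",")
    prefill_orders = inference_metadata["prefill_cache_axis_order_list"].split(",")
    ar_orders = inference_metadata["ar_cache_axis_order_list"].split(",")
    return list(itertools.product(product_ids, prefill_orders, ar_orders))

def run_dirs(base_output_directory, run_name):
    # Mirrors the quoted snippet: one tensorboard/checkpoint/metrics
    # directory per run, each ending with a trailing separator.
    return {
        sub: os.path.join(base_output_directory, run_name, sub, "")
        for sub in ("tensorboard", "checkpoint", "metrics")
    }
```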

Comment on lines 93 to 95
pyconfig._config.keys['tensorboard_dir'] = tensorboard_dir # pylint: disable=protected-access
pyconfig._config.keys['checkpoint_dir'] = checkpoint_dir # pylint: disable=protected-access
pyconfig._config.keys['metrics_dir'] = metrics_dir # pylint: disable=protected-access
@morgandu (Collaborator):

I don't think checkpoint_dir and metrics_dir are used at all?

@yeandy (Contributor, Author) left a comment:

Please take a look

Two review threads on MaxText/inference_microbenchmark.py (resolved)
@yeandy yeandy marked this pull request as ready for review June 17, 2024 16:52
@yeandy (Contributor, Author) commented Jun 17, 2024

@morgandu Can you take a final look? Anything else we need to add?

@morgandu (Collaborator) commented:

Final LGTM! Though the PR description needs to be updated, since we have prefill_cache_axis_order and ar_cache_axis_order now.

@yeandy (Contributor, Author) commented Jun 24, 2024

Updated description.

@yeandy yeandy force-pushed the mor--kv-cache-layout-reformat-output branch from 074cf22 to 6c03e98 Compare June 25, 2024 17:00
@yeandy yeandy force-pushed the mor--kv-cache-layout-reformat-output branch from 6c03e98 to 9606e62 Compare June 25, 2024 17:44
@copybara-service copybara-service bot merged commit 5a215db into main Jun 25, 2024
13 checks passed
@copybara-service copybara-service bot deleted the mor--kv-cache-layout-reformat-output branch June 25, 2024 23:02