
@namanlalitnyu (Contributor) commented Sep 16, 2025

This PR includes the following changes:

  • Added a new GitHub Actions step to upload the vLLM profiling results to an AWS S3 bucket.
  • Refactored how results are uploaded to S3: an extra folder (the model name) is appended to the base S3 prefix so that profiling results from different models are easy to tell apart.
  • The S3 path prefix is as follows (a sketch of how it is assembled is below): <date>/<repository>/<commit_sha>/<github_workflow_id>/<github_job_id>
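
A minimal sketch of how such a prefix can be assembled in the workflow and exposed as a step output (illustrative, not the exact code from this PR; HEAD_SHA comes from an earlier step, the other variables are standard GitHub Actions defaults):

# Build the prefix <date>/<repository>/<commit_sha>/<workflow_run_id>/<job_id>
UPLOAD_DATE=$(date -u +%Y-%m-%d)
S3_PREFIX="${UPLOAD_DATE}/${GITHUB_REPOSITORY}/${HEAD_SHA}/${GITHUB_RUN_ID}/${GITHUB_JOB}"
echo "s3-prefix=${S3_PREFIX}" >> "${GITHUB_OUTPUT}"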

Testing
[Screenshot: 2025-09-16 at 11:23:55 PM]

GitHub Action: link

Download link for Profiling traces:

meta-cla bot added the cla signed label on Sep 16, 2025
@namanlalitnyu marked this pull request as ready for review on September 16, 2025 20:40
@namanlalitnyu changed the title from "[Profiling] Upload vLLM Profiling results to AWS 3" to "[Profiling] Upload vLLM Profiling results to AWS S3" on Sep 16, 2025
@linzebing

Can you share the S3 folder link?

@namanlalitnyu (Contributor Author)

> Can you share the S3 folder link?

Hi @linzebing, I actually don't have access to the AWS console where we could see these results, but I verified the upload from the GitHub Action side, since we are using the public GitHub action to upload results.
I just know the S3 prefix under which they can be found: vllm-project/vllm/2025-09-16/218454b9b26cd2185cdf84e3ec9f58538185d06b.

@huydhn Can you please share how we can double-check that the upload looks good, and whether we are allowed to access the AWS console in the future?

@linzebing commented Sep 16, 2025

> Hi @linzebing, I actually don't have access to the AWS console where we could see these results, but I verified the upload from the GitHub Action side, since we are using the public GitHub action to upload results.
> I just know the S3 prefix under which they can be found: vllm-project/vllm/2025-09-16/218454b9b26cd2185cdf84e3ec9f58538185d06b.
>
> @huydhn Can you please share how we can double-check that the upload looks good, and whether we are allowed to access the AWS console in the future?

I don't think public S3 folders need any console login. However, I'll defer to @huydhn on the access control level here.

echo "upload-date=${UPLOAD_DATE}" >> "${GITHUB_OUTPUT}"
echo "s3-prefix=${REPOSITORY}/${UPLOAD_DATE}/${HEAD_SHA}" >> "${GITHUB_OUTPUT}"
- name: Upload profiling results to S3
@huydhn (Contributor) Sep 17, 2025

This works, I can see some profiles on S3, i.e.

Maybe you want to rename these files, like 3746728f887_953.1758048457999122720.pt.trace.json.gz, to make it easier to discover them later on, e.g. sglang.pt.trace.json.gz. IMO, it would be easier to do this in the workflow before the upload.
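
A rough sketch of what such a pre-upload rename step could look like (the profiles/ directory layout and the target name are assumptions, purely to illustrate the idea):

# Give each generated trace a stable, discoverable name before uploading.
for dir in profiles/*/; do
  trace=$(ls "${dir}"*.pt.trace.json.gz | head -n 1)  # the single trace produced in this profile directory
  mv "${trace}" "${dir}vllm.pt.trace.json.gz"
done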

@namanlalitnyu (Contributor Author)

Agreed, a cleaner naming scheme would be helpful in the future.
I think this is an older commit's URL, as I had also added "model_name" to the path to make it easier to filter the data.
This is how it would look with the latest changes: https://gha-artifacts.s3.us-east-1.amazonaws.com/2025-09-17/vllm-project/vllm/ca2d1925ef5ad309061c2d5dd9a1e409c5ca28ee/17788403923/profiling/facebook_opt_125m_tp1_random/vllm.async_llm.pt.trace.json.gz

echo "s3-prefix=${REPOSITORY}/${UPLOAD_DATE}/${HEAD_SHA}" >> "${GITHUB_OUTPUT}"
- name: Upload profiling results to S3
uses: seemethere/upload-artifact-s3@v5
@huydhn (Contributor) Sep 17, 2025

Another note here is that the upload step, as it stands, only works on the AWS runners (linux.aws.a100 or linux.aws.h100) and will not work on linux.dgx.b200. You can ignore this for now if you don't plan to run anything on B200, but if you do, you will need this snippet https://github.com/pytorch/pytorch/blob/main/.github/workflows/_rocm-test.yml#L105-L111 to configure the credentials before the upload, i.e.

- name: Configure aws credentials
  if: contains(env.DEVICE_TYPE, 'B200')
  uses: aws-actions/configure-aws-credentials@v4.1.0
  with:
    role-to-assume: arn:aws:iam::308535385114:role/gha_workflow_s3_and_ecr_read_only
    aws-region: us-east-1
    role-duration-seconds: 18000

@namanlalitnyu (Contributor Author)

Sure, noted. Right now I think we are good with using only the a100/h100 runners, but if that changes in the future, I will add this step. Thanks for sharing.

@huydhn (Contributor) commented Sep 17, 2025

> I don't think public S3 folders need any console login. However, I'll defer to @huydhn on the access control level here.

All the artifacts we have on PyTorch OSS CI are public by default. They can be found at https://gha-artifacts.s3.us-east-1.amazonaws.com/<S3_PREFIX>/<ANY_UPLOADED_FILE>. Let's just keep them public.
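
For example, the trace URL posted earlier in this thread can be fetched directly, with no console login required:

# Download one of the uploaded traces straight from the public bucket
curl -O "https://gha-artifacts.s3.us-east-1.amazonaws.com/2025-09-17/vllm-project/vllm/ca2d1925ef5ad309061c2d5dd9a1e409c5ca28ee/17788403923/profiling/facebook_opt_125m_tp1_random/vllm.async_llm.pt.trace.json.gz"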

@namanlalitnyu (Contributor Author)

> I don't think public S3 folders need any console login. However, I'll defer to @huydhn on the access control level here.
>
> All the artifacts we have on PyTorch OSS CI are public by default. They can be found at https://gha-artifacts.s3.us-east-1.amazonaws.com/<S3_PREFIX>/<ANY_UPLOADED_FILE>. Let's just keep them public.

This is awesome, thank you.

@namanlalitnyu merged commit d67c812 into main on Sep 17, 2025
3 checks passed
@linzebing

I have a question: how can I get a directory view of all the traces available for the past 6 months?

@namanlalitnyu (Contributor Author)

> I have a question: how can I get a directory view of all the traces available for the past 6 months?

I think there are a couple of ways we can see the traces in a directory:

  1. Get access to the AWS console and browse them there directly (but not every user can do that).
  2. Use a Python library (e.g. boto3) to query the S3 path, fetch all the traces under it, and render them somewhere; a CLI sketch of this is below. (I have implemented this elsewhere, but not at Meta.)
  3. With the URLs generated from these GitHub Actions, though, we only get download links for specific files.
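
As a rough sketch of option 2, using the AWS CLI instead of boto3 (this assumes the bucket policy allows anonymous listing, and takes the prefix from the example URL above):

# List every trace uploaded for one date; since the date is the first component
# of the S3 prefix, loop over the dates of interest to cover a longer window.
aws s3 ls --no-sign-request --recursive "s3://gha-artifacts/2025-09-17/vllm-project/vllm/" \
  | grep '\.pt\.trace\.json\.gz$'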
