Skip to content

Conversation

huydhn
Copy link
Contributor

@huydhn huydhn commented Aug 13, 2023

TIL, uploading to Rockset has an upper limit of 5000 records per request. So uploading PT2 perf benchmark could fail if that limit was reached, for example https://github.com/pytorch/pytorch/actions/runs/5828810421/job/15849232756

HTTP response body: {"message":"The number of documents specified in this request exceeds the maximum allowed limit of 5,000 documents.","message_key":"RECEIVER_REQUEST_MAX_DOCUMENT_LIMIT","type":"INVALIDINPUT","line":null,"column":null,"trace_id":"73fc2eb5-cfd1-4baa-8141-47c7cde87812","error_id":null,"query_id":null,"internal_errors":null}

The fix is to upload the results in multiple smaller batches of at most 5000 records.

Testing

5743 records from https://github.com/pytorch/pytorch/actions/runs/5828810421/job/15849232756 were written in 2 batches (5000 + 743)

python3 -m tools.stats.upload_dynamo_perf_stats --workflow-run-id 5821183777 --workflow-run-attempt 1 --repo pytorch/pytorch --head-branch gh/ezyang/2294/head
...
Writing 5000 documents to Rockset
Done!
Writing 743 documents to Rockset
Done!

@huydhn huydhn requested a review from a team August 13, 2023 04:45
@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Aug 13, 2023
@pytorch-bot
Copy link

pytorch-bot bot commented Aug 13, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/107095

Note: Links to docs will display an error until the docs builds have been completed.

✅ 2 Unrelated Failures

As of commit fafaa2e:

UNSTABLE - The following jobs failed but were likely due to flakiness present on trunk and has been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@huydhn huydhn requested a review from clee2000 August 13, 2023 04:50
@huydhn
Copy link
Contributor Author

huydhn commented Aug 14, 2023

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Aug 14, 2023
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

Copy link
Contributor

@ZainRizvi ZainRizvi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making this fix!

At a more general level though, I'm thinking original data like this should be uploaded to AWS first and then be sent to Rockset.

One of our goals is to not loose data in case Rockset ever becomes unavailable, which is why we usually import data into Rockset from AWS. Now, if we want to take rockset data, process it, and store it back into Rockset, that's still okay, since the original data remains in AWS and can be recomputed if necessary, but original data like our benchmark info should prob go somewhere else first.

This PR is still fine as is, since it's a strict improvement over the existing situation, but this is something we should keep in mind moving forward

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/trunk Trigger trunk jobs on your pull request Merged test-config/default topic: not user facing topic category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants