
Conversation

@osalpekar (Member) commented Feb 1, 2024

As described in this talk and this repo, we are experimenting with using CodeLlama-powered information retrieval for target determination.

The idea is that we create embeddings for PyTorch test functions, and store this index in S3. Then when a new PR comes in, we create embedding(s) for that PR, compare them to the index of test embeddings, and run only the most relevant tests.
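
To make the retrieval step concrete, here is a minimal sketch of the query-time ranking, assuming the index is a tensor of test-function embeddings plus a parallel list of test names. The encoder call (embed_text) is a hypothetical stand-in, not the actual llm-target-determinator API:

import torch
import torch.nn.functional as F

def rank_tests(pr_diff, test_embeddings, test_names, embed_text, top_k=50):
    """Return the names of the top_k test functions most similar to the PR diff."""
    query = embed_text(pr_diff)                        # 1-D embedding of the PR diff
    # Cosine similarity between the PR embedding and every test embedding.
    scores = F.cosine_similarity(query.unsqueeze(0), test_embeddings, dim=1)
    top = torch.topk(scores, k=min(top_k, len(test_names)))
    return [test_names[i] for i in top.indices.tolist()]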

This PR creates a workflow that does the indexing part (creating embeddings for test functions and storing them in S3). All the logic for running the indexer lives in osalpekar/llm-target-determinator; this workflow just checks out the relevant repos, installs the dependencies, runs the torchrun command to trigger indexing, and uploads the artifacts to S3.
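
For reference, a rough sketch of what the indexing job conceptually produces: one embedding per test function plus the test names, saved as an artifact the workflow can zip and upload to S3. collect_test_functions and embed_text are hypothetical stand-ins for the real indexer logic in osalpekar/llm-target-determinator:

import torch

def build_index(embed_text, collect_test_functions, out_path="index.pt"):
    names, vectors = [], []
    for name, source in collect_test_functions():   # e.g. ("test_foo", "def test_foo(): ...")
        names.append(name)
        vectors.append(embed_text(source))           # one embedding per test function
    torch.save({"names": names, "embeddings": torch.stack(vectors)}, out_path)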

@pytorch-bot added the topic: not user facing label Feb 1, 2024

pytorch-bot commented Feb 1, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/118824

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 781075a with merge base 55483fc:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Between February 1 and February 9, 2024, @osalpekar and @clee2000 made repeated GitHub Actions deployments to target-determinator-env; most attempts failed or errored before the later deployments went through (now inactive).
cd "${GITHUB_WORKSPACE}"/llm-target-determinator/assets

zip -r indexer-files.zip indexer-files
aws s3 cp \
A contributor commented:

Just a note: it makes sense to add a timestamp to the indexer zip file name here. We could also set up an S3 retention rule of N days to clean up old files. This could be done later, though.
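
As a rough sketch of that suggestion (the bucket name, prefix, and retention period below are placeholders, not the workflow's actual configuration):

import datetime
import boto3

s3 = boto3.client("s3")

# Timestamped object key so successive index uploads do not overwrite each other.
stamp = datetime.datetime.now(datetime.timezone.utc).strftime("%Y%m%dT%H%M%SZ")
s3.upload_file("indexer-files.zip", "example-td-bucket", f"indexer-files/indexer-files-{stamp}.zip")

# Optional: expire old index archives after N days via a lifecycle rule.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-td-bucket",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "expire-old-indexes",
            "Filter": {"Prefix": "indexer-files/"},
            "Status": "Enabled",
            "Expiration": {"Days": 30},
        }]
    },
)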

Member Author replied:

Good point!


jobs:
  index:
    runs-on: linux.g5.4xlarge.nvidia.gpu # 1 GPU A10G 24GB each
@huydhn (Contributor) commented Feb 13, 2024:

I think you will need to call setup-linux https://github.com/pytorch/pytorch/blob/main/.github/workflows/_linux-test.yml#L71-L72 and install the NVIDIA driver https://github.com/pytorch/pytorch/blob/main/.github/workflows/_linux-test.yml#L94-L97 here. If this job happens to run on a newly launched g5 runner, it would not have the CUDA driver IIRC, so the indexing step would fail.

Member Author replied:

Hmm that's strange, looks like indexing passed in this job on GPU: https://github.com/pytorch/pytorch/actions/runs/7849903772/job/21424841173?pr=118824. Will investigate in a follow-up.

@huydhn (Contributor) replied Feb 14, 2024:

Oh, the runner is reusable; as long as the indexing job is not the first one to run on it, the driver will be there. However, my understanding might be outdated; maybe we have already included the CUDA driver in the image (although I doubt that).
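
For context, on a runner without the driver the indexing step would fail with a CUDA initialization error partway through; a hypothetical fail-fast guard for the indexer entry point (not part of the actual script) could surface that earlier:

import sys
import torch

def require_cuda():
    # On a runner without a working NVIDIA driver, torch.cuda.is_available() is False.
    if not torch.cuda.is_available():
        sys.exit("No usable CUDA device found; was the NVIDIA driver installed on this runner?")

if __name__ == "__main__":
    require_cuda()
    print(f"Indexing on {torch.cuda.get_device_name(0)}")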

@huydhn (Contributor) left a review comment:

Overall LGTM! There is a small bug w.r.t NVIDIA driver installation, but that's an easy fix.

@osalpekar (Member Author) commented:

@pytorchbot merge

@pytorch-bot added the ciflow/trunk label Feb 14, 2024
@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.


Labels: ciflow/trunk, Merged, topic: not user facing

5 participants