
646 lab 604 custom job selector label #655

Merged
merged 9 commits into from
Sep 19, 2023


thetechnocrat-dev
Contributor

@thetechnocrat-dev thetechnocrat-dev commented Sep 19, 2023

Changes

  1. Add a selector override flag to plex init and plex run
  2. Update Plex to use Bacalhau version 1.0.3

Details

Add the -s flag to plex run or plex init to pass selectors directly to Bacalhau. The flag value replaces the default selector of owner=labdao.
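A minimal sketch of the override semantics: an -s value, when present, replaces the default instead of being combined with it. It uses Go's standard `flag` package and a hypothetical `selectorFromArgs` helper. Plex's real CLI wiring is not shown in this PR and may differ; treating the selector as a single string is an assumption.

```go
package main

import (
	"flag"
	"fmt"
)

// defaultSelector is the default described in this PR.
const defaultSelector = "owner=labdao"

// selectorFromArgs (hypothetical) parses an -s flag and returns the selector
// to hand to Bacalhau: the flag value when set, otherwise the default.
func selectorFromArgs(args []string) (string, error) {
	fs := flag.NewFlagSet("init", flag.ContinueOnError)
	s := fs.String("s", "", "node selector passed through to Bacalhau (overrides the default)")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	if *s == "" {
		return defaultSelector, nil
	}
	// Override: the user's selector is used as-is, nothing is appended.
	return *s, nil
}

func main() {
	for _, args := range [][]string{{}, {"-s", "owner=josh"}} {
		sel, err := selectorFromArgs(args)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("args=%v selector=%s\n", args, sel)
	}
}
```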

Example

go run main.go init -s owner=josh -t QmZWYpZXsrbtzvBCHngh4YEgME5djnV5EedyTpc8DrK7k2 -i '{"protein": ["QmUWCBTqbRaKkPXQ3M14NkUuM4TEwfhVfrqLNoBB7syyyd/7n9g.pdb"], "small_molecule": ["QmViB4EnKX6PXd77WYSgMDMq9ZMX14peu3ZNoVV1LHUZwS/ZINC000019632618.sdf"]}' --scatteringMethod=dotProduct --autoRun=true -a test

produces the error

error submitting Bacalhau job: not enough nodes to run job. requested: 1, available: 0

because no nodes carry the label owner=josh.
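The "available: 0" in that error means no node satisfied the selector. A minimal sketch of the filtering step, assuming equality-only `key=value` requirements and a hypothetical `node` type; Bacalhau's real scheduler and selector syntax are richer than this.

```go
package main

import "fmt"

// node is a hypothetical stand-in for a compute node and its labels.
type node struct {
	ID     string
	Labels map[string]string
}

// matches reports whether the node carries every key=value requirement.
func matches(n node, reqs map[string]string) bool {
	for k, want := range reqs {
		if got, ok := n.Labels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

// selectNodes keeps the nodes that satisfy the selector and returns an error
// shaped like the one in the log when too few of them remain.
func selectNodes(nodes []node, reqs map[string]string, requested int) ([]node, error) {
	var eligible []node
	for _, n := range nodes {
		if matches(n, reqs) {
			eligible = append(eligible, n)
		}
	}
	if len(eligible) < requested {
		return nil, fmt.Errorf("not enough nodes to run job. requested: %d, available: %d",
			requested, len(eligible))
	}
	return eligible[:requested], nil
}

func main() {
	nodes := []node{
		{ID: "node-a", Labels: map[string]string{"owner": "labdao"}},
		{ID: "node-b", Labels: map[string]string{"owner": "labdao"}},
	}
	// No node has owner=josh, so this reports available: 0.
	if _, err := selectNodes(nodes, map[string]string{"owner": "josh"}, 1); err != nil {
		fmt.Println(err)
	}
	picked, _ := selectNodes(nodes, map[string]string{"owner": "labdao"}, 1)
	fmt.Println(picked[0].ID)
}
```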

@thetechnocrat-dev thetechnocrat-dev linked an issue Sep 19, 2023 that may be closed by this pull request
@linear

linear bot commented Sep 19, 2023

LAB-604 custom job selector label

Be able to put custom labels on Plex jobs that pass through to Bacalhau as labels; this will be useful as an instance-type selector for benchmarking.

@vercel

vercel bot commented Sep 19, 2023

The latest updates on your projects.

1 Ignored Deployment
Name | Status | Updated (UTC)
docs | ⬜️ Ignored | Sep 19, 2023 2:42pm

@alabdao
Collaborator

alabdao commented Sep 19, 2023

If I read the code correctly, when --selector is specified as label=value, it gets appended to the hard-coded owner=labdao (or owner=labdaostaging for staging), making the final selector label=value,owner=labdao. If that's correct, jobs submitted via Plex could never run on nodes that don't have owner=labdao set. In short, we don't want any hard-coded selector added when one is specified.
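To make the concern concrete: under the append behavior, a node labeled only `label=value` fails the combined selector because it lacks `owner=labdao`, while the override behavior accepts it. The helpers below (`appendSelector`, `overrideSelector`, `matches`) are hypothetical and use equality-only matching; they illustrate the semantics, not the Plex code.

```go
package main

import (
	"fmt"
	"strings"
)

const defaultSelector = "owner=labdao"

// appendSelector mimics the behavior flagged in review: the user's selector
// is joined with the hard-coded default, so a node must satisfy both.
func appendSelector(user string) string {
	if user == "" {
		return defaultSelector
	}
	return user + "," + defaultSelector
}

// overrideSelector mimics the fix: the user's selector replaces the default.
func overrideSelector(user string) string {
	if user == "" {
		return defaultSelector
	}
	return user
}

// matches reports whether labels satisfy every key=value term in a
// comma-separated selector. Only equality terms are supported.
func matches(labels map[string]string, selector string) bool {
	for _, term := range strings.Split(selector, ",") {
		key, want, ok := strings.Cut(term, "=")
		if !ok {
			return false
		}
		if got, present := labels[key]; !present || got != want {
			return false
		}
	}
	return true
}

func main() {
	// A hypothetical node with a custom label but no owner=labdao label.
	node := map[string]string{"label": "value"}
	user := "label=value"
	fmt.Println(appendSelector(user), matches(node, appendSelector(user)))
	fmt.Println(overrideSelector(user), matches(node, overrideSelector(user)))
}
```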

@thetechnocrat-dev thetechnocrat-dev temporarily deployed to ci September 19, 2023 13:58 with GitHub Actions
@acashmoney
Contributor

acashmoney commented Sep 19, 2023

With selector flag

The expected error case works: passing a node selector that no node matches fails as intended.

go run main.go init -s owner=aakaash -t QmZWYpZXsrbtzvBCHngh4YEgME5djnV5EedyTpc8DrK7k2 -i '{"protein": ["QmUWCBTqbRaKkPXQ3M14NkUuM4TEwfhVfrqLNoBB7syyyd/7n9g.pdb"], "small_molecule": ["QmViB4EnKX6PXd77WYSgMDMq9ZMX14peu3ZNoVV1LHUZwS/ZINC000019632618.sdf"]}' --scatteringMethod=dotProduct --autoRun=true -a test
Plex version (v0.10.4) up to date.
Pinned IO JSON CID: QmWhpHj9qWwHETVwCJtK1o5wXvxF8h7oVBxQQKpp4HkDXK
Created working directory: /Users/aakaash/Desktop/code/OPENLAB/plex/jobs/fa205bcb-4fff-4875-9a04-4b17cbdc51e4
Initialized IO file at: /Users/aakaash/Desktop/code/OPENLAB/plex/jobs/fa205bcb-4fff-4875-9a04-4b17cbdc51e4/io.json
Processing IO Entries
Starting to process IO entry 0 
Error processing IO entry 0 
error submitting Bacalhau job: not enough nodes to run job. requested: 1, available: 0
Finished processing, results written to /Users/aakaash/Desktop/code/OPENLAB/plex/jobs/fa205bcb-4fff-4875-9a04-4b17cbdc51e4/io.json
Completed IO JSON CID: QmQjaFxq4TpynLh7iLDjAAT96pzSL9T9S9TNd15sJqxSVY

Without selector flag

@thetechnocrat-dev, however, I got a "context deadline exceeded" error in 2 of 5 runs without the selector flag.

go run main.go init -t QmZWYpZXsrbtzvBCHngh4YEgME5djnV5EedyTpc8DrK7k2 -i '{"protein": ["QmUWCBTqbRaKkPXQ3M14NkUuM4TEwfhVfrqLNoBB7syyyd/7n9g.pdb"], "small_molecule": ["QmViB4EnKX6PXd77WYSgMDMq9ZMX14peu3ZNoVV1LHUZwS/ZINC000019632618.sdf"]}' --scatteringMethod=dotProduct --autoRun=true -a test
Plex version (v0.10.4) up to date.
Pinned IO JSON CID: QmWhpHj9qWwHETVwCJtK1o5wXvxF8h7oVBxQQKpp4HkDXK
Created working directory: /Users/aakaash/Desktop/code/OPENLAB/plex/jobs/abad6223-2cca-4aaf-827e-2f1154f5669b
Initialized IO file at: /Users/aakaash/Desktop/code/OPENLAB/plex/jobs/abad6223-2cca-4aaf-827e-2f1154f5669b/io.json
Processing IO Entries
Starting to process IO entry 0 
Job running...
Bacalhau job id: 422883ea-6648-4f77-98a0-691775da9d0f 
////_🌱___////
Computed default go-libp2p Resource Manager limits based on:
    - 'Swarm.ResourceMgr.MaxMemory': "8.6 GB"
    - 'Swarm.ResourceMgr.MaxFileDescriptors': 30720

Theses can be inspected with 'ipfs swarm resources'.

Error processing IO entry 0 
error downloading Bacalhau results: failed to get ipfs cid 'QmdDEZZzjQNEFXeb3dGeS8ta2tbV5HuLMBgN6qsKqdF9fU': context deadline exceeded
Finished processing, results written to /Users/aakaash/Desktop/code/OPENLAB/plex/jobs/abad6223-2cca-4aaf-827e-2f1154f5669b/io.json
Completed IO JSON CID: QmbpaTejgpAxXmu9qyhNMcbEnCHVDANt8oQuBBJ8ecAVqp

The CID QmdDEZZzjQNEFXeb3dGeS8ta2tbV5HuLMBgN6qsKqdF9fU does exist, but DownloadBacalhauResults appears unable to fetch it. It's unclear to me why it works in 3 of 5 runs.

@thetechnocrat-dev thetechnocrat-dev temporarily deployed to ci September 19, 2023 14:42 with GitHub Actions
@thetechnocrat-dev
Contributor Author

thetechnocrat-dev commented Sep 19, 2023

@acashmoney thanks for the extra testing. I changed the timeout to 5 minutes, which is what we had before and what bacalhau has set as the default.

@alabdao, good catch; I forgot I had coded the selector to append last week. It now overrides the default.

Contributor

@acashmoney acashmoney left a comment


let's goo

With selector flag

go run main.go init -s owner=aakaash -t QmZWYpZXsrbtzvBCHngh4YEgME5djnV5EedyTpc8DrK7k2 -i '{"protein": ["QmUWCBTqbRaKkPXQ3M14NkUuM4TEwfhVfrqLNoBB7syyyd/7n9g.pdb"], "small_molecule": ["QmViB4EnKX6PXd77WYSgMDMq9ZMX14peu3ZNoVV1LHUZwS/ZINC000019632618.sdf"]}' --scatteringMethod=dotProduct --autoRun=true -a test
Plex version (v0.10.4) up to date.
Pinned IO JSON CID: QmWhpHj9qWwHETVwCJtK1o5wXvxF8h7oVBxQQKpp4HkDXK
Created working directory: /Users/aakaash/Desktop/code/OPENLAB/plex/jobs/fa3c55e8-9bf9-4d31-925f-9f8671a38315
Initialized IO file at: /Users/aakaash/Desktop/code/OPENLAB/plex/jobs/fa3c55e8-9bf9-4d31-925f-9f8671a38315/io.json
Processing IO Entries
Starting to process IO entry 0 
Error processing IO entry 0 
error submitting Bacalhau job: not enough nodes to run job. requested: 1, available: 0
Finished processing, results written to /Users/aakaash/Desktop/code/OPENLAB/plex/jobs/fa3c55e8-9bf9-4d31-925f-9f8671a38315/io.json
Completed IO JSON CID: QmQjaFxq4TpynLh7iLDjAAT96pzSL9T9S9TNd15sJqxSVY

Without selector flag

go run main.go init -t QmZWYpZXsrbtzvBCHngh4YEgME5djnV5EedyTpc8DrK7k2 -i '{"protein": ["QmUWCBTqbRaKkPXQ3M14NkUuM4TEwfhVfrqLNoBB7syyyd/7n9g.pdb"], "small_molecule": ["QmViB4EnKX6PXd77WYSgMDMq9ZMX14peu3ZNoVV1LHUZwS/ZINC000019632618.sdf"]}' --scatteringMethod=dotProduct --autoRun=true -a test
Plex version (v0.10.4) up to date.
Pinned IO JSON CID: QmWhpHj9qWwHETVwCJtK1o5wXvxF8h7oVBxQQKpp4HkDXK
Created working directory: /Users/aakaash/Desktop/code/OPENLAB/plex/jobs/02a36ee7-ecd7-4aa6-87e7-ec11a6cf98fc
Initialized IO file at: /Users/aakaash/Desktop/code/OPENLAB/plex/jobs/02a36ee7-ecd7-4aa6-87e7-ec11a6cf98fc/io.json
Processing IO Entries
Starting to process IO entry 0 
Job running...
Bacalhau job id: 3ac8d6a5-e14e-4b1d-9657-f1d677bdfbd2 
////_🌱___////
Computed default go-libp2p Resource Manager limits based on:
    - 'Swarm.ResourceMgr.MaxMemory': "8.6 GB"
    - 'Swarm.ResourceMgr.MaxFileDescriptors': 30720

Theses can be inspected with 'ipfs swarm resources'.

Success processing IO entry 0 
Finished processing, results written to /Users/aakaash/Desktop/code/OPENLAB/plex/jobs/02a36ee7-ecd7-4aa6-87e7-ec11a6cf98fc/io.json
Completed IO JSON CID: QmbW5AqM8jdF8dcu2xe16g23Ykirc8TSjWWqESK4XNfbxe

@thetechnocrat-dev thetechnocrat-dev merged commit 10656ed into main Sep 19, 2023
3 checks passed
@thetechnocrat-dev thetechnocrat-dev deleted the 646-lab-604-custom-job-selector-label branch September 19, 2023 15:13
Development

Successfully merging this pull request may close these issues.

[LAB-604] custom job selector label
3 participants