Dynamically distribute data collection into jobs #10

Open
3 of 4 tasks
oindrillac opened this issue Nov 9, 2022 · 0 comments

oindrillac commented Nov 9, 2022

Based on the number of repos in an organization and the total number of PRs, dynamically decide how many jobs are required to run the data collection, then split the GitHub workflow into that many jobs so that data collection completes successfully.
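One way to derive the job count is sketched below. It assumes a per-job PR budget (`prs_per_job`, a hypothetical tuning parameter not specified in this issue) and caps the result at 256, the maximum number of jobs a GitHub Actions matrix can generate per workflow run. The function name and the shape of `pr_counts` (repo name mapped to PR count) are illustrative assumptions only.

```python
import math

# GitHub Actions allows a matrix to generate at most 256 jobs per workflow run.
MAX_MATRIX_JOBS = 256


def decide_job_count(pr_counts, prs_per_job=500):
    """Return how many jobs to launch for the given per-repo PR counts.

    pr_counts: dict mapping repo name -> number of PRs (assumed input shape).
    prs_per_job: assumed budget of PRs a single job can collect.
    """
    if prs_per_job < 1:
        raise ValueError("prs_per_job must be at least 1")
    total_prs = sum(pr_counts.values())
    if total_prs == 0:
        return 1  # still run one job so downstream steps have an input
    return min(MAX_MATRIX_JOBS, math.ceil(total_prs / prs_per_job))


if __name__ == "__main__":
    print(decide_job_count({"repo-a": 1200, "repo-b": 300}))
```

With the default budget of 500, the 1500 PRs in the usage example are spread over 3 jobs.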

Acceptance Criteria:

  • Create an initial job (job 1) that decides how many jobs need to be triggered and the scope of each job (which job collects from which repos and which PRs).
  • Based on the number of jobs decided, treat job 1's output as an environment variable and use a matrix strategy to distribute the data collection workflow across that number of jobs.
  • Within each job, collect the data belonging to the scope of that job.
  • Once data collection completes across the individual jobs, which run sequentially, aggregate all the data into a single data dump and use it for model training.
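
The criteria above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: `plan_scopes`, `matrix_output_line`, and `aggregate` are hypothetical helper names, PRs are addressed by an assumed `offset`/`limit` within each repo, and each job's collected data is assumed to be a list of records.

```python
import json


def plan_scopes(pr_counts, n_jobs):
    """Assign each job a near-equal share of PRs, as (repo, offset, limit) slices."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    repos = sorted(pr_counts.items())
    base, extra = divmod(sum(pr_counts.values()), n_jobs)
    scopes, repo_idx, offset = [], 0, 0
    for job in range(n_jobs):
        remaining = base + (1 if job < extra else 0)  # this job's PR share
        slices = []
        while remaining > 0:
            repo, count = repos[repo_idx]
            take = min(remaining, count - offset)
            if take > 0:
                slices.append({"repo": repo, "offset": offset, "limit": take})
            offset += take
            remaining -= take
            if offset == count:  # repo exhausted, move on to the next one
                repo_idx += 1
                offset = 0
        scopes.append({"job": job, "slices": slices})
    return scopes


def matrix_output_line(scopes):
    """Format a `name=value` line that job 1 could append to $GITHUB_OUTPUT."""
    return "matrix=" + json.dumps({"job": [s["job"] for s in scopes]})


def aggregate(per_job_records):
    """Concatenate the record lists produced by the individual jobs."""
    merged = []
    for records in per_job_records:
        merged.extend(records)
    return merged


if __name__ == "__main__":
    plan = plan_scopes({"repo-a": 5, "repo-b": 3}, n_jobs=3)
    print(json.dumps(plan, indent=2))
    print(matrix_output_line(plan))
```

In a workflow, a downstream job could then read the matrix with an expression such as `fromJSON(needs.<job-id>.outputs.matrix)`, and each matrix job would look up its own entry in the plan by its `job` index. How the plan itself is passed between jobs (a second output, an artifact, or recomputation) is left open here.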