Add creation of namespace; ignore error if it already exists. #44

Merged

dakoner
Contributor

@dakoner dakoner commented Jun 28, 2022

No description provided.
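
For readers without the diff at hand, the behavior named in the title can be sketched roughly as below. This is not the code merged in this PR; it is a minimal illustration under assumptions. The helper name `ensure_namespace` is hypothetical, and `core_api` is assumed to be a `kubernetes.client.CoreV1Api` instance (or anything with a compatible `create_namespace` method). The sketch relies on the Kubernetes API answering HTTP 409 (Conflict) when the namespace already exists, which the Python client surfaces as an exception carrying a `status` attribute.

```python
from typing import Any

HTTP_CONFLICT = 409  # Status the Kubernetes API returns for "already exists".


def ensure_namespace(core_api: Any, namespace: str) -> bool:
    """Create `namespace`, ignoring the error if it already exists.

    `core_api` is assumed to behave like kubernetes.client.CoreV1Api.
    Returns True if the namespace was created, False if it already existed.
    (Hypothetical helper for illustration; not taken from the PR.)
    """
    body = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": namespace},
    }
    try:
        core_api.create_namespace(body=body)
        return True
    except Exception as error:
        # The Kubernetes client's ApiException exposes the HTTP status code
        # as `.status`; only a 409 Conflict is treated as benign here.
        if getattr(error, "status", None) == HTTP_CONFLICT:
            return False
        raise
```

Checking the status code rather than swallowing every exception keeps real failures (for example, a permissions error) visible while still making the call safe to repeat.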

@mattrasmus mattrasmus merged commit 95e55a0 into insitro:k8s-rasmus Jul 8, 2022
@mattrasmus
Collaborator

Thanks, this is helpful to add. I'll incorporate.

mattrasmus added a commit that referenced this pull request Jan 14, 2023
* initial checkin of fake k8s executor that does nothing

* move some ECR methods which aren't used to aws_utils.

* Continue refactoring to enable k8s executor.

* intermediate progress on k8s

* return api response in create_job and use it to check job status

* snapshot of continued work on k8s

* enable k8s task

* another snapshot of k8s work.

* add eval_hash to job name

* continue wiring up job apis

* monitor job improvements for example

* add some refinements to k8s example code

* sync point- working k8s job works

* Return either a batch or core client API

* clean up workflow a bit

* finish up basic working example

* Refactor aws utils tests.

* fix docker local job status

* add first passing test.

* minor test improvements

* keep working on tests.

* intermediate work

* working test!

* in progress

* A bundle of working tests

* more tests

* intermediate work

* cleanup

* make script simpler.

* clean up array code

* store correct job id in pending jobs.

* correct some more tests

* minor test cleanup

* implement labels

* cleanup and DRY.  Incomplete

* Clean up job naming.

* Remove unused aws_user

* remove unused constant

* cleanups

* cleanup

* typing fix and attempt to remove iter_k8s_job_status

* Fix pytype errors

* pytype cleanup and regular cleanup

* pylint cleanups

* cleanup and handle exception in logging better

* add scaling test

* add back job array support.  incomplete but tests pass.

* intermediate work

* progress toward job arrays

* intermediate work on making job arrays work.

* mostly-working job arrays

* more test jobs for k8s

* working test_executor but not test_array

* minor changes to workflow

* Test cleanups.

* cleanups

* namespace support

* implement job reuniting for job arrays

* better pod errors, container limits

* container size tuning.

* container limits.

* delete job at end.

* fix pytyping (incomplete)

* adding some comments and commenting out some unused code.

* More comments, black formatting, remove some lint errors.

* Formatting.

* add timeout and retry support.

* add task with timeout

* test formatting.

* handle timeout better

* add comments/formatting.

* fix attribute access, and a string that got munged by a formatter

* add copy of 05_aws_batch

* clean up workflow a bit.

* remove print

* add copy of batch bioinfo workflow.

* Fix retry limit, exception handling.

* Handle exception.

* handle timeouts better

* remove unnecessary fragment.

* cleanup

* Update deps so they work with k8s.

* working on k8s

* add redun.ini

* add doc string

* refactor get_aws_env_vars

* add README for k8s example

* merge fixes

* fix tests

* fix types and refactor

* working on k8s executor

* add pod-level error handling

* remove parse_task_error

* refactor k8s monitor loop

* update k8s example

* fix tests

* add more files for k8s example

* remove breakpoint

* remove dead log code

* clean up k8s job list APIs

* use more pagination

* mock k8s clients

* working on tests

* use get_script_task_command to generate shell_command

* Add creation of namespace; ignore error if it already exists. (#44)

* Default old k8s servers to using non-arrayed jobs. (#51)

* fix GSFileSystem.glob

* add gpus to k8s

* Support for annotations and service account name

* use scratch_prefix throughout k8s

* use fallback k8s config load

* be robust to aws env vars not being available

* lint

* fix tests

* make kubernetes an extra install

* update k8s mocks

* add missing mock

* avoid putting aws secrets in job description

* lint

* import aws secrets using python

* Fix bug where some jobs weren't being properly cleaned up. (#52)

* allow configuration of secret name

* update k8s example

* consolidate the k8s clients and config loading

* remove unneeded changes

* add more docstrings and error handling

* add more typing

* small fixes to run on EKS

* simplify k8s example

* small simplifications

* Delete pods in foreground when jobs are deleted. (#57)

Use background deletion

omit testing flag

* update docs for k8s

* update k8s example

* lint

* remove unneeded data from k8s example

* fix k8s example

* remove test code, lint

* lint

* add array job test

* updates from review

Co-authored-by: David Konerding <konerding.david@gene.com>
Co-authored-by: Konerding <davidek@sgdbpr0417-usw2.aws.science.roche.com>
Co-authored-by: dek <dakoner@gmail.com>
Co-authored-by: Rico Meinl <rmeinl97@gmail.com>

2 participants