This repository has been archived by the owner on Nov 23, 2023. It is now read-only.

feat(infra): state machine workflow for datasets validation #56

Merged
l0b0 merged 19 commits into master from state-machine-dataset-validation on Dec 7, 2020

Conversation

imincik
Contributor

@imincik imincik commented Nov 14, 2020

Implements a Lambda State Machine workflow for dataset validation during Dataset Version creation, according to the Datasets Version Endpoint Detailed Design.

This PR includes a dummy implementation of all workflow steps. It also includes new AWS stacks:

  • networking - VPC required for AWS Batch (ECS) cluster
  • processing - State Machine resources
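
As a rough illustration of what a state machine with dummy steps could look like, here is a minimal sketch that builds an Amazon States Language definition and walks it locally with placeholder handlers. It is not the code from this PR: apart from `check_files_checksums`, which appears as a directory name in the review, the step names, the ARN placeholder, and the helper functions are hypothetical.

```python
import json

# Hypothetical step list. Only "check_files_checksums" is named in this PR;
# the other two names are assumptions made for illustration.
STEPS = ["check_metadata", "check_files_checksums", "finalise"]


def build_definition(steps):
    """Return an Amazon States Language definition that chains the steps in order."""
    states = {}
    for index, name in enumerate(steps):
        state = {
            "Type": "Task",
            # Placeholder ARN, not a real resource.
            "Resource": f"arn:aws:lambda:REGION:ACCOUNT:function:{name}",
        }
        if index + 1 < len(steps):
            state["Next"] = steps[index + 1]
        else:
            state["End"] = True
        states[name] = state
    return {"StartAt": steps[0], "States": states}


def dummy_handler(event, context=None):
    """Stand-in for a workflow step: passes the event on and counts the call."""
    return {**event, "steps_run": event.get("steps_run", 0) + 1}


def run_locally(definition, handlers, event):
    """Walk the definition from StartAt to the End state, calling one handler per state."""
    current = definition["StartAt"]
    while True:
        state = definition["States"][current]
        event = handlers[current](event)
        if state.get("End"):
            return event
        current = state["Next"]


definition = build_definition(STEPS)
handlers = {name: dummy_handler for name in STEPS}
result = run_locally(definition, handlers, {"dataset_id": "example"})
print(json.dumps(definition, indent=2))
print(result)
```

Keeping every step a dummy at first means the wiring between states can be reviewed and deployed before any real validation logic exists.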

@imincik imincik added this to the Sprint 4 milestone Nov 14, 2020
@imincik imincik self-assigned this Nov 14, 2020
@imincik imincik force-pushed the state-machine-dataset-validation branch from 507b189 to bc9bf68 Compare November 15, 2020 21:45
@imincik
Contributor Author

imincik commented Nov 15, 2020

@l0b0, one problem I see with this PR right now is that we have duplicated the Lambda function bundling script (backend/dataset_import/bundle.bash). I am not sure whether we want to solve it in this PR or in a new one. What do you think?

@l0b0
Contributor

l0b0 commented Nov 15, 2020

> @l0b0, one problem I see with this PR right now is that we have duplicated the Lambda function bundling script (backend/dataset_import/bundle.bash). I am not sure whether we want to solve it in this PR or in a new one. What do you think?

Shouldn't dataset_import be in the endpoints directory? The idea with the bundle.bash is that it can be called with the endpoint name, so we don't need to duplicate the script to use it for all the endpoints.
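
To illustrate the idea of a single bundler that takes the function name as a parameter, here is a minimal sketch in Python. It is not the contents of bundle.bash, which are not shown in this conversation. The `bundle` function, the directory layout, and the `function.py` file name are assumptions made for the example.

```python
import tempfile
import zipfile
from pathlib import Path


def bundle(base_dir: Path, function_name: str, output_dir: Path) -> Path:
    """Zip the directory base_dir/function_name into output_dir/function_name.zip."""
    source_dir = base_dir / function_name
    if not source_dir.is_dir():
        raise FileNotFoundError(f"No such function directory: {source_dir}")
    output_dir.mkdir(parents=True, exist_ok=True)
    archive = output_dir / f"{function_name}.zip"
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as bundle_file:
        for path in sorted(source_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to the function directory.
                bundle_file.write(path, path.relative_to(source_dir).as_posix())
    return archive


# Demonstration in a temporary directory with two hypothetical functions.
function_names = ("dataset_import", "check_files_checksums")
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in function_names:
        function_dir = root / "backend" / name
        function_dir.mkdir(parents=True)
        (function_dir / "function.py").write_text(
            "def handler(event, context):\n    return {}\n"
        )
    archives = [bundle(root / "backend", name, root / "build") for name in function_names]
    archive_names = [archive.name for archive in archives]
    with zipfile.ZipFile(archives[0]) as first_archive:
        members = first_archive.namelist()

print(archive_names, members)
```

Because the function name is an argument, the same script serves every function directory and nothing needs to be copied per function.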

@imincik
Contributor Author

imincik commented Nov 17, 2020

@l0b0, I wanted to have a separate directory for endpoint Lambda functions. They are the main API endpoints, and it seems to me that they deserve their own directory. As we go, I expect we will need even more directories for different kinds of Lambda functions, to avoid one big directory with a mess of functions of different kinds.

If you agree, then we need to move the bundling shell script one directory higher and use it for all Lambdas. The question is whether we should do that in this PR or in a separate one. Personally, I would prefer to do it in another PR. What do you think?

@imincik imincik force-pushed the state-machine-dataset-validation branch 5 times, most recently from eff70d6 to e37d637 Compare December 2, 2020 13:01
@imincik imincik changed the title from "feat(infra): dataset validation state machine" to "feat(infra): state machine workflow for datasets validation" Dec 2, 2020
@imincik imincik requested a review from l0b0 December 2, 2020 13:46
backend/processing/bundle.bash Outdated
backend/processing/check_files_checksums/Dockerfile Outdated
backend/processing/check_files_checksums/Dockerfile Outdated
backend/processing/check_files_checksums/Dockerfile Outdated
infra/datalake/networking_stack.py Outdated
infra/datalake/processing_stack.py Outdated
infra/datalake/processing_stack.py Outdated
infra/datalake/processing_stack.py
pyproject.toml Outdated
backend/processing/bundle.bash Outdated
backend/processing/bundle.bash Outdated
.github/workflows/ci.yml Outdated
@imincik imincik force-pushed the state-machine-dataset-validation branch from e37d637 to a14bc65 Compare December 3, 2020 09:51
@imincik
Contributor Author

imincik commented Dec 3, 2020

@l0b0, I addressed most of your comments. There are just a few which I can't solve or on which I have a different opinion. Thanks for the good review.

infra/datalake/networking_stack.py Outdated
infra/datalake/processing_stack.py
pyproject.toml Outdated
@imincik imincik force-pushed the state-machine-dataset-validation branch from 767304e to 49da9a8 Compare December 6, 2020 21:10
@l0b0 l0b0 self-requested a review December 6, 2020 21:12
l0b0 previously approved these changes Dec 6, 2020
@l0b0 l0b0 force-pushed the state-machine-dataset-validation branch from 49da9a8 to bddae0d Compare December 6, 2020 21:53
@l0b0 l0b0 self-requested a review December 6, 2020 21:55
l0b0 previously approved these changes Dec 6, 2020
@l0b0 l0b0 force-pushed the state-machine-dataset-validation branch from bddae0d to d6d14db Compare December 6, 2020 22:08
@l0b0 l0b0 self-requested a review December 7, 2020 00:19
@l0b0 l0b0 merged commit d246350 into master Dec 7, 2020
@l0b0 l0b0 deleted the state-machine-dataset-validation branch December 7, 2020 00:20
2 participants