Feature: data load for migrations #188
My proposal for this Feature is based on a prototype built with the CLI to investigate the required changes in the Worker, plus some benchmark information. Context:
CLI Implementation
The CLI will insert all the targets directly into the RSTUF SQL DB.
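To make the "insert directly into the SQL DB" step concrete, here is a minimal sketch of a bulk insert done in a single transaction. It uses Python's built-in `sqlite3` only as a stand-in for the RSTUF SQL DB, and the `targets` table, its columns, and the `published` flag are hypothetical, not the real RSTUF schema.

```python
import sqlite3
from typing import Iterable, Tuple

# Hypothetical row shape: (path, size, hash, custom_json).
# The real RSTUF schema is not described here; this table is illustrative only.
Target = Tuple[str, int, str, str]


def bulk_insert_targets(conn: sqlite3.Connection, targets: Iterable[Target]) -> int:
    """Insert all targets in one transaction and return the number of rows added."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS targets (
            path TEXT PRIMARY KEY,
            size INTEGER NOT NULL,
            hash TEXT NOT NULL,
            custom TEXT,
            -- hypothetical flag: rows start unpublished so a later
            -- publish step can pick them up
            published INTEGER NOT NULL DEFAULT 0
        )
        """
    )
    with conn:  # commits on success, rolls back if any row fails
        cur = conn.executemany(
            "INSERT INTO targets (path, size, hash, custom) VALUES (?, ?, ?, ?)",
            targets,
        )
    # For executemany(), sqlite3 sums the modified rows into rowcount.
    return cur.rowcount
```

Sending every row through one `executemany` call inside one transaction avoids one API request and one worker task per target, which is the cost this proposal is trying to avoid.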
Worker: I identified some changes (I will file separate task issues).
I was doing it manually:
❯ docker exec -it repository-service-tuf-worker_repository-service-tuf-worker_1 /bin/bash
root@1509b2901ab3:/opt/repository-service-tuf-worker# python
Python 3.10.7 (main, Sep 13 2022, 01:53:53) [GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import app
14:42:09 DEBUG STORAGE_BACKEND is defined as <class 'repository_service_tuf_worker.services.storage.local.LocalStorage'>
14:42:09 DEBUG KEYVAULT_BACKEND is defined as <class 'repository_service_tuf_worker.services.keyvault.local.LocalKeyVault'>
14:42:09 DEBUG STORAGE_BACKEND is defined as <class 'repository_service_tuf_worker.services.storage.local.LocalStorage'>
14:42:09 DEBUG KEYVAULT_BACKEND is defined as <class 'repository_service_tuf_worker.services.keyvault.local.LocalKeyVault'>
>>> app.repository._send_publish_targets_task('import')
14:42:33 DEBUG Start from server, version: 0.9, properties: {'capabilities': {'publisher_confirms': True, 'exchange_exchange_bindings': True, 'basic.nack': True, 'consumer_cancel_notify': True, 'connection.blocked': True, 'consumer_priorities': True, 'authentication_failure_close': True, 'per_consumer_qos': True, 'direct_reply_to': True}, 'cluster_name': 'rabbit@f789dd71edf0', 'copyright': 'Copyright (c) 2007-2022 VMware, Inc. or its affiliates.', 'information': 'Licensed under the MPL 2.0. Website: https://rabbitmq.com', 'platform': 'Erlang/OTP 25.0.4', 'product': 'RabbitMQ', 'version': '3.10.7'}, mechanisms: [b'PLAIN', b'AMQPLAIN'], locales: ['en_US']
14:42:33 DEBUG using channel_id: 1
14:42:33 DEBUG Channel open
>>>
On my basic worker prototype:
The likely bottleneck is publishing all targets sequentially. In the second prototype, I also tried to improve this algorithm using Celery Canvas, creating an asynchronous group of new tasks.
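The Celery Canvas idea of replacing one sequential publish loop with a group of tasks can be sketched without a broker. This sketch splits the target paths into chunks and runs them concurrently; `concurrent.futures.ThreadPoolExecutor` is swapped in for Celery here, and `publish_chunk`, `chunk_size`, and `workers` are hypothetical names. With Celery Canvas, the same shape would be a `group` of one task signature per chunk.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def chunked(items: Sequence[str], size: int) -> List[Sequence[str]]:
    """Split items into consecutive chunks of at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i : i + size] for i in range(0, len(items), size)]


def publish_in_parallel(
    paths: Sequence[str],
    publish_chunk: Callable[[Sequence[str]], int],  # hypothetical per-chunk job
    chunk_size: int = 500,
    workers: int = 4,
) -> int:
    """Run publish_chunk on every chunk concurrently; return the total published."""
    chunks = chunked(paths, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() keeps chunk order and re-raises the first exception from a job
        return sum(pool.map(publish_chunk, chunks))
```

Whatever executes the chunks in parallel, writes that touch the same TUF metadata file would still need to be serialized or locked; this sketch does not address that.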
Sounds like a reasonable plan.
- This commit adds the Gherkin Feature file for repository-service-tuf#188.
- This follows the Feature process described at https://repository-service-tuf.readthedocs.io/en/latest/devel/development.html#rstuf-feature
Signed-off-by: Kairo de Araujo <kdearaujo@vmware.com>
* Feature: FT for import targets to RSTUF DB
  - This commit adds the Gherkin Feature file for #188.
  - This follows the Feature process described at https://repository-service-tuf.readthedocs.io/en/latest/devel/development.html#rstuf-feature
  Signed-off-by: Kairo de Araujo <kdearaujo@vmware.com>
* Add scenario that skips the publish targets
  Add a test scenario that skips publishing the targets to the TUF metadata.
  Signed-off-by: Kairo de Araujo <kdearaujo@vmware.com>
* Add parameter --skip-publish-targets
  Signed-off-by: Kairo de Araujo <kdearaujo@vmware.com>
* Apply suggestions from code review
  Fix some wording
  Co-authored-by: Martin Vrachev <martin.vrachev@gmail.com>
---------
Signed-off-by: Kairo de Araujo <kdearaujo@vmware.com>
Co-authored-by: Martin Vrachev <martin.vrachev@gmail.com>
Implements the import targets feature, `rstuf admin import-targets`. This feature gives the RSTUF administrator the ability to import a large number of existing targets, which helps roll out and deploy RSTUF in existing repositories. The feature is detailed and explained at these links:
- repository-service-tuf/repository-service-tuf#188
- BDD Feature: repository-service-tuf/repository-service-tuf#218
Some changes were made in the ceremony to reuse code.
Signed-off-by: Kairo de Araujo <kdearaujo@vmware.com>
Implements the import targets feature, `rstuf admin import-targets`. This feature gives the RSTUF administrator the ability to import a large number of existing targets, which helps roll out and deploy RSTUF in existing repositories. The feature is detailed and explained at these links:
- repository-service-tuf/repository-service-tuf#188
- BDD Feature: repository-service-tuf/repository-service-tuf#218
Some changes were made in the ceremony so the code can be reused:
- `ceremony._check_server` was converted to `helpers.api_client.get_headers`
- `ceremony._bootstrap_state` was converted to `helpers.api_client.task_status`
Added 100% test coverage to `helpers.api_client`.
Signed-off-by: Kairo de Araujo <kdearaujo@vmware.com>
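Waiting for a task such as `publish_targets` to finish usually means polling its status until it reaches a final state. The sketch below is a hypothetical helper, not the real `helpers.api_client.task_status`: it assumes the status is a dict with a `state` key holding Celery-style state names (`SUCCESS`, `FAILURE`, and so on), and it takes the status-fetching function as a parameter so it can be exercised without a running API.

```python
import time
from typing import Callable, Dict


def wait_for_task(
    get_status: Callable[[], Dict[str, str]],  # hypothetical status fetcher
    timeout: float = 60.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Poll get_status() until the task reaches a final state or time runs out."""
    final_states = {"SUCCESS", "FAILURE"}  # assumed terminal state names
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status.get("state") in final_states:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task still {status.get('state')!r} after {timeout}s")
        sleep(interval)
```

Injecting `sleep` keeps the loop testable; a real CLI would also surface the final `FAILURE` state to the user rather than treating it as success.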
* feature: import targets
  Implements the import targets feature, `rstuf admin import-targets`. This feature gives the RSTUF administrator the ability to import a large number of existing targets, which helps roll out and deploy RSTUF in existing repositories. The feature is detailed and explained at these links:
  - repository-service-tuf/repository-service-tuf#188
  - BDD Feature: repository-service-tuf/repository-service-tuf#218
  Some changes were made in the ceremony so the code can be reused:
  - `ceremony._check_server` was converted to `helpers.api_client.get_headers`
  - `ceremony._bootstrap_state` was converted to `helpers.api_client.task_status`
  Added 100% test coverage to `helpers.api_client`.
  Signed-off-by: Kairo de Araujo <kdearaujo@vmware.com>
* Fix the sentences
  Co-authored-by: Martin Vrachev <martin.vrachev@gmail.com>
* Make the optional dependencies and docs clear
  The dependencies `psycopg2` and `sqlalchemy` are optional. They are required only when the user intends to use the `import-targets` feature.
  Add the new feature to the documentation, with details about its usage and the CSV format.
  Signed-off-by: Kairo de Araujo <kdearaujo@vmware.com>
* Add typing notation
  - Add typing notation
  - Make `_check_csv_files` more explicit
  Signed-off-by: Kairo de Araujo <kdearaujo@vmware.com>
* fix tests for changes committed to the ceremony
  Signed-off-by: Kairo de Araujo <kdearaujo@vmware.com>
* fix the bootstrap check for the ceremony
  Signed-off-by: Kairo de Araujo <kdearaujo@vmware.com>
* add unit tests for import-targets
  - Added some small refactoring in the implementation
  - 100% coverage for the unit tests
  - Fix some comments from the review
  Signed-off-by: Kairo de Araujo <kdearaujo@vmware.com>
* Removed unused Mock from test_import_targets
  Signed-off-by: Kairo de Araujo <kdearaujo@vmware.com>
* fix linting
  Signed-off-by: Kairo de Araujo <kdearaujo@vmware.com>
* Apply suggestions from code review
  Co-authored-by: Martin Vrachev <martin.vrachev@gmail.com>
* Add missing asserts
  Co-authored-by: Martin Vrachev <martin.vrachev@gmail.com>
* fix import-targets params, remove duplicate checks
  - Move parameters to use double dash `--{names}`
  - Remove a duplicate check that is already done by `is_logged`
  - Fix tests
  Signed-off-by: Kairo de Araujo <kdearaujo@vmware.com>
* fix bug in is_logged, typing and typo
  - Fix `is_logged` bug in case of no data with a 200 status code
  - Fix typing for the `task_status` response
  - Fix typo/wording for task status
  Signed-off-by: Kairo de Araujo <kdearaujo@vmware.com>
* remove `**kw` from `api_client.Login` lambdas
  - Remove login keyword parameters from lambda
  Signed-off-by: Kairo de Araujo <kdearaujo@vmware.com>
* update documentation using `--` for the parameters
  Signed-off-by: Kairo de Araujo <kdearaujo@vmware.com>
---------
Signed-off-by: Kairo de Araujo <kdearaujo@vmware.com>
Co-authored-by: Martin Vrachev <martin.vrachev@gmail.com>
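Because `psycopg2` and `sqlalchemy` are optional, a CLI that offers `import-targets` has to fail with a clear message when they are missing, while the other commands keep working. A minimal sketch of such a guard; `load_optional_deps` and its error message are hypothetical and make no claim about how the real CLI handles this.

```python
import importlib
from types import ModuleType
from typing import Dict, List, Tuple

# Per the commit message above, these are needed only by `import-targets`.
OPTIONAL_DEPS: Tuple[str, ...] = ("sqlalchemy", "psycopg2")


def load_optional_deps(names: Tuple[str, ...] = OPTIONAL_DEPS) -> Dict[str, ModuleType]:
    """Import each optional module; exit with one clear message if any is missing."""
    modules: Dict[str, ModuleType] = {}
    missing: List[str] = []
    for name in names:
        try:
            modules[name] = importlib.import_module(name)
        except ImportError:
            missing.append(name)
    if missing:
        # SystemExit with a string prints it to stderr and exits with status 1.
        raise SystemExit(
            "import-targets requires the optional dependencies: " + ", ".join(missing)
        )
    return modules
```

Calling the guard only inside the `import-targets` command, rather than at module import time, is what keeps the rest of the CLI usable without the extras.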
The load data feature is implemented in the CLI. I'm marking it as done in the list of required tasks above.
All tasks for this feature are completed. I am closing this issue/feature.
Mark "Data load for migrations" as complete as #188 is closed. Signed-off-by: Martin Vrachev <mvrachev@vmware.com>
What is the feature about?
There is a use case for RSTUF where a repository already has a lot of existing data.
For example, PyPI has 7+ million packages, npm has 1.2+ million packages, etc.
If such a repository decides to adopt RSTUF, loading all this pre-existing data by making requests to the API and waiting for the worker to process each one can take a long time.
One suggestion is to add a new data migration/load feature to the RSTUF CLI.
It could be faster if the data were submitted in a structured way directly to the RSTUF SQL DB.
path;size;hash;custom;
… `publish_targets` and waits for it to be processed.
This might be related to:
- `rstuf artifact add` (repository-service-tuf-cli#39)
A prototype and a Feature are required.
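The `path;size;hash;custom;` line above suggests semicolon-separated input. A minimal parser sketch, assuming the `custom` column is either empty or a JSON object and that a trailing `;` is allowed; `parse_targets` is a hypothetical name, and the CSV format actually accepted by `rstuf admin import-targets` may differ.

```python
import csv
import json
from typing import Any, Dict, Iterable, Iterator, Optional, Tuple

# Assumed row layout, following the `path;size;hash;custom;` line in the issue.
Row = Tuple[str, int, str, Optional[Dict[str, Any]]]


def parse_targets(lines: Iterable[str]) -> Iterator[Row]:
    """Yield (path, size, hash, custom) tuples from semicolon-separated lines."""
    # QUOTE_NONE keeps the double quotes inside JSON values verbatim.
    reader = csv.reader(lines, delimiter=";", quoting=csv.QUOTE_NONE)
    for lineno, fields in enumerate(reader, start=1):
        if not fields:
            continue  # skip blank lines
        # Need path, size and hash; anything past `custom` must be empty
        # (a trailing ';' produces one extra empty field).
        if len(fields) < 3 or any(f.strip() for f in fields[4:]):
            raise ValueError(f"line {lineno}: expected path;size;hash;custom;")
        path, size, digest = fields[0], fields[1], fields[2]
        custom_raw = fields[3] if len(fields) > 3 else ""
        custom = json.loads(custom_raw) if custom_raw.strip() else None
        yield path, int(size), digest, custom
```

Because quote processing is disabled, a `custom` value in this sketch must not itself contain a `;`.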
Services it relates to
repository-service-tuf-cli
Related tasks
No response
References
No response