TUF Initialization using python-tuf 2.0.0 #10870

Closed

Conversation

kairoaraujo
Contributor

@kairoaraujo kairoaraujo commented Mar 4, 2022

This work refactors the Draft PR by
@woodruffw, to build a new repository tool on top of the Python-TUF
Metadata API, and use it instead of the Python-TUF repository tool
that was deprecated in v1.0.0.

Part of #10672

Note to reviewer

The current implementation has some development-only components, and lacks a few services for full PEP458 compliance as well as extensive tests. However, it should qualify for a review of the overall architecture and flow (see details in 'Overview' below). Components and functionality that are planned for subsequent PRs are listed in 'Next steps' below.

Overview

  classDiagram
    direction LR
    class `tuf.interfaces` {
      zope.interface.Interface
      IKeyService(Interface)
      IStorageService(Interface)
      IRepositoryService(Interface)
    }
    class `tuf.services` {
      IKeyService
      IRepositoryService
      IStorageService
      LocalKeyService(IKeyService)
      LocalStorageService(IStorageService)
      RepositoryService(IRepositoryService)
    }
    class `tuf.tasks` {
      init_repository
      init_targets_delegation
      bump_snapshot
      bump_bin_n_roles
      add_hashed_targets
    }

    class `cli.tuf`{
        dev keypairs
        dev init-repo
        dev init-delegations
        dev add-all-packages
        dev add-all-indexes
        dev bump-snapshot
        dev bump-bin-n-roles
    }


    `tuf.services` <|-- `tuf.interfaces`
    `tuf.tasks` -- `tuf.services`
    `cli.tuf` -- `tuf.tasks`
    warehouse -- `cli.tuf`
    warehouse -- `tuf.tasks`

warehouse.tuf.repository

  • MetadataRepository implements a custom TUF metadata repository tool on top of
    the new Python-TUF Metadata API to create and maintain (update, sign, sync with storage) TUF metadata for Warehouse.

warehouse.tuf.services

  • LocalKeyService provides a local file storage backend for TUF role keys used by the repository tool (development only!!).
  • LocalStorageService provides a local file storage backend for TUF role metadata used by the repository tool.
  • RepositoryService provides methods for common Warehouse-TUF tasks, using the repository tool.
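
The interface/service split listed above can be sketched roughly as follows. This is a minimal, dependency-free illustration only: it uses `typing.Protocol` in place of `zope.interface`, and the `get`/`put`/`persist` method names and the in-memory backend are assumptions made for the example, not Warehouse's actual API.

```python
from typing import Dict, Protocol


class StorageService(Protocol):
    """Hypothetical stand-in for IStorageService."""

    def get(self, filename: str) -> bytes: ...

    def put(self, filename: str, data: bytes) -> None: ...


class InMemoryStorageService:
    """Toy backend; a local file backend would read and write files instead."""

    def __init__(self) -> None:
        self._files: Dict[str, bytes] = {}

    def get(self, filename: str) -> bytes:
        return self._files[filename]

    def put(self, filename: str, data: bytes) -> None:
        self._files[filename] = data


class RepositoryService:
    """Depends only on the storage interface, not on a concrete backend."""

    def __init__(self, storage: StorageService) -> None:
        self._storage = storage

    def persist(self, role_name: str, payload: bytes) -> str:
        # Store the serialized role metadata under "<role>.json".
        filename = f"{role_name}.json"
        self._storage.put(filename, payload)
        return filename


storage = InMemoryStorageService()
service = RepositoryService(storage)
stored_as = service.persist("timestamp", b"{}")
```

Because `RepositoryService` only sees the interface, a development-only backend can later be swapped for a production one without touching the task code.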

warehouse.tuf.tasks

Defines common Warehouse-TUF tasks that use the RepositoryService for

  • bootstrapping a metadata repository (init_repository, init_targets_delegation),
  • updating metadata upon package upload (add_hashed_targets)
  • scheduled metadata updates (bump_bin_n_roles, bump_snapshot)

warehouse.cli.tuf

Defines development commands for bootstrapping a TUF metadata repository (keypair, init_repo, init_delegations), backsigning existing packages and simple index pages (add_all_packages, add_all_indexes), and for manually triggering scheduled tasks (bump_bin_n_roles, bump_snapshot). CLI calls go through warehouse.cli.tasks, to take advantage of the Celery/Redis queue.

Next steps:

  • Polish the new Warehouse metadata repository tool based on review feedback
  • PRs to implement TUF in the Warehouse request flow
    • upload target file
    • delete target file
    • tasks for refreshing indexes/projects
  • Tests

Using the Warehouse development environment for TUF

Follow the official Warehouse development setup instructions up to make initdb, then run:

$ make inittuf

The metadata is available at http://localhost:9001/metadata/

You can also upload a file using Warehouse and add the targets using the CLI:

docker-compose run --rm web python -m warehouse tuf dev add-all-packages
docker-compose run --rm web python -m warehouse tuf dev add-all-indexes

Updated: Removed MetadataRepository implementation

@kairoaraujo
Contributor Author

kairoaraujo commented Mar 4, 2022

I want to thank @lukpueh for sharing his TUF expertise and helping me review and improve this draft PR over the last few days.
Thanks to @joshuagl and @jku for always finding time to answer my PEP458/Python-TUF questions.
@di, thanks for your help with Warehouse.
@woodruffw, thanks for the initial PR. That made my life easy.

The idea of tagging you all here is to say thanks ❤️, and the second intention is to ask for help reviewing this draft PR 😎.

I will continue working on the tests for this PR.

@lukpueh
Contributor

lukpueh commented Mar 4, 2022

Summarizing some architectural discussions I had with @kairoaraujo on slack:

The current design implements Warehouse-specific TUF app code in tuf.services and a generic TUF repo abstraction in tuf.repository.

The intention is to hide TUF metadata handling from Warehouse. And, in the future when better TUF repository tooling is available, replace the repo abstraction by such tooling (see theupdateframework/python-tuf#1136).

But it seems hard to draw a clear line between application and repository responsibilities. Some pain points:

  • Although the app code doesn't directly interact with the Metadata API, it provides most of its inputs and passes them to the intermediary repo code in a format that is already very similar to the TUF Metadata API classes (see RolesPayload, TargetsPayload), which makes the blackbox argument less strong.
  • Inter-dependent metadata updates prescribed by the TUF spec currently occur in both app and repo code. E.g. "update timestamp when snapshot is updated" (app), or "update snapshot when targets is updated" (repo).
  • Metadata needs to be loaded from storage in both app and repo, but also passed back and forth with multiple side-effects (in storage and in memory).
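
To make the second pain point concrete, here is a toy sketch of the targets → snapshot → timestamp update chain kept in a single place. The `Role` record and the function names are hypothetical and only model version bookkeeping; real TUF metadata also needs expiration bumps, signing, and persisting.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Role:
    """Toy metadata record; real TUF metadata carries far more."""

    version: int = 1
    meta: Dict[str, int] = field(default_factory=dict)


def update_targets(roles: Dict[str, Role]) -> None:
    """Bump targets, then propagate the change to snapshot and timestamp."""
    roles["targets"].version += 1
    _update_snapshot(roles)


def _update_snapshot(roles: Dict[str, Role]) -> None:
    # Snapshot records the new targets version and gets a new version itself.
    snapshot = roles["snapshot"]
    snapshot.meta["targets.json"] = roles["targets"].version
    snapshot.version += 1
    _update_timestamp(roles)


def _update_timestamp(roles: Dict[str, Role]) -> None:
    # Timestamp records the new snapshot version and gets a new version itself.
    timestamp = roles["timestamp"]
    timestamp.meta["snapshot.json"] = roles["snapshot"].version
    timestamp.version += 1


roles = {name: Role() for name in ("targets", "snapshot", "timestamp")}
update_targets(roles)
```

With the whole chain in one module, a reviewer can check the ordering against the spec without following calls across an app/repo boundary.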

The added complexity of the repository abstraction together with an unclear distribution of responsibilities makes it harder to review the correctness of the implementation in terms of PEP and TUF spec compliance.

Maybe it would be easier to remove the abstraction and implement all in app code? The two relevant classes RepositoryService (app) and MetadataRepository (repo) already have a 1-1 relationship and look a lot alike in terms of attributes.

All that said, we decided that we will stick to the current design, and wait for feedback from upstream.

(cc @jku, who has put a lot of thought into TUF repository abstractions.)

@ameily ameily left a comment

High-level design seems to be in decent shape and headed in the right direction.

warehouse/tuf/repository.py (outdated, resolved)
keyids=[key["keyid"] for key in role_parameter.keys],
threshold=role_parameter.threshold,
terminating=None,
paths=role_parameter.paths,

Would it make sense to pass in copies of the paths and path_hash_prefixes lists because they are mutable in both the original RolesPayload and the new DelegatedRole objects?
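
The aliasing concern can be illustrated with a small standalone sketch. `RolePayload` and `Delegation` are hypothetical stand-ins for `RolesPayload` and `DelegatedRole`, reduced to a single `paths` field.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RolePayload:  # hypothetical stand-in for RolesPayload
    paths: List[str]


@dataclass
class Delegation:  # hypothetical stand-in for DelegatedRole
    paths: List[str]


payload = RolePayload(paths=["*/foo/*"])

shared = Delegation(paths=payload.paths)        # same list object as the payload
copied = Delegation(paths=list(payload.paths))  # independent shallow copy

# A later mutation of the payload leaks into `shared` but not into `copied`.
payload.paths.append("*/bar/*")
```

A shallow copy is enough here because the list elements are immutable strings.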

Contributor

It does not seem necessary. RolesPayload is only used as a transport container to move the data from the service to the repository. But I agree that it's a lot of references being passed around with potential for unwanted side-effects. That's why I've been advocating for merging service and repository (see #10870 (comment))

    key_rolename = key_rolename
else:
    key_rolename = rolename
role_metadata.signed.expires = role_expires

Should there be a check that role_expires is actually in the future? If so, this function and others that set a role's metadata expiration date should be updated.

Contributor

It probably does not hurt, other than adding a bit of cruft. The expiration date is determined by a per-role configured interval and a helper function, both controlled by us. Maybe instead of checking here we could add tests that the configured value is greater than 0 and that the helper indeed adds the value to now?
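
A test along those lines might look like the sketch below. `expiry_from_now` is a hypothetical helper, not the one in this PR, and the interval of 604800 seconds is borrowed from the `tuf.bin-n.expiry` setting discussed further down.

```python
from datetime import datetime, timedelta, timezone


def expiry_from_now(interval_seconds: int, now: datetime) -> datetime:
    """Hypothetical helper: expiration = now + configured interval."""
    if interval_seconds <= 0:
        raise ValueError("expiry interval must be greater than 0")
    return now + timedelta(seconds=interval_seconds)


def test_expiry_is_in_the_future() -> None:
    now = datetime(2022, 3, 4, tzinfo=timezone.utc)
    expires = expiry_from_now(604800, now)  # 604800 s = 7 days
    assert expires > now
    assert expires - now == timedelta(days=7)


def test_non_positive_interval_is_rejected() -> None:
    now = datetime(2022, 3, 4, tzinfo=timezone.utc)
    for bad in (0, -1):
        try:
            expiry_from_now(bad, now)
        except ValueError:
            continue
        raise AssertionError("expected ValueError")


test_expiry_is_in_the_future()
test_non_positive_interval_is_rejected()
```

Passing `now` in explicitly keeps the test deterministic without patching the clock.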

warehouse/tuf/services.py (outdated, resolved)
@kairoaraujo kairoaraujo force-pushed the refactoring_pr_tuf_initialization branch 2 times, most recently from 2bc6c23 to f300b2f Compare March 30, 2022 11:21
@kairoaraujo kairoaraujo marked this pull request as ready for review March 30, 2022 11:21
@kairoaraujo
Contributor Author

Since I have added all the tests, I moved this PR out of Draft.

@kairoaraujo kairoaraujo changed the title from "WIP: TUF Initialization using python-tuf 1.0.0" to "TUF Initialization using python-tuf 1.0.0" Mar 30, 2022
@kairoaraujo kairoaraujo force-pushed the refactoring_pr_tuf_initialization branch from 4d10b79 to 9eebff6 Compare March 31, 2022 07:34
@di di mentioned this pull request Apr 11, 2022
52 tasks
@kairoaraujo kairoaraujo force-pushed the refactoring_pr_tuf_initialization branch from 9eebff6 to a8607be Compare April 11, 2022 17:17
@kairoaraujo kairoaraujo force-pushed the refactoring_pr_tuf_initialization branch from fea707d to 41e90a5 Compare May 30, 2022 11:18
@kairoaraujo kairoaraujo force-pushed the refactoring_pr_tuf_initialization branch from 41e90a5 to cbd1e41 Compare June 25, 2022 07:15
@lukpueh
Contributor

lukpueh commented Jun 25, 2022

An alternative implementation for this PR is available in kairoaraujo#1, following the design proposal outlined in #10870 (comment) above.

The goal is to reduce complexity to facilitate general code review and review of the implementation correctness with regard to PEP458.

I suggest we discuss the alternative implementation internally first (cc @kairoaraujo, @jku, @joshuagl, @mnm678) and update this PR here thereafter. That said, comments from the pypa community and general public are welcome at any time!

@kairoaraujo kairoaraujo requested a review from a team as a code owner July 21, 2022 12:51
@kairoaraujo kairoaraujo marked this pull request as draft July 21, 2022 12:54
@kairoaraujo
Contributor Author

I moved this PR back to draft to implement tests.

@kairoaraujo kairoaraujo force-pushed the refactoring_pr_tuf_initialization branch 2 times, most recently from eccef82 to 3649de9 Compare August 22, 2022 18:38
@kairoaraujo kairoaraujo marked this pull request as ready for review August 22, 2022 18:44
@kairoaraujo kairoaraujo changed the title from "TUF Initialization using python-tuf 1.0.0" to "TUF Initialization using python-tuf 2.0.0" Aug 22, 2022
@kairoaraujo kairoaraujo force-pushed the refactoring_pr_tuf_initialization branch from 3649de9 to 815ab11 Compare September 20, 2022 15:29
@ofek
Contributor

ofek commented Sep 21, 2022

Could someone trigger the CI with that button below?

@trishankatdatadog
Contributor

@kairoaraujo would you please help fix the CI / Dependencies test?

@pradyunsg
Contributor

A gentle nudge/reminder that the dependency declarations in this PR are inconsistent, which is why the CI is failing.

@kairoaraujo
Contributor Author

@pradyunsg, I see that the dependency errors are not directly related to my changes, but to google-cloud-bigquery and protobuf.
Should I bump them in my PR or open a new PR/issue?

@di
Member

di commented Sep 29, 2022

We're fixing this in #12226, nothing to do here.

@di di self-assigned this Sep 29, 2022
Kairo de Araujo and others added 13 commits September 30, 2022 08:00
Added unit tests for tuf.hash_bins

Signed-off-by: Kairo de Araujo <kdearaujo@vmware.com>
Added unit tests for warehouse.tuf.repository

Signed-off-by: Kairo de Araujo <kdearaujo@vmware.com>
Fix general linting for tests added and tuf services

Signed-off-by: Kairo de Araujo <kdearaujo@vmware.com>
As ``warehouse.tuf.repository`` uses typing, mypy found some
issues, which are now fixed.
Some test improvements: added some monkeypatching and new asserts for calls
recorded using pretend.

Signed-off-by: Kairo de Araujo <kdearaujo@vmware.com>
Fix some required parameters for running the development environment.
Fix bug on LocalKeyStorage

Signed-off-by: Kairo de Araujo <kdearaujo@vmware.com>
Remove unnecessary TargetsPayload data structure and use the
TargetFile from the Python TUF (python-tuf) Metadata API.
The TargetsPayload was used to add hashed targets. However, a similar
data structure is provided by python-tuf.

Signed-off-by: Kairo de Araujo <kdearaujo@vmware.com>
Remove the function _make_fileinfo from RepositoryService, which is not
used anywhere.

Signed-off-by: Kairo de Araujo <kdearaujo@vmware.com>
The RolesPayload was used by the TUF initialization (for development
purposes) and during the Roles Delegations.

The RolesPayload is no longer necessary during the TUF development
initialization once all the configuration is available on request
settings. During the Roles Delegations, it was replaced by the
python-TUF data structure DelegatedRole, reusing it from
``tuf.repository``.

Signed-off-by: Kairo de Araujo <kdearaujo@vmware.com>
This commit refactors how signing keys are used.
Instead of using keys from the Key Storage Service as dictionaries, it uses
them as ``securesystemslib.signer.Signer`` objects. This gives more
flexibility and uses the same data structure across the services,
repository and TUF.

Signed-off-by: Kairo de Araujo <kdearaujo@vmware.com>
Reduce complexity and lines of code by implementing all current TUF
management code inside `RepositoryService`, and thus removing one
level of abstraction, previously implemented by the
`MetadataRepository` class.

NOTE: This patch is marked WIP, as it has not removed all
references to `MetadataRepository`, nor adopted the tests.
Moreover, it still needs review in terms of correctness wrt PEP458.
But the reduced complexity should make this easier.

NOTE: (2) There is more potential for DRY code, see reoccurring
`_bump_version; _bump_expiration; _sign; _persist;` and
`_update_snapshot; _update_timestamp;` call chains. For this
iteration of the patch, I chose verbosity/explicitness over saving
a few more lines. But maybe both can be achieved.

Signed-off-by: Lukas Puehringer <lukas.puehringer@nyu.edu>
This commit fixes some implementation bugs from the last commit.

This improves the internal functions that require the role names to
work correctly once the role object has no explicit type (name),
adds the SPEC_VERSION in the service, and the JSONSerializer for
persisting the files with a better appearance.

Signed-off-by: Kairo de Araujo <kdearaujo@vmware.com>
The introduction of python-tuf 2.0.0 adds the feature of Succinct
Delegation Roles as part of TAP15
(https://github.com/theupdateframework/taps/blob/master/tap15.md)

This feature reduces the number of lines as the Hash Bins become
built-in on python-tuf.

All unit tests updated.

Signed-off-by: Kairo de Araujo <kdearaujo@vmware.com>
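
As an illustration of the hash-bin idea behind the succinct delegations mentioned in the commit above, the sketch below maps a target path to a bin name using the leading bits of the SHA-256 digest of the path, as TAP 15 describes. It uses only the standard library and is modeled on, not copied from, python-tuf; the function name `bin_for_target`, the `bins` prefix, and the chosen bit lengths are illustrative assumptions.

```python
import hashlib


def bin_for_target(target_path: str, bit_length: int, name_prefix: str = "bins") -> str:
    """Return the name of the bin role responsible for `target_path`."""
    if not 1 <= bit_length <= 32:
        raise ValueError("bit_length must be between 1 and 32")
    number_of_bins = 2 ** bit_length
    # Width of the zero-padded hexadecimal suffix, e.g. 16 bins -> 1 hex digit.
    suffix_len = len(f"{number_of_bins - 1:x}")
    digest = hashlib.sha256(target_path.encode("utf-8")).digest()
    # Keep only the first `bit_length` bits of the first 32 bits of the digest.
    bin_number = int.from_bytes(digest[:4], "big") >> (32 - bit_length)
    return f"{name_prefix}-{bin_number:0{suffix_len}x}"


example_bin = bin_for_target("simple/foo/index.html", bit_length=4)
```

Since the bin is derived from the path alone, the delegating role only needs to store the prefix and bit length rather than one explicit delegation entry per bin.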
Missing tuf.url setting in the conftest for the app_config.

Signed-off-by: Kairo de Araujo <kdearaujo@vmware.com>
@kairoaraujo kairoaraujo force-pushed the refactoring_pr_tuf_initialization branch from 8058e54 to f420476 Compare September 30, 2022 08:20
warehouse/tuf/interfaces.py (outdated, resolved)
tests/conftest.py (outdated, resolved)
Fixed typos in conftest and tuf/interfaces.

Signed-off-by: Kairo de Araujo <kdearaujo@vmware.com>
Member

@di di left a comment

Overall looks good, just some nits, although I wasn't able to run the add-all-packages command.

For next steps: I want to call out that in addition to generating target metadata w/in the upload request flow, we also need to implement a non-local KeyService (and choose what service we're going to use as the backing service for this in production).

Comment on lines +101 to +102
$(WAREHOUSE_CLI) tuf dev keypair --name targets --path /opt/warehouse/src/dev/tufkeys/targets1
$(WAREHOUSE_CLI) tuf dev keypair --name targets --path /opt/warehouse/src/dev/tufkeys/targets2
Member

Why are there two different targets generated here?

Contributor Author

I generated two different keys so that the config can use different thresholds, and so that the KeyService handles multiple keys in the development environment.

"tuf.targets.threshold": 2,

request = config.task(_init_targets_delegation).get_request()
try:
    config.task(_init_targets_delegation).run(request)
except (FileExistsError, StorageError) as err:
Member

It feels a bit weird that we're catching a StorageError from the underlying sub-dependency all the way up here. I'd expect this to be catching an exception specific to tuf right where we're invoking that library instead.
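
One way to read this suggestion is to translate the low-level error at the point where the storage layer is called, so the CLI only needs to know about a service-level exception. The sketch below uses hypothetical names throughout (`RepositoryError`, `load_metadata`, a dict as storage) and a local stand-in for `StorageError`; it is not the code in this PR.

```python
class StorageError(Exception):
    """Stand-in for the low-level storage exception."""


class RepositoryError(Exception):
    """Hypothetical service-level exception exposed to callers."""


def load_metadata(storage: dict, filename: str) -> bytes:
    try:
        return storage[filename]
    except KeyError as err:
        raise StorageError(f"Can't open {filename}") from err


def init_targets_delegation(storage: dict) -> bytes:
    """Translate the low-level error right where the storage layer is invoked."""
    try:
        return load_metadata(storage, "1.bins.json")
    except StorageError as err:
        raise RepositoryError(
            "bins metadata missing; initialize the repository first"
        ) from err


def cli_command(storage: dict) -> str:
    # The CLI layer no longer needs to import or catch StorageError.
    try:
        init_targets_delegation(storage)
    except RepositoryError as err:
        return f"error: {err}"
    return "ok"
```

Chaining with `from err` keeps the original traceback available for debugging while callers depend on a single exception type.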

@@ -358,6 +368,10 @@ def configure(settings=None):
],
)

# For development only: this artificially prolongs the expirations of any
# Warehouse-generated TUF metadata by approximately one year.
settings.setdefault("tuf.development_metadata_expiry", 31536000)
Member

This doesn't seem to be used anywhere? Also, if we want to override this for development, we should set a default value to be used in production, and then override it in the dev/environment file.

@@ -84,12 +84,12 @@ def render_simple_detail(project, request, store=False):
f"{project.normalized_name}/{content_hash}.{project.normalized_name}.html"
)

length = None
Member

It seems like we can still return the length even if we're not storing the file, correct? Any reason not to return it here? Seems like this would be undefined behavior otherwise.
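
A sketch of what returning the length unconditionally could look like is shown below. `render_detail` is a simplified, hypothetical stand-in for `render_simple_detail`: the dict-based storage, the hash construction, and the return shape are assumptions for illustration.

```python
import hashlib
from typing import Dict, Tuple


def render_detail(
    name: str, content: str, storage: Dict[str, bytes], store: bool = False
) -> Tuple[str, str, int]:
    """Return (content_hash, path, length); length is set whether or not we store."""
    data = content.encode("utf-8")
    content_hash = hashlib.blake2b(data, digest_size=32).hexdigest()
    path = f"{name}/{content_hash}.{name}.html"
    length = len(data)  # known as soon as the content is rendered
    if store:
        storage[path] = data
    return content_hash, path, length


storage: Dict[str, bytes] = {}
_, _, length_without_store = render_detail("foo", "<html></html>", storage)
_, stored_path, length_with_store = render_detail("foo", "<html></html>", storage, store=True)
```

Computing the length from the rendered bytes removes the `None` case, so callers do not have to branch on whether the file was stored.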

@@ -598,6 +599,7 @@ def includeme(config):
)
config.add_redirect("/pypi/", "/", domain=warehouse)
config.add_redirect("/packages/{path:.*}", files_url, domain=warehouse)
config.add_redirect("/metadata/{path:.*}", metadata_url, domain=warehouse)
Member

I don't think this is necessary? Why do we need a redirect?

@@ -0,0 +1,242 @@
# General TUF Warehouse implementation Notes
Member

This should probably all move into our docs.

Comment on lines +35 to +41
# NOTE: This is a deviation from PEP 458, as published: the PEP
# stipulates that bin-n metadata expires every 24 hours, which is
# both burdensome for mirrors and requires a large number of redundant
# signing operations even when the targets themselves do not change.
# An amended version of the PEP should be published, at which point
# this note can be removed.
"tuf.bin-n.expiry": 604800,
Member

Is someone working on this?


@dev.command()
@click.pass_obj
def add_all_packages(config):
Member

Running this locally, I got:

Traceback (most recent call last):
  File "/opt/warehouse/src/warehouse/tuf/services.py", line 139, in get
    file_object = open(filename, "rb")
FileNotFoundError: [Errno 2] No such file or directory: '/var/opt/warehouse/tuf_metadata/1.bins.json'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/opt/warehouse/src/warehouse/__main__.py", line 18, in <module>
    sys.exit(warehouse())
  File "/opt/warehouse/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/opt/warehouse/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/opt/warehouse/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/warehouse/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/warehouse/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/warehouse/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/warehouse/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/opt/warehouse/lib/python3.10/site-packages/click/decorators.py", line 38, in new_func
    return f(get_current_context().obj, *args, **kwargs)
  File "/opt/warehouse/src/warehouse/cli/tuf.py", line 148, in add_all_packages
    config.task(_add_hashed_targets).run(request, targets)
  File "/opt/warehouse/src/warehouse/tasks.py", line 71, in run
    result = original_run(*args, **kwargs)
  File "/opt/warehouse/src/warehouse/tuf/tasks.py", line 58, in add_hashed_targets
    repository_service.add_hashed_targets(targets)
  File "/opt/warehouse/src/warehouse/tuf/services.py", line 428, in add_hashed_targets
    bin_n = self._load(RoleType.BINS.value)
  File "/opt/warehouse/src/warehouse/tuf/services.py", line 219, in _load
    return Metadata.from_file(role_name, None, self._storage_backend)
  File "/opt/warehouse/lib/python3.10/site-packages/tuf/api/metadata.py", line 233, in from_file
    with storage_backend.get(filename) as file_obj:
  File "/usr/local/lib/python3.10/contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "/opt/warehouse/src/warehouse/tuf/services.py", line 142, in get
    raise StorageError(f"Can't open {filename}")
securesystemslib.exceptions.StorageError: Can't open /var/opt/warehouse/tuf_metadata/1.bins.json

Contributor Author

Was this during the make inittuf?

Or did you follow the sequence:

$(WAREHOUSE_CLI) tuf dev init-repo

@brainwane
Contributor

@kairoaraujo I see there are some questions from Dustin that I think you have not addressed yet; could you check? Thanks!

@kairoaraujo
Contributor Author

@kairoaraujo I see there are some questions from Dustin that I think you have not addressed yet; could you check? Thanks!

Hi @brainwane, I will continue with that.
I was following/waiting for the discussion on the PEP 458 Design Document to avoid some re-work.

@brainwane
Contributor

@kairoaraujo I believe all the open questions in that design document have now been resolved; is there anything else you're blocked on before revising this pull request? Thanks!

@kairoaraujo
Contributor Author

Hi folks
Yes, the design doc is ready, and I will continue to move PEP 458 implementation forward in the new year.

@avishayil

Hi @kairoaraujo we're really excited about this. Anywhere we can see the current progress? milestones on the implementation?

@trishankatdatadog
Contributor

Hi @kairoaraujo we're really excited about this. Anywhere we can see the current progress? milestones on the implementation?

Thanks for the interest! In fact, should the stakeholders (e.g., Lukas, Kairo, Dustin, Ofek, Donald, Marina, Sumana, etc) have a meeting to sync up?

@lukpueh
Contributor

lukpueh commented Jan 23, 2023

Thanks for the friendly nudge, @avishayil! We just posted a status update to the "PEP 458 current status and next steps" thread on Python discuss. Happy to provide more details! 🎉

@trishankatdatadog
Contributor

Should we close this PR in favour of #13943? @miketheman @di

@miketheman
Member

Superseded by #15241
