Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple distgit new packit api collection object #1838

Conversation

majamassarini
Copy link
Member

DO NOT MERGE.

This is just a SPIKE PR to propose multiple solutions on how to refactor the packit_api object to support monorepos.

related to #1618

@softwarefactory-project-zuul
Copy link
Contributor

Build succeeded.

✔️ pre-commit SUCCESS in 1m 52s
✔️ packit-tests-rpm SUCCESS in 12m 40s
✔️ packit-tests-pip-deps SUCCESS in 13m 56s
✔️ packit-tests-git-main SUCCESS in 13m 15s
✔️ packit-tests-pip-deps-sess-rec SUCCESS in 2m 50s
✔️ packit-tests-git-main-sess-rec SUCCESS in 3m 06s
✔️ reverse-dep-packit-service-tests SUCCESS in 3m 01s

@@ -82,52 +83,62 @@ def koji(
PATH_OR_URL argument is a local path or a URL to the upstream git repository,
it defaults to the current working directory
"""
api = get_packit_api(
collection_api = get_packit_api(
config=config, dist_git_path=dist_git_path, local_project=path_or_url
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My original idea was to turn dist_git_path into a Union[str, list[str]], save this internally in a dist_gits: list[str] attribute (first creating the list if only a str has been provided), and refactor the current dg attribute to be a property returning the only path in the list, or raising an "not supported" exception if there are more dist-git paths.

The above change would be backwards compatible for all current usage. The work to enable using multiple dist-git paths would be done in follow up issues, updating Packit API endpoints and CLI commands one-by-one, and adjusting packit-service, if needed.

Could something like the above be implemented?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I started working on this I was also thinking of changing PackitAPI internally but then I saw this code

remote_hostname in package_config.dist_git_base_url

It seems to me that the relationship between dg object and lp_upstream object in PackitAPI is decided outside the PackitAPI class and for this reason I changed PackitAPI externally, creating a collection. If we can get rid of that line, I think I can do what you described above but otherwise I think it is not that easy.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the case. The block of code you are referring to figures out whether Packit's working directory is an upstream Git repo or a downstream dist-git repo, and sets lp_upstream and lp_downstream accordingly (which later on are going to be used to construct api.dg and api.up.)

This should be probably updated so that when local_project is a dist-git repo, and there are multiple packages configured, PackitAPI is created so that it's limited to work only on the package corresponding to the dist-git repo.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, tomorrow I will try to change that code and I will try to create a list of dg inside the PackitAPI object.

Copy link
Member

@lachmanfrantisek lachmanfrantisek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I like about this approach:

  • Amount of code changes.
    • (Read: very little needs to be changed -- just an extra for loop + new CLI argument/parameter for each subcommand and the config conversion that can be shared across the subcommands.)
  • Being able to implement support step by step without any huge change.
  • As Maja mentioned to me yesterday, this approach will be easier to use in the service.
    • We can define multiple jobs when parsing config and not need to change handlers.
    • I originally thought that it would be beneficial to have info about all the packages together inside a handler, but I've changed my mind after yesterday's discussion with Maja -- it's probably better to let the jobs be handled independently and live their own life. We don't need to mix PRs or Koji builds for different packages. The only question is Bodhi where we might want to have one Bodhi update being created for all, but that's probably still solvable.
  • We can easily specify what package we work with (or with all) on top of the core functionality.
  • No new code is added to API class. (It's already complicated and adding a new level/code would not help.)

Comment on lines +112 to +113
except AttributeError as e:
if "there is more than one package in the config" in str(e):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope we can avoid parsing exceptions when working with our own objects.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, if we choose to go this way I will improve it :)

@@ -82,52 +83,62 @@ def koji(
PATH_OR_URL argument is a local path or a URL to the upstream git repository,
it defaults to the current working directory
"""
api = get_packit_api(
collection_api = get_packit_api(
config=config, dist_git_path=dist_git_path, local_project=path_or_url
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the case. The block of code you are referring to figures out whether Packit's working directory is an upstream Git repo or a downstream dist-git repo, and sets lp_upstream and lp_downstream accordingly (which later on are going to be used to construct api.dg and api.up.)

This should be probably updated so that when local_project is a dist-git repo, and there are multiple packages configured, PackitAPI is created so that it's limited to work only on the package corresponding to the dist-git repo.


logger.debug(f"job_config_index: {job_config_index}")
if job_config_index is not None and isinstance(package_config, PackageConfig):
if job_config_index >= len(package_config.jobs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me, that this won't work when there are multiple packages defined: a package_config taken from package_config.packages is going to be a CommonPackageConfig and won't have a jobs attribute.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, I missed it. But I think we can create a new object, let me call it PackageConfigView which has all the jobs and attributes used by a package and taken from a MultiplePackages config.

I made the same mistake also in the new PR: #1840, but I think it easy to fix.

@csomh
Copy link
Contributor

csomh commented Jan 25, 2023

The reason I don't like this approach is that it leaks functionality (breaking up the configuration into individual package configs) related to handling monorepos from PackitAPI to get_packit_api(), and does it in a way which is not reusable.

I would find an implementation, where handling monorepos (including interpreting the configuration) is enclosed in PackitAPI objects (that is: a single API object holds references to the upstream repo and all the downstream repositories) easier to reason about than having an API object for each package. Of course PackitAPI methods (even __init__(), if that's required) should receive an argument to be used to limit their scope to selected packages, if so desired.

I don't see why such an approach wouldn't have the same benefits that @lachmanfrantisek mentioned above (including creating one task per package).

That said, I still can see the approach implemented in this PR to work (although my gut feeling is that it'll require more extensive code changes in Packit Service), but then there should be a mechanism (utility function or method) to reduce a PackageConfig object to include a single package configuration (including the filtering and transformation of the relevant jobs).

@lachmanfrantisek
Copy link
Member

For completeness, I've suggested Maja try to provide multiple approaches and write down the cons/pros for each so we can objectively compare those. And also, when trying to work on that, Maja will see what is the easiest one to do. (I have some feelings but I might easily be wrong since I haven't tried to implement that for real.)

Regarding dist_git_base_url, I don't think it's the best option to start with since it sounds more like a config option of one package than a true monorepo -- e.g. similar to a dist-git branch. But I am not saying we can't implement that after all, just that I would not be too focused on this use case for now.

@majamassarini
Copy link
Member Author

Today I have worked on this possible version of refactoring #1840.

Tomorrow, as I said, I will try to change PackitAPI internally. And then I will let you know which is my favourite implementation and why :D

@majamassarini
Copy link
Member Author

I close this PR, it was just an example for supporting the SPIKE on how to handle packit refactoring for monorepo.

Related research could be found here: https://github.com/packit/research/tree/main/monorepo-support/refactoring

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants