
Monorepo / Monobuild support? #936

Closed
2 tasks done
epage opened this issue Mar 5, 2019 · 61 comments
Labels
status/duplicate Duplicate issues

Comments

@epage
Contributor

epage commented Mar 5, 2019

  • I have searched the issues of this repo and believe that this is not a duplicate.
  • I have searched the documentation and believe that my question is not covered.

Feature Request

The goal is to allow a developer to make changes to multiple python packages in their repo without having to submit and update lock files along the way.

Cargo, which poetry seems to be modeled after, supports this with two features:

  • Path dependencies: you can specify where to find a dependency on disk for local development while still declaring version constraints for public consumption.
  • Workspaces: you can define a group of packages and perform operations across them; for example, cargo test --all runs tests on every package in the workspace.
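For reference, a minimal sketch of those two Cargo features (package names are illustrative):

```toml
# Root Cargo.toml: declares the workspace members.
[workspace]
members = ["core", "cli"]

# cli/Cargo.toml: a mixed path + version dependency.
# Local builds use the path; on `cargo publish` the path key is
# stripped and only the version constraint remains.
[dependencies]
core = { path = "../core", version = "1.0" }
```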

It looks like some monobuild tools exist for python (buck, pants, bazel) but

  • Some don't support all major platforms
  • They assume you are not using PyPI and do not publish to PyPI, and expect all dependencies to be vendored into SCM.
@floer32

floer32 commented Mar 14, 2019

[I'm just somebody driving by]

I've had to deal with the monorepo (anti)pattern1️⃣ and I agree it'd be very helpful to have support for dealing with and packaging it; but I've also learned it's really hard to do good things for monorepos without compromising the normal patterns.

To me it sounds like this could be hard to get into Poetry, at least right from the start. It's easier to imagine a separate library: someone chooses an option like buck/pants/bazel and then creates a helper to make that work well with Poetry, and vice versa.

1️⃣: it's not always an antipattern, I know. But too often it is, and many best practices get abandoned. That can make it hard to develop monorepo-related features without specific good examples that are targeted for support. TL;DR: it could be good to link to an OSS example (or contrive one and link to that).

@epage
Contributor Author

epage commented Mar 14, 2019

I understand. I hate the religious view taken towards monorepos. I have found what I call mini-monorepos to be useful, small repos that serve a single purpose.

For example, in the Rust world, I have a package for generic testing of conditions, called predicates. I've split it into 3 different packages, predicates-core to define the interfaces, predicates, and predicates-tree for rendering predicate failures in a way similar to pytest. I did these splits so that (1) people aren't forced into dependencies they don't need and (2) Rust is thankfully strict on semver and so the split also represents a splitting of compatibility guarantees. It is more important to provide a stable API for vocab terms (predicates-core) than for implementations that aren't generally passed around.

I specifically suggested continuing to follow Cargo's model, as poetry has done in other ways, for monobuild support, rather than getting into the more complex requirements of tools like buck/pants/bazel (as fast as possible, vendor-all-the-deps, etc.). If someone needs what those tools offer, they should probably just use those tools instead. From at least my brief look, it seemed like they don't do a good job of interoperating with other Python build / dependency management systems.

@davidroeca

Also would love support here! Specifically for developing a library that has different components each with their own set of potentially bulky dependencies (e.g. core, server type 1, server type 2, client, etc.). Also helpful when trying to expose the same interface that supports different backends (similarly, you don't want to install every backend, just the one you want).

The only OSS library I could find that emulates this approach is toga -- it would be great if poetry could handle dependency resolution for these sorts of libraries.

The toga quickstart explains how the dependencies are managed with a bunch of setup.py files.

@NGaffney

NGaffney commented Apr 3, 2019

I'm interested in this also. How far does the current implementation of editable installs get you towards your use case?

@epage
Contributor Author

epage commented Apr 3, 2019

I'm interested in this also. How far does the current implementation of editable installs get you towards your use case?

Use cases

  • I can make changes across packages in the same commit
  • My CI can publish each package as a wheel
    • Requires using local dependencies for the build while publishing the dependency with a version
  • I can validate a change in the bottom of my stack does not break things higher in my stack

Rust can solve this at two levels

  • You can mix path and version dependencies. The path dependency wins for local builds but that is stripped when published and only the version dependency is used.
  • A workspace allows running commands (build, test) across multiple packages. I think this might also implicitly set path dependencies, I'm unsure.

So regarding the first point, editable installs might cover this if you can mix path dependencies with version dependencies, which is the key feature needed to at least partially handle my use cases.

@NGaffney

NGaffney commented Apr 4, 2019

I tried some of this out today and it looks like the first feature you describe is nearly, but not quite supported.

You can declare a dependency both as a dev and a non-dev dependency; when you do this, the dev version takes precedence (at least that's how I interpret this), allowing a package to be installed as editable during local development. Then, for the final build, the --no-dev flag removes the dev version from consideration.

I've made it work with a toy example which contains a path dependency in dev and non-dev mode but not for a local vs pypi dependency.
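A hedged sketch of what that toy pyproject.toml might contain (the package name is hypothetical, and this uses the pre-1.2 dev-dependencies table; I haven't verified it across Poetry versions):

```toml
[tool.poetry.dependencies]
python = "^3.7"
mylib = "^1.0"  # version constraint used for the final (--no-dev) build

[tool.poetry.dev-dependencies]
# Editable path dependency; as described above, this takes precedence locally.
mylib = { path = "../mylib", develop = true }
```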

@guillaumep

guillaumep commented May 20, 2019

I am currently trying to find a way to build a Python monorepo without using heavy-duty tools like Bazel.

I have seen a repository which uses yarn and lerna from the JavaScript world to build Python packages:

https://github.com/pymedphys/pymedphys

Using yarn you can declare workspaces, and using lerna you can execute commands in all workspaces. You can then use the script section of package.json in order to run Python build commands.

I've yet to try this technique with Poetry (and the pymedphys repository does not use it), however I feel it might be worth exploring.
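For illustration, the root package.json in such a yarn-workspaces + lerna setup might look roughly like this (folder layout and script names are hypothetical):

```json
{
  "private": true,
  "workspaces": ["packages/*"],
  "devDependencies": { "lerna": "^3.0.0" },
  "scripts": {
    "build": "lerna run build",
    "test": "lerna run test"
  }
}
```

Each package's own package.json would then define scripts like "build": "poetry build", so lerna fans the command out to every workspace.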

@stale

stale bot commented Nov 13, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Nov 13, 2019
@davidroeca

I still think this could be useful

@stale stale bot removed the stale label Nov 13, 2019
@daveisfera

yarn workspaces makes it possible to have a set of shared libs that are used across multiple apps in a monorepo, so I think it's a good model to follow, or at least borrow ideas from

@remram44
Contributor

remram44 commented Jan 17, 2020

Something I am running into right now is that using Poetry with path dependencies is very unfriendly to the Docker cache.

To correctly leverage the cache (read: don't install all dependencies from scratch on every build) I would want to install the dependencies first, and then copy my own code over and install it. However Poetry refuses to do anything (even poetry export -f requirements.txt) if my code has not already been copied in.

I am considering writing my own parser for the poetry.lock file, to pip install my dependencies from there before I copy my code over.

One of two things could help me:

  • Actually have some support in Poetry for this (if it knew my path = ... dependencies were part of the workspace, some command could install the other dependencies while ignoring those)
  • Just have poetry export only need pyproject.toml and poetry.lock, and work even if the code is not there (I'm not sure why it needs it right now)

@finswimmer finswimmer added the kind/feature Feature requests/implementations label Jan 18, 2020
@kbakk
Contributor

kbakk commented Mar 7, 2020

It will be interesting to see how #1993 will be solved - it does imply the Poetry repo becoming a kind of monorepo, right? Edit: Err, sorry, I just assumed it would be done in the current repo, but it seems to not follow a monorepo structure (python-poetry/core)

@TheButlah
Contributor

TheButlah commented Sep 16, 2020

Any progress on this feature request? I've been using poetry for about a year now inside a monorepo in a research lab. We have a single global pyproject.toml that dictates the global set of dependencies since there isn't a way to have multiple separate pyproject.toml files (like cargo workspaces).

Needless to say, this creates a lot of pain, because we continually experience problems when one developer wants a dependency that is incompatible with another developer's code. As a result, most of the researchers go off the grid and don't use poetry at all, instead having their own venvs that only work on their machines.

@abn
Member

abn commented Sep 16, 2020

Related: #2270

@TheButlah this is something I wish to pick up sometime this year. Would be great to make sure the use case is listed in the issue.

@KoltesDigital

KoltesDigital commented Nov 26, 2020

I've quickly made a prototype for handling monorepos, with success. In fact it nearly works out of the box already! Are you interested if I make a PR?

I put here some unordered details and thoughts. I don't guarantee it'd resolve all cases, but at least it's fine for my needs. I'm basing this workflow on the one I use for JS/TS projects using yarn and lerna.

Repo structure:

./
 |- .git/
 |- packages/
 |   |- foo/
 |   |   |- sources (foo.py or foo/__init__.py, with tests...)
 |   |   |- pyproject.toml
 |   |- bar/
 |   |   |- sources (...)
 |   |   |- pyproject.toml
 |- pyproject.toml

The idea is to have a private/virtual pyproject.toml at the root. Each package has its own pyproject.toml specifying the package name, version, and dependencies, as usual. I could have created one virtualenv per package, but I find that too cumbersome (I'm using VS Code and it would be very annoying to switch virtualenvs every time I change the file I'm working on), so I created a global virtualenv; that's the purpose of the root pyproject.toml.

This file contains all the project packages as its sole dependencies (using the format foo = { path = "./packages/foo", develop = true }). That way, poetry resolves the dependencies of the packages. If bar depends on foo, packages/bar/pyproject.toml shall declare a regular version constraint (foo = "^1.2.3"). The root pyproject.toml has the dev dependencies (black, pytest...) since they are used for all packages; however, the packages may have some of their own as well (see the CI/CD notes below).
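A sketch of what the root pyproject.toml might look like in that layout (the Python constraint and tool versions are placeholders):

```toml
[tool.poetry]
name = "root"
version = "0.0.0"
description = "Virtual root project for the monorepo"

[tool.poetry.dependencies]
python = "^3.9"
# The workspace packages, installed editable into the shared virtualenv.
foo = { path = "./packages/foo", develop = true }
bar = { path = "./packages/bar", develop = true }

[tool.poetry.dev-dependencies]
# Shared dev tooling for all packages.
black = "*"
pytest = "*"
```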

There is often a need to execute a command in every package directory. For instance, and this already works, running poetry build or poetry publish from packages/foo successfully builds or publishes the package. For these two use cases I'm proposing dedicated commands, poetry packages build/publish, but a generic poetry packages exec -- args... would allow for doing anything else.

On the CICD server, it makes sense to create one virtualenv per package, at least for one thing: checking that each package only imports other packages that are declared as its dependencies. In that sense, it may make sense for packages to have their own dev dependencies, e.g. if they have some specific tests which are not shared with others.

I'm used to delegating version bumping to the CI/CD pipeline: every merged PR automatically bumps versions and publishes packages, based on what has changed (using the conventional commits spec). Dependent packages need to be bumped as well: if bar depends on foo and foo has changes of any kind, bar still needs a patch bump even if bar itself hasn't changed. Actually, this is only true if bar depends on the current version of foo: if an earlier version is specified, the dependency constraint is not updated.

Lerna takes care of git adding the changed files (package.json and CHANGELOG.md for every changed package) and can even git push... but thereafter, one may want to customize the commit message, so there's an option for that... well I think it's too much, the CICD pipeline can do it as well. I'd be happy if poetry could just bump versions, append messages in changelogs, and after that I can git add --all and git commit myself.

For publishing, lerna can retrieve the package versions from the registries and publish only the new ones. Here in Python this is tedious, since PyPI (warehouse) and pypiserver do not have a common route to get version info. My hack for now is just to publish everything and ignore 409 Conflict errors.

Caveats: packages shall not declare conflicting dependencies if there's only one global virtualenv.

Proposition

Again, what I'm proposing here addresses my needs, but this feature should fit other use cases as well, so please give feedback about what you would need for your own workflow.

I prefer not to use "monorepo" in the command names, as the term is too prescriptive.

New commands:

  • poetry packages bump - with the logic described above.
  • poetry packages clean - not sure about this one. In my prototype it removes dist and __pycache__ in each package, but one may have other files to clean as well, so I'm hesitating: should the list of files to be cleaned be customizable, and how? Or should the developers rely on poetry packages exec -- rm ...?
  • poetry packages exec -- ...
  • poetry packages new <name>
  • poetry packages publish
  • poetry packages show
  • poetry packages show dependencies - not sure about the actual command syntax, but the need is to show the packages with their dependencies, with selectable format. FTR I made this package and the dot format could be nice to have built-in too.
  • poetry package <name> add / remove ... - manages dependencies for each package.

New pyproject.toml section:

[tool.poetry.packages]
# Used to specify where to find packages.
paths = ["packages/*"]
# Optional flag to let each package have its own version.
# Without this flag, all packages get the same version.
version = "independent"

@maneetgoyal

maneetgoyal commented Dec 6, 2020

I am currently trying to find a way to build a Python monorepo without using heavy-duty tools like Bazel.

I have seen a repository which uses yarn and lerna from the JavaScript world to build Python packages:

https://github.com/pymedphys/pymedphys

Using yarn you can declare workspaces, and using lerna you can execute commands in all workspaces. You can then use the script section of package.json in order to run Python build commands.

I've yet to try this technique with Poetry (and the pymedphys repository does not use it), however I feel it might be worth exploring.

Yes, indeed, pymedphys was using the yarn scripts section at some point (until their v0.11.x), but they then restructured their repo and migrated to using poetry only. I'm trying to dig into their justification for this move.


Edit: Some hints in pymedphys/pymedphys#192 (comment). Have requested the author to shed some more light though.


Edit 2: Thanks to SimonBiggs. pymedphys/pymedphys#192 (comment)

@hpgmiskin

@KoltesDigital I would really like to try out the modifications you have made to allow poetry to manage the dependencies for a monorepo.

Might you be able to share a branch to see these changes? You might have already shared but I just could not find it. Thanks!

@KoltesDigital

@hpgmiskin thanks for your interest! My prototype is more of a workflow PoC; I actually haven't changed poetry yet, I just added some scripts on top of it. These scripts are in JS/TS, and of course the final solution should use only Python. Moreover, I haven't implemented my whole proposal yet, and to do so I'll have to modify poetry.

So before doing this work, I want to first get some feedback from contributors, in order to know whether they're OK with the direction I'm heading in.

@remram44
Contributor

What about having Poetry correctly update its lock file when a sub-pyproject.toml file changes, without having to run poetry lock from scratch? Or having all dependencies be collected in a single top-level poetry.lock (and possibly a single virtualenv), à la Cargo? Or installing dependencies without the code being there, for Docker cache and build system friendliness?

Those commands are nice, but I'm unlikely to use them, and I'd rather see the monorepo use-case be properly supported as a first step, and the opinionated utility commands (and CI integration) added as a second step.

@davidroeca

@remram44 that alone would be a big step forward

@KoltesDigital

@remram44 it's actually what happens with the root-level pyproject.toml: it creates a single virtual environment and leads to a single root-level poetry.lock. It's indeed the same as Cargo workspaces and Yarn workspaces. And this already works without any of my additions; you can try the repo structure I described.

I also believe that poetry should not reinvent the wheel, and should leave to CI/CD the things it does best. That's why I mentioned that I prefer invoking git myself (i.e. the CI/CD takes care of that), because every CI/CD pipeline is different. But IMHO bumping the versions of the subpackages is something everybody will need, which is why I propose making this part of poetry, or alternatively a plugin for poetry. After that, users can version, publish, etc. the way they want.

@remram44
Contributor

If I run poetry add in a subdirectory, it will generate a poetry.lock there instead of updating the root-level one. Same with running poetry run or poetry shell in a subdirectory. This is different from what workspace-aware package managers do.

@KoltesDigital

@remram44 exactly, that's why I've proposed new root-level commands poetry package <name> add/remove, which mimics Yarn or Cargo CLI features.

My experience with monorepos is that the users should not run commands from subdirectories. If one were to add from a subdirectory with either Yarn or Cargo, this would create undesired lock files too. And this would be reasonable. I don't expect a project manager tool to find out if the current project is actually part of a larger monorepo. Monorepo settings define which subdirectories are to be considered as subpackages (workspaces in package.json, Cargo.toml, and in my proposition), not the other way around.

@remram44
Contributor

Cargo gets this right, I don't know why you say this is unreasonable. The shortcomings of some tools are no argument to ignore the behavior of working tools.

@KoltesDigital

@remram44 interesting, I wasn't aware of that feature of Cargo. But well, we have different views about what monorepo support should look like. Our two views are different, but not exclusive, so let's have both.

@gerbenoostra

@alecandido Multiple PyPI (distribution) packages in the same project would indeed be a nice addition. However, I see that as an extension of having packages refer to each other. Projects need to be able to refer to each other; on top of that, one could add a container project that distributes all those subprojects in one go.

The downsides of distributing a single package, mainly around dependencies & metadata, prevent me from moving to a monolithic package. For now I resort to independently built packages, but that comes with all the typical downsides of the multi-repo approach (MRs on multiple repos, needing to update versions in each one every time).

Just having independent projects referring to each other (thus in my example, having both a src/pkg1/pyproject.toml and a src/pkg2/pyproject.toml) would already help.

@gerbenoostra

I see two different aspects within the poetry monorepo:

  • the possibility to have one parent poetry project that contains/refers to two child poetry projects. This would allow sharing configuration across multiple projects, and allow building them all with one poetry command. Thus a root pyproject.toml and multiple package-i/pyproject.toml files.
  • the possibility to have multiple releasable poetry projects in one git repo, where they depend on each other through a path dependency, while still allowing them to be built as wheel & sdist with a version dependency. Thus a package-b/pyproject.toml that depends on package-a/pyproject.toml via path = "../package-a". However, poetry build should replace that path dependency with a version, allowing the individual packages to be pip installed.

To workaround the missing poetry functionality regarding the second point, I've created an example repo at https://gitlab.com/gerbenoostra/poetry-monorepo .

It implements two different approaches:

  • Before running poetry build, apply sed to the pyproject.toml to replace the path=.. dependency by the compatible version range.
  • Run poetry build first, then apply sed to the metadata files within the sdist & wheel files to replace the path dependencies by the compatible version range.
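The first approach might look like this (the dependency name, version range, and paths are placeholders, not the exact commands from the repo):

```shell
# Before `poetry build`: replace the path dependency on package-a in
# packages/package-b/pyproject.toml with its compatible version range.
# -i.bak keeps a backup and works with both GNU and BSD sed.
sed -i.bak -E \
  's|^package-a = [{] path = .*|package-a = "^1.2.3"|' \
  packages/package-b/pyproject.toml
```

The second approach is the same idea applied after the build, to the metadata files inside the generated sdist and wheel archives.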

Any feedback and/or suggestions would be useful.

@johnbendi

I think when considering a monorepo strategy there are quite several issues to consider including dependency management and the delineation between components/libraries and projects. This blog post gets really close to my understanding of how a monorepo should be structured and maybe the discussion can build upon the work done there:
Opendoor Labs Monorepo strategy

@michaeloliverx

I think pnpm from the JavaScript ecosystem really got this right and many popular open source projects adopted it. It might be worth a look for some inspiration.

@gerbenoostra

gerbenoostra commented Sep 1, 2022 via email

@DavidVujic

DavidVujic commented Sep 1, 2022

The code in this PR would enable Workspace support in Poetry.
We would then be able to have Python code in a monorepo, containing projects - each one defined by a pyproject.toml - and reuse shared Python packages. The shared packages are allowed to live in a folder structure relative to the project (rather than having to be a subfolder of the pyproject.toml file that refers to them).

The architecture I would recommend for monorepos (if this feature would be enabled) is called Polylith. I have linked to a post describing it further up in this thread.

@neersighted
Member

Going to fold this into #2270 as we have two parallel issues -- feel free to continue the discussion here, but I'd like to formally track this category of feature in one place.

@neersighted neersighted closed this as not planned (duplicate) Oct 4, 2022
@neersighted neersighted added status/duplicate Duplicate issues and removed kind/feature Feature requests/implementations labels Oct 4, 2022
@gerbenoostra

Above I mentioned that I tried to work around poetry's limitations and still use poetry in a monorepo. I've finally taken some time to improve the repo, and have also written a blog post to explain the approach. Perhaps these utility scripts are beneficial.
Perhaps these utility scripts are beneficial.
Blogpost: https://gerben-oostra.medium.com/python-poetry-mono-repo-without-limitations-dd63b47dc6b8
Example repo: https://gitlab.com/gerbenoostra/poetry-monorepo/

@kapilt

kapilt commented Mar 6, 2023

@gerbenoostra fwiw I went ahead with a plugin to do the alternate version mentioned in your blog post (modify the wheels from lock files as a post-build / pre-deploy step). It's got some additional primary goals (freezing versions in wheels): https://github.com/cloud-custodian/poetry-plugin-freeze

@DavidVujic

As another alternative, there's a plugin that handles the wheel & sdist packaging by doing work just before the poetry build command kicks in. It's a plugin called Multiproject, that is the foundation for making architectures like Polylith possible in Python. I'm an advocate for that type of architecture 😄

If you want to know more, here's the docs (Multiproject is of course referenced in there): https://davidvujic.github.io/python-polylith-docs/

@gerbenoostra

@gerbenoostra fwiw I went ahead with a plugin to do the alternate version mentioned in your blog post (modify the wheels from lock files as a post build / pre deploy step). its got some additional primary goals (freezing versions in wheels) https://github.com/cloud-custodian/poetry-plugin-freeze

A really nice, to-the-point solution without other assumptions.
As you describe, it solves the application-building side (pinning all versions), not the library side (allowing version ranges).
But as monorepos are typically (though perhaps not always) about creating applications rather than externally shared libraries, this probably works most of the time.

@gerbenoostra

As another alternative, there's a plugin that handles the wheel & sdist packaging by doing work just before the poetry build command kicks in. It's a plugin called Multiproject, that is the foundation for making architectures like Polylith possible in Python. I'm an advocate for that type of architecture 😄

If you want to know more, here's the docs (Multiproject is of course referenced in there): https://davidvujic.github.io/python-polylith-docs/

Nice to see multiple initiatives here. I like the single command here: just build it, and one gets a full library.
Note that the poetry build-project command will:

  • copy the actual project into a temporary folder.
  • collect relative includes - such as include = "foo/bar", from = "../../shared" - and copy them into the temporary folder.
  • generate a new pyproject.toml.
  • run the poetry build command in the temporary folder.
  • copy the built dist folder (containing the wheel and sdist) into the actual project folder.
  • remove the temporary folder.

One side note: be aware that this actually builds a copy of direct path dependencies into the wheel, instead of depending on a package name with a version. With transitive dependencies, or when reusing libraries, one needs to work around that, as you describe.


This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 29, 2024