Skip to content
This repository has been archived by the owner on Feb 12, 2024. It is now read-only.

Repos, repos, repos #2446

Closed
achingbrain opened this issue Sep 11, 2019 · 32 comments
Closed

Repos, repos, repos #2446

achingbrain opened this issue Sep 11, 2019 · 32 comments

Comments

@achingbrain
Copy link
Member

Making changes in js-ipfs typically involves changing this repo, the ipfs-http-client, interface-js-ipfs-core and also ipfds-ctl (because of it's dependency on ipfs-http-client.

We have these as separate modules but they are really modular in name only due to the cross-dependencies indicated above.

This has had a number of consequences:

  1. Developers have to coordinate PRs across all of these repos which is painful for experienced contributors and completely bewildering for new contributors
  2. It's impossible to have one PR that encapsulates a changeset - this makes it hard to test, hard to review, hard to integrate, hard to release and hard to roll back
  3. There is no one build that says "Yes, this change is ok" which results in a lot of manual work and makes automation harder because you have to juggle multiple versions of multiple modules
  4. There is a large amount of duplication across these repos, from tests to utility methods (js-ipfs-utils is helping with the utility methods but is also compounding the problem).
  5. Coordinating shared dependency updates across these modules becomes a Byzantine task (e.g. see how out of date libp2p, ipld, are at the moment etc)

Since this project does not include the libp2p and IPLD codebases it should be a relatively thin layer on top of them, yet development is very complex.

We've started talking about re-integrating some core modules in #2222 but we should go further than that.

⚒️ Break the ipfsd-ctl/ipfs-http-client dependency

Possible solutions:

a. Make ipfs-http-client a peer dependency of ipfsd-ctl
b. Require an implementation to be passed in the same way as with in-proc js-ipfs daemons

💍 One PR to rule them all

Combine the repos. One way to do this might be adding the client/ipfsd to core:

const IPFS = require('ipfs')
const IPFSClient = require('ipfs/client')

...

✅ Pro: Simple to use
✅ Pro: Works with existing unspecified tooling that assumes one-module-per-repo
❌ Con: If you only consume the client you'll end up installing lots of dependencies you don't need
❌ Con: Separate PRs to interface-js-ipfs-core still required unless you add that in too
❌ Con: Poor separation of concerns
❌ Con: All-or-nothing releases

Another way might be a lerna-style mono-repo a la babel:

$ ls -l
js-ipfs/
├── package.json
├── lerna.json
└── packages/
    ├── ipfs
    ├── ipfsd-ctl
    ├── ipfs-http-client
    └── interface-js-ipfs-core

✅ Pro: One change, one PR
✅ Pro: Existing dependant code does not break because module names do not change
✅ Pro: Still able to release individual modules independently
✅ Pro: Easier to make, review and release changes
✅ Pro: Easier to contribute to
✅ Pro: The tooling figures out cross-package dependencies, no more npm link required
✅ Pro: Clever enough to figure out which package depends on which and only test/release what's changed
❌ Con: In the worst case tests take as long as the slowest module
❌ Con: Cannot group GH issues by package
❌ Con: Tools that assume one-module-per-repo will need fixing
❌ Con: When adding new dependencies, it's easy to forget to add them to the right package due to dependency hoisting

cc @ipfs/wg-js-core and @ipfs/repos-javascript

@hugomrdias
Copy link
Member

hugomrdias commented Sep 11, 2019

Con: When adding new dependencies, it's easy to forget to add them to the right package due to dependency hoisting

Aegir to the rescue aegir deps running in CI

@vasco-santos
Copy link
Member

❌ Con: Cannot group GH issues by package

I also see this as an opportunity to start using labels instead. IMHO, one single repo with issues, which we can filter through tags is more efficient to maintain

@achingbrain
Copy link
Member Author

one single repo with issues, which we can filter through tags is more efficient to maintain

I can see that both ways, and it's been a sticking point for adoption in the past, but I do notice that js-ipfs is the gateway for most issues/bug reports for IPFS/IPLD/libp2p so part of maintaining it would be to triage incoming issues and move them to the appropriate repos (e.g. libp2p/js-libp2p, ipld/js-ipld, etc) so it's not like it'd be just one repo with hundreds of issues.

@daviddias
Copy link
Member

daviddias commented Sep 12, 2019

On:

image

You can pass a http-api-client today as an argument:

IpfsClient - A custom IPFS API constructor to use instead of the packaged one

https://github.com/ipfs/js-ipfsd-ctl/#ipfsfactory---const-f--ipfsfactorycreateoptions

So it should just be a question of upgrading the tests to do it

@achingbrain
Copy link
Member Author

N.b 'you can pass' is not the same as 'require an implementation to be passed'. Sometimes you've got to be cruel to be kind.

@daviddias
Copy link
Member

How does it differ for the goal you described?

@achingbrain
Copy link
Member Author

We'd remove js-ipfs-http-client as a dependency and throw an error if an implementation wasn't passed. The developer would be faced with a failing test that's at least easy to fix and we wouldn't see as much churn in js-ipfsd-ctl releases.

@daviddias
Copy link
Member

@achingbrain what about the 260 repos using it -- https://github.com/ipfs/js-ipfsd-ctl/network/dependents?package_id=UGFja2FnZS0xNDQ1MDU1Mw%3D%3D -- plus the ones in the future? Is it worth adding one more step for them in trade for us having a angry test script shouting so that we don't forget a good practice? I'm sure we can do this in a better way.

@achingbrain
Copy link
Member Author

Most of those repos seem to use ipfsd-ctl as a convenience to start an IPFS node in one step, which I guess is more of a reflection on our API than anything else.

This is (as I see it) the least contentious part of this conversation, shall we take it to a dedicated issue?

@achingbrain
Copy link
Member Author

An example of a contributor who is confused by our module setup: ipfs-inactive/js-ipfs-http-client#1034

If this were one repo with one PR-per-change, he would find it easier to contribute.

@achingbrain
Copy link
Member Author

achingbrain commented Sep 23, 2019

You can pass a http-api-client today as an argument:

Just noticed this isn't even the case for all IPFSd types (at the time of writing) - Client and Daemon types support IpfsClient, for InProc the opt is called IpfsApi. 🤦‍♂

@achingbrain
Copy link
Member Author

An example of the problem: right now there are changes merged into interface-js-ipfs-core master to support a PR to js-ipfs - they require the PR to be merged for the tests to run successfully.

Other PRs to js-ipfs that need changes made to interface-js-ipfs-core are now blocked by the first PR.

One solution is to have the deps in your js-ipfs PR point to branches but this just delays the problem, as someone still has to sit down and merge & release all the dep branches - if problems occur during this process the other PRs become blocked again.

@daviddias
Copy link
Member

daviddias commented Sep 28, 2019

On #2446 (comment)

Just noticed this isn't even the case for all IPFSd types (at the time of writing) - Client and Daemon types support IpfsClient, for InProc the opt is called IpfsApi. 🤦‍♂

@achingbrain that seems to be a lapse. A JS IPFS in process node inside the browser has no HTTP API and therefore needs no IPFSApi. As you can see, it doesn't even get used https://github.com/ipfs/js-ipfsd-ctl/blob/master/src/factory-in-proc.js#L87

PR: ipfs/js-ipfsd-ctl#384

@achingbrain
Copy link
Member Author

I suppose we could spin this round and ask if anyone thinks it actually helps developer productivity and new contributor onboarding to have to submit/review/update (potentially) three or four PRs to fix an issue or add a new feature?

@daviddias
Copy link
Member

@achingbrain on your study, make sure to not fall into the trap of running a uni-variate analysis. Avoid it by including the multiple requirements, goals and why we have the current set up that we have today, how that affects the multiple types of users and contributors and how important each is.

Note that I don't suggest this with the intent of adding a block on your path, quite the contrary, more of a stair. This is a discussion that has been going on for years and it was never carefully reviewed beyond some venting here and there. Personally, the option to get the http-client and js-ipfs in the same repo is growing on me, specifically because then the interface tests are all in one place. That said, I don't think lerna is the way to go.

@hugomrdias
Copy link
Member

Which alternatives to lerna do you think are the way to go?

Copy link
Contributor

As a relative newcomer and advocate for the community, just confirming that this feels like a really big hurdle to contributions. I wasn't involved in the history that got us to this state, so I'll leave it others to define and evaluate the the trade-offs, but a huge +1 from me from the perspective of supporting new contributors.

@AuHau
Copy link
Member

AuHau commented Nov 29, 2019

Another sub-con of "Tools that assume one-module-per-repo will need fixing" that I have stumbled upon recently is that you are not able to consume forked/in-process packages using git+https as npm expects one-module-per-repo.

It might not be such a problem as the proposed packages to be merged are "on top of the chain", yet still it is a potential inconvenience.

@achingbrain
Copy link
Member Author

achingbrain commented Dec 5, 2019

As an example of the complexity we have wrought, here's a graph of (maybe) all the PRs required to land UnixFSv1.5 metadata (the ones with #? after haven't had PRs opened yet but will need them).

That's adding two fields to the unixfs data type, some flags to ipfs add and some mfs commands to manipulate them.

The arrow shows a 'depends on' relationship for each PR.

image

@daviddias
Copy link
Member

daviddias commented Dec 5, 2019

image

Don't feel shy of coalescing all the modules indicated by the yellow arrows. It was what was majorly proposed by #2222 Everyone is in support of the changes highlighted by the yellow arrows :)

The pink boxes was just a way for me to highlight that there might be some circular dependencies there that are not required.

@achingbrain
Copy link
Member Author

achingbrain commented Dec 5, 2019

Modules like @textile/ipfs-lite and ipfs-only-hash use the importer and/or exporter. I like to think there will be more experiments one day that are based on our stack - these would be harder to produce if everything is part of the main ipfs module (though not if we go down the proposed Lerna mono-repo route since everything will still be published as a separate module).

Merging the importer, exporter and unixfs modules makes a lot of sense though - these are low-churn modules that are related in functionality and tend to change at the same time.

Folding mfs into core is also a good idea, but we should go all the way and include the utils, the rpc client and interface tests too as these are high-churn modules which also typically change at the same time and even after collapsing our tree as suggested, we still can't deliver one feature with one PR.

At our session on this at lab week everyone agreed that this is a good idea. I'd love to hear from @alanshaw though.

@daviddias
Copy link
Member

daviddias commented Dec 10, 2019

Merging the importer, exporter and unixfs modules makes a lot of sense though - these are low-churn modules that are related in functionality and tend to change at the same time.

Great, dooooo eeet :D

Folding mfs into core is also a good idea

Awesome, 🚢

but we should go all the way and include the utils, the rpc client and interface tests

To be reviewed/revisited. This should not block the other coalescing to happen though. Can we agree to make the coalescing in what we have agreement and evaluate the remaining after?

PS: When merging/coalescing, use the same technique that @jacobheun used for js-libp2p so that commit history is preserved

@hugomrdias
Copy link
Member

I would prefer to permanently solve this problem that haunts us every single day or else we are just delaying a thing that needs to happen, exactly like async/await we finally did it and everyone is super happy and ipfs better for it.

We just need to agree on which packages we want to publish standalone, merge the others and put all in one repo so we can finally make just one PR for a changeset.

@alanshaw
Copy link
Member

alanshaw commented Jan 4, 2020

I feel that if we do do this then it needs to be a lerna mono repo (or similar) i.e. multiple npm packages in one repo. I think module isolation is the best way to ensure developers build and test their work independently from other aspects of the system. Lumping everything in one repo breaks the isolation and gives developers rope to build highly connected tightly coupled, and interdependent code that is a nightmare to maintain and adapt over time.

We have been bitten again and again by changesets spanning multiple repos that become blocked by releases of partial changesets. At worst this completely blocks all work and at best it creates additional work for both the contributor and maintainer - i.e. unblocking requires the contributor to backport their changes to an earlier release and have the maintainer the review and release those changes.

Not only that, but the process to land a changeset requires a lot of admin work: We have to open and orchestrate releases of multiple PRs in repos maintained by different people and temporarily depend on git dependencies to prove CI passes upstream.

All this negatively effects our pace of development and for me this is one of the biggest problems I think we could solve by doing this.

You could argue that better communication between contributor and maintainer on when to release a PR would improve this situation but for multiple maintainers across timezones we can’t always guarantee merges and releases will happen in a short enough time to not cause blockages elsewhere. We also can’t really assume new contributors will do this correctly and it doesn’t solve the admin problem of having to open many PRs and thread them all together.

Caveat: My assertions on the benefits are not backed by experience. I’ve never used or contributed to a monorepo before...and so this is part of the reason why I’ve taken so long to respond on this.

Another part is that my gut feeling is that we shouldn’t do this. I have concerns around discoverability (only one entry point for many modules), PR size (I’d rather review multiple small PRs), interspersed commit history, the fact that the OSS community in the majority expects one repo per module, and a few other minor things...but ATM I don’t think that these are enough reasons to not do this and given that the rest of the team backs it already I feel like we should just try it and be sure to retrospect on it.

That said, I think we need to work out some details before we can initiate this though. I’ve made a small list:

  • How does lead maintainership work with a mono repo? Do we need to establish some rules for when it’s ok to merge PRs?
  • What are the changes required to AEgir to make this a reality?
  • Which repos would we bring into js-ipfs? How can we retain git history?
  • What would be the plan for issues? A label per module? Can we bulk transfer issues to js-ipfs?
  • I’m keen for this to not effect CI run times. Is this possible (at least temporarily until js-ipfs tests are fixed up)?
  • What will be the process for deprecating/redirecting/removing the old repos?
  • Are there any other downsides to contributors by doing this?

@mohe2015
Copy link

mohe2015 commented Jan 4, 2020

I think another option would be to request a new feature from Github that allows you to link PRs together (using branch name?). I don't know how useful this could be in practice but I think it should be something to consider.

@autonome
Copy link
Contributor

autonome commented Jan 6, 2020

@alanshaw's questions about repo collapsing are really good. I spent a long time straddling monorepo world (mozilla-central) and many external/adjacent projects in module or sub-project repos, so have a few thoughts about pros/cons and have seen how they were played out, and have felt the pain of both ways 😅

In my experience having single repo for large project really helps reduce the cognitive, workflow and coordination overhead we're trying to address here. But not for free, also brings new challenges ;) But things like group norms are easier to be consistent about, whole group activity is highly visible as a default (a new and different signal-to-noise challenge though!), and discoverability is much easier. And none of that prevents shipping modules separately a la Lerna.

I have concerns around discoverability (only one entry point for many modules)

What type of discoverability? As a consumer of individual modules, looking on Github, discoverability will be harder. As a consumer of modules on NPM, things like Lerna might close the gap.

As a core contributor I think discoverability is far better in collapsed repos. I went from monorepo to module-per-repo, and visibility and discoverability both feel soooo difficult now. Used to be that all the modules were in one place that I could navigate with local tooling. Now I'm always having to traverse boundaries that feel artificial, and always result in overhead - network, tooling, workflow to work across loads of repos.

PR size (I’d rather review multiple small PRs)

This was a real problem for a long time in gecko and Firefox development. As a "module peer" (kinda like "lead maintainer" - a code reviewer across the equivalent of loads and loads of repos), it was a nightmare.

However, the advent of six week cadenced release process made mega-patch style of development untenable and forced a change. Reviewers started demanding patches be broken up into smaller pieces and stacked, landed iteratively even if it meant things like interspersed commit histories and other trade-offs (landed but commented out, disabled tests, etc etc). Now that's the default for large changes, and the developer productivity tooling supports it via patch queues per issue, and other features. Definitely takes intentional practice from reviewers, and communication of those practices to people submitting PRs.

interspersed commit history

No silver bullet. Can be a big downside to collapsing repos, depending on what the history is used for.

I'd begin by identifying exactly why we find the history desirable for this project in particular, and then exploring mitigations and alternate solutions for those needs. Previously I've seen things like globally searchable history + commit message requirements help with history tracing, and also strong use of feature flags and/or backout-ability required (iow, transferring the contiguous part of a change from the commit history to the filesystem), which also helps enforce better in-repo modularization and avoiding the tight coupling that module-per-repo is so great at defending against.

the fact that the OSS community in the majority expects one repo per module

Should challenge this assumption. Repo per module is a fairly new phenomenon and doesn't seem at all settled as a best practice for all projects. It's super useful for some types of JS projects, but even that OSS sub-community is bumping up against the limitations of it as time goes by, as we're seeing here in our case. I wonder if Github has stats around this...

@achingbrain
Copy link
Member Author

How does lead maintainership work with a mono repo?

Given that js-ipfs, js-ipfs-http-client and interface-js-ipfs-core share a lead maintainer I don't see much changing.

Do we need to establish some rules for when it’s ok to merge PRs?

I don't think we need more than we have right now. Green CI & peer review conquers most all.

What are the changes required to AEgir to make this a reality?

Lerna's release tools are pretty good - it's clever enough to work out which modules depend on which so which need new releases for a given changeset so we'll hopefully sunset some of AEgir's functionality. We might need to extend or wrap it to do the automated contributors bit, off the top of my head.

For testing/linting etc, we just use npm scripts, that invoke aegir, the same as today.

Which repos would we bring into js-ipfs?

After collapsing things like js-ipfs-mfs, js-ipfs-utils and js-ipfs-multipart into core, I think it'd primarily be js-ipfs, js-ipfs-http-client and interface-js-ipfs-core. It'd then also be really easy to split the CLI and HTTP RPC interfaces into separate modules but also keep them in sync with core.

How can we retain git history?

In short, in the source repo you git mv the file/directories around until everything has the folder structure you want in the final repo, then git commit. Then add the source to the target repo as a new remote and do a git merge source/master --allow-unrelated-histories.

What would be the plan for issues? A label per module?

Per module or system component? Most people just open an issue on js-ipfs at the moment regardless of whether the problem is with that module or mfs or libp2p or whatever so again not a lot will change.

Can we bulk transfer issues to js-ipfs?

I believe so - it should be possible given that everything is under the ipfs GH org.

I’m keen for this to not effect CI run times. Is this possible (at least temporarily until js-ipfs tests are fixed up)?

Given a change that touches core, the http client and the interface test suite the total CI time will not change, because you have to run the tests of all those modules to get a changeset through anyway. It might feel like it takes longer but that's just because you are running them all together instead of individually over a period of days or weeks 😁

What will be the process for deprecating/redirecting/removing the old repos?

We can empty the repos, put a README file in describing the new location of the files and archive the repo. Over time no-one will look at them any more.

Are there any other downsides to contributors by doing this?

Not that I can think of. At least nothing that outweighs the gains we'll receive from reduced complexity and PR busywork.

I feel like we should just try it and be sure to retrospect on it.

I think this is key. We should continuously question our tools and practices and ask if they are making our lives harder or easier. Nothing is forever, we can always split the modules out again if it creates more problems than it solves.

@mikeal
Copy link
Contributor

mikeal commented Jan 8, 2020

Overall, +1

The only thing I’d like to see is a simple and clear explanation of why something would be in the monorepo and when it would not. There’s always going to be external dependencies that won’t be in the mono repo, some of which will probably still be in the IPFS org as their own repo, so providing clarity to everyone one why something is in or out would be super helpful.

the fact that the OSS community in the majority expects one repo per module

Should challenge this assumption. Repo per module is a fairly new phenomenon and doesn't seem at all settled as a best practice for all projects. It's super useful for some types of JS projects, but even that OSS sub-community is bumping up against the limitations of it as time goes by, as we're seeing here in our case. I wonder if Github has stats around this...

The characterization that “the majority” expect this is totally fair. Of the more than million modules in the npm ecosystem more than 99% of them have a 1-to-1 relationship between the package and a repo.

However, Vue.js and a couple other large and notable projects use a mono repo and as a result all of the most popular tooling that would assume this has been updated to handle it. This wasn’t the case a few years ago but it’s better now.

Monorepo is not typical or expected by any means, but anyone who has reached the point where a monorepo looks like a solution to their problems is at a size and complexity level where it’s hard to argue that a monorepo layout is going to confuse things more than they already are.

@autonome
Copy link
Contributor

autonome commented Jan 8, 2020

in the NPM ecosystem

Important qualifying words. NPM isn't yet the entire open source software universe, far from it.

As Hamlet said:

"There are more things in heaven and earth, Horatio, than are dreamt of in your repository."

Yeah, +1 to this either way:

anyone who has reached the point where a monorepo looks like a solution to
their problems is at a size and complexity level where it’s hard to argue
that a monorepo layout is going to confuse things more than they already are.

@mikeal
Copy link
Contributor

mikeal commented Jan 8, 2020

This is a JS project, JS developers (backend and frontend) use npm and the modules there 😁

Monorepo’s are quite common in Go but I don’t know of any other ecosystems where they are the norm.

@achingbrain
Copy link
Member Author

If you're looking for examples, Maven has had multi-module projects pretty much forever.

Anyway, once 0.41.0 is out of the door would be a good time to do this. My approach will be to merge ipfs-mfs and ipfs-multipart into core, then create the monorepo with js-ipfs, js-ipfs-http-client and interface-js-ipfs-core as packages. Future changes would then be to split the cli and http components out of core.

We might keep the markdown api docs in interface-js-ipfs-core for the time being but eventually we should generate these from the code. I think @hugomrdias and @alanshaw have both done a bunch of typescript related experiments around this.

@achingbrain
Copy link
Member Author

We have merged these repos, so I'm going to close this issue. We should validate the decision/see what we can improve next time we do a retro.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

10 participants