Setup monorepo with lerna, account library integration #561

holgerd77 · 2019-07-25T08:58:30Z

Being in discussion for some time there is now an agreement to integrate the following libraries here by setting up a monorepo with lerna similar to the setup in the ethereumjs-config:

We'll start with integrating the account library due to its small size, low direct usage level and proximity to the VM logic, integration of the other libraries should be handled as a separate issue.

This has to go along with an update on the ethereumjs-config library (libraries) to be able to distribute the respective configurations to a monorepo-style consuming repository. This effort shouldn't be underestimated and will likely be as much work as the direct integration here.

The text was updated successfully, but these errors were encountered:

s1na · 2019-11-20T15:34:28Z

With the stable Istanbul being released, I think it's time to re-start this conversation. I had two specific discussion-points in mind, namely:

Do we keep the library names as is, or rename and use the scoped naming, e.g. @ethereumjs/account
Do we use different or the same version for all modules in the monorepo?

Re using the same version: I think this has some advantages. To achieve it we have 2 options as I see it. If we rename libraries we can safely reset the version. If we don't rename, we can upgrade all libraries to the biggest version among them (e.g. if VM has v5.x and all other v2.x, we upgrade all to v6)

holgerd77 · 2019-11-22T12:35:41Z

Yes, I also thought that it would be very much the time for the mono repo implementation. We can also very much do this step-by-step and on a first round e.g. keep everything "as is" so the VM integrating all the modules it currently does and just setup the outer mono-repo structure and maybe do a first bug fix release for testing the new structure (just a first thought, not sure if exactly this way makes sense, but generally: we can take on this one step after the other).

@s1na on your questions:

I would have very much the tendency to take on the occasion and switch to a scoped naming here, this would generally make very much sense for the whole EthereumJS org libraries (over time)
I lack very much the "feeling from experience" here. I would assume the biggest disadvantage here would be that one is enforcing e.g. major versions on all mono repo-associated libraries, even if there is no or just a subtle change in the library itself. This would very much affect single users of the respective library, since they eventually won't get a bug fix which would be otherwise be distributed by a separate bug fix release. Now people would be enforced to manually do the major version upgrade. How would you respond to this criticism?

s1na · 2019-11-25T10:24:37Z

I would assume the biggest disadvantage here would be that one is enforcing e.g. major versions on all mono repo-associated libraries

This would definitely be the downside. The only approach by which we can avoid it is to both keep the names (i.e. no scoped renaming) and the versions. I guess it comes down to a pro/con discussion. To bring a pro for using the same version I'd refer to the discussion in ethereumjs/ethereumjs-block#74.

@alcuadrado curious to hear your thoughts as well.

evertonfraga · 2019-11-26T21:52:15Z

For information and reference, I made a file structure (+ simplified dependency tree) of the monorepo proposal, as I get it:

Edit: Added ethereumjs-common to the diagram

evertonfraga · 2019-11-27T06:02:08Z

I'd like to add one more topic for discussion: what should be the ideal test structure for the monorepo?

Here's a brief analysis of the current situation:

	platform	# of jobs	latest job duration
ethereumjs-vm	circleci	11	13m 13s
ethereumjs-blockchain	travis	5	3m 6s
ethereumjs-block	travis	9	4m 12s
ethereumjs-tx	travis	9	3m 46s
ethereumjs-account	travis	5	3m 31s
ethereumjs-common	travis	4	1m 45s

Both CI platforms that we use still don't have a tailored support for monorepos. I see some approaches:

1. Build all 43 jobs, for every commit

This is the most straightforward way to go, but it is also resource-heavy. It strictly depends on a robust parallelization plan on the CI side, otherwise, we'd have an unpleasant job queue to cope with.

find out what's our current plan capacity.

2. Only trigger jobs for packages that had changes (👎)

In this scenario it would detect which package got changes and run only their jobs. That can be accomplished with a single test runner that uses a combo of lerna diff and CircleCI API to trigger a job.

The big no for me is that the tests would still be as siloed as they are today, where you need to release one package to see it failing on a downstream test.

3. Run tests by affected package

We'd use their explicit dependency relation to determine which jobs need to run. I still don't know how elegant this solution would be on the CI side, but I'd be happy to explore this path more.

How to read the image:

Changes to vm should run their own 11 jobs
Changes to block should run block (9), blockchain (5) and vm (11) = 25 total
Changes to both blockchain and accounts should run their own tests + vm

In any case, it seems primordial to migrate them over to circleci.

Edit: Added ethereumjs-common to the table and diagram

s1na · 2019-11-27T10:18:56Z

Great analysis @evertonfraga, and a good point!

I'd also go either with 1 or 3. But yeah as you mentioned as it stands 1 doesn't seem very feasible. I do think however it might be possible to reduce the runtime of some of the tests, making option 1 more feasible. E.g. #536 was a low-hanging fruit I found a while ago which improved runtime quite a bit.

I think it'd also make things a lot simpler if the projects were using a similar toolchain (and CI). This should be a good preparatory first step. ethereumjs-account can already started to be integrated here though I think.

ryanio · 2019-12-03T06:02:46Z

I like the idea of doing lerna in independent mode and discovered semantic-release.

holgerd77 · 2019-12-03T12:18:31Z

Ryan brought this up and so I just started thinking about further library integration here (also inspired by the visualization done above by @evertonfraga): in past discussions we had some broader consensus to not include the merkle-patricia-library here, I would also be skeptical about the ethereumjs-util library. However I think the Common and Testing libraries might be relatively indisputable candidates (Update: Testing shouldn't be included due to repo size, see comment here)?

Another topic: without further reflection on pros and cons on this I would also have a tendency to independent mode as suggested by @ryanio.

ryanio · 2019-12-03T16:43:51Z

In a monorepo world it might make sense to include util so everything remains under the same umbrella and we receive the full benefits of monorepo-ing.

Would it be too much to nest MPT under util?

alcuadrado · 2019-12-03T18:26:19Z

I'm really excited that this migration is happening 🤩

Do we keep the library names as is, or rename and use the scoped naming, e.g. @ethereumjs/account

Renaming the packages will prevent users to get updates. If I'm using, for example, ethereumjs-tx and bug fixes start to be published as @ethereumjs/tx, I won't get them.

This would only happen once, so it may not be that important.

Do we use different or the same version for all modules in the monorepo?

I think I'd use different versions with independent mode, as @ryanio suggested.

@holgerd77's argument about unnecessary major versions and bug fixes is similar to what I just wrote about renaming the packages, but it would be much more frequent so way more important. That alone is enough to use independent mode IMO.

I understand your motivation and share your concern, @s1na . Keeping separate versions will probably lead to situations like ethereumjs/ethereumjs-block#74 again.

I believe the root of the problem is that npm dependencies are handled incorrectly in most of the libraries. Things like -account, -block, -blockchain, -tx, bn.js, and possibly others should be peer dependencies most of the time. I really like this article that explains when peer dependencies are needed. I think we all had a short conversation about this in Osaka.

Changing the libs to peer dependencies would be a big breaking change, so I'd keep it outside of the scope of the migration.

Great job @evertonfraga ! I think option 3 is the most interesting, but it may be hard to implement. @ryanio's PR using Github Actions on the -blockchain repo shows some very promising performance improvements, so I wonder if using actions to run all jobs in the monorepo on every PR won't be performant enough.

alcuadrado · 2019-12-03T18:29:38Z

We can also very much do this step-by-step and on a first round e.g. keep everything "as is" so the VM integrating all the modules it currently does and just setup the outer mono-repo structure and maybe do a first bug fix release for testing the new structure (just a first thought, not sure if exactly this way makes sense, but generally: we can take on this one step after the other).

Also, I think this is a very good way to approach this. There are lots of things to migrate and config, so doing it gradually makes sense to me.

holgerd77 · 2019-12-04T10:56:38Z

Ok just a summary on how the consensus seem to form on how we proceed here (correct me or continue the discussion if you have strong opinions against any of the points):

We use independent mode as Lerna setup and keep on with current version numbers
We do the renaming to scoped packages like @ethereumjs/vm
We use GitHub actions as a unified CI tool (is this already consensus?)
We do a gradual step-by-step integration, first PR on this would be as described above (only VM, outer structure)
For testing in doubt we can start with 1. (build all jobs) to keep things simple and expand on this later on further package inclusion
Rename all git tags to avoid naming conflicts

Anything I forgot (feel free to directly edit the post)? 😀

evertonfraga · 2019-12-04T14:55:22Z

My assessment to Github Actions: 👍

Pros

Performance really seems to be superior.
Actions (or jobs) in separate files is beneficial to a monorepo structure.
It has a marketplace with a growing community, so reusing bits is super easy.
It goes beyond the traditional workflow of (git push, run job). It can interact with PRs, and do a lot of things.
last, but not least, it is integrated to Github and the UI is more responsive than Travis and CircleCI.

Concerns and "non-concerns"

The ethereum org is still on the legacy billing structure, so they currently face a challenge there to use Actions. As our repos mostly live under EthereumJS, can we take advantage of the free builds for GH Actions?
I'm not too concerned with vendor lock-in, as there's a low cost to move over to any other CI tool.

alcuadrado · 2019-12-04T16:00:54Z

I wonder if the cache limits of github actions won't be a problem. They cache up to 2gb per repo, and node_modules can be huge. Mor info here

ryanio · 2019-12-04T16:29:31Z

Thanks @alcuadrado, that is indeed an important aspect to keep in mind for the evaluation.

I found this thread: actions/cache#6

It has some interesting discussion, including one user seeing success with running a 700mb node_modules and the cache compressing the archive. I’ll have to give the thread another closer look in relation to our specific needs to extract more conclusions.

I can try to investigate these limits further and if we may find them constraining.

2GB per repo sounds extremely low for a monorepo use case like ours, especially with [vm, account, block, blockchain, tx] all running their own checks in matrix across 4 node versions on every PR push.

From the top of your head, which ci steps / tests / builds / outputs / logs do you find being on the larger side?

I also found Persisting workflow data using artifacts.

ryanio · 2019-12-04T17:01:34Z

As a separate point to add to the list of considerations, how do we feel about semantic-release? I think it would be nice to have a fully automated release process.

https://github.com/atlassian/lerna-semantic-release

holgerd77 · 2019-12-04T19:59:53Z

@ryanio Can we have this as a separate topic, eventually just open a new issue on that? This can be discussed relatively separated from this issue (and also might need a bit longer and thoughtful discussion) and there is a bit the danger to overload the discussion here.

ryanio · 2019-12-04T20:43:34Z

@holgerd77 Yes great point, have started it here: #625

evertonfraga · 2019-12-04T21:38:24Z

As a hands-on way to understand the requirements to the migration, I started to make this shell script monorepify.sh.

This can be useful to rehearse the migration and get valuable information ahead of time (eg: would GitHub actions work with X jobs? What about their cache limit?). Also, this automation tool could be in fact used to perform the changes we need 😛.

The migration can be run for vm repo only (just comment line 11, might have some quirks), with any combination of the other candidates to the monorepo (edit line 11), or all of them (just run).

Roadmap

Edit: see this project for related tasks: https://github.com/orgs/ethereumjs/projects/1

I'll use my own fork as a playground, and you can inspect it here: https://github.com/evertonfraga/ethereumjs-vm/

evertonfraga · 2019-12-05T04:43:40Z

Good news: GitHub actions make it trivial to implement my proposed solution (3). We’ll have to declare some exclusion paths to each workflow file, so the irrelevant builds are not ran for each push.

https://help.github.com/en/actions/automating-your-workflow-with-github-actions/workflow-syntax-for-github-actions#example-ignoring-paths

alcuadrado · 2019-12-05T15:03:12Z

Nice script @evertonfraga! You may want to install lerna in one of its first steps, as it has this very handy command https://github.com/lerna/lerna/tree/master/commands/import

Should we create a new repo for the monorepo? This way we can leave all the different library-specific repos as they are now and archive them.

evertonfraga · 2019-12-05T20:50:24Z

Thanks, @alcuadrado! I was uncertain at first to use an importer like lerna's, as it rewrites all commits as if the files were always under the new file structure. But maybe it's the best way forward.

Re: new repo for it — I lean towards to keep ethereumjs-vm reputation (stars) and network (forks). It can be renamed to ethereumjs-monorepo in the end.

holgerd77 · 2019-12-05T22:26:10Z

On the caching question:

Just had a look at the VM node_modules folder, this is roughly 1 GB uncompressed and 95 MB compressed (which counts). This should be one of the significantly larger ones having ethereumjs-testing with its huge submodule folder as a dependency.

holgerd77 · 2019-12-05T22:33:46Z

What is with open PRs on the other repos (but eventually also on the VM?) containing a significant amount of work? These should likely (optimally) be merged before a transition (hope this is realistic)?

evertonfraga · 2019-12-06T05:12:22Z

PRs on the other repos
Sure, good point. This is something we should perform some manual tests and create a playbook of.

holgerd77 · 2019-12-06T11:38:59Z

Monorepo Transition PR Analysis

Some analysis of the currently open PRs on the repos.

Feel free to edit the post. I would suggest not to include the monorepo working PRs (e.g. on the GitHub actions like ethereumjs/ethereumjs-account#59) to not clutter the list too much.

Account

No significant open PRs.

Block

Muir Glacier PR, ,, @evertonfraga → Merge before or resubmit after transition
v3.0.0 Release, New release v3.0.0 ethereumjs-block#74, @holgerd77 → Merge before or resubmit

Blockchain

Test Refactor PR, refactor: test suite ethereumjs-blockchain#134, @ryanio → Merge before (better, nearly done) or resubmit

Transaction

Simplification Experiment, A possible simplification of this library ethereumjs-tx#154, @alcuadrado → open state, leave and ask for resubmission if picked up again
Test for verifySignature, Add test for verifySignature ethereumjs-tx#105, @fanatid → rather small PR, open state, leave and ask for resubmission if picked up again

Common

Muir Glacier, Adds definition for HF Muir Glacier ethereumjs-common#75, @evertonfraga, → Merge before or resubmit

VM

All recent open PRs (there are some which just need to be closed) are from @s1na or @alcuadrado → Can be asked for resubmission if picked up again

Summary: Ok, PR situation is actually easier to handle than I initially thought. Basically for the moment we are fine and there is some defined path to go for all open PRs. 😄

evertonfraga · 2019-12-10T19:42:43Z

Our goal is to have a smooth migration, without surprises or rushed changes because something was not considered first place. My approach is mostly programmatic, as IMO doing every step manually can be error-prone.

I am using a project board to keep track of all requirements to accomplish that. Please take a look and see if I am forgetting anything.

https://github.com/orgs/ethereumjs/projects/1

Feel free to add new cards, and if any of those are worth discussing, feel free to convert to an issue (Click on card … > Convert to Issue).

holgerd77 · 2019-12-11T12:05:53Z

@evertonfraga That's really cool, thanks for setting this up! 😄

evertonfraga · 2020-01-22T20:30:01Z

Hi all,

I need more context regarding the use of ts-node in our projects in order to proceed with the concerted test efforts using Lerna. I see there are some different ways to spawn the tests across the projects.

1. `account`, `block`, `common` and `tx`

They all pass the main test files to ts-node, naturally don't requiring dist/ files to be present

ts-node node_modules/tape/bin/tape ./test/index.ts

2. `blockchain`

It first builds dist/ files, then runs ts-node requiring modules from dist/.

npm run build && ts-node node_modules/tape/bin/tape ./test/index.ts

3. `vm`

I assume this is outdated code, since that currently the tests don't work without the --dist flag.

npm run build:dist && node --stack-size=1500 ./tests/tester --blockchain --dist

Now I wanted some feedback: should we normalize the libs to run with ts-node, without the need to build the files first?
I have to say that this would help quite a lot on the CI side.

alcuadrado · 2020-01-22T20:36:06Z

I've always found the vm setup somewhat frustrating, as I tend to forget to run build:dist, because I normally run all my tests with ts-node.

evertonfraga · 2020-01-23T16:27:02Z

Thanks for the feedback @alcuadrado.

The VM npm scripts used to let you test both the code and dist, with the —dist flag, but it lost that capability with the TS migration.

I’m taking the ts-node route that will restore that capability, and will help a lot with CI, so I shouldn’t need to move dist files across jobs for integration tests.

holgerd77 · 2020-01-30T11:41:26Z

Some update on Greenkeeper, this is activated for many repositories (see ethereumjs/ethereumjs-account#55 for an exemplary PR) to notify about dependency updates (just had a look at the setup, not in a consistent state though, some repos missing, some not activated):

First, monorepos seems to be supported.

To ease the migration I've now disabled Greenkeeper to stop opening new PRs, I will also close existing open PRs on the respective libraries. We can then re-activate once the transition is done (or eventually also considering switching to another tool, seems you also have some alternative solution here on the Grid side, don't see any pressure for a switch though in general and have no comparison overview. Very much a side topic I would say, but anyhow).

evertonfraga · 2020-02-03T22:10:21Z

We started using RenovateBot for Grid repositories. It had lots of granularity for configurations, but in the end I had no special need with the array of settings for it.

I had planned to implement Greenkeeper in the monorepo already, since it's already used by the target repository (ethereumjs-vm) and it supports multiple package.json. PRs can be grouped by every new dependency, which makes everything cleaner.

I have activated Greenkeeper today in our monorepo playground: https://github.com/evertonfraga/ethereumjs-vm/

Mind that Readme linked above is not yet complete, but it's already a good opportunity for feedback.

holgerd77 · 2020-02-04T12:02:52Z

@evertonfraga whuah, have to admit that I didn't look at this monorepo playground yet you created. This already looks fantastic! 😄

holgerd77 · 2020-02-04T12:04:37Z

"This branch is 1305 commits ahead, 31 commits behind ethereumjs:master."

Lol. 🤪

holgerd77 mentioned this issue Jul 25, 2019

Update libraries to handle monorepo consumers ethereumjs/ethereumjs-config#24

Closed

evertonfraga mentioned this issue Jan 22, 2020

Makes building files unnecessary to run tests ethereumjs/ethereumjs-blockchain#143

Merged

ryanio mentioned this issue Jan 30, 2020

Refactor test suite ethereumjs/ethereumjs-util#231

Merged

evertonfraga mentioned this issue Feb 25, 2020

Turning the ethereumjs-vm into a monorepo #666

Merged

holgerd77 closed this as completed in #666 Feb 27, 2020

evertonfraga added the package: vm label Feb 27, 2020

This was referenced Jun 29, 2020

VM Tooling Analysis and general housekeeping #723

Closed

CI concurrency #804

Merged

This was referenced Aug 24, 2020

Monorepo clusters / Testing integration of connected libraries ethereumjs/organization#48

Closed

Project suggestion: create monorepo for blockchain data schemes ethereumjs/organization#19

Closed

Monorepo: git tag format ethereum/js-team-organization#8

Closed

Setup monorepo with lerna, account library integration #561

Setup monorepo with lerna, account library integration #561

Comments

holgerd77 commented Jul 25, 2019

s1na commented Nov 20, 2019

holgerd77 commented Nov 22, 2019

s1na commented Nov 25, 2019

evertonfraga commented Nov 26, 2019 • edited Loading

evertonfraga commented Nov 27, 2019 • edited Loading

1. Build all 43 jobs, for every commit

2. Only trigger jobs for packages that had changes (👎)

3. Run tests by affected package

s1na commented Nov 27, 2019

ryanio commented Dec 3, 2019

holgerd77 commented Dec 3, 2019 • edited Loading

ryanio commented Dec 3, 2019

alcuadrado commented Dec 3, 2019

alcuadrado commented Dec 3, 2019

holgerd77 commented Dec 4, 2019 • edited by evertonfraga Loading

evertonfraga commented Dec 4, 2019 • edited Loading

Pros

Concerns and "non-concerns"

alcuadrado commented Dec 4, 2019

ryanio commented Dec 4, 2019 • edited Loading

ryanio commented Dec 4, 2019

holgerd77 commented Dec 4, 2019

ryanio commented Dec 4, 2019

evertonfraga commented Dec 4, 2019 • edited Loading

Roadmap

evertonfraga commented Dec 5, 2019

alcuadrado commented Dec 5, 2019

evertonfraga commented Dec 5, 2019

holgerd77 commented Dec 5, 2019 • edited Loading

holgerd77 commented Dec 5, 2019

evertonfraga commented Dec 6, 2019

holgerd77 commented Dec 6, 2019

Monorepo Transition PR Analysis

Account

Block

Blockchain

Transaction

Common

VM

evertonfraga commented Dec 10, 2019 • edited Loading

holgerd77 commented Dec 11, 2019

evertonfraga commented Jan 22, 2020

1. account, block, common and tx

2. blockchain

3. vm

alcuadrado commented Jan 22, 2020 • edited Loading

evertonfraga commented Jan 23, 2020

holgerd77 commented Jan 30, 2020

evertonfraga commented Feb 3, 2020 • edited Loading

holgerd77 commented Feb 4, 2020

holgerd77 commented Feb 4, 2020

evertonfraga commented Nov 26, 2019 •

edited

Loading

evertonfraga commented Nov 27, 2019 •

edited

Loading

holgerd77 commented Dec 3, 2019 •

edited

Loading

holgerd77 commented Dec 4, 2019 •

edited by evertonfraga

Loading

evertonfraga commented Dec 4, 2019 •

edited

Loading

ryanio commented Dec 4, 2019 •

edited

Loading

evertonfraga commented Dec 4, 2019 •

edited

Loading

holgerd77 commented Dec 5, 2019 •

edited

Loading

evertonfraga commented Dec 10, 2019 •

edited

Loading

1. `account`, `block`, `common` and `tx`

2. `blockchain`

3. `vm`

alcuadrado commented Jan 22, 2020 •

edited

Loading

evertonfraga commented Feb 3, 2020 •

edited

Loading