Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider adding a composer.lock file in the repository #1095

Closed
drupol opened this issue Jun 18, 2023 · 14 comments · Fixed by phpro/grumphp-shim#22
Closed

Consider adding a composer.lock file in the repository #1095

drupol opened this issue Jun 18, 2023 · 14 comments · Fixed by phpro/grumphp-shim#22
Labels

Comments

@drupol
Copy link
Contributor

drupol commented Jun 18, 2023

Hi,

I'm opening this PR to try to get the file composer.lock live in the repository and be part of the Grumphp sources.

Summary

The composer.lock file, though sometimes overlooked, is key for project reproducibility, performance, and security.

Rationale

Composer is the dependency manager for PHP. It allows you to declare the libraries your project depends on and it will manage (install/update) them for you. In a composer.json file, you specify the dependencies for your project. For example, you might need a certain library, and that library might need another library, and so on.

When you run composer install for the first time, it resolves dependencies and then downloads all of them and writes all of the packages and the exact versions of them that it downloaded to the composer.lock file, locking the project to those specific versions.

Committing this file in this repo is important for many reasons:

  1. Reproducibility: Reproducibility is one of the foundational aspects of software engineering. It ensures that all team members and the deployment system are using the exact same versions of the dependencies. This eliminates the potential issues that may arise from differences in versions used, such as function deprecations, changes in behavior, etc. Without the composer.lock file, running composer install would fetch the latest versions of the dependencies you defined in composer.json, which could cause inconsistencies between environments.

  2. Security: It helps ensure that you're using the versions of dependencies that you have tested and reviewed. Without a composer.lock file, there's a risk that a malicious person could find a vulnerability in one of your dependencies, fix that vulnerability, bump the version number, and if you run composer install or composer update, you could be pulling in un-reviewed/un-tested code since you do not control anymore the dependency resolution and delegate it to the host system.

  3. Speed: Having a composer.lock file speeds up the composer installation process. Composer no longer needs to resolve dependencies (which can take some time) and can just download the version of each dependency specified in the composer.lock.

Inclusion of composer.lock in this repository also promotes the following advantages (but I won't go into the details, this post is already too long):

  1. Less time spending in debugging,
  2. Increased reliability,
  3. Enhanced collaboration and scalability,
  4. Effective version control,
  5. Streamlined Continuous Integration and Continuous Deployment,
  6. Efficient auditing and compliance,
  7. Reliable scientific and data-driven applications.

Conclusion

While the composer.lock file may appear nonessential and is sometimes overlooked, its inclusion is critical for ensuring the reproducibility, speed, and security of your project. By including the composer.lock file in your project's repository, you are effectively shifting the responsibility of specifying correct dependencies from the end-users to the project's author.

Consequently, the author takes charge of determining the precise dependencies to install, sparing the users from this task. This approach streamlines the installation process and fosters a consistent project environment.

By including the composer.lock file in the repository, not only does it improve the development experience for contributors, but it also provides users with an assurance of quality, security, and reliability. The presence of the composer.lock file signifies that the project's dependencies are carefully managed and tested. It confirms that the project operates effectively with the locked versions of dependencies, thus reducing the risk of compatibility issues, unexpected behavior and bugs.

The composer.lock file ultimately serves as a testament to the project's stability. It safeguards users against potential risks associated with untested updates or patches in dependencies. Therefore, both contributors and users can benefit from the project author's meticulous dependency management reflected in the committed composer.lock file. This can enhance the overall confidence in the project's robustness and reliability.

Further Considerations

While Grumphp is developed in PHP, it can also be viewed as a binary, not solely a PHP library. Given this, it's imperative that it retains reproducibility across various environments. composer.lock is our most effective asset to ensure this within the PHP ecosystem. Considering the reasons outlined here, I strongly believe it deserves a place in the project.

What others are doing ?

It's often beneficial to consider practices in other projects. Composer, the PHP package manager itself, follows this practice. There are many more such examples in the ecosystem.

Application, binaries, libraries

A PHP application or binary is a standalone piece of software intended to be installed and run as is, and the composer.lock file should be included. This is because the application or binary needs to ensure that it is using the exact versions of its dependencies that it was developed and tested with. If the dependencies were to change unpredictably, the application could break. Therefore, the composer.lock file, which specifies the exact versions of the dependencies, is essential.

On the other hand, a PHP library is a piece of software intended to be included as a dependency of other projects. In this case, the composer.lock file is typically not included in version control. This is because the library should be compatible with a range of versions of its own dependencies. The library doesn't control the exact versions of its dependencies - the parent project that uses the library does. As such, the dependency resolving mechanism should indeed be handled by the project that requires the library, not the library itself.

Relation with Nix

Reproducibility is a hard requirement when bundling packages in Nix. To build a reproducible version of Grumphp, we need to ensure all inputs are accurately provided to Composer. We currently provide the composer.lock ourselves to ensure consistent use of the same dependencies, and I believe we should not do that. Making developers aware of the significance of this file would be beneficial.

Practical, hands on

To ensure reproducibility with a Grumphp build, containing composer.lock:

  1. git clone https://github.com/phpro/grumphp
  2. cd grumphp

This next step is mandatory to prevent Composer to generate random class names for the Autoloader:

  1. composer config autoloader-suffix FooBar

Let's install the required dependencies only:

  1. composer install --no-dev

The next command will compute a checksum of the grumphp directory containing now the vendor directory:

  1. cd -; dir=grumphp find "$dir" -type f -not -path '*/\.git/*' -exec sha256sum {} \; | LC_ALL=C sort -d | sha256sum; cd -

When the composer.lock is available, there is a guarantee of reproducibility, you could run this on any platform and you will always get the same checksum.

When the composer.lock is not available, the checksum might be different from a day to another, there is absolutely no guarantee of reproducibility.

@veewee veewee added the RFC label Jun 19, 2023
@ostrolucky
Copy link

There are lot of downsides of having composer.lock. Including composer.lock is internal decision for development process. Missing it is often easier because it doesn't cause conflicts, bloat diffs, commits or repo size and ensures always latest dependencies are installed. As for distribution, grumphp is distributing .phar file, that achieves your reproducibility goal of Nix and hence there is no need for composer.lock, is there?

As for cherry picked examples, I can provide counter examples as well. We removed composer.lock from Doctrine projects, because it was causing lot of problems. Especially if project needs to support multiple PHP versions and each version needs its own composer.lock file, this is very difficult to solve. And having it means all the PRs touching it are asking for conflicts that need to be resolved once any of them is merged.

@drupol
Copy link
Contributor Author

drupol commented Jun 19, 2023

Including composer.lock is internal decision for development process. Missing it is often easier because it doesn't cause conflicts, bloat diffs, commits or repo size and ensures always latest dependencies are installed.

While there can be downsides to including a composer.lock file (such as potential conflicts, bloated diffs, larger commits or increased repository size) these are generally outweighed by the benefits and things needs to be mitigated:

  1. Stability: Changes to the composer.lock file typically occur only when the composer.json file is modified. Considering the maturity of most projects, alterations to this file aren't a frequent occurrence, thus limiting the issues related to repository size and diff readability.

  2. Conflict Resolution: Developers deal with VCS conflicts as a part of their daily tasks. While having a composer.lock file could potentially introduce more conflicts, resolving these is a routine part of development work. Therefore, this should not be considered a major hurdle to including the lock file.

As for distribution, grumphp is distributing .phar file, that achieves your reproducibility goal of Nix and hence there is no need for composer.lock, is there?

Yes, Grumphp is distributing a .phar file which in theory should satisfy the reproducibility requirement. However, relying solely on a PHAR file might not be the best approach.

A PHAR file, while handy, does not offer transparency in its build process (there are existing cases!). Unless you manually inspect the PHAR, there is no way to ensure its security or the exact versions of dependencies it includes. Therefore, including a composer.lock file can still be beneficial for the sake of transparency, security, and control over dependencies.

Including a composer.lock file alongside the distribution of a PHAR provides a clear picture of the tested and approved state of the dependencies at the time of the PHAR's creation, which bolsters the overall confidence in the project's robustness and reliability.

@veewee
Copy link
Contributor

veewee commented Jun 19, 2023

@drupol Just a little brainspark : Since I am compiling the PHAR with a locked set of dependencies (based on lowest available PHP version), would it make sense to add that lock file, that is being created during PHAR generation, as an upload to the release (just like I provide the PHAR there)? I could also make sure it gets commited to the shim repository.

I am running some basic checks during deploy and PHAR generation, but I am not soing an in-depth security scan of the projects dependencies at the moment. I trust packages like doctrine and symfony to be well managed and security updates to be present in latest possible releases of those projects.

The arguments of @ostrolucky is what is keeping me back from adding the lock file to the codebase:
It adds a lot of maintenance overhead to the project. I am already dealing with quite some dependency hell overhead on a yearly basis (PHP upgrades, Symfony upgrades, tools upgrading, ...)
Your initial practical hands on does not cover : how can it handle multiple PHP versions. At a moment I was supporting about 4 PHP versions with dendencies that did only support 2 for example. This makes it impossible to create a single lock file over multiple PHP versions

Besides the different PHP versions which might require different dependencies, it also breaks the min / max dependencies checks that are built-in to the CI pipeline, since the dependencies are being locked. When I notice something breaks either in the min or mex dependency range, I am able to resolve it quite fast - because there is no lock file.

Don't get me wrong:
I agree with your points and they are valid, but there are a lot of counter arguments as well.
So I am looking for some sort of middle ground here that would solve your nix specific problems and which releases me from having to deal with dependency hell even more :)

@drupol
Copy link
Contributor Author

drupol commented Jun 19, 2023

I understand your concerns about the additional overhead and the complexities associated with managing the composer.lock file, especially in the context of supporting multiple PHP versions and maintaining the CI pipeline's min/max dependencies checks. It is indeed a tricky balancing act. However, I believe that these challenges can be mitigated and the benefits of a composer.lock file are worth the effort.

Regarding the issue of supporting multiple PHP versions, it's true that managing a single composer.lock file could be challenging. However, some projects manage this by having separate composer.lock files for different PHP versions. It could be an overhead, but it guarantees the integrity and reproducibility of the project across different environments.

As for the min/max dependency checks, while locking dependencies does seem to interfere with this mechanism, it is also a way to ensure that the software is tested against known-good versions of its dependencies. We can keep the process of updating dependencies controlled and deliberate, rather than allowing it to occur arbitrarily based on when a developer or the CI pipeline happens to run composer install.

I fully understand the fear of dependency hell, but the reality is that the composer.lock file can also serve as a tool to mitigate this issue. With a lock file, we control updates and handle them one at a time, rather than being forced to deal with a potential slew of potential breaking changes all at once.

Of course, finding a middle ground is the key here. Perhaps we could consider committing composer.lock but also maintain a rigorous process for periodically updating it and testing against different versions. This way, we can maintain the advantages of composer.lock while also keeping up-to-date with our dependencies?

There's no "one-size-fits-all" solution here, and the best approach might depend on the specifics of your project and team.

In fact, a viable middle ground could be to follow the approach adopted by the Composer project itself. They include composer.lock in their repository but exclude it when users download an archive from GitHub. They achieve this by adding an entry for composer.lock in their .gitattributes file. By keeping composer.lock in the repository, contributors who clone the repository can have a consistent and controlled development environment, ensuring reproducibility and enhancing security. At the same time, when a user or a project downloads the project as a dependency, the composer.lock file is excluded, allowing the downloading project's Composer to resolve and install the appropriate versions of the dependencies based on its own composer.json and the downloaded project's composer.json.

This approach can provide us the best of both worlds - it helps ensure a controlled, consistent development environment while also allowing for flexibility in dependency management when the project is used as a dependency itself. It can offer a solution that balances reproducibility, security, and maintainability.

What are your thoughts on this?

@veewee
Copy link
Contributor

veewee commented Jun 19, 2023

Regarding the issue of supporting multiple PHP versions, it's true that managing a single composer.lock file could be challenging. However, some projects manage this by having separate composer.lock files for different PHP versions. It could be an overhead, but it guarantees the integrity and reproducibility of the project across different environments.

Can you link me to one or some of those repositories so that I can take a look on how they manage this?
This would also mean that we e.g. need to ship a PHAR file per PHP version. I'm not sure if this kind of overhead is worth it. Because it now tripled the dependency hell for me :)

As for the min/max dependency checks, while locking dependencies does seem to interfere with this mechanism, it is also a way to ensure that the software is tested against known-good versions of its dependencies. We can keep the process of updating dependencies controlled and deliberate, rather than allowing it to occur arbitrarily based on when a developer or the CI pipeline happens to run composer install.

Now this is actually a thing I think of as a problem: Most of our installs (7,5m) are coming through the regular package's composer installation process. The shim version has a bit more than 1m installs.

Stats:

Since both are being installed through composer, it fit me no purpose to lock the dependencies during development : Since we allow PHP from 74 'til 8.2 various versions of GrumPHP (v1) and on a range of symfony components from 5.4 to 6.4 - I do want to get as much feedback as possible during development. I want to know if it crashes on the combination Symfony 5.4 - PHP 74 - composer1, because this feedback allows me to make sure this keeps on working on pretty much all possible supported combinations. This allows other developers than me to report and/or fix these kind of bugs for me. If I were to lock the dependencies down, they won't be able to reproduce nor fix the issue. (This was for example a very recent case in a bug that got introduced in amp/parallel a few weeks back: I wouldn't be able to detect this with locked dependencies)

Reproducibility here is a non-existing, since every project has its own set of (locked) dependencies.

Of course, finding a middle ground is the key here.

Finding a middle ground is exactly what I am attempting to figure out with this RFC. All opinions and approached are very much welcome!

Perhaps we could consider committing composer.lock but also maintain a rigorous process for periodically updating it and testing against different versions. This way, we can maintain the advantages of composer.lock while also keeping up-to-date with our dependencies?

Then we should at least:

  • validate against min deps
  • validate against max deps
  • validate against locked deps

on every PR / commit, on every platform, on every PHP version + introduce some auto-update lock file mechanism that is somehow PHP version aware. Not sure if this exists already somewhere? If you know of some, can you point me to a package that does this?

Again all very good points you make and I'm really in between visions here - but on the other hand, I can only add some kind of locking mechanism if it isn't a burdon for me to maintain and if it still results in us being able to fix dependency issues in a rather broad set of supported 3rd party packages on very short notice.

@drupol
Copy link
Contributor Author

drupol commented Jun 20, 2023

Can you link me to one or some of those repositories so that I can take a look on how they manage this?
This would also mean that we e.g. need to ship a PHAR file per PHP version. I'm not sure if this kind of overhead is worth it. Because it now tripled the dependency hell for me :)

I must admit, finding a concrete example of a project maintaining separate composer.lock files for different PHP versions proved more challenging than I initially expected. It seems to be a less common practice than I thought, or perhaps it's implemented in a different way that's not immediately visible. My apologies for the confusion.

As for the concerns you're raising, I think I understand where they're coming from. However, I believe the key difference here lies in the nature of the project. Composer itself is a package manager; its main responsibility is managing dependencies, which inherently includes dealing with different PHP versions and a multitude of packages. As such, Composer has mechanisms in place to handle these complexities.

When composer/composer is included as a dependency (in the require or require-dev section), its composer.lock file (if it exists) doesn't impact the depending project. This is because when a project is used as a library, its composer.lock file is ignored. The dependency resolution is based on the composer.json file of the depending project and the composer.json files of its dependencies.

Considering this, the strategy I suggested earlier still stands as a possible solution:

In fact, a viable middle ground could be to follow the approach adopted by the Composer project itself. They include composer.lock in their repository but exclude it when users download an archive from GitHub. They achieve this by adding an entry for composer.lock in their .gitattributes file.

By doing this, we maintain the composer.lock file in the repository for developers, ensuring a consistent environment and reproducibility. However, when Grumphp is downloaded as a dependency, its composer.lock file is not included (and even if it would be, it's not even needed for dependency resolution), allowing the dependency resolution to be determined by the project that's using Grumphp as a dependency. This way, we might be able to balance the benefits of the composer.lock file without adding too much overhead.

I hope this clears up some confusion. Let's continue discussing to find the best approach and also, feel free to let me know if I missed anything or if I'm totally wrong.

Thanks.

@veewee
Copy link
Contributor

veewee commented Jun 21, 2023

As for the concerns you're raising, I think I understand where they're coming from. However, I believe the key difference here lies in the nature of the project. Composer itself is a package manager; its main responsibility is managing dependencies, which inherently includes dealing with different PHP versions and a multitude of packages. As such, Composer has mechanisms in place to handle these complexities.

The suggested way of composer's installation is by downloading the binary directly from the internet.
Composer's lock file contains actual the dependencies that they have defined in their lock file.
This caused things like composer extensions (which grumphp acts like) to have been locked on an ancient version of symfony/process for ages. From their point-of-view, this results in a very stable and trustworthy package - which it is! Yet is causes problems for extensions who also want to use symfony/process from inside the composer extension - which is basically why this package provides a wrapper around ext-proc here.
So conclusion here : lock file is defenitely the way to go!

Another project you mentioned before is phpunit. Yet whilst searching their repository, I did not encounter any composer.lock file. In contrary: the build process explicetly removes it. Even though the suggested way is installation from their online binary.
I'dd say the nature of this package is more in line with phpunit:
You can download it's binary with locked dependencies - which is fine. But it also acts like a library since you need a specific set of core classes like the TestCase from which you can extend. These are PHP classes which are being leaked by PHPunit's autoloader on which you can extend your test-cases.

When composer/composer is included as a dependency (in the require or require-dev section), its composer.lock file (if it exists) doesn't impact the depending project. This is because when a project is used as a library, its composer.lock file is ignored. The dependency resolution is based on the composer.json file of the depending project and the composer.json files of its dependencies.

That is indeed how lock files work. The lock file would only be used for 1. people who want to hack on GrumPHP - or 2. by the nix build process you introduces in order to guard for reproducibility. As mentioned in previous comment, I don't really want option 1 to be true : I want them to work with latest or lowest dependencies so that we can detect issues early. Can we agree I'll have to go through a lot of pain for this 1 sole (at the moment rather limited) purposes?

That's why I suggested of shipping the lock file as part of the phar release generation process and link it as an attachment to the release instead in the first place: at the moment of phar generation, I have a lock file that works on minimum supported PHP version. There is no guarantuee though that it is guarded against security risks atm and that it works an ALL PHP versions (because of dependencies). Yet, with current set of dependencies - it should to the trick.

@Ocramius
Copy link

Ocramius commented Jun 21, 2023

Discussed in https://discord.com/channels/1090672946370052166/1091081297759309926/1121038709295435846 with @veewee

The problem of multiple PHP versions and one composer.lock

Practically, grumphp supporting multiple versions means that the base repository (this project) would require different composer.lock for each supported PHP minor version (because PHP breaks BC in every minor, that is the minimum necessary): that is not sustainable.

While I understand that @drupol wants to make a distribution of this project, this repo per-se is not really a distribution (the .phar is, though).

Non-phar distribution via Nix flake

@drupol could add a Nix flake based on one possible PHP version, acting as "one possible distribution". Assuming we can add "one possible distribution", we have following solutions:

  • the composer.lock could be put in something like .gitattributes, so that .zip and .tar.gz downloads via github don't include it
  • the composer.lock could be placed in a non-standard location, not affecting downstream consumers, but allowing @drupol to pick it up in their flake tooling

A note on current version support / potential simplification

Looking at 2.0.0, I noticed that the PHP version support has been restricted to "php": ">=8.1", which means that you currently only support PHP 8.1 and PHP 8.2.

Assuming that is a stable assumption, you could try and run CI against composer.lock for both PHP 8.1 and PHP 8.2, and say "we have one composer.lock" (occam's razor solution).

@veewee
Copy link
Contributor

veewee commented Jun 21, 2023

Just some small side-marks from my point of view:

which means that you currently only support PHP 8.1 and PHP 8.2.

It's a bit more complicated
We recently launced v2 which has a minimum versoin of PHP 8.1+ because of fibers.
Yet we still support v1 which supports PHP 8.0+ (or security fixes on PHP 7.4 on older minors)
I can see a lot of extra (merge) troubles there as well if we were to add and maintain lock files over all these branches.

Assuming that is a stable assumption, you could try and run CI against composer.lock for both PHP 8.1 and PHP 8.2, and say "we have one composer.lock" (occam's razor solution).

That's what we basically do right now for building the phar distribution: It installs the dependencies for lowest PHP version and assumes it works for upper PHP versions as well. The github actions does a very shallow check if this works.
So you could say that the phar's lock file could be saved somewhere in the repo and you could use that as a source for your nix flake.

@Ocramius
Copy link

Ocramius commented Jun 21, 2023

Yet we still support v1 which supports PHP 8.0+ (or security fixes on PHP 7.4 on older minors)

"Support" as in "fix bugs": do you really care about anything non-bug merged up? I wouldn't even introduce composer.lock in v1 anymore.

It installs the dependencies for lowest PHP version and assumes it works for upper PHP versions as well.

<insert nervous laughter here/> 😅

From what you say here, you already have what you'd call a "committed composer.lock", you just didn't formalize it, and it's implicitly part of your build and distribution process.

So you could say that the phar's lock file could be saved somewhere in the repo and you could use that as a source for your nix flake.

Yeah, I don't like .phar, but that's a discussion for another place/time.

I'd say that what @drupol will potentially be able to provide is both better and more stable at the same time.

@veewee
Copy link
Contributor

veewee commented Jun 21, 2023

@drupol taking the whole discussion into consideration, I believe this PR is the best thing I can do for you:
phpro/grumphp-shim#22

Together with the grumphp-shim package, we will ship an additional phar.composer.lock with the project that keeps track of the dependencies inside the PHAR distribution. This lock file can be used for guarding reproducibility if you wish to compile your own version of GrumPHP and is validated against all actively supported PHP versions by CI during release.

Your nix flake would now need to:

  • download release of grumphp's codebase
  • download the phar.composer.lock from the grumphp-shim codebase with the same release version and use that as a composer.lock file
  • Run the rest of the process.

I don't think this will change much to your current process, since you already point to an external lock file.

Can you find yourself in this solution?

@drupol
Copy link
Contributor Author

drupol commented Jun 21, 2023

@veewee @Ocramius thank you so much for your valuable inputs.

I have been thinking about this thread back and forth. Since Grumphp act as both a library and a binary, finding a balance for these distinct use-cases can indeed be challenging 😅
Also, I don't want to be "that guy" perceived as the one imposing requirements solely based on my own needs related to Nix, hence seeking a common ground is crucial.

As such, I believe the PR you're proposing provides a sufficient solution for now. While it may not be my favorite choice, it does contribute towards the goal of having the composer.lock managed by you, the package author and it certainly gets the job done.

The reason of all of this is because I'm working on quite a pull request within Nix, and I've been working on it for the past 2 months. Right now, making sure a Composer-based application can be "reproducable" in Nix is not so straightforward. Having a composer.lock file in the source repository is one of the hard requirements to achieve this.

A collateral bonus feature of this work is that all the projects distributed in Nix via a PHAR file will be replaced by their original source code counterpart. This won't change anything for the end user.

There are lots of PHP projects used as binaries that don't have a composer.lock file, and I have to add it myself. I don't think this is the best way to go about it. You can see an example of me doing this for Grumphp in this commit. That's why I'm planning to ask these project's authors to keep a composer.lock file in their repository.

Grumphp is the first project where I ask such a thing, but given the valuable inputs I received it, I will likely refine my approach and my request for the next project.

@veewee
Copy link
Contributor

veewee commented Jun 21, 2023

As such, I believe the PR you're proposing provides a sufficient solution for now. While it may not be my favorite choice, it does contribute towards the goal of having the composer.lock managed by you, the package author and it certainly gets the job done.

Nice, glad we agreed on a common ground and can continue collaborating in building amazing tools for the community. I've merged the PR and it will all work starting from the next release in the v2 range of GrumPHP. For now the lock file is already available in the branch of the grumphp-shim repo.

Grumphp is the first project where I ask such a thing, but given the valuable inputs I received it, I will likely refine my approach and my request for the next project.

Your work on nix looks amazing! I wish you good luck in persuading other packages as well.
Maybe one of those packages might introduce a better common ground - we can always reopen the discussion.
For now, I consider this solved :)

@drupol
Copy link
Contributor Author

drupol commented Jun 21, 2023

FYI, I just updated the Grumphp derivation, including YOUR composer.lock :) See the relevant commit.

Issue closed for me as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants