Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove generated build artifacts from git repository #1065

Open
alerque opened this issue Jul 12, 2019 · 13 comments
Open

Remove generated build artifacts from git repository #1065

alerque opened this issue Jul 12, 2019 · 13 comments
Assignees
Labels
A-WISH Some kind of improvement request, hare-brained proposal, or plea.

Comments

@alerque
Copy link
Collaborator

alerque commented Jul 12, 2019

This is more of a note to self rather than a request for anybody else to do this work so feel free to assign this issue to me.

There is a ton of duplicated content committed to the repository in files that are just format conversions of each-other (largely using Pandoc). These should ideally be stripped and the build process should make them as a matter of course. This may include generating a source tarball generator that includes these initial build steps (in addition to one that has fully built binaries) or setting up CI artifacts somewhere predictable with the extra formats available for packagers that don't want to have the whole build toolchain.

@alerque alerque added the A-WISH Some kind of improvement request, hare-brained proposal, or plea. label Jul 12, 2019
@schoettl
Copy link
Collaborator

schoettl commented Jul 12, 2019

But please do not remove the (Make-generated) shell completion scripts (under shell-completion/).

I propose that the Makefile is not run at every built but rather manually when the CLI changes.

-- from documentation

Because it takes a noticable time to generate the completions. hledger is executed with --help multiple times (e.g. for all sub-commands).

Also, they are not generated from the root Makefile.

@alerque
Copy link
Collaborator Author

alerque commented Jul 12, 2019

@schoettl Please excuse me if I'm being dense, but in my experience with make (which is actually quite a bit, but certainly not exhaustive) the entire goal is to have a chain of rules that understand exactly when something needs to be regenerated (and when it doesn't) based on what source files have changed vs. what objects are created in the end.

I don't understand either why having or not having generated artifacts committed to the git repository would have any effect at all on the situation your describing.

@schoettl
Copy link
Collaborator

The only make rule, I can think of, that takes dependencies into account, is "Generate the completion scripts whenever hledger binary has changed."

But actually (as quoted above), the scripts should only be generated, when the CLI changes (i.e. the options and sub-commands). Because this step takes so much time, it should not be executed on every hledger binary change.

For me, generating the completion scripts is more like a version bump.

And shell-completion/Makefile also uses external programs (hledger, and GNU tools, that are maybe not POSIX compliant or installed). So if you really have to delete the all artifacts, than shell completion needs much attention!

Otherwise, which make target do you propose?

@alerque
Copy link
Collaborator Author

alerque commented Jul 14, 2019

@schoettl As best I can make out I think you're conflating two different issues. I don't think what I'm proposing here will have a substantial impact on the issue you are raising. I'm not exactly sure what piece you're missing, but maybe some of these comments will help sort out what is what.

  • Having generated artifacts in the git repository or not will not change what needs to be built when. The rules for what files need to be regenerated when other files change would not be affected at all by this change. Even if a prebuilt artifact exists in the repository now, it still goes out of date and needs to be regenerated when other files change. The build process does not take into account whether a file is tracked by the repository or not, only its state on disk.

  • Usage of a properly configured build system involves naming the final product that you are interested in. The build rules are responsible for figuring out what has to happen to get that final product, i.e. what needs to be refreshed along the way vs. what can be reused. If your goal is to work on the completion routines, then, then they will in fact become out of date whenever the upstream program changes. (This could be improved by splitting the help output into different file than the rest of the code, but that's not part of the scope of this issue.) This will not be affected by whether or not the artifacts are tracked in the repository except on the first clone of the repository. Tracking the build artifacts means people can skip the build step entirely if they don't change any code. Every subsequent build run will take exactly the same amount of time.

    If your goal is not to work on the completions and just hack on the binary, you don't need to invoke the target that generates those artifacts, just call the target of the final component you actually need.

  • If saving having to build some particular artifact at all is the issue (i.e. you don't want to even have the full build chain setup) then you shouldn't be cloning the code repository, you should be downloading a tarball that has all the artifacts generated. This could include binaries, or just "sources" used by distro install systems to generate packages.

Hopefully something in there helps sort out what is what. If not I'd be interested in hearing what exactly the workflow is that you are concerned will be degraded by my proposal.

@schoettl
Copy link
Collaborator

Can I ask again, what build rule do you suggest?

If not I'd be interested in hearing what exactly the workflow is that you are concerned will be degraded by my proposal.

I want the shell completion to be promoted and used because they are new and very handy. If the artifacts are not committed, I see the danger that

  1. they are forgotten (because they are not generated as part of the root Makefile; can be fixed).
  2. they are not generated because dependencies of their build process fail – and no one cares about.
  3. they are wrongly generated because the Makefile use the installed (older?) version of hledger, not the newly compiled hledger.
  4. people are reluctant to call the make target that also generates completions because it takes so much time. In the worst case, completions would not be packaged for distros.

In addition, as long the completions are not packaged for distros, they should be downloadable for users. Users should not have to generate them.

Where is the tarball you mentioned? Where can it be downloaded and where is it generated? I don't know if the completion scripts are in there.

@simonmichael
Copy link
Owner

simonmichael commented Jul 14, 2019 via email

@alerque
Copy link
Collaborator Author

alerque commented Jul 14, 2019

@schoettl I wasn't actually proposing a build rule per-se. Those should already all exist. This issue was only to propose removing build artifacts from git where they cause a lot of duplication, clutter, and extraneous commits that make the development process harder to follow — something that is almost entirely orthogonal to how the build rules are configured. This is about where and how artifacts are stored, not how and when they are built. That being said it's apparent there are likely issues with build rules as well that may get fixed along the way, but what I'm trying to say is that these aren't really part of the same issue.

About your workflow points...

  1. Updated completion scripts should always be part of the "total package" build target, whatever that is, and also part of whatever "install" command is used for system installation (maybe not user local, but definitely system). This is usually where distro package management picks them up, so this issue needs to be fixed by placing the build targets. With correct targets, it will not be affected at all by not tracking the artifacts in git.

  2. If the build process fails and nobody cares about it ... I'm sorry but that's somebody else problem. Our job is to make the build process work and work as easily as reasonably possible. Distro package maintainers should always care, but we can't fix other people not caring. Again if setting up the required build toolchain is too much trouble for somebody then they should be using some kind of pre-built package, not building it. We can offer source tarballs in addition to binary ones that include some build artifacts not tracked in the repository. All I'm suggesting is that tracking build artifacts is solving the problem with the wrong tool.

  3. Again, this is a case that should be tracked and fixed in a separate issue. By this logic we should just commit the hledger binary to the repository too ;-) If we can't trust the build system to track dependencies properly, it needs to be fixed. Having somebody build and add artifacts to git and expect them to manually keep them up to date is not the right solution.

  4. Build time is barely a consideration for distro packaging. If the build rules do what they are supposed to the package will get built, and this is not really a hold up for anybody. If distro packages don't have all the files they are supposed to be distributing as part of the install, then bug reports should be filed or fixes contributed so they do. An example would be Arch Linux currently does not place the man files properly (see Arch Linux man page missing #1050). That's a bug in the packaging script. Those sort of things happen even when intermediate artifacts are tracked. What I'm trying to say is that tracking artifacts isn't the solution to this problem.

The "source" tarballs with source format conversions as required by some packaging systems that have no "build" mechanism is not something that exists yet, I'm proposing fixing that.

@alerque
Copy link
Collaborator Author

alerque commented Jul 15, 2019

Because it takes a noticable time to generate the completions.

I'm seeing about 1.2 seconds run time for make all after make clean using the current Makefile setup. Is this the delay you were referring to @schoettl or is there some other step that I'm missing that is the holdup you were worried about?

If there are any relevant concerns that I'm missing please feel free to note them on the dedicated issue I just opened.

@schoettl
Copy link
Collaborator

Did you refer to the Makefile in shell-completion/? At my computer it takes more than 8 seconds.

make clean && time make all

For packaging or installing, sure a few seconds are OK.

@alerque
Copy link
Collaborator Author

alerque commented Jul 15, 2019

I'm a little surprised at that number @schoettl as my test system is running a budget level CPU released in 2012, and the system built around it is pretty pedestrian by today's standards. I sounds like there is probably something outright wrong going on there, but I'm not sure where to start. There appear to be ~40 invocations of the hledger binary's help output (in batches of 8) and a few (too many) invocations of sed going on for each. Even with the obvious inefficiencies going on there, 8 seconds is way too long to do that much work.

@schoettl
Copy link
Collaborator

I don't know. Arch Linux on ThinkPad with Intel core i5.

@alerque
Copy link
Collaborator Author

alerque commented Jul 15, 2019

@schoettl Can you perhaps try a very crude benchmark to see you have an unexpected bottleneck either with instantiating hledger itself or with doing so in parallel like this:

$ time (seq 1 100 | xargs -iX -n1 hledger > /dev/null)
18.03s user 1.22s system 96% cpu 19.881 total

$ time (seq 1 100 | parallel -j8 'hledger > /dev/null #{}')
22.11s user 1.42s system 553% cpu 4.252 total

Everything else I see other than these basic operations is a bunch of small shell and utility processes that tend to be pretty fast (but they do add up). I'm taking a guess, but I suspect your speed problem will manifest itself in one or both of those two operations and the other little bits just exacerbate the base operation being too slow.

@schoettl
Copy link
Collaborator

schoettl commented Jul 15, 2019

$ time (seq 1 100 | parallel -j8 'hledger > /dev/null #{}'); uptime
real	0m21.067s
user	1m17.648s
sys	0m1.872s
... load average: 2.77, 1.31, 0.80
$ time (seq 1 100 | xargs -iX -n1 hledger > /dev/null); uptime
real	0m30.810s
user	0m29.277s
sys	0m1.032s
... load average: 0.70, 0.82, 0.63

Both benchmarks make CPU fans being loud. And there is not so much difference...
Though my SSD has only 450 MB free space. Can that be a reason?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-WISH Some kind of improvement request, hare-brained proposal, or plea.
Projects
None yet
Development

No branches or pull requests

3 participants