
make() with multiple targets very slow, seems to do too much checking #110

Closed
krlmlr opened this issue Oct 12, 2016 · 8 comments

@krlmlr
Collaborator

krlmlr commented Oct 12, 2016

I don't have an all target; instead I run make(list_targets()). On a mid-size project with many interdependent targets (~4 GB overall file size), it takes about 4 minutes just to detect what needs to be done, even when the project is already up to date. Checking a single target (with many indirect dependencies) is a lot faster.

For this particular case, I'd be fine if a missing all target meant that all targets are built by default.

@richfitz
Owner

yowch. There have been previous reports of slowness but I don't have anything to benchmark with. If you can profile your problem (or share it) it would be nice to get to the bottom of it.

I'm not surprised that there's too much checking though - the original version was written for correctness rather than speed and I never got around to actually trying to speed it up. I'd hazard a guess that the issue is that objects are being loaded into R when they can just be skipped over...

@krlmlr
Collaborator Author

krlmlr commented Oct 12, 2016

I'll see if I can whip up a reprex.

@krlmlr
Collaborator Author

krlmlr commented Oct 13, 2016

I think the main cause here is the hashing of file targets. Using check: exists should speed things up, but I'll keep this open because ideally targets should be checked only once per remake run.
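(For context, a minimal remake.yml sketch of what check: exists looks like on a file target; the target and command names here are hypothetical, not from this project:)

```yaml
targets:
  all:
    depends: processed.csv

  processed.csv:
    command: process("raw.csv")
    check: exists   # only check that the file is present; skip hashing its contents
```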

@richfitz
Owner

Right - so within a single run of remake there's no need to rehash them. That seems totally worth implementing.

We could probably go further: files that are created by remake should never need to be hashed, because we already know the hash when we write it to the DB.

As a third related thing: hashing large files is expensive, and I wonder whether it would suffice in many situations to hash the first (or last) n bytes along with the file size. I believe that reading the file (rather than the hashing itself) is the bottleneck here.
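A sketch of that idea (in Python for illustration only; this is not part of remake, and the choice of n and digest algorithm is arbitrary):

```python
import hashlib
import os

def quick_hash(path, n=1 << 16):
    """Approximate fingerprint of a file: its size plus a hash of
    the first (and, for large files, last) n bytes. Much cheaper
    than reading the whole file, at the cost of missing changes
    confined to the middle."""
    size = os.path.getsize(path)
    h = hashlib.md5()
    h.update(str(size).encode())       # size guards against same-prefix files
    with open(path, "rb") as f:
        h.update(f.read(n))            # first n bytes
        if size > 2 * n:
            f.seek(-n, os.SEEK_END)    # last n bytes, if they don't overlap
            h.update(f.read(n))
    return h.hexdigest()
```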

richfitz added a commit that referenced this issue Oct 19, 2016
Within a session, this should avoid recomputing file hashes where
the file size and mtime have not changed.  This exploits remake's
internal caching, which is not tested in this commit (and this
entirely lacks any sort of integration test).

For #110
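The caching the commit describes could be sketched roughly like this (a hypothetical Python sketch, not remake's actual R implementation):

```python
import hashlib
import os

class HashCache:
    """Within-session cache: re-hash a file only when its size
    or mtime has changed since the last lookup."""

    def __init__(self):
        self._cache = {}  # path -> ((size, mtime), digest)

    def hash_file(self, path):
        st = os.stat(path)
        key = (st.st_size, st.st_mtime)
        cached = self._cache.get(path)
        if cached is not None and cached[0] == key:
            return cached[1]          # unchanged: reuse the stored digest
        with open(path, "rb") as f:   # changed or unseen: hash the full file
            digest = hashlib.md5(f.read()).hexdigest()
        self._cache[path] = (key, digest)
        return digest
```

Note the usual caveat with this scheme: mtime resolution is coarse on some filesystems, so a file rewritten within the same timestamp tick and to the same size would not be re-hashed.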
@richfitz
Owner

Can you try the hash_files branch and see if this helps at all?

@richfitz
Owner

Hmm, not sure why appveyor didn't like that - might be something in the mocking...

@krlmlr
Collaborator Author

krlmlr commented Oct 20, 2016

I have commented on the PR. Have you seen #126? It helps for single, but not for repeated, make() calls.

@richfitz
Owner

Yeah, that looked good - I just need to merge it :)

(I will get more up to speed on all of this over the next couple of weeks, just taking a bit of time to shift gears between projects)
