Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CalculateChecksums hangs with mutually referencing objects. #1692

Open
bobbens opened this issue Apr 4, 2024 · 7 comments · Fixed by #1693
Open

CalculateChecksums hangs with mutually referencing objects. #1692

bobbens opened this issue Apr 4, 2024 · 7 comments · Fixed by #1693

Comments

@bobbens
Copy link

bobbens commented Apr 4, 2024

Steps to reproduce

  1. In the Rules file preprocess section. create items that reference each other. Can just be attributes like foo[:ref] = bar and bar[:ref] = foo where foo and bar are two items.
  2. Try to compile.

Expected behavior

Compiles normally.

Actual behavior

It hangs on CalculateChecksums.

Details

It's probably hitting an infinite lop trying to calculate the checksum going back and forth from foo and `bar.

I'm trying to display complex data of multiple types that reference each other for ease of access. I was trying to do this directly with the items so that it embeds much neater with nanoc. All the data is being pulled from xml files being embedded as file meta-data. I wanted to avoid using some sort of external dictionary and having to do look ups all the time, however, given that mutual references deadlock the compilation process, it seems I'll have to rethink the approach.

If supporting this is out of the scope of nanoc, it would be nice to at least have it try to detect the looping and spit an error or the likes. Thanks!

Crash log

Not crashing, so no crash log.

@bobbens
Copy link
Author

bobbens commented Apr 4, 2024

I tried to create a minimum example and am not reproducing the issue. It is likely to be somewhere else. Sorry for the noise.

@bobbens bobbens closed this as completed Apr 4, 2024
@denisdefreyne
Copy link
Member

Objects that reference each other should be handled by the checksummer correctly (test cases for it exist too).

In general, I stick to primitive data types (strings, arrays, hashes, numbers, booleans and so on) in item metadata, and it might help you in this situation to store item identifiers, rather than items. For example, rather than

@item[:parent_item] = other_item

you would do this:

@item[:parent_item_identifier] = other_item.identifier

You can then look up items by identifier (e.g. @items[@item[:parent_item_identifier]]) which is quite fast.

I started typing this reply as you closed it, but hopefully this advice is still useful!

@bobbens
Copy link
Author

bobbens commented Apr 4, 2024

@denisdefreyne Thanks!
OK, I've looked into it more. I'm not sure if it's worth opening for this, but I managed to get a reproducible case. I've attached below.

lore.zip

I've been playing around a lot with the code to try to figure out why it's slowing down. At the end, it seems to be related to using complex types in attributes and referencing other items. It always stops at CalculateChecksums.

I've prepared the following Rules that reproduce the issue. The difference between them is minimal.

Rules.good -> compiles
Rules.bad -> basically hardlocks

Only change is below:

57c57
<             s[:jumps].push( jmp[:"+@target"] )
---
>             s[:jumps].push( nametossys[ jmp[:"+@target"] ] )

Basically it is storing items instead of names (basic types), and this makes it take forever (waited like an hour without it finishing), compared to <10 seconds for checksums when not using items. It may be worth looking into why there's this incredible slowdown. However, since your approach of using identifiers and then doing Hash lookups seems to work, I'll just use that for now, although it makes the code a bit less elegant. Thanks!

@denisdefreyne
Copy link
Member

Well, now I want to play naev… 🙃

I’ll take a look!

@denisdefreyne
Copy link
Member

A fix for this isn’t trivial, but at least I’ve figured out what the cause is now: each item is checksummed individually, but when items reference each other directy, then that means calculating the checksum for much more (including all contextual site information that is embedded in item references), which is why this takes so long.

A good fix, I think, would be to reuse the Checksummer instance, so that it can reuse already-calculated checksums. #1693 is a step in the right direction, but much more to go.

@bobbens
Copy link
Author

bobbens commented Apr 5, 2024

Sounds great! Looking forward to the fixes.

Also a side note, but I noticed that the check-summing doesn't seem to be threaded. (at least it was getting stuck with 100% CPU usage here). That may be another point of improvement.

@denisdefreyne
Copy link
Member

Fixing this is tougher than I expected and a fix might not be coming soon. But storing item identifiers rather than items will still work well.

Also a side note, but I noticed that the check-summing doesn't seem to be threaded. (at least it was getting stuck with 100% CPU usage here). That may be another point of improvement.

Unfortunately, parallelism in Ruby is not practically possible. The only option is using Ractor, which is too experimental to use in release settings.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants