Read .beam files from cached paths in the client #7981

Merged: 1 commit merged into erlang:master on Jan 25, 2024

Conversation

@josevalim (Contributor) commented Dec 27, 2023

This reduces the amount of work on the code server
and erl_prim_loader, by loading data on the client
whenever possible. This speeds up compilation of
Livebook from 7.2s to 6.9s (roughly 4%) and should
keep fewer binary references around.

This is done by introducing erl_prim_loader:read_file/1,
which stays closer to Erlang's file reading semantics
and does not rely on erl_prim_loader code paths.
erl_prim_loader:get_file/1 and erl_prim_loader:get_path/0
should be considered deprecated.

Future work should fully remove the concept of paths
from erl_prim_loader and move them into init.
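
For illustration only, a minimal sketch of the client-side idea, assuming erl_prim_loader:read_file/1 returns {ok, Binary} | {error, Reason} in the style of file:read_file/1; the helper below is hypothetical, not code from this PR:

    %% Hypothetical helper: read a module's .beam from a known directory in
    %% the calling (client) process, rather than asking erl_prim_loader to
    %% resolve the module through its own code paths with get_file/1.
    load_from_dir(Mod, Dir) when is_atom(Mod) ->
        File = filename:join(Dir, atom_to_list(Mod) ++ ".beam"),
        case erl_prim_loader:read_file(File) of
            {ok, Bin} ->
                code:load_binary(Mod, File, Bin);
            {error, Reason} ->
                {error, Reason}
        end.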

github-actions bot commented Dec 27, 2023

CT Test Results

No tests were run for this PR. This is either because the build failed, or the PR is based on a branch without GH actions tests configured.

Results for commit de9942e

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run tests locally.

// Erlang/OTP Github Action Bot

@josevalim changed the title from "Read .beam files for cache paths in the client" to "Read .beam files from cached paths in the client" on Dec 27, 2023
erlang:display({?MODULE,file_error}),
erlang:display(Report)
end,
error;
@josevalim (Contributor, author):

The changes to this function are just a refactoring to remove a case catch.
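
For context, a generic before/after of that kind of refactoring (do_read/1 is a placeholder, not the function touched by this diff):

    %% Old style: `case catch` turns any crash into an {'EXIT', Reason} term
    %% that then has to be matched alongside the normal results.
    read_old(File) ->
        case catch do_read(File) of
            {'EXIT', _Reason} ->
                erlang:display({?MODULE, file_error}),
                error;
            Res ->
                Res
        end.

    %% Refactored: handle failures in a try ... catch, so normal results flow
    %% through untouched and no {'EXIT', _} pattern is needed.
    read_new(File) ->
        try
            do_read(File)
        catch
            _:_Reason ->
                erlang:display({?MODULE, file_error}),
                error
        end.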

@jhogberg self-assigned this on Dec 28, 2023
@jhogberg added the team:VM (Assigned to OTP team VM) label on Dec 28, 2023

@jhogberg (Contributor):
Thanks for the PR! I'll add it to the nightly runs after New Year's Eve. :)

@josevalim (Contributor, author):

Thank you and happy holidays! 🍾

This is a note to myself on future work:

  1. After this is merged, move get_modules to fully work on the client (and spawn processes on the client)
  2. Once Erlang/OTP 27 is out, remove get_path and get_file (assuming "Remove erl_prim_loader fallback on module lookup" #7697 is merged)

@jhogberg (Contributor) commented Jan 8, 2024

code_SUITE:set_path/1 in the kernel tests seems to fail across the board with this change :(

@josevalim (Contributor, author) commented Jan 9, 2024

@jhogberg apologies, I have pushed a fix. It was actually a silly failure. This line was failing:

{error, bad_directory} = (catch code:set_path([{a}])),

As you can see, it is pushing a bad path to the code server. However, instead of discarding invalid code paths, the code_server was passing an invalid argument to erl_prim_loader. This worked because erl_prim_loader wraps all function calls in a ?SAFE macro:

-define(SAFE2(Expr, State),
        fun() ->
                case catch Expr of
                    {'EXIT',XXXReason} -> {{error,XXXReason}, State};
                    XXXRes -> XXXRes
                end
        end()).

When I moved the actions to the client, I did not add the ?SAFE macro for client actions (and, honestly, I don't think we should), so now we get a failure.

I understand why we want to keep these essential processes from crashing, but I think it is bad practice for the code_server to assume it can send garbage to erl_prim_loader and everything will be fine. The proper fix is to validate data at the boundary, and that's exactly what I did: I force-pushed a change that moves path normalization to code.erl.

In a future PR, I will push more guards to erl_prim_loader, so we can catch other places that are sending garbage to erl_prim_loader.

PS: If you want me to add a ?SAFE call around the client code, I can do it, but I would prefer to add guards and lock its API down. :)
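
As a minimal sketch of the "validate at the boundary" idea (hypothetical function names; the actual change moves path normalization into code.erl):

    %% Hypothetical: reject anything that is not a directory string before it
    %% reaches code_server or erl_prim_loader, and normalize what remains.
    normalize_paths(Paths) when is_list(Paths) ->
        try
            {ok, [normalize_path(P) || P <- Paths]}
        catch
            throw:bad_directory -> {error, bad_directory}
        end.

    normalize_path(P) when is_list(P), P =/= [] ->
        filename:join([P]);        % also collapses "foo//bar" into "foo/bar"
    normalize_path(_Other) ->      % e.g. the {a} tuple from the failing test
        throw(bad_directory).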

@jhogberg (Contributor) commented Jan 9, 2024

Thanks!

> In a future PR, I will push more guards to erl_prim_loader, so we can catch other places that are sending garbage to erl_prim_loader.

Sounds good.

end;
false ->
{{error, bad_directory}, Path}
Dir = filename:join([Dir0]), % Normalize
Contributor:

Isn't this already done in the client in normalize_paths?

@josevalim (Contributor, author):

Not really:

1> filename:join(["foo//bar"]).
"foo/bar"

We could move it into normalize_paths, though, and do more work on the client (it is on my to-do list), but I am being very conservative with these changes. For this PR, I tried to change code_server only as much as necessary to support the changes to erl_prim_loader.

I also want to revise all uses of filename:* in code_server, because I think there is some duplication too.

Contributor:

That definitely makes sense

@@ -1241,23 +1209,23 @@ loader_down(#state{loading = Loading0} = St, {Mod, Bin, FName}) ->
St
end.

mod_to_bin([{Dir, nocache}|Tail], ModFile, Acc) ->
Contributor:

Looking at this code, I think we should split the cache list from the path list.
In this function, every time we look up a module we update the entire code path, which can be quite a lot of work and extra allocation.
I think it would be quite beneficial for performance to split the cache out into a map separate from the code path; that way, once the cache is warmed up, lookups will do little to no allocation.
This can be done as a follow-up PR, though, since the design of updating the code-path data structure on every module lookup is already present today.

@josevalim (Contributor, author):

Interesting!

> I think it would be quite beneficial for performance to split the cache out into a map separate from the code path; that way, once the cache is warmed up, lookups will do little to no allocation.

Would you have the map keys be the code paths? Then lookup becomes a bit more expensive. But I guess we could also generate unique integers to associate with them?

@michalmuskala (Contributor) commented Jan 9, 2024

Yes, you're right, this could have some implications for lookup time.
One option could be to maintain two parallel lists and update the cache only when an actual update happens. That would have performance characteristics similar to today's and would avoid retaining the updated list when nothing changed (which should be most of the time), but it would still produce some garbage on every lookup.

With a map, one approach could be to "intern" the paths by converting them to atoms, though I'm not sure whether there are any code-path-length implications to that, and atoms sometimes have a poor hash distribution, which might hurt lookup performance. Another, as you suggested, could be some manual interning, using the integer position in the code path list as the key to the map.

It might also be worth exploring the process dictionary as the storage: traditionally it's the fastest option, though a bit ugly and questionable to use, but it might make sense here.

@josevalim (Contributor, author) commented Jan 9, 2024

> Another, as you suggested, could be some manual interning, using the integer position in the code path list as the key to the map.

I was thinking I could simply use an ever-incrementing integer, so there is no need to rehash as code paths are added or removed. :) So each code path entry becomes a tuple {Path, IntKey} and we keep a separate map.
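
A minimal sketch of that layout, with hypothetical names: path entries become {Dir, IntKey} tuples, per-directory caches live in a separate map keyed by that integer, and new entries take the next value of a counter, so nothing needs rehashing when paths are added or removed:

    %% Hypothetical shape: Path = [{Dir, IntKey}], Caches = #{IntKey => #{ModFile => true}}.
    %% A warm lookup only reads the map; it does not rebuild the path list.
    lookup(ModFile, [{Dir, Key} | Rest], Caches) ->
        case Caches of
            #{Key := Cache} when is_map_key(ModFile, Cache) ->
                {ok, filename:join(Dir, ModFile)};
            _ ->
                %% Miss for this directory; a real implementation would scan
                %% Dir, populate Caches, and retry before moving on.
                lookup(ModFile, Rest, Caches)
        end;
    lookup(_ModFile, [], _Caches) ->
        error.

    %% Adding a path takes the next integer; existing keys are untouched.
    add_path(Dir, NextKey, Path, Caches) ->
        {[{Dir, NextKey} | Path], Caches#{NextKey => #{}}, NextKey + 1}.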

Contributor:

That makes sense as well

@@ -1280,6 +1246,12 @@ with_cache(cache, Dir, ModFile) ->
with_cache(Cache, _Dir, ModFile) when is_map(Cache) ->
{is_map_key(ModFile, Cache), Cache}.

absname_when_relative(File) ->
    case filename:pathtype(File) of
        absolute -> File;
Contributor:

I think this makes the absolute clause in the absname/2 function dead

@josevalim (Contributor, author):

Yeah, I want to look at this later; this change is just to avoid duplication. I am taking notes, so if you have more suggestions, let me know. :)

@jhogberg (Contributor):

There are a few more failed test cases that appear related to this PR: much of release_handler_SUITE in sasl, and otp_SUITE:test_runtime_dependencies_versions/1. Sorry about taking so long to get back :(

@josevalim (Contributor, author):

I did fix one failure (same root cause, passing invalid values to erl_prim_loader).

However, @jhogberg, do you still have the full report of the failures? release_handler_SUITE fails on my machine even without this patch, so I am not sure whether there are more failures lurking. And test_runtime_dependencies_versions is skipped on my machine with the reason "this test case is designed to run in the Erlang/OTP teams test system for daily". If you have the full report, I can double-check, but it may be good now.

@bjorng (Contributor) commented Jan 22, 2024

> release_handler_SUITE fails on my machine even without this patch, so I am not sure whether there are more failures lurking.

Did you run it with an installed Erlang system (make release or make install)? If you run it from the git repository, there will be failing test cases because the directory structure is different from an installed system.

@josevalim (Contributor, author):

@bjorng ah, that may be it; I only did ./otp_build all -a && ./otp_build tests. But we should be good to go now: I have fixed the failures from the reports @jhogberg sent me!

@jhogberg jhogberg merged commit de9942e into erlang:master Jan 25, 2024
11 of 13 checks passed
Labels: team:VM (Assigned to OTP team VM)
4 participants