Skip to content

Conversation

marcandre
Copy link
Contributor

As discussed on the mailing list, I think this adds the right check by hashing the content, but I'm lacking tests.

Does it look like I'm on the right track?

@josevalim
Copy link
Member

@marcandre I propose we try a slightly different approach that does not require us to read all snapshots upfront. It can be made in two steps:

  1. First store the MD5 in the manifest. You will add a new field called md5 here and also bump the @manifest_vsn. Now for each file compiled, you compute and store the MD5 here: https://github.com/elixir-lang/elixir/blob/master/lib/mix/lib/mix/compilers/elixir.ex#L447 - you can expect the file to exist at this point

  2. Now in the line of code you removed in the other PR, you can compare the MD5, fetching it from disk if and only if the mtime is outdated and the size is the same

One thing to notice on step 2 is that we are comparing multiple mtimes. For example, imagine a Phoenix.View. It has several templates on the file system, so while the view is the same, we still need to recompile it if a template changes! So this check:

  for source(source: source, external: external, size: size, modules: modules) <-
        all_sources,
      {last_mtime, last_size} = Map.fetch!(sources_stats, source),
      times = Enum.map(external, &(sources_stats |> Map.fetch!(&1) |> elem(0))),
      size != last_size or Mix.Utils.stale?([last_mtime | times], [modified]) or
        Enum.any?(modules, &Map.has_key?(modules_to_recompile, &1)),
      do: source

Should become something like this:

  for source(source: source, external: external, size: size, modules: modules, md5: md5) <-
        all_sources,
      times = Enum.map(external, &(sources_stats |> Map.fetch!(&1) |> elem(0))),
      source_changed?(source_stats, source, md5, modified) or
        Mix.Utils.stale?(times, [modified]) or
        Enum.any?(modules, &Map.has_key?(modules_to_recompile, &1)),
      do: source

Where source_changed? has the size, mtime, and digest logic for the source file exclusively. It is important to not mix those. :)

@josevalim
Copy link
Member

Btw, a future step would be to keep the size and digests for those external files too, such as phoenix templates, but it can be a separate PR!

@marcandre
Copy link
Contributor Author

Great, thanks for these pointers 👍

defp digest(file) do
case File.read(file) do
{:ok, content} ->
:crypto.hash(:md5, content)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
:crypto.hash(:md5, content)
:erlang.md5(content)

The difference is that :crypto is a separate application, that we would need to start and depend on, but luckily MD5 is part of Erlang too.

@marcandre
Copy link
Contributor Author

And yes, clearly, the hashes have to be written to the manifest...

@marcandre
Copy link
Contributor Author

I'll work on that tomorrow...

@josevalim
Copy link
Member

There is absolutely no rush. I plan to do a new Elixir patch release this week with the faster app loading and so it would be too soon to include this (I typically like to run with those changes for a couple weeks).

@marcandre marcandre force-pushed the differentiate_on_hash branch 4 times, most recently from 660bebf to 96b2167 Compare June 27, 2021 21:56
@marcandre marcandre force-pushed the differentiate_on_hash branch from 96b2167 to 3667606 Compare June 27, 2021 22:06
@marcandre
Copy link
Contributor Author

This should be more like it

@vans163
Copy link
Contributor

vans163 commented Jun 28, 2021

@marcandre I propose we try a slightly different approach that does not require us to read all snapshots upfront. It can be made in two steps:

  1. First store the MD5 in the manifest. You will add a new field called md5 here and also bump the @manifest_vsn. Now for each file compiled, you compute and store the MD5 here: https://github.com/elixir-lang/elixir/blob/master/lib/mix/lib/mix/compilers/elixir.ex#L447 - you can expect the file to exist at this point
  2. Now in the line of code you removed in the other PR, you can compare the MD5, fetching it from disk if and only if the mtime is outdated and the size is the same

One thing to notice on step 2 is that we are comparing multiple mtimes. For example, imagine a Phoenix.View. It has several templates on the file system, so while the view is the same, we still need to recompile it if a template changes! So this check:

  for source(source: source, external: external, size: size, modules: modules) <-
        all_sources,
      {last_mtime, last_size} = Map.fetch!(sources_stats, source),
      times = Enum.map(external, &(sources_stats |> Map.fetch!(&1) |> elem(0))),
      size != last_size or Mix.Utils.stale?([last_mtime | times], [modified]) or
        Enum.any?(modules, &Map.has_key?(modules_to_recompile, &1)),
      do: source

Should become something like this:

  for source(source: source, external: external, size: size, modules: modules, md5: md5) <-
        all_sources,
      times = Enum.map(external, &(sources_stats |> Map.fetch!(&1) |> elem(0))),
      source_changed?(source_stats, source, md5, modified) or
        Mix.Utils.stale?(times, [modified]) or
        Enum.any?(modules, &Map.has_key?(modules_to_recompile, &1)),
      do: source

Where source_changed? has the size, mtime, and digest logic for the source file exclusively. It is important to not mix those. :)

Step 2 fails if filesystem mounted with nomtime or the filesystem does not support/properly set mtime. Possibly check if mtime is enabled on filesystem?

@josevalim
Copy link
Member

josevalim commented Jun 28, 2021

The whole code today already assumes a file system with mtime and it has never been an issue, especially for development. If this is an issue for someone, then we are glad to discuss options, but it is not a concern for this PR.

@josevalim josevalim merged commit 350a909 into elixir-lang:master Jun 28, 2021
@josevalim
Copy link
Member

💚 💙 💜 💛 ❤️

@josevalim
Copy link
Member

Thank you @marcandre! As I mentioned, we don't run this check for external resources (for example, all templates in a Phoenix view are external resources), so this may be a necessary change in the future if you (or anyone else) is inclined.

@marcandre marcandre deleted the differentiate_on_hash branch June 28, 2021 15:36
@marcandre
Copy link
Contributor Author

Awesome, thanks!
I don't know when I'll have the time to look at external resources; if anyone is interested in picking it up that's great too

@marcandre
Copy link
Contributor Author

Is there a timeline for this to be released?

[15:31][~/super-pas((no branch, rebasing ch2579/revert-data-plan-change))]$ mix test something
Compiling 357 files (.ex)

@josevalim
Copy link
Member

@marcandre I found a bug in master about 10 days but it has been running smoothly since then. I will backport it to v1.12 branch and it should be out in one or two weeks.

@josevalim
Copy link
Member

@marcandre btw, if you could run elixir from the v1.12 branch, then it would be super helpful too to help catch any regressions and you would have the benefits today. :)

@josevalim
Copy link
Member

The website and readme have instructions to install/compile from source if needed.

@marcandre
Copy link
Contributor Author

@marcandre btw, if you could run elixir from the v1.12 branch, then it would be super helpful too to help catch any regressions and you would have the benefits today. :)

Perfect, will do 👍

@marcandre
Copy link
Contributor Author

I was disappointed to get the dreaded "Compiling 392 files (.ex)" when using it locally, only to realize that my branch was modifying config/test.exs, which bumps Mix.Project.config_mtime and invalidates the whole project.

Although this happens less frequently, it would be really nice to fix this too, so that only actual changes to files invalidate the project.

A "quick" fix might be for Mix.Project.config_mtime to use the last stored mtime if the digest has not changed, but it's not clear to me where to store this information. Any pointers?

@josevalim
Copy link
Member

@marcandre we already have a manifest where we store the elixir and otp versions, and we could store the config digests in there. It is currently kept in Mix.Dep.ElixirSCM and called in mix/tasks/loadpaths.ex. It is used both by your project and elixir deps, so we would need to make the config digest be an empty map in the deps case.

@marcandre
Copy link
Contributor Author

Thank you, I'll have a look when I get a bit of time.

@vans163
Copy link
Contributor

vans163 commented Jul 17, 2021

The whole code today already assumes a file system with mtime and it has never been an issue, especially for development. If this is an issue for someone, then we are glad to discuss options, but it is not a concern for this PR.

an example is sshfs has mtime issues

@marcandre
Copy link
Contributor Author

@josevalim May not be related (directly or at all), but I'm getting some strange compiling errors today that resolve if I recompile the whole lib/:

$ mix phx.server
Compiling 5 files (.ex)

== Compilation error in file lib/bobby_web/views/admin/account_view.ex ==
** (CompileError) lib/bobby_web/views/admin/account_view.ex:2: module Bobby is not loaded and could not be found. This may be happening because the module you are trying to load directly or indirectly depends on the current module
    (elixir 1.12.2) expanding macro: Kernel.use/2
    lib/bobby_web/views/admin/account_view.ex:2: BobbyWeb.Admin.AccountView (module)
    expanding macro: BobbyWeb.__using__/1
    lib/bobby_web/views/admin/account_view.ex:2: BobbyWeb.Admin.AccountView (module)
    (elixir 1.12.2) expanding macro: Kernel.use/2
    lib/bobby_web/views/admin/account_view.ex:2: BobbyWeb.Admin.AccountView (module)
    (elixir 1.12.2) lib/kernel/parallel_compiler.ex:319: anonymous fn/4 in Kernel.ParallelCompiler.spawn_workers/7
$ touch mix.exs
$ mix phx.server
Compiling 405 files (.ex)
Compiling lib/postgres_types.ex (it's taking more than 10s)
Compiling lib/bobby_web/views/documents/life_view.ex (it's taking more than 10s)
Generated bobby app

Bobby is our MyApp. Just about everywhere we do use Bobby.

I am not sure how to reproduce it, but it has happened 3 times today (and never before today, no notable to our codebase either 🤷)

This is running elixir from 1.12 branch (at 5189b292a, I'm missing an unrelated commit)

Any idea what could be happening?

@josevalim
Copy link
Member

@marcandre next time it happens, please try passing the --verbose flag. You should see all five files printed before it fails and let's see if lib/bobby.ex is in the list of files.

@josevalim
Copy link
Member

Oh, and please see if touching only the file that defines Bobby next fixes it.

@marcandre
Copy link
Contributor Author

Yay, I was able to reproduce this and I figured out what I did. Opened #11176

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants