Crash when loading a 4 channel image when channel option is forced to 3 in StbImage.read_file #35

Closed
ChristopheBelpaire opened this issue Mar 5, 2024 · 6 comments

ChristopheBelpaire commented Mar 5, 2024

Hello,
I'm reading the awesome book Machine Learning in Elixir, and on my M1 MacBook with Erlang/OTP 26.2.2 and Elixir 1.16.1 the following script (from chapter 7 of the book; the dataset comes from Kaggle: https://www.kaggle.com/datasets/shaunthesheep/microsoft-catsvsdogs-dataset) crashes:

Mix.install(
  [
    {:axon, "~> 0.6"},
    {:nx, path: "./nx/nx", override: true},
    {:exla, path: "./nx/exla", override: true},
    {:stb_image, "0.5.2"},
    {:kino, "~> 0.8"}
  ],
  system_env: %{"DEBUG" => 1}
)
Process.sleep(1000)
Nx.tensor(1, backend: EXLA.Backend)


defmodule CatsAndDogs do
  def pipeline(paths, batch_size, target_height, target_width) do
    paths
    |> Enum.shuffle()
    |> Task.async_stream(&parse_image/1)
    # Keep only images that decoded successfully with 3 channels
    |> Stream.filter(fn
      {:ok, {%StbImage{shape: {_, _, 3}}, _}} -> true
      _ -> false
    end)
    |> Stream.map(&to_tensors(&1, target_height, target_width))
    # Drop the final partial batch so every batch has exactly batch_size items
    |> Stream.chunk_every(batch_size, batch_size, :discard)
    |> Stream.map(fn chunks ->
      {img_chunk, label_chunk} = Enum.unzip(chunks)
      {Nx.stack(img_chunk), Nx.stack(label_chunk)}
    end)
  end

  defp parse_image(path) do
    # Label 0 for cats, 1 for dogs, based on the directory name in the path
    label = if String.contains?(path, "Cat"), do: 0, else: 1

    # Forcing channels: 3 here is what triggered the crash on some files
    case StbImage.read_file(path, channels: 3) do
      {:ok, img} -> {img, label}
      _error -> :error
    end
  end

  defp to_tensors({:ok, {img, label}}, target_height, target_width) do
    img_tensor =
      img
      |> StbImage.resize(target_height, target_width)
      |> StbImage.to_nx()
      |> Nx.divide(255)

    label_tensor = Nx.tensor([label])
    {img_tensor, label_tensor}
  end
end


{test_paths, train_paths} =
  (Path.wildcard("/Users/christophebelpaire/perso/machine-learning-in-elixir/PetImages/Cat/*.jpg") ++
     Path.wildcard("/Users/christophebelpaire/perso/machine-learning-in-elixir/PetImages/Dog/*.jpg"))
  |> Enum.shuffle()
  |> Enum.split(1000)


target_height = 96
target_width = 96
batch_size = 32

train_pipeline = CatsAndDogs.pipeline(
  train_paths, batch_size, target_height, target_width
)
test_pipeline = CatsAndDogs.pipeline(
  test_paths, batch_size, target_height, target_width
)

mlp_model =
  Axon.input("images", shape: {nil, target_height, target_width, 3})
  |> Axon.flatten()
  |> Axon.dense(256, activation: :relu)
  |> Axon.dense(128, activation: :relu)
  |> Axon.dense(1, activation: :sigmoid)

# Pause so a debugger (lldb) can be attached to this OS process before training
IO.gets("Press enter to continue - #{System.pid()}")

_mlp_trained_model_state =
  mlp_model
  |> Axon.Loop.trainer(:binary_cross_entropy, :adam)
  |> Axon.Loop.metric(:accuracy)
  |> Axon.Loop.run(train_pipeline, %{}, epochs: 5, compiler: EXLA)

Here is the debugger output from lldb:

Architecture set to: arm64-apple-macosx-.
(lldb) continue
Process 13117 resuming
Process 13117 stopped
* thread #25, name = 'erts_dios_1', stop reason = EXC_BAD_ACCESS (code=1, address=0x14a394000)
    frame #0: 0x0000000189b496b0 libsystem_platform.dylib`_platform_memmove + 96
libsystem_platform.dylib`:
->  0x189b496b0 <+96>:  ldnp   q0, q1, [x1]
    0x189b496b4 <+100>: add    x1, x1, #0x20
    0x189b496b8 <+104>: subs   x2, x2, #0x20
    0x189b496bc <+108>: b.hi   0x189b496a8               ; <+88>
Target 0: (beam.smp) stopped.
(lldb) bt
* thread #25, name = 'erts_dios_1', stop reason = EXC_BAD_ACCESS (code=1, address=0x14a394000)
  * frame #0: 0x0000000189b496b0 libsystem_platform.dylib`_platform_memmove + 96
    frame #1: 0x0000000148127134 stb_image_nif.so`pack_data + 96
    frame #2: 0x0000000148125c08 stb_image_nif.so`read_file + 812
    frame #3: 0x0000000102d746c8 beam.smp`erts_call_dirty_nif(esdp=0x00000001439c1c80, c_p=0x0000000145eb12b8, I=0x00000001502b7278, reg=0x000000010308c5c0) at erl_nif.c:466:18 [opt]
    frame #4: 0x0000000102c4e0e0 beam.smp`erts_dirty_process_main(esdp=0x00000001439c1c80) at beam_common.c:280:23 [opt]
    frame #5: 0x0000000102bc68e8 beam.smp`sched_dirty_io_thread_func(vesdp=0x00000001439c1c80) at erl_process.c:8768:5 [opt]
    frame #6: 0x0000000102e13cbc beam.smp`thr_wrapper(vtwd=0x000000016d242560) at ethread.c:116:25 [opt]
    frame #7: 0x0000000189b1a034 libsystem_pthread.dylib`_pthread_start + 136
ChristopheBelpaire (Author)

OK, the problem comes from the way I'm loading the images: StbImage.read_file(path, channels: 3).
Some of my images seem to have 4 channels, and the crash appears to be caused by forcing a 3-channel load on a 4-channel image.

ChristopheBelpaire changed the title from "Crash when running an Axon model" to "Crash when loading a 4 channel image when channel option is forced to 3 in StbImage.read_file" on Mar 5, 2024
cocoa-xu (Member) commented Mar 5, 2024

Hi @ChristopheBelpaire, thanks for reporting this issue. I'll try to fix this issue sometime this week!

cocoa-xu (Member) commented Mar 5, 2024

$ file /Users/cocoa/Downloads/archive/PetImages/Cat/10404.jpg
/Users/cocoa/Downloads/archive/PetImages/Cat/10404.jpg: Adobe Photoshop Image, 432 x 363, RGB, 3x 8-bit channels

I found one file in the dataset that can trigger this issue. Despite its .jpg extension it's definitely not a JPEG image (file identifies it as an Adobe Photoshop image), and its metadata may be wrong. I'll look into this further.
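One way to spot files like this without relying on the file command is to look at each file's leading "magic" bytes before handing it to the decoder. The module below is a minimal sketch, not part of stb_image; ImageSniffer, sniff/1, and sniff_file/1 are hypothetical names, and it only recognizes three signatures: the JPEG start-of-image marker (FF D8 FF), the 8-byte PNG signature, and the "8BPS" header that Photoshop (PSD) files begin with.

```elixir
defmodule ImageSniffer do
  @moduledoc """
  Hypothetical helper (not part of stb_image): guess an image file's real
  format from its leading magic bytes instead of trusting the extension.
  """

  # JPEG: start-of-image marker FF D8, followed by the next marker's FF byte.
  def sniff(<<0xFF, 0xD8, 0xFF, _rest::binary>>), do: :jpeg
  # PNG: fixed 8-byte signature 89 "PNG" 0D 0A 1A 0A.
  def sniff(<<0x89, "PNG", 0x0D, 0x0A, 0x1A, 0x0A, _rest::binary>>), do: :png
  # Photoshop (PSD): ASCII signature "8BPS".
  def sniff(<<"8BPS", _rest::binary>>), do: :psd
  def sniff(_other), do: :unknown

  # Reads at most the first 8 bytes of the file.
  # Returns {:ok, format} or {:error, reason} if the file cannot be opened.
  def sniff_file(path) do
    File.open(path, [:read, :binary], fn io ->
      case IO.binread(io, 8) do
        header when is_binary(header) -> sniff(header)
        # :eof (empty file) or {:error, _} while reading
        _eof_or_error -> :unknown
      end
    end)
  end
end
```

For example, Enum.filter(paths, &(ImageSniffer.sniff_file(&1) == {:ok, :jpeg})) would keep only files that really are JPEGs before they reach StbImage.read_file/2. This only screens by container format; on its own it would not catch a genuine JPEG with unexpected channel metadata.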

ChristopheBelpaire (Author)

Awesome, thanks!

I filtered out the "bad" images with:

    |> Stream.filter(fn
      {:ok, {%StbImage{shape: {_, _, 3}}, _}} -> true
      _ -> false
    end)

This filters on the channel count, and with it in place everything loads.

cocoa-xu self-assigned this Mar 6, 2024
cocoa-xu (Member) commented Mar 6, 2024

Hi @ChristopheBelpaire, v0.6.6 is out and should fix this issue!
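For a Mix.install script like the one in the original report, picking up the fix should only require raising the stb_image requirement. A minimal sketch, assuming nothing else in the dependency list pins stb_image to an older release (the report's script pinned "0.5.2"):

```elixir
Mix.install([
  # Keep the script's other dependencies (axon, nx, exla, kino) alongside this one.
  # "~> 0.6.6" accepts 0.6.6 and any later 0.6.x release, but not 0.7.0.
  {:stb_image, "~> 0.6.6"}
])
```

The ~> requirement is used rather than an exact pin so that later 0.6.x patch releases are picked up automatically.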

ChristopheBelpaire (Author)

Awesome!
I tried my scenario again and it seems to load all the images.
Thanks for the fix!
