
Axon ignoring the activation function? #526

Closed
a-alhusaini opened this issue Aug 26, 2023 · 5 comments
Comments

@a-alhusaini

I have the following model:

model =
  Axon.input("input_0", shape: {nil, 1, 28, 28})
  |> Axon.flatten()
  |> Axon.dense(128, activation: :relu)
  |> Axon.dense(10, activation: :sigmoid)

After training the model and using it, the outputs are bigger than 1.

Based on my understanding of the sigmoid function, this shouldn't be possible. What am I missing?
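
As a quick reference (a sketch added here, not part of the original exchange): sigmoid squashes any real input into the open interval (0, 1), so genuine outputs above 1 would indeed be surprising. The input values below are arbitrary.

# Sigmoid maps any real number into (0, 1); large inputs saturate
# near 1.0 (and print as 1.0 in f32) but never exceed it.
Nx.sigmoid(Nx.tensor([-100.0, -1.0, 0.0, 1.0, 100.0]))
# => approximately [0.0, 0.2689, 0.5, 0.7311, 1.0]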

@polvalente
Contributor

Could you share the full code?

@a-alhusaini
Author

train_data = Scidata.MNIST.download()
test_data = Scidata.MNIST.download_test()
{train_data, train_labels} = train_data
{train_binary, train_type, train_shape} = train_data
{train_label_binary, train_label_type, train_label_shape} = train_labels

train_y =
  Nx.from_binary(train_label_binary, train_label_type)
  |> Nx.reshape(train_label_shape)
  |> Nx.new_axis(-1)
  |> Nx.equal(Enum.to_list(0..9) |> Nx.tensor())
  |> Nx.to_batched(32)
  |> Enum.to_list()

train_x =
  Nx.from_binary(train_binary, train_type)
  |> Nx.reshape(train_shape)
  |> Nx.divide(255)
  |> Nx.to_batched(32)
  |> Enum.to_list()

data = Enum.zip(train_x, train_y)

training_count = floor(0.8 * Enum.count(data))
validation_count = floor(0.2 * training_count)

{training_data, test_data} = Enum.split(data, training_count)
{validation_data, training_data} = Enum.split(training_data, validation_count)

# Define the model
model =
  Axon.input("input_0", shape: {nil, 1, 28, 28})
  |> Axon.flatten()
  |> Axon.dense(128, activation: :relu)
  |> Axon.dense(10, activation: :sigmoid)

# Train
state =
  model
  |> Axon.Loop.trainer(:categorical_cross_entropy, Axon.Optimizers.adam(0.01))
  |> Axon.Loop.metric(:accuracy, "Accuracy")
  |> Axon.Loop.validate(model, validation_data)
  |> Axon.Loop.run(training_data, %{}, compiler: EXLA, epochs: 10)

# Evaluate on the held-out split
model
|> Axon.Loop.evaluator()
|> Axon.Loop.metric(:accuracy, "Accuracy")
|> Axon.Loop.run(test_data, state)

# Predict on a single example
first_number = Enum.at(train_x, 0)[0]
Axon.predict(model, state, first_number)

@polvalente
Contributor

What's the Axon version used?

@polvalente
Contributor

With Axon 0.6 and EXLA 0.6, I got this output:

#Nx.Tensor<
  f32[1][10]
  EXLA.Backend<host:0, 0.3007448411.1655570452.243887>
  [
    [2.3431260944184697e-23, 4.296128036651581e-11, 1.0659236316176752e-17, 0.9999992847442627, 6.712944780263512e-22, 1.0, 2.8145804840526482e-22, 2.4531429665408666e-10, 9.463095196338145e-9, 2.6594868813845096e-6]
  ]
>

Maybe the confusion stems from the scientific notation?

All but the 4th and 6th entries there are far smaller than 1. Note the e-23 suffix on the first entry: it means 2.34... * 10 ** -23. Likewise, the second is 4.29 * 10 ** -11, and so on.
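
To double-check numerically (a sketch assuming `output` is the prediction tensor printed above), the maximum value confirms nothing exceeds 1:

# Largest entry in the prediction; for sigmoid outputs it should be <= 1.0.
Nx.reduce_max(output) |> Nx.to_number()
# => 1.0

# Or assert it for every entry (1 means all entries pass the check).
output |> Nx.less_equal(1.0) |> Nx.all() |> Nx.to_number()
# => 1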

@a-alhusaini
Author

Ah, sorry, I didn't notice the notation. I use a screen reader and was only listening to the first few numbers of every tensor (it takes a long time for a screen reader to read all those numbers). It turns out I never reached the end of any tensor, which is where the scientific notation appears.
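
A possible workaround in that situation (a sketch, not from the thread; `output` again stands for the tensor returned by Axon.predict/3): print each probability as a fixed-decimal string, so the magnitude is audible without reading to the end of each number.

# Print each class probability as a plain decimal, avoiding scientific notation.
output
|> Nx.to_flat_list()
|> Enum.with_index()
|> Enum.each(fn {p, digit} ->
  IO.puts("#{digit}: #{:erlang.float_to_binary(p, decimals: 8)}")
end)
# 0: 0.00000000
# 1: 0.00000000
# ...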
