
Conversation

@jvdp1
Collaborator

@jvdp1 jvdp1 commented Apr 2, 2019

*Addition of the methods "fit" and "predict": the method "fit" trains the model for n epochs with m batches (the batches are now selected consecutively instead of randomly; I am not sure why random selection was used in the first place); the method "predict" returns the predicted output (y). A usage sketch is shown below the list.

*Modification of example_mnist.f90 to use "fit" and "predict"

*Addition of 2 examples using real data that are publicly available and were used in 2 scientific studies (see Montesinos-Lopez et al. (2018), G3)

*ISSUE: Activations other than 'sigmoid' do not seem to work properly
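
Roughly, the intent is that client code can train and evaluate a network as sketched below (assuming the library's mod_network module and network_type constructor; the "fit" argument names num_epochs and batch_size are illustrative, not necessarily the exact interface):

    program example_fit_predict
      use mod_network, only: network_type
      implicit none
      type(network_type) :: net
      real, allocatable :: x_train(:,:), y_train(:,:), x_test(:,:), y_pred(:,:)

      ! ... read the training and test data into x_train, y_train, x_test ...

      net = network_type([784, 30, 10])

      ! train for several epochs over mini-batches of the training data
      ! (argument names num_epochs and batch_size are illustrative)
      call net % fit(x_train, y_train, eta=0.1, num_epochs=10, batch_size=100)

      ! return the predicted outputs for a batch of inputs
      y_pred = net % predict(x_test)

    end program example_fit_predict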

@milancurcic
Member

Jeremie, fantastic, thank you, especially for the added examples. Will review this week!

@jvdp1
Collaborator Author

jvdp1 commented Apr 3, 2019

Jeremie, fantastic, thank you, especially for the added examples. Will review this week!

Dear Milan, I made some modifications to the method fit: the mini-batches are now created at random, as you proposed initially (my implementation was too simple).

Question: when there are multiple CAF images, each image goes through (approximately) the complete training set, so one epoch effectively covers the training set #images times. Should the number of iterations of the "mini_batches" loop be divided by the number of images, so that one epoch covers the training set only once? Or am I missing something? Thank you.

@milancurcic
Member

the batches are now selected consecutively instead of randomly; I am not sure why random selection was used in the first place

Quasi-random selection of mini-batches is meant to mimic the "stochastic" part of Stochastic Gradient Descent. Note that random selection is currently not implemented anywhere in the library, but it can be implemented in the client code (as in example_mnist.f90).
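
For example, the client code can pick a random contiguous mini-batch along these lines (a sketch only, loosely following what example_mnist.f90 does; net, x, y, eta, num_samples, and batch_size are assumed to be defined elsewhere):

    real :: pos
    integer :: batch_start, batch_end

    ! pick a random start index so that the mini-batch fits within the data set
    call random_number(pos)
    batch_start = int(pos * (num_samples - batch_size + 1)) + 1
    batch_end = batch_start + batch_size - 1

    ! update weights and biases using this randomly chosen mini-batch
    call net % train(x(:,batch_start:batch_end), y(:,batch_start:batch_end), eta)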

Question: when there are multiple CAF images, each image goes through (approximately) the complete training set, so one epoch effectively covers the training set #images times. Should the number of iterations of the "mini_batches" loop be divided by the number of images, so that one epoch covers the training set only once? Or am I missing something? Thank you.

train_batch() takes a single batch of data and updates weights and biases given that batch, in serial or parallel. In parallel mode (multiple CAF images), the mini-batch is split equally among the parallel images:

    im = size(x, dim=2)    ! mini-batch size
    nm = size(self % dims) ! number of layers

    ! get the start and end indices of this image's share of the mini-batch
    indices = tile_indices(im)
    is = indices(1)
    ie = indices(2)

    ! zero-initialize the batch accumulators for bias and weight gradients
    call db_init(db_batch, self % dims)
    call dw_init(dw_batch, self % dims)

    do concurrent(i = is:ie)
      ! forward and backward pass for sample i
      call self % fwdprop(x(:,i))
      call self % backprop(y(:,i), dw, db)
      ! accumulate the gradients over the samples in this sub-range
      do concurrent(n = 1:nm)
        dw_batch(n) % array = dw_batch(n) % array + dw(n) % array
        db_batch(n) % array = db_batch(n) % array + db(n) % array
      end do
    end do

    ! sum the accumulated gradients across all parallel images
    if (num_images() > 1) then
      call dw_co_sum(dw_batch)
      call db_co_sum(db_batch)
    end if

Here, is and ie are indices that define each parallel image's sub-range of the mini-batch, so each image only works over do concurrent (i = is:ie). At the end of the training iteration, we sum db and dw across images. Because each image worked on only a subset of the mini-batch, the sum of db and dw across images yields the same update as if a single image had worked on the whole mini-batch.
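
To make the splitting concrete: tile_indices distributes the sample range 1..im across the images. A simplified sketch of that kind of decomposition (illustrative only, not the library's exact implementation) is:

    ! illustrative even split of n samples across num_images() parallel images
    tile_size = n / num_images()
    is = (this_image() - 1) * tile_size + 1
    ie = this_image() * tile_size
    if (this_image() == num_images()) ie = n ! last image takes any remainder

For example, a mini-batch of 100 samples on 4 images gives each image 25 samples to process, and co-summing dw_batch and db_batch afterwards recovers the same gradient as a single image processing all 100 samples.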

@jvdp1
Collaborator Author

jvdp1 commented Apr 8, 2019

Thank you @milancurcic for your answers. I indeed read the code too quickly. Well done!

@milancurcic
Member

Hi Jeremie,

I mostly like your additions. Few suggestions:

  • predict_batch seems to be a batch wrapper around the output method. Let's rename output -> output_single, predict_batch -> output_batch, then make a generic output around output_single and output_batch, analogous to train_single and train_batch (see the sketch after this list).
  • fit_batch seems to be a wrapper around train_batch for doing a number of epochs and stochastic selection of data batches. How about calling this train_epochs for consistency?
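
In Fortran terms, the suggestion amounts to generic type-bound procedures that dispatch on the rank of the data arguments, roughly like this (a sketch only; the rest of the type definition is omitted):

    type :: network_type
      ! ... existing components ...
    contains
      procedure, private :: output_single, output_batch
      generic, public :: output => output_single, output_batch
      procedure, private :: train_single, train_batch
      generic, public :: train => train_single, train_batch
    end type network_type

Here output_single would take a single input vector x(:) and output_batch a 2-d array x(:,:), and the compiler picks the specific procedure from the rank of the actual argument.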

@jvdp1
Collaborator Author

jvdp1 commented Apr 24, 2019

Hi Jeremie,

I mostly like your additions. Few suggestions:

  • predict_batch seems to be a batch wrapper around the output method. Let's rename output -> output_single, predict_batch -> output_batch, then make a generic output around output_single and output_batch, analogous to train_single and train_batch.
  • fit_batch seems to be a wrapper around train_batch for doing a number of epochs and stochastic selection of data batches. How about calling this train_epochs for consistency?

Dear Milan,

I renamed the methods as you suggested.

@milancurcic milancurcic changed the base branch from master to 10-output-batch June 14, 2019 18:59
@milancurcic milancurcic merged commit 4a944e7 into modern-fortran:10-output-batch Jun 14, 2019
@milancurcic milancurcic mentioned this pull request Mar 18, 2020
@jvdp1 jvdp1 deleted the development branch May 10, 2024 11:12