Support for Sparse Matrices #50

Closed
gsimchoni opened this Issue Jun 25, 2017 · 3 comments

gsimchoni commented Jun 25, 2017

Currently the x object accepted by the fit function is a

    Vector, matrix, or array of training data...

It would be great if there were also support for sparse matrices, as these X matrices can grow very large in memory even though they are mostly zeros. For example, the Nietzsche example runs fine on my 32 GB RAM laptop, but I run into RAM problems with even twice as many rows — and this data is a classic case for a sparse matrix.
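To illustrate the memory gap being described, here is a hypothetical sketch (not keras API — the row/alphabet sizes are made up for illustration) comparing a dense one-hot matrix with the equivalent sparse representation from the Matrix package. Note this only covers the 2-D case; the Nietzsche X is a 3-D array (samples × maxlen × chars), which Matrix does not handle directly.

```r
library(Matrix)

# Assumed sizes for illustration: 100,000 one-hot rows over a
# 50-character alphabet.
n_rows  <- 100000
n_chars <- 50
hot <- sample.int(n_chars, n_rows, replace = TRUE)  # column of the single 1 in each row

# Dense: every cell stored as a double.
dense <- matrix(0, nrow = n_rows, ncol = n_chars)
dense[cbind(seq_len(n_rows), hot)] <- 1

# Sparse: only the nonzero entries (one per row) are stored.
sparse <- sparseMatrix(i = seq_len(n_rows), j = hot, x = 1,
                       dims = c(n_rows, n_chars))

as.numeric(object.size(dense))   # roughly 40 MB of doubles
as.numeric(object.size(sparse))  # a small fraction of that
```

The two objects hold the same values; only the storage differs, which is why a sparse path into fit() would help for data like this.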


jjallaire closed this Jun 25, 2017


dfalbel (Collaborator) commented Jun 25, 2017

I guess you would also need a sparseArray implementation for this example.

While there's no support for sparse matrices, you can always use the train_on_batch function and vectorize only one batch at a time, so RAM doesn't explode. Instead of vectorizing everything up front, you could change this call:

model %>% fit(
    X, y,
    batch_size = 128,
    epochs = 1
  )

to

  all_samples <- seq_along(dataset$sentece)
  batch_size <- 128
  num_steps <- trunc(length(dataset$sentece) / batch_size)
  
  for (step in 1:num_steps) {
    
    # draw a batch of indices without replacement across the epoch;
    # remove by value (setdiff), since removing by position with
    # all_samples[-batch] only matches the values on the first step
    batch <- sample(all_samples, batch_size)
    all_samples <- setdiff(all_samples, batch)
    
    sentences <- dataset$sentece[batch]
    next_chars <- dataset$next_char[batch]
    
    # vectorize just this batch
    X <- array(0, dim = c(batch_size, maxlen, length(chars)))
    y <- array(0, dim = c(batch_size, length(chars)))
    
    for (i in 1:batch_size) {
      
      X[i, , ] <- sapply(chars, function(x) {
        as.integer(x == sentences[[i]])
      })
      
      y[i, ] <- as.integer(chars == next_chars[[i]])
      
    }
    
    model %>% train_on_batch(X, y)
    
  }
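The index bookkeeping in that loop is the easy place to slip up: the pool must shrink by value, not by position. A minimal base-R sketch (the `draw_epoch` helper is my own illustration, not from the example) that draws one epoch's worth of batches this way, so the exactly-once coverage can be checked:

```r
# Draw num_steps batches of size batch_size from indices 1..n without
# replacement, returning every index drawn. Uses setdiff to remove by
# value; all_samples[-batch] would silently drop the wrong elements
# once the pool no longer equals 1..n.
draw_epoch <- function(n, batch_size) {
  all_samples <- seq_len(n)
  seen <- integer(0)
  num_steps <- trunc(n / batch_size)
  for (step in seq_len(num_steps)) {
    batch <- sample(all_samples, batch_size)
    all_samples <- setdiff(all_samples, batch)
    seen <- c(seen, batch)
  }
  seen
}
```

Every index appears at most once in the result, which is what lets the loop above cover (nearly) the whole dataset in one pass.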
jjallaire (Member) commented Jul 6, 2017

fit_generator() can now work with a custom R function, so that would be another way to do this that wouldn't require writing your own training loop. Documentation is here: https://rstudio.github.io/keras/articles/faq.html#how-can-i-use-keras-with-datasets-that-dont-fit-in-memory
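A minimal sketch of that approach for this thread's case, assuming the `dataset`, `chars`, `maxlen`, and `model` objects from the Nietzsche example (the `sampling_generator` closure below is my own illustration, not code from the linked FAQ):

```r
# A generator closure: each call vectorizes and returns one fresh
# batch, so only batch_size rows of the one-hot array exist in RAM
# at any time.
sampling_generator <- function(dataset, chars, maxlen, batch_size = 128) {
  function() {
    batch <- sample(seq_along(dataset$sentece), batch_size)
    X <- array(0, dim = c(batch_size, maxlen, length(chars)))
    y <- array(0, dim = c(batch_size, length(chars)))
    for (i in seq_len(batch_size)) {
      X[i, , ] <- sapply(chars, function(ch) {
        as.integer(ch == dataset$sentece[[batch[i]]])
      })
      y[i, ] <- as.integer(chars == dataset$next_char[[batch[i]]])
    }
    list(X, y)
  }
}

# Then, instead of fit() or a manual loop (requires keras):
# model %>% fit_generator(
#   sampling_generator(dataset, chars, maxlen),
#   steps_per_epoch = trunc(length(dataset$sentece) / 128),
#   epochs = 1
# )
```

Each call to the generator samples a new batch, so fit_generator() drives the whole epoch while the full vectorized array is never materialized.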

