Possible RAM leakage in fingerprint calculations using fingerprint_OB() function #39

Open
tcaceresm opened this issue Apr 5, 2024 · 0 comments

Comments

@tcaceresm

Hi there, I opened an issue in the Rcpi package; however, I think it is more appropriate to open one here because Rcpi relies on ChemmineOB.
I have ~50k molecules for which I want to calculate fingerprints. I use a function from the Rcpi package (see code below), which creates a matrix to store the fingerprints, then iterates over the molecules and calculates each fingerprint using ChemmineOB::fingerprint_OB. However, RAM usage increases with every loop iteration, even though the size of the matrix is constant. I noticed that it is the memory of the R session that grows, not the size of the object.
I also calculated the fingerprints using the Open Babel CLI, and it runs smoothly.
Thanks, and sorry about my English.

function (molecules, type = c("smile", "sdf")) 
{
  check_ob()
  if (type == "smile") {
    if (length(molecules) == 1L) {
      molRefs = eval(parse(text = "ChemmineOB::forEachMol('SMILES', molecules, identity)"))
      fp = eval(parse(text = "ChemmineOB::fingerprint_OB(molRefs, 'FP4')"))
    }
    else if (length(molecules) > 1L) {
      fp = matrix(0L, nrow = length(molecules), ncol = 512L)
      for (i in 1:length(molecules)) {
        molRefs = eval(parse(text = "ChemmineOB::forEachMol('SMILES', molecules[i], identity)"))
###########################################################
####### This is the step which increases RAM usage in each loop step
        fp[i, ] = eval(parse(text = "ChemmineOB::fingerprint_OB(molRefs, 'FP4')"))
###########################################################
      }
    }
  }
  else if (type == "sdf") {
    smi = eval(parse(text = "ChemmineOB::convertFormat(from = 'SDF', to = 'SMILES', source = molecules)"))
    smiclean = strsplit(smi, "\\t.*?\\n")[[1]]
    if (length(smiclean) == 1L) {
      molRefs = eval(parse(text = "ChemmineOB::forEachMol('SMILES', smiclean, identity)"))
      fp = eval(parse(text = "ChemmineOB::fingerprint_OB(molRefs, 'FP4')"))
    }
    else if (length(smiclean) > 1L) {
      fp = matrix(0L, nrow = length(smiclean), ncol = 512L)
      for (i in 1:length(smiclean)) {
        molRefs = eval(parse(text = "ChemmineOB::forEachMol('SMILES', smiclean[i], identity)"))
        fp[i, ] = eval(parse(text = "ChemmineOB::fingerprint_OB(molRefs, 'FP4')"))
      }
    }
  }
  else {
    stop("Molecule type must be \"smile\" or \"sdf\"")
  }
  return(fp)
}
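
To make the observation above easier to reproduce (session memory grows while the result matrix stays the same size), here is a minimal base-R sketch that logs two numbers during a loop: the memory R itself manages, as reported by gc(), and the resident set size of the whole R process. The helper names process_rss_mb, r_heap_mb and track_memory are hypothetical and not part of Rcpi or ChemmineOB, and the process-level reading assumes a Linux system with /proc (it returns NA elsewhere). If the growth came from memory allocated outside R's heap, for example by native code, one would expect r_heap_mb to stay roughly flat while process_rss_mb keeps rising; that is an assumption to test, not a confirmed diagnosis.

```r
# Resident set size of the current R process in MB.
# Assumption: Linux with /proc available; returns NA on other systems.
process_rss_mb <- function() {
  status_file <- "/proc/self/status"
  if (!file.exists(status_file)) return(NA_real_)
  lines <- readLines(status_file, warn = FALSE)
  rss_line <- grep("^VmRSS:", lines, value = TRUE)
  if (length(rss_line) == 0L) return(NA_real_)
  # The line looks like "VmRSS:   123456 kB"; keep the digits, convert kB to MB.
  as.numeric(gsub("[^0-9]", "", rss_line[1])) / 1024
}

# Memory used by R-managed objects in MB (Ncells + Vcells), after a collection.
r_heap_mb <- function() {
  sum(gc()[, 2])
}

# Run step(i) for i = 1..n and record both measurements every `every` iterations.
# `step` is any function of the iteration index (hypothetical interface).
track_memory <- function(step, n, every = 1L) {
  idx <- seq_len(n)
  checkpoints <- idx[idx %% every == 0L]
  log <- data.frame(
    iteration      = checkpoints,
    r_heap_mb      = rep(NA_real_, length(checkpoints)),
    process_rss_mb = rep(NA_real_, length(checkpoints))
  )
  for (i in idx) {
    step(i)
    if (i %% every == 0L) {
      row <- i %/% every
      log$r_heap_mb[row]      <- r_heap_mb()
      log$process_rss_mb[row] <- process_rss_mb()
    }
  }
  log
}

# Possible usage with the calls from the function above (requires ChemmineOB;
# `smiles` is a hypothetical character vector of SMILES strings):
# mem_log <- track_memory(function(i) {
#   molRefs <- ChemmineOB::forEachMol("SMILES", smiles[i], identity)
#   ChemmineOB::fingerprint_OB(molRefs, "FP4")
# }, n = length(smiles), every = 100L)
# print(mem_log)
```

Comparing the two columns over a few thousand iterations should show whether the increase is visible to R's garbage collector or only at the process level.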