speed up rule save_counts (bottleneck) #15

Closed
8 tasks done
sreichl opened this issue Dec 15, 2023 · 1 comment
Labels: enhancement (New feature or request)

Comments

@sreichl (Collaborator) commented Dec 15, 2023

fwrite

  • test
    • input has to be a data.frame
    • changed for now in save_counts.R
  • change in utils.R
  • change in metadata_plot.R
  • change in sctransform_cellScore.R

fread

  • change in metadata_plot.R
  • change in merge.R
  • change in prepare.R

general

  • open an MR.PARETO issue: look for write.csv calls across all MR.PARETO modules
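For the MR.PARETO sweep, a minimal base-R sketch of how one might list the remaining write.csv()/read.csv() calls in a module. `find_csv_calls()` is a hypothetical helper (not part of any module), and the regex is a rough assumption: it catches plain and `utils::` calls but will miss unusual forms such as calls built with do.call().

```r
# Hypothetical helper: list lines in R scripts under `root` that still call
# write.csv() or read.csv() (also matches utils::write.csv())
find_csv_calls <- function(root = ".", pattern = "\\b(write|read)\\.csv\\(") {
  files <- list.files(root, pattern = "\\.[Rr]$", recursive = TRUE,
                      full.names = TRUE)
  hits <- lapply(files, function(f) {
    lines <- readLines(f, warn = FALSE)
    idx <- grep(pattern, lines, perl = TRUE)
    if (length(idx) == 0L) return(NULL)
    data.frame(file = f, line = idx, code = trimws(lines[idx]))
  })
  # one row per match; NULL if nothing was found
  do.call(rbind, hits)
}

# e.g. find_csv_calls("path/to/module")
```

Each hit is a candidate for the fwrite/fread swap; lines already using fwrite() or fread() are not matched.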

https://rdrr.io/cran/data.table/man/fwrite.html

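library(data.table)
# write the scaled SCT data; row.names = TRUE keeps the gene names as the first column
fwrite(as.data.frame(GetAssayData(object = seurat_object, slot = "scale.data", assay = "SCT")),
       file = file.path(result_dir, paste0(step, 'scaled_', 'RNA', '.csv')),
       row.names = TRUE)

# more general
# fast writing (input has to be a data.frame, hence as.data.frame())
fwrite(as.data.frame(df), file = "path/to/file.csv", row.names = TRUE)

# fast reading (first column becomes the row names; check.names = FALSE keeps '-' in the column names)
df <- data.frame(fread("path/to/file.csv", header = TRUE), row.names = 1, check.names = FALSE)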
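A minimal round-trip sketch of the fast writing/reading pattern on a toy matrix. The gene and barcode names are made up and the matrix only stands in for the real counts; it assumes data.table is installed and writes to a temporary file.

```r
library(data.table)

# Toy genes x cells matrix; names are made up for illustration
m <- matrix(c(0, 1.5, 2, 0, 3.25, 0), nrow = 3,
            dimnames = list(c("GENE1", "GENE2", "AC007325.4"),
                            c("SAMPLE_A_AAAC-1", "SAMPLE_A_AAAG-1")))

path <- tempfile(fileext = ".csv")

# Write: fwrite() gets a data.frame; row.names = TRUE stores the gene
# names as an extra first column with an empty header ("")
fwrite(as.data.frame(m), file = path, row.names = TRUE)

# Read: row.names = 1 turns that first column back into row names and
# check.names = FALSE stops data.frame() from rewriting '-' as '.'
df <- data.frame(fread(path, header = TRUE), row.names = 1, check.names = FALSE)
```

If the round trip is lossless, rownames(df), colnames(df) and the values all match m; leaving out check.names = FALSE is enough to turn the '-' in the barcodes into '.'.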
sreichl self-assigned this Dec 15, 2023
sreichl added the enhancement (New feature or request) label Dec 15, 2023
@sreichl (Collaborator, Author) commented Dec 15, 2023

Not the same! Differences:

  • quotes -> fine/fixed
  • column names differ ('-' vs '.') -> checked the metadata, where they have to match for downstream tasks -> the metadata uses '-'
  • last row -> row number instead of gene name?! -> fixed
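The '-' vs '.' difference matches what R does with check.names = TRUE (the default in data.frame() and read.csv()): names go through make.names(), which turns '-' into '.'. Which call produced the original file is not stated here, so this is only a sketch of the mechanism, with made-up barcodes.

```r
# Made-up barcode-style names following the pattern of the cell IDs
barcodes <- c("SAMPLE_A_AAAC-1", "SAMPLE_A_AAAG-1")
m <- matrix(0, nrow = 2, ncol = 2,
            dimnames = list(c("GENE1", "GENE2"), barcodes))

# Default check.names = TRUE runs make.names(): '-' becomes '.'
mangled <- names(data.frame(m))
# mangled: "SAMPLE_A_AAAC.1" "SAMPLE_A_AAAG.1"

# check.names = FALSE keeps the names as they are in the metadata
kept <- names(data.frame(m, check.names = FALSE))
# kept: "SAMPLE_A_AAAC-1" "SAMPLE_A_AAAG-1"
```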

new comparison (values only: skip the header row and the first column with the row names)
diff --brief <(tail -n +2 NORMALIZED_RNA.csv | cut -d, -f2-) <(tail -n +2 NORMALIZED_RNA_original.csv | cut -d, -f2-)
Files /dev/fd/63 and /dev/fd/62 differ
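diff only reports that the files differ, not whether the difference is in the names or in the numbers (number formatting alone, e.g. "1.50" vs "1.5", makes bytes differ). A hedged base-R sketch that checks dimensions, row names, column names and values separately; compare_counts() and the toy files are hypothetical, and read.csv() is used for simplicity although fread() would be faster on the real matrices.

```r
# Hypothetical helper: compare two count-matrix CSVs by content, not bytes
compare_counts <- function(path_a, path_b, tolerance = 1e-8) {
  # first column -> row names; check.names = FALSE keeps '-' vs '.' visible
  a <- as.matrix(read.csv(path_a, row.names = 1, check.names = FALSE))
  b <- as.matrix(read.csv(path_b, row.names = 1, check.names = FALSE))
  same_dim <- identical(dim(a), dim(b))
  list(
    same_dim      = same_dim,
    same_rownames = identical(rownames(a), rownames(b)),
    same_colnames = identical(colnames(a), colnames(b)),
    # unname() so that only the numbers are compared
    same_values   = same_dim &&
      isTRUE(all.equal(unname(a), unname(b), tolerance = tolerance))
  )
}

# Toy files mimicking the two outputs (made-up content)
new_csv  <- tempfile(fileext = ".csv")
orig_csv <- tempfile(fileext = ".csv")
writeLines(c('"",CELL_A-1,CELL_B-1',
             'GENE1,0,1.5',
             'AC007325.4,0,0'), new_csv)
writeLines(c('"","CELL_A.1","CELL_B.1"',
             '"GENE1",0,1.50',
             '"AC007325.4",0,0'), orig_csv)

res <- compare_counts(new_csv, orig_csv)
str(res)
```

On the toy files the values and row names match while the column names differ ('-' vs '.'), so a byte-level "differ" can come from the names alone.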

compare with brief diff
diff --brief NORMALIZED_RNA_original.csv NORMALIZED_RNA.csv

head -n 1 NORMALIZED_RNA.csv | cut -c 1-100
"",EMICROP_A_AAACCTGAGAATAGGG-1,EMICROP_A_AAACCTGAGACAATAC-1,EMICROP_A_AAACCTGAGACGACGT-1,EMICROP_A_

head -n 1 NORMALIZED_RNA_original.csv | cut -c 1-100
"","EMICROP_A_AAACCTGAGAATAGGG.1","EMICROP_A_AAACCTGAGACAATAC.1","EMICROP_A_AAACCTGAGACGACGT.1","EMI

tail -n 1 NORMALIZED_RNA.csv | cut -c 1-100
"28441",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,

tail -n 1 NORMALIZED_RNA_original.csv | cut -c 1-100
"AC007325.4",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

sreichl closed this as completed Feb 8, 2024
sreichl added a commit that referenced this issue Feb 8, 2024