Version that uses SnapATAC2 #13

Open
nleroy917 opened this issue Nov 16, 2023 · 1 comment

Comments

@nleroy917

hi guys,

I opened an issue a while back regarding some installation problems (#6), along with a PR (#7). I even tried Dockerizing it all, but that was not possible due to the balancing act of supporting Python, R, and ArchR all in one container.

I'm not sure if you have seen, but SnapATAC2 was recently released, and it looks like a fantastic Python-native substitute for ArchR when analyzing scATAC-seq data from within Python. There's even a gene activity matrix construction tutorial. In addition to having a native Python library, it:

  1. doesn't pollute your directory with arrow files, logs, and tmp dirs,
  2. is built on Rust, so it's faster, uses less memory, and is overall more efficient,
  3. supports AnnData natively, an extraordinarily common data type for scATAC-seq data.

If the only purpose of ArchR within this library is to preprocess the data, perhaps it would be worth transitioning to SnapATAC2 to make installation easier, enable dockerization, and make it faster overall.

@marvinquiet
Owner

Dear @nleroy917,

Thank you for your valuable contributions and for keeping us informed about the latest developments in SnapATAC2. Regarding your inquiry about the snap.pp.make_gene_matrix function in SnapATAC2, I couldn't find specific documentation for it. Could you clarify whether this function exclusively aggregates counts in promoter regions?

I raise this question because we have compared performance using the gene activity matrices provided by ArchR, and we observed that performance varied across the different matrices (refer to Supplementary Figure 13). Notably, the categories of Model-GeneBody and GeneModel-GB-Exponential-Extend outperformed those utilizing only promoter information. Given the ArchR paper's mention of the superior correlation of the GeneModel-GB-Exponential-Extend category with gene expression, we opted to adhere to this choice. I am keen to understand how SnapATAC2 generates its cell-by-gene matrix using the gene annotation file and whether it incorporates information beyond promoter regions.
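To make the promoter-only versus promoter-plus-gene-body distinction concrete, here is a minimal toy sketch in plain Python. It is not SnapATAC2's or ArchR's implementation: the `Gene` class, the helper functions, the 2,000 bp upstream window, and the coordinates are all hypothetical and chosen only for illustration. It also uses plain counting and does not attempt the distance weighting implied by the name GeneModel-GB-Exponential-Extend.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record; coordinates are 0-based, end-exclusive."""
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def promoter_window(gene, upstream=2000, downstream=0):
    """Window around the TSS only (assumed sizes, for illustration)."""
    if gene.strand == "+":
        tss = gene.start
        return max(0, tss - upstream), tss + downstream
    tss = gene.end
    return max(0, tss - downstream), tss + upstream


def promoter_plus_body_window(gene, upstream=2000):
    """Promoter window extended across the whole gene body."""
    if gene.strand == "+":
        return max(0, gene.start - upstream), gene.end
    return gene.start, gene.end + upstream


def count_in_window(positions, window):
    """Count insertion sites falling in the half-open window [lo, hi)."""
    lo, hi = window
    return sum(lo <= p < hi for p in positions)


# Made-up insertion sites for one cell on one chromosome.
gene = Gene("GENE_A", 10_000, 15_000, "+")
insertions = [8_500, 9_900, 10_200, 12_000, 14_999, 15_000, 20_000]

promoter_only = count_in_window(insertions, promoter_window(gene))
promoter_and_body = count_in_window(insertions, promoter_plus_body_window(gene))

print(promoter_only, promoter_and_body)
```

In this toy case the promoter-only score sees two insertions while the promoter-plus-body score sees five, which is the kind of difference that could change downstream performance between matrix types.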

BTW, I don't want to impede your ongoing research. An alternative solution could be to utilize SnapATAC2 for generating the gene activity matrix. You can then proceed to use only the Cellcano train and Cellcano predict functions without the need to install the ArchR package, which I understand can be cumbersome.

Sincerely,
Wenjing
