Version that uses SnapATAC2 #13

nleroy917 · 2023-11-16T16:59:40Z

Hi guys,

I opened an issue a while back about some installation problems (#6), along with a PR (#7). I even tried Dockerizing everything, but that was not possible because of the balancing act of supporting Python, R, and ArchR all in one container.

I'm not sure if you have seen, but SnapATAC2 was recently released, and it looks like a fantastic Python-native substitute for ArchR when analyzing scATAC-seq data. There's even a gene activity matrix construction tutorial. In addition to being a native Python library, it:

- doesn't pollute your directory with Arrow files, logs, and tmp dirs,
- is built on Rust, so it's faster, uses less memory, and is more efficient overall,
- supports AnnData natively, an extraordinarily common data type for scATAC-seq data.

If the only purpose of ArchR within this library is to preprocess the data, perhaps it would be worth transitioning to SnapATAC2 to make installation easier, enable dockerization, and make it faster overall.


marvinquiet · 2023-11-17T15:41:06Z

Dear @nleroy917,

Thank you for your valuable contributions and for keeping us informed about the latest developments in SnapATAC2. Regarding your inquiry about the snap.pp.make_gene_matrix function in SnapATAC2, I couldn't find specific documentation for it. Could you clarify whether this function exclusively aggregates counts in promoter regions?

I raise this question because we compared performance using the different gene activity matrices provided by ArchR and observed that performance varied across them (refer to Supplementary Figure 13). Notably, the categories of Model-GeneBody and GeneModel-GB-Exponential-Extend outperformed those utilizing only promoter information. Given the ArchR paper's mention of the superior correlation of the GeneModel-GB-Exponential-Extend category with gene expression, we opted to adhere to this choice. I am keen to understand how SnapATAC2 generates its cell-by-gene matrix using the gene annotation file and whether it incorporates information beyond promoter regions.
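To make the promoter-only versus gene-body distinction concrete, here is a toy sketch in plain Python. It is not how SnapATAC2 or ArchR computes gene activity; the `Gene` class, the helper names, the 2 kb upstream window, and the coordinates are all hypothetical, and the distance-based exponential weighting of the GeneModel-GB-Exponential-Extend model is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Toy gene record with half-open coordinates [start, end)."""
    name: str
    start: int
    end: int
    strand: str = "+"


def promoter_window(gene, upstream=2000):
    """Window covering only the region upstream of the TSS (assumed 2 kb)."""
    if gene.strand == "+":
        return gene.start - upstream, gene.start
    return gene.end, gene.end + upstream


def gene_body_window(gene, upstream=2000):
    """Window covering the promoter plus the whole gene body."""
    if gene.strand == "+":
        return gene.start - upstream, gene.end
    return gene.start, gene.end + upstream


def count_overlaps(fragments, window):
    """Count fragments (start, end) that overlap the half-open window."""
    lo, hi = window
    return sum(1 for start, end in fragments if start < hi and end > lo)


# Hypothetical fragments from a single cell, all on the same chromosome.
gene = Gene("toy_gene", 10_000, 15_000, "+")
fragments = [
    (8_500, 8_700),    # promoter
    (9_900, 10_100),   # spans the TSS
    (12_000, 12_200),  # gene body
    (14_900, 15_100),  # end of gene body
    (20_000, 20_200),  # intergenic, counted by neither scheme
]

promoter_score = count_overlaps(fragments, promoter_window(gene))
gene_body_score = count_overlaps(fragments, gene_body_window(gene))
print(promoter_score, gene_body_score)
```

In this toy case the gene-body window picks up fragments that the promoter-only window misses, which is the kind of difference I am asking about for `snap.pp.make_gene_matrix`.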

BTW, I don't want to impede your ongoing research. An alternative solution could be to utilize SnapATAC2 for generating the gene activity matrix. You can then proceed to use only the Cellcano train and Cellcano predict functions without the need to install the ArchR package, which I understand can be cumbersome.

Sincerely,
Wenjing
