Closed
Labels
bug: Something isn't working
Description
Please make sure these conditions are met
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pertpy.
- (optional) I have confirmed this bug exists on the main branch.
Report
I fetched the Replogle 2022 K562 essential dataset with the pertpy data module, and I’m quite confused by the gene and perturbation columns in .obs:
import pertpy as pt

# Unique perturbations from the pertpy dataset, sorted by frequency
adata_pertpy = pt.data.replogle_2022_k562_essential()
print(adata_pertpy.obs['perturbation'].value_counts().head(5))
perturbation
nan 1527
chr1.10050_top_two_chr11.1801_top_two_chr12.1832_top_two_chr12.732_top_two_chr1.3789_top_two_chr16.3244_top_two_chr20.537_top_two_chr21.240_top_two_chr3.666_top_two_chr5.1476_top_two_chr5.1603_top_two_chr7.3567_top_two_chr8.2697_top_two 6
chr1.10201_top_two_chr11.1778_second_two_chr11.9_top_two_chr12.3850_top_two_chr1.3349_top_two_chr16.4939_top_two_chr16.5184_top_two_chr2.181_top_two_chr2.2107_top_two_chr2.2686_top_two_chr2.3453_top_two_chr3.3203_top_two_chr7.2795_top_two_chrX.333_top_two_GNPDA1 6
chr1.11646_top_two_chr1.7199_top_two_chr19.32_top_two_chr19.4654_top_two_chr6.1231_top_two 6
chr10.3492_top_two_chr14.1108_top_two_chr19.2350_top_two_chr19.3043_top_two_chr2.342_top_two_chr4.2449_top_two_chr5.3584_top_two_chr6.4480_top_two_chr7.2333_top_two_chrX.230_top_two_FKBP2_GUK1 6
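To narrow this down, it may help to check whether the `nan` entry is a literal string or a true missing value, and how long the concatenated labels get. The following is a minimal sketch on a small stand-in Series; the values are made up for illustration and are not taken from the dataset. With the real object, `labels` would be `adata_pertpy.obs['perturbation']`.

```python
import pandas as pd

# Stand-in for adata_pertpy.obs['perturbation']; values are illustrative only.
labels = pd.Series(["nan", "nan", "GENE_A", "guide1_top_two_guide2_top_two", None])

# value_counts() drops real missing values by default, so a visible "nan"
# row in its output suggests the label is the literal string "nan".
n_string_nan = int(labels.eq("nan").sum())
n_missing = int(labels.isna().sum())

# Very long labels hint at several guide names joined into one string.
label_lengths = labels.dropna().str.len()
max_label_length = int(label_lengths.max())

print(n_string_nan, n_missing, max_label_length)
```

If the string count matches the 1527 shown above, the `nan` label was probably written as text when the dataset was built, rather than being a missing value.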
Additionally, the original dataset from the authors has 310,385 cells × 8,563 genes, while the pertpy dataset has 207,324 cells × 13,135 genes. I understand that QC filtering probably explains why the pertpy version has fewer cells, but where did the extra genes come from?
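One way to see which genes are extra would be to compare the `var_names` of the two objects directly. The sketch below uses two small stand-in indexes with hypothetical gene names. It assumes both datasets use the same kind of gene identifier; if one uses symbols and the other Ensembl IDs, the comparison would need a mapping step first.

```python
import pandas as pd

# Stand-ins for the var_names of the two AnnData objects; illustrative only.
author_genes = pd.Index(["GENE_A", "GENE_B", "GENE_C"])
pertpy_genes = pd.Index(["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"])

# Genes present in one version but not the other, plus the overlap.
only_in_pertpy = pertpy_genes.difference(author_genes)
only_in_author = author_genes.difference(pertpy_genes)
shared = pertpy_genes.intersection(author_genes)

print(len(only_in_pertpy), len(only_in_author), len(shared))
```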
Versions
pertpy==1.0.5