cluster.counts #25

Open
amanda-fitz opened this issue Oct 31, 2018 · 4 comments

@amanda-fitz

Hi there, can anyone help?
I have two things I'd like to add to my cluster.counts data file.

  1. The cluster number assigned by ClonEvol (1, 2, 3, 4, etc.), added to the cluster.counts data table. Currently my cluster.counts table only has the PyClone cluster ID.

  2. Median CCF per sample (currently just median.ccf for each cluster)

Attached example of my cluster.counts file
(https://github.com/hdng/clonevol/files/2533239/LM002_cluster.counts.xlsx)

Here is my script:

library(dplyr)  # %>%, mutate, select, group_by, summarize, filter, arrange
library(tidyr)  # spread

pyclone.directory <- '/Users/amandafitzpatrick/Library/Mobile Documents/comappleCloudDocs/DOCUMENTS/E57 exome sequencing/2018-08-30_results_ascat_pyclone/pyclone'
output.directory <- '/Users/amandafitzpatrick/Library/Mobile Documents/comappleCloudDocs/DOCUMENTS/E57 exome sequencing/2018-08-30_results_ascat_pyclone'
sample.sheet.file <- 'sample_annotation.txt'

min.mutation.count <- 30
cancer.genes <- scan('/Users/amandafitzpatrick/Library/Mobile Documents/comappleCloudDocs/DOCUMENTS/E57 exome sequencing/2018-08-30_results/Exome Sequencing/COMBINED list Stratton plus Caldas.txt', what = character())
patient.id <- 'LM002'

loci.file <- file.path(pyclone.directory, patient.id, 'output', 'tables', 'annotated_loci.tsv')
loci <- read.table(loci.file, header = TRUE, sep = '\t', stringsAsFactors = FALSE)

sample.sheet <- read.table(sample.sheet.file, header = TRUE, sep = '\t', stringsAsFactors = FALSE)

clonevol.data <- loci %>%
  mutate(
    vaf = 100*cellular_prevalence/2,
    is.driver = symbol %in% cancer.genes & 'exonic' == func & 'synonymous_SNV' != exonic_func
  ) %>%
  select(mutation_id, cluster_id, sample_id, vaf, symbol, is.driver) %>%
  spread(sample_id, vaf)

n.samples <- length( unique(loci$sample_id) )
if( 1 == n.samples ) stop('Need more than one sample for ClonEvol!')

cluster.counts <- loci %>%
  group_by(cluster_id) %>%
  summarize(
    count = n()/n.samples,
    min.ccf = min(cellular_prevalence),
    median.ccf = median(cellular_prevalence),
    mean.ccf = mean(cellular_prevalence)
  ) %>%
  ungroup() %>%
  filter(count >= min.mutation.count) %>%
  arrange(-median.ccf)

recode.values <- 1:nrow(cluster.counts)
names(recode.values) <- as.character(cluster.counts$cluster_id)

clonevol.data <- clonevol.data %>%
  select(-mutation_id) %>%
  filter(cluster_id %in% cluster.counts$cluster_id) %>%
  mutate(cluster = recode.values[ as.character(cluster_id) ])

@hdng
Owner

hdng commented Oct 31, 2018

Hi @amanda-fitz,

I am not sure if I understand your question completely, but:

(1) ClonEvol doesn't perform clustering. It takes the clustering from PyClone, reconstructs the consensus clonal evolution tree, and estimates the clonal admixture for individual samples.

(2) ClonEvol can use either the median or the mean CCF as the cluster estimate. The infer.clonal.models function has a parameter called cluster.center that takes either the string "mean" or "median".
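
To make the difference between the two settings concrete, here is a minimal base-R sketch that summarizes a toy VAF table per cluster with both the mean and the median. The data frame `toy`, its column names, and all values are invented for illustration, and the code only shows the two summary statistics; it is not ClonEvol's internal implementation.

```r
# Toy variant table: one row per variant, VAF (%) in two samples.
# All values are invented for illustration.
toy <- data.frame(
  cluster = c(1, 1, 1, 2, 2, 2),
  P.vaf   = c(48, 50, 70, 20, 22, 24),
  R.vaf   = c(45, 47, 49, 0, 1, 8)
)

# Per-cluster centre of each sample column, computed two ways.
# The outlier (70) in cluster 1 pulls the mean up but not the median.
center.mean   <- aggregate(cbind(P.vaf, R.vaf) ~ cluster, data = toy, FUN = mean)
center.median <- aggregate(cbind(P.vaf, R.vaf) ~ cluster, data = toy, FUN = median)

print(center.mean)
print(center.median)
```

In ClonEvol itself the choice is made through the cluster.center argument mentioned above, so no manual summarizing like this is needed there.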

@amanda-fitz
Author

amanda-fitz commented Nov 1, 2018 via email

@hdng
Owner

hdng commented Nov 1, 2018

  1. ClonEvol doesn't reassign cluster IDs, so the IDs in all plots should match those from PyClone. One thing I should note is that ClonEvol requires contiguous integers as cluster IDs, with "1" set as the founding cluster.

  2. If you are looking for the CCF estimate (and its confidence interval) of the clones within individual samples, these are encoded in the output data frame for the model, e.g.

y = infer.clonal.models(...)

# CCF of clones in the first tree,
y$matched$merged.trees[[1]]$sample.with.nonzero.cell.frac.ci
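
For the two original requests, one possible approach is to handle both in the preprocessing script rather than in ClonEvol. The 1, 2, 3, ... numbering comes from the script's own recode.values step, so it can be stored as a column of cluster.counts, and the per-sample median CCF can be computed by grouping on both cluster and sample. The sketch below uses base R and a made-up `loci` table that reuses the script's column names (cluster_id, sample_id, cellular_prevalence); the values, the sample names P and R, and the output column names median.ccf.P and median.ccf.R are assumptions for illustration.

```r
# Invented long-format table mimicking the loci table in the script:
# one row per mutation per sample.
loci <- data.frame(
  mutation_id = rep(c("m1", "m2", "m3", "m4"), times = 2),
  cluster_id  = rep(c(7, 7, 3, 3), times = 2),
  sample_id   = rep(c("P", "R"), each = 4),
  cellular_prevalence = c(0.90, 0.80, 0.30, 0.40,   # sample P
                          0.95, 0.85, 0.10, 0.20),  # sample R
  stringsAsFactors = FALSE
)

# Overall median CCF per PyClone cluster, highest first.
cluster.counts <- aggregate(cellular_prevalence ~ cluster_id, data = loci, FUN = median)
names(cluster.counts)[2] <- "median.ccf"
cluster.counts <- cluster.counts[order(-cluster.counts$median.ccf), ]

# Contiguous integer label (1 = highest median CCF), kept next to the PyClone ID.
cluster.counts$cluster <- seq_len(nrow(cluster.counts))

# Median CCF per cluster and sample, reshaped to one column per sample.
per.sample <- aggregate(cellular_prevalence ~ cluster_id + sample_id,
                        data = loci, FUN = median)
wide <- reshape(per.sample, idvar = "cluster_id", timevar = "sample_id",
                direction = "wide")
names(wide) <- sub("^cellular_prevalence\\.", "median.ccf.", names(wide))

# Attach the per-sample medians and restore the 1..n ordering.
cluster.counts <- merge(cluster.counts, wide, by = "cluster_id")
cluster.counts <- cluster.counts[order(cluster.counts$cluster), ]
print(cluster.counts)
```

Note that these are plain medians of PyClone's cellular prevalence, which is a different quantity from the clone-level CCF estimates with confidence intervals that ClonEvol reports in merged.trees.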

@guoxueyu

Hi @amanda-fitz @hdng,
When I use ClonEvol to visualize my data (two samples), I run into the problem @amanda-fitz described above. Why is the cluster number in ClonEvol's aml1$variant the same in P and R? And how can I build the input data from the PyClone clustering results of two different samples, given that they have different cluster names?

Best.

Here are the results for the two samples:
(1) the cluster results of sample_one :
Gene_site CCF CCF_id Cluster_id
SPTBN4_19:40993610 0.000577717970064617 0.004905701953290807 0
SSSCA1_11:65339085 0.000606208481679261 0.007941526831518725 0
STARD10_11:72466059 0.0005419499192131997 0.00166324568158397 0
STRIP2_7:129098248 0.0005328553104007415 0.0009215148845033685 0
TLN2_15:63054620 0.0005334327589028977 0.0010294553920328 0
TLR3;FAM149A_4:187038619 0.0006052806746053058 0.007838195365478317 0
TMEM200A_6:130762164 0.0005869335930044629 0.005927266391566769 0
TMTC2_12:83358854 0.0006003953976641044 0.007264025382633757 0
TOP2B_3:25671580 0.0005748332535731366 0.0047982396078397925 0
TRPM2_21:45795753 0.0006065139423596613 0.007957071492527208 0
TTN_2:179664351 0.0005664532000250138 0.003969393376836105 0
TUBB4A_19:6495416 0.0005567039496530104 0.003026979374553545 0
ZBTB12_6:31868133 0.0006164475427949428 0.008866946468675582 0
ZIC4_3:147109962 0.0005951498207536907 0.006814823106293941 0
ZKSCAN3_6:28327605 0.0005654365502695898 0.0036983320975591282 0
ZNF343_20:2464396 0.0005499692553414132 0.002405898570181837 0
ZNF701_19:53086167 0.0005536185073358852 0.002725997394167213 0
ZNF749_19:57956145 0.0006049772129360563 0.007697991223585969 0
ZNF841_19:52569832 0.0006196443033787964 0.00921261354807395 0
ZNF843_16:31447342 0.0005366334034009819 0.0012061584107274804 0
GPR83_11:94129586 0.0006347704608802662 0.006913299009497001 1

(2) the cluster results of sample_two :
Gene_site CCF CCF_id Cluster_id
TOP2B_3:25671580 0.2679976638493413 0.08677916938910878 0
TRPC3;KIAA1109_4:123075318 0.2700058093836503 0.08390908294681121 0
TUBB4A_19:6495416 0.249111904572308 0.06286147278161541 0
ZIC4_3:147109962 0.2595193323398667 0.07319550876698379 0
ZNF343_20:2464396 0.2904487190531294 0.11464440159241955 0
ZNF701_19:53086167 0.2564645807826187 0.07008302067765426 0
ZNF749_19:57956145 0.2824362473561567 0.07832321938377844 0
ZNF841_19:52569832 0.26211001388249444 0.07244177517264004 0
MSR1_8:16001067 0.4011107153840252 0.10934799632753751 1
SCNN1G_16:23226531 0.4166061871271859 0.11698129469678656 1
IGF2R_6:160467624 0.294294450939142 0.07744067217978816 2
ADGRG7_3:100373931 0.29995597026745097 0.07158426019495136 3
FBXL19_16:30941644 0.3963851144129694 0.11279595292023216 4
PTPRT_20:40827887 0.39031419763822717 0.1013499009563431 4
ZNF843_16:31447342 0.41546292701815574 0.12776067211779962 4
OR52A1_11:5172907 0.3001542295655507 0.07641573409434724 5
MAT1A_10:82040067 0.29114574474311355 0.07695134578642827 6
DLG1_3:196910782 0.3503502187183945 0.14386002892864064 7
SERPINB12_18:61231325 0.3544039453594735 0.12744488185490602 8
SMAD4_18:48593504 0.3751574425813409 0.11794786469061391 9
POLN_4:2172442 0.2941529871391146 0.07363903851906502 10
NAPG;LINC01887_18:10605605 0.3134554089627867 0.10826041109866213 11
ZBTB12_6:31868133 0.6114864929361972 0.02881686518027521 12
C17orf99_17:76161546 0.2950922376368506 0.0773892683946796 13
MYLK4_6:2683370 0.30251509465459386 0.0671273900328978 14
PHF20L1_8:133824901 0.27898538053218513 0.06688004361287586 15
GPR83_11:94129586 0.28794296706965705 0.07382748745568672 16
ZKSCAN3_6:28327605 0.5372857320231438 0.06788054734282246 17
GLDN_15:51696672 0.06772532068056986 0.030602432692736777 18
HLA-C_6:31239613 0.06388765142821265 0.02212462084329112 18
MYOD1_11:17741386 0.06654192335768135 0.027226858496019704 18
LRP1B_2:141291599 0.29400052351621403 0.07200863303482705 19
SOX4_6:21595694 0.29829178384604516 0.06795922586380174 20
SNHG28_1:159805417 0.29787811581751417 0.0782328817670741 21
MPPED2_11:30432357 0.28472389887067157 0.06739179734110781 22
KLHL40_3:42727844 0.25080231841198364 0.11814284530406437 23
TP53_17:7578212 0.32269285539730724 0.12890300392747847 24
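
As far as I understand, ClonEvol's input has a single cluster column per variant (the cluster.col.name argument of infer.clonal.models), so two independent per-sample clusterings cannot be used as they are: the same variant can end up in differently numbered clusters. The base-R sketch below takes a few rows from the two tables above (CCF values rounded), merges them on Gene_site, and flags variants whose cluster IDs disagree. It is a diagnostic only. Obtaining one shared cluster assignment, for example by running PyClone jointly on both samples, is an assumption about the workflow and is not stated in this thread.

```r
# A few rows copied from the two per-sample tables (CCF rounded).
sample.one <- data.frame(
  Gene_site  = c("TOP2B_3:25671580", "ZBTB12_6:31868133", "GPR83_11:94129586"),
  CCF        = c(0.000578, 0.000616, 0.000635),
  Cluster_id = c(0, 0, 1),
  stringsAsFactors = FALSE
)
# TP53 is included here but does not appear in the excerpt shown for sample one.
sample.two <- data.frame(
  Gene_site  = c("TOP2B_3:25671580", "ZBTB12_6:31868133",
                 "GPR83_11:94129586", "TP53_17:7578212"),
  CCF        = c(0.2680, 0.6115, 0.2879, 0.3227),
  Cluster_id = c(0, 12, 16, 24),
  stringsAsFactors = FALSE
)

# Full outer join: one row per variant, one CCF/cluster column per sample.
merged <- merge(sample.one, sample.two, by = "Gene_site", all = TRUE,
                suffixes = c(".one", ".two"))

# TRUE/FALSE where both samples have the variant, NA where one is missing.
# Even a matching ID is not evidence of the same cluster when the two
# clusterings were run separately; the labels are arbitrary per run.
merged$same.cluster <- merged$Cluster_id.one == merged$Cluster_id.two
rownames(merged) <- merged$Gene_site
print(merged)
```

Once every variant has a single cluster ID shared by both samples, the merged table already has the wide layout (one row per variant, one value column per sample) that the script at the top of this thread builds with spread.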
