cluster.counts #25

amanda-fitz · 2018-10-31T07:50:47Z

Hi there, can anyone help?
I have two things I'd like to add to my cluster.counts data file.

(1) The cluster number assigned by ClonEvol (1, 2, 3, 4, etc.). Currently my cluster.counts data table only has the PyClone cluster ID.
(2) The median CCF per sample. Currently there is just a single median.ccf for each cluster.
Attached is an example of my cluster.counts file: LM002_cluster.counts.xlsx
(https://github.com/hdng/clonevol/files/2533239/LM002_cluster.counts.xlsx)

Here is my script:

library(dplyr)  # %>%, mutate, select, group_by, summarize, filter, arrange
library(tidyr)  # spread

pyclone.directory <- '/Users/amandafitzpatrick/Library/Mobile Documents/com~~apple~~CloudDocs/DOCUMENTS/E57 exome sequencing/2018-08-30_results_ascat_pyclone/pyclone'
output.directory <- '/Users/amandafitzpatrick/Library/Mobile Documents/com~~apple~~CloudDocs/DOCUMENTS/E57 exome sequencing/2018-08-30_results_ascat_pyclone'
sample.sheet.file <- 'sample_annotation.txt'

min.mutation.count <- 30
cancer.genes <- scan('/Users/amandafitzpatrick/Library/Mobile Documents/com~~apple~~CloudDocs/DOCUMENTS/E57 exome sequencing/2018-08-30_results/Exome Sequencing/COMBINED list Stratton plus Caldas.txt', what = character())
patient.id <- 'LM002'

loci.file <- file.path(pyclone.directory, patient.id, 'output', 'tables', 'annotated_loci.tsv')
loci <- read.table(loci.file, header = TRUE, sep = '\t', stringsAsFactors = FALSE)

sample.sheet <- read.table(sample.sheet.file, header = TRUE, sep = '\t', stringsAsFactors = FALSE)

clonevol.data <- loci %>%
  mutate(
    vaf = 100*cellular_prevalence/2,
    is.driver = symbol %in% cancer.genes & 'exonic' == func & 'synonymous_SNV' != exonic_func
  ) %>%
  select(mutation_id, cluster_id, sample_id, vaf, symbol, is.driver) %>%
  spread(sample_id, vaf)

n.samples <- length( unique(loci$sample_id) )
if( 1 == n.samples ) stop('Need more than one sample for ClonEvol!')

cluster.counts <- loci %>%
  group_by(cluster_id) %>%
  summarize(
    count = n()/n.samples,
    min.ccf = min(cellular_prevalence),
    median.ccf = median(cellular_prevalence),
    mean.ccf = mean(cellular_prevalence)
  ) %>%
  ungroup() %>%
  filter(count >= min.mutation.count) %>%
  arrange(-median.ccf)

recode.values <- 1:nrow(cluster.counts)
names(recode.values) <- as.character(cluster.counts$cluster_id)

clonevol.data <- clonevol.data %>%
  select(-mutation_id) %>%
  filter(cluster_id %in% cluster.counts$cluster_id) %>%
  mutate(cluster = recode.values[ as.character(cluster_id) ] )

hdng · 2018-10-31T18:31:30Z

Hi @amanda-fitz,

I am not sure if I understand your question completely, but:

(1) clonevol doesn't perform clustering. It takes the clustering from pyclone, reconstructs the consensus clonal evolution tree, and estimates the clonal admixture for individual samples.

(2) clonevol can use/estimate either the median or the mean CCF. There is a parameter called cluster.center in the infer.clonal.models function that takes either the string "mean" or "median".
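
To illustrate what the choice between "mean" and "median" amounts to, here is a toy sketch in base R. It does not call ClonEvol; the CCF values and the variable names are made up, and the only thing taken from the thread is that cluster.center accepts "mean" or "median".

```r
# Toy CCF values for the variants of one cluster in one sample (made-up numbers);
# the last value is an outlier.
ccf <- c(0.40, 0.42, 0.45, 0.47, 0.95)

center.mean   <- mean(ccf)    # pulled upward by the outlier
center.median <- median(ccf)  # unaffected by a single extreme value

print(c(mean = center.mean, median = center.median))

# With ClonEvol the choice would be passed as an argument, e.g.
#   y <- infer.clonal.models(..., cluster.center = "median")
# where "..." stands for the remaining arguments, which are not shown in this thread.
```

When a cluster contains a few variants with unusual CCF values, the median is usually the more stable summary, which may be a reason to prefer it.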

amanda-fitz · 2018-11-01T09:04:40Z

Hi, thanks for your reply and explanations. My question is actually very simple, but perhaps I didn't explain it well. I would like a numerical output for the variant cluster plot. So, from the example below [X], I would like the cluster number (i.e. the number assigned by ClonEvol, here 1, 2, 3, etc., which I understand is just the PyClone cluster given a new ID) and, for each cluster, the median CCF by sample type. My script generates a 'cluster.counts' data file, but it contains only a single median CCF output and the PyClone cluster ID. I imagine it would be straightforward to obtain a data file, given that this data is used to make the cluster plot?
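
A minimal sketch of one way to build such a table directly from the PyClone loci table, using base R rather than ClonEvol itself. The column names (cluster_id, sample_id, cellular_prevalence) follow the script earlier in the thread; the toy values and the names median.by.sample and cluster.ccf are made up for illustration.

```r
# Toy stand-in for PyClone's loci table (made-up values); one row per
# mutation per sample, as in the script earlier in this thread.
loci <- data.frame(
  mutation_id         = rep(c("m1", "m2", "m3", "m4", "m5"), times = 2),
  sample_id           = rep(c("primary", "met"), each = 5),
  cluster_id          = rep(c(0, 0, 0, 3, 3), times = 2),
  cellular_prevalence = c(0.90, 0.95, 1.00, 0.30, 0.40,   # primary
                          0.80, 0.85, 0.90, 0.60, 0.70),  # met
  stringsAsFactors = FALSE
)

# Median CCF for every cluster/sample pair: rows are clusters, columns are samples.
median.by.sample <- tapply(loci$cellular_prevalence,
                           list(cluster_id = loci$cluster_id,
                                sample_id  = loci$sample_id),
                           median)

# Convert the matrix to a data frame with an explicit cluster_id column.
cluster.ccf <- as.data.frame(median.by.sample)
colnames(cluster.ccf) <- paste0("median.ccf.", colnames(cluster.ccf))
cluster.ccf <- cbind(cluster_id = as.integer(rownames(cluster.ccf)), cluster.ccf)
rownames(cluster.ccf) <- NULL

print(cluster.ccf)
```

The result has one row per PyClone cluster and one median.ccf.<sample> column per sample, so it could be written out with write.table or merged onto an existing per-cluster summary by cluster_id.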

hdng · 2018-11-01T15:20:18Z

ClonEvol doesn't reassign cluster IDs, so the IDs in all plots should match those from PyClone. One thing I should note is that ClonEvol requires contiguous integers as cluster IDs, with "1" set as the founding cluster.
If you are looking for (the confidence interval of) the CCF estimate of the clones within individual samples, these are encoded in the output data frame for the model, e.g.

y = infer.clonal.models(...)

# CCF of clones in the first tree,
y$matched$merged.trees[[1]]$sample.with.nonzero.cell.frac.ci
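
Since ClonEvol does not reassign IDs, any renumbering to contiguous integers has to happen before the data is passed in; the script earlier in the thread does this with recode.values. A minimal sketch that keeps both the PyClone ID and the new number in one table follows. It assumes the cluster with the highest median CCF is the founding one, which is an assumption of this sketch and not something ClonEvol checks; the toy values are made up.

```r
# Toy per-cluster summary keyed by PyClone cluster ID (made-up values).
cluster.counts <- data.frame(
  cluster_id = c(0, 2, 5),
  median.ccf = c(0.35, 0.98, 0.60)
)

# Assumption: the cluster with the highest median CCF is the founding clone,
# so sort by descending median CCF.
cluster.counts <- cluster.counts[order(-cluster.counts$median.ccf), ]

# Contiguous IDs: 1 = founding cluster, then 2, 3, ...
cluster.counts$cluster <- seq_len(nrow(cluster.counts))
rownames(cluster.counts) <- NULL

# Named lookup vector: PyClone ID (as character) -> new cluster number.
recode.values <- setNames(cluster.counts$cluster,
                          as.character(cluster.counts$cluster_id))

print(cluster.counts)
print(recode.values[c("5", "0")])
```

Keeping the cluster column next to cluster_id in the summary table gives a direct mapping between the numbers shown in the plots and the original PyClone IDs.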

guoxueyu · 2019-08-28T09:11:56Z

Hi @amanda-fitz @hdng,
When I use ClonEvol to visualize my data (two samples), I run into the problem @amanda-fitz described above. Why is the cluster number in ClonEvol's aml1$variant the same in P and R? And how can I build the input data from the PyClone clustering results of two different samples, given that they have different cluster names?

Best.

Here are the results for the two samples:
(1) Cluster results for sample_one:
Gene_site CCF CCF_id Cluster_id
SPTBN4_19:40993610 0.000577717970064617 0.004905701953290807 0
SSSCA1_11:65339085 0.000606208481679261 0.007941526831518725 0
STARD10_11:72466059 0.0005419499192131997 0.00166324568158397 0
STRIP2_7:129098248 0.0005328553104007415 0.0009215148845033685 0
TLN2_15:63054620 0.0005334327589028977 0.0010294553920328 0
TLR3;FAM149A_4:187038619 0.0006052806746053058 0.007838195365478317 0
TMEM200A_6:130762164 0.0005869335930044629 0.005927266391566769 0
TMTC2_12:83358854 0.0006003953976641044 0.007264025382633757 0
TOP2B_3:25671580 0.0005748332535731366 0.0047982396078397925 0
TRPM2_21:45795753 0.0006065139423596613 0.007957071492527208 0
TTN_2:179664351 0.0005664532000250138 0.003969393376836105 0
TUBB4A_19:6495416 0.0005567039496530104 0.003026979374553545 0
ZBTB12_6:31868133 0.0006164475427949428 0.008866946468675582 0
ZIC4_3:147109962 0.0005951498207536907 0.006814823106293941 0
ZKSCAN3_6:28327605 0.0005654365502695898 0.0036983320975591282 0
ZNF343_20:2464396 0.0005499692553414132 0.002405898570181837 0
ZNF701_19:53086167 0.0005536185073358852 0.002725997394167213 0
ZNF749_19:57956145 0.0006049772129360563 0.007697991223585969 0
ZNF841_19:52569832 0.0006196443033787964 0.00921261354807395 0
ZNF843_16:31447342 0.0005366334034009819 0.0012061584107274804 0
GPR83_11:94129586 0.0006347704608802662 0.006913299009497001 1

(2) Cluster results for sample_two:
Gene_site CCF CCF_id Cluster_id
TOP2B_3:25671580 0.2679976638493413 0.08677916938910878 0
TRPC3;KIAA1109_4:123075318 0.2700058093836503 0.08390908294681121 0
TUBB4A_19:6495416 0.249111904572308 0.06286147278161541 0
ZIC4_3:147109962 0.2595193323398667 0.07319550876698379 0
ZNF343_20:2464396 0.2904487190531294 0.11464440159241955 0
ZNF701_19:53086167 0.2564645807826187 0.07008302067765426 0
ZNF749_19:57956145 0.2824362473561567 0.07832321938377844 0
ZNF841_19:52569832 0.26211001388249444 0.07244177517264004 0
MSR1_8:16001067 0.4011107153840252 0.10934799632753751 1
SCNN1G_16:23226531 0.4166061871271859 0.11698129469678656 1
IGF2R_6:160467624 0.294294450939142 0.07744067217978816 2
ADGRG7_3:100373931 0.29995597026745097 0.07158426019495136 3
FBXL19_16:30941644 0.3963851144129694 0.11279595292023216 4
PTPRT_20:40827887 0.39031419763822717 0.1013499009563431 4
ZNF843_16:31447342 0.41546292701815574 0.12776067211779962 4
OR52A1_11:5172907 0.3001542295655507 0.07641573409434724 5
MAT1A_10:82040067 0.29114574474311355 0.07695134578642827 6
DLG1_3:196910782 0.3503502187183945 0.14386002892864064 7
SERPINB12_18:61231325 0.3544039453594735 0.12744488185490602 8
SMAD4_18:48593504 0.3751574425813409 0.11794786469061391 9
POLN_4:2172442 0.2941529871391146 0.07363903851906502 10
NAPG;LINC01887_18:10605605 0.3134554089627867 0.10826041109866213 11
ZBTB12_6:31868133 0.6114864929361972 0.02881686518027521 12
C17orf99_17:76161546 0.2950922376368506 0.0773892683946796 13
MYLK4_6:2683370 0.30251509465459386 0.0671273900328978 14
PHF20L1_8:133824901 0.27898538053218513 0.06688004361287586 15
GPR83_11:94129586 0.28794296706965705 0.07382748745568672 16
ZKSCAN3_6:28327605 0.5372857320231438 0.06788054734282246 17
GLDN_15:51696672 0.06772532068056986 0.030602432692736777 18
HLA-C_6:31239613 0.06388765142821265 0.02212462084329112 18
MYOD1_11:17741386 0.06654192335768135 0.027226858496019704 18
LRP1B_2:141291599 0.29400052351621403 0.07200863303482705 19
SOX4_6:21595694 0.29829178384604516 0.06795922586380174 20
SNHG28_1:159805417 0.29787811581751417 0.0782328817670741 21
MPPED2_11:30432357 0.28472389887067157 0.06739179734110781 22
KLHL40_3:42727844 0.25080231841198364 0.11814284530406437 23
TP53_17:7578212 0.32269285539730724 0.12890300392747847 24
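
Cluster labels from two separate PyClone runs are assigned independently, so the same mutation can carry a different ID in each sample. ClonEvol's input, as built in the script earlier in the thread, has a single cluster column shared by all samples. One possible first step, sketched here with three rows taken from the tables above (CCF rounded to three significant digits), is to line the two results up by Gene_site and inspect how the labels correspond. This does not produce a joint clustering; the object names are made up for illustration.

```r
# Three mutations present in both tables above; CCF values rounded.
sample.one <- data.frame(
  Gene_site  = c("TOP2B_3:25671580", "ZNF843_16:31447342", "GPR83_11:94129586"),
  CCF        = c(5.75e-4, 5.37e-4, 6.35e-4),
  Cluster_id = c(0, 0, 1),
  stringsAsFactors = FALSE
)
sample.two <- data.frame(
  Gene_site  = c("TOP2B_3:25671580", "ZNF843_16:31447342", "GPR83_11:94129586"),
  CCF        = c(0.268, 0.415, 0.288),
  Cluster_id = c(0, 4, 16),
  stringsAsFactors = FALSE
)

# Inner join on the mutation identifier; suffixes mark which run a column came from.
merged <- merge(sample.one, sample.two,
                by = "Gene_site", suffixes = c(".one", ".two"))

print(merged)

# Cross-tabulate the two labelings to see which clusters overlap.
print(table(one = merged$Cluster_id.one, two = merged$Cluster_id.two))
```

In this small subset, the two mutations in cluster 0 of sample_one fall into clusters 0 and 4 of sample_two, which shows that the labels from the two runs cannot simply be reused as a common cluster column.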

hdng added the help wanted label Oct 31, 2018
