Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is it possible to compare immunarch objects with repOverlap? #40

Closed
diyaazak opened this issue Feb 7, 2020 · 14 comments
Closed

Is it possible to compare immunarch objects with repOverlap? #40

diyaazak opened this issue Feb 7, 2020 · 14 comments
Assignees
Labels
tag:Environment Installation tag:Single-cell Single-cell-specific tickets type:Question Further information is requested

Comments

@diyaazak
Copy link

diyaazak commented Feb 7, 2020

Hello,

Thanks a lot for the great tool.
As the title says, Is it possible to compare immunarch objects with repOverlap?
I created several immunarch objects from a 10x dataset using the "filter_barcode" function based on clusters I got from UMAP clustering of the cells by Seurat.

I can now look at the repertoire diversity per cluster, but I want to check the overlap between clusters too. Is there a way to do this?

Thanks a lot.

@vadimnazarov
Copy link
Contributor

Hi @diyaazak

sorry for the very late response!

Can you please explain what do you mean by "immunarch objects"?

@vadimnazarov vadimnazarov added status:In progress ⏳ Work in progress tag:Environment Installation type:Question Further information is requested labels Apr 12, 2020
@vadimnazarov
Copy link
Contributor

Ping @diyaazak about comparing single-cell repertoires in immunarch

@vadimnazarov vadimnazarov added status:Waiting for data 📁 Waiting for data sample to reproduce and removed status:In progress ⏳ Work in progress labels May 16, 2020
@diyaazak
Copy link
Author

Hi @vadimnazarov

I'm really sorry I haven't been checking this post for some time..
I'm not sure if "immunarch object" is the right term here. But what I basically do is the follows:

#load the 10x data
Bcell_immunarch = repLoad("...path/filtered_contig_annotations_TRUE.csv")

#I extract the barcodes for each cluster of cells defined by transcriptomic analysis (by Seurat for example):
Cluster_1 = c(Cluster_1_barcodes)
Cluster_2 = c(Cluster_2_barcodes)

#Cluster_1 immunarch "object"
VDJ_Cluster_1=list()
VDJ_Cluster_1$data=filter_barcodes(Bcell_immunarch$data, Cluster_1)

#Cluster_2 immunarch "object"
VDJ_Cluster_2=list()
VDJ_Cluster_2$data=filter_barcodes(Bcell_immunarch$data, Cluster_2)

#I can now use functions like geneUsage, repClonality, etc.. but only looking at each cluster separately. For example:
VDJ_Cluster_1=geneUsage(VDJ_Cluster_1$data$filtered_contig_annotations_TRUE_IGH, .norm = T)
vis(VDJ_Cluster_1)

What I would like to do for examlpe, is use functions like repOverlap to compare VDJ_Cluster_1 to VDJ_Cluster_2, or compare the gene usage between Clusters. Is there a way to do that?

Thanks a lot!

@vadimnazarov
Copy link
Contributor

Hi @diyaazak

Thank you for getting back to us, our team appreciates that!

Currently you need to create a metadata table by hand and put your data in a common immunarch format as a list of two elements - data and meta. In your case:

  1. Load the data
  2. Filter by barcodes
  3. Create new list with your data and metadata:
VDJ_Cluster_1 <- filter_barcodes(Bcell_immunarch$data[[1]], Cluster_1)
VDJ_Cluster_2 <- filter_barcodes(Bcell_immunarch$data[[1]], Cluster_2)
im_clusters <- list(data = list(VDJ_Cluster_1, VDJ_Cluster_2))
metadata <- data.frame(Sample = c("VDJ_Cluster_1", "VDJ_Cluster_2"), stringsAsFactors=F) # add any additional metadata here
im_clusters$meta <- metadata
repOverlap(im_clusters$data)

You can wrap it in a simple function to work with any number of clusters.

Full single-cell support (seamless stratification to patients and clusters by barcodes, paired chain analysis, etc.) is our next major milestone, so your feedback on your data analysis routines and goals is much appreciated!

@diyaazak
Copy link
Author

Thanks a lot for the quick feedback!
I tried it but when I run repOverlap it doesn't seem to work:

rO=repOverlap(im_clusters$data)
rO
[1] 1
attr(,"class")
[1] "numeric" "immunr_ov_matrix"
vis(rO)

Rplot

This is how my data and metadata list looks like:

head(im_clusters)
$data
$data[[1]]

A tibble: 616 x 20

Clones Proportion CDR3.nt CDR3.aa V.name D.name J.name V.end D.start D.end J.start VJ.ins VD.ins DJ.ins Sequence chain

1 2 0.00324 TGTGCA… CARRGI… IGHV1… None IGHJ3 NA NA NA NA NA NA NA TGTGCAA… IGH
2 1 0.00162 TGTGCA… CARWGE… IGHV1… None IGHJ2 NA NA NA NA NA NA NA TGTGCAA… IGH
3 2 0.00324 TGTGCA… CARSQG… IGHV1… None IGHJ1 NA NA NA NA NA NA NA TGTGCAA… IGH

… with 606 more rows, and 4 more variables: Barcode , RawClonotypeID , RawConsensusID , ContigID

$data[[2]]

A tibble: 522 x 20

Clones Proportion CDR3.nt CDR3.aa V.name D.name J.name V.end D.start D.end J.start VJ.ins VD.ins DJ.ins Sequence chain

1 2 0.00379 TGTGCA… CARHAY… IGHV1… None IGHJ4 NA NA NA NA NA NA NA TGTGCAA… IGH
2 2 0.00379 TGTGCA… CARRPW… IGHV1… None IGHJ2 NA NA NA NA NA NA NA TGTGCAA… IGH
3 2 0.00379 TGTGCA… CARDRP… IGHV5… IGHD1… IGHJ2 NA NA NA NA NA NA NA TGTGCAA… IGH

… with 512 more rows, and 4 more variables: Barcode , RawClonotypeID , RawConsensusID , ContigID

$meta
Sample
1 VDJ_Cluster_1
2 VDJ_Cluster_2

Also, geneUsage doesn't seem to work:

IgH=geneUsage(im_clusters$data, .norm = T)
Error: x must be a vector, not a tbl_df/tbl/data.frame/immunr_gene_usage object.

I wonder what I might be doing wrong.
Thanks for your support!

@vadimnazarov
Copy link
Contributor

vadimnazarov commented May 19, 2020

Hi @diyaazak

I see, thank you. I have questions regarding the barcode preparations.
How do you get barcodes for specific clusters from Seurat?
If you have different patients, how do you store the data about barcode/patient pairs?

Regarding the analysis. Can you please name your clusters:

names(im_clusters$data) <- c("VDJ_Cluster_1", "VDJ_Cluster_2")

And try again? Immunarch requires named samples.

@vadimnazarov vadimnazarov added the tag:Single-cell Single-cell-specific tickets label May 19, 2020
@vadimnazarov
Copy link
Contributor

Hi @diyaazak

We made a major improvements to the package, and now you can easily split datasets by clusters and barcodes. Here is more info: https://immunarch.com/articles/web_only/v21_singlecell.html

Please let me know if it's what you seek, if it's work or if you have any other questions. Your input is much appreciated!

@diyaazak
Copy link
Author

diyaazak commented Jul 1, 2020

Hi @vadimnazarov
Thanks a lot for the follow-up and I am sorry for not writing you earlier. This worked finally!
I tried most of the functions and they work fine. I think only plotting geneUsage wouldn't make sense now that the chains are paired (you get too many).
I didn't get how the data was structured in the provided "scdata" example though. I prepared my data as follows in the end:

VDJ <- repLoad(.path = "...path/filtered_contig_annotations_TRUE.csv", .mode = "paired")
Cluster1 <- c("Cluster1_barcodes") #I extract the barcodes manually from Seurat metadata
Cluster2 <- c("Cluster2_barcodes")
VDJ_Cluster_1 <- select_barcodes(VDJ$data[[1]], Cluster1)
VDJ_Cluster_2 <- select_barcodes(VDJ$data[[1]], Cluster2)

im_clusters <- list(data = list(VDJ_Cluster_1, VDJ_Cluster_2))
names(im_clusters$data) <- c("VDJ_Cluster_1", "VDJ_Cluster_2")
metadata <- data.frame(Sample = c("VDJ_Cluster_1", "VDJ_Cluster_2"), stringsAsFactors=F) 
im_clusters$meta <- metadata

Is this the way you would perform it or is there a more straightforward way?

I tried running select_clusters as follows but it doesn't seem to work:

scdata_pat <- select_clusters(VDJ, my_seurat_object$seurat_clusters )

Thanks!

@vadimnazarov
Copy link
Contributor

Hi @diyaazak

thank you! Can you print my_seurat_object$seurat_clusters into the Console and send me the output, please?

Can you also send me the error message after scdata_pat <- select_clusters(VDJ, my_seurat_object$seurat_clusters ), please?

@diyaazak
Copy link
Author

diyaazak commented Jul 2, 2020

head(my_seurat_object$seurat_clusters)
AAACCTGAGAAGATTC AAACCTGAGAGACGAA AAACCTGAGGGCTCTC AAACCTGCAAACAACA
2 0 1 3
AAACCTGCACCCATTC AAACCTGCACCGAAAG
1 0
Levels: 0 1 2 3 4 5 6 7 8 9 10 11

Now that I look at it, it could be because the barcodes in the filtered_contig_annotations.csv file have a -1 at the end while the barcodes from a Seurat object lack that. I only started learning R recently so I was just doing this manually in an excel sheet before using select_barcodes

I don't get an error when I run scdata_pat <- select_clusters(VDJ, my_seurat_object$seurat_clusters ). I just get a list with empty data..the metadata is there though:

scdata_cl
$data
$data$filtered_contig_annotations_0
[1] CDR3.nt CDR3.aa V.name D.name
[5] J.name V.end D.start D.end
[9] J.start VJ.ins VD.ins DJ.ins
[13] Sequence chain raw_clonotype_id ContigID
[17] Clones Barcode Proportion
<0 rows> (or 0-length row.names)

$data$filtered_contig_annotations_1
[1] CDR3.nt CDR3.aa V.name D.name
[5] J.name V.end D.start D.end
[9] J.start VJ.ins VD.ins DJ.ins
[13] Sequence chain raw_clonotype_id ContigID
[17] Clones Barcode Proportion
<0 rows> (or 0-length row.names)

$data$filtered_contig_annotations_2
[1] CDR3.nt CDR3.aa V.name D.name
[5] J.name V.end D.start D.end
[9] J.start VJ.ins VD.ins DJ.ins
[13] Sequence chain raw_clonotype_id ContigID
[17] Clones Barcode Proportion
<0 rows> (or 0-length row.names)

$data$filtered_contig_annotations_3
[1] CDR3.nt CDR3.aa V.name D.name
[5] J.name V.end D.start D.end
[9] J.start VJ.ins VD.ins DJ.ins
[13] Sequence chain raw_clonotype_id ContigID
[17] Clones Barcode Proportion
<0 rows> (or 0-length row.names)

$data$filtered_contig_annotations_4
[1] CDR3.nt CDR3.aa V.name D.name
[5] J.name V.end D.start D.end
[9] J.start VJ.ins VD.ins DJ.ins
[13] Sequence chain raw_clonotype_id ContigID
[17] Clones Barcode Proportion
<0 rows> (or 0-length row.names)

$data$filtered_contig_annotations_5
[1] CDR3.nt CDR3.aa V.name D.name
[5] J.name V.end D.start D.end
[9] J.start VJ.ins VD.ins DJ.ins
[13] Sequence chain raw_clonotype_id ContigID
[17] Clones Barcode Proportion
<0 rows> (or 0-length row.names)

$data$filtered_contig_annotations_6
[1] CDR3.nt CDR3.aa V.name D.name
[5] J.name V.end D.start D.end
[9] J.start VJ.ins VD.ins DJ.ins
[13] Sequence chain raw_clonotype_id ContigID
[17] Clones Barcode Proportion
<0 rows> (or 0-length row.names)

$data$filtered_contig_annotations_7
[1] CDR3.nt CDR3.aa V.name D.name
[5] J.name V.end D.start D.end
[9] J.start VJ.ins VD.ins DJ.ins
[13] Sequence chain raw_clonotype_id ContigID
[17] Clones Barcode Proportion
<0 rows> (or 0-length row.names)

$data$filtered_contig_annotations_8
[1] CDR3.nt CDR3.aa V.name D.name
[5] J.name V.end D.start D.end
[9] J.start VJ.ins VD.ins DJ.ins
[13] Sequence chain raw_clonotype_id ContigID
[17] Clones Barcode Proportion
<0 rows> (or 0-length row.names)

$data$filtered_contig_annotations_9
[1] CDR3.nt CDR3.aa V.name D.name
[5] J.name V.end D.start D.end
[9] J.start VJ.ins VD.ins DJ.ins
[13] Sequence chain raw_clonotype_id ContigID
[17] Clones Barcode Proportion
<0 rows> (or 0-length row.names)

$data$filtered_contig_annotations_10
[1] CDR3.nt CDR3.aa V.name D.name
[5] J.name V.end D.start D.end
[9] J.start VJ.ins VD.ins DJ.ins
[13] Sequence chain raw_clonotype_id ContigID
[17] Clones Barcode Proportion
<0 rows> (or 0-length row.names)

$data$filtered_contig_annotations_11
[1] CDR3.nt CDR3.aa V.name D.name
[5] J.name V.end D.start D.end
[9] J.start VJ.ins VD.ins DJ.ins
[13] Sequence chain raw_clonotype_id ContigID
[17] Clones Barcode Proportion
<0 rows> (or 0-length row.names)

$meta

A tibble: 12 x 3

Sample Cluster.source Cluster

1 filtered_contig_annotations_0 filtered_contig_annotations 0
2 filtered_contig_annotations_1 filtered_contig_annotations 1
3 filtered_contig_annotations_2 filtered_contig_annotations 2
4 filtered_contig_annotations_3 filtered_contig_annotations 3
5 filtered_contig_annotations_4 filtered_contig_annotations 4
6 filtered_contig_annotations_5 filtered_contig_annotations 5
7 filtered_contig_annotations_6 filtered_contig_annotations 6
8 filtered_contig_annotations_7 filtered_contig_annotations 7
9 filtered_contig_annotations_8 filtered_contig_annotations 8
10 filtered_contig_annotations_9 filtered_contig_annotations 9
11 filtered_contig_annotations_10 filtered_contig_annotations 10
12 filtered_contig_annotations_11 filtered_contig_annotations 11

Thanks!

@vadimnazarov
Copy link
Contributor

Hi @diyaazak

It definitely can be a case since barcode matching is straightforward and not "smart" to check for "-1" etc. You can easily add "-1" to your barcodes by (note that I don't change the Seurat object here):

barcode_vec <- my_seurat_object$seurat_clusters
names(barcode_vec) <- paste0(names(barcode_vec), "-1")
scdata_pat <- select_clusters(VDJ, barcode_vec)

Thank you for pointing us to that issue, we will fix it by removing "-1" in the next release for the convenience

@vadimnazarov
Copy link
Contributor

Hi @diyaazak

Did it work?

@diyaazak
Copy link
Author

Hi @vadimnazarov
Yes it did! Thanks for the great help.

@Alexander230
Copy link
Collaborator

Alexander230 commented Jul 13, 2021

Hello,
My name is Aleksandr Popov, I am a developer of the Immunarch package. Thank you for using our software!

There was no activity on this issue for a while, so we plan to close it in 1 month. You are welcome to comment and reopen the issue if there are still unresolved questions.

@Alexander230 Alexander230 removed the status:Waiting for data 📁 Waiting for data sample to reproduce label Dec 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tag:Environment Installation tag:Single-cell Single-cell-specific tickets type:Question Further information is requested
Projects
None yet
Development

No branches or pull requests

4 participants