Collection of Tumor-Infiltrating Lymphocyte Single-Cell Experiments with TCR

ncborcherding/utility

uTILity

Comprehensive collection of Single-Cell Tumor-Infiltrating Lymphocyte Data

Introduction

The original intent of assembling a data set of publicly available tumor-infiltrating T cells (TILs) with paired TCR sequencing was to expand and improve the scRepertoire R package. However, after some discussion, we decided to release the data set for everyone. A complete summary of the sequencing runs and the sample information can be found in the meta data of the Seurat object.

Assembling the data set involves several steps: 1) loading the respective gene expression (GE) data, 2) harmonizing the data by sample and cohort information, 3) iterating through automatic annotation, and 4) adding the TCR information. This information is stored in the meta data of the Seurat objects; an explanation of each variable is available here.

Folder Structure

├── code
│   ├── Processing_Utility.Rmd - general processing script
│   └── Summarize_Data.Rmd - script to get summary data
├── data
│   ├── SequencingRuns - 10x Outputs
│   └── processedData - Processed .rds and larger combined cohorts
├── NEWS.txt - changes made
├── outputs
│   └── qc - plots for quality control purposes
├── README.md
└── summaryInfo
    ├── TcellSummaryTable.csv
    ├── cohortSummaryTable.csv
    ├── meta.data.headers.txt - what the meta data headers mean
    ├── sample.directory.xlsx - all the available data for the cohort
    ├── sessionInfo.txt - what I am running in terms of the pipeline
    └── tumorSummaryTable.csv

Sample ID:

Cohort Information

Here is the current list of data sources and the number of cells that passed filtering, by tissue type. Please cite the data if you are using uTILity.

| Cohort | Tumor | Normal | Blood | Juxta | LN | Met | Cancer Type | Citations |
|---|---|---|---|---|---|---|---|---|
| CCR-20-4394 | 26760 | 0 | 0 | 0 | 0 | 0 | Ovarian | cite |
| EGAS00001004809 | 181667 | 0 | 0 | 0 | 0 | 0 | Breast | cite |
| GSE114724 | 27651 | 0 | 0 | 0 | 0 | 0 | Breast | cite |
| GSE121636 | 11436 | 0 | 12319 | 0 | 0 | 0 | Renal | cite |
| GSE123814 | 78034 | 0 | 0 | 0 | 0 | 0 | Multiple | cite |
| GSE139555 | 93160 | 78625 | 25363 | 0 | 0 | 0 | Multiple | cite |
| GSE145370 | 66592 | 40916 | 0 | 0 | 0 | 0 | Esophageal | cite |
| GSE148190 | 2263 | 0 | 6201 | 0 | 15644 | 0 | Melanoma | cite |
| GSE154826 | 14491 | 13414 | 0 | 0 | 0 | 0 | Lung | cite |
| GSE159251 | 8356 | 0 | 47721 | 0 | 5705 | 0 | Melanoma | cite |
| GSE162500 | 14644 | 0 | 23401 | 3761 | 0 | 0 | Lung | cite |
| GSE164522 | 36990 | 86811 | 46027 | 0 | 46376 | 36648 | Colorectal | cite |
| GSE168844 | 0 | 0 | 55302 | 0 | 0 | 0 | Lung | cite |
| GSE176021 | 436609 | 128411 | 132673 | 0 | 71063 | 32011 | Lung | cite |
| GSE179994 | 78574 | 0 | 0 | 0 | 0 | 62341 | Lung | cite |
| GSE180268 | 23215 | 0 | 0 | 0 | 29699 | 0 | HNSCC | cite |
| GSE181061 | 40429 | 27622 | 37426 | 0 | 0 | 0 | Renal | cite |
| GSE185206 | 163294 | 17231 | 0 | 0 | 9820 | 0 | Lung | cite |
| GSE195486 | 122512 | 0 | 0 | 0 | 0 | 0 | Ovarian | cite |
| GSE200218 | 0 | 0 | 0 | 0 | 0 | 18495 | Melanoma | cite |
| GSE200996 | 86235 | 0 | 152722 | 0 | 0 | 0 | HNSCC | cite |
| GSE201425 | 22888 | 0 | 27781 | 0 | 11350 | 12253 | Biliary | cite |
| GSE211504 | 0 | 0 | 33685 | 0 | 0 | 0 | Melanoma | cite |
| GSE212217 | 0 | 0 | 229505 | 0 | 0 | 0 | Endometrial | cite |
| GSE213243 | 2835 | 0 | 18363 | 0 | 0 | 2693 | Ovarian | cite |
| GSE215219 | 26303 | 0 | 66000 | 0 | 0 | 0 | Lung | cite |
| GSE227708 | 53087 | 0 | 0 | 0 | 0 | 0 | Merkel Cell | cite |
| GSE242477 | 41595 | 0 | 21595 | 0 | 0 | 0 | Melanoma | cite |
| PRJNA705464 | 98892 | 15113 | 30340 | 0 | 3505 | 0 | Renal | cite |

Methods

Single-Cell Data Processing

The filtered gene matrices output by the Cell Ranger alignment step (10x Genomics, Pleasanton, CA) for individual sequencing runs were loaded into the R global environment. For each sequencing run, a unique prefix was added to the cell barcodes to prevent issues with duplicate barcodes. The results were then ported into individual Seurat objects (citation), where cells with > 10% mitochondrial genes and/or a feature count more than 2.5 standard deviations from the mean were excluded for quality control purposes. At the individual sequencing run level, doublets were estimated using the scDblFinder (v1.4.0) R package.
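As a rough illustration of the barcode prefixing and quality-control rule described above, here is a minimal, language-agnostic sketch in Python. It is not the code from Processing_Utility.Rmd, which is written in R with Seurat. The function names, the dictionary keys (`n_features`, `pct_mito`), and the choice of a two-sided cutoff around the per-run mean are all assumptions made for this example.

```python
from statistics import mean, stdev


def prefix_barcodes(barcodes, run_id):
    """Add a run-specific prefix so barcodes stay unique across runs."""
    return [f"{run_id}_{bc}" for bc in barcodes]


def qc_filter(cells, max_pct_mito=10.0, n_sd=2.5):
    """Keep cells that pass both quality-control checks.

    cells: list of dicts with hypothetical keys 'n_features' and 'pct_mito'.
    A cell is dropped if its mitochondrial percentage exceeds max_pct_mito,
    or if its feature count lies more than n_sd standard deviations from
    the mean of the run (two-sided cutoff, assumed here).
    """
    counts = [c["n_features"] for c in cells]
    mu, sd = mean(counts), stdev(counts)
    low, high = mu - n_sd * sd, mu + n_sd * sd
    return [
        c for c in cells
        if c["pct_mito"] <= max_pct_mito and low <= c["n_features"] <= high
    ]


if __name__ == "__main__":
    # Toy run: ten ordinary cells, one high-mito cell, one feature outlier.
    run = [{"n_features": 1000, "pct_mito": 3.0} for _ in range(10)]
    run.append({"n_features": 1000, "pct_mito": 25.0})
    run.append({"n_features": 5000, "pct_mito": 3.0})
    kept = qc_filter(run)
    print(len(run), "cells in,", len(kept), "cells kept")
    print(prefix_barcodes(["AAAC-1", "AAAG-1"], "run01"))
```

The thresholds are computed per sequencing run, so each run should be passed through `qc_filter` separately before any runs are combined.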

Annotation of Cells

Automatic annotation was performed using the SingleR (v2.2.0) R package (citation) with the HPCA (citation) and DICE (citation) data sets as references and the fine label discriminators. Individual sequencing runs were subset and run through the SingleR algorithm separately in order to reduce memory demands. The output of all the SingleR analyses was collated and appended to the meta data of the Seurat object. Likewise, the Azimuth (v0.4.6.9004) R package (citation) was used for automatic annotation as a partially orthogonal approach.

Addition of TCR data

The filtered contig annotation T cell receptor (TCR) data for available sequencing runs were loaded into the R global environment. Individual contigs were combined using the combineTCR() function of the scRepertoire (v2.0.0) R package (citation). Clonotypes were assigned to barcodes, and where multiple duplicate chains were present for an individual cell, they were filtered to select the top-expressing contig by read count. The clonotype data were then added to the Seurat object, with the proportion across individual patients being used to calculate frequency.
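The two ideas in this step, keeping the top contig by read count when a cell has duplicate chains and expressing clonotype frequency as a proportion within each patient, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the scRepertoire implementation. The function names and record keys (`barcode`, `chain`, `reads`, `patient`, `clonotype`) are hypothetical choices for this example.

```python
from collections import Counter, defaultdict


def top_contig_per_chain(contigs):
    """For each (barcode, chain) pair, keep the contig with the most reads.

    contigs: list of dicts with hypothetical keys 'barcode', 'chain', 'reads'.
    Ties keep the first contig encountered.
    """
    best = {}
    for contig in contigs:
        key = (contig["barcode"], contig["chain"])
        if key not in best or contig["reads"] > best[key]["reads"]:
            best[key] = contig
    return list(best.values())


def clonotype_proportions(cells):
    """Return {(patient, clonotype): proportion of that patient's cells}.

    cells: list of dicts with hypothetical keys 'patient' and 'clonotype'.
    """
    per_patient = defaultdict(Counter)
    for cell in cells:
        per_patient[cell["patient"]][cell["clonotype"]] += 1
    proportions = {}
    for patient, counts in per_patient.items():
        total = sum(counts.values())
        for clonotype, n in counts.items():
            proportions[(patient, clonotype)] = n / total
    return proportions


if __name__ == "__main__":
    contigs = [
        {"barcode": "run01_AAAC-1", "chain": "TRB", "reads": 50},
        {"barcode": "run01_AAAC-1", "chain": "TRB", "reads": 200},
        {"barcode": "run01_AAAC-1", "chain": "TRA", "reads": 120},
    ]
    print(top_contig_per_chain(contigs))
    cells = [
        {"patient": "P1", "clonotype": "X"},
        {"patient": "P1", "clonotype": "X"},
        {"patient": "P1", "clonotype": "X"},
        {"patient": "P1", "clonotype": "Y"},
        {"patient": "P2", "clonotype": "X"},
    ]
    print(clonotype_proportions(cells))
```

Computing proportions within each patient, rather than across the whole cohort, keeps a large patient sample from inflating or diluting the apparent frequency of clonotypes in smaller ones.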

Session Info

Session Info for the initial data processing and analysis can be found here.


Citations

As of right now, there is no citation associated with the assembled data set. However, if using the data, please find and cite the corresponding manuscript for each data set; these are summarized above and can also be found in the summary table. In addition, if using the processed data, feel free to modify the language in the methods section (above), and please cite the appropriate manuscripts for the software and references that were used.

Itemized List of the Software Used

Itemized List of Reference Data Used


Future Directions

  • Unified Dimensional Reduction of T cells with Cluster Annotations
  • Data Hosting for Interactive Analysis
  • Easy Submission Portal for Researchers to Add Data
  • Using the Data to Build a Reference Atlas

These are areas we are actively hoping to develop to make the data set more useful. If you have other suggestions, please reach out using the contact information below.


License

The data and analysis of uTILity are provided under a CC BY-ND 4.0 license. Please feel free to remix, transform, and build upon the material. However, the intent of this resource is noncommercial; if you are using the data as a nonacademic institution, you are in violation of the license agreement. Please find out more information here.


Contact

For questions, comments, or suggestions, please feel free to contact Nick Borcherding via this repository, email, or Twitter.
