Converting this matrix to Seurat object #4515

Closed
beginner984 opened this issue May 21, 2021 · 13 comments

Comments

@beginner984

I have downloaded log2(TPM/10+1) values for 11,548 genes and 9,609 cells from GSE146026 (10x) in TSV format, as the raw data is not available.

I see that patient IDs, cell barcodes, genes, and even assigned clusters are included:

> dput(head(a[1:5,1:5]))
structure(list(Cell_ID = c("10x_barcode", "patient", "time", 
"sample_ID", "clst"), X10x_1 = c("10x_3288_t1_AAACATACCTTCCG-1", 
"5", "1", "3288.1", "1"), X10x_2 = c("10x_3288_t1_AAACATACTCCTAT-1", 
"5", "1", "3288.1", "1"), X10x_3 = c("10x_3288_t1_AAACATTGAACTGC-1", 
"5", "1", "3288.1", "1"), X10x_4 = c("10x_3288_t1_AAACATTGCTGACA-1", 
"5", "1", "3288.1", "2")), row.names = c(NA, 5L), class = "data.frame")
>

I want to make a Seurat object from this, but I really don't know how.

I have tried this:

> Read10X(getwd(), gene.column = 2, cell.column = 1, unique.features = TRUE, strip.suffix = FALSE)
Error in Read10X(getwd(), gene.column = 2, cell.column = 1, unique.features = TRUE,  :
  Barcode file missing. Expecting barcodes.tsv.gz

Has anybody dealt with such a case and could help me, please?

@samuel-marsh
Collaborator

samuel-marsh commented May 21, 2021

Hi,

Not a member of the dev team, but hopefully I can be helpful. The Read10X function only applies to files supplied in the 10X format (barcodes.tsv, features.tsv, matrix.mtx). If you want to make a Seurat object from a matrix, data.frame, etc., you simply need to provide a matrix, data.frame, etc. with cell names/barcodes as columns and features/genes as rows.

In this case the authors have included extra rows, which you need to remove before creating the object. If you want to add the meta data they included back to the object, you can do that, perform your own reanalysis, or both.
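As a quick way to see why Read10X rejects a folder, here is a minimal sketch that checks which 10x-style files are present. The helper `has_10x_files` and the demo file name are hypothetical (not part of Seurat), and the list of accepted file names is an assumption based on the three files named above plus their gzipped and older `genes.tsv` variants.

```r
# Hypothetical helper, not part of Seurat: report which 10x-style files a directory contains.
has_10x_files <- function(dir) {
  candidates <- list(
    barcodes = c("barcodes.tsv", "barcodes.tsv.gz"),
    features = c("features.tsv", "features.tsv.gz", "genes.tsv", "genes.tsv.gz"),
    matrix   = c("matrix.mtx", "matrix.mtx.gz")
  )
  # TRUE for an entry if any accepted file name exists in the directory
  vapply(candidates, function(files) any(file.exists(file.path(dir, files))), logical(1))
}

# Demo directory holding only a single TSV, similar to a plain GEO download.
demo_dir <- tempfile("geo_download_")
dir.create(demo_dir)
invisible(file.create(file.path(demo_dir, "expression_log.tsv")))

status <- has_10x_files(demo_dir)
print(status)
```

If any entry is FALSE, the folder is not in 10x layout, and reading the table directly (as in the code that follows) is probably the more suitable route.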

Best,
Sam

tirosh_geo <- read.delim("~/Downloads/GSE146026_Izar_HGSOC_ascites_10x_log.tsv", stringsAsFactors = FALSE, header = TRUE)

# Pull only gene expression information
tirosh_genes <- tirosh_geo[-1:-7,]

# add genes to rownames
rownames(tirosh_genes) <- tirosh_genes[, 1]

# remove old now mislabeled gene column
tirosh_genes <- tirosh_genes[, -1]

# Pull meta data columns from original data
tirosh_meta <- tirosh_geo[1:7,]

# Make rownames equal to column 1 values
rownames(tirosh_meta) <- tirosh_meta[, 1]

# Remove column 1
tirosh_meta <- tirosh_meta[, -1]

# Transpose meta data as Seurat expects meta data to have cell names as rows and meta data values as columns
tirosh_meta_transpose <- data.frame(t(tirosh_meta))

# Create Seurat Object
tirosh_seurat <- CreateSeuratObject(counts = tirosh_genes, meta.data = tirosh_meta_transpose)
View(tirosh_seurat@meta.data)
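
The same split can be tried on a toy table first. This is only a sketch under assumptions: the table `toy`, its two metadata rows, and the gene names are made up for illustration (the real file has 7 metadata rows). It also shows that, because metadata and expression share columns, every value is read as text and the expression part needs converting to numeric.

```r
# Toy table in the same layout as the GEO file: the first column holds row labels,
# the first rows are per-cell metadata, and the remaining rows are genes.
toy <- data.frame(
  Cell_ID = c("10x_barcode", "clst", "GeneA", "GeneB"),
  X10x_1  = c("bc_1", "1", "0.00", "2.50"),
  X10x_2  = c("bc_2", "2", "1.25", "0.00"),
  stringsAsFactors = FALSE
)
n_meta <- 2  # number of metadata rows in this toy table

# Gene rows: move the labels to rownames, then convert text to a numeric matrix.
genes <- toy[-seq_len(n_meta), ]
rownames(genes) <- genes[, 1]
genes <- genes[, -1]
expr <- as.matrix(genes)        # still a character matrix at this point
storage.mode(expr) <- "double"  # now numeric, dimnames are kept

# Metadata rows: same steps, then transpose so that cells become rows.
meta <- toy[seq_len(n_meta), ]
rownames(meta) <- meta[, 1]
meta <- data.frame(t(meta[, -1]), stringsAsFactors = FALSE)

# Cell names must line up between the expression columns and the metadata rows.
stopifnot(identical(colnames(expr), rownames(meta)))
```

Note that `data.frame()` renames columns that are not valid R names, so a metadata row called `10x_barcode` ends up as column `X10x_barcode`.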

@timoast closed this as completed May 21, 2021
@beginner984 changed the title from "Converting this matrix top Seurat object" to "Converting this matrix to Seurat object" May 21, 2021
@beginner984
Author

Thank you so much

That was simply amazing, especially for a beginner with scRNA-seq.

Now, the cells have already been assigned to clusters, and these are available in the metadata.

Can I use this information without re-clustering/dimensionality reduction?

I tried this:

> DimPlot(tirosh_seurat)
Error: Unable to find a DimReduc matching one of 'umap', 'tsne', or 'pca', please specify a dimensional reduction to use
> 

@samuel-marsh
Collaborator

Hi,

Yes, you can, but you need to create the DimReduc using CreateDimReducObject and add it to the object. Then specify the identity that you want as the cluster label using Idents, providing one of the meta data column names that were added during object creation:

Idents(tirosh_seurat) <- "cluster_name"

Best,
Sam

@beginner984
Author

Thank you so much
I added clusters to the object

Idents(tirosh_seurat) <- "clst"

I got this plot:

[image: Rplot01]

I am not sure what cluster -1 means, though.
As you kindly made the Seurat object for me: the Nature Medicine paper already provides tSNE and cluster information, which you added to the metadata here.

How can I use the tSNE information from the metadata instead of doing the dimensionality reduction myself?

@samuel-marsh
Collaborator

samuel-marsh commented May 22, 2021

Hi,

I can't help you with what -1 means as a cluster annotation, because those cluster annotations were provided by the authors. You'd have to look further into their paper/analysis or contact them directly.

In terms of using their tSNE coordinates: yes, as I mentioned, you have to create your own DimReduc object and add it.

Here is code for this case:
*Note: I caught one error in the previous code. The meta data frame columns were being converted to factors when transposing, which initially caused issues in getting the tSNE coordinates to work. So I am posting the entire code, start to finish, below:

library(dplyr)
library(tibble)  # provides rownames_to_column() and column_to_rownames()
library(magrittr)
library(Seurat)

# Read data
tirosh_geo <- read.delim("~/Downloads/GSE146026_Izar_HGSOC_ascites_10x_log.tsv", stringsAsFactors = FALSE, header = TRUE)

# Pull only gene expression information
tirosh_genes <- tirosh_geo[-1:-7,]

# add genes to rownames
rownames(tirosh_genes) <- tirosh_genes[, 1]

# remove old now mislabeled gene column
tirosh_genes <- tirosh_genes[, -1]

# Pull meta data columns from original data
tirosh_meta <- tirosh_geo[1:7,]

# Make rownames equal to column 1 values
rownames(tirosh_meta) <- tirosh_meta[, 1]

# Remove column 1
tirosh_meta <- tirosh_meta[, -1]

# Transpose meta data as Seurat expects meta data to have cell names as rows and meta data values as columns
tirosh_meta_transpose <- data.frame(t(tirosh_meta), stringsAsFactors = FALSE)

# Create Seurat Object
tirosh_seurat <- CreateSeuratObject(counts = tirosh_genes, meta.data = tirosh_meta_transpose)

# pull tsne coordinates
tSNE_coordinates <- tirosh_meta_transpose %>% 
  rownames_to_column("barcodes") %>% 
  select(barcodes, tsne_1 = TSNE_x, tsne_2 = TSNE_y) %>% 
  column_to_rownames("barcodes")

# Metadata values are stored as text, so convert the coordinates to numeric
tSNE_coordinates$tsne_1 <- as.numeric(tSNE_coordinates$tsne_1)
tSNE_coordinates$tsne_2 <- as.numeric(tSNE_coordinates$tsne_2)

# format as matrix
tSNE_coordinates_mat <- as.matrix(tSNE_coordinates)

# Create DimReduc and add to object
tirosh_seurat[['tsne']] <- CreateDimReducObject(embeddings = tSNE_coordinates_mat, key = "tSNE_", global = TRUE, assay = "RNA")

Idents(tirosh_seurat) <- "clst"

This is the plot that is then created:
[image]
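
Before passing coordinates to CreateDimReducObject, it can help to confirm that the embedding matrix is numeric and that its rownames are the cell names. A minimal base-R sketch with made-up values follows. The toy `meta` table and its three cells are assumptions for illustration; the column names `TSNE_x`, `TSNE_y` and `clst` follow the code above.

```r
# Toy metadata as it looks after transposing: every value is text.
meta <- data.frame(
  TSNE_x = c("1.5", "-2.0", "0.25"),
  TSNE_y = c("0.5", "3.0", "-1.75"),
  clst   = c("1", "2", "-1"),
  row.names = c("X10x_1", "X10x_2", "X10x_3"),
  stringsAsFactors = FALSE
)

# Build a numeric cells x 2 matrix; the column names follow the "tSNE_" key.
emb <- cbind(
  tSNE_1 = as.numeric(meta$TSNE_x),
  tSNE_2 = as.numeric(meta$TSNE_y)
)
rownames(emb) <- rownames(meta)

# Checks worth running before handing the matrix to Seurat:
# values are numeric, nothing turned into NA, one row per cell.
stopifnot(is.numeric(emb), !anyNA(emb), nrow(emb) == nrow(meta))
```

A matrix like `emb` could then be supplied as `embeddings` in the CreateDimReducObject call above, provided its rownames equal the object's cell names. The plot itself would presumably come from a call such as `DimPlot(tirosh_seurat, reduction = "tsne")`.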

Best,
Sam

@beginner984
Author

Thanks a million

I can only say that I could not have done that myself at all.

@samuel-marsh
Collaborator

Glad I could help!

@beginner984
Author

beginner984 commented May 23, 2021

Thank you so much once more

I have plotted this:

[image: Rplot]

using your code and the public data from the Nature Medicine paper:

https://www.nature.com/articles/s41591-020-0926-0

As I am trying to reproduce Figure 1b of the Nature Medicine paper, I asked the authors what cluster -1 is, and why they describe 18 clusters while I am seeing 21.

They replied as follows:

the numbers are ok "as is", and you should just exclude all cells with cluster ID of -1 or above 18. these correspond to the cells that were not assigned to any cluster (-1) or that were assigned to clusters that were reflecting low quality or doublets and hence removed from subsequent analysis (19-21).

If I want a Seurat object with exactly the Nature Medicine clusters, what should I do? I want to map the markers from my own data onto their tSNE map, as they have already annotated the cell clusters well.

@samuel-marsh
Collaborator

Hi,

See subset function.
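
As a rough sketch of the filtering the authors describe (drop cells with cluster ID -1 or above 18), the cell names to keep can be worked out from the metadata in base R. The toy `meta` table here is made up for illustration; `clst` is the cluster column from the thread.

```r
# Toy metadata: cluster IDs are stored as text, as in the object built above.
meta <- data.frame(
  clst = c("-1", "1", "5", "18", "19", "21"),
  row.names = paste0("X10x_", 1:6),
  stringsAsFactors = FALSE
)

# Keep cells assigned to clusters 1 to 18; -1 (unassigned) and 19-21 are dropped.
cluster_id <- as.integer(meta$clst)
keep <- !is.na(cluster_id) & cluster_id >= 1 & cluster_id <= 18
keep_cells <- rownames(meta)[keep]
print(keep_cells)
```

With the object from this thread, something like `subset(tirosh_seurat, cells = keep_cells)` should then retain only those cells, assuming the metadata rownames match the object's cell names.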

Best,
Sam

@beginner984
Author

beginner984 commented May 23, 2021

I have done so

Now the map is the same

> tirosh_seurat_1 =subset(tirosh_seurat, idents = c("-1", "12","14","6"), invert = TRUE)
> 
> # note that you can set `label = TRUE` or use the LabelClusters function to help label
> # individual clusters
> DimPlot(tirosh_seurat_1, reduction = "tsne",label = TRUE,label.box = FALSE,combine = TRUE)
> 

[image: Rplot02]

@samuel-marsh
Collaborator

Glad it worked!

@robsoncarvalho7

Thanks, 10 billion times.

I was stuck with this dataset, and this discussion just saved me!

I am starting to learn single-cell analysis.

@beginner984, as you contacted the authors, could you please clarify the following points:

  1. Where did you find the information that the values are log2(TPM/10+1) for 10x?

  2. Why did you remove "12", "14", and "6" and not "19", "20", and "21" (according to the authors' reply)?

  3. I need a matrix with cell names according to each cluster. Could you please share any information about what I should do next?

Thank you so much again!

@beginner984
Author

Hi

From this paper (https://www.nature.com/articles/s41591-020-0926-0) I downloaded the log2(TPM/10+1) values and meta data (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE146026), as they clearly state that the raw read counts are hosted somewhere else, for which I would have needed paperwork.

As I wanted to reproduce their UMAP in Figure 1b, I removed some clusters, because the paper says they were not able to annotate these clusters.

Also, I was only interested in the immune clusters, so basically I did not want clusters 1 to 9, as they are cancer clusters.
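
For the third question above (cell names per cluster), one possible approach, which is not from the thread itself, is to split the cell names by the cluster column of the metadata. The toy `meta` table is made up for illustration.

```r
# Toy metadata: one row per cell, cluster IDs stored as text.
meta <- data.frame(
  clst = c("1", "1", "10", "-1", "19"),
  row.names = paste0("X10x_", 1:5),
  stringsAsFactors = FALSE
)

# Named list: one character vector of cell names per cluster ID.
cells_by_cluster <- split(rownames(meta), meta$clst)

# Alternatively, a two-column table of cell name and cluster.
cell_table <- data.frame(
  cell = rownames(meta),
  cluster = meta$clst,
  stringsAsFactors = FALSE
)

print(cells_by_cluster[["1"]])
```

With the object from this thread, the same calls would presumably be applied to `tirosh_seurat@meta.data` in place of the toy `meta`.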
