Adapt package to different types of organisms #28

Closed
3 tasks
wleoncio opened this issue Oct 21, 2020 · 3 comments

@wleoncio
Member

wleoncio commented Oct 21, 2020

DIscBIO was developed based on two datasets using human and mouse genes. It would be great if it could be adapted to work on other organisms.

Adapted details from @SystemsBiologist

What to change

Conquer is a collection of analysis-ready public scRNA-seq datasets. We would like to add it to our manuscript. It has about 40 datasets from three organisms: human, zebrafish and mouse. When I wrote DIscBIO I was focusing on humans, but now we want to make it applicable to any organism with a taxonomy ID. To do so, we need to change lines 157-159 of DIscBIO-classes.R from

DIscBIO/R/DIscBIO-classes.R

Lines 157 to 159 in 0c90899

shortNames <- substr(rownames(tmpExpdataAll), 1, 4)
geneTypes <- factor(
c(ENSG = "ENSG", ERCC = "ERCM", ENSG = "ENSM")[shortNames]

to

shortNames <- substr(rownames(tmpExpdataAll), 1, 3)
geneTypes <- factor(
c(ENS = "ENS", ERC = "ERC")[shortNames]
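As a rough illustration of why the shorter prefix helps, the sketch below applies both prefix lengths to a few made-up row names. The IDs are illustrative only and are not taken from the Conquer datasets; the lookup vector mirrors the one proposed above.

```r
# Illustrative row names (assumed, not from the real data):
# human, mouse and zebrafish Ensembl-style IDs plus one ERCC spike-in
ids <- c(
    "ENSG00000139618", "ENSMUSG00000017167",
    "ENSDARG00000024771", "ERCC-00002"
)

# Four-character prefixes differ between organisms
prefix4 <- substr(ids, 1, 4)

# Three-character prefixes collapse every Ensembl-style ID to "ENS"
prefix3 <- substr(ids, 1, 3)

# Same lookup as in the proposed change: names not in the vector would give NA
gene_types <- factor(unname(c(ENS = "ENS", ERC = "ERC")[prefix3]))

print(prefix4)
print(prefix3)
print(table(gene_types))
```

With three characters, none of the example IDs falls outside the lookup, so no organism-specific entry is needed.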

I did not change the code because dev is not working, and I was worried about making the situation worse. Could you change the code after you bring back dev to work?

Expected behavior

  • The problem will be in 3 functions (DEGanalysis2clust, DEGanalysis and ClustDiffGenes)
  • The outcome of ClustDiffGenes() is not perfect but it is OK
  • The main problem is in DEGanalysis2clust and DEGanalysis. They are not working at all.

Testing code

library(MultiAssayExperiment)
library(DIscBIO)
# Load the data set and extract the gene-level count matrix
GSE41265 <- readRDS("~/GSE41265.rds")
Dataset <- assays(experiments(GSE41265)[["gene"]])[["count"]]
# Drop everything after the first dot in the gene IDs (version suffix)
rownames(Dataset) <- sub("\\..*$", "", rownames(Dataset))
sc <- DISCBIO(Dataset)
sc <- Clustexp(sc, cln = 2, quiet = FALSE, clustnr = 6, rseed = 17000)
# Differential expression analysis between two clusters
Cdiff <- DEGanalysis2clust(sc, Clustering = "K-means", K = 2, fdr = 0.05, name = "M", export = TRUE, quiet = FALSE)
# Differential expression analysis between all clusters
Cdiff <- DEGanalysis(sc, Clustering = "K-means", K = 2, fdr = 0.05, name = "All", export = TRUE, quiet = FALSE)
CdiffBinomial <- ClustDiffGenes(sc, K = 2, export = TRUE, fdr = .01, quiet = FALSE)
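The row-name clean-up step in the testing code can be checked in isolation. This is a minimal sketch on made-up IDs (the version numbers are assumptions), showing the suffix removal and a duplicate check that may be worth running, since different versions of one gene would collapse to the same ID.

```r
# Illustrative IDs, some carrying a ".version" suffix
ids <- c("ENSMUSG00000000001.4", "ENSG00000139618.15", "ERCC-00002")

# Remove the first dot and everything after it; IDs without a dot are unchanged
clean_ids <- sub("\\..*$", "", ids)
print(clean_ids)

# anyDuplicated() returns 0 when every cleaned ID is still unique
if (anyDuplicated(clean_ids) > 0) {
    warning("Some gene IDs are duplicated after removing the version suffix")
}
```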

For now, it would be great if DEGanalysis and DEGanalysis2clust could work even without gene names, as ClustDiffGenes does.
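One way to picture "working without gene names" is a fallback that labels each gene with its symbol when one is known and with its ID otherwise. The sketch below is only an assumption about how that could look in base R; `symbol_map` is a hypothetical lookup table, and nothing here reflects how DIscBIO actually handles annotation.

```r
# Hypothetical lookup from gene ID to gene symbol (assumed for illustration)
symbol_map <- c(ENSG00000139618 = "BRCA2")

# Illustrative IDs: the first has a symbol in the table, the second does not
ids <- c("ENSG00000139618", "ENSDARG00000024771")

# Look up symbols; IDs missing from the table come back as NA
symbols <- unname(symbol_map[ids])

# Fall back to the raw ID wherever no symbol was found
gene_labels <- ifelse(is.na(symbols), ids, symbols)
print(gene_labels)
```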

@wleoncio wleoncio added the enhancement New feature or request label Oct 21, 2020
@wleoncio wleoncio added this to the DIscBIO 1.1.0 milestone Oct 21, 2020
@wleoncio wleoncio self-assigned this Nov 4, 2020
@wleoncio wleoncio added the good first issue Good for newcomers label Nov 4, 2020
@wleoncio
Member Author

wleoncio commented Nov 4, 2020

@SystemsBiologist

Could you change the code after you bring back dev to work?

Sure thing, but what do you mean by dev not working?

Edit: pasting e-mail reply:

The dev branch is working fine; the problem was with Binder.
Now everything is working for all organisms except for two functions:
DEGanalysis2clust()
DEGanalysis()
You can see that in the "DIscBIO-CONQUER Notebook": https://github.com/ocbe-uio/DIscBIO/blob/dev/notebook/DIscBIO-CONQUER%20Notebook.ipynb
ClustDiffGenes() is working, although its output does not show the gene symbols, but that is fine. It would be great if, in the future, we could do the same for DEGanalysis2clust() and DEGanalysis().

@wleoncio
Member Author

@SystemsBiologist, it looks like commit c9313b5 has already implemented the changes in the OP, should this issue be closed then? What about the problems posted, i.e.:

The problem will be in 3 functions (DEGanalysis2clust, DEGanalysis and ClustDiffGenes)
The outcome of ClustDiffGenes() is not perfect but it is OK
The main problem is in DEGanalysis2clust and DEGanalysis. They are not working at all.

@SystemsBiologist
Collaborator

You can close this one since you have created a "To do list"

@wleoncio wleoncio modified the milestones: Future minor release, DIscBIO > 1.0.1 Nov 13, 2020