group related datasets into a hierarchy ending with errors of copying exprMatrix.tsv.gz to itself #237

Closed
yesonse opened this issue Mar 20, 2022 · 16 comments

@yesonse

yesonse commented Mar 20, 2022

Hello,

I run a local host with about 20 datasets.
Now I would like to group them into collections, as suggested at https://cellbrowser.readthedocs.io/en/master/collections.html.
After I ran "cbBuild -r", the cell browser showed the collections correctly, but it could not find the datasets in each collection.
I found there was no cellbrowser.conf in the subdirectories of each collection, so I made one for each. When I ran cbBuild in a subdirectory, it ended with an error about copying exprMatrix.tsv.gz to itself.
From reading some of the source code, I believe the right approach is to put the dataset under the collection first, then run cbBuild in the dataset's subdirectory.

I also tried running cbBuild in a directory that is not under dataRoot, but that deactivated the hierarchy.

Please advise on the best way to recover the datasets in each collection and to add new datasets to a collection.

I appreciate the great work of building the cell browser.

Thanks a lot.

Robin

@matthewspeir
Collaborator

Hi, Robin.

Can you provide the full error message you are receiving when trying to run cbBuild for these datasets?

@yesonse
Author

yesonse commented Mar 21, 2022

Thanks, Matthew.

Here it is.

root@bioinformatics:/home/CellBrowser/OPC/otx169to176integrated# cbBuild -i cellbrowser.conf -o /home/CellBrowser
INFO:root:Determining if /home/CellBrowser/OPC/otx169to176integrated/exprMatrix.tsv.gz needs to be created
INFO:root:input matrix has input file size that is different from previously processed matrix. Expression matrix must be reindexed. Old file(s): {'fname': '/home/anaconda/opc211101/otx169to176integrated/exprMatrix.tsv.gz', 'md5': '970a1f0448', 'size': 177807570, 'mtime': '2022-03-08 03:07:17'}, current file: 120814330
INFO:root:/home/CellBrowser/OPC/otx169to176integrated/meta.tsv has the same md5 as in /home/CellBrowser/OPC/otx169to176integrated/dataset.json, no need to rebuild meta data
INFO:root:Reading sample names from /home/CellBrowser/OPC/otx169to176integrated/meta.tsv
INFO:root:Checking and reordering meta data to /home/CellBrowser/OPC/otx169to176integrated/meta.tsv
INFO:root:Reading sample names from /home/CellBrowser/OPC/otx169to176integrated/meta.tsv
INFO:root:Reading headers from file /home/CellBrowser/OPC/otx169to176integrated/exprMatrix.tsv.gz
INFO:root:Data contains 14244 samples/cells
INFO:root:Converting to numbers and compressing meta data fields
INFO:root:Field Cell: type uniqueString, 14244 different values
INFO:root:Field origident: type enum, 4 different values
INFO:root:Field nCount_RNA: type int, 3749 different values
INFO:root:Field nFeature_RNA: type int, 2422 different values
INFO:root:Field percentmt: type float, 7721 different values
INFO:root:Field percentribo: type float, 11120 different values
INFO:root:Field predictedsubclassscore: type float, 13235 different values
INFO:root:Field predictedsubclass: type enum, 11 different values
INFO:root:Field SScore: type float, 14244 different values
INFO:root:Field G2MScore: type float, 14244 different values
INFO:root:Field Phase: type enum, 3 different values
INFO:root:Field seurat_clusters: type enum, 11 different values
INFO:root:Field CellType: type enum, 5 different values
INFO:root:Field integrated_snn_res01: type enum, 5 different values
INFO:root:Field integrated_snn_res02: type enum, 9 different values
INFO:root:Field integrated_snn_res03: type enum, 9 different values
INFO:root:Field integrated_snn_res04: type enum, 11 different values
INFO:root:Field integrated_snn_res05: type enum, 11 different values
INFO:root:Field Cluster: type enum, 11 different values
INFO:root:Indexing meta file /home/CellBrowser/OPC/otx169to176integrated/meta.tsv to /home/CellBrowser/OPC/otx169to176integrated/meta.index
INFO:root:Kept 14244 cells present in both meta data file and expression matrix
INFO:root:Auto-detecting number type of /home/CellBrowser/OPC/otx169to176integrated/exprMatrix.tsv.gz
INFO:root:Auto-detect: Numbers in matrix are of type 'float'
INFO:root:Auto-detected gene IDs type: symbols
INFO:root:Copying/compressing /home/CellBrowser/OPC/otx169to176integrated/exprMatrix.tsv.gz to /home/CellBrowser/OPC/otx169to176integrated/exprMatrix.tsv.gz
cp: '/home/CellBrowser/OPC/otx169to176integrated/exprMatrix.tsv.gz' and '/home/CellBrowser/OPC/otx169to176integrated/exprMatrix.tsv.gz' are the same file
ERROR:root:Could not run: cp "/home/CellBrowser/OPC/otx169to176integrated/exprMatrix.tsv.gz" "/home/CellBrowser/OPC/otx169to176integrated/exprMatrix.tsv.gz"
ERROR:root:Unexpected error: (<class 'SystemExit'>, SystemExit(1), <traceback object at 0x7fae7e6d3340>)
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/cellbrowser/cellbrowser.py", line 4783, in cbBuildCli
    build(confFnames, outDir, port, redo=options.redo)
  File "/usr/local/lib/python3.8/dist-packages/cellbrowser/cellbrowser.py", line 4598, in build
    convertDataset(inDir, inConf, outConf, datasetDir, redo)
  File "/usr/local/lib/python3.8/dist-packages/cellbrowser/cellbrowser.py", line 3955, in convertDataset
    convertExprMatrix(inConf, outMatrixFname, outConf, sampleNames, geneToSym, datasetDir, needFilterMatrix)
  File "/usr/local/lib/python3.8/dist-packages/cellbrowser/cellbrowser.py", line 3285, in convertExprMatrix
    matType = copyMatrixTrim(matrixFname, outMatrixFname, metaSampleNames, needFilterMatrix, geneToSym, outConf, matType)
  File "/usr/local/lib/python3.8/dist-packages/cellbrowser/cellbrowser.py", line 2525, in copyMatrixTrim
    ret = runCommand(cmd)
  File "/usr/local/lib/python3.8/dist-packages/cellbrowser/cellbrowser.py", line 2459, in runCommand
    errAbort("Could not run: %s" % cmd)
  File "/usr/local/lib/python3.8/dist-packages/cellbrowser/cellbrowser.py", line 168, in errAbort
    sys.exit(1)
SystemExit: 1
root@bioinformatics:/home/CellBrowser/OPC/otx169to176integrated#

Last time, I ran cbBuild in "/home/anaconda/opc211101/otx169to176integrated/" successfully.
After I reset the dataRoot and grouped my datasets, I copied the old cellbrowser.conf to the subdirectory and tried to rebuild everything.

I found that the inMatrix md5 differs from the outMatrix md5 in dataset.json:

{
  "fileVersions": {
    "inMeta": {
      "fname": "/home/anaconda/opc211101/otx169to176integrated/meta.tsv",
      "md5": "5d5e981856",
      "size": 2189471,
      "mtime": "2022-03-08 04:19:11"
    },
    "outMeta": {
      "fname": "/home/UCSCcellbrowser/otx169to176integrated/meta.tsv",
      "md5": "5d5e981856",
      "size": 2189471,
      "mtime": "2022-03-08 04:26:16"
    },
    "inMatrix": {
      "fname": "/home/anaconda/opc211101/otx169to176integrated/exprMatrix.tsv.gz",
      "md5": "970a1f0448",
      "size": 177807570,
      "mtime": "2022-03-08 03:07:17"
    },
    "outMatrix": {
      "fname": "/home/UCSCcellbrowser/otx169to176integrated/exprMatrix.tsv.gz",
      "md5": "0c55cac114",
      "size": 120814330,
      "mtime": "2022-03-08 04:29:46"
    },
    "conf": {
      "fname": "/home/anaconda/opc211101/otx169to176integrated/cellbrowser.conf",
      "md5": "85c7479abb",
      "size": 1170,
      "mtime": "2022-03-08 04:25:14"
    }
  },
  "sampleCount": 14244,
  "matrixWasFiltered": true,
  "metaFields": [
    {
      "name": "Cell",
      "label": "Cell",
      "type": "uniqueString",
      "maxSize": 20,
      "diffValCount": 14244,
      "md5": "770c0e2419"
    },
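
The "input file size that is different from previously processed matrix" message in the log can be reproduced by hand. The following is a minimal sketch, assuming dataset.json has the `fileVersions` layout shown above. The helper name `matrix_changed` is hypothetical and is not part of cellbrowser; it only compares sizes and does not reimplement cbBuild's own check.

```python
import json
import os


def matrix_changed(dataset_json_path, matrix_path):
    """Return True if matrix_path's size differs from the recorded inMatrix size.

    Hypothetical helper for illustration only, not cellbrowser's own code.
    Assumes dataset.json contains fileVersions -> inMatrix -> size.
    """
    with open(dataset_json_path) as fh:
        recorded = json.load(fh)["fileVersions"]["inMatrix"]
    return os.path.getsize(matrix_path) != recorded["size"]
```

In this thread the recorded inMatrix size is 177807570 bytes, while the log reports a current file of 120814330 bytes, which is the size recorded for outMatrix. That suggests the file now being read as input is the previous build's output.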

@matthewspeir
Collaborator

Robin, can you share more details about how you installed the cellBrowser package (e.g. pip or conda)? And maybe what operating system you're running on (e.g. Windows, macOS, or Linux)?

@maximilianh Do you have ideas? I've never seen this error before:

ERROR:root:Could not run: cp "/home/CellBrowser/OPC/otx169to176integrated/exprMatrix.tsv.gz"

@yesonse
Author

yesonse commented Mar 23, 2022

Hi Matthew,

I installed it with pip two years ago and upgraded it recently.
I have an Ubuntu machine that has hosted a UCSC Cell Browser for about 20 datasets (20 directories) without a hierarchy at /home/UCSCcellbrowser/.
Now I have to organize those datasets into a hierarchy.
So I configured my cellbrowser with a dataRoot of "/home/CellBrowser/", made several directories there as collections, and made a "cellbrowser.conf" there too. Then I copied the old 20 directories into these collections and tried to rebuild each dataset.

Would you mind suggesting the best way to set up the cellbrowser with a hierarchy?

Thanks

Robin

@matthewspeir
Collaborator

Hmm, that's odd that the hierarchy stuff didn't work for you.

Just to be sure, you've removed the 'dataRoot' line from the .cellbrowser.conf file in your home directory?

@yesonse
Author

yesonse commented Mar 23, 2022

I did not have a .cellbrowser.conf before.
I just made one with the line "dataRoot=/home/CellBrowser/".

@yesonse
Author

yesonse commented Mar 23, 2022

Hi Matthew,

How do you add a dataset to a collection in the hierarchy?
If I run cbBuild in a directory that is not under dataRoot, it just deactivates the hierarchy.
If I move the output folder of cbSeurat under a collection in dataRoot and run cbBuild there, I get the same errors as shown above.

Thanks

Robin
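
One way to picture why the build location matters: the sketch below assumes that a dataset's position in the hierarchy comes from its directory path relative to dataRoot. That is an assumption drawn from the behavior described in this comment, not something verified against the cellbrowser source, and `dataset_name` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def dataset_name(dataset_dir, data_root):
    """Return dataset_dir relative to data_root, or None if it lies outside.

    Hypothetical helper; the naming rule is an assumption for illustration.
    Works on path strings only and does not touch the filesystem.
    """
    try:
        rel = PurePosixPath(dataset_dir).relative_to(PurePosixPath(data_root))
    except ValueError:
        # dataset_dir is not under data_root, so there is no collection path.
        return None
    return str(rel)
```

Under that assumption, a directory such as /home/CellBrowser/OPC/otx169to176integrated maps to the collection path OPC/otx169to176integrated, while a directory outside dataRoot has no collection path at all.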

@matthewspeir
Collaborator

Could you try setting up a .cellbrowser.conf (note the '.' at the beginning of the file name) in your home directory with the dataRoot line to see if that helps?

@maximilianh
Owner

maximilianh commented Mar 23, 2022 via email

@yesonse
Author

yesonse commented Mar 23, 2022

I had a ".cellbrowser.conf" with the dataRoot line. For redundancy, I also set CBDATAROOT=/home/CellBrowser/.

My error came from here:

2515 shutil.copyfile(inFname, outFname)

When I move the default output of cbSeurat to a subdirectory of dataRoot and run cbBuild there,
outDir == inDir and inFname == outFname, so the error happens.

So I renamed the file 'exprMatrix.tsv.gz' to 'oldMatrix.tsv.gz' in the output of cbSeurat and ran cbBuild again, and it worked.
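
The failure mode described here (source and destination resolving to the same file) can be guarded against in general. The following is a minimal sketch of such a guard, not the actual cellbrowser code or the change made to it; `copy_unless_same` is a hypothetical name. It relies on `os.path.samefile` and `shutil.copyfile` from the Python standard library.

```python
import os
import shutil


def copy_unless_same(in_fname, out_fname):
    """Copy in_fname to out_fname unless both refer to the same file.

    Hypothetical helper for illustration. Returns True if a copy was made,
    False if the copy was skipped because the two paths are the same file.
    """
    # samefile needs both paths to exist, so check the destination first.
    if os.path.exists(out_fname) and os.path.samefile(in_fname, out_fname):
        return False
    shutil.copyfile(in_fname, out_fname)
    return True
```

Without the guard, `shutil.copyfile` raises `shutil.SameFileError` when both arguments point to the same file, which is the Python counterpart of the `cp: ... are the same file` message in the log above.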

@maximilianh
Owner

maximilianh commented Mar 24, 2022 via email

@yesonse
Author

yesonse commented Mar 24, 2022

I manage an internal bioinformatics server and use it to host the cell browser for 30-40 colleagues. Several colleagues may add datasets independently from time to time. I did not set up a specific htmlDir or dataRoot.
I simply made one directory and had httpd serve that directory.

So my case may be special, and I thought I could set outDir as the dataRoot. I had not realized that the hierarchy needs a specific "dataRoot" that is separate from the outDir for 'cbBuild -o'.
I had thought the hierarchy only needed a tree of cbBuild outputs, unrelated to where and how cbBuild is run.

I figured out that I just need to put the input files for cbBuild elsewhere, put their paths in the cellbrowser.conf files within a tree of directories, and run cbBuild in each subdirectory.

I also modified the code so that it works when outDir == inDir :) by not rewriting the exprMatrix, which makes it easy to rebuild the tree of outputs at any time.

Thank you very much for your great work building the cellbrowser!
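
The workflow described in this comment (a tree of cellbrowser.conf files under dataRoot, with cbBuild run in each subdirectory and written to a separate output directory) could be scripted roughly as follows. This is a sketch under stated assumptions: `plan_builds` is a hypothetical helper, it only lists commands rather than running them, it does not distinguish collection directories from dataset directories, and the only cbBuild options used are the `-i` and `-o` flags that appear earlier in this thread.

```python
import os


def plan_builds(data_root, out_dir):
    """Yield (directory, command) for each directory with a cellbrowser.conf.

    Hypothetical helper for illustration. The command is meant to be run
    with the yielded directory as the working directory.
    """
    for dirpath, dirnames, filenames in os.walk(data_root):
        dirnames.sort()  # walk subdirectories in a stable, alphabetical order
        if "cellbrowser.conf" in filenames:
            yield dirpath, ["cbBuild", "-i", "cellbrowser.conf", "-o", out_dir]
```

Each planned command can then be executed with `subprocess.run(cmd, cwd=directory, check=True)`, keeping `out_dir` different from `data_root` so that inputs and outputs never share a directory.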

@matthewspeir
Collaborator

Hi, @yesonse. Can we close this ticket? Or are you still running into issues?

@yesonse
Author

yesonse commented Apr 8, 2022

I am fine now.

yesonse closed this as completed Apr 8, 2022
@maximilianh
Owner

maximilianh commented Oct 11, 2022 via email

@maximilianh
Owner

maximilianh commented Oct 11, 2022 via email
