usage of nanopype, storage module #5
Hi Aurelien,
Regarding binaries: if you use singularity, you can ignore the warnings about missing binaries; they're all in the container. Regarding the reference genome: please first make sure the file '~/lsru/minion/references/GRCm38.p3.fasta' exists. In your env.yaml, where you configure the genomes, can you replace the given path with an absolute path? I'm right now not sure if the ~ is handled properly. I still have open pull requests against the snakemake master regarding singularity and group jobs in the cluster, but I've seen a few fixes on their side too. |
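A quick plain-shell sketch of why the ~ can be a problem (nothing nanopype-specific; the path is a placeholder): the shell only expands an unquoted leading ~, while a ~ read from a quoted string or a YAML file stays a literal character, which is why an absolute path in env.yaml is the safer choice.

```shell
# Unquoted leading ~ is expanded by the shell:
unquoted=~/references
# A ~ inside quotes (as when read from a config file) stays literal:
quoted='~/references'
echo "$unquoted"   # expanded to $HOME/references
echo "$quoted"     # still literally ~/references
```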
thanks a lot for your fast answer!

```
snakemake -j 999 --cluster-config cluster.json --cluster "sbatch -c 2 -t 1" --profile slurm \
    --use-singularity --singularity-args "--bind /mnt/irisgpfs/projects/lsru:/lsru" --singularity-prefix ~/scratch/img \
    --snakefile ~/install/virtualenvs/nanopype/nanopype/Snakefile data/raw/reads.fofn
```

gives
and then I realised the
and the updated command line:

```
snakemake -j 999 --cluster-config cluster.json --cluster "sbatch -c 2 -t 1" --profile slurm \
    --use-singularity --singularity-args "--bind /mnt/irisgpfs/projects/lsru:/lsru" --singularity-prefix ~/scratch/img \
    --snakefile ~/install/virtualenvs/nanopype/nanopype/Snakefile /lsru/minion/mouse_embryo/data/raw/reads.fofn
```

and now
seems like the singularity image is not loaded, and links are used outside the container. As a demonstration, inside the container it is ok:
sorry for the mess, I guess this is not exactly trivial as a set-up. Even so, it would be so awesome to get it working ;) |
Hey, with this you should be able to request the file '~/data/raw/FAH1234_something/reads.fofn' from the pipeline. Importantly, you do not need to mount any host path into the container; this is all done by snakemake! For debugging purposes, you should be able to run the raw data indexing outside of the container, it's pure python code (I hope I have included all packages in the requirements). Pay |
I think I am lost here: I am not working in my home directory (we have a strict quota on it), so how can the container access the data then? In |
No worries, home or not doesn't matter, as long as the path is accessible from any cluster node. I'm only using home in the docs since people are familiar with it. In general, snakemake mounts the input files of a rule (for indexing, the raw read batch) into the working directory within the container. Basecalling with GPU is not yet tested by me, we have a CPU-only cluster. When you reach this point, can you open a new issue with some specs on how GPUs are set up for you? Are there nodes with one or multiple GPUs, how do you control which process is using which GPU, etc.? |
yes, I have GPU settings that allow really fast basecalling (<30 min instead of >10 hours), so I am really keen on continuing basecalling on those units. But I am stopping here and will open another issue. I didn't know snakemake could mount the files into the home of the container, that's handy. Unfortunately, now I am trying easier things: no singularity, just the indexing as in https://nanopype.readthedocs.io/en/stable/rules/storage/ and got this:
|
Okay, I see where this might come from: I'm converting the 'storage_data_raw' setting to an absolute path in the Snakefile. Can you try the entire setup with absolute paths? |
I made some progress, actually leaving the relative path in
and then:
|
I think I can now reproduce the issue. In short, could you please report the folder/file structure below data/raw/...? It seems the pipeline is finding the run folder, but no raw data batches inside the reads folder. Please see the following two examples: I followed the first steps in the tutorial to extract the test data, and running e.g.
gives me:
however, if I create a dummy run with
and ask to index this one:
I get
and finally without -n
Which is close to your output. I will think of a warning/error message for when no read batches are found; the current error doesn't give a hint at the actual problem. Please let me know if this was helpful. |
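A minimal sketch of the run layout discussed above (run and batch names are invented for illustration; the point is that each run folder below storage_data_raw needs a reads/ folder with raw data batches inside):

```shell
# Create a dummy run folder with one (empty) raw batch inside reads/:
mkdir -p data/raw/20190806_dummy_run/reads
touch data/raw/20190806_dummy_run/reads/0.fast5
# Inspect the resulting structure:
find data/raw | sort
```

Without any file inside reads/, the pipeline would find the run folder but no batches, which matches the failure described above.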
Hello Pay,
Now I tried the tests again and it failed :( and I get
of note, for info
|
yes, almost there, the configuration is right. If you run
you get the 'No rule to produce...'
it works. This behavior is currently not consistent with the documentation; I will think of a way to either handle both absolute and relative paths or document it better. To explain the error: the configuration now has an absolute path, but the snakemake workflow got called with a relative path. Snakemake doesn't detect this and doesn't find a rule to produce the relative output. This is only relevant for the storage module; all other modules work on relative paths in the working directory. |
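The mismatch can be illustrated with a crude plain-shell sketch (paths are placeholders, and snakemake's matching is more involved than this): a target given on the command line has to literally fall under the configured prefix, and an absolute prefix never string-matches a relative target.

```shell
cfg=/lsru/minion/mouse_embryo/data/raw      # absolute storage_data_raw from the config
target=data/raw/reads.fofn                  # relative target passed on the CLI
# An absolute prefix can never match a relative target string:
case "$target" in
  "$cfg"/*) echo "rule found" ;;
  *)        echo "No rule to produce $target" ;;
esac
```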
I am sorry Pay, but providing the absolute path when requesting the index file still produces |
Hello,
your workflow fits perfectly my needs and I really hope I can manage to get it working in our set-up, which is on an HPC with slurm. I could install all tools, but I want to use the singularity image, which would make it easier to share the tool with people here. I don't understand how I am supposed to use this image; I created it after pulling from docker hub.
the import of data went fine:
after booking interactive resources and activating the virtual env of nanopype,

```
python3 ~/install/virtualenvs/nanopype/nanopype/scripts/nanopype_import.py \
    data/raw/ EM_S1/20190806_0812_MN22103_FAK07438_17f539d2/fast5
```
gives
afterwards, I naively tried this after creating a slurm profile:
gives
I tried to bind the mouse genome folder into the singularity image, but I don't get it right. Also, what about all the warnings concerning the missing binaries? Are they not found in the singularity image?
Then it is complaining about the demux.smk that is
Forgot to say that I successfully ran the tests when I was inside the singularity image, but not when running
```
python3 test/test_rules.py test_unit_singularity
```
which gave the same error as trying to index the fast5 files. Thanks in advance for your time.
Aurelien