Better naming of cluster log files #14

mbhall88 · 2020-04-08T10:54:23Z

I don’t like the format the names of the cluster log files have been changed to. It makes it impossible to figure out which log file relates to which job without digging into the snakemake stderr log (which is one of the main things I hate about nextflow).

Current implementation

self.logdir / "{jobid}_{random_string}.err".format(jobid=self.jobid, random_string=self.random_string)

Proposal

self.logdir / self.rule_name / self.wildcards_str / "jobid{jobid}_{random_string}.err".format(jobid=self.jobid, random_string=self.random_string)

Contrasting both implementations

# current
'logdir/2_random.out'
# proposed
'logdir/search_fasta_on_index/i=0/jobid2_random.out'

There are two major advantages I see to the new naming scheme.

1. It is easier to find the log file for a specific job without having to search for its jobid in the snakemake log.
2. For large pipelines that produce tens or hundreds of thousands of jobs, this avoids having potentially 200,000 log files in one directory, which I guess might send the cluster into meltdown 😅
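
To make the proposed layout concrete, here is a minimal sketch of how such a path could be assembled with `pathlib`. The helper names `wildcards_to_str` and `cluster_log_path`, the `.` separator between wildcards, and the `unique` fallback for rules without wildcards are assumptions for illustration, not the profile's actual code.

```python
from pathlib import Path


def wildcards_to_str(wildcards: dict) -> str:
    """Join wildcards as 'name=value' pairs (hypothetical helper)."""
    # Assumption: '.' as separator and 'unique' as the fallback directory
    # name for rules without wildcards are illustrative choices.
    return ".".join(f"{key}={value}" for key, value in wildcards.items()) or "unique"


def cluster_log_path(logdir, rule_name, wildcards, jobid, random_string, ext="out"):
    """Build logdir/<rule>/<wildcards>/jobid<jobid>_<random_string>.<ext>."""
    filename = f"jobid{jobid}_{random_string}.{ext}"
    return Path(logdir) / rule_name / wildcards_to_str(wildcards) / filename


path = cluster_log_path("logdir", "search_fasta_on_index", {"i": 0}, 2, "random")
print(path.as_posix())  # logdir/search_fasta_on_index/i=0/jobid2_random.out
```

With this layout, the logs of one rule can be listed with something like `ls logdir/search_fasta_on_index/*/`, without consulting the snakemake log first.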

leoisl · 2020-04-08T11:23:40Z

I totally agree with this idea, it is indeed a very nice improvement!

I am not sure if we should worry about this (from https://unix.stackexchange.com/questions/32795/what-is-the-maximum-allowed-filename-and-folder-size-with-ecryptfs):

Linux has a maximum filename length of 255 characters for most filesystems (including EXT4), and a maximum path of 4096 characters.

Filename length is not an issue, but when we add all the wildcards of a rule to the log file path, its length increases a lot. Sometimes I have rules where the wildcards are full paths to other files, but I don't think I have ever hit the limit. Anyway, we could ensure the path has at most 4096 characters and truncate some of the wildcards if it exceeds that, but I am just wondering whether we should address this or not (it seems like a rare corner case).
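
If we did want to check this, a rough sketch could look like the following. It assumes the two limits quoted above (255 per path component, 4096 for the whole path), measures lengths in bytes, and assumes POSIX-style `/` separators; `path_fits` is a hypothetical helper name.

```python
import os

NAME_MAX = 255   # per-component limit quoted above (assumed to apply)
PATH_MAX = 4096  # whole-path limit quoted above (assumed to apply)


def path_fits(path, name_max=NAME_MAX, path_max=PATH_MAX):
    """Return True if each component and the whole path stay within the limits."""
    encoded = os.fsencode(path)  # limits are counted in bytes, not characters
    # Conservative check: leave one byte of headroom for a terminating null.
    if len(encoded) >= path_max:
        return False
    # Assumption: POSIX-style '/' separators.
    return all(len(part) <= name_max for part in encoded.split(b"/"))


print(path_fits("logdir/search_fasta_on_index/i=0/jobid2_random.out"))  # True
```

A caller could then fall back to a shorter directory name (for example a hash of the wildcards) when `path_fits` returns `False`, though that is a design choice this sketch does not cover.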

That is the only possible issue I see that we need to tackle; the rest is all fine. I really like this proposal.

mbhall88 · 2020-04-08T11:27:34Z

I think 4096 characters is more than sufficient. If someone hit that due to wildcards I would suggest there might be some changes they could make to their pipeline.

I think enforcing it is hard as I guess it could vary from file system to file system?
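
For reference, on POSIX systems the limits can be queried per directory via `os.pathconf`, so a check would not have to hard-code them. This is only a sketch: `filesystem_limits` is a hypothetical helper, and the fallback values 255 and 4096 are assumptions used when the query is unavailable or returns no definite limit.

```python
import os


def filesystem_limits(directory="."):
    """Return (name_max, path_max) for the filesystem holding `directory`."""
    limits = []
    for key, fallback in (("PC_NAME_MAX", 255), ("PC_PATH_MAX", 4096)):
        try:
            value = os.pathconf(directory, key)
        except (OSError, ValueError, AttributeError):
            # Query unsupported here (or os.pathconf missing on this platform).
            value = -1
        # Fall back to the assumed defaults when no definite limit is reported.
        limits.append(value if value > 0 else fallback)
    return tuple(limits)


name_max, path_max = filesystem_limits(".")
print(name_max, path_max)
```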

leoisl · 2020-04-08T11:38:14Z

Yeah, I would just consider common Linux filesystems, probably just EXT4. I think we should drop this: it is hard to do properly (so that it works on any filesystem), and it would only address an extremely rare corner case.


mbhall88 added the enhancement New feature or request label Apr 8, 2020

mbhall88 pushed a commit to mbhall88/lsf that referenced this issue Apr 8, 2020

better naming of cluster log files [Snakemake-Profiles#14]

63bf7ff

mbhall88 pushed a commit to mbhall88/lsf that referenced this issue Apr 8, 2020

move directory logic into logdir so directories are created if not exist

6a54195

[Snakemake-Profiles#14]

mbhall88 mentioned this issue Apr 8, 2020

Add per-rule configuration capability #15

Merged

mbhall88 pushed a commit that referenced this issue Apr 9, 2020

better naming of cluster log files [#14]

421fc03

mbhall88 pushed a commit that referenced this issue Apr 9, 2020

move directory logic into logdir so directories are created if not exist

d204afe

[#14]

mbhall88 closed this as completed Apr 9, 2020

mbhall88 mentioned this issue Apr 9, 2020

Robust status handling and per-rule config #16

Merged
