Using bgz extension instead of gz for bgzipped files #129
Comments
bgzipped files are gzip compliant. Why not tell vim to gzip everything with bgzip instead?
vim was just an example, not the motivation. The more general motivation is [...]
Yep, you're right. Even different implementations of the same standard (for example .lz and .lzma here) appear to get different file endings.
For For the rest of
.bgz files are still not recognized when using bgzip. Thank you.
Why do you need to rename your files? The suffix name does matter for the operation of the program.
I have to rename them from ".bgz" to ".gz" so bgzip can work, otherwise I get this error: "unknown suffix -- ignored"
That's likely a bug in bgzip, as it ought to be using the magic number instead of the filename. However, I see similar logic in tabix, which only works on (for example) foo.bed.gz and wouldn't accept foo.bed.bgz. This is why renaming files to your own suffixes is problematic, and I'd be reluctant to tinker with this. Even if we change it in htslib, it'll cause problems for people using old installs, and we have no idea how many other applications out there are assuming .gz instead of .bgz. I agree bgz would have been better, but IMO this ship sailed long ago.
I agree that it should not use the filename... My first idea was to have bgzip work on both ".gz" and ".bgz", so that old installs, as you said, would still work on ".gz".
Yes, it should use the magic number, this fails:
As a quick workaround, use
This code in bgzip is checking that the file is compressed, hence in a position to be decompressed. Doing that via filename-extension checking code is ancient, from before we had easy magic-number sniffing infrastructure. [Edit: the similar logic in tabix.c, in We'd now be in a position to move bgzip's is-it-compressed test to after The code that strips
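For readers following along, the magic-number check discussed above can be sketched roughly as follows. This is a minimal Python illustration, not the htslib implementation; the helper name `is_bgzf` is hypothetical. It assumes the BGZF layout from the SAM specification: a gzip header (`1f 8b`, deflate method 8) with the FEXTRA flag set and an extra subfield tagged `BC` of length 2.

```python
import gzip
import struct

# The 28-byte BGZF end-of-file marker block given in the SAM specification.
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00"
    "1b 00 03 00 00 00 00 00 00 00 00 00"
)


def is_bgzf(header: bytes) -> bool:
    """Return True if `header` (the first bytes of a file) starts a BGZF block.

    Hypothetical helper for illustration only; not part of htslib.
    """
    # The fixed gzip header is 10 bytes; a 2-byte XLEN follows when FEXTRA is set.
    if len(header) < 12:
        return False
    # gzip magic (1f 8b) followed by compression method 8 (deflate).
    if header[0:3] != b"\x1f\x8b\x08":
        return False
    # FLG bit 2 (FEXTRA) must be set for BGZF.
    if not header[3] & 0x04:
        return False
    (xlen,) = struct.unpack_from("<H", header, 10)
    extra = header[12:12 + xlen]
    # Walk the extra subfields looking for SI1='B', SI2='C' with SLEN == 2.
    pos = 0
    while pos + 4 <= len(extra):
        si1, si2, slen = struct.unpack_from("<BBH", extra, pos)
        if si1 == ord("B") and si2 == ord("C") and slen == 2:
            return True
        pos += 4 + slen
    return False


if __name__ == "__main__":
    print(is_bgzf(BGZF_EOF))                       # a BGZF block
    print(is_bgzf(gzip.compress(b"plain gzip")))   # ordinary gzip, no BC subfield
```

A check like this depends only on the file contents, so it gives the same answer for foo.bed.gz and foo.bed.bgz. The same EOF block is also a valid gzip member with an empty payload, which matches the point made earlier in the thread that bgzipped files are gzip compliant.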
@pd3 yes, in the end it looks like this is the best solution
It seems like this was never done...? When using
Check that the file is actually compressed rather than that it ends in ".gz", and form the output filename by stripping an extension rather than exactly 3 characters. Enables e.g. `bgzip -d foo.bgz`; fixes (the non-policy part of) samtools#129.
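The filename handling described in this commit message could look something like the sketch below. It is a Python illustration, not the actual bgzip.c change; `decompressed_name` is a hypothetical helper, and raising an error when the name has no extension is an assumption made for this sketch only.

```python
import os


def decompressed_name(path: str) -> str:
    """Form an output filename by stripping the final extension, whatever its length.

    Hypothetical helper for illustration only; not the bgzip.c code.
    """
    root, ext = os.path.splitext(path)
    if not ext:
        # Assumption for this sketch: refuse names with no extension to strip.
        raise ValueError(f"{path}: no extension to strip")
    return root


if __name__ == "__main__":
    for name in ("foo.vcf.gz", "foo.bgz"):
        # Compare stripping the extension with chopping exactly 3 characters.
        print(name, "->", decompressed_name(name), "vs", name[:-3])
```

Chopping exactly three characters happens to work for ".gz" but turns "foo.bgz" into "foo." with a trailing dot, which is why stripping the extension is the safer rule.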
This would make it very clear what type of file is being dealt with. Additionally, it can avoid issues with things like vim being able to read bgzipped files saved with .gz extensions and resaving them as regular gzip.
I believe this needs only minimal changes: bgzip.c to auto-append .bgz instead of .gz, and tabix.c to auto-detect files with .bgz extensions.
I can submit a pull request for these if needed.