Naming scheme consistent with the established conventions #299

Closed
piotrjurkiewicz opened this issue Jan 21, 2016 · 2 comments

Comments

@piotrjurkiewicz

  1. Introduction:

    The change of the content-encoding token from "bro" to "br" (c4f439d) caused a big fuss. People split into "bro" and "br" camps and focused on the political aspects of the change. In particular, no one tried to survey the existing naming conventions to find the right answer to the problem. Because the problem remained unsolved and controversial, it led to confusion about file extensions and package names, resulting in the following bug reports: Recommend a brotli file extension #288, Rename bro cmdline program to brotli (or similar) #281, add brotli to PyPI repository #72.

    Therefore, I would like to create a consistent naming scheme that is compatible with the conventions established by IANA and by real-world practice. The starting point is a survey of the existing naming conventions for compression algorithms.

  2. Existing convention in HTTP Content-Coding tokens naming:

    The IANA registry contains six non-deprecated tokens: compress, deflate, exi, gzip, identity and pack200-gzip.

    Wikipedia also lists a number of non-standardized tokens, for example bzip2, lzma and sdch.

    As you may notice, the convention established by IANA is to use the full name of the compression algorithm as the token. This means using gzip rather than gz, and so on.

  3. Existing convention in MIME types and file extensions naming:

    Only two popular compression formats have their MIME type and file extension registered with IANA:

    • gzip: MIME type: application/gzip, file extension: .gz
    • zip: MIME type: application/zip, file extension: .zip

    Popular non-standardized formats include:

    • bzip2: MIME type: application/x-bzip2, file extension: .bz2
    • xz: MIME type: application/x-xz, file extension: .xz
    • lzip: MIME type: application/x-lzip, file extension: .lz
    • 7z: MIME type: application/x-7z-compressed, file extension: .7z

    The convention is harder to spot here, but we can distinguish two patterns:

    • use the full name in the MIME type
    • use a two-letter abbreviation as the file extension (.gz, .xz, .lz, .7z)
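These two patterns also surface in everyday tooling. As one illustration (not something the survey above relies on), Python's standard `mimetypes` module treats a compression suffix as an *encoding* rather than as part of the MIME type, and maps the short extension to the full-name token: `.gz` to `gzip`, `.bz2` to `bzip2`. The sketch below assumes only the built-in defaults; the file names are made up for the example.

```python
import mimetypes

# A fresh MimeTypes instance starts from Python's built-in default tables,
# so the results do not depend on /etc/mime.types or other system files.
db = mimetypes.MimeTypes()

for filename in ("archive.tar.gz", "archive.tar.bz2", "archive.tar.xz"):
    # guess_type returns a (MIME type, encoding) pair: the compression suffix
    # is reported as the encoding, and the MIME type comes from the inner ".tar".
    mime_type, encoding = db.guess_type(filename)
    print(f"{filename:16} type={mime_type} encoding={encoding}")
```

The encoding column is the kind of value a static file server would put in a `Content-Encoding` header when serving such a file for transparent decoding, which is where the extension and the content-coding token meet.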
  4. Existing convention in command-line program names:

    Unix commands for the commonly used compression algorithms have the following names:

    • /usr/bin/gzip for gzip
    • /usr/bin/bzip2 for bzip2
    • /usr/bin/xz for xz
    • /usr/bin/{zip,unzip} for zip

    Again, we may notice that the established convention is to use the full name of the compression algorithm or file format as the program name. xz fits this as well, because the algorithm/format name of xz is xz, not lzma (lzma was an older, incompatible file format).

  5. My proposal for brotli:

    1. Use the full name (brotli) as HTTP Content-Coding token.

      This fits the convention established by IANA and reduces the probability of the registration being rejected by the IETF. Someone may point out that this means sending 4 more bytes in each HTTP request/response than with the br token. That is true, but in HTTP/2 the headers are Huffman-encoded by HPACK anyway, so there will be no difference in request size.

    2. Use the full name (application/brotli) as a MIME type.

      This fits the existing convention and reduces the probability of future collisions and ambiguity.

    3. Use the two-letter abbreviation (.br) as a file extension.

      Again, this fits the existing two-letter extension convention. For file extensions we do not need to worry about collisions, so an abbreviation can safely be used.

      This would resolve issue Recommend a brotli file extension #288.

    4. Use the full name (brotli) as the command-line program name.

      As someone pointed out in Rename bro cmdline program to brotli (or similar) #281, the program name bro is already taken in Debian. The abbreviation br is commonly used in Linux for network-bridge-related things, for example the brctl tool. Therefore I think the full name should be used. In fact, the longer the name, the lower the probability of ambiguity or of a collision with an existing program. Using the full name also fits the existing Unix convention.

      This would resolve issue Rename bro cmdline program to brotli (or similar) #281 and affect pull request create 'brot' command-line compression program #163.

    5. Use the full name (brotli) as the package name for Python, Node.js, distributions, etc.

      Again, we should use the full name to reduce the probability of collisions and ambiguity. We cannot use br because that package name is already taken on PyPI. In the case of npm it is even worse, because both br and bro are already taken by unrelated packages.

      This would resolve issue add brotli to PyPI repository #72.
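To put a number on the header-size argument in item 1, here is a small back-of-the-envelope sketch. The code lengths are copied from the static Huffman table in RFC 7541, Appendix B, but only for the letters these tokens use; the helper names are hypothetical, and the sketch assumes an encoder that sends whichever of the raw or Huffman-coded form of a string is shorter (HPACK permits either). It looks at each token on its own, ignoring how tokens combine into one header value and the length prefix, which is one octet for both.

```python
import math

# Huffman code lengths in bits, from the HPACK static table (RFC 7541, Appendix B).
# Assumption: only the characters used by the tokens below are listed;
# a real encoder needs the full 257-symbol table.
HUFFMAN_BITS = {
    "a": 5, "b": 6, "d": 6, "e": 5, "f": 6, "g": 6, "i": 5,
    "l": 6, "o": 5, "p": 6, "r": 6, "t": 5, "z": 7,
}


def huffman_octets(token: str) -> int:
    """Octets needed to Huffman-encode `token` (the last octet is padded)."""
    bits = sum(HUFFMAN_BITS[ch] for ch in token)
    return math.ceil(bits / 8)


def hpack_string_octets(token: str) -> int:
    """Octets for the string body, assuming the encoder picks the shorter form."""
    return min(len(token), huffman_octets(token))


for token in ("br", "brotli", "gzip", "deflate"):
    print(f"{token:8} raw={len(token)} huffman={huffman_octets(token)} "
          f"sent={hpack_string_octets(token)}")
```

Under these assumptions, Huffman coding shrinks "brotli" from 6 to 5 octets while "br" stays at 2, so the gap narrows from 4 to 3 octets per literal occurrence but does not disappear. What makes a repeated header cheap in HTTP/2 is HPACK's dynamic table, which lets later requests on the same connection refer to an already-sent header field by index.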

  6. Summary

    I have tried to create a consistent naming scheme that is compatible with the established conventions. I look forward to your opinions.

    Note that I did not use the name bro anywhere. This is not for reasons of political correctness, but simply because it does not fit any of the established conventions: it is neither the full name nor a two-letter abbreviation. Therefore, I would like to avoid discussing the bro issue here and to focus on consistency instead.

@jyrkialakuijala
Collaborator

Thanks for the detailed proposal.

I don't think 'free' is the correct estimate for HPACK-encoding 'br' vs. 'brotli'. We are going forward with 'br' for content-encoding, but we have been planning to rename the binary to brotli. Expect this to happen during the next month.

Later, we might declare a recommended framing format for use with brotli. However, this has no impact on content encoding, though it might require new names. For an early version of the proposal, see https://github.com/madler/brotli/blob/master/br-format-v3.txt

@wrowe

wrowe commented Feb 27, 2017

Glad to read that a rename to /usr/bin/brotli is in the works (I'm tracking the git master branch and looking forward to that change in 0.6). For completeness: as on Debian, /usr/bin/bro is already reserved on Fedora/Red Hat/CentOS.
