Add KOI8-RU as a known encoding #49464

Closed
dwayne mannequin opened this issue Feb 11, 2009 · 8 comments

Comments

@dwayne
Mannequin

dwayne mannequin commented Feb 11, 2009

BPO 5214
Nosy @malemburg, @vstinner
Files
  • koi8_ru.py
  • koi8-ru
  Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2009-03-27.09:50:21.617>
    created_at = <Date 2009-02-11.07:03:37.315>
    labels = ['expert-unicode']
    title = 'Add KOI8-RU as a known encoding'
    updated_at = <Date 2014-10-09.23:00:42.339>
    user = 'https://bugs.python.org/dwayne'

    bugs.python.org fields:

    activity = <Date 2014-10-09.23:00:42.339>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2009-03-27.09:50:21.617>
    closer = 'lemburg'
    components = ['Unicode']
    creation = <Date 2009-02-11.07:03:37.315>
    creator = 'dwayne'
    dependencies = []
    files = ['13050', '13053']
    hgrepos = []
    issue_num = 5214
    keywords = []
    message_count = 8.0
    messages = ['81630', '81751', '81753', '81754', '81756', '81913', '84250', '84256']
    nosy_count = 3.0
    nosy_names = ['lemburg', 'vstinner', 'dwayne']
    pr_nums = []
    priority = 'normal'
    resolution = 'rejected'
    stage = None
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue5214'
    versions = ['Python 2.7']

    @dwayne
    Mannequin Author

    dwayne mannequin commented Feb 11, 2009

    >>> u = unicode("bob", "KOI8-RU")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    LookupError: unknown encoding: KOI8-RU

    This could be broadened to ensure that we support all encodings that
    are supported by iconv.
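A minimal sketch of how such a check could look in current Python 3, assuming a hand-picked list of encoding names to probe. The helper name `is_known_encoding` and the candidate list are illustrative and not part of the original report.

```python
import codecs


def is_known_encoding(name):
    """Return True if Python's codec registry can resolve `name`."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


# Hypothetical list of names, e.g. collected from `iconv --list`.
candidates = ["KOI8-R", "KOI8-U", "KOI8-RU", "UTF-8"]
for name in candidates:
    status = "known" if is_known_encoding(name) else "unknown"
    print(name, status)
```

Running this over the full iconv name list would show which encodings are missing from a given Python build.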

    @dwayne dwayne mannequin added the topic-unicode label Feb 11, 2009
    @vstinner
    Member

    I found this file http://ra.dkuug.dk/i18n/charmaps/KOI8-RU. I
    converted it to a format compatible with gencodec.py. Here is the
    resulting file: copy it into <your python library>/encodings/.
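As an alternative to copying a module into the library directory, a charmap codec can also be registered at runtime. The sketch below shows the general mechanism that gencodec.py-style modules rely on (a 256-character decoding table plus `codecs.charmap_build`). The table here is a made-up example (Latin-1 with one byte remapped), not the KOI8-RU mapping, and the names `x-demo-charmap` and `demo_search` are hypothetical.

```python
import codecs

# Illustrative table only: Latin-1 with byte 0xA4 remapped to the euro sign.
# A real KOI8-RU table would have to come from the charmap file itself.
decoding_table = "".join(chr(i) for i in range(256))
decoding_table = decoding_table[:0xA4] + "\u20ac" + decoding_table[0xA5:]
encoding_table = codecs.charmap_build(decoding_table)


class Codec(codecs.Codec):
    def encode(self, input, errors="strict"):
        return codecs.charmap_encode(input, errors, encoding_table)

    def decode(self, input, errors="strict"):
        return codecs.charmap_decode(input, errors, decoding_table)


def demo_search(name):
    """Codec search function; returns None for names it does not handle."""
    # Depending on the Python version, the registry passes the name with
    # hyphens kept or converted to underscores, so accept both spellings.
    if name.replace("-", "_") == "x_demo_charmap":
        codec = Codec()
        return codecs.CodecInfo(
            name="x-demo-charmap",
            encode=codec.encode,
            decode=codec.decode,
        )
    return None


codecs.register(demo_search)

print(b"price: 5 \xa4".decode("x-demo-charmap"))
```

A runtime registration like this avoids modifying the installed standard library, at the cost of having to run the registration before the encoding is first used.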

    @vstinner
    Member

    Attached the file used as gencodec.py input: koi8-ru.

    dwayne: Does the result look correct?

    @vstinner
    Member

    My version of iconv (2.6.1) doesn't support KOI8-RU, only:

    • CSKOI8R
    • KOI-7
    • KOI-8
    • KOI8-R: supported by python trunk
    • KOI8-T
    • KOI8-U: supported by python trunk
    • KOI8
    • KOI8R
    • KOI8U

    Note: python trunk supports neither KOI8R nor KOI8U (which are just
    aliases for KOI8-R and KOI8-U).
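One way to accept the hyphen-less spellings without patching the standard library is to register a small search function that forwards them to the existing codecs. This is a sketch only; `EXTRA_ALIASES` and `alias_search` are hypothetical names, and whether a given Python version already resolves these spellings is not assumed here.

```python
import codecs

# Hypothetical extra aliases: hyphen-less spellings mapped to existing codecs.
EXTRA_ALIASES = {
    "koi8r": "koi8_r",
    "koi8u": "koi8_u",
}


def alias_search(name):
    """Resolve the extra aliases; return None for every other name."""
    target = EXTRA_ALIASES.get(name.lower())
    if target is None:
        return None
    return codecs.lookup(target)


codecs.register(alias_search)

# Both spellings should now decode the same bytes identically.
print(b"\xc1\xc2\xd7".decode("KOI8R") == b"\xc1\xc2\xd7".decode("KOI8-R"))
```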

    @malemburg
    Member

    Could you please clarify the official status of this encoding? According
    to this page:

    http://www.terena.org/activities/multiling/koi8-ru/index.html

    it is currently only a proposed draft which hasn't been updated since 1997.

    @dwayne
    Mannequin Author

    dwayne mannequin commented Feb 13, 2009

    @Haypo: The encoding works and doesn't throw an error. My guess is that
    the aliases should be updated to cover the variant namings of -R and -U.

    I also found that glibc points to this reference,
    http://cad.ntu-kpi.kiev.ua/multiling/koi8-ru/, which seems to have
    disappeared. I couldn't find a way to validate that the glibc code
    points were the same as the ones you have.

    My iconv --version reports 2.9.

    Apart from that, I can't vouch for its correctness.

    @lemburg: I can't comment on the status of the standard. I would assume
    that, like most 8-bit encodings, these are falling away and being
    replaced by Unicode.

    I'm interested in these issues because our Python tools are used to
    recover translations from installed .mo files on Linux. I look for
    encoding issues on a semi-regular basis and fix any that cause
    problems. This is the first encoding I've found that is missing in Python.

    For us it's useful in that we present a path for people to move from an
    old encoding to Unicode if needed.

    @vstinner
    Member

    "@lemburg: I can't comment on the status of the standard.
    I would assume that like most 8 bit encodings that these
    are falling away and being replaced by Unicode."

    Can I close this issue? Or do we have enough KOI8-RU users to include
    this charset in Python?

    I think that iconv is enough for people who need to convert their old
    files to UTF-8 (or anything else).
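For encodings that Python does ship, the same conversion can be sketched in a few lines. The helper name `transcode` is illustrative, and KOI8-U stands in for the source encoding, since KOI8-RU itself is not assumed to be available.

```python
def transcode(data: bytes, source: str, target: str = "utf-8") -> bytes:
    """Decode `data` from `source` and re-encode it as `target`."""
    return data.decode(source).encode(target)


# Example with KOI8-U, which Python supports out of the box.
legacy_bytes = "Привіт".encode("koi8_u")
utf8_bytes = transcode(legacy_bytes, "koi8_u")
print(utf8_bytes.decode("utf-8"))
```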

    @malemburg
    Member

    Victor, I found this reference, which has some background information
    regarding KOI8-RU and other Cyrillic encodings:
    http://segfault.kiev.ua/cyrillic-encodings/

    "This charset wasn't supported by Ukrainian Internet community due to
    political reasons; KOI8-U was invented as opposition to KOI8-RU."

    Provided that resource is correct, it also appears that its inventor,
    Yuri Demchenko, has now switched to KOI8-U as well:
    http://staff.science.uva.nl/~demch/

    So I guess we can close this request and leave the codec attached to
    the ticket for interested parties to download and install if they need it.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022