Support for z/OS and EBCDIC. #45639

Closed
lealanko mannequin opened this issue Oct 18, 2007 · 12 comments
Labels
build The build process and cross-build extension-modules C modules in the Modules dir interpreter-core (Objects, Python, Grammar, and Parser dirs) stdlib Python modules in the Lib dir topic-unicode type-feature A feature request or enhancement

Comments

@lealanko
Mannequin

lealanko mannequin commented Oct 18, 2007

BPO 1298
Nosy @gvanrossum, @loewis
Files
  • python-20071018-zos.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2007-10-24.14:32:49.476>
    created_at = <Date 2007-10-18.17:14:11.196>
    labels = ['interpreter-core', 'build', 'extension-modules', 'type-feature', 'library', 'expert-unicode']
    title = 'Support for z/OS and EBCDIC.'
    updated_at = <Date 2007-10-24.14:32:49.466>
    user = 'https://bugs.python.org/lealanko'

    bugs.python.org fields:

    activity = <Date 2007-10-24.14:32:49.466>
    actor = 'gvanrossum'
    assignee = 'none'
    closed = True
    closed_date = <Date 2007-10-24.14:32:49.476>
    closer = 'gvanrossum'
    components = ['Build', 'Distutils', 'Extension Modules', 'Interpreter Core', 'Library (Lib)', 'Unicode']
    creation = <Date 2007-10-18.17:14:11.196>
    creator = 'lealanko'
    dependencies = []
    files = ['8564']
    hgrepos = []
    issue_num = 1298
    keywords = []
    message_count = 12.0
    messages = ['56532', '56535', '56548', '56549', '56553', '56577', '56647', '56667', '56676', '56683', '56704', '56708']
    nosy_count = 4.0
    nosy_names = ['gvanrossum', 'loewis', 'lealanko', 'JYMEN']
    pr_nums = []
    priority = 'normal'
    resolution = 'rejected'
    stage = None
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue1298'
    versions = ['Python 2.6']

    @lealanko
    Mannequin Author

    lealanko mannequin commented Oct 18, 2007

    The attached patch, based on Jean-Yves Mengant's work, is against svn
    head, and adds support for z/OS in particular, and non-ASCII platforms
    in general. Further details are in a separate mail to python-dev, which
    I will send shortly.

    @lealanko added the build, stdlib, extension-modules, interpreter-core, topic-unicode, and type-feature labels on Oct 18, 2007
    @gvanrossum
    Member

    How important is z/OS? I'm very skeptical of the viability of any OS
    that uses an encoding that is not a superset of ASCII.

    @lealanko
    Mannequin Author

    lealanko mannequin commented Oct 19, 2007

    The character set of EBCDIC is a superset of the character set of
    ASCII. In fact CP1047, the variant used on z/OS, has the same
    character set as Latin-1. Only the encoding is completely
    different.
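A minimal sketch of the character-set versus encoding distinction drawn here. It assumes Python 3 and uses the bundled `cp037` codec as a stand-in EBCDIC code page; the thread itself concerns CP1047, which is not assumed to be available as a standard-library codec. The variable names are illustrative only.

```python
# Assumption: cp037 stands in for CP1047, the z/OS variant discussed in the thread.
EBCDIC = "cp037"

# The same abstract character gets a different byte value in each encoding.
ascii_bytes = "A".encode("ascii")
ebcdic_bytes = "A".encode(EBCDIC)
print(ascii_bytes.hex(), ebcdic_bytes.hex())

# Decoding every possible byte value yields the Latin-1 repertoire
# (U+0000 to U+00FF), only in a different order.
repertoire = set(bytes(range(256)).decode(EBCDIC))
same_charset = repertoire == {chr(cp) for cp in range(256)}
print(same_charset)
```

The repertoire matches, so any ASCII or Latin-1 character can be represented; what changes is which byte stands for which character.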

    As a non-ASCII platform, z/OS is certainly challenging for people
    used to modern conventions, and that is exactly why a familiar
    and easy-to-use tool like Python is so valuable there. As for
    viability, there are some obvious difficulties with Python's
    handling of source encodings, but as long as you restrict
    yourself to the ASCII _character set_ in your source code, the
    vast majority of things seem to work fine with my patch.

    There are more details in my mail to python-dev, which doesn't
    seem to have appeared yet. I'm not a subscriber, so it's probably
    pending moderation somewhere. (I hope "The list address accepts
    e-mail from non-members" is still correct information.)

    @lealanko
    Mannequin Author

    lealanko mannequin commented Oct 19, 2007

    How do you measure importance? Z/OS is not important to many
    people in the world, but to those to whom it is important, it is
    _very_ important, in a very tangible way. It was certainly
    important enough for someone to port Python to it. :)

    @gvanrossum
    Member

    > How do you measure importance? Z/OS is not important to many
    > people in the world, but to those to whom it is important, it is
    > _very_ important, in a very tangible way. It was certainly
    > important enough for someone to port Python to it. :)

    But is it important enough to cause a lot of work for the maintainers
    of Python, not just once (reviewing your mega-patch) but also in the
    future (making sure that the Z/OS support doesn't break)? We have
    accepted mega-patches for minority OS'es in the past, and our
    experience has unfortunately been that the contributors of such
    patches inevitably lose interest and the Python core developers are
    stuck with maintaining the patch -- or ripping it out, which is just
    as much work but at least promises that there will be no more work
    related to this issue in the future.

    I strongly recommend an alternative: the Z/OS community should
    maintain the patch set themselves. That way the burden of keeping it
    working falls on those who benefit. It also makes it possible to decide
    not to upgrade to a newer version of Python because there aren't
    enough benefits. This is done for example by Nokia for its port to
    S60.

    > The character set of EBCDIC is a superset of the character set of
    > ASCII. In fact CP1047, the variant used on z/OS, has the same
    > character set as Latin-1. Only the encoding is completely
    > different.

    And there's the crux -- too much code (not just in the core but also
    in the library and in 3rd party code) assumes that the ASCII
    *encoding* is used in 8-bit strings. Breaking this will break tons of
    stuff. Glancing at your code it seems that you haven't tried the
    socket module or the higher-level internet modules to contact web
    servers on the internet...
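To illustrate the concern about networking code, here is a small sketch assuming Python 3, with `cp037` again standing in for the platform's native EBCDIC encoding. The helper `looks_like_http_get` is hypothetical and not taken from any library; it only mimics the kind of check a server performs on incoming bytes.

```python
# Assumption: cp037 stands in for the native EBCDIC encoding of the platform.
EBCDIC = "cp037"

request_line = "GET / HTTP/1.0\r\n\r\n"

# What an internet peer expects on the wire: ASCII byte values.
wire_bytes = request_line.encode("ascii")

# What the same text looks like when stored in the native encoding.
native_bytes = request_line.encode(EBCDIC)


def looks_like_http_get(data: bytes) -> bool:
    """Hypothetical server-side check that compares against ASCII byte values."""
    return data.startswith(b"GET ")


print(looks_like_http_get(wire_bytes))
print(looks_like_http_get(native_bytes))
```

Code that sends a natively encoded 8-bit string straight to a socket would therefore emit bytes a remote server cannot parse, unless an explicit conversion step is added.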

    @gvanrossum
    Member

    FYI, I checked the moderation queue for python-dev and didn't find your
    message. You might want to resend.

    @lealanko
    Mannequin Author

    lealanko mannequin commented Oct 22, 2007

    Further comments on the port can be found at:
    http://mail.python.org/pipermail/python-dev/2007-October/074991.html

    @loewis
    Mannequin

    loewis mannequin commented Oct 23, 2007

    I'm marking the patch as rejected, but leave it open. It seems clear
    that it cannot be incorporated into Python because of the maintenance
    issues (the only reasonable way to incorporate it would be if a
    long-time Python contributor steps forward and offers to maintain it,
    which seems unlikely).

    I'm leaving it open for the moment so people can easily find it. I
    encourage you to find some new home for the patch, e.g. by submitting it
    to PyPI (or to some System z community page if there is one); at that
    point, this issue should be closed.

    If the patch is still around five years from now, and still maintained,
    I might be interested in stepping forward to support it (assuming I am
    still a Python contributor at this point).

    @jymen
    Mannequin

    jymen mannequin commented Oct 23, 2007

    Let me contribute to this discussion of the z/OS port.
    I originally made the Python 2.2 and 2.4 ports for the z/OS platform, and
    asked the Python community to link to my pages for z/OS support at that time.

    Lauri got in touch with me a couple of weeks ago, asking whether I was
    planning to port 2.5. Since I was waiting for 2.6 before starting a new
    port, he went ahead and made the 2.5 port himself.

    As for how important z/OS is, let me argue the case: even though z/OS is
    a proprietary IBM OS that has been around for decades, it will be there
    for a long time, since it occupies a very specific niche in the OS market.
    And since IBM has heavily hampered the migration path to Unix in order to
    keep its revenue, migrating those systems to plain vanilla Unixes is a
    nightmare. As a result, every big US or European company today has a z/OS
    system somewhere.
    Moreover, even though z/OS is proprietary and EBCDIC-based, it has a
    reasonable POSIX.5 compliant subsystem and a decent C/C++ compiler,
    which makes porting Python not too complex.

    From a scripting standpoint, three scripting languages are available
    there today:

    • REXX (Mike Cowlishaw's scripting language), Perl, and Python

    So keeping an up-to-date version of Python on this platform makes sense,
    not least to increase usage of the Python language.

    I am still happy to continue supporting the z/OS port, and I perfectly
    understand that fully integrating the z/OS idiosyncrasies into the Python
    main branch creates maintainability problems. But some of the changes
    included in Lauri's patch are not z/OS-specific and simply increase the
    portability of the Python kernel to EBCDIC platforms (z/OS and OS/400).

    So my opinion is that the problem can be split into two parts:

    1. General improvement patches that benefit the Python kernel, which can
    be incorporated into it and which may not be too complicated to maintain
    on the main branch.

    2. z/OS idiosyncrasies (mainly making the autoconf/automake and build
    scripts compliant with z/OS); this can be handled separately by z/OS
    Python specialists who have access to a z/OS mainframe for testing.

    I am happy to continue making part 2 available on the z/OS Python port
    pages, with the help of other contributors like Lauri, and to give them
    credit on the z/OS port page. So I propose to integrate Lauri's patch
    into the current 2.5.1 and provide a modified z/OS-compliant source tar
    containing the modified autoconf/automake and dynamic loading code.

    Finally, I should emphasize two complementary arguments:

    • The z/OS port has been used in industrial products (including at the
      company I work for today) and helps promote the Python language on
      important non-Unix platforms, showing the extreme portability of the
      language.
    • Even the IBM labs in Boulder (Colorado) got in touch with me in order
      to integrate the port into one of their projects.

    @loewis
    Mannequin

    loewis mannequin commented Oct 23, 2007

    Jean-Yves, please understand that no amount of discussion can likely
    change Guido's or my view on this patch. We both fully understand the
    relevance of OS/390, and *still* reject it, for the reasons discussed.

    Besides, integration into 2.5.1 is not possible, as it would violate our
    maintenance policy of not integrating new features into bug fix (2.x.y)
    releases. Integrating it into 2.6 might be technically possible, but
    could be a waste of time since 2.x will shortly (i.e. within a few
    years) reach the end of its life. I doubt that the patch as it stands
    will work correctly on 3.x (as *that* stands).

    As you seem to be proposing that supporting EBCDIC will be "easy", just
    try to port the patch to 3.x to see how this assumption is wrong. In
    Python 3.x, Python source code *cannot* be interpreted as EBCDIC,
    without an encoding declaration, since the language specification says
    that the source code is UTF-8; there is no room for platform-specific
    derivations from that default. Also consider Guido's discussion of the
    networking code; unless you can report that httplib and ftplib work
    correctly, I doubt that the port is really complete.
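The source-encoding point can be sketched as follows, assuming Python 3 and `cp037` as a stand-in EBCDIC code page. The function `decode_as_default_source` is a hypothetical helper that only mimics the "UTF-8 unless declared otherwise" default; it is not the interpreter's actual source-loading code.

```python
# Assumption: cp037 stands in for the EBCDIC encoding of source files on z/OS.
EBCDIC = "cp037"

source_text = "x = 1\n"
utf8_source = source_text.encode("utf-8")
ebcdic_source = source_text.encode(EBCDIC)


def decode_as_default_source(raw: bytes):
    """Decode raw bytes as UTF-8, returning None when they are not valid UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


print(decode_as_default_source(utf8_source))
print(decode_as_default_source(ebcdic_source))
```

Even this pure-ASCII-character program fails to decode when stored as EBCDIC, because the byte values for letters and digits fall outside the ASCII range.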

    So I think the only choice is to maintain this port outside of the
    Python source tree, for a few more years. If you plan to contribute it
    again to the Python core some day, please keep track of all the
    individual contributors, as we will then require copyright agreements
    from everyone.

    @lealanko
    Mannequin Author

    lealanko mannequin commented Oct 24, 2007

    The port is certainly not yet "complete" in any sense. I have only fixed
    the most obvious places where explicit conversion between ASCII/Unicode
    values and platform-specific characters is required. There are a number
    of remaining issues, some of which cannot be fixed without major
    overhauls. The point of this first release is just to allow other
    interested people to chime in, to test the patch, and to suggest what
    should be done with it. The latter has certainly happened. :)

    I have no great interest in whether the patch ever gets incorporated
    into the main Python distribution. I do think, though, that it's a good
    idea to make the relationship between characters and Unicode values more
    explicit in the code in any case, and my patch shouldn't affect the
    behavior on any other platforms.

    Guido's comment about networking code is quite accurate, but the problem
    is social, not technical: there is already networking code that assumes
    that 8-bit string literals represent ASCII strings, and there is already
    text-processing code that assumes that 8-bit string literals represent
    "text" as found in ordinary text files on the platform. There is no
    reliable way to make both kinds of code work on a platform whose native
    encoding is not ASCII-compatible. In this sense, it is indeed impossible
    to port Python 2.x to an EBCDIC platform "completely", so that all
    existing code would continue to do "the right thing" without modifications.

    However, Py3k presents a fresh start, and one where this particular
    problem is gone, since string literals are no longer associated with a
    particular encoding, and bytes literals explicitly represent the ASCII
    values of the characters in the literal expression. Then text-processing
    code will likely use string literals, and it is easy to make the default
    encoding platform-specific when transferring data between local text
    files and string objects. As far as I can see, EBCDIC shouldn't pose any
    special problems then.
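A short sketch of the Py3k model described in this paragraph, assuming Python 3 and `cp037` as a stand-in for the platform encoding. The function `read_native_text` is a hypothetical helper, and an in-memory buffer is used in place of a real local text file.

```python
import io

# Assumption: cp037 stands in for the platform's native EBCDIC encoding.
EBCDIC = "cp037"

# A bytes literal holds the ASCII value of each character,
# whatever the platform's native encoding happens to be.
first_byte = b"A"[0]


def read_native_text(raw: bytes, encoding: str = EBCDIC) -> str:
    """Decode the contents of a 'local text file' into a str object."""
    with io.TextIOWrapper(io.BytesIO(raw), encoding=encoding) as reader:
        return reader.read()


# Simulate a text file written in the native encoding.
native_file_contents = "hello\n".encode(EBCDIC)
text = read_native_text(native_file_contents)
print(first_byte, repr(text))
```

Text-processing code works on `str` objects and never sees the native byte values; only the boundary where files are opened needs to know the encoding.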

    From what I read in PEP-3120 and the Py3k docs, there seems to be some
    confusion regarding source encoding issues.

    Firstly, Python source code is fundamentally _text_. For instance, a
    string literal is delimited by single quote or double quote characters.
    Characters themselves are abstract entities that have no inherent
    numeric values, although we can name them with e.g. Unicode code points,
    so we can say that the string delimiters are characters represented by
    the code points U+0022 and U+0027.

    What PEP-3120 specifies is a mechanism for mapping octet sequences into
    these abstract characters. If this is made part of the language
    specification, it presumably means that a conformant Py3k source file
    must start as UTF-8 at least until an encoding declaration is
    encountered. Further, a conformant Py3k implementation must accept such
    UTF-8 source files and decode them as specified in the PEP.

    So far so good. However, there is nothing to prevent an implementation
    from providing (as an extension) a facility to allow _other_ kinds of
    source as well. "There is no room for platform-specific derivations" is
    an arbitrary restriction: there are certainly quite a number of ways to
    support both UTF-8 and CP1047 source on z/OS: for instance, the
    filesystem allows storing the encoding of a text file as metadata.

    Moreover, there is a semantics-preserving mapping from UTF-8 source
    files to CP1047 source files: since non-ASCII characters can only appear
    in comments and string literals, and comments have no semantics, it
    suffices to \u-escape the exotic characters in string literals. Hence
    all Python source can be represented as native text on an EBCDIC
    platform.
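The mapping described in this paragraph could look roughly like the sketch below. It assumes Python 3, uses `cp037` as a stand-in for CP1047, and the function name `to_ebcdic_source` is hypothetical. For simplicity it escapes every non-ASCII character, including those in comments, and it does not handle raw string literals, where inserting an escape would change the meaning.

```python
# Assumption: cp037 stands in for CP1047 as the target EBCDIC code page.
EBCDIC = "cp037"


def to_ebcdic_source(utf8_source: bytes, codec: str = EBCDIC) -> bytes:
    """Re-encode UTF-8 source as EBCDIC, backslash-escaping non-ASCII characters."""
    text = utf8_source.decode("utf-8")
    # backslashreplace emits \xNN, \uNNNN or \UNNNNNNNN, all of which are
    # valid escapes inside an ordinary (non-raw) string literal.
    escaped = text.encode("ascii", "backslashreplace").decode("ascii")
    return escaped.encode(codec)


original = 'greeting = "caf\u00e9"\n'.encode("utf-8")
converted = to_ebcdic_source(original)

# Running the converted source (after decoding it from EBCDIC) gives the same value.
namespace = {}
exec(converted.decode(EBCDIC), namespace)
print(namespace["greeting"] == "caf\u00e9")
```

After conversion the file contains only characters from the ASCII repertoire, so it can be stored as ordinary native text on the EBCDIC side.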

    Of course you can declare that support for such extensions would be
    heretical and no EBCDIC source file would be True Python Source and no
    EBCDIC implementation would be a True Python Implementation, but I don't
    really care. Python 3000 _can_ be ported to z/OS much better than 2.x,
    and it probably will, even if you don't like it. Oh the wonders of open
    source. :)

    @gvanrossum
    Member

    I have no desire or time to continue this discussion. The ASCII
    assumption will be ingrained as deeply or deeper in 3.0 than in 2.x,
    just like 8-bit bytes and 2's complement. The computer industry has
    chosen, and there just isn't any incentive to invent abstractions for
    properties that are constant in 99.999999% of all practical situations.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022