Support for z/OS and EBCDIC. #45639

lealanko · 2007-10-18T17:14:11Z

BPO	1298
Nosy	@gvanrossum, @loewis
Files	python-20071018-zos.patch

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = <Date 2007-10-24.14:32:49.476>
created_at = <Date 2007-10-18.17:14:11.196>
labels = ['interpreter-core', 'build', 'extension-modules', 'type-feature', 'library', 'expert-unicode']
title = 'Support for z/OS and EBCDIC.'
updated_at = <Date 2007-10-24.14:32:49.466>
user = 'https://bugs.python.org/lealanko'

bugs.python.org fields:

activity = <Date 2007-10-24.14:32:49.466>
actor = 'gvanrossum'
assignee = 'none'
closed = True
closed_date = <Date 2007-10-24.14:32:49.476>
closer = 'gvanrossum'
components = ['Build', 'Distutils', 'Extension Modules', 'Interpreter Core', 'Library (Lib)', 'Unicode']
creation = <Date 2007-10-18.17:14:11.196>
creator = 'lealanko'
dependencies = []
files = ['8564']
hgrepos = []
issue_num = 1298
keywords = []
message_count = 12.0
messages = ['56532', '56535', '56548', '56549', '56553', '56577', '56647', '56667', '56676', '56683', '56704', '56708']
nosy_count = 4.0
nosy_names = ['gvanrossum', 'loewis', 'lealanko', 'JYMEN']
pr_nums = []
priority = 'normal'
resolution = 'rejected'
stage = None
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue1298'
versions = ['Python 2.6']

lealanko · 2007-10-18T17:14:09Z

The attached patch, based on Jean-Yves Mengant's work, is against svn
head, and adds support for z/OS in particular, and non-ASCII platforms
in general. Further details are in a separate mail to python-dev, which
I will send shortly.

gvanrossum · 2007-10-18T17:41:54Z

How important is z/OS? I'm very skeptical of the viability of any OS
that uses an encoding that is not a superset of ASCII.

lealanko · 2007-10-19T07:10:49Z

The character set of EBCDIC is a superset of the character set of
ASCII. In fact CP1047, the variant used on z/OS, has the same
character set as Latin-1. Only the encoding is completely
different.
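
A minimal sketch of the distinction between character set and encoding. It assumes Python 3 and uses the bundled `cp037` codec as a stand-in for CP1047, which is not assumed to be available as a codec here:

```python
import string

# Assumption: cp037 stands in for CP1047 (the z/OS variant), which this
# sketch does not assume is available as a codec.
EBCDIC = "cp037"

all_bytes = bytes(range(256))

# Same character set: both codecs map the 256 byte values onto the same
# 256 characters...
ebcdic_chars = set(all_bytes.decode(EBCDIC))
latin1_chars = set(all_bytes.decode("latin-1"))
same_character_set = ebcdic_chars == latin1_chars

# ...but the byte value assigned to each character is different.
alnum = string.ascii_letters + string.digits
differing = [c for c in alnum if c.encode(EBCDIC) != c.encode("ascii")]

print("same character set:", same_character_set)
print("'A' in ASCII:", "A".encode("ascii").hex(),
      "| 'A' in EBCDIC:", "A".encode(EBCDIC).hex())
print(len(differing), "of", len(alnum), "letters and digits encode differently")
```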

As a non-ASCII platform, z/OS is certainly challenging for people
used to modern conventions, and that is exactly why a familiar
and easy-to-use tool like Python is so valuable there. As for
viability, there are some obvious difficulties with Python's
handling of source encodings, but as long as you restrict
yourself to the ASCII _character set_ in your source code, the
vast majority of things seem to work fine with my patch.

There are more details in my mail to python-dev, which doesn't
seem to have appeared yet. I'm not a subscriber, so it's probably
pending moderation somewhere. (I hope "The list address accepts
e-mail from non-members" is still correct information.)

lealanko · 2007-10-19T07:12:24Z

How do you measure importance? Z/OS is not important to many
people in the world, but to those to whom it is important, it is
_very_ important, in a very tangible way. It was certainly
important enough for someone to port Python to it. :)

gvanrossum · 2007-10-19T14:02:24Z

> How do you measure importance? Z/OS is not important to many
> people in the world, but to those to whom it is important, it is
> _very_ important, in a very tangible way. It was certainly
> important enough for someone to port Python to it. :)

But is it important enough to cause a lot of work for the maintainers
of Python, not just once (reviewing your mega-patch) but also in the
future (making sure that the Z/OS support doesn't break)? We have
accepted mega-patches for minority OS'es in the past, and our
experience has unfortunately been that the contributors of such
patches inevitably lose interest and the Python core developers are
stuck with maintaining the patch -- or ripping it out, which is just
as much work but at least promises that there will be no more work
related to this issue in the future.

I strongly recommend an alternative: the Z/OS community should
maintain the patch set themselves. That way the burden of keeping it
working falls on those who benefit. It also makes it possible to decide
not to upgrade to a newer version of Python because there aren't
enough benefits. This is done for example by Nokia for its port to
S60.

> The character set of EBCDIC is a superset of the character set of
> ASCII. In fact CP1047, the variant used on z/OS, has the same
> character set as Latin-1. Only the encoding is completely
> different.

And there's the crux -- too much code (not just in the core but also
in the library and in 3rd party code) assumes that the ASCII
*encoding* is used in 8-bit strings. Breaking this will break tons of
stuff. Glancing at your code it seems that you haven't tried the
socket module or the higher-level internet modules to contact web
servers on the internet...
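
A rough sketch of the failure mode described here, assuming Python 3 and the `cp037` codec as a stand-in for the platform's native encoding. The helper names are hypothetical and are not taken from the patch or from the standard library:

```python
# Hypothetical names throughout; cp037 stands in for the platform's
# native EBCDIC code page.
PLATFORM_ENCODING = "cp037"


def naive_request(path, encoding):
    """Build an HTTP request the way code that treats 8-bit strings as
    plain native text would: in whatever encoding the platform uses."""
    return ("GET %s HTTP/1.0\r\n\r\n" % path).encode(encoding)


def looks_like_http_get(wire_bytes):
    """What a server on the internet effectively checks: ASCII bytes."""
    return wire_bytes.startswith(b"GET ")


ascii_request = naive_request("/", "ascii")
ebcdic_request = naive_request("/", PLATFORM_ENCODING)

print(looks_like_http_get(ascii_request))
print(looks_like_http_get(ebcdic_request))
```

Both requests contain the same characters, but only the ASCII-encoded one is recognizable on the wire, so protocol code would need an explicit conversion at every such boundary.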

gvanrossum · 2007-10-19T23:51:22Z

FYI, I checked the moderation queue for python-dev and didn't find your
message. You might want to resend.

lealanko · 2007-10-22T13:32:24Z

Further comments on the port can be found at:
http://mail.python.org/pipermail/python-dev/2007-October/074991.html

loewis · 2007-10-23T05:32:09Z

I'm marking the patch as rejected but leaving it open. It seems clear
that it cannot be incorporated into Python because of the maintenance
issues (the only reasonable way to incorporate it would be if a
long-time Python contributor steps forward and offers to maintain it,
which seems unlikely).

I'm leaving it open for the moment so people can easily find it. I
encourage you to find some new home for the patch, e.g. by submitting it
to PyPI (or to some System z community page if there is one); at that
point, this issue should be closed.

If the patch is still around five years from now, and still maintained,
I might be interested in stepping forward to support it (assuming I am
still a Python contributor at this point).

jymen · 2007-10-23T09:46:54Z

Let me contribute to this discussion of the z/OS port.
I originally made the Python 2.2 and 2.4 ports for the z/OS platform,
and at that time I asked the Python community to link to my pages as
support for z/OS.

Lauri got in touch with me a couple of weeks ago, asking whether I was
planning to port 2.5. Since I was waiting for 2.6 before starting a new
port, he went ahead and made the 2.5 port himself.

As for how important z/OS is, let me argue the point. Even though z/OS
is an IBM proprietary OS that has been around for decades, it will stay
around for a long time, since it occupies a very specific niche in the
OS market. And since IBM has made the migration path to Unix very
difficult in order to keep its revenue, migrating those systems to
plain vanilla Unix is a nightmare. Today every big US or European
company has a z/OS system somewhere.
Moreover, even though z/OS is proprietary and EBCDIC-based, it has a
reasonable POSIX.5 compliant subsystem and a decent C/C++ compiler,
which makes porting Python not too complex.

From a scripting standpoint, three scripting languages are available
today: REXX (Mike Cowlishaw's scripting language), Perl, and Python.

So keeping a current version of Python on this platform also makes
sense as a way to increase usage of the Python language.

I am still happy to continue supporting the z/OS port, and I perfectly
understand that fully integrating the z/OS idiosyncrasies into the
Python main branch creates maintainability problems. But some of the
fixes included in Lauri's patch are not z/OS-specific and simply
increase the portability of the Python kernel to EBCDIC platforms
(z/OS and OS/400).

So in my opinion the problem can be split into two parts:

1 General improvement patches to the Python kernel, which can be
incorporated into the main branch and which may not be too complicated
to maintain there.

2 z/OS idiosyncrasies (mainly making the autoconf/automake and build
scripts compliant with z/OS); this can be handled separately by z/OS
Python specialists who have access to a z/OS mainframe for testing.

I am happy to keep making part 2 available on the z/OS Python port
pages, with the help of other contributors like Lauri, and to give them
credit on the port page. So I propose to integrate Lauri's patch into
the current 2.5.1 and provide a modified z/OS-compliant source tarball
containing the modified autoconf/automake and dynamic loading code.

Finally, I should emphasize two complementary arguments:

The z/OS port has been used in industrial products (including at the
company I work for today) and helps promote the Python language on
important non-Unix platforms, showing the extreme portability of the
language.
Even the IBM labs in Boulder (Colorado) got in touch with me about
integrating the port into one of their projects.

loewis · 2007-10-23T17:14:05Z

Jean-Yves, please understand that no amount of discussion can likely
change Guido's or my view on this patch. We both fully understand the
relevance of OS/390, and *still* reject it, for the reasons discussed.

Besides, integration into 2.5.1 is not possible, as it would violate our
maintenance policy of not integrating new features into bug fix (2.x.y)
releases. Integrating it into 2.6 might be technically possible, but
could be a waste of time since 2.x will shortly (i.e. within a few
years) reach the end of its life. I doubt that the patch as it stands
will work correctly on 3.x (as *that* stands).

As you seem to be proposing that supporting EBCDIC will be "easy", just
try to port the patch to 3.x to see how this assumption is wrong. In
Python 3.x, Python source code *cannot* be interpreted as EBCDIC,
without an encoding declaration, since the language specification says
that the source code is UTF-8; there is no room for platform-specific
derivations from that default. Also consider Guido's discussion of the
networking code; unless you can report that httplib and ftplib work
correctly, I doubt that the port is really complete.
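
A small sketch of the source-encoding point, assuming a Python 3 interpreter and using `cp037` as a stand-in EBCDIC code page. The `try_compile` helper is hypothetical; it passes raw bytes to the built-in `compile()`, which decodes them as UTF-8 unless an ASCII-readable encoding declaration says otherwise:

```python
def try_compile(source_bytes):
    """Return a code object, or None if the bytes are not valid source."""
    try:
        return compile(source_bytes, "<example>", "exec")
    except (SyntaxError, ValueError):
        return None


source_text = "x = 1\n"

# The same program text, stored as UTF-8 and as EBCDIC (cp037 assumed).
utf8_code = try_compile(source_text.encode("utf-8"))
ebcdic_code = try_compile(source_text.encode("cp037"))

# An encoding declaration changes how the rest of the file is decoded,
# but the declaration itself has to be readable as ASCII.
declared = b"# -*- coding: latin-1 -*-\ns = '\xe9'\n"
declared_code = try_compile(declared)

namespace = {}
exec(declared_code, namespace)

print("UTF-8 source compiles:", utf8_code is not None)
print("EBCDIC source compiles:", ebcdic_code is not None)
print("declared Latin-1 literal:", ascii(namespace["s"]))
```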

So I think the only choice is to maintain this port outside of the
Python source tree, for a few more years. If you plan to contribute it
again to the Python core some day, please keep track of all the
individual contributors, as we will then require copyright agreements
from everyone.

lealanko · 2007-10-24T10:14:49Z

The port is certainly not yet "complete" in any sense. I have only fixed
the most obvious places where explicit conversion between ASCII/Unicode
values and platform-specific characters is required. There are a number
of remaining issues, some of which cannot be fixed without major
overhauls. The point of this first release is just to allow other
interested people to chime in, to test the patch, and to suggest what
should be done with it. The latter has certainly happened. :)

I have no great interest in whether the patch ever gets incorporated
into the main Python distribution. I do think, though, that it's a good
idea to make the relationship between characters and Unicode values more
explicit in the code in any case, and my patch shouldn't affect the
behavior on any other platforms.

Guido's comment about networking code is quite accurate, but the problem
is social, not technical: there is already networking code that assumes
that 8-bit string literals represent ASCII strings, and there is already
text-processing code that assumes that 8-bit string literals represent
"text" as found in ordinary text files on the platform. There is no
reliable way to make both kinds of code work on a platform whose native
encoding is not ASCII-compatible. In this sense, it is indeed impossible
to port Python 2.x to an EBCDIC platform "completely", so that all
existing code would continue to do "the right thing" without modifications.

However, Py3k presents a fresh start, and one where this particular
problem is gone, since string literals are no longer associated with a
particular encoding, and bytes literals explicitly represent the ASCII
values of the characters in the literal expression. Then text-processing
code will likely use string literals, and it is easy to make the default
encoding platform-specific when transferring data between local text
files and string objects. As far as I can see, EBCDIC shouldn't pose any
special problems then.
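
A minimal sketch of that separation, assuming Python 3 semantics and `cp037` as a stand-in for the local EBCDIC encoding. An in-memory buffer plays the role of a local text file:

```python
import io

LOCAL_ENCODING = "cp037"  # assumption: stand-in for the platform encoding

# A bytes literal always holds the ASCII values of its characters,
# whatever the platform's native encoding is.
wire = b"GET"

# A str literal holds abstract characters with no encoding attached.
text = "GET"

# The platform-specific encoding is applied only at the I/O boundary,
# here an in-memory stand-in for a local text file.
local_file = io.BytesIO(text.encode(LOCAL_ENCODING))
read_back = io.TextIOWrapper(local_file, encoding=LOCAL_ENCODING).read()

print(list(wire))
print(text.encode(LOCAL_ENCODING).hex())
print(read_back == text)
```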

From what I read in PEP-3120 and the Py3k docs, there seems to be some
confusion regarding source encoding issues.

Firstly, Python source code is fundamentally _text_. For instance, a
string literal is delimited by single quote or double quote characters.
Characters themselves are abstract entities that have no inherent
numeric values, although we can name them with e.g. Unicode code points,
so we can say that the string delimiters are characters represented by
the code points U+0022 and U+0027.

What PEP-3120 specifies is a mechanism for mapping octet sequences into
these abstract characters. If this is made part of the language
specification, it presumably means that a conformant Py3k source file
must start as UTF-8 at least until an encoding declaration is
encountered. Further, a conformant Py3k implementation must accept such
UTF-8 source files and decode them as specified in the PEP.

So far so good. However, there is nothing to prevent an implementation
from providing (as an extension) a facility to allow _other_ kinds of
source as well. "There is no room for platform-specific derivations" is
an arbitrary restriction: there are certainly quite a number of ways to
support both UTF-8 and CP1047 source on z/OS: for instance, the
filesystem allows storing the encoding of a text file as metadata.

Moreover, there is a semantics-preserving mapping from UTF-8 source
files to CP1047 source files: since non-ASCII characters can only appear
in comments and string literals, and comments have no semantics, it
suffices to \u-escape the exotic characters in string literals. Hence
all Python source can be represented as native text on an EBCDIC
platform.
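
A rough sketch of the escaping idea for a single string literal, assuming Python 3 and `cp037` as a stand-in for CP1047. The `to_native_literal` helper is hypothetical, and the sketch escapes every non-ASCII character, which is more conservative than strictly needed. It does not handle whole source files:

```python
import ast

EBCDIC = "cp037"  # assumption: stand-in for CP1047


def to_native_literal(value):
    """Render a string as an escaped literal and store it as EBCDIC bytes.

    ascii() escapes every non-ASCII character with \\x, \\u or \\U, so the
    resulting literal contains only characters that EBCDIC can represent.
    """
    literal = ascii(value)
    return literal.encode(EBCDIC)


original = "na\u00efve \u20ac5"  # contains a character outside Latin-1
native_bytes = to_native_literal(original)

# Reading the native text back and evaluating the literal recovers the
# original string, so the escaping preserves the literal's meaning.
recovered = ast.literal_eval(native_bytes.decode(EBCDIC))

print(native_bytes.decode(EBCDIC))
print(recovered == original)
```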

Of course you can declare that support for such extensions would be
heretical and no EBCDIC source file would be True Python Source and no
EBCDIC implementation would be a True Python Implementation, but I don't
really care. Python 3000 _can_ be ported to z/OS much better than 2.x,
and it probably will, even if you don't like it. Oh the wonders of open
source. :)

gvanrossum · 2007-10-24T14:32:49Z

I have no desire or time to continue this discussion. The ASCII
assumption will be ingrained as deeply or deeper in 3.0 than in 2.x,
just like 8-bit bytes and 2's complement. The computer industry has
chosen, and there just isn't any incentive to invent abstractions for
properties that are constant in 99.999999% of all practical situations.

lealanko mannequin added the build, stdlib, extension-modules, interpreter-core, topic-unicode, and type-feature labels Oct 18, 2007

gvanrossum closed this as completed Oct 24, 2007

ezio-melotti transferred this issue from another repository Apr 10, 2022
