Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

git-p4: python 3 compatibility #673

Open
wants to merge 15 commits into
base: master
Choose a base branch
from

Conversation

yangskyboxlabs
Copy link
Contributor

@yangskyboxlabs yangskyboxlabs commented Nov 28, 2019

This patchset attempts to bring python 3 compatibility to git-p4.

I'm using this PR mostly to access the test pipeline, as I do not have p4d access locally. Please pardon the noise.

Discussion regarding patchset is on the maillinglist under [RFC PATCH 0/4] git-p4: python 3 compatability.

@gitgitgadget-git
Copy link

Welcome to GitGitGadget

Hi @yangskyboxlabs, and welcome to GitGitGadget, the GitHub App to send patch series to the Git mailing list from GitHub Pull Requests.

Please make sure that this Pull Request has a good description, as it will be used as cover letter.

Also, it is a good idea to review the commit messages one last time, as the Git project expects them in a quite specific form:

  • the lines should not exceed 76 columns,
  • the first line should be like a header and typically start with a prefix like "tests:" or "commit:", and
  • the commit messages' body should be describing the "why?" of the change.
  • Finally, the commit messages should end in a Signed-off-by: line matching the commits' author.

It is in general a good idea to await the automated test ("Checks") in this Pull Request before contributing the patches, e.g. to avoid trivial issues such as unportable code.

Contributing the patches

Before you can contribute the patches, your GitHub username needs to be added to the list of permitted users. Any already-permitted user can do that, by adding a comment to your PR of the form /allow. A good way to find other contributors is to locate recent pull requests where someone has been /allowed:

Both the person who commented /allow and the PR author are able to /allow you.

An alternative is the channel #git-devel on the FreeNode IRC network:

<newcontributor> I've just created my first PR, could someone please /allow me? https://github.com/gitgitgadget/git/pull/12345
<veteran> newcontributor: it is done
<newcontributor> thanks!

Once on the list of permitted usernames, you can contribute the patches to the Git mailing list by adding a PR comment /submit.

After you submit, GitGitGadget will respond with another comment that contains the link to the cover letter mail in the Git mailing list archive. Please make sure to monitor the discussion in that thread and to address comments and suggestions.

If you want to see what email(s) would be sent for a submit request, add a PR comment /preview to have the email(s) sent to you. You must have a public GitHub email address for this.

If you do not want to subscribe to the Git mailing list just to be able to respond to a mail, you can download the mbox ("raw") file corresponding to the mail you want to reply to from the Git mailing list. If you use GMail, you can upload that raw mbox file via:

curl -g --user "<EMailAddress>:<Password>" --url "imaps://imap.gmail.com/INBOX" -T /path/to/raw.txt

Need help?

New contributors who want advice are encouraged to join
git-mentoring@googlegroups.com,
where volunteers who regularly contribute to Git are willing to answer newbie
questions, give advice, or otherwise provide mentoring to interested
contributors. You must join in order to post or view messages, but anyone can
join.

You may also be able to find help in real time in the developer IRC channel,
#git-devel on Freenode. Remember
that IRC does not support offline messaging, so if you send someone a private
message and log out, they cannot respond to you. The scrollback of #git-devel
is archived, though.

@dscho
Copy link
Member

dscho commented Nov 28, 2019

I'm using this PR mostly to access the test pipeline, as I do not have p4d access locally. Please pardon the noise.

No worries, I did suggest to use a PR for that very purpose.

dscho added a commit to dscho/gitgitgadget that referenced this pull request Nov 28, 2019
In Markdown, it is best not to have line breaks inside paragraphs
because most Markdown renderers interpret those as forced line breaks.

In ea87782 (welcome: add pointers to git-mentoring, IRC, 2019-11-15), we
introduced a "Need help?" section in GitGitGadget's welcome message that
does, however, have a lot of line breaks in its paragraphs, which makes
that part of the welcome message look a bit awkward, see e.g.
git/git#673 (comment)

Let's just replace those line breaks by spaces and let GitHub's Markdown
renderer (and/or the browser's HTML renderer) decide how to flow the
text.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
@dscho
Copy link
Member

dscho commented Nov 28, 2019

/allow

@gitgitgadget-git
Copy link

User yangskyboxlabs is now allowed to use GitGitGadget.

WARNING: yangskyboxlabs has no public email address set on GitHub

@dscho
Copy link
Member

dscho commented Nov 28, 2019

(I just told GitGitGadget that you're not a spammer, just in case that you want to use a more convenient method to contribute these patches than working git send-email manually...)

@seraphire
Copy link
Contributor

Hi @yangskyboxlabs, I have been working on the Python3 conversion as well. https://github.com/seraphire/git/tree/seraphire/p4-python3-unicode has my current code if you would find it of interest. I'm working on breaking it up into multiple patches for submission.

@yangskyboxlabs yangskyboxlabs changed the title git-p4: python 3 compatability git-p4: python 3 compatibility Dec 6, 2019
yangskyboxlabs and others added 4 commits January 24, 2020 00:59
Python2.6 and earlier have been end-of-life'd for many years now, and
we actually already use 2.7-only features in the code. Make the version
check reflect current realities.

This also removes the need to explicitly define CalledProcessError if
it's not available.

Signed-off-by: Yang Zhao <yang.zhao@skyboxlabs.com>
Python 3 handles strings differently than Python 2.7. Since Python 2
is reaching it's end of life, a series of changes are being submitted to
enable python 3.5 and following support. The current code fails basic
tests under python 3.5.

Some codepaths can represent a command line the program
internally prepares to execute either as a single string
(i.e. each token properly quoted, concatenated with $IFS) or
as a list of argv[] elements, and there are 9 places where
we say "if X is isinstance(_, basestring), then do this
thing to handle X as a command line in a single string; if
not, X is a command line in a list form".

This does not work well with Python 3, as there is no
basestring (everything is Unicode now), and even with Python
2, it was not an ideal way to tell the two cases apart,
because an internally formed command line could have been in
a single Unicode string.

Flip the check to say "if X is not a list, then handle X as
a command line in a single string; otherwise treat it as a
command line in a list form".

This will get rid of references to 'basestring', to migrate
the code ready for Python 3.

Thanks-to: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Ben Keene <seraphire@gmail.com>
Signed-off-by: Yang Zhao <yang.zhao@skyboxlabs.com>
Now that python2.7 is the minimum required version and we no longer use
the basestring type, it is not necessary to use type aliasing to ensure
python3 compatibility.

Signed-off-by: Yang Zhao <yang.zhao@skyboxlabs.com>
The marshalled dict in the response given on STDOUT by p4 uses `str` for
keys and string values. When run using python3, these values are
deserialized as `bytes`, leading to a whole host of problems as the rest
of the code assumes `str` is used throughout.

This patch changes the deserialization behaviour such that, as much as
possible, text output from p4 is decoded to native unicode strings.
Exceptions are made for the field `data` as it is usually arbitrary
binary data. `depotFile[0-9]*`, `path`, and `clientFile` are also exempt
as they contain path strings not encoded with UTF-8, and must survive
survive round-trip back to p4.

Conversely, text data being piped to p4 must always be encoded when
running under python3.

encode_text_stream() and decode_text_stream() were added to make these
transformations more convenient.

Signed-off-by: Yang Zhao <yang.zhao@skyboxlabs.com>
yangskyboxlabs and others added 10 commits January 24, 2020 01:00
Under python3, calls to write() on the stream to `git fast-import` must
be encoded.  This patch wraps the IO object such that this encoding is
done transparently.

Conversely, any text data read from subprocesses must also be decoded
before running through the rest of the pipeline.

Signed-off-by: Yang Zhao <yang.zhao@skyboxlabs.com>
Reviewed-by: Ben Keene <seraphire@gmail.com>
P4 allows essentially arbitrary encoding for path data while we would
perfer to be dealing only with unicode strings.  Since path data need to
survive round-trip back to p4, this patch implements the general policy
that we store path data as-is, but decode them to unicode before doing
any non-trivial processing.

A new `decode_path()` method is provided that generally does the correct
conversion, taking into account `git-p4.pathEncoding` configuration.

For python2.7, path strings will be left as-is if it only contains ASCII
characters.

For python3, decoding is always done so that we have str objects.

Signed-off-by: Yang Zhao <yang.zhao@skyboxlabs.com>
Reviewed-by: Ben Keene <seraphire@gmail.com>
Opening .gitp4-usercache.txt in text mode makes python 3 happy without
explicitly adding encoding and decoding.

Signed-off-by: Yang Zhao <yang.zhao@skyboxlabs.com>
Reviewed-by: Ben Keene <seraphire@gmail.com>
p4 does not appear to understand marshal format version 3 and above.
Version 2 was the latest supported by python-2.7.

Signed-off-by: Yang Zhao <yang.zhao@skyboxlabs.com>
Reviewed-by: Ben Keene <seraphire@gmail.com>
As part of its importing process, git-p4 sends a `checkpoint` followed
immediately by `progress` to fast-import to force synchronization.
Due to buffering, it is possible for the `progress` command to not be
flushed before git-p4 begins to wait for the corresponding response.
This causes the script to freeze completely, and is consistently
observable at least on python-3.6.9.

Make sure this command sequence is completely flushed before waiting.

Signed-off-by: Yang Zhao <yang.zhao@skyboxlabs.com>
Reviewed-by: Ben Keene <seraphire@gmail.com>
For python3, reduce() has been moved to functools.reduce().  This is
also available in python2.7.

Signed-off-by: Yang Zhao <yang.zhao@skyboxlabs.com>
Python3 uses dict.items() instead of .iteritems() to provide
iteratoration over dict.  Although items() is technically less efficient
for python2.7 (allocates a new list instead of simply iterating), the
amount of data involved is very small and the penalty negligible.

Signed-off-by: Yang Zhao <yang.zhao@skyboxlabs.com>
Reviewed-by: Ben Keene <seraphire@gmail.com>
It is not clear why a generator was used to create the regex used to
parse git-diff-tree output; I assume an early implementation required
it, but is not part of the mainline change.

Simply use a lazily initialized global instead.

Signed-off-by: Yang Zhao <yang.zhao@skyboxlabs.com>
Reviewed-by: Ben Keene <seraphire@gmail.com>
Python3 deprecates raw_input() from 2.7 and replaced it with input().
Since we do not need 2.7's input() semantics, `raw_input()` is aliased
to `input()` for easy forward compatability.

Signed-off-by: Yang Zhao <yang.zhao@skyboxlabs.com>
We currently have no ideal way to run git-p4.py tests using both
python2 and python3 without changing the handling of GIT_PROVE_OPTS[1].

After a short discussion, it was agreed that having *-gcc and *-clang
builds run tests using python3 and python2, respectively, is an
acceptable workaround for the moment.

[1]: https://public-inbox.org/git/20200123141626.GB6837@szeder.dev/

Signed-off-by: Yang Zhao <yang.zhao@skyboxlabs.com>
@sasquacz2017
Copy link

OK

For consistency, use the built-in percentage formatting functionality
instead of calculating them manually. We print percentages with no
decimals.

Signed-off-by: Yang Zhao <yang.zhao@skyboxlabs.com>
Copy link
Contributor

@seraphire seraphire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. Removing "basestring" removes the type name overloading and makes Python 3 the target branch.

Note this message was intended for the first commit! I'll review the rest of them now.

Copy link
Contributor

@seraphire seraphire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

2.7 has been the only supported version for some time, this reduces the complexity of the transition.

Note this message was intended for the second commit! I'll review the rest of them now.

@@ -144,7 +154,7 @@ def prompt(prompt_text):
"""
choices = set(m.group(1) for m in re.finditer(r"\[(.)\]", prompt_text))
while True:
response = raw_input(prompt_text).strip().lower()
response = input(prompt_text).strip().lower()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am having trouble with Py2 raw_input on Windows under MINGW64. I have submitted a PR that converts this to sys.stdin.readline(), but that change could be applied on top of this. I don't think this needs a change at this time, but for historical records, the PR is #698

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants