New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
git-p4: python 3 compatibility #673
base: master
Are you sure you want to change the base?
Conversation
Welcome to GitGitGadgetHi @yangskyboxlabs, and welcome to GitGitGadget, the GitHub App to send patch series to the Git mailing list from GitHub Pull Requests. Please make sure that this Pull Request has a good description, as it will be used as cover letter. Also, it is a good idea to review the commit messages one last time, as the Git project expects them in a quite specific form:
It is in general a good idea to await the automated test ("Checks") in this Pull Request before contributing the patches, e.g. to avoid trivial issues such as unportable code. Contributing the patchesBefore you can contribute the patches, your GitHub username needs to be added to the list of permitted users. Any already-permitted user can do that, by adding a comment to your PR of the form Both the person who commented An alternative is the channel
Once on the list of permitted usernames, you can contribute the patches to the Git mailing list by adding a PR comment After you submit, GitGitGadget will respond with another comment that contains the link to the cover letter mail in the Git mailing list archive. Please make sure to monitor the discussion in that thread and to address comments and suggestions. If you want to see what email(s) would be sent for a submit request, add a PR comment If you do not want to subscribe to the Git mailing list just to be able to respond to a mail, you can download the mbox ("raw") file corresponding to the mail you want to reply to from the Git mailing list. If you use GMail, you can upload that raw mbox file via: curl -g --user "<EMailAddress>:<Password>" --url "imaps://imap.gmail.com/INBOX" -T /path/to/raw.txt Need help?New contributors who want advice are encouraged to join You may also be able to find help in real time in the developer IRC channel, |
803d9a0
to
5fe95e9
Compare
No worries, I did suggest to use a PR for that very purpose. |
In Markdown, it is best not to have line breaks inside paragraphs because most Markdown renderers interpret those as forced line breaks. In ea87782 (welcome: add pointers to git-mentoring, IRC, 2019-11-15), we introduced a "Need help?" section in GitGitGadget's welcome message that does, however, have a lot of line breaks in its paragraphs, which makes that part of the welcome message look a bit awkward, see e.g. git/git#673 (comment) Let's just replace those line breaks by spaces and let GitHub's Markdown renderer (and/or the browser's HTML renderer) decide how to flow the text. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
5fe95e9
to
5c0b48d
Compare
/allow |
User yangskyboxlabs is now allowed to use GitGitGadget. WARNING: yangskyboxlabs has no public email address set on GitHub |
(I just told GitGitGadget that you're not a spammer, just in case that you want to use a more convenient method to contribute these patches than working |
f92f582
to
b921a2e
Compare
Hi @yangskyboxlabs, I have been working on the Python3 conversion as well. https://github.com/seraphire/git/tree/seraphire/p4-python3-unicode has my current code if you would find it of interest. I'm working on breaking it up into multiple patches for submission. |
0c4e5b7
to
d50427f
Compare
c3e5e36
to
fb32700
Compare
Python2.6 and earlier have been end-of-life'd for many years now, and we actually already use 2.7-only features in the code. Make the version check reflect current realities. This also removes the need to explicitly define CalledProcessError if it's not available. Signed-off-by: Yang Zhao <yang.zhao@skyboxlabs.com>
Python 3 handles strings differently than Python 2.7. Since Python 2 is reaching it's end of life, a series of changes are being submitted to enable python 3.5 and following support. The current code fails basic tests under python 3.5. Some codepaths can represent a command line the program internally prepares to execute either as a single string (i.e. each token properly quoted, concatenated with $IFS) or as a list of argv[] elements, and there are 9 places where we say "if X is isinstance(_, basestring), then do this thing to handle X as a command line in a single string; if not, X is a command line in a list form". This does not work well with Python 3, as there is no basestring (everything is Unicode now), and even with Python 2, it was not an ideal way to tell the two cases apart, because an internally formed command line could have been in a single Unicode string. Flip the check to say "if X is not a list, then handle X as a command line in a single string; otherwise treat it as a command line in a list form". This will get rid of references to 'basestring', to migrate the code ready for Python 3. Thanks-to: Junio C Hamano <gitster@pobox.com> Signed-off-by: Ben Keene <seraphire@gmail.com> Signed-off-by: Yang Zhao <yang.zhao@skyboxlabs.com>
Now that python2.7 is the minimum required version and we no longer use the basestring type, it is not necessary to use type aliasing to ensure python3 compatibility. Signed-off-by: Yang Zhao <yang.zhao@skyboxlabs.com>
The marshalled dict in the response given on STDOUT by p4 uses `str` for keys and string values. When run using python3, these values are deserialized as `bytes`, leading to a whole host of problems as the rest of the code assumes `str` is used throughout. This patch changes the deserialization behaviour such that, as much as possible, text output from p4 is decoded to native unicode strings. Exceptions are made for the field `data` as it is usually arbitrary binary data. `depotFile[0-9]*`, `path`, and `clientFile` are also exempt as they contain path strings not encoded with UTF-8, and must survive survive round-trip back to p4. Conversely, text data being piped to p4 must always be encoded when running under python3. encode_text_stream() and decode_text_stream() were added to make these transformations more convenient. Signed-off-by: Yang Zhao <yang.zhao@skyboxlabs.com>
Under python3, calls to write() on the stream to `git fast-import` must be encoded. This patch wraps the IO object such that this encoding is done transparently. Conversely, any text data read from subprocesses must also be decoded before running through the rest of the pipeline. Signed-off-by: Yang Zhao <yang.zhao@skyboxlabs.com> Reviewed-by: Ben Keene <seraphire@gmail.com>
P4 allows essentially arbitrary encoding for path data while we would perfer to be dealing only with unicode strings. Since path data need to survive round-trip back to p4, this patch implements the general policy that we store path data as-is, but decode them to unicode before doing any non-trivial processing. A new `decode_path()` method is provided that generally does the correct conversion, taking into account `git-p4.pathEncoding` configuration. For python2.7, path strings will be left as-is if it only contains ASCII characters. For python3, decoding is always done so that we have str objects. Signed-off-by: Yang Zhao <yang.zhao@skyboxlabs.com> Reviewed-by: Ben Keene <seraphire@gmail.com>
Opening .gitp4-usercache.txt in text mode makes python 3 happy without explicitly adding encoding and decoding. Signed-off-by: Yang Zhao <yang.zhao@skyboxlabs.com> Reviewed-by: Ben Keene <seraphire@gmail.com>
p4 does not appear to understand marshal format version 3 and above. Version 2 was the latest supported by python-2.7. Signed-off-by: Yang Zhao <yang.zhao@skyboxlabs.com> Reviewed-by: Ben Keene <seraphire@gmail.com>
As part of its importing process, git-p4 sends a `checkpoint` followed immediately by `progress` to fast-import to force synchronization. Due to buffering, it is possible for the `progress` command to not be flushed before git-p4 begins to wait for the corresponding response. This causes the script to freeze completely, and is consistently observable at least on python-3.6.9. Make sure this command sequence is completely flushed before waiting. Signed-off-by: Yang Zhao <yang.zhao@skyboxlabs.com> Reviewed-by: Ben Keene <seraphire@gmail.com>
For python3, reduce() has been moved to functools.reduce(). This is also available in python2.7. Signed-off-by: Yang Zhao <yang.zhao@skyboxlabs.com>
Python3 uses dict.items() instead of .iteritems() to provide iteratoration over dict. Although items() is technically less efficient for python2.7 (allocates a new list instead of simply iterating), the amount of data involved is very small and the penalty negligible. Signed-off-by: Yang Zhao <yang.zhao@skyboxlabs.com> Reviewed-by: Ben Keene <seraphire@gmail.com>
It is not clear why a generator was used to create the regex used to parse git-diff-tree output; I assume an early implementation required it, but is not part of the mainline change. Simply use a lazily initialized global instead. Signed-off-by: Yang Zhao <yang.zhao@skyboxlabs.com> Reviewed-by: Ben Keene <seraphire@gmail.com>
Python3 deprecates raw_input() from 2.7 and replaced it with input(). Since we do not need 2.7's input() semantics, `raw_input()` is aliased to `input()` for easy forward compatability. Signed-off-by: Yang Zhao <yang.zhao@skyboxlabs.com>
We currently have no ideal way to run git-p4.py tests using both python2 and python3 without changing the handling of GIT_PROVE_OPTS[1]. After a short discussion, it was agreed that having *-gcc and *-clang builds run tests using python3 and python2, respectively, is an acceptable workaround for the moment. [1]: https://public-inbox.org/git/20200123141626.GB6837@szeder.dev/ Signed-off-by: Yang Zhao <yang.zhao@skyboxlabs.com>
fb32700
to
050387c
Compare
OK |
For consistency, use the built-in percentage formatting functionality instead of calculating them manually. We print percentages with no decimals. Signed-off-by: Yang Zhao <yang.zhao@skyboxlabs.com>
bf96982
to
de2b2c3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good. Removing "basestring" removes the type name overloading and makes Python 3 the target branch.
Note this message was intended for the first commit! I'll review the rest of them now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
2.7 has been the only supported version for some time, this reduces the complexity of the transition.
Note this message was intended for the second commit! I'll review the rest of them now.
@@ -144,7 +154,7 @@ def prompt(prompt_text): | |||
""" | |||
choices = set(m.group(1) for m in re.finditer(r"\[(.)\]", prompt_text)) | |||
while True: | |||
response = raw_input(prompt_text).strip().lower() | |||
response = input(prompt_text).strip().lower() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am having trouble with Py2 raw_input on Windows under MINGW64. I have submitted a PR that converts this to sys.stdin.readline(), but that change could be applied on top of this. I don't think this needs a change at this time, but for historical records, the PR is #698
This patchset attempts to bring python 3 compatibility to
git-p4
.I'm using this PR mostly to access the test pipeline, as I do not have
p4d
access locally. Please pardon the noise.Discussion regarding patchset is on the maillinglist under
[RFC PATCH 0/4] git-p4: python 3 compatability
.