git-p4: handle non-unicode characters in p4 changelist description #864
P4 allows non-unicode characters in the changelist description body, so git-p4 needs to be character-encoding aware when reading p4 CLs. This change adds 2 config options: one specifies the encoding, the other specifies error handling upon an unrecognized character. Those configs apply when it reads p4 description text, mostly from the commands "p4 describe" and "p4 changes". Signed-off-by: Feiynag Xue <fxue@roku.com>
/submit
Submitted as pull.864.git.1612371600332.gitgitgadget@gmail.com To fetch this version into
To fetch this version to local tag
On the Git mailing list, Junio C Hamano wrote (reply to this):
On the Git mailing list, Luke Diamand wrote (reply to this):
On the Git mailing list, Andrew Oakley wrote (reply to this):
description text. This encoding is used to transcode the text to
UTF-8. Defaults to "utf_8".

git-p4.clDescNonUnicodeHandling::
The configuration name in the documentation does not match the one in the Python file. Should it be clDescEncodingErrHandling here?
@joelliang would you terribly mind reviewing on the Git mailing list instead? See the "reply to this" instructions and find the mail that you want to reply to here.
P4 allows non-unicode characters in changelist description body,
so git-p4 needs to be character encoding aware when reading p4 cl.
This change adds 2 config options: one specifies encoding,
the other specifies error handling upon unrecognized character.
Those configs apply when it reads p4 description text, mostly
from commands "p4 describe" and "p4 changes".
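
As a rough illustration (not the patch itself), the two options can be thought of as the two arguments of Python's `bytes.decode()`: an encoding name and an error handler. The helper `decode_desc` and the sample bytes below are hypothetical and exist only for this sketch; the handler names `strict`, `replace`, and `ignore` are the standard Python codec error handlers.

```python
# Hypothetical sample: a changelist description containing the byte 0xE9
# ("é" in Latin-1), which is not valid UTF-8 when followed by a space.
raw_desc = b"Fix caf\xe9 menu"


def decode_desc(raw, encoding="utf_8", errors="strict"):
    """Hypothetical helper: both parameters are passed straight to bytes.decode()."""
    return raw.decode(encoding, errors)


# "strict" raises on the first undecodable byte.
try:
    decode_desc(raw_desc)
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# "replace" substitutes U+FFFD for the undecodable byte and keeps the rest.
replaced = decode_desc(raw_desc, errors="replace")

# "ignore" silently drops the undecodable byte.
ignored = decode_desc(raw_desc, errors="ignore")

# Naming the actual source encoding recovers the original character.
latin1 = decode_desc(raw_desc, encoding="latin_1")

print(strict_failed, repr(replaced), repr(ignored), repr(latin1))
```

With `replace`, the position of the bad byte stays visible in the result, whereas `ignore` loses it; setting the correct encoding avoids the loss altogether when the source encoding is known.
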
I have an open question in mind: what might be the best default config to use?

Currently Python's `bytes.decode()` is called with the default UTF-8 encoding and strict error handling, so git-p4 pukes on non-unicode characters. I encountered this when `git p4 sync` attempted to ingest a certain CL.

It seems to make sense to default to `replace`, so that it gets rid of non-unicode chars while trying to retain information. However, I am uncertain whether there are use cases that rely on the stop-on-non-unicode behavior. (Hypothetically, say, an automation that is expected to return an error on a non-unicode char in order to stop such characters from propagating further?)

I tested it with `git p4 sync` against a P4 CL that somehow has a non-unicode control character in its description. With `git-p4.cldescencodingerrhandling=ignore`, it proceeded without error.

cc: Luke Diamand luke@diamand.org
cc: Andrew Oakley andrew@adoakley.name