UTF-8 support, configuration of conflict resolution and configuration of scm command #26

Merged
merged 7 commits into from
Jun 3, 2015

Conversation

jacobilsoe
Contributor

Adds UTF-8 support. On Windows I also need to do a chcp 65001.

@WtfJoke
Member

WtfJoke commented May 30, 2015

Thanks for the input. 👍 I'll have a look at it (probably during the next week)

EDIT: Perhaps @ohumbel can have a look into it, if he has some spare time 😸

@jacobilsoe jacobilsoe changed the title UTF-8 support UTF-8 support and configuration of conflict resolution Jun 1, 2015
@jacobilsoe jacobilsoe changed the title UTF-8 support and configuration of conflict resolution UTF-8 support, configuration of conflict resolution and configuration of scm command Jun 2, 2015
@WtfJoke
Member

WtfJoke commented Jun 2, 2015

I had a quick look at it. Looks good, thank you!
About the UTF-8/encoding support: I've read that in Python 3 it's better to use open with the encoding parameter.
Like: f = open("myfile.txt", "r", encoding="utf-8")
What do you think about that? I'm considering changing this.

And would it be possible for you to add a small test for the encoding? For example, a file which wasn't readable before.
Thanks!

@ohumbel
Member

ohumbel commented Jun 2, 2015

I definitely prefer: f = open("myfile.txt", "r", encoding="utf-8")
If this breaks the tool for some users, we should make the default encoding configurable, like

  1. utf-8 [default]
  2. platform [== no encoding parameter to open()]
  3. specific encoding [like ISO 8859-1]
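
The three options in the list above could be sketched roughly as follows. This is only an illustration: `resolve_encoding`, `open_text` and the `setting` value are hypothetical names, not part of rtc2git as discussed here. The one thing it relies on from Python itself is that `open()` falls back to the platform default when `encoding` is `None`.

```python
import os
import tempfile


def resolve_encoding(setting="utf-8"):
    """Map a (hypothetical) config value to the encoding argument of open()."""
    if setting is None or setting.strip() == "":
        return "utf-8"  # option 1: utf-8 as the default
    if setting.strip().lower() == "platform":
        return None  # option 2: None lets open() use the platform default
    return setting.strip()  # option 3: a specific encoding, e.g. ISO-8859-1


def open_text(path, mode="r", setting="utf-8"):
    """Open a text file with the encoding chosen by the setting."""
    return open(path, mode, encoding=resolve_encoding(setting))


# Usage: write and read back a file in a specific, non-UTF-8 encoding.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sample.txt")
    with open_text(path, "w", setting="ISO-8859-1") as handle:
        handle.write("Z\u00fcrich\n")
    with open_text(path, "r", setting="ISO-8859-1") as handle:
        content = handle.read()
```

Keeping the mapping in one small function means every `open()` call in the tool would pick up the same setting.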

@jacobilsoe
Contributor Author

Yes, you're right. I have changed it to use python 3 style, added a test and fixed a typo.

@WtfJoke WtfJoke merged commit 45fe00c into rtcTo:develop Jun 3, 2015
@WtfJoke
Member

WtfJoke commented Jun 3, 2015

Thanks, I merged them into my development branch

@WtfJoke
Member

WtfJoke commented Jun 7, 2015

I tested the code today, and on my Windows machine I get the following exception:

Traceback (most recent call last):
  File "E:\SomePath\rtc2git\migration.py", line 43, in migrate
    componentbaselineentries = rtc.getcomponentbaselineentriesfromstream(streamuuid)
  File "E:\SomePath\rtc2git\rtcFunctions.py", line 122, in getcomponentbaselineentriesfromstream
    for line in file:
  File "E:\Program Files\Python34\lib\codecs.py", line 319, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 1249: invalid start byte

According to Notepad++, the file itself is UTF-8 encoded. Any idea? Is that the chcp 65001 you were mentioning?

EDIT: I think I was looking at the wrong file. The other file is ANSI-encoded for some reason...
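
For illustration, the error in the traceback can be reproduced in isolation. The sketch below assumes that "ANSI" here means Windows-1252, which is typical on Western European Windows installations but is not confirmed in this thread. In that code page, the byte 0xfc is the letter ü, and on its own it is not a valid UTF-8 sequence.

```python
# "f\u00fcr" is the word "für"; in Windows-1252 it is stored as b'f\xfcr'.
raw = "f\u00fcr".encode("cp1252")

try:
    raw.decode("utf-8")
    decoded_as_utf8 = True
except UnicodeDecodeError as error:
    # Same kind of failure as in the traceback: byte 0xfc, invalid start byte.
    decoded_as_utf8 = False
    message = str(error)

# Decoding with the encoding the bytes were actually written in works.
text = raw.decode("cp1252")
print(decoded_as_utf8, text)
```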

@jacobilsoe
Contributor Author

Hmm. Have you tried chcp 65001? Remember to run chcp without arguments to see the current codepage before switching.

@WtfJoke
Member

WtfJoke commented Jun 7, 2015

I haven't tried chcp 65001. chcp in the console says 850... So do I need to switch something?

@jacobilsoe
Contributor Author

Yeah, you need to do a chcp 65001. Then try to migrate. After this you can switch back: chcp 850. But it seems like utf-8 needs to be configurable as well.

@WtfJoke
Member

WtfJoke commented Jun 7, 2015

Sadly, that doesn't work. The compare file always gets created with ANSI encoding, even if I switch to chcp 65001.

@jacobilsoe
Contributor Author

Strange, it works perfectly for me. When I get some time I will have a look. Do you propose that the utf-8 encoding be made an option?

@WtfJoke
Member

WtfJoke commented Jun 7, 2015

I don't know... I'd prefer to find a solution where the file is written in a proper encoding...

@jacobilsoe
Contributor Author

Yep, so do I, but the output from scm is piped to a file, right?

@WtfJoke
Member

WtfJoke commented Jun 7, 2015

Yes, that's correct

EDIT:

The 65000/1 code pages are encoded as UTF-7/8 to allow to working with unicode data in 7-bit and 8-bit environments, however there is still VERY limited support for unicode in the CMD shell, piping, redirection and most commands are still ANSI only. (http://ss64.com/nt/chcp.html)

@WtfJoke
Member

WtfJoke commented Jun 7, 2015

I wrote a simple test which processes a file created by scm. It fails when the encoding is specified in open()... When it is not specified, everything works. The file isn't UTF-8...

    def testSampleOutput_ShouldbeUTF_ButIsnt(self):
        sample_file_path = self.get_Sample_File_Path("SampleFileShouldbeUTF8.txt")
        shell.execute("chcp 65001")
        #shell.execute("lscm --show-alias n --show-uuid y list components -v -r REPOURL",
        #             sample_file_path)
        with open(sample_file_path, 'r') as file:
            for line in file:
                print(line)

        with open(sample_file_path, 'r', encoding="utf-8") as file:
            for line in file:
                print(line)

The produced file for testing can be found here:
https://db.tt/S45K3Rcx

@jacobilsoe
Contributor Author

I just tried chcp 850 and in this case the Compare_baseline file is ANSI and the StreamComponents file is UTF-8 without BOM and the migration fails with:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x82 in position 31: invalid start byte

Then I tried chcp 65001 and in this case both files are UTF-8 without BOM and the migration proceeds without errors.
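
To find out which of the generated files is not valid UTF-8, and where, a small check along these lines might help. It is only a sketch: `first_non_utf8_line` is a hypothetical helper, and the sample bytes are made up for the demonstration. The byte 0x82 from the error above is "é" in code page 850, which would fit output written under chcp 850.

```python
import os
import tempfile


def first_non_utf8_line(path):
    """Return (line_number, raw_line) for the first line that is not valid
    UTF-8, or None if the whole file decodes cleanly."""
    with open(path, "rb") as handle:
        # UTF-8 multi-byte sequences never contain 0x0a, so splitting the
        # raw bytes at newlines and decoding line by line is safe.
        for line_number, raw_line in enumerate(handle, start=1):
            try:
                raw_line.decode("utf-8")
            except UnicodeDecodeError:
                return line_number, raw_line
    return None


# Usage with made-up sample data: the second line contains byte 0x82.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sample_output.txt")
    with open(path, "wb") as handle:
        handle.write(b"ok\ncaf\x82\n")
    result = first_non_utf8_line(path)

print(result)
```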

@WtfJoke
Member

WtfJoke commented Jun 8, 2015

As you can see, I ran chcp 65001 in the test. The output file was UTF-8 as long as it didn't contain any special characters. As soon as the output contained a special character (in this case ü), the file was ANSI, even though the code page had been switched to 65001 beforehand.

Thanks to a hint from a colleague about where the problem lies, I was able to find a solution after some time. With it, I think even switching the code page isn't required anymore (this is untested so far).

Nevertheless, I think the encoding itself needs to be configurable, because the solution needs an additional manual step. So I'm going to implement that, I guess.

You need to create a new property config file called magic.properties in the Jazz SCM tools folder (on Windows: C:\Users\USER\AppData\Local\jazz-scm). There you can set an encoding, and the output will be written in that encoding. With that, my test command always produces a UTF-8 file, and since then I haven't gotten any exceptions.

You can read about this property in https://jazz.net/wiki/bin/view/Main/SCMMagicFile

@jacobilsoe
Contributor Author

Ahh, yes. :-) I just checked and on my machine I have a magic.properties here: c:\users.jazz-scm with the contents: encoding: UTF-8; I remember now that I changed that quite some time ago. That explains why it works on my machine.

@WtfJoke
Member

WtfJoke commented Jun 8, 2015

Well, that explains everything 😆 I'm happy, though, that we found a solution

@ohumbel
Member

ohumbel commented Jun 8, 2015

congratulations - good research!

@WtfJoke WtfJoke mentioned this pull request Jun 8, 2015