UTF-8 support, configuration of conflict resolution and configuration of scm command #26

jacobilsoe · 2015-05-29T12:09:16Z

Adds UTF-8 support. On Windows I also need to do a chcp 65001.

WtfJoke · 2015-05-30T10:56:25Z

Thanks for the input. 👍 I'll have a look at it (probably during the next week).

EDIT: Perhaps @ohumbel can have a look into it, if he has some spare time 😸

WtfJoke · 2015-06-02T20:54:29Z

I had a rough look at it. Looks good, thank you!
About the UTF-8/encoding support: I've read that in Python 3 it's better to use open with the encoding parameter.
Like: f = open("myfile.txt", "r", encoding="utf-8")
What do you think about that? I am considering changing this.

And could you add a small test for the encoding? For example, with a file which wasn't readable before.
Thanks!

ohumbel · 2015-06-02T21:07:40Z

I definitely prefer: f = open("myfile.txt", "r", encoding="utf-8")
If this breaks the tool for some users, we should make the default encoding configurable, like

utf-8 [default]
platform [== no encoding parameter to open()]
specific encoding [like ISO 8859-1]
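The three options above could be mapped onto the encoding argument of open() roughly as follows. This is only a sketch: the setting values ("platform", an explicit codec name) and the helper names resolve_encoding and read_lines are assumptions for illustration, not existing rtc2git code.

```python
def resolve_encoding(setting="utf-8"):
    """Map a (hypothetical) config value to the `encoding` argument of open()."""
    value = (setting or "").strip()
    if value == "":
        return "utf-8"   # default when nothing is configured
    if value.lower() == "platform":
        return None      # None makes open() use the platform default
    return value         # a specific codec name, e.g. "ISO-8859-1"


def read_lines(path, setting="utf-8"):
    """Read a text file using the configured encoding."""
    with open(path, "r", encoding=resolve_encoding(setting)) as f:
        return [line.rstrip("\n") for line in f]
```

Returning None for "platform" keeps the old behaviour available for users whose scm output is not UTF-8.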

jacobilsoe · 2015-06-03T08:21:59Z

Yes, you're right. I have changed it to use the Python 3 style, added a test and fixed a typo.

WtfJoke · 2015-06-03T21:42:04Z

Thanks, I merged them into my development branch

WtfJoke · 2015-06-07T12:51:00Z

I tested the code today, and on my Windows machine I get the following exception:

Traceback (most recent call last):
  File "E:\SomePath\rtc2git\migration.py", line 43, in migrate
    componentbaselineentries = rtc.getcomponentbaselineentriesfromstream(streamuuid)
  File "E:\SomePath\rtc2git\rtcFunctions.py", line 122, in getcomponentbaselineentriesfromstream
    for line in file:
  File "E:\Program Files\Python34\lib\codecs.py", line 319, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 1249: invalid start byte

The file itself has UTF-8 encoding according to Notepad++. Any idea? Is that the chcp 65001 you were mentioning?

EDIT: I think I looked at the wrong file. The other file has the encoding ANSI for some reason...

jacobilsoe · 2015-06-07T13:30:38Z

Hmm. Have you tried chcp 65001? Remember to run chcp without arguments to see the current codepage before switching.

WtfJoke · 2015-06-07T14:00:03Z

I haven't tried chcp 65001. Running chcp in the console reports 850... So do I need to switch something?

jacobilsoe · 2015-06-07T14:09:56Z

Yeah, you need to do a chcp 65001. Then try to migrate. After this you can switch back: chcp 850. But it seems like utf-8 needs to be configurable as well.

WtfJoke · 2015-06-07T18:03:21Z

Sadly, that doesn't work. The compare file always gets created in ANSI encoding, even if I switch to chcp 65001.

jacobilsoe · 2015-06-07T18:19:26Z

Strange, it works perfectly for me. When I get some time I will have a look. Do you propose that the utf-8 encoding be made an option?

WtfJoke · 2015-06-07T18:22:58Z

I don't know... I'd prefer to find a solution where the file is written in a proper encoding...

jacobilsoe · 2015-06-07T18:24:31Z

Yep, so do I, but the output from scm is piped to a file, right?

WtfJoke · 2015-06-07T18:24:57Z

Yes, that's correct.

EDIT:

The 65000/1 code pages are encoded as UTF-7/8 to allow to working with unicode data in 7-bit and 8-bit environments, however there is still VERY limited support for unicode in the CMD shell, piping, redirection and most commands are still ANSI only. (http://ss64.com/nt/chcp.html)

WtfJoke · 2015-06-07T21:28:46Z

I wrote a simple test which processes a file created by scm. It fails when the encoding is specified with open... When it is not specified, everything works. The file isn't in UTF-8...

    def testSampleOutput_ShouldbeUTF_ButIsnt(self):
        sample_file_path = self.get_Sample_File_Path("SampleFileShouldbeUTF8.txt")
        shell.execute("chcp 65001")
        #shell.execute("lscm --show-alias n --show-uuid y list components -v -r REPOURL",
        #             sample_file_path)
        with open(sample_file_path, 'r') as file:
            for line in file:
                print(line)

        with open(sample_file_path, 'r', encoding="utf-8") as file:
            for line in file:
                print(line)

The produced file for testing can be found here:
https://db.tt/S45K3Rcx

jacobilsoe · 2015-06-08T10:59:01Z

I just tried chcp 850 and in this case the Compare_baseline file is ANSI and the StreamComponents file is UTF-8 without BOM and the migration fails with:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x82 in position 31: invalid start byte

Then I tried chcp 65001 and in this case both files are UTF-8 without BOM and the migration proceeds without errors.

WtfJoke · 2015-06-08T13:45:25Z

As you can see, I did run chcp 65001 in the test. The output file was UTF-8 as long as it didn't contain any special characters. As soon as the output contained a special character (in this case ü), the file was ANSI, even though the code page had been switched to 65001 beforehand.
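The symptoms here (a file reported as ANSI, and a decode error on byte 0xfc) are consistent with ü having been written in a single-byte Windows code page. A minimal sketch to illustrate this, assuming "ANSI" means cp1252 (the thread does not say which code page it is) and using a made-up sample line:

```python
# Made-up sample line; "cp1252" stands in for "ANSI" and is an assumption.
raw = "Baseline für Komponente".encode("cp1252")


def try_decode(data, encoding):
    """Return the decoded text, or a short description of the failure."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return "failed at byte 0x%02x: %s" % (data[exc.start], exc.reason)


print(try_decode(raw, "utf-8"))   # the ü byte is not valid UTF-8 on its own
print(try_decode(raw, "cp1252"))  # the same bytes decode cleanly as cp1252
```

In cp1252 the character ü is the single byte 0xfc, which can never start a UTF-8 sequence, hence the "invalid start byte" message in the traceback.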

Thanks to a hint from a colleague about where the problem lies, I was able to find a solution after some time. With it, I think even switching the code page isn't required anymore (this is untested so far).

Nevertheless, I think the encoding itself needs to be configurable, because the solution needs an additional manual step. So I'm going to implement that, I guess.

It's necessary to create a new property config file called magic.properties in the Jazz SCM tools folder (on Windows: C:\Users\USER\AppData\Local\jazz-scm). There you can set an encoding, and the output will be in that encoding. With that, my test command always produces a UTF-8 encoded file, and since then I haven't gotten any exceptions.

You can read about this property in https://jazz.net/wiki/bin/view/Main/SCMMagicFile
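Since this fix depends on a manual step outside the tool, a pre-flight check on the scm output could give a clearer hint than the raw traceback. A rough sketch; find_non_utf8_offset is a hypothetical helper, not part of rtc2git:

```python
def find_non_utf8_offset(path):
    """Return the offset of the first byte that is not valid UTF-8, or None."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start  # index of the first offending byte
    return None
```

A caller could use a non-None result to point the user at the magic.properties setting instead of failing mid-migration.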

jacobilsoe · 2015-06-08T13:50:44Z

Ahh, yes. :-) I just checked and on my machine I have a magic.properties here: c:\users.jazz-scm with the contents: encoding: UTF-8; I remember now that I changed that quite some time ago. That explains why it works on my machine.

WtfJoke · 2015-06-08T13:54:37Z

Well, that explains everything 😆 I'm happy, though, that we found a solution.

ohumbel · 2015-06-08T19:26:35Z

congratulations - good research!

UTF-8 support

c9a5b8b

unknown and others added 2 commits June 1, 2015 09:40

Merge remote-tracking branch 'upstream/develop' into develop

98da54d

Added configuration of automated conflict resolution

cac18e8

jacobilsoe changed the title ~~UTF-8 support~~ UTF-8 support and configuration of conflict resolution Jun 1, 2015

WtfJoke assigned ohumbel Jun 1, 2015

jacobilsoe added 3 commits June 2, 2015 08:11

Merge remote-tracking branch 'upstream/develop' into develop

475f924

Fixed tests

bbac829

Added support for configuration of scm command

40356fd

jacobilsoe changed the title ~~UTF-8 support and configuration of conflict resolution~~ UTF-8 support, configuration of conflict resolution and configuration of scm command Jun 2, 2015

jacobilsoe mentioned this pull request Jun 2, 2015

Add configuration to choose between lscm and scm #28

Closed

WtfJoke unassigned ohumbel Jun 2, 2015

Use python 3 utf-8 support and added test

45fe00c

WtfJoke merged commit 45fe00c into rtcTo:develop Jun 3, 2015

WtfJoke mentioned this pull request Jun 8, 2015

Make encoding configurable #30

Closed
