Changed behaviour in 0.9.x from unicode to str #51

Closed
agx opened this issue Oct 21, 2016 · 17 comments

@agx agx commented Oct 21, 2016

While looking into this calypso issue in Debian:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=841247

I noticed that 0.9.x returns str where 0.8 used unicode:

0.8.x:

$ python bla.py 
 <type 'unicode'>
 Universitetet i Tromsø
 <type 'unicode'>
 Forrest Gump

0.9.x:

    $ python bla.py 
    <type 'str'>
    Traceback (most recent call last):
      File "bla.py", line 11, in <module>
    print u"%s" % i.getChildValue("fn")
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 21: ordinal not in range(128)

This is python 2.7. Code:

#!/usr/bin/python

import vobject
import codecs

path = 'tests/data/import.vcard'

o = vobject.readComponents(codecs.open(path, encoding='utf-8').read())
for i in o:
    print u"%s" % type(i.getChildValue("fn"))
    print u"%s" % i.getChildValue("fn")
    #self.import_item(new_item, path)

vCard file used:

  https://github.com/agx/calypso/blob/master/tests/data/import.vcard

Is this a known regression or a desired change in behaviour?
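
Editorial note: the 0.9.x traceback above is consistent with Python 2 implicitly decoding a UTF-8 byte string with the ASCII codec when it is mixed with a unicode format string. The following is a minimal sketch that makes that decode step explicit, assuming the "fn" value is the UTF-8 encoding of the name printed in the 0.8.x output. The variable names are illustrative and not part of vobject.

```python
# -*- coding: utf-8 -*-
# Assumption: this is the byte string that 0.9.x hands back for "fn".
fn_bytes = u"Universitetet i Tromsø".encode("utf-8")

failure = None
try:
    # Python 2 performs this decode implicitly for u"%s" % fn_bytes.
    fn_bytes.decode("ascii")
except UnicodeDecodeError as exc:
    failure = exc

# Offset and value of the first byte the ASCII codec rejects.
bad_offset = failure.start
bad_byte = bytearray(fn_bytes)[bad_offset]
print(bad_offset, hex(bad_byte))
```

The offset and byte value line up with the "byte 0xc3 in position 21" message in the report, which is the first byte of the UTF-8 encoding of "ø".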

agx added a commit to calypso-server/calypso that referenced this issue Oct 22, 2016
The 0.8.1c upload on pypi is broken and 0.9.x is currently not
functional see

     eventable/vobject#51
Author

@agx agx commented Oct 22, 2016

I ran a quick bisect with the test case; the offending commit is b3f9bbc.

Member

@wpercy wpercy commented Dec 7, 2016

@agx I'm not sure this is actually a regression. After looking through this for the last couple of days, it seems like this is an intentional change to follow a more pythonic standard.
It appears that the standard is to work with unicode internally, but to always output as a string. Does this seem correct to you? I know this will create some issues with legacy code that expects Unicode output, but I want to hear from people before making any huge changes.


@ymitsos ymitsos commented Dec 27, 2016

I am using @tobixen's calendar-cli, mainly to add new events to my calendar. Recently, I realised that .ics files with Greek characters in the summary field raise an exception, which might relate to this issue. The problem is in base.py, specifically in the function defaultSerialize(). I used a workaround to solve this problem.

Member

@wpercy wpercy commented Dec 27, 2016

@ymitsos Can you provide the traceback?


@ymitsos ymitsos commented Dec 27, 2016

Sure, this is the complete traceback:

  File "./calendar-cli.py", line 698, in <module>
    main()
  File "./calendar-cli.py", line 695, in main
    ret = args.func(caldav_conn, args)
  File "./calendar-cli.py", line 141, in calendar_addics
    _calendar_addics(caldav_conn, c.to_ical(), uid, args)
  File "./calendar-cli.py", line 101, in _calendar_addics
    c.add_event(ics)
  File "/home/ymitsos/.virtualenvs/calendar-cli/local/lib/python2.7/site-packages/caldav/objects.py", line 422, in add_event
    return Event(self.client, data = ical, parent = self).save()
  File "/home/ymitsos/.virtualenvs/calendar-cli/local/lib/python2.7/site-packages/caldav/objects.py", line 718, in save
    self._create(self._instance.serialize(), self.id, path)
  File "/home/ymitsos/.virtualenvs/calendar-cli/local/lib/python2.7/site-packages/vobject/base.py", line 255, in serialize
    return behavior.serialize(self, buf, lineLength, validate)
  File "/home/ymitsos/.virtualenvs/calendar-cli/local/lib/python2.7/site-packages/vobject/behavior.py", line 162, in serialize
    out = base.defaultSerialize(transformed, buf, lineLength)
  File "/home/ymitsos/.virtualenvs/calendar-cli/local/lib/python2.7/site-packages/vobject/base.py", line 993, in defaultSerialize
    child.serialize(outbuf, lineLength, validate=False)
  File "/home/ymitsos/.virtualenvs/calendar-cli/local/lib/python2.7/site-packages/vobject/base.py", line 255, in serialize
    return behavior.serialize(self, buf, lineLength, validate)
  File "/home/ymitsos/.virtualenvs/calendar-cli/local/lib/python2.7/site-packages/vobject/behavior.py", line 162, in serialize
    out = base.defaultSerialize(transformed, buf, lineLength)
  File "/home/ymitsos/.virtualenvs/calendar-cli/local/lib/python2.7/site-packages/vobject/base.py", line 993, in defaultSerialize
    child.serialize(outbuf, lineLength, validate=False)
  File "/home/ymitsos/.virtualenvs/calendar-cli/local/lib/python2.7/site-packages/vobject/base.py", line 255, in serialize
    return behavior.serialize(self, buf, lineLength, validate)
  File "/home/ymitsos/.virtualenvs/calendar-cli/local/lib/python2.7/site-packages/vobject/behavior.py", line 162, in serialize
    out = base.defaultSerialize(transformed, buf, lineLength)
  File "/home/ymitsos/.virtualenvs/calendar-cli/local/lib/python2.7/site-packages/vobject/base.py", line 1015, in defaultSerialize
    foldOneLine(outbuf, s.getvalue(), lineLength)
  File "/usr/lib/python2.7/StringIO.py", line 271, in getvalue
    self.buf += ''.join(self.buflist)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xce in position 23: ordinal not in range(128)
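
Editorial note: the last two frames suggest that the buffer ends up holding a mix of unicode fragments and non-ASCII byte strings, and that joining them triggers an implicit ASCII decode. The following is a minimal sketch of one way to avoid that, by decoding byte strings before they reach a text buffer. The `write_text` helper and the sample fragments are hypothetical and are not vobject's actual code.

```python
# -*- coding: utf-8 -*-
import io


def write_text(buf, fragment, encoding="utf-8"):
    """Write a fragment to a text buffer, decoding byte strings first.

    Hypothetical helper: it keeps the buffer contents uniformly unicode,
    so nothing has to be implicitly decoded when the pieces are joined.
    """
    if isinstance(fragment, bytes):
        fragment = fragment.decode(encoding)
    buf.write(fragment)


buf = io.StringIO()
write_text(buf, u"SUMMARY:")
# A Greek summary arriving as UTF-8 bytes; its first byte is 0xce.
write_text(buf, u"Καλημέρα".encode("utf-8"))
serialized = buf.getvalue()
print(serialized)
```

Whether decoding at write time or wrapping the buffer is the better fix depends on what the rest of the serializer expects, so treat this only as an illustration of the idea.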

and the diff of my workaround:

1012c1003,1004
<         s = six.StringIO()
---
>         x = six.StringIO()
>         s = codecs.EncodedFile(x, data_encoding='utf-8', file_encoding='utf-8')

@cphyc cphyc commented Jan 11, 2017

Hi all!
I am also facing the same issue (under python 2.7), while trying to get the following event as a string:

 VCALENDAR
    VEVENT
       DTEND: 2017-02-24 17:00:00
       DTSTART: 2017-02-24 14:00:00
       LOCATION: Salle des séminaires
       SUMMARY: Master 2 - examen EC/EL

I get the error:

In [22]: calendar
Out[22]: ---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
/home/ccc/.virtualenvs/p2/lib/python2.7/site-packages/IPython/core/formatters.pyc in __call__(self, obj)
    670                 type_pprinters=self.type_printers,
    671                 deferred_pprinters=self.deferred_printers)
--> 672             printer.pretty(obj)
    673             printer.flush()
    674             return stream.getvalue()

/home/ccc/.virtualenvs/p2/lib/python2.7/site-packages/IPython/lib/pretty.pyc in pretty(self, obj)
    366                 if cls in self.type_pprinters:
    367                     # printer registered in self.type_pprinters
--> 368                     return self.type_pprinters[cls](obj, self, cycle)
    369                 else:
    370                     # deferred printer

/home/ccc/.virtualenvs/p2/lib/python2.7/site-packages/IPython/lib/pretty.pyc in _repr_pprint(obj, p, cycle)
    699     """A pprint that just redirects to the normal repr function."""
    700     # Find newlines and replace them with p.break_()
--> 701     output = repr(obj)
    702     for idx,output_line in enumerate(output.splitlines()):
    703         if idx:

/home/ccc/.virtualenvs/p2/lib/python2.7/site-packages/vobject/base.pyc in __repr__(self)
    681
    682     def __repr__(self):
--> 683         return self.__str__()
    684
    685     def prettyPrint(self, level = 0, tabwidth=3):

/home/ccc/.virtualenvs/p2/lib/python2.7/site-packages/vobject/base.pyc in __str__(self)
    676     def __str__(self):
    677         if self.name:
--> 678             return "<{0}| {1}>".format(self.name, self.getSortedChildren())
    679         else:
    680             return u'<*unnamed*| {0}>'.format(self.getSortedChildren())

/home/ccc/.virtualenvs/p2/lib/python2.7/site-packages/vobject/base.pyc in __repr__(self)
    681
    682     def __repr__(self):
--> 683         return self.__str__()
    684
    685     def prettyPrint(self, level = 0, tabwidth=3):

/home/ccc/.virtualenvs/p2/lib/python2.7/site-packages/vobject/base.pyc in __str__(self)
    676     def __str__(self):
    677         if self.name:
--> 678             return "<{0}| {1}>".format(self.name, self.getSortedChildren())
    679         else:
    680             return u'<*unnamed*| {0}>'.format(self.getSortedChildren())

/home/ccc/.virtualenvs/p2/lib/python2.7/site-packages/vobject/base.pyc in __repr__(self)
    427
    428     def __repr__(self):
--> 429         return self.__str__()
    430
    431     def prettyPrint(self, level = 0, tabwidth=3):

/home/ccc/.virtualenvs/p2/lib/python2.7/site-packages/vobject/base.pyc in __str__(self)
    424
    425     def __str__(self):
--> 426         return "<{0}{1}{2}>".format(self.name, self.params, self.valueRepr())
    427
    428     def __repr__(self):

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 11: ordinal not in range(128)

@Phyks Phyks commented Jan 22, 2017

I think I got something about this issue:

>>> vobject.readOne(u"BEGIN VCALENDAR é")
[...]
vobject.base.ParseError: At line 1: Failed to parse line: BEGIN VCALENDAR é

(as expected)

but

>>> vobject.readOne(io.StringIO(u"BEGIN VCALENDAR é"))
[...]
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 16: ordinal not in range(128)

(which is not expected)

The bug seems to be when dealing with a stream, such as a StringIO.

EDIT: Actually, I read the previous post too fast and missed that @ymitsos had already spotted it. @ymitsos, could you provide more information about the diff, please? I cannot match it with the actual code :/ I am affected on Python 2.7.13 (but not on Python 3.6). Also, it should be noted that this bug affects the caldav Python library as well: https://bitbucket.org/cyrilrbt/caldav/issues/53/getting-unicodedecodeerror-with-non-ascii.

Also, this case does not seem to be covered by the unit tests.


@ymitsos ymitsos commented Jan 23, 2017

This is the diff output for the vobject/base.py file:

1012c1012,1013
<         s = six.StringIO()
---
>         x = six.StringIO()
>         s = codecs.EncodedFile(x, data_encoding='utf-8', file_encoding='utf-8')

@Phyks Phyks commented Jan 23, 2017

Got it! Sorry, GitHub search was not really efficient :/

Contributor

@tobixen tobixen commented Jan 23, 2017

It just occurred to me: at some point in the Python caldav library I had to remove a call to StringIO, as newer versions of vobject seemed to require an ordinary string instead of a stream object.

This patch is out in version 0.5, which has been available from the Mercurial repo hosted at Bitbucket since yesterday, but is not yet published on PyPI.


@Millnert Millnert commented Mar 11, 2017

I get the same error, I think, with CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE.

The code:

vobject.readOne('BEGIN:VCARD\nVERSION:2.1\nN;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:=28=2A=29;=44=C3=B6=72=72=74=65=6C=65=66=6F=6E;;;\nFN;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:=44=C3=B6=72=72=74=65=6C=65=66=6F=6E=20=28=2A=29\nTEL;CELL:+46708265717\nEND:VCARD\n')

works and returns an object, but I can't run __str__() on it (i.e. no printing either):

Traceback (most recent call last):
  File "main.py", line 35, in datacards_to_vobjects
    testrepr = vobj.__str__()
  File "/home/anticimex/Envs/vcf/local/lib/python2.7/site-packages/vobject/base.py", line 685, in __str__
    return "<{0}| {1}>".format(self.name, self.getSortedChildren())
  File "/home/anticimex/Envs/vcf/local/lib/python2.7/site-packages/vobject/base.py", line 434, in __repr__
    return self.__str__()
  File "/home/anticimex/Envs/vcf/local/lib/python2.7/site-packages/vobject/base.py", line 431, in __str__
    return "<{0}{1}{2}>".format(self.name, self.params, self.valueRepr())
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 1: ordinal not in range(128)

This is on the current pip version, vobject 0.9.4.1.

Seeing the discussion above, I'm not sure if I'm just using vobject incorrectly somehow?
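
Editorial note: the quoted-printable FN value in the vCard above can be decoded with the standard library alone, which shows where the non-ASCII character in the error message comes from. This is a minimal sketch using `quopri` from the Python standard library. It does not reproduce vobject's own parsing, and the variable names are illustrative.

```python
# -*- coding: utf-8 -*-
import quopri

# The FN payload from the vCard above, as raw quoted-printable bytes.
raw_fn = b"=44=C3=B6=72=72=74=65=6C=65=66=6F=6E=20=28=2A=29"

# Undo the quoted-printable layer, then apply the declared charset.
fn = quopri.decodestring(raw_fn).decode("utf-8")

failure = None
try:
    # Python 2 attempts this encode implicitly when a unicode value is
    # formatted into a byte-string template such as "<{0}{1}{2}>".
    fn.encode("ascii")
except UnicodeEncodeError as exc:
    failure = exc

print(repr(fn), failure.start)
```

The decoded value has "ö" (u'\xf6') at index 1, which matches the character and position named in the UnicodeEncodeError.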

Author

@agx agx commented Apr 8, 2017

@wpercy but if this was intentional shouldn't the python3 version return <class 'bytes'> (bytestring) instead of <class 'str'> (unicode):

#!/usr/bin/python3

import vobject
import codecs

path = 'tests/data/import.vcard'

o = vobject.readComponents(codecs.open(path, encoding='utf-8').read())
for i in o:
    print("%s" % type(i.getChildValue("fn")))
    print("%s" % i.getChildValue("fn"))

returns

python ./bla3.py 
<type 'str'>
Universitetet i Tromsø
<type 'str'>
Forrest Gump

With your argument I would expect it to return a bytestring, not a unicode string as it currently does. So Python 2 and Python 3 now behave differently, while they behaved consistently before (both giving back unicode objects).

It would be nice if the current maintainers would say how it should work so we can go and fix from there.

Member

@wpercy wpercy commented Apr 11, 2017

@agx I'll be digging into this during the week and hope to have a solution and a fix pushed up sometime this weekend or early next week.

EDIT: After just one day of digging, I believe you are right. We should be passing in unicode, encoding it to bytestrings for internal manipulations, and then outputting unicode (which, in the case of Python 3, should be <type 'str'>). I just have to get it back to outputting <type 'unicode'> for Python 2.
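
Editorial note: the contract described here, returning the text type on both interpreters, can be stated in a few lines. The following is a minimal sketch, assuming UTF-8 for any byte strings. `text_type` and `as_text` are hypothetical names and are not part of vobject's API.

```python
# -*- coding: utf-8 -*-
# The text type is unicode on Python 2 and str on Python 3.
text_type = type(u"")


def as_text(value, encoding="utf-8"):
    """Return value as text, decoding it only if it is a byte string."""
    if isinstance(value, text_type):
        return value
    return value.decode(encoding)


from_bytes = as_text(b"Troms\xc3\xb8")
from_text = as_text(u"Forrest Gump")
print(type(from_bytes), type(from_text))
```

On Python 2 both results are unicode objects and on Python 3 both are str, which is the consistent behaviour asked for earlier in the thread.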

Author

@agx agx commented Apr 12, 2017

I have a workaround posted for calypso

https://keithp.com/pipermail/calypso/2017-April/000345.html

but it would be great to have this fixed in vobject. And yeah, if the Python 2 and Python 3 versions aim to be consistent, Python 3 needs to return 'str' while Python 2 needs to return 'unicode', I think.


@diggy128 diggy128 commented Jun 13, 2017

Any update on this? I have the same problem with Greek as @ymitsos in Odoo while generating calendars, but don't want to change the module if this is to be fixed upstream.

Member

@wpercy wpercy commented Jun 13, 2017

Yeah, this will hopefully be fixed in the next 2 weeks.

Member

@wpercy wpercy commented Jun 22, 2017

@agx At long last, I think I have this working. I will merge it into master; if you could then pull it down and see if it fixes your issues, that would be awesome. (I tested against the vCard you provided in both Python 2 and 3 and got the desired output.)
