xml.parsers.expat make a dictionary which keys are broken if buffer_text is False. #49286

Closed
tksmashiw mannequin opened this issue Jan 23, 2009 · 10 comments
Labels
topic-XML type-bug An unexpected behavior, bug, or error

Comments

@tksmashiw
Mannequin

tksmashiw mannequin commented Jan 23, 2009

BPO 5036

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = <Date 2009-01-31.03:16:24.927>
created_at = <Date 2009-01-23.03:52:20.503>
labels = ['expert-XML', 'type-bug', 'invalid']
title = 'xml.parsers.expat make a dictionary which keys are broken if buffer_text is False.'
updated_at = <Date 2009-01-31.03:16:24.926>
user = 'https://bugs.python.org/tksmashiw'

bugs.python.org fields:

activity = <Date 2009-01-31.03:16:24.926>
actor = 'benjamin.peterson'
assignee = 'none'
closed = True
closed_date = <Date 2009-01-31.03:16:24.927>
closer = 'benjamin.peterson'
components = ['XML']
creation = <Date 2009-01-23.03:52:20.503>
creator = 'tksmashiw'
dependencies = []
files = []
hgrepos = []
issue_num = 5036
keywords = []
message_count = 10.0
messages = ['80398', '80432', '80435', '80438', '80449', '80451', '80453', '80454', '80638', '80851']
nosy_count = 3.0
nosy_names = ['ggenellina', 'kawai', 'tksmashiw']
pr_nums = []
priority = 'normal'
resolution = 'not a bug'
stage = None
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue5036'
versions = ['Python 2.5']

@tksmashiw
Mannequin Author

tksmashiw mannequin commented Jan 23, 2009

When I build a dictionary by parsing "legacy-icon-mapping.xml" (which is
part of
icon-naming-utils [http://tango.freedesktop.org/Tango_Icon_Library]) with
the following script, three of the dictionary's keys come out truncated
if the "buffer_text" attribute is False.

=====================

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import with_statement
import sys
from xml.parsers.expat import ParserCreate
import codecs

class Database:
  """Make a dictionary which is accessible by Database.dict"""
  def __init__(self, buffer_text):
    self.cnt = None
    self.name = None
    self.data = None
    self.dict = {}
    p = ParserCreate()
    p.buffer_text = buffer_text

    p.StartElementHandler = self.start_element
    p.EndElementHandler = self.end_element
    p.CharacterDataHandler = self.char_data

    with open("/usr/share/icon-naming-utils/legacy-icon-mapping.xml", 'r') as f:
      p.ParseFile(f)

  def start_element(self, name, attrs):
    if name == 'context':
      self.cnt = attrs["dir"]
    if name == 'icon':
      self.name = attrs["name"]
  
  def end_element(self, name):
    if name == 'link':
      self.dict[self.data] = (self.cnt, self.name)

  def char_data(self, data):
    self.data = data.strip()

def print_set(aset):
  for e in aset:
    print '\t' + e

if __name__ == '__main__':
  sys.stdout = codecs.getwriter('utf_8')(sys.stdout)
  map_false_dict = Database(False).dict
  map_true_dict = Database(True).dict
  print "The keys which exist if buffer_text=False but don't exist if buffer_text=True are"
  print_set(set(map_false_dict.keys()) - set(map_true_dict.keys()))
  print "The keys which exist if buffer_text=True but don't exist if buffer_text=False are"
  print_set(set(map_true_dict.keys()) - set(map_false_dict.keys()))

=====================

The result of running this script is
======================
The keys which exist if buffer_text=False but don't exist if buffer_text=True are
rt-descending
ock_text_right
lc
The keys which exist if buffer_text=True but don't exist if buffer_text=False are
stock_text_right
gnome-mime-application-vnd.stardivision.calc
gtk-sort-descending
======================
I confirmed it in Python-2.5.2 on Fedora 10.

@tksmashiw tksmashiw mannequin added topic-XML type-bug An unexpected behavior, bug, or error labels Jan 23, 2009
@ggenellina
Mannequin

ggenellina mannequin commented Jan 24, 2009

If the xml file is small enough, could you attach it to the issue? Or
provide a download location? I could not find it myself (without
downloading the whole package)

(Note that Python 2.5 only gets security fixes now, so unless this
still fails with 2.6 or later, this issue is likely to be closed)

@tksmashiw
Mannequin Author

tksmashiw mannequin commented Jan 24, 2009

Thanks for the reply!

> If the xml file is small enough, could you attach it to the issue? Or
> provide a download location?

Sorry, I found it here:
http://webcvs.freedesktop.org/icon-theme/icon-naming-utils/legacy-icon-mapping.xml?revision=1.75&content-type=text%2Fplain&pathrev=1.75

> (Note that Python 2.5 only gets security fixes now, so unless this
> still fails with 2.6 or later, this issue is likely to be closed)

I saw roughly the same problem with Python 3.0 on MS Windows two weeks
ago, but I need to verify it more carefully...

@kawai
Mannequin

kawai mannequin commented Jan 24, 2009

The sample code has a bug; expat is fine.

The char_data method must append the incoming characters, because the
parser may deliver the text of a single element in several pieces rather
than all at once:

    def char_data(self, data):
        self.data += data

You should then reset it with self.data = '' in end_element().
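
A minimal sketch of that fix, for anyone landing here later. It is written for Python 3 (the report's script targets 2.5). The SAMPLE string, the chunk_size parameter, and the chunked feeding loop are assumptions added for illustration and are not part of the original script. The sample is a made-up stand-in for legacy-icon-mapping.xml.

```python
from xml.parsers.expat import ParserCreate

# Hypothetical stand-in for legacy-icon-mapping.xml (not the real file).
SAMPLE = (
    '<mapping><context dir="actions">'
    '<icon name="view-sort-descending">'
    '<link>gtk-sort-descending</link>'
    '<link>stock_sort-descending</link>'
    '</icon></context></mapping>'
)


class Database:
    """Build a dict mapping each <link> text to (context dir, icon name)."""

    def __init__(self, xml_text, chunk_size=5):
        self.cnt = None
        self.name = None
        self.data = ''
        self.dict = {}
        p = ParserCreate()
        p.buffer_text = False
        p.StartElementHandler = self.start_element
        p.EndElementHandler = self.end_element
        p.CharacterDataHandler = self.char_data
        # Feed tiny pieces so that one text node is likely to be reported
        # through more than one CharacterDataHandler call.
        for i in range(0, len(xml_text), chunk_size):
            p.Parse(xml_text[i:i + chunk_size], False)
        p.Parse('', True)  # signal end of input

    def start_element(self, name, attrs):
        if name == 'context':
            self.cnt = attrs['dir']
        if name == 'icon':
            self.name = attrs['name']

    def end_element(self, name):
        if name == 'link':
            self.dict[self.data.strip()] = (self.cnt, self.name)
        self.data = ''  # reset once the element is finished

    def char_data(self, data):
        self.data += data  # append instead of overwriting


db = Database(SAMPLE)
for key in sorted(db.dict):
    print(key, db.dict[key])
```

Because the handler appends, the keys should come out whole regardless of how the input happens to be split.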

@tksmashiw
Mannequin Author

tksmashiw mannequin commented Jan 24, 2009

Hi kawai.
I got the correct output by modifying the code as you said, but I still
cannot understand why this happens.
Could you tell me more briefly, or point me to any documents about it?
I can't find any note in the official documentation saying that a
CharacterDataHandler should append the strings it receives instead of
just assigning them.
Does everyone already know this? Am I the only one who didn't? (;;)

@kawai
Mannequin

kawai mannequin commented Jan 24, 2009

That's how the XML SAX interface is specified.

@kawai
Mannequin

kawai mannequin commented Jan 24, 2009

Please read "The ContentHandler.characters() callback is missing data!"
http://www.saxproject.org/faq.html

and close this issue :)
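
One way to see this behaviour for yourself is to record every string that reaches the handler. This is only a sketch in Python 3. The collect_chunks helper and the place where the input is split are hypothetical and chosen for illustration. How many callbacks you get depends on the expat version and on how the input is fed, so no particular count is promised.

```python
from xml.parsers.expat import ParserCreate


def collect_chunks(pieces, buffer_text=False):
    """Return every string passed to CharacterDataHandler, in order."""
    chunks = []
    p = ParserCreate()
    p.buffer_text = buffer_text
    p.CharacterDataHandler = chunks.append
    for piece in pieces:
        p.Parse(piece, False)
    p.Parse('', True)  # signal end of input
    return chunks


# The split point inside the text is arbitrary and only for illustration.
chunks = collect_chunks(['<link>gtk-sort-', 'descending</link>'])
print(chunks)           # may hold more than one string
print(''.join(chunks))  # the complete text once the pieces are joined
```

A handler that keeps only the most recent string sees just the last element of that list, which would explain keys that have lost their beginning.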

@tksmashiw
Mannequin Author

tksmashiw mannequin commented Jan 24, 2009

A correction to my previous message: "briefly" should have been "in detail".

> Please read "The ContentHandler.characters() callback is missing data!"
> http://www.saxproject.org/faq.html

I was just reading that page. It is now very clear to me.
Thanks kawai, and I'm sorry to have taken up your time, gagenellina.

@tksmashiw
Mannequin Author

tksmashiw mannequin commented Jan 27, 2009

From msg80438:

> You should reset it by self.data = '' at end_element().

It seems that we should reset it in start_element() instead, like this,
============================

def start_element(self, name, attrs):
  ...abbr...
  if name == 'link':
    self.data = ''

=============================
otherwise unwanted spaces, tabs, and newlines from between the tags get
mixed into "self.data".
That's all, thanks.
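
A small comparison of the two reset points, as a Python 3 sketch. The indented SAMPLE document and the link_texts helper are hypothetical and only meant to show the whitespace between tags ending up in the collected text.

```python
from xml.parsers.expat import ParserCreate

# Hypothetical indented sample; the whitespace between tags is the point.
SAMPLE = """<icon name="view-sort-descending">
    <link>gtk-sort-descending</link>
    <link>stock_sort-descending</link>
</icon>"""


def link_texts(xml_text, reset_on_start):
    """Collect <link> texts, resetting the buffer on start or on end tags."""
    state = {'data': ''}
    found = []

    def start_element(name, attrs):
        if reset_on_start and name == 'link':
            state['data'] = ''  # drop whitespace seen before <link>

    def end_element(name):
        if name == 'link':
            found.append(state['data'])
        if not reset_on_start:
            state['data'] = ''  # reset only after an element closes

    def char_data(data):
        state['data'] += data

    p = ParserCreate()
    p.StartElementHandler = start_element
    p.EndElementHandler = end_element
    p.CharacterDataHandler = char_data
    p.Parse(xml_text, True)
    return found


print(link_texts(SAMPLE, reset_on_start=False))  # texts carry leading whitespace
print(link_texts(SAMPLE, reset_on_start=True))   # texts are clean
```

Resetting in end_element() also works if the text is stripped before use, but resetting at the opening tag keeps the buffer limited to the element's own content.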

@tksmashiw
Mannequin Author

tksmashiw mannequin commented Jan 31, 2009

Could someone close this?

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022