Infobox parser returns only the first element if value is a list #62

Closed
aplz opened this issue Apr 7, 2017 · 2 comments
aplz commented Apr 7, 2017

In the German version of Wikipedia, some infoboxes contain lists as values, for instance https://de.wikipedia.org/wiki/Lewisit.
In such cases, the parser returns only the first element of the list.
For example, using

import wptools
wikipage = wptools.page('Lewisit', lang='de', silent=True).get_parse()
for infobox_item in wikipage.infobox:
    print(infobox_item, wikipage.infobox[infobox_item])

we get
('Dichte', u'*1,8793 g\xb7cm<sup>\u22123</sup> (25 \xb0C, trans-Isomer)')

According to the Wikipedia article, the expected result would look like this:
('Dichte', u' *1,8793 g·cm<sup>−3</sup> (25 °C, trans-Isomer) <ext><name>ref</name><attr> name="Whiting"</attr></ext> *1,8598 g·cm<sup>−3</sup> (25 °C, cis-Isomer) <ext><name>ref</name><attr> name="Whiting"</attr></ext> ')

One way to fix this would be to adapt the code along the lines of the proposal in this Stack Overflow answer.
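As a rough illustration of the kind of change meant here: the sketch below assumes the truncation comes from reading only the `.text` of an XML element, which stops at the first child element (here the `<ext>` reference). This is an assumption about the cause, not a description of the actual wptools code. The `SAMPLE` fragment and the `inner_xml` helper are hypothetical and simplified from the expected output above, and the standard library's `xml.etree.ElementTree` stands in for whatever XML library the parser uses.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment modelled on the expected output above.
SAMPLE = (
    '<value>*1,8793 (trans-Isomer) '
    '<ext><name>ref</name><attr> name="Whiting"</attr></ext>'
    ' *1,8598 (cis-Isomer) '
    '<ext><name>ref</name><attr> name="Whiting"</attr></ext>'
    '</value>'
)


def inner_xml(element):
    """Return the leading text plus every serialized child, including tails."""
    parts = [element.text or ""]
    for child in element:
        # ET.tostring() serializes the child together with its tail text,
        # so the list items that follow each <ext> element are preserved.
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


value = ET.fromstring(SAMPLE)
first_only = value.text   # stops at the first child element
full = inner_xml(value)   # keeps every list item and the <ext> markup

print(first_only)
print(full)
```

With this approach the second list item survives because it is stored as the tail of the first `<ext>` element rather than in the parent's `.text`.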

siznax (Owner) commented Apr 14, 2017

Thanks again for the contribution, @aplz!

aplz (Author) commented Apr 18, 2017

Thanks for merging, @siznax!
