In the German version of Wikipedia, some infoboxes contain lists as values, for example the one at https://de.wikipedia.org/wiki/Lewisit.
In such cases, the parser returns only the first list element.
For instance, using
import wptools
wikipage = wptools.page('Lewisit', lang='de', silent=True).get_parse()
for infobox_item in wikipage.infobox:
    print(infobox_item, wikipage.infobox[infobox_item])
we get
('Dichte', u'*1,8793 g\xb7cm<sup>\u22123</sup> (25 \xb0C, trans-Isomer)')
According to the Wikipedia article, the expected result would look like this:
('Dichte', u' *1,8793 g·cm<sup>−3</sup> (25 °C, trans-Isomer) <ext><name>ref</name><attr> name="Whiting"</attr></ext> *1,8598 g·cm<sup>−3</sup> (25 °C, cis-Isomer) <ext><name>ref</name><attr> name="Whiting"</attr></ext> ')
One way to fix this would be to adapt the code along the lines of the proposal in this Stack Overflow answer.
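To illustrate what might be going on, here is a minimal sketch using only the standard library. It rests on an assumption that is not confirmed here: that the value is read from an XML parse tree via the element's `.text` attribute, which stops at the first child node (the `<ext>` reference). The `SAMPLE` fragment and both helper functions are hypothetical and are not part of wptools. Note that the second helper drops the `<ext>` reference nodes instead of serializing them as in the expected result above.

```python
import xml.etree.ElementTree as ET

# Hypothetical parse-tree fragment, modeled on the 'Dichte' value shown above.
SAMPLE = (
    "<value>*1,8793 g·cm&lt;sup&gt;−3&lt;/sup&gt; (25 °C, trans-Isomer) "
    '<ext><name>ref</name><attr> name="Whiting"</attr></ext>\n'
    "*1,8598 g·cm&lt;sup&gt;−3&lt;/sup&gt; (25 °C, cis-Isomer) "
    '<ext><name>ref</name><attr> name="Whiting"</attr></ext>\n'
    "</value>"
)


def first_text_only(node):
    """Return only the text before the first child element (the suspected bug)."""
    return (node.text or "").strip()


def all_list_items(node):
    """Collect the text before and after every child element, then split it
    into the '*' list items. The child elements themselves are skipped."""
    parts = [node.text or ""]
    parts.extend(child.tail or "" for child in node)
    text = "".join(parts)
    return [
        line.strip().lstrip("*").strip()
        for line in text.splitlines()
        if line.strip().startswith("*")
    ]


if __name__ == "__main__":
    value = ET.fromstring(SAMPLE)
    print(first_text_only(value))   # only the trans-Isomer entry
    for item in all_list_items(value):
        print(item)                 # both entries, one per line
```

If the assumption holds, reading the tail text of each child (or serializing the whole value node) instead of `.text` alone should be enough to recover the complete list.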