smarter rendering of html Infoboxes #173

Open
j-rausch opened this issue Sep 12, 2018 · 3 comments

@j-rausch
Contributor

j-rausch commented Sep 12, 2018

Using the latest wtf_wikipedia with --html option generates HTML files which are missing the infoboxes.
For reference, see https://runkit.com/spencermountain/5b912901c133fe0012ebfc8f

@spencermountain
Owner

hey @j-rausch thanks, i've added this back in 5.3.0,
it is pretty weird though, and I'd like to get your input on it.
wikipedia does A LOT of post-hoc rendering-stuff for their infoboxes, and I think you'll see that there's some work left to do.
i'll leave this open

@spencermountain changed the title from "Infoboxes missing in generated html files" to "rich html Infobox rendering" on Sep 12, 2018
@j-rausch
Contributor Author

Thanks! Yeah, wow, it does seem pretty convoluted indeed..
Looking at https://en.wikipedia.org/wiki/Abraham_Lincoln, and how the infobox headers are generated from the wikitext source (https://en.wikipedia.org/w/index.php?title=Abraham_Lincoln&action=edit)

The header "Abraham Lincoln" is defined nowhere in the infobox markup:

{{Infobox officeholder
| image = Abraham Lincoln O-77 matte collodion print.jpg
| alt = An iconic photograph of a bearded Abraham Lincoln showing his head and shoulders.
...

I suppose it comes from the default parameter for name in https://en.wikipedia.org/wiki/Template:Infobox_officeholder:

Default
    The pagename

Or the header "The Lincoln Cabinet" for the second infobox:

{{Infobox U.S. Cabinet
| Name = Lincoln
| President = Abraham Lincoln
...

where https://en.wikipedia.org/wiki/Template:Infobox_U.S._Cabinet defines the header as The {{{Name}}} Cabinet.
Is there some resource available to easily resolve all these templates, or would this need to be done more or less manually @spencermountain ?
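
To make the "more or less manually" option concrete, here is a rough sketch of what a hand-maintained lookup could look like. Everything in it is an assumption for illustration: `headerRules` and `resolveHeader` are hypothetical names, not part of wtf_wikipedia, and the `{{{name}}}` pattern for Infobox officeholder is a guess based on the documented default (the pagename). Only the `The {{{Name}}} Cabinet` pattern comes from the template page itself.

```javascript
// Hypothetical, hand-written header rules per infobox template.
// Not wtf_wikipedia's API; just a sketch of the manual approach.
const headerRules = {
  // Assumed pattern: the docs only say `name` defaults to the pagename.
  'Infobox officeholder': {
    pattern: '{{{name}}}',
    defaults: { name: (pageName) => pageName },
  },
  // Pattern as described on Template:Infobox_U.S._Cabinet.
  'Infobox U.S. Cabinet': {
    pattern: 'The {{{Name}}} Cabinet',
    defaults: {},
  },
};

// Fill {{{param}}} placeholders from the infobox parameters,
// falling back to a per-template default when a parameter is missing.
function resolveHeader(templateName, params, pageName) {
  const rule = headerRules[templateName];
  if (!rule) return null; // unknown template: nothing to resolve

  return rule.pattern.replace(/\{\{\{\s*([^{}|]+?)\s*\}\}\}/g, (match, key) => {
    if (params[key]) return params[key];
    const fallback = rule.defaults[key];
    // Leave the placeholder untouched if there is no value and no default.
    return fallback ? fallback(pageName) : match;
  });
}

console.log(resolveHeader('Infobox U.S. Cabinet', { Name: 'Lincoln' }, 'Abraham Lincoln'));
// The Lincoln Cabinet
console.log(resolveHeader('Infobox officeholder', { image: 'x.jpg' }, 'Abraham Lincoln'));
// Abraham Lincoln
```

The obvious downside is that every infobox template needs its own entry, so this only scales to the most common ones.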

@spencermountain
Owner

yeah, i'm gonna take a slap at this today, but I could use some help.

wow, I didn't know it pulled information from the page. We are getting the page name now pretty-reliably from either the api, the dump-xml, or the first-sentence-bolding. I think we can incorporate it into the infobox reliably now.
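
Roughly, the fallback order for the page name might look like the sketch below. This is not the library's actual code: `pickPageName`, `firstBoldedText`, and the `apiTitle` / `xmlTitle` / `wikitext` fields are made-up names for illustration, and the bold detection is deliberately naive (it ignores bold-italic markup and templates).

```javascript
// Hypothetical helper: grab the first '''bolded''' run from wikitext.
// Naive on purpose; real first-sentence parsing is more involved.
function firstBoldedText(wikitext) {
  const match = wikitext.match(/'''(.+?)'''/);
  return match ? match[1] : null;
}

// Hypothetical helper: use the first source that yields a title.
// Order follows the comment above: api, then dump-xml, then bolding.
function pickPageName({ apiTitle, xmlTitle, wikitext }) {
  return apiTitle || xmlTitle || firstBoldedText(wikitext || '') || null;
}

console.log(pickPageName({ wikitext: "'''Abraham Lincoln''' was an American statesman." }));
// Abraham Lincoln
```

The result could then be handed to the infobox renderer as the default for a missing `name` parameter.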

yeah, lemme stew on this a little bit, then I'll set something up, and you can take it for a spin. It would be great if we could produce somewhat-good infoboxes as html

@spencermountain changed the title from "rich html Infobox rendering" to "smarter rendering of html Infoboxes" on Dec 31, 2018