Http_server: Fix images not downloading on some Portal pages (images sometimes not appearing) #686
Comments
Yeah, I don't think this is resolvable. I don't know of a way to identify all the images in these "revolving" templates. I remember running across this early on in a random enwiki page for India (it switched the image based on the time of day). The problem is that the hdump process loads a page only once, and if there is a "revolving" image template, only 1 of the many images will be downloaded. I could try scanning the raw template text, but that becomes extremely difficult, as you could get things like "{{random_template|Views of Geneva.jpg|Hn-caecilien66-web.jpg}}", which would need template parsing. For now, I'll leave this as a known issue in the backlog. Let me know if you have any other thoughts. Thanks
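To illustrate why scanning the raw template text only goes so far, here is a minimal sketch, not XOWA code. The class `NaiveTemplateScanDemo`, the method `fileArgs`, and the template name `some_other_template` are hypothetical, and the list of file extensions is an assumption. It picks up file names written literally as template arguments but cannot see a name that a nested template would produce, which is where real template parsing would be needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of XOWA: a naive scan of raw wikitext for
// literal image file names passed as template arguments.
class NaiveTemplateScanDemo {
    // Matches "|name.ext" when it is followed by another "|" or the closing "}}".
    // The extension list is an assumption made for this sketch.
    private static final Pattern FILE_ARG = Pattern.compile(
        "\\|\\s*([^|{}\\[\\]]+?\\.(?:jpe?g|png|gif|svg))\\s*(?=\\||\\}\\})",
        Pattern.CASE_INSENSITIVE);

    static List<String> fileArgs(String wikitext) {
        List<String> names = new ArrayList<>();
        Matcher m = FILE_ARG.matcher(wikitext);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        // Literal arguments are found by the scan.
        System.out.println(fileArgs(
            "{{random_template|Views of Geneva.jpg|Hn-caecilien66-web.jpg}}"));
        // A file name produced by a nested template is invisible to the scan.
        System.out.println(fileArgs(
            "{{random_template|{{some_other_template}}|Hn-caecilien66-web.jpg}}"));
    }
}
```

The second call shows the limitation: whatever the nested template would expand to never appears in the raw text, so a scan like this would still miss images.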
I have been doing a bit of digging and think I can explain the issue. Taking as an example It is not the randomness that is the cause (I believe). Generating from wikidata, the randomness potentially produces new images to 'download', the download process runs, and then the wikitext is processed a second time. This second pass potentially generates a different set of images, which do not go through another download, hence causing the process not to find a valid image.
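The two-pass behaviour described above can be sketched as a small simulation. This is an illustration under stated assumptions, not XOWA's implementation: `TwoPassMismatchDemo`, `render`, and `secondPassFindsImage` are hypothetical names, the file names are placeholders, and the "revolving" template is modelled as a pick from a fixed list.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.function.IntUnaryOperator;

// Hypothetical simulation, not XOWA code: the wikitext is rendered twice,
// but only the images seen in the first render are downloaded.
class TwoPassMismatchDemo {
    // Stand-in for a "revolving" template: picks one file from the candidates.
    // The picker receives the list size and returns an index.
    static String render(List<String> candidates, IntUnaryOperator picker) {
        return candidates.get(picker.applyAsInt(candidates.size()));
    }

    // Returns true when the image chosen in the second pass was downloaded
    // as a result of the first pass.
    static boolean secondPassFindsImage(List<String> candidates,
                                        IntUnaryOperator firstPick,
                                        IntUnaryOperator secondPick) {
        Set<String> downloaded = new HashSet<>();
        downloaded.add(render(candidates, firstPick));   // pass 1: download what was rendered
        String shown = render(candidates, secondPick);   // pass 2: wikitext processed again
        return downloaded.contains(shown);
    }

    public static void main(String[] args) {
        List<String> files = List.of("A.jpg", "B.jpg", "C.jpg", "D.jpg", "E.jpg");
        Random rnd = new Random();
        int trials = 10_000;
        int misses = 0;
        for (int i = 0; i < trials; i++) {
            if (!secondPassFindsImage(files, rnd::nextInt, rnd::nextInt)) {
                misses++;
            }
        }
        System.out.println(misses + " of " + trials + " second passes picked an undownloaded image");
    }
}
```

If the five candidates were equally likely and the two picks independent (an assumption about the template, not something confirmed in this thread), the second pass would land on an undownloaded file four times out of five.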
In principle, the second pass could be performed on the html generated in the first pass.
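A minimal sketch of that idea, assuming the first pass yields an HTML string: collect the image sources from the HTML that was actually generated, so the download list and the displayed page cannot disagree. `SecondPassOverHtmlDemo` and `imageSources` are hypothetical names and the regex is a deliberately simple scan, not how XOWA handles its html.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not XOWA code: derive the download list from the
// HTML produced by the first pass instead of re-processing the wikitext.
class SecondPassOverHtmlDemo {
    // Simple scan for src="..." inside <img ...> tags. It assumes
    // double-quoted attribute values and is not a full HTML parser.
    private static final Pattern IMG_SRC = Pattern.compile(
        "<img\\b[^>]*?\\bsrc=\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);

    static List<String> imageSources(String html) {
        List<String> sources = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            sources.add(m.group(1));
        }
        return sources;
    }

    public static void main(String[] args) {
        // Pretend this string is the output of the first (and only) wikitext render.
        String firstPassHtml =
            "<p><img alt=\"x\" src=\"file/B.jpg\" width=\"100\">"
            + "<a href=\"y\">text</a><img src=\"file/C.png\"/></p>";
        // Every source listed here is one the page will really display.
        for (String src : imageSources(firstPassHtml)) {
            System.out.println("queue for download: " + src);
        }
    }
}
```

Because the wikitext is rendered only once in this arrangement, any random choice made by a template is fixed before the download list is built.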
In light of the above comment, I have made some changes to a few files to implement this concept. The basic idea is that during the html construction when a file is not in the I have introduced a new function into Xow_hdump_mgr_load.java. The other change (a bit hacky) is in Xoh_file_wtr__basic.java. Please see attached
My apologies here. I missed the comments from 2 weeks ago when my email was weird. Thanks for the code files. I took a look at the attached rebuild.zip, and I think it won't handle the html static image dumps. Calling
I tried to debug this further on my side, but with the XOWA GUI and no image databases, all the images on
I believe that this behaviour is 'limited' to xowa-http. The changes I suggested above seem to work in xowa-http, but I forgot to check what impact there would be for xowa-gui (which I think does it a different way).
Attached is a version of Xoh_file_wtr__basic.java that takes account of the application mode.
Cool. Thanks for the updates. I'm running errands tomorrow, so won't get a chance to review till Thursday morning.
Hey, so I tried it today and couldn't reproduce it. Maybe this is something to do with your forked changes? Could you try with Thanks! Let's assume the XOWA root is something like
With the original issue, I had a forked change that shows the problem described (my version allows a 'Show preview' from the xowa-http side). However Have you had an opportunity to try to reproduce those ones?
Oops. I assumed the first comment was still related to the others. Sorry, my mistake. I should have read the others more closely.
I tried now with http://localhost:8080/en.wikipedia.org/wiki/Portal:Arts and see the issue. Let me re-review your commits and work on that next. Sorry again for not spending a bit more time on going through the other comments. I know how much time you spend on these issues, and the least I could have done was read a little more closely. Will work on this over the next few days. Thanks!
Added commit above. The approach is a bit different, as I ended up adding a new Also, FWIW, your approach was very clever. I didn't actually realize what you were doing until I re-reviewed your changes today. I think if I had to solve the same problem, I would not have come up with this approach, which is pretty sad considering I wrote both the hdump code. Anyway, nice job! Sorry again for the misunderstanding above, but thanks many more times for a great fix!
Having done some further experiments and builds, I have noticed a number of tweaks that need consideration. The second pass goes through an (almost) completely built html. This means there are some anchors (<a>) and image links (<img>) that have not needed to be considered before. I still cannot get enwikivoyage pagebanner images to 'download' properly. Today, however, I have just noticed (it's taken this long!) that the Categories section does not display at all. This is due to the fact that the generation of the Categories checks the Hdump status, which is now always on at that point, hence no Categories. In And changed
As described by @Ope30 in #680, the page de.wikipedia.org/wiki/Portal:Wikipedia_nach_Themen seems to be inconsistent in displaying images. I have seen this before in other wikis.
Taking this page as an example, the image to the right of Geographie is chosen 'randomly' from a list of 5 (in this case) images. The wikitext is:
If I take just the list of images
and cut out all the rest of the wikitext and replace with these files, when I Show preview (Vorschau zeigen), I get two images and three failures.