Missing images in assets folder #399

Open
3 tasks
khalilcodes opened this issue Mar 29, 2023 · 11 comments
@khalilcodes
Contributor

khalilcodes commented Mar 29, 2023

We used the https://github.com/flowershow/wordpress-to-markdown script to convert pages/posts to markdown files (with assets) for the lifeitself migration. Some of the converted pages link to images that are missing from the assets/images folder.

On further investigation, it turned out that ALL image paths, including external ones like <img src="https://i.imgur.com/OPrvrNS.png" />, were converted to relative paths such as ![](/assets/images/OPrvrNS.png) and the files downloaded to the assets/images folder. This has left the assets folder larger than necessary, with images that never needed to be stored locally.

Main issue

Although external images are rewritten to relative paths and normally downloaded, some images (for example those from https://artearthtech.wordpress.com) failed to download and therefore do not render on the page.

This issue was first found on the page https://lifeitself.org/blog/2019/01/22/ken-wilber-integral-spirituality, where the WordPress version had images referencing files on artearthtech.files.wordpress.com, e.g. https://artearthtech.files.wordpress.com/2020/03/ken-wilber-map.jpg?w=776. It was fixed for that page in commit abc765d.

Acceptance

  • Missing relative images are present in assets/images
  • Images are rendered on page

Tasks

  • Find and add images that are referenced in pages but missing from the assets/images folder

Notes

  • Could we fix the parsing of external images in https://github.com/flowershow/wordpress-to-markdown? (<1hr, tried and tested)
    • This would mean replacing all posts/pages (before 2023) with the fixed ones, which would overwrite any content modified in those pages since the migration. Not sure it is worth doing.
    • The pro is that it removes externally linked images that aren't needed from assets.
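
For reference, the behaviour described in the note above could be sketched roughly as follows. This is only an illustration in Python, not the actual code of the wordpress-to-markdown script: the function name `rewrite_image_src` and the `OWN_HOSTS` allow-list are hypothetical, and which hosts count as "ours" is an assumption.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical allow-list: hosts whose images we want copied into assets/images.
OWN_HOSTS = {"lifeitself.org", "artearthtech.files.wordpress.com"}


def rewrite_image_src(src: str, own_hosts=OWN_HOSTS) -> str:
    """Return a local /assets/images/ path for our own images, else src unchanged."""
    parsed = urlparse(src)
    if parsed.hostname not in own_hosts:
        # Third-party image (e.g. imgur) or already-relative path: leave it alone.
        return src
    # urlparse keeps "?w=776" in .query, so .path is the bare file path.
    filename = posixpath.basename(parsed.path)
    return f"/assets/images/{filename}"
```

With this sketch, the imgur URL from the description stays external, while the artearthtech.files.wordpress.com URL maps to /assets/images/ken-wilber-map.jpg with the ?w=776 query dropped.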

Status update 13 Sep 2023

Here is a list of all the image embeds that do not have a corresponding image file. Some of them just used the wrong extensions.

| Image path | Referencing file | Status |
| --- | --- | --- |
| assets/images/how-much-is-enough_2256291b.jpg | blog/2017/02/05/summary-how-much-is-enough-skidelsky-2012 | Missing |
| assets/images/man_walking.png | blog/2017/06/28/the-middle-way-what-were-about | ✅ Wrong extension used. Fixed. |
| assets/images/nafeeshamid.png | blog/2019/04/17/blind-spots-2-returning-to-mystery | ✅ Wrong extension used. Fixed. |
| assets/images/esteban-post02.jpeg | blog/2019/06/16/is-the-answer-to-our-tech-problems-another-app | ✅ Wrong extension used. Fixed. |
| assets/images/nafeeshamid.png | blog/2019/06/17/explaining-the-cognitive-triggers-for-extremist-violence-through-brain-scanning | ✅ Wrong extension used. Fixed. |
| assets/images/November-news-blindspots.jpg | blog/2019/11/30/2019-november-newsletter | Missing |
| assets/images/NOvember-news-future-hub.jpg | blog/2019/11/30/2019-november-newsletter | Missing |
| assets/images/November-news-dino.jpg | blog/2019/11/30/2019-november-newsletter | Missing |
| assets/images/November-news-Hannah.jpg | blog/2019/11/30/2019-november-newsletter | Missing |
| assets/images/NOvember-news-Liam.jpg | blog/2019/11/30/2019-november-newsletter | Missing |
| assets/images/November-news-wordle-two.png | blog/2019/11/30/2019-november-newsletter | Missing |
| assets/images/CAintegraltheoryblog.png | blog/2019/12/13/contemplative-activism-primer-the-pre-gathering-read | Missing |
| assets/images/2020-2-02-blog-aet-team-faces-window.JPG | blog/2020/01/24/ways-of-being-for-2020-aet-winter-sprint-january-2020 | Missing |
| assets/images/2020-02-02-blog-aet-ways-of-being.JPG | blog/2020/01/24/ways-of-being-for-2020-aet-winter-sprint-january-2020 | Missing |
| assets/images/2020-02-02-aet-focus-2020.JPG | blog/2020/01/24/ways-of-being-for-2020-aet-winter-sprint-january-2020 | Missing |
| assets/images/whatsapp-image-2020-08-19-at-14.46.51-1.jpg | blog/2020/09/10/more-than-just-bricks-and-mortar-bergerac-build-festival-2020 | Missing. Similar image name: whatsapp-image-2020-08-19-at-14.46.51-1-1 |

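Since several of the entries above turned out to be wrong extensions or near-identical names, a small helper could flag likely matches automatically. The following is a minimal sketch, not something that was actually run for this issue; `find_candidates` is a hypothetical name, and "same or longer file stem, ignoring case" is an assumed heuristic.

```python
from pathlib import Path


def find_candidates(missing: str, images_dir: Path) -> list[str]:
    """List files in images_dir whose stem equals or starts with the missing
    image's stem (case-insensitive), e.g. same name with another extension."""
    wanted = Path(missing).stem.lower()
    return sorted(
        p.name
        for p in images_dir.iterdir()
        if p.is_file() and p.stem.lower().startswith(wanted)
    )
```

For example, `find_candidates("assets/images/man_walking.png", Path("content/assets/images"))` would list a `man_walking.jpg` if one exists, and the WhatsApp entry would surface the similarly named `-1-1` file. An empty list suggests the image is genuinely missing.
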
@rufuspollock
Member

I've also noticed missing images elsewhere e.g. on https://lifeitself.org/hubs/berlin

See for comparison the old page: https://web.archive.org/web/20220520195709/https://lifeitself.us/hubs/berlin/ which has a lot more images.

Aside: it also has a nice hero image. I wonder if we can work out how to do pages like that ... (that's another issue)

@rufuspollock
Member

rufuspollock commented Apr 4, 2023

@khalilcodes

On further investigation it was found that ALL image paths including external ones like <img src="https://i.imgur.com/OPrvrNS.png" /> were converted to relative paths and downloaded to assets/images folder. This has led to having a larger assets folder with images that were not needed locally in the first place.

Actually, this was probably ok in most cases since these are our pictures and having them locally is good (though it would be nice, if easy to do, to know which external files got pulled locally).

@rufuspollock
Member

@khalilcodes is there a simple way for us to extract a list of all images linked from markdown (e.g. grep *.png and grep *.jpg), check which ones 404, and then fix those? That would be enough for now.
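
Something like the following would do that check locally. It is a rough sketch rather than a tested tool: the regex only covers plain `![](...)` embeds and `<img src="...">` tags, the function names are made up, and it assumes paths like /assets/images/x.png resolve relative to the content folder.

```python
import re
from pathlib import Path

# Matches markdown embeds ![alt](path "title") and HTML <img ... src="path">.
IMAGE_RE = re.compile(
    r'!\[[^\]]*\]\(([^)\s]+)[^)]*\)|<img[^>]+src=["\']([^"\']+)["\']',
    re.IGNORECASE,
)


def image_refs(markdown: str) -> list[str]:
    """Return every image path or URL referenced in a markdown string."""
    return [md or html for md, html in IMAGE_RE.findall(markdown)]


def missing_images(content_dir: Path) -> list[tuple[str, str]]:
    """Return (page, image) pairs where a local image file does not exist."""
    missing = []
    for page in sorted(content_dir.rglob("*.md")):
        for ref in image_refs(page.read_text(encoding="utf-8")):
            if ref.startswith(("http://", "https://")):
                continue  # external image: would need an HTTP check instead
            # Assumption: site-absolute paths are relative to content_dir.
            if not (content_dir / ref.lstrip("/")).is_file():
                missing.append((page.relative_to(content_dir).as_posix(), ref))
    return missing
```

Calling `missing_images(Path("content"))` from the repository root would then return one (page, image path) pair per broken local embed. External URLs are skipped here; checking those for 404s would need a separate HTTP request per URL.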

Re fixing this by re-running the updated script: let's not do that because, as you say, there is the issue of overwriting changes:

This would mean we'd have to replace all posts/pages (before 2023) with the fixed ones, but what about modified content in these pages if any. Not sure if worth doing.

@nathenf
Contributor

nathenf commented Apr 30, 2023

@khalilcodes @rufuspollock I have moved this into the next iteration. I am not sure of the status of this, so maybe one of you can update accordingly 😄

@rufuspollock
Member

@khalilcodes can you look at this briefly and see what the status is? It would be great to close this out.

@khalilcodes
Contributor Author

@rufuspollock running grep, I can see there are quite a few images that are missing and weren't downloaded. https://web.archive.org/web/20220520195709/https://lifeitself.us doesn't seem to open at my end. Is there an archive of all images there?

@rufuspollock
Member

@rufuspollock running grep I can see there are quite a few images that are missing and weren't downloaded. web.archive.org/web/20220520195709/https://lifeitself.us doesn't seem to open at my end. Is there an archive of all images there ?

The archive.org link was just illustrative, though I could imagine you could use it to identify the original link for missing images ...
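
One way to do that lookup in bulk is the Wayback Machine availability endpoint, which returns the closest archived snapshot for a given URL. The sketch below is illustrative only: the helper names are made up, and the response shape (`archived_snapshots` → `closest` → `url`) is as I understand the Internet Archive's documented JSON, so treat it as an assumption to verify.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

API = "https://archive.org/wayback/available"


def availability_url(original_url: str, timestamp: str = "") -> str:
    """Build the availability query; timestamp is YYYYMMDD[hhmmss], optional."""
    params = {"url": original_url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{API}?{urlencode(params)}"


def closest_snapshot(payload: dict) -> Optional[str]:
    """Extract the snapshot URL from a parsed response, or None if absent."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def lookup(original_url: str, timestamp: str = "") -> Optional[str]:
    """Query the endpoint over the network and return the snapshot URL, if any."""
    with urlopen(availability_url(original_url, timestamp), timeout=30) as resp:
        return closest_snapshot(json.load(resp))
```

For example, `lookup("https://lifeitself.us/hubs/berlin/", "20220520")` should return a web.archive.org URL if a snapshot exists, or `None` otherwise. The same call can be made with the original WordPress URL of an individual image.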

@khalilcodes
Contributor Author

@rufuspollock missing images added in PR #521. There are still some broken image links in the following pages; those images could not be found.

@rufuspollock
Member

@khalilcodes 👏

Re the 2 missing: were you able to find them in the archive.org/web versions? And can you link the archive.org/web versions of those pages that you found?

rufuspollock pushed a commit that referenced this issue May 15, 2023
… in jan.


This PR adds all images missing in assets folder and referenced in the markdown pages.

* Add all images into `content/assets/images` folder (198 images, ~95 MB)
@olayway
Contributor

olayway commented Sep 18, 2023

@rufuspollock, @nathenf, @laurenwigmore I've created a list of all the image embeds lacking corresponding image files, just to be sure (see the table in the description above). I've fixed some of them, but the rest are genuinely missing. Any hints on where I could find them?

@rufuspollock
Member

@olayway I note Khalil mentioned the November newsletter images were just missing entirely and not even on the really old WordPress site. For those, the only hope would be looking in the Wayback Machine.

Personally, I think this issue is low priority for now.

Status: ⌛ Someday Maybe
4 participants