This repository has been archived by the owner on Dec 14, 2023. It is now read-only.
Building on #602 and #593, we now want to figure out the requirements for allowing image-based analysis within topics. I reviewed all the old conversations I could find and collected these notes worth keeping in mind:
- people like the transfer-learning-based clustering approaches
- the same images are very often used, but cropped differently in different stories
- the social-sharing image (`og:image`) is often different from all the others
- the idea of "top-image" is conceptually helpful for analysis
- the image ResNet50 similarity stuff uses small (224px square) images
- people like the ability to see full size images
With regard to the back-end pipeline, I'd translate those notes into requirements and next steps like this:
- extract all the story-related image URLs from each story in a topic
  - should we make this optional at the topic level? perhaps to save cost?
  - decide whether to use Newspaper3k or roll-our-own
  - make sure we only extract them once for each story
  - decide whether deduplication is worth solving or not
- within a story, mark the "top image" and "social sharing image(s)"
  - create a DB table structure that allows for this
- store full size images and 224px size images by default
  - @pypt suggests an S3 store for this, re-using a solution we use for other things
  - do some tests to estimate ongoing cost and growth rate
- specify API endpoints for retrieval of said images
  - my first thought is to just add an `images` property to any topic story list results (that'd let us render image tree maps quickly)
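To make the extraction and marking steps above more concrete, here is a minimal sketch of the roll-our-own option using only the Python standard library. Everything named here is hypothetical: `ImageURLParser`, `extract_story_images`, and the `is_top` / `is_social` fields are illustrations, not an existing schema, and treating the first `<img>` in the page as the "top image" is a placeholder heuristic rather than a decision. It de-duplicates by exact URL only, so it would not catch the same image cropped differently.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageURLParser(HTMLParser):
    """Collects <img src> URLs and og:image meta URLs from one story's HTML."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.img_urls = []     # <img> URLs in document order, no exact repeats
        self.social_urls = []  # og:image URLs

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            self._add(self.img_urls, attrs["src"])
        elif (tag == "meta" and attrs.get("property") == "og:image"
              and attrs.get("content")):
            self._add(self.social_urls, attrs["content"])

    def _add(self, bucket, url):
        # Resolve relative URLs against the story URL before comparing.
        absolute = urljoin(self.base_url, url)
        if absolute not in bucket:
            bucket.append(absolute)


def extract_story_images(story_url, html):
    """Return one record per distinct image URL found in the story.

    Assumption: the first <img> in the page is flagged as the top image.
    """
    parser = ImageURLParser(story_url)
    parser.feed(html)

    records = {}
    for url in parser.img_urls:
        records[url] = {"url": url, "is_top": False, "is_social": False}
    for url in parser.social_urls:
        record = records.setdefault(
            url, {"url": url, "is_top": False, "is_social": False})
        record["is_social"] = True
    if parser.img_urls:
        records[parser.img_urls[0]]["is_top"] = True
    return list(records.values())
```

The returned list is roughly what an `images` property on a topic story result could carry, and the same fields could map to columns in whatever table we end up with.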
A separate task is to design an approach to automatically train an image-embeddings model based on the ResNet50 transfer learning approach we learned from Leon (for each snapshot). I think that still needs investigating and research work, particularly on which similarity algorithm to use and on what to present to users to support research. Sometimes they say they want his "mosaics", but other times it seems they want clusters.
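As one possible starting point for the similarity question, here is a small sketch of cosine similarity plus a greedy threshold grouping over image embeddings. This is not Leon's method and not a recommendation: the function names, the greedy strategy, and the `0.9` threshold are all assumptions for illustration, and short toy vectors stand in for real ResNet50 embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def greedy_clusters(embeddings, threshold=0.9):
    """Group image ids whose embedding is close to a cluster's first member.

    embeddings: dict mapping image id -> embedding vector.
    threshold: hypothetical cut-off; it would need tuning on real data.
    """
    clusters = []  # list of (seed_vector, [image ids])
    for image_id, vector in embeddings.items():
        for seed, members in clusters:
            if cosine_similarity(seed, vector) >= threshold:
                members.append(image_id)
                break
        else:
            # No existing cluster is similar enough, so start a new one.
            clusters.append((vector, [image_id]))
    return [members for _, members in clusters]
```

A grouping like this leans toward the "clusters" presentation; a "mosaic" would more likely need a 2D layout of the embeddings, which this sketch does not attempt.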
What did I miss? Thoughts on these requirements?