Search: Change Newest Pages to Newest Posts, comprising only pages which appear in a web feed #74
Comments
Note that the logic for picking the "best" web feed for a site has just been changed (see #77). This means it'll be 12 Nov 2022 before all sites are updated with the new "best" feed, and 10 Dec 2022 before all sites are indexed with the in_web_feed flag for the new "best" feed. There should be enough data to start testing in early/mid Nov and to deploy around late Nov though. Note also that, as per #77, a lot of sites only have feeds which aren't very useful, e.g. auto-generated feeds for specific tags which the site owner might not even be aware exist, so the in_web_feed flag might not be as useful as initially hoped. Will therefore need to do some side-by-side comparisons of the Newest Pages/Posts with and without the in_web_feed flag before confirming whether or not to make this change.
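A side-by-side comparison like the one mentioned above could be sketched roughly as follows. This is only an illustration: the function name, the result keys and the example URLs are all hypothetical, and it assumes each listing is available as a plain list of page URLs.

```python
def compare_listings(with_published_date, with_in_web_feed):
    """Split two listings into URLs unique to each and URLs common to both.

    Order within each result follows the order of the input lists.
    """
    published_set = set(with_published_date)
    feed_set = set(with_in_web_feed)
    return {
        "only_published_date": [u for u in with_published_date if u not in feed_set],
        "only_in_web_feed": [u for u in with_in_web_feed if u not in published_set],
        "in_both": [u for u in with_published_date if u in feed_set],
    }


# Made-up example data, not taken from the real index.
current = ["https://example.com/", "https://example.com/blog/post-1"]
proposed = ["https://example.com/blog/post-1", "https://example.com/blog/post-2"]

diff = compare_listings(current, proposed)
for name, urls in diff.items():
    print(name, urls)
```

Pages which only appear under the published_date filter are the candidates for non-article content, while pages which only appear under in_web_feed are the ones the change would add.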
Some stats:
Replacing "published_date:*" with "in_web_feed:true" in mandatory_filter_queries_newest does shorten the list, perhaps too much for now. I've made another change to improve the web_feed detection (and therefore the number of pages identified as in_web_feed) as per #54, so will revisit once that has had time to take effect.
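For illustration, the swap could look something like the sketch below. It assumes mandatory_filter_queries_newest is a simple list of Solr filter query strings, which may not match how it is actually stored in the project; swap_filter is a hypothetical helper, and the q=*:* parameter is only there to make the example query complete.

```python
from urllib.parse import urlencode


def swap_filter(filter_queries, old, new):
    """Return a copy of filter_queries with every occurrence of old replaced by new."""
    return [new if fq == old else fq for fq in filter_queries]


# Assumed shape of the setting; the real structure may differ.
mandatory_filter_queries_newest = ["published_date:*"]

updated_filter_queries = swap_filter(
    mandatory_filter_queries_newest, "published_date:*", "in_web_feed:true"
)

# Each filter query becomes its own fq parameter on the Solr request.
params = [("q", "*:*")] + [("fq", fq) for fq in updated_filter_queries]
query_string = urlencode(params)
print(query_string)
```

Returning a new list rather than editing in place makes it easy to run both versions of the query for comparison.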
Latest stats:
So in theory more articles should appear if switching from published_date to in_web_feed. However, there are a number of issues with this approach:
Rename Newest Pages to Newest Posts, and change the filter which generates them from the current fq=published_date:* (which lets some non-article content through, e.g. the PostgreSQL home page whenever it is updated) to fq=in_web_feed:true.
This aims to make Newest Posts more of an article feed, and therefore hopefully more useful, especially when combined with #34 which should mean full listings have new posts added daily.
Note that #71 has only just been implemented, and as per the latest comment on #64 the web feed field has been renamed, so it could take up to 8 weeks for all the pages which are in a web feed to be identified, i.e. 4 weeks for the web feeds to be populated in the new web_feed field, and a further 4 weeks for all the in_web_feed fields to be populated.
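The 4 + 4 week timeline can be worked through with a small sketch. The 4 week interval is taken from the text above; the function name and the start date are purely illustrative and not the actual date of the change.

```python
from datetime import date, timedelta

# Interval taken from the "4 weeks" figures above.
REINDEX_INTERVAL = timedelta(weeks=4)


def rollout_milestones(change_date):
    """Return (date web_feed is populated, date in_web_feed is populated)."""
    feeds_populated = change_date + REINDEX_INTERVAL
    flags_populated = feeds_populated + REINDEX_INTERVAL
    return feeds_populated, flags_populated


# Arbitrary example start date, chosen only to show the arithmetic.
feeds_done, flags_done = rollout_milestones(date(2022, 1, 1))
print(feeds_done.isoformat(), flags_done.isoformat())
```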