Indexing: Add new field to indicate if a page is included in a web feed #71

Closed
m-i-l opened this issue Sep 24, 2022 · 1 comment
Labels
enhancement New feature or request

m-i-l commented Sep 24, 2022

This could be useful to e.g.:

It could be implemented via a new in_web_feed boolean field in the Solr schema, possibly combined with the existing web_feed attribute (currently only set on the home page, as per #64) to indicate which feed the article is from. This would first require a list of all the pages in a site's feed, which is also needed for identifying new pages for #34 and could be implemented alongside #54.
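
As a rough illustration of the schema change, the sketch below builds the JSON body that Solr's Schema API accepts for an `add-field` command. It assumes a managed schema that already defines a field type named `boolean`; the helper name `add_field_command` and the `indexed`/`stored` settings are hypothetical choices, and the project may well edit its schema file directly instead.

```python
import json


def add_field_command(name: str, field_type: str = "boolean") -> str:
    """Build the JSON body for a Solr Schema API add-field command.

    Assumption: the target schema defines a field type called "boolean".
    """
    command = {
        "add-field": {
            "name": name,
            "type": field_type,
            "indexed": True,  # assumed: the field should be searchable/filterable
            "stored": True,   # assumed: the field should be returned in results
        }
    }
    return json.dumps(command)


# The resulting string would be POSTed to the collection's /schema endpoint.
payload = add_field_command("in_web_feed")
```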

m-i-l added the enhancement (New feature or request) label Sep 24, 2022
m-i-l changed the title from "Indexing: Add new field to indicate if a page is included in an RSS feed" to "Indexing: Add new field to indicate if a page is included in a web feed" Sep 25, 2022
m-i-l added a commit that referenced this issue Oct 2, 2022
…ge is included in a web feed. Also updated comments.
m-i-l added a commit that referenced this issue Oct 2, 2022
…new field to indicate if a page is included in a web feed, and added support for #34 Add an incremental reindex

m-i-l commented Oct 2, 2022

Implemented:

  • The web feed is parsed at the start of indexing, as per Indexing: Crawl from web feed (RSS/Atom) #54, and a list of the links in the feed is held in memory while the site is indexed.
  • Whenever a page is indexed, it is checked to see if it is in that list, and if it is, the in_web_feed field is populated in Solr.
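
The two steps above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: it uses the standard library XML parser, the function names `extract_feed_links` and `build_solr_doc` are hypothetical, URLs are compared by exact string match with no normalisation, and setting the field to False for pages outside the feed is an assumption.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"


def extract_feed_links(feed_xml: str) -> set[str]:
    """Return the set of page URLs listed in an RSS 2.0 or Atom feed."""
    root = ET.fromstring(feed_xml)
    links: set[str] = set()
    # RSS 2.0: each <item> carries its URL as the text of a <link> child.
    for item in root.iter("item"):
        url = item.findtext("link")
        if url:
            links.add(url.strip())
    # Atom: each <entry> has <link href="..."/>; a missing rel means "alternate".
    for entry in root.iter(ATOM_NS + "entry"):
        for link in entry.findall(ATOM_NS + "link"):
            href = link.get("href")
            if href and link.get("rel", "alternate") == "alternate":
                links.add(href.strip())
    return links


def build_solr_doc(url: str, feed_links: set[str]) -> dict:
    """Build a (hypothetical) Solr document with the in_web_feed flag set."""
    return {"url": url, "in_web_feed": url in feed_links}
```

Holding the links in a set keeps the per-page check cheap, which matters because the lookup runs once for every page indexed on the site.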

Note that, as usual, the new field won't be fully populated until all sites are reindexed. Note also that, because the web feed URL is itself populated by indexing, a site's first index won't have the feed URL available to seed the indexing, so in_web_feed won't be populated until a new site's second index.

The 3 bullet points listed above for features enabled by this are tracked separately:
