Search: Boost relevance scores for pages which appear in a web feed #73

@m-i-l

Because the crawler works by simply following links, there is still quite a bit of noise in the index. For example, some people self-host git repositories on the same domain, or host various test sites, and all of these are crawled and added to the index despite not containing any content likely to be useful in a search result. (Ideally site owners would edit their robots.txt, or use Manage Site to configure what is crawled, to help keep the index clean.)

There are other issues too, e.g. landing pages such as /posts/ are indexed and then returned in the results when searching for a term in the title, as described in #66.

A partial solution is to boost results that appear in a web feed, because inclusion in a feed is likely to be a useful relevance signal. As per #71, there is now a flag on all content indicating whether or not it is part of a web feed.
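A minimal sketch of one way such a boost could be applied, assuming a Solr backend queried with the edismax parser. The field name `in_web_feed` and the boost factor of 2.0 are hypothetical placeholders, not the actual schema from #71. The `bq` (boost query) parameter adds score to documents that match it without filtering out the rest, so pages that are not in a feed are still returned, just ranked lower.

```python
from urllib.parse import urlencode

# Hypothetical name for the web feed flag added in #71; the real field may differ.
FEED_FIELD = "in_web_feed"


def build_search_params(user_query: str, feed_boost: float = 2.0) -> dict:
    """Build Solr edismax request parameters that favour pages in a web feed.

    Assumption: the index is searched via Solr's edismax query parser.
    """
    return {
        "q": user_query,
        "defType": "edismax",
        # Boost query: documents with the flag set get extra score,
        # while documents without it still match the main query.
        "bq": f"{FEED_FIELD}:true^{feed_boost}",
    }


if __name__ == "__main__":
    params = build_search_params("solr relevancy")
    # Query string that could be appended to a /select request URL.
    print(urlencode(params))
```

An additive `bq` is only one option; edismax also accepts a multiplicative `boost` parameter, and either way the factor would need tuning against real queries so that feed pages rise without drowning out better textual matches.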

Labels: enhancement (New feature or request)