WebSearch: new option to include title words in the record URLs #71

Closed
tiborsimko opened this Issue · 4 comments

3 participants

@tiborsimko
Owner

Originally on 2010-05-26

It may help moderately with search engine rankings if the URLs of
detailed record pages contain important words from the title. That is,
we would put the title not only into the page title and the meta header
section, as we already do, but also into the URL.

To achieve this, we can introduce a new config variable named like
CFG_WEBSTYLE_DETAILED_RECORD_LINKS that would have values like:

  • 0 (=normal style):

    http://site.com/record/32
    http://site.com/record/32/holdings/

  • 1 (=embed titles in URLs):

    http://site.com/record/32-basic-nuclear-electronics
    http://site.com/record/32-basic-nuclear-electronics/holdings/
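
To make the two styles concrete, here is a minimal sketch of a URL builder. The function name `record_url` and its parameters are hypothetical; only the config variable name and the URL shapes come from the proposal above.

```python
# Proposed config variable; 0 = normal style, 1 = embed titles in URLs.
CFG_WEBSTYLE_DETAILED_RECORD_LINKS = 1


def record_url(site, recid, slug, style, subpage=""):
    """Build a detailed record URL in the requested style (hypothetical helper)."""
    url = "%s/record/%d" % (site, recid)
    if style == 1 and slug:
        # Style 1: append the title words after the record ID and a dash.
        url += "-" + slug
    if subpage:
        url += "/" + subpage + "/"
    return url


print(record_url("http://site.com", 32, "basic-nuclear-electronics",
                 CFG_WEBSTYLE_DETAILED_RECORD_LINKS, "holdings"))
```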

Here, for simplicity, the URL dispatcher can still treat only the record
ID as significant when deciding about the dispatch, so it could ignore
any text coming after the record ID and a dash. Or else it could use
that text to fuzzy-check the title. The latter may be interesting for
the lets-provide-meaningful-URLs use case discussed elsewhere (e.g. DOI
instead of recID).
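
A rough sketch of the simple variant, where the dispatcher keeps only the record ID as significant. The pattern and the name `parse_record_path` are assumptions for illustration, not existing dispatcher code.

```python
import re

# Hypothetical pattern: record ID, optional "-title-words", optional sub-path.
RECORD_PATH = re.compile(
    r"^/record/(?P<recid>\d+)(?:-(?P<slug>[^/]*))?(?P<rest>/.*)?$"
)


def parse_record_path(path):
    """Return (recid, slug, rest), or None if the path is not a record URL."""
    match = RECORD_PATH.match(path)
    if match is None:
        return None
    # Only recid is needed for dispatching; slug is kept for optional checks.
    return (
        int(match.group("recid")),
        match.group("slug") or "",
        match.group("rest") or "",
    )
```

With this, `/record/32/holdings/` and `/record/32-basic-nuclear-electronics/holdings/` both dispatch to record 32 with the sub-path `/holdings/`.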

@kaplun
Collaborator

Originally on 2010-06-16

Indeed as discussed IRL, we should probably raise a 404 when the title used is wrong (to avoid misuses, e.g. for SPAM purposes).

This can be implemented via a tmpl_ function so that the final admin would be able to use whatever algorithm to produce the semantic part.

A possible default implementation might be to take the 4 longest words in the title and use them in order of appearance (e.g.

"Search for the minimal universal extra dimension model at the LHC with √s = 7 TeV"

would become

"search-minimal-universal-dimension"

)
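
A minimal sketch of that default, assuming words are runs of ASCII letters and digits and that ties in length go to the word that appears first. The name `title_slug` is hypothetical; in practice this would live in a tmpl_ function so admins can override it.

```python
import re


def title_slug(title, max_words=4):
    """Join the `max_words` longest title words, in order of appearance."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    # sorted() is stable, so among equally long words the earlier one wins.
    by_length = sorted(range(len(words)), key=lambda i: -len(words[i]))
    keep = sorted(by_length[:max_words])  # restore title order
    return "-".join(words[i] for i in keep)


print(title_slug("Basic Nuclear Electronics"))
```

Titles with fewer than four words simply use all of them, so "Basic Nuclear Electronics" gives `basic-nuclear-electronics`.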

Moreover, a function should be implemented to check that all these words are actually part of the title. A problem would arise if the record has been modified. In that case, previous versions of the record should be checked. This would be computationally heavy, but would happen rarely.
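
The check could look roughly like this. It is a sketch only: `slug_matches` is a hypothetical name, and `previous_titles` stands in for whatever the record history would provide. Older titles are consulted only when the current one does not match, which keeps the heavy path rare.

```python
import re


def _words(text):
    """Lower-cased set of ASCII letter/digit runs in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def slug_matches(slug, current_title, previous_titles=()):
    """True if every word of the slug occurs in the current or an older title."""
    slug_words = _words(slug)
    if not slug_words:
        # Assumption: a plain /record/32 URL (empty slug) is always valid.
        return slug == ""
    # Current title first; previous versions are checked only if it fails.
    for title in (current_title, *previous_titles):
        if slug_words <= _words(title):
            return True
    return False
```

A caller would raise a 404 when this returns False.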

@jirikuncar
Owner

Originally by @jeromecaffaro on 2011-05-30

As discussed IRL, raising 404 when title does not match is problematic for cases where the title has been updated. If we still want to resolve previous URLs to the record (but avoid misuses) we would have to a) check the titles in the history of the record or b) keep a list of resolved URLs. Alternatively we can c) accept any string (even bad ones) but immediately redirect to the canonical URL so that misuses are less visible.
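
Option c) could be sketched as follows. The helper name `canonical_redirect` is hypothetical, and using a permanent (301) redirect is an assumption; a temporary redirect would work the same way.

```python
def canonical_redirect(recid, slug, canonical_slug, rest=""):
    """Return None if the URL is already canonical, else (status, location)."""
    if slug == canonical_slug:
        return None  # serve the page as-is
    # Any other string is accepted, but the visitor is sent straight to the
    # canonical URL, so a misused title never stays in the address bar.
    location = "/record/{}-{}{}".format(recid, canonical_slug, rest)
    return 301, location
```

This needs neither the record history nor a list of resolved URLs, at the cost of not rejecting bad strings outright.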

@tiborsimko
Owner

Do we want to address this in next?

@kaplun
Collaborator

I think it is still a cool functionality to have and is rendered easier to implement with Flask. So it could be nice to still keep it as a wanna-implemented-feature in case one is bored and idle :-)

@jirikuncar jirikuncar added this to the someday milestone
@jirikuncar jirikuncar added r_someday and removed in_triage labels
@jirikuncar jirikuncar modified the milestone: someday
This issue was closed.