Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Periods (.) not stripped from page name #1305

Open
adrianbj opened this issue Jan 15, 2021 · 9 comments
Open

Periods (.) not stripped from page name #1305

adrianbj opened this issue Jan 15, 2021 · 9 comments

Comments

@adrianbj
Copy link

adrianbj commented Jan 15, 2021

Short description of the issue

A client just entered this title for a page:

Here.After by John Smith

and it resulted in this page name:

here.after-by-john-smith

Expected behavior

The period would be replaced by a -

Actual behavior

The period remained in the page name.

@ryancramerdesign
Copy link
Member

ryancramerdesign commented Jan 15, 2021 via email

@adrianbj
Copy link
Author

I know they are technically valid for a URL, but I didn't realize PW kept them, because it automatically strips them when they are at the end of a sentence.

I find the inconsistency a bit strange but if that is intended, then perhaps the description for the name field, which is currently:

Any combination of letters (a-z), numbers (0-9), dashes or underscores (no spaces).

should be modified to say that periods are also allowed.

@teppokoivula
Copy link

It's a relatively common use case to use dots in page names or URL segments to fake files, i.e. sitemap.xml, theme.css, and so on. Changing this could break a lot of old instances 🙂

This is the first time I've noticed that the description is inaccurate, but seconded; that should be updated.

@adrianbj
Copy link
Author

I have noted this over here: #1160 (comment) as well, but just to follow up here - how can you use the $sanitizer API to generate the name when using the Sanitizer::translate option removes periods, but if you don't use that option, then you get extra unwanted dashes.

@adrianbj
Copy link
Author

adrianbj commented Mar 1, 2021

Sorry to come back to this, but I am seeing page names like this:

coral-reef-condition-a-status-report-for-the-u.s-virgin-islands

which I think just looks ugly and it's also inconsistent, because the title of the page is:

Coral reef condition: A status report for the U.S. Virgin Islands

so, it's removing the period after the "S", but not after the "U"

It really should be:

coral-reef-condition-a-status-report-for-the-us-virgin-islands

What I ended up doing is setting these:

$config->pageNameCharset = 'UTF8';
$config->pageNameWhitelist = '-abcdefghijklmnopqrstuvwxyz0123456789';

In addition to removing the period from the whitelist, I also removed the underscore (which I think also makes sense).

Anyway, this seems to be working OK like this - it results in u-s which isn't ideal so I would suggest that the core should replace "." with nothing, but ". " with a dash.

@Toutouwai
Copy link

Toutouwai commented Mar 1, 2021

Just seeking some clarification...

@adrianbj, if I understand right, this issue is specifically focused on when PW derives a page name from a page title? That is, you're not proposing changes to the characters that are allowed in a page name, or to have $sanitizer->pageName() filter out periods. And the $config->pageNameWhitelist you mentioned above is a personal workaround rather than a suggestion for changes to the core. Is that correct?

I have lots of pages with names along the lines of what @teppokoivula described, so I definitely wouldn't want to see periods disallowed from page names across the board. But I agree that there is room for improvement (and possibly hooks/settings to allow individual customisation) for how PW derives page names from titles. The two cases of this that I can think of are:

1. The JS that fills out the Name field according the Title field value in Page Add.
2. The code that populates the page name when a new page is created via the API with a title but no name supplied.

Maybe there could be a new $config->pageNameFromTitleWhitelist value that could inform 1 and 2 above? And/or a hookable $sanitizer->pageNameFromTitle() method?

So that periods can be used in explicitly set page names but (optionally) ignored when names are derived from titles.

@adrianbj
Copy link
Author

adrianbj commented Mar 1, 2021

Hi @Toutouwai - I think you have my thoughts correct:

  1. I am not suggesting that periods should be disallowed, but I think they should be removed when automatically generating the name from the title because they look awful when they are out of place.

  2. Yes, my use of $config->pageNameWhitelist is a personal workaround - definitely not a change for the core.

So that periods can be used in explicitly set page names but (optionally) ignored when names are derived from titles.

Exactly!

@ryancramerdesign
Copy link
Member

@adrianbj See the page name configuration at Modules > Configure > InputfieldPageName. That's where you can specify any installation-specific conversions you want it to perform such as this. In your case, you would enter this on on a line to make it substitute a dash where it would usually have a period:

.=-

@adrianbj
Copy link
Author

@ryancramerdesign - yes, that could also help a little, but my modification of pageNameWhitelist is also doing that.

What are your thoughts about this comment I made:

coral-reef-condition-a-status-report-for-the-u.s-virgin-islands
which I think just looks ugly and it's also inconsistent, because the title of the page is:
Coral reef condition: A status report for the U.S. Virgin Islands
so, it's removing the period after the "S", but not after the "U"

If PW is treating a period with a space after it differently to a period with another character after it, then perhaps there could be some further improvements. Maybe only keep periods if they are before the last section of characters with no more spaces so that things like my-page.html could still work, but it would automatically take care of u.s. in a title.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants