Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slugs: child of hierarchical pages gives 404 #1180

Closed
herrvigg opened this issue Jun 4, 2022 · 40 comments
Closed

Slugs: child of hierarchical pages gives 404 #1180

herrvigg opened this issue Jun 4, 2022 · 40 comments

Comments

@herrvigg
Copy link
Collaborator

herrvigg commented Jun 4, 2022

@spleen1981 I tested with a couple of parent-child pages to try debugging the problems after QTS migration encountered by @bagaweb. However even before migration, with commit 3a54f9e I don't manage to make it work:

  • parent OK localhost://h1-en/
  • child 404 page not found with an URL like localhost://h1-en/h2-en/ 😕
  • If I save the child without parent, then it works as localhost://h2-en/

The pages were created with EN titles hierarchical parent EN / hierarchical child EN and equivalent for FR.
I set only the slugs for EN and FR, the other languages inherited automatic slugs as shown below. All look fine from the admin side.

With a breakpoint in filter_request I get this in the resulting $query for the /h1-en/h2-en/ request:

array (
  'pagename' => 'hierarchical-parent-en/hierarchical-child-en',
  'page' => '',
  'attachment' => 'h2-en',
)

The languages URL seem to be correctly set in

   'current_url' => 
  array (
    'en' => 'http://localhost/h1-en/h2-en/',
    'fr' => 'http://localhost/fr/h1-fr/h2-fr/',
    'de' => 'http://localhost/de/english-hierarchical-parent-en/english-hierarchical-child-en/',
    'it' => 'http://localhost/it/english-hierarchical-parent-en/english-hierarchical-child-en/',
    'es' => 'http://localhost/es/english-hierarchical-parent-en/english-hierarchical-child-en/',
  ),
))

This looks pretty good, but I'm not sure about the pagename in the query.
Where to look next?

@herrvigg
Copy link
Collaborator Author

herrvigg commented Jun 4, 2022

I noticed there's a get_page_uri defined in QTX_Slugs, but it's not used everywhere. In two places it's the WP function get_page_uri (same name) that is called. It's unclear which one should be used.

If i replace the two WP calls with $this->get_page_uri the pagename returned by filter_request/query is different:

array (
  'pagename' => 'h1-en/h2-en',
  'page' => '',
  'attachment' => 'h2-en',
)

Unfortunately this doesn't solve the issue. It's quite unclear what the pagename means for slugs (hierarchical page or not).

@spleen1981
Copy link
Contributor

@herrvigg result from WP get_page_uri:
array ( 'pagename' => 'hierarchical-parent-en/hierarchical-child-en', 'page' => '', )
is what is needed there (in filter_request), because it's the original slugs not modified by the module. $this->get_page_uri returns slugs already modified.
Not sure where the 'attachment' => 'h2-en', comes from in your case; except this, everything looks good in the first post.
While playing with those get_page_uri I got stucked in 404 as well, even going back to HEAD (I'm testing with latest commit, after migration).
To restore I had to empty wp-content/cache folder. Can you try the same?
I'm doing exactly the same tests as yours except my site runs in a subfolder (e.g. http://localhost:8080/test/h1-en/h2-en/) and it's working.

@herrvigg
Copy link
Collaborator Author

herrvigg commented Jun 5, 2022

I don't have wp-content/cache. Could that be from another plugin?

Strange you don't have the attachment set. I've just been using the Page Attribute to set the parent.

image

@herrvigg
Copy link
Collaborator Author

herrvigg commented Jun 5, 2022

If I set attachment to empty string it works!

@herrvigg
Copy link
Collaborator Author

herrvigg commented Jun 5, 2022

The problem is happening in parse_request from wp-includes/wp-class.php.

Line 236-251

if ( $wp_rewrite->use_verbose_page_rules && preg_match( '/pagename=\$matches\[([0-9]+)\]/', $query, $varmatch ) )
<do some checks...>
  // Got a match.
  $this->matched_rule = $match;
  break;

The match looks correct, but not the query. It doesn't pick the correct one.

match: "[^/]+/([^/]+)/?$"
query: "index.php?attachment=$matches[1] "

Can you track back which query you get in your case?

@herrvigg
Copy link
Collaborator Author

herrvigg commented Jun 5, 2022

I suspect a problem with the check on get_page_by_path that should give a different query before it picks the one with attachment.

@herrvigg
Copy link
Collaborator Author

herrvigg commented Jun 5, 2022

Before the query with attachment, there's this one coming:

index.php?pagename=$matches[1]&page=$matches[2]

There's a call to get_page_by_path( 'h1-en/h2-en' ) -> null so it ignores it and picks the attachment after that.

@herrvigg
Copy link
Collaborator Author

herrvigg commented Jun 5, 2022

@bagaweb To understand if your issue is the same, could you add these two lines in file modules/slugs/src/slugs-class-slugs.php:

filter_request($query)

// (seek around lines 388)
update_post_caches( $cache_array, 'page' ); // caching query :)
wp_cache_delete( 'qts_page_request' );

var_dump($query);             // <--- debug info
$query['attachment'] = '';    // <--- potential workaround

$query['pagename'] = get_page_uri( $page );
$function          = 'get_page_link';

@spleen1981
Copy link
Contributor

parse_request from wp-includes/class-wp.php will not find any post with get_page_by_path, as those slugs are not the real one but those overridden by slugs module.
Later query vars are filtered with slugs module class method query_vars, where new search is performed and $wp->request $wp->matched_rule $wp->matched_query are overwritten as needed.
This method uses $this->get_page_by_path instead of WP one, which calls $this->get_page_id_by_path which is suppose to provide the correct post object given the modified slugs.
So I would check what is going on in $this->query_vars and $this->get_page_id_by_path with your test.

@herrvigg
Copy link
Collaborator Author

herrvigg commented Jun 5, 2022

In slugs $this->get_page_id_by_path returns the correct page. The problem is that the $query has already been set earlier with this attachment, before query_vars is called, and it remains there. I'm not sure that removing it would always work.

What query do you have in your case, that you will find in $wp->matched_query, and why is it different? It seems dependent on the order of the rewrite rules.

@spleen1981
Copy link
Contributor

If you mean $query var, before the filter in class-wp is executed is "index.php?attachment=$matches[1]&embed=true", but I guess that is the residual value from the previous loop and should not matter.
At this point:

  • $this->request="h1-it/h2-it"
  • $this->matched_rule=""
  • $this->matched_query=""

After the filter is applied:

  • $this->request="h1-it/h2-it"
  • $this->matched_rule="(.?.+?)(?:/([0-9]+))?/?$"
  • $this->matched_query="pagename=h1-it%2Fh2-it&page="

which is the expected value.

Hence the thing seems to rely on that filter.

@herrvigg
Copy link
Collaborator Author

herrvigg commented Jun 5, 2022

OK I think I start to understand how slugs works... it's an absolute hack through query_vars... to patch the $wp object between different calls. To be done in a clean way all of this should be done in WP get_page_by_path through a filter - that doesn't exist yet.

WP state before query_vars filter

matched_rule = "[^/]+/([^/]+)/?$"
matched_query = "attachment=h2-en"
query = array (
  'attachment' => 'h2-en',
)

State after query_vars filter (hacked by slugs)

matched rule ="(.?.+?)(?:/([0-9]+))?/?$"
mached_query  = "pagename=h1-en%2Fh2-en&page="

But after that the query_vars are updated, the attachment comes through $perma_query_vars that was set before calling query_vars filter.

Finally the request filter is called with query_vars coming from this WP context:

WP::__set_state(array(
   'public_query_vars' => 
  array (
    0 => 'm',
    1 => 'p',
    ...
    50 => 'sitemap-stylesheet',
  ),
   'private_query_vars' => 
  array (
    0 => 'offset',
    1 => 'posts_per_page','
    ...
    25 => 'fields',
  ),
   'extra_query_vars' => 
  array (
  ),
   'query_vars' => 
  array (
    'attachment' => 'h2-en',
  ),
   'query_string' => '',
   'request' => 'h1-en/h2-en',
   'matched_rule' => '(.?.+?)(?:/([0-9]+))?/?$',
   'matched_query' => 'pagename=h1-en%2Fh2-en&page=',
   'did_permalink' => true,
))

The query_vars no longer matches the matched_query that was hacked by slugs.

@herrvigg
Copy link
Collaborator Author

herrvigg commented Jun 5, 2022

What is surprising, in your case your WP context before query_vars is different, in your case

matched_rule=""
matched_query=""

In my case

matched_rule = "[^/]+/([^/]+)/?$"
matched_query = "attachment=h2-en"

The problem is that this "attachment" match comes back later in the query vars.
How can it match the "" rule in your case? That's weird.

@spleen1981
Copy link
Contributor

'attachment' => 'h2-en',

Still I don't understand how this is matched by the regex in the first place.

The query_vars don't match the matched_query that was hacked by slugs anymore.

In the old plugin matched_query was overwriting the entire $query arg, breaking any possible custom query var.
We have introduced in 08c6881 the merge of the two to fix that issue.
If we don't find the root cause of the false match we may "filter" the not compatible keys depending on matched_query content (we are already doing it with name for pages).
Does not seem the case to breake custom query vars again..

@herrvigg
Copy link
Collaborator Author

herrvigg commented Jun 5, 2022

Still I don't understand how this is matched by the regex in the first place.

It's a valid match after going through all the rewrite rules.

if ( preg_match( "#^$match#", $request_match, $matches ) ||
preg_match( "#^$match#", urldecode( $request_match ), $matches ) ) {

In that case

match = "[^/]+/([^/]+)/?$"
matches = array (
  0 => 'h1-en/h2-en',
  1 => 'h2-en',
)
query = "index.php?attachment=$matches[1]"

In your case, it's because it doesn't find a valid match that the context hacked by slugs is valid...

@herrvigg
Copy link
Collaborator Author

herrvigg commented Jun 5, 2022

In my case there are 97 rewrites rules

array (
  '^wp-json/?$' => 'index.php?rest_route=/',
  '^wp-json/(.*)?' => 'index.php?rest_route=/$matches[1]',

  <many...>

  '([^/]+)(?:/([0-9]+))?/?$' => 'index.php?name=$matches[1]&page=$matches[2]',
  '[^/]+/([^/]+)/?$' => 'index.php?attachment=$matches[1]',
  '[^/]+/([^/]+)/trackback/?$' => 'index.php?attachment=$matches[1]&tb=1',
  '[^/]+/([^/]+)/feed/(feed|rdf|rss|rss2|atom)/?$' => 'index.php?attachment=$matches[1]&feed=$matches[2]',
  '[^/]+/([^/]+)/(feed|rdf|rss|rss2|atom)/?$' => 'index.php?attachment=$matches[1]&feed=$matches[2]',
  '[^/]+/([^/]+)/comment-page-([0-9]{1,})/?$' => 'index.php?attachment=$matches[1]&cpage=$matches[2]',
  '[^/]+/([^/]+)/embed/?$' => 'index.php?attachment=$matches[1]&embed=true',
)

It seems you don't have that one with the attachment. I don't know where it comes from.

@spleen1981
Copy link
Contributor

ok let me try to improve this $query = array_merge( wp_parse_args( $wp->matched_query ), $query ); in filter_request

@herrvigg
Copy link
Collaborator Author

herrvigg commented Jun 5, 2022

This attachment is also checked in WP get_page_by_path, though it's not the problem here.
The search from postname = 'h1-en/h2-en' would fail as you explained earlier. However, the SQL also checks the attachments like this:

SELECT ID, post_name, post_parent, post_type
FROM wp_posts
WHERE post_name IN ('h1-en','h2-en')
AND post_type IN ('page','attachment')

An idea could be to create attachments for the slugs, for every language. But that would require more updates in DB when the slugs are saved.

@herrvigg
Copy link
Collaborator Author

herrvigg commented Jun 5, 2022

Potentially this could also be interesting to avoid many hacks with the query_vars in slugs. Something to think about, perhaps not for this initial version.

@herrvigg
Copy link
Collaborator Author

herrvigg commented Jun 5, 2022

In principle you can reproduce the problem if you add this rewrite rule (at any place since you have a 404 first in WP but not in slugs):

'[^/]+/([^/]+)/?$' => 'index.php?attachment=$matches[1]'

@herrvigg
Copy link
Collaborator Author

herrvigg commented Jun 5, 2022

For @bagaweb it could be something similar, or some problem with a cache. I also realized there's another internal WP cache for the posts, used in get_page_by_path.

$last_changed = wp_cache_get_last_changed( 'posts' );
$hash      = md5( $page_path . serialize( $post_type ) );
$cache_key = "get_page_by_path:$hash:$last_changed";
$cached    = wp_cache_get( $cache_key, 'posts' );

Saving the post again would invalidate that WP cache I guess (supposedly updating the cache key for that group).

@spleen1981
Copy link
Contributor

@herrvigg / @bagaweb can you try this branch

@herrvigg
Copy link
Collaborator Author

herrvigg commented Jun 5, 2022

Yes, it works (after migration).
I'm wondering about the hard-coded list, why not take the public_query_vars?
Couldn't we simply filter those vars already in query_vars? It's the main purpose of this filter.
Currently, slugs uses this most as an action to hack the wp object. The return value is unclear (in fact there's a TODO).

No need for a pull request. Once we're happy with the branch I can merge it and link it to this issue #1180.

@herrvigg
Copy link
Collaborator Author

herrvigg commented Jun 5, 2022

If the attachment is removed from the list given by query_vars in slugs, it won't be put in the query by WP. So I think that would be a cleaner fix.

@spleen1981
Copy link
Contributor

spleen1981 commented Jun 5, 2022

public_query_vars may contain additional custom query vars we want to keep, while we want to remove only default WP query vars (which are not available in more elegant way from WP_Query class).
Otherwise we could just discard the entire $query argument to filter_request.

@spleen1981
Copy link
Contributor

I think this will also cover a number of conflicts we are not seeing now...

@herrvigg
Copy link
Collaborator Author

herrvigg commented Jun 5, 2022

I came up with this fix 9036c6a in branch https://github.com/qtranslate/qtranslate-xt/tree/fix-slugs-query-vars.

The idea is to filter the query_vars much earlier, by taking the query vars from the previous WP match (such as attachment in this case) and filter the whole list by excluding those vars. But if we find them in the slugs match, we keep them. If they are in neither list, they are kept (public query vars).

If some other plugin alter the query vars after QTX/slugs, it may be a problem, but I don't think this should happen.

@herrvigg
Copy link
Collaborator Author

herrvigg commented Jun 5, 2022

I suspect someone in the past had an idea to do some cleanup in the query_vars seen this code:

// TODO check this call, looks bug-prone
        return count( array_diff( $query_vars, $wp->public_query_vars ) ) > 0 ? $query_vars : $wp->public_query_vars;

But it was not the case until now as query_vars was never changed in that function, that check did not make sense (as suggested by the TODO). The 9036c6a fix now makes fully use of the query_vars filtering and the return value becomes clear (filtered query_vars if a match is found with slugs, original value otherwise).

@herrvigg
Copy link
Collaborator Author

herrvigg commented Jun 5, 2022

I don't know if this affects filter_request. Maybe this is not necessary anymore?

if ( isset( $query['name'] ) ) {
    unset( $query['name'] );
}

Let me know if this fix works on your side.

@spleen1981
Copy link
Contributor

This is not working for me.
$wp->matched_query in filter_request is now "attachment=&embed=true"

@spleen1981
Copy link
Contributor

EDIT: my bad, wrong test url.
It is working

@spleen1981
Copy link
Contributor

I don't know if this affects filter_request. Maybe this is not necessary anymore?

if ( isset( $query['name'] ) ) {
    unset( $query['name'] );
}

Yes this should not be needed anymore. it's same kind of conflict as attachments.
Also comments needed to be updated. 606f5b3
(@bagaweb to confirm with tests from #1156)

herrvigg added a commit that referenced this issue Jun 5, 2022
Some rewrite rules cause conflicts in slugs due to perma query vars.
Update the query vars returned by `query_vars` filter:
- purge the query vars from the matched query before slugs
- but, keep the new query vars from the matched query found by slugs.

Remove previous workaround for `name` var in `filter_request` (page).
Update doc.
@herrvigg
Copy link
Collaborator Author

herrvigg commented Jun 5, 2022

Merged in master.
@bagaweb Could you check with this new patch (again after migration)?

herrvigg added a commit that referenced this issue Jun 6, 2022
Some rewrite rules cause conflicts in slugs due to perma query vars.
Update the query vars returned by `query_vars` filter:
- purge the query vars from the matched query before slugs
- but, keep the new query vars from the matched query found by slugs.

Remove previous workaround for `name` var in `filter_request` (page).
Update doc.
@bagaweb
Copy link

bagaweb commented Jun 6, 2022

Thanks for all your work!

@herrvigg / @bagaweb can you try this branch

I tried this branch, i confirm it works with pages, child pages and custom post types.

Will now try on another website, the one with Cyrillic slugs mentioned in #1156.

@herrvigg
Copy link
Collaborator Author

herrvigg commented Jun 6, 2022

Please can you try again on master directly, to be sure? The new patch is slightly different, but it's the same idea.

It would be interesting to see which rewrite rule you had. I still don't understand how the regression is related to the migration though... but what matters is that it's solved for all.

@bagaweb
Copy link

bagaweb commented Jun 6, 2022

I tried again latest master and evertything seems to be working fine!

@herrvigg
Copy link
Collaborator Author

herrvigg commented Jun 6, 2022

Sounds great! Finally we are about to release this version.

Just a curiosity, to try understanding what the problem was in your case, could you add these var dumps after line 307 in `modules/slugs/slugs-class-slugs?

if ( isset( $wp->matched_rule ) && isset( $query ) && isset( $matches ) ) {
   var_dump( $wp->matched_query );  // original WP query before slugs
   var_dump( $wp->matched_rule );  // slugs match
   var_dump( $query );  // new slugs query
...

@bagaweb
Copy link

bagaweb commented Jun 6, 2022

Ok I added those lines and tried on a child page, this is the result:

string(30) "attachment=exclusive-apartment" string(24) "(.?.+?)(?:/([0-9]+))?/?$" string(47) "index.php?pagename=$matches[1]&page=$matches[2]"

@Hr0bar
Copy link

Hr0bar commented Jun 6, 2022

Thanks for all the hard work !

@herrvigg
Copy link
Collaborator Author

herrvigg commented Jun 6, 2022

matched_query = "attachment=exclusive-apartment"

OK this looks like the problem I had, a WP match with attachment and that query_var created a conflict later in slugs.
However this should not be related to the migration... I really don't understand how it could have worked before migration without fix.

Anyway, we can close this issue.

@herrvigg herrvigg closed this as completed Jun 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants