New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proper encoding of segment values for Actions reports. #13481

Open
wants to merge 10 commits into
base: 3.x-dev
from

Conversation

Projects
None yet
4 participants
@diosmosis
Copy link
Member

diosmosis commented Sep 25, 2018

Changes:

I am not sure if this fix will have repercussions so I haven't modified the tests. Will do so after someone else gives their opinion.

Fixes #13460

@tsteur

This comment has been minimized.

Copy link
Member

tsteur commented Sep 25, 2018

If it fixes your issue and tests pass probably all good to go. It's almost impossible to review it since the segments are just a mess and we never know where they are urlencoded or urldecoded etc.

@diosmosis

This comment has been minimized.

Copy link
Member

diosmosis commented Sep 25, 2018

It will change a lot of the test output I think (another encoding in the segment property in API output). Think this won't be a BC break? at least not a potentially huge one.

@tsteur

This comment has been minimized.

Copy link
Member

tsteur commented Sep 25, 2018

Think this won't be a BC break?

I really don't know. As long as we can still handle the segments urlencoded and urldecoded it should be fine.

@fdellwing

This comment has been minimized.

Copy link
Contributor

fdellwing commented Sep 26, 2018

Applied the patch locally and it resolves the issue for me.

@tsteur

This comment has been minimized.

Copy link
Member

tsteur commented Dec 6, 2018

#13460 (comment)

Don't have a fix yet, but found the cause of the error. Actions are stored urlencoded in the log_action table, but the value in the segment when opening the segmented visitor log/transitions modal/etc. is not.

I'm not sure this is the case? That might have been the case quite a while ago but shouldn't be the case anymore? at least for me it was stored normally
image

or you mean the URL is encoded after tracking?

@@ -26,6 +26,6 @@
<exit_rate>100%</exit_rate>
<avg_time_generation>0.333</avg_time_generation>
<url>http://www.example.org/page</url>
<segment>pageUrl==http%3A%2F%2Fwww.example.org%2Fpage</segment>
<segment>pageUrl==http%253A%252F%252Fwww.example.org%252Fpage</segment>

This comment has been minimized.

@tsteur

tsteur Dec 6, 2018

Member

is this double encoded on purpose?

This comment has been minimized.

@diosmosis

diosmosis Dec 13, 2018

Member

yes, because one decode is for the condition, and one for the value

This comment has been minimized.

@mattab

mattab Dec 31, 2018

Member

I don't think it's correct, would expect to have here either:
pageUrl==http%253A%252F%252Fwww.example.org%252Fpage (current behavior)
or
pageUrl%3D%3Dhttp%253A%252F%252Fwww.example.org%252Fpage (current behavior URL encoded)

but pageUrl==http%253A%252F%252Fwww.example.org%252Fpage wouldn't be a correct value since Matomo would then segment based on Page URL http%3A%2F%2Fwww.example.org%2Fpage which wouldn't exist (didn't check but it's possible it currently works anyway since we try the url decoded version in

matomo/core/Segment.php

Lines 99 to 108 in 2020b12

}
// First try with url decoded value. If that fails, try with raw value.
// If that also fails, it will throw the exception
try {
$this->initializeSegment(urldecode($segmentCondition), $idSites);
} catch (Exception $e) {
$this->initializeSegment($segmentCondition, $idSites);
}
}
but still: we wouldn't want to show this double encoded URL in the API output)

This comment has been minimized.

@diosmosis

diosmosis Dec 31, 2018

Member

It is required to have the value encoded 3 times because it is urldecoded 3 times, see the PR description for more info.

Most URLs don't need to be double encoded, but if the value includes a value that can be decoded, like a plus sign, then it will be interpreted incorrectly if not double encoded. In this case the old API output would be wrong.

@tsteur

This comment has been minimized.

Copy link
Member

tsteur commented Dec 6, 2018

I've looked at the PR for quite a while but the PR is quite complex and not trivial to understand especially re side effects etc. Left some comments though. Do you think there would be otherwise a solution where we don't need to change anything except for maybe the segment expression?

@diosmosis diosmosis force-pushed the 13460-segment-encoding branch from 3d48cc2 to 95313cb Dec 12, 2018

@diosmosis

This comment has been minimized.

Copy link
Member

diosmosis commented Dec 13, 2018

I'm not sure this is the case? That might have been the case quite a while ago but shouldn't be the case anymore?

It's not the case, it looked like it was, but I think I tracked something wrong.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment