Incorrect link in outlinks #160

Closed
diegoaa opened this issue Dec 21, 2019 · 24 comments · Fixed by #161
Labels
Bug Something isn't working

Comments

@diegoaa

diegoaa commented Dec 21, 2019

Behavior - Outlinks. The outlinks listed are missing the colon (:) after the protocol. For example:
https//www.instagram.com/
instead of
https://www.instagram.com/
This makes the browser interpret the link as
http://https//www.instagram.com/
which then opens the site
http://www.https.com//www.instagram.com/
which is not the desired behaviour
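
The misparse described above can be reproduced with the standard WHATWG `URL` parser that ships with browsers and Node.js. This is only a sketch of the parsing step: the helper name `isAbsoluteUrl` is made up for illustration, and the final jump to `www.https.com` is a browser address-bar heuristic that is not modelled here.

```javascript
// Without the colon there is no scheme, so the string is not an absolute URL.
function isAbsoluteUrl(value) {
  try {
    new URL(value);
    return true;
  } catch (err) {
    return false;
  }
}

const broken = "https//www.instagram.com/";
const fixed = "https://www.instagram.com/";

console.log(isAbsoluteUrl(broken)); // false
console.log(isAbsoluteUrl(fixed)); // true

// Once "http://" is prepended, "https" is parsed as the host name
// and the rest of the string becomes the path.
const guessed = new URL("http://" + broken);
console.log(guessed.hostname); // "https"
console.log(guessed.pathname); // "//www.instagram.com/"
```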

@tsteur
Member

tsteur commented Dec 21, 2019

Thanks @diegoaa, are you using the latest beta version?

@diegoaa
Author

diegoaa commented Dec 22, 2019

@tsteur Version 0.3.9

@tsteur
Member

tsteur commented Dec 22, 2019

@diegoaa I just tried to reproduce it, but it worked fine for me. Any chance you could send me a link to your website by email to wordpress at matomo.org? Then I would try to reproduce it there. Could you also double-check that the link/URL is correct on your website?

@diegoaa
Author

diegoaa commented Dec 23, 2019

Sent email.

@tsteur
Member

tsteur commented Dec 23, 2019

Thanks @diegoaa, you can remove the login again. I can reproduce it now. Apologies for this, I thought you meant the issue was in the visitor log, where we recently fixed it. It is not fixed in the actual outlink report though.

image

Segmented visitor log doesn't work either

image

The URL already seems to be in the archive, so the issue might be there?

image

I suppose the problem is somewhere in the Actions archiving helper.

Moving this issue to the actual Matomo repository as it is not an issue in WP Matomo.

Here is an example request to reproduce:

matomo.php?link=https%3A%2F%2Fwww.instagram.com%2F&idsite=1&rec=1&r=189659&h=15&m=15&s=46&url=https%3A%2F%2Fwww.foobar.com%2Fmy-account%2F&_id=f7d29358cf00f202&_idts=1577067087&_idvc=1&_idn=0&_refts=0&_viewts=1577067087&send_image=1&pdf=1&qt=0&realp=0&wma=0&dir=0&fla=0&java=0&gears=0&ag=0&cookie=1&res=1920x1080&gt_ms=447&send_image=0
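
A reduced version of this request can be assembled with the standard `URLSearchParams` API. Only `link`, `idsite`, `rec`, and `url` are taken from the request above; the helper name `buildOutlinkRequest` is made up for illustration, and the remaining parameters are left out to keep the sketch short.

```javascript
// Build a minimal outlink tracking request (a subset of the parameters above).
function buildOutlinkRequest(link, pageUrl) {
  const params = new URLSearchParams({
    link: link, // the outlink that was clicked
    idsite: "1",
    rec: "1",
    url: pageUrl, // the page the click happened on
  });
  return "matomo.php?" + params.toString();
}

const request = buildOutlinkRequest(
  "https://www.instagram.com/",
  "https://www.foobar.com/my-account/"
);
console.log(request);
// matomo.php?link=https%3A%2F%2Fwww.instagram.com%2F&idsite=1&rec=1&url=https%3A%2F%2Fwww.foobar.com%2Fmy-account%2F
```

The colon is sent percent-encoded as `%3A`, so the tracking request itself still carries the full `https://` link.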

@mattab do you know if this is a regression?

refs https://github.com/matomo-org/matomo/pull/15233/files

@tsteur tsteur transferred this issue from matomo-org/matomo-for-wordpress Dec 23, 2019
@mattab
Member

mattab commented Dec 23, 2019

@tsteur I don't know if it's a regression, but since we never had this issue before afaik, something new must be happening (or indeed a regression)

@tsteur
Member

tsteur commented Dec 23, 2019

It looks like it actually works on the demo... also for recent data... not sure why it works there... also works in my local OnPremise installation...
I wonder if it is a WordPress issue after all for some reason...

@tsteur
Member

tsteur commented Dec 23, 2019

It actually is a big WP issue... it seems it doesn't write null values in general.... it is turning

into

INSERT INTO wp_matomo_log_action (name, hash, type, url_prefix) VALUES ('%s',CRC32('%s'),'%s','%s')

At the end it then basically has %s => '', e.g.

INSERT INTO wp_matomo_log_action (name, hash, type, url_prefix) VALUES ('https://www.instagram.com/',CRC32('https://www.instagram.com/'),'2','')

instead of ending it with ),'2',null)....

not sure how to fix that...

WP basically doesn't support null values in bind parameters: https://core.trac.wordpress.org/ticket/12819, which is marked as wontfix. One workaround was suggested here: https://wordpress.stackexchange.com/a/143418
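
The difference between the two statements above can be illustrated with a small simulation. This is not WordPress code and not the fix that was shipped: `fillWithoutNull` only mimics the behaviour described in this comment (a null argument ending up as `''`), and `fillWithNull` is a hypothetical null-aware variant. Both helper names and the `quote` function are made up for this sketch.

```javascript
const TEMPLATE =
  "INSERT INTO wp_matomo_log_action (name, hash, type, url_prefix) " +
  "VALUES ('%s',CRC32('%s'),'%s','%s')";

// Minimal quoting for the demo only; not safe for real queries.
function quote(value) {
  return "'" + String(value).replace(/'/g, "''") + "'";
}

// Mimics the behaviour described above: a null argument is written as ''.
function fillWithoutNull(template, args) {
  let i = 0;
  return template.replace(/'%s'/g, () => {
    const arg = args[i++];
    return quote(arg === null ? "" : arg);
  });
}

// Hypothetical alternative: emit a bare NULL when the argument is null.
function fillWithNull(template, args) {
  let i = 0;
  return template.replace(/'%s'/g, () => {
    const arg = args[i++];
    return arg === null ? "NULL" : quote(arg);
  });
}

const args = ["https://www.instagram.com/", "https://www.instagram.com/", 2, null];
console.log(fillWithoutNull(TEMPLATE, args)); // ends with ,'2','')
console.log(fillWithNull(TEMPLATE, args)); // ends with ,'2',NULL)
```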

@tsteur
Member

tsteur commented Dec 23, 2019

@diegoaa this should be fixed in the next release. It may not work for already tracked links/URLs, but it will work for all other URLs.

@diegoaa
Author

diegoaa commented Dec 23, 2019

Thanks @tsteur !

@diegoaa
Author

diegoaa commented Dec 26, 2019

@tsteur, this is still observed in version 0.3.10

@tsteur
Member

tsteur commented Dec 26, 2019

Sorry, I should have described it a bit better. These outlinks for Instagram and Facebook are already registered wrongly and thus won't correct themselves, even in newer reports. Only a different, newly recorded outlink would be stored correctly.

The only way to work around this would probably be to make a change in the database, if you know how to do this?

update wp_matomo_log_action set url_prefix = null where url_prefix = 0 and `type` = 2 and name like 'https://www.instag%'

The same would work for facebook

update wp_matomo_log_action set url_prefix = null where url_prefix = 0 and `type` = 2 and name like 'https://www.facebook%'

Then future reports will have this correctly set

@diegoaa
Author

diegoaa commented Dec 29, 2019

@tsteur I ran the database commands as you suggested, but there is no change in the outlinks report.

Secondly, even pages on my own domain have shown up in the outlinks report, again without the colon.

@tsteur
Member

tsteur commented Dec 29, 2019

Does this happen when you look at today's report or, for example, at yesterday's report? Or maybe only when you look at week, month, or year?

@tsteur
Member

tsteur commented Dec 30, 2019

Secondly, even pages on my own domain have showed up in the outlinks report, again without the colon.

Can you maybe also describe a bit more what you are seeing there? I can't reproduce any such behaviour. Otherwise, do you think there is a chance to get access to your WordPress? A user with the Matomo Super User role would be enough. I don't need any other access to your actual WordPress.

@diegoaa
Author

diegoaa commented Dec 30, 2019

@tsteur I have re-enabled the account you accessed earlier. Do look at the outlink reports for the year.

@tsteur
Member

tsteur commented Dec 30, 2019

Thanks @diegoaa I just debugged and this is a different issue...

The problem is the site is running on https://www.example.com and the link goes to http://example.com

meaning example.com does not match www.example.com

@diegoaa you can fix this issue by adjusting the links that appear in the outlink report to also use www. I will also try to look for a fix for this internally.

@mattab Not sure how this is not an issue for more people? I suppose most people would use the same domain for the URLs in their links. The check for the same host is basically failing here: https://github.com/matomo-org/matomo/blob/3.13.1-b1/js/piwik.js#L3488-L3491

I wonder if, for every WP installation, we need to assume that example.com and www.example.com are the same and set these domains as part of the tracker?
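
The mismatch can be sketched as follows. This is not the code from piwik.js; it only contrasts an exact host comparison with a hypothetical relaxed check that ignores a leading `www.`. All function names here are made up for illustration.

```javascript
// Exact comparison: "example.com" and "www.example.com" are different hosts.
function isSameHostExact(linkHost, siteHost) {
  return linkHost.toLowerCase() === siteHost.toLowerCase();
}

// Hypothetical relaxed check: treat a host with and without a leading
// "www." as the same site.
function stripWww(host) {
  return host.toLowerCase().replace(/^www\./, "");
}

function isSameSiteIgnoringWww(linkHost, siteHost) {
  return stripWww(linkHost) === stripWww(siteHost);
}

const siteHost = new URL("https://www.example.com").hostname;
const linkHost = new URL("http://example.com").hostname;

console.log(isSameHostExact(linkHost, siteHost)); // false, so the link counts as an outlink
console.log(isSameSiteIgnoringWww(linkHost, siteHost)); // true, so the link counts as internal
```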

@tsteur tsteur reopened this Dec 30, 2019
@tsteur
Member

tsteur commented Dec 30, 2019

@diegoaa actually, there is already a setting "Do not count subdomains as outlink" under Matomo => Settings. Can you enable this setting as a workaround for now?

@tsteur
Member

tsteur commented Dec 30, 2019

@mattab otherwise I suppose by default we would want to do setDomains(['www.example.com', 'example.com']) if the site URL starts with www.? That can obviously also cause issues, but those should be rare, I suppose.
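
A rough sketch of that idea, assuming the usual `_paq` tracker queue. The helper `domainsForSite` is hypothetical and only illustrates the "starts with www." rule; it is not what the plugin ended up doing.

```javascript
// Hypothetical helper: derive the domain list from the site URL.
function domainsForSite(siteUrl) {
  const host = new URL(siteUrl).hostname;
  if (host.startsWith("www.")) {
    // Register both the www host and the bare domain.
    return [host, host.slice(4)];
  }
  return [host];
}

// In a page this would be the global tracker queue (window._paq).
const _paq = [];
_paq.push(["setDomains", domainsForSite("https://www.example.com")]);

console.log(JSON.stringify(_paq));
// [["setDomains",["www.example.com","example.com"]]]
```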

@tsteur
Member

tsteur commented Dec 30, 2019

Or we simply enable "Do not count subdomains as outlink" by default... or we leave it as it is...

I guess enabling "Do not count subdomains as outlink" would be best, since this way people can disable the behaviour when needed.

@diegoaa
Author

diegoaa commented Dec 31, 2019

Thanks @tsteur, I have checked the two boxes as you suggested: "Track subdomains in the same website" and "Do not count subdomains as outlink". The outlink report hasn't changed. I'll check again in a day to see if it has.

Did you notice the other domains in the outlink list that had missing colons?

@tsteur
Member

tsteur commented Dec 31, 2019

@diegoaa it will only fix it for newly tracked data as it was a tracking issue.

Did you notice the other domains in the outlink list that had missing colons?

Can you remind me what the issue was here?

@diegoaa
Author

diegoaa commented Dec 31, 2019

Can you remind me what the issue was here?

Behavior - Outlinks. The outlinks listed are missing the colon (:) after the protocol. For example:

https//www.instagram.com/

instead of

https://www.instagram.com/

This makes the browser interpret the link as

http://https//www.instagram.com/

which then opens the site

http://www.https.com//www.instagram.com/

which is not the desired behaviour

I ran the database commands as you suggested

update wp_matomo_log_action set url_prefix = null where url_prefix = 0 and type = 2 and name like 'https://www.instag%'

However, the outlinks are still missing colons.

@tsteur
Member

tsteur commented Dec 31, 2019

@diegoaa by the looks of it, it is working for newer dates, e.g. when you select yesterday's report. I think I have invalidated some of your older reports, which might cause them to be reprocessed. The links might then appear correctly in a few hours, but I can't guarantee it.

The change you made in the DB basically affected the raw data but not the already generated reports. Those may now be regenerated, and newer reports (for a new day, a new week, etc.) should be fine for sure.

@tsteur tsteur closed this as completed Jan 7, 2020