New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

duplicate entries in piwik_log_action leading to pages not visible in segments #6436

Closed
voidswitch opened this Issue Oct 13, 2014 · 10 comments

Comments

Projects
None yet
6 participants
@voidswitch

voidswitch commented Oct 13, 2014

Hi,

currently we encountered a strange issue with pages listed in actions->pages, but not if using a segment (which should include this page). Investigating the issue I found duplicate entries in piwik_log_action.

idaction name hash type url_prefix
4114714 example.com/somepage 652463546 1 1
4114715 example.com/somepage 652463546 1 1

In piwik_log_link_visit_action only idaction 4114715 is referenced, and displayed with views in actions->pages; creating a segment matching example.com/somepage gets 0 views for this page.

Something to consider is that, we use 2 balanced Piwik installations for tracking, a setup which should be working fine, but also has the potential to create this kind of issue as type, hash is not unique.

@voidswitch

This comment has been minimized.

voidswitch commented Oct 13, 2014

Looking through ./core/Tracker/TableLogAction.php there seems to be the possibility, that two concurrent request with the same URL could result in duplicate entries.

Consider loadIdsAction executed in parallel on two installations with the same input. There should be a small time window, where both read the current state, create their list of items to be inserted, then insert the same item with different idactions, so that valid entries are created.

@mattab mattab added Major Bug labels Oct 17, 2014

@mattab mattab added this to the Short term milestone Oct 17, 2014

@mattab

This comment has been minimized.

Member

mattab commented Oct 17, 2014

Thanks for the report, we should really fix this issue (although I'm not sure how to fix it).

@voidswitch

This comment has been minimized.

voidswitch commented Oct 23, 2014

If I understand correctly it doesn't make sense to have multiple rows in log_action with the same name/type combination, so adding a unique constraint to log_action for name/type or hash/type and changing insertNewIdsAction to use insert ignore or insert ... on duplicate should fix this.

Although I'm not sure about cleanup of existing data. If duplicates need to be deleted we have to cleanup log_link_visit_action with about 700,000,000 entries

@voidswitch

This comment has been minimized.

voidswitch commented Oct 27, 2014

Another possibility to workaround the issue of multiple Piwik instances writing data seems to be something like

INSERT INTO log_action ( name, hash, type, url_prefix )
SELECT * FROM (SELECT ?, CRC32(?), ?, ?) AS tmp
WHERE NOT EXISTS (
    SELECT name FROM log_action WHERE name = ? and type = ?
) LIMIT 1;

This statement should avoid inserting an entry with the same name and hash as an existing entry without using an UNIQUE constraint.

Please note: I was not able to test this statement (syntax or if it would fix the problem) due to time constraints ;)

@mattab

This comment has been minimized.

Member

mattab commented Oct 28, 2014

@voidswitch thanks for your comments that's useful. Hopefully we get to this ticket in coming weeks as it's causing other bugs such as #4595

@integrationtester

This comment has been minimized.

integrationtester commented Oct 30, 2014

for a cleanup script, you can't just delete the 'duplicate' entries, you have to merge them:

select idaction,name,count(*) from piwik_log_action as action, piwik_log_link_visit_action as linker where linker.idaction_url_ref=action.idaction group by action.idaction order by name;

for each of the names which have multiple idactions:
(in my case i had upwards of 10 ids for the same url)

say i have multiple idactions: 63,90,134,213,180

update piwik_log_link_visit_action set idaction_url_ref=63 where idaction_url_ref in (90,134,213,180);

update piwik_log_link_visit_action set idaction_url=63 where idaction_url in (90,134,213,180);
delete from piwik_log_action where idaction in (90,134,213,180);

@demaya

This comment has been minimized.

demaya commented Dec 10, 2014

Did you solve the problem already? For my most visited page I have no transitions since 2013 👎

I select like 3500 rows with the above mentioned SQL-select statement! It's not that I can manually clean them up...

@fpoepleu

This comment has been minimized.

fpoepleu commented Jan 12, 2015

It would be really great if this were solved. We have the same problem and our databases date back to 2011.
Are there any news?

@diosmosis diosmosis self-assigned this Jan 22, 2015

diosmosis added a commit that referenced this issue Jan 28, 2015

Fixes #6436, fix concurrency issue regarding duplicate actions by alw…
…ays using least idaction and deleting duplicates when they are found to be inserted. Since tracker process can potentially fail before duplicates are removed, added test to make sure reports work when duplicate actions exist in the DB.

diosmosis added a commit that referenced this issue Feb 4, 2015

Fixes #6436, fix concurrency issue regarding duplicate actions by alw…
…ays using least idaction and deleting duplicates when they are found to be inserted. Since tracker process can potentially fail before duplicates are removed, added test to make sure reports work when duplicate actions exist in the DB.

@mattab mattab modified the milestones: Piwik 2.11.0, Short term Feb 5, 2015

@mattab mattab closed this in #7112 Feb 9, 2015

@voidswitch

This comment has been minimized.

voidswitch commented Feb 9, 2015

Just want to say thanks for the great work!

@mattab

This comment has been minimized.

Member

mattab commented Feb 9, 2015

Thanks @voidswitch I also want to say great work to @diosmosis and the team!

diosmosis added a commit that referenced this issue Feb 9, 2015

diosmosis added a commit that referenced this issue Feb 13, 2015

Refs #6436, #7112 fix performance regression, assume there are not du…
…plicates in DB when doing segment query foraction since group by in sub-query is slow.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment