Hide removed eprints from stats #59

Open
sebastfr opened this issue Jul 16, 2014 · 0 comments
Removed eprints may still appear on stats reports (usually as "unknown eprint '12345'") because IRStats2 doesn't keep a table of "active" eprints.

Solution 1

Have an "eprint set" table and JOIN the data tables (irstats2_downloads, irstats2_views etc.) against it on each query. Given the size of the downloads table, and potentially the size of the table of "active eprints", this would slow queries considerably. Views like "Top Authors" would need to perform two JOINs (on the largest tables), so that's really not an ideal solution.
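For illustration, a rough sketch of what such a query could look like, using Python's sqlite3 as a stand-in for the real database. The `active_eprints` table and the `eprintid` / `datestamp` / `hits` columns are assumptions made up for this example, not the actual IRStats2 schema.

```python
import sqlite3

def top_downloads_join(conn, limit=10):
    """Top eprints by downloads, restricted to active eprints via a JOIN."""
    return conn.execute(
        """
        SELECT d.eprintid, SUM(d.hits) AS total
        FROM irstats2_downloads AS d
        JOIN active_eprints AS a ON a.eprintid = d.eprintid
        GROUP BY d.eprintid
        ORDER BY total DESC, d.eprintid ASC
        LIMIT ?
        """,
        (limit,),
    ).fetchall()

# Tiny in-memory data set; eprint 2 has been removed from the repository.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE irstats2_downloads (eprintid INTEGER, datestamp TEXT, hits INTEGER);
    CREATE TABLE active_eprints (eprintid INTEGER PRIMARY KEY);
    INSERT INTO irstats2_downloads VALUES
        (1, '20140701', 5), (2, '20140701', 9), (3, '20140701', 7), (1, '20140702', 3);
    INSERT INTO active_eprints VALUES (1), (3);
    """
)

print(top_downloads_join(conn))  # [(1, 8), (3, 7)]
```

The JOIN keeps LIMIT usable, but every stats query pays for the extra lookup against the "active" table.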

Solution 2

Filter items on output, as data is extracted from the DB. This is also tricky because it defeats the use of SQL "LIMIT": a "Top 10 Authors" view relies on such a LIMIT, so if removed items are only dropped after the query has run, the limit would have to be computed on the fly.
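A minimal sketch of the "limit computed on the fly" idea, again with sqlite3 as a stand-in: run the ranked query without LIMIT, skip removed items while reading rows, and stop once enough active rows have been collected. The column names and the `is_active` callback are hypothetical; in practice that check would have to ask the repository whether the eprint still exists.

```python
import sqlite3
from itertools import islice

def top_downloads_filtered(conn, is_active, limit=10):
    """Rank all eprints, drop inactive ones on output, keep the first `limit`."""
    cursor = conn.execute(
        """
        SELECT eprintid, SUM(hits) AS total
        FROM irstats2_downloads
        GROUP BY eprintid
        ORDER BY total DESC, eprintid ASC
        """
    )
    # No SQL LIMIT: rows are filtered lazily and counted in application code.
    active_rows = (row for row in cursor if is_active(row[0]))
    return list(islice(active_rows, limit))

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE irstats2_downloads (eprintid INTEGER, datestamp TEXT, hits INTEGER);
    INSERT INTO irstats2_downloads VALUES
        (1, '20140701', 5), (2, '20140701', 9), (3, '20140701', 7), (1, '20140702', 3);
    """
)

active_ids = {1, 3}  # eprint 2 has been removed
print(top_downloads_filtered(conn, active_ids.__contains__, limit=2))  # [(1, 8), (3, 7)]
```

With a plain `LIMIT 2` followed by filtering, the removed eprint 2 would take one of the two slots and the report would come back one row short.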

Solution 3

Add an extra "active" field to the data tables to flag whether an eprint is active or not. This should be quicker than solution 1 since that field would be indexed and the check is a simple WHERE condition. It would, however, require updating all the data tables every day to mark active (or non-active) items.
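A rough sketch of the flag approach with sqlite3 as a stand-in. The `active` column, the index name, the other column names and the `refresh_active_flags` helper are all assumptions for illustration; the real daily job would need to cover every data table, not just downloads.

```python
import sqlite3

def refresh_active_flags(conn, active_ids):
    """Daily job: reset every row to inactive, then re-flag the active eprints."""
    with conn:  # commit both steps together
        conn.execute("UPDATE irstats2_downloads SET active = 0")
        conn.executemany(
            "UPDATE irstats2_downloads SET active = 1 WHERE eprintid = ?",
            [(eprintid,) for eprintid in active_ids],
        )

def top_downloads_flagged(conn, limit=10):
    """Top eprints by downloads, using only a WHERE on the indexed flag."""
    return conn.execute(
        """
        SELECT eprintid, SUM(hits) AS total
        FROM irstats2_downloads
        WHERE active = 1
        GROUP BY eprintid
        ORDER BY total DESC, eprintid ASC
        LIMIT ?
        """,
        (limit,),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE irstats2_downloads (
        eprintid INTEGER, datestamp TEXT, hits INTEGER, active INTEGER DEFAULT 1
    );
    CREATE INDEX idx_downloads_active ON irstats2_downloads (active);
    INSERT INTO irstats2_downloads (eprintid, datestamp, hits) VALUES
        (1, '20140701', 5), (2, '20140701', 9), (3, '20140701', 7), (1, '20140702', 3);
    """
)

refresh_active_flags(conn, [1, 3])  # eprint 2 has been removed
print(top_downloads_flagged(conn))  # [(1, 8), (3, 7)]
```

The query side stays cheap and keeps LIMIT, at the cost of rewriting the flag across the data tables on every refresh.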

Anything else we could do?
