Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance Country Leaderboard Availability During Refresh #1099

Merged
merged 1 commit into from
Jan 16, 2024

Conversation

ljdelight
Copy link
Contributor

The initial approach to refreshing the country leaderboard, which involved deleting all entries at once, led to data unavailability and HTTP 502 errors (due to hitting the 75s http request timeout) for users during updates. We've revised the process to sequentially delete and rebuild entries for each country and time duration. This adjustment ensures that leaderboard data remains accessible to users while updates for other countries are in progress.

The initial approach to refreshing the country leaderboard, which involved deleting all entries at once, led to data unavailability and HTTP 502 errors for users during updates. We've revised the process to sequentially delete and rebuild entries for each country and time duration. This adjustment ensures that leaderboard data remains accessible to users while updates for other countries are in progress.
Copy link

sonarcloud bot commented Jan 16, 2024

Quality Gate Passed Quality Gate passed

The SonarCloud Quality Gate passed, but some issues were introduced.

1 New issue
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

@ljdelight ljdelight merged commit 0900af4 into main Jan 16, 2024
7 checks passed
@ljdelight ljdelight deleted the ljdelight/refactorRebuildCountryLeaderboard branch January 16, 2024 04:14
@ljdelight
Copy link
Contributor Author

ljdelight commented Jan 16, 2024

The updates per <country,timeDuration> tuple were slow, and I was unable to make measurable improvements to the query. The cumulative slowness per tuple took far too long for the existing scheduler, and we had to disable the country leaderboard.


For the improvements I had tested but lacked improved results:
A local dev instance can be used to check for improvements etc.

These are relevant updates:

CREATE INDEX idx_status_actions_status ON status_actions(status);
CREATE INDEX idx_status_actions_old_status ON status_actions(old_status);
CREATE INDEX idx_status_actions_status_specific ON status_actions(status) WHERE status IN (1, 2, 5, 6);

CREATE INDEX idx_users_leaderboard_opt_out_false ON users(leaderboard_opt_out) WHERE leaderboard_opt_out = FALSE;
CREATE INDEX idx_users_leaderboard_opt_out ON users(leaderboard_opt_out);

-- The status_actions references an id=-1, so it needs to exist before a FK can be created.
-- This is likely the wrong solution and would need investigated more if it were needed.
INSERT INTO users(id, osm_id, osm_created, name, oauth_token, oauth_secret, theme)
SELECT -1, -1, NOW(), 'TempUser', '', '', 0 WHERE NOT EXISTS (
    SELECT id FROM users WHERE id = -1
);

ALTER TABLE status_actions
  ADD CONSTRAINT status_actions_users_osm_id_fkey
  FOREIGN KEY (osm_user_id) REFERENCES users(osm_id);
CREATE INDEX idx_status_actions_osm_user_id ON status_actions(osm_user_id);

VACUUM ANALYZE status_actions;
VACUUM ANALYZE users;

To clean up the above changes:

ALTER TABLE status_actions DROP CONSTRAINT status_actions_users_osm_id_fkey;
drop index idx_status_actions_status;
drop index idx_status_actions_old_status;
drop index idx_status_actions_status_specific;
drop index idx_users_leaderboard_opt_out_false;
drop index idx_users_leaderboard_opt_out;
drop index idx_status_actions_osm_user_id;
delete from users where id = -1;

And a sample query (generated by the API):

INSERT INTO user_leaderboard (month_duration, country_code, user_id, user_name, user_avatar_url, user_ranking, user_score, completed_tasks, avg_time_spent)
SELECT 12,
       'FR',
       users.id,
       users.name,
       users.avatar_url,
       ROW_NUMBER() OVER(
                         ORDER BY SUM(CASE sa.status
                                          WHEN 1 THEN 5
                                          WHEN 2 THEN 3
                                          WHEN 5 THEN 3
                                          WHEN 6 THEN 1
                                          WHEN 3 THEN 0
                                          ELSE 0
                                      END) DESC, sa.osm_user_id ASC),
                    SUM(CASE sa.status
                            WHEN 1 THEN 5
                            WHEN 2 THEN 3
                            WHEN 5 THEN 3
                            WHEN 6 THEN 1
                            WHEN 3 THEN 0
                            ELSE 0
                        END) AS score,
                    COALESCE(SUM(CASE sa.status
                                     WHEN 1 THEN 1
                                     WHEN 2 THEN 1
                                     WHEN 5 THEN 1
                                     WHEN 6 THEN 1
                                     WHEN 3 THEN 0
                                     ELSE 0
                                 END), 0) AS completed_tasks,
                    COALESCE(SUM(tasks.completed_time_spent) / SUM(CASE
                                                                       WHEN tasks.completed_time_spent > 0 THEN 1
                                                                       ELSE 0
                                                                   END), 0) AS avg_time_spent
FROM status_actions sa,
     users,
     tasks
WHERE sa.created::DATE BETWEEN '2022-12-17' AND '2023-12-17'
  AND sa.old_status <> sa.status
  AND users.osm_id = sa.osm_user_id
  AND users.leaderboard_opt_out = FALSE
  AND tasks.id = sa.task_id
  AND ST_Intersects(tasks.location, ST_MakeEnvelope(-5, 42.5, 9.56, 51.15, 4326))
GROUP BY sa.osm_user_id,
         users.id
ORDER BY score DESC,
         sa.osm_user_id ASC

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant