Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent DoesNotExist failure on activity API (fix #2227) #2268

Merged

Conversation

@noirbizarre
Copy link
Member

commented Jul 31, 2019

This PR prevent the DoesNotExist exception from breaking the Activity API.

Under the hood, the problem occurs when some data is manually deleted without a proper purge. It would need a data migration to cleanup the existing data.

NB: tested but not included in this PR: to reduce the size of the activity collection it's possible to use TTL index and let mongo do its job in the background (possibly configurable with a ACTIVITY_RETENTION_DAYS settings). See for details;

@noirbizarre noirbizarre requested a review from opendatateam/etalab Jul 31, 2019
@noirbizarre noirbizarre force-pushed the noirbizarre:gh2227-activity-missing-object branch from 70554af to 0b93ef7 Jul 31, 2019
@noirbizarre noirbizarre changed the title Prevent DoesNotExist failure on acitivity API (fix #2227) Prevent DoesNotExist failure on activity API (fix #2227) Aug 1, 2019
@noirbizarre noirbizarre force-pushed the noirbizarre:gh2227-activity-missing-object branch from 0b93ef7 to 4445f59 Aug 6, 2019
@noirbizarre noirbizarre force-pushed the noirbizarre:gh2227-activity-missing-object branch from 4445f59 to fbc3214 Aug 8, 2019
@abulte
abulte approved these changes Aug 8, 2019
Copy link
Member

left a comment

Have you done a quick performance assessment? From what I understand, we're switching from a QuerySet to a list and this can be less efficient.

item.related_to
except DoesNotExist as e:
log.error(e, exc_info=True)
else:

This comment has been minimized.

Copy link
@abulte

abulte Aug 8, 2019

Member

❤️

This comment has been minimized.

Copy link
@noirbizarre

noirbizarre Aug 8, 2019

Author Member

I remembered 😉

@noirbizarre

This comment has been minimized.

Copy link
Member Author

commented Aug 8, 2019

Yep, done on my environment, might need some proper testing on preprod.
Reading the flask-mongoengine paginate code, it was already casted as a list so it shouldn't reduce perfs.

@noirbizarre noirbizarre merged commit 52dbe74 into opendatateam:master Aug 8, 2019
3 checks passed
3 checks passed
ci/circleci: assets Your tests passed on CircleCI!
Details
ci/circleci: dist Your tests passed on CircleCI!
Details
ci/circleci: python Your tests passed on CircleCI!
Details
@noirbizarre noirbizarre deleted the noirbizarre:gh2227-activity-missing-object branch Aug 8, 2019
@abulte

This comment has been minimized.

Copy link
Member

commented Aug 8, 2019

OK. For reference, it's not glorious as it is:

$ hey 'https://next.data.gouv.fr/api/1/activity?page=1&page_size=20'

Summary:
  Total:	80.0114 secs
  Slowest:	18.5939 secs
  Fastest:	2.9409 secs
  Average:	10.4998 secs
  Requests/sec:	2.4996


Response time histogram:
  2.941 [1]	|■■■■■■■■
  4.506 [4]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  6.072 [5]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  7.637 [0]	|
  9.202 [5]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  10.767 [5]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  12.333 [0]	|
  13.898 [5]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  15.463 [3]	|■■■■■■■■■■■■■■■■■■■■■■■■
  17.029 [2]	|■■■■■■■■■■■■■■■■
  18.594 [5]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■


Latency distribution:
  10% in 3.0890 secs
  25% in 5.6635 secs
  50% in 10.6053 secs
  75% in 15.2525 secs
  90% in 17.9233 secs
  95% in 18.5939 secs
  0% in 0.0000 secs

Details (average, fastest, slowest):
  DNS+dialup:	0.4212 secs, 2.9409 secs, 18.5939 secs
  DNS-lookup:	0.0726 secs, 0.0709 secs, 0.0741 secs
  req write:	0.0001 secs, 0.0000 secs, 0.0006 secs
  resp wait:	10.0764 secs, 2.5265 secs, 18.1567 secs
  resp read:	0.0019 secs, 0.0004 secs, 0.0131 secs

Status code distribution:
  [200]	35 responses

Error distribution:
  [165]	Get https://next.data.gouv.fr/api/1/activity?page=1&page_size=20: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
@noirbizarre

This comment has been minimized.

Copy link
Member Author

commented Aug 8, 2019

Yes, activity has not evolved since the first release when there was less than 1000 users.
The algorythm (dynamic fetching of activity items) can't stand as much data.
It was meant as a working base for a fanout model: one activity feed by user with static data as soon as activity happen.
The fanout model (which is well documented by orgs like Twitter, Facebook...) has many advantages beside performances:

  • resilient against activity subject deletion (no foreign key, everything is stored into the user timeline)
  • can handle read/unread status by user
  • can easily handle private activity
  • can be totaly extracted into its own service

but (because there is one), it requires a lot of storage.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.