
Prevent DoesNotExist failure on activity API (fix #2227) #2268

Merged
merged 1 commit into from Aug 8, 2019

Conversation

noirbizarre
Contributor

@noirbizarre noirbizarre commented Jul 31, 2019

This PR prevents the DoesNotExist exception from breaking the Activity API.

Under the hood, the problem occurs when some data is manually deleted without a proper purge. A data migration would be needed to clean up the existing data.

NB: tested but not included in this PR: to reduce the size of the activity collection, it's possible to use a TTL index and let Mongo do its job in the background (possibly configurable with an ACTIVITY_RETENTION_DAYS setting). See for details.
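The TTL idea above could be sketched as follows. This is only an illustration, not code from the PR: the `ACTIVITY_RETENTION_DAYS` setting, its default value, and the `created_at` field name are all assumptions; MongoDB's background expiry only needs an index with `expireAfterSeconds` on a date field.

```python
# Sketch: derive the expireAfterSeconds value for a MongoDB TTL index
# from a hypothetical ACTIVITY_RETENTION_DAYS setting (not in the PR).
ACTIVITY_RETENTION_DAYS = 365  # assumed default, purely illustrative

def ttl_index_spec(retention_days):
    """Return the spec for a TTL index that lets Mongo expire old activities.

    The 'created_at' field name is an assumption; the real collection may
    use a different date field.
    """
    return {
        "keys": [("created_at", 1)],
        "expireAfterSeconds": retention_days * 24 * 3600,
    }

spec = ttl_index_spec(ACTIVITY_RETENTION_DAYS)
print(spec["expireAfterSeconds"])  # 31536000
```

With pymongo, the same spec would be passed to `collection.create_index(spec["keys"], expireAfterSeconds=spec["expireAfterSeconds"])`; Mongo then deletes expired documents in a background task, with no application code involved.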

@noirbizarre noirbizarre requested a review from a team July 31, 2019 13:18
@noirbizarre noirbizarre changed the title Prevent DoesNotExist failure on acitivity API (fix #2227) Prevent DoesNotExist failure on activity API (fix #2227) Aug 1, 2019
Contributor

@abulte abulte left a comment


Have you done a quick performance assessment? From what I understand, we're switching from a QuerySet to a list and this can be less efficient.

try:
    item.related_to
except DoesNotExist as e:
    log.error(e, exc_info=True)
else:
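The pattern in the excerpt above (touch the reference, skip the item if the referenced object is gone, instead of letting the exception propagate to the API) can be sketched without mongoengine. `DoesNotExist` and the `Item` class here are stand-ins for the real mongoengine exception and Activity documents:

```python
class DoesNotExist(Exception):
    """Stand-in for mongoengine's DoesNotExist exception."""

class Item:
    """Stand-in for an activity document with a lazy reference."""
    def __init__(self, related=None):
        self._related = related

    @property
    def related_to(self):
        # mongoengine raises DoesNotExist when dereferencing a
        # reference whose target document was deleted.
        if self._related is None:
            raise DoesNotExist("dangling reference")
        return self._related

def safe_items(items):
    """Keep only items whose referenced object still exists."""
    kept = []
    for item in items:
        try:
            item.related_to  # force dereferencing of the reference
        except DoesNotExist:
            continue  # the real code logs the error here
        else:
            kept.append(item)
    return kept

items = [Item("dataset"), Item(None), Item("reuse")]
print(len(safe_items(items)))  # 2
```

The cost of this approach is that every item's reference is dereferenced eagerly while building the page, which is what the performance question above is about.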
Contributor


❤️

Contributor Author


I remembered 😉

@noirbizarre
Contributor Author

Yep, done on my environment; it might need some proper testing on preprod.
Reading the flask-mongoengine paginate code, it was already cast as a list, so this shouldn't reduce performance.

@noirbizarre noirbizarre merged commit 52dbe74 into opendatateam:master Aug 8, 2019
@noirbizarre noirbizarre deleted the gh2227-activity-missing-object branch August 8, 2019 15:23
@abulte
Contributor

abulte commented Aug 8, 2019

OK. For reference, it's not glorious as it is:

$ hey 'https://next.data.gouv.fr/api/1/activity?page=1&page_size=20'

Summary:
  Total:	80.0114 secs
  Slowest:	18.5939 secs
  Fastest:	2.9409 secs
  Average:	10.4998 secs
  Requests/sec:	2.4996


Response time histogram:
  2.941 [1]	|■■■■■■■■
  4.506 [4]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  6.072 [5]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  7.637 [0]	|
  9.202 [5]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  10.767 [5]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  12.333 [0]	|
  13.898 [5]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  15.463 [3]	|■■■■■■■■■■■■■■■■■■■■■■■■
  17.029 [2]	|■■■■■■■■■■■■■■■■
  18.594 [5]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■


Latency distribution:
  10% in 3.0890 secs
  25% in 5.6635 secs
  50% in 10.6053 secs
  75% in 15.2525 secs
  90% in 17.9233 secs
  95% in 18.5939 secs
  0% in 0.0000 secs

Details (average, fastest, slowest):
  DNS+dialup:	0.4212 secs, 2.9409 secs, 18.5939 secs
  DNS-lookup:	0.0726 secs, 0.0709 secs, 0.0741 secs
  req write:	0.0001 secs, 0.0000 secs, 0.0006 secs
  resp wait:	10.0764 secs, 2.5265 secs, 18.1567 secs
  resp read:	0.0019 secs, 0.0004 secs, 0.0131 secs

Status code distribution:
  [200]	35 responses

Error distribution:
  [165]	Get https://next.data.gouv.fr/api/1/activity?page=1&page_size=20: net/http: request canceled (Client.Timeout exceeded while awaiting headers)

@noirbizarre
Contributor Author

noirbizarre commented Aug 8, 2019

Yes, activity hasn't evolved since the first release, when there were fewer than 1000 users.
The algorithm (dynamic fetching of activity items) can't keep up with that much data.
It was meant as a working base for a fanout model: one activity feed per user, with static data written as soon as activity happens.
The fanout model (which is well documented by orgs like Twitter, Facebook...) has many advantages besides performance:

  • resilient against activity subject deletion (no foreign key, everything is stored in the user timeline)
  • can handle read/unread status per user
  • can easily handle private activity
  • can be totally extracted into its own service

but it has one drawback (there is always one): it requires a lot of storage.
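The fanout-on-write model described above can be sketched in a few lines. Everything here is hypothetical (the timeline storage, the `followers` mapping, the payload shape): the point is that each follower's timeline gets its own denormalized copy of the activity at publish time, so a later deletion of the subject can't break the feed and per-user read/unread state comes for free.

```python
from collections import defaultdict

# Hypothetical fanout-on-write sketch. In a real system these would be
# database collections, not in-memory dicts.
timelines = defaultdict(list)                 # user id -> list of entries
followers = {"org:etalab": ["alice", "bob"]}  # subject -> follower ids

def publish(subject, payload):
    """Copy the activity into every follower's timeline (fanout on write)."""
    entry = {"subject": subject, "read": False, **payload}
    for user in followers.get(subject, []):
        # Each timeline stores an independent copy: no foreign key back
        # to the subject, so deleting the subject later is harmless.
        timelines[user].append(dict(entry))

publish("org:etalab", {"action": "dataset:created", "title": "Budget 2019"})
print(len(timelines["alice"]), len(timelines["bob"]))  # 1 1
```

Reading a feed is then a cheap single-collection query per user, at the cost of writing (and storing) one copy per follower, which is exactly the storage trade-off mentioned above.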
