A scalable API Backend built on Google Datastore and Google App Engine
Project frontend available here (separate repo)
Demo available here:
Software | Version |
---|---|
Java | 11.0 |
Maven | 3.8 |
Google Cloud | SDK 363.0.0 |
app-engine-java | 1.9.91 |
app-engine-python | 1.9.96 |
bq | 2.0.71 |
cloud-datastore-emulator | 2.1.0 |
core | 2021.10.29 |
gsutil | 5.4 |
Add the following roles to a custom service account you created, as well as to your App Engine default service account:
- Service Account Token Creator
- App Engine Admin
- Storage Admin
- Storage Object Admin
- Storage Object Creator
- Storage Object Viewer
- Storage Transfer Viewer
Once the account has been created, create a key pair and download it as a JSON file placed at the root of this repository.
Configure the OAuth2 consent screen and add your e-mail as a test user. Then create OAuth2 client credentials with the following authorized URIs:
- https://localhost
- http://localhost
- https://localhost:3000
- http://localhost:3000
- https://front-dot-{{YOUR_PROJECT_ID}}.oa.r.appspot.com
- http://front-dot-{{YOUR_PROJECT_ID}}.oa.r.appspot.com
Add your Client ID in the following places:
- openapi.yaml as x-google-audience
- src/main/java/com/tinyinsta/common/Constants.java as WEB_CLIENT_ID (see the sketch below)
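For reference, a minimal sketch of what that constant could look like in Constants.java; the class layout shown here is an assumption, only the WEB_CLIENT_ID name comes from this project:

```java
// Minimal sketch: only the WEB_CLIENT_ID name is taken from this README,
// the rest of the class layout is assumed.
public class Constants {
    // Replace with the OAuth2 Client ID from the Google Cloud console
    public static final String WEB_CLIENT_ID =
            "{{YOUR_CLIENT_ID}}.apps.googleusercontent.com";
}
```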
printf '[
{
"maxAgeSeconds": 60,
"method": [
"GET",
"HEAD",
"DELETE",
"POST",
"PUT"
],
"origin": [
"http://localhost:3000",
"http://localhost",
"https://localhost:3000",
"https://localhost",
"localhost",
"localhost:3000",
"http://front-dot-{{YOUR_PROJECT_ID}}.oa.r.appspot.com",
"https://front-dot-{{YOUR_PROJECT_ID}}.oa.r.appspot.com"
],
"responseHeader": [
"Content-Type",
"Access-Control-Allow-Origin",
"x-goog-resumable"
]
}
]' > cors.json
gsutil cors set cors.json gs://your_bucket_id
git clone mdolr:instars-back
# Windows:
set GOOGLE_APPLICATION_CREDENTIALS=/ABSOLUTE/PATH/TO/name_of_your_service_account_priv_key.json
# Mac / Linux
export GOOGLE_APPLICATION_CREDENTIALS=/ABSOLUTE/PATH/TO/name_of_your_service_account_priv_key.json
mvn clean appengine:run # starts the server on port 8080
- Web server is accessible at localhost:8080
- API Explorer is available at localhost:8080/_ah/api/explorer
- Datastore emulator admin console is available at localhost:8080/_ah/admin
mvn clean appengine:deploy
mvn endpoints-framework:openApiDocs
gcloud endpoints services deploy target/openapi-docs/openapi.json
- Requests should be responsive (less than 500ms, accounting for network latency)
- Query complexity should be proportional to the size of the results (only query what you need / no scans; see the sketch below)
- The app should scale and handle contention and concurrent requests as much as possible
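To illustrate the second point, here is a hedged sketch of a bounded query using the App Engine low-level Datastore API; the kind name and page size are placeholders, not the project's actual code:

```java
import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.FetchOptions;
import com.google.appengine.api.datastore.Query;

import java.util.List;

public class BoundedQueries {

    /** Fetches at most pageSize entities instead of scanning the whole kind. */
    public static List<Entity> firstPage(String kind, int pageSize) {
        DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
        // Query cost stays proportional to pageSize, not to the total
        // number of entities of this kind.
        return datastore.prepare(new Query(kind))
                .asList(FetchOptions.Builder.withLimit(pageSize));
    }
}
```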
Most of the properties on the kinds are self-explanatory. The only one that is a bit complex and requires some explanation is "batchIndex".
We're using the batchIndex to optimize our like and follower counts: this index lets us keep track of which UserFollower or PostLiker batches are full. Based on this, and knowing the size of our batches, we can easily determine the number of followers / likes of the respective entities without having to iterate over every batch.
We can also quickly pick a random non-full batch to add new likes / followers to.
We can create new batches dynamically based on contention, while always keeping a minimum number of "available batches" (i.e. non-full batches).
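A minimal sketch of the idea, assuming every full batch holds exactly MAX_BATCH_SIZE entries; method and parameter names are illustrative, not the project's exact code:

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

public class BatchIndexSketch {

    public static final int MAX_BATCH_SIZE = 50; // illustrative value

    /**
     * Like / follower count derived from the batchIndex: full batches
     * contribute MAX_BATCH_SIZE each without being read, and only the few
     * non-full batches need to be inspected.
     */
    public static long count(int fullBatches, List<Integer> nonFullBatchSizes) {
        long total = (long) fullBatches * MAX_BATCH_SIZE;
        for (int size : nonFullBatchSizes) {
            total += size;
        }
        return total;
    }

    /** Picks a random non-full batch to write the next like / follower into. */
    public static int pickAvailableBatch(List<Integer> availableBatchNumbers) {
        int i = ThreadLocalRandom.current().nextInt(availableBatchNumbers.size());
        return availableBatchNumbers.get(i);
    }
}
```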
The most natural choice for user keys was the Google Account User ID, which is given to us by Google App Engine when an Authorization token is included in the request headers.
Because of Datastore limitations we can't apply a sort to our queries, so we rely on the Datastore sorting entities by ascending key by default.
Another problem is that we need to deal with contention, so we need to split posts into different buckets.
We achieved that by formatting the post's key as {{BUCKET_NUMBER}}-{{SUBSTRACTED_TIMESTAMP}}
where:
- BUCKET_NUMBER is a random integer between 0 and the number of buckets we want
- SUBSTRACTED_TIMESTAMP is the maximum long supported in Java minus the current timestamp
By choosing such a format for our keys, the Datastore auto-sorts the posts in descending chronological order, which is handy to build our timeline.
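A minimal sketch of that key format; BUCKET_COUNT is an assumed value and the project's actual helper may differ:

```java
import java.util.concurrent.ThreadLocalRandom;

public class PostKeySketch {

    public static final int BUCKET_COUNT = 5; // assumed number of buckets

    /** Builds a key name of the form {{BUCKET_NUMBER}}-{{SUBSTRACTED_TIMESTAMP}}. */
    public static String buildPostKeyName() {
        // Random bucket prefix to spread writes across key ranges
        int bucketNumber = ThreadLocalRandom.current().nextInt(BUCKET_COUNT);
        // Subtracting the current timestamp from Long.MAX_VALUE makes more
        // recent posts sort first under the default ascending key order
        long subtractedTimestamp = Long.MAX_VALUE - System.currentTimeMillis();
        return bucketNumber + "-" + subtractedTimestamp;
    }
}
```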
The key is built from an ancestor key set to the parent Post, plus the default random Google key for the post receiver entity.
The keys for both of these entities are built in the same way, namely {{BUCKET_NUMBER}}-{{(USER or POST)_ID}}.
By default we create BUCKET_COUNT (e.g. 5) batch entities whenever a post or user is created in the Datastore. Then, as the different batches fill up, we create more so that we always keep at least BUCKET_COUNT non-full batches.
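A hypothetical sketch of that rule, keeping at least BUCKET_COUNT non-full batches available at all times; names are illustrative, not the project's exact API:

```java
public class BatchAllocatorSketch {

    public static final int BUCKET_COUNT = 5; // minimum number of non-full batches

    /** Key name of the form {{BUCKET_NUMBER}}-{{(USER or POST)_ID}}. */
    public static String batchKeyName(int bucketNumber, String parentId) {
        return bucketNumber + "-" + parentId;
    }

    /**
     * Number of new batches to create after a write so the parent entity
     * always keeps at least BUCKET_COUNT non-full batches available.
     */
    public static int batchesToCreate(int nonFullBatches) {
        return Math.max(0, BUCKET_COUNT - nonFullBatches);
    }
}
```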
Because Google App Engine has a timeout on requests, and uploads can take a long time depending on the user's connection, we have decided to shift the upload process directly to Google Cloud Storage, without processing the image on our backend.
We have followed Google's recommendations on setting up uploads via signed URLs. The upload follows this flow:
Diagram generated with diagram.codes
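As a hedged sketch of the signed-URL step with the google-cloud-storage client library (bucket and object names are placeholders, and the project's actual endpoint may use a different method or a resumable upload instead of a plain PUT):

```java
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.HttpMethod;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;

import java.net.URL;
import java.util.concurrent.TimeUnit;

public class SignedUploadUrlSketch {

    /** Returns a short-lived URL the client can upload the image to directly. */
    public static URL create(String bucketName, String objectName) {
        Storage storage = StorageOptions.getDefaultInstance().getService();
        BlobInfo blobInfo = BlobInfo.newBuilder(BlobId.of(bucketName, objectName)).build();
        // The browser uploads straight to Cloud Storage, so the App Engine
        // request returns quickly and the backend never handles the image bytes.
        return storage.signUrl(
                blobInfo,
                15, TimeUnit.MINUTES,
                Storage.SignUrlOption.withV4Signature(),
                Storage.SignUrlOption.httpMethod(HttpMethod.PUT));
    }
}
```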
The goal here was to measure the throughput our system could handle on a single post. The likes per second load-testing was conducted using Siege with the following settings:
- Time duration: 30 seconds
- Concurrent users: 200 users
We conducted the tests with varying batch sizes; the "Hit rate" is the percentage of requests that did not receive a 5XX HTTP status code from the server.
MAX_BATCH_SIZE | Average likes per second | Hit rate |
---|---|---|
39000 | ~45 likes/s | 100% |
50 | 80-95 likes/s | 98% |
25 | ~45 likes/s | 96% |
The hit rate is under 100% with smaller batch sizes because the batch size limit is reached more often, which makes queries that update 2 entity groups at once more frequent. We consider this a relatively small and acceptable error rate, but it means there are errors nonetheless.
When re-testing later to make sure our API still worked, we observed some cases where the hit rate would drop below 80% even with MAX_BATCH_SIZE set to 39000; this could be improved by raising the minimum number of available buckets (the average likes per second stayed the same, though). Because we create more batches whenever one fills up, we see better hit rate results with MAX_BATCH_SIZE set to 50.
So in the worst-case scenario we could get results that look like this:
MAX_BATCH_SIZE | Average likes per second | Hit rate |
---|---|---|
39000 | ~50 likes/s | 80% |
50 | 80-95 likes/s | 84% |
25 | Not tested | Not tested |
We tried to see how our timeline generation system scales with an increasing pagination size; of course, the sizes tested here would never be used in production, as it makes no sense to load that many posts at once from a user experience standpoint.
As expected, our app doesn't scale with an increasing pagination size, which is not surprising given the architectural choices we made.
In production we would probably want to use a pagination size ranging from 5 to 10 posts, depending on how fast the average user scrolls.
- Tests were run locally and on the deployed App Engine instance.
- The number of followers varied between 10, 100 and 500 followers.
- Results are based on an average of 30 requests.
- The average total time fluctuates around 330ms
- The results aren't affected by the increase in the number of followers (figure: post-a-picture performance per number of followers, local)
- The average total time fluctuates around 500ms
- The "Post" query has become significantly more time consuming
- Don't recompute the number of likes and followers when a new like / follower is added; only increment by 1 on the client side. The exact numbers will be returned once the user reloads the page.
- Raise the minimum number of available buckets and have a smarter way to create many buckets in case of contention.
@mdolr | Maxime DOLORES |
@jjbes | Julien AUBERT |
@NabilOulbaz | Nabil OULBAZ |
@RobinPourtaud | Robin POURTAUD |