A scalable API Backend built on Google Datastore and Google App Engine
Project frontend available here (separate repo)
Demo available here:
Software | Version |
---|---|
Java | 11.0 |
Maven | 3.8 |
Google Cloud | SDK 363.0.0 |
app-engine-java | 1.9.91 |
app-engine-python | 1.9.96 |
bq | 2.0.71 |
cloud-datastore-emulator | 2.1.0 |
core | 2021.10.29 |
gsutil | 5.4 |
Add the following roles to a custom service account you created, as well as to your App Engine default service account:
- Service Account Token Creator
- App Engine Admin
- Storage Admin
- Storage Object Admin
- Storage Object Creator
- Storage Object Viewer
- Storage Transfer Viewer
Once the account has been created, create a key pair and download it as a JSON file placed at the root of this repository.
Configure the OAuth2 consent screen and add your e-mail as a test user. Then create OAuth2 client credentials with the following authorized URIs:
- https://localhost
- http://localhost
- https://localhost:3000
- http://localhost:3000
- https://front-dot-{{YOUR_PROJECT_ID}}.oa.r.appspot.com
- http://front-dot-{{YOUR_PROJECT_ID}}.oa.r.appspot.com
Add your Client ID in the following places:
- openapi.yaml as x-google-audience
- src/main/java/com/tinyinsta/common/Constants.java as WEB_CLIENT_ID (see the sketch below)
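For reference, a minimal sketch of what that constant could look like in Constants.java; the class layout shown here is an assumption, only the WEB_CLIENT_ID name comes from this project:

```java
// Minimal sketch: only the WEB_CLIENT_ID name is taken from this README,
// the rest of the class layout is assumed.
public class Constants {
    // Replace with the OAuth2 Client ID from the Google Cloud console
    public static final String WEB_CLIENT_ID =
            "{{YOUR_CLIENT_ID}}.apps.googleusercontent.com";
}
```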
printf '[
{
"maxAgeSeconds": 60,
"method": [
"GET",
"HEAD",
"DELETE",
"POST",
"PUT"
],
"origin": [
"http://localhost:3000",
"http://localhost",
"https://localhost:3000",
"https://localhost",
"localhost",
"localhost:3000",
"http://front-dot-{{YOUR_PROJECT_ID}}.oa.r.appspot.com",
"https://front-dot-{{YOUR_PROJECT_ID}}.oa.r.appspot.com"
],
"responseHeader": [
"Content-Type",
"Access-Control-Allow-Origin",
"x-goog-resumable"
]
}
]' > cors.json
gsutil cors set cors.json gs://your_bucket_id
git clone mdolr:instars-back
# Windows:
set GOOGLE_APPLICATION_CREDENTIALS=/ABSOLUTE/PATH/TO/name_of_your_service_account_priv_key.json
# Mac / Linux
export GOOGLE_APPLICATION_CREDENTIALS=/ABSOLUTE/PATH/TO/name_of_your_service_account_priv_key.json
mvn clean appengine:run # starts the server on port 8080
- Web server is accessible at localhost:8080
- API Explorer is available at localhost:8080/_ah/api/explorer
- Datastore emulator admin console is available at localhost:8080/_ah/admin
mvn clean appengine:deploy
mvn endpoints-framework:openApiDocs
gcloud endpoints services deploy target/openapi-docs/openapi.json
- Requests should be responsive (less than 500ms, accounting for network latency)
- Query complexity should be proportional to the size of the results (only query what you need / no scans; see the sketch below)
- The app should scale and handle contention and concurrent requests as much as possible
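To illustrate the second point, here is a hedged sketch of a bounded query using the App Engine low-level Datastore API; the kind name and page size are placeholders, not the project's actual code:

```java
import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.FetchOptions;
import com.google.appengine.api.datastore.Query;

import java.util.List;

public class BoundedQueries {

    /** Fetches at most pageSize entities instead of scanning the whole kind. */
    public static List<Entity> firstPage(String kind, int pageSize) {
        DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
        // Query cost stays proportional to pageSize, not to the total
        // number of entities of this kind.
        return datastore.prepare(new Query(kind))
                .asList(FetchOptions.Builder.withLimit(pageSize));
    }
}
```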
Most of the properties on the kinds are self-explanatory. The only one that is a bit complex and requires some explanation is "batchIndex".
We're using the batchIndex to optimize our like and follower counts: this index lets us keep track of which UserFollower or PostLiker batches are full. Based on this, and knowing the size of our batches, we can easily determine the number of followers / likes of the respective entities without having to iterate over every batch.
We can also quickly pick a random non-full batch to add new likes / followers to.
We can create new batches dynamically based on contention, while always keeping a minimum number of "available batches" (i.e. non-full batches).
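A minimal sketch of the idea, assuming every full batch holds exactly MAX_BATCH_SIZE entries; method and parameter names are illustrative, not the project's exact code:

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

public class BatchIndexSketch {

    public static final int MAX_BATCH_SIZE = 50; // illustrative value

    /**
     * Like / follower count derived from the batchIndex: full batches
     * contribute MAX_BATCH_SIZE each without being read, and only the few
     * non-full batches need to be inspected.
     */
    public static long count(int fullBatches, List<Integer> nonFullBatchSizes) {
        long total = (long) fullBatches * MAX_BATCH_SIZE;
        for (int size : nonFullBatchSizes) {
            total += size;
        }
        return total;
    }

    /** Picks a random non-full batch to write the next like / follower into. */
    public static int pickAvailableBatch(List<Integer> availableBatchNumbers) {
        int i = ThreadLocalRandom.current().nextInt(availableBatchNumbers.size());
        return availableBatchNumbers.get(i);
    }
}
```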
The most natural choice for user keys was the Google Account User ID, which is given to us by Google App Engine when an Authorization token is included in the request headers.
Because of Datastore limitations we can't apply a sort to our queries, so we rely on the Datastore sorting entities by ascending key by default.
Another problem is that we need to deal with contention, so we need to split posts into different buckets.
We achieved that by formatting the post's key as {{BUCKET_NUMBER}}-{{SUBSTRACTED_TIMESTAMP}}
where:
- BUCKET_NUMBER is a random integer between 0 and the number of buckets we want
- SUBSTRACTED_TIMESTAMP is the maximum long supported in Java minus the current timestamp
By choosing such a format for our keys, the Datastore auto-sorts the posts in descending chronological order, which is handy to build our timeline.
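A minimal sketch of that key format; BUCKET_COUNT is an assumed value and the project's actual helper may differ:

```java
import java.util.concurrent.ThreadLocalRandom;

public class PostKeySketch {

    public static final int BUCKET_COUNT = 5; // assumed number of buckets

    /** Builds a key name of the form {{BUCKET_NUMBER}}-{{SUBSTRACTED_TIMESTAMP}}. */
    public static String buildPostKeyName() {
        // Random bucket prefix to spread writes across key ranges
        int bucketNumber = ThreadLocalRandom.current().nextInt(BUCKET_COUNT);
        // Subtracting the current timestamp from Long.MAX_VALUE makes more
        // recent posts sort first under the default ascending key order
        long subtractedTimestamp = Long.MAX_VALUE - System.currentTimeMillis();
        return bucketNumber + "-" + subtractedTimestamp;
    }
}
```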
The key is built from an ancestor key set to the parent Post, plus the default random Google key for the post receiver entity.
The keys for both of these entities are built in the same way, namely {{BUCKET_NUMBER}}-{{(USER or POST)_ID}}.
By default we create BUCKET_COUNT (e.g. 5) batch entities whenever a post or user is created in the Datastore. Then, as the different batches fill up, we create more so that we always keep at least BUCKET_COUNT non-full batches.
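A hypothetical sketch of that rule, keeping at least BUCKET_COUNT non-full batches available at all times; names are illustrative, not the project's exact API:

```java
public class BatchAllocatorSketch {

    public static final int BUCKET_COUNT = 5; // minimum number of non-full batches

    /** Key name of the form {{BUCKET_NUMBER}}-{{(USER or POST)_ID}}. */
    public static String batchKeyName(int bucketNumber, String parentId) {
        return bucketNumber + "-" + parentId;
    }

    /**
     * Number of new batches to create after a write so the parent entity
     * always keeps at least BUCKET_COUNT non-full batches available.
     */
    public static int batchesToCreate(int nonFullBatches) {
        return Math.max(0, BUCKET_COUNT - nonFullBatches);
    }
}
```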
Because Google App Engine has a timeout on requests, and uploads can take a long time depending on the user's connection, we have decided to shift the upload process directly to Google Cloud Storage, without processing the image on our backend.
We have followed Google's recommendations on setting up uploads via signed URLs. The upload follows this flow:
Diagram generated with diagram.codes
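As a hedged sketch of the signed-URL step with the google-cloud-storage client library (bucket and object names are placeholders, and the project's actual endpoint may use a different method or a resumable upload instead of a plain PUT):

```java
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.HttpMethod;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;

import java.net.URL;
import java.util.concurrent.TimeUnit;

public class SignedUploadUrlSketch {

    /** Returns a short-lived URL the client can upload the image to directly. */
    public static URL create(String bucketName, String objectName) {
        Storage storage = StorageOptions.getDefaultInstance().getService();
        BlobInfo blobInfo = BlobInfo.newBuilder(BlobId.of(bucketName, objectName)).build();
        // The browser uploads straight to Cloud Storage, so the App Engine
        // request returns quickly and the backend never handles the image bytes.
        return storage.signUrl(
                blobInfo,
                15, TimeUnit.MINUTES,
                Storage.SignUrlOption.withV4Signature(),
                Storage.SignUrlOption.httpMethod(HttpMethod.PUT));
    }
}
```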
The goal here was to measure the throughput our system could handle on a single post. The likes per second load-testing was conducted using Siege with the following settings:
- Time duration: 30 seconds
- Concurrent users: 200 users
We conducted the tests with varying batch sizes; the "Hit rate" is the percentage of requests that did not receive a 5XX HTTP status code from the server.
MAX_BATCH_SIZE | Average likes per second | Hit rate |
---|---|---|
39000 | ~45 likes/s | 100% |
50 | 80-95 likes/s | 98% |
25 | ~45 likes/s | 96% |
The hit rate is under 100% with smaller batch sizes because the batch size limit is reached more often, which makes queries that update 2 entity groups at once more frequent. We consider this a relatively small and acceptable error rate, but it means there are errors nonetheless.
When re-testing later to make sure our API still worked, we observed some cases where the hit rate would drop below 80% even with MAX_BATCH_SIZE set to 39000; this could be improved by raising the minimum number of available buckets (the average likes per second stayed the same, though). Because we create more batches whenever one fills up, we see better hit rate results with MAX_BATCH_SIZE set to 50.
So in the worst-case scenario we could get results that look like this:
MAX_BATCH_SIZE | Average likes per second | Hit rate |
---|---|---|
39000 | ~50 likes/s | 80% |
50 | 80-95 likes/s | 84% |
25 | Not tested | Not tested |
We tried to see how our timeline generation system scales with an increasing pagination size; of course, the sizes tested here would never be used in production, as it makes no sense to load that many posts at once from a user experience standpoint.
As expected, our app doesn't scale with an increasing pagination size, which is not surprising given the architectural choices we made.
In production we would probably want to use a pagination size ranging from 5 to 10 posts, depending on how fast the average user scrolls.
- Tests were run locally and on the deployed App Engine instance.
- The number of followers varied between 10, 100 and 500 followers.
- Results are based on an average of 30 requests.
- The average total time fluctuates around 330ms
- The results aren't affected by the increase in the number of followers (figure: post-a-picture performance per number of followers, local)
- The average total time fluctuates around 500ms
- The "Post" query has become significantly more time consuming
- Don't recompute the number of likes and followers when a new like / follower is added; only increment by 1 on the client side. The exact numbers will be returned once the user reloads the page.
- Raise the minimum number of available buckets and have a smarter way to create many buckets in case of contention.
@mdolr | Maxime DOLORES |
@jjbes | Julien AUBERT |
@NabilOulbaz | Nabil OULBAZ |
@RobinPourtaud | Robin POURTAUD |