This repository contains a set of services and cloud functions needed to send push notifications to the Guardian native apps on both iOS and Android.
It has many uses:
- Breaking news sent by our editorial team
- Follow notifications automatically sent when a new article is published
- Football match status alerts when a goal is scored or the status of a game changes
- Technical notifications to trigger background downloads in edition-based apps (aka the Daily Edition)
- One off events such as election results
The repository is made up of the following projects:
- API Models - All the models sent and received by the notification API
- Notification - A service to send or schedule notifications to devices
- Registration - A service to register devices to topics
- Notification Worker Lambda - A set of workers that spin up and process each individual notification by fetching tokens from the database and sending them to APNs or FCM.
- Report - A service to read the state of each notification. It drives a page of the Ophan dashboard.
- Event Consumer - A lambda that consumes metrics sent by the apps (Fastly -> S3 -> lambda) to enrich reporting. This lets us measure how many devices received a notification, and when.
- Schedule Lambda - A polling-based lambda that sends notifications on a schedule, based on a plan stored in DynamoDB.
- Expired Registration Cleaner - A lambda that deletes tokens that haven't been active in 300 days.
- Fake Breaking News - A lambda that periodically sends a fake ("dry run") breaking news notification to spot any potential misconfiguration, technical issue or regression. The results of the dry run are closely monitored and raise an alarm if anything goes wrong.
- Football - A lambda that polls PA and sends football match alerts to the Notification service.
- Report Extractor - A daily lambda that exports the metadata of each notification into our data lake.
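Two of the lambdas above apply simple time-based rules, sketched below under stated assumptions. ScheduledNotification is a hypothetical stand-in for an entry in the DynamoDB plan (the real schema is not described here), and lastActive is assumed to be the timestamp the Expired Registration Cleaner compares against its 300-day limit.

```scala
import java.time.{Duration, Instant}

// Hypothetical shape of one entry in the schedule plan.
final case class ScheduledNotification(id: String, dueTime: Instant, sent: Boolean)

object SchedulePoller {
  /** On each poll, pick the entries that are due at `now` and not yet sent, oldest first. */
  def dueNotifications(plan: Seq[ScheduledNotification], now: Instant): Seq[ScheduledNotification] =
    plan.filter(n => !n.sent && !n.dueTime.isAfter(now)).sortBy(_.dueTime.toEpochMilli)
}

object ExpiryRule {
  // Tokens inactive for longer than this are candidates for deletion.
  val InactivityLimit: Duration = Duration.ofDays(300)

  def isExpired(lastActive: Instant, now: Instant): Boolean =
    lastActive.isBefore(now.minus(InactivityLimit))
}
```

Because the scheduler polls, a notification is sent on the first poll at or after its due time, so delivery can lag by up to one polling interval.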
The system's main responsibilities are to:
- Send push notifications to devices in a timely manner (~3 minutes to reception on device)
- Monitor notifications
- Log diagnostics and the status of each notification
We have Service Level Objectives (SLOs) in place for both the registration and notification services. Please refer to this document for a full description.
There are several data visualizations and dashboards available in Grafana related to these SLOs. You can access them through the Mobile Notifications SLO Hub, which serves as the jumping-off point to explore them.
The Notification service receives requests to send notifications and plans the work for the workers. It stores the status of the notification in the report database, counts how many devices should receive the notification, and splits the work accordingly so that the harvester can fetch the tokens chunk by chunk.
The chunk size is around 10,000 tokens: not so big that the complete failure of one chunk would be a disaster if it were left undelivered, and not so small that the harvester cannot use its running time efficiently (the ratio of cold-start time to work time is kept as low as possible).
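To make the chunking concrete, here is a minimal Scala sketch of splitting a notification's audience into ranges of about 10,000 tokens. It assumes devices can be addressed by contiguous offsets; TokenRange, WorkPlanner and planRanges are hypothetical names, and the actual service may partition tokens differently.

```scala
// Hypothetical half-open range of token offsets: [start, end).
final case class TokenRange(start: Long, end: Long)

object WorkPlanner {
  val ChunkSize: Long = 10000L

  /** Splits `deviceCount` devices into contiguous ranges of at most `chunkSize` tokens. */
  def planRanges(deviceCount: Long, chunkSize: Long = ChunkSize): List[TokenRange] = {
    require(deviceCount >= 0, "deviceCount must be non-negative")
    require(chunkSize > 0, "chunkSize must be positive")
    val numberOfChunks = (deviceCount + chunkSize - 1) / chunkSize // ceiling division
    (0L until numberOfChunks).toList.map { i =>
      val start = i * chunkSize
      TokenRange(start, math.min(start + chunkSize, deviceCount))
    }
  }
}
```

For example, planRanges(25000) yields three ranges, the last one holding the remaining 5,000 tokens. Each range would then be handed to one harvester invocation.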
The Registration service receives registrations from devices. Upon receiving a registration, the service completely replaces any record present in the DB for that token. It also checks which topics are invalid or out of date and removes them before inserting the remaining topics in the DB. The response sent back to the client contains the filtered list so that the client can update the topics in its own storage. Out-of-date topics include live blogs that are no longer live and finished football games.
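A minimal sketch of the replace-and-filter behaviour described above. The topic model (LiveBlogTopic, FootballMatchTopic, OtherTopic) and the in-memory Map standing in for the database are hypothetical; the real service checks topic validity against live data.

```scala
// Hypothetical topic model, simplified for illustration.
sealed trait Topic { def name: String }
final case class LiveBlogTopic(name: String, isLive: Boolean) extends Topic
final case class FootballMatchTopic(name: String, finished: Boolean) extends Topic
final case class OtherTopic(name: String) extends Topic

object RegistrationFilter {
  /** Live blogs that are no longer live and finished matches are out of date. */
  def isOutOfDate(topic: Topic): Boolean = topic match {
    case LiveBlogTopic(_, isLive)        => !isLive
    case FootballMatchTopic(_, finished) => finished
    case OtherTopic(_)                   => false
  }

  def filterTopics(topics: List[Topic]): List[Topic] = topics.filterNot(isOutOfDate)

  /** Replaces (rather than merges) the record for `token`; returns the new store and the list to send back. */
  def register(store: Map[String, List[Topic]], token: String, topics: List[Topic]): (Map[String, List[Topic]], List[Topic]) = {
    val kept = filterTopics(topics)
    (store.updated(token, kept), kept)
  }
}
```

Replacing the whole record keeps the server's view identical to what the client last sent, which is also what makes the periodic re-registration a full recovery mechanism.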
As a side note, each app is programmed to re-register every two weeks even if no subscription has changed. This would allow us to rebuild our registration database in roughly two weeks in the event of a complete failure of our system. It has also allowed us to migrate our users from one backend to another over a period of two weeks while keeping an eye on the relevant metrics.
The Report service is simple: it exposes the status of all notifications that were sent through our system, including information such as the ID, title, author and number of devices that received the notification.
The responsibility of the harvester is to determine which devices should receive a notification, which requires database access. The harvester takes token ranges as input and fetches each range from the database; each range represents up to about 10,000 tokens. Each token is then sorted according to the platform it targets, and the platform decides which SQS queue the message is put on. Before being sent to the SQS queue, tokens are grouped into batches of 1,000.
As an implementation note, results are streamed as they come back from the database allowing us to start sending tokens to the workers before the request is completed.
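The harvester's routing step can be sketched as follows. This assumes tokens arrive as an Iterator (mimicking a streamed database result) and that send is a hypothetical callback standing in for putting one message on the platform's SQS queue. Full batches are emitted as soon as they fill, so sending starts before the stream is exhausted.

```scala
import scala.collection.mutable

// Hypothetical, simplified platform and token types.
sealed trait Platform
case object Ios extends Platform
case object Android extends Platform

final case class DeviceToken(token: String, platform: Platform)

object Harvester {
  val BatchSize: Int = 1000

  /** Buffers tokens per platform and calls `send` whenever a batch of `batchSize` is full. */
  def route(tokens: Iterator[DeviceToken], send: (Platform, List[String]) => Unit, batchSize: Int = BatchSize): Unit = {
    require(batchSize > 0, "batchSize must be positive")
    val buffers = mutable.LinkedHashMap.empty[Platform, mutable.ListBuffer[String]]
    tokens.foreach { t =>
      val buffer = buffers.getOrElseUpdate(t.platform, mutable.ListBuffer.empty[String])
      buffer += t.token
      if (buffer.size == batchSize) {
        send(t.platform, buffer.toList) // emit while the stream is still being read
        buffer.clear()
      }
    }
    // Flush the partial batches left once the stream ends.
    buffers.foreach { case (platform, buffer) =>
      if (buffer.nonEmpty) send(platform, buffer.toList)
    }
  }
}
```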
There are two types of senders so far: iOS and Android. Each is deployed twice, once for the live app and once for the editions app, which makes a total of four lambdas. The goal of these senders is to prepare the payload in the format the mobile app expects and send it to the notification provider (APNs or FCM) as quickly as possible.
No database is accessed at this stage. Both providers return information about individual tokens, such as whether a token is no longer valid. Invalid tokens are queued for deletion and sent to the Registration Cleaning worker.
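A sketch of how a sender might split provider responses into delivered, failed and invalid tokens. DeliveryResult and its subclasses are hypothetical simplifications of what APNs and FCM report; only the invalid tokens would be queued for the Registration Cleaning worker.

```scala
// Hypothetical per-token outcome reported by a provider.
sealed trait DeliveryResult { def token: String }
final case class Delivered(token: String) extends DeliveryResult
final case class InvalidToken(token: String) extends DeliveryResult
final case class Failed(token: String, reason: String) extends DeliveryResult

object SenderResults {
  final case class Summary(delivered: Int, failed: Int, tokensToClean: List[String])

  /** Counts outcomes and collects the tokens that should be queued for deletion. */
  def summarise(results: Seq[DeliveryResult]): Summary =
    Summary(
      delivered = results.count(_.isInstanceOf[Delivered]),
      failed = results.count(_.isInstanceOf[Failed]),
      tokensToClean = results.collect { case InvalidToken(t) => t }.toList
    )
}
```

Keeping transient failures separate from invalid tokens matters: only the latter should ever lead to a registration being deleted.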
The Registration Cleaning worker picks tokens that have been marked for deletion from an SQS queue and deletes them from the database.
The topic counter worker counts how many devices are subscribed to each topic and stores the result as a flat file on S3. These counts help the Notification service determine how much work sending a notification involves and split the work accordingly for the harvester. The worker runs periodically and only counts topics with more than 1,000 devices.
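The counting rule can be illustrated with an in-memory sketch. In the real worker the counts come from the registration database and are written to S3; here the registrations are a hypothetical Seq of (device, topics) pairs, and only topics with more than 1,000 devices are kept.

```scala
object TopicCounter {
  val MinimumSubscribers: Int = 1000

  /** Counts devices per topic and keeps only topics with more than `threshold` devices. */
  def countTopics(registrations: Seq[(String, Set[String])], threshold: Int = MinimumSubscribers): Map[String, Int] =
    registrations
      .flatMap { case (_, topics) => topics } // one entry per (device, topic) subscription
      .groupBy(identity)
      .map { case (topic, subscriptions) => topic -> subscriptions.size }
      .filter { case (_, count) => count > threshold }
}
```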
The notification providers are the services that actually deliver notifications to devices:
- Apple - APNs: Apple Push Notification service
- Android - FCM: Firebase Cloud Messaging (formerly GCM)
The topic counter is part of the Notification Worker Lambda(s). It retrieves the topics that have more than 1,000 subscribers and stores them in S3. It can be run locally by creating the following file at ~/.gu/notification-worker.conf:
db {
url="jdbc:postgresql://localhost:5432/registrationsCODE?currentSchema=registrations"
user="worker_user"
password="<CODE DB PASSWORD>"
maxConnectionPoolSize=1
}
topicCounts {
bucket="mobile-notifications-topics"
fileName="counts.json"
}
Open a tunnel to the CODE notifications database (see Mobile Platform). Then run sbt, set the project to notificationworkerlambda, and run the topic counter's local entry point, which runs the lambda locally.
The notification worker lambdas can be run locally.
First export the desired Platform env var in your terminal (e.g. export Platform=android). Without the Platform env var, an exception is thrown during instantiation.
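A minimal sketch of the fail-fast check described above, assuming the worker reads Platform straight from the process environment. PlatformEnv and platformFrom are hypothetical names; taking the environment as a parameter keeps the check testable.

```scala
object PlatformEnv {
  /** Returns the Platform value, throwing if it is missing or empty. */
  def platformFrom(env: Map[String, String]): String =
    env.get("Platform").filter(_.nonEmpty).getOrElse(
      throw new IllegalStateException("Platform environment variable is not set, e.g. export Platform=android")
    )

  // Reads from the real process environment.
  def platform(): String = platformFrom(sys.env)
}
```

Failing at startup gives a clear error message instead of a lambda that silently targets the wrong platform.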
When running locally, our identity resolves to a DevIdentity, which means we load config locally instead of from SSM. Ensure that you have a ~/.gu/notification-worker.conf file with the following properties:
{
cleaningSqsUrl = "url"
dryrun = false
fcm {
debug = true
serviceAccountKey = """key"""
threadPoolSize = x
}
}
NB: the triple quotes (""") enclosing serviceAccountKey are needed to parse this string correctly, since the key is itself a JSON document containing double quotes.
After starting sbt, inside the notificationworkerlambda project, we can execute:
run NotificationWorkerLocalRun android
The last argument in the command is the name of the worker lambda you want to run locally, e.g. android or ios.
To control which tokens the local lambda sends to, modify the NotificationWorkerLocalRun.scala file:
val tokens = ChunkedTokens(
  notification = notification,
  range = ShardRange(0, 1),
  tokens = List("<your_token_1>", "<your_token_2>") // add further tokens as needed
)
Here are some instructions for how to test news notifications on CODE. As this is a public repository, that document needs to be kept private.