This is a sample application demonstrating the use of several technologies, namely Spring Integration/Boot/WebMVC, Apache Kafka, MongoDB and selected JavaScript frameworks, in a "trading" solution.
The solution consists of several modules realizing the following set of functions:
- a REST/JSON (consumer) service able to consume requests,
- a data processor handling consumer requests and saving them into a database,
- a simple front-end (monitor) web application showing received data statistics as well as selected runtime indicators.
There is a REST endpoint exposed to end-users consuming JSON requests (see the service description below) and a web user interface for business/application monitoring.
The application shows a world map with current trading activity and a right-hand column with detailed information about the trading history and the most exchanged currencies. Click a region on the map to show country-specific data. Additionally, there are two small charts at the bottom of the screen showing the current CPU utilization and Kafka topic throughput (data in/out). The monitor application requires the user to authenticate:
- username: admin
- password: pwd
There are several components realizing these functions:
- Consumer -- a web application (packaging war) exposing a simple REST service; the service receives user requests, performs validation and sends them to a Kafka topic,
- Processor -- a standalone application (packaging jar) attached to a Kafka topic, saving data into the database; the application computes predefined statistics as well,
- Monitor -- a web application (packaging war) realizing the user interface,
- Jolokia JMX Agent -- monitors selected attributes of Apache Kafka and exposes them in the form of an HTTP service (see the sketch below).
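For example, a Kafka metric exposed by the Jolokia agent can be fetched with a plain HTTP GET. This is a minimal sketch assuming the agent's default port 8778; the host and the MBean name are illustrative and depend on the Kafka version and deployment:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class JolokiaReadExample {

    public static void main(String[] args) throws Exception {
        // Hypothetical agent location; the actual host/port depends on the deployment.
        URL url = new URL("http://localhost:8778/jolokia/read/"
                + "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // JSON with the requested attribute values
            }
        }
    }
}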
The main concept is based on the use of Apache Kafka messaging between the consumer and the processor components. Messaging is a generally recommended pattern for applications of this kind, where users may generate a large number of requests that need to be processed. The requests can be quickly validated and stored in a messaging topic; one or more processor applications may then connect to the same topic and handle the load.
Kafka offers significantly better performance and scalability options than traditional message brokers; see the Performance tuning section below for further details.
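For illustration, the producing side can be reduced to a few lines. This sketch uses the plain spring-kafka KafkaTemplate for brevity instead of the Spring Integration adapters listed below; the class and topic names are hypothetical:

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

@Service
public class TradeRequestGateway {

    private final KafkaTemplate<String, String> kafkaTemplate;

    @Autowired
    public TradeRequestGateway(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    /** Publishes an already-validated trade request (as JSON) to the topic. */
    public void publish(String tradeRequestJson) {
        // "trades" is a placeholder topic name.
        kafkaTemplate.send("trades", tradeRequestJson);
    }
}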
There are several frameworks in use from the Spring suite, namely:
- core -- used for component wiring; a mix of XML and annotations is in use, where components requiring configuration are declared in XML and other components are annotated/autowired,
- integration -- used for message processing and execution management; there are extensions available for Kafka messaging providing inbound and outbound adapters,
- boot -- used by the processor module to handle console parameters and application context loading (see the bootstrap sketch after this list),
- webmvc -- to expose REST services, used by the consumer and monitor modules,
- security -- to provide authentication for the monitor module,
- data -- to access MongoDB from the processor and monitor modules,
- test -- mainly to help test the REST services.
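As a small illustration of the boot usage, the processor bootstrap might look like this (class name hypothetical); Spring Boot turns console parameters such as --run-stats=false into properties available to the application context:

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class ProcessorApplication {

    public static void main(String[] args) {
        // Console parameters (e.g. --run-stats=false, see Performance tuning)
        // are parsed into properties by Spring Boot.
        SpringApplication.run(ProcessorApplication.class, args);
    }
}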
Using MongoDB in this example seems to be a good option because of its great performance, sharding options and powerful aggregation framework. While the application is rather simple from the data perspective, it could easily be implemented using any traditional SQL database such as MySQL, PostgreSQL or similar. Still, choosing MongoDB seems to be more future-proof in this particular case.
The AngularJS framework seems to be a good match for the front-end because of its REST features (resource support), ease of use and compact resulting code. There are two external components integrated as directives:
- jVectorMap -- showing the overall map,
- Highcharts -- showing throughput and JMX charts.
There are several REST services exposed by consumer and monitor modules.
Provided by the consumer module, the service is expected to be exposed to end-users (clients) and accepts trade requests of the structure described below.
POST ~/consumer/request
There are no URL parameters. The request body should contain the following data in JSON form:
- userId -- user identifier, string, mandatory,
- timePlaced -- request time, format 'dd-MMM-yy HH:mm:ss', mandatory,
- originatingCountry -- country code, 2 letters, mandatory,
- currencyFrom -- currency code, 3 letters, mandatory,
- currencyTo -- currency code, 3 letters, mandatory,
- amountSell -- amount to sell, number, range 1-1000000, mandatory,
- amountBuy -- amount to buy, number, range 1-1000000, mandatory,
- rate -- exchange rate, number, mandatory.
Example:
{
    "userId": "134256",
    "currencyFrom": "EUR",
    "currencyTo": "GBP",
    "amountSell": 1000,
    "amountBuy": 747.10,
    "rate": 0.7471,
    "timePlaced": "24-JAN-15 10:27:44",
    "originatingCountry": "FR"
}
Returns 200 OK and a JSON body containing the transaction identifier, for example:
{
    "txId": "ad2d0300-d237-11e4-bd9e-0002a5d5c51b"
}
Returns 400 BAD REQUEST and a JSON structure describing the error, for example:
{
    "currencyFrom": "invalid currency code"
}
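To make the contract above concrete, here is a minimal sketch of such a controller; the TradeRequest bean (shown with only two of the fields listed above), the class names and the wiring are hypothetical, not the actual implementation:

import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

import javax.validation.Valid;
import javax.validation.constraints.NotNull;
import javax.validation.constraints.Pattern;

import org.springframework.http.ResponseEntity;
import org.springframework.validation.BindingResult;
import org.springframework.validation.FieldError;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class TradeRequestController {

    /** Minimal request bean; only two of the fields listed above are shown. */
    public static class TradeRequest {
        @NotNull
        public String userId;
        @Pattern(regexp = "[A-Z]{3}", message = "invalid currency code")
        public String currencyFrom;
        // ... remaining fields omitted ...
    }

    @RequestMapping(value = "/request", method = RequestMethod.POST)
    public ResponseEntity<Map<String, String>> accept(
            @Valid @RequestBody TradeRequest request, BindingResult errors) {
        if (errors.hasErrors()) {
            // Collect field-level messages into the error structure shown above.
            Map<String, String> body = new HashMap<>();
            for (FieldError fe : errors.getFieldErrors()) {
                body.put(fe.getField(), fe.getDefaultMessage());
            }
            return ResponseEntity.badRequest().body(body);
        }
        // ... publish the validated request to the Kafka topic here ...
        Map<String, String> body = new HashMap<>();
        body.put("txId", UUID.randomUUID().toString());
        return ResponseEntity.ok(body);
    }
}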
Provided by the monitor module, the service is expected to be exposed to internal users (admins) only. The service provides information about the currently logged-in user.
GET ~/monitor/rest/user
The service requires an authenticated user.
There are no URL parameters.
Returns information about the name and roles of the current user, for example:
{
    "name": "John Admin",
    "roles": [
        "ADMIN"
    ]
}
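A minimal sketch of such an endpoint; note the example above returns a display name, whereas this simplified version returns the login name from the security context:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.springframework.security.core.Authentication;
import org.springframework.security.core.GrantedAuthority;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class UserController {

    @RequestMapping("/rest/user")
    public Map<String, Object> currentUser(Authentication authentication) {
        Map<String, Object> result = new HashMap<>();
        result.put("name", authentication.getName());
        List<String> roles = new ArrayList<>();
        for (GrantedAuthority authority : authentication.getAuthorities()) {
            roles.add(authority.getAuthority());
        }
        result.put("roles", roles);
        return result;
    }
}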
There are no parameters or body to validate and no error responses are defined.
Provided by the monitor module, the service is expected to be exposed to internal users (admins) only. The service provides basic information related to the data to be displayed: current exchange statistics by country and the top five currencies by number of transactions.
GET ~/monitor/rest/stats
The service requires an authenticated user.
There is one URL parameter:
- cc -- country code, 2 letters, optional (limits the response to the given country if specified, worldwide otherwise).
Returns current exchange statistics by country (tput element) and the top five currencies by number of transactions (xchg element), for example:
{
    "tput": {
        "CZ": 8575,
        "GB": 9585,
        "FR": 94877
    },
    "xchg": [{
        "originatingCountry": "DE",
        "currencyFrom": "USD",
        "currencyTo": "CZK",
        "count": 90,
        "amount": 5219.8306,
        "createDate": 1427411760001
    }]
}
Returns 400 BAD REQUEST and a JSON structure describing the error, for example:
{
    "cc": "invalid country code"
}
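The top-five currency statistics are a natural fit for the MongoDB aggregation framework mentioned earlier. This is a minimal sketch using Spring Data MongoDB; the pipeline, the collection mapping and the class names are illustrative, not the actual implementation:

import static org.springframework.data.mongodb.core.aggregation.Aggregation.*;

import java.util.List;

import org.springframework.data.domain.Sort;
import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.data.mongodb.core.aggregation.Aggregation;
import org.springframework.data.mongodb.core.aggregation.AggregationResults;

public class CurrencyStatsRepository {

    /** Hypothetical mapping class for one aggregated row. */
    public static class CurrencyPairStats {
        public String currencyFrom;
        public String currencyTo;
        public long count;
        public double amount;
    }

    private final MongoTemplate mongoTemplate;

    public CurrencyStatsRepository(MongoTemplate mongoTemplate) {
        this.mongoTemplate = mongoTemplate;
    }

    /** Top five currency pairs by number of transactions. */
    public List<CurrencyPairStats> topFive() {
        Aggregation agg = newAggregation(
                group("currencyFrom", "currencyTo")
                        .count().as("count")
                        .sum("amountSell").as("amount"),
                sort(Sort.Direction.DESC, "count"),
                limit(5));
        AggregationResults<CurrencyPairStats> results =
                mongoTemplate.aggregate(agg, "requests", CurrencyPairStats.class);
        return results.getMappedResults();
    }
}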
Provided by the monitor module, the service is expected to be exposed to internal users (admins) only. The service provides detailed throughput information in the form of timestamp-value pairs (to be shown in a chart).
GET ~/monitor/rest/stats/tput
The service requires an authenticated user.
There is one URL parameter:
- cc -- country code, 2 letters, optional (limits the response to the given country if specified, worldwide otherwise).
Returns throughput information as an array of timestamp-value pairs, for example:
[
    [1427410920001, 362],
    [1427411280001, 199],
    ...
]
Returns 400 BAD REQUEST and a JSON structure describing the error, for example:
{
    "cc": "invalid country code"
}
The following entities are stored permanently in MongoDB.
Corresponds to the incoming request and is enriched with the following attributes:
- timeCreated -- request time,
- transactionId -- transaction identifier, generated on the front-end and provided to the user.
Stored in the requests collection.
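For illustration, the stored document might be mapped as follows with Spring Data MongoDB; the class name is hypothetical and accessors are omitted:

import java.util.Date;

import org.springframework.data.annotation.Id;
import org.springframework.data.mongodb.core.mapping.Document;

@Document(collection = "requests")
public class TradeRequestEntity {

    @Id
    private String id;

    // Fields following the REST contract above:
    private String userId;
    private Date timePlaced;
    private String originatingCountry;
    private String currencyFrom;
    private String currencyTo;
    private double amountSell;
    private double amountBuy;
    private double rate;

    // Enrichment attributes:
    private Date timeCreated;      // request time
    private String transactionId;  // generated identifier returned to the user

    // getters/setters omitted for brevity
}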
Stores throughput stats for a given country and date in the throughputStats collection.
{
    "countryCode" : "CA",
    "createDate" : ISODate("2015-03-26T23:02:00.001Z"),
    "count" : 0.00003228222340112552
}
Stores user accounts and corresponding roles; the password is stored as a SHA-1 hash.
{
    "name" : "admin",
    "displayName" : "Default Admin",
    "password" : "37fa265330ad83eaa879efb1e2db6380896cf639",
    "createDate" : ISODate("2015-03-23T14:53:33.648Z"),
    "roles" : [
        "ADMIN"
    ]
}
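For reference, a hex-encoded SHA-1 digest like the one stored above can be produced as follows. This is a minimal sketch; the application itself is assumed to delegate password checks to Spring Security:

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public final class PasswordHash {

    /** Hex-encoded SHA-1 digest, matching the password format stored above. */
    public static String sha1(String password) throws NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-1");
        byte[] bytes = digest.digest(password.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}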
The following activities are implemented.
The activity shows a trade request being issued by the client and handled by the consumer module. The call is synchronous and all errors are propagated to the end-user in the form of an error response. Multi-threading is managed by the servlet container.
The activity shows a trade request being processed by the processor module and stored in the database.
The activity shows the computation of statistics triggered on a regular basis.
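A minimal sketch of such a trigger using Spring's scheduling support; the cron expression and class name are placeholders:

import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

// Requires scheduling to be enabled in the context (e.g. @EnableScheduling).
@Component
public class StatsJob {

    // Placeholder: run at the start of every minute; see Performance tuning
    // below for the impact of shorter or longer periods.
    @Scheduled(cron = "0 * * * * *")
    public void computeThroughputStats() {
        // aggregate recent requests and update throughputStats documents
    }
}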
The monitor web application uses several REST endpoints to access data; there are two refresh timers configured:
- a slow timer refreshing throughput and currency data every minute,
- a fast timer refreshing JMX data every 2 seconds.
The following quality attributes were checked.
The request endpoint response time did not exceed 60 ms during simple JMeter tests with 2 concurrent users.
There is a set of unit tests implemented with coverage exceeding 95%.
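As an illustration of the Spring test support mentioned earlier, a minimal MockMvc sketch exercising the validation path of the (hypothetical) controller sketched above:

import static org.springframework.test.web.servlet.request.MockMvcRequestBuilders.post;
import static org.springframework.test.web.servlet.result.MockMvcResultMatchers.jsonPath;
import static org.springframework.test.web.servlet.result.MockMvcResultMatchers.status;

import org.junit.Test;
import org.springframework.http.MediaType;
import org.springframework.test.web.servlet.MockMvc;
import org.springframework.test.web.servlet.setup.MockMvcBuilders;

public class TradeRequestControllerTest {

    private final MockMvc mockMvc =
            MockMvcBuilders.standaloneSetup(new TradeRequestController()).build();

    @Test
    public void rejectsInvalidCurrencyCode() throws Exception {
        mockMvc.perform(post("/request")
                        .contentType(MediaType.APPLICATION_JSON)
                        .content("{\"currencyFrom\":\"EURO\"}"))
                .andExpect(status().isBadRequest())
                .andExpect(jsonPath("$.currencyFrom").exists());
    }
}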
No UAT tests were implemented because of the character of the application. However, a decent set of Selenium tests could be implemented to test page rendering and region clicks.
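Such a Selenium test might look roughly like this; the driver choice, URL and CSS selector are hypothetical and depend on the deployment and the jVectorMap markup:

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

public class MonitorUiSmokeTest {

    public static void main(String[] args) {
        WebDriver driver = new FirefoxDriver();
        try {
            // Hypothetical URL; adjust to the actual deployment.
            driver.get("http://localhost:8080/monitor/");
            // Hypothetical selector for a jVectorMap region; click to load country data.
            driver.findElement(By.cssSelector(".jvectormap-region")).click();
        } finally {
            driver.quit();
        }
    }
}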
There are two endpoints available: the REST endpoint where end-users issue their requests and the monitor web application.
The endpoint is exposed to end-users and should be well protected in terms of access control, usage quotas, DoS protection, SSL etc. The service should be managed by an API management tool, such as Apigee or similar; it does not make much sense to implement a subset of this functionality locally.
The solution is deployed to Google Compute Engine, instance type n1-highmem-2 (2 cores, 13 GB RAM).
Operations should use JMX for monitoring; there is a good set of beans exposed on Apache Tomcat and Apache Kafka, as well as on the producer module.
The user interface is not very consistent -- it mixes business views (map, request chart) with operational views (CPU load, Kafka topic). To fix this, a standard application monitoring system (such as Zabbix or similar) could be introduced to perform detailed application and network analysis, and the CPU load and Kafka topic charts could be removed from the view.
There could be more business functionality added in terms of analysis of the existing data, such as:
- displaying more historical data -- the throughput chart (Today chart) shows only a limited data range; the existing chart implementation could be replaced with a Highstock alternative, which is capable of displaying wider ranges of historical data,
- displaying more currencies -- the currency table shows only the top few records; the table could be extended, and there could be a detailed chart showing the evolution of currency exchange over time,
- extending the map -- the map component is very illustrative and eye-catching; it could display more than just actual throughput, and one or more markers could be attached to regions (countries) displaying data like currency, last transaction time etc.
The whole solution has been designed with extra focus on scalability. All components can be deployed in multiple instances and the load can be balanced:
- Consumer -- the component is deployed into a servlet container; there may be multiple containers in use behind an Apache proxy. All instances can post messages into the same Kafka topic.
- Kafka -- the messaging topic can be split into partitions; the default Kafka client implementation uses a round-robin partitioning strategy, so a large number of messages can be spread across multiple channels (partitions) and processed concurrently.
- Processor -- there should be as many instances of the processor component as there are partitions; the existing implementation (a standalone application) may not be ideal for operations, and some kind of cluster management such as Spring XD, Hadoop YARN or similar may provide better control. It is generally recommended to install one processor module per VM.
- MongoDB -- there may be one or more replicas created; however, only one cluster node can be used for writing. The statistics calculations performed by the processor module could be directed to a read-only replica; to do so, start the processor module with --run-stats=false on nodes which are not supposed to perform stats calculation.
There are several tuning knobs available in the processor module:
- There is a thread pool used to handle parallel dequeuing and writes into the database; the pool size should be kept aligned with the actual number of CPU cores to avoid excessive context switching (see the sketch after this list).
- There is a max-messages parameter on the Kafka consumer defining the number of messages read from Kafka at once; keep it in relation to the thread pool size depending on preference -- a large number of small writes, or a smaller number of larger chunks of data to be written.
- Throughput statistics are calculated every minute; this period may be tuned by setting the cron expression. A shorter period causes more frequent calculations, while longer periods mean working with larger datasets. It should be tuned based on business requirements and actual data volume. A good index helps in all cases.
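A minimal sketch of a thread pool configured along these lines; the bean name and queue capacity are placeholders:

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

@Configuration
public class ProcessorExecutorConfig {

    @Bean
    public ThreadPoolTaskExecutor processorExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        // Keep the pool aligned with the available CPU cores (see above).
        int cores = Runtime.getRuntime().availableProcessors();
        executor.setCorePoolSize(cores);
        executor.setMaxPoolSize(cores);
        executor.setQueueCapacity(1000); // placeholder backlog size
        return executor;
    }
}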
There are also a few known issues:
- There were several issues related to Kafka monitoring (JMX) service availability caused by a lost connection between Zookeeper and Kafka; no further details are available, and a restart of both systems solved the issue.
- Highlighted map regions may not have enough contrast; the color scheme may need updating depending on production experience.