
Deployment architecture


The following requirements will be completed step by step as the project goes forward.

Datacore deployment architecture components

The Datacore platform's own deployment architecture is a typical Java-over-MongoDB one. Its building blocks are Virtual Machine (VM) nodes of two different types: Java nodes and MongoDB nodes. Although the software layer is computing-bound while the database is IO-bound, both consume RAM (the database especially), so software has to run on different VMs than data. This way, within the Datacore platform, MongoDB and Java nodes can scale up separately (see below).

The overall OASIS architecture, as already said, is Web-oriented, meaning it is decoupled through REST APIs; here is the Datacore's place in it. Having a Web-oriented architecture means that clients (service providers, but also the OASIS Portal) will have to use the actual individual OASIS endpoints (Datacore ones especially, but Kernel ones as well) as much as possible, instead of a single generic one, in order to foster client-side caching. Besides that, DNS routing may be used to provide a single entry point, typically to support Datacore URIs. Note that if this is not enough to provide a single URL for all Java nodes serving the same MongoDB shard with high availability, HAProxy will be used in front.

Datacore MongoDB deployment architecture

At the bottom layer, Datacore's MongoDB storage is replicated for resilience and high availability, and sharded so that writes can scale. More precisely, the overall MongoDB architecture is made of several MongoDB shards, each serving different data, and each being a replicated MongoDB cluster, that is, a set of MongoDB VM nodes serving the same data with one of them acting as the primary.
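
As an illustration, a single shard's replica set could be initiated as in the following minimal sketch, which assumes a recent MongoDB Java driver; the replica set name and host names are placeholders, not actual OASIS values.

```java
import java.util.Arrays;

import org.bson.Document;

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;

public class ShardReplicaSetInit {
    public static void main(String[] args) {
        // Connect directly to the member that should become primary of this shard's
        // replica set (host names below are placeholders for the shard's data VMs).
        try (MongoClient client = MongoClients.create("mongodb://mongo-shard1-a:27017")) {
            Document rsConfig = new Document("_id", "shard1rs")
                    .append("members", Arrays.asList(
                            new Document("_id", 0).append("host", "mongo-shard1-a:27017"),
                            new Document("_id", 1).append("host", "mongo-shard1-b:27017"),
                            new Document("_id", 2).append("host", "mongo-shard1-c:27017")));
            // replSetInitiate turns these three data VMs into one replicated cluster
            // (one primary, two secondaries) serving this shard's data.
            client.getDatabase("admin").runCommand(new Document("replSetInitiate", rsConfig));
        }
    }
}
```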

Application servers are stateless (or behave as if they were, by using web caches), and they must be clustered to provide high availability. Using the Java MongoDB driver (or its Spring wrapping layer) together with mongos Unix services makes the link between the two transparent. Indeed, MongoDB provides optimal and transparent routing from its client to where it actually stores data, through its mongos routing process, meaning that each Datacore application server should run a mongos process.
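
For instance, a Datacore Java node would open its MongoDB connection against its local mongos and never address the shards directly; a minimal sketch with a recent MongoDB Java driver (the database and collection names are placeholders) could look like this:

```java
import org.bson.Document;

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;

public class DatacoreMongoAccess {
    public static void main(String[] args) {
        // The application server only knows about its local mongos process;
        // mongos routes every query to the shard that actually holds the data.
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> collection =
                    client.getDatabase("datacore").getCollection("sample.model");
            System.out.println("resources: " + collection.countDocuments());
        }
    }
}
```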

Note that scaling up should only be done by sharding and not by adding replicas ("In general, do not use secondary and secondaryPreferred to provide extra capacity", see http://docs.mongodb.org/manual/core/read-preference/), otherwise losing ANY replica could compromise global availability, in addition to the risk of serving slightly out-of-date data to replica readers. An exception is the case of long-lived (e.g. analytics, backup) queries, which may have their own Java nodes using an appropriately configured MongoDB driver (secondary ReadPreference) behind a dedicated public load-balanced IP.
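
For those dedicated long-lived-query nodes, the driver configuration could look like the following sketch (again assuming a recent MongoDB Java driver and a placeholder mongos host); the only difference from regular Datacore nodes is the secondary ReadPreference:

```java
import com.mongodb.ConnectionString;
import com.mongodb.MongoClientSettings;
import com.mongodb.ReadPreference;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;

public class AnalyticsMongoClientFactory {
    public static MongoClient create() {
        // Analytics / backup Java nodes read from secondaries so that their
        // long-lived queries do not compete with regular traffic on the primary.
        MongoClientSettings settings = MongoClientSettings.builder()
                .applyConnectionString(new ConnectionString("mongodb://localhost:27017"))
                .readPreference(ReadPreference.secondary())
                .build();
        return MongoClients.create(settings);
    }
}
```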

Finally, where data is actually stored, i.e. in which shard, depends on the sharding strategy (i.e. the shard key). Sharding is done per MongoDB Collection (i.e. per Datacore Model) and depends on the actual business use of the data; it is therefore usually different for each Collection, and requires good monitoring tools to understand how to best shard each of them. One particular generic strategy is to store data near the code nodes that use it, i.e. near the users who read that data, assuming OASIS use is read-heavy.
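
As a sketch of what sharding a Model's Collection involves, the following administration commands could be run against a mongos router with a recent MongoDB Java driver; the database name, collection name and shard key field are assumptions for illustration only:

```java
import org.bson.Document;

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoDatabase;

public class ShardModelCollection {
    public static void main(String[] args) {
        try (MongoClient router = MongoClients.create("mongodb://localhost:27017")) {
            MongoDatabase admin = router.getDatabase("admin");
            // Allow the datacore database to be distributed across shards.
            admin.runCommand(new Document("enableSharding", "datacore"));
            // Shard one Model's collection on a key chosen from its business access
            // pattern (here a hypothetical URI field of the stored Resources).
            admin.runCommand(new Document("shardCollection", "datacore.sample.model")
                    .append("key", new Document("_uri", 1)));
        }
    }
}
```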

Datacore high availability and backup

As hinted at in Datacore architecture, high availability is achieved:

  • at data level, by live replication and sharding of data.
  • at web service level, by having application server nodes behind a single load-balanced IP. NB: putting its own application server cluster in front of each shard provides nothing more, except possibly to help geolocalized sharding (see elsewhere).

TODO Q: which load balancing, IP or cookie?

Backup is achieved by live replication of data. Therefore, being able to replace faulty data VMs in a timely manner is paramount. IT Governance will design an appropriate answer, which includes an SLA and KPIs such as time to repair, but possibly also requires at least 2 additional replica data VMs, rather than a single one, in addition to the primary one.

However, some backup is required to get a copy of all or some data in order to feed other environments with meaningful business data. Indeed, especially with a generic business (meta)model such as Datacore's, the only meaningful enough data is the production data (or at least the end-user staging environment data, but that environment should itself be fed from production). How it will be done will be studied by infrastructure and application providers together: by snapshotting at VM level (which MongoDB allows) and copying all production VMs to a new environment, by database commands (e.g. the MongoDB slave mechanism, rather than mongodump), by application utilities that ease export (because those procedures require additional operations, such as temporarily disabling MongoDB's own chunk balancer; useful Open Source ones exist, e.g. Wordnik's), or by several of those methods at once. The chosen design will then be put in place, documented and, up to some point, automated as well.
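
As an illustration of the extra operations such a copy requires, the sketch below stops the MongoDB chunk balancer and then locks one replica member before a VM-level snapshot; it assumes a recent MongoDB Java driver, placeholder host names, and the pre-3.4 config.settings way of controlling the balancer.

```java
import org.bson.Document;

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.UpdateOptions;
import com.mongodb.client.model.Updates;

public class BackupPreparation {
    public static void main(String[] args) {
        // 1. Against a mongos router: stop the balancer so that chunks do not
        //    migrate between shards while the data is being copied.
        try (MongoClient router = MongoClients.create("mongodb://mongos-host:27017")) {
            router.getDatabase("config").getCollection("settings")
                    .updateOne(Filters.eq("_id", "balancer"),
                            Updates.set("stopped", true),
                            new UpdateOptions().upsert(true));
        }
        // 2. Against the replica member being backed up: flush writes to disk and
        //    lock it, then take the VM-level snapshot outside of this program.
        try (MongoClient member = MongoClients.create("mongodb://mongo-backup-member:27017")) {
            member.getDatabase("admin")
                    .runCommand(new Document("fsync", 1).append("lock", true));
            // After the snapshot: run the fsyncUnlock command and set the balancer
            // document's "stopped" flag back to false.
        }
    }
}
```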

Other maintenance processes (MongoDB compaction...) will be documented and, up to some point to be defined in the pilot phase, automated as well. Advice and operational best practices that will have to be followed have already been gathered.

Datacore scalability options

Scaling up is achieved first vertically: by adding RAM (e.g. from 8 to 36 GB or even 128 GB) so that more data can be mapped in memory, by putting the MongoDB journal on its own physical volume (if possible SSD, e.g. 2x64 GB RAID-1, which is very efficient but costly), or by adding CPU if monitoring shows that it is the limiting factor, but not beyond what infrastructure and cost efficiency allow. Vertical scalability is easy and inevitable, but limited and obviously not enough for OASIS goals.

Scaling up is achieved second, horizontally, by adding new VMs. Java VM nodes can be added transparently to handle more web HTTP requests. MongoDB nodes can be added to either:

  • scale up reads, by adding a new read-only data VM to the MongoDB cluster
  • scale up writes, by adding a full shard, which comprises at least 2 VMs, to the MongoDB cluster. Here MongoDB's full power becomes apparent, since it automates most of the tasks and makes this option quite easy (see the sketch after this list).
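
A sketch of that second option, adding a whole shard once its replica set is up, is shown below; it assumes a recent MongoDB Java driver and placeholder replica set and host names.

```java
import org.bson.Document;

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;

public class AddNewShard {
    public static void main(String[] args) {
        // Run against a mongos router once the new shard's replica set
        // (here "shard3rs", two placeholder VMs) has been initiated.
        try (MongoClient router = MongoClients.create("mongodb://mongos-host:27017")) {
            router.getDatabase("admin").runCommand(new Document("addShard",
                    "shard3rs/mongo-shard3-a:27017,mongo-shard3-b:27017"));
            // MongoDB's balancer then migrates chunks of the sharded collections
            // onto the new shard automatically.
        }
    }
}
```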

These 3 processes of adding VMs (Java VM, read-only MongoDB VM, MongoDB shard VMs) must be documented and industrialized by their application providers, up to a good enough level of automation to be efficient and not error prone. Ultimately, further industrialization at the infrastructure level (up to being able to add such VMs by merely clicking a button, e.g. using Puppet and/or dedicated APIs...) will be studied based on project results.

Monitoring-driven scalability

Scaling up can't be automated out of the box, because it is not known in advance, for instance, which node type will encounter the most load, or even which technical or business KPI(s) will best tell when which kind of scaling up is required. This will be learned step by step along the project's deployment and increasing use, thanks to the operators' knowledge of the infrastructure and the developers' knowledge of application behaviour. But the best information must be easily available to infrastructure operators in a timely manner, to help them make the right decision about when and how to scale up.

This is why monitoring is paramount to achieving elasticity: it is what crucially tells when OASIS has to be scaled up. Indeed, the key to Datacore scalability is to have all the information needed to decide when (anticipate) and how (e.g. add a Java node rather than a Mongo node) to scale up.

As a side note, automated scalability (such as a script reacting to a given alert and automatically adding a VM) would be possible, but wouldn't work all the time. Indeed, it can't handle all scalability problems, at least not those rooted in business-level causes, such as one application having a runaway success. That's why OASIS won't seek to fully automate scalability for its own sake.

Monitored Key Performance Indicators (KPIs)

Monitored Key Performance Indicators (KPIs) break down into two categories:

  • IT Governance (as defined in D3.4 deliverable) is expected to design the monitoring architecture to gather the technical infrastructure-related KPIs among those listed in D5.1b.
  • Application-level KPIs (such as number of users, number of pieces of Data of each type and overall average, biggest piece of Data with its type and user...) must be provided by application providers, that is Open Wide (Datacore) and Atol (Kernel Service, Social Graph).

All KPIs must be gathered in a single monitoring UI, whose (infrastructure and application) architecture is to be designed together by all KPI providers (that is IT Governance, Atol and Open Wide), so that it makes it easy (though not necessarily graphical, e.g. through scripting) to set up alerts, to highlight some KPIs over others, and to add new ones (assuming they are already provided by the infrastructure or application).

Gathering Datacore application-level Key Performance Indicators (KPIs)

TODO NO: rather use the Monitoring Architecture, of which the Kernel AuditLog API is only one way of accessing the Logstash backbone.

To gather application-level Key Performance Indicators about its operations, the Datacore uses the Kernel Audit Log architecture. In practical terms, all Datacore components send to the Kernel Audit Log endpoint logging information about their operations, comprising operation details and especially tags that tell how the operation relates to business concepts (a given Resource, Model type, business domain, application, user account...). Such tags allow, using the Kernel Audit Log architecture (Logstash etc.), filtering, assembling and analyzing operation logs into indicators that are meaningful to scalability, for instance showing that a given business domain is enjoying dangerously increasing success and that actions should be taken to make its resources scale up (such as sharding its Models' MongoDB collections). Those indicators are finally monitored through graph or dashboard visualization and alerts, to which operators should react accordingly.
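
As a purely hypothetical sketch (the endpoint URL and the event's JSON fields below are invented for illustration, not the actual Kernel Audit Log API), emitting such a tagged operation event from Java could look like this:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class AuditLogEmitter {
    // Hypothetical endpoint; the real one is defined by the Kernel Audit Log architecture.
    private static final String AUDIT_LOG_ENDPOINT = "https://kernel.example.org/auditlog/event";

    private static final HttpClient HTTP = HttpClient.newHttpClient();

    public static void emit(String operation, String resourceUri, String modelType, String user) {
        // Hypothetical payload: operation details plus tags linking it to business concepts.
        String event = String.format(
                "{\"operation\":\"%s\",\"resource\":\"%s\","
                        + "\"tags\":{\"modelType\":\"%s\",\"user\":\"%s\"}}",
                operation, resourceUri, modelType, user);
        HttpRequest request = HttpRequest.newBuilder(URI.create(AUDIT_LOG_ENDPOINT))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(event))
                .build();
        // Fire-and-forget: logging must not slow down the Datacore operation itself.
        HTTP.sendAsync(request, HttpResponse.BodyHandlers.discarding());
    }
}
```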

Datacore deployment environments

  • integration
  • technical staging (including performance test)
  • demo
  • production

Datacore minimal VM requirements

For non-production deployment environments, besides scalability and performance testing:

  • Datacore: one VM (for the Java REST API and MongoDB) with 1 CPU, 4 GB RAM, 20 GB HD ext4, network, running Debian stable (wheezy) 64 bits, to be available on Friday 11th

For scalability and performance testing, the minimal production architecture (see below) will be reproduced in technical staging. It will however use minimal VM requirements, to allow pushing the platform to its limits more easily and to experiment with the impact of increasing existing VM specs on Datacore operational usage.

Datacore production VM requirements

To be determined at the end of scalability and performance tests.

Datacore production architecture

TODO schema (Bruxelles February schema) TODO also one additional VM in different datacenter dedicated to backup per shard

VM requirements:

  • three MongoDB Config servers (each residing on a different discrete system)
  • at least 2 shards, each replicated on at least 1 additional VM; so at least 4 VMs. Each VM is as initially specified, save for: 4 CPU, 8 GB RAM, 2x500 GB RAID-1, 1 Gb network (TODO to be discussed and updated). For reliability, in each shard, at least one replica (if possible all) should be on a different discrete system. Note that if a shard has an even member count, an arbiter must be started on a (typically related, see below) application server.
  • Application servers: in front of each shard (to let it serve its data according to geographical repartition), a cluster of at least 2 VMs behind (DNS) load balancing (each also running its own mongos client process), at least one (if possible all) being on a different discrete system. Each VM is as initially specified, save for: 4 CPU, 4 GB RAM, 1 Gb network (TODO to be discussed and updated). And in front of all clusters, routing (possibly itself clustered and fronted by DNS routing) that respects geographical distribution.

Those requirements will be completed step by step as the project goes forward, especially regarding additional deployment environments required beyond the first deployment, and obviously regarding further steps of scaling up.

Deployment industrialization

TODO Puppet, production procedures, both also for Monitoring Architecture

Others

For other topics such as security, elasticity, service levels, see Christophe Blanchot's "OASIS Infrastructure - Work to do" document.

Configs

Integration and test platform

What is available regarding router and load balancing? @Partners @IPGuard

We will first test our architecture using a single shard. At the end, we would like to temporarily add RAM (x10) and perform other tests. @IPGuard, is this possible?

  • Number of Shards : 2

  • Number of Mongo : 6 (Lyon : 3, Genève : 3)

  • Number of Java : 4 (Lyon : 2, Genève : 2)

  • Number of Monitoring: 1 (Lyon)

  • Number of Test Client : 1 (Lyon)

All Linux Debian stable, 64 bits

  • Mongo :

    • RAM : 4G
    • CPU : 1
    • HDD : 50G (Ext4)
  • Config Server: for now, shares a server with another mongo.

  • Query Router (mongos) : On the same server as java.

  • Java :

    • RAM : 4G
    • CPU : 2
    • HDD : 30G
  • Monitoring :

    • CPU : 2
    • RAM : 4G
    • HDD : 20-30G
  • Test Client :

    • CPU : 2
    • RAM : 4G
    • HDD : 20G

September 2014 Prod Config

Same configuration as above but with a single shard. In production, use HDD with High Availability (HA) for mongo. Not required for java.

  • Number of Shards : 1

  • Number of Mongo : 3 (Lyon : 2, Genève : 1)

  • Number of Java : 2 (Lyon : 1, Genève : 1)

  • Number of Monitoring: 1 (Lyon)

All Linux Debian stable, 64 bits

  • Mongo :

    • RAM : 4G
    • CPU : 1
    • HDD : 50G (HA)
    • Ext4
    • Linux 64 bits
  • Config Server: for now, shares a server with another mongo.

  • Query Router (mongos) : On the same server as java.

  • Java :

    • RAM : 4G
    • CPU : 2
    • HDD : 30G
  • Monitoring :

    • CPU : 2
    • RAM : 4G
    • HDD : 20-30G

Prospective (e.g. 2017) Full Scale Prod Config

  • Number of Shards : 55

  • Number of Mongo : 166

  • Number of Java : 110

All Linux Debian stable, 64 bits

  • Mongo :

    • RAM : 50G
    • CPU : 4
    • HDD : 500G (HA)
    • RAID-10
    • Ext4
    • Linux 64 bits
    • Network: 1GB
  • Config Server:

    • RAM : G
    • CPU :
    • Ext4
    • Linux 64 bits
    • Network: 1GB
  • Query Router (mongos) : On the same server as the java.

  • Java :

    • RAM : 6G
    • CPU : 8
    • Network: 1GB
  • Monitoring :

    • CPU : 2
    • RAM : 4G
    • HDD : 20-30G