Load-testing Erlang servers for fun and profit
With this project you'll be able to set up, with minor effort, an Erlang TCP server, any number of clients connecting to it, and monitor everything as it happens. This way you can introduce bottlenecks, bugs, or any other issue you want into the server and teach how to analyze the metrics/logs, debug, and resolve the issue.
- Handles client subscribe/unsubscribe.
- Broadcasts messages received from one client to the rest.
- Sends a message to the server every N seconds.
- Receives messages from the server (originally sent by another client).
- Monitoring of clients
- Monitoring of server
- Serve dashboard to visualize metrics
Establishing a connection:
- Client connects to server
- Server responds with a banner: `MRPS server 0.1`
Messages a client can send:
- `SEND<msg>`: the server sends `<msg>` to all the other connected clients. They receive: `Received message: <msg>`
- `COUNT`: the server responds with the number of connected clients
- `EXIT`: closes the connection to the server
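For example, a hypothetical session (assuming the server is reachable on localhost, on the port the Ranch listener binds, and that a second client is already connected) might look like:

```
$ nc localhost 6969
MRPS server 0.1
COUNT
2
SEND hello
Message sent
EXIT
```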
- Terraform 0.11.11
- Ansible 2.7.5
- Scaleway account (this can be changed to any other provider in Terraform configuration)
Set the following environment variables:
- `TF_VAR_organization`: your Scaleway organization ID
- `TF_VAR_token`: your Scaleway API access token, generated by you
- `TF_VAR_ssh_key`: your SSH key to access provisioned servers
- `TF_VAR_region`: (OPTIONAL) the Scaleway region, by default `par1`
- `TF_VAR_count`: (OPTIONAL) the number of client servers to provision, by default 1
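For instance, a shell snippet setting these variables might look like the following (all values are placeholders; substitute your own credentials):

```shell
# Placeholder values -- substitute your own credentials before running
export TF_VAR_organization="your-organization-id"
export TF_VAR_token="your-api-token"
export TF_VAR_ssh_key="ssh-rsa AAAAB3Nza-your-public-key user@host"
export TF_VAR_region="par1"  # optional, this is the default
export TF_VAR_count="1"      # optional, defaults to 1
```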
If you are using (or want) a different provider, please check the corresponding documentation on how to set it up at Terraform Providers.
$> make infra
This will provision your servers using the Terraform configuration. You will be asked to confirm that you want to perform the actions; type `yes` to confirm.
Plan: 8 to add, 0 to change, 0 to destroy.
Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value: yes
Terraform will output the IPs of the provisioned servers:
Outputs:
client_ips = [
192.168.1.208
]
server_ip = 192.168.1.100
monitor_ip = 192.168.1.133
Install everything you need and set up your clients, server, and monitoring by running:
$> make ops
$> make run
This will run Tsung using its configuration.
One of the core strengths of the Erlang programming language, more specifically of the Erlang/OTP runtime system, is its support for concurrency and distribution, making it ideally suited for soft real-time, high-availability applications.
Inspired by this post from the Phoenix team, we decided to build a TCP server to try our hand at the C10k problem, namely to serve and maintain as many connections as possible on commodity hardware. This is our journal of the project.
To build our server we could have rolled our own socket pool using OTP's `gen_tcp` module, but we chose to use Ranch, a minimal, low-latency library from the creator of Cowboy.
Once the Ranch application is started, we can start listening for connections:
start(_StartType, _StartArgs) ->
    ranch:start_listener(mrps, 10, ranch_tcp,
        [{port, 6969}, {max_connections, infinity}],
        mrps_protocol, []),
    mrps_sup:start_link().
Here we start 10 acceptor processes that listen on TCP port `6969` for an unlimited number of connections. `mrps_protocol` is our protocol handler, which defines the logic executed by each connection process.
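The handler module itself is not shown in the post; a minimal sketch, assuming Ranch 1.x's `start_link/4`/`ranch:accept_ack/1` contract and a hypothetical `mrps_register` implementation of the Register behaviour discussed later, might look like:

```erlang
%% Sketch of a Ranch 1.x protocol handler (not the project's exact module).
%% `mrps_register` is a hypothetical Register implementation.
-module(mrps_protocol).
-behaviour(ranch_protocol).
-export([start_link/4]).
-export([init/4]).

start_link(Ref, Socket, Transport, Opts) ->
    Pid = spawn_link(?MODULE, init, [Ref, Socket, Transport, Opts]),
    {ok, Pid}.

init(Ref, Socket, Transport, _Opts) ->
    %% Acknowledge the connection to Ranch before using the socket
    ok = ranch:accept_ack(Ref),
    Register = mrps_register,
    Register:store_client(self()),
    %% Greet the client with the banner, then enter the receive loop
    Transport:send(Socket, <<"MRPS server 0.1\n">>),
    loop(Socket, Transport, Register).
```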
Our server supports some basic commands:
- Accept incoming connections
- Return count of connected clients
- Broadcast messages received from one client to the rest
- Close connection
The core of the protocol implementation is a loop
function that handles communication between client and server:
loop(Socket, Transport, Register) ->
    ok = Transport:setopts(Socket, [{active, once}]),
    receive
        {msg, Message} ->
            Transport:send(Socket, ["Received message:\n", Message]),
            loop(Socket, Transport, Register);
        {tcp, Socket, <<"SEND", Message/binary>>} ->
            Register:for_each(send_msg(Message, self())),
            Transport:send(Socket, <<"Message sent\n">>),
            loop(Socket, Transport, Register);
        {tcp, Socket, <<"COUNT\n">>} ->
            Count = integer_to_binary(Register:count()),
            Transport:send(Socket, [Count, <<"\n">>]),
            loop(Socket, Transport, Register);
        {tcp, Socket, <<"EXIT\n">>} ->
            close(Socket, Transport, Register);
        {tcp, Socket, _Data} ->
            loop(Socket, Transport, Register);
        {tcp_closed, Socket} ->
            close(Socket, Transport, Register);
        {tcp_error, Socket, _Reason} ->
            close(Socket, Transport, Register)
    end.
The `Register` we use here is an abstraction (a behaviour) that maintains a list of connected clients and provides a way to iterate through them. Later, we discuss the different implementations we tried out.
Our server listens for the following commands:
- `SEND<msg>` => the server sends `<msg>` to all the other connected clients. They receive: `Received message: <msg>`
- `COUNT` => the server responds with the number of connected clients
- `EXIT` => closes the connection to the server
To broadcast the `<msg>` to the rest of the clients, we send a message to each connection process:
Client ! {msg, Message}
The message is matched by the `loop` function, which then sends it to the client:
{msg, Message} ->
    Transport:send(Socket, ["Received message:\n", Message]),
    loop(Socket, Transport, Register);
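The `send_msg/2` helper is not shown in the post; a plausible sketch, assuming it returns a closure for `Register:for_each/1` that skips the sender, is:

```erlang
%% Hypothetical helper: returns a fun that forwards Message to a
%% client's connection process, skipping the sending process itself.
send_msg(Message, Self) ->
    fun(Client) when Client =/= Self ->
            Client ! {msg, Message};
       (_Other) ->
            ok
    end.
```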
In order to broadcast messages our application needed a way to store all connected clients. We first created a behaviour that defined the following callbacks:
behaviour_info(callbacks) ->
    [{init, 1},
     {store_client, 1},
     {remove_client, 1},
     {for_each, 1},
     {count, 0}];
behaviour_info(_Other) ->
    undefined.
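On OTP R15 and later, the same contract is more commonly declared with `-callback` attributes; an equivalent sketch (the type specs are our assumption, not taken from the project):

```erlang
%% Equivalent behaviour declaration using -callback attributes;
%% argument and return types are assumed for illustration.
-callback init(Args :: term()) -> term().
-callback store_client(Client :: pid()) -> term().
-callback remove_client(Client :: pid()) -> term().
-callback for_each(Func :: fun((pid()) -> any())) -> term().
-callback count() -> non_neg_integer().
```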
We tested out several alternatives.
A simple yet effective way to store global state comes built into the Erlang system: Erlang Term Storage, or `ets`, a module that provides dynamic tables that store tuples indexed by a key.
Nothing beats `ets` tables for simplicity:
init(_Args) ->
    ets:new(clients, [set, public, named_table]).

store_client(Client) ->
    ets:insert(clients, {Client}).

remove_client(Client) ->
    ets:delete(clients, Client).

for_each(Func) ->
    ets:foldl(
        fun({Client}, _Acc) -> Func(Client) end,
        [],
        clients
    ).

count() ->
    ets:info(clients, size).
Here the `init` callback returns a table named `clients` with `public` access, so that connection processes can register each client.
We tested out the different types of `ets` tables:

- `set`: each key must be unique
- `ordered_set`: keys are unique and are stored/accessed in order
- `bag`: keys can repeat, but the tuples to store must be unique
- `duplicate_bag`: keys can repeat and identical tuples can be stored
We found no meaningful difference in performance between the table types, although it's been reported that `duplicate_bag` (since it enforces fewer constraints) might offer some improvement.
The rest of the API is self-explanatory.
Erlang provides another solution in the pg2 module, which implements process groups. It handles distributed named process groups, where messages can be sent to one, some, or all group members.
Clients are added to a global group, and then removed automatically when their connection process ends.
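A pg2-backed implementation of the behaviour could look roughly like the following (note that `pg2` was removed in OTP 24 in favour of `pg`; this sketch assumes an older release):

```erlang
%% Sketch of a pg2-backed Register. pg2 monitors group members, so
%% clients are dropped from the group when their process dies.
init(_Args) ->
    pg2:create(clients).

store_client(Client) ->
    pg2:join(clients, Client).

remove_client(Client) ->
    pg2:leave(clients, Client).

for_each(Func) ->
    lists:foreach(Func, pg2:get_members(clients)).

count() ->
    length(pg2:get_members(clients)).
```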
These are two third-party libraries that provide a global process registry, with strong support for distribution and concurrent reads. They are well tested and maintained; perhaps a bit excessive for such a simple project. An extensive comparison of the different process registries can be found here.
The corresponding behaviour implementations are relatively straightforward.
One of our goals was to have a one-click solution for creating infrastructure, installing needed software and deploying code to run our server.
To this end, we used Terraform to create and manage the needed cloud resources, and Ansible as a provisioning and configuration management tool.
From the official documentation:
Terraform is a tool for building, changing, and versioning infrastructure safely and efficiently. Terraform can manage existing and popular service providers as well as custom in-house solutions.
As stated, it is a tool to create, manage and tear down cloud resources. It features a declarative syntax with which we describe the end state of our infrastructure, and the needed operations (creation, destruction, scaling) are handled automatically.
For our project, we used Scaleway, a cloud provider that offers ARMv7 machines.
Here's how we create a server:
resource "scaleway_server" "server" {
  name       = "load-server"
  image      = "31dfef82-9b45-4b01-9656-031617f36599"
  type       = "C1"
  depends_on = ["scaleway_ssh_key.ssh_key"]
}
Since we wanted the ability to create clients dynamically, we can pass the number of clients as a parameter to Terraform:
$> terraform apply -var "count=5"
And here's how we create the client servers:
resource "scaleway_server" "clients" {
  count      = "${var.count}"
  name       = "load-client-${count.index}"
  image      = "31dfef82-9b45-4b01-9656-031617f36599"
  type       = "C1"
  depends_on = ["scaleway_server.server"]
}
Terraform provides a way to output generated values (IPs in our case) so we can use them with our provisioning tool. Here's how we populate Ansible's inventory:
data "template_file" "ansible_inventory_tpl" {
  template = "${file("devops/inventory.tpl")}"

  vars {
    server_ip  = "${scaleway_ip.server_ip.ip}"
    client_ips = "${join("\n", scaleway_ip.client_ips.*.ip)}"
    monitor_ip = "${scaleway_ip.monitor_ip.ip}"
  }
}
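The `inventory.tpl` template itself is not shown; assuming Ansible's INI inventory format and hypothetical group names, it could look roughly like:

```
[server]
${server_ip}

[clients]
${client_ips}

[monitor]
${monitor_ip}
```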
Among the many alternatives in the field (Chef, Puppet, Salt) we chose Ansible, an agentless provisioning and configuration management tool that leverages Python and ssh.
The syntax is quite easy to grasp, consisting of `yaml` files (playbooks) and an inventory file that defines the hosts to provision/configure.
Playbooks are run against said hosts and take care of installing packages, creating users, configuring the system, etc.
Here's a task that installs Tsung on clients:
---
- name: Install Tsung on clients
  hosts: clients
  remote_user: root
  tasks:
    - name: install tsung
      apt:
        name: tsung
        state: latest
It is a comprehensive tool, allowing for a variety of system administration tasks that go above and beyond what we needed for this project.
In order to gain insights from our tests, we also set up a monitoring system. In particular, we used Prometheus, an open-source systems monitoring and alerting toolkit.
It consists of multiple components:
- the main Prometheus server which scrapes and stores time series data
- special-purpose exporters
- an alertmanager
- client libraries for instrumenting application code
Each machine in our project collects metrics through the official node_exporter and the Erlang client library.
The data is then fed to the visualization and analytics platform Grafana.
With our system already in place, we began testing our server with Tsung, a distributed load-testing tool written in Erlang.
Tsung requires testing specifications written in XML that describe the number of users to create, and how those users communicate with the server.
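As an illustration only (not the project's actual configuration), a minimal Tsung specification using the raw TCP plugin against the example server IP from the Terraform output might look like:

```xml
<?xml version="1.0"?>
<!DOCTYPE tsung SYSTEM "/usr/share/tsung/tsung-1.0.dtd">
<tsung loglevel="notice">
  <clients>
    <client host="localhost" use_controller_vm="true"/>
  </clients>
  <servers>
    <!-- Server IP and port assumed from the examples above -->
    <server host="192.168.1.100" port="6969" type="tcp"/>
  </servers>
  <load>
    <!-- Spawn 50 new users per second for one minute -->
    <arrivalphase phase="1" duration="1" unit="minute">
      <users arrivalrate="50" unit="second"/>
    </arrivalphase>
  </load>
  <sessions>
    <session name="mrps" probability="100" type="ts_raw">
      <request> <raw data="SEND hello" ack="local"></raw> </request>
    </session>
  </sessions>
</tsung>
```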