NetMet is networking tool that allows you to track and analyze network uptime of multi data centers installations
As a cloud provider one must gurantee SLA. For example 99.95% dataplane uptime (~43 seconds downtime per day). This means few things:
- Cloud provider cannot rely on customer generated tickets for downtime measurements.
- Cloud provider needs to be proactive:
- Get alerts within seconds after any downtime occurs.
- Have all required data for debugging in place
- What Data Centers (DC), Availability Zones (AZ) & Servers are effected
- Is it an underlay or overlay network issue
- When the event started and when it stopped
To verify uptime requirements we developed NetMet – a tool that constantly measures connectivity between all servers including those in different availability zones or even different data centers.
Everybody is welcome to contribute to project. We use standard GitHub process with Issues & PR.
The client-server architecture of NetMet was designed with a clear separation of concerns in mind:
- Clients periodically perform connectivity checks between each other
- Clients periodically perform Internet connectivty checks
- Clients send data to server
- Server performs aggregation of data and stores it in ElasticSearch
- Server exposes an API for retrieving aggregated data to facilitate visualization of it.
Run all netmet clients and servers:
- Netmet Server: run few instances of netmet servers under HAproxy or Nginx
- Netmet Client: run 1 client per 1 server that should be monitored
To collect all metrics needed to monitor network of Data Centers use next schema:
- Run few instances of Netmet servers in different regions
- Run 1 instance of Netmet client per 1 server
- Elasticsearch cluster mode
To avoid Netmet downtime use next schema:
- Netmet servers should be run under Nginx/HAproxy/Lbaas (for now)
- Netmet server may use multiple Elasticsearch addresses (no need in HA)
To install NetMet from source you should run next command
pip install . # run it from root directory
After that netmet
command should become available
To run Netmet Server
APP=server NETMET_SERVER_URL="<url:port>" NETMET_OWN_URL="<url:port>" ELASTIC="<url:port>" PORT=5005 netmet
To run Netmet Client
APP=client PORT=5005 netmet
Netmet is meant to be very easy to configuration. All configuration is done via Netmet server API method POST /api/v1/config which in future is going to update installation (remove/add clients), geneate new client configurations and update clients
To configure the Netmet use POST /api/v2/config
cat > config.json <<- EOM
{
"deployment": {
"static": {
"clients": [
{
"az": "test-az",
"dc": "test-dc",
"host": "127.0.0.1",
"ip": "127.0.0.1",
"port": 5001
}
]
}
},
"external": [
{"dest": "8.8.8.8", "period": 1, "protocol": "icmp", "timeout": 0.5}
],
"mesher": {
"full_mesh": {}
}
}
EOM
curl -H "Content-Type: application/json" -X POST -d '@config.json' ${NETMET_SERVER_URL}/api/v2/config
Running test is very easy.
Install tox tool
pip install tox
Run tox
tox # runs all tests
tox -e pep8 # runs only pep8 code style checks
tox -e py27 # runs unit tests using python 2.7