Expose stats through the ReQL admin API #2885

Closed
timmaxw opened this issue Aug 11, 2014 · 40 comments

@timmaxw
Member

timmaxw commented Aug 11, 2014

Proposed API: Introduce two new pseudo-tables rethinkdb.table_stats and rethinkdb.server_stats. The first has one document per table, and the second has one document per server. Each has a bunch of fields with statistics about the table or server. We could also have nested sub-documents to organize the stats.
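
For illustration, these would read like any other ReQL table. A rough sketch using the JavaScript driver (table names as proposed above; conn and tableId are placeholders):

r.db('rethinkdb').table('server_stats').run(conn, callback);
r.db('rethinkdb').table('table_stats').get(tableId).run(conn, callback);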

One possible problem is that stats looks a lot like status.

@timmaxw timmaxw added this to the reql-admin milestone Aug 11, 2014
@timmaxw
Member Author

timmaxw commented Aug 11, 2014

In addition to the perfmon stats, we could also expose the number of documents in each shard (which comes from performing a distribution query).

@Tryneus
Member

Tryneus commented Oct 9, 2014

To get the ball rolling on a stats proposal, I'll first give an example of what stats we currently provide, and then discuss which ones I think are worthwhile to have in the ReQL stats tables. I am trying to keep modification of existing perfmons to a minimum; new stats can be handled later.


Server stats:

{
  <table id>: <server-specific table stats>,
  "active_coroutines": "27",
  "allocated_coroutines": "162",
  "auth_metadata": <auth metadata stats>,
  "connectivity": {
    <peer id>: {
      "bytes_sent": <avg, min, max, per_sec>
    },
  },
  "eventloop": {
    "active_count": "25",
    "recent_duration": <avg, min, max, per_sec>
    "total": "228389"
  },
  "metadata": <metadata stats>,
  "proc": {
    "pid": "5897",
    "timestamp": "1970-03-03T14:38:37.693120425",
    "uptime": "108",
    "version": "1.15.1-131-g03bdaf-dirty"
  },
  "query_language": {
    "ops_running": "0"
  },
  "sys": {
    "global_disk_space_free": "141864062976",
    "global_disk_space_total": "239939149824",
    "global_disk_space_used": "98075086848"
  }
}
  • metadata and auth_metadata are fairly useless as far as stats go; high performance is not nearly as critical on these structures, and the stats provided are fairly inscrutable to an end user
  • The info in proc is not so much a stat value as a status value and should be available in server_status or not at all
  • connectivity is rather useless as it is currently given - this shows the stats by peer ID, which will change when servers are restarted, and can be very hard for a user to decipher
  • It should be noted that /ajax/stat returned all values as strings, even if they were integers or floats; we should fix this in ReQL stats.
  • The eventloop stats seem pretty meaningless, so I don't see a reason to include them

Proposed format for a server_stats row:

{
  "id": <UUID>, // The ID of the requested server
  "name": <STRING>, // The name of the requested server
  "coroutines": {
    "active": <NUMBER>,
    "allocated": <NUMBER>
  },
  "query_language": {
    "ops_running": <NUMBER>
  },
  "disk": {
    "used_bytes": <NUMBER>,
    "free_bytes": <NUMBER>,
    "total_bytes": <NUMBER>
  }
}

Table stats (per server):

{
  "regions": {
    <role_1>...<role_n>: <role stats>
  },
  "serializers": {
  "disk": {},
  "serializer": {
    "serializer_block_reads": {
      "active_count": "0",
      "recent_duration": <avg, min, max, per_sec>,
      "total": "43"
    },
    "serializer_block_writes": "56",
    "serializer_bytes_in_use": "12582912",
    "serializer_data_extents": "1",
    "serializer_data_extents_allocated": "0",
    "serializer_data_extents_gced": "0",
    "serializer_extents_in_use": "6",
    "serializer_index_reads": "10",
    "serializer_index_writes": {
      "active_count": "0",
      "recent_duration": <avg, min, max, per_sec>,
      "total": "40"
    },
    "serializer_index_writes_size": <avg, min, max>,
    "serializer_lba_extents": "4",
    "serializer_lba_gcs": "0",
    "serializer_old_garbage_block_bytes": "0",
    "serializer_old_total_block_bytes": "0"
  },
  "shard_0"..."shard_7": {
    "btree-primary": {
      "keys_read": "0.00000000",
      "keys_set": "0.00000000",
      "total_keys_read": "0",
      "total_keys_set": "6"
    },
    <btree-index-x>...<btree-index-y>: {
      "keys_read": "0.00000000",
      "keys_set": "0.00000000",
      "total_keys_read": "0",
      "total_keys_set": "6"
    },
    "cache": {}
  }
}
  • The cache and disk fields never contain any child perfmons, so there is no reason to include those
  • Since these stats are collected per-server, we could present them at server level, cluster level, or both.
    • Because stats are not persistent, shutting down a server would result in strange-looking behavior in cluster-level stats.
  • Using these stats, it would be impossible to present them per real keyspace region, since these hash shards do not line up with real keyspace regions. Showing stats per hash shard therefore needlessly complicates things, and they should be aggregated into one object.
  • The role stats may not be worth including. Their existence depends on the individual roles of each server in the blueprint, and the only stat I've seen is the broadcast_queue_count of be_primary roles.
  • In serializer, we provide stats for both block reads and index reads - is it actually useful to have both?

Proposed format for a table_stats row:

{
  "servers": [{
    "id": <UUID>, // The ID of the server these stats are from
    "name": <STRING>, // The name of the server these stats are from
    "indexes": {
      <name>: { // The name of the index
        "reads_per_sec": <NUMBER>
        "writes_per_sec": <NUMBER>
        "total_reads": <NUMBER>
        "total_writes": <NUMBER>
      }
    },
    "serializer": {
      "file": {
        "total_bytes": <NUMBER>,
        "data_bytes": <NUMBER>,
        "lba_bytes": <NUMBER>
      }, 
      "reads": {
        "active": <NUMBER>,
        "per_sec": <NUMBER>,
        "total": <NUMBER>
      },
      "writes":
        "active": <NUMBER>,
        "per_sec": <NUMBER>,
        "total": <NUMBER>
    }
  }, ... ]
}

The available interface for these tables is nothing special; they should work just like the existing table_config or server_config tables. Under the hood we will still be doing cross-cluster stats requests, just like with the old /ajax/stat.

Final thoughts:

  • Full perfmons may still be accessible through a debug table.
  • We will be doing some math on the perfmon data to get the serializer.file info for table_stats, but I think it is much more user-friendly than giving extent counts.
  • The current perfmons do not give us the name of the primary index - passing this information down might turn out to be tricky; I haven't looked into it yet.
  • I think the server_stats query_language field should be extended in the future, there's a lot of stuff we could put in there that users would be excited about seeing (e.g. Perfmon for number of open client connections #2989).
  • Giving the table_stats servers field as an array kind of sucks, but I think it's better than a dict indexed by uuid (really scary to new users) or by name (possibility of collisions). If we can make name collisions impossible, I might be ok with a dict indexed by server name.
  • It would be nice if the table_stats index reads/writes were formatted the same as the serializer reads/writes, with active, per_sec, and total, but that would require reorganizing the perfmons a bit, I'm not sure if we're trying to avoid that.
  • I am open to any renaming, including, or excluding of these fields, please discuss.

@timmaxw
Member Author

timmaxw commented Oct 10, 2014

I agree that this project should be done one step at a time, but I think we should also use this thread to discuss what we want the stats to look like when we ship reql_admin. Here's my "wish list":

  • For each server: CPU usage, memory usage, disk space usage, disk IO rate, network IO rate.

  • Ideally we would show how much of these resources are being used by RethinkDB, how much are used by non-RethinkDB processes/files on the same server, and (for memory and disk space) how much is available total.

  • We could also break down RethinkDB's share of the resources into more detailed categories. CPU usage can be broken down by which activity is consuming the CPU, memory usage can be broken into "general" and per-table, disk space and disk IO can be broken down per table, and network IO can be broken down by table, activity, and/or which server we're communicating with. Probably we don't want to do all of that. The most important things are probably per-table disk space and memory usage.

  • We could use a format something like this to report resource usage:

    "memory": {
        "rethinkdb": {
            "tables": [
                {"name": ..., "db": ..., "usage": ...}
            ],
            "queries": ...,
            "other": ...,
            "total": ...,
        },
        "non_rethinkdb": ...,
        "free": ...,
        "total": ...
    }

    On the other hand, this might be overkill.

  • Inter-server network latencies would be nice, but maybe a pain to implement.

  • We'd have to think about how to handle multiple disks. One option is to report statistics for each disk. This especially makes sense for disk usage; otherwise it would be hard to tell the difference between two 50% full disks and two disks where one is nearly full and the other nearly empty.

  • For each server: an indicator if the server is swapping to disk.

  • For each table: QPS and 50th/90th/99th percentile latencies for each of point reads, batch reads, and inserts.

  • For each table: the number of rows.

  • QPS and 50th/90th/99th percentile latencies for whole queries (which can touch zero or multiple tables). I'm not sure how to report these. This is probably best as a whole-cluster statistic; maybe we should have a rethinkdb.cluster_stats table, like the rethinkdb.cluster_config table. Or we could break it down by server, but I feel like the aggregated numbers are more useful.

@timmaxw
Member Author

timmaxw commented Oct 10, 2014

/cc @coffeemug; I think this is still on the list of things you care about.

@coffeemug
Contributor

I do care about this (and all other user-facing issues in reql-admin, which are pretty much all of them 😄). I'll comment on the format details next week.

@Tryneus
Member

Tryneus commented Oct 10, 2014

Good wishlist. I think a few things are infeasible, but most can be incorporated. A few comments:

Ideally we would show how much of these resources are being used by RethinkDB, how much are used by non-RethinkDB processes/files on the same server, and (for memory and disk space) how much is available total.

I don't think we should be tracking usage by things other than RethinkDB. These stats should be for service-monitoring, not server-monitoring. If users want that information, they should use real server monitoring tools, which we shouldn't be trying to compete with.

CPU usage can be broken down by which activity is consuming the CPU.

I think this would be prohibitively expensive to track, and would involve a lot of work, but perhaps it could be done.

We'd have to think about how to handle multiple disks.

Do we officially support multiple disks on a single server? I can imagine a user could use a workaround to make this work, but I think we should just report the disk used by the rethinkdb_data directory.

50th/90th/99th percentile latencies

Is this feasible to calculate in a moving window, or would this be for the entire lifetime of the cluster? In particular, I don't think it's feasible to give this at a cluster level unless we're averaging these values across the cluster, which is incorrect/misleading. As for giving these values for individual tables, it would be pretty difficult to categorize queries in that way, and we're probably better off giving per-server statistics like this.


Here are updated proposals for the stats tables. I tried to incorporate as much as I could. This is a wishlist at the moment, but it should still be practical.

server_stats

  • Added memory, cpu, and network fields
  • Added disk usage per-table
  • memory.tables[n].used_bytes would be the cache size in bytes for that table on the given server
  • disk.tables[n].used_bytes would be the size of the table file on the given server
  • Renamed ops_running to active_queries and added queries_per_sec, total_queries, and query_duration_ms
{
  "id": <UUID>,
  "name": <STRING>,
  "coroutines": {
    "active": <NUMBER>,
    "allocated": <NUMBER>
  },
  "query_language": {
    "queries_per_sec": <NUMBER>,
    "active_queries": <NUMBER>,
    "total_queries": <NUMBER>,
    "query_duration_ms": {
      "50_percentile": <NUMBER>,
      "90_percentile": <NUMBER>,
      "99_percentile": <NUMBER>
    }
  },
  "disk": {
    "tables": [ {
      "db": <STRING>,
      "table": <STRING>,
      "table_id": <STRING>,
      "used_bytes": <NUMBER>,
      "reads_per_sec": <NUMBER>,
      "writes_per_sec": <NUMBER> }, ...
    ],
    "used_bytes": <NUMBER>,
    "free_bytes": <NUMBER>,
    "total_bytes": <NUMBER>,
    "reads_per_sec": <NUMBER>,
    "writes_per_sec": <NUMBER>
  },
  "memory": {
    "tables": [ {
      "db": <STRING>,
      "table": <STRING>,
      "table_id": <UUID>,
      "used_bytes": <NUMBER> }, ...
    ],
    "used_bytes": <NUMBER>,
    "free_bytes": <NUMBER>,
    "total_bytes": <NUMBER>,
    "active_swap": <BOOL>
  },
  "cpu": {
    "cores": [ {
      "usage": <NUMBER>
    } ],
    "usage": <NUMBER>
  },
  "network": {
    "cluster_latency_ms": <NUMBER>,
    "intracluster": {
      "sent_bytes": <NUMBER>,
      "received_bytes": <NUMBER>,
      "active_connections": <NUMBER>,
      "total_connections": <NUMBER>
    },
    "clients": {
      "sent_bytes": <NUMBER>,
      "received_bytes": <NUMBER>,
      "active_connections": <NUMBER>,
      "total_connections": <NUMBER>
    }
  }
}

table_stats

  • Combined all per-server stats into one representation - this means we won't track totals to avoid discontinuities when servers leave/rejoin the cluster. Now table disk usage will be tracked per-server in the server stats.
  • Each index is given in the indexes dict by name, since they are guaranteed to be unique
  • Added number of reads/writes active on a given index
  • Added the count of rows in a given index - for the primary key, this will be equivalent to table.count
{
  "id": <UUID>,
  "name": <STRING>,
  "primary_key": <STRING>
  "indexes": {
    <name>: {
      "rows": <NUMBER>,
      "reads_per_sec": <NUMBER>,
      "reads_active": <NUMBER>,
      "writes_per_sec": <NUMBER>,
      "writes_active": <NUMBER>
    }
  },
  "disk": {
    "reads_per_sec": <NUMBER>,
    "reads_active": <NUMBER>,
    "writes_per_sec": <NUMBER>,
    "writes_active": <NUMBER>
  }
}

@Tryneus
Member

Tryneus commented Oct 10, 2014

Forgot to fill in the cpu field, my previous post has been edited.

@wojons
Contributor

wojons commented Oct 11, 2014

This is a lot of stuff you have here; I will try to cover what I have learned from working with the existing API.

  • First off, if a user has a lot of tables on the system, returning one super-massive document with all the stats that server has for every table may not be the best method. I am sure it's easiest if the primary key is just the server id, but then in ReQL you would have to split the data server-side or load the whole thing into your code, which could use more memory than both parties want for something as simple as getting table stats. I guess I should say it should be a document per table, with all the servers that have stats for it in that.
  • I agree with @Tryneus that there is no need to see how much CPU is being used by non-RethinkDB processes. I also don't think there is a simple way to see how much of which core was used by RethinkDB without a little black magic, which would be different per OS.
  • I also agree with @Tryneus that disk stats are only needed for wherever the rethinkdb_data directory is located.
  • @Tryneus, all stats should always be running totals since boot. I can't tell you how many times I had to write a monitoring plugin for something where, because the stats are not a running total, you are pretty much graphing guesswork snapshots, which can help but are not true figures. But I do think that, for debugging reasons, there should be an API call or command that resets the running totals back to 0; this way you can get the most recent few minutes if there were weeks of great performance and 10 minutes of bad ones.
  • I think that intra-cluster traffic should be broken down by server_id, maybe with overall totals as well. It's cool to say I sent X amount of data over the cluster, but that does not tell me if there is a weird table backfill always happening against a single node and that is why I am sending so much traffic.
  • I noticed that some stats related to extents, index blocks, and the garbage collector were removed from the table stats; I find those all super useful for knowing what is going on per table. It also helps show how some insert patterns affect different things.

Server stats proposal:

{
  "id": <UUID>,
  "name": <STRING>,
  "coroutines": {
    "active": <NUMBER>,
    "allocated": <NUMBER>
  },
  "query_language": {
    "queries_per_sec": <NUMBER>,
    "active_queries": <NUMBER>,
    "total_queries": <NUMBER>,
    "query_duration_ms": {
      "50_percentile": <NUMBER>,
      "90_percentile": <NUMBER>,
      "99_percentile": <NUMBER>
    }
  },
  "disk": {
    "tables": 
      <table_id>: {
        "db": <STRING>,
        "table": <STRING>,
        "table_id": <STRING>,
        "used_bytes": <NUMBER>,
        "reads_per_sec": <NUMBER>,
        "writes_per_sec": <NUMBER> 
      }, ...
    },
    "used_bytes": <NUMBER>,
    "free_bytes": <NUMBER>,
    "total_bytes": <NUMBER>,
    "reads_per_sec": <NUMBER>,
    "writes_per_sec": <NUMBER>
  },
  "memory": {
    "tables": 
     <table_id> : {
      "db": <STRING>,
      "table": <STRING>,
      "table_id": <UUID>,
      "used_bytes": <NUMBER> 
     }, ...
    },
    "used_bytes": <NUMBER>,
    "free_bytes": <NUMBER>,
    "total_bytes": <NUMBER>,
    "active_swap": <BOOL>
  },
  "cpu": {
   "user": <NUMBER>
   "system": <NUMBER>
    "total": <NUMBER>
  },
  "network": {
    "cluster_latency_ms": <NUMBER>,
    "intracluster": {
      <SERVER_ID> : {
       "sent_bytes": <NUMBER>,
       "received_bytes": <NUMBER>,
       "active_connections": <NUMBER>,
       "total_connections": <NUMBER>
     },
     "sent_bytes": <NUMBER>,
     "received_bytes": <NUMBER>,
     "active_connections" <NUMBER>
     "total_connections": <NUMBER>
    },
    "clients": {
      "sent_bytes": <NUMBER>,
      "received_bytes": <NUMBER>,
      "active_connections": <NUMBER>,
      "total_connections": <NUMBER>
    }
  }
}

@neumino
Member

neumino commented Oct 12, 2014

A few thoughts/questions:

  • What is network.cluster_latency_ms? The average latency to other servers in the cluster?
  • We used to have disk/cpu/memory stats a long time ago (before 1.2), but we took it out because it was a bit hard to properly implement for all filesystems.
  • More stats mean more intra-cluster traffic. Last time I talked with @danielmewes (or maybe @Tryneus?), the stats were used only by the web interface, so we may want to avoid adding too many stats to prevent heartbeat timeouts.
  • Proposal: averaging multiple values #2078 could make things nicer to compute stats per table.
  • Note that getting the number of disk reads/writes for all tables at the same time is a bit cumbersome (but I don't have a good solution).
  • Stats can currently timeout, and we report it in /ajax/machines/timeout. We should somehow be able to report timeouts. What happens if a server is down? Do we report it with special values, or just drop it?
  • Should we keep a short history? The time at which the stats were computed? Let users retrieve the stats for last second/minute/hour/day?

@Tryneus
Member

Tryneus commented Oct 13, 2014

@wojons and @neumino, thanks for bringing up these concerns, I'll try to address them here:

  • Having running totals of stats is fine in the server_stats table, but if we expose running totals in table_stats, there will be discontinuities if a server ever leaves/rejoins the cluster. For this reason, I think table stats should only include active counts and per-second rates (summed from all servers), while server stats can also include totals since the launch of the process.
  • A lot of these stats are wishful thinking and may be too difficult to implement in a platform-agnostic manner. That said, we are deciding on the structure we want stats to have so we can add these things later (in reql-admin-polish). The cpu, memory, and network stats fall under this.

[...] for debugging reasons there should be a API call or command that restarts the running total's and sets them back to 0 [...]

If we want to provide a way to clear running totals, I propose allowing a delete of the table stats, like r.db('rethinkdb').table('server_stats').get(<SERVER>).delete().

I was noticing that there are some stats that were in the table stats that were removed that related to extends, index blocks garbage collector, I find those all super useful stats to know what is going on per table.

We can add these back in, I omitted them to avoid giving too much information to users. If we do give these kinds of stats to users, they would probably need to be per-server to avoid the discontinuities mentioned above.

What is network.cluster_latency_ms? The average latency to other servers in the cluster?

Yes. We currently don't have tracking for this, but it shouldn't be too hard to get it. This is wishlist stuff and will not be in the first draft.

More stats mean more intra cluster traffic.

My biggest concern here is any stat that is both per-server and per-table. For example, the server_stats.disk.tables and server_stats.memory.tables arrays would bloat up stats quite a bit on clusters with a large number of tables. We may take some measures to make stats requests more efficient, such as only returning the subset of stats that are requested rather than the entire row. This would require some more work on the artificial_table_t interface. In any case, the Web UI should be able to pluck/reduce stats to just the information needed and keep browser <=> cluster traffic to a minimum.
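
For example, the Web UI could pluck just the disk totals instead of fetching entire rows. A sketch using the JavaScript driver, against the server_stats layout proposed above:

r.db('rethinkdb').table('server_stats')
  .pluck('id', 'name', {disk: ['used_bytes', 'free_bytes', 'total_bytes']})
  .run(conn, callback);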

We should somehow be able to report timeouts.

We could have a timeout field in the server stats row with a BOOL value: if the request timed out, it is set to true (and the rest of the stats are not included); otherwise it is false. We shouldn't throw an error here, and I don't see another way to communicate a timeout to the user other than inline with the data.

What happens if a server is down? Do we report it with special values, or just drop it?

With how we're handling server membership in the cluster, I don't think we will have any special handling for a down server; it will just be omitted from the results.

Should we keep a short history? The time at which the stats were computed? Let users retrieve the stats for last second/minute/hour/day?

I don't think this is feasible to track. Existing perfmons don't give us a way to store history, and the memory requirements would be difficult to manage. Stats are collected on-demand, though they are continuously 'computed'.

Note that getting the number of disk reads/writes for all tables at the same time is a bit cumbersome.

I agree this results in a rather nasty query, and that #2708 would simplify this a lot. Note that getting the rows read/written (rather than disk reads/writes) would be much simpler.

@danielmewes
Member

Just a couple of points:

Having disk writes/s and reads/s is not a useful metric in general.
Instead I think we should have two things:

  • bytes written/s and bytes read/s
  • index writes/s

That would be a per-table property.

@Tryneus you added a count value for each index. We don't currently have a way to compute that efficiently. Instead we should include the approximated number of documents for each shard (as we get from get_btree_key_distribution()).

I don't think we should include disk i/o and memory usage in the table_stats table at all. I think almost always you want to know how much disk/memory a given table is using on a specific server. If you really need to know the accumulated (or average) resource usage of a table, you can still easily write a reduce query over the server_stats table.
(your latest proposal still had the disk field in the table_stats).
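
For example, the accumulated disk usage of one table could be summed over the server_stats documents. A sketch against the layout proposed earlier in this thread (tableId is a placeholder):

r.db('rethinkdb').table('server_stats')
  .concatMap(r.row('disk')('tables'))
  .filter({table_id: tableId})
  .sum('used_bytes')
  .run(conn, callback);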

cache perfmons will be added again with #2125. But there is no reason to have an empty cache field as long as that's not done.
Generally I think cache stats would be more relevant than trying to add stats which are more about system-monitoring and not trivial to get right (like CPU usage or intra-cluster network latency). The latter can be obtained through other tools, stats about the cache cannot.

Finally, (@wojons) percentiles for the perfmons that measure timings would be really cool. I think that might be the next step, but shouldn't be in the first version.

@Tryneus
Member

Tryneus commented Oct 22, 2014

@coffeemug, could you make a decision on the format of table_stats and server_stats? I think the idea is we want a format that we can grow into when adding more perfmons later, but for the 1.16 release, we would at most add a few perfmons (e.g. queries per second, query duration, cache size per table).

@danielmewes danielmewes modified the milestones: reql-admin, reql-admin-polish Oct 23, 2014
@danielmewes
Member

As to which perfmons we might want to hide from the table:
@wojons and I compiled this table when he was implementing RethinkDB monitoring for ElasticTrace: https://docs.google.com/document/d/1RjfX8D9aQVKWQKUHnqY2uymf-kSGapLmQy_sGCxkgVg/edit?usp=sharing

Note that some perfmons are marked as "internal metric" (or their description uses the word obscure or similar). These are likely the ones we want to skip.

I think @wojons is using most of the remaining ones for his monitoring solution, so we should not remove those unless there's a good reason for doing so.

@danielmewes
Member

The perfmons in the serializer that have extents as their unit should probably all be in bytes instead.

@coffeemug
Contributor

With respect to which stats to include, I'd like to defer to @danielmewes and @wojons -- they seem to understand what is and isn't needed really well and have done a lot more research than I have (especially @wojons, since he actually built a RethinkDB monitoring product and was responsible for operating infrastructures before).

My only guidance is to include as few metrics as possible, give them good names, and clearly include units in the names in a consistent way to avoid any confusion.

However, I don't like the idea of separate table_stats and server_stats tables. I think we should have a single table, stats. Whenever possible we'd add a stat per table per server and include db_name/table_name/server_name in the document. For stats that only make sense per server but not per table we'd not include db_name/table_name in the document, and for stats that only make sense per table but not per server we'd not include server_name in the document.

This way the user can use group and aggregations to slice data any way they want (per server, per table, per table/server, per cluster, per set of servers in a tag, etc.) I think that would be dramatically better than including two tables.
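
For instance, per-table read rates could then be computed with a plain group/sum over the single stats table. A sketch using the JavaScript driver (the table name stats follows the suggestion above; reads_per_sec is a hypothetical field):

r.db('rethinkdb').table('stats')
  .filter(r.row.hasFields('table_name'))   // keep only entries with a table component
  .group('db_name', 'table_name')
  .sum('reads_per_sec')
  .run(conn, callback);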

@danielmewes
Member

I'll go through this again and prepare a full proposal for which perfmons I think we should have in v1.

@danielmewes
Member

Here's a proposal adapted from @Tryneus's earlier one. It describes the entries in a unified stats table:

one per server

{
  server_id: <UUID>,
  server_name: <STRING>,
  coroutines: {
    active: <NUMBER>,
    allocated: <NUMBER>
  },
  query_server: {
    queries_per_sec: <NUMBER>,
    queries_total: <NUMBER>,
    queries_active: <NUMBER>
  },
  network: {
    intracluster: {
      sent_bytes_per_sec: <NUMBER>,
      sent_bytes_total: <NUMBER>,
      received_bytes_per_sec: <NUMBER>,
      received_bytes_total: <NUMBER>,
      open_connections: <NUMBER>
    },
    clients: {
      sent_bytes_per_sec: <NUMBER>,
      sent_bytes_total: <NUMBER>,
      received_bytes_per_sec: <NUMBER>,
      received_bytes_total: <NUMBER>,
      open_connections: <NUMBER>
    }
  },
  started_at: <TIME>
}

one per table/server pair

{
  server_id: <UUID>,
  server_name: <STRING>,
  table_id: <UUID>,
  db_name: <STRING>,
  table_name: <STRING>,
  indexes: {
    <name>: {
      reads_per_sec: <NUMBER>,
      reads_total: <NUMBER>,
      writes_per_sec: <NUMBER>,
      writes_total: <NUMBER>
    },
    ...
  },
  disk: {
    read_bytes_per_sec: <NUMBER>,       // currently serializer_block_reads, must be modified to report bytes
    read_bytes_total: <NUMBER>,
    written_bytes_per_sec: <NUMBER>,    // currently serializer_block_writes, must be modified to report bytes
    written_bytes_total: <NUMBER>,
    commits_per_sec: <NUMBER>,          // currently serializer_index_writes
    commits_total: <NUMBER>,
    space_usage: {
      lba_bytes: <NUMBER>,              // currently serializer_lba_extents, multiply by extent size
      data_bytes: <NUMBER>,             // approximate as follows: take serializer_data_extents, multiply by extent size, subtract serializer_old_garbage_block_bytes
      garbage_bytes: <NUMBER>           // currently serializer_old_garbage_block_bytes
    }
  },
  cache: {
    in_use_bytes: <NUMBER>              // Should be straightforward to add
  }
}

one per server that timed out

{
    server_id: <UUID>,
    server_name: <STRING>,
    error: "Timed out. Unable to retrieve stats."
}

All stats have totals, and I think this is ok because they are all reported per server. I've included started_at in the server stats so you can find out if a server has been restarted.

Most of the stats correspond directly to existing perfmons which should make the implementation relatively easy. The exceptions are the cache.in_use_bytes perfmon and the network.*.open_connections one. I think both of them are very useful, and probably worth adding.

@wojons you also asked about the GC stats. I've omitted them for now, because I think we should improve them first. For example, we have a stat with the number of extents GCed by the data GC, but we don't know how much of those extents was actually garbage, which makes it confusing. Instead we should first implement some more meaningful stats such as gc_bytes_reclaimed.
Running GC processes will be listed in the jobs table (see #3115), so you can check that if you want to find out what's going on at the moment (though there's no running total, sorry).

Remarks? Suggestions? Things that are missing which we absolutely need in the first version?

@deontologician
Contributor

+1 for the composite key idea

@coffeemug
Contributor

I think the reads/writes /sec should be computed by accumulating over all tables with a filter on the server. Does that sound reasonable?

Ah, yes.

I don't have an opinion on what a good pkey should be, I just saw that it wasn't specified. A composite key seems reasonable.

@danielmewes
Member

@timmaxw and I talked about this more offline.

  • We removed a few entries that were not obviously useful. Those will still be available in a debug table.
  • We renamed some stats.
  • We restructured the reads and writes per second. One problem that came up was that if we only report written documents per server/table pair, there is no reliable way for computing the writes per table without counting duplicates if the table is replicated. There now is a new per table document that lists the reads/writes per table (in contrast to per table/server pair). For consistency and simplicity of use we also added the reads/writes to the per-server document, and added another single document that lists the total reads/writes in the cluster (like what you see on the Dashboard of the web UI).
  • The primary key is now an array starting with the string "cluster"|"server"|"table"|"table_server", followed by the UUID of the corresponding server and/or table.

Here's the new proposal:
one globally

{
  id: ["cluster"],
  query_engine: {
    queries_per_sec: <NUMBER>,
    read_docs_per_sec: <NUMBER>,
    written_docs_per_sec: <NUMBER>
  }
}

one per server

{
  id: ["server", <UUID>],
  server_id: <UUID>,
  server_name: <STRING>,
  query_engine: {
    queries_per_sec: <NUMBER>,
    queries_total: <NUMBER>,
    read_docs_per_sec: <NUMBER>,
    read_docs_total: <NUMBER>,
    written_docs_per_sec: <NUMBER>,
    written_docs_total: <NUMBER>,
    client_connections: <NUMBER>
  },
}

one per table

{
  id: ["table", <UUID>],
  table_id: <UUID>,
  db_id: <UUID>,
  db_name: <STRING>,
  table_name: <STRING>,

  query_engine: {
    read_docs_per_sec: <NUMBER>,
    written_docs_per_sec: <NUMBER>
  }
}

one per table/server pair

{
  id: ["table_server", <UUID>, <UUID>]  // table_id, server_id
  server_id: <UUID>,
  server_name: <STRING>,
  table_id: <UUID>,
  db_id: <UUID>,
  db_name: <STRING>,
  table_name: <STRING>,

  query_engine: {
    read_docs_per_sec: <NUMBER>,
    read_docs_total: <NUMBER>,
    written_docs_per_sec: <NUMBER>,
    written_docs_total: <NUMBER>
  },
  storage_engine: {
      cache: {
        in_use_bytes: <NUMBER>              // Should be straightforward to add
      },
      disk: {
        read_bytes_per_sec: <NUMBER>,       // currently serializer_block_reads, must be modified to report bytes
        read_bytes_total: <NUMBER>,
        written_bytes_per_sec: <NUMBER>,    // currently serializer_block_writes, must be modified to report bytes
        written_bytes_total: <NUMBER>,
        space_usage: {
          metadata_bytes: <NUMBER>,         // currently serializer_lba_extents, multiply by extent size (ideally also count metablock)
          data_bytes: <NUMBER>,             // approximate as follows: take serializer_data_extents, multiply by extent size, subtract serializer_old_garbage_block_bytes
          garbage_bytes: <NUMBER>,          // currently serializer_old_garbage_block_bytes
          preallocated_bytes: <NUMBER>      // should be: actual file size minus the three other fields  
        }
      }
   }
}

for timed out servers

{
  id: ["server", <UUID>]
  server_id: <UUID>,
  server_name: <STRING>,
  error: "Timed out. Unable to retrieve stats."
}
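
With this scheme, individual stats documents can be fetched directly by their composite primary key. A sketch using the JavaScript driver (tableId and serverId are placeholders; the table is assumed to be exposed as rethinkdb.stats):

// Cluster-wide totals:
r.db('rethinkdb').table('stats').get(['cluster']).run(conn, callback);

// Stats for one table on one server:
r.db('rethinkdb').table('stats')
  .get(['table_server', tableId, serverId])
  .run(conn, callback);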

@coffeemug does this look ok to you?

(Edit: Updated the one per server document structure)

@coffeemug
Copy link
Contributor

@coffeemug does this look ok to you?

Yes. One option to avoid adding a per-table read/write metric while allowing people not to count dups is to expose info on whether a given table/server stats entry is for a primary replica (e.g. { primary_replica: true}). I like this more since it allows people to slice the data any way they want without us duplicating it in various places. But if people object, I don't feel too strongly about it (I just think it's more elegant).

@timmaxw
Member Author

timmaxw commented Oct 29, 2014

I have mixed feelings about that. On the one hand, it reduces duplication; the table was getting rather cluttered. On the other hand, there's no longer a natural way to get read/write stats for a given table; it's such a simple stat that it feels like people shouldn't have to do a complex query to get it. (For reads, they need to sum the stats for all servers; for writes, they have to take the stat on the primary, which should be the same as all the other servers.)
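
To illustrate: without the per-table document, reads per second for one table would have to be summed from its table/server entries, something like this sketch against the proposal above (tableId is a placeholder):

r.db('rethinkdb').table('stats')
  .filter(function (row) {
    return row('id')(0).eq('table_server').and(row('table_id').eq(tableId));
  })
  .sum(function (row) { return row('query_engine')('read_docs_per_sec'); })
  .run(conn, callback);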

@coffeemug
Contributor

Ok, I don't feel too strongly about it and don't mind special casing it. (I think we should include a primary_replica field anyway, to make it easy for people to count custom things without double-counting)

@danielmewes
Member

We had considered something like that. Note however that in the current proposal, the finest granularity of information is table/server. For having a primary_replica field, we would need to have one entry per shard. I don't think we should do that.

@coffeemug
Contributor

I see. Ok, kindly disregard my suggestion. They can still do it by grouping all machines for a given table and collecting stats from a random one if they wanted to.

@wojons
Contributor

wojons commented Oct 30, 2014

I still need to go over the formats, but here are some replies I have so far.

@Tryneus: We can add these back in, I omitted them to avoid giving too much information to users. If we do give these kinds of stats to users, they would probably need to be per-server to avoid the discontinuities mentioned above.

I think this should stay per-table. It's nice to know that a server is doing lots of such-and-such, but when you have hundreds of tables this can become a problem.

@danielmewes

Having disk writes/s reads/s is not a useful metric in general.
Instead I think we should have two things:

bytes written/s and bytes read/s
index writes/s
That would be a per-table property.

I think speed of reads and writes is one thing, but that is more of an SSD thing. On SSDs lots of small reads and writes are fine, but they can kill a spinning disk while it reports only 1MB/s. I think both speed and the number of reads/writes or blocks should be shown.

From a technical point of view the primary key should be the pair [server_id, table_id], where table_id can be null for the server entries. I think this might also work reasonably well in practice. What do you think?

I think this is great if it's in a multi index.

I attached some of the photos so you can see how some of the stats I get are used. I have a few more that I am not graphing at the moment, but these should get the idea across.

[screenshot: elastictrace-rethinkdb-serverview]

[screenshot: elastictrace-rethinkdb-table-view2]

@danielmewes
Member

Thank you for your feedback @wojons.
Our current plan is to leave all the current perfmons more or less unchanged, but have them only accessible through a special debug table.

You will also be able to get the current GC activity from the jobs table in the future (#3115, as you know). There won't be a running total, but you could still generate plots from the instantaneous GC "load" value.

I agree that something that's roughly equivalent to "number of disk seeks" would be very useful on rotational drives. However we don't currently have anything implemented to measure this. The writes/s that we currently have in ajax/stat isn't quite the same as that. index_writes is probably the closest we have right now, but even there the relation to disk seeks is not always obvious. That being said, you can also still access that from the debug table until we come up with a better perfmon for this sort of thing.

@wojons
Contributor

wojons commented Oct 30, 2014

@danielmewes

I think this is a pretty important branch, so not rushing to get it out will be a good idea; make sure that there is enough base code that making additions is straightforward.

@danielmewes
Member

We should also observe #2890 (comment) in here, and not expose db, table and server UUIDs unless the opt arg is set accordingly.

@danielmewes
Member

To clarify: I think the primary key should always use UUIDs. Just fields such as server_id, table_id and db_id should be hidden depending on the opt arg.

@coffeemug
Contributor

I think the fields should have consistent names -- server, table and db -- but the value should change depending on the identifier_format optarg. I don't think we should say server_id in one case, and server_name in the other.
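
A sketch of what that would look like from the JavaScript driver, reusing the existing identifier_format opt arg (spelled identifierFormat in JS; the unified stats table name is assumed):

r.db('rethinkdb').table('stats', {identifierFormat: 'uuid'}).run(conn, callback);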

@danielmewes
Member

Ok, that sounds good @coffeemug .

@Tryneus
Member

Tryneus commented Nov 15, 2014

Working on this now.

@Tryneus Tryneus self-assigned this Nov 15, 2014
@Tryneus
Member

Tryneus commented Nov 27, 2014

This is up in review 2356.

@danielmewes
Member

👏

@Tryneus
Member

Tryneus commented Dec 3, 2014

This has been approved and merged into reql_admin as of 02eb33b.

@Tryneus Tryneus closed this as completed Dec 3, 2014
@deontologician
Contributor

The graphs in the web UI are now using the real stats tables on reql_admin after CR 2376.

@coffeemug
Contributor

👍 👏

This is amazing.

@danielmewes danielmewes modified the milestones: reql-admin, 1.16 Jan 2, 2015