-
-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optional randomization of shard ids for AWS Kinesis load balancing. #2157
Optional randomization of shard ids for AWS Kinesis load balancing. #2157
Conversation
Can one of the osquery admins reply with "ok to test" to kick off a build... |
@@ -64,6 +82,9 @@ Status KinesisLogForwarder::send(std::vector<std::string>& log_data, | |||
LOG(ERROR) << "Kinesis log too big, discarding!"; | |||
} | |||
Aws::Kinesis::Model::PutRecordsRequestEntry entry; | |||
if (FLAGS_aws_kinesis_random_shardid) { | |||
shard_id_ = random_string(32); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe use something like:
boost::uuids::uuid uuid = boost::uuids::random_generator()();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
haha i was going to use uuid but figured it seemed hacky. I can switch to uuid.
/cc @zwass |
It looks like this would be used to get more even load balancing when it is not important that logs from a given host go to the same shard each time they are sent (besides for after re-sharding). Perhaps you can document somewhere (in the wiki docs?) when users would benefit from turning this on. |
This also reminds me that it was an oversight to name that variable |
@zwass ya I can change the var names around. Probably wont get around to it until monday. |
random uuid generator. I have also changed name refs to shard_id to be partition_key.
@zwass @theopolis changes made based on your comments.
Thanks for the suggestions guys! Also, I may have to fix some merge conflicts with my other PR based on these changes. I can handle that after this one goes though. |
Configuration flags for kinesis logger: | ||
|
||
```` | ||
--aws_kinesis_random_partition_key VALUE Use random partition keys when sending data to kinesis. Using random values will load balance over stream shards if you are using multiple shards in a stream. Default is "false". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's move this new configuration item into the above section Configuration
and use the same format as the existing 4 configuration flags:
--aws_kinesis_random_partition_key Enable random kinesis partition keys
There's some right space alignment going on above too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can add the other notes about the partition key here, but don't encapsulate them in back-ticks. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just added a snipit about it in the kinesis description. I didnt put a reference about it with the other flags since they are all shared flags which this one is not.
Looks good, what do you think @zwass? |
ok to test |
Build origin/pr/2157/merge (Job: 2762) is good! 👍 |
I'm good with the changes. Perhaps we can add something to the docs along the lines of "Note that using this setting will result in the logs of each host distributed across shards, so do not use it if you need logs from each host to be processed by a consistent shard." |
@zwass added... thx again guys! |
Build origin/pr/2157/merge (Job: 2764) is good! 👍 |
* Building osquery unit tests on Windows 10 (osquery#2100) Integrated process abstraction code into more locations Defined new macros for abstracting across various platforms Added GLOG_NO_ABBREVIATED_SEVERITIES for glog to support Windows Fixed some minor CMake issues involving thrift Updated gflags package; reflecting change in provision script Preparing CMake config files for WIN32 support * Use POST for distributed queries within the node API (osquery#2103) * fixed issue with aws logger example for kinesis and firehose (osquery#2102) * Introduce table options (osquery#2101) Table options includes a change to the Registry::call API for TablePlugins. When requesting route information or the 'columns' action, a new 'op' key is included. * Fix Ubuntu 15.04 build (osquery#2105) * Use ruby/gem ABI version 1.9.1 on 12.04 (osquery#2106) * Introduce table aliases (osquery#2104) * Fix benchmark target and bump version of google-benchmark to 1.0.0 (osquery#2065) * Rename time and environment columns for process_events (osquery#2096) * Disable Google Benchmark and AWS SDK from build (osquery#2113) See: osquery#2112 and osquery#2107 * Add --ephemeral for daemons and disable shell events (osquery#2111) This changes several initialization steps: - The daemon (and shell, though not needed) have a new --ephemeral flag. - Events are now disabled in the shell by default, use --nodisable_events to re-enable. - RocksDB-based backing storage is now disabled in the shell by default. The --ephemeral flag for the daemon is disabled by default and will allow skipping configuration and database path sanity, and skipping pidfile checks. This is intended to be used when debugging or monitoring the daemon process. To make the RocksDB backing storage feature usage very clear we introduce a new flag: --disable_database. The shell sets this to true unless overridden in a flagfile or via command line arguments. * Add removeService to Dispatcher API (osquery#2116) With a removeService method, combined with the abstracted thread start in the Dispatcher API, services auto-remove when finished. This will un-break the kernel communication tests. These tests only stop when all their producer threads/services have ended. This also promotes the OS X kernel build to 10.11. * Update index.md to mention support for FreeBSD. (osquery#2118) * Add basic math extension functions (osquery#2123) * Fix os_version detection for Ubuntu 16.04 (osquery#2125) * Add basic string split and inet_aton functions (osquery#2124) The three new SQLite functions: - split: Splits a column using a set of tokens and a selected index. - regex_split: Similar to split but with a regex instead of tokens. - inet_aton: Returns the IPv4 decimal value for a string-formatted address. * Update README/wiki for Ubuntu 16.04 (osquery#2126) * Updated the get_platform.py script to be Python 3 compatible (osquery#2122) Added in future imports to make the script forwards compatible and updated print functions. * Filesystem Abstractions - Code and Unit Tests (osquery#2119) * Implemented filesystem operations abstraction code * Added filesystem operations abstraction unit tests * Modified CMake configurations to support the building of the abstraction code and unit tests * Fix Scientific Linux build process (osquery#2130) * Add process auditing and a SQL intro doc (osquery#2129) * Adding sip_config table to it-compliance pack (osquery#2131) * Use local AWS-SDK formula for OS X (osquery#2132) Thus begins our need to include local (modified) brew formulas. This commit adds a new provision library method: local_brew. Use this function within provision scripts to install packages that are not appropriate for homebrew-core. * Fixed typing in FileOps tests (osquery#2127) Some of the types in fileops tests were causing warnings to be thrown during build, due to type mismatch. I've added a few local variables to quiet these warnings. * Check for none in linked_keg (osquery#2133) * Add systemd service to Xenial (osquery#2134) * Limit SMBIOS reads to 0x000f0000-0x00100000 (osquery#2135) * Use SQLite 3.14.0 to support LIKE and EQUALS (osquery#2137) This commit bumps the third-party SQLite to the 3.14.0 pre-release (18:59). With 3.14.0 the LIKE and EQUALS constraint operators may be mixed within a query. Previously these would fail to produce a valid set. As part of the support, each virtual table should choose to bypass rowid-based deduplication using the new "WITHOUT ROWID" create table epilog. This will be appended to the schema if the table defines a PRIMARY KEY using index=True. * Update version for AWS SDK 0.12.4 (osquery#2139) * Added except for lsb_release not existing on system (osquery#2143) Added an exception case for OSError when the lsb_release command isn't found on hosts. * Use self-process for query join tests (osquery#2144) * Filesystem Abstractions - Integrations (osquery#2128) * Integrated filesystem operation abstraction code into filesystem.cpp * Modified filesystem unit tests to be more platform agnostic * Added append mode for PlatformFile * Minor bug fixes in filesystem operations * Update AWS logger code for AWS SDK 0.12.4 (osquery#2140) The AWS SDK changed how custom HTTP clients are used, and this commit brings compatibility with the new initialization style. * Ability to add default configs and postinstall scripts to deb/rpm packages (osquery#2142) * Allow table specs to use multiple row indexes (osquery#2146) * Minor cleanups to extension autoloading (osquery#2147) * Changed stream validation from list streams to describe streams (osquery#2141) * Update AWS-SDK build to 0.12.5 (osquery#2148) * Fix OS X kernel extension autoload (osquery#2151) * Moved test_utils to it's own directory out of core. Updated references (osquery#2154) * Fixed bash griefing over postifx-compatible conditional compounds (osquery#2159) * Add newlines in firehose records (osquery#2166) * Add SQL and Process Auditing to wiki index (osquery#2168) * [Fix osquery#2165] Use noexcept boost methods in PlatformFile (osquery#2167) * Copy service unit configuration to Ubuntu Xenial default location (osquery#2163) * Optional randomization of shard ids for AWS Kinesis load balancing (osquery#2157) * fix a typo error on the doc for building (osquery#2172) * [Fix osquery#2161] Remove space and quotes from launch daemon (osquery#2174) * Simplify watchdog limits configuration (osquery#2173) * Windows Daemon/Shell: Initial support for Windows tables (osquery#2182) Preparation for Windows Tables. We need a Windows process table so that the daemon will run * Install new requires packages and link to them in CMake (osquery#2183) * Adding check for nullptr before dereferencing. This fixes osquery#2185 (osquery#2187) * Add bash and python to make packages calls (osquery#2193) * Use noexcept boost::filesystem overloads (osquery#2195) * Windows Daemon/Shell: Windows Processes Table (osquery#2184) Include table changes necessary for a Windows processes table and changes to other tables needed for daemon and shell to run. The Windows processes table uses WMI as a backend to gather information. This commit does not yet build these tables. * CMake changes to build Windows tables (osquery#2194) This PR implements the CMake changes to build Windows tables, and serves as a follow-on to PR * Use ccache when available to speed up compilation. (osquery#2178) * Adds -fpermissive and fixes 'using' for anon struct (osquery#2200) * Run profile on all POSIX tables (osquery#2202) * Use vector.data() to get internal vector buffers (osquery#2204) * Fix milli/micro conversion when waiting for active plugins (osquery#2205) * [Fix osquery#2203] Restore extension respawn limits to 20s (osquery#2207) * Silence ccache and clang warnings (osquery#2209) * Remove unused variable in virtual_table (osquery#2210) * Use libuuid from e2fsprogs for codegen TARGETS (osquery#2213) * Use a noexcept lexical_cast for SQL type conversions (osquery#2212) * Windows Daemon/Shell: Make osquery code more Windows-friendly (osquery#2188) * Fix SQLite access after ASIO usage (osquery#2217) Using the boost ASIO libraries before calling SQLite open causes the "file://" protocol to be rewritten with a prepended CWD. * Add link_whole to generated TARGETS file (osquery#2219) * Bump AWS SDK to 0.12.17 (osquery#2214) * Merge posix/windows processes table into single entity (osquery#2220) * Add shutdown method to extensions (osquery#2224) This alters the osquery.thrift spec to add a ::shutdown method to the Extension class. The ExtensionManager inherits from this but includes a no-op shutdown method. When an ExtensionManager (osquery core) stops, it optionally requests all Extensions to shutdown immediately. This helps quit extensions processes faster. * Increase block period in flaky BufferedForwarder test (osquery#2222) This test was intermittently failing because it relies on the actual thread scheduling. Our discussion in issue osquery#2218 decided that it was worth keeping the test around, while trying to mitigate the flakiness. The longer sleeps in this test ran successfully hundreds of times in local testing. * Add ability to provision Arch Linux (osquery#2215) * Optional top level decorator functionality (osquery#2177) * Adding Aobo Keylogger and OSX_Keydnap to osx-attacks (osquery#2230) * Force RocksDB to sync writes for non-event domains (osquery#2228) RocksDB is the default "database" plugin. Writes are normally kept in an in-memory memtable. Writes that are not part of the event pubsub system can be forced to sync to disk. * Adding folder signature for iWorm OSX malware (osquery#2231) * update osx_attacks with Backdoor.MAC.Eleanor with fixes (osquery#2226) * Add systemLog API (osquery#2229) This includes a minor SDK refactor as it move quite a few specialized functions and facilities from core.h into system.h. There was a breaking point for needing to frequently update core includes. The new logger systemLog function allows a call site to bypass logging config and write a line to the OS logger (aka syslog). * [Fix osquery#2196] Fix osquery home directory checking (osquery#2232) * [osquery#2216] Add notice text for required table predicates (osquery#2225) * Fix process path parsing (osquery#2234) This commit fixes two issues with `path` in the linux processes table: (1) Fixes a bug in which `on_disk` is set to `NULL` instead of `0` when the binary is not on disk. (2) Fixes a bug in which a filename ending in ` (deleted)` could cause osquery to return an incorrect value for `on_disk`. See osquery#1607 * Various fixups and best practices (osquery#2237) * Cleaned up arch provisioning (osquery#2239) * Slight performance improvments (osquery#2242) * Log execution of a distributed query (osquery#2241) * Crashes table now grabs all register values (osquery#2243) * Add autocompletion of table names in osqueryi (osquery#2236) * Transition __attribute__((constructor)) to a more platform independent approach (osquery#2233) * Update third-party for Win10 building and add .patch to gitignore (osquery#2250) * Better fault-tolerant defaults for systemd service (osquery#2255) * added ssh_keys table for id_rsa files. (osquery#2245) * Moved windows provisioning script (osquery#2257) * Fix mismatched free/delete in QueryContext dtor (osquery#2259) * Swap removed and added for logs (osquery#2260) * Ubuntu + Debian build fixes (osquery#2247) (osquery#2248) * Remove process_file_events subscriber from Linux (osquery#2267) * Fix debug-kernel build and deploy dependencies (osquery#2266) * Enable DB in osqueryi when --database_path specified (osquery#2268) Prior to this change, both --disable_database=false and --database_path had to be specified together. Now, if the user specifies --database_path the database is enabled automatically. * Introduce --audit_allow_sockets for Linux socket_events (osquery#2270) * CMake configuration file changes to support Windows (osquery#2258) * fixed provision script path in make-win64-dev-env (osquery#2271) * The watcher process should apply memory limits to itself (osquery#2263) * Fix broken links to FIM wiki page (osquery#2272) * Clarify events and database flags in osqueryi docs (osquery#2269) * Add watcher column to osquery_info (osquery#2261) * Move OSQUERY_HOME into core and use as filesystem config default (osquery#2275) * Clean up verbose logging for OS X kernel extension (osquery#2276) * Changes to support building a osquery Windows service. (osquery#2278)
The AWS logging plugin sends shard_id_ as a partition key to AWS kinesis when pushing data. The current implementation uses a static string as the shard id. Kinesis uses these partition keys (i.e. shard ids) to load balance between shards in a stream. By randomizing this value, you will get better performance on larger deployments using many shards in a Kinesis stream. For more background on this, please read the following:
http://sdk.amazonaws.com/cpp/api/0.11.0-7-g6be9f8d/d5/d84/class_aws_1_1_kinesis_1_1_model_1_1_put_record_request.html#a1f230751b741869bb8a3d0618d055cb6
This feature can be enabled by setting the following config variable otherwise the functionality will remain the same: