Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optional randomization of shard ids for AWS Kinesis load balancing. #2157

Merged
merged 10 commits into from
Jun 15, 2016

Conversation

hackgnar
Copy link
Contributor

The AWS logging plugin sends shard_id_ as a partition key to AWS kinesis when pushing data. The current implementation uses a static string as the shard id. Kinesis uses these partition keys (i.e. shard ids) to load balance between shards in a stream. By randomizing this value, you will get better performance on larger deployments using many shards in a Kinesis stream. For more background on this, please read the following:

http://sdk.amazonaws.com/cpp/api/0.11.0-7-g6be9f8d/d5/d84/class_aws_1_1_kinesis_1_1_model_1_1_put_record_request.html#a1f230751b741869bb8a3d0618d055cb6

This feature can be enabled by setting the following config variable otherwise the functionality will remain the same:

"aws_kinesis_random_shardid": "true",

@osqueryer
Copy link

Can one of the osquery admins reply with "ok to test" to kick off a build...

@facebook-github-bot facebook-github-bot added the cla signed Automated label: Pull Request author has signed the osquery CLA label Jun 10, 2016
@@ -64,6 +82,9 @@ Status KinesisLogForwarder::send(std::vector<std::string>& log_data,
LOG(ERROR) << "Kinesis log too big, discarding!";
}
Aws::Kinesis::Model::PutRecordsRequestEntry entry;
if (FLAGS_aws_kinesis_random_shardid) {
shard_id_ = random_string(32);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use something like:

boost::uuids::uuid uuid = boost::uuids::random_generator()();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haha i was going to use uuid but figured it seemed hacky. I can switch to uuid.

@theopolis
Copy link
Member

/cc @zwass

@zwass
Copy link
Member

zwass commented Jun 11, 2016

It looks like this would be used to get more even load balancing when it is not important that logs from a given host go to the same shard each time they are sent (besides for after re-sharding). Perhaps you can document somewhere (in the wiki docs?) when users would benefit from turning this on.

@zwass
Copy link
Member

zwass commented Jun 11, 2016

This also reminds me that it was an oversight to name that variable shard_id_, as it should probably be partition_key_. Now looks the time to fix it before we add this user facing flag and changing it becomes trickier. I would be happy to put in a PR for that, but it would probably just cause you more trouble if you had to rebase on it. Would you mind making the change? Or prefer that I put in a separate PR for it?

@hackgnar
Copy link
Contributor Author

@zwass ya I can change the var names around. Probably wont get around to it until monday.

@hackgnar
Copy link
Contributor Author

@zwass @theopolis changes made based on your comments.

  1. Random string generator was replaced by boost random uuid generator
  2. All refs to shard_id_ have been changed to partition_key_
  3. Docs have been added for the new flag.

Thanks for the suggestions guys! Also, I may have to fix some merge conflicts with my other PR based on these changes. I can handle that after this one goes though.

Configuration flags for kinesis logger:

````
--aws_kinesis_random_partition_key VALUE Use random partition keys when sending data to kinesis. Using random values will load balance over stream shards if you are using multiple shards in a stream. Default is "false".
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move this new configuration item into the above section Configuration and use the same format as the existing 4 configuration flags:

--aws_kinesis_random_partition_key Enable random kinesis partition keys

There's some right space alignment going on above too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can add the other notes about the partition key here, but don't encapsulate them in back-ticks. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just added a snipit about it in the kinesis description. I didnt put a reference about it with the other flags since they are all shared flags which this one is not.

@theopolis
Copy link
Member

Looks good, what do you think @zwass?

@theopolis
Copy link
Member

ok to test

@osqueryer
Copy link

Build origin/pr/2157/merge (Job: 2762) is good! 👍

@zwass
Copy link
Member

zwass commented Jun 15, 2016

I'm good with the changes. Perhaps we can add something to the docs along the lines of "Note that using this setting will result in the logs of each host distributed across shards, so do not use it if you need logs from each host to be processed by a consistent shard."

@hackgnar
Copy link
Contributor Author

@zwass added... thx again guys!

@osqueryer
Copy link

Build origin/pr/2157/merge (Job: 2764) is good! 👍

@theopolis theopolis merged commit b47f246 into osquery:master Jun 15, 2016
pathcl added a commit to pathcl/osquery that referenced this pull request Jul 22, 2016
* Building osquery unit tests on Windows 10 (osquery#2100)

Integrated process abstraction code into more locations
Defined new macros for abstracting across various platforms
Added GLOG_NO_ABBREVIATED_SEVERITIES for glog to support Windows
Fixed some minor CMake issues involving thrift
Updated gflags package; reflecting change in provision script
Preparing CMake config files for WIN32 support

* Use POST for distributed queries within the node API (osquery#2103)

* fixed issue with aws logger example for kinesis and firehose (osquery#2102)

* Introduce table options (osquery#2101)

Table options includes a change to the Registry::call API for TablePlugins.
When requesting route information or the 'columns' action, a new 'op' key is included.

* Fix Ubuntu 15.04 build (osquery#2105)

* Use ruby/gem ABI version 1.9.1 on 12.04 (osquery#2106)

* Introduce table aliases (osquery#2104)

* Fix benchmark target and bump version of google-benchmark to 1.0.0 (osquery#2065)

* Rename time and environment columns for process_events (osquery#2096)

* Disable Google Benchmark and AWS SDK from build (osquery#2113)

See: osquery#2112 and osquery#2107

* Add --ephemeral for daemons and disable shell events (osquery#2111)

This changes several initialization steps:
- The daemon (and shell, though not needed) have a new --ephemeral flag.
- Events are now disabled in the shell by default, use --nodisable_events to
  re-enable.
- RocksDB-based backing storage is now disabled in the shell by default.

The --ephemeral flag for the daemon is disabled by default and will allow
skipping configuration and database path sanity, and skipping pidfile checks.
This is intended to be used when debugging or monitoring the daemon process.

To make the RocksDB backing storage feature usage very clear we introduce a new
flag: --disable_database. The shell sets this to true unless overridden in
a flagfile or via command line arguments.

* Add removeService to Dispatcher API (osquery#2116)

With a removeService method, combined with the abstracted thread start in
the Dispatcher API, services auto-remove when finished.

This will un-break the kernel communication tests. These tests only stop
when all their producer threads/services have ended.

This also promotes the OS X kernel build to 10.11.

* Update index.md to mention support for FreeBSD. (osquery#2118)

* Add basic math extension functions (osquery#2123)

* Fix os_version detection for Ubuntu 16.04 (osquery#2125)

* Add basic string split and inet_aton functions (osquery#2124)

The three new SQLite functions:
- split: Splits a column using a set of tokens and a selected index.
- regex_split: Similar to split but with a regex instead of tokens.
- inet_aton: Returns the IPv4 decimal value for a string-formatted address.

* Update README/wiki for Ubuntu 16.04 (osquery#2126)

* Updated the get_platform.py script to be Python 3 compatible (osquery#2122)

Added in future imports to make the script forwards compatible and updated print functions.

* Filesystem Abstractions - Code and Unit Tests (osquery#2119)

* Implemented filesystem operations abstraction code
* Added filesystem operations abstraction unit tests
* Modified CMake configurations to support the building of the abstraction code and unit tests

* Fix Scientific Linux build process (osquery#2130)

* Add process auditing and a SQL intro doc (osquery#2129)

* Adding sip_config table to it-compliance pack (osquery#2131)

* Use local AWS-SDK formula for OS X (osquery#2132)

Thus begins our need to include local (modified) brew formulas.
This commit adds a new provision library method: local_brew. Use this function
within provision scripts to install packages that are not appropriate for
homebrew-core.

* Fixed typing in FileOps tests (osquery#2127)

Some of the types in fileops tests were causing warnings to be thrown
during build, due to type mismatch. I've added a few local variables to
quiet these warnings.

* Check for none in linked_keg (osquery#2133)

* Add systemd service to Xenial (osquery#2134)

* Limit SMBIOS reads to 0x000f0000-0x00100000 (osquery#2135)

* Use SQLite 3.14.0 to support LIKE and EQUALS (osquery#2137)

This commit bumps the third-party SQLite to the 3.14.0 pre-release (18:59).
With 3.14.0 the LIKE and EQUALS constraint operators may be mixed within a
query. Previously these would fail to produce a valid set.

As part of the support, each virtual table should choose to bypass rowid-based
deduplication using the new "WITHOUT ROWID" create table epilog. This will
be appended to the schema if the table defines a PRIMARY KEY using index=True.

* Update version for AWS SDK 0.12.4 (osquery#2139)

* Added except for lsb_release not existing on system (osquery#2143)

Added an exception case for OSError when the lsb_release command isn't found on hosts.

* Use self-process for query join tests (osquery#2144)

* Filesystem Abstractions - Integrations (osquery#2128)

* Integrated filesystem operation abstraction code into filesystem.cpp
* Modified filesystem unit tests to be more platform agnostic
* Added append mode for PlatformFile
* Minor bug fixes in filesystem operations

* Update AWS logger code for AWS SDK 0.12.4 (osquery#2140)

The AWS SDK changed how custom HTTP clients are used, and this commit brings
compatibility with the new initialization style.

* Ability to add default configs and postinstall scripts to deb/rpm packages (osquery#2142)

* Allow table specs to use multiple row indexes (osquery#2146)

* Minor cleanups to extension autoloading (osquery#2147)

* Changed stream validation from list streams to describe streams (osquery#2141)

* Update AWS-SDK build to 0.12.5 (osquery#2148)

* Fix OS X kernel extension autoload (osquery#2151)

* Moved test_utils to it's own directory out of core. Updated references (osquery#2154)

* Fixed bash griefing over postifx-compatible conditional compounds (osquery#2159)

* Add newlines in firehose records (osquery#2166)

* Add SQL and Process Auditing to wiki index (osquery#2168)

* [Fix osquery#2165] Use noexcept boost methods in PlatformFile (osquery#2167)

* Copy service unit configuration to Ubuntu Xenial default location (osquery#2163)

* Optional randomization of shard ids for AWS Kinesis load balancing (osquery#2157)

* fix a typo error on the doc for building (osquery#2172)

* [Fix osquery#2161] Remove space and quotes from launch daemon (osquery#2174)

* Simplify watchdog limits configuration (osquery#2173)

* Windows Daemon/Shell: Initial support for Windows tables (osquery#2182)

Preparation for Windows Tables. We need a Windows process table so that the daemon will run

* Install new requires packages and link to them in CMake (osquery#2183)

* Adding check for nullptr before dereferencing.  This fixes osquery#2185 (osquery#2187)

* Add bash and python to make packages calls (osquery#2193)

* Use noexcept boost::filesystem overloads (osquery#2195)

* Windows Daemon/Shell: Windows Processes Table (osquery#2184)

Include table changes necessary for a Windows processes table and changes to other tables needed for daemon and shell to run. The Windows processes table uses WMI as a backend to gather information. This commit does not yet build these tables.

* CMake changes to build Windows tables (osquery#2194)

This PR implements the CMake changes to build Windows tables, and serves as a follow-on to PR

* Use ccache when available to speed up compilation. (osquery#2178)

* Adds -fpermissive and fixes 'using' for anon struct (osquery#2200)

* Run profile on all POSIX tables (osquery#2202)

* Use vector.data() to get internal vector buffers (osquery#2204)

* Fix milli/micro conversion when waiting for active plugins (osquery#2205)

* [Fix osquery#2203] Restore extension respawn limits to 20s (osquery#2207)

* Silence ccache and clang warnings (osquery#2209)

* Remove unused variable in virtual_table (osquery#2210)

* Use libuuid from e2fsprogs for codegen TARGETS (osquery#2213)

* Use a noexcept lexical_cast for SQL type conversions (osquery#2212)

* Windows Daemon/Shell: Make osquery code more Windows-friendly  (osquery#2188)

* Fix SQLite access after ASIO usage (osquery#2217)

Using the boost ASIO libraries before calling SQLite open causes the
"file://" protocol to be rewritten with a prepended CWD.

* Add link_whole to generated TARGETS file (osquery#2219)

* Bump AWS SDK to 0.12.17 (osquery#2214)

* Merge posix/windows processes table into single entity (osquery#2220)

* Add shutdown method to extensions (osquery#2224)

This alters the osquery.thrift spec to add a ::shutdown method to the
Extension class. The ExtensionManager inherits from this but includes a
no-op shutdown method.

When an ExtensionManager (osquery core) stops, it optionally requests all
Extensions to shutdown immediately. This helps quit extensions processes
faster.

* Increase block period in flaky BufferedForwarder test (osquery#2222)

This test was intermittently failing because it relies on the actual thread
scheduling. Our discussion in issue osquery#2218 decided that it was worth keeping the
test around, while trying to mitigate the flakiness. The longer sleeps in this
test ran successfully hundreds of times in local testing.

* Add ability to provision Arch Linux (osquery#2215)

* Optional top level decorator functionality (osquery#2177)

* Adding Aobo Keylogger and OSX_Keydnap to osx-attacks (osquery#2230)

* Force RocksDB to sync writes for non-event domains (osquery#2228)

RocksDB is the default "database" plugin. Writes are normally kept in an
in-memory memtable. Writes that are not part of the event pubsub system can
be forced to sync to disk.

* Adding folder signature for iWorm OSX malware (osquery#2231)

* update osx_attacks with Backdoor.MAC.Eleanor with fixes (osquery#2226)

* Add systemLog API (osquery#2229)

This includes a minor SDK refactor as it move quite a few specialized
functions and facilities from core.h into system.h. There was a breaking point
for needing to frequently update core includes.

The new logger systemLog function allows a call site to bypass logging config
and write a line to the OS logger (aka syslog).

* [Fix osquery#2196] Fix osquery home directory checking (osquery#2232)

* [osquery#2216] Add notice text for required table predicates (osquery#2225)

* Fix process path parsing (osquery#2234)

This commit fixes two issues with `path` in the linux processes table:

(1) Fixes a bug in which `on_disk` is set to `NULL` instead of `0` when the
binary is not on disk.

(2) Fixes a bug in which a filename ending in ` (deleted)` could cause osquery
to return an incorrect value for `on_disk`. See
osquery#1607

* Various fixups and best practices (osquery#2237)

* Cleaned up arch provisioning (osquery#2239)

* Slight performance improvments (osquery#2242)

* Log execution of a distributed query (osquery#2241)

* Crashes table now grabs all register values (osquery#2243)

* Add autocompletion of table names in osqueryi (osquery#2236)

* Transition __attribute__((constructor)) to a more platform independent approach (osquery#2233)

* Update third-party for Win10 building and add .patch to gitignore (osquery#2250)

* Better fault-tolerant defaults for systemd service (osquery#2255)

* added ssh_keys table for id_rsa files. (osquery#2245)

* Moved windows provisioning script (osquery#2257)

* Fix mismatched free/delete in QueryContext dtor (osquery#2259)

* Swap removed and added for logs (osquery#2260)

* Ubuntu + Debian build fixes (osquery#2247) (osquery#2248)

* Remove process_file_events subscriber from Linux (osquery#2267)

* Fix debug-kernel build and deploy dependencies (osquery#2266)

* Enable DB in osqueryi when --database_path specified (osquery#2268)

Prior to this change, both --disable_database=false and --database_path had to
be specified together. Now, if the user specifies --database_path the database
is enabled automatically.

* Introduce --audit_allow_sockets for Linux socket_events (osquery#2270)

* CMake configuration file changes to support Windows (osquery#2258)

* fixed provision script path in make-win64-dev-env (osquery#2271)

* The watcher process should apply memory limits to itself (osquery#2263)

* Fix broken links to FIM wiki page (osquery#2272)

* Clarify events and database flags in osqueryi docs (osquery#2269)

* Add watcher column to osquery_info (osquery#2261)

* Move OSQUERY_HOME into core and use as filesystem config default (osquery#2275)

* Clean up verbose logging for OS X kernel extension (osquery#2276)

* Changes to support building a osquery Windows service. (osquery#2278)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla signed Automated label: Pull Request author has signed the osquery CLA logging
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants