This repo contains runnable examples of how to use ShadowTraffic for common use cases.
Run each of these with:
```shell
docker run --env-file license.env \
  -v $(pwd)/<configuration file>:/home/config.json \
  shadowtraffic/shadowtraffic:latest \
  --config /home/config.json --sample 10 --stdout --watch
```
- Hello world with Kafka
- Hello world with Postgres
- Hello world with S3
- Hello world with a webhook
- The kitchen sink: Kafka retail data
- The kitchen sink: Postgres retail data
- The kitchen sink: S3 retail data
- Customers have a name, age, and membership level
- 57% of votes are cast for Franklin Roosevelt
- Transactions are uniformly priced between $2 and $200
- Orders have a pre-existing customer
- Support ticket messages arrive every 5000ms
- Publish 80% of the tweets from 20% of the users
- Send messages every 500 ms with a std dev of 40 ms
- Place exactly 15 orders
- Pick a date/timestamp between yesterday and tomorrow
- 5 sensors whose value is the previous value plus a random number between -1 and 1
- Telemetry data gets randomly delayed 10% of the time, discarded 2% of the time, and repeated 5% of the time
- A stream of the h2o dataset configured for n=10M, k=10
- An inventory of films are tracked in 100 stores, like the Sakila dataset
- A new user comes online every 250ms and changes their IP every 1 second
- 50 machines DDOSing EC2 instances in us-east-1 with ~200 byte packets every 10 ms
- Suspicious accounts transacting that log in with a new IP address 1% of the time
- 30 JVMs report their heap readings every 250 ms which oscillate around 50 mb
- 200 merchants have their businesses audited once every ~25 days
- Inventory is updated every 200ms and queries check its status every 500ms
- A stream of rides from New York's yellow taxi network
- Shopping carts add items, check out, and sometimes get abandoned
- The Nexmark streaming benchmark of auction streams
- 70% of all posts are from repeat users
- Harvest customer IDs from Postgres for Kafka events
- Customers go through a 4-stage funnel
- Debezium envelopes have 3 discrete states
- 3 support agents field phone calls, arriving once a second
- Flights take off every 5 seconds and report their geolocation
- Every ~2 seconds, a new game is scheduled to start with bets placed every ~500ms
- Bots post social content that get likes and shares only 5% of the time each
- Latency is about 10 milliseconds, with bursts to 50 and 150 every 2 and 5 minutes
- A server transitions between healthcheck statuses every 15 seconds
- A client emits OTEL telemetry data
- The Northwind data set
Discussion
This example writes events to a Kafka topic named testTopic, using JSON serialization for both the key and value. This generator doesn't specify a key, so the key of the record is always null. The value is a string, which is one of three random emojis.
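The core of such a config might look like the following sketch. The connection details (bootstrap server, serializer classes) are illustrative assumptions; check the actual example file for the exact settings.

```json
{
  "generators": [
    {
      "topic": "testTopic",
      "value": { "_gen": "oneOf", "choices": ["👍", "🔥", "❤️"] }
    }
  ],
  "connections": {
    "kafka": {
      "kind": "kafka",
      "producerConfigs": {
        "bootstrap.servers": "localhost:9092",
        "key.serializer": "io.shadowtraffic.kafka.serdes.JsonSerializer",
        "value.serializer": "io.shadowtraffic.kafka.serdes.JsonSerializer"
      }
    }
  }
}
```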
Discussion
This example writes events to a Postgres table named testTable, which has one column named testColumn. If the table doesn't exist when you start ShadowTraffic, it'll automatically create it for you. testColumn is a string, and its value will be one of three random emojis.
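A sketch of what that config might look like follows. The connection parameters (host, database, credentials) are placeholder assumptions; the shipped example file has the real values.

```json
{
  "generators": [
    {
      "table": "testTable",
      "row": {
        "testColumn": { "_gen": "oneOf", "choices": ["👍", "🔥", "❤️"] }
      }
    }
  ],
  "connections": {
    "pg": {
      "kind": "postgres",
      "connectionConfigs": {
        "host": "localhost",
        "port": 5432,
        "username": "postgres",
        "password": "postgres",
        "db": "mydb"
      }
    }
  }
}
```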
Discussion
This example writes events to an S3 bucket named testBucket. Each object in the bucket will have the file suffix .jsonl, and each event will be one line of JSON. Events are strings, picked by the oneOf generator that chooses a random emoji.
Discussion
This example writes events to the HTTP endpoint https://my-site/webhook-endpoint. Not much commentary needed!
Discussion
This example writes events to two Kafka topics: customers and orders. Events in the customers topic have a map value containing a few simple attributes: customerId, name, and so on.
Events in the orders topic have two interesting generators. First, customerId is generated by a lookup to the customers topic. ShadowTraffic guarantees that any customerIds for orders will have already been successfully written to the customers topic.
Second, orderNumber is defined by the sequentialInteger generator. This generator is stateful, so each time it generates a value, its internal counter increases by one. You don't need to do any state management. This happens for you automatically.
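The orders generator described above might be sketched like this; the lookup path shown is an assumption about where customerId lives in the customers events, so defer to the example file.

```json
{
  "topic": "orders",
  "value": {
    "customerId": {
      "_gen": "lookup",
      "topic": "customers",
      "path": ["value", "customerId"]
    },
    "orderNumber": { "_gen": "sequentialInteger" }
  }
}
```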
Discussion
This example writes events to two Postgres tables: customers and orders. Events in the customers table have a few simple columns: customerId, name, and so on.
Events in the orders table have two interesting generators. First, customerId is generated by a lookup to the customers table. ShadowTraffic guarantees that any customerIds for orders will have already been successfully written to the customers table.
Second, orderNumber is defined by the sequentialInteger generator. This generator is stateful, so each time it generates a value, its internal counter increases by one. You don't need to do any state management. This happens for you automatically.
Discussion
This example writes events to two S3 buckets: customers and orders. Events in objects of the customers bucket have a few simple attributes: customerId, name, and so on.
Events in objects of the orders bucket have two interesting generators. First, customerId is generated by a lookup to the customers bucket. ShadowTraffic guarantees that any customerIds for orders will have already been successfully written to the customers bucket.
Second, orderNumber is defined by the sequentialInteger generator. This generator is stateful, so each time it generates a value, its internal counter increases by one. You don't need to do any state management. This happens for you automatically.
Discussion
This example generates events to a Postgres table named customers. The table has three columns: name, age, and membership.
name is a random full name, like John Smith.
age is a random number between 18 and 120. By default, uniformDistribution generates decimal numbers. With decimals set to 0, the generator instead produces integers.
membership is a random choice of three different levels.
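The row definition might be sketched as follows. The Datafaker expression for name and the specific membership level names are assumptions for illustration.

```json
{
  "table": "customers",
  "row": {
    "name": { "_gen": "string", "expr": "#{Name.fullName}" },
    "age": { "_gen": "uniformDistribution", "bounds": [18, 120], "decimals": 0 },
    "membership": { "_gen": "oneOf", "choices": ["bronze", "silver", "gold"] }
  }
}
```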
Discussion
This example writes to a Kafka topic, with each event having a UUID key and a map value. weightedOneOf sets 57% of all candidate strings in the value to Franklin Roosevelt and 43% to Herbert Hoover.
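The weighted choice might look like the sketch below; the attribute name vote is a hypothetical stand-in for whatever the example file actually uses.

```json
"value": {
  "vote": {
    "_gen": "weightedOneOf",
    "choices": [
      { "weight": 57, "value": "Franklin Roosevelt" },
      { "weight": 43, "value": "Herbert Hoover" }
    ]
  }
}
```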
Discussion
This example writes to a Kafka topic named transactions, with each event having a UUID key and a map value. The value maps contain two attributes: price and timestamp. price is a random number between 2 and 200, and is guaranteed to have two decimal places. timestamp is the current UNIX time in milliseconds.
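The value portion might be sketched like this, assuming a now generator for the current UNIX timestamp; verify against the example file.

```json
"value": {
  "price": { "_gen": "uniformDistribution", "bounds": [2, 200], "decimals": 2 },
  "timestamp": { "_gen": "now" }
}
```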
Discussion
This example writes to two Kafka topics, customers and orders. Events in customers have only a key and no value, so the value will be null. Events in orders have a customerId which is guaranteed to have been successfully written to the customers topic first. The lookup generator targets those events and uses the path parameter to drill into the key's name attribute.
Discussion
This example generates events to both Kafka and Postgres.
In the first generator, straightforward events are generated to a Postgres table called customers. Note that because there are multiple connections, each generator has to explicitly define what connection it sends data to.
In the second generator, events are generated to a Kafka topic called purchases. In the value of each event, customerId is defined as a lookup to values successfully written to the customers table. Notice how ShadowTraffic can support lookups across connection types.
In the last generator, events are generated to a Kafka topic called supportTickets. This works much the same as the previous generator, but it also has a local configuration of throttleMs set. throttleMs allows this generator to only produce an event at most every 5000 milliseconds.
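The supportTickets generator might be sketched as below; the connection name and value shape are assumptions, but the localConfigs placement of throttleMs reflects the description above.

```json
{
  "topic": "supportTickets",
  "connection": "kafka",
  "value": { "ticketId": { "_gen": "uuid" } },
  "localConfigs": { "throttleMs": 5000 }
}
```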
Discussion
This example generates data to two Kafka topics, users and tweets. In the tweets generator, userId is defined as a lookup to get user IDs that have been written to the users topic. But instead of randomly choosing previously generated user IDs, a histogram is used so that 20% of the user IDs will be chosen 80% of the time, and the remaining 80% of user IDs chosen 20% of the time.
This distribution holds true even as more users are generated and the pool of user IDs becomes larger.
Discussion
This example generates events to a Kafka topic on an uneven cadence. A variable named delay is set, which is about 500 milliseconds with a standard deviation of 40 milliseconds. Each time an event is generated, delay is evaluated to a new number. delay is then referenced to impose a throttle, and its actual value is passed into the outgoing row.
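A sketch of the variable-driven throttle, assuming the topic name and value shape; the var references should match the example file.

```json
{
  "topic": "messages",
  "vars": {
    "delay": { "_gen": "normalDistribution", "mean": 500, "sd": 40 }
  },
  "value": {
    "delay": { "_gen": "var", "var": "delay" }
  },
  "localConfigs": {
    "throttleMs": { "_gen": "var", "var": "delay" }
  }
}
```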
Discussion
By default, ShadowTraffic generates events indefinitely. But in this example, setting maxEvents to a number bounds how many events are generated. When that limit is reached, ShadowTraffic stops.
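Assuming maxEvents lives under the generator's localConfigs, as with the throttle settings elsewhere in these examples, the bound might be expressed as:

```json
"localConfigs": { "maxEvents": 15 }
```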
Discussion
In this example, we generate random timestamps between 24 hours in the past and 24 hours in the future, regardless of the current time. We do that by first setting a variable called now that captures the current wallclock time in milliseconds. Then we create a uniformDistribution whose lower bound is now minus 24 hours and whose upper bound is now plus 24 hours. We feed the result into formatDateTime to get a nicely formatted string.
Discussion
This example models different sensors, each of whose previous reading influences its next reading. There are a few parts to this one.
First, each event has a timestamp, set in the row, that will be part of every generated event.
Second, a stateMachine is used to model each sensor's readings over time. In the start state, each sensor's initial value is set to roughly 50. In the update state, which it stays in indefinitely, the reading is set by adding a random value between -1 and 1 to the previous value.
Lastly, to model many sensors, and not just one, fork is used to spawn 5 simultaneous generators, each of which is spun up 250 milliseconds after the previous to stagger their updates. A special variable called forkKey becomes available so that each generator can know which fork it represents; in this case, that means which sensor it is.
Discussion
This example generates data to a Kafka topic, but uses a few parameters to warp the data:
- 10% of the data is delayed by 200 - 800 milliseconds before being sent.
- 2% of the data is completely discarded and never gets sent to the topic.
- 5% of the data is repeated twice.
All of these parameters compose, so it's conceivable that an event is repeated, with one repetition getting delayed and another getting dropped.
Discussion
This example generates streaming data to match the popular h2o data set. Nothing especially exotic about this one.
Discussion
This example generates data to mimic a subset of the Sakila data set.
In the first generator, 100 forks are created to mimic 100 different stores that sell movies. The second generator, inventory, does a lookup into each store ID and periodically modifies its available movies.
Discussion
This example simulates events being written to a Postgres table. By contrast to other examples, this generator not only inserts rows, but also updates and deletes rows.
This generator uses a state machine to track whether a row has been initially written, and whether it's eligible to be updated or deleted. By setting the op key on the generator, you can change whether the event should be treated as an insert, update, or delete.
This generator uses varsOnce to lock each user's email so that it will never change.
stagger is used in the fork section so that each new user comes online 250 milliseconds after the previous.
Discussion
This example simulates 50 machines sending packets to a Kafka topic.
By setting maxForks to 50, this generator will spawn parallel instances and stop when it reaches 50. To differentiate which machine is sending packets, sourceIP is set to the special variable forkKey, which represents the currently running fork.
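The fork setup might be sketched as follows. The topic name, the fork key expression, and the var-based access to forkKey are assumptions for illustration; consult the example file for the exact shape.

```json
{
  "topic": "packets",
  "fork": {
    "key": { "_gen": "string", "expr": "#{Internet.ipV4Address}" },
    "maxForks": 50
  },
  "value": {
    "sourceIP": { "_gen": "var", "var": "forkKey" }
  }
}
```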
Discussion
This example simulates that a small percentage of the time, a suspicious login happens, which is defined by a login with a new IP address.
By using varsOnce, each user is assigned a one-time IP address. But composing that with weightedOneOf, 99% of the time that predetermined IP address is chosen, and 1% of the time a new address is fabricated on the spot.
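That composition might look like the sketch below; the Datafaker IP expression and attribute names are illustrative assumptions.

```json
{
  "varsOnce": {
    "knownIp": { "_gen": "string", "expr": "#{Internet.ipV4Address}" }
  },
  "value": {
    "ip": {
      "_gen": "weightedOneOf",
      "choices": [
        { "weight": 99, "value": { "_gen": "var", "var": "knownIp" } },
        { "weight": 1, "value": { "_gen": "string", "expr": "#{Internet.ipV4Address}" } }
      ]
    }
  }
}
```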
Discussion
This example simulates 30 individual JVMs reporting their metrics. fork is used to create the 30 instances, whose keys are defined by strings like jvm-1, jvm-2, etc., through sequentialStrings.
A state machine is used to calculate each JVM's heapSize as its previous value plus a random number between -1 and 1.
Discussion
This example uses statistical distributions with large numbers to simulate infrequently occurring events. In the audits generator, auditDate is set to now minus a number of days. In throttleMs, a similar approach is used to make sure the generator doesn't run again for a number of days.
Discussion
This example uses two generators, with the second only being allowed to run every 500 milliseconds according to its throttle value. sequentialStrings and sequentialInteger are used in the first generator to create stateful sequences of data.
Discussion
This example mimics a streaming version of the New York City taxi data set. Most of the generator is simple, but total_amount is a derived column computed by summing several other columns.
Discussion
This example generates shopping cart events, some of which get cancelled and never check out. By supplying no transition for a state in the transitions key of the state machine, CHECKED_OUT and CANCELLED are treated as terminal states.
Discussion
This example generates data similar to the Nexmark benchmark input data. Specifically, it's modeled after the Flink input example.
Discussion
This example does a self-lookup. When you do a self-lookup, you need to let the generator produce fresh values some percentage of the time, which is why there's a weightedOneOf generator. Without it, ShadowTraffic could never generate an initial value, and there would be nothing to look up.
Discussion
This example generates events to both Kafka and Postgres. Notice how lookup can work across connections. When you have multiple connections, you must specify which connection each generator should send data to, as denoted by the connection attribute.
Discussion
This example uses a state machine to simulate customers moving through different states on a website. Notice how in transitions, a oneOf generator can be used to dynamically pick the next state.
Discussion
This example models Debezium change data capture envelopes. It uses a state machine to track whether a row is inserted for the first time, updated, or deleted.
Debezium objects show you how an entire object has changed, in both before and after states, so simulating it requires remembering previously generated data. This example uses two ways to do that:
- The state machine is configured with "merge": { "previous": true } so that the previous event will be deep merged into the current one.
- The previousEvent generator accesses the last values and swaps the before and after states.
Discussion
This example creates exactly 3 support agent events, and an indefinite number of call events.
Notice how calls uses fork and a lookup for its key. Fork keys must be unique, so this has the effect that each agent can only field one call at a time. If two forks are spawned with the same key, the newer one is prevented from launching.
Discussion
This example simulates random flight paths between different geolocations. waypoints computes the steps between the coordinates, and repeated calls serve them up one at a time.
Discussion
This example creates a fork per game, where each game moves through different states of completion. Each game is scheduled about 2000 milliseconds after the previous, with some jitter created by the normalDistribution generator.
Discussion
This example uses a state machine to weight the activity of a social media post. Most posts don't get much traffic; some get a lot of traffic; others get a lot of spam. This example uses merge previous to automatically merge in previous activity.
Discussion
This example uses the intervals function to change how the generator behaves over time. Every 5th minute of wallclock time, the latency shifts from 10 to 150; every 2nd minute, it shifts to 50. Notice how some wallclock values, like 10:20 AM, are divisible by both 5 and 2. intervals is evaluated in priority order, so in this case the burst to 150 takes precedence because it's listed first.
Discussion
This example uses a state machine to weight the probability of a server transitioning from ok, to warn, to bad. The next status mostly resembles the previous one.
Discussion
A reference implementation to generate minimal OpenTelemetry events.
Discussion
A reference configuration to generate an indefinite amount of Northwind data to MotherDuck.