ShadowTraffic examples

This repo contains runnable examples of how to use ShadowTraffic for common use cases.

Run each of these with:

docker run --env-file license.env -v $(pwd)/<configuration file>:/home/config.json shadowtraffic/shadowtraffic:latest --config /home/config.json --sample 10 --stdout --watch

Hello world with Kafka


Discussion

This example writes events to a Kafka topic named testTopic, using JSON serialization for both the key and value. This generator doesn't specify a key, so the key of the record is always null. The value is a string, which is one of three random emojis.
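As a rough sketch of what such a configuration looks like (illustrative, not the repo's actual file; the _gen generator convention and connection shape follow ShadowTraffic's documentation, and the broker address and emoji choices are placeholders):

```json
{
  "generators": [
    {
      "topic": "testTopic",
      "value": {"_gen": "oneOf", "choices": ["👍", "🔥", "❤️"]}
    }
  ],
  "connections": {
    "kafka": {
      "kind": "kafka",
      "producerConfigs": {
        "bootstrap.servers": "localhost:9092",
        "key.serializer": "io.shadowtraffic.kafka.serdes.JsonSerializer",
        "value.serializer": "io.shadowtraffic.kafka.serdes.JsonSerializer"
      }
    }
  }
}
```

Because the generator supplies no key, every record's key is null.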


Hello world with Postgres


Discussion

This example writes events to a Postgres table named testTable, which has one column named testColumn. If the table doesn't exist when you start ShadowTraffic, it'll automatically be created for you. testColumn is a string, and its value will be one of three random emojis.
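A comparable sketch for Postgres (again illustrative; the connection parameters are placeholders, and the table/row shape follows ShadowTraffic's documented conventions):

```json
{
  "generators": [
    {
      "table": "testTable",
      "row": {
        "testColumn": {"_gen": "oneOf", "choices": ["👍", "🔥", "❤️"]}
      }
    }
  ],
  "connections": {
    "pg": {
      "kind": "postgres",
      "connectionConfigs": {
        "host": "localhost",
        "port": 5432,
        "username": "postgres",
        "password": "postgres",
        "db": "mydb"
      }
    }
  }
}
```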


Hello world with S3


Discussion

This example writes events to an S3 bucket named testBucket. Each object in the bucket will have the file suffix .jsonl, and each event will be one line of JSON. Events are strings, picked by the oneOf generator that chooses a random emoji.


Hello world with a webhook


Discussion

This example writes events to the HTTP endpoint https://my-site/webhook-endpoint. Not much commentary needed!


The kitchen sink: Kafka retail data


Discussion

This example writes events to two Kafka topics: customers and orders. Events in the customers topic have a map value containing a few simple attributes: customerId, name, and so on.

Events in the orders topic have two interesting generators. First, customerId is generated by a lookup to the customers topic. ShadowTraffic guarantees that any customerIds for orders will have already been successfully written to the customers topic.

Second, orderNumber is defined by the sequentialInteger generator. This generator is stateful, so each time it generates a value, its internal counter increases by one. You don't need to do any state management. This happens for you automatically.
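The orders generator described above might be sketched like this (a hand-written approximation; the exact lookup path is an assumption that depends on how the customers events are shaped):

```json
{
  "topic": "orders",
  "value": {
    "orderNumber": {"_gen": "sequentialInteger"},
    "customerId": {
      "_gen": "lookup",
      "topic": "customers",
      "path": ["value", "customerId"]
    }
  }
}
```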


The kitchen sink: Postgres retail data


Discussion

This example writes events to two Postgres tables: customers and orders. Events in the customers table have a few simple columns: customerId, name, and so on.

Events in the orders table have two interesting generators. First, customerId is generated by a lookup to the customers table. ShadowTraffic guarantees that any customerIds for orders will have already been successfully written to the customers table.

Second, orderNumber is defined by the sequentialInteger generator. This generator is stateful, so each time it generates a value, its internal counter increases by one. You don't need to do any state management. This happens for you automatically.


The kitchen sink: S3 retail data


Discussion

This example writes events to two S3 buckets: customers and orders. Events in objects of the customers bucket have a few simple attributes: customerId, name, and so on.

Events in objects of the orders bucket have two interesting generators. First, customerId is generated by a lookup to the customers bucket. ShadowTraffic guarantees that any customerIds for orders will have already been successfully written to the customers bucket.

Second, orderNumber is defined by the sequentialInteger generator. This generator is stateful, so each time it generates a value, its internal counter increases by one. You don't need to do any state management. This happens for you automatically.


Customers have a name, age, and membership level


Discussion

This example generates events to a Postgres table named customers. The table has three columns: name, age, and membership.

name is a random full name, like John Smith.

age is a random number between 18 and 120. By default, uniformDistribution generates decimal numbers. With decimals set to 0, the generator instead produces integers.

membership is a random choice of three different levels.
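A sketch of such a generator (approximate; the name expression assumes ShadowTraffic's faker-style string expressions, and the membership level names are placeholders):

```json
{
  "table": "customers",
  "row": {
    "name": {"_gen": "string", "expr": "#{Name.fullName}"},
    "age": {
      "_gen": "uniformDistribution",
      "bounds": [18, 120],
      "decimals": 0
    },
    "membership": {"_gen": "oneOf", "choices": ["bronze", "silver", "gold"]}
  }
}
```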


57% of votes are cast for Franklin Roosevelt


Discussion

This example writes to a Kafka topic, with each event having a UUID key and a map value. weightedOneOf sets the candidate string in the value to Franklin Roosevelt 57% of the time and Herbert Hoover 43% of the time.
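The weighted choice itself might look like this (an approximation of ShadowTraffic's documented weightedOneOf shape):

```json
{
  "_gen": "weightedOneOf",
  "choices": [
    {"weight": 57, "value": "Franklin Roosevelt"},
    {"weight": 43, "value": "Herbert Hoover"}
  ]
}
```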


Transactions are uniformly priced between $2 and $200


Discussion

This example writes to a Kafka topic named transactions, with each event having a UUID key and a map value. The value maps contain two attributes: price and timestamp. price is a random number between 2 and 200, and is guaranteed to have two decimal places. timestamp is the current UNIX time in milliseconds.
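In sketch form (approximate; uuid and now are generator names I believe ShadowTraffic provides, but treat them as assumptions):

```json
{
  "topic": "transactions",
  "key": {"_gen": "uuid"},
  "value": {
    "price": {
      "_gen": "uniformDistribution",
      "bounds": [2, 200],
      "decimals": 2
    },
    "timestamp": {"_gen": "now"}
  }
}
```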


Orders have a pre-existing customer


Discussion

This example writes to two Kafka topics, customers and orders. Events in customers have only a key and no value; thus, value will be null. Events in orders have a customerId which is guaranteed to have been successfully written to the customers topic first. The lookup generator targets those events and uses the path parameter to drill into the key's name attribute.


Support ticket messages arrive every 5000ms


Discussion

This example generates events to both Kafka and Postgres.

In the first generator, straightforward events are generated to a Postgres table called customers. Note that because there are multiple connections, each generator has to explicitly define what connection it sends data to.

In the second generator, events are generated to a Kafka topic called purchases. In the value of each event, customerId is defined as a lookup to values successfully written to the customers table. Notice how ShadowTraffic can support lookups across connection types.

In the last generator, events are generated to a Kafka topic called supportTickets. This works much the same as the previous generator, but it also has a local configuration of throttleMs set. throttleMs allows this generator to only produce an event at most every 5000 milliseconds.
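That last generator might be sketched like so (illustrative; the value fields here are invented placeholders, but throttleMs under localConfigs matches the behavior described):

```json
{
  "topic": "supportTickets",
  "connection": "kafka",
  "value": {
    "ticketId": {"_gen": "uuid"},
    "message": {"_gen": "oneOf", "choices": ["Can't log in", "Order missing", "Refund request"]}
  },
  "localConfigs": {"throttleMs": 5000}
}
```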


Publish 80% of the tweets from 20% of the users


Discussion

This example generates data to two Kafka topics, users and tweets. In the tweets generator, userId is defined as a lookup to get user IDs that have been written to the users topic. But instead of randomly choosing previously generated user IDs, a histogram is used so that 20% of the user IDs will be chosen 80% of the time, and the remaining 80% of user IDs chosen 20% of the time.

This distribution holds true even as more users are generated and the pool of user IDs becomes larger.


Send messages every 500 ms with a std dev of 40 ms


Discussion

This example generates events to a Kafka topic on an uneven cadence. A variable named delay is set, which is about 500 milliseconds with a standard deviation of 40 milliseconds. Each time an event is generated, delay is evaluated to a new number. delay is then referenced to impose a throttle, and its actual value is passed into the outgoing row.


Place exactly 15 orders


Discussion

By default, ShadowTraffic generates events indefinitely. But in this example, setting maxEvents to a number bounds how many events are generated. When that limit is reached, ShadowTraffic stops.
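In sketch form (the value shape is an invented placeholder; the point is maxEvents under localConfigs):

```json
{
  "topic": "orders",
  "value": {"orderId": {"_gen": "uuid"}},
  "localConfigs": {"maxEvents": 15}
}
```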


Pick a date/timestamp between yesterday and tomorrow


Discussion

In this example, we generate random timestamps between 24 hours into the past and 24 hours into the future, regardless of the current time. We do that by first setting a variable called now that captures the current wallclock time in milliseconds. Then, we create a uniformDistribution whose lower bound is now minus 24 hours and whose upper bound is now plus 24 hours. We feed the result into formatDateTime to get a nicely formatted string.


5 sensors whose value is the previous value plus a random number between -1 and 1


Discussion

This example models different sensors whose previous readings influence their next readings. There are a few parts to this one.

First, each event has a timestamp, set in the row, that will be part of every generated event.

Second, a stateMachine is used to model each sensor's readings over time. In the start state, each sensor's initial value is set to roughly 50. In the update state, which it stays in indefinitely, the reading is set by adding a random value between -1 and 1 to the previous value.

Lastly, to model many sensors, and not just one, fork is used to spawn 5 simultaneous generators, each of which is spun up 250 milliseconds after the previous one to stagger their updates. A special variable called forkKey becomes available so that each generator can know which fork it represents, in this case meaning which sensor it is.


Telemetry data gets randomly delayed 10% of the time, discarded 2% of the time, and repeated 5% of the time


Discussion

This example generates data to a Kafka topic, but uses a few parameters to warp the data:

  1. 10% of the data is delayed from being sent out by 200 - 800 milliseconds.
  2. 2% of the data is completely discarded and never gets sent to the topic.
  3. 5% of the data is repeated twice.

All of these parameters compose, so it's conceivable that an event is repeated, with one repetition getting delayed and another getting dropped.


A stream of the h2o dataset configured for n=10M, k=10


Discussion

This example generates streaming data to match the popular h2o data set. Nothing especially exotic about this one.


An inventory of films are tracked in 100 stores, like the Sakila dataset


Discussion

This example generates data to mimic a subset of the Sakila data set.

In the first generator, 100 forks are created to mimic 100 different stores that sell movies. The second generator, inventory, does a lookup into each store ID and periodically modifies its available movies.


A new user comes online every 250ms and changes their IP every 1 second


Discussion

This example simulates events being written to a Postgres table. In contrast to other examples, this generator not only inserts rows, but also updates and deletes them.

This generator uses a state machine to track whether a row has been initially written, and whether it's eligible to be updated or deleted. By setting the op key on the generator, you can change whether the event should be treated as an insert, update, or delete.

This generator uses varsOnce to lock each user's email so that it never changes.

stagger is used in the fork section so that each new user comes online 250 milliseconds after the previous one.


50 machines DDOSing EC2 instances in us-east-1 with ~200 byte packets every 10 ms


Discussion

This example simulates 50 machines sending packets to a Kafka topic.

By setting maxForks to 50, this generator spawns parallel instances and stops when it reaches 50. To differentiate which machine is sending packets, sourceIP is set to the special variable forkKey, which represents the currently running fork.
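A sketch of the fork setup (approximate; the IP-address expression and the var reference to forkKey are assumptions about ShadowTraffic's syntax, and the packet-size distribution parameters are placeholders):

```json
{
  "topic": "packets",
  "fork": {
    "key": {"_gen": "string", "expr": "#{Internet.ipV4Address}"},
    "maxForks": 50
  },
  "value": {
    "sourceIp": {"_gen": "var", "var": "forkKey"},
    "bytes": {"_gen": "normalDistribution", "mean": 200, "sd": 20}
  },
  "localConfigs": {"throttleMs": 10}
}
```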


Suspicious accounts transacting that log in with a new IP address 1% of the time


Discussion

This example simulates that a small percentage of the time, a suspicious login happens, which is defined by a login with a new IP address.

By using varsOnce, each user is assigned a one-time IP address. By composing that with weightedOneOf, 99% of the time that pre-determined IP address is chosen, and 1% of the time a new address is fabricated on the spot.


30 JVMs report their heap readings every 250 ms which oscillate around 50 mb


Discussion

This example simulates 30 individual JVMs reporting their metrics. fork is used to create the 30 instances, whose keys are defined by strings like jvm-1, jvm-2, and so on through sequentialStrings.

A state machine is used to calculate each JVM's heapSize as its previous value plus a random number between -1 and 1.


200 merchants have their businesses audited once every ~25 days


Discussion

This example uses statistical distributions with large numbers to simulate infrequently occurring events. In the audits generator, auditDate is set to now minus a number of days. In throttleMs, a similar approach is used to make sure the generator doesn't run again for a number of days.


Inventory is updated every 200ms and queries check its status every 500ms


Discussion

This example uses two generators, with the second only being allowed to run every 500 milliseconds according to its throttle value. sequentialStrings and sequentialInteger are used in the first generator to create stateful sequences of data.


A stream of rides from New York's yellow taxi network


Discussion

This example mimics a streaming version of the New York City taxi data set. Most of the generator is simple, but total_amount is a derived column that requires summing several other columns.


Shopping carts add items, check out, and sometimes get abandoned


Discussion

This example generates shopping cart events, some of which get cancelled and never check out. By supplying no transition for a state in the transitions key of the state machine, CHECKED_OUT and CANCELLED are treated as terminal states.


The Nexmark streaming benchmark of auction streams


Discussion

This example generates data similar to the Nexmark benchmark input data. Specifically, it's modeled after the Flink input example.


70% of all posts are from repeat users


Discussion

This example does a self-lookup. When you do a self-lookup, you need to allow new values to be generated some percentage of the time, which is why there's a weightedOneOf generator. Without it, ShadowTraffic could never generate an initial value, and there would be nothing to look up.
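A sketch of that self-lookup (approximate; weights of 7 and 3 produce the 70/30 split, and the fallback uuid branch is what lets the first value ever be generated):

```json
{
  "topic": "posts",
  "value": {
    "userId": {
      "_gen": "weightedOneOf",
      "choices": [
        {
          "weight": 7,
          "value": {"_gen": "lookup", "topic": "posts", "path": ["value", "userId"]}
        },
        {"weight": 3, "value": {"_gen": "uuid"}}
      ]
    }
  }
}
```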


Harvest customer IDs from Postgres for Kafka events


Discussion

This example generates events to both Kafka and Postgres. Notice how lookup can work across connections. When you have multiple connections, you must specify which connection each generator should send data to, as denoted by the connection attribute.


Customers go through a 4-stage funnel


Discussion

This example uses a state machine to simulate customers moving through different states on a website. Notice how in transitions, a oneOf generator can be used to dynamically pick the next state.


Debezium envelopes have 3 discrete states


Discussion

This example models Debezium change data capture envelopes. It uses a state machine to track whether a row is inserted for the first time, updated, or deleted.

Debezium objects show you how an entire object has changed, in both before and after states, so simulating it requires remembering previously generated data. This example uses two ways to do that:

  1. The state machine is configured with "merge": { "previous": true } so that the previous event will be deep merged into the current.
  2. The previousEvent generator accesses the last event's values and swaps the before and after states.

3 support agents field phone calls, arriving once a second


Discussion

This example creates exactly 3 support agent events, and an indefinite number of call events.

Notice how calls uses fork and a lookup for its key. In forks, the key must be unique, so this has the effect that each agent can only field one call at a time. If two forks are spawned with the same key, the newer one is stopped from launching.


Flights take off every 5 seconds and report their geolocation


Discussion

This example simulates random flight paths between different geolocations. waypoints computes the steps between the coordinates, and repeated calls serve them up one at a time.


Every ~2 seconds, a new game is scheduled to start with bets placed every ~500ms


Discussion

This example creates a fork per game, where each game moves through different states of completion. Each game is scheduled about 2000 milliseconds after the previous, with some jitter created by the normalDistribution generator.


Bots post social content that get likes and shares only 5% of the time each


Discussion

This example uses a state machine to weight the activity of a social media post. Most posts don't get much traffic; some get a lot of traffic; others get a lot of spam. This example uses merge previous to automatically merge in previous activity.


Latency is about 10 milliseconds, with bursts to 50 and 150 every 2 and 5 minutes


Discussion

This example uses the intervals function to change how the generator behaves over time. Whenever the wallclock time hits a 5-minute mark, the latency shifts from 10 to 150; whenever it hits a 2-minute mark, it shifts to 50. Notice how some wallclock values, like 10:20 AM, are divisible by both 5 and 2. intervals is evaluated in priority order, so in this case the burst to 150 takes priority because it's listed first.


A server transitions between healthcheck statuses every 15 seconds


Discussion

This example uses a state machine to weight the probability of a server transitioning from ok, to warn, to bad. The next status mostly resembles the previous one.


A client emits OTEL telemetry data


Discussion

A reference implementation to generate minimal OpenTelemetry events.


The Northwind data set


Discussion

A reference configuration to generate an indefinite amount of Northwind data to MotherDuck.
