Data Generator

This is a collection of tools to generate random sample data in a given directory and simulate file operations on the generated data.

The suite of tools can be deployed on a Kubernetes cluster using the provided Ansible Playbook.

Generating sample data

file-generator.py is a Python program that generates sample data in any given directory. It accepts the following command-line arguments:

| Argument | Type | Required | Description |
|---|---|---|---|
| `--size` | string | Yes | Total size of random data to create, in supported units: `b`, `Ki`, `Mi`, `Gi`, `Ti` |
| `--max-files` | int | Yes | Maximum number of files to create |
| `--min-files` | int | No | Minimum number of files to create (defaults to 1) |
| `--dest-dir` | string | Yes | Destination directory for generated data |
| `--help`, `-h` | - | No | Print help |
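The size units in the table could be parsed with a small helper like this one (hypothetical; the real script may parse them differently):

```python
# Hypothetical helper for the units listed above; the real script may
# implement this differently.
UNITS = {"b": 1, "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}

def parse_size(size: str) -> int:
    """Convert a size string such as '10Mi' or '512b' into bytes."""
    # Check two-character suffixes before the one-character 'b'.
    for suffix in ("Ki", "Mi", "Gi", "Ti", "b"):
        if size.endswith(suffix):
            return int(size[: -len(suffix)]) * UNITS[suffix]
    raise ValueError(f"unsupported unit in {size!r}")
```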

When --min-files and --max-files are provided, the program spreads the total size --size over a number of files in the range [min-files, max-files].
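One way to implement that spread (a sketch, not necessarily the script's actual algorithm):

```python
import random

def split_size(total_bytes: int, min_files: int, max_files: int) -> list[int]:
    """Pick a file count in [min_files, max_files] and spread total_bytes over it."""
    count = random.randint(min_files, max_files)
    base, remainder = divmod(total_bytes, count)
    # The first `remainder` files get one extra byte so the sizes sum exactly.
    return [base + (1 if i < remainder else 0) for i in range(count)]
```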

Simulating operations on sample data

file-operations.py is a Python program that performs file operations, chosen at random, on files in the given directory.

The supported file operations are:

- Creating a new file
- Appending data to an existing file
- Deleting an existing file
- Removing bytes from an existing file
- Changing permissions on an existing file

It accepts the following command-line arguments:

| Argument | Type | Required | Description |
|---|---|---|---|
| `--buffer` | string | Yes | Extra wiggle room for file operations, in supported units: `b`, `Ki`, `Mi`, `Gi`, `Ti` |
| `--dest-dir` | string | Yes | Destination directory for generated data |
| `--help`, `-h` | - | No | Print help |

Some file operations create additional data in existing files. The --buffer option sets an upper limit on the amount of additional data the program may create.
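For illustration, such a cap could be enforced with a simple byte budget (a hypothetical helper, not the script's actual implementation):

```python
class BufferBudget:
    """Track extra bytes written so they never exceed the --buffer limit (illustrative)."""

    def __init__(self, limit_bytes: int):
        self.limit = limit_bytes
        self.used = 0

    def reserve(self, n: int) -> int:
        """Return how many of the requested n bytes may still be written."""
        allowed = min(n, self.limit - self.used)
        self.used += allowed
        return allowed
```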

Sometimes, you might want to pause the file operations. To do so, set the PAUSE_OPERATIONS environment variable to True. The operations resume when it is set back to False.
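The pause behaviour could be implemented as a simple polling loop (a sketch; the script's actual polling logic may differ):

```python
import os
import time

def wait_if_paused(poll_seconds: float = 1.0) -> None:
    """Block while PAUSE_OPERATIONS is 'True'; return as soon as it is not."""
    while os.environ.get("PAUSE_OPERATIONS", "False") == "True":
        time.sleep(poll_seconds)
```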

file-operations.py runs a scanner thread in the background that periodically updates its list of files. You can set a custom interval for the scanner using the SCANNER_INTERVAL environment variable; by default it is 120 seconds. If the destination directory contains a huge number of files, consider setting this to a higher value. For Kubernetes deployments, both of the above environment variables are passed through ConfigMap settings:

```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: settings
data:
  # ConfigMap data values must be strings, so quote them.
  PAUSE_OPERATIONS: "False"
  SCANNER_INTERVAL: "600"
```

Deploy on Kubernetes

To deploy the above workloads on a Kubernetes cluster, simply run the Ansible Playbook:

```shell
ansible-playbook playbook.yml
```

The above playbook creates a Deployment that launches a Pod with two containers: one runs file-generator.py to create random data in a Persistent Volume, while the other runs file-operations.py to perform random operations on the generated data.
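The resulting manifest might look roughly like this (container names, commands, mount path, and PVC name here are all assumptions for illustration; the playbook's actual manifest may differ):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: data-generator
spec:
  replicas: 1
  selector:
    matchLabels:
      app: data-generator
  template:
    metadata:
      labels:
        app: data-generator
    spec:
      containers:
        - name: generator            # runs file-generator.py
          image: <your_image>
          command: ["python", "file-generator.py",
                    "--size", "1Gi", "--max-files", "100", "--dest-dir", "/data"]
          volumeMounts:
            - name: data
              mountPath: /data
        - name: operations           # runs file-operations.py
          image: <your_image>
          command: ["python", "file-operations.py",
                    "--buffer", "100Mi", "--dest-dir", "/data"]
          envFrom:
            - configMapRef:          # injects PAUSE_OPERATIONS and SCANNER_INTERVAL
                name: settings
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: data-generator-pvc
```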

The playbook uses defaults.yml for configuration. Here are the available options to configure the playbook:

| Variable | Description |
|---|---|
| `file_size` | Sets the `--size` option |
| `max_files` | Sets the `--max-files` option |
| `min_files` | Sets the `--min-files` option |
| `pvc_size` | Size of the volume (must be greater than or equal to `file_size`) |
| `buffer` | Sets the `--buffer` option |
| `namespace` | Namespace for the workload |
| `deployment_name` | Name of the workload Deployment |
| `image` | Workload Docker image (see the section below to build your own image) |
| `destroy` | Deletes the workload when set to `true` |
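An illustrative defaults.yml built from the variables above (the values are examples only, not the repository's actual defaults):

```yaml
# Example configuration; adjust values to your environment.
file_size: 1Gi
max_files: 100
min_files: 10
pvc_size: 2Gi          # must be >= file_size
buffer: 100Mi
namespace: data-generator
deployment_name: data-generator
image: <your_image>
destroy: false
```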

Build your own workload image

To build your own image, simply run:

```shell
docker build -t <your_image> -f Dockerfile .
```

To push, run:

```shell
docker push <your_image>
```

Set the image variable to use your own image for the workload in the Ansible Playbook.
