A tool to generate dummy data (name, address, phone number, email, social graph connections, etc.) for populating systems for testing

Dummy Data Generator BETA

Dummy Data Generator is just that... a generator that will spit out gobs of "made up" data
which can be used to populate systems for test and development purposes.  The kind of data
this program deals with is "people stuff." That is, names, addresses, email addresses, phone
numbers, "connections" (aka social graph), etc.

Names are generated from US Census data on the most common last names, female first names,
and male first names.  Each data file is read into its own list.  A last name is chosen
randomly, then a random "male or female" decision is made, and a first name is chosen
randomly from the corresponding first name list.
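
To make the name steps above concrete, here is a minimal sketch.  It is not the program's
actual code: the class name NameSketch, the pick helper, and the tiny hard-coded lists are
all made up for illustration, standing in for the Census data files.

```java
import java.util.List;
import java.util.Random;

final class NameSketch {
    // Stand-in sample values; the real program reads these lists from data files.
    static final List<String> LAST = List.of("Smith", "Johnson", "Williams");
    static final List<String> FEMALE = List.of("Mary", "Patricia", "Linda");
    static final List<String> MALE = List.of("James", "John", "Robert");

    // Return one uniformly random element of a non-empty list.
    static <T> T pick(List<T> items, Random rng) {
        return items.get(rng.nextInt(items.size()));
    }

    static String randomName(Random rng) {
        String last = pick(LAST, rng);                         // last name first
        List<String> firstNames = rng.nextBoolean() ? FEMALE : MALE; // coin flip
        return pick(firstNames, rng) + " " + last;
    }

    public static void main(String[] args) {
        System.out.println(randomName(new Random()));
    }
}
```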

Address data is composed as follows:  There is a data file full of nouns, and a data file of
street suffixes ("lane," "road," "street," etc.).  Each file is read into a list, and a
random element is selected from each list to generate something like "Mother-in-law Park" for
the street name.  The street number is just a random number.  The city, state, zip and country
are based on two data files, one for the USA and one for Canada, each listing zip/postal
codes with their corresponding city and state (or province).  Each file is read into a list,
a random "USA or Canada" selection is made, and then a record is randomly selected from the
corresponding list.
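
The address steps above could look roughly like the sketch below.  Everything named here is
hypothetical (AddressSketch, the sample rows, the output layout), and the 1 to 9999 range for
the street number is an assumption, since the description only says "a random number."

```java
import java.util.List;
import java.util.Random;

final class AddressSketch {
    // Stand-in sample values; the real program reads these from data files.
    static final List<String> NOUNS = List.of("Mother-in-law", "Anchor", "Lantern");
    static final List<String> SUFFIXES = List.of("Lane", "Road", "Street", "Park");
    // Each row: zip/postal code, city, state or province.
    static final List<String[]> USA = List.of(
        new String[] {"12345", "Schenectady", "NY"},
        new String[] {"90210", "Beverly Hills", "CA"});
    static final List<String[]> CANADA = List.of(
        new String[] {"K1A 0B1", "Ottawa", "ON"},
        new String[] {"M5V 3L9", "Toronto", "ON"});

    static <T> T pick(List<T> items, Random rng) {
        return items.get(rng.nextInt(items.size()));
    }

    static String randomAddress(Random rng) {
        int number = 1 + rng.nextInt(9999);           // assumed range: 1..9999
        String street = pick(NOUNS, rng) + " " + pick(SUFFIXES, rng);
        boolean usa = rng.nextBoolean();              // "USA or Canada"
        String[] place = pick(usa ? USA : CANADA, rng);
        String country = usa ? "USA" : "Canada";
        return number + " " + street + ", " + place[1] + ", " + place[2]
            + " " + place[0] + ", " + country;
    }

    public static void main(String[] args) {
        System.out.println(randomAddress(new Random()));
    }
}
```

Picking the country first and then a whole record keeps the city, state and postal code
consistent with each other, which is the point of using one record per line.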

Email addresses are just a random string for the user portion, and always use "example.com"
for the domain.  This is because example.com is a dedicated domain which is reserved for
testing and is guaranteed to result in an undeliverable address.  So you don't have to
worry about accidentally spamming real people using this data.
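
A rough sketch of that idea follows.  The alphabet and the length of the user portion are
assumptions (the description above only says "a random string"), and EmailSketch is a
made-up name.

```java
import java.util.Random;

final class EmailSketch {
    // Assumed alphabet for the user portion: lowercase letters and digits.
    static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789";

    static String randomEmail(Random rng, int userLength) {
        StringBuilder user = new StringBuilder(userLength);
        for (int i = 0; i < userLength; i++) {
            user.append(ALPHABET.charAt(rng.nextInt(ALPHABET.length())));
        }
        // The domain is always the reserved example.com.
        return user + "@example.com";
    }

    public static void main(String[] args) {
        System.out.println(randomEmail(new Random(), 10));
    }
}
```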

Phone numbers are just random numbers formatted to look like phone numbers
in XXX-XXX-XXXX format.  But note that the only rule in effect is that area codes are 3
digits and prefixes are 3 digits.  Some of the generated area codes are probably non-existent
in real life.  Again, this is DUMMY data... but if you have a validation rule for input, this
could trip you up.  A future version might use a list of real area codes or something.  
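
Here is a small sketch of the formatting described above, plus an example of the kind of
validation rule that could reject the output.  PhoneSketch and both method names are
hypothetical.  The check uses the North American Numbering Plan rule that area codes and
prefixes start with a digit from 2 to 9; a real validator may well be stricter.

```java
import java.util.Random;

final class PhoneSketch {
    // Random digits in XXX-XXX-XXXX form, zero-padded; no other rules applied.
    static String randomPhone(Random rng) {
        return String.format("%03d-%03d-%04d",
            rng.nextInt(1000), rng.nextInt(1000), rng.nextInt(10000));
    }

    // Example validation rule: area code and prefix must start with 2-9.
    static boolean passesBasicNanpCheck(String phone) {
        return phone.matches("[2-9]\\d{2}-[2-9]\\d{2}-\\d{4}");
    }

    public static void main(String[] args) {
        String phone = randomPhone(new Random());
        System.out.println(phone + " passes check: " + passesBasicNanpCheck(phone));
    }
}
```

With a rule like this in place, a fair share of the generated numbers would be rejected,
since any area code or prefix starting with 0 or 1 fails the check.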

The only output format is currently CSV, and you can't specify the output file
name; it's hard-coded to dummydata_out.csv.  Future versions will add additional output
formats, probably including various forms of delimited text (tab delimited, pipe
delimited, etc.), as well as some sort of XML format, FOAF (encoded as RDF/XML, N3, Turtle
or something) and/or JSON.

Outputting social-graph "connections" is now supported. If you specify -graph <num> on the
command line, the program will generate a file named social_graph.out, which contains a list
of randomly generated social graph connections, serialized in adjacency matrix form,
using the "id" parameter of the person records.
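
For anyone unfamiliar with adjacency matrices, the sketch below shows the general idea.  It
is not the program's actual algorithm or file format: GraphSketch, the connection
probability, the choice to make connections mutual, and the 0/1 text layout are all
assumptions.  Row i and column j correspond to the person records with ids i and j.

```java
import java.util.Random;

final class GraphSketch {
    // Build a random symmetric matrix: matrix[i][j] is true when i and j are connected.
    static boolean[][] randomGraph(int people, double probability, Random rng) {
        boolean[][] matrix = new boolean[people][people];
        for (int i = 0; i < people; i++) {
            for (int j = i + 1; j < people; j++) {   // skip i == j: no self-connections
                boolean connected = rng.nextDouble() < probability;
                matrix[i][j] = connected;
                matrix[j][i] = connected;            // assumed: connections are mutual
            }
        }
        return matrix;
    }

    // One text line per person, with 1 for a connection and 0 for none.
    static String toText(boolean[][] matrix) {
        StringBuilder out = new StringBuilder();
        for (boolean[] row : matrix) {
            for (int j = 0; j < row.length; j++) {
                if (j > 0) out.append(' ');
                out.append(row[j] ? '1' : '0');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(toText(randomGraph(5, 0.3, new Random())));
    }
}
```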

The -prompt command line argument will eventually invoke an interactive interface, but
it's not completely implemented yet, and using -prompt will result in an exception being thrown
and the program exiting.

The run.bat is not finished.  I don't remember the DOS batch language stuff well enough to 
write it from memory at the moment, and since I develop on Linux, I haven't made it a high
priority.  Feel free to submit a patch.
