A tool to generate dummy data (name, address, phone number, email, social graph connections, etc.) for populating systems for testing
latest commit dddbbde216
Phillip Rhodes authored
Dummy Data Generator BETA

Dummy Data Generator is just that: a generator that will spit out gobs of "made up" data which can be used to populate systems for test and development purposes. The kind of data this program deals with is "people stuff": names, addresses, email addresses, phone numbers, "connections" (aka social graph), etc.

Names are generated using US Census data on the most common last names, female first names, and male first names. Each data file is read into a list. A last name is chosen at random, then a random "male or female" decision is made, and a first name is chosen at random from the corresponding first name list.

Address data is composed as follows: there is a data file full of nouns, and a data file of street suffixes ("lane," "road," "street," etc.). Each file is read into a list, and a random element is selected from each list to generate something like "Mother-in-law Park" for the street name. The street number is just a random number. The city, state, zip and country are based on two data files, one for the USA and one for Canada, each listing zip/postal codes with their corresponding city and state (or province). Each file is read into a list, a random "USA or Canada" selection is made, and then a record is randomly selected from the corresponding list.

Email addresses are just a random string for the user portion, and always use "example.com" for the domain. This is because example.com is a reserved domain (see RFC 2606) set aside for documentation and testing, so mail sent to these addresses will not reach a real mailbox. You don't have to worry about accidentally spamming real people using this data.

Phone numbers are just random numbers formatted to look like phone numbers in XXX-XXX-XXXX format. Note that the only rule in effect is that area codes are 3 digits and prefixes are 3 digits. Some of the generated area codes probably don't exist in real life. Again, this is DUMMY data... but if you have validation rules for input, this could trip you up. A future version might use a list of real area codes or something.

Currently the only output format is CSV, and you can't yet specify the output file name. It's hard-coded to dummydata_out.csv. Future versions will add additional output formats, probably including various forms of delimited text (tab delimited, pipe delimited, etc.), as well as some sort of XML format, FOAF (encoded as RDF/XML, N3 or Turtle or something) and/or JSON.

Outputting social-graph "connections" is now supported. If you specify -graph <num> on the command line, the program will generate a file named social_graph.out, which contains a list of randomly generated social graph connections in adjacency matrix form, using the "id" parameter of the person records.

The -prompt command line argument will eventually invoke an interactive interface, but it's not completely implemented yet. For now, using -prompt will result in an exception being thrown and the program exiting.

The run.bat is not finished. I don't remember the DOS batch language stuff well enough to write it from memory at the moment, and since I develop on Linux, I haven't made it a high priority. Feel free to submit a patch.
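
For readers who want to see the generation rules described above in one place, here is a minimal sketch in Python. It is not the program's actual code. The tiny in-memory lists stand in for the real data files, and every function name, field name and the CSV column layout are assumptions made up for illustration.

```python
import csv
import io
import random
import string

# Illustrative placeholder data; the real program reads these from data files.
LAST_NAMES = ["Smith", "Johnson", "Williams"]
FEMALE_FIRST_NAMES = ["Mary", "Patricia", "Linda"]
MALE_FIRST_NAMES = ["James", "John", "Robert"]
NOUNS = ["Mother-in-law", "Window", "River"]
STREET_SUFFIXES = ["Lane", "Road", "Street", "Park"]
# (postal code, city, state/province) records, one list per country.
USA_POSTAL = [("27510", "Carrboro", "NC"), ("10001", "New York", "NY")]
CANADA_POSTAL = [("K1A 0B1", "Ottawa", "ON"), ("V6B 1A1", "Vancouver", "BC")]

# Hypothetical column layout; the real CSV layout is not documented here.
FIELDS = ["id", "first_name", "last_name", "street", "city",
          "region", "postal_code", "country", "email", "phone"]


def random_name(rng):
    """Pick a last name, then pick a first name from a randomly chosen list."""
    last = rng.choice(LAST_NAMES)
    first_names = rng.choice([FEMALE_FIRST_NAMES, MALE_FIRST_NAMES])
    return rng.choice(first_names), last


def random_address(rng):
    """Random number + noun + suffix, then a postal record from USA or Canada."""
    street = f"{rng.randint(1, 9999)} {rng.choice(NOUNS)} {rng.choice(STREET_SUFFIXES)}"
    country, records = rng.choice([("USA", USA_POSTAL), ("Canada", CANADA_POSTAL)])
    postal, city, region = rng.choice(records)
    return street, city, region, postal, country


def random_email(rng, length=10):
    """Random lowercase user part, always at the reserved example.com domain."""
    user = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
    return f"{user}@example.com"


def random_phone(rng):
    """XXX-XXX-XXXX with no check that the area code exists in real life."""
    return (f"{rng.randint(0, 999):03d}-"
            f"{rng.randint(0, 999):03d}-"
            f"{rng.randint(0, 9999):04d}")


def generate_person(person_id, rng):
    """Assemble one dummy person record as a dict keyed by FIELDS."""
    first, last = random_name(rng)
    street, city, region, postal, country = random_address(rng)
    return {
        "id": person_id, "first_name": first, "last_name": last,
        "street": street, "city": city, "region": region,
        "postal_code": postal, "country": country,
        "email": random_email(rng), "phone": random_phone(rng),
    }


def to_csv(people):
    """Serialize person records to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(people)
    return buf.getvalue()


if __name__ == "__main__":
    rng = random.Random()
    print(to_csv([generate_person(i, rng) for i in range(1, 6)]))
```

Passing a seeded random.Random instance (for example random.Random(42)) makes the output repeatable, which can be handy when the dummy data feeds automated tests.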