Barnum is a small application that can be used to create pseudo-realistic data suitable for unit or performance testing.

What is Barnum?
===============

Barnum is a Python-based application for quickly and easily creating
pseudo-random data of the kind typically used for application testing.

Why did you create Barnum?
==========================

I am developing a shopping cart application in Django and realized that I 
needed a bunch of data to simulate the store's behavior under somewhat normal 
production usage.  

I got tired of always trying to think of names and addresses for customers, so
I decided to automate the process a little bit.  Thus Barnum was born.

Why is Barnum unique?
=====================

I was able to find some online systems for generating large amounts of test
data, but I could not find any application that had both the breadth of data
generation capabilities and the ability to interface easily with Django in the
way I wanted.

One of the most distinctive aspects of Barnum is that the data is what I'll
call "plausible."  For example, here's an "identity" randomly generated by
Barnum:
    Sid Seymour
    10 Kimbrough Grove Drive
    Arthur ND, 58006
    (701)642-6471

    Who works at:
    Network Hardware Co as a Personnel Clerk Senior

You should notice a couple of things about this data.
 - There's a realistic first and last name
 - The street names are also plausible
 - Arthur, ND is a real city, and its zip code is 58006
 - 701 is an area code used for North Dakota
 - The fictional company is somewhat reasonable.
 - The job position also makes sense.

Why not just use Random to create strings of letters?
=====================================================

Well, I find that when testing applications, if it's just a random string
of numbers or letters, it gets hard to tell if something is out of place
or "looks wrong."  If you'd like to just generate totally random information,
then you probably don't need Barnum!

What type of information does Barnum generate?
==============================================

Here's a list of types of dummy data Barnum can create:
 - First name and/or last name for either gender
 - Job title
 - Phone number
 - Street number and name
 - Zip code plus city & state
 - Company name
 - Credit card number and type (with valid checksum)
 - Dates
 - Email addresses
 - Sample password
 - Words (Latin)
 - Sentences and/or paragraphs of random Latin words

How do I use it?
================

The gen_data.py script is the primary showcase for how to create random data
using Barnum.  If you run it from the command line:

 python gen_data.py
 
You'll see some sample data output.

If you'd like to call it from another script, here's an example or two from the
interpreter:

Python 2.4.2 (#1, Feb  9 2006, 05:29:30)
[GCC 3.4.4 (Gentoo 3.4.4-r1, ssp-3.4.4-1.0, pie-8.7.8)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import barnum.gen_data as gen_data
>>> gen_data.create_name()
('Danilo', 'Rendon')
>>> gen_data.create_name()
('Melodie', 'Kraft')
>>> gen_data.create_name()
('Laverne', 'Hopson')
>>> gen_data.create_city_state_zip()
('36475', 'Repton', 'AL')
>>> gen_data.create_city_state_zip()
('01090', 'West Springfield', 'MA')
>>> gen_data.create_phone()
'(907)339-3308'
>>> gen_data.create_phone('38138')
'(901)606-5635'
>>> gen_data.create_sentence()
'Delenitaugue iriure zzril euismod dolore vulputate iriuredolor iriure eu.'
>>> gen_data.create_sentence()
'Consequatvel in blandit praesent veniam in ex illum vulputate feugait molestie.'
>>> gen_data.cc_number()
('visa', ['4532837148746906'])
>>> gen_data.cc_number()
('mastercard', ['5417967544412568'])


You can see that it should be trivial to incorporate this data into any Python script.
The possibilities for creating CSVs, raw SQL, Python objects, etc. are practically
endless!
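
To make the CSV idea a bit more concrete, here's a minimal sketch.  It only
relies on the calls shown in the interpreter session above (create_name,
create_city_state_zip and create_phone) and on the return shapes seen there.
The StandInGen class and the customers_csv function are my own hypothetical
names, not part of Barnum; the stand-in exists only so the sketch runs without
Barnum installed.  It is written for a current Python 3, and I'm assuming
rather than confirming that barnum.gen_data can be passed in unchanged there.

```python
import csv
import io


class StandInGen:
    """Hypothetical stand-in for barnum.gen_data (not part of Barnum).

    Returns fixed values copied from the interpreter session, in the same
    shapes: create_name() -> (first, last),
    create_city_state_zip() -> (zip, city, state), create_phone(zip) -> str.
    """

    def create_name(self):
        return ("Danilo", "Rendon")

    def create_city_state_zip(self):
        return ("36475", "Repton", "AL")

    def create_phone(self, zip_code=None):
        return "(907)339-3308"


def customers_csv(gen, count):
    """Return CSV text holding `count` fake customers drawn from `gen`."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["first", "last", "city", "state", "zip", "phone"])
    for _ in range(count):
        first, last = gen.create_name()
        zip_code, city, state = gen.create_city_state_zip()
        # The session passes a zip code to create_phone; presumably that
        # keeps the area code consistent with the location (an assumption).
        phone = gen.create_phone(zip_code)
        writer.writerow([first, last, city, state, zip_code, phone])
    return buffer.getvalue()


print(customers_csv(StandInGen(), 2))
```

With Barnum importable, the idea would be to pass the real module instead of
the stand-in, e.g. customers_csv(gen_data, 100).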

Where does the data come from?
==============================

I pulled sample data and existing scripts from a bunch of different sources.
- The names are from 1990 US Census data http://www.census.gov/genealogy/names/names_files.html
- The street names are from real US streets in a few locales.
- Company names are randomly generated by me.
- Job titles were taken from another census site that I can't seem to find now.
- Zip codes are from http://www.cfdynamics.com/cfdynamics/zipbase/index.cfm
- Random Latin text came from http://www.4guysfromrolla.com/webtech/052800-1.shtml
- Credit card generator is from Graham King - http://www.darkcoding.net/index.php/credit-card-numbers/
- Password generator is from Pradeep Kishore Gowda via the Python Cookbook

How can I add more data?
========================

If all you'd like to do is add some more seed data to an existing source, edit the appropriate
file in the source-data directory and execute the convert_data.py script to create a new
pickle file.

How can I contribute?
=====================

Just ask. I can't foresee this script needing its own mailing list, so for now, use the ticket
system on Google Code to submit a ticket with your suggestion/patch.

Why is this so US focused?
==========================

I needed info for the US only.  I had access to this data and knew what I wanted.  If you
would like to add other countries or info, feel free to contribute!


Can this be used for evil?
==========================

Ummm.  Probably not.  All of the data is random.  The credit card numbers conform to the
Luhn 10 checksum formula but are not necessarily valid numbers.  Even if they were, you would
need to know the real name, address and phone number before you could do anything illegal
with the data.  I think we're all pretty safe.
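
If you're curious what "conforms to the Luhn 10 checksum" means in practice,
here's a minimal sketch of the standard check.  This is not the code from
gencc.py, and luhn_valid is a name I made up for illustration.  The two
numbers tested are the ones from the interpreter session earlier in this
README.

```python
def luhn_valid(number):
    """Return True if the digit string passes the Luhn (mod 10) check."""
    if not number.isdecimal():
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for index, char in enumerate(reversed(number)):
        digit = int(char)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_valid("4532837148746906"))  # True
print(luhn_valid("5417967544412568"))  # True
print(luhn_valid("4532837148746907"))  # False (last digit changed)
```

Passing this check only says the digits are internally consistent.  It says
nothing about whether an account with that number exists.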

Where did this name come from?
==============================

Choosing names for projects is kind of fun but kind of a hassle.  There needs to be a name
but it can't be anything too stupid.  I started off thinking of an acronym and ended up with
PT ("Python Testing") and immediately thought of P.T. Barnum.  I really liked the name
because I was using this for Satchmo, a project made in Django.  Single-word names seemed
cool.  Also, I like the fact that P.T. Barnum was really a master at making people think
something was real that wasn't.  Which is exactly what this little script does.


Why is it licensed under the GPL?
=================================

I use a couple of other Python scripts that were licensed under the GPL.  So, I figured it
was best to just release under the GPL.  If you would like another license arrangement,
let me know and I'll see if there's something we can do.