Kata03: How Big? How Fast?

Rough estimation is a useful talent to possess. As you’re coding away, you may suddenly need to work out approximately how big a data structure will be, or how fast some loop will run. The faster you can do this, the less the coding flow will be disturbed.

So this is a simple kata: a series of questions, each asking for a rough answer. Try to work each out in your head. I’ll post my answers (and how I got them) in a week or so.
<h2>How Big?</h2>

Roughly how many binary digits (bits) are required for the unsigned representation of:
    1,000
    1,000,000
    1,000,000,000
    1,000,000,000,000
    8,000,000,000,000

    1,000 -> 2**10 = 1,024, so 10 bits                           # right
    1,000,000 -> guessed 32 bits (next size up from an int)      # wrong: 2**32 is 4,294,967,296; 2**20 = 1,048,576, so 20 bits
    1,000,000,000 -> guessed 48 bits (bigint-ish)                # wrong: 2**30 = 1,073,741,824, so 30 bits
    1,000,000,000,000 -> no guess                                # 2**40 = 1,099,511,627,776, so 40 bits
    8,000,000,000,000 -> previous answer plus 3 bits (8 = 2**3)  # right: 2**43 = 8,796,093,022,208, so 43 bits
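
These answers can be checked with Python's built-in `int.bit_length()`. The sketch below is only a cross-check; the helper name `bits_needed` is mine and not part of the kata. The rule of thumb behind the mental arithmetic is that 2**10 is roughly 10**3, so each factor of 1,000 costs about 10 bits.

```python
def bits_needed(n):
    """Number of bits in the unsigned binary representation of a positive integer."""
    return n.bit_length()

for n in (1_000, 1_000_000, 1_000_000_000, 1_000_000_000_000, 8_000_000_000_000):
    bits = bits_needed(n)
    # 2**bits is the first power of two strictly greater than n.
    print(f"{n:>17,} needs {bits} bits (2**{bits} = {2**bits:,})")
```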


My town has approximately 20,000 residences. How much space is required to store the names, addresses, and a phone number for all of these (if we store them as characters)?

In [1]:
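# 20000 records * (2 bytes per character * 6 fields of varchar(64) + 24 bytes for the phone number)
# The six text fields are first name, last name, and four address lines.
unicode = 2  # assumed bytes per character
print("%d kilobytes" % (20000 * (unicode * (6 * 64) + 24) / 1024))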

15468 kilobytes
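
The figure above is a worst case, because it assumes every field is filled to its maximum width. As a rough sketch of how much the answer depends on that choice, the snippet below compares the maximum widths with some typical lengths. The averages (15, 40, and 12 characters) are my own guesses rather than measured data, `storage_bytes` is a hypothetical helper, and here the phone number is also stored at 2 bytes per character.

```python
def storage_bytes(records, chars_per_record, bytes_per_char=2):
    """Total bytes for `records` rows holding `chars_per_record` characters each."""
    return records * chars_per_record * bytes_per_char

RESIDENCES = 20_000

# Worst case: six 64-character text fields plus a 24-character phone number.
worst = storage_bytes(RESIDENCES, 6 * 64 + 24)

# Typical case (assumed averages): 15-char name, 40-char address, 12-char phone.
typical = storage_bytes(RESIDENCES, 15 + 40 + 12)

print(f"worst case: {worst / 1024**2:.1f} MiB")
print(f"typical:    {typical / 1024**2:.1f} MiB")
```

Either way the whole data set fits comfortably in memory, which is usually the decision such an estimate is meant to support.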


I’m storing 1,000,000 integers in a binary tree. Roughly how many nodes and levels can I expect the tree to have? Roughly how much space will it occupy on a 32-bit architecture?

In [2]:
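# A full binary tree with 20 levels holds 2**20 - 1 = 1048575 nodes, enough for 1,000,000 integers.
# The node count is the geometric series 2**0 + 2**1 + ... + 2**19.
nodeCount = sum([2**i for i in range(20)])
print(nodeCount)
# Counts only the 32-bit integer in each node; child pointers would add to this.
print("%.1f MB" % (nodeCount * 32/8/1024/1024)) # 32 bits / bits per byte / bytes per KB / KB per MB

1048575
4.0 MB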
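
The calculation above counts only the integer stored in each node. A linked tree node usually also carries a left and a right child pointer, so a fuller estimate might look like the sketch below. The node layout (one 4-byte value plus two 4-byte pointers, with no allocator overhead) is an assumption, and `tree_estimate` is a hypothetical helper name.

```python
def tree_estimate(n_values, value_bytes=4, pointer_bytes=4):
    """Estimate (levels, total bytes) for a balanced binary tree with one value per node."""
    # A tree with L full levels holds 2**L - 1 nodes; bit_length() gives
    # the smallest L for which 2**L - 1 >= n_values.
    levels = n_values.bit_length()
    # Assumed node layout: the value plus left and right child pointers.
    node_bytes = value_bytes + 2 * pointer_bytes
    return levels, n_values * node_bytes

levels, total_bytes = tree_estimate(1_000_000)
print(f"{levels} levels, about {total_bytes / 1024**2:.1f} MiB")
```

Under these assumptions the tree needs roughly three times the space of the bare integers, and an unbalanced tree could be far deeper than 20 levels.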


<h2>How Fast?</h2>

My copy of Meyer’s Object Oriented Software Construction has about 1,200 body pages. Assuming no flow control or protocol overhead, about how long would it take to send it over an async 56k baud modem line?

In [3]:
# Google says 250-300 words per page, avg word length 5.1 characters
pages = 1200
words = 275
wordlen = 5.1
bitsperbyte = 8
k = 1000  # a 56k modem line carries 56,000 bits per second, so k is decimal here

data = pages*words*wordlen*bitsperbyte/k  # kilobits to send

kbps = 56
print("%d seconds" %(data/kbps))


240 seconds
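
The estimate above counts only the letters in each word. As a hedged refinement, the sketch below adds one space per word and shows the effect of the start and stop bits that asynchronous serial links commonly wrap around each byte (10 bits on the wire per character). The line rate of 56,000 bits per second and the helper name `transfer_seconds` are assumptions for illustration.

```python
def transfer_seconds(chars, line_bps=56_000, bits_per_char=8):
    """Seconds to send `chars` characters at `line_bps` bits per second."""
    return chars * bits_per_char / line_bps

# Assumed text size: 1,200 pages x 275 words x (5.1 letters + 1 space).
chars = 1200 * 275 * (5.1 + 1)

print(f"8 bits per character:  {transfer_seconds(chars):.0f} s")
# Async framing: 1 start bit + 8 data bits + 1 stop bit per character.
print(f"10 bits per character: {transfer_seconds(chars, bits_per_char=10):.0f} s")
```

Both variants land in the range of a few minutes, so the rough answer is the same: about five minutes, not seconds and not hours.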


My binary search algorithm takes about 4.5 ms to search a 10,000 entry array, and about 6 ms to search 100,000 elements. How long would I expect it to take to search 10,000,000 elements (assuming I have sufficient memory to prevent paging)?

In [4]:
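# Binary search is O(log(n))
import math
print(math.log10(10000))   # 4 -> 4.5 ms
print(math.log10(100000))  # 5 -> 6.0 ms

# 1.5 ms per order of magnitude; 10,000,000 is 10**7, three orders above 10,000
print(4.5 + 1.5 * (math.log10(10000000) - math.log10(10000)))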



4.0
5.0
9.0
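
The same extrapolation can be written as a two-point fit of the model t = a + b * log10(n). This is a minimal sketch that assumes the running time really is linear in log(n) over the whole range; `fit_log_model` is a hypothetical helper name.

```python
import math

def fit_log_model(n1, t1, n2, t2):
    """Fit t = a + b * log10(n) through two (size, time) measurements; returns (a, b)."""
    b = (t2 - t1) / (math.log10(n2) - math.log10(n1))
    a = t1 - b * math.log10(n1)
    return a, b

# Measurements from the question: 4.5 ms at 10,000 entries, 6.0 ms at 100,000.
a, b = fit_log_model(10_000, 4.5, 100_000, 6.0)
predicted_ms = a + b * math.log10(10_000_000)
print(f"slope {b:.1f} ms per factor of 10, predicted {predicted_ms:.1f} ms")
```

With only two data points the fit is exact by construction, so it says nothing about measurement noise or cache effects at larger sizes.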


Unix passwords are stored using a one-way hash function: the original string is converted to the ‘encrypted’ password string, which cannot be converted back to the original string. One way to attack the password file is to generate all possible cleartext passwords, applying the password hash to each in turn and checking to see if the result matches the password you’re trying to crack. If the hashes match, then the string you used to generate the hash is the original password (or at least, it’s as good as the original password as far as logging in is concerned). In our particular system, passwords can be up to 16 characters long, and there are 96 possible characters at each position. If it takes 1 ms to generate the password hash, is this a viable approach to attacking a password?

In [5]:
(96**16)/1000/60/60/24/365 # No! 1.65e+21 years

1.6501868488916562e+21
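
The one-liner above counts only the 16-character passwords. Since passwords can be up to 16 characters long, a slightly fuller sketch sums over every length from 1 to the maximum. The helper name `years_to_exhaust` is hypothetical, and the figures assume one hash at a time at 1 ms each with no parallelism.

```python
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

def years_to_exhaust(max_len, alphabet=96, hash_seconds=0.001):
    """Years needed to hash every password of length 1..max_len, one at a time."""
    candidates = sum(alphabet**length for length in range(1, max_len + 1))
    return candidates * hash_seconds / SECONDS_PER_YEAR

print(f"up to 16 characters: {years_to_exhaust(16):.2e} years")

# Shorter limits shrink the search space dramatically.
for max_len in (4, 6, 8):
    print(f"up to {max_len} characters: {years_to_exhaust(max_len):.2e} years")
```

The shorter lengths add only about 1% to the total, so the conclusion stands: exhaustive search of the full space is not viable, although short passwords are a very different story.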