# Chapter 1 Solutions

### imports

In [1]:
import pandas as pd

from math import factorial
from collections import OrderedDict
from numpy import log, sqrt, exp
from scipy.optimize import newton

#### 1.1-1
a. Sorting: computing an exact set of quantiles on a small data set requires the values to be sorted.

b. Convex hull: the convex hull can be used to plan minimal paths around an obstacle. Imagine the interior points as points jutting from an obstacle, such as a building on a lake; the hull is the shortest boundary that encloses all of them.

#### 1.1-2
Other than speed, a person might use resource consumption (for example, memory) as a measure of efficiency in practice.

#### 1.1-3 Linked lists
a. strength: it is not statically sized, and thus can grow and shrink at runtime.

b. weakness: lookups are O(n), because reaching an element means walking the list from the head.
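
Both points can be seen in a minimal sketch of a singly linked list. The class and method names (`Node`, `LinkedList`, `push_front`, `find`) are illustrative choices, not taken from the exercise.

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next


class LinkedList:
    def __init__(self):
        self.head = None

    def push_front(self, value):
        # O(1): the list grows at runtime without any resizing or copying.
        self.head = Node(value, self.head)

    def find(self, value):
        # O(n): walk from the head until the value is found.
        # Returns the position of the first match, or -1 if absent.
        index = 0
        node = self.head
        while node is not None:
            if node.value == value:
                return index
            node = node.next
            index += 1
        return -1


lst = LinkedList()
for v in (3, 2, 1):
    lst.push_front(v)   # list is now 1 -> 2 -> 3
```

Here `lst.find(3)` has to visit every node before it succeeds, which is the linear lookup cost noted above.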

#### 1.1-4
The shortest path and travelling salesman problems are both optimization problems, and both ask for an optimal path. Shortest path is the much less complex of the two: efficient (polynomial-time) algorithms for it are known. The travelling salesman problem is NP-hard; it always has an optimal tour, but no polynomial-time algorithm for finding one is known.

#### 1.1-5
a. best only: a customer wants its top five stores by revenue.

b. approximate is good enough: quantiles need to be found on a petabyte-sized array of doubles.
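
As a rough illustration of case b, a quantile can be estimated from a random sample instead of sorting everything. This is only a sketch under assumptions: `approx_quantile`, its parameters, and the stand-in `data` are hypothetical, and a real petabyte-scale job would use a streaming or distributed method.

```python
import random


def approx_quantile(values, q, sample_size, seed=0):
    # Draw a random sample (with replacement), sort only the sample,
    # and read the quantile off the sorted sample.
    rng = random.Random(seed)
    sample = sorted(rng.choice(values) for _ in range(sample_size))
    position = min(int(q * sample_size), sample_size - 1)
    return sample[position]


data = range(1000000)                    # stand-in for a much larger array
estimate = approx_quantile(data, 0.5, 10000)
print(estimate)                          # close to the true median of ~500000
```

The estimate is not exact, but it only sorts 10,000 values rather than the full array, which is the trade-off the approximate case accepts.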

#### 1.2-1
Databases: sorting and searching play an important role in producing joins, filters, and grouped aggregations, as well as in many other places.

#### 1.2-2

For large values of n insertion sort has the longer run time, so we find the boundary where the two run times are equal and take the values of n below it. That is, we need to solve:

$$8n^2 = 64n\ln(n)$$

We use Newton's method, which returns about 26.09. Taking the floor gives 26, so insertion sort is faster for $2 \le n \le 26$ (at $n = 1$ the right-hand side is 0).

In [2]:
def fun(n):
    return 8*n**2 - 64*n*log(n)

newton(fun, 100000)

26.093485476611917
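
The solution above uses the natural log. If the logarithm in the exercise is instead read as base 2 (an assumption about the intended statement), the crossover moves. Because n is an integer, a direct scan is enough to check this; `insertion_beats_merge` and the scan limit of 100 are illustrative choices.

```python
from math import log2


def insertion_beats_merge(n):
    # 8n^2 steps versus 64 n lg n steps, with lg taken as log base 2 (assumption).
    return 8 * n ** 2 < 64 * n * log2(n)


# Scan a small range of integer sizes and keep those where insertion sort wins.
winners = [n for n in range(1, 101) if insertion_beats_merge(n)]
print(min(winners), max(winners))  # -> 2 43
```

Under the base-2 reading insertion sort wins for $2 \le n \le 43$, compared with $2 \le n \le 26$ under the natural-log reading.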

#### 1.2-3
Again we use Newton's method by setting
$$f(n) = 2^n - 100n^2$$

and solving
$$f(n) = 0$$

This yields a root of about 14.32 if the starting point is set between 15 and 57. Rounding up, the smallest integer $n$ with $100n^2 < 2^n$ is 15.

In [3]:
def fun2(n):
    return 2 ** n - 100 * n ** 2

In [4]:
newton(fun2, 57)

14.324727836998202

Let's try log-transforming the equation before setting it to 0 and solving.

$$2^n = 100n^2 \Rightarrow$$ 
$$n\ln(2) = 2\ln(n) + \ln(100) \Rightarrow$$
$$f(n) = n\ln(2) - 2\ln(n) - \ln(100)$$

and we solve

$$f(n) = 0$$

This yields the same root, about 14.32, but with a much wider range of usable starting points and better stability. Starting points between 10 and 1,000,000 (in steps of 10) all converge and avoid overflow.

In [5]:
def fun3(n):
    return n * log(2) - 2 * log(n) - log(100)

newton(fun3, 3)

14.324727836998203

In [6]:
# Check the space of acceptable start points; print the first one that fails.
for k in range(10, 1000000, 10):
    try:
        sol = newton(fun3, k)
        assert (sol < 15) and (sol > 14)
    except Exception:
        print(k)
        break
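
Since the exercise is about integer n, the root can be cross-checked without Newton's method by stepping through integers with exact arithmetic. A minimal sketch; the function name `smallest_n` is an illustrative choice.

```python
def smallest_n():
    # Step upward until the 100 n^2 algorithm is strictly faster than the 2^n one.
    n = 1
    while 100 * n ** 2 >= 2 ** n:
        n += 1
    return n


print(smallest_n())  # -> 15
```

At $n = 14$ we have $2^{14} = 16384 < 19600 = 100 \cdot 14^2$, and at $n = 15$ we have $2^{15} = 32768 > 22500 = 100 \cdot 15^2$, which agrees with a root near 14.32.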

#### 1-1

In [7]:
# helpers
def solve(f, bound, x0, maxiter=100):
    # Find n with f(n) == bound, where bound is the time budget
    # (already log-transformed by the caller when f works on a log scale).
    return newton(lambda a: bound - f(a), x0, maxiter=maxiter)

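def results(f, mstrans, x0, maxiter=100):
    # mstrans maps the budget in microseconds onto the scale f works on
    # (identity, or log when f is log-transformed).
    dct = OrderedDict()
    pairs = [
        ('sec'   , 1),
        ('min'   , 60),
        ('hour'  , 3600),
        ('day'   , 86400),
        ('month' , 2592000),     # 30 days
        ('year'  , 31104000),    # 12 months of 30 days
        ('cntry' , 3110400000)   # century: 100 such years
    ]
    
    for tup in pairs:
        us = tup[1] * 1000000  # seconds -> microseconds
        dct[tup[0]] = solve(f, mstrans(us), x0, maxiter)
    return dct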

def show(dct, header):
    print(header)
    # .items() works on both Python 2 and Python 3 (.iteritems() is Python 2 only)
    for k, v in dct.items():
        print('\t' + k + '\t:\t' + str(int(v)))

ident = lambda a: a

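def facsolve(bound):
    # Walk n upward until n! exceeds bound, then step back once:
    # the result is the largest n with n! <= bound.
    def _go(n, beenpos, beenneg):
        fac = factorial(n)
        # >= (not >) so that a bound exactly equal to n! also terminates
        if beenpos and beenneg and (bound - fac >= 0):
            return n
        elif bound - fac >= 0:
            beenpos = True
            return _go(n + 1, beenpos, beenneg)
        else:
            beenneg = True
            return _go(n - 1, beenpos, beenneg)
    return _go(1, False, False)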

def facresults():
    dct = OrderedDict()
    pairs = [
        ('sec'   , 1),
        ('min'   , 60),
        ('hour'  , 3600),
        ('day'   , 86400),
        ('month' , 2592000),
        ('year'  , 31104000),
        ('cntry' , 3110400000)
    ]
    
    for tup in pairs:
        us = tup[1] * 1000000  # seconds -> microseconds
        dct[tup[0]] = facsolve(us)
    return dct

###### ln(n)

One second is $10^6$ microseconds, so solving $\ln(n) = 10^6$ gives $n = e^{1000000}$ for one second. For a budget of $t$ seconds the solution is $n = e^{1000000 \cdot t}$.
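
These values are far too large to evaluate as floats, but their size can still be described by counting decimal digits, using $\log_{10}(e^x) = x \log_{10}(e)$. A small sketch; the helper name `decimal_digits_of_exp` is hypothetical.

```python
from math import e, floor, log10


def decimal_digits_of_exp(x):
    # Number of decimal digits of e**x, computed without forming e**x itself.
    return floor(x * log10(e)) + 1


print(decimal_digits_of_exp(10 ** 6))  # -> 434295
```

So the one-second answer for $\ln(n)$ is a number with 434,295 decimal digits.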

###### sqrt(n)

In [8]:
sols = results(sqrt, ident, 1000000000000, 10000000)
show(sols, 'Results for sqrt(n)')

Results for sqrt(n)
	sec	:	1000000000000
	min	:	3600000000000000
	hour	:	12960000000000000000
	day	:	7464960000000001048576
	month	:	6718464000000000805306368
	year	:	967458815999999995705032704
	cntry	:	9674588159999999635992931729408


###### n

In [9]:
sols = results(ident, ident, 10000000000, 1000)
show(sols, 'Results for n')

Results for n
	sec	:	1000000
	min	:	60000000
	hour	:	3600000000
	day	:	86400000000
	month	:	2592000000000
	year	:	31104000000000
	cntry	:	3110400000000000


###### n * ln(n)

In [10]:
sols = results(lambda a: log(a * log(a)), log, 1000, 1000)
show(sols, 'Results for n log(n)')

Results for n log(n)
	sec	:	87847
	min	:	3950157
	hour	:	188909174
	day	:	3911758539
	month	:	102245912509
	year	:	1121055084773
	cntry	:	96591730923946


###### n ^ 2

In [11]:
sols = results(lambda a: 2 * log(a), log, 100, 1000)
show(sols, 'Results for n ^ 2')

Results for n ^ 2
	sec	:	1000
	min	:	7745
	hour	:	59999
	day	:	293938
	month	:	1609968
	year	:	5577096
	cntry	:	55770960


###### n ^ 3

In [12]:
sols = results(lambda a: 3 * log(a), log, 100, 1000)
show(sols, 'Results for n ^ 3')

Results for n ^ 3
	sec	:	99
	min	:	391
	hour	:	1532
	day	:	4420
	month	:	13736
	year	:	31448
	cntry	:	145972


###### 2 ^ n

In [13]:
sols = results(lambda a: a * log(2), log, 10, 1000)
show(sols, 'Results for 2 ^ n')

Results for 2 ^ n
	sec	:	19
	min	:	25
	hour	:	31
	day	:	36
	month	:	41
	year	:	44
	cntry	:	51


###### n!

In [14]:
sols = facresults()
show(sols, 'Results for n!')

Results for n!
	sec	:	9
	min	:	11
	hour	:	12
	day	:	13
	month	:	15
	year	:	16
	cntry	:	17
