### Uniform Distribution
Let's start by generating some fake random data. You can get a random number between 0 and 1 using the Python `random` module as follows:

In [1]:
import random
x=random.random()
print("The Value of x is", x)

The Value of x is 0.9145749085383018


Every time you call `random.random()`, you get a new number.

*Exercise 1:* Using `random`, write a function `generate_uniform(N, x_min, x_max)` that returns a Python list containing `N` random numbers between the specified minimum and maximum values. You may want to quickly work out on paper how to map numbers between 0 and 1 onto another interval.

In [4]:
# Skeleton
def generate_uniform(N,x_min,x_max):
    out = []
    ### BEGIN SOLUTION
    r= x_max-x_min
    for x in range(N):
        out.append(random.random()*r+x_min)       
    
    ### END SOLUTION
    return out
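
The scale-and-shift used above is the same mapping that the standard library's `random.uniform(a, b)` applies. A minimal sketch comparing the two, where the helper name `scale_unit` is hypothetical and introduced only for illustration:

```python
import random

def scale_unit(u, lo, hi):
    """Map u in [0, 1) linearly onto [lo, hi)."""
    return lo + u * (hi - lo)

random.seed(0)  # fixed seed so the run is repeatable
manual = [scale_unit(random.random(), -10, 10) for _ in range(1000)]
builtin = [random.uniform(-10, 10) for _ in range(1000)]

# Both lists should stay inside the requested interval.
print(min(manual), max(manual))
print(min(builtin), max(builtin))
```

Writing the mapping by hand is the point of the exercise; `random.uniform` is mainly useful as a cross-check.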

In [5]:
# Test your solution here
data=generate_uniform(1000,-10,10)
print ("Data Type:", type(data))
print ("Data Length:", len(data))
if len(data)>0: 
    print ("Type of Data Contents:", type(data[0]))
    print ("Data Minimum:", min(data))
    print ("Data Maximum:", max(data))

Data Type: <class 'list'>
Data Length: 1000
Type of Data Contents: <class 'float'>
Data Minimum: -9.954433024503944
Data Maximum: 9.989013305441674


*Exercise 2a:* 
Write a function that computes the mean of values in a list. Recall that the mean of a random variable $\bf{x}$ computed on a data set of $n$ values $\{ x_i \} = \{x_1, x_2, ..., x_n\}$ is ${\bf\bar{x}} = \frac{1}{n} \sum_{i=1}^n x_i$.

In [8]:
# Skeleton
def mean(Data):
    m=0.
    
    ### BEGIN SOLUTION
    for x in Data: m+=x

    ### END SOLUTION
    
    return m/len(Data)

In [9]:
# Test your solution here
print ("Mean of Data:", mean(data))

Mean of Data: 0.35316804466529905


*Exercise 2b:* 
Write a function that computes the variance of values in a list. Recall that the variance of a random variable $\bf{x}$ computed on a data set of $n$ values $\{ x_i \} = \{x_1, x_2, ..., x_n\}$ is $\sigma^2 = \frac{1}{n} \sum_{i=1}^n (x_i - {\bf\bar{x}})^2$.

In [12]:
# Skeleton
def variance(Data):
    m=0.
    
    ### BEGIN SOLUTION

    avg= mean(Data)
    for x in Data: m+= (x-avg)**2
    
    ### END SOLUTION
    
    return m/len(Data)

In [13]:
# Test your solution here
print ("Variance of Data:", variance(data))

Variance of Data: 33.35312602913437
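
As a sanity check, a uniform distribution on $[a, b]$ has variance $(b-a)^2/12$, which for $[-10, 10]$ is $400/12 \approx 33.3$, close to the value printed above. The sketch below is self-contained rather than reusing the notebook's functions; it compares a hand-computed population variance with `statistics.pvariance` from the standard library and with the theoretical value. The sample size and seed are arbitrary choices.

```python
import random
import statistics

random.seed(1)
sample = [random.uniform(-10, 10) for _ in range(20_000)]

n = len(sample)
avg = sum(sample) / n
var = sum((v - avg) ** 2 for v in sample) / n  # population variance (divide by n)

expected = (10 - (-10)) ** 2 / 12  # (b - a)^2 / 12 for a uniform distribution

print("hand-computed:", var)
print("statistics.pvariance:", statistics.pvariance(sample))
print("theory:", expected)
```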


## Histogramming

*Exercise 3:* Write a function that bins the data so that you can create a histogram. One way to implement histogramming is the following logic:

* The user inputs a list of values `x` and optionally `n_bins`, which defaults to 10.
* If `x_min` and `x_max` are not supplied, find the minimum and maximum of the values in `x`.
* Determine the bin size (`bin_size`) by dividing the range of the data by the number of bins.
* Create a list of zeros of size `n_bins`, call it `hist`.
* Loop over the values in `x`:
    * Loop over the bins with index `i`:
        * If the value is between `x_min+i*bin_size` and `x_min+(i+1)*bin_size`, increment `hist[i]`.
        * For efficiency, use `continue` or `break` to move on to the next bin or data point as soon as possible.
* Return `hist` and the list of bin edges (i.e. the values `x_min+i*bin_size`).

In [14]:
# Solution
def histogram(x,n_bins=10,x_min=None,x_max=None):
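    ### BEGIN SOLUTION

    # Fall back to the data range when no limits are given.
    if x_min is None:
        x_min=min(x)
    if x_max is None:
        x_max=max(x)
        
    bin_size=(x_max-x_min)/n_bins
    
    hist=[0]*n_bins
    # n_bins+1 edges: the left edge of every bin plus the final right edge.
    bin_edges=[x_min+i*bin_size for i in range(n_bins)]
    bin_edges.append(x_max)
    
    # Bins are half-open, [left, right), so a value equal to x_max is not counted.
    for val in x:
        for i in range(n_bins):
            if bin_edges[i]<=val<bin_edges[i+1]:
                hist[i]+=1
                break
    
    ### END SOLUTION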

    return hist,bin_edges

In [15]:
# Test your solution here
h,b=histogram(data,100)
print(h)

[13, 9, 8, 8, 7, 11, 11, 6, 8, 11, 14, 4, 7, 16, 11, 12, 10, 12, 6, 5, 5, 7, 11, 6, 12, 11, 8, 7, 17, 12, 8, 11, 15, 3, 5, 10, 10, 10, 10, 8, 2, 10, 7, 7, 13, 14, 11, 15, 20, 7, 9, 5, 12, 8, 16, 10, 10, 10, 8, 13, 5, 12, 3, 8, 8, 7, 11, 8, 15, 10, 13, 17, 7, 11, 7, 9, 12, 8, 11, 8, 16, 8, 19, 11, 7, 10, 20, 13, 15, 8, 8, 11, 11, 10, 9, 11, 13, 9, 10, 8]
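
The nested loop checks every bin for every value. Because the bins have equal width, the bin index can also be computed directly from the value. The following is a sketch of that alternative, not the reference solution; the name `histogram_direct` is hypothetical, and placing a value equal to `x_max` in the last bin is an assumption made here. It also assumes `x_max > x_min`.

```python
def histogram_direct(x, n_bins=10, x_min=None, x_max=None):
    """Bin values by computing each bin index arithmetically."""
    if x_min is None:
        x_min = min(x)
    if x_max is None:
        x_max = max(x)
    bin_size = (x_max - x_min) / n_bins
    hist = [0] * n_bins
    for val in x:
        if val < x_min or val > x_max:
            continue  # ignore values outside the requested range
        i = int((val - x_min) / bin_size)
        i = min(i, n_bins - 1)  # assumption: val == x_max goes in the last bin
        hist[i] += 1
    bin_edges = [x_min + i * bin_size for i in range(n_bins)] + [x_max]
    return hist, bin_edges

h, b = histogram_direct([0.0, 0.5, 1.0, 1.5, 2.0, 3.9, 4.0], n_bins=4)
print(h)  # [2, 2, 1, 2]
print(b)  # [0.0, 1.0, 2.0, 3.0, 4.0]
```

This does one pass over the data regardless of the number of bins, which matters once `n_bins` is large.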


*Exercise 4:* Write a function that uses the histogram function from the previous exercise to create a text-based "graph". For example, the output could look like the following:
```
[  0,  1] : ######
[  1,  2] : #####
[  2,  3] : ######
[  3,  4] : ####
[  4,  5] : ####
[  5,  6] : ######
[  6,  7] : #####
[  7,  8] : ######
[  8,  9] : ####
[  9, 10] : #####
```

Here each line corresponds to a bin, and the number of `#` characters is proportional to the number of entries in that bin.

In [19]:
# Solution
def draw_histogram(x,n_bins,x_min=None,x_max=None,character="#",max_character_per_line=20):
    ### BEGIN SOLUTION

    # Fall back to the data range when no limits are given.
    if x_min is None:
        x_min=min(x)
    if x_max is None:
        x_max=max(x)

    # Pass the limits through so user-supplied values are respected.
    hist,bin_edges=histogram(x,n_bins,x_min,x_max)

    for i in range(len(hist)):
        label='[{:9},{:9}]'.format(round(bin_edges[i],5),round(bin_edges[i+1],5))
        # One character per entry, capped at max_character_per_line.
        bar=character*min(hist[i],max_character_per_line)
        print(label,': '+bar)
        
    ### END SOLUTION

    return hist,bin_edges

In [22]:
# Test your solution here
h,b=histogram(data,20)

data_test=generate_uniform(200,10,20)
h,b=draw_histogram(data_test,20)

[ 10.12507, 10.61171] : ###############
[ 10.61171, 11.09836] : #############
[ 11.09836,   11.585] : ###########
[   11.585, 12.07164] : #######
[ 12.07164, 12.55829] : #######
[ 12.55829, 13.04493] : ##############
[ 13.04493, 13.53157] : ############
[ 13.53157, 14.01822] : ###########
[ 14.01822, 14.50486] : ########
[ 14.50486, 14.99151] : ##########
[ 14.99151, 15.47815] : ##########
[ 15.47815, 15.96479] : #############
[ 15.96479, 16.45144] : ######
[ 16.45144, 16.93808] : ###############
[ 16.93808, 17.42472] : #######
[ 17.42472, 17.91137] : ###
[ 17.91137, 18.39801] : ########
[ 18.39801, 18.88465] : #########
[ 18.88465,  19.3713] : ############
[  19.3713, 19.85794] : ########
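
When bin counts exceed the line width, bars can be scaled so that the fullest bin uses the whole width instead of being cut off. A minimal sketch of one way to do this, assuming the largest count should map to `max_chars` characters; the helper name `scaled_bars` is hypothetical:

```python
def scaled_bars(counts, max_chars=20, character="#"):
    """Return one bar per count, scaled so the largest count fills max_chars."""
    peak = max(counts) if counts else 0
    if peak == 0:
        return ["" for _ in counts]
    return [character * round(c / peak * max_chars) for c in counts]

counts = [3, 30, 15, 0]
for count, bar in zip(counts, scaled_bars(counts)):
    print("{:3d} : {}".format(count, bar))
```

With these counts the bars are 2, 20, 10 and 0 characters long, so relative heights are preserved even though the largest count is above the line width.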


## Functional Programming

*Exercise 5:* Write a function that applies a Boolean function (one that returns true/false) to every element in data, and returns a list of the indices of elements where the result was true. Use this function to find the indices of entries greater than 0.5. 

In [25]:
def where(mylist,myfunc):
    out= []
    
    ### BEGIN SOLUTION

    # enumerate gives each element's own index, even when values repeat.
    for i,x in enumerate(mylist):
        if myfunc(x): out.append(i)
    
    ### END SOLUTION
    
    return out


In [26]:
def is_larger(x,comp=0.5):
    return x>comp

In [28]:
# Test your solution here
test_list=[0.7,0.5,0.3,0.6,0.2]
output=where(test_list,is_larger)
output

[0, 3]
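
One detail worth checking in any `where` implementation is how it behaves when the list contains repeated values. The short sketch below (the list `values` is made up for illustration) contrasts looking up positions with `list.index` against tracking them with `enumerate`:

```python
values = [0.7, 0.5, 0.7, 0.2]

# list.index always returns the position of the first match,
# so the second 0.7 is reported at index 0 instead of index 2.
by_index = [values.index(v) for v in values if v > 0.5]

# enumerate pairs each element with its own position.
by_enumerate = [i for i, v in enumerate(values) if v > 0.5]

print(by_index)      # [0, 0]
print(by_enumerate)  # [0, 2]
```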

*Exercise 6:* The `in_range(mymin,mymax)` function below returns a function that tests whether its input is between the specified values. Write corresponding functions that test:
* Even
* Odd
* Greater than
* Less than
* Equal
* Divisible by

In [30]:
def in_range(mymin,mymax):
    def testrange(x):
        return x<mymax and x>=mymin
    return testrange

# Examples:
F1=in_range(0,10)
F2=in_range(10,20)

# Test of in_range
print (F1(0), F1(1), F1(10), F1(15), F1(20))
print (F2(0), F2(1), F2(10), F2(15), F2(20))

print ("Number of Entries passing F1:", len(where(data,F1)))
print ("Number of Entries passing F2:", len(where(data,F2)))

True True False False False
False False True True False
Number of Entries passing F1: 519
Number of Entries passing F2: 0


In [31]:
### BEGIN SOLUTION

def is_even(x):
    return x%2==0

def is_odd(x):
    return x%2!=0

def greater_than(num1):
    def compare(num2):
        return num2>num1
    return compare

def less_than(num1):
    def compare(num2):
        return num2<num1
    return compare

def is_equal(num1):
    def compare(num2):
        return num2==num1
    return compare

def div_by(num1):
    def is_div(num2):
        return num2%num1==0
    return is_div
### END SOLUTION

In [33]:
# Test your solution
greater_5=greater_than(5)
lesser_5=less_than(5)
eq_5=is_equal(5)
div_5=div_by(5)

print('15 is even:', is_even(15))
print('6 is even:', is_even(6))
print('12>5:',greater_5(12))
print('3>5:',greater_5(3))
print('1<5:',lesser_5(1))
print('4<5:',lesser_5(4))
print('5=5:',eq_5(5))
print('55=5:',eq_5(55))
print('20%5==0:',div_5(20))
print('21%5==0:',div_5(21))

15 is even: False
6 is even: True
12>5: True
3>5: False
1<5: True
4<5: True
5=5: True
55=5: False
20%5==0: True
21%5==0: False


*Exercise 7:* Repeat the previous exercise using `lambda` and the built-in Python functions `sum` and `map` instead of your solution above. 

In [35]:
### BEGIN SOLUTION

    # Fill in your solution here
    
even=lambda x:x%2==0     
odd=lambda x:x%2!=0

bigger=lambda x:lambda y: y>x
smaller=lambda x:lambda y: y<x
equals=lambda x:lambda y: y==x
can_div=lambda x:lambda y: y%x==0
    
### END SOLUTION
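
The exercise also mentions `sum` and `map`. One way they fit in, sketched here under the assumption that the goal is to count how many entries pass a test, is to map a predicate over the data and sum the resulting Booleans. The predicates are redefined as lambdas so the snippet runs on its own, and the list `values` is made up for illustration:

```python
in_range = lambda lo, hi: lambda x: lo <= x < hi
bigger = lambda threshold: lambda x: x > threshold

values = [-3, 0, 2, 5, 9, 10, 14]

# map applies the predicate to every element; sum counts the True results,
# because True counts as 1 and False as 0.
n_in_range = sum(map(in_range(0, 10), values))
n_bigger = sum(map(bigger(5), values))

print(n_in_range)  # 4
print(n_bigger)    # 3
```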

## Monte Carlo

*Exercise 7:* Write a "generator" function called `generate_function(func,x_min,x_max,N)` that, instead of generating a flat distribution, generates a distribution whose functional form is given by `func`. Note that `func` will always be > 0.  

Use the test function below and your histogramming functions above to demonstrate that your generator is working properly.

Hint: A simple, but slow, solution is to draw a random number `test_x` within the specified range and another number `p` between 0 and the maximum of the function (which you will have to determine). If `p<=func(test_x)`, append `test_x` to the output. If not, repeat the process, drawing two new numbers. Repeat until you have the specified number of generated numbers, `N`. For this problem, it's OK to determine the maximum by numerically sampling the function.  

In [40]:
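def find_min_max(x):
    # Use lo/hi to avoid shadowing the built-in min and max.
    lo=x[0]
    hi=x[0]

    for val in x:
        if val<lo: lo=val
        if val>hi: hi=val

    return lo,hi

def generate_function(func,x_min,x_max,N=1000):
    out = list()
    ### BEGIN SOLUTION
    
    # Estimate the maximum of func by sampling it on an evenly spaced grid;
    # this also works when x_min and x_max are not integers.
    n_scan=1000
    x_range=(x_max-x_min)
    y_vals=[func(x_min+i*x_range/n_scan) for i in range(n_scan+1)]
    _,y_max=find_min_max(y_vals)

    # Accept rand_x with probability func(rand_x)/y_max. rand_y must start at 0,
    # not at the minimum of func, or the output would follow func(x) minus that minimum.
    while(len(out)<N):
        rand_x=random.random()*x_range+x_min
        rand_y=random.random()*y_max
        if rand_y<func(rand_x): out.append(rand_x)
    
    ### END SOLUTION
    
    return out 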

In [41]:
# A test function
def test_func(x,a=1,b=1, N=1000):
    return abs(a*x+b)
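
A generator of this kind can be checked against a case with a known answer. The sketch below is a self-contained accept/reject sampler, not the notebook's `generate_function`; the name `sample_by_rejection` and its `f_max` argument are hypothetical. It draws from a density proportional to `abs(x)` on [0, 1], for which the exact mean is 2/3.

```python
import random

def sample_by_rejection(func, x_min, x_max, f_max, n):
    """Draw n values from [x_min, x_max) with density proportional to func.

    Assumes 0 <= func(x) <= f_max on the whole interval.
    """
    out = []
    while len(out) < n:
        x = random.random() * (x_max - x_min) + x_min
        p = random.random() * f_max  # uniform between 0 and f_max
        if p <= func(x):
            out.append(x)
    return out

random.seed(2)
samples = sample_by_rejection(lambda x: abs(x), 0.0, 1.0, 1.0, 20_000)

# For a density proportional to x on [0, 1], the mean is 2/3.
print(sum(samples) / len(samples))
```

Passing `f_max` explicitly avoids having to scan the function, at the cost of requiring the caller to know an upper bound.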

*Exercise 8:* Use your function to generate 1000 numbers that are normally distributed, using the `gaussian` function below. Confirm that the mean and variance of the data are close to the mean and variance you specify when building the Gaussian. Histogram the data. 

In [42]:
import math

def gaussian(mean, sigma):
    def f(x):
        return math.exp(-((x-mean)**2)/(2*sigma**2))/(sigma*math.sqrt(2*math.pi))
    return f

# Example Instantiation
g1=gaussian(0,1)
g2=gaussian(10,3)

In [43]:
dist=generate_function(g2,-500,500)
mean(dist)


9.950189094036418
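
The sampled mean can be compared with what the density itself implies. A normalized Gaussian with mean $\mu$ and standard deviation $\sigma$ is $\exp(-(x-\mu)^2/(2\sigma^2))/(\sigma\sqrt{2\pi})$. The sketch below checks numerically, with a simple midpoint sum, that this form integrates to 1 and has mean 10 and variance 9 for $\mu=10$, $\sigma=3$. The name `normal_pdf`, the integration window and the step count are choices made here for illustration.

```python
import math

def normal_pdf(x, mu, sigma):
    """Normalized Gaussian density with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

mu, sigma = 10.0, 3.0
lo, hi, steps = mu - 10 * sigma, mu + 10 * sigma, 20_000
dx = (hi - lo) / steps
xs = [lo + (i + 0.5) * dx for i in range(steps)]  # midpoint of each slice

area = sum(normal_pdf(x, mu, sigma) for x in xs) * dx
mean_est = sum(x * normal_pdf(x, mu, sigma) for x in xs) * dx
var_est = sum((x - mu) ** 2 * normal_pdf(x, mu, sigma) for x in xs) * dx

print(area, mean_est, var_est)  # close to 1, 10 and 9
```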

*Exercise 9:* Combine your `generate_function`, `where`, and `in_range` functions above to create an integrate function. Use your integrate function to show that approximately 68% of a normal distribution lies within one standard deviation of the mean.

In [44]:
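def integrate(func, x_min, x_max, n_points=1000):
    # Draw samples distributed according to func.
    samples=generate_function(func,x_min,x_max,N=n_points)

    # Spread of the generated samples (not of the values of func).
    s_min,s_max=find_min_max(samples)
    x_range=x_max-x_min
    s_range=s_max-s_min

    # Estimate the mean and standard deviation from the samples.
    sigma=math.sqrt(variance(samples))
    xbar=mean(samples)

    # Count the samples within one standard deviation of the mean.
    f1=in_range(xbar-sigma,xbar+sigma)
    valid=where(samples,f1)
    integral=len(valid)/len(samples)

    # First value: fraction of samples within one sigma.
    # Second value: that fraction scaled by x_range*s_range.
    return integral,integral*x_range*s_range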

In [45]:

integrate(g1,-5,5)


(0.683, 45.53081795704066)
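
The Monte Carlo fraction can be compared with the exact value. For a normal distribution, the fraction within $k$ standard deviations of the mean is $\mathrm{erf}(k/\sqrt{2})$, which the standard library exposes through `math.erf`. A short check, with the helper name `fraction_within` chosen here for illustration:

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(fraction_within(k), 4))
```

For `k = 1` this gives about 0.6827, consistent with the sampled estimate of 0.683 above.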