### Digital Signal Processing Laboratory



# Lab 5 Numpy, Files, and Modules

### The standard Python data types are not well suited to mathematical operations.

In [2]:
a = [2, 3, 8]
2 * a

[2, 3, 8, 2, 3, 8]

In [3]:
a = [2, 3, 8]
2.1 * a

TypeError: can't multiply sequence by non-int of type 'float'

### In order to solve this using Python lists, we would have to do something like:

In [1]:
values = [2, 3, 8]
result = []
for x in values:
    result.append(2.1 * x)

### This is because Python lists are not designed as mathematical objects.
### Rather, they are purely a collection of items.
### To get a type of list that behaves like a mathematical array or matrix, we use NumPy.

In [5]:
import numpy as np
a = np.array([2, 3, 8])
2.1 * a

array([ 4.2,  6.3, 16.8])

### We imported numpy under the abbreviation np; this is conventional.
### np.array takes a Python list as its argument.
### The list [2, 3, 8] contains ints, yet the result contains floats: NumPy changed the data type automatically for us, because the multiplier 2.1 is a float.

In [6]:
import numpy as np
a = np.array([2, 3, 8])
a * a

array([ 4,  9, 64])

In [7]:
a**2

array([ 4,  9, 64], dtype=int32)

### This has nicely squared the array element-wise.
### numpy arrays are not vectors in the algebraic sense. Arithmetic operations between arrays are performed element-wise, not on the arrays as a whole.

### To tell NumPy we want the dot product, we use the np.dot function:

In [8]:
a = np.array([2, 3, 8])
np.dot(a,a)

77

### Furthermore, if you pass 2D arrays to np.dot it will behave like matrix multiplication. 
### Several other similar NumPy algebraic functions are available (like np.cross, np.outer, etc.)
### Bottom line: when you want to treat numpy array operations as vector or matrix operations, make use of the specialized functions to this end.
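
### As a quick sketch of those specialized functions (the matrices m and n and the vectors u and v are made-up example values), the cell below contrasts the element-wise and dot products, shows np.dot and the @ operator performing matrix multiplication on 2D arrays, and calls np.cross and np.outer.

```python
import numpy as np

# Element-wise product vs. dot product of 1D arrays
a = np.array([2, 3, 8])
print(a * a)          # element-wise: [ 4  9 64]
print(np.dot(a, a))   # dot product: 77

# With 2D arrays, np.dot performs matrix multiplication
m = np.array([[1, 2],
              [3, 4]])
n = np.array([[5, 6],
              [7, 8]])
print(np.dot(m, n))   # rows [19 22] and [43 50]
print(m @ n)          # the @ operator gives the same result for 2D arrays

# Other vector operations
u = np.array([1, 0, 0])
v = np.array([0, 1, 0])
print(np.cross(u, v))  # [0 0 1]
print(np.outer(u, v))  # 3x3 matrix whose entry [i, j] is u[i] * v[j]
```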

# 5-1 Shape

### One of the most important properties of an array is its shape.
### Images, for example, consist of a 2D array of pixels.
### But in color images every pixel is an RGB triple: the intensities of red, green, and blue.
### This makes a color image 3D overall.

In [9]:
import numpy as np
a = np.array([2, 3, 8])
a.shape

(3,)

In [11]:
b = np.array([
[2, 3, 8],
[4, 5, 6],
])
b.shape

(2, 3)
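
### A sketch of the color-image case described above, assuming a hypothetical 480 x 640 image filled with zeros as placeholder pixel data. Besides shape, ndim gives the number of dimensions and size the total number of elements; reshape rearranges the same values into a new shape.

```python
import numpy as np

# Hypothetical 480 x 640 color image: one RGB triple per pixel (all zeros here)
image = np.zeros((480, 640, 3), dtype=np.uint8)
print(image.shape)   # (480, 640, 3): rows, columns, color channels
print(image.ndim)    # 3
print(image.size)    # 921600 values in total (480 * 640 * 3)

# reshape arranges the same six values as 2 rows of 3
b = np.array([2, 3, 8, 4, 5, 6]).reshape(2, 3)
print(b.shape)       # (2, 3)
```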

# 5-2 Slicing

### Just like with lists, we might want to select certain values from an array.

In [12]:
a = np.array([2, 3, 8])
a[2]

8

In [13]:
a[1:]

array([3, 8])

In [14]:
b = np.array([
[2, 3, 8],
[4, 5, 6],
])

In [15]:
b[1]

array([4, 5, 6])

In [16]:
b[1][2]

6

### Using b[1] returns row 1 (the second row, since indexing starts at 0) along the first dimension, which is still an array.
### After that, we can select individual items from that.

### This can be abbreviated to:

In [17]:
b[1, 2]

6

### What if we wanted column 1 instead of row 1? Then we use : to select all items along the first dimension, followed by 1 to pick the column:

In [18]:
b[:, 1]

array([3, 5])
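
### A few more slicing patterns on the same array b, as a sketch. Negative indices count from the end, and a step of -1 reverses the order. Note that basic slices are views of the original array, so writing through a slice modifies b itself.

```python
import numpy as np

b = np.array([
    [2, 3, 8],
    [4, 5, 6],
])

print(b[0, :])    # row 0: [2 3 8]
print(b[:, -1])   # last column: [8 6]
print(b[:, 1:])   # columns 1 and 2 of every row
print(b[::-1])    # the rows in reverse order

# Basic slices are views: writing through one changes the original array
col = b[:, 1]
col[:] = 0
print(b)          # column 1 is now zero in both rows
```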

# 5-3 Masking

### This is perhaps the single most powerful feature of Numpy.
### Suppose we have an array, and we want to get rid of all values above a certain cutoff. Comparing the array with the cutoff gives an array of booleans, one per element:

In [19]:
a = np.array([230, 10, 284, 39, 76])
cutoff = 200
a > cutoff

array([ True, False,  True, False, False])

### Now we set all the values above 200 to zero:

In [20]:
a = np.array([230, 10, 284, 39, 76])
cutoff = 200
a[a > cutoff] = 0
a

array([ 0, 10,  0, 39, 76])

### The crucial line is a[a > cutoff] = 0. 
### This selects all the positions in the array where the test was True and assigns 0 to each of them.
### Without knowing this trick we would have had to loop over the array:

In [22]:
a = np.array([230, 10, 284, 39, 76])
cutoff = 200
new_a = []
for x in a:
    if x > cutoff:
        new_a.append(0)
    else:
        new_a.append(x)
a = np.array(new_a)
a

array([ 0, 10,  0, 39, 76])
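
### Masks can also select values instead of overwriting them. The sketch below uses the same array: indexing with a mask returns only the matching elements, & combines conditions (each comparison needs its own parentheses, because & binds more tightly than > and <), and np.where builds a new array without modifying a.

```python
import numpy as np

a = np.array([230, 10, 284, 39, 76])
cutoff = 200

# Indexing with a boolean mask selects values; the result is a new, shorter array
kept = a[a <= cutoff]
print(kept)                        # [10 39 76]

# Combine conditions with & (and) or | (or)
middle = a[(a > 20) & (a < 100)]
print(middle)                      # [39 76]

# np.where(condition, value_if_true, value_if_false) returns a new array
clipped = np.where(a > cutoff, 0, a)
print(clipped)                     # [ 0 10  0 39 76]
print(a)                           # a itself is unchanged
```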

# 5-4 Broadcasting

### Broadcasting takes place when you perform operations between arrays of different shapes.

In [23]:
a = np.array([
[0, 1],
[2, 3],
[4, 5],
])
b = np.array([10, 100])
a * b

array([[  0, 100],
       [ 20, 300],
       [ 40, 500]])

### The shapes of a and b don’t match.
### NumPy stretches b along a new first dimension, as if it were stacked three times upon itself.
### The operation then takes place element-wise.
### One of the rules of broadcasting is that only dimensions of size 1 can be stretched.
### b is 1D and has shape (2,). NumPy prepends a dimension of size 1 to b, so b now has shape (1, 2).
### This new dimension can then be stretched three times so that b's shape matches a's shape of (3, 2).

In [24]:
c = np.array([
[0, 1, 2],
[3, 4, 5],
])
b = np.array([10, 100])
c * b

ValueError: operands could not be broadcast together with shapes (2,3) (2,) 

### NumPy again prepends a dimension to b, giving it shape (1, 2). The sizes of the last dimensions of b and c (2 and 3, respectively) are then compared; they differ and neither is 1, so broadcasting fails.

### The solution is to tell NumPy specifically that the extra dimension must be added as the second dimension of b.
### This is done by indexing that second dimension with None (np.newaxis is an alias for None).
### The shape of b then becomes (2, 1), which is compatible for broadcasting with c's shape of (2, 3).

In [25]:
c = np.array([
[0, 1, 2],
[3, 4, 5],
])
b = np.array([10, 100])
c * b[:, None]

array([[  0,  10,  20],
       [300, 400, 500]])
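
### A sketch of the same idea with np.newaxis, plus two common broadcasting patterns: an outer product formed by broadcasting a column against a row, and subtracting each column's mean from that column (the mean call and the vector [1, 2, 3] are illustrative additions, not part of the example above).

```python
import numpy as np

b = np.array([10, 100])
print(b[:, None].shape)        # (2, 1)
print(b[:, np.newaxis].shape)  # (2, 1): np.newaxis is the same object as None

# A (3, 1) column broadcast against a (1, 2) row gives a (3, 2) result:
# this is the outer product of the two vectors
col = np.array([1, 2, 3])[:, None]   # shape (3, 1)
row = b[None, :]                     # shape (1, 2)
table = col * row                    # rows [10 100], [20 200], [30 300]
print(np.array_equal(table, np.outer([1, 2, 3], b)))  # True

# (2, 3) minus shape (3,): the column means are broadcast across both rows
c = np.array([[0, 1, 2],
              [3, 4, 5]])
centered = c - c.mean(axis=0)        # rows [-1.5 -1.5 -1.5] and [1.5 1.5 1.5]
print(centered)
```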

# 5-5 dtype

### A commonly used term in working with numpy is dtype - short for data type.
### For example, consider an integer stored in 8 bits.
### Each bit is either 0 or 1. With 8 of them, we have 2^8 = 256 possible values. Since zero is one of them, the largest possible value is 255.
### The data type we have now described is called uint8, where the u stands for unsigned: only non-negative values are allowed.
### If we want to allow negative numbers, we use int8. The range then shifts to -128 to +127.
### If you know the elements of your array are never going to be bigger than 100, why waste all the memory space? 
### You might be better off setting your array to uint8 to conserve memory. 

In [35]:
import numpy as np
a = np.array([200], dtype='uint8')
a+a

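array([144], dtype=uint8)

### 200 + 200 = 400 does not fit in 8 bits, so the result wraps around modulo 256: 400 - 256 = 144. With uint16, whose largest value is 65535, there is enough room: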

In [27]:
import numpy as np
a = np.array([200], dtype='uint16')
a + a

array([400], dtype=uint16)
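
### The ranges and memory cost behind the choice of dtype can be checked directly with np.iinfo and the nbytes attribute. A minimal sketch, using one million elements as an arbitrary example size:

```python
import numpy as np

# Smallest and largest value of some integer dtypes
for name in ["uint8", "int8", "uint16", "int64"]:
    info = np.iinfo(name)
    print(name, info.min, info.max)
# uint8 0 255
# int8 -128 127
# uint16 0 65535
# int64 -9223372036854775808 9223372036854775807

# Memory used by one million elements: 1 byte each vs. 8 bytes each
small = np.zeros(1_000_000, dtype="uint8")
large = np.zeros(1_000_000, dtype="int64")
print(small.nbytes, large.nbytes)   # 1000000 8000000

# uint8 arithmetic wraps around modulo 256: 400 - 256 = 144
a = np.array([200], dtype="uint8")
print(a + a)                        # [144]
```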

# 5-6 Changing dtype

In [36]:
import numpy as np
a = np.array([200], dtype='uint8')
a.astype('uint64')

array([200], dtype=uint64)
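
### astype returns a new array and leaves the original untouched. The sketch below also shows two conversions worth knowing about: float to int truncates toward zero, and converting to a narrower integer type wraps out-of-range values around (the input values are made up for illustration).

```python
import numpy as np

a = np.array([200], dtype="uint8")
b = a.astype("uint64")
print(a.dtype, b.dtype)       # uint8 uint64: a itself is unchanged

# Float to int truncates toward zero; it does not round
f = np.array([1.7, -1.7, 2.5])
print(f.astype("int32"))      # [ 1 -1  2]

# Converting to a narrower integer type wraps values that do not fit
big = np.array([300, -1])
print(big.astype("uint8"))    # [ 44 255]
```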

# 5-7 About Files

### While a program is running, its data is stored in random access memory (RAM). RAM is fast and inexpensive, but it is also volatile, which means that when the program ends, or the computer shuts down, data in RAM disappears.
### To make data available the next time the computer is turned on and the program is started, it has to be written to a non-volatile storage medium, such as a hard drive, USB drive, or CD-RW.
### Data on non-volatile storage media is stored in named locations called files. By reading and writing files, programs can save information between program runs.
### Working with files is a lot like working with a notebook. To use a notebook, it has to be opened. When done, it has to be closed.
### While the notebook is open, it can either be read from or written to.

# 5-8 Writing our first file

In [37]:
with open("test.txt", "w") as myfile:
    myfile.write("My first file written from Python\n")
    myfile.write("---------------------------------\n")
    myfile.write("Hello, world!\n")

### Opening a file creates what we call a file handle.
### The variable myfile refers to the new handle object.
### On the first line, the open function takes two arguments. The first is the name of the file, and the second is the mode. Mode "w" means that we are opening the file for writing.
### With mode "w", if there is no file named test.txt on the disk, it will be created. If there already is one, it will be replaced by the file we are writing.
### To put data in the file, we invoke the write method on the handle.
### The file is closed automatically after the last write call, when the with block ends.
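
### Besides "w", mode "a" opens a file for appending: new text goes at the end and the existing contents are kept. A minimal sketch, assuming a scratch folder created with tempfile so that no real test.txt is touched:

```python
import os
import tempfile

# Scratch folder (not cleaned up in this sketch)
folder = tempfile.mkdtemp()
path = os.path.join(folder, "test.txt")

with open(path, "w") as myfile:     # "w": create the file, or empty an existing one
    myfile.write("Hello, world!\n")

with open(path, "a") as myfile:     # "a": add to the end of the existing file
    myfile.write("Second line\n")

with open(path) as myfile:
    text = myfile.read()
print(text)
```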

# 5-9 Reading a file line-at-a-time

In [38]:
with open("test.txt", "r") as my_new_handle:
    for the_line in my_new_handle:
        # Do something with the line we just read.
        # Here we just print it.
        print(the_line, end="")

My first file written from Python
---------------------------------
Hello, world!


### With end="", we suppress the newline character that print usually appends to our strings.
### This is because each line already has its own newline: the for loop reads everything up to and including the newline character.
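
### When the newline is not wanted at all, a common alternative is to strip it from each line. A sketch, again using a scratch file whose contents are made up:

```python
import os
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, "test.txt")
with open(path, "w") as f:
    f.write("My first file written from Python\n")
    f.write("Hello, world!\n")

lines = []
with open(path) as f:
    for the_line in f:
        # rstrip("\n") removes the trailing newline and nothing else
        lines.append(the_line.rstrip("\n"))

print(lines)  # ['My first file written from Python', 'Hello, world!']
```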

# 5-10 Turning a file into a list of lines

### It is often useful to fetch data from a disk file and turn it into a list of lines.

In [None]:
with open("friends.txt", "r") as input_file:
    all_lines = input_file.readlines()   # list of lines, each keeping its "\n"
all_lines.sort()                         # sort the lines alphabetically

with open("sortedfriends.txt", "w") as output_file:
    for line in all_lines:
        output_file.write(line)

### The readlines method reads all the lines and returns them as a list of strings, each line keeping its trailing newline character (if it has one).

# 5-11 Reading the whole file at once

### Another way of working with text files is to read the complete contents of the file into a string, and then to use our string-processing skills to work with the contents.

In [None]:
with open("somefile.txt") as f:
    content = f.read()
words = content.split()
print("There are {0} words in the file.".format(len(words)))

### Notice that we left out the "r" mode in the call to open. By default, if we don't supply the mode, Python opens the file for reading.
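
### A self-contained sketch of reading a whole file, assuming a small made-up somefile.txt written to a temporary folder first. split() breaks the text on any whitespace, while splitlines() breaks it into lines without the newline characters.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, "somefile.txt")
with open(path, "w") as f:
    f.write("the quick brown fox\njumps over\nthe lazy dog\n")

with open(path) as f:          # no mode given: opened for reading
    content = f.read()

words = content.split()        # split on any run of whitespace, newlines included
lines = content.splitlines()   # split into lines, newlines removed
print("There are {0} words in the file.".format(len(words)))  # 9 words
print(len(lines))              # 3
```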

# 5-12 An example

### Here is a filter that copies one file to another, omitting any lines that begin with #:

In [None]:
def filter(oldfile, newfile):
    with open(oldfile, "r") as infile, open(newfile, "w") as outfile:
        for line in infile:
            # Put any processing logic here
            if not line.startswith('#'):
                outfile.write(line)
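
### A usage sketch for filter, with a made-up input file in a temporary folder. Because the function is named filter, it hides Python's built-in filter function in this module; the scope rules behind this are described in section 5-20.

```python
import os
import tempfile

def filter(oldfile, newfile):
    """Copy oldfile to newfile, skipping lines that start with '#'."""
    with open(oldfile, "r") as infile, open(newfile, "w") as outfile:
        for line in infile:
            if not line.startswith('#'):
                outfile.write(line)

folder = tempfile.mkdtemp()
src = os.path.join(folder, "script.txt")
dst = os.path.join(folder, "filtered.txt")
with open(src, "w") as f:
    f.write("# a comment\nx = 1\n# another comment\ny = 2\n")

filter(src, dst)
with open(dst) as f:
    result = f.read()
print(result)   # only the lines "x = 1" and "y = 2" remain
```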

# 5-13 Directories

### Files on non-volatile storage media are organized by a set of rules known as a file system.
### File systems are made up of files and directories, which are containers for both files and other directories.
### When we create a new file by opening it and writing, the new file goes in the current directory (wherever we were when we ran the program).
### If we want to open a file somewhere else, we have to specify the path to the file, which includes the name of the directory (or folder) where the file is located:

In [None]:
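# Forward slashes also work on Windows; the equivalent "c:\\temp\\words.txt"
# needs doubled backslashes inside a normal string literal.
with open("c:/temp/words.txt", "r") as wordsfile:
    wordlist = wordsfile.readlines()
print(wordlist[:6])   # the first six lines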
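
### Paths can also be built with the standard pathlib module instead of hard-coded strings. A sketch using the same c:/temp/words.txt path; it only reads the file if it actually exists on your machine.

```python
from pathlib import Path

# The current directory: where a bare file name such as "test.txt" is looked up
print(Path.cwd())

# Build a path from parts; the / operator inserts the separator for us
wordsfile = Path("c:/temp") / "words.txt"
print(wordsfile.name)     # words.txt
print(wordsfile.suffix)   # .txt

# Only try to read the file if it is actually there
if wordsfile.exists():
    wordlist = wordsfile.read_text().splitlines()   # lines without newlines
    print(wordlist[:6])
```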

# 5-14 Fetching something from the web

### Here is a very simple example that copies the contents at some web URL to a local file.

In [2]:
import urllib.request

url = "http://www.ece.ntust.edu.tw/et/research/chtseng20161121205825.pdf"
destination_filename = "melab.pdf"

urllib.request.urlretrieve(url, destination_filename)

('melab.pdf', <http.client.HTTPMessage at 0x18683894908>)

### The urlretrieve function, with just one call, can be used to download any kind of content from the Internet.
### For more elaborate downloads, the third-party requests module is a popular choice; read its documentation at http://docs.python-requests.org to learn how to install and use it.

### A module is a file containing Python definitions and statements intended for use in other Python programs.
### There are many Python modules that come with Python as part of the standard library.
### The help system contains a listing of all the standard modules that are available with Python.

# 5-15 Random numbers

In [None]:
import random

# Create a black box object that generates random numbers
rng = random.Random()

dice_throw = rng.randrange(1,7) # Return an int, one of 1,2,3,4,5,6
delay_in_seconds = rng.random() * 5.0   # A float in the range [0.0, 5.0)

### The randrange method call generates an integer between its lower and upper arguments, using the same semantics as range: the lower bound is included and the upper bound is excluded.
### Like range, randrange can also take an optional step argument. For example, if we needed a random odd number less than 100, we could say:

In [None]:
random_odd = rng.randrange(1, 100, 2)

### The random method returns a floating-point number in the interval [0.0, 1.0).
### In other words, 0.0 is possible, but all returned numbers will be strictly less than 1.0.
### It is usual to scale the results after calling this method, to get them into an interval suitable for your application.
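
### A sketch of that scaling step, with an arbitrary example interval [2.0, 5.0). random.Random also provides uniform(a, b) for the same purpose; its documentation notes that the end point b may or may not be included because of floating-point rounding.

```python
import random

rng = random.Random()

# Scale random() from [0.0, 1.0) into [low, high)
low, high = 2.0, 5.0
x = low + (high - low) * rng.random()

# uniform(a, b) does the scaling for us
y = rng.uniform(low, high)

# An integer in 1..6, for comparison
dice_throw = rng.randrange(1, 7)
print(x, y, dice_throw)
```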

### This example shows how to shuffle a list.

In [None]:
cards = list(range(52)) # Generate ints [0 .. 51]
# representing a pack of cards.
rng.shuffle(cards) # Shuffle the pack

## Repeatability and Testing
### Random number generators are based on a deterministic algorithm: they are repeatable and predictable.
### So they are called pseudo-random generators; they are not genuinely random.
### They start with a seed value. Each time you ask for another random number, you get one based on the generator's current state, and that state is then updated.

In [None]:
drng = random.Random(123) # Create generator with known starting state
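
### A minimal sketch of why a known seed matters for testing: two generators created with the same seed produce the same sequence, and calling seed again restarts a generator's sequence.

```python
import random

# Two generators created with the same seed produce the same sequence
drng1 = random.Random(123)
drng2 = random.Random(123)
seq1 = [drng1.randrange(1, 7) for _ in range(5)]
seq2 = [drng2.randrange(1, 7) for _ in range(5)]
print(seq1 == seq2)   # True

# Re-seeding an existing generator restarts its sequence
drng1.seed(123)
seq3 = [drng1.randrange(1, 7) for _ in range(5)]
print(seq1 == seq3)   # True
```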

## Picking balls from bags, throwing dice, shuffling a pack of cards

In [1]:
import random

def make_random_ints(num, lower_bound, upper_bound):
    """
    Generate a list containing num random ints between lower_bound
    and upper_bound. upper_bound is an open bound.
    """
    rng = random.Random() # Create a random number generator
    result = []
    for i in range(num):
        result.append(rng.randrange(lower_bound, upper_bound))
    return result

In [3]:
make_random_ints(5, 1, 13) # Pick 5 random month numbers

[11, 6, 7, 3, 7]

### But what if you don’t want duplicates? If you wanted 5 distinct months, then this algorithm is wrong.

In [None]:
xs = list(range(1,13)) # Make list 1..12 (there are no duplicates)
rng = random.Random() # Make a random number generator
rng.shuffle(xs) # Shuffle the list
result = xs[:5] # Take the first five elements

### The "shuffle and slice" algorithm is wasteful if you want only a few elements from a very large domain, because it must first build and shuffle a list of the whole domain.

In [4]:
import random

def make_random_ints_no_dups(num, lower_bound, upper_bound):
    """
    Generate a list containing num random ints between
    lower_bound and upper_bound. upper_bound is an open bound.
    The result list cannot contain duplicates.
    """
    result = []
    rng = random.Random()
    for i in range(num):
        while True:
            candidate = rng.randrange(lower_bound, upper_bound)
            if candidate not in result:
                break
        result.append(candidate)
    return result

xs = make_random_ints_no_dups(5, 1, 10000000)
print(xs)

[1709143, 8285745, 3847147, 3838450, 2224993]
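
### The standard library already offers this as the sample method of random.Random. A sketch: passing a range lets sample pick distinct values without building a list of the whole domain first.

```python
import random

rng = random.Random()

# sample(population, k) picks k distinct elements
xs = rng.sample(range(1, 10000000), 5)
print(xs)

# Five distinct month numbers
months = rng.sample(range(1, 13), 5)
print(months)
```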


# 5-16 The time module

### As we start to work with more sophisticated algorithms and bigger programs, a natural concern is "is our code efficient?"
### Older Python versions provided time.clock for this purpose, but it was removed in Python 3.8; time.perf_counter is now the recommended clock for measuring short durations.
### To use it, call perf_counter and assign the result to a variable, say t0, just before you start executing the code you want to measure. Then after execution, call perf_counter again (this time we'll save the result in variable t1). The difference t1 - t0 is the time elapsed, and is a measure of how fast your program is running.

In [5]:
import time

def do_my_sum(xs):
    total = 0          # named total to avoid hiding the built-in sum
    for v in xs:
        total += v
    return total

sz = 10000000 # Let's have 10 million elements in the list
testdata = range(sz)

t0 = time.perf_counter()
my_result = do_my_sum(testdata)
t1 = time.perf_counter()
print("my_result = {0} (time taken = {1:.4f} seconds)".format(my_result, t1-t0))

t2 = time.perf_counter()
their_result = sum(testdata)
t3 = time.perf_counter()
print("their_result = {0} (time taken = {1:.4f} seconds)".format(their_result, t3-t2))

my_result = 49999995000000 (time taken = 1.1581 seconds)
their_result = 49999995000000 (time taken = 0.7761 seconds)
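
### For small timing experiments, the standard timeit module runs a statement several times and reports the total, which smooths out noise. A sketch using a smaller, arbitrary list size of one million and 5 repetitions (your timings will differ):

```python
import timeit

def do_my_sum(xs):
    total = 0
    for v in xs:
        total += v
    return total

testdata = range(1_000_000)

# timeit.timeit calls the function `number` times and returns the total seconds
t_mine = timeit.timeit(lambda: do_my_sum(testdata), number=5)
t_builtin = timeit.timeit(lambda: sum(testdata), number=5)
print("do_my_sum: {0:.4f} s   sum: {1:.4f} s".format(t_mine, t_builtin))
```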


# 5-17 The math module

### The math module contains the kinds of mathematical functions you'd typically find on your calculator (sin, cos, sqrt, asin, log, log10) and some mathematical constants like pi and e:

In [6]:
import math

In [10]:
math.pi # Constant pi

3.141592653589793

In [9]:
math.e # Constant natural log base

2.718281828459045

In [11]:
math.sqrt(2.0) # Square root function

1.4142135623730951

In [12]:
math.radians(90) # Convert 90 degrees to radians

1.5707963267948966

In [13]:
math.sin(math.radians(90)) # Find sin of 90 degrees

1.0

In [14]:
math.asin(1.0) * 2 # Double the arcsin of 1.0 to get pi

3.141592653589793

### As in almost all other programming languages, angles are expressed in radians rather than degrees.
### There are two functions radians and degrees to convert between these two popular ways of measuring angles.
### Mathematical functions are “pure” and don’t have any state — calculating the square root of 2.0 doesn’t depend on any kind of state or history about what happened in the past. So the functions are not methods of an object — they are simply functions that are grouped together in a module called math.
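
### A sketch of converting in both directions, using 30 degrees as an example angle. Because floating-point results are rarely exact, comparing them with math.isclose is safer than ==.

```python
import math

angle_deg = 30                       # example angle
angle_rad = math.radians(angle_deg)  # degrees -> radians
back = math.degrees(angle_rad)       # radians -> degrees

print(math.sin(angle_rad))           # very close to 0.5, but may not be exactly 0.5
print(math.isclose(math.sin(angle_rad), 0.5))   # True
print(math.isclose(back, angle_deg))            # True
```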

# 5-18 Creating your own modules

### All we need to do to create our own module is to save our script as a file with a .py extension.
### For example, this script is saved as a file named seqtools.py:

In [None]:
def remove_at(pos, seq):
    return seq[:pos] + seq[pos+1:]

In [16]:
import seqtools
s = "A string!"
seqtools.remove_at(4, s)

'A sting!'

### We do not include the .py file extension when importing. Python expects the file names of Python modules to end in .py, so the file extension is not included in the import statement.
### The use of modules makes it possible to break up very large programs into manageable sized parts, and to keep related parts together.
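
### A sketch that creates seqtools.py on the fly in a temporary folder and imports it. The sys.path and importlib.invalidate_caches steps are only needed here because the file is not in the current directory and is written while the program is already running; normally you would just save seqtools.py next to your script.

```python
import importlib
import os
import sys
import tempfile

# Write the module file into a scratch folder
folder = tempfile.mkdtemp()
with open(os.path.join(folder, "seqtools.py"), "w") as f:
    f.write("def remove_at(pos, seq):\n")
    f.write("    return seq[:pos] + seq[pos+1:]\n")

# Python searches the folders listed in sys.path for modules; add ours first
sys.path.insert(0, folder)
importlib.invalidate_caches()   # make sure the newly written file is noticed

import seqtools                 # no .py extension in the import statement
print(seqtools.remove_at(4, "A string!"))   # A sting!
print(seqtools.__file__)        # path of the file the module was loaded from
```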

# 5-19 Namespaces

### A namespace is a collection of identifiers that belong to a module, or to a function.
### Each module has its own namespace, so we can use the same identifier name in multiple modules without causing an identification problem.

In [None]:
# module1.py

question = "What is the meaning of Life, the Universe, and Everything?"
answer = 42

In [None]:
# module2.py

question = "What is your quest?"
answer = "To seek the holy grail."

In [17]:
import module1
import module2

print(module1.question)
print(module2.question)
print(module1.answer)
print(module2.answer)

What is the meaning of Life, the Universe, and Everything?
What is your quest?
42
To seek the holy grail.


### Functions also have their own namespaces:

In [18]:
def f():
    n = 7
    print("printing n inside of f:", n)

def g():
    n = 42
    print("printing n inside of g:", n)

n = 11
print("printing n before calling f:", n)
f()
print("printing n after calling f:", n)
g()
print("printing n after calling g:", n)

printing n before calling f: 11
printing n inside of f: 7
printing n after calling f: 11
printing n inside of g: 42
printing n after calling g: 11


### The three n's here do not collide, since they are each in a different namespace.
### Namespaces permit several programmers to work on the same project without having naming collisions.

### How are namespaces, files and modules related?

### Python has a convenient and simplifying one-to-one mapping: one module per file, giving rise to one namespace. Also, Python takes the module name from the file name, and this becomes the name of the namespace.

### Files and directories organize where things are stored in our computer. On the other hand, namespaces and modules are a programming concept: they help us organize how we want to group related functions and attributes.

### So in Python, if you rename the file math.py, its module name also changes, your import statements would need to change, and your code that refers to functions or attributes inside that namespace would also need to change.

# 5-20 Scope and lookup rules

### The scope of an identifier is the region of program code in which the identifier can be accessed, or used.
### "Local scope" refers to identifiers declared within a function. These identifiers are kept in the namespace that belongs to the function, and each function has its own namespace.
### "Global scope" refers to all the identifiers declared within the current module, or file.
### "Built-in scope" refers to all the identifiers built into Python, such as range and min, which can be used without having to import anything and are (almost) always available.

In [19]:
def range(n):
    return 123*n

print(range(10))

1230


### We've defined our own function called range, so there is now a potential ambiguity. When we use range, do we mean our own one, or the built-in one?
### Python's scope lookup rules resolve this: our own range function, not the built-in one, is called, because our function range is in the global namespace, which takes precedence over the built-in names.
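
### A sketch showing that shadowing does not destroy the built-in: it is still reachable through the standard builtins module, and deleting the global name lets lookup fall through to the built-in scope again.

```python
import builtins

def range(n):                      # shadows the built-in range in this module
    return 123 * n

print(range(10))                   # 1230: the global name is found first
print(list(builtins.range(3)))     # [0, 1, 2]: the built-in is still reachable

del range                          # remove the global name...
print(list(range(3)))              # ...so lookup reaches the built-in again: [0, 1, 2]
```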

In [20]:
n = 10
m = 3
def f(n):
    m = 7
    return 2*n+m

print(f(5), n, m)

17 10 3


### The two variables n and m created in the first two lines are outside the function, in the global namespace.
### Inside the function, new local variables called n and m are created just for the duration of the execution of f, so f(5) returns 2*5 + 7 = 17 while the global n and m keep their values 10 and 3.
### The def statement puts the name f into the global namespace.

# 5-21 Attributes and the dot operator

### Variables defined inside a module are called attributes of the module.
### Attributes are accessed using the dot operator (.).
### The question attribute of module1 and module2 is accessed using module1.question and module2.question.
### Modules contain functions as well as attributes, and the dot operator is used to access them in the same way. seqtools.remove_at refers to the remove_at function in the seqtools module.

# 5-22 Three import statement variants

### Here are three different ways to import names into the current namespace, and to use them:

In [None]:
import math
x = math.sqrt(10)

### Here just the single identifier math is added to the current namespace. If you want to access one of the functions in the module, you need to use the dot notation to get to it.

In [None]:
from math import cos, sin, sqrt
x = sqrt(10)

### The names are added directly to the current namespace, and can be used without qualification.

In [None]:
from math import * # Import all the identifiers from math,
# adding them to the current namespace.
x = sqrt(10) # Use them without qualification.

### Of these three, the first method is generally preferred, even though it means a little more typing each time. However, we can shorten things by importing a module under a different name:

In [1]:
import math as m
m.pi

3.141592653589793

In [2]:
def area(radius):
    import math
    return math.pi * radius * radius

x = math.sqrt(10) # This gives an error

NameError: name 'math' is not defined

### Here we imported math, but only into the local namespace of area, so the name math does not exist in the global namespace where x = math.sqrt(10) runs.
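
### The usual fix is to import the module at the top of the file, so that math lives in the global namespace and is visible both inside area and at module level. A minimal sketch:

```python
import math   # module-level import: math is now a global name

def area(radius):
    # math is found in the global namespace, so no import is needed here
    return math.pi * radius * radius

x = math.sqrt(10)   # works, unlike the example above
print(area(2.0), x)
```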