## What We Looked At Last Time
* We wrapped up our discussion of dictionaries.
* We introduced sets and their operations in great detail.
* We demonstrated that sets have utility outside comparisons (e.g., removing duplicates from a list).

## What We'll Look At Today
* We'll wrap up our discussion of sets.
* We'll introduce NumPy arrays, which are pivotal to efficient computation in Python.

### Searching in and Comparing Sets
* As with lists, `in` can be used to see if a set has a particular item.
* `==` holds true for sets that have the same elements.
* `<=` tests whether the set to its left is a **subset** of the one to its right: that is, all the elements in the left operand are in the right operand.
* `<` tests whether the set to its left is a **proper subset** of the one to its right: that is, all the elements in the left operand are in the right operand, and **the sets are not equal**.

In [None]:
cslist = [] #We're going to create a list of sets.
cslist.append({'red','blue','yellow'}) #primary colors
cslist.append({'purple','orange','green'}) #secondary colors
cslist.append({'yellow','blue','red','yellow'}) #Same elements as #1, but in a different order + duplicate 
cslist.append(cslist[0].union(cslist[1])) #all primary and secondary colors 
print(cslist)

In [None]:
#Check if string 'orange' is in each set
for colset in cslist:
    print(f'\'orange\' in {colset}? ', 'orange' in colset) 


In [None]:
#Determine if first set is equal to each set
for colset in cslist:
    print(f'{cslist[0]} == {colset}? ', cslist[0] == colset) 

In [None]:
#Determine if first set is a subset of each set.
for colset in cslist:
    print(f'{cslist[0]} <= {colset}? ', cslist[0] <= colset) 

In [None]:
#Determine if first set is a proper subset of each set.
for colset in cslist:
    print(f'{cslist[0]} < {colset}? ', cslist[0] < colset) 

### Other Common Set Operations
* `!=` holds true if two sets do not have the exact same elements.
* `>=` is the superset operator (left operand has all elements of the right)
* `>` is the proper superset operator (left operand has all elements of the right and they are not equal)
* `issubset` can be substituted for `<=` and `issuperset` can be substituted for `>=`.  

In [None]:
#Determine if fourth set is not-equal to each set
for colset in cslist:
    print(f'{cslist[3]} != {colset}? ', cslist[3] != colset) 

In [None]:
#Determine if fourth set is superset to each set
for colset in cslist:
    print(f'{cslist[3]} >= {colset}? ', cslist[3] >= colset) 

In [None]:
#Determine if fourth set is proper superset of each set
for colset in cslist:
    print(f'{cslist[3]} > {colset}? ', cslist[3] > colset) 

### The Value of `issubset` and `issuperset`
* Unless a proper subset (`<`) or proper superset (`>`) test is required, the methods `issubset` and `issuperset` are preferred.
    * First, the argument to `issubset` or `issuperset` can be _any_ iterable (which is then converted to a set), whereas the operators require both operands to be sets.
    * Second, the `issubset` and `issuperset` implementations are purportedly a bit more efficient.

In [None]:
#Make sure you understand the logic behind the below
somenums={1, 2, 3}
print(somenums.issubset([1, 2, 3, 7]))
print(somenums.issuperset([1, 2, 2, 1]))
print(somenums.issubset([3, 2, 1]))
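
The difference between the methods and the operators can be seen in a minimal sketch like the one below. The variable `operator_error` is only an illustrative name used to capture the exception; the rest reuses `somenums` from the cell above.

```python
somenums = {1, 2, 3}

# The method accepts any iterable, such as a list.
print(somenums.issubset([1, 2, 3, 7]))

# The operator requires a set on both sides; a list raises TypeError.
operator_error = None
try:
    somenums <= [1, 2, 3, 7]
except TypeError as err:
    operator_error = err
    print('TypeError:', err)

# Converting the list to a set first makes the operator usable.
print(somenums <= set([1, 2, 3, 7]))
```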

### Mutable Set Operators and Methods

* Python provides a few more "shortcuts" to modify a set based on particular comparisons to another set (s and t must be sets).
    * `s|=t` performs **union-in-place**: s is replaced by _s.union(t)_
    * `s&=t` performs **intersection-in-place**: s is replaced by _s.intersection(t)_
    * `s-=t` performs **difference-in-place**: s is replaced by _s.difference(t)_
    * `s^=t` performs **symmetric-difference-in-place**: s is replaced by _s.symmetric_difference(t)_
* While these operators lead to more compact code, they may be less readable to those less proficient in Python.
    

In [None]:
mult2 = {2, 4, 6, 8, 10, 12}
mult3 = {3, 6, 9, 12, 15, 18}
mult2 |= mult3 #mult2 now has the union of mult2 and mult3
print(mult2)

In [None]:
mult2 = {2, 4, 6, 8, 10, 12}
mult3 = {3, 6, 9, 12, 15, 18}
mult2 &= mult3 #mult2 now has the intersection of mult2 and mult3
print(mult2)

In [None]:
mult2 = {2, 4, 6, 8, 10, 12}
mult3 = {3, 6, 9, 12, 15, 18}
mult2 -= mult3 #mult2 now has the difference of mult2 and mult3
print(mult2)

In [None]:
mult2 = {2, 4, 6, 8, 10, 12}
mult3 = {3, 6, 9, 12, 15, 18}
mult2 ^= mult3 #mult2 now has the symmetric difference of mult2 and mult3
print(mult2)
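
As a rough illustration of what "in place" means, the sketch below compares `|=` with the ordinary `|` operator and shows the method forms of two of the in-place operators. The names `alias`, `evens`, `evens_alias` and `small` are made up for this example.

```python
mult2 = {2, 4, 6, 8, 10, 12}
mult3 = {3, 6, 9, 12, 15, 18}

alias = mult2                # a second name for the same set object
mult2 |= mult3               # modifies the existing object in place
print(alias is mult2)        # True: alias sees the change too

evens = {2, 4, 6, 8, 10, 12}
evens_alias = evens
evens = evens | mult3        # builds a new set and rebinds the name
print(evens_alias is evens)  # False: evens_alias still holds the old set

# The method forms of the in-place operators accept any iterable.
small = {2, 4, 6}
small.update([6, 8])                  # like |=
small.intersection_update(range(5))   # like &=
print(small)                          # {2, 4}
```

The method forms (`update`, `intersection_update`, `difference_update`, `symmetric_difference_update`) may also be easier to read for those less familiar with the operators.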

### Frozenset: An Immutable Set Type
* **Basic sets are _mutable_**.
* **Set _elements_ must be _immutable_** (more precisely, hashable); therefore, a set cannot have other sets as elements.
* A **frozenset** is an _immutable_ set—it cannot be modified after you create it, so a set _can_ contain frozensets as elements. 
* The built-in function **`frozenset`** creates a frozenset from any iterable. 

In [None]:
#This code will produce an "unhashable type" error
compset = {{1,2,3},{5,3,1},{1, 3, 7}}

In [None]:
#Creating a set of sets and testing for membership
compset = {frozenset({1,2,3}),frozenset({5,3,1}),frozenset({1, 3, 7})}
print(frozenset({3,2,1}) in compset)
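
A short sketch of how a frozenset behaves in practice is given below. The names `primes`, `bigger` and `labels` are illustrative only.

```python
primes = frozenset([2, 3, 5, 7])

# A frozenset has no mutating methods such as add.
print(hasattr(primes, 'add'))         # False

# Non-mutating operations still work and return a new frozenset.
bigger = primes | {11}
print(type(bigger).__name__, sorted(bigger))

# Because it is hashable, a frozenset can also serve as a dictionary key.
labels = {frozenset({'red', 'blue'}): 'purple'}
print(labels[frozenset({'blue', 'red'})])
```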

# Array-Oriented Programming

### **NumPy** (**Numerical Python**) Library
* First appeared in 2006 and is the **preferred Python array implementation**.
* High-performance, richly functional **_n_-dimensional array** type called **`ndarray`**. 
* **Written in C** and **up to 100 times faster than lists**.
* Critical in big-data processing, AI applications and much more. 
* According to `libraries.io`, **over 450 Python libraries depend on NumPy**. 
* Many popular data science libraries such as Pandas, SciPy (Scientific Python) and Keras (for deep learning) are built on or depend on NumPy. 

## Creating `array`s from Existing Data 
* NumPy arrays are often generated from existing data structures using the **`array`** function.  
* The argument must be an `array` or other iterable.
* The result is a **new** `array` containing the argument’s elements

In [None]:
import numpy as np

In [None]:
myList = [2, 3, 5, 7, 11] 
myArray = np.array([2, 3, 5, 7, 11])
print(type(myList))
print(type(myArray))
print(myList)
print(myArray) #Note the print representation lacks commas.
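
To illustrate that the result is a **new** `array`, the sketch below changes the source list after the conversion and checks that the array is unaffected. The names `source` and `copied` are illustrative.

```python
import numpy as np

source = [2, 3, 5, 7, 11]
copied = np.array(source)   # the elements are copied into a new array

source[0] = 99              # change the original list afterwards
print(source)               # [99, 3, 5, 7, 11]
print(copied)               # [ 2  3  5  7 11]
```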

## Array Basics
* Lists with 2 or more dimensions can also be converted quickly to array format using the `array` function.
* For performance reasons, NumPy is written in the C programming language and uses C-based data types.
    * This means that matching data-types will appear differently (ex: different significant digits for decimals) when stored in a NumPy array. 
    * [Other NumPy types can be found here.](https://docs.scipy.org/doc/numpy/user/basics.types.html)
    * Structured arrays permit more flexible array representations, but are not dealt with in this session.


In [None]:
integers = np.array([[1, 2, 3], [4, 5, 6]])
floats = np.array([0.0, 0.1, 0.2, 0.3, 0.4])
print(integers)
print(floats)
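
One consequence of the C-based types is that every element of an array shares a single type. The sketch below, with the illustrative names `mixed` and `explicit`, shows NumPy choosing a floating-point type for mixed input and accepting an explicit `dtype` argument.

```python
import numpy as np

mixed = np.array([1, 2, 3.5])    # one float promotes every element to float
print(mixed.dtype)               # float64

explicit = np.array([1, 2, 3], dtype=np.float64)  # request a type directly
print(explicit)                  # [1. 2. 3.]
```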


### Determining `array` Properties
* The Python function `type` reports only that the object is an `ndarray`; to see the data type stored in an array, read its `dtype` attribute.
* `ndim` contains an `array`’s number of dimensions.
* `shape` contains a _tuple_ specifying an `array`’s dimensions.

In [None]:
integers = np.array([[1, 2, 3], [4, 5, 6]])
floats = np.array([0.0, 0.1, 0.2, 0.3, 0.4])
print(type(integers))
print(integers.dtype)
print(floats.dtype)

In [None]:
print(integers.ndim)
print(floats.ndim)
print(integers.shape)
print(floats.shape) #Remember that one-element tuples always include a comma for distinction!
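
The same attributes apply to arrays with more dimensions. The sketch below uses a hypothetical 3-d array named `cube`, and also reads `size`, an attribute not discussed above that holds the total number of elements.

```python
import numpy as np

cube = np.zeros((2, 3, 4))   # 2 blocks, each with 3 rows and 4 columns
print(cube.ndim)             # 3
print(cube.shape)            # (2, 3, 4)
print(cube.size)             # 24, i.e. 2 * 3 * 4
print(cube.dtype)            # float64
```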

### Iterating through a Multidimensional `array`’s Elements
* Arrays are generally iterated through using `for` or `while` loops.
* The `flat` attribute provides an iterator over an array's elements as if the array were one-dimensional.
    * For a 2-d array, the elements of the 1st row come first, followed by those of the 2nd row, then the 3rd, etc.
    * For higher-dimensional arrays, the last dimension varies fastest (row-major order).
    

In [None]:
for row in integers:
    for column in row:
        print(column, end='  ')
    print() 

In [None]:
for i in integers.flat:
    print(i, end='  ')
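
The ordering used by `flat` for more than two dimensions can be checked with a small sketch. The array `cube` and the list `flat_values` are illustrative names; the array is filled with 0 through 23 so that each value equals its position in the flattened order.

```python
import numpy as np

cube = np.arange(24).reshape(2, 3, 4)
flat_values = [int(v) for v in cube.flat]
print(flat_values[:6])       # [0, 1, 2, 3, 4, 5]

# Stepping the last index moves 1 position, the middle index 4, the first 12.
print(cube[0, 0, 1], cube[0, 1, 0], cube[1, 0, 0])   # 1 4 12
```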

# Filling `array`s with Specific Values
* Functions `zeros` and `ones` create an array with a shape corresponding to a given tuple (or a 1-d array if a single integer is provided).
* `full` creates an `array` containing a value specified as a second argument.
* Only integers are permitted for the array shape parameter, but any numeric type is suitable for `full`'s second parameter.

In [None]:
arr0s = np.zeros(5)
arr1s = np.ones((3,4))
arrvals = np.full((2,3),5.5)
print(arr0s)
print()
print(arr1s)
print()
print(arrvals)
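
The element type of these arrays is worth a quick look. In the sketch below, `zeros` defaults to floating point unless a `dtype` is given, while `full` takes its type from the fill value. The variable names are illustrative.

```python
import numpy as np

float_zeros = np.zeros(3)
print(float_zeros.dtype)           # float64 by default

int_zeros = np.zeros(3, dtype=int)
print(int_zeros)                   # [0 0 0]

int_full = np.full((2, 2), 7)      # integer fill value gives an integer array
float_full = np.full((2, 2), 7.0)  # float fill value gives a float array
print(int_full.dtype, float_full.dtype)
```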


# Creating `array`s from Ranges 
* NumPy provides optimized functions for creating `array`s from ranges.
* The `arange` function operates similarly to `range`, but produces a 1-dimensional array.
* `linspace` creates a 1-dimensional array of floating-point values running from the 1st argument to the 2nd, uniformly spaced, with the number of values given by the 3rd argument (`num`).
    * Unlike `range` or `arange`, both starting and ending points _are included_ when `linspace` is used.

In [None]:
print(np.arange(5))
print(np.arange(5,10))
print(np.arange(10,1,-2))

In [None]:
print(np.linspace(0.0, 1.0, num=5))
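
The endpoint difference between `arange` and `linspace` is easiest to see side by side. The sketch below uses the illustrative names `stepped`, `spaced` and `open_ended`, and also passes `linspace`'s `endpoint=False` option, which is not covered above.

```python
import numpy as np

stepped = np.arange(0.0, 1.0, 0.25)     # the stop value 1.0 is excluded
spaced = np.linspace(0.0, 1.0, num=5)   # the stop value 1.0 is included
print(stepped)   # [0.   0.25 0.5  0.75]
print(spaced)    # [0.   0.25 0.5  0.75 1.  ]

# With endpoint=False, linspace leaves out the stop value as well.
open_ended = np.linspace(0.0, 1.0, num=4, endpoint=False)
print(open_ended)
```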

### Reshaping an `array` 
* The `array` method **`reshape`** returns the array's elements arranged in a different shape, possibly with a different number of dimensions.
* The new shape must have the **same** number of elements as the original.

In [None]:
print(np.arange(1,13))
print(np.arange(1,13).reshape(3,4))
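
The element-count rule can be demonstrated with a small sketch. It also uses `-1` as a dimension, which asks NumPy to work out that dimension from the others. The names `values`, `grid` and `reshape_error` are illustrative.

```python
import numpy as np

values = np.arange(1, 13)        # 12 elements
grid = values.reshape(2, -1)     # -1 is inferred as 12 / 2 = 6 columns
print(grid.shape)                # (2, 6)

# 5 * 3 = 15 does not match 12 elements, so reshape raises ValueError.
reshape_error = None
try:
    values.reshape(5, 3)
except ValueError as err:
    reshape_error = err
    print('ValueError:', err)
```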

### Large `array`s and Display 
* When displaying an `array` with more than 1000 items (NumPy's default print threshold), NumPy drops the middle rows, columns or both from the output.

In [None]:
import numpy as np
print(np.arange(1, 100001).reshape(4, 25000))

In [None]:
np.arange(1, 100001).reshape(100, 1000)
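
The cut-off can be checked directly, and adjusted temporarily with `np.printoptions`. The sketch below assumes NumPy's default print settings are in effect; `short_text` and `long_text` are illustrative names.

```python
import numpy as np

short_text = str(np.arange(1000))   # exactly 1000 items: printed in full
long_text = str(np.arange(1001))    # 1001 items: middle replaced by ...
print('...' in short_text)          # False
print('...' in long_text)           # True

# Lower the threshold for one block to summarize even a small array.
with np.printoptions(threshold=5):
    print(np.arange(10))            # [0 1 2 ... 7 8 9]
```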

## List vs. `array` Performance: Introducing `datetime` 
* Most `array` operations execute **significantly** faster than corresponding list operations
* The `datetime` module includes functionality for measuring, displaying, and converting absolute times and durations.
* We can create an object that provides a microsecond-precise measurement of the _current_ time using `datetime.now()`


In [None]:
from datetime import datetime
print(datetime.now())
print(type(datetime.now()))

### Using `datetime` to Measure Durations. 
* If we determine the time _before_ and _after_ an operation(s) we're interested in, we can measure the corresponding execution time.
* Subtracting one `datetime` object from another returns a `timedelta`, which measures a duration.
* The `timedelta` class has a method `total_seconds()` which returns the duration in seconds as a `float`.

In [None]:
from time import sleep
start=datetime.now()
sleep(1)
end=datetime.now()
print((end-start).total_seconds())
print(type(end-start))
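
Because `datetime.now()` differs on every run, a sketch with fixed, made-up timestamps makes the `timedelta` behaviour easier to verify. The dates below are arbitrary example values.

```python
from datetime import datetime, timedelta

start = datetime(2024, 1, 1, 12, 0, 0)
end = datetime(2024, 1, 1, 12, 0, 1, 500000)   # 1.5 seconds later

elapsed = end - start
print(type(elapsed).__name__)                  # timedelta
print(elapsed.total_seconds())                 # 1.5
print(type(elapsed.total_seconds()).__name__)  # float

# A timedelta can also be added to a datetime to get a later time.
print(start + timedelta(minutes=90))           # 2024-01-01 13:30:00
```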

### Timing the Creation of a List and an Array Containing Results of 6,000,000 Die Rolls 
* We will use the `timedelta` approach from above to compute the time taken to generate a list with 6 million die rolls, and then an array with 6 million die rolls.
    * The list generation will use a standard list comprehension.
    * The array will be built using the NumPy `random.randint` function, which fills a 1-dimensional array with random integers from the first argument up to, but not including, the second argument; the third argument gives the number of values to generate.

In [None]:
import random
start=datetime.now()
randrolls=[random.randrange(1,7) for i in range(0,6_000_000)]
end=datetime.now()
print((end-start).total_seconds())

In [None]:
start=datetime.now()
rolls_array = np.random.randint(1, 7, 6_000_000)
end=datetime.now()
print((end-start).total_seconds())
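
The repeated start/end pattern can be wrapped in a small helper. The function `time_call` below is a hypothetical convenience, not part of `datetime` or NumPy, and the sketch uses a smaller `n` than the cells above so that it runs quickly. It also checks that both approaches produce values from 1 through 6.

```python
import random
from datetime import datetime

import numpy as np


def time_call(func):
    """Call func once and return (result, elapsed seconds)."""
    start = datetime.now()
    result = func()
    return result, (datetime.now() - start).total_seconds()


n = 600_000  # smaller than 6,000,000 to keep the sketch fast
list_rolls, list_secs = time_call(
    lambda: [random.randrange(1, 7) for _ in range(n)])
array_rolls, array_secs = time_call(lambda: np.random.randint(1, 7, n))

print(f'list: {list_secs:.4f} s, array: {array_secs:.4f} s')
print(min(list_rolls), max(list_rolls))       # smallest and largest list roll
print(array_rolls.min(), array_rolls.max())   # smallest and largest array roll
```

Timings vary between machines and runs, so the printed durations are only indicative.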


### 60,000,000 and 600,000,000 Die Rolls  
* Generating a list of 60 million or 600 million die rolls would be quite slow.
* Both are much faster with NumPy, although the 600-million-element array may need several gigabytes of memory with the default integer type.

In [None]:
start=datetime.now()
rolls_array = np.random.randint(1, 7, 60_000_000)
end=datetime.now()
print((end-start).total_seconds())

In [None]:
start=datetime.now()
rolls_array = np.random.randint(1, 7, 600_000_000)
end=datetime.now()
print((end-start).total_seconds())
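
At these sizes memory use becomes relevant. As a side note, the sketch below uses `randint`'s `dtype` parameter to store each roll in a single byte, which is enough for values 1 through 6. The array names are illustrative and the arrays are kept small here.

```python
import numpy as np

default_rolls = np.random.randint(1, 7, 1_000)
small_rolls = np.random.randint(1, 7, 1_000, dtype=np.int8)

# itemsize is the number of bytes per element; nbytes is the total.
print(default_rolls.itemsize, small_rolls.itemsize)
print(default_rolls.nbytes, small_rolls.nbytes)
```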