## CS 210 Spring 2024 - Feb 29
### Set, Random, NumPy

---

#### [Python documentation](https://docs.python.org/3/)

---

### <font color="brown">Set</font>

#### <font color="brown">Building a set out of non-structure types (numeric, string, boolean)</font>

In [1]:
set1 = {2,2,3}
print(set1)

{2, 3}


In [2]:
# can items be of different types
set2 = {2,"two","three",True,12.6}
print(set2)

{True, 2, 'two', 'three', 12.6}


In [3]:
# can sets contain structures?
set3 = {[1,2,3]}
print(set3)

TypeError: unhashable type: 'list'

**Sets can't have mutable structures such as lists as members, because every member must be hashable**
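
The restriction is on unhashable (mutable) values rather than on structures in general. The sketch below is illustrative only (the names `points`, `nested`, and `message` are made up for this example): tuples and frozensets are accepted as members, while a tuple that contains a list is still rejected.

```python
# Hashable (immutable) structures such as tuples and frozensets can be set members.
points = {(1, 2), (3, 4), (1, 2)}                 # the duplicate tuple collapses
print(len(points))                                 # 2

nested = {frozenset([1, 2]), frozenset([2, 1])}   # same elements, so one member
print(len(nested))                                 # 1

# A tuple that contains a list is still unhashable.
try:
    bad = {(1, [2, 3])}
except TypeError as err:
    message = str(err)
    print(message)
```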

In [4]:
# are sets ordered?
set2[0]

TypeError: 'set' object is not subscriptable

**Sets are unordered, so cannot be indexed**
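
When a position is needed, one common workaround is to copy the members into a sorted list first. A minimal sketch, assuming all members are mutually comparable (the names `letters` and `ordered` are illustrative):

```python
letters = set("pie")
ordered = sorted(letters)   # a new list in alphabetical order; the set is unchanged
print(ordered)              # ['e', 'i', 'p']
print(ordered[0])           # e
```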

---

#### <font color="brown">Building a set out of iterables (strings, lists, tuples)</font>

In [6]:
set1 = set("apple")
print(set1)
set2 = set("pie")
print(set2)
set3 = set([1,3,4,2,1])  # using a constructor here, different from doing set3 = {[1,2,3]} which won't work
print(set3)
set4 = set(('x','y','x','z'))
print(set4)

{'l', 'p', 'e', 'a'}
{'p', 'i', 'e'}
{1, 2, 3, 4}
{'x', 'z', 'y'}


In [7]:
# but you still can't do this:
set5 = set([1,2,[3,4],5]) # trying to sneak in a list item inside the outer list
set5

TypeError: unhashable type: 'list'

---

#### <font color="brown">Adding to a set</font>

In [20]:
myset = set()
myset.add('a')
myset.add('p')
myset.add('p')
print(myset)

{'p', 'a'}


In [21]:
# add an iterable?
myset.add([1,2,3])
myset

TypeError: unhashable type: 'list'

**Can't add a list**

In [22]:
myset.add('abc')
myset

{'a', 'abc', 'p'}

**But can add a string, which is treated as a single unit**

---

#### <font color="brown">Updating a set</font>

In [23]:
# updating with another set
print('Before', myset)
myset.update(set('pqr'))
print('After', myset)

Before {'p', 'abc', 'a'}
After {'r', 'a', 'p', 'abc', 'q'}


In [24]:
# updating with a list (iterable) - you can't do this with add
myset.update(['x','y','z'])
myset

{'a', 'abc', 'p', 'q', 'r', 'x', 'y', 'z'}

In [25]:
# updating with a tuple (iterable)
myset.update((10,15))
myset

{10, 15, 'a', 'abc', 'p', 'q', 'r', 'x', 'y', 'z'}

In [28]:
# updating with a string - now the string is broken down into its characters, unlike add
myset.update('def')
myset

{10, 15, 'a', 'abc', 'd', 'e', 'f', 'p', 'q', 'r', 'x', 'y', 'z'}

**When update is used, a string argument is treated as an iterable of characters**
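
The difference between `add` and `update` can be seen side by side in the following sketch (the set names are illustrative). It also shows that `update` accepts more than one iterable in a single call.

```python
added = set()
added.add("abc")          # one member: the whole string 'abc'

updated = set()
updated.update("abc")     # three members: 'a', 'b', 'c'

print(len(added), len(updated))   # 1 3

# update can take several iterables in one call
updated.update([1, 2], (3,))
print(len(updated))               # 6
```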

---

#### <font color="brown">Set Operations</font>

In [30]:
print(set("apple") | set("pie"))  # union

print(set("apple") & set("pie"))  # intersection

print(set("apple") - set("pie"))  # difference
print(set("pie") - set("apple"))

print(set("pie") <= set("apple")) # subset
print(set("pe") <= set("apple"))
print(set("peal") <= set("apple"))

print(set("pe") < set("apple"))   # proper subset
print(set("peal") < set("apple"))

print(set("set") ^ set("list"))   # symmetric difference: union minus intersection, i.e. {s,e,t,l,i} - {s,t}

print('i' in set("pie"))  # membership

{'i', 'a', 'l', 'p', 'e'}
{'p', 'e'}
{'l', 'a'}
{'i'}
False
True
True
True
False
{'l', 'i', 'e'}
True
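
Each operator has a method counterpart. The sketch below (variable names are illustrative) checks that they agree, and shows one practical difference: the methods accept any iterable, while the operators require a set on both sides.

```python
a, b = set("apple"), set("pie")

print(a.union(b) == a | b)                    # True
print(a.intersection(b) == a & b)             # True
print(a.difference(b) == a - b)               # True
print(a.symmetric_difference(b) == a ^ b)     # True
print(set("pe").issubset(a))                  # True

# The methods accept any iterable; the operators require sets on both sides.
common = a.intersection("pie")
print(sorted(common))                         # ['e', 'p']
try:
    a & "pie"
    operator_accepts_string = True
except TypeError:
    operator_accepts_string = False
print(operator_accepts_string)                # False
```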


#### Example of set usage: find unique words in document

In [31]:
def getword(token):
    token = token.strip(',.')
    if not token.isalpha():
        return None
    if len(token) < 4:
        return None
    return token.lower()

In [32]:
from collections import Counter

unique_words = set()
word_count = Counter()
with open('metamorphosis.txt') as infile:  # with closes the file automatically
    for line in infile:
        tokens = line.split()  # default separator is whitespace
        for token in tokens:
            word = getword(token)
            if word:
                unique_words.add(word)
                word_count[word] += 1  # a Counter starts every key at 0

In [33]:
print(len(word_count))
print(len(unique_words))

64
64


In [34]:
print(unique_words)

{'little', 'dull', 'gregor', 'only', 'mild', 'rain', 'hard', 'must', 'present', 'there', 'that', 'where', 'this', 'before', 'however', 'stopped', 'state', 'himself', 'about', 'then', 'began', 'felt', 'never', 'window', 'legs', 'have', 'made', 'because', 'always', 'heard', 'which', 'unable', 'something', 'sleep', 'right', 'look', 'pain', 'into', 'used', 'rolled', 'back', 'longer', 'hitting', 'drops', 'tried', 'when', 'times', 'onto', 'sleeping', 'turned', 'could', 'pane', 'forget', 'weather', 'threw', 'eyes', 'hundred', 'feel', 'shut', 'quite', 'thought', 'position', 'nonsense', 'floundering'}


In [35]:
print(sorted(unique_words))

['about', 'always', 'back', 'because', 'before', 'began', 'could', 'drops', 'dull', 'eyes', 'feel', 'felt', 'floundering', 'forget', 'gregor', 'hard', 'have', 'heard', 'himself', 'hitting', 'however', 'hundred', 'into', 'legs', 'little', 'longer', 'look', 'made', 'mild', 'must', 'never', 'nonsense', 'only', 'onto', 'pain', 'pane', 'position', 'present', 'quite', 'rain', 'right', 'rolled', 'shut', 'sleep', 'sleeping', 'something', 'state', 'stopped', 'that', 'then', 'there', 'this', 'thought', 'threw', 'times', 'tried', 'turned', 'unable', 'used', 'weather', 'when', 'where', 'which', 'window']


In [36]:
print(word_count)

Counter({'that': 4, 'look': 2, 'dull': 2, 'feel': 2, 'right': 2, 'have': 2, 'gregor': 1, 'then': 1, 'turned': 1, 'window': 1, 'weather': 1, 'drops': 1, 'rain': 1, 'could': 1, 'heard': 1, 'hitting': 1, 'pane': 1, 'which': 1, 'made': 1, 'quite': 1, 'about': 1, 'sleep': 1, 'little': 1, 'longer': 1, 'forget': 1, 'this': 1, 'nonsense': 1, 'thought': 1, 'something': 1, 'unable': 1, 'because': 1, 'used': 1, 'sleeping': 1, 'present': 1, 'state': 1, 'into': 1, 'position': 1, 'however': 1, 'hard': 1, 'threw': 1, 'himself': 1, 'onto': 1, 'always': 1, 'rolled': 1, 'back': 1, 'where': 1, 'must': 1, 'tried': 1, 'hundred': 1, 'times': 1, 'shut': 1, 'eyes': 1, 'floundering': 1, 'legs': 1, 'only': 1, 'stopped': 1, 'when': 1, 'began': 1, 'mild': 1, 'pain': 1, 'there': 1, 'never': 1, 'felt': 1, 'before': 1})
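
The keys of a `Counter` are already the unique words, which is why the two lengths printed earlier match. A small sketch on a made-up sentence (not the lecture's text file) shows this, along with `most_common` for ranking:

```python
from collections import Counter

text = "that rain that pane that rain"   # made-up sample text
counts = Counter(text.split())           # count every word in one step

print(counts.most_common(2))             # [('that', 3), ('rain', 2)]
print(set(counts) == set(text.split()))  # True: the keys are the unique words
```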


---

### <font color="brown">Random</font>

In [37]:
import random

In [38]:
articles = ["the", "a"]
subjects = ["man", "woman", "boy", "girl", "scientist", "loser", "poser"]
verbs = ["jumped", "sang", "ran", "cried", "laughed", "played", "programmed"]
adverbs = ["loudly", "badly", "heavily", "softly", "madly", "sadly"]

In [39]:
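line = 1
while line < 7:
    sentence = ""  # named sentence rather than str, which would shadow the built-in str type
    sentence += random.choice(articles)
    sentence += " " + random.choice(subjects)
    sentence += " " + random.choice(verbs)
    adv = random.randint(0,1)  # gives 0 or 1 randomly
    if adv:
        sentence += " " + random.choice(adverbs)
    line += 1
print(sentence)  # outside the loop, so only the last of the 6 sentences is printed; indent it to print each one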

a girl ran
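
Random output changes on every run. If repeatable results are wanted (for testing, say), one option is a seeded generator. This is a sketch under assumptions: the helper `pick_three` and the seed value 210 are made up for illustration, and `subjects` is a shortened copy of the list above.

```python
import random

subjects = ["man", "woman", "boy", "girl"]   # shortened copy of the list above

def pick_three(seed):
    """Return three random subjects drawn from a generator with a fixed seed."""
    rng = random.Random(seed)   # private generator; the global one is untouched
    return [rng.choice(subjects) for _ in range(3)]

first = pick_three(210)
second = pick_three(210)
print(first == second)   # True: the same seed gives the same sequence
```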


---

### <font color="brown">NumPy - Numerical Python</font>
https://numpy.org/

#### A key feature of numpy is the n-dimensional array object, or ndarray, which allows you to perform mathematical operations on entire arrays like you would with single numeric values

---

In [40]:
import numpy as np

#### <font color="brown">Creating an ndarray from a list</font>

In [41]:
data1 = [3, 2.8, 19, 5, 17.6, 5.1]
arr1 = np.array(data1)
arr1

array([ 3. ,  2.8, 19. ,  5. , 17.6,  5.1])

In [42]:
type(arr1)

numpy.ndarray

**All items in an ndarray MUST BE OF THE SAME TYPE (unlike Python list)**

In [43]:
num1 = np.array([1,2,3,4,5])
str1 = np.array(['cs112','cs210','cs211'])
bool1 = np.array([True,True,False,True])
print(num1)
print(str1)
print(bool1)

[1 2 3 4 5]
['cs112' 'cs210' 'cs211']
[ True  True False  True]


In [44]:
# mix int with float
mixedarr2 = np.array([1,2.5])
print(mixedarr2)

[1.  2.5]


**Example above shows that if list has only int and float, then int is converted to float**

In [45]:
# mix numeric with boolean
mixedarr2 = np.array([1,2.3,True,False])
print(mixedarr2)

[1.  2.3 1.  0. ]


**Example above shows that if boolean is mixed with numbers, it is converted to a number (0 for False, 1 for True)**

In [46]:
# mix string with other types
mixedarr1 = np.array([1,2.3,'1',True])
print(mixedarr1)

['1' '2.3' '1' 'True']


**Example above shows that if at least one of the items is a string, the other items are coerced to strings**
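
The conversion can also be controlled explicitly. The following sketch (array names are illustrative) uses the `dtype` argument of `np.array` and the `astype` method to choose the element type instead of relying on automatic coercion.

```python
import numpy as np

as_text = np.array([1, 2.3, '1', True])
print(as_text.dtype.kind)             # U : a Unicode string array

digits = np.array(['1', '2', '3'])
as_int = digits.astype(int)           # parse each string as an integer
print(as_int.sum())                   # 6

forced = np.array([1, 2, 3], dtype=float)   # ask for floats up front
print(forced.dtype)                   # float64
```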

---

#### <font color="brown">Multiplying and adding on an ndarray</font>

In [47]:
data1

[3, 2.8, 19, 5, 17.6, 5.1]

In [48]:
# multiplying a Python list by a scalar repeats it (just like string)
data1*2

[3, 2.8, 19, 5, 17.6, 5.1, 3, 2.8, 19, 5, 17.6, 5.1]

In [49]:
# multiplying a numpy array multiplies all items individually 
print(arr1)
arr1*2

[ 3.   2.8 19.   5.  17.6  5.1]


array([ 6. ,  5.6, 38. , 10. , 35.2, 10.2])

In [50]:
# adding a scalar to a Python list?
data1 + 2

TypeError: can only concatenate list (not "int") to list

In [51]:
# adding a scalar to a numpy array
arr1 + 2

array([ 5. ,  4.8, 21. ,  7. , 19.6,  7.1])

In [52]:
# adding two Python lists concatenates them into a new list
data1 + data1

[3, 2.8, 19, 5, 17.6, 5.1, 3, 2.8, 19, 5, 17.6, 5.1]

In [53]:
# adding two numpy arrays does element-wise addition
arr1 + arr1

array([ 6. ,  5.6, 38. , 10. , 35.2, 10.2])
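
To get element-wise arithmetic from a plain Python list, a loop or list comprehension is needed. This sketch (with a shortened, illustrative data list) compares that approach to the NumPy version, and also shows that `*` between two arrays is an element-wise product.

```python
import numpy as np

data = [3, 2.8, 19]
arr = np.array(data)

doubled_list = [x * 2 for x in data]   # plain Python: explicit loop over the items
doubled_arr = arr * 2                  # NumPy: one expression over the whole array
print(np.allclose(doubled_arr, doubled_list))   # True

squares = arr * arr                    # element-wise product, not a matrix product
print(np.allclose(squares, [x * x for x in data]))   # True
```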

---

#### <font color="brown">Every ndarray has a type</font>
The type is accessed through the ndarray.dtype attribute (an attribute, not a method, so no parentheses)

In [54]:
data1 = [3, 2.8, 19, 5, 17.6, 5.1]
arr1 = np.array(data1)
print(arr1.dtype)

num1 = np.array([1,2,3,4,5])
print(num1.dtype)

str1 = np.array(['cs112','cs210','cs211'])
print(str1.dtype)

bool1 = np.array([True,True,False,True])
print(bool1.dtype)

float64
int64
<U5
bool


**<U5 above means Unicode strings of at most 5 characters. The < marks little-endian byte order (platform dependent); NumPy stores each character in 4 bytes**
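
Because the width is fixed when the array is created, a longer string assigned later is silently cut to fit. A short sketch (the array name `courses` and the replacement string are made up for illustration):

```python
import numpy as np

courses = np.array(['cs112', 'cs210', 'cs211'])
print(courses.dtype.itemsize)    # 20 bytes per item: 5 characters x 4 bytes

courses[0] = 'cs210-spring'      # longer than the 5 characters the dtype allows
print(courses[0])                # cs210 : the extra characters are dropped
```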

In [55]:
str2 = np.array(['one','three','five','eleven'])
print(str2.dtype)

<U6


---

#### <font color="brown">Every ndarray has a shape</font>
The shape is accessed through the ndarray.shape attribute (an attribute, not a method, so no parentheses)

In [56]:
print(arr1)
arr1.shape

[ 3.   2.8 19.   5.  17.6  5.1]


(6,)

In [57]:
arr2d = np.array([[1,2,3],[4,5,6]])  # input is nested list
print(arr2d)
print(arr2d.dtype)
print(arr2d.shape)   # 2 rows, 3 columns

[[1 2 3]
 [4 5 6]]
int64
(2, 3)


In [58]:
print(arr2d.ndim)  # ndim gives the number of dimensions (axes), not the number of rows

2


In [59]:
r,c = arr2d.shape
print(f'rows={r}, columns={c}')

rows=2, columns=3


In [60]:
arr3d = np.array([[[1,2,3],[4,5,6]],[[7,8,9],[11,12,13]]])

In [61]:
arr3d

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [11, 12, 13]]])

In [62]:
arr3d.shape

(2, 2, 3)
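
The attributes `ndim`, `shape`, and `size` are related: `ndim` is the length of `shape`, and `size` is the product of its entries. A quick check on the same 3D data (the variable name `cube` is illustrative):

```python
import numpy as np

cube = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [11, 12, 13]]])

print(cube.ndim)     # 3  : number of axes, same as len(cube.shape)
print(cube.shape)    # (2, 2, 3)
print(cube.size)     # 12 : total number of items, 2 * 2 * 3
print(len(cube))     # 2  : length of the first axis only
```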

**Nested lists of different lengths will not work**

In [64]:
np.array([[1,2,3],[4,5,6,7]])  

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.

---

#### <font color="brown">Creating boilerplate ndarrays using special NumPy functions</font>

**Array initialized to zeros**

In [65]:
# array initialized to zeros
zr = np.zeros(10)
zr

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [66]:
zr2d = np.zeros((5,3))  # 5 x 3 array filled with zeros
zr2d

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [67]:
zr2d = np.zeros(5,3)   # won't work: shape must be a single tuple (a bare int is fine only for 1-d); here 3 is read as the dtype
zr2d 

TypeError: Cannot interpret '3' as a data type

**Array initialized to ones**

In [68]:
ones2d = np.ones((3,4))
ones2d

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

In [69]:
# use dtype argument to set type to int instead of default float
ones2d = np.ones((3,4),dtype=int)
ones2d

array([[1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1]])

**Uninitialized array (contents are arbitrary, whatever happened to be in memory)**

In [70]:
np.empty((2,3,2))  # 3D

array([[[1., 1.],
        [1., 1.],
        [1., 1.]],

       [[1., 1.],
        [1., 1.],
        [1., 1.]]])

**<font color="red">not safe to assume that np.empty() will get you ones, or zeros, or anything specific</font>**
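
When a specific starting value is needed, `np.full` is a safer choice than `np.empty`. The sketch below (array names and values are illustrative) also shows the intended use of `np.empty`: every slot is assigned before it is read.

```python
import numpy as np

sevens = np.full((2, 3), 7)     # 2 x 3 array where every item is 7
print(sevens)

scratch = np.empty(4)           # contents are arbitrary at this point
scratch[:] = [1.5, 2.5, 3.5, 4.5]   # assign every slot before reading any of them
print(scratch.sum())            # 12.0
```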

**Making zeros, ones, empty array out of another array's shape**

In [71]:
arr2d = np.array([[1,2,3],[4,5,6]])

In [72]:
np.ones_like(arr2d)

array([[1, 1, 1],
       [1, 1, 1]])

In [73]:
np.empty_like(arr2d)

array([[4613937818241073152, 4613487458278336102, 4626041242239631360],
       [4617315517961601024, 4625647177272236442, 4617428107952285286]])

**Identity matrix (square matrix with 1s on main diagonal)**

In [74]:
np.eye(3)  # single parameter because matrix is square

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
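
A short sketch of why the identity matrix is useful, plus two optional arguments of `np.eye`: a second size for a non-square result, and `k` to shift the diagonal. The matrix `m` is made up for illustration, and `@` is the matrix-multiplication operator.

```python
import numpy as np

m = np.array([[1., 2.], [3., 4.]])
identity = np.eye(2)
print(np.array_equal(m @ identity, m))   # True: multiplying by identity changes nothing

wide = np.eye(2, 3)        # 2 rows, 3 columns, ones on the main diagonal
print(wide.shape)          # (2, 3)

shifted = np.eye(3, k=1)   # ones on the diagonal just above the main one
print(shifted[0, 1], shifted[0, 0])   # 1.0 0.0
```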

---

#### <font color="brown">NumPy functions arange and reshape</font>

**The arange function is the numpy array equivalent of Python range function**

In [75]:
np.arange(5)

array([0, 1, 2, 3, 4])

In [76]:
np.arange(-3,3,2)

array([-3, -1,  1])

In [77]:
np.arange(5,-2,-1)

array([ 5,  4,  3,  2,  1,  0, -1])
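
Unlike Python's `range`, `arange` also accepts a fractional step. When the number of points matters more than the step, `np.linspace` is a common alternative; it includes the stop value by default. A small sketch (names are illustrative):

```python
import numpy as np

quarters = np.arange(0, 1, 0.25)   # step of 0.25; the stop value 1 is excluded
print(len(quarters))               # 4

points = np.linspace(0, 1, 5)      # 5 evenly spaced values; the stop value 1 is included
print(len(points))                 # 5
print(points[-1])                  # 1.0
```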

**Reshaping an ndarray**

In [78]:
np.arange(15).reshape(3,5)   # reshape can be used on any ndarray

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [79]:
arr2d = np.array([[1,2,3],[4,5,6]])

In [80]:
arr2d.reshape(6)

array([1, 2, 3, 4, 5, 6])
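
Two details about `reshape` are worth a quick sketch (names are illustrative): one dimension may be given as -1 so that NumPy infers it, and the new shape must hold exactly the same number of items or a `ValueError` is raised. For a simple array like this one, the result shares memory with the original rather than copying it.

```python
import numpy as np

flat = np.arange(12)

grid = flat.reshape(3, -1)            # -1 is inferred as 4, since 12 / 3 = 4
print(grid.shape)                     # (3, 4)
print(np.shares_memory(flat, grid))   # True: a view of the same data, not a copy

try:
    flat.reshape(5, 3)                # 5 * 3 = 15 items, but flat has only 12
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print(err)
```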

In [81]:
arr3d = np.arange(12).reshape(2,3,2)
print(arr3d)

[[[ 0  1]
  [ 2  3]
  [ 4  5]]

 [[ 6  7]
  [ 8  9]
  [10 11]]]


**Above are 2 planes, each of dimension 3x2**

In [82]:
# get 2nd row, 1st column of 1st plane
arr3d[0,1,0]

2

In [83]:
# alternatively, you can use this syntax
arr3d[0][1][0]

2

In [84]:
# get 3rd row, 2nd column of 2nd plane
arr3d[1,2,1]

11
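
Since this array was built with `arange`, each item equals its own position in the flattened data, so the indexing can be checked by hand. The sketch below computes that position from the (plane, row, column) index and compares it with `np.ravel_multi_index`, which performs the same calculation.

```python
import numpy as np

shape = (2, 3, 2)                       # 2 planes, 3 rows, 2 columns
arr3d = np.arange(12).reshape(shape)

plane, row, col = 1, 2, 1
# each plane holds 3 * 2 = 6 items, each row holds 2 items
manual = plane * (3 * 2) + row * 2 + col
print(manual)                                            # 11
print(arr3d[plane, row, col])                            # 11
print(np.ravel_multi_index((plane, row, col), shape))    # 11
```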

In [85]:
np.zeros_like(arr3d)

array([[[0, 0],
        [0, 0],
        [0, 0]],

       [[0, 0],
        [0, 0],
        [0, 0]]])