# Python Data Structures

- discuss the object oriented features of some commonly used Python built-in data structures
- learn when they should be used instead of a regular class and when they shouldn't:
    - Tuples and Named Tuples
    - Dataclasses
    - Dictionaries
    - Lists and Sets
    - Three types of queues
    
## Empty objects

- every class we've created implicitly inherits from the `object` class
- instantiating `object` directly is rarely useful
    - a plain `object` instance can't have attributes added dynamically

In [1]:
o = object()

In [2]:
o.x = 5

AttributeError: 'object' object has no attribute 'x'

In [3]:
class MyClass:
    pass

In [4]:
m = MyClass()
m.x = "hello"

In [5]:
m.x

'hello'
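
One way to see why the two cases behave differently, sketched under the assumption of CPython's default behaviour: a bare `object` instance carries no per-instance `__dict__`, while an instance of an ordinary user-defined class does. The class name `Empty` is made up for this illustration.

```python
class Empty:
    """Hypothetical empty class, equivalent in spirit to MyClass above."""
    pass

plain = object()
custom = Empty()

# A bare object has nowhere to store new attributes
print(hasattr(plain, "__dict__"))   # False
# An instance of a user-defined class gets a per-instance dictionary
print(hasattr(custom, "__dict__"))  # True

custom.x = "hello"
print(custom.__dict__)              # {'x': 'hello'}
```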

## Tuples and named tuples

- tuples are objects that can store a specific number of other objects in sequence
- tuples are *immutable*
    - can't modify - add, remove, or replace members on the fly
- tuples are hashable (as long as every member is hashable), which makes them candidates for keys in dictionaries and members of sets
- tuples overlap with the idea of coordinates or dimensions
    - e.g., (x, y) pair or (r, g, b) color
    - order matters

In [6]:
# ticker, the current price, the 52-week high, and the 52-week low
stock = "AAPL", 123.52, 137.98, 53.15

In [7]:
stock2 = ("AAPL", 123.52, 137.98, 53.15)

In [8]:
import datetime

def middle(stock, date):
    symbol, current, high, low = stock
    return (((high + low) / 2), date)

In [9]:
middle(("AAPL", 123.52, 137.98, 53.15), datetime.date(2020, 12, 4))

(95.565, datetime.date(2020, 12, 4))

In [10]:
# single value tuple; must add a trailing ,
a = (42, )

In [11]:
a

(42,)

In [14]:
# no trailing , required for two or more values 
nums = (1, 2, 3,)

In [15]:
nums

(1, 2, 3)

In [19]:
a, b, c = nums

In [21]:
a, c

(1, 3)

In [17]:
a, b, c = nums[0], nums[1], nums[2]

In [18]:
a, b, c

(1, 2, 3)
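
The hashability point above can be sketched with a small example: tuples used as dictionary keys and set members, plus extended unpacking. The coordinate and color values are made up for illustration.

```python
# Map (x, y) grid coordinates to labels; tuples work as keys because they are hashable
grid = {(0, 0): "origin", (1, 0): "east", (0, 1): "north"}
print(grid[(1, 0)])            # east

# Tuples can also be members of a set; duplicates collapse into one member
colors = {(255, 0, 0), (0, 255, 0), (255, 0, 0)}
print(len(colors))             # 2

# Extended unpacking collects the remaining members into a list
symbol, *prices = ("AAPL", 123.52, 137.98, 53.15)
print(symbol, prices)          # AAPL [123.52, 137.98, 53.15]
```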

## Named tuple via typing.NamedTuple

- named tuples are tuples with attitude
- great way to create an immutable grouping of values
- we don't need to write an `__init__` method; it's created for us
- names are declared at the class level, but we're not actually creating class-level attributes

In [25]:
from typing import NamedTuple

class Stock(NamedTuple):
    symbol: str
    current: float
    high: float
    low: float

In [26]:
help(Stock)

Help on class Stock in module __main__:

class Stock(builtins.tuple)
 |  Stock(symbol: str, current: float, high: float, low: float)
 |  
 |  Stock(symbol, current, high, low)
 |  
 |  Method resolution order:
 |      Stock
 |      builtins.tuple
 |      builtins.object
 |  
 |  Methods defined here:
 |  
 |  __getnewargs__(self)
 |      Return self as a plain tuple.  Used by copy and pickle.
 |  
 |  __repr__(self)
 |      Return a nicely formatted representation string
 |  
 |  _asdict(self)
 |      Return a new dict which maps field names to their values.
 |  
 |  _replace(self, /, **kwds)
 |      Return a new Stock object replacing specified fields with new values
 |  
 |  ----------------------------------------------------------------------
 |  Class methods defined here:
 |  
 |  _make(iterable) from builtins.type
 |      Make a new Stock object from a sequence or iterable
 |  
 |  ----------------------------------------------------------------------
 |  Static methods defined 

In [27]:
s2 = Stock("AAPL", 123.52, high=137.98, low=53.15)

In [28]:
s2.high

137.98

In [29]:
s2.symbol

'AAPL'

In [30]:
s2.current = 100

AttributeError: can't set attribute

In [31]:
# tuple can contain mutable types
t = ("Relayer", ["Gates of Delirium", "Sound Chaser"])

In [32]:
t[1].append('To Be Over')

In [33]:
t

('Relayer', ['Gates of Delirium', 'Sound Chaser', 'To Be Over'])

In [34]:
hash(t)

TypeError: unhashable type: 'list'

In [35]:
hash(s2)

1666602977145175382

In [37]:
# NamedTuple with property
from typing import NamedTuple

class Stock(NamedTuple):
    symbol: str
    current: float
    high: float
    low: float
    
    @property
    def middle(self) -> float:
        return (self.high + self.low)/2
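
A short usage sketch for the class above, repeating the definition so the block runs on its own. It exercises the `middle` property together with `_replace` and `_asdict`, both of which appear in the `help(Stock)` output earlier.

```python
from typing import NamedTuple

class Stock(NamedTuple):
    symbol: str
    current: float
    high: float
    low: float

    @property
    def middle(self) -> float:
        return (self.high + self.low) / 2

s = Stock("AAPL", 123.52, high=137.98, low=53.15)
print(s.middle)

# _replace returns a new tuple; the original is unchanged
s_new = s._replace(current=100.0)
print(s.current, s_new.current)   # 123.52 100.0

# _asdict maps field names to their values
print(s_new._asdict())

# Still an ordinary tuple: unpacking and indexing work
symbol, current, high, low = s_new
print(symbol, s_new[1])           # AAPL 100.0
```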

## Dataclasses

- since Python 3.7, dataclasses let us define ordinary objects with a clean syntax for specifying attributes


In [38]:
from dataclasses import dataclass

@dataclass
class Stock:
    symbol: str
    current: float
    high: float
    low: float
    

In [39]:
help(Stock)

Help on class Stock in module __main__:

class Stock(builtins.object)
 |  Stock(symbol: str, current: float, high: float, low: float) -> None
 |  
 |  Stock(symbol: str, current: float, high: float, low: float)
 |  
 |  Methods defined here:
 |  
 |  __eq__(self, other)
 |      Return self==value.
 |  
 |  __init__(self, symbol: str, current: float, high: float, low: float) -> None
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  __repr__(self)
 |      Return repr(self).
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)
 |  
 |  __weakref__
 |      list of weak references to the object (if defined)
 |  
 |  ----------------------------------------------------------------------
 |  Data and other attributes defined here:
 |  
 |  __annotations__ = {'current': <class 'float'>, 'high': <class 'float'>...
 |  
 |  __datacla

In [40]:
s = Stock("AAPL", 123.52, 137.98, 53.15)

In [41]:
s.current

123.52

In [42]:
s.current = 145.99

In [43]:
s.unexpected_attribute = 'allowed'

In [44]:
s.unexpected_attribute

'allowed'

In [46]:
class StockOrdinary:
    def __init__(self, 
                 name: str, 
                 current: float, 
                 high: float, 
                 low: float) -> None:
        
        self.name = name
        self.current = current
        self.high = high
        self.low = low

In [47]:
s_ord = StockOrdinary("AAPL", 123.52, 137.98, 53.15)

In [48]:
s_ord_2 = StockOrdinary("AAPL", 123.52, 137.98, 53.15)

In [49]:
# a regular class compares by identity by default, so two separate instances with the same values are not equal
s_ord == s_ord_2

False

In [52]:
stock2 = Stock(symbol='AAPL', current=122.25, high=137.98, low=53.15)

In [55]:
stock1 = Stock(symbol='AAPL', current=122.25, high=137.98, low=53.15)

In [56]:
# dataclass stocks are equal out of the box
stock1 == stock2

True

In [2]:
# dataclass also lets you give attributes default values
from dataclasses import dataclass

@dataclass
class StockDefaults:
    name: str
    current: float = 0
    high: float = 0
    low: float = 0

In [3]:
# default values are used
s1 = StockDefaults('GOOG')

In [4]:
s1

StockDefaults(name='GOOG', current=0, high=0, low=0)

In [5]:
# can still provide values for attributes
StockDefaults("GOOG", 1826.77, 1847.20, 1013.54)

StockDefaults(name='GOOG', current=1826.77, high=1847.2, low=1013.54)

In [6]:
# equal comparison is provided by default; we can also order the objects if needed
@dataclass(order=True)
class StockOrdered:
    name: str
    current: float = 0
    high: float = 0
    low: float = 0

In [7]:
stock_ordered1 = StockOrdered("GOOG", 1826.77, 1847.20, 1013.54)
stock_ordered2 = StockOrdered("GOOG")
stock_ordered3 = StockOrdered("GOOG", 1728.28, high=1733.18, low=1666.33)

In [8]:
stock_ordered1 < stock_ordered2

False

In [9]:
stock_ordered1 > stock_ordered2

True

In [10]:
from pprint import pprint

In [11]:
pprint(sorted([stock_ordered1, stock_ordered2, stock_ordered3]))

[StockOrdered(name='GOOG', current=0, high=0, low=0),
 StockOrdered(name='GOOG', current=1728.28, high=1733.18, low=1666.33),
 StockOrdered(name='GOOG', current=1826.77, high=1847.2, low=1013.54)]


In [12]:
# create frozen class similar to typing.NamedTuple
@dataclass(frozen=True, order=True)
class StockFrozen:
    name: str
    current: float = 0
    high: float = 0
    low: float = 0

In [13]:
goog = StockFrozen("GOOG")

In [14]:
goog.high

0

In [15]:
# can't update attributes of a frozen instance
goog.high = 100

FrozenInstanceError: cannot assign to field 'high'
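
A minimal sketch of two things a frozen dataclass allows in practice: building a modified copy with `dataclasses.replace`, and using instances as dictionary keys (with the default `eq=True`, `frozen=True` also generates `__hash__`). The class `FrozenPrice` is a hypothetical stand-in, not part of the notes above.

```python
from dataclasses import dataclass, replace, FrozenInstanceError

@dataclass(frozen=True)
class FrozenPrice:
    """Hypothetical frozen dataclass used only for this illustration."""
    name: str
    current: float = 0.0

goog = FrozenPrice("GOOG", 1826.77)

# Assignment is rejected on a frozen instance
try:
    goog.current = 100.0
except FrozenInstanceError as err:
    print("rejected:", err)

# replace() builds a modified copy instead of mutating in place
updated = replace(goog, current=1728.28)
print(goog.current, updated.current)        # 1826.77 1728.28

# Frozen instances are hashable, so they can serve as dictionary keys
notes = {goog: "before", updated: "after"}
print(notes[FrozenPrice("GOOG", 1826.77)])  # before
```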

## Dictionary

- incredibly useful data structure that allows us to map objects directly to other objects
- extremely efficient at looking up a **value**, given a specific **key** that maps to that value
- this is possible because the **hash** of the key is used to locate the value
- immutable built-in objects such as strings, numbers, and tuples of hashable members have a numeric hash code
    - a relatively simple table is used to map the numeric hashes directly to values
- instances of Python classes are mutable, but they can be used as keys if they are **hashable**
    - a class provides `__hash__()`, which the built-in **hash** function calls when hashing
- the insertion order of key/value pairs is maintained by dictionaries from Python 3.7 onward
- for two values (strings, numbers, or tuples) to be equal, they must have the same characters or values, and their **hash** values must also be equal
- **hash** collisions can occur, so a lookup is **not always** immediate
    - **hash** collisions can slow down the lookup
- there are several ways to create a dictionary

In [94]:
help(hash)

Help on built-in function hash in module builtins:

hash(obj, /)
    Return the hash value for the given object.
    
    Two objects that compare equal must also have the same hash value, but the
    reverse is not necessarily true.



In [76]:
hash('abc')

-6234227045135174999

In [77]:
hash(123)

123

In [78]:
hash(('a', 'b', 'c'))

8531899467551042527

In [79]:
hash([1, 2, 3])

TypeError: unhashable type: 'list'

In [131]:
x = 2022
y = 2305843009213695971

In [128]:
hash(x)

2022

In [129]:
hash(y)

2020

In [132]:
# hash(y) is 2020 (y % (2**61 - 1) on 64-bit CPython), so y collides with 2020, not with x = 2022
hash(x) == hash(y)

False
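
A sketch of a real collision, assuming CPython, where the hash of a non-negative integer is the value reduced modulo `sys.hash_info.modulus`. Two different integers share a hash, yet both still work as separate dictionary keys because equality is checked as well.

```python
import sys

# On CPython, the hash of a non-negative int is the value reduced modulo this prime
modulus = sys.hash_info.modulus      # 2**61 - 1 on typical 64-bit builds
small = 2020
big = modulus + small                # a different integer with the same hash

print(hash(small) == hash(big))      # True on CPython
print(small == big)                  # False

# Both keys coexist: equality, not just the hash, decides which entry matches
table = {small: "small", big: "big"}
print(len(table), table[small], table[big])   # 2 small big
```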

In [80]:
# using keyword parameters; similar to dataclass and namedtuple
stock = dict(current=1235.20, high=1242.54, low=1231.06)

In [81]:
stock

{'current': 1235.2, 'high': 1242.54, 'low': 1231.06}

In [82]:
stocks = {
    "GOOG": (1235.20, 1242.54, 1231.06),
    "MSFT": (110.41, 110.45, 109.84) 
}

In [83]:
stocks

{'GOOG': (1235.2, 1242.54, 1231.06), 'MSFT': (110.41, 110.45, 109.84)}

In [84]:
# accessing value
stocks["GOOG"]

(1235.2, 1242.54, 1231.06)

In [85]:
stocks['APPL']

KeyError: 'APPL'

In [87]:
# better approach
print(stocks.get('APPL'))

None


In [88]:
# provide default value
print(stocks.get('APPL', 'NOT FOUND'))

NOT FOUND


In [89]:
# updating existing key/value pairs
stocks['GOOG'] = (100, 100, 100)

In [90]:
stocks.get("GOOG")

(100, 100, 100)

In [91]:
# adding new key/value iff the key doesn't exist
# if key is in the dictionary, it behaves just like get
stocks.setdefault("GOOG", "INVALID")

(100, 100, 100)

In [92]:
stocks.setdefault("BB", (10.87, 10.76, 10.90))

(10.87, 10.76, 10.9)

In [93]:
stocks['BB']

(10.87, 10.76, 10.9)

In [96]:
# type hints checked by mypy: built-in types can be used as generic type hints
# from Python 3.9 and mypy 0.812 onward; older versions need
# `from __future__ import annotations` as the first import

stocks: dict[str, tuple[float, float, float]] = {}

In [97]:
stocks.setdefault('APPL', (150, 175, 125))

(150, 175, 125)

In [98]:
stocks

{'APPL': (150, 175, 125)}

In [99]:
for stock, values in stocks.items():
    print(f"{stock} last value is {values[0]}")

APPL last value is 150


In [118]:
# using objects as key
class AnObject:
    def __init__(self, avalue):
        self.avalue = avalue
        
    def __repr__(self):
        return f'AnObject: {self.avalue}'

In [119]:
random_keys = {}

In [120]:
random_keys["astring"] = "somestring"
random_keys[5] = "aninteger"
random_keys[25.2] = "floats work too"
random_keys[("abc", 123)] = "so do tuples"

In [121]:
random_keys

{'astring': 'somestring',
 5: 'aninteger',
 25.2: 'floats work too',
 ('abc', 123): 'so do tuples'}

In [122]:
my_object = AnObject(14)

In [123]:
random_keys[my_object] = "We can even store objects"

In [124]:
random_keys

{'astring': 'somestring',
 5: 'aninteger',
 25.2: 'floats work too',
 ('abc', 123): 'so do tuples',
 AnObject: 14: 'We can even store objects'}

In [125]:
# changing my_object doesn't affect the lookup: the default hash is based on identity, not on attribute values
my_object.avalue = 12

In [126]:
# random_keys has type hints: dict[Union[str, int, float, Tuple[str, int], AnObject], str]
for key in random_keys:
    print(f"{key!r} has value {random_keys[key]!r}")

'astring' has value 'somestring'
5 has value 'aninteger'
25.2 has value 'floats work too'
('abc', 123) has value 'so do tuples'
AnObject: 12 has value 'We can even store objects'
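
`AnObject` relies on the default identity-based hash. When two separate instances with the same data should find the same entry, a class can define `__eq__` and `__hash__` together. The `Ticker` class below is a hypothetical sketch of that pattern.

```python
class Ticker:
    """Hypothetical key class: equality and hash are both based on the symbol."""

    def __init__(self, symbol: str) -> None:
        self.symbol = symbol

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Ticker):
            return NotImplemented
        return self.symbol == other.symbol

    def __hash__(self) -> int:
        # Objects that compare equal must return the same hash
        return hash(self.symbol)

prices = {Ticker("GOOG"): 1235.20}

# A separate but equal instance finds the same entry
print(prices[Ticker("GOOG")])        # 1235.2
print(Ticker("MSFT") in prices)      # False
```

The attribute used for hashing should not change while the object is stored as a key, otherwise the entry can no longer be found.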


## Dictionary use cases

- dictionaries are extremely versatile and have numerous uses
- a couple of important ones:
    1. `dict[str, tuple[float, float, float]]` or `dict[str, Stock]`
        - similar to the stock example where symbol maps to tuple of prices
    2. `dict[str, Union[str, float, Tuple[float, float]]]`
        - e.g., {'name': 'GOOG', 'current': 1245.21, 'range': (1252.64, 1245.18)}
        - this case overlaps with named tuples, dataclass and objects in general
        
- technically, most classes are implemented using a dictionary: instance attributes are stored in `__dict__`

In [133]:
help(my_object)

Help on AnObject in module __main__ object:

class AnObject(builtins.object)
 |  AnObject(avalue)
 |  
 |  Methods defined here:
 |  
 |  __init__(self, avalue)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  __repr__(self)
 |      Return repr(self).
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)
 |  
 |  __weakref__
 |      list of weak references to the object (if defined)



In [135]:
my_object.__dict__

{'avalue': 12}

In [136]:
my_object.__dict__['avalue']

12

In [137]:
my_object.avalue

12
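
The link between attributes and `__dict__` goes both ways, as this small sketch suggests. The `Point` class is made up for illustration; `vars()` is the built-in that returns an object's `__dict__`.

```python
class Point:
    """Hypothetical class for illustration."""

    def __init__(self, x: float, y: float) -> None:
        self.x = x
        self.y = y

p = Point(1.0, 2.0)

# vars(obj) returns the same dictionary as obj.__dict__
print(vars(p) is p.__dict__)   # True

# Writing to the dictionary is visible as an attribute, and vice versa
p.__dict__["z"] = 3.0
p.x = 10.0
print(p.z, vars(p))            # 3.0 {'x': 10.0, 'y': 2.0, 'z': 3.0}
```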

## Using defaultdict

- we've used the `setdefault` method to set a default value if the key doesn't exist in the dict instance
- it gets monotonous to check for the key or call `setdefault` every time a new key is inserted or an existing key is updated
- e.g. let's find letter frequency using a regular dictionary

In [139]:
def letter_frequency(sentence: str) -> dict[str, int]:
    frequencies: dict[str, int] = {} # regular dictionary
    for letter in sentence:
        frequencies.setdefault(letter, 0)  # step 1: make sure the key exists
        frequencies[letter] += 1 # step 2: increment the count
    return frequencies

In [140]:
hist = letter_frequency("Mississippi river is the longest river in Mississippi")

In [141]:
hist

{'M': 2,
 'i': 12,
 's': 10,
 'p': 4,
 ' ': 7,
 'r': 4,
 'v': 2,
 'e': 4,
 't': 2,
 'h': 1,
 'l': 1,
 'o': 1,
 'n': 2,
 'g': 1}

In [142]:
from collections import defaultdict

def letter_frequency_2(sentence: str) -> defaultdict[str, int]:
    frequencies: defaultdict[str, int] = defaultdict(int)
    for letter in sentence:
        frequencies[letter] += 1 # one step !!
    return frequencies

In [143]:
hist1 = letter_frequency_2("Mississippi river is the longest river in Mississippi")

In [144]:
hist1

defaultdict(int,
            {'M': 2,
             'i': 12,
             's': 10,
             'p': 4,
             ' ': 7,
             'r': 4,
             'v': 2,
             'e': 4,
             't': 2,
             'h': 1,
             'l': 1,
             'o': 1,
             'n': 2,
             'g': 1})

In [154]:
# int() function returns 0
int()

0

In [160]:
# we can pass a whole bunch of built-in and user 
# defined functions to initialize the new key with!
str()

''

In [157]:
float()

0.0

In [158]:
list()

[]

In [159]:
dict()

{}

In [147]:
# initializing distance dict in Dijkstra's SSSP algorithm
# dist[(u, v)] = infinity
import math
dist: defaultdict[tuple[int, int], float] = defaultdict(lambda: math.inf)

In [148]:
dist[(1, 2)]

inf

In [151]:
if dist[(1, 2)] > 100:
    dist[(1, 2)] = 1+2

In [152]:
dist[(1, 2)]

3

In [163]:
# we can create our own functions and dataclass with default values
# to pass as a default function for defaultdict
from dataclasses import dataclass

@dataclass
class Price:
    current: float = 0.0
    high: float = 0.0
    low: float = 0.0

In [164]:
Price()

Price(current=0.0, high=0.0, low=0.0)

In [166]:
portfolio = defaultdict(Price)

In [167]:
portfolio['GOOG']

Price(current=0.0, high=0.0, low=0.0)

In [168]:
portfolio

defaultdict(__main__.Price, {'GOOG': Price(current=0.0, high=0.0, low=0.0)})

In [184]:
portfolio["AAPL"] = Price(current=122.25, high=137.98, low=53.15)

In [170]:
from pprint import pprint

In [171]:
pprint(portfolio)

defaultdict(<class '__main__.Price'>,
            {'AAPL': Price(current=122.25, high=137.98, low=53.15),
             'GOOG': Price(current=0.0, high=0.0, low=0.0)})


In [172]:
# jupyter notebook uses pprint to print values of variables/objects
portfolio

defaultdict(__main__.Price,
            {'GOOG': Price(current=0.0, high=0.0, low=0.0),
             'AAPL': Price(current=122.25, high=137.98, low=53.15)})

In [173]:
# what if we wanted prices for stocks grouped by month
# dictionary within dictionary by month!
# within inner dictionary we want Price

def make_defaultdict():
    return defaultdict(Price)

In [174]:
by_month = defaultdict(make_defaultdict)

In [176]:
by_month["APPL"]["Jan"] = Price(current=122.25, high=137.98, low=53.15)

In [177]:
by_month

defaultdict(<function __main__.make_defaultdict()>,
            {'APPL': defaultdict(__main__.Price,
                         {'Jan': Price(current=122.25, high=137.98, low=53.15)})})

In [179]:
by_month['APPL']

defaultdict(__main__.Price,
            {'Jan': Price(current=122.25, high=137.98, low=53.15)})

In [180]:
by_month['APPL']['Jan']

Price(current=122.25, high=137.98, low=53.15)

In [181]:
# shortcut is to use lambda function
by_month1 = defaultdict(lambda: defaultdict(Price))

In [182]:
by_month1["APPL"]["Jan"] = Price(current=122.25, high=137.98, low=53.15)

In [183]:
by_month1

defaultdict(<function __main__.<lambda>()>,
            {'APPL': defaultdict(__main__.Price,
                         {'Jan': Price(current=122.25, high=137.98, low=53.15)})})
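
Another common pattern, sketched here with made-up price observations, is grouping values by key with `defaultdict(list)`. It also shows a caveat worth keeping in mind: merely reading a missing key inserts it.

```python
from collections import defaultdict

# Illustrative (symbol, price) observations; the values are made up
ticks = [("GOOG", 1235.20), ("MSFT", 110.41), ("GOOG", 1242.54), ("MSFT", 109.84)]

history: defaultdict[str, list[float]] = defaultdict(list)
for symbol, price in ticks:
    # A missing symbol starts with list(), i.e. an empty list
    history[symbol].append(price)

print(dict(history))
# {'GOOG': [1235.2, 1242.54], 'MSFT': [110.41, 109.84]}

# Caveat: reading a missing key creates the entry
print("BB" in history)   # False
history["BB"]
print("BB" in history)   # True
```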

## Counter

- counting is one of the most common tasks developers do:
    - the *I want to count specific instances in an iterable* use case is so common that Python developers thought it deserved a dedicated data structure in the standard library!

In [185]:
from collections import Counter

In [186]:
freq = Counter("Mississippi river is the longest river in Mississippi")

In [187]:
freq.most_common()

[('i', 12),
 ('s', 10),
 (' ', 7),
 ('p', 4),
 ('r', 4),
 ('e', 4),
 ('M', 2),
 ('v', 2),
 ('t', 2),
 ('n', 2),
 ('h', 1),
 ('l', 1),
 ('o', 1),
 ('g', 1)]

In [188]:
freq.most_common(5)

[('i', 12), ('s', 10), (' ', 7), ('p', 4), ('r', 4)]
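
Beyond `most_common`, a `Counter` behaves like a dictionary whose missing keys count as zero, and it supports simple arithmetic. A brief sketch with a made-up sentence:

```python
from collections import Counter

words = "the quick brown fox jumps over the lazy dog the end".split()
counts = Counter(words)
print(counts["the"])          # 3
print(counts["cat"])          # 0, and the missing key is not inserted

# update() adds counts from another iterable
counts.update(["fox"])
print(counts.most_common(2))  # [('the', 3), ('fox', 2)]

# Counters support addition and subtraction
a = Counter("aab")
b = Counter("abc")
print(a + b)                  # Counter({'a': 3, 'b': 2, 'c': 1})
print(a - b)                  # Counter({'a': 1}), non-positive counts are dropped
```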

## Lists

- the list is Python's generic, built-in sequence type
- a list should be used to store several instances of the same type of object
    - however, a Python list can store objects of any type
- lists also maintain the order of the elements
- lists are mutable
- don't use a list for collecting different attributes of the same object
    - a tuple, named tuple, dataclass, or dictionary may be better
- some examples of list types: list[str], list[int], list[tuple], list[float], etc.

In [195]:
numbers = list(range(20, -1, -1))

In [196]:
numbers

[20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

In [197]:
nums = [2, 4, 6, 8]

In [198]:
nums.append(10)

In [194]:
help(nums)

Help on list object:

class list(object)
 |  list(iterable=(), /)
 |  
 |  Built-in mutable sequence.
 |  
 |  If no argument is given, the constructor creates a new empty list.
 |  The argument must be an iterable if specified.
 |  
 |  Methods defined here:
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __delitem__(self, key, /)
 |      Delete self[key].
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __getitem__(...)
 |      x.__getitem__(y) <==> x[y]
 |  
 |  __gt__(self, value, /)
 |      Return self>value.
 |  
 |  __iadd__(self, value, /)
 |      Implement self+=value.
 |  
 |  __imul__(self, value, /)
 |      Implement self*=value.
 |  
 |  __init__(self, /, *args, **kwargs)
 |      Initialize self.  See help(type(self)) for accurate sign

In [203]:
nums.reverse()

In [204]:
nums

[10, 8, 6, 4, 2]

In [205]:
numbers.sort()

In [206]:
numbers

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]

### Sorting lists

- see this article on Python sorting algorithms: https://realpython.com/sorting-algorithms-python/
- an important task when working with lists is sorting them!
- sorting is a popular topic studied in algorithms classes
    - there are many sorting algorithms with different running times!
- Python uses Timsort algorithm created by Tim Peters 
- Timsort algorithm is considered a hybrid sorting algorithm
    - employs a best-of-both-worlds combination of insertion sort and merge sort 
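
One practical consequence is that Python's sort is stable: items that compare equal keep their relative order. That allows a multi-key sort in two passes, sketched here with made-up records.

```python
# (name, grade) records; the data is made up for illustration
records = [("dave", "B"), ("alice", "A"), ("carol", "B"), ("bob", "A")]

# Pass 1: sort by the secondary key (name)
records.sort(key=lambda r: r[0])
# Pass 2: sort by the primary key (grade); ties keep the order from pass 1
records.sort(key=lambda r: r[1])

print(records)
# [('alice', 'A'), ('bob', 'A'), ('carol', 'B'), ('dave', 'B')]
```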

In [207]:
numbers.sort(reverse=True)

In [208]:
numbers

[20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

In [211]:
# sorting objects with multiple values
values = [(1, 'a'), (2, 'b'), (1, '1'), (2, 'a')]

In [212]:
values.sort()

In [213]:
values

[(1, '1'), (1, 'a'), (2, 'a'), (2, 'b')]

In [217]:
values.sort(key=lambda t: t[0])

In [218]:
values

[(1, '1'), (1, 'a'), (2, 'a'), (2, 'b')]

In [219]:
values.sort(key=lambda t: t[1])

In [220]:
values

[(1, '1'), (1, 'a'), (2, 'a'), (2, 'b')]

In [239]:
# another option
import operator

In [240]:
values.sort(key=operator.itemgetter(0))

In [241]:
values

[(1, '1'), (1, 'a'), (2, 'a'), (2, 'b')]

In [222]:
# sorting user-defined objects
# use the order=True parameter to dataclass 
# if you want to order based on all the attributes in the order declared...
@dataclass
class Student:
    first_name: str
    last_name: str
    id: int
    gpa: float
        
    # sort() only needs __lt__ to compare two objects of Student type
    # note: a higher GPA is deliberately treated as "less than", so sort() puts the highest GPA first
    def __lt__(self, other: 'Student') -> bool:
        return self.gpa > other.gpa

In [223]:
s1 = Student('John', 'Smith', 123, 2.5)
s2 = Student('Jake', 'Jordan', 200, 3.5)
s3 = Student('Alice', 'Wonderland', 300, 4.5)

In [224]:
students = [s1, s2, s3]

In [225]:
students

[Student(first_name='John', last_name='Smith', id=123, gpa=2.5),
 Student(first_name='Jake', last_name='Jordan', id=200, gpa=3.5),
 Student(first_name='Alice', last_name='Wonderland', id=300, gpa=4.5)]

In [227]:
# use the __lt__ function provided in each object to order
students.sort()

In [228]:
students

[Student(first_name='Alice', last_name='Wonderland', id=300, gpa=4.5),
 Student(first_name='Jake', last_name='Jordan', id=200, gpa=3.5),
 Student(first_name='John', last_name='Smith', id=123, gpa=2.5)]

In [234]:
# also sort based on each attribute
students.sort(reverse=True, key=lambda item: item.last_name)

In [235]:
students

[Student(first_name='Alice', last_name='Wonderland', id=300, gpa=4.5),
 Student(first_name='John', last_name='Smith', id=123, gpa=2.5),
 Student(first_name='Jake', last_name='Jordan', id=200, gpa=3.5)]

In [236]:
students.sort(key=lambda item: item.id)

In [237]:
students

[Student(first_name='John', last_name='Smith', id=123, gpa=2.5),
 Student(first_name='Jake', last_name='Jordan', id=200, gpa=3.5),
 Student(first_name='Alice', last_name='Wonderland', id=300, gpa=4.5)]

In [238]:
# another option is to use the operator module
import operator

In [242]:
students.sort(key=operator.attrgetter("gpa"))

In [243]:
students

[Student(first_name='John', last_name='Smith', id=123, gpa=2.5),
 Student(first_name='Jake', last_name='Jordan', id=200, gpa=3.5),
 Student(first_name='Alice', last_name='Wonderland', id=300, gpa=4.5)]
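
When the order depends on more than one attribute, a key function can return a tuple. The sketch below redefines a plain `Student` dataclass (without the custom `__lt__`) and uses made-up GPAs so that a tie occurs.

```python
from dataclasses import dataclass
from operator import attrgetter

@dataclass
class Student:
    first_name: str
    last_name: str
    id: int
    gpa: float

students = [
    Student("John", "Smith", 123, 3.5),
    Student("Jake", "Jordan", 200, 3.5),
    Student("Alice", "Wonderland", 300, 4.0),
]

# Highest GPA first, ties broken alphabetically by last name
ranked = sorted(students, key=lambda s: (-s.gpa, s.last_name))
print([s.last_name for s in ranked])   # ['Wonderland', 'Jordan', 'Smith']

# attrgetter accepts several names and returns a tuple of attribute values
by_name = sorted(students, key=attrgetter("last_name", "first_name"))
print([s.last_name for s in by_name])  # ['Jordan', 'Smith', 'Wonderland']

# sorted() returns a new list; the original order is untouched
print(students[0].first_name)          # John
```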

### List comprehension

- list comprehensions are a shortcut that can make you a more efficient programmer
- e.g., the set of squares $S = \{x^2 : x \in \{0 ... 9\}\}$
    - is equivalent to: 
    ```python
    S = [x**2 for x in range(10)]
    ```
- consists of brackets containing an expression followed by a `for` clause, then zero or more `for` or `if` clauses
    - the expression can be anything
    - always results in a new list from evaluating the expression
- syntax:
```python
someList = [expression for item in list if conditional] # one-way selector
someList = [expression if conditional else expression for item in list] # two-way selector
```

In [244]:
# beginner way to create a list of squared values of list 0 to 9?
sq = []
for i in range(10):
    sq.append(i**2)

In [245]:
sq

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [246]:
# professional way: list comprehension
S = [x**2 for x in range(10)]

In [247]:
S

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In Math: $V = \{2^0, 2^1, 2^2, 2^3, ... 2^{12}\}$

In [248]:
V = [2**x for x in range(13)]
print(V)

[1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096]


In Math: $M = \{x \mid x \in S \text{ and } x \text{ is even}\}$

In [249]:
# List comprehension
M1 = [x for x in S if x%2==0]

In [250]:
M1

[0, 4, 16, 36, 64]

In [253]:
evens = [True if x%2==0 else False for x in range(1, 21)]

In [254]:
evens

[False,
 True,
 False,
 True,
 False,
 True,
 False,
 True,
 False,
 True,
 False,
 True,
 False,
 True,
 False,
 True,
 False,
 True,
 False,
 True]

In [255]:
sentence = "The quick brown fox jumps over the lazy dog"
# words = sentence.split()
# can make a list of tuples or list of lists
wlist = [(w.upper(), w.lower(), len(w)) for w in sentence.split()]

In [256]:
wlist

[('THE', 'the', 3),
 ('QUICK', 'quick', 5),
 ('BROWN', 'brown', 5),
 ('FOX', 'fox', 3),
 ('JUMPS', 'jumps', 5),
 ('OVER', 'over', 4),
 ('THE', 'the', 3),
 ('LAZY', 'lazy', 4),
 ('DOG', 'dog', 3)]

### Nested list comprehension

- syntax to handle nested lists with nested loops in a list comprehension

```python
lst = [value for innerList in outerList for value in innerList]
lst = [value for innerList in outerList for value in innerList if condition]
lst = [value if condition else value1 for innerList in outerList for value in innerList]
```

In [257]:
# let's create a nestedList holding [1, 2, 3, 4] five times
# (note: * repeats a reference to the same inner list, not five copies)
nestedList = [list(range(1, 5))]*5

In [258]:
nestedList

[[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]

In [259]:
# let's just keep the even values from each nested lists
even = [x for lst in nestedList for x in lst if x%2==0 ]

In [260]:
even

[2, 4, 2, 4, 2, 4, 2, 4, 2, 4]

In [261]:
# let's create a single flat list of boolean True/False values
evenOdd = [True if x%2 == 0 else False for lst in nestedList for x in lst]

In [262]:
evenOdd

[False,
 True,
 False,
 True,
 False,
 True,
 False,
 True,
 False,
 True,
 False,
 True,
 False,
 True,
 False,
 True,
 False,
 True,
 False,
 True]
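
A caveat about building nested lists with `*`: the outer list repeats a reference to one inner list, so a change to one row shows up in all of them. A comprehension creates independent rows. A minimal sketch:

```python
# Repetition copies the reference, so every row is the same list object
shared = [list(range(1, 5))] * 3
shared[0][0] = 99
print(shared)       # [[99, 2, 3, 4], [99, 2, 3, 4], [99, 2, 3, 4]]

# A comprehension evaluates the expression once per row, giving independent lists
independent = [list(range(1, 5)) for _ in range(3)]
independent[0][0] = 99
print(independent)  # [[99, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]

# Flattening with a nested comprehension works the same way on either structure
flat = [x for row in independent for x in row]
print(flat)         # [99, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]
```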

## Sets

- lists are great and versatile, but sometimes we need all the elements of a container to be unique
- Python sets can store any hashable objects, not just strings and numbers
    - hashable object must implement `__hash__()` method
- sets are inherently unordered due to hash-based data structure used for efficient access to the members
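- it's easy to check if an item is **in** the set
- if you need the elements in order, convert the set into a sorted list, e.g. with `sorted()`
    - iteration order over a set is arbitrary; the Jupyter output below may show the members sorted, but that is only a display detail
- there is no literal syntax to create an empty set; `{}` creates an empty dictionary
- must use the `set()` function to create an empty set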

In [263]:
song_library = [
    ("Phantom Of The Opera", "Sarah Brightman"),
    ("Knocking On Heaven's Door", "Guns N' Roses"),
    ("Captain Nemo", "Sarah Brightman"),
    ("Patterns In The Ivy", "Opeth"),
    ("November Rain", "Guns N' Roses"),
    ("Beautiful", "Sarah Brightman"),
    ("Mal's Song", "Vixy and Tony"), ]

In [264]:
artists = set()

In [267]:
for song, artist in song_library:
    artists.add(artist)

In [268]:
artists

{"Guns N' Roses", 'Opeth', 'Sarah Brightman', 'Vixy and Tony'}

In [269]:
'Opeth' in artists

True

In [270]:
'Michael' in artists

False

In [271]:
alphabetical = sorted(list(artists))

In [272]:
alphabetical

["Guns N' Roses", 'Opeth', 'Sarah Brightman', 'Vixy and Tony']

In [273]:
aset = set([1, 2, 3, 4, 5, 6])
bset = set([3, 4, 7, 8, 9, 10])

In [274]:
# set operations
help(set)

Help on class set in module builtins:

class set(object)
 |  set() -> new empty set object
 |  set(iterable) -> new set object
 |  
 |  Build an unordered collection of unique elements.
 |  
 |  Methods defined here:
 |  
 |  __and__(self, value, /)
 |      Return self&value.
 |  
 |  __contains__(...)
 |      x.__contains__(y) <==> y in x.
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __gt__(self, value, /)
 |      Return self>value.
 |  
 |  __iand__(self, value, /)
 |      Return self&=value.
 |  
 |  __init__(self, /, *args, **kwargs)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  __ior__(self, value, /)
 |      Return self|=value.
 |  
 |  __isub__(self, value, /)
 |      Return self-=value.
 |  
 |  __iter__(self, /)
 |      Implement iter(self).
 |  
 |  __ixor__(self, value, /)
 |      Re

In [275]:
aset.difference(bset)

{1, 2, 5, 6}

In [276]:
# operator for difference
aset - bset

{1, 2, 5, 6}

In [277]:
aset.intersection(bset)

{3, 4}

In [278]:
# operator for intersection
aset & bset

{3, 4}

In [279]:
aset.symmetric_difference(bset)

{1, 2, 5, 6, 7, 8, 9, 10}

In [280]:
# operator for symmetric difference
aset ^ bset

{1, 2, 5, 6, 7, 8, 9, 10}

In [281]:
aset.union(bset)

{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}

In [282]:
# union operator
aset | bset

{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
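
A few more set idioms, sketched with a shortened copy of the song library: a set comprehension, the empty-set pitfall, subset tests, and `frozenset` as the immutable, hashable variant.

```python
song_library = [
    ("Phantom Of The Opera", "Sarah Brightman"),
    ("November Rain", "Guns N' Roses"),
    ("Captain Nemo", "Sarah Brightman"),
]

# Set comprehension: same result as calling add() in a loop
artists = {artist for _, artist in song_library}
print(sorted(artists))                 # ["Guns N' Roses", 'Sarah Brightman']

# {} is an empty dict, not an empty set
print(type({}).__name__, type(set()).__name__)  # dict set

# Subset and disjointness tests
print({3, 4} <= {1, 2, 3, 4, 5, 6})    # True
print({1, 2}.isdisjoint({7, 8}))       # True

# frozenset is hashable, so it can be nested inside another set
groups = {frozenset({1, 2}), frozenset({2, 1})}
print(len(groups))                     # 1
```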

## Stack

- **Last In First Out** data structure
- Python doesn't provide a dedicated stack data structure, but a **list** can easily be used as a stack
    - use append() to push
    - use pop() to pop the last element pushed
- we can extend the list class to create a Stack class
    - we'll learn that in "When Objects are Alike (Inheritance)" chapter

In [283]:
stack = list()

In [284]:
stack.append(1)
stack.append(2)
stack.append((3, 4))

In [285]:
stack.pop()

(3, 4)

In [286]:
stack.pop()

2

In [287]:
stack.pop()

1
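
As an illustration of where LIFO order is useful, here is a sketch of a bracket-matching check that uses a list as a stack. The function name `is_balanced` is made up for this example.

```python
def is_balanced(text: str) -> bool:
    """Return True if every bracket in text is closed in the right order."""
    pairs = {")": "(", "]": "[", "}": "{"}
    stack: list[str] = []
    for ch in text:
        if ch in pairs.values():
            stack.append(ch)              # push an opening bracket
        elif ch in pairs:
            # a closer must match the most recently pushed opener
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack                      # leftovers mean unclosed openers

print(is_balanced("a[1] + (b * {c})"))    # True
print(is_balanced("(]"))                  # False
print(is_balanced("(("))                  # False
```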

## Queues

- **First In First Out (FIFO)** data structure
- a list can easily be adapted for use as a queue, and the standard library also provides dedicated queue types
- the `queue` module provides queues that are often used for multithreading
- there are three important types of queues:
    1. simple queue using `append()` and `pop(0)` on a list
        - `pop(0)` shifts every remaining element, so it gets slow for long lists
    2. double ended queue (deque) from collections.deque
    3. heapq (priority queue) from heapq module
        - creates a min priority queue; smaller values have higher priorities

- we can extend the list class to create a Queue class
    - we'll learn that in "When Objects are Alike (Inheritance)" chapter

In [291]:
q = list()

In [292]:
q.append(1)

In [293]:
q.append(2)

In [294]:
q.append((3, 4))

In [295]:
q.pop(0)

1

In [296]:
q.pop(0)

2

In [297]:
q.pop(0)

(3, 4)

In [299]:
from collections import deque

In [300]:
help(deque)

Help on class deque in module collections:

class deque(builtins.object)
 |  deque([iterable[, maxlen]]) --> deque object
 |  
 |  A list-like sequence optimized for data accesses near its endpoints.
 |  
 |  Methods defined here:
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __bool__(self, /)
 |      True if self else False
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __copy__(...)
 |      Return a shallow copy of a deque.
 |  
 |  __delitem__(self, key, /)
 |      Delete self[key].
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __getitem__(self, key, /)
 |      Return self[key].
 |  
 |  __gt__(self, value, /)
 |      Return self>value.
 |  
 |  __iadd__(self, value, /)
 |      Implement self+=value.
 |  
 |  __imul__(self, value, /)
 |      Implement self*=value.
 |  
 |  __in

In [301]:
deq = deque()

In [302]:
deq.append(10)

In [303]:
deq.appendleft(9)

In [304]:
deq

deque([9, 10])

In [305]:
deq.insert(0, 100)

In [306]:
deq

deque([100, 9, 10])

In [307]:
deq.pop()

10

In [308]:
deq.popleft()

100
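
A short sketch of `deque` used as a FIFO queue, plus the `maxlen` argument shown in the `help(deque)` signature, which keeps only the most recent items. The task names and prices are made up.

```python
from collections import deque

tasks = deque()
tasks.append("first")
tasks.append("second")
tasks.append("third")

# popleft() removes from the front, giving first-in, first-out order
print(tasks.popleft())     # first
print(tasks.popleft())     # second

# maxlen discards the oldest item once the deque is full
recent = deque(maxlen=3)
for price in [10, 11, 12, 13, 14]:
    recent.append(price)
print(recent)              # deque([12, 13, 14], maxlen=3)

# rotate(1) moves the last item to the front
recent.rotate(1)
print(list(recent))        # [14, 12, 13]
```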

In [309]:
import heapq

In [310]:
help(heapq)

Help on module heapq:

NAME
    heapq - Heap queue algorithm (a.k.a. priority queue).

MODULE REFERENCE
    https://docs.python.org/3.10/library/heapq.html
    
    The following documentation is automatically generated from the Python
    source files.  It may be incomplete, incorrect or include features that
    are considered implementation detail and may vary between Python
    implementations.  When in doubt, consult the module reference at the
    location listed above.

DESCRIPTION
    Heaps are arrays for which a[k] <= a[2*k+1] and a[k] <= a[2*k+2] for
    all k, counting elements from 0.  For the sake of comparison,
    non-existing elements are considered to be infinite.  The interesting
    property of a heap is that a[0] is always its smallest element.
    
    Usage:
    
    heap = []            # creates an empty heap
    heappush(heap, item) # pushes a new item on the heap
    item = heappop(heap) # pops the smallest item from the heap
    item = heap[0]       # smalles

In [318]:
heap = []

In [319]:
heapq.heappush(heap, (10, 'Go to Work'))

In [320]:
heapq.heappush(heap, (5, 'Eat Breakfast'))

In [322]:
heapq.heappush(heap, (1, 'Wake up'))

In [323]:
heap

[(1, 'Wake up'), (10, 'Go to Work'), (5, 'Eat Breakfast')]

In [325]:
heapq.heappop(heap)

(1, 'Wake up')
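
A few more `heapq` calls in one sketch: `heapify` to turn an existing list into a heap, repeated `heappop` to drain it in priority order, and negated values as one common way to get max-priority behaviour. The extra task `"Commute"` and the scores are made up.

```python
import heapq

tasks = [(10, "Go to Work"), (5, "Eat Breakfast"), (1, "Wake up"), (7, "Commute")]

# heapify rearranges the list in place into heap order
heapq.heapify(tasks)
print(tasks[0])                     # (1, 'Wake up')

# popping repeatedly yields items from smallest to largest priority value
order = []
while tasks:
    priority, name = heapq.heappop(tasks)
    order.append(name)
print(order)   # ['Wake up', 'Eat Breakfast', 'Commute', 'Go to Work']

# max-priority behaviour: store negated values and negate again when popping
scores = [3, 9, 4]
max_heap = [-s for s in scores]
heapq.heapify(max_heap)
print(-heapq.heappop(max_heap))     # 9

# nsmallest returns the n smallest items without modifying the input
print(heapq.nsmallest(2, scores))   # [3, 4]
```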