# Python Data Structures

- discuss the object-oriented features of some commonly used Python built-in data structures
- learn when they should be used instead of a regular class and when they shouldn't:
    - Tuples and Named Tuples
    - Dataclasses
    - Dictionaries
    - Lists and Sets
    - Three types of queues
    
## Empty objects

- every class we've created so far implicitly inherits from the built-in `object` class
- instantiating `object` directly is rarely useful
    - its instances can't be given attributes dynamically
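
One way to see why this happens, as a minimal sketch: instances of plain `object` carry no per-instance `__dict__`, while instances of an ordinary user-defined class do. The `Empty` class is a hypothetical name used only for illustration.

```python
class Empty:
    """A do-nothing class; hypothetical, for illustration only."""


plain = object()
custom = Empty()

# object() instances have no __dict__, so there is nowhere to store new attributes
print(hasattr(plain, "__dict__"))   # False
# instances of an ordinary class get a __dict__ that holds their attributes
print(hasattr(custom, "__dict__"))  # True

custom.x = 5
print(custom.__dict__)              # {'x': 5}
```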

In [1]:
o = object()

In [3]:
help(o)

Help on object object:

class object
 |  The base class of the class hierarchy.
 |
 |  When called, it accepts no arguments and returns a new featureless
 |  instance that has no instance attributes and cannot be given any.
 |
 |  Built-in subclasses:
 |      anext_awaitable
 |      async_generator
 |      async_generator_asend
 |      async_generator_athrow
 |      ... and 90 other subclasses
 |
 |  Methods defined here:
 |
 |  __delattr__(self, name, /)
 |      Implement delattr(self, name).
 |
 |  __dir__(self, /)
 |      Default dir() implementation.
 |
 |  __eq__(self, value, /)
 |      Return self==value.
 |
 |  __format__(self, format_spec, /)
 |      Default object formatter.
 |
 |      Return str(self) if format_spec is empty. Raise TypeError otherwise.
 |
 |  __ge__(self, value, /)
 |      Return self>=value.
 |
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |
 |  __getstate__(self, /)
 |      Helper for pickle.
 |
 |  __gt__(self, value, /)
 |      

In [2]:
o.x = 5

AttributeError: 'object' object has no attribute 'x'

In [7]:
class MyClass(object):
    pass

In [8]:
m = MyClass()
m.x = "hello"

In [9]:
m.x

'hello'

## Tuples and named tuples

- tuples are objects that can store a specific number of other objects in sequence
- tuples are *immutable*
    - can't modify - add, remove, or replace members on the fly
- tuples are hashable which makes them candidates for keys in dictionaries and members in sets
- tuples overlap with the idea of coordinates or dimensions
    - e.g., (x, y) pair or (r, g, b) color
    - order matters
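
The hashability point can be sketched with coordinates used as dictionary keys and set members. The names `grid` and `visited` and the sample values are assumptions made for this illustration.

```python
# map (x, y) coordinates to (r, g, b) colors; tuples work as keys because they are hashable
grid: dict[tuple[int, int], tuple[int, int, int]] = {
    (0, 0): (255, 0, 0),
    (0, 1): (0, 255, 0),
}

# sets rely on the same hashing, so a duplicate coordinate is stored only once
visited = {(0, 0), (0, 1), (0, 0)}

print(grid[(0, 1)])    # (0, 255, 0)
print(len(visited))    # 2

# order matters: (0, 1) and (1, 0) are different keys
print((1, 0) in grid)  # False
```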

In [10]:
# ticker, the current price, the 52-week high, and the 52-week low
stock = "AAPL", 123.52, 137.98, 53.15

In [11]:
stock2 = ("AAPL", 123.52, 137.98, 53.15)

In [12]:
import datetime

def middle(stock, date):
    symbol, current, high, low = stock
    return (((high + low) / 2), date)

In [13]:
middle(("AAPL", 123.52, 137.98, 53.15), datetime.date(2020, 12, 4))

(95.565, datetime.date(2020, 12, 4))

In [14]:
# single value tuple; must add a trailing ,
a = (42, )

In [15]:
a

(42,)

In [16]:
# the trailing , is optional for two or more values
nums = (1, 2, 3,)

In [17]:
nums

(1, 2, 3)

In [18]:
a, b, c = nums

In [19]:
a, c

(1, 3)

In [20]:
a, b, c = nums[0], nums[1], nums[2]

In [21]:
a, b, c

(1, 2, 3)

## Named tuple via typing.NamedTuple

- named tuples are tuples whose items can also be accessed by name, as attributes
- a great way to create an immutable grouping of values
- can be thought of as similar to a C `struct`, but with read-only attributes
- we don't need to write an `__init__` method; the constructor is generated for us
- field names are declared at the class level, but they become per-instance fields rather than shared class-level attributes
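
Because the attributes are read-only, changes are made by building a new tuple. The sketch below uses a hypothetical `Quote` class (mirroring the stock fields used in these notes) to show `_replace` and `_asdict`, which appear in the `help()` output of any `NamedTuple` subclass.

```python
from typing import NamedTuple


class Quote(NamedTuple):
    symbol: str
    current: float
    high: float
    low: float


quote = Quote("AAPL", 123.52, high=137.98, low=53.15)

# _replace returns a new tuple; the original is untouched
newer = quote._replace(current=125.00)
print(quote.current, newer.current)   # 123.52 125.0

# _asdict converts to a regular dict keyed by field name
print(quote._asdict()["symbol"])      # AAPL

# still an ordinary tuple: unpacking and indexing work
symbol, current, high, low = quote
print(quote[0] == symbol)             # True
```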

In [22]:
from typing import NamedTuple

class Stock(NamedTuple):
    symbol: str
    current: float
    high: float
    low: float

In [23]:
help(Stock)

Help on class Stock in module __main__:

class Stock(builtins.tuple)
 |  Stock(symbol: str, current: float, high: float, low: float)
 |
 |  Stock(symbol, current, high, low)
 |
 |  Method resolution order:
 |      Stock
 |      builtins.tuple
 |      builtins.object
 |
 |  Methods defined here:
 |
 |  __getnewargs__(self)
 |      Return self as a plain tuple.  Used by copy and pickle.
 |
 |  __repr__(self)
 |      Return a nicely formatted representation string
 |
 |  _asdict(self)
 |      Return a new dict which maps field names to their values.
 |
 |  _replace(self, /, **kwds)
 |      Return a new Stock object replacing specified fields with new values
 |
 |  ----------------------------------------------------------------------
 |  Class methods defined here:
 |
 |  _make(iterable) from builtins.type
 |      Make a new Stock object from a sequence or iterable
 |
 |  ----------------------------------------------------------------------
 |  Static methods defined here:
 |
 |  __new__

In [24]:
s2 = Stock("AAPL", 123.52, high=137.98, low=53.15)

In [25]:
s2.high

137.98

In [26]:
s2.symbol

'AAPL'

In [27]:
s2.current = 100

AttributeError: can't set attribute

In [28]:
# a tuple can contain mutable objects; the tuple itself stays immutable but is then no longer hashable
t = ("Relayer", ["Gates of Delirium", "Sound Chaser"])

In [29]:
t[1].append('To Be Over')

In [30]:
t

('Relayer', ['Gates of Delirium', 'Sound Chaser', 'To Be Over'])

In [31]:
hash(t)

TypeError: unhashable type: 'list'

In [32]:
hash(s2)

7048465712386663350

In [33]:
# NamedTuple with property
from typing import NamedTuple

class Stock(NamedTuple):
    symbol: str
    current: float
    high: float
    low: float
    
    @property
    def middle(self) -> float:
        return (self.high + self.low)/2

## Dataclasses

- since Python 3.7, the `dataclass` class decorator lets us define ordinary classes with less boilerplate
  - provides a clean syntax for specifying attributes
- similar to a C `struct`, where attributes are mutable
- let's create a `dataclass` similar to the `Stock` class above

In [34]:
from dataclasses import dataclass

@dataclass
class Stock:
    symbol: str
    current: float
    high: float
    low: float
    

In [35]:
help(Stock)

Help on class Stock in module __main__:

class Stock(builtins.object)
 |  Stock(symbol: str, current: float, high: float, low: float) -> None
 |
 |  Stock(symbol: str, current: float, high: float, low: float)
 |
 |  Methods defined here:
 |
 |  __eq__(self, other)
 |      Return self==value.
 |
 |  __init__(self, symbol: str, current: float, high: float, low: float) -> None
 |      Initialize self.  See help(type(self)) for accurate signature.
 |
 |  __repr__(self)
 |      Return repr(self).
 |
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |
 |  __dict__
 |      dictionary for instance variables
 |
 |  __weakref__
 |      list of weak references to the object
 |
 |  ----------------------------------------------------------------------
 |  Data and other attributes defined here:
 |
 |  __annotations__ = {'current': <class 'float'>, 'high': <class 'float'>...
 |
 |  __dataclass_fields__ = {'current': Field(name='current',t

In [36]:
s = Stock("AAPL", 123.52, 137.98, 53.15)

In [37]:
s.current

123.52

In [38]:
s.current = 145.99

In [39]:
s.unexpected_attribute = 'allowed'

In [40]:
s.unexpected_attribute

'allowed'

In [42]:
class StockOrdinary:
    def __init__(self, 
                 name: str, 
                 current: float, 
                 high: float, 
                 low: float) -> None:
        
        self.name = name
        self.current = current
        self.high = high
        self.low = low

In [43]:
s_ord = StockOrdinary("AAPL", 123.52, 137.98, 53.15)

In [44]:
s_ord_2 = StockOrdinary("AAPL", 123.52, 137.98, 53.15)

In [45]:
# ordinary classes compare by identity by default, so two distinct objects with equal attributes are not equal
s_ord == s_ord_2

False

In [46]:
stock2 = Stock(symbol='AAPL', current=122.25, high=137.98, low=53.15)

In [47]:
stock1 = Stock(symbol='AAPL', current=122.25, high=137.98, low=53.15)

In [48]:
# dataclass stocks are equal out of the box
stock1 == stock2

True

In [49]:
# dataclass also lets you give attributes default values
from dataclasses import dataclass

@dataclass
class StockDefaults:
    name: str
    current: float = 0
    high: float = 0
    low: float = 0

In [50]:
# default values are used
s1 = StockDefaults('GOOG')

In [51]:
s1

StockDefaults(name='GOOG', current=0, high=0, low=0)

In [52]:
# can still provide values for attributes
StockDefaults("GOOG", 1826.77, 1847.20, 1013.54)

StockDefaults(name='GOOG', current=1826.77, high=1847.2, low=1013.54)

In [53]:
# equal comparison is provided by default; we can also order the objects if needed
@dataclass(order=True)
class StockOrdered:
    name: str
    current: float = 0
    high: float = 0
    low: float = 0

In [54]:
stock_ordered1 = StockOrdered("GOOG", 1826.77, 1847.20, 1013.54)
stock_ordered2 = StockOrdered("GOOG")
stock_ordered3 = StockOrdered("GOOG", 1728.28, high=1733.18, low=1666.33)

In [55]:
# stock_ordered2 has default 0s for all other attributes
stock_ordered1 < stock_ordered2

False

In [56]:
stock_ordered1 > stock_ordered2

True

In [57]:
from pprint import pprint

In [58]:
# let's print the sorted order of three stock objects
pprint(sorted([stock_ordered1, stock_ordered2, stock_ordered3]))

[StockOrdered(name='GOOG', current=0, high=0, low=0),
 StockOrdered(name='GOOG', current=1728.28, high=1733.18, low=1666.33),
 StockOrdered(name='GOOG', current=1826.77, high=1847.2, low=1013.54)]


In [59]:
# create a frozen class similar to typing.NamedTuple
@dataclass(frozen=True, order=True)
class StockFrozen:
    name: str
    current: float = 0
    high: float = 0
    low: float = 0

In [60]:
goog = StockFrozen("GOOG")

In [61]:
goog.high

0

In [62]:
# can't update attributes of a frozen instance
goog.high = 100

FrozenInstanceError: cannot assign to field 'high'
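
A frozen dataclass can still be "changed" by creating a modified copy, and it becomes usable as a set member or dictionary key. The sketch below assumes a hypothetical `FrozenQuote` class; `dataclasses.replace` and `FrozenInstanceError` are part of the standard `dataclasses` module.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class FrozenQuote:
    name: str
    current: float = 0.0


goog = FrozenQuote("GOOG")

# replace() builds a modified copy instead of mutating the frozen instance
updated = replace(goog, current=100.0)
print(goog.current, updated.current)     # 0.0 100.0

# frozen=True together with the default eq=True also generates __hash__,
# so equal instances collapse to a single set member
print(len({goog, FrozenQuote("GOOG")}))  # 1

try:
    goog.current = 1.0
except FrozenInstanceError as err:
    print(type(err).__name__)            # FrozenInstanceError
```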

## Dictionary

- a super useful data structure that allows us to map objects directly to other objects
- extremely efficient at looking up a **value**, given a specific **key** that maps to it
- this is possible because the **hash** of the key is used to locate the value
- most immutable built-in objects (numbers, strings, tuples of hashable items) have a numeric hash code
    - a relatively simple table is used to map the numeric hashes directly to values
- instances of user-defined classes are mutable, but they can still be used as keys if they are **hashable**
    - by default they hash by identity; provide `__hash__()` (together with `__eq__()`) to customize what the built-in **hash** function returns
- the insertion order of key/value pairs is maintained by the dictionary class from Python 3.7
- two keys that compare equal (strings, numbers, tuples, etc.) must also have equal **hash** values; the reverse is not required
- **hash** collisions can occur, so a look-up is `O(1)` on average but **not always**
    - **hash** collisions can slow down the insert and lookup process
- there are several ways to create a dictionary
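
To make the `__hash__()` point concrete, here is a minimal sketch of a user-defined key class. `Ticker` is a hypothetical class invented for this example; the important part is that `__hash__` and `__eq__` agree, so that equal objects land in the same slot.

```python
class Ticker:
    """Hypothetical key class: two tickers are equal if their symbols match, ignoring case."""

    def __init__(self, symbol: str) -> None:
        self.symbol = symbol

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Ticker):
            return NotImplemented
        return self.symbol.upper() == other.symbol.upper()

    def __hash__(self) -> int:
        # must agree with __eq__: equal objects need equal hashes
        return hash(self.symbol.upper())


prices = {Ticker("GOOG"): 1235.20}

# a different but equal instance finds the same entry
print(prices[Ticker("goog")])    # 1235.2
print(Ticker("MSFT") in prices)  # False
```

Keys like this should not have their `symbol` changed after insertion, since the stored hash would no longer match.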

In [63]:
help(hash)

Help on built-in function hash in module builtins:

hash(obj, /)
    Return the hash value for the given object.

    Two objects that compare equal must also have the same hash value, but the
    reverse is not necessarily true.



In [64]:
hash('abc')

-3876723928077875354

In [65]:
hash(123)

123

In [66]:
hash(('a', 'b', 'c'))

-780505523361007311

In [67]:
hash([1, 2, 3])

TypeError: unhashable type: 'list'

In [68]:
# Hash collision example! (on 64-bit CPython, int hashes wrap around modulo 2**61 - 1)
x = 2021
y = 2305843009213695972

In [69]:
hash(x)

2021

In [70]:
hash(y)

2021

In [71]:
hash(x) == hash(y)

True

In [72]:
# using keyword parameters; similar to dataclass and namedtuple
stock = dict(current=1235.20, high=1242.54, low=1231.06)

In [73]:
stock

{'current': 1235.2, 'high': 1242.54, 'low': 1231.06}

In [74]:
stocks = {
    "GOOG": (1235.20, 1242.54, 1231.06),
    "MSFT": (110.41, 110.45, 109.84) 
}

In [75]:
stocks

{'GOOG': (1235.2, 1242.54, 1231.06), 'MSFT': (110.41, 110.45, 109.84)}

In [76]:
# accessing value
stocks["GOOG"]

(1235.2, 1242.54, 1231.06)

In [77]:
stocks['APPL']

KeyError: 'APPL'

In [78]:
# better approach
print(stocks.get('APPL'))

None


In [79]:
# provide default value
print(stocks.get('APPL', 'NOT FOUND'))

NOT FOUND


In [80]:
# updating existing key/value pairs
stocks['GOOG'] = (100, 100, 100)

In [81]:
stocks.get("GOOG")

(100, 100, 100)

In [82]:
# adding new key/value iff the key doesn't exist
# if key is in the dictionary, it behaves just like get
stocks.setdefault("GOOG", "INVALID")

(100, 100, 100)

In [84]:
stocks.setdefault("BB", (10.87, 10.76, 10.90))

(10.87, 10.76, 10.9)

In [85]:
stocks['BB']

(10.87, 10.76, 10.9)

In [86]:
# mypy type hints; can use built-in types as type hints from Python 3.9 and mypy 0.812
# need: from __future__ import annotations as the first import 
# for older version of Python and mypy

stocks: dict[str, tuple[float, float, float]] = {}

In [87]:
stocks.setdefault('APPL', (150, 175, 125))

(150, 175, 125)

In [88]:
stocks

{'APPL': (150, 175, 125)}

In [89]:
for stock, values in stocks.items():
    print(f"{stock} last value is {values[0]}")

APPL last value is 150


In [90]:
# using objects as key
class AnObject:
    def __init__(self, avalue):
        self.avalue = avalue
        
    def __repr__(self):
        return f'AnObject: {self.avalue}'

In [91]:
random_keys = {}

In [92]:
random_keys["astring"] = "somestring"
random_keys[5] = "aninteger"
random_keys[25.2] = "floats work too"
random_keys[("abc", 123)] = "so do tuples"

In [93]:
random_keys

{'astring': 'somestring',
 5: 'aninteger',
 25.2: 'floats work too',
 ('abc', 123): 'so do tuples'}

In [94]:
my_object = AnObject(14)

In [95]:
random_keys[my_object] = "We can even store objects"

In [96]:
random_keys

{'astring': 'somestring',
 5: 'aninteger',
 25.2: 'floats work too',
 ('abc', 123): 'so do tuples',
 AnObject: 14: 'We can even store objects'}

In [100]:
# mutating my_object doesn't break the dictionary: the default hash is based on identity, not on avalue
# only the __repr__() shown for the key changes
my_object.avalue = 100

In [101]:
# random_keys has type hints: dict[Union[str, int, float, Tuple[str, int], AnObject], str]
for key in random_keys:
    print(f"{key!r} has value {random_keys[key]!r}")

'astring' has value 'somestring'
5 has value 'aninteger'
25.2 has value 'floats work too'
('abc', 123) has value 'so do tuples'
AnObject: 100 has value 'We can even store objects'


## Dictionary use cases

- dictionaries are extremely versatile and have numerous uses
- a couple of important ones:
    1. dict[str, tuple[float, float, float]] or dict[str, Stock]
        - similar to the stock example where the symbol maps to a tuple of prices
    2. dict[str, Union[str, float, Tuple[float, float]]]
        - e.g., {'name': 'GOOG', 'current': 1245.21, 'range': (1252.64, 1245.18)}
        - this case overlaps with named tuples, dataclass, and objects in general
        
- technically, instances of most classes store their attributes in a dictionary (`__dict__`)

In [102]:
# let's look into my_object
help(my_object)

Help on AnObject in module __main__ object:

class AnObject(builtins.object)
 |  AnObject(avalue)
 |
 |  Methods defined here:
 |
 |  __init__(self, avalue)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |
 |  __repr__(self)
 |      Return repr(self).
 |
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |
 |  __dict__
 |      dictionary for instance variables
 |
 |  __weakref__
 |      list of weak references to the object



In [103]:
my_object.__dict__

{'avalue': 100}

In [105]:
my_object.__dict__['bvalue']

KeyError: 'bvalue'

In [106]:
my_object.avalue

100

## Using defaultdict

- we've used the `setdefault` method to set a default value if the key doesn't exist in a `dict` instance
- calling `setdefault` (or checking for the key) every time a key is inserted or updated gets monotonous
- e.g., let's find letter frequencies using a regular dictionary

In [110]:
def letter_frequency(sentence: str) -> dict[str, int]:
    frequencies: dict[str, int] = {} # regular dictionary
    for letter in sentence:
        frequencies.setdefault(letter, 0)  # step 1
        frequencies[letter] += 1 # step 2
    return frequencies

In [111]:
hist = letter_frequency("Mississippi river is the longest river in Mississippi")

In [112]:
hist

{'M': 2,
 'i': 12,
 's': 10,
 'p': 4,
 ' ': 7,
 'r': 4,
 'v': 2,
 'e': 4,
 't': 2,
 'h': 1,
 'l': 1,
 'o': 1,
 'n': 2,
 'g': 1}

In [113]:
from collections import defaultdict

def letter_frequency_2(sentence: str) -> defaultdict[str, int]:
    frequencies: defaultdict[str, int] = defaultdict(int)
    for letter in sentence:
        frequencies[letter] += 1 # one step !!
    return frequencies

In [114]:
hist1 = letter_frequency_2("Mississippi river is the longest river in Mississippi")

In [115]:
hist1

defaultdict(int,
            {'M': 2,
             'i': 12,
             's': 10,
             'p': 4,
             ' ': 7,
             'r': 4,
             'v': 2,
             'e': 4,
             't': 2,
             'h': 1,
             'l': 1,
             'o': 1,
             'n': 2,
             'g': 1})

In [116]:
# int() function returns 0
int()

0

In [117]:
# we can pass a whole bunch of built-in and user-defined
# callables to initialize the new key with!
str()

''

In [118]:
float()

0.0

In [119]:
list()

[]

In [120]:
dict()

{}

In [121]:
# initializing distance dict in Dijkstra's SSSP algorithm
# dist[(u, v)] = infinity
import math
dist: defaultdict[tuple[int, int], float] = defaultdict(lambda: math.inf)

In [122]:
dist[(1, 2)]

inf

In [123]:
if dist[(1, 2)] > 100:
    dist[(1, 2)] = 1+2

In [124]:
dist[(1, 2)]

3

In [125]:
# we can create our own functions and dataclass with default values
# to pass as a default function for defaultdict
from dataclasses import dataclass

@dataclass
class Price:
    current: float = 0.0
    high: float = 0.0
    low: float = 0.0

In [126]:
Price()

Price(current=0.0, high=0.0, low=0.0)

In [127]:
portfolio = defaultdict(Price)

In [128]:
portfolio['GOOG']

Price(current=0.0, high=0.0, low=0.0)

In [129]:
portfolio

defaultdict(__main__.Price, {'GOOG': Price(current=0.0, high=0.0, low=0.0)})

In [130]:
portfolio["AAPL"] = Price(current=122.25, high=137.98, low=53.15)

In [131]:
from pprint import pprint

In [132]:
pprint(portfolio)

defaultdict(<class '__main__.Price'>,
            {'AAPL': Price(current=122.25, high=137.98, low=53.15),
             'GOOG': Price(current=0.0, high=0.0, low=0.0)})


In [133]:
# Jupyter pretty-prints the value of the last expression; unlike pprint above, it keeps insertion order
portfolio

defaultdict(__main__.Price,
            {'GOOG': Price(current=0.0, high=0.0, low=0.0),
             'AAPL': Price(current=122.25, high=137.98, low=53.15)})

In [134]:
# what if we wanted prices for stocks grouped by month
# dictionary within dictionary by month!
# within inner dictionary we want Price

def make_defaultdict():
    return defaultdict(Price)

In [135]:
by_month = defaultdict(make_defaultdict)

In [136]:
by_month["APPL"]["Jan"] = Price(current=122.25, high=137.98, low=53.15)

In [137]:
by_month

defaultdict(<function __main__.make_defaultdict()>,
            {'APPL': defaultdict(__main__.Price,
                         {'Jan': Price(current=122.25, high=137.98, low=53.15)})})

In [138]:
by_month['APPL']

defaultdict(__main__.Price,
            {'Jan': Price(current=122.25, high=137.98, low=53.15)})

In [139]:
by_month['APPL']['Jan']

Price(current=122.25, high=137.98, low=53.15)

In [140]:
# shortcut is to use lambda function
by_month1 = defaultdict(lambda: defaultdict(Price))

In [141]:
by_month1["APPL"]["Jan"] = Price(current=122.25, high=137.98, low=53.15)

In [142]:
by_month1

defaultdict(<function __main__.<lambda>()>,
            {'APPL': defaultdict(__main__.Price,
                         {'Jan': Price(current=122.25, high=137.98, low=53.15)})})
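
Another common pattern is grouping items with `defaultdict(list)`. The sketch below uses made-up sample words; it also shows a side effect worth knowing: merely reading a missing key inserts it.

```python
from collections import defaultdict

words = ["apple", "avocado", "banana", "blueberry", "cherry"]

# list() returns [], so every new key starts with an empty list
by_letter: defaultdict[str, list[str]] = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)

print(dict(by_letter))
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}

# caution: looking up a missing key creates it with the default value
print("z" in by_letter)  # False
by_letter["z"]
print("z" in by_letter)  # True
```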

## Counter

- counting is a very common task for developers:
    - the *I want to count specific instances in an iterable* use case is so common that Python ships a dedicated data structure for it: `Counter` in the standard-library `collections` module
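
Beyond counting characters, a `Counter` behaves like a dictionary with a few extras. This is a small sketch with hypothetical lists of traded symbols, showing missing keys, addition, and `update()`.

```python
from collections import Counter

monday = Counter(["GOOG", "AAPL", "GOOG"])
tuesday = Counter(["AAPL", "MSFT"])

# missing keys count as 0 instead of raising KeyError
print(monday["MSFT"])  # 0

# counters can be added; counts are summed per key
total = monday + tuesday
print(total["GOOG"], total["AAPL"], total["MSFT"])  # 2 2 1

# update() adds counts from another iterable in place
monday.update(["GOOG"])
print(monday["GOOG"])  # 3
```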

In [143]:
from collections import Counter

In [144]:
freq = Counter("Mississippi river is the longest river in Mississippi")

In [145]:
freq.most_common()

[('i', 12),
 ('s', 10),
 (' ', 7),
 ('p', 4),
 ('r', 4),
 ('e', 4),
 ('M', 2),
 ('v', 2),
 ('t', 2),
 ('n', 2),
 ('h', 1),
 ('l', 1),
 ('o', 1),
 ('g', 1)]

In [146]:
freq.most_common(5)

[('i', 12), ('s', 10), (' ', 7), ('p', 4), ('r', 4)]

In [147]:
help(Counter)

Help on class Counter in module collections:

class Counter(builtins.dict)
 |  Counter(iterable=None, /, **kwds)
 |
 |  Dict subclass for counting hashable items.  Sometimes called a bag
 |  or multiset.  Elements are stored as dictionary keys and their counts
 |  are stored as dictionary values.
 |
 |  >>> c = Counter('abcdeabcdabcaba')  # count elements from a string
 |
 |  >>> c.most_common(3)                # three most common elements
 |  [('a', 5), ('b', 4), ('c', 3)]
 |  >>> sorted(c)                       # list all unique elements
 |  ['a', 'b', 'c', 'd', 'e']
 |  >>> ''.join(sorted(c.elements()))   # list elements with repetitions
 |  'aaaaabbbbcccdde'
 |  >>> sum(c.values())                 # total of all counts
 |  15
 |
 |  >>> c['a']                          # count of letter 'a'
 |  5
 |  >>> for elem in 'shazam':           # update counts from an iterable
 |  ...     c[elem] += 1                # by adding 1 to each element's count
 |  >>> c['a']                        

## Lists

- the generic `list` structure is built into the language
- a list should be used to store several instances of the same type of object
    - however, a Python list can store objects of any type
- lists also maintain the order of the elements
- lists are mutable
- don't use a `list` for collecting different attributes of the same object
    - tuple, namedtuple, dataclass, and dictionary may be better
- some examples of list types: list[str], list[int], list[tuple], list[float], etc.

In [148]:
numbers = list(range(20, -1, -1))

In [149]:
numbers

[20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

In [150]:
nums = [2, 4, 6, 8]

In [151]:
nums.append(10)

In [152]:
help(nums)

Help on list object:

class list(object)
 |  list(iterable=(), /)
 |
 |  Built-in mutable sequence.
 |
 |  If no argument is given, the constructor creates a new empty list.
 |  The argument must be an iterable if specified.
 |
 |  Methods defined here:
 |
 |  __add__(self, value, /)
 |      Return self+value.
 |
 |  __contains__(self, key, /)
 |      Return bool(key in self).
 |
 |  __delitem__(self, key, /)
 |      Delete self[key].
 |
 |  __eq__(self, value, /)
 |      Return self==value.
 |
 |  __ge__(self, value, /)
 |      Return self>=value.
 |
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |
 |  __getitem__(self, index, /)
 |      Return self[index].
 |
 |  __gt__(self, value, /)
 |      Return self>value.
 |
 |  __iadd__(self, value, /)
 |      Implement self+=value.
 |
 |  __imul__(self, value, /)
 |      Implement self*=value.
 |
 |  __init__(self, /, *args, **kwargs)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |
 |  __it

In [153]:
nums.reverse()

In [154]:
nums

[10, 8, 6, 4, 2]

In [155]:
numbers.sort()

In [156]:
numbers

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]

### Sorting lists

- see this article on Python sorting algorithms: [https://realpython.com/sorting-algorithms-python/](https://realpython.com/sorting-algorithms-python/)
- an important task in working with lists is to sort them!
- sorting is a popular topic studied in algorithm class
    - many sorting algorithms with different running times!
- Python uses the `Timsort` algorithm created by Tim Peters 
- `Timsort` algorithm is considered a hybrid sorting algorithm
    - employs the best-of-both-worlds combination of insertion sort and merge sort
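
Python's sort is also stable: items that compare equal keep their relative order. A minimal sketch of how that allows multi-key sorting in two passes, using made-up `(name, grade)` records:

```python
from operator import itemgetter

# (name, grade) records; hypothetical data for illustration
records = [("Cara", "B"), ("Abe", "A"), ("Bo", "B"), ("Dee", "A")]

# step 1: sort by the secondary key (name)
records.sort(key=itemgetter(0))
# step 2: sort by the primary key (grade); ties keep their name order
records.sort(key=itemgetter(1))

print(records)
# [('Abe', 'A'), ('Dee', 'A'), ('Bo', 'B'), ('Cara', 'B')]

# sorted() returns a new list and leaves the original untouched
by_name_desc = sorted(records, key=itemgetter(0), reverse=True)
print(by_name_desc[0])  # ('Dee', 'A')
```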

In [157]:
numbers.sort(reverse=True)

In [158]:
numbers

[20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

In [159]:
# sorting objects with multiple values
values = [(1, 'a'), (2, 'b'), (1, '1'), (2, 'a')]

In [160]:
values.sort()

In [161]:
values

[(1, '1'), (1, 'a'), (2, 'a'), (2, 'b')]

In [162]:
values.sort(key=lambda t: t[0])

In [163]:
values

[(1, '1'), (1, 'a'), (2, 'a'), (2, 'b')]

In [164]:
values.sort(key=lambda t: t[1])

In [165]:
values

[(1, '1'), (1, 'a'), (2, 'a'), (2, 'b')]

In [166]:
# another option
import operator

In [169]:
values.sort(key=operator.itemgetter(0))

In [170]:
values

[(1, '1'), (1, 'a'), (2, 'a'), (2, 'b')]

In [177]:
# sorting user-defined objects
# use order=True parameter to dataclass 
# if you want to order based on all the attributes in the order declared...
@dataclass
class Student:
    first_name: str
    last_name: str
    id: int
    gpa: float
        
    # you need to define __lt__ function to compare two objects of Student type
    def __lt__(self, other: 'Student') -> bool:
        return self.gpa < other.gpa

In [179]:
s1 = Student('John', 'Smith', 123, 2.5)
s2 = Student('Jake', 'Jordan', 200, 3.5)
s3 = Student('Alice', 'Wonderland', 300, 4.5)

In [180]:
students = [s1, s2, s3]

In [181]:
students

[Student(first_name='John', last_name='Smith', id=123, gpa=2.5),
 Student(first_name='Jake', last_name='Jordan', id=200, gpa=3.5),
 Student(first_name='Alice', last_name='Wonderland', id=300, gpa=4.5)]

In [182]:
# use the __lt__ function provided in each object to order
students.sort()

In [183]:
students

[Student(first_name='John', last_name='Smith', id=123, gpa=2.5),
 Student(first_name='Jake', last_name='Jordan', id=200, gpa=3.5),
 Student(first_name='Alice', last_name='Wonderland', id=300, gpa=4.5)]

In [185]:
# can also sort based on any single attribute
students.sort(reverse=True, key=lambda item: item.last_name)

In [186]:
students

[Student(first_name='Alice', last_name='Wonderland', id=300, gpa=4.5),
 Student(first_name='John', last_name='Smith', id=123, gpa=2.5),
 Student(first_name='Jake', last_name='Jordan', id=200, gpa=3.5)]

In [187]:
students.sort(key=lambda item: item.id)

In [188]:
students

[Student(first_name='John', last_name='Smith', id=123, gpa=2.5),
 Student(first_name='Jake', last_name='Jordan', id=200, gpa=3.5),
 Student(first_name='Alice', last_name='Wonderland', id=300, gpa=4.5)]

In [189]:
# another option is to use the operator module
import operator

In [190]:
students.sort(key=operator.attrgetter("gpa"))

In [191]:
students

[Student(first_name='John', last_name='Smith', id=123, gpa=2.5),
 Student(first_name='Jake', last_name='Jordan', id=200, gpa=3.5),
 Student(first_name='Alice', last_name='Wonderland', id=300, gpa=4.5)]

### List comprehension

- list comprehensions are concise shortcuts for building lists
- e.g., the set $S = \{x^2 : x \in \{0, \ldots, 9\}\}$
    - is equivalent to: 
    ```python
    S = [x**2 for x in range(10)]
    ```
- consists of brackets containing an expression followed by a for clause, then zero or more for or if clauses
    - the expressions can be anything
    - always results in a new list from evaluating an expression
- syntax:
```python
someList = [expression for item in list if conditional] # one-way selector
someList = [expression if conditional else expression for item in list] # two-way selector
```

In [192]:
# Beginner way to create a list of squared values from 0 to 9?
sq = []
for i in range(10):
    sq.append(i**2)

In [193]:
sq

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [194]:
# Professional way: List comprehension:
S = [x**2 for x in range(10)]

In [195]:
S

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In Math: $V = \{2^0, 2^1, 2^2, 2^3, ... 2^{12}\}$

In [196]:
V = [2**x for x in range(13)]
print(V)

[1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096]


In Math: $M = \{x \mid x \in S \text{ and } x \text{ is even}\}$

In [197]:
# List comprehension
M1 = [x for x in S if x%2==0]

In [198]:
M1

[0, 4, 16, 36, 64]

In [199]:
evens = [True if x%2==0 else False for x in range(1, 21)]

In [200]:
evens

[False,
 True,
 False,
 True,
 False,
 True,
 False,
 True,
 False,
 True,
 False,
 True,
 False,
 True,
 False,
 True,
 False,
 True,
 False,
 True]

In [201]:
sentence = "The quick brown fox jumps over the lazy dog"
# words = sentence.split()
# can make a list of tuples or list of lists
wlist = [(w.upper(), w.lower(), len(w)) for w in sentence.split()]

In [202]:
wlist

[('THE', 'the', 3),
 ('QUICK', 'quick', 5),
 ('BROWN', 'brown', 5),
 ('FOX', 'fox', 3),
 ('JUMPS', 'jumps', 5),
 ('OVER', 'over', 4),
 ('THE', 'the', 3),
 ('LAZY', 'lazy', 4),
 ('DOG', 'dog', 3)]

### Nested list comprehension

- syntax to handle the nested lists with a nested loop-in-loop comprehension

```python
lst = [value for innerList in outerList for value in innerList]
lst = [value for innerList in outerList for value in innerList if condition]
lst = [value if condition else value1 for innerList in outerList for value in innerList]
```
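
When building nested lists to experiment with, be aware that multiplying an outer list repeats references to the same inner list. The sketch below (variable names are illustrative) contrasts that with a comprehension, which creates independent inner lists.

```python
# the * operator copies references: both rows are the same list object
aliased = [[0] * 3] * 2
aliased[0][0] = 99
print(aliased)       # [[99, 0, 0], [99, 0, 0]]

# a comprehension evaluates the inner expression once per row
independent = [[0] * 3 for _ in range(2)]
independent[0][0] = 99
print(independent)   # [[99, 0, 0], [0, 0, 0]]

# flattening works the same way for both
flat = [value for row in independent for value in row]
print(flat)          # [99, 0, 0, 0, 0, 0]
```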

In [203]:
# let's create a nestedList that holds [1, 2, 3, 4] five times
# note: * repeats references to the same inner list
nestedList = [list(range(1, 5))]*5

In [204]:
nestedList

[[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]

In [205]:
# let's keep the even values from each nested list
even = [ x for lst in nestedList for x in lst if x%2==0 ]

In [206]:
even

[2, 4, 2, 4, 2, 4, 2, 4, 2, 4]

In [207]:
# let's create a single boolean list
evenOdd = [True if x%2 == 0 else False for lst in nestedList for x in lst]

In [208]:
evenOdd

[False,
 True,
 False,
 True,
 False,
 True,
 False,
 True,
 False,
 True,
 False,
 True,
 False,
 True,
 False,
 True,
 False,
 True,
 False,
 True]

## Sets

- lists are great and versatile, but sometimes we need all the elements of a container to be unique
- Python sets can store any hashable objects, not just strings and numbers
    - a hashable object must implement the `__hash__()` method
- sets are inherently unordered due to the hash-based data structure used for efficient access to the members
- it's easy to check if an item is **in** the set
- if you need the elements in order, convert the set into a sorted `list` with `sorted()`
    - the iteration order of a set is arbitrary and must not be relied on (Jupyter's display sorts the elements when it can)
- there is no literal syntax for an empty set; `{}` creates an empty `dict`
- must use the `set()` function to create an empty set
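
A short sketch of these points, with made-up tag values: `{}` is a dictionary, duplicates disappear, order has to be imposed explicitly, and unhashable members are rejected.

```python
empty_dict = {}
empty_set = set()
print(type(empty_dict).__name__, type(empty_set).__name__)  # dict set

# duplicates are dropped when a list is converted to a set
tags = set(["rock", "metal", "rock", "opera"])
print(len(tags))     # 3

# iteration order is arbitrary; use sorted() when order matters
print(sorted(tags))  # ['metal', 'opera', 'rock']

# unhashable objects such as lists cannot be members
try:
    tags.add(["prog"])
except TypeError as err:
    print(err)       # unhashable type: 'list'
```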

In [234]:
song_library = [
    ("Phantom Of The Opera", "Sarah Brightman"),
    ("Knocking On Heaven's Door", "Guns N' Roses"),
    ("Captain Nemo", "Sarah Brightman"),
    ("Patterns In The Ivy", "Opeth"),
    ("November Rain", "Guns N' Roses"),
    ("Beautiful", "Sarah Brightman"),
    ("Mal's Song", "Vixy and Tony"), ]

In [235]:
artists = set()

In [236]:
for song, artist in song_library:
    artists.add(artist)

In [237]:
artists

{"Guns N' Roses", 'Opeth', 'Sarah Brightman', 'Vixy and Tony'}

In [238]:
'Opeth' in artists

True

In [239]:
'Michael' in artists

False

In [240]:
alphabetical = sorted(artists)

In [241]:
alphabetical

["Guns N' Roses", 'Opeth', 'Sarah Brightman', 'Vixy and Tony']

In [242]:
aset = set([1, 2, 3, 4, 5, 6])
bset = set([3, 4, 7, 8, 9, 10])

In [243]:
# set operations
help(set)

Help on class set in module builtins:

class set(object)
 |  set() -> new empty set object
 |  set(iterable) -> new set object
 |
 |  Build an unordered collection of unique elements.
 |
 |  Methods defined here:
 |
 |  __and__(self, value, /)
 |      Return self&value.
 |
 |  __contains__(...)
 |      x.__contains__(y) <==> y in x.
 |
 |  __eq__(self, value, /)
 |      Return self==value.
 |
 |  __ge__(self, value, /)
 |      Return self>=value.
 |
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |
 |  __gt__(self, value, /)
 |      Return self>value.
 |
 |  __iand__(self, value, /)
 |      Return self&=value.
 |
 |  __init__(self, /, *args, **kwargs)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |
 |  __ior__(self, value, /)
 |      Return self|=value.
 |
 |  __isub__(self, value, /)
 |      Return self-=value.
 |
 |  __iter__(self, /)
 |      Implement iter(self).
 |
 |  __ixor__(self, value, /)
 |      Return self^=value.
 |
 |  __l

In [244]:
aset.difference(bset)

{1, 2, 5, 6}

In [245]:
# operator for difference
aset - bset

{1, 2, 5, 6}

In [246]:
aset.intersection(bset)

{3, 4}

In [247]:
# operator for intersection
aset & bset

{3, 4}

In [248]:
aset.symmetric_difference(bset)

{1, 2, 5, 6, 7, 8, 9, 10}

In [249]:
# operator for symmetric difference
aset ^ bset

{1, 2, 5, 6, 7, 8, 9, 10}

In [250]:
aset.union(bset)

{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}

In [251]:
# union operator
aset | bset

{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}

## Stack

- **Last In First Out** data structure
- Python doesn't provide a dedicated stack class, but a **list** can be easily adapted as a stack
    - use append() to push
    - use pop() to pop the last element pushed
- we can extend the `list` class to create a Stack class
    - we'll learn this in "When Objects are Alike (Inheritance)" chapter
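
Until inheritance is covered, a stack can also be sketched by wrapping a list (composition) instead of extending it. The `Stack` class below is a hypothetical helper, not a standard-library class.

```python
from typing import Any


class Stack:
    """Minimal LIFO wrapper around a list (hypothetical helper)."""

    def __init__(self) -> None:
        self._items: list[Any] = []

    def push(self, item: Any) -> None:
        self._items.append(item)  # add to the top

    def pop(self) -> Any:
        return self._items.pop()  # raises IndexError when empty

    def peek(self) -> Any:
        return self._items[-1]    # look at the top without removing it

    def __len__(self) -> int:
        return len(self._items)


undo_stack = Stack()
for item in (1, 2, (3, 4)):
    undo_stack.push(item)

top = undo_stack.pop()
print(top)                # (3, 4)
print(undo_stack.peek())  # 2
print(len(undo_stack))    # 2
```

Wrapping hides list methods such as `insert()` that would break the last-in-first-out discipline.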

In [None]:
stack = list()

In [None]:
stack.append(1)
stack.append(2)
stack.append((3, 4))

In [None]:
stack.pop()

In [None]:
stack.pop()

In [None]:
stack.pop()

## Queues

- **First In First Out (FIFO)** data structure
- the built-in list can be adapted as a queue, although removing from the front with `pop(0)` takes `O(n)` time
- the `queue` module provides queues that are often used for multithreading
- there are three important types of queues:
    1. simple queue using `append()` and `pop(0)` on `list`
    2. double-ended queue (`deque`) from the `collections` module
    3. priority queue from the `heapq` module
        - creates a min-priority queue; smaller values have higher priorities
- we can extend the `list` class to create a Queue class
    - we'll learn this in "When Objects are Alike (Inheritance)" chapter
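
For FIFO use, `collections.deque` is usually the better fit, because removing from the left end does not shift the remaining elements. A minimal sketch with made-up task names:

```python
from collections import deque

tasks = deque()
tasks.append("first")
tasks.append("second")
tasks.append("third")

# popleft() removes from the front without shifting the other elements
served = [tasks.popleft(), tasks.popleft()]
print(served)        # ['first', 'second']
print(len(tasks))    # 1

# maxlen turns a deque into a bounded buffer that discards the oldest items
recent = deque(maxlen=2)
for n in (1, 2, 3):
    recent.append(n)
print(list(recent))  # [2, 3]
```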

In [209]:
q = list()

In [210]:
q.append(1)

In [211]:
q.append(2)

In [212]:
q.append((3, 4))

In [213]:
q.pop(0)

1

In [214]:
q.pop(0)

2

In [215]:
q.pop(0)

(3, 4)

In [216]:
from collections import deque

In [217]:
help(deque)

Help on class deque in module collections:

class deque(builtins.object)
 |  deque([iterable[, maxlen]]) --> deque object
 |
 |  A list-like sequence optimized for data accesses near its endpoints.
 |
 |  Methods defined here:
 |
 |  __add__(self, value, /)
 |      Return self+value.
 |
 |  __contains__(self, key, /)
 |      Return bool(key in self).
 |
 |  __copy__(...)
 |      Return a shallow copy of a deque.
 |
 |  __delitem__(self, key, /)
 |      Delete self[key].
 |
 |  __eq__(self, value, /)
 |      Return self==value.
 |
 |  __ge__(self, value, /)
 |      Return self>=value.
 |
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |
 |  __getitem__(self, key, /)
 |      Return self[key].
 |
 |  __gt__(self, value, /)
 |      Return self>value.
 |
 |  __iadd__(self, value, /)
 |      Implement self+=value.
 |
 |  __imul__(self, value, /)
 |      Implement self*=value.
 |
 |  __init__(self, /, *args, **kwargs)
 |      Initialize self.  See help(type(self)) for

In [218]:
deq = deque()

In [219]:
deq.append(10)

In [220]:
deq.appendleft(9)

In [221]:
deq

deque([9, 10])

In [222]:
deq.insert(0, 100)

In [223]:
deq

deque([100, 9, 10])

In [224]:
deq.pop()

10

In [225]:
deq.popleft()

100

In [226]:
import heapq

In [227]:
help(heapq)

Help on module heapq:

NAME
    heapq - Heap queue algorithm (a.k.a. priority queue).

MODULE REFERENCE
    https://docs.python.org/3.12/library/heapq.html

    The following documentation is automatically generated from the Python
    source files.  It may be incomplete, incorrect or include features that
    are considered implementation detail and may vary between Python
    implementations.  When in doubt, consult the module reference at the
    location listed above.

DESCRIPTION
    Heaps are arrays for which a[k] <= a[2*k+1] and a[k] <= a[2*k+2] for
    all k, counting elements from 0.  For the sake of comparison,
    non-existing elements are considered to be infinite.  The interesting
    property of a heap is that a[0] is always its smallest element.

    Usage:

    heap = []            # creates an empty heap
    heappush(heap, item) # pushes a new item on the heap
    item = heappop(heap) # pops the smallest item from the heap
    item = heap[0]       # smallest item on th

In [228]:
heap = []

In [229]:
heapq.heappush(heap, (10, 'Go to Work'))

In [230]:
heapq.heappush(heap, (5, 'Eat Breakfast'))

In [231]:
heapq.heappush(heap, (1, 'Wake up'))

In [232]:
heap

[(1, 'Wake up'), (10, 'Go to Work'), (5, 'Eat Breakfast')]

In [233]:
heapq.heappop(heap)

(1, 'Wake up')
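
When two entries share a priority, tuples fall back to comparing the next element. A common pattern, sketched here with a hypothetical `push` helper, adds an increasing counter so that ties are served in insertion order.

```python
import heapq
import itertools

heap: list[tuple[int, int, str]] = []
order = itertools.count()  # monotonically increasing tie-breaker


def push(task: str, priority: int) -> None:
    # entries compare by priority first, then by insertion order
    heapq.heappush(heap, (priority, next(order), task))


push("Go to Work", 10)
push("Eat Breakfast", 5)
push("Brush Teeth", 5)
push("Wake up", 1)

done = []
while heap:
    priority, _, task = heapq.heappop(heap)
    done.append((priority, task))

print(done)
# [(1, 'Wake up'), (5, 'Eat Breakfast'), (5, 'Brush Teeth'), (10, 'Go to Work')]
```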