# Data Structures
© Advanced Analytics, Amir Ben Haim, 2024

## Complex data types

- Lists & Matrix
- Tuples
- Sets
- Frozenset
- Dictionary
- Range

## Lists
A list is an ordered, mutable sequence of values, separated by commas and enclosed in square brackets. Lists in Python resemble vectors in R, but a single list can hold values of different data types.

### Mixed Data Types

In [None]:
a = [1, 2, 3.5, 'ABC', False]
# or
a = list((1, 2, 3.5, 'ABC', False))
a

### Index Operator
We can retrieve each of the values in the list using their index.
<br>Note that index 1 refers to the second value.
<br>This is because in Python, indices begin from zero and not from one.

In [None]:
print(a[1])
print(a[0])

In [None]:
# slice: positions 1 to 3 (the stop index 4 is excluded)
a[1:4]

In [None]:
# slice: from the start up to, but not including, position 4
a[:4]

In [None]:
# slice: from position 2 to the end
a[2:]

In [None]:
# Negative index: counts from the end, so -1 is the last element
a[-1]

In [None]:
a[-3:-1]

In [None]:
a[-3:]

In [None]:
# step
a[::2]
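
The slicing cells above all follow the general form `a[start:stop:step]`. As a minimal recap sketch (the variable names below are illustrative and not part of the original notebook), the rules can be checked side by side:

```python
# Sample list mirroring the one defined earlier in this section
a = [1, 2, 3.5, 'ABC', False]

first_three = a[0:3]     # start is inclusive, stop is exclusive
every_other = a[::2]     # a step of 2 skips every second element
reversed_copy = a[::-1]  # a negative step walks the list backwards
last_two = a[-2:]        # negative indices count from the end

print(first_three)
print(every_other)
print(reversed_copy)
print(last_two)
```

A slice always returns a new list, so none of these expressions modify `a`.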

### Methods

In [None]:
a = [1, 2, 3, 4, 5, "A", "B", "C"]
a

In [None]:
# .append -> adds a new element to the list
a.append("D")
a

In [None]:
# .append -> duplicates are allowed; the same value can be added again
a.append("D")
a

In [None]:
# .extend(iterable) -> adds each element of the iterable to the end of the list
a.extend(['x1','x2','x3'])
a

In [None]:
# .append is different from .extend: it adds the whole list as a single nested element
a.append(['x1','x2','x3'])
a

In [None]:
# .remove(x) -> removes the first occurrence of x from the list (raises ValueError if x is not found)
a.remove('C')
a
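
Note that `.remove(x)` deletes only the first matching element. The sketch below uses a small made-up list (`letters` is an illustrative name) to show this, and one common way to drop every occurrence with a list comprehension:

```python
letters = ['A', 'C', 'B', 'C']

letters.remove('C')   # removes only the first 'C'
print(letters)        # ['A', 'B', 'C']

# Build a new list without any 'C' at all
without_c = [x for x in letters if x != 'C']
print(without_c)      # ['A', 'B']
```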

In [None]:
# .insert(i,x) -> inserts a new element at a specific position
a.insert(5,6)
a

In [None]:
# .pop(i) -> remove the element at the specified position. If no position is specified, it removes the last
# element in the list
a.pop()
a

In [None]:
# .pop(i) -> remove the element at the specified position. If no position is specified, it removes the last
# element in the list
a.pop(1)
a

In [None]:
# .index(x[, start[, end]]) -> returns the position of the first occurrence of x (raises ValueError if x is not found).
# The optional start and end arguments limit the range of positions searched.
a.index('D')

In [None]:
# .count(x) -> returns the number of times the element x appears in the list
a.count('D')

In [None]:
a

In [None]:
# Error: TypeError
# The list mixes int and str values, which cannot be compared with each other
a.sort(reverse=True)

In [None]:
b = a[0:4]
b

In [None]:
# .sort(key=None, reverse=False) -> sorts the list in place. It raises a TypeError if the elements cannot be compared (e.g. int and str).
b.sort(reverse=True)
b

In [None]:
# reverse=False (the default) sorts in ascending order
b.sort(reverse=False)
b
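
When a list mixes types, one possible workaround is to pass a `key` function so that all elements are compared on a common basis. This is only a sketch: comparing by `str` is an assumption chosen for illustration, and the resulting order is textual rather than numeric.

```python
mixed = [3, 'B', 1, 'A']

try:
    sorted(mixed)                 # int and str cannot be compared
except TypeError as err:
    print('TypeError:', err)

# Compare the string form of each element instead
by_text = sorted(mixed, key=str)
print(by_text)                    # [1, 3, 'A', 'B']
```

`sorted()` returns a new list and leaves `mixed` unchanged, whereas `.sort()` reorders the list in place.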

In [None]:
# .reverse() -> reverses the order of the list in place
c = ["a","g","e","s"]
c.reverse()
c

In [None]:
# Another way to reverse: slicing with step -1 returns a reversed copy and leaves c unchanged
c[::-1]

In [None]:
a

In [None]:
# .clear() -> remove all elements from a list
a.clear()
a

#### .copy() method - Breakdown

In [None]:
l = [1,2,3]
l

In [None]:
u = l
u

In [None]:
u[1] = 'xxx'
u

In [None]:
l

In [None]:
# The variables 'l' and 'u' are both assigned to the same object
print(id(l))
print(id(u))

In [None]:
l = [1,2,3]
l

In [None]:
# .copy() -> returns a copy of the list
u = l.copy()
u

In [None]:
u[1] = 'xxx'
u

In [None]:
l

In [None]:
# In this case the variables 'l' and 'u' are assigned to different objects
print(id(l))
print(id(u))
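
One caveat worth knowing: `.copy()` makes a shallow copy, so nested lists are still shared between the original and the copy. The sketch below (with illustrative names `grid`, `shallow`, `deep`) contrasts it with `copy.deepcopy` from the standard library:

```python
import copy

grid = [[1, 2], [3, 4]]

shallow = grid.copy()        # new outer list, same inner lists
deep = copy.deepcopy(grid)   # new outer list and new inner lists

shallow[0][0] = 'xxx'        # modifies the inner list shared with grid

print(grid)   # [['xxx', 2], [3, 4]]
print(deep)   # [[1, 2], [3, 4]]
```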

## Matrix

Matrices can be represented in Python as a list of lists, where each inner list is one row.
<br>For example, a 2x3 matrix will be represented as follows:

In [None]:
m = [[ 1, 2, 3 ],[ 4, 5, 6 ]]
m

### Index Operator

In [None]:
m[1]

In [None]:
m[1][1]

In [None]:
type(m)

In [None]:
type(m[1])

In [None]:
type(m[1][1])

### Append a new row to a Matrix

In [None]:
m.append([7 ,8, 9])
m
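
Because a matrix here is just a list of rows, there is no direct way to index a column. A minimal sketch of two common idioms, a list comprehension for one column and `zip(*m)` for a transpose (the variable names are illustrative):

```python
m = [[1, 2, 3], [4, 5, 6]]

n_rows, n_cols = len(m), len(m[0])

# Pick element 1 from every row to get the second column
second_column = [row[1] for row in m]

# zip(*m) pairs up the i-th elements of all rows
transposed = [list(col) for col in zip(*m)]

print(n_rows, n_cols)    # 2 3
print(second_column)     # [2, 5]
print(transposed)        # [[1, 4], [2, 5], [3, 6]]
```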

## Tuples

Tuples are similar to lists, but unlike lists they are immutable (they cannot be modified after they are created).

We create tuples as follows:

In [None]:
t1 = (1, 2, 3, 'A', 'B', 'C')
t1

### Immutable

In [None]:
t1[0]

In [None]:
# Error
# 'tuple' object does not support item assignment
# immutable
t1[0] = 3

### Nest tuples inside another tuple
The elements of a tuple cannot be modified, but we can build a new tuple that contains existing tuples as its elements.

In [None]:
t2 = (0,1,0,0)
t2

In [None]:
# Separating values with a comma ',' creates a tuple, here a tuple of tuples
t1,t2
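
Nesting is not the same as joining. As a short sketch with made-up values, the comma builds a tuple whose elements are the original tuples, while `+` concatenates them into one flat tuple. The same comma rule explains why a one-element tuple needs a trailing comma:

```python
t1 = (1, 2, 3)
t2 = (0, 1)

nested = (t1, t2)    # two elements, each of them a tuple
joined = t1 + t2     # one flat tuple with five elements

single = (5,)        # the trailing comma makes this a tuple
not_a_tuple = (5)    # just the integer 5 in parentheses

print(nested)        # ((1, 2, 3), (0, 1))
print(joined)        # (1, 2, 3, 0, 1)
print(type(single), type(not_a_tuple))
```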

### Methods

In [None]:
t1

In [None]:
# .count(val) --> counts the occurrences of val
t1.count('A')

In [None]:
# .index(val) --> returns the index of the first occurrence of val
t1.index('A')

### Mutable Elements in a Tuple
Although a tuple itself is immutable, it can contain mutable elements, such as lists:

In [None]:
t3 = ([1, 2, 3], [4, 5, 6])
t3

In [None]:
type(t3)

In [None]:
t3[0]

In [None]:
type(t3[0])

In [None]:
t3[0][2]

In [None]:
type(t3[0][2])

In [None]:
# Can we assign a new value to an element of the inner list?
t3[0][2] = 'xxx'
t3
# Yes!
# Lists are mutable

In [None]:
# Error: TypeError
# Can we replace one of the tuple's own elements?
t3[0] = [1,1,1]
t3
# No!
# 'tuple' object does not support item assignment

## Sets

A set is an unordered collection of unique elements (it holds no duplicates).
<br>If, for example, we build a set from a phrase, the set contains each character of the phrase exactly once.

### Sets based on str

In [None]:
s1 = set("this is my sample phrase")
s1

In [None]:
s2 = set("here is my second phrase")
s2

### Set based on a list or a tuple
If we build a set from a list or a tuple, the set contains each element of that list or tuple exactly once. A string inside a list counts as a single element; it is not split into characters.

In [None]:
s3 = set(["this is my sample phrase"])
s3

In [None]:
s3 = set(["this is my sample phrase","here is my second phrase"])
s3

In [None]:
# Notice we get only 2 elements,
# because sets hold no duplicates
s3 = set(["this is my sample phrase","here is my second phrase","this is my sample phrase"])
s3

### Another way to create a Set

In [None]:
# Another way to create a set
s4 = {1,2,2,3}
s4

In [None]:
print(type(s1))
print(type(s2))
print(type(s3))
print(type(s4))

### Sets as unordered collection

In [None]:
# Error
# Being an unordered collection, sets do not record element position or order of insertion.
# Accordingly, sets do not support indexing, slicing, or other sequence-like behavior.
s1[0]

### Sets operations & Methods


In [None]:
# letters in s1 but not in s2
s1 - s2

In [None]:
# letters in s1 but not in s2
s1.difference(s2)

In [None]:
# letters in both s1 and s2
s1 & s2

In [None]:
# letters in both s1 and s2
s1.intersection(s2)

In [None]:
# letters in s1 or s2 or both
s1 | s2

In [None]:
# letters in s1 or s2 or both
s1.union(s2)

In [None]:
s1

In [None]:
# letters in s1 or s2 but not both
s1 ^ s2

In [None]:
# letters in s1 or s2 but not both
s1.symmetric_difference(s2)

In [None]:
# Adds an element to the set
s1.add('xxx')
s1

In [None]:
# .pop() -> removes and returns an arbitrary element from the set (sets are unordered, so we cannot choose which)
s1.pop()
s1

In [None]:
# Error
# pop() takes no arguments
s1.pop('h')
s1

In [None]:
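# .remove(x) -> removes the specified element; raises KeyError if x is not in the set
# (this includes the case where the earlier .pop() happened to remove 'h')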
s1.remove('h')
s1
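
If the element might not be present, `.discard(x)` is a gentler alternative to `.remove(x)`: it does nothing when `x` is missing. A minimal sketch on a made-up set (`s` and `remove_failed` are illustrative names):

```python
s = {'a', 'b', 'c'}

s.discard('z')          # 'z' is not in the set: no error, nothing happens
s.discard('a')          # 'a' is removed

try:
    s.remove('z')       # .remove raises KeyError for a missing element
    remove_failed = False
except KeyError:
    remove_failed = True

print(s)
print(remove_failed)    # True
```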

## Frozenset

A frozenset is an immutable version of a set: once it is created, elements cannot be added or removed. The frozenset() function builds one from an iterable.

In [1]:
fs = frozenset(['tel_aviv', 'haifa', 'eilat','eilat'])
fs

frozenset({'eilat', 'haifa', 'tel_aviv'})

In [None]:
type(fs)

In [None]:
list(fs)

In [None]:
# Error: TypeError
# frozensets, like sets, are unordered and do not support indexing
fs[1]

In [None]:
# Error: AttributeError
# frozenset has no 'add' method
fs.add('xxx')
fs

In [None]:
# Error: AttributeError
# frozenset has no 'pop' method
fs.pop()
fs
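
Immutability is what makes a frozenset hashable, so it can serve as a dictionary key or as an element of another set, which a regular set cannot. The sketch below is illustrative only: the names `route_ab`, `route_ba`, and `distances` and the number 95 are made up for the example and are not real data.

```python
route_ab = frozenset(['tel_aviv', 'haifa'])
route_ba = frozenset(['haifa', 'tel_aviv'])

# Element order does not matter, so both frozensets are equal
print(route_ab == route_ba)      # True

# A frozenset can be a dict key (95 is a placeholder value)
distances = {route_ab: 95}
print(distances[route_ba])       # 95

try:
    {set(['tel_aviv', 'haifa']): 95}   # a regular set is unhashable
    set_key_failed = False
except TypeError:
    set_key_failed = True

print(set_key_failed)            # True
```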

## Dictionary

A dictionary is a collection of key-value pairs. Keys must be unique and immutable (more precisely, hashable), for example strings, integers, or tuples.
<br>Values can be of any type, mutable or not, and each value is associated with its key.


### Create dict

In [3]:
d1 = {1:'Chevrolet',2:'Fiat',3:'Mazda',4:'Toyota'}
d1

{1: 'Chevrolet', 2: 'Fiat', 3: 'Mazda', 4: 'Toyota'}

In [4]:
d1 = dict([(1,'Chevrolet'),(2,'Fiat'),(3,'Mazda'),(4,'Toyota')])
d1

{1: 'Chevrolet', 2: 'Fiat', 3: 'Mazda', 4: 'Toyota'}

In [5]:
type(d1)

dict

### Access an element
To access a value, we cannot use a positional index. We have to look it up by its key.

In [3]:
d1[1]

'Chevrolet'

In [None]:
# Error: KeyError
# d1[0] fails because zero is not a key in the dictionary
d1[0]

### Strings can be keys too

In [None]:
d2 = {'Moshe':26, 'David':43, 'Debby':33, 'Sarah':29}
d2

In [None]:
d2['David']

### Build a dict incrementally

In [None]:
d3 = {}
d3

In [None]:
type(d3)

In [None]:
d3['k1'] = 'avi'
d3

In [None]:
d3['k2'] = 'shir'
d3

### Keys must be Unique and Immutable

In [None]:
# Assigning to an existing key overwrites its value: 'shir' becomes 'haim'
d3['k2'] = 'haim'
d3

In [None]:
# Error: TypeError
# A list is mutable (unhashable), so it cannot be used as a key
d3[[1]] = 'haim'
d3
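
When a compound key is needed, a tuple is the usual substitute for a list because tuples are immutable. A minimal sketch (the dictionary `d` and its contents are made up for illustration):

```python
d = {}

d[(1, 2)] = 'tuple key'       # works: a tuple of ints is hashable

try:
    d[[1, 2]] = 'list key'    # fails: a list is unhashable
except TypeError as err:
    message = str(err)

print(d)                      # {(1, 2): 'tuple key'}
print(message)
```

A tuple only works as a key if everything inside it is hashable too, so `([1, 2], 3)` would fail in the same way.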

### Important Methods

Dictionaries have 3 important methods:
- keys() returns a view of the dictionary's keys
- values() returns a view of the corresponding values
- items() returns a view of the (key, value) pairs, each pair as a tuple

These view objects provide a way to iterate over a dictionary (we'll see it later). Wrap a view in list() to get a regular list.

In [None]:
d1.keys()

In [None]:
d2.keys()

In [None]:
d3.keys()

In [None]:
list(d1.keys())

In [4]:
d1.values()

dict_values(['Chevrolet', 'Fiat', 'Mazda', 'Toyota'])

In [None]:
d2.values()

In [None]:
d3.values()

In [5]:
list(d1.values())

['Chevrolet', 'Fiat', 'Mazda', 'Toyota']

In [6]:
d1.items()

dict_items([(1, 'Chevrolet'), (2, 'Fiat'), (3, 'Mazda'), (4, 'Toyota')])

In [None]:
d2.items()

In [None]:
d3.items()

In [7]:
list(d1.items())

[(1, 'Chevrolet'), (2, 'Fiat'), (3, 'Mazda'), (4, 'Toyota')]
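
As a small preview of iteration, the sketch below loops over `.items()` and also shows that the objects returned by `keys()`, `values()`, and `items()` are live views: they reflect later changes to the dictionary. The dictionary is a shortened copy of `d1`, and the names `cars`, `pairs`, and `keys_view` are illustrative.

```python
cars = {1: 'Chevrolet', 2: 'Fiat'}

# Unpack each (key, value) tuple while looping
pairs = []
for key, value in cars.items():
    pairs.append(f'{key}: {value}')
print(pairs)              # ['1: Chevrolet', '2: Fiat']

keys_view = cars.keys()
cars[3] = 'Mazda'         # add a key after creating the view

print(3 in keys_view)     # True: the view follows the dictionary
print(list(keys_view))    # [1, 2, 3]
```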

### More Methods

In [None]:
d1

In [None]:
# Returns True if k is a key in d, False otherwise
2 in d1

In [None]:
# Returns True if k is a key in d, False otherwise
'a' in d1

In [None]:
d1[2]

In [None]:
# Error: KeyError
# there is no key 'a' in the d1 dictionary
d1['a']

In [None]:
# .get(k, default) -> returns d[k] if k is in d, otherwise the default ('foo' here; None if no default is given)
# Unlike d1['a'], it does not raise a KeyError
d1.get('a','foo')

In [None]:
# .pop(k, default) -> removes the key k and returns its value. If k is not found, returns the default ('foo' here); if no default is given, raises KeyError
d1.pop(2,"foo")

In [None]:
# Here 'a' is not a key, so .pop returns the default 'foo' and leaves d1 unchanged
d1.pop('a',"foo")

In [None]:
# Error: KeyError
# 'a' is not a key and no default is given
d1.pop('a')
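
A typical use of `.get()` with a default is counting, where a key may or may not exist yet. This is only an illustrative sketch; the word `'banana'` and the name `counts` are made up for the example.

```python
counts = {}

for ch in 'banana':
    # Start from 0 when the character has not been seen yet
    counts[ch] = counts.get(ch, 0) + 1

print(counts)                 # {'b': 1, 'a': 3, 'n': 2}

# .get() never adds the missing key to the dictionary
print(counts.get('z'))        # None
print('z' in counts)          # False
```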

## Range

In [8]:
range(10)

range(0, 10)

In [None]:
type(range(10))

In [9]:
list(range(10))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [10]:
range(5,10)

range(5, 10)

In [11]:
list(range(5,10))

[5, 6, 7, 8, 9]

In [12]:
range(5,10,2)

range(5, 10, 2)

In [13]:
list(range(5,10,2))

[5, 7, 9]

In [14]:
r = range(10)
r

range(0, 10)

In [15]:
# .count() --> counts the number of times the element appears
r.count(8)

1

In [16]:
# .index() --> returns the position of the element
r.index(8)

8

In [17]:
list(r)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
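
The cells above show that `range()` displays as `range(start, stop)` until it is wrapped in `list()`. That is because a range stores only its start, stop, and step and computes values on demand. A short sketch with an arbitrarily chosen large range (the names are illustrative):

```python
big = range(0, 1_000_000, 2)   # even numbers below one million

print(len(big))                # 500000
print(big[10])                 # 20: indexing works without building a list
print(999_998 in big)          # True
print(999_999 in big)          # False: odd numbers are not in the range

# A negative step counts down; the stop value is still excluded
countdown = list(range(10, 0, -3))
print(countdown)               # [10, 7, 4, 1]
```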

# Python Built-in Functions

- abs()	Returns the absolute value of a number
- all()	Returns True if all items in an iterable object are true
- any()	Returns True if any item in an iterable object is true
- ascii()	Returns a printable representation of an object, with non-ASCII characters replaced by escape sequences
- bin()	Returns the binary version of a number
- bool()	Returns the boolean value of the specified object
- bytearray()	Returns an array of bytes
- bytes()	Returns a bytes object
- callable()	Returns True if the specified object is callable, otherwise False
- chr()	Returns a character from the specified Unicode code.
- classmethod()	Converts a method into a class method
- compile()	Returns the specified source as an object, ready to be executed
- complex()	Returns a complex number
- delattr()	Deletes the specified attribute (property or method) from the specified object
- dict()	Returns a dictionary
- dir()	Returns a list of the specified object's properties and methods
- divmod()	Returns the quotient and the remainder when argument1 is divided by argument2
- enumerate()	Takes an iterable (e.g. a tuple) and returns an enumerate object that yields (index, item) pairs
- eval()	Evaluates and executes an expression
- exec()	Executes the specified code (or object)
- filter()	Returns an iterator over the items of an iterable for which a function returns true
- float()	Returns a floating point number
- format()	Formats a specified value
- frozenset()	Returns a frozenset object
- getattr()	Returns the value of the specified attribute (property or method)
- globals()	Returns the current global symbol table as a dictionary
- hasattr()	Returns True if the specified object has the specified attribute (property/method)
- hash()	Returns the hash value of a specified object
- help()	Executes the built-in help system
- hex()	Converts a number into a hexadecimal value
- id()	Returns the id of an object
- input()	Reads a line of user input and returns it as a string
- int()	Returns an integer number
- isinstance()	Returns True if a specified object is an instance of a specified class
- issubclass()	Returns True if a specified class is a subclass of another specified class
- iter()	Returns an iterator object
- len()	Returns the length of an object
- list()	Returns a list
- locals()	Returns a dictionary of the current local symbol table
- map()	Returns an iterator that applies the specified function to each item of an iterable
- max()	Returns the largest item in an iterable
- memoryview()	Returns a memory view object
- min()	Returns the smallest item in an iterable
- next()	Returns the next item from an iterator
- object()	Returns a new object
- oct()	Converts a number into an octal
- open()	Opens a file and returns a file object
- ord()	Returns the integer Unicode code point of the specified character
- pow()	Returns the value of x to the power of y
- print()	Prints to the standard output device
- property()	Gets, sets, deletes a property
- range()	Returns a sequence of numbers, starting from 0 and incrementing by 1 (by default)
- repr()	Returns a printable representation of an object
- reversed()	Returns a reversed iterator
- round()	Rounds a number
- set()	Returns a new set object
- setattr()	Sets an attribute (property/method) of an object
- slice()	Returns a slice object
- sorted()	Returns a sorted list
- staticmethod()	Converts a method into a static method
- str()	Returns a string object
- sum()	Sums the items of an iterable
- super()	Returns an object that represents the parent class
- tuple()	Returns a tuple
- type()	Returns the type of an object
- vars()	Returns the __dict__ property of an object
- zip()	Returns an iterator of tuples, built from two or more iterables

## Let's try a few

In [None]:
# abs()	Returns the absolute value of a number
abs(-1)

In [None]:
# format()	Formats a specified value
x = format(0.5, '%')
print(x)
print(type(x))

In [None]:
# Keep the first five characters plus the trailing '%': '50.000000%' becomes '50.00%'
x = x[:5] + x[-1]
x

In [None]:
# round()	Rounds a number (ties such as 8.5 go to the nearest even integer)
print(round(8.7))
print(round(8.7,1))
print(round(8.769,2))
print(round(8.2))
print(round(8.5))
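
The last line above may be surprising: `round(8.5)` gives 8, not 9. Python's built-in `round()` sends exact ties to the nearest even integer. The sketch below shows a few ties and, as one possible alternative when "half up" rounding is required, the standard-library `decimal` module (the variable names are illustrative):

```python
from decimal import Decimal, ROUND_HALF_UP

# Exact ties go to the nearest even integer
ties = [round(0.5), round(1.5), round(2.5), round(8.5)]
print(ties)          # [0, 2, 2, 8]

# Decimal lets us choose the rounding rule explicitly
half_up = Decimal('8.5').quantize(Decimal('1'), rounding=ROUND_HALF_UP)
print(half_up)       # 9
```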

In [None]:
# len()	Returns the length of an object
l=[1,2,3,'a','b','c']
len(l)

In [None]:
# reversed()	Returns a reversed iterator
l=[1,2,3,'a','b','c']
list(reversed(l))

In [None]:
# sorted()	Returns a sorted list
l=[1,3,2,4]
sorted(l)

In [None]:
# iter()	Returns an iterator object
# next()	Returns the next item from an iterator
l=[1,2,3,'a','b','c']
it = iter(l)
print(it)
print(type(it))

print('')
print('')

print(next(it))
print(next(it))
print(next(it))
print(next(it))
print(next(it))
print(next(it))
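
The cell above calls `next()` exactly six times, once per element. One more call would raise `StopIteration`, because the iterator is exhausted. A minimal sketch on a shorter made-up list, including the optional default argument of `next()`:

```python
it = iter([1, 2])

first = next(it)
second = next(it)

# With a default, next() returns it instead of raising
third = next(it, 'done')

try:
    next(it)                 # no default: raises StopIteration
    exhausted = False
except StopIteration:
    exhausted = True

print(first, second, third)  # 1 2 done
print(exhausted)             # True
```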

In [18]:
# zip()	Returns an iterator of tuples, built from two or more iterables
l1 = [1,2,3]
l2 = ['a','b','c']

z = zip(l1, l2)
list(z)

[(1, 'a'), (2, 'b'), (3, 'c')]
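
Two details of `zip()` are easy to miss: it stops at the shortest input, and its pairs can be passed straight to `dict()`. A brief sketch with made-up lists (`names`, `ages`, and the values in them are illustrative):

```python
names = ['avi', 'dani', 'gili']
ages = [30, 25]                      # one item shorter than names

pairs = list(zip(names, ages))       # stops after the shortest input
print(pairs)                         # [('avi', 30), ('dani', 25)]

lookup = dict(zip(names, ages))      # first items become keys
print(lookup)                        # {'avi': 30, 'dani': 25}
```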

## Aggregate Functions

In [None]:
# max()	Returns the largest item in an iterable
n = [1,2,3]
max(n)

In [None]:
# min()	Returns the smallest item in an iterable
n = [1,2,3]
min(n)

In [None]:
# sum()	Sums the items of an iterable
n = [1,2,3]
sum(n)

In [None]:
# Error: NameError
# There are no built-in count(), avg() or mean() functions (the first failing line stops the cell)
count(n)
avg(n)
mean(n)
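
The missing aggregates are easy to obtain in other ways. As a sketch: `len()` gives the count, `sum(n) / len(n)` gives the average, and the standard-library `statistics` module provides `mean()` (the result names below are illustrative):

```python
from statistics import mean

n = [1, 2, 3]

n_count = len(n)              # number of items
n_avg = sum(n) / len(n)       # arithmetic mean computed by hand
n_mean = mean(n)              # the same value via the statistics module

print(n_count)                # 3
print(n_avg)                  # 2.0
print(n_mean)
```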

## Cast

In [None]:
# float()	Returns a floating point number
float(5)

In [None]:
# int()	Returns an integer number
int(5.67)

In [None]:
# str()	Returns a string object
str(98)

In [None]:
# bool()	Returns the boolean value of the specified object
print(bool(8))
print(bool(-8))
print(bool(0))
print(bool(-18))
print(bool(-1))
print(bool(1))
print(bool('e'))
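
In the cell above only `bool(0)` is False. More generally, zero, `None`, and empty containers are falsy, and everything else is truthy. The sketch below lists common falsy values and two truthy cases that often surprise (the names are illustrative):

```python
falsy_values = [0, 0.0, '', [], (), {}, set(), None]

all_falsy = all(not bool(v) for v in falsy_values)
print(all_falsy)          # True

# Non-empty objects are truthy, whatever they contain
print(bool('0'))          # True: a non-empty string
print(bool([0]))          # True: a list with one element
```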

In [None]:
# list()	Returns a list
l = list((1, 2, 3.5, 'ABC', False))
l

In [None]:
# tuple()	Returns a tuple
t = tuple([1, 2, 3.5, 'ABC', False])
t

In [None]:
# dict()	Returns a dictionary
d = dict([(1,'avi'),(2,'dani'),(3,'gili')])
d

In [None]:
# set()	Returns a new set object
s = set('hello')
s

## A Couple More Functions

In [19]:
# dir()	Returns a list of the specified object's properties and methods
l=[1,2,3,'a','b','c']
dir(l)

['__add__',
 '__class__',
 '__class_getitem__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__imul__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__rmul__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']

In [None]:
# help()	Executes the built-in help system
help()

Welcome to Python 3.12's help utility! If this is your first time using
Python, you should definitely check out the tutorial at
https://docs.python.org/3.12/tutorial/.

Enter the name of any module, keyword, or topic to get help on writing
Python programs and using Python modules.  To get a list of available
modules, keywords, symbols, or topics, enter "modules", "keywords",
"symbols", or "topics".

Each module also comes with a one-line summary of what it does; to list
the modules whose name or summary contain a given string such as "spam",
enter "modules spam".

To quit this help utility and return to the interpreter,
enter "q" or "quit".



help>  sort


No Python documentation found for 'sort'.
Use help() to get the interactive help utility.
Use help(str) for help on the str class.

