![alt text](python.png "Title")

# Data Structures 

Python has several important types of structures to hold data. It's essential to know them.

https://docs.python.org/3/tutorial/datastructures.html

## Lists

In [0]:
# Lists are ordered, mutable collections of heterogeneous objects. They are very handy!
# List items are enclosed in square brackets and separated by commas.

a = []     # Declares an empty list
a = list() # Same thing

# let's create a list of 3 objects
a = ['Hello', 1, True, ] # It's OK to leave a comma at the end, but not mandatory

print ( 'a: ', a )  # see how it's printed out
print ( len(a)   )  # number of items in the list
print ( type (a) )  # it's a list

# Lists can be nested 
b = [ [1, 2, 3], [4, 5, 6]]
print("b: ", b)

# Lists can be joined:
c = [1, 2, 3] + [4, 5, 6]
print("c: ", c)

In [0]:
# Slices: access the content by position. In Python, indexing always starts at zero!
a = [ [1, 2, 3], [4, 5, 6]]

print ('a       =', a          )
print ('a[0]    =', a[0]       ) # First item in the list, which is a  list
print ('a[0][1] =', a[0][1]    ) # First item in the list & second item in the nested list, which is an integer
print ('a[-1]   =', a[-1], '\n') # Last item in the list, which is a list

# A slice can also be a range:
b = [ 1, 2, 3, 4, 5, 6]

print ('b        =',  b       )
print ('b[0:2]   =',  b[0:2]  ) # first and second item (the second range number is NOT inclusive )
print ('b[:2]    =',  b[:2]   ) # same 
print ('b[2:]    =',  b[2:]   ) # from third item till the last item
print ('b[2:-1]  =',  b[2:-1] ) # from third item till the second to last item
print ('b[-3:-1] =',  b[-3:-1])  
print ('b[-1:-3] =',  b[-1:-3]) # start comes after stop (with the default step of 1), so the result is empty
print ('b[::2]   =',  b[::2])   # using increment, in that case skipping every other value, from left to right
print ('b[::-2]   =',  b[::-2]) # same but from right to left
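
The general slice form is `[start:stop:step]`. The short sketch below combines the three parts and also shows one difference between slicing and indexing: a slice that runs past the end is simply clipped, while a single out-of-range index raises an error. The variable names (`middle`, `clipped`, `index_ok`) are only illustrative.

```python
b = [1, 2, 3, 4, 5, 6]

middle = b[1:5:2]     # start at index 1, stop before index 5, step of 2
print(middle)         # [2, 4]

clipped = b[4:100]    # an out-of-range slice is clipped, no error is raised
print(clipped)        # [5, 6]

try:
    b[100]            # an out-of-range index raises IndexError
    index_ok = True
except IndexError:
    index_ok = False
print(index_ok)       # False
```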

![alt text](slices.jpg "Title")

In [0]:
# Lists are iterable objects. We'll cover the 'for' loops later, for now note that you can iterate on the items in a list
for item in ["Hello", "world", '!']:
    print(item)

In [0]:
# We can test the existence of a value in a list. Such tests return boolean objects:
a = ['Hello', 'world']
print ('Hello' in a)
print ('hello' in a)

In [0]:
# More functions & methods:

# del(): remove an item by position
a = [1, 2, 3]
del (a[1]) # remove the second object in the list

print('a: ', a)

# pop(): remove the last item
b = [1, 2, 3]
b.pop()
print('b: ', b)

# append(): add an item at the end of the list
c = [1, 2, 3]
c.append(4)
print('c: ', c)

# sorted(): function to lexicographically sort list items
d = ['b1', 'a2', 'a1', 'b5', ]
d = sorted(d)
print ('d:', d )

# sort(): **in-place** method to lexicographically sort list items
e = ['b1', 'a2', 'a1', 'b5', ]
e.sort() # This overwrites the original list
print ('e:', e )

# join() converts a list to a string, with a delimiter. This does the opposite of split()
joined = '@'.join(['Hello', 'world']) 
print(joined)
print(type(joined))

In [0]:
a = [1, 2, 3]

# List comprehensions are a compact & pythonic way to create lists. It's a very cool feature that you should know
squared = [number**2 for number in a]
print(squared)

# You can achieve the same using a 'for' loop, but this is not the best way
squared = []
for number in a:
    squared.append(number**2)    

In [0]:
# Using comprehensions, you can apply a function on every item in a list:
a = ['Hello', 'World']
[item.upper() for item in a]

In [0]:
# Comprehensions also support if statements. For example, we could create a list of even numbers:
[n for n in [0,1,2,3,4,5,6,7,8,9,10] if n % 2 == 0]

In [0]:
# To create a list of integers, use the 'range' function:
a = range(10) # It can take options for the starting point, increment, etc.
print(a)      # The outcome is not an actual list but is an iterable object

# Turns the range object into an actual list
print(list(a))
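
As a small illustration of the starting point and increment options, `range` accepts up to three arguments: `range(start, stop, step)`. As with slices, the stop value is excluded. The names `evens` and `countdown` are just examples.

```python
# range(start, stop, step): the stop value is excluded, as with slices
evens = list(range(2, 11, 2))
print(evens)        # [2, 4, 6, 8, 10]

# A negative step counts down
countdown = list(range(5, 0, -1))
print(countdown)    # [5, 4, 3, 2, 1]
```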

In [0]:
# all() and any() functions take iterables as input

# Are all the values equal to True?
print( all([True, True, False]) )
print( all([1>0, True, True ])  )

# Do we have at least one True?
print( any([True, True, False])   )
print( any([False, False, False]) )

## Tuples

Tuples are very similar to lists, with one key difference: lists are mutable (i.e., values can be updated) and tuples are not. Because of that, tuples are typically a bit more efficient to store and process. Use them if you know you won't need to update them. The rest is the same (iterability, slicing, etc.).
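
A minimal sketch of what immutability means in practice: item assignment on a tuple raises a `TypeError`, and, as a side benefit, a tuple of hashable items can be used as a dictionary key (a list cannot). The `point` and `distances` values are made-up example data.

```python
point = (1, 2)

# Item assignment is not allowed on a tuple
try:
    point[0] = 10
    assignment_failed = False
except TypeError:
    assignment_failed = True
print(assignment_failed)    # True

# Being immutable, a tuple of hashable items can serve as a dict key
distances = {(0, 0): 0.0, (3, 4): 5.0}   # hypothetical example data
print(distances[(3, 4)])    # 5.0
```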

In [0]:
# Create a list and add an item. That will work.
myList = ['hello']
myList.append('world')

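# Create a tuple (using parentheses, not square brackets) and add an item. That will fail.
# Note the trailing comma: without it, ('hello') is just a string, not a one-item tuple.
myTuple = ('hello',)
myTuple.append('world') # raises AttributeError: tuples have no append() method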

In [0]:
# Tuple comprehension. Note: we need to call tuple() explicitly, because parentheses alone would create a generator expression, not a tuple
tuple(letter for letter in 'hello')

In [0]:
# Convert a tuple to a list and back to a tuple, in case you do need to change it :-)
a = ('hello', 'world')

a = list(a)
a[0] = 'Hello'
a = tuple(a)

print(a)

## Sets

Somewhat similar to lists, with some differences:
* sets are unordered: you can't rely on the item order (therefore no indexing or slicing here)
* sets do not allow duplicate values
* set items cannot be modified in place, but the set itself is mutable: you can add items using add() or update()
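
The points above can be sketched as follows. `add()` and `update()` come from the list above; `discard()` is an additional standard set method shown here for completeness. The last part illustrates that set items must be hashable, so a mutable list cannot be stored in a set.

```python
s = {1, 2}
s.add(3)              # add a single item
s.update([3, 4])      # add several items; the duplicate 3 is ignored
s.discard(1)          # remove an item (no error if it is absent)
print(s == {2, 3, 4})   # True

# Items must be hashable: a list cannot be a set item
try:
    {[1, 2]}
    list_allowed = True
except TypeError:
    list_allowed = False
print(list_allowed)     # False
```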

In [0]:
# Converting a list to a set removes the duplicates. This is really useful!
a = set( [1, 2, 3, 3, 4] )
print(a) # Note the curly brackets
print(type(a))

# You can convert back to a list:
a = list(a)
print(type(a))

In [0]:
# Create set with curly brackets:
a = {1, 2, 3, 3, 4}
a

In [0]:
# Sets are not ordered, so you can NOT access values using indexes
a = {1, 2, 3, 3, 4}
# a[1] # that will crash

# However, you can iterate on a set and retrieve the values:
for item in a: 
    print(item)
    
# Like in a list, check for existence of an item:
print( 2 in a)
print( 20 in a)

In [0]:
# Sets have useful methods for content comparisons:
setA= {'a', 'b', 'd'}
setB= {'a', 'c', 'e'}

# A few examples:
print('Union: ',        setA.union(setB))
print('Intersection: ', setA.intersection(setB))
print('Not in both: ',  setA.symmetric_difference(setB))
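
For reference, the same comparisons can also be written with operators, which some people find more compact. This is just an alternative spelling of the methods above, plus the difference and subset checks; the names `set_a` and `set_b` are illustrative.

```python
set_a = {'a', 'b', 'd'}
set_b = {'a', 'c', 'e'}

union = set_a | set_b            # same as set_a.union(set_b)
intersection = set_a & set_b     # same as set_a.intersection(set_b)
not_in_both = set_a ^ set_b      # same as set_a.symmetric_difference(set_b)
only_in_a = set_a - set_b        # same as set_a.difference(set_b)

print(intersection)              # {'a'}
print(only_in_a == {'b', 'd'})   # True
print({'a', 'b'} <= set_a)       # True: {'a', 'b'} is a subset of set_a
```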

In [0]:
# Set comprehension: a proof that sets are not ordered:
s = {letter for letter in 'hello world'}
s

## Strings

In [0]:
# A string can be seen as a list of characters
a = 'Hello!'

# We can slice a string. As with lists, indexes start at 0 and the end point is excluded.
print('a[0]    :' , a[0])
print('a[-1]   :' , a[-1])
print('a[0:2]  :' , a[0:2])
print('a[-3:]  :' , a[-3:])
print('a[-3:-1]:' , a[-3:-1])

In [0]:
# You can use single, double or triple quotes:

# a and b are the same
a = 'Hello'
b = "Hello"

# You can protect single or double quotes with their opposites
c = "I'm here."
d = 'I am "here".'

# Triple quotes protect single and double quotes:
e = ''' I'm "here" '''

# Use a backslash (\) to continue a statement on the next line; adjacent string literals are concatenated. Useful when the line gets too long
a = "hello this is " \
    "a string written on 2 lines"
print(a)

In [0]:
# Concatenate strings with the + operator:
'Hello' + " world!"

In [0]:
# As with lists and other sequences, you can use len() to retrieve the number of elements:
long_word = "Supercalifragilisticexpialidocious"
len(long_word)

In [0]:
# String multiplication
comments = "/*" + "*" * (len(long_word)+2) + "*/"

print (comments)
print ("/*", long_word, '*/')
print (comments )

In [0]:
# You can easily create a list out of a string with split(). Default delimiter is space.
# This does the opposite of join()

Mylist = 'Hello, world !'.split()
print(Mylist)

# using a different delimiter:
Mylist = 'Hello, world !'.split(', ')
print(Mylist)

# Chain splits:
comment = "patient (id=123) was discharged"
usubjid = int(comment.split('id=')[1].split(')')[0]) # we convert the result to Integer, otherwise we'd get a string
print (usubjid)
print (type(usubjid))

# we'll do this in a more elegant way with Regex
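
As a preview, here is one possible regex version of the chained split, using the standard `re` module. The pattern `id=(\d+)` is an assumption about the comment format: it captures the run of digits that follows `id=`.

```python
import re

comment = "patient (id=123) was discharged"

# Capture the digits that follow 'id='
match = re.search(r'id=(\d+)', comment)

# search() returns None when nothing matches, so guard against that
usubjid = int(match.group(1)) if match else None
print(usubjid)        # 123
print(type(usubjid))  # <class 'int'>
```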

In [0]:
# Replacing substrings

a = 'Hello, world'
a = a.replace('Hello', 'Hi') # replace is not in-place
print(a)

In [0]:
# Escape sequences start with a backslash, e.g. \n

# Python will raise an error because \u starts a Unicode escape that must be followed by 4 hex digits
#folder = "c:\user\nicolas"  

# You can prefix the string with 'r' to avoid this:
folder = r"c:\user\nicolas"
print(folder)

# or alternatively protect every backslash with an additional backslash:
folder = 'c:\\user\\nicolas'

# Sometimes we do want to use escape sequences (\n means newline):
print( 'Hello \nworld')
print( r'Hello \nworld')

In [0]:
# Formatting strings

usubjid = 123
date = "October 15th"

# let's put these 2 vars in a sentence

# Solution 1: concatenate. Not the easiest or most readable way...
comment = "Patient (id=" + str(usubjid) + ") was discharged on " + date

# Solution 2: use format() and curly brackets as placeholders. More options with this (e.g. number of digits)
comment = "Patient (id={}) was discharged on {}".format(usubjid, date)

# Solution 3 and my personal favorite: use an f-string (the f prefix).
comment = f"Patient (id={usubjid}) was discharged on {date}"

# You can use expressions, such as function calls, inside the curly brackets:
print (f"There are {len(date)} characters in '{date}'.")
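
To illustrate the formatting options mentioned above (such as the number of digits), a format specification can follow a colon inside the curly brackets. It works the same way with `format()` and with f-strings. The values below are made-up examples.

```python
ratio = 0.123456   # hypothetical value
usubjid = 7        # hypothetical id

two_decimals = f"{ratio:.2f}"         # fixed-point, 2 decimals
percent = f"{ratio:.1%}"              # percentage, 1 decimal
padded = f"{usubjid:04d}"             # zero-padded to 4 digits
aligned = "{:>8}".format("abc")       # right-aligned in 8 characters

print(two_decimals)   # 0.12
print(percent)        # 12.3%
print(padded)         # 0007
print(repr(aligned))  # '     abc'
```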

In [0]:
# Reverse a string :-)
"Hello World"[::-1]

## Dictionaries

Python dicts are mutable collections indexed by key. Think of them as pairs of keys and values. Since Python 3.7 they preserve insertion order, but items are accessed by key, not by position.

In [0]:
# You can create dicts with curly brackets, colons as delimiters between keys and their values, and commas between pairs.
contacts = {'Clark': '555-153-0486', 'Lois': '555-594-1647'}
print(contacts)

# Access a value using its key as the index:
print ("Clark's phone number:", contacts['Clark'] )

# Dicts are iterable:
print (f'We have {len(contacts)} contacts:')

for n in contacts:
    print('-', n)
    print(contacts[n])

In [0]:
# Dicts are indexed by key, not by position, so this raises a KeyError if you expected to get the first item. It only works if you have a key actually named 0:
contacts[0]

In [0]:
# Get the key/value pairs
contacts.items() # this returns a view of (key, value) tuples, which can be converted to a list of tuples
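
A short sketch of the related dict methods: `keys()` and `values()` return views like `items()` does, and `items()` unpacks nicely in a `for` loop. `get()` is also shown as a way to look up a key that may be missing without raising a `KeyError`. The name 'Jimmy' is a made-up key that is not in the dict.

```python
contacts = {'Clark': '555-153-0486', 'Lois': '555-594-1647'}

names = list(contacts.keys())
phones = list(contacts.values())
print(names)    # ['Clark', 'Lois']
print(phones)   # ['555-153-0486', '555-594-1647']

# items() yields (key, value) tuples, which unpack in the loop header
for name, phone in contacts.items():
    print(f"{name}: {phone}")

# get() returns a default value instead of raising KeyError
missing = contacts.get('Jimmy', 'unknown')   # 'Jimmy' is a hypothetical key
print(missing)  # unknown
```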

In [0]:
# Dictionaries can contain all kinds of objects (dict, list etc)
a = {'Clark': {'phone':  '555-153-0486', 
               'email':  'ICanFly@gmail.com',
               'powers': ['flying', 'Invulnerable', 'X-rays']},
     'Lois' : None}

# Accessing values with chain indexes:
print ( a['Clark']['powers'][0] )

# Dicts are mutable:
a['Clark']['powers'].append('Freezing breath') # In fact, we are modifying a list, which is mutable
a['Clark']['powers'][0] = 'Flying' 
print (a['Clark']['powers'])

In [0]:
# add a new key
contacts['test'] = 'Test value'
contacts

In [0]:
# Deleting a key that doesn't exist raises a KeyError:
del (contacts['Loiss'])

In [0]:
# Remove a key by name:
del (a['Lois'])

# Python raises a KeyError if the key doesn't exist when deleting.
# You can test it beforehand using 'in' (which checks the dict's keys):
if 'Lois' in a:
    del (a['Lois'])
else:
    print('Lois was already deleted from the dictionary.')

a

# We'll see the try/except block later, which is a better way
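
As a preview, here is a minimal sketch of two ways to delete a key that may be missing: wrapping `del` in a try/except block, or using the dict method `pop()` with a default value, which never raises. The dict content is a simplified stand-in for the example above.

```python
a = {'Clark': '555-153-0486'}   # simplified example data, 'Lois' is absent

# Option 1: try/except around del
try:
    del a['Lois']
    deleted = True
except KeyError:
    deleted = False
    print('Lois was not in the dictionary.')

# Option 2: pop() with a default returns the default if the key is missing
removed = a.pop('Lois', None)
print(removed)   # None
print(a)         # {'Clark': '555-153-0486'}
```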

In [0]:
# Dict comprehension:
tuples = [(1, 'a'), (2, 'b')]
d = {key:value for key, value in tuples}
print(d)

# it worked because Python unpacks the tuple values:
key, value = ('A','B')
print('key=', key, ', value=', value)

## Copies




Assigning a mutable object to another name (e.g. a = b) does not create an independent object but a second reference to the same object!

In [0]:
# Strings are safe: they are NOT mutable, so a shared string can never change behind your back.
old = "Hello"
new = old

# If we rebind 'old' to another string, 'new' remains unchanged:
old = "Hi"
new

In [0]:
# A proof strings are not mutable:
test = "Hello"
test[0] = 'h' # raises TypeError: strings do not support item assignment

In [0]:
# BE CAREFUL with lists, as they are mutable. Assigning a list to a new name creates a reference, not a copy!!
old = ['Hello', 'world']
new = old

# If we change 'old', 'new' is changed too :-o
old[0] = 'Hi'
new

In [0]:
# and of course, that "works" both ways:
old = ['Hello', 'world']
new = old

new[0] = 'Hi'
old

In [0]:
# So, use the list's copy() method if you want an independent copy of that list:
old = ['Hello', 'world']
new = old.copy()

old[0] = 'Hi'
new
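
One caveat worth knowing: `copy()` makes a shallow copy, meaning the new list is independent but any nested mutable objects inside it are still shared. The sketch below contrasts it with `copy.deepcopy()` from the standard `copy` module, which also copies the nested objects. The nested list is made-up example data.

```python
import copy

old = [['Hello', 'world'], [1, 2]]   # a list containing nested lists

shallow = old.copy()          # new outer list, but nested lists are shared
deep = copy.deepcopy(old)     # nested lists are copied too

old[0][0] = 'Hi'              # modify a nested list through 'old'

print(shallow[0][0])   # Hi     (the nested list is shared)
print(deep[0][0])      # Hello  (fully independent)
```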

In [0]:
# Dictionaries are mutable, so same story as lists:
old = {'Clark': '555-153-0486', 'Lois': '555-594-1647'}

# Take a real copy:
new = old.copy()

# or alternatively
new = dict(old)

In [0]:
# Sets are mutable too. You can use copy()
a = {1, 2, 3}
b = a.copy()

In [0]:
# Tuples? Well, tuples are immutable, so they have no copy() method because you don't need one (this raises an AttributeError):
(1, 2).copy()

## Memory address

id() returns the identity of an object (in CPython, its memory address). All references to the same object share the same id! This is a great feature to save memory space, but it can be dangerous if you don't understand it.

In [0]:
a = ['Hello']
b = a 

# b is a reference to a. They share the same id
print('Before:', id(a), id(b) )

# Let's modify a
a.append('world')

# They still share the same id
print('After :', id(a), id(b) )

In [0]:
# Using copy(), these are now different and independent objects:
c = a.copy()
print('Before:', id(a), id(c) )
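
Rather than comparing `id()` values by hand, the `is` operator tests whether two names refer to the same object, while `==` tests whether the values are equal. A minimal sketch (the result names are only illustrative):

```python
a = ['Hello']
b = a           # second reference to the same list
c = a.copy()    # independent list with equal content

same_object = a is b      # identity: same object
equal_values = a == c     # equality: same content
same_as_copy = a is c     # the copy is a different object

print(same_object)    # True
print(equal_values)   # True
print(same_as_copy)   # False
```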

In [0]:
# Let's try the same with immutable objects (like strings or integers)
a = 'Hello'
b = a

# Interestingly a and b share the same id, for now they point to the same address (that's efficient)
print('Before:', 'A:', id(a), ', B:', id(b), '>', {True: 'Same ID', False: 'Different ID'}[id(a)==id(b)])

# And what if we modify a?
a = 'Hi'

# IDs are now different. Not a surprise, since we in fact created a new object from scratch.
print('After :', 'A:', id(a), ', B:', id(b), '>', {True: 'Same ID', False: 'Different ID'}[id(a)==id(b)])

__________________________________________________
Nicolas Dupuis, Methodology and Innovation (IDAR C&SP), 2020+