# Python Data Structures

* Understand the different data structures available in Python
* Strings, Lists, Tuples, Dictionaries
* Methods defined on each type 
* What to use for different tasks
* Packages and data structures for numerical data analysis:
    * NumPy vectors, arrays, matrices

## Strings

* Sequence of characters representing text
* In Python 3, strings are Unicode, so they can hold characters from any script
* Can use single quote ', double quote " or triple quotes """
* Strings are objects, so you can call methods on them
* Check out [the documentation](https://docs.python.org/3/library/string.html) for more

In [9]:
s1 = 'single quoted string might have "quotes in it"'
s2 = "double quoted string could contain it's or that's"
s3 = """triple quoted string
can contain 
newlines"""
s4 = "String containing 中文"
s3

'triple quoted string\ncan contain \nnewlines'

In [10]:
print(s3)

triple quoted string
can contain 
newlines


# Operations on Strings

In [12]:
# convert s4 to uppercase and store as s5
s5 = s4.upper()
s5

'STRING CONTAINING 中文'

In [18]:
# find the index of the first occurrence of 'g' in s4
firstg = s4.find('g')
# get all characters from that position onwards (including the 'g')
s4[firstg:]

'g containing 中文'
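
A few more string methods, as a minimal sketch. The example string is made up for illustration. Strings cannot be changed in place, so each method returns a new string (or list, boolean or number) and leaves the original untouched.

```python
s = "  To be or not to be  "   # hypothetical example text

stripped = s.strip()                        # remove leading/trailing whitespace
words = stripped.split()                    # split on whitespace into a list
joined = "-".join(words)                    # join list items with a separator
replaced = stripped.replace("be", "code")   # replace every occurrence of "be"
starts = stripped.startswith("To")          # boolean test on the prefix
missing = stripped.find("z")                # find returns -1 when there is no match

print(words)
print(joined)
print(replaced)
print(starts, missing)
```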

# Lists and Tuples

* Lists are sequences of values
* Written inside square brackets, separated by commas
    * ['this', 'is', 'a', 'list', 'of', 'strings'], [3, 4, 6]
    * ['strings', 'and', 3, 9, 2]
    * ['lists', 'containing', ['another', 'list']]
* Lists can be modified, elements added, removed, replaced
* Tuples are just like lists but can't be modified
    * ('this', 'is', 'a', 'tuple')
    * a tuple can be more efficient than a list (for example, it uses slightly less memory)
* Check out the [documentation](https://docs.python.org/3/library/stdtypes.html#sequence-types-list-tuple-range)
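
The ways of modifying a list mentioned above (adding, removing, replacing) might look like the sketch below. The `colours` list is hypothetical example data.

```python
colours = ['red', 'green', 'blue']   # hypothetical example data

colours.append('yellow')       # add to the end
colours.insert(1, 'orange')    # insert before index 1
colours[0] = 'crimson'         # replace the first element
colours.remove('green')        # remove the first matching value
last = colours.pop()           # remove and return the last element

print(colours)   # ['crimson', 'orange', 'blue']
print(last)      # yellow
```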

In [22]:
list1 = ['this', 'is', 'a', 'list', 'of', 'strings']
list2 = ['embedded', 'list', list1]
list2

['embedded', 'list', ['this', 'is', 'a', 'list', 'of', 'strings']]

In [24]:
# the 'in' operator checks whether something is in a list
'a' in list1

True

In [27]:
if 'a' in list1:
    print("found it")
else:
    print("not found")

found it


# Lists and Loops

In [30]:
text = "To be or not to be that is the question"
# split the string on whitespace to produce a list of words
words = text.split()
print(words)

# create a new empty list
wordlengths = []
for word in words:
    # append the length of this word to our list
    wordlengths.append(len(word))

print(wordlengths)

['To', 'be', 'or', 'not', 'to', 'be', 'that', 'is', 'the', 'question']
[2, 2, 2, 3, 2, 2, 4, 2, 3, 8]
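
The same loop can also be written as a list comprehension, which builds the new list in a single expression. This is an alternative sketch, not a replacement for the loop above; `long_words` is an extra name introduced here to show filtering.

```python
text = "To be or not to be that is the question"
words = text.split()

# one length per word, equivalent to the append loop
wordlengths = [len(word) for word in words]

# a comprehension can also filter: keep only words longer than two characters
long_words = [word for word in words if len(word) > 2]

print(wordlengths)
print(long_words)
```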


# Tuples vs Lists

In [37]:
# could use a tuple to represent a data record
record = ('steve', 'cassidy', 39)
# can use some of the same operations as lists
print("Length: ", len(record))
print("Second element: ", record[1])

Length:  3
Second element:  cassidy


In [38]:
# but this can't be modified
record.append(21)

AttributeError: 'tuple' object has no attribute 'append'
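
Although a tuple can't be modified, it can be unpacked into separate names or used to build a new tuple. A short sketch using the same `record`; the names `first`, `last`, `age`, `extended` and `assignment_failed` are introduced here for illustration.

```python
record = ('steve', 'cassidy', 39)

# unpack the tuple into separate names
first, last, age = record

# build a new tuple instead of modifying the old one
extended = record + (21,)

# assigning to an element is also not allowed
assignment_failed = False
try:
    record[2] = 40
except TypeError:
    assignment_failed = True

print(first, last, age)
print(extended)
print(assignment_failed)
```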

# Tuples, Lists and Strings

These are all _sequence_ types and share some common operations.

In [55]:
s = ['a', 'list', 'of', 'stuff']
t = [1, 2, 3]

'a' in s         # boolean test
'x' not in s     # boolean test
s + t            # concatenate s and t
t * 3            # t repeated 3 times
s[1]             # second element
s[:2]            # the first two elements
s[2:4]           # the 3rd and 4th elements
min(s)           # smallest element
max(s)           # largest element
s.count('a')     # how many times does 'a' occur

1
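
The example above uses lists, but the same operations also work on strings and tuples. A small sketch with made-up values:

```python
word = "question"                          # a string
letters = ('b', 'a', 'n', 'a', 'n', 'a')   # a tuple

has_q = 'q' in word            # membership test on a string
prefix = word[:2]              # slicing a string gives a string
plural = word + "s"            # concatenation
smallest = min(word)           # characters compare by code point
largest = max(word)

middle = letters[2:4]          # slicing a tuple gives a tuple
doubled = letters * 2          # repetition
a_count = letters.count('a')   # how many times does 'a' occur

print(has_q, prefix, plural, smallest, largest)
print(middle, len(doubled), a_count)
```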

# Dictionaries

* Dictionaries are associative arrays
* Associate a key with a value
* Key can be any immutable type (string, number, tuple)
* Value can be anything
* O(1) average-time lookup by key (implemented as a hash table)
* Compare with searching a list for a value, which is O(n)
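
To illustrate the rule about keys, here is a minimal sketch using a hypothetical `grid` dictionary keyed by (row, column) tuples. A tuple works as a key, while a list, being mutable, is rejected with a `TypeError`.

```python
# hypothetical data: map (row, column) positions to labels
grid = {}
grid[(0, 0)] = 'origin'
grid[(2, 3)] = 'treasure'

lookup = grid[(2, 3)]   # look up a value by its tuple key

# a list is mutable, so it cannot be used as a key
list_key_rejected = False
try:
    grid[[1, 1]] = 'oops'
except TypeError:
    list_key_rejected = True

print(lookup)
print(list_key_rejected)
```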


In [57]:
info = dict()
info['name'] = 'Steve Cassidy'
info['age'] = 53
info['weight'] = 80
info

{'name': 'Steve Cassidy', 'age': 53, 'weight': 80}

In [59]:
info = {
    'name': 'Steve Cassidy', 
    'age': 53, 
    'weight': 80
}
info['age']

53

In [60]:
if 'age' in info:
    print("Age: ", info['age'])

Age:  53
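
Some other common dictionary operations, sketched with the same `info` dictionary. The `'height'` key is deliberately absent to show what `get` does with a missing key.

```python
info = {'name': 'Steve Cassidy', 'age': 53, 'weight': 80}

# get returns None (or a default you supply) instead of raising KeyError
height = info.get('height')
height_or_default = info.get('height', 'unknown')

# loop over key/value pairs
for key, value in info.items():
    print(key, value)

# remove an entry
del info['weight']
remaining_keys = list(info)

print(height, height_or_default)
print(remaining_keys)
```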


# An Example

Let's write some code to use strings, lists and dictionaries to solve a problem. 

Given a text, count how many times each word occurs in the text.


In [70]:
text = """Python is an easy to learn, powerful programming language. 
It has efficient high-level data structures and a simple but effective 
approach to object-oriented programming. Python’s elegant syntax 
and dynamic typing, together with its interpreted nature, make 
it an ideal language for scripting and rapid application development 
in many areas on most platforms."""

# replace commas, full stops and hyphens with spaces, then convert to lowercase
text = text.replace(',',' ').replace('.', ' ').replace('-', ' ')
text = text.lower()
print(text)


python is an easy to learn  powerful programming language  
it has efficient high level data structures and a simple but effective 
approach to object oriented programming  python’s elegant syntax 
and dynamic typing  together with its interpreted nature  make 
it an ideal language for scripting and rapid application development 
in many areas on most platforms 


In [74]:
# split text into words on whitespace
words = text.split()
# initialise a dictionary for word counts
# we will use the word as a key and store the count for each word
count = dict()
# count the words
for word in words:
    if word in count:
        count[word] += 1        # word is already in the dictionary
    else:
        count[word] = 1         # word is not in the dictionary

for word in sorted(count):
    print(word, count[word])

a 1
an 2
and 3
application 1
approach 1
areas 1
but 1
data 1
development 1
dynamic 1
easy 1
effective 1
efficient 1
elegant 1
for 1
has 1
high 1
ideal 1
in 1
interpreted 1
is 1
it 2
its 1
language 2
learn 1
level 1
make 1
many 1
most 1
nature 1
object 1
on 1
oriented 1
platforms 1
powerful 1
programming 2
python 1
python’s 1
rapid 1
scripting 1
simple 1
structures 1
syntax 1
to 2
together 1
typing 1
with 1
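
The standard library offers a shortcut for this kind of counting: `collections.Counter`, a dictionary subclass that does the counting for you. The sketch below uses a shorter, made-up text rather than the passage above.

```python
from collections import Counter

text = "to be or not to be that is the question"   # hypothetical short text
count = Counter(text.split())

to_count = count['to']             # 2
missing_count = count['python']    # a missing word counts as 0, no KeyError
top_two = count.most_common(2)     # the two most frequent words with their counts

print(to_count, missing_count)
print(top_two)
```

Writing the loop by hand, as above, shows how the dictionary is used. `Counter` is convenient once that idea is familiar.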


# NumPy Data Structures