# Python Tutorials: Data Structures

### Author: Dr. Owen Chen

### Date: 2023 Fall

### Data structures covered in this tutorial:

<a id="toc"></a>
1. [Lists, Tuples, Strings and Sequences](#list)
2. [Stack](#stack)
3. [Queue](#queue)
4. [Dictionaries](#dict)
5. [Sets](#set)
6. [Pandas DataFrame](#pandas)
7. [Pandas Series](#series)

<a id='list'></a>
# 1. Lists, Tuples, Strings and Sequences

### Sequences
Lists, tuples, and strings are all Python sequences, and share many of the same methods.
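
As a quick illustration of that shared behavior, the sketch below applies the same operations to a list, a tuple, and a string. The variable names are made up for this example.

```python
# The same sequence operations work on a list, a tuple, and a string.
as_list = ['a', 'b', 'c']
as_tuple = ('a', 'b', 'c')
as_string = 'abc'

for seq in (as_list, as_tuple, as_string):
    # len(), indexing, negative indexing, and membership tests are shared
    print(type(seq).__name__, len(seq), seq[0], seq[-1], 'b' in seq)

# count() and index() are available on all three as well
counts = [seq.count('a') for seq in (as_list, as_tuple, as_string)]
positions = [seq.index('c') for seq in (as_list, as_tuple, as_string)]
```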

## 1.1 List

### Creating an empty list

In [None]:
empty = []
empty

[]

In [88]:
empty=list()
empty

[]

In [89]:
type(empty)

list

### Using square brackets with initial values

In [90]:
numbers = [1, 2, 3]
numbers


[1, 2, 3]

### Casting an iterable
Any iterable can be cast to a list

In [91]:
numbers = list(range(10))
numbers

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
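
A range is only one kind of iterable. The short sketch below, with illustrative variable names, suggests how `list()` behaves with a few other common iterables.

```python
# list() accepts any iterable: strings, tuples, dictionaries, generators, ...
from_string = list('abc')                  # one item per character
from_tuple = list((1, 2, 3))
from_dict = list({'x': 1, 'y': 2})         # iterating a dict yields its keys
from_generator = list(n * n for n in range(4))

print(from_string, from_tuple, from_dict, from_generator)
```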

### Creating using multiplication

In [93]:
num_players = 10
scores = [1] * num_players
scores

[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
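
One caveat worth knowing: multiplication repeats references, not copies. That is harmless for numbers, but it can surprise you with nested lists. The following sketch (variable names are illustrative) contrasts the two cases.

```python
# Multiplying a list that contains a mutable object repeats the *reference*.
shared = [[0] * 3] * 2
shared[0][0] = 99          # both rows change, because they are one object

# A list comprehension builds a separate inner list for each row.
independent = [[0] * 3 for _ in range(2)]
independent[0][0] = 99     # only the first row changes

print(shared)
print(independent)
```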

### Mixing data types
Lists can contain multiple data types.

In [None]:
mixed = ['a', 1, 2.0, [13], {}]
mixed

['a', 1, 2.0, [13], {}]

### Indexing
Items in lists can be accessed using indices in a similar fashion to strings.

#### Access first item

In [None]:
numbers[0]


0

#### Access last item

In [None]:
numbers[-1]

9

#### Access any item

In [None]:
numbers[4]

4

### Adding to a list

#### Append to the end of a list

In [None]:
letters = ['a']
letters.append('c')
letters

['a', 'c']

#### Insert at beginning of list

In [None]:
letters.insert(0, 'b')
letters

['b', 'a', 'c']

#### Insert at arbitrary position

In [None]:
letters.insert(2, 'c')
letters

['b', 'a', 'c', 'c']

#### Extending with another list

In [None]:
more_letters = ['e', 'f', 'g']
letters.extend(more_letters)
letters

['b', 'a', 'c', 'c', 'e', 'f', 'g']

### Change item at some position

In [None]:
letters[3] = 'd'
letters

['b', 'a', 'c', 'd', 'e', 'f', 'g']

### Swap two items

In [None]:
letters[0], letters[1] = letters[1], letters[0]
letters

['a', 'b', 'c', 'd', 'e', 'f', 'g']

### Removing items from a list

#### Pop from the end

In [None]:
letters = ['a', 'b', 'c', 'd', 'e', 'f']
letters.pop()
letters

['a', 'b', 'c', 'd', 'e']

#### Pop by index

In [None]:
letters.pop(2)
letters

['a', 'b', 'd', 'e']

#### Remove specific item

In [None]:
letters.remove('d')
letters

['a', 'b', 'e']
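
Two details of these methods are easy to miss: `pop()` returns the item it removes, and `remove()` deletes only the first match and raises `ValueError` when the item is absent. A small sketch, using a separate list named `items` purely for illustration:

```python
items = ['a', 'b', 'c', 'b']

last = items.pop()        # pop() returns the removed item ('b')
items.remove('b')         # remove() deletes only the first matching item

# remove() raises ValueError when the item is not in the list,
# so catch the exception (or test with `in`) if it may be missing.
try:
    items.remove('z')
    removed_missing = True
except ValueError:
    removed_missing = False

print(last, items, removed_missing)
```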

## 1.2 String
A string is an immutable sequence of characters. It supports indexing and slicing like a list, but it cannot be changed in place.

In [2]:
# A string is enclosed in double quotes:
s = "Hello, World!"
s

'Hello, World!'

In [94]:
# Can also use single quotes:
s = 'Hello, World!'
s

'Hello, World!'

In [95]:
s[0]

'H'

In [12]:
# A string can be spaces and digits:
s = '1 2 3 4 5 6 7'
s

'1 2 3 4 5 6 7'

In [13]:
type(s)

str

In [14]:
# Compare with a list of numbers (stored under a different name so that s stays a string)
nums = [1, 2, 3, 4, 5, 6, 7]
nums

[1, 2, 3, 4, 5, 6, 7]

In [15]:
type(nums)

list

### Indexing on a string is the same as indexing on a list

In [5]:
s[0]

'1'

In [6]:
s[-1]

'7'

In [7]:
# a slice - 3rd character (inclusive) to the 6th character (noninclusive)
s[2:5]

'2 3'

In [8]:
# 4th character and onward:
s[3:]

' 3 4 5 6 7'

### Add and multiply strings

In [96]:
s1 = "Welcome"
s2 = "to the Python class!"
s = s1 + " " + s2
s

'Welcome to the Python class!'

In [97]:
# a dashed line made of 60 "-" characters
s3 = "-"
s = s3*60
s

'------------------------------------------------------------'

## 1.3 Tuples

### Create tuple using parentheses

In [None]:
tup = (1, 2, 3)
tup

(1, 2, 3)

### Create tuple with commas

In [None]:
tup = 1, 2, 3
tup

(1, 2, 3)

### Create empty tuple

In [None]:
tup = ()
tup

()

### Create tuple with single item

In [None]:
tup = 1,
tup

(1,)

### Behaviors shared by lists and tuples
The following sequence behaviors are shared by lists and tuples.

### Check item in sequence

In [None]:
3 in (1, 2, 3, 4, 5)

True

### Check item not in sequence

In [None]:
'a' not in [1, 2, 3, 4, 5]

True

### Slicing

#### Setting start, slice to the end

In [None]:
letters = 'a', 'b', 'c', 'd', 'e', 'f'
letters[3:]

('d', 'e', 'f')

#### Set end, slice from beginning

In [None]:
letters[:4]

('a', 'b', 'c', 'd')

#### Index from end of sequence

In [None]:
letters[-4:]

('c', 'd', 'e', 'f')

#### Setting step

In [None]:
# start at index 1 and step backwards by 2; only 'b' is reached
letters[1::-2]

('b',)

### Unpacking

#### Error if the number of unpacked values does not match the number of assigned variables

In [None]:
first, middle = [1, 2, 3]

ValueError: too many values to unpack (expected 2)

### Extended unpacking

In [None]:
first, *middle, last = (1, 2, 3, 4, 5)

f"first = {first},  middle = {middle},  last = {last}"

'first = 1,  middle = [2, 3, 4],  last = 5'
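
Unpacking shows up in many everyday situations. The following sketch gathers a few of them; `min_max` is a hypothetical helper written only for this example.

```python
def min_max(values):
    """Return the smallest and largest value as a 2-tuple."""
    return min(values), max(values)

# Unpack a tuple returned by a function
low, high = min_max([4, 1, 7, 3])

# Unpack each pair while looping
pairs = [('a', 1), ('b', 2)]
labels = [f"{name}={number}" for name, number in pairs]

# The starred name always collects a list, even when unpacking a string
head, *rest = 'abc'

print(low, high, labels, head, rest)
```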

<a id = "stack"></a>
# 2. Stack 

[Go back to Table of Contents](#toc)

A stack is a linear data structure that stores items in a Last-In/First-Out (LIFO) or First-In/Last-Out (FILO) manner. 

In a stack, elements are added and removed at the same end.

The insert and delete operations are often called push and pop.

The functions associated with stack are:

- empty() – Returns whether the stack is empty – Time Complexity: O(1)
- size() – Returns the size of the stack – Time Complexity: O(1)
- top() – Returns a reference to the topmost element of the stack – Time Complexity: O(1)
- push(a) – Inserts the element ‘a’ at the top of the stack – Time Complexity: O(1)
- pop() – Deletes the topmost element of the stack – Time Complexity: O(1)
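
To make the operations above concrete, here is a minimal sketch of a list-backed stack. The `Stack` class and its method names are assumptions chosen to mirror the list above; they are not part of the standard library.

```python
class Stack:
    """Hypothetical list-backed stack mirroring the operations listed above."""

    def __init__(self):
        self._items = []

    def empty(self):
        return not self._items

    def size(self):
        return len(self._items)

    def top(self):
        # Look at the most recently pushed item without removing it
        return self._items[-1]

    def push(self, item):
        self._items.append(item)

    def pop(self):
        # Raises IndexError if the stack is empty
        return self._items.pop()


s = Stack()
s.push('first')
s.push('second')
print(s.top(), s.size(), s.empty())
```

The end of the list serves as the top of the stack because `append()` and `pop()` both work at that end.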

### Python Stack Implementation
Stack in Python can be implemented using the following ways:

- list
- collections.deque
- queue.LifoQueue

## 2.1 Use List as Stack
A stack is a LIFO (last in, first out) data structure which can be simulated using a list

#### Push onto the stack using append

In [98]:
stack = []
 
# append() function to push
# element in the stack
stack.append('first')
stack.append('second')
stack.append('third')
 
print('Initial stack:')
stack

Initial stack:


['first', 'second', 'third']

### Retrieve items, last one first using **pop**

In [99]:
# pop() function to pop
# element from stack in
# LIFO order
print('\nElements popped from stack:')
print(stack.pop())
print(stack.pop())
print(stack.pop())
 
print('\nStack after elements are popped:')
print(stack)
 
# uncommenting print(stack.pop())
# will cause an IndexError
# as the stack is now empty


Elements popped from stack:
third
second
first

Stack after elements are popped:
[]


## 2.2 Use collections.deque as Stack
    

In [23]:
from collections import deque

stack = deque()

# append() function to push
# element in the stack
stack.append('first')
stack.append('second')
stack.append('third')

print('Initial stack:')
print(stack)

# pop() function to pop
# element from stack in
# LIFO order
print('\nElements popped from stack:')
print(stack.pop())
print(stack.pop())
print(stack.pop())

print('\nStack after elements are popped:')
print(stack)

# uncommenting print(stack.pop())
# will cause an IndexError
# as the stack is now empty


Initial stack:
deque(['first', 'second', 'third'])

Elements popped from stack:
third
second
first

Stack after elements are popped:
deque([])
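
A classic use of a stack is checking that brackets are balanced. The sketch below is one possible approach using `deque`; the function name `is_balanced` is made up for this example.

```python
from collections import deque


def is_balanced(text):
    """Return True if every bracket in text is closed in the right order."""
    pairs = {')': '(', ']': '[', '}': '{'}
    stack = deque()
    for ch in text:
        if ch in pairs.values():
            stack.append(ch)                 # push an opening bracket
        elif ch in pairs:
            # a closing bracket must match the most recent opening one
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack                         # leftovers mean unclosed brackets


print(is_balanced('([]{})'), is_balanced('(]'), is_balanced('(('))
```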


## 2.3 Use queue module

In [102]:
from queue import LifoQueue

# Initializing a stack
stack = LifoQueue(maxsize = 3)

# qsize() shows the number of elements
# in the stack
print(stack.qsize())

# put() function to push
# element in the stack
stack.put('first')
stack.put('second')
stack.put('third')

print("Full: ", stack.full())
print("Size: ", stack.qsize())

# get() function to pop
# element from stack in
# LIFO order
print('\nElements popped from the stack')
print(stack.get())
print(stack.get())
print(stack.get())

print("\nEmpty: ", stack.empty())


0
Full:  True
Size:  3

Elements popped from the stack
third
second
first

Empty:  True
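
By default, `put()` on a full `LifoQueue` and `get()` on an empty one block until another thread makes room or adds an item. In single-threaded code, the non-blocking variants are usually safer. A small sketch, with illustrative variable names:

```python
from queue import LifoQueue, Full, Empty

stack = LifoQueue(maxsize=2)
stack.put('first')
stack.put('second')

# put_nowait() raises queue.Full instead of waiting for free space
try:
    stack.put_nowait('third')
    overflowed = False
except Full:
    overflowed = True

popped = [stack.get_nowait(), stack.get_nowait()]

# get_nowait() raises queue.Empty instead of waiting for an item
try:
    stack.get_nowait()
    underflowed = False
except Empty:
    underflowed = True

print(overflowed, popped, underflowed)
```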


<a id = "queue"></a>
# 3. Queue
[Go back to Table of Contents](#toc)

Like a stack, a queue is a linear data structure, but it stores items in a First-In/First-Out (FIFO) manner. 
With a queue, the least recently added item is removed first. 

Operations associated with **queue** are:

- **Enqueue**: Adds an item to the queue. If the queue is full, then it is said to be an Overflow condition – Time Complexity: O(1)
- **Dequeue**: Removes an item from the queue. The items are popped in the same order in which they are pushed. If the queue is empty, then it is said to be an Underflow condition – Time Complexity: O(1)
- **Front**: Get the front item from queue – Time Complexity: O(1)
- **Rear**: Get the last item from queue – Time Complexity: O(1)
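
The four operations above can be sketched as a small class. `FifoQueue` and its method names are hypothetical, chosen to match the list; the class is backed by `collections.deque`, which supports fast appends and pops at both ends.

```python
from collections import deque


class FifoQueue:
    """Hypothetical deque-backed queue mirroring the operations listed above."""

    def __init__(self):
        self._items = deque()

    def enqueue(self, item):
        self._items.append(item)          # add at the rear

    def dequeue(self):
        # Remove from the front; raises IndexError if the queue is empty
        return self._items.popleft()

    def front(self):
        return self._items[0]

    def rear(self):
        return self._items[-1]

    def __len__(self):
        return len(self._items)


line = FifoQueue()
for name in ('first', 'second', 'third'):
    line.enqueue(name)

served = line.dequeue()
print(served, line.front(), line.rear(), len(line))
```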


Queue in Python can be implemented in the following ways:

- list
- collections.deque
- queue.Queue

## 3.1 Use List as Queue


### Use append() to add an item

In [100]:
# Initializing a queue
queue = []

# Adding elements to the queue
queue.append('first')
queue.append('second')
queue.append('third')

print("Initial queue:")
queue

Initial queue:


['first', 'second', 'third']

### Use pop(0) to pop an item

In [101]:
# Removing elements from the queue
print("\nElements dequeued from queue")
print(queue.pop(0))
print(queue.pop(0))
print(queue.pop(0))

print("\nQueue after removing elements")
print(queue)

# Uncommenting print(queue.pop(0))
# will raise an IndexError
# as the queue is now empty


Elements dequeued from queue
first
second
third

Queue after removing elements
[]


## 3.2 Use collections.deque as Queue

In [30]:
from collections import deque

# Initializing a queue
q = deque()

# Adding elements to a queue
q.append('first')
q.append('second')
q.append('third')

print("Initial queue:")
q

Initial queue:


deque(['first', 'second', 'third'])

In [31]:
# Removing elements from a queue
print("\nElements dequeued from the queue")
print(q.popleft())
print(q.popleft())
print(q.popleft())

print("\nQueue after removing elements")
print(q)

# Uncommenting q.popleft()
# will raise an IndexError
# as queue is now empty


Elements dequeued from the queue
first
second
third

Queue after removing elements
deque([])
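
A `deque` removes items from the left in O(1) time, whereas `list.pop(0)` has to shift every remaining item. A `deque` can also be bounded with `maxlen`: instead of signalling an overflow, it silently discards the item at the opposite end. The sketch below, with illustrative names, shows that behavior.

```python
from collections import deque

# Keep only the three most recent items
recent = deque(maxlen=3)
for item in ['first', 'second', 'third', 'fourth']:
    recent.append(item)      # appending to a full deque drops the leftmost item

oldest = recent.popleft()    # FIFO removal from the left

print(oldest, list(recent))
```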


## 3.3 Use the queue module

In [34]:
from queue import Queue

# Initializing a queue
q = Queue(maxsize = 3)

# qsize() returns the current number of items
# in the Queue, not its maxsize
print("q initial size:", q.qsize())

# Adding elements to the queue
q.put('first')
q.put('second')
q.put('third')

# Return Boolean for Full
# Queue
print("q size:", q.qsize())
print("\nFull: ", q.full())

q initial size: 0
q size: 3

Full:  True


In [35]:
# Removing element from queue
print("\nElements dequeued from the queue")
print(q.get())
print(q.get())
print(q.get())

# Return Boolean for Empty
# Queue
print("\nEmpty: ", q.empty())

q.put(1)
print("\nEmpty: ", q.empty())
print("Full: ", q.full())

# q.get() blocks indefinitely when the Queue is empty.
# Here it would return 1; a second call would block.
# print(q.get())


Elements dequeued from the queue
first
second
third

Empty:  True

Empty:  False
Full:  False
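
Because `get()` waits when the queue is empty, it can help to pass a `timeout` or use `get_nowait()`; both raise `queue.Empty` rather than waiting forever. A minimal sketch with illustrative variable names:

```python
from queue import Queue, Empty

q = Queue(maxsize=3)
q.put('only item')
first = q.get()

# Wait at most 0.1 seconds for an item, then raise queue.Empty
try:
    q.get(timeout=0.1)
    got_item = True
except Empty:
    got_item = False

# get_nowait() raises queue.Empty immediately
try:
    q.get_nowait()
    got_item_nowait = True
except Empty:
    got_item_nowait = False

print(first, got_item, got_item_nowait)
```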


<a id = "dict"></a>
# 4. Dictionaries 
[Go back to Table of Contents](#toc)

Dictionaries are mappings of key-value pairs.

### Create an empty dict using constructor

In [None]:
dictionary = dict()
dictionary

{}

### Create a dictionary based on key/value pairs

In [None]:
key_values = [['key-1','value-1'], ['key-2', 'value-2']]
dictionary = dict(key_values)
dictionary

{'key-1': 'value-1', 'key-2': 'value-2'}

### Create an empty dict using curly braces

In [None]:
dictionary = {}
dictionary

{}

### Use curly braces to create a dictionary with initial key/values

In [None]:
dictionary = {'key-1': 'value-1',
              'key-2': 'value-2'}

dictionary

{'key-1': 'value-1', 'key-2': 'value-2'}

### Access value using key

In [None]:
dictionary['key-1']

'value-1'

### Add a key/value pair to an existing dictionary

In [None]:
dictionary['key-3'] = 'value-3'

dictionary

{'key-1': 'value-1', 'key-2': 'value-2', 'key-3': 'value-3'}

### Update value for existing key

In [None]:
dictionary['key-2'] = 'new-value-2'
dictionary['key-2']

'new-value-2'

### Get keys

In [None]:
list(dictionary.keys())

['key-1', 'key-2', 'key-3']

### Get values

In [None]:
dictionary.values()

dict_values(['value-1', 'new-value-2', 'value-3'])

### Get iterable of key/value pairs

In [None]:
dictionary.items()

dict_items([('key-1', 'value-1'), ('key-2', 'new-value-2'), ('key-3', 'value-3')])

### Use items in for loop

In [None]:
for key, value in dictionary.items():
    print(f"{key}: {value}")

key-1: value-1
key-2: new-value-2
key-3: value-3


### Check if dictionary has key
The `in` syntax we used with sequences checks the dict's keys for membership.

In [None]:
'key-5' in dictionary

False

### Get method

In [None]:
dictionary.get("bad key", "default value")

'default value'
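
A typical use of `get()` with a default is counting occurrences without checking first whether the key exists. The sketch below uses a made-up sentence and variable names.

```python
text = 'the quick brown fox jumps over the lazy dog the end'

word_counts = {}
for word in text.split():
    # get() returns 0 for a word seen for the first time
    word_counts[word] = word_counts.get(word, 0) + 1

# Without a default, get() returns None for a missing key
missing = word_counts.get('cat')

print(word_counts['the'], word_counts['fox'], missing)
```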

### Remove item

In [None]:
del dictionary['key-1']
dictionary

{'key-2': 'new-value-2', 'key-3': 'value-3'}

### Keys must be immutable

#### List as key
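Lists are mutable and therefore not hashable, so they cannot be used as keys.

In [None]:
items = ['item-1', 'item-2', 'item-3']

mapping = {}  # named mapping rather than map to avoid shadowing the built-in map()

mapping[items] = "some-value"

TypeError: unhashable type: 'list'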

#### Tuple as a key
Tuples are immutable and, as long as all of their items are hashable, can be used as keys.

In [None]:
items = 'item-1', 'item-2', 'item-3'
mapping = {}
mapping[items] = "some-value"

mapping

{('item-1', 'item-2', 'item-3'): 'some-value'}
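
Tuple keys are handy for data indexed by more than one value, such as grid coordinates. The sketch below uses a hypothetical `board` dictionary; it also shows that a tuple containing a list is still unhashable.

```python
# Use (row, column) tuples as keys
board = {}
board[(0, 0)] = 'X'
board[(1, 1)] = 'O'
board[1, 2] = 'X'            # the parentheses are optional inside the brackets

center = board[(1, 1)]

# A tuple is hashable only if every item in it is hashable
try:
    board[(0, [1])] = '?'
    accepted = True
except TypeError:
    accepted = False

print(center, len(board), accepted)
```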

<a id = "set"></a>
# 5. Sets
[Go back to Table of Contents](#toc)

### Create set from tuple or list

In [None]:
letters = 'a', 'a', 'a', 'b', 'c'
unique_letters = set(letters)
unique_letters

{'a', 'b', 'c'}

### Create set from a string

In [None]:
unique_chars = set('mississippi')
unique_chars

{'i', 'm', 'p', 's'}

### Create set using curly braces

In [None]:
unique_num = {1, 1, 2, 3, 4, 5, 5}
unique_num

{1, 2, 3, 4, 5}

### Adding to a set

In [None]:
unique_num.add(6)
unique_num

{1, 2, 3, 4, 5, 6}

### Popping from a set
The pop method removes and returns an arbitrary element of the set, so the value shown below may differ on your machine.

In [None]:
unique_num.pop()

2

### Indexing
Sets are unordered and therefore do not support indexing.

In [None]:
unique_num[4]

TypeError: 'set' object is not subscriptable

### Checking membership

In [None]:
3 in unique_num

True

### Set operations

In [None]:
s1 = {1, 2, 3, 4, 5, 6, 7}
s2 = {0, 2, 4, 6, 8}

#### Items in first set, but not in the second

In [None]:
s1 - s2

{1, 3, 5, 7}

#### Items in either or both sets

In [None]:
s1 | s2

{0, 1, 2, 3, 4, 5, 6, 7, 8}

#### Items in both sets

In [None]:
s1 & s2

{2, 4, 6}

#### Items in either set, but not both

In [None]:
s1 ^ s2

{0, 1, 3, 5, 7, 8}
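
Each operator above has an equivalent method. One practical difference is that the methods accept any iterable, while the operators require both operands to be sets. A short sketch with illustrative variable names:

```python
s1 = {1, 2, 3, 4, 5, 6, 7}
s2 = {0, 2, 4, 6, 8}

# Method equivalents of -, |, & and ^
same_difference = s1.difference(s2) == s1 - s2
same_union = s1.union(s2) == s1 | s2
same_intersection = s1.intersection(s2) == s1 & s2
same_symmetric = s1.symmetric_difference(s2) == s1 ^ s2

# Methods accept any iterable; s1 | [8, 9] would raise TypeError
union_with_list = s1.union([8, 9])

# Subset and disjointness checks
is_subset = {2, 4} <= s1
is_disjoint = s1.isdisjoint({10, 11})

print(union_with_list, is_subset, is_disjoint)
```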

<a id = "pandas"></a>
# 6. Pandas DataFrame
[Go back to Table of Contents](#toc)

*   One of the most highly leveraged data structures for data science
*   A table-like, two-dimensional data structure


### Create a DataFrame

In [None]:
import pandas as pd
first_names = ['henry', 'rolly', 'molly', 'frank', 'david', 'steven', 'gwen', 'arthur']
last_names = ['smith', 'brocker', 'stein', 'bach', 'spencer', 'de wilde', 'mason', 'davis']
ages = [43, 23, 78, 56, 26, 14, 46, 92]

df = pd.DataFrame({ 'first': first_names, 'last': last_names, 'age': ages})
df

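|   | age | first | last |
|---|-----|-------|------|
| 0 | 43 | henry | smith |
| 1 | 23 | rolly | brocker |
| 2 | 78 | molly | stein |
| 3 | 56 | frank | bach |
| 4 | 26 | david | spencer |
| 5 | 14 | steven | de wilde |
| 6 | 46 | gwen | mason |
| 7 | 92 | arthur | davis |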


### Head - looking at the top

In [None]:
df.head(10)

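|   | age | first | last |
|---|-----|-------|------|
| 0 | 43 | henry | smith |
| 1 | 23 | rolly | brocker |
| 2 | 78 | molly | stein |
| 3 | 56 | frank | bach |
| 4 | 26 | david | spencer |
| 5 | 14 | steven | de wilde |
| 6 | 46 | gwen | mason |
| 7 | 92 | arthur | davis |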


### Setting number of rows returned with head

In [None]:
df.head(3)

|   | age | first | last |
|---|-----|-------|------|
| 0 | 43 | henry | smith |
| 1 | 23 | rolly | brocker |
| 2 | 78 | molly | stein |

### Tail - looking at the bottom

In [None]:
df.tail(2)

|   | age | first | last |
|---|-----|-------|------|
| 6 | 46 | gwen | mason |
| 7 | 92 | arthur | davis |


### Describe - descriptive statistics

In [None]:
df.describe()

|       | age |
|-------|-----|
| count | 8.0 |
| mean  | 47.25 |
| std   | 27.227874 |
| min   | 14.0 |
| 25%   | 25.25 |
| 50%   | 44.5 |
| 75%   | 61.5 |
| max   | 92.0 |


### Access one column

In [None]:
df['first']

0     henry
1     rolly
2     molly
3     frank
4     david
5    steven
6      gwen
7    arthur
Name: first, dtype: object

### Slice a column

In [None]:
df['first'][4:]

4     david
5    steven
6      gwen
7    arthur
Name: first, dtype: object

### Use conditions to filter

In [None]:
df[df['age'] > 50]

|   | age | first | last |
|---|-----|-------|------|
| 2 | 78 | molly | stein |
| 3 | 56 | frank | bach |
| 7 | 92 | arthur | davis |
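
Conditions can be combined with `&` (and), `|` (or), and `~` (not), as long as each condition is wrapped in its own parentheses. The sketch below uses a small DataFrame named `people`, built only for this example, so it can run on its own.

```python
import pandas as pd

people = pd.DataFrame({
    'first': ['henry', 'rolly', 'molly', 'frank'],
    'last': ['smith', 'brocker', 'stein', 'bach'],
    'age': [43, 23, 78, 56],
})

# Each condition needs its own parentheses when combined with & or |
middle_aged = people[(people['age'] > 40) & (people['age'] < 60)]

# .loc filters rows by condition and picks a column by name in one step
older_first_names = people.loc[people['age'] > 50, 'first']

print(middle_aged)
print(older_first_names)
```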


<a id = "series"></a>
# 7. Pandas Series

[Go back to Table of Contents](#toc)

*   A one-dimensional labeled array
*   Contains data of only one type
*   Similar to a column in a spreadsheet




### Create a series

In [None]:
pd_series = pd.Series([1, 2, 3])
pd_series

0    1
1    2
2    3
dtype: int64
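
The labels of a Series default to 0, 1, 2, and so on, but they can be set explicitly with the `index` argument. A brief sketch, with names and values chosen only for illustration:

```python
import pandas as pd

ages = pd.Series([43, 23, 78], index=['henry', 'rolly', 'molly'])

rolly_age = ages['rolly']      # look up by label
first_age = ages.iloc[0]       # look up by position
over_40 = ages[ages > 40]      # boolean filtering keeps the labels

print(rolly_age, first_age)
print(over_40)
```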

### Series introspection methods

In [None]:
f"This series is made up of {pd_series.size} items whose data type is {pd_series.dtype}"

'This series is made up of 3 items whose data type is int64'

### A Pandas DataFrame is composed of Pandas Series. 

In [None]:
age = df.age
type( age )

pandas.core.series.Series

### Some useful helper methods of a Series

#### Mean

In [None]:
pd_series = pd.Series([ 1, 2, 3, 5, 6, 6, 6, 7, 8])
pd_series.mean()

4.888888888888889

#### Unique

In [None]:
pd_series.unique()

array([1, 2, 3, 5, 6, 7, 8])

#### Max

In [None]:
pd_series.max()

8

# References:
[Lists](https://docs.python.org/3/tutorial/datastructures.html)

[Tuples and sequences](https://docs.python.org/3/tutorial/datastructures.html#tuples-and-sequences)

[Dictionaries](https://docs.python.org/3/tutorial/datastructures.html#dictionaries)


