## ***Data Structures***

Data structures group and order data in different ways to help you solve problems.

Specifically, you'll learn about:

Types of Data Structures: Lists, Tuples, Sets, Dictionaries, Compound Data Structures
Operators: Membership, Identity
Built-In Functions and Methods

Data structures are containers or collections of data that organize and group data types together in different ways. You can think of data structures as file folders that have organized files of data inside them.

Practice:

https://www.hackerrank.com/domains/python

https://www.codewars.com/dashboard



<img src="data_structures.png" width="800">

* You can use curly braces to define a set like this: {1, 2, 3}. However, if you leave the curly braces empty like this: {} Python will instead create an empty dictionary. So to create an empty set, use set().
** A dictionary itself is mutable, but each of its individual keys must be immutable.
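
The first footnote can be checked with a minimal sketch. The variable names here are hypothetical and chosen only for illustration:

```python
empty_braces = {}        # empty curly braces create a dictionary, not a set
empty_set = set()        # set() is the way to create an empty set
number_set = {1, 2, 3}   # non-empty braces without colons create a set

print(type(empty_braces))  # <class 'dict'>
print(type(empty_set))     # <class 'set'>
print(type(number_set))    # <class 'set'>
```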

## **Lists**

Slice and Dice with Lists

You saw that we can pull more than one value from a list at a time by using slicing. When using slicing, it is important to remember that the lower index is inclusive and the upper index is exclusive.

Example

In [1]:
list_of_random_things = [1, 3.4, 'a string', True]
list_of_random_things[1:2]


[3.4]

If you know that you want to start at the beginning of the list, you can also leave out the lower index.

In [2]:
list_of_random_things[:2]

[1, 3.4]

Or, to return all of the elements through the end of the list, we can leave off the upper index.

In [3]:
list_of_random_things[1:]


[3.4, 'a string', True]

Are You in or not in?

We can also use in and not in to return a bool of whether an element exists within our list, or if one string is a substring of another.

<img src="membership_operators.png" width="800">

Example

In [8]:
print('this' in 'this is a string')

print('in' in 'this is a string')

print('isa' in 'this is a string')

print(5 not in [1, 2, 3, 4, 6])

print(5 in [1, 2, 3, 4, 6])

True
True
False
True
False


## **Mutability and Order**

Mutability refers to whether or not we can change an object once it has been created. If an object can be changed, it is called mutable. However, if an object cannot be changed after it has been created, then the object is considered immutable.

Lists are Mutable

In [9]:
my_lst = [1, 2, 3, 4, 5]
my_lst[0] = 'one'
print(my_lst)

['one', 2, 3, 4, 5]


Strings are Immutable

In [None]:
greeting = "Hello there"
greeting[0] = 'M'  # raises TypeError: 'str' object does not support item assignment

This means you can't change a string once it's been created - you will need to instead create a completely new string.

In other words, using the same example as above, it is perfectly fine to do the following to replace the value of the entire string greeting:

In [None]:
greeting = "Hello there"
greeting = "Mello there"

That second line does not modify the original string. Python creates a new string object in memory and makes the name greeting refer to it, so it is a new object even though it has the same name.
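
The difference between changing a string in place and creating a new one can be sketched as follows. The second name, original, is a hypothetical addition used only to show that the first string is left untouched:

```python
greeting = "Hello there"
original = greeting  # a second name for the same string object

try:
    greeting[0] = "M"  # strings do not support item assignment
except TypeError as err:
    print("TypeError:", err)

# Build a new string from a new first letter plus the rest of the old one
greeting = "M" + greeting[1:]

print(greeting)              # Mello there
print(original)              # Hello there
print(greeting is original)  # False
```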

There are two things to keep in mind for each of the data types you are using:

Are they mutable?
Are they ordered?


Order is about whether the position of an element in the object can be used to access the element. Both strings and lists are ordered, so we can use an index or a slice to access parts of a list or string.
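
As a short illustration of order, the same indexing and slicing syntax works on both types. This sketch reuses my_lst and greeting with their original values:

```python
my_lst = [1, 2, 3, 4, 5]
greeting = "Hello there"

# Access a single element by position
print(my_lst[0])    # 1
print(greeting[0])  # H

# Access a range of positions with a slice
print(my_lst[1:3])    # [2, 3]
print(greeting[1:3])  # el
```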

## **Useful Functions for Lists**

len() returns how many elements are in a list.
max() returns the greatest element of the list. How the greatest element is determined depends on what type of objects are in the list. The maximum element in a list of numbers is the largest number. The maximum element in a list of strings is the element that would occur last if the list were sorted alphabetically. This works because the max() function is defined in terms of the greater than comparison operator. Calling max() on a list that contains elements of different, incomparable types (for example, numbers and strings) raises a TypeError.
min() returns the smallest element in a list. min is the opposite of max, which returns the largest element in a list.
sorted() returns a copy of a list in order from smallest to largest, leaving the list unchanged. Note again that for string objects, sorted smallest to largest means sorting in alphabetical order.
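
The four functions above can be tried on two small lists. The lists numbers and names are made-up sample data, not part of the original lesson:

```python
numbers = [3, 1, 4, 1, 5]
names = ["Carol", "Albus", "Bob"]

print(len(numbers))     # 5
print(max(numbers))     # 5
print(min(numbers))     # 1
print(max(names))       # Carol
print(sorted(names))    # ['Albus', 'Bob', 'Carol']
print(sorted(numbers))  # [1, 1, 3, 4, 5]
print(numbers)          # [3, 1, 4, 1, 5], the original list is unchanged

# Mixing incomparable types fails in Python 3
try:
    max([1, "two"])
except TypeError as err:
    print("TypeError:", err)
```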

## **Useful Methods for Lists**



join method

Join is a string method that takes a list of strings as an argument, and returns a string consisting of the list elements joined by a separator string.

It is important to remember to separate each of the items in the list you are joining with a comma (,). Forgetting to do so will not trigger an error, but it will give you unexpected results, because Python silently concatenates adjacent string literals into a single item.

Example

In this example we use the string "\n" as the separator so that there is a newline between each element. We can also use other strings as separators with .join, such as a hyphen.

In [13]:
new_str = "\n".join(["fore", "aft", "starboard", "port"])
print(new_str)

fore
aft
starboard
port
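
The two points made above, using a different separator and the missing-comma pitfall, can be sketched with the same list of words. The variable names are hypothetical:

```python
directions = ["fore", "aft", "starboard", "port"]

# Any string can act as the separator; here it is a hyphen
hyphenated = "-".join(directions)
print(hyphenated)  # fore-aft-starboard-port

# Missing comma after "fore": the adjacent literals merge into "foreaft"
missing_comma = "-".join(["fore" "aft", "starboard", "port"])
print(missing_comma)  # foreaft-starboard-port
```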


append method

A helpful method called append adds an element to the end of a list.

In [14]:
letters = ['a', 'b', 'c', 'd']
letters.append('z')
print(letters)

['a', 'b', 'c', 'd', 'z']


## **Tuples**

A tuple is a data type for immutable, ordered sequences of elements. Tuples are often used to store related pieces of information.

Example 

Involving latitude and longitude:

In [15]:
location = (13.4125, 103.866667)
print("Latitude:", location[0])
print("Longitude:", location[1])

Latitude: 13.4125
Longitude: 103.866667


Tuples are similar to lists in that they store an ordered collection of objects which can be accessed by their indices. Unlike lists, however, tuples are immutable: you can't add or remove items, or sort a tuple in place.
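
A minimal sketch of what immutability means in practice, reusing the location tuple from above. The name ordered is a hypothetical addition:

```python
location = (13.4125, 103.866667)

try:
    location[0] = 0.0  # tuples do not support item assignment
except TypeError as err:
    print("TypeError:", err)

# Tuples have no methods for adding items or sorting in place
print(hasattr(location, "append"))  # False
print(hasattr(location, "sort"))    # False

# sorted() still works because it builds a new list
ordered = sorted(location, reverse=True)
print(ordered)   # [103.866667, 13.4125]
print(location)  # (13.4125, 103.866667)
```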

Tuples can also be used to assign multiple variables in a compact way.

In [16]:
dimensions = 52, 40, 100
length, width, height = dimensions
print("The dimensions are {} x {} x {}".format(length, width, height))

The dimensions are 52 x 40 x 100


The parentheses are optional when defining tuples, and programmers frequently omit them if parentheses don't clarify the code.

In the second line, three variables are assigned from the content of the tuple dimensions. This is called tuple unpacking. You can use tuple unpacking to assign the information from a tuple into multiple variables without having to access them one by one and make multiple assignment statements.

If we won't need to use dimensions directly, we could shorten those two lines of code into a single line that assigns three variables in one go!

In [17]:
length, width, height = 52, 40, 100
print("The dimensions are {} x {} x {}".format(length, width, height))

The dimensions are 52 x 40 x 100


## **Sets**

A set is a data type for mutable unordered collections of unique elements. One application of a set is to quickly remove duplicates from a list.

In [None]:
numbers = [99, 100, 1, 3, 4, 99, 100]
unique_nums = set(numbers)
print(unique_nums)

Sets support the in operator the same way lists do. You can add elements to a set with the add method and remove elements with the pop method, similar to lists. However, when you pop an element from a set, an arbitrary element is removed. Remember that sets, unlike lists, are unordered, so there is no "last element".

In [24]:
fruit = {"apple", "banana", "orange", "grapefruit"}  # define a set

print("watermelon" in fruit)  # check for element

fruit.add("watermelon")  # add an element
print(fruit)

print(fruit.pop())  # remove an arbitrary element
print(fruit)

False
{'grapefruit', 'apple', 'watermelon', 'orange', 'banana'}
grapefruit
{'apple', 'watermelon', 'orange', 'banana'}


Other operations you can perform with sets include those of mathematical sets. Methods like union, intersection, and difference are easy to perform with sets, and are typically faster than the equivalent operations on other containers.
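
Before looking at timing, here is a small sketch of what these three methods return. The sets are made-up sample data, and sorted() is used only to print the results in a predictable order:

```python
set1 = {1, 2, 3, 4}
set2 = {3, 4, 5, 6}

print(sorted(set1.union(set2)))         # [1, 2, 3, 4, 5, 6]
print(sorted(set1.intersection(set2)))  # [3, 4]
print(sorted(set1.difference(set2)))    # [1, 2]

# The operators |, & and - are equivalent to the three methods
print((set1 | set2) == set1.union(set2))         # True
print((set1 & set2) == set1.intersection(set2))  # True
print((set1 - set2) == set1.difference(set2))    # True
```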

The following code compares the time it takes to perform union, intersection, and difference operations with both set and list objects.

In [26]:
import time

# Sample data
set1 = set(range(1000))
set2 = set(range(500, 1500))
list1 = list(set1)
list2 = list(set2)

# Union
start_time = time.time()
union_set = set1.union(set2)
print("Set Union Time:", time.time() - start_time)

start_time = time.time()
union_list = list(set(list1 + list2))
print("List Union Time:", time.time() - start_time)

# Intersection
start_time = time.time()
intersection_set = set1.intersection(set2)
print("Set Intersection Time:", time.time() - start_time)

start_time = time.time()
intersection_list = [x for x in list1 if x in set2]
print("List Intersection Time:", time.time() - start_time)

# Difference
start_time = time.time()
difference_set = set1.difference(set2)
print("Set Difference Time:", time.time() - start_time)

start_time = time.time()
difference_list = [x for x in list1 if x not in set2]
print("List Difference Time:", time.time() - start_time)

Set Union Time: 4.553794860839844e-05
List Union Time: 6.031990051269531e-05
Set Intersection Time: 5.221366882324219e-05
List Intersection Time: 9.465217590332031e-05
Set Difference Time: 3.695487976074219e-05
List Difference Time: 6.890296936035156e-05


Note: the time outputs use scientific notation; 9.465...e-05, for example, is equal to 0.00009465...

This output shows the set operations running faster than their list counterparts. In this run, the set operations are roughly 1.3 to 1.9 times faster. Note that the list versions of intersection and difference above still test membership against set2, so they benefit from fast set lookups; a version that tested membership against list2 would likely be slower. The exact speed difference varies with the size of the data, the specific operations, and the machine running the code.
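
As an alternative sketch, the standard library's timeit module can repeat each operation several times, and the list version below tests membership against list2 rather than a set. The function names and the repeat count of 10 are assumptions made for illustration, and the measured times will differ from machine to machine:

```python
import timeit

set1 = set(range(1000))
set2 = set(range(500, 1500))
list1 = list(range(1000))
list2 = list(range(500, 1500))

def set_intersection():
    return set1.intersection(set2)

def list_intersection():
    # Each "x in list2" check scans the list, unlike a set lookup
    return [x for x in list1 if x in list2]

# timeit.timeit returns the total time in seconds for all repetitions
set_time = timeit.timeit(set_intersection, number=10)
list_time = timeit.timeit(list_intersection, number=10)

print("Set intersection time: ", set_time)
print("List intersection time:", list_time)
```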

## **Dictionaries and Identity Operators**

A dictionary is a mutable data type that stores mappings of unique keys to values. Here's a dictionary that stores elements and their atomic numbers.

In [None]:
elements = {"hydrogen": 1, "helium": 2, "carbon": 6}

In general, dictionaries look like key-value pairs, separated by commas:

{key1:value1, key2:value2, key3:value3, key4:value4, ...}


Dictionaries are mutable, but their keys must be of an immutable type, such as strings, integers, or tuples. It's not even necessary for every key in a dictionary to have the same type! For example, the following dictionary is perfectly valid:

In [None]:
random_dict = {"abc": 1, 5: "hello"}
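
The rule about keys can be illustrated with a tuple key, which works, and a list key, which does not. The dictionary points is a hypothetical example:

```python
# Tuples are immutable, so they can be used as keys
points = {(0, 0): "origin", (1, 2): "a point"}
print(points[(1, 2)])  # a point

# Lists are mutable, so using one as a key fails
try:
    bad = {[0, 0]: "origin"}
except TypeError as err:
    print("TypeError:", err)  # TypeError: unhashable type: 'list'
```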

We can look up values in a dictionary by putting the key in square brackets "[]" after the dictionary name, like dict_name[key].

Example

In [30]:
elements = {"hydrogen": 1, "helium": 2, "carbon": 6}
elements["hydrogen"]

1

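We can also insert a new key-value pair using the same square brackets, for example elements["lithium"] = 3. If we then executed print(elements), the output would be:

{'hydrogen': 1, 'helium': 2, 'carbon': 6, 'lithium': 3}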

This illustrates how dictionaries are mutable.

What if we try to look up a key that is not in our dictionary, using the square brackets, like elements['dilithium']? This will give you a "KeyError".

We can check whether a key is in a dictionary the same way we check whether an element is in a list or set, using the in keyword. Dictionaries have a related method that's also useful, get. get looks up values in a dictionary, but unlike square brackets, get returns None (or a default value of your choice) if the key isn't found.

If you expect lookups to sometimes fail, get might be a better tool than normal square bracket lookups, because errors can crash your program.

In [31]:
print("carbon" in elements)
print(elements.get("dilithium"))

True
None
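
The difference between the two lookup styles, including the optional default value for get, can be sketched like this. The fallback string is an arbitrary choice:

```python
elements = {"hydrogen": 1, "helium": 2, "carbon": 6}

# Square brackets raise an error for a missing key
try:
    elements["dilithium"]
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'dilithium'

# get returns the supplied default instead of raising
fallback = elements.get("dilithium", "There's no such element!")
print(fallback)  # There's no such element!

# For a key that exists, get behaves like a normal lookup
print(elements.get("carbon", 0))  # 6
```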


## **Identity Operators**

<img src="Identity_Operators.png" width="800">

You can check whether a lookup returned None with the is operator. You can check for the opposite using is not.

In [32]:
n = elements.get("dilithium")
print(n is None)
print(n is not None)

True
False
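
Identity is not the same as equality. As a brief sketch with hypothetical names, two lists can hold equal contents and still be different objects:

```python
a = [1, 2, 3]
b = a          # b is another name for the same list object
c = [1, 2, 3]  # c is a separate list with equal contents

print(a == c)      # True, the contents are equal
print(a is c)      # False, they are different objects
print(a is b)      # True, both names refer to one object
print(a is not c)  # True
```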


## **Compound Data Structures**

We can include containers in other containers to create compound data structures. For example, this dictionary maps keys to values that are also dictionaries!

In [34]:
elements = {"hydrogen": {"number": 1,
                         "weight": 1.00794,
                         "symbol": "H"},
              "helium": {"number": 2,
                         "weight": 4.002602,
                         "symbol": "He"}}

We can access elements in this nested dictionary by chaining square bracket lookups:

In [35]:
helium = elements["helium"]  # get the helium dictionary
hydrogen_weight = elements["hydrogen"]["weight"]  # get hydrogen's weight
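
Chained lookups raise a KeyError if any key along the way is missing. One possible way to avoid that, sketched here as an assumption rather than as the lesson's own method, is to chain get calls with an empty dictionary as the default:

```python
elements = {"hydrogen": {"number": 1, "weight": 1.00794, "symbol": "H"},
            "helium": {"number": 2, "weight": 4.002602, "symbol": "He"}}

# Both keys exist, so plain square brackets work
helium_symbol = elements["helium"]["symbol"]
print(helium_symbol)  # He

# "oxygen" is not a key yet; the {} default lets the second get run safely
oxygen_weight = elements.get("oxygen", {}).get("weight")
print(oxygen_weight)  # None
```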

You can also add a new key to the elements dictionary.

In [36]:
oxygen = {"number": 8, "weight": 15.999, "symbol": "O"}  # create a new oxygen dictionary
elements["oxygen"] = oxygen  # assign 'oxygen' as a key to the elements dictionary
print('elements = ', elements)

elements =  {'hydrogen': {'number': 1, 'weight': 1.00794, 'symbol': 'H'}, 'helium': {'number': 2, 'weight': 4.002602, 'symbol': 'He'}, 'oxygen': {'number': 8, 'weight': 15.999, 'symbol': 'O'}}


Additional example

For a better understanding of how to add data to a compound data structure, here is an additional example:

In [37]:
student_records = {
    'John': {
        'age': 20,
        'major': 'Computer Science',
        'grades': [85, 90, 92]
    },
    'Emma': {
        'age': 19,
        'major': 'Mathematics',
        'grades': [95, 88, 91]
    }
}

Adding a new student:

In [38]:
student_records['Alex'] = {
    'age': 21,
    'major': 'Physics',
    'grades': [80, 85, 88]
}

Adding a grade for an existing student:

In [39]:
student_records['John']['grades'].append(88)

Printing the updated dictionary:

In [40]:
print(student_records)

{'John': {'age': 20, 'major': 'Computer Science', 'grades': [85, 90, 92, 88]}, 'Emma': {'age': 19, 'major': 'Mathematics', 'grades': [95, 88, 91]}, 'Alex': {'age': 21, 'major': 'Physics', 'grades': [80, 85, 88]}}
