<a href="https://colab.research.google.com/github/stevenkhwun/P4DS/blob/main/Chp02.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Built-In Data Structures, Functions, and Files

This notebook is based on [Chapter 3](https://wesmckinney.com/book/python-builtin) of *Python for Data Analysis (3rd ed.)* by *Wes McKinney*.

## Data Structures and Sequences

### Dictionary

The dictionary or __`dict`__ may be the most important built-in Python data structure. A dictionary stores a collection of __*key-value*__ pairs, where __*key*__ and __*value*__ are Python objects. Each key is associated with a value so that a value can be conveniently retrieved, inserted, modified, or deleted given a particular key. One way to create a dictionary is to enclose the pairs in curly braces __`{}`__, using a colon to separate each key from its value.

**Creating a dictionary**

In [1]:
# Create an empty dictionary
empty_dict = {}

In [2]:
empty_dict

{}

In [3]:
# Create a dictionary
d1 = {"a": "some value", "b": [1, 2, 3, 4]}

In [4]:
d1

{'a': 'some value', 'b': [1, 2, 3, 4]}

*You can __access__, __insert__, or __set__ elements using the same syntax as for accessing elements of a list or tuple.*

In [5]:
# Set elements
d1[7] = "an integer"

In [6]:
d1

{'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer'}

In [7]:
# Access an element
d1["b"]

[1, 2, 3, 4]

*You can __check__ if a dictionary contains a key using the same syntax used for checking whether a list or tuple contains a value.*

In [8]:
# Check if a dictionary contains a key
"b" in d1

True
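
*Note that __`in`__ tests the keys only, not the values. The sketch below uses a hypothetical dictionary named `sample` (not part of the original notebook) to show the difference, and searches the values explicitly through __`values()`__.*

```python
# Hypothetical dictionary for illustration, shaped like d1 above
sample = {"a": "some value", "b": [1, 2, 3, 4]}

key_present = "b" in sample                      # membership looks at keys
value_as_key = "some value" in sample            # a value is not a key
value_present = "some value" in sample.values()  # search the values explicitly

print(key_present, value_as_key, value_present)
```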

*You can delete values using either the __`del`__ keyword or the __`pop`__ method (which simultaneously returns the value and deletes the key).*

In [9]:
# Set up
d1[5] = "some value"

In [10]:
d1

{'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer', 5: 'some value'}

In [11]:
# Set up
d1["dummy"] = "another value"

In [12]:
d1

{'a': 'some value',
 'b': [1, 2, 3, 4],
 7: 'an integer',
 5: 'some value',
 'dummy': 'another value'}

In [13]:
# Delete value using del keyword
del d1[5]

In [14]:
d1

{'a': 'some value',
 'b': [1, 2, 3, 4],
 7: 'an integer',
 'dummy': 'another value'}

*The __`pop()`__ method removes the item with the specified key from the dictionary and returns that item's value.*

In [15]:
# Delete value using pop method
ret = d1.pop("dummy")

In [16]:
ret

'another value'

In [17]:
d1

{'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer'}

*The syntax of the __`pop()`__ method is as follows:*

```Python
# Syntax of pop method
dictionary.pop(keyname, defaultvalue)
```

*Here __`keyname`__ is the key of the item you want to remove. __`defaultvalue`__ is optional: it is the value to return if the specified key does not exist. If the key is not present and __`defaultvalue`__ is not specified, __`pop()`__ raises a __`KeyError`__.*

In [18]:
# Keyname is not available
d1.pop("z","Key Not Available")

'Key Not Available'
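
*For contrast, here is a minimal sketch of what happens when no default is given. The exception raised is a __`KeyError`__. The dictionary `sample` and the variable `missing_key` are hypothetical names used only for this illustration.*

```python
# Hypothetical dictionary for illustration
sample = {"a": "some value", 7: "an integer"}

try:
    sample.pop("z")             # no default supplied and "z" is absent
except KeyError as err:
    missing_key = err.args[0]   # the key that could not be found
    print("Missing key:", missing_key)
```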

*The __`keys`__ and __`values`__ methods give you iterable views of the dictionary's keys and values, respectively. The order of the keys depends on the order of their insertion, and the two methods return the keys and values in the same respective order.*

In [19]:
# Convert the keys into a list
list(d1.keys())

['a', 'b', 7]

In [20]:
# Convert the values into a list
list(d1.values())

['some value', [1, 2, 3, 4], 'an integer']

*If you need to iterate over both keys and values, you can use the __`items`__ method to iterate over the keys and values as 2-tuples.*

In [21]:
# Convert the entries in a dictionary into a list
list(d1.items())

[('a', 'some value'), ('b', [1, 2, 3, 4]), (7, 'an integer')]
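
*In practice __`items`__ is most often used directly in a __`for`__ loop, unpacking each 2-tuple into a key and a value. A small sketch, using a hypothetical dictionary `sample` with the same contents as `d1` at this point:*

```python
# Hypothetical dictionary for illustration
sample = {"a": "some value", "b": [1, 2, 3, 4], 7: "an integer"}

lines = []
for key, value in sample.items():   # each item unpacks into (key, value)
    lines.append(f"{key!r} -> {value!r}")

print("\n".join(lines))
```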

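*You can merge one dictionary into another using the __`update`__ method. The method changes the dictionary in place, so any key that already exists has its old value replaced.*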

In [22]:
# Update a dictionary
d1.update({"b": "foo", "c": 12})

In [23]:
d1

{'a': 'some value', 'b': 'foo', 7: 'an integer', 'c': 12}
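
*If you would rather keep the original dictionary untouched, one option (not covered in the cells above) is to build a new dictionary by unpacking both with __`**`__. The names `base`, `extra`, and `merged` are hypothetical; later entries win when keys collide.*

```python
# Hypothetical dictionaries for illustration
base = {"a": "some value", "b": [1, 2, 3, 4]}
extra = {"b": "foo", "c": 12}

# Build a new dictionary; keys from `extra` override those in `base`
merged = {**base, **extra}

print(merged)
print(base)   # `base` itself is not modified
```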

__The Python `zip()` function__

*The __`zip()`__ function takes in __iterables__ as arguments and returns an __iterator__. This iterator generates a series of tuples containing elements from each iterable. __`zip()`__ can accept any type of iterable, such as __files__, __lists__, __tuples__, __dictionaries__, __sets__, and so on. You can refer to this [online article](https://realpython.com/python-zip-function/) for more information.*

In [24]:
# Create a zip object
letters = ["a", "b", "c"]
numbers = [1, 2, 3]
zipped = zip(letters, numbers)

In [25]:
zipped       # Holds an iterator object

<zip at 0x2070a3ead40>

In [26]:
type(zipped)

zip

In [27]:
list(zipped)

[('a', 1), ('b', 2), ('c', 3)]
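
*Two behaviours of __`zip()`__ are worth keeping in mind: the iterator can be consumed only once, and it stops at the shortest input. The sketch below repeats the lists from above; `first_pass`, `second_pass`, and `shortest` are hypothetical names.*

```python
letters = ["a", "b", "c"]
numbers = [1, 2, 3]

zipped = zip(letters, numbers)
first_pass = list(zipped)    # consumes the iterator
second_pass = list(zipped)   # nothing left to yield

# With inputs of unequal length, zip stops at the shortest one
shortest = list(zip(letters, [1, 2]))

print(first_pass, second_pass, shortest)
```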

__Creating dictionaries from sequences__

*You will often end up with two sequences that you want to pair up element-wise in a dictionary. As a first cut, you might write a loop like this.*

In [28]:
# Creating dictionaries using for loop
mapping = {}
for key, value in zip(letters, numbers):
    mapping[key] = value

In [29]:
mapping

{'a': 1, 'b': 2, 'c': 3}

*Since a dictionary is essentially a collection of 2-tuples, the __`dict`__ function accepts any iterable of 2-tuples, such as a list or a zip object.*

In [30]:
# Create a zip iterator that yields 2-tuples
tuples = zip(letters, numbers)

In [31]:
tuples

<zip at 0x2070a3d1e40>

In [32]:
# Create a dictionary using dict function
mapping = dict(tuples)

In [33]:
mapping

{'a': 1, 'b': 2, 'c': 3}

*Another example:*

In [34]:
# Create a dictionary in one step
dict(zip(range(5), reversed(range(5))))

{0: 4, 1: 3, 2: 2, 3: 1, 4: 0}

__Default values__

*It's common to have logic like:*

```Python
# Fall back to a default value using if-else
if key in some_dict:
    value = some_dict[key]
else:
    value = default_value
```


*The dictionary methods __`get`__ and __`pop`__ can take a default value to be returned, so that the above __`if-else`__ block can be written simply as:*

```Python
# Get default value using get method
value = some_dict.get(key, default_value)
```

__The `get()` method__

*The __`get()`__ method returns the value of the item with the specified key. The syntax is as follows:*

```Python
# Syntax of get method
dictionary.get(keyname, value)
```

*Here __`keyname`__ is the key whose value you want to return, and __`value`__, which is optional, is the value to return if the specified key does not exist. Its default is __`None`__.*

In [35]:
# Use get method
car = {
    "brand": "Ford",
    "model": "Mustang",
    "year": 1964
}
car.get("brand")

'Ford'

In [36]:
# Returns None (so no output is shown) if the key is not available
car.get("origin")

In [37]:
# Return specified value
car.get("origin", "Key Not Available")

'Key Not Available'
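
*By contrast, looking up a missing key with square brackets raises a __`KeyError`__ instead of returning __`None`__. A minimal sketch, repeating the `car` dictionary; `origin` and `lookup_failed` are hypothetical names.*

```python
car = {"brand": "Ford", "model": "Mustang", "year": 1964}

origin = car.get("origin")   # missing key: get() quietly returns None

try:
    car["origin"]            # missing key: bracket lookup raises KeyError
except KeyError:
    lookup_failed = True
else:
    lookup_failed = False

print(origin, lookup_failed)
```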

__`setdefault()` dictionary method__

*The __`setdefault()`__ method returns the value of the item with the specified key. If the key does not exist, it inserts the key with the specified value and returns that value. The syntax of the method is as follows:*

```Python
# Syntax of setdefault() method
dictionary.setdefault(keyname, value)
```

*Here `keyname` is the key whose value you want to return. `value` is optional. If the key exists, this parameter has no effect. If the key does not exist, `value` becomes the key's value. Its default is `None`.*

In [38]:
# Recall the car dictionary
car

{'brand': 'Ford', 'model': 'Mustang', 'year': 1964}

In [39]:
# Get the value of 'model'
car.setdefault("model", "Bronco")   # Since the key exists, the argument 'Bronco' has no effect

'Mustang'

In [40]:
# Recall the car dictionary
car

{'brand': 'Ford', 'model': 'Mustang', 'year': 1964}

In [41]:
# Set a value for 'colour', a key that does not exist yet
car.setdefault("colour", "yellow")

'yellow'

In [42]:
# Recall the car dictionary
car

{'brand': 'Ford', 'model': 'Mustang', 'year': 1964, 'colour': 'yellow'}
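
*The difference from __`get()`__ is that __`setdefault()`__ also stores the default in the dictionary. A short side-by-side sketch, using a hypothetical, smaller `car` dictionary and hypothetical snapshot variables:*

```python
# Hypothetical, shortened dictionary for illustration
car = {"brand": "Ford"}

from_get = car.get("colour", "yellow")   # returns the default only
after_get = dict(car)                    # snapshot: still no 'colour' key

from_setdefault = car.setdefault("colour", "yellow")   # returns and stores it
after_setdefault = dict(car)                           # snapshot: 'colour' added

print(after_get)
print(after_setdefault)
```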

*Imagine you want to categorize a list of words by their first letters as a dictionary of lists.*

In [43]:
# Without using .setdefault() method
words = ["apple", "bat", "bar", "atom", "book"]
by_letter = {}
for word in words:
    letter = word[0]
    if letter not in by_letter:
        by_letter[letter] = [word]
    else:
        by_letter[letter].append(word)

In [44]:
by_letter

{'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}

*The __`.setdefault()`__ dictionary method can be used to simplify this workflow.*

In [45]:
# Use the .setdefault() method
by_letter = {}
for word in words:
    letter = word[0]
    by_letter.setdefault(letter, []).append(word)

In [46]:
by_letter

{'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}

*The built-in __`collections`__ module has a useful class, __`defaultdict`__, which makes this even easier. To create one, you pass a type or function for generating the default value for each slot in the dictionary.*

In [47]:
# Create dictionary using module collections
from collections import defaultdict
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)

In [48]:
by_letter

defaultdict(list, {'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']})
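
*The same pattern works with other factories. As a sketch, passing __`int`__ gives each new key a starting value of `0`, which is handy for counting. Be aware that merely looking up a missing key inserts it. The names `counts`, `counts_before`, and `missing` are hypothetical.*

```python
from collections import defaultdict

words = ["apple", "bat", "bar", "atom", "book"]

# int() returns 0, so every new key starts counting from zero
counts = defaultdict(int)
for word in words:
    counts[word[0]] += 1

counts_before = dict(counts)          # plain-dict snapshot of the tallies
missing = counts["z"]                 # lookup of a missing key returns 0 ...
has_z_after_lookup = "z" in counts    # ... and also inserts the key

print(counts_before, missing, has_z_after_lookup)
```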

__Valid dictionary key types__

*While the values of a dictionary can be any Python object, the keys generally have to be immutable objects like scalar types (int, float, string) or tuples (all the objects in the tuple need to be immutable, too). The technical term here is __hashability__. You can check whether an object is hashable (can be used as a key in a dictionary) with the __`hash()`__ function.*

In [49]:
# Check hashability using the hash() function
hash("String")

3211407861388202172

In [50]:
hash((1, 2, (2, 3)))

-9209053662355515447

In [51]:
hash((1, 2, [2, 3]))     # fails because lists are mutable

TypeError: unhashable type: 'list'

*To use a list as a key, one option is to convert it to a tuple, which can be hashed as long as its elements also can be.*

In [52]:
# Convert a list to a tuple so that it can be used as a key
d = {}
d[tuple([1, 2, 3])] = 5

In [53]:
d

{(1, 2, 3): 5}
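
*The hashability check can be wrapped in a small helper so that it returns a boolean instead of raising. This is only a sketch: `is_hashable` is a hypothetical function written for this notebook, not something provided by Python.*

```python
def is_hashable(obj):
    """Return True if obj can be hashed (and so used as a dictionary key)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

# Hypothetical checks mirroring the cells above
checks = {
    "string": is_hashable("String"),
    "nested tuple": is_hashable((1, 2, (2, 3))),
    "tuple holding a list": is_hashable((1, 2, [2, 3])),
    "list converted to tuple": is_hashable(tuple([1, 2, 3])),
}
print(checks)
```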