# Data Structures

A **data structure** is a specialized format for organizing, processing, retrieving and storing data. There are several basic and advanced types of data structures, all designed to arrange data to suit a specific purpose. Data structures make it easy for users to access and work with the data they need in appropriate ways.

## Lists

**Lists** are used to store multiple items in a single variable and are created using square brackets, `[...]`.

In [1]:
fruit = ['apple', 'banana', 'cherry']
fruit

['apple', 'banana', 'cherry']

In [2]:
type(fruit)

list

A list can contain different data types.

In [3]:
mix = ['hello', 23, [2.3, True]]
mix

['hello', 23, [2.3, True]]

Like all the other built-in sequence types, such as strings, lists can be indexed and sliced.

In [4]:
fruit[0]

'apple'

In [5]:
type(fruit[0])

str

In [6]:
fruit[0].upper()

'APPLE'

In [10]:
mix[2]

[2.3, True]

In [11]:
type(mix[2])

list

In [12]:
mix[2][0]

2.3

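Unlike strings, which are *immutable*, lists are a *mutable* type, so their content can be changed in place. In the cells below, `str.replace()` returns a new string object with a different `id()`, whereas assigning to `fruit[0]` modifies the same list object, whose `id()` stays unchanged.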

In [20]:
first_str = 'hello'
second_str = first_str.replace('e', 'a')

In [21]:
id(first_str)

4357616624

In [22]:
id(second_str)

4418803888

In [25]:
id(fruit)

4361876672

In [27]:
fruit[0] = 'apricot'
fruit

['apricot', 'banana', 'cherry']

In [28]:
id(fruit)

4361876672

In [30]:
first_str[1] = 'a'

TypeError: 'str' object does not support item assignment

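It is also possible to use the `list()` constructor to make a list; examples appear at the end of the `list` Methods section. The cells below return to indexing and slicing: a slice returns a new list, negative indices count from the end, and an index outside the valid range raises an `IndexError`.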

In [31]:
fruit[1:]

['banana', 'cherry']

In [38]:
'hello'[10]

IndexError: string index out of range

In [37]:
fruit[3]

IndexError: list index out of range

In [36]:
fruit

['apricot', 'banana', 'cherry']

In [35]:
fruit[-1]

'cherry'

In [33]:
fruit[-2]

'banana'

In [34]:
fruit[:2]

['apricot', 'banana']

### `len(object)`

The `len()` function returns the length (the number of items) of an object. The argument may be a sequence (such as a string, bytes, tuple, list, or range) or a collection (such as a dictionary, set, or frozen set).

In [39]:
len(fruit)

3

In [40]:
len(first_str)

5
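As a minimal sketch, `len()` can also be applied to the other built-in types listed above. The variable names here are illustrative and do not come from the original notebook.

```python
# len() works on any sized container, not only lists and strings.
point = (3, 4)                    # tuple
countdown = range(10, 0, -2)      # range: 10, 8, 6, 4, 2
ages = {'Alice': 30, 'Bob': 25}   # dict: len() counts the keys
letters = set('banana')           # set: len() counts the unique items

print(len(point))      # 2
print(len(countdown))  # 5
print(len(ages))       # 2
print(len(letters))    # 3
```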

In [41]:
names = ['Alice', 'Bob', 'Charlie']
for name in names:
    print(f'Hello, {name}')

Hello, Alice
Hello, Bob
Hello, Charlie


### `list` Methods

Lists are ordered, meaning that the items have a defined order, and that order does not change unless the list is modified. New items added with `append()` are placed at the end of the list.

`list.append(x)` adds an item to the end of the list. It modifies the list in place and returns `None`.

In [42]:
fruit.append('apple')
fruit

['apricot', 'banana', 'cherry', 'apple']

In [43]:
len(fruit)

4

In [44]:
new_fruit = fruit.append('pineapple')
new_fruit

In [47]:
fruit

['apricot', 'banana', 'cherry', 'apple', 'pineapple']

In [46]:
new_fruit is None

True

`list.pop([i])` removes the item at the given position in the list and returns it. If no index is specified, it removes and returns the last item in the list.

In [48]:
fruit.pop()

'pineapple'

In [49]:
fruit

['apricot', 'banana', 'cherry', 'apple']

In [50]:
discarded_item = fruit.pop(0)
print(discarded_item)
print(fruit)

apricot
['banana', 'cherry', 'apple']


In [52]:
empty_list = []
empty_list

[]

In [53]:
empty_list[0]

IndexError: list index out of range

In [54]:
empty_list = list()
empty_list

[]

In [55]:
list(['apple', 'fruit'])

['apple', 'fruit']

In [56]:
list('apple')

['a', 'p', 'p', 'l', 'e']
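The `list()` constructor accepts any iterable and always builds a new list object. The following sketch, with illustrative variable names, shows it used with a `range` and as a way to make a shallow copy of an existing list.

```python
# list() accepts any iterable and always builds a new list.
numbers = list(range(5))
print(numbers)                  # [0, 1, 2, 3, 4]

original = ['apple', 'banana']
duplicate = list(original)      # shallow copy: a distinct list object
duplicate.append('cherry')      # does not affect the original

print(original)                 # ['apple', 'banana']
print(duplicate)                # ['apple', 'banana', 'cherry']
print(duplicate is original)    # False
```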

## Tuples

**Tuples** are used to store multiple items in a single variable. They are written with round brackets, `(...)`, and their items are ordered, unchangeable, and allow duplicate values.

In [51]:
brands = ('Apple', 'Google', 'Microsoft')
brands

('Apple', 'Google', 'Microsoft')

In [61]:
brands[0]

'Apple'

In [62]:
brands[0] = 'Dell'

TypeError: 'tuple' object does not support item assignment

It is also possible to use the `tuple()` constructor to make a tuple.

In [63]:
fruit = ['apple', 'banana', 'cherry']
tuple(fruit)

('apple', 'banana', 'cherry')

In [66]:
new_fruit = tuple(fruit)
new_fruit

('apple', 'banana', 'cherry')

In [64]:
type(tuple(fruit))

tuple

Though tuples may seem similar to lists, they are often used in different situations and for different purposes. Tuples are immutable, and usually contain a heterogeneous sequence of elements that are accessed via unpacking or indexing. Lists are mutable, and their elements are usually homogeneous and are accessed by iterating over the list.

In [67]:
brands = 'Apple', 'Microsoft', 'Google'
brands

('Apple', 'Microsoft', 'Google')

In [68]:
apple, microsoft, google = brands

In [69]:
apple

'Apple'

In [70]:
microsoft

'Microsoft'
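One way to picture the difference in usage described above is the sketch below: a tuple acts as a fixed record whose positions have different meanings, while a list holds many records of the same kind. The stock data are hypothetical values chosen for illustration.

```python
# A tuple as a fixed "record": each position has its own meaning and type.
stock = ('Apple', 'AAPL', 200)      # hypothetical (name, symbol, price)
name, symbol, price = stock         # access via unpacking
print(symbol)                       # AAPL

# A list as a homogeneous collection: every item is the same kind of record.
stocks = [
    ('Apple', 'AAPL', 200),
    ('Microsoft', 'MSFT', 125),
]
for name, symbol, price in stocks:  # iterate, unpacking each record
    print(f'{symbol}: {price}')
```

The loop prints `AAPL: 200` and then `MSFT: 125`.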

### `zip(*iterables)`

The built-in `zip()` function iterates over several iterables in parallel, producing tuples with an item from each one.

In [71]:
brands

('Apple', 'Microsoft', 'Google')

In [73]:
prices = (200, 125, 300)
prices

(200, 125, 300)

In [75]:
list(zip(brands, prices))

[('Apple', 200), ('Microsoft', 125), ('Google', 300)]
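Since `zip()` returns a lazy iterator, it is usually consumed by a loop or passed to a constructor. The sketch below reuses the values of `brands` and `prices`; the name `price_by_brand` is an assumption made for this example.

```python
brands = ('Apple', 'Microsoft', 'Google')
prices = (200, 125, 300)

# A for loop consumes the iterator one pair at a time.
for brand, price in zip(brands, prices):
    print(f'{brand}: {price}')

# Pairs of (key, value) can be passed straight to dict().
price_by_brand = dict(zip(brands, prices))
print(price_by_brand)             # {'Apple': 200, 'Microsoft': 125, 'Google': 300}

# By default, zip() stops at the shortest input.
print(list(zip('abc', [1, 2])))   # [('a', 1), ('b', 2)]
```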

## Sets

A **set** is an unordered, unindexed collection with no duplicate elements. The set itself is mutable (items can be added or removed), but each element must be hashable, which in practice means immutable values such as numbers, strings, or tuples. Basic uses include membership testing and eliminating duplicate entries. Set objects also support mathematical operations like union, intersection, difference, and symmetric difference. Sets are written with curly brackets, `{...}`.

In [1]:
fruit = ['apple', 'banana', 'cherry', 'apple']
fruit

['apple', 'banana', 'cherry', 'apple']

In [2]:
fruit_set = {'apple', 'banana', 'cherry', 'apple'}
fruit_set

{'apple', 'banana', 'cherry'}

In [3]:
fruit_set[0]

TypeError: 'set' object is not subscriptable

It is also possible to use the `set()` constructor to make a set.

In [4]:
set(fruit)

{'apple', 'banana', 'cherry'}
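A set can be modified after creation, while its elements have to be hashable. The sketch below uses a hypothetical `basket` set and the `add()` and `discard()` methods, which are not covered elsewhere in this section.

```python
basket = {'apple', 'banana', 'cherry'}

basket.add('mango')        # add a new element
basket.add('apple')        # already present: the set is unchanged
basket.discard('banana')   # remove if present; no error if absent

print(len(basket))         # 3
print(sorted(basket))      # ['apple', 'cherry', 'mango']

# Elements must be hashable, so a list cannot be stored in a set.
try:
    {['a', 'list']}
except TypeError as error:
    print('TypeError:', error)
```

`sorted()` is used for printing because the display order of a set is not guaranteed.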

In [5]:
primes = {1, 2, 3, 5, 7}  # note: strictly speaking, 1 is not a prime number
odds = {1, 3, 5, 7, 9}

### Operations on Sets

The *union* is performed using the `|` operator or the `set.union()` method.

In [6]:
# Union
primes.union(odds)

{1, 2, 3, 5, 7, 9}

In [7]:
primes | odds

{1, 2, 3, 5, 7, 9}

The *intersection* is performed using the `&` operator or the `set.intersection()` method.

In [8]:
# Intersection
primes.intersection(odds)

{1, 3, 5, 7}

In [9]:
primes & odds

{1, 3, 5, 7}

The *difference* is performed using the `-` operator or the `set.difference()` method.

In [10]:
# Difference
primes.difference(odds)

{2}

In [11]:
odds.difference(primes)

{9}

In [12]:
primes - odds

{2}

The *symmetric difference* is performed using the `^` operator or the `set.symmetric_difference()` method.

In [13]:
# Symmetric Difference
primes.symmetric_difference(odds)

{2, 9}

In [14]:
primes ^ odds

{2, 9}

### Membership Tests

A **membership test** checks whether a specific element is contained in a collection, such as a string, list, tuple, or set. One of the main advantages of using sets in Python is that they are highly optimized for membership tests: a set lookup is based on hashing and takes roughly constant time on average, whereas a list or tuple has to be scanned item by item.

The `in` operator performs the test. It works with any iterable or container type and returns `True` if the element is found and `False` otherwise.

In [15]:
1 in primes

True

In [16]:
12 in primes

False

In [17]:
'apple' in fruit

True

In [18]:
'a' in 'bike'

False
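The behaviour of `in` depends slightly on the type being searched. The following sketch shows the two cases that most often surprise newcomers, substrings and dictionaries, along with the negated form `not in`.

```python
# For strings, `in` tests for a substring, not just a single character.
print('ike' in 'bike')               # True

# For dictionaries, `in` tests the keys, not the values.
capitals = {'Italy': 'Rome', 'France': 'Paris'}
print('Italy' in capitals)           # True
print('Rome' in capitals)            # False
print('Rome' in capitals.values())   # True

# `not in` is the negated form.
print(9 not in {1, 2, 3, 5, 7})      # True
```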

## Dictionaries

A **dictionary** stores values in *key: value* format and is defined using curly brackets, `{key: value}`. Unlike sequences, which are indexed by a range of numbers, dictionaries are indexed by keys, which can be of any hashable type (in practice, immutable types such as strings, numbers, or tuples). Keys must be unique: if the same key appears more than once in a literal, the last value silently replaces the earlier ones.

In [22]:
capitals = {'Italy': 'Rome', 'France': 'Paris'} 

In [24]:
capitals_wrong = {'Italy': 'Rome', 'Italy': 'Paris'} 

In [21]:
capitals_wrong['Italy']

'Paris'

You can access the items of a dictionary by referring to its key name, inside square brackets.

In [23]:
capitals['France']

'Paris'
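Square brackets are also used to add or overwrite entries, and they raise a `KeyError` when the key is missing. The sketch below works on a separate, hypothetical dictionary named `eu_capitals` and introduces the `get()` method as a tolerant alternative.

```python
eu_capitals = {'Italy': 'Rome', 'France': 'Paris'}

# Assigning to a new key adds an entry; an existing key is overwritten.
eu_capitals['Spain'] = 'Madrid'
eu_capitals['Italy'] = 'Roma'
print(eu_capitals)   # {'Italy': 'Roma', 'France': 'Paris', 'Spain': 'Madrid'}

# A missing key raises KeyError with square brackets...
try:
    eu_capitals['Germany']
except KeyError as error:
    print('missing key:', error)              # missing key: 'Germany'

# ...while get() returns None, or a default of your choice.
print(eu_capitals.get('Germany'))             # None
print(eu_capitals.get('Germany', 'unknown'))  # unknown
```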

It is also possible to use the `dict()` constructor to make a dictionary.

In [32]:
dict([('Italy', 'Rome'), ('France', 'Paris')])

{'Italy': 'Rome', 'France': 'Paris'}

In [30]:
list(capitals_wrong)

['Italy']

In [26]:
list(capitals)

['Italy', 'France']

### `dict` Methods

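`dict.keys()` returns a view object containing the dictionary's keys. A view is iterable but is not a list, so it cannot be indexed; wrap it in `list()` when a list is needed.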

In [27]:
capitals.keys()

dict_keys(['Italy', 'France'])

`dict.values()` returns a view object of all the values in the dictionary.

In [28]:
capitals.values()

dict_values(['Rome', 'Paris'])

In [31]:
capitals_wrong.values()

dict_values(['Paris'])

`dict.items()` returns a view object containing a `(key, value)` tuple for each key-value pair.

In [29]:
capitals.items()

dict_items([('Italy', 'Rome'), ('France', 'Paris')])

In [33]:
type(capitals.keys())

dict_keys

In [34]:
capitals.keys()[0]

TypeError: 'dict_keys' object is not subscriptable

In [36]:
list(capitals.keys())[0]

'Italy'

In [37]:
capitals[0]

KeyError: 0
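The objects returned by `keys()`, `values()`, and `items()` stay linked to the dictionary and are most often used in loops. A short sketch, again using a hypothetical `eu_capitals` dictionary:

```python
eu_capitals = {'Italy': 'Rome', 'France': 'Paris'}
keys = eu_capitals.keys()          # a live view, not a copy

eu_capitals['Spain'] = 'Madrid'    # the view reflects the change
print(list(keys))                  # ['Italy', 'France', 'Spain']

# items() is the usual way to loop over keys and values together.
for country, city in eu_capitals.items():
    print(f'{city} is the capital of {country}')
```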

# Reading and Writing Files

A **module** is a file containing a set of functions you want to include in your application. In order to gain access to the code inside a module we use the `import` statement.

## `os` module

The `os` module provides a portable way of using operating system dependent functionality.

In [38]:
import os

`os.getcwd()` returns a string representing the current working directory.

In [39]:
os.getcwd()

'/Users/sergiopicascia/Documents/GitHub/crash-course/2023-24/lecture-04'

`os.listdir(path='.')` returns a list containing the names of the entries in the directory given by path.

In [41]:
os.listdir('.')

['.DS_Store', '03. Basic Data Structures.ipynb', '.ipynb_checkpoints', 'data']

In [42]:
os.listdir('./data')

['levitating.txt', 'top10.json']
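A common follow-up to `os.listdir()` is filtering the entries by extension. The sketch below builds a temporary directory with two hypothetical files so that it runs anywhere; `tempfile` and `os.path.join()` are standard-library tools not used elsewhere in this notebook.

```python
import os
import tempfile

# Build a throwaway directory so the example does not depend on local files.
with tempfile.TemporaryDirectory() as folder:
    for filename in ('notes.txt', 'songs.json'):
        # os.path.join() uses the correct path separator for the current OS.
        open(os.path.join(folder, filename), 'w').close()

    entries = sorted(os.listdir(folder))   # listdir() order is arbitrary
    text_files = [name for name in entries if name.endswith('.txt')]

print(entries)      # ['notes.txt', 'songs.json']
print(text_files)   # ['notes.txt']
```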

## `open(file, mode)`

The `open()` built-in function opens a file and returns a corresponding file object.

In [43]:
file = open('./data/levitating.txt')

In [44]:
file

<_io.TextIOWrapper name='./data/levitating.txt' mode='r' encoding='UTF-8'>

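The `read()` method reads and returns the contents of the file as a string (for a file opened in text mode); an optional size argument limits how many characters are read. The `readline()` method reads and returns one line from the stream, including the trailing newline character.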

In [45]:
file.readline()

'If you wanna run away with me, I know a galaxy\n'

In [46]:
file.readline()

'And I can take you for a ride\n'

The `close()` method flushes and closes the stream.

In [47]:
file.close()

In [50]:
file.readline()

ValueError: I/O operation on closed file.

The `with` statement wraps the execution of a block with methods defined by a context manager. For files, it guarantees that the file is closed when the block ends, even if an error occurs inside it, so no resource is accidentally left open.

In [54]:
with open('./data/levitating.txt', 'r') as f:
    text = f.read()

In [55]:
f.read()

ValueError: I/O operation on closed file.

In [57]:
text[:100]

'If you wanna run away with me, I know a galaxy\nAnd I can take you for a ride\nI had a premonition tha'

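The `write()` method writes a string to a file opened in text mode (or a bytes-like object to a file opened in binary mode) and returns the number of characters written. To write, the file must be opened in a writing mode such as `'w'` or `'a'`. The `writelines()` method writes a list of lines to the stream; line separators are not added, so it is usual for each of the lines provided to have a line separator at the end.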
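A minimal sketch of both methods follows. It writes to a hypothetical `lyrics.txt` inside a temporary directory, so no existing file is touched, and then reads the lines back by iterating over the file object.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'lyrics.txt')   # hypothetical file name

    # Mode 'w' creates the file, or truncates it if it already exists.
    with open(path, 'w', encoding='utf-8') as f:
        written = f.write('first line\n')       # returns the character count
        f.writelines(['second line\n', 'third line\n'])

    # Iterating over a file object yields one line at a time.
    with open(path, 'r', encoding='utf-8') as f:
        lines = [line.rstrip('\n') for line in f]

print(written)   # 11
print(lines)     # ['first line', 'second line', 'third line']
```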

## `json` module

**JSON** (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate. JSON is built on two structures: a collection of name/value pairs (an *object*) and an ordered list of values (an *array*). When parsed with Python's `json` module, these become a `dict` and a `list`, respectively.

In Python, the `json` module makes it easy to parse JSON strings and files containing JSON objects. To load a .json file, we pass an open file object to the `json.load()` function.

In [58]:
with open('./data/top10.json', 'r') as f:
    text = f.read()

In [63]:
text[:100]

'[\n\t{\n\t\t"artist": "Olivia Rodrigo",\n\t\t"title": "drivers license",\n\t\t"duration": 4.02\n\t},\n\t{\n\t\t"artist'

In [62]:
print(text[:100])

[
	{
		"artist": "Olivia Rodrigo",
		"title": "drivers license",
		"duration": 4.02
	},
	{
		"artist


In [64]:
import json

In [65]:
with open('./data/top10.json', 'r') as f:
    songs = json.load(f)

In [67]:
type(songs)

list

In [69]:
songs[0]['artist']

'Olivia Rodrigo'
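The `json` module also works directly on strings: `json.loads()` parses a JSON string and `json.dumps()` serializes a Python object back to one. In the sketch below, the `raw` string mirrors the first record shown above, and the `explicit` field is a hypothetical addition.

```python
import json

# json.loads() parses a JSON string into Python objects.
raw = '[{"artist": "Olivia Rodrigo", "title": "drivers license", "duration": 4.02}]'
songs = json.loads(raw)
print(songs[0]['title'])    # drivers license
print(type(songs[0]))       # <class 'dict'>

# Add a hypothetical field and serialize back to a JSON string.
songs[0]['explicit'] = True
encoded = json.dumps(songs[0])
print(encoded)
```

In the encoded string, Python's `True` is written as JSON's `true`. The file-based counterpart of `json.load(f)` is `json.dump(obj, f)`, which writes to a file opened for writing.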

# Exercises

1. Generate a list containing only the names of the capitals from the following dictionary: `{'Italy': 'Rome', 'France': 'Paris', 'Spain': 'Madrid'}`.

2. Create a .json file containing data about three stocks of your choice: for each of them indicate name, symbol, industry and last closing price.