# Some built-in Python `data structures`

- Lists (recap)
- Tuples
- Dicts
- Sets

# Lists

- Lists are written with square `brackets` and `comma`-separated elements

- Lists are mutable sequences of elements

In [2]:
list([10, 1, 3])
print([10, 1, 3])

[10, 1, 3]


In [4]:
list_ex = [10, 20, 30]
print(list_ex)

[10, 20, 30]


In [6]:
print(list_ex[0])
print(list_ex[1])
print(list_ex[2])

10
20
30


In [10]:
list_ex[3]

IndexError: list index out of range

In [12]:
list_ex[0] = 'Zero'
print(list_ex)
print(list_ex[0])
print(list_ex[1])
print(list_ex[2])

['Zero', 20, 30]
Zero
20
30


In [13]:
list_ex.append(40)
print(list_ex)

['Zero', 20, 30, 40]


In [14]:
list_ex.extend([50, 60])
print(list_ex)

# extend([50, 60]) is equivalent to either of the following:
# list_ex.append(50)
# list_ex.append(60)
#
# for elemento in [50, 60]:
#     list_ex.append(elemento)

['Zero', 20, 30, 40, 50, 60]


In [15]:
minha_extensao = [70, [80, 90]]
list_ex.extend(minha_extensao)
print(list_ex)

['Zero', 20, 30, 40, 50, 60, 70, [80, 90]]


In [16]:
ultimo_elemento = list_ex.pop()
print(ultimo_elemento)
print(list_ex)

[80, 90]
['Zero', 20, 30, 40, 50, 60, 70]


In [20]:
list_ex.pop()

70

In [27]:
print(list_ex[:])

['Zero', 20, 30, 40, 50, 60]


In [31]:
# A nicer way to flatten lists
list_ex = [1, [2, [3, [4]]]]
print(list_ex)

chata = []
# [2, [3, [4]]]
# [1, 2, [3, [4]]]
# [3, [4]]
# [1, 2, 3, [4]]
# [4]
# [1, 2, 3, 4]

while list_ex:
    elemento = list_ex.pop()
    if isinstance(elemento, list):
        print(elemento)
        list_ex.extend(elemento)
    else:
        chata.append(elemento)

print(chata)

[1, [2, [3, [4]]]]
[2, [3, [4]]]
[3, [4]]
[4]
[4, 3, 2, 1]


In [30]:
print(list_ex)

[]
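
The loop above collects the elements in reverse order, because `pop()` always takes from the end of the list, and it also empties `list_ex`. As a sketch of one alternative, a recursive helper (the name `flatten` is only illustrative, not part of the lesson) keeps the original order and leaves the input untouched:

```python
def flatten(nested):
    """Return a new flat list with the elements of `nested` in their original order."""
    flat = []
    for element in nested:
        if isinstance(element, list):
            # recurse into sub-lists and add their elements in order
            flat.extend(flatten(element))
        else:
            flat.append(element)
    return flat

flat_result = flatten([1, [2, [3, [4]]]])
print(flat_result)  # [1, 2, 3, 4]
```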


# Tuples

- Tuples are written with `parentheses` and `comma`-separated elements

- Tuples are immutable sequences of elements

## Creating a tuple

In [32]:
tuple((10, ))

(10,)
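
The trailing comma, not the parentheses, is what makes a one-element tuple. A small sketch (the variable names are illustrative):

```python
not_a_tuple = (10)     # just the integer 10 wrapped in parentheses
one_element = (10,)    # a tuple with a single element
also_a_tuple = 10,     # the parentheses are optional

print(type(not_a_tuple))            # <class 'int'>
print(type(one_element))            # <class 'tuple'>
print(one_element == also_a_tuple)  # True
```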

In [33]:
tuple_ex = (10, 20, 30)
print(tuple_ex)

(10, 20, 30)


In [37]:
a, b, c = tuple_ex
print(a)
print(b)
print(c)

10
20
30


In [38]:
type(('oi', 'tchau', 10))

tuple

## Converting a `list` into a `tuple`

In [47]:
minha_lista = [10, 20, 30]
minha_upla = tuple(minha_lista)
print(minha_lista)
print(minha_upla)
print(type(minha_upla))

[10, 20, 30]
(10, 20, 30)
<class 'tuple'>


In [40]:
tuple_ex = tuple([10, 20, 30])
print(type(tuple_ex))

<class 'tuple'>


In [49]:
meu_range = range(0, 10)
for i in tuple(meu_range):
    print(i)
print(meu_range)
print(list(meu_range))
print(tuple(meu_range))

0
1
2
3
4
5
6
7
8
9
range(0, 10)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)


In [None]:
tuple(range(0, 10))

## <u>Accessing</u> an element in a tuple

Imagine I create a service that, given an address, returns its latitude and longitude as a tuple, i.e., `(lat, long)`

In [50]:
coords = (-23.561762, -46.660213)

If I want to access the `latitude` (i.e., the first element):

In [51]:
coords[0]

-23.561762

In [52]:
# multiple assignment
lat, long = coords

In [53]:
lat

-23.561762

In [54]:
long

-46.660213

In [55]:
lat = coords[0]
long = coords[1]

In [56]:
lat

-23.561762

In [57]:
long

-46.660213

The numbers inside the square brackets are called `indices` (singular: `index`)

In [58]:
coords

(-23.561762, -46.660213)

In [61]:
# negative indices count from the end of a tuple (or list).
# Accessing 0 returns the first element, 1 the second element, and so on
# Accessing -1 returns the last element, -2 the second-to-last element, and so on
# Valid indices go from -len(coords) to len(coords) - 1, so -3 is out of range here
coords[-3]

IndexError: tuple index out of range

In [62]:
len(coords)

2
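
Negative indices can be read as offsets from the length of the sequence. A small sketch, reusing the `coords` values from above, showing that `coords[-k]` and `coords[len(coords) - k]` refer to the same element:

```python
coords = (-23.561762, -46.660213)
n = len(coords)

# valid indices go from -n to n - 1
for k in range(1, n + 1):
    assert coords[-k] == coords[n - k]

print(coords[-1])  # -46.660213 (last element)
print(coords[-2])  # -23.561762 (first element, since len(coords) == 2)
```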

## Running through a tuple

Tuples and lists are what Python calls **iterables**. This means you can run through their elements one by one.

The syntax is simple: 

```python
my_tuple = (1, 5, 8)

for element in my_tuple:
    # now you have access to each element
    print(element)

# Output:
# 1
# 5
# 8
```


A loop like the one below
```python
coords = (-23.561762, -46.660213)

for i in coords:
    print(i)
```

effectively does the following:
    
```python
coords = (-23.561762, -46.660213)

# first step of the loop
i = coords[0]
print(i)
# second step of the loop
i = coords[1]
print(i)
```

which expands to the following:

```python
coords = (-23.561762, -46.660213)

# first step of the loop
i = -23.561762
print(i)
# second step of the loop
i = -46.660213
print(i)
```
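
Strictly speaking, `for` does not use indices internally: it asks the iterable for an iterator and calls `next()` until the iterator is exhausted. The index-based expansion above is a helpful mental model; a closer sketch of the same loop, written with the built-in `iter()` and `next()` functions, could look like this (the `seen` list is only there to record what was visited):

```python
coords = (-23.561762, -46.660213)

iterator = iter(coords)      # what `for` requests behind the scenes
seen = []
while True:
    try:
        i = next(iterator)   # fetch the next element
    except StopIteration:    # raised when there is nothing left
        break
    seen.append(i)
    print(i)
```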

In [64]:
# will throw an error
coords[0] = -25.8

TypeError: 'tuple' object does not support item assignment

In [63]:
# but lists are mutable
my_list = [10, 20, 30]
print(my_list)
my_list[0] = 0
print(my_list)

[10, 20, 30]
[0, 20, 30]


In [67]:
nome = 'Pedro'
nome[-1] = 'a'

TypeError: 'str' object does not support item assignment

In [68]:
my_list.append(10)

In [69]:
list(coords)

[-23.561762, -46.660213]

In [70]:
tuple(my_list)

(0, 20, 30, 10)

In [71]:
# you can use any name for the loop variable when iterating.
# usually you want to pick a name that means something
for banana in coords:
    print(banana)

-23.561762
-46.660213


## Tuple methods

- `count`: returns the number of occurrences of the value you specify
- `index`: returns the first index of the value you specify

In [72]:
# Your code here!
y = (1, 3, 7, 4, 6, 3, 8, 8)

In [73]:
y.count(8)

2

In [74]:
y.index(8)

6

In [75]:
len(y)

8

In [76]:
y_list = list(y)

In [77]:
y_list

[1, 3, 7, 4, 6, 3, 8, 8]

In [78]:
y.index(8)

6
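
`index` raises a `ValueError` when the value is not in the tuple, so it can help to check membership first. A minimal sketch using the same `y`; it also uses the optional `start` argument of `index` to find the second occurrence of a value:

```python
y = (1, 3, 7, 4, 6, 3, 8, 8)

value = 5
if value in y:
    position = y.index(value)
else:
    position = None  # 5 is not in y, so y.index(5) would raise ValueError
print(position)  # None

# the optional second argument tells index where to start searching
first = y.index(8)
second = y.index(8, first + 1)
print(first, second)  # 6 7
```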

In [84]:
coords
lat, long = coords
long_novo = -30
coords = (lat, long_novo)
print(coords)

(-23.561762, -30)


## Built-in functions - `sorted()`

- Sorts a tuple (or any **iterable**, actually) and returns the result as a new list

In [85]:
y

(1, 3, 7, 4, 6, 3, 8, 8)

In [93]:
print(sorted(y, reverse=True))
print(sorted(y, reverse=False) == sorted(y))

[8, 8, 7, 6, 4, 3, 3, 1]
True
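
Note that `sorted()` always returns a new `list`, even when given a tuple, and leaves the original untouched. A short sketch converting the result back to a tuple:

```python
y = (1, 3, 7, 4, 6, 3, 8, 8)

sorted_list = sorted(y)
print(type(sorted_list))  # <class 'list'>

sorted_tuple = tuple(sorted_list)
print(sorted_tuple)  # (1, 3, 3, 4, 6, 7, 8, 8)
print(y)             # (1, 3, 7, 4, 6, 3, 8, 8), unchanged
```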


## Slicing

`Slicing` means taking a specific `part` (a sub-`sequence`) of the elements.

Slices use the syntax `[start:stop]`, where `start` is included and `stop` is excluded:

* `a[start:stop]` -> items from `start` through `stop-1`
* `a[start:]` -> items from `start` through the end of the sequence
* `a[:stop]` -> items from the beginning through `stop-1`
* `a[:]` -> a copy of the whole sequence
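
Slices also accept an optional third value, the step: `a[start:stop:step]`. A brief sketch with a hypothetical tuple `a`:

```python
a = (0, 1, 2, 3, 4, 5, 6)

print(a[1:5])    # (1, 2, 3, 4)
print(a[::2])    # (0, 2, 4, 6), every second element
print(a[::-1])   # (6, 5, 4, 3, 2, 1, 0), a reversed copy
print(a[10:20])  # (), out-of-range slices do not raise an error
```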

In [94]:
grades = (9,8,5,6,10,8,10)
grades

(9, 8, 5, 6, 10, 8, 10)

In [95]:
grades[0]

9

In [96]:
len(grades)

7

In [97]:
grades[100]

IndexError: tuple index out of range

In [98]:
grades

(9, 8, 5, 6, 10, 8, 10)

In [105]:

print(grades[2])
print(grades[6])
print(grades[2:6])

5
10
(5, 6, 10, 8)


In [111]:
# from the 5 onward (i.e., from index 2, the third element, onward)
print(grades[2])
print(grades[2:])

5
(5, 6, 10, 8, 10)


In [110]:
# everything before index 3 (i.e., up to and including the 5)
print(grades[3])
print(grades[:3])

6
(9, 8, 5)


In [112]:
grades[:-2]

(9, 8, 5, 6, 10)

In [113]:
grades[-2:]

(8, 10)

In [114]:
grades[:]

(9, 8, 5, 6, 10, 8, 10)

In [120]:
lista_orig = [1,2,3]
lista_copia = [lista_orig]

print(lista_orig)
print(lista_copia)

[1, 2, 3]
[[1, 2, 3]]


In [121]:
lista_orig[0] = 2

print(lista_orig)
print(lista_copia)

[2, 2, 3]
[[2, 2, 3]]


In [117]:
lista_orig = [1,2,3]
lista_copia = lista_orig[:]
lista_copia[0] = 2
print(lista_orig)
print(lista_copia)

[1, 2, 3]
[2, 2, 3]
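
One caveat: `[:]` makes a *shallow* copy. The outer list is new, but any nested lists are still shared with the original. A sketch of the difference, using `copy.deepcopy` from the standard library for a fully independent copy (the variable names are illustrative):

```python
import copy

original = [1, [2, 3]]
shallow = original[:]
deep = copy.deepcopy(original)

original[1].append(4)   # mutate the nested list

print(shallow)  # [1, [2, 3, 4]] -> the inner list is shared with the original
print(deep)     # [1, [2, 3]]    -> fully independent
```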


In [122]:
grades[1:3]+grades[4:6]

(8, 5, 10, 8)

-----

# Dicts

## What is a dictionary?

In real life, we use a dictionary to look up the `description of something`.

## What are keys and values?

`keys`: the `something` you look up

`values`: the `description` of that something

## Creating a dictionary

- Syntax of a dictionary `{key: value}`

In [123]:
my_dict={}

In [124]:
my_dict=dict()

In [125]:
type(my_dict)

dict

In [126]:
my_dict = {
            'Grão de Bico': 10,
            'Feijão': 8,
            'Lentilha': 1
          }
print(my_dict)


{'Grão de Bico': 10, 'Feijão': 8, 'Lentilha': 1}


In [128]:
my_dict['Soja'] = 9
print(my_dict)

{'Grão de Bico': 10, 'Feijão': 8, 'Lentilha': 1, 'Soja': 9}


In [130]:
my_dict = {
            'Grão de Bico': 10,
            'Grão de Bico': 8,
            'Grão de Bico': 1
          }
my_dict['Grão de Bico'] = 9
print(my_dict)

{'Grão de Bico': 9}


In [131]:
my_dict = {
            'Grão de Bico': 10,
            'Feijão': 8,
            'Lentilha': 1
          }

In [132]:
my_dict['Grão de Bico'] = 15
print(my_dict)

{'Grão de Bico': 15, 'Feijão': 8, 'Lentilha': 1}


In [133]:
my_dict.values()

dict_values([15, 8, 1])

In [134]:
my_dict.keys()

dict_keys(['Grão de Bico', 'Feijão', 'Lentilha'])

In [135]:
my_dict.items()

dict_items([('Grão de Bico', 15), ('Feijão', 8), ('Lentilha', 1)])

In [136]:
print(my_dict)

{'Grão de Bico': 15, 'Feijão': 8, 'Lentilha': 1}


In [137]:
my_dict['Lentilha'] = 15
print(my_dict)

{'Grão de Bico': 15, 'Feijão': 8, 'Lentilha': 15}


## <u>Accessing</u> a dictionary value:

- A dictionary's index (its key) is generic: you decide what it is.

In [138]:
preco_10kg_gb = my_dict['Grão de Bico']*10
print(preco_10kg_gb)

150
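
Accessing a key that does not exist with square brackets raises a `KeyError`. As a sketch, the `.get()` method returns a default value instead (the `prices` dict and the key `'Quinoa'` are only illustrative):

```python
prices = {'Grão de Bico': 15, 'Feijão': 8, 'Lentilha': 15}

print(prices.get('Feijão'))     # 8
print(prices.get('Quinoa'))     # None, no error raised
print(prices.get('Quinoa', 0))  # 0, explicit default

try:
    prices['Quinoa']
except KeyError as error:
    print(f'Missing key: {error}')  # Missing key: 'Quinoa'
```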


## Creating new items for your dictionary

In [143]:
# by assigning a value to a key that does not exist yet
my_dict['Ervilha Partida'] = 9

In [None]:
my_dict.keys()

In [146]:
my_dict

{'Grão de Bico': 15, 'Feijão': 8, 'Lentilha': 15, 'Ervilha Partida': 9}

In [149]:
# using the `.update()` method, passing it another dict
new_dict = dict()
new_dict['Lentilha'] = 5
new_dict['Arroz Integral'] = 8.5
print(new_dict)

{'Lentilha': 5, 'Arroz Integral': 8.5}


In [150]:
my_dict.update(new_dict)
print(my_dict)
my_dict_copy = my_dict

{'Grão de Bico': 15, 'Feijão': 8, 'Lentilha': 5, 'Ervilha Partida': 9, 'Arroz': 5, 'Arroz Integral': 8.5}


In [151]:
print(my_dict)

{'Grão de Bico': 15, 'Feijão': 8, 'Lentilha': 5, 'Ervilha Partida': 9, 'Arroz': 5, 'Arroz Integral': 8.5}


In [154]:
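# set a placeholder value ('Não tem' = 'not available'), then take an independent copy
my_dict['Grão de Bico'] = 'Não tem'
my_dict_copy = my_dict.copy()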

In [155]:
my_dict['Grão de Bico'] = 15
print(my_dict)
print(my_dict_copy)

{'Grão de Bico': 15, 'Feijão': 8, 'Lentilha': 5, 'Ervilha Partida': 9, 'Arroz': 5, 'Arroz Integral': 8.5}
{'Grão de Bico': 'Não tem', 'Feijão': 8, 'Lentilha': 5, 'Ervilha Partida': 9, 'Arroz': 5, 'Arroz Integral': 8.5}


In [158]:
graos = ['Feijão Branco', 'Lentilha Síria', 'Feijão Branco']
valores = [9.50, 13, 8.50]
print(range(len(graos)))

range(0, 3)


In [160]:
print(list(range(len(graos))))

[0, 1, 2]


In [161]:
for i in range(len(graos)):
    print(graos[i], valores[i])
    my_dict[graos[i]]=valores[i]

Feijão Branco 9.5
Lentilha Síria 13
Feijão Branco 8.5


In [162]:
print(my_dict)

{'Grão de Bico': 15, 'Feijão': 8, 'Lentilha': 5, 'Ervilha Partida': 9, 'Arroz': 5, 'Arroz Integral': 8.5, 'Feijão Branco': 8.5, 'Lentilha Síria': 13}
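
The same pairing can be written without indices using the built-in `zip()`, which walks through both lists in parallel. A sketch with a fresh dict (`prices` is an illustrative name):

```python
graos = ['Feijão Branco', 'Lentilha Síria', 'Feijão Branco']
valores = [9.50, 13, 8.50]

prices = {}
for grao, valor in zip(graos, valores):
    prices[grao] = valor  # a repeated key overwrites the earlier value

print(prices)  # {'Feijão Branco': 8.5, 'Lentilha Síria': 13}
```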


In [None]:
# 1st iteration
print(graos[0])
print(valores[0])
my_dict['Feijão Branco']=9.5

# 2nd iteration
print(graos[1])
print(valores[1])
my_dict['Lentilha Síria']=13

In [164]:
novos_precos = [('Lentilha Verde', 9), ('Abobrinha', 3), ('Beringela', 8)]
print(novos_precos[0] == ('Lentilha Verde', 9))
print(novos_precos[1] == ('Abobrinha', 3))
print(novos_precos[0][0])

True
True
Lentilha Verde


In [167]:
print(novos_precos[0])

('Lentilha Verde', 9)


In [168]:
lent = novos_precos[0]

In [169]:
print(lent)

('Lentilha Verde', 9)


In [170]:
lent[0]
lent[1]

9

In [171]:
print(novos_precos[0][0])
print(novos_precos[0][1])

Lentilha Verde
9


In [166]:
for i in range(len(novos_precos)):
    my_dict[novos_precos[i][0]] = novos_precos[i][1]
print(my_dict)

{'Grão de Bico': 15, 'Feijão': 8, 'Lentilha': 5, 'Ervilha Partida': 9, 'Arroz': 5, 'Arroz Integral': 8.5, 'Feijão Branco': 8.5, 'Lentilha Síria': 13, 'Lentilha Verde': 9, 'Abobrinha': 3, 'Beringela': 8}
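
Because `novos_precos` is already a list of `(key, value)` tuples, the loop can also be replaced by a single call: both `.update()` and `dict()` accept an iterable of pairs. A sketch on a small illustrative dict (`prices` and `only_new` are made-up names):

```python
novos_precos = [('Lentilha Verde', 9), ('Abobrinha', 3), ('Beringela', 8)]

prices = {'Feijão': 8}
prices.update(novos_precos)   # adds each (key, value) pair
print(prices)
# {'Feijão': 8, 'Lentilha Verde': 9, 'Abobrinha': 3, 'Beringela': 8}

only_new = dict(novos_precos)  # build a dict directly from the pairs
print(only_new['Abobrinha'])   # 3
```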


## A value can be anything

In [172]:
# Your code here!
my_dict = {
            'Grão de Bico': (10, 10.3, 10.5, 11),
            'Feijão': [8, 9, 10]
          }

In [173]:
my_dict

{'Grão de Bico': (10, 10.3, 10.5, 11), 'Feijão': [8, 9, 10]}

In [174]:
type(my_dict['Grão de Bico'])

tuple

In [175]:
type(my_dict['Feijão'])

list

In [177]:
my_dict.values()
my_dict[[1,2,3]] = 5

TypeError: unhashable type: 'list'
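
Values can be anything, but keys must be hashable, which in practice means immutable built-in types such as strings, numbers, or tuples of those. A sketch using a tuple where the list failed (the `grid` dict is illustrative):

```python
grid = {}
grid[(1, 2, 3)] = 5          # a tuple works as a key
print(grid[(1, 2, 3)])       # 5

try:
    grid[[1, 2, 3]] = 5      # a list does not
except TypeError as error:
    print(error)             # the message mentions the unhashable type 'list'
```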

In [None]:
for value in my_dict.values():
    print(type(value))

In [178]:
# Even other dictionaries
casa = dict()
casa['id'] = 1
casa['tamanho'] = 80
casa['dim_terreno'] = (20, 30)
casa['endereco'] = dict()
casa['endereco']['rua'] = 'Al. das Maritacas'
casa['endereco']['numero'] = 1637
casa['endereco']['bairro'] = 'Cidade Jardim'
casa['endereco']['cep'] = 39272440
print(casa)

{'id': 1, 'tamanho': 80, 'dim_terreno': (20, 30), 'endereco': {'rua': 'Al. das Maritacas', 'numero': 1637, 'bairro': 'Cidade Jardim', 'cep': 39272440}}


In [179]:
print(type(casa))
print(type(casa['dim_terreno']))
print(type(casa['endereco']))

<class 'dict'>
<class 'tuple'>
<class 'dict'>


In [182]:
print(casa['endereco']['rua'])

Al. das Maritacas


## A real example
Let's look at an example commonly found in data analysis: extracting data from an API. For this example we will use the Ambee API (https://www.getambee.com/) to extract air-quality data for three cities (São Paulo, Belo Horizonte, and the thriving metropolis of Pirassununga). We will see ways to understand what an API returns using dictionary methods.

In [183]:
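import os
import requests
# Read the API key from an environment variable instead of hardcoding it
# (AMBEE_API_KEY is an assumed variable name; set it before running)
TOKEN = os.environ['AMBEE_API_KEY']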
url = "https://api.ambeedata.com/latest/by-city"
headers = {
    'x-api-key': TOKEN,
    'Content-type': "application/json"
    }

In [184]:
querystring = {"city":"Sao Paulo"}
response = requests.request("GET", url, headers=headers, params=querystring)
ql_ar_sp = response.json()

In [185]:
querystring = {"city":"Belo Horizonte"}
response = requests.request("GET", url, headers=headers, params=querystring)
ql_ar_bh = response.json()

In [186]:
querystring = {"city":"Pirassununga"}
response = requests.request("GET", url, headers=headers, params=querystring)
ql_ar_pira = response.json()

In [187]:
print(ql_ar_sp)

{'message': 'success', 'stations': [{'_id': '60363be18f2bb86af9398a6c', 'placeId': '13991eba7f6caaaf0830f7574f0fa70052de3b51d28ba8264782466ffd78851f', 'CO': 1, 'NO2': 0.022, 'OZONE': 18.5, 'PM10': 39.808, 'PM25': 23.88, 'SO2': 10.966, 'city': None, 'countryCode': 'BR', 'division': None, 'lat': -23.627, 'lng': -46.635, 'placeName': 'São Paulo', 'postalCode': '01000-000', 'state': 'Sao Paulo', 'updatedAt': '2021-11-07 23:00:00', 'AQI': 76, 'aqiInfo': {'pollutant': 'PM2.5', 'concentration': 23.88, 'category': 'Moderate'}}]}


# Dictionary <u>methods</u>

- `.keys()`
- `.values()`
- `.items()`
- `.update()`

In [207]:
print(ql_ar_sp.keys())
print(type(ql_ar_sp['stations']))
print(ql_ar_sp['stations'])
print(len(ql_ar_sp['stations']))
print(len(ql_ar_sp['stations'][0].keys()))

dict_keys(['message', 'stations'])
<class 'list'>
[{'_id': '60363be18f2bb86af9398a6c', 'placeId': '13991eba7f6caaaf0830f7574f0fa70052de3b51d28ba8264782466ffd78851f', 'CO': 1, 'NO2': 0.022, 'OZONE': 18.5, 'PM10': 39.808, 'PM25': 23.88, 'SO2': 10.966, 'city': None, 'countryCode': 'BR', 'division': None, 'lat': -23.627, 'lng': -46.635, 'placeName': 'São Paulo', 'postalCode': '01000-000', 'state': 'Sao Paulo', 'updatedAt': '2021-11-07 23:00:00', 'AQI': 76, 'aqiInfo': {'pollutant': 'PM2.5', 'concentration': 23.88, 'category': 'Moderate'}}]
1
19


In [212]:
print(type(ql_ar_sp['stations']))
print(type(ql_ar_sp['stations'][0]))

<class 'list'>
<class 'dict'>


In [208]:
dict_ql_ar_sp = ql_ar_sp['stations'][0]
dict_ql_ar_bh = ql_ar_bh['stations'][0]
dict_ql_ar_pira = ql_ar_pira['stations'][0]

print('SP: ' + str(dict_ql_ar_sp['aqiInfo']['category']))
print('BH: ' + str(dict_ql_ar_bh['aqiInfo']['category']))
print('Pirassununga: ' + str(dict_ql_ar_pira['aqiInfo']['category']))

SP: Moderate
BH: Good
Pirassununga: Moderate
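
Chained indexing such as `['stations'][0]['aqiInfo']['category']` fails with a `KeyError` or `IndexError` if the response has a different shape (for example, a reply with no stations). A defensive sketch follows; it uses a hard-coded sample shaped like the response printed above rather than a live request, and the helper name `aqi_category` and the error payload are hypothetical:

```python
def aqi_category(payload):
    """Return the AQI category of the first station, or None if unavailable."""
    stations = payload.get('stations') or []
    if not stations:
        return None
    return stations[0].get('aqiInfo', {}).get('category')

# trimmed-down sample with the same nesting as the real response
sample = {
    'message': 'success',
    'stations': [{'AQI': 76, 'aqiInfo': {'pollutant': 'PM2.5', 'category': 'Moderate'}}],
}

print(aqi_category(sample))                # Moderate
print(aqi_category({'message': 'error'}))  # None (made-up payload without stations)
```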


## Iterating through a dict

In [None]:
for chave in casa:
    print(chave)

In [None]:
for chave in casa.keys():
    print(f'{chave}: {casa[chave]}')

In [None]:
# Your code here
for atributo in casa.items():
    print(atributo)

In [None]:
# Your code here
for valor in casa.values():
    print(valor)

In [None]:
casa

## Loops can unpack more than one variable

In [213]:
my_dict = {
            'Grão de Bico': 10,
            'Feijão': 8,
            'Lentilha': 1
          }

In [214]:
a, b = (1, 2)
print(a)
print(b)

1
2


In [215]:
print(my_dict.items())

dict_items([('Grão de Bico', 10), ('Feijão', 8), ('Lentilha', 1)])


In [217]:
for grao, preco in my_dict.items():
    print(grao)
    print(preco)
    if grao == 'Feijão' or grao == 'Lentilha':
        print(preco)

Grão de Bico
10
Feijão
8
8
Lentilha
1
1


In [218]:
print(grao)

Lentilha


## Verifying if a key is `in` the dictionary

In [220]:
# how does it work with lists?
1 in [1, 2, 3]
print(my_dict)

{'Grão de Bico': 10, 'Feijão': 8, 'Lentilha': 1}


In [222]:
8 in my_dict.values()

True

Actually, `in` works like this with any **iterable**

In [223]:
'abcd' in 'abc'

False

In [224]:
'abc' in 'abcd'

True

In [225]:
1 in (1, 2, 3)

True
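
Applied directly to a dictionary, `in` checks the keys, not the values. A short sketch with the same `my_dict`:

```python
my_dict = {'Grão de Bico': 10, 'Feijão': 8, 'Lentilha': 1}

print('Feijão' in my_dict)     # True, 'Feijão' is a key
print(8 in my_dict)            # False, 8 is a value, not a key
print(8 in my_dict.values())   # True
print('Soja' not in my_dict)   # True
```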

-----

# Sets

Sets are just like dictionaries, but they only have `keys`.

And just like `keys` in a dictionary, there are no `duplicates`. You can picture a set as a [Venn diagram](https://pt.wikipedia.org/wiki/Diagrama_de_Venn) containing the elements you want.

In [226]:
my_list = ['Pedro', 'Adriano', 'Pedro', 'Adriano', 'Pedro', 'Adriano']

In [227]:
my_list

['Pedro', 'Adriano', 'Pedro', 'Adriano', 'Pedro', 'Adriano']

In [228]:
set(my_list)

{'Adriano', 'Pedro'}
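
Sets do not guarantee any particular order, so converting a list to a set may rearrange the elements. When the original order matters, one common approach (a sketch, relying on dicts keeping insertion order in Python 3.7+) is `dict.fromkeys`:

```python
my_list = ['Pedro', 'Adriano', 'Pedro', 'Adriano', 'Pedro', 'Adriano']

# keys are unique and keep the order in which they were first seen
unique_in_order = list(dict.fromkeys(my_list))
print(unique_in_order)  # ['Pedro', 'Adriano']
```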

----

In [229]:
x = set([1,2,3,4,4,4,4,4,5,6,6,6,7,7,8])
x

{1, 2, 3, 4, 5, 6, 7, 8}

In [230]:
y = set([8,8,6,7, 10, 12])
y

{6, 7, 8, 10, 12}

In [231]:
type(set([8,8,6,7, 10, 12]))

set

In [None]:
type(y)

## Set methods

In [232]:
x

{1, 2, 3, 4, 5, 6, 7, 8}

In [233]:
y

{6, 7, 8, 10, 12}

In [234]:
# Your code here
x.intersection(y)

{6, 7, 8}

In [235]:
y.intersection(x)

{6, 7, 8}

In [236]:
x.difference(y)

{1, 2, 3, 4, 5}

In [237]:
y.difference(x)

{10, 12}

In [238]:
x-y # x.difference(y)

{1, 2, 3, 4, 5}

In [239]:
y-x # y.difference(x)

{10, 12}

In [240]:
x.union(y)

{1, 2, 3, 4, 5, 6, 7, 8, 10, 12}

In [242]:
(x-y).union(y-x)

{1, 2, 3, 4, 5, 10, 12}

In [243]:
x.symmetric_difference(y)

{1, 2, 3, 4, 5, 10, 12}
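
Each of these methods also has an operator form, just like `-` for `difference`. A quick sketch with the same `x` and `y`:

```python
x = {1, 2, 3, 4, 5, 6, 7, 8}
y = {6, 7, 8, 10, 12}

assert x & y == x.intersection(y)           # {6, 7, 8}
assert x | y == x.union(y)                  # {1, 2, 3, 4, 5, 6, 7, 8, 10, 12}
assert x ^ y == x.symmetric_difference(y)   # {1, 2, 3, 4, 5, 10, 12}
assert x - y == x.difference(y)             # {1, 2, 3, 4, 5}

print(sorted(x & y))  # [6, 7, 8]
```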

In [271]:
x = set([1,2,3])

In [268]:
set([1,2,3, 25]).issubset(x)

False

In [265]:
a = [print, 'A', 2,3,1]
a[0]('Ola')

Ola


In [275]:
4 in x

False

In [277]:
# Practical example
col_names = set(['qtd_cartoes', 'vlr_cartao','qtd_cheques','vlr_cheques'])

incoming_col_names = set(['qtd_cartoes', 'vlr_cartao','qtd_cheques','vlr_cheques'])

# print(f'Missing columns: {set(col_names) - set(incoming_col_names)}')
missing_columns = col_names.difference(incoming_col_names)
print(f'Missing columns: {missing_columns}')

Missing columns: set()
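
In the example above both sets are identical, so nothing is missing. A sketch of the same check when the incoming data lacks one column and brings an unexpected one (the column `'vlr_pix'` and the name `unexpected_columns` are made up for illustration):

```python
col_names = {'qtd_cartoes', 'vlr_cartao', 'qtd_cheques', 'vlr_cheques'}
incoming_col_names = {'qtd_cartoes', 'vlr_cartao', 'qtd_cheques', 'vlr_pix'}

missing_columns = col_names - incoming_col_names      # expected but absent
unexpected_columns = incoming_col_names - col_names   # present but not expected

print(f'Missing columns: {missing_columns}')        # Missing columns: {'vlr_cheques'}
print(f'Unexpected columns: {unexpected_columns}')  # Unexpected columns: {'vlr_pix'}
```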


In [289]:
my_dict.values()

dict_values([10, 8, 1])