## **Data analysis example**

In [1]:
import db_adapter as db
import pandas as pd

#### **Setting up the connection**

Boto3 automatically looks for credentials and a default region in the AWS CLI configuration files (among other sources, such as environment variables). More information is available [here](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html).

In [2]:
conn = db.Connection()

#### Accessing tables

The method get_tables() returns a name-to-table dictionary with all the tables in a region. If you know which table you're looking for, the method get_table() can be called with the corresponding name. Note that the tables here are only references to their remote counterparts; no data is downloaded at this point.

In [3]:
tables = conn.get_tables()
for name, table in tables.items():
    print(name)

Clientes
Fornecedores
Fornecedores_Produtos
Lojas
Produtos
Vendas
Vendas_Produtos


#### Loading tables to memory

You can load the contents of a table by performing an unfiltered scan. Both the scan() and query() methods return a list of dictionaries that can be passed directly to the pandas.DataFrame() constructor. If the items do not all share the same attributes, the resulting DataFrame will contain missing values (NaN) in the affected columns, which may cause problems downstream.
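
Before building a DataFrame, it can help to check whether every item carries the same attributes. The snippet below is a minimal sketch of such a check. The helper `find_missing_attributes` and the sample items are hypothetical and not part of `db_adapter`; the only assumption is that a scan returns a list of dictionaries, as described above.

```python
def find_missing_attributes(items):
    """Map each item's index to the attributes it lacks, relative to the union of all keys."""
    all_keys = set()
    for item in items:
        all_keys.update(item)

    missing = {}
    for index, item in enumerate(items):
        absent = all_keys - item.keys()
        if absent:
            missing[index] = sorted(absent)
    return missing


# Hypothetical items: the second one has no 'endereco' attribute.
sample_items = [
    {"codigo": 0, "nome": "Raíssa", "endereco": "Salvador"},
    {"codigo": 1, "nome": "Antônio"},
]
print(find_missing_attributes(sample_items))
```

An empty result means every item has the same set of attributes, so the DataFrame should have no gaps caused by the schema.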

In [4]:
dfs = {name: pd.DataFrame(table.scan()) for name, table in tables.items()}

In [5]:
dfs['Vendas_Produtos']

Unnamed: 0,venda,preco,quantidade,produto
0,3,50,50,0
1,3,20,10,2
2,2,600,1,5
3,1,250,1,1
4,1,250,1,4
5,0,6,15,3


In [6]:
dfs['Produtos']

Unnamed: 0,codigo,nome,descricao
0,3,Pepino,Pepino!
1,2,Água em Pó,Basta colocar água!
2,4,Mineirinho Adventures,Nada a comentar!
3,1,Vaporizador RGB,Infinitas opções de customização!
4,0,Supositório Gamer,Cague como um campeão!
5,5,Dick Augmentator Tabajara,Aumente seu pepino agora mesmo!


In [7]:
dfs['Vendas']

Unnamed: 0,loja,num_nota_fiscal,cliente,data
0,3,1,3,2023-08-18
1,2,3,4,2023-01-20
2,1,2,2,2023-03-03
3,0,0,1,2023-06-14


In [8]:
dfs['Clientes']

Unnamed: 0,endereco,codigo,nome
0,Fortaleza,3,Roberto
1,Recife,2,Carlos
2,Porto Alegre,4,Maria
3,São Paulo,1,Antônio
4,Salvador,0,Raíssa


#### Querying tables

Tables in DynamoDB can be queried in two ways:
1. A **scan**, where the engine iterates through the whole table. This is the slower but more flexible method.
2. A **query**, where the engine reads a single partition, optionally narrowed to a range of sort keys. This is more restrictive but much faster.
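
The difference between the two access patterns can be illustrated with a toy in-memory model. This is only a sketch of the idea, not how DynamoDB or `db_adapter` is implemented: `ToyTable` and its `items_examined` counter are hypothetical names, and the rows are copied from the *Vendas_Produtos* output shown earlier.

```python
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for a table with a partition key (not DynamoDB itself)."""

    def __init__(self, partition_key):
        self.partition_key = partition_key
        self.partitions = defaultdict(list)
        self.items_examined = 0  # counts how many items each call had to read

    def put(self, item):
        self.partitions[item[self.partition_key]].append(item)

    def scan(self, predicate):
        """Visit every item in every partition and keep those matching the predicate."""
        results = []
        for items in self.partitions.values():
            for item in items:
                self.items_examined += 1
                if predicate(item):
                    results.append(item)
        return results

    def query(self, key_value):
        """Jump straight to one partition and read only its items."""
        items = self.partitions.get(key_value, [])
        self.items_examined += len(items)
        return list(items)


table = ToyTable(partition_key="venda")
rows = [(3, 50, 50, 0), (3, 20, 10, 2), (2, 600, 1, 5),
        (1, 250, 1, 1), (1, 250, 1, 4), (0, 6, 15, 3)]
for venda, preco, quantidade, produto in rows:
    table.put({"venda": venda, "preco": preco,
               "quantidade": quantidade, "produto": produto})

scanned = table.scan(lambda item: item["venda"] == 3)
scan_cost = table.items_examined

table.items_examined = 0
queried = table.query(3)
query_cost = table.items_examined

print(scan_cost, query_cost)
```

Both calls return the same two items, but the scan reads all six rows while the query reads only the two rows in the matching partition. The scan, in exchange, can filter on any attribute, not just the key.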

**Let's try analyzing Maria's purchases:**

In [9]:
tables['Clientes'].key_schema()

[{'AttributeName': 'codigo', 'KeyType': 'HASH'}]

First, we get her client ID.

> Note that if *nome* were the partition key, with *codigo* as the range key, the scan below would not be necessary; a query on *nome* would suffice. This was a database design mistake on my part.

In [10]:

maria_id = tables['Clientes'].scan(db.Attr('nome').eq('Maria'))[0]['codigo']
maria_id

Decimal('4')
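
The ID comes back as a `decimal.Decimal`, which is how Boto3 represents DynamoDB numbers. A `Decimal` compares equal to the matching `int`, so it can be used in filters as-is, but plain Python numbers are sometimes more convenient for display or arithmetic. The helper below is a small sketch of one way to convert; `to_plain_number` is a hypothetical name, not part of `db_adapter` or Boto3.

```python
from decimal import Decimal


def to_plain_number(value):
    """Convert a Decimal to int when it is integral, else to float; pass other values through."""
    if isinstance(value, Decimal):
        if value == value.to_integral_value():
            return int(value)
        return float(value)
    return value


example_id = Decimal("4")
print(example_id == 4)              # a Decimal compares equal to the matching int
print(to_plain_number(example_id))  # plain int, convenient for display
```

Note that converting a non-integral `Decimal` to `float` can lose precision, which is the reason Boto3 returns `Decimal` in the first place.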

Next, we scan the *Vendas* (sales) table to look for the purchases linked to Maria's ID.

In [13]:
maria_sales = tables['Vendas'].scan(db.Attr('cliente').eq(maria_id))
maria_sales

[{'loja': Decimal('2'),
  'num_nota_fiscal': Decimal('3'),
  'cliente': Decimal('4'),
  'data': ' 2023-01-20'}]

We're in luck: there's only one matching purchase. Since *venda* is the key of *Vendas_Produtos* (sales_products), we can now run a quick query to get the product list for that sale:

In [16]:
sale_details = tables['Vendas_Produtos'].query(db.Key('venda').eq(maria_sales[0]['num_nota_fiscal']))
sale_details = pd.DataFrame(sale_details)
sale_details

Unnamed: 0,venda,preco,quantidade,produto
0,3,50,50,0
1,3,20,10,2


The final table is generated with the code below:

In [24]:
# Attach product names and descriptions to each line of the sale.
final_df = pd.merge(
    sale_details,
    dfs['Produtos'],
    left_on='produto',
    right_on='codigo'
)
# Keep only the columns of interest; copy so the new column is added to an independent frame.
final_df = final_df[['quantidade', 'nome', 'descricao', 'preco']].copy()
final_df['subtotal'] = final_df['quantidade'] * final_df['preco']
# Translate the column names to English.
final_df = final_df.rename(
    columns={
        'quantidade': 'quantity',
        'nome': 'product_name',
        'descricao': 'description',
        'preco': 'price',
    }
)
final_df

Unnamed: 0,quantity,product_name,description,price,subtotal
0,50,Supositório Gamer,Cague como um campeão!,50,2500
1,10,Água em Pó,Basta colocar água!,20,200
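
As an optional safeguard, the merge can be asked to verify that each product code appears only once in the product table, so a duplicated code cannot silently multiply the sale lines. The sketch below rebuilds small stand-in DataFrames from a subset of the rows shown above (with plain integers rather than `Decimal` values) and uses the `validate` parameter of `pd.merge`. The names `produtos`, `total` and `duplicate_detected` are illustrative only.

```python
import pandas as pd

# Stand-in data copied from the outputs above (subset of the product rows).
sale_details = pd.DataFrame(
    {"venda": [3, 3], "preco": [50, 20], "quantidade": [50, 10], "produto": [0, 2]}
)
produtos = pd.DataFrame(
    {"codigo": [2, 1, 0], "nome": ["Água em Pó", "Vaporizador RGB", "Supositório Gamer"]}
)

# validate="many_to_one" raises MergeError if 'codigo' is not unique on the right side.
merged = pd.merge(
    sale_details,
    produtos,
    left_on="produto",
    right_on="codigo",
    validate="many_to_one",
)
total = int((merged["quantidade"] * merged["preco"]).sum())
print(total)

# A product table with a repeated code is rejected instead of duplicating rows.
duplicated = pd.concat([produtos, produtos.iloc[[0]]], ignore_index=True)
duplicate_detected = False
try:
    pd.merge(sale_details, duplicated, left_on="produto", right_on="codigo",
             validate="many_to_one")
except pd.errors.MergeError:
    duplicate_detected = True
print(duplicate_detected)
```

The total is simply the sum of the subtotal column from the final table.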
