<div style="text-align: left;">

## Module: Analytics Engineering
    
<br>

## Lesson 1 - Exercise 1

Build a data pipeline for sentiment analysis of Bitcoin-related news, using data from the Alpha Vantage API (https://www.alphavantage.co/documentation/), following the steps below:

1: Call the Alpha Vantage API with the NEWS_SENTIMENT function to retrieve sentiment data for Bitcoin-related news. For example:
'https://www.alphavantage.co/query?function=NEWS_SENTIMENT&tickers=CRYPTO:BTC&time_from=20230926T0000&limit=1000&apikey=' + api_key

'function=NEWS_SENTIMENT': the API function being called; in this case "NEWS_SENTIMENT", which returns news sentiment information.

'tickers=CRYPTO:BTC': the ticker of the financial asset to analyze. Here it is "CRYPTO:BTC", so sentiment analysis is requested for news related to Bitcoin.

'time_from=20230926T0000': the date and time (in YYYYMMDDTHHMM format) from which news sentiment information is requested. In this example, September 26, 2023 at 00:00 (midnight).

'limit=1000': the maximum number of news items to retrieve; here the search is capped at 1000 items.

'apikey=': the API key, obtained by registering on the Alpha Vantage website.
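
The same request URL can be assembled from a parameter dictionary instead of string concatenation, which keeps each parameter listed above explicit and URL-encodes values such as `CRYPTO:BTC`. This is a minimal sketch using only the standard library; the helper name `build_news_sentiment_url` and the `YOUR_API_KEY` placeholder are hypothetical.

```python
from datetime import datetime
from urllib.parse import urlencode

BASE_URL = "https://www.alphavantage.co/query"


def build_news_sentiment_url(api_key: str, ticker: str, since: datetime, limit: int = 1000) -> str:
    """Build the NEWS_SENTIMENT request URL (hypothetical helper)."""
    params = {
        "function": "NEWS_SENTIMENT",
        "tickers": ticker,
        # time_from uses the YYYYMMDDTHHMM format, e.g. 20230926T0000
        "time_from": since.strftime("%Y%m%dT%H%M"),
        "limit": limit,
        "apikey": api_key,
    }
    # urlencode percent-encodes reserved characters, so ':' becomes '%3A'
    return f"{BASE_URL}?{urlencode(params)}"


url = build_news_sentiment_url("YOUR_API_KEY", "CRYPTO:BTC", datetime(2023, 9, 26))
print(url)
```

Passing the same dictionary as `requests.get(BASE_URL, params=params)` lets `requests` perform the encoding instead.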

The call returns a set of sentiment information for Bitcoin-related news, possibly including sentiment metrics, scores, summaries, or other relevant data. The exact format and structure of the data depend on the Alpha Vantage API and on what its service provides at the time of the call.
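
Since the structure depends on the service, it is worth checking the response before indexing into it. A hedged sketch: it assumes a successful response carries the news list under a `feed` key (as the cells below use) and that, when `feed` is missing, the service explains why under keys such as `Information`, `Note` or `Error Message`. Those key names are an assumption to verify against the current API behavior, and `extract_feed` is a hypothetical helper.

```python
from typing import Any


def extract_feed(data: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the list of news items, or raise if the response has no 'feed' (hypothetical helper)."""
    if "feed" not in data:
        # Assumed keys for error / rate-limit messages; adjust to what the API actually returns
        for key in ("Error Message", "Information", "Note"):
            if key in data:
                raise RuntimeError(f"Alpha Vantage returned no feed: {data[key]}")
        raise RuntimeError(f"Unexpected response keys: {sorted(data)}")
    return data["feed"]


print(len(extract_feed({"items": "1", "feed": [{"title": "example"}]})))
```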







### API request

In [117]:
%run ./api_key

In [None]:
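import requests

# api_key is defined by the %run ./api_key cell above
url = 'https://www.alphavantage.co/query?function=NEWS_SENTIMENT&tickers=CRYPTO:BTC&time_from=20230926T0000&limit=1000&apikey=' + api_key
r = requests.get(url, timeout=30)
r.raise_for_status()  # fail early on HTTP errors (4xx/5xx)
data = r.json()

print(data)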

### Results for the first news item found:

In [None]:
data['feed'][0]

> ### 2. Once the call is made, find a unique key per news item to avoid duplicates in the pipeline. At a minimum, each news item must store its title, publication date, and the sentiment analysis result for Bitcoin. The example below shows all of this information for the first item in the feed (the most recent one):

In [None]:
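first = data['feed'][0]
# Look up the CRYPTO:BTC entry by ticker instead of assuming it is at position 1
btc_sentiment = next(s for s in first['ticker_sentiment'] if s['ticker'] == 'CRYPTO:BTC')
print("Result for the first news item found:")
print("Title:", first['title'])
print("Publication date:", first['time_published'])
print("Sentiment for the Bitcoin ticker ('CRYPTO:BTC') only:", btc_sentiment)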
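
The statement asks for a unique key per news item, but the cells in this notebook never create one. One hedged option is to hash a field that identifies the article. This sketch assumes each feed item has a `url` field (verify against the actual response) and falls back to `title` plus `time_published` when it is absent; `news_key` is a hypothetical helper and the sample items are made up.

```python
import hashlib


def news_key(item: dict) -> str:
    """Deterministic key for a news item (hypothetical helper)."""
    # Assumption: 'url' identifies the article; fall back to title + publication time
    natural_key = item.get("url") or f"{item['title']}|{item['time_published']}"
    return hashlib.sha256(natural_key.encode("utf-8")).hexdigest()


feed = [
    {"title": "A", "time_published": "20231008T103935", "url": "https://example.com/a"},
    {"title": "A", "time_published": "20231008T103935", "url": "https://example.com/a"},  # duplicate
    {"title": "B", "time_published": "20231008T092433"},  # no url: uses the fallback
]

# Keep only the first occurrence of each key, preserving feed order
seen = set()
unique_items = []
for item in feed:
    key = news_key(item)
    if key not in seen:
        seen.add(key)
        unique_items.append(item)

print(len(unique_items))
```

Hashing gives a fixed-length key that can be stored as a text column and used as a primary key in the Bronze table.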

> ### 3. Build a pipeline that extracts this information and stores it in a database, following the concepts of the Bronze and Silver layers

In [None]:
import pandas as pd

In [144]:
# Number of items returned (the JSON value is converted to int)
n = int(data['items'])
rows = []

for item in data['feed'][:n]:
    # Keep only the sentiment entry for the Bitcoin ticker
    btc = [s for s in item['ticker_sentiment'] if s['ticker'] == 'CRYPTO:BTC']
    if not btc:
        continue  # skip news items without a CRYPTO:BTC entry
    rows.append({
        'title': item['title'],
        'date': item['time_published'],
        'relevance_score': float(btc[0]['relevance_score']),
        'ticker_sentiment_score': float(btc[0]['ticker_sentiment_score']),
        'ticker_sentiment_label': btc[0]['ticker_sentiment_label'],
    })

df_bronze = pd.DataFrame(rows)
df_bronze
    

| | title | date | relevance_score | ticker_sentiment_score | ticker_sentiment_label |
|---|---|---|---|---|---|
| 0 | Ripple CTO seeks community consensus for XRPL ... | 20231008T103935 | 0.105141 | 0.0 | Neutral |
| 1 | Bitcoin bulls encircle $28K as trader says 'bi... | 20231008T092433 | 0.737428 | -0.388949 | Bearish |
| 2 | Bitcoin's 20% Surge, Satoshi Nakamoto's Crypti... | 20231008T002615 | 0.746902 | 0.138496 | Neutral |
| 3 | SBF trial underway, Mashinsky trial set, Binan... | 20231007T214358 | 0.225425 | -0.035241 | Neutral |
| 4 | This Week in Coins: Bitcoin and Ethereum Hold ... | 20231007T183409 | 0.255641 | 0.257968 | Somewhat-Bullish |
| ... | ... | ... | ... | ... | ... |
| 554 | Shiba Inu Sister Token BONE Surges 6% Outperfo... | 20230926T044708 | 0.119235 | 0.22121 | Somewhat-Bullish |
| 555 | Ben Armstrong 'BitBoy Crypto' Gets Arrested Du... | 20230926T033738 | 0.321673 | 0.325155 | Somewhat-Bullish |
| 556 | DeFi Hacks Usually Come Down to Poor Security:... | 20230926T025037 | 0.06515 | -0.025081 | Neutral |
| 557 | Why Bitcoin, Ethereum, Dogecoin Are Soaring To... | 20230926T023144 | 0.620858 | 0.261719 | Somewhat-Bullish |


### Bronze

In [145]:
import sqlalchemy as sqlal
import psycopg2  # PostgreSQL driver used by SQLAlchemy's postgresql dialect
from sqlalchemy import create_engine, text as sql_text

In [146]:
%load_ext sql



In [147]:
# Connect to the database
# The password is not hardcoded: libpq/psycopg2 reads it from the PGPASSWORD
# environment variable (or ~/.pgpass) when it is omitted from the URL
engine = create_engine('postgresql+psycopg2://postgres@localhost:5433/postgres')

In [148]:
# Column types for the Bronze table
dtype = {'title':sqlal.types.String(),
         'date':sqlal.types.String(),
         'relevance_score':sqlal.types.Float(precision=5, asdecimal=True),
         'ticker_sentiment_score':sqlal.types.Float(precision=5, asdecimal=True),
         'ticker_sentiment_label':sqlal.types.String(),
}

In [149]:
# Store the results in the database (Bronze layer)
df_bronze.to_sql('Bronze', engine, if_exists='replace', index=False, dtype=dtype)

559
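
With `if_exists='replace'`, every run rewrites the Bronze table, so duplicates are only avoided within a single API call. A hedged sketch of an incremental alternative appends only rows whose key is not stored yet. To stay self-contained it uses an in-memory SQLite database through the standard `sqlite3` module (which pandas `to_sql`/`read_sql` accept) instead of the notebook's PostgreSQL engine; the table name `bronze_demo`, the `news_id` column, the sample rows and the helper `append_new_rows` are all hypothetical.

```python
import sqlite3

import pandas as pd


def append_new_rows(df: pd.DataFrame, table: str, conn: sqlite3.Connection) -> int:
    """Append rows whose news_id is not yet in `table`; return how many were written (hypothetical helper)."""
    exists = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name = ?", (table,)
    ).fetchone()
    if exists:
        # `table` is a trusted, fixed name here; do not interpolate user input into SQL
        stored = pd.read_sql(f"SELECT news_id FROM {table}", conn)["news_id"]
        df = df[~df["news_id"].isin(stored)]
    df = df.drop_duplicates(subset="news_id")
    if not df.empty:
        df.to_sql(table, conn, if_exists="append", index=False)
    return len(df)


conn = sqlite3.connect(":memory:")
batch = pd.DataFrame(
    {"news_id": ["k1", "k2"], "title": ["A", "B"], "date": ["20231008T103935", "20231008T092433"]}
)
first_run = append_new_rows(batch, "bronze_demo", conn)   # both rows are new
second_run = append_new_rows(batch, "bronze_demo", conn)  # keys already stored, nothing written
next_batch = pd.DataFrame(
    {"news_id": ["k2", "k3"], "title": ["B", "C"], "date": ["20231008T092433", "20231008T002615"]}
)
third_run = append_new_rows(next_batch, "bronze_demo", conn)  # only k3 is new
total = conn.execute("SELECT COUNT(*) FROM bronze_demo").fetchone()[0]
```

Against PostgreSQL, a primary key on `news_id` combined with `INSERT ... ON CONFLICT DO NOTHING` is another common way to get the same guarantee inside the database.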

### Silver

In [114]:
from datetime import datetime

In [150]:
# Read the Bronze table

query = """
SELECT * FROM public."Bronze"
"""
with engine.connect() as conn:  # the connection is closed after reading
    df_silver = pd.read_sql(sql=sql_text(query), con=conn)
df_silver

| | title | date | relevance_score | ticker_sentiment_score | ticker_sentiment_label |
|---|---|---|---|---|---|
| 0 | Ripple CTO seeks community consensus for XRPL ... | 20231008T103935 | 0.105141 | 0.000000 | Neutral |
| 1 | Bitcoin bulls encircle $28K as trader says 'bi... | 20231008T092433 | 0.737428 | -0.388949 | Bearish |
| 2 | Bitcoin's 20% Surge, Satoshi Nakamoto's Crypti... | 20231008T002615 | 0.746902 | 0.138496 | Neutral |
| 3 | SBF trial underway, Mashinsky trial set, Binan... | 20231007T214358 | 0.225425 | -0.035241 | Neutral |
| 4 | This Week in Coins: Bitcoin and Ethereum Hold ... | 20231007T183409 | 0.255641 | 0.257968 | Somewhat-Bullish |
| ... | ... | ... | ... | ... | ... |
| 554 | Shiba Inu Sister Token BONE Surges 6% Outperfo... | 20230926T044708 | 0.119235 | 0.221210 | Somewhat-Bullish |
| 555 | Ben Armstrong 'BitBoy Crypto' Gets Arrested Du... | 20230926T033738 | 0.321673 | 0.325155 | Somewhat-Bullish |
| 556 | DeFi Hacks Usually Come Down to Poor Security:... | 20230926T025037 | 0.065150 | -0.025081 | Neutral |
| 557 | Why Bitcoin, Ethereum, Dogecoin Are Soaring To... | 20230926T023144 | 0.620858 | 0.261719 | Somewhat-Bullish |


In [153]:
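# Parse the publication timestamp (e.g. 20231008T103935) and derive the calendar day
date_format = "%Y%m%dT%H%M%S"  # avoid shadowing the built-in format()
df_silver['date'] = pd.to_datetime(df_silver['date'], format=date_format)
df_silver['date_day'] = df_silver['date'].dt.date
df_silver['date_day']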

0      2023-10-08
1      2023-10-08
2      2023-10-08
3      2023-10-07
4      2023-10-07
          ...    
554    2023-09-26
555    2023-09-26
556    2023-09-26
557    2023-09-26
558    2023-09-26
Name: date_day, Length: 559, dtype: object
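
The Silver steps above (parsing `date`, deriving `date_day`) can be packaged as one function so the same rules run on every load. A hedged sketch: besides what the notebook does, it drops exact duplicate rows, casts the score columns to `float`, and uses `errors="coerce"` so a malformed timestamp becomes `NaT` and is dropped instead of stopping the pipeline. Those extra rules are assumptions, not part of the original exercise; `to_silver` and the sample rows are hypothetical.

```python
import pandas as pd

DATE_FORMAT = "%Y%m%dT%H%M%S"  # format of time_published, e.g. 20231008T103935


def to_silver(bronze: pd.DataFrame) -> pd.DataFrame:
    """Type and clean Bronze rows for the Silver layer (hypothetical helper)."""
    silver = bronze.drop_duplicates().copy()
    silver["date"] = pd.to_datetime(silver["date"], format=DATE_FORMAT, errors="coerce")
    for col in ("relevance_score", "ticker_sentiment_score"):
        silver[col] = silver[col].astype(float)
    silver["date_day"] = silver["date"].dt.date
    # Drop rows whose timestamp could not be parsed
    return silver.dropna(subset=["date"])


bronze = pd.DataFrame({
    "title": ["A", "A", "B"],
    "date": ["20231008T103935", "20231008T103935", "not-a-date"],
    "relevance_score": ["0.105141", "0.105141", "0.7"],
    "ticker_sentiment_score": ["0.0", "0.0", "-0.3"],
    "ticker_sentiment_label": ["Neutral", "Neutral", "Bearish"],
})
silver = to_silver(bronze)
print(silver)
```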

In [154]:
# Column types for the Silver table
dtype = {'title':sqlal.types.String(),
         'date':sqlal.types.DateTime(),
         'relevance_score':sqlal.types.Float(precision=5, asdecimal=True),
         'ticker_sentiment_score':sqlal.types.Float(precision=5, asdecimal=True),
         'ticker_sentiment_label':sqlal.types.String(),
         'date_day':sqlal.types.Date(),  # calendar day only, no time component
}

In [155]:
# Store the results in the database (Silver layer)
df_silver.to_sql('Silver', engine, if_exists='replace', index=False, dtype=dtype)

559

> ### 4. Finally, build a pipeline of transformed data in the Gold layer that counts the number of news items found per day and computes the "average sentiment" per day

In [156]:
# Read the Silver table

query = """
SELECT * FROM public."Silver"
"""
with engine.connect() as conn:  # the connection is closed after reading
    df_gold = pd.read_sql(sql=sql_text(query), con=conn)
df_gold

| | title | date | relevance_score | ticker_sentiment_score | ticker_sentiment_label | date_day |
|---|---|---|---|---|---|---|
| 0 | Ripple CTO seeks community consensus for XRPL ... | 2023-10-08 10:39:35 | 0.105141 | 0.000000 | Neutral | 2023-10-08 |
| 1 | Bitcoin bulls encircle $28K as trader says 'bi... | 2023-10-08 09:24:33 | 0.737428 | -0.388949 | Bearish | 2023-10-08 |
| 2 | Bitcoin's 20% Surge, Satoshi Nakamoto's Crypti... | 2023-10-08 00:26:15 | 0.746902 | 0.138496 | Neutral | 2023-10-08 |
| 3 | SBF trial underway, Mashinsky trial set, Binan... | 2023-10-07 21:43:58 | 0.225425 | -0.035241 | Neutral | 2023-10-07 |
| 4 | This Week in Coins: Bitcoin and Ethereum Hold ... | 2023-10-07 18:34:09 | 0.255641 | 0.257968 | Somewhat-Bullish | 2023-10-07 |
| ... | ... | ... | ... | ... | ... | ... |
| 554 | Shiba Inu Sister Token BONE Surges 6% Outperfo... | 2023-09-26 04:47:08 | 0.119235 | 0.221210 | Somewhat-Bullish | 2023-09-26 |
| 555 | Ben Armstrong 'BitBoy Crypto' Gets Arrested Du... | 2023-09-26 03:37:38 | 0.321673 | 0.325155 | Somewhat-Bullish | 2023-09-26 |
| 556 | DeFi Hacks Usually Come Down to Poor Security:... | 2023-09-26 02:50:37 | 0.065150 | -0.025081 | Neutral | 2023-09-26 |
| 557 | Why Bitcoin, Ethereum, Dogecoin Are Soaring To... | 2023-09-26 02:31:44 | 0.620858 | 0.261719 | Somewhat-Bullish | 2023-09-26 |


In [163]:
# Number of news items found per day
day_count = df_gold.groupby('date_day').size().reset_index(name='count')
day_count

| | date_day | count |
|---|---|---|
| 0 | 2023-09-26 | 52 |
| 1 | 2023-09-27 | 64 |
| 2 | 2023-09-28 | 61 |
| 3 | 2023-09-29 | 58 |
| 4 | 2023-09-30 | 16 |
| 5 | 2023-10-01 | 15 |
| 6 | 2023-10-02 | 62 |
| 7 | 2023-10-03 | 53 |
| 8 | 2023-10-04 | 63 |
| 9 | 2023-10-05 | 49 |
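
`groupby('date_day').size()` only returns days that have at least one news item, so a day without news disappears from the Gold table instead of showing a count of zero. A hedged sketch that reindexes the daily counts over the full calendar range; the sample DataFrame is made up for illustration, and whether zero-count days belong in the Gold layer is a design choice the exercise does not specify.

```python
import pandas as pd

# Made-up sample: no news on 2023-09-27
df = pd.DataFrame({"date_day": pd.to_datetime(["2023-09-26", "2023-09-26", "2023-09-28"]).date})

counts = df.groupby("date_day").size()
# Every calendar day between the first and last observed day
full_range = pd.date_range(counts.index.min(), counts.index.max(), freq="D").date
day_count = (
    counts.reindex(full_range, fill_value=0)
    .rename_axis("date_day")
    .reset_index(name="count")
)
print(day_count)
```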


In [169]:
# "Average sentiment" per day
day_sent_score_mean = df_gold.groupby('date_day')['ticker_sentiment_score'].mean().reset_index(name='sent_score_mean')
day_sent_relevance_mean = df_gold.groupby('date_day')['relevance_score'].mean().reset_index(name='sent_rel_mean')

day_sent_mean = day_sent_score_mean.merge(day_sent_relevance_mean, on='date_day')
day_sent_mean

| | date_day | sent_score_mean | sent_rel_mean |
|---|---|---|---|
| 0 | 2023-09-26 | 0.128952 | 0.402341 |
| 1 | 2023-09-27 | 0.067885 | 0.324555 |
| 2 | 2023-09-28 | 0.137323 | 0.331353 |
| 3 | 2023-09-29 | 0.119102 | 0.320436 |
| 4 | 2023-09-30 | 0.052039 | 0.287241 |
| 5 | 2023-10-01 | 0.073951 | 0.218645 |
| 6 | 2023-10-02 | 0.174032 | 0.415716 |
| 7 | 2023-10-03 | 0.081395 | 0.373721 |
| 8 | 2023-10-04 | 0.143748 | 0.350842 |
| 9 | 2023-10-05 | 0.096738 | 0.31016 |
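
The daily count and the two means can also be computed in a single `groupby(...).agg(...)` call with named aggregation, yielding one Gold table. The exercise does not define "average sentiment" precisely; besides the simple mean used above, this sketch adds a relevance-weighted mean (sum of score × relevance divided by sum of relevance) as one possible alternative. That weighting, the `sent_score_weighted` column name and the sample data are assumptions for illustration. The resulting DataFrame could then be written with `to_sql('Gold', engine, ...)`, as done for the other layers.

```python
import pandas as pd

# Made-up Silver rows for illustration
silver = pd.DataFrame({
    "date_day": ["2023-10-08", "2023-10-08", "2023-10-07"],
    "relevance_score": [0.2, 0.6, 0.5],
    "ticker_sentiment_score": [0.5, -0.1, 0.3],
})

# Helper column so the weighted mean can be built from plain sums
silver["weighted_score"] = silver["ticker_sentiment_score"] * silver["relevance_score"]

gold = silver.groupby("date_day").agg(
    count=("ticker_sentiment_score", "count"),
    sent_score_mean=("ticker_sentiment_score", "mean"),
    sent_rel_mean=("relevance_score", "mean"),
    weighted_sum=("weighted_score", "sum"),
    relevance_sum=("relevance_score", "sum"),
)
# Relevance-weighted sentiment: sum(score * relevance) / sum(relevance)
gold["sent_score_weighted"] = gold["weighted_sum"] / gold["relevance_sum"]
gold = gold.drop(columns=["weighted_sum", "relevance_sum"]).reset_index()
print(gold)
```

Weighting by `relevance_score` lets articles that are mostly about Bitcoin influence the daily figure more than articles that only mention it in passing.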
