
# <font color=orange><b>SQL: BigQuery functions</b></font>

<a href=https://colab.research.google.com/drive/1dBwFU7mAL-1sPYIF8Do53rsBjA4Q6QwD><font size=2 color=gray>Colab file</font></a>

<ul><font size=2 color=gray>COURSE DETAILS
<li><a href=https://cursos.alura.com.br/course/bigquery-funcoes><font size=2 color=gray>BigQuery: funções do BigQuery</font></a>
<li>Workload: 16 h
<li>Instructor: Victorino Vila
<li>Start date: 08-2022
</ul>


This notebook contains notes on the main SQL functions, based on Google BigQuery.

## <font color=orange><b>Contents</b></font>

* SQL data types (BigQuery)
    * Conversion with CAST
* SQL functions
    * Numeric
        * round
        * Decimal precision
        * Safe division
        * Scientific notation
        * range_bucket
    * Logical
        * Logical operators
        * Conditional expressions
        * IF clause
        * CASE WHEN clause
        * COALESCE
    * String
        * UPPER, LOWER, INITCAP
        * LTRIM, RTRIM, TRIM, LEFT, RIGHT, SUBSTR
        * CHAR_LENGTH, STARTS_WITH, ENDS_WITH, CONCAT
        * INSTR, STRPOS, REVERSE, REPLACE, SPLIT
        * Regular expressions
        * Formats
    * Date and time
        * Datetime
        * Timestamp
        * add, sub, diff
        * extract, trunc, last_day
        * generate_date_array
        * Format, UNIX
    * Geographic
        * BigQuery GeoViz
        * ST_GEOGPOINT, ST_MAKELINE, ST_MAKEPOLYGON

# <font color=orange><b>Data for experimentation</b></font>

```sql
WITH example AS (
  SELECT true AS is_valid, 'a' as letter, 1 as position
  UNION ALL SELECT false , 'b', 2
  UNION ALL SELECT false , 'c', 3
)
SELECT * FROM example;
```

# <font color=orange><b>Functions</b></font>

## <font color=orange>Numeric</font>

```sql
-- Sample data: prepend this WITH clause to the queries below that read from `exemplo`
WITH exemplo AS
(SELECT 'Sat' AS Day, 1451 AS numrides, 1018 AS oneways
UNION ALL SELECT 'Sun', 2376, 936)
```

```sql
-- Rounding
SELECT *, ROUND((numrides/oneways), 3) AS frac_rounded
FROM exemplo;
```


```sql
-- Division by 0: IEEE_DIVIDE never raises an error
SELECT IEEE_DIVIDE(-1, 0) AS neg, IEEE_DIVIDE(0, 0) AS zero, IEEE_DIVIDE(1, 0) AS pos;
```

```
>> -inf, NaN, +inf
```


The `SAFE.` prefix makes a function return `NULL` instead of raising an error when the operation cannot be performed.

```sql
-- SAFE. prefix: LOG(10, -3) would fail (negative base), so NULL is returned
SELECT SAFE.LOG(10, -3) AS logarithm;
-- >> NULL
```
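For day-to-day "safe division", `SAFE_DIVIDE` is usually the more convenient choice: it returns `NULL` when the divisor is zero, whereas `/` raises an error and `IEEE_DIVIDE` returns infinity or `NaN`. A minimal sketch, assuming a hypothetical extra row (`'Mon'`) with `oneways = 0` added to the sample data:

```sql
WITH exemplo AS (
  SELECT 'Sat' AS Day, 1451 AS numrides, 1018 AS oneways
  UNION ALL SELECT 'Mon', 500, 0  -- hypothetical row with a zero divisor
)
SELECT
  Day,
  SAFE_DIVIDE(numrides, oneways) AS frac_safe,  -- NULL for 'Mon'
  IEEE_DIVIDE(numrides, oneways) AS frac_ieee   -- +inf for 'Mon'
FROM exemplo;
```

Writing `numrides / oneways` in the same query would make the whole query fail on the `'Mon'` row.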

### Decimal precision in calculations

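* **NUMERIC**: exact decimal type (up to 38 digits, 9 of them after the decimal point). Use it when results must not carry binary rounding errors, as in financial calculations.

* **FLOAT** (`FLOAT64`): approximate binary floating-point type, suited to scientific calculations that need a wide range of magnitudes and can tolerate tiny rounding errors.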

### **FLOAT**

```sql
WITH exemplo AS
(SELECT 1.23 as PAYMENT
UNION ALL SELECT 7.89
UNION ALL SELECT 12.43)

SELECT sum(PAYMENT) as TOTAL_PAYMENT, avg(PAYMENT) as AVG_PAYMENT FROM exemplo;
```

```
21.54999999, 7.18333333334
```

### **NUMERIC**

```sql
WITH exemplo AS
(SELECT numeric '1.23' as PAYMENT
UNION ALL SELECT numeric '7.89'
UNION ALL SELECT numeric '12.43')

SELECT sum(PAYMENT) as TOTAL_PAYMENT, avg(PAYMENT) as AVG_PAYMENT FROM exemplo;
```

```
21.55, 7.183333
```
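The difference between the two types shows up even in a single addition. A minimal sketch with arbitrary literal values: a literal such as `0.1` is `FLOAT64` in BigQuery, so the first comparison suffers binary rounding, while the `NUMERIC` version is exact.

```sql
SELECT
  0.1 + 0.2 = 0.3 AS float_equal,                                  -- FALSE
  NUMERIC '0.1' + NUMERIC '0.2' = NUMERIC '0.3' AS numeric_equal;  -- TRUE
```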

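* `SIGN`: returns `-1` for negative values, `0` for zero and `1` for positive values.
* `IS_NAN`: returns `TRUE` if the value is `NaN`.
* `IS_INF`: returns `TRUE` if the value is positive or negative infinity.


* `RAND()`: pseudo-random `FLOAT64` in the interval [0, 1).
* `SQRT`: square root.
* `POW(2, 4)`: exponentiation (2 to the 4th power = 16).
* `LN`, `LOG`, `LOG10`: natural logarithm, logarithm in a given base (`LOG(X, Y)` is the base-`Y` logarithm of `X`; without `Y` it is the natural logarithm) and base-10 logarithm.
* `MOD(10, 3)`: remainder of the integer division (here `1`); it plays the role of the `%` operator found in other SQL dialects.

* `ROUND`: rounds to the nearest value; halfway cases (x.5) are rounded away from zero.
* `TRUNC`: discards the decimal part, rounding toward zero, which differs from `FLOOR` for negative numbers.
* `CEIL`, `FLOOR`: smallest integer not less than the value, and largest integer not greater than the value.

* `GREATEST`: returns the largest of its arguments (it takes a list of values, not an array).
* `LEAST`: returns the smallest of its arguments.


* `SAFE_MULTIPLY`: multiplies like `*`, but returns `NULL` on overflow instead of raising an error.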
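The behaviors listed above can be checked on literal values. A quick sketch; the inputs are arbitrary examples and the comments show the results expected from the rounding rules described in the list:

```sql
SELECT
  SIGN(-7) AS sign_neg,                      -- -1
  ROUND(2.5) AS round_pos,                   -- 3.0
  ROUND(-2.5) AS round_neg,                  -- -3.0 (away from zero)
  TRUNC(-2.7) AS trunc_neg,                  -- -2.0 (toward zero)
  FLOOR(-2.7) AS floor_neg,                  -- -3.0
  CEIL(2.1) AS ceil_pos,                     -- 3.0
  MOD(10, 3) AS remainder,                   -- 1
  GREATEST(3, 9, 5) AS biggest,              -- 9
  LEAST(3, 9, 5) AS smallest,                -- 3
  IS_NAN(IEEE_DIVIDE(0, 0)) AS nan_check,    -- TRUE
  IS_INF(IEEE_DIVIDE(1, 0)) AS inf_check;    -- TRUE
```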

### **RANGE_BUCKET**

A tool that makes it easy to identify the **value range** (bucket) each value falls into. It is a simple alternative to conditional expressions (`IF`, `CASE WHEN`).

> `RANGE_BUCKET(point, boundaries_array)` scans an array sorted in ascending order and returns the 0-based position of the point's upper bound, i.e. how many boundaries are less than or equal to `point`. A value below the first boundary returns `0`.

```sql
-- HOW MANY STUDENTS FALL IN EACH AGE RANGE: [9, 13), [13, 15) AND [15, 19)?
WITH Students AS
(SELECT 'A1' AS ALUNO, 11 AS AGE
UNION ALL SELECT 'A2' , 12
UNION ALL SELECT 'A3' , 11
UNION ALL SELECT 'A4' , 14
UNION ALL SELECT 'A5' , 17
UNION ALL SELECT 'A6' , 17
UNION ALL SELECT 'A7' , 18
UNION ALL SELECT 'A8' , 16
UNION ALL SELECT 'A9' , 11
UNION ALL SELECT 'A10' , 12
UNION ALL SELECT 'A11' , 13
UNION ALL SELECT 'A12' , 13
UNION ALL SELECT 'A13' , 16)
SELECT RANGE_BUCKET( AGE, [9, 13, 15, 19]), COUNT(*) FROM Students
GROUP BY 1;
```
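Edge values make the bucket rule concrete: a value equal to a boundary already belongs to the next bucket. A minimal sketch using the same boundaries and an arbitrary list of ages:

```sql
SELECT age, RANGE_BUCKET(age, [9, 13, 15, 19]) AS bucket
FROM UNNEST([8, 9, 12, 13, 18, 19, 25]) AS age
ORDER BY age;
-- 8 -> 0, 9 -> 1, 12 -> 1, 13 -> 2, 18 -> 3, 19 -> 4, 25 -> 4
```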

## <font color=orange>Logical</font>

* OR, AND, NOT

### Conditional expressions



#### **IF**

`if (expression, when_true, when_false)`

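```sql
-- Assumes the `catalog` table from the COALESCE example (prepend its WITH clause)
SELECT
    IF(costPrice IS NULL, 30.0, costPrice) *
    IF(margin IS NULL, 0.10, margin) AS price
FROM catalog;
```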

#### **CASE WHEN**

```sql
-- CASE WHEN - RANGE_BUCKET
WITH Students AS
(SELECT 'A1' AS ALUNO, 11 AS AGE
UNION ALL SELECT 'A2' , 12
UNION ALL SELECT 'A3' , 11
UNION ALL SELECT 'A4' , 14
UNION ALL SELECT 'A5' , 17
UNION ALL SELECT 'A6' , 17
UNION ALL SELECT 'A7' , 18
UNION ALL SELECT 'A8' , 16
UNION ALL SELECT 'A9' , 11
UNION ALL SELECT 'A10' , 12
UNION ALL SELECT 'A11' , 13
UNION ALL SELECT 'A12' , 13
UNION ALL SELECT 'A13' , 16)
SELECT ALUNO, RANGE_BUCKET( AGE, [9, 13, 15, 18]),
CASE 
  WHEN AGE >= 9 AND AGE < 13 THEN '1'
  WHEN AGE >= 13 AND AGE < 15 THEN '2'
  WHEN AGE >= 15 AND AGE < 18 THEN '3'
  ELSE '4' END 
FROM Students;
```
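`CASE` also has a *simple* form that compares one expression against a list of values, instead of evaluating a boolean condition in each `WHEN`. A minimal sketch using `EXTRACT(DAYOFWEEK ...)` (1 = Sunday, 7 = Saturday); the date range is an arbitrary example:

```sql
SELECT
  d,
  CASE EXTRACT(DAYOFWEEK FROM d)
    WHEN 1 THEN 'weekend'  -- Sunday
    WHEN 7 THEN 'weekend'  -- Saturday
    ELSE 'weekday'
  END AS day_type
FROM UNNEST(GENERATE_DATE_ARRAY(DATE '2022-08-12', DATE '2022-08-15')) AS d;
```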

#### **COALESCE**

`COALESCE` returns its first non-`NULL` argument: `SELECT COALESCE(NULL, 'B', 'C')` returns `'B'`.

`COALESCE(margin, 0.10)` is equivalent to `IF(margin IS NULL, 0.10, margin)`.



```sql
-- COALESCE
WITH catalog AS (
  SELECT 30.0 AS costPrice, 0.15 as margin, 0.1 as taxRate
  UNION ALL SELECT NULL, 0.21, 0.15
  UNION ALL SELECT 30.0, NULL, 0.09
  UNION ALL SELECT 30.0, 0.30, NULL
  UNION ALL SELECT 30.0, NULL, 0.10
)
SELECT 
  IF (costPrice IS NULL, 30.0, costPrice) * 
  IF (margin IS NULL, 0.10, margin) * 
  IF (taxrate IS NULL, 0.15, taxrate) 
  as FORMULA1 ,
  COALESCE (
    costPrice * margin * taxrate, 
    30.0 * margin * taxrate, 
    costprice * 0.10 * taxrate, 
    costPrice * margin * 0.15
  ) as FORMULA2 FROM catalog;
```
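When only one column needs a default, `COALESCE` with two arguments and `IFNULL` give the same result as the `IF ... IS NULL` pattern. A minimal sketch, assuming a reduced, hypothetical `catalog` with only a `margin` column:

```sql
WITH catalog AS (
  SELECT 0.15 AS margin
  UNION ALL SELECT NULL  -- hypothetical row with a missing margin
)
SELECT
  margin,
  IF(margin IS NULL, 0.10, margin) AS with_if,   -- 0.15, 0.10
  COALESCE(margin, 0.10) AS with_coalesce,       -- 0.15, 0.10
  IFNULL(margin, 0.10) AS with_ifnull            -- 0.15, 0.10
FROM catalog;
```

`COALESCE` becomes more useful than `IFNULL` when there are several fallback columns, since it accepts any number of arguments.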

## <font color=orange>Type conversion</font>

```sql
SELECT CAST('1' AS INT64);
-- >> 1
```

```sql
SELECT SAFE_CAST('ele' AS INT64);
-- >> NULL
```

```sql
-- Safe conversion: values that cannot be converted become NULL and are ignored by SUM
WITH example AS (
    SELECT 'John' AS employee, 'Sick' as Hours_work
    UNION ALL SELECT 'Jean', '100'
    UNION ALL SELECT 'Peter', 'On vacation'
    UNION ALL SELECT 'Mary', '80'
)
SELECT SUM (SAFE_CAST(Hours_work AS INT64)) AS TOTAL FROM example;
```
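`CAST` to `DATE` only accepts the canonical `YYYY-MM-DD` form; for other layouts, `PARSE_DATE` with a format string is the usual route. A short sketch on arbitrary literal values (note that casting `FLOAT64` to `INT64` rounds rather than truncates):

```sql
SELECT
  CAST('2022-08-15' AS DATE) AS as_date,              -- 2022-08-15
  SAFE_CAST('15/08/2022' AS DATE) AS bad_format,      -- NULL: not in YYYY-MM-DD form
  PARSE_DATE('%d/%m/%Y', '15/08/2022') AS parsed,     -- 2022-08-15
  CAST(3.7 AS INT64) AS float_to_int,                 -- 4 (rounded, not truncated)
  CAST(42 AS STRING) AS int_to_string;                -- '42'
```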

## <font color=orange>STRINGS</font>

**LOWER, UPPER, INITCAP**

`INITCAP` returns the string with the first character of each word in uppercase and all other characters in lowercase. An optional second argument lists the characters that act as word delimiters.

```sql
-- INITCAP with custom delimiters
WITH examples AS
 (SELECT "Alo Mundo-todo mundo!" AS FRASES, " " AS DELIMITER
 UNION ALL SELECT "o cachorro TORNADO é alegre+manso", "+"
 UNION ALL SELECT "maça&laranja&pera", "&"
 UNION ALL SELECT "tata ta tavendo a tatia", "t")
 SELECT FRASES, INITCAP(FRASES), INITCAP(FRASES, DELIMITER) FROM examples;
 ```

**LTRIM, RTRIM, TRIM, LEFT, RIGHT, SUBSTR**

* **SUBSTR**: extracts a substring: `SUBSTR(value, position[, length])`, with `position` starting at 1.



```sql
-- TRIM functions
WITH items AS
 (SELECT "     MAÇA     " AS ITEM
 UNION ALL SELECT "     BANANA     "
 UNION ALL SELECT "     LARANJA     ")
 SELECT ITEM, LTRIM(ITEM), RTRIM(ITEM), TRIM(ITEM) FROM items;

-- left, right
WITH items AS
 (SELECT "     MAÇA     " AS ITEM
 UNION ALL SELECT "     BANANA     "
 UNION ALL SELECT "     LARANJA     ")
 SELECT TRIM(ITEM), LEFT(TRIM(ITEM), 2), RIGHT(TRIM(ITEM),2) FROM items;

-- SUBSTR
 WITH example AS
(SELECT 'banana' AS source_value
UNION ALL SELECT 'melancia'
UNION ALL SELECT 'tangerina')
SELECT source_value, SUBSTR(source_value,3,3) FROM example;

WITH example AS
(SELECT 'banana' AS source_value
UNION ALL SELECT 'melancia'
UNION ALL SELECT 'tangerina')
SELECT source_value, SUBSTR(source_value,3) FROM example;
 ```
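Two variations not covered above: `TRIM`/`LTRIM`/`RTRIM` accept an optional second argument with the set of characters to remove (spaces by default), and a negative `SUBSTR` position counts from the end of the string. A sketch on arbitrary literals:

```sql
SELECT
  TRIM('***MAÇA***', '*') AS trim_chars,       -- 'MAÇA'
  LTRIM('000123', '0') AS no_leading_zeros,    -- '123'
  SUBSTR('tangerina', -4) AS last_four,        -- 'rina'
  SUBSTR('tangerina', 2, 3) AS middle;         -- 'ang'
```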

**CHAR_LENGTH, STARTS_WITH, ENDS_WITH, CONCAT**

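* CHAR_LENGTH: number of characters in the string.
* STARTS_WITH, ENDS_WITH: return `TRUE` or `FALSE` depending on whether the string starts or ends with the given value (case-sensitive).
* CONCAT: joins multiple strings, e.g. `CONCAT(A, ' ', B)`; returns `NULL` if any argument is `NULL`.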

```sql
-- CHAR_LENGTH, STARTS_WITH, ENDS_WITH, CONCAT functions
WITH examples AS
(SELECT "DR" AS Titulo, "Carlos" as NOME, "Junior" as SOBRENOME
UNION ALL SELECT "SR", "Marcos", "Almeida"
UNION ALL SELECT "DR" , "Mario", "Costa"
UNION ALL SELECT "MS" , "Maria", "Rosa")
SELECT CONCAT (Titulo, " ", Nome, " ", Sobrenome), 
CHAR_LENGTH(CONCAT (Titulo, " ", Nome, " ", Sobrenome)),
STARTS_WITH(CONCAT (Titulo, " ", Nome, " ", Sobrenome), "DR"), 
ENDS_WITH(CONCAT (Titulo, " ", Nome, " ", Sobrenome), "Dr") FROM examples;
 ```

```sql
SELECT CPF, NOME, CONCAT(ENDERECO_1, ' ', BAIRRO, ' ', CIDADE, ' ', ESTADO, ' ', CEP) AS ENDERECO_COMPLETO FROM `curso-big-query-0965.sucos_vendas.tabela_de_clientes`
ORDER BY NOME ;
```
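Because `CONCAT` returns `NULL` as soon as one argument is `NULL`, an address built as above disappears entirely if, say, the district is missing. A sketch of two workarounds on hypothetical literal values: wrapping the nullable part in `IFNULL`, or using `ARRAY_TO_STRING`, which skips `NULL` elements (and their delimiter) when no replacement text is given.

```sql
SELECT
  CONCAT('Rua A', ' ', NULL) AS with_null,                        -- NULL
  CONCAT('Rua A', ' ', IFNULL(NULL, 'n/a')) AS null_safe,         -- 'Rua A n/a'
  ARRAY_TO_STRING(['Rua A', 'Centro', NULL, 'RJ'], ' ') AS joined; -- 'Rua A Centro RJ'
```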

**INSTR, SUBSTR, STRPOS, REVERSE, REPLACE, SPLIT**

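* **INSTR**: returns the position (starting at 1) of a substring within a string; optional arguments set the starting position and which occurrence to find. Returns `0` if there is no match.
* **STRPOS**: returns the position of the first occurrence of a substring (`0` if not found). Combined with `SUBSTR`, it extracts the part of a string before a given character.
* **REVERSE**: writes the string backwards.
* **REPLACE**: replaces every occurrence of a substring with another.
* **SPLIT**: turns the string into an array, using a delimiter.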


```sql
-- INSTR function
WITH example AS
(SELECT 'banana' AS source_value, 'an' AS search_value, 1 as position, 1 as occurrence
UNION ALL SELECT 'banana' AS source_value, 'an' AS search_value, 3 as position, 1 as occurrence
UNION ALL SELECT 'banana' AS source_value, 'xx' AS search_value, 1 as position, 2 as occurrence)
SELECT *, INSTR(source_value, search_value, position, occurrence) FROM example;
-- >> 2, 4, 0
 ```

 ```sql
-- STRPOS function: keep the text before the @
WITH example AS
(SELECT 'foo@example.com' AS source_value
UNION ALL SELECT 'victor@gmail.com'
UNION ALL SELECT 'quexample@brazil.com')
SELECT source_value, SUBSTR(source_value,1, STRPOS(source_value, "@") - 1) FROM example;
 ```
 
 ```sql
-- REPLACE function
WITH example AS
(SELECT 'foo@example.com' AS source_value
UNION ALL SELECT 'victor@gmail.com'
UNION ALL SELECT 'quexample@brazil.com')
SELECT source_value, REPLACE(source_value, "@","XXXXXX") FROM example;
 ```

  ```sql
-- SPLIT function: splits a string into an array
WITH example AS
(SELECT 'foo@example.com' AS source_value
UNION ALL SELECT 'victor@gmail.com'
UNION ALL SELECT 'quexample@brazil.com')
SELECT source_value, SPLIT(source_value, "@") FROM example;
 ```
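Two follow-ups to the examples above: a single element of the array returned by `SPLIT` can be read with `[OFFSET(n)]` (0-based), and `REVERSE`, listed earlier, simply mirrors the string. A sketch reusing two of the sample e-mail addresses:

```sql
WITH example AS (
  SELECT 'foo@example.com' AS source_value
  UNION ALL SELECT 'victor@gmail.com'
)
SELECT
  source_value,
  SPLIT(source_value, '@')[OFFSET(1)] AS domain,  -- 'example.com', 'gmail.com'
  REVERSE(source_value) AS reversed               -- 'moc.elpmaxe@oof', ...
FROM example;
```

`[SAFE_OFFSET(n)]` returns `NULL` instead of an error when the index is out of range, which is safer for values without an `@`.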

**REGULAR EXPRESSIONS**

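* REGEXP_CONTAINS: returns `TRUE` if the string contains a match for the regular expression, `FALSE` otherwise.
* REGEXP_EXTRACT: `REGEXP_EXTRACT(value, regexp, position, occurrence)` returns the matched substring (or `NULL` if there is no match); `position` and `occurrence` are optional.
* REGEXP_EXTRACT_ALL: returns an array with all matches.
* REGEXP_REPLACE: replaces every match with a replacement string.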
* ...

```sql
-- REGULAR EXPRESSION
SELECT FIELD,
REGEXP_CONTAINS(FIELD, r'[0-9]{5}-[0-9]{3}') AS TEM_CEP,
REGEXP_EXTRACT(FIELD, r'[0-9]{5}-[0-9]{3}', 1, 1) AS CEP,
REGEXP_EXTRACT(FIELD, r'[0-9]{5}-[0-9]{3}', 1, 2) AS CEP2,
REGEXP_EXTRACT_ALL(FIELD, r'[0-9]{5}-[0-9]{3}') AS CEP3,
REGEXP_REPLACE(FIELD, r'[0-9]{5}-[0-9]{3}', 'XXXXX-XXX') AS CEP_MASKED
FROM
(SELECT * from UNNEST
(["22222-22","     22222-222  ","Meu CEP é 222222-22", "Do CEP 22222-222 ATÉ O 22333-222"]) AS FIELD);
```
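Capturing groups refine both functions: when the pattern has one group, `REGEXP_EXTRACT` returns only that group, and `REGEXP_REPLACE` can reuse groups in the replacement with `\1`, `\2`. A sketch on hypothetical literal strings:

```sql
SELECT
  -- Only the captured year is returned, not the whole date
  REGEXP_EXTRACT('order-2022-08-15', r'(\d{4})-\d{2}-\d{2}') AS year_only,  -- '2022'
  -- Swap the two parts of a CEP-like value using back-references
  REGEXP_REPLACE('22222-222', r'(\d{5})-(\d{3})', r'\2-\1') AS swapped;     -- '222-22222'
```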

**FORMAT**

```sql
-- Zero-pad the integer to 15 digits
SELECT FORMAT("%015d", 10);
-- >> 000000000000010
```

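* Thousands separator (the `'` flag): `FORMAT("%015'd", 10)`
* Float with the precision passed as an argument (`.*`): `FORMAT("%.*f", 3, 10.785675)` returns `10.786`
* Scientific notation: `FORMAT("%.*e", 3, 10.7e1)` returns `1.070e+02`
* Explicit sign (the `+` flag): `FORMAT("%+.*e", 3, 10.7e1)` returns `+1.070e+02`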
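The specifiers above can be combined in one query to compare outputs side by side. A sketch with arbitrary input values; `%s` (plain text substitution) is included as an assumption-free extra since it is part of the same format syntax:

```sql
SELECT
  FORMAT("%'d", 1234567) AS thousands,            -- '1,234,567'
  FORMAT("%.*f", 3, 10.785675) AS fixed3,         -- '10.786'
  FORMAT("%.3e", 107.0) AS scientific,            -- '1.070e+02'
  FORMAT("%+d", 10) AS signed,                    -- '+10'
  FORMAT("%s has %d rows", "tabela", 5) AS text;  -- 'tabela has 5 rows'
```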

## <font color=orange>DATETIME - TIMESTAMP</font>

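* **DATETIME**: date and time without a time zone; built with a function such as `DATETIME(2020, 7, 1, 10, 0, 0)` or from a `TIMESTAMP`.
* **TIMESTAMP**: an absolute point in time, independent of time zone (displayed in UTC); built from a string such as `TIMESTAMP('2020-07-01 10:00:00')`.
* **DATE**: a calendar date; built with `DATE(2020, 7, 1)` or from a `TIMESTAMP`.
* **TIME**: a time of day; built with `TIME(10, 0, 0)` or from a `TIMESTAMP`.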

**CURRENT VALUES**

```sql
-- DATETIME, TIMESTAMP, DATE, TIME
SELECT CURRENT_DATETIME,
    CURRENT_TIMESTAMP,
    CURRENT_DATE,
    CURRENT_TIME;
 ```

**CHANGING THE TIME ZONE**

```sql
-- CHANGING THE TIME ZONE
SELECT CURRENT_DATETIME('America/Sao_Paulo'),
    CURRENT_DATETIME('Europe/London'),
    CURRENT_TIMESTAMP,
    CURRENT_DATE,
    CURRENT_TIME;
```

**CREATING DATE VALUES**
```sql
SELECT TIMESTAMP('2020-07-01 10:00:00'),
    DATETIME (2020, 7, 1, 10, 0 , 0),
    DATE(2020, 7, 1),
    TIME(10, 0, 0),
    DATE(TIMESTAMP('2020-07-01 10:00:00')),
    DATETIME(TIMESTAMP('2020-07-01 10:00:00')),
    TIME(TIMESTAMP('2020-07-01 10:00:00'));
```

**ADDITION, SUBTRACTION AND DIFFERENCE**

```sql
SELECT 
-- ADD
  DATE_ADD (DATE(2008, 12, 25), INTERVAL 5 DAY) AS CINCO_DIAS_DEPOIS,
  DATE_ADD (DATE(2008, 12, 25), INTERVAL 4 YEAR) AS QUATRO_ANOS_DEPOIS,
  TIMESTAMP_ADD (CURRENT_TIMESTAMP, INTERVAL 45 MINUTE) AS QUARENTA_CINCO_MINUTOS_DEPOIS,
-- SUB
  DATE_SUB (DATE(2008, 12, 25), INTERVAL 5 DAY) AS CINCO_DIAS_ANTES,
  DATE_SUB (DATE(2008, 12, 25), INTERVAL 4 YEAR) AS QUATRO_ANOS_ANTES,
  TIMESTAMP_SUB (CURRENT_TIMESTAMP, INTERVAL 45 MINUTE) AS QUARENTA_CINCO_MINUTOS_ANTES,
-- DIFF
  DATE_DIFF (DATE(2010,12,25), DATE(2008, 9, 15), DAY),
  DATETIME_DIFF (CURRENT_DATETIME, DATETIME(TIMESTAMP('2020-07-01 10:00:00')), MINUTE);
```
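Two details of date arithmetic are worth checking on fixed dates: adding months to a month-end date is clamped to the last day of the target month, and `DATE_DIFF(..., YEAR)` counts calendar-year boundaries crossed rather than full years elapsed. A sketch with arbitrary dates:

```sql
SELECT
  DATE_ADD(DATE '2022-01-31', INTERVAL 1 MONTH) AS one_month_later,    -- 2022-02-28
  DATE_SUB(DATE '2024-03-31', INTERVAL 1 MONTH) AS one_month_before,   -- 2024-02-29 (leap year)
  DATE_DIFF(DATE '2023-01-01', DATE '2022-12-31', YEAR) AS year_diff,  -- 1, although only 1 day apart
  DATE_DIFF(DATE '2023-01-01', DATE '2022-12-31', DAY) AS day_diff;    -- 1
```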

### Extract, Trunc, Last_day, generate_date_array

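* EXTRACT: extracts a part (year, month, day, day of the week...) from a date.
* GENERATE_DATE_ARRAY: builds an array with a range of dates.
* DATE_TRUNC / DATETIME_TRUNC: truncate the value to the chosen granularity, resetting the smaller units.
* LAST_DAY: returns the last day of the period (month, quarter, year, week...) that contains the date.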


```sql
-- Extracting date parts
SELECT DATA,
  EXTRACT(MONTH FROM DATA) AS MES,
  EXTRACT(DAY FROM DATA) AS DIA,
  EXTRACT(YEAR FROM DATA) AS ANO,
  EXTRACT(DAYOFWEEK FROM DATA) AS DIA_DA_SEMANA  -- 1 = Sunday ... 7 = Saturday
FROM UNNEST (GENERATE_DATE_ARRAY('2015-12-23', '2016-01-09')) AS DATA
ORDER BY DATA;
```

```sql
-- Truncating the date
SELECT 
  DATETIME_ADD(CURRENT_DATETIME, INTERVAL 90 DAY), 
  DATETIME_TRUNC(DATETIME_ADD(CURRENT_DATETIME, INTERVAL 90 DAY), DAY), 
  DATETIME_TRUNC(DATETIME_ADD(CURRENT_DATETIME, INTERVAL 90 DAY), MINUTE), 
  DATETIME_TRUNC(DATETIME_ADD(CURRENT_DATETIME, INTERVAL 90 DAY), MONTH), 
  DATETIME_TRUNC(DATETIME_ADD(CURRENT_DATETIME, INTERVAL 90 DAY), YEAR);
```

```sql
SELECT 
  DATETIME_ADD(CURRENT_DATETIME, INTERVAL 90 DAY), 
  LAST_DAY(DATETIME_ADD(CURRENT_DATETIME, INTERVAL 90 DAY), MONTH),
  LAST_DAY(DATETIME_ADD(CURRENT_DATETIME, INTERVAL 90 DAY), YEAR);
```

```sql
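-- Note: DATE_DIFF(..., YEAR) counts year boundaries crossed, so before the birthday it overstates the age by 1
WITH TAB_IDADE AS (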
SELECT NOME, DATE_DIFF(CURRENT_DATE, DATA_DE_NASCIMENTO, YEAR) AS IDADE_ATUAL, IDADE 
FROM `curso-big-query-0965.sucos_vendas.tabela_de_clientes`)

SELECT NOME, IDADE_ATUAL, IDADE,
CASE WHEN (IDADE_ATUAL - IDADE) <> 0 THEN 'IDADE NÃO BATE'
ELSE 'IDADE BATE COM A BASE DE DADOS' END AS RESULTADO FROM TAB_IDADE;
```
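Since `DATE_DIFF(..., YEAR)` only subtracts the years, a person whose birthday has not happened yet in the reference year is counted one year older. A minimal sketch of one common correction, assuming a hypothetical `person` row: subtract 1 when the reference date's month-day comes before the birth date's month-day.

```sql
WITH person AS (
  -- hypothetical values
  SELECT DATE '2000-08-20' AS birth_date, DATE '2022-08-15' AS ref_date
)
SELECT
  DATE_DIFF(ref_date, birth_date, YEAR) AS year_boundaries,  -- 22
  DATE_DIFF(ref_date, birth_date, YEAR)
    - IF(FORMAT_DATE('%m%d', ref_date) < FORMAT_DATE('%m%d', birth_date), 1, 0) AS age  -- 21
FROM person;
```

In this convention, someone born on 29 February is treated as having their birthday on 1 March in non-leap years.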

### Date formatting

* FORMAT_DATE
* FORMAT_TIME
* FORMAT_DATETIME
* FORMAT_TIMESTAMP

```sql
SELECT 
    CURRENT_DATETIME, 
    FORMAT_DATETIME('%A, Dia %d de %B de %Y', CURRENT_DATETIME);
```
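Formatting and parsing are inverse operations, and `FORMAT_TIMESTAMP` accepts an optional time zone. A sketch on fixed values (so the results are reproducible), assuming the Brazilian `dd/mm/yyyy` layout as the target format:

```sql
SELECT
  FORMAT_DATE('%d/%m/%Y', DATE '2022-08-15') AS br_format,   -- '15/08/2022'
  PARSE_DATE('%d/%m/%Y', '15/08/2022') AS parsed,            -- 2022-08-15
  FORMAT_TIMESTAMP('%Y-%m-%d %H:%M',
                   TIMESTAMP '2022-08-15 13:45:00+00',
                   'America/Sao_Paulo') AS sp_time;          -- '2022-08-15 10:45'
```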

### UNIX time

UNIX time (the number of seconds elapsed since 1970-01-01 00:00:00 UTC) is very common in logs and internal audit records.

```sql
-- Reading UNIX times
SELECT
  visitStartTime, TIMESTAMP_SECONDS(visitStartTime) FROM
 `bigquery-public-data.google_analytics_sample.ga_sessions_20170731`
LIMIT 10;
```

```sql
-- Converting to UNIX time
SELECT CURRENT_TIMESTAMP, UNIX_SECONDS(CURRENT_TIMESTAMP);

-- UNIX_DATE: number of days since 1970-01-01
SELECT UNIX_DATE(DATE "2008-12-25") as dias_de_diferenca;
-- >> 14238
```
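The conversions work in both directions, which is handy for checking a log value by hand. A sketch of the round trip on a fixed timestamp; `DATE_FROM_UNIX_DATE` is the inverse of `UNIX_DATE`:

```sql
SELECT
  UNIX_SECONDS(TIMESTAMP '2008-12-25 15:30:00+00') AS secs,  -- 1230219000
  TIMESTAMP_SECONDS(1230219000) AS back_to_ts,               -- 2008-12-25 15:30:00 UTC
  DATE_FROM_UNIX_DATE(14238) AS back_to_date;                -- 2008-12-25
```

Logs that store milliseconds instead of seconds are read with `TIMESTAMP_MILLIS` and written with `UNIX_MILLIS`.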

## <font color=orange>Spatial Functions</font>

**ENABLING GEOVIZ**

1. Search Google for "BigQuery geo viz" > [first link](https://cloud.google.com/bigquery/docs/gis-getting-started) > Enable the API > Choose the project
2. Search Google for "BigQuery geo viz" > [bigquerygeoviz.appspot.com](https://bigquerygeoviz.appspot.com/) > Authorize

```sql
-- Load the bike station data
SELECT *
FROM `bigquery-public-data.new_york.citibike_stations`;

-- Convert the coordinates into a GEOGRAPHY point (longitude first, then latitude)
SELECT ST_GEOGPOINT(longitude, latitude) AS Station, num_bikes_available
FROM 
`bigquery-public-data.new_york.citibike_stations`
WHERE num_bikes_available > 10;
```

```sql
-- Building lines (ST_GEOGPOINT takes longitude first, then latitude)
SELECT ST_MAKELINE(ARRAY_AGG(Ponto)) as Linha FROM
(SELECT ST_GEOGPOINT(-43.1730, -22.9349) AS Ponto
UNION ALL SELECT ST_GEOGPOINT(-43.1771, -22.9365));

-- Distance in meters
SELECT ST_DISTANCE (ST_GEOGPOINT(-43.1730, -22.9349),
                    ST_GEOGPOINT(-43.1771, -22.9365))
AS Distancia;
```

```sql
-- Building polygons (longitude first, then latitude)
SELECT ST_MAKEPOLYGON(ST_MAKELINE(ARRAY_AGG(Ponto))) as Poligono FROM
(SELECT ST_GEOGPOINT(-43.1730, -22.9349) AS Ponto
UNION ALL SELECT ST_GEOGPOINT(-43.1771, -22.9365)
UNION ALL SELECT ST_GEOGPOINT(-43.1781, -22.9375));

-- Area in square meters
SELECT ST_AREA(ST_MAKEPOLYGON(ST_MAKELINE(ARRAY_AGG(Ponto)))) as Area FROM
(SELECT ST_GEOGPOINT(-43.1730, -22.9349) AS Ponto
UNION ALL SELECT ST_GEOGPOINT(-43.1771, -22.9365)
UNION ALL SELECT ST_GEOGPOINT(-43.1781, -22.9375));
```
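A common next step is a proximity filter: `ST_DWITHIN` returns `TRUE` when two geographies are within a given distance in meters. A sketch on the same Citi Bike table, assuming a hypothetical reference point near the Empire State Building (longitude -73.9857, latitude 40.7484) and a 500 m radius:

```sql
-- Stations within 500 m of a hypothetical reference point
SELECT ST_GEOGPOINT(longitude, latitude) AS Station, num_bikes_available
FROM `bigquery-public-data.new_york.citibike_stations`
WHERE ST_DWITHIN(ST_GEOGPOINT(longitude, latitude),
                 ST_GEOGPOINT(-73.9857, 40.7484),
                 500);
```

Filtering with `ST_DWITHIN` is usually preferable to `ST_DISTANCE(...) <= 500`, since it expresses the intent directly and lets BigQuery optimize the spatial predicate.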