# Applying Python to Agrometeorology

This Jupyter Notebook is part of the Python course taught at the XXII Congresso Brasileiro de Agrometeorologia, held on October 3, 4, and 5, 2023, in Natal/RN.

## Importing the library

In [1]:
import pandas as pd

## Reading the file

In [2]:
# Read the file.
df = pd.read_csv('../input/csv/BRASILIA_A001.csv', sep=';')

# Convert the 'Data' column to datetime.
df['Data'] = pd.to_datetime(df['Data'], format='%d/%m/%Y')

# Set the 'Data' column as the index.
df.set_index('Data', inplace=True)
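
The CSV file itself is not included in this text. To experiment with the cells that follow, one option is to rebuild a small sample from the first ten rows displayed in the next cell. The sketch below assumes the file uses `;` as separator and `dd/mm/yyyy` dates (as the `read_csv` and `to_datetime` calls suggest) and that missing values are stored as empty fields. The name `sample_df` is hypothetical, and the real file has more rows than this sample.

```python
import io

import pandas as pd

# First ten rows as displayed in the notebook. Empty fields stand for
# missing values (an assumption about how the original file encodes them).
csv_text = """Data;Temperatura;Umidade;Pressao
01/09/2023;23.5;59.0;888.4
02/09/2023;21.6;67.0;888.9
03/09/2023;19.9;;889.1
04/09/2023;;;889.0
05/09/2023;;;888.5
06/09/2023;;77.0;888.0
07/09/2023;;80.0;887.8
08/09/2023;19.2;82.0;887.9
09/09/2023;21.5;;887.7
10/09/2023;21.4;;888.4
"""

# Same three steps as the cell above, applied to the in-memory sample.
sample_df = pd.read_csv(io.StringIO(csv_text), sep=';')
sample_df['Data'] = pd.to_datetime(sample_df['Data'], format='%d/%m/%Y')
sample_df = sample_df.set_index('Data')

print(sample_df.isnull().sum())
```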

In [3]:
# Display the DataFrame.
df

Data,Temperatura,Umidade,Pressao
2023-09-01,23.5,59.0,888.4
2023-09-02,21.6,67.0,888.9
2023-09-03,19.9,,889.1
2023-09-04,,,889.0
2023-09-05,,,888.5
2023-09-06,,77.0,888.0
2023-09-07,,80.0,887.8
2023-09-08,19.2,82.0,887.9
2023-09-09,21.5,,887.7
2023-09-10,21.4,,888.4


## Checking for missing data

In [4]:
# NA = Not Available.

df.isnull()

Data,Temperatura,Umidade,Pressao
2023-09-01,False,False,False
2023-09-02,False,False,False
2023-09-03,False,True,False
2023-09-04,True,True,False
2023-09-05,True,True,False
2023-09-06,True,False,False
2023-09-07,True,False,False
2023-09-08,False,False,False
2023-09-09,False,True,False
2023-09-10,False,True,False


In [5]:
df.isnull().astype(int)

Data,Temperatura,Umidade,Pressao
2023-09-01,0,0,0
2023-09-02,0,0,0
2023-09-03,0,1,0
2023-09-04,1,1,0
2023-09-05,1,1,0
2023-09-06,1,0,0
2023-09-07,1,0,0
2023-09-08,0,0,0
2023-09-09,0,1,0
2023-09-10,0,1,0


In [6]:
# Count the missing values in each column.

df.isnull().sum()

Temperatura    7
Umidade        8
Pressao        0
dtype: int64

In [7]:
numero_linhas, _ = df.shape

numero_linhas

20

In [8]:
# Percentage of NaN values in the 'Temperatura' column.

total_NAN = df['Temperatura'].isnull().sum()

porcentagem_temp = (total_NAN / numero_linhas) * 100

print(f'Percentage of NaN values: {porcentagem_temp}%')

Percentage of NaN values: 35.0%
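
The same percentage can be obtained for every column at once, because the mean of a boolean mask equals the fraction of `True` values. A minimal sketch, using a small hypothetical DataFrame (`example`) rather than the course data:

```python
import numpy as np
import pandas as pd

# Hypothetical data, only to illustrate the calculation.
example = pd.DataFrame({
    'Temperatura': [23.5, np.nan, np.nan, 19.2],
    'Umidade': [59.0, 67.0, np.nan, 82.0],
    'Pressao': [888.4, 888.9, 889.1, 889.0],
})

# isnull() gives True/False per cell; the column mean is the NaN fraction.
nan_percentage = example.isnull().mean() * 100
print(nan_percentage)
```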


## The dropna method

The dropna method discards **any row** that contains a missing value. **By default, it removes rows** (axis 0).

Keep in mind:
* axis 0 (index) refers to the rows: **drops the rows** that contain missing values.
* axis 1 (columns) refers to the columns: **drops the columns** that contain missing values.
* Only a single axis is allowed per call.

* Documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.dropna.html
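
Besides the default behaviour, `dropna` accepts parameters that control which rows or columns are removed. The sketch below uses a small hypothetical DataFrame (`example`) to illustrate `axis`, `how`, `subset`, and `thresh`, all described in the documentation linked above.

```python
import numpy as np
import pandas as pd

# Hypothetical data, only to illustrate the parameters.
example = pd.DataFrame({
    'Temperatura': [23.5, np.nan, np.nan, 19.2],
    'Umidade': [59.0, 67.0, np.nan, 82.0],
    'Pressao': [888.4, 888.9, 889.1, 889.0],
})

# Default (axis=0): keep only rows without any NaN.
rows_complete = example.dropna()

# axis=1: keep only columns without any NaN.
cols_complete = example.dropna(axis=1)

# how='all': drop a row only if every value in it is NaN.
rows_not_empty = example.dropna(how='all')

# subset: look for NaN only in the listed columns.
rows_with_temp = example.dropna(subset=['Temperatura'])

# thresh: keep rows that have at least this many non-NaN values.
rows_two_values = example.dropna(thresh=2)

print(rows_complete)
print(cols_complete)
```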

In [9]:
df

Data,Temperatura,Umidade,Pressao
2023-09-01,23.5,59.0,888.4
2023-09-02,21.6,67.0,888.9
2023-09-03,19.9,,889.1
2023-09-04,,,889.0
2023-09-05,,,888.5
2023-09-06,,77.0,888.0
2023-09-07,,80.0,887.8
2023-09-08,19.2,82.0,887.9
2023-09-09,21.5,,887.7
2023-09-10,21.4,,888.4


In [10]:
df.dropna()

Data,Temperatura,Umidade,Pressao
2023-09-01,23.5,59.0,888.4
2023-09-02,21.6,67.0,888.9
2023-09-08,19.2,82.0,887.9
2023-09-14,25.4,53.0,890.6
2023-09-15,27.0,48.0,890.3
2023-09-16,28.2,44.0,889.6
2023-09-20,29.2,36.0,886.5


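## The fillna method

The fillna method **fills missing values** with a given value or by **propagating neighboring observations**.

* Propagation methods: **'ffill'** (forward fill, repeats the last valid value) or **'bfill'** (backward fill, uses the next valid value).
* In pandas 2.1 and later, the `method` argument of `fillna` is deprecated; `DataFrame.ffill()` and `DataFrame.bfill()` give the same result.
* Documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.fillna.html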
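
Note that `'ffill'` and `'bfill'` copy the nearest valid observation into the gap rather than estimating intermediate values. For comparison, the sketch below applies `Series.interpolate`, a separate pandas method that this notebook does not use, to the temperature values recorded from 2023-09-03 to 2023-09-08 in the course data. The names `temperatura`, `forward`, and `linear` are only illustrative.

```python
import numpy as np
import pandas as pd

# Temperature values for 2023-09-03 .. 2023-09-08, with a four-day gap.
dates = pd.date_range('2023-09-03', '2023-09-08', freq='D')
temperatura = pd.Series([19.9, np.nan, np.nan, np.nan, np.nan, 19.2], index=dates)

# Forward fill repeats 19.9 across the whole gap.
forward = temperatura.ffill()

# Linear interpolation draws a straight line between 19.9 and 19.2,
# treating the observations as equally spaced.
linear = temperatura.interpolate(method='linear')

print(pd.DataFrame({'ffill': forward, 'linear': linear}))
```

Whether propagation or interpolation is more appropriate depends on the variable and on the length of the gap.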

In [11]:
df

Data,Temperatura,Umidade,Pressao
2023-09-01,23.5,59.0,888.4
2023-09-02,21.6,67.0,888.9
2023-09-03,19.9,,889.1
2023-09-04,,,889.0
2023-09-05,,,888.5
2023-09-06,,77.0,888.0
2023-09-07,,80.0,887.8
2023-09-08,19.2,82.0,887.9
2023-09-09,21.5,,887.7
2023-09-10,21.4,,888.4


### Fill NaN values with a specific value

* Fill the NaN values with -999.

In [12]:
df.fillna(-999)

Data,Temperatura,Umidade,Pressao
2023-09-01,23.5,59.0,888.4
2023-09-02,21.6,67.0,888.9
2023-09-03,19.9,-999.0,889.1
2023-09-04,-999.0,-999.0,889.0
2023-09-05,-999.0,-999.0,888.5
2023-09-06,-999.0,77.0,888.0
2023-09-07,-999.0,80.0,887.8
2023-09-08,19.2,82.0,887.9
2023-09-09,21.5,-999.0,887.7
2023-09-10,21.4,-999.0,888.4


### Forward fill of NaN values

In [13]:
# Same result as df.fillna(method='ffill'), which is deprecated since pandas 2.1.
df.ffill()

Data,Temperatura,Umidade,Pressao
2023-09-01,23.5,59.0,888.4
2023-09-02,21.6,67.0,888.9
2023-09-03,19.9,67.0,889.1
2023-09-04,19.9,67.0,889.0
2023-09-05,19.9,67.0,888.5
2023-09-06,19.9,77.0,888.0
2023-09-07,19.9,80.0,887.8
2023-09-08,19.2,82.0,887.9
2023-09-09,21.5,82.0,887.7
2023-09-10,21.4,82.0,888.4


### Backward fill of NaN values

In [14]:
# Same result as df.fillna(method='bfill'), which is deprecated since pandas 2.1.
df.bfill()

Data,Temperatura,Umidade,Pressao
2023-09-01,23.5,59.0,888.4
2023-09-02,21.6,67.0,888.9
2023-09-03,19.9,77.0,889.1
2023-09-04,19.2,77.0,889.0
2023-09-05,19.2,77.0,888.5
2023-09-06,19.2,77.0,888.0
2023-09-07,19.2,80.0,887.8
2023-09-08,19.2,82.0,887.9
2023-09-09,21.5,64.0,887.7
2023-09-10,21.4,64.0,888.4


In [15]:
df

Data,Temperatura,Umidade,Pressao
2023-09-01,23.5,59.0,888.4
2023-09-02,21.6,67.0,888.9
2023-09-03,19.9,,889.1
2023-09-04,,,889.0
2023-09-05,,,888.5
2023-09-06,,77.0,888.0
2023-09-07,,80.0,887.8
2023-09-08,19.2,82.0,887.9
2023-09-09,21.5,,887.7
2023-09-10,21.4,,888.4


### Limiting the number of NaN values to fill

The **limit** parameter sets the maximum number of consecutive NaN values to fill forward ('ffill') or backward ('bfill'). A gap longer than the limit is only partially filled.

In [16]:
# Fill forward at most two consecutive NaN values in each gap.
df.ffill(limit=2)

Data,Temperatura,Umidade,Pressao
2023-09-01,23.5,59.0,888.4
2023-09-02,21.6,67.0,888.9
2023-09-03,19.9,67.0,889.1
2023-09-04,19.9,67.0,889.0
2023-09-05,19.9,,888.5
2023-09-06,,77.0,888.0
2023-09-07,,80.0,887.8
2023-09-08,19.2,82.0,887.9
2023-09-09,21.5,82.0,887.7
2023-09-10,21.4,82.0,888.4


In [17]:
df

Data,Temperatura,Umidade,Pressao
2023-09-01,23.5,59.0,888.4
2023-09-02,21.6,67.0,888.9
2023-09-03,19.9,,889.1
2023-09-04,,,889.0
2023-09-05,,,888.5
2023-09-06,,77.0,888.0
2023-09-07,,80.0,887.8
2023-09-08,19.2,82.0,887.9
2023-09-09,21.5,,887.7
2023-09-10,21.4,,888.4


### Fill NaN values with a computed statistic

* NaN values can be filled, for example, with the mean or the median of each column.
* Which statistic to use is up to the user.

**Fill with the mean**

In [18]:
df.fillna(df.mean())

Data,Temperatura,Umidade,Pressao
2023-09-01,23.5,59.0,888.4
2023-09-02,21.6,67.0,888.9
2023-09-03,19.9,57.0,889.1
2023-09-04,23.776923,57.0,889.0
2023-09-05,23.776923,57.0,888.5
2023-09-06,23.776923,77.0,888.0
2023-09-07,23.776923,80.0,887.8
2023-09-08,19.2,82.0,887.9
2023-09-09,21.5,57.0,887.7
2023-09-10,21.4,57.0,888.4


**Fill with the median**

In [19]:
df.fillna(df.median())

Data,Temperatura,Umidade,Pressao
2023-09-01,23.5,59.0,888.4
2023-09-02,21.6,67.0,888.9
2023-09-03,19.9,56.0,889.1
2023-09-04,22.0,56.0,889.0
2023-09-05,22.0,56.0,888.5
2023-09-06,22.0,77.0,888.0
2023-09-07,22.0,80.0,887.8
2023-09-08,19.2,82.0,887.9
2023-09-09,21.5,56.0,887.7
2023-09-10,21.4,56.0,888.4
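
Different columns can also receive different statistics in a single call, since `fillna` accepts a dictionary that maps column names to fill values. A minimal sketch on a small hypothetical DataFrame (`example`); which statistic suits each variable remains the analyst's choice:

```python
import numpy as np
import pandas as pd

# Hypothetical data, only to illustrate per-column fill values.
example = pd.DataFrame({
    'Temperatura': [20.0, np.nan, 24.0],
    'Umidade': [60.0, 70.0, np.nan],
})

# One fill value per column: mean for temperature, median for humidity.
fill_values = {
    'Temperatura': example['Temperatura'].mean(),
    'Umidade': example['Umidade'].median(),
}

filled = example.fillna(fill_values)
print(filled)
```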
