## **Fire Risk Classification in the Municipality of Altamira, PA**

---

### Business problem
Wildfires are a recurring problem in Brazil, especially in the Amazon region, which is considered the biome most affected by wildfires in the country, with the municipalities of Altamira and São Félix do Xingu being the most affected. The goal of this project is therefore to classify fire risk in the municipality of Altamira as low, medium, or high using meteorological and fire data.

_For more information, see README.md_

---
### Datasets
The datasets used are in the public domain and can be obtained from the [INMET](https://tempo.inmet.gov.br/TabelaEstacoes/A001) and [INPE](https://terrabrasilis.dpi.inpe.br/queimadas/bdqueimadas/#exportar-dados) websites.
&nbsp;

The years 2019 and 2020 were chosen because of the amount of data available from the Altamira-PA weather station and from the Instituto Nacional de Pesquisas Espaciais (INPE).

&nbsp;


> #### **Part 1** of this project covers the Data Engineering phase and the initial analysis for building the classification model


### Importing libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import gdown

### Loading the data

In [2]:
# INPE data
df_inpe2019 = pd.read_csv('datasets_brutos/inpe2019.csv')
df_inpe2020 = pd.read_csv('datasets_brutos/inpe2020.csv')
# INMET data (semicolon-separated)
df_inmet2019 = pd.read_csv('datasets_brutos/inmet2019.csv', sep=";")
df_inmet2020 = pd.read_csv('datasets_brutos/inmet2020.csv', sep=";")


In [3]:
# Overview of the INMET DataFrame
df_inmet2020.head(5)

Unnamed: 0,Data,Hora (UTC),Temp. Ins. (C),Temp. Max. (C),Temp. Min. (C),Umi. Ins. (%),Umi. Max. (%),Umi. Min. (%),Pto Orvalho Ins. (C),Pto Orvalho Max. (C),Pto Orvalho Min. (C),Pressao Ins. (hPa),Pressao Max. (hPa),Pressao Min. (hPa),Vel. Vento (m/s),Dir. Vento (m/s),Raj. Vento (m/s),Radiacao (KJ/m²),Chuva (mm)
0,01/01/2020,0,233,237,233,960,960,950,227,229,227,9901,9901,9892,9,340,35,,4
1,01/01/2020,100,232,233,232,960,970,960,226,228,226,9907,9907,9901,15,280,44,,2
2,01/01/2020,200,230,232,230,970,970,960,225,226,224,9909,9910,9907,10,40,44,,14
3,01/01/2020,300,229,230,229,970,970,970,225,225,224,9907,9910,9907,12,210,48,,4
4,01/01/2020,400,227,230,227,970,970,960,221,225,221,9904,9908,9904,8,290,45,,0


In [4]:
# Overview of the INPE DataFrame
df_inpe2019.head(5)

Unnamed: 0,DataHora,Satelite,Pais,Estado,Municipio,Bioma,DiaSemChuva,Precipitacao,RiscoFogo,Latitude,Longitude,FRP
0,2019/07/26 19:34:38,GOES-16,Brasil,PARÁ,ALTAMIRA,Amazônia,44,0.0,0.8,-7.74,-54.83,
1,2019/07/26 19:34:38,GOES-16,Brasil,PARÁ,ALTAMIRA,Amazônia,45,0.0,0.9,-7.75,-54.82,
2,2019/07/26 17:04:37,GOES-16,Brasil,PARÁ,ALTAMIRA,Amazônia,43,0.0,0.8,-7.77,-54.84,
3,2019/08/03 20:34:34,GOES-16,Brasil,PARÁ,ALTAMIRA,Amazônia,29,0.0,1.0,-8.27,-54.85,
4,2019/08/03 20:34:34,GOES-16,Brasil,PARÁ,ALTAMIRA,Amazônia,30,0.0,1.0,-8.27,-54.81,


In [5]:
# Checking the shape of the INPE DataFrames (2020 has roughly three times as many records as 2019)
df_inpe2019.shape, df_inpe2020.shape

((13896, 12), (43831, 12))

In [6]:
# Checking the shape of the INMET DataFrames
df_inmet2019.shape, df_inmet2020.shape

((8760, 19), (8784, 19))
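
The INMET row counts are consistent with one record per hour over a full year: 2019 has 365 days and 2020 is a leap year. A minimal sketch of that sanity check, using only the shapes printed above (the helper name `expected_hourly_rows` is hypothetical):

```python
import calendar


def expected_hourly_rows(year: int) -> int:
    """Number of hourly records in a complete year (24 per day)."""
    days = 366 if calendar.isleap(year) else 365
    return days * 24


# Row counts reported by df_inmet2019.shape and df_inmet2020.shape
observed = {2019: 8760, 2020: 8784}

for year, rows in observed.items():
    # True means the file has exactly one row per hour of that year
    print(year, rows == expected_hourly_rows(year))
```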

In [7]:
# Field types of the INPE DataFrame (note that the DataHora type will have to be changed later)
df_inpe2019.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13896 entries, 0 to 13895
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   DataHora      13896 non-null  object 
 1   Satelite      13896 non-null  object 
 2   Pais          13896 non-null  object 
 3   Estado        13896 non-null  object 
 4   Municipio     13896 non-null  object 
 5   Bioma         13896 non-null  object 
 6   DiaSemChuva   13896 non-null  int64  
 7   Precipitacao  13896 non-null  float64
 8   RiscoFogo     13896 non-null  float64
 9   Latitude      13896 non-null  float64
 10  Longitude     13896 non-null  float64
 11  FRP           0 non-null      float64
dtypes: float64(5), int64(1), object(6)
memory usage: 1.3+ MB


In [8]:
# Field types of the INMET DataFrame (most fields are of type object, so they will need to be transformed)
df_inmet2019.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8760 entries, 0 to 8759
Data columns (total 19 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   Data                  8760 non-null   object
 1   Hora (UTC)            8760 non-null   int64 
 2   Temp. Ins. (C)        8743 non-null   object
 3   Temp. Max. (C)        8743 non-null   object
 4   Temp. Min. (C)        8743 non-null   object
 5   Umi. Ins. (%)         8743 non-null   object
 6   Umi. Max. (%)         8743 non-null   object
 7   Umi. Min. (%)         8743 non-null   object
 8   Pto Orvalho Ins. (C)  8743 non-null   object
 9   Pto Orvalho Max. (C)  8743 non-null   object
 10  Pto Orvalho Min. (C)  8743 non-null   object
 11  Pressao Ins. (hPa)    8743 non-null   object
 12  Pressao Max. (hPa)    8743 non-null   object
 13  Pressao Min. (hPa)    8743 non-null   object
 14  Vel. Vento (m/s)      8734 non-null   object
 15  Dir. Vento (m/s)      8732 non-null   

In [9]:
# Concatenating the yearly DataFrames of each source (INPE and INMET)
df_inpe = pd.concat([df_inpe2019, df_inpe2020], ignore_index=True)

df_inmet = pd.concat([df_inmet2019, df_inmet2020], ignore_index=True)

# Checking the new shape of each DataFrame
df_inpe.shape, df_inmet.shape

((57727, 12), (17544, 19))

### Data Cleaning and Treatment

INPE dataset

In [10]:
# Checking for null values in the INPE data
df_inpe.isnull().sum()

DataHora            0
Satelite            0
Pais                0
Estado              0
Municipio           0
Bioma               0
DiaSemChuva         0
Precipitacao        0
RiscoFogo           0
Latitude            0
Longitude           0
FRP             57727
dtype: int64

In [11]:
# Dropping the all-null column and the string columns that are irrelevant for training the ML models
df_inpe = df_inpe.drop(columns = ['Satelite', 'Pais', 'Estado', 'Municipio', 'Bioma', 'FRP'])

df_inpe.head(5)

Unnamed: 0,DataHora,DiaSemChuva,Precipitacao,RiscoFogo,Latitude,Longitude
0,2019/07/26 19:34:38,44,0.0,0.8,-7.74,-54.83
1,2019/07/26 19:34:38,45,0.0,0.9,-7.75,-54.82
2,2019/07/26 17:04:37,43,0.0,0.8,-7.77,-54.84
3,2019/08/03 20:34:34,29,0.0,1.0,-8.27,-54.85
4,2019/08/03 20:34:34,30,0.0,1.0,-8.27,-54.81


In [12]:
# Checking for duplicate records
df_inpe.duplicated().sum()

0

In [13]:
# As seen above, DataHora is of type object and must be converted to datetime
df_inpe['DataHora'] = pd.to_datetime(df_inpe['DataHora'])
df_inpe['Hora'] = df_inpe['DataHora'].dt.hour  # create the Hora column so the two datasets can be merged


df_inpe.head()

Unnamed: 0,DataHora,DiaSemChuva,Precipitacao,RiscoFogo,Latitude,Longitude,Hora
0,2019-07-26 19:34:38,44,0.0,0.8,-7.74,-54.83,19
1,2019-07-26 19:34:38,45,0.0,0.9,-7.75,-54.82,19
2,2019-07-26 17:04:37,43,0.0,0.8,-7.77,-54.84,17
3,2019-08-03 20:34:34,29,0.0,1.0,-8.27,-54.85,20
4,2019-08-03 20:34:34,30,0.0,1.0,-8.27,-54.81,20


In [14]:
# Formatting the date as day/month/year to match the INMET data
df_inpe["Data"] = df_inpe["DataHora"].dt.strftime('%d/%m/%Y')

df_inpe.drop("DataHora", axis=1, inplace=True)

df_inpe.head()

Unnamed: 0,DiaSemChuva,Precipitacao,RiscoFogo,Latitude,Longitude,Hora,Data
0,44,0.0,0.8,-7.74,-54.83,19,26/07/2019
1,45,0.0,0.9,-7.75,-54.82,19,26/07/2019
2,43,0.0,0.8,-7.77,-54.84,17,26/07/2019
3,29,0.0,1.0,-8.27,-54.85,20,03/08/2019
4,30,0.0,1.0,-8.27,-54.81,20,03/08/2019


In [15]:
df_inpe.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 57727 entries, 0 to 57726
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   DiaSemChuva   57727 non-null  int64  
 1   Precipitacao  57727 non-null  float64
 2   RiscoFogo     57727 non-null  float64
 3   Latitude      57727 non-null  float64
 4   Longitude     57727 non-null  float64
 5   Hora          57727 non-null  int32  
 6   Data          57727 non-null  object 
dtypes: float64(4), int32(1), int64(1), object(1)
memory usage: 2.9+ MB


In [16]:
# Converting DiaSemChuva from int64 to float
df_inpe['DiaSemChuva'] = df_inpe['DiaSemChuva'].astype(float)

In [17]:
# Checking for possible statistical inconsistencies in the INPE data
# The RiscoFogo column will be treated later (the value -999 corresponds to locations with no risk, according to the metadata of the original dataset)
df_inpe.describe()

Unnamed: 0,DiaSemChuva,Precipitacao,RiscoFogo,Latitude,Longitude,Hora
count,57727.0,57727.0,57727.0,57727.0,57727.0,57727.0
mean,29.66934,0.28875,0.332121,-7.037505,-54.418323,16.806157
std,26.759664,2.046963,23.536276,1.236181,0.774103,6.64616
min,0.0,0.0,-999.0,-9.57,-55.33,0.0
25%,7.0,0.0,0.9,-8.09,-55.01,16.0
50%,18.0,0.0,1.0,-6.71,-54.82,19.0
75%,57.0,0.0,1.0,-6.32,-53.6,21.0
max,116.0,85.7,1.0,-3.0,-51.93,23.0
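
Because `-999` is a sentinel rather than a measurement, it distorts the mean and standard deviation of `RiscoFogo` shown above. One way to inspect only the valid values is to mask the sentinel as missing before summarizing. The sketch below uses a small invented sample, not the project data:

```python
import numpy as np
import pandas as pd

# Toy sample (hypothetical values); -999 is the sentinel described in the metadata
risco = pd.Series([0.8, 0.9, -999.0, 1.0, 0.3], name="RiscoFogo")

# Replace the sentinel with NaN so describe() and mean() ignore it
risco_valid = risco.replace(-999.0, np.nan)

print(risco.mean())        # dominated by the sentinel
print(risco_valid.mean())  # mean of the four valid readings
print(risco_valid.describe())
```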


INMET dataset

In [18]:
df_inmet.head()

Unnamed: 0,Data,Hora (UTC),Temp. Ins. (C),Temp. Max. (C),Temp. Min. (C),Umi. Ins. (%),Umi. Max. (%),Umi. Min. (%),Pto Orvalho Ins. (C),Pto Orvalho Max. (C),Pto Orvalho Min. (C),Pressao Ins. (hPa),Pressao Max. (hPa),Pressao Min. (hPa),Vel. Vento (m/s),Dir. Vento (m/s),Raj. Vento (m/s),Radiacao (KJ/m²),Chuva (mm)
0,01/01/2019,0,245,249,245,920,920,900,232,233,228,9898,9898,9889,2,100,7,,0
1,01/01/2019,100,241,246,241,930,930,890,229,233,226,9902,9902,9898,1,1040,10,,0
2,01/01/2019,200,239,243,239,930,940,910,228,231,227,9906,9906,9901,5,3560,15,,0
3,01/01/2019,300,241,241,238,930,940,920,228,230,227,9905,9907,9905,2,120,15,,0
4,01/01/2019,400,240,241,240,920,930,920,226,228,226,9902,9906,9902,8,370,34,,0


In [19]:
# Checking for null fields
df_inmet.isnull().sum()

Data                        0
Hora (UTC)                  0
Temp. Ins. (C)           7384
Temp. Max. (C)           7384
Temp. Min. (C)           7384
Umi. Ins. (%)            7384
Umi. Max. (%)            7384
Umi. Min. (%)            7384
Pto Orvalho Ins. (C)     7384
Pto Orvalho Max. (C)     7384
Pto Orvalho Min. (C)     7384
Pressao Ins. (hPa)       7384
Pressao Max. (hPa)       7384
Pressao Min. (hPa)       7384
Vel. Vento (m/s)         7402
Dir. Vento (m/s)         7404
Raj. Vento (m/s)         7408
Radiacao (KJ/m²)        12022
Chuva (mm)               7384
dtype: int64

In [20]:
# The Radiacao column (solar radiation recorded by the weather station) is mostly null (12022 of 17544 rows), so it is dropped; rows that still contain nulls are dropped afterwards
df_inmet.drop("Radiacao (KJ/m²)", axis=1, inplace=True)
df_inmet.dropna(inplace=True)

df_inmet.isnull().sum()

Data                    0
Hora (UTC)              0
Temp. Ins. (C)          0
Temp. Max. (C)          0
Temp. Min. (C)          0
Umi. Ins. (%)           0
Umi. Max. (%)           0
Umi. Min. (%)           0
Pto Orvalho Ins. (C)    0
Pto Orvalho Max. (C)    0
Pto Orvalho Min. (C)    0
Pressao Ins. (hPa)      0
Pressao Max. (hPa)      0
Pressao Min. (hPa)      0
Vel. Vento (m/s)        0
Dir. Vento (m/s)        0
Raj. Vento (m/s)        0
Chuva (mm)              0
dtype: int64
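
The choice between dropping a column and dropping its incomplete rows can be guided by the share of missing values per column. In the counts above, `Radiacao (KJ/m²)` is missing in 12022 of 17544 rows, about 69%. A minimal sketch on a toy frame (column names borrowed from the INMET data, values invented; the 0.5 threshold is an arbitrary assumption):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "Temp. Ins. (C)": [24.5, np.nan, 23.9, 24.1],
    "Radiacao (KJ/m²)": [np.nan, np.nan, np.nan, 1200.0],
    "Chuva (mm)": [0.0, 0.2, 0.0, 0.0],
})

# Fraction of missing values in each column
null_share = toy.isna().mean()

# Drop columns that are mostly empty, then drop the remaining incomplete rows
mostly_null = null_share[null_share > 0.5].index
cleaned = toy.drop(columns=mostly_null).dropna()

print(null_share)
print(cleaned.shape)
```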

In [21]:
df_inmet.shape

(10133, 18)

In [22]:
df_inmet.describe()

Unnamed: 0,Hora (UTC)
count,10133.0
mean,1151.248396
std,692.158772
min,0.0
25%,600.0
50%,1200.0
75%,1800.0
max,2300.0


In [23]:
# Checking the data types

df_inmet.info()

<class 'pandas.core.frame.DataFrame'>
Index: 10133 entries, 0 to 10176
Data columns (total 18 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   Data                  10133 non-null  object
 1   Hora (UTC)            10133 non-null  int64 
 2   Temp. Ins. (C)        10133 non-null  object
 3   Temp. Max. (C)        10133 non-null  object
 4   Temp. Min. (C)        10133 non-null  object
 5   Umi. Ins. (%)         10133 non-null  object
 6   Umi. Max. (%)         10133 non-null  object
 7   Umi. Min. (%)         10133 non-null  object
 8   Pto Orvalho Ins. (C)  10133 non-null  object
 9   Pto Orvalho Max. (C)  10133 non-null  object
 10  Pto Orvalho Min. (C)  10133 non-null  object
 11  Pressao Ins. (hPa)    10133 non-null  object
 12  Pressao Max. (hPa)    10133 non-null  object
 13  Pressao Min. (hPa)    10133 non-null  object
 14  Vel. Vento (m/s)      10133 non-null  object
 15  Dir. Vento (m/s)      10133 non-null  obj

In [24]:
# Converting every column except Data and Hora (UTC) to float (decimal commas become decimal points)
for column in df_inmet.columns:
    if column not in ["Data", "Hora (UTC)"]:
        df_inmet[column] = df_inmet[column].str.replace(',', '.').astype(float)

df_inmet.info()

<class 'pandas.core.frame.DataFrame'>
Index: 10133 entries, 0 to 10176
Data columns (total 18 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   Data                  10133 non-null  object 
 1   Hora (UTC)            10133 non-null  int64  
 2   Temp. Ins. (C)        10133 non-null  float64
 3   Temp. Max. (C)        10133 non-null  float64
 4   Temp. Min. (C)        10133 non-null  float64
 5   Umi. Ins. (%)         10133 non-null  float64
 6   Umi. Max. (%)         10133 non-null  float64
 7   Umi. Min. (%)         10133 non-null  float64
 8   Pto Orvalho Ins. (C)  10133 non-null  float64
 9   Pto Orvalho Max. (C)  10133 non-null  float64
 10  Pto Orvalho Min. (C)  10133 non-null  float64
 11  Pressao Ins. (hPa)    10133 non-null  float64
 12  Pressao Max. (hPa)    10133 non-null  float64
 13  Pressao Min. (hPa)    10133 non-null  float64
 14  Vel. Vento (m/s)      10133 non-null  float64
 15  Dir. Vento (m/s)      10
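
The conversion above relies on every measurement column being a string that uses a comma as the decimal separator. An alternative, sketched here on an inline sample rather than the real files, is to let `pd.read_csv` parse the decimal comma at load time through its `decimal` parameter:

```python
import io

import pandas as pd

# Inline stand-in for an INMET export (values invented for illustration)
raw = io.StringIO(
    "Data;Hora (UTC);Temp. Ins. (C);Chuva (mm)\n"
    "01/01/2019;0;24,5;0\n"
    "01/01/2019;100;24,1;0,2\n"
)

# decimal="," makes pandas read "24,5" as the float 24.5
sample = pd.read_csv(raw, sep=";", decimal=",")

print(sample.dtypes)
print(sample)
```

With this approach the measurement columns arrive as `float64`, so the replace-and-cast loop is no longer needed.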

In [25]:
# Checking for possible statistical inconsistencies in the INMET data
df_inmet.describe()

Unnamed: 0,Hora (UTC),Temp. Ins. (C),Temp. Max. (C),Temp. Min. (C),Umi. Ins. (%),Umi. Max. (%),Umi. Min. (%),Pto Orvalho Ins. (C),Pto Orvalho Max. (C),Pto Orvalho Min. (C),Pressao Ins. (hPa),Pressao Max. (hPa),Pressao Min. (hPa),Vel. Vento (m/s),Dir. Vento (m/s),Raj. Vento (m/s),Chuva (mm)
count,10133.0,10133.0,10133.0,10133.0,10133.0,10133.0,10133.0,10133.0,10133.0,10133.0,10133.0,10133.0,10133.0,10133.0,10133.0,10133.0,10133.0
mean,1151.248396,25.936347,26.497799,25.421366,84.051416,86.804204,81.153952,22.784309,23.26839,22.365548,989.948278,990.252196,989.643827,0.892756,132.033652,2.949097,0.228244
std,692.158772,2.864649,3.081898,2.638286,12.944574,11.598501,14.017789,0.943816,1.022098,0.916783,1.992309,1.964002,1.991005,0.566394,113.739663,1.76564,1.423289
min,0.0,20.6,20.9,20.4,37.0,41.0,35.0,17.3,19.0,16.1,983.1,983.4,983.0,0.1,1.0,0.3,0.0
25%,600.0,23.6,23.9,23.4,75.0,80.0,70.0,22.2,22.6,21.9,988.6,988.9,988.2,0.4,39.0,1.5,0.0
50%,1200.0,25.1,25.7,24.6,89.0,92.0,85.0,22.8,23.3,22.5,990.0,990.3,989.7,0.8,86.0,2.7,0.0
75%,1800.0,28.1,28.9,27.3,95.0,96.0,94.0,23.4,23.9,23.0,991.3,991.7,991.0,1.3,219.0,4.2,0.0
max,2300.0,34.4,35.1,33.6,98.0,98.0,98.0,26.9,27.0,25.0,996.0,996.0,995.8,4.4,360.0,24.9,34.2


### Merging the datasets

<p>The datasets are matched on date and hour, but the hour field has a different format in each table and must be handled first.</p>
<p>UTC is Coordinated Universal Time, the reference from which all other time zones are defined.</p>
<p>The INPE dataset is in Brazilian local time and will be converted to UTC.</p>

In [26]:
df_inmet["Hora (UTC)"].unique()

array([   0,  100,  200,  300,  400,  500,  600,  700,  800,  900, 1000,
       1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 2100,
       2200, 2300], dtype=int64)

In [27]:
# As seen above, the column is an integer and every hour value has two extra zeros, except 0
# One way to fix this is to divide the values by 100
df_inmet["Hora (UTC)"] = df_inmet["Hora (UTC)"] / 100

df_inmet["Hora (UTC)"].unique()

array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12.,
       13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23.])

In [28]:
# Checking the values of the INPE Hora column
df_inpe["Hora"].unique()

array([19, 17, 20, 22, 15, 21,  8, 14, 18, 23, 16,  1,  3,  5,  4, 12,  0,
        2, 13,  6,  7, 10,  9, 11])

In [29]:
# To avoid errors, manually shift the hour from Brasília time to UTC (+3 hours)
dict_hora = {0: 3, 1:4, 2: 5, 3:6, 4:7, 5:8, 6:9, 7:10, 8:11, 9: 12, 10:13,
             11:14, 12:15, 13:16, 14: 17, 15:18, 16:19, 17:20, 18: 21, 19:22, 20: 23, 21: 0, 22: 1, 23: 2}
df_inpe["Hora"] = df_inpe["Hora"].replace(dict_hora)
df_inpe = df_inpe.rename(columns={"Hora": "Hora (UTC)"})

# Making the column types match
df_inmet["Hora (UTC)"] = df_inmet["Hora (UTC)"].astype(int)

df_inpe["Hora (UTC)"].unique(), df_inmet["Hora (UTC)"].unique()


(array([22, 20, 23,  1, 18,  0, 11, 17, 21,  2, 19,  4,  6,  8,  7, 15,  3,
         5, 16,  9, 10, 13, 12, 14]),
 array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
        17, 18, 19, 20, 21, 22, 23]))
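
A lookup table changes the hour but not the date, so a detection at 21:00 or later local time is mapped to hours 0 to 2 of the same calendar day instead of the next one. Assuming, as stated above, that the INPE timestamps are in Brasília time (UTC-3), a sketch that shifts the full timestamp before deriving the merge keys avoids that mismatch. The column names `Data` and `Hora (UTC)` mirror the notebook; the sample rows are invented:

```python
import pandas as pd

sample = pd.DataFrame(
    {"DataHora": ["2019/07/26 19:34:38", "2019/07/26 22:10:00"]}
)

# Parse the timestamp, then add 3 hours to go from UTC-3 to UTC
local = pd.to_datetime(sample["DataHora"], format="%Y/%m/%d %H:%M:%S")
utc = local + pd.Timedelta(hours=3)

# Derive both merge keys from the shifted timestamp so the date rolls over too
sample["Hora (UTC)"] = utc.dt.hour
sample["Data"] = utc.dt.strftime("%d/%m/%Y")

print(sample)
```

In this sample the second row moves to hour 1 of 27/07/2019, whereas the lookup table would leave it on 26/07/2019.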

With the hour columns aligned, the datasets can finally be merged.

In [30]:
df_merged= pd.merge(df_inpe, df_inmet, on=['Data', 'Hora (UTC)'], how='inner')

df_merged.head()

Unnamed: 0,DiaSemChuva,Precipitacao,RiscoFogo,Latitude,Longitude,Hora (UTC),Data,Temp. Ins. (C),Temp. Max. (C),Temp. Min. (C),...,Pto Orvalho Ins. (C),Pto Orvalho Max. (C),Pto Orvalho Min. (C),Pressao Ins. (hPa),Pressao Max. (hPa),Pressao Min. (hPa),Vel. Vento (m/s),Dir. Vento (m/s),Raj. Vento (m/s),Chuva (mm)
0,44.0,0.0,0.8,-7.74,-54.83,22,26/07/2019,28.2,29.5,28.2,...,21.7,22.1,21.7,990.6,990.6,990.2,0.5,115.0,1.7,0.0
1,45.0,0.0,0.9,-7.75,-54.82,22,26/07/2019,28.2,29.5,28.2,...,21.7,22.1,21.7,990.6,990.6,990.2,0.5,115.0,1.7,0.0
2,43.0,0.0,0.8,-7.77,-54.84,20,26/07/2019,30.4,31.2,30.4,...,20.7,21.6,20.4,989.7,989.7,989.5,1.7,82.0,5.6,0.0
3,29.0,0.0,1.0,-8.27,-54.85,23,03/08/2019,27.3,28.5,27.3,...,22.0,22.1,21.3,990.0,990.0,989.5,0.2,167.0,1.0,0.0
4,30.0,0.0,1.0,-8.27,-54.81,23,03/08/2019,27.3,28.5,27.3,...,22.0,22.1,21.3,990.0,990.0,989.5,0.2,167.0,1.0,0.0


In [31]:
df_merged.shape, df_inpe.shape

((13331, 23), (57727, 7))
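
The inner merge keeps 13331 of the 57727 INPE rows, so most detections have no matching weather record for their date and hour. A left merge with `indicator=True` is one way to count matched and unmatched rows before discarding them. The frames below are tiny invented stand-ins for `df_inpe` and `df_inmet`:

```python
import pandas as pd

fires = pd.DataFrame({
    "Data": ["26/07/2019", "26/07/2019", "03/08/2019"],
    "Hora (UTC)": [22, 20, 23],
    "RiscoFogo": [0.8, 0.9, 1.0],
})
weather = pd.DataFrame({
    "Data": ["26/07/2019", "03/08/2019"],
    "Hora (UTC)": [22, 23],
    "Temp. Ins. (C)": [28.2, 27.3],
})

# how="left" keeps every fire record; the _merge column records the match status.
# validate="many_to_one" raises if the weather keys are not unique.
checked = fires.merge(
    weather,
    on=["Data", "Hora (UTC)"],
    how="left",
    indicator=True,
    validate="many_to_one",
)

print(checked["_merge"].value_counts())

# Keeping only the matched rows reproduces the result of an inner merge
matched = checked[checked["_merge"] == "both"].drop(columns="_merge")
print(matched.shape)
```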

In [32]:
df_merged.columns

Index(['DiaSemChuva', 'Precipitacao', 'RiscoFogo', 'Latitude', 'Longitude',
       'Hora (UTC)', 'Data', 'Temp. Ins. (C)', 'Temp. Max. (C)',
       'Temp. Min. (C)', 'Umi. Ins. (%)', 'Umi. Max. (%)', 'Umi. Min. (%)',
       'Pto Orvalho Ins. (C)', 'Pto Orvalho Max. (C)', 'Pto Orvalho Min. (C)',
       'Pressao Ins. (hPa)', 'Pressao Max. (hPa)', 'Pressao Min. (hPa)',
       'Vel. Vento (m/s)', 'Dir. Vento (m/s)', 'Raj. Vento (m/s)',
       'Chuva (mm)'],
      dtype='object')

In [33]:
# Reordering the columns (target column RiscoFogo last)
df_merged = df_merged[['Data', 'Hora (UTC)', 'DiaSemChuva', 'Precipitacao', 'Latitude', 'Longitude', 'Temp. Ins. (C)', 'Temp. Max. (C)',
       'Temp. Min. (C)', 'Umi. Ins. (%)', 'Umi. Max. (%)', 'Umi. Min. (%)',
       'Pto Orvalho Ins. (C)', 'Pto Orvalho Max. (C)', 'Pto Orvalho Min. (C)',
       'Pressao Ins. (hPa)', 'Pressao Max. (hPa)', 'Pressao Min. (hPa)',
       'Vel. Vento (m/s)', 'Dir. Vento (m/s)', 'Raj. Vento (m/s)',
       'Chuva (mm)', 'RiscoFogo']]

Before moving on to the last step, the Fire Risk values must be replaced by the corresponding class, as specified in the _README.md_.

In [34]:
def classificacao(x):
    """Map a RiscoFogo value in [0, 1] to its risk class (None outside that range)."""
    if 0 <= x < 0.4:
        return 'baixo'
    elif 0.4 <= x < 0.7:
        return 'médio'
    elif 0.7 <= x <= 1:
        return 'alto'

# Remove rows where the risk is missing (-999); copy to avoid assigning on a slice
df_merged = df_merged[df_merged['RiscoFogo'] != -999].copy()
df_merged['ClassificacaoRF'] = df_merged['RiscoFogo'].apply(classificacao)
df_merged.drop(columns="RiscoFogo", inplace=True)

df_merged.head()

Unnamed: 0,Data,Hora (UTC),DiaSemChuva,Precipitacao,Latitude,Longitude,Temp. Ins. (C),Temp. Max. (C),Temp. Min. (C),Umi. Ins. (%),...,Pto Orvalho Max. (C),Pto Orvalho Min. (C),Pressao Ins. (hPa),Pressao Max. (hPa),Pressao Min. (hPa),Vel. Vento (m/s),Dir. Vento (m/s),Raj. Vento (m/s),Chuva (mm),ClassificacaoRF
0,26/07/2019,22,44.0,0.0,-7.74,-54.83,28.2,29.5,28.2,68.0,...,22.1,21.7,990.6,990.6,990.2,0.5,115.0,1.7,0.0,alto
1,26/07/2019,22,45.0,0.0,-7.75,-54.82,28.2,29.5,28.2,68.0,...,22.1,21.7,990.6,990.6,990.2,0.5,115.0,1.7,0.0,alto
2,26/07/2019,20,43.0,0.0,-7.77,-54.84,30.4,31.2,30.4,56.0,...,21.6,20.4,989.7,989.7,989.5,1.7,82.0,5.6,0.0,alto
3,03/08/2019,23,29.0,0.0,-8.27,-54.85,27.3,28.5,27.3,73.0,...,22.1,21.3,990.0,990.0,989.5,0.2,167.0,1.0,0.0,alto
4,03/08/2019,23,30.0,0.0,-8.27,-54.81,27.3,28.5,27.3,73.0,...,22.1,21.3,990.0,990.0,989.5,0.2,167.0,1.0,0.0,alto
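
The same thresholds can be applied in a vectorized way with `pd.cut`. This is a sketch of an alternative, not the notebook's method. With `right=False` the bins are left-closed, so the upper edge is nudged just above 1 to keep a risk of exactly 1.0 in the `alto` class; values outside [0, 1], such as the -999 sentinel, become NaN:

```python
import numpy as np
import pandas as pd

# Invented values chosen to sit on and around each threshold
risco = pd.Series([0.0, 0.39, 0.4, 0.69, 0.7, 1.0, -999.0])

# Left-closed bins: [0, 0.4), [0.4, 0.7), [0.7, just above 1)
edges = [0.0, 0.4, 0.7, np.nextafter(1.0, 2.0)]
labels = ["baixo", "médio", "alto"]

classes = pd.cut(risco, bins=edges, labels=labels, right=False)

print(classes.tolist())
```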


In [35]:
# Checking for duplicate records
df_merged.duplicated().sum()

5824

In [36]:
df_merged.drop_duplicates(inplace=True)

In [37]:
df_merged["ClassificacaoRF"].value_counts()

ClassificacaoRF
alto     6193
médio     706
baixo     598
Name: count, dtype: int64
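
The counts above show a strong imbalance toward the `alto` class. Expressing them as proportions makes that explicit. The sketch below rebuilds the counts by hand from the output and does not read the project data:

```python
import pandas as pd

# Class counts copied from the value_counts() output above
counts = pd.Series({"alto": 6193, "médio": 706, "baixo": 598}, name="count")

# Share of each class in the final dataset, rounded to three decimals
shares = (counts / counts.sum()).round(3)

print(counts.sum())
print(shares)
```

On the real DataFrame, `df_merged["ClassificacaoRF"].value_counts(normalize=True)` gives the same proportions directly.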

In [38]:
# Saving the resulting DataFrame
df_merged.to_csv("df_result.csv", index=False)