# Data cleaning report

This report describes the cleaning applied to the CSV file of accident records for the Nova Dutra highway. The steps were:

Reading and Loading the Data

    The CSV file was read with the Python Pandas library
    The read used ';' as the separator and ISO-8859-1 as the encoding

Duplicate Check

    The dataframe was checked for duplicated rows
    116 rows belonged to groups of identical rows; the repeated copies were removed, leaving 114,936 rows

Data Type Analysis

    The data type of each column was inspected
    The date and time columns were stored as strings and were converted to Python date and time objects
    New year (ano) and month (mes) columns were derived from the date

Handling Null Values

    Several columns contained null values
    The nulls were filled with 0
    This assumes that a null means the information was not recorded

Category Normalization

    The occurrence-type and accident-type categories were standardized
    Grouping similar categories makes later analysis easier

Numeric Column Conversion

    The vehicle-count columns had mixed types
    They were all forced to float

After these steps the dataframe is cleaned and normalized and ready for further analysis. Duplicates, null values, and inconsistent data types were removed.

# Data cleaning code

## Setting up the environment and required imports

In [2]:
from google.colab import drive
import pandas as pd

# Mount Google Drive and move to the folder that holds the project files
drive.mount('/content/drive')
%cd "/content/drive/MyDrive/Colab Notebooks/projeto/NOVADUTRA"

# Set the maximum number of columns shown when displaying a DataFrame
pd.set_option('display.max_columns', 100)

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
/content/drive/MyDrive/ADA/tecnicas_programacao_I/projeto/NOVADUTRA


## Data cleaning

In [3]:
# Read the file
df = pd.read_csv('NOVADUTRA.csv', sep=';', low_memory=False, encoding='ISO-8859-1')
df

Unnamed: 0,data,horario,n_da_ocorrencia,tipo_de_ocorrencia,km,trecho,sentido,tipo_de_acidente,automovel,bicicleta,caminhao,moto,onibus,outros,tracao_animal,transporte_de_cargas_especiais,trator_maquinas,utilitarios,ilesos,levemente_feridos,moderadamente_feridos,gravemente_feridos,mortos
0,01/01/2010,12:14:00,239,sem vítima,3945,BR-116/SP,Pista Norte,Choque em barreira New Jersey,1.0,0.0,0.0,0.0,0.0,0.0,0.0,,0.0,0.0,1,0.0,0.0,0.0,0.0
1,01/01/2010,06:13:00,94,sem vítima,5565,BR-116/SP,Pista Sul,Colisão traseira,1.0,0.0,0.0,0.0,0.0,1.0,0.0,,0.0,0.0,4,0.0,0.0,0.0,0.0
2,01/01/2010,14:42:00,314,sem vítima,75,BR-116/SP,Pista Norte,Colisão traseira,0.0,0.0,0.0,0.0,0.0,1.0,0.0,,0.0,0.0,5,0.0,0.0,0.0,0.0
3,01/01/2010,18:55:00,440,sem vítima,1023,BR-116/SP,Pista Sul,Abalroamento longitudinal,1.0,0.0,1.0,0.0,0.0,0.0,0.0,,0.0,0.0,2,0.0,0.0,0.0,0.0
4,01/01/2010,08:14:00,144,sem vítima,1078,BR-116/SP,Pista Sul,Choque em objeto fixo,1.0,0.0,0.0,0.0,0.0,0.0,0.0,,0.0,0.0,1,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
114989,23/02/2022,08:38:00,295,Com vítima,303400,BR-116/SP,Decrescente,Colisão traseira,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0,1.0,0.0,0.0,0.0
114990,24/02/2022,22:23:10,747,Com vítima,325000,BR-116/SP,Crescente,Colisão traseira,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,1.0,0.0,0.0,0.0
114991,24/02/2022,23:52:16,764,Com vítima,145200,BR-116/SP,Crescente,Engavetamento,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,1.0,0.0,0.0,0.0
114992,25/02/2022,07:35:19,157,Sem vítima,146000,BR-116/SP,Crescente,Engavetamento,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0
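
The `sep` and `encoding` arguments matter for this file because the fields are separated by semicolons and the text contains accented characters. A minimal sketch on a made-up two-row sample (the `raw` bytes are hypothetical, not taken from NOVADUTRA.csv) also shows why the count columns come out as `float64`: an empty field is read as NaN, which forces a floating-point column.

```python
import io
import pandas as pd

# Hypothetical two-row sample encoded as ISO-8859-1, mimicking the file layout.
raw = (
    "data;tipo_de_ocorrencia;automovel\n"
    "01/01/2010;sem vítima;1\n"
    "23/02/2022;Com vítima;\n"
).encode("ISO-8859-1")

# Same separator and encoding as in the cell above.
sample = pd.read_csv(io.BytesIO(raw), sep=";", encoding="ISO-8859-1")

# The empty 'automovel' field becomes NaN, so the column is float rather than int.
print(sample.dtypes)
```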


### Checking for and removing duplicated rows

In [4]:
df['tipo_de_acidente'].value_counts()

Colisão traseira                          20399
Colisão Traseira                          12239
Engavetamento                             11755
Abalroamento longitudinal                 10853
Choque em objeto fixo                      9904
Queda de moto                              6724
Choque em barreira New Jersey              6289
Abalroamento Longitudinal                  5488
Choque em objeto na pista                  5271
Capotamento                                3881
Choque em Barreira New Jersey              3737
Choque em defensa                          2082
Tombamento                                 1886
Choque em veículo parado na pista          1625
Outros                                     1565
Atropelamento de pedestre atravessando     1552
Queda de ribanceira                        1552
Choque em Defensa                          1274
Choque Talude                              1212
Atropelamento de animal                    1156
Atropelamento de Pedestre Atravessando  

In [5]:
df.n_da_ocorrencia.is_unique  # Check whether the occurrence number is unique. The result says little either way, because occurrences were recorded at different locations and the number can repeat.

False

In [6]:
# Select every row that belongs to a group of duplicated rows (keep=False flags all copies)
duplicated_rows = df[df.duplicated(keep=False)]
duplicated_rows.count()

data                              116
horario                           116
n_da_ocorrencia                   116
tipo_de_ocorrencia                116
km                                116
trecho                            116
sentido                           116
tipo_de_acidente                  116
automovel                         116
bicicleta                         116
caminhao                          116
moto                              116
onibus                            116
outros                            116
tracao_animal                     116
transporte_de_cargas_especiais      0
trator_maquinas                   116
utilitarios                       116
ilesos                            116
levemente_feridos                 116
moderadamente_feridos             116
gravemente_feridos                116
mortos                            116
dtype: int64

In [7]:
# Drop the duplicated rows
df = df.drop_duplicates()

# Check again whether any duplicated rows remain
duplicated_rows = df[df.duplicated(keep=False)]
duplicated_rows.count()

data                              0
horario                           0
n_da_ocorrencia                   0
tipo_de_ocorrencia                0
km                                0
trecho                            0
sentido                           0
tipo_de_acidente                  0
automovel                         0
bicicleta                         0
caminhao                          0
moto                              0
onibus                            0
outros                            0
tracao_animal                     0
transporte_de_cargas_especiais    0
trator_maquinas                   0
utilitarios                       0
ilesos                            0
levemente_feridos                 0
moderadamente_feridos             0
gravemente_feridos                0
mortos                            0
dtype: int64
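
The count of 116 shown earlier comes from `duplicated(keep=False)`, which flags every member of a duplicate group, so it is larger than the number of rows that `drop_duplicates()` actually removes. A minimal sketch on a made-up frame (the `toy` data is hypothetical) illustrates the difference:

```python
import pandas as pd

# Hypothetical toy data: rows 0 and 1 are identical, row 2 is unique.
toy = pd.DataFrame({
    "n_da_ocorrencia": [239, 239, 94],
    "km": ["3945", "3945", "5565"],
})

# keep=False marks every copy in a duplicate group (both row 0 and row 1).
flagged = toy.duplicated(keep=False).sum()

# The default keep="first" marks only the repeats, which is what drop_duplicates removes.
removed = toy.duplicated().sum()

deduplicated = toy.drop_duplicates()
print(flagged, removed, len(deduplicated))
```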

### Data types

In [8]:
# Check the type of each column
df.dtypes

data                               object
horario                            object
n_da_ocorrencia                     int64
tipo_de_ocorrencia                 object
km                                 object
trecho                             object
sentido                            object
tipo_de_acidente                   object
automovel                         float64
bicicleta                         float64
caminhao                          float64
moto                              float64
onibus                            float64
outros                            float64
tracao_animal                     float64
transporte_de_cargas_especiais    float64
trator_maquinas                   float64
utilitarios                       float64
ilesos                              int64
levemente_feridos                 float64
moderadamente_feridos             float64
gravemente_feridos                float64
mortos                            float64
dtype: object

#### Date

In [9]:
# Check the Python type of a value in the date column
type(df['data'][1])

# According to the output, it is a string.


str

In [10]:
# Convert the date column to datetime.date objects (the day comes first in the source format)
df['data'] = pd.to_datetime(df['data'], dayfirst=True).dt.date


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['data'] = pd.to_datetime(df['data'], dayfirst=True).dt.date
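
The warning above most likely appears because `df` was produced by `drop_duplicates()`, and older pandas versions cannot tell whether such a result is an independent frame. One common way to avoid it, sketched here under the assumption that an explicit copy is acceptable memory-wise, is to call `.copy()` right after removing duplicates. The `toy` and `clean` names are hypothetical.

```python
import pandas as pd

# Hypothetical toy data with one repeated row.
toy = pd.DataFrame({"data": ["01/01/2010", "01/01/2010", "23/02/2022"]})

# .copy() makes the de-duplicated frame an independent object,
# so later column assignments are unambiguous.
clean = toy.drop_duplicates().copy()
clean["data"] = pd.to_datetime(clean["data"], format="%d/%m/%Y")

print(clean.dtypes)
```

The original `toy` frame is left untouched by the assignment.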


In [11]:
# Check the type of the date values again after the conversion
type(df['data'][1])

datetime.date

#### Time

In [12]:
# Check the Python type of a value in the time column
type(df['horario'][1])

# According to the output, it is a string.


str

In [13]:
df['horario'] = pd.to_datetime(df['horario'], format='%H:%M:%S').dt.time



In [14]:
# Check the type of the time values again after the conversion
type(df['horario'][1])

datetime.time

In [15]:
# Create a new column 'ano' with the year of the date
df['ano'] = df['data'].apply(lambda x: x.year)



In [16]:
# Create a new column 'mes' with the month of the date
df['mes'] = df['data'].apply(lambda x: x.month)



In [17]:
# Look at the whole DataFrame to confirm the changes did not affect its structure.

df

Unnamed: 0,data,horario,n_da_ocorrencia,tipo_de_ocorrencia,km,trecho,sentido,tipo_de_acidente,automovel,bicicleta,caminhao,moto,onibus,outros,tracao_animal,transporte_de_cargas_especiais,trator_maquinas,utilitarios,ilesos,levemente_feridos,moderadamente_feridos,gravemente_feridos,mortos,ano,mes
0,2010-01-01,12:14:00,239,sem vítima,3945,BR-116/SP,Pista Norte,Choque em barreira New Jersey,1.0,0.0,0.0,0.0,0.0,0.0,0.0,,0.0,0.0,1,0.0,0.0,0.0,0.0,2010,1
1,2010-01-01,06:13:00,94,sem vítima,5565,BR-116/SP,Pista Sul,Colisão traseira,1.0,0.0,0.0,0.0,0.0,1.0,0.0,,0.0,0.0,4,0.0,0.0,0.0,0.0,2010,1
2,2010-01-01,14:42:00,314,sem vítima,75,BR-116/SP,Pista Norte,Colisão traseira,0.0,0.0,0.0,0.0,0.0,1.0,0.0,,0.0,0.0,5,0.0,0.0,0.0,0.0,2010,1
3,2010-01-01,18:55:00,440,sem vítima,1023,BR-116/SP,Pista Sul,Abalroamento longitudinal,1.0,0.0,1.0,0.0,0.0,0.0,0.0,,0.0,0.0,2,0.0,0.0,0.0,0.0,2010,1
4,2010-01-01,08:14:00,144,sem vítima,1078,BR-116/SP,Pista Sul,Choque em objeto fixo,1.0,0.0,0.0,0.0,0.0,0.0,0.0,,0.0,0.0,1,0.0,0.0,0.0,0.0,2010,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
114989,2022-02-23,08:38:00,295,Com vítima,303400,BR-116/SP,Decrescente,Colisão traseira,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0,1.0,0.0,0.0,0.0,2022,2
114990,2022-02-24,22:23:10,747,Com vítima,325000,BR-116/SP,Crescente,Colisão traseira,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,1.0,0.0,0.0,0.0,2022,2
114991,2022-02-24,23:52:16,764,Com vítima,145200,BR-116/SP,Crescente,Engavetamento,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,1.0,0.0,0.0,0.0,2022,2
114992,2022-02-25,07:35:19,157,Sem vítima,146000,BR-116/SP,Crescente,Engavetamento,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0,2022,2
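
The `ano` and `mes` columns above were derived with `apply` on `datetime.date` objects, which leaves the `data` column with dtype `object`. An alternative sketch, assuming it is acceptable to keep the parsed dates as `datetime64`, uses the vectorized `.dt` accessor instead. The `toy` frame is made up for illustration.

```python
import pandas as pd

# Hypothetical toy data in the same day/month/year format as the source file.
toy = pd.DataFrame({"data": ["01/01/2010", "25/02/2022"]})

# Parse once, then read year and month from the datetime64 Series.
parsed = pd.to_datetime(toy["data"], format="%d/%m/%Y")
toy["ano"] = parsed.dt.year
toy["mes"] = parsed.dt.month

print(toy)
```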


### Data normalization

This section normalizes some of the data to make it more readable for later analysis and presentation.

#### Categories

In [18]:
# Check the occurrence types
pd.unique(df['tipo_de_ocorrencia'])

array(['sem vítima', 'com vítima', 'Acidente com vítima',
       'Acidente sem vítima', 'Atropelamento sem morte',
       'Atropelamento com morte', 'Acidente com morte',
       'AC02 - Acidente com VITIMA', 'AC03 - Acidente sem VITIMA',
       'AC04 - Atropelamento', 'AC05 - Atropelamento Fatal',
       'AC01 - Acidente com VITIMA FATAL', 'Com vítima', 'Sem vítima'],
      dtype=object)

In [19]:
# Mapping used to normalize the occurrence types
mapeamento = {
    'Acidente com vítima': 'Acidente com Vítima',
    'Acidente sem vítima': 'Acidente sem Vítima',
    'Atropelamento sem morte': 'Atropelamento',
    'Atropelamento com morte': 'Atropelamento Fatal',
    'Acidente com morte': 'Acidente com Vítima Fatal',
    'AC02 - Acidente com VITIMA': 'Acidente com Vítima',
    'AC03 - Acidente sem VITIMA': 'Acidente sem Vítima',
    'AC04 - Atropelamento': 'Atropelamento',
    'AC05 - Atropelamento Fatal': 'Atropelamento Fatal',
    'AC01 - Acidente com VITIMA FATAL': 'Acidente com Vítima Fatal',
    'Com vítima': 'Acidente com Vítima',
    'com vítima': 'Acidente com Vítima',
    'Sem vítima': 'Acidente sem Vítima',
    'sem vítima': 'Acidente sem Vítima',

}

In [20]:
# Normalize the occurrence column
coluna_para_padronizar = 'tipo_de_ocorrencia'
df[coluna_para_padronizar] = df[coluna_para_padronizar].replace(mapeamento)



In [21]:
# Check the normalization
pd.unique(df['tipo_de_ocorrencia'])

array(['Acidente sem Vítima', 'Acidente com Vítima', 'Atropelamento',
       'Atropelamento Fatal', 'Acidente com Vítima Fatal'], dtype=object)

In [22]:
# Inspect the "sentido" column
pd.unique(df['sentido'])

array(['Pista Norte', 'Pista Sul', 'Crescente', 'Decrescente'],
      dtype=object)

In [23]:
# Inspect the "tipo_de_acidente" column
pd.unique(df['tipo_de_acidente'])

array(['Choque em barreira New Jersey', 'Colisão traseira',
       'Abalroamento longitudinal', 'Choque em objeto fixo',
       'Choque em defensa', 'Outros', 'Choque em objeto na pista',
       'Atropelamento de animal', 'Engavetamento', 'Queda de moto',
       'Não Def', 'Capotamento', 'Atropelamento de pedestre atravessando',
       'Tombamento', 'Choque Talude', 'Colisão frontal',
       'Choque em veículo parado na pista', 'Abalroamento transversal',
       'Atropelamento de pedestre caminhando', 'Queda de ribanceira',
       'Queda de Ponte/Viaduto', 'Queda de Carga', 'Colisão Traseira',
       'Choque em Barreira New Jersey', 'Atropelamento de Animal',
       'Choque em Defensa', 'Colisão Frontal',
       'Abalroamento Longitudinal',
       'Atropelamento de Pedestre Atravessando',
       'Atropelamento de Pedestre Caminhando', 'Abalroamento Transversal',
       nan, 'Queda de ponte/viaduto', 'Abalroamento - Longitudinal',
       'Atropelamento - Animal', 'Atropelamento - Pedest

In [24]:
import numpy as np
# Mapping of the accident categories
mapeamento = {
    'Atropelamento de animal': 'Atropelamento',
    'Atropelamento de pedestre atravessando': 'Atropelamento',
    'Atropelamento de pedestre caminhando': 'Atropelamento',
    'Atropelamento de Pedestre Atravessando': 'Atropelamento',
    'Atropelamento de Pedestre Caminhando': 'Atropelamento',
    'Atropelamento de pedestre': 'Atropelamento',
    'Atropelamento - Animal': 'Atropelamento',
    'Atropelamento - Pedestre caminhando': 'Atropelamento',
    'Atropelamento - Pedestre atravessando': 'Atropelamento',
    'Colisão frontal': 'Colisões Frontais',
    'Colisão Frontal': 'Colisões Frontais',
    'Colisão traseira': 'Colisões Traseiras',
    'Colisão Traseira': 'Colisões Traseiras',
    'Colisão lateral no mesmo sentido': 'Colisões Laterais',
    'Abalroamento longitudinal': 'Abalroamento',
    'Abalroamento Longitudinal': 'Abalroamento',
    'Abalroamento transversal': 'Abalroamento',
    'Abalroamento Transversal': 'Abalroamento',
    'Choque em objeto fixo': 'Choque',
    'Choque em objeto na pista': 'Choque',
    'Choque em Defensa': 'Choque',
    'Choque em defensa': 'Choque',
    'Choque em veículo parado na pista': 'Choque',
    'Choque em Barreira New Jersey': 'Choque',
    'Choque em barreira New Jersey': 'Choque',
    'Engavetamento': 'Engavetamento',
    'Queda de moto': 'Queda de Moto',
    'Queda de ribanceira': 'Queda de Moto',
    'Queda de Ponte/Viaduto': 'Queda de Moto',
    'Queda de ponte/viaduto': 'Queda de Moto',
    'Queda de Carga': 'Queda de Carga',
    'Tombamento': 'Tombamento',
    'Capotamento': 'Capotamento',
    'Outros': 'Outros',
    'Acidentes de outra natureza': 'Outros',
    'Não Def': 'Outros',
    'nan': 'Outros'  # note: the string 'nan' does not match real missing values (NaN)
}

# Apply the mapping, overwriting 'tipo_de_acidente'; labels missing from the dictionary become NaN
df['tipo_de_acidente'] = df['tipo_de_acidente'].map(mapeamento)

# Collect the resulting unique categories
categorias_unicas = df['tipo_de_acidente'].unique()




In [25]:
# Check the normalization of the "tipo_de_acidente" column
pd.unique(df['tipo_de_acidente'])

array(['Choque', 'Colisões Traseiras', 'Abalroamento', 'Outros',
       'Atropelamento', 'Engavetamento', 'Queda de Moto', 'Capotamento',
       'Tombamento', nan, 'Colisões Frontais', 'Queda de Carga',
       'Colisões Laterais'], dtype=object)
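
`Series.map` returns NaN for any value that is missing from the dictionary, and a string key such as `'nan'` does not match a real missing value, which is why `nan` still appears in the output above. A hedged sketch of one way to list the labels that still lack an entry and to send unmapped or missing values to a catch-all category, using a shortened hypothetical mapping:

```python
import numpy as np
import pandas as pd

# Shortened, hypothetical version of the mapping used above.
mapping = {
    "Colisão traseira": "Colisões Traseiras",
    "Colisão Traseira": "Colisões Traseiras",
}

tipos = pd.Series(
    ["Colisão traseira", "Colisão Traseira", "Abalroamento - Longitudinal", np.nan]
)

# Labels present in the data but absent from the dictionary.
unmapped = tipos[~tipos.isin(list(mapping))].dropna().unique()

# map() yields NaN for absent labels and for missing values;
# fillna() then assigns them to the catch-all category.
normalized = tipos.map(mapping).fillna("Outros")

print(list(unmapped))
print(normalized.tolist())
```

In practice the labels reported in `unmapped` would usually be added to the dictionary rather than left in the catch-all category.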

#### Nulls

I decided to treat a null value as either a data-entry omission or data that does not exist. For that reason, all nulls are converted to 0 to avoid problems in later arithmetic operations or when plotting charts.

In [26]:
# Count the nulls in each column
df.isnull().sum()

data                                   0
horario                                0
n_da_ocorrencia                        0
tipo_de_ocorrencia                     0
km                                     0
trecho                                 0
sentido                                0
tipo_de_acidente                    2662
automovel                           8858
bicicleta                          42375
caminhao                           32288
moto                               36256
onibus                             40929
outros                             38247
tracao_animal                      42649
transporte_de_cargas_especiais    114541
trator_maquinas                    42654
utilitarios                        42654
ilesos                                 0
levemente_feridos                  32158
moderadamente_feridos              39389
gravemente_feridos                 41847
mortos                             41978
ano                                    0
mes             

In [27]:
# Fill the nulls with 0 in place
df.fillna(0, inplace=True)



In [28]:
# Check the nulls again
df.isnull().sum()

data                              0
horario                           0
n_da_ocorrencia                   0
tipo_de_ocorrencia                0
km                                0
trecho                            0
sentido                           0
tipo_de_acidente                  0
automovel                         0
bicicleta                         0
caminhao                          0
moto                              0
onibus                            0
outros                            0
tracao_animal                     0
transporte_de_cargas_especiais    0
trator_maquinas                   0
utilitarios                       0
ilesos                            0
levemente_feridos                 0
moderadamente_feridos             0
gravemente_feridos                0
mortos                            0
ano                               0
mes                               0
dtype: int64
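
Because `fillna(0)` is applied to the whole frame, the 2662 missing `tipo_de_acidente` labels also become the number 0. A sketch of an alternative, assuming counts should default to 0 while text columns receive a label, passes a dictionary so that each column gets its own default. The `toy` frame and the `'Outros'` default are illustrative choices.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with nulls in a text column and in count columns.
toy = pd.DataFrame({
    "tipo_de_acidente": ["Choque", np.nan],
    "automovel": [1.0, np.nan],
    "mortos": [np.nan, 0.0],
})

# Passing a dict to fillna sets a different default per column.
defaults = {"tipo_de_acidente": "Outros", "automovel": 0, "mortos": 0}
filled = toy.fillna(defaults)

print(filled)
```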

#### Checking columns with wrong types

In [34]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 114936 entries, 0 to 114993
Data columns (total 25 columns):
 #   Column                          Non-Null Count   Dtype  
---  ------                          --------------   -----  
 0   data                            114936 non-null  object 
 1   horario                         114936 non-null  object 
 2   n_da_ocorrencia                 114936 non-null  int64  
 3   tipo_de_ocorrencia              114936 non-null  object 
 4   km                              114936 non-null  object 
 5   trecho                          114936 non-null  object 
 6   sentido                         114936 non-null  object 
 7   tipo_de_acidente                114936 non-null  object 
 8   automovel                       114936 non-null  float64
 9   bicicleta                       114936 non-null  float64
 10  caminhao                        114936 non-null  float64
 11  moto                            114936 non-null  float64
 12  onibus          

In [35]:
# Check that every value can be summed as a number, for later arithmetic operations
colunas_float = ['automovel', 'bicicleta', 'caminhao', 'moto', 'onibus', 'outros', 'tracao_animal', 'transporte_de_cargas_especiais', 'trator_maquinas', 'utilitarios', 'ilesos']

for coluna in colunas_float:
  try:
    df[coluna].sum()

  except TypeError:
    print(f'{coluna} contains non-float values')

In [36]:
# Normalize the columns that must be float: try to convert each value and, if the conversion fails, store 0 instead
for coluna in colunas_float:
  lista = []

  for veiculo in df[coluna].tolist():
    try:
      lista.append(float(veiculo))
    except (TypeError, ValueError):
      lista.append(0)

  df[coluna] = lista



In [37]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 114936 entries, 0 to 114993
Data columns (total 25 columns):
 #   Column                          Non-Null Count   Dtype  
---  ------                          --------------   -----  
 0   data                            114936 non-null  object 
 1   horario                         114936 non-null  object 
 2   n_da_ocorrencia                 114936 non-null  int64  
 3   tipo_de_ocorrencia              114936 non-null  object 
 4   km                              114936 non-null  object 
 5   trecho                          114936 non-null  object 
 6   sentido                         114936 non-null  object 
 7   tipo_de_acidente                114936 non-null  object 
 8   automovel                       114936 non-null  float64
 9   bicicleta                       114936 non-null  float64
 10  caminhao                        114936 non-null  float64
 11  moto                            114936 non-null  float64
 12  onibus          

In [38]:
# Look at the dataframe to check its overall integrity
display(df)

Unnamed: 0,data,horario,n_da_ocorrencia,tipo_de_ocorrencia,km,trecho,sentido,tipo_de_acidente,automovel,bicicleta,caminhao,moto,onibus,outros,tracao_animal,transporte_de_cargas_especiais,trator_maquinas,utilitarios,ilesos,levemente_feridos,moderadamente_feridos,gravemente_feridos,mortos,ano,mes
0,2010-01-01,12:14:00,239,Acidente sem Vítima,3945,BR-116/SP,Pista Norte,Choque,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,2010,1
1,2010-01-01,06:13:00,94,Acidente sem Vítima,5565,BR-116/SP,Pista Sul,Colisões Traseiras,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,4.0,0.0,0.0,0.0,0.0,2010,1
2,2010-01-01,14:42:00,314,Acidente sem Vítima,75,BR-116/SP,Pista Norte,Colisões Traseiras,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,5.0,0.0,0.0,0.0,0.0,2010,1
3,2010-01-01,18:55:00,440,Acidente sem Vítima,1023,BR-116/SP,Pista Sul,Abalroamento,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,2010,1
4,2010-01-01,08:14:00,144,Acidente sem Vítima,1078,BR-116/SP,Pista Sul,Choque,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,2010,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
114989,2022-02-23,08:38:00,295,Acidente com Vítima,303400,BR-116/SP,Decrescente,Colisões Traseiras,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,2022,2
114990,2022-02-24,22:23:10,747,Acidente com Vítima,325000,BR-116/SP,Crescente,Colisões Traseiras,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,2022,2
114991,2022-02-24,23:52:16,764,Acidente com Vítima,145200,BR-116/SP,Crescente,Engavetamento,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,2022,2
114992,2022-02-25,07:35:19,157,Acidente sem Vítima,146000,BR-116/SP,Crescente,Engavetamento,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2022,2
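
The element-by-element conversion loop in In [36] can also be expressed with `pd.to_numeric`. A minimal sketch on a hypothetical column with mixed types: `errors='coerce'` turns unparseable entries into NaN, which are then replaced by 0, mirroring the fallback used in the loop.

```python
import pandas as pd

# Hypothetical column mixing an int, a numeric string, text, and a missing value.
toy = pd.DataFrame({"automovel": [1, "2", "x", None]})

# Coerce to numbers, replace failures with 0, and force a float dtype.
toy["automovel"] = (
    pd.to_numeric(toy["automovel"], errors="coerce").fillna(0).astype(float)
)

print(toy["automovel"].tolist())
```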


# Saving the cleaned file (run only after all the cells above)

In [40]:
df.to_csv('dados_novadutra_tratados.csv', index=False)

In [44]:
# Save the cleaned DataFrame 'df' to the project folder on Google Drive
output_file_path = "/content/drive/MyDrive/Colab Notebooks/projeto/dados_novadutra_tratados.csv"
df.to_csv(output_file_path, index=False)
