## unificacao_bases_mogi_conecta

Actions: mappings and cross-references with Mogi Conecta

### Data Sources
- abairramento.geojson : GeoJSON file read from the current working directory (origin not documented)

### Changes
- 06-16-2023 : Started project

In [57]:
import json
from datetime import datetime
from pathlib import Path

import pandas as pd

### File Locations

In [58]:
today = datetime.today()
in_file = Path.cwd() / "abairramento.geojson"
result_file = Path.cwd() / "abairramento.parquet"

In [59]:
# Read the .geojson file
with open(in_file, encoding="utf-8") as f:
    data = json.load(f)

# Flatten the features into a DataFrame (json_normalize already returns one)
df = pd.json_normalize(data['features'])
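To make the flattening step concrete, here is a minimal sketch of what `pd.json_normalize` does to a GeoJSON feature list. The sample feature is invented for illustration and is not taken from `abairramento.geojson`.

```python
import pandas as pd

# Hypothetical, hand-written feature collection (not real data from the file).
sample = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"NOME": "EXAMPLE", "DISTRITO": "DISTRITO SEDE"},
            "geometry": {
                "type": "MultiPolygon",
                "coordinates": [
                    [[[-46.2, -23.5], [-46.1, -23.5], [-46.1, -23.4], [-46.2, -23.5]]]
                ],
            },
        }
    ],
}

# Nested dictionaries become dotted column names; lists are kept as-is.
sample_df = pd.json_normalize(sample["features"])
print(sorted(sample_df.columns))
```

The dotted names (`properties.NOME`, `geometry.coordinates`, and so on) are the ones renamed in the column cleanup step.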

### Column Cleanup

- Remove all leading and trailing spaces
- Rename the columns for consistency.

In [60]:
# https://stackoverflow.com/questions/30763351/removing-space-in-dataframe-python
df.columns = [x.strip() for x in df.columns]

In [61]:
{col: '' for col in df.columns}

{'type': '',
 'properties.Name': '',
 'properties.description': '',
 'properties.timestamp': '',
 'properties.begin': '',
 'properties.end': '',
 'properties.altitudeMode': '',
 'properties.tessellate': '',
 'properties.extrude': '',
 'properties.visibility': '',
 'properties.drawOrder': '',
 'properties.icon': '',
 'properties.NOME': '',
 'properties.DISTRITO': '',
 'properties.NUMERO': '',
 'geometry.type': '',
 'geometry.coordinates': ''}

In [62]:
cols_to_rename = {
    'properties.NOME': 'nome',
    'properties.DISTRITO': 'distrito',
    'properties.NUMERO': 'numero',
    'geometry.coordinates': 'coordenadas',
}
df = df.rename(columns=cols_to_rename)

In [63]:
df.columns

Index(['type', 'properties.Name', 'properties.description',
       'properties.timestamp', 'properties.begin', 'properties.end',
       'properties.altitudeMode', 'properties.tessellate',
       'properties.extrude', 'properties.visibility', 'properties.drawOrder',
       'properties.icon', 'nome', 'distrito', 'numero', 'geometry.type',
       'coordenadas'],
      dtype='object')
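By default `DataFrame.rename` ignores mapping keys that are not present, so a typo in the mapping would go unnoticed. The sketch below shows one way to guard against that. The helper name `rename_strict` is hypothetical and the sample frame is invented.

```python
import pandas as pd


def rename_strict(frame: pd.DataFrame, mapping: dict) -> pd.DataFrame:
    """Strip whitespace from column names, then rename, failing on unknown keys."""
    cleaned = frame.copy()
    cleaned.columns = cleaned.columns.str.strip()
    missing = set(mapping) - set(cleaned.columns)
    if missing:
        raise KeyError(f"Columns not found: {sorted(missing)}")
    return cleaned.rename(columns=mapping)


# Invented example frame with stray spaces in one column name.
example = pd.DataFrame({" properties.NOME ": ["A"], "properties.NUMERO": ["1"]})
renamed = rename_strict(
    example, {"properties.NOME": "nome", "properties.NUMERO": "numero"}
)
print(list(renamed.columns))
```

Passing `errors="raise"` to `rename` gives a similar check without a helper.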

### Clean Up Data Types

In [64]:
df.dtypes

type                       object
properties.Name            object
properties.description     object
properties.timestamp       object
properties.begin           object
properties.end             object
properties.altitudeMode    object
properties.tessellate       int64
properties.extrude          int64
properties.visibility       int64
properties.drawOrder       object
properties.icon            object
nome                       object
distrito                   object
numero                     object
geometry.type              object
coordenadas                object
dtype: object
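The `dtypes` output shows `numero` and `distrito` stored as `object`. If typed values are needed downstream, a conversion along these lines might work. This is a sketch on invented values; it assumes `numero` holds digit strings, which the source does not confirm.

```python
import pandas as pd

# Invented values standing in for the real columns.
example = pd.DataFrame({"numero": ["1", "2", "x"], "distrito": ["A", "B", "A"]})

# errors="coerce" turns unparseable entries into NaN instead of raising;
# the nullable Int64 dtype then keeps integers alongside missing values.
example["numero"] = pd.to_numeric(example["numero"], errors="coerce").astype("Int64")

# Repeated labels can be stored compactly as a categorical.
example["distrito"] = example["distrito"].astype("category")

print(example.dtypes)
```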

### Data Manipulation

In [65]:
# Compute the bounding-box center and size of a geometry.
# Despite the name, the result is the midpoint of the extremes, not an arithmetic mean.
def calcular_media(valores):
    # valores[0][0] is the outer ring of the first polygon (assumes MultiPolygon nesting);
    # GeoJSON positions are [longitude, latitude]
    ring = valores[0][0]
    latitudes = [coord[1] for coord in ring]
    longitudes = [coord[0] for coord in ring]

    # Geographic extent
    min_lat, max_lat = min(latitudes), max(latitudes)
    min_lon, max_lon = min(longitudes), max(longitudes)

    # Center of the bounding box
    latitude_media = (min_lat + max_lat) / 2
    longitude_media = (min_lon + max_lon) / 2

    # Extent (width and height) of the geometry, rounded to 7 decimal places
    geometry_height = round(max_lat - min_lat, 7)
    geometry_width = round(max_lon - min_lon, 7)

    return pd.Series({
        'latitude_media': latitude_media, 'longitude_media': longitude_media,
        'geometry_width': geometry_width, 'geometry_height': geometry_height,
    })

# Apply the function and add the new columns
df[['latitude_media', 'longitude_media', 'geometry_width', 'geometry_height']] = df['coordenadas'].apply(calcular_media)
df.columns

Index(['type', 'properties.Name', 'properties.description',
       'properties.timestamp', 'properties.begin', 'properties.end',
       'properties.altitudeMode', 'properties.tessellate',
       'properties.extrude', 'properties.visibility', 'properties.drawOrder',
       'properties.icon', 'nome', 'distrito', 'numero', 'geometry.type',
       'coordenadas', 'latitude_media', 'longitude_media', 'geometry_width',
       'geometry_height'],
      dtype='object')
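`calcular_media` reads only `valores[0][0]`, that is, the outer ring of the first polygon, which fits a `MultiPolygon` with a single part. A possible generalization that walks every ring of either a `Polygon` or a `MultiPolygon` is sketched below. The helper names `iter_points` and `bbox_summary` are hypothetical, and the coordinates are invented.

```python
def iter_points(coords):
    """Yield every [lon, lat] pair from arbitrarily nested GeoJSON coordinates."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords  # reached a single position
    else:
        for part in coords:
            yield from iter_points(part)


def bbox_summary(coords):
    """Return the bounding-box center and size for nested coordinates."""
    points = list(iter_points(coords))
    lons = [p[0] for p in points]
    lats = [p[1] for p in points]
    return {
        "latitude_media": (min(lats) + max(lats)) / 2,
        "longitude_media": (min(lons) + max(lons)) / 2,
        "geometry_width": round(max(lons) - min(lons), 7),
        "geometry_height": round(max(lats) - min(lats), 7),
    }


# Invented MultiPolygon with two rectangular parts.
multi = [
    [[[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.0, 0.0]]],
    [[[2.0, 2.0], [4.0, 2.0], [4.0, 3.0], [2.0, 3.0], [2.0, 2.0]]],
]
summary = bbox_summary(multi)
print(summary)
```

With this variant the box covers both parts, whereas reading only the first ring would describe the first rectangle alone.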

In [66]:
df = df.sort_values('nome')

In [67]:
df[['nome', 'distrito', 'latitude_media', 'longitude_media', 'geometry_width', 'geometry_height']]

| | nome | distrito | latitude_media | longitude_media | geometry_width | geometry_height |
|---|---|---|---|---|---|---|
| 11 | ALTO DO IPIRANGA | DISTRITO SEDE | -23.538357 | -46.202279 | 0.009936 | 0.019483 |
| 38 | ARUA | DISTRITO ALTO DO PARATEI | -23.473431 | -46.261070 | 0.025324 | 0.031087 |
| 66 | BARROSO | DISTRITO DE QUATINGA | -23.692141 | -46.221106 | 0.047840 | 0.049611 |
| 84 | BEIJA FLOR | DISTRITO DO TABOAO | -23.423249 | -46.150231 | 0.086244 | 0.073808 |
| 51 | BELLA CITTA | DISTRITO SEDE | -23.493731 | -46.175116 | 0.012077 | 0.018817 |
| ... | ... | ... | ... | ... | ... | ... |
| 67 | VL. RUBENS | DISTRITO SEDE | -23.529978 | -46.212143 | 0.017799 | 0.011392 |
| 46 | VL. SAO FRANCISCO | DISTRITO BRAZ CUBAS | -23.524402 | -46.217133 | 0.024571 | 0.021170 |
| 102 | VL. SAO SEBASTIAO | DISTRITO SEDE | -23.551294 | -46.198654 | 0.010217 | 0.010294 |
| 68 | VL. SUISSA | DISTRITO DE CEZAR DE SOUZA | -23.497694 | -46.150890 | 0.023873 | 0.015217 |


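### Save output file

Save the cleaned DataFrame so it can be read back later for further analysis. This notebook writes Parquet to `result_file`, which points at the current working directory rather than a separate processed directory.

Other formats pandas can write include:
- pickle
- feather
- msgpack (only in pandas versions before 1.0, where `to_msgpack` was removed)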

In [68]:
df.to_parquet(result_file)
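The `coordenadas` column holds deeply nested Python lists, and whether that survives a Parquet round trip can depend on the engine installed. One defensive option, shown here as a sketch rather than as what this notebook does, is to store the column as JSON text and decode it after reading. The frame below is invented and the helper names are hypothetical.

```python
import json

import pandas as pd


def encode_nested(frame: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return a copy with `column` serialized to JSON text."""
    out = frame.copy()
    out[column] = out[column].apply(json.dumps)
    return out


def decode_nested(frame: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return a copy with `column` parsed back from JSON text."""
    out = frame.copy()
    out[column] = out[column].apply(json.loads)
    return out


# Invented one-row frame with MultiPolygon-style nesting.
example = pd.DataFrame({"nome": ["EXAMPLE"], "coordenadas": [[[[[-46.2, -23.5]]]]]})
encoded = encode_nested(example, "coordenadas")
# encoded.to_parquet("example.parquet")  # writing requires pyarrow or fastparquet
restored = decode_nested(encoded, "coordenadas")
```

The trade-off is that the stored column is plain text, so any consumer has to call `json.loads` before using the coordinates.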