<a href="https://colab.research.google.com/github/nadjapereira/python-applications/blob/master/aula7_parte6_extra_criando_o_proprio_mapa.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Creating your own map

First we will create a world map that uses the geolocation of the tweets we converted.

We will also display information when the mouse hovers over a point.

We will go through four steps to build this interactive visualization:

* Step 1. Create a background world map by adding every country border, defined by latitude and longitude.
* Step 2. Load the tweets we want to visualize, with their coordinates and author.
* Step 3. Plot the tweet coordinates on the map.
* Step 4. Finally, enable the option to display the information.



## Step 1

We will build a dictionary called ```data``` that holds the borders of every country with their respective latitudes and longitudes. With this data we will draw a map of all countries.

The file [```World_Country_Boundaries.csv.gz```](https://github.com/hargup/WIGI-website/blob/master/World_Country_Boundaries.csv.gz) contains this data. Here is a sample row.

```xml
"<MultiGeometry><Polygon><outerBoundaryIs><LinearRing><coordinates>-44.951391,-1.601664 -44.858891,-1.430627 -45.158755,-1.480418 -45.3507,-1.736809 -45.324727,-1.314718 -45.462227,-1.545555 -45.446946,-1.310836 -45.696182,-1.368682 -45.735555,-1.18 -45.861527,-1.259518 -45.9757,-1.0775 -46.046673,-1.210282 -46.259446,-1.177782 -46.191946,-0.9575 
...Data omitted...
-43.477709,-2.382782 -43.448336,-2.537782 -43.928336,-2.548473 -44.198336,-2.868891 -44.338373,-2.7808 -44.033891,-2.413609 -44.356946,-2.526664 -44.423064,-2.934446 -44.786391,-3.2975 -44.582091,-2.566809 -44.360555,-2.341946 -44.450836,-2.146391 -44.654936,-2.323682 -44.489727,-1.986664 -44.695009,-1.817782</coordinates></LinearRing></outerBoundaryIs></Polygon><Polygon><outerBoundaryIs><LinearRing><coordinates>-50.814309,-1.329518 -50.723918,-1.510146 -50.579518,-1.798682 -50.054309,-1.708473 -49.812991,-1.814446 -49.757573,-1.638755 -49.6725,-1.776664 -48.876673,-1.487782 -48.539173,-0.900282 -48.410282,-0.262127 -50.3289,-0.100282 -50.646664,-0.2625 -50.726391,-0.497782 -50.557427,-0.678609 -50.775,-0.644164 -50.796955,-0.971946 -50.569455,-1.101946 -50.781109,-1.151946 -50.790582,-1.202591</coordinates></LinearRing></outerBoundaryIs></Polygon><Polygon><outerBoundaryIs><LinearRing><coordinates>-51.901391,-1.476664 -51.276809,-1.02 -51.198336,-0.530282 -51.5914,-0.724818 -51.609727,-0.733891 -51.6428,-0.818109</coordinates></LinearRing></outerBoundaryIs></Polygon><Polygon><outerBoundaryIs><LinearRing><coordinates>-49.643618,0.210282 -49.703891,0.335 -50.391946,0.189718 -50.349864,0.021809 -49.895018,0</coordinates></LinearRing></outerBoundaryIs></Polygon><Polygon><outerBoundaryIs><LinearRing><coordinates>-50.5625,-0.054164 -50.972782,-0.09 -51.027227,-0.224164 -50.857782,-0.283055</coordinates></LinearRing></outerBoundaryIs></Polygon><Polygon><outerBoundaryIs><LinearRing><coordinates>-49.650982,0.077364 -49.8525,-0.064446 -49.525555,-0.134446 -49.397809,0</coordinates></LinearRing></outerBoundaryIs></Polygon><Polygon><outerBoundaryIs><LinearRing><coordinates>-50.317782,0.308891 -50.377782,0.618336 -50.536391,0.222364 -50.441118,0.171109</coordinates></LinearRing></outerBoundaryIs></Polygon><Polygon><outerBoundaryIs><LinearRing><coordinates>-50.813609,-0.576109 -51.070282,-0.694446 
-51.140009,-0.962218</coordinates></LinearRing></outerBoundaryIs></Polygon><Polygon><outerBoundaryIs><LinearRing><coordinates>-50.059864,0.643054 -50.308064,0.506391 -50.254173,0.341945</coordinates></LinearRing></outerBoundaryIs></Polygon><Polygon><outerBoundaryIs><LinearRing><coordinates>-45.321946,-23.726946 -45.447918,-23.9175 -45.243336,-23.967227</coordinates></LinearRing></outerBoundaryIs></Polygon></MultiGeometry>",BR,Brazil
```
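
To make the layout of the sample concrete, here is a minimal sketch of how one ```coordinates``` string can be split into numbers. It assumes, as the Brazilian values above suggest, that each pair is written as "longitude,latitude" and that pairs are separated by spaces; the variable names are illustrative only.

```python
# A shortened ring copied from the sample above: "lon,lat" pairs separated by spaces.
coordinates_text = "-51.901391,-1.476664 -51.276809,-1.02 -51.198336,-0.530282"

lons = []
lats = []
for pair in coordinates_text.split():
    # [:2] keeps longitude and latitude and ignores any extra value in the pair
    lon, lat = pair.split(",")[:2]
    lons.append(float(lon))
    lats.append(float(lat))

print(lons)
print(lats)
```

Keeping longitudes and latitudes in two parallel lists matches what a plotting library usually expects for the x and y axes.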

First we need to import a few libraries to handle this kind of data.

The first module we will import is gzip. It provides a simple interface for compressing and decompressing files.

We will use the module's ```open``` function to open the file in binary mode.

https://docs.python.org/3/library/gzip.html

In [0]:
import gzip
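
As a small illustration of what binary mode means here, the sketch below writes a throwaway gzip file and reads its first line back. The file name and its content are hypothetical and exist only for this example.

```python
import gzip
import os
import tempfile

# Hypothetical throwaway file, created only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "example.csv.gz")

with gzip.open(path, "wb") as f:
    f.write("geometry,code,name\n".encode("utf-8"))

# The default mode of gzip.open is "rb", so each line comes back as bytes.
with gzip.open(path) as f:
    first_line = next(f)

print(type(first_line), first_line)
```

Because the lines are bytes rather than text, they still have to be decoded before they can be parsed as CSV.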

The second module is codecs. It lets us encode and decode strings. Recall that our file is encoded in UTF-8.

https://docs.python.org/3/library/codecs.html

We will use the function ```iterdecode```, which uses an incremental decoder to iteratively decode the input provided by the file.

Together with the decoder, we will use the built-in ```next```, which retrieves the next item from an iterator by calling its special method ```__next__()```. In ```gerar_mapa``` it is called once to skip the first line of the file.

https://docs.python.org/3/library/functions.html?highlight=next#next

In [0]:
import codecs
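
The following sketch shows the two pieces working together on a plain list of byte strings, which stands in for the lines a binary file would yield. The sample lines are made up for the illustration.

```python
import codecs

# Hypothetical byte lines standing in for what a file opened in binary mode yields.
raw_lines = [
    b"geometry,code,name\n",
    "x,ST,S\u00e3o Tom\u00e9\n".encode("utf-8"),
]

# iterdecode lazily turns each chunk of bytes into text.
decoded = codecs.iterdecode(raw_lines, "utf-8")

header = next(decoded)    # consumes the first line only
remaining = list(decoded) # everything after the first line

print(repr(header))
print(remaining)
```

Calling ```next``` once before handing the iterator to another consumer is a common way to skip a header row.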

The next module is csv, which loads data stored in comma-separated format.

https://docs.python.org/3/library/csv.html

In [0]:
import csv
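
The geometry column itself contains commas, which is why the quote character matters. The sketch below parses one shortened, made-up row to show that commas inside the quoted field do not split it.

```python
import csv

# Hypothetical shortened row: the first field is quoted and contains commas.
lines = ['"<coordinates>-44.95,-1.60 -44.85,-1.43</coordinates>",BR,Brazil\n']

reader = csv.reader(lines, delimiter=",", quotechar='"')
geometry, code, name = next(reader)

print(geometry)
print(code, name)
```

```csv.reader``` accepts any iterable of text lines, so it works equally well with a list, an open text file, or a decoding generator.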

Finally, the last module handles the XML tags that our file stores for each country (as seen above).

The ```xml.etree.ElementTree``` module implements a simple and efficient API for parsing and creating XML data.

We will use the function ```fromstring``` to pass the tagged text to the XML parser.

https://docs.python.org/3.5/library/xml.etree.elementtree.html

In [0]:
# cElementTree was removed in Python 3.9; ElementTree picks up the C accelerator on its own
import xml.etree.ElementTree as et
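
As a rough sketch of how the parser is used on one geometry cell, the example below parses a shortened ```MultiGeometry``` string with two polygons and collects the text of every ```coordinates``` element. The string is abbreviated from the sample row, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Shortened geometry with two polygons, abbreviated from the sample row.
geometry = (
    "<MultiGeometry>"
    "<Polygon><outerBoundaryIs><LinearRing>"
    "<coordinates>-50.5625,-0.054164 -50.972782,-0.09</coordinates>"
    "</LinearRing></outerBoundaryIs></Polygon>"
    "<Polygon><outerBoundaryIs><LinearRing>"
    "<coordinates>-50.059864,0.643054 -50.308064,0.506391</coordinates>"
    "</LinearRing></outerBoundaryIs></Polygon>"
    "</MultiGeometry>"
)

root = ET.fromstring(geometry)

# ".//" searches all descendants, so every polygon's outer ring is found.
rings = [node.text for node in root.findall(".//outerBoundaryIs/LinearRing/coordinates")]

print(len(rings))
print(rings[0])
```

Each entry in ```rings``` is still a plain string, so it has to be split into numeric pairs afterwards.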

In [0]:
from bokeh.models import ColumnDataSource

def gerar_mapa(arquivo):
    """Read the gzipped CSV of country borders and return a ColumnDataSource."""
    nan = float('NaN')
    data = {}
    # gzip.open yields bytes, so decode each line as UTF-8 before parsing the CSV
    with gzip.open(arquivo) as f:
        decoded = codecs.iterdecode(f, "utf-8")
        next(decoded)  # skip the first line of the file
        reader = csv.reader(decoded, delimiter=',', quotechar='"')
        for row in reader:
            geometry, code, name = row
            xml = et.fromstring(geometry)
            lats = []
            lons = []
            for i, poly in enumerate(xml.findall('.//outerBoundaryIs/LinearRing/coordinates')):
                # A NaN between rings keeps separate polygons from being joined
                if i > 0:
                    lats.append(nan)
                    lons.append(nan)
                # Each pair in the file is written as "longitude,latitude"
                for pair in poly.text.split():
                    lon, lat = pair.split(',')[:2]
                    lons.append(float(lon))
                    lats.append(float(lat))
            if lats:  # keep only rows that had at least one ring
                data[code] = {
                    'name': name,
                    'lats': lats,
                    'lons': lons,
                }

    # One entry per country: x values are longitudes, y values are latitudes
    countries_source = ColumnDataSource(data=dict(
        countries_xs=[country['lons'] for country in data.values()],
        countries_ys=[country['lats'] for country in data.values()],
        country=[country['name'] for country in data.values()],
    ))

    return countries_source

In [0]:
origem_paises = gerar_mapa('World_Country_Boundaries.csv.gz')
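
Inside ```gerar_mapa```, a country made of several polygons is stored as a single pair of lists, with a NaN placed between consecutive rings. The sketch below reproduces only that step with two made-up triangles; to my understanding, Bokeh's ```patches``` glyph treats NaN values as breaks between disjoint parts of one patch, which is what makes this layout work.

```python
import math

nan = float("nan")

# Hypothetical rings, each given as (longitudes, latitudes).
rings = [
    ([0.0, 1.0, 1.0], [0.0, 0.0, 1.0]),
    ([5.0, 6.0, 6.0], [5.0, 5.0, 6.0]),
]

lons, lats = [], []
for i, (ring_lons, ring_lats) in enumerate(rings):
    # Insert a NaN before every ring except the first one
    if i > 0:
        lons.append(nan)
        lats.append(nan)
    lons.extend(ring_lons)
    lats.extend(ring_lats)

print(lons)
print(lats)
```

The two lists always stay the same length, so the NaN in one lines up with the NaN in the other.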

In [0]:
from bokeh.plotting import figure, show, output_notebook

In [0]:
output_notebook()

In [0]:
# Bokeh 3.0 removed plot_width/plot_height in favor of width/height
p = figure(
    width=1000,
    height=600,
    tools="pan,wheel_zoom,box_zoom,reset,hover,save"
)

In [0]:
p.patches(xs="countries_xs", ys="countries_ys", source = origem_paises, fill_color="#F1EEF6", fill_alpha=0.3,
line_color="#999999", line_width=0.5)

In [0]:
show(p)

## Step 2 - Selecting the tweets

In [0]:
import pandas as pd

In [0]:
df = pd.read_csv('tweets_10min.csv', sep=';')

In [0]:
df.head(1)

In [0]:
df_aux = df[['created_at', 'lat', 'long','location' ,'screen_name', 'text']].copy()

In [0]:
df_aux = df_aux.rename(columns={'created_at':'date', 'screen_name':'user'})

In [0]:
df_aux.head(3)

## Step 3 - Adding the tweets to the map

In [0]:
tweets_source = ColumnDataSource(df_aux)

In [0]:
p.title.text="{} tweets".format(len(df_aux.index))
p.title.text_font_size="20pt"

In [0]:
p.scatter(x="long", y="lat", source=tweets_source, fill_color="#FF0000", line_color="#FF0000")

In [0]:
show(p)

## Step 4 - Adding the supplementary information

In [0]:
from bokeh.models import HoverTool
from collections import OrderedDict

In [0]:
hover = p.select(dict(type=HoverTool))
hover.point_policy = "follow_mouse"
hover.tooltips = OrderedDict([
("user", "@user"),
("local", "@location"),
])

The cell below renders the world map, with red dots marking the locations where the tweets originated:

In [0]:
show(p)