## Dataset Analysis: Landslides After Rainfall, 2007-2016

### Student: Samara Lima de Souza

Location and cause of landslide events around the world

Context:

Landslides are among the most widespread hazards in the world, causing more than 11,500 deaths in 70 countries since 2007. The most frequent trigger is intense, prolonged rainfall that saturates the soil on vulnerable slopes.

The Global Landslide Catalog (GLC) was developed to identify rainfall-triggered landslide events around the world, regardless of size, impact, or location. The GLC considers all types of mass movements triggered by rainfall that have been reported in the media, disaster databases, scientific reports, or other sources.

The data raise some interesting questions:

* Q1: In which time period did the most landslides occur in each country?
* Q2: How are the landslide types distributed across countries?

Let's import the libraries we need to answer these questions.

In [1]:
import pandas as pd
import altair as alt

In [2]:
df = pd.read_csv("db5-landslides/catalog.csv")

### Q1: In which time period did the most landslides occur in each country?

In this dataset, the `landslide_type` column contains the same value written with different capitalization: "Landslide" with an uppercase L and "landslide" with a lowercase l.

In [3]:
pd.value_counts(df['landslide_type'])

Landslide             866
Mudslide              635
Rockfall               70
Complex                48
Debris flow            38
Other                  14
Riverbank collapse      5
Creep                   5
mudslide                4
Snow avalanche          2
Lahar                   2
Rockslide               1
landslide               1
Unknown                 1
Name: landslide_type, dtype: int64

Let's merge these values:

In [4]:
df.loc[df.landslide_type == "landslide",'landslide_type'] = 'Landslide'

In [5]:
pd.value_counts(df['landslide_type'])

Landslide             867
Mudslide              635
Rockfall               70
Complex                48
Debris flow            38
Other                  14
Riverbank collapse      5
Creep                   5
mudslide                4
Snow avalanche          2
Lahar                   2
Rockslide               1
Unknown                 1
Name: landslide_type, dtype: int64
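
The recount above still lists `mudslide` separately from `Mudslide`, so normalizing the whole column may be more robust than fixing one value at a time. A minimal sketch, assuming that capitalizing only the first letter is the desired canonical form; the `toy` DataFrame and its values are illustrative and not taken from the catalog:

```python
import pandas as pd

# Toy data standing in for the real `landslide_type` column (illustrative values).
toy = pd.DataFrame({
    "landslide_type": ["Landslide", "landslide", "Mudslide", "mudslide", "Debris flow"]
})

# str.capitalize() upper-cases the first character and lower-cases the rest,
# so "landslide" becomes "Landslide" while "Debris flow" is left unchanged.
# str.strip() additionally removes stray leading/trailing whitespace.
toy["landslide_type"] = toy["landslide_type"].str.strip().str.capitalize()

counts = toy["landslide_type"].value_counts()
print(counts)
```

Applied to the real `df`, the same one-liner would replace the per-value `df.loc[...]` assignments, at the cost of also lower-casing any value that legitimately contains inner capitals.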

The same is done for the `trigger` feature, which contains the same word differing only in the capitalization of its first letter:

* Unknown and unknown

In [6]:
pd.value_counts(df['trigger'])

Downpour                   866
Rain                       446
Tropical cyclone           131
Continuous rain             95
Unknown                     88
unknown                     12
Snowfall snowmelt           11
Flooding                    10
Earthquake                   8
Mining digging               7
Freeze thaw                  6
Construction                 5
Other                        3
Dam embankment collapse      1
downpour                     1
Volcano                      1
Name: trigger, dtype: int64

In [7]:
df.loc[df.trigger == "unknown",'trigger'] = 'Unknown'

In [8]:
pd.value_counts(df['trigger'])

Downpour                   866
Rain                       446
Tropical cyclone           131
Unknown                    100
Continuous rain             95
Snowfall snowmelt           11
Flooding                    10
Earthquake                   8
Mining digging               7
Freeze thaw                  6
Construction                 5
Other                        3
Dam embankment collapse      1
downpour                     1
Volcano                      1
Name: trigger, dtype: int64

The same is done for the `landslide_size` feature, whose values likewise differ only in the capitalization of the first letter:

In [12]:
pd.value_counts(df['landslide_size'])

Medium        965
Small         636
Large          68
Very_large     15
medium          3
small           3
large           2
Name: landslide_size, dtype: int64

In [15]:
df.loc[df.landslide_size == "medium",'landslide_size'] = 'Medium'
df.loc[df.landslide_size == "small",'landslide_size'] = 'Small'
df.loc[df.landslide_size == "large",'landslide_size'] = 'Large'

In [16]:
pd.value_counts(df['landslide_size'])

Medium        968
Small         639
Large          70
Very_large     15
Name: landslide_size, dtype: int64
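
Spotting these case variants by eye in each `value_counts` listing is easy to get wrong (a `downpour` entry, for example, can go unnoticed). The check can be sketched as a small helper; `case_variants` is a hypothetical function introduced here for illustration, and the `toy` data is made up:

```python
import pandas as pd

def case_variants(series: pd.Series) -> dict:
    """Map each lower-cased value to its spellings, keeping only values
    that appear with more than one capitalization."""
    groups = {}
    for value in series.dropna().astype(str).unique():
        groups.setdefault(value.lower(), set()).add(value)
    return {key: sorted(spellings)
            for key, spellings in groups.items()
            if len(spellings) > 1}

# Illustrative data mimicking the kind of inconsistencies seen above.
toy = pd.DataFrame({
    "landslide_size": ["Medium", "medium", "Small", "Large", "large", "Very_large"],
    "trigger": ["Rain", "Downpour", "downpour", "Unknown", "unknown", "Rain"],
})

# One report per column: an empty dict means no case duplicates remain.
report = {column: case_variants(toy[column]) for column in toy.columns}
print(report)
```

Running the helper again after cleaning gives a quick confirmation that every categorical column is consistent.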

For a general overview, let's look at which landslide types occurred in each year:

In [47]:
alt.Chart(df).mark_bar().encode(
    y = 'ano',
    x = 'landslide_type',
    color="landslide_size",
    tooltip=[alt.Tooltip("landslide_size"),]
)

Based on the chart above, the landslide types that occurred most often between 2007 and 2016 were landslides and mudslides.
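
A reading taken from a chart can be cross-checked numerically. The sketch below, on made-up data, tabulates event counts per year and type with `pd.crosstab`; the column names `ano` and `landslide_type` follow the notebook, while the rows themselves are purely illustrative:

```python
import pandas as pd

# Illustrative rows only; the real analysis would use `df` directly.
toy = pd.DataFrame({
    "ano": [2007, 2007, 2008, 2008, 2008, 2009],
    "landslide_type": ["Landslide", "Mudslide", "Landslide",
                       "Landslide", "Rockfall", "Mudslide"],
})

# Rows are years, columns are landslide types, cells are event counts.
by_year = pd.crosstab(toy["ano"], toy["landslide_type"])

# Summing over years ranks the types by overall frequency.
totals = by_year.sum().sort_values(ascending=False)
print(by_year)
print(totals)
```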

Now let's look at this information for each country:

In [52]:
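# Slider over the years in the dataset; step=1 so that every year is selectable
slider = alt.binding_range(min=2007, max=2016, step=1)
# The init key must match the selection field name ('ano')
select_year = alt.selection_single(name="ano", fields=['ano'],
                                   bind=slider, init={'ano': 2016})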

alt.Chart(df).mark_bar().encode(
    x=alt.X('country_name:N'),
    y=alt.Y('landslide_type'),
    color=alt.Color('country_name'),
    column='ano:O'
).properties(
    width=20
).add_selection(
    select_year
).transform_filter(
    select_year
).configure_facet(
    spacing=8
)

In [9]:
bars = alt.Chart(df).mark_bar(color="cadetblue").encode(
    alt.Y("trigger"),
    alt.X("count()")
)
text = bars.mark_text(
    align='left',
    baseline='middle',
    dx=3  # Nudges text to right so it doesn't appear on top of the bar
).encode(
    text='count()'
)

(bars + text).properties()

In [17]:
alt.Chart(df).mark_bar().encode(
    x = 'trigger',
    y = 'country_name',
    color="landslide_size",
    tooltip=[alt.Tooltip("landslide_type"),]
)

In [26]:
base = alt.Chart(df).mark_geoshape(
    fill='#666666',
    stroke='white'
).properties(
    width=300,
    height=180
)

base

In [24]:
alt.Chart(df).mark_circle().encode(
    latitude="latitude:Q",
    longitude="longitude:Q"
)

In [28]:
df['date'] = pd.to_datetime(df.date)
df.head()

Unnamed: 0,id,date,time,continent_code,country_name,country_code,state/province,population,city/town,distance,...,geolocation,hazard_type,landslide_type,landslide_size,trigger,storm_name,injuries,fatalities,source_name,source_link
0,34,2007-03-02,Night,,United States,US,Virginia,16000,Cherry Hill,3.40765,...,"(38.600900000000003, -77.268199999999993)",Landslide,Landslide,Small,Rain,,,,NBC 4 news,http://www.nbc4.com/news/11186871/detail.html
1,42,2007-03-22,,,United States,US,Ohio,17288,New Philadelphia,3.33522,...,"(40.517499999999998, -81.430499999999995)",Landslide,Landslide,Small,Rain,,,,Canton Rep.com,http://www.cantonrep.com/index.php?ID=345054&C...
2,56,2007-04-06,,,United States,US,Pennsylvania,15930,Wilkinsburg,2.91977,...,"(40.4377, -79.915999999999997)",Landslide,Landslide,Small,Rain,,,,The Pittsburgh Channel.com,https://web.archive.org/web/20080423132842/htt...
3,59,2007-04-14,,,Canada,CA,Quebec,42786,Châteauguay,2.98682,...,"(45.322600000000001, -73.777100000000004)",Landslide,Riverbank collapse,Small,Rain,,,,Le Soleil,http://www.hebdos.net/lsc/edition162007/articl...
4,61,2007-04-15,,,United States,US,Kentucky,6903,Pikeville,5.66542,...,"(37.432499999999997, -82.493099999999998)",Landslide,Landslide,Small,Downpour,,,0.0,Matthew Crawford (KGS),


In [31]:
# Passing the column directly keeps this cell safe to re-run once `date` is already datetime
df['date'] = pd.to_datetime(df['date'], format='%m/%d/%Y')
df.head()

Unnamed: 0,id,date,time,continent_code,country_name,country_code,state/province,population,city/town,distance,...,geolocation,hazard_type,landslide_type,landslide_size,trigger,storm_name,injuries,fatalities,source_name,source_link
0,34,2007-03-02,Night,,United States,US,Virginia,16000,Cherry Hill,3.40765,...,"(38.600900000000003, -77.268199999999993)",Landslide,Landslide,Small,Rain,,,,NBC 4 news,http://www.nbc4.com/news/11186871/detail.html
1,42,2007-03-22,,,United States,US,Ohio,17288,New Philadelphia,3.33522,...,"(40.517499999999998, -81.430499999999995)",Landslide,Landslide,Small,Rain,,,,Canton Rep.com,http://www.cantonrep.com/index.php?ID=345054&C...
2,56,2007-04-06,,,United States,US,Pennsylvania,15930,Wilkinsburg,2.91977,...,"(40.4377, -79.915999999999997)",Landslide,Landslide,Small,Rain,,,,The Pittsburgh Channel.com,https://web.archive.org/web/20080423132842/htt...
3,59,2007-04-14,,,Canada,CA,Quebec,42786,Châteauguay,2.98682,...,"(45.322600000000001, -73.777100000000004)",Landslide,Riverbank collapse,Small,Rain,,,,Le Soleil,http://www.hebdos.net/lsc/edition162007/articl...
4,61,2007-04-15,,,United States,US,Kentucky,6903,Pikeville,5.66542,...,"(37.432499999999997, -82.493099999999998)",Landslide,Landslide,Small,Downpour,,,0.0,Matthew Crawford (KGS),


In [37]:
df['ano'] = df['date'].dt.strftime('%Y')
df['mes'] = df['date'].dt.strftime('%m')
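
Year and month can be derived from a datetime column either as text or as integers, and the choice matters later: `strftime` yields zero-padded strings, whereas the `.dt.year` and `.dt.month` accessors yield integers, which are likely easier to compare against numeric values such as a slider position. A small sketch on made-up dates:

```python
import pandas as pd

# Illustrative dates in the month/day/year layout.
dates = pd.to_datetime(
    pd.Series(["03/02/2007", "06/27/2007", "11/15/2016"]),
    format="%m/%d/%Y",
)

month_as_text = dates.dt.strftime("%m")  # zero-padded strings
month_as_int = dates.dt.month            # integers
year_as_int = dates.dt.year              # integers

print(month_as_text.tolist())
print(month_as_int.tolist())
print(year_as_int.tolist())
```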

In [38]:
df.head(10)

Unnamed: 0,id,date,time,continent_code,country_name,country_code,state/province,population,city/town,distance,...,landslide_type,landslide_size,trigger,storm_name,injuries,fatalities,source_name,source_link,ano,mes
0,34,2007-03-02,Night,,United States,US,Virginia,16000,Cherry Hill,3.40765,...,Landslide,Small,Rain,,,,NBC 4 news,http://www.nbc4.com/news/11186871/detail.html,2007,3
1,42,2007-03-22,,,United States,US,Ohio,17288,New Philadelphia,3.33522,...,Landslide,Small,Rain,,,,Canton Rep.com,http://www.cantonrep.com/index.php?ID=345054&C...,2007,3
2,56,2007-04-06,,,United States,US,Pennsylvania,15930,Wilkinsburg,2.91977,...,Landslide,Small,Rain,,,,The Pittsburgh Channel.com,https://web.archive.org/web/20080423132842/htt...,2007,4
3,59,2007-04-14,,,Canada,CA,Quebec,42786,Châteauguay,2.98682,...,Riverbank collapse,Small,Rain,,,,Le Soleil,http://www.hebdos.net/lsc/edition162007/articl...,2007,4
4,61,2007-04-15,,,United States,US,Kentucky,6903,Pikeville,5.66542,...,Landslide,Small,Downpour,,,0.0,Matthew Crawford (KGS),,2007,4
5,64,2007-04-20,,,United States,US,Kentucky,6903,Pikeville,0.23715,...,Landslide,Small,Rain,,,,Applalachain news-express,http://www.news-expressky.com/articles/2007/04...,2007,4
6,67,2007-04-24,,,United States,US,South Dakota,2540,Dakota Dunes,2.48033,...,Landslide,Small,Rain,,,,Sioux City Journnal,http://www.siouxcityjournal.com/articles/2007/...,2007,4
7,77,2007-05-21,,SA,Colombia,CO,Risaralda,440118,Pereira,0.62022,...,Mudslide,Large,Rain,,,13.0,Reuters - AlertNet.org,http://www.reuters.com/news/video/videoStory?v...,2007,5
8,105,2007-06-27,,SA,Ecuador,EC,Zamora-Chinchipe,15276,Zamora,0.47714,...,Landslide,Medium,Downpour,,,,Red Cross - Field reports,https://www-secure.ifrc.org/dmis/prepare/view_...,2007,6
9,106,2007-06-27,,SA,Ecuador,EC,Loja,117796,Loja,0.35649,...,Landslide,Medium,Downpour,,,,Red Cross - Field reports,https://www-secure.ifrc.org/dmis/prepare/view_...,2007,6


Number of records per country (`country_name`):

United States                       986
Colombia                             96
Guatemala                            79
Costa Rica                           75
Mexico                               73
Trinidad and Tobago                  63
Panama                               40
Honduras                             38
Jamaica                              33
Nicaragua                            31
Ecuador                              30
El Salvador                          22
Haiti                                20
Venezuela                            20
Dominican Republic                   15
Dominica                             14
Peru                                 14
Canada                               10
Saint Lucia                           8
Puerto Rico                           7
Brazil                                4
Saint Vincent and the Grenadines      4
Cuba                                  3
U.S. Virgin Islands                   2
Grenada                               2
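
Q1 asks for the period with the most landslides in each country, which can also be answered in tabular form. A minimal sketch, assuming "period" means calendar year and using the notebook's `country_name` and `ano` column names; the rows are invented for illustration:

```python
import pandas as pd

# Illustrative rows only; the real analysis would use `df` directly.
toy = pd.DataFrame({
    "country_name": ["Colombia", "Colombia", "Colombia", "Mexico", "Mexico", "Peru"],
    "ano": [2010, 2010, 2011, 2008, 2008, 2013],
})

# Count events per (country, year) pair.
counts = (
    toy.groupby(["country_name", "ano"])
       .size()
       .reset_index(name="events")
)

# For each country, keep the row with the highest event count.
# idxmax returns the first maximum, so ties resolve to the earliest year.
busiest = counts.loc[
    counts.groupby("country_name")["events"].idxmax()
].set_index("country_name")
print(busiest)
```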


In [45]:
alt.Chart(df).mark_bar().encode(
    y = 'ano',
    x = 'country_name',
    color="fatalities",
    tooltip=[alt.Tooltip("fatalities"),]
)
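
Because many `fatalities` entries are missing, a per-country, per-year total may be easier to interpret than the raw values. The sketch below uses made-up rows and passes `min_count=1` so that groups with no recorded value stay missing instead of being reported as zero:

```python
import pandas as pd

# Illustrative rows only; None marks an unreported fatality count.
toy = pd.DataFrame({
    "country_name": ["Colombia", "Colombia", "Colombia", "Ecuador", "United States"],
    "ano": [2007, 2007, 2008, 2007, 2007],
    "fatalities": [13.0, None, 2.0, None, 0.0],
})

# min_count=1 requires at least one non-missing value per group;
# otherwise the group total is NaN rather than 0.
deaths = toy.groupby(["country_name", "ano"])["fatalities"].sum(min_count=1)
print(deaths)
```

This keeps "no deaths reported" (0.0) distinct from "no information" (NaN), a distinction a color scale alone can blur.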