# Global landslides dataset: a (very) basic investigation

## Description:

This very short notebook aims to show what can be done with basic data handling skills in Python. Everything in it (including Python, Jupyter and Markdown) was learned by following the first step of the [Dataquest](https://www.dataquest.io) "Data Scientist in Python" path. The notebook will be improved with data visualization once I know how to do it :).

We will work with a very interesting CSV file found on [NASA's open data portal](https://data.nasa.gov/Earth-Science/Global-Landslide-Catalog-Export/dd9e-wu2v). Here is the file description:
>The Global Landslide Catalog (GLC) was developed with the goal of identifying rainfall-triggered landslide events around the world, regardless of size, impacts or location. The GLC considers all types of mass movements triggered by rainfall, which have been reported in the media, disaster databases, scientific reports, or other sources. The GLC has been compiled since 2007 at NASA Goddard Space Flight Center. This is a unique data set with the ID tag “GLC” in the landslide editor.

The file contains more than 11,000 rows and 31 columns: each row is a landslide described by 31 characteristics. Let's perform some simple investigations on it.

## Step 1: Let's turn the CSV file into a dataset and display the first row

The first thing to do is to turn the CSV file into a Python object. We choose to create our own class: a dataset with 3 attributes:
1. ***h***: a list containing the header row
2. ***l***: a list of lists containing all the remaining rows (the landslides)
3. ***d***: a dictionary mapping each header to the list of values in its column

We name our dataset "lsc" for *landslide catalog*. We will also use the variable "ls" (for *landslide*) to loop through the landslides in "lsc".

In [1]:
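```python
import csv

class dataset:
    def __init__(self, CSVfile):
        # newline='' is what the csv module expects; the with block closes the file
        with open(CSVfile, 'r', newline='') as file_handler:
            list_of_lists = list(csv.reader(file_handler))
        self.h = list_of_lists[0]   # header row
        self.l = list_of_lists[1:]  # one list per landslide
        # d maps each header to the list of values in its column
        self.d = {}
        for index, header in enumerate(self.h):
            column_data = []
            for row in self.l:
                column_data.append(row[index])
            self.d[header] = column_data

lsc = dataset('Global_Landslide_Catalog_Export.csv')
```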
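For comparison, here is a minimal sketch of the same three attributes built on a tiny in-memory CSV. The header names come from the catalog, but the two data rows are made up for illustration, and transposing rows into columns with `zip(*rows)` is our own choice rather than something the class above requires.

```python
import csv
import io

# A tiny, made-up CSV standing in for Global_Landslide_Catalog_Export.csv
sample_csv = io.StringIO(
    "source_name,event_date,landslide_trigger\n"
    "AGU,08/01/2008 12:00:00 AM,rain\n"
    "example_source,01/02/2009 02:00:00 AM,downpour\n"
)

rows = list(csv.reader(sample_csv))
h = rows[0]    # header row
l = rows[1:]   # one list per landslide
# zip(*l) transposes the rows into columns; pair each column with its header
d = {header: list(column) for header, column in zip(h, zip(*l))}

print(d['landslide_trigger'])  # ['rain', 'downpour']
```

One difference with the loop version: if the file had no data rows, `zip(*l)` would yield nothing, so `d` would be empty instead of mapping each header to an empty list.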

We can now display the first row of our dataset. As said before, it is one landslide with 31 characteristics.

In [2]:
```python
for index, header in enumerate(lsc.h):
    print(index, header, ':', lsc.l[0][index])
```

```
0 source_name : AGU
1 source_link : https://blogs.agu.org/landslideblog/2008/10/14/the-lifan-landslide-from-natural-disaster-to-cover-up/
2 event_id : 684
3 event_date : 08/01/2008 12:00:00 AM
4 event_time : 
5 event_title : Sigou Village, Loufan County, Shanxi Province
6 event_description : occurred early in morning, 11 villagers buried in 7 houses
7 location_description : Sigou Village, Loufan County, Shanxi Province
8 location_accuracy : unknown
9 landslide_category : landslide
10 landslide_trigger : rain
11 landslide_size : large
12 landslide_setting : mine
13 fatality_count : 11
14 injury_count : 
15 storm_name : 
16 photo_link : 
17 notes : 
18 event_import_source : glc
19 event_import_id : 684
20 country_name : China
21 country_code : CN
22 admin_division_name : Shaanxi
23 admin_division_population : 0
24 gazeteer_closest_point : Jingyang
25 gazeteer_distance : 41.02145
26 submitted_date : 04/01/2014 12:00:00 AM
27 created_date : 11/20/2017 03:17:00 PM
28 last_edited_date : 02/1
```

## Step 2: How many landslides per year?

We can find the year of each landslide in our dictionary, under the key 'event_date', where it is embedded in a string containing the full date and time. We isolate the year using the datetime module and append it to our data in both ***l*** (the list of landslides) and ***d*** (the dictionary).

In [3]:
```python
lsc.d['event_date'][:5]
```

```
['08/01/2008 12:00:00 AM',
 '01/02/2009 02:00:00 AM',
 '01/19/2007 12:00:00 AM',
 '07/31/2009 12:00:00 AM',
 '10/16/2010 12:00:00 PM']
```
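Each date string follows the pattern month/day/year, then a 12-hour clock time with an AM/PM marker, which is what the format `'%m/%d/%Y %I:%M:%S %p'` used in the next cell describes. A quick check on the five values shown above:

```python
import datetime

# The five 'event_date' strings displayed above
dates = ['08/01/2008 12:00:00 AM',
         '01/02/2009 02:00:00 AM',
         '01/19/2007 12:00:00 AM',
         '07/31/2009 12:00:00 AM',
         '10/16/2010 12:00:00 PM']

# %I is the 12-hour clock hour and %p the AM/PM marker
FORMAT = '%m/%d/%Y %I:%M:%S %p'
parsed = [datetime.datetime.strptime(date, FORMAT) for date in dates]

print([p.year for p in parsed])  # [2008, 2009, 2007, 2009, 2010]
print(parsed[4].hour)            # 12 (12:00 PM is noon)
print(parsed[0].hour)            # 0 (12:00 AM is midnight)
```

Only `.year` is kept, but the whole string must still match the pattern: a row whose date deviated from it would make `strptime` raise a `ValueError`.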

In [4]:
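```python
import datetime

year_list = []
for ls in lsc.l:
    # column 3 is 'event_date', e.g. '08/01/2008 12:00:00 AM'
    year = datetime.datetime.strptime(ls[3], '%m/%d/%Y %I:%M:%S %p').year
    ls.append(year)          # new column at index 31 of each row
    year_list.append(year)
lsc.d['year'] = year_list
```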

We can now count the number of landslides per year and display the result. According to the file description, the Global Landslide Catalog has been compiled since 2007; therefore, the data from before that year is not relevant.

In [5]:
```python
count_per_year = {}
for year in lsc.d['year']:
    if year in count_per_year:
        count_per_year[year] += 1
    else:
        count_per_year[year] = 1
count_per_year
```

```
{2008: 553,
 2009: 423,
 2007: 412,
 2010: 1536,
 2012: 794,
 2014: 1035,
 2015: 1341,
 2011: 1324,
 2017: 1255,
 2016: 1183,
 2013: 1132,
 1997: 10,
 1998: 12,
 2005: 2,
 1996: 2,
 2006: 13,
 2004: 1,
 1995: 1,
 2003: 2,
 1993: 1,
 1988: 1}
```
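The same tally can be written with `collections.Counter` from the standard library. The sketch below takes the per-year counts printed above as input, drops the years before 2007 as the file description suggests, and sorts the rest chronologically. `MIN_YEAR` is our own name for the cut-off.

```python
from collections import Counter

# Per-year counts as printed by the previous cell
# (on the real data, Counter(lsc.d['year']) builds the same mapping directly)
count_per_year = Counter({2008: 553, 2009: 423, 2007: 412, 2010: 1536, 2012: 794,
                          2014: 1035, 2015: 1341, 2011: 1324, 2017: 1255, 2016: 1183,
                          2013: 1132, 1997: 10, 1998: 12, 2005: 2, 1996: 2,
                          2006: 13, 2004: 1, 1995: 1, 2003: 2, 1993: 1, 1988: 1})

MIN_YEAR = 2007  # the GLC has been compiled since 2007

# Keep only the years covered by the catalog, in chronological order
recent = {year: n for year, n in sorted(count_per_year.items()) if year >= MIN_YEAR}
older = sum(n for year, n in count_per_year.items() if year < MIN_YEAR)

print(recent)
print('landslides kept:', sum(recent.values()), '| dropped (before 2007):', older)
# → landslides kept: 10988 | dropped (before 2007): 45
```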

Let's suppose we would like to study how climate change affects the number of landslides: we would then only be interested in landslides triggered by weather. We can add another column, 'weather_related', to our dataset in both ***l*** and ***d***; it indicates whether or not the landslide was triggered by a weather phenomenon. Here are all the different triggers listed under the key 'landslide_trigger':

In [6]:
```python
set(lsc.d['landslide_trigger'])
```

```
{'',
 'construction',
 'continuous_rain',
 'dam_embankment_collapse',
 'downpour',
 'earthquake',
 'flooding',
 'freeze_thaw',
 'leaking_pipe',
 'mining',
 'monsoon',
 'no_apparent_trigger',
 'other',
 'rain',
 'snowfall_snowmelt',
 'tropical_cyclone',
 'unknown',
 'vibration',
 'volcano'}
```

If a landslide was triggered by continuous rain, a downpour, flooding, freeze-thaw, monsoon, rain, snowfall/snowmelt or a tropical cyclone, 'weather_related' is True; otherwise, it is False.

In [7]:
```python
# 'landslide_trigger' values counted as weather phenomena
weather_triggers = ('continuous_rain', 'downpour', 'flooding', 'freeze_thaw',
                    'monsoon', 'rain', 'snowfall_snowmelt', 'tropical_cyclone')

weather_related_list = []
for ls in lsc.l:
    is_weather = ls[10] in weather_triggers  # column 10 is 'landslide_trigger'
    ls.append(is_weather)                    # new column at index 32 of each row
    weather_related_list.append(is_weather)
lsc.d['weather_related'] = weather_related_list
```
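A hard-coded list of labels is easy to mistype, and a misspelled label would silently classify its landslides as not weather-related. A minimal sanity check, assuming the trigger labels are exactly those printed by `set(lsc.d['landslide_trigger'])` above, splits that set into the two groups with set operations:

```python
# Trigger labels as printed above by set(lsc.d['landslide_trigger'])
all_triggers = {'', 'construction', 'continuous_rain', 'dam_embankment_collapse',
                'downpour', 'earthquake', 'flooding', 'freeze_thaw', 'leaking_pipe',
                'mining', 'monsoon', 'no_apparent_trigger', 'other', 'rain',
                'snowfall_snowmelt', 'tropical_cyclone', 'unknown', 'vibration',
                'volcano'}

# Labels we consider weather phenomena
weather_triggers = {'continuous_rain', 'downpour', 'flooding', 'freeze_thaw',
                    'monsoon', 'rain', 'snowfall_snowmelt', 'tropical_cyclone'}

# Every weather label must exist in the data, otherwise it is a typo
assert weather_triggers <= all_triggers, weather_triggers - all_triggers

print('weather related:', sorted(all_triggers & weather_triggers))
print('not weather related:', sorted(all_triggers - weather_triggers))
```

The empty string, 'unknown' and 'no_apparent_trigger' end up on the False side, so landslides with a missing or unknown trigger are never counted as weather-related.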

Here is the first row of data now that we have added the columns 'year' and 'weather_related':

In [8]:
```python
for header, data in lsc.d.items():
    print(header, ':', data[0])
```

```
source_name : AGU
source_link : https://blogs.agu.org/landslideblog/2008/10/14/the-lifan-landslide-from-natural-disaster-to-cover-up/
event_id : 684
event_date : 08/01/2008 12:00:00 AM
event_time : 
event_title : Sigou Village, Loufan County, Shanxi Province
event_description : occurred early in morning, 11 villagers buried in 7 houses
location_description : Sigou Village, Loufan County, Shanxi Province
location_accuracy : unknown
landslide_category : landslide
landslide_trigger : rain
landslide_size : large
landslide_setting : mine
fatality_count : 11
injury_count : 
storm_name : 
photo_link : 
notes : 
event_import_source : glc
event_import_id : 684
country_name : China
country_code : CN
admin_division_name : Shaanxi
admin_division_population : 0
gazeteer_closest_point : Jingyang
gazeteer_distance : 41.02145
submitted_date : 04/01/2014 12:00:00 AM
created_date : 11/20/2017 03:17:00 PM
last_edited_date : 02/15/2018 03:51:00 PM
longitude : 107.45
latitude : 32.5625
year : 2008
weather_
```

We count the number of landslides per year again, but this time only the weather-related ones.

In [9]:
```python
weather_related_per_year = {}
for ls in lsc.l:
    # index 32 is 'weather_related', index 31 is 'year'
    if ls[32]:
        if ls[31] in weather_related_per_year:
            weather_related_per_year[ls[31]] += 1
        else:
            weather_related_per_year[ls[31]] = 1
weather_related_per_year
```

```
{2008: 519,
 2009: 407,
 2007: 393,
 2010: 1515,
 2012: 551,
 2011: 1189,
 2017: 1040,
 2016: 773,
 2013: 939,
 2015: 886,
 2014: 709,
 1997: 10,
 1998: 12,
 1996: 2,
 2006: 12,
 2004: 1,
 1995: 1,
 1993: 1,
 1988: 1}
```
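Indices 31 and 32 only point at 'year' and 'weather_related' because those two columns were appended to each row in that order. A sketch that reads the two columns from ***d*** by name instead, using `zip` and `collections.Counter`; the short lists below are made-up stand-ins for `lsc.d['year']` and `lsc.d['weather_related']`:

```python
from collections import Counter

# Made-up stand-ins for lsc.d['year'] and lsc.d['weather_related']
years = [2008, 2008, 2009, 2010, 2010, 2010]
weather_related = [True, False, True, True, True, False]

# Pair each landslide's year with its flag and count only the True ones
weather_related_per_year = Counter(
    year for year, is_weather in zip(years, weather_related) if is_weather
)

print(dict(weather_related_per_year))  # {2008: 1, 2009: 1, 2010: 2}
```

On the real data, the two made-up lists would simply be replaced by `lsc.d['year']` and `lsc.d['weather_related']`, which have one entry per landslide in the same order.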

## Step 3: Finding hints of climate change in 'event_description'

NASA's Global Landslide Catalog has a column 'event_description'. As the name suggests, it is a brief report about the landslide given by the source. In it, we may find hints of climate change: unusual or record-breaking weather, or even reporters blaming global warming.

We can use regular expressions (regex) to look for them. In the next cells, we display and count all event descriptions matching a given regular expression. As you will see, while the matches are not always relevant, this method shows that words such as 'record-breaking' and 'unusual' appear repeatedly in descriptions of the weather in recent years.
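Two details of the patterns used below are worth checking before reading the counts. A character class such as `[Cc]` only makes the first letter case-insensitive, whereas the `re.IGNORECASE` flag covers the whole word; and `'[Rr]ecord-breaking'` requires the hyphen, so "record breaking" written with a space would not match. A small comparison on made-up sentences, where `[- ]` is our own variation of the pattern:

```python
import re

# Made-up descriptions, not taken from the catalog
samples = [
    'Record-breaking rainfall hit the region.',
    'A record breaking downpour caused the slide.',
    'UNUSUAL weather preceded the event.',
]

patterns = {
    'notebook pattern': re.compile('[Rr]ecord-breaking'),
    'hyphen or space': re.compile('[Rr]ecord[- ]breaking'),
    'first letter only': re.compile('[Uu]nusual'),
    'ignore case': re.compile('unusual', re.IGNORECASE),
}

# For each pattern, show which sample sentences it matches
for name, pattern in patterns.items():
    matches = [bool(pattern.search(text)) for text in samples]
    print(f'{name:18} {matches}')
```

Widening the patterns this way could raise some counts, but it would also need the same manual review of each match, since not every hit is about the weather.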

In [10]:
```python
import re

def find_regex(regex):
    """Print every event description matching regex and return {year: count}."""
    count_per_year = {}
    print('Counting \'', regex, '\'')
    for index, event_des in enumerate(lsc.d['event_description']):
        if re.search(regex, event_des) is not None:
            year = lsc.d['year'][index]
            print(regex, 'found at row', index, '(year', year, ') :\n', event_des, '\n')
            if year in count_per_year:
                count_per_year[year] += 1
            else:
                count_per_year[year] = 1
    print('Number of \'', regex, '\' found :', count_per_year)
    return count_per_year


list_of_regex = ['[Cc]limate change', '[Gg]lobal warming', '[Gg]reenhouse effect',
                 '[Rr]ecord-breaking', '[Hh]ottest', '[Uu]nusual']

hints_count = {}
for regex in list_of_regex:
    hints_count[regex] = find_regex(regex)
    print('\n')
```

```
Counting ' [Cc]limate change '
[Cc]limate change found at row 7235 (year 2015 ) :
 Akbayan’s Rep. Barry Gutierrez, together with Rep. Angie Katoh and Rep. Rodel Batocabe, filed House Resolution 2274 directing the House committees on natural resources, climate change and energy to jointly conduct an inquiry, in aid of legislation, on the collapse of a portion of the Panian pit of Semirara Mining and Power Corp. in Semirara Island, Caluya, Antique. 

[Cc]limate change found at row 11012 (year 2017 ) :
 Landslide blocks Malai-Rohtang HWY. Article: ""Kullu along with other northern regions of India is experiencing heavy rains and strong winds from past few days. The sudden climate change has caused a lot of damage especially to the ready crops and people’s properties. Also bringing in sudden avalanches and landslides."" 

Number of ' [Cc]limate change ' found : {2015: 1, 2017: 1}


Counting ' [Gg]lobal warming '
[Gg]lobal warming found at row 2721 (year 2010 ) :
 August 23, 2010  MITTIMATA
```

In [11]:
```python
hints_count
```

```
{'[Cc]limate change': {2015: 1, 2017: 1},
 '[Gg]lobal warming': {2010: 1},
 '[Gg]reenhouse effect': {},
 '[Rr]ecord-breaking': {2013: 7, 2011: 2, 2014: 2, 2010: 2, 2015: 1},
 '[Hh]ottest': {2010: 1, 2013: 1},
 '[Uu]nusual': {2010: 2,
  2014: 6,
  2011: 7,
  2012: 1,
  2016: 1,
  2015: 2,
  2013: 1,
  2009: 2,
  2017: 3}}
```

In [13]:
```python
# Sum the per-year counts of all the regular expressions
total_hints = {}
for hint, year_count in hints_count.items():
    for year in year_count:
        if year in total_hints:
            total_hints[year] += year_count[year]
        else:
            total_hints[year] = year_count[year]
total_hints
```

```
{2015: 4,
 2017: 4,
 2010: 6,
 2013: 9,
 2011: 9,
 2014: 8,
 2012: 1,
 2016: 1,
 2009: 2}
```
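The nested loop above adds dictionaries up key by key, which is exactly what `Counter` addition does. A sketch using the `hints_count` values printed above; `sum` has to start from an empty `Counter()` because its default start value `0` cannot be added to a `Counter`:

```python
from collections import Counter

# Per-regex, per-year counts as printed above by `hints_count`
hints_count = {
    '[Cc]limate change': {2015: 1, 2017: 1},
    '[Gg]lobal warming': {2010: 1},
    '[Gg]reenhouse effect': {},
    '[Rr]ecord-breaking': {2013: 7, 2011: 2, 2014: 2, 2010: 2, 2015: 1},
    '[Hh]ottest': {2010: 1, 2013: 1},
    '[Uu]nusual': {2010: 2, 2014: 6, 2011: 7, 2012: 1, 2016: 1,
                   2015: 2, 2013: 1, 2009: 2, 2017: 3},
}

# Counter addition sums values key by key; start from an empty Counter
total_hints = sum((Counter(year_count) for year_count in hints_count.values()), Counter())

for year in sorted(total_hints):
    print(year, total_hints[year])
```

Two caveats: `Counter` addition drops keys whose total is zero or negative, which cannot happen with counts like these; and, as in the loop above, a description matching several patterns is counted once per pattern.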