# **Objective:**     
The objective of this session is to refresh our basic Python skills by calculating some simple summary statistics for data in a CSV file, this time using a customized Python class object.

### The Scenario

You are working on a research project that is trying to assess the
impact of wind-related disasters in terms of human deaths and property
damage.  You have been given a comma-separated values (CSV) file, 
``wind_disasters.csv``, taken from EM-DAT: The OFDA/CRED International 
Disaster Database (Data version: v06.01, Created on: Feb-7-2006) and 
downloaded from the website http://www.em-dat.net.

### The Overall Objective

*For a specified range of years*, you need to answer the following simple questions.
* How many disasters were there in each year?
* What was the total number of deaths in each year from wind-related disasters?
* What was the aggregate damage in each year from wind-related disasters?
* What was the maximum number of deaths from a single disaster in each year?
* What was the maximum amount of damage from a single disaster in each year?

Working with your research colleagues, you have decided to 
write a simple Python program that will help you answer these questions.  

### The Computational Task

As you know, computation is the transformation from input to output.  So let's get our inputs and outputs specified.

#### Inputs:
* ``input_filename`` = the name of the file to read in (example: ``wind_disasters.csv``)
* ``start_year`` = the first year to consider in the analysis (example: ``1955``)
* ``end_year`` = the last year to consider in the analysis (example: ``1960``)

#### Outputs:
We need to open the specified file, read it line-by-line, and calculate the
following summary statistics *for each year* in the range of years under consideration:
* ``num_disasters`` = the total number of disasters that year
* ``total_deaths`` = the total number of deaths across all disasters that year
* ``total_damage`` = the total damage across all disasters that year
* ``max_deaths`` = the largest number of deaths in a single disaster that year
* ``max_cost`` = the largest amount of damage in a single disaster that year

Our output should have a single line for each year in the specified range (in order!) with the following data:
```
YYYY, num_disasters, total_deaths, total_damage, max_deaths, max_cost
```

For example, running with parameters: 
```
wind_disasters.csv, 1955, 1960
```

should yield output containing the following data:
```
1955,19,3895,877000,1500,832000
1956,14,3114,0,2000,0
1957,9,1139,150000,390,150000
1958,13,2620,0,1175,0
1959,16,9695,672200,5098,600000
1960,20,9164,454041,5149,387000
```

Okay, this type of thing sounds pretty familiar.  This task is a good candidate for a function.  

Specifically, let's write a function called ``calc_disasters()`` that does the following:
* open the file, read it line-by-line, and parse its text.

We can add more functionality later.
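
As a minimal sketch of the parsing step on its own, the snippet below reads two made-up records from memory instead of the real file. The sample text, the `to_int()` helper, and the use of `io.StringIO` are assumptions for illustration only; the assumed column order is type, event name, number killed, damage, start year, start month, start day, ISO code, country, region, continent.

```python
import io

# Made-up sample data in the assumed 11-column layout; not taken from the real file.
sample = io.StringIO(
    "# a comment line that should be skipped\n"
    "Storm,,10,0,1993,3,12,XXX,Exampleland,Example Region,Example Continent\n"
    "Hurricane,Demo,,5000,1989,,,YYY,Samplestan,Sample Region,Sample Continent\n"
)

def to_int(token):
    '''Hypothetical helper: convert text to int, or return 'NA' if that fails.'''
    try:
        return int(token)
    except ValueError:
        return 'NA'

records = []
for line in sample:
    line = line.strip()                      # drop the newline and surrounding blanks
    if not line or line.startswith('#'):     # skip blank lines and comment lines
        continue
    values = line.split(',')
    typ = values[0].strip() or 'NA'          # empty text becomes 'NA'
    event_name = values[1].strip() or 'NA'
    num_killed = to_int(values[2])
    damage = to_int(values[3])
    year = to_int(values[4])
    records.append((typ, event_name, num_killed, damage, year))

for record in records:
    print(record)
```

A file-like object such as `io.StringIO` can be looped over line by line just like an open file, which makes it convenient for trying out parsing logic before pointing it at the real data.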

In [10]:
def calc_disasters(infile,start_year,end_year,isVerbose = True):
    '''function to calculate summary disaster statistics in range [start_year,end_year]
       for data in given filename.'''

    if isVerbose:
        print("Calculating statistics in file %s for range [%d,%d]." % 
              (infile, start_year, end_year))

    # open the file and get a "handle" on it
    file = open(infile, "r")

    # row counter; it starts at 3, so the first data row is reported as row 4
    counter = 3

    # loop through the whole file once
    for line in file:

        # remove leading/trailing whitespace, including the newline
        line = line.strip()

        # skip blank lines and any line whose first character is #
        if not line or line[0] == '#':
            continue

        # parse by comma
        values = line.split(',')

        # give all tokens names to work with
        # type          - the type of disaster  -->typ
        # event_name    - the name of the disaster (if it exists)
        # num_killed    - the number of fatalities
        # damage        - total damage (in adjusted USD)
        # start_year    - year in which the disaster began  ---> year
        # start_month   - month in which the disaster began
        # start_day     - day of month on which the disaster began
        # iso           - three-letter country code
        # country_name  - full name
        # region        - global region
        # continent     - continent
        # text columns: strip() removes leading/trailing blanks; empty text becomes 'NA'
        typ = values[0].strip() or 'NA'              # Col 1
        event_name = values[1].strip() or 'NA'       # Col 2
        iso = values[7].strip() or 'NA'              # Col 8
        country_name = values[8].strip() or 'NA'     # Col 9
        region = values[9].strip() or 'NA'           # Col 10
        continent = values[10].strip() or 'NA'       # Col 11

        # numeric columns (Col 3-7): convert to int; missing or non-numeric text becomes 'NA'
        numbers = []
        for token in values[2:7]:
            try:
                numbers.append(int(token))
            except ValueError:
                numbers.append('NA')
        num_killed, damage, year, start_month, start_day = numbers
            
        counter += 1
        if isVerbose:
            print('Row:%d, Read... Event Type: %s, Event Name: %s:'% 
                  (counter, typ,event_name), (num_killed,damage,year))

    # close the file once every line has been read
    file.close()


In [11]:
calc_disasters('wind_disasters.csv',1955,1960)

Calculating statistics in file wind_disasters.csv for range [1955,1960].
Row:4, Read... Event Type: Storm, Event Name: NA: (10, 0, 1993)
Row:5, Read... Event Type: Winter, Event Name: NA: (260, 0, 2005)
Row:6, Read... Event Type: Winter, Event Name: NA: (6, 0, 2002)
Row:7, Read... Event Type: Winter, Event Name: NA: (2, 0, 2005)
Row:8, Read... Event Type: Storm, Event Name: NA: (0, 0, 1988)
Row:9, Read... Event Type: Storm, Event Name: NA: (4, 0, 2000)
Row:10, Read... Event Type: Winter, Event Name: NA: (13, 0, 2003)
Row:11, Read... Event Type: Winter, Event Name: NA: (10, 0, 2005)
Row:12, Read... Event Type: Cyclone, Event Name: NA: (90, 0, 1966)
Row:13, Read... Event Type: Cyclone, Event Name: Gina: (0, 5000, 1989)
Row:14, Read... Event Type: Cyclone, Event Name: Heta: (0, 150000, 2004)
Row:15, Read... Event Type: Cyclone, Event Name: Olaf: (0, 0, 2005)
Row:16, Read... Event Type: Hurricane, Event Name: Alice: (0, 0, 1955)
Row:17, Read... Event Type: Hurricane, Event Name: Donna: (5,

Row:849, Read... Event Type: Hurricane, Event Name: Ivan: (3, 0, 2004)
Row:850, Read... Event Type: Hurricane, Event Name: Jeanne: (2754, 21000, 2004)
Row:851, Read... Event Type: Hurricane, Event Name: Dennis: (40, 0, 2005)
Row:852, Read... Event Type: Hurricane, Event Name: Emily: (6, 0, 2005)
Row:853, Read... Event Type: Hurricane, Event Name: Stan: (1, 0, 2005)
Row:854, Read... Event Type: Hurricane, Event Name: Wilma: (5, 0, 2005)
Row:855, Read... Event Type: Tropical storm, Event Name: Alpha: (17, 0, 2005)
Row:856, Read... Event Type: Hurricane, Event Name: NA: (1500, 0, 1931)
Row:857, Read... Event Type: Hurricane, Event Name: NA: (275, 0, 1961)
Row:858, Read... Event Type: Hurricane, Event Name: Francelia: (0, 19000, 1969)
Row:859, Read... Event Type: Hurricane, Event Name: NA: (0, 0, 1971)
Row:860, Read... Event Type: Hurricane, Event Name: Fifi: (8000, 540000, 1974)
Row:861, Read... Event Type: Hurricane, Event Name: Greta: (0, 1000, 1978)
Row:862, Read... Event Type: Tropica

Row:1401, Read... Event Type: Winter, Event Name: NA: (34, 0, 1967)
Row:1402, Read... Event Type: Hurricane, Event Name: Katrina; Beulah; Fern: (77, 184000, 1967)
Row:1403, Read... Event Type: Storm, Event Name: NA: (58, 0, 1969)
Row:1404, Read... Event Type: Tropical storm, Event Name: NA: (17, 0, 1971)
Row:1405, Read... Event Type: Storm, Event Name: NA: (13, 0, 1974)
Row:1406, Read... Event Type: Hurricane, Event Name: Olivia: (29, 0, 1975)
Row:1407, Read... Event Type: Storm, Event Name: NA: (120, 0, 1976)
Row:1408, Read... Event Type: Hurricane, Event Name: Liza: (600, 100000, 1976)
Row:1409, Read... Event Type: Hurricane, Event Name: Anita: (10, 0, 1977)
Row:1410, Read... Event Type: Tropical storm, Event Name: St. Lidia: (100, 40000, 1981)
Row:1411, Read... Event Type: Hurricane, Event Name: Paul: (225, 82400, 1982)
Row:1412, Read... Event Type: Hurricane, Event Name: Tico: (135, 0, 1983)
Row:1413, Read... Event Type: Winter, Event Name: NA: (140, 0, 1983)
Row:1414, Read... Even

Row:2150, Read... Event Type: Storm, Event Name: NA: (12, 290723, 2000)
Row:2151, Read... Event Type: Storm, Event Name: NA: (4, 0, 2000)
Row:2152, Read... Event Type: Storm, Event Name: NA: (7, 0, 2002)
Row:2153, Read... Event Type: Storm, Event Name: Jeanett: (7, 77901, 2002)
Row:2154, Read... Event Type: Storm, Event Name: NA: (5, 500000, 2005)
Row:2155, Read... Event Type: Hurricane, Event Name: NA: (6000, 1000000, 1900)
Row:2156, Read... Event Type: Tornado, Event Name: NA: (98, 0, 1903)
Row:2157, Read... Event Type: Hurricane, Event Name: NA: (164, 0, 1906)
Row:2158, Read... Event Type: Hurricane, Event Name: NA: (134, 0, 1906)
Row:2159, Read... Event Type: Hurricane, Event Name: NA: (350, 0, 1909)
Row:2160, Read... Event Type: Hurricane, Event Name: NA: (41, 0, 1909)
Row:2161, Read... Event Type: Hurricane, Event Name: NA: (30, 0, 1910)
Row:2162, Read... Event Type: Hurricane, Event Name: NA: (525, 60000, 1915)
Row:2163, Read... Event Type: Hurricane, Event Name: NA: (34, 0, 191

Okay, once we can read the file, we need to:
1. consider only records that are in range, i.e.: ``start_year <= year <= end_year``
2. for each year in range:
    * calculate the total for num_killed
    * calculate the total for damage
    * keep track of largest num_killed
    * keep track of largest damage

And, we need a data structure to keep track of all this.

We already learned how to do this in OA2801...
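
As a reminder of that pattern, here is a minimal sketch that applies the steps above to a handful of made-up `(year, num_killed, damage)` records rather than the real file. The records and the year range are assumptions chosen only to keep the example small.

```python
# Made-up (year, num_killed, damage) records; not taken from the real file.
records = [
    (1954, 7, 100),     # out of range, should be ignored
    (1955, 10, 0),
    (1955, 3, 2500),
    (1956, 40, 900),
    (1961, 5, 50),      # out of range, should be ignored
]
start_year, end_year = 1955, 1956

# key: year, value: [num_events, total_killed, total_damage, most_killed, most_damage]
stats = {year: [0, 0, 0, 0, 0] for year in range(start_year, end_year + 1)}

for year, num_killed, damage in records:
    if start_year <= year <= end_year:           # keep only records in range
        entry = stats[year]
        entry[0] += 1                            # one more disaster this year
        entry[1] += num_killed                   # running total of deaths
        entry[2] += damage                       # running total of damage
        entry[3] = max(entry[3], num_killed)     # largest death toll so far
        entry[4] = max(entry[4], damage)         # largest damage so far

print(stats)
```

Initializing an entry for every year in the range up front means a year with no disasters still appears in the result, with all statistics at zero.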

In [4]:
def calc_disasters2(infile,start_year,end_year,isVerbose = True):
    '''function to calculate summary disaster statistics in range [start_year,end_year]
       for data in given filename.'''

    if isVerbose:
        print("Calculating statistics in file %s for range [%d,%d]." % 
              (infile, start_year, end_year))

    # we will use a dictionary to keep track of our statistics
    #   key is the year
    #   value is a list of [num_events,total_killed, total_damage, most_killed, most_damage]
    # let's initialize the dictionary
    stats_dict = {}
    for year in range(start_year,end_year+1):
        stats_dict[year] = [0,0,0,0,0] # start every statistic at zero
        
        
    # open the file and get a "handle" on it
    file = open(infile, "r")

    # loop through the whole file once
    for line in file:

        # remove leading/trailing whitespace, including the newline
        line = line.strip()

        # skip blank lines and any line whose first character is #
        if not line or line[0] == '#':
            continue

        # parse by comma
        values = line.split(',')
        # print(values)  # uncomment if you really want to check this...

        # give all tokens names to work with
        # type          - the type of disaster  -->typ
        # event_name    - the name of the disaster (if it exists)
        # num_killed    - the number of fatalities
        # damage        - total damage (in adjusted USD)
        # start_year    - year in which the disaster began  ---> year
        # start_month   - month in which the disaster began
        # start_day     - day of month on which the disaster began
        # iso           - three-letter country code
        # country_name  - full name
        # region        - global region
        # continent     - continent
        
        # for text data, strip() removes leading/trailing blanks
        # for numerical data, convert to int if a digit, otherwise use 0
        
        typ = values[0].strip()             
        event_name = values[1].strip()
        num_killed = 0 if not values[2].isdigit() else int(values[2]) # isdigit() is True only for all-digit text
        damage = 0 if not values[3].isdigit() else int(values[3])
        year = 0 if not values[4].isdigit() else int(values[4])
        start_month = 0 if not values[5].isdigit() else int(values[5])
        start_day = 0 if not values[6].isdigit() else int(values[6])
        iso = values[7].strip()
        country_name = values[8].strip()
        region = values[9].strip()
        continent = values[10].strip()
        
        if start_year <= year <= end_year:

            if isVerbose:
                print('Read...%s/%s: %d, %d, %d' % (typ,event_name,num_killed,damage,year))
            
            stats_dict[year][0] += 1
            stats_dict[year][1] += num_killed
            stats_dict[year][2] += damage
            stats_dict[year][3] = max(stats_dict[year][3],num_killed)
            stats_dict[year][4] = max(stats_dict[year][4],damage)
    # end of for-loop

    file.close()  # release the file handle
    
    return stats_dict

In [5]:
calc_disasters2('wind_disasters.csv',1955,1960,False)

{1955: [19, 3895, 877000, 1500, 832000],
 1956: [14, 3114, 0, 2000, 0],
 1957: [9, 1139, 150000, 390, 150000],
 1958: [13, 2620, 0, 1175, 0],
 1959: [16, 9695, 672200, 5098, 600000],
 1960: [20, 9164, 454041, 5149, 387000]}
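
The function returns a dictionary, while the task asks for one comma-separated line per year, in order. One possible way to bridge that gap is sketched below; `format_stats()` is a hypothetical helper, not part of the lab code, and the dictionary holds two entries copied from the result shown above.

```python
# Two entries copied from the result shown above.
stats_dict = {
    1955: [19, 3895, 877000, 1500, 832000],
    1956: [14, 3114, 0, 2000, 0],
}

def format_stats(stats):
    '''Hypothetical helper: one comma-separated line per year, in ascending year order.'''
    lines = []
    for year in sorted(stats):                   # sorted() guarantees the year order
        fields = [year] + stats[year]            # year followed by its five statistics
        lines.append(','.join(str(field) for field in fields))
    return lines

for line in format_stats(stats_dict):
    print(line)
```

Note that the formatting code has to know that position 0 is the count, position 1 the total deaths, and so on; nothing in the list itself says so.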

### Creating and using a customized Python class

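Okay, so the dictionary of lists works, but each statistic is identified only by its position in the list, which is easy to mix up.

Consider the following class definition...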

In [6]:
# The following class is complete and SHOULD NOT BE CHANGED
# without prior consent of the instructor.
class DisasterStat:
    '''Simple class to keep disaster statistics for a single year'''

    def __init__(self,year):
        self.year = year
        self.count = 0
        self.deaths = 0
        self.damage = 0
        self.max_deaths = 0
        self.max_damage = 0

    def add_disaster(self, deaths, damage):
        self.count += 1
        self.deaths += deaths
        self.damage += damage
        if deaths > self.max_deaths: # Deaths 
            self.max_deaths = deaths
        if damage > self.max_damage: # Damage
            self.max_damage = damage

    def __str__(self):
        return '%s,%d,%d,%d,%d,%d' % (self.year,self.count,self.deaths,self.damage,self.max_deaths,self.max_damage)

# end of class DisasterStat
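
To get a feel for how the class behaves, here is a minimal sketch that creates one `DisasterStat` and feeds it three disasters. The numbers are made up purely for illustration, and the class is repeated in compact form only so that the snippet runs on its own.

```python
# Compact copy of the DisasterStat class so that this sketch runs on its own.
class DisasterStat:
    def __init__(self, year):
        self.year = year
        self.count = 0
        self.deaths = 0
        self.damage = 0
        self.max_deaths = 0
        self.max_damage = 0

    def add_disaster(self, deaths, damage):
        self.count += 1
        self.deaths += deaths
        self.damage += damage
        if deaths > self.max_deaths:
            self.max_deaths = deaths
        if damage > self.max_damage:
            self.max_damage = damage

    def __str__(self):
        return '%s,%d,%d,%d,%d,%d' % (self.year, self.count, self.deaths,
                                      self.damage, self.max_deaths, self.max_damage)

# Made-up numbers, purely for illustration.
stat = DisasterStat(1955)
stat.add_disaster(10, 0)       # 10 deaths, no recorded damage
stat.add_disaster(3, 2500)     # 3 deaths, damage of 2500
stat.add_disaster(40, 900)     # 40 deaths, damage of 900
print(stat)
```

Working through the three `add_disaster()` calls by hand and comparing with the printed line is a quick way to check your understanding of the class.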


### For reflection 

#### 1. Do the final values for this DisasterStat make sense to you?

#### 2. How would you use the DisasterStat class to answer the lab questions for a *single year*?

#### 3. How would you use the DisasterStat class to answer the lab questions for a *range of years*?

Okay, let's try it...

In [7]:
def calc_disasters3(infile,start_year,end_year,isVerbose = True):
    '''function to calculate summary disaster statistics in range [start_year,end_year]
       for data in given filename.'''

    if isVerbose:
        print("Calculating statistics in file %s for range [%d,%d]." % 
              (infile, start_year, end_year))

    # we will use a dictionary to keep track of our statistics
    #   key is the year
    #   value is a DisasterStat object holding that year's statistics
    # let's initialize the dictionary
    stats_dict = {}
    for year in range(start_year,end_year+1):
        stats_dict[year] = DisasterStat(year) # a fresh object with all statistics at zero
        
        
    # open the file and get a "handle" on it
    file = open(infile, "r")

    # loop through the whole file once
    for line in file:

        # remove leading/trailing whitespace, including the newline
        line = line.strip()

        # skip blank lines and any line whose first character is #
        if not line or line[0] == '#':
            continue

        # parse by comma
        values = line.split(',')
        # print(values)  # uncomment if you really want to check this...

        # give all tokens names to work with
        # type          - the type of disaster  -->typ
        # event_name    - the name of the disaster (if it exists)
        # num_killed    - the number of fatalities
        # damage        - total damage (in adjusted USD)
        # start_year    - year in which the disaster began  ---> year
        # start_month   - month in which the disaster began
        # start_day     - day of month on which the disaster began
        # iso           - three-letter country code
        # country_name  - full name
        # region        - global region
        # continent     - continent
        
        # for text data, strip() removes leading/trailing blanks
        # for numerical data, convert to int if a digit, otherwise use 0
        
        typ = values[0].strip()             
        event_name = values[1].strip()
        num_killed = 0 if not values[2].isdigit() else int(values[2]) # isdigit() is True only for all-digit text
        damage = 0 if not values[3].isdigit() else int(values[3])
        year = 0 if not values[4].isdigit() else int(values[4])
        start_month = 0 if not values[5].isdigit() else int(values[5])
        start_day = 0 if not values[6].isdigit() else int(values[6])
        iso = values[7].strip()
        country_name = values[8].strip()
        region = values[9].strip()
        continent = values[10].strip()
        
        if start_year <= year <= end_year:

            if isVerbose:
                print('Read...%s/%s: %d, %d, %d' % (typ,event_name,num_killed,damage,year))
            
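            # previously, with a list per year (calc_disasters2):
            #   stats_dict[year][0] += 1
            #   stats_dict[year][1] += num_killed
            #   stats_dict[year][2] += damage
            #   stats_dict[year][3] = max(stats_dict[year][3],num_killed)
            #   stats_dict[year][4] = max(stats_dict[year][4],damage)
            # now, with a DisasterStat per year, a single method call does the same work:
            stats_dict[year].add_disaster(num_killed,damage)

    # end of for-loop

    file.close()  # release the file handle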
    
    return stats_dict

In [8]:
mydict = calc_disasters3('wind_disasters.csv',1955,1960,False)

In [9]:
for year in mydict:
    print(mydict[year])

1955,19,3895,877000,1500,832000
1956,14,3114,0,2000,0
1957,9,1139,150000,390,150000
1958,13,2620,0,1175,0
1959,16,9695,672200,5098,600000
1960,20,9164,454041,5149,387000


## Conclusion

* One of the simplest uses of objects is as customized data containers.
* Building such objects can make calculations easier.
* What 'needs' to be included in the class depends on the computational task and the preferences of the designer.
* Designing classes well is a central part of software design, much of which is beyond the scope of this class.
* But you should be familiar with the basics.
* We will practice more...