Project Heroic Age or Heroic Population?
The Carnegie Hero Fund was established in 1905 by Andrew Carnegie, who said "we live in a Heroic Age". This project examines whether people are more willing to risk their lives for a stranger if they live in an area with a smaller population. I chose the Carnegie Hero Fund as my data source because of its award requirements and the number of years of data available. The award is given four times a year to civilians who voluntarily risk their lives to an "extraordinary degree" and have "no full measure of responsibility for the safety of the victim". Part One will focus on cleaning and analyzing the Carnegie Hero Fund data. Part Two will introduce population data from the Census Bureau. Part Three will summarize my findings.

In [93]:
import pandas as pd
import requests

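Pay attention to the difference between importing an xlsx file and a csv file: pd.read_excel needs an extra Excel engine to open .xlsx files, so I had to pip install openpyxl in the terminal, while pd.read_csv works without it.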
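To make the difference concrete, here is a minimal sketch that round-trips a tiny table through both formats in memory. The `sample` DataFrame is made up for illustration and is not the medal data; the Excel half is wrapped in a try/except so the cell still runs when openpyxl is not installed.

```python
import io

import pandas as pd

# Hypothetical two-row sample standing in for the medal data.
sample = pd.DataFrame({"ACTSTATE": ["KY", "CA"], "YEAR": [2022.0, 2021.0]})

# CSV is plain text, so pandas reads it without any extra package.
csv_text = sample.to_csv(index=False)
from_csv = pd.read_csv(io.StringIO(csv_text))

# .xlsx is a binary format; pandas hands it to an engine such as openpyxl.
try:
    buffer = io.BytesIO()
    sample.to_excel(buffer, index=False, engine="openpyxl")
    buffer.seek(0)
    from_excel = pd.read_excel(buffer, engine="openpyxl")
except ImportError:
    from_excel = None  # openpyxl is missing: run `pip install openpyxl`

print(from_csv)
print(from_excel)
```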

In [94]:
df = pd.read_excel("C:\\Users\\emorr\\OneDrive\\Desktop\\CHM\\medal.xlsx")
df.dtypes

ACTED        object
LAST         object
FIRST        object
ACTSTATE     object
ACTCODE      object
CITY         object
STRANGER     object
DEATH        object
COUNTY       object
YEAR        float64
dtype: object

In [95]:
df.head()

Unnamed: 0,ACTED,LAST,FIRST,ACTSTATE,ACTCODE,CITY,STRANGER,DEATH,COUNTY,YEAR
0,5/6/2021,BAIR,BRANDON,ID,BV,,,,,2021.0
1,1/13/2022,THOMAS,ADAM LAYMAN,KY,DROWNING,,,,,2022.0
2,5/17/2021,JOHNSON,ROSS C.,FL,DROWNING,,,,,2021.0
3,4/11/2021,PETERKIN,JADEN DESHAWN,NC,BB,,,,,2021.0
4,4/11/2021,PETERKIN,ANTHONY,NC,BB,,,,,2021.0


Let's drop the columns I don't need. Passing inplace=True changes df itself instead of returning a new DataFrame.

In [96]:
df.drop(columns=['LAST', 'FIRST', 'ACTED'], inplace=True)
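For comparison, here is a small sketch of the same drop without inplace=True, where the result is assigned to a new name. The one-row `df_demo` table and the name `trimmed` are made up for illustration.

```python
import pandas as pd

# Hypothetical miniature of the medal table.
df_demo = pd.DataFrame(
    {"LAST": ["BAIR"], "FIRST": ["BRANDON"], "ACTSTATE": ["ID"], "YEAR": [2021.0]}
)

# Without inplace=True, drop() returns a new DataFrame and leaves df_demo alone.
trimmed = df_demo.drop(columns=["LAST", "FIRST"])

print(list(df_demo.columns))  # still has all four columns
print(list(trimmed.columns))  # only the columns that were kept
```

Either style works; assigning the result keeps the original table around in case a dropped column is needed later.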

In [97]:
df.head()

Unnamed: 0,ACTSTATE,ACTCODE,CITY,STRANGER,DEATH,COUNTY,YEAR
0,ID,BV,,,,,2021.0
1,KY,DROWNING,,,,,2022.0
2,FL,DROWNING,,,,,2021.0
3,NC,BB,,,,,2021.0
4,NC,BB,,,,,2021.0


Next I want to rename the columns. Assigning a list to df.columns changes them all at once; the list has to name every column, in the same order.

In [98]:
df.columns

Index(['ACTSTATE', 'ACTCODE', 'CITY', 'STRANGER', 'DEATH', 'COUNTY', 'YEAR'], dtype='object')

In [99]:
df.columns = ['STATE', 'CODE', 'CITY', 'STRANGER', 'DEATH', 'COUNTY', 'YEAR']
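Since only two names actually change, another option is rename() with a dictionary, which does not depend on the column order. This is a sketch on an empty, made-up frame (`cols_demo`) that only carries the column names.

```python
import pandas as pd

# Hypothetical empty frame with the same column names as the medal data.
cols_demo = pd.DataFrame(
    columns=["ACTSTATE", "ACTCODE", "CITY", "STRANGER", "DEATH", "COUNTY", "YEAR"]
)

# Only the listed columns are renamed; the others keep their names.
renamed = cols_demo.rename(columns={"ACTSTATE": "STATE", "ACTCODE": "CODE"})

print(list(renamed.columns))
```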

In [100]:
df.head()

Unnamed: 0,STATE,CODE,CITY,STRANGER,DEATH,COUNTY,YEAR
0,ID,BV,,,,,2021.0
1,KY,DROWNING,,,,,2022.0
2,FL,DROWNING,,,,,2021.0
3,NC,BB,,,,,2021.0
4,NC,BB,,,,,2021.0


I had to manually fill in the STRANGER, DEATH and COUNTY columns since these weren't recorded in the data that was provided. For each medal incident I made a judgment call, based on the written account, about whether the rescuer and the victim were strangers. A Y in STRANGER means they were strangers. I found the counties from the towns provided. A Y in DEATH means the medal winner died during the rescue attempt. I attempted to write code to scan all of the incident reports, but the language varied too much since they were written between 1905 and 2022. I hope to revisit this later when I have more experience. I chose KY, CA, and AK for manual entry, so they will be the focus of this project.
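Because these columns were typed in by hand, a quick check for anything other than Y or N can catch typos. This is only a sketch: the `entries` table is made up, and its lowercase 'y' is a deliberate mistake to show what the check reports.

```python
import pandas as pd

# Hypothetical hand-entered rows; the lowercase 'y' is a deliberate typo.
entries = pd.DataFrame(
    {
        "STRANGER": ["Y", "N", "y", None],
        "DEATH": ["N", "N", "Y", None],
    }
)

allowed = {"Y", "N"}
problems = {}
for column in ["STRANGER", "DEATH"]:
    filled = entries[column].dropna()  # blank cells are handled separately
    problems[column] = filled[~filled.isin(allowed)].tolist()

print(problems)
```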

Now I need to filter my data to keep only the rows for KY, CA, and AK. I will use the isin function and create a new df to do this.

In [101]:
options = ['KY', 'AK', 'CA']
new_df = df[df['STATE'].isin(options)]
new_df.head()

Unnamed: 0,STATE,CODE,CITY,STRANGER,DEATH,COUNTY,YEAR
1,KY,DROWNING,,,,,2022.0
16,CA,DROWNING,SAN DIEGO,Y,N,SAN DIEGO,2021.0
17,CA,DROWNING,SAN DIEGO,Y,N,SAN DIEGO,2021.0
27,CA,DROWNING,SANGER,Y,Y,FRESNO,2020.0
33,CA,ANIMAL,LOS ANGELES,Y,N,LA,2021.0
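The filter works because isin() returns a True/False value for every row, and indexing with that mask keeps only the True rows. A small sketch with a made-up `states` table shows the mask and its opposite:

```python
import pandas as pd

# Hypothetical five-row table of state codes.
states = pd.DataFrame({"STATE": ["ID", "KY", "FL", "CA", "AK"]})
options = ["KY", "AK", "CA"]

mask = states["STATE"].isin(options)  # True where STATE is one of the options
kept = states[mask]                   # rows for KY, CA, AK
left_out = states[~mask]              # ~ flips the mask: every other state

print(mask.tolist())
print(kept["STATE"].tolist())
print(left_out["STATE"].tolist())
```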


Now I want to drop all rows that contain NaN and save the change to new_df. Let's look at the shape before and after to make sure the change took effect.

In [102]:
new_df.shape

(938, 7)

In [103]:
new_df.dropna(inplace=True)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  new_df.dropna(inplace=True)
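The warning appears because new_df was created by filtering df, so pandas cannot be sure whether the in-place change is meant for new_df alone. One common way to avoid it, sketched here on made-up data (`full` and `subset` are hypothetical names), is to take an explicit copy when filtering.

```python
import pandas as pd

# Hypothetical three-row table; two rows have no CITY.
full = pd.DataFrame(
    {"STATE": ["KY", "CA", "ID"], "CITY": [None, "SAN DIEGO", None]}
)

# .copy() makes subset an independent DataFrame, so in-place edits are unambiguous.
subset = full[full["STATE"].isin(["KY", "CA"])].copy()
subset.dropna(inplace=True)

# An equivalent without inplace would be: subset = subset.dropna()
print(subset.shape)
print(full.shape)  # the original table is untouched
```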


In [104]:
new_df.shape

(922, 7)
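To see where dropped rows come from, the missing values can be counted per column before calling dropna(). This is a sketch on a made-up three-row table (`demo`), not the medal data.

```python
import pandas as pd

# Hypothetical rows with some blanks in the hand-entered columns.
demo = pd.DataFrame(
    {
        "CITY": ["SANGER", None, "LOS ANGELES"],
        "COUNTY": ["FRESNO", None, None],
    }
)

missing_per_column = demo.isna().sum()                      # NaN count in each column
rows_with_any_missing = int(demo.isna().any(axis=1).sum())  # rows dropna() would remove

print(missing_per_column)
print(rows_with_any_missing)
```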

In [105]:
new_df.head()

Unnamed: 0,STATE,CODE,CITY,STRANGER,DEATH,COUNTY,YEAR
16,CA,DROWNING,SAN DIEGO,Y,N,SAN DIEGO,2021.0
17,CA,DROWNING,SAN DIEGO,Y,N,SAN DIEGO,2021.0
27,CA,DROWNING,SANGER,Y,Y,FRESNO,2020.0
33,CA,ANIMAL,LOS ANGELES,Y,N,LA,2021.0
47,CA,DROWNING,SAN DIEGO,Y,N,SAN DIEGO,2020.0


In [106]:
new_df.dtypes

STATE        object
CODE         object
CITY         object
STRANGER     object
DEATH        object
COUNTY       object
YEAR        float64
dtype: object

In [111]:
new_df['CODE'].unique()

array(['DROWNING', 'ANIMAL', 'SV', 'BV', 'GENERAL', 'ASSAULT', 'BB',
       'ELECTROCUTION', 'MVP', 'BOAT', 'ELEVATION', 'ICE', 'SUFFOCATION',
       'MVI', 'EXPOSURE', 'CAVE-IN', 'IMPENDING EX.', 'MVV'], dtype=object)

Next I want to group the years into ranges such as 1910-1920. First, let's list the years in the data.

In [107]:
new_df['YEAR'].sort_values(ascending=True).unique()

array([1904., 1906., 1907., 1908., 1909., 1910., 1911., 1912., 1913.,
       1914., 1915., 1916., 1917., 1918., 1919., 1920., 1921., 1922.,
       1923., 1924., 1925., 1926., 1927., 1928., 1929., 1930., 1931.,
       1932., 1933., 1934., 1935., 1936., 1937., 1938., 1939., 1940.,
       1941., 1942., 1943., 1944., 1945., 1946., 1947., 1948., 1949.,
       1950., 1951., 1952., 1953., 1954., 1955., 1956., 1957., 1958.,
       1959., 1960., 1961., 1962., 1963., 1964., 1965., 1966., 1967.,
       1968., 1969., 1970., 1971., 1972., 1973., 1974., 1975., 1976.,
       1977., 1978., 1979., 1980., 1981., 1982., 1983., 1984., 1985.,
       1986., 1987., 1988., 1989., 1990., 1991., 1992., 1993., 1994.,
       1995., 1996., 1997., 1998., 1999., 2000., 2001., 2002., 2003.,
       2004., 2005., 2006., 2007., 2008., 2009., 2010., 2011., 2012.,
       2013., 2014., 2015., 2016., 2017., 2018., 2019., 2020., 2021.])

In [108]:

# Group each year into a ten-year range, e.g. 1915.0 -> '1910-1920'.
# YEAR is float64, so a dict with string keys such as '1910.' never matches and
# map() returns NaN for every row; pd.cut bins the numbers directly instead.
# The dict name `range` is also dropped because it shadows the Python built-in.
edges = list(range(1900, 2031, 10))  # 1900, 1910, ..., 2030
labels = [f'{start}-{start + 10}' for start in edges[:-1]]

# right=True puts a boundary year in the earlier range (1920 -> '1910-1920'),
# as in the original mapping; note that 1910 therefore falls in '1900-1910'.
new_df = new_df.assign(RANGE=pd.cut(new_df['YEAR'], bins=edges, labels=labels, right=True))

new_df['RANGE'].value_counts()