# Introduction


The data used for this project comes from:
* [Global Shark Attacks](https://www.kaggle.com/teajay/global-shark-attacks) on Kaggle. The original .csv file is 3 MB and the edited file is 1 MB.
* [Attitudes and misconceptions towards sharks and shark meat consumption along the Peruvian coast](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6114843/). The original file is an .xlsx workbook, from which I downloaded two of the sheets as .csv files. The attitudes and consumption .csv files are 25 KB and 135 KB, respectively.

**Notes:** 

*   The attacks file has been edited outside of this notebook because it was so large and poorly formatted that it could not be opened and was crashing the notebook. 
*   The attitudes file has been slightly edited outside of this notebook because the data is in Spanish and we did not learn how to translate within Jupyter. Everything else remains the same and will be edited within this notebook.
*   The consumption file has **not** been edited outside of this notebook at all. 


Sharks make up a large part of marine life and, as predators at the top of the food chain, play an important role in balancing the ecosystem that cannot be filled by another creature.

Sharks have a long history of being feared and hated by humans, due to their threatening nature and to shark attacks on humans. However, much of this fear is unwarranted: sharks pose little threat to humans, and the threat they do pose is largely accidental, as the shark mistakes a human for prey.

Fear of sharks has grown significantly over the years, disproportionately to the number of attacks, quite possibly due to the popularity of the movie Jaws. Unfortunately, it is hard to determine how much effect this has had on shark mortality, as sharks are hard to track and most sightings data is behind paywalls. However, psychology shows that perceptions shape attitudes, and history offers examples of this. While public datasets on shark populations are not available, other research indicates that shark populations are declining.

The purpose of this project is to analyze data on shark attacks and use it to show how little harm sharks truly pose.
To reach this conclusion, this project will:
*   Clean up the data
*   Create summary statistics
*   Create graphs from the summarized work
*   Pose research questions and solve them using data
*   Describe the importance of the answers and analysis
*   Consider ethics regarding data



# Exploratory Data Analysis

In [89]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [90]:
#I tried to use these to help convert the countries column in attacks to codes but couldn't get it to work
#!pip install pycountry
#!pip install dataprep

In [91]:
import altair as alt
from altair import Chart, X, Y, Axis, SortField
import pandas as pd
from pandas import DataFrame
from vega_datasets import data 
import requests
import string
from bs4 import BeautifulSoup
import numpy as np
import csv
#import pycountry
#import dataprep
#from dataprep.clean import clean_country

## Attacks

In [92]:
attacks = pd.read_csv('/content/drive/MyDrive/shared_cs200/Attacks.csv').dropna(axis=1, how='all')
attacks.head()

|   | Year | Country | Area | Location | Type | Activity | Name | Sex | Injury | Fatal (Y/N) | Time | Species | Investigator or Source |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1800.0 | SEYCHELLES | St. Anne |  | Unprovoked | a corsair's boat was overturned |  | F | FATAL, all onboard were killed by sharks | Y |  |  | V. C. Harvey-Brain |
| 1 | 1801.0 |  |  |  | Provoked | Standing on landed shark's tail | Stephen Pettigew | N | FATAL, PROVOKED INCIDENT | Y |  | 12' shark | The Evening Post, 12/18/1801 |
| 2 | 1802.0 | INDIA |  |  | Unprovoked |  |  |  | FATAL | Y |  |  | Evening Post (New York) 4/13/1802 |
| 3 | 1803.0 | USA | South Carolina | Off Charleston | Sea Disaster |  | Captain Jones | M | No injury | N |  |  | Evening Post, 6/13/1803 |
| 4 | 1803.0 | AUSTRALIA | Western Australia | Hamelin Harbour, at Faure Island | Unprovoked |  | M. Lefevre & a sailor (rescuer) | M | Shark knocked him down & tore clothing of the ... | N |  |  | F. Peron ref in G.P. Whitley (Fishes of Austra... |


In [93]:
#clean up column titles
attacks.columns = attacks.columns.str.replace(' ','_')
attacks.columns = attacks.columns.str.replace('(','_')
attacks.columns = attacks.columns.str.replace(')','_')
attacks.columns = attacks.columns.str.replace('/','_')
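The four chained replacements above could also be done in one regular-expression pass, and the same helper would cover the hyphen and space clean-ups applied to the other two files further down. A minimal sketch; `clean_column_names` is a hypothetical helper, not a pandas function, and the character set `[ ()/-]` is simply the union of the characters this notebook replaces:

```python
import pandas as pd

def clean_column_names(columns):
    """Replace spaces, parentheses, slashes and hyphens with underscores in one regex pass."""
    # Inside a character class, a trailing '-' is a literal hyphen.
    return pd.Index(columns).str.replace(r"[ ()/-]", "_", regex=True)

print(list(clean_column_names(["Fatal (Y/N)", "Sub-category", "Year of birth"])))
# ['Fatal__Y_N_', 'Sub_category', 'Year_of_birth']
```

In the notebook this would be `attacks.columns = clean_column_names(attacks.columns)`, producing the same `Fatal__Y_N_` name as the chained version.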

In [94]:
#check that it worked
attacks.head()

|   | Year | Country | Area | Location | Type | Activity | Name | Sex | Injury | Fatal__Y_N_ | Time | Species | Investigator_or_Source |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1800.0 | SEYCHELLES | St. Anne |  | Unprovoked | a corsair's boat was overturned |  | F | FATAL, all onboard were killed by sharks | Y |  |  | V. C. Harvey-Brain |
| 1 | 1801.0 |  |  |  | Provoked | Standing on landed shark's tail | Stephen Pettigew | N | FATAL, PROVOKED INCIDENT | Y |  | 12' shark | The Evening Post, 12/18/1801 |
| 2 | 1802.0 | INDIA |  |  | Unprovoked |  |  |  | FATAL | Y |  |  | Evening Post (New York) 4/13/1802 |
| 3 | 1803.0 | USA | South Carolina | Off Charleston | Sea Disaster |  | Captain Jones | M | No injury | N |  |  | Evening Post, 6/13/1803 |
| 4 | 1803.0 | AUSTRALIA | Western Australia | Hamelin Harbour, at Faure Island | Unprovoked |  | M. Lefevre & a sailor (rescuer) | M | Shark knocked him down & tore clothing of the ... | N |  |  | F. Peron ref in G.P. Whitley (Fishes of Austra... |


In [95]:
#just making sure the type is right
print(type(attacks))

<class 'pandas.core.frame.DataFrame'>


In [96]:
#Let's see how many rows and columns there are
attacks.shape

(6133, 13)

In [97]:
attacks.columns

Index(['Year', 'Country', 'Area', 'Location', 'Type', 'Activity', 'Name',
       'Sex', 'Injury', 'Fatal__Y_N_', 'Time', 'Species',
       'Investigator_or_Source'],
      dtype='object')

In [98]:
#check some statistics
attacks.describe()

|   | Year |
|---|---|
| count | 6131.0 |
| mean | 1968.806231 |
| std | 43.785745 |
| min | 1800.0 |
| 25% | 1946.0 |
| 50% | 1980.0 |
| 75% | 2005.0 |
| max | 2018.0 |


In [99]:
#check how many empty values there are
null_col = attacks.isnull().sum()
null_col[null_col > 0]

Year                         2
Country                     40
Area                       409
Location                   490
Type                         4
Activity                   515
Name                       202
Sex                        551
Injury                      20
Fatal__Y_N_                534
Time                      3199
Species                   2719
Investigator_or_Source      16
dtype: int64
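Raw counts are easier to judge as a share of the 6133 rows: `Time` is missing in 3199 rows (about 52%) and `Species` in 2719 (about 44%). A small sketch that reports the share directly; `missing_share` and the toy frame are hypothetical, and in the notebook you would pass `attacks`:

```python
import pandas as pd

def missing_share(df):
    """Fraction of missing values per column, for columns with any gaps, largest first."""
    shares = df.isnull().mean()  # mean of booleans = fraction of True (missing)
    return shares[shares > 0].sort_values(ascending=False)

# Hypothetical toy frame standing in for `attacks`.
toy = pd.DataFrame({
    "Year": [1800, 1801, 1802, 1803],
    "Time": [None, "Night", None, None],
    "Species": [None, "12' shark", None, "Tiger shark"],
})
print(missing_share(toy))
```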

In [100]:
#check some of the unique values of activities
attacks.Activity.unique()

array(["a corsair's boat was overturned",
       "Standing on landed shark's tail", nan, ...,
       'Grabbing shark for a selfie', 'Paddle-skiing',
       'Kayak fishing for sharks'], dtype=object)

As the exploration above shows, the data begins in 1800 and ends in 2018. `attacks.shape` reports 6133 rows; the column titles are not counted as a row, so the file records 6133 incidents between 1800 and 2018. Not every row is a confirmed attack, though: two rows have no year, and the `Type` column includes an "Invalid" category.

## Attitudes

In [101]:
attitudes = pd.read_csv('/content/drive/MyDrive/shared_cs200/attitudes.csv').dropna(axis=1, how='all')
attitudes.head()

|   | Type | Category | Sub-category | English | Mentions |
|---|---|---|---|---|---|
| 0 | Negative | Negative feelings produced by sharks |  | afraid | 587 |
| 1 | Negative | Negative personality traits associated to sharks |  | dangerous | 547 |
| 2 | Neutral | Ecology and biological knowledge | Related to body traits of sharks | big | 480 |
| 3 | Negative | Negative outcomes of interactions with sharks |  | blood | 396 |
| 4 | Negative | Negative outcomes of interactions with sharks |  | death | 297 |


In [102]:
#I just don't like having any special characters but underscores in the column titles
attitudes.columns = attitudes.columns.str.replace('-','_')

In [103]:
attitudes.shape

(393, 5)

In [104]:
#checking for any special characters I missed
attitudes.columns

Index(['Type', 'Category', 'Sub_category', 'English', 'Mentions'], dtype='object')

In [105]:
attitudes.describe()

The attitudes data comes from a 2016 study in Peru that surveyed 2004 participants. It has 5 columns and 393 rows, one for each word used to describe sharks:


*   English: the word chosen to describe sharks
*   Mentions: how many times the word was chosen
*   Type: whether the word's connotation is negative, positive, or neutral
*   Category: what the word relates to
*   Sub_category: a more specific subset of the category



## Consumption

In [106]:
consumption = pd.read_csv('/content/drive/MyDrive/shared_cs200/consumption.csv').dropna(axis=1, how='all')
consumption.head()

|   | ID | City | Year of birth | Gender | Education Level | Education success | Tollo consumption | Frequency tollo consumption | Sharks in Peru | Species_sharks_1 | Species_sharks_2 | Species_sharks_3 | Shark consumption | Shark_consumption_1 | Shark_consumption_2 | Shark_consumption_3 | Unnamed: 16 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | TRU | 1978.0 | 1.0 | 4.0 | 2.0 | 1.0 | 1.0 | 3.0 |  |  |  | 3.0 |  |  |  |  |
| 1 | 2 | TRU | 1985.0 | 2.0 | 3.0 | 1.0 | 1.0 | 3.0 | 1.0 | no sabe |  |  | 3.0 |  |  |  |  |
| 2 | 3 | TRU | 1969.0 | 2.0 | 3.0 | 1.0 | 1.0 | 2.0 | 1.0 | no sabe |  |  | 2.0 |  |  |  |  |
| 3 | 4 | TRU | 1970.0 | 2.0 | 4.0 | 1.0 | 1.0 | 2.0 | 2.0 |  |  |  | 3.0 |  |  |  |  |
| 4 | 5 | TRU | 1967.0 | 2.0 | 4.0 | 1.0 | 1.0 | 2.0 | 2.0 |  |  |  | 2.0 |  |  |  |  |


In [107]:
#fix the spaces
consumption.columns = consumption.columns.str.replace(' ','_')

In [108]:
#remove unnecessary data
consumption = consumption.drop(columns=[
    'ID', 'City', 'Education_success', 'Tollo_consumption', 'Frequency_tollo_consumption', 'Sharks_in_Peru',
    'Species_sharks_1', 'Species_sharks_2', 'Species_sharks_3',
    'Shark_consumption_1', 'Shark_consumption_2', 'Shark_consumption_3', 'Unnamed:_16'])

In [109]:
#check to make sure it worked
consumption.head()

|   | Year_of_birth | Gender | Education_Level | Shark_consumption |
|---|---|---|---|---|
| 0 | 1978.0 | 1.0 | 4.0 | 3.0 |
| 1 | 1985.0 | 2.0 | 3.0 | 3.0 |
| 2 | 1969.0 | 2.0 | 3.0 | 2.0 |
| 3 | 1970.0 | 2.0 | 4.0 | 3.0 |
| 4 | 1967.0 | 2.0 | 4.0 | 2.0 |


In [110]:
consumption.shape

(2004, 4)

In [111]:
consumption.columns

Index(['Year_of_birth', 'Gender', 'Education_Level', 'Shark_consumption'], dtype='object')

In [112]:
consumption.describe()

After cleaning, consumption has 4 columns and 2004 rows:


*   Year_of_birth: year the respondent was born
*   Gender: gender of the respondent, where 1 = male and 2 = female
*   Education_Level: 1 = Elementary, 2 = High school, 3 = Technical school, 4 = University
*   Shark_consumption: whether the respondent has eaten shark, where 1 = Yes, 2 = Unsure, 3 = No
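Decoding these numbers into labels makes later summaries and chart axes easier to read. A minimal sketch, assuming the code books listed above are complete (any code not listed becomes NaN); `decode_consumption` is a hypothetical helper:

```python
import pandas as pd

# Code books copied from the column descriptions above (the columns hold floats such as 1.0).
GENDER = {1.0: "Male", 2.0: "Female"}
EDUCATION = {1.0: "Elementary", 2.0: "High school", 3.0: "Technical school", 4.0: "University"}
SHARK_CONSUMPTION = {1.0: "Yes", 2.0: "Unsure", 3.0: "No"}

def decode_consumption(df):
    """Return a copy of the survey frame with numeric codes replaced by labels."""
    return df.assign(
        Gender=df["Gender"].map(GENDER),
        Education_Level=df["Education_Level"].map(EDUCATION),
        Shark_consumption=df["Shark_consumption"].map(SHARK_CONSUMPTION),
    )

# First two rows of consumption.head() above.
sample = pd.DataFrame({
    "Year_of_birth": [1978.0, 1985.0],
    "Gender": [1.0, 2.0],
    "Education_Level": [4.0, 3.0],
    "Shark_consumption": [3.0, 3.0],
})
print(decode_consumption(sample))
```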





# Questions

After a brief look at the data, it's time to pose some questions that can help separate the real threat of sharks from the perceived one.

## 1. How many attacks were provoked versus how many were unprovoked?

In [114]:
provoked_attacks = len(attacks[attacks["Type"] =="Provoked"])
unprovoked_attacks = len(attacks[attacks["Type"] =="Unprovoked"])
print(f"There are {provoked_attacks} provoked attacks versus {unprovoked_attacks} unprovoked attacks")

There are 566 provoked attacks versus 4450 unprovoked attacks


Taken at face value, these numbers suggest that sharks mostly attack humans for no reason. However, "unprovoked" only means the victim did nothing to the shark, and it is important to consider that sharks attack humans out of confusion or fear. Conditions that can confuse a shark include erratic movement in turbulent or murky water, where all the shark sees is something thrashing like a fish.
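Counts are easier to compare as shares. A minimal sketch that drops missing types and, by default, the "Invalid" category (542 rows according to the `attacks_unique` table further down), so the shares are not diluted by records the source itself marks as invalid. `type_shares` and the toy rows are hypothetical; in the notebook you would call `type_shares(attacks)`:

```python
import pandas as pd

def type_shares(df, exclude=("Invalid",)):
    """Share of each incident Type, ignoring missing types and any listed in `exclude`."""
    types = df["Type"].dropna()
    types = types[~types.isin(exclude)]
    return types.value_counts(normalize=True)

# Hypothetical toy rows standing in for `attacks`.
toy = pd.DataFrame({"Type": ["Unprovoked", "Unprovoked", "Unprovoked", "Provoked", "Invalid", None]})
print(type_shares(toy))
```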

## 2. What was the attack victim doing?

In [115]:
#count activities across every attack type (not only unprovoked attacks)
unprovoked = attacks["Activity"].value_counts().rename_axis("Activity").reset_index(name="Counts")
unprovoked

|   | Activity | Counts |
|---|---|---|
| 0 | Surfing | 974 |
| 1 | Swimming | 889 |
| 2 | Fishing | 433 |
| 3 | Spearfishing | 336 |
| 4 | Bathing | 159 |
| ... | ... | ... |
| 1453 | Floating face down | 1 |
| 1454 | Fishing (rod & line) | 1 |
| 1455 | Surfing, but lying prone on his board | 1 |
| 1456 | Swimming close to wharf | 1 |


As we can see above, surfing and swimming are by far the most common activities at the time of an attack. Note that these counts include every attack type, not only unprovoked attacks. As stated in the introduction, sharks are thought to attack humans by mistake, out of confusion. Surfing and swimming involve a lot of rapid movement, which makes the surrounding water more turbulent and harder to see through.

It should also be noted that some of the rarer activities actually fit into other categories. For instance, one row states the victim was "Collecting fish from net," which is the same as fishing.
However, because there are so many rows, it will be easier later to create a dataframe that keeps only the most common activities.
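Two refinements follow from the discussion above: the counts in `unprovoked` cover every `Type`, and one activity is often written many ways. A minimal sketch that keeps only `Type == "Unprovoked"` rows and then folds free-text activities into broad groups by keyword. The keyword list is a hypothetical starting point, not a standard taxonomy; because the first matching keyword wins, `spearfish` must come before `fish`:

```python
import pandas as pd

# Hypothetical keyword -> group pairs, checked in order; the first match wins.
ACTIVITY_KEYWORDS = [
    ("spearfish", "Spearfishing"),
    ("surf", "Surfing"),
    ("fish", "Fishing"),
    ("swim", "Swimming"),
    ("bath", "Bathing"),
]

def group_activity(activity):
    """Map a free-text activity to a broad group, 'Other', or None if it is missing."""
    if not isinstance(activity, str):
        return None
    text = activity.lower()
    for keyword, group in ACTIVITY_KEYWORDS:
        if keyword in text:
            return group
    return "Other"

def unprovoked_activity_counts(df, top=10):
    """Grouped activity counts for unprovoked attacks only."""
    unprovoked_only = df[df["Type"] == "Unprovoked"]
    groups = unprovoked_only["Activity"].map(group_activity)
    return groups.value_counts().head(top)  # value_counts skips the None (missing) rows

# Hypothetical toy rows standing in for `attacks`.
toy = pd.DataFrame({
    "Type": ["Unprovoked", "Unprovoked", "Provoked", "Unprovoked", "Unprovoked"],
    "Activity": ["Surfing", "Collecting fish from net", "Fishing", "Fishing (rod & line)", None],
})
print(unprovoked_activity_counts(toy))
```

On the real data this would be `unprovoked_activity_counts(attacks)`; any activity that lands in "Other" is a candidate for a new keyword.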

## 3. Let's create a count of every unique value and create a new dataframe named attacks_unique.

In [116]:
#Check what type this one is to make sure the others are the same
print(type(unprovoked))

<class 'pandas.core.frame.DataFrame'>


In [117]:
#create dataframes of each column's unique values
years = attacks["Year"].value_counts().rename_axis("Year").reset_index(name="Year_Counts")
countries = attacks["Country"].value_counts().rename_axis("Country").reset_index(name="Country_Counts")
areas = attacks["Area"].value_counts().rename_axis("Area").reset_index(name="Area_Counts")
locations = attacks["Location"].value_counts().rename_axis("Location").reset_index(name="Location_Counts")
types = attacks["Type"].value_counts().rename_axis("Type").reset_index(name="Type_Counts")
activities = attacks["Activity"].value_counts().rename_axis("Activity").reset_index(name="Activity_Counts")
sexes = attacks["Sex"].value_counts().rename_axis("Sex").reset_index(name="Sex_Counts")

In [118]:
#check one at random to make sure it works
locations

|   | Location | Location_Counts |
|---|---|---|
| 0 | New Smyrna Beach, Volusia County | 170 |
| 1 | Daytona Beach, Volusia County | 34 |
| 2 | Cocoa Beach, Brevard County | 25 |
| 3 | Ponce Inlet, Volusia County | 20 |
| 4 | Melbourne Beach, Brevard County | 19 |
| ... | ... | ... |
| 3968 | Wamberal | 1 |
| 3969 | 27th Avenue, New Smyrna Beach, Volusia County | 1 |
| 3970 | Manzanilla Bay, St. Andrew County | 1 |
| 3971 | 100 miles offshore | 1 |


In [119]:
#check that the type is the same
print(type(years))

<class 'pandas.core.frame.DataFrame'>


In [120]:
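#now merge them all together side by side
#note: rows line up by rank (row 0 = most common value of each column), not by incident; shorter tables are padded with NaN
attacks_unique = [years, countries, areas, locations, types, activities, sexes]
attacks_unique = pd.concat(attacks_unique, axis=1)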

In [121]:
#check that it worked
attacks_unique.head()

|   | Year | Year_Counts | Country | Country_Counts | Area | Area_Counts | Location | Location_Counts | Type | Type_Counts | Activity | Activity_Counts | Sex | Sex_Counts |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2015.0 | 143.0 | USA | 2207.0 | Florida | 1029.0 | New Smyrna Beach, Volusia County | 170 | Unprovoked | 4450.0 | Surfing | 974.0 | M | 4955.0 |
| 1 | 2017.0 | 136.0 | AUSTRALIA | 1319.0 | New South Wales | 481.0 | Daytona Beach, Volusia County | 34 | Provoked | 566.0 | Swimming | 889.0 | F | 623.0 |
| 2 | 2016.0 | 130.0 | SOUTH AFRICA | 571.0 | Queensland | 309.0 | Cocoa Beach, Brevard County | 25 | Invalid | 542.0 | Fishing | 433.0 | N | 2.0 |
| 3 | 2011.0 | 128.0 | PAPUA NEW GUINEA | 140.0 | Hawaii | 294.0 | Ponce Inlet, Volusia County | 20 | Sea Disaster | 232.0 | Spearfishing | 336.0 | . | 1.0 |
| 4 | 2014.0 | 127.0 | NEW ZEALAND | 127.0 | California | 287.0 | Melbourne Beach, Brevard County | 19 | Boating | 203.0 | Bathing | 159.0 | lli | 1.0 |


Now we should be good to go when making graphs later! Next, let's start looking at the other files for some information on public perceptions.
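Placing seven count tables side by side lines rows up by rank rather than by incident, and the shorter columns are padded with NaN. A long-format table, with one row per (column, value) pair, avoids the padding and is the shape Altair works with most naturally. A sketch; `tidy_counts` is a hypothetical helper and the toy rows stand in for `attacks`:

```python
import pandas as pd

def tidy_counts(df, columns):
    """Stack value counts for several columns into one long table: Column, Value, Count."""
    frames = []
    for column in columns:
        counts = df[column].value_counts().rename_axis("Value").reset_index(name="Count")
        counts.insert(0, "Column", column)  # remember which column each count came from
        frames.append(counts)
    return pd.concat(frames, ignore_index=True)

# Hypothetical toy rows standing in for `attacks`.
toy = pd.DataFrame({
    "Country": ["USA", "USA", "AUSTRALIA"],
    "Sex": ["M", "F", "M"],
})
print(tidy_counts(toy, ["Country", "Sex"]))
```

In the notebook this would be `tidy_counts(attacks, ["Year", "Country", "Area", "Location", "Type", "Activity", "Sex"])`, and a single chart can then filter on `Column`.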

## 4. What are the most common associations with sharks?

A common method for examining unconscious biases and attitudes in psychology is to have participants quickly choose words while looking at an image of the subject in question.
In the attitudes data, participants were shown images of sharks and asked to quickly choose a word to describe them. Let's look at what the different attitudes towards sharks are.

In [122]:
print(attitudes['Category'].value_counts())

Ecology and biological knowledge                    91
Commercial and food benefits                        76
Misc.                                               65
Negative feelings produced by sharks                59
Negative personality traits associated to sharks    42
Positive feelings produced by sharks                30
Negative outcomes of interactions with sharks       23
Positive personality traits associated to sharks     7
Name: Category, dtype: int64


In [123]:
print(attitudes['Sub_category'].value_counts())

Misc.                                         60
Related to the ecology & biology of sharks    50
Related to body traits of sharks              33
Potential commerce benefits                   32
Potential food benefits                       26
Potential health benefits                     18
Words mainly associated with Jaws              5
Name: Sub_category, dtype: int64


In [124]:
#How many total counts there are
print(attitudes['Category'].value_counts().sum())
print(attitudes['Sub_category'].value_counts().sum())

393
224
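`value_counts` here counts distinct words per category, not how often those words were chosen: the 393 rows are 393 different words, and `Mentions` carries the frequency. The Sub_category total of 224 also shows that 169 of the 393 words have no sub-category. To weigh categories by how often participants chose them, one option is to sum `Mentions` instead. A sketch using the five rows printed by `attitudes.head()`; `mentions_by` is a hypothetical helper:

```python
import pandas as pd

# The five rows printed by attitudes.head() above.
sample = pd.DataFrame({
    "Type": ["Negative", "Negative", "Neutral", "Negative", "Negative"],
    "English": ["afraid", "dangerous", "big", "blood", "death"],
    "Mentions": [587, 547, 480, 396, 297],
})

def mentions_by(df, column):
    """Total Mentions per value of `column`, largest first."""
    return df.groupby(column)["Mentions"].sum().sort_values(ascending=False)

totals = mentions_by(sample, "Type")
print(totals)
```

On the full data, `mentions_by(attitudes, "Type")` or `mentions_by(attitudes, "Category")` gives the weighted view.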


## 5. Has there been an increase in provoking sharks since the 1975 release of the movie Jaws?

As seen in the first question, 566 attacks were provoked. As a first step, let's count how many attacks of any type were recorded before the release of Jaws versus after.

In [125]:
#First, count all recorded attacks (every Type) before and after 1975; attacks from 1975 itself fall in neither group
attacks_prejaws = len(attacks[attacks["Year"] < 1975])
attacks_postjaws = len(attacks[attacks["Year"] > 1975])
print(f"There have been {attacks_postjaws} attacks since the release of Jaws compared to the {attacks_prejaws} attacks before Jaws came out")


There have been 3196 attacks since the release of Jaws compared to the 2886 attacks before Jaws came out


3196 compared to the previous 2886 doesn't seem like much at first, but it's important to account for how many years each period covers.

In [126]:
#how many years it's been pre-Jaws and post-Jaws
print(2018-1975)
print(1975-1800)

43
175


In [127]:
#average attacks per year after and before Jaws
print(3196 / 43)
print(2886/175)

74.32558139534883
16.49142857142857


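Looking at the data now, we see that before Jaws there were only about 16 recorded attacks each year.
After Jaws, there has been an average of about 74 each year, more than four times as many as before. These totals cover every attack type, and a higher rate of recorded attacks can also reflect better record-keeping and more people in the water, so this comparison alone does not show that Jaws led people to provoke sharks more often.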
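To answer the question as posed, the rows can be filtered to `Type == "Provoked"` before splitting on 1975. A sketch; the function name, its defaults (taken from the 1800 to 2018 range above), and the toy rows are assumptions:

```python
import pandas as pd

def provoked_rate_by_era(df, split_year=1975, first_year=1800, last_year=2018):
    """Average provoked attacks per year before and after `split_year`.

    Incidents in `split_year` itself are left out, matching the < / > comparison above.
    """
    provoked = df[df["Type"] == "Provoked"]
    before = (provoked["Year"] < split_year).sum()
    after = (provoked["Year"] > split_year).sum()
    years_before = split_year - first_year  # 1800..1974 -> 175 years
    years_after = last_year - split_year    # 1976..2018 -> 43 years
    return before / years_before, after / years_after

# Hypothetical toy rows; the 1975 row is excluded from both eras.
toy = pd.DataFrame({
    "Year": [1970, 1972, 1975, 1980, 1990, 2000],
    "Type": ["Provoked", "Unprovoked", "Provoked", "Provoked", "Provoked", "Invalid"],
})
rate_before, rate_after = provoked_rate_by_era(toy)
print(rate_before, rate_after)
```

On the notebook's data this would be `provoked_rate_by_era(attacks)`; the ratio `rate_after / rate_before` is then directly comparable with the all-types ratio above.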

## 6. Based on the sample size in Consumption, what is the average chance someone has eaten shark?
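One way to estimate this is the share of respondents giving each answer, using the codes listed for `Shark_consumption` (1 = Yes, 2 = Unsure, 3 = No). A sketch, run here on the five rows from `consumption.head()`; `consumption_shares` is a hypothetical helper, and on the full frame you would pass `consumption`:

```python
import pandas as pd

# Shark_consumption codes as listed in the column descriptions above.
LABELS = {1.0: "Yes", 2.0: "Unsure", 3.0: "No"}

def consumption_shares(df):
    """Share of respondents giving each answer; missing answers are left out."""
    shares = df["Shark_consumption"].value_counts(normalize=True)
    # Relabel the codes and make sure every answer appears, even with a zero share.
    return shares.rename(index=LABELS).reindex(list(LABELS.values()), fill_value=0.0)

# The five rows printed by consumption.head() above.
sample = pd.DataFrame({"Shark_consumption": [3.0, 3.0, 2.0, 3.0, 2.0]})
shares = consumption_shares(sample)
print(shares)
```

`consumption_shares(consumption)["Yes"]` would be the estimated chance that a respondent has eaten shark; adding the "Unsure" share gives a rough upper estimate. Since respondents may not have answered honestly, both numbers should be read with caution.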

# Graphs

## 1. Consumption

In [128]:
consumption.head(3)

|   | Year_of_birth | Gender | Education_Level | Shark_consumption |
|---|---|---|---|---|
| 0 | 1978.0 | 1.0 | 4.0 | 3.0 |
| 1 | 1985.0 | 2.0 | 3.0 | 3.0 |
| 2 | 1969.0 | 2.0 | 3.0 | 2.0 |


In [129]:
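#consumption data: the codes are categories, so plot them as ordinal values instead of binning them
source = consumption
alt.Chart(source).mark_bar().encode(
    alt.X("Shark_consumption:O", title="Has eaten shark (1 = Yes, 2 = Unsure, 3 = No)"),
    y='count()'
)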

More than 1,200 survey respondents said they have not eaten shark. Fewer than 400 are unsure, and even fewer say they have eaten shark.
Overall, it seems most people either do not eat shark or do not admit to eating it.

## 2. Unprovoked

In [130]:
unprovoked

|   | Activity | Counts |
|---|---|---|
| 0 | Surfing | 974 |
| 1 | Swimming | 889 |
| 2 | Fishing | 433 |
| 3 | Spearfishing | 336 |
| 4 | Bathing | 159 |
| ... | ... | ... |
| 1453 | Floating face down | 1 |
| 1454 | Fishing (rod & line) | 1 |
| 1455 | Surfing, but lying prone on his board | 1 |
| 1456 | Swimming close to wharf | 1 |


In [131]:
#there are a lot of insubstantial values, so let's filter out some of the data
source = unprovoked.head(5)
source

|   | Activity | Counts |
|---|---|---|
| 0 | Surfing | 974 |
| 1 | Swimming | 889 |
| 2 | Fishing | 433 |
| 3 | Spearfishing | 336 |
| 4 | Bathing | 159 |


In [132]:
#now use altair to show each activity's share of the top five (the calculate step must use the TotalCount name defined above)
alt.Chart(source).transform_joinaggregate(
    TotalCount='sum(Counts)',
).transform_calculate(
    PercentOfTotal="datum.Counts / datum.TotalCount"
).mark_bar().encode(
    alt.X('PercentOfTotal:Q', axis=alt.Axis(format='.0%')),
    y='Activity:N'
)
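The same percentages can be checked in pandas. They are shares of the five activities kept by `head(5)`, not of all recorded attacks. A sketch using the counts printed above; `add_percent_of_total` is a hypothetical helper:

```python
import pandas as pd

def add_percent_of_total(df, count_col="Counts"):
    """Add PercentOfTotal = count / sum of counts, the same calculation as the chart's transforms."""
    return df.assign(PercentOfTotal=df[count_col] / df[count_col].sum())

# The top five activities printed above.
top5 = pd.DataFrame({
    "Activity": ["Surfing", "Swimming", "Fishing", "Spearfishing", "Bathing"],
    "Counts": [974, 889, 433, 336, 159],
})
print(add_percent_of_total(top5).round(3))
```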

## 3. Attacks

In [133]:
attacks_unique.head()

|   | Year | Year_Counts | Country | Country_Counts | Area | Area_Counts | Location | Location_Counts | Type | Type_Counts | Activity | Activity_Counts | Sex | Sex_Counts |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2015.0 | 143.0 | USA | 2207.0 | Florida | 1029.0 | New Smyrna Beach, Volusia County | 170 | Unprovoked | 4450.0 | Surfing | 974.0 | M | 4955.0 |
| 1 | 2017.0 | 136.0 | AUSTRALIA | 1319.0 | New South Wales | 481.0 | Daytona Beach, Volusia County | 34 | Provoked | 566.0 | Swimming | 889.0 | F | 623.0 |
| 2 | 2016.0 | 130.0 | SOUTH AFRICA | 571.0 | Queensland | 309.0 | Cocoa Beach, Brevard County | 25 | Invalid | 542.0 | Fishing | 433.0 | N | 2.0 |
| 3 | 2011.0 | 128.0 | PAPUA NEW GUINEA | 140.0 | Hawaii | 294.0 | Ponce Inlet, Volusia County | 20 | Sea Disaster | 232.0 | Spearfishing | 336.0 | . | 1.0 |
| 4 | 2014.0 | 127.0 | NEW ZEALAND | 127.0 | California | 287.0 | Melbourne Beach, Brevard County | 19 | Boating | 203.0 | Bathing | 159.0 | lli | 1.0 |


Let's create a map visualizing where attacks occur. First, we need to add country codes.

In [134]:
#First let's request the country codes
res = requests.get('https://restcountries.eu/rest/v2/alpha/aus')
res.text

'{"name":"Australia","topLevelDomain":[".au"],"alpha2Code":"AU","alpha3Code":"AUS","callingCodes":["61"],"capital":"Canberra","altSpellings":["AU"],"region":"Oceania","subregion":"Australia and New Zealand","population":24117360,"latlng":[-27.0,133.0],"demonym":"Australian","area":7692024.0,"gini":30.5,"timezones":["UTC+05:00","UTC+06:30","UTC+07:00","UTC+08:00","UTC+09:30","UTC+10:00","UTC+10:30","UTC+11:30"],"borders":[],"nativeName":"Australia","numericCode":"036","currencies":[{"code":"AUD","name":"Australian dollar","symbol":"$"}],"languages":[{"iso639_1":"en","iso639_2":"eng","name":"English","nativeName":"English"}],"translations":{"de":"Australien","es":"Australia","fr":"Australie","ja":"オーストラリア","it":"Australia","br":"Austrália","pt":"Austrália","nl":"Australië","hr":"Australija","fa":"استرالیا"},"flag":"https://restcountries.eu/data/aus.svg","regionalBlocs":[],"cioc":"AUS"}'

In [135]:
temp = res.json()
temp['numericCode']

'036'
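The next cell calls `do_fuzzy_search`, which is never defined in the notebook. A minimal sketch, assuming the `pycountry` package (`pip install pycountry`) and its `countries.search_fuzzy` function, which returns candidate countries best match first and raises `LookupError` when nothing matches. Non-string values, such as the NaN padding in `attacks_unique`, are skipped. The `search` parameter is an addition of this sketch so the helper can be tried without pycountry installed:

```python
def do_fuzzy_search(country, search=None):
    """Return an ISO 3166 alpha-3 code for a country name, or None if no match is found.

    `search` defaults to pycountry's fuzzy search; any callable that returns a list of
    objects with an `alpha_3` attribute (or raises LookupError) can be passed instead.
    """
    if not isinstance(country, str) or not country.strip():
        return None  # NaN padding rows and blank names
    if search is None:
        import pycountry  # requires: pip install pycountry
        search = pycountry.countries.search_fuzzy
    try:
        matches = search(country)
    except LookupError:  # search_fuzzy raises LookupError when nothing matches
        return None
    return matches[0].alpha_3 if matches else None
```

With this defined first, the `iso_map` comprehension in the next cell works unchanged; names the search cannot match come back as `None` rather than stopping the loop.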

In [136]:
iso_map = {country: do_fuzzy_search(country) for country in attacks_unique["Country"].unique()}
attacks_unique["country_code"] = attacks_unique["Country"].map(iso_map)

In [137]:
from dataprep.clean import clean_country  # requires: pip install dataprep
attacks_unique = clean_country(attacks_unique, 'Country', output_format='alpha-3')

Country Cleaning Report:
	139 values cleaned (3.5%)
	42 values unable to be parsed (1.06%), set to NaN
Result contains 140 (3.52%) values in the correct format and 3833 null values (96.48%)




In [138]:
import pycountry  # requires: pip install pycountry

#the Country column is upper-case (e.g. "AUSTRALIA"), so key the lookup by upper-case names
countries = {}
for country in pycountry.countries:
    countries[country.name.upper()] = country.alpha_2

codes = [countries.get(country, 'Unknown code') for country in attacks_unique['Country']]

print(codes)

In [139]:
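#Now we can add them to attacks_unique
def look_up_code(country_code):
    """Return restcountries' numericCode for an alpha-2/alpha-3 code, or None if the lookup fails."""
    if not isinstance(country_code, str):
        return None  # NaN padding rows have no code
    res = requests.get('https://restcountries.eu/rest/v2/alpha/' + country_code)
    return res.json().get('numericCode')  # error responses have no 'numericCode' key

In [140]:
#look up codes from the alpha-3 column that clean_country (In [137]) adds, not from the raw country names
attacks_unique['numeric_code'] = attacks_unique['Country_clean'].map(look_up_code)
attacks_unique.head()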

In [None]:
#get the countries objects first
countries = alt.topo_feature(data.world_110m.url, 'countries')

In [None]:
#create a blank map
alt.Chart(countries).mark_geoshape(
    fill='#666666',
    stroke='white'
).properties(
    width=750,
    height=450
).project('equirectangular')
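To colour the blank map by attack counts, each country needs the same key as the map's shapes. A sketch, assuming a `numeric_code` column holding restcountries' zero-padded `numericCode` strings (such as the '036' returned for Australia above) next to `Country_Counts`, and assuming the `world_110m` country features use the integer form of the same ISO 3166 numeric code as their `id`. `counts_by_numeric_id` and the toy rows are hypothetical:

```python
import pandas as pd

def counts_by_numeric_id(df, code_col="numeric_code", count_col="Country_Counts"):
    """Total counts per integer ISO numeric id, dropping rows without a code or count."""
    valid = df.dropna(subset=[code_col, count_col])
    return (valid.assign(id=valid[code_col].map(int))  # '036' -> 36
                 .groupby("id", as_index=False)[count_col].sum())

# Hypothetical toy rows in the shape described above.
toy = pd.DataFrame({
    "numeric_code": ["036", "840", None, "036"],
    "Country_Counts": [10, 5, 3, 2],
})
print(counts_by_numeric_id(toy))
```

The result could then be joined onto the shapes with Altair's `transform_lookup(lookup='id', from_=alt.LookupData(result, 'id', ['Country_Counts']))` and coloured with `color='Country_Counts:Q'`.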

# Ethical Considerations

The primary ethical consideration for each dataset is the accuracy of the data. 

The original attack file includes references to each report.

When comparing attack counts over time, a potential issue to consider is how much the world population has grown. Unfortunately, no data could be found on how many people visit beaches each year to compare against the increase in recorded attacks.

In the consumption report, a potential issue to consider is whether respondents answered honestly, especially about eating shark.


# Conclusion

Numerous organizations report declining shark populations and threatened or endangered statuses. Further research into causes of death, human population growth, and ocean trends would help distinguish the real threat of sharks from the perceived threat. Lastly, more information can be found about how common negative perceptions of sharks relate to declining shark populations and about the efforts to stop this trend.

Shark attacks are a real danger, but that danger is incredibly low and not nearly enough to warrant the poor reputation that has led to increased killing and endangerment of numerous shark species. Sharks are a necessary component of the food chain and of the general wellbeing of marine ecosystems.
