# Project 4: Global wildlife trade

Maaike de Jong
Ironhack Data Analytics Part-time
02 May 2020

## 1. Introduction

In this project I investigate the international wildlife trade, with a focus on the trade in mammals. The wildlife trade, such as the trade in ivory or pets, is, together with habitat destruction caused by land development, the leading cause of animal extinction. A recent study estimates that at least one fifth of all vertebrate species are being traded (see this [article](https://www.bbc.com/news/science-environment-49904668)).

Wildlife trade has many negative effects, with the most important ones being:
* Decline and extinction of populations
* Introduction of invasive species
* Spread of new diseases to humans 

With this project I focus on the trade in endangered mammals as listed by CITES, the Convention on International Trade in Endangered Species of Wild Fauna and Flora. In particular, I analyse trade in live mammals taken from the wild. 

### Project Questions
The main research questions I will try to answer in this project are:
* Which wild mammal groups and species are traded the most (in terms of live animals taken from the wild)?
* What are the main purposes for trade of these animals?
* How has the trade changed over the past two decades (2000-2018)? 

<a name="requirements"></a>

### Data  
I'm using the [CITES trade database](https://trade.cites.org/) as the source of my data. This database contains more than 20 million trade records and is openly accessible. On the database website you can select a subset of the data for download. Documentation from CITES on how to use the data can be found [here](https://trade.cites.org/cites_trade_guidelines/en-CITES_Trade_Database_Guide.pdf). 
I selected my data with the following parameters: 
* Year range: 2000-2019
* Source: W - Wild
* Exporting countries: All countries
* Importing countries: All countries
* Purpose: All purposes
* Trade Terms: Liv - Live
* Taxon: Mammalia

The resulting dataset used in this Jupyter notebook can be found [here](https://drive.google.com/drive/folders/1wujpJSR6rC7AMeIm_jfcjtQV3lwDohu9). 

### Links
[Github repository](https://github.com/paoloironhack/dataptams2020/tree/maaike/projects/Project4_Module2_Final_Project)  
[Presentation slides](https://drive.google.com/drive/folders/1wujpJSR6rC7AMeIm_jfcjtQV3lwDohu9)  
[Trello board](https://trello.com/b/qdD9iGnD/project-4)  

## 2. Importing packages

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

## 3. Inspecting and cleaning the data

In [None]:
# importing the data

df = pd.read_csv('../data/mammals_wild_live_00_19.csv')
df.head()

In [None]:
# Inspect dataframe attributes

df.info()

In [None]:
# Inspect missing data

df.isnull().sum()

In [None]:
# inspect column 'Year'

df['Year'].value_counts()

In [None]:
# exclude 2019: it has few records because the data for this year are not yet complete

df1 = df[df['Year'] != 2019]

In [None]:
# inspect column 'App.', which shows the CITES Appendix each species is listed in

df1['App.'].value_counts()

In [None]:
# inspect column 'Order'

df1['Order'].value_counts()

In [None]:
# inspect column 'Term', this should be all live

df1['Term'].value_counts()

In [None]:
# inspect column 'Purpose'

df1['Purpose'].value_counts()

# These letters are codes for the purpose of the traded animals

In [None]:
# inspect column 'Source'

df1['Source'].value_counts()

# 'W' means wild, 'U' means source unknown. So I'm removing the records with an unknown source
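
The unexpected 'U' records suggest it is worth checking a download against all the selection parameters at once. Below is a minimal sketch of such a check. It uses a tiny made-up frame instead of the real file, `check_selection` is a hypothetical helper, and the column names (`Year`, `Term`, `Source`) are assumed to match the CITES export.

```python
import pandas as pd

def check_selection(df, years=(2000, 2019), term="live", sources=("W",)):
    """Return a list of problems; an empty list means the frame matches the selection."""
    problems = []
    if not df["Year"].between(*years).all():
        problems.append("Year outside selected range")
    if not (df["Term"] == term).all():
        problems.append("Term other than " + term)
    unexpected = sorted(set(df["Source"].dropna()) - set(sources))
    if unexpected:
        problems.append("unexpected Source codes: " + ", ".join(unexpected))
    return problems

# Made-up rows standing in for the downloaded data
toy = pd.DataFrame({
    "Year": [2000, 2010, 2019],
    "Term": ["live", "live", "live"],
    "Source": ["W", "W", "U"],
})
problems = check_selection(toy)
print(problems)
```

Returning a list instead of raising an error makes it possible to see every mismatch in one go before deciding what to filter out.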

In [None]:
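# take an explicit copy so that columns can be added to df2 without a SettingWithCopyWarning
df2 = df1[df1['Source'] == 'W'].copy()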

In [None]:
df2.head()

In [None]:
# add a column with English names of the Order
# (orders that are not in the dictionary get NaN in the new column)

order_names = {
    'Primates': 'Primates',
    'Carnivora': 'Carnivores',
    'Cetacea': 'Whales and Dolphins',
    'Proboscidea': 'Elephants',
    'Artiodactyla': 'Even-toed Ungulates',
    'Perissodactyla': 'Odd-toed Ungulates',
    'Chiroptera': 'Bats',
    'Pilosa': 'Sloths and Anteaters',
    'Pholidota': 'Pangolins',
    'Sirenia': 'Sea-cows',
    'Scandentia': 'Treeshrews',
    'Diprotodontia': 'Marsupials',
    'Cingulata': 'Armadillos',
}
df2['Animal order'] = df2['Order'].map(order_names)

df2['Animal order'].value_counts()
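
Because only thirteen orders are given an English name above, any other order in the data would silently end up without one. A minimal sketch of how that could be checked, using a toy frame and a deliberately incomplete `order_names` dictionary (both are made up for illustration):

```python
import pandas as pd

# Toy data standing in for df2; 'Rodentia' is deliberately left out of the lookup
toy = pd.DataFrame({"Order": ["Primates", "Carnivora", "Rodentia", "Primates"]})
order_names = {"Primates": "Primates", "Carnivora": "Carnivores"}

toy["Animal order"] = toy["Order"].map(order_names)

# Orders that did not receive an English name
unmapped = sorted(toy.loc[toy["Animal order"].isna(), "Order"].unique())
print(unmapped)
```

An empty list here would mean that every record has a readable group name for the plots further down.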

In [None]:
# Create new column with Purpose descriptions based on 1 letter codes in 'Purpose' column (see CITES documentation)
# first rename original purpose column

df2.rename(columns={'Purpose': 'Purpose_code'}, inplace = True)
df2.head()

In [None]:
# then, create new column 'Purpose'
df2['Purpose'] = df2['Purpose_code']
df2 = df2.replace({'Purpose': {'B': 'Captive breeding', 'E': 'Educational', 'G': 'Botanical Garden', 'H': 'Hunting trophy', 'L': 'Forensic', 'M': 'Medical', 'N': 'Reintroduction', 'P': 'Personal', 'Q': 'Circus', 'S': 'Scientific', 'T': 'Commercial', 'Z': 'Zoo'}})


In [None]:
df2['Purpose'].value_counts()

In [None]:
df2.head()

In [None]:
# add an empty placeholder column for the quantities of traded animals (filled in below)

df2['Quantity'] = np.nan

In [None]:
df2.head()

In [None]:
# create single column with quantities of traded animals.
# in case both imported and exported numbers are reported, take imported numbers;
# if the importer reported nothing, fall back to the exported numbers

df2['Quantity'] = df2['Importer reported quantity'].fillna(df2['Exporter reported quantity'])

df2.head(10)
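
The rule described in the comments above (use the importer-reported number when it exists, otherwise the exporter-reported one) can be illustrated on a toy frame. Missing values in pandas are `NaN` floats, so a comparison with the string `'NaN'` does not detect them; `isna()` or `fillna()` does. The numbers below are made up.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "Importer reported quantity": [10.0, np.nan, 4.0, np.nan],
    "Exporter reported quantity": [12.0, 7.0, np.nan, np.nan],
})

# Comparing with the string 'NaN' is True for every row, so it filters nothing
all_rows_selected = bool((toy["Importer reported quantity"] != "NaN").all())
print(all_rows_selected)

# Prefer the importer's number, fall back to the exporter's
toy["Quantity"] = toy["Importer reported quantity"].fillna(toy["Exporter reported quantity"])
print(toy)
```

Rows where neither party reported a quantity stay `NaN` and are skipped by `sum()` in the aggregations that follow.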

## 4. Data analysis and visualization

### 4.1. Mammals in general: trade over time and trading purposes

I start with an overview of all mammals grouped together. First, I plot the total number of mammals traded over time.

In [None]:
# First, create df for total number of traded mammals per year
trades_year = df2.groupby('Year').agg({'Quantity': 'sum'}).reset_index()
trades_year.head()

In [None]:
# Plot this in a line area chart

sns.set()
sns.set_style('white')

f, ax = plt.subplots(figsize=(20, 12))
sns.set_color_codes('pastel')

plt.fill_between(trades_year['Year'], trades_year['Quantity'], color="g", alpha=0.4)
plt.plot(trades_year['Year'], trades_year['Quantity'], color="green", alpha=0.6)

ax.set_xticks(range(2000, 2020, 2))

ax.tick_params(axis='both', which='major', labelsize=26) 
ax.tick_params(axis='both', which='minor', labelsize=26)

plt.xlabel('Year', fontsize=30)
plt.ylabel('Number of traded animals', fontsize=30)
plt.suptitle('Traded live wild mammals 2000-2018', fontsize=36)

sns.despine()

plt.show()

This figure shows that overall trade in wild live mammals was lower in the 2010s than in the 2000s. Next, I'll look at the distribution of trading purposes for all the mammals.
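
To put a number on that comparison, the yearly totals can be averaged per decade. A minimal sketch, using invented yearly totals in place of `trades_year` (the `Decade` column is a helper introduced here, not part of the dataset):

```python
import pandas as pd

# Invented totals; in the notebook this would be the trades_year dataframe
toy_year = pd.DataFrame({
    "Year": [2008, 2009, 2010, 2011],
    "Quantity": [100, 80, 50, 30],
})

# Integer-divide the year to get the decade it belongs to (2008 -> 2000)
toy_year["Decade"] = (toy_year["Year"] // 10) * 10
decade_mean = toy_year.groupby("Decade")["Quantity"].mean()
print(decade_mean)
```

Averaging rather than summing keeps the comparison fair when the decades cover a different number of years (2010-2018 has only nine).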

In [None]:
# create dataframe with overall purpose counts and percentages

df_purpose = df2.groupby('Purpose', as_index = False).agg({'Quantity': 'sum'}).sort_values(by = 'Quantity', ascending = False).reset_index(drop = True)
df_purpose

In [None]:
# visualise overall purpose with bar chart

sns.set()
sns.set_style('white')

f, ax = plt.subplots(figsize=(20, 12))

sns.set_color_codes('pastel')
sns.barplot(x='Purpose', y= 'Quantity', data= df_purpose,
            label= 'Purpose', color="g")

ax.set_xticklabels(df_purpose['Purpose'], rotation=40, ha='right')

ax.tick_params(axis='both', which='major', labelsize=26) 
ax.tick_params(axis='both', which='minor', labelsize=26)

plt.xlabel('Purpose', fontsize=30)
plt.ylabel('Number of traded animals', fontsize=30)
plt.suptitle('Total live wild animals per trading purpose', fontsize=36)

sns.despine()

This figure shows that commercial trade is by far the most common purpose for this group of animals. Scientific, medical, captive breeding and zoo are the next most important purposes.
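
The purpose table above only holds totals. A percentage column, computed the same way as for the families later in this notebook, makes "by far the most common" easier to quantify. A sketch with invented numbers standing in for `df_purpose`:

```python
import pandas as pd

# Invented totals; in the notebook this would be the df_purpose dataframe
toy_purpose = pd.DataFrame({
    "Purpose": ["Commercial", "Scientific", "Zoo"],
    "Quantity": [600, 300, 100],
})

# Share of each purpose in the overall total, rounded to one decimal
total = toy_purpose["Quantity"].sum()
toy_purpose["Percentage"] = (toy_purpose["Quantity"] / total * 100).round(1)
print(toy_purpose)
```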

Next, let's look at the different main mammal groups (at the Order level) that are being traded.

In [None]:
# create dataframe with total number of traded animals per animal group (order)

order_trades2 = df2.groupby('Animal order', as_index = False).agg({'Quantity': 'sum'}).sort_values(by = 'Quantity', ascending = False).reset_index(drop = True)
order_trades2.head()

In [None]:
sns.set()
sns.set_style('white')

f, ax = plt.subplots(figsize=(20, 12))

sns.set_color_codes('pastel')
sns.barplot(x='Animal order', y= 'Quantity', data= order_trades2,
            label= 'Mammal group', color="g")

ax.set_xticklabels(order_trades2['Animal order'], rotation=40, ha='right')

ax.tick_params(axis='both', which='major', labelsize=26) 
ax.tick_params(axis='both', which='minor', labelsize=26)

plt.xlabel('Mammal order', fontsize=30)
plt.ylabel('Number of traded animals', fontsize=30)
plt.suptitle('Total traded live wild animals per mammal order', fontsize=36)

sns.despine()


The Primates are by far the most traded group. To further investigate trade in the main mammal groups, I'll plot them over time. 

In [None]:
# create dataframe with total number of shipped animals per year per animal group (order)

year_order_trades = df2.groupby(['Year', 'Animal order'], as_index = False).agg({'Quantity': 'sum'})
year_order_trades.head()

In [None]:
#plot total number of shipped animals per year per animal group (order)

f, ax = plt.subplots(figsize=(20, 12))

sns.lineplot(x = 'Year', y = 'Quantity', hue = 'Animal order', data = year_order_trades, linewidth=2.5)

ax.tick_params(axis='both', which='major', labelsize=26) 
ax.tick_params(axis='both', which='minor', labelsize=26)
ax.set_xticks(range(2000, 2020, 2))

ax.legend(loc="upper right", frameon=True, fontsize = 20)

plt.xlabel('Year', fontsize=30)
plt.ylabel('Number of traded animals', fontsize=30)
plt.suptitle('Total traded live wild animals per mammal order 2000-2018', fontsize=36)

sns.despine()


This shows again that the Primates are the most traded mammal order, and that their trade declined considerably in the 2010s compared to the 2000s. In the next section I'll look into Primates in more detail. 

### 4.2. Primates

In [None]:
# create dataframe with numbers just for primates and years
df_primates = df2[df2['Animal order'] == 'Primates']
year_primates = df_primates.groupby('Year', as_index = False).agg({'Quantity': 'sum'})
year_primates.head()

In [None]:
# Create area line chart for primates over time
f, ax = plt.subplots(figsize=(20, 12))
sns.set_color_codes('pastel')

plt.fill_between(year_primates['Year'], year_primates['Quantity'], color="skyblue", alpha=0.4)
plt.plot(year_primates['Year'], year_primates['Quantity'], color="darkcyan", alpha=0.6)

ax.set_xticks(range(2000, 2020, 2))

ax.tick_params(axis='both', which='major', labelsize=26) 
ax.tick_params(axis='both', which='minor', labelsize=26)

plt.xlabel('Year', fontsize=30)
plt.ylabel('Number of traded primates', fontsize=30)
plt.suptitle('Traded live wild primates per year', fontsize=36)

sns.despine()

plt.show()


The primate trade over time has more or less the same shape as the total mammal trade over time, because the primates represent such a large proportion of the data.
Next, I look into the different Primate families in more detail. 

In [None]:
# look into the different kinds of primates and their trade over time
# make a dataframe and bar chart of the different primate families

# First, inspect how many traded primate families there are
primate_family_counts = df_primates.groupby(['Family'], as_index = False).agg({'Quantity': 'sum'}).sort_values(by = 'Quantity', ascending = False)

# Add column with percentages of total
primate_family_counts['Percentage'] = (primate_family_counts['Quantity'] / primate_family_counts['Quantity'].sum()) *100
primate_family_counts.head()


In [None]:
# visualise the overall numbers of traded primates per family in a bar chart

sns.set()
sns.set_style('white')

f, ax = plt.subplots(figsize=(20, 12))

sns.set_color_codes('pastel')
sns.barplot(x='Family', y= 'Quantity', data= primate_family_counts,
            label= 'Family', color="skyblue", alpha=0.6)

ax.set_xticklabels(primate_family_counts['Family'], rotation=40, ha='right')

ax.tick_params(axis='both', which='major', labelsize=26) 
ax.tick_params(axis='both', which='minor', labelsize=26)

plt.xlabel('Primate Family', fontsize=32)
plt.ylabel('Number of traded animals', fontsize=32)
plt.suptitle('Total traded live wild primates per family', fontsize=36)

sns.despine()

From this figure it becomes clear that the Old-World Monkeys (Cercopithecidae) and the New-World Monkeys (Cebidae) are the biggest groups: together they make up nearly 98% of the traded live wild primates. What are the most traded species in these families?

In [None]:
# df for cercopithecidae
cerco = df2[df2['Family'] == 'Cercopithecidae']

cerco_taxon_counts = cerco.groupby(['Taxon'], as_index = False).agg({'Quantity': 'sum'}).sort_values(by = 'Quantity', ascending = False).reset_index(drop = True)

# Add column with percentages of total
cerco_taxon_counts['Percentage'] = (cerco_taxon_counts['Quantity'] / cerco_taxon_counts['Quantity'].sum()) *100
cerco_taxon_counts.head()

The crab-eating macaque (Macaca fascicularis) is the most traded primate in this family (60%). According to a quick online search, many of these monkeys are traded for commercial research and end up in laboratories as test animals. Second is the grivet monkey (23%). 

Next, I'm looking at how the trading purpose changes over time for the primates.

In [None]:
# create dataframe with total number of shipped animals per year per purpose

primates_year_purpose = df_primates.groupby(['Year', 'Purpose'], as_index = False).agg({'Quantity': 'sum'})
primates_year_purpose.head()

In [None]:
#plot total number of shipped primates per year per purpose

f, ax = plt.subplots(figsize=(20, 12))

sns.lineplot(x = 'Year', y = 'Quantity', hue = 'Purpose', data = primates_year_purpose, linewidth=2.5)

ax.tick_params(axis='both', which='major', labelsize=26) 
ax.tick_params(axis='both', which='minor', labelsize=26)
ax.set_xticks(range(2000, 2020, 2))

ax.legend(loc="upper right", frameon=True, fontsize = 20)

plt.xlabel('Year', fontsize=32)
plt.ylabel('Number of traded animals', fontsize=32)
plt.suptitle('Total traded live wild primates per purpose', fontsize=36)

sns.despine()


This figure shows that commercial trade is the main purpose for primates, and that its decline since the 2000s seems to be the main cause of the general decline in primate (and mammal) trade. Trade for scientific research and for medical purposes was also considerably higher in the 2000s. Captive breeding had a short peak in 2004, probably due to a specific large breeding programme.

### 4.3. Other mammal groups

Apart from the primates, which other mammal groups are traded often? I'm creating a facet grid plot to visually compare the trade in different groups over time.

In [None]:
# Create overview for different mammal groups over time

# create dataframe with total number of shipped animals per year per animal group (order)
order_year_trades = df2.groupby(['Animal order', 'Year'], as_index = False).agg({'Quantity': 'sum'})

# and one without the Primates
order_year_trades_noPrimates = order_year_trades[order_year_trades['Animal order'] != 'Primates']
order_year_trades_noPrimates['Animal order'].value_counts()


In [None]:
# visualizing the Order data over time without the primates with a facetgrid in seaborn

g = sns.FacetGrid(order_year_trades_noPrimates, col='Animal order', hue='Animal order', col_wrap=4)

# one line plus a shaded area per mammal group
g.map(plt.plot, 'Year', 'Quantity')
g.map(plt.fill_between, 'Year', 'Quantity', alpha=0.2)

g.set_titles("{col_name}")

plt.subplots_adjust(top=0.92)
g.fig.suptitle('Traded number of wild animals per mammal group 2000-2018')
 
plt.show()


After the Primates, the most traded mammal groups are the Carnivores and the Even-toed Ungulates. It looks like the trade in Carnivores is going up in recent years. What is happening there in terms of species and purpose? Let's zoom in. 
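
Whether a group is "going up" can also be checked numerically, by comparing the average yearly quantity in the most recent years with the years before. A sketch with invented numbers; `recent_vs_earlier` is a hypothetical helper and the cut-off year is an arbitrary assumption, not something derived from the data:

```python
import pandas as pd

# Invented numbers standing in for order_year_trades
toy = pd.DataFrame({
    "Animal order": ["Carnivores"] * 4 + ["Bats"] * 4,
    "Year": [2012, 2013, 2014, 2015] * 2,
    "Quantity": [10, 10, 30, 50, 20, 20, 10, 10],
})

def recent_vs_earlier(df, recent_from):
    """Mean yearly quantity per group before and from `recent_from`, plus their ratio."""
    period = (df["Year"].ge(recent_from)
                        .map({True: "recent", False: "earlier"})
                        .rename("Period"))
    means = df.groupby(["Animal order", period])["Quantity"].mean().unstack()
    means["ratio"] = means["recent"] / means["earlier"]
    return means

change = recent_vs_earlier(toy, recent_from=2014)
print(change)
```

A ratio above 1 means the group was traded more per year in the recent period than before; a ratio below 1 means less.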

### 4.4. Carnivores

First, I'll plot the trade over time for the carnivores in general.

In [None]:
# create dataframe with numbers just for carnivores and years
df_carnivores = df2[df2['Animal order'] == 'Carnivores']
year_carnivores = df_carnivores.groupby('Year', as_index = False).agg({'Quantity': 'sum'})
year_carnivores.head()

In [None]:
# Create area line chart for carnivores over time
f, ax = plt.subplots(figsize=(20, 12))
sns.set_color_codes('pastel')

plt.fill_between(year_carnivores['Year'], year_carnivores['Quantity'], color="lightsalmon", alpha=0.2)
plt.plot(year_carnivores['Year'], year_carnivores['Quantity'], color="lightsalmon", alpha=0.8)

ax.set_xticks(range(2000, 2020, 2))

ax.tick_params(axis='both', which='major', labelsize=26) 
ax.tick_params(axis='both', which='minor', labelsize=26)

plt.xlabel('Year', fontsize=32)
plt.ylabel('Number of traded carnivores', fontsize=32)
plt.suptitle('Traded live wild carnivores per year', fontsize=36)

sns.despine()

plt.show()


Next, I'll look into the different families of carnivores and how they compare in trade numbers. 

In [None]:
# make a dataframe and bar chart of the different carnivore families

# First, inspect how many traded carnivore families there are
carnivore_family_counts = df_carnivores.groupby(['Family'], as_index = False).agg({'Quantity': 'sum'}).sort_values(by = 'Quantity', ascending = False).reset_index(drop = True)

# Add column with percentages of total
carnivore_family_counts['Percentage'] = (carnivore_family_counts['Quantity'] / carnivore_family_counts['Quantity'].sum()) *100
carnivore_family_counts

# There are several Families that are traded a lot. How do their trade patterns look over time?


In [None]:
# visualise the overall numbers of traded carnivores per family in a bar chart

sns.set()
sns.set_style('white')

f, ax = plt.subplots(figsize=(20, 12))

sns.set_color_codes('pastel')
sns.barplot(x='Family', y= 'Quantity', data= carnivore_family_counts,
            label= 'Family', color="lightsalmon", alpha=0.6)

ax.set_xticklabels(carnivore_family_counts['Family'], rotation=40, ha='right')

ax.tick_params(axis='both', which='major', labelsize=26) 
ax.tick_params(axis='both', which='minor', labelsize=26)

plt.xlabel('Carnivore Family', fontsize=32)
plt.ylabel('Number of traded animals', fontsize=32)
plt.suptitle('Total live wild carnivores per family', fontsize=36)

sns.despine()

This figure shows that the Canidae (wild dogs, wolves etc.) are traded most, but the Otariidae (eared seals: sea lions and fur seals), the Felidae (cats) and the Procyonidae (raccoons, coatis etc.) are also traded a lot. 

Next, I visualise the trade of the different carnivore families over time, again with a facet grid for easy visual comparison.

In [None]:
# Create df for different families of Carnivores over time

year_carnivores_family = df_carnivores.groupby(['Year', 'Family'], as_index = False).agg({'Quantity': 'sum'})
year_carnivores_family.head()

In [None]:
# plot total number of shipped carnivores per year per family in a seaborn facetgrid

g = sns.FacetGrid(year_carnivores_family, col='Family', hue='Family', col_wrap=4)

# one line plus a shaded area per family
g.map(plt.plot, 'Year', 'Quantity')
g.map(plt.fill_between, 'Year', 'Quantity', alpha=0.2)

g.set_titles("{col_name}")

plt.subplots_adjust(top=0.92)
g.fig.suptitle('Traded number of live wild carnivores per family 2000-2018')
 
plt.show()

This shows that for the Canids and for the Seals the numbers are going up in recent years. Which species of these groups are traded most? 

In [None]:
# create df for the Canidae

canidae = df2[df2['Family'] == 'Canidae']

canidae_taxon_counts = canidae.groupby(['Taxon'], as_index = False).agg({'Quantity': 'sum'}).sort_values(by = 'Quantity', ascending = False).reset_index(drop = True)

# Add column with percentages of total
canidae_taxon_counts['Percentage'] = (canidae_taxon_counts['Quantity'] / canidae_taxon_counts['Quantity'].sum()) *100
canidae_taxon_counts.head()

In [None]:
# Same for the Otariidae

otariidae = df2[df2['Family'] == 'Otariidae']

otariidae_taxon_counts = otariidae.groupby(['Taxon'], as_index = False).agg({'Quantity': 'sum'}).sort_values(by = 'Quantity', ascending = False).reset_index(drop = True)

# Add column with percentages of total
otariidae_taxon_counts['Percentage'] = (otariidae_taxon_counts['Quantity'] / otariidae_taxon_counts['Quantity'].sum()) *100
otariidae_taxon_counts.head()


In [None]:
# Also let's look at Felidae

felidae = df2[df2['Family'] == 'Felidae']

felidae_taxon_counts = felidae.groupby(['Taxon'], as_index = False).agg({'Quantity': 'sum'}).sort_values(by = 'Quantity', ascending = False).reset_index(drop = True)

# Add column with percentages of total
felidae_taxon_counts['Percentage'] = (felidae_taxon_counts['Quantity'] / felidae_taxon_counts['Quantity'].sum()) *100
felidae_taxon_counts.head()

For the Canidae, the fennec fox is by far the most traded species (87%), followed by the pampas fox and the wolf. Fennec foxes are popular as pets and are also used for their fur in North Africa.

For the Otariidae, the top two species account for more than 99% of the trade; both are species of fur seal. 

For the cats, servals are the most traded (24%), followed by the Canada lynx (17%) and the lion (16%).
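
The three cells above repeat the same steps for each family, so they could be wrapped in a small function. A minimal sketch, where `taxon_share` is a hypothetical helper name and the toy quantities are invented:

```python
import pandas as pd

def taxon_share(df, family):
    """Total quantity and percentage share per taxon within one family."""
    subset = df[df["Family"] == family]
    counts = (subset.groupby("Taxon", as_index=False)["Quantity"].sum()
                    .sort_values("Quantity", ascending=False)
                    .reset_index(drop=True))
    counts["Percentage"] = counts["Quantity"] / counts["Quantity"].sum() * 100
    return counts

# Invented rows standing in for df2
toy = pd.DataFrame({
    "Family": ["Canidae", "Canidae", "Canidae", "Felidae"],
    "Taxon": ["Vulpes zerda", "Canis lupus", "Vulpes zerda", "Panthera leo"],
    "Quantity": [6, 2, 2, 5],
})
canidae_share = taxon_share(toy, "Canidae")
print(canidae_share)
```

With the real data, a call such as `taxon_share(df2, 'Felidae').head()` would give the same kind of table as the cells above.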

Next, I'll look at how the trading purpose of the carnivores looks over time.

In [None]:
# create dataframe with total number of shipped animals per year per purpose

carnivores_year_purpose = df_carnivores.groupby(['Year', 'Purpose'], as_index = False).agg({'Quantity': 'sum'})
carnivores_year_purpose.head()

In [None]:
#plot total number of shipped carnivores per year per purpose

f, ax = plt.subplots(figsize=(20, 12))

sns.lineplot(x = 'Year', y = 'Quantity', hue = 'Purpose', data = carnivores_year_purpose, linewidth=2.5)

ax.tick_params(axis='both', which='major', labelsize=26) 
ax.tick_params(axis='both', which='minor', labelsize=26)
ax.set_xticks(range(2000, 2020, 2))

ax.legend(loc="upper left", frameon=True, fontsize = 20)

plt.xlabel('Year', fontsize=32)
plt.ylabel('Number of traded animals', fontsize=32)
plt.suptitle('Total traded live wild carnivores per purpose', fontsize=36)

sns.despine()


The recent increase in carnivore trade, seen above for the Canidae in particular, coincides with a spike in commercial trade. 

## 5. Conclusions and next steps

The main conclusions of this project, in answer to the research questions, are:
* Primates are the most traded group, followed by the carnivores
* The most common trade purpose is commercial, including animal testing, the pet trade, and fur
* Overall trade was lower in the 2010s than in the 2000s, but trade in some groups has been going up in recent years

As a next step I'd like to visualize trade between countries with maps, and identify the biggest importers and exporters.
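
A first step towards that could be a simple ranking of countries by traded quantity. The sketch below assumes the dataset has `Importer` and `Exporter` columns holding country codes (an assumption about the CITES export, not checked here) and uses invented placeholder codes and numbers:

```python
import pandas as pd

# Invented rows; 'A', 'B', ... are placeholder country codes
toy = pd.DataFrame({
    "Importer": ["A", "A", "B", "C"],
    "Exporter": ["X", "Y", "X", "X"],
    "Quantity": [100, 50, 30, 20],
})

# Total quantity per country, largest first
top_importers = toy.groupby("Importer")["Quantity"].sum().sort_values(ascending=False)
top_exporters = toy.groupby("Exporter")["Quantity"].sum().sort_values(ascending=False)

print(top_importers.head())
print(top_exporters.head())
```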

## Feedback Paolo

Maaike, great notebook with clear intro, comments and conclusions. The notebook runs, the data are easily retrievable. I like the visualizations, they are minimalistic but very powerful. I can follow the story by just looking at the plots.
The only suggestion I have is that sometimes I see a problem of different scales: when you plot multiple lines on the same plot, the difference in values makes some trends difficult to see because they are flattened. You could try to experiment with two different scales for y, one on the left side and the other one on the right side of the plot.
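
A minimal sketch of that suggestion, using matplotlib's `twinx()` to add a second y-axis on the right. The two series and their values are invented; which purposes to put on which axis is left open:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Invented yearly totals for one large and one small series
toy = pd.DataFrame({
    "Year": [2000, 2001, 2002, 2003],
    "Commercial": [9000, 8000, 6000, 5000],
    "Zoo": [40, 55, 35, 60],
})

fig, ax_left = plt.subplots(figsize=(10, 6))
ax_left.plot(toy["Year"], toy["Commercial"], color="green")
ax_left.set_xlabel("Year")
ax_left.set_ylabel("Commercial", color="green")

# Second y-axis that shares the x-axis but has its own scale
ax_right = ax_left.twinx()
ax_right.plot(toy["Year"], toy["Zoo"], color="darkcyan")
ax_right.set_ylabel("Zoo", color="darkcyan")
```

Colouring each axis label like its line helps the reader see which scale belongs to which series. With more than two series, a logarithmic y-axis might be an alternative worth trying.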