# Red Panda Zoos

## 1. Intro

According to several scientific reports, there are about 10,000 red pandas in the wild and approximately 2,000 in zoos around the world. The site redpandafinder.com is a great source of structured information about red pandas living in zoos. It provides access to a JSON file with information on 496 red pandas and the 109 zoos where they live.

These cute furry animals spend their days in the tops of forest trees, eating bamboo and chilling in the shade. They are absolutely harmless and adorable. They also have fur on the soles of their paws, which makes them really special.
However, over the last century the red panda population has decreased significantly, and the species is now classified as endangered on the IUCN Red List. There is a special survival program for these animals, with many countries and many zoos taking part.

## 2. Task

Red pandas are common inhabitants of zoos in all parts of the world. Thus we should expect the zoos from the JSON file to be spread evenly, at least across the Northern Hemisphere. We will plot the zoos on a world map and find out whether this is true.

## 3. Methodology

Before we get the data and start exploring it, let's import all the dependencies that we need.

In [57]:
import pandas as pd
import json
import time
import folium
import requests
import re

### 3.1. Analyzing redpanda.json

In [58]:
url = 'http://redpandafinder.com/export/redpanda.json'
result = requests.get(url)
result.raise_for_status()

In [59]:
panda_json = result.json()
# Check if the json is OK
print([key for key in panda_json]) # should be ['_photo', '_totals', 'edges', 'vertices']

['_photo', '_totals', 'edges', 'vertices']


The information we are interested in is stored under the 'vertices' key. Each vertex is either a red panda or a zoo. Each red panda vertex has a positive id number and each zoo has a negative one. For example:

In [60]:
# A zoo vertex
panda_json['vertices'][0]

{'_id': '-98',
 'en.address': 'República de la India, C1425 CABA, Argentina',
 'en.location': 'Buenos Aires, Argentina',
 'en.name': 'Buenos Aires Eco-Park',
 'es.address': 'República de la India, C1425 CABA, Argentina',
 'es.location': 'Buenos Aires, Argentina',
 'es.name': 'Ecoparque Interactivo de Buenos Aires',
 'flag': 'Argentina',
 'language.order': 'es, en, jp',
 'map': 'https://goo.gl/maps/LtEhMY2XMbJ2',
 'website': 'http://www.buenosaires.gob.ar/ecoparque'}

In [61]:
# A red panda vertex
panda_json['vertices'][550]

{'_id': '486',
 'birthday': '2017/7/23',
 'en.name': 'Paprika',
 'gender': 'Female',
 'jp.name': 'パプリカ',
 'language.order': 'en, jp',
 'species': '2'}

Let's count the red pandas and zoos to verify the numbers quoted by redpandafinder.com: 109 zoos and 496 red pandas.

In [62]:
panda_count, zoo_count = 0,0
for vertex in panda_json['vertices']:
    if float(vertex['_id']) < 0:
        zoo_count += 1
    else:
        panda_count += 1
print('There are {0} zoos and {1} red pandas in there.'.format(zoo_count, panda_count))

There are 109 zoos and 496 red pandas in there.


OK, the numbers match.

### 3.2. Zoo Dataframe

Now we are ready to create a DataFrame of zoos for analysing and mapping.

In [63]:
zoo_df = pd.DataFrame(panda_json['vertices'][:zoo_count])
zoo_df.head(2)

| | _id | cn.address | cn.location | cn.name | en.address | en.location | en.name | en.othernames | es.address | es.location | ... | nl.address | nl.location | nl.name | photo | photo.author | photo.link | th.address | th.location | th.name | website |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -98 | | | | República de la India, C1425 CABA, Argentina | Buenos Aires, Argentina | Buenos Aires Eco-Park | | República de la India, C1425 CABA, Argentina | Buenos Aires, Argentina | ... | | | | | | | | | | http://www.buenosaires.gob.ar/ecoparque |
| 1 | -51 | | | | 210 St. George's Drive NE, Calgary, AB T2E 7V6... | Calgary, Alberta, Canada | Calgary Zoo | | | | ... | | | | https://www.instagram.com/p/7q3sBBGtnG/media/?... | thecalgaryzoo | https://www.instagram.com/thecalgaryzoo/ | | | | https://www.calgaryzoo.com/ |


In [64]:
# Check the size
zoo_df.shape

(109, 34)
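
The slice above relies on all zoo vertices being listed before any red panda vertex. A minimal sketch of an order-independent alternative that selects zoos by the sign of `_id`; the helper name `split_vertices` and the `sample_vertices` list are made up for illustration and are not part of the original notebook:

```python
def split_vertices(vertices):
    """Split vertices into (zoos, red_pandas) by the sign of '_id'."""
    zoos = [v for v in vertices if float(v['_id']) < 0]
    red_pandas = [v for v in vertices if float(v['_id']) >= 0]
    return zoos, red_pandas

# Hypothetical sample, deliberately not sorted zoo-first
sample_vertices = [
    {'_id': '486', 'en.name': 'Paprika'},
    {'_id': '-98', 'en.name': 'Buenos Aires Eco-Park'},
    {'_id': '-51', 'en.name': 'Calgary Zoo'},
]

zoos, red_pandas = split_vertices(sample_vertices)
print([z['en.name'] for z in zoos])
print([p['en.name'] for p in red_pandas])
```

With the real data, the `zoos` list could be passed to `pd.DataFrame` in place of the slice.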

In [65]:
# Check the columns... Er, do we need all of these?
zoo_df.columns

Index(['_id', 'cn.address', 'cn.location', 'cn.name', 'en.address',
       'en.location', 'en.name', 'en.othernames', 'es.address', 'es.location',
       'es.name', 'flag', 'fr.address', 'fr.location', 'fr.name', 'jp.address',
       'jp.location', 'jp.name', 'jp.othernames', 'kr.address', 'kr.location',
       'kr.name', 'language.order', 'map', 'nl.address', 'nl.location',
       'nl.name', 'photo', 'photo.author', 'photo.link', 'th.address',
       'th.location', 'th.name', 'website'],
      dtype='object')

For the task at hand we need only a few columns. Let's drop the useless ones.

In [66]:
# Nope, we only need a few
zoo_short_df = zoo_df[['_id','en.name','en.address','en.location','map','website']].copy()
zoo_short_df.head(2)

| | _id | en.name | en.address | en.location | map | website |
|---|---|---|---|---|---|---|
| 0 | -98 | Buenos Aires Eco-Park | República de la India, C1425 CABA, Argentina | Buenos Aires, Argentina | https://goo.gl/maps/LtEhMY2XMbJ2 | http://www.buenosaires.gob.ar/ecoparque |
| 1 | -51 | Calgary Zoo | 210 St. George's Drive NE, Calgary, AB T2E 7V6... | Calgary, Alberta, Canada | https://goo.gl/maps/gXaumYRHucJ2 | https://www.calgaryzoo.com/ |


In [67]:
zoo_short_df.shape

(109, 6)

For later use, we append columns for the geographic coordinates and fill them with zeros.

In [68]:
zoo_short_df['latitude'] = 0.
zoo_short_df['longitude'] = 0.
zoo_short_df.head(2)

| | _id | en.name | en.address | en.location | map | website | latitude | longitude |
|---|---|---|---|---|---|---|---|---|
| 0 | -98 | Buenos Aires Eco-Park | República de la India, C1425 CABA, Argentina | Buenos Aires, Argentina | https://goo.gl/maps/LtEhMY2XMbJ2 | http://www.buenosaires.gob.ar/ecoparque | 0.0 | 0.0 |
| 1 | -51 | Calgary Zoo | 210 St. George's Drive NE, Calgary, AB T2E 7V6... | Calgary, Alberta, Canada | https://goo.gl/maps/gXaumYRHucJ2 | https://www.calgaryzoo.com/ | 0.0 | 0.0 |


Now everything is ready to collect the geographic coordinates of each zoo in the dataframe and map the results.

### 3.3. Collecting Coordinates

Free geospatial services that return geographic coordinates for a given address are often unreliable. What if the task can be solved in an easier, though less elegant, way? The 'map' field in the zoo dataframe contains a short link that redirects to Google Maps,
like _https://goo.gl/maps/LtEhMY2XMbJ2_. If you click it, after a while you will find yourself on a page with a different address: _https://www.google.com/maps/place/Proyecto+EcoParque+Interactivo/@-34.5781333,-58.4853221,12z/data=!4m5!3m4!1s0x95bcb57835e44c59:0x9e74dd4b6cbb6d4!8m2!3d-34.5781533!4d-58.415282_.  
Do you see? **.../@-34.5781333,-58.4853221,...** Bingo! The new link already contains the coordinates. So the solution could be as follows:  
1. Get a response object by applying _requests.get()_ method to a short link in the 'map' field.  
2. Apply a regular expression to the _url_ attribute of the response object.  
3. Convert coordinates to float numbers and fill the corresponding fields in the zoo dataframe.
4. Repeat for each of 109 zoos.
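
The regular-expression step can be tried offline on the expanded URL quoted above. This is only a minimal sketch: the helper name `extract_coordinates` and the exact pattern are illustrative choices, and it assumes the coordinates always follow an '@' as a `lat,lng` pair.

```python
import re

# '@' followed by two signed decimal numbers separated by a comma
COORDINATE_PATTERN = re.compile(r'@(-?\d+(?:\.\d+)?),(-?\d+(?:\.\d+)?)')

def extract_coordinates(url):
    """Return (latitude, longitude) as floats, or None if there is no '@lat,lng' part."""
    match = COORDINATE_PATTERN.search(url)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))

long_url = ('https://www.google.com/maps/place/Proyecto+EcoParque+Interactivo/'
            '@-34.5781333,-58.4853221,12z/data=!4m5!3m4!1s0x95bcb57835e44c59:'
            '0x9e74dd4b6cbb6d4!8m2!3d-34.5781533!4d-58.415282')

print(extract_coordinates(long_url))                            # (-34.5781333, -58.4853221)
print(extract_coordinates('https://goo.gl/maps/LtEhMY2XMbJ2'))  # None
```

Matching the two numbers explicitly, rather than any run of non-space characters, keeps the zoom part (`12z`) out of the captured text.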

Sounds easy. But is it? While doing that, I found that even Google's patience has its limits. After a while it detected 'suspicious' activity from my computer and stopped supplying me with proper responses, returning a 503 error instead. Funny enough, the error response contained the URL I needed, but in a slightly different format, something like: _https://www.google.com/.../%40-34.5781333,-58.4853221,12z..._ The only difference is that '@' is replaced by '%40'. We only need another regular expression to handle it. Let's do it, keeping an eye on the zoos that are left without coordinates.
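
Since '%40' is simply the percent-encoded form of '@', one alternative to a second regular expression is to decode the URL first. A hedged sketch using the standard library's `urllib.parse.unquote`; the function name `coordinates_from_any_url` is hypothetical, and `error_url` is a shortened, made-up stand-in for the URL seen in the error response, not a real one:

```python
import re
from urllib.parse import unquote

COORDINATE_PATTERN = re.compile(r'@(-?\d+(?:\.\d+)?),(-?\d+(?:\.\d+)?)')

def coordinates_from_any_url(url):
    """Decode percent-escapes, then look for '@lat,lng'. Returns floats or None."""
    match = COORDINATE_PATTERN.search(unquote(url))
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))

# Made-up error-style URL in which '@' arrives as '%40'
error_url = 'https://www.google.com/maps/place/Example/%40-34.5781333,-58.4853221,12z'
plain_url = 'https://www.google.com/maps/place/Example/@-34.5781333,-58.4853221,12z'

print(coordinates_from_any_url(error_url))
print(coordinates_from_any_url(plain_url))
```

Both calls go through the same pattern, so only one expression has to be maintained.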

In [69]:
coordinate_regex = re.compile(r'@(-?\d+(?:\.\d+)?),(-?\d+(?:\.\d+)?)')   # normal redirect URL
error_regex = re.compile(r'%40(-?\d+(?:\.\d+)?),(-?\d+(?:\.\d+)?)')      # URL of a 503 error response
missing_zoos = []
print('No coordinates for:')
for i in range(zoo_count):
    mo = None  # reset so a failed request cannot reuse the previous match
    try:
        result = requests.get(zoo_short_df.iloc[i,4])  # column 4 is 'map'
        # A blocked request still carries the target URL, with '@' encoded as '%40'
        regex = coordinate_regex if result.ok else error_regex
        mo = regex.search(result.url)
    except requests.exceptions.RequestException:
        pass  # no response at all: the zoo is reported as missing below
    if mo:
        coordinates = [float(coor) for coor in mo.groups()]
        zoo_short_df.iloc[i,6], zoo_short_df.iloc[i,7] = coordinates
    else:
        missing_zoos.append(zoo_short_df.iloc[i,0])
        print(zoo_short_df.iloc[i,1])
    time.sleep(3)  # I am not a barbarian after all: pause between requests

No coordinates for:
Chonquing Zoo
Ichikawa Zoological & Botanical Gardens
Kyoto City Zoo
Nagasaki Bio Park
Yumemigasaki Zoological Park
Tokuyama Zoo
Sapporo Maruyama Zoo
Chiba Zoological Park


We collected 101 coordinate pairs; the requests failed for only 8 zoos. What should we do next?

In [70]:
# Check the zoos without coordinates
zoo_short_df[zoo_short_df['latitude'] == 0]

| | _id | en.name | en.address | en.location | map | website | latitude | longitude |
|---|---|---|---|---|---|---|---|---|
| 10 | -5 | Chonquing Zoo | Chongqing Zoo, Jiulongpo Qu, China, 400080 | Jiulongpo Qu | https://goo.gl/maps/rsck5UV8B552 | http://www.cqzoo.com | 0.0 | 0.0 |
| 12 | -1 | Ichikawa Zoological & Botanical Gardens | Ichikawa City Zoo, Japan, 〒272-0801 Chiba Pref... | Ichikawa, Chiba Prefecture | https://goo.gl/maps/WSyNu7HFhH42 | http://www.city.ichikawa.lg.jp/zoo/index.html | 0.0 | 0.0 |
| 13 | -2 | Kyoto City Zoo | Okazaki Hoshojicho, Sakyo Ward, Kyoto, Kyoto P... | Kyoto City, Kyoto Prefecture, Sakyo Ward | https://goo.gl/maps/BzPvCJ64eiB2 | http://www5.city.kyoto.jp/zoo/ | 0.0 | 0.0 |
| 14 | -3 | Nagasaki Bio Park | 2291-1 Seihicho Nakayamago, Saikai, Nagasaki P... | Nagasaki Prefecture | https://goo.gl/maps/L1wXjRmYYeo | http://www.biopark.co.jp | 0.0 | 0.0 |
| 15 | -4 | Yumemigasaki Zoological Park | 1 Chome-2-1 Minamikase, Saiwai-ku, Kawasaki-sh... | Kawasaki, Kanagawa Prefecture | https://goo.gl/maps/kfMhVjS52Gs | http://www.city.kawasaki.jp/saiwai/page/000004... | 0.0 | 0.0 |
| 16 | -6 | Tokuyama Zoo | 5846 Tokuyama, Shunan, Yamaguchi Prefecture 74... | Shūnan, Yamaguchi Prefecture | https://goo.gl/maps/bMAGo54PQxq | http://www.city.shunan.lg.jp/site/zoo/ | 0.0 | 0.0 |
| 17 | -7 | Sapporo Maruyama Zoo | Japan, 〒064-0959 Hokkaido, Sapporo, Chuo Ward,... | Sapporo, Hokkaido | https://goo.gl/maps/EY3uZRXNWNr | http://www.city.sapporo.jp/zoo/index.html | 0.0 | 0.0 |
| 18 | -8 | Chiba Zoological Park | 280番地 Minamotocho, Wakaba Ward, Chiba, Chiba P... | Chiba | https://goo.gl/maps/14c7BwdSTkr | https://www.city.chiba.jp/zoo/ | 0.0 | 0.0 |


The easiest way is to fill the gaps by hand.

In [71]:
# Manually add coordinates for 8 missing zoos
zoo_short_df.iloc[10,6], zoo_short_df.iloc[10,7] = 29.503557,106.5037493
zoo_short_df.iloc[12,6], zoo_short_df.iloc[12,7] = 35.7642063,139.9638014
zoo_short_df.iloc[13,6], zoo_short_df.iloc[13,7] = 35.0126853,135.7839226
zoo_short_df.iloc[14,6], zoo_short_df.iloc[14,7] = 32.988408,129.7810003
zoo_short_df.iloc[15,6], zoo_short_df.iloc[15,7] = 35.5495511,139.6631391
zoo_short_df.iloc[16,6], zoo_short_df.iloc[16,7] = 34.0623441,131.8139387
zoo_short_df.iloc[17,6], zoo_short_df.iloc[17,7] = 43.0514641,141.3056561
zoo_short_df.iloc[18,6], zoo_short_df.iloc[18,7] = 35.6451063,140.1243959
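
Positional indices such as `iloc[10,6]` break silently if the row or column order ever changes. A sketch of the same manual fill keyed by `_id` instead; the two-row frame `toy_df` is a stand-in for `zoo_short_df`, and `manual_coordinates` is a name chosen for this example. The coordinate values are the ones entered above for the first two zoos.

```python
import pandas as pd

# Toy stand-in for zoo_short_df, with the two zoos still at (0, 0)
toy_df = pd.DataFrame({
    '_id': ['-5', '-1'],
    'en.name': ['Chonquing Zoo', 'Ichikawa Zoological & Botanical Gardens'],
    'latitude': [0.0, 0.0],
    'longitude': [0.0, 0.0],
})

# Zoo id -> (latitude, longitude), looked up by hand
manual_coordinates = {
    '-5': (29.503557, 106.5037493),
    '-1': (35.7642063, 139.9638014),
}

for zoo_id, (lat, lng) in manual_coordinates.items():
    # Select the row by its id and the columns by name
    toy_df.loc[toy_df['_id'] == zoo_id, ['latitude', 'longitude']] = [lat, lng]

print(toy_df[['_id', 'latitude', 'longitude']])
```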

In [72]:
# Check the zoos without coordinates
zoo_short_df[zoo_short_df['latitude'] == 0]

| | _id | en.name | en.address | en.location | map | website | latitude | longitude |
|---|---|---|---|---|---|---|---|---|


Just in case, we save the dataframe with coordinates to the CSV file 'red_panda_zoos.csv', which you can find in the same repository.

In [73]:
zoo_short_df.to_csv('red_panda_zoos.csv', index=False)

Now everything is ready for mapping the results.

### 3.4. Mapping

Plot the map:

In [74]:
panda_zoos_map = folium.Map(location=[35,0], zoom_start=2.4)

for lat, lng, name in zip(zoo_short_df['latitude'], zoo_short_df['longitude'], zoo_short_df['en.name']):
    label = '{0}'.format(name)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        ).add_to(panda_zoos_map)
    
panda_zoos_map

**See the 'zoo_map.jpg' in the repository.**

## 4. Results

What do we see on the map? Our initial assumption fails: the zoos in the database are spread unevenly. There are two major clusters, in the US and in Japan. The coverage of the database should be increased, since many red pandas also live in zoos in Europe and Australia. It is really easy to retrieve the relevant information and append it to the database.
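
The visual impression can be backed by a rough count. A minimal sketch that bins zoos into three longitude bands and two hemispheres; the band boundaries (-30° and 60°) and the band names are arbitrary assumptions made for illustration, and `sample` holds only four zoos whose coordinates appear earlier in this notebook:

```python
from collections import Counter

def longitude_band(lng):
    """Assign a longitude to a coarse band (boundaries are arbitrary)."""
    if lng < -30:
        return 'Americas'
    if lng < 60:
        return 'Europe/Africa'
    return 'Asia/Oceania'

# (name, latitude, longitude) for a few zoos from the sections above
sample = [
    ('Buenos Aires Eco-Park', -34.5781333, -58.4853221),
    ('Chonquing Zoo', 29.503557, 106.5037493),
    ('Kyoto City Zoo', 35.0126853, 135.7839226),
    ('Sapporo Maruyama Zoo', 43.0514641, 141.3056561),
]

band_counts = Counter(longitude_band(lng) for _, _, lng in sample)
hemisphere_counts = Counter('North' if lat >= 0 else 'South' for _, lat, _ in sample)

print(band_counts)
print(hemisphere_counts)
```

On the full data, iterating over `zip(zoo_short_df['en.name'], zoo_short_df['latitude'], zoo_short_df['longitude'])` instead of `sample` would give the real counts per band.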