# Animal Tracker App

## Intro 

The mobile app [Animal tracker](https://www.icarus.mpg.de/29143/animal-tracker-app) allows the general public to monitor and track tagged wild animals. The app communicates with an API to receive data. The base URL for the API looks like this:

````
https://animaltracker.app/api/v1/
````


### Info on unique animal id

With an API client such as Insomnia, we can explore the API responses to better understand the data structure. Each animal has a unique ID, for example ```53e64761-422f-4f47-978c-a3492f985825``` for a white stork:

````
https://animaltracker.app/api/v1/animals/53e64761-422f-4f47-978c-a3492f985825
````

### Info on species

The response reveals some information, for example the id, public_animal_count and species id.

More information on the species is available from the species endpoint, where ```b3eeb878-4aa2-4a2e-aaa7-c4929a6d9266``` is the species ID:
````
https://animaltracker.app/api/v1/species/b3eeb878-4aa2-4a2e-aaa7-c4929a6d9266
````
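
The request headers in fetchURL (further down) use the JSON:API media type `application/vnd.api+json`, so the species ID of an animal is presumably nested under `relationships`. The sketch below assumes the animal response has the shape `{"data": {"relationships": {"specie": {"data": {"id": ...}}}}}`, mirroring the `relationships.specie.data.id` column that appears later after normalizing the full animal list. `species_id_of` is a hypothetical helper, and the sample dict is a trimmed-down stand-in, not a real API response.

```python
def species_id_of(animal_response):
    """Return the species ID from an animal response (assumed JSON:API layout)."""
    # Assumption: single-resource JSON:API document with a 'specie' relationship
    return animal_response["data"]["relationships"]["specie"]["data"]["id"]


# Hypothetical, trimmed-down response used only to illustrate the nesting
sample_animal = {
    "data": {
        "id": "53e64761-422f-4f47-978c-a3492f985825",
        "relationships": {
            "specie": {"data": {"id": "b3eeb878-4aa2-4a2e-aaa7-c4929a6d9266"}}
        },
    }
}

# Build the species endpoint URL from the extracted ID
print("https://animaltracker.app/api/v1/species/" + species_id_of(sample_animal))
```

If the real response is nested differently, only the key path inside `species_id_of` needs to change.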

### Get tracks for an id

````
https://animaltracker.app/api/v2/animals/53e64761-422f-4f47-978c-a3492f985825/tracks
````
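
The three endpoints above differ only in API version and path; note that the tracks endpoint lives under `v2`, while the animal and species endpoints use `v1`. A small set of helpers, sketched below, keeps the URL construction in one place. The function names are hypothetical and not part of the API.

```python
BASE_URL = "https://animaltracker.app/api/"


def animal_url(animal_id):
    """v1 endpoint with the details of a single animal."""
    return BASE_URL + "v1/animals/" + animal_id


def species_url(species_id):
    """v1 endpoint with the details of a species."""
    return BASE_URL + "v1/species/" + species_id


def tracks_url(animal_id):
    """v2 endpoint with the tracks of a single animal."""
    return BASE_URL + "v2/animals/" + animal_id + "/tracks"


stork = "53e64761-422f-4f47-978c-a3492f985825"  # white stork, from the examples above
print(tracks_url(stork))
```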

## Scraping stuff

First, we import a few Python libraries. The `set_option` calls make sure that we always see the entire content when larger datasets are displayed.

In [None]:
import requests
import json
import pandas as pd
from datetime import datetime
pd.set_option("display.max_colwidth", None)
pd.set_option('display.max_rows', None, 'display.max_columns', None)

The animaltracker API is only meant to be accessed by the Android and iOS apps. To use it from our script, we have to "impersonate" a mobile phone by sending headers that mimic the Android app (user agent, `x-at-application`, `x-requested-with`, etc.). The function fetchURL below takes care of this. Run the following code block once; from then on, you can call fetchURL() like any other function. It takes two required parameters, the URL and the name of the file where the output should be stored, plus an optional `html` flag for responses that are not JSON.

In [None]:
def fetchURL(url, outputfile, html=False):
    """Fetches url and writes the result into outputfile.

    Keyword arguments:
    :param str url: api url
    :param str outputfile: folder/file.json
    :param bool html: set to True if the expected response is HTML

    Return:
    :return: parsed JSON (dict), or the requests.Response object if html=True
    """
    # Headers that mimic the Android app; the API presumably rejects requests without them
    headers = {
        "User-Agent": "Mozilla/5.0 (Linux; Android 7.1.1; Android SDK built for x86_64 Build/NYC; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/55.0.2883.91 Mobile Safari/537.36",
        "x-at-application": "Android;25;11081",
        "content-type": "application/vnd.api+json; charset=utf-8",
        "host": "animaltracker.app",
        "x-requested-with": "com.mpio.movebank",
        "accept": "application/vnd.api+json; charset=utf-8",
    }
    response = requests.get(url, headers=headers)
    response.raise_for_status()  # stop early on HTTP errors (e.g. 403 or 404)
    if not html:
        # JSON response: pretty-print it, save it and return the parsed data
        data = response.json()
        print(json.dumps(data, indent=2))
        with open(outputfile, "w") as f:
            json.dump(data, f)
        return data
    # HTML response: save the raw text and return the response object
    with open(outputfile, "w") as f:
        f.write(response.text)
    return response

## Get infos for specific animal id

The following block prepares the URL of the API endpoint we tested above:

In [None]:
animal_id = '53e64761-422f-4f47-978c-a3492f985825' # white-stork
url = "https://animaltracker.app/api/v1/animals/" + animal_id
print(url)

We want to store the scraped information in a new file named ```animal-ANIMAL_ID.json``` inside the ```sample_data``` folder. The folder has to exist before fetchURL can write to it.

In [None]:
filename = "sample_data/animal-" + animal_id + ".json"
print(filename)
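
fetchURL opens the output file for writing, which fails with `FileNotFoundError` if the `sample_data` folder does not exist. Google Colab ships with that folder; a local Jupyter setup usually does not. A minimal sketch of a hypothetical helper that creates the parent folder first:

```python
import os


def ensure_parent_dir(path):
    """Create the folder part of `path` (if any) so the file can be written."""
    folder = os.path.dirname(path)
    if folder:
        # exist_ok=True: do nothing if the folder is already there
        os.makedirs(folder, exist_ok=True)
    return path


filename = ensure_parent_dir("sample_data/animal-53e64761-422f-4f47-978c-a3492f985825.json")
print(os.path.isdir("sample_data"))  # True
```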

Finally, we call the fetchURL function with two parameters:

- the URL as specified before, e.g. https://animaltracker.app/api/v1/animals/53e64761-422f-4f47-978c-a3492f985825
- the path of the file where the response is stored

The response from the API is stored in the variable info, which we can then print:

In [None]:
info = fetchURL(url, filename)
print(info)

Sometimes JSON responses come as one long line. Calling **print()** with **json.dumps()** and an `indent` value pretty-prints the result:

In [None]:
print(json.dumps(info,indent=2))

## Get animal tracks for specific animal id

In [None]:
print("downloading animal track")
animal_id = '53e64761-422f-4f47-978c-a3492f985825' # white stork
url = "https://animaltracker.app/api/v2/animals/" + animal_id + "/tracks"
filename = "sample_data/track-" + animal_id + ".json"
track = fetchURL(url, filename)
#print(track)

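Reading the raw response is tedious. Transforming it into a pandas DataFrame makes it much more readable. Judging by the variable names, the first entry of `track["data"]` holds the locations of the last two weeks and the second one those of the last year: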

In [None]:
df2weeks = pd.DataFrame(track["data"][0]["attributes"]["locations"])
df1year = pd.DataFrame(track["data"][1]["attributes"]["locations"])
df2weeks
#print(df1year)
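
Indexing `track["data"][0]` and `track["data"][1]` assumes the response always contains exactly these two entries in that order. A slightly more defensive sketch converts every entry it finds. `tracks_to_frames` is a hypothetical helper; the `sample_track` dict only mimics the `data → attributes → locations` nesting used above, and its location keys (`timestamp`, `lat`, `lng`) and values are placeholders, not the real field names.

```python
import pandas as pd


def tracks_to_frames(track):
    """Return one DataFrame of locations per entry in track["data"]."""
    # Assumes every entry has the layout entry["attributes"]["locations"] (a list of dicts)
    return [pd.DataFrame(entry["attributes"]["locations"]) for entry in track["data"]]


# Hypothetical stand-in for the API response; the location keys are placeholders
sample_track = {
    "data": [
        {"attributes": {"locations": [
            {"timestamp": "2021-05-01", "lat": 47.7, "lng": 9.0},
        ]}},
        {"attributes": {"locations": [
            {"timestamp": "2020-06-01", "lat": 47.7, "lng": 9.0},
            {"timestamp": "2021-05-01", "lat": 47.7, "lng": 9.0},
        ]}},
    ]
}

frames = tracks_to_frames(sample_track)
print(len(frames), [len(f) for f in frames])  # 2 [1, 2]
```

With the real response, `frames = tracks_to_frames(track)` yields a list that works no matter how many entries the API returns.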

## Downloading all current animals from app

The resulting JSON file is fairly large (~3 MB). Don't run the code block below too often, as it uses a lot of bandwidth and puts load on the API. The block downloads all animals in Animal Tracker, with information on their ID, last location update, species ID, etc.

The file will be stored in ```sample_data/YYYYMMDD-api-all-animals.json```, where YYYY is the current year, MM the month and DD the day.

In [None]:
print("downloading all animals")
url = "https://animaltracker.app/api/v1/animals"
timestr = datetime.today().strftime("%Y%m%d")
filename = "sample_data/" + timestr + "-api-all-animals.json"

In [None]:
# uncomment this line to download everything
#all_animals = fetchURL(url, filename)
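
Because the full download is heavy, one option is to fetch only when today's file is not already on disk. The sketch below is a hypothetical wrapper; `fetch` stands for any callable that downloads and saves the data.

```python
import json
import os


def load_or_fetch(filename, fetch):
    """Load JSON from `filename` if it exists, otherwise call fetch() once."""
    if os.path.exists(filename):
        with open(filename) as f:
            return json.load(f)
    # fetch() is expected to write `filename` itself and return the parsed data
    return fetch()
```

In this notebook it could be used as `all_animals = load_or_fetch(filename, lambda: fetchURL(url, filename))`, since fetchURL both saves the file and returns the parsed JSON.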

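Importing the JSON file needs a bit of data wrangling: we open the file, read the JSON data and then flatten it with the pandas function `json_normalize()`. After running the code below, the variable df contains the entire dataset. Since `filename` contains today's date, this only works if the download above was run today; otherwise, point `filename` to an earlier download.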

In [None]:
print("loading file: " + filename)
with open(filename) as json_data:
    data = json.load(json_data)
#print(data)
df = pd.json_normalize(data["data"])
df
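
`pd.json_normalize` flattens nested dictionaries into columns whose names join the keys with dots, which is where column names such as `relationships.specie.data.id` come from. A tiny illustration with a hypothetical two-animal list (shortened placeholder IDs, not real data):

```python
import pandas as pd

# Hypothetical, heavily trimmed records in the nested JSON:API style
records = [
    {"id": "a1", "attributes": {"last_location": {"timestamp": "2021-05-01T10:00:00Z"}},
     "relationships": {"specie": {"data": {"id": "s1"}}}},
    {"id": "a2", "attributes": {"last_location": {"timestamp": "2021-05-02T08:30:00Z"}},
     "relationships": {"specie": {"data": {"id": "s2"}}}},
]

flat = pd.json_normalize(records)
print(sorted(flat.columns))
# ['attributes.last_location.timestamp', 'id', 'relationships.specie.data.id']
```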

At the time of writing, the dataset contained 5070 different animals.

To look at specific rows, you can display only a slice of the DataFrame. The following shows the first ten entries:

In [None]:
df[0:10] # show first ten entries
#df[100:120] # show entries in rows 100 to 119 (the end of a slice is excluded)
#df[-5:] # show last 5 entries of dataset
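
`df[0:10]` slices rows by position. pandas also offers more explicit equivalents: `head()`, `tail()` and `iloc[]`. Like Python slices, `iloc[100:120]` includes row 100 but excludes row 120. Shown here on a small stand-in DataFrame so the sketch runs on its own:

```python
import pandas as pd

demo = pd.DataFrame({"n": range(200)})  # stand-in for df

first_ten = demo.head(10)       # same rows as demo[0:10]
middle = demo.iloc[100:120]     # rows 100..119 by position
last_five = demo.tail(5)        # same rows as demo[-5:]

print(len(first_ten), len(middle), len(last_five))  # 10 20 5
```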



## Query the DataFrame

## Find out how many different species are in the dataset

Query the DataFrame by looking for unique species IDs. The column "relationships.specie.data.id" holds the species ID of each animal. With pandas it's quick to filter and output the unique species IDs by using the [drop_duplicates()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.drop_duplicates.html) function.

In [None]:
df["relationships.specie.data.id"].drop_duplicates() # removes all duplicates and leaves only unique species-id in dataset
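
`drop_duplicates()` lists the distinct species IDs. To answer "how many" directly, `nunique()` returns the count, and `value_counts()` shows how many animals belong to each species. A sketch on a stand-in column (the IDs are placeholders):

```python
import pandas as pd

demo = pd.DataFrame({"relationships.specie.data.id": ["s1", "s2", "s1", "s3", "s1"]})
col = demo["relationships.specie.data.id"]

print(col.nunique())       # 3 distinct species
print(col.value_counts())  # animals per species, most common first
```

On the real data, the same calls are `df["relationships.specie.data.id"].nunique()` and `.value_counts()`.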


## Find all animals of a specific species

To find all animals of one species, filter by the species ID. Here we look for other white storks in the dataset, whose species ID is ```b3eeb878-4aa2-4a2e-aaa7-c4929a6d9266```:

In [None]:
species_id = "b3eeb878-4aa2-4a2e-aaa7-c4929a6d9266" # white stork
#species_id = "82c04a5e-162d-4054-9ea1-dac19545d5a0" # eagle
# with df.loc and the function isin() we can filter the dataframe
df.loc[df["relationships.specie.data.id"].isin([species_id])]
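
`isin()` takes a list, so several species can be selected at once, for example white storks and eagles using the two IDs from the cell above. For a single value, a plain comparison with `==` gives the same result. Sketched on a stand-in DataFrame:

```python
import pandas as pd

stork = "b3eeb878-4aa2-4a2e-aaa7-c4929a6d9266"  # white stork
eagle = "82c04a5e-162d-4054-9ea1-dac19545d5a0"  # eagle
demo = pd.DataFrame({
    "id": ["a1", "a2", "a3"],
    "relationships.specie.data.id": [stork, eagle, "other"],  # stand-in data
})

col = demo["relationships.specie.data.id"]
both = demo.loc[col.isin([stork, eagle])]   # rows a1 and a2
storks_only = demo.loc[col == stork]        # same as col.isin([stork])
print(list(both["id"]), list(storks_only["id"]))  # ['a1', 'a2'] ['a1']
```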

Remember that the DataFrame __df__ is not modified by `.loc[]`: filtering returns a new DataFrame and leaves the original untouched.

In [None]:
df # df remains untouched, still 5070 entries

The result of filtering with `.loc[]` can be saved in a new variable. In the following example we call it selection, a DataFrame that contains only one animal species:

In [None]:
selection = df.loc[df["relationships.specie.data.id"].isin([species_id])]
selection
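
If you plan to add or change columns in `selection`, it is safer to take an explicit copy. Otherwise pandas may not be able to tell whether `selection` is a view of `df` and can emit a `SettingWithCopyWarning`. A minimal sketch on stand-in data; the `note` column is hypothetical:

```python
import pandas as pd

demo = pd.DataFrame({"species": ["s1", "s2", "s1"]})  # stand-in for df

subset = demo.loc[demo["species"].isin(["s1"])].copy()  # independent DataFrame
subset["note"] = "checked"  # hypothetical extra column, only added to the copy

print(len(demo.columns), len(subset.columns))  # 1 2
```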

## Sort by latest activity

Sorting can be done with the function [sort_values()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sort_values.html). In this case we want to sort by the column ```attributes.last_location.timestamp```, which holds the time of the most recent location update.

Inside the parentheses, the column to sort by is passed as ```by=['attributes.last_location.timestamp']```, and ```ascending=False``` puts the most recent updates first.

In [None]:
selection.sort_values(by=['attributes.last_location.timestamp'],ascending=False)
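
If `attributes.last_location.timestamp` holds ISO 8601 strings in one consistent format and time zone, sorting the strings already yields chronological order. Converting the column with `pd.to_datetime` makes the sort independent of the string format and allows date arithmetic. A sketch assuming ISO 8601 timestamps (not verified against the API); the sample values are placeholders:

```python
import pandas as pd

demo = pd.DataFrame({
    "id": ["a1", "a2", "a3"],
    "attributes.last_location.timestamp": [
        "2021-05-01T10:00:00Z", "2021-05-03T08:30:00Z", "2021-04-20T12:00:00Z",
    ],
})

# Parse to timezone-aware datetimes (UTC) in a new column
demo["last_seen"] = pd.to_datetime(demo["attributes.last_location.timestamp"], utc=True)
latest_first = demo.sort_values(by="last_seen", ascending=False)
print(list(latest_first["id"]))  # ['a2', 'a1', 'a3']
```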

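We pick an animal that showed recent activity. At the time of writing, the first entry had the ID ```ecc8e14b-2c3a-4c12-a035-2e983ed8043f``` in the column "id"; the code below uses the ID ```f0c15b45-4340-4b2c-a651-abf848aabd24``` instead, so replace it with the ID you are interested in. With `.loc[]` and a condition on the "id" column, we select that row: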

In [None]:
selection.loc[selection["id"] == "f0c15b45-4340-4b2c-a651-abf848aabd24"]

Finally, we display only the column of interest, "attributes.content_url":

In [None]:
selection.loc[selection["id"] == "f0c15b45-4340-4b2c-a651-abf848aabd24", "attributes.content_url"]
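
The cell above returns a pandas Series, which prints together with its index. To get the URL itself as a plain string, for example to pass it on to another function, take the first match with `.iloc[0]`. A sketch on stand-in data; `content_url_of` is a hypothetical helper name and the URLs are placeholders:

```python
import pandas as pd


def content_url_of(frame, animal_id):
    """Return attributes.content_url for animal_id, or None if there is no match."""
    matches = frame.loc[frame["id"] == animal_id, "attributes.content_url"]
    return matches.iloc[0] if not matches.empty else None


demo = pd.DataFrame({
    "id": ["a1", "a2"],
    "attributes.content_url": ["https://example.org/a1", "https://example.org/a2"],
})
print(content_url_of(demo, "a2"))  # https://example.org/a2
print(content_url_of(demo, "zz"))  # None
```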

Or show the URLs of all animals in the selection:

In [None]:
selection["attributes.content_url"]

## Exercise time

Try to download the track file for the bird with the ID ```40564d88-3107-4281-8268-88766cf3b770```.