# Exercise 1

The following data URL should yield a JSON array containing just one member object.
By substituting limit=1 with limit=all you should get an array with 12856
members.

Task: By using just one HTTP request for each program execution, create a
list of all unique expeditions as newline separated JSON (ndjson), sorted by
date of first sampling.

Each line should contain the expedition code, the first and last sampling dates
(ISO 8601 formatted), the programs connected with the expedition, and the
expedition vessel/conveyance. (You may assume that programs and vessels are
constant for each expedition.)

You may choose the programming language of your liking to complete the exercise.
The resulting program should be runnable from the command line on a Linux system
(x86_64) and receive its input from STDIN. Please provide a convenience shell
script or a command-line one-liner to invoke your code.
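
One possible shape for such a program is sketched below. It is a minimal, standard-library-only illustration and not the pandas approach this notebook takes. It assumes each record carries the keys `expedition`, `utc_date`, `programs` and `conveyance` (the field names requested in the notebook's URL), that all dates share one ISO 8601 UTC format so that text order matches chronological order, and that records lacking an expedition code or a date can be skipped. The function names and output key names are illustrative choices.

```python
import io
import json
import sys


def summarize_expeditions(records):
    """Collapse sample records into one summary dict per expedition."""
    summaries = {}
    for record in records:
        code = record.get("expedition")
        date = record.get("utc_date")
        if code is None or date is None:
            # Assumption: a record without a code or a date cannot be placed.
            continue
        summary = summaries.get(code)
        if summary is None:
            # Programs and conveyance are taken from the first record seen,
            # since the task states they are constant per expedition.
            summaries[code] = {
                "expedition": code,
                "first_sampling_date": date,
                "last_sampling_date": date,
                "programs": record.get("programs") or [],
                "conveyance": record.get("conveyance") or "",
            }
        else:
            # Same-format ISO 8601 strings order chronologically as text.
            summary["first_sampling_date"] = min(summary["first_sampling_date"], date)
            summary["last_sampling_date"] = max(summary["last_sampling_date"], date)
    return sorted(summaries.values(), key=lambda s: s["first_sampling_date"])


def write_ndjson(in_stream, out_stream):
    """Read a JSON array of samples and write one JSON object per line."""
    for summary in summarize_expeditions(json.load(in_stream)):
        out_stream.write(json.dumps(summary) + "\n")


def main():
    # Entry point for a script: call this under `if __name__ == "__main__":`.
    write_ndjson(sys.stdin, sys.stdout)


# Demonstration on three of the rows shown in this notebook's data.head() output.
sample = [
    {"conveyance": "Helmer Hanssen", "expedition": "2012-01-EXPEDITION-MISSING",
     "programs": ["BIO-8510 ARCTOS"], "utc_date": "2012-01-17T21:33:00Z"},
    {"conveyance": "Jan Mayen", "expedition": "Fellestokt-1999",
     "programs": ["Fellestokt"], "utc_date": "1999-10-06T14:40:00Z"},
    {"conveyance": "Jan Mayen", "expedition": "Fellestokt-1999",
     "programs": ["Fellestokt"], "utc_date": "1999-10-05T20:20:00Z"},
]
demo_out = io.StringIO()
write_ndjson(io.StringIO(json.dumps(sample)), demo_out)
print(demo_out.getvalue(), end="")
```

Saved as a script (say `expeditions.py`, a hypothetical name) with `main()` called under the usual guard, it could be invoked as `curl -s "$DATA_URL" | python3 expeditions.py`, where `DATA_URL` is a shell variable holding the data URL with `limit=all`. That keeps the single HTTP request outside the program and the input on STDIN.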

Imports:

In [3]:
import requests
import pandas as pd
import io

Build url:

In [1]:
base_url = 'https://api.npolar.no/marine/biology/sample/'
fields = [ 'expedition', 'utc_date', 'programs', 'conveyance']
limit = 'all'
out_format = 'json'
variant = 'array'

def build_url():
    # Assemble the query string from the module-level settings above.
    return f"{base_url}?q=&fields={','.join(fields)}&limit={limit}&format={out_format}&variant={variant}"

print(build_url())

https://api.npolar.no/marine/biology/sample/?q=&fields=expedition,utc_date,programs,conveyance&limit=all&format=json&variant=array
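
As an alternative to formatting the query string by hand, the standard library's `urllib.parse.urlencode` can build it from a dict and takes care of escaping. The following is a sketch under the assumption that the API accepts the same parameters as above; the parameterized signature of `build_url` is an illustrative variation, not the version used in this notebook.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.npolar.no/marine/biology/sample/"


def build_url(fields, limit="all", out_format="json", variant="array"):
    """Build the sample query URL from explicit arguments."""
    params = {
        "q": "",
        "fields": ",".join(fields),
        "limit": limit,
        "format": out_format,
        "variant": variant,
    }
    # safe="," keeps the comma-separated field list readable instead of %2C.
    return f"{BASE_URL}?{urlencode(params, safe=',')}"


url = build_url(["expedition", "utc_date", "programs", "conveyance"])
print(url)
```

Passing `limit=1` instead gives a cheap way to inspect the record layout before fetching everything.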


Fetch data:

In [4]:
url = build_url()
response = requests.get(url)
response.raise_for_status()  # stop here on an HTTP error instead of parsing an error page
data = pd.read_json(io.StringIO(response.content.decode('utf-8')))
print(data.shape)
data.head()

(12856, 4)


Unnamed: 0,conveyance,expedition,programs,utc_date
0,Helmer Hanssen,2012-01-EXPEDITION-MISSING,[BIO-8510 ARCTOS],2012-01-17T21:33:00Z
1,Jan Mayen,Fellestokt-1999,[Fellestokt],1999-10-05T20:20:00Z
2,Lance,ICE-BAR 1995,[ICE-BAR],1995-06-21T22:00:00Z
3,Jan Mayen,Fellestokt-1999,[Fellestokt],1999-10-06T14:40:00Z
4,Helmer Hanssen,2012-01-EXPEDITION-MISSING,[BIO-8510 ARCTOS],2012-01-12T22:36:00Z


In [6]:
len(data.expedition.unique())

56

In [8]:
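# Duplicate the sample date into two columns so that one can be reduced
# with min and the other with max when grouping by expedition.
data_mod = data.rename(columns={'utc_date': 'first_sampling_date'})
data_mod['last_sampling_date'] = data_mod['first_sampling_date']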
data_mod.head()

Unnamed: 0,conveyance,expedition,programs,first_sampling_date,last_sampling_date
0,Helmer Hanssen,2012-01-EXPEDITION-MISSING,[BIO-8510 ARCTOS],2012-01-17T21:33:00Z,2012-01-17T21:33:00Z
1,Jan Mayen,Fellestokt-1999,[Fellestokt],1999-10-05T20:20:00Z,1999-10-05T20:20:00Z
2,Lance,ICE-BAR 1995,[ICE-BAR],1995-06-21T22:00:00Z,1995-06-21T22:00:00Z
3,Jan Mayen,Fellestokt-1999,[Fellestokt],1999-10-06T14:40:00Z,1999-10-06T14:40:00Z
4,Helmer Hanssen,2012-01-EXPEDITION-MISSING,[BIO-8510 ARCTOS],2012-01-12T22:36:00Z,2012-01-12T22:36:00Z


Group by expeditions:

In [70]:
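# Collapse to one row per expedition. Taking min/max of the date columns relies
# on all timestamps sharing one ISO 8601 format, so text order is chronological.
# Programs and conveyance are assumed constant per expedition, so any aggregate
# that returns one of the group's values will do. The string names avoid the
# warning newer pandas versions emit for the builtin min/max callables.
expeditions = data_mod.groupby('expedition', as_index=False).agg({
    'conveyance': 'min',
    'programs': 'min',
    'first_sampling_date': 'min',
    'last_sampling_date': 'max'
}).sort_values('first_sampling_date')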
expeditions.tail(10)

Unnamed: 0,expedition,conveyance,programs,first_sampling_date,last_sampling_date
48,SANA09,Viking Explorer,,2009-07-21T12:00:00Z,2009-07-27T12:00:00Z
10,Alkekonge-2009b,Viking Explorer,"[Alkekonge, MOSJ]",2009-07-21T15:15:00Z,2009-07-28T15:05:00Z
11,Alkekonge-2010,Lance,"[Alkekonge, MOSJ]",2010-07-17T12:50:00Z,2010-07-26T04:40:00Z
29,ICE2010,Lance,[ICE ECO],2010-08-17T22:25:00Z,2010-08-30T08:00:00Z
30,ICE2011,Lance,[ICE ECO],2011-04-27T20:38:00Z,2011-05-14T04:20:00Z
7,2011-04-EXPEDITION-MISSING,Lance,[ICE ECO],2011-04-29T15:00:00Z,2011-04-29T15:00:00Z
36,MOSJ2012,Lance,[MOSJ],2011-07-13T14:00:00Z,2012-07-21T11:00:00Z
31,ICE2012,Lance,[ICE],2011-07-26T20:30:00Z,2013-07-30T07:59:59Z
8,2012-01-EXPEDITION-MISSING,Helmer Hanssen,[BIO-8510 ARCTOS],2012-01-12T02:28:00Z,2012-01-18T21:42:59Z
37,MOSJ2013,Lance,[ICE ECO],2013-07-23T00:00:00Z,2017-02-16T00:00:00Z
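
The aggregation above takes the minimum and maximum of the date strings without parsing them. That is safe only as long as every timestamp uses the same fixed-width ISO 8601 layout with a trailing `Z`, because text order then coincides with chronological order. A quick check of that assumption, using the five timestamps from the `data.head()` output (the helper name `parse_utc` is illustrative):

```python
from datetime import datetime, timezone


def parse_utc(stamp):
    """Parse a timestamp like 2012-01-17T21:33:00Z into an aware datetime."""
    # The literal Z is matched as text; the UTC zone is attached explicitly.
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


stamps = [
    "2012-01-17T21:33:00Z",
    "1999-10-05T20:20:00Z",
    "1995-06-21T22:00:00Z",
    "1999-10-06T14:40:00Z",
    "2012-01-12T22:36:00Z",
]

by_text = sorted(stamps)
by_time = sorted(stamps, key=parse_utc)
print(by_text == by_time)
print(min(stamps), max(stamps))
```

If the full dataset contained other layouts (for example fractional seconds or numeric offsets), `parse_utc` would raise a `ValueError`, which makes it a simple way to detect rows where plain string comparison could not be trusted.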


Remove NaNs:

In [71]:
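def fillNaNsWithList(df, col):
    # fillna() does not accept a list as the fill value, so missing cells
    # are replaced one at a time with an empty list.
    for row in df.loc[df[col].isnull(), col].index:
        df.at[row, col] = []
    return df

expeditions = fillNaNsWithList(expeditions, 'programs')
# The remaining columns hold plain strings, so fillna() works directly.
expeditions = expeditions.fillna({'conveyance': '', 'first_sampling_date': '', 'last_sampling_date': ''})
expeditions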

Unnamed: 0,expedition,conveyance,programs,first_sampling_date,last_sampling_date
27,ICE-BAR 1995,Lance,[ICE-BAR],1995-06-11T11:00:00Z,1995-06-24T17:20:00Z
12,BIODAFF-1996,Oceania,[BIODAFF],1996-07-13T00:00:00Z,1996-07-15T00:00:00Z
28,ICE-BAR 1996,Lance,[ICE-BAR],1996-07-26T01:30:00Z,1996-08-13T05:32:00Z
13,BIODAFF-1997,Oceania,[BIODAFF],1997-07-12T00:00:00Z,1997-07-24T21:10:00Z
52,UNIS-AB310-1998,Jan Mayen,[UNIS AB310],1998-09-10T12:00:00Z,1998-09-20T12:00:00Z
33,MARINØK-1999,Lance,[MARINØK],1999-05-05T03:50:00Z,1999-05-21T20:30:00Z
14,BIODAFF-1999,Unknown,[BIODAFF],1999-07-01T01:00:00Z,1999-07-01T23:00:00Z
26,Fellestokt-1999,Jan Mayen,[Fellestokt],1999-09-23T10:33:00Z,1999-10-21T19:15:00Z
34,MARINØK-2000,Lance,[MARINØK],2000-03-16T12:25:00Z,2000-03-20T12:55:00Z
15,BIODAFF-2000,Oceania,[BIODAFF],2000-07-01T00:00:00Z,2000-07-01T07:00:00Z


Write to file:

In [72]:
output_path = 'exercise_1_expeditions.ndjson'
# orient='records' with lines=True writes one JSON object per line (ndjson).
expeditions.to_json(output_path, orient='records', lines=True)

with open(output_path, 'r') as fh:
    output_data = fh.read()

print(output_data)

{"expedition":"ICE-BAR 1995","conveyance":"Lance","programs":["ICE-BAR"],"first_sampling_date":"1995-06-11T11:00:00Z","last_sampling_date":"1995-06-24T17:20:00Z"}
{"expedition":"BIODAFF-1996","conveyance":"Oceania","programs":["BIODAFF"],"first_sampling_date":"1996-07-13T00:00:00Z","last_sampling_date":"1996-07-15T00:00:00Z"}
{"expedition":"ICE-BAR 1996","conveyance":"Lance","programs":["ICE-BAR"],"first_sampling_date":"1996-07-26T01:30:00Z","last_sampling_date":"1996-08-13T05:32:00Z"}
{"expedition":"BIODAFF-1997","conveyance":"Oceania","programs":["BIODAFF"],"first_sampling_date":"1997-07-12T00:00:00Z","last_sampling_date":"1997-07-24T21:10:00Z"}
{"expedition":"UNIS-AB310-1998","conveyance":"Jan Mayen","programs":["UNIS AB310"],"first_sampling_date":"1998-09-10T12:00:00Z","last_sampling_date":"1998-09-20T12:00:00Z"}
{"expedition":"MARIN\u00d8K-1999","conveyance":"Lance","programs":["MARIN\u00d8K"],"first_sampling_date":"1999-05-05T03:50:00Z","last_sampling_date":"1999-05-21T20:30:00Z"