### General template


Fetches data from the local source and establishes the following variables:

1. dutch_codes
2. swiss_codes
3. dutch_surveys
4. swiss_surveys
5. swiss_beaches


Establishes directory variables for reading from and writing to the subdirectories:

1. data
2. beaches
3. codes
4. geo
5. output

Provides a script (commented out in the last cell) to refresh the local data from the hammerdirt API.

In [1]:
# sys things
import os
import sys
import json

# networks
import requests

# data
import pandas as pd
import numpy as np
import scipy
import math
import seaborn as sns

import resources.utilities.utility_functions as ut

In [2]:
# get folder extensions
data, beaches, codes, geo, output=ut.make_local_paths()
print("look for resources here\n")
print(data, beaches, codes, geo, output)


look for resources here

resources/surveydata resources/locationdata resources/mlwcodedefs resources/geodata output
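
The helper `ut.make_local_paths()` is not shown in this notebook. A minimal sketch of what such a helper could look like, assuming it only builds the five relative paths printed above (the name `make_local_paths_sketch` and its `root` parameter are hypothetical):

```python
def make_local_paths_sketch(root="resources"):
    """Return the five directory paths used in this notebook.

    Hypothetical stand-in for ut.make_local_paths(); the folder names
    are copied from the printed output above.
    """
    data = f"{root}/surveydata"
    beaches = f"{root}/locationdata"
    codes = f"{root}/mlwcodedefs"
    geo = f"{root}/geodata"
    # the output folder sits next to the resources folder, not inside it
    output = "output"
    return data, beaches, codes, geo, output


data, beaches, codes, geo, output = make_local_paths_sketch()
print(data, beaches, codes, geo, output)
```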


In [3]:
# code data
dutch_codes = pd.read_csv(codes+'/dutch_codes.csv')
swiss_codes = pd.read_csv(codes+'/swiss_codes.csv')

start_date = "2017-07-04"
end_date = "2019-04-24"

# this list was received from David Fleet, one of the authors of the monitoring guide
joint_list = pd.read_csv(F"{codes}/jointcodes/ospar_mlw_fleet.csv")

# housekeeping
dutch_codes.fillna(0, inplace=True)
dutch_codes.rename(columns={'OSPAR_ID':'ospar_id', 'Description':'description'}, inplace=True)
swiss_codes.rename(columns={'ospar_code':'ospar_id'}, inplace=True)
swiss_codes.drop('Unnamed: 0', axis=1,inplace=True)

# survey_data
dutch_surveys = pd.read_csv(data+'/dataset_macrolitter_NL.csv')

# use the aggregated hd data. This accounts for the custom codes used in Switzerland
swiss_surveys = pd.read_csv(data+'/aggregated_hd_surveys.csv')

# location data
swiss_beaches = pd.read_csv(beaches+'/hammerdirt_beaches.csv')

In [4]:
print("Columns from cleaned up dutch data\n")
print(dutch_codes.columns)

print("\nColumns from cleaned up swiss data\n")
print(swiss_codes.columns)

Columns from cleaned up dutch data

Index(['ID', 'description', 'category', 'ospar_id'], dtype='object')

Columns from cleaned up swiss data

Index(['code', 'material', 'description', 'source', 'source_two',
       'source_three', 'parent_code', 'direct', 'single_use', 'micro',
       'ospar_id'],
      dtype='object')


In [5]:
dutch_codes.iloc[:10]

,ID,description,category,ospar_id
0,plastic_6_packringen,Six pack ring,PO soft,1.0
1,plastic_tassen,Bag,PO soft,2.0
2,plastic_kleine_plastic_tasjes,Small bag,PO soft,3.0
3,plastic_drankflessen_groterdan_halveliter,Bottle (>= 0.5 L),PET,4.1
4,plastic_drankflessen_kleinerdan_halveliter,Bottle (< 0.5 L),PET,4.2
5,plastic_wikkels_van_drankflessen,Bottle label,PO soft,4.3
6,plastic_verpakking_van_schoonmaakmiddelen,Cleaning product packaging,PO hard,5.0
7,plastic_voedselverpakkingen_frietbakjes_etc,Food packaging,Polystyrene,6.0
8,plastic_cosmeticaverpakkingen,Cosmetics packaging,PO hard,7.0
9,plastic_motorolieverpakking_groterdan50cm,Motor oil packaging (>= 50 cm),PO hard,9.0


In [6]:
# process the dutch codes:
# split each ospar_id into a parent code (rounded to the nearest integer) and a child code (the remainder)
dutch_codes['parent_code'] = dutch_codes.ospar_id.round(0)
dutch_codes['parent_code'] = dutch_codes['parent_code'].astype('int') 
dutch_codes['child_code'] = dutch_codes.ospar_id - dutch_codes.parent_code


# rows with a non-zero remainder are child codes;
# ccodes holds the parent codes that have at least one child code:
child_codes = dutch_codes.loc[dutch_codes.child_code > 0]
ccodes = child_codes.parent_code.unique()

# rows with no remainder are parent codes;
# pcodes holds the parent codes that appear without a child suffix:
parent_codes = dutch_codes.loc[dutch_codes.child_code == 0]
pcodes = parent_codes.parent_code.unique()

# every parent code in the dutch data, with or without child codes:
dcodesall = dutch_codes.parent_code.unique()

print("""
This is the OSPAR code list from the dutch data.\n
OSPAR codes that could not be typed to 'int' were counted as 0.\n
Any code with an ospar value of 0 was excluded\n
""")
print(dutch_codes['parent_code'].unique())
print(F"\nThese are the detail codes used to better define the object:\n\n{ccodes}")


This is the OSPAR code list from the dutch data.

OSPAR codes that could not be typed to 'int' were counted as 0.

Any code with an ospar value of 0 was excluded


[   1    2    3    4    5    6    7    9   10   13   14   15   16   20
   21   24   25  113   31   32   33   36   38   40   42   43   44  117
   46   48 1172  462   47   22   19  472  212  481   11   39    8   17
   35   49   52   53   54   55   57   59   60   61   63   64   65   66
   67   62   68   69   72   73   74   75   81   78   79   83   77   84
   88   76   86   80   82  120   89   90   91   92   93   98  982  102
   97   99   18  100  101  103  104  105]

These are the detail codes used to better define the object:

[  4 117  46   6  47  22  19   2  43  38  39  62  67  81 102   1]
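
The split above relies on `round(0)`, which would round a value such as 4.6 up to 5. If any OSPAR id carries a decimal suffix above .5 (an assumption; the rows displayed in this notebook only go up to .3), flooring is a safer way to recover the parent. A minimal sketch with a hypothetical helper `split_ospar_id`, assuming a single decimal digit for the child suffix:

```python
import math


def split_ospar_id(ospar_id):
    """Split an id such as 4.2 into (parent, child) = (4, 0.2)."""
    parent = math.floor(ospar_id)
    # round to one decimal to hide floating point noise (4.2 - 4 is not exactly 0.2)
    child = round(ospar_id - parent, 1)
    return parent, child


for value in (4.2, 4.6, 5.0):
    print(value, split_ospar_id(value), "round() gives", round(value))
```

On a pandas Series the same idea would be `np.floor(dutch_codes.ospar_id).astype(int)` for the parent code.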


In [7]:
dutch_codes.iloc[:10]

,ID,description,category,ospar_id,parent_code,child_code
0,plastic_6_packringen,Six pack ring,PO soft,1.0,1,0.0
1,plastic_tassen,Bag,PO soft,2.0,2,0.0
2,plastic_kleine_plastic_tasjes,Small bag,PO soft,3.0,3,0.0
3,plastic_drankflessen_groterdan_halveliter,Bottle (>= 0.5 L),PET,4.1,4,0.1
4,plastic_drankflessen_kleinerdan_halveliter,Bottle (< 0.5 L),PET,4.2,4,0.2
5,plastic_wikkels_van_drankflessen,Bottle label,PO soft,4.3,4,0.3
6,plastic_verpakking_van_schoonmaakmiddelen,Cleaning product packaging,PO hard,5.0,5,0.0
7,plastic_voedselverpakkingen_frietbakjes_etc,Food packaging,Polystyrene,6.0,6,0.0
8,plastic_cosmeticaverpakkingen,Cosmetics packaging,PO hard,7.0,7,0.0
9,plastic_motorolieverpakking_groterdan50cm,Motor oil packaging (>= 50 cm),PO hard,9.0,9,0.0


In [8]:
# compare dutch codes to Fleet's definitions:
joint_list.columns

Index(['OSPAR-ID', 'Type-Code', 'Name', 'J-Code', 'G-Code', 'OSPAR Name'], dtype='object')

In [9]:
# process the joint_list:
# rename these columns:
joint_list.rename(columns={'OSPAR-ID':'ospar_id','G-Code':'mlw_code'}, inplace=True)
joint_list['mlw_code'] = joint_list.mlw_code.astype('str')

joint_list['paired'] = list(zip(joint_list.mlw_code,joint_list.ospar_id))

joint_list.fillna(0, inplace=True)

# make some code pairs
jlistkeys = joint_list[['mlw_code','ospar_id']].copy()

# set up a mapper:
mlwkeyed = {x[0]:x[1] for x in list(joint_list.paired.unique())}
osparkeyed = {x[1]:x[0] for x in list(joint_list.paired.unique())}

In [10]:
dutch_codes_not_in_ospar = [x for x in dutch_codes.parent_code.unique() if x not in joint_list.ospar_id.unique()]

print(F"These are the dutch codes that are not in the OSPAR definitions:\n{dutch_codes_not_in_ospar}")

These are the dutch codes that are not in the OSPAR definitions:
[1172, 462, 472, 212, 481, 982]
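
The list comprehension above tests each code against an array. The same comparison can be written as a set difference. A small sketch with toy values (the `joint_ids` below are made up for illustration and are not taken from the joint list):

```python
# toy parent codes; the last two appear in the output above
dutch_parent_codes = [1, 2, 4, 1172, 462]
# made-up stand-in for joint_list.ospar_id
joint_ids = [1, 2, 3, 4, 5]

# set difference keeps the codes that have no match in the joint list
missing = sorted(set(dutch_parent_codes) - set(joint_ids))
print(missing)
```

Note that `sorted` returns the codes in ascending order, whereas the list comprehension keeps the order in which they appear in the data.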


In [11]:
print(dutch_codes[dutch_codes.ospar_id.isin(dutch_codes_not_in_ospar)][['ospar_id','description']])

    ospar_id             description
34    1172.0  Foam fragment (< 5 cm)
35     462.0  Foam fragment (>=5 cm)
41     472.0          Foam (> 50 cm)
42     212.0                Foam cup
44     481.0            Water filter
99     982.0      Carton cotton swab


In [12]:
# process the swiss codes
# get child and parent codes:
swiss_codes_parent = swiss_codes.loc[swiss_codes.parent_code == 'Parent code'].copy()
swiss_codes_child = swiss_codes.loc[swiss_codes.parent_code != 'Parent code'].copy()

# identify the codes that have actually been used:
swiss_pcodes_used = swiss_surveys[swiss_surveys.quantity > 0].code.unique()

# make a list of the codes in use:
scodes_used = swiss_codes_parent.loc[swiss_codes_parent.code.isin(swiss_pcodes_used)].copy()
scodes_used.set_index('code', inplace=True)
scodes_used.fillna(0, inplace=True)

In [13]:
# the ospar id given here should match (or at least be close to) the definition in the joint_list
scodes_used.loc['G178']

material                                                Metal
description     Metal bottle caps, lids & pull tabs from cans
source                                              Packaging
source_two                                     Food and drink
source_three                                           Direct
parent_code                                       Parent code
direct                                                   True
single_use                                               True
micro                                                   False
ospar_id                                                   77
Name: G178, dtype: object

In [14]:
# match from joint_list for the same ospar id
joint_list[joint_list.ospar_id == 77]

,ospar_id,Type-Code,Name,J-Code,mlw_code,OSPAR Name,paired
21,77,me_nn_b&c_lids_,"metal bottle caps, lids & pull tabs from cans",J178,G178,Bottle caps,"(G178, 77)"


In [15]:
# find the mlw codes in use in Switzerland that are not included in the joint list
no_match_ospar_mlw = [x for x in swiss_pcodes_used if x not in joint_list.mlw_code.unique()]
print(F"Swiss mlw codes that are not in the OSPAR defs:\n{no_match_ospar_mlw}")

Swiss mlw codes that are not in the OSPAR defs:
['G112', 'G23', 'G30', 'G35', 'G38', 'G78', 'G79', 'G81', 'G89', 'G93', 'G117', 'G7', 'G74', 'G82', 'G153', 'G156', 'G126', 'G208', 'G21', 'G131', 'G142', 'G24', 'G31', 'G80', 'G203', 'G194', 'G87', 'G25', 'G61', 'G98', 'G188', 'G8', 'G167', 'G155', 'G83', 'G157', 'G91', 'G12', 'G193', 'G22', 'G90', 'G149', 'G195', 'G201', 'G118', 'G114', 'G135', 'G161', 'G170', 'G106', 'G113', 'G123', 'G181', 'G197', 'G75', 'G103', 'G109', 'G39', 'G105', 'G48', 'G55', 'G202', 'G88', 'G205', 'G64', 'G92', 'G11', 'G102', 'G146', 'G52', 'G119', 'G122', 'G36', 'G115', 'G129', 'G62', 'G2', 'G136', 'G943', 'G139', 'G6', 'G185', 'G111', 'G214', 'G116', 'G143', 'G104', 'G94', 'G84', 'G108', 'G999', 'G107', 'G132', 'G173']


In [16]:
# map mlw_code to ospar given the swiss data
swl = scodes_used.loc[(scodes_used.ospar_id != 0)]
swl = swl[swl != 'none']
swl = swl[swl != 'Ospar...']['ospar_id']
swl.loc['G178']
swl.index

Index(['G213', 'G214', 'G135', 'G136', 'G137', 'G138', 'G139', 'G140', 'G141',
       'G142',
       ...
       'G93', 'G943', 'G95', 'G96', 'G98', 'G125', 'G133', 'G211', 'G161',
       'G165'],
      dtype='object', name='code', length=105)

In [17]:
# intended to drop records without an mlw code; note that missing codes are the string 'nan' at this point, not '0':
joint_list_mlw = joint_list[joint_list.mlw_code != '0']

# set the index to the mlw code:
joint_list_mlw.set_index('mlw_code', inplace=True)

# get just the ospar id
joint_list_mlw_keys = joint_list_mlw['ospar_id']
joint_list_mlw_keys.loc['G178']
joint_list_mlw_keys.index

Index(['G213', 'nan', 'G137', 'G138', 'G141', 'G140', 'G145', 'G204', 'G207',
       'G200',
       ...
       'G128', 'G159', 'G165', 'G164', 'G163', 'G162', 'G172', 'G171', 'G160',
       'nan'],
      dtype='object', name='mlw_code', length=114)
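
The index above still contains `'nan'` entries: the missing G-codes were cast to strings before `fillna(0)` ran, so the `!= '0'` filter does not catch them. One possible way around this, sketched on a toy frame (the rows are illustrative, and the ospar id of the row without a code is made up), is to drop the missing codes before casting:

```python
import pandas as pd

toy = pd.DataFrame({
    "mlw_code": ["G178", float("nan"), "G213"],
    "ospar_id": [77, 1, 108],
})

# drop rows without a G-code first, then cast what is left to str
clean = toy.dropna(subset=["mlw_code"]).copy()
clean["mlw_code"] = clean["mlw_code"].astype(str)

# same shape as joint_list_mlw_keys: mlw code as index, ospar id as value
keys = clean.set_index("mlw_code")["ospar_id"]
print(keys)
```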

In [18]:
# combine the two sets of definitions, starting with the mlw code:
joint_list_defs = list(set(list(joint_list_mlw_keys.index) + list(swl.index)))

# make a dataframe, use mlw code as index:
use_these_defs = pd.DataFrame(index=joint_list_defs)


def make_ospar_defs(x, defs1, defs2):
    # favor the definitions from Fleet
    try:
        new_def = defs1.loc[x]
    except KeyError:
        # fall back on the swiss definitions
        new_def = defs2.loc[x]

    return new_def


use_these_defs['ospar_id'] = use_these_defs.index.map(lambda x:make_ospar_defs(x, joint_list_mlw_keys, swl) )
use_these_defs.fillna(0, inplace=True)
use_these_defs.loc['G178']

ospar_id    77
Name: G178, dtype: object
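
The rule in `make_ospar_defs` (use Fleet's definition when there is one, otherwise the Swiss one) can also be expressed as a layered lookup. A sketch with `collections.ChainMap` and toy dictionaries (`GXXX` is a made-up code that only exists to show the fallback):

```python
from collections import ChainMap

# toy stand-ins for joint_list_mlw_keys and swl
fleet_defs = {"G178": 77, "G213": 108}
swiss_defs = {"G178": 0, "GXXX": 54}  # values made up for the example

# ChainMap searches the mappings left to right, so Fleet wins on conflicts
combined = ChainMap(fleet_defs, swiss_defs)

for code in ("G178", "GXXX"):
    print(code, combined[code])

# codes found in neither source default to 0, mirroring fillna(0) above
print(combined.get("G999", 0))
```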

In [19]:
print(F"There are {len(use_these_defs)} definitions.")

There are 163 definitions.


In [20]:
use_these_defs.iloc[:15]

,ospar_id
G18,13
G155,67
G15,9
G27,64
G106,0
G67,40
G17,11
G158,67
G142,0
G81,117


In [21]:
# apply those definitions to the swiss codes:
def apply_ospar_defs(x, defs1):
    # look up the ospar id for mlw code x; codes without a definition get 0
    try:
        new_def = defs1.loc[x, 'ospar_id']
    except KeyError:
        new_def = 0

    return new_def


scodes_used['ospar_id'] = scodes_used.index.map(lambda x: apply_ospar_defs(x, use_these_defs))
scodes_used

code,material,description,source,source_two,source_three,parent_code,direct,single_use,micro,ospar_id
G213,Chemicals,Paraffin wax,Undefined,Where does it come from,none,Parent code,True,False,False,108
G214,Chemicals,Oil/tar,Construction,Where does it come from,none,Parent code,False,False,False,111
G135,Cloth,"Clothes, footware, headware, gloves",Clothing,Where does it come from,none,Parent code,True,False,False,54
G136,Cloth,Shoes,Clothing,Where does it come from,none,Parent code,True,False,False,0
G137,Cloth,"Clothing, towels & rags",Clothing,Where does it come from,none,Parent code,True,False,False,54
...,...,...,...,...,...,...,...,...,...,...
G167,Wood,Matches or fireworks,Recreation,Where does it come from,0,Parent code,True,False,False,0
G170,Wood,Wood (processed),Undefined,Where does it come from,0,Parent code,True,False,False,0
G171,Wood,Other wood < 50cm,Undefined,Where does it come from,0,Parent code,True,False,False,74
G172,Wood,Other wood > 50cm,Undefined,Where does it come from,0,Parent code,True,False,False,75


In [22]:
swiss_surveys.date = pd.to_datetime(swiss_surveys.date)
scodes_used[scodes_used.ospar_id == 0].index

Index(['G136', 'G139', 'G142', 'G143', 'G202', 'G203', 'G205', 'G208', 'G185',
       'G193', 'G195', 'G197', 'G146', 'G149', 'G156', 'G157', 'G102', 'G103',
       'G104', 'G105', 'G106', 'G107', 'G108', 'G109', 'G111', 'G112', 'G113',
       'G114', 'G115', 'G116', 'G117', 'G118', 'G119', 'G122', 'G123', 'G2',
       'G23', 'G39', 'G48', 'G52', 'G55', 'G61', 'G62', 'G64', 'G80', 'G83',
       'G84', 'G89', 'G90', 'G92', 'G94', 'G943', 'G126', 'G129', 'G131',
       'G132', 'G999', 'G167', 'G170', 'G173'],
      dtype='object', name='code')

In [23]:
# this is quite a few items:
swiss_surveys[(swiss_surveys.date >= start_date)&(swiss_surveys.date <= end_date)&(swiss_surveys.code.isin(scodes_used[scodes_used.ospar_id == 0].index))].quantity.sum()

11960
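
The one-line filter above is hard to read. A sketch of the same selection with named masks and `Series.between` (inclusive on both ends by default), run on a toy frame with made-up dates and quantities:

```python
import pandas as pd

toy_surveys = pd.DataFrame({
    "date": pd.to_datetime(["2017-07-01", "2018-01-15", "2019-04-24", "2019-05-01"]),
    "code": ["G202", "G202", "G156", "G156"],
    "quantity": [3, 5, 7, 11],
})
# stand-in for scodes_used[scodes_used.ospar_id == 0].index
unmatched_codes = ["G202", "G156"]

start = pd.Timestamp("2017-07-04")
end = pd.Timestamp("2019-04-24")

# one named mask per condition, combined at the end
in_window = toy_surveys["date"].between(start, end)
is_unmatched = toy_surveys["code"].isin(unmatched_codes)

total = toy_surveys.loc[in_window & is_unmatched, "quantity"].sum()
print(total)
```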

In [24]:
# check against the original data; the ospar_id should be zero
scodes_used.loc['G202']

material                          Glass
description                 Light bulbs
source                    Utility items
source_two      Where does it come from
source_three                          0
parent_code                 Parent code
direct                             True
single_use                        False
micro                             False
ospar_id                              0
Name: G202, dtype: object

In [25]:
scodes_used.loc['G156']

material                          Paper
description             Paper fragments
source                        Undefined
source_two      Where does it come from
source_three                          0
parent_code                 Parent code
direct                             True
single_use                        False
micro                             False
ospar_id                              0
Name: G156, dtype: object

In [26]:
# these need to be placed into a category:
scodes_used[scodes_used.ospar_id == 0][['description', 'material', 'ospar_id']]

code,description,material,ospar_id
G136,Shoes,Cloth,0
G139,Backpacks,Cloth,0
G142,"Rope , string or nets",Cloth,0
G143,Sails and canvas,Cloth,0
G202,Light bulbs,Glass,0
G203,"Tableware ceramic or glass, cups, plates, pieces",Glass,0
G205,Fluorescent light tubes,Glass,0
G208,Glass or ceramic fragments > 2.5 cm,Glass,0
G185,Middle size containers,Metal,0
G193,car parts and batteries,Metal,0


### Choose the correct definition for MLW codes that have many OSPAR ids.

The EU is putting together a list of harmonized codes that makes it easier to switch between coding systems. We will consult that list before settling on a definition.

### Account for equivalencies for dutch child codes

Both projects use sub codes (child codes) for items of local concern. We need to find each project's analog and assign the appropriate OSPAR code.
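
One possible first step is to roll the Dutch child codes up to their parent OSPAR code, so that both datasets can be compared at the parent level. A minimal sketch, with parent codes copied from the `dutch_codes` rows shown earlier and made-up item counts:

```python
from collections import Counter

# parent codes copied from the dutch_codes rows shown earlier
parent_of = {
    "plastic_tassen": 2,
    "plastic_drankflessen_groterdan_halveliter": 4,
    "plastic_drankflessen_kleinerdan_halveliter": 4,
    "plastic_wikkels_van_drankflessen": 4,
}

# made-up item counts for a single survey
counts = {
    "plastic_tassen": 3,
    "plastic_drankflessen_groterdan_halveliter": 2,
    "plastic_drankflessen_kleinerdan_halveliter": 5,
    "plastic_wikkels_van_drankflessen": 1,
}

# sum the quantities of all items that share a parent code
totals = Counter()
for item, quantity in counts.items():
    totals[parent_of[item]] += quantity

print(dict(totals))
```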



In [27]:
## !!! refresh the data from the hammerdirt api here:

# a = requests.get('https://mwshovel.pythonanywhere.com/api/surveys/daily-totals/code-totals/swiss/')
# b = requests.get('https://mwshovel.pythonanywhere.com/api/list-of-beaches/swiss/')
# c = requests.get('https://mwshovel.pythonanywhere.com/api/mlw-codes/list/')

# # the surveys need to be unpacked:
# swiss_surveys = ut.unpack_survey_results(a.json())
# swiss_surveys = pd.DataFrame(swiss_surveys)

# # adding location date column
# swiss_surveys['loc_date'] = list(zip(swiss_surveys['location'], swiss_surveys['date']))

# # hold the original
# x = a.json()

# print("survey columns")
# print(swiss_surveys.columns)

# swiss_beaches = pd.DataFrame(b.json())
# print("beach columns")
# print(swiss_beaches.columns)

# print("code columns")
# swiss_codes = pd.DataFrame(c.json())
# print(swiss_codes.columns)

# swiss_surveys.to_csv(data+'/hammerdirt_data.csv')
# swiss_beaches.to_csv(beaches+'/hammerdirt_beaches.csv')
# swiss_codes.to_csv(codes+'/swiss_codes.csv')
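
If the refresh cell is re-enabled, it may help to fail early on a bad response instead of writing an error payload to disk. A sketch of a small wrapper, where `fetch_json` is a hypothetical helper and the HTTP function is passed in (for example `requests.get`) so the wrapper can be exercised without network access. The fake response and its payload are made up for the demonstration:

```python
def fetch_json(url, get, timeout=30):
    """Fetch url with the supplied `get` callable and return the decoded JSON.

    `get` is expected to behave like requests.get: it returns an object
    with raise_for_status() and json() methods.
    """
    response = get(url, timeout=timeout)
    # raises for 4xx/5xx responses instead of returning an error body
    response.raise_for_status()
    return response.json()


class FakeResponse:
    """Stand-in response used to exercise fetch_json offline."""

    def __init__(self, payload):
        self.payload = payload

    def raise_for_status(self):
        return None

    def json(self):
        return self.payload


def fake_get(url, timeout=None):
    # ignores the url and returns a fixed, made-up payload
    return FakeResponse([{"location": "toy-beach", "quantity": 1}])


rows = fetch_json("https://example.org/api/", get=fake_get)
print(rows)
```

Against the real API this would be called as `fetch_json(url, get=requests.get)`.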
