Review trips: platform_code (now platform_name) #17

Closed
2 tasks done
peterdesmet opened this issue Jan 6, 2021 · 18 comments
Labels
model: ready Field reviewed and ready to be added to model

Comments

@peterdesmet
Collaborator

peterdesmet commented Jan 6, 2021

trips: platform_code indicates the platform code (ship call sign, etc) with an integer code:

- name: platform_code
  description: >
    Code for the ship name or the call sign (unique identifier of the
    aircraft) – see [lookup table](https://github.com/ices-tools-dev/esas/blob/main/_data/vocabularies/platform_code.tsv).
  type: integer
  format: default
  constraints:
    required: false
    enum: [
      1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 101, 102, 103, 104, 105, 106, 107, 108,
      109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122,
      123, 124, 125, 126, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210,

Values are available here.

  • @ices-tools-dev/data-and-information Does ICES already have a controlled vocabulary for this?
  • @nicolasvanermen I notice a lot of unknown values. How likely is it that these will eventually get meaning?
@peterdesmet
Collaborator Author

The first question is answered in #12 (comment), where it is suggested to use ShipC. I will investigate how easy it is to map these.

@peterdesmet
Collaborator Author

peterdesmet commented Jan 6, 2021

Summary of mapping:

  • There are 458 platform_codes in total
  • 93 (20%) can be dumb mapped on name, but part of those are unique values
  • 50 (11%) are Temp x or UNKNOWN
  • 315 (69%) cannot be dumb mapped on name.
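A minimal sketch of what "dumb mapping on name" could look like, assuming both lists are available as plain (code, name) pairs. The function name `map_by_name` and the code `XX01` are hypothetical; the Antares codes are the ones quoted in this thread. It separates names with exactly one SHIPC match from names with several matches and names with none:

```python
from collections import defaultdict


def map_by_name(esas_names, shipc_rows):
    """Match ESAS platform names to SHIPC codes on normalised name only.

    Returns (unique, ambiguous, unmatched). This is a naive match: a unique
    hit on name is still only a candidate, not a verified mapping.
    """
    by_name = defaultdict(list)
    for code, name in shipc_rows:
        by_name[name.strip().casefold()].append(code)

    unique, ambiguous, unmatched = {}, {}, []
    for name in esas_names:
        codes = by_name.get(name.strip().casefold(), [])
        if len(codes) == 1:
            unique[name] = codes[0]
        elif codes:
            ambiguous[name] = codes
        else:
            unmatched.append(name)
    return unique, ambiguous, unmatched


# Illustrative input only; "XX01" is a placeholder, not a real SHIPC code.
shipc_rows = [
    ("14AT", "Antares"),
    ("32A7", "Antares"),
    ("XX01", "Alkor"),
]
unique, ambiguous, unmatched = map_by_name(
    ["Antares", "Alkor", "Seute Deern"], shipc_rows
)
```

With this input, Alkor maps to one code, Antares is flagged as ambiguous, and Seute Deern is left unmatched.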

Questions:

  • @nicolasvanermen I notice some duplicate values in the ESAS list, such as 11009 and 2465 for Ter Streep, and likely duplicates such as Eldjan vs Eldjarn. I guess we should merge these if we adopt SHIPC codes? It would reduce the list by 29 values.
  • @ices-tools-dev/data-and-information I notice many duplicate values in the SHIPC list, such as 14AT, 32A7, 572N, 90A7, CUAN for Antares. How should we differentiate between those? See #17 (comment).
  • @ices-tools-dev/data-and-information I notice many values for UNKNOWN (like we have too). How can we assign these correctly?
  • @ices-tools-dev/data-and-information How would we add the 315 unmapped ships? How are codes assigned?

@HjalteParner
Member

HjalteParner commented Jan 6, 2021

@peterdesmet you should use the ICES platform code as the identifier/key in your system and only use an integer virtual key internally in your database, if you find that useful. The platform name is not unique and as such cannot be used as an identifier of a given platform. The ICES platform request system is an international collaboration used globally. As seen at http://vocab.ices.dk/request, you can contact accessions@ices.dk if you want access to the platform request application, where you can search for codes and request new ones for platforms not already in the system. All you need to do to request a code for a new platform is to provide enough metadata to identify the platform uniquely. Then a data manager will validate your information and assign a code. The code will be the key for all future references across the globe for the platform in question. No need to reinvent the wheel here.

@neil-ices-dk
Member

@ices-tools-dev/data-and-information I notice many duplicate values in the SHIPC list, such as 14AT, 32A7, 572N, 90A7, CUAN for Antares. How should we differentiate between those?

Just to clarify, these are not duplicates; the governance model distinguishes between instances of a vessel/hull. So although the name is not unique, the combination of key attributes will be. In most cases the commission and decommission dates are what distinguish instances of a vessel (each with its own platform code) that share the same name/call sign.

@nicolasvanermen
Collaborator

nicolasvanermen commented Jan 7, 2021 via email

@peterdesmet
Collaborator Author

@HjalteParner @neil-ices-dk thanks, then the core issue is: given that we have a historic database that only contains platform/ship names such as Alkor, Alluchio, Alsfeld (BG 16) and no extra metadata (except maybe the country), how do we map these to ShipC codes?

@nicolasvanermen
Collaborator

nicolasvanermen commented Jan 7, 2021 via email

@neil-ices-dk
Member

We've actually done a similar exercise for our trawl survey datasets (@Osanna123 will remember this fondly). In that case we primarily looked at the date ranges of the data associated with the vessel name to map it to the probable instance(s) of the vessel in the platform codes.
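The date-range approach could be sketched roughly as follows. Everything here is an assumption for illustration: the instance list, the codes `XX01`/`XX02`, the vessel name and the dates are invented, and `candidate_codes` is a hypothetical helper, not part of any ICES tool. The idea is to keep only the platform instances whose service period covers the full range of observation dates recorded under that name:

```python
from datetime import date

# Hypothetical platform instances: (code, name, commissioned, decommissioned).
# None as decommission date means the instance is still in service.
INSTANCES = [
    ("XX01", "Example", date(1950, 1, 1), date(1989, 12, 31)),
    ("XX02", "Example", date(1990, 1, 1), None),
]


def candidate_codes(name, first_obs, last_obs, instances=INSTANCES):
    """Return codes whose service period covers the whole observed date range."""
    matches = []
    for code, inst_name, start, end in instances:
        if inst_name.casefold() != name.casefold():
            continue
        if start <= first_obs and (end is None or last_obs <= end):
            matches.append(code)
    return matches


# Data recorded between 1995 and 2003 can only belong to the second instance.
print(candidate_codes("Example", date(1995, 5, 1), date(2003, 8, 1)))
```

An empty result for a name that does exist suggests the data spans more than one instance, in which case the records would need to be split by date before mapping.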

@Osanna123

  • @ices-tools-dev/data-and-information I notice many values for UNKNOWN (like we have too). How can we assign these correctly?

'UNKNOWN' platforms can be mapped to the AAxx codes or the ZZ99 in the SHIPC list.

  • @ices-tools-dev/data-and-information How would we add the 315 unmapped ships? How are codes assigned?

This would have to be a separate exercise. Any additional info would be useful, at least the years with data.
Please note that codes mapped by name should also be verified, as many platforms can bear the same name.

@peterdesmet
Collaborator Author

peterdesmet commented Mar 4, 2021

Discussed this with @nicolasvanermen and @EricStienen. Decided to make this a purely informal field named platform_name with the name of the ship or aviation call sign (see c6c2e0f). It will not be mapped to ShipC, because:

  • This field is purely informal, not used in analysis (unlike platform_type). The reason we leave it in is to provide some context for those who want to use the data, not to group data by ship.
  • Even though a code system was used within ESAS, many values are duplicates (5142 and 2469 for Wilhelmshaven) or listed as unknown. By just using the name we retain more information without having to maintain a list.
  • Of the 456 current values, only 91 (20%) could be easily mapped to ShipC codes. Mapping the other 80% will be very hard because we don't have any extra information other than the name. It doesn't make much sense to introduce those in the ShipC list.
  • Even if we would map to ShipC, we don't gain a lot of information. E.g. ship height, type of vessel, etc. is not available in ShipC: https://vocab.ices.dk/?CodeID=137637

@nicolasvanermen in your export:

  • move field after platform_side (since it is even less important than that one)
  • use the names rather than the integer codes. If name is UNKNOWN leave field empty (it is not required).
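As a rough illustration of the export rule above, assuming the code list is loaded as a dictionary from integer code to name. The helper `platform_name` and the code 9999 are hypothetical, and returning an empty string for a code missing from the list is an assumption, not something decided in this thread:

```python
def platform_name(code, names):
    """Return the platform name for an ESAS code, or "" if unknown or unlisted."""
    name = names.get(code, "")
    return "" if name.strip().upper() == "UNKNOWN" else name


# Codes 2460, 5142 and 2469 are quoted in this thread; 9999 is a placeholder.
names = {
    2460: "Seute Deern",
    5142: "Wilhelmshaven",
    2469: "Wilhelmshaven",
    9999: "UNKNOWN",
}

print(platform_name(5142, names))
print(repr(platform_name(9999, names)))
```

Duplicate codes such as 5142 and 2469 collapse to the same name without any extra merging step.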

@peterdesmet peterdesmet added model: ready Field reviewed and ready to be added to model and removed vocab: use ICES labels Mar 4, 2021
@peterdesmet peterdesmet changed the title Review trips: platform_code Review trips: platform_code (now platform_name) Mar 4, 2021
@Osanna123

The field is not mandatory, but if the field is to be 'informal', it's better to move it to notes.
If the platform is to remain a field in the format, it should be linked to the controlled vocabulary SHIPC.
Considering that platform mapping (and creating missing platforms) is time-consuming, old data could be mapped to the AA-codes in SHIPC, with the more specific information like name/call sign moved to the notes.
Future data submissions can report the exact platform reference.

@Osanna123

List of AA-codes:

| SHIPC code | Description |
| --- | --- |
| AA00 | UNSPECIFIED PLATFORM |
| AA11 | UNSPECIFIED Fixed benthic node |
| AA12 | UNSPECIFIED Sea bed vehicle |
| AA13 | UNSPECIFIED BEACH/INTERTIDAL ZONE STRUCTURE |
| AA14 | UNSPECIFIED LAND/ONSHORE STRUCTURE |
| AA15 | UNSPECIFIED LAND/ONSHORE VEHICLE |
| AA16 | UNSPECIFIED OFFSHORE STRUCTURE |
| AA17 | UNSPECIFIED COASTAL STRUCTURE |
| AA18 | UNSPECIFIED River station |
| AA20 | UNSPECIFIED Submersible |
| AA21 | UNSPECIFIED Propelled manned submersible |
| AA22 | UNSPECIFIED Propelled unmanned submersible |
| AA23 | UNSPECIFIED Towed unmanned submersible |
| AA24 | UNSPECIFIED Drifting manned submersible |
| AA25 | UNSPECIFIED Drifting manned submersible |
| AA26 | UNSPECIFIED Lowered unmanned submersible |
| AA30 | UNSPECIFIED SHIP |
| AA31 | UNSPECIFIED RESEARCH VESSEL |
| AA32 | UNSPECIFIED VESSEL OF OPPORTUNITY |
| AA33 | UNSPECIFIED SELF-PROPELLED SMALL BOAT |
| AA34 | UNSPECIFIED Vessel at fixed position |
| AA35 | UNSPECIFIED VESSEL OF OPPORTUNITY ON FIXED ROUTE |
| AA36 | UNSPECIFIED FISHING VESSEL |
| AA39 | UNSPECIFIED NAVAL VESSEL |
| AA3A | UNSPECIFIED MAN-POWERED SMALL BOAT |
| AA41 | UNSPECIFIED MOORED SURFACE BUOY |
| AA42 | UNSPECIFIED DRIFTING SURFACE FLOAT |
| AA46 | UNSPECIFIED DRIFTING SUBSURFACE PROFILING FLOAT |
| AA61 | UNSPECIFIED RESEARCH AEROPLANE |
| AA67 | UNSPECIFIED HELICOPTER |
| AA71 | UNSPECIFIED HUMAN |
| AA72 | UNSPECIFIED DIVER |
| AA95 | UNSPECIFIED amphibious vehicle self-propelled |

peterdesmet added a commit that referenced this issue Apr 26, 2021
@peterdesmet
Collaborator Author

I have now reviewed the whole mapping list. To be discussed what is the best approach to move forward.

@peterdesmet
Collaborator Author

Decisions after May 11 meeting with @Osanna123 and @nicolasvanermen:

  • We remove codes that are not in use (i.e. were in code list, but not in database)
  • We will add codes to SHIPC that are above a certain use threshold (i.e. recent use or lots of trips). Number to be defined.
  • Codes below the use threshold will be mapped to AA30 UNSPECIFIED SHIP
  • For provenance, the original code + description will be retained in campaign > notes for all data, e.g. original platform code: 2460 Seute Deern
  • @Osanna123 will review code mappings marked with ? and indicate suggestions in remarks Anna. Those ? are codes @peterdesmet could not unambiguously map to an existing SHIPC code.
  • @nicolasvanermen will include steps in his data processing script to split certain codes into two, based on dates (e.g. for Scotia). Those codes will be marked with split in status in the spreadsheet
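A minimal sketch of how the fallback and provenance decisions above might be applied per platform. The `Platform` class and `resolve` function are hypothetical names, and `XX01` is a placeholder code. Deciding which platforms get a real SHIPC code (the use threshold) and splitting codes by date are assumed to happen upstream and are not shown:

```python
from dataclasses import dataclass
from typing import Optional

FALLBACK_CODE = "AA30"  # UNSPECIFIED SHIP, the fallback named in the decisions


@dataclass
class Platform:
    esas_code: int
    description: str
    shipc_code: Optional[str]  # None when no SHIPC code is assigned


def resolve(platform):
    """Return (shipc_code, note) for one ESAS platform.

    The note keeps the original code and description for provenance,
    regardless of whether a specific SHIPC code was found.
    """
    code = platform.shipc_code or FALLBACK_CODE
    note = f"original platform code: {platform.esas_code} {platform.description}"
    return code, note


print(resolve(Platform(2460, "Seute Deern", None)))
```

Keeping the note for mapped platforms as well means the original ESAS code can always be recovered from the notes.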

@peterdesmet
Collaborator Author

Some codes that are not yet mapped (do not have ok) do seem to be used a lot or recently, making them valuable candidates to be added to SHIPC. Here is the number of those codes per use threshold (where two filters apply, they are combined with OR, e.g. >=2000 OR >=20 trips):

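| threshold | no filter | >=20 trips | >=50 trips | >=100 trips |
| --- | --- | --- | --- | --- |
| no filter | 281 | 95 | 47 | 28 |
| >=2000 | 95 | 145 | 114 | 103 |
| >=2005 | 89 | 130 | 93 | 79 |
| >=2010 | 39 | 119 | 76 | 59 |
| >=2015 | 15 | 108 | 62 | 43 |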

@Osanna123 without truly knowing how much work is involved, the >=2010 / >=100 trips combination seems reasonable: 59 codes to add.
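The way the two filters combine can be sketched as below. The field names `last_year` and `trips`, the function `count_candidates` and the three sample records are all made up for illustration; the real counts come from the mapping spreadsheet, which is not reproduced here. A platform counts if it passes the year filter OR the trip filter, and a filter set to `None` is simply not applied:

```python
def count_candidates(platforms, min_year=None, min_trips=None):
    """Count platforms that pass the year filter OR the trip filter."""
    if min_year is None and min_trips is None:
        return len(platforms)  # "no filter" in both dimensions

    def keep(p):
        recent = min_year is not None and p["last_year"] >= min_year
        busy = min_trips is not None and p["trips"] >= min_trips
        return recent or busy

    return sum(1 for p in platforms if keep(p))


# Invented records, only to show how the OR combination behaves.
platforms = [
    {"last_year": 2012, "trips": 5},    # recent, few trips
    {"last_year": 1995, "trips": 120},  # old, many trips
    {"last_year": 1990, "trips": 3},    # old, few trips
]

print(count_candidates(platforms, min_year=2010))
print(count_candidates(platforms, min_trips=100))
print(count_candidates(platforms, min_year=2010, min_trips=100))
```

Because the filters are combined with OR, adding a second filter can only increase the count, which is why the table values grow from the "no filter" column to the ">=20 trips" column within a year row.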

@nicolasvanermen
Collaborator

nicolasvanermen commented May 17, 2021 via email

@peterdesmet
Collaborator Author

"No filter" is maybe a confusing term, but read >=2000 with no filter as 95 ship codes used in or after 2000, while >=2000 with >=20 trips should be interpreted as 145 ship codes used in or after 2000 or with at least 20 trips. Anyway, the important point to decide is how many SHIPC codes are reasonable to add.

@peterdesmet
Collaborator Author

PlatformCode implemented.
