Script to analyse data related to fisheries. These data concerns:

* Target 14.4 (FMSY/F and B/BMSY)
* Target 14.6 (TAC/Catch)
* Target 14.a (SAD/TAC) 

We compare data of three years: 2012, 2016, and 2021

Data comes from several sources:

For all indicators, we need data on Catches by Country: from OfficialNominalCatches and correcting for some species 

Additionally, for each indicator, we use 

* FMSY/F and B/BMSY: Stock Assessment data (ICES file)
* TAC/Catch: TAC data (Carpenter's file or, alternatively, from EC or ICES PDFs)
* SAD/TAC: SAD (Carpenter's file, ICES file, or ICES PDFs) and TAC (Carpenter's file or, alternatively, from EC or ICES PDFs)

If we want to get all indicators with the same data, then merging procedure is: 

Catches by Country <-> Stock Assessment <-> TAC <-> SAD

If we calculate indicators separately, we can do three merges and get:
* Catches by Country <-> Stock Assessment: FMSY/F and B/BMSY
* Catches by Country <-> TAC: TAC/Catch
* Catches by Country <-> TAC <-> SAD: SAD/TAC

Notes from Rudi's meeting
1. Difference between officialNominalCatces and catches from StockAssesment dataset. For what stock do we have large differences? <10% would be alright. 
2. Stock Assesment is not done every year. Rudi: Do they provide a trend for those years that they don't assess? It's odd to use different years for different stocks, but it's possible. 
3. BMSY in ICES. Btrigger Bpa is the same as Btrigger, go for that one. 
4. Effort is used because it's hard to track the catches. Send questions to Rudi


In [1576]:
import os
import pandas as pd
import numpy as np

In [1577]:
pd.set_option('display.max_columns', 500)

In [1578]:
countries=['Belgium','Bulgaria','Cyprus', 'Greece','Germany','Croatia','Italy', 
           'Denmark','Estonia','Spain','Finland','France','Ireland','Lithuania',
           'Latvia','Malta','Netherlands','Poland','Portugal', 'Romania',
           'Sweden','United Kingdom of Great Britain and Northern Ireland', "United Kingdom of GB"]

In [1579]:
country_to_abbrev = {
    "Andorra": "AD",
    "United Arab Emirates": "AE",
    "Afghanistan": "AF",
    "Antigua and Barbuda": "AG",
    "Anguilla": "AI",
    "Albania": "AL",
    "Armenia": "AM",
    "Angola": "AO",
    "Antarctica": "AQ",
    "Argentina": "AR",
    "American Samoa": "AS",
    "Austria": "AT",
    "Australia": "AU",
    "Aruba": "AW",
    "Åland Islands": "AX",
    "Azerbaijan": "AZ",
    "Bosnia and Herzegovina": "BA",
    "Barbados": "BB",
    "Bangladesh": "BD",
    "Belgium": "BE",
    "Burkina Faso": "BF",
    "Bulgaria": "BG",
    "Bahrain": "BH",
    "Burundi": "BI",
    "Benin": "BJ",
    "Saint Barthélemy": "BL",
    "Bermuda": "BM",
    "Brunei Darussalam": "BN",
    "Bolivia (Plurinational State of)": "BO",
    "Bonaire, Sint Eustatius and Saba": "BQ",
    "Brazil": "BR",
    "Bahamas": "BS",
    "Bhutan": "BT",
    "Bouvet Island": "BV",
    "Botswana": "BW",
    "Belarus": "BY",
    "Belize": "BZ",
    "Canada": "CA",
    "Cocos (Keeling) Islands": "CC",
    "Congo, Democratic Republic of the": "CD",
    "Central African Republic": "CF",
    "Congo": "CG",
    "Switzerland": "CH",
    "Côte d'Ivoire": "CI",
    "Cook Islands": "CK",
    "Chile": "CL",
    "Cameroon": "CM",
    "China": "CN",
    "Colombia": "CO",
    "Costa Rica": "CR",
    "Cuba": "CU",
    "Cabo Verde": "CV",
    "Curaçao": "CW",
    "Christmas Island": "CX",
    "Cyprus": "CY",
    "Czechia": "CZ",
    "Germany": "DE",
    "Djibouti": "DJ",
    "Denmark": "DK",
    "Dominica": "DM",
    "Dominican Republic": "DO",
    "Algeria": "DZ",
    "Ecuador": "EC",
    "Estonia": "EE",
    "Egypt": "EG",
    "Western Sahara": "EH",
    "Eritrea": "ER",
    "Spain": "ES",
    "Ethiopia": "ET",
    "Finland": "FI",
    "Fiji": "FJ",
    "Falkland Islands (Malvinas)": "FK",
    "Micronesia (Federated States of)": "FM",
    "Faroe Islands": "FO",
    "France": "FR",
    "Gabon": "GA",
    "United Kingdom of Great Britain and Northern Ireland": "UK", #original is GB, Eurostat uses UK
    "United Kingdom of GB": "GB",
    "Grenada": "GD",
    "Georgia": "GE",
    "French Guiana": "GF",
    "Guernsey": "GG",
    "Ghana": "GH",
    "Gibraltar": "GI",
    "Greenland": "GL",
    "Gambia": "GM",
    "Guinea": "GN",
    "Guadeloupe": "GP",
    "Equatorial Guinea": "GQ",
    "Greece": "EL", #original ir GR, Eurostat uses EL
    "South Georgia and the South Sandwich Islands": "GS",
    "Guatemala": "GT",
    "Guam": "GU",
    "Guinea-Bissau": "GW",
    "Guyana": "GY",
    "Hong Kong": "HK",
    "Heard Island and McDonald Islands": "HM",
    "Honduras": "HN",
    "Croatia": "HR",
    "Haiti": "HT",
    "Hungary": "HU",
    "Indonesia": "ID",
    "Ireland": "IE",
    "Israel": "IL",
    "Isle of Man": "IM",
    "India": "IN",
    "British Indian Ocean Territory": "IO",
    "Iraq": "IQ",
    "Iran (Islamic Republic of)": "IR",
    "Iceland": "IS",
    "Italy": "IT",
    "Jersey": "JE",
    "Jamaica": "JM",
    "Jordan": "JO",
    "Japan": "JP",
    "Kenya": "KE",
    "Kyrgyzstan": "KG",
    "Cambodia": "KH",
    "Kiribati": "KI",
    "Comoros": "KM",
    "Saint Kitts and Nevis": "KN",
    "Korea (Democratic People's Republic of)": "KP",
    "Korea, Republic of": "KR",
    "Kuwait": "KW",
    "Cayman Islands": "KY",
    "Kazakhstan": "KZ",
    "Lao People's Democratic Republic": "LA",
    "Lebanon": "LB",
    "Saint Lucia": "LC",
    "Liechtenstein": "LI",
    "Sri Lanka": "LK",
    "Liberia": "LR",
    "Lesotho": "LS",
    "Lithuania": "LT",
    "Luxembourg": "LU",
    "Latvia": "LV",
    "Libya": "LY",
    "Morocco": "MA",
    "Monaco": "MC",
    "Moldova, Republic of": "MD",
    "Montenegro": "ME",
    "Saint Martin (French part)": "MF",
    "Madagascar": "MG",
    "Marshall Islands": "MH",
    "North Macedonia": "MK",
    "Mali": "ML",
    "Myanmar": "MM",
    "Mongolia": "MN",
    "Macao": "MO",
    "Northern Mariana Islands": "MP",
    "Martinique": "MQ",
    "Mauritania": "MR",
    "Montserrat": "MS",
    "Malta": "MT",
    "Mauritius": "MU",
    "Maldives": "MV",
    "Malawi": "MW",
    "Mexico": "MX",
    "Malaysia": "MY",
    "Mozambique": "MZ",
    "Namibia": "NA",
    "New Caledonia": "NC",
    "Niger": "NE",
    "Norfolk Island": "NF",
    "Nigeria": "NG",
    "Nicaragua": "NI",
    "Netherlands": "NL",
    "Norway": "NO",
    "Nepal": "NP",
    "Nauru": "NR",
    "Niue": "NU",
    "New Zealand": "NZ",
    "Oman": "OM",
    "Panama": "PA",
    "Peru": "PE",
    "French Polynesia": "PF",
    "Papua New Guinea": "PG",
    "Philippines": "PH",
    "Pakistan": "PK",
    "Poland": "PL",
    "Saint Pierre and Miquelon": "PM",
    "Pitcairn": "PN",
    "Puerto Rico": "PR",
    "Palestine, State of": "PS",
    "Portugal": "PT",
    "Palau": "PW",
    "Paraguay": "PY",
    "Qatar": "QA",
    "Réunion": "RE",
    "Romania": "RO",
    "Serbia": "RS",
    "Russian Federation": "RU",
    "Rwanda": "RW",
    "Saudi Arabia": "SA",
    "Solomon Islands": "SB",
    "Seychelles": "SC",
    "Sudan": "SD",
    "Sweden": "SE",
    "Singapore": "SG",
    "Saint Helena, Ascension and Tristan da Cunha": "SH",
    "Slovenia": "SI",
    "Svalbard and Jan Mayen": "SJ",
    "Slovakia": "SK",
    "Sierra Leone": "SL",
    "San Marino": "SM",
    "Senegal": "SN",
    "Somalia": "SO",
    "Suriname": "SR",
    "South Sudan": "SS",
    "Sao Tome and Principe": "ST",
    "El Salvador": "SV",
    "Sint Maarten (Dutch part)": "SX",
    "Syrian Arab Republic": "SY",
    "Eswatini": "SZ",
    "Turks and Caicos Islands": "TC",
    "Chad": "TD",
    "French Southern Territories": "TF",
    "Togo": "TG",
    "Thailand": "TH",
    "Tajikistan": "TJ",
    "Tokelau": "TK",
    "Timor-Leste": "TL",
    "Turkmenistan": "TM",
    "Tunisia": "TN",
    "Tonga": "TO",
    "Turkey": "TR",
    "Trinidad and Tobago": "TT",
    "Tuvalu": "TV",
    "Taiwan, Province of China": "TW",
    "Tanzania, United Republic of": "TZ",
    "Ukraine": "UA",
    "Uganda": "UG",
    "United States Minor Outlying Islands": "UM",
    "United States of America": "US",
    "Uruguay": "UY",
    "Uzbekistan": "UZ",
    "Holy See": "VA",
    "Saint Vincent and the Grenadines": "VC",
    "Venezuela (Bolivarian Republic of)": "VE",
    "Virgin Islands (British)": "VG",
    "Virgin Islands (U.S.)": "VI",
    "Viet Nam": "VN",
    "Vanuatu": "VU",
    "Wallis and Futuna": "WF",
    "Samoa": "WS",
    "Yemen": "YE",
    "Mayotte": "YT",
    "South Africa": "ZA",
    "Zambia": "ZM",
    "Zimbabwe": "ZW",
}
    
# invert the dictionary
abbrev_to_country = dict(map(reversed, country_to_abbrev.items()))

## Load and process data

In [1708]:
# Initial species list (from Rickels et al. 2019)
stockW = ['cod.27.47d20',
 'bss.27.4bc7ad-h',
 'ghl.27.561214',
 'hke.27.8c9a',
 'meg.27.7b-k8abd',
 'hke.27.3a46-8abd',
 'mac.27.nea',
 'san.sa.3r',
 'sol.27.7d',
 'whg.27.47d',
 'pok.27.1-2',
 'cod.2127.1f14',
 'sol.27.20-24',
 'cod.21.1',
 'her.27.3a47d',
 'nop.27.3a4',
 'cod.27.7e-k',
 'lin.27.5a',
 'her.27.irls',
 'her.27.25-2932',
 'ple.27.420',
 'had.27.7b-k',
 'ple.27.7d',
 'ple.27.7a',
 'spr.27.4',
 'had.27.7a',
 'spr.27.22-32',
 'had.27.5a',
 'san.sa.1r',
 'her.27.3031',
 'reb.27.1-2',
 'pok.27.3a46',
 'her.27.1-24a514a',
 'sol.27.8ab',
 'mon.27.78abd',
 'pok.27.6',
 'bss.27.8ab',
 'sol.27.7e',
 'lez.27.4a6a',
 'reg.27.1-2',
 'ldb.27.8c9a',
 'san.sa.4',
 'whg.27.6a',
 'cod.27.21',
 'cod.27.7a',
 'had.27.46a20',
 'lez.27.6b',
 'hom.27.2a4a5b6a7a-ce-k8',
 'ple.27.21-23',
 'cod.27.5a',
 'cap.27.1-2',
 'cod.27.22-24',
 'pok.27.5a',
 'cod.27.1-2',
 'whg.27.7b-ce-k',
 'pra.27.3a4a',
 'mon.27.8c9a',
 'reb.27.1-2',
 'sol.27.7fg',
 'whb.27.1-91214',
 'usk.27.5a14',
 'her.27.5a',
 'her.27.6a7bc',
 'had.27.6b',
 'sol.27.4',
 'san.sa.2r',
 'had.27.1-2',
 'her.27.28',
 'hom.27.9a',
 'sol.27.7a',
 'bli.27.5b67',
 'reg.27.561214',
 'meg.27.8c9a',
 'cod.27.6a',
 'her.27.20-24',
 'pra.27.1-2',
 'ple.27.7fg',
 'her.27.nirs'
 ]


In [1581]:
# https://neweconomics.org/campaigns/landing-the-blame
pd.ExcelFile(("../data/icesTACcomparison.xlsx")).sheet_names

['Menus',
 'Table of contents',
 'ICES advice',
 'Council agreed TAC',
 'Comparison',
 'Table for results',
 'Overall results',
 'Results by Member State',
 'Sea basin',
 'Third country',
 'Results by % difference',
 'Results by # of TACs',
 'Results by # of TACs by MS',
 'Results by species',
 'ID',
 'Matching - ICES-TAC',
 'Matching - Final TACs',
 'Matching - EU share',
 'Matching - TAC split share',
 'Matching - ICES area share']

In [1582]:
pd.read_excel(("../data/icesTACcomparison.xlsx"), 1)

Unnamed: 0,Tab,Description
0,ICES advice,ICES advice by TAC and year
1,Council agreed TAC,Council agreed TAC by Member State and year
2,Comparison,Comparing ICES advice and agreed TACs by Membe...
3,Overall results,Calculates the difference between TACs and ICE...
4,Results by Member State,Calculates the difference between TACs and ICE...
5,Results by % difference,Calculates the difference between TACs and ICE...
6,Results by # of TACs,Calculates the number of TACs that exceed ICES...
7,Results by # of TACs by MS,Calculates the number of TACs that exceed ICES...
8,Results by third country share,Calculates the difference between TACs and ICE...
9,Results by species,Calculates the difference between TACs and ICE...


### StockAssessment

In [1812]:
# https://standardgraphs.ices.dk/stockList.aspx
stockList = []
stockL = ['stockAssesment2020','stockAssesment2021', 'stockAssesment2022']
for i in stockL:
    # read csv and fix tokenizing error, engine='python' turns DtypeWarning off 
    stockTemp = pd.read_csv(("../data/{}/StockAssessment.csv".format(i)), names=range(138), engine='python')
    stockTemp.columns = stockTemp.iloc[0,:]
    stockTemp = stockTemp[1:]
    # extract EN name and acronym of species
    stockTemp['enName'] = stockTemp["StockDescription"].str.extract(r"^(.+?) ?(?:\d|\(|$)" , expand=False)
    stockTemp['speciesAcronym'] = stockTemp["FishStock"].str.extract(r"^([^.]*).*" , expand=False)
    # filter columns and years
    stockTemp = stockTemp.loc[:,['Year','enName', 'speciesAcronym','FishStock','StockKey', 'SpeciesName', 
                    "ICES Areas (splited with character '~')", 'StockSize', 'StockSizeDescription', 'StockSizeUnits',
                      'FishingPressure', 'FishingPressureDescription', 'FishingPressureUnits',
                     'Flim', 'Fpa', 'Blim', 'Bpa', 'FMSY', 'MSYBtrigger', 
                     'CatchesLadingsUnits', 'Landings', 'OfficialLandings', 'Catches',
                     'Report', 'AssessmentKey','AssessmentYear']]
    stockTemp.Year = stockTemp.Year.astype(int)
    stockTemp = stockTemp[stockTemp.Year.isin([2012,2016,2020])]
    stockList.append(stockTemp)
# concatenate all assessments and drop duplicates (keep last)
stock3years = pd.concat(stockList)
stock = stock3years.drop_duplicates(subset=['Year','FishStock'], keep='last').copy()
stock["ICES Areas (splited with character \'~\')"] = stock["ICES Areas (splited with character \'~\')"].fillna("")

In [1584]:
# check what what data is in the custom columns
# stock[~stock.CustomUnits6.isna()].dropna(axis=1)

In [1878]:
# To merge with OfficialCatches database
# Split the areas column and explode to get one row per species-area (as per in CatchesOfficial) 

stock['areasOriginal'] = stock["ICES Areas (splited with character \'~\')"]

stockExplode = (stock.set_index(stock.columns.difference(["ICES Areas (splited with character \'~\')"]).tolist())\
   .apply(lambda x: x.str.split('~').explode())
   .reset_index()) 

stockExplode = stockExplode.rename(columns={"ICES Areas (splited with character \'~\')":'area'}).copy()

stockExplode["area"] = stockExplode["area"].str.strip()

# there are duplicates because of updated reports, we drop them keeping the last version
stockExplode = stockExplode.drop_duplicates(subset=['Year', "area", 'FishStock'], keep='last')

In [1587]:
# delete parent areas to avoid double counting
# solution from https://stackoverflow.com/q/76183612/14534411
areasAll = stockExplode.area.unique().tolist()

def is_parent(p, target):
    return p.startswith(target) and len(p)>len(target) and p[len(target)] == '.'

icesAreas = []
prev = ''
for s in sorted(areasAll)+['']:
    if prev and not is_parent(s, prev):
        icesAreas.append(prev)
    prev = s

stockExplode = stockExplode[(stockExplode.area.isin(icesAreas)) | (stockExplode.area.isna())]

In [1883]:
# ICES areas from Christoph
icesAreasC=pd.DataFrame(["27.1.a","27.1.b","27.1_NK","27.2_NK","27.2.a.1","27.2.a.2","27.2.a_NK","27.2.b.1","27.2.b.2","27.2.b_NK",
            "27.3.a.20","27.3.a.21","27.3.a_NK","27.3_NK","27.3.b.23","27.3.c.22","27.3.d_NK","27.3.d.24","27.3.d.25",
            "27.3.d.26","27.3.d.27","27.3.d.28_NK","27.3.d.29","27.3.d.30","27.3.d.31","27.3.d.32","27.3.d.28.1","27.3.d.28.2",
            "27.4.a","27.4.b","27.4.c","27.4_NK","27.5_NK","27.5.a.1","27.5.a.2","27.5.a_NK","27.5.b.2","27.5.b_NK","27.5.b.1.a",
            "27.5.b.1.b","27.5.b.1_NK","27.6.a","27.6.b_NK","27.6_NK","27.6.b.1","27.6.b.2","27.7.a","27.7.b","27.7.c.1","27.7.c.2",
            "27.7.c_NK","27.7.d","27.7.e","27.7.f","27.7.g","27.7.h","27.7.j.1","27.7.j.2","27.7.j_NK","27.7.k.1","27.7.k.2","27.7.k_NK",
            "27.8.a","27.8.b","27.8.c","27.8.d.1","27.8.d.2","27.8.d_NK","27.8.e.1","27.8.e.2","27.8.e_NK","27.8_NK","27.9.a","27.9_NK",
            "27.9.b.1","27.9.b.2","27.9.b_NK","27.10.a.1","27.10.a.2","27.10.a_NK","27.10.b","27.10_NK","27.12.a.1","27.12.a.2","27.12.a.3",
            "27.12.a.4","27.12.a_NK","27.12.b","27.12.c","27.12_NK","27.14.a","27.14.b.1","27.14.b.2","27.14.b_NK","27.14_NK","27_NK"], columns= ['icesAreas'])

# compare the areas in the stock dataset 
areaStock = pd.DataFrame(stockExplode.area.unique(), columns=['areaStock'])

# merge
areaCompare = areaStock.merge(icesAreasC, left_on='areaStock', right_on='icesAreas', how='outer', indicator=True)
areaCompare[areaCompare['_merge']=='rigth_only'].reset_index(drop=True)

Unnamed: 0,areaStock,icesAreas,_merge


In [1589]:
# Wilfried's contained stock with StockSizeDescription = 
# = SSB|SSB/B45cm|B/Bmsy|Stock Size: Relative|Spawning Stock Biomass|B_index
stockSSB = pd.DataFrame(stockExplode[(stockExplode.StockSizeDescription.str.contains("SSB|SSB/B45cm|B/Bmsy|Stock Size: Relative|Spawning Stock Biomass|B_index", na=False)) | (stockExplode.StockSizeDescription.isna())]
    .drop_duplicates(subset=['FishStock'], keep='last')[['FishStock', 'StockSizeDescription']]).rename(columns={'FishStock':'FishStockSSB'})

stockW = pd.DataFrame(stockW, columns=['FishStockW'])

stockCompare = stockSSB.merge(stockW, left_on='FishStockSSB', right_on='FishStockW', how='outer', indicator=True)
stockCompare[stockCompare['_merge']=='right_only'].reset_index(drop=True)

Unnamed: 0,FishStockSSB,StockSizeDescription,FishStockW,_merge
0,,,cod.21.1,right_only
1,,,spr.27.4,right_only
2,,,reb.27.1-21,right_only
3,,,pok.27.6,right_only
4,,,cod.27.21,right_only
5,,,hom.27.2a4a5b6a7a-ce-k8,right_only


Matching problems:

cod.21.1 and cod.27.21 are in parent areas. 

spr.27.4 is not in the dataset, could be spr.27.3a4 

pok.27.6 is not in the dataset. 27.3a46 

hom.27.2a4a5b6a7a-ce-k8	 has no area


In [1590]:
# keep the species by Wilfried (alternatively, by StockSizeDescription)
stockExplode = stockExplode[stockExplode.FishStock.isin(stockW.FishStockW)]

In [1591]:
# create dict of area-stock
areaStock = stockExplode.groupby(['speciesAcronym','area', 'FishStock']).size().reset_index().rename(columns={0:'count'})
areaStock.head()

Unnamed: 0,speciesAcronym,area,FishStock,count
0,bli,27.5.b.1.a,bli.27.5b67,3
1,bli,27.5.b.1.b,bli.27.5b67,3
2,bli,27.5.b.2,bli.27.5b67,3
3,bli,27.6.a,bli.27.5b67,3
4,bli,27.6.b.1,bli.27.5b67,3


### OfficialNominalCatches

In [1592]:
# https://www.ices.dk/data/dataset-collections/Pages/Fish-catch-and-stock-assessment.aspx
catches = pd.read_csv(("../data\OfficialNominalCatches\ICESCatchDataset2006-2020.csv")) 

In [1593]:
# add country name column
catches['geo'] = catches.Country.map(abbrev_to_country).fillna(catches.Country)

# filter countries of interest. commented as we want to compare total catches from stockAssessment 
# catches = catches[catches.geo.isin(countries)]

# convert Species to lower case
catches.Species = catches.Species.str.lower()

# 2020 has word characters, we divide it into two 
catches[['2020','2020c']] = catches['2020'].str.split(" +",expand = True) 
catches['2020'] = catches['2020'].astype(np.float64)

# keep useful columns
catches = catches.loc[:,['Country','geo','Species','Area','2012','2016','2020']]

In [1594]:
catches = catches.melt(id_vars=['Country', 'geo','Species', 'Area'], 
       var_name='Year', value_name='CatchesCountry')

In [1595]:
catchesPivot = catches.pivot_table( columns='geo', index=['Year','Area', 'Species'] , values='CatchesCountry', aggfunc='sum').reset_index()
catchesPivot['Year'] = catchesPivot['Year'].astype(int)

# only keep the selected species and areas
catchesPivot = catchesPivot[(catchesPivot.Area.isin(list(areaStock.area.unique()))) &\
(catchesPivot.Species.isin(list(areaStock.speciesAcronym.unique())))]

### TAC

In [1596]:
# Can be extracted from the TAC vs Advice dataset or the TAC dataset. The latter has some more rows
tac = pd.read_excel(("../data/icesTACcomparison.xlsx"), 'Council agreed TAC')
# tac = pd.read_csv(("../data/RecordOfEuropeanTAC.csv"))

In [1597]:
# extract acronym of species and convert to lower case 
# (doesn't make sense since acronym doesn't correspond to species acronyms)
# tac["speciesAcronym"] = tac["Reference"].str.extract(r"\(([\w\-]+)" , expand=False).str.lower()

In [1598]:
# filter years of interest. tac level that we care about is TAC
tac = tac[(tac.Year.isin([2012,2016,2020])) & (tac.Level == 'TAC') ]
tac = tac[['Reference', 'TAC ID', 'Species', 'TAC Zone', 'Level',
       'TAC for comparison', 'Year', 'Amendment/Original']]

### ICES advice

In [1685]:
# From ICES official databse https://asd.ices.dk/AdviceList
sadOff = pd.read_csv(("../data/adviceICES_Data_26_04_2023.csv"))
# drop deprecated Advice 
sadOff = sadOff[sadOff['AdviceStatus'] == 'Advice'].copy()

In [1686]:
# transform dates
sadOff[['AdviceApplicableFrom', 'AdviceApplicableUntil']] =\
    sadOff[['AdviceApplicableFrom', 'AdviceApplicableUntil']].apply(pd.to_datetime, format='%d/%m/%Y')

In [1687]:
# drop duplicates based on StockCode and AdviceApplicableFrom (three had duplicates)
sadOff = sadOff.drop_duplicates(subset=['StockCode', 'AdviceApplicableFrom'], keep='last')

In [1688]:
# Create the new column with years between AdviceApplicableFrom and 
# AdviceApplicableUntil = years in which the advice is valid
date_range = lambda x: range(x['AdviceApplicableFrom'].year, x['AdviceApplicableUntil'].year+1)
sadOff = sadOff.assign(year=sadOff.apply(date_range, axis=1)).explode('year', ignore_index=True)

# keep columns of interest
sadOff = sadOff[['year', 'StockCode', 'AdviceValue', 'AdviceType', 'AdviceApplicableFrom', 'AdviceApplicableUntil',
 'AdviceValueUnit', 'AssessmentYear', 'AssessmentKey','AdviceKey', 'AdviceDOI'] ].copy()

# transform dates to year only
sadOff[['AdviceApplicableFrom', 'AdviceApplicableUntil']] =\
    sadOff[['AdviceApplicableFrom', 'AdviceApplicableUntil']].transform(lambda x: x.dt.year) 

In [1689]:
# check stock for which advice AdviceApplicableFrom is not the year before AdviceApplicableUntil
# sadOff.loc[(sadOff['AdviceApplicableFrom']  < sadOff['AdviceApplicableUntil'] - 1)] 

In [1690]:
# In some cases, advice is given for two years, and in the second year a new advice is given. 
# We drop the deprecated for the second year
sadOff = sadOff.drop_duplicates(subset=['StockCode', 'year'], keep='last') 
sadOff[sadOff.duplicated(subset=['StockCode', 'year'], keep=False)] 

Unnamed: 0,year,StockCode,AdviceValue,AdviceType,AdviceApplicableFrom,AdviceApplicableUntil,AdviceValueUnit,AssessmentYear,AssessmentKey,AdviceKey,AdviceDOI


In [1691]:
# example of stock for which AdviceApplicableFrom is not the year before AdviceApplicableUntil
# sadOff[sadOff['StockCode'] == 'whg.27.6b'] 

In [1698]:
sadOff

Unnamed: 0,year,StockCode,AdviceValue,AdviceType,AdviceApplicableFrom,AdviceApplicableUntil,AdviceValueUnit,AssessmentYear,AssessmentKey,AdviceKey,AdviceDOI
0,2020,cod.27.5a,272411,Catches,2020,2020,t,2019,11516,1011,https://doi.org/10.17895/ices.advice.4735
1,2020,had.27.7a,3156,Catches,2020,2020,t,2019,12969,1022,https://doi.org/10.17895/ices.advice.4784
2,2020,ple.27.7a,5640,Catches,2020,2020,t,2019,12868,1024,https://doi.org/10.17895/ices.advice.4798
3,2020,had.27.6b,10472,Catches,2020,2020,t,2019,13108,1025,https://doi.org/10.17895/ices.advice.5589
4,2020,had.27.7b-k,16671,Catches,2020,2020,t,2019,12976,1026,https://doi.org/10.17895/ices.advice.4785
...,...,...,...,...,...,...,...,...,...,...,...
1126,2022,bli.27.5b67,10859,Catches,2022,2022,t,2020,13564,3182,https://doi.org/10.17895/ices.advice.5819
1127,2020,bli.27.5b67,11150,Catches,2020,2020,t,2018,9467,3183,https://doi.org/10.17895/ices.pub.4400
1128,2019,anf.27.3a46,31690,Catches,2019,2019,t,2018,10174,3184,https://doi.org/10.17895/ices.pub.4588
1129,2019,her.27.nirs,6896,Catches,2019,2019,t,2018,9352,3203,https://doi.org/10.17895/ices.pub.4492


In [1670]:
# From Carpenter 
sad = pd.read_excel(("../data/icesTACcomparison.xlsx"), 'ICES advice')
sad = sad[sad.Year.isin([2012,2016,2020])][['ICES code', 'Advice', 'Year',
        'ICES advice', 'Catches corresponding to advice',
       'Landings corresponding to advice','Choices']]

In [1668]:
sad = sad[sad['ICES code'].isin(list(areaStock.FishStock.unique()))]

In [1672]:
sad[sad['Choices']=='Plaice in VIId + Vlle']

Unnamed: 0,ICES code,Advice,Year,ICES advice,Catches corresponding to advice,Landings corresponding to advice,Choices
1551,ple.27.7d + ple.27.7e,12512,2016,MSY approach,36.429,12.512,Plaice in VIId + Vlle


In [1671]:
sad.Choices.unique()

array([nan, 'No matching advice', 'Sum of multiple advice',
       'Assume landings =catches', 'MSY approach (range 10192–29767)',
       'MSY approach. Range 878-1685)', 'Special request',
       'MSY point value. Range 2209-5196.',
       'Measured in catches, not landings',
       'MSY approach. Range 11,418-23,262.',
       'MSY approach. Range 2,333-3,830.', 'Landings not catch',
       'MSY approach. Range 4,694-8,991.', 'Fleet B', 'Fleet D',
       'Fleet C', 'Fleet A', 'Lemon sole and witch', 'High end of range',
       'Add Rockall. MSY approach.', 'MSY approach. Range 13,218-28,838.',
       'MSY approach. Range 357-648 (Lepidorhombus whiffiagonis) and 1,275-2,651 (Lepidorhombus boscii).',
       'First two quarters', 'FU 14,15,16,17,18,20,21,22',
       'FU 6,7,8,9,32', 'FU11, FU12, FU13 and other Via', 'SD 21',
       'SD 21, MSY advice used', 'MSY used', 'North sea split out',
       'Plaice in VIId + Vlle', 'Plaice in VIId + Vlle. MSY approach.',
       'Catches data. Cal

### TAC-SAC Comparison

In [1659]:
sad.Choices.unique()

array([nan, 'No matching advice', 'Sum of multiple advice',
       'Assume landings =catches', 'MSY approach (range 10192–29767)',
       'MSY approach. Range 878-1685)', 'Special request',
       'MSY point value. Range 2209-5196.',
       'Measured in catches, not landings',
       'MSY approach. Range 11,418-23,262.',
       'MSY approach. Range 2,333-3,830.', 'Landings not catch',
       'MSY approach. Range 4,694-8,991.', 'Fleet B', 'Fleet D',
       'Fleet C', 'Fleet A', 'Lemon sole and witch', 'High end of range',
       'Add Rockall. MSY approach.', 'MSY approach. Range 13,218-28,838.',
       'MSY approach. Range 357-648 (Lepidorhombus whiffiagonis) and 1,275-2,651 (Lepidorhombus boscii).',
       'First two quarters', 'FU 14,15,16,17,18,20,21,22',
       'FU 6,7,8,9,32', 'FU11, FU12, FU13 and other Via', 'SD 21',
       'SD 21, MSY advice used', 'MSY used', 'North sea split out',
       'Plaice in VIId + Vlle', 'Plaice in VIId + Vlle. MSY approach.',
       'Catches data. Cal

In [None]:
sad_tac = pd.read_excel(("../data/icesTACcomparison.xlsx"), 'Comparison') 

In [1599]:
# sad_tac[(sad_tac.Species == 'Plaice') & (sad_tac.Year.isin([2012,2016,2020])) & (sad_tac.Level == 'TAC') & (sad_tac['ICES area'] == 4)  ]

## Merge Data
All merges are of Catches with the other datasets

### Merge stockAssessment

In [1610]:
# merge stock and catches data, left or right to check non-matching rows
mStockCatch = stockExplode.merge(catchesPivot, left_on=['speciesAcronym','area','Year'], 
                            right_on=['Species', 'Area', 'Year'], how='outer', indicator='_mergeCatch')

mStockCatch['sumOffCatches'] = mStockCatch[['Belgium', 'China', 'Denmark', 'Estonia', 'Faroe Islands',
       'Finland', 'France', 'Germany', 'Greenland', 'Guernsey', 'Iceland',
       'Ireland', 'Isle of Man', 'Japan', 'Jersey', 'Korea, Republic of',
       'Latvia', 'Lithuania', 'Netherlands', 'Norway', 'Poland', 'Portugal',
       'Russian Federation', 'Spain', 'Sweden', 'Taiwan, Province of China',
       'United Kingdom of GB']].sum(axis=1)
mStockCatch[mStockCatch._mergeCatch == 'right_only']

Unnamed: 0,Year,enName,speciesAcronym,FishStock,StockKey,SpeciesName,ICES Areas (splited with character '~'),StockSize,StockSizeDescription,StockSizeUnits,FishingPressure,FishingPressureDescription,FishingPressureUnits,Flim,Fpa,Blim,Bpa,FMSY,MSYBtrigger,CatchesLadingsUnits,Landings,OfficialLandings,Catches,Report,AssessmentKey,AssessmentYear,fullArea,area,Area,Species,Belgium,China,Denmark,Estonia,Faroe Islands,Finland,France,Germany,Greenland,Guernsey,Iceland,Ireland,Isle of Man,Japan,Jersey,"Korea, Republic of",Latvia,Lithuania,Netherlands,Norway,Poland,Portugal,Russian Federation,Spain,Sweden,"Taiwan, Province of China",United Kingdom of GB,_mergeCatch,sumOffCatches
1510,2012,,,,,,,,,,,,,,,,,,,,,,,,,,,,27.1.a,ghl,,,,0.0,,,0.0,0.0,0.0,,0.0,,,,,,0.0,0.0,,0.00,0.0,,,0.11,,,0.0,right_only,0.11
1511,2012,,,,,,,,,,,,,,,,,,,,,,,,,,,,27.1.a,lin,,,,,,,,,,,,,,,,,,,,1.06,,,,,,,,right_only,1.06
1512,2012,,,,,,,,,,,,,,,,,,,,,,,,,,,,27.1.a,mon,,,,,,,,,,,,,,,,,,,,0.00,,,,,,,,right_only,0.00
1513,2012,,,,,,,,,,,,,,,,,,,,,,,,,,,,27.1.a,ple,,,,,,,,,0.0,,0.0,,,,,,,,,,,,,,,,,right_only,0.00
1514,2012,,,,,,,,,,,,,,,,,,,,,,,,,,,,27.1.a,usk,,,,,,,,,,,0.0,,,,,,,,,0.00,,,,,,,,right_only,0.00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3483,2020,,,,,,,,,,,,,,,,,,,,,,,,,,,,27.9.b.2,lez,,,,,,,,,,,,,,,,,,,,,,,,0.00,,,,right_only,0.00
3484,2020,,,,,,,,,,,,,,,,,,,,,,,,,,,,27.9.b.2,mac,,,,,,,,,,,,,,,,,,,,,,0.0,,,,,,right_only,0.00
3485,2020,,,,,,,,,,,,,,,,,,,,,,,,,,,,27.9.b.2,mon,,,,,,,,,,,,,,,,,,,,,,0.0,,0.00,,,,right_only,0.00
3486,2020,,,,,,,,,,,,,,,,,,,,,,,,,,,,27.9.b.2,pok,,,,,,,,,,,,,,,,,,,,,,0.0,,,,,,right_only,0.00


There's several stock-area combinations that are not in the stockAssessment dataset. That is, there is catches of species in areas that in the stockAssessment are not considered as part of those areas.

For example:

In [1611]:
print(mStockCatch[mStockCatch.Area == '27.1.a']['Species'].unique(),'\n',stockExplode[stockExplode.area=='27.1.a']['FishStock'].unique())

['reg' 'reb' 'pok' 'mac' 'had' 'cap' 'her' 'pra' 'cod' 'ghl' 'lin' 'mon'
 'ple' 'usk'] 
 ['reg.27.1-2' 'reb.27.1-2' 'pok.27.1-2' 'mac.27.nea' 'had.27.1-2'
 'whb.27.1-91214' 'cap.27.1-2' 'her.27.1-24a514a' 'pra.27.1-2'
 'cod.27.1-2']


In [1612]:
print(
len(stockExplode), 
len(catchesPivot), 
len(mStockCatch[mStockCatch._mergeCatch == 'both']),
len(mStockCatch[mStockCatch._mergeCatch == 'left_only']),
len(mStockCatch[mStockCatch._mergeCatch == 'right_only'])
)

1510 3258 1353 157 1978


In [1635]:
# Check difference between total catches from OfficialCatches and StockAssessment
mStockCatch.Catches = mStockCatch.Catches.astype(np.float64)
mStockCatch['diffCatches'] = (mStockCatch.sumOffCatches - mStockCatch.Catches)/mStockCatch.Catches
mStockCatch.to_csv('../dataTemp/stockCatch.csv')
mStockCatch['diffCatches'].describe()

count    1141.000000
mean             inf
std              NaN
min        -1.000000
25%        -1.000000
50%        -0.999222
75%        -0.923914
max              inf
Name: diffCatches, dtype: float64

### Merge TAC
To merge the TAC, we use the dictionary by Carpenter. 

In [1638]:
dictTAC = pd.read_excel(("../data/icesTACcomparison.xlsx"), 'Matching - ICES-TAC') 
# filter years and melt
dictTAC = dictTAC[['TAC ID', 2012, 2016, 2020]]
dictTAC = dictTAC.melt(id_vars=['TAC ID'], var_name='Year', value_name='FishStock').copy()
# explode and then merge with with original to know if TAC applies to more than one stock
dictTACexplode = dictTAC.set_index(['TAC ID', 'Year']).apply(lambda x: x.str.split('+').explode()).reset_index()
dictTACexplode = dictTACexplode.apply(lambda x: x.str.strip() if x.dtype == "object" else x)
dictTAC = dictTAC.merge(dictTACexplode, on=['TAC ID', 'Year'])
#rename all columns of dictTAC
dictTAC.columns = ['TAC ID', 'Year', 'FishStockTAC', 'FishStock']

In [1639]:
# convert Year to float (Year in mStockCatchSADoff is float because of NaNs)
dictTAC.Year = dictTAC.Year.astype(np.float64)

In [1640]:
# filter stock from list
dictTAC = dictTAC[dictTAC.FishStock.isin(list(areaStock.FishStock.unique()))]

In [1641]:
dictTAC = dictTAC.groupby(['TAC ID', 'FishStock', 'FishStockTAC']).size().reset_index().rename(columns={0:'count'})

In [1642]:
dictTAC = dictTAC.merge(areaStock, on = 'FishStock', how='outer', indicator=True)
dictTAC.to_csv('../dataTemp/dictTACstockArea.csv')

In [1643]:
dictTAC = dictTAC[['TAC ID', 'FishStock', 'FishStockTAC','area', 'speciesAcronym']]
tacCatches =  tac.merge(dictTAC, on=['TAC ID'], how='outer', indicator=True)
# tacCatches[tacCatches._merge == 'right_only']


In [1644]:
# drop TAC without fishstock
tacCatches = tacCatches[tacCatches._merge.str.contains('right_only|both')]

In [1645]:
catchesPivot['Year'] = catchesPivot['Year'].astype(np.float64)
tacCatches = tacCatches.merge(catchesPivot, left_on=['Year', 'area','speciesAcronym'],\
 right_on=['Year', 'Area', 'Species',], how='outer', indicator='tacCatches')
tacCatches.to_csv('../dataTemp/tacCatches.csv')

If we want to merge stockAssessment - TAC - Catches:

In [169]:
# drop non-matching rows from previous merge
mStockCatch = mStockCatch[mStockCatch._mergeCatch == 'both'].copy()

mStockCatchTAC = mStockCatch.merge(dictTAC, on=['FishStock', 'Year'], how='outer', indicator='_mergeDictTAC')

In [170]:
mStockCatchTAC = mStockCatchTAC.merge(tac, on=['TAC ID', 'Year'], how='left', indicator='_mergeTAC')

In [172]:
print(
len(mStockCatch), 
len(tac), 
len(mStockCatchTAC[mStockCatchTAC._mergeDictTAC == 'both']),
len(mStockCatchTAC[mStockCatchTAC._mergeDictTAC == 'left_only']),
len(mStockCatchTAC[mStockCatchTAC._mergeDictTAC == 'right_only'])
)

2787 662 4489 1277 399


### Merge SAD (Official)

In [1646]:
# filter list of stock
sadOff = sadOff[sadOff.StockCode.isin(list(areaStock.FishStock.unique()))]
sadOff.year = sadOff.year.astype(np.float64).copy()
sadOff = sadOff[sadOff.year.isin([2012,2016,2020])]
# merge with TAC based on year and stock
sadOffTACcatches = sadOff.merge(tacCatches, left_on=['StockCode', 'year'], right_on =['FishStock', 'Year'], \
how='outer', indicator='_mergeSADTAC')
sadOffTACcatches.to_csv('../dataTemp/sadOffTACcatches.csv')

If we want to merge with stockAssessment as well

In [1647]:
# drop non-matching rows from previoys merge 
mStockCatchTAC = mStockCatchTAC[mStockCatchTAC._mergeCatch == 'both'].copy()

sadOff.year = sadOff.year.astype(np.float64)

# merge with SAD using StockCode and year
mStockCatchTACSADoff = mStockCatchTAC.merge(sadOff, left_on=['FishStock','Year'],
                                             right_on=['StockCode', 'year'],
                                              how='outer', indicator='_mergeSADoff')

In [1648]:
print(
len(mStockCatch), 
len(sadOff), 
len(mStockCatchTACSADoff[mStockCatchTACSADoff._mergeSADoff == 'both']),
len(mStockCatchTACSADoff[mStockCatchTACSADoff._mergeSADoff == 'left_only']),
len(mStockCatchTACSADoff[mStockCatchTACSADoff._mergeSADoff == 'right_only']),
)

3488 54 1204 4562 0


In [1649]:
exportMergeAll = mStockCatchTACSADoff[(mStockCatchTACSADoff._mergeSADoff == 'both') & (mStockCatchTACSADoff._mergeTAC == 'both')]	
exportMergeAll.to_csv('..\dataTemp\StockCatchSADoffTAC.csv', index=False)

### Merge SAD (Carpenter)

In [None]:
sad.year = sad.year.astype(np.float64)

# merge with SAD using StockCode and year
mStockCatchTACsad = mStockCatchTAC.merge(sad, left_on=['FishStock','Year'],
                                             right_on=['ICES code', 'Year'],
                                              how='outer', indicator='_mergeSAD')

In [None]:
print(
len(mStockCatch), 
len(sadOff), 
len(mStockCatchTACSADoff[mStockCatchTACSADoff._mergeSADoff == 'both']),
len(mStockCatchTACSADoff[mStockCatchTACSADoff._mergeSADoff == 'left_only']),
len(mStockCatchTACSADoff[mStockCatchTACSADoff._mergeSADoff == 'right_only']),
)

2787 834 1439 4373 730


In [None]:
exportMergeAll = mStockCatchTACSADoff[(mStockCatchTACSADoff._mergeSADoff == 'both') & (mStockCatchTACSADoff._mergeTAC == 'both')]	
exportMergeAll.to_csv('..\dataTemp\StockCatchSADoffTAC.csv', index=False)

## Indicators calculation

### TAC/SAD

## Trash

In [None]:
sepAreas = stock["ICES Areas (splited with character '~')"].str.split('~', expand=True)
sepAreas = sepAreas.apply(lambda x: x.str.strip() if x.dtype == "object" else x)

In [None]:
list1 = stock["ICES Areas (splited with character '~')"].to_list()

newList= []
for item in list1:
    if(str(item) != "nan"):
        newList.append(item)

max_len = -1
for ele in newList: 
    if(len(ele) > max_len): 
        max_len = len(ele) 
        res = ele 
        
print("Longest String is : ", res) 

Longest String is :  27.1.a ~ 27.1.b ~ 27.10.a.1 ~ 27.10.a.2 ~ 27.10.b ~ 27.12.a.1 ~ 27.12.a.2 ~ 27.12.a.3 ~ 27.12.a.4 ~ 27.12.b ~ 27.12.c ~ 27.14.a ~ 27.14.b.1 ~ 27.14.b.2 ~ 27.2.a.1 ~ 27.2.a.2 ~ 27.2.b.1 ~ 27.2.b.2 ~ 27.3.a ~ 27.3.b.23 ~ 27.3.c.22 ~ 27.3.d.24 ~ 27.3.d.25 ~ 27.3.d.26 ~ 27.3.d.27 ~ 27.3.d.28.1 ~ 27.3.d.28.2 ~ 27.3.d.29 ~ 27.3.d.30 ~ 27.3.d.31 ~ 27.3.d.32 ~ 27.4.a ~ 27.4.b ~ 27.4.c ~ 27.5.a.1 ~ 27.5.a.2 ~ 27.5.b.1.a ~ 27.5.b.1.b ~ 27.5.b.2 ~ 27.6.a ~ 27.6.b.1 ~ 27.6.b.2 ~ 27.7.a ~ 27.7.b ~ 27.7.c.1 ~ 27.7.c.2 ~ 27.7.d ~ 27.7.e ~ 27.7.f ~ 27.7.g ~ 27.7.h ~ 27.7.j.1 ~ 27.7.j.2 ~ 27.7.k.1 ~ 27.7.k.2 ~ 27.8.a ~ 27.8.b ~ 27.8.c ~ 27.8.d.1 ~ 27.8.d.2 ~ 27.8.e.1 ~ 27.8.e.2 ~ 27.9.a ~ 27.9.b.1 ~ 27.9.b.2


In [None]:
pleStock = stock[stock.FishStock == 'san.sa.1r']
pleStock = pleStock["ICES Areas (splited with character '~')"].str.split('~', expand=True).iloc[0,:].to_list()
pleStock = [s.strip() for s in pleStock]

In [None]:
denmarkPle = catches[(catches['Area'].isin(pleStock))  & (catches['Species'] == 'san')].copy()
denmarkPle['2020'] = denmarkPle['2020'].astype(np.float64)
denmarkPle

KeyError: '2020'