# 04 - Create list of pests for the crops in the selected area

### Description
The goal of this notebook is to:
- create a dataset with the list of possible pests that affect the crops of the selected area
- provide an interface for the selection of the pests (not implemented yet)


### Inputs

- List of crops (EPPO codes) in the area of interest, created by step 02

### External resources

- External API service: EPPO API service, to provide data on which pests affect which crop

### Outputs

- dataset with pests for each crop, including pest taxonomy: eppo_crop_pest_data.csv 
- list with common names of the pests: pest_common.csv

In [1]:
# import modules
import pandas as pd
import requests
%matplotlib inline

## 1. Read list of crops

In step 02 of the workflow, we created a list of the crops occurring in the area of interest, from the crop classification done by CROPLandCOS service. We added the EPPO code for each crop to that list.

In [2]:
# read the list of crops
in_file = '../process_data/crop_data.csv'
df = pd.read_csv(in_file)

# remove nan
df = df[df['EPPO_code'].notna()]
crops = df[['EPPO_code']]

# resolve multivalues
crops = crops['EPPO_code'].str.split(",")
crops = crops.explode()  # Series.explode takes no column name
crops = crops.unique()
crops
#crops = ['ZEAMX', 'MEDSA']

array(['ZEAMX', '1GOSG', 'ORYSA', 'SORVU', 'HELAN', 'ZEAMS', 'ZEAME',
       '1MENG', 'HORVX', 'TRZDU', 'TRZAX', 'SECCE', 'AVESA', 'CAUTI',
       'BRSJU', 'MEDSA', 'BEAVX', 'SOLTU', 'IPOBA', 'CITLA', 'ALLCE',
       'CUMSA', 'CIEAR', 'PIBSX', 'LYPES', 'PRNSS', 'PRNPS', 'MABSS',
       'VITSS', 'CIDSS', 'CYAIL', 'PRDU', 'IUGSS', 'PYUCO', 'NNNWW',
       'PIAVE', 'TTLRI', 'DAUCA', 'ALLSA', 'CUMMC', 'PRNDO', 'OLVEU',
       'CIDSI', 'CUMI', 'BRSOK', 'PEBAM', 'CPSAN', 'PUNGR', 'PRNPN',
       'BRSOA', 'FRAAN', 'CUUPE', 'VICSA', 'LACSA', 'VACSS'], dtype=object)
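
The multi-value handling above can be tried on a small table. This is a minimal sketch: the `toy` rows are illustrative and do not come from `crop_data.csv`, and the extra `str.strip()` call is an assumption that some cells may contain spaces after the comma.

```python
import pandas as pd

# Toy stand-in for crop_data.csv; the rows are illustrative only.
toy = pd.DataFrame({
    'crop': ['maize', 'cotton', 'unclassified', 'citrus'],
    'EPPO_code': ['ZEAMX', '1GOSG', None, 'CIDSS, CIDSI'],
})

codes = (
    toy['EPPO_code']
    .dropna()          # drop crops without an EPPO code
    .str.split(',')    # 'CIDSS, CIDSI' -> ['CIDSS', ' CIDSI']
    .explode()         # one row per code
    .str.strip()       # remove stray spaces around each code
    .unique()          # keep each code once, in order of appearance
)
print(list(codes))
```

Chaining the steps keeps the intermediate Series out of the namespace, which makes it harder to reuse a half-processed `crops` variable by accident.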

## 2. Get data from EPPO API service

We will use the [EPPO Global Database](https://gd.eppo.int/) to identify which pests are known for each crop. Access to the EPPO database API requires a token, which you can obtain after registering at https://data.eppo.int/.
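
One way to keep the token out of the notebook is to read it from an environment variable and build the request URL in a small helper. This is only a sketch: `EPPO_API_TOKEN` and `build_url` are hypothetical names chosen for this example, and the URL layout simply mirrors the one used in the cells of this notebook.

```python
import os
from urllib.parse import urlencode

BASE_URL = 'https://data.eppo.int/api/rest/1.0/taxon/'


def build_url(eppocode, service, token):
    """Return the request URL for one taxon and one service such as '/pests'."""
    query = urlencode({'authtoken': token})  # escapes the token if needed
    return f"{BASE_URL}{eppocode}{service}?{query}"


# EPPO_API_TOKEN is a hypothetical variable name, not an EPPO convention.
token = os.environ.get('EPPO_API_TOKEN', 'YOUR_TOKEN')
print(build_url('ZEAMX', '/pests', token))
```

With this approach the token never appears in saved cell outputs, so the notebook can be shared safely.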

In [3]:
# define EPPO API service parameters
base_url = 'https://data.eppo.int/api/rest/1.0/taxon/'
service = '/pests'
token = ''  # set in the next cell; do not hard-code a personal token

Insert your EPPO database token here:

In [4]:
# the token is a personal credential, so it is not echoed back
token = input('Please enter the EPPO API token: ')


The next step gathers all pests in the EPPO database for which one of the crops is recorded as 'Major host' or 'Host'.

In [5]:
%%time
# Identifies pests for which the crop is recorded as 'Major host' or 'Host'.

concat_dfs = []

for crop in crops:
    url = base_url+crop+service+'?authtoken='+token
    response = requests.get(url)
    data = response.json()
    if not isinstance(data, dict):
        continue
    # collect this crop's records only, so rows from a previous crop are never reused
    frames = [pd.DataFrame.from_dict(data[label])
              for label in ('Major host', 'Host') if label in data]
    if frames:
        df = pd.concat(frames, axis=0, ignore_index=True, sort=True)
        df['crop_eppocode'] = crop
        concat_dfs.append(df)

df_hosts = pd.concat(concat_dfs)
df_hosts.head()

CPU times: user 4.19 s, sys: 43.7 ms, total: 4.23 s
Wall time: 18.3 s


,eppocode,fullname,idclass,labelclass,crop_eppocode
0,MIRMV0,Alphanucleorhabdovirus zeairanense,1,Major host,ZEAMX
1,APLOBE,Aphelenchoides besseyi,1,Major host,ZEAMX
2,COCHCA,Bipolaris zeicola,1,Major host,ZEAMX
3,BUSSFU,Busseola fusca,1,Major host,ZEAMX
4,PHYPSO,'Candidatus Phytoplasma solani',1,Major host,ZEAMX
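
The host-category filtering can also be written as a small pure-Python helper, which makes it easy to test without calling the API. This is a sketch under assumptions: `hosts_to_rows` is a hypothetical name, and `sample` is a hand-made payload shaped like the table above (including a made-up `'Other category'` key), not a real API response.

```python
def hosts_to_rows(data, crop, labels=('Major host', 'Host')):
    """Flatten one /pests response into a list of row dictionaries."""
    rows = []
    if not isinstance(data, dict):
        return rows  # empty or unexpected payload: nothing to add
    for label in labels:
        for record in data.get(label, []):
            # copy the record and tag it with the crop it was requested for
            rows.append({**record, 'crop_eppocode': crop})
    return rows


# Hypothetical payload; only the two 'Major host' records mirror the table above.
sample = {
    'Major host': [
        {'eppocode': 'BUSSFU', 'fullname': 'Busseola fusca',
         'idclass': 1, 'labelclass': 'Major host'},
        {'eppocode': 'COCHCA', 'fullname': 'Bipolaris zeicola',
         'idclass': 1, 'labelclass': 'Major host'},
    ],
    'Other category': [{'eppocode': 'PLACEHOLDER'}],  # ignored by the helper
}

rows = hosts_to_rows(sample, 'ZEAMX')
print([row['eppocode'] for row in rows])
```

The resulting list of dictionaries can be passed directly to `pd.DataFrame(rows)` once all crops have been processed.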


In [6]:
# creates and saves a dataframe with crop pests by removing duplicates
pest_codes = df_hosts[['eppocode', 'crop_eppocode']].drop_duplicates()
pest_codes.to_csv('../process_data/pest_codes.csv')

In [7]:
# create a unique list of pest EPPO codes to get their taxonomy
codes = pest_codes['eppocode'].unique()
len(codes)

920

A total of 920 pests were identified for the crops in the area of interest.

In [8]:
%%time
# Get the taxonomy for each pest. This is slow: about 5 minutes of wall time for the 920 codes of this run
group_n = 1
service = '/taxonomy'
tax = pd.DataFrame()


concatenated_dfs = []

for code in codes:
    url = base_url+code+service+'?authtoken='+token
    # print(url)
    response = requests.get(url)
    data = response.json()
    if data is not None:
        tax = pd.DataFrame.from_dict(data)
        tax['kingdom'] = tax['prefname'][0]
        tax['group'] = group_n
        tax['pest_eppocode'] = code
        concatenated_dfs.append(tax)
        group_n += 1

df_tax = pd.concat(concatenated_dfs)

# retain only pests that belong to the animal kingdom
df_tax = df_tax[df_tax['kingdom'] == 'Animalia']

CPU times: user 1min 8s, sys: 967 ms, total: 1min 9s
Wall time: 5min
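
Because the loop above issues one request per pest code and takes several minutes, it can help to make it resumable. The following is a minimal sketch, not part of the original workflow: `fetch_all` is a hypothetical helper, and `fake_fetch` stands in for the real HTTP call so that the example runs offline.

```python
import time


def fetch_all(codes, fetch, cache=None, pause=0.0):
    """Call fetch(code) for every code missing from cache.

    Returns the (cache, failed) pair, where failed lists (code, error message).
    """
    cache = {} if cache is None else cache
    failed = []
    for code in codes:
        if code in cache:
            continue  # already downloaded in an earlier run
        try:
            cache[code] = fetch(code)
        except Exception as exc:  # keep going, report the failure at the end
            failed.append((code, str(exc)))
        time.sleep(pause)  # optional delay between requests
    return cache, failed


def fake_fetch(code):
    """Offline stand-in for the real request; fails for one code on purpose."""
    if code == 'BAD':
        raise ValueError('boom')
    return {'code': code}


cache, failed = fetch_all(['APLOBE', 'BAD', 'HETDZE'], fake_fetch,
                          cache={'APLOBE': {'code': 'APLOBE'}})
print(sorted(cache), failed)
```

In the notebook, `fetch` could be a function that wraps `requests.get(url).json()` for the `/taxonomy` service, and the cache could be saved to disk between runs.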


In [9]:
# if you have previously saved the outputs of the previous steps, you can uncomment the lines below
# to read them from file and skip those steps:

# pest_codes = pd.read_csv('../process_data/pest_codes.csv', index_col=0)
# df_tax = pd.read_csv('../process_data/pest_taxonomy.csv', index_col=0)

In [10]:
# save pest taxonomy
df_tax.to_csv('../process_data/pest_taxonomy.csv')


## 3. Create table with main taxonomic levels for each pest

This step prepares a table with the higher taxonomic levels of each pest. The table is needed to determine whether birds potentially feed on these pests, because insects taken as prey by birds are often reported at a higher taxonomic rank rather than at species level.

In [11]:
# create a table with the main taxonomic ranks (kingdom to species) for each pest

# group the pests
pest_tax_group = df_tax.groupby(['group'])

# determine the number of levels for each pest
df4 = pest_tax_group.size()
df5 = df4.to_frame()
df5.rename(columns={0:'group_size'}, inplace=True)

# create a column with the number of the group
df5.reset_index(inplace=True)

# merge the group with the taxonomy
df_tax1 = pd.merge(df_tax, df5, left_on=['group'], right_on=['group'])

# make a pivot for taxonomy, when levels are equal to 7
df_tax2 = df_tax1[df_tax1['group_size'] == 7]
df_tax3 = df_tax2.pivot(index='group', columns='level', values=['eppocode', 'prefname'])

# rename columns
df_tax3.rename(columns={1:'kingdom', 2:'phylum', 3:'class', 4:'order', 5:'family', 6:'genus', 7:'species'}, 
               inplace=True)
df_tax3.columns = [' '.join(col).strip() for col in df_tax3.columns.values]

# add columns for missing taxon ranks
df_tax3.insert(loc = 2, column = 'eppocode subphylum', value = '')
df_tax3.insert(loc = 10, column = 'prefname subphylum', value = '')
df_tax3.insert(loc = 5, column = 'eppocode suborder', value = '')
df_tax3.insert(loc = 14, column = 'prefname suborder', value = '')

# make a pivot for taxonomy, when levels are equal to 8
df_tax2 = df_tax1[df_tax1['group_size'] == 8]
df_tax4 = df_tax2.pivot(index='group', columns='level', values=['eppocode', 'prefname'])
df_tax4.rename(columns={1:'kingdom', 2:'phylum', 3:'subphylum', 4:'class', 5:'order', 6:'family', 
                        7:'genus', 8:'species'}, 
               inplace=True)
df_tax4.columns = [' '.join(col).strip() for col in df_tax4.columns.values]
df_tax4.insert(loc = 5, column = 'eppocode suborder', value = '')
df_tax4.insert(loc = 14, column = 'prefname suborder', value = '')

# make a pivot for taxonomy, when levels are equal to 9
df_tax2 = df_tax1[df_tax1['group_size'] == 9]
df_tax5 = df_tax2.pivot(index='group', columns='level', values=['eppocode', 'prefname'])
df_tax5.rename(columns={1:'kingdom', 2:'phylum', 3:'subphylum', 4:'class', 5:'order', 6:'suborder', 
                        7:'family', 8:'genus', 9:'species'}, 
               inplace=True)
df_tax5.columns = [' '.join(col).strip() for col in df_tax5.columns.values]

frames = [df_tax3, df_tax4, df_tax5]
tax_table = pd.concat(frames)

# save to csv
tax_table.to_csv('../process_data/tax_table.csv')

tax_table


group,eppocode kingdom,eppocode phylum,eppocode subphylum,eppocode class,eppocode order,eppocode suborder,eppocode family,eppocode genus,eppocode species,prefname kingdom,prefname phylum,prefname subphylum,prefname class,prefname order,prefname suborder,prefname family,prefname genus,prefname species
2,1ANIMK,1NEMAP,,1CHROC,1RHABO,,1APLOF,1APLOG,APLOBE,Animalia,Nematoda,,Chromadorea,Rhabditida,,Aphelenchoididae,Aphelenchoides,Aphelenchoides besseyi
22,1ANIMK,1NEMAP,,1CHROC,1RHABO,,1HETEF,1HETDG,HETDZE,Animalia,Nematoda,,Chromadorea,Rhabditida,,Heteroderidae,Heterodera,Heterodera zeae
40,1ANIMK,1NEMAP,,1CHROC,1RHABO,,1HETEF,1PUNCG,PUNCCH,Animalia,Nematoda,,Chromadorea,Rhabditida,,Heteroderidae,Punctodera,Punctodera chalcoensis
77,1ANIMK,1NEMAP,,1CHROC,1RHABO,,1ANGUF,1DITYG,DITYDE,Animalia,Nematoda,,Chromadorea,Rhabditida,,Anguinidae,Ditylenchus,Ditylenchus destructor
78,1ANIMK,1NEMAP,,1CHROC,1RHABO,,1ANGUF,1DITYG,DITYDI,Animalia,Nematoda,,Chromadorea,Rhabditida,,Anguinidae,Ditylenchus,Ditylenchus dipsaci
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
893,1ANIMK,1ARTHP,1HEXAQ,1INSEC,1HEMIO,1STERR,1DIASF,1AONDG,AONDOR,Animalia,Arthropoda,Hexapoda,Insecta,Hemiptera,Sternorrhyncha,Diaspididae,Aonidiella,Aonidiella orientalis
895,1ANIMK,1ARTHP,1HEXAQ,1INSEC,1HEMIO,1STERR,1PSEUF,1RIPLG,RHIOHI,Animalia,Arthropoda,Hexapoda,Insecta,Hemiptera,Sternorrhyncha,Pseudococcidae,Ripersiella,Ripersiella hibisci
896,1ANIMK,1ARTHP,1HEXAQ,1INSEC,1COLEO,1CURCF,1CURCS,1ANTHG,ANTHBI,Animalia,Arthropoda,Hexapoda,Insecta,Coleoptera,Curculionidae,Curculioninae,Anthonomus,Anthonomus bisignifer
897,1ANIMK,1ARTHP,1HEXAQ,1INSEC,1COLEO,1CURCF,1CURCS,1ANTHG,ANTHSI,Animalia,Arthropoda,Hexapoda,Insecta,Coleoptera,Curculionidae,Curculioninae,Anthonomus,Anthonomus signatus
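
The long-to-wide pivot used above can be illustrated on a single pest. This is a sketch of the 7-level case only: the values are copied from group 2 in the table above, and the `ranks` mapping repeats the assumption made in the cell that a 7-level taxonomy has no subphylum and no suborder.

```python
import pandas as pd

# Long-format taxonomy for one pest (group 2), values taken from the table above.
long_tax = pd.DataFrame({
    'group': [2] * 7,
    'level': list(range(1, 8)),
    'eppocode': ['1ANIMK', '1NEMAP', '1CHROC', '1RHABO',
                 '1APLOF', '1APLOG', 'APLOBE'],
    'prefname': ['Animalia', 'Nematoda', 'Chromadorea', 'Rhabditida',
                 'Aphelenchoididae', 'Aphelenchoides', 'Aphelenchoides besseyi'],
})

# Assumed meaning of each level when a pest has exactly 7 levels.
ranks = {1: 'kingdom', 2: 'phylum', 3: 'class', 4: 'order',
         5: 'family', 6: 'genus', 7: 'species'}

# One row per pest, one column per (value, level) pair.
wide = long_tax.pivot(index='group', columns='level',
                      values=['eppocode', 'prefname'])

# Flatten the two-level column index into names such as 'prefname order'.
wide.columns = [f'{value} {ranks[level]}' for value, level in wide.columns]

print(wide.loc[2, 'prefname order'])
```

Naming the columns from the `(value, level)` pairs directly avoids depending on the column order, which the `insert(loc=...)` calls in the cell above rely on.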


In [12]:
# merge taxonomy with crop pests and remove duplicates
result = pd.merge(tax_table, pest_codes, left_on=['eppocode species'], right_on=['eppocode'], how='left')
result = result.drop_duplicates()

In [13]:
# save to csv
result.to_csv('../process_data/eppo_crop_pest_data.csv')

## 4. Create a list of pests to present to the farmer

Get the common names of the pests from EPPO to present to the farmer, who can then confirm whether each pest is present in the region.

In [14]:
pests = result['eppocode species'].unique()

In [15]:
%%time
# Get the common name for each pest. This can take about 4 minutes for a list of 2000 pest codes

service = '/names'
df = pd.DataFrame()


concatenated_dfs = []

for pest in pests:
    url = base_url+pest+service+'?authtoken='+token
    # print(url)
    response = requests.get(url)
    data = response.json()
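    if data is not None:
        species = data[0]['fullname']
        print(f'Species:. {species}')
        # the second entry is used as the common name; as the output shows,
        # it can also be a synonym or a non-English name
        if len(data) > 1:
            common = data[1]['fullname']
        else:
            common = ""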
        print(f'Common: {common}')
        values = {'pest_eppocode': [pest], 'scientific_name': [species], 
                   'common_name': [common]}
        df = pd.DataFrame.from_dict(values)
        concatenated_dfs.append(df)

df_common = pd.concat(concatenated_dfs)
df_common.head()

Species:. Aphelenchoides besseyi
Common: white-tip nematode
Species:. Heterodera zeae
Common: corn cyst nematode
Species:. Punctodera chalcoensis
Common: Mexican corn cyst nematode
Species:. Ditylenchus destructor
Common: kartoffelradnematod
Species:. Ditylenchus dipsaci
Common: stængelnematod
Species:. Heterodera elachista
Common: Japanese cyst nematode
Species:. Meloidogyne chitwoodi
Common: Columbia root-knot nematode
Species:. Meloidogyne ethiopica
Common: root-knot nematode
Species:. Meloidogyne graminicola
Common: Rice root‐knot nematode
Species:. Meloidogyne luci
Common: nemátode das galhas do arroz
Species:. Trichodorus viruliferus
Common: stubby root nematode
Species:. Tylenchorhynchus claytoni
Common: stunt nematode
Species:. Rotylenchulus reniformis
Common: reniform nematode
Species:. Heterodera oryzae
Common: Reiszystenälchen
Species:. Hirschmanniella imamuri
Common: 
Species:. Hirschmanniella oryzae
Common: Reiswurzelälchen
Species:. Hirschmanniella spinicaudata
Common: ro

Species:. Sesamia inferens
Common: violetter Stengelbohrer
Species:. Thrips setosus
Common: tobacco thrips
Species:. Trichispa sericea
Common: afrikanisches Reisigelkäferchen
Species:. Stenodiplosis sorghicola
Common: amerikanische Hirsegallmücke
Species:. Chloridea virescens
Common: amerikanische Tabakknospeneule
Species:. Liriomyza sativae
Common: cabbage leaf miner
Species:. Nemorimyza maculosa
Common: amerikanischer Blattminierer
Species:. Strauzia longipennis
Common: peacock fly
Species:. Brevipalpus yothersi
Common: Brevipalpus amicus
Species:. Choristoneura rosaceana
Common: oblique-banded leaf roller
Species:. Epitrix tuberis
Common: tuber flea beetle
Species:. Frankliniella occidentalis
Common: Blütenthrips
Species:. Liriomyza trifolii
Common: Floridaminierfliege
Species:. Delia coarctata
Common: brakflue
Species:. Contarinia tritici
Common: alm hvedemyg
Species:. Mayetiola destructor
Common: hessisk flue
Species:. Sitodiplosis mosellana
Common: hvedegalmyg
Species:. Epilachna

Species:. Chrysobothris mali
Common: Pacific flat-headed borer
Species:. Cydia pomonella
Common: æblevikler
Species:. Gymnandrosoma aurantianum
Common: citrus fruit borer
Species:. Oemona hirta
Common: lemon tree borer
Species:. Operophtera brumata
Common: lille frostmåler
Species:. Rhagoletis pomonella
Common: æbleflue
Species:. Rhagoletis suavis
Common: walnut husk maggot
Species:. Saperda candida
Common: Rundköpfiger Apfelbaumbohrer
Species:. Tetranychus fijiensis
Common: 
Species:. Trichoferus campestris
Common: mulberry longhorn beetle
Species:. Acleris senescens
Common: Teras senescens
Species:. Anoplophora glabripennis
Common: asiatischer Laubholzkäfer
Species:. Apriona germari
Common: longhorn stem borer
Species:. Apriona rugicollis
Common: mulberry borer
Species:. Grapholita inopinata
Common: Manchurian fruit moth
Species:. Brevipalpus azores
Common: 
Species:. Anastrepha obliqua
Common: westindische Fruchtfliege
Species:. Anastrepha serpentina
Common: sapodilla fruit fly
Spec

Species:. Aleurodicus dispersus
Common: spiralling whitefly
Species:. Helopeltis schoutedeni
Common: cotton helopeltis
Species:. Planococcus kenyae
Common: afrikanische Kaffeeschmierlaus
Species:. Rhyssomatus landeiroi
Common: sweet potato weevil
Species:. Rhyssomatus sculpturatus
Common: sweet potato weevil
Species:. Aleurodicus dugesii
Common: giant whitefly
Species:. Aonidiella citrina
Common: yellow scale
Species:. Baris granulipennis
Common: melon weevil
Species:. Comstockaspis perniciosa
Common: San José skjoldlus
Species:. Bactericera tremblayi
Common: Zwiebelblattsauger
Species:. Neotoxoptera formosana
Common: onion aphid
Species:. Phlyctinus callosus
Common: vine calandra
Species:. Acanalonia conica
Common: green cone-headed planthopper
Species:. Margarodes vitis
Common: ground pearls
Species:. Myzus mumecola
Common: Macrosiphum mumecola
Species:. Parabemisia myricae
Common: myrica whitefly
Species:. Pseudococcus calceolariae
Common: citrophilus mealybug
Species:. Pseudococcus

,pest_eppocode,scientific_name,common_name
0,APLOBE,Aphelenchoides besseyi,white-tip nematode
0,HETDZE,Heterodera zeae,corn cyst nematode
0,PUNCCH,Punctodera chalcoensis,Mexican corn cyst nematode
0,DITYDE,Ditylenchus destructor,kartoffelradnematod
0,DITYDI,Ditylenchus dipsaci,stængelnematod
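
The table shows that taking the second entry of the names list sometimes returns a Danish or German name. If each entry carries a language code and a preferred flag, an English name could be selected explicitly. This is a sketch under assumptions: the field names `isolang` and `preferred` are not confirmed by this notebook and should be checked against a real `/names` response, `pick_common_name` is a hypothetical helper, and `sample` is a hand-made payload.

```python
def pick_common_name(names, lang='en', lang_key='isolang', pref_key='preferred'):
    """Return the best name in the requested language, or '' if none is listed."""
    candidates = [n for n in names if n.get(lang_key) == lang]
    if not candidates:
        return ''
    # Entries flagged as preferred sort first; the sort is stable otherwise.
    candidates.sort(key=lambda n: not n.get(pref_key, False))
    return candidates[0]['fullname']


# Hypothetical payload; the field names are assumptions, not the documented schema.
sample = [
    {'fullname': 'Ditylenchus destructor', 'isolang': 'la', 'preferred': True},
    {'fullname': 'kartoffelradnematod', 'isolang': 'da', 'preferred': False},
    {'fullname': 'potato rot nematode', 'isolang': 'en', 'preferred': True},
]

print(pick_common_name(sample))
```

Passing the key names as parameters keeps the helper usable if the real response uses different field names.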


In [16]:
# save to csv
df_common.to_csv('../process_data/pest_common.csv')