verify generated/imported SOLR data against input CSV (missing records) #9

mbohun · 2018-06-28T10:52:46Z

@ess-acppo-djd identified 5 missing records between the input tblBiota_20180620.csv and the generated SOLR index.


mbohun · 2018-06-28T11:00:17Z

#!/bin/bash

# Extract the first-column values from the CSV file and remove the enclosing double quotes
for intBiotaID in $(cut -d ',' -f1 tblBiota_20180620.csv | sed -e 's/"//g')
do
    # NOTE: curl -L is needed in order to follow HTTP 301 redirects to linked records
    #       (for example, intBiotaID=106779 redirects to another record)
    json=$(curl -s -L --header 'Accept: application/json' "https://ag-bie.oztaxa.com/ws/species/${intBiotaID}")
    if [ "$(echo "${json}" | jq '. | has("error")')" == "true" ]; then
        echo "TEST: ${intBiotaID} error => $(echo "${json}" | jq '.error')"
    fi
done

ubuntu@ip-172-31-2-29:/tmp$ ./check_tblBiota.sh
TEST: intBiotaID error => "Not Found"
TEST: 102340 error => "Not Found"
TEST: 103926 error => "Not Found"
TEST: 71079 error => "Not Found"
TEST: 112099 error => "Not Found"
TEST: 30 error => "Not Found"

Details of the 5 missing records are as follows (the first TEST line above is the CSV header row, not a record):

"intBiotaID","intParentID","vchrEpithet","vchrFullName","vchrYearOfPub","vchrAuthor","vchrNameQualifier","chrElemType","vchrRank","chrKingdomCode","intOrder","vchrParentage","bitChangedComb","bitShadowed","bitUnplaced","bitUnverified","bitAvailableName","bitLiteratureName","dtDateCreated","vchrWhoCreated","dtDateLastUpdated","vchrWhoLastUpdated","txtDistQual","GUID"
"102340","20","Phytobiota","","","","","KING ","","P ","0","\20\102340","False","False","False","False","True","False","2003-07-28 11:33:17.857000000","Clayton Winter","2003-07-28 11:33:24.997000000","Clayton Winter","","{9B626B79-DE67-4B58-849C-2B5429F9A83B}"
"103926","64792","Xyleutes eucalypti: Walker [misspelling!]","Xyleutes eucalypti: Walker [misspelling!]","","","","SP   ","","A ","0","\1\106786\6\100975\12\52112\101129\101130\101134\58791\74799\64792\103926","False","False","False","False","False","True","2004-09-27 12:48:37.270000000","graham brown","2004-09-27 12:48:40.630000000","graham brown","","{4F19BBB1-4097-4804-9B48-2F6E1394B4AF}"
"71079","66889","hirtus","Croton hirtus L’herit","","L’herit","","SP   ","","P ","0","\20\102341\102343\101427\21\22\102360\99968\66575\66889\71079","False","False","False","False","False","False","2003-03-25 12:54:09.450000000","Migration","2004-04-07 21:19:27.373000000","sa","","{51ABE293-3031-4310-894B-2353BF4C32E8}"
"112099","101848","Ornithogalum Mosaic Virus","Potyvirus (definitive_species) Ornithogalum Mosaic Virus Smith and Brierley, 1944a","1944a","Smith and Brierley","","SP   ","","V ","0","\101171\101661\104483\61073\61217\101848\112099","False","False","False","False","False","False","2016-09-05 10:37:37.967000000","NAQSTaxaTree","2016-09-05 13:47:30.587000000","AGDAFF\Teakle Graham","","{C0B11D33-42CD-4A55-A410-863A2A0CFD87}"
"30","106089","<No_Species_Entered>","<No_Species_Entered>","","","","     ","","A ","0","\24\106089\30","False","False","False","False","False","False","2003-03-25 12:54:09.450000000","Data Conversion","2007-06-12 12:29:30.250000000","Graham Brown","","{7EB978EA-7584-4285-9DA2-D66FAE5F1B3D}"
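
A variant of the check that skips the CSV header row, so the literal column name is never sent to the service, might look like the sketch below. The function names `list_ids` and `check_id` and the `BASE_URL` override are made up for illustration; the URL and the jq test are the ones from the script above. It assumes the first field never contains an embedded comma.

```shell
#!/bin/bash
# Sketch only: list_ids, check_id and BASE_URL are illustrative names.

# Print the first column (intBiotaID) of a CSV file, one ID per line,
# skipping the header row and stripping the enclosing double quotes.
list_ids() {
    tail -n +2 "$1" | cut -d ',' -f1 | tr -d '"'
}

# Default endpoint taken from the original script; can be overridden.
BASE_URL="${BASE_URL:-https://ag-bie.oztaxa.com/ws/species}"

# Query one ID and report it if the JSON response has an "error" key.
check_id() {
    id="$1"
    json=$(curl -s -L --header 'Accept: application/json' "${BASE_URL}/${id}")
    if [ "$(printf '%s' "${json}" | jq 'has("error")')" = "true" ]; then
        echo "TEST: ${id} error => $(printf '%s' "${json}" | jq '.error')"
    fi
}
```

It could then be driven with `list_ids tblBiota_20180620.csv | while read -r id; do check_id "$id"; done`.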

charvolant · 2018-06-29T02:56:04Z

Some of these are being rejected early by the Talend processing. They can be found in /data/work/taxxas/Processed/rejected.csv (there's also a vernacular_rejected.csv). The sanity-checking rules may be overly strict.
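
To see which of the missing IDs ended up in the reject files, something like the following could be used. This is only a sketch: `find_rejected` is a made-up helper, and it assumes that vernacular_rejected.csv sits in the same directory as rejected.csv and that both files keep the quoted intBiotaID as their first column. Neither assumption is confirmed in this thread.

```shell
#!/bin/bash
# Sketch only: find_rejected is an illustrative helper name.
# Usage: find_rejected DIR ID...
# Assumes each reject file starts its rows with the quoted intBiotaID.
find_rejected() {
    dir="$1"
    shift
    for id in "$@"; do
        found=0
        for f in "${dir}/rejected.csv" "${dir}/vernacular_rejected.csv"; do
            # Match the ID only as the first, double-quoted field of a row
            if [ -f "${f}" ] && grep -q "^\"${id}\"," "${f}"; then
                echo "${id}: $(basename "${f}")"
                found=1
            fi
        done
        [ "${found}" -eq 1 ] || echo "${id}: not in reject files"
    done
}
```

For example: `find_rejected /data/work/taxxas/Processed 102340 103926 71079 112099 30`.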

ess-acppo-djd · 2018-06-29T03:50:01Z

I've already located these and am preparing to have the source data corrected. They appear to be rejected for using unexpected characters in one of FullName, Epithet, Author or YearOfPub.
There is one other record being dropped somewhere (Phytobiota, a synonym for Plantae) and I've yet to hunt it down.
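
A rough way to pre-screen the input CSV for such rows is sketched below. The actual Talend rules are not shown in this thread, so the allowed character set here (ASCII letters, digits, space, and `. , & ( ) -`) is purely an assumption, as is the helper name `flag_unexpected`. It relies on every field being double-quoted, as in tblBiota_20180620.csv, and splits rows on the `","` sequence so that commas inside a field do not break the columns.

```shell
#!/bin/bash
# Sketch only: flag_unexpected and its allowed character set are assumptions,
# not the real Talend sanity-checking rules.
# Usage: flag_unexpected FILE.csv
flag_unexpected() {
    awk -F '","' '
        NR == 1 { next }    # skip the header row
        {
            id = $1
            sub(/^"/, "", id)    # drop the opening quote of the first field
            # Columns 3..6: vchrEpithet, vchrFullName, vchrYearOfPub, vchrAuthor
            for (i = 3; i <= 6; i++) {
                if ($i ~ /[^A-Za-z0-9 .,&()-]/) {
                    print id ": column " i " has unexpected characters: " $i
                }
            }
        }
    ' "$1"
}
```

Run as `flag_unexpected tblBiota_20180620.csv`; any row it prints is only a candidate for rejection and still needs checking against rejected.csv.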

ess-acppo-djd · 2018-06-29T06:05:58Z

The Phytobiota record gets stripped out into 'invalid_synonyms.csv' by the process that creates the directory /data/work/taxxas/DwC.

moziauddin · 2020-09-14T23:40:54Z

A test script has already been added. It can check which names are missing, using either ID or name.

mbohun added the enhancement label Jun 28, 2018

mbohun self-assigned this Jun 28, 2018

mbohun added a commit that referenced this issue Aug 15, 2018

FIX: github issues: #16, #9; bump version of taxxas-dwca from 0.1 to 0.3

a1ff74c

mbohun mentioned this issue Aug 16, 2018

remove any dependencies on -SNAPSHOT versions #8

Closed

mbohun mentioned this issue Sep 4, 2018

REST API add regression/verification script-s/test-s #24

Closed

2 tasks

moziauddin closed this as completed Sep 14, 2020
