Export to Web Portal #7606

@grantfitzsimmons

Important

This assumes the Specify Web Portal remains largely unchanged from version 2.0 in how data must be structured for import. If that application is updated to accept simpler data exports, the description of the Web Portal export structure below may be out of date.

Non-Functional Requirements

  • The system must use the existing UI mechanism for exporting data (as implemented in [DwC export]: Specify Data Publishing Toolkit #285)
  • The system should allow the reuse of mappings established for other data export pipelines (e.g. publishing to GBIF)
  • The system should enable users to assign custom terms to the export headings (i.e. not only Darwin Core terms), so that any field caption can be used for portal column headings and for searching.

Functional Requirements

  • The system must produce a ZIP archive file containing the files necessary for populating the Specify Web Portal (see Web Portal Export Structure).
  • The export must be easily ingested using the import process described in the official documentation.
  • Creating an export should send the user a notification with a link to download the packaged export.
  • Creating an export should be automatable with a cron job, akin to automating the export feed.

Fundamentally, the interface must support selecting a 'Core' Schema Mapping and should automatically create a package that can be imported into the Web Portal. This can be done via a button in the schema mapping interface itself or via a menu item in User Tools.

Web Portal Export Structure

Specify 7 must be able to produce an export that follows the structure described below for it to be compatible with the Specify Web Portal 2.0 application:

Note

The example data below was sourced primarily from the public Shell Museum Specify Web Portal: https://webportal.specifycloud.org/shellmuseum/

When an export is made for the Web Portal, Specify creates a ZIP archive. The archive contains a compressed PortalFiles directory with the following files:

├── flds.json
├── PortalData.csv
├── PortalInstanceSetting.json
└── SolrFldSchema.xml
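
As a rough illustration of the packaging step, the sketch below builds such an archive in memory with Python's standard zipfile module. The function name build_portal_archive and the idea of passing the four files as a dict of bytes are assumptions made for this example, not part of Specify 7.

```python
import io
import zipfile

# File names taken from the listing above.
PORTAL_FILES = (
    "flds.json",
    "PortalData.csv",
    "PortalInstanceSetting.json",
    "SolrFldSchema.xml",
)


def build_portal_archive(files: dict[str, bytes]) -> bytes:
    """Return ZIP bytes holding a PortalFiles/ directory with the four files.

    Hypothetical helper: `files` maps each file name to its content.
    """
    missing = [name for name in PORTAL_FILES if name not in files]
    if missing:
        raise ValueError(f"missing portal files: {missing}")

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in PORTAL_FILES:
            # Every file is stored under the PortalFiles/ prefix.
            archive.writestr(f"PortalFiles/{name}", files[name])
    return buffer.getvalue()
```

The returned bytes could then be written to wherever the download link in the notification points.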

flds.json

[
{"colname":"spid", "solrname":"spid", "solrtype":"int"},
{"colname":"accessRights", "solrname":"termsOfUse", "solrtype":"string", "title":"Terms Of Use", "type":"text", "width":8192, "concept":"accessRights", "concepturl":"http://rs.tdwg.org/dwc/terms/", "sptable":"institution", "sptabletitle":"Institution", "spfld":"termsOfUse", "spfldtitle":"Terms Of Use", "spdescription":"Defines conditions under which the data may be analyzed, distributed or changed. \"Terms of use\" includes concepts like \"Usage conditions\" and \"Specific Restrictions\".", "colidx":0, "linkify":"true", "advancedsearch":"true", "displaycolidx":0},
{"colname":"basisOfRecord", "solrname":"collectionType", "solrtype":"string", "title":"Collection Type", "type":"java.lang.String", "width":32, "concept":"basisOfRecord", "concepturl":"http://rs.tdwg.org/dwc/terms/", "sptable":"collection", "sptabletitle":"Collection", "spfld":"collectionType", "spfldtitle":"Collection Type", "spdescription":"Textual description of collection. ABCD schema field.", "colidx":1, "linkify":"true", "advancedsearch":"true", "displaycolidx":1},
{"colname":"datasetName", "solrname":"description", "solrtype":"string", "title":"Description", "type":"text", "width":2048, "concept":"datasetName", "concepturl":"http://rs.tdwg.org/dwc/terms/", "sptable":"collection", "sptabletitle":"Collection", "spfld":"description", "spfldtitle":"Description", "spdescription":"Textual description of collection.", "colidx":2, "linkify":"true", "advancedsearch":"true", "displaycolidx":2},
{"colname":"collectionCode", "solrname":"code", "solrtype":"string", "title":"Code", "type":"java.lang.String", "width":50, "concept":"collectionCode", "concepturl":"http://rs.tdwg.org/dwc/terms/", "sptable":"collection", "sptabletitle":"Collection", "spfld":"code", "spfldtitle":"Code", "spdescription":"Unique code for collection.", "colidx":3, "linkify":"true", "advancedsearch":"true", "displaycolidx":3}...

Each column in the export mapping is converted into an object like the following:

	{
		"colname": "typeStatus",
		"solrname": "typeStatusName",
		"solrtype": "string",
		"title": "Lot Status",
		"type": "java.lang.String",
		"width": 50,
		"concept": "typeStatus",
		"concepturl": "http://rs.tdwg.org/dwc/terms/",
		"sptable": "determination",
		"sptabletitle": "Determination",
		"spfld": "typeStatusName",
		"spfldtitle": "Lot Status",
		"spdescription": "A pick list of all available type designations; Holotype, Paratype, Neotype. Specify ships with predetermined values which are editable for users.",
		"colidx": 37,
		"linkify": "true",
		"advancedsearch": "true",
		"displaycolidx": 37
	},
| Attribute | Description | Example |
| --- | --- | --- |
| colname | The identifier for the column in the export, matching the concept. | accessRights |
| solrname | The specific field name mapped in the Solr index. | termsOfUse |
| solrtype | The data type used within the Solr index (e.g., string, int, long). | string |
| title | The term assigned to the field in the mapping, used as the label for this column in the Web Portal. | Terms Of Use |
| type | The underlying Java or database type class for the field. | text or java.lang.String |
| width | The maximum character length or display width of the field in the database. | 8192 |
| concept | The semantic concept associated with the data, identical to the colname. | accessRights |
| concepturl | The namespace URI for the associated concept (e.g., the Darwin Core Terms URL). | http://rs.tdwg.org/dwc/terms/ |
| sptable | The database table name in the Specify schema where the data originates. | institution |
| sptabletitle | The schema caption for the Specify table. | Institution |
| spfld | The data model field name in the Specify schema. | termsOfUse |
| spfldtitle | The schema caption for the field for that discipline. | Terms Of Use |
| spdescription | The description of the field in the Specify schema. | Defines conditions under which... |
| colidx | The integer index representing the column's position in the PortalData.csv file. | 0 |
| linkify | A boolean string ("true"/"false") indicating if the Portal should render this field as a clickable link (e.g., to a detail view). | "true" |
| advancedsearch | A boolean string ("true"/"false") indicating if this field should be included in the Portal's advanced search options. | "true" |
| displaycolidx | The integer index determining the default order in which columns are displayed in the Portal interface. | 0 |
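
To make the attribute list concrete, here is a minimal sketch of turning one mapped column into a flds.json entry. The MappedColumn dataclass and to_flds_entry function are hypothetical names invented for this example. Reusing the field name as solrname, reusing the field caption as title, and defaulting linkify and advancedsearch to "true" simply mirror the sample entries above, so they are assumptions rather than confirmed rules.

```python
import json
from dataclasses import dataclass


@dataclass
class MappedColumn:
    """Hypothetical description of one schema-mapping column."""
    term: str            # export heading, e.g. a Darwin Core term
    term_namespace: str  # namespace URI for the term
    table: str           # Specify table name
    table_caption: str
    field: str           # Specify field name
    field_caption: str
    description: str
    type_name: str       # e.g. "java.lang.String" or "text"
    width: int
    solr_type: str


def to_flds_entry(column: MappedColumn, index: int) -> dict:
    """Build one flds.json object; key order follows the sample entries."""
    return {
        "colname": column.term,
        "solrname": column.field,       # assumption: same as spfld
        "solrtype": column.solr_type,
        "title": column.field_caption,  # assumption: same as spfldtitle
        "type": column.type_name,
        "width": column.width,
        "concept": column.term,
        "concepturl": column.term_namespace,
        "sptable": column.table,
        "sptabletitle": column.table_caption,
        "spfld": column.field,
        "spfldtitle": column.field_caption,
        "spdescription": column.description,
        "colidx": index,
        "linkify": "true",              # boolean strings, as in the sample
        "advancedsearch": "true",
        "displaycolidx": index,
    }


entry = to_flds_entry(
    MappedColumn(
        term="typeStatus",
        term_namespace="http://rs.tdwg.org/dwc/terms/",
        table="determination",
        table_caption="Determination",
        field="typeStatusName",
        field_caption="Lot Status",
        description="A pick list of all available type designations.",
        type_name="java.lang.String",
        width=50,
        solr_type="string",
    ),
    index=37,
)
# json.dumps escapes any double quotes inside descriptions.
print(json.dumps([entry], indent=2))
```

Serializing through a JSON library rather than string concatenation keeps the file valid even when a schema description contains quotation marks.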

Full file: flds.json

These field definitions are often overridden by custom fldmodel.json files to better describe field values or change the order of items. As long as each entry is linked to a column via displaycolidx, these files can be much simpler and do not require all attributes, for example:

[
    {
        "colname": "BMSM No", 
        "title": "BMSM No", 
        "concept": "BMSM No", 
        "advancedsearch": "true", 
        "displaycolidx": 0, 
        "hiddenbydefault": false
    }, 
    {
        "colname": "Fossil?", 
        "title": "Fossil?", 
        "concept": "Fossil?", 
        "advancedsearch": "true", 
        "displaycolidx": 1, 
        "hiddenbydefault": false
    }, 

During implementation, the importance of each attribute for searchability must be evaluated.

PortalData.csv

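The first row of the PortalData.csv file contains the internal column names, which the Web Portal replaces with the titles defined in the flds.json file when displaying the data. In the sample below, these header names correspond to the solrname values in flds.json (for example termsOfUse and collectionType) rather than the colname values.

The remaining rows contain the export from the Schema Mapping, produced by the underlying query builder as usual. This must be tab-delimited. Note that in the sample below, taken from an existing portal, columns are separated by commas and only the quoted contents value is tab-separated, so the required delimiter should be confirmed against the Web Portal import process during implementation.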

spid,contents,img,geoc,termsOfUse,collectionType,description,code,catalogNumber,Kingdom,Phylum,Class,Order,Family,Genus,Subgenus,Species,Subspecies,author,source,fullName,Continent,SeaBasin,Country,State,Region,geo_fullName,localityName,latitude1,longitude1,preparations,startDate,remarks,geoRefDetDate,inst_code,altName,copyright,timestampModified,co_remarks,altCatalogNumber,collectors,typeStatusName,determinedDate,determiner,verbatimLocality,stationFieldNumberModifier2,stationFieldNumberModifier1,verbatimDate
00005afa-a0bd-46aa-b101-fc7ea05b58e8,"PreservedSpecimen	SM	112770	Animalia	Mollusca	Gastropoda	Neogastropoda	Fasciolariidae	Cinctura	apicina	(Dall, 1890)	Cinctura apicina	North America	USA	FLORIDA	Sarasota Co.	North America,USA,FLORIDA,Sarasota Co.	APAC-Black Layer	Dry - 2	1989-10-14	BMNSM	 	14 Oct 1989	",,,,PreservedSpecimen,,SM,112770,Animalia,Mollusca,Gastropoda,Neogastropoda,Fasciolariidae,Cinctura,,apicina,,"(Dall, 1890)",,Cinctura apicina,North America,,USA,FLORIDA,Sarasota Co.,"North America,USA,FLORIDA,Sarasota Co.",APAC-Black Layer,,,Dry - 2,1989-10-14,,,BMNSM,,,,,,,,,,,,,14 Oct 1989
00007505-3d07-4d6d-abaf-7e4841e898be,"PreservedSpecimen	SM	40045	Animalia	Mollusca	Gastropoda	Neogastropoda	Fasciolariidae	Triplofusus	giganteus	(Kiener, 1840)	Triplofusus giganteus	NW Atlantic O.	Gulf of Mexico	USA	FLORIDA	Lee Co.	NW Atlantic O.,Gulf of Mexico,USA,FLORIDA,Lee Co.	Roosevelt Channel, Sanibel Island	26.4933150	-82.1826900	Dry - 1	1970-12-31	2021-12-02	BMNSM	2023-12-18 13:17:26.0	Janet Paddison	2006-02-28	B. Hansen	Jan 1971	",,26.4933150 -82.1826900 SM,,PreservedSpecimen,,SM,40045,Animalia,Mollusca,Gastropoda,Neogastropoda,Fasciolariidae,Triplofusus,,giganteus,,"(Kiener, 1840)",,Triplofusus giganteus,NW Atlantic O.,Gulf of Mexico,USA,FLORIDA,Lee Co.,"NW Atlantic O.,Gulf of Mexico,USA,FLORIDA,Lee Co.","Roosevelt Channel, Sanibel Island",26.493315,-82.18269,Dry - 1,1970-12-31,,2021-12-02,BMNSM,,,2023-12-18 13:17:26.0,,,Janet Paddison,,2006-02-28,B. Hansen,,,,Jan 1971
00033b2b-ba07-4aca-a8fd-b2f3719d443b,"PreservedSpecimen	SM	125465	Animalia	Mollusca	Bivalvia	Pteriidae	Pteria	colymbus	(Röding, 1798)	Pteria colymbus	NW Atlantic O.	Caribbean Sea	USA	PUERTO RICO	NW Atlantic O.,Caribbean Sea,USA,PUERTO RICO	Punta Jorobad	Dry - 1	1956-06-30	BMNSM	 	1956-07	",,,,PreservedSpecimen,,SM,125465,Animalia,Mollusca,Bivalvia,,Pteriidae,Pteria,,colymbus,,"(Röding, 1798)",,Pteria colymbus,NW Atlantic O.,Caribbean Sea,USA,PUERTO RICO,,"NW Atlantic O.,Caribbean Sea,USA,PUERTO RICO",Punta Jorobad,,,Dry - 1,1956-06-30,,,BMNSM,,,,,,,,,,,,,1956-07
00061909-ce0f-465e-8fe0-116002fe2955,"PreservedSpecimen	SM	20475	Animalia	Mollusca	Gastropoda	Neritidae	Nerita	tessellata	Gmelin, 1791	Nerita tessellata	NW Atlantic O.	USA	FLORIDA	Miami-Dade Co.	NW Atlantic O.,USA,FLORIDA,Miami-Dade Co.	Key Biscayne	25.6937130	-80.1628250	Dry - 1	BMNSM	2022-02-18 15:52:14.0	ESB Cleaned. Locality Cleaned.	 	Date unk'n	",,25.6937130 -80.1628250 SM,,PreservedSpecimen,,SM,20475,Animalia,Mollusca,Gastropoda,,Neritidae,Nerita,,tessellata,,"Gmelin, 1791",,Nerita tessellata,NW Atlantic O.,,USA,FLORIDA,Miami-Dade Co.,"NW Atlantic O.,USA,FLORIDA,Miami-Dade Co.",Key Biscayne,25.693713,-80.162825,Dry - 1,,,,BMNSM,,,2022-02-18 15:52:14.0,ESB Cleaned. Locality Cleaned.,,,,,,,,,Date unk'n
00064e21-a06f-451f-aa1b-e4a600b889e9,"PreservedSpecimen	SM	24931	Animalia	Mollusca	Gastropoda	Cassidae	Cassis	flammea	(Linnaeus, 1758)	Cassis flammea	NW Atlantic O.	BAHAMAS	Bimini	NW Atlantic O.,BAHAMAS,Bimini	[Site unknown]	Dry - 3	1970-11-30	BMNSM	Laverne Weddle	2011-01-31	Harold Payson, III	beach	Dec 1970	",,,,PreservedSpecimen,,SM,24931,Animalia,Mollusca,Gastropoda,,Cassidae,Cassis,,flammea,,"(Linnaeus, 1758)",,Cassis flammea,NW Atlantic O.,,BAHAMAS,Bimini,,"NW Atlantic O.,BAHAMAS,Bimini",[Site unknown],,,Dry - 3,1970-11-30,,,BMNSM,,,,,,Laverne Weddle,,2011-01-31,"Harold Payson, III",beach,,,Dec 1970
0006a1f3-7d8d-4554-8521-dadd5345a760,"PreservedSpecimen	SM	100542	Animalia	Mollusca	Gastropoda	Orthalicidae	Liguus	flammellus	cervus	Clench, 1934	Liguus flammellus cervus	North America	CUBA	Pinar Del Rio	Viñales Valley	North America,CUBA,Pinar Del Rio,Viñales Valley	[Site unknown]	Dry - 1	BMNSM	 	Trees	Date unk'n	",,,,PreservedSpecimen,,SM,100542,Animalia,Mollusca,Gastropoda,,Orthalicidae,Liguus,,flammellus,cervus,"Clench, 1934",,Liguus flammellus cervus,North America,,CUBA,Pinar Del Rio,Viñales Valley,"North America,CUBA,Pinar Del Rio,Viñales Valley",[Site unknown],,,Dry - 1,,,,BMNSM,,,,,,,,,,Trees,,,Date unk'n
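
Judging only from the sample rows above, the contents column appears to hold the non-empty values of each record joined by tabs. The sketch below writes rows on that assumption. The function name write_portal_csv, the set of columns left out of contents, and the configurable delimiter are all choices made for this example; the real Specify 6 export may format some values differently (the coordinates in the sample, for instance, are not identical in contents and in their own columns).

```python
import csv
import io

# Columns left out of the aggregated text; an assumption based on the sample.
NOT_AGGREGATED = {"spid", "contents", "img", "geoc"}


def write_portal_csv(header: list[str], rows: list[dict[str, str]],
                     delimiter: str = ",") -> str:
    """Write a header row plus one line per record.

    The "contents" value is rebuilt for every record as the tab-joined
    non-empty values of the remaining columns (hypothetical rule).
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow(header)
    for row in rows:
        values = [row.get(name, "") for name in header
                  if name not in NOT_AGGREGATED]
        contents = "\t".join(value for value in values if value)
        record = {**row, "contents": contents}
        # Missing columns are written as empty strings.
        writer.writerow([record.get(name, "") for name in header])
    return out.getvalue()
```

Passing delimiter="\t" would produce a tab-delimited file instead, in which case the csv module quotes the contents value because it contains the delimiter.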

PortalInstanceSetting.json

This file holds custom settings that tell the Web Portal, among other things, where to look for attachment links. All supported settings are listed here.

This usually needs to be customized by the user after the initial export. A default export from Specify 6 looks like this:

{
  "portalInstance": "5883ccd0-4abc-4a7a-84c0-4e14f55ddefe",
  "collectionName": "shellmuseum",
  "imageBaseUrl": "http://assets1.specifycloud.org",
  "imageInfoFlds": " catalogNumber"
}

After customization, it usually looks something like this, so defaults more in line with these customizations would be desirable:

{
    "solrPageSize": 100, 
    "imagePreviewSize": 200, 
    "imageViewSize": 600, 
    "imageInfoFlds": "cn fn", 
    "imageBaseUrl": "http://assets1.specifycloud.org",
    "collectionName":"shellmuseum",
    "backgroundURL": "/custom-images/shellmuseum/webportal_image.png", 
    "bannerURL": "/custom-images/shellmuseum/WebPortal_left.jpg", 
    "bannerTitle": "Bailey-Matthews National Shell Museum", 
    "bannerHeight": 128, 
    "bannerWidth": 200
}

When this file is generated in the future, the following should be set automatically:

  • bannerTitle should be the name of the current collection
  • imageBaseUrl should be the URL of the configured asset server for the instance
  • collectionName should be the name of the collection directory on the asset server used for retrieving/depositing assets

Images reference local files on the server hosting the asset server, so some customization must be done by the IT department after the portal is configured.
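
A minimal sketch of generating these defaults follows. The function default_portal_settings and its three parameters are hypothetical; the numeric values are copied from the customized example above rather than from any documented default, and the banner and background image entries are left out because they point at local files that must be configured later.

```python
import json


def default_portal_settings(collection_name: str,
                            asset_server_url: str,
                            asset_collection_dir: str) -> dict:
    """Return PortalInstanceSetting.json content with friendlier defaults.

    All three arguments are assumed to be available from the Specify 7
    instance configuration; how they are looked up is not shown here.
    """
    return {
        # Values copied from the customized example; assumptions, not
        # documented defaults.
        "solrPageSize": 100,
        "imagePreviewSize": 200,
        "imageViewSize": 600,
        "imageInfoFlds": "catalogNumber",
        # Values derived from the current instance, as requested above.
        "imageBaseUrl": asset_server_url,
        "collectionName": asset_collection_dir,
        "bannerTitle": collection_name,
    }


settings = default_portal_settings(
    collection_name="Bailey-Matthews National Shell Museum",
    asset_server_url="http://assets1.specifycloud.org",
    asset_collection_dir="shellmuseum",
)
print(json.dumps(settings, indent=4))
```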

SolrFldSchema.xml

The final component in the PortalFiles directory is an XML file that defines the Solr field definitions:

<!-- solr field definitions for iDigBio web portal -->
<!-- Paste the contents of this file into the solr/conf/schema.xml file. -->
<field name="Class" type="string" indexed="true" stored="true" required="false"/>
<field name="Continent" type="string" indexed="true" stored="true" required="false"/>
<field name="Country" type="string" indexed="true" stored="true" required="false"/>
<field name="Family" type="string" indexed="true" stored="true" required="false"/>
<field name="Genus" type="string" indexed="true" stored="true" required="false"/>
<field name="Kingdom" type="string" indexed="true" stored="true" required="false"/>
<field name="Order" type="string" indexed="true" stored="true" required="false"/>
<field name="Phylum" type="string" indexed="true" stored="true" required="false"/>
<field name="Region" type="string" indexed="true" stored="true" required="false"/>
<field name="SeaBasin" type="string" indexed="true" stored="true" required="false"/>
<field name="Species" type="string" indexed="true" stored="true" required="false"/>
<field name="collectors" type="string" indexed="true" stored="true" required="false"/>
<field name="contents" type="text_general" indexed="true" stored="false" required="true"/>
<field name="copyright" type="string" indexed="true" stored="true" required="false"/>
<field name="description" type="string" indexed="true" stored="true" required="false"/>
<field name="determinedDate" type="string" indexed="true" stored="true" required="false"/>
<field name="determiner" type="string" indexed="true" stored="true" required="false"/>
<field name="fullName" type="string" indexed="true" stored="true" required="false"/>
<field name="geoRefDetDate" type="string" indexed="true" stored="true" required="false"/>
<field name="geo_fullName" type="string" indexed="true" stored="true" required="false"/>
<field name="geoc" type="string" indexed="true" stored="true" required="false"/>
<field name="img" type="string" indexed="true" stored="true" required="false"/>
<field name="inst_code" type="string" indexed="true" stored="true" required="false"/>
<field name="latitude1" type="pdouble" indexed="true" stored="true" required="false"/>
<field name="localityName" type="string" indexed="true" stored="true" required="false"/>
<field name="longitude1" type="pdouble" indexed="true" stored="true" required="false"/>
<field name="preparations" type="string" indexed="true" stored="true" required="false"/>
<field name="remarks" type="string" indexed="true" stored="true" required="false"/>

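This file is dynamically generated. In the sample above, the name values correspond to the solrname values in the flds.json file (for example description, which is the solrname of the datasetName column). The type values are determined based on the type of field in the interface schema. My understanding is that stored and indexed are always true and required is always false; the one exception in the sample above is the contents field, which has stored="false" and required="true".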
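
The field lines could be rendered along these lines. This is a sketch only: solr_field_line is a hypothetical helper, and the special case for contents is copied from the sample above rather than from the Specify 6 source.

```python
from xml.sax.saxutils import quoteattr


def solr_field_line(name: str, solr_type: str = "string") -> str:
    """Render one <field .../> element for SolrFldSchema.xml."""
    if name == "contents":
        # The full-text field differs from the others in the sample.
        return ('<field name="contents" type="text_general" '
                'indexed="true" stored="false" required="true"/>')
    # quoteattr adds the surrounding quotes and escapes special characters.
    return (f"<field name={quoteattr(name)} type={quoteattr(solr_type)} "
            'indexed="true" stored="true" required="false"/>')


def solr_schema_fragment(fields: dict[str, str]) -> str:
    """Join field lines for a mapping of field name to Solr type."""
    return "\n".join(solr_field_line(name, solr_type)
                     for name, solr_type in fields.items())
```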

The logic for assigning the type is here in the Specify 6 repo:

https://github.com/specify/specify6/blob/3edca4fdb26a630e1be19724cf9d78b5d95ef371/src/edu/ku/brc/specify/tools/webportal/BuildSearchIndex2.java#L465-L515

| Source Field Type | Specific Constraints | Solr Field Type |
| --- | --- | --- |
| CatalogNumber | Table is CollectionObject AND Formatter is Numeric | pint |
| CatalogNumber | Table is CollectionObject AND Formatter is NOT Numeric | string |
| String | java.lang.String or text | string |
| Date/Time | java.util.Date, java.sql.Timestamp | string |
| Calendar | Field ID ends with NumericDay, NumericMonth, or NumericYear | pint |
| Calendar | All other Calendar fields | string |
| Integer | java.lang.Integer, java.lang.Byte, java.lang.Short | pint |
| Long | java.lang.Long | plong |
| Float | java.lang.Float | pfloat |
| Double/Decimal | java.lang.Double, java.math.BigDecimal | pdouble |
| Boolean | java.lang.Boolean | string |
| Default | Any other type, formatted fields, or aggregated fields | string |
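
The table can be expressed as a small lookup function. The sketch below follows the table rows only; it is not a port of the linked Java code. The function name, its parameters, the use of "java.util.Calendar" as the type string for Calendar fields, and the order in which the rules are checked are assumptions.

```python
NUMERIC_DATE_SUFFIXES = ("NumericDay", "NumericMonth", "NumericYear")

# Types that map directly to a numeric Solr type; everything else is a string.
SIMPLE_TYPES = {
    "java.lang.Integer": "pint",
    "java.lang.Byte": "pint",
    "java.lang.Short": "pint",
    "java.lang.Long": "plong",
    "java.lang.Float": "pfloat",
    "java.lang.Double": "pdouble",
    "java.math.BigDecimal": "pdouble",
}


def solr_type_for(java_type: str, table: str = "", field: str = "",
                  numeric_catalog_formatter: bool = False,
                  formatted_or_aggregated: bool = False) -> str:
    """Pick a Solr field type for one mapped field (hypothetical helper)."""
    # Catalog numbers are numeric only when their formatter is numeric.
    if table.lower() == "collectionobject" and field.lower() == "catalognumber":
        return "pint" if numeric_catalog_formatter else "string"
    # Formatted and aggregated fields are always exported as text.
    if formatted_or_aggregated:
        return "string"
    # Calendar fields are numeric only for the day/month/year parts.
    if java_type == "java.util.Calendar":
        return "pint" if field.endswith(NUMERIC_DATE_SUFFIXES) else "string"
    # Strings, dates, timestamps, booleans and unknown types fall through.
    return SIMPLE_TYPES.get(java_type, "string")
```

For example, solr_type_for("java.lang.Double") gives "pdouble", which is consistent with the latitude1 and longitude1 lines in the schema sample.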

Labels: 1 - Request, 2 - Exporting Data, SeparationFrom6
