KB CSV Export Issues Approach Discussion

ianibo edited this page Aug 13, 2012 · 4 revisions

The problem

KB+ allows users to export their data in CSV format for import into other systems, spreadsheets and other data processing tools. The ultimate aim is to encourage a round-trip capability that allows users to edit and export data in the most efficient format for any given scenario.

However, KBART files alone do not carry sufficient header information for KB+ to be able to ingest a file. Different systems take different approaches to this. CUFTS, for example, uses a directory-level XML file which contains meta-information about each "Deal" in a directory full of CSV exports.

In phase one of the project, KB+ used a variable number of header rows at the top of the export file to carry deal-level metadata in a simple name-value pair arrangement. Each extra meta-property required an extra line of header information.

The -current- solution

KB+ allows users to export their data from the subscriptionDetails page. (Screenshot of subs page showing export options.)

Export options are given in the top right-hand corner as 2 links: "CSV Export" and "(no header)". The no-header variant provides a simple KBART-compliant export (with some additional non-standard columns, but a core set of KBART data). This file conforms to the standard, but is insufficient for round-tripping into KB+. The primary option (CSV Export) provides the same data with an additional 2 lines at the start of the file defining a header. Data proper starts on line 3.
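Since the two variants differ only by the 2 leading header lines, one can be derived from the other. The following is a minimal sketch of that idea, not KB+ code: the function name `strip_deal_header` is hypothetical, and it assumes the header lines contain no embedded line breaks inside quoted fields.

```python
def strip_deal_header(export_text, header_lines=2):
    """Return the export without its leading deal-level header lines.

    Hypothetical helper: assumes each header record occupies exactly
    one physical line, as in the example export on this page.
    """
    lines = export_text.splitlines(keepends=True)
    return "".join(lines[header_lines:])
```

Going the other way (adding a header to a plain KBART file) needs the deal-level values from somewhere else, which is exactly the information a KBART file does not carry.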

Here is an example header and the first data lines from a KB+ export:

  1. FileType,SpecVersion,JC_ID,TermStartDate,TermEndDate,SubURI
  2. Subscription Taken,2.0,551,start,end,"uri://kbplus/sub/236"
  3. included_st,publication_title,print_identifier,online_identifier,date_first_issue_subscribed,num_first_vol_subscribed,num_first_issue_subscribed,date_last_issue_subscribed,num_last_vol_subscribed,num_last_issue_subscribed,embargo_info,core_title
  4. Y,"Asian Journal of Andrology","null","1745-7262",2008-01-01 00:00:00.0,,,,,,,false
  5. Y,"British Dental Journal","null","1476-5373",2008-01-01 00:00:00.0,,,,,,,false

The header in this export follows the same convention as a basic KBART file. The fields are in no particular order, and are extensible across the horizontal axis. The header line (line 1) specifies "FileType" and "SpecVersion" to allow us to track modifications to the specifications. Line 2 contains the actual values. This file is v2.0 of the "Subscription Taken" export. We also specify the JC_ID (Jisc Collections ID) of the subscriber, the deal start and end dates, and the URI of the subscription for matching. Line 3 is the start of data proper and is (currently) identical to the CUFTS export.
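To illustrate how a consumer might read this layout, here is a minimal Python sketch. It is not the KB+ ingest code; the function name `parse_kbplus_export` is hypothetical. Because the header fields are in no particular order, the sketch looks values up by name rather than by position.

```python
import csv
import io


def parse_kbplus_export(text):
    """Split a KB+ export into (deal_metadata, title_rows).

    Hypothetical helper, assuming the layout described above:
    line 1 = header field names, line 2 = header values,
    line 3 = KBART-style column names, remaining lines = title data.
    """
    reader = csv.reader(io.StringIO(text))
    names = next(reader)     # line 1: FileType, SpecVersion, ...
    values = next(reader)    # line 2: the matching values
    metadata = dict(zip(names, values))
    columns = next(reader)   # line 3: column names for the data
    rows = [dict(zip(columns, row)) for row in reader if row]
    return metadata, rows


sample = (
    "FileType,SpecVersion,JC_ID,TermStartDate,TermEndDate,SubURI\n"
    'Subscription Taken,2.0,551,start,end,"uri://kbplus/sub/236"\n'
    "included_st,publication_title,print_identifier,online_identifier,"
    "date_first_issue_subscribed,num_first_vol_subscribed,"
    "num_first_issue_subscribed,date_last_issue_subscribed,"
    "num_last_vol_subscribed,num_last_issue_subscribed,"
    "embargo_info,core_title\n"
    'Y,"Asian Journal of Andrology","null","1745-7262",'
    "2008-01-01 00:00:00.0,,,,,,,false\n"
)

metadata, rows = parse_kbplus_export(sample)
```

With the sample above, `metadata["SubURI"]` gives the subscription URI used for matching, and `rows[0]["online_identifier"]` gives the online identifier of the first title. A real importer would presumably check `FileType` and `SpecVersion` before trusting the remaining columns.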
