Harvesting library guides from WordPress to ingest into Primo

The files in this project harvest metadata from the BU Libraries' Research Guides, which are developed and maintained on a WordPress web site, and load that metadata into Ex Libris Primo so that the guides can be discovered in that platform.

The project began with the assumption that the guides would be harvested once per semester. The scripts to do this were developed in an IPython Notebook:

Harvest_Library_Research_Guide_Metadata.ipynb

A Markdown version of the notebook was created using nbconvert:

Harvest_Library_Research_Guide_Metadata.md

The input file for the harvest can be a file exported from WordPress in an RSS format. Two examples are provided:

bulibraries.wordpress.2014-05-30.xml
bulibraries.wordpress.2014-07-08.xml

The output file ('guides.xml') is the file that is ingested into Primo via a standard (OAI) harvest pipe.
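As an illustration of the first step, reading the exported file, here is a minimal sketch. It is not the code from the notebook. It assumes only that the export contains standard RSS `<item>` elements with `<title>`, `<link>`, and `<description>` children. The function name `read_export_items` and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET


def read_export_items(xml_text):
    """Return one dict (title, link, description) per RSS <item> element."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            # findtext returns None for a missing or empty element
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "description": (item.findtext("description") or "").strip(),
        })
    return items


# Made-up sample standing in for a WordPress export file.
SAMPLE_EXPORT = """<rss version="2.0">
  <channel>
    <title>Research Guides</title>
    <item>
      <title>Chemistry Research Guide</title>
      <link>http://example.org/guides/chemistry</link>
      <description>Starting points for chemistry research.</description>
    </item>
    <item>
      <title>History Research Guide</title>
      <link>http://example.org/guides/history</link>
    </item>
  </channel>
</rss>"""

items = read_export_items(SAMPLE_EXPORT)
```

To try this against one of the example exports, read the file's text and pass it to `read_export_items`. A real export may carry extra WordPress-specific elements that this sketch ignores.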

After initial testing, we determined that it would be more desirable to harvest the guides on a daily basis. The export of the file from WordPress could not be automated, so we developed scripts to harvest all of the metadata directly from the WordPress web pages.

BU harvests library subject guides from four different sites. Two are WordPress sites: Mugar and Theology. The Medical Library maintains its guides on a PHP-driven web site. The Law Library maintains its guides on the SpringShare LibGuides platform. The Python scripts for harvesting from each site were developed in IPython Notebook. The notebooks were then exported as standard Python (*.py) files to be run on a schedule by a cron job.

MugarGuidesHarvestedFromWordPress.ipynb
LawLibraryLibGuldes.ipynb
TheologyLibraryGuidesHarvestedFromWordPress.ipynb
Medical Research Guides.ipynb

The IPython Notebook files (*.ipynb) are reasonably well documented and are the basis for the Python files (*.py) that we currently use.
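Harvesting directly from a web page amounts to parsing the page's HTML and picking out the fields of interest. The sketch below shows the general idea using only the Python standard library. It is not taken from the notebooks. It assumes the guide's title is in the page's `<title>` element and its summary in a `<meta name="description">` tag, which may not match the actual sites. The names `GuideMetaParser` and `extract_guide_metadata`, the sample page, and the URL are hypothetical.

```python
from html.parser import HTMLParser


class GuideMetaParser(HTMLParser):
    """Collect the <title> text and the meta description of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr_map = dict(attrs)
            if (attr_map.get("name") or "").lower() == "description":
                self.description = attr_map.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text inside <title> is accumulated
        if self._in_title:
            self.title += data


def extract_guide_metadata(html_text, url):
    """Return a dict of metadata for one guide page."""
    parser = GuideMetaParser()
    parser.feed(html_text)
    parser.close()
    return {
        "title": parser.title.strip(),
        "identifier": url,
        "description": parser.description.strip(),
    }


# Made-up page standing in for a fetched guide.
SAMPLE_PAGE = """<html><head>
<title> Theology Research Guide </title>
<meta name="description" content="Key resources for theology." />
</head><body><h1>Theology</h1></body></html>"""

guide = extract_guide_metadata(SAMPLE_PAGE, "http://example.org/guides/theology")
```

In a scheduled run, the HTML would come from fetching each guide's URL (for example with `urllib.request.urlopen`) rather than from an inline string.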

These four Python scripts are run daily by a cron job. The output from each is an XML file that is harvested using a standard Primo harvest pipe. The normalization rules are the same rules we use to harvest Dublin Core records from our DSpace repository.
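Because the same normalization rules handle Dublin Core records, the output presumably carries Dublin Core fields. The following sketch shows one way such a record could be serialized with the standard library. The two namespace URIs are the standard `dc` and `oai_dc` ones. The `records` wrapper element, the choice of fields, and the function names are assumptions for illustration and may differ from the layout of the actual output files.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"

# Make the serializer emit the conventional prefixes instead of ns0, ns1.
ET.register_namespace("dc", DC_NS)
ET.register_namespace("oai_dc", OAI_DC_NS)


def guide_to_dc(guide):
    """Build an oai_dc:dc element from a dict of guide metadata."""
    record = ET.Element("{%s}dc" % OAI_DC_NS)
    for field in ("title", "identifier", "description"):
        value = guide.get(field)
        if value:  # skip missing or empty fields
            ET.SubElement(record, "{%s}%s" % (DC_NS, field)).text = value
    return record


def records_to_xml(guides):
    """Serialize all guides under a hypothetical <records> wrapper."""
    root = ET.Element("records")
    for guide in guides:
        root.append(guide_to_dc(guide))
    return ET.tostring(root, encoding="unicode")


# Made-up guide used only to exercise the functions.
sample_guide = {
    "title": "Chemistry Research Guide",
    "identifier": "http://example.org/guides/chemistry",
    "description": "",
}
record = guide_to_dc(sample_guide)
xml_text = records_to_xml([sample_guide])
```

Writing `xml_text` to a file gives a document that a harvest pipe could pick up, provided the pipe is configured for whatever wrapper structure is actually used.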

Screenshots of the data source definition and harvest pipe definition are included in this directory.

libguides_harvest_pipe_definition.png
libguides_data_source_definition.png

Jack Ammerman, July 30, 2014

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

