pensoft-flickr

An adaptation of my workflow for BMC journals, which is available at: https://github.com/rossmounce/Trying-beautiful-soup

Developed in Bash and Python, using Beautiful Soup.

Get full-text HTML articles from ZooKeys that contain phylogenies

wget -w 30 -i layout.txt
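`wget -i layout.txt` reads one URL per line from `layout.txt`, and `-w 30` waits 30 seconds between retrievals so the ZooKeys server is not hammered. For readers without wget, a rough Python equivalent is sketched below. It assumes `layout.txt` holds one article URL per line; the helper names (`read_urls`, `local_name`, `fetch_all`) are hypothetical, and the file-naming rule is a simplification of wget's default behaviour.

```python
import time
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def read_urls(list_file):
    """Return the non-blank lines of a URL list (one URL per line, like wget -i)."""
    lines = Path(list_file).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def local_name(url):
    """Simplified naming rule: the last path segment, or index.html for a bare directory."""
    path = urlparse(url).path
    if not path or path.endswith("/"):
        return "index.html"
    return path.rsplit("/", 1)[-1]


def fetch_all(urls, out_dir=".", wait=30, fetch=None, sleep=time.sleep):
    """Download each URL in turn, pausing `wait` seconds between requests (like wget -w)."""
    if fetch is None:
        def fetch(url):
            with urllib.request.urlopen(url) as resp:
                return resp.read()
    saved = []
    for i, url in enumerate(urls):
        if i > 0:
            sleep(wait)  # be polite to the server between retrievals
        name = local_name(url)
        (Path(out_dir) / name).write_bytes(fetch(url))
        saved.append(name)
    return saved
```

Calling `fetch_all(read_urls("layout.txt"))` from the working folder would mirror the wget command; the `fetch` and `sleep` parameters exist only so the loop can be exercised without network access.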

mmv "*.htm" "#1.html" # article URLs are inconsistent about the extension (sometimes '.html', sometimes '.htm'), so rename every .htm file to .html
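If `mmv` is not installed, the same renaming can be done with Python's standard library. This is a minimal sketch: the function name `normalise_extensions` is hypothetical, and skipping a file whose `.html` counterpart already exists is a choice made for this sketch rather than a description of mmv's behaviour.

```python
from pathlib import Path


def normalise_extensions(folder="."):
    """Rename every *.htm file in `folder` to *.html; return the new names."""
    renamed = []
    for path in sorted(Path(folder).glob("*.htm")):  # "*.htm" does not match "*.html"
        target = path.with_suffix(".html")
        if target.exists():
            # never overwrite an article that was already saved as .html
            continue
        path.rename(target)
        renamed.append(target.name)
    return renamed
```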

Running it

bash html_create_subfolders.sh ; #creates a subfolder for each fulltext html article
for i in *.html ; do python pensoft-get-figures.py "$i" ; done ;   #extracts the figure image links, bibliographic data and figure caption text
bash download-figs.sh ; #downloads the figure images whose links were extracted in the previous step
bash remove-apos.sh ; #removes all apostrophes from every caption plain-text file
bash fix-pensoft-captions.sh; #ensures that each figure caption takes up just one (long) line
bash exif-CCBY-Pensoft.sh ; #embeds constant strings: Pensoft & CC BY
bash embedxmp.sh ; # this script calls doexif.sh, so make sure doexif.sh is executable (chmod +x doexif.sh)
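The caption clean-up and metadata-embedding steps above can be sketched in Python as follows. This assumes each caption is a plain-text string and that the metadata is written with ExifTool; the repository's `doexif.sh` may use different tools or tags, and the helper names and tag choices here are hypothetical. Apostrophes are presumably removed so that captions survive shell quoting; building the ExifTool argument list in Python and running it without a shell avoids that problem altogether.

```python
import subprocess


def clean_caption(text):
    """Drop apostrophes and collapse all whitespace (including newlines) into single spaces."""
    return " ".join(text.replace("'", "").split())


def exiftool_args(image, caption, rights, licence_url):
    """Build an ExifTool command embedding caption, rights and licence as XMP (hypothetical tag choice)."""
    return [
        "exiftool",
        "-overwrite_original",
        f"-XMP-dc:Description={caption}",
        f"-XMP-dc:Rights={rights}",
        f"-XMP-cc:License={licence_url}",
        image,
    ]


def embed(image, caption, rights, licence_url):
    # a list of arguments (no shell) means quotes in the caption cannot break the command
    cmd = exiftool_args(image, clean_caption(caption), rights, licence_url)
    subprocess.run(cmd, check=True)
```

Collapsing whitespace with `split()`/`join()` keeps each caption on one (long) line, which is the goal of `fix-pensoft-captions.sh`.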

html_create_subfolders.sh can safely be re-run without losing the work done by the next two steps (figure-link extraction and figure download).
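Re-running is safe as long as the folder-creation step only creates what is missing and never clears an existing folder. The sketch below shows that idea in Python; it assumes one subfolder per article named after the HTML file's stem, which may not match exactly how `html_create_subfolders.sh` names them, and `create_subfolders` is a hypothetical name.

```python
from pathlib import Path


def create_subfolders(folder="."):
    """Make one subfolder per *.html article; existing folders and their contents are left alone."""
    created = []
    for page in sorted(Path(folder).glob("*.html")):
        sub = page.with_suffix("")  # e.g. article123.html -> article123/
        if not sub.exists():
            sub.mkdir()
            created.append(sub.name)
    return created
```

A second call returns an empty list and leaves any downloaded figures in place.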
