How to attach raw data to a PDF.md


Regrettably, this almost never happens, not even in research/university circles, where one might expect such a thing: research data (or a subset thereof) attached to, or "bundled" with, the paper reporting the findings.

If you are interested in attaching your data to your PDFs, you're bound to need tools. (Ditto if you wish to extract the attached data, as not every PDF reader out there provides an extract/download button for you to click!)

So we generally succumb to the reality of scraping content, whether it's merely to obtain the obvious metadata elements (title, authors, publishing venue, publishing date, abstract, ...) or data/charts. Many of us probably won't even realize we are scraping like that, nor what the consequences are for the quality of the extracted data and of our collection at large, because it's so pervasive, despite the existence of bibliographic sites and metadata-offering websites, e.g. Google Scholar.

Once you have [scraping] tools, you're not set for life either! Then it turns into a statistics game: how good are your tools, how much effort are you willing to spend (cleaning the output and/or cajoling your tools to do your bidding) and what is the target accuracy/veracity of your extracted data?
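One way to put a number on that "statistics game" is to hand-verify a small sample and measure, per metadata field, how often the tool's output matches. The sketch below uses only the Python standard library; the field names, the normalization rule and the sample records are assumptions made up for illustration, not measurements of any real tool.

```python
from collections import Counter

# Hypothetical field names; adjust to whatever your tool emits.
FIELDS = ("title", "authors", "venue", "year")


def normalize(value) -> str:
    """Case-fold and collapse whitespace so trivial differences don't count."""
    text = "" if value is None else str(value)
    return " ".join(text.casefold().split())


def field_accuracy(extracted, verified, fields=FIELDS) -> dict:
    """Fraction of records whose extracted field matches the verified one.

    Assumes both lists have the same length and the same record order.
    """
    hits = Counter()
    for got, want in zip(extracted, verified):
        for field in fields:
            if normalize(got.get(field)) == normalize(want.get(field)):
                hits[field] += 1
    total = len(verified)
    return {field: hits[field] / total for field in fields} if total else {}


# Made-up sample: two records, one with a truncated title.
verified = [
    {"title": "A Study of Things", "authors": "A. Author", "venue": "J. Ex.", "year": 2020},
    {"title": "More Things", "authors": "B. Writer", "venue": "Proc. Ex.", "year": 2021},
]
extracted = [
    {"title": "a study of  things", "authors": "A. Author", "venue": "J. Ex.", "year": "2020"},
    {"title": "More Thi", "authors": "B. Writer", "venue": "Proc. Ex.", "year": 2021},
]
accuracy = field_accuracy(extracted, verified)
```

Exact matching after normalization is a deliberately strict choice; a fuzzy comparison would be more forgiving of minor scraping noise, at the cost of hiding real errors.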