jafermarq/arXiv2Txt


arXiv2Txt

Having access to thousands of publications via http://arxiv.org is great, but a downloaded paper always arrives as a PDF named "1234.5675.pdf" or similar. That file name is the id of the paper in the arXiv database. Here I present a set of utils that automatically rename those PDFs with the title of the publication.

Getting ready

Here I make use of the Python wrapper for the arXiv API provided by https://github.com/lukasschwab/arxiv.py. You can install it with:

$ pip install arxiv

Among other options, this Python wrapper lets us send a query to arXiv based on the id of the publication. We can use this query to obtain the title of the publication and then rename the original PDF.
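The id needed for that query can be read straight off the file name. Below is a minimal sketch of that step, not the code used by the scripts in this repo: the helper name `paper_id_from_path` is hypothetical, and the pattern only covers new-style ids (four digits, a dot, four or five digits), optionally followed by a version suffix such as "v2".

```python
import re
from pathlib import Path

# New-style arXiv ids: YYMM.NNNN or YYMM.NNNNN, optionally followed by "v<number>".
_ARXIV_ID = re.compile(r"^(\d{4}\.\d{4,5})(v\d+)?$")


def paper_id_from_path(pdf_path):
    """Return the arXiv id encoded in a PDF file name, or None if there is none.

    Hypothetical helper for illustration; it is not part of this repo.
    """
    stem = Path(pdf_path).stem  # "papers/1512.03385.pdf" -> "1512.03385"
    match = _ARXIV_ID.match(stem)
    # Drop the version suffix so "1512.03385v1" and "1512.03385" give the same id.
    return match.group(1) if match else None
```

Returning `None` for names that do not look like an id makes it easy to skip PDFs that have already been renamed.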

Renaming the .pdf

This is the basic usage of the arxiv Python package that we'll be using:

 import arxiv
 
 paper_id = "1512.03385" # from a file "1512.03385.pdf"
 
 # send a query to the arXiv database and extract the publication title
 # (arxiv.query is the interface of the 0.x releases of the package;
 # releases from 1.0 onward replaced it with arxiv.Search and arxiv.Client)
 paper_title = arxiv.query(id_list=[paper_id])[0]['title_detail']['value']
 
 print(paper_title)
 # prints "Deep Residual Learning for Image Recognition"
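A title is not always usable as a file name as it stands: it may contain characters such as "/" or ":" that some file systems reject, and long titles may come back with line breaks in them. The sketch below shows one way to clean a title up before renaming. The function name `title_to_filename` and the choice of "-" as the replacement character are assumptions made for illustration, not something taken from the scripts in this repo.

```python
import re


def title_to_filename(title, extension=".pdf"):
    """Turn a publication title into a file name (hypothetical helper)."""
    # Collapse runs of whitespace, including line breaks, into single spaces.
    cleaned = " ".join(title.split())
    # Replace characters that are not allowed in file names on common systems.
    cleaned = re.sub(r'[\\/:*?"<>|]', "-", cleaned)
    return cleaned + extension
```

For the example above, `title_to_filename(paper_title)` leaves the title untouched and only appends the ".pdf" extension.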

Usage

Here I describe how to use each of the scripts in this repo. For each example, let's assume that the PDF file to rename is "1512.03385.pdf". The examples below assume that the PDF is in the same directory as the script. However, this is not a requirement: passing a full path to the PDF will also work with either of the methods below.

Bash

# This will rename the PDF file with the title of the publication, "Deep Residual Learning for Image Recognition" for this particular id
$ ./rename.sh 1512.03385.pdf

Embedded in your Python code

  # assumption: the getArXivName module exposes a function of the same name
  from getArXivName import getArXivName
  
  file_name = "1512.03385.pdf"
  paper_title = getArXivName(file_name)
  print(paper_title) # prints "Deep Residual Learning for Image Recognition"
  
  # your own code to rename the file
  ...
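The renaming step left open above could look something like the sketch below. It is only an illustration under stated assumptions: `rename_with_title` is a hypothetical name, and the title lookup is passed in as the `get_title` parameter so that the sketch does not depend on how `getArXivName` is exposed. The renamed file stays in the same directory as the original, and an existing file is never overwritten.

```python
from pathlib import Path


def rename_with_title(pdf_path, get_title):
    """Rename pdf_path to "<title>.pdf" in the same directory; return the new path.

    get_title is any callable that maps a file name such as "1512.03385.pdf"
    to the publication title (hypothetical parameter for illustration).
    """
    source = Path(pdf_path)
    title = get_title(source.name)
    # "/" would otherwise be read as a directory separator.
    safe_title = title.replace("/", "-")
    target = source.with_name(safe_title + source.suffix)
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    source.rename(target)
    return target
```

With the import above in place, a call would be `rename_with_title("1512.03385.pdf", getArXivName)`.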
  
