# Retrieving arXiv Metadata Using the REST API
This notebook demonstrates how to fetch metadata for arXiv papers using the arXiv REST API.

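First, let's import the required libraries: `requests` for the HTTP call, `xml.etree.ElementTree` for parsing the XML response, and `json` for saving the result.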

In [1]:
import requests
import json
import xml.etree.ElementTree as ET

Define a function that queries the arXiv REST API for a single paper ID and extracts the title, authors, publication date, and abstract from the Atom XML response.

In [2]:
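def get_arxiv_metadata(arxiv_id):
    """Return title, authors, publication date, abstract, and PDF URL for one arXiv ID."""
    url = f"http://export.arxiv.org/api/query?id_list={arxiv_id}"
    # Use a timeout so the call cannot hang, and fail loudly on HTTP errors
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    root = ET.fromstring(response.content)

    # The response is an Atom feed, so element lookups need the Atom namespace
    ns = {'atom': 'http://www.w3.org/2005/Atom'}

    # Get the first entry, guarding against a feed that contains none
    entry = root.find('atom:entry', ns)
    if entry is None:
        raise ValueError(f"No arXiv entry found for ID {arxiv_id!r}")

    metadata = {
        # Titles can be wrapped across lines in the feed, so collapse whitespace
        'title': ' '.join(entry.find('atom:title', ns).text.split()),
        'authors': [author.find('atom:name', ns).text for author in entry.findall('atom:author', ns)],
        'published': entry.find('atom:published', ns).text,
        'summary': entry.find('atom:summary', ns).text.strip(),
        'pdf_url': f"https://arxiv.org/pdf/{arxiv_id}.pdf"
    }
    return metadata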
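
To see how the namespace-qualified lookups behave without making a network call, the parsing step can be tried on a small hand-written Atom feed. This is a minimal sketch: `SAMPLE_FEED` and its contents are made up for illustration (they are not real arXiv output), and `parse_first_entry` is a hypothetical helper name, not part of the notebook above.

```python
import xml.etree.ElementTree as ET

# Made-up Atom feed for illustration only; real arXiv responses carry more fields
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>An Example
      Title</title>
    <author><name>Ada Example</name></author>
    <author><name>Bob Example</name></author>
    <published>2022-04-15T00:00:00Z</published>
    <summary>  A short placeholder abstract.  </summary>
  </entry>
</feed>"""

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def parse_first_entry(feed_xml):
    """Parse the first <entry> of an Atom feed; return None if there is none."""
    root = ET.fromstring(feed_xml)
    entry = root.find("atom:entry", ATOM_NS)
    if entry is None:
        return None
    # findtext returns the default instead of raising when an element is missing
    title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
    summary = entry.findtext("atom:summary", default="", namespaces=ATOM_NS)
    return {
        "title": " ".join(title.split()),  # collapse line wraps and extra spaces
        "authors": [
            author.findtext("atom:name", default="", namespaces=ATOM_NS)
            for author in entry.findall("atom:author", ATOM_NS)
        ],
        "published": entry.findtext("atom:published", default="", namespaces=ATOM_NS),
        "summary": summary.strip(),
    }


parsed = parse_first_entry(SAMPLE_FEED)
empty = parse_first_entry('<feed xmlns="http://www.w3.org/2005/Atom"></feed>')
print(parsed)
print(empty)
```

Without the `atom:` prefix and the namespace mapping, `find` would return `None` for every element here, because the tags in an Atom feed are all namespace-qualified.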

Test the function with an example arXiv ID and save the results to a JSON file.

In [3]:
arxiv_id = "2204.07547"
metadata = get_arxiv_metadata(arxiv_id)

# Save metadata to file
with open(f'arxiv_metadata_{arxiv_id}.json', 'w') as f:
    json.dump(metadata, f, indent=2)
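
One way to check that the saved file is usable is to load it again and compare it with the original dictionary. The sketch below does this round trip on made-up sample data in a temporary directory, so it runs without calling the API. The `save_and_reload` helper and the sample values are assumptions for illustration. Passing `encoding="utf-8"` and `ensure_ascii=False` is an optional choice that keeps non-ASCII author names readable in the file.

```python
import json
import tempfile
from pathlib import Path

# Made-up metadata for illustration; not the result of a real API call
sample = {
    "title": "An Example Title",
    "authors": ["Zoë Example", "Bob Example"],
    "published": "2022-04-15T00:00:00Z",
    "summary": "A short placeholder abstract.",
}


def save_and_reload(metadata, path):
    """Write metadata as JSON to path, then read it back and return it."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        json.dump(metadata, f, indent=2, ensure_ascii=False)
    with path.open(encoding="utf-8") as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp:
    reloaded = save_and_reload(sample, Path(tmp) / "arxiv_metadata_example.json")

print(reloaded == sample)
```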