## Get PDFs from arxiv.org


In [1]:
# Set up to use local modules
%load_ext autoreload
%autoreload 2
import os


In [2]:
import pyprojroot

from src import utils

PDF_DIR = pyprojroot.here("data")

In [3]:
# https://lukasschwab.me/arxiv.py/arxiv.html

import arxiv

# Construct the default API client.
client = arxiv.Client()

# Search for the 10 most recent articles matching the keyword "quantum."
search = arxiv.Search(
    query="quantum", max_results=10, sort_by=arxiv.SortCriterion.SubmittedDate
)

results = client.results(search)

# `results` is a generator; you can iterate over its elements one by one...
for r in client.results(search):
    print(r.title)
# ...or exhaust it into a list. Careful: this is slow for large result sets.
# all_results = list(results)
# print([r.title for r in all_results])

Gravitational Background of Alice-Vortices and R7-Branes
Matter-induced plaquette terms in a $\mathbb{Z}_2$ lattice gauge theory
Nuclear gradients from auxiliary-field quantum Monte Carlo and their application in geometry optimization and transition state search
Discrete Invariants of Koszul Artin-Schelter Regular Algebras of Dimension four
Absorption imaging of quantum gases near surfaces using incoherent light
Non-chiral ephemeral edge states and cascading of exceptional points in the non-reciprocal Haldane model
Mean-Force Hamiltonians from Influence Functionals
Single snapshot non-Markovianity of Pauli channels
An updated constraint for the Gravitational Wave Background from the Gamma-ray Pulsar Timing Array
Emergent aperiodicity in Bose-Bose mixtures induced by spin-dependent periodic potentials
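
Because `client.results(search)` returns a generator, it can be consumed only once, and a bounded prefix can be taken without building the full list. The sketch below illustrates this with `fake_results`, a hypothetical offline stand-in for the real generator (it is not part of the `arxiv` package), so it runs without network access.

```python
from itertools import islice


def fake_results(n):
    """Hypothetical stand-in for client.results(search): lazily yields n titles."""
    for i in range(n):
        yield f"paper {i}"


results = fake_results(5)

# Take a bounded prefix without materializing the whole stream.
first_three = list(islice(results, 3))
print(first_three)

# The generator remembers its position, so only the unread items remain.
remaining = list(results)
print(remaining)

# Once exhausted, iterating again yields nothing; request a fresh generator instead.
leftover = list(results)
print(leftover)
```

The same `islice` call should work on the real generator, since it only relies on the iterator protocol.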


In [4]:
pdf_ids = ["1706.03762v6", "1605.08386v1"]

# Search for the paper with ID "1706.03762v6" (the first entry in pdf_ids)
search_by_id = arxiv.Search(id_list=[pdf_ids[0]])
paper = next(client.results(search_by_id))
print(paper.title)

paper.download_pdf(dirpath=PDF_DIR, filename="example_paper.pdf")

Attention Is All You Need


'/home/jordan/documents/GitHub/arxiv-chat/data/example_paper.pdf'
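
The IDs in `pdf_ids` carry a version suffix (`v6`, `v1`). As a minimal sketch, assuming only new-style identifiers of the form `YYMM.NNNNN` are used, the helpers below split an ID into its base and version and derive a per-paper filename. `split_arxiv_id` and `pdf_filename` are hypothetical names introduced here, not part of the `arxiv` package or of `src.utils`, and old-style IDs such as `hep-th/9901001` are deliberately not handled.

```python
import re

# New-style arXiv identifiers look like "1706.03762", optionally followed by "vN".
_ARXIV_ID = re.compile(r"^(?P<base>\d{4}\.\d{4,5})(?:v(?P<version>\d+))?$")


def split_arxiv_id(arxiv_id):
    """Return (base_id, version), where version is an int or None if absent."""
    match = _ARXIV_ID.match(arxiv_id)
    if match is None:
        raise ValueError(f"not a new-style arXiv ID: {arxiv_id!r}")
    version = match.group("version")
    return match.group("base"), int(version) if version else None


def pdf_filename(arxiv_id):
    """Hypothetical naming scheme: one PDF per (possibly versioned) ID."""
    base, version = split_arxiv_id(arxiv_id)
    suffix = f"v{version}" if version is not None else ""
    return f"{base}{suffix}.pdf"


for pdf_id in ["1706.03762v6", "1605.08386v1"]:
    print(split_arxiv_id(pdf_id), pdf_filename(pdf_id))
```

A name derived this way could be passed as `filename=` to `download_pdf` in place of the fixed `"example_paper.pdf"`, so that downloading a second paper does not overwrite the first.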

In [5]:
papers = utils.local_papers

utils.get_local_papers(papers=papers, silent=False)

Already downloaded: Attention is All You Need
Already downloaded: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Already downloaded: Generative Adversarial Nets
Already downloaded: Playing Atari with Deep Reinforcement Learning
Already downloaded: ImageNet Classification with Deep Convolutional Neural Networks
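
The "Already downloaded" messages suggest that files already on disk are skipped. The implementation of `utils.get_local_papers` is not shown here, so the following is only a sketch of one way such a check could work. `fetch_if_missing` and `fake_download` are hypothetical names, and the downloader is injected as a callable so the example runs offline.

```python
import tempfile
from pathlib import Path


def fetch_if_missing(title, filename, dirpath, download, silent=True):
    """Call download(path) only when dirpath/filename does not exist yet.

    Returns True if a download happened, False if the existing file was reused.
    """
    path = Path(dirpath) / filename
    if path.exists():
        if not silent:
            print(f"Already downloaded: {title}")
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    download(path)
    if not silent:
        print(f"Downloaded: {title}")
    return True


def fake_download(path):
    """Offline stand-in for a real download: writes a placeholder file."""
    path.write_bytes(b"%PDF-1.4 placeholder")


with tempfile.TemporaryDirectory() as tmp:
    first = fetch_if_missing("Example", "example.pdf", tmp, fake_download, silent=False)
    second = fetch_if_missing("Example", "example.pdf", tmp, fake_download, silent=False)
```

In a real notebook the `download` argument might wrap `paper.download_pdf`; that wiring is an assumption and is left out here.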


## Get the most recent ML papers from arXiv


In [6]:
# Use the `arxiv` package to get the 10 most recent papers on the topic of cs.LG
# Papers categorized with stat.ML as primary are automatically cross-listed as cs.LG but not vice versa.
# Computer science: Machine Learning = cs.LG
# Related categories:
# Statistics: Machine Learning = stat.ML
# Computer science: Artificial Intelligence = cs.AI
# Computer science: Neural and Evolutionary Computing = cs.NE
# Computer science: Systems and Control = cs.SY
# Math: Optimization and Control = math.OC

search = arxiv.Search(
    query="cs.LG", max_results=10, sort_by=arxiv.SortCriterion.SubmittedDate
)

results = client.results(search)
# Print each title and abstract
for r in results:
    print(r.title)
    print(r.summary)

Imitating What Works: Simulation-Filtered Modular Policy Learning from Human Videos
The ability to learn manipulation skills by watching videos of humans has the potential to unlock a new source of highly scalable data for robot learning. Here, we tackle prehensile manipulation, in which tasks involve grasping an object before performing various post-grasp motions. Human videos offer strong signals for learning the post-grasp motions, but they are less useful for learning the prerequisite grasping behaviors, especially for robots without human-like hands. A promising way forward is to use a modular policy design, leveraging a dedicated grasp generator to produce stable grasps. However, arbitrary stable grasps are often not task-compatible, hindering the robot's ability to perform the desired downstream motion. To address this challenge, we present Perceive-Simulate-Imitate (PSI), a framework for training a modular manipulation policy using human video motion data processed by paired gr
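
The cell above passes the bare string `"cs.LG"` as the query. The arXiv API query syntax also offers a `cat:` field prefix for restricting a search to a subject category, and terms can be combined with `OR`. As a minimal sketch, the hypothetical helper `build_category_query` (not part of the `arxiv` package) builds such a string from the category codes listed in the comments.

```python
def build_category_query(categories):
    """Join arXiv category codes into an OR query using the `cat:` field prefix."""
    if not categories:
        raise ValueError("at least one category is required")
    return " OR ".join(f"cat:{code}" for code in categories)


query = build_category_query(["cs.LG", "stat.ML"])
print(query)
```

The resulting string can be passed as `query=` to `arxiv.Search`. The papers returned may differ from the output shown in this notebook, which was produced with the bare `"cs.LG"` query.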

In [7]:
# Get the first paper; `results` is already exhausted, so request a fresh generator
paper = next(client.results(search))

In [8]:
paper

arxiv.Result(entry_id='http://arxiv.org/abs/2602.13197v1', updated=datetime.datetime(2026, 2, 13, 18, 59, 10, tzinfo=datetime.timezone.utc), published=datetime.datetime(2026, 2, 13, 18, 59, 10, tzinfo=datetime.timezone.utc), title='Imitating What Works: Simulation-Filtered Modular Policy Learning from Human Videos', authors=[arxiv.Result.Author('Albert J. Zhai'), arxiv.Result.Author('Kuo-Hao Zeng'), arxiv.Result.Author('Jiasen Lu'), arxiv.Result.Author('Ali Farhadi'), arxiv.Result.Author('Shenlong Wang'), arxiv.Result.Author('Wei-Chiu Ma')], summary="The ability to learn manipulation skills by watching videos of humans has the potential to unlock a new source of highly scalable data for robot learning. Here, we tackle prehensile manipulation, in which tasks involve grasping an object before performing various post-grasp motions. Human videos offer strong signals for learning the post-grasp motions, but they are less useful for learning the prerequisite grasping behaviors, especially fo
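
The repr above shows that each result carries a timezone-aware `published` datetime. One way to narrow a batch of results to a single day is to filter on that field client-side. The sketch below assumes UTC day boundaries and uses `SimpleNamespace` objects as offline stand-ins for `arxiv.Result`; `published_on` is a hypothetical helper name.

```python
from datetime import date, datetime, timezone
from types import SimpleNamespace


def published_on(results, day):
    """Keep results whose `published` timestamp falls on `day` (UTC)."""
    return [r for r in results if r.published.astimezone(timezone.utc).date() == day]


# Offline stand-ins for arxiv.Result; only the fields used here are filled in.
sample = [
    SimpleNamespace(
        title="A",
        published=datetime(2026, 2, 13, 18, 59, 10, tzinfo=timezone.utc),
    ),
    SimpleNamespace(
        title="B",
        published=datetime(2026, 2, 12, 23, 30, 0, tzinfo=timezone.utc),
    ),
]

todays = published_on(sample, date(2026, 2, 13))
print([r.title for r in todays])
```

With live data, `max_results` would need to be large enough to cover the whole day, since this filter only sees what the search returned.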

In [9]:
# Get the most recent ML papers (the goal is all of today's papers; see the note on submittedDate below)

# https://export.arxiv.org/api/query?search_query=cat:cs.LG+AND+submittedDate:[202001130630+TO+202001131645]

query = "cs.LG"
# Querying by submittedDate is not implemented in the Python API
# query = "cat:cs.LG+AND+submittedDate:[202001130630+TO+202101131645]"
search = arxiv.Search(
    query=query, max_results=10, sort_by=arxiv.SortCriterion.SubmittedDate
)

results = client.results(search)
# Print each title and publication timestamp
for r in results:
    print(r.title)
    print(r.published)

Imitating What Works: Simulation-Filtered Modular Policy Learning from Human Videos
2026-02-13 18:59:10+00:00
Selection of CMIP6 Models for Regional Precipitation Projection and Climate Change Assessment in the Jhelum and Chenab River Basins
2026-02-13 18:41:40+00:00
Improved Regret Guarantees for Online Mirror Descent using a Portfolio of Mirror Maps
2026-02-13 18:37:26+00:00
Learning functional components of PDEs from data using neural networks
2026-02-13 18:32:33+00:00
Realistic Face Reconstruction from Facial Embeddings via Diffusion Models
2026-02-13 18:28:24+00:00
Learning to Approximate Uniform Facility Location via Graph Neural Networks
2026-02-13 18:08:23+00:00
Quantization-Robust LLM Unlearning via Low-Rank Adaptation
2026-02-13 18:01:40+00:00
FlashSchNet: Fast and Accurate Coarse-Grained Neural Network Molecular Dynamics
2026-02-13 17:49:12+00:00
Order Matters in Retrosynthesis: Structure-aware Generation via Reaction-Center-Guided Discrete Flow Matching
2026-02-13 17:39:21+