In [1]:
import requests
import os
import re
import emoji
import pandas as pd
from collections import Counter, defaultdict
import string
import matplotlib.pyplot as plt
from bs4 import BeautifulSoup

In [2]:
# Define a function to fetch and parse arXiv data
def fetch_arxiv_data(search_query, max_results=10):
    # Define the API endpoint URL for searching articles
    api_url = "http://export.arxiv.org/api/query"

    # Define additional parameters like start and max_results
    params = {
        "search_query": search_query,
        "start": 0,  # Start index of results
        "max_results": max_results,  # Maximum number of results to retrieve
    }

    # Send an HTTP GET request to the arXiv API; the timeout keeps a
    # stalled connection from hanging the notebook indefinitely
    response = requests.get(api_url, params=params, timeout=30)

    # Check if the request was successful (HTTP status code 200)
    if response.status_code == 200:
        # Parse the response content using BeautifulSoup
        soup = BeautifulSoup(response.content, "xml")

        # Create a list to store article information
        articles = []

        # Extract article information from the parsed content
        entries = soup.find_all("entry")
        for entry in entries:
            title = entry.find("title").text
            paper_id = entry.find("id").text
            published = entry.find("published").text
            updated = entry.find("updated").text
            summary = entry.find("summary").text
            author = [author.text for author in entry.find_all("author")]
            comments = entry.find("arxiv:comment").text if entry.find("arxiv:comment") else ""
            journal_ref = entry.find("arxiv:journal_ref").text if entry.find("arxiv:journal_ref") else ""
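            # Note: this takes the first <link> element of the entry, which can be
            # the DOI link rather than the arXiv abstract page
            link = entry.find("link")["href"] if entry.find("link") else ""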
            primary_category = entry.find("arxiv:primary_category")["term"] if entry.find("arxiv:primary_category") else ""
            categories = [cat["term"] for cat in entry.find_all("category")]
            doi = entry.find("arxiv:doi").text if entry.find("arxiv:doi") else ""
            license = entry.find("arxiv:license")["type"] if entry.find("arxiv:license") else ""
            affiliation = [aff.text for aff in entry.find_all("arxiv:affiliation")]

            # Append article information to the list
            articles.append({
                "Title": title,
                "ID": paper_id,
                "Published": published,
                "Updated": updated,
                "Summary": summary,
                "Author": author,
                "Comments": comments,
                "Journal_Ref": journal_ref,
                "Link": link,
                "Primary_Category": primary_category,
                "Categories": categories,
                "DOI": doi,
                "License": license,
                "Affiliation": affiliation
            })

        # Create a DataFrame from the list of articles
        df = pd.DataFrame(articles)

        # Return the DataFrame
        return df

    else:
        print(f"Failed to retrieve data from arXiv API (HTTP {response.status_code})")
        return None
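
The function above fixes `start` at 0, so it only ever returns the first page of results. As a minimal sketch of how larger result sets could be requested page by page, the helper below generates one parameter dictionary per page. `page_params` and its `page_size` default are hypothetical names and values chosen for illustration; they are not part of the notebook or of the arXiv API.

```python
def page_params(search_query, total, page_size=100):
    """Yield one params dict per page until `total` results are covered.

    Hypothetical helper: the name and the page_size default are
    illustrative assumptions, not something the arXiv API defines.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    for start in range(0, total, page_size):
        yield {
            "search_query": search_query,
            "start": start,  # Index of the first result on this page
            "max_results": min(page_size, total - start),  # Last page may be short
        }


pages = list(page_params("all:electron", total=250, page_size=100))
print([(p["start"], p["max_results"]) for p in pages])
# [(0, 100), (100, 100), (200, 50)]
```

Each dictionary can be passed as `params` to `requests.get`. When making several calls in a row, it is worth pausing between them (for example with `time.sleep`), since the arXiv API documentation asks clients to rate-limit repeated requests.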


# arXiv API Search Query Parameter

The `search_query` parameter of the arXiv API takes a query string that determines which articles are returned in the response. The sections below describe its syntax.

## Keywords

You can use keywords to search for articles that contain specific terms or phrases. For example:

```python
search_query = "quantum physics"
```

This retrieves articles that contain the keywords "quantum" and "physics."

## Field-Specific Searches

You can restrict a search to particular fields of an article by prefixing a term with a field name and a colon. Common prefixes include:

- `all`: searches all fields of the articles, including the title, author names, abstract, and more.
- `ti`: searches only the titles of articles.
- `au`: searches author names.
- `abs`: searches the abstracts of articles.
- `cat`: searches for articles within a specific subject category, for example `cat:cond-mat.str-el`.

## Boolean Operators

You can use Boolean operators to refine your search. The arXiv API supports three:

- `AND`: retrieves articles that contain both specified terms. For example:

  ```python
  search_query = "quantum AND physics"
  ```

  This retrieves articles related to both quantum and physics.

- `OR`: retrieves articles that contain either of the specified terms. For example:

  ```python
  search_query = "quantum OR physics"
  ```

  This retrieves articles related to quantum or physics.

- `ANDNOT`: excludes articles that contain the term after the operator. The API spells this operator as one word; a bare `NOT` is not one of its operators. For example:

  ```python
  search_query = "quantum ANDNOT physics"
  ```

  This retrieves articles related to quantum but not physics.

## Wildcards

You can use wildcards to match partial terms or patterns. Wildcard handling depends on the search backend, so check the results of a wildcard query against a query you already know before relying on it. Common wildcards include:

- `*`: matches any sequence of characters. For example:

  ```python
  search_query = "astro*"
  ```

  This retrieves articles related to astrophysics, astronomy, etc.

- `?`: matches exactly one character. For example:

  ```python
  search_query = "colo?r"
  ```

  This matches "colour". Because `?` stands for exactly one character, it does not match "color".

## Exact Phrases

You can use double quotes to search for an exact phrase. For example:

```python
search_query = "\"quantum mechanics\""
```

This retrieves articles with the exact phrase "quantum mechanics."
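
Quotes, spaces, and colons cannot appear literally in a URL, so the query string is percent-encoded before it is sent. `requests` does this automatically when the query is passed through `params`. The sketch below uses the standard library's `urllib.parse.urlencode` to show what the encoded request looks like without contacting the API; the parameter values are arbitrary examples.

```python
from urllib.parse import urlencode

params = {
    "search_query": 'all:"quantum mechanics"',  # Exact phrase in all fields
    "start": 0,
    "max_results": 5,
}

# Spaces become "+", quotes become "%22", and the colon becomes "%3A"
query_string = urlencode(params)
print("http://export.arxiv.org/api/query?" + query_string)
# http://export.arxiv.org/api/query?search_query=all%3A%22quantum+mechanics%22&start=0&max_results=5
```

Printing the encoded URL this way is a quick check that a complex query is being sent as intended.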

## Parentheses

You can use parentheses to group terms and control the order of operations. For example:

```python
search_query = "(quantum OR physics) ANDNOT chemistry"
```

This retrieves articles related to quantum or physics but not chemistry.

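## Field Prefixes

You can specify the field to search within by placing a field prefix directly in front of a term. The API uses short prefixes such as `ti` for title, not the full field name. For example:

```python
search_query = "ti:quantum"
```

This searches for the term "quantum" only in article titles.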

The `search_query` parameter lets you combine operators, field prefixes, and keywords into queries that narrow the results to the articles you need.
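
Combining these pieces by hand is error-prone, so a small helper can assemble the string. The sketch below is one possible approach, not an official client: `build_query` is a hypothetical function, and it assumes the field prefixes (`ti`, `abs`, `cat`, ...) and the `ANDNOT` operator documented for the arXiv API.

```python
def build_query(include, exclude=None, operator="AND"):
    """Combine (prefix, term) pairs into an arXiv-style query string.

    Hypothetical helper for illustration only; it is not part of the
    notebook or of any arXiv client library.
    """
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    if not include:
        raise ValueError("at least one (prefix, term) pair is required")

    def clause(prefix, term):
        # Quote multi-word terms so they are searched as an exact phrase
        if " " in term:
            term = f'"{term}"'
        return f"{prefix}:{term}"

    query = f" {operator} ".join(clause(p, t) for p, t in include)
    if exclude:
        # Group the included terms so each exclusion applies to all of them
        if len(include) > 1:
            query = f"({query})"
        for p, t in exclude:
            query += f" ANDNOT {clause(p, t)}"
    return query


print(build_query(
    [("ti", "quantum"), ("abs", "phase transition")],
    exclude=[("cat", "cond-mat.str-el")],
))
# (ti:quantum AND abs:"phase transition") ANDNOT cat:cond-mat.str-el
```

The returned string can be passed directly as the `search_query` argument of `fetch_arxiv_data`.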

In [3]:
# Example usage:
search_query = "all:electron"
max_results = 10
arxiv_df = fetch_arxiv_data(search_query, max_results)
arxiv_df

Unnamed: 0,Title,ID,Published,Updated,Summary,Author,Comments,Journal_Ref,Link,Primary_Category,Categories,DOI,License,Affiliation
0,Impact of Electron-Electron Cusp on Configurat...,http://arxiv.org/abs/cond-mat/0102536v1,2001-02-28T20:12:09Z,2001-02-28T20:12:09Z,The effect of the electron-electron cusp on ...,"[\nDavid Prendergast\nDepartment of Physics\n,...","11 pages, 6 figures, 3 tables, LaTeX209, submi...","J. Chem. Phys. 115, 1626 (2001)",http://dx.doi.org/10.1063/1.1383585,cond-mat.str-el,[cond-mat.str-el],10.1063/1.1383585,,"[Department of Physics, NMRC, University Colle..."
1,Electron thermal conductivity owing to collisi...,http://arxiv.org/abs/astro-ph/0608371v1,2006-08-17T14:05:46Z,2006-08-17T14:05:46Z,We calculate the thermal conductivity of ele...,[\nP. S. Shternin\nIoffe Physico-Technical Ins...,"8 pages, 3 figures",Phys.Rev. D74 (2006) 043004,http://dx.doi.org/10.1103/PhysRevD.74.043004,astro-ph,[astro-ph],10.1103/PhysRevD.74.043004,,"[Ioffe Physico-Technical Institute, Ioffe Phys..."
2,Electron pairing: from metastable electron pai...,http://arxiv.org/abs/1802.06593v1,2018-02-19T11:51:42Z,2018-02-19T11:51:42Z,Starting from the shell structure in atoms a...,"[\nGuo-Qiang Hai\n, \nLadir Cândido\n, \nBraul...","17 pages, 6 figures, Journal of Physics Commun...",,http://dx.doi.org/10.1088/2399-6528/aaaee0,cond-mat.str-el,[cond-mat.str-el],10.1088/2399-6528/aaaee0,,[]
3,Electron Temperature Anisotropy and Electron B...,http://arxiv.org/abs/2010.01066v1,2020-10-02T15:46:56Z,2020-10-02T15:46:56Z,Electron temperature anisotropies and electr...,"[\nHeyu Sun\n, \nJinsong Zhao\n, \nWen Liu\n, ...",,,http://dx.doi.org/10.3847/1538-4357/abb3ca,physics.space-ph,[physics.space-ph],10.3847/1538-4357/abb3ca,,[]
4,Hamiltonian of a many-electron system with sin...,http://arxiv.org/abs/1501.04914v1,2015-01-20T18:48:22Z,2015-01-20T18:48:22Z,Based on the metastable electron-pair energy...,"[\nG. -Q. Hai\n, \nF. M. Peeters\n]",,Eur. Phys. J. B (2015) 88: 20,http://dx.doi.org/10.1140/epjb/e2014-50686-x,cond-mat.supr-con,"[cond-mat.supr-con, cond-mat.mes-hall]",10.1140/epjb/e2014-50686-x,,[]
5,Electron-Electron Bremsstrahlung Emission and ...,http://arxiv.org/abs/0707.4225v1,2007-07-28T09:32:22Z,2007-07-28T09:32:22Z,Although both electron-ion and electron-elec...,"[\nEduard P. Kontar\n, \nA. Gordon Emslie\n, \...","7 pages, 5 figures, submitted to Astrophysical...",,http://dx.doi.org/10.1086/521977,astro-ph,[astro-ph],10.1086/521977,,[]
6,Improved scenario of baryogenesis,http://arxiv.org/abs/astro-ph/9904306v1,1999-04-22T15:54:59Z,1999-04-22T15:54:59Z,"It is assumed that, in the primordial plasma...",[\nD. L. Khokhlov\n],3 pages LaTeX,,http://arxiv.org/abs/astro-ph/9904306v1,astro-ph,[astro-ph],,,[]
7,Exact Electron-Pairing Ground States of Tight-...,http://arxiv.org/abs/cond-mat/0310615v1,2003-10-27T08:59:02Z,2003-10-27T08:59:02Z,We present a class of exactly solvable model...,[\nAkinori Tanaka\n],"4 pages, 1 figure",,http://dx.doi.org/10.1143/JPSJ.73.1107,cond-mat.str-el,[cond-mat.str-el],10.1143/JPSJ.73.1107,,[]
8,Insights into the Electron-Electron Interactio...,http://arxiv.org/abs/2101.10508v1,2021-01-26T01:15:10Z,2021-01-26T01:15:10Z,The effective electron-electron interaction ...,"[\nCarl A. Kukkonen\n, \nKun Chen\n]","16 pages, 20 figures",,http://arxiv.org/abs/2101.10508v1,cond-mat.quant-gas,"[cond-mat.quant-gas, cond-mat.mtrl-sci, cond-m...",,,[]
9,Electron-electron interactions in a weakly scr...,http://arxiv.org/abs/cond-mat/0205001v1,2002-04-30T20:00:18Z,2002-04-30T20:00:18Z,We probe the strength of electron-electron i...,"[\nI. Karakurt\n, \nA. J. Dahm\n]","4 pages, 5 figures",,http://arxiv.org/abs/cond-mat/0205001v1,cond-mat.str-el,[cond-mat.str-el],,,[]
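
In the output above, each `Author` value still carries newline characters and, where present, the affiliation text, because the whole `<author>` element is read as one string. A minimal cleanup sketch follows. `split_author_entry` is a hypothetical helper, and it assumes the first non-empty line is the author name and any remaining lines are affiliations, which matches the rows shown here.

```python
def split_author_entry(raw):
    """Split the raw text of one <author> element into (name, affiliations).

    Hypothetical helper: assumes the first non-empty line is the name
    and any following lines are affiliations.
    """
    lines = [line.strip() for line in raw.splitlines() if line.strip()]
    if not lines:
        return "", []
    return lines[0], lines[1:]


raw = "\nDavid Prendergast\nDepartment of Physics\n"
print(split_author_entry(raw))
# ('David Prendergast', ['Department of Physics'])
```

To clean the whole column, the helper can be applied to each list, for example `arxiv_df["Author"].apply(lambda authors: [split_author_entry(a)[0] for a in authors])`.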


In [4]:
arxiv_df.to_csv('example_output.csv')
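
Columns that hold Python lists, such as `Categories`, are written to the CSV as their string representation, so they come back as plain strings when the file is read again. The sketch below shows one way to convert such a cell back into a list with the standard library's `ast.literal_eval`; `parse_list_cell` is a hypothetical helper name.

```python
import ast


def parse_list_cell(cell):
    """Turn a cell such as "['astro-ph', 'hep-th']" back into a Python list.

    Hypothetical helper; empty or non-string cells become an empty list.
    """
    if not isinstance(cell, str) or not cell.strip():
        return []
    value = ast.literal_eval(cell)  # Safely evaluates Python literals only
    return list(value) if isinstance(value, (list, tuple)) else [value]


print(parse_list_cell("['cond-mat.supr-con', 'cond-mat.mes-hall']"))
# ['cond-mat.supr-con', 'cond-mat.mes-hall']
print(parse_list_cell("[]"))
# []
```

With pandas, the helper can be supplied when loading the file, for example `pd.read_csv("example_output.csv", index_col=0, converters={"Categories": parse_list_cell})`. Passing `index_col=0` also restores the saved index instead of producing an `Unnamed: 0` column.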