## Example 2: Executing a Search and Downloading the Results

In `example_1.ipynb`, we saw how to download an e-print using its arXiv ID. This is a useful feature, but is somewhat limited by the fact that we need to know the ID ahead of time. Instead, what if we want to search for any e-prints that contain certain keywords or authors (for example), and download the results of the search? This functionality is covered by the other main class of `pyxiv`: the `Search` class. In this example, we'll get comfortable with using `Search` to find and download e-prints. To start, we import `pyxiv`.

In [1]:
import sys
sys.path.append("../")
import pyxiv

To view the basics of the `Search` class, including its parameters and intended usage, we can run the following cell.

In [2]:
pyxiv.Search?

Init signature:
pyxiv.Search(
    query: str,
    start_date: str,
    end_date: str = 'today',
    max_results: int = 250,
    sort_order: str = 'descending',
) -> None
Docstring:
Class for searching through e-prints submitted to arxiv.org using
the [arXiv API](https://info.arxiv.org/help/api/index.html). Once
e-prints are found that match the search criteria, they can be
downloaded locally as .pdf files.

Parameters
----------
query : str
    The core part of a query compatible with the arXiv API.
    Instructions on how to construct queries are 

As we can see in the docstring for `Search`, it has two mandatory arguments: `query` and `start_date`. `query` is a string that specifies what to search for, formatted according to the requirements of the [arXiv API](https://info.arxiv.org/help/api/index.html). `start_date` is an ISO-formatted date string (i.e. a string of the form `"YYYY-MM-DD"`) that sets the date at which the search results begin. For convenience, `start_date` can also take the values `"today"` and `"yesterday"`.
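
As a rough illustration of the accepted date formats, the sketch below converts each kind of value into a `datetime.date`. The helper name `parse_date` is hypothetical and is not part of `pyxiv`; how `Search` handles these strings internally is an assumption.

```python
from datetime import date, timedelta


def parse_date(value: str) -> date:
    """Convert "today", "yesterday", or "YYYY-MM-DD" to a date (hypothetical helper)."""
    if value == "today":
        return date.today()
    if value == "yesterday":
        return date.today() - timedelta(days=1)
    # Raises ValueError if the string is not a valid ISO date.
    return date.fromisoformat(value)


print(parse_date("2023-03-14"))  # 2023-03-14
```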

There are also three optional arguments: `end_date`, `max_results`, and `sort_order`. `end_date` is the date at which the search results end, and is formatted in an identical manner to `start_date`; it is set to `"today"` by default. `max_results` is an integer that sets the number of e-prints returned by the arXiv API; it is set to `250` by default.

Here, it is necessary to distinguish between the search results returned by `Search` and the results returned by the arXiv API. The date limits are not part of the query sent to the arXiv API, so filtering to match `start_date` and `end_date` is a post-query process: any e-prints found by the arXiv API that were submitted before `start_date` or after `end_date` are ignored by `Search`. As a result, the number of e-prints returned by `Search` will generally be less than `max_results`. In fact, if the number of e-prints returned by `Search` is equal to `max_results`, then it is likely necessary to increase `max_results`, because the query of the arXiv API is ending before the date limits are reached. It is recommended to increase `max_results` from its default value of 250 if we are searching for keywords with a high frequency of publication or across a long time span.
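
The post-query filtering can be pictured with a minimal sketch. The helper names `filter_by_date` and `probably_truncated` and the placeholder entries are hypothetical; this is not the code used inside `pyxiv`, only an illustration of why a full set of results suggests raising `max_results`.

```python
from datetime import date


def filter_by_date(entries, start, end):
    """Keep (label, submitted) pairs with start <= submitted <= end."""
    return [(label, submitted) for label, submitted in entries
            if start <= submitted <= end]


def probably_truncated(kept, max_results):
    """True when the filtered results fill the cap, so the query may have ended early."""
    return len(kept) >= max_results


# Placeholder entries standing in for what the arXiv API might return.
entries = [
    ("newer", date(2023, 6, 1)),
    ("in-range-1", date(2023, 4, 20)),
    ("in-range-2", date(2023, 3, 16)),
    ("older", date(2023, 1, 5)),
]

kept = filter_by_date(entries, date(2023, 3, 14), date(2023, 5, 4))
print([label for label, _ in kept])               # ['in-range-1', 'in-range-2']
print(probably_truncated(kept, max_results=250))  # False
```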

Finally, `sort_order` controls the order in which the arXiv API searches through e-prints; it is `'descending'` by default. Setting `sort_order` to `'ascending'` is only recommended if we want very old e-prints, since the query will start with the oldest submissions first.

To illustrate how `Search` works, suppose we want to find and download all papers submitted to the astro-ph.EP category between March 14, 2023 and May 4, 2023 that are about exoplanets and machine learning. We can accomplish this goal by creating the following instance of `Search`.

In [3]:
search = pyxiv.Search(
    query      = "all:exoplanets AND all:machine learning AND cat:astro-ph.EP",
    start_date = "2023-03-14",
    end_date   = "2023-05-04"
)

Acquiring query results from the arXiv API...
Results acquired in 2.3 sec.


We see that we have converted our desired search criteria (e-prints involving exoplanets and machine learning submitted to astro-ph.EP) to a query string. This conversion was done like so:
- We want to search for e-prints involving exoplanets. Since we are not restricting ourselves to a particular data field (like author, title, etc.), we use the `"all"` field, and get `"all:exoplanets"`.
- We want to search for e-prints involving machine learning. Since we are not restricting ourselves to a particular data field (like author, title, etc.), we use the `"all"` field, and get `"all:machine learning"`.
- We want to search for e-prints involving exoplanets and machine learning. Therefore, we combine the previous strings using `"AND"`, giving us `"all:exoplanets AND all:machine learning"`.
- We want to search specifically in the astro-ph.EP category. Thus, we append `"cat:astro-ph.EP"` to the previous string using `"AND"`, yielding `"all:exoplanets AND all:machine learning AND cat:astro-ph.EP"`.
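
The same steps can be written out in plain Python. This is only a convenience sketch for assembling the string; the variable names are illustrative and nothing here depends on `pyxiv`.

```python
# One clause per search criterion, in the order they were derived above.
clauses = [
    "all:exoplanets",
    "all:machine learning",
    "cat:astro-ph.EP",
]

# Combine the clauses with the AND operator.
query = " AND ".join(clauses)
print(query)  # all:exoplanets AND all:machine learning AND cat:astro-ph.EP
```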

A full description of how queries are formatted can be found at the arXiv API [query instructions manual](https://info.arxiv.org/help/api/user-manual.html#51-details-of-query-construction). Note that in the examples given in the instructions manual, the queries are full URLs prefixed with `http://export.arxiv.org/api/query?search_query=` and occasionally suffixed with other specifiers like `&start=0&max_results=1`. These should NOT be included in the string passed to the `query` argument of `Search`, as they are handled automatically behind the scenes.

As well, we may notice that the query used here contains spaces, while example queries in the instructions manual use `+` to join words. The query with spaces works, though, because the `Search` class automatically replaces all spaces with pluses behind the scenes. Therefore, the queries `"all:exoplanets AND all:machine learning AND cat:astro-ph.EP"` and `"all:exoplanets+AND+all:machine+learning+AND+cat:astro-ph.EP"` are seen as identical by `Search` (but not by the arXiv API alone!), with the former being preferred for readability.
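
To make the behind-the-scenes handling more concrete, here is a minimal sketch of how a query string could be turned into a full arXiv API URL. The function `build_url` is hypothetical, and the choice of `sortBy=submittedDate` is an assumption about what `Search` does; only the URL prefix and the `start`, `max_results`, `sortBy`, and `sortOrder` parameters come from the arXiv API itself.

```python
BASE_URL = "http://export.arxiv.org/api/query?search_query="


def build_url(query: str, max_results: int = 250,
              sort_order: str = "descending") -> str:
    """Assemble a full arXiv API URL from a core query (hypothetical helper)."""
    # Replace spaces with pluses, as the API examples do.
    encoded = query.replace(" ", "+")
    return (
        f"{BASE_URL}{encoded}"
        f"&start=0&max_results={max_results}"
        # Assumption: results are sorted by submission date.
        f"&sortBy=submittedDate&sortOrder={sort_order}"
    )


with_spaces = build_url("all:exoplanets AND all:machine learning AND cat:astro-ph.EP")
with_pluses = build_url("all:exoplanets+AND+all:machine+learning+AND+cat:astro-ph.EP")
print(with_spaces == with_pluses)  # True
```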

Now that `search` has been created, we can view the search results by calling the `results` method. To learn more about the `results` method, we can run the following cell.

In [4]:
pyxiv.Search.results?

Signature: pyxiv.Search.results(self, detail: str = 'low') -> Optional[str]
Docstring:
Method that generates a string containing the results of
the search.

Parameters
----------
detail : str, default 'low'
    Controls the level of detail included in the summary of
    each `ePrint` object. If set to `'high'`, all relevant
    fields are included. If set to `'low'`, `abstract`,
    `comment`, `all_categories`, `doi`, `journal_ref`, and
    `date_updated` are omitted.

Returns
-------
Union[str, None]
    If the search yields results, a string that
    concatenates `ePrint.summary(detail=detail)` for each
    e-print is returned. If the search yields no results,
    `None` is returned and a message to help with
    troubleshooting is printed.
[1;31mFile:[

Next, we call the `results` method on `search`.

In [5]:
results = search.results()
print(results)

The specified query yielded 5 results:
arXiv.org e-Print 2303.09335v2
Title
-----
ExoplANNET: A deep learning algorithm to detect and identify planetary
  signals in radial velocity data

Author(s)
---------
L. A. Nieto, R. F. Díaz

Primary category: astro-ph.EP
URL: https://arxiv.org/pdf/2303.09335v2.pdf
Submitted: 2023-03-16

arXiv.org e-Print 2303.12925v1
Title
-----
A Catalogue of Exoplanet Atmospheric Retrieval Codes

Author(s)
---------
Ryan J. MacDonald, Natasha E. Batalha

Primary category: astro-ph.EP
URL: https://arxiv.org/pdf/2303.12925v1.pdf
Submitted: 2023-03-22

arXiv.org e-Print 2304.00224v1
Title
-----
The CARMENES search for exoplanets around M dwarfs -- A deep transfer
  learning method to determine Teff and [M/H] of target stars

Author(s)
---------
A. Bello-García, V. M. Passegger, J. Ordieres-Meré, A. Schweitzer, J. A. Caballero, A. González-Marcos, I. Ribas, A. Reiners, A. Quirrenbach, P. J. Amado, V. J. S. Béjar, C. Cifuentes, Th. Henning, A. Kaminski, R. Luque, 

Note that the string returned by the `results` method appears to be a sequence of strings returned by the `ePrint` class's `summary` method, as seen in `example_1.ipynb`. This is indeed the case, since `Search` internally generates a list of `ePrint` objects.
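
Because the results come back as one plain string, it can be convenient to pull the arXiv IDs out of it, for example to pick a few e-prints for individual download. The sketch below relies only on the `arXiv.org e-Print <ID>` header line visible in the output above; the helper `extract_arxiv_ids` is hypothetical and not part of `pyxiv`.

```python
import re
from typing import Optional


def extract_arxiv_ids(results: Optional[str]) -> list:
    """Return the arXiv IDs found in a results string (hypothetical helper)."""
    if results is None:  # `results` returns None when nothing was found
        return []
    # Each summary begins with a line such as "arXiv.org e-Print 2303.09335v2".
    return re.findall(r"^arXiv\.org e-Print (\S+)$", results, flags=re.MULTILINE)


# Abbreviated sample in the format shown above.
sample = (
    "The specified query yielded 2 results:\n"
    "arXiv.org e-Print 2303.09335v2\n"
    "Title\n"
    "-----\n"
    "ExoplANNET: A deep learning algorithm to detect and identify planetary\n"
    "  signals in radial velocity data\n"
    "\n"
    "arXiv.org e-Print 2303.12925v1\n"
    "Title\n"
    "-----\n"
    "A Catalogue of Exoplanet Atmospheric Retrieval Codes\n"
)

print(extract_arxiv_ids(sample))  # ['2303.09335v2', '2303.12925v1']
```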

In a manner analogous to the `summary` method, the `results` method has an optional `detail` argument, set to `"low"` by default. Setting `detail = "high"` adds information about abstracts, comments, any additional categories the e-prints are included in, DOIs (if applicable), journal references (if applicable), and the dates the e-prints were last updated.

In [6]:
results = search.results(detail="high")
print(results)

The specified query yielded 5 results:
arXiv.org e-Print 2303.09335v2
Title
-----
ExoplANNET: A deep learning algorithm to detect and identify planetary
  signals in radial velocity data

Author(s)
---------
L. A. Nieto, R. F. Díaz

Abstract
--------
The detection of exoplanets with the radial velocity method consists in
detecting variations of the stellar velocity caused by an unseen sub-stellar
companion. Instrumental errors, irregular time sampling, and different noise
sources originating in the intrinsic variability of the star can hinder the
interpretation of the data, and even lead to spurious detections. In recent
times, work began to emerge in the field of extrasolar planets that use Machine
Learning algorithms, some with results that exceed those obtained with the
traditional techniques in the field. We seek to explore the scope of the neural
networks in the radial velocity method, in particular for exoplanet detection
in the presence of correlated noise of stellar origin. In 

From here, if only one or two of the e-prints found by `Search` are of interest, they can be downloaded individually, either by using the URLs to their .pdf files included in the output of the `results` method, or by calling the `download` method on `eprint_from_arxiv_id` (as seen in `example_1.ipynb`). However, if all of the e-prints are of interest, they can be downloaded collectively using the `download_results` method. To learn more about the `download_results` method, we can run the following cell.

In [8]:
pyxiv.Search.download_results?

Signature: pyxiv.Search.download_results(self, save_directory: str) -> Optional[list[str]]
Docstring:
Method that downloads the search results as .pdf files to a
specified local directory.

Parameters
----------
save_directory : str
    The directory to which the e-prints are saved.

Returns
-------
Union[list[str], None]
    If the search yields results, a list of the names of the
    downloaded files is returned. If the search yields no results,
    `None` is returned and a message to help with
    troubleshooting is printed.
File:      c:\users\rusla\dropbox\23-github\projects\chat-research\pyxiv\search.py
Type:      function

For example, we can download all the e-prints found by `search`.

In [9]:
search.download_results("./papers")

Downloading e-prints...
[1/5] 'ExoplANNET: A deep learning algorithm to detect and identify planetary
  signals in radial velocity data' (2303.09335v2)
[2/5] 'A Catalogue of Exoplanet Atmospheric Retrieval Codes' (2303.12925v1)
[3/5] 'The CARMENES search for exoplanets around M dwarfs -- A deep transfer
  learning method to determine Teff and [M/H] of target stars' (2304.00224v1)
[4/5] 'Distinguishing a planetary transit from false positives: a
  Transformer-based classification for planetary transit signals' (2304.14283v1)
[5/5] 'Multiplicity Boost Of Transit Signal Classifiers: Validation of 69 New
  Exoplanets Using The Multiplicity Boost of ExoMiner' (2305.02470v2)
Download complete! 5 e-prints (20.9 MiB) were downloaded in 1 min 29.4 sec and saved to ./papers.


['./papers/2303.09335v2.pdf',
 './papers/2303.12925v1.pdf',
 './papers/2304.00224v1.pdf',
 './papers/2304.14283v1.pdf',
 './papers/2305.02470v2.pdf']

If the path provided to `download_results` does not exist, it is created automatically. As well, note that the `download_results` method returns a list of the paths to the .pdf files.
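
One possible use of the returned list is a quick check that the files are actually on disk. The helper `summarize_downloads` below is a hypothetical sketch, not part of `pyxiv`; it accepts the list of paths (or `None`, which `download_results` returns when the search has no results) and reports how many files exist and their combined size.

```python
from pathlib import Path
from typing import Optional


def summarize_downloads(paths: Optional[list]) -> str:
    """Report how many of the given paths exist and their total size (hypothetical helper)."""
    if paths is None:  # no search results, so nothing was downloaded
        return "No e-prints were downloaded."
    existing = [Path(p) for p in paths if Path(p).is_file()]
    total_mib = sum(p.stat().st_size for p in existing) / 2**20
    return f"{len(existing)} of {len(paths)} files found ({total_mib:.1f} MiB)"


# With pyxiv available, the input would be the list returned by
# search.download_results("./papers"); here we only show the None case.
print(summarize_downloads(None))  # No e-prints were downloaded.
```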