# MPIA Arxiv on Deck 2: Debugging notebook

In this notebook, I keep a few first-order commands for diagnosing issues with papers.
The main definitions are taken from the main notebook.

In [1]:
# Imports
import os
from IPython.display import Markdown, display
from tqdm.notebook import tqdm
import warnings
from PIL import Image 

# requires arxiv_on_deck_2

from arxiv_on_deck_2.arxiv2 import (get_new_papers, 
                                    get_paper_from_identifier,
                                    retrieve_document_source, 
                                    get_markdown_badge)
from arxiv_on_deck_2 import (latex,
                             latex_bib,
                             mpia,
                             highlight_authors_in_list)

# Some figures are really big: raise PIL's decompression-bomb pixel limit
Image.MAX_IMAGE_PIXELS = 1000000000 

# Some useful definitions.
class AffiliationWarning(UserWarning):
    pass

class AffiliationError(RuntimeError):
    pass

def validation(source: str):
    """Raise an error during parsing of the source file if the paper fails the affiliation check
    
    Allows checks before parsing TeX code.
    
    Raises AffiliationError
    """
    check = mpia.affiliation_verifications(source, verbose=True)
    if check is not True:
        raise AffiliationError(f"mpia.affiliation_verifications: {check}")

        
warnings.simplefilter('always', AffiliationWarning)
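The `validation` hook lets the document parser abort before any TeX is parsed. A minimal sketch of that callback pattern, assuming a hypothetical checker `fake_affiliation_check` in place of `mpia.affiliation_verifications` (which, judging from `validation` above, returns `True` on success and otherwise something describing the failure); `make_validation` and `parse_document` are also hypothetical names:

```python
from typing import Callable, Optional, Union


class AffiliationError(RuntimeError):
    """Raised when a paper fails the affiliation check."""


def fake_affiliation_check(source: str) -> Union[bool, str]:
    """Hypothetical stand-in for mpia.affiliation_verifications.

    Returns True if the source mentions the institute, else an error message.
    """
    if "Max Planck Institute for Astronomy" in source:
        return True
    return "no MPIA affiliation found"


def make_validation(check: Callable[[str], Union[bool, str]]) -> Callable[[str], None]:
    """Wrap a checker into a validation hook that raises on failure."""
    def validation(source: str) -> None:
        result = check(source)
        if result is not True:
            raise AffiliationError(f"affiliation check failed: {result}")
    return validation


def parse_document(source: str, validation: Optional[Callable[[str], None]] = None) -> str:
    """Hypothetical parser: runs the hook before doing any (expensive) parsing."""
    if validation is not None:
        validation(source)
    return source.strip()


validate = make_validation(fake_affiliation_check)
parse_document(r"\affiliation{Max Planck Institute for Astronomy}", validate)  # passes
try:
    parse_document(r"\affiliation{Somewhere else}", validate)
except AffiliationError as err:
    print(err)
```

Raising an exception (rather than returning a flag) is what lets the caller wrap the whole parse in a single `try`/`except AffiliationError`, as done further down in this notebook.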

We get the author list from the MPIA website

In [2]:
# !rm -f tmp_mpia_authors.yml

In [3]:
# Getting the list of authors can take some time (it requires an internet connection).
# Cache the MPIA author list to avoid re-downloading it every time we restart the kernel.
import yaml
try:
    with open('tmp_mpia_authors.yml', 'r') as fin:
        mpia_authors = yaml.load(fin, yaml.BaseLoader)
    print("`mpia.get_mpia_mitarbeiter_list()`: restored from cache")
except FileNotFoundError:
    print("`mpia.get_mpia_mitarbeiter_list()`: cannot be restored from cache.")
    # get list from MPIA website
    # it automatically filters identified non-scientists :func:`mpia.filter_non_scientists`
    mpia_authors = mpia.get_mpia_mitarbeiter_list()
    with open('tmp_mpia_authors.yml', 'w') as fout:
        fout.write(yaml.dump(mpia_authors))

`mpia.get_mpia_mitarbeiter_list()`: restored from cache
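The cell above is a cache-or-fetch pattern. A generic sketch of the same idea, using JSON from the standard library instead of YAML to stay dependency-free; `load_or_fetch` is a hypothetical helper and `fetch` stands in for `mpia.get_mpia_mitarbeiter_list()`:

```python
import json
import os
from typing import Any, Callable


def load_or_fetch(path: str, fetch: Callable[[], Any]) -> Any:
    """Return cached data from `path`, or call `fetch()` and cache its result."""
    try:
        with open(path, "r", encoding="utf-8") as fin:
            data = json.load(fin)
        print(f"{path}: restored from cache")
    except FileNotFoundError:
        print(f"{path}: cannot be restored from cache, fetching")
        data = fetch()
        with open(path, "w", encoding="utf-8") as fout:
            # keep non-ASCII names (e.g. accents) readable in the cache file
            json.dump(data, fout, ensure_ascii=False)
    return data
```

As with the commented-out `rm -f` cell, deleting the cache file forces a refresh. Note that JSON returns tuples as lists, whereas `yaml.BaseLoader` returns every scalar as a string.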


We get the paper to debug

In [4]:
which = "2406.10032"
paper = get_paper_from_identifier(which)
paper


|||
|---:|:---|
| [![arXiv](https://img.shields.io/badge/arXiv-2406.10032-b31b1b.svg)](https://arxiv.org/abs/2406.10032) | **GASTLI: An open-source coupled interior-atmosphere model to unveil gas giant composition**  |
|| Lorena Acuña, Laura Kreidberg, Meng Zhai, Paul Mollière |
|*Appeared on*| *2024-06-14*|
|*Comments*| *18 pages, 9 figures. In review in Astronomy & Astrophysics*|
|**Abstract**| The metal mass fractions of gas giants are a powerful tool to constrain their formation mechanisms and evolution. The metal content is inferred by comparing mass and radius measurements with interior structure and evolution models. In the midst of the JWST, CHEOPS, TESS, and the forthcoming PLATO era, we are at the brink of obtaining unprecedented precision in radius, age and atmospheric metallicity measurements. To prepare for this wealth of data, we present the GAS gianT modeL for Interiors (GASTLI), an easy-to-use, publicly available Python package. The code is optimized to rapidly calculate mass-radius relations, and radius and luminosity thermal evolution curves for a variety of envelope compositions and core mass fractions. Its applicability spans planets with masses $17 \ M_{\oplus} < M < 6 \ M_{Jup}$, and equilibrium temperatures $T_{eq} < 1000$ K. The interior model is stratified in a core composed of water and rock, and an envelope constituted by H/He and metals (water). The interior is coupled to a grid of self-consistent, cloud-free atmospheric models to determine the atmospheric and boundary interior temperature, as well as the contribution of the atmosphere to the total radius. We successfully validate GASTLI by comparing it to previous work and data of the Solar System's gas giants and Neptune. We also test GASTLI on the Neptune-mass exoplanet HAT-P-26 b, finding a bulk metal mass fraction between 0.60-0.78 and a core mass of 8.5-14.4 $M_{\oplus}$. Finally, we explore the impact of different equations of state and assumptions, such as C/O ratio and transit pressure, in the estimation of bulk metal mass fraction. These differences between interior models entail a change in radius of up to 2.5% for Jupiter-mass planets, but more than 10\% for Neptune-mass. These are equivalent to variations in core mass fraction of 0.07, or 0.10 in envelope metal mass fraction.|
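The first cell of this table is a shields.io badge linking to the arXiv abstract page. A sketch of how such a badge string can be built from an identifier; `make_arxiv_badge` is a hypothetical helper that reproduces the markup shown above, not necessarily how `get_markdown_badge` is implemented:

```python
def make_arxiv_badge(paper_id: str) -> str:
    """Build a Markdown arXiv badge (shields.io image wrapped in a link)."""
    # shields.io static badges use '-' as separator (label-message-color);
    # new-style arXiv ids such as 2406.10032 contain no '-'
    image = f"https://img.shields.io/badge/arXiv-{paper_id}-b31b1b.svg"
    link = f"https://arxiv.org/abs/{paper_id}"
    return f"[![arXiv]({image})]({link})"


print(make_arxiv_badge("2406.10032"))
```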

In [5]:
import re
from typing import Sequence

def author_match(author: str, hl_list: Sequence[str], verbose=False) -> Sequence[str]:
    """ Matching author names with a family name reference list
    
    :param author: the author string to check
    :param hl_list: the list of reference authors to match
    :param verbose: prints matching results if set
    :return: the matching sequences, or an empty list if there is no match
    """
    for hl in hl_list:
        # escape the author string: the '.' in initials is a regex metacharacter
        match = re.findall(r"\b{:s}\b".format(re.escape(author)), hl, re.IGNORECASE)
        if match:
            if verbose:
                print(author, ' -> ',  hl, ' | ', match)
            return match
    return []


def highlight_authors_in_list(author_list: Sequence[str], 
                              hl_list: Sequence[str], 
                              verbose: bool = False) -> Sequence[str]:
    """ Highlight all authors of the paper that match `hl_list` entries

    :param author_list: the list of authors
    :param hl_list: the list of authors to highlight
    :param verbose: prints matching results if set
    :return: the list of authors with the highlighted authors
    """
    new_authors = []
    for author in author_list:
        match = author_match(author, hl_list, verbose)
        if match:
            new_authors.append(f"<mark>{author}</mark>")
        else:
            new_authors.append(f"{author}")
    return new_authors
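`author_match` builds a regular expression from the author string, so characters such as the `.` in initials act as regex metacharacters. A small illustration of why escaping matters (the non-matching name is made up for the example):

```python
import re

author = "L. Kreidberg"
candidate = "LX Kreidberg"  # hypothetical name that should NOT match

# Unescaped: '.' matches any character, so the pattern also matches 'LX Kreidberg'
unescaped = re.findall(r"\b{:s}\b".format(author), candidate, re.IGNORECASE)

# Escaped: '.' is matched literally
escaped = re.findall(r"\b{}\b".format(re.escape(author)), candidate, re.IGNORECASE)

print(unescaped)  # ['LX Kreidberg']
print(escaped)    # []
```

Escaping also prevents a `re.error` when a name happens to contain characters such as `(` or `+`.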

In [6]:
# Check author list with their initials
normed_author_list = [mpia.get_initials(k) for k in paper['authors']]
normed_mpia_authors = [k[1] for k in mpia_authors]
hl_authors = highlight_authors_in_list(normed_author_list, normed_mpia_authors, verbose=True)
matches = [(hl, orig) for hl, orig in zip(hl_authors, paper['authors']) if 'mark' in hl]
if not matches:
    warnings.warn(AffiliationWarning("WARNING: This paper does not seem to have MPIA authors."))
    
paper['authors'] = hl_authors
paper

L. Kreidberg  ->  L. Kreidberg  |  ['L. Kreidberg']
P. Mollière  ->  P. Mollière  |  ['P. Mollière']



|||
|---:|:---|
| [![arXiv](https://img.shields.io/badge/arXiv-2406.10032-b31b1b.svg)](https://arxiv.org/abs/2406.10032) | **GASTLI: An open-source coupled interior-atmosphere model to unveil gas giant composition**  |
|| L. Acuña, <mark>L. Kreidberg</mark>, M. Zhai, <mark>P. Mollière</mark> |
|*Appeared on*| *2024-06-14*|
|*Comments*| *18 pages, 9 figures. In review in Astronomy & Astrophysics*|
|**Abstract**| The metal mass fractions of gas giants are a powerful tool to constrain their formation mechanisms and evolution. The metal content is inferred by comparing mass and radius measurements with interior structure and evolution models. In the midst of the JWST, CHEOPS, TESS, and the forthcoming PLATO era, we are at the brink of obtaining unprecedented precision in radius, age and atmospheric metallicity measurements. To prepare for this wealth of data, we present the GAS gianT modeL for Interiors (GASTLI), an easy-to-use, publicly available Python package. The code is optimized to rapidly calculate mass-radius relations, and radius and luminosity thermal evolution curves for a variety of envelope compositions and core mass fractions. Its applicability spans planets with masses $17 \ M_{\oplus} < M < 6 \ M_{Jup}$, and equilibrium temperatures $T_{eq} < 1000$ K. The interior model is stratified in a core composed of water and rock, and an envelope constituted by H/He and metals (water). The interior is coupled to a grid of self-consistent, cloud-free atmospheric models to determine the atmospheric and boundary interior temperature, as well as the contribution of the atmosphere to the total radius. We successfully validate GASTLI by comparing it to previous work and data of the Solar System's gas giants and Neptune. We also test GASTLI on the Neptune-mass exoplanet HAT-P-26 b, finding a bulk metal mass fraction between 0.60-0.78 and a core mass of 8.5-14.4 $M_{\oplus}$. Finally, we explore the impact of different equations of state and assumptions, such as C/O ratio and transit pressure, in the estimation of bulk metal mass fraction. These differences between interior models entail a change in radius of up to 2.5% for Jupiter-mass planets, but more than 10\% for Neptune-mass. These are equivalent to variations in core mass fraction of 0.07, or 0.10 in envelope metal mass fraction.|
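Matching is done on normalized names (`L. Kreidberg` rather than `Laura Kreidberg`) so that the arXiv author list and the MPIA staff list are compared in the same form. A rough sketch of such a normalization; `to_initials` is a hypothetical helper, and `mpia.get_initials` may treat hyphenated first names, particles (e.g. "van der") or compound family names differently:

```python
def to_initials(full_name: str) -> str:
    """Abbreviate every given name to an initial and keep the last token as family name."""
    parts = full_name.split()
    if len(parts) < 2:
        return full_name  # nothing to abbreviate
    given, family = parts[:-1], parts[-1]
    initials = " ".join(f"{name[0]}." for name in given)
    return f"{initials} {family}"


authors = ["Lorena Acuña", "Laura Kreidberg", "Meng Zhai", "Paul Mollière"]
print([to_initials(a) for a in authors])
# ['L. Acuña', 'L. Kreidberg', 'M. Zhai', 'P. Mollière']
```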

We get the (TeX) source
* retrieve the tarball
* find the main tex file and parse it
* parse for affiliations (since we are debugging, we do not stop if none are found)
* generate the output markdown
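For the second step, a common heuristic is to pick the `.tex` file that contains `\documentclass`. A minimal sketch under that assumption; `find_main_tex` is hypothetical and not necessarily how `latex.LatexDocument` locates the main file:

```python
import glob
import os
from typing import Optional


def find_main_tex(folder: str) -> Optional[str]:
    """Return the path of the first .tex file under `folder` declaring \\documentclass."""
    candidates = sorted(glob.glob(os.path.join(folder, "**", "*.tex"), recursive=True))
    for path in candidates:
        with open(path, "r", encoding="utf-8", errors="ignore") as fin:
            for line in fin:
                # ignore commented-out lines such as '% \documentclass{article}'
                if line.lstrip().startswith("%"):
                    continue
                if "\\documentclass" in line:
                    return path
    return None
```

For example, `find_main_tex('tmp_2406.10032')` would return the main file of the extracted tarball, or `None` if no file declares a document class.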

In [7]:
def get_markdown_qrcode(paper_id: str):
    """ Generate a qrcode to the arxiv page using qrserver.com
    
    :param paper_id: arXiv identifier of the paper
    :returns: markdown text
    """
    url = r"https://api.qrserver.com/v1/create-qr-code/?size=100x100&data="
    txt = f'<img src="{url}https://arxiv.org/abs/{paper_id}">'
    txt = '<div id="qrcode">' + txt + '</div>'
    return txt
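`get_markdown_qrcode` passes the arXiv URL as the `data` query parameter of the qrserver.com API. A sketch that percent-encodes the query with `urllib.parse.urlencode` and HTML-escapes the `src` attribute; `qrcode_img_tag` is a hypothetical variant, and the endpoint and its `size`/`data` parameters are the ones already used above:

```python
import html
from urllib.parse import urlencode


def qrcode_img_tag(paper_id: str, size: int = 100) -> str:
    """Return an <img> tag whose QR code encodes the arXiv abstract URL."""
    # urlencode percent-encodes ':' and '/' inside the data value
    query = urlencode({"size": f"{size}x{size}",
                       "data": f"https://arxiv.org/abs/{paper_id}"})
    src = f"https://api.qrserver.com/v1/create-qr-code/?{query}"
    # '&' must be written as '&amp;' inside an HTML attribute
    return f'<div id="qrcode"><img src="{html.escape(src)}"></div>'


print(qrcode_img_tag("2406.10032"))
```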

In [8]:
paper_id = f'{which:s}'
folder = f'tmp_{paper_id:s}'

if not os.path.isdir(folder):
    folder = retrieve_document_source(f"{paper_id}", f'tmp_{paper_id}')

try:
    doc = latex.LatexDocument(folder, validation=validation)    
except AffiliationError as affilerror:
    msg = f"ArXiv:{paper_id:s} is not an MPIA paper... " + str(affilerror)
    print(msg)

# Workaround: author parsing from the TeX source sometimes fails,
# so fall back to the (already highlighted) arXiv author list
if len(doc.authors) != len(paper['authors']):
    doc._authors = paper['authors']
else:
    # highlight authors (FIXME: doc.highlight_authors)
    # done on arxiv paper already
    doc._authors = highlight_authors_in_list(
        [mpia.get_initials(k) for k in doc.authors], 
        normed_mpia_authors, verbose=True)
if doc.abstract in (None, ''):
    doc._abstract = paper['abstract']

doc.comment = get_markdown_badge(paper_id) 
if paper['comments']:
    doc.comment += " _" + paper['comments'] + "_"

full_md = doc.generate_markdown_text()

full_md += get_markdown_qrcode(paper_id)

# replace citations
bibdata = latex_bib.LatexBib.from_doc(doc)
full_md = latex_bib.replace_citations(full_md, bibdata)

L. Kreidberg  ->  L. Kreidberg  |  ['L. Kreidberg']
P. Mollière  ->  P. Mollière  |  ['P. Mollière']




Found 158 bibliographic references in tmp_2406.10032/aanda.bbl.
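The `latex_bib` step replaces citations in the generated Markdown with readable references built from the `.bbl` file. A toy sketch of that idea, assuming citations appear as `\cite`, `\citet` or `\citep` commands and using a plain dictionary from keys to labels; `replace_cites` and the sample keys are hypothetical, and `latex_bib.replace_citations` works on a `LatexBib` object and likely handles more commands and formats (optional arguments like `\citep[e.g.][]{...}` are not covered here):

```python
import re
from typing import Dict

# Matches \cite, \citet, \citep (optionally starred) with a comma-separated key list
CITE_RE = re.compile(r"\\cite[tp]?\*?\{([^}]*)\}")


def replace_cites(text: str, labels: Dict[str, str]) -> str:
    """Replace citation commands by their labels; unknown keys are kept as-is."""
    def _sub(match: re.Match) -> str:
        keys = [k.strip() for k in match.group(1).split(",")]
        return "(" + "; ".join(labels.get(k, k) for k in keys) + ")"
    return CITE_RE.sub(_sub, text)


labels = {"smith2020": "Smith et al. 2020", "doe2021": "Doe 2021"}
print(replace_cites(r"as shown by \citep{smith2020, doe2021}.", labels))
# as shown by (Smith et al. 2020; Doe 2021).
```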


In [18]:
print(full_md)

<div class="macros" style="visibility:hidden;">
$\newcommand{\ensuremath}{}$
$\newcommand{\xspace}{}$
$\newcommand{\object}[1]{\texttt{#1}}$
$\newcommand{\farcs}{{.}''}$
$\newcommand{\farcm}{{.}'}$
$\newcommand{\arcsec}{''}$
$\newcommand{\arcmin}{'}$
$\newcommand{\ion}[2]{#1#2}$
$\newcommand{\textsc}[1]{\textrm{#1}}$
$\newcommand{\hl}[1]{\textrm{#1}}$
$\newcommand{\footnote}[1]{}$</div>



<div id="title">

# GASTLI

</div>
<div id="comments">

[![arXiv](https://img.shields.io/badge/arXiv-2406.10032-b31b1b.svg)](https://arxiv.org/abs/2406.10032) _18 pages, 9 figures. In review in Astronomy & Astrophysics_

</div>
<div id="authors">

L. Acuña, <mark>L. Kreidberg</mark>, M. Zhai, <mark>P. Mollière</mark>

</div>
<div id="abstract">

**Abstract:** The metal mass fractions of gas giants are a powerful tool to constrain their formation mechanisms and evolution. The metal content is inferred by comparing mass and radius measurements with interior structure and evolution models. In the midst 

In [None]:
print(doc.abstract)

In [None]:
def export_markdown_summary(md: str, md_fname:str, directory: str):
    """Export MD document and associated relevant images"""
    import os
    import shutil
    import re

    if (os.path.exists(directory) and not os.path.isdir(directory)):
        raise RuntimeError(f"a non-directory file exists with name {directory:s}")

    if (not os.path.exists(directory)):
        print(f"creating directory {directory:s}")
        os.mkdir(directory)

    fig_fnames = (re.compile(r'\[Fig.*\]\((.*)\)').findall(md) + 
                  re.compile(r'\<img src="([^>\s]*)"[^>]*/>').findall(md))
    for fname in fig_fnames:
        if 'http' in fname:
            # No need to copy online figures
            continue
        destdir = os.path.join(directory, os.path.dirname(fname))
        destfname = os.path.join(destdir, os.path.basename(fname))
        os.makedirs(destdir, exist_ok=True)
        shutil.copy(fname, destfname)
    with open(os.path.join(directory, md_fname), 'w') as fout:
        fout.write(md)
    print("exported in ", os.path.join(directory, md_fname))
    for fk in fig_fnames:
        print("    + " + os.path.join(directory, fk))
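`export_markdown_summary` decides which files to copy by scanning the Markdown for figure links and self-closing `<img .../>` tags. A quick check of those two patterns on a made-up snippet (the file names are hypothetical) shows what gets collected, and that remote URLs are filtered out afterwards:

```python
import re

FIG_LINK_RE = re.compile(r'\[Fig.*\]\((.*)\)')
IMG_TAG_RE = re.compile(r'\<img src="([^>\s]*)"[^>]*/>')

md = "\n".join([
    "[Fig. 1](tmp_0000.00000/./figure1.png)",
    '<img src="tmp_0000.00000/./figure2.png" alt="fig 2"/>',
    '<img src="https://example.org/remote.png"/>',
])

fig_fnames = FIG_LINK_RE.findall(md) + IMG_TAG_RE.findall(md)
local = [fname for fname in fig_fnames if "http" not in fname]
print(local)
# ['tmp_0000.00000/./figure1.png', 'tmp_0000.00000/./figure2.png']
```

Since `.*` is greedy and `.` stops at newlines, two figure links on the same line would be captured as one over-long path; one link per line keeps the first pattern reliable.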

In [72]:
export_markdown_summary(full_md, f"{paper_id:s}.md", '_build/html/')

exported in  _build/html/2304.12343.md
    + _build/html/tmp_2304.12343/./selection.png
    + _build/html/tmp_2304.12343/./mzr.png
    + _build/html/tmp_2304.12343/./line_ratios.png
