# MPIA Arxiv on Deck 2: Debugging notebook

In this notebook, I keep some first-order commands for diagnosing issues with papers.
The main definitions are taken from the main notebook.

In [1]:
# Imports
import os
from IPython.display import Markdown, display
from tqdm.notebook import tqdm
import warnings
from PIL import Image 

# requires arxiv_on_deck_2

from arxiv_on_deck_2.arxiv2 import (get_new_papers, 
                                    get_paper_from_identifier,
                                    retrieve_document_source, 
                                    get_markdown_badge)
from arxiv_on_deck_2 import (latex,
                             latex_bib,
                             mpia,
                             highlight_authors_in_list)

# Some figures are very large: raise PIL's decompression-bomb pixel limit
Image.MAX_IMAGE_PIXELS = 1_000_000_000

# Some useful definitions.
class AffiliationWarning(UserWarning):
    pass

class AffiliationError(RuntimeError):
    pass

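def validation(source: str):
    """Check the TeX source for MPIA affiliations before it is parsed.

    Passed to `latex.LatexDocument` so that the check runs before the TeX
    code is parsed.

    Raises AffiliationError if `mpia.affiliation_verifications` does not return True.
    """
    check = mpia.affiliation_verifications(source, verbose=True)
    if check is not True:
        # `check` is expected to describe the failure; str() guards against non-string values
        raise AffiliationError("mpia.affiliation_verifications: " + str(check))


# Always display affiliation warnings, even when they repeat
warnings.simplefilter('always', AffiliationWarning)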
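`latex.LatexDocument` receives `validation` as a callback on the TeX source. Judging from the code above, `mpia.affiliation_verifications` is assumed to return `True` on success and an explanatory message otherwise, and `validation` turns anything other than `True` into an `AffiliationError`. A self-contained sketch of that pattern, with a hypothetical `fake_affiliation_check` standing in for the real check:

```python
class AffiliationError(RuntimeError):
    pass


def fake_affiliation_check(source: str):
    """Hypothetical stand-in for mpia.affiliation_verifications.

    Returns True if the source mentions an MPIA affiliation,
    otherwise a short explanatory message (a str).
    """
    if "Max Planck Institute for Astronomy" in source or "MPIA" in source:
        return True
    return "no MPIA affiliation found"


def make_validation(check):
    """Wrap a check that returns True or a message into a callback that raises."""
    def validation(source: str) -> None:
        result = check(source)
        if result is not True:
            raise AffiliationError("affiliation check failed: " + str(result))
    return validation


validate = make_validation(fake_affiliation_check)
validate(r"\affiliation{Max Planck Institute for Astronomy, Heidelberg}")  # passes silently
try:
    validate(r"\affiliation{Some Other Institute}")
except AffiliationError as err:
    print(err)  # affiliation check failed: no MPIA affiliation found
```

Returning a message instead of `False` lets the caller report *why* a paper was rejected without a second call.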

We get the author list from the MPIA website.

In [2]:
# Uncomment to clear the cached author list and force a fresh download
# !rm -f tmp_mpia_authors.yml

In [3]:
# Getting the list of authors can take some time (internet connection).
# Cache the MPIA author list to avoid running this every time we restart the kernel.
import yaml
try:
    with open('tmp_mpia_authors.yml', 'r') as fin:
        mpia_authors = yaml.load(fin, yaml.BaseLoader)
    print("`mpia.get_mpia_mitarbeiter_list()`: restored from cache")
except FileNotFoundError:
    print("`mpia.get_mpia_mitarbeiter_list()`: cannot be restored from cache.")
    # get list from MPIA website
    # it automatically filters identified non-scientists :func:`mpia.filter_non_scientists`
    mpia_authors = mpia.get_mpia_mitarbeiter_list()
    with open('tmp_mpia_authors.yml', 'w') as fout:
        fout.write(yaml.dump(mpia_authors))

`mpia.get_mpia_mitarbeiter_list()`: restored from cache
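The try/except cell above is a simple load-or-compute cache. The same pattern factored into a reusable function, as a sketch: `load_or_compute` is a hypothetical helper, it uses the standard-library `json` module instead of YAML, and `slow_fetch` is a stand-in for the network call returning made-up data:

```python
import json
import os
import tempfile
from typing import Any, Callable


def load_or_compute(cache_file: str, compute: Callable[[], Any]) -> Any:
    """Return the JSON data cached in cache_file, or call compute() and cache it.

    Hypothetical helper; note that JSON turns tuples into lists on reload.
    """
    if os.path.exists(cache_file):
        with open(cache_file, "r", encoding="utf-8") as fin:
            return json.load(fin)
    data = compute()
    with open(cache_file, "w", encoding="utf-8") as fout:
        json.dump(data, fout)
    return data


calls = []


def slow_fetch():
    """Stand-in for a slow network call such as mpia.get_mpia_mitarbeiter_list()."""
    calls.append(1)
    return [["Jane Q. Doe", "J. Q. Doe"]]


cache_path = os.path.join(tempfile.mkdtemp(), "authors.json")
first = load_or_compute(cache_path, slow_fetch)   # computes and writes the cache
second = load_or_compute(cache_path, slow_fetch)  # restored from the cache
print(first == second, len(calls))  # True 1
```

As with the YAML file, deleting the cache file is the only way to force a refresh; a stale cache is silently reused.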


We get the paper to debug.

In [4]:
which = "2303.12101"
paper = get_paper_from_identifier(which)
paper


|||
|---:|:---|
| [![arXiv](https://img.shields.io/badge/arXiv-2303.12101-b31b1b.svg)](https://arxiv.org/abs/2303.12101) | **Stellar associations powering HII regions $\unicode{x2013}$ I. Defining an evolutionary sequence**  |
|| Fabian Scheuermann, et al. |
|*Appeared on*| *2023-03-21*|
|*Comments*| *15 pages, 12 figures. Accepted for publication in MNRAS*|
|**Abstract**| Connecting the gas in HII regions to the underlying source of the ionizing radiation can help us constrain the physical processes of stellar feedback and how HII regions evolve over time. With PHANGS$\unicode{x2013}$MUSE we detect nearly 24,000 HII regions across 19 galaxies and measure the physical properties of the ionized gas (e.g. metallicity, ionization parameter, density). We use catalogues of multi-scale stellar associations from PHANGS$\unicode{x2013}$HST to obtain constraints on the age of the ionizing sources. We construct a matched catalogue of 4,177 HII regions that are clearly linked to a single ionizing association. A weak anti-correlation is observed between the association ages and the H$\alpha$ equivalent width EW(H$\alpha$), the H$\alpha$/FUV flux ratio and the ionization parameter, log q. As all three are expected to decrease as the stellar population ages, this could indicate that we observe an evolutionary sequence. This interpretation is further supported by correlations between all three properties. Interpreting these as evolutionary tracers, we find younger nebulae to be more attenuated by dust and closer to giant molecular clouds, in line with recent models of feedback-regulated star formation. We also observe strong correlations with the local metallicity variations and all three proposed age tracers, suggestive of star formation preferentially occurring in locations of locally enhanced metallicity. Overall, EW(H$\alpha$) and log q show the most consistent trends and appear to be most reliable tracers for the age of an HII region.|

In [5]:
# Check author list with their initials
normed_author_list = [mpia.get_initials(k) for k in paper['authors']]
normed_mpia_authors = [k[1] for k in mpia_authors]
hl_authors = highlight_authors_in_list(normed_author_list, normed_mpia_authors, verbose=True)
matches = [(hl, orig) for hl, orig in zip(hl_authors, paper['authors']) if 'mark' in hl]
if not matches:
    warnings.warn(AffiliationWarning("WARNING: This paper does not seem to have MPIA authors."))
    
paper['authors'] = hl_authors
paper


|||
|---:|:---|
| [![arXiv](https://img.shields.io/badge/arXiv-2303.12101-b31b1b.svg)](https://arxiv.org/abs/2303.12101) | **Stellar associations powering HII regions $\unicode{x2013}$ I. Defining an evolutionary sequence**  |
|| F. Scheuermann, et al. -- incl., <mark>K. Kreckel</mark>, <mark>S. Hannon</mark>, <mark>E. Schinnerer</mark> |
|*Appeared on*| *2023-03-21*|
|*Comments*| *15 pages, 12 figures. Accepted for publication in MNRAS*|
|**Abstract**| Connecting the gas in HII regions to the underlying source of the ionizing radiation can help us constrain the physical processes of stellar feedback and how HII regions evolve over time. With PHANGS$\unicode{x2013}$MUSE we detect nearly 24,000 HII regions across 19 galaxies and measure the physical properties of the ionized gas (e.g. metallicity, ionization parameter, density). We use catalogues of multi-scale stellar associations from PHANGS$\unicode{x2013}$HST to obtain constraints on the age of the ionizing sources. We construct a matched catalogue of 4,177 HII regions that are clearly linked to a single ionizing association. A weak anti-correlation is observed between the association ages and the H$\alpha$ equivalent width EW(H$\alpha$), the H$\alpha$/FUV flux ratio and the ionization parameter, log q. As all three are expected to decrease as the stellar population ages, this could indicate that we observe an evolutionary sequence. This interpretation is further supported by correlations between all three properties. Interpreting these as evolutionary tracers, we find younger nebulae to be more attenuated by dust and closer to giant molecular clouds, in line with recent models of feedback-regulated star formation. We also observe strong correlations with the local metallicity variations and all three proposed age tracers, suggestive of star formation preferentially occurring in locations of locally enhanced metallicity. Overall, EW(H$\alpha$) and log q show the most consistent trends and appear to be most reliable tracers for the age of an HII region.|
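The cross-match above compares author names after reducing them to initials and surname, so different spellings of the same person collide on purpose, and different people sharing initials and surname would collide too. A hypothetical sketch of that idea (it is not the implementation of `mpia.get_initials` or `highlight_authors_in_list`), assuming the surname is the last token of the name:

```python
def get_initials_sketch(name: str) -> str:
    """Hypothetical normalization: 'Jane Quinn Doe' -> 'J. Q. Doe'.

    Assumes the last whitespace-separated token is the surname, which fails
    for compound surnames such as 'van der Berg'.
    """
    parts = name.split()
    if len(parts) < 2:
        return name.strip()
    initials = " ".join(part[0] + "." for part in parts[:-1])
    return f"{initials} {parts[-1]}"


def highlight_sketch(authors, known):
    """Wrap every author whose normalized name is in `known` in <mark> tags."""
    known_set = set(known)
    return [f"<mark>{a}</mark>" if a in known_set else a for a in authors]


paper_authors = ["Fabian Scheuermann", "Jane Quinn Doe"]
normed = [get_initials_sketch(a) for a in paper_authors]
print(highlight_sketch(normed, ["J. Q. Doe"]))
# ['F. Scheuermann', '<mark>J. Q. Doe</mark>']
```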

We get the (TeX) source:
* retrieve the tarball
* find the main tex file and parse it
* parse for affiliations (since we are debugging, we do not stop if none are found)
* generate the output markdown
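Of these steps, finding the main file is a frequent failure point for multi-file submissions. One plausible heuristic, sketched here as an assumption (it is not necessarily what `latex.LatexDocument` does), is to select the `.tex` file that declares `\documentclass`; `find_main_tex` is a hypothetical name:

```python
import os
import tempfile
from typing import Optional


def find_main_tex(folder: str) -> Optional[str]:
    """Hypothetical heuristic: return the first .tex file under `folder` with a
    line starting with \\documentclass (commented-out lines are ignored)."""
    for root, _dirs, files in sorted(os.walk(folder)):
        for fname in sorted(files):
            if not fname.endswith(".tex"):
                continue
            path = os.path.join(root, fname)
            with open(path, "r", encoding="utf-8", errors="ignore") as fin:
                if any(line.lstrip().startswith("\\documentclass") for line in fin):
                    return path
    return None


# Usage on a throwaway folder: intro.tex only mentions \documentclass in a comment
demo = tempfile.mkdtemp()
with open(os.path.join(demo, "intro.tex"), "w") as f:
    f.write("% \\documentclass{article} (commented out)\n\\section{Introduction}\n")
with open(os.path.join(demo, "paper.tex"), "w") as f:
    f.write("\\documentclass{mnras}\n\\begin{document}\n\\end{document}\n")
print(find_main_tex(demo))  # path ending in paper.tex
```

Submissions that bundle several standalone documents (e.g. a cover letter) would still need a tie-breaker, such as preferring the file that contains `\begin{document}` and the most `\input` commands.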

In [6]:
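def get_markdown_qrcode(paper_id: str):
    """ Generate a QR code linking to the arXiv page, using qrserver.com

    :param paper_id: arXiv identifier (e.g. "2303.12101")
    :returns: markdown (HTML) text
    """
    from urllib.parse import quote
    url = r"https://api.qrserver.com/v1/create-qr-code/?size=100x100&data="
    # Quote the whole src attribute and percent-encode the target URL
    target = quote(f"https://arxiv.org/abs/{paper_id}", safe='')
    txt = f"""<img src="{url}{target}">"""
    txt = '<div id="qrcode">' + txt + '</div>'
    return txt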
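The QR code is only useful if the `<img>` tag's `src` survives HTML parsing intact. A quick diagnostic, assumed here rather than part of the package, parses the tag with the standard-library `html.parser` and decodes the `data` query parameter, so a stray quote or an unencoded character shows up immediately; `ImgSrcCollector` and `qr_target` are hypothetical names:

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlparse


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.srcs.extend(value for name, value in attrs if name == "src")


def qr_target(html: str) -> str:
    """Return the decoded `data` query parameter of the first <img> in `html`."""
    parser = ImgSrcCollector()
    parser.feed(html)
    parser.close()
    query = parse_qs(urlparse(parser.srcs[0]).query)
    return query["data"][0]


tag = ('<div id="qrcode"><img src="https://api.qrserver.com/v1/create-qr-code/'
       '?size=100x100&data=https%3A%2F%2Farxiv.org%2Fabs%2F2303.12101"></div>')
print(qr_target(tag))  # https://arxiv.org/abs/2303.12101
```

Running `qr_target(get_markdown_qrcode(paper_id))` should give back the plain arXiv URL; a value wrapped in literal `"` characters points to an unquoted `src` attribute.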

In [7]:
paper_id = f'{which:s}'
folder = f'tmp_{paper_id:s}'

if not os.path.isdir(folder):
    folder = retrieve_document_source(f"{paper_id}", f'tmp_{paper_id}')

try:
    doc = latex.LatexDocument(folder, validation=validation)    
except AffiliationError as affilerror:
    msg = f"ArXiv:{paper_id:s} is not an MPIA paper... " + str(affilerror)
    print(msg)

# Hack because sometimes author parsing does not work well
if (len(doc.authors) != len(paper['authors'])):
    doc._authors = paper['authors']
else:
    # highlight authors (FIXME: doc.highlight_authors)
    # done on arxiv paper already
    doc._authors = highlight_authors_in_list(
        [mpia.get_initials(k) for k in doc.authors], 
        normed_mpia_authors, verbose=True)
if (doc.abstract) in (None, ''):
    doc._abstract = paper['abstract']

doc.comment = get_markdown_badge(paper_id) + " _" + paper['comments'] + "_"

full_md = doc.generate_markdown_text()

full_md += get_markdown_qrcode(paper_id)

# replace citations
try:
    bibdata = latex_bib.LatexBib.from_doc(doc)
    full_md = latex_bib.replace_citations(full_md, bibdata)
except Exception as e:
    raise e


Found 113 bibliographic references in tmp_2303.12101/main.bbl.
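If citations come out wrong, a first sanity check is whether the reported count agrees with the number of `\bibitem` entries in the `.bbl` file. A rough, hypothetical regex count (not the parser used by `latex_bib`), with placeholder bibliography entries:

```python
import re


def count_bibitems(bbl_text: str) -> int:
    """Count \\bibitem entries in the text of a BibTeX-generated .bbl file."""
    return len(re.findall(r"\\bibitem\b", bbl_text))


def count_bibitems_in_file(path: str) -> int:
    """Read a .bbl file and count its \\bibitem entries."""
    with open(path, "r", encoding="utf-8", errors="ignore") as fin:
        return count_bibitems(fin.read())


sample_bbl = r"""
\begin{thebibliography}{}
\bibitem[Author(2020)]{key2020} Author A., 2020, Placeholder Journal
\bibitem[Writer(2021)]{key2021} Writer W., 2021, Placeholder Journal
\end{thebibliography}
"""
print(count_bibitems(sample_bbl))  # 2
```

For this paper one would call `count_bibitems_in_file("tmp_2303.12101/main.bbl")` and compare the result with the count printed by `latex_bib`.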


In [8]:
Markdown(full_md)

<div class="macros" style="visibility:hidden;">
$\newcommand{\ensuremath}{}$
$\newcommand{\xspace}{}$
$\newcommand{\object}[1]{\texttt{#1}}$
$\newcommand{\farcs}{{.}''}$
$\newcommand{\farcm}{{.}'}$
$\newcommand{\arcsec}{''}$
$\newcommand{\arcmin}{'}$
$\newcommand{\ion}[2]{#1#2}$
$\newcommand{\textsc}[1]{\textrm{#1}}$
$\newcommand{\hl}[1]{\textrm{#1}}$
$\newcommand{\footnote}[1]{}$
$\newcommand{\uncertainty}[3]{#1^{+#2}_{-#3}}$
$\newcommand{\StoN}{\mathrm{S}/\mathrm{N}}$
$\newcommand$
$\newcommand$
$\newcommand$
$\newcommand$
$\newcommand$
$\newcommand$
$\newcommand$
$\newcommand$
$\newcommand$
$\newcommand$
$\newcommand$
$\newcommand$
$\newcommand$
$\newcommand$
$\newcommand$
$\newcommand$
$\newcommand$
$\newcommand$
$\newcommand$
$\newcommand$
$\newcommand{\change}[1]{{\color{orange}#1}}$
$\newcommand{\thebibliography}{\DeclareRobustCommand{\VAN}[3]{##3}\VANthebibliography}$</div>



<div id="title">

# Stellar associations powering $\HII$ regions -- I. Defining an evolutionary sequence

</div>
<div id="comments">

[![arXiv](https://img.shields.io/badge/arXiv-2303.12101-b31b1b.svg)](https://arxiv.org/abs/2303.12101) _15 pages, 12 figures. Accepted for publication in MNRAS_

</div>
<div id="authors">

F. Scheuermann, et al. -- incl., <mark>K. Kreckel</mark>, <mark>S. Hannon</mark>, <mark>E. Schinnerer</mark>

</div>
<div id="abstract">

**Abstract:** Connecting the gas in $\HII$ regions to the underlying source of the ionizing radiation can help us constrain the physical processes of stellar feedback and how $\HII$ regions evolve over time. With PHANGS--MUSE we detect nearly $\num{24000}$ $\HII$ regions across 19 galaxies and measure the physical properties of the ionized gas (e.g. metallicity, ionization parameter, density). We use catalogues of multi-scale stellar associations from PHANGS--_HST_ to obtain constraints on the age of the ionizing sources. We construct a matched catalogue of $\num{4177}$ $\HII$ regions that are clearly linked to a single ionizing association. A weak anti-correlation is observed between the association ages and the $\HA$ equivalent width $\EW$, the $\HA/\FUV$ flux ratio and the ionization parameter, $\log q$. As all three are expected to decrease as the stellar population ages, this could indicate that we observe an evolutionary sequence. This interpretation is further supported by correlations between all three properties. Interpreting these as evolutionary tracers, we find younger nebulae to be more attenuated by dust and closer to giant molecular clouds, in line with recent models of feedback-regulated star formation. We also observe strong correlations with the local metallicity variations and all three proposed age tracers, suggestive of star formation preferentially occurring in locations of locally enhanced metallicity. Overall, $\EW$ and $\log q$ show the most consistent trends and appear to be most reliable tracers for the age of an $\HII$ region.

</div>

<div id="div_fig1">

<img src="tmp_2303.12101/fig/nebulae_age_tracers_corner.png" alt="Fig6" width="100%"/>

**Figure 6. -** Comparison between the proposed age tracers for the full $\HII$ region catalogue. The nebulae are grouped by their host galaxy, sorted by stellar mass $M_{\star}$, with the median of the entire sample indicated by a black line. The 68 and 98 percentile ranges are shaded in grey. (*fig:age_tracers_corner*)

</div>
<div id="div_fig2">

<img src="tmp_2303.12101/fig/overlap_rgb.png" alt="Fig1" width="100%"/>

**Figure 1. -** Examples of the overlap between the $\HII$ regions and stellar associations in NGC 1365. The cutouts show three-colour composite images, based on the 5 available _HST_ bands, overlaid with the $\HA$ line emission from MUSE in red. The boundaries of the $\HII$ regions are shown in red and the stellar associations in blue. Flags that characterise the overlap between the two catalogues are showcased in this figure. (*fig:overlap*)

</div>
<div id="div_fig3">

<img src="tmp_2303.12101/fig/catalogue_properties_2D_hist.png" alt="Fig2" width="100%"/>

**Figure 2. -** Distribution of masses and ages of the stellar associations in the matched catalogue (the `one-to-one` sample). The black lines mark the cuts that we apply to the sample to ensure a fully sampled IMF (more massive than $10^{4}\,\mathrm{M}_{\odot}$) and to only include young clusters that should be associated with ionized gas (no older than $8\,\mathrm{Myr}$). The mass cut leaves us with a sample of 1014 objects and the age cut with 3531. Applying both cuts results in a sample of 756 objects. (*fig:catalogue_properties_2D_hist_v2*)

</div><div id="qrcode"><img src=https://api.qrserver.com/v1/create-qr-code/?size=100x100&data="https://arxiv.org/abs/2303.12101"></div>
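The macros block in this output contains a long run of bare `$\newcommand$` entries, i.e. definitions whose content was lost during parsing, and macros such as `\HII`, used in the title, have no definition there. A small regex-based check for both symptoms, written as hypothetical helpers over the generated markdown string (for instance `full_md`):

```python
import re


def defined_macros(md: str) -> set:
    """Names defined via $\\newcommand{\\name}...$ in the generated markdown."""
    return set(re.findall(r"\\newcommand\{\\([A-Za-z]+)\}", md))


def count_empty_newcommands(md: str) -> int:
    """Count bare $\\newcommand$ entries, i.e. definitions that lost their content."""
    return len(re.findall(r"\$\\newcommand\$", md))


def undefined_macros(md: str, candidates) -> list:
    """Return the candidate macro names that are used as \\name but never defined."""
    defined = defined_macros(md)
    return sorted(
        name for name in candidates
        if name not in defined
        and re.search(r"\\" + re.escape(name) + r"(?![A-Za-z])", md)
    )


demo_md = ("$\\newcommand{\\farcs}{{.}''}$\n$\\newcommand$\n$\\newcommand$\n"
           "# Stellar associations powering $\\HII$ regions, offset $1\\farcs5$\n")
print(count_empty_newcommands(demo_md))                   # 2
print(undefined_macros(demo_md, ["HII", "farcs", "EW"]))  # ['HII']
```

On this paper, `undefined_macros(full_md, ["HII", "HA", "EW", "FUV"])` would be expected to list the abstract's macros that never received a usable definition.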

In [9]:
def export_markdown_summary(md: str, md_fname: str, directory: str):
    """Export a markdown document and copy the local images it references.

    :param md: markdown text
    :param md_fname: name of the markdown file written in `directory`
    :param directory: output directory (created if needed)
    """
    import os
    import shutil
    import re

    if os.path.exists(directory) and not os.path.isdir(directory):
        raise RuntimeError(f"a non-directory file exists with name {directory:s}")

    if not os.path.exists(directory):
        print(f"creating directory {directory:s}")
        # makedirs also creates missing parents (e.g. `_build` in `_build/html/`)
        os.makedirs(directory)

    # Figure paths referenced as [Fig...](path) links or <img src="path" .../> tags
    fig_fnames = (re.compile(r'\[Fig.*\]\((.*)\)').findall(md) +
                  re.compile(r'\<img src="([^>\s]*)"[^>]*/>').findall(md))
    for fname in fig_fnames:
        if 'http' in fname:
            # No need to copy online figures
            continue
        destdir = os.path.join(directory, os.path.dirname(fname))
        destfname = os.path.join(destdir, os.path.basename(fname))
        os.makedirs(destdir, exist_ok=True)
        shutil.copy(fname, destfname)
    with open(os.path.join(directory, md_fname), 'w') as fout:
        fout.write(md)
    print("exported in ", os.path.join(directory, md_fname))
    for fk in fig_fnames:
        print("    + " + os.path.join(directory, fk))
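The two regular expressions in `export_markdown_summary` decide which files get copied. A quick check with a made-up snippet, using the same patterns (`figure_paths` is a hypothetical name), shows that only `[Fig...](path)` links and quoted, self-closing `<img src="..."/>` tags are picked up. A tag without the `/>`, such as the QR-code image in the output above, is skipped, which is harmless here because that image is hosted online:

```python
import re

# Same patterns as in export_markdown_summary
FIG_LINK = re.compile(r'\[Fig.*\]\((.*)\)')
IMG_TAG = re.compile(r'\<img src="([^>\s]*)"[^>]*/>')


def figure_paths(md: str) -> list:
    """Return the figure paths export_markdown_summary would consider."""
    return FIG_LINK.findall(md) + IMG_TAG.findall(md)


snippet = ('<img src="tmp_x/fig/a.png" alt="Fig6" width="100%"/>\n'
           '[Fig1](tmp_x/fig/b.png)\n'
           '<div id="qrcode"><img src="https://example.org/qr.png"></div>\n')
print(figure_paths(snippet))  # ['tmp_x/fig/b.png', 'tmp_x/fig/a.png']
```

Because `.*` in the first pattern is greedy, two `[Fig...](...)` links on the same line yield only the last path; non-greedy `.*?` would avoid that if such markdown ever appears.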

In [10]:
export_markdown_summary(full_md, f"{paper_id:s}.md", '_build/html/')

exported in  _build/html/2303.12101.md
    + _build/html/tmp_2303.12101/fig/nebulae_age_tracers_corner.png
    + _build/html/tmp_2303.12101/fig/overlap_rgb.png
    + _build/html/tmp_2303.12101/fig/catalogue_properties_2D_hist.png
