# Publications markdown generator for academicpages

Converts a set of BibTeX publication files into markdown files for use with [academicpages.github.io](https://academicpages.github.io). This is an interactive Jupyter notebook ([see more info here](http://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/what_is_jupyter.html)).

The core Python code is also in `pubsFromBibs.py`.
Run either one from the `markdown_generator` folder after updating the `publist` dictionary with:
* bib file names
* specific venue keys based on your bib file preferences
* any specific pre-text for specific files
* Collection Name (future feature)

TODO: Make this work with other databases of citations.
TODO: Merge this with the existing TSV parsing solution.

In [38]:
!pip install pybtex
from pybtex.database.input import bibtex
from time import strptime
import string
import html
import os
import re
import calendar






In [39]:
html_escape_table = {
    "&": "&amp;",
    '"': "&quot;",
    "'": "&apos;"
    }

def html_escape(text):
    """Produce entities within text."""
    return "".join(html_escape_table.get(c,c) for c in text)

import shutil
mydir = "../_publications"
if os.path.exists(mydir):
    shutil.rmtree(mydir)
os.makedirs(mydir)
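
As a quick check of `html_escape`, the sketch below repeats the table and function from the cell above and applies them to a made-up title. The sample string is an assumption for illustration and does not come from any bib file. Only `&`, `"`, and `'` are replaced; `<` and `>` pass through unchanged.

```python
# Same table and function as in the cell above, repeated so this runs on its own.
html_escape_table = {"&": "&amp;", '"': "&quot;", "'": "&apos;"}

def html_escape(text):
    """Replace &, " and ' with their HTML entities."""
    return "".join(html_escape_table.get(c, c) for c in text)

sample = 'Q&A on "rCUDA" <draft>'  # hypothetical title
print(html_escape(sample))
# prints: Q&amp;A on &quot;rCUDA&quot; <draft>
```

Escaping the quote characters matters here because the escaped values are later written inside quoted YAML front matter fields such as `title`, `venue`, and `citation`.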

In [40]:
def process_pubs(type):
    cnt_pub = 0
    for pubsource in publist:
        parser = bibtex.Parser()
        bibdata = parser.parse_file(publist[pubsource]["file"])

        #loop through the individual references in a given bibtex file
        for bib_id in bibdata.entries:
            #reset default date
            pub_year = "1900"
            pub_month = "01"
            pub_day = "01"
            
            b = bibdata.entries[bib_id].fields
            
            try:
                if "year" in b:
                    pub_year = f'{b["year"]}'
                    if "month" in b.keys(): 
                        raw_month = str(b["month"]).strip()
                        if raw_month.isdigit():
                            #numeric month such as "3" or "03": zero-pad to two digits
                            pub_month = raw_month.zfill(2)
                        else:
                            #month name or abbreviation such as "mar" or "March" (locale dependent)
                            tmnth = strptime(raw_month[:3],'%b').tm_mon   
                            pub_month = "{:02d}".format(tmnth) 
                    if "day" in b.keys(): 
                        pub_day = str(b["day"])

                elif "date" in b:
                    aux = str(b["date"]).split("-")
                    pub_year = str(aux[0])
                    #a date may hold only a year, so keep the default month in that case
                    if len(aux) >= 2:
                        pub_month = str(aux[1])
                    if len(aux) == 3:
                        pub_day = str(aux[2])

                pub_date = pub_year + "-" + pub_month + "-" + pub_day
                
                #strip out {} as needed (some bibtex entries that maintain formatting)
                clean_title = b["title"].replace("{", "").replace("}","").replace("\\","").replace(" ","-")    

                url_slug = re.sub("\\[.*\\]|[^a-zA-Z0-9_-]", "", clean_title)
                url_slug = url_slug.replace("--","-")

                md_filename = (str(pub_date) + "-" + url_slug + ".md").replace("--","-")
                html_filename = (str(pub_date) + "-" + url_slug).replace("--","-")

                #Build Citation from text
                citation = ""

                #citation authors - todo - add highlighting for primary author?
                n_authors = len(bibdata.entries[bib_id].persons["author"])
                cnt = 0
                for author in bibdata.entries[bib_id].persons["author"]:
                    cnt += 1
                    if author.last_names[0] == "Iserte":
                        citation = citation +" <strong>" + author.first_names[0][0] + ". "+author.last_names[0] + "</strong>"
                    else:
                        citation = citation + " " + author.first_names[0][0] + ". " + author.last_names[0]
                    if cnt == n_authors-1:
                        citation = citation + ", and "
                    else:
                        citation = citation + ", "
                        
                #citation title
                citation = citation + "\"" + html_escape(b["title"].replace("{", "").replace("}","").replace("\\","")) + ".\""

                #add venue logic depending on citation type
                venue = "<em>" + publist[pubsource]["venue-pretext"]+b[publist[pubsource]["venuekey"]].replace("{", "").replace("}","").replace("\\","") + "</em>"

                if "volume" in b:
                    venue = venue + "(" + str(b["volume"]) + ")"

                if "pages" in b:
                    venue = venue + ", pp. " + str(b["pages"])

                citation = citation + " " + html_escape(venue)
                month_name = calendar.month_name[int(pub_month)]
                if month_name == "May":
                    month = "May"
                else:
                    month = month_name[0:3] + "."
                citation = citation + ", " + month + " " + pub_year + "."

                if "issn" in b:
                    citation = citation + " ISSN: " + str(b["issn"]) + "."

                ## YAML variables
                md = "---\ntitle: \""   + html_escape(b["title"].replace("{", "").replace("}","").replace("\\","")) + '"\n'
                md += """collection: """ +  publist[pubsource]["collection"]["name"]
                md += """\npermalink: """ + publist[pubsource]["collection"]["permalink"]  + html_filename
                md += "\ntype: \"" + type + "\""
                
                note = False
                if "note" in b.keys():
                    if len(str(b["note"])) > 5:
                        md += "\nexcerpt: '" + html_escape(b["note"]) + "'"
                        note = True

                md += "\ndate: " + str(pub_date) 

                md += "\nvenue: '" + html_escape(venue) + "'"
                
                if "doi" in b:
                    md += "\npaperurl: 'https://doi.org/" + b["doi"] + "'"
                elif "url" in b.keys():
                    if len(str(b["url"])) > 5:
                        md += "\npaperurl: '" + b["url"] + "'"

                md += "\ncitation: '" + html_escape(citation) + "'"

                md += "\n---"

                
                ## Markdown description for individual page
                """
                if note:
                    md += "\n" + html_escape(b["note"]) + "\n"

                if url:
                    md += "\n[Access paper here](" + b["url"] + "){:target=\"_blank\"}\n" 
                else:
                    md += "\nUse [Google Scholar](https://scholar.google.com/scholar?q="+html.escape(clean_title.replace("-","+"))+"){:target=\"_blank\"} for full citation"
                """

                md_filename = os.path.basename(md_filename)

                with open(mydir + "/" + md_filename, 'w', encoding="utf-8") as f:
                    f.write(md)
                print(f'SUCESSFULLY PARSED {bib_id}: \"', b["title"][:60],"..."*(len(b['title'])>60),"\"")
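            # field may not exist for a reference
            except KeyError as e:
                # the title itself may be the missing field, so do not index it directly
                title = b["title"] if "title" in b else ""
                print(f'WARNING Missing Expected Field {e} from entry {bib_id}: \"', title[:30],"..."*(len(title)>30),"\"")
                continue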
            cnt_pub += 1
    return cnt_pub
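
The filename logic inside `process_pubs` can be tried in isolation. The sketch below copies the title cleanup and slug steps into a helper; the name `make_slug` and the date `2018-01-01` are assumptions for illustration, while the title is one that appears in the journal output of this notebook.

```python
import re

def make_slug(title):
    """Build a URL slug from a BibTeX title, following the steps in process_pubs."""
    # Drop BibTeX braces and backslashes, then turn spaces into hyphens.
    clean_title = title.replace("{", "").replace("}", "").replace("\\", "").replace(" ", "-")
    # Remove bracketed text and any character other than ASCII letters, digits, "_" and "-".
    url_slug = re.sub(r"\[.*\]|[^a-zA-Z0-9_-]", "", clean_title)
    return url_slug.replace("--", "-")

pub_date = "2018-01-01"  # assumed date, not taken from a bib entry
title = "{GSaaS}: A Service to Cloudify and Schedule {GPUs}"
md_filename = (pub_date + "-" + make_slug(title) + ".md").replace("--", "-")
print(md_filename)
# prints: 2018-01-01-GSaaS-A-Service-to-Cloudify-and-Schedule-GPUs.md
```

Because the pattern keeps only ASCII letters and digits, accented characters are dropped from the slug, so a title starting with "Análisis" yields a filename starting with "Anlisis".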

In [41]:
type="poster"
publist = {
    "journal":{
        "file": "bib/" + type + ".bib",
        "venuekey" : "booktitle",
        "venue-pretext" : "",
        "collection" : {"name":"publications",
                        "permalink":"/publication/"}
    } 
}
print(process_pubs(type))

SUCESSFULLY PARSED iserte2024dynamic: " Dynamic Resources Utilization in Malleable Flooding Simulati ... "
SUCESSFULLY PARSED dutot2024leveraging: " Leveraging Dynamic Resource Management in HPC  "
SUCESSFULLY PARSED usman2023bluefield: " BlueField DPU Programming using OpenMP Offloading  "
SUCESSFULLY PARSED martin2021malleability: " Malleability Implementation in a MPI Iterative Method  "
SUCESSFULLY PARSED iserte2018boosting: " Boosting Productivity through Efficient Resource Management  "
SUCESSFULLY PARSED iserte2018mpi: " MPI Malleability Integration into a Bioinformatics Tool  "
SUCESSFULLY PARSED iserte2018high: " High-throughput Computation through MPI Malleability  "
SUCESSFULLY PARSED silla2016benefits: " Benefits of remote GPU virtualization: the rCUDA perspective  "
SUCESSFULLY PARSED iserte2015gpu: " GPU Virtualization in the Cloud  "
9
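
The cells in this notebook repeat the same `publist` definition for each publication type, changing only `type` and, for journals, the `venuekey`. A minimal sketch of building that dictionary in one place follows. The names `build_publist` and `PUB_TYPES` are hypothetical and not part of the original code, and the dictionary key is the publication type rather than the fixed `"journal"` key used in the cells, which should not matter because `process_pubs` only iterates over the keys.

```python
# Hypothetical helper; mirrors the publist dictionaries used in this notebook.
PUB_TYPES = ["poster", "conference", "national", "workshop", "book", "journal"]

def build_publist(pub_type):
    """Return a publist dictionary for the bib file of one publication type."""
    # Journals keep the venue in "journaltitle"; the other types use "booktitle".
    venuekey = "journaltitle" if pub_type == "journal" else "booktitle"
    return {
        pub_type: {
            "file": "bib/" + pub_type + ".bib",
            "venuekey": venuekey,
            "venue-pretext": "",
            "collection": {"name": "publications",
                           "permalink": "/publication/"},
        }
    }

for pub_type in PUB_TYPES:
    publist = build_publist(pub_type)
    entry = publist[pub_type]
    print(pub_type, "->", entry["file"], entry["venuekey"])
    # In the notebook, process_pubs(pub_type) would be called at this point.
```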


In [42]:
type="conference"
publist = {
    "journal":{
        "file": "bib/" + type + ".bib",
        "venuekey" : "booktitle",
        "venue-pretext" : "",
        "collection" : {"name":"publications",
                        "permalink":"/publication/"}
    } 
}
print(process_pubs(type))

SUCESSFULLY PARSED dolz_energysaving_2011: " {EnergySaving} Cluster Experience in {CETA}-{CIEMAT}  "
SUCESSFULLY PARSED iserte_slurm_2014: " Slurm Support for Remote {GPU} Virtualization: Implementatio ... "
SUCESSFULLY PARSED iserte_enabling_2016: " Enabling {GPU} Virtualization in Cloud Environments  "
SUCESSFULLY PARSED majo_distributed_2021: " A Distributed Mesh Generation Study Case through a Customiza ... "
SUCESSFULLY PARSED martin-alvarez_configurable_2023: " Configurable Synthetic Application for Studying Malleability ... "
SUCESSFULLY PARSED castello_accessible_2018: " Accessible C-programming course from scratch using a {MOOC}  ... "
SUCESSFULLY PARSED tomas_learning_2021: " Learning Databases Using Project-based Learning  "
SUCESSFULLY PARSED iserte_increasing_2016: " Increasing the Performance of Data Centers by Combining Remo ... "
SUCESSFULLY PARSED halbiniak_unleashing_2024: " Unleashing the Potential of Mixed Precision in {AI}-Accelera ... "
SUCESSFULLY PARSED iserte_t

In [43]:
type="national"
publist = {
    "journal":{
        "file": "bib/" + type + ".bib",
        "venuekey" : "booktitle",
        "venue-pretext" : "",
        "collection" : {"name":"publications",
                        "permalink":"/publication/"}
    } 
}
print(process_pubs(type))

SUCESSFULLY PARSED iserte2023maleabilidad: " Maleabilidad MPI basada en la eficiencia paralela  "
SUCESSFULLY PARSED martin-alvarez2023analisis: " Análisis de métodos de redistribución de datos para aplicaci ... "
SUCESSFULLY PARSED gonzalez-barbera2023estudio: " Estudio del rendimiento en entrenamientos distribuidos para  ... "
SUCESSFULLY PARSED martin-alvarez2022aplicacion: " Aplicación sintética para el estudio de maleabilidad en comp ... "
SUCESSFULLY PARSED climent-agustina2019desarrollo: " Desarrollo de una herramienta de simulación computacional 3D ... "
SUCESSFULLY PARSED iserte2017camino: " El camino desde la maleabilidad MPI hasta las cargas de trab ... "
SUCESSFULLY PARSED iserte2015comparativa: " Comparativa de políticas de selección de GPUs remotas en clu ... "
SUCESSFULLY PARSED iserte2014extendiendo: " Extendiendo SLURM con soporte para el uso de GPUs remotas  "
SUCESSFULLY PARSED iserte2013planificador: " Un planificador de GPUs remotas para clusters HPC  "
SUCESSFULLY

In [44]:
type="workshop"
publist = {
    "journal":{
        "file": "bib/" + type + ".bib",
        "venuekey" : "booktitle",
        "venue-pretext" : "",
        "collection" : {"name":"publications",
                        "permalink":"/publication/"}
    } 
}
print(process_pubs(type))

SUCESSFULLY PARSED carratala_teaching_2019: " Teaching on Demand: an {HPC} Experience  "
SUCESSFULLY PARSED iserte_dynamic_2016: " Dynamic Management of Resource Allocation for {OmpSs} Jobs  "
SUCESSFULLY PARSED iserte_productivity-enhancing_2018: " Productivity-enhancing malleability for {HPC} applications  "
SUCESSFULLY PARSED iserte_boosting_2018: " Boosting Productivity through Efficient Resource Management  "
SUCESSFULLY PARSED martin_efficient_2023: " Efficient Data Redistribution for Malleable Applications  "
SUCESSFULLY PARSED usman_dpu_2023: " {DPU} Offloading Programming with the {OpenMP} {API}  "
SUCESSFULLY PARSED dolz_flexible_2011: " A Flexible Simulator to Evaluate a Power Saving System for { ... "
SUCESSFULLY PARSED silla_remote_2016: " Remote {GPU} Virtualization: Is It Useful?  "
SUCESSFULLY PARSED iserte_efficient_2017: " Efficient Scalable Computing through Flexible Applications a ... "
SUCESSFULLY PARSED rosciszewski_adaptation_2022: " Adaptation of {AI}-accelerate

In [45]:
type="book"
publist = {
    "journal":{
        "file": "bib/" + type + ".bib",
        "venuekey" : "booktitle",
        "venue-pretext" : "",
        "collection" : {"name":"publications",
                        "permalink":"/publication/"}
    } 
}
print(process_pubs(type))

SUCESSFULLY PARSED iserte2022study: " A Study on the Resource Utilization and User Behavior on Tit ... "
SUCESSFULLY PARSED iserte2021construya: " Construya su propio supercomputador con Raspberri Pi  "
2


In [46]:
type="journal"
publist = {
    "journal":{
        "file": "bib/" + type + ".bib",
        "venuekey" : "journaltitle",
        "venue-pretext" : "",
        "collection" : {"name":"publications",
                        "permalink":"/publication/"}
    } 
}
print(process_pubs(type))

SUCESSFULLY PARSED tarraf_malleability_2024: " Malleability in Modern {HPC} Systems: Current Experiences, C ... "
SUCESSFULLY PARSED iserte_complete_2022: " Complete Integration of Team Project-based Learning into a D ... "
SUCESSFULLY PARSED dolz_simulator_2012: " A Simulator to Assess Energy Saving Strategies and Policies  ... "
SUCESSFULLY PARSED silla_benefits_2017: " On the Benefits of the Remote {GPU} Virtualization Mechanism ... "
SUCESSFULLY PARSED iserte_gsaas_2018: " {GSaaS}: A Service to Cloudify and Schedule {GPUs}  "
SUCESSFULLY PARSED silla_improving_2019: " Improving the Management Efficiency of {GPU} Workloads in Da ... "
SUCESSFULLY PARSED iserte_dmr_2018: " {DMR} {API}: Improving Cluster Productivity by Turning Appli ... "
SUCESSFULLY PARSED iserte_dynamic_2018: " Dynamic Reconfiguration of Non-iterative Scientific Applicat ... "
SUCESSFULLY PARSED iserte_dmrlib_2020: " {DMRlib}: Easy-coding and Efficient Resource Management for  ... "
SUCESSFULLY PARSED aliaga_survey