# Publications markdown generator for academicpages

Takes a set of BibTeX publication entries and converts them into markdown files for use with [academicpages.github.io](https://academicpages.github.io). This is an interactive Jupyter notebook ([see more info here](http://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/what_is_jupyter.html)).

The core Python code is also in `pubsFromBibs.py`.
Run either from the `markdown_generator` folder after updating the `publist` dictionary with:
* bib file names
* specific venue keys based on your bib file preferences
* any specific pre-text for specific files
* Collection Name (future feature)

TODO: Make this work with other databases of citations.
TODO: Merge this with the existing TSV parsing solution.

In [2]:
# installing dependencies - dr-omer
!pip install pybtex

Collecting pybtex
  Downloading https://files.pythonhosted.org/packages/94/2a/11039970561f1bbc74fbaca89b59c26b398a0a70bba8caad553ac779b4f7/pybtex-0.22.2-py2.py3-none-any.whl (279kB)
Collecting latexcodec>=1.0.4 (from pybtex)
  Downloading https://files.pythonhosted.org/packages/0a/76/9552dfc6b74c2d6c3f199e927d41998dc1e561b7cbe4af7e7247388e17e8/latexcodec-2.0.1-py2.py3-none-any.whl
Installing collected packages: latexcodec, pybtex
Successfully installed latexcodec-2.0.1 pybtex-0.22.2
You are using pip version 9.0.3, however version 20.2.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.


In [81]:
from pybtex.database.input import bibtex  # provides bibtex.Parser, used in the main loop
from time import strptime
import string
import html
import os
import re

import warnings
warnings.filterwarnings('ignore')

In [34]:
#todo: incorporate different collection types rather than a catch all publications, requires other changes to template
publist = {
    "proceeding": {
        "file" : "omerbib.bib",
        "venuekey": "booktitle",
        "venue-pretext": "In the proceedings of ",
        "collection" : {"name":"publications",
                        "permalink":"/publication/"}
        
    },
    "journal":{
        "file": "omerbib.bib",
        "venuekey" : "journal",
        "venue-pretext" : "",
        "collection" : {"name":"publications",
                        "permalink":"/publication/"}
    } 
}
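
Each `publist` entry is expected to carry a `file`, a `venuekey`, a `venue-pretext`, and a `collection` with `name` and `permalink`, since the main loop reads all of them. A minimal sketch of a sanity check that could be run before parsing follows; `check_publist`, `REQUIRED_KEYS`, and `REQUIRED_COLLECTION_KEYS` are hypothetical names introduced here for illustration and are not part of `pubsFromBibs.py`.

```python
# Hypothetical helper (not in pubsFromBibs.py): report publist entries
# that lack a key the main loop reads.
REQUIRED_KEYS = ("file", "venuekey", "venue-pretext", "collection")
REQUIRED_COLLECTION_KEYS = ("name", "permalink")


def check_publist(publist):
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    for source, config in publist.items():
        for key in REQUIRED_KEYS:
            if key not in config:
                problems.append(f"{source}: missing '{key}'")
        collection = config.get("collection")
        # Only inspect the nested keys when a collection dict is present.
        if isinstance(collection, dict):
            for key in REQUIRED_COLLECTION_KEYS:
                if key not in collection:
                    problems.append(f"{source}: collection is missing '{key}'")
    return problems


# Made-up entry with the permalink left out on purpose.
example = {
    "journal": {
        "file": "omerbib.bib",
        "venuekey": "journal",
        "venue-pretext": "",
        "collection": {"name": "publications"},
    }
}
print(check_publist(example))
```

Catching a missing key up front avoids a `KeyError` partway through the loop, where it would otherwise be reported as a missing BibTeX field.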

In [35]:
html_escape_table = {
    "&": "&amp;",
    '"': "&quot;",
    "'": "&apos;"
    }

def html_escape(text):
    """Produce entities within text."""
    return "".join(html_escape_table.get(c,c) for c in text)
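
As a quick check of what `html_escape` covers: it maps only the three characters in `html_escape_table`, so `<` and `>` pass through unchanged, whereas the standard library's `html.escape` also converts those and writes the apostrophe as `&#x27;`. The sketch below repeats the table and function from the cell above so that it runs on its own; the sample string is made up.

```python
import html

# Same table and function as in the cell above, repeated so this runs alone.
html_escape_table = {"&": "&amp;", '"': "&quot;", "'": "&apos;"}


def html_escape(text):
    """Produce entities within text."""
    return "".join(html_escape_table.get(c, c) for c in text)


sample = 'Q&A: "it\'s" <b>'   # made-up input
custom = html_escape(sample)  # notebook helper: leaves < and > alone
stdlib = html.escape(sample)  # standard library: also escapes < and >

print(custom)
print(stdlib)
```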

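The loop in the next cell builds each output filename from a `YYYY-MM-DD` date and a slug derived from the title. The sketch below isolates those two steps as standalone functions so they can be tried on sample values. `normalize_month`, `make_slug`, and `make_md_filename` are hypothetical names; the month handling is one possible cleanup of the hack marked `#todo`, not the notebook's exact logic, and month names are assumed to be English.

```python
import re
from time import strptime


def normalize_month(month):
    """Return a two-digit month from a number ("7") or a name ("Jul", "July")."""
    month = str(month).strip()
    if month.isdigit():
        number = int(month)
    else:
        # Assumption: English month names; only the first three letters are parsed.
        number = strptime(month[:3], "%b").tm_mon
    if not 1 <= number <= 12:
        number = 1  # fall back to January for out-of-range values such as 0
    return f"{number:02d}"


def make_slug(title):
    """Strip BibTeX braces and backslashes, then keep only URL-safe characters."""
    clean = title.replace("{", "").replace("}", "").replace("\\", "").replace(" ", "-")
    slug = re.sub(r"\[.*\]|[^a-zA-Z0-9_-]", "", clean)
    return slug.replace("--", "-")


def make_md_filename(title, year, month="01", day="01"):
    """Combine the date and the slug the way the loop names its output files."""
    date = f"{year}-{normalize_month(month)}-{str(day).zfill(2)}"
    return (date + "-" + make_slug(title) + ".md").replace("--", "-")


print(make_md_filename("{A} Novel Framework: Part 1", 2019, "March"))
```

Keeping these steps in small functions makes it possible to test odd month values without running the whole notebook against a `.bib` file.
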
In [88]:
for pubsource in publist:
    parser = bibtex.Parser()
    bibdata = parser.parse_file(publist[pubsource]["file"])

    #loop through the individual references in a given bibtex file
    for bib_id in bibdata.entries:
        #reset default date
        pub_year = "1900"
        pub_month = "01"
        pub_day = "01"
        
        b = bibdata.entries[bib_id].fields
        
        try:
            pub_year = f'{b["year"]}'

            #todo: this hack for month and day needs some cleanup
            if "month" in b.keys():
                month = str(b["month"]).strip()
                if month.isdigit():
                    # numeric month such as "7" or "07": zero-pad to two digits
                    pub_month = month.zfill(2)
                else:
                    # month name such as "Jul" or "July": parse its first three letters
                    tmnth = strptime(month[:3],'%b').tm_mon
                    pub_month = "{:02d}".format(tmnth)
            if "day" in b.keys():
                # zero-pad the day so the date stays in YYYY-MM-DD form
                pub_day = str(b["day"]).strip().zfill(2)

            if pub_month == '00': # a month of 0 is invalid, fall back to January
                pub_month='01'
            
            pub_date = pub_year+"-"+pub_month+"-"+pub_day
            print(pub_date)
            
            #strip out {} as needed (some bibtex entries that maintain formatting)
            clean_title = b["title"].replace("{", "").replace("}","").replace("\\","").replace(" ","-")    

            url_slug = re.sub("\\[.*\\]|[^a-zA-Z0-9_-]", "", clean_title)
            url_slug = url_slug.replace("--","-")

            md_filename = (str(pub_date) + "-" + url_slug + ".md").replace("--","-")
            html_filename = (str(pub_date) + "-" + url_slug).replace("--","-")
            print(md_filename)

            #Build Citation from text
            citation = ""

            #citation authors - todo - add highlighting for primary author?
            for author in bibdata.entries[bib_id].persons["author"]:
                try:
                    citation = citation+" "+author.first_names[0]+" "+author.last_names[0]+", "
                except IndexError as e:
                    # an author with no first or last name cannot be formatted
                    raise Exception(f'{bib_id}: cannot format author "{author}"') from e
                    
                    

            #citation title
            citation = citation + "\"" + html_escape(b["title"].replace("{", "").replace("}","").replace("\\","")) + ".\""

            #add venue logic depending on citation type
            venue = publist[pubsource]["venue-pretext"]+b[publist[pubsource]["venuekey"]].replace("{", "").replace("}","").replace("\\","")

            citation = citation + " " + html_escape(venue)
            citation = citation + ", " + pub_year + "."

            
            ## YAML variables
            md = "---\ntitle: \""   + html_escape(b["title"].replace("{", "").replace("}","").replace("\\","")) + '"\n'
            
            md += """collection: """ +  publist[pubsource]["collection"]["name"]

            md += """\npermalink: """ + publist[pubsource]["collection"]["permalink"]  + html_filename
            
            note = False
            if "note" in b.keys():
                if len(str(b["note"])) > 5:
                    md += "\nexcerpt: '" + html_escape(b["note"]) + "'"
                    note = True

            md += "\ndate: " + str(pub_date) 

            md += "\nvenue: '" + html_escape(venue) + "'"
            
            url = False
            if "url" in b.keys():
                if len(str(b["url"])) > 5:
                    md += "\npaperurl: '" + b["url"] + "'"
                    url = True

            md += "\ncitation: '" + html_escape(citation) + "'"

            md += "\n---"

            
            ## Markdown description for individual page
            if note:
                md += "\n" + html_escape(b["note"]) + "\n"

            if url:
                md += "\n[Access paper here](" + b["url"] + "){:target=\"_blank\"}\n" 
            else:
                md += "\nUse [Google Scholar](https://scholar.google.com/scholar?q="+html.escape(clean_title.replace("-","+"))+"){:target=\"_blank\"} for full citation"

            md_filename = os.path.basename(md_filename)

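            # write as UTF-8 so non-ASCII titles and names do not depend on the platform default
            with open("../_publications/" + md_filename, 'w', encoding="utf-8") as f:
                try:
                    f.write(md)
                except Exception as e:
                    raise Exception(f'{bib_id}') from e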
                                    
            print(f'SUCESSFULLY PARSED {bib_id}: \"', b["title"][:60],"..."*(len(b['title'])>60),"\"")
        # field may not exist for a reference
        except KeyError as e:
            print(f'WARNING Missing Expected Field {e} from entry {bib_id}: \"', b["title"][:30],"..."*(len(b['title'])>30),"\"")
            continue


2013-01-01
2019-07-01
2019-11-01
SUCESSFULLY PARSED 9001965: " Fault Signal Detection of Linear Actuators based on Intellig ... "
2019-05-01
SUCESSFULLY PARSED 8861588: " Automated Classification of Retinal Diseases in STARE Databa ... "
2019-05-01
SUCESSFULLY PARSED 8861886: " A comparison of lumped parameter models of modified hybrid e ... "
2019-05-01
SUCESSFULLY PARSED 8861835: " Comparative Analysis of Classifiers for EMG Signals  "
2019-03-01
SUCESSFULLY PARSED 8680994: " A Novel Framework to Segment out Cervical Vertebrae  "
2020-01-01
2020-01-01
2020-01-01
2020-01-01
2020-01-01
2020-01-01
2020-01-01
2019-01-01
2019-01-01
2019-01-01
2019-01-01
2019-01-01
2019-01-01
2019-01-01
SUCESSFULLY PARSED tooba19Conf: " Automated Classification of Retinal Diseases in STARE Databa ... "
2019-01-01
SUCESSFULLY PARSED Nasir19RTCSECconf: " Efficient energy utilization in wireless sensor networks: an ... "
2019-01-01
2018-01-01
2018-01-01
2018-01-01
2018-01-01
2018-01-01
2018-01-01
2018-01-01
2

2013-01-01
2019-07-01
2019-11-01
2019-05-01
2019-05-01
2019-05-01
2019-03-01
2020-01-01
SUCESSFULLY PARSED TANVEER2020117976: " Improving fuel cell performance via optimal parameters ident ... "
2020-01-01
SUCESSFULLY PARSED senergies2020: " Impact of an Energy Monitoring System on the Energy Efficien ... "
2020-01-01
SUCESSFULLY PARSED ashraf2020: " Determination of Optimum Segmentation Schemes for Pattern Re ... "
2020-01-01
SUCESSFULLY PARSED s20061642: " Performance Evaluation of Convolutional Neural Network for H ... "
2020-01-01
SUCESSFULLY PARSED KHAN2020106607: " Photo detector-based indoor positioning systems variants: A  ... "
2020-01-01
SUCESSFULLY PARSED s20061601: " Skin Lesion Segmentation from Dermoscopic Images Using Convo ... "
2020-01-01
SUCESSFULLY PARSED 8993799: " Design of Robust Higher Order Repetitive Controller Using Ph ... "
2019-01-01
SUCESSFULLY PARSED 8807102: " A Robust Scheme of Vertebrae Segmentation for Medical Diagno ... "
2019-01-01
SUCESSFULLY PARSED

SUCESSFULLY PARSED ur2016adaptive: " Adaptive Thresholding Technique for Segmentation and Juxtapl ... "
2017-01-01
SUCESSFULLY PARSED ali2017design: " Design and Comparison of PID and Proportional Resonant Contr ... "
2016-01-01
SUCESSFULLY PARSED yousaf2016prototype: " Prototype Development to Detect Electric Theft using PIC18F4 ... "
2014-01-01
SUCESSFULLY PARSED waris2014classification: " Classification of Functional Motions of Hand for Upper Limb  ... "
2015-01-01
SUCESSFULLY PARSED khan2015collaborative: " Collaborative Optimal Reciprocal Collision Avoidance for Mob ... "
2015-01-01
SUCESSFULLY PARSED faisal2015iterative: " Iterative Linear Quadratic Regulator (ILQR) controller for t ... "
2016-01-01
SUCESSFULLY PARSED hassan2016review: " Review of fiducial and non-fiducial techniques of feature ex ... "
2013-01-01
SUCESSFULLY PARSED waris2013control: " Control of Upper Limb Active Prosthesis Using Surface Electr ... "
2017-01-01
2017-01-01
2017-01-01
2017-01-01
2017-01-01
2017-01