# Talks markdown generator for academicpages

Takes a TSV of talks with metadata and converts each row into a markdown file for use with [academicpages.github.io](https://academicpages.github.io). This is an interactive Jupyter notebook ([see more info here](http://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/what_is_jupyter.html)). The core Python code is also in `talks.py`. Run either from the `markdown_generator` folder after replacing `talks.tsv` with one containing your data.

TODO: Make this work with BibTeX and other databases, rather than Stuart's non-standard TSV format and citation style.

In [226]:
import pandas as pd
import os
import numpy as np

## Data format

The TSV needs to have the following columns: title, type, url_slug, venue, date, location, talk_url, description, with a header at the top. Many of these fields can be blank, but the columns must be in the TSV.

- Fields that cannot be blank: `title`, `url_slug`, `date`. All others can be blank; a blank `type` defaults to "talk".
- `date` must be formatted as YYYY-MM-DD.
- `url_slug` will be the descriptive part of the .md file and the permalink URL for the page about the talk.
    - The .md file will be `YYYY-MM-DD-[url_slug].md` and the permalink will be `https://[yourdomain]/talks/YYYY-MM-DD-[url_slug]`
    - The combination of `url_slug` and `date` must be unique, as it will be the basis for your filenames
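
The rules above can be checked before generating any files. The following is a minimal sketch rather than part of the notebook: `validate_talks`, `REQUIRED_COLUMNS`, `NON_BLANK` and the sample table are hypothetical names and data introduced only for illustration, and the checks follow the format exactly as stated in the list.

```python
import io
import pandas as pd

# Hypothetical helper (not part of the notebook): checks the rules listed above.
REQUIRED_COLUMNS = ["title", "type", "url_slug", "venue", "date",
                    "location", "talk_url", "description"]
NON_BLANK = ["title", "url_slug", "date"]

def validate_talks(df):
    """Return a list of problems; an empty list means the table passes."""
    problems = []
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
        return problems  # the remaining checks need these columns
    for col in NON_BLANK:
        blank = df[col].isna() | (df[col].astype(str).str.strip() == "")
        if blank.any():
            problems.append(f"blank {col} in rows {df.index[blank].tolist()}")
    # date must look like YYYY-MM-DD
    bad_date = ~df["date"].astype(str).str.fullmatch(r"\d{4}-\d{2}-\d{2}")
    if bad_date.any():
        problems.append(f"bad date format in rows {df.index[bad_date].tolist()}")
    # the combination of date and url_slug must be unique
    dup = df.duplicated(subset=["date", "url_slug"], keep=False)
    if dup.any():
        problems.append(f"duplicate date/url_slug in rows {df.index[dup].tolist()}")
    return problems

# Made-up sample: rows 0 and 1 collide, row 2 has a malformed date.
sample = pd.read_csv(io.StringIO(
    "title\ttype\turl_slug\tvenue\tdate\tlocation\ttalk_url\tdescription\n"
    "Talk A\t\ttalk-a\tVenue\t2018-05-01\t\t\t\n"
    "Talk B\t\ttalk-a\tVenue\t2018-05-01\t\t\t\n"
    "Talk C\t\ttalk-c\tVenue\t05 / 18\t\t\t\n"
), sep="\t")

problems = validate_talks(sample)
for p in problems:
    print(p)
```

Returning a list of messages instead of raising on the first failure lets you fix every problem in the spreadsheet in one pass.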

The raw file is not pretty to look at, so use a spreadsheet or other program to create and edit it.

## Import TSV

Pandas makes this easy with the `read_csv` function. We are using a TSV, so we specify the separator as a tab, or `\t`.

I found it important to put this data in a tab-separated values format, because this kind of data contains many commas, which can break comma-separated files. However, you can modify the import statement, as pandas also has `read_excel()`, `read_json()`, and others.
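
As a small illustration of why the tab separator helps, the sketch below reads a made-up two-row TSV from a string (the titles, venues and the `demo` name are invented for this example). The comma inside the venue field survives because only tabs split columns.

```python
import io
import pandas as pd

# Made-up two-row TSV; note the comma inside the first venue field.
raw = (
    "title\tvenue\tdate\n"
    "Example talk\tSome Workshop 2018, CERN\t2018-06-01\n"
    "Another talk\tSome University\t2018-08-15\n"
)

# sep="\t" splits on tabs only; header=0 takes column names from the first line.
demo = pd.read_csv(io.StringIO(raw), sep="\t", header=0)
print(demo.loc[0, "venue"])  # Some Workshop 2018, CERN
```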

### Importing talks from Google Sheets

In [227]:
# Google sheets
sheet_id = "1ozMqEZPNpK4szjnODCX-5H8Ks432nAAXP8Ds19_B-2o"
sheet_name = "my_list_of_talks"
url = f"https://docs.google.com/spreadsheets/d/{sheet_id}/gviz/tq?tqx=out:csv&sheet={sheet_name}"

talks = pd.read_csv(url, header=0)
talks

# Local file
# talks = pd.read_csv("talks.tsv", sep="\t", header=0)
# talks

Unnamed: 0,ID,title,type,url_slug,venue,date,location,talk_url,city,description
0,1,Neutrino trident production at near detectors,invited seminar,,Fermilab Theory Seminar,05 / 18,USA,https://theory.fnal.gov/events/event/title-tba...,Fermilab,
1,2,Leptophilic Z’s in neutrino scattering,invited parallel talk,,Phenomenology Symposium 2018,05 / 18,USA,https://indico.cern.ch/event/699148/contributi...,Pittsburgh,
2,3,Current status of short-baseline oscillations,invited seminar,,Perimeter Institute,06 / 18,Canada,,Waterloo,
3,4,Near detector physics with neutrino experiments,invited talk,,"Near detector workshop 2018, CERN",06 / 18,Switzerland,https://indico.cern.ch/event/721473/overview,CERN,
4,5,Neutrino tridents at DUNE,invited parallel talk,,"NuFact 2018, Virginia",08 / 18,USA,https://indico.phys.vt.edu/event/34/contributi...,Blacksbourg,
...,...,...,...,...,...,...,...,...,...,...
193,194,,,,,,,,,
194,195,,,,,,,,,
195,196,,,,,,,,,
196,197,,,,,,,,,


### Escape special characters

YAML is very picky about what it accepts as a valid string, so we replace single and double quotes (and ampersands) with their HTML-encoded equivalents. This makes them less readable in raw form, but they are parsed and rendered nicely.

In [228]:
html_escape_table = {
    "&": "&amp;",
    '"': "&quot;",
    "'": "&apos;"
    }

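def html_escape(text):
    """Replace &, " and ' with their HTML entities."""
    if isinstance(text, str):
        return "".join(html_escape_table.get(c, c) for c in text)
    # Non-string input (e.g. NaN from an empty cell) yields the string "False"
    return "False"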
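
A quick check of the escaping, restating the table and function from the cell above so the snippet runs on its own (the sample string is made up):

```python
# Restated from the cell above so this snippet is self-contained.
html_escape_table = {"&": "&amp;", '"': "&quot;", "'": "&apos;"}

def html_escape(text):
    if isinstance(text, str):
        return "".join(html_escape_table.get(c, c) for c in text)
    return "False"

escaped = html_escape('Q&A on "tridents"')
print(escaped)  # Q&amp;A on &quot;tridents&quot;

# Any non-string input, such as NaN from an empty cell, gives the string "False".
print(html_escape(float("nan")))  # False
```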

In [229]:
talks = talks.replace(np.nan, '', regex=True)

## Creating the markdown files

This is where the heavy lifting is done. The loop goes through every row of the dataframe and concatenates a big string (`md`) that contains the markdown for each talk. It writes the YAML metadata first, then the description for the individual page.

In [230]:
# Start clean: remove every previously generated file in ../_talks
for f in os.listdir('../_talks'):
    os.remove(os.path.join('../_talks', f))

for row, item in talks.iterrows():
    # Rows without a venue are empty placeholder rows in the sheet
    if item.venue == "":
        continue
    print(item.venue)

    # Dates are stored as "MM / YY", e.g. "05 / 18"
    month = item.date[:2]
    year = f'20{item.date[5:]}'
    print(month, year)

    # Zero-pad the ID to four digits so the generated files sort correctly
    if not 1 <= len(str(item.ID)) <= 3:
        print(f"Invalid ID number {item.ID} with length {len(str(item.ID))}")
        break
    id_for_sorting = str(item.ID).zfill(4)

    md_filename = f"{id_for_sorting}_{item.venue}.md"
    html_filename = id_for_sorting

    # YAML front matter
    md = "---\n"
    md += "collection: talks\n"
    md += f'talk_number: "{item.ID}"\n'
    md += f'id_for_sorting: "{id_for_sorting}"\n'
    md += f"permalink: /talks/{html_filename}\n"

    if len(str(item.title)) > 1:
        md += f'title: "{item.title}"\n'

    if len(str(item.type)) > 0:
        md += f'type: "{item.type}"\n'
    else:
        md += 'type: "talk"\n'

    md += f'venue: "{item.venue}"\n'

    if len(str(item.date)) > 0:
        md += f"date: {month}/{year[2:]}\n"
    if len(str(item.location)) > 0:
        md += f'location: "{item.location}"\n'

    has_url = len(str(item.talk_url)) > 3
    if has_url:
        md += "link: True\n"
        md += f"talk_url: {item.talk_url}\n"

    md += "---\n"

    # Page body
    if has_url:
        md += f"\n[More information here]({item.talk_url})\n"

    if len(str(item.description)) > 3:
        md += "\n" + html_escape(item.description) + "\n"

    # Keep only the part after any "/" in the venue so the file lands directly in ../_talks
    md_filename = os.path.basename(md_filename)
    with open(os.path.join("../_talks", md_filename), 'w', encoding='utf-8') as f:
        f.write(md)

Fermilab Theory Seminar
05 2018
Phenomenology Symposium 2018
05 2018
Perimeter Institute
06 2018
Near detector workshop 2018, CERN
06 2018
NuFact 2018, Virginia
08 2018
Neutrino Oscillation Workshop 2018
09 2018
Max-Planck-Institut fur Kernphysik, Heidelberg
11 2018
Physics Opportunities at the Near Detector of DUNE (PONDD), Fermilab
12 2018
Queen Mary University of London
03 2019
Prospects of Neutrino Physics, IPMU
04 2019
IFIC, Valencia
05 2019
Neutrino Theory Network Workshop, Washington U., St Louis
05 2019
Invisibles Workshop 2019, Valencia
06 2019
MicroBooNE collaboration call
08 2019
Columbia University
08 2019
CERN Neutrino Platform Week 2019
10 2019
NuPhys 2019
12 2019
Fermilab Theory Seminar
02 2020
Brookhaven Neutrino Theory Virtual Seminars
05 2020
Phenomenology Symposium 2020, Pittsburgh
05 2020
JGU Theorie Palaver, Mainz
06 2020
Neutrino 2020, University of Chicago
06 2020
ICHEP 2020
07 2020
Snowmass Neutrino Frontier 03 kick-off meeting
09 2020
Snowmass Theory of neutrin

These files are in the `_talks` directory, which sits next to the `markdown_generator` folder we are working from.

In [231]:
!ls ../_talks

0001_Fermilab Theory Seminar.md
0002_Phenomenology Symposium 2018.md
0003_Perimeter Institute.md
0004_Near detector workshop 2018, CERN.md
0005_NuFact 2018, Virginia.md
0006_Neutrino Oscillation Workshop 2018.md
0007_Max-Planck-Institut fur Kernphysik, Heidelberg.md
0008_Physics Opportunities at the Near Detector of DUNE (PONDD), Fermilab.md
0009_Queen Mary University of London.md
0010_Prospects of Neutrino Physics, IPMU.md
0011_IFIC, Valencia.md
0012_Neutrino Theory Network Workshop, Washington U., St Louis.md
0013_Invisibles Workshop 2019, Valencia.md
0014_MicroBooNE collaboration call.md
0015_Columbia University.md
0016_CERN Neutrino Platform Week 2019.md
0017_NuPhys 2019.md
0018_Fermilab Theory Seminar.md
0019_Brookhaven Neutrino Theory Virtual Seminars.md
0020_Phenomenology Symposium 2020, Pittsburgh.md
0021_JGU Theorie Palaver, Mainz.md
0022_Neutrino 2020, University of Chicago.md
0023_ICHEP 2020.md
0024_Snowmass Neutrino Frontier 03 kick-off meeting.md
0025_Snowmass Theory of ne
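
The per-row string building in the loop above can also be factored into a function, which makes it testable without touching `../_talks`. The following is a minimal sketch under stated assumptions, not the notebook's own code: `build_front_matter` and the `example` dictionary are hypothetical, and the sketch assumes the `MM / YY` date format seen in the sheet.

```python
def build_front_matter(talk):
    """Build the YAML front matter for one talk (hypothetical helper).

    `talk` is a dict with the keys ID, title, type, venue, date ("MM / YY"),
    location and talk_url; blank fields are empty strings.
    """
    id_for_sorting = str(talk["ID"]).zfill(4)  # e.g. 7 -> "0007"
    month, year = [part.strip() for part in talk["date"].split("/")]
    lines = [
        "---",
        "collection: talks",
        f'talk_number: "{talk["ID"]}"',
        f'id_for_sorting: "{id_for_sorting}"',
        f"permalink: /talks/{id_for_sorting}",
        f'title: "{talk["title"]}"',
        f'type: "{talk["type"] or "talk"}"',  # a blank type falls back to "talk"
        f'venue: "{talk["venue"]}"',
        f"date: {month}/{year}",
    ]
    if talk["location"]:
        lines.append(f'location: "{talk["location"]}"')
    if talk["talk_url"]:
        lines.append("link: True")
        lines.append(f"talk_url: {talk['talk_url']}")
    lines.append("---")
    return "\n".join(lines) + "\n"

# Made-up talk used only to exercise the function.
example = {
    "ID": 7, "title": "An example talk", "type": "",
    "venue": "Example Institute", "date": "11 / 18",
    "location": "Germany", "talk_url": "",
}
front_matter = build_front_matter(example)
print(front_matter)
```

Collecting the lines in a list and joining them once keeps each field on its own line and avoids stray trailing spaces in the YAML.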

## Creating a CV list

In [232]:
import datetime

In [233]:
# ndarray.sort() sorts in place and returns None, so use sorted() to display the categories
sorted(talks['type'].unique())

In [242]:
nice_list = ''
space = '\\vspace{2ex}'
sorted_list = ['plenary talk', 'invited talk', 'invited parallel talk', 'parallel talk', 'invited seminar']
for category in sorted_list:
    # nice_list += '\\begin{minipage}{\\textwidth}\n'
    # nice_list += f'{space}\n'
    nice_list += f'\\textBF{{{category.capitalize()}s}}\n'
    # nice_list += f'{space}\n\n'
    # nice_list += '\\begin{enumerate}\n'
    nice_list += '\\begin{longtable}{r c p{14 cm}}\n'
    mask = (talks['type'] == category)
    for _, talk in talks[mask].iterrows():
        # Dates are stored as "MM / YY"; render them as e.g. "May 2018"
        list_date = talk['date'].split(' / ')
        date = datetime.date(int('20' + list_date[1]), int(list_date[0]), 1).strftime('%B %Y')
        # Escape ampersands in the data so LaTeX does not read them as column separators
        nice_list += f"{date}& --- &" + f"{talk['venue']}, ".replace("&", "\\&")
        if talk['city'] != '' and talk['city'] != 'Virtual':
            nice_list += f"{talk['city']}, ".replace("&", "\\&")
        nice_list += f"{talk['location']} \\\\ \n".replace("&", "\\&")

    # nice_list += '\\end{enumerate}\n'
    nice_list += '\\end{longtable}\n'
    # nice_list += '\\end{minipage}\n\n'


In [243]:
print(nice_list)

\textBF{Plenary talks}
\begin{longtable}{r c p{14 cm}}
April 2019& --- &Prospects of Neutrino Physics, IPMU, Kashiwa, Japan \\ 
October 2019& --- &CERN Neutrino Platform Week 2019, CERN, Switzerland \\ 
December 2019& --- &NuPhys 2019, London, UK \\ 
November 2020& --- &Central American meeting of High Energy Physics, Cosmology and High Energy Astrophysics, Cidade da Guatemala, Central America \\ 
October 2020& --- &3rd South American Dark Matter Workshop, ICTP, São Paulo, Brazil \\ 
March 2022& --- &KITP, Interdisciplinary Developments in Neutrino Physics, Santa Barbara, USA \\ 
\end{longtable}
\textBF{Invited talks}
\begin{longtable}{r c p{14 cm}}
June 2018& --- &Near detector workshop 2018, CERN, CERN, Switzerland \\ 
December 2018& --- &Physics Opportunities at the Near Detector of DUNE (PONDD), Fermilab, Fermilab, USA \\ 
May 2019& --- &Neutrino Theory Network Workshop, Washington U., St Louis, St Louis, USA \\ 
September 2020& --- &Snowmass Neutrino Frontier 03 kick-off meeting, 