In [None]:
from jupyter2tex import setup

title = "A Jupyter to LaTeX Transpiler for Easy Paper Writing"
authors = ["Nilesh Gupta", "Aditya Parulekar"]
abstract = r"""Most papers these days are written in \LaTeX, a mathematical typesetting system. However, \LaTeX is somewhat limited, since you cannot easily define and use variables to dynamically generate tables and plots. Instead, users are forced to rely on other tools, such as Python scripts that generate tables or plots, save the outputs of these scripts to image files, and then statically import them into their manuscripts. This is often a time-consuming process that requires many different systems to be run in sequence. This work aims to streamline this process by allowing the user to work entirely within an IPython notebook to create a \LaTeX PDF output."""
setup(title=title, authors=authors, abstract=abstract, style_file='colm')

# Introduction
Jupyter Notebooks are an indispensable tool for machine learning (ML) practitioners, allowing for the integration of live code, equations, visualizations, and text in a single document. However, when it comes to publishing research, the manuscript must be a PDF in a specific format (as required by conferences like NeurIPS). With this project, we aim to build a transpiler that takes a Jupyter notebook with an extended syntax and automatically converts it into a PDF in a given conference format. This will enable researchers to streamline the process of writing a research document and embedding its dynamic contents (such as plots, tables, AI-generated output, code, etc.) seamlessly into the final conference-ready PDF.

## Motivation
When writing papers for ML conferences, figures and tables containing experiment data are very important, both for displaying the main results of the work and for aiding the exposition of the paper. The exact form and details of these figures are therefore often a matter of intense scrutiny from all of the authors. This leads to many rounds of iteration on the figures, and to a lot of back and forth between multiple platforms: the Python script that generates the data, the plotting tool that creates the figure, and the file system of the \LaTeX editor that the figure is then copied into. This is a tedious process. Here, we propose the use of a single platform, Jupyter Notebooks (already commonly used for coding and plotting), which is converted directly into a final \LaTeX document/PDF that is styled according to the appropriate conference's style file. Moreover, with the rising capabilities of AI assistant tools, one could potentially also incorporate their outputs directly in the final PDF.

# Features
Since the whole point of this project was to streamline the writing process, we wanted to minimize the number of issues that the user would run into while doing most standard things. To this end, we wanted to ensure a few things:
1. We wanted to implement the most common features used in \LaTeX, including enumeration and itemization, code blocks, math mode (of course), section headings, subsection headings, etc. To do this, we consulted markdown guides and implemented most of the features that markdown makes available to the user.
2. We also wanted to make sure that the notebook used to create the PDF was still runnable. That is, the user could work entirely in that notebook, without having to run our tool at intermediate stages to check the outputs of certain figures.
3. We wanted the API to be simple and easy to understand.

Following these guidelines, here is a full list of features we have implemented:

1. Sections, subsections, subsubsections.
2. Code blocks
3. **Boldface**, *italics*, and math mode, like: $x^2 + 3 = 42$
    1. both inline math and $$dX_t = (-f(x) + \nabla s_t(X_t)) dt + dB_t\qquad \text{display style}$$
4. Enumeration and Itemization, including nested enumerations and nested itemizations
    1. For example, a second layer of enumeration is possible

# API

First, before converting a file, we require the user to run all cells that they need displayed. This ensures that any plots or data they generate actually exist. It also gives the user the flexibility to keep other code in their notebook that does not make it to the final PDF. For example, they may want to train some ML model and then save the weights to a file, which are then accessed at inference time. We don't want to run this code every time the PDF is regenerated.
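
One way to picture this rule is as a filter over the notebook's JSON. The sketch below is an assumption about how such a filter could look, not a description of what Jupyter2Tex does internally: it uses only the standard `json` module, and the helper name `executed_code_cells` is hypothetical. It relies on the nbformat 4 layout, in which every code cell carries an `execution_count` that stays `null` until the cell has been run.

```python
import json


def executed_code_cells(notebook_json):
    """Return the source text of every code cell that has been run.

    Hypothetical helper for illustration; assumes nbformat 4 JSON.
    """
    notebook = json.loads(notebook_json)
    sources = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells are handled elsewhere
        if cell.get("execution_count") is None:
            continue  # never run, so it has no output to display
        source = cell.get("source", "")
        # nbformat stores source either as one string or as a list of lines
        if isinstance(source, list):
            source = "".join(source)
        sources.append(source)
    return sources


# A tiny in-memory notebook: one executed cell, one cell that was never run.
example = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "execution_count": 1, "metadata": {},
         "outputs": [], "source": ["print('plot')"]},
        {"cell_type": "code", "execution_count": None, "metadata": {},
         "outputs": [], "source": ["train_model()"]},
    ],
})
```

Under these assumptions, `executed_code_cells(example)` keeps the plotting cell and skips the training cell that was never run.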

## Preamble
In the first cell, we require a call to `setup`, which takes in four optional arguments:
- title : the title of the manuscript. Defaults to "Title". 
- authors : the authors, as a Python list. Defaults to the empty list.
- abstract : an abstract for the work. Defaults to an empty string.
- style_file : a style file for the \LaTeX output.

These set up our \LaTeX file and generate the preamble and package imports that are needed for the rest of the \LaTeX generation.
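
To make the role of these arguments concrete, here is a minimal sketch of how a preamble could be assembled from them. It is not the actual `setup` implementation: the function name `build_preamble`, the choice of packages, and the idea that `style_file` names a package loaded with `\usepackage` are all assumptions made for illustration. The defaults for `title`, `authors`, and `abstract` follow the list above; the `None` default for `style_file` is also an assumption.

```python
def build_preamble(title="Title", authors=None, abstract="", style_file=None):
    """Assemble a LaTeX preamble and front matter (hypothetical sketch)."""
    authors = [] if authors is None else list(authors)
    lines = [r"\documentclass{article}"]
    if style_file is not None:
        # Assumption: the style file is a package named exactly like the argument
        lines.append(r"\usepackage{" + style_file + "}")
    lines += [
        r"\usepackage{graphicx}",  # figures
        r"\usepackage{amsmath}",   # math mode
        r"\title{" + title + "}",
        r"\author{" + r" \and ".join(authors) + "}",
        r"\begin{document}",
        r"\maketitle",
    ]
    if abstract:
        lines += [r"\begin{abstract}", abstract, r"\end{abstract}"]
    return "\n".join(lines)
```

Calling `build_preamble(title="My Paper", authors=["A. Author", "B. Author"], style_file="colm")` returns a string that can be written at the top of the generated `.tex` file, with the body of the paper appended after it.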



# Techniques

First, to extract the text of the cells of an IPython notebook, we used a library called `JupyterNotebookParser`. After this, our work largely involved parsing the markdown source by matching strings with regular expressions and, for each match to a markdown pattern, emitting the corresponding \LaTeX. There were a few notable exceptions to this simple strategy.
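
The parse-and-replace strategy can be sketched with Python's standard `re` module. The rule table and the function name `markdown_to_latex` below are illustrative assumptions rather than the code of Jupyter2Tex, and the sketch covers only headings, boldface, italics, and inline code.

```python
import re

# Ordered (pattern, replacement) rules. Bold must run before italics,
# otherwise "**" would be consumed as two italic markers.
RULES = [
    (re.compile(r"^### (.+)$", re.MULTILINE), r"\\subsubsection{\1}"),
    (re.compile(r"^## (.+)$", re.MULTILINE), r"\\subsection{\1}"),
    (re.compile(r"^# (.+)$", re.MULTILINE), r"\\section{\1}"),
    (re.compile(r"\*\*(.+?)\*\*"), r"\\textbf{\1}"),
    (re.compile(r"\*(.+?)\*"), r"\\textit{\1}"),
    (re.compile(r"`(.+?)`"), r"\\texttt{\1}"),
]


def markdown_to_latex(markdown):
    """Apply each rewrite rule in turn to the markdown text."""
    latex = markdown
    for pattern, replacement in RULES:
        latex = pattern.sub(replacement, latex)
    return latex


print(markdown_to_latex("## Motivation\nFigures are **very** important."))
```

A sketch this simple does not protect math mode or code blocks from the emphasis rules, so an expression such as `$a*b*c$` would be mangled; a fuller version would need to set those spans aside before applying the rules.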

## Enumeration and Itemization
Although still a parse-and-replace strategy at heart, these fields were challenging to implement. Enumeration in particular has a lot of freedom in markdown. In fact, all markdown needs to begin an enumerated list is a line beginning with `1.`. All subsequent lines that begin with a numeral are then the next items in the enumeration. Further, nesting or un-nesting can happen at any time. To handle this, we store an enumeration state that records which level we are in, so that we know how many levels of enumeration to exit or create based on the next line of markdown. Itemization is similar, but simpler, since there is no need to keep track of numbers.
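
The state tracking described here can be sketched as a stack of open environments. This is an illustrative reconstruction under stated assumptions, not the code used in Jupyter2Tex: the name `convert_lists` is hypothetical, each nesting level is assumed to be indented by four spaces, any non-list line (including a blank one) is assumed to end the list, and numbering is left to the \LaTeX `enumerate` environment rather than tracked explicitly.

```python
import re

# A list line: optional indent, then "1." style or bullet marker, then text
LIST_LINE = re.compile(r"^(\s*)(\d+\.|[-*+])\s+(.*)$")
INDENT_WIDTH = 4  # assumption: four spaces per nesting level


def convert_lists(markdown):
    """Convert markdown lists to nested enumerate/itemize environments."""
    out = []
    stack = []  # currently open environments, outermost first

    def close_to(depth):
        # Emit \end{...} until only `depth` environments remain open
        while len(stack) > depth:
            env = stack.pop()
            out.append("  " * len(stack) + r"\end{" + env + "}")

    for line in markdown.splitlines():
        match = LIST_LINE.match(line)
        if match is None:
            close_to(0)  # ordinary text ends every open list
            out.append(line)
            continue
        indent, marker, text = match.groups()
        depth = len(indent.expandtabs(INDENT_WIDTH)) // INDENT_WIDTH + 1
        env = "enumerate" if marker[0].isdigit() else "itemize"
        close_to(depth)  # un-nest if this line is shallower
        if len(stack) == depth and stack[-1] != env:
            close_to(depth - 1)  # same level, but the list kind changed
        while len(stack) < depth:  # nest as deep as needed
            out.append("  " * len(stack) + r"\begin{" + env + "}")
            stack.append(env)
        out.append("  " * len(stack) + r"\item " + text)
    close_to(0)
    return "\n".join(out)
```

The depth of the stack is the current nesting level, so the number of environments to exit or create for the next line is just the difference between the stack depth and that line's indentation depth.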

## API calls
For our provided API calls, `display_figure` and `display_table`, we needed to do two things. First, we wanted to display the figure or table in the IPython notebook itself. Second, we needed to generate the relevant \LaTeX for creating the figure or table. When the relevant IPython cell is run, we dump this tex output to a named temporary file, which is then read and dropped into place when the \LaTeX source is generated. `setup` works similarly, but requires less effort since it has no required output in the IPython notebook itself.
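
The two-step pattern (show the object in the notebook, then dump its \LaTeX to a named file) can be sketched as follows. The names `table_to_latex` and `display_table_sketch`, their signatures, and the file naming scheme are hypothetical and are not the real `display_table` API; a plain `print` stands in for the notebook's rich display.

```python
import os
import tempfile


def table_to_latex(header, rows, caption):
    """Render a header and rows as a LaTeX table environment."""
    lines = [
        r"\begin{table}[h]",
        r"\centering",
        r"\begin{tabular}{" + "l" * len(header) + "}",
        r"\hline",
        " & ".join(str(cell) for cell in header) + r" \\",
        r"\hline",
    ]
    for row in rows:
        lines.append(" & ".join(str(cell) for cell in row) + r" \\")
    lines += [
        r"\hline",
        r"\end{tabular}",
        r"\caption{" + caption + "}",
        r"\end{table}",
    ]
    return "\n".join(lines)


def display_table_sketch(header, rows, caption, name):
    """Show the table in the notebook and dump its LaTeX to a named file."""
    # Step 1: display in the notebook (print stands in for rich display)
    print(" | ".join(str(cell) for cell in header))
    for row in rows:
        print(" | ".join(str(cell) for cell in row))
    # Step 2: write the LaTeX to a predictably named file in the temp
    # directory so that the converter can find it later (assumed scheme)
    path = os.path.join(tempfile.gettempdir(), name + ".tex")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(table_to_latex(header, rows, caption))
    return path
```

Because the file name is derived from `name`, the converter can later look up the same path and paste the file's contents into the generated source at the position of the cell.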

In [2]:
#%capture code
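# Classic FizzBuzz: multiples of 3 print "Fizz", multiples of 5 print "Buzz",
# and multiples of both print "FizzBuzz"; every other number prints itself.
for i in range(1, 101):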
    if i % 3 == 0 and i % 5 == 0:
        print("FizzBuzz")
    elif i % 3 == 0:
        print("Fizz")
    elif i % 5 == 0:
        print("Buzz")
    else:
        print(i)

1
2
Fizz
4
Buzz
Fizz
7
8
Fizz
Buzz
11
Fizz
13
14
FizzBuzz
16
17
Fizz
19
Buzz
Fizz
22
23
Fizz
Buzz
26
Fizz
28
29
FizzBuzz
31
32
Fizz
34
Buzz
Fizz
37
38
Fizz
Buzz
41
Fizz
43
44
FizzBuzz
46
47
Fizz
49
Buzz
Fizz
52
53
Fizz
Buzz
56
Fizz
58
59
FizzBuzz
61
62
Fizz
64
Buzz
Fizz
67
68
Fizz
Buzz
71
Fizz
73
74
FizzBuzz
76
77
Fizz
79
Buzz
Fizz
82
83
Fizz
Buzz
86
Fizz
88
89
FizzBuzz
91
92
Fizz
94
Buzz
Fizz
97
98
Fizz
Buzz


# Conclusion
We now have a tool that takes as input a Jupyter Notebook (an .ipynb file) and outputs a \LaTeX (.tex) file which has the same content as the Jupyter notebook and is styled to the appropriate conference's format requirements. To prove its worth and usability, **we wrote this report entirely using Jupyter2Tex**. In fact, in using this tool, we noticed that since markdown is much more lightweight than \LaTeX, it is much quicker and easier to use. There is no need to open large enumerate and itemize environments or to figure out how to include an image.

We also got some feedback from users around our lab. A few of them mentioned that they would love to have such a tool, especially for first drafts, but would like to keep final control over the \LaTeX file in case they want to adjust the spacing or make other fine-grained edits.



## Future directions
This is a very exciting project with a lot of potential to be used in the real world. However, to be a viable product, it needs more features and some details nailed down. Below we list a small subset of the things that need work, based on features we wanted to implement but did not have time for, or issues we noted while creating this report.

- Bug-fixing
    - There are a few bugs in our code that come from unhandled edge cases. For example, nesting enumerates and itemizes does not work, and neither does using the 'verbatim' tag within an enumerate or itemize field. These bugs can be fixed with more careful handling of the edge cases.
- Package support
    - We could implement support for including packages, which is not currently possible. For example, authors who need specific symbols or \LaTeX fonts can't use them natively and have to add the packages to the \LaTeX source manually.
- Handling \LaTeX macros
    - As with package support, macros are currently something that the user has to add to the \LaTeX source manually.