Add support for merging/concatenating multiple notebooks #253

Open
fperez opened this issue Feb 22, 2016 · 21 comments

@fperez
Member

fperez commented Feb 22, 2016

This simple gist offers a command-line tool for concatenating/merging multiple notebooks. As requested by @jamespjh, this could be a useful nbconvert feature (it would also make it robust against evolution of the internal API for users, as they'd only have to remember the cmd line call, and we'd update the internals if the nbformat API changes).

@Carreau
Member

Carreau commented Feb 22, 2016

I'm worried about the logic for merging metadata at the notebook level. While in many cases it is obvious what to do, I'm worried about the slippery slope we would get onto when the metadata differ.

@Carreau Carreau added this to the wishlist milestone Feb 22, 2016
@fperez
Member Author

fperez commented Feb 23, 2016

I would simply make an explicit decision: the metadata is loaded so that it basically corresponds to that of the first nb in the list, plus keys from the others if they differ (the algorithm is simply to do meta.update() with all the notebooks in reverse order from the command line).

That's a simple, unambiguous choice with known semantics. If users don't like it, they can edit it back by hand later.

I don't see a problem with the feature having this constraint.
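
To make the precedence concrete, here is a small sketch with plain dictionaries standing in for notebook metadata. The helper name `merge_metadata` and the keys and values are made up for illustration. Updating in reverse order means the first notebook's values are applied last, so they win whenever keys conflict:

```python
def merge_metadata(metadatas):
    """Merge notebook-level metadata dicts; earlier notebooks take precedence."""
    merged = {}
    # Apply the last notebook first so that the first notebook overwrites it.
    for meta in reversed(metadatas):
        merged.update(meta)
    return merged


# Hypothetical metadata for two notebooks given on the command line.
first = {"kernelspec": {"name": "python3"}, "title": "Chapter 1"}
second = {"kernelspec": {"name": "python2"}, "author": "someone"}

merged = merge_metadata([first, second])
print(merged["kernelspec"]["name"])  # python3 (conflict: first notebook wins)
print(merged["author"])              # someone (key only present in the second)
```

Note that `dict.update()` is shallow: a nested value such as `kernelspec` is taken whole from the winning notebook rather than merged key by key.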

@Carreau
Member

Carreau commented Feb 23, 2016

Ok, I like a strong limitation like that. I came almost to the same conclusion while walking back home.

It might be hard to shoehorn that into the nbconvert structure itself, as right now it's built around the assumption that one exporter converts one notebook, and the looping over all the notebooks is implicit, but we can likely arrange that.

@Carreau
Member

Carreau commented Feb 23, 2016

I propose adding a --merge flag that merges all the notebooks into one before feeding it to the rest of the pipeline. The metadata are merged as you proposed:

metadata = {}
for n in reversed(notebooks):
    metadata.update(n.metadata)

and for the name of the notebook (if needed) we use the first one.

This allows us not only to merge, but to merge (virtually) and generate a PDF/HTML version in one step.

@fperez
Member Author

fperez commented Feb 23, 2016

+1

@Carreau Carreau added good first issue great for new contributors URAP and removed question labels Feb 24, 2016
@jamespjh
Contributor

jamespjh commented Mar 2, 2016

This would be great. I'm using @fperez's nbmerge.py script from https://gist.github.com/fperez/e2bbc0a208e82e450f69 at the moment, and would be delighted to replace it with a simple invocation of nbconvert.

@chadlagore

+1 here. Using nbmerge.py fairly frequently as well.

@aoboy

aoboy commented Dec 15, 2016

I am trying to use fperez's version and I am getting the following error:
Traceback (most recent call last):
  File "nbmerge.py", line 49, in <module>
    merge_notebooks(notebooks)
  File "nbmerge.py", line 38, in merge_notebooks
    print(nbformat.writes(merged))
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 2519513: ordinal not in range(128)

@npyoung

npyoung commented Dec 15, 2016

Happen to be using Python 3, @aoboy? I think you're seeing this issue. If so, there's an easy fix mentioned in that thread.

@aoboy

aoboy commented Dec 15, 2016

@npyoung I solved it. I'm actually using Python 2.7.
I changed the line print(nbformat.writes(merged)) to
print(nbformat.writes(merged).encode('utf-8'))
Basically, the explicit encoding was what was missing.
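
The same idea can be written so that it behaves the same way on Python 2 and Python 3, by encoding explicitly and writing bytes instead of relying on the console's default codec. This is only a sketch: `write_utf8` is a hypothetical helper, not part of nbmerge.py.

```python
import io
import sys


def write_utf8(text, stream=None):
    """Write `text` to a binary stream as UTF-8 bytes."""
    if stream is None:
        # On Python 3, sys.stdout.buffer is the underlying binary stream;
        # on Python 2, sys.stdout itself accepts bytes.
        stream = getattr(sys.stdout, "buffer", sys.stdout)
    stream.write(text.encode("utf-8"))


# U+2013 (en dash) is the character reported in the traceback above.
buf = io.BytesIO()
write_utf8(u"pages 1\u20132\n", buf)
print(repr(buf.getvalue()))
```

In the script, the call `print(nbformat.writes(merged))` would then become `write_utf8(nbformat.writes(merged))`.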

@ketch

ketch commented Feb 6, 2017

This capability would be very useful for a book I am currently working on, where each chapter is a Jupyter notebook. This feature would make it simpler to generate the print version.

@Carreau
Member

Carreau commented Feb 6, 2017

@ketch have a look at @takluyver's BookBook

@chrisjsewell

chrisjsewell commented Jul 7, 2017

Hey guys, I've just created a repo (ipypublish), with a simple workflow/scripts for creating/editing 'publication ready' scientific reports from one or more Jupyter Notebooks (containing matplotlib, pandas, scipy, ...), without leaving the browser. Sorry for the spam but, since I used the gist posted here (thanks!), I thought it might be nice to share.

In particular it would be great to get any feedback, especially about cases where future Jupyter versions might break (or enhance) this, since I intend to write my doctoral thesis with it!

Ta, Chris

@mpacer
Member

mpacer commented Jul 7, 2017

@chrisjsewell Really cool project!! You might be interested in looking at JupyterLab. It looks like your system is a beautiful application of the kind of workflow it makes possible, and you will be able to influence the development of that interface to ensure that it can support the kinds of features you want going forward.

@chrisjsewell

@mpacer thanks :) Yes, I've seen a bit about it. It looks good, and I'll definitely be keeping tabs on it. I see you mentioned easier manipulation of metadata (jupyterlab/jupyterlab#902); that's definitely relevant for my repo (chrisjsewell/ipypublish#1).

From the perspective of my research (atomic/quantum level simulations), I'm really interested in the interactive capability that JavaScript bridging now offers for 3D graphics (ipywidgets, pythreejs, ipyvolume and my other repo pandas3js) and how it can be applied to the exploratory analysis -> publication workflow that notebooks offer. Being able to 'pop' out a view of such a GUI to a separate window would definitely be pretty neat.

@ketch

ketch commented Jul 9, 2017

People interested in this thread may also be interested in this book project, which is a collection of notebooks viewable as PDF, HTML, or executable notebooks and runnable on binder or Microsoft Azure; it's not completely finished but is in an advanced state:

https://github.com/clawpack/riemann_book

We are using bookbook, among several other tools.

@Yensan

Yensan commented Jan 31, 2018

Although I have finished reading, I have not got the HowTo thing. And nbmerge.py failed...
😕

@mgeier
Contributor

mgeier commented Dec 14, 2018

Since it hasn't been mentioned yet in this issue, let me suggest using https://nbsphinx.readthedocs.io/.

It basically concatenates notebooks and creates HTML pages or a LaTeX/PDF from them.

@choldgraf
Contributor

Just a note that this project sort-of exists now: https://github.com/jbn/nbmerge

(FWIW, I think it's better to have a separate tool than to have nbconvert do the merging)

@Carreau Carreau removed the URAP label Jul 16, 2019
@takluyver takluyver removed the good first issue great for new contributors label Nov 2, 2019
@sfixedgear

.ipynb files are JSON. What I do is open all the files I want to merge in a new Python notebook and convert them to dicts. You can then use the 'cells' key to concatenate all the cells, or do whatever else you need, and finally convert the resulting dict back to JSON and export it to a new file.

Here is an example where I import 2 different ipynb files, and merge them into a new ipynb file:

import json

# First file
with open('file1.ipynb', 'r') as file:
    dict_1 = json.load(file)
cells_1 = dict_1['cells']

# Second file
with open('file2.ipynb', 'r') as file:
    dict_2 = json.load(file)
cells_2 = dict_2['cells']

# New file (merging the first and second files)
# Everything except the cells is taken from the first file.
new_dict = dict_1.copy()
new_dict['cells'] = cells_1 + cells_2  # plain list concatenation; NumPy is not needed
with open('new_file.ipynb', 'w') as json_file:
    json.dump(new_dict, json_file)
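
The same approach extends to any number of files. The following is a sketch only: `merge_ipynb_files` and the file names are hypothetical, it assumes nbformat 4 files (a top-level `cells` list), and like the example above it keeps the notebook-level metadata of the first file.

```python
import io
import json


def merge_ipynb_files(paths, out_path):
    """Concatenate the cells of several .ipynb files into a new file."""
    merged = None
    for path in paths:
        # io.open accepts an explicit encoding on both Python 2 and 3.
        with io.open(path, "r", encoding="utf-8") as f:
            nb = json.load(f)
        if merged is None:
            # The first notebook supplies metadata and format version.
            merged = nb
        else:
            merged["cells"].extend(nb["cells"])
    if merged is None:
        raise ValueError("no input files given")
    # ensure_ascii=True (the default) keeps the output pure ASCII,
    # which sidesteps console and file encoding problems.
    with open(out_path, "w") as f:
        json.dump(merged, f, indent=1)
    return merged
```

A call such as `merge_ipynb_files(['file1.ipynb', 'file2.ipynb'], 'new_file.ipynb')` would reproduce the two-file example.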

@maximveksler

Does the feature for loading a notebook as a module offer an answer for the use case discussed here? https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Importing%20Notebooks.html
