Import Jupyter Notebook files into Atom #1501

Merged
merged 20 commits into from
Jan 22, 2019

@kylebarron kylebarron commented Dec 28, 2018

This PR adds functionality to import Jupyter Notebook files as text files in Atom.

It imports the notebook into a new TextEditor. To support Markdown cells, I need the correct comment symbol, which means finding the correct Atom Grammar for the notebook. I first try to match the file extension in the notebook's metadata against the file extensions each Grammar applies to. Because the file extension is an optional metadata field, I also try to match on the kernelspec name. Together, these two checks should work for the vast majority of notebooks.
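A minimal sketch of the two-step lookup described above, not the code from this PR. It assumes grammars are plain objects with `name` and `fileTypes` fields (as Atom Grammar objects have), and it assumes a simple prefix comparison for the kernelspec fallback; the function name `findGrammar` is hypothetical.

```javascript
// Hypothetical helper: pick a grammar for a parsed notebook object.
// `grammars` is an array of objects shaped like { name, fileTypes }.
function findGrammar(notebook, grammars) {
  const metadata = notebook.metadata || {};
  const languageInfo = metadata.language_info || {};
  const kernelspec = metadata.kernelspec || {};

  // Step 1: try the optional file extension, e.g. ".py" -> "py".
  if (languageInfo.file_extension) {
    const ext = languageInfo.file_extension.replace(/^\./, "");
    const byExtension = grammars.find(g => (g.fileTypes || []).includes(ext));
    if (byExtension) return byExtension;
  }

  // Step 2 (assumed matching rule): fall back to the kernelspec name,
  // e.g. "python3" starts with the lower-cased grammar name "python".
  if (kernelspec.name) {
    const kernelName = kernelspec.name.toLowerCase();
    const byKernel = grammars.find(
      g => g.name && kernelName.startsWith(g.name.toLowerCase())
    );
    if (byKernel) return byKernel;
  }

  // No match: the caller decides how to report this.
  return null;
}
```

A prefix comparison like this would miss kernels whose names do not resemble the grammar name, so the real matching may well be more involved.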

Given the grammar, I get the source text of the code and Markdown cells. Each line of a Markdown cell is prefixed with the grammar's commentStartString, right-trimmed, plus a space. I add a cell marker line before each cell, either # %% or # %% markdown (with the language's comment symbol in place of #). Newlines created between cells use \r\n on Windows and \n otherwise.
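The conversion described above could be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: the function name `cellsToText` and the `eol` parameter are hypothetical, and joining Markdown lines with `eol` is a guess.

```javascript
const os = require("os"); // os.EOL is "\r\n" on Windows and "\n" elsewhere

// Hypothetical helper: turn nbformat 4 cells into marker-delimited text.
function cellsToText(cells, commentStartString, eol = os.EOL) {
  // Right-trim the comment symbol and add exactly one space, e.g. "# " or "// ".
  const prefix = commentStartString.trimEnd() + " ";
  const blocks = [];

  for (const cell of cells) {
    // Raw cells and code cell outputs are not imported.
    if (cell.cell_type !== "code" && cell.cell_type !== "markdown") continue;

    // In nbformat 4, `source` may be a string or an array of strings.
    const source = Array.isArray(cell.source)
      ? cell.source.join("")
      : cell.source;

    if (cell.cell_type === "markdown") {
      // Comment out every line so the file stays valid source code.
      const body = source
        .split("\n")
        .map(line => prefix + line)
        .join(eol);
      blocks.push(prefix + "%% markdown" + eol + body);
    } else {
      blocks.push(prefix + "%%" + eol + source);
    }
  }
  return blocks.join(eol);
}
```

Calling `cellsToText(notebook.cells, "#", "\n")` on the Python example notebook should give text in the same shape as the Python listing in this description.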

To do:

Limitations:

  • Currently only version 4 of the notebook format is supported.
  • Only imports source text of code cells and Markdown cells. Doesn't import raw cells or code cell outputs.

Python:

image

# %% markdown
# # Example Notebook
# 
# This is an example Python Notebook!
# %%
print('hello world!')
# %%

Bash:

image

# %% markdown
# # Example Notebook
# 
# This is an example Bash Notebook.
# %%
echo "hello world!"
# %%

JavaScript:

image

// %% markdown
// # Example Notebook
// 
// This is an example Javascript Notebook.
// %%
console.log('hello world!');
// %%

R:

image

# %% markdown
# # Example Notebook
# 
# This is an example R Notebook!
# %%
print('hello world!')
# %%

Closes #1457, ref #1404, ref #75.

@kylebarron
Contributor Author

Can someone help with this last flow error? My understanding of promises and callbacks is only so-so.

Cannot call readFile with loadNotebook bound to callback because:
 • Promise [1] is incompatible with undefined [2] in the return value.
 • Promise [3] is incompatible with undefined [2] in the return value.

     lib/import-notebook.js
      23│       atom.notifications.addError("Selected file must have extension .ipynb");
      24│       return;
      25│     }
      26│     readFile(filename, loadNotebook);
      27│   });
      28│ }
      29│
 [3]  30│ async function loadNotebook(err, data) {

     /private/tmp/flow/flowlib_f7461a8/core.js
 [1] 612│ declare class Promise<+R> {

     /private/tmp/flow/flowlib_f7461a8/node.js
 [2] 967│     callback: (err: ?ErrnoError, data: Buffer) => void

@lgeiger
Member

lgeiger commented Dec 30, 2018

Thanks for getting this started 🎉

I guess the Flow error is because the callback passed to fs.readFile shouldn't be an async function. Wrapping it in an arrow function might work around this.

I can review the PR in detail next week.
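A minimal sketch of the suggested workaround, assuming the type error comes from `fs.readFile` expecting a callback that returns `void` while an async function returns a Promise. The helper `asVoidCallback` and the simplified `loadNotebook` body are hypothetical; only `fs.readFile` is a real API here.

```javascript
const fs = require("fs");

// Simplified stand-in for the PR's loadNotebook (the real body is not shown here).
async function loadNotebook(err, data) {
  if (err) throw err;
  return JSON.parse(data.toString());
}

// Hypothetical wrapper: the returned arrow function has a block body with no
// return statement, so it returns undefined rather than a Promise.
// Rejections are passed to onError instead of being left unhandled.
function asVoidCallback(asyncFn, onError) {
  return (...args) => {
    asyncFn(...args).catch(onError);
  };
}

function importNotebook(filename, onError = console.error) {
  fs.readFile(filename, asVoidCallback(loadNotebook, onError));
}
```

The same effect can be had inline with `readFile(filename, (err, data) => { loadNotebook(err, data); })`, though that form drops any rejection from `loadNotebook`.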

Member

@BenRussert BenRussert left a comment

This is going to be great @kylebarron! Thanks for leading this effort!

I added some tests and fixed a couple of things on my fork. If you add me as a collaborator on your fork, I can push my changes to this PR. Or I can open a PR against your fork if you prefer.

kylebarron/hydrogen@import-notebook...BenRussert:import-notebook

lib/import-notebook.js
const cellType = cell.cell_type;
const cellMarkerKeyword = cellType === "markdown" ? "markdown" : null;
const cellMarker = getCellMarker(commentStartString, cellMarkerKeyword);
var source = cell.source;
Member

For best practice, use let instead of var here.

Contributor Author

👍

lib/main.js Outdated
@@ -55,6 +55,7 @@ import {
} from "./utils";

import exportNotebook from "./export-notebook";
import importNotebook from "./import-notebook";
Member

I really like that you implemented this in its own file and imported it into main. This improves organization, avoids merge conflicts from unrelated PRs, and makes testing easier, to name a few benefits.

@kylebarron
Contributor Author

> I added some tests and fixed a couple of things on my fork. If you add me as a collaborator on your fork, I can push my changes to this PR. Or I can open a PR against your fork if you prefer.

Added as contributor

@BenRussert
Member

We can rebase at the end, once we are ready to merge. Try this branch out and see if you can find anything that still needs work. I'll take another look during the week as well to see what's left.

lib/import-notebook.js
}
const nb = parseNotebook(data);
if (nb.nbformat < 4) {
atom.notifications.addError("Only notebook version 4 currently supported");
Contributor Author

It might be good to check how version 3 differs, because support for reading version 3 would be a nice plus (probably for a future PR).

fail(e);
done();
});
};
Contributor Author

Can't you import this from test-utils.js?

Member

Definitely meant to 😂

@kylebarron
Contributor Author

I think all your edits make sense, though I haven't tested them in Atom yet.

@kylebarron
Contributor Author

I'd be in favor of moving importNotebook to an atom-workspace command, so that a text editor doesn't have to be currently active to use it.

@kylebarron
Contributor Author

I also think it's important to test both round trips: import then export should reproduce the original .ipynb file, and export then import should reproduce the original text editor contents.
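A rough sketch of what such round-trip checks could look like. The converters below are deliberately simplified, hypothetical stand-ins (Python-style `#` comments only, `source` as a single string), not Hydrogen's import or export code.

```javascript
// Matches a cell marker line: "# %%" or "# %% markdown".
const MARKER = /^# %%( markdown)?$/;

// Simplified import direction: cells -> text.
function cellsToText(cells) {
  return cells
    .map(cell =>
      cell.cell_type === "markdown"
        ? ["# %% markdown", ...cell.source.split("\n").map(l => "# " + l)].join("\n")
        : ["# %%", cell.source].join("\n")
    )
    .join("\n");
}

// Simplified export direction: text -> cells.
function textToCells(text) {
  const cells = [];
  for (const line of text.split("\n")) {
    const match = MARKER.exec(line);
    if (match) {
      // A marker line starts a new cell.
      cells.push({ cell_type: match[1] ? "markdown" : "code", lines: [] });
    } else if (cells.length > 0) {
      const current = cells[cells.length - 1];
      // Strip the "# " prefix from Markdown lines; keep code lines as they are.
      current.lines.push(current.cell_type === "markdown" ? line.slice(2) : line);
    }
  }
  return cells.map(c => ({ cell_type: c.cell_type, source: c.lines.join("\n") }));
}

// Import then export: the cells should survive unchanged.
function roundTripsFromCells(cells) {
  return JSON.stringify(textToCells(cellsToText(cells))) === JSON.stringify(cells);
}

// Export then import: the editor text should survive unchanged.
function roundTripsFromText(text) {
  return cellsToText(textToCells(text)) === text;
}
```

Even this toy version shows where a round trip can break: a code cell that contains a line reading exactly `# %%` would be split into two cells on the way back.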

@kylebarron
Contributor Author

I'd also be happy to add documentation to https://nteract.gitbooks.io/hydrogen/docs/Usage/NotebookFiles.html. (When #1498 was merged, the documentation was updated even though that code had not been released yet.)

@kylebarron
Contributor Author

I added some more documentation and moved import-notebook to be a workspace command.

@kylebarron
Contributor Author

@BenRussert
I think this is probably good to merge if you're happy with it.

I wasn't sure if this should be its own PR, since it might need more discussion, but it would also be useful to add the import notebook functionality as an opener for .ipynb files. It's a simple 10-line change.

@BenRussert BenRussert mentioned this pull request Jan 22, 2019
@BenRussert BenRussert merged commit e69e274 into nteract:master Jan 22, 2019
@JohnCHarrington

> I wasn't sure if this should be its own PR, since it might need more discussion, but it would also be useful to add the import notebook functionality as an opener for .ipynb files. It's a simple 10-line change.

I know nothing about how this works, but the ideal for me would be to have Atom open .ipynb files like this, then export them on save. Everyone around me works directly in Jupyter, so this way I could work with them pretty seamlessly.

On a side note, how about having this import/export to/from the rich document format? Even if it only supported importing one language and markdown cells it would be nice to have the choice of which format to work in.

@kylebarron
Contributor Author

> I know nothing about how this works, but the ideal for me would be to have Atom open .ipynb files like this, then export them on save. Everyone around me works directly in Jupyter, so this way I could work with them pretty seamlessly.

We currently don't support automatic exporting because of the potential for unintentionally overwriting data.

> On a side note, how about having this import/export to/from the rich document format? Even if it only supported importing one language and markdown cells it would be nice to have the choice of which format to work in.

Do you mean markdown documents?

@JohnCHarrington

Fair enough, I can add a hotkey for export anyway.

Yes, I mean Markdown documents; the grammar/kernelspec would then be used to set the language on the code blocks.

@kylebarron
Contributor Author

> Fair enough, I can add a hotkey for export anyway.

Well, currently when you export a file to a notebook, it brings up the system file selector, so I'm not sure how much time a hotkey would save. Hydrogen has no way of knowing ahead of time what to name the output notebook file.

> Yes, I mean Markdown documents; the grammar/kernelspec would then be used to set the language on the code blocks.

Markdown documents are more prone to losing some metadata, in particular cell boundaries. It's probably not too difficult to write, but not my focus at the moment.
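To illustrate the cell-boundary concern, here is a small hypothetical sketch of a Markdown export (`cellsToMarkdown` is not part of Hydrogen). It assumes cells carry `source` as a single string and that the language name comes from the grammar or kernelspec.

```javascript
// Hypothetical converter: Markdown cells pass through as-is and code cells
// become fenced blocks tagged with the given language.
function cellsToMarkdown(cells, language) {
  const fence = "`".repeat(3);
  return cells
    .map(cell =>
      cell.cell_type === "markdown"
        ? cell.source
        : `${fence}${language}\n${cell.source}\n${fence}`
    )
    .join("\n\n");
}

// Two adjacent Markdown cells...
const twoCells = [
  { cell_type: "markdown", source: "Intro" },
  { cell_type: "markdown", source: "Details" }
];
// ...and one Markdown cell with the same text.
const oneCell = [{ cell_type: "markdown", source: "Intro\n\nDetails" }];

// Both produce identical Markdown, so the original boundary cannot be recovered.
const boundaryLost =
  cellsToMarkdown(twoCells, "python") === cellsToMarkdown(oneCell, "python");
```

Code cells keep their boundaries through the fences, but adjacent Markdown cells would need some extra marker to be told apart again on import.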

@kylebarron
Contributor Author

@JohnCHarrington
You may also want to check out the newest release of Pandoc: https://github.com/jgm/pandoc/releases/tag/2.6
