Feature request: Writing to JSON schema #46

Open
Zsailer opened this issue Mar 31, 2016 · 14 comments

Comments

@Zsailer

Zsailer commented Mar 31, 2016

DendroPy is awesome.

I would love to see DendroPy's Tree data structures access modern tree visualization tools like D3. A simple method for writing the Tree data structure to a JSON schema would do the trick. I'm imagining JSON that follows the format described in this post.

I'd be happy to work on a PR if y'all agree it would be useful.

@jeetsukumaran
Owner

This is a wonderful idea! Fantastically wonderful. I've been looking for a lightweight, portable, visualization solution, and this would be it.

Would be great to have you work on it! Otherwise, I can try an implementation myself (I love the idea so much!).

If you want to work on it, let me know, and we can discuss organization and API.

For the former, I think it should go into the dendropy.dataio hierarchy, maybe in its own module, dendropy.dataio.d3writer?

Perhaps the API should follow that of, e.g., the NewickWriter in terms of what gets rendered (node labels, edge lengths, etc.). Also, support for other rendering features (colors, node shapes, edge thicknesses, tree styles, what gets collapsed, etc.) should be planned in the API even if we do not get around to implementing it all?

@Zsailer
Author

Zsailer commented Mar 31, 2016

Fantastic! I'd be happy to help!

If development would go faster through you, I don't mind doing code-review and branching off your work. Whatever works best for you! Otherwise, I'd be happy to take a crack at it myself (hopefully sometime today or tomorrow).

Yeah, I agree it makes sense to have this API live in its own module in dendropy.dataio. There is likely a format specified/standardized by Vega for this kind of data structure and all the rendering features you mentioned. This might be a good place to start planning the organization. Vega was started with D3 in mind. I'll look through their docs for ideas.

@jeetsukumaran
Owner

Great!

I think given my familiarity with the codebase, one approach would be
for me to set up all the "scaffolding" --- i.e., the hooks into the data
schema API, the basic class to handle the writing etc., and leave the
main "write" method as a stub to be fleshed out. Then, if you want and
have the time, you can work at translating the tree structure into the
required JSON. I can work on this over this weekend or next week. In the
meantime, if you are agreeable, maybe get familiar with the Vega/D3
API, features, etc. (if you are not already)?

@Zsailer
Author

Zsailer commented Mar 31, 2016

Sounds good! Ping me when you have the basic hooks in place. I'll work on the JSON format and post some ideas here.

@jeetsukumaran
Owner

Will do!

@jeetsukumaran
Owner

Ok,

I've put the scaffolding in place:

https://github.com/jeetsukumaran/DendroPy/blob/d3writer/dendropy/dataio/d3writer.py#L148-L160

The _write_tree_list() is for any overhead/meta stuff required for a
group of trees, while _write_tree() works on a single tree. You can
see the corresponding methods in the NewickWriter class for examples on
how this is handled.

Test framework in place at:

https://github.com/jeetsukumaran/DendroPy/blob/d3writer/dendropy/test/test_dataio_d3_writer.py

Test design is going to take some thinking. Typically, the approach has
been to round-trip read-write-read, and then confirm that the objects of
second reading semantically correspond to the objects of the first
reading. Lots of infrastructure to support this.

With a write-only paradigm here, we might have to do a brute-force /
dumb approach, i.e., check if the generated strings match exactly what
is expected. This works, but is fragile -- i.e., non-semantic changes in
the rendering pipeline will break the test (e.g., placement of spaces,
newlines, etc.). But that's not a deal-breaker, I suppose, being only
majorly annoying in the main development phase and usually
easily-fixable. I am open to other suggestions if you have any.
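
One possible way around the fragility described above is to parse the generated JSON and compare the resulting objects rather than the raw strings. The snippet below is only a sketch of that idea using Python's standard json module; the sample strings are made up for illustration and are not output from the d3writer branch.

```python
import json

# Hypothetical writer output; the exact whitespace could change between versions.
compact = '{"name":"A","children":[{"name":"B","children":[]}]}'
pretty = json.dumps(json.loads(compact), indent=4)

# Exact string comparison is fragile: same content, different layout.
strings_match = (compact == pretty)

# Parsing both sides first makes the comparison independent of layout.
parsed_match = (json.loads(compact) == json.loads(pretty))

print(strings_match, parsed_match)
```

With this approach, changes in spacing or newlines would not break the test, while a change in node names or nesting still would.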

I might find time to work on the actual D3 composition implementation
next week or later. If you want to give it a go in the meantime, that
would be great!

-- jeet

@Zsailer
Author

Zsailer commented Apr 4, 2016

Awesome, thanks for getting that in place! I've forked the d3writer branch and should get some time to work on it today/tomorrow.

@Zsailer
Author

Zsailer commented Oct 10, 2016

Hi @jeetsukumaran

Sorry for the long delay on this. The summer proved to be a busy time for me. But I was able to work on this idea yesterday. I've even got a prototype branch here.

This is far from finished. Note, I haven't added most of the keyword arguments, nor have I written tests. I was mostly familiarizing myself with DendroPy's data I/O API.

I did make a simple, static example of my working branch visible in this gist.

Currently, I construct a nested dictionary for a tree. This basically converts the Tree object into a hierarchical, metadata dictionary. Then, I use python's standard json library to encode the metadata as a JSON string. This library handles all None-to-null, True-to-true, and False-to-false conversions.

It appears that there is not yet a clear, standardized JSON/Vega grammar defined for hierarchical, tree-like data. There is a conversation going on within the "Open Tree of Life" group. In summary, Vega is starting to make a point of flattening JSON formats for readability, which doesn't work well for hierarchical data. My opinion is that we stick to the examples from D3. Every node has a "name" and "children" key-value pair. The "children" value is an array of child nodes. Child nodes have a "parent" argument pointing back to the parent name. I think, at the very least, this basic structure is acceptable to the general user-base:

{
    "name" : "A",
    "children" : [
        {
             "name" : "B",
             "children" : []
        },
        {
             "name" : "C",
             "children" : []
        }
    ]
}

Other items can be specific to DendroPy, but don't affect the generality of the output. For example, annotations, if not suppressed, are included as an "annotations" key mapped to a set of key-value pairs for each annotation. Lengths of branches, if not suppressed, are included as "length" in each child node's data.
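
To make the structure above concrete, here is a minimal sketch of the nested-dictionary approach. The Node class, the node_to_dict function, and the suppress_lengths / suppress_annotations keyword names are all hypothetical stand-ins chosen for illustration; they are not DendroPy's classes or the code in the prototype branch. Omitting "length" when a node has none is also an assumption of this sketch.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for a tree node; not DendroPy's Node class."""
    name: str
    length: Optional[float] = None
    annotations: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def node_to_dict(node, suppress_lengths=False, suppress_annotations=False):
    """Recursively convert a node and its descendants into a nested dict."""
    data = {"name": node.name}
    # Assumption: a node without a length simply has no "length" key.
    if not suppress_lengths and node.length is not None:
        data["length"] = node.length
    if not suppress_annotations and node.annotations:
        data["annotations"] = dict(node.annotations)
    data["children"] = [
        node_to_dict(child, suppress_lengths, suppress_annotations)
        for child in node.children
    ]
    return data


root = Node("A", children=[
    Node("B", length=0.1, annotations={"support": None}),
    Node("C", length=0.2),
])

# The json module takes care of the None-to-null conversion.
tree_json = json.dumps(node_to_dict(root), indent=4)
print(tree_json)
```

Because the conversion is recursive, a very deep tree could run into Python's recursion limit; an iterative traversal would avoid that.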

Finally, as a small aside: I'm thinking we should rename the writer to JSONWriter instead of D3Writer. I think this type of output is more general than just D3. It has other advantages like portability to other languages. I also find JSON format more human readable compared to other tree formats. It doesn't matter to me too much. If you'd rather keep the focus on D3, I'm fine with that as well.

@jeetsukumaran
Owner

jeetsukumaran commented Oct 10, 2016

Great stuff!

Really like what is happening here.

I agree with you that readability is nice, but really, really, really, really, really, should not take priority over usability. And there is strong reason to keep hierarchical data hierarchical. At the same time, we are (presumably?) not out to create a new data format, but rather render the data model in a format that can be consumed by an existing visualization technology, and it makes sense to use the format/conventions/standards/expectations of the visualization technology that we are targeting to condition output. TL;DR: I agree, stick to D3 examples!

As far as the naming goes, given that I imagine at some point we would like to take advantage of some D3-specific expression capability, my suggestion would be to have a JsonWriter class that handles all generic JSON stuff, and a D3JsonWriter that specializes it. Client code would then specify schema="json-d3" or schema="d3" (for example; we can decide the name later) to render the tree as D3-specific JSON and schema="json" for more generic JSON (if we want to support that). The D3JsonWriter would, of course, override the _write etc. methods as needed, and also call on the base class _write as needed. I think the addition of the class hierarchy complexity is offset by the gains in modularity, abstraction, and DNRY-ness?
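
The class layout suggested above could look roughly like the following. This is a sketch only: the class bodies, the _node_data method, the WRITERS table, the write_tree function, and the "color" attribute are all assumptions made for illustration, and none of it is DendroPy's actual dataio API.

```python
import json


class JsonWriter:
    """Hypothetical base writer that emits generic JSON only."""

    def _node_data(self, node):
        # `node` is assumed to be a plain dict with "name" and "children".
        return {
            "name": node["name"],
            "children": [self._node_data(c) for c in node.get("children", [])],
        }

    def write(self, root):
        return json.dumps(self._node_data(root))


class D3JsonWriter(JsonWriter):
    """Hypothetical specialization that adds rendering hints."""

    def __init__(self, node_color="#ccc"):
        self.node_color = node_color

    def _node_data(self, node):
        data = super()._node_data(node)  # reuse the generic structure
        data["color"] = self.node_color  # assumed rendering attribute
        return data


# Hypothetical schema-name dispatch, mirroring schema="json" / schema="d3".
WRITERS = {"json": JsonWriter, "d3": D3JsonWriter}


def write_tree(root, schema="json", **kwargs):
    return WRITERS[schema](**kwargs).write(root)


tree = {"name": "A", "children": [{"name": "B"}, {"name": "C"}]}
generic = write_tree(tree, schema="json")
d3 = write_tree(tree, schema="d3", node_color="#000")
```

The subclass only overrides the per-node hook and calls the base class for everything else, which is where the modularity gain would come from.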

But that is just my suggestion. If you feel that simply renaming it "JSONWriter" makes more sense (with, maybe, the optional specification of a keyword dialect="D3" to activate D3-specific rendering), then that would be the way to go!

@Zsailer
Author

Zsailer commented Oct 11, 2016

Yes, I definitely agree that we aren't trying to create a new data format (haha). I'm just surprised that there isn't a defined "tree" grammar for JSON format already out there (at least not that I could find immediately). It seems like there are so many great visualization tools that are prime for such a grammar. By including such a writer in DendroPy, we might be inadvertently contributing to creation of such a grammar.

I agree with keeping the hierarchy in the output, and D3 seems to honor that as well.

I really like the idea of subclassing a JsonWriter class. I'll add that to my next implementation. I think D3 is one use-case (and likely the most popular use-case) of a JSON format. It would be great to connect DendroPy to fresh visualization tools like D3. A lot of these tools are written as Javascript libraries, so JSON is the natural porting mechanism. A subclassed writer would likely include extra visualization attributes (i.e. window size, colors, etc). The more general JSON format, however, would be useful for porting DendroPy tree data to other APIs or languages. I'm saying this for selfish reasons ;)

Thanks for talking through this stuff! I'll keep working on it and keep you updated!

@Zsailer
Author

Zsailer commented Oct 11, 2016

Also, in the interest of making a general JSON format that is portable, would a JsonReader class make sense as well? Or do you think this is outside the scope of DendroPy?

@jeetsukumaran
Owner

A reader would indeed be nice. I imagine the use case would be more limited, especially if the JSON is narrowly defined (DendroPy/D3-specific)? Though, having a reader will be useful for tests (round-tripping). Just as relevant, with projects like this, it is not always necessary to stick exclusively to what is important/needed/useful; I always tend to work in things that I like/want, even if it is very idiosyncratic and done more for interest rather than utility. So if the idea of writing a reader appeals to you, go for it!
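
As an illustration of how a reader would enable the round-trip style of test mentioned above, here is a small sketch. The Node class and the write_node / read_node functions are hypothetical names used only for this example and are not part of DendroPy.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a tree node; not DendroPy's Node class."""
    name: str
    children: list = field(default_factory=list)


def write_node(node):
    """Serialize a node and its descendants to a JSON string."""
    def to_dict(n):
        return {"name": n.name, "children": [to_dict(c) for c in n.children]}
    return json.dumps(to_dict(node))


def read_node(text):
    """Rebuild Node objects from the JSON produced by write_node."""
    def from_dict(d):
        return Node(d["name"], [from_dict(c) for c in d.get("children", [])])
    return from_dict(json.loads(text))


original = Node("A", [Node("B"), Node("C")])

# Round trip: the objects from the second reading should match the first.
first = read_node(write_node(original))
second = read_node(write_node(first))
round_trip_ok = (first == second)
```

The comparison relies on the dataclass-generated equality, which checks names and children recursively, so it is a semantic check rather than a string match.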

WRT JSON tree grammar/data format, the OTOL folks are using a JSON-based derivation of NeXML. I've been meaning to write a DendroPy parser for it, but it's been on the back-burner. Not saying that it should be used here, but mentioning it for reference or source of ideas.

@Zsailer
Author

Zsailer commented Jul 16, 2018

Hey! @jeetsukumaran

I wanted to mention a new project I've been working on here. PhyloVega is a Python package that uses Vega's (JSON) specifications to draw interactive trees.

While writing this package, I (finally) figured out the Vega grammar for drawing trees. I think it's pretty powerful. With Vega's declarative grammar, I can style my tree any way I want. In this example below, I read in a tree using PhyloPandas and style the tree using a declarative grammar API. Underneath the hood, the TreeChart object is just building a JSON spec for Vega.

from phylopandas import read_newick
from phylovega.api import TreeChart

# Read tree using PhyloPandas
df = read_newick('tree.newick')

# Construct Vega Specification
chart = TreeChart(
    df,
    height_scale=200,

    # Node attributes
    node_size=200,
    node_color="#ccc",

    # Leaf attributes
    leaf_labels="id",

    # Edge attributes
    edge_width=2,
    edge_color="#000",
)

# Display in Jupyter
chart.display()

[Image: static-example]

This may be an interesting light-weight visualization solution for DendroPy. If I can find some time, I'll write up documentation for this grammar. Then, it would be pretty easy to write a JSON I/O tool for DendroPy. What do you think?
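
For context on what a JSON I/O tool aimed at Vega might need to produce: Vega's tree layouts work from a flat table of rows with an id and a parent key, which its stratify transform turns back into a hierarchy. The sketch below flattens the nested "name"/"children" structure discussed earlier in this thread into such rows. The flatten function and the data set name "tree" are made up for illustration, node names are assumed to be unique, and this is not necessarily how PhyloVega does it.

```python
def flatten(node, parent=None, rows=None):
    """Flatten a nested name/children dict into rows with id and parent."""
    rows = [] if rows is None else rows
    rows.append({"id": node["name"], "parent": parent})
    for child in node.get("children", []):
        flatten(child, node["name"], rows)
    return rows


nested = {
    "name": "A",
    "children": [
        {"name": "B", "children": []},
        {"name": "C", "children": []},
    ],
}

rows = flatten(nested)

# Fragment of a Vega data definition; the root row has a null parent.
vega_data = {
    "name": "tree",
    "values": rows,
    "transform": [
        {"type": "stratify", "key": "id", "parentKey": "parent"},
    ],
}
```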

@jeetsukumaran
Owner

jeetsukumaran commented Jul 16, 2018 via email
