Describe a DataTree: adding visualization and summarization capabilities #9271

Open
eschalkargans opened this issue Jul 23, 2024 · 0 comments

Is your feature request related to a problem?

The string and rich representations of Dataset and DataArray objects are standardized. A DataTree, however, can become structurally complex in a way that a DataArray or Dataset cannot. A way to visualize and summarize a tree would therefore be helpful, similar to the pandas describe method, which summarizes a DataFrame.

Describing a DataTree could produce a lightweight string representation that shows only the structure of the tree, not its contents.
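To make the pandas describe analogy concrete, here is a minimal sketch of a structure-only summary. It works on plain group paths (such as the keys passed to DataTree.from_dict) and uses only the standard library. The name `describe_paths` and the returned keys are hypothetical, chosen for illustration; they are not part of xarray or datatree.

```python
from collections import Counter
from pathlib import PurePosixPath


def describe_paths(paths):
    """Summarize the shape of a hierarchy given its node paths.

    `paths` is an iterable of absolute POSIX-style group paths.
    Intermediate groups are inferred from their descendants.
    (Hypothetical helper, not an xarray/datatree API.)
    """
    root = PurePosixPath("/")
    nodes = {root}
    for raw in paths:
        path = PurePosixPath(raw)
        nodes.add(path)
        nodes.update(path.parents)  # include every ancestor group

    # A node is a leaf if no other node names it as its parent.
    parents = {p.parent for p in nodes if p != root}
    leaves = nodes - parents
    depth_counts = Counter(len(p.parts) - 1 for p in nodes)  # root has depth 0
    return {
        "n_nodes": len(nodes),
        "n_leaves": len(leaves),
        "max_depth": max(depth_counts),
        "nodes_per_depth": dict(sorted(depth_counts.items())),
    }


summary = describe_paths(["/Sharks", "/Bony Skeleton/Ray-finned Fish"])
print(summary)
```

A real implementation would presumably walk the DataTree nodes directly and could also report per-group variable counts; the path-based version only illustrates the kind of summary being requested.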

Describe the solution you'd like

Here is some code I used to achieve that goal:

from pathlib import PurePosixPath

import datatree as dt
import xarray as xr

# Example taken from
# https://xarray-datatree.readthedocs.io/en/latest/hierarchical-data.html#ancestry-in-an-evolutionary-tree
vertebrates = dt.DataTree.from_dict(
    name="Vertebrae",
    d={
        "/Sharks": None,
        "/Bony Skeleton/Ray-finned Fish": None,
        "/Bony Skeleton/Four Limbs/Amphibians": None,
        "/Bony Skeleton/Four Limbs/Amniotic Egg/Hair/Primates": None,
        "/Bony Skeleton/Four Limbs/Amniotic Egg/Hair/Rodents & Rabbits": xr.Dataset(
            {f"variable_{k}": None for k in "abc"}
        ),
        "/Bony Skeleton/Four Limbs/Amniotic Egg/Two Fenestrae/Dinosaurs": xr.Dataset(
            {f"variable_{k}": None for k in "abc"}
        ),
        "/Bony Skeleton/Four Limbs/Amniotic Egg/Two Fenestrae/Birds": xr.Dataset(
            {f"variable_{k}": None for k in "abc"}
        ),
    },
)


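def repr_datatree(
    xdt: dt.DataTree,
    *,
    tabsize: int = 4,
    offset: int = 1,
    with_full_path: bool = False,
    with_group_variables: bool = False,
) -> str:
    """Return an indented, structure-only text view of a DataTree."""
    lines = []
    for node in xdt.subtree:
        path = PurePosixPath(node.path)
        # The root path "/" has one part, so with the default offset of 1
        # the root is printed without indentation.
        depth = len(path.parts)
        label = path if with_full_path else node.name
        lines.append(f"{' ' * ((depth - offset) * tabsize)}{label}")
        if with_group_variables:
            # Data variables are indented by depth * tabsize, which is one
            # level deeper than their group when offset is 1.
            for varname in node.ds.data_vars:
                var_label = path / varname if with_full_path else varname
                lines.append(f"{' ' * (depth * tabsize)}{var_label}")
    return "\n".join(lines)


print(repr_datatree(vertebrates, with_full_path=False, with_group_variables=False))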
Vertebrae
    Sharks
    Bony Skeleton
        Ray-finned Fish
        Four Limbs
            Amphibians
            Amniotic Egg
                Hair
                    Primates
                    Rodents & Rabbits
                Two Fenestrae
                    Dinosaurs
                    Birds
print(repr_datatree(vertebrates, with_full_path=False, with_group_variables=True))
Vertebrae
    Sharks
    Bony Skeleton
        Ray-finned Fish
        Four Limbs
            Amphibians
            Amniotic Egg
                Hair
                    Primates
                    Rodents & Rabbits
                        variable_a
                        variable_b
                        variable_c
                Two Fenestrae
                    Dinosaurs
                        variable_a
                        variable_b
                        variable_c
                    Birds
                        variable_a
                        variable_b
                        variable_c
print(repr_datatree(vertebrates, with_full_path=True, with_group_variables=False))
/
    /Sharks
    /Bony Skeleton
        /Bony Skeleton/Ray-finned Fish
        /Bony Skeleton/Four Limbs
            /Bony Skeleton/Four Limbs/Amphibians
            /Bony Skeleton/Four Limbs/Amniotic Egg
                /Bony Skeleton/Four Limbs/Amniotic Egg/Hair
                    /Bony Skeleton/Four Limbs/Amniotic Egg/Hair/Primates
                    /Bony Skeleton/Four Limbs/Amniotic Egg/Hair/Rodents & Rabbits
                /Bony Skeleton/Four Limbs/Amniotic Egg/Two Fenestrae
                    /Bony Skeleton/Four Limbs/Amniotic Egg/Two Fenestrae/Dinosaurs
                    /Bony Skeleton/Four Limbs/Amniotic Egg/Two Fenestrae/Birds
print(repr_datatree(vertebrates, with_full_path=True, with_group_variables=True))
/
    /Sharks
    /Bony Skeleton
        /Bony Skeleton/Ray-finned Fish
        /Bony Skeleton/Four Limbs
            /Bony Skeleton/Four Limbs/Amphibians
            /Bony Skeleton/Four Limbs/Amniotic Egg
                /Bony Skeleton/Four Limbs/Amniotic Egg/Hair
                    /Bony Skeleton/Four Limbs/Amniotic Egg/Hair/Primates
                    /Bony Skeleton/Four Limbs/Amniotic Egg/Hair/Rodents & Rabbits
                        /Bony Skeleton/Four Limbs/Amniotic Egg/Hair/Rodents & Rabbits/variable_a
                        /Bony Skeleton/Four Limbs/Amniotic Egg/Hair/Rodents & Rabbits/variable_b
                        /Bony Skeleton/Four Limbs/Amniotic Egg/Hair/Rodents & Rabbits/variable_c
                /Bony Skeleton/Four Limbs/Amniotic Egg/Two Fenestrae
                    /Bony Skeleton/Four Limbs/Amniotic Egg/Two Fenestrae/Dinosaurs
                        /Bony Skeleton/Four Limbs/Amniotic Egg/Two Fenestrae/Dinosaurs/variable_a
                        /Bony Skeleton/Four Limbs/Amniotic Egg/Two Fenestrae/Dinosaurs/variable_b
                        /Bony Skeleton/Four Limbs/Amniotic Egg/Two Fenestrae/Dinosaurs/variable_c
                    /Bony Skeleton/Four Limbs/Amniotic Egg/Two Fenestrae/Birds
                        /Bony Skeleton/Four Limbs/Amniotic Egg/Two Fenestrae/Birds/variable_a
                        /Bony Skeleton/Four Limbs/Amniotic Egg/Two Fenestrae/Birds/variable_b
                        /Bony Skeleton/Four Limbs/Amniotic Egg/Two Fenestrae/Birds/variable_c
def empty_datatree_structure(xdt: dt.DataTree) -> dt.DataTree:
    # Build an empty DataTree that keeps only the group structure of xdt,
    # similar to what the `tree` command shows for a directory.
    return dt.DataTree.from_dict({k.path: None for k in xdt.leaves})


# loaded_xdt stands for any previously loaded DataTree.
empty_datatree_structure(loaded_xdt)
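
Since the comparison above is to the tree command, here is a rough sketch of what a tree-style text rendering could look like, with ASCII branch markers instead of plain indentation. It takes absolute group paths rather than a DataTree so that it runs with the standard library alone. `render_tree` and its `root_label` parameter are hypothetical names used for illustration and do not exist in xarray or datatree.

```python
from pathlib import PurePosixPath


def render_tree(paths, root_label="/"):
    """Render absolute POSIX-style paths as a `tree`-like listing.

    (Hypothetical helper, not an xarray/datatree API.)
    """
    children = {}  # parent path -> child paths, in first-seen order
    for raw in paths:
        path = PurePosixPath(raw)
        # Visit ancestors from the root down so parents are registered first.
        for node in list(path.parents)[::-1] + [path]:
            children.setdefault(node, [])
            if node != node.parent:  # the root is its own parent
                siblings = children.setdefault(node.parent, [])
                if node not in siblings:
                    siblings.append(node)

    lines = [root_label]

    def walk(parent, prefix):
        kids = children.get(parent, [])
        for i, kid in enumerate(kids):
            last = i == len(kids) - 1
            lines.append(f"{prefix}{'`-- ' if last else '|-- '}{kid.name}")
            # Keep a vertical bar under non-final siblings only.
            walk(kid, prefix + ("    " if last else "|   "))

    walk(PurePosixPath("/"), "")
    return "\n".join(lines)


print(render_tree(["/a/b", "/a/c", "/d"]))
# /
# |-- a
# |   |-- b
# |   `-- c
# `-- d
```

With a DataTree, the same idea could be driven by the paths of `xdt.subtree`, as in repr_datatree above.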

Describe alternatives you've considered

No response

Additional context

Issue content taken from xarray-contrib/datatree#334
