Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Construct graph from a few more input data structures (mimicking networkx) #434

Merged
merged 24 commits into from Sep 28, 2021

Conversation

iosonofabio
Copy link
Member

@iosonofabio iosonofabio commented Sep 7, 2021

Aiming to fix #433 and related forum request, and matching (roughly) networkx's range of input fuctions:

  • Graph.ListDict
  • Graph.DictDict

(we already had:

  • Graph.TupleList
  • Graph.DictList

)

Notes:

  • currently not working for lazy iterators rather than dicts. Would be easy (if a little obscuring) to extend, if you think it'd be useful (can't think of a clear case)
  • poorly documented ATM, but has some tests
  • Graph.ListDict currently takes no edge attributes, Graph.DictDict is better for that purpose

@szhorvat @ntamas do you feel this addresses the issue? If so I'd be happy to improve docs.

TODO:

  • finish docstrings
  • add API docs/user guide
  • extend tests to cover more cases
  • extend functionality (edge attributes,etc)
  • decide what to do with "name" attribute: fixed or not (API break)?

Copy link
Member

@vtraag vtraag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! Looks good to me, I have a small suggestion for using an internal type that slightly simplifies the code, and I'm not sure if we should make the name attribute a choice, but perhaps @ntamas has some comments about it.

We could take a look at all other input functions that networkx supports, it would be nice to also support the other options.

src/igraph/__init__.py Outdated Show resolved Hide resolved
src/igraph/__init__.py Outdated Show resolved Hide resolved
@iosonofabio
Copy link
Member Author

iosonofabio commented Sep 8, 2021

Ok, I moved a bunch of constructor functions into a separate module called graph_constructors.py and added a variation to Adjacency and Weighted_Adjacency that pairs up with networkx's ability to read pandas DataFrame adjacency matrices.

I think we are now covering all the input options they cover.

Docs on arguments for some of the new constructors are still poor, happy to improve in the next few commits.

@iosonofabio iosonofabio changed the title Construct graph from adjacency dict of sequences (e.g. of lists) Construct graph from a few more input data structures (mimicking networkx) Sep 8, 2021
@iosonofabio
Copy link
Member Author

Since I was there I also moved a bunch of constructors (almost all, in fact) into the new module, which reduces the size of __init__.py by ~1,000 lines.

Ideally we would actually have separate modules for say constructing from file versus from live objects, but that seems a little much for this humble PR, so let's plan that for after this one is merged unless you guys have any urgency, in which case we can do it here as well (I can, or you can push here @ntamas).

@ntamas
Copy link
Member

ntamas commented Sep 9, 2021

Since I was there I also moved a bunch of constructors (almost all, in fact) into the new module, which reduces the size of __init__.py by ~1,000 lines.

Yay, that's definitely an improvement, thanks!

Ideally, in the long term (i.e. the duration of the CZI grant) we should be moving towards having many small modules grouped around topics instead of a single monolithic Graph class that tries to cover everything with instance and class methods. Of course that completely breaks the API of the module and I'm not sure yet how to do this nicely while also providing some kind of a backwards compatibility mechanism to ease the transition.

Maybe now that we also have the igraph module on PyPI, we could use that for a new-style Python API that is better designed than what we have now, and keep python-igraph with the old monolithic Graph class instead? That would also save the hassle of trying to keep igraph and python-igraph in sync while people get used to using pip install igraph instead of pip install python-igraph; basically old projects can stick to python-igraph, new projects can use igraph (with a new, improved API), and old projects who eventually bite the bullet and migrate to the new API can make the switch whenever it's convenient for them.

@iosonofabio
Copy link
Member Author

Love that idea! I suggest a two step process:

  1. We split the implementation of all/almost all graph methods into modules, but still link them in so the API didn't change.

  2. We fork that repo and bye bye

@ntamas
Copy link
Member

ntamas commented Sep 9, 2021

We split the implementation of all/almost all graph methods into modules, but still link them in so the API didn't change.

Yes, I think this is doable; I think if you implement a method like

def do_something(graph, ...):
    ...

then you can link it to the Graph class with Graph.do_something = do_something and then it also becomes an instance method of the Graph class; so, we can start stripping methods separately into functions and link them back to the Graph class to keep compaibility.

Still, there are too many parallel things going on in the python-igraph repo at the moment (the integer transition, the plotting backends, plus this one) so I'm going to put this train of thought aside for the moment until we finish the other two (which mostly involves me finally finishing the int conversion by tidying up the types in the C glue code, and then merging the plotting PR).

@iosonofabio
Copy link
Member Author

Don't worry, I'll just chug along against develop now that we have it, and this PR can come after those are done

setup.py Outdated Show resolved Hide resolved
@ntamas
Copy link
Member

ntamas commented Sep 9, 2021

Haven't reviewed the whole PR yet but I've added some comments that are worth considering.

One general problem I have with the new construct_... methods is that it is unclear whether they are meant to be part of the public API. If they are, then having cls as the first argument makes the signature quite awkward; I'd rather have it as a keyword-only argument (now that Python finally has keyword-only arguments) like this:

def construct_graph_somehow(first_arg, second_arg, *, graph_factory=Graph):
    ...

and then simulate the "old" class methods of the Graph class with another decorator:

from functools import wraps

def igraph_classmethod(func):
    @wraps(func)
    def wrapped(cls, *args, **kwds):
        kwds["graph_factory"] = cls
        return func(*args, **kwds)
    return wrapped

The problem with this proposal is that we will very likely introduce circular imports by having Graph as the default argument for graph_factory. A workaround is to let the default value of graph_factory be None and use a deferred import inside the constructor function to replace None with Graph if needed.

@ntamas
Copy link
Member

ntamas commented Sep 9, 2021

Also, when reviewing the PR, I will simply assume that you did not modify the bodies of the functions that you moved out from the Graph class and only added a few additional functions on your own so I won't need to compare the internals of the functions that were moved out with their original versions. Is that correct?

@iosonofabio
Copy link
Member Author

Thanks @ntamas

  1. yes I didn't touch the body of other functions.
  2. Sorry about the linting, sometimes it's too strong a temptation to resist ;-) Yes let's just black-lint everything at the end
  3. I wouldn't add those functions to the API: there should be one preferred way to create graphs and that's currently via class methods of Graph. In fact, I would like them to intentionally NOT be part of the API for that reason (perhaps I could prefix the function names with _ to make sure people don't stumble upon them.

I checked if help(Graph.Adjacency) and other class/instance methods yield the right documentation now, and they do. So I'm not sure L{Adjacency} is a broken reference?

@ntamas
Copy link
Member

ntamas commented Sep 10, 2021

I wouldn't add those functions to the API: there should be one preferred way to create graphs and that's currently via class methods of Graph. In fact, I would like them to intentionally NOT be part of the API

Fair enough, you have a point here; in this case, let's mark these methods with a leading underscore so it is clear that they are internal. I'm afraid that PyDoctor would pick them up and include them in the documentation if we don't do so, and maybe people coming from NetworkX would feel a temptation to import these functions directly instead of relying on the class methods.

@iosonofabio
Copy link
Member Author

I should add export functions for dict of dicts and friends

@iosonofabio
Copy link
Member Author

Ok now we have export functions for each import function in IO, which should cover everything networkx does and then some. This PR is somewhat sensitive since it generates a decent amonut of new API estate, so it'd be great if @vtraag you could have a look

If you guys think it's ok I can add proper docs and then we can merge.

@iosonofabio
Copy link
Member Author

I included PR #408 in this one since they would conflict and it's a closely related task. Closing the other one.

@iosonofabio iosonofabio marked this pull request as ready for review September 17, 2021 08:06
@iosonofabio
Copy link
Member Author

Extended tests a little including some trivial cases, added docs and docstrings, and some flexibility as of the "name" attribute, should be ready for merge

Copy link
Member

@vtraag vtraag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to see all these input/output structure being supported, that makes it a lot easier to work with various existing data structures. Nice job!

Also very good to see so many functions from __init__.py being separated out into smaller files, that makes it a lot easier to navigate. I did not review the adjacency, file, images and libraries related stuff, assuming that's still the same as before, but just moved.

I only have some smaller comments for a couple of things. One question is perhaps worth discussing more generally: the name of SequenceDict instead of ListDict. It seems to be a bit inconsistent with the DictList, which can in principle also be a sequence. Not sure what your considerations were here? Also curious to hear what others think.

doc/source/generation.rst Outdated Show resolved Hide resolved
doc/source/generation.rst Show resolved Hide resolved
src/igraph/utils.py Outdated Show resolved Hide resolved
result is likely to be fit as long as it's iterable and yields
dict-like objects with every iteration.

@param vertices: the data source for the vertices or C{None} if
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What "data source" should this be? This can simply be any iterable that provides names for vertices? Perhaps then we should make explicit that it's an iterable?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From later description, it seems as if this is a list of dicts as well?

src/igraph/io/objects.py Outdated Show resolved Hide resolved
src/igraph/io/objects.py Show resolved Hide resolved
src/igraph/io/objects.py Outdated Show resolved Hide resolved
src/igraph/io/objects.py Show resolved Hide resolved
src/igraph/io/objects.py Outdated Show resolved Hide resolved
src/igraph/io/objects.py Outdated Show resolved Hide resolved
@iosonofabio
Copy link
Member Author

Thanks Vincent, I've implemented most suggestions except API breaking changes. For those, I'd say if @ntamas is fine merging this PR then I can make separate ones breaking the API, to keep things tidy.

@ntamas ntamas merged commit 7b9267f into igraph:develop Sep 28, 2021
@ntamas
Copy link
Member

ntamas commented Sep 28, 2021

Thanks @iosonofabio ! Can you also update the CHANGELOG.md file in the root of the repo? It's okay if you just commit the changes to the develop branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants