Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add slice to __getitem__ for Node(Data)View #4086

Closed
wants to merge 1 commit into from

Conversation

MridulS
Copy link
Member

@MridulS MridulS commented Jul 17, 2020

@dschult I started playing around with returning slice lists when the user explicitly asks for it. I have done this for Node(Data)View right now, would love to get some feedback and thoughts about this design before going further with this.

The following code will be valid if we have slices in Node(Data)View.

In [13]: G.nodes[0:5]                                                                                                             
Out[13]: [8433035090, 6272075181, 1180114949, 3701032566, 4290388111]

In [14]: G.nodes(data=True)[0:5]                                                                                                  
Out[14]: 
[(8433035090, {'w': 2716738093, 't': '6076217445'}),
 (6272075181, {'w': 7046427529, 't': '891215421'}),
 (1180114949, {'w': 9901849457, 't': '4451045765'}),
 (3701032566, {'w': 2467916768, 't': '3668632860'}),
 (4290388111, {'w': 505061655, 't': '5801405547'})]

In [15]: G.nodes(data='w')[0:5]                                                                                                   
Out[15]: 
[(8433035090, 2716738093),
 (6272075181, 7046427529),
 (1180114949, 9901849457),
 (3701032566, 2467916768),
 (4290388111, 505061655)]

@MridulS MridulS marked this pull request as draft July 17, 2020 18:26
@dschult
Copy link
Member

dschult commented Jul 18, 2020

This look good to me.
It's a little strange to have G.nodes[4] return an attribute dict while G.nodes[4:6] returns a list of nodes. Can we explain that in a succinct way to users?

@rossbar
Copy link
Contributor

rossbar commented Jul 21, 2020

This certainly seems like a usability improvement! My knee-jerk reaction was that this might be a pretty significant API change (I'm really not familiar with the current *View APIs) in the sense that it could have implications for things like e.g. backwards compatibility. Maybe a change like this is a nice candidate for a design document (NXEP)? Maybe that would be overkill - I'm not sure at all :)

@jarrodmillman
Copy link
Member

jarrodmillman commented Jul 21, 2020

I started writing this before @rossbar commented, so this is redundant. I am going to leave here just for the record.

@rossbar Would you have time to look at this in the next couple of days? I have some general concerns about introducing new "convenience-oriented" API, but don't have specific thoughts about this PR yet. I believe @MridulS mentioned recently that this is a frequently requested from new users. (I may be misremembering, so please correct me if so.) I don't know if it is appropriate for this change, but maybe we should consider this for our first (new feature/API) NXEP:

If it seems like a good idea to create a NXEP for this, maybe you can work with @MridulS to draft it. Mridul will have much more experience with the code base and working with new users than you, but you may have more experience with NXEPs (since I basically copied the NEP process).

It would be good for us to gain experience with NXEPs. Hopefully, we will have several as we move closer to NX 3.0 and it would be good for us to see if there is anything we need to improve about the process sooner rather than later. It will also be helpful to have a couple of examples for future contributors to look at.

@jarrodmillman
Copy link
Member

@MridulS It looks like @rossbar and I are both wondering whether it would make sense to create a NXEP for this. Among other things, that would allow us to record use cases and would serve as useful design documentation going forward. What are your thoughts?

@MridulS
Copy link
Member Author

MridulS commented Jul 21, 2020 via email

@jarrodmillman
Copy link
Member

The NXEP process is very similar to the PEP process and should be near identical to the NEP and SKIP process. Since it looks like you may be the first person to try this out for NX, please don't hesitate to ask questions about the format and process. If things are too cumbersome or don't make sense, we should use this opportunity to refine the NXEP process.

@MridulS
Copy link
Member Author

MridulS commented Jul 23, 2020

It's a little strange to have G.nodes[4] return an attribute dict while G.nodes[4:6] returns a list of nodes. Can we explain that in a succinct way to users?

I agree with this, and this will be even stranger for G.edges as G.edges[0, 1] will be an attr dict while G.edges[0:1] will be a list.

While working on the NXEP I had a wild-ish idea of copying pandas .head() feature as an alternate implementation of this. We could have something like G.nodes.head() and G.edges.head() which prints out the first x nodes/edges by default. One of main uses of writing down list(G.edges(data=True))[0:10] for users (IMO) is to declutter the screen (especially inside a jupyter notebook) when they have large graphs and G.edges will print out some 1,000 edges. Thoughts?

@dschult
Copy link
Member

dschult commented Jul 23, 2020

G.nodes.head(n) gives the first n nodes. This separates the dict-like lookup from the list-like slicing. Also, "tail" could give the last n nodes.

But, maybe we should call it slice to allow more generality:

G.nodes.slice(10)
G.nodes.slice(1:2:20)
G.nodes.slice(-10:)

Base automatically changed from master to main March 4, 2021 18:20
@dschult
Copy link
Member

dschult commented Mar 25, 2021

I occurs to me that users probably don't want to process "the first 10 edges".
They probably want to see them. We could provide for that use case by adding a __str__ method that produced something like the __repr__ only stopping after 5 edges (or some other number).

I think this fixes the primary motivating use-case -- are there other compelling use-cases?

@jarrodmillman
Copy link
Member

jarrodmillman commented Mar 26, 2021

I like Dan's suggestion about adding a __str__ method. The more I think about G.nodes[4:6], the less I like it. It allows new users who don't know about Python data structures to get a response, but it makes everything more confusing and will require us to carefully explain what is happening. I also am leaning against adding a slice method.

It doesn't save any characters:

G.nodes.slice(10)
list(G.nodes)[10]

And converting an iterable to a list using list() is standard Python that users should learn if they are using NetworkX.

@rossbar
Copy link
Contributor

rossbar commented Mar 26, 2021

One thing about a __str__ method is that it's not parametrized, so users would be stuck with whatever number of elements is set in the method, or we'd have to provide an external way of setting it via e.g. something like np.printoptions or an rc file. IMO these latter options seem like more trouble than they're worth. I'm not sure

>>> with nx.printoptions(num_elem=5):
...     G.nodes()

is more convenient than

>>> list(G.nodes())[:5]

and is certainly less flexible in any case (what if a user wanted every other node?).

@MridulS MridulS closed this Feb 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

4 participants