Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] Refactor creation of network 'feeds' (to allow new mechanism for adding network drives) #191

Merged
merged 9 commits into from Nov 16, 2020

Conversation

cjayb
Copy link
Collaborator

@cjayb cjayb commented Sep 26, 2020

Based on the work done in #177, continue creating new API for adding (and activating/simulating) external drives of the network (currently 'feeds'). Some ideas are discussed in a separate Doc (contact @cjayb for access). This is going to be a test-breaking PR due to very idiosyncratic handling of random seeds used in HNN-GUI feed assignment.

Edit 6 Nov:
Discussed GID-handling with @jasmainak and figured much could be won by disentangling it from NetworkBuilder. As it stands, Network creates the gid_dict on initialisation, so all that's really left for Builder to do is assign them to the different nodes (ranks).

Aims

  • clarify GID creation & handling
  • move all ExtFeed calls into Network, allowing instantiation of all external drives before NetworkBuilder is called (check old "API" still works)
  • move parallel_backends.py:_clone_and_simulate:L34-35 to NetworkBuilder in order to "provide jitter between trials" (prng_seedcore needs to be different)
    • turned out not to be needed
  • generally refactor Network and NetworkBuilder so that building a new mechanism easier (separate PR)

For future PR

  • demonstrate API ideas to allow incrementally adding drives
  • create new tests of Network event times
  • create new visualisations of Network event times
  • discuss and settle on seed handling
  • probably more...

Closes #104

@cjayb
Copy link
Collaborator Author

cjayb commented Sep 26, 2020

Just getting off to a start to make @jasmainak happy ;) Your last MAINT to #177 really helped clear it up for me, thanks! Hope to push some API ideas in soon. Ping @rythorpe

@codecov-commenter
Copy link

codecov-commenter commented Sep 26, 2020

Codecov Report

Merging #191 into master will increase coverage by 0.01%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #191      +/-   ##
==========================================
+ Coverage   67.80%   67.82%   +0.01%     
==========================================
  Files          19       19              
  Lines        2022     2023       +1     
==========================================
+ Hits         1371     1372       +1     
  Misses        651      651              
Impacted Files Coverage Δ
hnn_core/network.py 62.25% <100.00%> (+0.25%) ⬆️
hnn_core/network_builder.py 92.94% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a1cc05d...1895e51. Read the comment docs.

Comment on lines 258 to 321
def _update_gid_dict(self):
"""Creates gid dict from scratch every time called.

Any method that adds real or artificial cells to the network must
call this to update the list of GIDs. Note that it's based on the
content of pos_dict and the lists of real and artificial cell names.
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great idea @cjayb and will allow the user to more flexibly create or destroy cells or feeds! Since Network.gid_dict and Network.pos_dict are so intimately tied, would it make sense to combine their assignment under a unified method?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify, this will only add or remove keys, not modify the values, right? E.g., if I have L2_Pyramidal: range(0, 100) and L5_Pyramidal: range(100, 200), do you imagine each of the key-value pairs to be updated when new cells are added?

Comment on lines 298 to 307
n_times = np.arange(0., self.net.params['tstop'],
self.net.params['dt']).size + 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In support of this change, I can't think of any reason why n_times needs to be an attribute of Network. @blakecaldwell are there any GUI-related considerations we're missing here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, it doesn't seem necessary. Thanks for notifying me @rythorpe.

@jasmainak
Copy link
Collaborator

@cjayb do you want to pair program sometime for this?

and by pair program, I mean really code and not discuss ;-)

@cjayb
Copy link
Collaborator Author

cjayb commented Nov 1, 2020 via email

@cjayb
Copy link
Collaborator Author

cjayb commented Nov 4, 2020

@jasmainak as a discussion-starter for tomorrow. Sorry about the git-hell, will prob. have to ditch this and start from scratch anyway! Bottom line: a lot harder to reach the type of API we were discussing. Some ideas here (like a curry'ed "generating function"), but looking forward to your insight :)

@cjayb cjayb force-pushed the network_drive_pr branch 2 times, most recently from db357a6 to a496eb5 Compare November 6, 2020 09:14
@cjayb
Copy link
Collaborator Author

cjayb commented Nov 6, 2020

As the commit says: WIP So no need to review just yet ;) There's a lot of cruft I need to weed out, but basically very confident that ExtFeed is now out of NetworkBuilder for good. I'll give a shout when I think it might make sense to take a closer peek.

@codecov-io
Copy link

codecov-io commented Nov 6, 2020

Codecov Report

Merging #191 (7d16670) into master (91f7507) will increase coverage by 1.62%.
The diff coverage is 86.81%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #191      +/-   ##
==========================================
+ Coverage   66.85%   68.48%   +1.62%     
==========================================
  Files          21       21              
  Lines        2136     2278     +142     
==========================================
+ Hits         1428     1560     +132     
- Misses        708      718      +10     
Impacted Files Coverage Δ
hnn_core/dipole.py 55.17% <0.00%> (-0.65%) ⬇️
hnn_core/mpi_child.py 0.00% <0.00%> (ø)
hnn_core/tests/test_feed.py 100.00% <ø> (ø)
hnn_core/parallel_backends.py 16.20% <40.00%> (-0.62%) ⬇️
hnn_core/cell.py 92.02% <72.41%> (-2.63%) ⬇️
hnn_core/pyramidal.py 98.25% <72.72%> (-1.75%) ⬇️
hnn_core/network.py 53.46% <79.03%> (+6.99%) ⬆️
hnn_core/basket.py 100.00% <100.00%> (ø)
hnn_core/network_builder.py 93.51% <100.00%> (+0.63%) ⬆️
hnn_core/tests/test_cell.py 98.24% <100.00%> (+2.41%) ⬆️
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 91f7507...7d16670. Read the comment docs.

@cjayb
Copy link
Collaborator Author

cjayb commented Nov 6, 2020

Awh crap, MPI is not happy about my changes. Will try to backtrack order changes in NetworkBuilder.
Edit : figured it out, problem with the way I loop over GIDs in NetworkBuilder. No worries, push imminent.

# XXX rename gid_dict in future
self.gid_dict = dict()

# When computing the network dynamics in parallel, the nodes of the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought "node" meant the different machines that you distribute your jobs to.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well I guess there are plenty of inconsistencies in the variable naming, as there are in the comments. Not sure how to address that, other than keep on cleaning up? What would you call the cells of the network? "Cells", I suppose, though in terms of a network, they are just the "nodes" of a graph.

hnn_core/network.py Outdated Show resolved Hide resolved
hnn_core/network.py Outdated Show resolved Hide resolved
hnn_core/network.py Outdated Show resolved Hide resolved
return self._assigned_gid

@gid.setter
def gid(self, gid):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a realistic usecase for being able to update cell gids? Otherwise, I would just hide the functionality. The less you expose, the better

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well this is a little stronger than hiding, no? I'm not sure the Cell gid ever needs to be updated, certainly not by users! Once it gets set, it should raise all hell if anything tries to change it. However, it would be nice to be able to get the value using cell.gid, hence the @property. I know this complicates the _Cell class a little, but may make reading downstream code clearer (see network_builder:L431-440)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I'm not sure why we need 2 gid attributes here (i.e., _assigned_gid and gid). Why can't we just have one attribute named _ArtificialCell._gid that is set to the arg gid using your decorated setter function, and is retrieved by your decorated property function? Unless I'm missing something nuanced in the logic, such a change would simplify the gid code in this class as well as subsequent cell classes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, maybe a little convoluted there yes. I just pushing something now, which I instantly regretted :( Ignore first push, my proposal will be the second one! To answer your question: I'm proposing to protect .gid because we don't want to risk anything messing with a Cell'a gid once building has begun. Perhaps there's a better way of achieving this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I'm imagining is something like

def __init__(self, event_times, threshold, gid=None):
   ...
   self.gid = gid

@gid.setter
def gid(self, gid)
   if self.gid is None:
      self.gid = gid
   else:
      raise RuntimeError('Global ID for this cell already assigned!')

Copy link
Collaborator Author

@cjayb cjayb Nov 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here with the browser refresh ;) I'm trying your proposal, but test_cell hangs? I would be surprised if you could do self.gid = gid, isn't that trying to call the 'getter' and setter at the same time?! I've seen the _-prefix in cases like this elsewhere, thought it was a 'pattern'...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct me if I'm wrong, but the only consideration here is that the gid of a given cell should only be assigned to a value other than None once.

That's my goal, yes.

Copy link
Collaborator

@jasmainak jasmainak Nov 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let me shed some light here. You can use private attributes (starting with underscore) for stuff that you don't want the user to mess with. For example if the object is complex and updating one attribute actually affects the rest of the object. For instance if the object had an attribute called sfreq (sampling frequency) and another attribute called times, you would have to update both simultaneously. This is why it sometimes makes sense to protect certain attributes and update them only through special functions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here with the browser refresh ;) I'm trying your proposal, but test_cell hangs? I would be surprised if you could do self.gid = gid, isn't that trying to call the 'getter' and setter at the same time?!

Shoot, you're right. I think your implementation in 7d16670 is our best bet.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can always be changed to something smoother later.

@jasmainak
Copy link
Collaborator

@cjayb most of my comments are nitpicky about variable names.

I think the next step is to write tests for the new methods. This way you'll know --> a) they work as you expect b) they are modular enough to be part of tests

}

self._gid_assign()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think _gid_assign should happen after _create_cells_and_feeds, no?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To my surprise too: No! Turns out L398 is the key. The for-loop there must be on self.net._gid_list, which contains precisely those cells (gids) that exist on this rank/host. So the _gid_assign (note the round-robin for-loop) essentially 'picks out' the gids that belong to the present rank, after which create_... instantiates them here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if cells are created after the GIDs are assigned to different nodes, how will we allow users to add feeds? User code should not depend on how the GID assignment is done ...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They can't. But that should be OK, no? We can create pretty visualisations of the input to a net before it ever gets built. So seeing the effects of adding and removing feeds (histograms etc.) can be done interactively, _gid_assign is not run until Builder. Note that the reliance on self.net._p_unique is something that the future implementation could make prettier, but right now it works as it is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify, @cjayb it sounds like what you're saying is that all aspects of the feed/network that we intend the user to interact with should be defined prior to NetworkBuilder? If so, this makes sense since the nrn interface should be invisible to the user.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. That wasn't really my goal but it mostly just became so when extracting the feed instantiation into Network. Now I think it's a good goal though, as you say ;)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that I'm not implying that GID handling is "solved" quite yet. Baby steps... One of them being the protected cell attribute.

}

self._create_cells_and_feeds()

self._gid_assign()
Copy link
Collaborator

@jasmainak jasmainak Nov 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have been thinking a bit about what is inside _gid_assign. I think ideally NetworkBuilder should take rank in the __init__ that is passed from mpi_child.py and then _clone_and_simulate. Then inside the __init__ you do:

self.rank = rank

and use self.rank inside _gid_assign instead of _get_rank. This will make it clear that NetworkBuilder is actually on a particular node. That way you will also limit nested calls to Neuron (via _get_rank) which we know can behave somewhat arbitrarily. Probably for another PR though

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it, will keep in mind, as I'm pretty sure it'll belong to this PR.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't hard, just added rank and n_hosts as attributes to NetworkBuilder objects.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take it back: something exploded so I rolled it back. Separate PR!

@cjayb
Copy link
Collaborator Author

cjayb commented Nov 9, 2020

I think the next step is to write tests for the new methods. This way you'll know --> a) they work as you expect b) they are modular enough to be part of tests

Thanks for the comments so far. I'm very much in favour of writing tests, but wonder if it might make sense to discuss where to go from here first?

This PR is already substantial, and it's a major step forward: no more event_times created in NetworkBuilder. NB now takes care of GID assignment, cell creation & connection etc. All 'feeds' are created in Network, so they can be e.g. visualised prior to building & solving the network itself.

How about I take a break here & add some tests to test_network on expected event_times when using the default/testing parameters? But kinda leave it at that? The reason is that I have no intention of leaving network.py in the sorry state it is in! I want to implement a totally new API for getting the event_times instantiated, so making tests for the current bloody mess seems like an effort misplaced?

My proposal is: change the scope of this PR to the present (plus a few more tests). Then merge & move the remaining aims to a new branch. WDYT?

@cjayb
Copy link
Collaborator Author

cjayb commented Nov 10, 2020

Um so the rebase was a mess:
https://github.com/cjayb/hnn-core/pull/new/network_drive_pr_rebase
OK to force-push here? I guess the review history for this PR will be lost?

Obviously, I think my changes to cell and pyramidal are improvements, such as being explicit about override_params in __init__ etc. Note that with the last commit (34dbddb), _create_cells_and_feeds and _gids_to_parallel_context no longer rely on params!

@jasmainak
Copy link
Collaborator

jasmainak commented Nov 10, 2020 via email

hnn_core/network.py Outdated Show resolved Hide resolved
hnn_core/network.py Outdated Show resolved Hide resolved
@jasmainak
Copy link
Collaborator

@cjayb looks like you have rebase conflicts. Could you fix them? I'm fetching your PR and looking at it in the meanwhile.

@@ -42,6 +42,7 @@ def simulate_dipole(net, n_trials=None):

if n_trials is not None:
net.params['N_trials'] = n_trials
net._instantiate_feeds() # need to redo these if n_trials changed!
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ughgh ... how will this work when you instantiate the feeds outside of simulate_dipole ? do you have any ideas?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checked: n_trials is only an argument to simulate_dipole. All other parts of code check net.params['N_trials'] to figure out the same. So there are no other places to worry about wrt. instantiation of event_times.
I know this isn't pretty. I've made n_trials an explicit argument of _instantiate_feeds to make it clearer there's a dependency there.

hnn_core/network.py Outdated Show resolved Hide resolved
Comment on lines +268 to +282
prng_seedcore_initial = cur_params['prng_*'].copy()
for param_key in prng_seedcore_initial.keys():
cur_params[param_key] =\
prng_seedcore_initial[param_key] + trial_idx
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should rethink this approach now that a lot of feed.py code is simplified in #204 and we begin to understand what that code is doing. I tagged you in #113

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not merge #204 first, then rebase and fix this (more invasive) PR on that? I also predict #113 will then just melt away :)

if src_type in ('L2_pyramidal', 'L5_pyramidal'):
cell = Cell(gid, src_pos, params)
PyramidalCell = type2class[src_type]
# XXX Why doesn't a _Cell have a .threshold? Would make a
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure ... separate PR :)

@jasmainak
Copy link
Collaborator

PR looks fine to me, happy to merge once CIs come green. @rythorpe do you want to take a look?

Squash of several iterations
- moved ExtFeed calls to Network
- made GID an optional property of real and artificial cells
- reordered and refactored NetworkBuilder methods

More cleanup to come...

creation of cells and PC creation separated

minor cleanup

a _Cell works fine without a gid

SERIAL creation of cells and feeds possible (still ugly)

remove superfl. gid in ArtificialCells

moved calls to ExtFeed into Network

re-introduce gid as protected Cell attribute

added tests for Cell and _ArtCell gid

cleanup and rearrange NB methods

minor tweak hoping MPI will be placated

testing GID assignment OK

MPI working, test prob fail due to seed assignment

added trial_idx information to NetworkBuilder

gotcha when overriding params['N_trials'] at runtime

added event_time tests and some cleanups

no need to pass params dict for cell creation

rebase and fix pyramidal.py

RENAMED gid_dict to gid_ranges

ranaming, refactoring and cleanup

rebase and flake8
@cjayb
Copy link
Collaborator Author

cjayb commented Nov 16, 2020

Had to squash to make rebasing feasible: #190 really made it hard!
The last commit (da134c7) is quite invasive, so maybe @ntolley would like to take a look? Basically I don't see why times should be an attribute of a Network? I've moved it into Spikes, which seems to me to make more sense (though could be wrong?).
I feel quite strongly though about times being the same for all n_trials! I've made an in-line comment about it in parallel_backends.py, please take a look.

@jasmainak I still think #204 could be merged before this, but I don't feel like another rebase on this one, so hope we could move fast?

net.spikes._vsoma.append(spikedata[3])
net.spikes._times.append(net.times.tolist())
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason different trials should be allowed to have different times?!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing that the goal here was to maintain consistent formatting between the primary attributes of the Spikes class, but you're right that this is probably unnecessary. Any considerations we're missing @ntolley?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consistency was the main motivation, the way it is written should not allow for different times across trials. times is actually created from directly from params when the network is instantiated, this function is really just copying the data.

Not really a big deal to change and probably cleaner this way. Most important thing is for the examples to demonstrate how to use the variable in plotting.

@jasmainak
Copy link
Collaborator

yes, let's move fast. But another set of eyes other than me would be good. @ntolley or @rythorpe perhaps?

hnn_core/network.py Outdated Show resolved Hide resolved
Copy link
Contributor

@rythorpe rythorpe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I think I now have a decent grasp of this PR. (Sorry for being so absent; as soon as I get this manuscript I've been working on submitted I'll be much more present.) Your method for creating jittered feed times all within Network.instantiate_feeds() such that they can be referenced on a trial-by-trial basis in NetworkBuilder and across parallel backends will be super useful as we move forward. Thanks @cjayb!

@jasmainak jasmainak merged commit fd545cb into jonescompneurolab:master Nov 16, 2020
@jasmainak
Copy link
Collaborator

It's in, thanks @cjayb ! :-) Let's merge #204 next so you can make your next PR on top of it

@cjayb
Copy link
Collaborator Author

cjayb commented Nov 16, 2020

Cool! Thanks both, good to get this out of the way :) @jasmainak what's the best way I can help rebase #204?

@jasmainak
Copy link
Collaborator

jasmainak commented Nov 16, 2020

@cjayb I am working on my rebase and realized that trial_event_times should have dict() as the outermost level. Then inside you can probably have numpy arrays instead of "list of list" ... might be the cleanest.

EDIT: maybe not numpy array but still dict should probably be outermost level

@cjayb
Copy link
Collaborator Author

cjayb commented Nov 16, 2020

A dict with which keys though? I don't have a strong opinion there so feel free to suggest a change in #204!

@jasmainak
Copy link
Collaborator

It should be dict (keys are the src_type) of list (gids) of list (n_trials) of list (n_events) ? I think it will help unify things related to trial jitter etc.

@cjayb cjayb deleted the network_drive_pr branch November 18, 2020 13:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

clean up handling of external 'input' feeds
7 participants