Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix stream allocation scoping #766

Closed

Conversation

komplexon3
Copy link
Collaborator

There was an issue where streams would be allocated globally (to a state) and locally. This should not happen. The expected behaviour is to never allocate streams locally to a scope.

This PR fixes this issue by never including streams in the scope transient analysis.

@codecov
Copy link

codecov bot commented Jun 28, 2021

Codecov Report

Merging #766 (b6e1c96) into master (ea8aeed) will decrease coverage by 21.66%.
The diff coverage is 100.00%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master     #766       +/-   ##
===========================================
- Coverage   80.49%   58.83%   -21.66%     
===========================================
  Files         222      224        +2     
  Lines       39796    40707      +911     
===========================================
- Hits        32031    23949     -8082     
- Misses       7765    16758     +8993     
Impacted Files Coverage Δ
dace/codegen/targets/fpga.py 13.96% <ø> (-55.28%) ⬇️
dace/sdfg/utils.py 66.02% <100.00%> (-17.45%) ⬇️
dace/libraries/pblas/__init__.py 0.00% <0.00%> (-100.00%) ⬇️
dace/libraries/pblas/nodes/__init__.py 0.00% <0.00%> (-100.00%) ⬇️
dace/libraries/pblas/environments/__init__.py 0.00% <0.00%> (-100.00%) ⬇️
dace/libraries/pblas/nodes/pgeadd.py 0.00% <0.00%> (-94.81%) ⬇️
dace/frontend/octave/ast_expression.py 11.39% <0.00%> (-82.67%) ⬇️
dace/transformation/interstate/transient_reuse.py 12.80% <0.00%> (-81.60%) ⬇️
dace/transformation/testing.py 0.00% <0.00%> (-79.66%) ⬇️
dace/codegen/targets/sve/unparse.py 12.24% <0.00%> (-76.87%) ⬇️
... and 147 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ea8aeed...b6e1c96. Read the comment docs.

@definelicht
Copy link
Contributor

definelicht commented Jun 28, 2021

I think the approach is still not the right one. What we want is to make sure that FPGA streams are always allocated in exactly one place, which is at the beginning of kernel generation. My intuition would be that we have to:

  • Make sure that all streams are allocated when generating the FPGA kernel.
  • Never allocate local streams when inside an FPGA kernel.

@komplexon3
Copy link
Collaborator Author

komplexon3 commented Jun 28, 2021

Okay - I found something interesting. The place where the stream allocation "would" be suppressed [here],(

if not isinstance(nodedesc, dt.Stream):
) it is explicitly not suppressed because of something with the Intel FPGA backend. Can someone explain to me why this is?

@definelicht
Copy link
Contributor

Okay - I found something interesting. The place where the stream allocation "would" be suppressed [here],(

if not isinstance(nodedesc, dt.Stream):

) it is explicitly not suppressed because of something with the Intel FPGA backend. Can someone explain to me why this is?

Honestly I already find it a bit weird that we are suppressing these allocations here. Have you checked the callsite of where allocate_array is being called from in the case that is causing problems for you? Could you post the call stack here?

@tbennun
Copy link
Collaborator

tbennun commented Jun 28, 2021

I think the approach is still not the right one. What we want is to make sure that FPGA streams are always allocated in exactly one place, which is at the beginning of kernel generation. My intuition would be that we have to:

  • Make sure that all streams are allocated when generating the FPGA kernel.
  • Never allocate local streams when inside an FPGA kernel.

exactly this. This PR should be changed to focus on making sure the FPGA codegen allocates all streams in one place.

@komplexon3
Copy link
Collaborator Author

komplexon3 commented Jun 28, 2021

@definelicht This would be the stack trace (made it throw an exception if an already defined stream reaches this bit of code).

It is called by dispatch_allocate. That's the same code allocation "top-level local" data.

Traceback (most recent call last):
  File "tests/stream_allocation_scope_test.py", line 98, in <module>
    test_stream_allocation_scope()
  File "tests/stream_allocation_scope_test.py", line 91, in test_stream_allocation_scope
    assert sdfg.generate_code()[2].clean_code.count("dace::SetNames(AtoB") == 1
  File "/home/widmmarc/k3-dace/dace/sdfg/sdfg.py", line 2249, in generate_code
    program_code = codegen.generate_code(sdfg)
  File "/home/widmmarc/k3-dace/dace/codegen/codegen.py", line 178, in generate_code
    used_environments) = frame.generate_code(sdfg, None)
  File "/home/widmmarc/k3-dace/dace/codegen/targets/framecode.py", line 552, in generate_code
    states_generated = self.generate_states(sdfg, global_stream,
  File "/home/widmmarc/k3-dace/dace/codegen/targets/framecode.py", line 463, in generate_states
    cft.as_cpp(self.dispatcher.defined_vars, sdfg.symbols), sdfg)
  File "/home/widmmarc/k3-dace/dace/codegen/control_flow.py", line 195, in as_cpp
    expr += elem.as_cpp(defined_vars, symbols)
  File "/home/widmmarc/k3-dace/dace/codegen/control_flow.py", line 126, in as_cpp
    expr += self.dispatch_state(self.state)
  File "/home/widmmarc/k3-dace/dace/codegen/targets/framecode.py", line 440, in dispatch_state
    self._dispatcher.dispatch_state(sdfg, state, global_stream, stream)
  File "/home/widmmarc/k3-dace/dace/codegen/dispatcher.py", line 310, in dispatch_state
    satisfied_dispatchers[0].generate_state(sdfg, state,
  File "/home/widmmarc/k3-dace/dace/codegen/targets/fpga.py", line 211, in generate_state
    self.generate_kernel(sdfg, state, kernel_name, subgraphs,
  File "/home/widmmarc/k3-dace/dace/codegen/targets/fpga.py", line 1399, in generate_kernel
    self.generate_kernel_internal(sdfg, state, kernel_name, subgraphs,
  File "/home/widmmarc/k3-dace/dace/codegen/targets/xilinx.py", line 798, in generate_kernel_internal
    self.generate_modules(sdfg, state, kernel_name, subgraphs,
  File "/home/widmmarc/k3-dace/dace/codegen/targets/fpga.py", line 1450, in generate_modules
    self.generate_module(sdfg, state, module_name, subgraph,
  File "/home/widmmarc/k3-dace/dace/codegen/targets/xilinx.py", line 732, in generate_module
    self._dispatcher.dispatch_subgraph(sdfg,
  File "/home/widmmarc/k3-dace/dace/codegen/dispatcher.py", line 351, in dispatch_subgraph
    self.dispatch_scope(v.map.schedule, sdfg, scope_subgraph,
  File "/home/widmmarc/k3-dace/dace/codegen/dispatcher.py", line 403, in dispatch_scope
    self._map_dispatchers[map_schedule].generate_scope(
  File "/home/widmmarc/k3-dace/dace/codegen/targets/fpga.py", line 510, in generate_scope
    self.generate_node(sdfg, dfg_scope, state_id,
  File "/home/widmmarc/k3-dace/dace/codegen/targets/fpga.py", line 1043, in generate_node
    getattr(self, method_name)(sdfg, dfg, state_id, node,
  File "/home/widmmarc/k3-dace/dace/codegen/targets/fpga.py", line 1345, in _generate_MapEntry
    self._dispatcher.dispatch_allocate(sdfg, dfg, state_id, child, None,
  File "/home/widmmarc/k3-dace/dace/codegen/dispatcher.py", line 424, in dispatch_allocate
    dispatcher.allocate_array(sdfg, dfg, state_id, node, function_stream,
  File "/home/widmmarc/k3-dace/dace/codegen/targets/fpga.py", line 543, in allocate_array
    raise Exception

I will also need to ensure that all streams are allocated once (at the top-level). At the moment, only data shared between two or more subgraphs are allocated "top-level locally". (Missed that bit before.) This should be rather straightforward - just look for all streams in the subgraphs and allocate them when it allocates the data shared between subgraphs.

@definelicht
Copy link
Contributor

As a first step, make sure that all these streams are allocated where we want them to be allocated (i.e., at the beginning of generating an FPGA kernel). Then we can see if we need to prevent them from being reallocated later, but we might not need to, because of that code you linked.

@komplexon3
Copy link
Collaborator Author

Now all streams are allocated "top-level locally". The FPGA codegen now allocates data at that level if it shared between two subgraphs (status-quo) or if it is a stream.

There is still the issue with them being declared locally due to that explicit check mentioned before.

if not isinstance(nodedesc, dt.Stream):

@TizianoDeMatteis It was added by this commit. Could you explain to me why this check is needed? I am not familiar with the concept of views in DaCe.

@TizianoDeMatteis
Copy link
Contributor

TizianoDeMatteis commented Jul 7, 2021

@TizianoDeMatteis It was added by this commit. Could you explain to me why this check is needed? I am not familiar with the concept of views in DaCe.

This is not directly related to views. We need to allocate anyway streams to keep track of their names across the SDFGs hierarchy (a stream may have a certain name in the top-level scope, then another one in a nested SDFG and so on...)

@tbennun tbennun closed this Dec 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants