Adjust VPR flags to reduce runtime #1735

Merged
HackerFoo merged 4 commits into f4pga:master on Nov 13, 2020

Conversation

@HackerFoo (Contributor) commented Oct 30, 2020

With these settings, baselitex on the 50t is 24% faster on pack, place and route, while ibex is 16% faster, with scalable_proc tests running faster as well.
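For illustration, a minimal sketch of how a router knob such as VPR's --astar_fac (the parameter discussed below) is passed on the command line; the architecture and circuit file names are placeholders, and this is not the exact flag set adopted by this PR:

    # Hypothetical sketch: run VPR's router with an explicit A* factor.
    # --astar_fac and --route are real VPR options; arch.xml, top.eblif,
    # and the value 1.8 are placeholders taken from the discussion below.
    import subprocess

    subprocess.run(
        ["vpr", "arch.xml", "top.eblif", "--route", "--astar_fac", "1.8"],
        check=True,
    )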

@litghost (Contributor)

CI is red; it looks like the flags were not deduplicated correctly.

Also, I'd like to see what the runtime / QoR trade-off looked like.

@HackerFoo (Contributor, Author) commented Oct 30, 2020

Comparing the last run to #1726 (old) for baselitex:

old runtime:
  pack:   147.20 seconds (max_rss 3498.1 MiB)
  place:  185.75 seconds (max_rss 3471.0 MiB)
  route:  488.92 seconds (max_rss 3471.1 MiB)

new runtime:
  pack:   152.99 seconds (max_rss 3499.8 MiB)
  place:  175.32 seconds (max_rss 3472.9 MiB)
  route:  293.65 seconds (max_rss 3472.3 MiB)

old CPD:
  sys_clk to sys_clk CPD:       18.2112  ns (54.9112 MHz)
  clk200_clk to clk200_clk CPD:  5.77223 ns (173.243 MHz)

new CPD:
  sys_clk to sys_clk CPD:       18.2289  ns (54.8581 MHz)
  clk200_clk to clk200_clk CPD:  5.66468 ns (176.533 MHz)

@litghost (Contributor)

Also, I'd like to see what the runtime / QoR trade-off looked like.

Let me be more specific. Much like inner num for the placer, A* is a runtime versus quality trade-off, and the lookahead quality determines the sharpness of that trade-off. I'd like to see how close to the edge we are with an A* of 1.8. Also, at some point as A* gets higher, the router may actually slow down again because it trusts the lookahead too much.

So I'm basically asking for two graphs, with A* on the x-axis of both: runtime on the y-axis of one, and CPD on the y-axis of the other. A third possible graph would put iterations to convergence on the y-axis.

How many circuits are you testing with?
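A minimal sketch of the kind of sweep plots being requested here, assuming the per-run A* factor, route time, and CPD have already been collected; the values below are placeholders, not measured data:

    # Hypothetical sketch: runtime and CPD versus the A* factor.
    # All values below are made up for illustration.
    import matplotlib.pyplot as plt

    astar      = [1.0, 1.2, 1.5, 1.8, 2.0]
    route_time = [520, 480, 390, 300, 310]       # seconds (placeholder)
    cpd        = [18.1, 18.2, 18.2, 18.3, 18.6]  # ns (placeholder)

    fig, (ax_rt, ax_cpd) = plt.subplots(1, 2, figsize=(10, 4))
    ax_rt.plot(astar, route_time, marker="o")
    ax_rt.set_xlabel("A* factor")
    ax_rt.set_ylabel("route time (s)")
    ax_cpd.plot(astar, cpd, marker="o")
    ax_cpd.set_xlabel("A* factor")
    ax_cpd.set_ylabel("CPD (ns)")
    fig.tight_layout()
    plt.show()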

@HackerFoo (Contributor, Author) commented Oct 30, 2020

I've analyzed the runtime trade-offs of five different parameters for baselitex and ibex on the Arty. Here is the Colab I've been working from.

@litghost (Contributor) commented Oct 30, 2020

Two circuits are likely too few to be confident that the new parameters are robust. At a minimum, I would add something like scalable_proc so you can increase the fabric usage pressure and make sure the new parameters are not too optimistic.

@litghost (Contributor) commented Oct 30, 2020

Did I miss it, or did you not test A* = 1.2 (the VPR default)? I also recommend at least one run at A* <= 1, as this will approach the best-case QoR (from a router standpoint) and give you a best-QoR / worst-runtime point to compare against the best-runtime / ??? QoR point.

@HackerFoo (Contributor, Author) commented Oct 30, 2020

I did run 1.2, but the data isn't there because I focused the matrix of parameters on what was working well. As you can see from the data above, though, there is little to no impact on CPD.

For scalable_proc:

old:
top_bram_n8: 164.11 seconds (max_rss 747.3 MiB)
top_bram36_n8: 166.70 seconds (max_rss 746.6 MiB)
top_dram_n3: 65.49 seconds (max_rss 706.4 MiB)

new:
top_bram_n8: 95.26 seconds (max_rss 746.9 MiB)
top_bram36_n8: 154.12 seconds (max_rss 746.3 MiB)
top_dram_n3: 48.55 seconds (max_rss 706.7 MiB)

@HackerFoo (Contributor, Author)

sqlite3.OperationalError: database or disk is full

@litghost (Contributor) commented Oct 30, 2020

sqlite3.OperationalError: database or disk is full

We've been seeing this, but it isn't clear why. df -h reports that the working disk is 4 TB, which feels like a network-backed disk. You can examine the logs from #1725 to see this behavior. My best guess is that even though df -h reports plenty of free space, there is an effective limit that is not obvious.

========================================
Disk usage
----------------------------------------
Filesystem      Size  Used Avail Use% Mounted on
udev             52G     0   52G   0% /dev
tmpfs            11G  8.6M   11G   1% /run
/dev/sda1        99G   69G   26G  73% /
tmpfs            52G     0   52G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs            52G     0   52G   0% /sys/fs/cgroup
/dev/sdc1       246G   43G  203G  18% /opt/Xilinx
cgmfs           100K     0  100K   0% /run/cgmanager/fs
/dev/sdb1       4.0T  5.7G  4.0T   1% /tmpfs
tmpfs            11G     0   11G   0% /run/user/1000
----------------------------------------

The working directory is /tmpfs for Kokoro.
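A minimal sketch of one way to probe for such a non-obvious limit, checking inode headroom as well as byte headroom on the working disk (the /tmpfs path is taken from the log above; this is a generic check, not a diagnosis of the actual failure):

    # Check byte and inode headroom; "database or disk is full" can also
    # surface when the filesystem runs out of inodes even though df -h
    # still shows free space.
    import os

    st = os.statvfs("/tmpfs")
    print(f"free space : {st.f_bavail * st.f_frsize / 2**30:.1f} GiB")
    print(f"free inodes: {st.f_favail} of {st.f_files}")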

@acomodi (Contributor) commented Nov 3, 2020

I think that if we get good improvements in runtime at a slight CPD cost, it might be worth adding these flags.

One thing, though, would be to compare the xc7_qor results produced by this PR's Kokoro CI with those produced by current master.

@litghost (Contributor) commented Nov 3, 2020

One thing, though, would be to compare the xc7_qor results produced by this PR's Kokoro CI with those produced by current master.

xc7 QoR looks really good. The arch-defs results show this appears to be a solid point, at least for the circuits in arch-defs.

@HackerFoo can we add curves to the Colab page? I think we should be prepared to show these results this Thursday and see if we can get some insight from Vaughn.

@HackerFoo (Contributor, Author)

I'm running a sweep and also documenting how I do this here.

@HackerFoo (Contributor, Author)

I've added a scatter plot of runtime vs. max frequency for each of ibex, baselitex, and bram-n3.

@HackerFoo (Contributor, Author)

@litghost I've added more detailed instructions to the Colab.

@litghost (Contributor) commented Nov 11, 2020

@litghost I've added more detailed instructions to the Colab.

New instructions look good. Please update the Colab copy in git. The last thing (besides looking at CI results) is to change the references from https://github.com/HackerFoo/nix-symbiflow to https://github.com/Symbiflow/nix-symbiflow. You need to add DCO checks to that repo, and add DCO sign-offs to your commits.


@HackerFoo (Contributor, Author) commented Nov 12, 2020

New instructions look good. Please update the Colab copy in git. The last thing (besides looking at CI results) is to change the references from https://github.com/HackerFoo/nix-symbiflow to https://github.com/Symbiflow/nix-symbiflow. You need to add DCO checks to that repo, and add DCO sign-offs to your commits.

@litghost I've updated the Colab in git, and changed the references to https://github.com/SymbiFlow/nix-symbiflow, which has DCO checks.

I'm re-running the "Xilinx Series 7 - Install (Presubmit)" test, which failed due to an infrastructure failure.

Assuming there are no problems with that, is this PR okay to merge?


@HackerFoo (Contributor, Author)

Runtime is 24% faster for ibex (2% higher CPD) and 10% faster for litex (7% lower CPD).

@litghost (Contributor) commented Nov 12, 2020

Runtime is 24% faster for ibex (2% higher CPD) and 10% faster for litex (7% lower CPD).

So the previous settings from ead80ae were a Pareto improvement on geomean(route_time) and geomean(CPD) over master. That is not the case with the latest settings. I believe the settings from ead80ae were a better point?

@HackerFoo (Contributor, Author)

@litghost Which designs are worse? I can revert the settings to ead80ae.

@litghost (Contributor) commented Nov 13, 2020

ibex and ddr_uart_arty both show significant change, but a wide array of designs are worse at that design point. The mean percentage change is 4% worse, and the geomean CPD is 3% worse. The other point had a geomean CPD change of less than 0.2%. Given that ead80ae gets most of the performance gain with basically no geomean CPD change, it feels like the superior trade.
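For reference, the geomean CPD figure is the geometric mean of per-design CPD ratios (new over old); a minimal sketch with placeholder ratios, not the actual arch-defs results:

    # Hypothetical sketch: geometric mean of per-design CPD ratios.
    # The ratios are placeholders, not measured data.
    import math

    cpd_ratios = [1.04, 0.99, 1.06, 1.02]  # new CPD / old CPD per design
    geomean = math.exp(sum(math.log(r) for r in cpd_ratios) / len(cpd_ratios))
    print(f"geomean CPD change: {(geomean - 1) * 100:+.1f}%")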

@HackerFoo (Contributor, Author)

@litghost Okay, I've reverted the settings. Anything else before I merge this?

@litghost (Contributor)

@litghost Okay, I've reverted the settings. Anything else before I merge this?

Just waiting for green. Please rebase on master to grab the other fixes.

@litghost (Contributor) left a comment

LGTM, merge once green. Recommend rebasing on master.

Signed-off-by: Dusty DeWeese <dustin.deweese@gmail.com>
@HackerFoo (Contributor, Author)

Vendor tool tests are failing due to unrelated compilation errors.

@HackerFoo merged commit b175e3a into f4pga:master on Nov 13, 2020
@@ -1,6 +1,6 @@
 cairosvg
 gitpython
-hilbertcurve
+hilbertcurve==1.0.5
@litghost (Contributor) commented Nov 13, 2020

@HackerFoo Please file an issue with upstream hilbertcurve, and create an issue (and PR to add a TODO comment here) to remove the pin once upstream hilbertcurve is fixed.

@HackerFoo (Contributor, Author) commented Nov 13, 2020

I don't think the issue is upstream. The API changed, which is okay for a major version change (2.x).
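For context, a hedged sketch of the kind of rename involved; the 1.x names are recalled from hilbertcurve 1.0.5 and the 2.x names are assumed, so the upstream changelog should be checked before relying on them:

    from hilbertcurve.hilbertcurve import HilbertCurve

    hc = HilbertCurve(8, 2)  # 8 iterations, 2 dimensions

    # hilbertcurve 1.0.5 (the version pinned in this diff):
    # d  = hc.distance_from_coordinates([5, 10])
    # xy = hc.coordinates_from_distance(d)

    # hilbertcurve 2.x (assumed renamed equivalents):
    d = hc.distance_from_point([5, 10])
    xy = hc.point_from_distance(d)
    print(d, xy)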

@litghost (Contributor)

The PR comment in galtay/hilbertcurve#25 (comment) indicated that that specific PR was supposed to be backwards compatible. It is unclear whether the API was supposed to change here, or if it was an accident.

@litghost (Contributor)

My comment was basically to post an issue upstream showing that, from 1.0.5 to 2.0.x, this API no longer existed/worked, and to determine whether that break was intentional.

@litghost (Contributor)

Although the notes from 2.0.3 do say "New API", which is foreboding.

Regardless, we should likely create an issue to remove the pin in the future, to avoid ending up with a very stale dependency. This isn't equivalent, but a numpy pin on prjxray eventually resulted in the pip install for numpy requiring a build from source instead of using pip's binary caches. Ideally we'd like to avoid something like that.

@HackerFoo (Contributor, Author)

I propose removing the dependency and using VPR's RR node reordering option: #1773

@litghost (Contributor)

Sure. Can you please open a PR to that effect?

@HackerFoo (Contributor, Author)

Yeah, I'll assign the issue to myself.

litghost added a commit to litghost/symbiflow-arch-defs that referenced this pull request on Nov 16, 2020
…time"

This reverts commit b175e3a, reversing
changes made to f71a554.

Signed-off-by: Keith Rothman <537074+litghost@users.noreply.github.com>