[feature] [BIT 578] speed up metagraph storage query #933

Closed
wants to merge 44 commits

Conversation

@camfairchild (Collaborator) commented Sep 29, 2022:

This PR speeds up the storage call made by subtensor.neurons in the subtensor.use_neurons_fast function.

This feature works by bundling a nodejs binary with the polkadotjs API.
This binary is a CLI that implements the sync_and_save --filename <default:~/.bittensor/metagraph.json> --block_hash <default:latest> command.
This syncs the metagraph at the given block hash and saves it to a JSON file.
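For illustration, here is a hedged sketch of how Python code might shell out to this CLI. The subcommand and flag spellings come from the description above; the binary name ("subtensor-node-api", after the source repo) and the helper names are assumptions, not part of this PR:

```python
# Sketch only: invoking the bundled node CLI from Python.
# The "subtensor-node-api" binary name is an assumption; the
# sync_and_save subcommand and flags follow the PR description.
import subprocess
from pathlib import Path

def build_sync_command(binary="subtensor-node-api",
                       filename=str(Path.home() / ".bittensor" / "metagraph.json"),
                       block_hash="latest"):
    """Assemble the argv for a sync_and_save call."""
    return [binary, "sync_and_save",
            "--filename", filename,
            "--block_hash", block_hash]

def sync_metagraph(**kwargs):
    # check=True surfaces a non-zero exit from the binary as CalledProcessError
    subprocess.run(build_sync_command(**kwargs), check=True)
```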

The speed-up is quite significant; below is a test run of the manual sync without the fix, with the IPFS cache, and with the fix.

[image: output]

And below is the IPFS cache sync versus the manual sync (with the fix):

[image: output_cach_vs_fixed_manual]

A pro of this is that it removes the need for a centralized IPFS cache of the metagraph.

A downside of this fix is that the binaries with nodejs bundled are ~50 MB each (one for Linux, one for macOS).
There is currently no binary for Windows, but I'm not certain one should be included anyway, as we only support Linux/macOS.

Another pro of this fix is that it works on both nobunaga and nakamoto, and can be adapted to any network.
It also leaves room for adding other large substrate queries and for working further with the polkadot js API.

@camfairchild camfairchild requested review from unconst, Eugene-hu and shibshib and removed request for Eugene-hu October 3, 2022 15:21
@camfairchild camfairchild marked this pull request as ready for review October 3, 2022 15:21
setup.py — outdated review comment, resolved
Comment on lines 1554 to 1580
"""
We expect a JSON array of:
{
"uid": int,
"ip": str,
"ip_type": int,
"port": int,
"stake": str(int),
"rank": str(int),
"emission": str(int),
"incentive": str(int),
"consensus": str(int),
"trust": str(int),
"dividends": str(int),
"modality": int,
"last_update": str(int),
"version": int,
"priority": str(int),
"last_update": int,
"weights": [
[int, int],
],
"bonds": [
[int, str(int)],
],
}
"""
Contributor:
This is helpful given that you are including an external binary.

What about moving this specification to a document alongside the binary, and also translating this comment into a set of unit tests?

@eduardogr (Contributor) commented Oct 5, 2022:
Both documents, the specification and the tests, could be referenced here with links.

Contributor:
Adding unit tests for the binary file is good to:

  • localize files and ensure that you have all of them in the repo
  • write a specification
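As a hedged illustration of this suggestion (the test names and sample record are ours, not from the PR), a minimal unit test of the JSON shape could look like:

```python
# Hypothetical unit tests for the expected metagraph JSON shape.
# The field list follows the docstring spec quoted in this review thread;
# everything else here is illustrative.
REQUIRED_FIELDS = {"uid", "ip", "ip_type", "port", "stake", "rank",
                   "emission", "incentive", "consensus", "trust",
                   "dividends", "modality", "last_update", "version",
                   "priority", "weights", "bonds"}

# A minimal record that satisfies the spec, used as a stand-in for a
# record loaded from the file the binary writes.
sample = {
    "uid": 0, "ip": "0.0.0.0", "ip_type": 4, "port": 8091,
    "stake": "0", "rank": "0", "emission": "0", "incentive": "0",
    "consensus": "0", "trust": "0", "dividends": "0", "modality": 0,
    "last_update": "0", "version": 1, "priority": "0",
    "weights": [[0, 0]], "bonds": [[0, "0"]],
}

def test_required_fields_present():
    # every field named by the spec must be in the record
    assert REQUIRED_FIELDS <= set(sample)

def test_stringified_ints_parse():
    # str(int) fields must round-trip through int() without error
    for field in ("stake", "rank", "emission", "last_update"):
        assert isinstance(int(sample[field]), int)
```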

Contributor:
Agreed, @camfairchild, this should be moved to proper documentation as well. Nicely done!

@eduardogr (Contributor):

> The binaries are from https://github.com/opentensor/subtensor-node-api Not sure where to write docs about this though

If we are getting binaries from there I would suggest to comment it somewhere so we can wire the components and generate the knowledge

@camfairchild very nice job!!

@camfairchild (Collaborator, Author):

> The binaries are from https://github.com/opentensor/subtensor-node-api Not sure where to write docs about this though
>
> If we are getting binaries from there I would suggest to comment it somewhere so we can wire the components and generate the knowledge

Good idea. I'll add it in the setup.py file. Perhaps, though, we should maintain a docs website in the future.

@shibshib (Contributor):

Perhaps I misunderstood your graphs, but it looks like the IPFS cache is still a little faster despite being centralized, is that right?

Also, you have conflicts ;)

@camfairchild (Collaborator, Author):

> Perhaps I misunderstood your graphs, but it looks like the IPFS cache is still a little faster despite being centralized, is that right?
>
> Also, you have conflicts ;)

Nope, you're right. The IPFS cache is faster, but only when it's up ;)

@shibshib (Contributor):

> Nope, you're right. The IPFS cache is faster, but only when it's up ;)

ouch. 🗡️

@shibshib (Contributor):

@camfairchild why is this still marked do-not-merge? Any blockers?

@camfairchild (Collaborator, Author):

Going to refactor into an external PyPI package so the binary can be distributed more easily.

4 participants