
wip - catchup to last known block #31

Merged 3 commits into master on Aug 5, 2016

Conversation

@yusefnapora (Collaborator)

This updates the SimpleClient and receive_blockchain_into_indexer functions to use the new canonical_stream output format from mediachain/oldchain-client#88

I added a CLI flag so I could test catching up to a known block (--last-known-block=QmF00...), but I plan to remove that soon. It seems like we should be writing the last known block ref out somewhere so we can read it back in after a restart.

@autoencoder, where do you think that value should get stored? We could just spit it out to a file somewhere...
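
A minimal sketch of how that temporary flag might be wired up, assuming argparse is used for the CLI; only the option name comes from the comment above, everything else is illustrative:

```python
import argparse

parser = argparse.ArgumentParser(description='mediachain indexer client (sketch)')
# --last-known-block: ref of the last block already ingested; catchup stops
# once this block is reached while walking back through the chain.
parser.add_argument('--last-known-block', dest='last_known_block', default=None,
                    help='ref of the last block already processed')
args = parser.parse_args()

if args.last_known_block is not None:
    print('catching up until block %s is reached' % args.last_known_block)
```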

@parkan (Collaborator) commented Aug 4, 2016

Does the indexer have a rocksdb so far?

@parkan (Collaborator) commented Aug 4, 2016

This needs a corresponding requirements.txt change.

@yusefnapora (Collaborator, Author)

the testnet indexer has rocksdb installed, yes.

@parkan (Collaborator) commented Aug 4, 2016

should probably put it in there, then?

            started = True
        else:
            continue

-       yield ref, obj
+       yield obj_info
Collaborator

Presumably these returns are different now; are the call sites updated?

Collaborator (Author)

yeah, the get_artefacts and get_entities functions just pass through this return value unchanged, and only get_artefacts is used ATM. I updated receive_blockchain_into_indexer to unpack the dictionary instead of the tuple.

The only other place it's used is in the debug tail_blockchain method, which just prints without unpacking anything.

@autoencoder (Collaborator), Aug 5, 2016

Even better: since this is a public API, it'd be nice to have some kind of machine-interpretable versioning, and each API caller would assert against the version number each time the program runs, prior to actually using the API.

E.g. a major number increment indicates backward-compatibility-breaking changes, a minor number increment indicates behind-the-scenes enhancements, and each API caller asserts that the major number == the last major number for which someone manually reviewed the list of breaking changes.

This would help us avoid subtle bugs that automated tests may not catch and help keep all devs in sync.
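
A rough sketch of what that machine-checkable version assert could look like; CANONICAL_STREAM_VERSION and EXPECTED_MAJOR are hypothetical names, not part of the existing client:

```python
# Hypothetical version tuple exported by the client library.
CANONICAL_STREAM_VERSION = (1, 0)  # (major, minor)

# The last major version whose breaking changes this caller has reviewed.
EXPECTED_MAJOR = 1

def check_api_version(version=CANONICAL_STREAM_VERSION):
    major, minor = version
    # Major bump == backward-compatibility break: fail fast until a human
    # reviews the changes and bumps EXPECTED_MAJOR.
    assert major == EXPECTED_MAJOR, (
        'canonical_stream major version is %d, expected %d; review breaking changes'
        % (major, EXPECTED_MAJOR))
    # Minor bumps are behind-the-scenes enhancements and are accepted as-is.

check_api_version()  # run once at startup, before using the API
```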

@yusefnapora (Collaborator, Author)

Yeah, I haven't so far since we need to coordinate with @autoencoder; it's only needed for the blockchain stream, and I didn't want to break any other deployments he's got :) But it would be cool to have it as a hard requirement, since the blockchain catchup will be memory-hungry and transient without rocksdb installed.

@parkan (Collaborator) commented Aug 4, 2016

OK, deferring to @autoencoder then; it would also be OK to drop it into ~/.mediachain or whatever.

@yusefnapora (Collaborator, Author)

yeah, the rocksdb blockchain cache gets stored in ~/.mediachain anyway, so we could just write the block ref to a file there.
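
A minimal sketch of that approach, assuming the block ref is just written to a small text file under ~/.mediachain (the filename is illustrative):

```python
import os

STATE_DIR = os.path.expanduser('~/.mediachain')
BLOCK_REF_FILE = os.path.join(STATE_DIR, 'last_known_block')  # illustrative name

def save_last_known_block(ref):
    # Persist the ref so catchup can resume from here after a restart.
    if not os.path.exists(STATE_DIR):
        os.makedirs(STATE_DIR)
    with open(BLOCK_REF_FILE, 'w') as f:
        f.write(ref)

def load_last_known_block():
    try:
        with open(BLOCK_REF_FILE) as f:
            return f.read().strip()
    except IOError:
        return None  # no saved state yet; catch up from the start of the chain
```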

@parkan (Collaborator) commented Aug 4, 2016

I'm down to just drop a file, then. Do we need any locking semantics? I guess not, given that only one thread would write, though now that I think about it we probably need a pidfile/lock for the whole thing.

@yusefnapora (Collaborator, Author)

Yeah, at the moment we shouldn't need locking, since we'll just read once before we start tailing the blockchain and only write from one thread. But it might get more complicated as time goes on if we end up doing some kind of parallelization, etc.

@parkan (Collaborator) commented Aug 4, 2016

Yeah, for now we should probably drop a pidfile and refuse to run if another process is present.
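
A rough sketch of that pidfile check, assuming a single indexer process per machine; the path and error handling are illustrative:

```python
import os
import sys

PIDFILE = os.path.expanduser('~/.mediachain/indexer.pid')  # illustrative path

def acquire_pidfile():
    # Refuse to start if a pidfile is already present.
    if os.path.exists(PIDFILE):
        with open(PIDFILE) as f:
            existing_pid = f.read().strip()
        sys.exit('another indexer appears to be running (pid %s); '
                 'remove %s if it is stale' % (existing_pid, PIDFILE))
    with open(PIDFILE, 'w') as f:
        f.write(str(os.getpid()))

def release_pidfile():
    # Call on clean shutdown.
    if os.path.exists(PIDFILE):
        os.remove(PIDFILE)
```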

@autoencoder (Collaborator) commented Aug 4, 2016

For the current block count, we may end up sticking that counter (edit: probably more of a transaction ID) in ES, since the contents of ES would be the reason for the Indexer to be tracking this number.

Sounding good. Will take a closer look / merge in the AM.

@yusefnapora (Collaborator, Author)

@autoencoder the thing is, we can't easily do a simple block-height counter without changing the transactor RPC API to include that. At the moment we just have a ref to the block, which isn't ordered at all.

We could keep our own count on the client, but that would get tricky with the partial catchup, etc. Adding the block height or some kind of sequence number to the API probably makes sense.

@yusefnapora (Collaborator, Author)

Actually, I guess we could pull the index of the first entry in the block and use that as a sequence number... will think about that some more.

@autoencoder (Collaborator) commented Aug 4, 2016

Ideally it'd be an ID that could later be used to identify exactly what position of which fork the chain was on... and then, if the API caller later tries to resume from a position on an abandoned fork, the client API would replay the necessary inserts / updates / deletes into the Indexer to get it back in sync with the proper fork. Something like that.

Maybe not needed yet.

@vyzo (Collaborator) commented Aug 5, 2016

I echo @parkan on having a lock/pidfile for the local block cache.
BTW, is rocksdb safe for concurrent processes?

@autoencoder (Collaborator)

OK, took a look. Looks good. Noting some of the parts that are still WIP:

  • Persistent state recording strategy. BTW, why not just record this in RocksDB? Looks like we'll have at least two "current block position" IDs recorded: one for the Client cache, and one or more for Indexer instances.
  • Multi-process parallelism - am I right that the proposed strategy is that the Client reader does single-process reconciliation / downloading / caching of the blockchain, and then multiple reader processes will be able to access that cache? @yusef, it looks like RocksDB does support multiple reader processes, but you have to explicitly open the other readers in read-only mode (see the sketch after this list): Does RocksDB support multi-process read access? facebook/rocksdb#908
  • We should do a refreshed, top-down look at this API as a whole, from the API user's perspective in typical use-cases. Perhaps next week.
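
A small sketch of the single-writer / multiple-reader pattern from the second bullet, assuming the python rocksdb bindings; the cache path is illustrative:

```python
import os
import rocksdb

CACHE_PATH = os.path.expanduser('~/.mediachain/block_cache')  # illustrative path

# The cache-maintainer process opens the block cache normally (read-write):
#   db = rocksdb.DB(CACHE_PATH, rocksdb.Options(create_if_missing=True))

# Additional catchup/reader processes open the same DB in read-only mode,
# which is the multi-process pattern described in facebook/rocksdb#908:
reader = rocksdb.DB(CACHE_PATH, rocksdb.Options(), read_only=True)

block_bytes = reader.get(b'some-block-ref')  # None if the block isn't cached yet
```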

@autoencoder merged commit 444f9fd into master on Aug 5, 2016
@yusefnapora (Collaborator, Author)

yeah, it probably does make more sense for us to track the current block in rocksdb in the client code. I can set that up and see about exposing the index so we can use it as our "block height".

We could do multi-process catchup by opening the block cache in read-only mode; that's a good idea. It would need a little bit of extension to the current BlockCache API. Right now the BlockCache is just a read-through cache that doesn't keep track of block ordering, etc., so there's no way to "seek" to a particular block; instead we're always walking back through the chain from the current block. But I think if we track the block index numbers we should be able to spawn multiple processes that each take a range of blocks.

We'll also need to track the particular blockchain that the blocks are part of; it would be nice if each chain had a unique id or genesis block ref or something that we could identify the chain with. Right now the block cache will store blocks from any chain, since it's just a K/V store. But if we want to keep track of the structure / sequence of blocks, we also need to differentiate between different chains and either store them in separate rocksdb instances or prefix the keys, etc.
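
An illustrative sketch of the key-prefixing option mentioned above: namespace block-cache keys by a per-chain identifier (e.g. a genesis block ref) so a single rocksdb instance can hold blocks from more than one chain. All names here are hypothetical:

```python
def chain_key(chain_id, block_ref):
    # e.g. chain_key('QmGenesis...', 'QmBlock...') -> b'QmGenesis.../QmBlock...'
    return ('%s/%s' % (chain_id, block_ref)).encode('utf-8')

def put_block(db, chain_id, block_ref, block_bytes):
    db.put(chain_key(chain_id, block_ref), block_bytes)

def get_block(db, chain_id, block_ref):
    # Returns None if this chain has no cached block with that ref.
    return db.get(chain_key(chain_id, block_ref))
```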

@yusefnapora deleted the yn-last-known-block branch on August 8, 2016 at 18:34