# Y-py 101

A primer on Y-py, to read before moving on to the examples that use Y-Pydantic bindings and simulated providers.

Think about the Y ecosystem as having three core concepts:
 - The Y CRDT and its `shared data types`, implemented in yjs (JS) and yrs (Rust) and exposed to Python as ypy (bindings over yrs, built with maturin)
 - `bindings` that connect a CRDT with something you can see or interact with, such as a *codemirror* element in an HTML document
 - `providers` that synchronize different clients over a transport (webrtc, websockets, etc.) by exchanging state vectors and diffs (aka deltas or updates) inside a small messaging protocol.


In [1]:
# for applying Black to code in Notebook cells
%load_ext nb_black

<IPython.core.display.Javascript object>

## YDoc

In [2]:
import y_py as Y

# In most cases, each client has a single YDoc with some shared
# data types attached to it, and syncs the linear history of
# that YDoc with other clients
doc = Y.YDoc()
doc

<YDoc at 0x7f3b5e01bc90>

### State vectors

A state vector is one of the tools used to sync clients. It represents a point in the linear history, and a remote client can use it to calculate which updates your client still needs to apply to catch up.

Note: a state vector alone cannot be turned into a new YDoc. It records how far each client's history has progressed, not the content of the changes; those travel in diffs, covered below.


In [3]:
# Since our YDoc has no changes, we should expect this to be b'\x00'
with doc.begin_transaction() as txn:
    sv = txn.state_vector_v1()
sv

b'\x00'

In [4]:
# Alternative syntax to above
doc.begin_transaction().state_vector_v1()

b'\x00'

In [5]:
# More alternative syntax
Y.encode_state_vector(doc)

b'\x00'

### Diffs

Diffs (aka deltas, updates) are changes to the linear history of the CRDT. They are what two clients actually use to sync. A "full history" can be calculated, or you can calculate a smaller diff between your current state and an earlier point in history (defined by a state vector).
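The relationship between a state vector and a diff can be sketched with a toy model in plain Python. This is not how yrs stores data: the names `Op`, `state_vector`, `diff`, and `history` are hypothetical, and each operation is reduced to a `(client_id, clock, payload)` tuple purely for illustration.

```python
from typing import Dict, List, Tuple

# Toy model only: one operation per typed character.
Op = Tuple[int, int, str]  # (client_id, clock, payload)


def state_vector(ops: List[Op]) -> Dict[int, int]:
    """Map each client id to the next clock value expected from it."""
    sv: Dict[int, int] = {}
    for client, clock, _ in ops:
        sv[client] = max(sv.get(client, 0), clock + 1)
    return sv


def diff(ops: List[Op], remote_sv: Dict[int, int]) -> List[Op]:
    """Return the ops that a peer holding `remote_sv` has not seen yet."""
    return [op for op in ops if op[1] >= remote_sv.get(op[0], 0)]


# Client 1 typed "f", "o", "o"; client 2 then typed "b".
history = [(1, 0, "f"), (1, 1, "o"), (1, 2, "o"), (2, 0, "b")]

full_update = diff(history, {})       # empty state vector: the full history
small_update = diff(history, {1: 3})  # peer already has client 1's three ops

print(state_vector(history))  # {1: 3, 2: 1}
print(full_update == history)  # True
print(small_update)  # [(2, 0, 'b')]
```

In this model the state vector only holds counters, which is one way to see why it cannot be used on its own to rebuild a document: the payloads live in the diff.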

In [6]:
# Since we have no changes, we should expect this to be b'\x00\x00'
with doc.begin_transaction() as txn:
    diff = txn.diff_v1()
diff

b'\x00\x00'

In [7]:
# alternative syntax
doc.begin_transaction().diff_v1()

b'\x00\x00'

In [8]:
# more alternative syntax
Y.encode_state_as_update(doc)

b'\x00\x00'

## Text

So far our YDoc has no attached shared types. Let's add a YText to it and see how that changes the state vectors and diffs.

In [9]:
ytext = doc.get_text("text-key")
ytext

YText()

In [10]:
str(ytext)

''

In [11]:
# Have we added anything to linear history yet by making
# a reference to the 'text-key'? No.
doc.begin_transaction().diff_v1()

b'\x00\x00'

In [12]:
# Now write something to that YText data type
with doc.begin_transaction() as txn:
    ytext.extend(txn, "foo")

ytext

YText(foo)

In [13]:
str(ytext)

'foo'

In [14]:
doc.begin_transaction().diff_v1()

b'\x01\x01\x99\x8a\x8b\x8f\x0b\x00\x04\x01\x08text-key\x03foo\x00'

In [15]:
doc.begin_transaction().state_vector_v1()

b'\x01\x99\x8a\x8b\x8f\x0b\x03'
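The state vector bytes above can be unpacked by hand. The sketch below assumes the v1 encoding is a count followed by `(client id, clock)` pairs, each written as a variable-length unsigned integer (7 data bits per byte, high bit set on every byte except the last), which is how I understand the Yjs encoding. The helper names `read_var_uint` and `decode_state_vector` are hypothetical and are not part of y_py.

```python
from typing import Dict, Tuple


def read_var_uint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Read one variable-length unsigned int; return (value, next position)."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift  # the low 7 bits carry data
        if byte < 0x80:  # a clear high bit marks the final byte
            return value, pos
        shift += 7


def decode_state_vector(buf: bytes) -> Dict[int, int]:
    """Decode an (assumed) v1 state vector into {client_id: clock}."""
    count, pos = read_var_uint(buf, 0)
    result: Dict[int, int] = {}
    for _ in range(count):
        client, pos = read_var_uint(buf, pos)
        clock, pos = read_var_uint(buf, pos)
        result[client] = clock
    return result


print(decode_state_vector(b"\x00"))  # {}
print(decode_state_vector(b"\x01\x99\x8a\x8b\x8f\x0b\x03"))  # {2984428825: 3}
```

Read this way, the empty document has zero entries, and the document holding 'foo' has one client whose clock is 3, one tick per inserted character.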

## Syncing clients

Now that we have a YDoc with some attached data, let's see how to sync another client. In other words, create another YDoc and apply the linear history from our first document to the new one.

In a real application, this syncing is handled by `providers` like `y-websocket` or `y-webrtc`, which wrap the state vectors and diffs in a small messaging protocol (https://github.com/yjs/y-protocols/blob/master/PROTOCOL.md). This demonstration should still give you the right foundation for how clients sync.
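To make the "extra messaging protocol" a little more concrete, here is a rough sketch of how a provider might frame the sync messages. It assumes the layout I understand y-websocket to use: an outer message tag (0 for sync), a step tag (0 for Sync Step 1, 1 for Sync Step 2, 2 for an update), then the payload prefixed with its length, all as variable-length unsigned integers. The constant and function names are hypothetical; check PROTOCOL.md before relying on any of this.

```python
def write_var_uint(n: int) -> bytes:
    """Encode a non-negative int with 7 data bits per byte."""
    out = bytearray()
    while n > 0x7F:
        out.append(0x80 | (n & 0x7F))  # set the high bit: more bytes follow
        n >>= 7
    out.append(n)
    return bytes(out)


# Assumed tag values, not taken from this notebook.
MESSAGE_SYNC = 0
SYNC_STEP_1 = 0  # carries the sender's state vector
SYNC_STEP_2 = 1  # carries the diff computed against that state vector
SYNC_UPDATE = 2  # carries an incremental update


def frame_sync_message(step: int, payload: bytes) -> bytes:
    """Wrap a state vector or diff in the assumed sync framing."""
    return (
        write_var_uint(MESSAGE_SYNC)
        + write_var_uint(step)
        + write_var_uint(len(payload))
        + payload
    )


# A brand new doc announces its empty state vector b'\x00' ...
step1 = frame_sync_message(SYNC_STEP_1, b"\x00")
# ... and a peer that is also empty answers with the empty diff b'\x00\x00'.
step2 = frame_sync_message(SYNC_STEP_2, b"\x00\x00")

print(list(step1))  # [0, 0, 1, 0]
print(list(step2))  # [0, 1, 2, 0, 0]
```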

In [16]:
new_doc = Y.YDoc()
new_doc

<YDoc at 0x7f3b38de5510>

In [17]:
str(new_doc.get_text("text-key"))

''

In [18]:
# We expect a brand new doc to have a state vector of b'\x00'
new_doc.begin_transaction().state_vector_v1()

b'\x00'

In [19]:
# We expect a brand new doc to have a diff of b'\x00\x00'
new_doc.begin_transaction().diff_v1()

b'\x00\x00'

In [20]:
# Apply the diff from our first client to this new one
# In provider/protocol docs, this could read as:
# - local (new) client sends Sync Step 1 (its own state vector)
sv = new_doc.begin_transaction().state_vector_v1()
# - remote client sends Sync Step 2 (the diff/delta/update to apply to catch up)
diff = doc.begin_transaction().diff_v1(sv)
# - local (new) client catches up, and is now okay to make changes and
# send out updates that other clients could apply
new_doc.begin_transaction().apply_v1(diff)
new_doc

<YDoc at 0x7f3b38de5510>

In [21]:
str(new_doc.get_text("text-key"))

'foo'

In [22]:
new_doc.begin_transaction().state_vector_v1()

b'\x01\x99\x8a\x8b\x8f\x0b\x03'

## Observing changes

Reading raw diffs is not very informative. Y-CRDT also lets us attach observers to the shared data types, which report changes in a friendlier form. We can attach an observer to the top-level YDoc as well, but it yields less useful information.

In [23]:
third_doc = Y.YDoc()


def obs_doc(event: Y.AfterTransactionEvent):
    print("called after a transaction (anything changed with the Doc)")
    print(f"Before state: {event.before_state}")
    print(f"After state: {event.after_state}")
    print(f"Diff: {event.get_update()}")
    print("")


third_doc.observe_after_transaction(obs_doc)


def obs_text(event: Y.YTextEvent):
    print("called when a YText data type has changed")
    print(event)
    print("")


third_doc.get_text("text-key").observe(obs_text)

third_doc

called after a transaction (anything changed with the Doc)
Before state: b'\x00'
After state: b'\x00'
Diff: b'\x00\x00'



<YDoc at 0x7f3b38dff4b0>

In [24]:
diff = doc.begin_transaction().diff_v1()
third_doc.begin_transaction().apply_v1(diff)
third_doc

called when a YText data type has changed
YTextEvent(target=foo, delta=[{'insert': 'foo'}], path=[])

called after a transaction (anything changed with the Doc)
Before state: b'\x00'
After state: b'\x01\x99\x8a\x8b\x8f\x0b\x03'
Diff: b'\x01\x01\x99\x8a\x8b\x8f\x0b\x00\x04\x01\x08text-key\x03foo\x00'



<YDoc at 0x7f3b38dff4b0>

In [25]:
# Make another change to the text in third_doc.
# A transaction can hold several changes; this one holds a single insert.
with third_doc.begin_transaction() as txn:
    ytext = txn.get_text("text-key")
    # write bar at the start of the string, so should be 'barfoo'
    ytext.insert(txn, 0, "bar")

ytext

called when a YText data type has changed
YTextEvent(target=barfoo, delta=[{'insert': 'bar'}], path=[])

called after a transaction (anything changed with the Doc)
Before state: b'\x01\x99\x8a\x8b\x8f\x0b\x03'
After state: b'\x02\x99\x8a\x8b\x8f\x0b\x03\x9a\xbc\xda\xb9\x04\x03'
Diff: b'\x01\x01\x9a\xbc\xda\xb9\x04\x00D\x99\x8a\x8b\x8f\x0b\x00\x03bar\x00'



YText(barfoo)

In [26]:
with third_doc.begin_transaction() as txn:
    # delete 3 characters starting at index 1, should be 'boo'
    ytext.delete_range(txn, index=1, length=3)

ytext

called when a YText data type has changed
YTextEvent(target=boo, delta=[{'retain': 1}, {'delete': 3}], path=[])

called after a transaction (anything changed with the Doc)
Before state: b'\x02\x99\x8a\x8b\x8f\x0b\x03\x9a\xbc\xda\xb9\x04\x03'
After state: b'\x02\x99\x8a\x8b\x8f\x0b\x03\x9a\xbc\xda\xb9\x04\x03'
Diff: b'\x00\x02\x99\x8a\x8b\x8f\x0b\x01\x00\x01\x9a\xbc\xda\xb9\x04\x01\x01\x02'



YText(boo)

In [27]:
# Note that we can make multiple changes in one txn; the observed
# event then reports only the net change
with third_doc.begin_transaction() as txn:
    # add '123' so we have 'boo123'
    ytext.extend(txn, "123")
    # delete two characters starting at index 3 so we have boo3
    ytext.delete_range(txn, 3, 2)
ytext

called when a YText data type has changed
YTextEvent(target=boo3, delta=[{'retain': 3}, {'insert': '3'}], path=[])

called after a transaction (anything changed with the Doc)
Before state: b'\x02\x99\x8a\x8b\x8f\x0b\x03\x9a\xbc\xda\xb9\x04\x03'
After state: b'\x02\x99\x8a\x8b\x8f\x0b\x03\x9a\xbc\xda\xb9\x04\x06'
Diff: b'\x01\x02\x9a\xbc\xda\xb9\x04\x03\x81\x99\x8a\x8b\x8f\x0b\x02\x02\x84\x9a\xbc\xda\xb9\x04\x04\x013\x01\x9a\xbc\xda\xb9\x04\x01\x03\x02'



YText(boo3)
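The `delta` lists printed by the text observer above read as a sequence of cursor instructions: `retain` skips over existing characters, `delete` drops them, and `insert` adds new text at the cursor. A minimal sketch of applying such a delta to a plain string follows. The function `apply_delta` is hypothetical (not part of y_py), and it assumes string inserts only, ignoring formatting attributes and embeds.

```python
from typing import Any, Dict, List


def apply_delta(text: str, delta: List[Dict[str, Any]]) -> str:
    """Apply retain/insert/delete operations to a plain string."""
    out: List[str] = []
    pos = 0  # cursor into the original text
    for op in delta:
        if "retain" in op:
            out.append(text[pos : pos + op["retain"]])
            pos += op["retain"]
        elif "delete" in op:
            pos += op["delete"]  # skip the deleted characters
        elif "insert" in op:
            out.append(op["insert"])
    out.append(text[pos:])  # anything after the last op is kept as-is
    return "".join(out)


# The three deltas observed on third_doc after the initial sync
print(apply_delta("foo", [{"insert": "bar"}]))  # barfoo
print(apply_delta("barfoo", [{"retain": 1}, {"delete": 3}]))  # boo
print(apply_delta("boo", [{"retain": 3}, {"insert": "3"}]))  # boo3
```

This is the kind of logic a binding would run to mirror YText changes into an editor widget or another plain-text view.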

## Another Sync

Now that third_doc has several changes in its history, let's see what it looks like to sync our original doc with the changes (linear history).

In [28]:
# First let's see the difference between a "full update" for third_doc
# and an update calculated against the state vector of doc
full_update = third_doc.begin_transaction().diff_v1()
full_update

called after a transaction (anything changed with the Doc)
Before state: b'\x02\x99\x8a\x8b\x8f\x0b\x03\x9a\xbc\xda\xb9\x04\x06'
After state: b'\x02\x99\x8a\x8b\x8f\x0b\x03\x9a\xbc\xda\xb9\x04\x06'
Diff: b'\x00\x00'



b'\x02\x02\x99\x8a\x8b\x8f\x0b\x00\x01\x01\x08text-key\x01\x84\x99\x8a\x8b\x8f\x0b\x00\x02oo\x04\x9a\xbc\xda\xb9\x04\x00D\x99\x8a\x8b\x8f\x0b\x00\x01b\xc1\x9a\xbc\xda\xb9\x04\x00\x99\x8a\x8b\x8f\x0b\x00\x02\x81\x99\x8a\x8b\x8f\x0b\x02\x02\x84\x9a\xbc\xda\xb9\x04\x04\x013\x02\x99\x8a\x8b\x8f\x0b\x01\x00\x01\x9a\xbc\xda\xb9\x04\x01\x01\x04'

In [29]:
sv = doc.begin_transaction().state_vector_v1()
computed_update = third_doc.begin_transaction().diff_v1(sv)
computed_update

called after a transaction (anything changed with the Doc)
Before state: b'\x02\x99\x8a\x8b\x8f\x0b\x03\x9a\xbc\xda\xb9\x04\x06'
After state: b'\x02\x99\x8a\x8b\x8f\x0b\x03\x9a\xbc\xda\xb9\x04\x06'
Diff: b'\x00\x00'



b'\x01\x04\x9a\xbc\xda\xb9\x04\x00D\x99\x8a\x8b\x8f\x0b\x00\x01b\xc1\x9a\xbc\xda\xb9\x04\x00\x99\x8a\x8b\x8f\x0b\x00\x02\x81\x99\x8a\x8b\x8f\x0b\x02\x02\x84\x9a\xbc\xda\xb9\x04\x04\x013\x02\x99\x8a\x8b\x8f\x0b\x01\x00\x01\x9a\xbc\xda\xb9\x04\x01\x01\x04'

In [30]:
# Let's observe the YText change when applying the computed update
# to the original doc. It had text 'foo' to start and should end
# up with 'boo3' after the updates are applied.
doc.get_text("text-key").observe(obs_text)
doc.begin_transaction().apply_v1(computed_update)
doc.get_text("text-key")

called when a YText data type has changed
YTextEvent(target=boo3, delta=[{'insert': 'b'}, {'delete': 1}, {'retain': 2}, {'insert': '3'}], path=[])



YText(boo3)

In [31]:
# As a final test, would it matter whether we applied the full update
# or the computed update? No: Y-CRDT updates are idempotent, so we
# can apply the same updates repeatedly without breaking the sync
new_doc.get_text("text-key").observe(obs_text)
new_doc.get_text("text-key")

YText(foo)

In [32]:
new_doc.begin_transaction().apply_v1(full_update)
new_doc.get_text("text-key")

called when a YText data type has changed
YTextEvent(target=boo3, delta=[{'insert': 'b'}, {'delete': 1}, {'retain': 2}, {'insert': '3'}], path=[])



YText(boo3)

In [33]:
new_doc.begin_transaction().apply_v1(computed_update)
new_doc.get_text("text-key")

YText(boo3)
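The reason re-applying an update is harmless can be illustrated with a toy store keyed by operation id. This is only a sketch of the idea, not the yrs implementation: `apply_update`, `OpId`, and the sample data are hypothetical, and each operation is identified by a `(client_id, clock)` pair.

```python
from typing import Dict, Tuple

OpId = Tuple[int, int]  # (client_id, clock)


def apply_update(store: Dict[OpId, str], update: Dict[OpId, str]) -> Dict[OpId, str]:
    """Merge an update into a store; ids that are already known are ignored."""
    merged = dict(store)
    for op_id, payload in update.items():
        merged.setdefault(op_id, payload)
    return merged


# Toy stand-ins for a full update and one computed against a state vector
full_update = {(1, 0): "f", (1, 1): "o", (1, 2): "o", (2, 0): "b"}
computed_update = {(2, 0): "b"}

replica = {(1, 0): "f", (1, 1): "o", (1, 2): "o"}
once = apply_update(replica, full_update)
twice = apply_update(once, computed_update)

print(once == twice)  # True: the second apply changes nothing
print(apply_update(replica, computed_update) == once)  # True: either update catches up
```

Because every operation carries a unique id, a replica can recognise what it already holds, which is why no observer event fired for the second apply above.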
