# Technical note: Equality checking, hashing, pickling

This document lays out some of the details of equality checking in Python, its application in hashing, and the related issue of pickling/unpickling (where unpickling should produce an object that's equal to the original).

The goal is to explain the issues, outline the options, and record the decisions Myokit's design makes in this regard.

## Equality (`==`) vs `is`

`==` checks if two objects "should be considered equal", e.g. `myokit.Number(1) == myokit.Number(1)` should return `True`.

`is` checks if two variables point to the same object, e.g. `x is x` should return `True`, as should `y = x; x is y`. 

### `id()` and `is`

- The line `x is y` is equivalent to `id(x) == id(y)`. 
- An object's id is unique **during its lifetime**. In CPython (the standard implementation), `id(x)` returns the memory address of `x` (`&x` in C terms).

This means that:
- If you store an object's id **but not the object**, and then want to check if you're seeing the same object again, you can't use `id(x) == stored_id`: once the original object has been garbage collected, its id can be reused by a new object.
- **However**, in most cases you would simply store the object. As long as you have a reference to the object, its id will stay in use, and so `x is stored_x` will always return the correct answer.
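If you do need a lookup keyed on `id()` (for example because the objects themselves aren't hashable), one way to stay safe is to store the object alongside the value, so that it stays alive and its id can't be reused. A minimal sketch; `IdCache` is a hypothetical helper, not part of Myokit:

```python
class IdCache:
    """Hypothetical cache keyed on id(), usable for unhashable objects."""

    def __init__(self):
        # Maps id(obj) -> (obj, value). Keeping a reference to obj keeps it
        # alive, so its id can't be reused while the entry exists.
        self._entries = {}

    def put(self, obj, value):
        self._entries[id(obj)] = (obj, value)

    def get(self, obj, default=None):
        entry = self._entries.get(id(obj))
        return default if entry is None else entry[1]


cache = IdCache()
key = [1, 2, 3]                # lists are unhashable, so can't be dict keys
cache.put(key, 'first')
print(cache.get(key))          # first
print(cache.get([1, 2, 3]))    # None: equal contents, but a different object
```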

### Literals

A line like `Number(1) is Number(1)` creates two new objects, each with their own id, and so returns False.
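It gets a bit more complicated for literals, as Python tends to cache them: a line `1 is 1` retrieves two references to the same object and so can return True. (CPython, for example, keeps a single object for each integer from -5 to 256. There might be further tricky details to make things fast, and newer Python versions emit a `SyntaxWarning` when `is` is used with a literal.)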

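This seems to hold for low ints and floats, but not for strings and very long integers. (Seeing the same id on consecutive lines is not proof of caching, though: each object below only lives for one line, so its memory, and hence its id, can be reused by the next one.)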
```
>>> id(1)
140111174779120
>>> id(1)
140111174779120
>>> id(1)
140111174779120

>>> id(-1.234e-5)
140111173736976
>>> id(-1.234e-5)
140111173736976
>>> id(-1.234e-5)
140111173736976
```

```
>>> id('Hello')
140111173489136
>>> id('Hello')
140111173489328
>>> id('Hello')
140111173489136
```
```
>>> id(12345678901234567890)
140111173472384
>>> id(12345678901234567890)
140111173473296
>>> id(12345678901234567890)
140111173472288
```

### Equality in user-defined classes

The `==` operator on objects calls the class's `__eq__` method.

The default `__eq__` method returns `True if self is other else NotImplemented`:

```python
class A:
    pass

x = A()
y = A()
print(x.__eq__(x))
print(x.__eq__(y))
```
```
True
NotImplemented
```


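But `==` converts this to `False`: when `x.__eq__(y)` returns `NotImplemented`, Python tries the reflected `y.__eq__(x)`, and if that also returns `NotImplemented` it falls back to an identity comparison: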

```python
print(x == x)
print(x == y)
```
```
True
False
```


There is also a `__ne__` method that's used by `!=`. By default it inverts the result of `__eq__` (again passing `NotImplemented` through unchanged):

```python
print(x.__ne__(x))
print(x.__ne__(y))
```
```
False
NotImplemented
```


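Again, the operator deals with the `NotImplemented`, so that `!=` returns `True` for two distinct objects:

```python
print(x != x)
print(x != y)
```
```
False
True
```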


There is usually no need to override `__ne__`.
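A sketch of the usual pattern for a user-defined value class: compare the relevant fields, return `NotImplemented` for unrelated types (so that Python can try the other operand and then fall back to `is`), and let the default `__ne__` derive `!=`. `Point` is a made-up example class:

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        # Let Python try other.__eq__(self) for unrelated types
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    # No __ne__ needed: the default one inverts __eq__.
    # Defining __eq__ also sets Point.__hash__ to None (see the hashing rules).


p, q = Point(1, 2), Point(1, 2)
print(p == q, p != q)     # True False
print(p is q)             # False
print(p == 'a string')    # False: both sides return NotImplemented
```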

## The `==` operator in Myokit
The following objects define an `__eq__` method in Myokit:

- Unit
- Quantity
- Expression (Defined in `Name` and in the base class `Expression`)
- Equation
- ~Model~
- Protocol

### Units
Two units are considered equal if their internal representations (7 exponents and 1 multiplier) are equal, as judged by `myokit.float.eq`.

As a simple example, we create two numbers that are within 1 operation's floating point error from each other. Python shows these are not equal. But `myokit.float.eq` thinks they are:

```python
import myokit

x = 1
y = (1 / 1.234) * 1.234
print(x)
print(y)
print(x == y)
print(myokit.float.eq(x, y))
```
```
1
0.9999999999999999
False
True
```


If we create units with these numbers, we can see Myokit (1) using float.eq to judge equality, (2) using `float.cround` (which is even more tolerant) for display:

```python
a = myokit.units.m * x
b = myokit.units.m * y
print(a)
print(b)
print(a == b)
```
```
[m]
[m]
False
```


We can go further still, and see how "closeness" is not applied in equality checking (but is used in displaying units):

```python
x = 1
y = (1 / 1.234)**2 * 1.234 * 1.234
print(x)
print(y)
print(x == y)
print(myokit.float.eq(x, y))
print(myokit.float.close(x, y))
```
```
1
0.9999999999999998
False
False
True
```


```python
a = myokit.units.m * x
b = myokit.units.m * y
print(str(a))
print(str(b))
print(repr(a))
print(repr(b))
print(a == b)
print(myokit.Unit.close(a, b))
```
```
[m]
[m]
[m]
[m (0.9999999999999998)]
False
True
```


For more on float equality, see the previous notebook.
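The three levels of strictness seen above (Python's `==`, `myokit.float.eq` and `myokit.float.close`) can be approximated with relative tolerance checks. The sketch below is an assumption about how such checks can work, not a copy of Myokit's code: the hypothetical `feq` accepts a difference of less than one machine epsilon relative to the larger operand, and `fclose` uses a looser, arbitrarily chosen tolerance:

```python
import sys

EPS = sys.float_info.epsilon  # 2**-52, about 2.2e-16


def feq(a, b):
    """True if a and b differ by less than one relative machine epsilon."""
    return a == b or abs(a - b) < max(abs(a), abs(b)) * EPS


def fclose(a, b, reltol=1e-9):
    """A looser check; reltol is an arbitrary choice for this sketch."""
    return a == b or abs(a - b) < max(abs(a), abs(b)) * reltol


one_step = 1.0 - 2**-53    # 0.9999999999999999, one float below 1
two_steps = 1.0 - 2**-52   # 0.9999999999999998, two floats below 1

print(1.0 == one_step, feq(1.0, one_step))                            # False True
print(1.0 == two_steps, feq(1.0, two_steps), fclose(1.0, two_steps))  # False False True
```

Note that tolerance-based equality like this is not transitive (`a` close to `b` and `b` close to `c` does not imply `a` close to `c`), which is one reason objects that use it need care when hashed.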

**Note** Units currently have a global "preferred representation" property. This might become a per-object one in [#783](https://github.com/myokit/myokit/issues/783), in which case we'd need to decide if two units with a different preferred representation are equal (I'd say yes).


### Quantities

Two `myokit.Quantity` objects are considered equal if their value is equal (according to Python's `==` operator for floats) and if their units are equal (again with `==`, so using `myokit.Unit.__eq__`).

**Note**: Quantities might be merged with myokit.Number in [#798](https://github.com/myokit/myokit/issues/798).


### Expressions

For most expressions there is an obvious and desirable `==` implementation, e.g.:

`myokit.Number(1) == myokit.Number(1)`,

and

`myokit.Plus(myokit.Number(1), myokit.Number(2)) == myokit.Plus(myokit.Number(1), myokit.Number(2))`.
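A minimal sketch of structural equality for an immutable expression tree. The `Num` and `Add` classes are hypothetical stand-ins, not Myokit's `Number` and `Plus`: two nodes are equal if they have the same type and equal operands, which recurses down the tree:

```python
class Expr:
    """Hypothetical immutable expression node; operands stored in a tuple."""

    def __init__(self, *operands):
        self._operands = tuple(operands)

    def __eq__(self, other):
        if not isinstance(other, Expr):
            return NotImplemented
        if type(self) is not type(other):
            return False
        # Tuple comparison calls == on each pair of operands (recursively)
        return self._operands == other._operands

    def __hash__(self):
        return hash((type(self).__name__, self._operands))


class Num(Expr):
    def __init__(self, value):
        super().__init__(float(value))


class Add(Expr):
    pass


print(Num(1) == Num(1))                            # True
print(Add(Num(1), Num(2)) == Add(Num(1), Num(2)))  # True
print(Add(Num(1), Num(2)) == Add(Num(2), Num(1)))  # False: operand order matters
```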

#### Names refer to variables

The tricky part is _names_, which refer to a `myokit.Variable` object.

(Another detail is that, for debugging purposes and just for messing around, a Name expression's value can be any other type of object. This lets you re-use Myokit's expression system for other things. It's not an official part of the API though...)

The `myokit.Variable` class does not implement an `__eq__` operator. So if you load a model twice, a variable in the first model won't equal the "same" variable in the second:

```python
import myokit

m1 = myokit.load_model('example')
m2 = myokit.load_model('example')
print(m1.code() == m2.code())
```
```
True
```


```python
i1 = m1.get('ina.INa')
i2 = m2.get('ina.INa')
print(i1 == i2)
```
```
False
```


even though the variables have the same code:

```python
print(i1.code() == i2.code())
```
```
True
```


**Should** variables implement an equality check so that `i1 == i2` in the above? It sounds reasonable, but there are several issues:

- The main part of a variable is its defining equation. But checking if equations are equal would mean checking that their expressions are equal, which in turn would involve checking the equality of several other variables, and so on. This would make `==` an expensive, and potentially circular, operation.
- Should two variables be equal if their parents are not?

**Decision**: Variables will not implement the `==` operator. 

#### Expressions are immutable

Making expressions immutable has a big advantage: it means we can cache the result of expensive tree operations.

This meshes well with the idea of matching variables in names with `is`: the id of an object is immutable in its lifetime, and since a Name stores a reference to the object as well, that means the id of a name's value is immutable while the expression is alive.

Note that variables themselves are not immutable: a reference to a variable called `a.b` can become a reference to `c.d` if the variable `a.b` is renamed. But because the reference still points to the same variable object, the reference itself has not changed. This does mean that the output from `.code()` cannot be cached.

#### Avoiding repeated tree recursion

To avoid repeated tree recursion when checking expression equality, we can create a string representation of the expression on the first call, and re-use it in subsequent calls.
This is implemented in the method `Expression._polish` (and more specifically the various implementations of helper method `Expression._polishb`).
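The caching idea can be sketched as follows. This is a hypothetical simplification, not Myokit's `_polish`/`_polishb` code: each node builds its string once, names contribute the `id()` of the variable object they point to (so renaming a variable doesn't change the string), and `__eq__` and `__hash__` both reuse the cached string:

```python
class Variable:
    """Stand-in for a mutable model variable (no custom __eq__)."""

    def __init__(self, name):
        self.name = name


class Node:
    _cached = None

    def polish(self):
        # Build the string on the first call only; safe because nodes are immutable
        if self._cached is None:
            self._cached = self._build()
        return self._cached

    def __eq__(self, other):
        if not isinstance(other, Node):
            return NotImplemented
        return self.polish() == other.polish()

    def __hash__(self):
        return hash(self.polish())


class Name(Node):
    def __init__(self, var):
        self.var = var

    def _build(self):
        # Use the variable's id, not its (mutable) name
        return 'var' + str(id(self.var))

    def code(self):
        # Depends on the variable's current name, so this can't be cached
        return self.var.name


class Plus(Node):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def _build(self):
        return '(' + self.left.polish() + '+' + self.right.polish() + ')'


a = Variable('a')
e1 = Plus(Name(a), Name(a))
e2 = Plus(Name(a), Name(a))
print(e1 == e2)                    # True: same variable object
a.name = 'renamed'
print(e1 == e2, e1.left.code())    # True renamed
b = Variable('renamed')            # same name, but a different object
print(Name(a) == Name(b))          # False
```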

### Equations

Two equations are equal if their LHS and RHS expressions are equal with `==`.

### Models

#### `__eq__` was added, but removed again

An equality operator was added to `Model` in [#548](https://github.com/myokit/myokit/pull/548), and used to check that models could be pickled and unpickled.

In this implementation, models are considered equal if they are the same object, or if

- they have the same set of reserved unames (which are strings, so immutable and easy to compare), and
- they have the same set of reserved uname prefixes (strings again), and
- the output of their `code()` methods is the same. 

There are some pros and cons:

- **pro** If you load the same model twice, the models are equal
- **pro** Once you modify a model, it's no longer equal
- **con** Unames are not something many (or most) users will think about (pro: but that perhaps means they won't use them?)
- **con** Because components and variables don't have a custom `__eq__` (see above), two models that are "considered equal" will consist of components and variables that are *not* considered equal. Similarly, the expressions in one model won't equal the expressions in another.

Because this situation doesn't make sense, the `__eq__` implementation for models was removed again in [#849](https://github.com/myokit/myokit/pull/849).

### Protocols

Two protocols are considered equal if they contain the same events. This is checked by comparing their `.code()` output (which is in a canonicalised form).
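Comparing canonicalised code is a general trick: if two objects serialise to the same canonical text, they describe the same thing. A hypothetical sketch, in which a made-up `Proto` class sorts its events by start time when writing its code, so that the order in which events were added doesn't matter (the format is invented for illustration and is not Myokit's protocol syntax):

```python
class Proto:
    """Hypothetical protocol: a list of (level, start, duration) events."""

    def __init__(self):
        self._events = []

    def add(self, level, start, duration):
        self._events.append((float(level), float(start), float(duration)))

    def code(self):
        # Canonical form: one line per event, sorted by start time first
        events = sorted(self._events, key=lambda e: (e[1], e[0], e[2]))
        return '\n'.join(f'{lv!r} {st!r} {du!r}' for lv, st, du in events)

    def __eq__(self, other):
        if not isinstance(other, Proto):
            return NotImplemented
        return self.code() == other.code()


p = Proto()
p.add(-80, 0, 100)
p.add(20, 100, 50)
q = Proto()
q.add(20, 100, 50)
q.add(-80, 0, 100)
print(p == q)   # True: same events, added in a different order
```

Because `Proto` is mutable, it defines `__eq__` but no `__hash__`, leaving it unhashable.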

## Hashing and equality

Sets and dicts in Python are based on hash maps. To make objects usable as keys in a dict or items in a set, they need to implement a *hash function* that returns an **almost unique integer**. Look-ups in a set or dict start with a quick hash-based jump, followed by a "proper" check using `==`. As a result, the `__hash__` and `__eq__` methods of user-defined classes have [some restrictions](https://docs.python.org/3/reference/datamodel.html#object.__hash__):

**Default implementations use `is`**.
By default, `x == y` is equivalent to `x is y`, and `hash(x)` returns "an appropriate value such that x == y implies both that x is y and hash(x) == hash(y)". So if you leave hash and eq alone, your objects will be hashable, but with an "is" condition. In other words, if `myokit.Number` kept the defaults, `myokit.Number(1) in {myokit.Number(1)}` would return False, because the two `Number(1)` objects are distinct.

**Overriding eq removes default hash**.
If you override `__eq__` but not `__hash__`, Python will automatically set `YourClass.__hash__ = None`, rendering your object unhashable.

**Overriding hash? Then do eq too**.
If you override `__hash__`, you **must** also provide an appropriate `__eq__` function.

**Hashes must be immutable**.
The value returned by an object's `__hash__` must stay the same during its lifetime. So in general you should only implement `__hash__` for immutable objects.
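The first rules are easy to check in plain Python. In this sketch (with made-up classes), a class that keeps the defaults is hashable by identity, overriding only `__eq__` leaves `__hash__` set to `None`, and adding a `__hash__` consistent with `__eq__` makes set look-ups work by value:

```python
class Plain:
    pass


class WithEq:
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, WithEq):
            return NotImplemented
        return self.value == other.value


class WithEqAndHash(WithEq):
    def __hash__(self):
        # Equal objects have equal values, so they get equal hashes.
        # Only safe if value is never changed after construction.
        return hash(self.value)


print(Plain() in {Plain()})                  # False: identity-based defaults
print(WithEq.__hash__)                       # None

try:
    hash(WithEq(1))
except TypeError as e:
    print('TypeError:', e)                   # unhashable type: 'WithEq'

print(WithEqAndHash(1) in {WithEqAndHash(1)})  # True
```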

## Hashing

The following Myokit classes override `__hash__`:

- Unit
- Quantity
- Expression (the base class)
- Equation

### Units

Units are immutable.

Units return a hash made from the unit's string representation -- without lookup of a preferred global notation (which itself would involve a call to hash, leading to cycles). Rounding is used, making these hashes only unique-ish.
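Why rounding helps: with a tolerance-based `==`, two units that compare equal may have slightly different multipliers, and hashing their exact representations would give them different hashes, breaking the rule that equal objects must hash equally. A sketch with a hypothetical `unit_hash` that rounds the multiplier to 12 significant digits before hashing (the digit count is an arbitrary choice, and this is not Myokit's code):

```python
def unit_hash(exponents, multiplier, digits=12):
    """Hypothetical hash for a unit: exponents plus a rounded multiplier."""
    # Round the multiplier so that values within floating point error of each
    # other usually produce the same string, and hence the same hash
    text = str(tuple(exponents)) + f'{multiplier:.{digits}g}'
    return hash(text)


metre = (0, 1, 0, 0, 0, 0, 0)
x = 1.0
y = 1.0 - 2**-53   # one floating point step below 1

print(repr(x) == repr(y))                          # False
print(unit_hash(metre, x) == unit_hash(metre, y))  # True
```

Rounding cannot make this perfect: unequal units may now share a hash (a harmless collision), and two values on either side of a rounding boundary can still hash differently.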

### Quantities

Quantities are immutable. They use the hash of their string representation (which is set at construction time).
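A minimal sketch of this approach, with a hypothetical `Qty` class: the string representation is computed once in the constructor, the attributes are read-only, and `__eq__` and `__hash__` build on them. For brevity the unit is a plain string here, whereas Myokit uses `myokit.Unit` objects:

```python
class Qty:
    """Hypothetical immutable quantity: a float value plus a unit string."""

    __slots__ = ('_value', '_unit', '_str')

    def __init__(self, value, unit):
        # Normalise -0.0 to 0.0: they compare equal but print differently,
        # which would otherwise give equal quantities different hashes
        self._value = float(value) + 0.0
        self._unit = unit
        # Set once, at construction; never changes afterwards
        self._str = str(self._value) + ' [' + unit + ']'

    @property
    def value(self):
        return self._value

    @property
    def unit(self):
        return self._unit

    def __str__(self):
        return self._str

    def __eq__(self, other):
        if not isinstance(other, Qty):
            return NotImplemented
        return self._value == other._value and self._unit == other._unit

    def __hash__(self):
        return hash(self._str)


a = Qty(2, 'mV')
b = Qty(2.0, 'mV')
print(a == b, hash(a) == hash(b))                   # True True
print(len({a, b, Qty(-0.0, 'mV'), Qty(0, 'mV')}))   # 2
```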

### Expressions

Expressions are immutable. They use the hash of `Expression._polish()`, which is an immutable string that uses object ids instead of variable names (see above).

### Equations

Equations are immutable. An equation's hash is made up of the hashes of its LHS and RHS expressions.
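Python's `dataclasses` module can generate this kind of `__eq__` and `__hash__` automatically: with `frozen=True` (and the default `eq=True`), the generated methods compare and hash the fields as a tuple, so the hash combines the hashes of the LHS and RHS. This is an alternative way of writing it, not how Myokit's `Equation` is implemented; plain strings stand in for expressions:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Eq:
    lhs: object  # any hashable, immutable expression object
    rhs: object


e1 = Eq('dot(V)', '-(INa + IK)')
e2 = Eq('dot(V)', '-(INa + IK)')
print(e1 == e2, hash(e1) == hash(e2))   # True True
print(len({e1, e2}))                    # 1
```

Assigning to a field of a frozen dataclass raises an error, which helps keep the hash stable over the object's lifetime.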

## Pickling

The following objects in Myokit have extra functions related to pickling & unpickling:

- Model
- Protocol
- Simulation & LegacySimulation
- ~Expressions~

### Models

Use `__reduce__` and `__setstate__` to (1) store the model code (including whatever the current state vector is) and then create a new model with `myokit.parse_model` and (2) store and restore the list of unames and uname prefixes (sets of strings, so easily pickled).
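A sketch of the pattern: `__reduce__` returns a function plus arguments that rebuild the object from its code, and a state object that `pickle` passes to `__setstate__` afterwards. `TinyModel` and `parse_tiny_model` are hypothetical stand-ins for `myokit.Model` and `myokit.parse_model`, and the "code" format is made up:

```python
import pickle


def parse_tiny_model(code):
    """Hypothetical parser: one 'name = value' assignment per line."""
    model = TinyModel()
    for line in code.splitlines():
        name, value = line.split(' = ')
        model.values[name] = float(value)
    return model


class TinyModel:
    def __init__(self):
        self.values = {}
        self.reserved_unames = set()

    def code(self):
        return '\n'.join(f'{k} = {v!r}' for k, v in sorted(self.values.items()))

    def __reduce__(self):
        # (callable, args, state): unpickling calls parse_tiny_model(code),
        # then passes the state to __setstate__ on the new object
        return (parse_tiny_model, (self.code(),), set(self.reserved_unames))

    def __setstate__(self, state):
        self.reserved_unames = set(state)


m = TinyModel()
m.values['V'] = -80.0
m.values['gNa'] = 16.0
m.reserved_unames.add('tmp')

m2 = pickle.loads(pickle.dumps(m))
print(m2.code() == m.code(), m2.reserved_unames)   # True {'tmp'}
```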

### Protocols

Use `__reduce__` to store protocol code and restore it with `myokit.parse_protocol`.

### Simulation and LegacySimulation

Use `__reduce__` and `__setstate__` to (1) store the arguments needed to create a new sim by calling its constructor (which recompiles it), and (2) store and restore the simulation state. Step 1 involves pickling a model and a protocol.
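Here the class itself can serve as the callable returned by `__reduce__`, so that unpickling re-runs the constructor (where a real simulation would compile its model) before the saved state is restored. `TinySim` is hypothetical; its constructor arguments only need to be picklable themselves:

```python
import pickle


class TinySim:
    """Hypothetical simulation: constructor arguments plus a mutable state."""

    def __init__(self, model_code, protocol_code=None):
        self._model_code = model_code
        self._protocol_code = protocol_code
        # A real simulation would compile the model here
        self.state = [0.0, 0.0]
        self.time = 0.0

    def run(self, duration):
        self.time += duration
        self.state = [s + duration for s in self.state]

    def __reduce__(self):
        args = (self._model_code, self._protocol_code)
        # Unpickling calls TinySim(*args), then __setstate__ with the dict
        return (TinySim, args, {'state': list(self.state), 'time': self.time})

    def __setstate__(self, saved):
        self.state = saved['state']
        self.time = saved['time']


sim = TinySim('V = -80.0')
sim.run(10)
sim2 = pickle.loads(pickle.dumps(sim))
print(sim2.time, sim2.state)   # 10.0 [10.0, 10.0]
```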

### ~Expressions~

Unpickling an expression containing names cannot be done without the context of a model. To make this clear, the `__reduce__` method has been overridden to raise an exception that suggests the following strategy instead:

```python
import myokit
import pickle

# Load a model
model = myokit.load_model('example')

# Get an expression containing Names
expression = model.get('ina.INa').rhs()
print(expression, type(expression))

# To serialise, get a string representation of the expression (no longer linked to a model)
string = expression.code()

pickled_string = pickle.dumps(string)
unpickled_string = pickle.loads(pickled_string)

print(unpickled_string, type(unpickled_string))

# After unpickling, create a new expression from the string (using the model as "context")
new_expression = myokit.parse_expression(unpickled_string, context=model)
print(new_expression, type(new_expression))
```
```
ina.gNa * ina.m^3 * ina.h * ina.j * (membrane.V - ina.ENa) <class 'myokit._expressions.Multiply'>
ina.gNa * ina.m^3 * ina.h * ina.j * (membrane.V - ina.ENa) <class 'str'>
ina.gNa * ina.m^3 * ina.h * ina.j * (membrane.V - ina.ENa) <class 'myokit._expressions.Multiply'>
```


This is obviously a bit much for an error message:

```python
try:
    pickle.dumps(expression)
except Exception as e:
    print(e)
```
```
Individual myokit Expressions can not be pickled. Please try e.g. pickling a full model, or pickling the output of `Expression.code()` and following unpickling with a call to `myokit.parse_expression(unpickled_code, context=a_model)` to recreate the Expression.
```


so this will have to do.
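The mechanism itself is small: `pickle` asks an object how to reduce itself, so raising an exception from `__reduce__` blocks pickling with a helpful message. A sketch with a hypothetical `NameExpr` class; the message is shortened, and the exception type is an assumption rather than the one Myokit uses:

```python
import pickle


class NameExpr:
    """Hypothetical expression that refers to a variable in a model."""

    def __init__(self, var):
        self.var = var

    def __reduce__(self):
        raise NotImplementedError(
            'Expressions cannot be pickled; pickle the output of code()'
            ' and re-parse it with the model as context instead.')


try:
    pickle.dumps(NameExpr(object()))
except NotImplementedError as e:
    print(e)
```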