### Item 44: Make pickle Reliable with copyreg

* The `pickle` built-in module can serialize Python objects into a stream of bytes and deserialize bytes back into objects.
* The purpose of `pickle` is to let you pass Python objects between programs that you control over binary channels.

* Note:
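    * The `pickle` module's serialization format is unsafe by design: the serialized data is essentially a program that describes how to reconstruct the original object, so a malicious payload can run arbitrary code when it is unpickled.
    * In contrast, the `json` module is safe by design: serialized JSON data only describes an object hierarchy.
        * Formats like `JSON` should be used for communication between programs or people that don't trust each other.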
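
The safety note above can be made concrete with a minimal sketch. An object can tell `pickle`, through `__reduce__`, to call any importable callable when the bytes are loaded. The `Payload` class is hypothetical and the callable here is the harmless built-in `sorted`; this only illustrates the mechanism under those assumptions.

```python
import pickle


class Payload:
    """Hypothetical class used only to illustrate the risk."""

    def __reduce__(self):
        # pickle stores "call this callable with these arguments" in the bytes.
        # A harmless built-in is used here; untrusted data could name any
        # importable callable instead.
        return (sorted, ([3, 1, 2],))


data = pickle.dumps(Payload())
result = pickle.loads(data)  # calls sorted([3, 1, 2]) instead of rebuilding a Payload

print(type(result).__name__, result)
```

The loaded value is whatever the callable returned, which is why bytes from an untrusted source should never be passed to `pickle.loads`.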

In [None]:
import copyreg
import pickle

#### Example

* Use a Python object to represent the state of a player's progress in a game.

In [None]:
class GameState(object):
    def __init__(self):
        self.level = 0
        self.lives = 4

* The program modifies this object as the game runs.

In [None]:
state = GameState()
state.level += 1  # player beat a level
state.lives -= 1  # player had to try again

* Dump the GameState object directly to a file.

In [None]:
state_path = "game_state.bin"
with open(state_path, "wb") as f:
    pickle.dump(state, f)

In [None]:
ls

* Later, I can load the file and get back the GameState object as if it had never been serialized.

In [None]:
with open(state_path, "rb") as f:
    state_after = pickle.load(f)

print(state_after.__dict__)

#### Problem

* The problem with this approach is what happens as the game's features expand over time.

In [None]:
class GameState(object):
    def __init__(self):
        self.level = 0
        self.lives = 4
        self.points = 0

* Serializing the new version of the GameState class using `pickle` will work exactly as before.

In [None]:
state = GameState()
serialized = pickle.dumps(state)
state_after = pickle.loads(serialized)

print(state_after.__dict__)

* What happens to older saved GameState objects?
    * The points attribute is missing!
    * This is confusing because the returned object is an instance of the new GameState class.

In [None]:
with open(state_path, "rb") as f:
    state_after = pickle.load(f)

print(state_after.__dict__)

In [None]:
assert isinstance(state_after, GameState)

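* This behavior is a byproduct of the way the `pickle` module works: by default, unpickling restores the saved attribute dictionary without calling `__init__`, so attributes added to the class later are never set.
* Its primary use case is making it easy to serialize objects; once usage goes beyond the trivial, it starts to break down in surprising ways.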
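
A small sketch of why the new attribute never appears: with default pickling, loading an object does not run its constructor. The `Tracked` class and its `init_calls` counter are hypothetical names used only for this illustration.

```python
import pickle


class Tracked:
    """Hypothetical class that counts how often its constructor runs."""

    init_calls = 0

    def __init__(self):
        Tracked.init_calls += 1
        self.level = 0


original = Tracked()
restored = pickle.loads(pickle.dumps(original))

# The constructor ran only for `original`; unpickling restored the saved
# attribute dictionary without calling __init__ again.
print(Tracked.init_calls)
print(restored.__dict__)
```

Because `__init__` is skipped, any defaults added to the constructor later have no effect on old data unless unpickling is routed through the constructor, which is what `copyreg` makes possible.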

#### Fix

* Use the `copyreg` built-in module.
* The `copyreg` module lets you register the functions responsible for serializing Python objects, allowing you to control the behavior of `pickle` and make it more reliable.

#### Default Attribute Values

* Use a constructor with default arguments to ensure that GameState objects will always have all attributes after unpickling.
    * See `Item 19`: Provide Optional Behavior with Keyword Arguments

In [None]:
# use default args
class GameState(object):
    def __init__(self, level=0, lives=4, points=0):
        self.level = level
        self.lives = lives
        self.points = points

* Define the pickle_game_state helper.
    * It takes a GameState object and turns it into a tuple of parameters for the `copyreg` module.
    * The returned tuple contains the function to use for unpickling and the parameters to pass to the unpickling function.

In [None]:
def pickle_game_state(game_state):
    kwargs = game_state.__dict__
    return unpickle_game_state, (kwargs,)

* Define the unpickle_game_state helper.
    * This function takes serialized data and parameters from pickle_game_state and returns the corresponding GameState object.
    * It's a tiny wrapper around the constructor.

In [None]:
def unpickle_game_state(kwargs):
    return GameState(**kwargs)

* Register these with the `copyreg` built-in module.

In [None]:
copyreg.pickle(GameState, pickle_game_state)

* Serializing and deserializing works as before.

In [None]:
state = GameState()
state.points += 1000
serialized = pickle.dumps(state)
state_after = pickle.loads(serialized)

print(state_after.__dict__)

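* Add a new field for magic spells. Deserializing the old data now goes through the constructor, so the missing attribute gets its default value.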

In [None]:
class GameState(object):
    def __init__(self, level=0, lives=4, points=0, magic=5):
        self.level = level
        self.lives = lives
        self.points = points
        self.magic = magic

In [None]:
state_after = pickle.loads(serialized)

print(state_after.__dict__)
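
The same idea as a self-contained sketch that does not depend on earlier cells. The `Settings` class and both helper functions are hypothetical stand-ins for GameState and its helpers. Bytes saved with the first class definition are loaded after the class gains a field with a default.

```python
import copyreg
import pickle


class Settings:
    """Hypothetical stand-in for GameState, first version."""

    def __init__(self, volume=5):
        self.volume = volume


def unpickle_settings(kwargs):
    # Looked up by name at load time, so it calls whatever class is
    # currently bound to the name Settings.
    return Settings(**kwargs)


def pickle_settings(settings):
    return unpickle_settings, (dict(settings.__dict__),)


copyreg.pickle(Settings, pickle_settings)
old_bytes = pickle.dumps(Settings(volume=7))


class Settings:  # second version adds a field with a default
    def __init__(self, volume=5, muted=False):
        self.volume = volume
        self.muted = muted


restored = pickle.loads(old_bytes)
print(restored.__dict__)
```

The saved value for `volume` survives, and `muted` is filled in by the constructor default rather than being absent.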

#### Versioning Classes

* Problem

    * Make a backwards-incompatible change by removing a field (here, `lives`).
    * This breaks deserializing old game data, because the old data still passes `lives` to the constructor.

In [None]:
class GameState(object):
    def __init__(self, level=0, points=0, magic=5):
        self.level = level
        self.points = points
        self.magic = magic

In [None]:
pickle.loads(serialized)

* Solution
    * Add a version parameter to the function supplied to `copyreg`.

In [None]:
# add version
def pickle_game_state(game_state):
    # Copy the attributes so the version tag is not written onto the live object.
    kwargs = dict(game_state.__dict__)
    kwargs["version"] = 2
    return unpickle_game_state, (kwargs,)

* Manipulate the arguments passed to the GameState constructor accordingly.
* Any logic you need to adapt an old version of the class to a new version of the class can go in the unpickle_game_state function.

In [None]:
def unpickle_game_state(kwargs):
    version = kwargs.pop("version", 1)
    if version == 1:
        kwargs.pop("lives")
    return GameState(**kwargs)

In [None]:
copyreg.pickle(GameState, pickle_game_state)
state_after = pickle.loads(serialized)

print(state_after.__dict__)
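
One detail worth checking in a versioned reducer: if it writes the version tag straight into the object's `__dict__`, the live object gains a `version` attribute as a side effect of being pickled. Below is a minimal sketch, using a hypothetical `Hero` class and helpers, of a reducer that copies the attributes first.

```python
import copyreg
import pickle


class Hero:
    """Hypothetical class used only for this illustration."""

    def __init__(self, level=0):
        self.level = level


def unpickle_hero(kwargs):
    kwargs.pop("version", 1)  # data without a tag is treated as version 1
    return Hero(**kwargs)


def pickle_hero(hero):
    kwargs = dict(hero.__dict__)  # copy, so the tag stays out of the live object
    kwargs["version"] = 2
    return unpickle_hero, (kwargs,)


copyreg.pickle(Hero, pickle_hero)

hero = Hero(level=3)
restored = pickle.loads(pickle.dumps(hero))

print(hero.__dict__)      # the original object is unchanged by pickling
print(restored.__dict__)
```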

#### Stable Import Paths

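* Renaming classes or moving them to other modules breaks unpickling of existing data, because by default `pickle` records the import path of the object's class in the serialized bytes.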

In [None]:
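# rename the GameState
copyreg.dispatch_table.pop(GameState, None)  # drop the copyreg hook to show default pickling
serialized = pickle.dumps(GameState())  # default pickling records the class's import path
del GameState  # the old name no longer exists
class BetterGameState(object):
    def __init__(self, level=0, points=0, magic=5):
        self.level = level
        self.points = points
        self.magic = magic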

In [None]:
pickle.loads(serialized)

In [None]:
print(serialized[:25])

* Solution
    * Use the `copyreg` module.
    * You can specify a stable identifier for the function to use for unpickling an object.
    * This gives you a level of indirection: the serialized data refers to the unpickling function, not to the class.

In [None]:
copyreg.pickle(BetterGameState, pickle_game_state)

* After using `copyreg`, you can see that the import path to unpickle_game_state is encoded in the serialized data instead of BetterGameState.

In [None]:
state = BetterGameState()
serialized = pickle.dumps(state)

print(serialized[:35])

* Gotcha
    * You can't move or rename the module in which the unpickle_game_state function is defined.
    * Once you serialize data with a function, it must remain available on that import path for deserializing in the future.
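
A rough way to see which import path ends up in the data is to search the pickled bytes for the names involved. In this sketch the `Inventory` class and its helpers are hypothetical; only the unpickling function's name is expected to appear in the bytes.

```python
import copyreg
import pickle


class Inventory:
    """Hypothetical class; its own name should not be needed to load the data."""

    def __init__(self, slots=10):
        self.slots = slots


def unpickle_inventory(kwargs):
    return Inventory(**kwargs)


def pickle_inventory(inventory):
    return unpickle_inventory, (dict(inventory.__dict__),)


copyreg.pickle(Inventory, pickle_inventory)
data = pickle.dumps(Inventory())

# The bytes reference the unpickling function by name; the class can later be
# renamed as long as unpickle_inventory stays importable from the same module.
print(b"unpickle_inventory" in data)
print(pickle.loads(data).__dict__)
```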

### Things to Remember

* The `pickle` built-in module is only useful for serializing and deserializing objects between trusted programs.
* The `pickle` module may break down when used for more than trivial use cases.
* Use the `copyreg` built-in module with `pickle` to add missing attribute values, allow versioning of classes, and provide stable import paths.