In [5]:
from mongoengine import connect, disconnect
from mongoengine import Document
from mongoengine import StringField, IntField, BooleanField
from mongo_config import TEST_DB1, TEST_DB2, TEST_DB3, HOST, PORT, USERNAME, PASSWORD
connect(TEST_DB1, host=HOST, port=PORT, username=USERNAME, password=PASSWORD, authentication_source=TEST_DB1, alias=TEST_DB1)

MongoClient(host=['192.168.2.172:27017'], document_class=dict, tz_aware=False, connect=True, read_preference=Primary(), uuidrepresentation=3)

## 2.10. Documents migration

The structure of your documents and their associated mongoengine schemas are likely to change over the lifetime of an application. This section provides guidance and recommendations on how to deal with migrations.

Because MongoDB is schema-flexible, model migrations are not trivial. For readers who know alembic for sqlalchemy: there is unfortunately no equivalent library that manages migrations automatically for mongoengine.

In practice, MongoDB’s flexibility means that a model change rarely breaks an application outright, which anyone coming from a relational database will appreciate. Since no dedicated library handles document migrations for mongoengine, they have to be carried out by hand, although the process is usually not complicated.

### 2.10.1. Example 1: Addition of a field

Let’s start with a simple example of a model change and review the different options you have for dealing with the migration.

Let’s assume we start with the following schema and save an instance:

In [3]:
class User(Document):
    name = StringField()
    meta ={ "db_alias": TEST_DB1}

User(name="John Doe").save()

# print the objects as they exist in mongodb
print(User.objects().as_pymongo())    # [{u'_id': ObjectId('5d06b9c3d7c1f18db3e7c874'), u'name': u'John Doe'}]

[{'_id': ObjectId('640eaf1a2a9e0414ab6332b0'), 'name': 'John Doe'}]


In the next version of your application, let’s now assume that a new field `enabled` gets added to the existing `User` model with `default=True`. You simply update the `User` class to the following:

In [13]:
class User(Document):
    name = StringField(required=True)
    enabled = BooleanField(default=True)
    meta ={ "db_alias": TEST_DB1}

Without applying any migration, we now reload an object from the database into the `User` class and check its `enabled` attribute:

In [16]:
# Without any migration, the new field 'enabled' is not present in the existing documents
assert User.objects.count() == 2
user = User.objects().first()
assert user.enabled is True
assert User.objects(enabled=True).count() == 0    # pass
assert User.objects(enabled=False).count() == 0   # pass

# this is consistent with what we have in the database
# in fact, 'enabled' does not exist
print(User.objects().as_pymongo().first())    # {u'_id': ObjectId('5d06b9c3d7c1f18db3e7c874'), u'name': u'John'}
assert User.objects(enabled=None).count() == 2  # pass
print(User.objects().as_pymongo())

{'_id': ObjectId('640eaf1a2a9e0414ab6332b0'), 'name': 'John Doe'}
[{'_id': ObjectId('640eaf1a2a9e0414ab6332b0'), 'name': 'John Doe'}, {'_id': ObjectId('640eaf622a9e0414ab6332b1'), 'name': '张三'}]


As you can see, even though the document wasn’t updated, mongoengine applies the default value seamlessly when it loads the pymongo dict into a `User` instance. At first sight it looks like you don’t need to migrate existing documents when adding new fields, but this actually leads to inconsistencies when querying.

When querying, mongoengine does not account for the default value of the new field, so if you don’t migrate the existing documents, you risk that queries and updates miss relevant records. In other words, nothing raises an error after the model changes, yet the stored data differs from what the model reports, and query results become inaccurate.
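
The inconsistency can be reproduced without a database. The sketch below is a simplified model of the behaviour, not mongoengine's implementation: `load` and `find` are hypothetical helpers. `load` fills in defaults the way a `Document` class does when it loads a record, while `find` matches only on what is actually stored, treating a missing field as `None` in the same way the `enabled=None` query in the cell above matches both documents.

```python
# Simplified model of the behaviour described above; `load` and `find`
# are illustrative helpers, not part of mongoengine or PyMongo.
DEFAULTS = {"enabled": True}

stored = [
    {"_id": 1, "name": "John Doe"},  # saved before 'enabled' existed
    {"_id": 2, "name": "Jane Doe"},  # saved before 'enabled' existed
]

def load(doc):
    """Return the document as the model sees it: defaults fill the gaps."""
    return {**DEFAULTS, **doc}

def find(docs, field, value):
    """Match on stored values only; a missing field compares equal to None."""
    return [d for d in docs if d.get(field) == value]

print(load(stored[0])["enabled"])          # True: the default is applied on load
print(len(find(stored, "enabled", True)))  # 0: nothing is stored, so no match
print(len(find(stored, "enabled", None)))  # 2: the field is missing everywhere
```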

When adding fields or modifying default values, you can run the migration as a standalone script, for example:

In [19]:
# Backfill the new field's default value into documents saved before the change
User.objects().update(enabled=True)

2
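
If some documents may already carry an explicit `enabled` value (for instance because the new application version has been writing to the collection), a narrower backfill avoids overwriting them. The following is a minimal sketch using PyMongo directly; `backfill_enabled` is a hypothetical helper name, and `collection` is assumed to be the object returned by `User._get_collection()`.

```python
def backfill_enabled(collection, default=True):
    """Set 'enabled' only on documents that do not have the field yet.

    `collection` is assumed to be a PyMongo collection, for example the
    result of User._get_collection(). Returns the number of modified documents.
    """
    result = collection.update_many(
        {"enabled": {"$exists": False}},  # only documents missing the field
        {"$set": {"enabled": default}},
    )
    return result.modified_count
```

It would be called as `backfill_enabled(User._get_collection())`. Running it a second time should modify nothing, which makes the script safe to re-run.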

### 2.10.2. Example 2: Inheritance change

Let’s consider the following example:

In [26]:
class Human(Document):
    name = StringField()
    meta = {"allow_inheritance": True, "db_alias": TEST_DB1 }

class Jedi(Human):
    dark_side = BooleanField()
    light_saber_color = StringField()

#Jedi(name="Darth Vader", dark_side=True, light_saber_color="red").save()
#Jedi(name="Obi Wan Kenobi", dark_side=False, light_saber_color="blue").save()

assert Human.objects.count() == 2
assert Jedi.objects.count() == 2

# Let's check how these documents got stored in mongodb
print(Jedi.objects.as_pymongo())
# [
#   {'_id': ObjectId('5fac4aaaf61d7fb06046e0f9'), '_cls': 'Human.Jedi', 'name': 'Darth Vader', 'dark_side': True, 'light_saber_color': 'red'},
#   {'_id': ObjectId('5fac4ac4f61d7fb06046e0fa'), '_cls': 'Human.Jedi', 'name': 'Obi Wan Kenobi', 'dark_side': False, 'light_saber_color': 'blue'}
# ]

[{'_id': ObjectId('640eb3c32a9e0414ab6332b2'), '_cls': 'Human.Jedi', 'name': 'Darth Vader', 'dark_side': True, 'light_saber_color': 'red'}, {'_id': ObjectId('640eb3c32a9e0414ab6332b3'), '_cls': 'Human.Jedi', 'name': 'Obi Wan Kenobi', 'dark_side': False, 'light_saber_color': 'blue'}]


In [22]:
# A parent-class document can be saved on its own, without the subclass attributes; the stored document also gets a `_cls` field
Human(name="张三").save()

<Human: Human object>

As you can observe, when you use inheritance, MongoEngine stores a field named `_cls` behind the scenes to keep track of the Document class.

Let’s now take the scenario where you want to refactor the inheritance schema so that:

- Jedi with `dark_side=False` become `GoodJedi` and Jedi with `dark_side=True` become `BadSith`
- the `dark_side` field is removed

The reasoning: `light_saber_color` already corresponds to the `dark_side` boolean, so the plan is to split `Jedi` into two classes, one for each value of `dark_side`, and drop the attribute itself.

We move to the following schemas:

In [30]:
class Human(Document):
    name = StringField()
    meta = {"allow_inheritance": True, "db_alias": TEST_DB1 }

# attribute 'dark_side' removed
class GoodJedi(Human):
    light_saber_color = StringField()

# new class
class BadSith(Human):
    light_saber_color = StringField()

MongoEngine doesn’t know about the change or how to map the new classes to the existing data, so if you don’t apply any migration you will observe strange behavior, as if the collection was suddenly empty. It does not automatically match the modified document models to the classes recorded in the database.

In [31]:
# As a reminder, the documents that we inserted
# have the _cls field = 'Human.Jedi'

# Following has no match
# because the query that is used behind the scene is
# filtering on {'_cls': 'Human.GoodJedi'}
assert GoodJedi.objects().count() == 0

# Following has also no match
# because it is filtering on {'_cls': {'$in': ('Human', 'Human.GoodJedi', 'Human.BadSith')}}
# which has no match
assert Human.objects.count() == 0
assert Human.objects.first() is None

# If we bypass MongoEngine and make use of underlying driver (PyMongo)
# we can see that the documents are there
humans_coll = Human._get_collection()
assert humans_coll.count_documents({}) == 2
# print first document
print(humans_coll.find_one())
# {'_id': ObjectId('5fac4aaaf61d7fb06046e0f9'), '_cls': 'Human.Jedi', 'name': 'Darth Vader', 'dark_side': True, 'light_saber_color': 'red'}

{'_id': ObjectId('640eb3c32a9e0414ab6332b2'), '_cls': 'Human.Jedi', 'name': 'Darth Vader', 'dark_side': True, 'light_saber_color': 'red'}


In [32]:
Human.objects.count()

0

As you can see, the first obvious problem is that we need to modify the `_cls` values based on the existing `dark_side` values of the documents. We update `_cls` by hand so that it matches the new document models:

In [33]:
humans_coll = Human._get_collection()
old_class = 'Human.Jedi'
good_jedi_class = 'Human.GoodJedi'
bad_sith_class = 'Human.BadSith'
humans_coll.update_many({'_cls': old_class, 'dark_side': False}, {'$set': {'_cls': good_jedi_class}})
humans_coll.update_many({'_cls': old_class, 'dark_side': True}, {'$set': {'_cls': bad_sith_class}})

<pymongo.results.UpdateResult at 0x7fb17b959800>

In [None]:
# Let’s now check if querying improved in MongoEngine:

assert GoodJedi.objects().count() == 1  # Hoorah!
assert BadSith.objects().count() == 1   # Hoorah!
assert Human.objects.count() == 2       # Hoorah!

# let's now check that documents load correctly
jedi = GoodJedi.objects().first()
# raises FieldDoesNotExist: The fields "{'dark_side'}" do not exist on the document "Human.GoodJedi"
# Updating `_cls` alone is not enough: the stored 'dark_side' field no longer exists on the model, so loading the document fails.


In fact we only took care of renaming the `_cls` values, but we haven’t removed the `dark_side` field, which no longer exists on the `GoodJedi` and `BadSith` models. Let’s remove the field from the collection:

In [36]:
# Remove the dark_side field from every document with the $unset operator
humans_coll = Human._get_collection()
humans_coll.update_many({}, {'$unset': {'dark_side': 1}})

<pymongo.results.UpdateResult at 0x7fb189108340>

In [39]:
GoodJedi.objects().as_pymongo()

[{'_id': ObjectId('640eb3c32a9e0414ab6332b3'), '_cls': 'Human.GoodJedi', 'name': 'Obi Wan Kenobi', 'light_saber_color': 'blue'}]
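
Before relying on the migrated data, it can help to confirm that nothing was left behind. The sketch below counts documents that still carry the old `_cls` value or the removed field; `count_unmigrated` is a hypothetical helper and assumes a PyMongo collection such as the one returned by `Human._get_collection()`.

```python
def count_unmigrated(collection, old_class="Human.Jedi", removed_field="dark_side"):
    """Count documents still using the old class name or the removed field."""
    leftover_filter = {
        "$or": [
            {"_cls": old_class},                 # class name not yet renamed
            {removed_field: {"$exists": True}},  # field not yet unset
        ]
    }
    return collection.count_documents(leftover_filter)
```

A result of 0 for `count_unmigrated(Human._get_collection())` suggests that both migration steps reached every document.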

In [None]:
# Do the changes above in a single step.
# We did this migration in 2 different steps for the sake of example, but it could
# have been combined with the migration of the _cls fields:

humans_coll.update_many(
    {'_cls': old_class, 'dark_side': False},
    {
        '$set': {'_cls': good_jedi_class},
        '$unset': {'dark_side': 1}
    }
)
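
The combined update above covers only the documents with `dark_side=False`. A minimal sketch that applies the same combined `$set`/`$unset` to both target classes could look like this; `migrate_jedi` and `CLASS_BY_DARK_SIDE` are hypothetical names, and `collection` is assumed to be the PyMongo collection returned by `Human._get_collection()`. Documents whose `dark_side` is missing or `None` match neither filter and would need separate handling.

```python
# Maps the old boolean flag to the new '_cls' value (class names from the example).
CLASS_BY_DARK_SIDE = {
    False: "Human.GoodJedi",
    True: "Human.BadSith",
}

def migrate_jedi(collection, old_class="Human.Jedi"):
    """Rename '_cls' and drop 'dark_side' with one update per target class.

    Returns a dict mapping each new class name to the number of modified documents.
    """
    modified = {}
    for dark_side, new_class in CLASS_BY_DARK_SIDE.items():
        result = collection.update_many(
            {"_cls": old_class, "dark_side": dark_side},
            {"$set": {"_cls": new_class}, "$unset": {"dark_side": 1}},
        )
        modified[new_class] = result.modified_count
    return modified
```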

### 2.10.3. Example 4: Index removal

If you remove an index from your Document class, or remove an indexed Field from your Document class, the corresponding index stays in the database and you’ll need to drop it manually. MongoEngine will not do that for you.

The way to deal with this case is to identify the name of the index to drop with `index_information()`, and then drop it with `drop_index()`.

Let’s for instance assume that you start with the following Document class: 

In [41]:
class User2(Document):
    name = StringField(index=True)

    meta = {"indexes": ["name"], "db_alias": TEST_DB1}



In [43]:
User2(name="John Doe").save()

<User2: User2 object>

As soon as you start interacting with the Document collection (when `.save()` is called in this case), MongoEngine creates the following indexes:

In [45]:
print(User2._get_collection().index_information())
# {
#  '_id_': {'key': [('_id', 1)], 'v': 2},
#  'name_1': {'background': False, 'key': [('name', 1)], 'v': 2},
# }

{'_id_': {'v': 2, 'key': [('_id', 1)]}, 'name_1': {'v': 2, 'key': [('name', 1)], 'background': False}}


Thus we have the default index on `_id` and `name_1`, which is our custom index. If you removed the `name` field or its index, you would have to call:

In [46]:
User2._get_collection().drop_index("name_1")
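
When several indexes have accumulated over time, the identify-then-drop step can be scripted. The following is a minimal sketch, not a MongoEngine feature: `drop_stale_indexes` is a hypothetical helper that compares the names reported by PyMongo's `index_information()` with a list of names you still expect, and calls `drop_index()` on the rest.

```python
def drop_stale_indexes(collection, expected_names):
    """Drop every index whose name is not in `expected_names`.

    The default '_id_' index is always kept. Returns the names that were dropped.
    """
    keep = set(expected_names) | {"_id_"}
    stale = [name for name in collection.index_information() if name not in keep]
    for name in stale:
        collection.drop_index(name)
    return stale
```

For the model above, `drop_stale_indexes(User2._get_collection(), expected_names=[])` would drop `name_1`. Since dropping an index cannot be undone without rebuilding it, it is worth printing the stale names and reviewing them before running this against production data.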

> When adding new fields or new indexes, MongoEngine will take care of creating them (unless `auto_create_index` is disabled).

### 2.10.4. Recommendations

- Write migration scripts whenever you change the model schemas.
- Using `DynamicDocument` or `meta = {"strict": False}` may help to avoid some migrations, or to let two (or more) versions of your application co-exist.
- Write post-processing checks to verify that the migration scripts worked. See below:

#### Test with `meta = {"strict": True}`

In [49]:
# version 1: define the User3 document model
class User3(Document):
    name = StringField(index=True)

    meta = {
        "indexes": ["name"], 
        "db_alias": TEST_DB1,
        "strict": True
    }

In [50]:
User3(name="小刚").save()

<User3: User3 object>

In [51]:
# version 2: adds the gender field
class User3(Document):
    name = StringField(index=True)
    gender = StringField()
    meta = {
        "indexes": ["name"], 
        "db_alias": TEST_DB1,
        "strict": True
    }

In [55]:
User3(name="小明", gender="男").save()

<User3: User3 object>

In [60]:
check_documents(User3, sample_size=2)

### 2.10.5. Post-processing checks

The following recipe can be used to sanity check a Document collection after you have applied a migration. It makes no assumption about what was migrated: it fetches `sample_size` objects at random (1000 in the call below) and runs some quick checks to make sure each document matches the current field definitions. As it is, it fails on the first error it encounters, but this can be adapted to your needs.

In [47]:
# Original version of the recipe: getattr(doc, field) does not detect a missing field
import logging

LOG = logging.getLogger(__name__)

def get_random_oids(collection, sample_size):
    pipeline = [{"$project": {'_id': 1}}, {"$sample": {"size": sample_size}}]
    return [s['_id'] for s in collection.aggregate(pipeline)]

def get_random_documents(DocCls, sample_size):
    doc_collection = DocCls._get_collection()
    random_oids = get_random_oids(doc_collection, sample_size)
    return DocCls.objects(id__in=random_oids)

def check_documents(DocCls, sample_size):
    for doc in get_random_documents(DocCls, sample_size):
        # general validation (types and values)
        doc.validate()

        # load all subfields,
        # this may trigger additional queries if you have ReferenceFields
        # so it may be slow
        for field in doc._fields:
            try:
                getattr(doc, field)
            except Exception:
                LOG.warning(f"Could not load field {field} in Document {doc.id}")
                raise

check_documents(Human, sample_size=1000)
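
If stopping at the first failure is too strict, the same checks can gather every problem into a report instead. This is a minimal sketch under that assumption; `collect_check_errors` is a hypothetical helper that accepts any iterable of documents, for example the result of `get_random_documents()` above.

```python
def collect_check_errors(docs):
    """Run the validation and field-loading checks, collecting failures.

    Returns a list of (document id, field name or None, exception) tuples;
    the field name is None when doc.validate() itself failed.
    """
    errors = []
    for doc in docs:
        try:
            doc.validate()
        except Exception as exc:  # broad on purpose: report instead of stopping
            errors.append((doc.id, None, exc))
        for field in doc._fields:
            try:
                getattr(doc, field)
            except Exception as exc:
                errors.append((doc.id, field, exc))
    return errors
```

An empty list means every sampled document passed; otherwise each tuple points at the document and field to inspect.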

In [76]:
# Modified version: a field whose value is None is reported as a failure
def get_random_oids(collection, sample_size):
    pipeline = [{"$project": {'_id': 1}}, {"$sample": {"size": sample_size}}]
    return [s['_id'] for s in collection.aggregate(pipeline)]

def get_random_documents(DocCls, sample_size):
    doc_collection = DocCls._get_collection()
    random_oids = get_random_oids(doc_collection, sample_size)
    return DocCls.objects(id__in=random_oids)

def check_documents(DocCls, sample_size):
    for doc in get_random_documents(DocCls, sample_size):
        # general validation (types and values)
        doc.validate()

        # load all subfields,
        # this may trigger additional queries if you have ReferenceFields
        # so it may be slow
        for field in doc._fields:
            if getattr(doc, field) is None:
                # stop at the first field that is None
                raise ValueError(f"Could not load field '{field}' in Document {doc.id}")


In [79]:
# Version that checks every document instead of a random sample
def check_documents_all(DocCls):
    for doc in DocCls.objects():
        # general validation (types and values)
        doc.validate()
        for field in doc._fields:
            if getattr(doc, field) is None:
                # stop at the first field that is None
                raise ValueError(f"Could not load field '{field}' in Document {doc.id}")


In [80]:
check_documents_all(User3)

ValueError: Could not load field 'gender' in Document 640ed0782a9e0414ab6332bb

In [81]:
check_documents(User3, 1000)

ValueError: Could not load field 'gender' in Document 640ed0782a9e0414ab6332bb