### Idiomatic Python: Merging Dictionaries

Sometimes we need to merge several dictionaries into one.

A typical example is application configuration.

We may have several sets of settings (how they are serialized and where they come from does not matter here) that end up as separate dictionaries in our app: default configs, user-specific overrides, environment-variable-based overrides, command-line overrides, and so on.

Here we will work with three of them:

In [26]:
default_settings = {
    "db_host": "localhost",
    "port": 3306,
    "user_name": None,
    "password": None,
    "connection_timeout": 10,
    "query_timeout": 30,
}

user_settings = {
    "port": 9906,
    "connection_timeout": 20,
}

env_vars = {
    "user_name": "test",
    "password": "some-secret",
}

What we want here is a single dictionary where `env_vars` overrides settings in `user_settings`, which in turn override settings in `default_settings`.

Let's look at various approaches, from least desirable to most pythonic.

The first way we could do this is to make a copy of `default_settings` and then apply the overrides in order of increasing priority:

In [2]:
settings = default_settings.copy()
settings.update(user_settings)
settings.update(env_vars)

settings

{'db_host': 'localhost',
 'port': 9906,
 'user_name': 'test',
 'password': 'some-secret',
 'connection_timeout': 20,
 'query_timeout': 30}

This works, but the code can be improved by using the more pythonic dictionary unpacking. As with `update`, later dictionaries override earlier ones:

In [3]:
settings = {**default_settings, **user_settings, **env_vars}

settings

{'db_host': 'localhost',
 'port': 9906,
 'user_name': 'test',
 'password': 'some-secret',
 'connection_timeout': 20,
 'query_timeout': 30}

We essentially did the same thing (created a new dictionary with the merged values), but the code is much cleaner.

This would be fine: it is quite pythonic, and it works.
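On Python 3.9 or later, the dictionary merge operator `|` is another way to spell the same merge. The sketch below uses trimmed-down copies of the settings dictionaries, purely for illustration; like unpacking, it builds a new dictionary and leaves the operands untouched.

```python
# Requires Python 3.9 or later (dict merge operator).
# Trimmed-down versions of the settings dictionaries, for illustration only.
default_settings = {"db_host": "localhost", "port": 3306, "user_name": None}
user_settings = {"port": 9906}
env_vars = {"user_name": "test"}

# The right-hand operand wins on duplicate keys, so priority increases left to right
settings = default_settings | user_settings | env_vars

print(settings)
# {'db_host': 'localhost', 'port': 9906, 'user_name': 'test'}

# The operands themselves are not modified
print(default_settings["port"])
# 3306
```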

However, we ended up copying every key and value into a new dictionary to hold the combined settings, which is something we need not do.

We can avoid that copy by using the `ChainMap` class in the `collections` module. A `ChainMap` is itself a new object, but it only holds references to the original dictionaries.


In [4]:
from collections import ChainMap

`ChainMap` takes a variable number of dictionaries (maps) as arguments.

When we look up a key, it looks for that key in the first map and returns the corresponding value if it finds it; otherwise it moves on to the second map and tries there, and so on down the chain. If no map contains the key, we get a `KeyError`.
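The lookup rule can be sketched in a few lines. This is a simplified model and not the actual `ChainMap` implementation; `chain_lookup`, `first` and `second` are hypothetical names used only for this illustration.

```python
from collections import ChainMap


def chain_lookup(maps, key):
    """Return the value for key from the first map that contains it."""
    for mapping in maps:
        if key in mapping:
            return mapping[key]
    # No map had the key
    raise KeyError(key)


# Hypothetical maps, for illustration only
first = {"a": 1}
second = {"a": 10, "b": 20}

chain = ChainMap(first, second)

# The sketch and the real ChainMap agree on which value wins
print(chain_lookup([first, second], "a"), chain["a"])
# 1 1
print(chain_lookup([first, second], "b"), chain["b"])
# 20 20
```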

We can also modify a `ChainMap`, but inserts, updates, and deletes only affect the first map in the chain. Depending on your situation that may not be what you need, but in this scenario we don't really need to modify our settings (usually we read them in once and keep using them as is; app state, if needed, would be stored elsewhere anyway).

Because a lookup stops at the first map that contains the key, we need to be careful about the order in which we specify the maps in the chain: we want most specific to least specific. Since we want `user_settings` to override `default_settings`, `user_settings` should come before `default_settings` in the chain, and `env_vars` should come before both. Note that this is the reverse of the order we used for dictionary unpacking.
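To make the effect of ordering concrete, here is a small sketch with two hypothetical dictionaries (`overrides` and `defaults` are illustrative names, not part of the settings above). Swapping the arguments changes which value wins.

```python
from collections import ChainMap

# Hypothetical dictionaries, for illustration only
overrides = {"port": 9906}
defaults = {"port": 3306, "db_host": "localhost"}

most_specific_first = ChainMap(overrides, defaults)
least_specific_first = ChainMap(defaults, overrides)

print(most_specific_first["port"])
# 9906
print(least_specific_first["port"])
# 3306

# Same result as unpacking, where the order is reversed (last one wins)
print(dict(most_specific_first) == {**defaults, **overrides})
# True
```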

In [5]:
settings = ChainMap(env_vars, user_settings, default_settings)

In [6]:
settings

ChainMap({'user_name': 'test', 'password': 'some-secret'}, {'port': 9906, 'connection_timeout': 20}, {'db_host': 'localhost', 'port': 3306, 'user_name': None, 'password': None, 'connection_timeout': 10, 'query_timeout': 30})
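The maps shown in that representation are the original dictionaries themselves, not copies. A minimal sketch, using hypothetical `overrides` and `defaults` dictionaries, shows the practical consequence: a later change to an underlying dictionary is visible through the chain, but not in a dictionary built by unpacking.

```python
from collections import ChainMap

# Hypothetical dictionaries, for illustration only
overrides = {"port": 9906}
defaults = {"port": 3306, "query_timeout": 30}

chain = ChainMap(overrides, defaults)
merged = {**defaults, **overrides}

# The chain stores the very same dict objects, not copies
print(chain.maps[0] is overrides)
# True
print(chain.maps[1] is defaults)
# True

# Change an underlying dictionary after the chain was built
defaults["query_timeout"] = 60

print(chain["query_timeout"])   # the chain sees the change
# 60
print(merged["query_timeout"])  # the merged copy does not
# 30
```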

We can look up individual keys in the chain:

In [7]:
settings['password']

'some-secret'

In [8]:
settings['query_timeout']

30

We could even, if we really wanted to, convert the chain to a regular dictionary:

In [9]:
dict(settings)

{'db_host': 'localhost',
 'port': 9906,
 'user_name': 'test',
 'password': 'some-secret',
 'connection_timeout': 20,
 'query_timeout': 30}

This gives the same result as the unpacking approach we used earlier:

In [10]:
{**default_settings, **user_settings, **env_vars}

{'db_host': 'localhost',
 'port': 9906,
 'user_name': 'test',
 'password': 'some-secret',
 'connection_timeout': 20,
 'query_timeout': 30}

As mentioned, if we perform an update, it will affect the first map in the chain only.

We can see the individual maps in the chain through the `maps` attribute:

In [11]:
settings.maps

[{'user_name': 'test', 'password': 'some-secret'},
 {'port': 9906, 'connection_timeout': 20},
 {'db_host': 'localhost',
  'port': 3306,
  'user_name': None,
  'password': None,
  'connection_timeout': 10,
  'query_timeout': 30}]

Let's insert a new key, and update `query_timeout`, which so far exists only in `default_settings`:

In [12]:
settings['new_key'] = 'test'
settings['query_timeout'] = 100

In [13]:
settings

ChainMap({'user_name': 'test', 'password': 'some-secret', 'new_key': 'test', 'query_timeout': 100}, {'port': 9906, 'connection_timeout': 20}, {'db_host': 'localhost', 'port': 3306, 'user_name': None, 'password': None, 'connection_timeout': 10, 'query_timeout': 30})

In [14]:
settings.maps

[{'user_name': 'test',
  'password': 'some-secret',
  'new_key': 'test',
  'query_timeout': 100},
 {'port': 9906, 'connection_timeout': 20},
 {'db_host': 'localhost',
  'port': 3306,
  'user_name': None,
  'password': None,
  'connection_timeout': 10,
  'query_timeout': 30}]

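As you can see, only the first map was modified. The `query_timeout` update was written to the first map as a new entry, where it shadows the value of 30 that is still present in `default_settings`.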

How about deleting an entry?

In [15]:
del settings['password']

In [16]:
settings.maps

[{'user_name': 'test', 'new_key': 'test', 'query_timeout': 100},
 {'port': 9906, 'connection_timeout': 20},
 {'db_host': 'localhost',
  'port': 3306,
  'user_name': None,
  'password': None,
  'connection_timeout': 10,
  'query_timeout': 30}]

So that removed the entry from the first map. A lookup of `password` now falls through to the `None` value in `default_settings`, and our chain looks like this:

In [17]:
dict(settings)

{'db_host': 'localhost',
 'port': 9906,
 'user_name': 'test',
 'password': None,
 'connection_timeout': 20,
 'query_timeout': 100,
 'new_key': 'test'}

But what about deleting a key that is **not** in the first map?

In [18]:
settings.maps

[{'user_name': 'test', 'new_key': 'test', 'query_timeout': 100},
 {'port': 9906, 'connection_timeout': 20},
 {'db_host': 'localhost',
  'port': 3306,
  'user_name': None,
  'password': None,
  'connection_timeout': 10,
  'query_timeout': 30}]

Let's try to delete `port`:

In [19]:
try:
    del settings['port']
except KeyError as ex:
    print("KeyError", ex)

KeyError "Key not found in the first mapping: 'port'"


If we really wanted to delete a key from the entire chain, we would have to work a bit harder and remove it from every map ourselves. If you really need to manipulate the keys in the chain, then maybe a `ChainMap` is not the solution, and it is probably better to go back to a normal dictionary.

In [20]:
for map_ in settings.maps:
    map_.pop('port', None)

In [21]:
dict(settings)

{'db_host': 'localhost',
 'user_name': 'test',
 'password': None,
 'connection_timeout': 20,
 'query_timeout': 100,
 'new_key': 'test'}

In [22]:
settings.maps

[{'user_name': 'test', 'new_key': 'test', 'query_timeout': 100},
 {'connection_timeout': 20},
 {'db_host': 'localhost',
  'user_name': None,
  'password': None,
  'connection_timeout': 10,
  'query_timeout': 30}]

As you can see, `port` has now been removed entirely.

Did these operations affect our original dictionaries?

In [23]:
default_settings

{'db_host': 'localhost',
 'user_name': None,
 'password': None,
 'connection_timeout': 10,
 'query_timeout': 30}

In [24]:
user_settings

{'connection_timeout': 20}

In [25]:
env_vars

{'user_name': 'test', 'new_key': 'test', 'query_timeout': 100}

The answer is **yes**!

Notice that `port` is gone from both `default_settings` and `user_settings`, while `env_vars` has lost `password` and gained `new_key` and `query_timeout`.

Remember, the whole goal of using `ChainMap` was to avoid copying the data into a new dictionary, so it makes sense that mutating the chain map mutates the dictionaries it is based on.
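If the original dictionaries must stay untouched, one option (a sketch, not something used above) is to put an empty dictionary first in the chain so that it absorbs all writes. The `overrides` and `defaults` dictionaries here are hypothetical; `new_child()` is the `ChainMap` method that does the same thing for an existing chain.

```python
from collections import ChainMap

# Hypothetical dictionaries, for illustration only
overrides = {"port": 9906}
defaults = {"port": 3306, "query_timeout": 30}

# The empty dict in first position receives every insert and update
settings = ChainMap({}, overrides, defaults)
settings["query_timeout"] = 100
settings["new_key"] = "test"

print(settings["query_timeout"])
# 100
print(settings.maps[0])
# {'query_timeout': 100, 'new_key': 'test'}

# The original dictionaries are unchanged
print(overrides)
# {'port': 9906}
print(defaults)
# {'port': 3306, 'query_timeout': 30}

# Equivalent construction starting from an existing chain
same_idea = ChainMap(overrides, defaults).new_child()
print(same_idea.maps[0])
# {}
```

Deleting a key that only exists further down the chain still raises a `KeyError`, since deletes are also limited to the first map.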

So, bottom line: there are two ways of merging dictionaries. One makes a copy of all the data (`update` or dictionary unpacking), and the other, using `ChainMap`, does not. Pick whichever one is most applicable to your particular use case.