#### Legal Notice
<span style="font-family: 'Monospace'; font-size: 0.6em;">
The BHC Complexity Toolkit is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
<br/>
The BHC Complexity Toolkit is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for more details.
<br/>
You should have received a copy of the GNU General Public License along with the BHC Complexity Toolkit.  If not, see <https://www.gnu.org/licenses/>.
<br/><br/>
Copyright 2019, Mark D. Flood
<br/>
Author: Mark D. Flood
<br/>
Last revision: 22-Jun-2019
</span>

# Bank Holding Company (BHC) organizational complexity 
This notebook presents Python source code implementing a method for analyzing the topological complexity of BHC ownership and control hierarchies. Each BHC hierarchy has the structure of a [directed graph](https://en.wikipedia.org/wiki/Directed_graph), where: 

 *   _Nodes_ represent BHC legal entities and their subsidiary firms, and 
 *   _Edges_ represent ownership/control relationships among the nodes

Each edge is directed _from_ the controlling firm (the "parent") _to_ the subsidiary ("offspring") firm. 
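A minimal sketch of this orientation uses a toy hierarchy stored as a NetworkX `DiGraph`. The node IDs and entity names are invented for illustration. Because edges run from parent to offspring, the top-tier BHC is the unique node with no incoming edge, and every subsidiary is one of its descendants.

```python
import networkx as nx

# Hypothetical toy hierarchy: integer keys stand in for ID_RSSD values (invented).
bhc = nx.DiGraph()
bhc.add_node(1, name="Top-tier BHC")
bhc.add_node(2, name="National bank")
bhc.add_node(3, name="Broker-dealer")
bhc.add_node(4, name="Bank operating subsidiary")

# Each edge points FROM the controlling parent TO the offspring.
bhc.add_edges_from([(1, 2), (1, 3), (2, 4)])

# The top-tier holding company is the only node without a parent.
roots = [n for n, indeg in bhc.in_degree() if indeg == 0]

# Every other entity is reachable by following edges down from the top.
controlled = nx.descendants(bhc, roots[0])

print(roots)               # [1]
print(sorted(controlled))  # [2, 3, 4]
```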

## Data
The _Federal Reserve_ collects, via its form FR Y-10, the input data required to implement this methodology:

 *   Form FR Y-10: [**Report of Changes in Organizational Structure**](https://www.federalreserve.gov/apps/reportforms/reportdetail.aspx?sOoYJ+5BzDaGhRRQo6EFJQ==)
 
     _This report provides data on organizational structural changes for the reportable companies listed in the respondent panel section below. There are eight schedules: Banking; Savings and Loan; Nonbanking; Merger; 4(k); Domestic Branch; Foreign Branches of U.S. Banking Organizations; and Branch, Agency, and Representative Office._
     
The data form the core of the _Federal Reserve's_ National Information Center (NIC) database. The interagency _Federal Financial Institutions Examination Council_ (FFIEC) makes the NIC data available via its [**NIC Public Website** (NPW)](https://www.ffiec.gov/npw):

 *   NIC National Information Center: [**Data Download**](https://www.ffiec.gov/npw/FinancialReport/DataDownload)
 
     _Tables of structure information for select banks and institutions for which the Federal Reserve has a supervisory, regulatory, or research interest_

The Python code assumes you have downloaded the (zipped) XML version of five files:

 *   **Attributes - Active**
 *   **Attributes - Branches**
 *   **Attributes - Closed**
 *   **Relationships**
 *   **Transformations**

You should download all five files to capture an internally consistent snapshot of the data at a point in time. The FFIEC updates the data at least quarterly. 

The FFIEC now ([since 2018](https://www.ffiec.gov/npw/Home/About)) also provides these files in comma-separated-value (CSV) format; the earlier version of the NPW provided XML only. If you choose to download the CSV files instead of XML, then you should skip Step 2 below. 

The data dictionary describing the NIC dataset (version 2.0, dated July 2018) is available here:

 *   [**Bulk Data Download - Data Dictionary and Reference Guide**](https://www.ffiec.gov/npw/StaticData/DataDownload/NPW%20Data%20Dictionary.pdf)
 
     _The Bulk Data Download feature was developed in response to growing demand from the public for data in bulk format. Data being provided are considered non-confidential, public data. As our first iteration of this feature, we are releasing the tables related to attributes, relationships,and transformations. Details of these tables can be found in the subsequent sections of this Data Dictionary._
     
## Mathematical details of complexity measurement
The formal methodology implemented by the Python modules is described in detail in the following paper:

 *   M. Flood, D. Kenett, R. Lumsdaine, and J. Simon (2017), "The Complexity of Bank Holding Companies: A Topological Approach," [Working Paper 23755](http://www.nber.org/papers/w23755), _National Bureau of Economic Research_, August. 
 
     _Large bank holding companies (BHCs) are structured into intricate ownership hierarchies involving hundreds or even thousands of legal entities. Each subsidiary in these hierarchies has its own legal form, assets, liabilities, managerial goals, and supervisory authorities. In the event of BHC default or insolvency, regulators may need to resolve the BHC and its constituent entities. Each entity individually will require some mix of cash infusion, outside purchase, consolidation with other subsidiaries, legal guarantees, and outright dissolution. The subsidiaries are not resolved in isolation, of course, but in the context of resolving the consolidated BHC at the top of the hierarchy. The number, diversity, and distribution of subsidiaries within the hierarchy can therefore significantly ease or complicate the resolution process. We propose a set of related metrics intended to assess the complexity of the BHC ownership graph. These proposed metrics focus on the graph quotient relative to certain well identified partitions on the set of subsidiaries, such as charter type and regulatory jurisdiction. The intended measures are mathematically grounded, intuitively sensible, and easy to implement. We illustrate the process with a case study of one large U.S. BHC._
     
An alternate version of this paper is available from the Office of Financial Research:

 *   M. Flood, D. Kenett, R. Lumsdaine, and J. Simon (2017), "The Complexity of Bank Holding Companies: A New Measurement Approach," [OFR Working Paper 17-03](https://www.financialresearch.gov/working-papers/2017/09/29/complexity-of-bank-holding-companies/), _Office of Financial Research_, September. 
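The papers build their metrics on the quotient of the ownership graph relative to a partition of the subsidiaries, such as charter type. As a rough illustration only (this is not the papers' metric definitions), NetworkX's `quotient_graph` can collapse a toy hierarchy by a hypothetical `charter` node attribute. The attribute name and its values are assumptions.

```python
import networkx as nx

# Toy hierarchy with a hypothetical 'charter' attribute on each entity.
G = nx.DiGraph()
G.add_nodes_from([
    (1, {"charter": "BHC"}),
    (2, {"charter": "bank"}),
    (3, {"charter": "bank"}),
    (4, {"charter": "broker-dealer"}),
    (5, {"charter": "broker-dealer"}),
])
G.add_edges_from([(1, 2), (1, 4), (2, 3), (4, 5)])

# Partition the nodes into blocks of entities sharing the same charter type.
blocks = {}
for node, charter in G.nodes(data="charter"):
    blocks.setdefault(charter, set()).add(node)
partition = list(blocks.values())

# Collapse each block to one node. A directed edge joins two distinct blocks
# if some member of the first controls some member of the second; edges that
# stay inside a block (e.g. bank -> bank) do not appear in the quotient.
Q = nx.quotient_graph(G, partition)

print(G.number_of_nodes(), Q.number_of_nodes())  # 5 3
```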

## Software

### Setup

The software is bundled in a Python 3 package, **bhccpx**. The software consists of a set of Python modules, designed to run in the following sequence:

 1.   \[**www2dat**\] Create local directories and download data from the _National Information Center_ (NIC) and FDIC websites
 1.   \[**dat2csv**\] Unzip and parse (as required) the downloads into CSV versions of the same data
 1.   \[**csv2cch**\] Create and cache Python data objects from the data
 1.   \[**cch2jbf**\] Create various outputs specifically for the _Journal of Banking and Finance_ submission
 
This sequence is encoded in the notebook cells below, with each cell invoking one of the Python modules indicated in square brackets above. You can execute the modules in any of several alternative modes:

 *   Through the steps (cells) in this notebook
 *   By invoking the modules from the command line
 *   By running the modules in an interactive Python shell
 
The mode of execution should not affect the outcome of the calculations. 

### Dependencies
The software requires Python 3 to run correctly. Because the code uses f-strings, version 3.6 or higher is needed:

 *   [Python 3](https://www.python.org/downloads/)
 *   [Installing Python](https://docs.python-guide.org/starting/installation/)

In addition, the software uses the following supplementary modules:

 *   networkx -- v2.3 or higher
 *   numpy -- v1.16 or higher
 *   pandas -- v0.24 or higher
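A quick way to confirm that an environment meets these minimums before running the execution steps is a small check like the following. This is an illustrative sketch, not part of **bhccpx**: it reads each package's `__version__` attribute, compares only the major and minor numbers, and adds a Python 3.6 check because f-strings require it.

```python
import importlib
import sys

# Minimum versions from the list above, keyed by import name.
MINIMUM_VERSIONS = {"networkx": (2, 3), "numpy": (1, 16), "pandas": (0, 24)}


def version_tuple(version):
    """Convert e.g. '1.16.4' or '2.3rc1' to a (major, minor) tuple of ints."""
    parts = []
    for piece in version.split(".")[:2]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def check_dependencies(minimums=MINIMUM_VERSIONS):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    if sys.version_info < (3, 6):
        problems.append("Python: 3.6 or higher is needed")
    for name, minimum in minimums.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            problems.append(name + ": not installed")
            continue
        if version_tuple(module.__version__) < minimum:
            needed = ".".join(str(n) for n in minimum)
            problems.append(f"{name}: found {module.__version__}, need {needed}+")
    return problems


if __name__ == "__main__":
    print(check_dependencies() or "All dependency checks passed")
```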



## Execution sequence

### Step 0 - Configuration
You can control the execution process through a collection of configuration parameters. These parameters reside in the **bhc_complex.ini** configuration file, as key-value pairs. Adjusting these parameters, rather than editing the (tested) Python code, is the intended way to customize a run.

The configuration file is divided into _sections_, indicated by labels in square brackets. For example, this snippet shows the beginning of the **[DEFAULT]** section in the configuration file:

   ***
```ini
# =============================================================================
# ====== Default configuration parameters, for all modules ====================
# =============================================================================
[DEFAULT]

# Default verbosity level
verbose=True

# Default high verbosity level
veryverbose=False
```
   ***

All parameter values in the **bhc_complex.ini** file are treated as strings. For simple (one-line) parameter settings, no quotes are required. For example, in the preceding snippet, the **verbose** parameter is assigned the string value, '_True_'. The Python program converts this string into the equivalent boolean value at runtime. 
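The string-to-boolean conversion can be illustrated with Python's standard `configparser` module. This sketch assumes the **.ini** file is read with `configparser` (the package's own loader may convert values differently); `getboolean()` is one standard way to turn the stored string into a `bool`.

```python
import configparser

# A fragment mirroring the [DEFAULT] section shown above.
INI_TEXT = """
[DEFAULT]
# Default verbosity level
verbose=True

# Default high verbosity level
veryverbose=False
"""

config = configparser.ConfigParser()
config.read_string(INI_TEXT)

# Every value is stored as a string ...
raw = config["DEFAULT"]["verbose"]  # 'True'

# ... and must be converted explicitly. getboolean() accepts
# true/false, yes/no, on/off and 1/0 (case-insensitive).
verbose = config["DEFAULT"].getboolean("verbose")          # True
veryverbose = config["DEFAULT"].getboolean("veryverbose")  # False
```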

Lists and dictionaries in the **bhc_complex.ini** file are handled slightly differently. First, these parameter values typically span multiple consecutive lines in the configuration file. The closing bracket or brace that ends the list or dictionary must be indented by at least one character; if it is not indented, it will be treated as the start of a new parameter key. Second, the resulting string is evaluated as standard Python, so string values _within_ lists and dictionaries must be enclosed in quotes. For example, in the following snippet, the **bhclist** parameter is a list of two integers (ID_RSSDs), while **asoflist** is a list of four strings (note the single quotes indicating string values).

   ***
```ini
# =============================================================================
# ========= Configuration parameters for sys2bhc.py ===========================
# =============================================================================
[sys2bhc]

bhclist=[
    1073551, # Wachovia
    1120754, # Wells Fargo
 ]
 
asoflist=[
    '2006Q4', 
    '2008Q3', 
    '2008Q4', 
    '2010Q4', 
 ]
```
   ***
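Under the same assumption that the file is read with `configparser`, this sketch shows how an indented multi-line list arrives as a single string and is then turned into a Python object. It uses `ast.literal_eval`, which accepts only Python literals; whether **bhccpx** itself uses `eval` or `literal_eval` is not stated here.

```python
import ast
import configparser

# The [sys2bhc] fragment from the snippet above.
INI_TEXT = """
[sys2bhc]

bhclist=[
    1073551, # Wachovia
    1120754, # Wells Fargo
 ]

asoflist=[
    '2006Q4',
    '2008Q3',
    '2008Q4',
    '2010Q4',
 ]
"""

config = configparser.ConfigParser()
config.read_string(INI_TEXT)

# Indented lines are continuation lines, so each value is one multi-line
# string. Inline '# ...' remarks survive (by default configparser strips only
# whole-line comments), but they are legal comments in Python source.
raw = config["sys2bhc"]["bhclist"]

# literal_eval evaluates literals only, a safer choice than eval().
bhclist = ast.literal_eval(raw)
asoflist = ast.literal_eval(config["sys2bhc"]["asoflist"])

print(bhclist)   # [1073551, 1120754]
print(asoflist)  # ['2006Q4', '2008Q3', '2008Q4', '2010Q4']
```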

You can also override the parameter values in the **bhc_complex.ini** file at runtime by modifying the configuration after you have read it from disk. The configuration indexes the parameters first by _section_ and then by _key_. For example, to tweak the **indir** directory location, you might submit the following code: 

   ***
```python
import bhc_util as UTIL
CONFIG = UTIL.read_config()
CONFIG['zip2xml']['indir']='../../data/2016redo'
```
   ***
Be sure to enclose all (simple) parameter values within quotes when overriding. 
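The reason for the quotes can be seen with a stand-in configuration object. This sketch assumes `CONFIG` behaves like a standard `configparser.ConfigParser` (not confirmed here), which accepts only string values on assignment and raises `TypeError` otherwise.

```python
import configparser

# Stand-in for CONFIG, assuming UTIL.read_config() returns a ConfigParser.
CONFIG = configparser.ConfigParser()
CONFIG.read_string("[DEFAULT]\nverbose=True\n\n[sys2bhc]\nbhclist=[1073551]\n")

# Quoted overrides are accepted, including list text for later evaluation.
CONFIG["DEFAULT"]["verbose"] = "False"
CONFIG["sys2bhc"]["bhclist"] = "[1073551, 1120754]"

# An unquoted Python value is rejected: ConfigParser stores only strings.
rejected = False
try:
    CONFIG["DEFAULT"]["verbose"] = False
except TypeError:
    rejected = True

print(rejected, CONFIG["DEFAULT"].getboolean("verbose"))  # True False
```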

```python
# STEP 0 - Configuration
#
# The ability to reload modules is useful
from importlib import reload

# Add the path to Python code for BHC complexity
import sys
sys.path.append('./bhccpx/bhccpx/')

# Base configuration is stored as key-value pairs in the .ini configuration file
import bhc_util as UTIL
CONFIG = UTIL.read_config()

# Optional last-minute overrides of choices within CONFIG (uncomment to use):
#CONFIG['DEFAULT']['datadir']='./data/2016redo'
#CONFIG['DEFAULT']['cachedir']='./data/2016redo/cache'

#CONFIG['www2dat']['nic_dir']='./data/NIC'
#CONFIG['www2dat']['nic_subdir']='2016redo'
```

### Step 1 - Downloading the data
This step makes local target directories and downloads into them the necessary data from public websites:

 * FFIEC National Information Center (NIC) data on BHC hierarchies
 * FDIC Summary of Deposits (SoD) data on insured depositories and their branches
 * FDIC Community Banking (CB) data 
 * FDIC bank failure data

Note that the FFIEC has not adhered to a fixed naming convention for NIC filenames over time. The names of the downloaded NIC files can vary, depending on the vintage of the download.  

You must run **_Step 0_** first, to load the configuration, before running this step. 

The parameters for this step cover just the source URLs where the data can be found on the Internet, and the target directories and filenames for storing them when downloaded.  


```python
# STEP 1 - Downloading the data
#
# Downloads necessary banking data from the Internet
import www2dat as w2d
# Reload the module, just to ensure the notebook kernel has the latest version from disk
reload(w2d)

# Now, do the actual work:
w2d.make_dirs(CONFIG)
w2d.download_data(CONFIG)
```
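The directory-and-download pattern this step describes can be sketched with the standard library alone. The functions `ensure_dirs` and `fetch_file` are hypothetical and do not reproduce `www2dat.make_dirs` or `www2dat.download_data`; in practice the source URLs and target paths would come from the configuration. The demonstration copies a local file through a `file://` URL so that it runs offline.

```python
import os
import pathlib
import tempfile
import urllib.request


def ensure_dirs(paths):
    """Create each target directory; existing directories are left alone."""
    for path in paths:
        os.makedirs(path, exist_ok=True)


def fetch_file(url, target_dir, filename):
    """Fetch one URL into target_dir/filename and return the local path."""
    target = os.path.join(target_dir, filename)
    urllib.request.urlretrieve(url, target)
    return target


# Offline demonstration: "download" a local file through a file:// URL.
base = tempfile.mkdtemp()
src_dir = os.path.join(base, "src")
nic_dir = os.path.join(base, "NIC")
ensure_dirs([src_dir, nic_dir])
ensure_dirs([src_dir, nic_dir])  # a second call is harmless

source = os.path.join(src_dir, "example.zip")
with open(source, "wb") as fh:
    fh.write(b"placeholder bytes")

local = fetch_file(pathlib.Path(source).as_uri(), nic_dir, "example.zip")
```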



### Step 2 - Unpacking the data downloads
This step unpacks the downloaded data files to local CSV files. 

Note that the FFIEC has not adhered to a fixed naming convention for NIC filenames over time. The names of the downloaded NIC files can vary, depending on the vintage of the download.  

You must run **_Step 0_** first, to load the configuration, before running this step. 

The parameters for this step cover just the source location where the **\*.zip** files reside, the target location for storing their unpacked contents, and the names of the five **\*.zip** files themselves.  


```python
# STEP 2 - Unpacking the data downloads
#
# Unzips NIC downloads to expose XML or CSV files
import dat2csv as d2s
# Reload the module, just to ensure the notebook kernel has the latest version from disk
reload(d2s)

# Now, do the actual work:
d2s.unzip_data(CONFIG)
d2s.parse_nic(CONFIG)
```
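The unzip-then-flatten idea behind this step can be sketched with `zipfile`, `xml.etree.ElementTree` and `csv`. This is a hypothetical illustration, not the logic of `dat2csv`: the record tag (`ROW`) and field tags below are invented, so the real NIC XML layout should be checked against the data dictionary. If you downloaded CSV rather than XML, only the unzipping half applies.

```python
import csv
import os
import tempfile
import zipfile
import xml.etree.ElementTree as ET


def unzip_all(zip_path, out_dir):
    """Extract every member of zip_path into out_dir; return the member names."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(out_dir)
        return zf.namelist()


def xml_to_csv(xml_path, csv_path, record_tag):
    """Flatten a one-element-per-record XML file into a CSV file.

    Assumes each record's child elements are simple fields (tag = column,
    text = value). Tags are matched literally, so namespaced XML would need
    the '{uri}tag' form. Returns the number of records written.
    """
    rows, columns = [], []
    for _, elem in ET.iterparse(xml_path):  # 'end' events by default
        if elem.tag == record_tag:
            row = {child.tag: (child.text or "") for child in elem}
            for key in row:
                if key not in columns:
                    columns.append(key)
            rows.append(row)
            elem.clear()  # release the parsed children of this record
    with open(csv_path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=columns)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


# Demonstration with an invented two-record XML file inside a zip archive.
work = tempfile.mkdtemp()
zip_path = os.path.join(work, "sample.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("sample.xml",
                "<DATA><ROW><ID_RSSD>1</ID_RSSD><NM>Alpha</NM></ROW>"
                "<ROW><ID_RSSD>2</ID_RSSD><NM>Beta</NM></ROW></DATA>")

names = unzip_all(zip_path, work)
csv_out = os.path.join(work, "sample.csv")
count = xml_to_csv(os.path.join(work, names[0]), csv_out, record_tag="ROW")
```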


### Step 3 - Caching data objects
This step parses and caches certain key data objects, to speed subsequent analysis:

 * Quarterly FFIEC NIC data, parsed into Pandas dataframes
 * Quarterly snapshots of the banking system graph, as NetworkX digraphs
 * Annual FDIC SoD data, as Pandas dataframes
 * Quarterly FDIC CB reference data, as Pandas dataframes
 
You must run **_Step 0_** first, to load the configuration, before running this step. 


```python
# STEP 3 - Caching data objects
#
# Builds and caches local data objects
import csv2cch as c2c
# Reload the module, just to ensure the notebook kernel has the latest version from disk
reload(c2c)

# Now, do the actual work:
c2c.build_nic(CONFIG)
c2c.build_banksys(CONFIG)
c2c.build_fdicsod(CONFIG)
c2c.build_fdiccb(CONFIG)
```
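Building a quarterly snapshot of the banking system graph can be sketched with `nx.from_pandas_edgelist`, which attaches edge attributes from a relationships table in a single call rather than looking each edge up separately. This is a hypothetical illustration, not the code in `csv2cch`: the column names (`rssd_par`, `rssd_off`, `dt_start`, `dt_end`, `pct_equity`) and the use of `pickle` for caching are assumptions.

```python
import os
import pickle
import tempfile

import networkx as nx
import pandas as pd


def build_snapshot(rel_df, asof):
    """Ownership digraph of the relationships active on date `asof`.

    Hypothetical columns: rssd_par, rssd_off, dt_start, dt_end; any other
    columns are carried along as edge attributes.
    """
    asof = pd.Timestamp(asof)
    active = rel_df[(rel_df["dt_start"] <= asof) & (rel_df["dt_end"] >= asof)]
    # One call adds every edge together with its attributes.
    return nx.from_pandas_edgelist(active, source="rssd_par", target="rssd_off",
                                   edge_attr=True, create_using=nx.DiGraph())


def cache_object(obj, path):
    """Pickle obj so that later steps can reload it without rebuilding."""
    with open(path, "wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_cached(path):
    """Reload an object previously saved by cache_object."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Toy relationships table (all values invented for illustration).
rel = pd.DataFrame({
    "rssd_par": [1, 1, 2],
    "rssd_off": [2, 3, 4],
    "dt_start": pd.to_datetime(["2000-01-01", "2005-06-30", "2001-03-31"]),
    "dt_end": pd.to_datetime(["2099-12-31", "2007-12-31", "2099-12-31"]),
    "pct_equity": [100.0, 80.0, 100.0],
})

snapshot = build_snapshot(rel, "2008-12-31")  # the 1 -> 3 link ended in 2007
path = os.path.join(tempfile.mkdtemp(), "banksys_2008Q4.pkl")
cache_object(snapshot, path)
reloaded = load_cached(path)
```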

### Step 4 - Build journal submission outputs
This step creates several outputs used for the _Journal of Banking and Finance_ submission.

You must run **_Step 0_** first, to load the configuration, before running this step. 


```python
# STEP 4 - Build outputs
#
# Creates (and caches) network objects for limited sets of dates and BHCs
import cch2jbf as c2j

# Reload the module, just to ensure the notebook kernel has the latest version from disk
reload(c2j)

# Now, do the actual work:
c2j.make_wachwells(CONFIG)
c2j.make_panel(CONFIG)
c2j.make_failscatter(CONFIG)
c2j.make_persistent(CONFIG)
```
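Because these outputs are built for limited sets of dates and BHCs, each one starts by carving an individual hierarchy out of the system-wide graph. A minimal sketch of that extraction, plus a few elementary size statistics, follows. The function names and statistics are illustrative only and are not the complexity metrics defined in the papers.

```python
import networkx as nx


def extract_hierarchy(system, top_rssd):
    """Return the hierarchy controlled by top_rssd: the BHC plus all descendants."""
    members = nx.descendants(system, top_rssd) | {top_rssd}
    return system.subgraph(members).copy()


def simple_profile(hierarchy, top_rssd):
    """Elementary size and shape statistics for one hierarchy."""
    # Depth is measured along the shortest ownership chain from the top.
    depth = nx.single_source_shortest_path_length(hierarchy, top_rssd)
    return {
        "entities": hierarchy.number_of_nodes(),
        "links": hierarchy.number_of_edges(),
        "max_depth": max(depth.values()),
    }


# Toy banking-system graph containing two unrelated hierarchies (invented IDs).
system = nx.DiGraph([(1, 2), (1, 3), (3, 4), (4, 5), (10, 11)])

hierarchy = extract_hierarchy(system, 1)
profile = simple_profile(hierarchy, 1)
print(profile)  # {'entities': 5, 'links': 4, 'max_depth': 3}
```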

