In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns


## This week’s topic: Bank failures

Decades ago, the US decided to do something about this. It established the FDIC (Federal Deposit Insurance Corporation), which is a sort of bank insurance company. Every bank is required to pay the FDIC. In exchange, the FDIC guarantees that anyone with up to $250k in their account will get their money, even if the bank goes insolvent. You, as a bank customer, thus don’t need to worry too much about which bank you’re at; so long as it’s FDIC insured, you’re guaranteed to get your money.
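To make the $250k guarantee concrete, here is a minimal sketch of how a single account balance splits into an insured and an uninsured part. The function name `split_deposit` is my own invention, and the sketch assumes one depositor with one account at one bank, which is simpler than the real coverage rules.

```python
FDIC_LIMIT = 250_000  # insurance limit quoted in the text, in dollars


def split_deposit(balance, limit=FDIC_LIMIT):
    """Return (insured, uninsured) portions of a single account balance.

    Simplifying assumption: one depositor, one account, one bank.
    """
    insured = min(balance, limit)
    return insured, balance - insured


print(split_deposit(40_000))     # small account, fully covered
print(split_deposit(3_000_000))  # startup-sized account, mostly uncovered
```

For a startup with millions on deposit, nearly the whole balance falls on the uninsured side, which is why the payroll worries described below were real.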

Turns out, Silicon Valley Bank (SVB) was the bank of choice for 50 percent (!) of all venture-backed startups in the US. It was the 16th largest bank in the US (according to the [Federal Reserve](https://www.federalreserve.gov/releases/lbr/current/default.htm)), with about $200b in assets. This isn’t a small community bank!

As of this writing, the FDIC has taken over SVB. Most interestingly, various parties (the FDIC, Federal Reserve, and US Treasury) have said that everyone will get their money back, not just those with up to $250k in deposits. That was likely a relief to the many startups that had deposited millions (or even billions) of dollars of investment money in the bank, and which were starting to worry about making payroll.

For more background on what happened with SVB, I’ll refer you to the (always amazing) Matt Levine, whose [“Money Stuff” column](https://www.bloomberg.com/opinion/authors/ARbTQlRLRjE/matthew-s-levine) can’t come often enough. I don’t know how he writes so much, and so insightfully, while being so funny — but he does.

Also, check out this story from This American Life in 2009, where reporter Chana Joffe-Walt takes us through the FDIC’s takeover of a bank: https://www.thisamericanlife.org/377/scenes-from-a-recession/act-two-3. It was such a good story that I remembered it from more than a decade ago.

### Data

Our data this week comes from the FDIC, which offers a complete database describing bank failures at https://banks.data.fdic.gov/docs/. We’re specifically going to look at the bank-failure data, which is at the following page:

https://banks.data.fdic.gov/explore/failures?aggReport=detail&displayFields=NAME%2CCERT%2CFIN%2CCITYST%2CFAILDATE%2CSAVR%2CRESTYPE%2CCOST%2CRESTYPE1%2CCHCLASS1%2CQBFDEP%2CQBFASSET&endFailYear=2023&sortField=FAILDATE&sortOrder=desc&startFailYear=1934

Once at this URL, you can click on the “download” button. Or you can use the following URL:

https://pfabankapi.app.cloud.gov/api/failures?fields=NAME%2CCERT%2CFIN%2CCITYST%2CFAILDATE%2CSAVR%2CRESTYPE%2CCOST%2CRESTYPE1%2CCHCLASS1%2CQBFDEP%2CQBFASSET&filters=FAILYR%3A%5B1934%20TO%202023%5D&limit=10000&react=true&sort_by=FAILDATE&sort_order=desc&subtotal_by=RESTYPE&total_fields=QBFDEP%2CQBFASSET%2CCOST&format=csv&download=true&filename=bank-data

(And yes, I’ll admit that some of this data set is about “assistance,” rather than “failures,” but I’m just going to lump them all together under the term “failures” for the purposes of this week’s question and answers.)

The data dictionary for the file is at https://banks.data.fdic.gov/explore/failures/help.

### Tasks

Our questions for this week are:

1. According to our document, how many bank failures have there been since the FDIC was opened?
1. What was the earliest failure in our data set? What is the most recent failure in our data set?
1. In which five years did the greatest number of banks fail?
1. In which three states were the greatest number of failed banks?
1. What was the average market capitalization of the banks that failed? Given a capitalization of $200b, did that make SVB above, below, or about average?
1. When was the most recent failure greater than SVB?
1. Bank failures can be resolved in several different ways. How often, historically, have we seen each resolution? Were the odds good that SVB's uninsured depositors would get their money?
1. What about bank failures in the last 25 years -- if we just look at those, do the odds change?
1. What was the mean estimated loss in bank failures? What proportion of a bank's assets did this generally involve?

The learning goals for this week include working with dates and strings, along with some insight into whether people were right to panic about losing their money when SVB went under.

In [2]:
url = "https://pfabankapi.app.cloud.gov/api/failures?fields=NAME%2CCERT%2CFIN%2CCITYST%2CFAILDATE%2CSAVR%2CRESTYPE%2CCOST%2CRESTYPE1%2CCHCLASS1%2CQBFDEP%2CQBFASSET&filters=FAILYR%3A%5B1934%20TO%202023%5D&limit=10000&react=true&sort_by=FAILDATE&sort_order=desc&subtotal_by=RESTYPE&total_fields=QBFDEP%2CQBFASSET%2CCOST&format=csv&download=true&filename=bank-data"
df = pd.read_csv(url)

In [3]:
df.head()

Unnamed: 0,CERT,CHCLASS1,CITYST,COST,FAILDATE,FIN,ID,NAME,QBFASSET,QBFDEP,RESTYPE,RESTYPE1,SAVR
0,8758.0,NM,"SAC CITY, IA",14804.0,11/3/2023,10545,4109,CITIZENS BANK,60448.0,52311.0,FAILURE,PA,DIF
1,25851.0,SM,"ELKHART, KS",54167.0,7/28/2023,10544,4108,HEARTLAND TRI-STATE BANK,139446.0,130110.0,FAILURE,PA,DIF
2,59017.0,NM,"SAN FRANCISCO, CA",16521935.0,5/1/2023,10543,4107,FIRST REPUBLIC BANK,212638872.0,176436706.0,FAILURE,PA,DIF
3,57053.0,NM,"NEW YORK, NY",884248.0,3/12/2023,10540,4106,SIGNATURE BANK,110363650.0,88612911.0,FAILURE,PA,DIF
4,24735.0,SM,"SANTA CLARA, CA",17820026.0,3/7/2023,10539,4105,SILICON VALLEY BANK,209026000.0,175378000.0,FAILURE,PA,DIF


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4109 entries, 0 to 4108
Data columns (total 13 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   CERT      3621 non-null   float64
 1   CHCLASS1  4109 non-null   object 
 2   CITYST    4109 non-null   object 
 3   COST      3472 non-null   float64
 4   FAILDATE  4109 non-null   object 
 5   FIN       4109 non-null   int64  
 6   ID        4109 non-null   int64  
 7   NAME      4109 non-null   object 
 8   QBFASSET  3955 non-null   float64
 9   QBFDEP    4107 non-null   float64
 10  RESTYPE   4109 non-null   object 
 11  RESTYPE1  4109 non-null   object 
 12  SAVR      4109 non-null   object 
dtypes: float64(4), int64(2), object(7)
memory usage: 417.4+ KB


In [5]:
df["FAILDATE"] = pd.to_datetime(df["FAILDATE"])
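Calling `pd.to_datetime` with no arguments works here because the dates are unambiguous month/day/year strings. If you want to be more defensive, one option is to spell out the format and ask pandas to flag anything it can't parse. Here's a minimal sketch on a tiny hand-made sample; the malformed last value is my own addition, not something I found in the FDIC file.

```python
import pandas as pd

# Tiny hand-made sample in the same month/day/year style as the FDIC file;
# the last value is deliberately malformed (hypothetical, for illustration).
sample = pd.Series(["11/3/2023", "7/28/2023", "3/12/2023", "not a date"])

# An explicit format avoids any day/month guessing, and errors="coerce"
# turns unparseable values into NaT instead of raising an exception.
parsed = pd.to_datetime(sample, format="%m/%d/%Y", errors="coerce")

print(parsed)
print("unparseable values:", parsed.isna().sum())
```

Counting the `NaT` values afterward is a cheap way to confirm that every row was converted.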

In [6]:
df.head()

Unnamed: 0,CERT,CHCLASS1,CITYST,COST,FAILDATE,FIN,ID,NAME,QBFASSET,QBFDEP,RESTYPE,RESTYPE1,SAVR
0,8758.0,NM,"SAC CITY, IA",14804.0,2023-11-03,10545,4109,CITIZENS BANK,60448.0,52311.0,FAILURE,PA,DIF
1,25851.0,SM,"ELKHART, KS",54167.0,2023-07-28,10544,4108,HEARTLAND TRI-STATE BANK,139446.0,130110.0,FAILURE,PA,DIF
2,59017.0,NM,"SAN FRANCISCO, CA",16521935.0,2023-05-01,10543,4107,FIRST REPUBLIC BANK,212638872.0,176436706.0,FAILURE,PA,DIF
3,57053.0,NM,"NEW YORK, NY",884248.0,2023-03-12,10540,4106,SIGNATURE BANK,110363650.0,88612911.0,FAILURE,PA,DIF
4,24735.0,SM,"SANTA CLARA, CA",17820026.0,2023-03-07,10539,4105,SILICON VALLEY BANK,209026000.0,175378000.0,FAILURE,PA,DIF


#### 1. According to our document, how many bank failures have there been since the FDIC was opened?

In [7]:
df["RESTYPE"].value_counts()

RESTYPE
FAILURE       3516
ASSISTANCE     593
Name: count, dtype: int64

In [8]:
df.shape

(4109, 13)

In [9]:
len(df)

4109

In [10]:
df.count()

CERT        3621
CHCLASS1    4109
CITYST      4109
COST        3472
FAILDATE    4109
FIN         4109
ID          4109
NAME        4109
QBFASSET    3955
QBFDEP      4107
RESTYPE     4109
RESTYPE1    4109
SAVR        4109
dtype: int64
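I'm lumping "assistance" rows together with outright failures, so the row count is the answer I'm using. If you'd rather count only outright failures, a boolean mask on `RESTYPE` is one way to do it. Here's a minimal sketch on a made-up four-row frame; the bank names are hypothetical, and only the `RESTYPE` values come from the real file.

```python
import pandas as pd

# Hypothetical miniature of the FDIC data; only the RESTYPE values
# ("FAILURE", "ASSISTANCE") are taken from the real file.
sample = pd.DataFrame({
    "NAME": ["BANK A", "BANK B", "BANK C", "BANK D"],
    "RESTYPE": ["FAILURE", "ASSISTANCE", "FAILURE", "FAILURE"],
})

# Boolean mask: True for rows that were outright failures
is_failure = sample["RESTYPE"] == "FAILURE"

total_events = len(sample)              # failures and assistance lumped together
failures_only = int(is_failure.sum())   # outright failures only

print(total_events, failures_only)
```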

#### 2. What was the earliest failure in our data set? What is the most recent failure in our data set?

In [11]:
df["FAILDATE"].sort_values().head(1)

4108   1934-04-19
Name: FAILDATE, dtype: datetime64[ns]

In [12]:
df["FAILDATE"].min()

Timestamp('1934-04-19 00:00:00')

In [13]:
df["FAILDATE"].sort_values().tail(1)

0   2023-11-03
Name: FAILDATE, dtype: datetime64[ns]

In [14]:
df["FAILDATE"].max()

Timestamp('2023-11-03 00:00:00')

In [15]:
df["FAILDATE"].agg(["min", "max"])

min   1934-04-19
max   2023-11-03
Name: FAILDATE, dtype: datetime64[ns]

In [16]:
df["FAILDATE"].describe()

count                             4109
mean     1985-12-28 03:27:06.965198336
min                1934-04-19 00:00:00
25%                1985-07-11 00:00:00
50%                1988-12-30 00:00:00
75%                1991-05-03 00:00:00
max                2023-11-03 00:00:00
Name: FAILDATE, dtype: object
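The calls above return only the dates. If you also want to know which bank each date belongs to, one option is `idxmin` and `idxmax`, which return the index label of the earliest and latest value, combined with `loc`. Here's a minimal sketch on a made-up three-row frame (the bank names are hypothetical):

```python
import pandas as pd

# Hypothetical miniature of the data set, with FAILDATE already parsed
sample = pd.DataFrame({
    "NAME": ["BANK A", "BANK B", "BANK C"],
    "FAILDATE": pd.to_datetime(["2023-03-12", "1934-04-19", "1989-06-01"]),
})

# idxmin/idxmax give the index label of the earliest/latest date;
# loc then retrieves the full row for that label.
earliest = sample.loc[sample["FAILDATE"].idxmin()]
latest = sample.loc[sample["FAILDATE"].idxmax()]

print(earliest["NAME"], earliest["FAILDATE"].date())
print(latest["NAME"], latest["FAILDATE"].date())
```

Note that if several banks share the earliest (or latest) date, this returns only the first of them.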

#### 3. In which five years did the greatest number of banks fail?

In [17]:
df["FAILDATE"].dt.year.value_counts().head(5)

FAILDATE
1989    534
1988    470
1990    382
1991    271
1987    262
Name: count, dtype: int64

In [18]:
df.groupby(df["FAILDATE"].dt.year)["NAME"].count().sort_values(ascending=False).head(5)

FAILDATE
1989    534
1988    470
1990    382
1991    271
1987    262
Name: NAME, dtype: int64
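The `value_counts` version needs no explicit sort because its result is already ordered from most to least common, whereas the `groupby` result is ordered by year and has to be sorted or passed through `nlargest`. Here's a minimal sketch on a handful of made-up dates showing that the two routes agree:

```python
import pandas as pd

# Hypothetical failure dates: three in 1989, two in 1988, one in 2023
dates = pd.to_datetime(pd.Series([
    "1989-01-05", "1989-07-20", "1989-11-02",
    "1988-03-14", "1988-09-30", "2023-03-12",
]))

# Route 1: value_counts sorts by count (descending), so head() is enough
by_value_counts = dates.dt.year.value_counts().head(2)

# Route 2: groupby sorts by year, so we ask for the largest counts explicitly
by_groupby = dates.groupby(dates.dt.year).size().nlargest(2)

print(by_value_counts)
print(by_groupby)
```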

#### 4. In which three states were the greatest number of failed banks?

In [19]:
df["CITYST"].value_counts().head(5)

CITYST
HOUSTON, TX        105
DALLAS, TX          64
CHICAGO, IL         51
SAN ANTONIO, TX     45
AUSTIN, TX          43
Name: count, dtype: int64

Because CITYST contains both the city name and the state, we first have to extract the state before we can answer the question.

In [20]:
df["ST"] = df["CITYST"].str.split(", ", expand=True)[1]

In [23]:
df["ST"] = df["CITYST"].str.slice(-2, None)

In [24]:
df.head()

Unnamed: 0,CERT,CHCLASS1,CITYST,COST,FAILDATE,FIN,ID,NAME,QBFASSET,QBFDEP,RESTYPE,RESTYPE1,SAVR,ST
0,8758.0,NM,"SAC CITY, IA",14804.0,2023-11-03,10545,4109,CITIZENS BANK,60448.0,52311.0,FAILURE,PA,DIF,IA
1,25851.0,SM,"ELKHART, KS",54167.0,2023-07-28,10544,4108,HEARTLAND TRI-STATE BANK,139446.0,130110.0,FAILURE,PA,DIF,KS
2,59017.0,NM,"SAN FRANCISCO, CA",16521935.0,2023-05-01,10543,4107,FIRST REPUBLIC BANK,212638872.0,176436706.0,FAILURE,PA,DIF,CA
3,57053.0,NM,"NEW YORK, NY",884248.0,2023-03-12,10540,4106,SIGNATURE BANK,110363650.0,88612911.0,FAILURE,PA,DIF,NY
4,24735.0,SM,"SANTA CLARA, CA",17820026.0,2023-03-07,10539,4105,SILICON VALLEY BANK,209026000.0,175378000.0,FAILURE,PA,DIF,CA


In [25]:
df["ST"].value_counts().head(5)

ST
TX    910
CA    265
IL    227
FL    199
OK    182
Name: count, dtype: int64
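Both extraction approaches above rely on every `CITYST` value being cleanly formatted: splitting on `", "` and taking element 1 assumes exactly one comma, and slicing the last two characters assumes no trailing whitespace. I haven't checked whether the real file breaks either assumption, but a slightly more defensive variant is to split once from the right and strip the result. Here's a minimal sketch; the messy values are hypothetical, made up to show the difference.

```python
import pandas as pd

# Hypothetical values: the second has an extra comma inside the city part,
# and the third has trailing whitespace.
cityst = pd.Series(["SAC CITY, IA", "WASHINGTON, D.C., DC", "HOUSTON, TX "])

# Split once from the right so extra commas in the city name are harmless,
# then strip stray whitespace around the state code.
state = cityst.str.rsplit(",", n=1).str[-1].str.strip()

print(state.tolist())
```

On these values, the split-and-take-element-1 approach would give `D.C.` for the second row, and the two-character slice would give `"X "` for the third.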