# Analyzing FEMA's National Flood Insurance Program (NFIP) Data With DuckDB
Author: Mark Bauer

# OpenFEMA Dataset: FIMA NFIP Redacted Claims - v2
Federal Emergency Management Agency (FEMA), OpenFEMA Dataset: FIMA NFIP Redacted Claims - v2. Retrieved from https://www.fema.gov/openfema-data-page/fima-nfip-redacted-claims-v2. This product uses the FEMA OpenFEMA API, but is not endorsed by FEMA. The Federal Government or FEMA cannot vouch for the data or analyses derived from these data after the data have been retrieved from the Agency's website(s).

Read more about OpenFEMA's [Terms and Conditions](https://www.fema.gov/about/openfema/terms-conditions).

**Dataset Description**:
>This dataset provides details on NFIP claims transactions. It is derived from the NFIP system of record, staged in the NFIP reporting platform and redacted to protect policy holder personally identifiable information.
>
>This dataset is not intended to be an official federal report, and should not be considered an official federal report.

**About the National Flood Insurance Program**:   
>Congress passed the National Flood Insurance Act (NFIA), 42 U.S.C. 4001 in 1968, creating the National Flood Insurance Program (NFIP) in order to reduce future flood losses through flood hazard identification, floodplain management, and providing insurance protection. The Department of Housing and Urban Development (HUD) originally administered the NFIP, and Congress subsequently transferred the NFIP to FEMA upon its creation in 1979. FEMA and insurance companies participating in FEMA's Write Your Own (WYO) program offer NFIP insurance coverage for building structures as well as for contents and personal property within the building structures to eligible and insurable properties. The WYO program began in 1983 with NFIP operating under Part B of the NFIA and allows FEMA to authorize private insurance companies to issue the Standard Flood Insurance Policy (SFIP) as FEMA's fiduciary and fiscal agent. FEMA administers NFIP by ensuring insurance applications are processed properly; determining correct premiums; renewing, reforming, and cancelling insurance policies; transferring policies from the seller of the property to the purchaser of the property in certain circumstances; and processing insurance claims.
>
>The paid premiums of SFIPs and claims payments for damaged property are processed through the National Flood Insurance Fund (NFIF). NFIF was established by the National Flood Insurance Act of 1968 (42 U.S.C. 4001, et seq.), and is a centralized premium revenue and fee-generated fund that supports NFIP, which holds these U.S. Treasury funds.
>
>The Flood Insurance Claims Manual (https://nfipservices.floodsmart.gov/insurance-manuals) provides claims guidance to WYOs, vendors, adjusters, and examiners so that policyholders experience consistent and reliable service. The Manual provides processes for handling claims from the notice of loss to final payment. The NFIP has provided answers to Frequently Asked Questions (FAQs) to assist the public in understanding and navigating the data our program makes available: https://www.fema.gov/sites/default/files/documents/fema_nfip-data-faqs.pdf.

**Data Dictionary**:  
View the data dictionary on OpenFEMA under the [Data Fields](https://www.fema.gov/openfema-data-page/fima-nfip-redacted-claims-v2) section.

Note: In this notebook, all dollar amounts are reported in nominal terms and have not been adjusted for inflation.

In [1]:
# import libraries
import duckdb
import pandas as pd

In [2]:
# reproducibility
%reload_ext watermark
%watermark -v -p duckdb,pandas

Python implementation: CPython
Python version       : 3.11.0
IPython version      : 8.6.0

duckdb: 1.0.0
pandas: 1.5.1



In [3]:
# list datasets
%ls data/

FimaNfipClaims.parquet       nfip-data.db
FimaNfipPolicies.parquet     policies-nyc-year.parquet
claims-nyc-year.parquet      policies-state-year.parquet
claims-state-year.parquet    policies.db
claims.db


# Redacted Claims: Before Getting Started
Please note that this dataset provides details on NFIP claims transactions and is ***redacted*** to protect policy holder personally identifiable information. The claim's `latitude` and `longitude` fields **should not be used to represent the precise location of the insured building**. From the data dictionary:

>Latitude: **Approximate latitude of the insured building (to 1 decimal place)**. This represents the approximate location of the insured property. The precision has been lessened to ensure individual privacy. This may result in a point location that exists in an incorrect county or state. Use the state and county fields for record aggregation for these dimensions.
>
>Longitude: **Approximate longitude of the insured building (to 1 decimal place)**. This represents the approximate location of the insured property. The precision has been lessened to ensure individual privacy. This may result in a point location that exists in an incorrect county or state. Use the state and county fields for record aggregation for these dimensions.

For more information, visit the [Data Dictionary](https://www.fema.gov/openfema-data-page/fima-nfip-redacted-claims-v2) and review [Frequently Asked Questions about NFIP Policies and Claims Data](https://nfipservices.floodsmart.gov/frequently-asked-questions-about-nfip-policies-and-claims-data).
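
To get a feel for what one-decimal-place precision means on the ground, the sketch below estimates the size of a 0.1 degree cell at a given latitude. It assumes a spherical Earth with a mean radius of 6,371 km, and the function name `cell_size_km` is illustrative only, not part of any FEMA or DuckDB API.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius, spherical model


def cell_size_km(latitude_deg, step_deg=0.1):
    """Approximate north-south and east-west extent of a step_deg cell."""
    km_per_degree = math.pi * EARTH_RADIUS_KM / 180.0
    north_south = km_per_degree * step_deg
    # east-west distance shrinks with the cosine of the latitude
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


# 39.2 is used here purely as an example latitude
ns_km, ew_km = cell_size_km(39.2)
print(f"0.1 degree cell at 39.2N: about {ns_km:.1f} km by {ew_km:.1f} km")
```

A cell of this size can straddle a county or state line, which is consistent with the data dictionary's advice to aggregate by the state and county fields rather than by coordinates.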

All financial figures in this analysis are in nominal dollars. No inflation adjustment has been made, so each value reflects the dollar amount at the time it was recorded. The date and time of data access can be found in the download-data notebook.

# Create a DuckDB database instance using the Python client

In [4]:
%%time

# create a DuckDB database instance
con = duckdb.connect("data/nfip.db")

# create the claims table from the Parquet file
con.execute("""
    CREATE OR REPLACE TABLE claims AS
        FROM read_parquet('data/FimaNfipClaims.parquet')
""")

# sanity check
con.sql("""
    SELECT *
    FROM claims
    LIMIT 5
""").show()

┌──────────────────────┬──────────────────────┬───┬──────────────┬──────────────┬──────────────────────┐
│ agricultureStructu…  │       asOfDate       │ … │   latitude   │  longitude   │          id          │
│       boolean        │ timestamp with tim…  │   │ decimal(9,1) │ decimal(9,1) │         uuid         │
├──────────────────────┼──────────────────────┼───┼──────────────┼──────────────┼──────────────────────┤
│ false                │ NULL                 │ … │         39.2 │        -74.6 │ a4edd1e3-a2cc-4ea7…  │
│ false                │ NULL                 │ … │         29.9 │        -95.3 │ 5fa56e50-7923-44f3…  │
│ false                │ NULL                 │ … │         40.0 │        -74.1 │ ee43a296-bc2b-4b49…  │
│ false                │ NULL                 │ … │         29.9 │        -95.4 │ 2d96f6b6-d33b-4eda…  │
│ false                │ NULL                 │ … │         26.4 │        -81.9 │ 37577287-ba9f-4cea…  │
├──────────────────────┴──────────────────────┴───┴────

In [5]:
# list tables and schemas
con.sql("SHOW ALL TABLES").df()

Unnamed: 0,database,schema,name,column_names,column_types,temporary
0,nfip,main,claims,"[agricultureStructureIndicator, asOfDate, base...","[BOOLEAN, TIMESTAMP WITH TIME ZONE, SMALLINT, ...",False


In [6]:
# count of rows
con.sql("""
    SELECT COUNT(*) AS count_rows
    FROM claims
""")

┌────────────┐
│ count_rows │
│   int64    │
├────────────┤
│    2712269 │
└────────────┘

In [7]:
# count of columns
con.sql("""
    SELECT COUNT(column_name) AS count_columns
    FROM (DESCRIBE FROM claims)
""")

┌───────────────┐
│ count_columns │
│     int64     │
├───────────────┤
│            73 │
└───────────────┘

In [8]:
# last refreshed
con.sql("""
    SELECT asOfDate
    FROM claims
    ORDER BY asOfDate DESC
    LIMIT 1
""")

┌───────────────────────────────┐
│           asOfDate            │
│   timestamp with time zone    │
├───────────────────────────────┤
│ 2025-05-09 10:37:51.032136-04 │
└───────────────────────────────┘

# Examine Dataset

## Column Info

In [9]:
# examine column datatypes
con.sql("""
    SELECT
        column_name,
        column_type
    FROM (DESCRIBE claims)
""").show(max_rows=80)

┌────────────────────────────────────────────┬──────────────────────────┐
│                column_name                 │       column_type        │
│                  varchar                   │         varchar          │
├────────────────────────────────────────────┼──────────────────────────┤
│ agricultureStructureIndicator              │ BOOLEAN                  │
│ asOfDate                                   │ TIMESTAMP WITH TIME ZONE │
│ basementEnclosureCrawlspaceType            │ SMALLINT                 │
│ policyCount                                │ SMALLINT                 │
│ crsClassificationCode                      │ SMALLINT                 │
│ dateOfLoss                                 │ DATE                     │
│ elevatedBuildingIndicator                  │ BOOLEAN                  │
│ elevationCertificateIndicator              │ VARCHAR                  │
│ elevationDifference                        │ DECIMAL(6,1)             │
│ baseFloodElevation                  

In [10]:
# approximate column null percentage
con.sql("""
    SELECT
        column_name,
        null_percentage
    FROM (SUMMARIZE FROM claims)
    ORDER BY null_percentage DESC
""").show(max_rows=80)

┌────────────────────────────────────────────┬─────────────────┐
│                column_name                 │ null_percentage │
│                  varchar                   │  decimal(9,2)   │
├────────────────────────────────────────────┼─────────────────┤
│ asOfDate                                   │           99.38 │
│ floodCharacteristicsIndicator              │           98.53 │
│ crsClassificationCode                      │           95.09 │
│ eventDesignationNumber                     │           93.23 │
│ lowestAdjacentGrade                        │           81.09 │
│ elevationCertificateIndicator              │           77.93 │
│ nonPaymentReasonBuilding                   │           77.93 │
│ lowestFloorElevation                       │           76.37 │
│ baseFloodElevation                         │           75.68 │
│ elevationDifference                        │           72.96 │
│ floodZoneCurrent                           │           71.68 │
│ nfipCommunityNumberCurr
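
The cell above describes the `SUMMARIZE` figures as approximate. If an exact null percentage is needed for one column, one option is to compare `COUNT(*)` with `COUNT(column)`, since the latter skips NULLs. The sketch below uses Python's built-in `sqlite3` and a tiny made-up table (`claims_demo`) so that it runs without the Parquet file. Both are assumptions for illustration. The counting logic is standard SQL and should carry over to the DuckDB `claims` table.

```python
import sqlite3

# assumption: a tiny in-memory stand-in table, not the real claims data
con_demo = sqlite3.connect(":memory:")
con_demo.execute("CREATE TABLE claims_demo (id INTEGER, floodZoneCurrent TEXT)")
con_demo.executemany(
    "INSERT INTO claims_demo VALUES (?, ?)",
    [(1, "AE"), (2, None), (3, None), (4, "X")],
)

# COUNT(*) counts every row, COUNT(column) counts only non-NULL values
total_rows, null_rows = con_demo.execute("""
    SELECT
        COUNT(*),
        COUNT(*) - COUNT(floodZoneCurrent)
    FROM claims_demo
""").fetchone()

null_pct = round(100.0 * null_rows / total_rows, 2)
print(f"{null_rows} of {total_rows} rows are NULL ({null_pct}%)")
```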

## Preview Data

In [11]:
# preview data as pandas dataframe for readability
sql = """
    SELECT *
    FROM claims
    LIMIT 5
"""

# examine each column in sections because of large number of columns
con.sql(sql).df().iloc[:, :15]

Unnamed: 0,agricultureStructureIndicator,asOfDate,basementEnclosureCrawlspaceType,policyCount,crsClassificationCode,dateOfLoss,elevatedBuildingIndicator,elevationCertificateIndicator,elevationDifference,baseFloodElevation,ratedFloodZone,houseWorship,locationOfContents,lowestAdjacentGrade,lowestFloorElevation
0,False,NaT,0.0,1,,2012-10-29,True,,1.0,9.0,AE,False,3.0,0.0,9.7
1,False,NaT,0.0,1,,2017-08-26,False,,4.0,67.0,AE,False,3.0,69.5,70.5
2,False,NaT,,1,8.0,2012-10-29,False,3.0,3.0,7.0,AE,False,,0.0,9.7
3,False,NaT,,1,,2017-08-26,True,,,,X,False,3.0,,
4,False,NaT,0.0,1,5.0,2022-09-28,False,,-6.0,10.0,AE,False,3.0,3.9,4.3


In [12]:
# slice through columns
con.sql(sql).df().iloc[:, 15:30]

Unnamed: 0,numberOfFloorsInTheInsuredBuilding,nonProfitIndicator,obstructionType,occupancyType,originalConstructionDate,originalNBDate,amountPaidOnBuildingClaim,amountPaidOnContentsClaim,amountPaidOnIncreasedCostOfComplianceClaim,postFIRMConstructionIndicator,rateMethod,smallBusinessIndicatorBuilding,totalBuildingInsuranceCoverage,totalContentsInsuranceCoverage,yearOfLoss
0,2,False,10.0,12,1970-11-26,2000-01-01,,,,False,RatingEngine,False,0,86000,2012
1,1,False,,11,2001-01-01,2006-02-13,50292.22,11607.9,0.0,True,RatingEngine,False,250000,83000,2017
2,2,False,,1,1987-07-01,1994-02-20,16320.6,0.0,0.0,True,1,False,133100,0,2012
3,1,False,10.0,1,1978-01-01,2002-06-20,27213.28,8526.54,0.0,False,7,False,250000,100000,2017
4,1,False,,11,1978-01-01,2010-06-01,250000.0,27000.0,0.0,False,RatingEngine,False,250000,29000,2022


In [13]:
# slice through columns
con.sql(sql).df().iloc[:, 30:45]

Unnamed: 0,primaryResidenceIndicator,buildingDamageAmount,buildingDeductibleCode,netBuildingPaymentAmount,buildingPropertyValue,causeOfDamage,condominiumCoverageTypeCode,contentsDamageAmount,contentsDeductibleCode,netContentsPaymentAmount,contentsPropertyValue,disasterAssistanceCoverageRequired,eventDesignationNumber,ficoNumber,floodCharacteristicsIndicator
0,False,,,0.0,,,N,,1,0.0,,,,305.0,
1,True,51542.0,F,50292.22,144996.0,0.0,N,12858.0,1,11607.9,48910.0,,,682.0,
2,True,14534.0,F,16320.6,135978.0,1.0,N,,0,0.0,,0.0,,305.0,
3,False,28004.0,F,27213.28,113293.0,4.0,N,9776.0,F,8526.54,90000.0,0.0,,682.0,
4,True,260584.0,2,250000.0,279207.0,4.0,N,38518.0,2,27000.0,279207.0,,FL0222,,


In [14]:
# slice through columns
con.sql(sql).df().iloc[:, 45:60]

Unnamed: 0,floodWaterDuration,floodproofedIndicator,floodEvent,iccCoverage,netIccPaymentAmount,nfipRatedCommunityNumber,nfipCommunityNumberCurrent,nfipCommunityName,nonPaymentReasonContents,nonPaymentReasonBuilding,numberOfUnits,buildingReplacementCost,contentsReplacementCost,replacementCostBasis,stateOwnedIndicator
0,0.0,False,Hurricane Sandy,,0.0,345310,345310,"OCEAN CITY, CITY OF",99.0,,2,,,A,False
1,0.0,False,Hurricane Harvey,30000.0,0.0,480287,480287,HARRIS COUNTY*,,,1,196855.0,0.0,A,False
2,0.0,False,Hurricane Sandy,30000.0,0.0,345293,345293,"TOMS RIVER, TOWNSHIP OF",,,1,159241.0,,R,False
3,0.0,False,Hurricane Harvey,30000.0,0.0,480296,480296,"HOUSTON, CITY OF",,,1,122976.0,0.0,R,False
4,,False,Hurricane Ian,30000.0,0.0,120673,120673,"FORT MYERS BEACH, TOWN OF",,,1,307589.0,0.0,A,False


In [15]:
# slice through columns
con.sql(sql).df().iloc[:, 60:]

Unnamed: 0,waterDepth,floodZoneCurrent,buildingDescriptionCode,rentalPropertyIndicator,state,reportedCity,reportedZipCode,countyCode,censusTract,censusBlockGroupFips,latitude,longitude,id
0,0,AE,1.0,False,NJ,Currently Unavailable,8226,34009,34009020206,340090202062,39.2,-74.6,a4edd1e3-a2cc-4ea7-b379-a8c61361d60b
1,1,AE,1.0,False,TX,Currently Unavailable,77039,48201,48201222900,482012229003,29.9,-95.3,5fa56e50-7923-44f3-832b-e527f639f28f
2,24,AE,,False,NJ,Currently Unavailable,8753,34029,34029723400,340297234002,40.0,-74.1,ee43a296-bc2b-4b49-b5b1-cc2b28eb4dbb
3,1,X,1.0,False,TX,Currently Unavailable,77060,48201,48201222502,482012225022,29.9,-95.4,2d96f6b6-d33b-4eda-8988-6ecbf481c5db
4,8,AE,1.0,False,FL,Currently Unavailable,33931,12071,12071060203,120710602032,26.4,-81.9,37577287-ba9f-4cea-a1ed-ddc88be18348


In [16]:
# count duplicate IDs
con.sql("""
    SELECT
        id,
        COUNT(id) AS count
    FROM claims
    GROUP BY id
    HAVING count > 1
""").show()

┌──────┬───────┐
│  id  │ count │
│ uuid │ int64 │
├──────┴───────┤
│    0 rows    │
└──────────────┘



In [17]:
# earliest and latest record effective date
con.sql("""
    SELECT
        min(asOfDate) AS earliestAsOfDate,
        max(asOfDate) AS latestAsOfDate
    FROM claims
""").show()

┌────────────────────────────┬───────────────────────────────┐
│      earliestAsOfDate      │        latestAsOfDate         │
│  timestamp with time zone  │   timestamp with time zone    │
├────────────────────────────┼───────────────────────────────┤
│ 2019-09-19 09:45:58.926-04 │ 2025-05-09 10:37:51.032136-04 │
└────────────────────────────┴───────────────────────────────┘



In [18]:
# 5 most recent claim records by effective date (asOfDate)
con.sql("""
    SELECT
        asOfDate,
        dateOfLoss,
        floodEvent,
        state,
        ROUND(
            amountPaidOnBuildingClaim
            + amountPaidOnContentsClaim
            + amountPaidOnIncreasedCostOfComplianceClaim)::BIGINT AS paidTotalClaim
    FROM claims
    ORDER BY asOfDate DESC
    LIMIT 5
""").show()

┌───────────────────────────────┬────────────┬──────────────────┬─────────┬────────────────┐
│           asOfDate            │ dateOfLoss │    floodEvent    │  state  │ paidTotalClaim │
│   timestamp with time zone    │    date    │     varchar      │ varchar │     int64      │
├───────────────────────────────┼────────────┼──────────────────┼─────────┼────────────────┤
│ 2025-05-09 10:37:51.032136-04 │ 2024-10-09 │ Hurricane Milton │ FL      │          10000 │
│ 2025-05-09 10:21:49.039634-04 │ 2009-11-14 │ NULL             │ NY      │          10742 │
│ 2025-05-08 16:19:32.301581-04 │ 2017-09-11 │ Hurricane Irma   │ FL      │           5000 │
│ 2025-05-08 14:13:59.390838-04 │ 2012-10-29 │ Hurricane Sandy  │ CT      │          29514 │
│ 2025-05-08 13:47:50.294098-04 │ 2010-09-08 │ NULL             │ TX      │          71758 │
└───────────────────────────────┴────────────┴──────────────────┴─────────┴────────────────┘



In [19]:
# earliest and latest date of loss in dataset
con.sql("""
    SELECT
        min(dateOfLoss) AS earliestDateOfLoss,
        max(dateOfLoss) AS latestDateOfLoss
    FROM claims
""").show()

┌────────────────────┬──────────────────┐
│ earliestDateOfLoss │ latestDateOfLoss │
│        date        │       date       │
├────────────────────┼──────────────────┤
│ 1978-01-01         │ 2025-05-08       │
└────────────────────┴──────────────────┘



In [20]:
# top 5 latest claim records by date of loss 
con.sql("""
    SELECT
        dateOfLoss,
        asOfDate,
        floodEvent,
        state,
        ROUND(
            amountPaidOnBuildingClaim
            + amountPaidOnContentsClaim
            + amountPaidOnIncreasedCostOfComplianceClaim)::BIGINT AS paidTotalClaim
    FROM claims
    ORDER BY dateOfLoss DESC, asOfDate DESC
    LIMIT 5
""").show()

┌────────────┬──────────────────────────┬────────────┬─────────┬────────────────┐
│ dateOfLoss │         asOfDate         │ floodEvent │  state  │ paidTotalClaim │
│    date    │ timestamp with time zone │  varchar   │ varchar │     int64      │
├────────────┼──────────────────────────┼────────────┼─────────┼────────────────┤
│ 2025-05-08 │ NULL                     │ NULL       │ TX      │           NULL │
│ 2025-05-07 │ NULL                     │ NULL       │ LA      │           NULL │
│ 2025-05-07 │ NULL                     │ NULL       │ LA      │           NULL │
│ 2025-05-07 │ NULL                     │ NULL       │ LA      │           NULL │
│ 2025-05-07 │ NULL                     │ NULL       │ LA      │           NULL │
└────────────┴──────────────────────────┴────────────┴─────────┴────────────────┘
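
The `NULL` totals above are what standard SQL arithmetic produces when any operand is missing: `a + b + c` is `NULL` as soon as one amount is `NULL`. The sketch below contrasts that strict sum with a `COALESCE`-based sum that treats missing amounts as zero. It uses Python's built-in `sqlite3` with hypothetical rows in a stand-in table (`claims_demo`) so that it runs without the dataset. The same expressions should behave the same way in DuckDB, but that is an assumption to verify against the real table.

```python
import sqlite3

# assumption: hypothetical rows in an in-memory stand-in table
demo = sqlite3.connect(":memory:")
demo.execute("""
    CREATE TABLE claims_demo (
        amountPaidOnBuildingClaim REAL,
        amountPaidOnContentsClaim REAL,
        amountPaidOnIncreasedCostOfComplianceClaim REAL
    )
""")
demo.executemany(
    "INSERT INTO claims_demo VALUES (?, ?, ?)",
    [(1000.0, 250.0, 0.0), (5000.0, None, 0.0), (None, None, None)],
)

rows = demo.execute("""
    SELECT
        -- NULL if any of the three amounts is NULL
        amountPaidOnBuildingClaim
        + amountPaidOnContentsClaim
        + amountPaidOnIncreasedCostOfComplianceClaim AS strictTotal,
        -- treats a missing amount as zero
        COALESCE(amountPaidOnBuildingClaim, 0)
        + COALESCE(amountPaidOnContentsClaim, 0)
        + COALESCE(amountPaidOnIncreasedCostOfComplianceClaim, 0) AS lenientTotal
    FROM claims_demo
    ORDER BY rowid
""").fetchall()

for strict_total, lenient_total in rows:
    print(strict_total, lenient_total)
```

Which form is appropriate depends on the question. The lenient sum makes a claim with no recorded payment indistinguishable from a claim that paid $0, so the strict sum used in this notebook is the more conservative choice.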



In [21]:
# top 5 latest claim records by date of loss where total paid claim > 0
con.sql("""
    SELECT
        dateOfLoss,
        asOfDate,
        floodEvent,
        state,
        ROUND(
            amountPaidOnBuildingClaim
            + amountPaidOnContentsClaim
            + amountPaidOnIncreasedCostOfComplianceClaim)::BIGINT AS paidTotalClaim
    FROM claims
    WHERE paidTotalClaim > 0
    ORDER BY dateOfLoss DESC, asOfDate DESC
    LIMIT 5
""").show()

┌────────────┬──────────────────────────┬─────────────────────────────┬─────────┬────────────────┐
│ dateOfLoss │         asOfDate         │         floodEvent          │  state  │ paidTotalClaim │
│    date    │ timestamp with time zone │           varchar           │ varchar │     int64      │
├────────────┼──────────────────────────┼─────────────────────────────┼─────────┼────────────────┤
│ 2025-04-28 │ NULL                     │ NULL                        │ KS      │           5000 │
│ 2025-04-26 │ NULL                     │ NULL                        │ OK      │          89711 │
│ 2025-04-26 │ NULL                     │ NULL                        │ OK      │           5469 │
│ 2025-04-25 │ NULL                     │ NULL                        │ LA      │          12000 │
│ 2025-04-25 │ NULL                     │ Central U.S. April Flooding │ OH      │          10000 │
└────────────┴──────────────────────────┴─────────────────────────────┴─────────┴────────────────┘



In [22]:
# total insured units in dataset
con.sql("""
    SELECT SUM(policyCount) AS totalPolicyCount
    FROM claims
""").show()

┌──────────────────┐
│ totalPolicyCount │
│      int128      │
├──────────────────┤
│          3556310 │
└──────────────────┘



**Policy Count**:
>Insured units in an active status. A policy contract ceases to be in an active status as of the cancellation date or the expiration date. Residential Condominium Building Association Policy (RCBAP) contracts are stored as a single policy contract but insure multiple units and therefore represent multiple policies.

Source: https://www.fema.gov/openfema-data-page/fima-nfip-redacted-claims-v2
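
Because a single claim record can represent several insured units, the total policy count exceeds the row count. The short calculation below makes the gap explicit, using only the two figures printed earlier in this notebook. The variable names are illustrative.

```python
# figures copied from the outputs earlier in this notebook
total_policy_count = 3_556_310   # SUM(policyCount)
claim_rows = 2_712_269           # COUNT(*)

policies_per_claim = total_policy_count / claim_rows
extra_units = total_policy_count - claim_rows

print(f"{policies_per_claim:.4f} insured units per claim record on average")
print(f"{extra_units:,} insured units beyond one per record")
```

The ratio agrees with the `avg` that `SUMMARIZE` reports for `policyCount` in the summary statistics, which is a useful consistency check between the two queries.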

## Summary Statistics

In [23]:
# calculate summary statistics of each column, pandas df for readability
summarize_df = con.sql("""
    SELECT *
    FROM (SUMMARIZE claims)
""").df()

summarize_df.head()

Unnamed: 0,column_name,column_type,min,max,approx_unique,avg,std,q25,q50,q75,count,null_percentage
0,agricultureStructureIndicator,BOOLEAN,false,true,2,,,,,,2712269,0.0
1,asOfDate,TIMESTAMP WITH TIME ZONE,2019-09-19 09:45:58.926-04,2025-05-09 10:37:51.032136-04,2020,,,,,,2712269,99.38
2,basementEnclosureCrawlspaceType,SMALLINT,0,4,4,1.1770725064274417,1.0645721195210949,0.0,1.0,2.0,2712269,69.81
3,policyCount,SMALLINT,1,1090,399,1.3111936905963235,6.7764159242846,1.0,1.0,1.0,2712269,0.0
4,crsClassificationCode,SMALLINT,2,10,9,7.097005736252515,1.2044385650414646,6.0,7.0,8.0,2712269,95.09


In [24]:
# slice through columns for readability
summarize_df.iloc[:15, :]

Unnamed: 0,column_name,column_type,min,max,approx_unique,avg,std,q25,q50,q75,count,null_percentage
0,agricultureStructureIndicator,BOOLEAN,false,true,2,,,,,,2712269,0.0
1,asOfDate,TIMESTAMP WITH TIME ZONE,2019-09-19 09:45:58.926-04,2025-05-09 10:37:51.032136-04,2020,,,,,,2712269,99.38
2,basementEnclosureCrawlspaceType,SMALLINT,0,4,4,1.1770725064274417,1.0645721195210949,0.0,1.0,2.0,2712269,69.81
3,policyCount,SMALLINT,1,1090,399,1.3111936905963235,6.7764159242846,1.0,1.0,1.0,2712269,0.0
4,crsClassificationCode,SMALLINT,2,10,9,7.097005736252515,1.2044385650414646,6.0,7.0,8.0,2712269,95.09
5,dateOfLoss,DATE,1978-01-01,2025-05-08,17092,,,,,,2712269,0.0
6,elevatedBuildingIndicator,BOOLEAN,false,true,2,,,,,,2712269,0.0
7,elevationCertificateIndicator,VARCHAR,1,E,9,,,,,,2712269,77.93
8,elevationDifference,"DECIMAL(6,1)",-9989.0,998.0,374,1.2412984434821497,28.89929695021127,0.0,1.0,3.0,2712269,72.96
9,baseFloodElevation,"DECIMAL(6,1)",-9999.0,9998.0,10670,127.91883349679618,771.9052642868454,7.0,9.0,14.0,2712269,75.68


In [25]:
# slice through columns for readability
summarize_df.iloc[15:30, :]

Unnamed: 0,column_name,column_type,min,max,approx_unique,avg,std,q25,q50,q75,count,null_percentage
15,numberOfFloorsInTheInsuredBuilding,SMALLINT,1,6,6,1.6938736284429694,0.8766122533916554,1.0,1.0,2.0,2712269,0.64
16,nonProfitIndicator,BOOLEAN,false,true,2,,,,,,2712269,0.0
17,obstructionType,SMALLINT,10,98,20,16.647995328689827,16.220438899177925,10.0,10.0,10.0,2712269,44.31
18,occupancyType,SMALLINT,1,19,14,3.409205010736819,4.302312472419813,1.0,1.0,4.0,2712269,0.02
19,originalConstructionDate,DATE,1069-01-01,2025-01-24,32065,,,,,,2712269,0.07
20,originalNBDate,DATE,0998-09-21,2025-06-10,18008,,,,,,2712269,0.0
21,amountPaidOnBuildingClaim,"DECIMAL(12,2)",-201667.50,10741476.93,1340761,33420.77803829708,70287.17975307132,2372.0,10013.0,39430.0,2712269,20.94
22,amountPaidOnContentsClaim,"DECIMAL(12,2)",-80000.00,757048.95,491712,7241.491405149647,22309.39540062048,0.0,0.0,4930.0,2712269,20.94
23,amountPaidOnIncreasedCostOfComplianceClaim,"DECIMAL(12,2)",-6450.00,60000.00,8778,443.9857925535164,3430.7187165367727,0.0,0.0,0.0,2712269,20.94
24,postFIRMConstructionIndicator,BOOLEAN,false,true,2,,,,,,2712269,0.0


In [26]:
# slice through columns for readability
summarize_df.iloc[30:45, :]

Unnamed: 0,column_name,column_type,min,max,approx_unique,avg,std,q25,q50,q75,count,null_percentage
30,primaryResidenceIndicator,BOOLEAN,false,true,2,,,,,,2712269,0.0
31,buildingDamageAmount,BIGINT,0,927700000,212080,38208.02455475229,797984.7176002406,3541.0,11716.0,43005.0,2712269,22.08
32,buildingDeductibleCode,VARCHAR,0,H,15,,,,,,2712269,9.92
33,netBuildingPaymentAmount,"DECIMAL(12,2)",-201667.50,10741476.93,1340258,26392.024846635788,63904.97125793367,4.0,4967.0,26814.0,2712269,0.0
34,buildingPropertyValue,BIGINT,0,2143596000,441823,1178552.9682516523,33428601.47466117,61439.0,112388.0,195772.0,2712269,22.16
35,causeOfDamage,VARCHAR,0,Z,17,,,,,,2712269,1.39
36,condominiumCoverageTypeCode,VARCHAR,A,U,5,,,,,,2712269,1.38
37,contentsDamageAmount,BIGINT,0,19230507,104391,18433.39366964457,84558.06826232812,1569.0,5746.0,18956.0,2712269,58.5
38,contentsDeductibleCode,VARCHAR,0,H,15,,,,,,2712269,20.2
39,netContentsPaymentAmount,"DECIMAL(12,2)",-80000.00,757048.95,491683,5703.9727689841975,20032.401811819276,0.0,0.0,2342.0,2712269,0.0


In [27]:
# slice through columns for readability
summarize_df.iloc[45:60, :]

Unnamed: 0,column_name,column_type,min,max,approx_unique,avg,std,q25,q50,q75,count,null_percentage
45,floodWaterDuration,SMALLINT,0,999,261,0.8052723609654414,14.711450527028967,0.0,0.0,0.0,2712269,10.22
46,floodproofedIndicator,BOOLEAN,false,true,2,,,,,,2712269,0.0
47,floodEvent,VARCHAR,2021 Mid-Spring Severe Storms,Yellowstone Flooding,181,,,,,,2712269,28.23
48,iccCoverage,INTEGER,0,30000,4,28265.786678448367,4401.656086048918,30000.0,30000.0,30000.0,2712269,27.73
49,netIccPaymentAmount,"DECIMAL(8,2)",-6450.00,60000.00,8735,349.706785562199,3049.9895153999505,0.0,0.0,0.0,2712269,0.0
50,nfipRatedCommunityNumber,VARCHAR,000000,999999,16336,,,,,,2712269,0.0
51,nfipCommunityNumberCurrent,VARCHAR,0000,815000,12126,,,,,,2712269,71.63
52,nfipCommunityName,VARCHAR,ABBEVILLE COUNTY *,"ZUMBRO FALLS, CITY OF",9918,,,,,,2712269,69.69
53,nonPaymentReasonContents,VARCHAR,01,99,23,,,,,,2712269,68.56
54,nonPaymentReasonBuilding,VARCHAR,01,99,23,,,,,,2712269,77.93


In [28]:
# slice through columns for readability
summarize_df.iloc[60:, :]

Unnamed: 0,column_name,column_type,min,max,approx_unique,avg,std,q25,q50,q75,count,null_percentage
60,waterDepth,SMALLINT,-999,999,477,4.373455093450716,16.6137961933647,0.0,1.0,2.0,2712269,8.72
61,floodZoneCurrent,VARCHAR,A,X,62,,,,,,2712269,71.68
62,buildingDescriptionCode,SMALLINT,1,21,18,1.3388721969995547,1.5010850565190803,1.0,1.0,1.0,2712269,63.98
63,rentalPropertyIndicator,BOOLEAN,false,true,2,,,,,,2712269,0.0
64,state,VARCHAR,AK,WY,57,,,,,,2712269,0.0
65,reportedCity,VARCHAR,Currently Unavailable,Currently Unavailable,1,,,,,,2712269,0.0
66,reportedZipCode,VARCHAR,,99999,26382,,,,,,2712269,0.0
67,countyCode,VARCHAR,01001,78030,2950,,,,,,2712269,2.3
68,censusTract,VARCHAR,01001020100,78030961200,61953,,,,,,2712269,5.07
69,censusBlockGroupFips,VARCHAR,010010201001,780309612002,122677,,,,,,2712269,5.07


## Highlighted Features: Amount Paid on Claims Summary Statistics
We will analyze these attributes extensively throughout this notebook.

**amountPaidOnBuildingClaim**: Dollar amount paid on the building claim. In some instances, a negative amount may appear, which occurs when a check issued to a policy holder is not cashed and has to be re-issued.

**amountPaidOnContentsClaim**: Dollar amount paid on the contents claim. In some instances, a negative amount may appear, which occurs when a check issued to a policy holder is not cashed and has to be re-issued.

**amountPaidOnIncreasedCostOfComplianceClaim**: ICC coverage is one of several flood insurance resources for policyholders who need additional help rebuilding after a flood. It provides up to $30,000 to help cover the cost of mitigation measures that will reduce the flood risk.

Source: https://www.fema.gov/openfema-data-page/fima-nfip-redacted-claims-v2

Note: All dollar amounts are reported in nominal terms and have not been adjusted for inflation.
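
The descriptions above suggest two quick sanity checks: payments can be negative, and ICC payments are described as capped at $30,000, although the summary statistics in this notebook report a maximum of 60,000.00 for the ICC column. The sketch below flags both cases. The records, the dictionary keys, and the `flag_record` helper are hypothetical and only illustrate the checks. The source does not explain the values above the cap, so a flag here means "worth a closer look", not "error".

```python
ICC_DOCUMENTED_CAP = 30_000  # from the field description above

# assumption: hypothetical records, not drawn from the dataset
records = [
    {"building": 40_000.00, "contents": 10_000.00, "icc": 0.0},
    {"building": -1_200.00, "contents": 0.0, "icc": 0.0},
    {"building": 80_000.00, "contents": 5_000.00, "icc": 45_000.0},
]


def flag_record(record):
    """Return a list of labels describing anything unusual in a record."""
    flags = []
    if record["building"] < 0 or record["contents"] < 0:
        flags.append("negative payment")
    if record["icc"] > ICC_DOCUMENTED_CAP:
        flags.append("ICC above documented cap")
    return flags


flagged = [flag_record(r) for r in records]
for record, flags in zip(records, flagged):
    print(record, flags)
```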

In [29]:
# examine summary statistics on paid total claims
con.sql("""
    SELECT
        column_name, column_type, count, null_percentage,
        min, max, approx_unique,
        ROUND(avg::DOUBLE, 2) AS avg,
        ROUND(std::DOUBLE, 2) AS std,
        ROUND(q25::DOUBLE, 2) AS q25,
        ROUND(q50::DOUBLE, 2) AS q50,
        ROUND(q75::DOUBLE, 2) AS q75
    FROM (SUMMARIZE claims)
    WHERE column_name IN (
        'amountPaidOnBuildingClaim',
        'amountPaidOnContentsClaim',
        'amountPaidOnIncreasedCostOfComplianceClaim'
    )
""").df()

Unnamed: 0,column_name,column_type,count,null_percentage,min,max,approx_unique,avg,std,q25,q50,q75
0,amountPaidOnBuildingClaim,"DECIMAL(12,2)",2712269,20.94,-201667.5,10741476.93,1340761,33420.78,70287.18,2375.0,10019.0,39481.0
1,amountPaidOnContentsClaim,"DECIMAL(12,2)",2712269,20.94,-80000.0,757048.95,491712,7241.49,22309.4,0.0,0.0,4920.0
2,amountPaidOnIncreasedCostOfComplianceClaim,"DECIMAL(12,2)",2712269,20.94,-6450.0,60000.0,8778,443.99,3430.72,0.0,0.0,0.0


In [30]:
# top 10 paid total claim records
con.sql("""
    SELECT
        dateOfLoss,
        floodEvent,
        state,
        policyCount,
        ROUND(
            amountPaidOnBuildingClaim
            + amountPaidOnContentsClaim
            + amountPaidOnIncreasedCostOfComplianceClaim, 0)::INT AS paidTotalClaim,
        ROUND(amountPaidOnBuildingClaim, 0)::INT AS amountPaidOnBuildingClaim,
        ROUND(amountPaidOnContentsClaim, 0)::INT AS amountPaidOnContentsClaim,
        ROUND(amountPaidOnIncreasedCostOfComplianceClaim, 0)::INT AS amountPaidICC
    FROM claims
    ORDER BY paidTotalClaim DESC
    LIMIT 10
""").df()

Unnamed: 0,dateOfLoss,floodEvent,state,policyCount,paidTotalClaim,amountPaidOnBuildingClaim,amountPaidOnContentsClaim,amountPaidICC
0,2022-09-28,Hurricane Ian,FL,198,10841477,10741477,100000,0
1,2005-08-29,Hurricane Katrina,MS,48,10000000,10000000,0,0
2,2012-10-29,Hurricane Sandy,NY,184,9467720,9467720,0,0
3,2004-09-15,Hurricane Ivan,FL,64,9169507,9100033,39474,30000
4,2001-06-09,Tropical Storm Allison,TX,233,9023558,8973270,50288,0
5,2022-09-28,Hurricane Ian,FL,36,9000000,9000000,0,0
6,2022-09-28,Hurricane Ian,FL,72,8758773,8758773,0,0
7,2022-09-28,Hurricane Ian,FL,54,8734196,8734196,0,0
8,2005-08-29,Hurricane Katrina,MS,71,8386100,8349000,37100,0
9,2022-09-28,Hurricane Ian,FL,51,7778937,7778937,0,0


In [31]:
# top 10 paid total claim records for only one policy
con.sql("""
    SELECT
        dateOfLoss,
        floodEvent,
        state,
        policyCount,
        ROUND(
            amountPaidOnBuildingClaim
            + amountPaidOnContentsClaim
            + amountPaidOnIncreasedCostOfComplianceClaim, 0)::INT AS paidTotalClaim,
        ROUND(amountPaidOnBuildingClaim, 0)::INT AS amountPaidOnBuildingClaim,
        ROUND(amountPaidOnContentsClaim, 0)::INT AS amountPaidOnContentsClaim,
        ROUND(amountPaidOnIncreasedCostOfComplianceClaim, 0)::INT AS amountPaidICC
    FROM claims
    WHERE policyCount = 1
    ORDER BY paidTotalClaim DESC
    LIMIT 10
""").df()

Unnamed: 0,dateOfLoss,floodEvent,state,policyCount,paidTotalClaim,amountPaidOnBuildingClaim,amountPaidOnContentsClaim,amountPaidICC
0,2008-06-13,,IA,1,1250000,500000,750000,0
1,2023-04-13,April Florida Flooding,FL,1,1065210,1000000,65210,0
2,2016-05-24,,AR,1,1054205,500000,554205,0
3,2012-10-29,Hurricane Sandy,NJ,1,1050000,500000,550000,0
4,2012-10-29,Hurricane Sandy,NJ,1,1039512,500000,539512,0
5,2011-09-07,Tropical Storm Lee,NY,1,1000000,500000,500000,0
6,2001-06-05,Tropical Storm Allison,TX,1,1000000,500000,500000,0
7,2017-08-27,Hurricane Harvey,TX,1,1000000,500000,500000,0
8,2003-09-15,,DE,1,1000000,500000,500000,0
9,2011-09-07,Tropical Storm Lee,PA,1,1000000,500000,500000,0


## Additional Attributes: Building Damage, Values, and Replacement Costs Summary Statistics

In [32]:
# examine summary statistics on building damage amounts, property values and replacement costs columns
con.sql("""
    SELECT
        column_name, column_type, count, null_percentage,
        min, max, approx_unique,
        ROUND(avg::DOUBLE, 2) AS avg,
        ROUND(std::DOUBLE, 2) AS std,
        ROUND(q25::DOUBLE, 2) AS q25,
        ROUND(q50::DOUBLE, 2) AS q50,
        ROUND(q75::DOUBLE, 2) AS q75
    FROM (SUMMARIZE claims)
    WHERE column_name IN (
        'buildingPropertyValue',
        'buildingReplacementCost',
        'buildingDamageAmount'
    )
""").df()

Unnamed: 0,column_name,column_type,count,null_percentage,min,max,approx_unique,avg,std,q25,q50,q75
0,buildingPropertyValue,BIGINT,2712269,22.16,0,2143596000,441823,1178552.97,33428601.47,61457.0,112345.0,195667.0
1,buildingReplacementCost,BIGINT,2712269,22.16,0,2147400000,470294,1361595.43,36825292.66,0.0,125033.0,232914.0
2,buildingDamageAmount,BIGINT,2712269,22.08,0,927700000,212080,38208.02,797984.72,3537.0,11726.0,42962.0


## Additional Attributes: Elevation and Water Depth Summary Statistics

In [33]:
# examine summary statistics on elevation and water depth columns
con.sql("""
    SELECT
        column_name, column_type, count, null_percentage,
        min, max, approx_unique,
        ROUND(avg::DOUBLE, 2) AS avg,
        ROUND(std::DOUBLE, 2) AS std,
        ROUND(q25::DOUBLE, 2) AS q25,
        ROUND(q50::DOUBLE, 2) AS q50,
        ROUND(q75::DOUBLE, 2) AS q75
    FROM (SUMMARIZE claims)
    WHERE column_name IN (
        'baseFloodElevation',
        'waterDepth',
        'lowestAdjacentGrade',
        'lowestFloorElevation',
        'elevationDifference'
    )
""").df()

Unnamed: 0,column_name,column_type,count,null_percentage,min,max,approx_unique,avg,std,q25,q50,q75
0,elevationDifference,"DECIMAL(6,1)",2712269,72.96,-9989.0,998.0,374,1.24,28.9,0.0,1.0,3.0
1,baseFloodElevation,"DECIMAL(6,1)",2712269,75.68,-9999.0,9998.0,10670,127.92,771.91,7.0,9.0,14.0
2,lowestAdjacentGrade,"DECIMAL(6,1)",2712269,81.09,-99999.9,9998.9,12901,51.97,1432.84,3.0,6.0,11.0
3,lowestFloorElevation,"DECIMAL(6,1)",2712269,76.37,-9999.0,9998.9,13545,98.36,563.35,7.0,10.0,17.0
4,waterDepth,SMALLINT,2712269,8.72,-999.0,999.0,477,4.37,16.61,0.0,1.0,2.0
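
In the table above, the averages sit well above the 75th percentile (for `baseFloodElevation`, an average of 127.92 against a q75 of 14.0), which is the pattern a small number of extreme values would produce. The sketch below uses made-up numbers, not values from the dataset, to show that effect. Treating magnitudes near 9999 as placeholders rather than measurements is an assumption that the source does not confirm, and the `PLAUSIBLE_LIMIT` cutoff is hypothetical.

```python
from statistics import mean, median

# Hypothetical elevation values for illustration only (not taken from the dataset).
# The single extreme value stands in for the kind of outlier seen in the min/max columns.
sample = [7.0, 9.0, 14.0, 8.0, 11.0, 6.0, 9998.0]

# Hypothetical cutoff: anything at or beyond this magnitude is treated as implausible.
PLAUSIBLE_LIMIT = 9000.0
filtered = [v for v in sample if abs(v) < PLAUSIBLE_LIMIT]

# The mean moves a lot when the extreme value is dropped; the median barely moves.
print(f"all values:      mean={mean(sample):.2f}, median={median(sample):.2f}")
print(f"within cutoff:   mean={mean(filtered):.2f}, median={median(filtered):.2f}")
```

This is one reason the quartile columns are a safer first read of these attributes than the `avg` and `std` columns.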


# Analysis
Note: All dollar amounts are reported in nominal terms and have not been adjusted for inflation.

## Claims Statistics

In [34]:
con.sql("""
    SELECT
        COUNT(id) AS countClaims,
        ROUND(
            SUM(amountPaidOnBuildingClaim)
            + SUM(amountPaidOnContentsClaim)
            + SUM(amountPaidOnIncreasedCostOfComplianceClaim), 0)::BIGINT AS paidTotalClaim, 
        ROUND(SUM(amountPaidOnBuildingClaim), 0)::BIGINT AS paidBuildingClaim,
        ROUND(SUM(amountPaidOnContentsClaim), 0)::BIGINT AS paidContentsClaim, 
        ROUND(SUM(amountPaidOnIncreasedCostOfComplianceClaim), 0)::BIGINT AS paidICC
    FROM claims
""").df()

Unnamed: 0,countClaims,paidTotalClaim,paidBuildingClaim,paidContentsClaim,paidICC
0,2712269,88145828459,71665544599,15528226921,952056938
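
As a quick arithmetic check on the totals above, the three components can be added up and expressed as shares of the total. This is a minimal sketch that copies the figures from the output by hand; the variable names are illustrative and not part of the original notebook.

```python
# Figures copied from the Claims Statistics output above (nominal dollars).
paid_total = 88_145_828_459
components = {
    "building": 71_665_544_599,
    "contents": 15_528_226_921,
    "icc": 952_056_938,
}

# Each column is rounded separately in the query, so the rounded components
# do not have to add up to the rounded total exactly.
component_sum = sum(components.values())
print(f"sum of components:              {component_sum:,}")
print(f"difference from paidTotalClaim: {paid_total - component_sum:,}")

# Share of the total paid amount contributed by each component.
shares = {name: amount / paid_total for name, amount in components.items()}
for name, share in shares.items():
    print(f"{name:<9}{share:.1%}")
```

A difference of a dollar or so between the summed components and `paidTotalClaim` is a rounding artifact, not a data problem.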


**Table xx: Top 20 States by Number of Claims**

In [35]:
con.sql("""
    SELECT
        state,
        COUNT(id) AS countClaims,
        ROUND(
            SUM(amountPaidOnBuildingClaim)
            + SUM(amountPaidOnContentsClaim)
            + SUM(amountPaidOnIncreasedCostOfComplianceClaim), 0)::BIGINT AS paidTotalClaim, 
        ROUND(SUM(amountPaidOnBuildingClaim), 0)::BIGINT AS paidBuildingClaim,
        ROUND(SUM(amountPaidOnContentsClaim), 0)::BIGINT AS paidContentsClaim, 
        ROUND(SUM(amountPaidOnIncreasedCostOfComplianceClaim), 0)::BIGINT AS paidICC
    FROM claims
    GROUP BY state
    ORDER BY countClaims DESC
    LIMIT 20
""").df()

Unnamed: 0,state,countClaims,paidTotalClaim,paidBuildingClaim,paidContentsClaim,paidICC
0,LA,484666,20947513582,16527767559,4141923176,277822847
1,FL,447278,18384455856,15836650628,2507931591,39873637
2,TX,392749,17278352758,13302979646,3917178105,58195007
3,NJ,201265,6475796363,5352658075,872585670,250552618
4,NY,174868,5736650506,4962160483,720696292,53793731
5,NC,108782,2229981486,1888471315,304395942,37114230
6,PA,76689,1434820809,1126172001,293470206,15178602
7,MS,64182,3123903360,2387164446,675757041,60981873
8,CA,53267,744106757,638192339,104848121,1066297
9,IL,52538,590353164,493999341,81082965,15270858


**Table xx: Top 20 States by Total Paid Claims**

In [36]:
con.sql("""
    SELECT
        state,
        COUNT(id) AS countClaims,
        ROUND(
            SUM(amountPaidOnBuildingClaim)
            + SUM(amountPaidOnContentsClaim)
            + SUM(amountPaidOnIncreasedCostOfComplianceClaim), 0)::BIGINT AS paidTotalClaim, 
        ROUND(SUM(amountPaidOnBuildingClaim), 0)::BIGINT AS paidBuildingClaim,
        ROUND(SUM(amountPaidOnContentsClaim), 0)::BIGINT AS paidContentsClaim, 
        ROUND(SUM(amountPaidOnIncreasedCostOfComplianceClaim), 0)::BIGINT AS paidICC
    FROM claims
    GROUP BY state
    ORDER BY paidTotalClaim DESC
    LIMIT 20
""").df()

Unnamed: 0,state,countClaims,paidTotalClaim,paidBuildingClaim,paidContentsClaim,paidICC
0,LA,484666,20947513582,16527767559,4141923176,277822847
1,FL,447278,18384455856,15836650628,2507931591,39873637
2,TX,392749,17278352758,13302979646,3917178105,58195007
3,NJ,201265,6475796363,5352658075,872585670,250552618
4,NY,174868,5736650506,4962160483,720696292,53793731
5,MS,64182,3123903360,2387164446,675757041,60981873
6,NC,108782,2229981486,1888471315,304395942,37114230
7,PA,76689,1434820809,1126172001,293470206,15178602
8,AL,44835,1193138174,967795331,212455483,12887360
9,SC,49506,1035539113,898006633,133682678,3849802
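
The two state rankings differ because some states have fewer but costlier claims. One way to see this is to divide `paidTotalClaim` by `countClaims` for each row. The sketch below copies the rows from the table above by hand; note that `countClaims` counts every claim record, including any with no payment, so this is an average per claim record rather than per paid claim.

```python
# (state, countClaims, paidTotalClaim) copied from the table above (nominal dollars).
rows = [
    ("LA", 484_666, 20_947_513_582),
    ("FL", 447_278, 18_384_455_856),
    ("TX", 392_749, 17_278_352_758),
    ("NJ", 201_265, 6_475_796_363),
    ("NY", 174_868, 5_736_650_506),
    ("MS", 64_182, 3_123_903_360),
    ("NC", 108_782, 2_229_981_486),
    ("PA", 76_689, 1_434_820_809),
    ("AL", 44_835, 1_193_138_174),
    ("SC", 49_506, 1_035_539_113),
]

# Average paid amount per claim record, by state.
avg_paid = {state: total / count for state, count, total in rows}

# Print states from highest to lowest average.
for state, avg in sorted(avg_paid.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{state}: {avg:,.0f}")
```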


**Table xx: Top 20 Counties by Number of Claims**

In [37]:
con.sql("""
    SELECT
        state,
        countyCode,
        COUNT(id) AS countClaims,
        ROUND(
            SUM(amountPaidOnBuildingClaim)
            + SUM(amountPaidOnContentsClaim)
            + SUM(amountPaidOnIncreasedCostOfComplianceClaim), 0)::BIGINT AS paidTotalClaim,  
        ROUND(SUM(amountPaidOnBuildingClaim), 0)::BIGINT AS paidBuildingClaim,
        ROUND(SUM(amountPaidOnContentsClaim), 0)::BIGINT AS paidContentsClaim, 
        ROUND(SUM(amountPaidOnIncreasedCostOfComplianceClaim), 0)::BIGINT AS paidICC
    FROM claims
    GROUP BY state, countyCode
    ORDER BY countClaims DESC
    LIMIT 20
""").df()

Unnamed: 0,state,countyCode,countClaims,paidTotalClaim,paidBuildingClaim,paidContentsClaim,paidICC
0,TX,48201,170714,8757473076,6801326001,1936097596,20049478
1,LA,22051,135459,3581575679,2667508197,852374316,61693166
2,LA,22071,127191,7299472607,5895174387,1293804705,110493515
3,FL,12086,61937,848599959,666613901,181986058,0
4,TX,48167,60588,2469378449,1898457084,549950596,20970769
5,NJ,34029,52800,2608104753,2160918465,275259388,171926899
6,FL,12103,51624,3358698015,2895731336,462233188,733492
7,NY,36059,51486,2280445421,1973196756,290493947,16754717
8,FL,12071,48362,3837968086,3441477426,393582512,2908147
9,LA,22103,38318,1778918351,1381100007,383543475,14274869


**Table xx: Top 20 Counties by Total Paid Claims**

In [38]:
con.sql("""
    SELECT
        state,
        countyCode,
        COUNT(id) AS countClaims,
        ROUND(
            SUM(amountPaidOnBuildingClaim)
            + SUM(amountPaidOnContentsClaim)
            + SUM(amountPaidOnIncreasedCostOfComplianceClaim), 0)::BIGINT AS paidTotalClaim,  
        ROUND(SUM(amountPaidOnBuildingClaim), 0)::BIGINT AS paidBuildingClaim,
        ROUND(SUM(amountPaidOnContentsClaim), 0)::BIGINT AS paidContentsClaim, 
        ROUND(SUM(amountPaidOnIncreasedCostOfComplianceClaim), 0)::BIGINT AS paidICC
    FROM claims
    GROUP BY state, countyCode
    ORDER BY paidTotalClaim DESC
    LIMIT 20
""").df()

Unnamed: 0,state,countyCode,countClaims,paidTotalClaim,paidBuildingClaim,paidContentsClaim,paidICC
0,TX,48201,170714,8757473076,6801326001,1936097596,20049478
1,LA,22071,127191,7299472607,5895174387,1293804705,110493515
2,FL,12071,48362,3837968086,3441477426,393582512,2908147
3,LA,22051,135459,3581575679,2667508197,852374316,61693166
4,FL,12103,51624,3358698015,2895731336,462233188,733492
5,NJ,34029,52800,2608104753,2160918465,275259388,171926899
6,TX,48167,60588,2469378449,1898457084,549950596,20970769
7,NY,36059,51486,2280445421,1973196756,290493947,16754717
8,LA,22087,23918,2245139592,1680136546,553968821,11034225
9,LA,22103,38318,1778918351,1381100007,383543475,14274869


**Table xx: Top 20 Flood Events by Number of Claims**

In [39]:
con.sql("""
    SELECT
        floodEvent,
        yearOfLoss,
        COUNT(id) AS countClaims,
        ROUND(
            SUM(amountPaidOnBuildingClaim)
            + SUM(amountPaidOnContentsClaim)
            + SUM(amountPaidOnIncreasedCostOfComplianceClaim), 0)::BIGINT AS paidTotalClaim,  
        ROUND(SUM(amountPaidOnBuildingClaim), 0)::BIGINT AS paidBuildingClaim,
        ROUND(SUM(amountPaidOnContentsClaim), 0)::BIGINT AS paidContentsClaim, 
        ROUND(SUM(amountPaidOnIncreasedCostOfComplianceClaim), 0)::BIGINT AS paidICC
    FROM claims
    WHERE floodEvent IS NOT NULL
    GROUP BY floodEvent, yearOfLoss
    ORDER BY countClaims DESC
    LIMIT 20
""").df()

Unnamed: 0,floodEvent,yearOfLoss,countClaims,paidTotalClaim,paidBuildingClaim,paidContentsClaim,paidICC
0,Hurricane Katrina,2005,208348,16261697056,12659081935,3360020221,242594900
1,Hurricane Sandy,2012,144848,8957466107,7707744797,951637071,298084239
2,Hurricane Harvey,2017,92398,9055711922,6925370027,2115077279,15264616
3,Hurricane Ike,2008,58126,2702511916,2073801567,577791589,50918760
4,Hurricane Helene,2024,57843,6027667592,5294925051,731897793,844748
5,Hurricane Irene,2011,52493,1347399996,1139897981,183189121,24312894
6,Hurricane Ian,2022,48754,4838681069,4306968612,528552815,3159642
7,Flooding,1995,47489,731729839,554632707,177097132,0
8,Tropical Storm Allison,2001,35561,1104979705,820333554,279280479,5365672
9,Hurricane Irma,2017,33339,1114753522,945900736,162153778,6699008


**Table xx: Top 20 Flood Events by Total Paid Claims**

In [40]:
con.sql("""
    SELECT
        floodEvent,
        yearOfLoss,
        COUNT(id) AS countClaims,
        ROUND(
            SUM(amountPaidOnBuildingClaim)
            + SUM(amountPaidOnContentsClaim)
            + SUM(amountPaidOnIncreasedCostOfComplianceClaim), 0)::BIGINT AS paidTotalClaim,  
        ROUND(SUM(amountPaidOnBuildingClaim), 0)::BIGINT AS paidBuildingClaim,
        ROUND(SUM(amountPaidOnContentsClaim), 0)::BIGINT AS paidContentsClaim, 
        ROUND(SUM(amountPaidOnIncreasedCostOfComplianceClaim), 0)::BIGINT AS paidICC
    FROM claims
    WHERE floodEvent IS NOT NULL
    GROUP BY floodEvent, yearOfLoss
    ORDER BY paidTotalClaim DESC
    LIMIT 20
""").df()

Unnamed: 0,floodEvent,yearOfLoss,countClaims,paidTotalClaim,paidBuildingClaim,paidContentsClaim,paidICC
0,Hurricane Katrina,2005,208348,16261697056,12659081935,3360020221,242594900
1,Hurricane Harvey,2017,92398,9055711922,6925370027,2115077279,15264616
2,Hurricane Sandy,2012,144848,8957466107,7707744797,951637071,298084239
3,Hurricane Helene,2024,57843,6027667592,5294925051,731897793,844748
4,Hurricane Ian,2022,48754,4838681069,4306968612,528552815,3159642
5,Hurricane Ike,2008,58126,2702511916,2073801567,577791589,50918760
6,Mid-summer severe storms,2016,30018,2533534783,2175218939,347276988,11038856
7,Hurricane Irene,2011,52493,1347399996,1139897981,183189121,24312894
8,Hurricane Ida,2021,28317,1347194328,1115954177,229931261,1308891
9,Hurricane Ivan,2004,20137,1325419294,1083795424,221959720,19664150


**Table xx: Top 20 States and Flood Events by Number of Claims**

In [41]:
con.sql("""
    SELECT
        state,
        floodEvent,
        yearOfLoss,
        COUNT(id) AS countClaims,
        ROUND(
            SUM(amountPaidOnBuildingClaim)
            + SUM(amountPaidOnContentsClaim)
            + SUM(amountPaidOnIncreasedCostOfComplianceClaim), 0)::BIGINT AS paidTotalClaim,  
        ROUND(SUM(amountPaidOnBuildingClaim), 0)::BIGINT AS paidBuildingClaim,
        ROUND(SUM(amountPaidOnContentsClaim), 0)::BIGINT AS paidContentsClaim, 
        ROUND(SUM(amountPaidOnIncreasedCostOfComplianceClaim), 0)::BIGINT AS paidICC
    FROM claims
    WHERE floodEvent IS NOT NULL
    GROUP BY ALL
    ORDER BY countClaims DESC
    LIMIT 20
""").df()

Unnamed: 0,state,floodEvent,yearOfLoss,countClaims,paidTotalClaim,paidBuildingClaim,paidContentsClaim,paidICC
0,LA,Hurricane Katrina,2005,176276,13347164644,10441743931,2721451049,183969663
1,TX,Hurricane Harvey,2017,91872,9040370286,6912549831,2112555840,15264616
2,NJ,Hurricane Sandy,2012,74983,4372261883,3673280769,457640033,241341081
3,NY,Hurricane Sandy,2012,57405,4215062163,3702376632,465410811,47274721
4,FL,Hurricane Helene,2024,54458,5781758403,5080129005,700784650,844748
5,FL,Hurricane Ian,2022,47490,4792735437,4265342788,524398008,2994642
6,TX,Hurricane Ike,2008,44095,2230307106,1723323896,481446923,25536286
7,LA,Flooding,1995,36820,594357775,442113752,152244023,0
8,LA,Mid-summer severe storms,2016,30018,2533534783,2175218939,347276988,11038856
9,FL,Hurricane Irma,2017,28756,985666540,834520911,145158635,5986994


**Table xx: Top 20 States and Flood Events by Total Paid Claims**

In [42]:
con.sql("""
    SELECT
        state,
        floodEvent,
        yearOfLoss,
        COUNT(id) AS countClaims,
        ROUND(
            SUM(amountPaidOnBuildingClaim)
            + SUM(amountPaidOnContentsClaim)
            + SUM(amountPaidOnIncreasedCostOfComplianceClaim), 0)::BIGINT AS paidTotalClaim,  
        ROUND(SUM(amountPaidOnBuildingClaim), 0)::BIGINT AS paidBuildingClaim,
        ROUND(SUM(amountPaidOnContentsClaim), 0)::BIGINT AS paidContentsClaim, 
        ROUND(SUM(amountPaidOnIncreasedCostOfComplianceClaim), 0)::BIGINT AS paidICC
    FROM claims
    WHERE floodEvent IS NOT NULL
    GROUP BY ALL
    ORDER BY paidTotalClaim DESC
    LIMIT 20
""").df()

Unnamed: 0,state,floodEvent,yearOfLoss,countClaims,paidTotalClaim,paidBuildingClaim,paidContentsClaim,paidICC
0,LA,Hurricane Katrina,2005,176276,13347164644,10441743931,2721451049,183969663
1,TX,Hurricane Harvey,2017,91872,9040370286,6912549831,2112555840,15264616
2,FL,Hurricane Helene,2024,54458,5781758403,5080129005,700784650,844748
3,FL,Hurricane Ian,2022,47490,4792735437,4265342788,524398008,2994642
4,NJ,Hurricane Sandy,2012,74983,4372261883,3673280769,457640033,241341081
5,NY,Hurricane Sandy,2012,57405,4215062163,3702376632,465410811,47274721
6,LA,Mid-summer severe storms,2016,30018,2533534783,2175218939,347276988,11038856
7,MS,Hurricane Katrina,2005,19051,2521512578,1910623703,558996059,51892816
8,TX,Hurricane Ike,2008,44095,2230307106,1723323896,481446923,25536286
9,FL,Hurricane Milton,2024,21448,1043689059,943310626,100278565,99868


**Table xx: Top 20 Rated Flood Zones by Number of Claims**  

In [43]:
con.sql("""
    SELECT
        ratedFloodZone,
        COUNT(id) AS countClaims,
        ROUND(
            SUM(amountPaidOnBuildingClaim)
            + SUM(amountPaidOnContentsClaim)
            + SUM(amountPaidOnIncreasedCostOfComplianceClaim), 0)::BIGINT AS paidTotalClaim,  
        ROUND(SUM(amountPaidOnBuildingClaim), 0)::BIGINT AS paidBuildingClaim,
        ROUND(SUM(amountPaidOnContentsClaim), 0)::BIGINT AS paidContentsClaim, 
        ROUND(SUM(amountPaidOnIncreasedCostOfComplianceClaim), 0)::BIGINT AS paidICC
    FROM claims
    GROUP BY ratedFloodZone
    ORDER BY countClaims DESC
    LIMIT 20
""").df()

Unnamed: 0,ratedFloodZone,countClaims,paidTotalClaim,paidBuildingClaim,paidContentsClaim,paidICC
0,AE,934394,42224109737,35523442901,6181434631,519232206
1,X,404262,13758371627,10656117320,3073520010,28734297
2,A,201023,3245027249,2589229139,618510548,37287562
3,C,163811,2854901248,2120676642,724109225,10115382
4,,139030,696659870,491536385,205049316,74169
5,B,114982,3831518356,2840023155,972658315,18836887
6,A04,51801,2186540884,1801344331,364556827,20639726
7,A01,51360,1526374899,1189314596,321004209,16056094
8,A05,48584,1274257006,1058752273,186583042,28921691
9,VE,47839,1941278588,1729085731,194118081,18074776


From the Data Dictionary:
>Formerly called floodZone. NFIP Flood Zone derived from the Flood Insurance Rate Map (FIRM) used to rate the insured property.
>
>- A - Special Flood with no Base Flood Elevation on FIRM
>- AE, A1-A30 - Special Flood with Base Flood Elevation on FIRM
>- A99 - Special Flood with Protection Zone
>- AH, AHB\* - Special Flood with Shallow Ponding
>- AO, AOB\* - Special Flood with Sheet Flow
>- X, B - Moderate Flood from primary water source. Pockets of areas subject to drainage problems
>- X, C - Minimal Flood from primary water source. Pockets of areas subject to drainage problems
>- D - Possible Flood
>- V - Velocity Flood with no Base Flood Elevation on FIRM
>- VE, V1-V30 - Velocity Flood with Base Flood Elevation on FIRM
>- AE, VE, X - New zone designations used on new maps starting January 1, 1986, in lieu of A1-A30, V1-V30, and B and C
>- AR - A Special Flood Hazard Area that results from the decertification of a previously accredited flood protection system that is determined to be in the process of being restored to provide base flood protection
>- AR Dual Zones - (AR/AE, AR/A1-A30, AR/AH, AR/AO, AR/A) Areas subject to flooding from failure of the flood protection system (Zone AR) which also overlap an existing Special Flood Hazard Area as a dual zone
>
>*AHB, AOB, ARE, ARH, ARO, and ARA are not risk zones shown on a map, but are acceptable values for rating purposes*
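
Because older numbered zones (such as A04 or A01 in the table above) and their newer equivalents appear as separate rows, it can help to roll the codes up into the broad categories the data dictionary describes. The function below is a minimal sketch of one such roll-up. The function name, the category labels, and the choice to label X as "Moderate or Minimal Flood" are assumptions made here for illustration and are not part of the dataset or the notebook.

```python
import re

def classify_flood_zone(zone):
    """Map a ratedFloodZone code to a broad category (hypothetical grouping)."""
    if zone is None or not zone.strip():
        return "Unknown"
    z = zone.strip().upper()
    # AR and AR dual zones (AR/AE, ARE, ARH, ...) are checked before the A zones.
    if z.startswith("AR"):
        return "AR"
    # V, VE, and numbered V zones such as V01-V30.
    if re.fullmatch(r"V(E|\d{1,2})?", z):
        return "Velocity Flood"
    # A, AE, numbered A zones (including A99), AH/AHB, and AO/AOB.
    if re.fullmatch(r"A(E|\d{1,2}|HB?|OB?)?", z):
        return "Special Flood"
    if z == "B":
        return "Moderate Flood"
    if z == "C":
        return "Minimal Flood"
    # The data dictionary lists X under both the moderate and minimal categories.
    if z == "X":
        return "Moderate or Minimal Flood"
    if z == "D":
        return "Possible Flood"
    return "Other"

# Zone codes taken from the table above, plus a missing value.
for code in ["AE", "X", "A", "C", None, "B", "A04", "VE"]:
    print(code, "->", classify_flood_zone(code))
```

The same mapping could be written as a SQL `CASE` expression inside the query; a Python function is used here only so the logic can be run without the dataset.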

# Sample Workflow of Saving Files
## Saving Claims by State and Year of Loss

In [44]:
# sanity check: preview the aggregation as a pandas DataFrame before writing it out
df = con.sql("""
    SELECT
        UPPER(state) AS state,
        yearOfLoss,
        COUNT(*) AS claims
    FROM claims 
    WHERE yearOfLoss >= 2009
    GROUP BY ALL
    ORDER BY state ASC, yearOfLoss ASC      
""").df()

df

Unnamed: 0,state,yearOfLoss,claims
0,AK,2009,17
1,AK,2010,2
2,AK,2011,7
3,AK,2012,66
4,AK,2013,46
...,...,...,...
903,WY,2019,20
904,WY,2021,2
905,WY,2022,2
906,WY,2023,13


In [45]:
# write out as Parquet file
con.sql("""
    COPY (
        SELECT
            UPPER(state) AS state,
            yearOfLoss,
            COUNT(*) AS claims
        FROM claims 
        WHERE yearOfLoss >= 2009
        GROUP BY ALL
        ORDER BY state ASC, yearOfLoss ASC 
    )
    TO 'data/claims-state-year.parquet' (FORMAT 'parquet');
""")

%ls data/

FimaNfipClaims.parquet       nfip-data.db
FimaNfipPolicies.parquet     nfip.db
claims-nyc-year.parquet      policies-nyc-year.parquet
claims-state-year.parquet    policies-state-year.parquet
claims.db                    policies.db


In [46]:
# sanity check: preview the first 10 rows of the written Parquet file
duckdb.sql("""
    SELECT *
    FROM 'data/claims-state-year.parquet'
    LIMIT 10
""")

┌─────────┬────────────┬────────┐
│  state  │ yearOfLoss │ claims │
│ varchar │   int16    │ int64  │
├─────────┼────────────┼────────┤
│ AK      │       2009 │     17 │
│ AK      │       2010 │      2 │
│ AK      │       2011 │      7 │
│ AK      │       2012 │     66 │
│ AK      │       2013 │     46 │
│ AK      │       2014 │     22 │
│ AK      │       2015 │      7 │
│ AK      │       2016 │      8 │
│ AK      │       2017 │      6 │
│ AK      │       2018 │      1 │
├─────────┴────────────┴────────┤
│ 10 rows             3 columns │
└───────────────────────────────┘

In [47]:
# sanity check: count the rows in the written Parquet file
duckdb.sql("""
    SELECT COUNT(*) AS count
    FROM 'data/claims-state-year.parquet'
""")

┌───────┐
│ count │
│ int64 │
├───────┤
│   908 │
└───────┘

## Saving Claims by New York City Counties (Boroughs) and Year of Loss

In [48]:
con.sql("""
    COPY (
        SELECT
            UPPER(state) AS state,
            countyCode,
            yearOfLoss,
            COUNT(*) AS claims
        FROM claims 
        WHERE yearOfLoss >= 2009
            AND countyCode IN ('36005', '36047', '36061', '36081', '36085')
        GROUP BY ALL
        ORDER BY countyCode ASC, yearOfLoss ASC
    )
    TO 'data/claims-nyc-year.parquet' (FORMAT 'parquet');
""")

%ls data/

FimaNfipClaims.parquet       nfip-data.db
FimaNfipPolicies.parquet     nfip.db
claims-nyc-year.parquet      policies-nyc-year.parquet
claims-state-year.parquet    policies-state-year.parquet
claims.db                    policies.db


In [49]:
# sanity check: preview the first 10 rows of the written Parquet file
duckdb.sql("""
    SELECT *
    FROM 'data/claims-nyc-year.parquet'
    LIMIT 10
""")

┌─────────┬────────────┬────────────┬────────┐
│  state  │ countyCode │ yearOfLoss │ claims │
│ varchar │  varchar   │   int16    │ int64  │
├─────────┼────────────┼────────────┼────────┤
│ NY      │ 36005      │       2009 │      4 │
│ NY      │ 36005      │       2010 │     91 │
│ NY      │ 36005      │       2011 │    119 │
│ NY      │ 36005      │       2012 │    547 │
│ NY      │ 36005      │       2013 │      3 │
│ NY      │ 36005      │       2014 │      8 │
│ NY      │ 36005      │       2015 │      3 │
│ NY      │ 36005      │       2016 │      2 │
│ NY      │ 36005      │       2017 │      5 │
│ NY      │ 36005      │       2018 │     18 │
├─────────┴────────────┴────────────┴────────┤
│ 10 rows                          4 columns │
└────────────────────────────────────────────┘

In [50]:
# sanity check: count the rows in the written Parquet file
duckdb.sql("""
    SELECT COUNT(*) AS count
    FROM 'data/claims-nyc-year.parquet'
""")

┌───────┐
│ count │
│ int64 │
├───────┤
│    81 │
└───────┘
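
The five `countyCode` values in the filter above are the county FIPS codes for New York City's five boroughs. When the saved file is used downstream, a small lookup makes the output easier to read. This is a minimal sketch; the `NYC_BOROUGHS` dictionary and `borough_name` helper are hypothetical names introduced here, not part of the notebook.

```python
# County FIPS code -> borough name (county name in the trailing comment).
NYC_BOROUGHS = {
    "36005": "Bronx",          # Bronx County
    "36047": "Brooklyn",       # Kings County
    "36061": "Manhattan",      # New York County
    "36081": "Queens",         # Queens County
    "36085": "Staten Island",  # Richmond County
}

def borough_name(county_code):
    """Return the borough for a county FIPS code, or a fallback label."""
    return NYC_BOROUGHS.get(county_code, "Not in NYC")

# Example with one code from the filter and one from outside New York City.
for code in ["36005", "48201"]:
    print(code, "->", borough_name(code))
```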

In [51]:
# list the size of each file in data/, largest first
!du -sh data/* | sort -rh

 11G	data/policies.db
3.2G	data/FimaNfipPolicies.parquet
2.9G	data/nfip-data.db
609M	data/claims.db
305M	data/nfip.db
194M	data/FimaNfipClaims.parquet
8.0K	data/policies-state-year.parquet
4.0K	data/policies-nyc-year.parquet
4.0K	data/claims-state-year.parquet
4.0K	data/claims-nyc-year.parquet


In [52]:
# close connection
con.close()