# Download NFIP Redacted Claims and Policies Data for New York City

# Objective

This notebook demonstrates the following:

- Downloading the [NFIP Redacted Claims](https://www.fema.gov/openfema-data-page/fima-nfip-redacted-claims-v2) and [Redacted Policies](https://www.fema.gov/openfema-data-page/fima-nfip-redacted-policies-v2) data from [OpenFEMA](https://www.fema.gov/about/reports-and-data/openfema)
- Using the [OpenFEMA API](https://www.fema.gov/about/openfema/api)
- Working with [Parquet](https://parquet.apache.org/) files using [DuckDB](https://duckdb.org/)
- Conducting basic exploratory data analysis (EDA) with DuckDB
- Filtering and writing out records specific to New York City

The [OpenFEMA API](https://www.fema.gov/about/openfema/api) offers a method to [page through large datasets](https://www.fema.gov/about/openfema/working-with-large-data-sets) (e.g., [NFIP Redacted Policies](https://www.fema.gov/openfema-data-page/fima-nfip-redacted-policies-v2)), which is ideal when you don't want to download the entire dataset locally. The NFIP Redacted Claims dataset, by contrast, is small enough to download as a full Parquet file. For this analysis, we use DuckDB to efficiently filter and write out only the records relevant to New York City, saving both Parquet and CSV formats.

Note: Due to GitHub's file size limitations, the full NFIP Redacted Claims and Policies datasets are excluded from this repository.

In [1]:
import csv
import time
from datetime import datetime
import logging

import requests
from bs4 import BeautifulSoup
import duckdb

import nfip_download

In [2]:
# reproducibility
%reload_ext watermark
%watermark -v -p requests,duckdb

Python implementation: CPython
Python version       : 3.11.0
IPython version      : 8.6.0

requests: 2.28.1
duckdb  : 1.0.0



In [3]:
# data retrieved
current_date = datetime.now()
print(f"The data was retrieved on {current_date.strftime('%Y-%m-%d')}.")

The data was retrieved on 2025-03-24.


# Getting Started: OpenFEMA
- [OpenFEMA](https://www.fema.gov/about/reports-and-data/openfema): The public’s resource for FEMA program data. Promoting a culture of Open Government and increasing transparency, participation, and collaboration among the Whole Community in support of FEMA's mission to help people before, during, and after disasters.


- [OpenFEMA Developer Resources](https://www.fema.gov/about/openfema/developer-resources): Welcome to the OpenFEMA Developer Resources page, devoted to providing additional development information regarding our Application Programming Interface (API) for use in your applications and mashups.  The API is free of charge and does not currently have user registration requirements.


- [OpenFEMA API Documentation](https://www.fema.gov/about/openfema/api): As part of the OpenFEMA initiative, FEMA is providing read-only API based access to datasets (Entities). The data is exposed using a RESTful interface that uses query string parameters to manage the query. Use of the service is free and does not require a subscription or API key.


- [OpenFEMA Terms and Conditions](https://www.fema.gov/about/openfema/terms-conditions): Respect the OpenFEMA API and content on this website. Use the Site in a lawful manner. Do not modify the Site or attempt to use it to publish or transmit malicious software or content. FEMA shall not be liable for any damages resulting from the use of this website, API services, or content. Do not attempt to reidentify the individuals whose data may be aggregated. We may suspend your access to this website if we feel you have not complied with these terms and conditions.
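
The API documentation quoted above describes a RESTful interface driven by query string parameters. As a rough illustration of what such a request URL might look like, here is a minimal sketch using only the standard library. The entity name `FimaNfipClaims`, the `countyCode` field, and the `$filter`, `$top`, `$skip`, and `$format` parameters are assumptions based on the public OpenFEMA documentation and should be checked against it before use; `build_query_url` is a hypothetical helper, not part of this notebook.

```python
from urllib.parse import quote, urlencode

# Assumed base URL for version 2 OpenFEMA entities; verify against the API docs.
BASE_URL = "https://www.fema.gov/api/open/v2"


def build_query_url(entity, county_fips, top=1000, skip=0):
    """Build a paged, county-filtered query URL for an OpenFEMA entity."""
    params = {
        "$filter": f"countyCode eq '{county_fips}'",  # assumed field name
        "$top": top,    # number of records to return per request
        "$skip": skip,  # number of records to skip (paging offset)
        "$format": "json",
    }
    # Keep '$' and quotes readable and encode spaces as %20.
    query = urlencode(params, quote_via=quote, safe="$'")
    return f"{BASE_URL}/{entity}?{query}"


url = build_query_url("FimaNfipClaims", "36005")
print(url)
```

No request is sent here; the function only assembles the URL, so it can be inspected before being passed to `requests.get`.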

# OpenFEMA Dataset: FIMA NFIP Redacted Claims - v2

[OpenFEMA Dataset: FIMA NFIP Redacted Claims - v2](https://www.fema.gov/openfema-data-page/fima-nfip-redacted-claims-v2): Congress passed the National Flood Insurance Act (NFIA), 42 U.S.C. 4001 in 1968, creating the National Flood Insurance Program (NFIP) in order to reduce future flood losses through flood hazard identification, floodplain management, and providing insurance protection. This dataset provides details on NFIP claims transactions. It is derived from the NFIP system of record, staged in the NFIP reporting platform and redacted to protect policy holder personally identifiable information.

![dataset-page](images/dataset-page.png)

Screenshot of NFIP Redacted Claims dataset page.

# Download Data Dictionary

In [4]:
# URL of the page containing the table
url = 'https://www.fema.gov/openfema-data-page/fima-nfip-redacted-claims-v2'

# Request the page content (fail fast on HTTP errors or a hung connection)
response = requests.get(url, timeout=30)
response.raise_for_status()
soup = BeautifulSoup(response.content, 'html.parser')

# Find the third table with the 'usa-table' class
table = soup.find_all('table', class_='usa-table')[2]  # index 2 for the third table

# Extract table headers
headers = [header.text.strip() for header in table.find_all('th')]

# Extract table rows
rows = []
for row in table.find_all('tr')[1:]:  # Skip the header row
    cells = row.find_all('td')
    if cells:
        row_data = [cell.text.strip() for cell in cells]
        rows.append(row_data)

# Write to CSV file
with open('data-dictionary.csv', 'w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    writer.writerow(headers)  # Write header
    writer.writerows(rows)    # Write data rows

print("data dictionary has been written.")

data dictionary has been written.
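
Once written, the data dictionary can be loaded back for quick field lookups. The following is a minimal sketch, assuming the CSV has one column holding the field name. The column headers `Name` and `Description` and the demo row are hypothetical stand-ins; substitute whatever headers the scraped table actually contains.

```python
import csv
import os
import tempfile


def load_data_dictionary(path, key_column):
    """Return {field_name: row_dict} for a data-dictionary CSV."""
    with open(path, newline='', encoding='utf-8') as file:
        return {row[key_column]: row for row in csv.DictReader(file)}


# Demo on a tiny stand-in file; headers and description are illustrative only.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'data-dictionary-demo.csv')
    with open(demo_path, 'w', newline='', encoding='utf-8') as file:
        writer = csv.writer(file)
        writer.writerow(['Name', 'Description'])
        writer.writerow(['asOfDate', 'placeholder description'])
    fields = load_data_dictionary(demo_path, key_column='Name')

print(fields['asOfDate']['Description'])
```

With the real file, the call would be along the lines of `load_data_dictionary('data-dictionary.csv', key_column=...)`, using the actual header of the field-name column.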


# Download Claims and Policies Data for New York City
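I built a script, imported above as `nfip_download`, that uses the OpenFEMA API to programmatically export NFIP Claims and Policies data by county FIPS code. The five codes below correspond to New York City's five counties (boroughs).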
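
The `nfip_download` module itself is not shown in this notebook. As a rough sketch of the paging pattern its log output suggests (keep requesting chunks until one comes back smaller than the page size), the loop might look like the following. `fetch_all`, `fetch_page`, and `make_fake_fetcher` are hypothetical names, and the fake fetcher stands in for a real HTTP call so the example runs offline; this is not the module's actual implementation.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


def fetch_all(fetch_page: Callable[[int, int], List[Record]],
              page_size: int = 10_000) -> List[Record]:
    """Collect every record by paging until a short (or empty) chunk arrives."""
    records: List[Record] = []
    skip = 0
    while True:
        chunk = fetch_page(skip, page_size)
        records.extend(chunk)
        if len(chunk) < page_size:
            # Fewer rows than requested means there is nothing left to fetch.
            break
        skip += page_size
    return records


def make_fake_fetcher(total_rows: int) -> Callable[[int, int], List[Record]]:
    """Stand-in for an API call: serves slices of an in-memory list."""
    data = [{"id": i} for i in range(total_rows)]

    def fetch_page(skip: int, top: int) -> List[Record]:
        return data[skip:skip + top]

    return fetch_page


rows = fetch_all(make_fake_fetcher(25), page_size=10)
print(len(rows))
```

When the total is an exact multiple of the page size, the loop makes one extra request that returns an empty chunk and then stops.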

In [5]:
datasets = ['claims', 'policies']

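county_fips = [
    '36005',  # Bronx County (The Bronx)
    '36047',  # Kings County (Brooklyn)
    '36061',  # New York County (Manhattan)
    '36081',  # Queens County (Queens)
    '36085',  # Richmond County (Staten Island)
]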

for dataset in datasets:
    for county in county_fips:
        nfip_download.download_data(dataset, county)
        time.sleep(1)

2025-03-24 22:06:51,578 - INFO - Starting to download claims data for county FIPS: 36005
2025-03-24 22:06:54,236 - INFO - Chunk is less than 10,000 rows returned, finishing download.
2025-03-24 22:06:54,237 - INFO - Download completed. Total rows fetched: 1942
2025-03-24 22:06:54,480 - INFO - Data saved to data/claims-36005.json
2025-03-24 22:06:55,484 - INFO - Starting to download claims data for county FIPS: 36047
2025-03-24 22:06:59,533 - INFO - Chunk is less than 10,000 rows returned, finishing download.
2025-03-24 22:06:59,533 - INFO - Download completed. Total rows fetched: 6264
2025-03-24 22:07:00,243 - INFO - Data saved to data/claims-36047.json
2025-03-24 22:07:01,254 - INFO - Starting to download claims data for county FIPS: 36061
2025-03-24 22:07:03,742 - INFO - Chunk is less than 10,000 rows returned, finishing download.
2025-03-24 22:07:03,742 - INFO - Download completed. Total rows fetched: 1485
2025-03-24 22:07:03,909 - INFO - Data saved to data/claims-36061.json
2025-03

In [6]:
# preview file sizes, largest first
!du -sh data/* | sort -rh

614M	data/policies-36081.json
421M	data/policies-36047.json
290M	data/policies-36085.json
104M	data/policies-36061.json
 98M	data/policies-36005.json
 48M	data/claims-36085.json
 37M	data/claims-36081.json
 16M	data/claims-36047.json
4.8M	data/claims-36005.json
3.7M	data/claims-36061.json


# Create Claims Table using DuckDB

In [7]:
# create a DuckDB database instance
con = duckdb.connect()

# create the claims table from the per-county JSON files
con.execute("""
    CREATE TABLE claims AS
        FROM read_json('data/claims-*.json')
""")

# sanity check
sql = """
    SELECT *
    FROM claims
    LIMIT 5
"""

con.sql(sql)

┌──────────────────────┬──────────────────────┬──────────────────────┬───┬──────────┬───────────┬──────────────────────┐
│ agricultureStructu…  │       asOfDate       │ basementEnclosureC…  │ … │ latitude │ longitude │          id          │
│       boolean        │       varchar        │        int64         │   │  double  │  double   │         uuid         │
├──────────────────────┼──────────────────────┼──────────────────────┼───┼──────────┼───────────┼──────────────────────┤
│ false                │ 2020-09-10T19:02:5…  │                    1 │ … │     40.8 │     -73.8 │ 80349555-9429-4ec2…  │
│ false                │ 2024-09-23T14:02:0…  │                    1 │ … │     40.9 │     -73.8 │ e1d4d5a4-c438-4ef1…  │
│ false                │ 2020-01-22T16:55:5…  │                    2 │ … │     40.8 │     -73.9 │ c9b02a95-815a-4c69…  │
│ false                │ 2020-01-22T16:55:5…  │                    2 │ … │     40.8 │     -73.9 │ c43c75c0-9add-4c96…  │
│ false                │ 2020-01

In [8]:
# sanity check
sql = """
    SELECT COUNT(*) AS count
    FROM claims
"""

con.sql(sql)

┌───────┐
│ count │
│ int64 │
├───────┤
│ 43978 │
└───────┘

In [9]:
# sanity check
sql = """
    SELECT
        COUNT(column_name) AS count_columns
    FROM
        (DESCRIBE FROM claims)
"""

con.sql(sql)

┌───────────────┐
│ count_columns │
│     int64     │
├───────────────┤
│            73 │
└───────────────┘

In [10]:
# sanity check
sql = """
    SELECT
        asOfDate
    FROM
        claims
    ORDER BY
        asOfDate DESC
    LIMIT 1
"""

con.sql(sql)

┌──────────────────────────┐
│         asOfDate         │
│         varchar          │
├──────────────────────────┤
│ 2025-03-10T15:48:30.729Z │
└──────────────────────────┘
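
Because `asOfDate` is stored as VARCHAR, the `ORDER BY ... DESC` above is a string comparison. It still returns the chronologically latest value because ISO 8601 UTC timestamps in a uniform format sort lexicographically in time order. A small sketch of that property in plain Python follows; only the last timestamp comes from the query output above, the other two are made up for illustration, and `parse_as_of_date` is a hypothetical helper.

```python
from datetime import datetime


def parse_as_of_date(value):
    """Parse an ISO 8601 UTC timestamp such as '2025-03-10T15:48:30.729Z'."""
    # Swap the trailing 'Z' for an explicit offset so this also works
    # on Python versions older than 3.11.
    return datetime.fromisoformat(value.replace('Z', '+00:00'))


stamps = [
    '2021-06-01T08:00:00.000Z',  # illustrative value
    '2023-11-15T12:30:45.500Z',  # illustrative value
    '2025-03-10T15:48:30.729Z',  # value shown in the query output
]

latest_as_text = max(stamps)                        # string comparison
latest_as_time = max(stamps, key=parse_as_of_date)  # datetime comparison
print(latest_as_text == latest_as_time)
```

The equivalence would break if the column mixed formats (for example, different offsets or missing fractional seconds), in which case parsing to a timestamp type first would be the safer choice.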

In [11]:
# sanity check
sql = """
    SELECT
        column_name,
        column_type
    FROM
        (DESCRIBE claims)
"""

con.sql(sql).show(max_rows=80)

┌────────────────────────────────────────────┬─────────────┐
│                column_name                 │ column_type │
│                  varchar                   │   varchar   │
├────────────────────────────────────────────┼─────────────┤
│ agricultureStructureIndicator              │ BOOLEAN     │
│ asOfDate                                   │ VARCHAR     │
│ basementEnclosureCrawlspaceType            │ BIGINT      │
│ policyCount                                │ BIGINT      │
│ crsClassificationCode                      │ BIGINT      │
│ dateOfLoss                                 │ VARCHAR     │
│ elevatedBuildingIndicator                  │ BOOLEAN     │
│ elevationCertificateIndicator              │ VARCHAR     │
│ elevationDifference                        │ BIGINT      │
│ baseFloodElevation                         │ DOUBLE      │
│ ratedFloodZone                             │ VARCHAR     │
│ houseWorship                               │ BOOLEAN     │
│ locationOfContents    

In [12]:
# sanity check
sql = """
    SELECT
        column_name,
        null_percentage
    FROM
        (SUMMARIZE FROM claims)
    WHERE
        null_percentage > 0
    ORDER BY
        null_percentage DESC
"""

con.sql(sql).show(max_rows=80)

┌────────────────────────────────────────────┬─────────────────┐
│                column_name                 │ null_percentage │
│                  varchar                   │  decimal(9,2)   │
├────────────────────────────────────────────┼─────────────────┤
│ crsClassificationCode                      │          100.00 │
│ floodCharacteristicsIndicator              │           99.70 │
│ eventDesignationNumber                     │           97.31 │
│ lowestAdjacentGrade                        │           87.20 │
│ lowestFloorElevation                       │           86.93 │
│ baseFloodElevation                         │           86.62 │
│ elevationDifference                        │           86.50 │
│ nonPaymentReasonBuilding                   │           80.12 │
│ nonPaymentReasonContents                   │           74.26 │
│ nfipCommunityNumberCurrent                 │           73.34 │
│ floodZoneCurrent                           │           73.34 │
│ nfipCommunityName      

# Write Out Files as Parquet

In [13]:
# copy to a Parquet file
sql = """
    COPY (SELECT * FROM claims)
    TO 'data/nfip-claims-nyc.parquet'
"""

con.sql(sql)

In [14]:
# confirm the Parquet file was written
%ls data/

claims-36005.json        claims-36085.json        policies-36061.json
claims-36047.json        nfip-claims-nyc.parquet  policies-36081.json
claims-36061.json        policies-36005.json      policies-36085.json
claims-36081.json        policies-36047.json


In [15]:
# preview file sizes, largest first
!du -sh data/* | sort -rh

614M	data/policies-36081.json
421M	data/policies-36047.json
290M	data/policies-36085.json
104M	data/policies-36061.json
 98M	data/policies-36005.json
 48M	data/claims-36085.json
 37M	data/claims-36081.json
 16M	data/claims-36047.json
4.8M	data/claims-36005.json
3.7M	data/claims-36061.json
3.5M	data/nfip-claims-nyc.parquet


In [17]:
# sanity check on exported Parquet file
sql = """
    SELECT COUNT(*) AS count
    FROM read_parquet('data/nfip-claims-nyc.parquet');
"""

duckdb.sql(sql)

┌───────┐
│ count │
│ int64 │
├───────┤
│ 43978 │
└───────┘

# Create Policies Table using DuckDB
We repeat the Claims workflow for the Policies data.

In [18]:
# create a DuckDB database instance
con = duckdb.connect()

# create the policies table from the per-county JSON files
con.execute("""
    CREATE TABLE policies AS
        FROM read_json('data/policies-*.json')
""")

# sanity check
sql = """
    SELECT *
    FROM policies
    LIMIT 5
"""

con.sql(sql)

┌──────────────────────┬────────────────────┬──────────────────────┬───┬──────────┬───────────┬──────────────────────┐
│ agricultureStructu…  │ baseFloodElevation │ basementEnclosureC…  │ … │ latitude │ longitude │          id          │
│       boolean        │       double       │        int64         │   │  double  │  double   │         uuid         │
├──────────────────────┼────────────────────┼──────────────────────┼───┼──────────┼───────────┼──────────────────────┤
│ false                │               13.0 │                    0 │ … │     40.8 │     -73.8 │ c4b3dc22-1f60-4d08…  │
│ false                │               NULL │                 NULL │ … │     40.9 │     -73.8 │ e4f4f206-5a14-46e9…  │
│ false                │               NULL │                    1 │ … │     40.8 │     -73.9 │ 93f58e3e-f563-4587…  │
│ false                │               NULL │                    2 │ … │     40.8 │     -73.8 │ d4a416b2-f4ad-4cfb…  │
│ false                │               NULL │   

In [19]:
# sanity check
sql = """
    SELECT COUNT(*) AS count
    FROM policies
"""

con.sql(sql)

┌────────┐
│ count  │
│ int64  │
├────────┤
│ 548267 │
└────────┘

In [20]:
# copy to a Parquet file
sql = """
    COPY (SELECT * FROM policies)
    TO 'data/nfip-policies-nyc.parquet'
"""

con.sql(sql)

In [21]:
# preview file sizes, largest first
!du -sh data/* | sort -rh

614M	data/policies-36081.json
421M	data/policies-36047.json
290M	data/policies-36085.json
104M	data/policies-36061.json
 98M	data/policies-36005.json
 48M	data/claims-36085.json
 37M	data/nfip-policies-nyc.parquet
 37M	data/claims-36081.json
 16M	data/claims-36047.json
4.8M	data/claims-36005.json
3.7M	data/claims-36061.json
3.5M	data/nfip-claims-nyc.parquet


In [22]:
# sanity check on exported Parquet file
sql = """
    SELECT COUNT(*) AS count
    FROM read_parquet('data/nfip-policies-nyc.parquet');
"""

duckdb.sql(sql)

┌────────┐
│ count  │
│ int64  │
├────────┤
│ 548267 │
└────────┘