Urban Data Science & Smart Cities <br>
URSP688Y Spring 2026<br>
Instructor: Chester Harvey <br>
Urban Studies & Planning <br>
National Center for Smart Growth <br>
University of Maryland

# Exercise02

## Problem

The District Court of Maryland provides data on eviction-related court filings on the [Maryland Open Data Portal](https://opendata.maryland.gov/Housing/District-Court-of-Maryland-Eviction-Case-Data/mvqb-b4hf/about_data), and the Department of Housing and Community Development (DHCD) summarizes these data on a [dashboard](https://app.powerbigov.us/view?r=eyJrIjoiYWI1Yzg0YjYtNDFkZS00MDUyLThlMDctYmE1ZjY5MGI0MWJhIiwidCI6IjdkM2I4ZDAwLWY5YmUtNDZlNy05NDYwLTRlZjJkOGY3MzE0OSJ9&pageName=ReportSection). This dashboard includes some helpful information about filing types and how the eviction process works.

Despite efforts to make eviction data public, the eviction process is complicated and the data are messy and technical, making them difficult to understand. Take, for example, how the table represents evictions: each row represents a court filing, but only some of these filings indicate an eviction (`'Event Type' == 'Warrant of Restitution - Return of Service - Evicted'`).
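To see why counting rows is not the same as counting cases or evictions, here is a minimal sketch on a made-up toy table. It assumes the raw case identifier column is called `'Case Number'` (an assumption; check the real header). Only the `'Event Type'` label comes from the text above.

```python
import pandas as pd

# Toy filings table with made-up values. 'Case Number' is an assumed raw
# column name; the real CSV also has many more columns.
filings = pd.DataFrame({
    'Case Number': ['A1', 'A1', 'A2', 'A3', 'A3'],
    'Event Type': [
        'Petition - For Warrant of Restitution Filed',
        'Warrant of Restitution - Return of Service - Evicted',
        'Petition - For Warrant of Restitution Filed',
        'Petition - For Warrant of Restitution Filed',
        'Warrant of Restitution - Return of Service - Cancelled',
    ],
})

# Boolean mask: True only for rows that record a completed eviction
is_eviction = filings['Event Type'] == 'Warrant of Restitution - Return of Service - Evicted'

n_filings = len(filings)                                        # rows = filings
n_cases = filings['Case Number'].nunique()                      # unique cases
n_evicted_cases = filings.loc[is_eviction, 'Case Number'].nunique()  # cases with an eviction
```

Here five filings correspond to three cases, and only one of those cases ends in an eviction.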

Please help DHCD summarize the filings to understand how tenants are impacted by the eviction process, including:

1. ***(I demo this below)*** How many unique cases get filed each month?
2. ***(You do)*** What percent of cases end in eviction?
3. ***(You do)*** Do cases of different types (e.g., tenant holding over, breach of lease, failure to pay rent) culminate in eviction at different rates?
4. ***(Optional)*** How long, on average, does it take for cases to move from a petition being filed to a warrant of restitution?
5. ***(Optional)*** What is the eviction rate per person per year at the county level? Which county has a higher rate?
6. ***(Optional)*** Are there other interesting ways you can summarize these data?
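For the optional question 4, one possible approach is sketched below. It interprets "warrant of restitution" as the first Return of Service event of any kind (`warrant_cancelled`, `warrant_evicted`, or `warrant_expired` after tidying), which is an assumption; you might instead only count `warrant_evicted`. It expects a DataFrame produced by `tidy_court_filings`, and the toy data are made up.

```python
import pandas as pd

def mean_days_petition_to_warrant(df):
    """Average days from a case's first petition to its first warrant return of service.

    Assumes df was processed by tidy_court_filings, so event types are
    'petition', 'warrant_cancelled', 'warrant_evicted', or 'warrant_expired'.
    """
    is_petition = df['event_type'] == 'petition'
    is_warrant = df['event_type'].str.startswith('warrant_', na=False)
    # Earliest petition date and earliest warrant date for each case
    first_petition = df[is_petition].groupby('case_number')['event_date'].min()
    first_warrant = df[is_warrant].groupby('case_number')['event_date'].min()
    # Subtraction aligns on case number; cases missing either event become NaT and are dropped
    elapsed = (first_warrant - first_petition).dropna()
    return elapsed.dt.days.mean()

# Toy example (made-up dates): A1 takes 30 days, A2 takes 10, A3 has no warrant yet
toy = pd.DataFrame({
    'case_number': ['A1', 'A1', 'A2', 'A2', 'A3'],
    'event_type': ['petition', 'warrant_evicted', 'petition', 'warrant_cancelled', 'petition'],
    'event_date': pd.to_datetime(['2024-01-01', '2024-01-31', '2024-02-01', '2024-02-11', '2024-03-01']),
})
avg_days = mean_days_petition_to_warrant(toy)
```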

To address these questions, you will need to familiarize yourself with how the data are structured, including what each row represents (see above), what data are stored in each column, how consistently these data are structured into categories, etc. You may have to convert data types, correct inconsistencies in categories or simplify how categories are represented, change names of columns so they're intuitive, or do other data wrangling to make the data easier to work with. Then you will have to group/aggregate the data by one or more dimensions to make summary calculations.

I recommend approaching this in two phases: 

- First, write a general-purpose function that takes the raw table as an input (you can decide whether the argument to this function is a path to the CSV or a Pandas DataFrame into which you have already loaded the CSV) and cleans it up into a tidy DataFrame that is easy to use in the second phase and in future analyses. You can imagine using this function as a handy pre-processor whenever you download an updated copy of the eviction data. ***I started writing this function for you below, but you can modify or add to it to fit your purposes.***

- Second, write multiple single-purpose functions (though there may be opportunities to generalize) that take your tidy DataFrame as an input and do calculations with it to address each of the questions above. ***I wrote an example function to answer the first question above: how many unique cases get filed each month? You try writing functions for questions 2–3. Questions 4–6 are optional if you want an extra challenge.***

- You may also write additional helper functions that are called by your main functions.
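If you want the pre-processor to accept either a CSV path or an already-loaded DataFrame, a small helper can normalize the input first. This is a minimal sketch; `load_filings` is a hypothetical name, not part of the starter code. You could call it on the first line of `tidy_court_filings` so that both kinds of input work.

```python
import io
import pandas as pd

def load_filings(source):
    """Return a DataFrame whether `source` is a CSV path/buffer or an existing DataFrame.

    Hypothetical helper. Copying an existing DataFrame keeps later cleaning
    steps from modifying the caller's original object.
    """
    if isinstance(source, pd.DataFrame):
        return source.copy()
    return pd.read_csv(source)

# Usage: an in-memory CSV stands in for the downloaded file
csv_text = "Case Number,Event Type\nA1,Petition - For Warrant of Restitution Filed\n"
from_csv = load_filings(io.StringIO(csv_text))
from_df = load_filings(from_csv)  # passing a DataFrame returns a copy
```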

## Data

Included in this folder are the three CSVs I used in Demo 4: 
- `District_Court_of_Maryland_Eviction_Case_Data_MG_PG.csv`: Court filings for Montgomery and Prince George's counties. I recommend starting with these filings as a 'minimal example.' Ideally, your code will be able to handle filings from other counties or the whole state if they are provided in the same format. You are welcome to try using data from the whole state, but it's not required. ***To economize storage space on GitHub, please don't commit statewide data in your pull request.***
- `acs2024_5yr_B01003_mg.csv`: Zipcode-level populations from the American Community Survey (2024 5-year estimates) for Montgomery County, downloaded from [Census Reporter](https://censusreporter.org/data/table/?table=B01003&geo_ids=05000US24031,860|05000US24031&primary_geo_id=05000US24031)
- `acs2024_5yr_B01003_pg.csv`: Zipcode-level populations from the American Community Survey (2024 5-year estimates) for Prince George's County, downloaded from [Census Reporter](https://censusreporter.org/data/table/?table=B01003&geo_ids=05000US24033,860|05000US24033&primary_geo_id=05000US24033)
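For the optional question 5, the population tables supply the denominator. The sketch below uses made-up eviction counts and populations, not real figures, and assumes county labels that match across tables. Check how the `county` column actually spells each county. It also assumes you have already counted evicted cases by `county` and `eviction_year`. One caution: ZIP code tabulation areas can straddle county lines, so summing ZIP-level populations may not reproduce a county's total.

```python
import pandas as pd

# Made-up inputs (NOT real counts): evicted cases per county per year,
# e.g. unique cases with a 'warrant_evicted' event grouped by county and eviction_year
evictions = pd.DataFrame({
    'county': ['Montgomery', 'Montgomery', "Prince George's", "Prince George's"],
    'eviction_year': [2023, 2024, 2023, 2024],
    'evictions': [100, 120, 300, 330],
})
# Made-up county population totals
population = pd.DataFrame({
    'county': ['Montgomery', "Prince George's"],
    'population': [1_000_000, 950_000],
})

# Average evictions per year in each county (a partial year would pull this down)
annual = evictions.groupby('county', as_index=False)['evictions'].mean()
# Attach population and express the rate per 1,000 residents per year
rates = annual.merge(population, on='county', how='left')
rates['evictions_per_1000_per_year'] = rates['evictions'] / rates['population'] * 1000
rates = rates.sort_values('evictions_per_1000_per_year', ascending=False)
```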

## Getting Started
Let's get started working on this together, based on the code we wrote in Demo 4.

### Dependencies

In [None]:
# Import dependencies--external packages our code will depend on
import pandas as pd

### Phase 0: Load Data

In [None]:
# Load the filings from CSV
df_filings_raw = pd.read_csv('District_Court_of_Maryland_Eviction_Case_Data_MG_PG.csv')
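The tidying function below asks when ZIP codes shouldn't be stored as integers. Two cases are leading zeros, which integers drop, and ZIP+4 values like `20740-1234`, which can't be converted to integers at all. Maryland ZIP codes all begin with 2, so leading zeros may not bite here, but multistate data would be affected. Here is a minimal sketch on made-up CSV text (not the real file) showing how reading the column as strings avoids both problems:

```python
import io
import pandas as pd

# Made-up CSV text: a ZIP with a leading zero, a ZIP+4, and a missing value
csv_text = "case,zip\nA1,02134\nA2,20740-1234\nA3,\n"

# Reading the column as strings preserves leading zeros and ZIP+4 suffixes
df = pd.read_csv(io.StringIO(csv_text), dtype={'zip': str})

# Keep just the 5-digit ZIP; missing values stay missing
df['zip5'] = df['zip'].str[:5]

# By contrast, a purely numeric ZIP column is parsed as integers by default,
# which silently turns 02134 into 2134
numeric = pd.read_csv(io.StringIO("case,zip\nA1,02134\nA2,20740\n"))
```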

In [None]:
# Explore the raw dataframe. How would it be useful to tidy it up?
df_filings_raw.head(2)

### Phase 1: Make a Tidy DataFrame

In [None]:
# Function to make a string all lowercase and replace spaces with underscores
# (Can you see where I use this helper function several times below?)
def lower_underscore(string):
    return string.lower().replace(' ','_')

# Function to clean up filings into a tidy dataframe
def tidy_court_filings(df):
    """Make a tidy DataFrame of court filings
    
    Input a Pandas DataFrame from a CSV downloaded at
    https://opendata.maryland.gov/Housing/District-Court-of-Maryland-Eviction-Case-Data/mvqb-b4hf/about_data
    """
    # Drop unnecessary columns
    df = df.drop(columns=['Unnamed: 0'])
    
    # Convert column names to all lowercase with underscores instead of spaces
    # (This is a personal style, but I find it easier to operate with column names that are simple and don't have spaces.)
    # One way to do this is with a for loop:
    new_column_names = []
    for col in df.columns:
        col = lower_underscore(col)
        new_column_names.append(col)
    df.columns = new_column_names
      
    # Make sure dates are stored as datetimes instead of strings
    df['event_date'] = pd.to_datetime(df['event_date'])
    df['evicted_date'] = pd.to_datetime(df['evicted_date'])

    # Make sure zip codes are integers (years are recalculated as integers further down)
    # (Can you think of a situation where we wouldn't want to store them as integers?)
    df['tenant_zip_code'] = df['tenant_zip_code'].astype('Int64')

    # Make sure event types are consistently classified and use simple class names
    # (The `map` method with a dictionary argument replaces every value in a column that
    # matches a key with that key's value; values not found in the dictionary become NaN.)
    reclassifier = {
        'Petition - For Warrant of Restitution Filed': 'petition',
        'petition - For Warrant of Restitution Filed': 'petition',
        'Warrant of Restitution - Return of Service - Cancelled': 'warrant_cancelled',
        'Warrant of Restitution - Return of Service - Evicted': 'warrant_evicted',
        'Warrant of Restitution - Return of Service - Expired': 'warrant_expired',
    }
    df['event_type'] = df['event_type'].map(reclassifier)

    # Make sure case types are consistently classified and use simple class names
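    # (Loop over unique, non-missing values only: it's faster than visiting every row,
    # and lower_underscore would fail on a missing value, which pandas stores as a float NaN)
    reclassifier = {}
    for case_type in df['case_type'].dropna().unique():
        reclassifier[case_type] = lower_underscore(case_type)
    df['case_type'] = df['case_type'].map(reclassifier)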

    # Make sure city names are consistently title case
    df['tenant_city'] = df['tenant_city'].str.title()

    # Recalculate event year and eviction year based on dates, just to be confident they're accurate
    # and make sure they're stored as integers
    df['event_year'] = df['event_date'].dt.year.astype('Int64')
    df['eviction_year'] = df['evicted_date'].dt.year.astype('Int64')

    # Change 'evicted_date' to 'eviction_date' for grammatical consistency with 'eviction_year'
    # (Okay, now I'm getting picky, but you can see how you can get into the details with making
    # a dataframe nice and tidy)
    df = df.rename(columns={'evicted_date': 'eviction_date'})

    # Put columns in a more intuitive order
    # (Again, highly optional, but can be useful for tidying)
    df = df[[
        'case_number',
        'case_type',
        'event_date',
        'event_year',
        'event_type',
        'event_comment',
        'eviction_date',
        'eviction_year',
        'county',
        # 'location', # I believe 'location' refers to the district court location, which isn't important to us, so let's leave it out
        'tenant_city',
        'tenant_state',
        'tenant_zip_code',
    ]]    

    return df

df_filings_tidy = tidy_court_filings(df_filings_raw)

df_filings_tidy.head(2)
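A couple of steps in `tidy_court_filings` can be written more compactly or checked more defensively. The sketch below uses a toy table. The `'... - Stayed'` event label is invented purely to show what an unmatched value looks like, and `'Case Number'` is an assumed raw column name. `DataFrame.rename` accepts a function for `columns`, which replaces the explicit for loop. Because `map` turns unmatched values into NaN, it is worth listing which raw labels failed to match before they disappear.

```python
import pandas as pd

def lower_underscore(string):
    return string.lower().replace(' ', '_')

# Toy raw table with made-up values; the 'Stayed' label is hypothetical
raw = pd.DataFrame({
    'Case Number': ['A1', 'A2'],
    'Event Type': ['Petition - For Warrant of Restitution Filed',
                   'Warrant of Restitution - Return of Service - Stayed'],
})

# 1. `rename` applies the function to every column label
tidy = raw.rename(columns=lower_underscore)

# 2. Map event types, then list any non-missing raw labels that came back NaN
reclassifier = {'Petition - For Warrant of Restitution Filed': 'petition'}
mapped = tidy['event_type'].map(reclassifier)
unmapped = tidy.loc[mapped.isna() & tidy['event_type'].notna(), 'event_type'].unique()
tidy['event_type'] = mapped
```

If `unmapped` is not empty after you download a new copy of the data, you may need to add entries to the reclassifier dictionary.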

### Phase 2: Analyze Tidy DataFrame to Answer Questions

#### 1. How many unique cases get filed each month?

In [None]:
def unique_cases_per_month(df, distinguish_year=True):
    """Calculate unique cases filed in each calendar month

    df: Pandas DataFrame pre-processed by the `tidy_court_filings` function

    distinguish_year: True or False (default: True)
        If True, counts are broken out by year and month
        If False, counts from all years are combined in like calendar months
    """
    # Make sure that earliest events are first
    df = df.sort_values('event_date')
    # Keep only the first row for each case number
    unique_cases = df.drop_duplicates('case_number', keep='first')

    # Count cases in each month and year
    # adapted from https://stackoverflow.com/questions/38792122/how-to-group-and-count-rows-by-month-and-year-using-pandas
    if distinguish_year:
        case_counts = unique_cases.groupby([
            unique_cases['event_date'].dt.year.rename('year'), 
            unique_cases['event_date'].dt.month.rename('month')
        ])['case_number'].count()
    else:
        case_counts = unique_cases.groupby([ 
            unique_cases['event_date'].dt.month.rename('month')
        ])['case_number'].count()

    # Structure output in a dataframe with months and years as columns
    case_counts = pd.DataFrame(case_counts).reset_index()
    
    return case_counts
 

In [None]:
unique_cases_per_month(df_filings_tidy)

In [None]:
unique_cases_per_month(df_filings_tidy, distinguish_year=False)
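The output of `unique_cases_per_month` is in "long" form, with one row per year and month. To compare the same month across years, it can be easier to pivot it into a grid. Here is a minimal sketch on made-up counts shaped like that output, where the count column is still named `case_number`:

```python
import pandas as pd

# Made-up counts shaped like unique_cases_per_month(df, distinguish_year=True)
case_counts = pd.DataFrame({
    'year': [2023, 2023, 2024, 2024],
    'month': [1, 2, 1, 2],
    'case_number': [10, 12, 9, 15],
})

# One row per year, one column per calendar month
wide = case_counts.pivot(index='year', columns='month', values='case_number')
```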

#### 2. What percent of cases end in eviction?
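One possible approach is sketched below. It assumes a case "ends in eviction" if any of its filings is `warrant_evicted` after tidying. An alternative definition would look only at each case's last event. The toy data are made up.

```python
import pandas as pd

def percent_cases_evicted(df):
    """Percent of unique cases with at least one 'warrant_evicted' event.

    df: DataFrame pre-processed by tidy_court_filings (uses the tidy
    'case_number' and 'event_type' columns)
    """
    # True for each filing that records a completed eviction
    evicted_rows = df['event_type'] == 'warrant_evicted'
    # A case counts as evicted if any of its filings is an eviction
    evicted_by_case = evicted_rows.groupby(df['case_number']).any()
    # Mean of a boolean Series is the share of True values
    return evicted_by_case.mean() * 100

# Toy example: cases A1 and A4 end in eviction, A2 and A3 do not
toy = pd.DataFrame({
    'case_number': ['A1', 'A1', 'A2', 'A3', 'A3', 'A4', 'A4'],
    'event_type': ['petition', 'warrant_evicted', 'petition', 'petition',
                   'warrant_cancelled', 'petition', 'warrant_evicted'],
})
pct = percent_cases_evicted(toy)
```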

#### 3. Do cases of different types (e.g., tenant holding over, breach of lease, failure to pay rent) culminate in eviction at different rates?
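A possible sketch builds on the same "any eviction event" definition as above. It first collapses filings to one row per case, then computes the share evicted within each case type. It assumes each case has a single case type and takes the first one it sees. The case type labels in the toy data are hypothetical versions of what `lower_underscore` might produce.

```python
import pandas as pd

def eviction_rate_by_case_type(df):
    """Number of cases and percent evicted for each case type (tidy DataFrame input)."""
    # One row per case: its case type and whether any filing was an eviction
    cases = (
        df.assign(evicted=df['event_type'] == 'warrant_evicted')
          .groupby('case_number')
          .agg(case_type=('case_type', 'first'), evicted=('evicted', 'any'))
    )
    # Count cases and the share evicted within each case type
    summary = cases.groupby('case_type')['evicted'].agg(cases='count', percent_evicted='mean')
    summary['percent_evicted'] *= 100
    return summary.reset_index()

# Toy example with hypothetical case type labels
toy = pd.DataFrame({
    'case_number': ['A1', 'A1', 'A2', 'A3', 'A3', 'A4'],
    'case_type': ['failure_to_pay_rent', 'failure_to_pay_rent', 'failure_to_pay_rent',
                  'breach_of_lease', 'breach_of_lease', 'tenant_holding_over'],
    'event_type': ['petition', 'warrant_evicted', 'petition',
                   'petition', 'warrant_evicted', 'petition'],
})
result = eviction_rate_by_case_type(toy)
```

Keeping the `cases` column alongside the percentage helps you tell whether a very high or low rate rests on only a handful of cases.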