Urban Data Science & Smart Cities <br>
URSP688Y Spring 2026<br>
Instructor: Chester Harvey <br>
Urban Studies & Planning <br>
National Center for Smart Growth <br>
University of Maryland

# Demo 5 - More Tables & Debugging

- More Tables
    - Drop duplicate rows
    - Count rows within groups
    - Concatenate tables
    - Join columns from another table
- Errors and Debugging

## Data Wrangling with Tables

In [None]:
# Import Pandas package for working with tables
import pandas as pd

In [None]:
# Load raw eviction data
eviction_cases_df = pd.read_csv('District_Court_of_Maryland_Eviction_Case_Data_MG_PG.csv')

In [None]:
# Investigate the dataframe
eviction_cases_df.head()

In [None]:
# Convert data in date columns to datetime objects so they can be sorted properly

def convert_column_to_datetime(df, column):
    """
    Converts a column in a dataframe to datetime objects

    Args:
        df (Pandas DataFrame): input dataframe
        column (string): column in input dataframe to be converted to datetime

    Returns:
        Pandas DataFrame: copy of input dataframe with the converted datetime column
    """
    df = df.copy()
    df[column] = pd.to_datetime(df[column])
    return df

for date_column in ['Event Date', 'Evicted Date']:
    eviction_cases_df = convert_column_to_datetime(eviction_cases_df, date_column)

In [None]:
# Investigate data types of columns
eviction_cases_df.dtypes

How many unique cases are there?

In [None]:
# .drop_duplicates() method
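
One way to approach this, sketched on a small made-up table rather than the real eviction data. The column name `Case Number` is an assumption; check `eviction_cases_df.columns` for the column that actually identifies a case.

```python
import pandas as pd

# Toy stand-in for the eviction data; column names and values are hypothetical
toy_cases_df = pd.DataFrame({
    'Case Number': ['A1', 'A1', 'B2', 'C3', 'C3', 'C3'],
    'Event Type': ['Filing', 'Hearing', 'Filing', 'Filing', 'Hearing', 'Eviction'],
})

# Keep only the first row for each case number
unique_cases_df = toy_cases_df.drop_duplicates(subset='Case Number')

# The number of remaining rows is the number of unique cases
n_unique_cases = len(unique_cases_df)
print(n_unique_cases)
```

Without `subset`, `.drop_duplicates()` only removes rows that match in every column, which would leave all six rows here.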

How many unique cases per zip code?

In [None]:
# .groupby() method
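
A minimal sketch of counting rows within groups, again on made-up data. The column name `Tenant ZIP Code` is an assumption standing in for whichever zip code column the eviction data provides.

```python
import pandas as pd

# Toy table with one row per unique case; column names are hypothetical
toy_unique_cases_df = pd.DataFrame({
    'Case Number': ['A1', 'B2', 'C3', 'D4'],
    'Tenant ZIP Code': ['20740', '20740', '20783', '20740'],
})

# Group rows by zip code and count the rows in each group
cases_per_zip = toy_unique_cases_df.groupby('Tenant ZIP Code').size()

# Convert the resulting Series to a dataframe with a named count column
cases_per_zip_df = cases_per_zip.reset_index(name='unique_cases')
print(cases_per_zip_df)
```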

Which zip codes have the most unique cases *per person*?

Let's join data from [CensusReporter](https://censusreporter.org/).

### Combining/Merging/Joining Tables

Combining information from multiple tables into a single table is one of the most useful data wrangling operations.

There are lots of different ways to join tables, but two basic types are:

1. Joining columns with a shared key, which outputs a table that is wider than either input.
2. Concatenating rows with shared column names, which outputs a table that is longer than either input.
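
The difference between the two shows up in the shape of the result. The sketch below uses tiny made-up tables; the names `cases_df`, `population_df`, and `more_cases_df` are purely illustrative.

```python
import pandas as pd

# Two toy tables that share a key column ('zip'); all values are made up
cases_df = pd.DataFrame({'zip': ['20740', '20783'], 'cases': [3, 1]})
population_df = pd.DataFrame({'zip': ['20740', '20783'], 'population': [100, 200]})

# 1. Join columns on the shared key: the result has more columns
joined_df = cases_df.merge(population_df, on='zip')
print(joined_df.shape)  # (2, 3)

# 2. Concatenate rows with shared column names: the result has more rows
more_cases_df = pd.DataFrame({'zip': ['20901'], 'cases': [5]})
concatenated_df = pd.concat([cases_df, more_cases_df])
print(concatenated_df.shape)  # (3, 2)
```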

#### Joining columns based on a key

![joining columns with a shared key](https://rforhr.com/horizontal_join.png)


#### Concatenating rows with the same column names
![joining rows with shared column names](https://rforhr.com/vertical_join.png)

In [None]:
# Load census reporter data, ignoring the row with data for the whole county (first row under the header)

# Can we write a function that loads a census reporter csv and skips the second row?
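
One possible answer, as a sketch: `pd.read_csv` accepts a `skiprows` list of 0-indexed line numbers, so `skiprows=[1]` skips the line directly under the header. The function name `load_census_reporter_csv` and the file contents below are made up for illustration; a real CensusReporter download has more columns.

```python
import io
import pandas as pd

def load_census_reporter_csv(filepath_or_buffer):
    """
    Loads a CensusReporter CSV, skipping the first data row (the county-wide total)

    Args:
        filepath_or_buffer (string or file-like): CSV to load

    Returns:
        Pandas DataFrame: table without the county-wide row
    """
    # Line 0 is the header, so line 1 is the first row under it
    return pd.read_csv(filepath_or_buffer, skiprows=[1])

# Made-up file contents standing in for a real download
toy_csv = io.StringIO(
    'name,B01003001\n'
    'Whole County,1000\n'
    '20740,300\n'
    '20783,700\n'
)
toy_census_df = load_census_reporter_csv(toy_csv)
print(toy_census_df)
```

Wrapping the call in a function means each county's file can be loaded the same way with one line.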

In [None]:
# Combine into a single dataframe

# pd.concat() function
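
A sketch of the concatenation step, assuming there is one CensusReporter table per county. The dataframe names and values are made up.

```python
import pandas as pd

# Toy stand-ins for two county tables with the same columns; values are made up
county_a_df = pd.DataFrame({'name': ['20740', '20783'], 'B01003001': [300, 700]})
county_b_df = pd.DataFrame({'name': ['20901', '20910'], 'B01003001': [400, 600]})

# Stack the rows; ignore_index=True renumbers the index 0..3 instead of repeating 0, 1
census_df = pd.concat([county_a_df, county_b_df], ignore_index=True)
print(census_df)
```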

In [None]:
# Rename columns with readable names
column_map = {
    'name':'census_zip', 
    'B01003001':'population', 
    'B01003001, Error':'population_error'
}
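
The map above only defines the new names; it still has to be applied with `.rename()`. A minimal sketch on a made-up table that uses the same raw column names:

```python
import pandas as pd

# Toy census table using the raw column names; values are made up
toy_census_df = pd.DataFrame({
    'name': ['20740', '20783'],
    'B01003001': [300, 700],
    'B01003001, Error': [25, 40],
})

column_map = {
    'name': 'census_zip',
    'B01003001': 'population',
    'B01003001, Error': 'population_error',
}

# .rename() returns a new dataframe; columns missing from the map keep their names
renamed_df = toy_census_df.rename(columns=column_map)
print(list(renamed_df.columns))
```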

In [None]:
# Make sure zip codes are stored as strings in both the eviction and census dataframes

# .astype() method
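
Zip codes that look like numbers are often read as integers in one table and as strings in another, and a join on keys of different types either fails or finds no matches. The sketch below converts a made-up integer column to strings; the column names are assumptions.

```python
import pandas as pd

# Toy tables where the zip codes were read with different types; values are made up
toy_cases_df = pd.DataFrame({'Tenant ZIP Code': [20740, 20783], 'unique_cases': [3, 1]})
toy_census_df = pd.DataFrame({'census_zip': ['20740', '20783'], 'population': [300, 700]})

# Convert the integer zip codes to strings so the keys in both tables match
toy_cases_df['Tenant ZIP Code'] = toy_cases_df['Tenant ZIP Code'].astype(str)

print(toy_cases_df['Tenant ZIP Code'].tolist())
```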

In [None]:
# Clean up the dataframe

# .fillna() method

# .drop() method
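
A sketch of both cleanup methods on a made-up joined table. It assumes that a zip code with no matching cases ends up with a missing count after the join, and that the error column is not needed; both are illustrative choices.

```python
import pandas as pd

# Toy joined table; zip 20999 had no eviction cases, so its count is missing
toy_joined_df = pd.DataFrame({
    'census_zip': ['20740', '20783', '20999'],
    'population': [300, 700, 500],
    'population_error': [25, 40, 30],
    'unique_cases': [3.0, 1.0, None],
})

# Replace missing case counts with zero
toy_joined_df['unique_cases'] = toy_joined_df['unique_cases'].fillna(0)

# Drop a column that is not needed for the analysis
toy_clean_df = toy_joined_df.drop(columns=['population_error'])
print(toy_clean_df)
```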

In [None]:
# Calculate evictions per population
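
Putting the pieces together, a sketch of the per-person calculation: join the case counts to the populations on the zip code key, divide, and sort. All names and values are made up.

```python
import pandas as pd

# Toy inputs with string zip codes in both tables; values are made up
toy_cases_per_zip_df = pd.DataFrame({'zip': ['20783', '20740'], 'unique_cases': [1, 3]})
toy_census_df = pd.DataFrame({'census_zip': ['20740', '20783'], 'population': [300, 700]})

# Join the case counts to the populations using the differently named key columns
rates_df = toy_census_df.merge(
    toy_cases_per_zip_df, left_on='census_zip', right_on='zip'
)

# Cases per person, sorted so the highest rates come first
rates_df['cases_per_person'] = rates_df['unique_cases'] / rates_df['population']
rates_df = rates_df.sort_values('cases_per_person', ascending=False)
print(rates_df[['census_zip', 'cases_per_person']])
```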

## Errors and debugging

Errors are frustrating and inevitable. Even professional programmers spend much of their time debugging.

Luckily, there are good tools and techniques for making debugging a little easier.

Even with these, you will probably want to tear your hair out fairly often, especially as a beginner. It gets better with time.

There are two types of errors in programming: logic and syntax. They both result in your program not achieving its goal, but the first may not be as easily detectable because the code may still run.

### Logic errors
These are issues with how you have approached or executed your problem. If your code runs but produces nonsensical results, there is probably a logic error. However, your erroneous code might also produce logical but *wrong* results; you might never notice until the problem has rippled downstream. It's best to address this proactively by planning your code well so it's less likely to be illogical, and writing readable code that can be easily reviewed.

Here's a logic error. Can you find it? (Hint: the issue is syntactical, but it's still a logic error because the code works without throwing an error.)

In [None]:
def check_adult(age, cutoff=18):
    if age > cutoff:
        adult = False
    else:
        adult = True
    return adult

check_adult(20)

### Syntax errors
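These are more obvious because your code will simply fail with an error message. (Python itself reserves the name `SyntaxError` for code it cannot parse at all and uses other names, such as `TypeError`, for failures while the code is running; both kinds are grouped together here because both stop the program and tell you why.) There are lots of tools for figuring out where and why.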

Error messages are usually the starting place for debugging a syntax error.

In [None]:
def check_adult(age, cutoff=18):
    if age < cutoff:
        adult = False
    else:
        adult = True
    return adult

check_adult('20')
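
The last line of an error message usually names the error type and describes what went wrong. As a sketch, the error raised by the cell above can be caught with `try`/`except` so its message can be inspected without stopping the program:

```python
def check_adult(age, cutoff=18):
    if age < cutoff:
        adult = False
    else:
        adult = True
    return adult

try:
    check_adult('20')
except TypeError as error:
    # The message names the operator and the two mismatched types
    error_message = str(error)
    print(type(error).__name__, error_message)
```

Here the message points to the `<` comparison between a string and an integer, which suggests converting the input first, for example `check_adult(int('20'))`.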

### Debugging
We can also use an "interactive debugger" to help diagnose our problem by stepping through the code one line at a time.

The debugger lets you set "breakpoints" where the code will pause. While it is paused, the debugger shows a table with the current values of your variables and provides buttons for stepping through the code.

In [None]:
def check_adult(age, cutoff=18):
    if age < cutoff:
        adult = False
    else:
        adult = True
    return adult

check_adult(10)