
# Exploring Trespass Towing Data Using Plotly

## About the Data

### Montgomery County Trespass Towing Dataset

The core of the data used to analyze trespass towing comes from a database maintained by the Montgomery County Police Department. An updated dataset is published to the dataMontgomery website monthly. Its current version includes nearly 100,000 entries detailing the date each vehicle was towed, its location (address and coordinates), the reason it was towed, and the vehicle's make, model, and year of manufacture.

### American Community Survey

The American Community Survey (ACS) is an ongoing survey conducted by the U.S. Census Bureau that collects detailed demographic, social, economic, and housing data from a sample of U.S. households. Unlike the decennial census, which provides a snapshot every ten years, the ACS provides updated data every year, making it a crucial resource for understanding current trends and conditions across communities.

Key Features of the ACS:

-   Covers topics like income, education, employment, housing, migration, and more

-   Data available in 1-year and 5-year estimates

-   Useful for government planning, business decisions, academic research, and journalism

The Census Bureau provides an API that allows developers and researchers to programmatically access ACS data.
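To make the API description concrete, here is a minimal sketch that builds a query URL for every census tract in one county and turns the JSON reply into a DataFrame. The defaults are assumptions, not taken from this notebook: Montgomery County, Maryland (state FIPS `24`, county FIPS `031`), the 5-year dataset (`acs5`; 1-year estimates are only published for areas of 65,000 or more people, so tract-level work needs the 5-year file), and `B19013_001E` (median household income) as an example variable. The helper names `build_acs_url` and `acs_json_to_frame` are hypothetical.

```python
from urllib.parse import quote, urlencode

import pandas as pd

# Columns the API returns to identify each tract (not numeric estimates)
GEO_COLUMNS = ["NAME", "state", "county", "tract"]


def build_acs_url(year: int, variables: list[str], dataset: str = "acs5",
                  state: str = "24", county: str = "031") -> str:
    """Build a Census Data API query for every tract in one county.

    Defaults assume Montgomery County, Maryland (state 24, county 031).
    """
    params = {
        "get": ",".join(variables),
        "for": "tract:*",
        "in": f"state:{state} county:{county}",
    }
    # Keep ':', ',' and '*' readable; the space is encoded as %20
    query = urlencode(params, safe=":,*", quote_via=quote)
    return f"https://api.census.gov/data/{year}/acs/{dataset}?{query}"


def acs_json_to_frame(rows: list[list[str]]) -> pd.DataFrame:
    """Convert the API's JSON payload (header row + data rows) to a DataFrame."""
    df = pd.DataFrame(rows[1:], columns=rows[0])
    for col in df.columns:
        if col not in GEO_COLUMNS:
            values = pd.to_numeric(df[col], errors="coerce")
            # Negative sentinels such as -666666666 mark unavailable estimates
            df[col] = values.where(values >= 0)
    # 11-digit tract GEOID: 2-digit state + 3-digit county + 6-digit tract
    df["GEOID"] = df["state"] + df["county"] + df["tract"]
    return df
```

For example, `requests.get(build_acs_url(2022, ["NAME", "B19013_001E"]), timeout=30).json()` would fetch the 2018-2022 5-year estimates, which `acs_json_to_frame` then parses. An API key is optional for light use and can be appended as a `key` parameter.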

## About plotly

Plotly is a graphing library for Python that allows users to create a wide variety of plots, from basic line charts to complex 3D visualizations. It is particularly popular for creating interactive visualizations that can be embedded in web applications, Jupyter notebooks, or exported as standalone HTML files.

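Plotly's Python library is a wrapper around plotly.js, a JavaScript library built on top of D3.js and stack.gl. This lets Python users produce high-quality interactive visualizations through a simple and intuitive API without writing JavaScript.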

Key Features:

-   Interactive plots (zoom, pan, hover, etc.)

-   Wide range of charts (line, bar, scatter, pie, heatmaps, maps, 3D plots, etc.)

-   Easy integration with Dash (Plotly’s web app framework)

-   Export options (static images, HTML, etc.)

In [1]:
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
import plotly.graph_objects as go
import plotly.express as px

# Use Plotly's qualitative T10 color sequence for consistent trace colors
colors = px.colors.qualitative.T10

tows = pd.read_csv('/content/tows 10.csv', parse_dates=['Tow Date'])

In [4]:
tows.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 93059 entries, 0 to 93058
Data columns (total 25 columns):
 #   Column                                            Non-Null Count  Dtype         
---  ------                                            --------------  -----         
 0   Tow Date                                          93059 non-null  datetime64[ns]
 1   Vehicle Year                                      90389 non-null  float64       
 2   Vehicle Make                                      92814 non-null  object        
 3   Vehicle Model                                     90196 non-null  object        
 4   Notes                                             90135 non-null  object        
 5   Location                                          93059 non-null  object        
 6   City                                              93045 non-null  object        
 7   lon                                               92797 non-null  float64       
 8   lat                       

### Tree Map

A treemap is a data visualization technique used to display hierarchical data using nested rectangles. Each branch of the hierarchy is represented as a rectangle, which contains smaller rectangles representing its sub-branches or leaves. The size of each rectangle is typically proportional to a quantitative value, such as revenue, population, or file size, while color can be used to represent another dimension, like category or growth rate. Treemaps are especially useful for showing part-to-whole relationships and comparing proportions within a complex dataset at a glance.

In [5]:
# Count tows for each reason to build a treemap
reason_counts = tows['Reason for tow'].value_counts().reset_index()
reason_counts.columns = ['Reason for tow', 'Count']

# Create the treemap
fig = go.Figure(go.Treemap(
    labels=reason_counts['Reason for tow'],
    parents=[''] * len(reason_counts),  # All reasons are at the root level
    values=reason_counts['Count'],
    textinfo='label+value',
    hovertemplate='<b>%{label}</b><br>Number of tows: %{value}<extra></extra>'
))

# Update layout
fig.update_layout(
    title='Distribution of Towing Reasons',
    width=1000,
    height=600,
    template='plotly_white'
)

# Display the plot (use fig.write_html('file.html') to save it as HTML instead)
fig.show()
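The cell above places every reason at the root level (`parents=''`), so the treemap is flat. To get the nested rectangles described earlier, each child node must name its parent's id. A minimal sketch, assuming a hypothetical two-level split (for example `City`, then `Reason for tow`, both columns of `tows`), builds the `id`/`label`/`parent`/`value` columns that `go.Treemap` accepts; the helper `treemap_hierarchy` is illustrative, not part of the original notebook.

```python
import pandas as pd


def treemap_hierarchy(df: pd.DataFrame, outer: str, inner: str) -> pd.DataFrame:
    """Build ids/labels/parents/values for a two-level treemap from raw rows.

    Each `outer` category becomes a root rectangle (parent ''), and each
    (outer, inner) pair becomes a child rectangle nested inside it.
    Rows with a missing `outer` or `inner` value are dropped by groupby.
    """
    # Leaf nodes: number of rows for every (outer, inner) combination
    leaves = df.groupby([outer, inner]).size().reset_index(name="value")
    leaves["parent"] = leaves[outer].astype(str)
    leaves["id"] = leaves["parent"] + "/" + leaves[inner].astype(str)
    leaves["label"] = leaves[inner].astype(str)

    # Parent nodes: each outer category's value is the total of its children
    parents = leaves.groupby("parent", as_index=False)["value"].sum()
    parents["id"] = parents["parent"]
    parents["label"] = parents["parent"]
    parents["parent"] = ""

    cols = ["id", "label", "parent", "value"]
    return pd.concat([parents[cols], leaves[cols]], ignore_index=True)
```

The result can be passed as `go.Treemap(ids=h['id'], labels=h['label'], parents=h['parent'], values=h['value'], branchvalues='total')`, where `branchvalues='total'` tells Plotly that each parent's value is the sum of its children. `px.treemap(..., path=[...])` can build the same hierarchy automatically; the explicit version just makes the parent links visible.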

### Towing Numbers over Time and Projection

Time series projections are a method of forecasting future values based on historical data that is collected at regular intervals over time. Projections in time series analysis help us predict future trends, identify patterns, and make informed decisions based on past behavior.

Can we see any patterns in monthly data? Is the number of vehicles being towed increasing or decreasing over time? What predictions can we make?


In [6]:
tows_projection_data = tows.dropna(subset=['Tow Date']).copy()  # copy() avoids SettingWithCopyWarning

# Extract year and month for grouping
tows_projection_data['YearMonth'] = tows_projection_data['Tow Date'].dt.to_period('M')

# Aggregate tow counts by month (months with no tows do not appear)
monthly_tows = tows_projection_data.groupby('YearMonth').size().reset_index(name='Tow Count')

# Convert the YearMonth period to a datetime (first day of the month) for plotting
monthly_tows['Month Start'] = monthly_tows['YearMonth'].dt.start_time

# Exact number of calendar months since the first month (days // 30 drifts over time)
month_number = monthly_tows['YearMonth'].dt.year * 12 + monthly_tows['YearMonth'].dt.month
monthly_tows['Months Since Start'] = month_number - month_number.min()

# Prepare data for regression
X = monthly_tows[['Months Since Start']]
y = monthly_tows['Tow Count']

# Train a linear regression model
model = LinearRegression()
model.fit(X, y)

# Predict the next 12 months; a DataFrame with the same column name as X
# avoids the "X does not have valid feature names" warning
last_month = monthly_tows['Months Since Start'].max()
future_months = pd.DataFrame({'Months Since Start': np.arange(last_month + 1, last_month + 13)})
future_month_dates = pd.date_range(monthly_tows['Month Start'].max() + pd.offsets.MonthBegin(1),
                                   periods=12, freq='MS')
future_predictions = model.predict(future_months)

Now we are ready to plot the projection.

In [7]:
fig = go.Figure()

# Add Actual Data
fig.add_trace(go.Scatter(
    x=monthly_tows['Month Start'],
    y=monthly_tows['Tow Count'],
    mode='markers+lines',
    name='Actual Data',
    marker=dict(color='#4c78a8'),
    line=dict(color='#4c78a8')
))

# Add Trend Line
fig.add_trace(go.Scatter(
    x=monthly_tows['Month Start'],
    y=model.predict(X),
    mode='lines',
    name='Trend Line',
    line=dict(color='#54a24b', dash='solid')
))

# Add Projections
fig.add_trace(go.Scatter(
    x=future_month_dates,
    y=future_predictions,
    mode='markers+lines',
    name='Projection',
    marker=dict(color='#e45756'),
    line=dict(color='#e45756', dash='dot')
))

# Customize layout
fig.update_layout(
    title='Monthly Tow Count Trend and Future Projections',
    xaxis_title='Month',
    yaxis_title='Tow Count',
    legend_title='Legend',
    template='plotly_white'
)

# Display the graph
fig.show()
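For comparison, the same straight-line trend can be fitted with NumPy alone. This sketch (the helper `linear_monthly_projection` is hypothetical) differs from the cells above in one deliberate way: months with no tows are counted as zero instead of being left out, so the x-values form a true month index. `np.polyfit(..., deg=1)` fits an ordinary least-squares line, like `LinearRegression`, and ignores seasonality, so its output is best read as a rough trend rather than a forecast.

```python
import numpy as np
import pandas as pd


def linear_monthly_projection(dates: pd.Series, horizon: int = 12) -> tuple[pd.DataFrame, float]:
    """Fit a straight-line trend to monthly event counts and extend it `horizon` months.

    Returns the projection table and the fitted slope (change in count per month).
    """
    # One row per event, counted per calendar month; empty months become 0
    monthly = (pd.Series(1, index=pd.DatetimeIndex(dates.dropna()))
               .sort_index()
               .resample("MS")
               .sum())
    t = np.arange(len(monthly))  # month index: 0, 1, 2, ...
    # Least-squares line: count ~ intercept + slope * t (polyfit returns slope first)
    slope, intercept = np.polyfit(t, monthly.to_numpy(dtype=float), deg=1)
    future_t = np.arange(len(monthly), len(monthly) + horizon)
    future_months = pd.date_range(monthly.index[-1] + pd.offsets.MonthBegin(1),
                                  periods=horizon, freq="MS")
    projection = pd.DataFrame({"Month Start": future_months,
                               "Projected Count": intercept + slope * future_t})
    return projection, float(slope)
```

Calling `projection, slope = linear_monthly_projection(tows['Tow Date'])` would give 12 projected monthly counts plus the average monthly change, which answers the "increasing or decreasing" question directly through the sign of `slope`.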

### Monthly Towing Rates over Time by Company

It can be valuable to split a time series by a categorical variable. Here we look at the individual monthly towing trends of the five companies that perform the most tows.

In [9]:
# Checking the most common companies to identify the top 5
top_companies = tows['Trade Name'].value_counts().head(5).index

# Filtering the dataset for rows with these top 5 companies
top_companies_data = tows[tows['Trade Name'].isin(top_companies)]

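# Aggregate tow counts by calendar day (normalize() drops any time-of-day component)
top_five_daily_counts = (top_companies_data
                         .groupby([top_companies_data['Tow Date'].dt.normalize(), 'Trade Name'])
                         .size()
                         .unstack(fill_value=0))

# Add days with no tows as zeros so the monthly mean is an average over calendar days
top_five_daily_counts = top_five_daily_counts.resample('D').sum()

# Resample to month-end frequency ('ME' needs pandas >= 2.2; use 'M' on older versions) and average
monthly_avg = top_five_daily_counts.resample('ME').mean()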

# Plot the timeseries
fig = go.Figure()

# Add a line for each company
for idx, company in enumerate(monthly_avg.columns):
    fig.add_trace(go.Scatter(
        x=monthly_avg.index,
        y=monthly_avg[company],
        mode='lines+markers',
        name=company,
        line=dict(color=colors[idx % len(colors)]),
        hovertemplate='<b>%{x}</b><br>Average: %{y:.2f}<extra></extra>'
    ))

# Customize layout
fig.update_layout(
    title='Monthly Tow Count Trends',
    xaxis_title='Month',
    yaxis_title='Average Daily Tow Count',
    legend_title='Legend',
    template='plotly_white',
    width=1000,  # Set width to 1000 pixels
    height=800   # Set height to 800 pixels
)

# Add range slider and selector
fig.update_xaxes(
    rangeslider_visible=True,
    rangeselector=dict(
        buttons=list([
            dict(count=1, label='1m', step='month', stepmode='backward'),
            dict(count=6, label='6m', step='month', stepmode='backward'),
            dict(count=1, label='YTD', step='year', stepmode='todate'),
            dict(count=1, label='1y', step='year', stepmode='backward'),
            dict(step='all')
        ])
    )
)

# Display the graph
fig.show()

## Rate of Change - Measuring Volatility

In [12]:
# Month-over-month change: each month's average minus the previous month's
# (DataFrame.diff(); the first month is NaN because it has no predecessor)
monthly_rate_of_change = monthly_avg.diff()

# Month-over-Month Changes Plot
fig = go.Figure()

# Add a trace for each company
for idx, company in enumerate(monthly_rate_of_change.columns):
    fig.add_trace(
        go.Scatter(
            x=monthly_rate_of_change.index,
            y=monthly_rate_of_change[company],
            name=company,
            mode='lines',
            line=dict(color=colors[idx % len(colors)])  # Use same colors as first plot
        )
    )

# Update layout
fig.update_layout(
    title='Month-Over-Month Change in Tow Counts for Top 5 Towing Companies',
    xaxis_title='Month',
    yaxis_title='Change in Average Number of Tows',
    width=820,
    height=580,
    showlegend=True,
    legend_title='Towing Company',
    plot_bgcolor='white'
)

fig.update_xaxes(
    showgrid=False,
    gridwidth=1,
    gridcolor='LightGray',
    showline=False,
    linewidth=2,
    linecolor='black',
    mirror=False
)
fig.update_yaxes(
    showgrid=True,
    gridwidth=1,
    gridcolor='LightGray',
    showline=True,
    linewidth=2,
    linecolor='black',
    mirror=False
)

fig.add_hline(y=0, line_width=1, line_color='black')

fig.show()
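The chart shows month-over-month changes, but the heading promises a measure of volatility. One simple summary (an assumption, not necessarily the author's intended metric) is the standard deviation of each company's month-over-month changes, shown alongside the mean absolute change. A minimal sketch with the hypothetical helper `mom_volatility`:

```python
import pandas as pd


def mom_volatility(monthly: pd.DataFrame) -> pd.DataFrame:
    """Rank columns (e.g. companies) by how much their monthly values swing.

    std_change:      standard deviation of month-over-month changes
    mean_abs_change: average size of a monthly change, ignoring direction
    """
    changes = monthly.diff()  # first row is NaN and is skipped by std()/mean()
    summary = pd.DataFrame({
        "std_change": changes.std(),
        "mean_abs_change": changes.abs().mean(),
    })
    return summary.sort_values("std_change", ascending=False)
```

Applied to `monthly_avg`, this would list the top five companies from most to least volatile, turning the visual impression from the plot into a number per company.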

## Bubble Chart - Four Variables

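A bubble chart is a type of data visualization that displays three dimensions of data using bubbles (circles) on a two-dimensional plane. Like a scatter plot, it uses the x- and y-axes to represent two variables, while the size of each bubble conveys a third variable, usually indicating magnitude or volume. A fourth variable can be encoded with color. In the chart below, each bubble is a census tract: median household income on the x-axis, population density on the y-axis, number of tows as bubble size, and the Community Equity Index (CEI) as color.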
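Because readers judge a bubble by its area, a common convention is to make the area (not the diameter) proportional to the third variable, so the diameter grows with the square root of the value. A minimal sketch of that mapping, with the hypothetical helper `bubble_diameters` and a 50-pixel cap chosen to mirror the `size_max=50` setting in the next cell:

```python
import numpy as np


def bubble_diameters(values, max_diameter: float = 50.0) -> np.ndarray:
    """Map non-negative values to bubble diameters with area proportional to value.

    Area grows with diameter squared, so diameter = max_diameter * sqrt(value / max(value)).
    """
    v = np.asarray(values, dtype=float)
    if np.any(v < 0):
        raise ValueError("bubble sizes must be non-negative")
    vmax = v.max()
    if vmax == 0:
        return np.zeros_like(v)  # nothing to scale against
    return max_diameter * np.sqrt(v / vmax)
```

With this rule a tract with 100 tows gets a bubble twice as wide as one with 25 tows, and therefore four times the area, matching the 4:1 ratio in the data.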

In [None]:
# Group the data by 'TractFIPS' to aggregate the number of tows per tract

# MC (Montgomery County) is divided into 148 census tracts
grouped_data = tows.groupby('TractFIPS').agg({
    'median_household_income': 'mean',   # Mean income for each tract
    'pop_density': 'mean',               # Mean population density for each tract
    'cei': 'mean',                       # Mean Community Equity Index for each tract
    'geoid': 'count'                     # Non-null 'geoid' values per tract, i.e. the number of tows
}).reset_index()

# Rename columns for better understanding
grouped_data.rename(columns={'geoid': 'number_of_tows'}, inplace=True)


fig = px.scatter(grouped_data,
    x='median_household_income',
    y='pop_density',
    size='number_of_tows',
    color='cei',
    color_continuous_scale='RdYlBu',
    size_max=50,  # Maximum bubble size
    hover_data={
        'median_household_income': ':,.0f',
        'pop_density': ':,.0f',
        'number_of_tows': True,
        'cei': ':.2f'
    },
    template='simple_white',
    title='Income vs Population Density with Number of Towing Incidents',
    labels={
        'median_household_income': 'Median Household Income',
        'pop_density': 'Population Density',
        'number_of_tows': 'Number of Tows',
        'cei': 'Community Equity Index'
    }
)

# Update layout for better visualization
fig.update_layout(
    plot_bgcolor='white',
    width=850,
    height=550,
    coloraxis_colorbar_title='CEI',
    title_font_size=18
)

fig.update_traces(marker=dict(line=dict(color='black', width=1)))

# Show the plot
fig.show()


#### What if we remove outliers?

In [None]:
# Remove outliers using the Interquartile Range (IQR) method
def remove_outliers(df, column):
    Q1 = df[column].quantile(0.25)
    Q3 = df[column].quantile(0.75)
    IQR = Q3 - Q1
    return df[(df[column] >= (Q1 - 1.5 * IQR)) & (df[column] <= (Q3 + 1.5 * IQR))]

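# Remove outliers from Population Density, then from Resident Density on the remaining rows.
# This filters individual tow records; rows missing either value are also dropped.
df_cleaned = remove_outliers(tows, 'pop_density')
df_cleaned = remove_outliers(df_cleaned, 'resident_density')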

# Group the data by 'TractFIPS' to aggregate the number of tows per tract
grouped_data = df_cleaned.groupby('TractFIPS').agg({
    'median_household_income': 'mean',   # Mean income for each tract
    'pop_density': 'mean',               # Mean population density for each tract
    'cei': 'mean',                       # Mean Community Equity Index for each tract
    'geoid': 'count'                     # Non-null 'geoid' values per tract, i.e. the number of tows
}).reset_index()

# Rename columns for better understanding
grouped_data.rename(columns={'geoid': 'number_of_tows'}, inplace=True)


fig = px.scatter(grouped_data,
    x='median_household_income',
    y='pop_density',
    size='number_of_tows',
    color='cei',
    color_continuous_scale='RdYlBu',
    size_max=50,  # Maximum bubble size
    hover_data={
        'median_household_income': ':,.0f',
        'pop_density': ':,.0f',
        'number_of_tows': True,
        'cei': ':.2f'
    },
    template='simple_white',
    title='Income vs Population Density with Number of Towing Incidents (Outliers Removed)',
    labels={
        'median_household_income': 'Median Household Income',
        'pop_density': 'Population Density',
        'number_of_tows': 'Number of Tows',
        'cei': 'Community Equity Index'
    }
)

# Update layout for better visualization
fig.update_layout(
    plot_bgcolor='white',
    width=850,
    height=550,
    coloraxis_colorbar_title='CEI',
    title_font_size=18
)

fig.update_traces(marker=dict(line=dict(color='black', width=1)))

# Show the plot
fig.show()
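The cell above applies the 1.5 * IQR rule to individual tow records, so a tract with many tows contributes many rows to the quartiles. If the goal is instead to drop outlying *tracts*, one alternative (a sketch, not the notebook's method) is to apply the same fences after aggregating to one row per tract. The helpers `iqr_bounds` and `filter_iqr` are hypothetical names.

```python
import pandas as pd


def iqr_bounds(s: pd.Series, k: float = 1.5) -> tuple[float, float]:
    """Return the (lower, upper) Tukey fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def filter_iqr(df: pd.DataFrame, columns: list[str], k: float = 1.5) -> pd.DataFrame:
    """Keep rows that fall inside the fences of each listed column, one column at a time.

    Fences for later columns are computed on rows that survived earlier columns;
    rows with a missing value in any listed column are dropped.
    """
    out = df
    for col in columns:
        lower, upper = iqr_bounds(out[col], k)
        out = out[out[col].between(lower, upper)]  # inclusive on both ends
    return out
```

For instance, `filter_iqr(tract_table, ['pop_density'])`, where `tract_table` is a tract-level table built from the full `tows` data as in the first bubble-chart cell, would remove tracts (rather than tow records) whose population density lies outside the fences.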
