# **PROJECT 2: MASS SHOOTINGS IN THE USA VISUALIZATION WITH INTERACTIONS, USING ALTAIR AND STREAMLIT**
#### **Laura Humet and Maria Sans** - Information Visualization - GCED

## Code for Data Preparation

In [1]:
import pandas as pd
gun_violence = pd.read_csv('gun_violence_cleaned.csv')
print(gun_violence.columns)

Index(['Incident ID', 'Incident Month', 'Incident Year', 'State',
       'Population per State', 'FIPS_States', 'City Or County', 'Location',
       'json', 'Latitude', 'Longitude', 'json2', 'Region Code',
       'Population per County', 'FIPS_County', 'County Name', 'Victims Killed',
       'Victims Injured', 'Suspects Killed', 'Suspects Injured',
       'Suspects Arrested'],
      dtype='object')


In [2]:
additional_data_2017 = pd.read_csv("additional_data_2017.csv")
print(additional_data_2017.columns)

Index(['Incident ID', 'Incident Month', 'Incident Year', 'State',
       'Population per State', 'FIPS_States', 'Latitude', 'Longitude',
       'Population per County', 'FIPS_County', 'County Name', 'Victims Killed',
       'Victims Injured', 'Suspects Killed', 'Suspects Injured',
       'Suspects Arrested'],
      dtype='object')


In [3]:
gun_violence.drop(columns=['json', 'json2', 'City Or County', 'Location', 'Region Code'], inplace=True)
gun_violence = pd.concat([gun_violence, additional_data_2017], ignore_index=True)

In [4]:
gun_violence['Population per State'] = gun_violence['Population per State'].astype(int)
gun_violence['Population per County'] = gun_violence['Population per County'].astype(int)

In [5]:
regions = pd.read_csv('updated-state-region-fips.csv')

In [6]:
regions = regions.drop(columns=['State'])
gun_violence['FIPS_States'] = gun_violence['FIPS_States'].astype(str)
regions['FIPS_States'] = regions['FIPS_States'].astype(str)

# Now merge the datasets
shootings = pd.merge(gun_violence, regions, on='FIPS_States', how='left')

shootings


Unnamed: 0,Incident ID,Incident Month,Incident Year,State,Population per State,FIPS_States,Latitude,Longitude,Population per County,FIPS_County,County Name,Victims Killed,Victims Injured,Suspects Killed,Suspects Injured,Suspects Arrested,Region
0,271363,December,2014,Louisiana,4657757,22,29.951049,-90.082308,383997,22071,Orleans Parish,0,4,0,0,0,Southeast
1,270036,December,2014,California,39538223,6,38.552289,-121.335208,1585055,6067,Sacramento County,0,4,0,0,0,Southwest
2,269679,December,2014,California,39538223,6,34.074600,-118.053430,10014009,6037,Los Angeles County,1,3,0,0,0,Southwest
3,269167,December,2014,Illinois,12812508,17,38.607625,-90.130144,257400,17163,St. Clair County,1,3,0,0,0,Midwest
4,268598,December,2014,Missouri,6154913,29,38.613395,-90.259382,301578,29510,St. Louis city,1,3,0,0,0,Midwest
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4231,2492314,January,2023,Ohio,11799448,39,40.005680,-83.028661,1323807,39049,Franklin County,1,4,0,0,2,Midwest
4232,932181,September,2017,Washington,7800000,53,0.000000,0.000000,574,53063,Spokane County,1,3,0,0,1,Northwest
4233,906509,August,2017,Montana,1133000,30,0.000000,0.000000,429,30033,Big Horn County,3,2,0,0,0,Northwest
4234,904106,August,2017,Washington,7800000,53,0.000000,0.000000,8650,53077,Yakima County,0,6,0,0,1,Northwest
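The preview above shows rows from the 2017 supplement with latitude and longitude equal to 0.0, and a left merge silently keeps rows whose state code finds no region. A minimal sketch of how both cases could be flagged is given below; the toy tables and the state code "99" are made up for illustration and are not part of the project data.

```python
import pandas as pd

# Toy stand-in for the merged table; all values here are illustrative.
toy_incidents = pd.DataFrame({
    "Incident ID": [1, 2, 3],
    "FIPS_States": ["22", "53", "99"],  # "99" is a made-up code with no region
    "Latitude": [29.95, 0.0, 40.0],
    "Longitude": [-90.08, 0.0, -83.03],
})
toy_regions = pd.DataFrame({
    "FIPS_States": ["22", "53"],
    "Region": ["Southeast", "Northwest"],
})

# indicator=True adds a "_merge" column saying where each row found a match
checked = pd.merge(toy_incidents, toy_regions, on="FIPS_States",
                   how="left", indicator=True)

# Rows whose state code has no entry in the region table
unmatched = checked[checked["_merge"] == "left_only"]
# Rows whose coordinates look like placeholders rather than real locations
zero_coords = checked[(checked["Latitude"] == 0) & (checked["Longitude"] == 0)]

print("Rows without a region:", unmatched["Incident ID"].tolist())
print("Rows with (0, 0) coordinates:", zero_coords["Incident ID"].tolist())
```

Running the same two filters on the real `shootings` table would show how many incidents cannot be placed on a map or assigned to a region.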


In [7]:
shootings.to_csv('shootings.csv', index=False)

In [8]:
!pip install --upgrade altair



In [9]:
!pip install altair==5.4.1

Collecting altair==5.4.1
  Downloading altair-5.4.1-py3-none-any.whl.metadata (9.4 kB)
Downloading altair-5.4.1-py3-none-any.whl (658 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 658.1/658.1 kB 7.2 MB/s eta 0:00:00
Installing collected packages: altair
  Attempting uninstall: altair
    Found existing installation: altair 5.5.0
    Uninstalling altair-5.5.0:
      Successfully uninstalled altair-5.5.0
Successfully installed altair-5.4.1


# **Q1 and Q3: How has the number of mass shootings evolved in the big US regions between two concrete years?**
*For this, we need you to aggregate the data in the 5 regions (Southeast, Northeast, Midwest, Northwest, and Southwest), and let the user select the first and last year of the comparison. Same for states, both views coordinated.*

*We also would like you to be able to select a state, and show the detailed information on the counties of the state.*

In [10]:
import pandas as pd
shootings = pd.read_csv('shootings.csv')

shootings = shootings.rename(columns = {'County Name':'County'})
shootings['mass_shootings_per_state'] = shootings.groupby(['Incident Year', 'State'])['State'].transform('count')
shootings['mass_shootings_per_region'] = shootings.groupby(['Incident Year', 'Region'])['Region'].transform('count')
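# County names repeat across states, so group by state as well to avoid merging unrelated counties
shootings['mass_shootings_per_county'] = shootings.groupby(['Incident Year', 'State', 'County'])['County'].transform('count')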

In [11]:
# Real population per region based on recent estimates
population_per_region_dict = {
    'Northeast': 57700000,
    'Midwest': 68600000,
    'Southwest': 10200000,
    'Southeast': 26500000,
    'Northwest': 13300000
}

# Map the population per region from the dictionary
shootings['Population per Region'] = shootings['Region'].map(population_per_region_dict)
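The regional figures above are hard-coded. In the data preview, California is labeled Southwest and has a state population of about 39.5 million, which is larger than the 10.2 million entered for the Southwest, so a cross-check seems worthwhile. The sketch below shows one possible check: summing the per-state populations already present in the dataset. It only covers states that appear in the data, and the five rows used here are a small illustrative subset.

```python
import pandas as pd

# A few incident rows; state populations are the ones shown in the preview table.
rows = pd.DataFrame({
    "State": ["California", "California", "Louisiana", "Illinois", "Missouri"],
    "Region": ["Southwest", "Southwest", "Southeast", "Midwest", "Midwest"],
    "Population per State": [39538223, 39538223, 4657757, 12812508, 6154913],
})

# Keep one row per state so that repeated incidents do not inflate the total
state_population = rows.drop_duplicates(subset="State")

# Sum state populations within each region
derived_region_population = (
    state_population.groupby("Region")["Population per State"].sum()
)
print(derived_region_population)
```

Comparing `derived_region_population` with `population_per_region_dict` on the full table would reveal any region whose hard-coded value is out of line with its states.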

In [12]:
import altair as alt
from vega_datasets import data

# Define selection for the region and for the year
region_selection = alt.selection_point(fields=['Region'])

year_range = {'x': [shootings['Incident Year'].min(), shootings['Incident Year'].max()]}
year_selection = alt.selection_interval(encodings=['x'], value=year_range)

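# Select on 'State' only: it is present in both the map lookup and the shootings table.
# Altair 5 expects a boolean for `empty`; False means nothing is shown until a state is clicked.
state_selection = alt.selection_point(fields=['State'], empty=False)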

In [13]:
shootings['mass_shootings_per_state']/shootings['Population per State'] * 100000

Unnamed: 0,0
0,0.236165
1,0.091051
2,0.091051
3,0.195122
4,0.129977
...,...
4231,0.254249
4232,0.038462
4233,0.088261
4234,0.038462


In [14]:
from vega_datasets import data
import numpy as np

map = alt.topo_feature(data.us_10m.url, 'states')

## REGION CHARTS ##
map_regions = alt.Chart(map).mark_geoshape().encode(
    color = alt.condition(region_selection, alt.Color('Region:N').legend(title='Regions'), alt.value('#EEEEEE')),
    stroke = alt.value('lightgray'),
    tooltip = alt.Tooltip('Region:N')
).transform_lookup(
    lookup = 'id',
    from_ = alt.LookupData(shootings, 'FIPS_States', ['Region'])
).transform_filter(
    alt.datum.Region != None
).add_params(
    region_selection
).properties(
    width = 300,
    height = 200
).project('albersUsa')

regions_chart = alt.Chart(shootings).mark_line(strokeWidth = 2, clip=True).encode(
    x = alt.X('Incident Year:N', axis = alt.Axis(title = 'Year'), scale=alt.Scale(nice=False, padding=10)).scale(domain=year_selection),
    y = alt.Y('mass_shootings_per_100k_per_region:Q', axis = alt.Axis(title = 'Mass Shootings'), scale=alt.Scale(nice=False, padding=10)),
    color = alt.Color('Region:N').legend(title='Regions'),
    opacity=alt.condition(region_selection, alt.value(1.0), alt.value(0.1)),
    tooltip=[alt.Tooltip('Region:N', title='Region'), alt.Tooltip('mass_shootings_per_100k_per_region:Q', title='Mass Shootings per 100k citizens')]
).transform_calculate(
    mass_shootings_per_100k_per_region="datum.mass_shootings_per_region / datum['Population per Region'] * 100000"
).add_params(
    region_selection
).transform_filter(
    year_selection
).properties(
    width = 300,
    height = 200
)
regions = alt.vconcat(map_regions, regions_chart, title={'text':'Evolution of Mass Shootings per 100k citizens in the US by regions',
                                                         'subtitle':'Select a region'})

## STATE CHARTS ##
map_states = alt.Chart(map).mark_geoshape().encode(
    color = alt.condition(state_selection, alt.Color('State:N').legend(title='States'), alt.value('#EEEEEE')),
    stroke = alt.value('lightgray'),
    tooltip = alt.Tooltip('State:N')
).transform_lookup(
    lookup = 'id',
    from_ = alt.LookupData(shootings, 'FIPS_States', ['State', 'Region'])
).transform_filter(
    alt.datum.State != None
).transform_filter(
    region_selection
).add_params(
    state_selection, region_selection
).properties(
    width = 300,
    height = 200
).project('albersUsa')

states_chart = alt.Chart(shootings).mark_line(strokeWidth = 2, clip=True).encode(
    x = alt.X('Incident Year:N', axis = alt.Axis(title = 'Year'), scale=alt.Scale(nice=False, padding=10)).scale(domain=year_selection),
    y = alt.Y('mass_shootings_per_100k_per_state:Q', axis = alt.Axis(title = 'Mass Shootings'), scale=alt.Scale(nice=False, padding=10)),
    color = alt.Color('State:N').legend(title='States'),
    opacity=alt.condition(state_selection, alt.value(1.0), alt.value(0.1)),
    tooltip=[alt.Tooltip('State:N', title='State'), alt.Tooltip('mass_shootings_per_100k_per_state:Q', title='Mass Shootings per 100k citizens')]
).transform_calculate(
    mass_shootings_per_100k_per_state="datum.mass_shootings_per_state / datum['Population per State'] * 100000"
).transform_filter(
    region_selection
).transform_filter(
    year_selection
).add_params(
    state_selection, region_selection
).properties(
    width = 300,
    height = 200
)

states = alt.vconcat(map_states, states_chart, title={'text':['Evolution of Mass Shootings in the US per 100k citizens', 'for each state in the selected region'],
                                                      'subtitle': 'Select a state'})

## COUNTY CHARTS ##
counties_chart = alt.Chart(shootings).mark_line(strokeWidth = 2, point=True, clip=True).encode(
    x = alt.X('Incident Year:N', axis = alt.Axis(title = 'Year'), scale=alt.Scale(nice=False, padding=10)).scale(domain=year_selection),
    y = alt.Y('mass_shootings_per_100k_per_county:Q', axis = alt.Axis(title = 'Mass Shootings'), scale=alt.Scale(nice=False, padding=10)),
    color = alt.Color('County:N', legend=alt.Legend(
            title='Counties',
            orient='right',
            offset=10
        )),
    tooltip=[alt.Tooltip('County:N', title='County'), alt.Tooltip('mass_shootings_per_100k_per_county:Q', title='Mass Shootings per 100k citizens')]
).transform_calculate(
    mass_shootings_per_100k_per_county="datum.mass_shootings_per_county / datum['Population per County'] * 100000"
).transform_filter(
    state_selection
).transform_filter(
    year_selection
).transform_filter(
    alt.datum.mass_shootings_per_county >= 3
).add_params(
    state_selection
).properties(
    width = 300,
    height = 300,
    title={
        'text':['Evolution of Mass Shootings per 100k citizens in the US', 'for each county in the selected state'],
        'subtitle':'(At least 3 mass shootings per county)'
        }
)

year_selection_chart = alt.Chart(shootings).mark_line().encode(
    x = alt.X('Incident Year:N', title='Year'),
    y = alt.Y('mass_shootings_per_region:Q', title=None),
    color = alt.Color('State:N', legend=None)
).add_params(
    year_selection
).properties(
    width=300,
    height=50,
    title={
        "text": "Select Year Range",
        "subtitle": "Drag across the years below to select the range of interest."
    }
)


counties_and_years = alt.vconcat(counties_chart, year_selection_chart).resolve_scale(color='independent')

alt.hconcat(regions, states, counties_and_years).resolve_scale(color='independent')

### Visualization Design Decisions
To explore the evolution of mass shootings in the U.S., we used **line charts**, which show trends over time directly. Normalizing the data to *mass shootings per 100k citizens* allows a meaningful comparison across regions, states, and counties with very different population sizes.
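As a small illustration of the normalization, the sketch below counts incidents per year and state and divides by the state population. The table is a toy example: the populations are the ones shown in the data preview, while the incident counts are illustrative.

```python
import pandas as pd

# Toy incident table: one row per mass shooting (counts are illustrative).
toy = pd.DataFrame({
    "State": ["Louisiana"] * 11 + ["Missouri"] * 8,
    "Incident Year": [2014] * 19,
    "Population per State": [4657757] * 11 + [6154913] * 8,
})

# Count incidents per year and state, as the notebook does with transform('count')
toy["incidents"] = toy.groupby(["Incident Year", "State"])["State"].transform("count")

# Rate per 100,000 residents
toy["rate_per_100k"] = toy["incidents"] / toy["Population per State"] * 100_000

# One value per state for the chosen year
rates = (
    toy.drop_duplicates(subset=["Incident Year", "State"])
    .set_index("State")["rate_per_100k"]
)
print(rates.round(6))
```

Without the division, the larger state would tend to dominate the chart simply because more people live there.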

### Region and State Selection

We included maps to give users geographic context, making it easier to locate and select specific regions or states. Selecting on the map also keeps users from being overwhelmed with data points, since only the chosen region is explored in detail.
We show the state lines only for the selected region because plotting every U.S. state at once, and likewise every county, would make the charts difficult to read.


### County-Level Detail

For county-level visualizations, we encoded the chart so that it only shows data for the selected state.
We also **limited the display to counties with at least three mass shootings** to keep the focus on significant trends. Interactive filters ensure the chart is shown only when a state is selected, avoiding visual clutter. Additionally, point markers were added to the lines to address gaps caused by years without mass shootings, so that isolated years remain visible and no data is inadvertently hidden.

These design choices let users explore trends intuitively and answer questions accurately, such as identifying the regions with the steepest increases or understanding county-level dynamics.
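The notebook handles missing years by drawing point markers. A different option, not used in the notebook and shown here only as a sketch, is to fill in the missing county and year combinations with zero before plotting, so the line drops to zero instead of breaking. The counties "A" and "B" and their counts are hypothetical.

```python
import pandas as pd

# Hypothetical yearly counts for two counties; some years are missing.
counts = pd.DataFrame({
    "County": ["A", "A", "B"],
    "Incident Year": [2014, 2016, 2015],
    "n": [3, 4, 5],
})

# Every year between the first and last observed year
first_year = int(counts["Incident Year"].min())
last_year = int(counts["Incident Year"].max())
all_years = list(range(first_year, last_year + 1))

# Full grid of county x year combinations
full_index = pd.MultiIndex.from_product(
    [sorted(counts["County"].unique()), all_years],
    names=["County", "Incident Year"],
)

# Reindex onto the full grid; combinations with no incidents become 0
filled = (
    counts.set_index(["County", "Incident Year"])
    .reindex(full_index, fill_value=0)
    .reset_index()
)
print(filled)
```

Whether zero-filling or point markers reads better depends on whether a year with no incidents should be shown as an explicit zero or simply left out.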

# **Q2**: Given a concrete year, how has the number of mass shootings per citizen grown or decreased across the different regions in the US compared to the first year of sampled data?


In [15]:
import altair as alt

# Group data by year and region, calculating the rate per million residents of the region
region_yearly = (
    shootings.groupby(['Incident Year', 'Region', 'Population per Region'])
    .size()
    .reset_index(name='Mass Shootings')
)
region_yearly['Rate per Million'] = (region_yearly['Mass Shootings'] / region_yearly['Population per Region']) * 1e6

# Filter data for 2014 and the selected year
def filter_data(selected_year):
    filtered_data = region_yearly[region_yearly['Incident Year'].isin([2014, selected_year])]
    return filtered_data

# Dropdown filter for selecting a year
year_dropdown = alt.binding_select(
    options=sorted(region_yearly['Incident Year'].unique())[1:],
    name='Comparison Year'
)
year_selection = alt.param(
    name="Select",
    value={'Incident Year': 2015},
    bind=year_dropdown
)


In [16]:
# Each row is one incident, so it contributes 1 / (region population) to the per-capita rate
shootings['Shootings per Capita'] = 1 / shootings['Population per Region']

# Scale each incident's contribution to a rate per 100,000 citizens
shootings['Shootings per 100k'] = shootings['Shootings per Capita'] * 100000

# Aggregate by Region and Year
region_year_data = shootings.groupby(['Region', 'Incident Year', 'Population per Region']).agg(
    shootings_per_100k=('Shootings per 100k', 'sum')  # Sum of shootings per 100k
).reset_index()


print(region_year_data.head())

first_year = region_year_data['Incident Year'].min()

valid_years = region_year_data[region_year_data['Incident Year'] > first_year]['Incident Year'].unique()

year_dropdown = alt.binding_select(
    options=sorted(valid_years),
    name="Select Year: "
)

year_select = alt.selection_point(fields=["Incident Year"], bind=year_dropdown, value=2015, name='year_select')

slope_chart = alt.Chart(region_year_data, width=300, height=400).transform_calculate(
    is_selected_or_first_year="datum['Incident Year'] == " + str(first_year) + " || datum['Incident Year'] == year_select['Incident Year'][0]"
).mark_line(point=True).encode(
    x=alt.X('Incident Year:N', title="Year"),
    y=alt.Y('shootings_per_100k:Q', title="Shootings per 100,000 Citizens"),
    color=alt.Color('Region:N', legend=alt.Legend(title="Region")),
    tooltip=[
        'Region:N',
        'Incident Year:O',
        alt.Tooltip('shootings_per_100k:Q', title="Shootings per 100k", format=".2f"),
        'Population per Region:Q'
    ]
).transform_filter(
    "toBoolean(datum.is_selected_or_first_year)"
).add_params(
    year_select
).properties(
    title="Change in Shootings per 100,000 Citizens Across Regions Compared to First Year"
)

slope_chart

    Region  Incident Year  Population per Region  shootings_per_100k
0  Midwest           2014               68600000            0.086006
1  Midwest           2015               68600000            0.109329
2  Midwest           2016               68600000            0.131195
3  Midwest           2017               68600000            0.126822
4  Midwest           2018               68600000            0.122449


### Slope Chart Design Decisions

We decided to create a **slope chart** because it is effective for displaying changes over time, particularly for comparing the relative changes between categories, which in this case are regions.
Slope charts are **intuitive in showing how values have increased or decreased**, making it easier for the user to identify trends between a baseline year (2014) and a user-selected year.
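The data behind one slope can be sketched as follows: keep only the baseline year and the selected year, then compare the two values per region. The Midwest figures are taken from the printed `region_year_data` output; the helper `slope_endpoints` is a hypothetical name introduced here for illustration.

```python
import pandas as pd

# Midwest rates per 100k, copied from the printed region_year_data preview.
region_year = pd.DataFrame({
    "Region": ["Midwest"] * 5,
    "Incident Year": [2014, 2015, 2016, 2017, 2018],
    "shootings_per_100k": [0.086006, 0.109329, 0.131195, 0.126822, 0.122449],
})

def slope_endpoints(df, selected_year, baseline_year=None):
    """Return baseline and selected-year rates per region, plus their difference."""
    if baseline_year is None:
        baseline_year = int(df["Incident Year"].min())
    # Keep only the two years that form the ends of the slope
    ends = df[df["Incident Year"].isin([baseline_year, selected_year])]
    # One row per region, one column per year
    wide = ends.pivot(index="Region", columns="Incident Year",
                      values="shootings_per_100k")
    wide["change"] = wide[selected_year] - wide[baseline_year]
    return wide

result = slope_endpoints(region_year, 2016)
print(result)
```

A positive `change` corresponds to an upward slope in the chart, a negative one to a downward slope.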

### Interaction and Functionality

For the interaction, we created a **dropdown menu** that allows users to select a year between 2015 and 2023 to compare against 2014.
The x-axis shows the two years being compared (the baseline and the selected year), while the y-axis represents the number of mass shootings per 100,000 citizens in each region. We used a distinct color for each region, with a legend, so users can tell the trends apart at a glance.

Additionally, tooltips show the exact values when hovering over the points, giving users detail that the lines alone do not convey.

We decided to plot shootings per 100,000 citizens to ensure comparability across regions with varying populations.

As a result, the chart is **simple but informative**, providing an easy-to-read view of complex time-series data.

Regarding synchronization with the other two questions, we initially considered unifying the dropdown menus across all charts to make everything interconnected. However, we realized that this might hinder the user's ability to compare different years at a glance. As a result, we decided to keep the dropdown menus separate for each chart, preserving the ability to quickly focus on specific year comparisons.