## **Visualization 1: Nationwide UFO Reports Map**

### **Description**
This visualization maps UFO sightings across the contiguous United States. Each sighting is drawn as a point at its reported latitude and longitude, and users can filter the points by state and year. The left map shows sightings for the whole country, while the right map zooms in on the selected state. Color distinguishes the two views: points are red on the nationwide map and blue on the state map.

### **Design Choices**
- **Encoding:** State outlines are drawn as geographic shapes. Each sighting is plotted as a point at its latitude and longitude, in red on the nationwide map and blue on the state map.
- **Interactivity:** 
  - A dropdown menu allows users to select specific states to focus on. 
  - A slider is provided to filter the data by year, enabling users to explore trends over time.
  - The nationwide map (left) shows reports for the whole country in the selected year, while the state map (right) zooms in on the selected state and shows only that state's reports.

### **Data Transformation**
- State abbreviations were transformed into full state names for better clarity.
- The dataset is filtered to the contiguous United States, excluding Alaska and Hawaii.
- A `Year` column is derived from each sighting's timestamp so that the maps can be filtered to a single year.
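
The transformations listed above can be sketched on a few made-up rows. This is a minimal illustration rather than the notebook's full pipeline: the sample rows, the shortened `abbr_to_name` mapping, and the `prepare` helper are hypothetical, while the column names follow the dataset.

```python
import pandas as pd

# Hypothetical sample rows; the real data comes from the UFO CSV.
sample = pd.DataFrame({
    "Date_Time": ["2010-07-04 21:00", "2012-01-15 03:30", "2013-10-31 22:15", "not a date"],
    "State": ["il", "ak", "tx", "ca"],
    "Country": ["us", "us", "us", "us"],
})

# Shortened mapping, for illustration only.
abbr_to_name = {"IL": "Illinois", "TX": "Texas", "CA": "California"}

def prepare(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Normalize state codes, then drop non-US rows plus Alaska and Hawaii.
    out["State"] = out["State"].str.upper()
    out = out[(out["Country"] == "us") & ~out["State"].isin(["AK", "HI"])].copy()
    # Map abbreviations to full names.
    out["StateName"] = out["State"].map(abbr_to_name)
    # Unparseable timestamps become NaT, so their Year is missing.
    out["Date_Time"] = pd.to_datetime(out["Date_Time"], errors="coerce")
    out["Year"] = out["Date_Time"].dt.year
    return out

prepared = prepare(sample)
print(prepared[["State", "StateName", "Year"]])
```

The Alaska row is removed by the filter, and the row with the unparseable timestamp is kept but has a missing `Year`, so it never matches a slider value.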

### **Interactivity Explanation**
Users can focus on a specific state by selecting it from the dropdown menu, and can move the year slider to see how sightings change over time. The year slider filters both maps. The state dropdown affects only the state map, which zooms in on the selected state and shows that state's sightings; it stays empty while "All States" is selected. The nationwide map always shows every state.
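
The filtering rules can be restated as plain Python predicates. This is only a sketch of the logic in the charts' filter expressions: `in_nation_view` and `in_state_view` are hypothetical helper names, the rows are made up, and the rendered charts evaluate the equivalent Vega expressions in the browser.

```python
ALL_STATES = "All States"

def in_nation_view(row: dict, year: int) -> bool:
    """Nationwide map: keep every sighting from the selected year."""
    return row["Year"] == year

def in_state_view(row: dict, state: str, year: int) -> bool:
    """State map: keep sightings from the selected state and year.

    Nothing is shown while the dropdown is set to "All States".
    """
    return state != ALL_STATES and row["StateName"] == state and row["Year"] == year

# Made-up sightings for illustration.
sightings = [
    {"StateName": "Illinois", "Year": 2010},
    {"StateName": "Texas", "Year": 2010},
    {"StateName": "Illinois", "Year": 2012},
]

nation = [s for s in sightings if in_nation_view(s, 2010)]
state = [s for s in sightings if in_state_view(s, "Illinois", 2010)]
print(len(nation), len(state))
```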

In [None]:
import altair as alt
import pandas as pd
from vega_datasets import data

# The plots are too large to submit with their outputs rendered, so the outputs are not shown.
# The code runs as-is; you are welcome to execute the cell to test it.

alt.data_transformers.disable_max_rows()

url = "https://github.com/UIUC-iSchool-DataViz/is445_data/raw/main/ufo-scrubbed-geocoded-time-standardized-00.csv"
df = pd.read_csv(url, header=None)

df.columns = [
    'Date_Time', 'City', 'State', 'Country', 'Shape', 'Duration', 
    'Duration_Reported', 'Comments', 'Date_Posted', 'Latitude', 'Longitude'
]

df['Date_Time'] = pd.to_datetime(df['Date_Time'], errors='coerce')
df['Year'] = df['Date_Time'].dt.year

# Keep US sightings outside Alaska and Hawaii, then upper-case the state codes.
df_us = (df[(df['Country'] == 'us') &
            (~df['State'].str.lower().isin(['ak', 'hi']))]
         .assign(State=lambda d: d['State'].str.upper()))

state_abbr_to_full = {
    "AL": "Alabama", "AZ": "Arizona", "AR": "Arkansas", "CA": "California",
    "CO": "Colorado", "CT": "Connecticut", "DE": "Delaware", "FL": "Florida", "GA": "Georgia",
    "ID": "Idaho", "IL": "Illinois", "IN": "Indiana", "IA": "Iowa",
    "KS": "Kansas", "KY": "Kentucky", "LA": "Louisiana", "ME": "Maine", "MD": "Maryland",
    "MA": "Massachusetts", "MI": "Michigan", "MN": "Minnesota", "MS": "Mississippi",
    "MO": "Missouri", "MT": "Montana", "NE": "Nebraska", "NV": "Nevada", "NH": "New Hampshire",
    "NJ": "New Jersey", "NM": "New Mexico", "NY": "New York", "NC": "North Carolina",
    "ND": "North Dakota", "OH": "Ohio", "OK": "Oklahoma", "OR": "Oregon", "PA": "Pennsylvania",
    "RI": "Rhode Island", "SC": "South Carolina", "SD": "South Dakota", "TN": "Tennessee",
    "TX": "Texas", "UT": "Utah", "VT": "Vermont", "VA": "Virginia", "WA": "Washington",
    "WV": "West Virginia", "WI": "Wisconsin", "WY": "Wyoming"
}
df_us["StateName"] = df_us["State"].map(state_abbr_to_full)


states_list = ['All States'] + sorted(df_us['StateName'].dropna().unique().tolist())
state_selector = alt.param(name='StateSelector', bind=alt.binding_select(options=states_list, name="Select State:"), value='All States')

us_map = alt.topo_feature(data.us_10m.url, 'states')

background = alt.Chart(us_map).mark_geoshape(
    fill='lightgray',
    stroke='white'
).transform_filter(
    'datum.id != 2 && datum.id != 15'
).properties(
    width=500,
    height=500
)

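# Year is stored as a float because of missing timestamps, so cast the bounds to int.
year_min = int(df_us['Year'].min())
year_max = int(df_us['Year'].max())
slider = alt.binding_range(min=year_min, max=year_max, step=1, name='Year:')
selection = alt.param(name='YearSelector', bind=slider, value=year_max)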

points_nation = alt.Chart(df_us).mark_circle(size=30, color='red').encode(
    longitude='Longitude:Q',
    latitude='Latitude:Q',
    tooltip=[
        alt.Tooltip('City', title='City'),
        alt.Tooltip('State', title='State'),
        alt.Tooltip('Date_Time', title='Date/Time'),
        alt.Tooltip('Shape', title='Shape')
    ]
).add_params(
    selection
).transform_filter(
    'datum.Year == YearSelector'
)

main_map = background + points_nation

state_map = alt.Chart(us_map).mark_geoshape(
    fill='lightgray',
    stroke='black'
).transform_filter(
    'datum.properties.name == StateSelector && StateSelector != "All States"'
).add_params(
    state_selector
).properties(
    width=300,
    height=400
)

points_state = alt.Chart(df_us).mark_circle(size=50, color='blue').encode(
    longitude='Longitude:Q',
    latitude='Latitude:Q',
    tooltip=[
        alt.Tooltip('City', title='City'),
        alt.Tooltip('Date_Time', title='Date/Time'),
        alt.Tooltip('Shape', title='Shape')
    ]
).transform_filter(
    'datum.StateName == StateSelector && StateSelector != "All States" && datum.Year == YearSelector'
)

zoomed_map = state_map + points_state

final_map = alt.hconcat(
    main_map,
    zoomed_map
).resolve_scale(
    color='independent'
)

final_map

## **Visualization 2: Time Series Line Plot**

### **Description**

This plot shows the trend of UFO reports over time, grouped into two-year intervals. Sightings are counted per state within each interval, and each state is drawn as its own line. Users can filter by state with a dropdown and use a slider to set the latest two-year group shown.

### **Design Choices**

- **Encoding:** The x-axis represents the years, grouped into two-year intervals. The y-axis shows the number of UFO reports per group. Each state is represented by a different color.
- **Interactivity:**
  - A dropdown menu allows users to select a specific state to focus on.
  - A slider sets the latest year group shown, so users can watch the trend build up over time.
  - Hovering over the chart shows a tooltip with the number of reports at the nearest year group.

### **Data Transformation**

- The "Year" data is grouped into two-year intervals to facilitate clearer trend analysis.
- State abbreviations are mapped to full state names for better readability.
- The dataset is cleaned to exclude any rows with missing data for `Date_Time` or `StateName`.
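
As a worked example of the two-year grouping: floor-dividing a year by 2 and multiplying by 2 maps each year to the even year that starts its interval, so 2012 and 2013 both fall into the 2012 group. The sketch below uses made-up rows; only the `Year`, `Year_Group`, and `StateName` column names come from the notebook.

```python
import pandas as pd

# Made-up rows for illustration.
rows = pd.DataFrame({
    "Year": [2011, 2012, 2013, 2013, 2014],
    "StateName": ["Ohio", "Ohio", "Ohio", "Utah", "Utah"],
})

# Floor each year to the even year that starts its two-year interval.
rows["Year_Group"] = (rows["Year"] // 2) * 2

# Count reports per (interval, state) pair, one row per line-chart point.
counts = rows.groupby(["Year_Group", "StateName"]).size().reset_index(name="Count")
print(counts)
```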

### **Interactivity Explanation**

- Users can select a state from the dropdown menu to filter the data.
- The year group slider hides every interval after the selected one, which lets users focus on earlier periods.
- Hovering over a data point displays the exact number of reports at that point.

In [None]:
import altair as alt
import pandas as pd

alt.data_transformers.disable_max_rows()

url = "https://github.com/UIUC-iSchool-DataViz/is445_data/raw/main/ufo-scrubbed-geocoded-time-standardized-00.csv"
df = pd.read_csv(url, header=None)

df.columns = [
    'Date_Time', 'City', 'State', 'Country', 'Shape', 'Duration', 
    'Duration_Reported', 'Comments', 'Date_Posted', 'Latitude', 'Longitude'
]
df['Date_Time'] = pd.to_datetime(df['Date_Time'], errors='coerce')
df['Year'] = df['Date_Time'].dt.year
# Floor each year to the even year that starts its two-year group.
df['Year_Group'] = (df['Year'] // 2) * 2
df['State'] = df['State'].str.upper()

state_abbr_to_full = {
    "AL": "Alabama", "AK": "Alaska", "AZ": "Arizona", "AR": "Arkansas", "CA": "California",
    "CO": "Colorado", "CT": "Connecticut", "DE": "Delaware", "FL": "Florida", "GA": "Georgia",
    "HI": "Hawaii", "ID": "Idaho", "IL": "Illinois", "IN": "Indiana", "IA": "Iowa",
    "KS": "Kansas", "KY": "Kentucky", "LA": "Louisiana", "ME": "Maine", "MD": "Maryland",
    "MA": "Massachusetts", "MI": "Michigan", "MN": "Minnesota", "MS": "Mississippi",
    "MO": "Missouri", "MT": "Montana", "NE": "Nebraska", "NV": "Nevada", "NH": "New Hampshire",
    "NJ": "New Jersey", "NM": "New Mexico", "NY": "New York", "NC": "North Carolina",
    "ND": "North Dakota", "OH": "Ohio", "OK": "Oklahoma", "OR": "Oregon", "PA": "Pennsylvania",
    "RI": "Rhode Island", "SC": "South Carolina", "SD": "South Dakota", "TN": "Tennessee",
    "TX": "Texas", "UT": "Utah", "VT": "Vermont", "VA": "Virginia", "WA": "Washington",
    "WV": "West Virginia", "WI": "Wisconsin", "WY": "Wyoming"
}
df['StateName'] = df['State'].map(state_abbr_to_full)

cleaned_df = df.dropna(subset=['Date_Time', 'StateName'])

states_list = ['All States'] + sorted(cleaned_df['StateName'].dropna().unique().tolist())

state_selector = alt.param(name='StateSelector', bind=alt.binding_select(options=states_list, name="Select State:"), value='All States')

year_groups = sorted(cleaned_df['Year_Group'].dropna().unique().tolist())

slider = alt.binding_range(min=min(year_groups), max=max(year_groups), step=2, name='Year Group:')
time_selector = alt.param(name='YearGroupSelector', bind=slider, value=max(year_groups))

annual_counts = cleaned_df.groupby(['Year_Group', 'StateName']).size().reset_index(name='Count')

# Highlight the point nearest to the cursor; nothing is highlighted when the selection is empty.
nearest = alt.selection_point(
    on='mouseover',
    nearest=True,
    fields=['Year_Group'],
    empty=False
)

line_chart = alt.Chart(annual_counts).mark_line().encode(
    x=alt.X('Year_Group:O', title='Year (Grouped by 2 Years)'),
    y=alt.Y('Count:Q', title='Number of Reports'),
    color=alt.Color('StateName:N', title='State'),
    tooltip=[alt.Tooltip('Count:Q', title='Number of Reports')]
).transform_filter(
    '(StateSelector == "All States" || datum.StateName == StateSelector) && datum.Year_Group <= YearGroupSelector'
).add_params(
    state_selector,
    time_selector
).properties(
    width=800,
    height=400,
    title="UFO Sightings by State (Grouped by 2 Years)"
)

points = line_chart.mark_circle(size=80, color='red').encode(
    opacity=alt.condition(nearest, alt.value(1), alt.value(0))
).add_params(nearest)

final_chart = line_chart + points
final_chart

## **Visualization 3: UFO Report Counts by Shape**

### **Description**

This plot shows the number of UFO sightings reported for each shape. Counts are computed per two-year group, and a slider sets the latest group included. Each bar represents one UFO shape, and its height is the total number of sightings of that shape across all year groups up to the slider value.

### **Design Choices**

- **Encoding:**
  - The x-axis represents the different UFO shapes.
  - The y-axis represents the number of reports for each shape.
  - The color encoding differentiates the shapes with distinct colors.
- **Interactivity:**
  - A slider sets the latest year group included, so users can see how the counts accumulate over time.
  - Hovering over a bar shows the shape, year group, and number of sightings for the segment under the cursor.

### **Data Transformation**

- The "Year" data is grouped into two-year intervals to facilitate clearer trend analysis.
- Only rows with a non-missing shape and state are included in the final dataset.
- The dataset is grouped by `Shape` and `Year_Group` to calculate the count of reports for each combination.
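
The grouping step, together with the slider's `Year_Group <= selected` filter, can be sketched as follows. The rows and the `selected_group` value are made up for illustration. Summing the remaining rows per shape gives the total that a bar's height corresponds to, assuming the chart stacks the per-group rows for each shape.

```python
import pandas as pd

# Made-up reports for illustration.
reports = pd.DataFrame({
    "Shape": ["light", "light", "disk", "light", "disk"],
    "Year_Group": [2010, 2010, 2010, 2012, 2014],
})

# One row per (shape, year group) pair with its report count.
shape_counts = reports.groupby(["Shape", "Year_Group"]).size().reset_index(name="Count")

selected_group = 2012  # hypothetical slider position

# Keep every group up to and including the selected one, then total per shape.
visible = shape_counts[shape_counts["Year_Group"] <= selected_group]
totals = visible.groupby("Shape")["Count"].sum()
print(totals)
```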

### **Interactivity Explanation**

- The slider filters the data by year group: every group up to and including the selected one is shown.
- Hovering over a bar shows the shape, the year group, and the exact count of sightings for that combination.

In [None]:
import altair as alt
import pandas as pd

alt.data_transformers.disable_max_rows()

url = "https://github.com/UIUC-iSchool-DataViz/is445_data/raw/main/ufo-scrubbed-geocoded-time-standardized-00.csv"
df = pd.read_csv(url, header=None)

df.columns = [
    'Date_Time', 'City', 'State', 'Country', 'Shape', 'Duration', 
    'Duration_Reported', 'Comments', 'Date_Posted', 'Latitude', 'Longitude'
]
df['Date_Time'] = pd.to_datetime(df['Date_Time'], errors='coerce')
df['Year'] = df['Date_Time'].dt.year
df['Year_Group'] = (df['Year'] // 2) * 2  

cleaned_df = df.dropna(subset=['Shape', 'State'])

shapes_list = sorted(cleaned_df['Shape'].dropna().unique().tolist())

year_groups = sorted(cleaned_df['Year_Group'].dropna().unique().tolist())

slider = alt.binding_range(min=min(year_groups), max=max(year_groups), step=2, name='Year Group:')
time_selector = alt.param(name='YearGroupSelector', bind=slider, value=max(year_groups))

shape_counts = cleaned_df.groupby(['Shape', 'Year_Group']).size().reset_index(name='Count')

bar_chart = alt.Chart(shape_counts).mark_bar().encode(
    x=alt.X('Shape:N', title='Shape', sort='-y'),
    y=alt.Y('Count:Q', title='Number of Reports'),
    color=alt.Color('Shape:N', title='UFO Shape', legend=alt.Legend(labelFontSize=14)),
    tooltip=['Shape', 'Year_Group', 'Count']
).transform_filter(
    'datum.Year_Group <= YearGroupSelector'
).add_params(
    time_selector
).properties(
    width=800,
    height=400,
    title="UFO Report Counts by Shape"
)

bar_chart