In [6]:
import pandas as pd
import altair as alt

alt.data_transformers.disable_max_rows()

# Load the UFO sightings CSV, supplying column names explicitly (header=None).
ufo_url = 'https://github.com/UIUC-iSchool-DataViz/is445_data/raw/main/ufo-scrubbed-geocoded-time-standardized-00.csv'
column_names = ["datetime", "city", "state", "country", "shape", "duration_sec", "duration", "comments", "date_posted", "latitude", "longitude"]
df = pd.read_csv(ufo_url, header=None, names=column_names)
# Work with a reproducible random sample of 14,000 rows.
df = df.sample(n=14000, random_state=42)
# Parse dates (unparseable values become NaT), then drop rows missing coordinates or a date.
df['datetime'] = pd.to_datetime(df['datetime'], errors='coerce')
df = df.dropna(subset=['latitude', 'longitude', 'datetime'])

scatter = alt.Chart(df).mark_circle(size=60).encode(
    x=alt.X('longitude:Q', title='Longitude'),
    y=alt.Y('latitude:Q', title='Latitude'),
    color=alt.Color('datetime:T', scale=alt.Scale(scheme='inferno'), title='Date of Sighting'),
    tooltip=[alt.Tooltip('datetime:T', title='Date & Time'),
             alt.Tooltip('latitude:Q', title='Latitude'),
             alt.Tooltip('longitude:Q', title='Longitude')]
).properties(
    title='Geospatial Scatter Plot of UFO Sightings',
    width=600,
    height=400
)

shape_counts = df['shape'].value_counts().reset_index()
shape_counts.columns = ['shape', 'count']

# Hover selection keyed on 'shape'; empty=False means nothing is selected until a bar is hovered.
# selection_point and add_params replace the selection_single / add_selection calls deprecated in altair 5.0.0.
hover = alt.selection_point(on='mouseover', empty=False, fields=['shape'])

bar_chart = alt.Chart(shape_counts).mark_bar().encode(
    x=alt.X('shape:N', title='UFO Shape', sort='-y'),
    y=alt.Y('count:Q', title='Number of Sightings'),
    color=alt.condition(hover, alt.value('steelblue'), alt.value('lightgray')),
    tooltip=[alt.Tooltip('shape:N', title='Shape'), alt.Tooltip('count:Q', title='Count')]
).properties(
    title='Distribution of UFO Shapes',
    width=600,
    height=400
).add_params(
    hover
)

scatter.save('scatter_plot.json')
bar_chart.save('bar_chart.json')

scatter & bar_chart


# Viz1: Scatter Plot of UFO Sightings

Here, I present a geospatial scatter plot of reported UFO sightings, positioned by their latitude and longitude coordinates. The plot shows where sightings occur, which makes geographic clusters and trends easy to spot. For the design, I used quantitative encodings for both axes (longitude on the x-axis, latitude on the y-axis) to place each sighting. I encoded the datetime variable as a temporal field with a sequential color scheme (inferno), so color shows when each sighting occurred: with the scale's default direction, darker colors mark older sightings and lighter colors mark more recent ones. On the analysis side, I parsed the raw date strings into datetime objects and excluded any records missing coordinate or date information for the sake of data integrity. Interactivity comes from tooltips: hovering over a point shows its exact date and coordinates, giving viewers more context without taking up visual space.
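
The cleaning step described above can be sketched on a toy table. The rows below are invented for illustration and are not taken from the UFO dataset; only the `pd.to_datetime(..., errors='coerce')` and `dropna(subset=...)` calls mirror the notebook code.

```python
import pandas as pd

# Hypothetical rows, invented for illustration (not from the UFO data).
toy = pd.DataFrame({
    "datetime": ["10/10/1949 20:30", "not a date", "10/10/1956 21:00"],
    "latitude": [29.88, 38.90, None],
    "longitude": [-97.94, -77.04, -96.65],
})

# Unparseable strings become NaT instead of raising an error.
toy["datetime"] = pd.to_datetime(toy["datetime"], errors="coerce")

# Keep only rows that have both coordinates and a valid timestamp.
clean = toy.dropna(subset=["latitude", "longitude", "datetime"])

print(len(toy), len(clean))
```

Only the first row survives here: the second has an unparseable date and the third has no latitude.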

# Viz2: Bar Chart of UFO Shapes

The second plot turns to the categorical structure of the data: a bar chart showing how often each shape appears among the UFO sightings. It shifts the focus from spatial-temporal patterns to the variety of reported shapes. To build this chart, I counted the number of sightings for each shape, turning the raw records into a table of shapes and their frequencies. The categorical variable (UFO shape) is plotted on the x-axis and the number of sightings on the y-axis, with bars sorted from most to least frequent so that comparison across categories is simple. For readability and interaction, I added a hover effect: when you hover over a bar, it is highlighted in steelblue while the other bars remain light gray. This lets users isolate a single category instantly and compare its value against the other shapes. The data transformation step consisted of counting the values in the 'shape' column and resetting the index, so that the counts form a regular two-column table the chart can read directly.
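
The counting step can be sketched on a small example. The shape values below are invented for illustration; the `value_counts().reset_index()` pattern is the one used in the notebook code. Note that `value_counts` skips missing values by default, so sightings with no recorded shape are not counted.

```python
import pandas as pd

# Hypothetical shape values, invented for illustration (one is missing).
toy = pd.DataFrame(
    {"shape": ["light", "circle", "light", "triangle", "light", "circle", None]}
)

# Count sightings per shape; the result is ordered from most to least frequent.
shape_counts = toy["shape"].value_counts().reset_index()

# Name the columns explicitly so the result is the same across pandas versions.
shape_counts.columns = ["shape", "count"]

print(shape_counts)
```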