# Bigfoot Dataset

In [41]:
import pandas as pd
import altair as alt
import matplotlib.pyplot as plt
import seaborn as sns

In [42]:
# Load the BFRO report data directly from the course GitHub repository
url = "https://raw.githubusercontent.com/UIUC-iSchool-DataViz/is445_data/main/bfro_reports_fall2022.csv"
df = pd.read_csv(url)
df.head()

In [None]:
df.isna().sum()

observed                36
location_details       739
county                   0
state                    0
season                   0
title                  950
latitude               950
longitude              950
date                   950
number                   0
classification           0
geohash                950
temperature_high      1649
temperature_mid       1797
temperature_low       1794
dew_point             1614
humidity              1614
cloud_cover           1894
moon_phase            1591
precip_intensity      2254
precip_probability    2254
precip_type           3187
pressure              2336
summary               1591
uv_index              1591
visibility            1930
wind_bearing          1600
wind_speed            1598
location               950
dtype: int64
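
Raw counts are easier to judge against the 4,747 rows reported by `df.shape` when expressed as fractions: 950 missing coordinates is about 20% of the reports, while the 3,187 missing `precip_type` values are about 67%. The sketch below shows one way to get those fractions with `isna().mean()`. It runs on a small made-up DataFrame (`toy`) rather than the real BFRO data, so its numbers are illustrative only.

```python
import pandas as pd

# Toy stand-in for the BFRO table (values are made up, not from the dataset):
# a few rows with gaps in a coordinate column and a weather column.
toy = pd.DataFrame({
    "state": ["Ohio", "Texas", "Ohio", "Iowa"],
    "latitude": [40.1, None, 39.8, None],
    "precip_type": [None, None, "rain", None],
})

# isna().sum() gives raw counts; isna().mean() gives the fraction missing
# per column, which is easier to compare across columns.
missing_share = toy.isna().mean().sort_values(ascending=False)
print(missing_share)
```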

In [None]:
df.shape

(4747, 29)

In [None]:
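# Keep only reports with coordinates and a date; copy so later column assignments modify an independent frame
df = df.dropna(subset=['latitude', 'longitude', 'date']).copy()
# Parse dates; unparseable strings become NaT instead of raising
df['date'] = pd.to_datetime(df['date'], errors='coerce')
# Derive year and month-name columns used by the charts
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month_name()
df.head(5)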

| | observed | location_details | county | state | season | title | latitude | longitude | date | number | ... | precip_type | pressure | summary | uv_index | visibility | wind_bearing | wind_speed | location | year | month |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | I was on my way to Claremont from Lebanon on R... | Close to Claremont down 120 not far from Kings... | Sullivan County | New Hampshire | Summer | Report 55269: Dawn sighting at Stevens Brook o... | 43.41549 | -72.33093 | 2016-06-07 | 55269.0 | ... | rain | 998.87 | Mostly cloudy throughout the day. | 6.0 | 9.7 | 262.0 | 0.49 | POINT(-72.33093000000001 43.415490000000005) | 2016 | June |
| 3 | I was northeast of Macy Nebraska along the Mis... | Latitude & Longitude : 42.158230 -96.344197 | Thurston County | Nebraska | Spring | Report 59757: Possible daylight sighting of a ... | 42.15685 | -96.34203 | 2018-05-25 | 59757.0 | ... | | 1008.07 | Partly cloudy in the morning. | 10.0 | 8.25 | 193.0 | 3.33 | POINT(-96.34203000000001 42.15685) | 2018 | May |
| 4 | While this incident occurred a long time ago, ... | Ward County, Just outside of a the Minuteman T... | Ward County | North Dakota | Spring | Report 751: Hunter describes described being s... | 48.25422 | -101.3166 | 2000-04-21 | 751.0 | ... | rain | 1011.47 | Partly cloudy until evening. | 6.0 | 10.0 | 237.0 | 11.14 | POINT(-101.3166 48.254220000000004) | 2000 | April |
| 5 | In early spring 1988, some friends of mine and... | Yancey County, North Carolina, near the summit... | Yancey County | North Carolina | Spring | Report 3339: Deep impressions seen in the snow | 35.74875 | -82.26195 | 1988-03-15 | 3339.0 | ... | | 1014.47 | Partly cloudy until evening and breezy through... | 7.0 | 9.5 | 348.0 | 16.94 | POINT(-82.26195 35.74875) | 1988 | March |
| 6 | This happened summertime early 70's (I think 7... | To get there take Highway 78 south out of Absa... | Stillwater County | Montana | Summer | Report 47215: Female fly fisherman's lucid rec... | 45.31278 | -109.6449 | 1971-12-15 | 47215.0 | ... | | | | | | | | POINT(-109.6449 45.31278) | 1971 | December |
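
The integer years in the output above suggest every remaining date parsed cleanly. If a malformed date string did slip through, `errors='coerce'` would turn it into `NaT`, and `.dt.year` would then return floats (2016.0 rather than 2016) with `NaN` for the bad row. A minimal sketch on made-up rows, assuming pandas' nullable `Int64` dtype is an acceptable way to keep whole-number years next to missing ones:

```python
import pandas as pd

# Toy rows (made up): two valid dates and one malformed string.
toy = pd.DataFrame({"date": ["2016-06-07", "not a date", "1988-03-15"]})

# errors='coerce' turns unparseable strings into NaT instead of raising.
toy["date"] = pd.to_datetime(toy["date"], errors="coerce")

# With a NaT present, .dt.year comes back as float64 (2016.0, NaN, ...);
# the nullable Int64 dtype keeps whole-number years alongside <NA>.
toy["year"] = toy["date"].dt.year.astype("Int64")
toy["month"] = toy["date"].dt.month_name()
print(toy)
```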


In [None]:
df.columns

Index(['observed', 'location_details', 'county', 'state', 'season', 'title',
       'latitude', 'longitude', 'date', 'number', 'classification', 'geohash',
       'temperature_high', 'temperature_mid', 'temperature_low', 'dew_point',
       'humidity', 'cloud_cover', 'moon_phase', 'precip_intensity',
       'precip_probability', 'precip_type', 'pressure', 'summary', 'uv_index',
       'visibility', 'wind_bearing', 'wind_speed', 'location', 'year',
       'month'],
      dtype='object')

## Visualization 1

**Description:**

The first visualization is a bar chart of Bigfoot sightings by year. Each bar shows the total number of sightings reported in a given year, which makes year-to-year trends and peaks easy to spot.

**Design Choices:**

- The x-axis encodes year as an ordinal variable with an explicit chronological sort, so the bars appear in year order rather than being reordered by their number of sightings.

- The y-axis encodes the number of sightings as a quantitative variable.

- Color also encodes the number of sightings (quantitative), using the single-hue “oranges” scheme, so years with more sightings appear darker. This redundant encoding strengthens the contrast between high- and low-activity years and makes patterns easier to see. Because the scheme varies in lightness rather than hue, readers with color blindness or other visual impairments can still follow the pattern through contrast alone, which makes the chart more accessible.

This visualization is static, so it cannot be interacted with.

In [None]:
# Count reports per year
sightings_by_year = df.groupby('year').size().reset_index(name='count')

# Explicit chronological order for the ordinal x-axis
years_sorted = sorted(sightings_by_year['year'].dropna())

chart1 = alt.Chart(sightings_by_year).mark_bar().encode(
    x=alt.X('year:O', sort=years_sorted, title='Year'),
    y=alt.Y('count:Q', title='Number of Sightings'),
    # Redundant color encoding: darker orange means more sightings
    color=alt.Color('count:Q', scale=alt.Scale(scheme='oranges'))
).properties(
    title='Bigfoot Sightings by Year',
    width=1000,
    height=400
)

chart1
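
Two mechanisms keep the bars in year order. `groupby` sorts its keys by default, so `sightings_by_year` is already chronological, and Vega-Lite sorts a discrete axis in ascending order unless told otherwise; the explicit `sort=years_sorted` makes that intent visible in the code. The sketch below, on made-up years rather than the real reports, contrasts the chronological table with a count-sorted one, which is the ordering the design deliberately avoids.

```python
import pandas as pd

# Made-up report years, deliberately stored out of order.
toy = pd.DataFrame({"year": [2005, 1988, 2005, 2016, 1988, 2005]})

# groupby sorts its keys by default (sort=True), so the counts come back
# in chronological order regardless of the row order above.
by_year = toy.groupby("year").size().reset_index(name="count")

# Sorting by count instead would reorder the bars by frequency.
by_count = by_year.sort_values("count", ascending=False)
print(by_year)
print(by_count)
```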

## Visualization 2

**Description:**

The second visualization is a horizontal bar chart showing the total number of Bigfoot sightings in each U.S. state. Each bar represents one state, and its length encodes that state's total number of reports. The chart highlights geographic patterns by showing which states have the most sightings.

**Design Choices:**

- The x-axis is quantitative and encodes the number of sightings.

- The y-axis is nominal, since state names have no inherent numeric order. The bars are sorted in descending order so the states with the most sightings appear at the top of the chart.

- Color is also quantitative, using a sequential blue scale for the same reason as in the first chart: it reinforces the magnitude of each bar and makes states easier to compare. With this encoding, it is immediately clear that Washington has far more sightings than any other state, even without reading the exact counts.

- The data is aggregated by state, and the chart is static, with no interactivity. The horizontal orientation keeps the state names readable along the y-axis, which makes the chart easier to read than a vertical layout would.

In [None]:
# Count reports per state
sightings_by_state = df.groupby('state').size().reset_index(name='count')

chart2 = alt.Chart(sightings_by_state).mark_bar().encode(
    # sort='-x' orders states by count, largest at the top
    y=alt.Y('state:N', sort='-x', title='State'),
    x=alt.X('count:Q', title='Number of Reports'),
    color=alt.Color('count:Q', scale=alt.Scale(scheme='blues'))
).properties(
    title='Bigfoot Sightings by U.S. State',
    width=500,
    height=600
)
chart2
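
`sort='-x'` orders the nominal state categories by their x value (the count), largest first. A rough pandas equivalent, on made-up state labels rather than the real reports, is a descending `sort_values`, which can be handy for checking the top of the chart against the underlying numbers:

```python
import pandas as pd

# Made-up state labels for a handful of reports.
toy = pd.DataFrame({"state": ["Washington", "Ohio", "Washington",
                              "Texas", "Washington", "Ohio"]})

by_state = toy.groupby("state").size().reset_index(name="count")

# Mirrors sort='-x' on a horizontal bar chart: order the states by their
# count, largest first, so the first row corresponds to the top bar.
ranked = by_state.sort_values("count", ascending=False).reset_index(drop=True)
print(ranked)
```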

## Visualization 3

**Description:**

The third visualization is an interactive line chart showing how Bigfoot sightings have changed over time in a single state. Users select a state from the dropdown menu below the chart, and the chart updates to show that state's sightings by year. Each point represents the number of sightings in a given year, making trends and changes easy to follow.

**Design Choices:**

- The x-axis is ordinal and shows, in chronological order, the years in which sightings occurred.

- The y-axis is quantitative and shows the number of sightings, drawn as points connected by the line.

- Color is fixed to midnight blue, which contrasts strongly with the white background and keeps the line easy to see, including for readers with visual impairments.

- The dataset is aggregated by both year and state so the chart can compute yearly sightings for each state.

**Interactivity:**

The chart offers three kinds of interactivity: the dropdown menu, zooming and panning, and tooltips. The dropdown menu lets users choose which state to examine in detail. The .interactive() call binds the chart's scales to the mouse, so users can zoom (scroll) and pan (drag) while exploring; Vega-Lite supports this only for continuous scales, so it applies to the quantitative y-axis and not to the ordinal year axis. Hovering over a point shows a tooltip with the state, year, and count, so exact values can be read without estimating them from the axes.

In [None]:
# Count reports per (year, state) pair
yearly = df.groupby(['year', 'state']).size().reset_index(name='count')

# Dropdown listing every state, bound to a point selection on the 'state' field
state_dropdown = alt.binding_select(options=sorted(yearly['state'].unique()), name='Select state: ')
state_select = alt.selection_point(fields=['state'], bind=state_dropdown, value='California')

chart3 = (
    alt.Chart(yearly)
    .mark_line(point=True, color='midnightblue')
    .encode(
        x=alt.X('year:O', title='Year'),
        y=alt.Y('count:Q', title='Sightings'),
        tooltip=['state', 'year', 'count']
    )
    .add_params(state_select)
    # Keep only the rows matching the state chosen in the dropdown
    .transform_filter(state_select)
    .properties(
        title='Yearly Bigfoot Sightings by State (Interactive)',
        width=700
    )
).interactive()  # zoom and pan on continuous scales
chart3
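
`transform_filter(state_select)` keeps only the rows whose `state` matches the dropdown choice. The sketch below mimics that filter in pandas with a hypothetical helper, `rows_for_state`, on made-up counts. It also illustrates a side effect of the aggregation: years with no reports for a state never appear in `yearly`, so the line connects the surrounding years directly. Reindexing over the full year range is one way to show those gaps as zeros; whether the chart should do so is a design choice the original does not make.

```python
import pandas as pd

# Made-up yearly counts for two states, in the same long format as `yearly`.
yearly_toy = pd.DataFrame({
    "year":  [2001, 2003, 2001, 2002, 2003],
    "state": ["Ohio", "Ohio", "Texas", "Texas", "Texas"],
    "count": [4, 2, 1, 5, 3],
})

def rows_for_state(frame, state):
    """Hypothetical helper: keep one state's rows, similar to what
    transform_filter(state_select) does after a dropdown choice."""
    return frame[frame["state"] == state].sort_values("year")

ohio = rows_for_state(yearly_toy, "Ohio")

# Ohio has no 2002 row, so a line would jump from 2001 straight to 2003.
# Reindexing over the full year range turns that gap into an explicit zero.
all_years = range(yearly_toy["year"].min(), yearly_toy["year"].max() + 1)
ohio_filled = (
    ohio.set_index("year")["count"]
    .reindex(all_years, fill_value=0)
    .rename_axis("year")
    .reset_index()
)
print(ohio_filled)
```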

In [None]:
chart1.save('chart1.html')
chart2.save('chart2.html')
chart3.save('chart3.html')