### Motivation
**What is your dataset?**

This project draws on several datasets in order to explore the phenomenon of UFO sightings through multiple lenses. The first and main dataset is the NUFORC UFO Sightings dataset, sourced from the National UFO Reporting Center and hosted on Kaggle. It comprises around 80,000 individual reports of unidentified flying object (UFO) sightings. These records span more than a century, offering a foundation for both quantitative and qualitative analysis of UFO sightings. The second dataset covers US military bases. It contains 776 entries and 10 attributes, describing properties such as the location of each base, its name, and the branch of service it is tied to. The third and last dataset contains historic populations of the different US states.

**Why did you choose this/these particular dataset(s)?**

The interest in this dataset arises from a foundation of scientific curiosity. UFO sightings have long been a hotbed of conspiracy theories, ranging from alleged extraterrestrial encounters to secret government projects. At the same time, they have played a prominent role in popular culture, featured in iconic films and television shows such as The X-Files, Close Encounters of the Third Kind, E.T., and The Thing, among many others. This makes the study of UFO reports not only intriguing but also culturally significant. We are particularly excited about the opportunity to explore these sightings through both quantitative and qualitative analysis. Regardless of whether one is a skeptic or a true believer, there is no denying the existence of UFOs in the strict sense of the term: unidentified flying objects observed by individuals looking up at the sky. This dataset offers a substantial sample size spanning multiple decades and includes several key attributes that provide strong insights into these encounters. These include the reported shape of the object, geographic coordinates, state annotations, the duration of each sighting, and the year of occurrence. The military base dataset is interesting because it is linked to government projects, such as experimental flights by the Air Force. The historic state population dataset is interesting because of the correlation between population and the number of UFO sightings.

**What was your goal for the end user's experience?**

Our goal is to provide an engaging and informative experience that encourages critical thinking about UFO sightings. By taking a skeptical perspective, we aim to help users explore alternative explanations for sightings, such as proximity to military bases or population density. Through tools like heat maps and time series visualizations, we want users to interact with the data and uncover patterns that might suggest grounded, non-extraterrestrial causes for reported sightings.

### Basic Stats
**Write about your choices in data cleaning and preprocessing**

For the UFO sightings dataset, our first decision was to filter out all entries outside the United States. This choice was based on two main reasons. First, the vast majority of sightings, approximately 74,000 out of 80,000, occurred within the U.S., making it the most relevant and data-rich region for analysis. Second, this allowed us to narrow the scope of the project to a more manageable and consistent geographic focus.
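
The country filter described above could look roughly like the following. This is a minimal sketch, not necessarily the exact filter used in the project: it assumes the filter is applied on the `country` column that appears in the dataset preview, with `"us"` as the code for the United States. The small `sightings` frame is a stand-in for the real table.

```python
import pandas as pd

# Tiny stand-in for the sightings table; column names and values follow the dataset preview.
sightings = pd.DataFrame(
    {
        "city": ["san marcos", "lackland afb", "chester (uk/england)", "edna"],
        "state": ["tx", "tx", None, "tx"],
        "country": ["us", None, "gb", "us"],
    }
)

# Assumed filter: keep rows whose country code is "us".
# Rows with a missing country code are dropped as well, even if the state is a US state.
us_sightings = sightings[sightings["country"] == "us"].reset_index(drop=True)

print(len(sightings), "->", len(us_sightings))
```

Note that a filter like this also discards US sightings whose country field is empty, such as the Lackland AFB row in the preview. Filtering on the `state` column instead would be an alternative design.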

In the military base dataset, we further refined the data to include only U.S. Air Force bases. While all branches of the military operate aircraft, the Air Force is uniquely dedicated to aeronautics. Additionally, it would be difficult to reliably distinguish between different types of Army bases (e.g., infantry vs. air units), making the Air Force the most suitable focus for this analysis. Finally, we removed all rows containing missing (NaN) values from each dataset.
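
As an illustration of these two steps, the sketch below filters a base table to one branch and then drops incomplete rows. The column names (`site_name`, `branch`, `latitude`, `longitude`), the value `"Air Force"`, and the rows themselves are hypothetical, since the schema of the military base dataset is not shown here.

```python
import pandas as pd

# Hypothetical miniature of the military base table; real column names and values may differ.
bases = pd.DataFrame(
    {
        "site_name": ["Base A", "Base B", "Base C", "Base D"],
        "branch": ["Air Force", "Army", "Air Force", "Navy"],
        "latitude": [29.38, 31.13, None, 36.95],
        "longitude": [-98.58, -97.78, -115.03, -76.31],
    }
)

# Keep only Air Force installations.
air_force = bases[bases["branch"] == "Air Force"]

# Drop every row that has a missing (NaN) value in any column.
air_force_clean = air_force.dropna().reset_index(drop=True)

print(air_force_clean)
```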

**Write a short section that discusses the dataset stats, containing key points/plots from your exploratory data analysis**




```python
import pandas as pd
import numpy as np

# Load the raw NUFORC sightings file, skipping rows that cannot be parsed
df = pd.read_csv("../data/complete.csv", on_bad_lines='skip')
df
```

| | datetime | city | state | country | shape | duration (seconds) | duration (hours/min) | comments | date posted | latitude | longitude |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10/10/1949 20:30 | san marcos | tx | us | cylinder | 2700 | 45 minutes | This event took place in early fall around 194... | 4/27/2004 | 29.8830556 | -97.941111 |
| 1 | 10/10/1949 21:00 | lackland afb | tx | | light | 7200 | 1-2 hrs | 1949 Lackland AFB&#44 TX. Lights racing acros... | 12/16/2005 | 29.38421 | -98.581082 |
| 2 | 10/10/1955 17:00 | chester (uk/england) | | gb | circle | 20 | 20 seconds | Green/Orange circular disc over Chester&#44 En... | 1/21/2008 | 53.2 | -2.916667 |
| 3 | 10/10/1956 21:00 | edna | tx | us | circle | 20 | 1/2 hour | My older brother and twin sister were leaving ... | 1/17/2004 | 28.9783333 | -96.645833 |
| 4 | 10/10/1960 20:00 | kaneohe | hi | us | light | 900 | 15 minutes | AS a Marine 1st Lt. flying an FJ4B fighter/att... | 1/22/2004 | 21.4180556 | -157.803611 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 88674 | 9/9/2013 22:00 | napa | ca | us | other | 1200.0 | hour | Napa UFO&#44 | 9/30/2013 | 38.297222 | -122.284444 |
| 88675 | 9/9/2013 22:20 | vienna | va | us | circle | 5.0 | 5 seconds | Saw a five gold lit cicular craft moving fastl... | 9/30/2013 | 38.901111 | -77.265556 |
| 88676 | 9/9/2013 23:00 | edmond | ok | us | cigar | 1020.0 | 17 minutes | 2 witnesses 2 miles apart&#44 Red &amp; White... | 9/30/2013 | 35.652778 | -97.477778 |
| 88677 | 9/9/2013 23:00 | starr | sc | us | diamond | 0.0 | 2 nights | On September ninth my wife and i noticed stran... | 9/30/2013 | 34.376944 | -82.695833 |
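
The preview shows `datetime` stored as text and `duration (seconds)` appearing both as `2700` and as `1200.0`, so both columns likely need type coercion before any statistics are computed. The following is a minimal sketch of one way to do that, assuming the `month/day/year hour:minute` format seen in the preview. The `preview` frame is a small stand-in for `df`, and the per-year count is only an example statistic.

```python
import pandas as pd

# Stand-in rows copied from the preview; durations are kept as strings to mimic mixed input.
preview = pd.DataFrame(
    {
        "datetime": ["10/10/1949 20:30", "10/10/1955 17:00", "9/9/2013 22:00", "9/9/2013 23:00"],
        "duration (seconds)": ["2700", "20", "1200.0", "0.0"],
    }
)

# Parse timestamps; values that do not match the assumed format become NaT instead of raising.
preview["datetime"] = pd.to_datetime(
    preview["datetime"], format="%m/%d/%Y %H:%M", errors="coerce"
)

# Coerce durations to numbers; entries that cannot be parsed become NaN.
preview["duration (seconds)"] = pd.to_numeric(preview["duration (seconds)"], errors="coerce")

# Example statistic: number of sightings per year.
per_year = preview["datetime"].dt.year.value_counts().sort_index()
print(per_year)
```

Using `errors="coerce"` keeps the load from failing on malformed entries, at the cost of silently turning them into missing values that a later `dropna` would remove.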


### Data Analysis
Describe your data analysis and explain what you've learned about the dataset.

If relevant, talk about your machine-learning.

### Genre. Which genre of data story did you use?
This project lends itself naturally to a magazine-style format. Instead of simply presenting a straightforward narrative, it showcases a collection of interconnected visualisations. It therefore also requires some explanatory text in between the visualisations to discuss their meaning and how they relate to each other.

**Visual Narrative**

For visual structuring, the project uses a consistent visual platform: all visualisations share the same color scheme.


Which tools did you use from each of the 3 categories of Narrative Structure (Figure 7 in Segel and Heer)? Why?

**Narrative structure**

This project employs linear ordering: the layout guides the reader from top to bottom. Each section pairs explanatory text with its corresponding plot, encouraging readers to read, reflect on the visualization, and then move naturally to the next insight in the sequence.

The project also uses hover highlighting to increase the information available in each plot. For example, in the correlation plots the reader can see which state, which year, and how many sightings each data point represents. Hover highlighting makes it possible to provide this additional information without adding noise to the visualisation. Another interactivity method used is filtering: in figure 5, the reader can filter by state in order to see the correlation for a single state.
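
The data behind such a plot can be sketched as one row per state and year, carrying the values that a hover tooltip would display. The example below is only an illustration under assumptions: the input frames, their column names (`state`, `year`, `population`), and all values are hypothetical and made up, and the actual plotting library is left out.

```python
import pandas as pd

# Hypothetical miniature inputs; the real tables and column names may differ.
sightings = pd.DataFrame(
    {
        "state": ["tx", "tx", "tx", "ca", "ca", "ca"],
        "year": [2012, 2012, 2013, 2012, 2013, 2013],
    }
)
population = pd.DataFrame(
    {
        "state": ["tx", "tx", "ca", "ca"],
        "year": [2012, 2013, 2012, 2013],
        "population": [1_000_000, 1_100_000, 2_000_000, 2_100_000],  # made-up values
    }
)

# One row per (state, year) with the sighting count and the matching population:
# these are the fields a hover tooltip would show for each data point.
points = (
    sightings.groupby(["state", "year"]).size().reset_index(name="sightings")
    .merge(population, on=["state", "year"], how="inner")
)

# Filtering to a single state, as a state selector in the plot would do.
tx_points = points[points["state"] == "tx"]
print(tx_points)
```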

For messaging, the project uses captions and annotations for every visualisation in order to make its content easier for the reader to grasp. We expect this to let the reader extract more information, more efficiently, from each visualisation. The project has an introduction that tells the reader what the project investigates and explains the motivation behind the investigation. Additionally, the visualizations are accompanied by a written article that offers further explanation, background, and interpretation to enhance the overall narrative.


### Visualizations
Explain the visualizations you've chosen.

Why are they right for the story you want to tell?

### Discussion. Think critically about your creation
What went well?

What is still missing? What could be improved? Why?

### Contributions. Who did what?
You should write (just briefly) which group member was the main responsible for which elements of the assignment. (I want you guys to understand every part of the assignment, but usually there is someone who took lead role on certain portions of the work. That's what you should explain).
It is not OK simply to write "All group members contributed equally".
Make sure that you use references when they're needed and follow academic standards.