<a href="https://www.kaggle.com/code/mikedelong/understand-danger-with-scatter-plots?scriptVersionId=221007268" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

In [1]:
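import pandas as pd
# Parse both date columns as datetimes and drop the URL column, which we do not analyze.
df = pd.read_csv(filepath_or_buffer='/kaggle/input/nasa-neo-near-earth-object-dataset/neo_data.csv',
                 parse_dates=['Close Approach Date', 'Close Approach Date (Full)']).drop(columns=['NASA JPL URL'])
# Derive the calendar year of each close approach; later cells group and color by it.
df['year'] = df['Close Approach Date'].dt.year
df.head()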

Unnamed: 0,ID,Neo Reference ID,Name,Limited Name,Designation,Absolute Magnitude (H),Min Diameter (km),Max Diameter (km),Min Diameter (m),Max Diameter (m),...,Epoch Date Close Approach,Relative Velocity (km/s),Relative Velocity (km/h),Relative Velocity (miles/h),Miss Distance (astronomical),Miss Distance (lunar),Miss Distance (km),Miss Distance (miles),Orbiting Body,year
0,2000433,2000433,433 Eros (A898 PA),Eros,433,10.41,22.006703,49.208483,22006.702711,49208.483223,...,-2177879400000,5.578619,20083.029075,12478.81326,0.314929,122.507447,47112730.0,29274490.0,Earth,1900
1,2000433,2000433,433 Eros (A898 PA),Eros,433,10.41,22.006703,49.208483,22006.702711,49208.483223,...,-1961526540000,4.394491,15820.167199,9830.036668,0.471486,183.407876,70533230.0,43827320.0,Earth,1907
2,2000433,2000433,433 Eros (A898 PA),Eros,433,10.41,22.006703,49.208483,22006.702711,49208.483223,...,-1663036860000,4.816784,17340.422466,10774.664171,0.499257,194.211053,74687810.0,46408860.0,Earth,1917
3,2000433,2000433,433 Eros (A898 PA),Eros,433,10.41,22.006703,49.208483,22006.702711,49208.483223,...,-1446083220000,4.596055,16545.797588,10280.915173,0.359786,139.956944,53823290.0,33444240.0,Earth,1924
4,2000433,2000433,433 Eros (A898 PA),Eros,433,10.41,22.006703,49.208483,22006.702711,49208.483223,...,-1228247580000,5.920819,21314.946723,13244.278979,0.174073,67.714454,26040970.0,16181110.0,Earth,1931


In [2]:
from plotly import express
from plotly.offline import init_notebook_mode

init_notebook_mode(connected=True)

columns = ['Limited Name', 'Orbiting Body']
# Count rows per (object, orbiting body) pair; reset_index() names the count column 0.
counts = df[columns].groupby(by=columns).size().reset_index().sort_values(ascending=False, by=0)
express.bar(data_frame=counts, x='Limited Name', y=0, color='Orbiting Body').show(renderer='iframe_connected')

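For most of these near-Earth objects, most or all of the recorded close approaches are to Earth. In this dataset the 'Orbiting Body' column records the body each close approach is measured against, not a body the object orbits: near-Earth objects orbit the Sun.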
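We could back the chart up with a number per object. The sketch below is not part of the original notebook: `earth_share` is a hypothetical helper, and the toy frame (object names and the 'Mars' value included) is made up so the block runs on its own. Only the column names come from the dataset.

```python
import pandas as pd

# Toy stand-in for df; object names and values are made up for illustration.
toy = pd.DataFrame({
    'Limited Name': ['Obj A', 'Obj A', 'Obj A', 'Obj B', 'Obj B'],
    'Orbiting Body': ['Earth', 'Earth', 'Mars', 'Earth', 'Earth'],
})


def earth_share(frame: pd.DataFrame) -> pd.Series:
    """Return, per object, the fraction of rows whose 'Orbiting Body' is Earth."""
    is_earth = frame['Orbiting Body'].eq('Earth')
    # The mean of a boolean column is the fraction of True values.
    return is_earth.groupby(frame['Limited Name']).mean().sort_values(ascending=False)


shares = earth_share(toy)
print(shares)
```

Calling `earth_share(df)` on the real data should give one fraction per object, with 1.0 meaning every recorded row for that object lists Earth.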

In [3]:
express.bar(data_frame=df, x='Limited Name', color='year').show(renderer='iframe_connected',)

Here we learn that some of our data is historical and some of it is forecast.
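To see how the rows split between the two periods, we can label each one by year. This is a minimal sketch, not the notebook's own code: `label_period` is a hypothetical helper, the toy years are made up, and the cutoff of 2024 is an assumption borrowed from the filter `df['year'] < 2024` used in a later cell.

```python
import pandas as pd

# Toy stand-in for df['year']; the values are made up for illustration.
toy = pd.DataFrame({'year': [1900, 1931, 2023, 2024, 2150]})


def label_period(frame: pd.DataFrame, first_forecast_year: int = 2024) -> pd.Series:
    """Label each row 'historical' or 'forecast' based on its year."""
    is_historical = frame['year'].lt(first_forecast_year)
    return is_historical.map({True: 'historical', False: 'forecast'})


counts = label_period(toy).value_counts()
print(counts)
```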

In [4]:
# Keep only close approaches before 2024, i.e. the historical part of the data.
historical = df[df['year'] < 2024].sort_values(by='Limited Name')
express.histogram(data_frame=historical, x='year', color='Limited Name', nbins=124).show(renderer='iframe_connected')
express.histogram(data_frame=historical, x='year', color='Is Potentially Hazardous', nbins=124).show(renderer='iframe_connected')

If we look at just the historical data (close approaches before 2024), we see that we do not have an observation of every object in every year.
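The gaps can also be tabulated directly. The sketch below uses `pd.crosstab` to count rows per object and year; `approaches_per_year` is a hypothetical helper and the toy frame is made up, so treat it as an illustration rather than the notebook's method.

```python
import pandas as pd

# Toy stand-in for df; object names and years are made up for illustration.
toy = pd.DataFrame({
    'Limited Name': ['Obj A', 'Obj A', 'Obj B', 'Obj A'],
    'year': [1900, 1907, 1907, 1917],
})


def approaches_per_year(frame: pd.DataFrame, before: int = 2024) -> pd.DataFrame:
    """Count rows per (object, year), keeping only years earlier than `before`."""
    historical = frame[frame['year'] < before]
    return pd.crosstab(historical['Limited Name'], historical['year'])


table = approaches_per_year(toy)
# Number of years (among those present in the data) with no row for each object.
missing = (table == 0).sum(axis=1)
print(table)
print(missing)
```

Note that the table only has columns for years in which at least one object appears, so years with no rows at all are not counted as gaps.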

In [5]:
express.scatter(data_frame=df.sort_values(by='Limited Name'), x='Limited Name', y='year', color='Miss Distance (km)').show(renderer='iframe_connected',)

This chart breaks out our data by object, but because most of the miss distances are clustered, the color doesn't tell us much.
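One possible workaround, which the notebook does not try, is to color by the base-10 logarithm of the distance so that clustered values spread out across the color scale. The helper `add_log_distance` and the new column name are assumptions made for this sketch, and the toy distances are made up.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df; the distances are made up for illustration.
toy = pd.DataFrame({'Miss Distance (km)': [1.0e5, 1.0e6, 1.0e7]})


def add_log_distance(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of the frame with a log10 version of the miss distance."""
    out = frame.copy()
    out['log10 Miss Distance (km)'] = np.log10(out['Miss Distance (km)'])
    return out


logged = add_log_distance(toy)
print(logged)
```

Passing `color='log10 Miss Distance (km)'` to `express.scatter` on the real data would then color each point by order of magnitude rather than by raw distance.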

In [6]:
express.scatter(data_frame=df.sort_values(by='Limited Name'), x='Limited Name', y='year', color='Is Potentially Hazardous').show(renderer='iframe_connected',)

This is what we really want to know: according to the dataset, only three of these objects are potentially hazardous.
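Rather than reading the count off the chart, we could list the flagged objects. This sketch assumes 'Is Potentially Hazardous' is a boolean column, which the notebook does not show explicitly; `hazardous_objects` is a hypothetical helper and the toy frame is made up.

```python
import pandas as pd

# Toy stand-in for df; object names and flags are made up for illustration.
toy = pd.DataFrame({
    'Limited Name': ['Obj A', 'Obj A', 'Obj B', 'Obj C'],
    'Is Potentially Hazardous': [True, True, False, False],
})


def hazardous_objects(frame: pd.DataFrame) -> list:
    """Return the sorted, distinct names of objects flagged as hazardous."""
    # Assumes the flag column holds real booleans, not the strings 'True'/'False'.
    flagged = frame.loc[frame['Is Potentially Hazardous'], 'Limited Name']
    return sorted(flagged.unique())


names = hazardous_objects(toy)
print(names, len(names))
```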

In [7]:
express.histogram(data_frame=df, x='Miss Distance (km)', facet_col='Is Potentially Hazardous',).show(renderer='iframe_connected',)

Distance and danger are only somewhat correlated: all dangerous objects are close, but not all close objects are dangerous.
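The "all dangerous objects are close, but not all close objects are dangerous" pattern can be checked by comparing distance ranges per group. The sketch below is an illustration on made-up numbers, with a hypothetical helper `distance_by_hazard` and the assumption that the hazard flag is boolean.

```python
import pandas as pd

# Toy stand-in for df; the distances and flags are made up for illustration.
toy = pd.DataFrame({
    'Is Potentially Hazardous': [True, True, False, False, False],
    'Miss Distance (km)': [1.0e6, 2.0e6, 1.5e6, 5.0e7, 7.0e7],
})


def distance_by_hazard(frame: pd.DataFrame) -> pd.DataFrame:
    """Summarize miss distance (min, median, max) for each hazard group."""
    labels = frame['Is Potentially Hazardous'].map(
        {True: 'hazardous', False: 'not hazardous'})
    return frame.groupby(labels)['Miss Distance (km)'].agg(['min', 'median', 'max'])


summary = distance_by_hazard(toy)
print(summary)
```

In the toy data the hazardous group tops out at a small distance, while the other group spans both small and large ones, which is the shape the histograms suggest for the real data.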

In [8]:
express.histogram(data_frame=df, x='Relative Velocity (km/s)', facet_col='Is Potentially Hazardous',).show(renderer='iframe_connected',)

Similarly, relative velocity and danger are not highly correlated; we might expect the more hazardous objects to be moving quickly, but the non-hazardous cohort is actually moving faster on average.
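Both halves of that statement can be put into numbers: a mean velocity per group and a Pearson correlation between the flag (as 0/1) and the velocity. This is a minimal sketch on made-up values, assuming a boolean hazard column; `velocity_vs_hazard` is a hypothetical helper, not something from the notebook.

```python
import pandas as pd

# Toy stand-in for df; the velocities and flags are made up for illustration.
toy = pd.DataFrame({
    'Is Potentially Hazardous': [True, True, False, False],
    'Relative Velocity (km/s)': [5.0, 7.0, 10.0, 20.0],
})


def velocity_vs_hazard(frame: pd.DataFrame):
    """Return (mean velocity per hazard group, correlation of flag with velocity)."""
    labels = frame['Is Potentially Hazardous'].map(
        {True: 'hazardous', False: 'not hazardous'})
    means = frame.groupby(labels)['Relative Velocity (km/s)'].mean()
    # Pearson correlation with the flag encoded as 1.0 (hazardous) or 0.0.
    flag = frame['Is Potentially Hazardous'].astype(float)
    corr = flag.corr(frame['Relative Velocity (km/s)'])
    return means, corr


means, corr = velocity_vs_hazard(toy)
print(means)
print(corr)
```

A negative correlation here means the flagged rows are slower on average in the toy data; the real data may of course give a different value.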

In [9]:
express.scatter(data_frame=df, y='Relative Velocity (km/s)', x='Miss Distance (km)', hover_name='Limited Name', size='Min Diameter (km)', color='year', log_x=True).show(renderer='iframe_connected',)

If we plot distance, velocity, size, and time together, we can sort of see the tracks of specific objects across their repeated approaches in some cases.

In [10]:
express.scatter(data_frame=df, y='Relative Velocity (km/s)', x='Miss Distance (km)', hover_name='Limited Name', size='Min Diameter (km)', color='Is Potentially Hazardous', symbol='Orbiting Body', log_x=True).show(renderer='iframe_connected',)

We can color the same plot by danger rather than year; this looks kind of cool but I'm not sure it tells us anything.

In [11]:
express.scatter(data_frame=df, x='year', y='Miss Distance (km)', hover_name='Limited Name', size='Min Diameter (km)', color='Is Potentially Hazardous', log_y=True).show(renderer='iframe_connected',)

Similarly, we can plot miss distance against year, and the closest approaches stand out, sort of.