# Meteorite landings on Earth

### Maaike de Jong

In this notebook I analyse and visualize the NASA Meteorite Landings dataset, which can be found here. 
A short background and description of the dataset can be found in the readme file. 

The Meteoritical Society collects data on meteorites that have fallen to Earth from outer space. This dataset includes the location, mass, composition, and fall year for over 45,000 meteorites that have struck our planet.

Notes on missing or incorrect data points:
* a few entries here contain date information that was incorrectly parsed into the NASA database. As a spot check: any date before 860 CE or after 2016 is incorrect; these should actually be BCE years. There may be other errors and we are looking for a way to identify them.

* a few entries have latitude and longitude of 0N/0E (off the western coast of Africa, where it would be quite difficult to recover meteorites). Many of these were actually discovered in Antarctica, but exact coordinates were not given. 0N/0E locations should probably be treated as NA.
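
The two notes above can be turned into explicit checks. The following is a minimal sketch, assuming the column names `year`, `reclat` and `reclong` that are used later in this notebook. The helper `flag_suspect_rows` and the flag column names are hypothetical and only serve as illustration; the example rows are made up.

```python
import numpy as np
import pandas as pd


def flag_suspect_rows(df, min_year=860, max_year=2016):
    """Return a copy of df with boolean flags for the two known data problems.

    Assumes columns 'year', 'reclat' and 'reclong'; the flag column names
    are illustrative and not part of the dataset.
    """
    out = df.copy()
    # years outside [min_year, max_year] were parsed incorrectly
    # (missing years are not flagged here)
    out['bad_year'] = out['year'].notna() & ~out['year'].between(min_year, max_year)
    # 0N/0E is a placeholder for unknown coordinates
    out['null_island'] = (out['reclat'] == 0) & (out['reclong'] == 0)
    return out


# small made-up example, not real meteorite records
example = pd.DataFrame({
    'year': [1950.0, 601.0, 2101.0, np.nan],
    'reclat': [52.1, 0.0, -71.5, 0.0],
    'reclong': [5.3, 0.0, 35.7, 10.0],
})
flagged = flag_suspect_rows(example)
print(flagged[['bad_year', 'null_island']])
```

Flagging instead of deleting keeps the original rows available, so the decision to drop them or set them to NA can be made later.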

In [None]:
# import packages
import numpy as np
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
# import dataset
df = pd.read_csv('../data/meteorite-landings.csv')
df.head()

In [None]:
# inspect data types
#df.dtypes
df.info()
# transform year into date time?

In [None]:
# Check missing values
percent_missing = df.isnull().sum() * 100 / len(df)
missing_value_df = pd.DataFrame({'column_name': df.columns,
                                 'percent_missing': percent_missing})
missing_value_df

In [None]:
# convert year from float to integer
# missing years are filled with 0, so they are removed by the year filter below
df['year'] = df['year'].fillna(0).astype(int)
df.info() 

In [None]:
# inspect data with describe
df.describe()

In [None]:
# clean data
# filter out data with year < 860 or year > 2016
df2 = df.loc[(df['year'] >= 860) & (df['year'] <= 2016)]

In [None]:
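# clean data
# filter out data where both coordinates are 0 (the 0N/0E placeholder)
# note: (reclat != 0) & (reclong != 0) would also drop valid points that lie
# exactly on the equator or on the prime meridian
df3 = df2.loc[~((df2['reclat'] == 0) & (df2['reclong'] == 0))]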

In [None]:
df3.info()

In [None]:
# inspecting properties and distribution of variables

# types of recclass
meteor_types = pd.DataFrame(df3['recclass'].value_counts())
meteor_types.info()

# There are 441 distinct meteorite types in the data. If I want to plot different types I'll have to make a selection
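
One possible way to make such a selection is to keep the most frequent classes and lump the rest into a single category. This is only a sketch: the helper `lump_rare_classes`, the label `'Other'` and the example labels are assumptions for illustration, not part of the dataset.

```python
import pandas as pd


def lump_rare_classes(classes, top_n=10, other_label='Other'):
    """Keep the top_n most frequent values of a Series and relabel the rest.

    Missing values are also relabelled as other_label.
    """
    top = classes.value_counts().nlargest(top_n).index
    return classes.where(classes.isin(top), other_label)


# made-up class labels for illustration
example = pd.Series(['L6', 'H5', 'L6', 'H5', 'L6', 'Iron', 'CM2'])
lumped = lump_rare_classes(example, top_n=2)
print(lumped.value_counts())
```

Applied to `df3['recclass']`, the result could be stored in a new column and used as the colour variable in later plots.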

In [None]:
# inspecting properties and distribution of variables

# distribution of mass

#df3.plot.hist(y = 'mass')
sns.boxplot(x = 'mass', data = df3)

# the distribution is difficult to visualize because a few very heavy outliers compress the rest of the plot
# if I want to do anything with mass I have to group the values or transform the data (e.g. to a log scale)
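
A log transform is one common way to convert strongly skewed data like this. The sketch below assumes a numeric mass column; the helper name `log10_mass` is hypothetical and the example values are made up.

```python
import numpy as np
import pandas as pd


def log10_mass(mass):
    """Return log10 of a mass Series; zero, negative or missing masses become NaN."""
    mass = pd.to_numeric(mass, errors='coerce')
    # log10 is undefined for values <= 0, so mask those before transforming
    return np.log10(mass.where(mass > 0))


# made-up masses for illustration
example = pd.Series([1.0, 10.0, 1000.0, 0.0, np.nan])
logged = log10_mass(example)
print(logged)
```

The transformed values could then be passed to the box plot in place of the raw `mass` column.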

In [None]:
sns.violinplot(x = 'mass', data = df3)

In [None]:
# inspecting properties and distribution of variables

# column 'fall'

df3['fall'].value_counts()

# There are many more 'found' meteorites than 'fell' meteorites

In [None]:
# inspecting properties and distribution of variables

# column 'year'

sns.boxplot(x = 'year', data = df3)

In [None]:
sns.violinplot(x = 'year', data = df3)

# the vast majority of observations are between 1900 and 2016, which is to be expected

In [None]:
# Next steps:
# 1. plot the points on a world map
# 2. plot the points on a world map with different colours for different types (make a selection of types)
# 3. Visualise the number of found meteorites per country. Need to find country for each coordinate. Figure out how to do this. 
# 4. Create maps for several different countries/ hotspots, e.g. : Netherlands, Morocco, Namibia

# Other ideas:
# 5. Visualise by type
# 6. Interactive map


In [None]:
# import geospatial packages

import geopandas as gpd

In [None]:
# create geodataframe, with geometry column from long, lat columns

gdf = gpd.GeoDataFrame(df3, geometry = gpd.points_from_xy(df3['reclong'], df3['reclat']))
gdf.head()

In [None]:
# check whether the geometry column has the right data type

type(gdf.geometry)

In [None]:
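# import world map geodataframe
# note: gpd.datasets is deprecated since geopandas 0.14 and removed in 1.0;
# with newer versions, download the Natural Earth data and pass its path to gpd.read_file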

world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
world.head()

In [None]:
# plot world map with country borders

plt.rcParams['figure.figsize'] = [16,8]
world.plot(facecolor = 'silver', edgecolor = 'grey');

In [None]:
# Plot layer of meteorites

gdf.plot(marker='*', color='green', markersize=5);

In [None]:
# There's a point far outside the map: its longitude is larger than 300,
# while valid longitudes lie between -180 and 180, so it must be a mistake
# find entry in gdf

gdf.loc[gdf['reclong'] > 300]

In [None]:
# remove entry by the index label found in the previous cell

gdf = gdf.drop([22946], axis=0)
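
Dropping a hard-coded index label works for this one point, but a range check would catch any similar entries. This is a minimal sketch, assuming the `reclat` and `reclong` columns; the helper `has_valid_coordinates` is hypothetical and the example rows are made up. Note that, unlike the cell above, this mask also excludes rows with missing coordinates.

```python
import pandas as pd


def has_valid_coordinates(df, lat_col='reclat', lon_col='reclong'):
    """Boolean mask: True where latitude is in [-90, 90] and longitude in [-180, 180]."""
    return df[lat_col].between(-90, 90) & df[lon_col].between(-180, 180)


# made-up coordinates for illustration
example = pd.DataFrame({
    'reclat': [52.1, 26.7, -95.0, None],
    'reclong': [5.3, 354.5, 10.0, 20.0],
})
mask = has_valid_coordinates(example)
print(example[mask])
```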

In [None]:
# Plot meteorites again to check whether the point is gone

gdf.plot(marker='*', color='green', markersize=5);

In [None]:
# Before combining maps, ensure they share a common CRS (so they will align)

# first assign a crs to the gdf (a common one is WGS84 latitude-longitude coordinate system)

gdf = gdf.set_crs("EPSG:4326")

# then, align crs to world crs

gdf = gdf.to_crs(world.crs)

In [None]:
base = world.plot(color='silver', edgecolor='grey')
gdf.plot(ax=base, marker='o', color='blue', markersize=2);

In [None]:
# spatial join meteorite gdf and world gdf to assign countries to meteorite coordinates
# make sure to include all countries even if no meteorite data


In [None]:
# aggregate meteorites per country

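
The two empty cells above could be filled in along the following lines. This is only a sketch on toy data, not the final implementation: the geometries and the `id` column are made up, and it assumes a geopandas version in which `sjoin` accepts the `predicate` keyword (older versions used `op`). In the notebook, `countries` would be `world` and `points` would be `gdf`.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# toy stand-ins for the world and meteorite layers (made-up geometries)
countries = gpd.GeoDataFrame(
    {'name': ['A', 'B', 'C']},
    geometry=[box(0, 0, 10, 10), box(10, 0, 20, 10), box(20, 0, 30, 10)],
    crs='EPSG:4326',
)
points = gpd.GeoDataFrame(
    {'id': [1, 2, 3]},
    geometry=[Point(1, 1), Point(2, 2), Point(15, 5)],
    crs='EPSG:4326',
)

# left join from the country layer, so countries without any point are kept
joined = gpd.sjoin(countries, points, how='left', predicate='contains')

# count matched points per country; unmatched countries have a missing id and count 0
counts = joined.groupby('name')['id'].count()
print(counts)
```

Joining from the country side is what keeps countries without meteorites in the result; a join from the point side would silently leave them out.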

In [None]:
# make map with number of meteorites per country indicated with colours
# see: https://geopandas.org/mapping.html

In [None]:
# make maps for several countries