# Meteorite landings on Earth

Maaike de Jong

Ironhack Amsterdam Data Analytics

04 April 2020

## 1. Introduction
In this notebook I analyse and visualize the NASA Meteorite Landings dataset, which can be found here. 
A short background and description of the dataset can be found in the readme file. 

The Meteoritical Society collects data on meteorites that have fallen to Earth from outer space. This dataset includes the location, mass, composition, and fall year for over 45,000 meteorites that have struck our planet.

Notes on missing or incorrect data points:
* a few entries contain date information that was incorrectly parsed into the NASA database. As a spot check: any date before 860 CE or after 2016 is incorrect; these should actually be BCE years. There may be other errors and we are looking for a way to identify them.

* a few entries have latitude and longitude of 0N/0E (off the western coast of Africa, where it would be quite difficult to recover meteorites). Many of these were actually discovered in Antarctica, but exact coordinates were not given. 0N/0E locations should probably be treated as NA.
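
The two notes above translate into simple masking rules. Here is a minimal sketch on a small made-up sample (the rows are illustrative, not taken from the dataset; the column names `year`, `reclat` and `reclong` match the ones used later in this notebook):

```python
import numpy as np
import pandas as pd

# Illustrative rows only; not real records from the dataset
sample = pd.DataFrame({
    'name': ['a', 'b', 'c', 'd'],
    'year': [1950.0, 601.0, 2101.0, 1999.0],
    'reclat': [50.1, 0.0, -71.5, 0.0],
    'reclong': [6.0, 0.0, 35.6, 12.3],
})

# Flag years inside the range that the notes describe as trustworthy (inclusive)
sample['year_ok'] = sample['year'].between(860, 2016)

# Treat 0N/0E as missing: only rows where BOTH coordinates are zero
at_origin = (sample['reclat'] == 0) & (sample['reclong'] == 0)
sample.loc[at_origin, ['reclat', 'reclong']] = np.nan

print(sample)
```

Row `d` keeps its latitude of 0 because its longitude is not 0; only the exact 0N/0E pair is masked.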

## 2. Import Packages and data

In [None]:
# import packages
import numpy as np
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
import geopandas as gpd

In [None]:
# import dataset
df = pd.read_csv('../data/meteorite-landings.csv')
df.head()

## 3. Inspecting and cleaning the data

In [None]:
# inspect data types
#df.dtypes
df.info()
# transform year into date time?

In [None]:
# Check missing values
percent_missing = df.isnull().sum() * 100 / len(df)
missing_value_df = pd.DataFrame({'column_name': df.columns,
                                 'percent_missing': percent_missing})
missing_value_df

In [None]:
# convert year from float to integer
# (missing years become 0 here; the year filter further down removes them again)
df['year'] = df['year'].fillna(0).astype(int)
df.info() 
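
Filling missing years with 0 is harmless here only because the later filter on `year >= 860` drops those rows. As an alternative sketch, assuming a pandas version that has the nullable `Int64` dtype, missing years can stay missing instead of becoming a fake year 0 (the values below are made up):

```python
import numpy as np
import pandas as pd

# Made-up years with one missing value
years = pd.Series([1880.0, np.nan, 2003.0], name='year')

# Nullable integer dtype: whole-number floats become integers, NaN becomes <NA>
years_int = years.astype('Int64')
print(years_int)
```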

In [None]:
# inspect data with describe
df.describe()

In [None]:
# clean data
# keep only rows with 860 <= year <= 2016
df2 = df.loc[(df['year'] >= 860) & (df['year'] <= 2016)]

In [None]:
# clean data
# filter out data with 0,0 coordinates
# (only rows where both latitude and longitude are 0; a single 0 can be a valid coordinate)
df3 = df2.loc[~((df2['reclat'] == 0) & (df2['reclong'] == 0))]

In [None]:
df3.info()

In [None]:
# inspecting properties and distribution of variables

# types of recclass
meteor_types = pd.DataFrame(df3['recclass'].value_counts())
meteor_types.head()

# There are 441 distinct meteorite types in the data.
# If I want to plot different types I'll have to make a selection

In [None]:
# Plotting the counts of the meteor types, we can see that there are 8 most common types,
# after which there are many types with relatively low numbers
meteor_types.head(20).plot.bar()


The most common types of meteorites in the data are:
L6, H5, L5, H6, H4, LL5, LL6, L4

These all fall in the category 'ordinary chondrites'

From Wikipedia:
A chondrite /ˈkɒndraɪt/ is a stony (non-metallic) meteorite that has not been modified, by either melting or differentiation of the parent body. They are formed when various types of dust and small grains in the early Solar System accreted to form primitive asteroids. Some such bodies that are captured in the planet’s gravity well become the most common type of meteorite by (whether quickly, or after many orbits) arriving on a trajectory toward the Earth’s surface. Estimates for their contribution to the total meteorite population vary between 85.7% and 86.2%.
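
The eight most common classes listed above all follow the same naming pattern: a chemical group (H, L or LL) followed by a digit for the petrologic type. A small sketch of how such labels could be grouped; the regular expression is a simplification that I made up for illustration and does not cover compound labels:

```python
import re
from collections import Counter

# The eight most common classes named above
common_classes = ['L6', 'H5', 'L5', 'H6', 'H4', 'LL5', 'LL6', 'L4']

# Simplified pattern: group (LL, L or H) followed by a single petrologic type digit
pattern = re.compile(r'^(LL|L|H)(\d)$')

def chondrite_group(recclass):
    """Return 'H', 'L' or 'LL' for simple ordinary-chondrite labels, else None."""
    match = pattern.match(recclass)
    return match.group(1) if match else None

group_counts = Counter(chondrite_group(c) for c in common_classes)
print(group_counts)
```

Labels that do not fit the pattern, such as `CR2`, return `None` and would need their own rule.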

In [None]:
# inspecting properties and distribution of variables

# distribution of mass

#df3.plot.hist(y = 'mass')
sns.boxplot(x = 'mass', data = df3)

# the distribution is difficult to visualize because of outliers with a lot of mass
# if I want to do anything with mass I have to make groupings or convert the data
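
One possible grouping is by order of magnitude, which is the same idea as plotting the mass on a log scale. A minimal sketch with made-up masses (assuming the `mass` column is in grams):

```python
import math
from collections import Counter

# Made-up masses in grams, spanning several orders of magnitude
masses = [0.5, 12.0, 85.0, 430.0, 2500.0, 60000.0, 23000000.0]

# Bin each mass by its order of magnitude (floor of log10);
# zero or negative masses are skipped because log10 is undefined for them
decades = Counter(math.floor(math.log10(m)) for m in masses if m > 0)

for exponent in sorted(decades):
    print(f"10^{exponent} g to 10^{exponent + 1} g: {decades[exponent]}")
```

In the notebook itself the equivalent would be to plot `np.log10` of the positive masses instead of the raw values.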

In [None]:
sns.violinplot(x = 'mass', data = df3)

In [None]:
# inspecting properties and distribution of variables

# column 'fall'

df3['fall'].value_counts()

# There are many more 'found' meteorites than 'fell' meteorites

In [None]:
# inspecting properties and distribution of variables

# column 'year'

sns.boxplot(x = 'year', data = df3)

In [None]:
sns.violinplot(x = 'year', data = df3)

# the vast majority of observations are between 1900 and 2016, which is to be expected

## 4. Plotting the meteorites on a world map

In [None]:
# create geodataframe, with geometry column from long, lat columns

gdf = gpd.GeoDataFrame(df3, geometry = gpd.points_from_xy(df3['reclong'], df3['reclat']))
gdf.head()

In [None]:
#check whether the geometry column is the right datatype

type(gdf.geometry)

In [None]:
# import world map geodataframe

world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
world.rename(columns = {'name': 'country'}, inplace = True)
world.head()

In [None]:
# plot world map with country borders

plt.rcParams['figure.figsize'] = [16,8]
world.plot(facecolor = 'silver', edgecolor = 'grey');

In [None]:
# Plot layer of meteorites

gdf.plot(marker='*', color='green', markersize=5);

In [None]:
# There's a weird point that must be a mistake: its longitude is larger than 300,
# while valid longitudes range from -180 to 180
# find entry in gdf

gdf.loc[gdf['reclong'] > 300]
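
Instead of spotting such points on the map, coordinates can be checked against the valid degree ranges up front. A minimal sketch (the helper name and the coordinate pairs are made up for illustration):

```python
def is_valid_coordinate(lat, lon):
    """True if latitude and longitude fall inside the usual degree ranges."""
    return -90 <= lat <= 90 and -180 <= lon <= 180

# Made-up (latitude, longitude) pairs; the last one has an impossible longitude
points = [(27.5, 3.7), (-71.5, 35.6), (26.0, 354.5)]

invalid = [p for p in points if not is_valid_coordinate(*p)]
print(invalid)
```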

In [None]:
# remove entry

gdf = gdf.drop([22946], axis=0)

In [None]:
# check shape

gdf.shape

In [None]:
# Plot meteorites again to check whether the point is gone

gdf.plot(marker='*', color='blue', markersize=5);

In [None]:
# Before combining maps, ensure they share a common CRS (so they will align)

# first assign a crs to the gdf (a common one is WGS84 latitude-longitude coordinate system)

gdf.crs = "EPSG:4326"

# then, align crs to world crs

gdf = gdf.to_crs(world.crs)

In [None]:
base = world.plot(color='cornsilk', edgecolor='grey')
gdf.plot(ax=base, marker='o', color='red', markersize=2);

## 5. Creating a choropleth map of meteorite numbers by country

In [None]:
# Create df total meteorites per country
# first, spatial join meteorite gdf and world gdf to assign countries to meteorite coordinates
# (note: geopandas 0.10 and later call the `op` argument `predicate`)

gdf_countries = gpd.sjoin(world, gdf, how="right", op="contains")
gdf_countries.head(50)
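
Conceptually, the spatial join above asks, for every meteorite point, which country polygon contains it. geopandas relies on proper geometry routines for this; the pure-Python sketch below only illustrates the underlying point-in-polygon test (ray casting) on two hypothetical rectangular "countries":

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count how many polygon edges a horizontal ray from (x, y) crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x position where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical rectangular "countries" as (longitude, latitude) vertex lists
countries = {
    'Squareland': [(0, 0), (10, 0), (10, 10), (0, 10)],
    'Boxistan': [(20, 0), (30, 0), (30, 5), (20, 5)],
}

def country_of(lon, lat):
    """Return the name of the first polygon containing the point, or None."""
    for name, polygon in countries.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None  # no match, like the empty country after a right join

print(country_of(5, 5), country_of(25, 2), country_of(15, 3))
```

With `how="right"`, meteorites that fall in no country polygon (for example in the sea) are kept, with missing values in the country columns.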

In [None]:
# rename country column
gdf_countries.rename(columns = {'name_x': 'country'}, inplace = True)
gdf_countries.head()

In [None]:
# check shape
gdf_countries.shape

In [None]:
# aggregate meteorites per country
by_country = gdf_countries.groupby('country')[['id']].count()
by_country = by_country.reset_index()
by_country.rename(columns = {'id': 'meteorite_count'}, inplace = True)              
by_country

In [None]:
# make new df with all country data, even if no meteorite data (otherwise there will be gaps in the map)

countries_count = pd.merge(world, by_country, on = 'country', how = 'left')
countries_count.head()

In [None]:
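# NaN values for meteorite count can be filled with 0
# (fill only that column, so missing values in other columns are not overwritten)
countries_count['meteorite_count'] = countries_count['meteorite_count'].fillna(0)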
countries_count.head()
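
Why a left merge followed by a fill keeps every country on the map can be seen on a toy example. The two small frames below are hypothetical stand-ins for `world` and `by_country`:

```python
import pandas as pd

# Hypothetical stand-ins for `world` and `by_country`
all_countries = pd.DataFrame({'country': ['A', 'B', 'C']})
counts = pd.DataFrame({'country': ['A', 'C'], 'meteorite_count': [12, 3]})

# how='left' keeps every row of the left frame; country B gets NaN as its count
merged = all_countries.merge(counts, on='country', how='left')

# A missing count means no meteorites were matched to that country
merged['meteorite_count'] = merged['meteorite_count'].fillna(0).astype(int)
print(merged)
```

An inner merge would drop country B entirely, which is what would leave gaps in the choropleth.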

In [None]:
# make choropleth map with number of meteorites per country indicated with colours
# see: https://geopandas.org/mapping.html
# include legend

plt.rcParams['figure.figsize'] = [20,12]
fig, ax = plt.subplots(1, 1)
countries_count.plot(column='meteorite_count', ax=ax, legend=True, cmap='summer_r') #legend_kwds={'label': "Found meteorites by country",'orientation': "horizontal"})


In [None]:
countries_count.plot(column='meteorite_count', legend=True, cmap='summer_r', scheme='fisher_jenks_sampled')
# tried out a few schemes; 'fisher_jenks_sampled' and 'jenks_caspall_sampled' look best so far
# to do: figure out how to use 'user_defined'
# to do: figure out how to increase legend size
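
A user-defined scheme boils down to choosing the upper bound of each class by hand. The sketch below shows that binning logic in plain Python with made-up bounds and counts. As far as I know, the geopandas side would be `scheme='user_defined'` together with `classification_kwds={'bins': [...]}`, but that call still has to be checked against the installed geopandas and mapclassify versions.

```python
from bisect import bisect_left

# Hypothetical upper bounds; a value v lands in the first class whose bound is >= v
bins = [0, 10, 100, 1000]

def classify(value, bounds):
    """Return the index of the first bound that is >= value (len(bounds) if none is)."""
    return bisect_left(bounds, value)

# Made-up meteorite counts per country
counts = [0, 7, 10, 250, 1000, 22000]
classes = [classify(c, bins) for c in counts]
print(classes)
```

Countries with a count above the last bound end up in an extra top class, which is where an extreme value like Antarctica would go.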

These maps highlight several interesting things:
* By far the most meteorites were found in Antarctica. This is probably because several expeditions dedicated to finding meteorites have been organised there (to be checked). Meteorites are relatively easy to find in Antarctica because they fall on the ice sheet, where they are very visible.
* Other countries where many meteorites were found are the US, Australia, Chile, Morocco, Algeria, Libya and Oman.
* These patterns are most likely explained by two main factors: national interest in meteorites (the US, where the Meteoritical Society is based) and landscape or ecotype. Apart from the US and Antarctica, the countries mentioned above consist largely of desert, where meteorites are easier to find.

Let's now look at the map without Antarctica.

In [None]:
countries_count2 = countries_count[countries_count['continent'] != 'Antarctica']

In [None]:
countries_count2.plot(column='meteorite_count', legend=True, cmap='summer_r', scheme='jenks_caspall')

## 6. Zooming in on one country: Algeria

In [None]:
# plotting separate countries/ areas
# what happens when we plot a subset of the data?

world.head()

In [None]:
# make map for morocco
# import more high-res country data
world2 = gpd.read_file('../data/50m_cultural/ne_50m_admin_0_countries.shp')
world2.plot()

In [None]:
world2.head()

In [None]:
# select data morocco
morocco = world2.loc[world2['SOVEREIGNT'] == 'Morocco']
morocco

In [None]:
plt.rcParams['figure.figsize'] = [12,8]
morocco.plot(facecolor="none", edgecolor="black")

In [None]:
# select meteorite data for morocco
morocco_count = gdf_countries.loc[gdf_countries['country'] == 'Morocco']
morocco_count.head()
morocco_count.shape

In [None]:
base = morocco.plot(color='cornsilk', edgecolor='black')
morocco_count.plot(ax=base, marker='o', color='red', markersize=5);

In [None]:
# select map shape Algeria
algeria = world2.loc[world2['SOVEREIGNT'] == 'Algeria']
algeria

In [None]:
# select meteorite data for algeria

algeria_count = gdf_countries.loc[gdf_countries['country'] == 'Algeria']
algeria_count.head()

In [None]:
base = algeria.plot(color='cornsilk', edgecolor='black')
algeria_count.plot(ax=base, marker='o', color='red', markersize=5);

In [None]:
# it's interesting to see that there are several clusters of meteorites. 
# Let's look at meteorite types in more detail

In [None]:
# Let's look at the meteorite types in the data
algeria_top10 = pd.DataFrame(algeria_count['recclass'].value_counts().head(10))
algeria_top10 = algeria_top10.reset_index()
algeria_top10

In [None]:
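# interestingly, there is a rare type of meteorite in 6th place: CR2,
# a carbonaceous chondrite (CR group, petrologic type 2)
# Select meteorite data for Algeria, restricted to the 10 most common meteorite types
# (the top 10 is taken from value_counts() directly, because the column names that
# reset_index() produces differ between pandas versions)

top10_classes = algeria_count['recclass'].value_counts().head(10).index
algeria_count10 = algeria_count.loc[algeria_count['recclass'].isin(top10_classes)]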
algeria_count10.head()

In [None]:
# plot them on the map, colourcoded
base = algeria.plot(color='cornsilk', edgecolor='black')
algeria_count10.plot(ax=base, marker='o', column='recclass', legend = True, markersize=5);

In [None]:
# to visualize our rare meteorite type better, we select and plot only this type
algeria_CR2 = algeria_count.loc[algeria_count['recclass'] == 'CR2']
algeria_CR2.head()

In [None]:
# plot CR2 on map

base = algeria.plot(color='cornsilk', edgecolor='black')
algeria_CR2.plot(ax=base, marker='o', color = 'blue', markersize=5)

# They are nearly all in the same spot. Perhaps a bigger meteorite of this type fell and these are pieces of it
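
To put a number on "nearly all in the same spot", the distance of each find from the centre of the cluster can be computed. A rough sketch with made-up coordinates (not the actual find locations), using the haversine formula on a sphere with a radius of 6371 km:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

# Made-up (latitude, longitude) pairs, not actual find locations
finds = [(27.50, 4.00), (27.55, 4.05), (27.60, 3.95), (27.52, 4.02)]

# Centre of the cluster as the plain mean of the coordinates (fine for a small area)
centre_lat = sum(lat for lat, _ in finds) / len(finds)
centre_lon = sum(lon for _, lon in finds) / len(finds)

distances = [haversine_km(lat, lon, centre_lat, centre_lon) for lat, lon in finds]
print(f"largest distance from centre: {max(distances):.1f} km")
```

A small maximum distance relative to the size of the country would support the idea that the finds belong to one fall, although proximity alone does not prove it.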

In [None]:
# looking at the data, these meteorites are all from the location 'Acfer'.
# after some googling, it turns out that a recent study of a meteorite from this place
# seems to show that extraterrestrial proteins are present in the meteorite
# https://arxiv.org/abs/2002.11688 (paper still under peer review)
# This is definitely an interesting meteorite hotspot to explore!

## 7. Conclusion

XX