# Meteorite landings on Earth

Maaike de Jong

Ironhack Amsterdam Data Analytics

Project 3, April 2020

## 1. Introduction

In this notebook I analyse and visualize the NASA Meteorite Landings dataset, which can be found [here](https://www.kaggle.com/nasa/meteorite-landings). The [Meteoritical Society](https://www.lpi.usra.edu/meteor/) collects data on meteorites that have fallen to Earth from outer space. 

#### Aim of this project
Meteorites have fascinated me for a long time. When they enter our atmosphere as shooting stars they have often travelled for millions of kilometers. Many of the meteorites that reach our planet are very old, dating from the early days of our solar system, so they are older than the rocks from Earth. My goal is to one day find a meteorite myself. This dataset offers a great opportunity to visually investigate where the chances of finding a meteorite are highest. In this notebook I take several approaches to visualising data on maps to figure out where I should go to find a meteorite.

#### About the data
This dataset includes the location, mass, composition, and fall year for over 45,000 meteorites that have struck our planet. There are a few notes on Kaggle on missing or incorrect data points in this dataset, which I'll take into account during data cleaning:
* A few entries contain date information that was incorrectly parsed into the NASA database. As a spot check: any date before 860 CE or after 2016 is incorrect; these should actually be BCE years. There may be other errors and we are looking for a way to identify them.
* A few entries have a latitude and longitude of 0N/0E (off the western coast of Africa, where it would be quite difficult to recover meteorites). Many of these were actually discovered in Antarctica, but exact coordinates were not given. 0N/0E locations should probably be treated as NA.
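
Both notes translate into simple boolean masks. The following is a minimal sketch on a small made-up table: the rows are invented for illustration, and only the column names `year`, `reclat` and `reclong` come from the dataset. It shows one way to set 0N/0E to missing and to flag suspicious years.

```python
import numpy as np
import pandas as pd

# Toy rows, invented for illustration; column names follow the dataset
toy = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D'],
    'year': [1998.0, 601.0, 2101.0, 1950.0],
    'reclat': [50.8, 0.0, -71.5, 0.0],
    'reclong': [6.1, 0.0, 35.7, 12.3],
})

# A point counts as 0N/0E only when BOTH coordinates are zero
zero_zero = (toy['reclat'] == 0) & (toy['reclong'] == 0)
toy.loc[zero_zero, ['reclat', 'reclong']] = np.nan

# Flag years outside the range that the Kaggle notes describe as reliable
toy['year_suspect'] = ~toy['year'].between(860, 2016)

print(toy)
```

Row D keeps its coordinates because only its latitude is zero, which is a valid position on the equator.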

## 2. Import Packages and data

In [None]:
# import packages
import numpy as np
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.patches as mpatches
import seaborn as sns
import geopandas as gpd

In [None]:
# import dataset
df = pd.read_csv('../data/meteorite-landings.csv')
df.head()

## 3. Inspecting and cleaning the data

Before I start with visualising the data, I inspect the data to check variable properties and distributions and to find and fix mistakes or unwanted datapoints. 

In [None]:
# inspect data types
#df.dtypes
df.info()

In [None]:
# Check missing values
percent_missing = df.isnull().sum() * 100 / len(df)
missing_value_df = pd.DataFrame({'column_name': df.columns,
                                 'percent_missing': percent_missing})
missing_value_df
# There are some missing values for mass and year (3% and 6% respectively).
# There are quite a few missing geolocations (16%).

In [None]:
# convert year from float to integer
# note: missing years become 0 here, so they are removed by the year filter further down
df['year'] = df['year'].fillna(0).astype(int)
df.info() 
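
An alternative that keeps missing years missing, instead of turning them into year 0, is pandas' nullable integer dtype `Int64`. A minimal sketch on invented values:

```python
import numpy as np
import pandas as pd

# Invented years, one of them missing
years = pd.Series([1880.0, np.nan, 2003.0], name='year')

# 'Int64' (capital I) is pandas' nullable integer dtype: NaN becomes <NA>
years_int = years.astype('Int64')

print(years_int)
print(years_int.isna().sum())
```

With this dtype the missing years still show up in `isna()` counts after the conversion.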

In [None]:
# inspect data with describe
df.describe()

In the notes that came with the dataset it states that years before 860 and after 2016 are unreliable, and that there are several incorrect coordinates of (0,0), so those need to be filtered out.

In [None]:
# clean data
# filter out data with year < 860 and > 2016 
df2 = df.loc[(df['year'] >= 860) & (df['year'] <= 2016)]

In [None]:
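# clean data
# filter out data with 0,0 coordinates (latitude AND longitude both equal to zero);
# rows where only one of the two is zero are valid positions and are kept
df3 = df2.loc[~((df2['reclat'] == 0) & (df2['reclong'] == 0))]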

In [None]:
df3.info()

In [None]:
# inspecting properties and distribution of variables

# types of recclass
meteor_types = pd.DataFrame(df3['recclass'].value_counts())
meteor_types.head()

# There are 441 distinct meteorite types in the data. 
# If I want to plot different types I'll have to make a selection

In [None]:
# Plotting the counts of the meteor types, we can see that there are 8 most common types,
# after which there are many types with relatively low numbers

plt.rcParams['figure.figsize'] = [16,8]
meteor_types.head(50).plot.bar()


#### Types of meteorite

The most common types of meteorites in the data are:
L6, H5, L5, H6, H4, LL5, LL6, L4

These all fall in the category 'ordinary chondrites'.

From Wikipedia:
A chondrite /ˈkɒndraɪt/ is a stony (non-metallic) meteorite that has not been modified, by either melting or differentiation of the parent body. They are formed when various types of dust and small grains in the early Solar System accreted to form primitive asteroids. Some such bodies that are captured in the planet’s gravity well become the most common type of meteorite by (whether quickly, or after many orbits) arriving on a trajectory toward the Earth’s surface. Estimates for their contribution to the total meteorite population vary between 85.7% and 86.2%.

In [None]:
# inspecting properties and distribution of variables

# distribution of mass

#df3.plot.hist(y = 'mass')
sns.boxplot(x = 'mass', data = df3)

# the distribution is difficult to visualize because of outliers with a lot of mass
# if I want to do anything with mass I have to make groupings or convert the data
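
One way to convert the data is a log10 transform, and one way to make groupings is to bin on powers of ten. The sketch below uses invented masses and assumes the `mass` column is in grams; the class edges and labels are my own choice, not something defined by the dataset.

```python
import numpy as np
import pandas as pd

# Invented masses in grams, spanning several orders of magnitude
mass = pd.Series([0.5, 12.0, 340.0, 8_500.0, 60_000_000.0], name='mass')

# log10 compresses the long right tail; zero masses would need special handling
log_mass = np.log10(mass)

# Group into classes; pd.cut intervals are open on the left, closed on the right
edges = [0, 1, 100, 10_000, 1_000_000, np.inf]
labels = ['up to 1 g', '1 g - 100 g', '100 g - 10 kg', '10 kg - 1 t', 'over 1 t']
mass_class = pd.cut(mass, bins=edges, labels=labels)

print(pd.DataFrame({'mass': mass, 'log10_mass': log_mass, 'class': mass_class}))
```

Either the transformed column or the classes could then be passed to a boxplot or count plot without the outliers dominating the axis.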

In [None]:
sns.violinplot(x = 'mass', data = df3)

In [None]:
# inspecting properties and distribution of variables

# column 'fall'

df3['fall'].value_counts()

# There are many more 'found' meteorites than 'fell' meteorites

In [None]:
# inspecting properties and distribution of variables

# column 'year'

sns.boxplot(x = 'year', data = df3)

In [None]:
sns.violinplot(x = 'year', data = df3)

# the vast majority of observations are between 1900 and 2016, which is to be expected

## 4. Plotting the meteorites on a world map

There are many ways to visualise the meteorite data on maps, but as a start I'll plot all the meteorites as points on a world map.

To do this, there are three main steps:
* convert the data into geospatial data so it can be spatially plotted with a package such as geopandas
* get data to plot a world map
* plot the world map data and the meteorite data in two layers

In [None]:
# create geodataframe, with geometry column from long, lat columns

gdf = gpd.GeoDataFrame(df3, geometry = gpd.points_from_xy(df3['reclong'], df3['reclat']))
gdf.head()

In [None]:
# check whether the geometry column has the right datatype. It should be a geopandas GeoSeries

type(gdf.geometry)

In [None]:
# import world map geodataframe. The lowres worldmap dataset from Natural Earth can be imported directly from geopandas

world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
world.rename(columns = {'name': 'country'}, inplace = True)
world.head()

In [None]:
# plot world map with country borders

plt.rcParams['figure.figsize'] = [16,8]
world.plot(facecolor = 'silver', edgecolor = 'grey');

In [None]:
# Plot layer of meteorites

gdf.plot(marker='*', color='blue', markersize=5);

In [None]:
# There's a weird point that must be a mistake: its longitude is larger than 300
# find the entry in the gdf (build the mask from gdf itself rather than from df3)

gdf.loc[gdf['reclong'] > 300]

In [None]:
# remove entry

gdf = gdf.drop([22946], axis=0)

In [None]:
# Plot meteorites again to check whether the point is gone

gdf.plot(marker='*', color='blue', markersize=5);

In [None]:
# Before combining maps, ensure they share a common CRS (coordinate system) so that they will spatially align

# first assign a CRS to the gdf (a common one is WGS84 latitude-longitude coordinate system)

gdf.crs = "EPSG:4326"

# then, align crs to world crs

gdf = gdf.to_crs(world.crs)

In [None]:
# Plotting the world map data and the meteorite points in two layers:

base = world.plot(color='cornsilk', edgecolor='grey')
gdf.plot(ax=base, marker='o', color='blue', markersize=2);

#### Insights from this map
From this map it looks like by far the most meteorites were found in the US. That might be because the Meteoritical Society is based in the US. 
What also stands out to me is that the densities are lowest in the most forested areas: the Amazon basin, the Congo basin, South-East Asia, and the boreal forests of Canada and northern Russia. This makes sense because finding meteorites must be more difficult in forested areas than in open areas, and soil turnover is also high in forests. 

Although this figure gives a nice overview of overall densities, there is also a lot of overlap between data points (because there are so many). As a next step I'll visualise the number of meteorites per country. 

## 5. Creating a choropleth map of meteorite numbers by country

To get a more detailed insight into the spatial distribution of found meteorites, I will create a so-called choropleth map of numbers of meteorites per country. 

Definition of a choropleth map (from Wikipedia):  
*A choropleth map is a type of thematic map in which areas are shaded or patterned in proportion to a statistical variable that represents an aggregate summary of a geographic characteristic within each area, such as population density or per-capita income.*

There are several steps involved in creating this map:  

**1. Create a dataframe with the total number of meteorites per country.**  
For this we first need to assign country names to the meteorite data. The dataset as it comes doesn't have country information. So based on the long, lat coordinates of the meteorites we have to group the data points corresponding to the country polygon shapes in the worldmap dataset. We can do this in geopandas with a spatial join.  

**2. Create a new dataframe combining the worldmap data and the meteorite counts per country.**  
Doing this we'll end up with a lot of missing data for the counts per country. We can fill these in with zeroes because for these countries zero meteorites were found.  

**3. Create the choropleth map in geopandas.**  
This will require some tweaking of the colour-representation of the meteorite counts as this data is highly skewed as we will see.  
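
Conceptually, the spatial join in step 1 asks, for every meteorite, which country polygon contains its point. The sketch below illustrates that test with a plain ray-casting routine on two made-up rectangular "countries". It only shows the idea: geopandas relies on its own geometry engine and spatial index, and the function name `point_in_polygon`, the polygons and the points are all hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count how often a ray going right from (x, y) crosses an edge."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Two invented rectangular "countries" as (longitude, latitude) vertices
countries = {
    'Westland': [(0, 0), (10, 0), (10, 10), (0, 10)],
    'Eastland': [(10, 0), (20, 0), (20, 10), (10, 10)],
}

# Invented meteorite locations (longitude, latitude)
meteorites = {'M1': (3, 4), 'M2': (15, 8), 'M3': (25, 5)}

assigned = {}
for name, (lon, lat) in meteorites.items():
    # None plays the role of a missing country for points outside every polygon
    assigned[name] = next(
        (c for c, poly in countries.items() if point_in_polygon(lon, lat, poly)),
        None,
    )

print(assigned)
```

Points that fall outside every polygon, such as `M3` here, correspond to meteorites that end up without a country after the join (for example finds at sea or just off a coarse coastline).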


In [None]:
# Create df total meteorites per country
# first, spatial join meteorite gdf and world gdf to assign countries to meteorite coordinates

gdf_countries = gpd.sjoin(world, gdf, how="right", op="contains")
gdf_countries.head()

In [None]:
# rename country column
gdf_countries.rename(columns = {'name_x': 'country'}, inplace = True)
gdf_countries.head()

In [None]:
# check shape

gdf_countries.shape

In [None]:
# aggregate total number of meteorites per country

by_country = gdf_countries.groupby('country')[['id']].count()
by_country = by_country.reset_index()
by_country.rename(columns = {'id': 'meteorite_count'}, inplace = True)              
by_country

In [None]:
# make new df with all country data, even if no meteorite data is available (otherwise there will be gaps in the worldmap)

countries_count = pd.merge(world, by_country, on = 'country', how = 'left')
countries_count.head()

In [None]:
# Fill NaN values for meteorite count with zeroes

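# fill only the count column, so missing values in other columns are not silently set to 0
countries_count['meteorite_count'] = countries_count['meteorite_count'].fillna(0)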
countries_count.head()
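
The effect of the left merge and the fill can be checked on a tiny invented example. The counts below are made up; only the column names `country` and `meteorite_count` follow the notebook.

```python
import pandas as pd

# Stand-in for the world table: every country must stay in the result
world_toy = pd.DataFrame({'country': ['Algeria', 'Belgium', 'Chile']})

# Stand-in for the per-country counts: Belgium has no finds, so no row
counts_toy = pd.DataFrame({'country': ['Algeria', 'Chile'],
                           'meteorite_count': [12, 7]})

# how='left' keeps all rows of world_toy; unmatched countries get NaN
merged = pd.merge(world_toy, counts_toy, on='country', how='left')

# Fill only the count column, then restore an integer dtype
merged['meteorite_count'] = merged['meteorite_count'].fillna(0).astype(int)

print(merged)
```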

In [None]:
# make choropleth map with number of meteorites per country indicated with colours
# see: https://geopandas.org/mapping.html
# include legend

plt.rcParams['figure.figsize'] = [20,12]
fig, ax = plt.subplots(1, 1)
countries_count.plot(column='meteorite_count', ax=ax, legend=True, cmap='summer_r') #legend_kwds={'label': "Found meteorites by country",'orientation': "horizontal"})


#### Insights from this map

This map shows very clearly that by far the most meteorites have been found in Antarctica. In comparison the numbers for the other countries are much lower, which makes the colour contrasts for the rest of the countries very small. 

According to our data 22099 meteorites were found in Antarctica, more or less half of our dataset. This is because several expeditions specifically dedicated to finding meteorites were organised in recent years. Meteorites are relatively easy to find in Antarctica because they fall on the ice sheet and are very visible.  

To create more contrast between the countries, we can tweak the colour scaling with the scheme option.  
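
With a user-defined scheme, the bin values act as class boundaries. The sketch below mimics that assignment in plain Python, assuming each bin value is an inclusive upper bound (my reading of how the user-defined classifier behaves, not something verified against its source). The counts are invented, apart from the Antarctica total mentioned above, and `class_index` is a hypothetical helper.

```python
from bisect import bisect_left

bins = [0, 5, 25, 250, 2500, 25000]  # assumed upper bounds of the six classes

def class_index(count, bins):
    """Index of the first bin whose upper bound is >= count."""
    return bisect_left(bins, count)

# Invented counts, plus the Antarctica total from the text
examples = {'no finds': 0, 'a few': 3, 'edge case': 5, 'some': 40, 'Antarctica': 22099}

for label, count in examples.items():
    print(label, count, '-> class', class_index(count, bins))
```

Under this assumption a count of exactly 5 falls in the second class, together with counts 1 to 4.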


In [None]:
# plot choropleth map with scheme option with user defined settings

plt.rcParams['figure.figsize'] = [20, 10]
fig, ax = plt.subplots(1, 1)

countries_count.plot(column='meteorite_count', legend=True, ax = ax, cmap='summer_r', scheme='user_defined', classification_kwds = {'bins':[0, 5, 25, 250, 2500, 25000]})

# build a manual legend; labels are non-overlapping, assuming each bin value is an inclusive upper bound
cmap = plt.get_cmap('summer_r')
patch1 = mpatches.Patch(color=cmap(0.0), label = '0')
patch2 = mpatches.Patch(color=cmap(0.2), label = '1 - 5')
patch3 = mpatches.Patch(color=cmap(0.4), label = '6 - 25')
patch4 = mpatches.Patch(color=cmap(0.6), label = '26 - 250')
patch5 = mpatches.Patch(color=cmap(0.8), label = '251 - 2500')
patch6 = mpatches.Patch(color=cmap(1.0), label = '2501 - 25000')

plt.legend(handles = [patch1, patch2, patch3, patch4, patch5, patch6], prop = {'size':12})

plt.show()


#### Insights from this map

Apart from Antarctica, other countries where a lot of meteorites were found are: the US, Australia, Chile, Morocco, Algeria, Libya and Oman.  

These patterns are most likely explained by two main factors: national interest in meteorites (the US, where the Meteoritical Society is based), and landscape or ecotype. Apart from the US and Antarctica, the countries mentioned above all largely consist of desert, where meteorites are easier to find.   

Let's now have a look at the map without Antarctica.

In [None]:
# create new dataframe without Antarctica data

countries_count2 = countries_count[countries_count['continent'] != 'Antarctica']

In [None]:
# plot the map

plt.rcParams['figure.figsize'] = [20, 10]
fig, ax = plt.subplots(1, 1)

countries_count2.plot(column='meteorite_count', legend=True, ax = ax, cmap='summer_r', scheme='user_defined', classification_kwds = {'bins':[0, 5, 25, 250, 1000, 5000]})

# build a manual legend; labels are non-overlapping, assuming each bin value is an inclusive upper bound
cmap = plt.get_cmap('summer_r')
patch1 = mpatches.Patch(color=cmap(0.0), label = '0')
patch2 = mpatches.Patch(color=cmap(0.2), label = '1 - 5')
patch3 = mpatches.Patch(color=cmap(0.4), label = '6 - 25')
patch4 = mpatches.Patch(color=cmap(0.6), label = '26 - 250')
patch5 = mpatches.Patch(color=cmap(0.8), label = '251 - 1000')
patch6 = mpatches.Patch(color=cmap(1.0), label = '1001 - 5000')

plt.legend(handles = [patch1, patch2, patch3, patch4, patch5, patch6], prop = {'size':12})

plt.show()


#### Insights from this map

Removing Antarctica gives a more precise view of the differences between countries. According to this map the most meteorites were found in the US, Libya and Oman. Other countries that score high are Australia, Algeria and Chile.  

When looking at possible countries to travel to for a meteorite-searching expedition, Algeria looks like a suitable option. It's relatively nearby (in contrast to Australia, the US and Chile) and it's relatively safe for a solo female traveller (in contrast to Libya). Oman is also supposed to be safe, but one of the first Google hits I got when searching for meteorites and Oman was [this](https://www.newscientist.com/article/mg21128190-200-meteorite-hunter-my-two-months-in-an-omani-jail/), so perhaps not my first choice of country.  

Let's zoom in on Algeria.



## 6. Zooming in on one country: Algeria

To investigate the meteorite situation in Algeria, we can plot the meteorite points on the map of Algeria. To do this, we need to plot the country polygon shape data as a base map and overlay the meteorite points. We'll start by importing a higher resolution worldmap datafile that I obtained from [Natural Earth Data](https://www.naturalearthdata.com/) and selecting the Algerian shape data. After that we can select the data for Algeria from the gdf_countries geodataframe that we created in an earlier step.

In [None]:
# import higher-resolution worldmap data

world2 = gpd.read_file('../data/50m_cultural/ne_50m_admin_0_countries.shp')
world2.plot()

In [None]:
# inspect worlddata gdf

world2.head()

In [None]:
# select map shape Algeria

algeria = world2.loc[world2['SOVEREIGNT'] == 'Algeria']
algeria

In [None]:
# select meteorite data for algeria

algeria_count = gdf_countries.loc[gdf_countries['country'] == 'Algeria']
algeria_count.head()

In [None]:
# plot two layer map for Algeria

base = algeria.plot(color='cornsilk', edgecolor='black')
algeria_count.plot(ax=base, marker='o', color='red', markersize=5);

#### Insights from this map

This map shows that there are several clear clusters of meteorite finds. They could be caused by expeditions, or perhaps by larger meteorite impacts.  

Let's look at meteorite types in more detail.  


In [None]:
# counting the top 10 meteorite types in the Algeria data

algeria_top10 = pd.DataFrame(algeria_count['recclass'].value_counts().head(10))
algeria_top10 = algeria_top10.reset_index()
algeria_top10

Interestingly, a generally rare type of meteorite is in 7th place: CR2. In the main data this type is ranked 25th in frequency.  

CR2 is a carbonaceous chondrite (CR group, petrologic type 2), a rare and much sought-after type of meteorite.  
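
The rank of a class can be read off from `value_counts()`, which sorts by frequency. A minimal sketch on invented class labels follows; the helper name `class_rank` is hypothetical, and ties are not handled specially.

```python
import pandas as pd

def class_rank(classes, target):
    """1-based position of `target` when classes are sorted by frequency."""
    counts = classes.value_counts()
    return list(counts.index).index(target) + 1

# Invented samples: 'CR2' is rare overall but common in the subset
overall = pd.Series(['L6'] * 6 + ['H5'] * 5 + ['H4'] * 3 + ['CR2'] * 1)
subset = pd.Series(['L6'] * 4 + ['CR2'] * 3 + ['H5'] * 1)

print(class_rank(overall, 'CR2'), class_rank(subset, 'CR2'))
```

Applied to the notebook's data, `overall` would be the `recclass` column of the full table and `subset` the `recclass` column of the Algeria selection.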

Let's look at the distribution of the top 10 meteorites in Algeria.  


In [None]:
# Select meteorite data for algeria, top 10 most common meteorites

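# take the class labels straight from value_counts(), so this does not depend on
# the column name that reset_index() produces (which differs between pandas versions)
top10_types = algeria_count['recclass'].value_counts().head(10).index
algeria_count10 = algeria_count.loc[algeria_count['recclass'].isin(top10_types)]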
algeria_count10.head()

In [None]:
# plot them on the map, colourcoded

base = algeria.plot(color='cornsilk', edgecolor='black')
algeria_count10.plot(ax=base, marker='o', column='recclass', legend = True, markersize=5);

In [None]:
# there are a lot of points in the same spot, making it difficult to see where the CR2 type is
# to visualize the rare meteorite type better, we select and plot only this type

algeria_CR2 = algeria_count.loc[algeria_count['recclass'] == 'CR2']
algeria_CR2.head()

In [None]:
# plot CR2 on map

base = algeria.plot(color='cornsilk', edgecolor='black')
algeria_CR2.plot(ax=base, marker='o', color = 'blue', markersize=5)


#### Insights from these maps

The CR2 meteorites are nearly all in the same spot (except one). Perhaps a bigger meteorite of this type fell and these are all pieces of it.  
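
Whether finds are "in the same spot" can be quantified with great-circle distances. The sketch below uses the haversine formula on invented coordinates (not the actual CR2 locations) and assumes a mean Earth radius of 6371 km.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * radius_km * asin(sqrt(a))

# Invented (lat, lon) points: three close together, one far away
points = [(27.50, 4.00), (27.52, 4.03), (27.48, 3.98), (33.00, 1.00)]
reference = points[0]

# Distance of every point to the first one, in kilometres
distances = [haversine_km(*reference, *p) for p in points]
print([round(d, 1) for d in distances])
```

Distances of a few kilometres would be consistent with fragments of a single fall or one search area, while an outlier hundreds of kilometres away would point to a separate find.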

Looking at the meteorite data, these meteorites are all from the location 'Acfer'. After some googling, it turns out that there was a recent discovery in a meteorite from this place, which seems to show that extraterrestrial proteins are present in the meteorite. See this scientific [publication](https://arxiv.org/abs/2002.11688) (currently under peer review).  

This is definitely an interesting meteorite hotspot to explore!


## 7. Conclusions and next steps

This visual analysis gave me a clear overview of the numbers of found meteorites per country, and which countries are promising locations for a meteorite-hunting expedition. For Algeria it appears that there is a particularly interesting spot, where a relatively high number of rarer meteorites were found. 

A few possible next steps for this analysis are:
1. Creating maps with geographical features. For instance, this would give me a better idea of the exact area and landscape of the Algeria hotspot. 
2. Taking a closer look at the distribution of meteorite types, as I did here for Algeria, but at a global level or for other countries.
3. Creating interactive maps (e.g. with Bokeh) and maps that can be integrated into websites (e.g. with Plotly).  

