![](https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSE8B3-dtSdbcGCskxy9oF2kAMkP99zwXOQKA&usqp=CAU)

# HISTORY

The Meteoritical Society collects data on meteorites that have fallen to Earth from outer space. This dataset includes the location, mass, composition, and fall year for over 45,000 meteorites that have struck our planet.

#### Notes on missing or incorrect data points:

* A few entries here contain date information that was incorrectly parsed into the NASA database. As a spot check: any date before 860 CE or after 2016 is incorrect; these should actually be BCE years. There may be other errors, and we are looking for a way to identify them.
* A few entries have a latitude and longitude of 0N/0E (off the western coast of Africa, where it would be quite difficult to recover meteorites). Many of these were actually discovered in Antarctica, but exact coordinates were not given. 0N/0E locations should probably be treated as NA.
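
The two spot checks above can be sketched in pandas. This is only an illustration on invented rows: the `sample` frame and its values are made up, and setting suspect years to NaN is one possible treatment (it does not try to recover the BCE years mentioned above).

```python
import numpy as np
import pandas as pd

# Toy rows standing in for the real file; column names follow the dataset description.
sample = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "year": [1950.0, 2101.0, 601.0, 2003.0],
    "reclat": [45.2, 0.0, -71.5, 0.0],
    "reclong": [7.1, 0.0, 35.7, 12.4],
})

# Years outside 860..2016 are flagged as suspect, following the spot check.
suspect_year = (sample["year"] < 860) | (sample["year"] > 2016)

# Only rows where BOTH coordinates are exactly zero are treated as unknown.
null_island = (sample["reclat"] == 0) & (sample["reclong"] == 0)

cleaned = sample.copy()
cleaned.loc[suspect_year, "year"] = np.nan
cleaned.loc[null_island, ["reclat", "reclong"]] = np.nan

print(cleaned)
```

Row D keeps its latitude of 0 because its longitude is non-zero; only the exact 0N/0E pair is blanked.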


#### The Data
Note that a few column names start with "rec" (e.g., recclass, reclat, reclong). These are the recommended values of these variables, according to The Meteoritical Society. In some cases, a meteorite was historically reclassified, or the data on where it was recovered changed slightly; this dataset gives the currently recommended values.

#### The dataset contains the following variables:

* name: the name of the meteorite (typically a location, often modified with a number, year, composition, etc)
* id: a unique identifier for the meteorite
* nametype: one of:
-- valid: a typical meteorite
-- relict: a meteorite that has been highly degraded by weather on Earth
* recclass: the class of the meteorite; one of a large number of classes based on physical, chemical, and other characteristics (see the Wikipedia article on meteorite classification for a primer)
* mass: the mass of the meteorite, in grams
* fall: whether the meteorite was seen falling, or was discovered after its impact; one of:
-- Fell: the meteorite's fall was observed
-- Found: the meteorite's fall was not observed
* year: the year the meteorite fell, or the year it was found (depending on the value of fall)
* reclat: the latitude of the meteorite's landing
* reclong: the longitude of the meteorite's landing
* GeoLocation: a parentheses-enclosed, comma-separated tuple that combines reclat and reclong
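
As a small illustration of the GeoLocation format described in the last bullet, the sketch below splits such strings back into two numeric columns. The sample strings and the `lat`/`lon` column names are invented for the example; the real column may contain variations this pattern does not cover.

```python
import pandas as pd

# Hypothetical values in the "(lat, lon)" format described above.
geo = pd.Series(["(50.775, 6.08333)", "(-33.16667, -64.95)", None], name="GeoLocation")

# Capture the two signed decimal numbers between the parentheses.
pattern = r"\(\s*(?P<lat>-?\d+(?:\.\d+)?)\s*,\s*(?P<lon>-?\d+(?:\.\d+)?)\s*\)"
coords = geo.str.extract(pattern).astype(float)

print(coords)
```

Missing or unparseable entries come out as NaN in both columns.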

#### What can we do with this data?
Here are a couple of thoughts on questions to ask and ways to look at this data:

* How does the geographical distribution of observed falls differ from that of found meteorites?
-- this would be great overlaid on a cartogram or alongside a high-resolution population density map
* Are there any geographical differences, or differences over time, in the class of meteorites that have fallen to Earth?

# PACKAGES AND LIBRARIES

In [None]:
!pip install dataprep

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from warnings import filterwarnings
from mpl_toolkits.mplot3d import Axes3D
import statsmodels.api as sm
import missingno as msno
import statsmodels.stats.api as sms
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale
from sklearn.neighbors import LocalOutlierFactor
from scipy.stats import levene
from scipy.stats import shapiro
from scipy.stats import pearsonr
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split, cross_val_score, cross_val_predict
from sklearn.model_selection import ShuffleSplit, GridSearchCV
from sklearn.metrics import mean_squared_error, r2_score
from sklearn import model_selection
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import BaggingRegressor
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPRegressor
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LinearRegression
from sklearn.cross_decomposition import PLSRegression
from sklearn.linear_model import Ridge
from sklearn.linear_model import RidgeCV
from sklearn.linear_model import Lasso
from sklearn.linear_model import LassoCV
from sklearn.linear_model import ElasticNet
from sklearn.linear_model import ElasticNetCV
from sklearn import linear_model
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.ensemble import GradientBoostingRegressor, GradientBoostingClassifier
import xgboost as xgb
from xgboost import XGBRegressor, XGBClassifier
from lightgbm import LGBMRegressor, LGBMClassifier
from catboost import CatBoostRegressor, CatBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn import tree
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report, roc_auc_score, roc_curve
from yellowbrick.cluster import KElbowVisualizer
from sklearn.cluster import KMeans
from sklearn.pipeline import Pipeline
from sklearn.manifold import Isomap,TSNE
from sklearn.feature_selection import mutual_info_classif
from tqdm.notebook import tqdm
from scipy.stats import ttest_ind
import plotly.express as px
import plotly.graph_objs as go
import plotly.offline as pyo
import scipy.stats as stats
import pymc3 as pm
from dataprep.eda import *
from dataprep.eda import plot
from dataprep.eda import plot_diff
from dataprep.eda import plot_correlation
from dataprep.eda import plot_missing
import plotly.figure_factory as ff
from collections import Counter
import pandas_profiling as pp
from mpl_toolkits.basemap import Basemap
import datetime as dt
import plotly.graph_objects as go

filterwarnings("ignore", category=DeprecationWarning) 
filterwarnings("ignore", category=FutureWarning) 
filterwarnings("ignore", category=UserWarning)

# DATA PROCESS & EXPLORATORY DATA ANALYSIS (EDA)

#### READING CSV

In [None]:
Meteorite_CSV = pd.read_csv("../input/meteorite-landings/meteorite-landings.csv")

Data = Meteorite_CSV.copy()

Numeric_Data = Data.select_dtypes(include=["float32","float64","int32","int64"])

#### GENERAL ANALYSIS

In [None]:
Data

In [None]:
print("INFO:\n")
print(Data.info())

In [None]:
print("DESCRIBE:\n")
print(Numeric_Data.describe().T)

In [None]:
print("CORRELATION:\n")
print(Numeric_Data.corr())

In [None]:
print("COVARIANCE:\n")
print(Numeric_Data.cov())

In [None]:
print("COLUMNS:\n")
print(Data.columns)

In [None]:
print("SHAPE: ",Data.shape)
print("SIZE: ",Data.size)

In [None]:
print("NaN:\n")
print(Data.isnull().sum())

In [None]:
print("NaN BOOL:\n")
print(Data.isna())

In [None]:
print("DUPLICATED\n")
print(Data.duplicated().sum())

#### ARRANGEMENT CSV

In [None]:
print("NAME TYPE\n")
print(Data["nametype"].value_counts())

In [None]:
print("YEAR\n")
print(Data["year"].value_counts())

In [None]:
print(Data["year"].sort_values(ascending=False))

In [None]:
Data.sort_values(by=["year"],inplace=True)

Data = Data.reset_index(drop=True)

In [None]:
Data.drop("id",axis=1,inplace=True) # we don't need ID feature for this overview

In [None]:
Data

In [None]:
print("CLASS\n")
print(Data["recclass"].value_counts())

##### GROUP THE CLASSES FOR A CLEANER PICTURE AND REVIEW

* **CC**: Carbonaceous Chondrite
* **OC**: Ordinary Chondrite
* **M**: Martian
* **AA**: Asteroidal Achondrites
* **PA**: Primitive Achondrites
* **LU**: Lunar
* **EC**: Enstatite Chondrite
* **AOC**: Other Chondrite Groups (not in one of the major classes)
* **P**: Pallasites
* **MG**: Mesosiderite Group
* **MIM**: Magmatic Iron Meteorite Groups
* **NMIM**: Non-magmatic or Primitive Iron Meteorite Groups
* **UN**: Not Enough Information (Ungrouped or Unknown)


CHECK:

https://en.wikipedia.org/wiki/Meteorite_classification

In [None]:
Data.recclass.replace(to_replace=['Acapulcoite', 'Acapulcoite/Lodranite', 'Acapulcoite/lodranite',
           'Lodranite','Lodranite-an','Winonaite','Achondrite-prim'],value='PA',inplace=True)

Data.recclass.replace(to_replace=['Angrite', 'Aubrite','Aubrite-an','Ureilite', 'Ureilite-an','Ureilite-pmict',
           'Brachinite','Diogenite', 'Diogenite-an', 'Diogenite-olivine', 'Diogenite-pm',
           'Eucrite', 'Eucrite-Mg rich', 'Eucrite-an', 'Eucrite-br','Eucrite-cm',
           'Eucrite-mmict', 'Eucrite-pmict', 'Eucrite-unbr','Howardite'],value='AA',inplace=True)

# Lunar classes get the code 'LU', not 'L': 'L' is itself a raw ordinary-chondrite class,
# so the OC replacement below would otherwise turn every lunar meteorite into 'OC'.
Data.recclass.replace(to_replace=['Lunar', 'Lunar (anorth)', 'Lunar (bas. breccia)',
           'Lunar (bas/anor)', 'Lunar (bas/gab brec)', 'Lunar (basalt)',
           'Lunar (feldsp. breccia)', 'Lunar (gabbro)', 'Lunar (norite)'],value='LU',inplace=True)

Data.recclass.replace(to_replace=['Martian', 'Martian (OPX)','Martian (chassignite)', 'Martian (nakhlite)',
           'Martian (shergottite)','Martian (basaltic breccia)'],value='M',inplace=True)

Data.recclass.replace(to_replace=['C','C2','C4','C4/5','C6','C1-ung', 'C1/2-ung','C2-ung',
           'C3-ung', 'C3/4-ung','C4-ung','C5/6-ung',
           'CB', 'CBa', 'CBb', 'CH/CBb', 'CH3', 'CH3 ', 'CI1', 'CK', 'CK3',
           'CK3-an', 'CK3.8', 'CK3/4', 'CK4', 'CK4-an', 'CK4/5', 'CK5',
           'CK5/6', 'CK6', 'CM', 'CM-an', 'CM1', 'CM1/2', 'CM2', 'CM2-an',
           'CO3', 'CO3 ', 'CO3.0', 'CO3.1', 'CO3.2', 'CO3.3', 'CO3.4', 'CO3.5',
           'CO3.6', 'CO3.7', 'CO3.8', 'CR', 'CR-an', 'CR1', 'CR2', 'CR2-an',
           'CV2', 'CV3', 'CV3-an','CR7'],value='CC',inplace=True)

Data.recclass.replace(to_replace=['OC', 'OC3','H', 'H(5?)', 'H(?)4', 'H(L)3', 'H(L)3-an', 'H-an','H-imp melt',
           'H-melt rock', 'H-metal', 'H/L3', 'H/L3-4', 'H/L3.5',
           'H/L3.6', 'H/L3.7', 'H/L3.9', 'H/L4', 'H/L4-5', 'H/L4/5', 'H/L5',
           'H/L6', 'H/L6-melt rock', 'H/L~4', 'H3', 'H3 ', 'H3-4', 'H3-5',
           'H3-6', 'H3-an', 'H3.0', 'H3.0-3.4', 'H3.1', 'H3.10', 'H3.2',
           'H3.2-3.7', 'H3.2-6', 'H3.2-an', 'H3.3', 'H3.4', 'H3.4-5',
           'H3.4/3.5', 'H3.5', 'H3.5-4', 'H3.6', 'H3.6-6', 'H3.7', 'H3.7-5',
           'H3.7-6', 'H3.7/3.8', 'H3.8', 'H3.8-4', 'H3.8-5', 'H3.8-6',
           'H3.8-an', 'H3.8/3.9', 'H3.8/4', 'H3.9', 'H3.9-5', 'H3.9-6',
           'H3.9/4', 'H3/4', 'H4', 'H4 ', 'H4(?)', 'H4-5', 'H4-6', 'H4-an',
           'H4/5', 'H4/6', 'H5', 'H5 ', 'H5-6', 'H5-7', 'H5-an',
           'H5-melt breccia', 'H5/6', 'H6', 'H6 ', 'H6-melt breccia', 'H6/7',
           'H7', 'H?','H~4', 'H~4/5', 'H~5', 'H~6','L', 'L(?)3',
           'L(H)3', 'L(LL)3', 'L(LL)3.05', 'L(LL)3.5-3.7', 'L(LL)5', 'L(LL)6',
           'L(LL)~4', 'L-imp melt', 'L-melt breccia', 'L-melt rock', 'L-metal',
           'L/LL', 'L/LL(?)3', 'L/LL-melt rock', 'L/LL3', 'L/LL3-5', 'L/LL3-6',
           'L/LL3.10', 'L/LL3.2', 'L/LL3.4', 'L/LL3.5', 'L/LL3.6/3.7', 'L/LL4',
           'L/LL4-6', 'L/LL4/5', 'L/LL5', 'L/LL5-6', 'L/LL5/6', 'L/LL6',
           'L/LL6-an', 'L/LL~4', 'L/LL~5', 'L/LL~6', 'L3', 'L3-4', 'L3-5',
           'L3-6', 'L3-7', 'L3.0', 'L3.0-3.7', 'L3.0-3.9', 'L3.05', 'L3.1',
           'L3.10', 'L3.2', 'L3.2-3.5', 'L3.2-3.6', 'L3.3', 'L3.3-3.5',
           'L3.3-3.6', 'L3.3-3.7', 'L3.4', 'L3.4-3.7', 'L3.5', 'L3.5-3.7',
           'L3.5-3.8', 'L3.5-3.9', 'L3.5-5', 'L3.6', 'L3.6-4', 'L3.7',
           'L3.7-3.9', 'L3.7-4', 'L3.7-6', 'L3.7/3.8', 'L3.8', 'L3.8-5',
           'L3.8-6', 'L3.8-an', 'L3.9', 'L3.9-5', 'L3.9-6', 'L3.9/4', 'L3/4',
           'L4', 'L4 ', 'L4-5', 'L4-6', 'L4-an', 'L4-melt rock', 'L4/5', 'L5',
           'L5 ', 'L5-6', 'L5-7', 'L5/6', 'L6', 'L6 ', 'L6-melt breccia',
           'L6-melt rock', 'L6/7', 'L7', 'LL', 'LL(L)3', 'LL-melt rock', 'LL3',
           'LL3-4', 'LL3-5', 'LL3-6', 'LL3.0', 'LL3.00', 'LL3.1', 'LL3.1-3.5',
           'LL3.10', 'LL3.15', 'LL3.2', 'LL3.3', 'LL3.4', 'LL3.5', 'LL3.6',
           'LL3.7', 'LL3.7-6', 'LL3.8', 'LL3.8-6', 'LL3.9', 'LL3.9/4', 'LL3/4',
           'LL4', 'LL4-5', 'LL4-6', 'LL4/5', 'LL4/6', 'LL5', 'LL5-6', 'LL5-7',
           'LL5/6', 'LL6', 'LL6 ', 'LL6(?)', 'LL6/7', 'LL7', 'LL7(?)',
           'LL<3.5', 'LL~3', 'LL~4', 'LL~4/5', 'LL~5', 'LL~6',
           'L~3', 'L~4', 'L~5', 'L~6','Relict H','Relict OC','LL-melt breccia','H-melt breccia',
                                 'L5-melt breccia','LL-imp melt','H3.05','LL6-melt breccia',
                                 'LL3.05','H4-melt breccia','LL3.8-4','L3.00','L~4-6','LL6-an',
                                 'L4-melt breccia','LL(L)3.1','L3-melt breccia','H3.15'],value='OC',inplace=True)

Data.recclass.replace(to_replace=['EH','EH-imp melt', 'EH3', 'EH3/4-an', 'EH4', 'EH4/5', 'EH5', 'EH6',
           'EH6-an', 'EH7', 'EH7-an', 'EL3', 'EL3/4', 'EL4', 'EL4/5', 'EL5',
           'EL6', 'EL6 ', 'EL6/7', 'EL7','E','E3','E4', 'E5','E6','EL-melt rock'],value='EC',inplace=True)

Data.recclass.replace(to_replace=['K', 'K3','R', 'R3', 'R3-4', 'R3-5', 'R3-6', 'R3.4', 'R3.5-6',
           'R3.6', 'R3.7', 'R3.8', 'R3.8-5', 'R3.8-6', 'R3.9', 'R3/4', 'R4',
           'R4/5', 'R5', 'R6','R3.5-4'],value='AOC',inplace=True)

Data.recclass.replace(to_replace=['Pallasite', 'Pallasite, PES','Pallasite, PMG',
           'Pallasite, PMG-an', 'Pallasite, ungrouped',
           'Pallasite?'],value='P',inplace=True)

Data.recclass.replace(to_replace=['Mesosiderite', 'Mesosiderite-A','Mesosiderite-A1',
           'Mesosiderite-A2', 'Mesosiderite-A3','Mesosiderite-A3/4',
           'Mesosiderite-A4', 'Mesosiderite-B','Mesosiderite-B1',
           'Mesosiderite-B2', 'Mesosiderite-B4','Mesosiderite-C',
           'Mesosiderite-C2', 'Mesosiderite-an','Mesosiderite?'],value='MG',inplace=True)

Data.recclass.replace(to_replace=['Iron, IC', 'Iron, IC-an', 'Iron, IIAB', 'Iron, IIAB-an',
           'Iron, IIC', 'Iron, IID', 'Iron, IID-an','Iron, IIF', 'Iron, IIG',
           'Iron, IIIAB', 'Iron, IIIAB-an', 'Iron, IIIAB?', 'Iron, IIIE',
           'Iron, IIIE-an', 'Iron, IIIF', 'Iron, IVA', 'Iron, IVA-an',
           'Iron, IVB'],value='MIM',inplace=True)

Data.recclass.replace(to_replace=['Iron, IAB complex', 'Iron, IAB-MG','Iron, IAB-an', 'Iron, IAB-sHH',
           'Iron, IAB-sHL', 'Iron, IAB-sLH','Iron, IAB-sLL', 'Iron, IAB-sLM',
           'Iron, IAB-ung', 'Iron, IAB?','Iron, IIE',
           'Iron, IIE-an', 'Iron, IIE?'],value='NMIM',inplace=True)

Data.recclass.replace(to_replace=['Iron','Iron?','Relict iron','Chondrite-fusion crust',
           'Fusion crust','Impact melt breccia',
           'Enst achon-ung','Iron, ungrouped','Stone-uncl', 'Stone-ung',
           'Unknown','Achondrite-ung','Chondrite-ung',
           'Enst achon','E-an',  'E3-an',  'E5-an','Howardite-an','C3.0-ung','Iron, IAB-sHL-an'],value='UN',inplace=True)
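
The same grouping could also be driven from a single dictionary and applied in one pass. The sketch below is only an illustration: `groups` holds a shortened, hand-picked subset of the lists above, and `lookup`, `recclass` and `grouped` are names invented for the example. Because `Series.map` looks up each original value once, a group code can never be picked up again by a later rule.

```python
import pandas as pd

# Shortened, illustrative subset of the groups listed above (not the full lists).
groups = {
    "PA": ["Acapulcoite", "Lodranite", "Winonaite"],
    "M": ["Martian", "Martian (nakhlite)"],
    "OC": ["H5", "L6", "LL5"],
}

# Invert to a flat {raw label: group code} lookup.
lookup = {label: code for code, labels in groups.items() for label in labels}

recclass = pd.Series(["H5", "Martian", "Winonaite", "CM2", "L6"])

# Map known labels to their group; labels missing from the lookup keep their original value.
grouped = recclass.map(lookup).fillna(recclass)

print(grouped.tolist())
```

Labels that stay unchanged (here `CM2`) are easy to list afterwards, which helps to spot classes that no group covers yet.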



In [None]:
print("CLASS\n")
print(Data["recclass"].value_counts())

##### WEIGHTS ARE WRITTEN IN GRAMS, CONVERT TO KG

In [None]:
Data["mass"] = Data["mass"] / 1000

# 1000 G = 1 KG, AS YOU KNOW!

In [None]:
print(Data["mass"])

In [None]:
print(Data["year"].max())

##### YEAR 2501? HOW CAN IT BE?
##### LET'S DROP VALUES BIGGER THAN 2021

In [None]:
print(Data.where(Data["year"] > 2021).value_counts().all())

In [None]:
print(Data.where(Data["year"] > 2021).value_counts().sum())

In [None]:
print(Data.where(Data["year"] > 2021).value_counts())

In [None]:
print(Data.where(Data["year"] > 2021))

In [None]:
Data.drop(Data[Data["year"] > 2021].index,inplace=True) # drop the rows by condition instead of hard-coded row labels

In [None]:
print(Data.where(Data["year"] > 2021).value_counts().sum())

In [None]:
print(Data["year"].max())

# NOW IT'S OKAY

#### NaN CHECKING

In [None]:
plt.style.use("dark_background")

In [None]:
msno.matrix(Data,figsize=(15,5))
plt.show()

##### Latitude and longitude are missing in the same rows, so the gaps are linked rather than random. Deleting those rows would discard every meteorite with unknown coordinates, so we keep them for now.
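
The claim that both coordinates are missing together can be checked directly instead of read off the plot. A minimal sketch on invented values; in the notebook the two columns would be `Data["reclat"]` and `Data["reclong"]`.

```python
import numpy as np
import pandas as pd

# Toy coordinates for illustration only.
coords = pd.DataFrame({
    "reclat": [10.0, np.nan, -5.0, np.nan],
    "reclong": [20.0, np.nan, 30.0, np.nan],
})

lat_missing = coords["reclat"].isna()
lon_missing = coords["reclong"].isna()

# Rows where exactly one of the two values is missing (XOR of the masks).
only_one_missing = int((lat_missing ^ lon_missing).sum())
# Rows where both values are missing.
both_missing = int((lat_missing & lon_missing).sum())

print(only_one_missing, both_missing)
```

If `only_one_missing` is 0, the two columns are always missing in the same rows.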

In [None]:
msno.bar(Data,figsize=(15,5))
plt.show()

In [None]:
msno.dendrogram(Data,figsize=(15,5))
plt.show()

In [None]:
msno.heatmap(Data,figsize=(15,5))
plt.show()

In [None]:
figure = plt.figure(figsize=(15,5))
plt.title("NaN COUNT")
Nan_Checking = Data.isna().sum().sort_values(ascending=False).to_frame()
sns.heatmap(Nan_Checking,fmt="d",cmap="viridis")
plt.show()

#### NaN FILLING

In [None]:
Data["mass"] = Data["mass"].fillna(Data.groupby(["recclass"])["mass"].transform("mean")) # assign back instead of chained inplace

In [None]:
print(Data["mass"].isnull().sum())
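
A class mean is sensitive to a few very heavy specimens, so filling with the class median is a possible alternative. The sketch below compares the two on invented numbers; `toy` and its values are made up, and which statistic suits the real data is a judgment call.

```python
import numpy as np
import pandas as pd

# Toy masses: one very heavy specimen in the "OC" group.
toy = pd.DataFrame({
    "recclass": ["OC", "OC", "OC", "OC", "MIM", "MIM"],
    "mass": [1.0, 2.0, 300.0, np.nan, 50.0, np.nan],
})

by_class = toy.groupby("recclass")["mass"]

# transform() returns one value per row, aligned with the original index.
mean_filled = toy["mass"].fillna(by_class.transform("mean"))
median_filled = toy["mass"].fillna(by_class.transform("median"))

print(pd.DataFrame({"mean_fill": mean_filled, "median_fill": median_filled}))
```

Here the missing "OC" mass becomes 101.0 with the mean but 2.0 with the median.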

##### REMEMBER:
Latitude and longitude are missing in the same rows, so the gaps are linked rather than random. Deleting those rows would discard every meteorite with unknown coordinates, so we keep them for now.


* I PICKED AN ARBITRARY OPEN-OCEAN POINT WITH NO RECORDED DATA (54.57°S, 11.68°E, WHICH LIES IN THE SOUTH ATLANTIC RATHER THAN THE PACIFIC) AND ASSIGN ALL UNKNOWN COORDINATES TO IT.
* USE IT FOR YOURS: https://www.latlong.net

* LET'S SAVE THIS DATA BEFORE FILLING THE LAT AND LONG VALUES OURSELVES

In [None]:
Prot_Data = Data.copy() # for corr,cov,encoding and checking main lat-lon

In [None]:
Data["reclat"] = Data["reclat"].fillna(-54.572062) # arbitrary placeholder latitude

In [None]:
Data["reclong"] = Data["reclong"].fillna(11.675271) # arbitrary placeholder longitude

In [None]:
Data.dropna(inplace=True)

# Drop any rows that still contain NaN in the remaining columns

In [None]:
print(Data.isnull().sum())

In [None]:
msno.matrix(Data,figsize=(15,5))
plt.show()

# WE ARE OKAY NOW

#### AS CATEGORICAL - ADDITIONAL

In [None]:
Categorical_Data = Data.copy()

In [None]:
Categorical_Data["year"]

In [None]:
Categorical_Data["year"] = pd.Categorical(Categorical_Data["year"])
Categorical_Data["recclass"] = pd.Categorical(Categorical_Data["recclass"])
Categorical_Data["nametype"] = pd.Categorical(Categorical_Data["nametype"])
Categorical_Data["fall"] = pd.Categorical(Categorical_Data["fall"])
Categorical_Data["name"] = pd.Categorical(Categorical_Data["name"])

In [None]:
print(Categorical_Data.info())

#### AS ENCODED - ADDITIONAL

In [None]:
Encoded_Data = Prot_Data.copy()

In [None]:
Enc_Func = LabelEncoder()

In [None]:
print(Encoded_Data["recclass"].value_counts())
print("---"*10)
print(Encoded_Data["nametype"].value_counts())
print("---"*10)
print(Encoded_Data["fall"].value_counts())

In [None]:
Encoded_Data["recclass"] = Enc_Func.fit_transform(Encoded_Data["recclass"])
Encoded_Data["nametype"] = Enc_Func.fit_transform(Encoded_Data["nametype"])
Encoded_Data["fall"] = Enc_Func.fit_transform(Encoded_Data["fall"])
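
Reusing one `LabelEncoder` works for encoding, but each `fit_transform` call overwrites the fitted classes, so only the last column can be decoded again. A possible variation, sketched on invented rows, keeps one encoder per column; the `encoders` dictionary and the `toy` frame are names made up for this example.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy rows for illustration only.
toy = pd.DataFrame({
    "fall": ["Found", "Fell", "Found"],
    "nametype": ["Valid", "Valid", "Relict"],
})

encoders = {}
encoded = toy.copy()
for column in ["fall", "nametype"]:
    # One encoder per column, so each mapping stays available for decoding.
    encoders[column] = LabelEncoder()
    encoded[column] = encoders[column].fit_transform(toy[column])

# Decode the integer codes back to the original labels.
fall_labels = list(encoders["fall"].inverse_transform(encoded["fall"]))

print(encoded)
print(fall_labels)
```

`LabelEncoder` assigns codes in sorted label order, so "Fell" becomes 0 and "Found" becomes 1 here.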

In [None]:
print(Encoded_Data["recclass"].value_counts())
print("---"*10)
print(Encoded_Data["nametype"].value_counts())
print("---"*10)
print(Encoded_Data["fall"].value_counts())

In [None]:
print(Encoded_Data.columns)

In [None]:
Encoded_Data.drop("GeoLocation",axis=1,inplace=True)

In [None]:
Encoded_Data

#### CORRELATION WITH ENCODED DATA

In [None]:
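Corr_Pearson = Encoded_Data.select_dtypes(include="number").corr(method="pearson") # numeric columns only ("name" is still text)
Corr_Spearman = Encoded_Data.select_dtypes(include="number").corr(method="spearman")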

In [None]:
figure = plt.figure(figsize=(16,7))
plt.title("PEARSON")
sns.heatmap(Corr_Pearson,annot=True,vmin=-1,center=0,vmax=1,linewidths=2,linecolor="black",cmap="jet")
plt.show()

In [None]:
figure = plt.figure(figsize=(16,7))
plt.title("SPEARMAN")
sns.heatmap(Corr_Spearman,annot=True,vmin=-1,center=0,vmax=1,linewidths=2,linecolor="black",cmap="jet")
plt.show()
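
To see why the two heatmaps can differ, here is a small sketch on invented numbers: Pearson measures linear association, while Spearman works on ranks and so reaches 1 for any strictly increasing relationship.

```python
import pandas as pd

# Toy data: y grows monotonically with x, but not linearly.
toy = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0]})
toy["y"] = toy["x"] ** 3

pearson = toy.corr(method="pearson").loc["x", "y"]
spearman = toy.corr(method="spearman").loc["x", "y"]

print(pearson, spearman)
```

A large gap between the two coefficients for a pair of columns hints at a monotonic but non-linear relationship, or at outliers.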

#### COVARIANCE WITH ENCODED DATA

In [None]:
Cov_Result = Encoded_Data.select_dtypes(include="number").cov() # numeric columns only ("name" is still text)

In [None]:
figure = plt.figure(figsize=(16,7))
plt.title("COVARIANCE")
sns.heatmap(Cov_Result,annot=True,center=0,linewidths=2,linecolor="black",cmap="jet") # no vmin/vmax: covariances are not limited to [-1, 1]
plt.show()
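
Covariance depends on the units of the columns, while correlation is covariance rescaled by the two standard deviations. The sketch below shows that relationship on random, invented data; `mass_kg` and `year` are hypothetical stand-ins, not the notebook's columns.

```python
import numpy as np

rng = np.random.default_rng(0)
mass_kg = rng.lognormal(mean=0.0, sigma=2.0, size=500)      # hypothetical skewed masses
year = 1900 + rng.integers(0, 100, size=500).astype(float)  # hypothetical years

cov = np.cov(mass_kg, year)            # 2x2 covariance matrix, in the units of the inputs
std = np.sqrt(np.diag(cov))            # standard deviations from the diagonal
corr_from_cov = cov / np.outer(std, std)
corr = np.corrcoef(mass_kg, year)

print(cov)
print(corr_from_cov)
```

The diagonal of `cov` holds the variances, which can be far larger than 1, whereas every entry of `corr_from_cov` lies between -1 and 1.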

#### GROUPBY MEANS

In [None]:
Data

In [None]:
print("MASS MEAN:\n")
print(Data.groupby(["recclass","fall"])["mass"].mean())

In [None]:
print("MASS MEAN:\n")
print(Data.groupby(["recclass","nametype"])["mass"].mean())

In [None]:
print("MASS MEAN:\n")
print(Data.groupby(["fall","nametype"])["mass"].mean())

In [None]:
print("MASS MEAN:\n")
print(Data.groupby(["year","recclass"])["mass"].mean())

In [None]:
print("MASS MEAN:\n")
print(Data.groupby(["year","nametype"])["mass"].mean())

In [None]:
print("MASS MEAN:\n")
print(Data.groupby(["year","fall"])["mass"].mean())

In [None]:
print("LATITUDE MEAN:\n")
print(Prot_Data.groupby(["year"])["reclat"].mean()) # Prot_Data keeps the original coordinates, without the placeholder fill

In [None]:
figure = plt.figure(figsize=(15,5))
plt.title("LAT - YEAR")
plt.plot(Prot_Data.groupby(["year"])["reclat"].mean()) # Prot_Data keeps the original coordinates, without the placeholder fill

In [None]:
print("LONGITUDE MEAN:\n")
print(Prot_Data.groupby(["year"])["reclong"].mean()) # Prot_Data keeps the original coordinates, without the placeholder fill

In [None]:
figure = plt.figure(figsize=(15,5))
plt.title("LON - YEAR")
plt.plot(Prot_Data.groupby(["year"])["reclong"].mean()) # Prot_Data keeps the original coordinates, without the placeholder fill

##### SO INTERESTING! LET'S CHECK LAT AND LON MEAN FOR YEAR 2012
* LAT: 1.790740
* LON: -4.184776
* THAT IS: off the coast of Ivory Coast. WOW!
* CHECK THIS for YOURS: https://www.maps.ie/coordinates.html
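
One caveat about these yearly means: a plain average of latitudes and longitudes treats the globe as a flat grid, so points on either side of the 180° meridian average to a location on the opposite side of the Earth. A hedged alternative is to average the points as 3D unit vectors. The helper `spherical_mean` below is written for this illustration and is not part of any library.

```python
import numpy as np

def spherical_mean(lat_deg, lon_deg):
    """Return (lat, lon) in degrees of the mean direction of points on a unit sphere."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    # Convert each point to a 3D unit vector and average the components.
    x = np.mean(np.cos(lat) * np.cos(lon))
    y = np.mean(np.cos(lat) * np.sin(lon))
    z = np.mean(np.sin(lat))
    # Convert the mean vector back to latitude and longitude.
    mean_lon = np.degrees(np.arctan2(y, x))
    mean_lat = np.degrees(np.arctan2(z, np.hypot(x, y)))
    return float(mean_lat), float(mean_lon)

# Two points on the equator, one on each side of the 180° meridian.
lats = [0.0, 0.0]
lons = [179.0, -179.0]

naive = (float(np.mean(lats)), float(np.mean(lons)))
spherical = spherical_mean(lats, lons)

print(naive, spherical)
```

The naive mean lands at longitude 0, while the spherical mean stays at the 180° meridian between the two points. The result is undefined when the points cancel out exactly (mean vector of length zero).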

##### LET'S STORE THIS RESULT IN A SEPARATE DATAFRAME

In [None]:
LAT_Year = Prot_Data.groupby(["year"])["reclat"].mean()
LON_Year = Prot_Data.groupby(["year"])["reclong"].mean()

In [None]:
print(type(LAT_Year))
print(type(LON_Year))

In [None]:
Lat_Lon_Data = pd.concat([LAT_Year,LON_Year],axis=1).rename_axis("year").reset_index() # "year" moves from the index to a column
Lat_Lon_Data = Lat_Lon_Data[["reclat","reclong","year"]]
# WE WILL USE IT LATER

In [None]:
print(Lat_Lon_Data.head(-1))

In [None]:
print(Lat_Lon_Data.isnull().sum())

##### LET'S CONTINUE TO CHECK GROUPBY MEAN

In [None]:
print("MASS MEAN:\n")
print(Data.groupby(["recclass"])["mass"].mean())

In [None]:
figure = plt.figure(figsize=(15,5))
plt.title("MASS - RECCLASS")
plt.plot(Data.groupby(["recclass"])["mass"].mean())

In [None]:
print("MASS MEAN:\n")
print(Data.groupby(["fall"])["mass"].mean())

In [None]:
figure = plt.figure(figsize=(15,5))
plt.title("MASS - FALL")
plt.plot(Data.groupby(["fall"])["mass"].mean())

In [None]:
figure,axis = plt.subplots(1,3,figsize=(18,5))

axis[0].set_title("FALL-MASS")
axis[0].plot(Data.groupby(["fall"])["mass"].mean())

axis[1].set_title("CLASS-MASS")
axis[1].plot(Data.groupby(["recclass"])["mass"].mean())

axis[2].set_title("YEAR-MASS")
axis[2].plot(Data.groupby(["year"])["mass"].mean())

plt.tight_layout()
plt.show()

In [None]:
figure,axis = plt.subplots(1,3,figsize=(18,5))

axis[0].set_title("MASS")
axis[0].hist(Data["mass"])

axis[1].set_title("CLASS")
axis[1].hist(Data["recclass"])

axis[2].set_title("YEAR")
axis[2].hist(Data["year"])

plt.tight_layout()
plt.show()

In [None]:
figure = plt.figure(figsize=(12,8))
    
sns.scatterplot(x=Data["year"],y=Data["mass"],hue=Data["fall"])
plt.title("YEAR-MASS / FALL")
plt.legend(prop=dict(size=10))
    
plt.tight_layout()
plt.show()

In [None]:
figure = plt.figure(figsize=(12,8))
    
sns.scatterplot(x=Data["year"],y=Data["mass"],hue=Data["recclass"])
plt.title("YEAR-MASS / CLASS")
plt.legend(prop=dict(size=10))
    
plt.tight_layout()
plt.show()

In [None]:
figure = plt.figure(figsize=(12,8))
    
sns.lineplot(x=Data["year"],y=Data["mass"],hue=Data["fall"])
plt.title("YEAR-MASS / FALL")
plt.legend(prop=dict(size=10))
    
plt.tight_layout()
plt.show()

In [None]:
figure = plt.figure(figsize=(12,8))
    
sns.lineplot(x=Data["year"],y=Data["mass"],hue=Data["recclass"])
plt.title("YEAR-MASS / CLASS")
plt.legend(prop=dict(size=10))
    
plt.tight_layout()
plt.show()

In [None]:
figure = plt.figure(figsize=(12,8))
    
sns.barplot(x=Data["year"],y=Data["fall"])
plt.title("YEAR-FALL")
plt.legend(prop=dict(size=10))
    
plt.tight_layout()
plt.show()

In [None]:
figure = plt.figure(figsize=(12,8))
    
sns.barplot(x=Data["year"],y=Data["recclass"])
plt.title("YEAR-CLASS")
plt.legend(prop=dict(size=10))
    
plt.tight_layout()
plt.show()

In [None]:
figure = plt.figure(figsize=(12,8))

Data.year.hist(bins=np.arange(1900,2014,1))
plt.title("YEARS")

In [None]:
figure = plt.figure(figsize=(12,8))

Data.mass.hist(bins=np.arange(0.05,27,1))
plt.title("MASS")

In [None]:
figure = plt.figure(figsize=(12,8))

Data.nametype.hist()
plt.title("TYPE")

#### SPECIAL SPLITTING

In [None]:
Fall_Type_Found = Data[Data["fall"] == "Found"]
Fall_Type_Fell = Data[Data["fall"] == "Fell"]

In [None]:
Fall_Type_Found = Fall_Type_Found.reset_index(drop=True)
Fall_Type_Fell = Fall_Type_Fell.reset_index(drop=True)

In [None]:
print(Fall_Type_Found.head(-1))

In [None]:
print(Fall_Type_Fell.head(-1))

In [None]:
figure = plt.figure(figsize=(12,8))

Fall_Type_Found.year.hist(bins=np.arange(1900,2014,1))
plt.title("YEARS")

In [None]:
figure = plt.figure(figsize=(12,8))

Fall_Type_Fell.year.hist(bins=np.arange(1900,2014,1))
plt.title("YEARS")

#### ANOTHER CHECKING

In [None]:
plot_diff([Fall_Type_Found,Fall_Type_Fell])

In [None]:
plot(Data)

In [None]:
plot(Data, "mass")

In [None]:
plot(Data, "year")

In [None]:
plot_correlation(Encoded_Data)

In [None]:
pp.ProfileReport(Data)

#### MAP

* GENERAL CHECK

In [None]:
figure = plt.figure(figsize=(17,12))

Map_Plot = Basemap(projection="cyl",resolution='c')
Map_Plot.drawmapboundary(fill_color="w")
Map_Plot.drawcoastlines(linewidth=0.5)
Map_Plot.drawmeridians(range(0, 360, 20),linewidth=0.7)
Map_Plot.drawparallels([-66,-23,0.0,23,66],linewidth=0.7)
lon,lat = Map_Plot(Data["reclong"],Data["reclat"])
Map_Plot.scatter(lon,lat,marker="*",alpha=0.20,color="r",edgecolor="None")
plt.title("METEORITE IMPACT")

#### CHECK THAT POPULATION DENSITY

![](https://d3j021pzfm19r2.cloudfront.net/wp-content/uploads/2014/12/world-population-density-map.jpg)

#### CHECK THAT BIODIVERSITY MAP

![](https://upload.wikimedia.org/wikipedia/commons/d/d9/Biodiversity_Hotspots_Map.jpg)

#### THEY LOOK ALMOST THE SAME, RIGHT?
### WOW!

#### DON'T FORGET
* Homo sapiens, the first modern humans, evolved from their early hominid predecessors between 200,000 and 300,000 years ago. They developed a capacity for language about 50,000 years ago.
* The first modern humans began moving outside of Africa starting about 70,000-100,000 years ago.

In [None]:
figure = plt.figure(figsize=(17,12))

Map_Plot = Basemap(projection="cyl",resolution='c')

Map_Plot.drawcoastlines(linewidth=0.5)
Map_Plot.drawmeridians(range(0, 360, 20),linewidth=0.7)
Map_Plot.drawparallels([-66,-23,0.0,23,66],linewidth=0.7)
lon,lat = Map_Plot(Data["reclong"],Data["reclat"])
Map_Plot.scatter(lon,lat,marker="*",alpha=0.20,color="r",edgecolor="None")
plt.title("METEORITE IMPACT")

In [None]:
figure = plt.figure(figsize=(17,12))

Map_Plot = Basemap(width=12000000,height=9000000,projection='lcc',resolution='c',lat_1=45.,lat_2=55,lat_0=50,lon_0=-107.)
Map_Plot.drawmapboundary(fill_color="w")
Map_Plot.drawcoastlines(linewidth=0.5)
Map_Plot.drawmeridians(range(0, 360, 20),linewidth=0.7)
Map_Plot.drawparallels([-66,-23,0.0,23,66],linewidth=0.7)

lon,lat = Data["reclong"].iloc[0],Data["reclat"].iloc[0] # first row by position; label 0 may no longer exist after the drops

xpt,ypt = Map_Plot(lon,lat)
lonpt, latpt = Map_Plot(xpt,ypt,inverse=True)

Map_Plot.plot(xpt,ypt,'ro')
plt.title("SINGLE EXAMPLE METEORITE IMPACT")
plt.show()

In [None]:
figure = plt.figure(figsize=(17,12))
Map_Plot = Basemap(projection='cyl',llcrnrlat=10,llcrnrlon=-20,urcrnrlat=50,urcrnrlon=40,resolution='c')
Map_Plot.etopo()
Map_Plot.drawcountries()

Map_Plot.scatter(Data.reclong,Data.reclat,edgecolor='none',color='r',alpha=0.6)

plt.title("MEDITERRANEAN-AFRICA METEORITE IMPACT", fontsize=15)

In [None]:
figure = plt.figure(figsize=(17,12))
Map_Plot = Basemap(projection='cyl',llcrnrlat=5,llcrnrlon=-10,urcrnrlat=30,urcrnrlon=70,resolution='c')
Map_Plot.etopo()
Map_Plot.drawcountries()

Map_Plot.scatter(Data.reclong,Data.reclat,edgecolor='none',color='r',alpha=0.6)

plt.title("METEORITE IMPACT", fontsize=15)

In [None]:
figure = plt.figure(figsize=(17,12))
Map_Plot = Basemap(projection='cyl',llcrnrlat=50,llcrnrlon=-40,urcrnrlat=70,urcrnrlon=10,resolution='c')
Map_Plot.bluemarble()
Map_Plot.drawcountries()

Map_Plot.scatter(Data.reclong,Data.reclat,edgecolor='none',color='r',alpha=0.6)

plt.title("METEORITE IMPACT", fontsize=15)

In [None]:
figure = plt.figure(figsize=(17,12))

Map_Plot = Basemap(projection="cyl",resolution='c')
Map_Plot.drawmapboundary(fill_color="w")
Map_Plot.drawcoastlines(linewidth=0.5)
Map_Plot.drawmeridians(range(0, 360, 20),linewidth=0.7)
Map_Plot.drawparallels([-66,-23,0.0,23,66],linewidth=0.7)
lon_FE,lat_FE = Map_Plot(Fall_Type_Fell["reclong"],Fall_Type_Fell["reclat"])
lon_FO,lat_FO = Map_Plot(Fall_Type_Found["reclong"],Fall_Type_Found["reclat"])
Map_Plot.scatter(lon_FE,lat_FE,marker="*",alpha=0.20,color="r",edgecolor="None")
Map_Plot.scatter(lon_FO,lat_FO,marker="*",alpha=0.20,color="k",edgecolor="None")
plt.title("TYPE")

* MEAN FOR YEARS

* REMEMBER THAT DATA WE CREATED! IT IS IMPORTANT!

In [None]:
figure = plt.figure(figsize=(17,12))

Map_Plot = Basemap(projection="cyl",resolution='c')
Map_Plot.drawmapboundary(fill_color="w")
Map_Plot.drawcoastlines(linewidth=0.5)
Map_Plot.drawmeridians(range(0, 360, 20),linewidth=0.7)
Map_Plot.drawparallels([-66,-23,0.0,23,66],linewidth=0.7)
lon,lat = Map_Plot(Lat_Lon_Data["reclong"],Lat_Lon_Data["reclat"])
Map_Plot.scatter(lon,lat,marker=".",color="k",edgecolor="None")
plt.title("METEORITE IMPACT")

In [None]:
figure = plt.figure(figsize=(17,12))

Map_Plot = Basemap(projection="cyl",resolution='c')
Map_Plot.drawmapboundary(fill_color="w")
Map_Plot.drawcoastlines(linewidth=0.5)
Map_Plot.drawmeridians(range(0, 360, 20),linewidth=0.7)
Map_Plot.drawparallels([-66,-23,0.0,23,66],linewidth=0.7)
lon,lat = Map_Plot(Lat_Lon_Data["reclong"],Lat_Lon_Data["reclat"])
Map_Plot.plot(lon,lat,color="r")
plt.title("METEORITE IMPACT")

#### LET'S CHECK EARTHQUAKES! WHO KNOWS, WE MAY FIND A CONNECTION BETWEEN THEM

In [None]:
Eartquakes_CSV = pd.read_csv("../input/earthquake-database/database.csv")

In [None]:
print(Eartquakes_CSV.columns)

In [None]:
figure = plt.figure(figsize=(17,12))

Map_Plot = Basemap(projection="cyl",resolution='c')
Map_Plot.drawmapboundary(fill_color="w")
Map_Plot.drawcoastlines(linewidth=0.5)
Map_Plot.drawmeridians(range(0, 360, 20),linewidth=0.7)
Map_Plot.drawparallels([-66,-23,0.0,23,66],linewidth=0.7)
lon,lat = Map_Plot(Eartquakes_CSV["Longitude"],Eartquakes_CSV["Latitude"])
Map_Plot.scatter(lon,lat,marker=".",color="k",edgecolor="None")
plt.title("EARTHQUAKES")

In [None]:
figure = plt.figure(figsize=(17,12))

Map_Plot = Basemap(projection="cyl",resolution='c')
Map_Plot.drawmapboundary(fill_color="w")
Map_Plot.drawcoastlines(linewidth=0.5)
Map_Plot.drawmeridians(range(0, 360, 20),linewidth=0.7)
Map_Plot.drawparallels([-66,-23,0.0,23,66],linewidth=0.7)
lon,lat = Map_Plot(Eartquakes_CSV["Longitude"],Eartquakes_CSV["Latitude"])
lon_M,lat_M = Map_Plot(Data["reclong"],Data["reclat"])
Map_Plot.scatter(lon_M,lat_M,marker="*",alpha=0.20,color="r",edgecolor="None")
Map_Plot.scatter(lon,lat,marker=".",color="k",edgecolor="None")
plt.title("EARTHQUAKES")

##### NO! THE EARTHQUAKES ONLY TRACE THE TECTONIC PLATE BOUNDARIES!

#### MAYBE UFO SIGHTING?

In [None]:
UFO_Csv = pd.read_csv("../input/ufo-sightings-around-the-world/ufo_sighting_data.csv",low_memory=False)

In [None]:
print(UFO_Csv.columns)
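
`low_memory=False` is often needed when a column mixes numbers and text. If that applies to the coordinate columns here (an assumption, not something verified in this notebook), they can be coerced to numbers before plotting. The frame and the malformed value below are invented for illustration.

```python
import pandas as pd

# Toy frame with one malformed latitude string.
toy = pd.DataFrame({
    "latitude": ["29.88", "33q.2", "21.41"],
    "longitude": [-97.94, -96.5, -157.8],
})

# errors="coerce" turns unparseable entries into NaN instead of raising an error.
toy["latitude"] = pd.to_numeric(toy["latitude"], errors="coerce")
toy["longitude"] = pd.to_numeric(toy["longitude"], errors="coerce")

# Keep only the rows where both coordinates are usable.
plottable = toy.dropna(subset=["latitude", "longitude"])

print(plottable)
```

Only rows with two valid coordinates remain in `plottable`, which is what a scatter plot on the map needs.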

In [None]:
figure = plt.figure(figsize=(17,12))

Map_Plot = Basemap(projection="cyl",resolution='c')
Map_Plot.drawmapboundary(fill_color="w")
Map_Plot.drawcoastlines(linewidth=0.5)
Map_Plot.drawmeridians(range(0, 360, 20),linewidth=0.7)
Map_Plot.drawparallels([-66,-23,0.0,23,66],linewidth=0.7)
lon,lat = Map_Plot(UFO_Csv["longitude"],UFO_Csv["latitude"])
Map_Plot.scatter(lon,lat,marker=".",color="k",edgecolor="None")
plt.title("UFO SIGHTING")

##### MOST SIGHTINGS ARE IN AMERICA, OF COURSE! HOLLYWOOD!
##### NO WAY!

#### Locations of some prominent places on Earth? LET'S CHECK!

In [None]:
Wonder_CSV = pd.read_csv("../input/wonders-of-world/wonders_of_world.csv")

In [None]:
print(Wonder_CSV.columns)

In [None]:
figure = plt.figure(figsize=(17,12))

Map_Plot = Basemap(projection="cyl",resolution='c')
Map_Plot.drawmapboundary(fill_color="w")
Map_Plot.drawcoastlines(linewidth=0.5)
Map_Plot.drawmeridians(range(0, 360, 20),linewidth=0.7)
Map_Plot.drawparallels([-66,-23,0.0,23,66],linewidth=0.7)
lon,lat = Map_Plot(Wonder_CSV["Longitude"],Wonder_CSV["Latitude"])
Map_Plot.scatter(lon,lat,marker="o",color="k",edgecolor="None")
plt.title("PROMINENT PLACES")

In [None]:
figure = plt.figure(figsize=(17,12))

Map_Plot = Basemap(projection="cyl",resolution='c')
Map_Plot.drawmapboundary(fill_color="w")
Map_Plot.drawcoastlines(linewidth=0.5)
Map_Plot.drawmeridians(range(0, 360, 20),linewidth=0.7)
Map_Plot.drawparallels([-66,-23,0.0,23,66],linewidth=0.7)
lon,lat = Map_Plot(Wonder_CSV["Longitude"],Wonder_CSV["Latitude"])
lon_M,lat_M = Map_Plot(Data["reclong"],Data["reclat"])
Map_Plot.scatter(lon_M,lat_M,marker="*",alpha=0.20,color="r",edgecolor="None")
Map_Plot.scatter(lon,lat,marker="o",color="k",edgecolor="None")
plt.title("PROMINENT PLACES")

##### HMMM? But it could also be about civilizations.

## END OF THE PROJECT! THANK YOU!