# <center> Meteorite Data Analysis

**The dataset contains the following variables:**

* `name`: the name of the meteorite
* `id`: a unique identifier for the meteorite
* `nametype`: one of `valid` (a typical meteorite) or `relict` (a meteorite that has been highly degraded by weathering on Earth)
* `recclass`: the class of the meteorite
* `mass`: the mass of the meteorite
* `fall`: whether the meteorite was seen falling or was discovered after its impact; one of:
  * `Fell`: the meteorite's fall was observed
  * `Found`: the meteorite's fall was not observed
* `year`: the year the meteorite fell, or the year it was found (depending on the value of `fall`)
* `reclat`: the latitude of the meteorite's landing
* `reclong`: the longitude of the meteorite's landing
* `GeoLocation`: a parentheses-enclosed, comma-separated tuple that combines `reclat` and `reclong`
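
Each row maps naturally onto a small record type. The sketch below is only an illustration and uses plain Python without pandas. The `Meteorite` dataclass and the `parse_rows` helper are hypothetical names. The sketch also assumes the CSV header uses the capitalized column names shown in the `head()` output further down (`Name`, `Id`, `Mass`, ...). The first two sample rows are copied from that output; the third is a made-up row that shows how empty cells are handled.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional


@dataclass
class Meteorite:
    # Hypothetical record type mirroring a subset of the dataset's columns
    name: str
    id: int
    nametype: str
    recclass: str
    mass: Optional[float]
    fall: str
    year: Optional[str]
    reclat: Optional[float]
    reclong: Optional[float]


def _opt_float(value: str) -> Optional[float]:
    # Empty CSV cells become None instead of raising ValueError
    return float(value) if value.strip() else None


def parse_rows(text: str) -> list[Meteorite]:
    """Hypothetical helper: parse CSV text (assumed header Name, Id, ...) into records."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        records.append(
            Meteorite(
                name=row["Name"],
                id=int(row["Id"]),
                nametype=row["Nametype"],
                recclass=row["Recclass"],
                mass=_opt_float(row["Mass"]),
                fall=row["Fall"],
                year=row["Year"] or None,
                reclat=_opt_float(row["Reclat"]),
                reclong=_opt_float(row["Reclong"]),
            )
        )
    return records


# Rows 1-2 come from the head() output; row 3 is a made-up row with empty cells
SAMPLE = """Name,Id,Nametype,Recclass,Mass,Fall,Year,Reclat,Reclong
Aachen,1,Valid,L5,21.0,Fell,1880-01-01T00:00:00.000,50.775,6.08333
Aarhus,2,Valid,H6,720.0,Fell,1951-01-01T00:00:00.000,56.18333,10.23333
Hypothetical,0,Valid,L6,,Found,,,
"""

rows = parse_rows(SAMPLE)
print(rows[0].name, rows[0].mass, rows[2].mass)  # Aachen 21.0 None
```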

#### Insights to be drawn

* [Get all the Earth meteorites that fell before the year 2000](#two)
* [Get the coordinates of all Earth meteorites that fell before the year 1970](#co)
* [Assuming the mass of the Earth meteorites was recorded in kg, get all those with a mass of more than 10,000 kg](#ten)

In [1]:
# Import all the required libraries
import pandas as pd
from datetime import datetime
import plotly.express as px
import plotly.graph_objects as go
import plotly.offline as pyo
from plotly.offline import init_notebook_mode
init_notebook_mode(connected=True)

In [2]:
# load dataset 
meteroite_df = pd.read_csv("Meteorite Data.csv")
meteroite_df.head() # top 5 entries

|   | Name | Id | Nametype | Recclass | Mass | Fall | Year | Reclat | Reclong | Type geolocation | Coordinates geolocation | :@computed_region_cbhk_fwbd | :@computed_region_nnqa_25f4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Aachen | 1 | Valid | L5 | 21.0 | Fell | 1880-01-01T00:00:00.000 | 50.775 | 6.08333 | Point | [6.08333, 50.775] | | |
| 1 | Aarhus | 2 | Valid | H6 | 720.0 | Fell | 1951-01-01T00:00:00.000 | 56.18333 | 10.23333 | Point | [10.23333, 56.18333] | | |
| 2 | Abee | 6 | Valid | EH4 | 107000.0 | Fell | 1952-01-01T00:00:00.000 | 54.21667 | -113.0 | Point | [-113, 54.21667] | | |
| 3 | Acapulco | 10 | Valid | Acapulcoite | 1914.0 | Fell | 1976-01-01T00:00:00.000 | 16.88333 | -99.9 | Point | [-99.9, 16.88333] | | |
| 4 | Achiras | 370 | Valid | L6 | 780.0 | Fell | 1902-01-01T00:00:00.000 | -33.16667 | -64.95 | Point | [-64.95, -33.16667] | | |


In [3]:
# drop the columns that are not needed for this analysis
drop_columns = ["Id", ":@computed_region_cbhk_fwbd", ":@computed_region_nnqa_25f4", "Type geolocation"]
meteroite_df = meteroite_df.drop(columns=drop_columns)  # columns= already implies axis=1
meteroite_df.head()

|   | Name | Nametype | Recclass | Mass | Fall | Year | Reclat | Reclong | Coordinates geolocation |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Aachen | Valid | L5 | 21.0 | Fell | 1880-01-01T00:00:00.000 | 50.775 | 6.08333 | [6.08333, 50.775] |
| 1 | Aarhus | Valid | H6 | 720.0 | Fell | 1951-01-01T00:00:00.000 | 56.18333 | 10.23333 | [10.23333, 56.18333] |
| 2 | Abee | Valid | EH4 | 107000.0 | Fell | 1952-01-01T00:00:00.000 | 54.21667 | -113.0 | [-113, 54.21667] |
| 3 | Acapulco | Valid | Acapulcoite | 1914.0 | Fell | 1976-01-01T00:00:00.000 | 16.88333 | -99.9 | [-99.9, 16.88333] |
| 4 | Achiras | Valid | L6 | 780.0 | Fell | 1902-01-01T00:00:00.000 | -33.16667 | -64.95 | [-64.95, -33.16667] |


In [4]:
print("size")
meteroite_df.shape

size


(1000, 9)

In [5]:
print("basic info about dataset")
meteroite_df.info()

basic info about dataset
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 9 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   Name                     1000 non-null   object 
 1   Nametype                 1000 non-null   object 
 2   Recclass                 1000 non-null   object 
 3   Mass                     972 non-null    float64
 4   Fall                     1000 non-null   object 
 5   Year                     999 non-null    object 
 6   Reclat                   988 non-null    float64
 7   Reclong                  988 non-null    float64
 8   Coordinates geolocation  988 non-null    object 
dtypes: float64(3), object(6)
memory usage: 70.4+ KB


In [6]:
# count the missing values in each column
meteroite_df.isnull().sum()

Name                        0
Nametype                    0
Recclass                    0
Mass                       28
Fall                        0
Year                        1
Reclat                     12
Reclong                    12
Coordinates geolocation    12
dtype: int64
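
Raw counts are easier to judge as a share of the rows. Taking `isnull().mean()` gives the fraction of missing entries per column, because `True` counts as 1. A minimal sketch on a toy frame; the values below are made up for illustration and are not taken from the dataset:

```python
import numpy as np
import pandas as pd

# Toy frame reusing the notebook's column names; the values are illustrative only
toy = pd.DataFrame(
    {
        "Mass": [21.0, np.nan, 720.0, np.nan],
        "Year": ["1880", "1951", None, "1976"],
        "Reclat": [50.775, 56.18333, 54.21667, 16.88333],
    }
)

# Mean of a boolean mask = fraction of True values = fraction missing
missing_pct = toy.isnull().mean().mul(100).round(1)
print(missing_pct)
```

On `meteroite_df` at this point (1,000 rows), the same expression would give 2.8% for `Mass`, 1.2% for each coordinate column and 0.1% for `Year`, consistent with the counts above.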

In [7]:
# check whether any row is an exact duplicate of another
any_duplicate_row = bool(meteroite_df.duplicated().any())
any_duplicate_row

False
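
With the default `keep="first"`, `duplicated()` flags every repeat of a row after its first occurrence, so `.any()` asks whether at least one duplicate exists. The illustrative frame below (made-up rows) also shows the `subset=` parameter. It restricts the comparison to chosen columns, which can catch rows that differ only in an identifier.

```python
import pandas as pd

# Illustrative rows: the third repeats the first except for its Id
toy = pd.DataFrame(
    {
        "Id": [1, 2, 3],
        "Name": ["A", "B", "A"],
        "Mass": [21.0, 720.0, 21.0],
    }
)

# Full-row comparison: the Id differs, so no row counts as a duplicate
full_dup = toy.duplicated()
# Compare only Name and Mass: the third row now repeats the first
subset_dup = toy.duplicated(subset=["Name", "Mass"])

print(bool(full_dup.any()))  # False
print(subset_dup.tolist())   # [False, False, True]
```

In the notebook, `Id` was already dropped before this check, so the full-row comparison there ignores identifiers.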

In [8]:
# drop the one row with a missing Year from the dataset
meteroite_df.dropna(axis=0, subset=["Year"], inplace=True)
meteroite_df["Year"].isnull().sum()

0

In [9]:
# summary statistics of the numeric columns
meteroite_df.describe()

|   | Mass | Reclat | Reclong |
|---|---|---|---|
| count | 971.0 | 987.0 | 987.0 |
| mean | 50241.62 | 29.721674 | 19.170612 |
| std | 754372.5 | 23.196879 | 68.676105 |
| min | 0.15 | -44.11667 | -157.86667 |
| 25% | 680.0 | 21.491945 | -5.241665 |
| 50% | 2900.0 | 35.95 | 17.53333 |
| 75% | 10100.0 | 45.819 | 76.008335 |
| max | 23000000.0 | 66.34833 | 174.4 |


In [10]:
# statistical description of the categorical columns
meteroite_df.describe(include=object)

|   | Name | Nametype | Recclass | Fall | Year | Coordinates geolocation |
|---|---|---|---|---|---|---|
| count | 999 | 999 | 999 | 999 | 999 | 987 |
| unique | 999 | 1 | 118 | 2 | 247 | 986 |
| top | Aachen | Valid | L6 | Fell | 1933-01-01T00:00:00.000 | [12.43333, 7.05] |
| freq | 1 | 999 | 242 | 996 | 16 | 2 |


### Get all the Earth meteorites that fell before the year 2000 <a name="two"></a>

In [11]:
# Parse each timestamp string and keep only its year as an integer
meteroite_df["Year"] = meteroite_df['Year'].apply(lambda y: datetime.strptime(y, "%Y-%m-%dT%H:%M:%S.%f").year)
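
After this cell, `Year` holds plain integers rather than datetime values. Parsing in plain Python avoids the range limit of pandas' nanosecond-resolution timestamps, which cover only about the years 1677 to 2262. That matters here because this dataset goes back to 861. The standalone sketch below uses a hypothetical `extract_year` helper that also tolerates missing values. It assumes the same string format as the cell above; `%Y` matches exactly four digits, so early years must be zero-padded, e.g. `0861-...`.

```python
from datetime import datetime
from typing import Optional

# Format of the Year strings, e.g. "1880-01-01T00:00:00.000"
YEAR_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"


def extract_year(value: Optional[str]) -> Optional[int]:
    """Hypothetical helper: return the year of a timestamp string, or None if missing."""
    if not isinstance(value, str) or not value:
        # NaN (a float), None or "" : nothing to parse
        return None
    return datetime.strptime(value, YEAR_FORMAT).year


print(extract_year("1880-01-01T00:00:00.000"))  # 1880
print(extract_year("0861-01-01T00:00:00.000"))  # 861
print(extract_year(None))                       # None
```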

In [12]:
meteriote_before_2000 = meteroite_df.loc[(meteroite_df["Year"] < 2000) & (meteroite_df["Fall"] != "Found")]
meteriote_before_2000

|   | Name | Nametype | Recclass | Mass | Fall | Year | Reclat | Reclong | Coordinates geolocation |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Aachen | Valid | L5 | 21.0 | Fell | 1880 | 50.77500 | 6.08333 | [6.08333, 50.775] |
| 1 | Aarhus | Valid | H6 | 720.0 | Fell | 1951 | 56.18333 | 10.23333 | [10.23333, 56.18333] |
| 2 | Abee | Valid | EH4 | 107000.0 | Fell | 1952 | 54.21667 | -113.00000 | [-113, 54.21667] |
| 3 | Acapulco | Valid | Acapulcoite | 1914.0 | Fell | 1976 | 16.88333 | -99.90000 | [-99.9, 16.88333] |
| 4 | Achiras | Valid | L6 | 780.0 | Fell | 1902 | -33.16667 | -64.95000 | [-64.95, -33.16667] |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 994 | Timochin | Valid | H5 | 65500.0 | Fell | 1807 | 54.50000 | 35.20000 | [35.2, 54.5] |
| 995 | Tirupati | Valid | H6 | 230.0 | Fell | 1934 | 13.63333 | 79.41667 | [79.41667, 13.63333] |
| 997 | Tjabe | Valid | H6 | 20000.0 | Fell | 1869 | -7.08333 | 111.53333 | [111.53333, -7.08333] |
| 998 | Tjerebon | Valid | L5 | 16500.0 | Fell | 1922 | -6.66667 | 106.58333 | [106.58333, -6.66667] |
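
Combining conditions in `.loc` needs the element-wise `&` operator, with parentheses around each comparison because `&` binds more tightly than `<` and `==`. A minimal sketch on an illustrative frame (not the real data) also shows `DataFrame.query` as an equivalent alternative. Because `Fall` only takes the values `Fell` and `Found`, the notebook's `!= "Found"` and the `== "Fell"` used here select the same rows.

```python
import pandas as pd

# Illustrative rows only
toy = pd.DataFrame(
    {
        "Name": ["A", "B", "C", "D"],
        "Fall": ["Fell", "Found", "Fell", "Fell"],
        "Year": [1880, 1951, 2011, 1976],
    }
)

# Each comparison is parenthesized before combining with &
mask = (toy["Year"] < 2000) & (toy["Fall"] == "Fell")
before_2000 = toy.loc[mask]

# The same filter written with query(); column names are referenced directly
via_query = toy.query("Year < 2000 and Fall == 'Fell'")

print(before_2000["Name"].tolist())  # ['A', 'D']
```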


In [13]:
print("Minimum - Maximum Year\n")
meteroite_df["Year"].min(), meteroite_df["Year"].max()

Minimum - Maximum Year



(861, 2013)

In [14]:
# histogram of the number of meteorites per year, colored by Fall
fig = px.histogram(data_frame=meteroite_df, x="Year", color="Fall")
# bargap sets the gap between bars
fig.update_layout(bargap=0.1, title="Histogram of meteorite counts by year")
fig.show()

**`The histogram shows that the largest number of meteorites fell between 1920 and 1940; the most frequent single year in the data is 1933, with 16 entries.`**
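
The histogram reading can be checked numerically by binning years into decades with integer division. The sketch uses a made-up list of years; on the real data, the same two lines would run on `meteroite_df["Year"]`.

```python
import pandas as pd

# Illustrative years only; substitute meteroite_df["Year"] for the real data
years = pd.Series([1880, 1925, 1929, 1933, 1933, 1938, 1951, 1976])

# Floor each year to the start of its decade: 1933 -> 1930
decades = (years // 10) * 10
counts = decades.value_counts().sort_index()

print(counts)
print("Busiest decade:", counts.idxmax())
```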

### Get the coordinates of all Earth meteorites that fell before the year 1970 <a name="co"></a>

In [15]:
# strip the square brackets so each entry becomes a plain "lon, lat" string; missing values become ""
meteroite_df["Coordinates geolocation"] = meteroite_df["Coordinates geolocation"].apply(
    lambda cord: cord.strip('[]').strip() if isinstance(cord, str) else "")

In [16]:
meteroite_df

|   | Name | Nametype | Recclass | Mass | Fall | Year | Reclat | Reclong | Coordinates geolocation |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Aachen | Valid | L5 | 21.0 | Fell | 1880 | 50.77500 | 6.08333 | 6.08333, 50.775 |
| 1 | Aarhus | Valid | H6 | 720.0 | Fell | 1951 | 56.18333 | 10.23333 | 10.23333, 56.18333 |
| 2 | Abee | Valid | EH4 | 107000.0 | Fell | 1952 | 54.21667 | -113.00000 | -113, 54.21667 |
| 3 | Acapulco | Valid | Acapulcoite | 1914.0 | Fell | 1976 | 16.88333 | -99.90000 | -99.9, 16.88333 |
| 4 | Achiras | Valid | L6 | 780.0 | Fell | 1902 | -33.16667 | -64.95000 | -64.95, -33.16667 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 995 | Tirupati | Valid | H6 | 230.0 | Fell | 1934 | 13.63333 | 79.41667 | 79.41667, 13.63333 |
| 996 | Tissint | Valid | Martian (shergottite) | 7000.0 | Fell | 2011 | 29.48195 | -7.61123 | -7.61123, 29.48195 |
| 997 | Tjabe | Valid | H6 | 20000.0 | Fell | 1869 | -7.08333 | 111.53333 | 111.53333, -7.08333 |
| 998 | Tjerebon | Valid | L5 | 16500.0 | Fell | 1922 | -6.66667 | 106.58333 | 106.58333, -6.66667 |
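
After cleaning, the column holds strings such as `"6.08333, 50.775"`, or `""` where coordinates were missing. Comparing with `Reclat` and `Reclong` shows that longitude comes first, as in GeoJSON. A small hypothetical helper, `parse_lon_lat`, can turn such a string back into numbers:

```python
from typing import Optional, Tuple


def parse_lon_lat(text: str) -> Optional[Tuple[float, float]]:
    """Hypothetical helper: split a cleaned "lon, lat" string into floats; "" gives None."""
    if not text.strip():
        return None
    lon_str, lat_str = text.split(",")
    # float() ignores the surrounding whitespace left after the comma
    return float(lon_str), float(lat_str)


point = parse_lon_lat("6.08333, 50.775")  # Aachen, from the table above
print(point)
```

`Reclat` and `Reclong` already hold the same values as floats, so they are the simpler source for plotting.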


In [17]:
# coordinates of meteorites whose fall was observed before the year 1970
coordinate_before_1970 = meteroite_df.loc[(meteroite_df["Year"] < 1970) &
                                          (meteroite_df["Fall"] == "Fell")]
coordinate_before_1970["Coordinates geolocation"]

0          6.08333, 50.775
1       10.23333, 56.18333
2           -113, 54.21667
4        -64.95, -33.16667
5               71.8, 32.1
              ...         
994             35.2, 54.5
995     79.41667, 13.63333
997    111.53333, -7.08333
998    106.58333, -6.66667
999        34.76667, 47.85
Name: Coordinates geolocation, Length: 780, dtype: object

In [18]:
# map the landing locations of meteorites that fell before 1970
fig2 = px.scatter_geo(coordinate_before_1970,
                      lat="Reclat",
                      lon="Reclong",
                      hover_name="Name",
                      color="Year",
                      template="simple_white",
                      hover_data=["Mass"])

fig2.update_layout(title="Meteorite fall locations before 1970")
fig2.show()

**`780 meteorites in this dataset were observed falling before the year 1970.`**

### Assuming the mass of the Earth meteorites was recorded in kg, get all those with a mass of more than 10,000 kg <a name="ten"></a>

In [19]:
# meteorites whose mass is more than 10,000 kg
meteroite_mass_more_10000kg = meteroite_df[meteroite_df["Mass"] > 10000]
meteroite_mass_more_10000kg

|   | Name | Nametype | Recclass | Mass | Fall | Year | Reclat | Reclong | Coordinates geolocation |
|---|---|---|---|---|---|---|---|---|---|
| 2 | Abee | Valid | EH4 | 107000.0 | Fell | 1952 | 54.21667 | -113.00000 | -113, 54.21667 |
| 7 | Agen | Valid | H5 | 30000.0 | Fell | 1814 | 44.21667 | 0.61667 | 0.61667, 44.21667 |
| 11 | Aïr | Valid | L6 | 24000.0 | Fell | 1925 | 19.08333 | 8.38333 | 8.38333, 19.08333 |
| 16 | Akyumak | Valid | Iron, IVA | 50000.0 | Fell | 1981 | 39.91667 | 42.81667 | 42.81667, 39.91667 |
| 27 | Alfianello | Valid | L6 | 228000.0 | Fell | 1883 | 45.26667 | 10.15000 | 10.15, 45.26667 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 991 | Tieschitz | Valid | H/L3.6 | 28000.0 | Fell | 1878 | 49.60000 | 17.11667 | 17.11667, 49.6 |
| 992 | Tilden | Valid | L6 | 74800.0 | Fell | 1927 | 38.20000 | -89.68333 | -89.68333, 38.2 |
| 994 | Timochin | Valid | H5 | 65500.0 | Fell | 1807 | 54.50000 | 35.20000 | 35.2, 54.5 |
| 997 | Tjabe | Valid | H6 | 20000.0 | Fell | 1869 | -7.08333 | 111.53333 | 111.53333, -7.08333 |
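
Rows with a missing `Mass` (28 in the raw data) drop out of this filter silently, because any comparison with `NaN` evaluates to `False`. The sketch below makes that explicit on illustrative values. Under the section's kg assumption, it also converts the selected masses to tonnes.

```python
import numpy as np
import pandas as pd

# Illustrative masses, assumed to be in kg as in the section heading
mass = pd.Series([21.0, np.nan, 107000.0, 720.0, 30000.0])

heavy = mass[mass > 10000]          # NaN > 10000 is False, so NaN rows are excluded
n_missing = int(mass.isna().sum())  # rows that could not be judged at all

print(heavy.tolist())           # [107000.0, 30000.0]
print(n_missing)                # 1
print((heavy / 1000).tolist())  # kg -> tonnes: [107.0, 30.0]
```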


In [20]:
# scatter plot of mass against meteorite class
mass = meteroite_mass_more_10000kg["Mass"] / 1000  # kg -> tonnes under the kg assumption
fig3 = px.scatter(data_frame=meteroite_mass_more_10000kg, x="Recclass", y=mass, color="Year")
# title and axis label
fig3.update_layout(title="Mass of meteorites by class", yaxis_title="Mass (tonnes)")
fig3.show()
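
The scatter plot shows individual meteorites, but a table aggregated per class can be easier to read. A sketch with `groupby` and named aggregation follows. It uses only the nine rows visible in the filtered table above; the elided middle rows are not included, so the numbers do not describe the full result.

```python
import pandas as pd

# Only the rows visible in the filtered table above
heavy = pd.DataFrame(
    {
        "Name": ["Abee", "Agen", "Aïr", "Akyumak", "Alfianello",
                 "Tieschitz", "Tilden", "Timochin", "Tjabe"],
        "Recclass": ["EH4", "H5", "L6", "Iron, IVA", "L6",
                     "H/L3.6", "L6", "H5", "H6"],
        "Mass": [107000.0, 30000.0, 24000.0, 50000.0, 228000.0,
                 28000.0, 74800.0, 65500.0, 20000.0],
    }
)

# One row per class: number of meteorites, total mass and largest single mass
summary = (
    heavy.groupby("Recclass")["Mass"]
    .agg(count="count", total="sum", largest="max")
    .sort_values("count", ascending=False)
)
print(summary)
```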