# Introduction to the dataset being explored!

### Summarize the characteristics of the dataset in words: what does it represent, what are the fields/columns/rows, what data types are they, etc (30 pts)


The Meteoritical Society gathers data about meteorites that have fallen to Earth from space. The dataset used here is a 1000-row sample of the full dataset, which has about 45,000 rows. Each row describes one meteorite that has impacted our planet: its name, id, nametype, recclass, mass, fall, year, reclat, reclong, and geolocation.

The dataset contains the following variables:

->name: Meteorite name

->id: Meteorite's unique identifier

->nametype: 1. valid: a typical meteorite / 2. relict: a meteorite that has been highly degraded by weather on Earth

->recclass: Meteorite classification

->mass: Meteorite's mass in grams

->fall: 1. Fell: the meteorite's fall was observed / 2. Found: the meteorite's fall was not observed

->year: Year the meteorite fell or was found

->reclat: Meteorite's landing latitude

->reclong: Meteorite's landing longitude

->GeoLocation: reclat and reclong as comma separated tuple enclosed in parentheses

### The datatypes of the columns in the dataset can be retrieved with meteorite.info(), which gives the following result:

->name: object

->id: int64

->nametype: object

->recclass: object

->mass: float64

->fall: object

->year: object

->reclat: float64

->reclong: float64

->GeoLocation: object
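
As a small illustration of how dtypes like these come about, here is a minimal sketch on a two-row frame. The frame `sample` and its values are made up for the example and are not rows from the dataset. It shows that a column of whole numbers is stored as int64, while a numeric column with a missing value is stored as float64.

```python
import pandas as pd

# Hypothetical two-row sample; the values are illustrative only.
sample = pd.DataFrame(
    {
        "name": ["Example A", "Example B"],
        "id": [1, 2],
        "mass": [21.0, None],  # the missing value forces a float column
        "year": ["1880-01-01T00:00:00.000", "1951-01-01T00:00:00.000"],
    }
)

dtypes = sample.dtypes  # one dtype per column
sample.info()           # prints non-null counts next to the same dtypes
```

Text columns such as name and year stay as strings until they are converted explicitly, which is why year needs extra handling before it can be used as a date.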

### What is the "name" of the dataset?

The dataset is called 'Earth_Meteorite_Landings'.

### Where did you obtain it?

The dataset is available on the NASA website under the Space-Science -> Meteorite-Landings section.

### Where can we obtain it? (i.e., URL)

Two versions of the dataset are available on NASA's website. The full dataset, with 45,716 rows, can be found here https://data.nasa.gov/Space-Science/Meteorite-Landings/gh4g-9sfh/data . For this project I selected a sample dataset with 1000 entries, which can be found here https://data.nasa.gov/resource/gh4g-9sfh.json
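
To give a feel for the shape of the JSON records, here is a minimal sketch that parses two hand-written records with `pd.read_json`. The text in `json_text` is an assumption about the record layout, written to mirror the first two rows shown later in this notebook; it is not downloaded from the URL above.

```python
import io

import pandas as pd

# Two hand-written records shaped like the rows shown in this notebook
# (assumed layout, not fetched from the API).
json_text = """
[
  {"name": "Aachen", "id": 1, "fall": "Fell",
   "year": "1880-01-01T00:00:00.000",
   "geolocation": {"latitude": "50.775", "longitude": "6.08333"}},
  {"name": "Aarhus", "id": 2, "fall": "Fell",
   "year": "1951-01-01T00:00:00.000",
   "geolocation": {"latitude": "56.18333", "longitude": "10.23333"}}
]
"""

records = pd.read_json(io.StringIO(json_text))

# The nested geolocation object becomes one dict per row.
first_location = records["geolocation"].iloc[0]
```

Because geolocation arrives as a nested object, the separate reclat and reclong columns are the easier ones to use for plotting.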

### What is the license of the dataset? What are we allowed to do with it?

This dataset is publicly available on the NASA website, and we are allowed to explore it.
NASA's privacy policy can be found here https://www.nasa.gov/about/highlights/HP_Privacy.html

### How big is it in file size and in items?

The size of the file (sample data used for analysis) is 242 KB and has 1000 rows of entries.

### Part 2 Final Project


In [1]:
#Importing the libraries needed for Part 2
import json
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import bqplot
import ipywidgets
import bqplot.pyplot

In [2]:
#Reading the dataset again for Part 2, as additional filtering and manipulation of the year field may be required
meteor_landing = pd.read_json("Earth_Meteorite_Landings.json") #reading the meteorite json dataset again
meteor_landing

Unnamed: 0,name,id,nametype,recclass,mass,fall,year,reclat,reclong,geolocation,:@computed_region_cbhk_fwbd,:@computed_region_nnqa_25f4
0,Aachen,1,Valid,L5,21.0,Fell,1880-01-01T00:00:00.000,50.77500,6.08333,"{'latitude': '50.775', 'longitude': '6.08333'}",,
1,Aarhus,2,Valid,H6,720.0,Fell,1951-01-01T00:00:00.000,56.18333,10.23333,"{'latitude': '56.18333', 'longitude': '10.23333'}",,
2,Abee,6,Valid,EH4,107000.0,Fell,1952-01-01T00:00:00.000,54.21667,-113.00000,"{'latitude': '54.21667', 'longitude': '-113.0'}",,
3,Acapulco,10,Valid,Acapulcoite,1914.0,Fell,1976-01-01T00:00:00.000,16.88333,-99.90000,"{'latitude': '16.88333', 'longitude': '-99.9'}",,
4,Achiras,370,Valid,L6,780.0,Fell,1902-01-01T00:00:00.000,-33.16667,-64.95000,"{'latitude': '-33.16667', 'longitude': '-64.95'}",,
...,...,...,...,...,...,...,...,...,...,...,...,...
995,Tirupati,24009,Valid,H6,230.0,Fell,1934-01-01T00:00:00.000,13.63333,79.41667,"{'latitude': '13.63333', 'longitude': '79.41667'}",,
996,Tissint,54823,Valid,Martian (shergottite),7000.0,Fell,2011-01-01T00:00:00.000,29.48195,-7.61123,"{'latitude': '29.48195', 'longitude': '-7.61123'}",,
997,Tjabe,24011,Valid,H6,20000.0,Fell,1869-01-01T00:00:00.000,-7.08333,111.53333,"{'latitude': '-7.08333', 'longitude': '111.533...",,
998,Tjerebon,24012,Valid,L5,16500.0,Fell,1922-01-01T00:00:00.000,-6.66667,106.58333,"{'latitude': '-6.66667', 'longitude': '106.583...",,


In [3]:
meteor_landing = meteor_landing.drop([":@computed_region_cbhk_fwbd", ":@computed_region_nnqa_25f4"], axis=1) #dropping extra fields which are of no use


In [4]:
meteor_landing.head()

Unnamed: 0,name,id,nametype,recclass,mass,fall,year,reclat,reclong,geolocation
0,Aachen,1,Valid,L5,21.0,Fell,1880-01-01T00:00:00.000,50.775,6.08333,"{'latitude': '50.775', 'longitude': '6.08333'}"
1,Aarhus,2,Valid,H6,720.0,Fell,1951-01-01T00:00:00.000,56.18333,10.23333,"{'latitude': '56.18333', 'longitude': '10.23333'}"
2,Abee,6,Valid,EH4,107000.0,Fell,1952-01-01T00:00:00.000,54.21667,-113.0,"{'latitude': '54.21667', 'longitude': '-113.0'}"
3,Acapulco,10,Valid,Acapulcoite,1914.0,Fell,1976-01-01T00:00:00.000,16.88333,-99.9,"{'latitude': '16.88333', 'longitude': '-99.9'}"
4,Achiras,370,Valid,L6,780.0,Fell,1902-01-01T00:00:00.000,-33.16667,-64.95,"{'latitude': '-33.16667', 'longitude': '-64.95'}"


In [5]:
meteor_landing.mass.value_counts()

4000.0     15
1000.0     14
2000.0     13
6000.0     13
5000.0     12
           ..
968.0       1
288.0       1
1915.0      1
2910.0      1
65500.0     1
Name: mass, Length: 639, dtype: int64

In [6]:
meteor_landing.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 10 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   name         1000 non-null   object 
 1   id           1000 non-null   int64  
 2   nametype     1000 non-null   object 
 3   recclass     1000 non-null   object 
 4   mass         972 non-null    float64
 5   fall         1000 non-null   object 
 6   year         999 non-null    object 
 7   reclat       988 non-null    float64
 8   reclong      988 non-null    float64
 9   geolocation  988 non-null    object 
dtypes: float64(3), int64(1), object(6)
memory usage: 78.2+ KB


In [7]:
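#Note: a missing mass is NaN, which never equals the string 'nan', and a missing year never equals '<NA>',
#so these two drops match no rows; the missing values are removed later with dropna()
meteor_landing_new = meteor_landing.drop(meteor_landing[meteor_landing.mass == 'nan'].index)
meteor_landing_new = meteor_landing_new.drop(meteor_landing_new[meteor_landing_new.year == '<NA>'].index)
#Convert year to datetime; values that cannot be converted become NaT
meteor_landing_new["year"] = pd.to_datetime(meteor_landing_new['year'], errors = 'coerce') 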
meteor_landing_new

Unnamed: 0,name,id,nametype,recclass,mass,fall,year,reclat,reclong,geolocation
0,Aachen,1,Valid,L5,21.0,Fell,1880-01-01,50.77500,6.08333,"{'latitude': '50.775', 'longitude': '6.08333'}"
1,Aarhus,2,Valid,H6,720.0,Fell,1951-01-01,56.18333,10.23333,"{'latitude': '56.18333', 'longitude': '10.23333'}"
2,Abee,6,Valid,EH4,107000.0,Fell,1952-01-01,54.21667,-113.00000,"{'latitude': '54.21667', 'longitude': '-113.0'}"
3,Acapulco,10,Valid,Acapulcoite,1914.0,Fell,1976-01-01,16.88333,-99.90000,"{'latitude': '16.88333', 'longitude': '-99.9'}"
4,Achiras,370,Valid,L6,780.0,Fell,1902-01-01,-33.16667,-64.95000,"{'latitude': '-33.16667', 'longitude': '-64.95'}"
...,...,...,...,...,...,...,...,...,...,...
995,Tirupati,24009,Valid,H6,230.0,Fell,1934-01-01,13.63333,79.41667,"{'latitude': '13.63333', 'longitude': '79.41667'}"
996,Tissint,54823,Valid,Martian (shergottite),7000.0,Fell,2011-01-01,29.48195,-7.61123,"{'latitude': '29.48195', 'longitude': '-7.61123'}"
997,Tjabe,24011,Valid,H6,20000.0,Fell,1869-01-01,-7.08333,111.53333,"{'latitude': '-7.08333', 'longitude': '111.533..."
998,Tjerebon,24012,Valid,L5,16500.0,Fell,1922-01-01,-6.66667,106.58333,"{'latitude': '-6.66667', 'longitude': '106.583..."
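
A short sketch of how missing values behave under comparison may help here. The frame `toy` below is made up for the illustration. It shows that comparing a numeric column with the string 'nan' matches nothing, while `isna()` and `dropna(subset=...)` do detect the missing entries.

```python
import numpy as np
import pandas as pd

# Made-up frame with one missing mass, one missing year and one zero mass.
toy = pd.DataFrame(
    {"mass": [21.0, np.nan, 0.0], "year": ["1880", None, "1951"]}
)

# Comparing with the string 'nan' matches nothing: NaN is a float, not text.
matches_string = int((toy["mass"] == "nan").sum())

# isna() is what actually detects missing values.
missing_mass = int(toy["mass"].isna().sum())

# Drop rows missing either field; a zero is a real value and is kept.
cleaned = toy.dropna(subset=["mass", "year"])
```

Note that `dropna` leaves the zero-mass row in place, so zeros need their own filter if they should be excluded.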


In [8]:
meteor_landing_new["year"].dtype

dtype('<M8[ns]')

In [9]:
#Checking for the minimum year value present in the dataset
meteor_landing_new['year'].min()

Timestamp('1688-01-01 00:00:00')

In [10]:
#Checking for the maximum year value present in the dataset
meteor_landing_new['year'].max()

Timestamp('2013-01-01 00:00:00')

### This tells us that the years in our dataset range from 1688 to 2013
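
One caveat about this range: it covers only the rows whose year could be converted. The sketch below uses a made-up series, `raw_years`, to show how `errors='coerce'` turns unparseable or missing entries into NaT, and how `min()` and `max()` skip them.

```python
import pandas as pd

# Made-up values: one valid timestamp, one unparseable string, one missing.
raw_years = pd.Series(["1880-01-01T00:00:00.000", "not a date", None])

parsed = pd.to_datetime(raw_years, errors="coerce")

n_unparsed = int(parsed.isna().sum())          # entries that became NaT
earliest, latest = parsed.min(), parsed.max()  # NaT entries are skipped
```

Counting the NaT entries right after the conversion is a cheap way to see how many rows the later `dropna()` will remove because of the year column.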

In [11]:
meteor_landing_new.columns

Index(['name', 'id', 'nametype', 'recclass', 'mass', 'fall', 'year', 'reclat',
       'reclong', 'geolocation'],
      dtype='object')

In [12]:
#Let's list the first 5 years from our dataset
meteor_landing_new.iloc[0:5]["year"]

0   1880-01-01
1   1951-01-01
2   1952-01-01
3   1976-01-01
4   1902-01-01
Name: year, dtype: datetime64[ns]

In [13]:
#Count of total number of unique years in the dataset
meteor_landing_new["year"].nunique()

230

In [14]:
#Count of total number of unique fall category in the dataset
meteor_landing_new["fall"].nunique()

2

In [15]:
#Displaying unique fall category in the dataset
meteor_landing_new["fall"].unique()

array(['Fell', 'Found'], dtype=object)

In [16]:
#List the total number of unique recclass present
meteor_landing_new["recclass"].nunique()

118

In [17]:
#List all the unique recclass values present in the dataset
meteor_landing_new["recclass"].unique()

array(['L5', 'H6', 'EH4', 'Acapulcoite', 'L6', 'LL3-6', 'H5', 'L',
       'Diogenite-pm', 'Unknown', 'H4', 'H', 'Iron, IVA', 'CR2-an', 'LL5',
       'CI1', 'L/LL4', 'Eucrite-mmict', 'CV3', 'Ureilite-an',
       'Stone-uncl', 'L3', 'Angrite', 'LL6', 'L4', 'Aubrite',
       'Iron, IIAB', 'Iron, IAB-sLL', 'Iron, ungrouped', 'CM2', 'OC',
       'Mesosiderite-A1', 'LL4', 'C2-ung', 'LL3.8', 'Howardite',
       'Eucrite-pmict', 'Diogenite', 'LL3.15', 'LL3.9', 'Iron, IAB-MG',
       'H/L3.9', 'Iron?', 'Eucrite', 'H4-an', 'L/LL6', 'Iron, IIIAB',
       'H/L4', 'H4-5', 'L3.7', 'LL3.4', 'Martian (chassignite)', 'EL6',
       'H3.8', 'H3-5', 'H5-6', 'Mesosiderite', 'H5-7', 'L3-6', 'H4-6',
       'Ureilite', 'Iron, IID', 'Mesosiderite-A3/4', 'CO3.3', 'H3',
       'EH3/4-an', 'Iron, IIE', 'L/LL5', 'H3.7', 'CBa', 'H4/5', 'H3/4',
       'H?', 'H3-6', 'L3.4', 'Iron, IAB-sHL', 'L3.7-6', 'EH7-an', 'Iron',
       'CR2', 'CO3.2', 'K3', 'L5/6', 'CK4', 'Iron, IIE-an', 'L3.6',
       'LL3.2', 'Pallasite', 'CO

In [18]:
#Let's drop all the rows where mass or year is zero
meteor_landing_new.drop(meteor_landing_new[meteor_landing_new['mass']==0].index, inplace=True)
meteor_landing_new.drop(meteor_landing_new[meteor_landing_new['year']==0].index, inplace=True)

In [19]:
#This will drop the remaining rows that have a Null/NA/NaT value in any field of the dataset
meteor_landing_new = meteor_landing_new.dropna()

In [20]:
#Check the datatype of all the fields in the dataset
meteor_landing_new.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 948 entries, 0 to 999
Data columns (total 10 columns):
 #   Column       Non-Null Count  Dtype         
---  ------       --------------  -----         
 0   name         948 non-null    object        
 1   id           948 non-null    int64         
 2   nametype     948 non-null    object        
 3   recclass     948 non-null    object        
 4   mass         948 non-null    float64       
 5   fall         948 non-null    object        
 6   year         948 non-null    datetime64[ns]
 7   reclat       948 non-null    float64       
 8   reclong      948 non-null    float64       
 9   geolocation  948 non-null    object        
dtypes: datetime64[ns](1), float64(3), int64(1), object(5)
memory usage: 81.5+ KB


In [21]:
meteor_landing_new

Unnamed: 0,name,id,nametype,recclass,mass,fall,year,reclat,reclong,geolocation
0,Aachen,1,Valid,L5,21.0,Fell,1880-01-01,50.77500,6.08333,"{'latitude': '50.775', 'longitude': '6.08333'}"
1,Aarhus,2,Valid,H6,720.0,Fell,1951-01-01,56.18333,10.23333,"{'latitude': '56.18333', 'longitude': '10.23333'}"
2,Abee,6,Valid,EH4,107000.0,Fell,1952-01-01,54.21667,-113.00000,"{'latitude': '54.21667', 'longitude': '-113.0'}"
3,Acapulco,10,Valid,Acapulcoite,1914.0,Fell,1976-01-01,16.88333,-99.90000,"{'latitude': '16.88333', 'longitude': '-99.9'}"
4,Achiras,370,Valid,L6,780.0,Fell,1902-01-01,-33.16667,-64.95000,"{'latitude': '-33.16667', 'longitude': '-64.95'}"
...,...,...,...,...,...,...,...,...,...,...
995,Tirupati,24009,Valid,H6,230.0,Fell,1934-01-01,13.63333,79.41667,"{'latitude': '13.63333', 'longitude': '79.41667'}"
996,Tissint,54823,Valid,Martian (shergottite),7000.0,Fell,2011-01-01,29.48195,-7.61123,"{'latitude': '29.48195', 'longitude': '-7.61123'}"
997,Tjabe,24011,Valid,H6,20000.0,Fell,1869-01-01,-7.08333,111.53333,"{'latitude': '-7.08333', 'longitude': '111.533..."
998,Tjerebon,24012,Valid,L5,16500.0,Fell,1922-01-01,-6.66667,106.58333,"{'latitude': '-6.66667', 'longitude': '106.583..."


In [22]:
#Let's plot a scatter plot for all the entries of latitude and longitude from our dataset using bqplot
# 1. Now that we have our data let's try bqplot with scatter plot

#Data in this case is our dataset above

# 2. Building the scales

x_sc = bqplot.LinearScale()
y_sc = bqplot.LinearScale()

# 3. Setting up the x-axis and y-axis
# Take Reclong as Longitude from the dataset for the x-axis of the scatterplot
# Take Reclat as Latitude from the dataset for the y-axis of the scatterplot

x_ax = bqplot.Axis(scale=x_sc,label='Longitude')
y_ax = bqplot.Axis(scale=y_sc, orientation='vertical', label='Latitude')

# 4. Creating the Marks based on the reclong and reclat of the dataset
scatters = bqplot.Scatter(x=meteor_landing_new['reclong'], y=meteor_landing_new['reclat'], 
                         scales={'x':x_sc, 'y':y_sc})

# Combining everything into a figure to display the scatterplot
fig = bqplot.Figure(marks=[scatters], axes=[x_ax,y_ax])
fig

Figure(axes=[Axis(label='Longitude', scale=LinearScale()), Axis(label='Latitude', orientation='vertical', scal…

In [23]:
#Since the scatter plot above doesn't differentiate between the rows plotted,
#let's color the points by the mass of the meteorites that have fallen on Earth
#Everything else remains the same; we just need to add a color scale and color axis based on mass

# 1. Now that we have our data let's try bqplot with scatter plot

#Data in this case is our dataset above

# 2. Building the scales
x_sc = bqplot.LinearScale()
y_sc = bqplot.LinearScale()

#Adding color_scale to the bqplot
col_sc = bqplot.ColorScale()

# 3. Setting up the x-axis and y-axis

x_ax = bqplot.Axis(scale=x_sc,label='Longitude')
y_ax = bqplot.Axis(scale=y_sc, orientation='vertical', label='Latitude')

#Adding the color_axis to the bqplot based on the mass of meteorites; here it is displayed horizontally
col_ax = bqplot.ColorAxis(scale=col_sc, label='Mass of Meteorites', orientation='horizontal')

# 4. Creating the Marks based on the reclong and reclat of the dataset and adding the color axis and scales to it.
scatters = bqplot.Scatter(x=meteor_landing_new['reclong'], y=meteor_landing_new['reclat'], color=meteor_landing_new['mass'],
                         scales={'x':x_sc, 'y':y_sc, 'color':col_sc})

# Combining everything into a figure to display the scatterplot
fig = bqplot.Figure(marks=[scatters], axes=[x_ax,y_ax,col_ax])
fig

Figure(axes=[Axis(label='Longitude', scale=LinearScale()), Axis(label='Latitude', orientation='vertical', scal…

In [24]:
#Let's verify that the highest mass on the scatter plot's color axis, which reads close to 2.2 * 10^7 (22000000), matches the data
#max() on the mass column gives 2.3 * 10^7, which is consistent with what the plot shows
meteor_landing_new['mass'].max()

23000000.0

In [25]:
#The min and max mass differ hugely, so a linear color scale shows little variety of colors
#Taking np.log10 of the mass spreads the colors across the range of meteorite masses

# 1. Data in this case is our dataset meteor_landing_new above

# 2. Building the scales
x_sc = bqplot.LinearScale()
y_sc = bqplot.LinearScale()
col_sc = bqplot.ColorScale()

# 3. Setting up the x-axis, y-axis and color-axis
x_ax = bqplot.Axis(scale=x_sc,label='Longitude')
y_ax = bqplot.Axis(scale=y_sc, orientation='vertical', label='Latitude')
col_ax = bqplot.ColorAxis(scale=col_sc, label='log(mass)', orientation='vertical', side='right')

# 4. Creating the Marks based on the reclong and reclat of the dataset and using np.log10
scatters = bqplot.Scatter(x=meteor_landing_new['reclong'], y=meteor_landing_new['reclat'], \
                          color=np.log10(meteor_landing_new['mass']),
                         scales={'x':x_sc, 'y':y_sc, 'color':col_sc})

# Combining everything into a figure to display the scatterplot
fig = bqplot.Figure(marks=[scatters], axes=[x_ax,y_ax,col_ax])
fig

Figure(axes=[Axis(label='Longitude', scale=LinearScale()), Axis(label='Latitude', orientation='vertical', scal…

### With np.log10 applied, the scatter plot above now shows how mass varies from meteorite to meteorite, which is an improvement over the earlier plot where almost every point was maroon.
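
To see why the log helps, here is a small sketch using the smallest and largest masses that the cleaned sample reports in this notebook (0.15 g and 23,000,000 g). The variable names are chosen for the illustration only.

```python
import numpy as np

# Minimum and maximum mass reported for the cleaned sample, in grams.
masses = np.array([0.15, 23_000_000.0])

linear_span = masses.max() - masses.min()       # dominated by the largest value
log_values = np.log10(masses)                   # roughly -0.82 and 7.36
log_span = log_values.max() - log_values.min()  # roughly 8.2
```

On a linear color scale nearly every meteorite sits at the low end of a span of about 23 million, whereas on the log scale the same values are spread over roughly eight units.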

### The visualization above is still hard to read. Many points overlap, and drawing a lot of scatter points on top of each other is also slow. A solution is to bin the data into a 2D histogram instead of using a scatter plot, which we will do now.

In [26]:
#nreclong is the number of bins for longitude
#nreclat is the number of bins for latitude
#reclongmin is the lower longitude bound of the histogram; data below it is left out
#reclongmax is the upper longitude bound of the histogram; data above it is left out
#reclatmin is the lower latitude bound of the histogram; data below it is left out
#reclatmax is the upper latitude bound of the histogram; data above it is left out
#takeLog=True takes log10 of the binned mass so that the colors in our histogram vary more


def generate_histogram_from_reclat_reclong(meteor_landing_new, nreclong=20, nreclat=20, reclongmin=-150, reclongmax=150,
                                     reclatmin=-40, reclatmax=70,
                                     takeLog=True):
    reclong_bins = np.linspace(reclongmin, reclongmax, nreclong+1)
    reclat_bins = np.linspace(reclatmin, reclatmax, nreclat+1)
    hist2d, reclong_edges, reclat_edges = np.histogram2d(meteor_landing_new['reclong'], 
                                                   meteor_landing_new['reclat'], 
                                                   weights=meteor_landing_new['mass'],
                                                  bins = [reclong_bins,reclat_bins])
    hist2d = hist2d.T
    if takeLog:
        hist2d[hist2d <= 0] = np.nan # set zeros to NaNs
        # then take log
        hist2d = np.log10(hist2d)
    reclong_centers = (reclong_edges[:-1] + reclong_edges[1:]) / 2
    reclat_centers = (reclat_edges[:-1] + reclat_edges[1:]) / 2
    return hist2d, reclong_centers, reclat_centers, reclong_edges, reclat_edges

In [27]:
hist2d, reclong_centers, reclat_centers, reclong_edges, reclat_edges = generate_histogram_from_reclat_reclong(meteor_landing_new)

In [28]:
#Shape of the histogram gives a 20x20 grid heat map
hist2d.shape

(20, 20)

In [29]:
#Center of bins in the histograms
reclong_centers

array([-142.5, -127.5, -112.5,  -97.5,  -82.5,  -67.5,  -52.5,  -37.5,
        -22.5,   -7.5,    7.5,   22.5,   37.5,   52.5,   67.5,   82.5,
         97.5,  112.5,  127.5,  142.5])

In [30]:
# Edges of our histogram, Data is going to be delineated across these edges in our histogram
reclong_edges

array([-150., -135., -120., -105.,  -90.,  -75.,  -60.,  -45.,  -30.,
        -15.,    0.,   15.,   30.,   45.,   60.,   75.,   90.,  105.,
        120.,  135.,  150.])

In [31]:
len(reclong_centers), len(reclong_edges)

(20, 21)
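
To make the binning step concrete, here is a minimal sketch of `np.histogram2d` with weights on three made-up points and a 2x2 grid. The arrays `lon`, `lat` and `mass` are invented for the example. It shows the summed mass per cell, the transpose that puts latitude on the rows, and why there is always one more edge than centers.

```python
import numpy as np

# Made-up points: the first two fall in the same cell, the third in another.
lon = np.array([-80.0, -78.0, 10.0])
lat = np.array([35.0, 33.0, 50.0])
mass = np.array([100.0, 900.0, 50.0])

lon_bins = np.linspace(-150, 150, 3)  # edges -150, 0, 150 -> 2 bins
lat_bins = np.linspace(-40, 70, 3)    # edges -40, 15, 70  -> 2 bins

total_mass, lon_edges, lat_edges = np.histogram2d(
    lon, lat, bins=[lon_bins, lat_bins], weights=mass
)
total_mass = total_mass.T  # rows = latitude bins, columns = longitude bins

# Centers are the midpoints of neighbouring edges, so there is one fewer.
lon_centers = (lon_edges[:-1] + lon_edges[1:]) / 2
```

With weights, each cell holds the total mass of the points inside it rather than a count, which is why the heat map built from it is labelled as total mass.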

In [32]:
# plot data with grid heat map based on our histogram

# 2. Building the scales for the Grid HeatMap
col_sc = bqplot.ColorScale(scheme='RdYlGn')
x_sc = bqplot.LinearScale()
y_sc = bqplot.LinearScale()

# 3. Building the color, x and y axis for our grid heatmap
col_ax = bqplot.ColorAxis(scale=col_sc, orientation='vertical', side='right')
x_ax = bqplot.Axis(scale=x_sc, label='Longitude')
y_ax = bqplot.Axis(scale=y_sc, orientation='vertical', label='Latitude')

# 4. Creating the marks
heat_map = bqplot.GridHeatMap(color=hist2d,
                             row=reclat_centers,
                             column=reclong_centers, 
                             scales={'color':col_sc,'row':y_sc, 'column':x_sc},
                             interactions={'click':'select'},
                             selected_style={'fill':'cyan'})

fig = bqplot.Figure(marks=[heat_map], axes=[col_ax,x_ax, y_ax])
fig

Figure(axes=[ColorAxis(orientation='vertical', scale=ColorScale(), side='right'), Axis(label='Longitude', scal…

In [33]:
#bqplot.ColorScale?

In [34]:
#Creating our label for interactivity
myLabel = ipywidgets.Label()

In [35]:
# 2. scales
col_sc = bqplot.ColorScale(scheme='RdYlGn')
x_sc = bqplot.LinearScale()   #use of linear scale and not ordinal as latitudes and longitudes are numerical variables
y_sc = bqplot.LinearScale()

# 3. axis
col_ax = bqplot.ColorAxis(scale=col_sc, orientation='vertical', side='right')
x_ax = bqplot.Axis(scale=x_sc, label='Longitude')
y_ax = bqplot.Axis(scale=y_sc, orientation='vertical', label='Latitude')

# 4. marks
heat_map = bqplot.GridHeatMap(color=hist2d,
                             row=reclat_centers,
                             column=reclong_centers, 
                             scales={'color':col_sc,'row':y_sc, 'column':x_sc},
                             interactions={'click':'select'},
                             selected_style={'fill':'cyan'})

fig = bqplot.Figure(marks=[heat_map], axes=[col_ax,x_ax, y_ax])

In [36]:
#What happens to our label when we click on a block of grid heatmap? Let's build that selection
def on_selection(change):
    selected = change['owner'].selected
    if selected is not None and len(selected) == 1: # only act when exactly one grid cell is selected
        i,j = selected[0]
        v = hist2d[i,j]                  #getting i,j value from our histogram built above
        myLabel.value = 'Total Mass in log =' + str(v)

In [37]:
#Observing changes in the selection is important
heat_map.observe(on_selection,'selected')

In [38]:
#building a vertical box where myLabel is placed on top and the Grid HeatMap below it
myDashboard = ipywidgets.VBox([myLabel,fig])
myDashboard

VBox(children=(Label(value=''), Figure(axes=[ColorAxis(orientation='vertical', scale=ColorScale(), side='right…
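
The single-selection check in the callback is easy to get subtly wrong, so here is a minimal sketch of it as a standalone helper. The function name `single_selection` is hypothetical, and the idea that the selection can be None when it is cleared is an assumption rather than something this notebook demonstrates.

```python
import numpy as np


def single_selection(selected):
    """Return (i, j) when exactly one cell is selected, otherwise None."""
    # Assumption: a cleared selection may arrive as None or as an empty list.
    if selected is None or len(selected) != 1:
        return None
    i, j = selected[0]
    return int(i), int(j)


two_cells = np.array([[1, 2], [3, 4]])

# Pitfall: len(two_cells == 1) measures the boolean array, not the selection,
# so it is truthy for any non-empty selection.
pitfall_is_truthy = bool(len(two_cells == 1))

one_cell = single_selection(np.array([[13, 4]]))
```

Keeping the check in a small function like this makes it possible to test the selection logic without rendering any widget.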

In [39]:
import datetime as dt # formatting our data

In [40]:
#Formatting our data based on the year value
#Set the range for year to be 1688 which is the minimum year value in our dataset for x-axis
#Set the range for year to be 2013 which is the maximum year value in our dataset for x-axis

# 2. Building the scales for our scatter plot
#Min year in our dataset is 1688 and maximum year in our dataset is 2013
x_scs = bqplot.DateScale(min=dt.datetime(1688,1,1), max=dt.datetime(2013,1,1))

In [41]:
#Let's check the range for our mass field in the dataset
meteor_landing_new['mass'].min(), meteor_landing_new['mass'].max()

(0.15, 23000000.0)

In [42]:
#Since we have such a big variation in mass. Let's use the log scale to cut down the value for mass on y-axis

In [43]:
y_scs = bqplot.LogScale()

In [44]:
# 3. Building the axis for our Scatter Plot

x_axs = bqplot.Axis(label='Year', scale=x_scs)
y_axs = bqplot.Axis(label='Mass', scale=y_scs, orientation='vertical')

In [45]:
#Let's select one cell of the grid heat map to plot on the scatter plot, as plotting all the records from the sample data would be messy

i,j = 13, 4 #(This cell has close to 30 data points and would be good for analysis)

In [46]:
#What is the minimum and maximum longitude for the selected grid on the heatmap?
reclong_edges[j], reclong_edges[j+1]

(-90.0, -75.0)

In [47]:
#What is the minimum and maximum latitude for the selected grid on the heatmap?
reclat_edges[i],reclat_edges[i+1]

(31.5, 37.0)
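
The two ranges above follow directly from how the bin edges were generated. As a quick check, this sketch rebuilds the edges with the same defaults used in the histogram function (20 bins, longitude from -150 to 150, latitude from -40 to 70) and looks up the cell i, j = 13, 4.

```python
import numpy as np

reclong_edges = np.linspace(-150, 150, 21)  # 20 bins, each 15 degrees wide
reclat_edges = np.linspace(-40, 70, 21)     # 20 bins, each 5.5 degrees wide

i, j = 13, 4  # row (latitude) index, column (longitude) index

lat_range = (reclat_edges[i], reclat_edges[i + 1])
long_range = (reclong_edges[j], reclong_edges[j + 1])
```

Row index i always pairs with the latitude edges and column index j with the longitude edges, because the histogram was transposed before plotting.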

In [48]:
#Now that we know the min and max of both latitude and longitude. Let's subset our data based on that for analysis
minrecLong, maxrecLong = reclong_edges[j], reclong_edges[j+1] #Setting min and max longitude for analysis
minrecLat, maxrecLat = reclat_edges[i],reclat_edges[i+1]      #Setting min and max latitude for analysis

In [49]:
#Filter our data based on the min and max longitude and latitude set in the variables above
#where reclong >= minrecLong and reclong <= maxrecLong AND reclat >= minrecLat and reclat <= maxrecLat
region_mask = (meteor_landing_new['reclong']>=minrecLong) & (meteor_landing_new['reclong']<=maxrecLong) &\
    (meteor_landing_new['reclat']>=minrecLat) & (meteor_landing_new['reclat']<=maxrecLat)

In [50]:
#Most rows will be False since we are selecting only one cell out of 20x20; only the rows inside that cell are True
region_mask

0      False
1      False
2      False
3      False
4      False
       ...  
995    False
996    False
997    False
998    False
999    False
Length: 948, dtype: bool

In [51]:
#First check the min and max latitude for the region mask created and then all latitude should fall under that
minrecLat, maxrecLat, meteor_landing_new['reclat'][region_mask]

(31.5,
 37.0,
 54     34.75000
 67     35.96667
 68     34.50000
 118    34.16667
 171    36.08333
 175    36.50000
 187    36.16667
 197    35.03333
 207    32.10250
 218    35.63333
 220    36.83333
 225    34.40000
 257    36.40000
 293    35.55000
 297    32.53333
 299    36.10000
 305    36.78333
 307    33.01667
 309    34.48333
 536    34.58333
 557    32.03333
 595    35.80000
 640    35.25000
 646    35.41667
 663    36.60000
 786    35.30000
 799    31.95000
 850    35.03333
 970    33.18836
 Name: reclat, dtype: float64)

### As we can see above, all the latitude values fall between 31.5 and 37.0, as expected
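
One detail worth noting is how the mask treats points that sit exactly on a cell edge. The sketch below uses a made-up frame, `toy`, and `Series.between`, which is inclusive on both ends by default and so behaves like the `>=` and `<=` pair used in the mask.

```python
import pandas as pd

# Made-up points, three of them exactly on an edge of the selected cell.
toy = pd.DataFrame(
    {
        "reclat": [31.5, 34.0, 37.0, 40.0],
        "reclong": [-80.0, -90.0, -75.0, -80.0],
    }
)

# between() includes both bounds, like >= and <= combined.
in_lat = toy["reclat"].between(31.5, 37.0)
in_long = toy["reclong"].between(-90.0, -75.0)

region_mask = in_lat & in_long
selected = toy[region_mask]
```

`np.histogram2d` treats its bins as half-open (only the last bin includes its upper edge), so a point lying exactly on an upper edge is counted in the neighbouring cell by the histogram but still passes this mask. For the sample data this is unlikely to matter, but it can explain an occasional off-by-one difference between a cell's total and its scatter points.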

In [52]:
#Lets see the actual minimum and maximum latitude in our selected region on the grid
meteor_landing_new['reclat'][region_mask].min(), meteor_landing_new['reclat'][region_mask].max()

(31.95, 36.83333)

In [53]:
##Lets see the minimum and maximum year in our masked region on the grid
meteor_landing_new['year'][region_mask].min(), meteor_landing_new['year'][region_mask].max()

(Timestamp('1810-01-01 00:00:00'), Timestamp('1984-01-01 00:00:00'))

In [54]:
##Lets see the minimum and maximum mass in our masked region on the grid
meteor_landing_new['mass'][region_mask].min(), meteor_landing_new['mass'][region_mask].max()

(167.0, 56000.0)

In [55]:
#list mass of all the entries in the masked region
meteor_landing_new['mass'][region_mask]

54       265.0
67      3700.0
68       345.0
118     6000.0
171     7300.0
175     1360.0
187     4300.0
197     8400.0
207     1455.0
218      167.0
220    17000.0
225     2000.0
257     5000.0
293    56000.0
297     3200.0
299      220.0
305     6067.0
307    16300.0
309      650.0
536      877.0
557      340.0
595     1443.0
640     8600.0
646     1880.0
663    12600.0
786     1800.0
799     3760.0
850      668.0
970     5560.0
Name: mass, dtype: float64

In [56]:
#list year of all the entries in the masked region
meteor_landing_new['year'][region_mask]

54    1933-01-01
67    1929-01-01
68    1922-01-01
118   1843-01-01
171   1874-01-01
175   1810-01-01
187   1835-01-01
197   1933-01-01
207   1984-01-01
218   1892-01-01
220   1919-01-01
225   1868-01-01
257   1827-01-01
293   1934-01-01
297   1900-01-01
299   1889-01-01
305   1924-01-01
307   1829-01-01
309   1868-01-01
536   1907-01-01
557   1869-01-01
595   1983-01-01
640   1849-01-01
646   1913-01-01
663   1950-01-01
786   1855-01-01
799   1921-01-01
850   1903-01-01
970   1954-01-01
Name: year, dtype: datetime64[ns]

In [57]:
#To verify our scatter plot, let's pick one mass and year from above and see if the plot is done correctly.
#For example-: let's take min(year) i.e 1810 for that mass is 1360 as we can see above.

In [58]:
mass_scatt = bqplot.Scatter(x=meteor_landing_new['year'][region_mask], y=meteor_landing_new['mass'][region_mask],
                               scales={'x':x_scs, 'y':y_scs})

In [59]:
fig_mass = bqplot.Figure(marks=[mass_scatt],axes=[x_axs,y_axs])
fig_mass

Figure(axes=[Axis(label='Year', scale=DateScale(max=datetime.datetime(2013, 1, 1, 0, 0), min=datetime.datetime…

In [60]:
#We can see above that for year close to 1810, the mass on the y-axis is close to 1400,
#which shows scatter plot is done correctly

In [61]:
#Now that we have built everything, let's put it all together

In [62]:
#### GRID HEAT MAP #####

# 2. scales
col_sc = bqplot.ColorScale(scheme='RdYlGn')
x_sc = bqplot.LinearScale()
y_sc = bqplot.LinearScale()

# 3. axis
col_ax = bqplot.ColorAxis(scale=col_sc, orientation='vertical', side='right')
x_ax = bqplot.Axis(scale=x_sc, label='Longitude')
y_ax = bqplot.Axis(scale=y_sc, orientation='vertical', label='Latitude')

# 4. marks
heat_map = bqplot.GridHeatMap(color=hist2d,
                             row=reclat_centers,
                             column=reclong_centers, 
                             scales={'color':col_sc,'row':y_sc, 'column':x_sc},
                             interactions={'click':'select'},
                             selected_style={'fill':'cyan'})

fig = bqplot.Figure(marks=[heat_map], axes=[col_ax,x_ax, y_ax])

In [63]:
#Getting the label in place as well
myLabel = ipywidgets.Label()

In [64]:
######## SCATTER PLOT ######
#2.scales
x_scs = bqplot.DateScale(min=dt.datetime(1688,1,1), max=dt.datetime(2013,1,1))
y_scs = bqplot.LogScale()

# 3. Axis
x_axs = bqplot.Axis(label='Year', scale=x_scs)
y_axs = bqplot.Axis(label='Mass', scale=y_scs, orientation='vertical')

#Hard coding the value of i and j to start with to get the values from selection of grid

i,j = 13, 4 

minrecLong, maxrecLong = reclong_edges[j], reclong_edges[j+1]
minrecLat, maxrecLat = reclat_edges[i],reclat_edges[i+1]
# want all the data that has reclong >= minrecLong AND reclong <= maxrecLong AND reclat >= minrecLat AND reclat <= maxrecLat
region_mask = (meteor_landing_new['reclong']>=minrecLong) & (meteor_landing_new['reclong']<=maxrecLong) &\
    (meteor_landing_new['reclat']>=minrecLat) & (meteor_landing_new['reclat']<=maxrecLat)

# 4. Marks
mass_scatt = bqplot.Scatter(x=meteor_landing_new['year'][region_mask], y=meteor_landing_new['mass'][region_mask],
                               scales={'x':x_scs, 'y':y_scs})

fig_mass = bqplot.Figure(marks=[mass_scatt],axes=[x_axs,y_axs])

In [65]:
def on_selection(change):
    selected = change['owner'].selected
    if selected is not None and len(selected) == 1: # only act when exactly one grid cell is selected
        i,j = selected[0]
        v = hist2d[i,j]
        myLabel.value = 'Total Mass in log =' + str(v)
        
heat_map.observe(on_selection,'selected')

In [66]:
figures = ipywidgets.HBox([fig,fig_mass]) #Putting the grid heatmap and scatter plot side by side in a horizontal box
fig.layout.min_width='400px'      #Setting minimum width for Grid heatmap
fig_mass.layout.min_width='400px' #Setting minimum width for scatterplot
myDashboard = ipywidgets.VBox([myLabel,figures])
myDashboard

VBox(children=(Label(value=''), HBox(children=(Figure(axes=[ColorAxis(orientation='vertical', scale=ColorScale…

In [67]:
#Till now we have hardcoded the value of selection of i and j and that is why the scatter plot is unaffected even
#after selecting. Let's now go and enable the change of scatter plot based on selecting grid on heatmap

In [68]:
#Lets see what all keys we can use from scatter plot to make changes in scatter plot
mass_scatt.keys

['_model_module',
 '_model_module_version',
 '_model_name',
 '_view_count',
 '_view_module',
 '_view_module_version',
 '_view_name',
 'apply_clip',
 'color',
 'colors',
 'default_size',
 'default_skew',
 'display_legend',
 'display_names',
 'drag_color',
 'drag_size',
 'enable_delete',
 'enable_hover',
 'enable_move',
 'fill',
 'hovered_point',
 'hovered_style',
 'interactions',
 'label_display_horizontal_offset',
 'label_display_vertical_offset',
 'labels',
 'marker',
 'names',
 'names_unique',
 'opacities',
 'opacity',
 'preserve_domain',
 'restrict_x',
 'restrict_y',
 'rotation',
 'scales',
 'scales_metadata',
 'selected',
 'selected_style',
 'size',
 'skew',
 'stroke',
 'stroke_width',
 'tooltip',
 'tooltip_location',
 'tooltip_style',
 'unhovered_style',
 'unselected_style',
 'update_on_move',
 'visible',
 'x',
 'y']

In [69]:
mass_scatt.x

array(['1933-01-01T00:00:00.000000000', '1929-01-01T00:00:00.000000000',
       '1922-01-01T00:00:00.000000000', '1843-01-01T00:00:00.000000000',
       '1874-01-01T00:00:00.000000000', '1810-01-01T00:00:00.000000000',
       '1835-01-01T00:00:00.000000000', '1933-01-01T00:00:00.000000000',
       '1984-01-01T00:00:00.000000000', '1892-01-01T00:00:00.000000000',
       '1919-01-01T00:00:00.000000000', '1868-01-01T00:00:00.000000000',
       '1827-01-01T00:00:00.000000000', '1934-01-01T00:00:00.000000000',
       '1900-01-01T00:00:00.000000000', '1889-01-01T00:00:00.000000000',
       '1924-01-01T00:00:00.000000000', '1829-01-01T00:00:00.000000000',
       '1868-01-01T00:00:00.000000000', '1907-01-01T00:00:00.000000000',
       '1869-01-01T00:00:00.000000000', '1983-01-01T00:00:00.000000000',
       '1849-01-01T00:00:00.000000000', '1913-01-01T00:00:00.000000000',
       '1950-01-01T00:00:00.000000000', '1855-01-01T00:00:00.000000000',
       '1921-01-01T00:00:00.000000000', '1903-01-01

In [70]:
mass_scatt.y

array([  265.,  3700.,   345.,  6000.,  7300.,  1360.,  4300.,  8400.,
        1455.,   167., 17000.,  2000.,  5000., 56000.,  3200.,   220.,
        6067., 16300.,   650.,   877.,   340.,  1443.,  8600.,  1880.,
       12600.,  1800.,  3760.,   668.,  5560.])

In [71]:
#We should be able to update the x and y of the scatter plot based on our selection of gridheatmap
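
The update mechanism described above relies on the observer pattern: a callback is registered on the `selected` attribute and runs each time that attribute changes. Below is a minimal sketch of the pattern in plain Python, so it runs without any widgets. The `Observable` class and the `scatter_state` dictionary are hypothetical stand-ins for the heat map and for `mass_scatt.x` / `mass_scatt.y`; they are not part of bqplot, and the `change` dictionary only loosely mirrors what the real widget passes to its callbacks.

```python
class Observable:
    """Hypothetical stand-in for a widget with an observable 'selected' attribute."""

    def __init__(self):
        self._selected = None
        self._callbacks = []

    def observe(self, callback, name):
        # Register a callback for changes to the attribute called `name`.
        self._callbacks.append((name, callback))

    @property
    def selected(self):
        return self._selected

    @selected.setter
    def selected(self, value):
        old, self._selected = self._selected, value
        change = {'owner': self, 'name': 'selected', 'old': old, 'new': value}
        # Notify every callback that was registered for 'selected'.
        for name, callback in self._callbacks:
            if name == 'selected':
                callback(change)


scatter_state = {'cell': None}  # stands in for the scatter plot's data


def on_selection(change):
    selected = change['owner'].selected
    # React only when exactly one cell is selected.
    if selected is not None and len(selected) == 1:
        i, j = selected[0]
        scatter_state['cell'] = (i, j)


grid = Observable()
grid.observe(on_selection, 'selected')
grid.selected = [[13, 4]]     # simulates a click on cell (13, 4)
print(scatter_state['cell'])  # (13, 4)
```

Note that the length check is written as `len(selected) == 1`. Writing `len(selected == 1)` instead would take the length of a comparison result, which is not the intended test.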

In [72]:
#### GRID HEAT MAP #####

# 2. scales
col_sc = bqplot.ColorScale(scheme='RdYlGn')
x_sc = bqplot.LinearScale()
y_sc = bqplot.LinearScale()

# 3. axis
col_ax = bqplot.ColorAxis(scale=col_sc, orientation='vertical', side='right')
x_ax = bqplot.Axis(scale=x_sc, label='Longitude')
y_ax = bqplot.Axis(scale=y_sc, orientation='vertical', label='Latitude')

# 4. marks
heat_map = bqplot.GridHeatMap(color=hist2d,
                             row=reclat_centers,
                             column=reclong_centers, 
                             scales={'color':col_sc,'row':y_sc, 'column':x_sc},
                             interactions={'click':'select'},
                             selected_style={'fill':'cyan'})

fig = bqplot.Figure(marks=[heat_map], axes=[col_ax,x_ax, y_ax])

In [73]:
myLabel = ipywidgets.Label()

In [74]:
######## SCATTER PLOT ######
#2.scales
x_scs = bqplot.DateScale(min=dt.datetime(1688,1,1), max=dt.datetime(2013,1,1))
y_scs = bqplot.LogScale()

# 3. Axis
x_axs = bqplot.Axis(label='Year', scale=x_scs)
y_axs = bqplot.Axis(label='Mass', scale=y_scs, orientation='vertical')

i,j = 13, 4 # this has a lot of data (I think!)
minrecLong, maxrecLong = reclong_edges[j], reclong_edges[j+1]
minrecLat, maxrecLat = reclat_edges[i],reclat_edges[i+1]
# want all the data that has reclong >= minrecLong AND reclong <= maxrecLong AND reclat >= minrecLat AND reclat <= maxrecLat
region_mask = (meteor_landing_new['reclong']>=minrecLong) & (meteor_landing_new['reclong']<=maxrecLong) &\
    (meteor_landing_new['reclat']>=minrecLat) & (meteor_landing_new['reclat']<=maxrecLat)

# 4. Marks
mass_scatt = bqplot.Scatter(x=meteor_landing_new['year'][region_mask], y=meteor_landing_new['mass'][region_mask],
                               scales={'x':x_scs, 'y':y_scs})

fig_mass = bqplot.Figure(marks=[mass_scatt],axes=[x_axs,y_axs])

In [75]:
def on_selection(change):
    selected = change['owner'].selected
    # only react when exactly one grid cell is selected
    if selected is not None and len(selected) == 1:
        i,j = selected[0]
        v = hist2d[i,j]
        myLabel.value = 'Total mass = ' + str(v)
        # update mask based on new i,j combo
        minrecLong, maxrecLong = reclong_edges[j], reclong_edges[j+1]
        minrecLat, maxrecLat = reclat_edges[i],reclat_edges[i+1]
        # want all the data that has reclong >= minrecLong AND reclong <= maxrecLong AND reclat >= minrecLat AND reclat <= maxrecLat
        region_mask = (meteor_landing_new['reclong']>=minrecLong) & (meteor_landing_new['reclong']<=maxrecLong) &\
            (meteor_landing_new['reclat']>=minrecLat) & (meteor_landing_new['reclat']<=maxrecLat)
        # update x/y based on new selection
        mass_scatt.x = meteor_landing_new['year'][region_mask]
        mass_scatt.y = meteor_landing_new['mass'][region_mask]
        
heat_map.observe(on_selection,'selected')

In [76]:
figures = ipywidgets.HBox([fig,fig_mass])
fig.layout.min_width='400px'
fig_mass.layout.min_width='400px'
myDashboard = ipywidgets.VBox([myLabel,figures])
myDashboard

VBox(children=(Label(value=''), HBox(children=(Figure(axes=[ColorAxis(orientation='vertical', scale=ColorScale…

### One paragraph explaining how to use the dashboard you created, to help someone who is not an expert understand your dataset.

Now that the dashboard is built, using it takes a single step: click a cell in the grid heat map on the left. Each cell covers one range of latitude and longitude, and the scatter plot on the right is redrawn for the meteorites that landed inside that range. The color legend beside the heat map runs from 1 to 7, with colors going from red through yellow to green: red cells have the lowest total meteorite mass, and the total mass increases as the color moves through yellow to green. The selected cell turns cyan, so it is always clear which region is currently being shown. The scatter plot shows the mass of each individual meteorite in the selected region (vertical axis, log scale) against the year in which it fell or was found (horizontal axis).
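
To make the filtering step concrete, here is a minimal sketch of how a selected cell `(i, j)` can be turned into a subset of rows. The data values and bin edges are made up for illustration, and `rows_in_cell` is a hypothetical helper, not a function from the notebook. It applies the same inclusive bounds as the notebook's `region_mask`.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for meteor_landing_new (values are made up).
df = pd.DataFrame({
    'reclat':  [10.0, 12.0, 48.0, 50.0, -30.0],
    'reclong': [20.0, 25.0, 5.0, 8.0, 140.0],
    'mass':    [100.0, 250.0, 30.0, 70.0, 500.0],
})

# Hypothetical bin edges; in the notebook these come from the 2-D histogram.
reclat_edges = np.array([-90.0, 0.0, 30.0, 90.0])
reclong_edges = np.array([-180.0, 0.0, 60.0, 180.0])


def rows_in_cell(data, i, j, lat_edges, long_edges):
    """Return the rows whose coordinates fall inside heat-map cell (i, j)."""
    # Series.between is inclusive on both ends by default (>= and <=).
    lat_ok = data['reclat'].between(lat_edges[i], lat_edges[i + 1])
    long_ok = data['reclong'].between(long_edges[j], long_edges[j + 1])
    return data[lat_ok & long_ok]


cell = rows_in_cell(df, 1, 1, reclat_edges, reclong_edges)
print(len(cell), cell['mass'].sum())  # 2 350.0
```

Because both bounds are inclusive, a meteorite lying exactly on a shared edge would be counted in both neighbouring cells. That is rarely a problem with real coordinates, but it is worth keeping in mind.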

### Drawing every individual point in the scatter plot is slow, and overlapping points are hard to read, so a histogram is a better fit than a scatter plot for analysing this data.
### Let's build the same grid heat map again, but replace the scatter plot with a histogram that shows the total mass of meteorites for each group of years.

# **************************************************************************************** #

In [77]:
#### GRID HEAT MAP #####

# 2. scales
col_sc = bqplot.ColorScale(scheme='RdYlGn')
x_sc = bqplot.LinearScale()
y_sc = bqplot.LinearScale()

# 3. axis
col_ax = bqplot.ColorAxis(scale=col_sc, orientation='vertical', side='right')
x_ax = bqplot.Axis(scale=x_sc, label='Longitude')
y_ax = bqplot.Axis(scale=y_sc, orientation='vertical', label='Latitude')

# 4. marks
heat_map = bqplot.GridHeatMap(color=hist2d,
                             row=reclat_centers,
                             column=reclong_centers, 
                             scales={'color':col_sc,'row':y_sc, 'column':x_sc},
                             interactions={'click':'select'},
                             selected_style={'fill':'cyan'})

fig = bqplot.Figure(marks=[heat_map], axes=[col_ax,x_ax, y_ax])

In [78]:
myLabel = ipywidgets.Label()

In [79]:
####### BAR PLOT ######
# for each year bin -- add up the TOTAL mass (across all meteorites in the selected region)
# 2. scales
x_scb = bqplot.LinearScale()
y_scb = bqplot.LinearScale()
# 3. axis 
x_axb = bqplot.Axis(label='Year', scale=x_scb)
y_axb = bqplot.Axis(label='Total Mass', scale=y_scb, orientation='vertical')

# masking
i,j = 13, 4 # this has a lot of data (I think!)
minrecLong, maxrecLong = reclong_edges[j], reclong_edges[j+1]
minrecLat, maxrecLat = reclat_edges[i],reclat_edges[i+1]
# want all the data that has reclong >= minrecLong AND reclong <= maxrecLong AND reclat >= minrecLat AND reclat <= maxrecLat
region_mask = (meteor_landing_new['reclong']>=minrecLong) & (meteor_landing_new['reclong']<=maxrecLong) &\
    (meteor_landing_new['reclat']>=minrecLat) & (meteor_landing_new['reclat']<=maxrecLat)

In [80]:
meteor_landing_new['year']

0     1880-01-01
1     1951-01-01
2     1952-01-01
3     1976-01-01
4     1902-01-01
         ...    
995   1934-01-01
996   2011-01-01
997   1869-01-01
998   1922-01-01
999   1905-01-01
Name: year, Length: 948, dtype: datetime64[ns]

In [81]:
meteor_landing_new['year'].dt.year

0      1880
1      1951
2      1952
3      1976
4      1902
       ... 
995    1934
996    2011
997    1869
998    1922
999    1905
Name: year, Length: 948, dtype: int64

In [82]:
meteor_landing_new['ext_year'] = meteor_landing_new['year'].dt.year # save year from datetime format in new column ext_year

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  """Entry point for launching an IPython kernel.


In [83]:
meteor_landing_new['ext_year']

0      1880
1      1951
2      1952
3      1976
4      1902
       ... 
995    1934
996    2011
997    1869
998    1922
999    1905
Name: ext_year, Length: 948, dtype: int64
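
The `SettingWithCopyWarning` raised when `ext_year` was assigned usually means that pandas cannot tell whether the DataFrame is an independent object or a slice of another one. One common way to avoid it is sketched below, assuming that `meteor_landing_new` was produced by filtering an earlier DataFrame (its index runs to 999 while it holds 948 rows, which suggests rows were dropped). The names `raw` and `cleaned` and the sample values are hypothetical.

```python
import pandas as pd

# Made-up sample with one missing year, standing in for the raw data.
raw = pd.DataFrame({
    'year': pd.to_datetime(['1880-01-01', None, '1952-01-01']),
    'mass': [21.0, 720.0, 107000.0],
})

# .copy() makes the filtered frame explicitly independent of `raw`,
# so adding a column to it later does not raise SettingWithCopyWarning.
cleaned = raw[raw['year'].notna()].copy()
cleaned['ext_year'] = cleaned['year'].dt.year

print(cleaned['ext_year'].tolist())  # [1880, 1952]
```

The new column is added only to `cleaned`, and `raw` is left unchanged.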

In [84]:
# Build a 1-D histogram of meteorite mass per year bin.
# Passing weights=mass makes each bin hold the total mass instead of a row count.
mass, mass_edges = np.histogram(meteor_landing_new['ext_year'][region_mask], 
                             weights=meteor_landing_new['mass'][region_mask], bins=10)

In [85]:
#Let's look at the mass values now
mass

array([ 6360., 26600., 10400., 10290.,   387.,  6625., 30872., 64665.,
       18160.,  2898.])

In [86]:
#Let's look at the edges now
mass_edges

array([1810. , 1827.4, 1844.8, 1862.2, 1879.6, 1897. , 1914.4, 1931.8,
       1949.2, 1966.6, 1984. ])

In [87]:
#Length of mass and length of edges
len(mass), len(mass_edges)

(10, 11)

In [88]:
#Let's get the center of each bin of the histogram
mass_centers = (mass_edges[:-1] + mass_edges[1:])/2

In [89]:
mass_centers

array([1818.7, 1836.1, 1853.5, 1870.9, 1888.3, 1905.7, 1923.1, 1940.5,
       1957.9, 1975.3])
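
As a small worked example of the steps above, the sketch below runs a weighted histogram on five made-up meteorites with only two bins, so the totals can be checked by hand. The years and masses are invented for illustration.

```python
import numpy as np

years = np.array([1810, 1815, 1900, 1950, 1984])
masses = np.array([100.0, 50.0, 10.0, 5.0, 1.0])

# With weights, each bin holds the sum of the masses of the years that
# fall in it rather than the number of years.
totals, edges = np.histogram(years, weights=masses, bins=2)

# There is always one more edge than there are bins, so averaging
# neighbouring edges gives one center per bin.
centers = (edges[:-1] + edges[1:]) / 2

print(edges)    # [1810. 1897. 1984.]
print(totals)   # [150.  16.]
print(centers)  # [1853.5 1940.5]
```

The first bin holds 100 + 50 = 150 and the second holds 10 + 5 + 1 = 16. Since `bins=10` derives the edges from the minimum and maximum year of whatever data is passed in, the bin boundaries differ from one selected region to another. Passing a fixed `range=(min_year, max_year)` to `np.histogram` would keep them comparable across regions.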

In [90]:
mass_hist = bqplot.Bars(x=mass_centers, y=mass, scales={'x':x_scb, 'y':y_scb})

In [91]:
#bqplot.Bars?

In [92]:
fig_bars = bqplot.Figure(marks=[mass_hist],axes=[x_axb, y_axb], title='Year of Meteor Landing and Total Mass')
#fig_bars

In [93]:
mass_hist.keys

['_model_module',
 '_model_module_version',
 '_model_name',
 '_view_count',
 '_view_module',
 '_view_module_version',
 '_view_name',
 'align',
 'apply_clip',
 'base',
 'color',
 'color_mode',
 'colors',
 'display_legend',
 'enable_hover',
 'fill',
 'interactions',
 'label_display',
 'label_display_format',
 'label_display_horizontal_offset',
 'label_display_vertical_offset',
 'label_font_style',
 'labels',
 'opacities',
 'opacity_mode',
 'orientation',
 'padding',
 'preserve_domain',
 'scales',
 'scales_metadata',
 'selected',
 'selected_style',
 'stroke',
 'stroke_width',
 'tooltip',
 'tooltip_location',
 'tooltip_style',
 'type',
 'unselected_style',
 'visible',
 'x',
 'y']

In [94]:
def on_selection(change):
    selected = change['owner'].selected
    # only react when exactly one grid cell is selected
    if selected is not None and len(selected) == 1:
        i,j = selected[0]
        v = hist2d[i,j]
        myLabel.value = 'Total mass in log = ' + str(v)
        # update mask based on new i,j combo
        minrecLong, maxrecLong = reclong_edges[j], reclong_edges[j+1]
        minrecLat, maxrecLat = reclat_edges[i],reclat_edges[i+1]
        # want all the data that has reclong >= minrecLong AND reclong <= maxrecLong AND reclat >= minrecLat AND reclat <= maxrecLat
        region_mask = (meteor_landing_new['reclong']>=minrecLong) & (meteor_landing_new['reclong']<=maxrecLong) &\
            (meteor_landing_new['reclat']>=minrecLat) & (meteor_landing_new['reclat']<=maxrecLat)
        # update x/y based on new selection
        mass, mass_edges = np.histogram(meteor_landing_new['ext_year'][region_mask], 
                                     weights=meteor_landing_new['mass'][region_mask], bins=10) 
        mass_centers = (mass_edges[:-1] + mass_edges[1:])/2 #Finding out the center for our histogram
        mass_hist.x=mass_centers
        mass_hist.y=mass
        
heat_map.observe(on_selection,'selected')

In [95]:
figures = ipywidgets.HBox([fig,fig_bars])
fig.layout.min_width='400px'
fig_bars.layout.min_width='600px'
dashboard = ipywidgets.VBox([myLabel,figures])

In [96]:
dashboard

VBox(children=(Label(value=''), HBox(children=(Figure(axes=[ColorAxis(orientation='vertical', scale=ColorScale…

### One paragraph explaining how to use the dashboard you created, to help someone who is not an expert understand your dataset.

Now that the dashboard is built, using it takes a single step: click a cell in the grid heat map on the left. Each cell covers one range of latitude and longitude, and the histogram on the right is redrawn to show the total meteorite mass in that region, with the years grouped into 10 bins. The color legend beside the heat map runs from 1 to 7, with colors going from red through yellow to green: red cells have the lowest total meteorite mass, and the total mass increases as the color moves through yellow to green. The selected cell turns cyan, so it is always clear which region is currently being shown. Each bar in the histogram is the sum of the masses of all meteorites in the selected region whose year falls in that bin.

### A list of 1 or more contextual datasets you have identified, links to where they reside, and a sentence about why they might be useful in telling the final story.

Now that we have analysed the meteorite landings dataset, it would be interesting to see whether frequent meteorite landings have had any effect on Earth's temperature.

The dataset on the climate.nasa.gov website has 3 fields, the first being the year and the second the change in global temperature, covering 1880 to 2021. Our meteorite landings dataset covers 1688 to 2013, so the two overlap from 1880 to 2013. It would be interesting to see how Earth's global temperature has increased or decreased with respect to the meteorite landings over that period. We can group the total number of meteorite landings by decade, average the temperature change for each decade, and check whether there is a direct relationship between the two. This comparison would help tell the story of how the two are related.

Main dataset description page
https://climate.nasa.gov/vital-signs/global-temperature/

Dataset page
https://data.giss.nasa.gov/gistemp/graphs/graph_data/Global_Mean_Estimates_based_on_Land_and_Ocean_Data/graph.txt
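
The decade-level comparison proposed above could be sketched as follows. All values are made up, and the column name `anomaly` is a hypothetical label for the temperature-change field. The real files would need to be loaded and cleaned first.

```python
import pandas as pd

# Made-up values standing in for the two real datasets.
landings = pd.DataFrame({'year': [1881, 1885, 1893, 1902, 1904, 1907]})
temps = pd.DataFrame({
    'year': [1881, 1885, 1893, 1902, 1904, 1907],
    'anomaly': [-0.10, -0.30, -0.30, -0.30, -0.50, -0.40],
})

# Integer division maps each year to the first year of its decade.
landings['decade'] = landings['year'] // 10 * 10
temps['decade'] = temps['year'] // 10 * 10

# Count landings per decade and average the temperature change per decade;
# the two Series are aligned on the shared 'decade' index.
per_decade = pd.DataFrame({
    'landings': landings.groupby('decade').size(),
    'mean_anomaly': temps.groupby('decade')['anomaly'].mean(),
})

print(per_decade)
print(per_decade['landings'].corr(per_decade['mean_anomaly']))
```

A correlation between the two decade-level series would only be a starting point. It would not by itself show that one causes the other.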

### References
https://data.nasa.gov/resource/gh4g-9sfh.json

https://www.w3schools.com/python/matplotlib_pie_charts.asp

https://www.nasa.gov/about/highlights/HP_Privacy.html

https://realpython.com/visualizing-python-plt-scatter/

https://realpython.com/python-histograms/ 


The in-class notebook and prep notebooks for week 7 were also used as references to build the interactivity between the grid heat map and the scatter plot and bar graph.

https://uiuc-ischool-dataviz.github.io/is445_spring2022/nbv.html?notebook_name=%2Fis445_spring2022%2Fweek07%2FinClass_week07.ipynb

https://uiuc-ischool-dataviz.github.io/is445_spring2022/nbv.html?notebook_name=%2Fis445_spring2022%2Fweek07%2Fprep_notebook_week07_part1.ipynb

https://uiuc-ischool-dataviz.github.io/is445_spring2022/nbv.html?notebook_name=%2Fis445_spring2022%2Fweek07%2Fprep_notebook_week07_part2.ipynb