# *Introduction*

The goal of this notebook is to explore the paleontology collection at the Canadian Museum of Nature. Within this notebook I hope to show you not only how large the collection is but also how many different species it holds and where they were collected. This information offers a glimpse into how actively the Museum of Nature collects palaeontological specimens.

By running each cell, the viewer will see a scatterplot of a random sample of records from the fossil collection, as well as a sense of how the records are spread across years. The viewer will also explore bar graphs showing the different kingdoms, phyla, genera and species in the collection. Finally, the viewer will learn where the fossils came from, who identified them and when they were identified.

By running these cells, the viewer will have a better understanding of the different fossil species in the Museum of Nature's collection and a general idea of how many records exist for each.

To begin, click on the first In [ ]: cell and press Shift + Enter. Wait for a number to appear in the brackets before running the next cell.

In [None]:
!pip install pandas

In [None]:
!pip install plotly bokeh

In [None]:
import pandas as pd
# Load the fossil occurrence records used throughout this notebook
df = pd.read_csv("nature-for-kate.txt")
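Occurrence files downloaded from GBIF are usually tab-separated rather than comma-separated. If `df` ends up with a single very wide column, the file was probably read with the wrong delimiter. A minimal sketch that guesses the separator from the header line before loading; the helper name `load_occurrences` and the tab-versus-comma heuristic are assumptions, not part of the original notebook:

```python
import pandas as pd


def load_occurrences(path):
    """Load an occurrence file, guessing whether it is tab- or comma-separated.

    Hypothetical helper: it counts tabs and commas in the header line and
    uses whichever character appears more often as the separator.
    """
    with open(path, encoding="utf-8") as f:
        header = f.readline()
    sep = "\t" if header.count("\t") > header.count(",") else ","
    return pd.read_csv(path, sep=sep, low_memory=False)


# Usage in the notebook (same file name as above):
# df = load_occurrences("nature-for-kate.txt")
```

The heuristic only looks at the header, which works because column names such as `individualCount` do not contain commas or tabs themselves.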

In [None]:
print('Hi! What is your name?')
user_name = input()
print(f'Pleased to meet you, {user_name}!')

In [None]:
print(f"OK {user_name}, let's take a look at the fossil collection in the Museum of Nature")

## Data

### Examining the imported data 

We will start by examining the data we just imported to see what information is available for us to work with. We will then list the column headers to find out which names we can use in our queries.

In [None]:
# Let's take a look at the imported data (pandas shows the first and last few rows)
df

In [None]:
# let's see what all the column headers are 
df.columns.tolist()
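Every later cell depends on particular column names, so it can help to check up front that those columns exist and how many records actually have a value in each; a column that is mostly empty will produce a sparse chart. A small sketch, where `column_report` is a hypothetical helper and the column list is copied from the cells in this notebook:

```python
import pandas as pd

# Columns used by the plots in this notebook (taken from the cells below)
COLUMNS_USED = [
    "year", "individualCount", "recordNumber", "genus", "species",
    "class", "family", "kingdom", "phylum", "verbatimScientificName",
    "stateProvince", "identifiedBy", "dateIdentified",
]


def column_report(frame, columns=COLUMNS_USED):
    """Return one row per expected column: whether it exists and how many cells are filled in."""
    rows = []
    for col in columns:
        present = col in frame.columns
        filled = int(frame[col].notna().sum()) if present else 0
        rows.append({"column": col, "present": present, "non_missing": filled})
    return pd.DataFrame(rows).set_index("column")


# Usage in the notebook:
# column_report(df)
```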

### Scatterplot with sample data

Now that we know which column names we can use, we are going to use them to draw some scatterplots from a random sample of 3000 records. First we will plot each sampled record's year against its individual count (the number of individuals noted on that record), and then we will plot the year against the record number assigned to each record.

In [None]:
# Let's see what a random sample of records can reveal about the fossils in the Museum of Nature.
# We plot the 'year' column against 'individualCount' (the number of individuals on each record).

from bokeh.plotting import figure, output_notebook, show
from bokeh.models import ColumnDataSource, HoverTool
output_notebook()

# Take up to 3000 random rows; random_state makes the sample repeatable
sample = df.sample(n=min(3000, len(df)), random_state=1)
source = ColumnDataSource(sample)

p = figure()
# scatter() draws circle markers by default
p.scatter(x='year', y='individualCount',
          source=source,
          size=10, color='green')

p.title.text = 'Fossils in the Museum of Nature (random sample of records)'
p.xaxis.axis_label = 'Year'
p.yaxis.axis_label = 'Individual Count per Record'

# Show the taxonomy of each point when hovering over it
hover = HoverTool()
hover.tooltips = [
    ('Genus', '@genus'),
    ('Species', '@species'),
    ('Class', '@class'),
    ('Family', '@family'),
]
p.add_tools(hover)

show(p)
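A random sample is easiest to interpret when every sampled row can actually be drawn and when the same sample comes back each time the cell is run. A minimal sketch of a helper that drops rows where either plotted column is empty before sampling, caps the sample at the number of rows available and fixes the random seed; `plot_sample` is a hypothetical name:

```python
import pandas as pd


def plot_sample(frame, x, y, n=3000, seed=1):
    """Return a random sample of up to n rows that have values in both x and y.

    Hypothetical helper: rows missing x or y are removed first, so every
    sampled row can be plotted, and the fixed seed makes the sample repeatable.
    """
    complete = frame.dropna(subset=[x, y])
    return complete.sample(n=min(n, len(complete)), random_state=seed)


# Usage in the notebook, before building the ColumnDataSource:
# sample = plot_sample(df, "year", "individualCount")
```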

### Scatterplot with sample data on record numbers

In [None]:
# Let's plot a random sample of 3000 records by year and record number.
# In Darwin Core data, 'recordNumber' is the identifier given to a record when it
# was collected (such as a collector's number), not a count of records.

from bokeh.plotting import figure, output_notebook, show
from bokeh.models import ColumnDataSource, HoverTool
output_notebook()

# Take up to 3000 random rows; random_state makes the sample repeatable
sample = df.sample(n=min(3000, len(df)), random_state=2)
source = ColumnDataSource(sample)

p = figure()
p.scatter(x='year', y='recordNumber',
          source=source,
          size=10, color='blue')

p.title.text = 'Record Numbers by Year in the Museum of Nature (random sample)'
p.xaxis.axis_label = 'Year'
p.yaxis.axis_label = 'Record Number'

hover = HoverTool()
hover.tooltips = [
    ('Genus', '@genus'),
    ('Species', '@species'),
    ('Record Number', '@recordNumber'),
]
p.add_tools(hover)

show(p)
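The scatterplot above places individual record numbers on a timeline; the question of how many records, and how many distinct species, belong to each year can be answered directly by grouping on `year`. In Darwin Core data `year` is the year of the collecting event. A minimal sketch, assuming the `year` and `species` columns used above; `records_per_year` is a hypothetical helper:

```python
import pandas as pd


def records_per_year(frame):
    """Count records and distinct species for each value of 'year'.

    Rows with no year are left out. 'species' is the same column shown in the
    hover tooltips of the scatterplots.
    """
    dated = frame.dropna(subset=["year"])
    dated = dated.assign(year=dated["year"].astype(int))
    grouped = dated.groupby("year")
    return pd.DataFrame({
        "records": grouped.size(),
        "distinct_species": grouped["species"].nunique(),
    })


# Usage in the notebook:
# per_year = records_per_year(df)
# px.bar(per_year.reset_index(), x="year", y="records")
```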

### Bar graphs showing the different taxonomic groups

Since we looked at a sample of the collection, why don't we now look at the entire fossil collection? We will start by looking at the different kingdoms. Then we will look at the different phyla. After that we will zoom in further and look at the genera. Lastly, we will look at the different scientific names of these fossils.
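As we zoom in from kingdom to scientific name, the number of distinct values usually grows, and a bar graph with hundreds of bars becomes hard to read. A short sketch that counts how many distinct values each level has before you draw it; `distinct_per_rank` and the `RANKS` list are hypothetical names, with the column names taken from the cells below:

```python
import pandas as pd

# Levels from broadest to narrowest, as used in the bar graphs below
RANKS = ["kingdom", "phylum", "genus", "verbatimScientificName"]


def distinct_per_rank(frame, ranks=RANKS):
    """Return how many distinct non-missing values each rank column contains."""
    return pd.Series({rank: frame[rank].nunique() for rank in ranks}, name="distinct_values")


# Usage in the notebook:
# distinct_per_rank(df)
```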

In [None]:
# Now let's see what a bar graph can show us about the different kingdoms in the collection

import plotly.express as px
fig = px.histogram(df, x="kingdom")
fig.show()

In [None]:
# What if we change the bar graph to show phylum instead? Let's see what it shows

import plotly.express as px
fig = px.histogram(df, x="phylum")
fig.show()

In [None]:
# How about if we look at the genus this time

import plotly.express as px
fig = px.histogram(df, x="genus")
fig.show()
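The genus and scientific-name bar graphs can have a long tail of rarely seen names. One option, sketched here under the assumption that the 20 most common values are what you want to see, is to count the values with pandas first and plot only the top of the list; `top_counts` is a hypothetical helper:

```python
import pandas as pd


def top_counts(frame, column, n=20, include_missing=False):
    """Return the n most common values in a column, most common first.

    With include_missing=True, empty cells are counted as their own category
    (shown as NaN) instead of being left out.
    """
    counts = frame[column].value_counts(dropna=not include_missing)
    return counts.head(n)


# Usage in the notebook, drawing only the 20 most common genera:
# top = top_counts(df, "genus").rename_axis("genus").reset_index(name="records")
# px.bar(top, x="genus", y="records")
```

Setting `include_missing=True` is a quick way to see how many records have no genus at all.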

In [None]:
# How about if we look at the scientific names

import plotly.express as px
fig = px.histogram(df, x="verbatimScientificName")
fig.show()

### Fossil Origination

Finally, we're going to look at the history of these fossils. 
We will start by looking at where the fossils were found. 
Then we will look at who identified them. 
Lastly, we will look at when they were identified.

In [None]:
# Now let's take a look at where these fossils came from

import plotly.express as px
fig = px.histogram(df, x="stateProvince")
fig.show()

In [None]:
# How about we ask the data to tell us who identified these fossils

import plotly.express as px
fig = px.histogram(df, x="identifiedBy")
fig.show()
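Darwin Core recommends listing several identifiers in one `identifiedBy` cell, separated by a vertical bar (" | "). If this dataset follows that convention, the bar graph above treats each combination of names as a separate bar. A minimal sketch that splits the names first so each person is counted individually; the helper `identifier_counts` and the assumption about the separator are not confirmed by the dataset's documentation:

```python
import pandas as pd


def identifier_counts(frame, column="identifiedBy", sep="|"):
    """Count records per identifier, splitting cells that list several people.

    Assumes multiple names are separated by '|' (the Darwin Core recommendation),
    so a record identified by two people is counted once for each of them.
    """
    names = (
        frame[column]
        .dropna()
        .astype(str)
        .str.split(sep)
        .explode()
        .str.strip()
    )
    names = names[names != ""]
    return names.value_counts()


# Usage in the notebook:
# identifier_counts(df).head(20)
```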

In [None]:
# Finally let's ask the data to show us when the fossils were identified

import plotly.express as px
fig = px.histogram(df, x="dateIdentified")
fig.show()
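Depending on how the dates are written, the bar graph above may bin `dateIdentified` as dates or treat each distinct date string as its own category. Counting identifications per year gives a compact table that does not depend on that. A sketch, assuming the dates follow the ISO 8601 form Darwin Core recommends (for example 2015-06-30), so the year is the first four digits; `identifications_per_year` is a hypothetical helper:

```python
import pandas as pd


def identifications_per_year(frame, column="dateIdentified"):
    """Count records per year of identification.

    Assumes ISO 8601 dates, so the year is taken from the first four digits;
    empty cells and values that do not start with a four-digit year are skipped.
    """
    years = (
        frame[column]
        .dropna()
        .astype(str)
        .str.extract(r"^(\d{4})", expand=False)
        .dropna()
        .astype(int)
    )
    return years.value_counts().sort_index()


# Usage in the notebook:
# identifications_per_year(df)
```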

## Conclusion

Hopefully, by running each cell and exploring the data, you now have a better understanding of what kinds of fossils the Museum of Nature has in its collection, the different species it holds and where the fossils came from. 

After seeing the various columns that are available to search, you might even have a few ideas of your own for filtering the data to see what other results appear. 

Should you wish to do some more research on your own, the sources of the data used in this notebook are listed below.

## References

Shepherd K, Torgersen J (2021). Canadian Museum of Nature Fossil Vertebrate Collection. Version 1.7. Canadian Museum of Nature. Occurrence dataset https://doi.org/10.15468/blxvml accessed via GBIF.org on 2021-04-12.

Shepherd K, Shorthouse D (2021). Canadian Museum of Nature Fossil Invertebrate Collection. Version 1.7. Canadian Museum of Nature. Occurrence dataset https://doi.org/10.15468/sh5u7g accessed via GBIF.org on 2021-04-12.

## Further Reading

[Link to Fantastic Fossils: A Guide to Finding and Identifying Prehistoric Life](https://www.jstor.org/stable/10.7312/prot19578)

[Link to Molecular and Fossil Dating: A Compatible Match?](https://www.jstor.org/stable/41406189)

[Link to Using Observations of Fossils to Reconstruct Ancient Environments](https://www.jstor.org/stable/43691333)
