# Making a GPlates feature collection from PBDB

The Paleobiology Database (PBDB) is an online resource containing a rich set of information on Earth's fossil record. The main website is here:  
https://paleobiodb.org/#/

This notebook shows a relatively simple way to turn PBDB occurrence data into a GPlates feature collection. Use this link to go directly to the page where you can formulate a specific HTTP request that selects data based on your chosen criteria:  
https://paleobiodb.org/classic/displayDownloadGenerator

Requirements:  
The process below requires the Python modules 'requests' and 'pandas', which are freely available and can be installed with pip, for example. It also requires pygplates.
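
Before running the cells below, it can help to confirm that the modules they import are actually available. The following is a minimal sketch using only the standard library. The helper name `missing_modules` is hypothetical and is not part of any of the packages used here.

```python
import importlib.util

def missing_modules(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# modules imported by this notebook
required = ['requests', 'pandas', 'pygplates']
print('missing modules:', missing_modules(required))
```

An empty list means all of the imports in Part 1 should succeed.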


### Part 1 - Get the Fossil Data

In [6]:
import requests
from io import StringIO
import pandas as pd
import pygplates

# This string defines the request. You can get it from the PBDB download generator listed above.
# Note 1: this assumes that you select 'csv' as the output format.
# Note 2: the download generator also controls which fields are available in the output. This
#         example is fairly minimal; many more fields are available.
url = 'https://paleobiodb.org/data1.2/occs/list.csv?base_name=Bryozoa&max_ma=200&min_ma=0&show=coords'

# send the request to the server - the entire output is contained within the object 'r'
r = requests.get(url)

# uncomment this line to see the entire output message and data
# print(r.text)

# this line reads the text part of the output (in this case, csv-formatted text) into a pandas dataframe.
# note that 'StringIO' is necessary because pd.read_csv expects a file or file-like object - r.text is
# a plain string, but 'StringIO(r.text)' wraps it so that pandas can read it like a file
df = pd.read_csv(StringIO(r.text))

# print the columns in the data table to see what we have
df.columns


Index([u'occurrence_no', u'record_type', u'reid_no', u'flags',
       u'collection_no', u'identified_name', u'identified_rank',
       u'identified_no', u'difference', u'accepted_name', u'accepted_rank',
       u'accepted_no', u'early_interval', u'late_interval', u'max_ma',
       u'min_ma', u'reference_no', u'lng', u'lat'],
      dtype='object')
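
The request string does not have to be pasted from the download generator. It can also be assembled from a dictionary of parameters, which makes it easier to change the taxon or the age range. The following is a sketch using the standard library's `urlencode`. The helper name `build_pbdb_url` is made up for illustration, and the parameter names are simply the ones that appear in the URL used in Part 1.

```python
from urllib.parse import urlencode

# endpoint taken from the request URL used in Part 1
PBDB_OCCS_CSV = 'https://paleobiodb.org/data1.2/occs/list.csv'

def build_pbdb_url(base_name, max_ma, min_ma, show='coords'):
    """Assemble an occurrence-list request URL from its query parameters."""
    params = {'base_name': base_name, 'max_ma': max_ma, 'min_ma': min_ma, 'show': show}
    # urlencode joins the key=value pairs with '&' and escapes special characters
    return PBDB_OCCS_CSV + '?' + urlencode(params)

url = build_pbdb_url('Bryozoa', 200, 0)
print(url)
```

The resulting string can be passed to `requests.get` in the same way as the hand-written URL.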

### Part 2 - Generate a GPlates Feature Collection

In [7]:
# put the points into a feature collection, using the lat/lng coordinates from the dataframe
point_features = []
for index, row in df.iterrows():
    # PointOnSphere takes latitude first, then longitude (both in degrees)
    point = pygplates.PointOnSphere(float(row.lat), float(row.lng))
    point_feature = pygplates.Feature()
    point_feature.set_geometry(point)
    # the valid time runs from the older age (max_ma) to the younger age (min_ma)
    point_feature.set_valid_time(row.max_ma, row.min_ma)
    point_features.append(point_feature)
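
Some occurrence records may have missing or inconsistent coordinates or ages, in which case building a point or setting a valid time could fail. One possible guard is sketched below in plain Python. The function name `is_valid_occurrence` and the specific range checks are assumptions for illustration, not part of pygplates or of the PBDB service.

```python
import math

def is_valid_occurrence(lat, lng, max_ma, min_ma):
    """Return True if the values look safe to turn into a point feature."""
    try:
        lat, lng, max_ma, min_ma = (float(v) for v in (lat, lng, max_ma, min_ma))
    except (TypeError, ValueError):
        # None or non-numeric text cannot be converted to a number
        return False
    # missing entries in a pandas dataframe usually appear as NaN
    if any(math.isnan(v) for v in (lat, lng, max_ma, min_ma)):
        return False
    # latitude and longitude must lie in the usual geographic ranges
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lng <= 180.0):
        return False
    # the older age bound must not be younger than the younger bound
    return max_ma >= min_ma

print(is_valid_occurrence(45.0, 10.0, 200.0, 0.0))
print(is_valid_occurrence(float('nan'), 10.0, 200.0, 0.0))
```

Inside the loop over `df.iterrows()`, a row could be skipped with `if not is_valid_occurrence(row.lat, row.lng, row.max_ma, row.min_ma): continue`.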

    

### Part 3 (optional) - Assign Plate IDs Based on an Existing Reconstruction Model

In [8]:
# static polygons are the 'partitioning features'
static_polygons = pygplates.FeatureCollection('../Data/Seton_etal_ESR2012_StaticPolygons_2012.1.gpmlz')

# The partition_into_plates function requires a rotation model, since sometimes this would be
# necessary even at present day (for example to resolve topological polygons)
rotation_model=pygplates.RotationModel('../Data/Seton_etal_ESR2012_2012.1.rot')
    
# partition_into_plates then assigns each point feature the plate id of the
# static polygon that contains it
partitioned_point_features = pygplates.partition_into_plates(static_polygons,
                                                             rotation_model,
                                                             point_features)
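
After partitioning, a quick tally of plate ids gives a sanity check on how the points were distributed across plates. The sketch below only assumes that each feature exposes `get_reconstruction_plate_id()`, as pygplates features do. The helper name `count_plate_ids` is hypothetical.

```python
from collections import Counter

def count_plate_ids(features):
    """Count how many features carry each reconstruction plate id."""
    return Counter(feature.get_reconstruction_plate_id() for feature in features)
```

For example, `count_plate_ids(partitioned_point_features).most_common(5)` would list the five plate ids with the most fossil occurrences.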


### Part 4 - Write the features to a file

In [12]:
output_features = pygplates.FeatureCollection(partitioned_point_features)

# Note that the output format could be shp, gmt, gpml or gpmlz - the extension controls the format
output_features.write('FossilData.gpmlz')
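
Because the extension alone selects the output format, a typo in the file name can be caught before writing. The following is a small sketch using only the standard library. The helper name `check_output_filename` is made up for illustration, and the list of extensions is simply the one given in the note above.

```python
import os

# extensions named in the note above
SUPPORTED_EXTENSIONS = ('.shp', '.gmt', '.gpml', '.gpmlz')

def check_output_filename(filename):
    """Return `filename` unchanged if its extension is one of the listed formats."""
    extension = os.path.splitext(filename)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError('unsupported extension %r, expected one of %s'
                         % (extension, ', '.join(SUPPORTED_EXTENSIONS)))
    return filename

print(check_output_filename('FossilData.gpmlz'))
```

The checked name can then be passed to `output_features.write`.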
