# Section 1: Preliminary Data Exploration with pandas

### Description of the dataset

Compiled occurrence records for prey items of listed species found in California drylands with associated environmental data. This is a synthesized dataset from 1945 to 2022 containing non-sensitive data. There is an associated metadata file. 

*citation:* Rachel King, Jenna Braun, Michael Westphal, & CJ Lortie. (2023). Compiled occurrence records for prey items of listed species found in California drylands with associated environmental data. Knowledge Network for Biocomplexity. doi:10.5063/F1VM49RH.

Date of access: 10/3/24

[link to data](https://knb.ecoinformatics.org/view/doi%3A10.5063%2FF1VM49RH)

In [1]:
import pandas as pd

## 2. Metadata Exploration

In [3]:
# Access metadata from repository
pd.read_csv("https://knb.ecoinformatics.org/knb/d1/mn/v2/object/urn%3Auuid%3A3baf7289-bf90-4db3-ad11-58785c09b26e")

Unnamed: 0,attribute,description
0,class,taxonomic class of observed organism
1,order,taxonomic order of observed organism
2,family,taxonomic family of observed organism
3,genus,genus name
4,species,full latin binomial name
5,lat,latitude of occurrence record in decimal degrees
6,long,longitude of occurrence record in decimal degrees
7,eventDate,date and time of observation in YYYY-MM-DD HH:...
8,day,day of the month of observation
9,month,month of observation
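
The `eventDate` description in the output above is cut off because pandas shortens long strings when it displays a dataframe. As a minimal sketch, the example below uses a small stand-in table (`meta` and `long_description` are hypothetical and are not the real metadata contents) to show how temporarily setting `display.max_colwidth` to `None` reveals the full text.

```python
import pandas as pd

# Hypothetical placeholder text, longer than the default 50-character display limit
long_description = ("placeholder text repeated to exceed the default width " * 2).strip()

# Hypothetical stand-in for the metadata table (not the real file contents)
meta = pd.DataFrame({
    "attribute": ["lat", "some_attribute"],
    "description": [
        "latitude of occurrence record in decimal degrees",
        long_description,
    ],
})

# Default display: long strings are shortened and end in "..."
truncated_repr = repr(meta)

# Lift the column-width limit only inside this block
with pd.option_context("display.max_colwidth", None):
    full_repr = repr(meta)

print(truncated_repr)
print(full_repr)
```

Using `pd.option_context` keeps the change local, so later cells still display with the default width.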


## 3. Data loading

In [5]:
# Load data
prey = pd.read_csv('https://knb.ecoinformatics.org/knb/d1/mn/v2/object/urn%3Auuid%3A23d42528-1048-45d4-85d1-7e13b666e744')

In [6]:
type(prey)

pandas.core.frame.DataFrame

## 4. Look at your data

In [8]:
# See all columns when viewing a dataframe
pd.set_option("display.max_columns", None)

In [9]:
prey

Unnamed: 0,class,order,family,genus,species,lat,long,eventDate,day,month,year,institutionCode,coordinateUncertaintyInMeters,basisOfRecord,individualCount,datasetKey,ID,bio1,bio10,bio11,bio12,bio13,bio14,bio15,bio16,bio17,bio18,bio19,bio2,bio3,bio4,bio5,bio6,bio7,bio8,bio9,dist,dem90_hf,lc_class,ndvi
0,Insecta,Odonata,Libellulidae,Sympetrum,Sympetrum corruptum,37.173971,-121.856329,2020-09-29T08:24:00Z,29.0,9.0,2020,iNaturalist,684.0,HUMAN_OBSERVATION,,50c9509d-22c7-4a22-a47d-8c48425ef4a7,1,141.925064,199.384964,89.602631,727.380493,151.920853,1.809903,89.132339,391.141083,7.979710,13.979711,391.141083,139.787125,53.000000,4352.049805,289.341888,27.307756,262.034149,89.602631,196.067307,8.935099,352.799927,grassland,6343.969238
1,Malacostraca,Isopoda,Armadillidiidae,Armadillidium,Armadillidium vulgare,37.688580,-122.436608,2021-12-04T14:49:59Z,4.0,12.0,2021,iNaturalist,6.0,HUMAN_OBSERVATION,,50c9509d-22c7-4a22-a47d-8c48425ef4a7,2,136.366943,165.841537,104.141464,609.651916,126.271736,1.001933,87.368050,318.303192,8.236301,12.390024,318.301269,83.996452,53.266750,2455.836914,217.470047,61.847702,155.622360,107.227386,159.852875,2.005335,112.895271,grassland,4995.110352
2,Insecta,Hymenoptera,Apidae,Bombus,Bombus vosnesenskii,33.774911,-116.679713,2021-06-18T08:52:00Z,18.0,6.0,2021,iNaturalist,31.0,HUMAN_OBSERVATION,,50c9509d-22c7-4a22-a47d-8c48425ef4a7,3,82.885788,161.338638,19.747343,724.200195,120.836441,4.910511,70.099121,353.376068,49.060783,77.778824,342.061676,144.624817,47.098640,5776.527344,254.696030,-47.440784,302.136810,20.866125,134.750717,4.422367,2315.607666,grassland,5118.055664
3,Insecta,Hymenoptera,Apidae,Apis,Apis mellifera,32.848001,-117.050126,2016-04-17T09:28:00Z,17.0,4.0,2016,iNaturalist,15.0,HUMAN_OBSERVATION,,50c9509d-22c7-4a22-a47d-8c48425ef4a7,4,171.959030,221.238907,130.363754,323.523682,62.686253,1.000000,87.075172,179.273834,4.989321,9.743295,163.154465,118.763413,53.612831,3539.726807,286.711762,65.920944,220.790817,133.117722,213.238907,4.316342,199.882156,closed_shrubland,3649.374756
4,Insecta,Hemiptera,Lygaeidae,Oncopeltus,Oncopeltus fasciatus,32.739453,-117.133980,2019-04-06T18:44:31Z,6.0,4.0,2019,iNaturalist,16.0,HUMAN_OBSERVATION,,50c9509d-22c7-4a22-a47d-8c48425ef4a7,5,171.996231,214.353073,136.138779,273.203491,58.141060,1.000000,91.661606,162.390411,4.000000,7.109778,142.560562,91.184349,50.718666,3033.009521,259.712738,81.205818,178.506927,138.277557,205.749908,0.651661,72.789574,urban,3468.256104
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
111151,Insecta,Hymenoptera,Formicidae,Pogonomyrmex,Pogonomyrmex californicus,33.660853,-115.721380,1980-04-10T00:00:00Z,10.0,4.0,1980,MCZ,3036.0,PRESERVED_SPECIMEN,1.0,4bfac3ea-8763-4f4b-a71a-76a6f5f243d3,111152,200.539291,297.044373,111.539291,131.447403,21.313471,0.464611,50.358723,45.637127,7.404018,33.626942,44.287857,158.105041,43.560184,7319.267090,393.674744,34.807846,358.866913,217.827911,227.579758,0.925702,548.778870,barren,1187.127441
111152,Insecta,Hymenoptera,Formicidae,Pogonomyrmex,Pogonomyrmex californicus,35.933333,-116.716667,1998-03-26T00:00:00Z,26.0,3.0,1998,MCZ,,PRESERVED_SPECIMEN,2.0,4bfac3ea-8763-4f4b-a71a-76a6f5f243d3,111153,230.166946,344.166931,117.216942,49.012608,8.380343,1.000000,50.700531,21.778294,4.391233,9.430344,19.810688,163.872391,39.000000,8878.634766,442.558167,26.608175,415.950012,141.977646,273.793304,7.429875,-13.553053,grassland,792.632141
111153,Insecta,Hymenoptera,Formicidae,Liometopum,Liometopum occidentale,34.156833,-118.726833,2010-05-21T00:00:00Z,21.0,5.0,2010,MCZ,,PRESERVED_SPECIMEN,40.0,4bfac3ea-8763-4f4b-a71a-76a6f5f243d3,111154,161.271088,216.747421,116.372208,499.779816,106.116470,0.000000,96.797958,289.033630,4.470360,11.920360,279.654663,128.898209,53.252625,3962.391846,291.092560,51.291302,239.801269,118.917351,209.822205,2.387915,347.122986,grassland,4219.497559
111154,Insecta,Hymenoptera,Formicidae,Pogonomyrmex,Pogonomyrmex californicus,32.989167,-116.658500,2010-05-25T00:00:00Z,25.0,5.0,2010,MCZ,,PRESERVED_SPECIMEN,30.0,4bfac3ea-8763-4f4b-a71a-76a6f5f243d3,111155,140.342300,217.269943,76.576256,559.577881,102.406273,3.000000,78.449997,293.045471,25.936378,36.503853,270.567810,154.063629,49.683960,5537.141602,315.901031,8.395161,307.505859,83.576256,199.928528,6.917791,981.674377,grassland,4865.626465


## 5. pd.DataFrame preliminary exploration

- `head()`
- `tail()`
- `info()`
- `nunique()`

In [12]:
# head() lets you view the first 5 rows of a dataframe
prey.head()

Unnamed: 0,class,order,family,genus,species,lat,long,eventDate,day,month,year,institutionCode,coordinateUncertaintyInMeters,basisOfRecord,individualCount,datasetKey,ID,bio1,bio10,bio11,bio12,bio13,bio14,bio15,bio16,bio17,bio18,bio19,bio2,bio3,bio4,bio5,bio6,bio7,bio8,bio9,dist,dem90_hf,lc_class,ndvi
0,Insecta,Odonata,Libellulidae,Sympetrum,Sympetrum corruptum,37.173971,-121.856329,2020-09-29T08:24:00Z,29.0,9.0,2020,iNaturalist,684.0,HUMAN_OBSERVATION,,50c9509d-22c7-4a22-a47d-8c48425ef4a7,1,141.925064,199.384964,89.602631,727.380493,151.920853,1.809903,89.132339,391.141083,7.97971,13.979711,391.141083,139.787125,53.0,4352.049805,289.341888,27.307756,262.034149,89.602631,196.067307,8.935099,352.799927,grassland,6343.969238
1,Malacostraca,Isopoda,Armadillidiidae,Armadillidium,Armadillidium vulgare,37.68858,-122.436608,2021-12-04T14:49:59Z,4.0,12.0,2021,iNaturalist,6.0,HUMAN_OBSERVATION,,50c9509d-22c7-4a22-a47d-8c48425ef4a7,2,136.366943,165.841537,104.141464,609.651916,126.271736,1.001933,87.36805,318.303192,8.236301,12.390024,318.301269,83.996452,53.26675,2455.836914,217.470047,61.847702,155.62236,107.227386,159.852875,2.005335,112.895271,grassland,4995.110352
2,Insecta,Hymenoptera,Apidae,Bombus,Bombus vosnesenskii,33.774911,-116.679713,2021-06-18T08:52:00Z,18.0,6.0,2021,iNaturalist,31.0,HUMAN_OBSERVATION,,50c9509d-22c7-4a22-a47d-8c48425ef4a7,3,82.885788,161.338638,19.747343,724.200195,120.836441,4.910511,70.099121,353.376068,49.060783,77.778824,342.061676,144.624817,47.09864,5776.527344,254.69603,-47.440784,302.13681,20.866125,134.750717,4.422367,2315.607666,grassland,5118.055664
3,Insecta,Hymenoptera,Apidae,Apis,Apis mellifera,32.848001,-117.050126,2016-04-17T09:28:00Z,17.0,4.0,2016,iNaturalist,15.0,HUMAN_OBSERVATION,,50c9509d-22c7-4a22-a47d-8c48425ef4a7,4,171.95903,221.238907,130.363754,323.523682,62.686253,1.0,87.075172,179.273834,4.989321,9.743295,163.154465,118.763413,53.612831,3539.726807,286.711762,65.920944,220.790817,133.117722,213.238907,4.316342,199.882156,closed_shrubland,3649.374756
4,Insecta,Hemiptera,Lygaeidae,Oncopeltus,Oncopeltus fasciatus,32.739453,-117.13398,2019-04-06T18:44:31Z,6.0,4.0,2019,iNaturalist,16.0,HUMAN_OBSERVATION,,50c9509d-22c7-4a22-a47d-8c48425ef4a7,5,171.996231,214.353073,136.138779,273.203491,58.14106,1.0,91.661606,162.390411,4.0,7.109778,142.560562,91.184349,50.718666,3033.009521,259.712738,81.205818,178.506927,138.277557,205.749908,0.651661,72.789574,urban,3468.256104


In [13]:
# tail() lets you view the last 5 rows of a dataframe
prey.tail()

Unnamed: 0,class,order,family,genus,species,lat,long,eventDate,day,month,year,institutionCode,coordinateUncertaintyInMeters,basisOfRecord,individualCount,datasetKey,ID,bio1,bio10,bio11,bio12,bio13,bio14,bio15,bio16,bio17,bio18,bio19,bio2,bio3,bio4,bio5,bio6,bio7,bio8,bio9,dist,dem90_hf,lc_class,ndvi
111151,Insecta,Hymenoptera,Formicidae,Pogonomyrmex,Pogonomyrmex californicus,33.660853,-115.72138,1980-04-10T00:00:00Z,10.0,4.0,1980,MCZ,3036.0,PRESERVED_SPECIMEN,1.0,4bfac3ea-8763-4f4b-a71a-76a6f5f243d3,111152,200.539291,297.044373,111.539291,131.447403,21.313471,0.464611,50.358723,45.637127,7.404018,33.626942,44.287857,158.105041,43.560184,7319.26709,393.674744,34.807846,358.866913,217.827911,227.579758,0.925702,548.77887,barren,1187.127441
111152,Insecta,Hymenoptera,Formicidae,Pogonomyrmex,Pogonomyrmex californicus,35.933333,-116.716667,1998-03-26T00:00:00Z,26.0,3.0,1998,MCZ,,PRESERVED_SPECIMEN,2.0,4bfac3ea-8763-4f4b-a71a-76a6f5f243d3,111153,230.166946,344.166931,117.216942,49.012608,8.380343,1.0,50.700531,21.778294,4.391233,9.430344,19.810688,163.872391,39.0,8878.634766,442.558167,26.608175,415.950012,141.977646,273.793304,7.429875,-13.553053,grassland,792.632141
111153,Insecta,Hymenoptera,Formicidae,Liometopum,Liometopum occidentale,34.156833,-118.726833,2010-05-21T00:00:00Z,21.0,5.0,2010,MCZ,,PRESERVED_SPECIMEN,40.0,4bfac3ea-8763-4f4b-a71a-76a6f5f243d3,111154,161.271088,216.747421,116.372208,499.779816,106.11647,0.0,96.797958,289.03363,4.47036,11.92036,279.654663,128.898209,53.252625,3962.391846,291.09256,51.291302,239.801269,118.917351,209.822205,2.387915,347.122986,grassland,4219.497559
111154,Insecta,Hymenoptera,Formicidae,Pogonomyrmex,Pogonomyrmex californicus,32.989167,-116.6585,2010-05-25T00:00:00Z,25.0,5.0,2010,MCZ,,PRESERVED_SPECIMEN,30.0,4bfac3ea-8763-4f4b-a71a-76a6f5f243d3,111155,140.3423,217.269943,76.576256,559.577881,102.406273,3.0,78.449997,293.045471,25.936378,36.503853,270.56781,154.063629,49.68396,5537.141602,315.901031,8.395161,307.505859,83.576256,199.928528,6.917791,981.674377,grassland,4865.626465
111155,Insecta,Lepidoptera,Nymphalidae,Vanessa,Vanessa cardui,36.2514,-117.7314,1992-09-02T00:00:00Z,2.0,9.0,1992,NTNU-VM,100000.0,PRESERVED_SPECIMEN,1.0,1bec5de3-758c-4ed2-ab13-1597601ad07a,111156,102.202797,197.274719,18.412592,294.105835,48.132629,6.007638,59.663151,133.747376,27.179491,31.66029,133.747376,141.080276,41.0,7069.728027,293.179962,-47.896355,341.076324,20.24128,171.378082,23.553705,1913.307861,grassland,1628.940918


In [14]:
# info() lets you see a summary of information about the dataframe
prey.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 111156 entries, 0 to 111155
Data columns (total 40 columns):
 #   Column                         Non-Null Count   Dtype  
---  ------                         --------------   -----  
 0   class                          111156 non-null  object 
 1   order                          111054 non-null  object 
 2   family                         111156 non-null  object 
 3   genus                          111156 non-null  object 
 4   species                        111156 non-null  object 
 5   lat                            111156 non-null  float64
 6   long                           111156 non-null  float64
 7   eventDate                      111156 non-null  object 
 8   day                            110998 non-null  float64
 9   month                          111127 non-null  float64
 10  year                           111156 non-null  int64  
 11  institutionCode                110414 non-null  object 
 12  coordinateUncertaintyInMeters 
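
The non-null counts reported by `info()` imply the number of missing values in each column: `order`, for example, has 111054 non-null entries out of 111156 rows, so 102 values are missing. A minimal sketch of computing this directly follows. It assumes a small stand-in dataframe (`sample` is hypothetical, with invented values) rather than the full `prey` data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `prey`; the values are invented for illustration
sample = pd.DataFrame({
    "order": ["Odonata", None, "Hymenoptera", "Hymenoptera"],
    "day": [29.0, 4.0, np.nan, np.nan],
    "year": [2020, 2021, 2021, 2016],
})

# Count missing values per column (total rows minus the non-null count)
missing = sample.isna().sum()

# Keep only the columns that have gaps, largest count first
missing = missing[missing > 0].sort_values(ascending=False)
print(missing)
```

The same two lines should work on `prey` itself by replacing `sample` with `prey`.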

In [16]:
# nunique() gives you the number of unique values in each column (missing values are not counted)
prey.nunique()

class                                 4
order                                18
family                               63
genus                                92
species                              97
lat                               91909
long                              91965
eventDate                        102140
day                                  31
month                                12
year                                 77
institutionCode                      77
coordinateUncertaintyInMeters      4450
basisOfRecord                         4
individualCount                      42
datasetKey                           88
ID                               111156
bio1                              11778
bio10                             11981
bio11                             11803
bio12                             12117
bio13                             12020
bio14                              4586
bio15                             11848
bio16                             12102
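
For columns with only a few unique values, such as `class`, it can help to see which values occur and how often. The sketch below is one possible follow-up to `nunique()`, using `value_counts()` on a small stand-in dataframe (`sample` is hypothetical, with invented rows).

```python
import pandas as pd

# Hypothetical stand-in for `prey`; the rows are invented for illustration
sample = pd.DataFrame({
    "class": ["Insecta", "Malacostraca", "Insecta", "Insecta"],
    "lc_class": ["grassland", "grassland", "urban", "grassland"],
})

# Number of distinct values in one column
n_classes = sample["class"].nunique()

# How many rows carry each distinct value, most frequent first
class_counts = sample["class"].value_counts()

print(n_classes)
print(class_counts)
```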


In [23]:
# shape gives you the dimensions (# rows and # columns) of the data frame
prey.shape

(111156, 40)

In [21]:
# columns lists the column names of the data frame
prey.columns

Index(['class', 'order', 'family', 'genus', 'species', 'lat', 'long',
       'eventDate', 'day', 'month', 'year', 'institutionCode',
       'coordinateUncertaintyInMeters', 'basisOfRecord', 'individualCount',
       'datasetKey', 'ID', 'bio1', 'bio10', 'bio11', 'bio12', 'bio13', 'bio14',
       'bio15', 'bio16', 'bio17', 'bio18', 'bio19', 'bio2', 'bio3', 'bio4',
       'bio5', 'bio6', 'bio7', 'bio8', 'bio9', 'dist', 'dem90_hf', 'lc_class',
       'ndvi'],
      dtype='object')

In [24]:
# dtypes lists the data type of each column in the data frame
prey.dtypes

class                             object
order                             object
family                            object
genus                             object
species                           object
lat                              float64
long                             float64
eventDate                         object
day                              float64
month                            float64
year                               int64
institutionCode                   object
coordinateUncertaintyInMeters    float64
basisOfRecord                     object
individualCount                  float64
datasetKey                        object
ID                                 int64
bio1                             float64
bio10                            float64
bio11                            float64
bio12                            float64
bio13                            float64
bio14                            float64
bio15                            float64
bio16           
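
The `dtypes` output shows `eventDate` stored as `object` (plain strings) and `day` stored as `float64`, which is how pandas represents an integer column that contains missing values. As a hedged sketch of one way to tidy these types, the example below parses the timestamps and converts `day` to the nullable `Int64` type. The dataframe `sample` is a hypothetical stand-in; its timestamps follow the format shown earlier, and the missing `day` value is invented.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `prey`; the missing day is invented for illustration
sample = pd.DataFrame({
    "eventDate": ["2020-09-29T08:24:00Z", "1980-04-10T00:00:00Z"],
    "day": [29.0, np.nan],
})

# Parse ISO 8601 strings into timezone-aware timestamps
sample["eventDate"] = pd.to_datetime(sample["eventDate"], utc=True)

# Nullable integer type: whole numbers are kept, missing values become <NA>
sample["day"] = sample["day"].astype("Int64")

print(sample.dtypes)
```

Once `eventDate` is a datetime column, parts such as the year are available through the `.dt` accessor, for example `sample["eventDate"].dt.year`.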

## 6. Update column names

In [26]:
# institutionCode and datasetKey to institution_code and dataset_key
prey = prey.rename(columns = {'institutionCode' : 'institution_code',
                             'datasetKey' : 'dataset_key'})
prey

Unnamed: 0,class,order,family,genus,species,lat,long,eventDate,day,month,year,institution_code,coordinateUncertaintyInMeters,basisOfRecord,individualCount,dataset_key,ID,bio1,bio10,bio11,bio12,bio13,bio14,bio15,bio16,bio17,bio18,bio19,bio2,bio3,bio4,bio5,bio6,bio7,bio8,bio9,dist,dem90_hf,lc_class,ndvi
0,Insecta,Odonata,Libellulidae,Sympetrum,Sympetrum corruptum,37.173971,-121.856329,2020-09-29T08:24:00Z,29.0,9.0,2020,iNaturalist,684.0,HUMAN_OBSERVATION,,50c9509d-22c7-4a22-a47d-8c48425ef4a7,1,141.925064,199.384964,89.602631,727.380493,151.920853,1.809903,89.132339,391.141083,7.979710,13.979711,391.141083,139.787125,53.000000,4352.049805,289.341888,27.307756,262.034149,89.602631,196.067307,8.935099,352.799927,grassland,6343.969238
1,Malacostraca,Isopoda,Armadillidiidae,Armadillidium,Armadillidium vulgare,37.688580,-122.436608,2021-12-04T14:49:59Z,4.0,12.0,2021,iNaturalist,6.0,HUMAN_OBSERVATION,,50c9509d-22c7-4a22-a47d-8c48425ef4a7,2,136.366943,165.841537,104.141464,609.651916,126.271736,1.001933,87.368050,318.303192,8.236301,12.390024,318.301269,83.996452,53.266750,2455.836914,217.470047,61.847702,155.622360,107.227386,159.852875,2.005335,112.895271,grassland,4995.110352
2,Insecta,Hymenoptera,Apidae,Bombus,Bombus vosnesenskii,33.774911,-116.679713,2021-06-18T08:52:00Z,18.0,6.0,2021,iNaturalist,31.0,HUMAN_OBSERVATION,,50c9509d-22c7-4a22-a47d-8c48425ef4a7,3,82.885788,161.338638,19.747343,724.200195,120.836441,4.910511,70.099121,353.376068,49.060783,77.778824,342.061676,144.624817,47.098640,5776.527344,254.696030,-47.440784,302.136810,20.866125,134.750717,4.422367,2315.607666,grassland,5118.055664
3,Insecta,Hymenoptera,Apidae,Apis,Apis mellifera,32.848001,-117.050126,2016-04-17T09:28:00Z,17.0,4.0,2016,iNaturalist,15.0,HUMAN_OBSERVATION,,50c9509d-22c7-4a22-a47d-8c48425ef4a7,4,171.959030,221.238907,130.363754,323.523682,62.686253,1.000000,87.075172,179.273834,4.989321,9.743295,163.154465,118.763413,53.612831,3539.726807,286.711762,65.920944,220.790817,133.117722,213.238907,4.316342,199.882156,closed_shrubland,3649.374756
4,Insecta,Hemiptera,Lygaeidae,Oncopeltus,Oncopeltus fasciatus,32.739453,-117.133980,2019-04-06T18:44:31Z,6.0,4.0,2019,iNaturalist,16.0,HUMAN_OBSERVATION,,50c9509d-22c7-4a22-a47d-8c48425ef4a7,5,171.996231,214.353073,136.138779,273.203491,58.141060,1.000000,91.661606,162.390411,4.000000,7.109778,142.560562,91.184349,50.718666,3033.009521,259.712738,81.205818,178.506927,138.277557,205.749908,0.651661,72.789574,urban,3468.256104
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
111151,Insecta,Hymenoptera,Formicidae,Pogonomyrmex,Pogonomyrmex californicus,33.660853,-115.721380,1980-04-10T00:00:00Z,10.0,4.0,1980,MCZ,3036.0,PRESERVED_SPECIMEN,1.0,4bfac3ea-8763-4f4b-a71a-76a6f5f243d3,111152,200.539291,297.044373,111.539291,131.447403,21.313471,0.464611,50.358723,45.637127,7.404018,33.626942,44.287857,158.105041,43.560184,7319.267090,393.674744,34.807846,358.866913,217.827911,227.579758,0.925702,548.778870,barren,1187.127441
111152,Insecta,Hymenoptera,Formicidae,Pogonomyrmex,Pogonomyrmex californicus,35.933333,-116.716667,1998-03-26T00:00:00Z,26.0,3.0,1998,MCZ,,PRESERVED_SPECIMEN,2.0,4bfac3ea-8763-4f4b-a71a-76a6f5f243d3,111153,230.166946,344.166931,117.216942,49.012608,8.380343,1.000000,50.700531,21.778294,4.391233,9.430344,19.810688,163.872391,39.000000,8878.634766,442.558167,26.608175,415.950012,141.977646,273.793304,7.429875,-13.553053,grassland,792.632141
111153,Insecta,Hymenoptera,Formicidae,Liometopum,Liometopum occidentale,34.156833,-118.726833,2010-05-21T00:00:00Z,21.0,5.0,2010,MCZ,,PRESERVED_SPECIMEN,40.0,4bfac3ea-8763-4f4b-a71a-76a6f5f243d3,111154,161.271088,216.747421,116.372208,499.779816,106.116470,0.000000,96.797958,289.033630,4.470360,11.920360,279.654663,128.898209,53.252625,3962.391846,291.092560,51.291302,239.801269,118.917351,209.822205,2.387915,347.122986,grassland,4219.497559
111154,Insecta,Hymenoptera,Formicidae,Pogonomyrmex,Pogonomyrmex californicus,32.989167,-116.658500,2010-05-25T00:00:00Z,25.0,5.0,2010,MCZ,,PRESERVED_SPECIMEN,30.0,4bfac3ea-8763-4f4b-a71a-76a6f5f243d3,111155,140.342300,217.269943,76.576256,559.577881,102.406273,3.000000,78.449997,293.045471,25.936378,36.503853,270.567810,154.063629,49.683960,5537.141602,315.901031,8.395161,307.505859,83.576256,199.928528,6.917791,981.674377,grassland,4865.626465
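
The rename above handles two columns by name. If the goal were to convert every camelCase column to snake_case, `rename` also accepts a function that is applied to each column name. The sketch below is one possible approach, not part of the original workflow: `camel_to_snake` is a hypothetical helper, and `sample` is an empty stand-in dataframe that reuses a few of the column names from `prey`.

```python
import re

import pandas as pd


def camel_to_snake(name: str) -> str:
    """Insert "_" between a lowercase letter or digit and a capital, then lowercase."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


# Hypothetical stand-in: an empty dataframe with some of the column names from `prey`
sample = pd.DataFrame(columns=[
    "institutionCode",
    "coordinateUncertaintyInMeters",
    "basisOfRecord",
    "ID",
    "lc_class",
])

# Passing a function applies it to every column name
sample = sample.rename(columns=camel_to_snake)
print(list(sample.columns))
```

Names that are already lowercase, such as `lc_class`, pass through unchanged, and an all-caps name such as `ID` is only lowercased.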
