<a href="https://colab.research.google.com/github/jtklein/BatchUploadINaturalist/blob/master/batchUpload.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction

In their recent paper, Heberling and Isaac ([paper here](https://bsapubs.onlinelibrary.wiley.com/doi/full/10.1002/aps3.1193)) argued for augmenting herbarium collections by linking them to observations recorded on the community-based biodiversity forum iNaturalist (https://www.inaturalist.org/). In particular, this makes available photos of the plant and of its habitat taken before it was collected and pressed: data that have even greater value when linked to a physical specimen and that might otherwise be unavailable to potential users. This works well if you already use the iNaturalist smartphone app in the field, as you can create observations on the fly and even use them to export collections data for databasing later. But if you're sold on the value of the approach, what about all the data you collected in years past, which might be sitting more or less unused on your hard drive? It is possible to create observations from past records on the website or through the app. However, if your backlog is large, this quickly becomes a massive task.

Luckily, iNaturalist also has an API that you can call directly for standard tasks such as creating or deleting an observation. Using the API makes it quick to perform these actions in batch for several hundred observations.

The goal of this post is to give you some ideas for a workflow for creating many new observation records on iNaturalist. To make it more accessible, we will do so for a real-life backlog from a biologist. To show the workflow, we have chosen to set up a Google Colaboratory notebook. This relatively new Google product has some features that are helpful here: it provides a standardized machine setup, lets you share your code interactively with other users, and integrates easily with Google Drive, which is a good place to share the photos to be uploaded. In principle it is a Python Jupyter notebook in the cloud with a dedicated machine running your code. For further information about Colaboratory notebooks, check out the [documentation](https://colab.research.google.com/notebooks/welcome.ipynb). We have chosen this setup because it makes it extremely easy to share code for this blog post. Be aware, though, that in most cases it makes sense to run these scripts locally on your machine with a Python runtime; this requires at least some experience working with Python.

In the specific example that we use here, the setup is as follows. In other words, the bare-bones **recipe** we follow is:

*   The biologist has all the data collected about the plant in a table. In your case, this can be an export of an online database or locally stored file. In the example here this is an Excel table. (**One table**)
*   The **photos** of the observations are stored **in one Google Drive folder**.
* The **photos** available are linked to the data in the table by a **unique identifier**. In the example here the name of each file incorporates the collector initials plus collection number. This is sufficient to link it to the database.
*  The biologist already has **an iNaturalist account** and also **one iNaturalist project** the observations will be tied to. The iNaturalist account has to be at least a curator in this project in order to add observations to it.
* Keep in mind that in order to use the same setup with a Colaboratory notebook with access to a Google Drive a **Google account** is required.

The basic outline of the script is this: We will read in the data table. For each row (observation) in the table we will check if there are photos available on the Google Drive. For each observation with photos we will create an iNaturalist observation and put it into one project.

Each block of code has to be executed in sequence.

Here is an example of how to set up the Google Drive folder with the photos to be uploaded.

![Example of Google Drive folder setup](https://github.com/jtklein/BatchUploadINaturalist/blob/master/Example_GoogleDrive_folder.png?raw=true)

Here is an example of how to set up the data table. You will be prompted to upload it to the Colab instance if you follow our code.

![Example of data table 1](https://github.com/jtklein/BatchUploadINaturalist/blob/master/Example_Data_Table_1.png?raw=true)

![Example of data table 2](https://github.com/jtklein/BatchUploadINaturalist/blob/master/Example_Data_Table_2.png?raw=true)

# Install external packages
Now let’s begin with the first block of code. Here, we need to install some external packages into the Colaboratory notebook instance. These notebooks are actually offered by Google to provide a fast machine learning prototyping experience. For this reason, quite a few of the packages that we need are already installed. However, we need additional packages for connecting to Google Drive and the iNaturalist API. (You might see errors about incompatible requirements; you can ignore them for this block.)

In [1]:
# This only needs to be done once per notebook.
# Install external packages
# Install pyinaturalist
!pip install -q pyinaturalist
# Install the PyDrive wrapper
!pip install -U -q PyDrive

# Import packages
We import the required packages into the Python runtime.

In [0]:
# This only needs to be done once per notebook.
# Import libraries

from google.colab import files
from google.colab import auth

# PyDrive reference:
# https://gsuitedevs.github.io/PyDrive/docs/build/html/index.html
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive

from oauth2client.client import GoogleCredentials

# https://github.com/inbo/pyinaturalist
from pyinaturalist.rest_api import create_observations
from pyinaturalist.rest_api import add_photo_to_observation
from pyinaturalist.node_api import get_observations
from pyinaturalist.rest_api import delete_observation # in case we need to delete an observation

import pandas as pd
import os
import requests

# Define functions
We will not need the next three blocks of code right away. Here we declare functions that we will use later. When we iterate over all rows in the table to create observations, you will see that there is a lot to do for each row. To keep that later block smaller and more readable, we define some functions here. Having each function dedicated to one task also makes the code more reusable. For instance, if we need to add an observation to a project in a different notebook, we can simply take the definition of that function from here.

Traditionally, in Python the function definitions are placed at the beginning of the script. For now you can just execute these function definitions to continue with the flow of the script and come back later to check what they do internally. Or you can familiarize yourself with them now.

The first function that we define here has one purpose: to extract the data from one row of the input table (corresponding to one observation in our case) into a form that the iNaturalist API accepts for creating an observation. There are a lot of different database formats out there, and your data table is likely to have a different structure. For that reason you will probably need to change this part a lot to accommodate your structure. Keep in mind that it is generally better to change the script to accommodate your data than the other way around. In the specific example here, the table has been exported from a [BRAHMS](https://herbaria.plants.ox.ac.uk/bol) database and includes a number of fairly standard fields.

So, make a copy of this block into your notebook and change it to what you need. You can keep ours as a reference.

The way we are doing it here is: we extract the information we need from one row of the database table by indexing into the row with the column names. For some fields we have added extra checks, simply because not all rows in our example dataset have this particular information, and we need to avoid including empty fields when creating observations. In this case, all records from the Western Cape in South Africa were known to be from the habitat type "Fynbos", and the positional accuracy was hardcoded to 100 m; this information could otherwise be taken from the table if it is recorded there. Lastly, we collect the data in a Python dictionary and return it. To create an observation we pass these parameters to the pyinaturalist package; for further possible input parameters, refer to this [reference](https://github.com/inbo/pyinaturalist#create-a-new-observation) or directly [here](https://www.inaturalist.org/pages/api+reference#post-observations).

In [0]:
def make_observation_params(row):
    """
    Extracts the data to store in the observation from the dataframes table row
    and transforms it into a dictionary structure
    """
    # Extracting the information from the dataframe table row
    colnum = row['colnum']
    description = row['DESCRIPTION']
    number = row['NUMBER']
    genus = row['GENUS']
    species = row['SP1']
    intra = row['SP2']
    collector = row['COLLECTOR']
    additional_collector = row['ADDCOLL']
    latitude = row['LATDEC']
    longitude = row['LONGDEC']
    # Here we hardcoded a positional accuracy of 100m for the coordinates of the
    # observation because this information was not available when sampling in the
    # field, in your case you might also include this from the reference table
    positional_accuracy = 100 # meters
    place = row['GAZETTEER2']
    # The following line would take the date from the photo metadata:
    # photo_taken_at = row['photoTakenAt']
    # For scanned photos this may be better obtained from the data table
    year = row['year']
    month = row['COLLMM']
    day = row['COLLDD']
    photo_taken_at = "{}-{}-{}".format(year, month, day)
    institution_code = row['DUPLICATES']
    identified_by = row['DETBY']
    date_identified = row['DETYEAR']
    
    # We derived the time when the observation was made from the photo's metadata
    # in case there was none given, we extract this information from the table row
    #  if pd.isnull(photo_taken_at):
    #  photo_taken_at = "{}-{}-{}".format(year, month, day)
    
    # We build a taxon string to send as initial identification for the observation
    # In this case here we extract it from different fields in the table row
    # Note: the string does not contain intraspecific designator (e.g. ssp. or var.)
    taxon = genus
    # If species epithet was given
    if not pd.isnull(species):
      taxon = taxon + " " + species
    # If intraspecific was given
    if not pd.isnull(intra):
      taxon = taxon + " " + intra
    
    # We build a comma-separated list of tags that we want to add to the observation
    # In this case the unique identifier number, genus name, species epithet
    tag_list = str(colnum)
    if not pd.isnull(genus):
      tag_list = tag_list + ", " + str(genus)
    if not pd.isnull(species):
      tag_list = tag_list + ", " + str(species)
    
    # Dictionary structure of the params we want to send along
    params = {
      'observation':
        {
          'species_guess': taxon,
          'tag_list': tag_list,
          'observed_on_string': photo_taken_at,
          'latitude': latitude,
          'longitude': longitude,
          'positional_accuracy': positional_accuracy,
          'place_guess': place,
          'description': description,
          'observation_field_values_attributes': [],
        }
    }
    
    # Because not every row/observation in the original data table has values for all fields
    # we are checking which fields of the resulting dictionary have empty values and remove these
    emptyKeys = []
    for key, value in params['observation'].items():
      if key == 'observation_field_values_attributes':
        continue
      if pd.isnull(value):
        emptyKeys.append(key)
    for key in emptyKeys:
      del params['observation'][key]
    
    # We are adding the values for additional fields if given
    # recordedBySymbiota
    if not pd.isnull(collector):
      params['observation']['observation_field_values_attributes'].append(
        {'observation_field_id': 8958,'value': collector}
      )
      
    # recordNumberdwc
    if not pd.isnull(number):
      params['observation']['observation_field_values_attributes'].append(
        {'observation_field_id': 8953,'value': number}
      )
      
    # associatedCollectorsSymbiota
    if not pd.isnull(additional_collector):
      params['observation']['observation_field_values_attributes'].append(
        {'observation_field_id': 8790,'value': additional_collector}
      )
      
    # identifiedBydwc
    if not pd.isnull(identified_by):
      params['observation']['observation_field_values_attributes'].append(
        {'observation_field_id': 9598,'value': identified_by}
      )
      
    # institutionCodedwc
    if not pd.isnull(institution_code):
      params['observation']['observation_field_values_attributes'].append(
        {'observation_field_id': 10040,'value': institution_code}
      )
    
    return params
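
One detail to watch in the function above: when the year, month or day is missing from a row, the formatted string becomes something like "nan-nan-nan", which is not null and therefore survives the empty-field filter. Columns with gaps are also typically read as floats, so a complete date may come out as "2015.0-3.0-7.0". A minimal sketch of a more defensive helper follows; `build_observed_on` is a hypothetical name that is not part of the notebook or of pyinaturalist, and it assumes that a date should only be sent when all three parts are present.

```python
import math


def build_observed_on(year, month, day):
    """Return 'YYYY-MM-DD', or None if any part is missing (None or NaN)."""
    parts = []
    for value in (year, month, day):
        # Empty numeric cells usually arrive as float NaN
        if value is None or (isinstance(value, float) and math.isnan(value)):
            return None
        # Columns containing gaps are read as floats (e.g. 2015.0), so cast to int
        parts.append(int(value))
    return "{:04d}-{:02d}-{:02d}".format(*parts)


print(build_observed_on(2015.0, 3.0, 7.0))       # 2015-03-07
print(build_observed_on(2015, float("nan"), 7))  # None
```

Because the helper returns `None` for incomplete dates, the existing empty-field check would drop the `observed_on_string` key instead of sending a malformed date.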

This function that we define here has one purpose: To take one photo from our Google Drive and upload it to one already existing iNaturalist observation.

The way we are doing it here is: we create a reference to the file that we need, identified by the file's ID that is passed to the function. Then we download this file into the notebook's runtime. Using the pyinaturalist package's call to the iNaturalist API, we upload the photo file to an already existing observation (identified by its observation ID). This is an authenticated call to the API, so we need to provide our iNaturalist account API access token. When we are finished, we clean up after ourselves and delete the downloaded photo from the runtime.

In [0]:
def get_and_upload_photo_to_observation(photo_id: str="", observation_id: str="", access_token: str=""):
    """
    Get a photo from Google Drive and upload it to an iNaturalist observation
    """
    # Download the file into this Colaboratory notebook instance;
    # the Drive file id doubles as the local file name
    downloaded = drive.CreateFile({'id': photo_id})
    downloaded.GetContentFile(photo_id)
    # Call to iNaturalist API to add this photo to an existing observation;
    # the with-block closes the file handle once the upload is done
    with open(photo_id, 'rb') as photo_file:
        r = add_photo_to_observation(
            observation_id=observation_id,
            file_object=photo_file,
            access_token=access_token
        )
    # Remove the photo file from this Colaboratory instance, it is no longer needed
    os.remove(photo_id)
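
If the upload raises an error, the downloaded photo stays behind in the runtime. The following is a minimal sketch of one way to guarantee the clean-up step, using only the standard library. `upload_with_cleanup` and the `upload` callable are hypothetical stand-ins for the Drive download and the pyinaturalist call, so this shows the pattern rather than the notebook's actual function.

```python
import os
import tempfile


def upload_with_cleanup(local_path, upload):
    """Call upload(file_object) and always delete local_path afterwards."""
    try:
        # The with-block closes the file handle before the file is removed
        with open(local_path, "rb") as file_object:
            return upload(file_object)
    finally:
        # Runs whether the upload succeeded or raised an exception
        if os.path.exists(local_path):
            os.remove(local_path)


# Stand-in for a downloaded photo and for the real upload call
fd, path = tempfile.mkstemp(suffix=".jpg")
os.write(fd, b"not really a photo")
os.close(fd)

size = upload_with_cleanup(path, lambda f: len(f.read()))
print(size, os.path.exists(path))
```

In the notebook, the same `try`/`finally` structure could wrap the call to `add_photo_to_observation`.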

This function that we define here has one purpose: To add an existing iNaturalist observation to an existing iNaturalist project.

The way we are doing it here is: we create a dictionary with the ID of the observation to be added. Then we call the iNaturalist API directly and post to it that we want to add this observation to the project with the given ID. At the time of writing this blog post, this function is not yet implemented in the pyinaturalist package. This is an authenticated call to the API, so we need to provide our iNaturalist account API access token.

In [0]:
def add_observation_to_project(observation_id: str="", project_id: str="", access_token: str=""):
    """
    Use the iNaturalist API to add an existing observation to an existing project
    Reference for this API call:
    https://api.inaturalist.org/v1/docs/#!/Projects/post_projects_id_add
    """
    # The payload of the API call
    payload = {
      "observation_id": observation_id
    }
    # This call needs to be authenticated
    headers = {"Authorization": "Bearer %s" % access_token}
    # Call the API
    response = requests.post(
        "https://api.inaturalist.org/v1/projects/{}/add".format(project_id),
        params=payload,
        headers=headers
    )
    return response.json()
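
To see what the function above sends without contacting the server, the request can be built but not dispatched. This sketch swaps in the standard library's `urllib` for `requests` purely for illustration. `build_add_request` is a hypothetical helper, and the observation ID and token are placeholders.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_add_request(observation_id, project_id, access_token):
    """Build (but do not send) the POST request that adds an observation to a project."""
    # The observation id travels in the query string, as with params= in requests
    query = urlencode({"observation_id": observation_id})
    url = "https://api.inaturalist.org/v1/projects/{}/add?{}".format(project_id, query)
    # The access token is passed as a bearer token in the Authorization header
    headers = {"Authorization": "Bearer {}".format(access_token)}
    return Request(url, headers=headers, method="POST")


request = build_add_request(observation_id=12345, project_id=35591, access_token="abc123")
print(request.get_method(), request.full_url)
print(request.get_header("Authorization"))
```

Printing the URL and header this way can help when checking that the project ID and token end up where you expect them.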
     

# Authenticate with Google Drive
As the first real step in our script, we establish a connection to our Google Drive. If everything goes right, you will be shown a link. On the page behind this link you will be asked for permission to connect your Drive to this notebook instance. Once granted, you will receive a code that you need to paste into the form, then press Enter.

The Drive should now be connected. In principle this needs to be done only once at the start of the notebook. However, I have occasionally encountered an error like this: "InvalidConfigError: Invalid client secrets file ('Error opening file', 'client_secrets.json', 'No such file or directory', 2)". In that case it helps to simply run this code block again.





In [0]:
# Authenticate and create the PyDrive client.
# This only needs to be done once per notebook.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

# List the available photos
Now that we have authenticated with Google Drive, we use the Google Drive API to list all files in a given folder. Remember, in our example all photos of the observations are in one folder. What we do in this block of code is the following. First, we use the Google Drive API to search for the folder. The API can list all files that match certain search criteria. To find our folder, we restrict the search to folders with the query term "mimeType contains 'application/vnd.google-apps.folder'" and pick out the one we need by its title with "title contains 'iNaturalist_Erica'". This part of the script relies on exactly one folder being found with this title. Please keep in mind that the search by title is a prefix match, so if you have multiple folders whose titles start with this search term you will get multiple results. Next, once we have identified the folder with the photos, we ask for a list of all files in this folder. In our example the folder contains only photos. If yours contains other files, add further search criteria to exclude the ones you do not need. We will reuse the list of photos in a later block.

For a complete reference to the available search criteria for listing Google Drive files you can check the [documentation](https://developers.google.com/drive/v2/web/search-parameters).
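
As a sketch of how such search strings can be put together, the two helpers below build the query for the folder and a query for the files inside it that is restricted to images. The function names and the `FOLDER_ID` placeholder are hypothetical. The image filter `mimeType contains 'image/'` is an addition that the notebook itself does not use. Titles containing a single quote would need escaping, which is not handled here.

```python
def folder_query(title_prefix):
    """Query for folders (not in the trash) whose title starts with title_prefix."""
    return (
        "mimeType contains 'application/vnd.google-apps.folder' "
        "and title contains '{}' and trashed = false".format(title_prefix)
    )


def photos_in_folder_query(folder_id):
    """Query for image files (not in the trash) that sit directly in folder_id."""
    return "'{}' in parents and mimeType contains 'image/' and trashed = false".format(folder_id)


print(folder_query("iNaturalist_Erica"))
print(photos_in_folder_query("FOLDER_ID"))
```

Each string would be passed as the `'q'` value to `drive.ListFile(...)`, in the same way as the queries in the notebook.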

In [0]:

# We are using the Google Drive API to list the photos
# We are looking for the folder that contains all the photos we will use
# In this case we are looking for a folder called "iNaturalist_Erica",
# result is a list of folders that match the search criteria
folders = drive.ListFile({'q': "mimeType contains 'application/vnd.google-apps.folder' and title contains 'iNaturalist_Erica'"}).GetList()
# Get the ID of the folder
for folder in folders:
  folderID = folder['id']
  print('folder {}, id {}'.format(folder['title'], folder['id']))
# Use the Google Drive API to list all files in the folder found by ID.
listed = drive.ListFile({ 'q': "'{}' in parents and trashed = false".format(folderID) }).GetList()
# Show some information about the files in the folder
for file in listed:
  print('file {}, id {}'.format(file['title'], file['id'])) 

# Prepare the data
Next we import our data about the observations into the notebook runtime. As mentioned earlier, in the example here the information about the observations is in a table file, specifically an Excel file. For the information to be ready to use, we need to upload this file to the Colaboratory instance. After executing the code block, the Google Colaboratory notebook provides us with a nice little upload dialog. Choose the data table file you want to use in the dialog. The file will be uploaded to the file system of the host of your runtime. You can inspect the files present in the runtime in the left-hand panel. In principle you can also host the data table on Google Drive and access it from there directly, following the Drive example above; the upload dialog is just a different way of importing information into the runtime. If you run this example locally, you need to locate your data file yourself.

The uploaded file is parsed and transformed into a pandas DataFrame. pandas is a prominent Python library for data handling. If you are not familiar with pandas, don’t sweat it; we have tried to keep the script as easy as possible. At the end of the block we inspect the first few rows of the uploaded table.

In [0]:
# Prompt an upload dialog
uploaded = files.upload()
# Get the filename of the uploaded file
filename = list(uploaded.keys())[0]
# Transform the table into a pandas dataframe
df = pd.read_excel(filename, engine='xlrd')
# Inspect the result
df.head()

In order not to change the original data table that we first imported, we create a copy to work with. We already made a copy by uploading the file from outside the runtime; however, it is also good to keep an unchanged table in the runtime in case we need to access it again.
In the example here we also use only a subset of the imported table (only the rows where "Pirie, M.D." is the collector). Not all observations in this example dataset were made by the same person, so we restrict ourselves to uploading the data of one person. The other collectors can use the same script to upload their data but need to change the selection criteria here. We then inspect the new table.

In [0]:
# We will only use those rows of the table for which the collector is "Pirie, M.D."
# we create a copy to not alter the original
mask = df['COLLECTOR'] == 'Pirie, M.D.'
# Filter first, then copy, so that pirieList is an independent table
pirieList = df[mask].copy()
pirieList.head()


For each row in the table (observation) we check whether there are photos available to upload. You can in principle also create iNaturalist observations without photos; however, in our example we restrict ourselves to those with photos.
To do so, we iterate over each row in the table and check whether the list of files we created earlier contains files that match a unique identifier present in the dataset. In our example we use a combination of the collector's initials and a collection number. When a file matches this unique identifier, we extract the file’s ID and, from the photo’s metadata, a time stamp of when the photo was taken. We add this information to the DataFrame.

In [0]:
# Iterate over the rows of the table (i.e. observations)
for index, row in pirieList.iterrows():
  # Extract the unique identifier (collector+number) for this row
  colnum = "MP{}".format(row['NUMBER'])
  query = colnum + "_"
  photos = { 'photos': [] }
  # Iterate over the list of photos in the Google Drive folder
  for file in listed:
    # If the unique identifier is in the file title
    if query in file['title']:
      # Add the file id and metadata about when the photo was taken to the table
      pirieList.loc[index, 'colnum'] = colnum
      photos['photos'].append(file['id'])
      if 'imageMediaMetadata' in file:
        if 'date' in file['imageMediaMetadata']:
          # Date of the last picture
          pirieList.loc[index, 'photoTakenAt'] = file['imageMediaMetadata']['date']
  if len(photos['photos']) > 0:
    pirieList.loc[index, 'photos'] = photos
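
The loop above matches a file when the identifier plus an underscore appears anywhere in the file title. If your file names always begin with the identifier, a stricter prefix match avoids accidental hits. The sketch below shows the idea; `photos_for` and the file names are made up for illustration, and the prefix rule is an assumption about your naming scheme.

```python
def photos_for(colnum, titles):
    """Return the titles that start with '<colnum>_', e.g. 'MP12_' for 'MP12'."""
    prefix = colnum + "_"
    return [title for title in titles if title.startswith(prefix)]


titles = ["MP12_habitat.jpg", "MP12_flower.jpg", "MP123_plant.jpg", "XMP12_other.jpg"]
print(photos_for("MP12", titles))
# ['MP12_habitat.jpg', 'MP12_flower.jpg']
```

The trailing underscore keeps "MP12" from matching "MP123", and `startswith` additionally rejects "XMP12_other.jpg", which a plain substring test would accept.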

# Authenticate with iNaturalist
When we want to perform any actions on iNaturalist via the API that are tied to a user account, we need to authenticate ourselves to the service. Otherwise the server has no way of knowing who is uploading these observations. Tasks that require authentication include creating or deleting observations, among many others. For an exhaustive list, check the API [documentation](https://api.inaturalist.org/v1/docs/). Several ways of authenticating to the API are available. For example, you can use a username and password combination as on the iNaturalist website. However, using such private information is not recommended when we share the notebook document. Another way of authenticating is with an access token that iNaturalist hands out to you if you are already logged in on the website. So, after you have logged in on the website, go to this link and you will see your token: [https://www.inaturalist.org/users/api_token](https://www.inaturalist.org/users/api_token). Please paste everything you see on the screen into the form field below. Each token is only valid for 12 hours, so if you come back to this notebook at a later point, make sure to replace it. The value to paste should look like this:  


```
{"api_token":"eyJhbGciOiJIUzUxMiJ9.eyJ1c2VyX2lkIjo1MzA2NTksImV4cCI6MTU0ODUwMDY0NH0.SKKQsSMSofOhAwkPnFJ_m0tTCu1iFyVBhgtltjLeW49VtQwk_n2rQ3OtlkHTxHccfTyNg9WwZdhAV_yxQWVimg"}

```

If you are interested, you can double-click anywhere in the form field to see what the code behind it looks like.

In [0]:
#@title Paste your entire iNaturalist api_token object here { run: "auto", vertical-output: true, display-mode: "form" }
# Get the value from the input form (None until something has been pasted)
api_token = None #@param {type:"raw"}
# Validate the input and use it if OK
token = None
if isinstance(api_token, dict):
  try:
    token = api_token['api_token']
    # Do not print the token itself, so it does not end up in a shared notebook
    print("Token accepted")
  except KeyError:
    print("Your token object must have the key \"api_token\"")
else:
  print("Your token does not have the correct format.\nIt needs to be a Python dictionary.\nIt should look something like this:\n {\"api_token\": \"xyz123\"}")
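
Because the token expires, it can be handy to check when. The example token above has the shape of a JSON Web Token (three dot-separated base64url segments), so the sketch below assumes that format and that the payload carries an `exp` claim in seconds since the epoch. `token_expiry` is a hypothetical helper. It only reads the claim and does not verify the signature.

```python
import base64
import json
from datetime import datetime, timezone


def token_expiry(jwt_token):
    """Return the expiry time (UTC) stored in a JWT's payload, or None if absent."""
    payload_segment = jwt_token.split(".")[1]
    # JWT segments are base64url-encoded with the padding stripped; restore it
    padded = payload_segment + "=" * (-len(payload_segment) % 4)
    payload = json.loads(base64.urlsafe_b64decode(padded))
    if "exp" not in payload:
        return None
    return datetime.fromtimestamp(payload["exp"], tz=timezone.utc)


# Build a dummy token for demonstration (header and signature are placeholders)
claims = base64.urlsafe_b64encode(json.dumps({"exp": 1548500644}).encode()).rstrip(b"=").decode()
expires = token_expiry("header." + claims + ".signature")
print(expires.isoformat())
```

In the notebook you could call `token_expiry(token)` after the form cell and compare the result with the current time before starting a long upload.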

Before we create new observations on iNaturalist, we check whether the same observation is already present. For this we need your iNaturalist username.

If you are interested, you can double-click anywhere in the form field to see what the code behind it looks like.

In [0]:
#@title Paste your iNaturalist username here { run: "auto", vertical-output: true, display-mode: "form" }
# Get the value from the input form
username = "" #@param {type:"string"}
# Validate the input and use it if OK (an empty username is rejected)
user_id = None
if isinstance(username, str) and username.strip():
  user_id = username.strip()
  print(user_id)
else:
  print("Your username does not have the correct format.")

# Create new observations in batch
Now, finally, the most important part. In this block of code we perform all the main actions of this script: we create an observation, add its photos, and add the new observation to a project.
In the outer loop we iterate over all rows of the table (DataFrame). Each row holds the data of one observation, and each observation is found in only one row. In the example here we only want to create a new iNaturalist observation if we have photos for it, so we skip all observations without photos. Then, using the unique identifier of each observation (collector initials plus collection number), we check whether an observation with the same identifier is already present on iNaturalist, searching in the tags of the observations. The search query parameters can of course be changed. For a reference on searching, see the iNaturalist API [documentation](https://api.inaturalist.org/v1/docs/#!/Observations/get_observations).

To create an observation we extract all the necessary information from the row with the function we defined earlier in the script; see above for details. Then we create a new observation via the iNaturalist API using the pyinaturalist package. We add the new observation's ID to our row, to be saved later. For each photo associated with this observation, we use the upload function described above to add it to the observation on iNaturalist. Lastly, if all was successful, we add the new observation to the existing project that should house it, again using the function defined above. Keep in mind that you need at least curator rights in this project to add observations. Also, we use the project ID here directly; you need to paste in the ID of your own project from iNaturalist.

In [0]:
# For each row in the dataframe
for index, row in pirieList.iterrows():
  # If there are photos found
  if not pd.isnull(row['photos']):
    # The supposedly unique identifier
    colnum = row['colnum']
    # Check if there is already an observation with this collector+number
    observations = get_observations(params={ 'q': colnum, 'search_on': 'tags', 'user_id': user_id })
    if observations['total_results'] > 0:
      print('There is already an observation by {} with the number {} in tags'.format(user_id, colnum))
      continue
    # Prepare the parameters for creating an observation with the iNaturalist API
    params = make_observation_params(row)
    print(params)
    # Uncomment the next line for a dry run that only prints the parameters
    # continue
    # API call to create the observation
    r = create_observations(params=params, access_token=token)
    new_observation_id = r[0]['id']
    print("Created new iNaturalist observation, id: {}".format(new_observation_id))
    # Save the observation id to the dataframe
    pirieList.loc[index, 'iNaturalist_ID'] = str(new_observation_id)
    # Add each of the photos found to the observation
    for photo in row['photos']['photos']:
      print("Uploading photo: {}".format(photo))
      get_and_upload_photo_to_observation(photo_id=photo, observation_id=new_observation_id, access_token=token)
    # Add the new observation to an existing project
    # the project ID is hardcoded to one specific project
    print("Adding observation to project")
    add_observation_to_project(observation_id=new_observation_id, project_id=35591, access_token=token)
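
The control flow of the loop above can be tried out without any network calls by passing the individual steps in as functions. The sketch below is a simplified stand-in and not the notebook's code: `upload_row`, the fake functions, the row dictionaries and the ID 100 are all hypothetical.

```python
def upload_row(row, find_existing, create, upload_photo, add_to_project):
    """Process one table row; return the new observation id, or None if skipped."""
    photos = row.get("photos") or []
    if not photos:
        return None  # no photos, so no observation is created
    if find_existing(row["colnum"]):
        return None  # an observation with this identifier already exists
    observation_id = create(row)
    for photo in photos:
        upload_photo(photo, observation_id)
    add_to_project(observation_id)
    return observation_id


# Stand-ins that only record what would have been sent to iNaturalist
log = []
already_online = {"MP3"}


def fake_create(row):
    log.append(("create", row["colnum"]))
    return 100


rows = [
    {"colnum": "MP1", "photos": ["a.jpg", "b.jpg"]},
    {"colnum": "MP2", "photos": []},
    {"colnum": "MP3", "photos": ["c.jpg"]},
]
results = [
    upload_row(
        row,
        find_existing=lambda colnum: colnum in already_online,
        create=fake_create,
        upload_photo=lambda photo, obs: log.append(("photo", photo, obs)),
        add_to_project=lambda obs: log.append(("project", obs)),
    )
    for row in rows
]
print(results)
print(log)
```

Only the first row leads to any calls: the second has no photos and the third is treated as already uploaded. Running a dry version like this first is a cheap way to check the skip logic before creating real observations.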

# Save the new information
In this block of code all we do is write the extended DataFrame to a .csv file. During the script we added some temporary information to the DataFrame that is no longer needed, so we drop those columns and keep only the column with the resulting iNaturalist IDs of the new observations. After execution you can find the new file in the left-hand panel of the notebook, from where you can also download it.

In [0]:
# Make a deep copy of the dataframe
output_df = pirieList.copy()
# Remove columns only needed within this script
# (errors='ignore' skips columns that were never created, e.g. 'photoTakenAt'
# when none of the photos carried a date in its metadata)
output_df.drop(columns=['colnum', 'photoTakenAt', 'photos'], inplace=True, errors='ignore')
# Save to .csv in the notebook instance
output_df.to_csv('output.csv')

# Troubleshooting
If something goes wrong, you can delete all observations you created with this snippet.

In [0]:
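# For all rows in the dataframe
for index, row in pirieList.iterrows():
  # If there is an iNaturalist id delete the observation
  observation_id = row['iNaturalist_ID']
  if not pd.isnull(observation_id):
    try:
      delete_observation(int(observation_id), token)
    except Exception as error:
      # Report the failure instead of silently ignoring it, then carry on
      print("Could not delete observation {}: {}".format(observation_id, error))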