# Using requests and DwCAReader to read DwC-A Files from IPT

Using two Python modules, `requests` and `dwca` (installed as `python-dwca-reader`), you can read DwC-A files directly from an IPT resource.

The required modules are included in the *environment.yml* or can be directly installed using the pip commands:

`pip install requests`

`pip install python-dwca-reader`

After installation you may have to restart your notebook kernel before proceeding. To do so, select `Kernel -> Restart` from the menu.

In [None]:
from dwca.read import DwCAReader
import requests
from dwca.darwincore.utils import qualname as qn

## Setting Your IPT resource variable
Set the variables to point to the server and resource you want to access.

In [None]:
IPT_URL  = "https://oceantrack.org/ipt/" # Base URL of the IPT server
resourceID = "otndfobalfrytags" # IPT resource ID

## Downloading the DwC-A File from IPT
Using the variables set above, `requests` retrieves the latest version of the DwC-A file, which can then be saved locally.

In this example the requested file comes from the *otndfobalfrytags* resource on <a href="https://oceantrack.org/ipt/">OTN's IPT</a> server. A DwC-A version number can optionally be provided; if it is omitted, the latest version is requested.
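As a minimal sketch of the optional version number, the helper below builds the download URL with or without a version. The function name `build_archive_url` is hypothetical, the version value `1.0` is only a placeholder, and passing the version as a `v` query parameter is an assumption about the IPT that you should confirm against your server.

```python
from urllib.parse import urlencode, urljoin


def build_archive_url(ipt_url, resource_id, version=None):
    """Return the archive.do download URL for an IPT resource (hypothetical helper)."""
    params = {"r": resource_id}
    if version is not None:
        # Assumption: the IPT accepts the archive version as a `v` query parameter.
        params["v"] = str(version)
    # urljoin keeps the "/ipt/" path because the base URL ends with a slash.
    return urljoin(ipt_url, "archive.do") + "?" + urlencode(params)


latest_url = build_archive_url("https://oceantrack.org/ipt/", "otndfobalfrytags")
# "1.0" is a placeholder version, not a known published version of this resource.
versioned_url = build_archive_url("https://oceantrack.org/ipt/", "otndfobalfrytags", version="1.0")
print(latest_url)
print(versioned_url)
```

Building the query string with `urlencode` avoids hand-escaping resource IDs that contain special characters.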

In [None]:
archive_url = f'{IPT_URL}archive.do?r={resourceID}' # Building request URL
print(archive_url)
req = requests.get(archive_url)
req.raise_for_status() # Stop here if the server returned an HTTP error

zipfilename = f'{resourceID}_DwC.zip'
# Write the downloaded archive to a local zip file
with open(zipfilename, 'wb') as outfile:
    outfile.write(req.content)
print(f'{zipfilename} file created.')
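Before opening the download, it can be worth confirming that the saved file really is a zip archive, since a server may return an error page instead. The sketch below uses only the standard library. The helper name `list_archive_members` is hypothetical, and the demo archive is a stand-in built on the fly so the example runs without network access.

```python
import os
import tempfile
import zipfile


def list_archive_members(path):
    """Return the file names inside a zip file, or raise if it is not a zip."""
    if not zipfile.is_zipfile(path):
        raise ValueError(f"{path} is not a valid zip archive")
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()


# Build a tiny stand-in archive; a real DwC-A would come from the download step.
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = os.path.join(tmp, "demo_DwC.zip")
    with zipfile.ZipFile(demo_zip, "w") as archive:
        archive.writestr("meta.xml", "<archive/>")
        archive.writestr("occurrence.txt", "id\n")
    members = list_archive_members(demo_zip)

print(members)
```

With a real download you would call `list_archive_members(zipfilename)` instead of using the demo file.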


Open the archive using `DwCAReader(handle)`. We're going to use `with`, since that makes sure the reader is closed properly when we're done and doesn't hog memory.

In [None]:
with DwCAReader(zipfilename) as dwca:
    # Now we can interact with the object dwca. 
    # We can get the rows like this:
    dwca.rows
    
    # We can access a specific row like so:
    requested_row = dwca.get_row_by_index(1)
    
    # We can display the requested row using print
    print(requested_row)
    
    # We can also get the row by the id (dwca.get_row_by_id()) 
    # but the documentation warns that this is brittle and unreliable.
    
    # If we want to check for a specific term, we can do so like this:
    if qn('subgenus') in dwca.descriptor.core.terms:
        print('subgenus: Termname exists!')

`qn` above is short for 'qualname'. It expands a short term name into its fully qualified name, so we don't have to type out the full name (e.g., http://rs.tdwg.org/dwc/terms/termname). We imported it as `qn` in the first code cell.
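To see where the qualified term names come from, the sketch below reads them straight from an archive's `meta.xml` using only the standard library. It is an illustration, not how `DwCAReader` is implemented. The helper name `core_terms` is hypothetical, and the in-memory archive is a minimal stand-in with two example fields.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# Namespace used by Darwin Core Archive descriptor files (meta.xml).
DWC_TEXT_NS = "{http://rs.tdwg.org/dwc/text/}"

META_XML = """<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
    <files><location>occurrence.txt</location></files>
    <id index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/eventDate"/>
  </core>
</archive>"""


def core_terms(archive_file):
    """Return the set of term URIs declared for the core file in meta.xml."""
    with zipfile.ZipFile(archive_file) as archive:
        root = ET.fromstring(archive.read("meta.xml"))
    core = root.find(f"{DWC_TEXT_NS}core")
    if core is None:
        raise ValueError("meta.xml has no <core> element")
    return {field.get("term") for field in core.iter(f"{DWC_TEXT_NS}field")}


# Build a stand-in archive in memory so the example needs no download.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("meta.xml", META_XML)
buffer.seek(0)

terms = core_terms(buffer)
print("http://rs.tdwg.org/dwc/terms/scientificName" in terms)
print("http://rs.tdwg.org/dwc/terms/subgenus" in terms)
```

The membership checks mirror the `qn('subgenus') in dwca.descriptor.core.terms` test in the cell above, with the full name written out by hand.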

The complete API for the DwCA Reader can be found here: https://python-dwca-reader.readthedocs.io/en/latest/api.html

If we want to load data into a `pandas` dataframe, we can do it like so:

In [None]:
with DwCAReader(zipfilename) as dwca:
    core_df = dwca.pd_read(dwca.core_file_location, parse_dates=True)
    display(core_df)

After that, `core_df` is an ordinary `pandas` DataFrame and can be treated as such.
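As a hedged illustration of what that can look like, the sketch below counts records per species and filters by month. It uses a small stand-in DataFrame with made-up placeholder rows rather than the real archive, and it assumes the columns are named after the Darwin Core terms `scientificName` and `eventDate`. Check `core_df.columns` for the names your archive actually provides.

```python
import pandas as pd

# Stand-in for core_df: placeholder rows invented for this example only.
core_df = pd.DataFrame(
    {
        "id": ["a1", "a2", "a3"],
        "scientificName": ["Species A", "Species B", "Species A"],
        "eventDate": ["2020-05-01", "2020-05-03", "2020-06-10"],
    }
)

# Convert the date strings to datetimes so they can be filtered by month.
core_df["eventDate"] = pd.to_datetime(core_df["eventDate"])

# Number of records per species.
counts = core_df["scientificName"].value_counts()

# Only the records collected in May.
may_records = core_df[core_df["eventDate"].dt.month == 5]

print(counts)
print(may_records)
```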