# Xtractopy - multitrack function (name pending)

*Andrew Chin, 11/19/21*

This is a draft of a workflow for processing multiple satellite tags at once and running the matching function on all of them together. Calling the base function once per animal may be fine for 1-20 animals. With many animals, it would be more time-efficient to combine all the satellite data into one `pandas` df and run the `xtractopy` function on it once.

This function will take the following inputs:

1. multiple `pandas` dfs as objects

and output:
1. a combined df with a new column, "tag_ID", that identifies the original df (tag) each row came from

This could be an entirely new function that also performs the basic transformation that `xtractopy` does, or it could stand alone as a data-cleaning function. Which is better depends on how useful the function would be for other purposes.
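
A minimal sketch of the stand-alone combining step, assuming each track df is passed in a dictionary keyed by its tag ID. The name `combine_tracks` and the dictionary input format are hypothetical choices for illustration, not a settled design.

```python
from typing import Dict

import pandas as pd


def combine_tracks(tracks: Dict[str, pd.DataFrame]) -> pd.DataFrame:
    """Stack several track dfs into one, adding a "tag_ID" column.

    tracks: mapping of tag ID -> track df (hypothetical input format)
    """
    # assign() returns a copy, so the original dfs are left unchanged
    frames = [df.assign(tag_ID=tag_id) for tag_id, df in tracks.items()]
    # ignore_index=True gives the combined df a fresh 0..n-1 index, which keeps
    # a later column-wise pd.concat (as in xtractopy) aligned row by row
    return pd.concat(frames, ignore_index=True)


# usage with two tiny made-up tracks
track_a = pd.DataFrame({"lat": [30.0, 30.5], "lon": [-75.0, -74.5]})
track_b = pd.DataFrame({"lat": [28.0], "lon": [-80.0]})
combined = combine_tracks({"shark_a": track_a, "shark_b": track_b})
print(combined)
```

Because every row keeps its `tag_ID`, the matched output could later be split back into individual animals, e.g. with `combined.groupby("tag_ID")`.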


## Generalized function `xtractopy()`

In [None]:
# necessary packages
import datetime as dt
import xarray as xr
import numpy as np
import pandas as pd
from typing import Dict, Union
import fsspec
import matplotlib.pyplot as plt
from datetime import datetime 

## Tutorial
Below is an example of an `xtractopy` workflow from OHW 2021. We will be working with tiger sharks (*Galeocerdo cuvier*) tagged in the Gulf Stream system of the Western Atlantic Ocean.

![tigershark](tigershark_lauramcdonnell.png)

First, let's load in the track data:

In [None]:
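shark_dir = "track_shark144020.csv"
track_ex = pd.read_csv(shark_dir, parse_dates=['datetime'])  # parse the 'datetime' column as timestamps

# optional: convert longitudes from -180..180 to 0..360 if the environmental grid
# uses 0..360 (the Gulf Stream bounds below use negative west longitudes, so not needed here)
# track_ex["lon"] = np.where(
#     track_ex["lon"] < 0,
#     track_ex["lon"] + 360,
#     track_ex["lon"])

# bounding box around the track, padded by 2 degrees on each side
lat_min = track_ex["lat"].min() - 2.0
lat_max = track_ex["lat"].max() + 2.0
lon_min = track_ex["lon"].min() - 2.0
lon_max = track_ex["lon"].max() + 2.0

# keys match the lat/lon coordinate names used by the MUR dataset, so this can be passed as ds.sel(**xy_bbox)
xy_bbox = dict(lat=slice(lat_min, lat_max), lon=slice(lon_min, lon_max))

plt.plot(track_ex.lon, track_ex.lat)

xy_bbox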
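
For the multitrack version, the bounding-box calculation above could be wrapped in a small helper that works unchanged on a combined multi-tag df. This is a sketch: the name `track_bbox` and its `pad` parameter are hypothetical, and the `lat`/`lon` keys assume the environmental dataset uses those coordinate names. If a dataset stores latitude in descending order, the slice bounds would need to be swapped.

```python
import pandas as pd


def track_bbox(track: pd.DataFrame, pad: float = 2.0) -> dict:
    """Padded lat/lon bounding box around a track, as slices for xarray .sel().

    Assumes `lat`/`lon` columns in the track df and ascending `lat`/`lon`
    coordinates in the environmental dataset.
    """
    return dict(
        lat=slice(track["lat"].min() - pad, track["lat"].max() + pad),
        lon=slice(track["lon"].min() - pad, track["lon"].max() + pad),
    )


# usage with a made-up two-point track
toy_track = pd.DataFrame({"lat": [30.0, 32.0], "lon": [-78.0, -75.0]})
bbox = track_bbox(toy_track)
print(bbox)
```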

In [None]:
track_ex

In [None]:
# grab the first 100 tag positions as a small test subset
track_2014 = track_ex.iloc[0:100]
track_2014

## Load in environmental data
We want to retrieve high-resolution data from web repositories and servers and load them into the Python environment as an `xarray` Dataset. We also recommend subsetting the data to the study area for faster run times. The `subset_area` function defined below does this with only three inputs: the dataset and the maximum and minimum longitudes of the study area.

Here we use sea surface temperature (SST) from the Multi-scale Ultra-high Resolution (MUR) analysis, available [here](https://registry.opendata.aws/mur/).

In [None]:
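# bring in SST data from the public MUR zarr store on AWS S3
# (anonymous access; reading s3:// paths through fsspec requires the s3fs package)
file_location = 's3://mur-sst/zarr'
ikey = fsspec.get_mapper(file_location, anon=True)
ds_sst = xr.open_zarr(ikey, consolidated=True)  # opens lazily; data are read only when needed
ds_sst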

Identify the longitudinal extent of your study area:

In [None]:
# longitudinal bounds of the Gulf Stream study area (degrees; negative = west)
max_lon_glf = -70
min_lon_glf = -82

### Generalized data subset function

In [None]:
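def subset_area(env_data,
                max_lon,
                min_lon):
    """Keep only the grid cells with min_lon <= lon <= max_lon."""
    subset_lon = (env_data.lon >= min_lon) & (env_data.lon <= max_lon)
    # drop=True removes the excluded longitudes instead of filling them with NaN
    subset_env_data = env_data.where(subset_lon, drop=True)
    return subset_env_data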
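
A possible alternative to `subset_area`, sketched below, is label-based slicing with `.sel`, which can also cut latitude (for example with a bounding-box dict like `xy_bbox` above). `.sel` with slices selects the index range directly, whereas `where` additionally masks the remaining values, which converts integer variables to float. The name `subset_bbox` is hypothetical, and the sketch assumes ascending `lat`/`lon` coordinates.

```python
import numpy as np
import xarray as xr


def subset_bbox(env_data: xr.Dataset, bbox: dict) -> xr.Dataset:
    """Label-based subset, e.g. bbox = dict(lat=slice(28, 34), lon=slice(-82, -70)).

    Slice bounds are inclusive; coordinates must be sorted in ascending order.
    """
    return env_data.sel(**bbox)


# toy dataset on a 1-degree grid with a float and an integer variable
toy = xr.Dataset(
    {
        "analysed_sst": (("lat", "lon"), np.zeros((10, 20))),
        "mask": (("lat", "lon"), np.ones((10, 20), dtype="int8")),
    },
    coords={"lat": np.arange(25.0, 35.0), "lon": np.arange(-85.0, -65.0)},
)
sub = subset_bbox(toy, dict(lat=slice(28, 34), lon=slice(-82, -70)))
print(sub.sizes)
print(sub["mask"].dtype)  # integer dtype is kept
```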

In [None]:
gulf_stream_sst = subset_area(ds_sst, max_lon_glf, min_lon_glf)
gulf_stream_sst

In [None]:

def xtractopy(envdata: xr.Dataset,
              tagdata: pd.DataFrame,
              filename: str) -> pd.DataFrame:
    """
    Match each tag position to the nearest environmental grid cell and time step.

    envdata: environmental data as an xarray Dataset with `lat`, `lon`, and `time` coordinates
    tagdata: tag data as a pandas DataFrame with `lat`, `lon`, and `datetime` columns
    filename: name of the output .csv file, without the extension (e.g. "test_sst")
    """

    def extract(function_dataset_point,
                df: pd.DataFrame,
                map_coordinates: Dict[str, str],
                rename_variables: Dict[str, str]
               ) -> pd.DataFrame:
        """
        function_dataset_point: function returning the environmental values at one point
        df: tag data to match
        map_coordinates: key is name of column in dataframe, value is the name of the coordinate in dataset
        rename_variables: TBD (currently unused)
        """

        def get_row(row) -> Dict[str, Union[float, int]]:
            # translate dataframe column names into dataset coordinate names
            extract_coordinates = {val: row[key] for key, val in map_coordinates.items()}

            result = function_dataset_point(**extract_coordinates)

            # rename variables here and transform result TBD
            return result

        # one output row per tag row; the dict keys become columns
        return df.apply(get_row, axis=1, result_type="expand")

    def envdata_point(lat, lon, time) -> Dict[str, Union[float, int]]:
        # nearest grid cell and time step to this tag position
        ds = envdata.sel(lat=lat, lon=lon, time=time, method="nearest")

        # keep the data variables only (skip coordinates)
        return {var: ds[var].values for var in ds.data_vars}

    combined_dat = pd.concat([tagdata,
                              extract(envdata_point,
                                      tagdata,
                                      {"lat": "lat", "lon": "lon", "datetime": "time"},
                                      {})
                              ], axis=1)
    combined_dat.to_csv(f"{filename}.csv")  # TODO: figure out how to write the title into the csv file
    return combined_dat
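
For long or combined tracks, calling `.sel()` once per row (as `envdata_point` does through `df.apply`) may be slow, especially against a remote zarr store. A possible alternative, sketched below and not benchmarked here, uses xarray's vectorized (pointwise) indexing: when every indexer is a DataArray along the same new dimension (here `"points"`), `.sel` returns one value per position instead of an outer product. The name `extract_vectorized` is hypothetical, and the sketch assumes the same `lat`/`lon`/`datetime` columns and `lat`/`lon`/`time` coordinates as `xtractopy`.

```python
import numpy as np
import pandas as pd
import xarray as xr


def extract_vectorized(envdata: xr.Dataset, tagdata: pd.DataFrame) -> pd.DataFrame:
    """Nearest-neighbour match for all tag rows in a single .sel() call."""
    # indexers sharing the "points" dimension -> row i of the result is the grid
    # value nearest to (lat[i], lon[i], datetime[i])
    points = envdata.sel(
        lat=xr.DataArray(tagdata["lat"].to_numpy(), dims="points"),
        lon=xr.DataArray(tagdata["lon"].to_numpy(), dims="points"),
        time=xr.DataArray(tagdata["datetime"].to_numpy(), dims="points"),
        method="nearest",
    )
    # one column per data variable, aligned to tagdata's index
    env = pd.DataFrame(
        {var: points[var].values for var in points.data_vars}, index=tagdata.index
    )
    return pd.concat([tagdata, env], axis=1)


# usage with a made-up 3 x 4 x 5 (time, lat, lon) grid
times = pd.date_range("2014-01-01", periods=3, freq="D")
toy = xr.Dataset(
    {"sst": (("time", "lat", "lon"), np.arange(60, dtype=float).reshape(3, 4, 5))},
    coords={"time": times, "lat": np.arange(30.0, 34.0), "lon": np.arange(-80.0, -75.0)},
)
tags = pd.DataFrame({
    "lat": [30.2, 33.9],
    "lon": [-79.9, -75.2],
    "datetime": pd.to_datetime(["2014-01-01 03:00", "2014-01-02 20:00"]),
})
result = extract_vectorized(toy, tags)
print(result)
```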


## TEST THE FUNCTION

In [None]:
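# test: ds_ssh_renamed and ds_chl_renamed are not defined in this section; load them
# first, with `lat`, `lon`, and `time` coordinates, since xtractopy selects on those names
xtractopy(ds_sst, track_2014, "test_sst")
xtractopy(ds_ssh_renamed, track_2014, "test_ssh")
xtractopy(ds_chl_renamed, track_2014, "test_chla")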

## Extract two environmental variables with the `xtractopy2` function
This function is the same as `xtractopy`, but it accepts two xarray Datasets of environmental data.

In [None]:
def xtractopy2(envdata1: xr.Dataset,
               envdata2: xr.Dataset,
               tagdata: pd.DataFrame,
               filename: str) -> pd.DataFrame:
    """
    Same as `xtractopy`, but matches each tag position against two environmental datasets.

    envdata1, envdata2: environmental data as xarray Datasets with `lat`, `lon`, and `time` coordinates
    tagdata: tag data as a pandas DataFrame with `lat`, `lon`, and `datetime` columns
    filename: name of the output .csv file, without the extension (e.g. "test_sst_ssh")
    """

    def extract(function_dataset_point,
                df: pd.DataFrame,
                map_coordinates: Dict[str, str],
                rename_variables: Dict[str, str]
               ) -> pd.DataFrame:
        """
        function_dataset_point: function returning the environmental values at one point
        df: tag data to match
        map_coordinates: key is name of column in dataframe, value is the name of the coordinate in dataset
        rename_variables: TBD (currently unused)
        """

        def get_row(row) -> Dict[str, Union[float, int]]:
            # translate dataframe column names into dataset coordinate names
            extract_coordinates = {val: row[key] for key, val in map_coordinates.items()}

            result = function_dataset_point(**extract_coordinates)

            # rename variables here and transform result TBD
            return result

        # one output row per tag row; the dict keys become columns
        return df.apply(get_row, axis=1, result_type="expand")

    def make_point_function(envdata):
        # per-point lookup bound to one dataset (replaces two copy-pasted point functions)
        def envdata_point(lat, lon, time) -> Dict[str, Union[float, int]]:
            ds = envdata.sel(lat=lat, lon=lon, time=time, method="nearest")
            return {var: ds[var].values for var in ds.data_vars}
        return envdata_point

    coords = {"lat": "lat", "lon": "lon", "datetime": "time"}
    # note: if both datasets contain a variable with the same name, the output gets duplicate column names
    combined2_dat = pd.concat([tagdata,
                               extract(make_point_function(envdata1), tagdata, coords, {}),
                               extract(make_point_function(envdata2), tagdata, coords, {})
                               ], axis=1)
    combined2_dat.to_csv(f"{filename}.csv")  # TODO: figure out how to write the title into the csv file
    return combined2_dat
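
A sketch of how the same per-row matching might extend to any number of datasets, using a short label per dataset as a column prefix so that variables sharing a name (in two datasets) do not produce duplicate columns. The name `extract_many` and the dict-of-labels input are hypothetical; the sketch assumes each data variable has only `time`/`lat`/`lon` dimensions, so that a nearest-point selection returns a single scalar.

```python
from typing import Dict

import numpy as np
import pandas as pd
import xarray as xr


def nearest_values(envdata: xr.Dataset, row: pd.Series) -> dict:
    # nearest grid cell and time step to one tag position
    ds = envdata.sel(lat=row["lat"], lon=row["lon"], time=row["datetime"], method="nearest")
    # .item() turns each 0-d result into a plain Python scalar
    return {var: ds[var].item() for var in ds.data_vars}


def extract_many(envdatas: Dict[str, xr.Dataset], tagdata: pd.DataFrame) -> pd.DataFrame:
    """Match tag rows against several datasets; columns are prefixed with each label."""
    matched = [tagdata]
    for label, envdata in envdatas.items():
        env = tagdata.apply(lambda row: nearest_values(envdata, row), axis=1, result_type="expand")
        matched.append(env.add_prefix(f"{label}_"))
    return pd.concat(matched, axis=1)


# usage: two made-up datasets that both contain a variable called "value"
times = pd.date_range("2014-01-01", periods=2, freq="D")
grid = dict(time=times, lat=[30.0, 31.0], lon=[-80.0, -79.0])
ds_a = xr.Dataset({"value": (("time", "lat", "lon"), np.zeros((2, 2, 2)))}, coords=grid)
ds_b = xr.Dataset({"value": (("time", "lat", "lon"), np.ones((2, 2, 2)))}, coords=grid)
tags = pd.DataFrame({"lat": [30.1], "lon": [-79.1], "datetime": pd.to_datetime(["2014-01-02"])})
out = extract_many({"a": ds_a, "b": ds_b}, tags)
print(out)
```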

In [None]:
# ds_ssh_renamed is not defined in this section; load it first with lat/lon/time coordinates
xtractopy2(ds_sst, ds_ssh_renamed, track_2014, "test_sst_ssh")