New method to fetch data along a given trajectory #169

Open
gmaze opened this issue Jan 24, 2022 · 13 comments
Labels
argo-traj (About trajectory data), enhancement (New feature or request), good first issue (Good for newcomers), stale (No activity over the last 90 days)

Comments

@gmaze
Member

gmaze commented Jan 24, 2022

Motivation

For some Argo data analyses, it could be useful to fetch data around specific locations, e.g.:

It would be useful to have an access point that fetches this directly.

API for new access point

It could look like this:

from argopy import DataFetcher as ArgoDataFetcher

# Default temporal distance unit is 'days' and default radial distance unit is 'degree':
neighbor_fetcher = ArgoDataFetcher(ds='phy').around(wmo=[6903754], dt=365, dr=1)  # Whole float trajectory
neighbor_fetcher = ArgoDataFetcher(ds='phy').around(wmo=[6903754], cyc=[12], dt=365, dr=1)  # Single profile
neighbor_fetcher = ArgoDataFetcher(ds='phy').around(wmo=[6903754], cyc=[12,13,14], dt=365, dr=1)  # Selected profiles

# Possibly distinguish zonal and meridional distances:
neighbor_fetcher = ArgoDataFetcher(ds='ref').around(wmo=[6903754], dt=30, dx=2, dy=1) 

# Use option to change units:
neighbor_fetcher = ArgoDataFetcher(ds='ref').around(wmo=[6903754], dt=30, dx=100, dy=50, unit='km')

# Get data/index the classic way:
neighbor_ds = neighbor_fetcher.load().data

# argopy would add new variables to the fetched data, like distances to the requested reference profiles:
neighbor_ds['distance_time']
neighbor_ds['distance_radial']
neighbor_ds['distance_zonal']
neighbor_ds['distance_meridional']
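
As a rough illustration of what these added distance variables could contain, here is a minimal sketch in plain Python. The function names (`haversine_km`, `distances_to_reference`), the spherical Earth of radius 6371 km, and the choice of days and kilometres as units are assumptions made for this example, not part of argopy.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of a spherical Earth


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (lon, lat) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distances_to_reference(ref, prof):
    """Distances from a neighbor profile to a reference profile.

    Both arguments are (lon, lat, datetime) tuples. The zonal distance is
    measured along the reference latitude, the meridional distance along
    the reference longitude.
    """
    ref_lon, ref_lat, ref_time = ref
    lon, lat, time = prof
    return {
        "distance_time": abs((time - ref_time).total_seconds()) / 86400.0,  # days
        "distance_radial": haversine_km(ref_lon, ref_lat, lon, lat),  # km
        "distance_zonal": haversine_km(ref_lon, ref_lat, lon, ref_lat),  # km
        "distance_meridional": haversine_km(ref_lon, ref_lat, ref_lon, lat),  # km
    }


# Hypothetical reference and neighbor profiles, for illustration only
ref = (-30.0, 0.0, datetime(2021, 9, 1, 12))
prof = (-29.0, 0.0, datetime(2021, 9, 3, 12))
d = distances_to_reference(ref, prof)
print(d)
```

With a `unit='degree'` option, the zonal and meridional distances would simply be the longitude and latitude differences.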

This new access point could in fact take a path or trajectory as input:

from argopy import DataFetcher as ArgoDataFetcher

# [2021 Hurricane Larry](https://www.nhc.noaa.gov/data/tcr/index.php?season=2021&basin=atl)
traj = [[-26.00,12.00,'2021-09-01 12:00:00'],[-33.00,13.00,'2021-09-02 12:00:00'],[-40.00,14.00,'2021-09-03 12:00:00'],[-45.00,16.00,'2021-09-04 12:00:00'],[-49.00,19.00,'2021-09-05 12:00:00'],[-52.00,21.00,'2021-09-06 12:00:00'],[-55.00,24.00,'2021-09-07 12:00:00'],[-57.00,27.00,'2021-09-08 12:00:00'],[-61.00,31.00,'2021-09-09 12:00:00'],[-61.00,38.00,'2021-09-10 12:00:00'],[-49.00,52.00,'2021-09-11 12:00:00']]

neighbor_fetcher = ArgoDataFetcher(ds='phy').around(path=traj, dt=5, dr=50, unit='km')
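
To make the intended selection concrete, here is a minimal client-side sketch of "within `dt` days and `dr` km of at least one waypoint". The helper `select_around_path` and the three profiles are hypothetical and only illustrate the logic; how argopy would actually implement `around(path=...)` is not decided in this issue.

```python
import math
from datetime import datetime


def haversine_km(lon1, lat1, lon2, lat2, radius_km=6371.0):
    """Great-circle distance in km between two (lon, lat) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def select_around_path(path, profiles, dt_days, dr_km):
    """Return the profiles within dt_days and dr_km of at least one waypoint.

    path: list of [lon, lat, 'YYYY-mm-dd HH:MM:SS'] waypoints.
    profiles: list of dicts with 'lon', 'lat' and 'time' (datetime) keys.
    """
    waypoints = [(lon, lat, datetime.strptime(t, "%Y-%m-%d %H:%M:%S"))
                 for lon, lat, t in path]
    selected = []
    for p in profiles:
        for wlon, wlat, wtime in waypoints:
            close_in_time = abs((p["time"] - wtime).total_seconds()) <= dt_days * 86400
            close_in_space = haversine_km(wlon, wlat, p["lon"], p["lat"]) <= dr_km
            if close_in_time and close_in_space:
                selected.append(p)
                break  # keep each profile once, even if several waypoints match
    return selected


# First three waypoints of the trajectory above
traj = [[-26.00, 12.00, '2021-09-01 12:00:00'],
        [-33.00, 13.00, '2021-09-02 12:00:00'],
        [-40.00, 14.00, '2021-09-03 12:00:00']]

# Made-up profiles, for illustration only
profiles = [
    {"id": "A", "lon": -26.2, "lat": 12.1, "time": datetime(2021, 9, 3)},   # near waypoint 1
    {"id": "B", "lon": -33.1, "lat": 13.0, "time": datetime(2021, 10, 1)},  # close in space, too late
    {"id": "C", "lon": -10.0, "lat": 40.0, "time": datetime(2021, 9, 2)},   # close in time, too far
]

neighbors = select_around_path(traj, profiles, dt_days=5, dr_km=50)
print([p["id"] for p in neighbors])
```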

Further

Note that this API could easily be plugged into the argo accessor:

from argopy import DataFetcher as ArgoDataFetcher

float_fetcher = ArgoDataFetcher(ds='phy').float(6903754)
float_ds = float_fetcher.load().data

neighbor_ds = float_ds.argo.around(dt=365, dr=1)
neighbor_ds = float_ds.argo.around(dt=365, dx=2, dy=1) 
neighbor_ds = float_ds.argo.around(dt=365, dx=100, dy=50, unit='km')
neighbor_ds = float_ds.argo.around(dt=365, dr=1, ds='ref')  # Fetch data from the Argo CTD reference
@matdever

@gmaze This would be a great feature and is actually the reason why issue #168 came up: I was querying the index file too often because I was looking for nearby floats along a specific float's trajectory, using:

argo = ArgoDataFetcher().profile(WMO,profnumber).to_xarray()

d = {'profile_number': argo.CYCLE_NUMBER,
     'PSAL': argo.PSAL,
     'TEMP': argo.TEMP,
     'PRES': argo.PRES,
     'time': argo.TIME,
     'lon': argo.LONGITUDE,
     'lat': argo.LATITUDE,
    }
df = pd.DataFrame(data=d)

argo = ArgoDataFetcher().region([df.lon[0]-.5, df.lon[0]+.5, df.lat[0]-.5, df.lat[0]+.5, 1500, 2100, '2020-01-01', '2022-01-01']).to_xarray()

d = {'profile_number': argo.CYCLE_NUMBER,
     'PSAL': argo.PSAL,
     'TEMP': argo.TEMP,
     'PRES': argo.PRES,
     'time': argo.TIME,
     'lon': argo.LONGITUDE,
     'lat': argo.LATITUDE,
     'WMO': argo.PLATFORM_NUMBER
    }
sbe = pd.DataFrame(data=d)

where df.lon[0] is the longitude of the float of interest at a specific profile.

The additional variables are also a great touch. I currently add the absolute distance from the reference profile using:

sbe['dist'] = hs.haversine((df.lat[0],df.lon[0]),(sbe.lat[ii],sbe.lon[ii])) 

Caveats

Some thought should go into avoiding duplicates; otherwise the array could grow unnecessarily.
Also, in my current approach, I have to manually remove the float of interest from the returned array using:

sbe.drop(sbe[sbe['WMO'] == int(WMO)].index, inplace = True)
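
Both caveats could be handled in one pass with pandas. The sketch below uses made-up data and the column names of the `sbe` DataFrame above; it assumes one row per measurement, so a row is identified by its float, cycle and pressure level.

```python
import pandas as pd

wmo_of_interest = 6903075  # float whose own profiles should be excluded

# Made-up results of two overlapping box queries, for illustration only
box1 = pd.DataFrame({"WMO": [6903075, 6901234, 6901234, 6901234],
                     "profile_number": [2, 10, 10, 11],
                     "PRES": [1500.0, 1500.0, 1600.0, 1500.0]})
box2 = pd.DataFrame({"WMO": [6903075, 6901234, 6905678],
                     "profile_number": [3, 11, 4],
                     "PRES": [1500.0, 1500.0, 1500.0]})

# Stack the results of all boxes
neighbors = pd.concat([box1, box2], ignore_index=True)

# Drop measurements returned by more than one box
neighbors = neighbors.drop_duplicates(subset=["WMO", "profile_number", "PRES"])

# Remove the float of interest itself
neighbors = neighbors[neighbors["WMO"] != wmo_of_interest].reset_index(drop=True)

print(neighbors)
```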

@gmaze
Member Author

gmaze commented Jan 26, 2022

Great!

Avoiding duplicates may indeed be included 👍

@gmaze
Member Author

gmaze commented Jan 26, 2022

I tried the following code to fetch data within a 2-year, 1x1 degree box around each profile.
Using the cache is key for performance:

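import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from argopy import DataFetcher as ArgoDataFetcher

WMO = [6903075]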
profnumber = np.arange(2,10)

fetcher = ArgoDataFetcher(cache=True).profile(WMO, profnumber)
argo_ds = fetcher.to_xarray()
argo_ds = argo_ds.argo.point2profile()

fig, ax = plt.subplots(nrows=1, ncols=1)
ax.plot(argo_ds['LONGITUDE'], argo_ds['LATITUDE'], 'r+-', markersize=12)

for i_prof in argo_ds['N_PROF']:
    this_prof = argo_ds.sel(N_PROF=i_prof)
    lon = this_prof['LONGITUDE'].values[np.newaxis][0]
    lat = this_prof['LATITUDE'].values[np.newaxis][0]
    tim = this_prof['TIME'].values[np.newaxis][0]
    bbox = list(np.add([lon, lon, lat, lat, 1500, 2000, tim, tim], 
                       [-.5, .5, -.5, .5, 0, 0, -np.timedelta64(12*30, 'D'), np.timedelta64(12*30, 'D')]))
    bbox[-1] = pd.to_datetime(str(bbox[-1])).strftime('%Y%m%d%H%M')
    bbox[-2] = pd.to_datetime(str(bbox[-2])).strftime('%Y%m%d%H%M')
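    try:
        fetcher = ArgoDataFetcher(cache=True, ds='phy').region(bbox)
        local_ds = fetcher.load().data
        local_ds = local_ds.argo.point2profile()
        # Plot neighbors only when the fetch succeeded, so that data from a
        # previous iteration is never plotted by mistake
        ax.plot(local_ds['LONGITUDE'], local_ds['LATITUDE'], 'k.')
    except Exception:
        # Report the failing request; the box is still drawn below
        print(fetcher.uri)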
    ax.hlines(bbox[2:4], bbox[0], bbox[1], color='black', linewidth=0.5)
    ax.vlines(bbox[0:2], bbox[2], bbox[3], color='black', linewidth=0.5)
    ax.text(lon, lat+0.01, '%i' % this_prof['CYCLE_NUMBER'], horizontalalignment='center')
ax.grid()
ax.set_title(WMO)

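[Figure: float trajectory (red) with the 1x1 degree box drawn around each profile and the neighboring profiles found (black dots)]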

Clearly, overlapping boxes are something we should avoid.
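
One possible way to limit redundant requests, sketched below as an assumption rather than something argopy does, is to merge intersecting boxes into their common bounding box before fetching. Only the longitude/latitude part of the box is handled, and the helper names are hypothetical. A merged box can cover more area than the union of the original boxes, so the returned profiles would still need filtering by distance afterwards.

```python
def overlaps(a, b):
    """True if two [lon_min, lon_max, lat_min, lat_max] boxes intersect."""
    return a[0] <= b[1] and b[0] <= a[1] and a[2] <= b[3] and b[2] <= a[3]


def merge_boxes(boxes):
    """Merge intersecting boxes into their common bounding box until none intersect."""
    merged = [list(b) for b in boxes]
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if overlaps(merged[i], merged[j]):
                    a, b = merged[i], merged[j]
                    merged[i] = [min(a[0], b[0]), max(a[1], b[1]),
                                 min(a[2], b[2]), max(a[3], b[3])]
                    del merged[j]
                    changed = True
                    break
            if changed:
                break  # restart the scan, since the merged box may now touch others
    return merged


boxes = [
    [-30.5, -29.5, 10.0, 11.0],
    [-30.0, -29.0, 10.5, 11.5],  # overlaps the first box
    [-20.5, -19.5, 30.0, 31.0],  # isolated, kept as a separate request
]
merged = merge_boxes(boxes)
print(merged)
```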

@quai20
Member

quai20 commented Jan 26, 2022

> Clearly, overlapping boxes are something we should avoid

Maybe it's time to think about "polygon" region fetching?

@ocefpaf
Collaborator

ocefpaf commented Jan 26, 2022

> Maybe it's time to think about "polygon" region fetching?

Something like that is in the works for ERDDAP and should be in the next version. However, that won't help much here because we don't know when the Argo ERDDAP server will be updated, and argopy supports multiple data sources that are not ERDDAP. Anyway, I just wanted to mention that this may at least be easier in the future.
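
For data sources without server-side polygon support, the selection could be done on the client after fetching a bounding box. Here is a minimal sketch using the classic ray-casting test; the function name and the sample polygon are hypothetical, and longitude wrap-around at the date line is not handled.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) lies inside the polygon.

    polygon is a list of (lon, lat) vertices; the last vertex connects back
    to the first. Points lying exactly on an edge may be classified either way.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray going east from the point cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Hypothetical 2x2 degree square and two profile positions
polygon = [(-30.0, 10.0), (-28.0, 10.0), (-28.0, 12.0), (-30.0, 12.0)]
points = [(-29.0, 11.0), (-27.0, 11.0)]
kept = [p for p in points if point_in_polygon(p[0], p[1], polygon)]
print(kept)
```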

@gmaze
Member Author

gmaze commented Jan 27, 2022

Indeed, a "polygon" selection would also be great; I think argovis already has one.

However, let's imagine the case where 2 profiles are far away from each other, i.e., more precisely, where their boxes do not overlap:

[Figure: two profiles far apart, with non-overlapping boxes]

In this case, a polygon would not serve the purpose of the 'around' access point.

@matdever

A polygon selection would be useful for some applications, I reckon. For the problem here (i.e., fetching data along a float's trajectory), I thought that filtering out duplicates would suffice. It would solve the issue of overlapping boxes, wouldn't it? The only problem it adds is how to match the profiles to one another (one profile could be in 2 boxes).

I tried to draw it as an example to be clearer:

[Image: hand-drawn sketch of two floats and their neighboring profiles]

Float 1 is easy - no problem there
Float 2 is trickier. Profiles 2 and 3 could be used for profile C, and profiles 3 and 4 for profile D. If one filters out duplicates, only profiles 2, 3, and 4 will be left, with no indication of which profiles they are matched to.

Does that make sense?

@gmaze
Member Author

gmaze commented Jan 28, 2022

> I thought that filtering out duplicates would suffice

Indeed, but at the cost of transferring more data than necessary, since all the post-processing/filtering would be done on the client side. I would try to avoid this.

This may not be much of a problem if the time range is reasonable, though.

@gmaze
Member Author

gmaze commented Jan 28, 2022

> I tried to draw it as an example to be clearer:

Nice drawing!! 😸

> but no indication to what profiles they are matched to.

I guess this issue would be solved by the distance metric: for float 2, profile 4 is too far from C, and profile 2 is too far from D.
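
A minimal sketch of that idea: keep each neighbor profile once, and record separately which reference profiles it lies within range of. The helper `match_pairs`, the coordinates and the 60 km radius are made up to mimic the drawing (reference profiles C and D, neighbors 2, 3 and 4).

```python
import math


def haversine_km(lon1, lat1, lon2, lat2, radius_km=6371.0):
    """Great-circle distance in km between two (lon, lat) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def match_pairs(references, neighbors, dr_km):
    """Return {reference id: [ids of the neighbors within dr_km]}."""
    pairs = {}
    for ref_id, (rlon, rlat) in references.items():
        pairs[ref_id] = [
            nbr_id
            for nbr_id, (nlon, nlat) in neighbors.items()
            if haversine_km(rlon, rlat, nlon, nlat) <= dr_km
        ]
    return pairs


# Made-up (lon, lat) positions along the equator, for illustration only
references = {"C": (0.0, 0.0), "D": (1.0, 0.0)}
neighbors = {2: (-0.4, 0.0), 3: (0.5, 0.0), 4: (1.4, 0.0)}

pairs = match_pairs(references, neighbors, dr_km=60)
print(pairs)
```

Profile 3 is stored once but appears under both C and D, so filtering duplicates does not lose the matching.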

@gmaze gmaze added the good first issue Good for newcomers label Mar 21, 2022
@gmaze gmaze added the argo-traj About trajectory data label Apr 13, 2022
@github-actions

This issue was marked as stale automatically because it has not seen any activity in 90 days

@github-actions github-actions bot added the stale No activity over the last 90 days label Jul 12, 2022
@gmaze gmaze removed the stale No activity over the last 90 days label Sep 23, 2022
@github-actions

This issue was marked as stale automatically because it has not seen any activity in 90 days

@github-actions github-actions bot added the stale No activity over the last 90 days label Dec 23, 2022
@gmaze gmaze pinned this issue Jun 6, 2023

This issue was closed automatically because it has not seen any activity in 365 days

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Dec 24, 2023
@gmaze gmaze reopened this Jan 8, 2024
@github-actions github-actions bot removed closed-as-stale stale No activity over the last 90 days labels Jan 8, 2024

This issue was marked as stale automatically because it has not seen any activity in 90 days

@github-actions github-actions bot added the stale No activity over the last 90 days label Apr 14, 2024