# Get and Share Objects with the Pelican Client

In this section of the tutorial, we will use the Pelican command-line client to fetch data objects that are available via the OSDF and to put new objects into the OSDF. Along the way, we will also cover the structure of a Pelican object URL and ways to explore the Pelican CLI.

## The Dataset

The data we'll be working with today is the [NOAA Global Historical Climatology Network](https://www.ncei.noaa.gov/metadata/geoportal/rest/metadata/item/gov.noaa.ncdc:C00861/html) dataset. From the [README](https://docs.opendata.aws/noaa-ghcn-pds/readme.html): 

> GHCN-Daily is a dataset that contains daily observations over global land areas. It contains station-based measurements from land-based stations worldwide, about two thirds of which are for precipitation measurements only (Menne et al., 2012). GHCN-Daily is a composite of climate records from numerous sources that were merged together and subjected to a common suite of quality assurance reviews (Durre et al., 2010). 


The GHCN dataset is available from Amazon S3 at:

```
https://noaa-ghcn-pds.s3.amazonaws.com/
```

The OSDF is already connected to AWS through an existing Pelican origin and namespace (more on this in a minute), so we can access this data via Pelican and the OSDF.

## Pelican URLs

In order to access an object in the OSDF, we need to construct a URL. The URL for OSDF objects looks like this: 

```
osdf:///<namespace>/<object>
```

Let's start with the namespace. In this example: 

* the Open Datasets in Amazon S3 are exposed through the prefix `aws-opendata`
* followed by the region given on the GHCN website: `us-east-1`
* and then finally the name of the dataset in AWS, also described in the README: `noaa-ghcn-pds`

So the full namespace for this dataset is:
```
/aws-opendata/us-east-1/noaa-ghcn-pds/
```

Next, we need an object to work with. We can't (currently) list the objects in this location, but we can browse the AWS index
([https://noaa-ghcn-pds.s3.amazonaws.com/](https://noaa-ghcn-pds.s3.amazonaws.com/)) to see the available files.

In the top "level" of the dataset are several readme files.
Let's get the list of stations that are contained in the dataset, so we can identify what files we want to download. The file `ghcnd-stations.txt` contains the desired list. 

This is the "object name" that we want to fetch using the OSDF.
```
ghcnd-stations.txt
```

We combine the "namespace prefix" and the "object name" together to get the full OSDF URL:

```
osdf:///aws-opendata/us-east-1/noaa-ghcn-pds/ghcnd-stations.txt
```
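
As a small sketch of the same composition in the shell, the snippet below joins the namespace and the object name into the URL. The variable names `namespace`, `object`, and `url` are illustrative choices, not anything Pelican requires.

```shell
# Namespace prefix and object name from the steps above.
namespace="/aws-opendata/us-east-1/noaa-ghcn-pds"
object="ghcnd-stations.txt"

# The namespace already starts with "/", so "osdf://" + namespace
# yields the three slashes seen in OSDF URLs.
url="osdf://${namespace}/${object}"
echo "${url}"
```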

***

> ### Note on URL formatting
> 
> The canonical form of a Pelican URL is as follows: 
> 
> ```
> pelican://<federation-root>/<namespace>/<object>
> ```
> 
> Because the OSDF is a special instance of a Pelican data federation, it has its own URL 
> structure as described previously. Technically, these two URLs are equivalent: 
> 
> * `pelican://osg-htc.org/ospool/ap40/data/alice/test.txt`
> * `osdf:///ospool/ap40/data/alice/test.txt`

***
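
To make the equivalence in the note concrete, here is a minimal shell sketch that rewrites an `osdf:///` URL into its `pelican://` form, assuming the federation root `osg-htc.org` shown in the note. The variable names are illustrative.

```shell
osdf_url="osdf:///ospool/ap40/data/alice/test.txt"
federation_root="osg-htc.org"

# Strip the leading "osdf:///" and prepend the canonical scheme and root.
pelican_url="pelican://${federation_root}/${osdf_url#osdf:///}"
echo "${pelican_url}"
```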

## Get Data Objects

Constructing the URL is the tricky part; downloading the object should be easy. The following 
command will fetch the station list data object. 

In [None]:
pelican object get osdf:///aws-opendata/us-east-1/noaa-ghcn-pds/ghcnd-stations.txt ./

Once downloaded, we can view the contents: 

In [None]:
head ghcnd-stations.txt

### Download specific station data

Next we will download all the data for a specific station. For this example, we'll use the airport in Madison, WI. The 
record for that station is: 

```
USW00014837  43.1406  -89.3453  261.8 WI MADISON DANE CO RGNL AP                72641
```

To download the data for this station, we need the station ID, which is the first field in each record of the `ghcnd-stations.txt` file. For this station, the ID is `USW00014837`.
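
One way to pull the station ID out of a record in the shell is sketched below. It assumes the ID is the first whitespace-separated field, as in the record shown above; the variable names are illustrative.

```shell
# The Madison, WI record copied from ghcnd-stations.txt.
record='USW00014837  43.1406  -89.3453  261.8 WI MADISON DANE CO RGNL AP                72641'

# awk splits on whitespace by default, so $1 is the station ID.
station_id=$(printf '%s\n' "${record}" | awk '{ print $1 }')
echo "${station_id}"
```

With the downloaded file in place, the same `awk` expression could be applied to the output of `grep 'MADISON DANE' ghcnd-stations.txt`.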

Once again, we will need to construct our URL. The namespace prefix hasn't changed, but the station data objects live under the path `csv/by_station`, and each filename has the form `<STATION ID>.csv`.

Putting these together gives the URL:

```
osdf:///aws-opendata/us-east-1/noaa-ghcn-pds/csv/by_station/USW00014837.csv
```
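
The same pattern works for any station ID. The sketch below wraps it in a small shell function; `station_url` is a hypothetical helper name, not part of the Pelican CLI.

```shell
station_url() {
    # $1 is a GHCN station ID such as USW00014837.
    printf 'osdf:///aws-opendata/us-east-1/noaa-ghcn-pds/csv/by_station/%s.csv\n' "$1"
}

url=$(station_url "USW00014837")
echo "${url}"
```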

We use the same `pelican object get <URL> <destination>` syntax to fetch the object. 

In [None]:
pelican object get osdf:///aws-opendata/us-east-1/noaa-ghcn-pds/csv/by_station/USW00014837.csv ./

And we can again view the contents of the file:

In [None]:
head USW00014837.csv

## Share Data Objects

Let's visualize the data we just downloaded and share our results via the OSDF. 

In [None]:
./example.py USW00014837

This should produce a plot: 

![](./USW00014837.png)

These results can be shared using a different origin connected to the OSDF. 

As before, the first step will be constructing the URL where we want to place the data. For sharing, the namespace is `/osdf-tutorial/protected`, so destination URLs start with `osdf:///osdf-tutorial/protected`.

Normally the object path would just be the name of the image, but to avoid collisions, we will add your initials as part
of the object path in the URL.

In [None]:
## edit this cell to be a unique identifier!!
my_inits=percy.pelican

The destination URL will therefore be: 

```
osdf:///osdf-tutorial/protected/${my_inits}.USW00014837.png
```
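
Because the destination URL depends on `my_inits`, an unset variable would silently produce `.USW00014837.png` and risk colliding with someone else's upload. The sketch below uses the shell's `${var:?}` expansion to stop early in that case; `dest_url` is an illustrative variable name.

```shell
my_inits="percy.pelican"   # replace with your own unique identifier

# ${my_inits:?...} aborts with the message if my_inits is unset or empty.
dest_url="osdf:///osdf-tutorial/protected/${my_inits:?set my_inits first}.USW00014837.png"
echo "${dest_url}"
```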

Instead of `pelican object get`, we will now use `pelican object put <local_object> <destination_URL>`. After running this command, you will be prompted with a link. Click on the link, authenticate with CILogon, and then return to this notebook.

In [None]:
pelican object put USW00014837.png osdf:///osdf-tutorial/protected/${my_inits}.USW00014837.png

> If you missed the opening of the demo, the previous command might need to be run in a terminal instead of the notebook. 

## List Data Objects

For certain data origins, we can list available objects. This is true for the origin where we
just uploaded our results. To see the other uploaded results, run `pelican object ls` with the
namespace just used with `pelican object put`:

In [None]:
pelican object ls osdf:///osdf-tutorial/protected/

Do you know how to pull someone else's results to this environment? 

## Exploring Further

The syntax of the Pelican client is similar to other command-line tools like `git` or `docker`, where the command construction is:

```
pelican <noun> <command> <arguments>
```

To see the available nouns or commands, run a partial command or add `--help`
to a base command.

In [None]:
pelican object --help
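
The same flag can be applied to each subcommand used in this tutorial. The loop below is a small sketch that prints the help text for `get`, `put`, and `ls`, and prints a short notice instead if the `pelican` binary is not on the `PATH`. The `subcommands` variable is an illustrative name.

```shell
# Subcommands of `pelican object` used earlier in this tutorial.
subcommands="get put ls"

if command -v pelican >/dev/null 2>&1; then
    for cmd in ${subcommands}; do
        # Show the usage text for, e.g., `pelican object get`.
        pelican object "${cmd}" --help
    done
else
    echo "pelican is not installed; skipping help output"
fi
```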