# Stream Objects Using Python

## Notebook Example

If the previous example code were implemented as a notebook instead of a standalone Python script, 
it would look like this:

In [1]:
from sys import argv
import pandas as pd
from my_functions import extract_temp_data, identify_season, create_histogram

In [None]:
station_id = "USW00014837"

# Read data from file
station_data_filename = f"{station_id}.csv"
station_df = pd.read_csv(station_data_filename, low_memory=False)

In [None]:
# Extract Min, Max Temperatures as dataframe
tempdf = extract_temp_data(station_df)

# Describe the dataset
print(f"{tempdf.describe()}\n")

# Label by season
tempdf['SEASON'] = tempdf.index.map(identify_season)

# Create histograms
# Saves as <STATION ID>-temp-dist.png, unless otherwise specified.
create_histogram(tempdf, station_id)

This works well for data that is small enough to download and work with locally. But what 
if the data objects are too large to store locally, or you would rather stay in Python than 
switch to a command-line tool? Let's delete the local file and access the data in a different way. 

In [None]:
!rm USW00014837.csv

## Loading Data with PelicanFS

Instead of downloading the data with the command-line tool, it's possible to use a Python module 
called `PelicanFS` to stream the data directly into a computational process: 

> PelicanFS is a file system interface (fsspec) for the Pelican Platform. For more information about Pelican, see our main website or GitHub page. For more information about fsspec, visit the filesystem-spec page.

In this example, instead of creating a local path to the data, we will create 
a Pelican URL. 

In [None]:
station_id = "USW00014837"
osdf_prefix = 'osdf:///aws-opendata/us-east-1/noaa-ghcn-pds'
station_URL = f"{osdf_prefix}/csv/by_station/{station_id}.csv"
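If several stations need to be analyzed, the URL construction above can be wrapped in a small helper. The sketch below is only an illustration: `station_url` is a hypothetical name that is not part of PelicanFS, and the prefix is copied from the cell above. It also uses the standard library's `urlsplit` to show the two parts of the URL that matter here, the `osdf` protocol and the object path.

```python
from urllib.parse import urlsplit

# Namespace prefix copied from the cell above.
OSDF_PREFIX = "osdf:///aws-opendata/us-east-1/noaa-ghcn-pds"


def station_url(station_id: str, prefix: str = OSDF_PREFIX) -> str:
    """Return the Pelican URL for one station's CSV object (hypothetical helper)."""
    return f"{prefix}/csv/by_station/{station_id}.csv"


url = station_url("USW00014837")
parts = urlsplit(url)

print(url)
print(parts.scheme)  # the protocol part of the URL
print(parts.path)    # the object's path after the protocol
```

The helper only builds a string; no data is transferred until the URL is handed to a reader such as `pd.read_csv`.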

Then, through the magic of PelicanFS and fsspec, the data can be loaded with the URL! 

In [None]:
station_df = pd.read_csv(station_URL, low_memory=False)

In [None]:
station_df.head()
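The "magic" is, roughly, protocol dispatch: the part of the URL before `://` selects which file system implementation handles the request. The toy sketch below illustrates that idea using only the standard library. It is not how fsspec or PelicanFS are actually implemented, and every name in it (`READERS`, `register`, `read_osdf`, `read`) is made up for illustration.

```python
from urllib.parse import urlsplit

# Toy registry mapping a URL protocol to a reader function.
# Illustrative only: fsspec's real registry and PelicanFS are more involved.
READERS = {}


def register(protocol):
    """Decorator that records a reader function under a protocol name."""
    def wrap(func):
        READERS[protocol] = func
        return func
    return wrap


@register("osdf")
def read_osdf(path):
    # Stand-in for a real network fetch; returns a description, not data.
    return f"would stream {path} from the OSDF"


def read(url):
    """Pick a reader based on the URL's protocol."""
    parts = urlsplit(url)
    try:
        reader = READERS[parts.scheme]
    except KeyError:
        raise ValueError(f"no reader registered for protocol {parts.scheme!r}")
    return reader(parts.path)


print(read("osdf:///aws-opendata/us-east-1/noaa-ghcn-pds/csv/by_station/USW00014837.csv"))
```

In this toy version, a URL with an unregistered protocol raises an error, which is analogous to what happens when no installed package can handle a given protocol.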

And then the rest of the analysis can proceed as before. 

In [None]:
# Extract Min, Max Temperatures as dataframe
tempdf = extract_temp_data(station_df)

# Describe the dataset
print(f"{tempdf.describe()}\n")

# Label by season
tempdf['SEASON'] = tempdf.index.map(identify_season)

# Create histograms
# Saves as <STATION ID>-temp-dist.png, unless otherwise specified.
create_histogram(tempdf, station_id)

Note that we didn't even have to import the PelicanFS module; it only needed to be installed so that fsspec could find it. 
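Because nothing is imported explicitly, a missing installation only shows up when the read fails. One way to check up front is sketched below. `is_installed` is a hypothetical helper, and the module name `pelicanfs` is taken from the import used in this notebook.

```python
from importlib.util import find_spec


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return find_spec(module_name) is not None


if not is_installed("pelicanfs"):
    print("pelicanfs is not installed; osdf:// URLs will likely fail to load.")
```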

In [None]:
from pelicanfs.core import PelicanFileSystem, PelicanMap, OSDFFileSystem 