A Web API (Application Programming Interface) is a set of rules and protocols that lets different software systems communicate with each other over the internet through a defined set of operations. It enables services and products to exchange data and functionality easily and securely, often in the form of JSON or XML.
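
To make the idea of exchanging data as JSON more concrete, here is a minimal sketch of decoding a JSON response body in Python. The payload below is hypothetical: its structure and the `observations` key are made up for illustration and do not correspond to any of the services used in this notebook.

```python
import json

# Hypothetical JSON text, shaped like something a weather API might return.
# The keys "station" and "observations" are assumptions for this sketch.
payload = """
{
  "station": "Manhattan",
  "observations": [
    {"date": "2024-01-01", "TEMP2MAVG": -1.4},
    {"date": "2024-01-02", "TEMP2MAVG": -2.15}
  ]
}
"""

# json.loads converts the JSON text into Python dictionaries and lists
record = json.loads(payload)

# Once decoded, values are accessed like in any other dictionary or list
temps = [obs["TEMP2MAVG"] for obs in record["observations"]]
print(record["station"], temps)
```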


In [8]:
# Import modules
import io
import requests
import numpy as np
import pandas as pd


## Example 1: Kansas Mesonet

In this example we will learn to download data from the Kansas Mesonet. Data can be accessed through a URL (Uniform Resource Locator), also known as a web address. In this URL we pass parameters that specify the station, the start and end dates, the variables, and the time interval of the data. Because this service returns its data as comma-separated values, we can read the response directly with the Pandas library.

Kansas Mesonet REST API: http://mesonet.k-state.edu/rest/
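
Before writing the full function, it can help to see how the query parameters end up in the URL. The sketch below builds the same kind of URL with the standard library instead of string concatenation. The station name `Ashland Bottoms` is only an assumed example of a name containing a space, and no request is sent.

```python
from datetime import datetime
from urllib.parse import urlencode, quote

base_url = "http://mesonet.k-state.edu/rest/stationdata/"

# The API expects timestamps as yyyymmddHHMMSS
fmt = "%Y%m%d%H%M%S"
t_start = datetime.strptime("2024-01-01", "%Y-%m-%d").strftime(fmt)
t_end = datetime.strptime("2024-01-15", "%Y-%m-%d").strftime(fmt)

params = {
    "stn": "Ashland Bottoms",  # assumed station name, chosen because it has a space
    "int": "day",
    "t_start": t_start,
    "t_end": t_end,
    "vars": ",".join(["TEMP2MAVG", "PRECIP"]),
}

# quote_via=quote encodes spaces as %20; safe="," leaves the commas in 'vars' untouched
query = urlencode(params, quote_via=quote, safe=",")
url = f"{base_url}?{query}"
print(url)
```

Letting `urlencode` handle the escaping avoids having to replace spaces by hand, and it also covers other characters that are not allowed in a URL.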

In [2]:
# Define function to request data
def get_ks_mesonet_data(station, start_date, end_date, variables, interval='day'):
    """Function to retrieve weather data for a specific station and period from the Kansas Mesonet
    
    Parameters
    ----------
    station : string
        Station name
    start_date : string
        yyyy-mm-dd format
    end_date : string
        yyyy-mm-dd format
    variables : list
        Weather variables to download
    interval : string
        One of the following: '5min', 'hour', or 'day'
        
    Returns
    -------
    df : Dataframe
        Table with the requested variables for the given station and period
        
    Example
    -------
    df = get_ks_mesonet_data('Manhattan', '2022-10-01', '2023-02-14', ['TEMP2MAVG','PRECIP'])
    
    """
    
    # Define date format as required by the API
    fmt = '%Y%m%d%H%M%S'
    start_date = pd.to_datetime(start_date).strftime(fmt)
    end_date = pd.to_datetime(end_date).strftime(fmt)
    
    # Concatenate variables using comma
    variables = ','.join(variables)
    
    # Create URL
    url = f"http://mesonet.k-state.edu/rest/stationdata/?stn={station}&int={interval}&t_start={start_date}&t_end={end_date}&vars={variables}"
    
    # A URL cannot have spaces, so we replace them with %20
    url = url.replace(" ", "%20")
    
    # Create Dataframe. The API marks missing values with "M", which we read as NaN
    df = pd.read_csv(url, na_values='M')
    
    return df


In [3]:
# Use function to request data
station = 'Manhattan'
start_date = '2024-01-01'
end_date = '2024-01-15'
variables = ['TEMP2MAVG','PRECIP']

df = get_ks_mesonet_data(station, start_date, end_date, variables)

In [4]:
# Inspect results
df.head()

Unnamed: 0,TIMESTAMP,STATION,TEMP2MAVG,PRECIP
0,2024-01-01 00:00:00,Manhattan,-1.4,0.0
1,2024-01-02 00:00:00,Manhattan,-2.15,0.0
2,2024-01-03 00:00:00,Manhattan,-1.09,0.0
3,2024-01-04 00:00:00,Manhattan,-1.2,0.0
4,2024-01-05 00:00:00,Manhattan,-0.38,0.0
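
The effect of `na_values='M'` is easier to see on a small offline example. The sketch below reads comma-separated text from memory instead of from the API. The first row is copied from the output above, while the `M` in the second row is a hypothetical missing value inserted for illustration.

```python
import io
import pandas as pd

# Small CSV sample held in memory (no request is made).
# The "M" in the second row is a made-up missing value for this demonstration.
csv_text = (
    "TIMESTAMP,STATION,TEMP2MAVG,PRECIP\n"
    "2024-01-01 00:00:00,Manhattan,-1.4,0.0\n"
    "2024-01-02 00:00:00,Manhattan,M,0.0\n"
)

# na_values='M' turns the "M" marker into NaN, so the column stays numeric
df_demo = pd.read_csv(io.StringIO(csv_text), na_values='M', parse_dates=['TIMESTAMP'])

print(df_demo.dtypes)
print(df_demo)
```

Without `na_values='M'`, the `TEMP2MAVG` column would be read as text and could not be used in calculations without further cleaning.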


## Example 2: U.S. Geological Survey streamflow data

In [None]:
def usgs_request(site, start_date, tz='CST', units='cubic_ft_per_s'):
    """
    Request instantaneous discharge (parameter code 00060) for a USGS site.
    site is the USGS site number as a string, e.g. '06879650'
    start_date in UTC
    tz and units are currently not used by the function
    Discharge is in cubic feet per second
    P  stands for "Provisional" data subject to revision.
    Example: df = usgs_request('06879650', '2021-06-28T02:00:00Z')
    """
    
    #start_date = pd.to_datetime(start_date).isoformat() + 'Z'
    url = 'https://waterservices.usgs.gov/nwis/iv/'
    params = {'format': 'rdb', 
              'sites': site, 
              'startDT': start_date, 
              'parameterCd': '00060',
              'siteStatus':'active'}

    response = requests.get(url, params=params)
    s = response.text
    
    if 'No sites found' in s:
        return None
    else:
        # Read tab-separated (RDB) text, skipping comment lines that start with '#'
        df = pd.read_csv(io.StringIO(s), sep='\t', comment='#')
        # Drop the first row, which holds column format codes rather than data
        df.drop([0], inplace=True)
        df.reset_index(inplace=True, drop=True)
        # The 56608 prefix belongs to site 06879650; other sites use a different prefix
        df.rename(columns={"56608_00060": "discharge",
                           "56608_00060_cd": "status"}, inplace=True)

        # Convert dates to datetime format
        df['datetime'] = pd.to_datetime(df['datetime'])

        # Convert local time to UTC (CDT is UTC-5 and CST is UTC-6)
        idx_cdt = df['tz_cd'] == 'CDT'
        df.loc[idx_cdt,'datetime'] = df.loc[idx_cdt,'datetime'] + pd.Timedelta(hours=5)

        idx_cst = df['tz_cd'] == 'CST'
        df.loc[idx_cst,'datetime'] = df.loc[idx_cst,'datetime'] + pd.Timedelta(hours=6)

        # replace label from CST/CDT to UTC
        df['tz_cd'] = 'UTC'
        
        # Cubic ft per second to cubic m per second
        #df['discharge_m3_per_s'] = pd.to_numeric(df['discharge'])*0.0283168 

        return df
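
The parsing steps inside the function can be tried offline on a small piece of RDB text. The sample below is hypothetical: it only mimics the layout of the service response (comment lines, a header row, a row of format codes, then data), and the discharge values are made up. The time shift is done here in a single step by mapping each time zone label to its offset from UTC, which is a slightly different approach from the boolean indexing used in the function.

```python
import io
import pandas as pd

# Hypothetical RDB text that imitates the layout of the response (values are made up)
rdb_text = (
    "# Illustrative sample, not real data\n"
    "agency_cd\tsite_no\tdatetime\ttz_cd\t56608_00060\t56608_00060_cd\n"
    "5s\t15s\t20d\t6s\t14n\t10s\n"
    "USGS\t06879650\t2021-06-28 02:00\tCDT\t100\tP\n"
    "USGS\t06879650\t2021-12-01 02:00\tCST\t50\tP\n"
)

# Read as text so the site number keeps its leading zero
df_demo = pd.read_csv(io.StringIO(rdb_text), sep='\t', comment='#', dtype=str)

# The first row holds format codes (5s, 15s, ...), not observations
df_demo = df_demo.drop(index=0).reset_index(drop=True)
df_demo = df_demo.rename(columns={"56608_00060": "discharge",
                                  "56608_00060_cd": "status"})

# Convert text columns to numeric and datetime types
df_demo['discharge'] = pd.to_numeric(df_demo['discharge'])
df_demo['datetime'] = pd.to_datetime(df_demo['datetime'])

# Hours to add to reach UTC: CDT is UTC-5 and CST is UTC-6
hours_to_utc = df_demo['tz_cd'].map({'CDT': 5, 'CST': 6})
df_demo['datetime_utc'] = df_demo['datetime'] + pd.to_timedelta(hours_to_utc, unit='h')

print(df_demo[['site_no', 'datetime', 'tz_cd', 'datetime_utc', 'discharge', 'status']])
```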

## Example 3: U.S. Climate Reference Network

Example of retrieving data from the U.S. Climate Reference Network (USCRN), which publishes daily station data as fixed-width text files.

- Daily data: https://www1.ncdc.noaa.gov/pub/data/uscrn/products/daily01/
- Daily data documentation: https://www1.ncdc.noaa.gov/pub/data/uscrn/products/daily01/README.txt


In [9]:
# URL link and header variables
year = 2018
station = 'CRND0103-2018-KS_Manhattan_6_SSW.txt'

url = f'https://www1.ncdc.noaa.gov/pub/data/uscrn/products/daily01/{year}/{station}'
daily_headers = ['WBANNO','LST_DATE','CRX_VN','LONGITUDE','LATITUDE',
                 'T_DAILY_MAX','T_DAILY_MIN','T_DAILY_MEAN','T_DAILY_AVG',
                 'P_DAILY_CALC','SOLARAD_DAILY','SUR_TEMP_DAILY_TYPE',
                 'SUR_TEMP_DAILY_MAX','SUR_TEMP_DAILY_MIN','SUR_TEMP_DAILY_AVG',
                 'RH_DAILY_MAX','RH_DAILY_MIN','RH_DAILY_AVG','SOIL_MOISTURE_5_DAILY',
                 'SOIL_MOISTURE_10_DAILY','SOIL_MOISTURE_20_DAILY','SOIL_MOISTURE_50_DAILY',
                 'SOIL_MOISTURE_100_DAILY','SOIL_TEMP_5_DAILY','SOIL_TEMP_10_DAILY',
                 'SOIL_TEMP_20_DAILY','SOIL_TEMP_50_DAILY','SOIL_TEMP_100_DAILY']

# Read fixed width data
df = pd.read_fwf(url, names=daily_headers)

# Convert date from string to datetime format
df['LST_DATE'] = pd.to_datetime(df['LST_DATE'],format='%Y%m%d')

# Replace missing-value codes (-99, -9999, and 999) with NaN
df = df.replace([-99,-9999,999], np.nan)
df.head(5)

Unnamed: 0,WBANNO,LST_DATE,CRX_VN,LONGITUDE,LATITUDE,T_DAILY_MAX,T_DAILY_MIN,T_DAILY_MEAN,T_DAILY_AVG,P_DAILY_CALC,...,SOIL_MOISTURE_5_DAILY,SOIL_MOISTURE_10_DAILY,SOIL_MOISTURE_20_DAILY,SOIL_MOISTURE_50_DAILY,SOIL_MOISTURE_100_DAILY,SOIL_TEMP_5_DAILY,SOIL_TEMP_10_DAILY,SOIL_TEMP_20_DAILY,SOIL_TEMP_50_DAILY,SOIL_TEMP_100_DAILY
0,53974,2018-01-01,2.422,-96.61,39.1,-11.0,-23.4,-17.2,-17.1,0.0,...,,0.114,,,,-2.7,-0.8,0.8,,
1,53974,2018-01-02,2.422,-96.61,39.1,-4.4,-20.8,-12.6,-11.6,0.0,...,,0.106,,,,-2.5,-1.0,0.1,,
2,53974,2018-01-03,2.422,-96.61,39.1,-1.5,-13.3,-7.4,-6.1,0.0,...,,0.105,,,,-1.8,-0.7,-0.1,,
3,53974,2018-01-04,2.422,-96.61,39.1,3.2,-16.3,-6.5,-6.5,0.0,...,,0.102,,,,-1.9,-0.8,-0.2,,
4,53974,2018-01-05,2.422,-96.61,39.1,-0.6,-11.9,-6.2,-6.7,0.0,...,,0.102,,,,-1.6,-0.6,-0.1,,
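
To see how fixed-width parsing and the missing-value replacement work together, the sketch below applies the same steps to a few characters of text held in memory. The sample is a hand-typed, simplified excerpt with only five columns. The temperature values come from the first two rows shown above, while the column widths and the raw `-99.000` code for soil moisture are assumptions made for this illustration.

```python
import io
import numpy as np
import pandas as pd

# Simplified fixed-width sample (assumed layout, only five of the columns)
fwf_text = (
    "53974 20180101   -11.0   -23.4 -99.000\n"
    "53974 20180102    -4.4   -20.8 -99.000\n"
)
demo_headers = ['WBANNO', 'LST_DATE', 'T_DAILY_MAX', 'T_DAILY_MIN',
                'SOIL_MOISTURE_5_DAILY']

# read_fwf infers the column boundaries from the whitespace between fields
df_demo = pd.read_fwf(io.StringIO(fwf_text), names=demo_headers)

# Replace missing-value codes with NaN
df_demo = df_demo.replace([-99, -9999], np.nan)

# Convert the yyyymmdd integers to datetime
df_demo['LST_DATE'] = pd.to_datetime(df_demo['LST_DATE'].astype(str), format='%Y%m%d')

print(df_demo)
```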
