# Weather Data Analysis Project

**In this project, we will:**

- Load station and temperature data from publicly available text files from the National Oceanic and Atmospheric Administration (NOAA).
- Fill in missing data, smooth the data, and plot temperature data.
- Compute the daily records at a given location.
- Compare the warmest year of a cold location with the coldest year of a warm one.

## Imports

In [76]:
import os
import urllib.request
from pathlib import Path

In [77]:
import numpy as np
import matplotlib.pyplot as pp
import seaborn

%matplotlib inline

## Dataset

We will be using data from the [Global Historical Climatology Network (GHCN) | National Centers for Environmental Information (NCEI)](https://www.ncdc.noaa.gov/data-access/land-based-station-data/land-based-datasets/global-historical-climatology-network-ghcn). NCEI was formerly known as the National Climatic Data Center (NCDC). The GHCN is an integrated database of climate summaries from land surface stations across the globe.

We start by downloading a text file, namely `ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/ghcnd-stations.txt`, which contains an annotated list of the land surface stations in the network. Since we know exactly which file we need, we download it over FTP straight into a data directory of our choice (or into the current working directory, CWD, if no directory is given). We then read the file in, strip the trailing newline characters, and load its contents into a Python list.

**We write the following functions to save a file to disk using FTP and load a data file into a Python list.**

In [101]:
# =============================================================================
def save_ftp_file(ftp_link_address, file_path):
    """
    Downloads the given file using FTP and saves it to the given file path,
    specified as 'data_dir_name/file_name.ext', or to the CWD, simply
    specified as 'file_name.ext'.
    """
    # Create the data directory (including any parent directories) if the
    # path names one and it does not exist yet.
    data_dir_name = os.path.dirname(file_path)
    if data_dir_name:
        os.makedirs(data_dir_name, exist_ok=True)

    # Download the given file using FTP to the desired location, unless a
    # file with that name is already on disk.
    if not os.path.isfile(file_path):
        urllib.request.urlretrieve(ftp_link_address, file_path)

         
# =============================================================================
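def get_data(data_path):
    """
    Returns a clean list of entries read in from the data file, or None if
    the data file does not exist.
    """
    # If the data file exists, read it in, strip trailing whitespace
    # (including the newline character '\n'), and return a Python list.
    if os.path.isfile(data_path):
        # The with-statement closes the file once all lines have been read.
        with open(data_path, 'r') as data_file:
            return [line.rstrip() for line in data_file]
    return None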
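
One caveat with the skip-if-present check in `save_ftp_file`: if a transfer is interrupted, a partial file may be left on disk and later mistaken for a complete download. The sketch below shows one possible way to guard against that, by downloading to a temporary name and renaming the file only after the transfer succeeds. The function name `save_file_atomically`, the `.part` suffix, and the `retrieve` parameter are hypothetical and introduced here purely for illustration; they are not part of the notebook's own code.

```python
import os
import urllib.request


def save_file_atomically(link_address, file_path,
                         retrieve=urllib.request.urlretrieve):
    """
    Hypothetical variant of save_ftp_file: downloads to 'file_path.part'
    and renames it to 'file_path' only after the transfer has succeeded.
    Returns True if a download happened, False if the file already existed.
    """
    if os.path.isfile(file_path):
        return False

    # Create the data directory, if the path names one.
    data_dir_name = os.path.dirname(file_path)
    if data_dir_name:
        os.makedirs(data_dir_name, exist_ok=True)

    tmp_path = file_path + '.part'
    try:
        retrieve(link_address, tmp_path)
        # Rename the finished download to its final name.
        os.replace(tmp_path, file_path)
    finally:
        # If the transfer failed, remove any partial file left behind.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
    return True
```

The `retrieve` argument defaults to `urllib.request.urlretrieve`, so the call looks the same as before; passing a different callable is only a convenience for trying the function without network access.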

We can now **save and load the weather data**.

In [96]:
ftp_link_address = 'ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/ghcnd-stations.txt'
data_path = 'data/weather_ghcnd-stations.txt'

In [97]:
# Download weather data.
save_ftp_file(ftp_link_address, data_path)

In [98]:
# Load data.
stationlist = get_data(data_path)
stationlist[:10]

['ACW00011604  17.1167  -61.7833   10.1    ST JOHNS COOLIDGE FLD',
 'ACW00011647  17.1333  -61.7833   19.2    ST JOHNS',
 'AE000041196  25.3330   55.5170   34.0    SHARJAH INTER. AIRP            GSN     41196',
 'AEM00041194  25.2550   55.3640   10.4    DUBAI INTL                             41194',
 'AEM00041217  24.4330   54.6510   26.8    ABU DHABI INTL                         41217',
 'AEM00041218  24.2620   55.6090  264.9    AL AIN INTL                            41218',
 'AF000040930  35.3170   69.0170 3366.0    NORTH-SALANG                   GSN     40930',
 'AFM00040938  34.2100   62.2280  977.2    HERAT                                  40938',
 'AFM00040948  34.5660   69.2120 1791.3    KABUL INTL                             40948',
 'AFM00040990  31.5000   65.8500 1010.0    KANDAHAR AIRPORT                       40990']

We can also **save the corresponding `readme.txt` file for reference**. This time, we do not need to load any data.

In [99]:
ftp_link_address = 'ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/readme.txt'
file_path = 'data/weather_ghcnd-readme.txt'

In [100]:
# Download readme file.
save_ftp_file(ftp_link_address, file_path)

According to the readme file, each line of the downloaded station list (`ghcnd-stations.txt`) has the following fixed-width format:

```
------------------------------
Variable   Columns   Type
------------------------------
ID            1-11   Character
LATITUDE     13-20   Real
LONGITUDE    22-30   Real
ELEVATION    32-37   Real
STATE        39-40   Character
NAME         42-71   Character
GSN FLAG     73-75   Character
HCN/CRN FLAG 77-79   Character
WMO ID       81-85   Character
------------------------------
```
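
Because the columns are fixed-width, a station line can be split by character position rather than by whitespace, which matters for names that contain spaces. The following is a minimal sketch of that idea, assuming the column ranges in the table above; the names `STATION_FIELDS`, `NUMERIC_FIELDS`, and `parse_station` are hypothetical helpers, not part of the notebook. The 1-based inclusive columns are converted to 0-based Python slices.

```python
# Column ranges from the readme table, converted from 1-based inclusive
# positions (e.g. 13-20) to 0-based Python slices (e.g. 12:20).
STATION_FIELDS = {
    'id':           (0, 11),
    'latitude':     (12, 20),
    'longitude':    (21, 30),
    'elevation':    (31, 37),
    'state':        (38, 40),
    'name':         (41, 71),
    'gsn_flag':     (72, 75),
    'hcn_crn_flag': (76, 79),
    'wmo_id':       (80, 85),
}
NUMERIC_FIELDS = ('latitude', 'longitude', 'elevation')


def parse_station(line):
    """Hypothetical helper: split one station line into a dict of fields."""
    record = {}
    for field_name, (start, end) in STATION_FIELDS.items():
        # Slicing past the end of a short (right-stripped) line yields '',
        # so missing trailing fields simply come back as empty strings.
        text = line[start:end].strip()
        if field_name in NUMERIC_FIELDS:
            record[field_name] = float(text)
        else:
            record[field_name] = text
    return record


# One of the lines shown in the station list output.
sample = 'ACW00011647  17.1333  -61.7833   19.2    ST JOHNS'
station = parse_station(sample)
print(station['id'], station['latitude'], station['longitude'], station['name'])
```

Applied to the whole list, `[parse_station(line) for line in stationlist]` would give one dictionary per station, which can then be filtered, for example on a non-empty `gsn_flag`.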