# Converting Radiation Surfrad .dat Files to NetCDF4

All of the examples use data from `Bondville_IL/` for the year 2016. These files are named `bon16###`, where `###` is the day of the year (often called the Julian day).
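
The naming convention can be decoded mechanically. The sketch below is a hypothetical helper, assuming a three-letter station code, a two-digit year in the 2000s, and a three-digit day of year; it turns the day of year back into a calendar date.

```python
import re
from datetime import date, timedelta

# Hypothetical pattern: 3-letter station code, 2-digit year, 3-digit day of year,
# with an optional ".dat" extension (e.g. "bon16001.dat").
SURFRAD_NAME = re.compile(r"^([a-z]{3})(\d{2})(\d{3})(?:\.dat)?$")


def parse_surfrad_name(name):
    """Split a daily file name into (station, year, day_of_year, date)."""
    match = SURFRAD_NAME.match(name)
    if match is None:
        raise ValueError("not a Surfrad daily file name: %r" % name)
    station, yy, ddd = match.groups()
    year = 2000 + int(yy)  # assumption: every file is from the 2000s
    doy = int(ddd)
    day = date(year, 1, 1) + timedelta(days=doy - 1)  # day 1 is January 1
    return station, year, doy, day


print(parse_surfrad_name("bon16001.dat"))  # ('bon', 2016, 1, datetime.date(2016, 1, 1))
```

Because 2016 is a leap year, `parse_surfrad_name("bon16060")` lands on February 29, which is why the date is computed with `timedelta` rather than by hand.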

## Headers

The Surfrad radiation files have numerous headers, one per column:

    YEAR DDD MM DD HH mm hh.mmm ZNAGL dw_psp qc uw_psp qc direct qc diffuse qc dw_pir qc dwCasTmp qc dwDomTmp qc uw_pir qc uwCastmp qc uwDomtmp qc uvb qc par qc netSolar qc netIr qc totalNet qc temp qc rh qc windSp qc winsDir qc Baro qc
    
All of the whitespace-separated items above are headers. Note that the `qc` header appears many times: each `qc` column holds the quality-control flag for the header immediately before it. A `qc` value of 0 indicates a value within the expected range, 1 indicates a value outside the physically possible range, and 2 indicates a value that is physically possible but "should be used with scrutiny". Missing values are recorded as "-9999.9" and should always have a corresponding `qc` of "1".
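
The pairing of each `qc` column with the header before it, and the missing-value rule, can be captured in two small helpers. This is a minimal sketch: `unique_headers`, `clean_value`, and the `<header>_qc` naming scheme are hypothetical, and `clean_value` deliberately keeps values flagged 2 so the caller can decide how much scrutiny they need.

```python
MISSING = -9999.9  # missing-value marker used in the .dat files


def unique_headers(headers):
    """Rename each 'qc' to '<preceding header>_qc' so every column name is unique.

    Assumes every 'qc' follows a measurement header, as in the Surfrad layout.
    """
    renamed = []
    for name in headers:
        if name == "qc":
            renamed.append(renamed[-1] + "_qc")
        else:
            renamed.append(name)
    return renamed


def clean_value(value, qc):
    """Return value as a float, or None if it is missing or flagged with qc == 1.

    Values flagged 2 (physically possible, use with scrutiny) are kept.
    """
    value = float(value)
    if int(qc) == 1 or value == MISSING:
        return None
    return value
```

For example, `unique_headers(["dw_psp", "qc", "uw_psp", "qc"])` gives `["dw_psp", "dw_psp_qc", "uw_psp", "uw_psp_qc"]`, which avoids the duplicate-name problem most table libraries have with repeated `qc` columns.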

## Converting to .csv

The .dat files are plain-text files delimited by whitespace. They have no header row, only two lines of minimal site information followed by the raw data. Here are the first several rows of the file `bon16001.dat`:

     Bondville
       40.05  -88.37  213 m version 1
     2016   1  1  1  0  0  0.000 105.13    -3.4 0     0.0 0     0.8 0    -0.1 0   275.8 0   272.5 0   272.2 0   305.1 0   271.0 0   271.0 0     0.0 0     0.2 0     0.0 0   -29.3 0   -29.3 0    -2.2 0    76.5 0     6.8 0   270.0 0  1001.5 0
     2016   1  1  1  0  1  0.017 105.32    -3.6 0     0.0 0     0.8 0    -0.1 0   267.7 0   272.5 0   272.1 0   304.4 0   271.0 0   271.0 0     0.0 0     0.2 0     0.0 0   -36.7 0   -36.7 0    -2.2 0    76.1 0     6.4 0   268.2 0  1001.5 0
     2016   1  1  1  0  2  0.033 105.50    -3.6 0     0.0 0     0.7 0    -0.2 0   260.6 0   272.5 0   272.1 0   303.8 0   271.0 0   271.0 0     0.0 0     0.2 0     0.0 0   -43.1 0   -43.1 0    -2.2 0    77.3 0     6.0 0   272.1 0  1001.5 0
     2016   1  1  1  0  3  0.050 105.68    -3.7 0     0.0 0     0.4 0    -0.1 0   252.5 0   272.5 0   272.1 0   303.1 0   270.9 0   270.9 0     0.0 0     0.2 0     0.0 0   -50.7 0   -50.7 0    -2.2 0    76.7 0     6.6 0   273.9 0  1001.5 0
     2016   1  1  1  0  4  0.067 105.86    -3.8 0     0.0 0     0.4 0    -0.1 0   246.8 0   272.5 0   272.0 0   302.4 0   270.9 0   270.9 0     0.0 0     0.2 0     0.0 0   -55.6 0   -55.6 0    -2.2 0    77.0 0     6.0 0   278.4 0  1001.5 0
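
The two site lines carry the station name and location, and the first columns of each row carry the time, so both can be pulled out for later use (for example as NetCDF4 metadata and a time coordinate). This is a sketch assuming the second site line always begins with latitude, longitude, and elevation in metres, as in the sample above; `parse_site_lines` and `row_time` are hypothetical names. No time zone is stated in these lines, so `row_time` returns a naive `datetime`.

```python
from datetime import datetime, timedelta


def parse_site_lines(first, second):
    """Read the station name and location from the two lines that open a .dat file.

    Assumes the second line looks like '40.05  -88.37  213 m version 1'.
    """
    fields = second.split()
    return {
        "name": first.strip(),
        "latitude": float(fields[0]),
        "longitude": float(fields[1]),
        "elevation_m": float(fields[2]),
    }


def row_time(fields):
    """Build a datetime from the YEAR, DDD, HH and mm columns of one data row."""
    year, doy = int(fields[0]), int(fields[1])
    hour, minute = int(fields[4]), int(fields[5])
    return datetime(year, 1, 1) + timedelta(days=doy - 1, hours=hour, minutes=minute)
```

For the fourth sample row, `row_time(line.split())` gives `datetime(2016, 1, 1, 0, 3)`, which agrees with its `hh.mmm` value of 0.050 (3 minutes is 0.05 of an hour).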

Converting these files to .csv will roughly double the storage needed for the data. However, it seems to be a necessary step because of the format of the .dat files (no header row, and two lines of site information before the data). The first step is to write the headers into the new .csv file. I've made a file called `headers.txt` (and a matching `headers.dat`) that contains all the headers for the Surfrad .dat files, delimited by whitespace. What follows can be found in the file `dat_to_csv.py`.

```python
import sys
from os.path import basename
import os
```

```python
with open("headers.dat", "r") as header_file:
    headers = header_file.read().split()  # list of all the column headers

# Quote each header and join with commas. No space follows the comma, so CSV
# readers don't pick up a leading space in every column name.
headers = ",".join('"%s"' % h for h in headers)

print(headers)
```

```
"YEAR","DDD","MM","DD","HH","mm","hh.mmm","ZNAGL","dw_psp","qc","uw_psp","qc","direct","qc","diffuse","qc","dw_pir","qc","dwCasTmp","qc","dwDomTmp","qc","uw_pir","qc","uwCastmp","qc","uwDomtmp","qc","uvb","qc","par","qc","netSolar","qc","netIr","qc","totalNet","qc","temp","qc","rh","qc","windSp","qc","winsDir","qc","Baro","qc"
```


```python
input_path = "bon16001.dat"  # the example file; normally passed in on the command line (sys.argv[1])
base = os.path.splitext(basename(input_path))[0]  # e.g. "bon16001.dat" -> "bon16001"
out_name = base + ".csv"  # name of the file to be written

with open(input_path, "r") as input_file, open(out_name, "w") as output_file:
    output_file.write(headers + "\n")  # write the header row first
    for count, line in enumerate(input_file):
        if count < 2:  # skip the two site-information lines
            continue
        fields = line.split()
        # skip blank lines and a line holding the Ctrl-Z ('\x1a') end-of-file character
        if not fields or fields[0] == "\x1a":
            continue
        output_file.write(",".join(fields) + "\n")  # data row, comma-delimited
```
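
Since the input file is normally passed in on the command line, the same logic can be wrapped so it runs over many days at once. This is a minimal sketch, not the contents of `dat_to_csv.py`: the function names `dat_to_csv` and `main` and the argument order (headers file first, then .dat files) are hypothetical. It swaps the manual string joining for Python's `csv` module, which writes the header row unquoted because none of the names need quoting.

```python
import csv
import os
import sys


def dat_to_csv(dat_path, headers, out_dir="."):
    """Convert one Surfrad .dat file to .csv and return the path written.

    Skips the two site-information lines, blank lines, and any line that
    starts with the Ctrl-Z ('\\x1a') end-of-file character.
    """
    base = os.path.splitext(os.path.basename(dat_path))[0]
    out_path = os.path.join(out_dir, base + ".csv")
    with open(dat_path, "r") as src, open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst)  # handles delimiters and any needed quoting
        writer.writerow(headers)
        for count, line in enumerate(src):
            fields = line.split()
            if count < 2 or not fields or fields[0] == "\x1a":
                continue
            writer.writerow(fields)
    return out_path


def main(argv):
    """Hypothetical CLI: main([headers_file, dat_file, ...]) -> list of .csv paths."""
    if len(argv) < 2:
        print("usage: dat_to_csv.py HEADERS_FILE DAT_FILE [DAT_FILE ...]")
        return []
    with open(argv[0], "r") as header_file:
        headers = header_file.read().split()
    return [dat_to_csv(path, headers) for path in argv[1:]]


if __name__ == "__main__":
    for written in main(sys.argv[1:]):
        print(written)
```

Run as, for example, `python dat_to_csv.py headers.dat bon16001.dat bon16002.dat`; shell globbing (`bon16*.dat`) then converts a whole year in one call.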

Ther