# Extract all Daily Rainfall

An archive of high-quality rainfall data from the Bureau of Meteorology (BOM) has been downloaded from here:

http://www.bom.gov.au/climate/change/hqsites/about-hq-site-data.shtml 

The archive contains one file listing the stations, plus one compressed file per station holding that station’s data.

This notebook decompresses every station file from the archive into the working folder. The next version will try to load all the data.

Save the final file into the ./data_files/ folder for future processing.


In [17]:
import pandas as pd
import os
from pathlib import Path
from unlzw import unlzw

sourceFolder = "./data_files_raw/Daily_Rainfall/extracted/"
workingSubFolder = "./data_files_raw/Daily_Rainfall/working/"
stationsFile = "HQDR_stations.txt"

# exist_ok=True makes this a no-op when the folder already exists
Path(workingSubFolder).mkdir(parents=True, exist_ok=True)

First, load the stations file into a dataframe. The fields are separated only by spaces, but station names can themselves contain spaces, so the file has to be parsed manually.

In [18]:
lstStationId = []
lstLatitude = []
lstLongitude = []
lstElevationMetres = []
lstStationName = []

with open(sourceFolder + stationsFile, "r") as station_file:
  for line in station_file:
    # The first four fields are id, latitude, longitude and elevation.
    # maxsplit=4 keeps the rest of the line (the station name) in one token.
    tokens = line.strip().split(maxsplit=4)
    if not tokens:
      continue  # skip blank lines

    # Set the name per line so a missing name never reuses the previous station's name
    stationName = tokens[4] if len(tokens) > 4 else ""

    lstStationId.append(tokens[0])
    lstLatitude.append(float(tokens[1]))
    lstLongitude.append(float(tokens[2]))
    lstElevationMetres.append(float(tokens[3]))
    lstStationName.append(stationName)
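
The parsing rule above can be checked in isolation on a single line. The following is a minimal sketch, assuming the field order used by the cell above (id, latitude, longitude, elevation, name). The helper name `parse_station_line` is hypothetical, and the sample line is assembled from values in the station table, so the real file's spacing may differ.

```python
def parse_station_line(line):
    """Split one station line into (id, latitude, longitude, elevation, name)."""
    # maxsplit=4 stops splitting after the fourth field, so a multi-word
    # station name stays together as the fifth token.
    tokens = line.split(maxsplit=4)
    if len(tokens) < 4:
        raise ValueError(f"Expected at least 4 fields, got: {line!r}")
    # The last token keeps any trailing whitespace or newline, so strip it.
    name = tokens[4].strip() if len(tokens) > 4 else ""
    return tokens[0], float(tokens[1]), float(tokens[2]), float(tokens[3]), name


# Hypothetical sample line with uneven spacing and a three-word station name.
sample = "8088   -29.19  115.44  153.0 MINGENEW POST OFFICE\n"
record = parse_station_line(sample)
print(record)
```

Splitting on runs of whitespace rather than on a single space means uneven column padding does not produce empty tokens.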




In [19]:
dfStations = pd.DataFrame(
  { 
    "StationId" : lstStationId,
    "Latitude" : lstLatitude,
    "Longitude" : lstLongitude,
    "ElevationMs" : lstElevationMetres, 
    "StationName" : lstStationName
  }  
)

dfStations.head(10)


   StationId  Latitude  Longitude  ElevationMs  StationName
0  4035       -20.78    117.15     12.0         ROEBOURNE
1  5008       -21.19    115.98     11.0         MARDIE
2  6055       -27.75    115.83     300.0        WOOLGORONG
3  7007       -26.98    116.54     300.0        BOOLARDY
4  7057       -28.06    117.84     426.0        MOUNT MAGNET
5  7095       -28.23    117.65     400.0        YOWERAGABBIE
6  8066       -30.7     117.06     310.0        KOKARDINE
7  8079       -29.02    115.62     260.0        MANARRA
8  8088       -29.19    115.44     153.0        MINGENEW POST OFFICE
9  8106       -29.37    116.4      280.0        PERANGERY
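
The introduction calls for saving the final file into the ./data_files/ folder. One possible way to do that for the stations table is sketched below. The helper `save_stations` and the file name `HQDR_stations.csv` are assumptions, not names used by the notebook, and the two sample rows are copied from the table above in place of the full `dfStations`.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_stations(df, output_folder="./data_files/", file_name="HQDR_stations.csv"):
    """Write the stations table to CSV and return the path that was written.

    file_name is a hypothetical choice; the notebook does not name the output file.
    """
    folder = Path(output_folder)
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / file_name
    # index=False keeps the row index out of the file, so no extra
    # unnamed column appears when the file is read back.
    df.to_csv(target, index=False)
    return target


# Two rows copied from the table above, standing in for the full dfStations.
dfSample = pd.DataFrame(
    {
        "StationId": ["4035", "5008"],
        "Latitude": [-20.78, -21.19],
        "Longitude": [117.15, 115.98],
        "ElevationMs": [12.0, 11.0],
        "StationName": ["ROEBOURNE", "MARDIE"],
    }
)

# Written to a temporary folder here so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    saved_path = save_stations(dfSample, output_folder=tmp)
    # Read StationId back as text so it matches the strings stored above.
    dfCheck = pd.read_csv(saved_path, dtype={"StationId": str})

print(dfCheck)
```

In the notebook itself the call would presumably be `save_stations(dfStations)`, which writes to the default ./data_files/ folder.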


Loop through each file in the extracted folder. Every file that ends in .Z (the Unix compress format, which unlzw decodes) is decompressed into its own subfolder of the working folder.

In [20]:
# For testing, set this to a small number to process just a few files. Otherwise, leave it at 9999 (effectively unlimited)
maxFiles = 9999
stepper = 0

for filename in os.listdir(sourceFolder):
  if os.path.isfile(sourceFolder + filename) and filename.lower().endswith(".z"):
    # Stop once maxFiles files have been processed
    if stepper >= maxFiles:
      break

    # This is one of the compressed .Z files; extract it to a subfolder of the working folder, named after the file
    Path(workingSubFolder + filename).mkdir(parents=True, exist_ok=True)

    # Read the compressed bytes and decompress them in memory
    with open(sourceFolder + filename, "rb") as fh:
      uncompressed_data = unlzw(fh.read())

    # Write the decompressed data; the with-blocks close both files even if an error occurs
    with open(workingSubFolder + filename + "/station_rainfall.txt", "wb") as fw:
      fw.write(uncompressed_data)

    stepper += 1
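
The loop above can also be written as a reusable function, which makes the file limit and the folder layout easy to test without real BOM data. This is a minimal sketch, not the notebook's own code: the name `extract_all` and its parameters are hypothetical, and the decompressor is passed in as an argument so that the demo can use a stand-in (byte reversal) in place of `unlzw`.

```python
import tempfile
from pathlib import Path


def extract_all(source_dir, working_dir, decompress, max_files=None):
    """Decompress each .Z file into working_dir/<file name>/station_rainfall.txt.

    decompress is any function mapping compressed bytes to decompressed bytes.
    Returns the list of files written.
    """
    written = []
    for path in sorted(Path(source_dir).iterdir()):
        # Only regular files with a .Z extension (any case) are processed.
        if not (path.is_file() and path.suffix.lower() == ".z"):
            continue
        if max_files is not None and len(written) >= max_files:
            break
        target_dir = Path(working_dir) / path.name
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / "station_rainfall.txt"
        target.write_bytes(decompress(path.read_bytes()))
        written.append(target)
    return written


# Demo on made-up files; reversing the bytes stands in for real decompression.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "extracted"
    working = Path(tmp) / "working"
    source.mkdir()
    (source / "a.Z").write_bytes(b"cba")
    (source / "b.Z").write_bytes(b"fed")
    (source / "notes.txt").write_bytes(b"ignored")

    results = extract_all(source, working, decompress=lambda data: data[::-1], max_files=1)
    extracted_names = [p.parent.name for p in results]
    extracted_text = results[0].read_bytes()

print(extracted_names, extracted_text)
```

With the notebook's variables, the equivalent call would be `extract_all(sourceFolder, workingSubFolder, decompress=unlzw)`.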
        

           

