# ReadMe

**This notebook imports the data from the network drive (K:, I:, or whichever letter the drive is mapped to) and sends the raw and cleaned data to a PostgreSQL DB on powerplant.**

**Plan:**

1. Read the data in
2. Select down to the right TDR measurements
3. Send the raw data from each TDR to the DB, for examining surprising values
4. Clean out missing and implausibly high values
5. Average the values in each layer (7 layers in total)
6. Calculate the deficit
7. Upload the calculated deficit to the DB for real irrigation scheduling; this needs to be a separate table
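
Steps 4 and 5 have no code of their own in the cells below. A minimal sketch of what they could look like, assuming volumetric water content is stored as a fraction and that readings outside 0 to 0.6 indicate a sensor fault. The bounds, the function name `clean_vwc` and the toy column labels are illustrative assumptions, not values from the project.

```python
import numpy as np
import pandas as pd

# Hypothetical plausibility bounds for volumetric water content (fraction).
VWC_MIN, VWC_MAX = 0.0, 0.6

def clean_vwc(df, lower=VWC_MIN, upper=VWC_MAX):
    """Replace readings outside [lower, upper] with NaN; existing NaN stay NaN."""
    return df.where((df >= lower) & (df <= upper))

# Toy frame: two sensors in layer D1, one in layer D2 (labels are illustrative).
columns = pd.MultiIndex.from_tuples(
    [("D1", "S1"), ("D1", "S2"), ("D2", "S3")], names=["Depth", "Sensor"]
)
raw = pd.DataFrame(
    [[0.30, 0.32, 0.25], [7999.0, 0.34, np.nan]],
    index=pd.to_datetime(["2015-10-10", "2015-10-11"]),
    columns=columns,
)

cleaned = clean_vwc(raw)
# Average the sensors within each layer; NaN readings are skipped by mean().
layer_means = cleaned.T.groupby(level="Depth").mean().T
```

Masking with `where` keeps the shape of the frame, so the timestamps stay aligned and the bad readings simply drop out of the layer average.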

**PostgreSQL credentials**

    host = "database.powerplant.pfr.co.nz",
    database = "cflfcl_Rainshelter_SWC",
    user = "cflfcl_Rainshelter_SWC",
    password = "o654UkI6iGNwhzHu"
    
**Format that `sqlalchemy` expects**
    
    "postgresql://cflfcl_Rainshelter_SWC:o654UkI6iGNwhzHu@database.powerplant.pfr.co.nz/cflfcl_Rainshelter_SWC"
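
The URL follows the pattern `postgresql://user:password@host/database`. A small sketch of assembling it from its parts instead of typing it by hand, assuming the values are supplied through environment variables. The helper name `build_postgres_url`, the variable names and the placeholder defaults are all hypothetical.

```python
import os
from urllib.parse import quote_plus

def build_postgres_url(user, password, host, database):
    """Assemble a 'postgresql://user:password@host/database' URL.

    The user and password are percent-encoded so that characters such as
    '@' or '/' in a password do not break the URL.
    """
    return f"postgresql://{quote_plus(user)}:{quote_plus(password)}@{host}/{database}"

# The environment variable names and fallback values below are hypothetical.
url = build_postgres_url(
    user=os.environ.get("SWC_DB_USER", "example_user"),
    password=os.environ.get("SWC_DB_PASSWORD", "example_password"),
    host=os.environ.get("SWC_DB_HOST", "db.example.org"),
    database=os.environ.get("SWC_DB_NAME", "example_db"),
)
```

The resulting string can be passed to `create_engine` in the same way as the literal URL.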
    
**Demo data source**

    K:\Rainshelter\StonySoilLysimeters

### Libraries

In [None]:
import datetime
import pandas as pd
import numpy as np
import time 
import psycopg2
from sqlalchemy import create_engine

### Data source

In [None]:
path="K:/Rainshelter/StonySoilLysimeters/"

In [None]:
# Read in the main data
AllData=pd.read_csv(path + 'DownloadedData/StonyLysimetersCS650.dat', #specify file path for data to read in
                         parse_dates=True, #tell the function to parse date columns to datetime formats
                         dayfirst=True, #tell the function that the day comes before the month in the data, i.e. format='%d/%m/%Y %H:%M'
                         skiprows = [0,2,3], #leave out rows 1, 3 and 4 which have redundant information
                         index_col = 0, #Use the first column, which is Date, as an index
                         na_values = 'NAN')
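
The effect of `skiprows = [0,2,3]` is easiest to see on a small example. The sketch below reads an in-memory file laid out with four header lines (logger info, column names, units, processing type), which is the layout the call above appears to assume. The station and column names are invented.

```python
import io
import pandas as pd

# Four header lines followed by data; only line 2 (column names) is kept.
sample = io.StringIO(
    '"TOA5","StationName","CR1000"\n'
    '"TIMESTAMP","RECORD","VWC_1"\n'
    '"TS","RN","m^3/m^3"\n'
    '"","","Smp"\n'
    '"2015-10-10 00:00:00",0,0.31\n'
    '"2015-10-10 00:15:00",1,NAN\n'
)

demo = pd.read_csv(
    sample,
    skiprows=[0, 2, 3],   # drop the info, units and processing-type lines
    index_col=0,          # first column (TIMESTAMP) becomes the index
    parse_dates=True,     # parse the index as datetimes
    na_values="NAN",      # treat the text NAN as a missing value
)
```

Here `demo` has a datetime index named `TIMESTAMP`, the columns `RECORD` and `VWC_1`, and one missing value in `VWC_1`.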

In [None]:
AllData.head()

In [None]:
#The index for sensors
AllDataIndex = pd.read_excel(path + "Lysometer_design.xlsx",
                             sheet_name="SensorIndex",
                             index_col = 0)
AllDataIndex.head()

In [None]:
# Keep only the measurements we are interested in
FilteredIndex=AllDataIndex[AllDataIndex.Measurement.isin(['VolumetricWaterContent'])] # add more measurement types to the list if needed

In [None]:
FilteredIndex.head()


In [None]:
FilteredIndex.describe()

In [None]:
# Select only the columns of interest
FilteredData=AllData.loc[:,FilteredIndex.index]

In [None]:
# Set up the index (the code that saved the last row is commented out below)
FilteredDataTrans=FilteredData.transpose() # transpose so that the rows match the rows of the sensor index
FilteredDataIndexed=pd.concat([FilteredIndex,FilteredDataTrans], axis=1) # join them together

FilteredDataIndexed.index.name='ColumnHeader'
FilteredDataIndexed.set_index(['Measurement','Depth','Gravels','Stones','Column','Sensor', 'MUX', 'Port','Units','Summary','Block','Treatment'], 
                        append=False, inplace=True)
FilteredDataIndexed.sort_index(inplace=True)
# FieldData=FilteredDataIndexed.transpose()
# FieldData.index = pd.to_datetime(FieldData.index) 
# LastRow = FieldData.index.size
# np.save('LastRow.npy',LastRow)

In [None]:
FilteredDataIndexed.head()

In [None]:
FilteredDataIndexed

In [None]:
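# Last step before uploading the raw data
# The sensor metadata must be on the columns for a column-wise groupby, so transpose first
grouped=FilteredDataIndexed.transpose().groupby(level='Measurement',axis=1).mean().round(2)
# Note: grouping by 'Measurement' averages all sensors that share a measurement type into one column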

In [None]:
grouped.head()

### DB connection and uploading 

In [None]:
# FieldData.dtypes
# FieldData.index
engine = create_engine("postgresql://cflfcl_Rainshelter_SWC:o654UkI6iGNwhzHu@database.powerplant.pfr.co.nz/cflfcl_Rainshelter_SWC")
grouped.to_sql(name="RawData_96Sensors",con=engine,if_exists='replace' )
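
With `if_exists='replace'` the table is dropped and rewritten on every run, whereas `'append'` adds rows to whatever is already there. A sketch of the difference, using an in-memory SQLite database as a stand-in for the PostgreSQL server so that it runs without credentials. The table name `demo_raw` and the values are invented.

```python
import sqlite3
import pandas as pd

frame = pd.DataFrame(
    {"VolumetricWaterContent": [0.31, 0.29]},
    index=pd.Index(["2015-10-10", "2015-10-11"], name="TIMESTAMP"),
)

con = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL engine

# Writing twice with 'replace' leaves only the rows from the last write.
frame.to_sql(name="demo_raw", con=con, if_exists="replace")
frame.to_sql(name="demo_raw", con=con, if_exists="replace")
after_replace = pd.read_sql_query("SELECT COUNT(*) AS n FROM demo_raw", con)["n"][0]

# Writing again with 'append' adds the rows to the existing table.
frame.to_sql(name="demo_raw", con=con, if_exists="append")
after_append = pd.read_sql_query("SELECT COUNT(*) AS n FROM demo_raw", con)["n"][0]
con.close()
```

For a table that is rebuilt from the full logger file each time, `'replace'` avoids duplicate rows; `'append'` would only suit a workflow that uploads new records alone.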

In [None]:
# Placeholder for reading the index from a `.csv` file
# AllDataIndex=pd.read_csv('./IndexFiles/SoilWaterAndTempIndex.csv',
#                          index_col = 0)
# AllDataIndex

In [None]:
grouped.tail()

### Calculate Deficit 

In [None]:
# FieldData (timestamps as rows) is only defined in commented-out code above, so build it here
FieldData = FilteredDataIndexed.transpose()
FieldData.index = pd.to_datetime(FieldData.index)
DataMeans =  FieldData.loc['2015-10-10':].groupby(level=['Measurement','Depth','Gravels','Stones'],axis=1).mean()
DataMeans =  DataMeans.dropna(axis=1) # The groupby seems to keep invalid combinations, so drop every column that contains NaN
# Sum the water content of the four layers, each multiplied by 150
ProfileWater = DataMeans.VolumetricWaterContent.loc[:, 'D1'] * 150 + \
               DataMeans.VolumetricWaterContent.loc[:, 'D2'] * 150 + \
               DataMeans.VolumetricWaterContent.loc[:, 'D3'] * 150 + \
               DataMeans.VolumetricWaterContent.loc[:, 'D4'] * 150 
FieldCapacity = ProfileWater.resample('D').max()
FieldCapacity = FieldCapacity.loc['2015-10-14'] +10 # I would have thought this would return a data frame with a single row, but it returns a Series indexed by the column MultiIndex
SoilWaterDeficit = -(FieldCapacity - ProfileWater)  
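
The arithmetic can be checked on a small example. The sketch below uses two layers and invented readings, and assumes the readings are fractions and that the factor of 150 is a layer thickness in millimetres, which the code above suggests but does not state.

```python
import pandas as pd

LAYER_THICKNESS_MM = 150  # assumed meaning of the "* 150" factor

# Invented readings for two layers, two per day over two days.
index = pd.to_datetime(
    ["2015-10-14 00:00", "2015-10-14 12:00", "2015-10-15 00:00", "2015-10-15 12:00"]
)
vwc = pd.DataFrame({"D1": [0.30, 0.32, 0.28, 0.26],
                    "D2": [0.34, 0.36, 0.33, 0.32]}, index=index)

# Water stored in the profile: sum of content * thickness over the layers.
profile_water = (vwc * LAYER_THICKNESS_MM).sum(axis=1)

# Reference level: daily maximum on the chosen day plus the same offset of 10.
daily_max = profile_water.resample("D").max()
field_capacity = daily_max.loc[pd.Timestamp("2015-10-14")] + 10

# Negative values mean the profile holds less water than the reference.
soil_water_deficit = -(field_capacity - profile_water)
```

With these numbers the profile holds 96, 102, 91.5 and 87, the reference is 102 + 10 = 112, and the deficit series is -16, -10, -20.5 and -25.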

In [None]:
SoilWaterDeficit.transpose()

In [None]:
# Upload the calculated deficit
SoilWaterDeficit.to_sql(name="SoilWaterDeficit",con=engine, if_exists ='replace')