# Prediction of Solar Power

Data provided open source at:
https://www.kaggle.com/datasets/dilipkola/shell-ai-solar-irradiance-prediction-hackathon?group=bookmarked

##### Goals of analysis:
* Predict Global Irradiance for the next 2 hours at 10-minute intervals, given at least 2 hours of weather data

##### Data:
* Data is recorded every minute, so at least 120 observations (2 hours) are used as input to predict 12 data points (the next 2 hours in 10-minute increments)
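
The window arithmetic above can be sketched as follows. This is a minimal illustration, not the notebook's pipeline: `make_example` and the constants are hypothetical names, and it assumes each target is the instantaneous value at the 10-minute mark (the hackathon may define targets differently, for example as averages).

```python
import numpy as np

N_INPUT = 120   # minutes of history fed to the model (one row per minute)
HORIZON = 120   # minutes to forecast
STEP = 10       # forecast resolution in minutes

def make_example(values, start):
    """Return (inputs, targets) for the window beginning at row `start`.

    inputs  : rows start .. start+119 (120 one-minute observations)
    targets : the values 10, 20, ..., 120 minutes after the last input row
    """
    values = np.asarray(values)
    last = start + N_INPUT - 1                      # index of the last input row
    if last + HORIZON >= len(values):
        raise ValueError("not enough rows after `start` for a full window")
    inputs = values[start:start + N_INPUT]
    target_idx = last + np.arange(STEP, HORIZON + STEP, STEP)
    return inputs, values[target_idx]

# Toy series where each value equals its own row number
series = np.arange(300)
x, y = make_example(series, start=0)
```

Under these assumptions one complete example spans 240 consecutive rows: 120 for the input and 120 for the forecast horizon.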

### Housekeeping

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import datetime as dt
import os
import sys
from Settings import columns, experiments
import Pipeline
from sklearn.model_selection import train_test_split  # used in the split experiments further down
import random
import tensorflow as tf

In [2]:
os.getcwd() in sys.path


True

In [3]:
train = pd.read_csv('./archive/train.csv')
test = pd.read_csv('./archive/test.csv')

### Data Exploration
##### Thoughts while data cleaning
* Wet-bulb temperature reflects air temperature and humidity together. The heat-stress measure for direct sunlight that also mixes in wind speed, sun angle, and cloud cover is the related wet-bulb globe temperature (WBGT).
    * (Would be interesting to look at the rate of change dWB/dt)
* It would be interesting to remove the yearly cycle, but I am not sure I have enough data.
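
One way the dWB/dt idea could be explored is with a simple finite difference. The sketch below assumes the wet-bulb values and their timestamps (in minutes) are available as plain arrays; `rate_of_change` is a hypothetical helper and the numbers are made up for illustration.

```python
import numpy as np

def rate_of_change(values, minutes):
    """Finite-difference derivative d(values)/dt, in units per minute.

    Returns an array one element shorter than the input; element i is the
    slope between sample i and sample i+1.
    """
    values = np.asarray(values, dtype=float)
    minutes = np.asarray(minutes, dtype=float)
    return np.diff(values) / np.diff(minutes)

# Illustrative wet-bulb readings [deg C]; note the 2-minute gap at the end
wet_bulb = [10.0, 10.5, 11.5, 11.5, 11.0]
t = [0, 1, 2, 3, 5]
dwb_dt = rate_of_change(wet_bulb, t)
```

Dividing by the actual time step rather than assuming exactly one minute keeps the slope meaningful if rows were dropped during cleaning.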

In [4]:
pipeline = Pipeline.DataClean(columns, experiments['all'])
pipeline.clean_data()
pipeline.norm()
pipeline.split_label()
# pipeline.train_val()
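
The implementation of `pipeline.norm()` lives in the local `Pipeline` module and is not shown here. The sampled values further down all fall in the range 0 to 1, which would be consistent with per-column min-max scaling, so the sketch below illustrates that technique as an assumption. `min_max_scale` is a hypothetical name, not the module's API.

```python
import numpy as np

def min_max_scale(x):
    """Scale each column of `x` to the range [0, 1].

    Constant columns (max == min) are mapped to 0 to avoid dividing by zero.
    """
    x = np.asarray(x, dtype=float)
    lo = x.min(axis=0)
    span = x.max(axis=0) - lo
    span = np.where(span == 0, 1.0, span)
    return (x - lo) / span

# Two toy columns: one varying, one constant
raw = np.array([[0.0, 5.0],
                [5.0, 5.0],
                [10.0, 5.0]])
scaled = min_max_scale(raw)
```

If scaling of this kind is used, computing the minimum and maximum on the training portion only and reusing them for validation and test data avoids leaking information across the split.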

In [8]:
pipeline.data.sample(n=120, replace=False, random_state=26)

Unnamed: 0,Global CMP22 (vent/cor) [W/m^2],Tower Wet Bulb Temp [deg C],Direct sNIP [W/m^2],Azimuth Angle [degrees],Tower Dew Point Temp [deg C],Tower RH [%],Total Cloud Cover [%],Peak Wind Speed @ 6ft [m/s],Avg Wind Direction @ 6ft [deg from N],Station Pressure [mBar],Precipitation (Accumulated) [mm],Snow Depth [cm],Moisture,Albedo (CMP11)
25943,0.000000,0.000000,0.000375,0.181309,0.000000,0.419648,0.00,0.080579,0.800833,0.554190,0.000000,0.006542,0.0,0.00000
88496,0.546643,0.068140,0.947409,0.540655,0.000000,0.286370,0.13,0.049587,0.371944,0.204487,0.108123,0.032149,0.0,0.09220
220023,0.000000,0.685414,0.000000,0.883766,0.373636,0.303264,0.00,0.121901,0.790278,0.446640,0.000000,0.033943,0.0,0.00000
430742,0.000000,0.000000,0.000000,0.258891,0.000000,0.857128,0.00,0.090909,0.052111,0.475766,0.000000,0.414351,0.0,0.00000
23159,0.000000,0.000000,0.000000,0.239548,0.000000,0.489728,0.00,0.059917,0.040389,0.255123,0.000000,0.003869,0.0,0.00000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
205513,0.000793,0.396477,0.000000,0.826922,0.000000,0.088539,0.00,0.000000,0.000000,0.251089,0.000000,0.024974,0.0,0.00000
380786,0.636056,0.677740,0.755484,0.535183,0.078023,0.137971,0.67,0.163223,0.270833,0.482816,0.000000,0.014351,0.0,0.09675
499577,0.000000,0.000000,0.000203,0.048154,0.000000,0.935343,0.00,0.163223,0.085278,0.429541,0.000000,0.127365,1.0,0.00000
135534,0.000000,0.000000,0.000139,0.205867,0.000000,0.690583,0.00,0.059917,0.836944,0.339316,0.000000,0.031551,0.0,0.00000


In [None]:
# sns.heatmap(train.corr())

In [None]:
pipeline.data.columns.shape

In [None]:
pipeline.data

In [None]:
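# train_test_split returns [train, validation]; the two parts differ in length, so inspect each shape separately
[part.shape for part in train_test_split(pipeline.data, test_size=0.3, train_size=0.7)]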

In [None]:
train_test_split([1,2,3,3,4,5,6,7,5,6,4], test_size=0.3, train_size=0.7)

`train_test_split` shuffles individual rows, which breaks the time ordering. I need to split train/val by grabbing chunks of 120 consecutive samples instead.

In [None]:
len(pipeline.data)

In [None]:
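startidx = []
samples = []
numsamples = 120
# Draw start positions (not index labels) so every window has exactly numsamples rows.
# Windows drawn this way can still overlap one another.
last_start = len(pipeline.data) - numsamples
for i in range(60):
    startidx.append(random.randint(0, last_start))  # randint includes both endpoints
    samples.append(pipeline.data.iloc[startidx[i]:startidx[i] + numsamples])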

In [None]:
np.asarray(samples).shape

In [None]:
60*0.7

In [None]:
np.shape(samples[:42])

In [None]:
np.shape(samples[42:])

In [None]:
42+18
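
The 42/18 split explored in the cells above could be wrapped up as in the sketch below. It is one possible approach rather than the notebook's final method: `split_windows` is a hypothetical helper, and it places windows on a fixed grid of 120-row blocks so that no two windows overlap and no row appears in both training and validation sets.

```python
import numpy as np

def split_windows(n_rows, window=120, n_windows=60, train_frac=0.7, seed=26):
    """Pick non-overlapping contiguous windows and split them into train/val.

    Returns two arrays of start positions, intended for positional slicing
    such as data.iloc[start:start + window].
    """
    n_blocks = n_rows // window                 # number of whole blocks available
    if n_windows > n_blocks:
        raise ValueError("not enough rows for that many non-overlapping windows")
    rng = np.random.default_rng(seed)
    blocks = rng.choice(n_blocks, size=n_windows, replace=False)
    starts = blocks * window
    n_train = int(round(n_windows * train_frac))
    return starts[:n_train], starts[n_train:]

# Illustrative row count; in the notebook this would be len(pipeline.data)
train_starts, val_starts = split_windows(n_rows=100_000)
```

A window chosen this way is contiguous in row position only. If the cleaning step dropped rows, a window may still span a gap in time, so checking the timestamps inside each window would be a sensible extra step.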