Hello everyone!

The task is **to predict solar intensity in the near future**.
Here is some information on where to get data.
First of all, a lot of weather data is easily available here:
https://meteo.physic.ut.ee/ (the data is easy to access, and there is 20 years' worth of it
at 5-minute intervals). Just press the "andmepäring" (data query) link and you can query the data you need.
"Kiirgusvoog" (radiation flux) should be the ground truth that you want to predict.

I suggest using this as the baseline dataset: build some quick first models
to see whether the solar intensity ("Kiirgusvoog") can be predicted from the other
available values. The models probably won't be very good, but at least you get a baseline,
and they may pick up trends in how solar intensity changes over time.
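
A minimal sketch of such a baseline, assuming the column names produced by the notebook below (`radiation_flux` for the current reading, `rad_flux_infuture` for the next one). It compares a persistence forecast (the next value equals the current one) with an ordinary least-squares linear model written with numpy, and it splits the data by time rather than at random so that the test period comes after the training period. `baseline_scores` is a hypothetical helper, not part of the original notebook.

```python
import numpy as np
import pandas as pd


def baseline_scores(df: pd.DataFrame, features, target, current="radiation_flux", train_frac=0.8):
    """Mean absolute error of a persistence forecast and a linear model on a time-ordered split.

    Assumes `df` is sorted by time, `target` holds the next reading and `current`
    holds the present reading of the radiation flux.
    """
    split = int(len(df) * train_frac)  # first part trains, last part tests; no shuffling
    train, test = df.iloc[:split], df.iloc[split:]
    y_test = test[target].to_numpy(dtype=float)

    # Persistence forecast: the next value equals the current one
    persistence_mae = float(np.mean(np.abs(y_test - test[current].to_numpy(dtype=float))))

    # Ordinary least squares with an intercept column
    X_train = np.column_stack([np.ones(len(train)), train[features].to_numpy(dtype=float)])
    X_test = np.column_stack([np.ones(len(test)), test[features].to_numpy(dtype=float)])
    coef, *_ = np.linalg.lstsq(X_train, train[target].to_numpy(dtype=float), rcond=None)
    linear_mae = float(np.mean(np.abs(X_test @ coef - y_test)))

    return {"persistence_mae": persistence_mae, "linear_mae": linear_mae}
```

Usage on the prepared table could look like `baseline_scores(data, ["hour", "temperature", "humidity", "radiation_flux"], "rad_flux_infuture")`. At a 5-minute horizon, persistence is often hard to beat, so it is a useful reference before trying anything more elaborate.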

The second, harder step is to get additional data from satellite images.
Satellite information about clouds is available here:
https://view.eumetsat.int/productviewer?v=default but I haven't had time to look into
how easy it is to download the pictures automatically.
Manually, they can be downloaded at 15-minute intervals.
You will need to spend some time getting used to satellite images and learning how to extract features from them.
If you find other, more easily accessible data sources with useful information
about clouds, let me know.
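
The station data comes at 5-minute intervals while the satellite images come at 15-minute intervals, so the two have to be aligned before satellite features can be used. A minimal sketch with pandas `merge_asof`, assuming both tables have a datetime column `time` and that a per-image feature (here a hypothetical `cloud_fraction`) has already been extracted from each image; the helper name is illustrative.

```python
import pandas as pd


def attach_satellite_feature(station: pd.DataFrame, satellite: pd.DataFrame,
                             tolerance: str = "15min") -> pd.DataFrame:
    """Attach the most recent satellite-derived value to every station row.

    Both frames need a datetime column "time"; `satellite` has one row per image.
    """
    # merge_asof requires both inputs to be sorted on the key
    station = station.sort_values("time")
    satellite = satellite.sort_values("time")
    # direction="backward": only use images taken at or before the station timestamp,
    # so the feature never comes from the future; older images beyond `tolerance` give NaN
    return pd.merge_asof(station, satellite, on="time", direction="backward",
                         tolerance=pd.Timedelta(tolerance))
```

Looking backward in time matters for forecasting: a feature taken from an image recorded after the prediction time would not be available in practice.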

To work with satellite images, the rasterio package in Python is very useful.
The GeoTIFF format should include geographical information in the images,
so it should be possible to map individual pixels to geographic locations,
but I haven't had time to confirm this for the EUMETSAT data.
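
A GeoTIFF stores an affine transform that maps pixel (row, column) positions to map coordinates; finding the pixel above a location is the inverse of that mapping. In rasterio the transform is available as `dataset.transform`, and `dataset.index(x, y)` performs this lookup for you. The sketch below only shows the underlying arithmetic for a north-up raster, using the GDAL-style geotransform ordering (an `affine.Affine` can be converted with `.to_gdal()`); `pixel_index` is a hypothetical helper. If the EUMETSAT file is not in longitude/latitude, the station coordinates would first have to be reprojected into the raster's CRS (for example with `rasterio.warp.transform`).

```python
import math


def pixel_index(x, y, geotransform):
    """Return (row, col) of the pixel containing map coordinate (x, y).

    `geotransform` is GDAL-style: (x_origin, pixel_width, row_rotation,
    y_origin, column_rotation, pixel_height). For a north-up raster the
    rotations are 0 and pixel_height is negative, because rows increase southwards.
    (x, y) must be in the same coordinate reference system as the raster.
    """
    x0, dx, rot1, y0, rot2, dy = geotransform
    if rot1 != 0 or rot2 != 0:
        raise ValueError("rotated rasters are not handled in this sketch")
    col = math.floor((x - x0) / dx)
    row = math.floor((y - y0) / dy)
    return row, col
```

For example, with an assumed 0.5-degree grid whose top-left corner is at 20°E, 60°N, `pixel_index(26.7, 58.4, (20.0, 0.5, 0.0, 60.0, 0.0, -0.5))` returns `(3, 13)`.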

Best regards,
Ott Kekišev

In [1]:
import urllib
import requests
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier


In [2]:
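# Remove all spaces from the raw export and save a cleaned copy.
# This also removes the space between date and time in "Aeg"; the next cell restores it.
with open("kiirgusvoog_5a.csv", "r") as f:
    content = f.read()

content = content.replace(" ", "")

with open("kiirgusvoog_5a_2.csv", "w") as g:
    g.write(content)

data = pd.read_csv("kiirgusvoog_5a_2.csv")
# The air pressure header ("Õhurõhk") arrives HTML-escaped, so rename it
data = data.rename(columns={'&Otilde;hur&otilde;hk': 'Õhurõhk'})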
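
The rename above fixes a single HTML-escaped header by hand. If other headers arrive with HTML entities as well, a more general sketch is to decode every column name with Python's `html.unescape`; `unescape_columns` is a hypothetical helper, not part of the original notebook.

```python
import html

import pandas as pd


def unescape_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `df` with HTML entities such as "&Otilde;" decoded in all column names."""
    # DataFrame.rename accepts a function that is applied to every column label
    return df.rename(columns=html.unescape)
```

This assumes all column labels are strings, which is the case for a table read from a CSV header.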

In [3]:
# Restore the space between date and time that was stripped above ("YYYY-MM-DDHH:MM:SS")
kuupäev = data["Aeg"].str[0:10]  # date
kellaaeg = data["Aeg"].str[10:]  # time of day
data["Aeg"] = kuupäev + " " + kellaaeg

In [4]:
def break_up_time(data):
    """Split the "Aeg" timestamp ("YYYY-MM-DD HH:MM:SS") into integer columns, in place."""
    aeg = data["Aeg"]
    data["Aasta"] = aeg.str[0:4].astype(int)    # year
    data["Kuu"] = aeg.str[5:7].astype(int)      # month
    data["Päev"] = aeg.str[8:10].astype(int)    # day
    data["Tund"] = aeg.str[11:13].astype(int)   # hour
    data["Minut"] = aeg.str[14:16].astype(int)  # minute

In [5]:
break_up_time(data)
data.head()

,Aeg,Temperatuur,Niiskus,Õhurõhk,Tuulekiirus,Tuulesuund,Sademed,UVindeks,Valgustatus,Kiirgusvoog,Radioaktiivsus,Sadanudlumi,Aasta,Kuu,Päev,Tund,Minut
0,2016-11-01 00:00:00,-0.591946,97.281109,1022.204033,1.249012,240.735204,0.0,,,0.0,,0.0,2016,11,1,0,0
1,2016-11-01 00:05:00,-0.597687,97.325147,1022.262033,0.866778,246.686019,0.0,,,0.0,,0.0,2016,11,1,0,5
2,2016-11-01 00:10:00,-0.580607,97.395573,1022.359833,1.239094,282.940579,0.0,,,0.0,,0.0,2016,11,1,0,10
3,2016-11-01 00:15:00,-0.558781,97.469255,1022.406333,0.883887,284.070831,0.0,,,0.0,,0.0,2016,11,1,0,15
4,2016-11-01 00:20:00,-0.578705,97.496932,1022.374467,1.592511,281.055331,0.0,,,0.0,,0.0,2016,11,1,0,20
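
`break_up_time` extracts the fields by string position, which relies on the exact "YYYY-MM-DD HH:MM:SS" layout. As an alternative sketch (not the approach used in this notebook), the timestamp can be parsed once with `pd.to_datetime` and the fields read from the `.dt` accessor; a timestamp that does not match the format then raises an error instead of possibly being sliced into wrong values. The name `break_up_time_dt` is hypothetical.

```python
import pandas as pd


def break_up_time_dt(data: pd.DataFrame) -> None:
    """Add integer Aasta/Kuu/Päev/Tund/Minut columns by parsing "Aeg" once (in place)."""
    # Raises ValueError if a timestamp does not match the expected layout
    t = pd.to_datetime(data["Aeg"], format="%Y-%m-%d %H:%M:%S")
    fields = {"Aasta": "year", "Kuu": "month", "Päev": "day", "Tund": "hour", "Minut": "minute"}
    for column, attribute in fields.items():
        data[column] = getattr(t.dt, attribute).astype(int)
```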


In [6]:
# Keep the time features, the weather measurements and the current radiation flux
# (.copy() avoids pandas' SettingWithCopyWarning on the assignments below)
data = data[["Aasta","Kuu","Päev","Tund","Minut","Temperatuur","Niiskus","Õhurõhk","Tuulekiirus","Tuulesuund","Sademed","Sadanudlumi","Kiirgusvoog"]].copy()
# Treat missing precipitation and snowfall as zero
data[["Sadanudlumi", "Sademed"]] = data[["Sadanudlumi", "Sademed"]].fillna(0)
# Fill the other gaps with the next valid reading (backfill, i.e. a value from the future);
# .bfill() replaces the deprecated fillna(method='backfill')
backfilled = ["Tuulekiirus", "Tuulesuund", "Temperatuur", "Niiskus", "Õhurõhk", "Kiirgusvoog"]
data[backfilled] = data[backfilled].bfill()
# Prediction target: the radiation flux of the next row (5 minutes later if there are no gaps)
data["KiirgusvoogTulevikus"] = data["Kiirgusvoog"].shift(-1)
# Drop rows that are still incomplete (e.g. the last row, which has no next value)
data = data.dropna()
data.isnull().sum()

Aasta                   0
Kuu                     0
Päev                    0
Tund                    0
Minut                   0
Temperatuur             0
Niiskus                 0
Õhurõhk                 0
Tuulekiirus             0
Tuulesuund              0
Sademed                 0
Sadanudlumi             0
Kiirgusvoog             0
KiirgusvoogTulevikus    0
dtype: int64
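
`shift(-1)` pairs each row with the next row, which is 5 minutes ahead only if the record has no gaps. A hedged sketch that defines the target by timestamp instead: it looks up the reading exactly `horizon` after each timestamp and leaves NaN where that reading is missing, so the following `dropna()` removes those rows. `future_target` is a hypothetical helper; it needs the full "Aeg" timestamp, so it would have to run before the column selection drops "Aeg" (and its result be added to that selection), and it assumes the timestamps are unique.

```python
import pandas as pd


def future_target(times: pd.Series, values: pd.Series, horizon: str = "5min") -> pd.Series:
    """Value of `values` exactly `horizon` after each timestamp, NaN if that reading is missing.

    Unlike shift(-1), a gap in the record gives NaN instead of silently pairing a row
    with a reading that lies further in the future.
    """
    # Index the readings by time; reindex below raises if timestamps are duplicated
    s = pd.Series(values.to_numpy(), index=pd.DatetimeIndex(pd.to_datetime(times)))
    ahead = s.reindex(s.index + pd.Timedelta(horizon))
    # Return the result aligned with the original row index
    return pd.Series(ahead.to_numpy(), index=times.index, name=values.name)
```

In the notebook this could be used as `data["KiirgusvoogTulevikus"] = future_target(data["Aeg"], data["Kiirgusvoog"])`.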

In [7]:
data = data.rename(columns = {
    'Aasta':'year',
    'Kuu':'month',
    'Päev':'day',
    'Tund':'hour',
    'Minut':'minute',
    'Temperatuur':'temperature',
    'Niiskus':'humidity',
    'Õhurõhk':'atmospheric_pressure',
    'Tuulekiirus':'wind_speed',
    'Tuulesuund':'wind_direction',
    'Sademed':'precipitation',
    'Sadanudlumi':'snow',
    'Kiirgusvoog':'radiation_flux',
    'KiirgusvoogTulevikus':'rad_flux_infuture'    
    })
data.columns

Index(['year', 'month', 'day', 'hour', 'minute', 'temperature', 'humidity',
       'atmospheric_pressure', 'wind_speed', 'wind_direction', 'precipitation',
       'snow', 'radiation_flux', 'rad_flux_infuture'],
      dtype='object')

In [8]:
data.describe()

,year,month,day,hour,minute,temperature,humidity,atmospheric_pressure,wind_speed,wind_direction,precipitation,snow,radiation_flux,rad_flux_infuture
count,271615.0,271615.0,271615.0,271615.0,271615.0,271615.0,271615.0,271615.0,271615.0,271615.0,271615.0,271615.0,271615.0,271615.0
mean,2017.639689,6.321267,15.712891,11.46447,27.510616,5.677152,78.483295,1012.612623,3.433309,195.780329,0.00471,0.008075,107.256173,107.256389
std,0.819777,3.637112,8.785145,6.921945,17.261689,9.078446,18.256608,11.161267,1.64952,93.458438,0.044543,0.093284,191.910039,191.909951
min,2016.0,1.0,1.0,0.0,0.0,-21.645232,11.875383,970.0876,0.007322,0.001701,0.0,0.0,0.0,0.0
25%,2017.0,3.0,8.0,5.0,14.0,-0.632104,68.268864,1005.464467,2.218789,116.200252,0.0,0.0,1.510226,1.510274
50%,2018.0,6.0,16.0,11.0,29.0,3.773333,84.6357,1012.546067,3.193919,224.275239,0.0,0.0,5.465909,5.465922
75%,2018.0,10.0,23.0,17.0,45.0,12.915405,92.584239,1020.189933,4.431981,273.845409,0.0,0.0,117.479973,117.479973
max,2019.0,12.0,31.0,23.0,59.0,32.191077,99.353324,1045.714767,13.333544,359.992379,8.18,5.3,1115.314668,1115.314668


In [9]:
data.sample(50)

,year,month,day,hour,minute,temperature,humidity,atmospheric_pressure,wind_speed,wind_direction,precipitation,snow,radiation_flux,rad_flux_infuture
30426,2017,2,14,15,20,6.01355,55.365058,1024.838767,4.782787,353.80146,0.0,0.0,148.346472,140.760805
258291,2019,4,11,23,35,-1.090157,66.469438,1025.591533,1.489214,358.165693,0.0,0.0,1.436409,1.731234
168873,2018,6,10,9,25,16.707642,46.785881,1014.0788,1.468649,276.404689,0.0,0.0,523.693356,540.976627
5661,2016,11,20,15,45,4.59898,87.377488,1017.274167,3.95202,223.655313,0.0,0.0,0.0,0.0
6310,2016,11,22,21,50,0.68164,91.344396,1023.295833,4.126343,202.932154,0.0,0.0,0.0,0.0
97762,2017,10,6,11,40,4.74694,98.898333,991.440967,1.082176,269.534026,0.0,0.0,157.635893,176.533224
164845,2018,5,27,9,45,20.166398,51.188629,1025.05,2.851624,284.706636,0.0,0.0,615.904059,627.581542
190177,2018,8,23,8,45,15.71836,55.951425,1016.3819,5.064208,238.826738,0.0,0.0,319.854353,336.772851
47338,2017,4,14,9,40,-0.48246,74.413183,1002.677508,2.807744,17.468288,0.0,0.0,205.874078,207.653275
77191,2017,7,27,1,25,17.95687,79.084654,1002.447967,2.883954,60.129957,0.0,0.0,2.970892,2.845178


In [10]:
data.head(30)

,year,month,day,hour,minute,temperature,humidity,atmospheric_pressure,wind_speed,wind_direction,precipitation,snow,radiation_flux,rad_flux_infuture
0,2016,11,1,0,0,-0.591946,97.281109,1022.204033,1.249012,240.735204,0.0,0.0,0.0,0.0
1,2016,11,1,0,5,-0.597687,97.325147,1022.262033,0.866778,246.686019,0.0,0.0,0.0,0.0
2,2016,11,1,0,10,-0.580607,97.395573,1022.359833,1.239094,282.940579,0.0,0.0,0.0,0.0
3,2016,11,1,0,15,-0.558781,97.469255,1022.406333,0.883887,284.070831,0.0,0.0,0.0,0.0
4,2016,11,1,0,20,-0.578705,97.496932,1022.374467,1.592511,281.055331,0.0,0.0,0.0,0.0
5,2016,11,1,0,25,-0.599025,97.569909,1022.375467,1.807397,275.258072,0.0,0.0,0.0,0.0
6,2016,11,1,0,30,-0.625882,97.634367,1022.431395,2.091406,281.399344,0.0,0.0,0.0,0.0
7,2016,11,1,0,35,-0.619212,97.699369,1022.4392,1.412268,278.953697,0.0,0.0,0.0,0.0
8,2016,11,1,0,40,-0.632114,97.723341,1022.331267,1.928674,276.522212,0.0,0.0,0.0,0.0
9,2016,11,1,0,45,-0.659089,97.773075,1022.307867,2.075287,282.975079,0.0,0.0,0.0,0.0


In [11]:
data.to_csv("data.csv")

## Trying to get data directly from the web page

In [18]:
#url = 'https://meteo.physic.ut.ee/et/archive.php?do=data&begin%5Byear%5D=2016&begin%5Bmon%5D=11&begin%5Bmday%5D=18&end%5Byear%5D=2021&end%5Bmon%5D=11&end%5Bmday%5D=18&9=1&12=1&10=1&15=1&16=1&14=1&snow_16=1&ok=+Esita+päring+'


def get_weather_data(start_year, start_month, start_day, end_year, end_month, end_day):
    """Download archive data from meteo.physic.ut.ee for the given date range as a DataFrame.

    With the query parameters below (9, 12, 11, 14) the result has the columns
    Aeg, Temperatuur, Niiskus, Valgustatus and Kiirgusvoog.
    """
    url = f'https://meteo.physic.ut.ee/et/archive.php?do=data&begin%5Byear%5D={start_year}&begin%5Bmon%5D={start_month}&begin%5Bmday%5D={start_day}&end%5Byear%5D={end_year}&end%5Bmon%5D={end_month}&end%5Bmday%5D={end_day}&9=1&12=1&11=1&14=1&ok=+Esita+p%C3%A4ring+'
    return pd.read_csv(url)
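
The long query string in `get_weather_data` is the archive form's parameters with the brackets and the "ä" percent-encoded by hand. A sketch that builds the same URL with `urllib.parse.urlencode`, so the dates and the requested fields are easier to change; the helper name `weather_query_url` and the `fields` parameter are assumptions, and what each numeric field code selects is not documented here.

```python
from urllib.parse import urlencode

BASE_URL = "https://meteo.physic.ut.ee/et/archive.php"


def weather_query_url(start_year, start_month, start_day, end_year, end_month, end_day,
                      fields=("9", "12", "11", "14")):
    """Build the archive query URL; urlencode escapes "[", "]", spaces and "ä"."""
    params = {
        "do": "data",
        "begin[year]": start_year, "begin[mon]": start_month, "begin[mday]": start_day,
        "end[year]": end_year, "end[mon]": end_month, "end[mday]": end_day,
    }
    # Each requested field is sent as "<code>=1", in the given order
    params.update({field: 1 for field in fields})
    # Value of the form's submit button ("Submit query")
    params["ok"] = " Esita päring "
    return BASE_URL + "?" + urlencode(params)
```

For the default fields this produces exactly the URL used in `get_weather_data`, so `pd.read_csv(weather_query_url(2011, 1, 1, 2015, 12, 12))` should return the same table.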

In [19]:
train = get_weather_data(2011, 1, 1, 2015, 12, 12)

In [20]:
train.head()

,Aeg,Temperatuur,Niiskus,Valgustatus,Kiirgusvoog
0,2011-01-01 00:00:00,-6.936343679417071,98.6784171261325,1354.9408873569798,
1,2011-01-01 00:05:00,-6.9568330698942695,98.5970157748458,1029.54357837049,
2,2011-01-01 00:10:00,-6.97364936736915,98.422877836847,1418.1783680055298,
3,2011-01-01 00:15:00,-6.94325190753387,98.49124327899727,1294.68212643356,
4,2011-01-01 00:20:00,-6.86547316481287,98.6406418520616,1374.16251444545,


In [17]:
train.isnull().sum()

Aeg            0
Temperatuur    0
Niiskus        0
Valgustatus    0
Kiirgusvoog    0
dtype: int64