After incorporating feedback from the Kaggle community, as well as scientific and educational partners, the Artificial Intelligence Committee of the American Meteorological Society is excited to be running a second iteration of the How Much Did It Rain? competition.

How Much Did It Rain? II is focused on solving the same core rain measurement prediction problem, but approaches it with a new and improved dataset and evaluation metric. This competition will go even further towards building a useful educational tool for universities, as well as making a meaningful contribution to continued meteorological research.

Competition Description
Rainfall is highly variable across space and time, making it notoriously tricky to measure. Rain gauges can be an effective measurement tool for a specific location, but it is impossible to have them everywhere. In order to have widespread coverage, data from weather radars is used to estimate rainfall nationwide. Unfortunately, these predictions never exactly match the measurements taken using rain gauges.

Recently, in an effort to improve their rainfall predictors, the U.S. National Weather Service upgraded their radar network to be polarimetric. These polarimetric radars are able to provide higher quality data than conventional Doppler radars because they transmit radio wave pulses with both horizontal and vertical orientations. 

Polarimetric radar. Image courtesy NOAA

Dual pulses make it easier to infer the size and type of precipitation because rain drops become flatter as they increase in size, whereas ice crystals tend to be elongated vertically.

In this competition, you are given snapshots of polarimetric radar values and asked to predict the hourly rain gauge total. A word of caution: many of the gauge values in the training dataset are implausible (gauges may get clogged, for example). More details are on the data page.

The training data consists of NEXRAD and MADIS data collected on 20 days between Apr and Aug 2014 over midwestern corn-growing states. Time and location information have been censored, and the data have been shuffled so that they are not ordered by time or place. The test data consists of data from the same radars and gauges over the remaining days in that month. Please see this page to understand more about polarimetric radar measurements.

File descriptions
train.zip - the training set.  This consists of radar observations at gauges in the Midwestern US over 20 days each month during the corn growing season. You are also provided the gauge observation at the end of each hour.
test.zip - the test set.  This consists of radar observations at gauges in the Midwestern US over the remaining 10/11 days each month of the same year(s) as the training set.  You are required to predict the gauge observation at the end of each hour.
sample_solution.zip - a sample submission file in the correct format
sample_dask.py - Example program in Python that will produce the sample submission file.  This program applies the Marshall-Palmer relationship to the radar observations to predict the gauge observation.
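
The Marshall-Palmer relationship mentioned for sample_dask.py links linear reflectivity Z (mm^6/m^3) to rain rate R (mm/h) as Z = 200 * R^1.6. The sketch below is not the code of sample_dask.py; it only illustrates the conversion from dBZ to a rain rate, and the function name marshall_palmer_rate is made up for this example.

```python
import numpy as np

def marshall_palmer_rate(dbz):
    """Convert reflectivity in dBZ to a rain rate in mm/h via Z = 200 * R**1.6."""
    dbz = np.asarray(dbz, dtype=float)
    z_linear = 10.0 ** (dbz / 10.0)            # dBZ -> linear reflectivity
    return (z_linear / 200.0) ** (1.0 / 1.6)   # invert Z = 200 * R**1.6

# Rain rate grows quickly with reflectivity.
rates = marshall_palmer_rate([20.0, 30.0, 40.0])
print(rates)
```

Turning these instantaneous rates into an hourly total still requires weighting each snapshot by the time it covers, which this sketch leaves out.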
Data columns
To understand the data, you have to realize that there are multiple radar observations over the course of an hour, and only one gauge observation (the 'Expected'). That is why there are multiple rows with the same 'Id'.
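
A minimal sketch of this row layout, using a few invented rows rather than the real file: grouping by 'Id' collapses the radar snapshots of one hour into a single record with one 'Expected' value. The summary column names (n_scans, mean_ref, expected) are chosen only for this illustration.

```python
import pandas as pd

# Invented rows in the same layout as the competition data.
toy = pd.DataFrame({
    "Id": [1, 1, 1, 2, 2],
    "minutes_past": [3, 16, 25, 1, 6],
    "Ref": [float("nan"), float("nan"), float("nan"), 9.0, 26.5],
    "Expected": [0.254, 0.254, 0.254, 1.016, 1.016],
})

# One output row per gauge-hour: number of scans, mean reflectivity, gauge total.
per_hour = toy.groupby("Id").agg(
    n_scans=("minutes_past", "size"),
    mean_ref=("Ref", "mean"),
    expected=("Expected", "first"),
)
print(per_hour)
```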

The columns in the datasets are:

Id:  A unique number for the set of observations over an hour at a gauge.
minutes_past:  For each set of radar observations, the minutes past the top of the hour that the radar observations were carried out.  Radar observations are snapshots at that point in time.

radardist_km:  Distance of gauge from the radar whose observations are being reported.

Ref:  Radar reflectivity in dBZ

Ref_5x5_10th:   10th percentile of reflectivity values in 5x5 neighborhood around the gauge.

Ref_5x5_50th:   50th percentile

Ref_5x5_90th:   90th percentile

RefComposite:  Maximum reflectivity in the vertical column above gauge.  In dBZ.

RefComposite_5x5_10th

RefComposite_5x5_50th

RefComposite_5x5_90th

RhoHV:  Correlation coefficient (unitless)

RhoHV_5x5_10th

RhoHV_5x5_50th

RhoHV_5x5_90th

Zdr:    Differential reflectivity in dB

Zdr_5x5_10th

Zdr_5x5_50th

Zdr_5x5_90th

Kdp:  Specific differential phase (deg/km)

Kdp_5x5_10th

Kdp_5x5_50th

Kdp_5x5_90th

Expected:  Actual gauge observation in mm at the end of the hour.


In [None]:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive


In [None]:
!ls "/content/drive/My Drive/"

'Colab Notebooks'  'Data Files'   HMDIR   house_prices	 pics   UNIBS


Loading the data from Google Drive

In [None]:
import numpy as np
import pandas as pd
data = pd.read_csv("/content/drive/My Drive/HMDIR/train.csv")

Displaying the loaded data

In [None]:
# Work with a subset of the rows; label-based .loc slicing includes the end label.
data = data.loc[:1000000]
data

Unnamed: 0,Id,minutes_past,radardist_km,Ref,Ref_5x5_10th,Ref_5x5_50th,Ref_5x5_90th,RefComposite,RefComposite_5x5_10th,RefComposite_5x5_50th,RefComposite_5x5_90th,RhoHV,RhoHV_5x5_10th,RhoHV_5x5_50th,RhoHV_5x5_90th,Zdr,Zdr_5x5_10th,Zdr_5x5_50th,Zdr_5x5_90th,Kdp,Kdp_5x5_10th,Kdp_5x5_50th,Kdp_5x5_90th,Expected
0,1,3,10.0,,,,,,,,,,,,,,,,,,,,,0.254000
1,1,16,10.0,,,,,,,,,,,,,,,,,,,,,0.254000
2,1,25,10.0,,,,,,,,,,,,,,,,,,,,,0.254000
3,1,35,10.0,,,,,,,,,,,,,,,,,,,,,0.254000
4,1,45,10.0,,,,,,,,,,,,,,,,,,,,,0.254000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
999996,87817,16,6.0,29.0,26.5,31.0,34.0,30.5,29.5,33.0,36.0,0.981667,0.905000,0.968333,0.988333,2.1250,-0.5000,0.4375,3.3750,0.000000,-4.570007,0.000000,8.139999,5.842003
999997,87817,21,6.0,36.0,35.0,37.5,42.0,39.0,37.0,39.0,44.5,0.968333,0.945000,0.981667,0.991667,1.7500,-0.3125,0.6875,1.9375,2.470001,-3.520004,-0.710007,2.470001,5.842003
999998,87817,26,6.0,36.0,33.0,35.5,38.0,36.0,35.0,37.0,39.0,0.985000,0.968333,0.988333,0.998333,0.3125,-0.3750,0.3125,1.5625,3.519989,-1.059998,0.349991,3.529999,5.842003
999999,87817,31,6.0,35.5,30.5,36.0,42.5,37.5,34.5,37.5,43.5,0.991667,0.968333,0.988333,0.998333,1.1250,-0.7500,0.4375,2.3125,-1.410004,-4.580002,-2.120010,7.889999,5.842003


In [None]:
# Keep only the Ids (gauge-hours) that have at least one non-missing Ref value.
data_ids = data.loc[data['Ref'].notna(), 'Id'].unique()
data = data[data['Id'].isin(data_ids)]
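
The effect of this filter can be seen on a few invented rows. In the sketch below, an Id whose Ref values are all missing is dropped entirely, while an Id with at least one valid Ref keeps all of its rows, including those where Ref is NaN. The frame and variable names are illustrative only.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "Id":  [1, 1, 2, 2, 3],
    "Ref": [np.nan, np.nan, 9.0, np.nan, 21.5],
})

# Ids with at least one non-missing Ref reading.
ids_with_ref = toy.loc[toy["Ref"].notna(), "Id"].unique()

# Keep every row belonging to those Ids.
kept = toy[toy["Id"].isin(ids_with_ref)]
print(kept)
```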

In [None]:
data

Unnamed: 0,Id,minutes_past,radardist_km,Ref,Ref_5x5_10th,Ref_5x5_50th,Ref_5x5_90th,RefComposite,RefComposite_5x5_10th,RefComposite_5x5_50th,RefComposite_5x5_90th,RhoHV,RhoHV_5x5_10th,RhoHV_5x5_50th,RhoHV_5x5_90th,Zdr,Zdr_5x5_10th,Zdr_5x5_50th,Zdr_5x5_90th,Kdp,Kdp_5x5_10th,Kdp_5x5_50th,Kdp_5x5_90th,Expected
6,2,1,2.0,9.0,5.0,7.5,10.5,15.0,10.5,16.5,23.5,0.998333,0.998333,0.998333,0.998333,0.3750,-0.1250,0.3125,0.8750,1.059998,-1.410004,-0.350006,1.059998,1.016000
7,2,6,2.0,26.5,22.5,25.5,31.5,26.5,26.5,28.5,32.0,1.001667,0.981667,0.998333,1.005000,0.0625,-0.1875,0.2500,0.6875,,,,1.409988,1.016000
8,2,11,2.0,21.5,15.5,20.5,25.0,26.5,23.5,25.0,27.0,1.001667,0.995000,0.998333,1.001667,0.3125,-0.0625,0.3125,0.6250,0.349991,,-0.350006,1.759994,1.016000
9,2,16,2.0,18.0,14.0,17.5,21.0,20.5,18.0,20.5,23.0,0.995000,0.995000,0.998333,1.001667,0.2500,0.1250,0.3750,0.6875,0.349991,-1.059998,0.000000,1.059998,1.016000
10,2,21,2.0,24.5,16.5,21.0,24.5,24.5,21.0,24.0,28.0,0.998333,0.995000,0.998333,0.998333,0.2500,0.0625,0.1875,0.5625,-0.350006,-1.059998,-0.350006,1.759994,1.016000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
999996,87817,16,6.0,29.0,26.5,31.0,34.0,30.5,29.5,33.0,36.0,0.981667,0.905000,0.968333,0.988333,2.1250,-0.5000,0.4375,3.3750,0.000000,-4.570007,0.000000,8.139999,5.842003
999997,87817,21,6.0,36.0,35.0,37.5,42.0,39.0,37.0,39.0,44.5,0.968333,0.945000,0.981667,0.991667,1.7500,-0.3125,0.6875,1.9375,2.470001,-3.520004,-0.710007,2.470001,5.842003
999998,87817,26,6.0,36.0,33.0,35.5,38.0,36.0,35.0,37.0,39.0,0.985000,0.968333,0.988333,0.998333,0.3125,-0.3750,0.3125,1.5625,3.519989,-1.059998,0.349991,3.529999,5.842003
999999,87817,31,6.0,35.5,30.5,36.0,42.5,37.5,34.5,37.5,43.5,0.991667,0.968333,0.988333,0.998333,1.1250,-0.7500,0.4375,2.3125,-1.410004,-4.580002,-2.120010,7.889999,5.842003


In [None]:
data = data.fillna(0.0)

Removing the Id column from the dataset

In [None]:
data = data.drop(columns=['Id'])

In [None]:
features = data.drop(columns = ['Expected'])

In [None]:
features

Unnamed: 0,minutes_past,radardist_km,Ref,Ref_5x5_10th,Ref_5x5_50th,Ref_5x5_90th,RefComposite,RefComposite_5x5_10th,RefComposite_5x5_50th,RefComposite_5x5_90th,RhoHV,RhoHV_5x5_10th,RhoHV_5x5_50th,RhoHV_5x5_90th,Zdr,Zdr_5x5_10th,Zdr_5x5_50th,Zdr_5x5_90th,Kdp,Kdp_5x5_10th,Kdp_5x5_50th,Kdp_5x5_90th
6,1,2.0,9.0,5.0,7.5,10.5,15.0,10.5,16.5,23.5,0.998333,0.998333,0.998333,0.998333,0.3750,-0.1250,0.3125,0.8750,1.059998,-1.410004,-0.350006,1.059998
7,6,2.0,26.5,22.5,25.5,31.5,26.5,26.5,28.5,32.0,1.001667,0.981667,0.998333,1.005000,0.0625,-0.1875,0.2500,0.6875,0.000000,0.000000,0.000000,1.409988
8,11,2.0,21.5,15.5,20.5,25.0,26.5,23.5,25.0,27.0,1.001667,0.995000,0.998333,1.001667,0.3125,-0.0625,0.3125,0.6250,0.349991,0.000000,-0.350006,1.759994
9,16,2.0,18.0,14.0,17.5,21.0,20.5,18.0,20.5,23.0,0.995000,0.995000,0.998333,1.001667,0.2500,0.1250,0.3750,0.6875,0.349991,-1.059998,0.000000,1.059998
10,21,2.0,24.5,16.5,21.0,24.5,24.5,21.0,24.0,28.0,0.998333,0.995000,0.998333,0.998333,0.2500,0.0625,0.1875,0.5625,-0.350006,-1.059998,-0.350006,1.759994
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
999996,16,6.0,29.0,26.5,31.0,34.0,30.5,29.5,33.0,36.0,0.981667,0.905000,0.968333,0.988333,2.1250,-0.5000,0.4375,3.3750,0.000000,-4.570007,0.000000,8.139999
999997,21,6.0,36.0,35.0,37.5,42.0,39.0,37.0,39.0,44.5,0.968333,0.945000,0.981667,0.991667,1.7500,-0.3125,0.6875,1.9375,2.470001,-3.520004,-0.710007,2.470001
999998,26,6.0,36.0,33.0,35.5,38.0,36.0,35.0,37.0,39.0,0.985000,0.968333,0.988333,0.998333,0.3125,-0.3750,0.3125,1.5625,3.519989,-1.059998,0.349991,3.529999
999999,31,6.0,35.5,30.5,36.0,42.5,37.5,34.5,37.5,43.5,0.991667,0.968333,0.988333,0.998333,1.1250,-0.7500,0.4375,2.3125,-1.410004,-4.580002,-2.120010,7.889999


In [None]:
#features.describe()

Normalizing the data so that the mean is 0 and the standard deviation is 1

In [None]:
# Mean of each column
mymean = features.mean(axis=0)
# Standard deviation of each column
mystd = features.std(axis=0)
features -= mymean
features /= mystd
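
A small worked example of the same standardization on invented values. It also shows that the stored mean and standard deviation are what should be applied to any rows scaled later. Note that pandas computes the sample standard deviation (ddof=1) by default. The frame and variable names here are illustrative.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "Ref": [9.0, 26.5, 21.5, 18.0],
    "radardist_km": [2.0, 2.0, 6.0, 10.0],
})

mean = toy.mean(axis=0)
std = toy.std(axis=0)          # sample standard deviation (ddof=1)
scaled = (toy - mean) / std

print(scaled.mean(axis=0))     # approximately 0 for every column
print(scaled.std(axis=0))      # 1 for every column

# Rows seen later are scaled with the statistics computed above.
new_row = pd.DataFrame({"Ref": [30.0], "radardist_km": [4.0]})
new_scaled = (new_row - mean) / std
```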

In [None]:
features

Unnamed: 0,minutes_past,radardist_km,Ref,Ref_5x5_10th,Ref_5x5_50th,Ref_5x5_90th,RefComposite,RefComposite_5x5_10th,RefComposite_5x5_50th,RefComposite_5x5_90th,RhoHV,RhoHV_5x5_10th,RhoHV_5x5_50th,RhoHV_5x5_90th,Zdr,Zdr_5x5_10th,Zdr_5x5_50th,Zdr_5x5_90th,Kdp,Kdp_5x5_10th,Kdp_5x5_50th,Kdp_5x5_90th
6,-1.643490,-1.923638,-0.531631,-0.568587,-0.640280,-0.771439,-0.247429,-0.296046,-0.128343,0.001922,0.964524,1.224548,0.960867,0.731353,0.080132,0.255781,0.173862,-0.251185,0.412217,-0.039889,-0.097710,-0.315630
7,-1.355481,-1.923638,0.757234,0.878061,0.712112,0.711686,0.565315,0.921190,0.733670,0.590580,0.971333,1.188876,0.960867,0.744918,-0.200834,0.173039,0.083767,-0.366684,-0.005308,0.551796,0.133034,-0.216385
8,-1.067472,-1.923638,0.388987,0.299402,0.336448,0.252623,0.565315,0.692958,0.482250,0.244311,0.971333,1.217414,0.960867,0.738135,0.023939,0.338524,0.173862,-0.405184,0.132551,0.551796,-0.097710,-0.117136
9,-0.779464,-1.923638,0.131214,0.175404,0.111049,-0.029877,0.141275,0.274533,0.158995,-0.032705,0.957715,1.217414,0.960867,0.738135,-0.032254,0.586750,0.263956,-0.366684,0.132551,0.106985,0.133034,-0.315630
10,-0.491455,-1.923638,0.609936,0.382068,0.374014,0.217311,0.423968,0.502765,0.410415,0.313564,0.964524,1.217414,0.960867,0.731353,-0.032254,0.504008,-0.006327,-0.443684,-0.143173,0.106985,-0.097710,-0.117136
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
999996,-0.779464,-0.933387,0.941358,1.208724,1.125343,0.888248,0.848009,1.149421,1.056925,0.867595,0.930479,1.024785,0.899528,0.711006,1.653541,-0.240672,0.354051,1.288811,-0.005308,-1.365933,0.133034,1.692006
999997,-0.491455,-0.933387,1.456904,1.911382,1.613707,1.453248,1.448733,1.720000,1.487931,1.456252,0.903243,1.110398,0.926790,0.717788,1.316382,0.007554,0.714429,0.403314,0.967607,-0.925317,-0.335042,0.084197
999998,-0.203446,-0.933387,1.456904,1.746050,1.463442,1.170748,1.236713,1.567846,1.344263,1.075356,0.937288,1.160339,0.940420,0.731353,0.023939,-0.075188,0.173862,0.172314,1.381190,0.106985,0.363768,0.384774
999999,0.084563,-0.933387,1.420079,1.539386,1.501008,1.488561,1.342723,1.529807,1.380180,1.386998,0.950906,1.160339,0.940420,0.731353,0.754450,-0.571642,0.354051,0.634313,-0.560698,-1.370127,-1.264596,1.621115


In [None]:
features.describe()

Unnamed: 0,minutes_past,radardist_km,Ref,Ref_5x5_10th,Ref_5x5_50th,Ref_5x5_90th,RefComposite,RefComposite_5x5_10th,RefComposite_5x5_50th,RefComposite_5x5_90th,RhoHV,RhoHV_5x5_10th,RhoHV_5x5_50th,RhoHV_5x5_90th,Zdr,Zdr_5x5_10th,Zdr_5x5_50th,Zdr_5x5_90th,Kdp,Kdp_5x5_10th,Kdp_5x5_50th,Kdp_5x5_90th
count,645568.0,645568.0,645568.0,645568.0,645568.0,645568.0,645568.0,645568.0,645568.0,645568.0,645568.0,645568.0,645568.0,645568.0,645568.0,645568.0,645568.0,645568.0,645568.0,645568.0,645568.0,645568.0
mean,5.147359e-16,-5.636598e-14,-2.689563e-14,1.015552e-13,6.497973e-14,-7.335184e-14,-5.650504e-14,4.008312e-14,4.097031e-14,-2.675751e-14,1.66915e-12,-3.599563e-13,1.515766e-12,4.45192e-12,2.927845e-15,4.257482e-14,-2.445006e-14,3.611828e-14,-4.455725e-15,-1.415711e-13,4.264567e-14,-5.573984e-13
std,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
min,-1.701092,-2.418763,-3.330309,-3.585881,-3.570465,-3.384564,-3.180376,-3.186979,-3.10947,-3.21838,-1.074776,-0.912204,-1.080349,-1.299952,-7.337368,-10.00426,-11.62851,-5.64117,-20.83435,-25.0081,-37.5962,-23.13122
25%,-0.8946671,-0.6858244,-1.194476,-0.9819146,-1.203777,-0.7008143,-1.30753,-1.094856,-1.31361,-0.6213619,-1.074776,-0.912204,-1.080349,-1.299952,-0.2570272,-0.2406724,-0.2766105,-0.7901831,-0.005308047,-0.3336315,0.133034,-0.6162076
50%,0.0269613,0.05686362,0.09438947,-0.03126019,0.1110489,0.1113732,0.105938,0.08434028,0.1230779,0.07117613,0.7534441,-0.912204,0.8245585,0.7245707,-0.2570272,0.4212658,-0.2766105,-0.2126847,-0.005308047,0.5517959,0.133034,-0.4148793
75%,0.8333861,0.7995516,0.7572344,0.7953959,0.7496788,0.7116857,0.7419988,0.7690351,0.7336702,0.6944604,0.9577146,1.131801,0.9540512,0.792394,0.1363251,0.4212658,0.1738618,0.4418136,-0.005308047,0.5517959,0.133034,0.4046215
max,1.697413,2.780053,4.034634,3.854023,3.980395,3.607311,4.558363,3.926239,3.786632,4.711181,1.073468,1.338699,1.069913,0.8398702,6.879507,10.92954,11.16539,4.099304,21.22944,1.433023,7.193661,23.05007


In [None]:
# Target: the hourly gauge total (column 22 after dropping Id).
y = data['Expected']

In [None]:
y

6          1.016000
7          1.016000
8          1.016000
9          1.016000
10         1.016000
             ...   
999996     5.842003
999997     5.842003
999998     5.842003
999999     5.842003
1000000    5.842003
Name: Expected, Length: 645568, dtype: float64

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
X_train, X_test, y_train, y_test = train_test_split(features, y, test_size=0.33, random_state=420)
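
Because all rows of one Id share the same Expected value, a row-wise random split can place snapshots of the same hour in both the training and the test part. The following is a sketch of an alternative that splits by Id, assuming the Id column is still available (this notebook drops it earlier). The helper name split_by_id is hypothetical.

```python
import numpy as np
import pandas as pd

def split_by_id(frame, test_size=0.33, seed=420):
    """Split rows so that each Id lands entirely in train or entirely in test."""
    rng = np.random.default_rng(seed)
    ids = rng.permutation(frame["Id"].unique())
    n_test = int(round(len(ids) * test_size))
    is_test = frame["Id"].isin(ids[:n_test])
    return frame[~is_test], frame[is_test]

# Nine invented gauge-hours with three radar snapshots each.
toy = pd.DataFrame({"Id": np.repeat(np.arange(1, 10), 3), "Ref": np.arange(27.0)})
train_part, test_part = split_by_id(toy)
print(sorted(test_part["Id"].unique()))
```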

In [None]:
X_train

Unnamed: 0,minutes_past,radardist_km,Ref,Ref_5x5_10th,Ref_5x5_50th,Ref_5x5_90th,RefComposite,RefComposite_5x5_10th,RefComposite_5x5_50th,RefComposite_5x5_90th,RhoHV,RhoHV_5x5_10th,RhoHV_5x5_50th,RhoHV_5x5_90th,Zdr,Zdr_5x5_10th,Zdr_5x5_50th,Zdr_5x5_90th,Kdp,Kdp_5x5_10th,Kdp_5x5_50th,Kdp_5x5_90th
676494,-0.030640,1.294677,-1.194476,-0.981915,-1.203777,-1.513002,-1.307530,-1.094856,-1.313610,-1.625542,-1.074776,-0.912204,-1.080349,-1.299952,-0.257027,0.421266,-0.276611,-0.790183,-0.005308,0.551796,0.133034,-0.616208
975388,0.199767,0.551989,-1.194476,-0.981915,-0.377315,-0.171127,-1.307530,-1.094856,-0.487515,-0.067331,0.998569,1.117532,0.947236,0.765265,-1.718050,-1.564549,-0.276611,0.249314,-0.840364,-1.072191,-1.264596,0.977423
13248,1.236599,0.056864,0.904533,0.960727,0.712112,0.535123,0.706662,0.845112,0.625919,0.659833,0.801107,1.053323,0.926790,0.724571,-0.200834,0.007554,-0.006327,-0.212685,-0.836422,-0.337832,-0.097710,0.481185
533362,-1.182676,-1.180950,-0.200208,0.010073,0.223748,0.076061,0.565315,0.578842,0.554084,0.382818,-1.074776,-0.912204,-1.080349,0.663530,-0.257027,0.421266,-0.276611,0.133814,-0.005308,0.551796,0.133034,-0.616208
390044,-0.376251,-2.171200,0.131214,-0.899249,-0.302182,0.005436,-0.035409,-0.029776,0.374498,0.521326,0.821534,-0.188063,0.926790,0.731353,0.642064,-5.618921,1.345090,2.135809,-1.250010,0.551796,-1.726073,1.198603
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
268241,-1.413083,-1.923638,0.315338,0.423401,0.261315,0.040748,0.529979,0.502765,0.374498,0.313564,0.950906,1.160339,0.947236,0.731353,0.979223,1.414173,1.705468,0.287814,0.412217,0.106985,0.363768,-0.117136
805372,0.890988,0.056864,1.235956,1.250057,1.275609,1.347311,1.307386,1.263537,1.272428,1.213864,0.889624,1.088995,0.899528,0.724571,1.765927,-0.488899,-0.006327,0.595813,0.684004,-2.545103,0.133034,0.268512
913654,-0.030640,1.542240,-1.194476,-0.981915,-1.203777,-1.513002,-1.307530,-1.094856,-1.313610,-1.625542,-1.074776,-0.912204,-1.080349,-1.299952,-0.257027,0.421266,-0.276611,-0.790183,-0.005308,0.551796,0.133034,-0.616208
996895,0.430174,1.047114,0.425812,0.547399,0.486714,0.429186,0.247285,0.350610,0.374498,0.417445,0.787489,-0.912204,0.947236,0.839870,-1.324698,0.421266,-3.069539,0.056815,1.522991,0.551796,-6.901228,0.484019


In [None]:
y_test

229935     0.762000
347466    22.860012
173570     1.524001
581777     0.508000
864365     3.810002
            ...    
458043    28.000013
850053     7.112004
410357     0.010000
112005    15.240008
451419     0.254000
Name: Expected, Length: 213038, dtype: float64

In [None]:
# Import from the public Keras API rather than the private tensorflow.python package.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

In [None]:
model = Sequential()
model.add(Dense(32, activation='relu'))
model.add(Dense(128, activation='relu'))
model.add(Dense(512, activation='relu'))
model.add(Dense(1))

In [None]:
model.compile(optimizer='adam', loss='mse', metrics=['mae'])

In [None]:
model.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.2, verbose=1)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x7f7e6e0b0b50>

In [None]:
print(model.summary())

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 32)                736       
_________________________________________________________________
dense_1 (Dense)              (None, 128)               4224      
_________________________________________________________________
dense_2 (Dense)              (None, 512)               66048     
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 513       
Total params: 71,521
Trainable params: 71,521
Non-trainable params: 0
_________________________________________________________________
None
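
The parameter counts in the summary can be checked by hand: a Dense layer has one weight per input-unit pair plus one bias per unit. The sketch below assumes 22 input features, which is what the first layer's 736 parameters imply. The helper name dense_params is made up for this check.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units

# 22 input features followed by the four Dense layers of the model.
layer_sizes = [22, 32, 128, 512, 1]
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(counts)       # [736, 4224, 66048, 513]
print(sum(counts))  # 71521
```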


In [None]:
pred = model.predict(X_test)

In [None]:
y_test = y_test.reset_index(drop=True)

In [None]:
print("Predicted value:", pred[12][0], ", true value:", y_test[12])

Predicted value: 8.097883 , true value: 36.322018


In [None]:
# Flatten the (n, 1) prediction array into a 1-D float64 vector.
vec = pred[:, 0].astype(np.float64)
vec

array([13.58607101,  7.37755585, 11.3064537 , ...,  5.2027607 ,
       14.14209557, 39.793396  ])

In [None]:
# Absolute error of every prediction, computed in one vectorized step.
diff = np.abs(pred[:, 0] - y_test.to_numpy())

In [None]:
# Largest absolute errors; the slice [-10:-1] stops before the single largest one.
np.sort(diff)[-10:-1]

array([ 4681.48413807,  4682.89441639,  4682.89441639,  4690.44164807,
        4690.68205983,  4701.01120907,  4702.21964032, 10375.86727321,
       10378.23661411])
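
Errors of several thousand millimetres are consistent with the implausible gauge totals that the competition description warns about. Below is a sketch of how such rows can distort an average error, using invented numbers. The cutoff MAX_PLAUSIBLE_MM is an arbitrary assumption for illustration, not a value taken from the competition.

```python
import numpy as np

y_true = np.array([0.254, 1.016, 5.842, 4700.0, 10390.0])  # invented gauge totals, mm
y_pred = np.array([1.0, 2.0, 6.0, 12.0, 14.0])              # invented predictions, mm

MAX_PLAUSIBLE_MM = 100.0  # assumed cutoff, for illustration only
plausible = y_true <= MAX_PLAUSIBLE_MM

abs_err = np.abs(y_pred - y_true)
print("MAE, all rows:      ", abs_err.mean())
print("MAE, plausible rows:", abs_err[plausible].mean())
```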

In [None]:
vec

array([13.58607101,  7.37755585, 11.3064537 , ...,  5.2027607 ,
       14.14209557, 39.793396  ])

In [None]:
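# np.sort works along the last axis by default; pred has shape (n, 1), so the
# row order stays unchanged here. np.sort(pred, axis=0) would sort the values.
# The name sorted_pred avoids shadowing the built-in sorted().
sorted_pred = np.sort(pred)
sorted_pred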

array([[13.586071 ],
       [ 7.377556 ],
       [11.306454 ],
       ...,
       [ 5.2027607],
       [14.142096 ],
       [39.793396 ]], dtype=float32)

In [None]:
for i in np.arange(0, len(y_test)):
    print("Predicted value:", pred[i][0], ", true value:", y_test[i], 'difference: ', np.abs(pred[i][0] - y_test[i]))

Streaming output truncated to the last 5000 lines.
Predicted value: 15.102652 , true value: 545.84625 difference:  530.7435984039307
Predicted value: 10.9805155 , true value: 2.2860012000000003 difference:  8.694514280041503
Predicted value: 8.124468 , true value: 2.2860012000000003 difference:  5.838466649731445
Predicted value: 6.59002 , true value: 0.7620004 difference:  5.828019779748535
Predicted value: 9.9456415 , true value: 0.010000005 difference:  9.93564151263916
Predicted value: 7.5654845 , true value: 0.7620004 difference:  6.803484123773194
Predicted value: 16.685946 , true value: 5.334003 difference:  11.351942510864259
Predicted value: 33.138184 , true value: 0.54000026 difference:  32.59818333375
Predicted value: 11.845154 , true value: 0.7620004 difference:  11.08315340859375
Predicted value: 8.879139 , true

In [None]:
# Pearson correlation between the predictions and the true gauge totals.
СС_tuner = np.corrcoef(vec, y_test)
СС_tuner = СС_tuner[0][1]
print(f'Correlation coefficient with the true data: {СС_tuner}')

Correlation coefficient with the true data: 0.17616991482601785
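
np.corrcoef returns a 2x2 correlation matrix, and the off-diagonal entry [0][1] is the Pearson coefficient between the two inputs. The sketch below checks that entry against the textbook formula on two short invented vectors. The array names are illustrative.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([2.0, 1.0, 4.0, 3.0])

matrix = np.corrcoef(a, b)   # 2x2 matrix with ones on the diagonal
r_numpy = matrix[0, 1]

# Pearson r: covariance of the centered vectors over the product of their norms.
a_c, b_c = a - a.mean(), b - b.mean()
r_manual = (a_c * b_c).sum() / np.sqrt((a_c ** 2).sum() * (b_c ** 2).sum())

print(r_numpy, r_manual)
```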
