# Single model

Single models can be trained using the `GaussianProcessRegressionPipe`. This is a general pipe that works on any supervised learning problem with numerical features and targets. Below, we detail how to use this pipe to train models on AOD data.

## 1. Define the technique and the data spec


- Define the features that will not be used to train the model in a list of strings
    - There are 59 possible features in total, so the model is trained on 59 minus the length of this list


- Define the `GaussianProcessRegressionTechnique` that will be used to train the models
    - This requires an instance of a type that extends `SklearnGPRKernel`; in this example, we use a Matern kernel, provided by the `SklearnGPRKernelMatern` type
    - Notice that the kernel **and** the technique are persistable types that must be upserted.


- Define a `GPRDataSourceSpec` that specifies how to fetch the appropriate data (features and target) to train the GPR. This is also a persistable type that must be upserted.
    - `featuresType`: type from which features are extracted
    - `featuresSpec`: a `FetchSpec` to select the features
    - `excludeFeatures`: as above
    - `targetType`: type from which the target is extracted
    - `targetSpec`: how to find the appropriate targets
    - `targetName`: the target type may have several columns, specify which one to use
    

- Finally, build the `GaussianProcessRegressionPipe` specifying the `dataSourceSpec` and the `technique` that were just created (pass by ID reference)

In [10]:
# features to ignore
excludeFeats = [] #["acure_ait_width", "c_r_correl"]
kernelLen = 59 - len(excludeFeats)

# create kernel
GPR_kernel = c3.SklearnGPRKernelMatern(lengthScale=[1.0]*kernelLen, nu=0.5, coefficient=1.0).build().kernel.upsert()

# define technique
GPR_technique = c3.GaussianProcessRegressionTechnique(
                    randomState=42,
                    kernel = GPR_kernel
).upsert()

# define data source spec
GPR_dataspec = c3.GPRDataSourceSpec(
    featuresType = c3.TypeRef(
        typeName="SimulationModelParameters"
    ),
    featuresSpec=c3.FetchSpec(
        limit=-1
    ),
    excludeFeatures=excludeFeats,
    targetType=c3.TypeRef(
        typeName="Simulation3HourlyAODOutput"
    ),
    targetSpec=c3.FetchSpec(
        filter="geoSurfaceTimePoint.id == '-0.625_-0.938_2017-07-01T00:20:00'"
    ),
    targetName="all"
).upsert()

# create pipe
GPR_pipe = c3.GaussianProcessRegressionPipe(
    technique=GPR_technique,
    dataSourceSpec=GPR_dataspec
)
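
For intuition about what the kernel defined above computes, here is a minimal pure-Python sketch of a Matern kernel with `nu=0.5` and one length scale per feature, multiplied by a constant coefficient. It assumes the c3 wrapper follows the scikit-learn convention, where `nu=0.5` reduces to the exponential kernel `exp(-d)` and `d` is the Euclidean distance after each feature is divided by its length scale. The function name `matern_half` is hypothetical and not part of c3.

```python
import math

def matern_half(x, y, length_scale, coefficient=1.0):
    """Matern kernel with nu=0.5: coefficient * exp(-d).

    d is the Euclidean distance between x and y after each feature
    has been divided by its own length scale (anisotropic kernel).
    """
    if not (len(x) == len(y) == len(length_scale)):
        raise ValueError("x, y and length_scale must have the same length")
    d = math.sqrt(sum(((a - b) / l) ** 2 for a, b, l in zip(x, y, length_scale)))
    return coefficient * math.exp(-d)

n_features = 59                    # no excluded features, as in the cell above
length_scale = [1.0] * n_features  # same initial value as lengthScale above

x = [0.5] * n_features
y = list(x)
y[0] = 1.5                         # the two points differ by 1.0 in one feature

k_same = matern_half(x, x, length_scale)  # identical points give the coefficient
k_diff = matern_half(x, y, length_scale)  # distance 1.0 gives exp(-1)
```

A very large length scale shrinks that feature's contribution to `d`, so the kernel becomes nearly insensitive to it.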

## 2. Get features and target

- Use the `GaussianProcessRegressionPipe` methods to grab the features and target.
    - `getFeatures()` collects features
    - `getTarget()` collects target
    
They return c3 `Dataset`s that can easily be converted to Pandas dataframes.

In [11]:
X = GPR_pipe.getFeatures()
dfX = c3.Dataset.toPandas(dataset=X)
dfX

Unnamed: 0,acure_bl_nuc,acure_ait_width,acure_cloud_ph,acure_carb_ff_ems,acure_carb_ff_ems_eur,acure_carb_ff_ems_nam,acure_carb_ff_ems_chi,acure_carb_ff_ems_asi,acure_carb_ff_ems_mar,acure_carb_ff_ems_r,...,acure_oxidants_o3,bparam,two_d_fsd_factor,c_r_correl,acure_autoconv_exp_lwp,acure_autoconv_exp_nd,dbsdtbs_turb_0,ai,m_ci,a_ent_1_rp
0,0.500000,0.650000,0.396000,1.0,0.500000,0.500000,0.500000,0.500000,0.500000,0.500000,...,0.576175,0.500000,0.400000,0.900000,0.275862,0.605000,0.150000,0.514000,0.333333,0.460000
1,0.470000,0.500000,0.500000,1.0,0.530000,0.470000,0.530000,0.470000,0.530000,0.470000,...,0.500000,0.500000,0.500000,0.500000,0.500000,0.500000,0.500000,0.500000,0.500000,0.500000
2,0.969888,0.083081,0.478474,1.0,0.577525,0.231615,0.283145,0.050178,0.419984,0.914346,...,0.017104,0.927093,0.833905,0.610920,0.993935,0.755788,0.774187,0.960911,0.988952,0.508725
3,0.132847,0.445265,0.390414,1.0,0.925921,0.630498,0.187251,0.817525,0.147787,0.462419,...,0.010731,0.950732,0.902536,0.780157,0.267910,0.018570,0.106893,0.218308,0.163327,0.936031
4,0.058261,0.630422,0.132292,1.0,0.009463,0.338064,0.913479,0.575490,0.412795,0.271141,...,0.779013,0.129769,0.712185,0.552866,0.328090,0.651008,0.613814,0.101666,0.254514,0.089525
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
216,0.591530,0.996801,0.170201,1.0,0.677187,0.865209,0.194226,0.526719,0.355815,0.187145,...,0.836703,0.515917,0.891746,0.418227,0.732156,0.900471,0.035621,0.175431,0.482366,0.400788
217,0.774235,0.165151,0.881014,1.0,0.038024,0.157743,0.035123,0.946019,0.699902,0.097885,...,0.787844,0.786932,0.231686,0.840574,0.349984,0.433408,0.930975,0.119901,0.777433,0.116056
218,0.227072,0.231834,0.185796,1.0,0.997232,0.609053,0.753345,0.219389,0.438737,0.928641,...,0.709293,0.204519,0.096402,0.147632,0.313749,0.741739,0.144573,0.488763,0.245905,0.686731
219,0.047377,0.633909,0.721278,1.0,0.751703,0.047339,0.692896,0.994552,0.155710,0.762626,...,0.214451,0.303059,0.898960,0.114170,0.725976,0.766313,0.259877,0.872825,0.966951,0.826032


In [15]:
y = GPR_pipe.getTarget()
dfy = c3.Dataset.toPandas(dataset=y)
dfy

Unnamed: 0,all
0,0.335783
1,0.313286
2,0.539254
3,0.266760
4,0.328716
...,...
216,0.378555
217,0.338627
218,0.257918
219,0.182794


## 3. Train the model, save

- Train the model with the data from the previous section.
- Save the trained model by upserting the trained pipe that `train` returns.

In [4]:
GPR_trained = GPR_pipe.train(input=X, targetOutput=y)

In [5]:
GPR_trained.upsert()

c3.GaussianProcessRegressionPipe(
 id='7b2fabcb-63fb-4286-aa07-db384a5d033c',
 meta=c3.Meta(
        created=datetime.datetime(2022, 7, 15, 16, 27, 20, tzinfo=datetime.timezone.utc),
        updated=datetime.datetime(2022, 7, 15, 16, 27, 20, tzinfo=datetime.timezone.utc),
        timestamp=datetime.datetime(2022, 7, 15, 16, 27, 20, tzinfo=datetime.timezone.utc)),
 version=1,
 typeIdent='PIPE:LF:GPREG',
 noTrainScore=False,
 untrainableOverride=False)

In [19]:
GPR_pipe.upsert()

C3RuntimeException: 500 - NonNullField - c3.engine.database.DbException_validationErrors [909.2971]
message: "Write failed: Required field GaussianProcessRegressionPipe.technique not set.
Required field GaussianProcessRegressionPipe.dataSourceSpec not set."
JSON: {"this": {"type": "GaussianProcessRegressionPipe", "noTrainScore": false, "untrainableOverride": false, "technique": {"type": "GaussianProcessRegressionTechnique", "randomState": 42, "kernel": {"type": "SklearnGPRKernel", "name": "Matern", "hyperParameters": {"lengthScale": {"type": "[double]", "value": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]}, "coefficient": {"type": "double", "value": 1.0}, "nu": {"type": "double", "value": 0.5}}, "pickledKernel": "eJxrYEouzs5JTSzK00tPLC0uzkzMiy8oyk9OLS7Wy04tykvNKeYKKMpPKU0u4Spk0GwsZKwtZNKIYGJgYMg2LGQmrNk5P6+4JDGvxBvM5ypkAZrBWlvIphHBBzQjGSobX5aYU5payO5u/4EBDCJEMWTjk/JL81KKCznc7Z782NrxOOOju8OPLLDqtkLO0iSIo4wKuQg7yjexBMjiKuQGOoantpBXI4IHqDcnNS+9JCO+ODkxJ7WQLzklMydHLx5EcsXn5CemxJdUFqRyFfJHsIDUZhaXFAq0FgoGFQrFFgprwB0+yhiZjNTWQpGgQtEIYQbUhARLtWLYUq04OM3mlRZKuNs/gJhTmlSapAcAuOfaqw=="}}, "dataSourceSpec": {"type": "GPRDataSourceSpec", "featuresType": {"type": "TypeRef", "typeName": "SimulationModelParameters"}, "featuresSpec": {"type": "FetchSpec", "offset": 0, "limit": -1}, "excludeFeatures": ["acure_ait_width", "c_r_correl"], "targetType": {"type": "TypeRef", "typeName": "Simulation3HourlyAODOutput"}, "targetSpec": {"type": "FetchSpec", "filter": "geoSurfaceTimePoint.id == '-0.625_-0.938_2017-07-01T00:20:00'", "offset": 0, "limit": 2000}, "targetName": "all"}}, "srcObj": null, "spec": null}

## 4. Retrieve trained model and learned parameters

- To retrieve a model, use the ID that was returned when the trained pipe was upserted.
- Deserialize the object to work with a regular Sklearn object in pure Python.

In [9]:
pipe = c3.GaussianProcessRegressionPipe.get('f7f3f16a-e14b-4b41-ba16-9601495cf61b')

In [10]:
learnedKernel = c3.PythonSerialization.deserialize(serialized=pipe.trainedModel.model).kernel_
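
The serialized strings shown in this notebook start with `eJ`, which is what zlib-compressed bytes typically look like after base64 encoding. A plausible mental model, and only an assumption about what `PythonSerialization` does internally, is pickle, then zlib, then base64. The sketch below round-trips a plain dictionary that way using only the standard library. `serialize_sketch` and `deserialize_sketch` are hypothetical names; for real pipes, keep using `c3.PythonSerialization.deserialize`.

```python
import base64
import pickle
import zlib

def serialize_sketch(obj):
    """Pickle, compress, and base64-encode an object (assumed scheme)."""
    raw = pickle.dumps(obj)
    return base64.b64encode(zlib.compress(raw)).decode("ascii")

def deserialize_sketch(text):
    """Reverse serialize_sketch. Only unpickle data from a trusted source."""
    return pickle.loads(zlib.decompress(base64.b64decode(text)))

params = {"nu": 0.5, "length_scale": [1.0, 1.0, 1.0]}
encoded = serialize_sketch(params)
decoded = deserialize_sketch(encoded)
```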



In [15]:
learnedKernel.get_params()

{'k1': 0.477**2,
 'k2': Matern(length_scale=[2.43e+03, 397, 1, 2.86e+04, 2.66e+04, 3.78e+04, 3.6e+04, 4.33e+04, 1.98e+04, 1, 3.61e+04, 4.15e+04, 2.62e+04, 1.73e+04, 3.39e+04, 3.73e+04, 1, 4.67e+04, 1.24e+04, 3.29e+04, 3.78e+04, 99.5, 2.7e+04, 1e+05, 3.99e+04, 39.4, 26.6, 1, 4.29e+04, 267, 503, 4.13e+04, 196, 227, 41.2, 145, 3.44e+04, 268, 39.8, 87.9, 84.9, 3.46e+04, 4.53e+04, 4.92e+04, 196, 2.94e+04, 116, 1.45e+04, 4.09e+04, 3.4e+04, 154, 98.1, 697, 3.19e+04, 158, 3.35e+04, 128], nu=0.5),
 'k1__constant_value': 0.22799394201892978,
 'k1__constant_value_bounds': (1e-05, 100000.0),
 'k2__length_scale': array([2.42691844e+03, 3.96709075e+02, 1.00000000e+00, 2.85747307e+04,
        2.65915401e+04, 3.78184864e+04, 3.60346620e+04, 4.33251266e+04,
        1.98155784e+04, 1.00000000e+00, 3.60757864e+04, 4.15413535e+04,
        2.62073666e+04, 1.73274399e+04, 3.39139364e+04, 3.73305580e+04,
        1.00000000e+00, 4.66695796e+04, 1.24083684e+04, 3.28689190e+04,
        3.77901993e+04, 9.9522415

# Multiple models trained as batch job

Since the `AODGaussianMLTrainingJob` batch job is constructed for AOD data, a lot of what we had to specify to train a single model using `GaussianProcessRegressionPipe` is already implemented behind the scenes.

## 1. Define the job parameters

Here, the key ingredients are:
- A `GeoSurfaceTimePoint` filter: each model is trained for a specific *(lat, lon, time)* key
- The features to exclude, as in the single model example
- The `SklearnGPRKernel` to train the models, as in the single model example
- The `GaussianProcessRegressionTechnique` to train the models, as in the single model example


After these are specified, combine them into an instance of `AODGaussianMLTrainingJobOptions`:
- `batchSize` determines how many models are trained in each batch. One model per batch is fine for fewer than roughly 10,000 models; beyond that, the overhead of creating the batches becomes noticeable
- `gstpFilter`, `targetName`, `gprTechnique`, and `excludeFeatures` follow immediately from the above definitions
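
As a rough illustration of the `batchSize` trade-off, the number of batches is the number of models divided by `batchSize`, rounded up. The helper below is a hypothetical sketch (not part of c3), and the model counts are arbitrary example values.

```python
import math

def number_of_batches(n_models, batch_size):
    """Return how many batches are needed to cover n_models."""
    if n_models < 0 or batch_size < 1:
        raise ValueError("n_models must be >= 0 and batch_size must be >= 1")
    return math.ceil(n_models / batch_size)

one_per_batch = number_of_batches(10000, 1)   # one batch per model
ten_per_batch = number_of_batches(25000, 10)  # ten times fewer batches
```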


Finally, create the job using the options instance.

In [1]:
lat1 = -30.0
lat2 = 10.0
lon1 = -45.0
lon2 = 40.0
time1 = "2017-07-01T00:00:00"
time2 = "2017-07-01T23:59:59"
gstpFilter = c3.Filter() \
    .ge("latitude", lat1) \
    .and_().le("latitude", lat2) \
    .and_().ge("longitude", lon1) \
    .and_().le("longitude", lon2) \
    .and_().ge("time", time1) \
    .and_().le("time", time2)
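
To make the meaning of the filter concrete, the following sketch applies the same inclusive bounds to plain Python values. It only illustrates the logic and is not how c3 evaluates filters. `in_gstp_filter` is a hypothetical helper, and it relies on the fact that ISO 8601 timestamps written in the same fixed-width format sort chronologically as strings.

```python
# Same bounds as the filter above (all comparisons are inclusive).
LAT_RANGE = (-30.0, 10.0)
LON_RANGE = (-45.0, 40.0)
TIME_RANGE = ("2017-07-01T00:00:00", "2017-07-01T23:59:59")

def in_gstp_filter(latitude, longitude, time):
    """Return True if a (lat, lon, time) key falls inside the filter box."""
    return (
        LAT_RANGE[0] <= latitude <= LAT_RANGE[1]
        and LON_RANGE[0] <= longitude <= LON_RANGE[1]
        and TIME_RANGE[0] <= time <= TIME_RANGE[1]
    )

inside = in_gstp_filter(-0.625, -0.9375, "2017-07-01T00:20:00")
outside = in_gstp_filter(20.0, -0.9375, "2017-07-01T00:20:00")  # latitude too high
```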

In [2]:
excludeFeats = ["acure_anth_so2", "acure_carb_bb_ems", "acure_carb_ff_ems", "acure_carb_res_ems"]
kernelLen = 59 - len(excludeFeats)

GPR_kernel = c3.SklearnGPRKernelMatern(lengthScale=[1.0]*kernelLen, nu=0.5, coefficient=1.0).build().kernel.upsert()

GPR_technique = c3.GaussianProcessRegressionTechnique(
                    randomState=42,
                    kernel = GPR_kernel
).upsert()

jobOptions = c3.AODGaussianMLTrainingJobOptions(
    batchSize=1,
    gstpFilter=gstpFilter,
    targetName="all",
    gprTechnique=GPR_technique,
    excludeFeatures=excludeFeats
)

job = c3.AODGaussianMLTrainingJob(
    options=jobOptions
).upsert()

## 2. Dispatch job and verify completion

- Dispatch the job using the `start()` method
- Check on progress using the `status()` method and/or by checking the `InvalidationQueue` with `c3Grid(InvalidationQueue.countAll())` *in the static console*
- If necessary, cancel the job using the `cancel()` method
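
When a job runs for a long time, a small polling loop saves repeated manual calls to `status()`. The sketch below assumes only what the outputs in this section show: `status()` returns an object with a `status` string that reads `'running'` while the job is active. Treating every other value as final is an assumption, as are the helper name `wait_for_job`, the `'completed'` status string, and the `FakeJob` stand-in that lets the example run without a c3 connection.

```python
import time
from types import SimpleNamespace

def wait_for_job(job, poll_seconds=30.0, timeout_seconds=3600.0):
    """Poll job.status() until it stops reporting 'running' or times out."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        current = job.status()
        if current.status != "running":
            return current
        if time.monotonic() >= deadline:
            raise TimeoutError("job is still running after the timeout")
        time.sleep(poll_seconds)

class FakeJob:
    """Stand-in for a c3 job: reports 'running' a few times, then a final status."""

    def __init__(self, running_polls, final_status):
        self._remaining = running_polls
        self._final = final_status

    def status(self):
        if self._remaining > 0:
            self._remaining -= 1
            return SimpleNamespace(status="running")
        return SimpleNamespace(status=self._final)

# 'completed' is an assumed status name used only for this demonstration.
final = wait_for_job(FakeJob(running_polls=3, final_status="completed"), poll_seconds=0.01)
```

With a real job, the call would simply be `wait_for_job(job)`.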

In [3]:
job.start()

c3.BatchJobStatus(
 started=datetime.datetime(2022, 9, 1, 14, 33, 7, tzinfo=datetime.timezone.utc),
 startedby='babreu@illinois.edu',
 status='running')

In [4]:
job.status()

c3.BatchJobStatus(
 started=datetime.datetime(2022, 9, 1, 14, 33, 7, tzinfo=datetime.timezone.utc),
 startedby='babreu@illinois.edu',
 status='running',
 newBatchSubmitted=True)

In [5]:
job.cancel()

c3.BatchJobStatus(
 started=datetime.datetime(2022, 9, 1, 14, 33, 7, tzinfo=datetime.timezone.utc),
 startedby='babreu@illinois.edu',
 completed=datetime.datetime(2022, 9, 1, 14, 39, 26, tzinfo=datetime.timezone.utc),
 status='canceled',
 newBatchSubmitted=True)

# Retrieve learned parameters

Batch jobs can take a long time depending on the number of `GeoSurfaceTimePoint` instances enclosed by the filter that is passed as a job option. The `AODGPRModelFinder` utility type has methods that simplify finding the appropriate models and extracting the learned parameters from them.

## 1. Specify again the parameters that define the model

This may sound repetitive in this notebook because we have just defined all of these parameters, but in a broader context, where you do not necessarily remember how the models were trained, it can be very helpful.




**The important thing to notice here is that the kernel `SklearnGPRKernel` and the technique `GaussianProcessRegressionTechnique` *must not be upserted*. They are created in memory only for the sole purpose of providing the fields to look for the models. *Only upsert these when actually training models, not searching for them.***




The `extractLearnedParametersJob` method launches a `DynMapReduce` job that collects the hyperparameters in parallel. The arguments are:
- `excludeFeatures`: the features that were excluded for training (see above)
- `gstpFilter`: a `GeoSurfaceTimePoint` filter
- `targetName`: the name of the target that was used to train the models
- `GPR_technique`: the `GaussianProcessRegressionTechnique` that was used to train the models
- `batchSize`: how many models to fetch in each batch (~10 is good)

This will return a job instance. Use the `status()` method to check on progress.

In [1]:
lat1 = -30.0
lat2 = 10.0
lon1 = -45.0
lon2 = 40.0
time1 = "2017-07-01T00:00:00"
time2 = "2017-07-01T23:59:59"
gstpFilter = c3.Filter() \
    .ge("latitude", lat1) \
    .and_().le("latitude", lat2) \
    .and_().ge("longitude", lon1) \
    .and_().le("longitude", lon2) \
    .and_().ge("time", time1) \
    .and_().le("time", time2)

excludeFeats = ["acure_anth_so2", "acure_carb_bb_ems", "acure_carb_ff_ems", "acure_carb_res_ems"]
kernelLen = 59 - len(excludeFeats)

GPR_kernel = c3.SklearnGPRKernelMatern(lengthScale=[1.0]*kernelLen, nu=0.5, coefficient=1.0).build().kernel

GPR_technique = c3.GaussianProcessRegressionTechnique(
                    randomState=42,
                    kernel = GPR_kernel
)

In [2]:
job = c3.AODGPRModelFinder.extractLearnedParametersJob(excludeFeats, gstpFilter, "all", GPR_technique, 10)

In [6]:
job.status()

c3.MapReduceStatus(
 started=datetime.datetime(2022, 11, 7, 13, 6, 48, tzinfo=datetime.timezone.utc),
 startedby='jcarzon@andrew.cmu.edu',
 completed=datetime.datetime(2022, 11, 7, 13, 10, 51, tzinfo=datetime.timezone.utc),
 status='canceled',
 step='map')

## 2. Cast the results into a dataframe

Once the job is complete, the `AODGPRModelFinder.getDataframeFromJob` method can be used to gather the results from the `DynMapReduce` job into a Pandas dataframe.

In [7]:
df = c3.AODGPRModelFinder.getDataframeFromJob(job)

In [8]:
df

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,49,50,51,52,53,54,modelId,latitude,longitude,time
0,477.589894,26389.964348,17830.306158,13025.238330,37326.271992,23790.255157,38716.632380,36114.558567,16616.476091,23587.198932,...,92.528024,350.771145,38750.571208,154.860874,16389.621089,109.744419,1cc80310-c7b8-4884-ac1e-3dd93e6102bf,-0.625,-0.9375,2017-07-01T00:20:00
1,42924.870680,305.491024,39559.295596,57173.068482,37917.578006,78850.762243,53692.740290,10426.149411,9862.956538,23631.892426,...,48.190895,67.022936,108.688793,100000.000000,94.561719,99.219624,5f28a33f-1afb-4464-9e51-08dce2fa242a,-0.625,-10.3125,2017-07-01T00:20:00
2,37698.854961,26607.698034,22214.939421,47000.032354,30832.268761,39321.331343,7653.429931,27208.253663,268.993991,397.588438,...,95.108614,14955.932297,182.192647,256.281376,89.046448,150.646636,ded50bd6-890b-4d13-b9e2-9cc543a2cbb5,-0.625,-12.1875,2017-07-01T00:20:00
3,93399.817864,74168.094552,387.081240,88235.008654,220.699683,85663.120079,52365.183504,53227.260262,83439.129013,11598.186937,...,324.977964,86148.221946,150.150294,40287.448701,85.163734,154.198848,013803ad-eed7-4130-a704-e5e11973202e,-0.625,-14.0625,2017-07-01T00:20:00
4,31916.846487,556.957235,13258.590912,32560.165546,295.792115,29434.733614,30451.066749,4362.307373,27615.829079,804.737896,...,6897.250293,2831.602132,158.158443,134.484771,88.727139,152.773456,1728add8-d234-4d15-b301-3411b44a81ec,-0.625,-15.9375,2017-07-01T00:20:00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1435,687.739071,96486.702776,11093.045032,62153.549757,8518.206492,85113.047903,45393.629266,24764.564632,29761.109269,76378.596324,...,43567.485061,570.195478,10798.972053,13161.230617,346.959669,67543.587532,e91dbc70-5f17-4652-9e84-0a84ff8b0edb,-6.875,36.5625,2017-07-01T00:20:00
1436,73629.393807,46251.589482,62204.757976,89383.908487,44306.137459,83320.638360,84716.569727,7889.185166,182.473894,148.826333,...,66702.684776,22350.180322,160.907470,53273.943038,83913.061891,110.063955,d30d3edf-d6e6-4227-a399-85a3f881993f,-6.875,38.4375,2017-07-01T00:20:00
1437,118.335324,89628.482433,40488.415466,100000.000000,371.205307,119.975042,100000.000000,100000.000000,82463.404520,100000.000000,...,119.204108,186.113064,100000.000000,100000.000000,73709.550815,75.773011,f8ded000-d08f-4eba-ab8e-291a8eb68a94,-6.875,4.6875,2017-07-01T00:20:00
1438,202.022736,354.269773,97948.049433,3244.151521,27342.146774,100000.000000,89661.139964,75178.934541,51734.357715,295.061647,...,90.855911,133.623972,95880.149749,878.542935,100000.000000,157.177949,81b10e07-ba79-4035-a197-c3c9d3e96857,-6.875,6.5625,2017-07-01T00:20:00


# Making predictions

Once models are trained and saved as pipes, we can retrieve the entire pipe and use the remaining functionalities, most importantly the `process` method that makes predictions. Additional Sklearn methods can also be directly called by deserializing the `trainedModel.model`, making it a regular Sklearn/Python object. 

## 1. Retrieve a model

If we know the ID of the target model, it is best to use the `GaussianProcessRegressionPipe.get(<id>)` method. Otherwise, we can use filters to find the relevant model via the `dataSourceSpec` and `technique` fields. Here, we will simply get the first pipe in the table.

In [21]:
pipe = c3.GaussianProcessRegressionPipe.fetch({
    "limit": 1 
}).objs[0]

In [22]:
pipe

c3.GaussianProcessRegressionPipe(
 id='0086b35d-685e-4846-9270-469991ae522e',
 meta=c3.Meta(
        tenantTagId=151,
        tenant='dev',
        tag='tc02d',
        created=datetime.datetime(2022, 7, 18, 19, 43, 45, tzinfo=datetime.timezone.utc),
        createdBy='worker',
        updated=datetime.datetime(2022, 7, 18, 19, 43, 45, tzinfo=datetime.timezone.utc),
        updatedBy='worker',
        timestamp=datetime.datetime(2022, 7, 18, 19, 43, 45, tzinfo=datetime.timezone.utc),
        fetchInclude='[]',
        fetchType='GaussianProcessRegressionPipe'),
 version=1,
 typeIdent='PIPE:LF:GPREG',
 noTrainScore=False,
 persistedModelCategory='unidentified',
 untrainableOverride=False,
 technique=c3.GaussianProcessRegressionTechnique(id='JYS'),
 trainedModel=c3.MLTrainedModelArtifact(
                model='eJzsvGlTVd0WLkYnKKKCioDSiiKgAooNiLIQRVAQFRBBBKSTvtk0CgjS930vfd/3fQ9VY/yIVCqVSqqSH5B8u9+SZ+z3nFdO7rl1700qlUolvlXnbPZeazZjPOMZz5hzrlWqFZeTmpYQk53hlBiTl5OTHJMRnZWdGZeQk+MUnZiVre/7j2/

## 2. Create synthetic features

For this tutorial, we will simply try to run predictions on random features. But to make it work, we need to know how many features this model was trained with. This comes from the `dataSourceSpec.excludeFeatures` field.

In [23]:
sourceSpec = c3.GPRDataSourceSpec.get(pipe.dataSourceSpec.id)
nExcFeats = len(sourceSpec.excludeFeatures)
nFeatures = 59 - nExcFeats

In [24]:
nFeatures

57

In [25]:
# create synthetic data with numpy
import numpy as np

synth = np.random.rand(50, nFeatures)

# cast it into a c3.Dataset
synthDataset = c3.Dataset.fromPython(pythonData=synth)
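
Since the model is expected to receive exactly `nFeatures` columns, it can help to check the shape of the inputs before calling the pipe. The sketch below uses only the standard library. `expected_feature_count` and `check_feature_rows` are hypothetical helpers, and the two excluded names are the ones that appear in the upsert error dump earlier in this notebook.

```python
import random

TOTAL_FEATURES = 59  # total number of available features, as stated above

def expected_feature_count(exclude_features, total=TOTAL_FEATURES):
    """Number of features the model was trained with."""
    unique = set(exclude_features)
    if len(unique) != len(exclude_features):
        raise ValueError("excludeFeatures contains duplicates")
    if len(unique) > total:
        raise ValueError("more excluded features than available features")
    return total - len(unique)

def check_feature_rows(rows, n_features):
    """Raise if any row has the wrong width; otherwise return (n_rows, n_features)."""
    for i, row in enumerate(rows):
        if len(row) != n_features:
            raise ValueError(f"row {i} has {len(row)} values, expected {n_features}")
    return len(rows), n_features

n_features = expected_feature_count(["acure_ait_width", "c_r_correl"])
rng = random.Random(0)
synth_rows = [[rng.random() for _ in range(n_features)] for _ in range(50)]
shape = check_feature_rows(synth_rows, n_features)
```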

## 3. Using the `process` method

This method calls the Sklearn `predict` method. The features are passed as the `input` argument, and boolean flags select whether to also compute standard deviations **or** covariances (Sklearn does not support computing both at once).

### a) Plain prediction

In [26]:
y_a = pipe.process(input=synthDataset)
df_a = c3.Dataset.toPandas(y_a)
df_a

Unnamed: 0,0
0,0.0829
1,0.063641
2,0.103276
3,0.050532
4,0.073837
5,0.075007
6,0.065491
7,0.139931
8,0.080949
9,0.1012


### b) Predictions + standard deviations

In [27]:
y_b = pipe.process(input=synthDataset, computeStd=True)
df_b = c3.Dataset.toPandas(y_b)
df_b

Unnamed: 0,0,1
0,0.0829,0.534645
1,0.063641,0.547806
2,0.103276,0.514916
3,0.050532,0.555588
4,0.073837,0.541112
5,0.075007,0.541103
6,0.065491,0.547096
7,0.139931,0.474565
8,0.080949,0.536096
9,0.1012,0.518788


### c) Predictions + covariances

In [28]:
y_c = pipe.process(input=synthDataset, computeCov=True)
df_c = c3.Dataset.toPandas(y_c)
df_c

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,41,42,43,44,45,46,47,48,49,50
0,0.0829,0.285846,0.104318,0.110771,0.144026,0.178815,0.094722,0.118427,0.116779,0.109533,...,0.210857,0.113066,0.133251,0.209529,0.096146,0.16533,0.126256,0.106633,0.091007,0.12526
1,0.063641,0.104318,0.300091,0.163022,0.103408,0.12102,0.071726,0.124227,0.101725,0.111171,...,0.091996,0.080539,0.093744,0.079883,0.180622,0.093628,0.218403,0.075168,0.078934,0.05432
2,0.103276,0.110771,0.163022,0.265139,0.080147,0.118378,0.066192,0.11042,0.153774,0.101394,...,0.110329,0.084136,0.076003,0.082778,0.134936,0.08251,0.143574,0.110462,0.078695,0.069351
3,0.050532,0.144026,0.103408,0.080147,0.308678,0.136435,0.12883,0.105003,0.0749,0.137144,...,0.118413,0.092966,0.233231,0.152375,0.115581,0.202661,0.143243,0.064103,0.107816,0.079472
4,0.073837,0.178815,0.12102,0.118378,0.136435,0.292802,0.090436,0.087078,0.10809,0.101459,...,0.159368,0.071583,0.114392,0.145985,0.118457,0.121746,0.146441,0.081304,0.084727,0.085337
5,0.075007,0.094722,0.071726,0.066192,0.12883,0.090436,0.292792,0.068658,0.07334,0.179937,...,0.097881,0.077781,0.15345,0.112092,0.108573,0.12676,0.09507,0.062422,0.206387,0.086928
6,0.065491,0.118427,0.124227,0.11042,0.105003,0.087078,0.068658,0.299314,0.091206,0.105627,...,0.10074,0.176801,0.108701,0.102265,0.094474,0.131272,0.125002,0.093081,0.075383,0.080619
7,0.139931,0.116779,0.101725,0.153774,0.0749,0.10809,0.07334,0.091206,0.225212,0.09962,...,0.132982,0.087065,0.075127,0.095788,0.098507,0.084679,0.102545,0.149629,0.087475,0.099714
8,0.080949,0.109533,0.111171,0.101394,0.137144,0.101459,0.179937,0.105627,0.09962,0.287399,...,0.109192,0.105862,0.159815,0.112451,0.153094,0.141059,0.137132,0.083819,0.199762,0.089453
9,0.1012,0.097737,0.10445,0.114208,0.088015,0.074139,0.094524,0.146879,0.111903,0.144024,...,0.098227,0.159748,0.101734,0.09135,0.104959,0.111114,0.107218,0.115542,0.117688,0.095434
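
The outputs of (b) and (c) are consistent with each other: each standard deviation in (b) should equal the square root of the corresponding diagonal entry of the covariance matrix in (c). The quick check below uses values copied from the first four rows of the tables above. In (c), the diagonal sits one column to the right of the prediction column. The tolerance accounts for the six-digit rounding of the printed output.

```python
import math

# Standard deviations from (b), rows 0 to 3 (column 1).
stds = [0.534645, 0.547806, 0.514916, 0.555588]

# Diagonal covariance entries from (c), rows 0 to 3.
variances = [0.285846, 0.300091, 0.265139, 0.308678]

# Each standard deviation should match sqrt(variance) up to rounding.
consistent = all(
    math.isclose(math.sqrt(var), std, abs_tol=1e-5)
    for var, std in zip(variances, stds)
)
```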


## 4. Using Sklearn directly

Once we have the pipe, we can simply deserialize the trained model using `PythonSerialization.deserialize`.

In [29]:
pipe = c3.GaussianProcessRegressionPipe.fetch({
    "limit": 1 
}).objs[0]

In [30]:
sklearn_model = c3.PythonSerialization.deserialize(serialized=pipe.trainedModel.model)



In [31]:
sklearn_model

GaussianProcessRegressor(kernel=1**2 * Matern(length_scale=[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], nu=0.5))