Copyright 2023 RISC Zero, Inc.

 Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.

This notebook is a guide to training classifiers and regression models with the SmartCore crate. Before training a model in Rust, the data should be processed in Python, and the data and classes should be exported as separate CSV files.

Start by importing the SmartCore and Polars crates as dependencies. Outside of a Jupyter notebook environment, you can add these to your `Cargo.toml` file or run `cargo add <crate-name>` on the command line.

Be sure to enable the `serde` feature for the smartcore crate; otherwise the SmartCore CSV readers will not work.

In [None]:
:dep smartcore = {version = "0.3.2", features = ["serde"]}
:dep polars = "*"
:dep serde_json = "1.0"
:dep rmp-serde = "1.1.2"
:dep ndarray = "0.15"

In [None]:
use smartcore::tree::decision_tree_classifier::*;
use smartcore::readers;
use smartcore::linalg::basic::matrix::DenseMatrix;
use smartcore::linear::linear_regression::LinearRegression;
use serde_json::{self, Value};
use ndarray::array;

use std::fs::File;
use std::io::{Read, Write};
use polars::prelude::*;
use rmp_serde;

We train the model using the same data used in the `ezkl.ipynb` notebook.

In [None]:
// Example data
let x = DenseMatrix::from_2d_array(&[
    &[1.0, 1.0],
    &[1.0, 2.0],
    &[2.0, 2.0],
    &[2.0, 3.0],
]);
let y = array![6, 8, 9, 11];

let y_vec_i64: Vec<i64> = y.to_vec();

let y_vec_u32: Vec<u32> = y_vec_i64.iter().map(|x| *x as u32).collect();

// Train the model
let lr = LinearRegression::fit(&x, &y_vec_u32, Default::default()).unwrap();
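
The conversion above uses an `as` cast, which silently wraps negative values (for example, `-1_i64 as u32` becomes `4294967295`). If the targets might ever be negative or too large, a checked conversion fails loudly instead. The sketch below is dependency-free, and the helper name `to_u32_targets` is hypothetical, introduced only for illustration.

```rust
use std::convert::TryFrom;
use std::num::TryFromIntError;

/// Convert signed targets to u32, returning an error for any value
/// that does not fit (negative or larger than u32::MAX).
/// Hypothetical helper, not part of SmartCore.
fn to_u32_targets(values: &[i64]) -> Result<Vec<u32>, TryFromIntError> {
    values.iter().map(|&v| u32::try_from(v)).collect()
}

fn main() {
    let y = vec![6_i64, 8, 9, 11];
    let converted = to_u32_targets(&y).expect("all targets fit in u32");
    println!("{:?}", converted);

    // A negative target is rejected rather than wrapped.
    assert!(to_u32_targets(&[-1]).is_err());
}
```

In the notebook cell, the result of such a helper could be passed to `LinearRegression::fit` in place of `y_vec_u32`.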

In [None]:
y_vec_u32

Read in the input data.

In [None]:
// Read the JSON data
let mut file = File::open("./input.json").expect("file not found");
let mut contents = String::new();
file.read_to_string(&mut contents).expect("something went wrong reading the file");
let v: Value = serde_json::from_str(&contents)?;

// Extract input data and output data from JSON
let input_data = v["input_data"].as_array().unwrap();

// Get the first array from input_data, clone the data to avoid borrowing issues
let input_data: Vec<f64> = input_data.get(0)
    .unwrap()
    .as_array()
    .unwrap()
    .iter()
    .map(|x| x.as_f64().unwrap())
    .collect();

// Format the input data for the model
let x = DenseMatrix::from_2d_array(&[&input_data]);

We call predict() on the model in order to perform inference.

In [None]:
lr.predict(&x).unwrap()
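
For a linear model, inference reduces to a dot product plus an intercept. The four training rows used earlier happen to satisfy y = 1*x1 + 2*x2 + 3 exactly, so a well-fitted model should produce predictions close to what the dependency-free sketch below computes. The helper `predict_linear` and the hard-coded coefficients are illustrative assumptions derived from the example data, not values read from the SmartCore model.

```rust
/// Evaluate a linear model: dot(coefficients, row) + intercept.
/// Hypothetical helper for illustration only.
fn predict_linear(coefficients: &[f64], intercept: f64, row: &[f64]) -> f64 {
    coefficients
        .iter()
        .zip(row)
        .map(|(c, v)| c * v)
        .sum::<f64>()
        + intercept
}

fn main() {
    // The same example data used for training.
    let rows = [[1.0, 1.0], [1.0, 2.0], [2.0, 2.0], [2.0, 3.0]];
    let targets = [6.0, 8.0, 9.0, 11.0];

    // Assumed parameters: the example data fit y = 1*x1 + 2*x2 + 3.
    let coefficients = [1.0, 2.0];
    let intercept = 3.0;

    for (row, target) in rows.iter().zip(targets.iter()) {
        let predicted = predict_linear(&coefficients, intercept, row);
        // Each training target is reproduced by the assumed parameters.
        assert!((predicted - target).abs() < 1e-12);
        println!("{:?} -> {}", row, predicted);
    }
}
```

Comparing the output of `lr.predict(&x)` against a hand calculation like this is a quick sanity check before exporting the model.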

The code below lets you export the trained model and the input data as serialized JSON files, which can be imported into the host.

For use in the zkVM, serializing the model and input data as byte arrays is ideal. The code below encodes both with MessagePack (via `rmp_serde`) and writes the resulting byte arrays to JSON files.

In [None]:
let model_bytes = rmp_serde::to_vec(&lr).unwrap();
let data_bytes = rmp_serde::to_vec(&x).unwrap();

let model_json = serde_json::to_string(&model_bytes)?;
let x_json = serde_json::to_string(&data_bytes)?;

let mut f = File::create("tree_model_bytes.json").expect("unable to create file");
f.write_all(model_json.as_bytes()).expect("Unable to write data");

let mut f1 = File::create("tree_model_data_bytes.json").expect("unable to create file");
f1.write_all(x_json.as_bytes()).expect("Unable to write data");
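
Because `serde_json` serializes a `Vec<u8>` as a JSON array of integers, each exported file contains text of the form `[147,1,2,...]`. On the host side, the bytes would typically be recovered with `serde_json::from_str::<Vec<u8>>` and then decoded with `rmp_serde::from_slice`. To make the file format concrete without pulling in any crates, the sketch below parses such an array using only the standard library. The helper name `parse_byte_array_json` and the sample input are hypothetical.

```rust
/// Parse a JSON array of byte values, such as "[147, 1, 2]", into a Vec<u8>.
/// Hypothetical helper that only illustrates the file format; in the host,
/// serde_json would normally do this work.
fn parse_byte_array_json(text: &str) -> Result<Vec<u8>, String> {
    let inner = text
        .trim()
        .strip_prefix('[')
        .and_then(|rest| rest.strip_suffix(']'))
        .ok_or_else(|| "expected a JSON array".to_string())?;

    // An empty array has no elements to parse.
    if inner.trim().is_empty() {
        return Ok(Vec::new());
    }

    inner
        .split(',')
        .map(|item| {
            let item = item.trim();
            // Parsing as u8 rejects anything outside 0..=255.
            item.parse::<u8>()
                .map_err(|e| format!("invalid byte {:?}: {}", item, e))
        })
        .collect()
}

fn main() {
    // Sample input only; real files hold the MessagePack-encoded model.
    let bytes = parse_byte_array_json("[147, 1, 2, 255]").expect("valid byte array");
    println!("{:?}", bytes);
}
```

The recovered byte vector is what would be handed to the MessagePack decoder to rebuild the model or the input matrix.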