Add Serde to CountVectorizer, to export and import it #291

Closed
Bastian1110 opened this issue Feb 22, 2023 · 6 comments

Comments

@Bastian1110

Hello again!

I'm working on a project that compiles a machine learning model to WASM so it can run in a browser behind a UI framework; you can see the repository here.
I think it would be very useful if exporting and importing already-trained models were easier (like joblib with sklearn), since it would open up a world of possibilities for "embedding" machine learning models anywhere!
I have been looking into how serde and ciborium work, and I have managed to export a few models, but it has been very difficult for me. I'd like to help linfa models support this, but my knowledge of Rust is minimal.
In particular, if someone could tell me how to export a CountVectorizer for my project, I would greatly appreciate it.

@YuhanLiin
Collaborator

I don't think CountVectorizer has serde support but it should be pretty easy to add. Can you post your code snippet just to be sure?

@Bastian1110
Author

Sure!
This is the code I'm using to "export" a model with ciborium; this method has worked successfully with linfa-svm.

// In the winequality SVM example, after fitting the SVM model
use ciborium::cbor;
use std::{fs, path::Path};

let model_value = cbor!(model).unwrap();
let mut vec_model = Vec::new();
// Unwrap the result so a failed serialization is not silently ignored
ciborium::ser::into_writer(&model_value, &mut vec_model).unwrap();

// Exporting it to a .cbor file
let path: &Path = Path::new("./model.cbor");
fs::write(path, vec_model).unwrap();

Then you can import the model and use it like this:

// Reading the .cbor file and converting it to a ciborium Value
use ciborium::value::Value;
use linfa_svm::Svm;
use std::{fs::File, io::Read};

let mut file = File::open("./model.cbor").unwrap();
let mut data: Vec<u8> = Vec::new();
file.read_to_end(&mut data).unwrap();
let model_value = ciborium::de::from_reader::<Value, _>(&data[..]).unwrap();

// Recreating the model, which is already trained
let model: Svm<f64, bool> = model_value.deserialized().unwrap();
println!("{}", model);

This works easily with the SVM model. However, the only way I have found to tell that a model does not support serde serialization is to pass it to the cbor! macro; when the model does not implement Serialize, the following error appears:

the trait bound `<MODELNAME>: serde::ser::Serialize` is not satisfied
the following other types implement trait `serde::ser::Serialize`:
  &'a T
  &'a mut T
  ()
  (T0, T1)
  . . .

@Bastian1110
Author

I just cloned your repository with the added serde support, thank you very much!
I tried to test it on the extra-serde branch with the same test described in my other comment: I passed the CountVectorizer to the cbor! macro, but I get the same error:

// In the countvectorization.rs example of linfa-preprocessing (on the extra-serde branch)

let vectorizer_value = cbor!(vectorizer).unwrap();

But the following error occurs:

the trait bound `CountVectorizer: serde::ser::Serialize` is not satisfied
the following other types implement trait `serde::ser::Serialize`:
  &'a T
  &'a mut T
  ()
  (T0, T1)

Maybe I'm testing it wrong? If so, is there another way to test it without merging into the master branch?

@YuhanLiin
Collaborator

Did you enable the serde feature on the crate?

@YuhanLiin
Collaborator

I just tested with the serde feature enabled and I asserted that CountVectorizer: Serialize holds. I'm going to merge the PR into master and you can test it from there.
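
For readers wondering what such an assertion can look like, here is a minimal sketch of a compile-time trait-bound check. It is not the code used in the PR. To keep it buildable with the standard library alone, `Clone` stands in for `serde::Serialize`, and `ToyVectorizer`, `assert_impl_clone`, and `checked_copy` are hypothetical names invented for this illustration.

```rust
// Hypothetical stand-in for `CountVectorizer`; not the linfa type.
#[derive(Clone)]
struct ToyVectorizer {
    vocabulary: Vec<String>,
}

// This function only type-checks when `T` satisfies the bound.
// Assumption: `Clone` is a placeholder here; in a project that depends on
// serde, the bound would be `serde::Serialize` instead.
fn assert_impl_clone<T: Clone>() {}

// Instantiating the helper forces the compiler to prove the bound.
// If `ToyVectorizer` lost its `Clone` impl, this would fail to compile with
// a "trait bound ... is not satisfied" error like the one quoted above.
fn checked_copy(v: &ToyVectorizer) -> ToyVectorizer {
    assert_impl_clone::<ToyVectorizer>();
    v.clone()
}
```

The check costs nothing at runtime: the helper has an empty body, and the failure mode is a compile error rather than a panic, which makes it a quicker probe than routing the value through the cbor! macro.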

@Bastian1110
Author

I just tested it after adding serde to the features in Cargo.toml and it works!
Thanks a lot!
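
For anyone landing here with the same error, the fix above amounts to a Cargo.toml entry along these lines. This is a sketch, not the exact entry used in the thread: the feature name `serde` comes from the discussion, while depending on the upstream git repository is an assumption that applies only until a published release contains the change.

```toml
[dependencies]
# Assumption: tracking the upstream repository's default branch;
# swap in a released version once one includes serde support for CountVectorizer.
linfa-preprocessing = { git = "https://github.com/rust-ml/linfa", features = ["serde"] }
```

Without the `serde` feature the Serialize and Deserialize impls are not compiled in, which is why the cbor! macro reported the unsatisfied trait bound.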
