Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Read model and vocabs from memory #192

Closed
panosk opened this issue May 29, 2020 · 7 comments · Fixed by #193
Closed

Read model and vocabs from memory #192

panosk opened this issue May 29, 2020 · 7 comments · Fixed by #193
Labels
enhancement New feature or request

Comments

@panosk
Copy link
Contributor

panosk commented May 29, 2020

Hi @guillaumekln

I would like to load the model file from memory (in a std::vector<unsigned char>) but I think it's not possible as all related methods use at some point the model directory as an std::string. I can see the necessity in this, as the vocabularies and the vmap are also loaded from this directory.

Still, do you think there could be a use case (apart from mine obviously :)) for some overrides with arguments that will accept std::strings pointing directly to the model and the vocabularies?

@guillaumekln
Copy link
Collaborator

Hi,

Currently, the content of the model directory is an implementation detail and its structure could change in the future.

I'm not sure when would you need to have the model files in memory? As you should load the actual files at some point, isn't it equivalent to build a Model instance and then move it around (the instance being the in-memory representation of the model on disk)? Or are you dynamically generating the model and vocabularies?

@panosk
Copy link
Contributor Author

panosk commented May 29, 2020

I want to extract the files in memory as std::vector<unsigned char>s directly from an archive, for simplicity and security reasons. Then, I would like to reinterpret_cast them in the right type and use them as normally at runtime. Could that work?

Another case I can think of is if someone wants to share vocabularies or vmaps among different models.

@guillaumekln
Copy link
Collaborator

guillaumekln commented May 29, 2020

I'm thinking we could add a ModelReader interface with the following methods (to be refined):

  • std::unique_ptr<std::istream> get_model_binary()
  • bool is_vocabulary_shared() (if true we build a single Vocabulary instance)
  • std::unique_ptr<std::istream> get_source_vocabulary()
  • std::unique_ptr<std::istream> get_target_vocabulary()
  • std::unique_ptr<std::istream> get_vocabulary_mapping() (returns nullptr if no vocabulary mapping)

Then we add a new overload that accepts an instance implementing this interface.

In you code, you would need to extend this base class and implement your own loading logic. I think there are ways to wrap a stringstream over an existing buffer.

What do you think?

@panosk
Copy link
Contributor Author

panosk commented May 29, 2020

That would be great! I need to get a bit deeper into the relevant classes and interfaces in the code, but I'd like to get a first grasp of your idea and the work this will involve: all these methods should be templates with arguments and the derived class will implement the conversion of these arguments into the right types, do I get it right?

@guillaumekln
Copy link
Collaborator

ModelReader would be an abstract class with methods left unimplemented. Derived classes (such as ModelFileReader) can implement arbitrary loading logic as long as they meet the interface: returning a stream over the requested objects.

The main goal is to not integrate your loading logic into the main codebase, as it is very specific to your use case, but allow to plug it.

Do you want to take this one? I can also implement it if you prefer.

@guillaumekln guillaumekln added the enhancement New feature or request label May 29, 2020
@guillaumekln
Copy link
Collaborator

Actually a single method could be enough, assuming you could easily map a filename to a stream:

std::unique_ptr<std::istream> get_file(const std::string& filename)

@panosk
Copy link
Contributor Author

panosk commented May 29, 2020

ModelReader would be an abstract class with methods left unimplemented. Derived classes (such as ModelFileReader) can implement arbitrary loading logic as long as they meet the interface: returning a stream over the requested objects.

The main goal is to not integrate your loading logic into the main codebase, as it is very specific to your use case, but allow to plug it.

Yes, absolutely, I get the general idea and it's great as long as I have a way to implement my logic.

Do you want to take this one? I can also implement it if you prefer.

I would love to work on that, but honestly you already have implemented it in your mind, along with alternatives :), so I would prefer and appreciate it if you could add it --it would save us great time and effort.

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants