Save and load operators in human-readable format #122

Closed
kevinsung opened this issue Dec 11, 2017 · 18 comments

@kevinsung
Collaborator

kevinsung commented Dec 11, 2017

Currently, the save_operator function in utils/_operator_utils.py saves FermionOperators and QubitOperators using the data serialization module marshal. I think it's desirable to save these objects in a human-readable format, like JSON. What do people think? This would probably best be implemented after resolving issue #43.

Also, I'm wondering why marshal was chosen rather than the more standard pickle or cPickle.
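
For readers unfamiliar with marshal, here is a minimal sketch of what a marshal-based round trip might look like. It assumes an operator's terms are held in a plain dict mapping tuples of (index, action) pairs to complex coefficients; that layout and the helper names dump_terms and load_terms are illustrative assumptions, not the actual save_operator code.

```python
import marshal

# Hypothetical terms dict: keys are tuples of (mode index, action) pairs,
# values are complex coefficients.
terms = {
    ((1, 1), (2, 0)): complex(-3.17, 0.0),
    ((2, 1), (1, 0)): complex(-3.17, 0.0),
}


def dump_terms(operator_type, terms):
    """Serialize a type tag and the terms dict to bytes with marshal."""
    return marshal.dumps((operator_type, terms))


def load_terms(data):
    """Recover the (type tag, terms) pair produced by dump_terms."""
    return marshal.loads(data)


data = dump_terms("FermionOperator", terms)
operator_type, loaded = load_terms(data)
```

The resulting bytes are compact but opaque, which is the readability concern raised above. The marshal format is also not guaranteed to be stable across Python versions.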

@babbush
Contributor

babbush commented Dec 11, 2017

We actually had a large discussion about this back in the early days of FermiLib. See
ProjectQ-Framework/FermiLib#91

The short version is that pickle has security issues and is also less efficient. In the early days of the project (when we were still developing it with tight ProjectQ integration under the name FermiLib), we did use pickle initially. However, some collaborators and I started working on a project that involved computing FermionOperators that are tens of gigabytes in size, and the size of the pickled objects became a real issue for us. That said, I think there are also some concerns about marshal that were never really resolved.

Looping @Spaceenter into the conversation since he implemented marshal for us.

@idk3
Collaborator

idk3 commented Dec 11, 2017

Direct text representations are also possible and easy to implement. We could allow for different modes in our save_to_file function that save in different formats - thoughts? Would that address things for you @kevinsung ?

@kevinsung
Collaborator Author

Yeah, I think adding an option to the save_operator to save it in JSON format would be good.

@Spaceenter
Collaborator

I think the reason we chose marshal over JSON a few months ago is performance:

  1. I think we originally implemented using JSON, but @idk3 found it uses too much memory, and switching to marshal reduced the memory usage by an order of magnitude.

  2. Another issue was that JSON requires every key (and value?) to be a string, so we ended up with lots of conversions from number to string and string to number, which is not efficient.
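
To make the second point concrete, here is a small sketch using Python's standard json module. In JSON, object keys must be strings while values may remain numeric, so only the keys are affected by the conversion; the example dictionaries are made up for illustration.

```python
import json

# Integer keys are silently converted to strings when encoding...
encoded = json.dumps({1: -3.17})
decoded = json.loads(encoded)  # the key comes back as the string "1"

# ...and tuple keys, such as ((1, 1), (2, 0)), are rejected outright.
try:
    json.dumps({((1, 1), (2, 0)): -3.17})
    tuple_keys_ok = True
except TypeError:
    tuple_keys_ok = False

# Values can stay numeric, but complex coefficients are not supported by
# the json module and need a manual encoding such as a [real, imag] pair.
```

So a terms dictionary keyed by tuples has to be restructured (or its keys stringified) before it can be written as JSON.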

That being said, I agree JSON is more readable for very small operators when performance is not a concern. However, if you want a human-readable format, is the __str__ method of the operators sufficient? @kevinsung

@idk3
Collaborator

idk3 commented Dec 11, 2017 via email

@kevinsung
Collaborator Author

@Spaceenter Yeah, the __str__ method prints it out in a nice format. I like @idk3's suggestion to expand __init__ to allow longer strings, rather than just a single term.

@kevinsung changed the title from "Saving and loading operators" to "Save and load operators in human-readable format" on Dec 22, 2017

@babbush
Contributor

babbush commented Jan 18, 2018

Yes, that is a great suggestion. I think we should open a new issue about that functionality.

@babbush
Contributor

babbush commented Feb 6, 2018

The string initialization functionality has now been expanded. Accordingly, if anyone wants to work on this, go ahead!

@quid256
Contributor

quid256 commented Apr 11, 2018

Hi, I'm Chris, and I'd like to start getting involved with OpenFermion.

Has this issue been resolved? It seems as though SymbolicOperator already has a __str__ implementation defined which gives a human-readable string representation of the operator?

@kevinsung
Collaborator Author

kevinsung commented Apr 11, 2018

This issue is not resolved yet. Yes, this should use the __str__ method of SymbolicOperator. So the main thing here is to figure out a nice way to support saving to and loading from multiple file types (this will add support for plain text files). One possibility is to have the file type detected via the file name extension.
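
A rough sketch of what extension-based detection could look like. The function name detect_format, the extension-to-format table, and the format names are all hypothetical choices for illustration, not existing OpenFermion behavior.

```python
import os


def detect_format(file_name):
    """Map a file name extension to a (hypothetical) storage format name."""
    # Assumed mapping: ".data" for the binary marshal format, ".txt" for text.
    formats = {".data": "marshal", ".txt": "plain_text"}
    extension = os.path.splitext(file_name)[1].lower()
    if extension not in formats:
        raise ValueError("Unrecognized extension: {!r}".format(extension))
    return formats[extension]
```

For example, detect_format("hamiltonian.txt") would select the plain-text path, while an unknown extension raises an error instead of guessing.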

@babbush
Contributor

babbush commented Apr 12, 2018

@jarrodmcc always seems to have opinions about how things should be saved. Any thoughts Jarrod? Chris, we should discuss the solution here ideally before you make your PR.

@jarrodmcc
Contributor

One key question is why we wanted a human-readable format to begin with, and whether we really have applications for reading in such a file rather than just outputting one. Knowing this helps determine what we really want to do here.

Something like JSON has massive overhead and is not what I would call exceptionally readable. Other ASCII formats are more readable, but probably require special parsers.

My initial inclination would be to keep what we have and, as Ian suggested, add a flag to save_to_file to dump to a text format dictated by __str__. A secondary method could allow one to load in the same way. However, I think we should keep the options simple; adding a bunch of different file formats could just increase maintenance complexity later unless we have a good reason for having each format.

@quid256
Contributor

quid256 commented Apr 13, 2018

I'm not really sure about the context in which @kevinsung proposed this feature, but my understanding is that a human-readable storage format might be useful in interfacing with other applications or for inspecting simpler operators manually.

I can't speak to what kind of overhead might be incurred by using JSON in these situations, but do you think that the following example is readable enough?

{
    "type": "FermionOperator",
    "terms": [
        {
            "coeff": [-3.17, 0],
            "factors": [
                [1, 1], [2, 1], [3, 0], [4, 0]
            ]
        }, {
            "coeff": [-3.17, 0],
            "factors": [
                [4, 1], [3, 1], [2, 0], [1, 0]
            ]
        }
    ]
}

Would this be a reasonable way to encode an operator? Is there any other information that should be included or any naming discrepancy that should be corrected?
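
As a sketch of how this layout could be produced and read back, assuming the operator's terms are available as a dict mapping tuples of (index, action) pairs to coefficients. The helper names terms_to_json and terms_from_json are hypothetical, and the coefficient is stored as a [real, imag] pair as in the example above.

```python
import json


def terms_to_json(operator_type, terms):
    """Encode a terms dict in the proposed {"type", "terms"} JSON layout."""
    payload = {
        "type": operator_type,
        "terms": [
            {
                # Complex coefficients become [real, imag] pairs.
                "coeff": [complex(coeff).real, complex(coeff).imag],
                "factors": [list(factor) for factor in term],
            }
            for term, coeff in terms.items()
        ],
    }
    return json.dumps(payload, indent=4)


def terms_from_json(text):
    """Decode the JSON layout back into (operator type, terms dict)."""
    payload = json.loads(text)
    terms = {
        tuple(tuple(factor) for factor in entry["factors"]):
            complex(*entry["coeff"])
        for entry in payload["terms"]
    }
    return payload["type"], terms


example_terms = {
    ((1, 1), (2, 1), (3, 0), (4, 0)): -3.17,
    ((4, 1), (3, 1), (2, 0), (1, 0)): -3.17,
}
text = terms_to_json("FermionOperator", example_terms)
operator_type, recovered = terms_from_json(text)
```

Note the conversion between tuples and lists in both directions, which is part of the overhead mentioned earlier in the thread.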

Would it be more human-readable if the list of factors was replaced with the output from the __str__ method of SymbolicOperator?

I figure that it would make sense to add a flag to save_operator and load_operator to specify whether a more human-readable format is desired like people have suggested. I don't think there would be much need for many other file formats, unless I'm missing something about why additional formats are useful?

@kevinsung
Collaborator Author

kevinsung commented Apr 13, 2018

I actually just wanted to be able to save and load using the __str__ method. I was intending to use this only for small operators which I sometimes want to revisit and inspect visually. Now that I think of it, I don't really care about loading from such a format, but only saving. I think simply adding an option to save and load a text file using the __str__ method is a good solution.

@babbush
Contributor

babbush commented Apr 14, 2018

Currently we have no mechanism for saving and loading FermionOperators and QubitOperators, right? I think the idea was literally just to have a function for saving and loading to/from a text file that is a dump of what comes out when you call repr(my_op).

This is not the most numerically performant option, but it is certainly human readable, and would be a quick and simple solution that should suffice for most purposes that I can imagine.

@kevinsung
Collaborator Author

We currently have the functions save_operator and load_operator in utils/_operator_utils.py. Yes, I was envisioning simply adding an option to these which allows using whatever is returned by str(operator). It seems that's also what @jarrodmcc was suggesting, so I think we can go ahead with that, @quid256 , if you like.
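
A minimal sketch of the text option under discussion, assuming the file simply holds the operator's type name followed by str(operator). ToyOperator, its string format, and the helpers save_operator_text and read_operator_text are hypothetical stand-ins for illustration, not the actual save_operator and load_operator code.

```python
import os
import tempfile


class ToyOperator:
    """Hypothetical stand-in for an operator with a readable str()."""

    def __init__(self, terms):
        # terms maps tuples of (index, action) pairs to coefficients.
        self.terms = terms

    def __str__(self):
        lines = []
        for term, coeff in self.terms.items():
            factors = " ".join(
                "{}{}".format(index, "^" if action else "")
                for index, action in term
            )
            lines.append("{} [{}]".format(coeff, factors))
        return " +\n".join(lines)


def save_operator_text(operator, file_path):
    """Write the operator's type name, then its str() output."""
    with open(file_path, "w") as stream:
        stream.write("{}:\n{}".format(type(operator).__name__, operator))


def read_operator_text(file_path):
    """Return (type name, text body); parsing the body is left to the caller."""
    with open(file_path) as stream:
        type_name, body = stream.read().split(":\n", 1)
    return type_name, body


operator = ToyOperator({((1, 1), (2, 0)): -3.17})
with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "operator.txt")
    save_operator_text(operator, path)
    type_name, body = read_operator_text(path)
```

Turning the body back into an operator would rely on the expanded string initialization mentioned earlier in the thread; this sketch stops at recovering the text.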

@quid256
Contributor

quid256 commented Apr 23, 2018

@kevinsung is this issue closed now?

@kevinsung
Collaborator Author

Ah, yes. Thanks!
