Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce state/toolstorage #663

Merged
merged 5 commits into from Sep 4, 2014
Merged

Introduce state/toolstorage #663

merged 5 commits into from Sep 4, 2014

Conversation

axw
Copy link
Contributor

@axw axw commented Sep 3, 2014

state/toolstorage provides an interface, Storage,
for storing tools tarballs and metadata in state.
The tarballs are stored in GridFS, while the
metadata are stored in a Mongo collection.

State grows a single new method, ToolsStorage,
which constructs a toolstorage.StorageCloser.
The returned object will store tools metadata in
the "juju" database's "toolsmetadata" collection,
and the tarballs in the "blobstore" database.

@ericsnowcurrently
Copy link
Contributor

This seems eerily similar to what I've done for backups. I added github.com/juju/utils/filestorage as the abstraction of the file + metadata storage model. The implementation for backups lives in state/backups.go. The big wins were improved test isolation and ease of switching from one storage back end to another (e.g. env storage to gridFS). It wouldn't surprise me if I missed something, but is there any reason not to re-use at least some of what I've done for backups?

// ToolsStorage returns a new toolstorage.StorageCloser
// that stores tools metadata in the "juju" database''
// "toolsmetadata" collection.
func (st *State) ToolsStorage() (toolstorage.StorageCloser, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NewToolsStorage() ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not keen, as it suggests to me that we're creating (and later destroying) the storage. It's more like we're "opening" the storage.

@axw
Copy link
Contributor Author

axw commented Sep 4, 2014

@ericsnowcurrently There are some similarities, yes. There are some significant differences, too, which could possibly be rectified by I'm not sure. For example, in the tools storage the metadata must not be written if the blob is not written, or clients may find tools that they cannot download.

Other differences are:

  • filestorage.Metadata is fairly generic; we want metadata structured specifically for tools, e.g. so we can do more efficient searching on versions (and parts thereof) later.
  • we need to be able to overwrite metadata in one scenario, but not others, atomically.

The ManagedStorage could potentially adapt to RawFileStorage, but I'm not sure it's worthwhile. ManagedStorage is already an interface that can be mocked out.

managedStorage, session := st.getManagedStorage(uuid)
txnRunner := st.txnRunnerWith(session)
metadataCollection := st.db.With(session).C(toolsmetadataC)
storage := toolstorageNewStorage(uuid, managedStorage, metadataCollection, txnRunner)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can rework stuff here to be able to reuse the existing txnRunner() method and not have to create a new variant. It will require changes to getManagedStorage() and even txnRunner() itself which is only used in 3 places. I think this will make the code cleaner too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed txnRunner to also return the session. I realised while doing this that we don't need the "authenticated" logic around copying sessions anymore, since we run mongo in the tests without auth, and in production we never open state without a tag/password anymore. So, I cleaned it up.

state/toolstorage provides an interface, Storage,
for storing tools tarballs and metadata in state.
The tarballs are stored in GridFS, while the
metadata are stored in a Mongo collection.

State grows a single new method, ToolsStorage,
which constructs a toolstorage.StorageCloser.
The returned object will store tools metadata in
the "juju" database's "toolsmetadata" collection,
and the tarballs in the "blobstore" database.
Don't need the "authenticated" logic around session copying anymore,
as we now run mongo without auth in the tests, and in production we
always open state with a tag/password.

txnRunner now returns the session associated with the runner, and
getManagedStorage and ToolsStorage are updated to use that.
txnRunnerWith is dropped.
@axw
Copy link
Contributor Author

axw commented Sep 4, 2014

PTAL.

closer = emptycloser
// Otherwise the session is copied and a new instance is created
// using that.
func (st *State) txnRunner() (jujutxn.Runner, *mgo.Session, closeFunc) {
if st.transactionRunner != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should have done it, but could you please add a comment saying that the transactionRunner attribute is one set by tests so that the tests that use txn hooks work correctly? Normally in production code we fall through to create a new runner each time

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@wallyworld
Copy link
Member

LGTM with extra txnRunner cleanup (sorry!)

@axw
Copy link
Contributor Author

axw commented Sep 4, 2014

$$merge$$

@jujubot
Copy link
Collaborator

jujubot commented Sep 4, 2014

Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju

jujubot added a commit that referenced this pull request Sep 4, 2014
Introduce state/toolstorage

state/toolstorage provides an interface, Storage,
for storing tools tarballs and metadata in state.
The tarballs are stored in GridFS, while the
metadata are stored in a Mongo collection.

State grows a single new method, ToolsStorage,
which constructs a toolstorage.StorageCloser.
The returned object will store tools metadata in
the "juju" database's "toolsmetadata" collection,
and the tarballs in the "blobstore" database.
@jujubot jujubot merged commit d5e7a1d into juju:master Sep 4, 2014
)

// ToolsStorage returns a new toolstorage.StorageCloser
// that stores tools metadata in the "juju" database''
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/database''/database's/ ? or just drop the '' entirely?

@ericsnowcurrently
Copy link
Contributor

On Wed, Sep 3, 2014 at 7:22 PM, Andrew Wilkins notifications@github.com
wrote:

@ericsnowcurrently https://github.com/ericsnowcurrently There are some
similarities, yes. There are some significant differences, too, which could
possibly be rectified by I'm not sure.

My intention was that the filestorage package be generally useful. So if
it doesn't fit your needs I feel like that should be fixed in filestorage.
Is there any other issue than what you outlined below?

For example, in the tools storage the metadata must not be written if the
blob is not written, or clients may find tools that they cannot download.

Fair enough. The same is true of backups. I'll make sure is does so. If
it doesn't already, that should be easy to address in the default
FileStorage implementation (filestorage in wrapper.go).

Other differences are:

  • filestorage.Metadata is fairly generic; we want metadata structured
    specifically for tools, e.g. so we can do more efficient searching on
    versions (and parts thereof) later.

Is there any problem with embedding filestorage.Metadata. That's what I
did with backups to add the extra backups-specific data.

Or do you mean that filestorage.Metadata has too much stuff and you want a
smaller type? I'm not sure I buy that since this is not highly-scaled data
where that would matter.

Also, William made it clear to me that he wanted a strict distinction
between doc types we store in mongo and the ones used else where.
Effectively the doc type is a private detail of the MetadataStorage
implementation. I did this for backups (see state/backups.go). Ah, it
looks like you do it as well. :)

  • we need to be able to overwrite metadata in one scenario, but not
    others, atomically.

What metadata needs overwriting? Also, I'm guessing you mean in the
MetadataStorage implementation of the metadata doc, which isn't explicitly
constrained by Metadata. If there is a need to update an existing metadata
doc, it would probably help to have an Update() method on FileStorage and
MetadataStorage, right?

The ManagedStorage could potentially adapt to RawFileStorage, but I'm not
sure it's worthwhile. ManagedStorage is already an interface that can be
mocked out.

Well, I'll argue that blobstore.ManagedStorage should embed RawFileStorage.
I wish I'd known about it when doing the filestorage stuff, which was
originally based on environs/storage.Storage. I would have been more
cognizant and accommodating. Does anything else use ManagedStorage? A
quick grep didn't show me anything. If not then the impact of changing the
interface would be minimal.

Anyway, thanks for giving it some thought. I still think the benefits of
using utils.filestorage are worth it here, after any needed tweaks, for all
the reasons coincident with code re-use.

To be honest I also think filestorage implementations for mongo
(MetadataStorage) and something env-storage-like (RawFileStorage) should
live in utils/filestorage. If blobstore.ManagedStorage doesn't embed
RawFileStorage then at least there should be a facade type for it in
utils/filestorage. At that point we would have common implementations for
mongo+envstorage and mongo+gridfs without them cluttering up juju core and
without risk of core concerns leaking into the file storage abstraction
layer.

However, there isn't exactly time for us to make all that happen. So in
the meantime it would be nice if tools storage leveraged utils/filestorage
directly. While I'm reticent to volunteer to do so before all my
backups/reviewboard work is done, I likewise hesitate to ask you to do that
work when what you've done in "good enough".

Would you mind just aligning toolstorage's Metadata and Storage with their
counterparts in utils/filestorage (i.e. the names and signatures of the
corresponding fields/methods)? That would allow call sites to remain
unchanged if toolstorage later leverages utils/filestorage.

@ericsnowcurrently
Copy link
Contributor

Whoa. That was longer than I expected. FYI, I recognize that you have a better sense of the whole juju picture than I do and that I definitely have a purist/pro-abstractions inclination (i.e. compulsion ). So while I stand by what I say, I try not to be too stubborn about it unless I feel excessively strongly about it. If I haven't made a convincing argument then at the least please correct my incorrect perceptions and take this as an opportunity to enlighten me. :) The project benefits when we share like that!

@axw
Copy link
Contributor Author

axw commented Sep 5, 2014

On Fri, Sep 5, 2014 at 12:10 AM, ericsnowcurrently <notifications@github.com

wrote:

On Wed, Sep 3, 2014 at 7:22 PM, Andrew Wilkins notifications@github.com
wrote:

@ericsnowcurrently https://github.com/ericsnowcurrently There are some
similarities, yes. There are some significant differences, too, which
could
possibly be rectified by I'm not sure.

My intention was that the filestorage package be generally useful. So if
it doesn't fit your needs I feel like that should be fixed in filestorage.
Is there any other issue than what you outlined below?

Not that comes to mind. It would be good to have something common.

For example, in the tools storage the metadata must not be written if the
blob is not written, or clients may find tools that they cannot download.

Fair enough. The same is true of backups. I'll make sure is does so. If
it doesn't already, that should be easy to address in the default
FileStorage implementation (filestorage in wrapper.go).

Cool. I was concerned that we would require different behaviour, but if
that's the case then that's fine.

Other differences are:

  • filestorage.Metadata is fairly generic; we want metadata structured
    specifically for tools, e.g. so we can do more efficient searching on
    versions (and parts thereof) later.

Is there any problem with embedding filestorage.Metadata. That's what I
did with backups to add the extra backups-specific data.

Or do you mean that filestorage.Metadata has too much stuff and you want a
smaller type? I'm not sure I buy that since this is not highly-scaled data
where that would matter.

I'm not concerned about the number of bytes stored. My preference is always
to have a minimal interface, because the API simpler to understand.

I think there would be no problem with using filestorage internally in
toolstorage, but I don't really want to expose more information than is
necessary to the user.

Also, William made it clear to me that he wanted a strict distinction

between doc types we store in mongo and the ones used else where.
Effectively the doc type is a private detail of the MetadataStorage
implementation. I did this for backups (see state/backups.go). Ah, it
looks like you do it as well. :)

  • we need to be able to overwrite metadata in one scenario, but not
    others, atomically.

What metadata needs overwriting? Also, I'm guessing you mean in the
MetadataStorage implementation of the metadata doc, which isn't explicitly
constrained by Metadata. If there is a need to update an existing metadata
doc, it would probably help to have an Update() method on FileStorage and
MetadataStorage, right?

I mean that toolstorage.Storage.AddFile needs to overwrite tools metadata
if it exists.

The ManagedStorage could potentially adapt to RawFileStorage, but I'm not
sure it's worthwhile. ManagedStorage is already an interface that can be
mocked out.

Well, I'll argue that blobstore.ManagedStorage should embed RawFileStorage.
I wish I'd known about it when doing the filestorage stuff, which was
originally based on environs/storage.Storage. I would have been more
cognizant and accommodating. Does anything else use ManagedStorage? A
quick grep didn't show me anything. If not then the impact of changing the
interface would be minimal.

I wish I knew the intent of filestorage before. Sorry, I hadn't looked into
it enough.

juju/blobstore is also used by juju/charmstore.

Anyway, thanks for giving it some thought. I still think the benefits of
using utils.filestorage are worth it here, after any needed tweaks, for all
the reasons coincident with code re-use.

To be honest I also think filestorage implementations for mongo
(MetadataStorage) and something env-storage-like (RawFileStorage) should
live in utils/filestorage. If blobstore.ManagedStorage doesn't embed
RawFileStorage then at least there should be a facade type for it in
utils/filestorage. At that point we would have common implementations for
mongo+envstorage and mongo+gridfs without them cluttering up juju core and
without risk of core concerns leaking into the file storage abstraction
layer.

However, there isn't exactly time for us to make all that happen. So in
the meantime it would be nice if tools storage leveraged utils/filestorage
directly. While I'm reticent to volunteer to do so before all my
backups/reviewboard work is done, I likewise hesitate to ask you to do that
work when what you've done in "good enough".

And I also have a lot of work to do around this.

Would you mind just aligning toolstorage's Metadata and Storage with their
counterparts in utils/filestorage (i.e. the names and signatures of the
corresponding fields/methods)? That would allow call sites to remain
unchanged if toolstorage later leverages utils/filestorage.

I will have a look when I get my current work under control

@ericsnowcurrently
Copy link
Contributor

Cool. Thanks for following up on that Andrew. FYI, the implementation for FileStorage currently will leave the metadata stored even if storing the file itself fails. I'll work up a patch to fix that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants