Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TOOLS-2667: Support list of files for put and get subcommands in mongofiles #283

Merged
merged 26 commits into from Jul 31, 2020

Conversation

hariamoor-zz
Copy link
Contributor

@hariamoor-zz hariamoor-zz commented Jul 22, 2020

This PR implements logic to support a list of supporting arguments, i.e. file names, for mongofiles put and mongofiles get. The intended behavior is as follows:

The following (1) should put the specified files in the remote GridFS filesystem:

mongofiles put first.bson second.bson third.bson

The following (2) should put all files of the form *.bson in the current directory or any of its subdirectories:

mongofiles put **/*.bson

mongofiles should not evaluate the glob expression logic -- the underlying shell should evaluate the glob expression and call mongofiles with a command of the form specified in (1).

Furthermore, the following command should get the specified files from the remote GridFS filesystem:

mongofiles get first.bson second.bson third.bson

Notice that glob expressions do not work for mongofiles get ... -- this is because globs are generally used to index files within a Unix filesystem, which does not make sense relative to GridFS. However, TOOLS-2668 provides (upon completion) a mongofiles get_regex ... subcommand, which get's all the files in GridFS that match a given PCRE.

Test Plan: We introduce two new integration tests for the multivariadic cases for get and put. It is OK to remove the old test cases, because they're "homomorphic" to the new ones (the new ones test the subcommands for n files, and the old ones test for n=1 files).

Other than that, there are some new unittests regarding argument parsing.

@hariamoor-zz hariamoor-zz marked this pull request as ready for review July 23, 2020 19:45
@hariamoor-zz hariamoor-zz self-assigned this Jul 24, 2020
Copy link
Contributor

@huan-Mongo huan-Mongo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job! I have some questions mainly about the tests and the $in

mongofiles/mongofiles.go Show resolved Hide resolved
mongofiles/mongofiles.go Show resolved Hide resolved
mongofiles/mongofiles_test.go Show resolved Hide resolved
newFile := util.ToUniversalPath("testdata/" + fmt.Sprintf(copyFormat, i))

// Makes new copies of lorem ipsum file
err = func(src, dst string) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to create the files on the fly, it's easier for the users if those files were pre-generated. And we might want to test with files with different content.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you suggest that I create/download some new lorem ipsum files for this purpose?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, you can just generate these files locally and push them with this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can generate two more lorem ipsum files of the same format and use stat(2) to keep track of the number of bytes.

Convey("and files should exist in gridfs", func() {
bytesGotten, err := getFilesAndBytesListFromGridFS()
So(err, ShouldBeNil)
So(len(bytesGotten), ShouldEqual, len(testFiles)+1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be checked as well for total bytes gotten

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what you mean, since I check for the number of bytes gotten in lines 556-557. Could you be more specific about what exactly I'm missing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check number of files in bytesGotten -- GridFS should contain the test files and only the test files.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just thought a way to improve the current test, for get test, instead of trying to output all the files written to db, maybe you can write 4 files and have 3 filenames in the list?

mongofiles/mongofiles_test.go Show resolved Hide resolved
mongofiles/options_test.go Show resolved Hide resolved
Copy link
Collaborator

@mattChiaravalloti mattChiaravalloti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good effort! There is still some more refactoring to be done, though!

mongofiles/mongofiles.go Outdated Show resolved Hide resolved
mongofiles/mongofiles.go Outdated Show resolved Hide resolved
// (see TOOLS-2667). Otherwise, if mongofiles --put_id is called, then
// preserve existing behavior.
case Put:
for _, file := range mf.FileNameList {

This comment was marked as outdated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

mongofiles/mongofiles.go Outdated Show resolved Hide resolved
mongofiles/mongofiles.go Outdated Show resolved Hide resolved
Copy link
Member

@tfogo tfogo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may have been confusing because there was a mistake in the ticket title, but these aren't options --put and --get, they're commands put and get. So it's mongofiles get <filename> not mongofiles --get <filename>.

@@ -216,40 +234,52 @@ func (mf *MongoFiles) findGFSFiles(query bson.M) (files []*gfsFile, err error) {
}

// Gets the GridFS file the options specify. Use this for the get family of commands.
func (mf *MongoFiles) getTargetGFSFile() (*gfsFile, error) {
func (mf *MongoFiles) getTargetGFSFile() ([]*gfsFile, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is returning more than one file, you need to rename it to getTargetGFSFiles()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider it done.

mongofiles/mongofiles.go Show resolved Hide resolved
mongofiles/options_test.go Show resolved Hide resolved
@hariamoor-zz
Copy link
Contributor Author

This may have been confusing because there was a mistake in the ticket title, but these aren't options --put and --get, they're commands put and get. So it's mongofiles get <filename> not mongofiles --get <filename>.

Thanks for the clarification on this! I will edit the comments I wrote with --put and --get as specified.

@hariamoor-zz hariamoor-zz changed the title TOOLS-2667: Support list of files for --put and --get in mongofiles TOOLS-2667: Support list of files for put and get subcommands in mongofiles Jul 27, 2020
Convey("Testing the 'get' command with a file that is in GridFS should", func() {
mf, err := simpleMongoFilesInstanceWithFilename("get", "testfile1")
Convey("Testing the 'get' command with files that are in GridFS should", func() {
testFiles := []string{"testfile1", "testfile2", "testfile3"}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of creating three test files, i.e. of the form testfile(\d), create four -- then, test that mongofiles ... get ... operates only on three, but not four, test files.

Copy link
Member

@tfogo tfogo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good stuff! I've just got a couple of small recommendations.

case Put:
// If mongofiles --put ... is called, i.e. with multiple supporting
// arguments, then add gridFiles specified in mf.FileNameList
for _, filename := range mf.FileNameList {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should still just be err = mf.handlePut(). The loop should be inside handlePut().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I had it implemented that way for consistency the first time around, but @mattChiaravalloti recommended that I do it this way instead. I don't think it'd be a big deal one way or another, in any case.

@mattChiaravalloti Any opinions on this?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagreed that the loop should be inside handlePut since it had this code:

n, err := mf.put(id, mf.FileName)
if err != nil { ... }
log("copied n bytes")
log("added mf.FileName")

duplicated in the body of the method. It appeared in a loop if mf.FileNameList was non-empty and outside the loop if it was empty.

Personally, I still think the abstraction as-is makes more sense (handlePut handles putting a single file), but if you want to combine the FileNameList and FileName cases into one that's ok. I advise doing it this way instead of how you initially had it:

func (mf) handlePut() (...) {
    id, err := mf.parseOrCreateID()
    if err != nil { ... }

    if len(mf.FileNameList) == 0 {
        mf.FileNameList = []string{mf.FileName}
    }

    for _, filename := range mf.FileNameList {
        n, err := mf.put(id, filename)
        if err != nil { ... }
        log("...")
    }
    ...
}

This way, instead of having the same code in two branches of a conditional, you make the simplest conditional possible so the rest of the code follows a single path.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I'm okay with the abstraction being for either a single or multiple files. But whatever it is I think it should be consistent with handleGet(). Right now it isn't consistent and that feels a bit messy to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't be against say adding a handleGetID and handlePutID if this helps clean things up. I'm fine with any approach as long as it's consistent.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Making it consistent with handleGet sounds good. No need to start introducing new methods yet. @hariamoor just try to avoid duplicate code inside the body of handlePut!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattChiaravalloti Followed your suggestion in be873f0.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thank you. It looks good 👍 Good thing we caught the issue of having the id logic in the loop (since my suggestion above is incorrect).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: Matt and I came to the understanding off-band that each file should have a unique ID. That's why I deviate from his suggestion and put parseOrCreateID in the loop here.


case PutID:
err = mf.handlePut(mf.FileName)
if err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to do the error check here. The error gets returned at the end of the function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Copy link
Collaborator

@mattChiaravalloti mattChiaravalloti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good 👍

@hariamoor-zz hariamoor-zz requested a review from tfogo July 29, 2020 18:28
Copy link
Contributor

@huan-Mongo huan-Mongo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 👍

Copy link
Member

@tfogo tfogo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm! Well done!

@hariamoor-zz hariamoor-zz merged commit de2083b into mongodb:master Jul 31, 2020
@hariamoor-zz hariamoor-zz deleted the tools-2667 branch July 31, 2020 19:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants