Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support uploading directories #492

Closed

Conversation

stephenplusplus
Copy link
Contributor

Fixes #412

To Dos

  • Unit Tests
  • System Tests
  • Docs

This is a work in progress, but is an attempt to make a sane API for uploading directories.

zoo/
  chipmunks/
    chipmunk-1.jpg
  pandas/
    panda-1.jpg
  zoo-photos.zip
bucket.uploadDirectory("zoo", function (err, files) {
  files = [
    File("chipmunks/chipmunk-1.jpg"),
    File("pandas/panda-1.jpg"),
    File("zoo-photos.jpg")
  ]
})

Globs

bucket.uploadDirectory is a wrapper around bucket.upload, which now supports globs:

bucket.upload("zoo/**/*.jpg", function (err, files) {
  files = [
    File("chipmunk-1.jpg"),
    File("panda-1.jpg")
  ]
})

Note that the resulting file names aren't prefixed with a directory path (e.g. "chipmunks/chipmunk-1.jpg" vs "chipmunk-1.jpg"). Since a glob can include several files and directories in various locations on a hard drive, by default, just the filename is used as the uploaded file's name.

By specifying options.basePath, a user can tell us the common parent of all of the locations their glob matches, which is then used as a prefix for the file. This is how bucket.uploadDirectory works (simplified):

Bucket.prototype.uploadDirectory = function (directoryPath) {
  bucket.upload(path.join(directoryPath, "**/*"), { basePath: directoryPath })
}

Extending globs

bucket.upload accepts either a single glob or array of globs. It's passed directly to sindresorhus/globby, which uses isaacs/minimatch and isaacs/node-glob to return the matching paths. All minimatch glob patterns are accepted, as well as node-glob options via options.globOptions.

@stephenplusplus stephenplusplus added api: storage Issues related to the Cloud Storage API. don't merge labels Apr 13, 2015
@stephenplusplus stephenplusplus added this to the Storage Future milestone Apr 13, 2015
@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Apr 13, 2015
@stephenplusplus
Copy link
Contributor Author

@ryanseys - If you can take a quick look at the first post to see if this is a useful addition, I'll start working on the tests for these to get it ready for code review & merge.

@ryanseys
Copy link
Contributor

Yes this looks ok. What approach is taken when filenames conflict. E.g. /img/chipmunks/chippy-1.jpg and /img/woodchippers/chippy-1.jpg will both be uploaded as chippy-1.jpg when manually using bucket.upload('/img/**/*.jpg', function(err, files) {}); Does this just mean that one will be overwritten or can we somehow tell bucket.upload to be smart about it so that nothing is overwritten?

I'm thinking maybe we don't even need bucket.uploadDirectory (???), why not just fs.stat to figure out if the name given is a directory, and then take the approach of using basePath in that case. If they give a glob, maybe they can specify a { flatten: true } to explictly say they want to overwrite files, otherwise upload and maintain the structure from their base path?

Sorry if this doesn't make sense.

@ryanseys
Copy link
Contributor

Or just always maintain structure when given a directory. If given a glob, always flatten. Seems reasonable.

@ryanseys
Copy link
Contributor

I think that's probably what you're doing already except it's strewn across upload and uploadDirectory where I'm not convinced we need both.

@stephenplusplus
Copy link
Contributor Author

What approach is taken when filenames conflict

The last one will win.

I'm thinking maybe we don't even need bucket.uploadDirectory (???), why not just fs.stat to figure out if the name given is a directory, and then take the approach of using basePath in that case.

I like this!

@ryanseys
Copy link
Contributor

If it makes the logic of upload infinitely more difficult, I'd say it's not worth it, so if this is already effectively doing the same thing as:

just always maintain structure when given a directory. If given a glob, always flatten.

then I don't see any reason to change it now. But if it's easy to change, that'd be cool :)

@stephenplusplus
Copy link
Contributor Author

I played around with it for a bit, but I couldn't find a way to make the change elegantly. For now, I'll continue with the two methods (upload for globs, uploadDirectory for directories), and maybe revisit later if a solution strikes me.

@ryanseys
Copy link
Contributor

No worries. Might be beneficial and more intuitive for the users anyway. At
least that can be our excuse :)

On Thursday, April 16, 2015, Stephen Sawchuk notifications@github.com
wrote:

I played around with it for a bit, but I couldn't find a way to make the
change elegantly. For now, I'll continue with the two methods (upload for
globs, uploadDirectory for directories), and maybe revisit later if a
solution strikes me.


Reply to this email directly or view it on GitHub
#492 (comment)
.

@stephenplusplus
Copy link
Contributor Author

Ready for a closer look 👓

};
});

async.parallelLimit(uploadFileFns, MAX_PARALLEL_UPLOADS, function() {

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

@stephenplusplus
Copy link
Contributor Author

(cont'd from #492 (comment))

After much thought and attempts to implement this, I don't think we need to support uploading directories or globs.

It's quite straightforward for a user to implement uploading a directory in their app:

var async = require("async")
var fs = require("fs")
var storage = require("gcloud")

var myBucket = gcloud.storage().bucket("pics")
uploadDirectory(myBucket, "/Users/stephen/Photos", function (err, files) {})

function uploadDirectory(bucket, directory, callback) {
  fs.readdir(directory, function (err, files) {
    async.map(files, bucket.upload, callback)
  })
}

If they want other things to happen, like uploading the files into a new directory in their bucket, recursing through child directories and maintaining that hierarchy, etc., their code is the best place to handle that. Our supporting it It leads to either too much guess work or confusing options.

@ryanseys
Copy link
Contributor

Globs I agree are a little fancy and require weird options but just
uploading a path where the path is a directory should be easy to support.
May I take a stab at it? :) I'll work off of what you've already created.

On Monday, April 20, 2015, Stephen Sawchuk notifications@github.com wrote:

(cont'd from #492 (comment)
#492 (comment)
)

After much thought and attempts to implement this, I don't think we need
to support uploading directories or globs.

It's quite straightforward for a user to implement uploading a directory
in their app:

var async = require("async")var fs = require("fs")var storage = require("gcloud")
var myBucket = gcloud.storage().bucket("pics")
uploadDirectory(myBucket, "/Users/stephen/Photos", function (err, files) {})
function uploadDirectory(bucket, directory, callback) {
fs.readdir(directory, function (err, files) {
async.map(files, bucket.upload, callback)
})
}

If they want other things to happen, like uploading the files into a new
directory in their bucket, recursing through child directories and
maintaining that hierarchy, etc., their code is the best place to handle
that. Our supporting it It leads to either too much guess work or confusing
options.


Reply to this email directly or view it on GitHub
#492 (comment)
.

@stephenplusplus
Copy link
Contributor Author

Sure, your code has yet to leave me unimpressed!

@stephenplusplus
Copy link
Contributor Author

I still think this is outside the scope of what our lib should support. We allow uploading files via upload() and a writable stream. With those two things, a user who wants to mass upload multiple files can make it happen using any utility library / native looping mechanism they choose. I'm going to close as I don't think this is a large priority.

@ryanseys with all of your free time, if you're still interesting in taking a stab at a smaller implementation of this, please feel free :)

@ryanseys
Copy link
Contributor

@ryanseys with all of your free time

Hahaha!

Like you said in #412, I believe this is not too difficult for a user to implement given our library and would allow them more flexibility than we could provide through our library in terms of handling errors and other responses on the fly.

sofisl pushed a commit that referenced this pull request Jan 17, 2023
sofisl pushed a commit that referenced this pull request Sep 14, 2023
[![WhiteSource Renovate](https://app.renovatebot.com/images/banner.svg)](https://renovatebot.com)

This PR contains the following updates:

| Package | Change | Age | Adoption | Passing | Confidence |
|---|---|---|---|---|---|
| [@types/tmp](https://togithub.com/DefinitelyTyped/DefinitelyTyped) | [`0.2.1` -> `0.2.2`](https://renovatebot.com/diffs/npm/@types%2ftmp/0.2.1/0.2.2) | [![age](https://badges.renovateapi.com/packages/npm/@types%2ftmp/0.2.2/age-slim)](https://docs.renovatebot.com/merge-confidence/) | [![adoption](https://badges.renovateapi.com/packages/npm/@types%2ftmp/0.2.2/adoption-slim)](https://docs.renovatebot.com/merge-confidence/) | [![passing](https://badges.renovateapi.com/packages/npm/@types%2ftmp/0.2.2/compatibility-slim/0.2.1)](https://docs.renovatebot.com/merge-confidence/) | [![confidence](https://badges.renovateapi.com/packages/npm/@types%2ftmp/0.2.2/confidence-slim/0.2.1)](https://docs.renovatebot.com/merge-confidence/) |

---

### Configuration

📅 **Schedule**: "after 9am and before 3pm" (UTC).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR is behind base branch, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, click this checkbox.

---

This PR has been generated by [WhiteSource Renovate](https://renovate.whitesourcesoftware.com). View repository job log [here](https://app.renovatebot.com/dashboard#github/googleapis/nodejs-dns).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: storage Issues related to the Cloud Storage API. cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

support globs in upload
3 participants