
Repository.get_contents does not return directory information #140

Closed
ksookocheff-va opened this issue Feb 12, 2013 · 7 comments

Comments

@ksookocheff-va

The GitHub API returns a list of all files in a directory when you use Get Contents on a directory.

Example:
https://api.github.com/repos/twitter/bootstrap/contents/js/?ref=d28343dc3ad53a411ae3685e7d6a7866c8c22d6b

Currently PyGithub only returns None when using this API to query a directory.
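To make the reported behavior concrete: the Get Contents endpoint answers with a single JSON object for a file path and with a JSON array (one object per entry) for a directory path, so a client reading the raw response can tell the two apart by the decoded type. This is a minimal sketch assuming the body has already been decoded with `json.loads`; `describe_contents` is a hypothetical helper, and the sample bodies are abbreviated by hand.

```python
import json
from typing import Any


def describe_contents(payload: Any) -> str:
    """Classify a decoded Get Contents response (hypothetical helper).

    A file path yields one JSON object; a directory path yields a
    JSON array with one object per entry.
    """
    if isinstance(payload, list):
        names = ", ".join(entry["name"] for entry in payload)
        return f"directory with {len(payload)} entries: {names}"
    if isinstance(payload, dict):
        return f"{payload['type']} {payload['path']}"
    raise TypeError(f"unexpected payload type: {type(payload).__name__}")


# Abbreviated, hand-written sample bodies (not full API responses)
dir_body = json.loads('[{"name": "bootstrap-typeahead.js", "type": "file"},'
                      ' {"name": "tests", "type": "dir"}]')
file_body = json.loads('{"name": "bootstrap-typeahead.js", "type": "file",'
                       ' "path": "js/bootstrap-typeahead.js"}')
print(describe_contents(dir_body))   # directory with 2 entries: bootstrap-typeahead.js, tests
print(describe_contents(file_body))  # file js/bootstrap-typeahead.js
```

A client that assumes the object shape for every path ends up with nothing useful for a directory, which matches the symptom reported here.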

@ghost assigned jacquev6 on Feb 13, 2013
@jacquev6
Member

I'll look at that soon, thank you for reporting.

@jacquev6
Member

http://developer.github.com/v3/repos/contents/#get-contents does not document the response when :path is a directory. In that case, GitHub returns a list of files. This is why I overlooked the phrase "contents of any file or directory" in that documentation.

I will have to return a different type depending on whether path points to a file or a directory. So the client will have to know beforehand whether it is requesting the contents of a file or of a directory.

It doesn't make sense to mix both cases in the same method. I will add a Repository.get_dir_contents method returning a list of ContentFile. I will also add an explicit alias, Repository.get_file_contents, for the current Repository.get_contents.
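The proposed split can be sketched as two functions with fixed return types. This is not PyGithub's implementation: `fetch` is a hypothetical callable that performs the HTTP request and returns the decoded JSON, and this `ContentFile` dataclass keeps only three fields for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class ContentFile:
    # Illustrative subset of the fields GitHub returns for each entry
    name: str
    path: str
    type: str  # "file" or "dir"


Fetch = Callable[[str], Any]  # hypothetical: path -> decoded JSON body


def _to_content_file(entry: dict) -> ContentFile:
    return ContentFile(name=entry["name"], path=entry["path"], type=entry["type"])


def get_file_contents(fetch: Fetch, path: str) -> ContentFile:
    """Return a single ContentFile; refuse directory paths."""
    body = fetch(path)
    if isinstance(body, list):
        raise ValueError(f"{path!r} is a directory; use get_dir_contents")
    return _to_content_file(body)


def get_dir_contents(fetch: Fetch, path: str) -> List[ContentFile]:
    """Return one ContentFile per directory entry; refuse file paths."""
    body = fetch(path)
    if not isinstance(body, list):
        raise ValueError(f"{path!r} is a file; use get_file_contents")
    return [_to_content_file(entry) for entry in body]


# A dict-backed fake standing in for real HTTP calls
fake_responses = {
    "js": [
        {"name": "bootstrap-typeahead.js", "path": "js/bootstrap-typeahead.js", "type": "file"},
        {"name": "tests", "path": "js/tests", "type": "dir"},
    ],
    "js/bootstrap-typeahead.js": {
        "name": "bootstrap-typeahead.js", "path": "js/bootstrap-typeahead.js", "type": "file",
    },
}
print([c.path for c in get_dir_contents(fake_responses.__getitem__, "js")])
```

Each function has one predictable return type, at the cost of the caller knowing in advance whether the path is a file or a directory.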

@bilderbuchi

So, what happens when you call Repository.get_dir_contents on a directory that contains both files and other directories? E.g. how would you crawl a directory tree with it? You'd need some way to return the directories, too?
According to the link in the first post, the GitHub API also returns a "type" key whose value is either "file" or "dir". So maybe a good approach is to return a list of Content objects and distinguish between ContentFile and ContentDir by some other mechanism?
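The alternative suggested here can be sketched as a small class hierarchy chosen from each entry's "type" field. `Content` and `ContentDir` are the hypothetical names from this comment, not classes PyGithub provides, and the fields are trimmed to a few for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Content:
    # Fields shared by every entry in a directory listing
    name: str
    path: str


@dataclass
class ContentFile(Content):
    size: int


@dataclass
class ContentDir(Content):
    pass


def parse_listing(entries: List[dict]) -> List[Content]:
    """Build one typed object per listing entry, dispatching on "type"."""
    result: List[Content] = []
    for entry in entries:
        if entry["type"] == "file":
            result.append(ContentFile(entry["name"], entry["path"], entry["size"]))
        elif entry["type"] == "dir":
            result.append(ContentDir(entry["name"], entry["path"]))
        else:
            # Other entry types are out of scope for this sketch
            raise ValueError(f"unsupported entry type: {entry['type']!r}")
    return result


# Two entries trimmed from the js/ listing quoted in this thread
listing = [
    {"name": "bootstrap-typeahead.js", "path": "js/bootstrap-typeahead.js",
     "type": "file", "size": 8320},
    {"name": "tests", "path": "js/tests", "type": "dir", "size": 0},
]
print([type(c).__name__ for c in parse_listing(listing)])  # ['ContentFile', 'ContentDir']
```

With this design the caller dispatches with isinstance; the reply below instead keeps a single ContentFile class and exposes the distinction as c.type.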

@jacquev6
Member

Something like this (a sketch, not tested):

def crawl(repo, dir_path, process_file, process_dir):
    # repo is a Repository; each item c is a ContentFile
    for c in repo.get_dir_contents(dir_path):
        if c.type == "file":
            process_file(c)
        else:  # c.type == "dir"
            process_dir(c)
            # Recurse into the subdirectory, passing the callbacks along
            crawl(repo, c.path, process_file, process_dir)

The difference between https://api.github.com/repos/twitter/bootstrap/contents/js/bootstrap-affix.js (Repository.get_file_contents) and https://api.github.com/repos/twitter/bootstrap/contents/js (Repository.get_dir_contents) is that the first returns a single structure (ContentFile), while the second returns a list of those same structures (a list of ContentFile).

In the pseudo-code above, the ContentFile passed to process_file will be built from:

{
    "sha": "960f2af85a7ced44c4e3190255ee3092c3665bbb",
    "size": 8320,
    "name": "bootstrap-typeahead.js",
    "path": "js/bootstrap-typeahead.js",
    "type": "file",
    "url": "https://api.github.com/repos/twitter/bootstrap/contents/js/bootstrap-typeahead.js",
    "git_url": "https://api.github.com/repos/twitter/bootstrap/git/blobs/960f2af85a7ced44c4e3190255ee3092c3665bbb",
    "html_url": "https://github.com/twitter/bootstrap/blob/master/js/bootstrap-typeahead.js",
    "_links": {
        "self": "https://api.github.com/repos/twitter/bootstrap/contents/js/bootstrap-typeahead.js",
        "git": "https://api.github.com/repos/twitter/bootstrap/git/blobs/960f2af85a7ced44c4e3190255ee3092c3665bbb",
        "html": "https://github.com/twitter/bootstrap/blob/master/js/bootstrap-typeahead.js"
    }
}

and will be lazily completed by requesting https://api.github.com/repos/twitter/bootstrap/contents/js/bootstrap-typeahead.js the first time c.content or c.encoding is accessed.
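Lazy completion as described here can be sketched as an object that starts from the listing entry and fetches its own `url` only when a field missing from the listing (such as content or encoding) is read. This illustrates the pattern and is not PyGithub's internal code; `LazyContentFile` and the `fetch` callable are hypothetical, and the fake response body is a placeholder.

```python
from typing import Any, Callable, Dict, Optional

Fetch = Callable[[str], Dict[str, Any]]  # hypothetical: url -> decoded JSON


class LazyContentFile:
    """File entry built from a directory listing, completed on demand."""

    def __init__(self, entry: Dict[str, Any], fetch: Fetch) -> None:
        self._data = dict(entry)  # fields present in the listing
        self._fetch = fetch
        self._completed = False

    def _get(self, key: str) -> Optional[Any]:
        # Request the full object only when a missing field is read,
        # and at most once per instance
        if key not in self._data and not self._completed:
            self._data.update(self._fetch(self._data["url"]))
            self._completed = True
        return self._data.get(key)

    @property
    def name(self) -> str:
        return self._data["name"]  # always in the listing, never fetches

    @property
    def content(self) -> Optional[str]:
        return self._get("content")

    @property
    def encoding(self) -> Optional[str]:
        return self._get("encoding")


# Usage with a fake fetch that records the URLs it is asked for
requested = []


def fake_fetch(url: str) -> Dict[str, Any]:
    requested.append(url)
    return {"content": "ZmFrZQ==", "encoding": "base64"}  # placeholder body


entry = {
    "name": "bootstrap-typeahead.js",
    "url": "https://api.github.com/repos/twitter/bootstrap/contents/js/bootstrap-typeahead.js",
}
f = LazyContentFile(entry, fake_fetch)
print(f.name, len(requested))      # bootstrap-typeahead.js 0
print(f.encoding, len(requested))  # base64 1
```

Crawling a tree this way costs one request per directory; a file's own URL is requested only when its content is actually needed.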

The ContentFile passed to process_dir will be built from:

{
    "sha": "f1ad7515dc05d0e2bc60f7c292e4f2134dcd91cf",
    "size": 0,
    "name": "tests",
    "path": "js/tests",
    "type": "dir",
    "url": "https://api.github.com/repos/twitter/bootstrap/contents/js/tests",
    "git_url": "https://api.github.com/repos/twitter/bootstrap/git/trees/f1ad7515dc05d0e2bc60f7c292e4f2134dcd91cf",
    "html_url": "https://github.com/twitter/bootstrap/tree/master/js/tests",
    "_links": {
        "self": "https://api.github.com/repos/twitter/bootstrap/contents/js/tests",
        "git": "https://api.github.com/repos/twitter/bootstrap/git/trees/f1ad7515dc05d0e2bc60f7c292e4f2134dcd91cf",
        "html": "https://github.com/twitter/bootstrap/tree/master/js/tests"
    }
}

and will not be lazily completed. (There is an inconsistency in GitHub API v3 here: requesting the url field of this object does not return the object itself, as it usually does, but the list of objects in the directory.)
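Because a "dir" entry's url returns that directory's listing, a client can walk a tree by following those URLs. This is a minimal iterative (breadth-first) sketch, an alternative to the recursive crawl earlier in the thread; `walk` and the `fetch` callable are hypothetical, and the fake responses, including the js/tests/index.html entry, are made up so the example runs offline.

```python
from collections import deque
from typing import Any, Callable, Dict, List


def walk(root_url: str, fetch: Callable[[str], List[Dict[str, Any]]]) -> List[str]:
    """Breadth-first list of every file path under a directory URL.

    Relies on fetching a "dir" entry's url returning the listing of
    that directory rather than the entry itself.
    """
    files: List[str] = []
    pending = deque([root_url])
    while pending:
        for entry in fetch(pending.popleft()):
            if entry["type"] == "dir":
                pending.append(entry["url"])  # descend into the subdirectory
            else:
                files.append(entry["path"])   # treat anything else as a file
    return files


BASE = "https://api.github.com/repos/twitter/bootstrap/contents/"
fake_api = {
    BASE + "js": [
        {"type": "file", "path": "js/bootstrap-typeahead.js", "url": BASE + "js/bootstrap-typeahead.js"},
        {"type": "dir", "path": "js/tests", "url": BASE + "js/tests"},
    ],
    BASE + "js/tests": [
        {"type": "file", "path": "js/tests/index.html", "url": BASE + "js/tests/index.html"},
    ],
}
print(walk(BASE + "js", fake_api.__getitem__))
```

The explicit queue avoids Python's recursion limit on deeply nested trees.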

@jacquev6
Member

It's now implemented in the develop branch. It will be in the next release, probably tomorrow.

@ksookocheff-va @bilderbuchi I'm closing the issue, but don't hesitate to continue the discussion here if needed.

@bilderbuchi

Thanks. :-)

@vipulgupta2048

vipulgupta2048 commented May 30, 2018

Hi, how would one go about finding directories through PyGithub? I just need to check whether a directory exists in the repo.
Would using repo.get_dir_contents('screenshot/') or repo.get_dir_contents('screenshots/') be correct?
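One hedged way to answer this: later PyGithub releases route directories through Repository.get_contents, which returns a list of ContentFile for a directory and a single ContentFile for a file, and a missing path raises a GithubException with status 404 (UnknownObjectException). The sketch below assumes that behavior, so check it against the version you have installed. `dir_exists`, `FakeRepo` and `FakeNotFound` are hypothetical names; the fakes only let it run offline.

```python
from typing import Any


def dir_exists(repo: Any, path: str) -> bool:
    """Return True if `path` names a directory in `repo`.

    Assumes repo.get_contents(path) returns a list for a directory and a
    single object for a file, and raises an exception with status == 404
    when nothing exists at `path`.
    """
    try:
        contents = repo.get_contents(path)
    except Exception as exc:  # in real code, catch github.GithubException
        if getattr(exc, "status", None) == 404:
            return False
        raise
    return isinstance(contents, list)


# Offline stand-ins for a repository and a "not found" error
class FakeNotFound(Exception):
    status = 404


class FakeRepo:
    def __init__(self, tree: dict) -> None:
        self.tree = tree

    def get_contents(self, path: str) -> Any:
        if path not in self.tree:
            raise FakeNotFound(path)
        return self.tree[path]


repo = FakeRepo({"screenshots/": ["home.png"], "README.md": object()})
print(dir_exists(repo, "screenshots/"), dir_exists(repo, "screenshot/"))  # True False
```

With a real client, pass the Repository returned by get_repo. Note that 'screenshot/' and 'screenshots/' are different paths, so each call checks a different directory.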


4 participants