GithubFileLoader API Error #17453
Comments
🤖 Hey there @PrashantDixit0, fancy seeing you here again! Hope this message finds you coding well 😉
Based on the error message you provided, it seems the URL used to access the GitHub API is incorrect: the request returned a 404 error. Please try setting the base URL for the GitHub API when creating the loader:
loader = GithubFileLoader(
repo="langchain-ai/langchain", # the repo name
access_token=os.environ["GITHUB_ACCESS_TOKEN"],  # read the token from the environment rather than hardcoding it
github_api_url="https://api.github.com",
file_filter=lambda file_path: file_path.endswith(
".md"
), # load all Markdown files.
)
If you still encounter issues after making this change, please share the full traceback of the error. It gives more context about what is causing the error and where in the code it occurs. Also, please ensure that your GitHub personal access token is correct and has the necessary permissions to access the repository; an incorrect token, or one without those permissions, can also cause a 404 error. Lastly, please verify the repository.
|
I am already updating |
@shufanhao @baskaryan I am following this documentation page as a reference and using the same code snippet it gives, but I am still getting the same error. |
loader = GithubFileLoader(
repo="langchain-ai/langchain", # the repo name
access_token=os.environ["GITHUB_ACCESS_TOKEN"],
github_api_url="https://api.github.com",
file_filter=lambda file_path: file_path.endswith(
".md"
), # load all Markdown files.
)
docs = loader.load()
|
Same here. I see in the code for GithubFileLoader that it's incorrectly hardcoding the URL
So, not possible to override that part. It's broken. |
Description:
- Changed the GitHub endpoint, as the existing one was not working and returned a 404 Not Found error.
- The existing function also failed when file_filter was not passed. The tree API returns all paths, including directories, and when get_file_content iterated over these paths it failed on directories, because for a directory the API returns a list of the files inside it. Added a condition to ignore a path if it is a directory.
- Fixes this issue: #17453
Co-authored-by: Radhika Bansal <Radhika.Bansal@veritas.com>
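The directory handling described in the commit message above can be illustrated with a small sketch. This is not the code from the merged change: `select_file_paths` and `sample_tree` are hypothetical names, and the only assumption about the API is that each entry in a Git Trees response carries a `path` and a `type` that is `blob` for a file and `tree` for a directory.

```python
from typing import Callable, Dict, Iterable, List, Optional


def select_file_paths(
    tree_entries: Iterable[Dict[str, str]],
    file_filter: Optional[Callable[[str], bool]] = None,
) -> List[str]:
    """Return the paths of files only, skipping directory entries.

    Hypothetical helper for illustration; not part of LangChain.
    """
    paths = []
    for entry in tree_entries:
        # Directories appear as type "tree"; fetching their content would
        # return a listing instead of a single file, so skip them.
        if entry.get("type") != "blob":
            continue
        path = entry["path"]
        # With no filter, keep every file; otherwise keep only matches.
        if file_filter is None or file_filter(path):
            paths.append(path)
    return paths


# Hand-written sample shaped like the "tree" array of a Git Trees response.
sample_tree = [
    {"path": "docs", "type": "tree"},
    {"path": "docs/index.md", "type": "blob"},
    {"path": "README.md", "type": "blob"},
    {"path": "setup.py", "type": "blob"},
]

print(select_file_paths(sample_tree))
print(select_file_paths(sample_tree, lambda p: p.endswith(".md")))
```

Without the `type` check, the `docs` entry would be passed on to the content fetch even when no `file_filter` is supplied, which matches the failure the commit message describes.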
Having the same issue. |
Did you use the latest code? |
I haven't yet tried with the latest. But, I just reviewed the recent updates to that code and it appears to me that it should now work. |
I can confirm it's working with this requirements.txt file.
|
Can someone confirm whether the above issue is resolved? I can still reproduce the same error with |
@Bluthunder make sure your |
The error still exists, because |
@shufanhao thanks. My error was resolved by adding the branch name when instantiating the loader, as below:
loader = GithubFileLoader(
|
Checked other resources
Example Code
import os

from langchain.document_loaders import GithubFileLoader

loader = GithubFileLoader(
repo="langchain-ai/langchain", # the repo name
access_token=os.environ["GITHUB_ACCESS_TOKEN"],  # read the token from the environment rather than hardcoding it
github_api_url="https://api.github.com",
file_filter=lambda file_path: file_path.endswith(
".md"
), # load all Markdown files.
)
documents = loader.load()
print(documents)
Error Message and Stack Trace (if applicable)
No response
Description
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://api.github.com/api/v3/repos/langchain-ai/langchain/git/trees/master?recursive=1
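For comparison, the public GitHub REST API serves the Git Trees endpoint at `/repos/{owner}/{repo}/git/trees/{tree_sha}` directly under `https://api.github.com`, with no `/api/v3` prefix; that prefix is the convention for GitHub Enterprise Server hosts. A minimal sketch of the difference, where `tree_url` is a hypothetical helper and not part of LangChain:

```python
def tree_url(api_base: str, repo: str, branch: str) -> str:
    """Build a recursive Git Trees URL (hypothetical helper for illustration)."""
    return f"{api_base.rstrip('/')}/repos/{repo}/git/trees/{branch}?recursive=1"


# URL copied from the traceback above.
failing = (
    "https://api.github.com/api/v3/repos/"
    "langchain-ai/langchain/git/trees/master?recursive=1"
)

# URL shape expected by the public API for the same repo and branch.
expected = tree_url("https://api.github.com", "langchain-ai/langchain", "master")

print(expected)
print(failing == expected)
```

The two strings differ only in the extra `/api/v3` segment, which is consistent with the 404 reported here.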