[Bug]: llama-index-readers-github, GithubRepositoryReader no longer supports include filter for specific types of files #10946

Closed
aaronjolson opened this issue Feb 18, 2024 · 3 comments · Fixed by #10949
Labels
bug Something isn't working triage Issue needs to be triaged/prioritized

Comments

@aaronjolson
Bug Description

I am working with a script that reads and vectorizes data from GitHub.
This is the code in the script (it worked with older, <0.10 versions of llama-index):

    while True:
        owner, repo = parse_github_url(github_url)
        if validate_owner_repo(owner, repo):
            loader = GithubRepositoryReader(
                github_client,
                owner=owner,
                repo=repo,
                filter_file_extensions=(
                    [".py", ".js", ".ts", ".md"],
                    GithubRepositoryReader.FilterType.INCLUDE,
                ),
                verbose=False,
                concurrent_requests=5,
            )
            print(f"Loading {repo} repository by {owner}")
            docs = loader.load_data(branch="main")
            print("Documents uploaded:")
            for doc in docs:
                print(doc.metadata)
            break  # Exit the loop once the valid URL is processed
        else:
            print("Invalid GitHub URL. Please try again.")
            github_url = input("Please enter the GitHub repository URL: ")

Having fixed my imports to be compatible with >=0.10.5, running the above code generates the following error:

Traceback (most recent call last):
  File "C:\Users\aaols\PycharmProjects\experiments\llamaindex_activeloop_vectorize_data_from_github.py", line 123, in <module>
    main()
  File "C:\Users\aaols\PycharmProjects\experiments\llamaindex_activeloop_vectorize_data_from_github.py", line 73, in main
    GithubRepositoryReader.FilterType.INCLUDE,
AttributeError: type object 'GithubRepositoryReader' has no attribute 'FilterType'

Looking at the code for the latest version of GithubRepositoryReader, it looks like the filter_file_extensions arg is no longer supported, nor is GithubRepositoryReader.FilterType.

Is this intentional? Does this reader no longer support filtering by a specified allowlist, and instead only support filtering via a denylist?
In this case, specifying everything I don't want is a much bigger task than specifying the few things that I do want.
If that is not the case, what needs to be updated in order to support the explicit allowlist?
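
To make the allowlist/denylist distinction concrete, here is a minimal, self-contained sketch of the two filter modes applied to a list of file paths. The `FilterType` enum and `filter_paths` function below are hypothetical stand-ins written for illustration; they mirror the shape of the `(extensions, FilterType)` tuple used in the script above but are not the library's implementation.

```python
from enum import Enum
from pathlib import PurePosixPath
from typing import Iterable, List, Tuple


class FilterType(Enum):
    """Hypothetical stand-in for GithubRepositoryReader.FilterType."""

    INCLUDE = "include"  # allowlist: keep only matching files
    EXCLUDE = "exclude"  # denylist: drop matching files


def filter_paths(
    paths: Iterable[str],
    extension_filter: Tuple[List[str], FilterType],
) -> List[str]:
    """Keep or drop paths based on their file extension (illustrative only)."""
    extensions, mode = extension_filter
    wanted = {ext.lower() for ext in extensions}
    kept = []
    for path in paths:
        matches = PurePosixPath(path).suffix.lower() in wanted
        if mode is FilterType.INCLUDE and matches:
            kept.append(path)
        elif mode is FilterType.EXCLUDE and not matches:
            kept.append(path)
    return kept


if __name__ == "__main__":
    repo_files = ["src/app.py", "README.md", "logo.png", "web/index.ts", "Makefile"]
    allow = ([".py", ".js", ".ts", ".md"], FilterType.INCLUDE)
    deny = ([".png"], FilterType.EXCLUDE)
    print(filter_paths(repo_files, allow))
    print(filter_paths(repo_files, deny))
```

With an allowlist, four extensions are enough to describe what should be loaded. With a denylist, every unwanted extension (and extensionless files such as `Makefile`) has to be anticipated up front.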

Version

0.10.6

Steps to Reproduce

Try calling GithubRepositoryReader.FilterType.INCLUDE with the latest version of llama-index-readers-github.

Relevant Logs/Tracebacks

No response

@aaronjolson aaronjolson added bug Something isn't working triage Issue needs to be triaged/prioritized labels Feb 18, 2024
dosubot bot commented Feb 18, 2024

🤖

Thank you for bringing this to our attention. It seems like there has been a change in the GithubRepositoryReader class in the latest version of LlamaIndex. I apologize for the inconvenience this has caused.

To better assist you, could you please provide the following additional information:

  1. Could you confirm if you have updated all your dependencies to their latest versions along with LlamaIndex?
  2. Could you share the exact version of the llama-index-readers-github package you are using?
  3. Are there any other parts of your code that interact with the GithubRepositoryReader class or its instances?

This information will help us understand the issue more clearly and provide a more accurate solution. Thank you for your cooperation.


@logan-markewich
Collaborator

@aaronjolson This was likely a mistake when copying over the readers from llama-hub (there were duplicated readers in llama-index and llama-hub at the time).

I can bring back this option :)

@logan-markewich
Collaborator

pip install llama-index-readers-github==0.1.3 -- should be fixed in this version :)
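
For anyone unsure whether their environment already has the fixed release, a small check along these lines may help. It is a sketch only: the helper names are made up for this example, and it assumes plain dotted numeric versions such as 0.1.3 (pre-release suffixes are not handled). Only the standard-library `importlib.metadata` is used.

```python
from importlib import metadata
from typing import Optional, Tuple

PACKAGE = "llama-index-readers-github"
FIXED_IN = "0.1.3"  # version named in the comment above


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '0.1.3' into (0, 1, 3). Assumes plain dotted integers."""
    return tuple(int(part) for part in version.split("."))


def is_at_least(version: str, minimum: str) -> bool:
    """Return True if `version` is the same as or newer than `minimum`."""
    return parse_version(version) >= parse_version(minimum)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version(PACKAGE)
    if found is None:
        print(f"{PACKAGE} is not installed")
    elif is_at_least(found, FIXED_IN):
        print(f"{PACKAGE} {found} should include the restored filter option")
    else:
        print(f"{PACKAGE} {found} predates {FIXED_IN}; consider upgrading")
```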
