AttributeError: C when importing WikipediaLoader / WebBaseLoader #18067

Closed
4 tasks done
brandburner opened this issue Feb 24, 2024 · 2 comments
Labels
🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature 🔌: chroma Primarily related to ChromaDB integrations Ɑ: doc loader Related to document loader module (not documentation)

Comments

@brandburner

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.

Example Code

from langchain_community.document_loaders import WebBaseLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.chains import RetrievalQA

# Load the Wikipedia page
loader = WebBaseLoader("https://en.wikipedia.org/wiki/New_York_City")
documents = loader.load()

# Split the text into chunks
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

# Create embeddings
embeddings = OpenAIEmbeddings()

# Create a vector store
db = Chroma.from_documents(texts, embeddings, collection_name="wiki-nyc")

# Create a retriever
retriever = db.as_retriever()

# Create a QA chain
llm = OpenAI(temperature=0)
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever)

Error Message and Stack Trace (if applicable)

    from langchain_community.document_loaders import WikipediaLoader
  File "/home/runner/Langchain/.pythonlibs/lib/python3.10/site-packages/langchain_community/document_loaders/__init__.py", line 53, in <module>
    from langchain_community.document_loaders.blackboard import BlackboardLoader
  File "/home/runner/Langchain/.pythonlibs/lib/python3.10/site-packages/langchain_community/document_loaders/blackboard.py", line 10, in <module>
    from langchain_community.document_loaders.pdf import PyPDFLoader
  File "/home/runner/Langchain/.pythonlibs/lib/python3.10/site-packages/langchain_community/document_loaders/pdf.py", line 18, in <module>
    from langchain_community.document_loaders.parsers.pdf import (
  File "/home/runner/Langchain/.pythonlibs/lib/python3.10/site-packages/langchain_community/document_loaders/parsers/__init__.py", line 8, in <module>
    from langchain_community.document_loaders.parsers.language import LanguageParser
  File "/home/runner/Langchain/.pythonlibs/lib/python3.10/site-packages/langchain_community/document_loaders/parsers/language/__init__.py", line 1, in <module>
    from langchain_community.document_loaders.parsers.language.language_parser import (
  File "/home/runner/Langchain/.pythonlibs/lib/python3.10/site-packages/langchain_community/document_loaders/parsers/language/language_parser.py", line 39, in <module>
    "c": Language.C,
  File "/nix/store/xf54733x4chbawkh1qvy9i1i4mlscy1c-python3-3.10.11/lib/python3.10/enum.py", line 437, in __getattr__
    raise AttributeError(name) from None
AttributeError: C
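
The last frame of the traceback shows the attribute lookup `Language.C` failing inside `enum.py`, which is what Python does when code asks an enum for a member it does not define. A minimal sketch of that mechanism follows. The `Language` class here is a hypothetical stand-in, not LangChain's own enum, and the assumption that the installed enum simply predates the `C` member is not confirmed anywhere in this thread.

```python
from enum import Enum


class Language(str, Enum):
    """Hypothetical stand-in for an older enum that has no `C` member."""

    CPP = "cpp"
    PYTHON = "python"


def build_extension_map():
    # Same shape as the failing line in the traceback: a dict literal
    # that looks up an enum member by attribute access.
    return {"c": Language.C}


def try_build():
    """Return the exception raised while building the map, or None."""
    try:
        build_extension_map()
    except AttributeError as exc:
        return exc
    return None


if __name__ == "__main__":
    # The exact message text varies between Python versions.
    print(repr(try_build()))
```

Because the lookup runs at module import time, any import that passes through that module fails, which would be consistent with several unrelated loaders showing the same error.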

Description

When I run a basic script that imports WikipediaLoader or WebBaseLoader (maybe any loader?), I get the error shown above at import time. Here's another example script that throws the same error.

from langchain_community.document_loaders import WikipediaLoader

docs = WikipediaLoader(query="Genesis of the Daleks", load_max_docs=2).load()
len(docs)

docs[0].metadata  # metadata of the Document

docs[0].page_content[:400]  # first 400 characters of the Document's content

System Info

System Information

OS: Linux
OS Version: #13~22.04.1-Ubuntu SMP Wed Jan 24 23:39:40 UTC 2024
Python Version: 3.10.11 (main, Apr 4 2023, 22:10:32) [GCC 12.2.0]

Package Information

langchain_core: 0.1.26
langchain: 0.1.6
langchain_community: 0.0.24
langsmith: 0.1.7
langchain_openai: 0.0.5
langchainhub: 0.1.14

Packages not installed (Not Necessarily a Problem)

The following packages were not found:

langgraph
langserve

@dosubot dosubot bot added Ɑ: doc loader Related to document loader module (not documentation) 🔌: chroma Primarily related to ChromaDB integrations 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature labels Feb 24, 2024
@keenborder786
Contributor

Can you please update your package versions? That should resolve your issue.
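
One way to confirm what is installed before and after an upgrade (for example via `pip install --upgrade langchain langchain-community`) is to query the package metadata from Python. This is only a sketch: the list of distribution names is an assumption based on the System Info section, and the thread does not say which versions resolve the error.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names assumed from the System Info section of the report.
PACKAGES = ["langchain", "langchain-community", "langchain-core"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name}: {ver or 'not installed'}")
```

Running it in the same environment as the failing script avoids checking a different interpreter than the one that raises the error.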

@brandburner
Author

Thank you!
