AttributeError: C when importing WikipediaLoader / WebBaseLoader #18067

brandburner · 2024-02-24T14:22:59Z

Checked other resources

I added a very descriptive title to this issue.
I searched the LangChain documentation with the integrated search.
I used the GitHub search to find a similar question and didn't find it.
I am sure that this is a bug in LangChain rather than my code.

Example Code

from langchain_community.document_loaders import WebBaseLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.chains import RetrievalQA

# Load the Wikipedia page
loader = WebBaseLoader("https://en.wikipedia.org/wiki/New_York_City")
documents = loader.load()

# Split the text into chunks
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

# Create embeddings
embeddings = OpenAIEmbeddings()

# Create a vector store
db = Chroma.from_documents(texts, embeddings, collection_name="wiki-nyc")

# Create a retriever
retriever = db.as_retriever()

# Create a QA chain
llm = OpenAI(temperature=0)
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever)

Error Message and Stack Trace (if applicable)

    from langchain_community.document_loaders import WikipediaLoader
  File "/home/runner/Langchain/.pythonlibs/lib/python3.10/site-packages/langchain_community/document_loaders/__init__.py", line 53, in <module>
    from langchain_community.document_loaders.blackboard import BlackboardLoader
  File "/home/runner/Langchain/.pythonlibs/lib/python3.10/site-packages/langchain_community/document_loaders/blackboard.py", line 10, in <module>
    from langchain_community.document_loaders.pdf import PyPDFLoader
  File "/home/runner/Langchain/.pythonlibs/lib/python3.10/site-packages/langchain_community/document_loaders/pdf.py", line 18, in <module>
    from langchain_community.document_loaders.parsers.pdf import (
  File "/home/runner/Langchain/.pythonlibs/lib/python3.10/site-packages/langchain_community/document_loaders/parsers/__init__.py", line 8, in <module>
    from langchain_community.document_loaders.parsers.language import LanguageParser
  File "/home/runner/Langchain/.pythonlibs/lib/python3.10/site-packages/langchain_community/document_loaders/parsers/language/__init__.py", line 1, in <module>
    from langchain_community.document_loaders.parsers.language.language_parser import (
  File "/home/runner/Langchain/.pythonlibs/lib/python3.10/site-packages/langchain_community/document_loaders/parsers/language/language_parser.py", line 39, in <module>
    "c": Language.C,
  File "/nix/store/xf54733x4chbawkh1qvy9i1i4mlscy1c-python3-3.10.11/lib/python3.10/enum.py", line 437, in __getattr__
    raise AttributeError(name) from None
AttributeError: C
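
For readers hitting the same trace: the last two frames show `language_parser.py` evaluating `Language.C` at import time, and `enum.py` raising `AttributeError` because the enum it found has no member with that name. The following is a minimal sketch of that failure mode only. The `Language` class and the `build_extension_map` and `lookup_fails` helpers below are hypothetical stand-ins, not LangChain's actual code.

```python
from enum import Enum


class Language(Enum):
    """Hypothetical stand-in enum; NOT LangChain's real Language enum."""

    PYTHON = "python"
    JS = "js"
    # Deliberately no member named C, mimicking an enum definition
    # that predates the code referencing it.


def build_extension_map():
    # Mirrors the shape of the failing line in the traceback ("c": Language.C).
    # The attribute lookup happens as soon as the dict literal is evaluated.
    return {"py": Language.PYTHON, "c": Language.C}


def lookup_fails():
    """Return True if building the map raises AttributeError."""
    try:
        build_extension_map()
    except AttributeError:
        return True
    return False


print(lookup_fails())
```

In the real traceback the dict is built at module level, so the lookup fails during `import` and every loader exported from `langchain_community.document_loaders` becomes unimportable, not just the one being requested.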

Description

When I run a basic script that imports WikipediaLoader or WebBaseLoader (possibly any loader), I get the AttributeError shown above. Here is another example script that throws the same error.

from langchain_community.document_loaders import WikipediaLoader

docs = WikipediaLoader(query="Genesis of the Daleks", load_max_docs=2).load()
len(docs)

docs[0].metadata  # meta-information of the Document

docs[0].page_content[:400]  # the first 400 characters of the Document's content

System Info

System Information

OS: Linux
OS Version: #13~22.04.1-Ubuntu SMP Wed Jan 24 23:39:40 UTC 2024
Python Version: 3.10.11 (main, Apr 4 2023, 22:10:32) [GCC 12.2.0]

Package Information

langchain_core: 0.1.26
langchain: 0.1.6
langchain_community: 0.0.24
langsmith: 0.1.7
langchain_openai: 0.0.5
langchainhub: 0.1.14

Packages not installed (Not Necessarily a Problem)

The following packages were not found:

langgraph
langserve


keenborder786 · 2024-02-25T00:36:31Z

Can you please update your package versions? That should resolve your issue.
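
Before upgrading, it can help to confirm which versions are actually installed in the environment that raises the error. A minimal sketch, assuming Python 3.8+ and the standard-library `importlib.metadata`; the `PACKAGES` list and the `installed_versions` helper are illustrative names, not part of LangChain.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the "Package Information" section of this issue.
PACKAGES = ["langchain", "langchain-community", "langchain-core", "langchain-openai"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name}: {ver or 'not installed'}")
```

The thread does not say which versions resolve the error. Upgrading the related packages together, for example with `pip install --upgrade langchain langchain-community langchain-core`, is one way to follow the advice above while keeping the packages in step with each other.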

brandburner · 2024-02-25T11:06:40Z

Thank you!

dosubot bot added the labels "Ɑ: doc loader" (Related to document loader module, not documentation), "🔌: chroma" (Primarily related to ChromaDB integrations), and "🤖:bug" (Related to a bug, vulnerability, unexpected error with an existing feature) on Feb 24, 2024

brandburner closed this as completed Feb 25, 2024
