Closed
Labels
bug: Something isn't working
Description
I have checked the documentation and related resources and couldn't resolve my bug.
Describe the bug
Error in the extract_text method of regex_based.py: "global flags not at the start"
The regular expression pattern in ragas_experimental/testset/extractors/regex_based.py places global inline flags somewhere other than the start of the pattern. The flags need to move to the start of the pattern or be passed as a separate argument.
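For reference, Python 3.11 turned misplaced global inline flags from a deprecation warning into a hard `re.error` ("global flags not at the start of the expression"). The sketch below shows the two fixes named above on a hypothetical pattern. The patterns and the `find_all` helper are illustrative assumptions, not the ones in regex_based.py, which are not quoted in this issue.

```python
import re

# Hypothetical pattern with a global flag in the middle, as happens when a
# sub-pattern that starts with (?i) is appended to another pattern.
# On Python 3.11+, re.compile(BROKEN) raises re.error.
BROKEN = r"https?://\S+|(?i)[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}"

# Fix 1: move the global flag to the very start of the pattern.
FLAG_FIRST = r"(?i)https?://\S+|[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}"

# Fix 2: drop the inline flag and pass it as a separate argument.
NO_FLAG = r"https?://\S+|[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}"


def find_all(text: str) -> tuple[list[str], list[str]]:
    """Run both fixed variants so their results can be compared."""
    by_prefix = re.findall(FLAG_FIRST, text)
    by_argument = re.findall(NO_FLAG, text, flags=re.IGNORECASE)
    return by_prefix, by_argument
```

Both variants apply case-insensitive matching to the whole pattern, so they should return the same matches for the same input.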
Ragas version: v0.1.10
Python version: 3.11.4
Code to Reproduce
from langchain_community.document_loaders import DirectoryLoader
from ragas_experimental.testset import SimpleTestGenerator
from langchain_community.document_loaders import TextLoader
import pandas as pd
import nest_asyncio

nest_asyncio.apply()

# Initialize the loader
loader = DirectoryLoader("./experimental_notebook")

try:
    # Load the documents
    docs = loader.load()
    print(f"Loaded {len(docs)} documents")

    # Initialize the generator
    generator = SimpleTestGenerator()
    print(" \n Run generator method \n")

    # Generate the test dataset
    testdataset = generator.generate(docs, test_size=10)
    print("TestDataset generated successfully")

    # Convert to pandas DataFrame using the to_pandas method
    df = testdataset.to_pandas()
    print("DataFrame created successfully")

    # Display the first few rows of the DataFrame
    print(df.head())

    # Optionally, save the DataFrame to a CSV file
    df.to_csv("testdataset.csv", index=False)
    print("CSV file saved successfully")
except AttributeError as e:
    print(f"AttributeError: {e}")
except Exception as e:
    print(f"An error occurred: {e}")
Expected behavior
Expected the extractor to pull links, emails, and markdown headings from the documents and add them to the metadata.
Additional context
I believe the error comes from combining the patterns of the different rule-based extractors.
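If the cause is indeed that several patterns, each starting with its own global flag, are joined with `|`, one possible remedy is to convert each leading flag into a scoped flag group before joining. This is only a sketch under that assumption: `scope_flags`, `combine`, and the three example patterns are hypothetical and are not taken from regex_based.py.

```python
import re

# Matches a leading global flag group such as (?i) or (?im).
_LEADING_FLAGS = re.compile(r"^\(\?([aiLmsux]+)\)")


def scope_flags(pattern: str) -> str:
    """Rewrite a leading global flag, e.g. (?i)abc, as a scoped group (?i:abc)."""
    match = _LEADING_FLAGS.match(pattern)
    if match is None:
        return f"(?:{pattern})"
    flags = match.group(1)
    body = pattern[match.end():]
    return f"(?{flags}:{body})"


def combine(named_patterns: dict) -> re.Pattern:
    """Join patterns into one alternation with a named group per pattern."""
    parts = [
        f"(?P<{name}>{scope_flags(pattern)})"
        for name, pattern in named_patterns.items()
    ]
    return re.compile("|".join(parts))


# Hypothetical extractor patterns, each carrying its own global flag.
PATTERNS = {
    "email": r"(?i)[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}",
    "link": r"(?i)https?://[^\s)>\]]+",
    "heading": r"(?m)^#{1,6}\s+.+$",
}


def extract(text: str) -> list:
    """Return (pattern name, matched text) pairs in order of appearance."""
    combined = combine(PATTERNS)
    return [(m.lastgroup, m.group(0)) for m in combined.finditer(text)]
```

Scoped flags keep each pattern's flags local to its own branch, so a case-insensitive email pattern does not change how the heading pattern matches.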
