
Regular Expression Flag Placement Error in extract_text Method #1112

@vaishakhRaveendran

I have checked the documentation and related resources and couldn't resolve my bug.

Describe the bug
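The `extract_text` method in `regex_based.py` fails with the error "global flags not at the start".
The regular expression pattern in `ragas_experimental/testset/extractors/regex_based.py` places global inline flags (such as `(?i)`) somewhere other than the start of the pattern. Since Python 3.11, `re` rejects this with an error; earlier versions only emitted a `DeprecationWarning`. The flags need to be moved to the start of the pattern or passed as a separate `flags` argument.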
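For reference, a minimal sketch of the failure and the two fixes, using a toy pattern. The pattern `guide(?i)` and the sample text are made up for illustration and are not taken from `regex_based.py`. On Python 3.11+ the first `re.compile` call raises `re.error`; both fixed patterns compile and match case-insensitively.

```python
import re

text = "See the GUIDE at https://example.com"

# Problematic (illustrative pattern): the global inline flag (?i) appears
# after other pattern text. Python 3.11+ rejects this with re.error
# ("global flags not at the start of the expression"); 3.6-3.10 only
# emitted a DeprecationWarning.
bad = r"guide(?i)"
try:
    re.compile(bad)
except re.error as exc:
    print(f"re.error: {exc}")

# Fix 1: move the inline flag to the very start of the pattern.
fix_inline = re.compile(r"(?i)guide")

# Fix 2: drop the inline flag and pass it as a separate argument instead.
fix_arg = re.compile(r"guide", flags=re.IGNORECASE)

print(fix_inline.findall(text))  # ['GUIDE']
print(fix_arg.findall(text))     # ['GUIDE']
```

Passing `flags=` tends to be easier to maintain when patterns are later combined, because the pattern strings themselves then carry no inline flags.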

Ragas version: v0.1.10
Python version: 3.11.4

Code to Reproduce

```python
from langchain_community.document_loaders import DirectoryLoader
from langchain_community.document_loaders import TextLoader
from ragas_experimental.testset import SimpleTestGenerator
import pandas as pd
import nest_asyncio

nest_asyncio.apply()

# Initialize the loader
loader = DirectoryLoader("./experimental_notebook")

try:
    # Load the documents
    docs = loader.load()
    print(f"Loaded {len(docs)} documents")

    # Initialize the generator
    generator = SimpleTestGenerator()
    print(" \n Run generator method \n")

    # Generate the test dataset
    testdataset = generator.generate(docs, test_size=10)
    print("TestDataset generated successfully")

    # Convert to pandas DataFrame using the to_pandas method
    df = testdataset.to_pandas()
    print("DataFrame created successfully")

    # Display the first few rows of the DataFrame
    print(df.head())

    # Optionally, save the DataFrame to a CSV file
    df.to_csv("testdataset.csv", index=False)
    print("CSV file saved successfully")

except AttributeError as e:
    print(f"AttributeError: {e}")
except Exception as e:
    print(f"An error occurred: {e}")
```

Error trace
[Screenshot of the error trace]

Expected behavior
The extractors should pull links, email addresses, and Markdown headings out of each document and add them to the document's metadata.
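A minimal sketch of that expected behavior, assuming each extractor is compiled as its own pattern and its matches are stored under a separate metadata key. The patterns, the `EXTRACTORS` mapping, the metadata keys, and the `add_regex_metadata` helper are all hypothetical and are not the Ragas implementation.

```python
import re

# Hypothetical patterns for illustration only; these are NOT the patterns
# shipped in ragas_experimental. Flags are passed as arguments, so no
# inline flag can end up mid-pattern.
EXTRACTORS = {
    "emails": re.compile(r"[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}", re.IGNORECASE),
    "links": re.compile(r"https?://[^\s)>\]]+", re.IGNORECASE),
    "headings": re.compile(r"^#{1,6}[ \t]+.+$", re.MULTILINE),
}


def add_regex_metadata(text: str, metadata: dict) -> dict:
    """Hypothetical helper: return a copy of `metadata` with one list of
    matches per extractor, leaving the input dict unchanged."""
    enriched = dict(metadata)
    for key, pattern in EXTRACTORS.items():
        enriched[key] = pattern.findall(text)
    return enriched


doc_text = "# Setup\nMail bob@example.com or see https://example.com/docs for details\n## Usage\n"
meta = add_regex_metadata(doc_text, {"source": "notes.md"})
print(meta)
```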

Additional context
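I believe the error comes from how the different rule-based (regex) extractions are combined: if each extractor's pattern starts with its own global inline flag, joining those patterns into one expression leaves those flags in the middle of the combined pattern.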
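If the extractors' patterns are joined into a single alternation, one way to keep their flags valid is to rewrite each leading global flag group as a scoped group (`(?i)abc` becomes `(?i:abc)`), which Python's `re` accepts anywhere in a pattern (3.6+). This is a sketch under assumptions: `PATTERNS`, `to_scoped`, and the sample text are hypothetical, and `to_scoped` only handles a single leading group made of the `i`, `m`, `s`, `x` flags.

```python
import re

# Hypothetical per-extractor patterns (NOT Ragas's actual ones). Each starts
# with a global inline flag, which is fine while it is compiled on its own.
PATTERNS = {
    "email": r"(?i)[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}",
    "link": r"(?i)https?://[^\s)>\]]+",
    "heading": r"(?m)^#{1,6}[ \t]+.+$",
}

# Matches a leading global flag group such as "(?i)" or "(?im)".
_LEADING_FLAGS = re.compile(r"^\(\?([imsx]+)\)")


def to_scoped(pattern: str) -> str:
    """Rewrite a leading global flag group as a scoped group, e.g.
    "(?i)abc" -> "(?i:abc)", so the pattern can sit mid-expression.
    Patterns without a leading flag group are wrapped in "(?:...)"."""
    m = _LEADING_FLAGS.match(pattern)
    if m is None:
        return f"(?:{pattern})"
    return f"(?{m.group(1)}:{pattern[m.end():]})"


# Naively joining PATTERNS would leave "(?i)"/"(?m)" inside named groups,
# which Python 3.11+ rejects. Scoped flags are allowed anywhere.
combined = re.compile(
    "|".join(f"(?P<{name}>{to_scoped(p)})" for name, p in PATTERNS.items())
)

text = "# Setup\nMail bob@example.com or see https://example.com/docs for details\n## Usage\n"
for match in combined.finditer(text):
    # lastgroup is the name of the extractor whose alternative matched.
    print(match.lastgroup, match.group())
```

Compiling each pattern separately and merging the results avoids the rewrite entirely; the single-pass version mainly helps when the matches should come back in document order.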

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests