web-data-extractor is a Python command-line tool that fetches a web page and extracts the data matching a regular expression you provide. It uses Requests to download the page, BeautifulSoup to parse the HTML, and Python regular expressions to find the matches.
- Fetches web pages using provided URLs.
- Extracts data using BeautifulSoup and regular expressions.
- Command-line interface for easy usage.
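
The fetch-and-extract flow listed above can be sketched roughly as follows. This is a minimal illustration, not the code in `web_data_extractor.py`: the function names `page_text` and `find_matches`, the `timeout` parameter, and the choice to match against the page's visible text rather than the raw HTML are all assumptions.

```python
import re


def find_matches(text, pattern):
    """Return every substring of `text` matched by `pattern`.

    re.finditer with group(0) always yields the full match, even when
    the pattern contains capture groups.
    """
    return [match.group(0) for match in re.finditer(pattern, text)]


def page_text(url, timeout=10):
    """Download `url` and return its visible text (hypothetical helper).

    The third-party imports are kept local so that find_matches can be
    used without Requests or BeautifulSoup4 installed.
    """
    import requests
    from bs4 import BeautifulSoup

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx responses
    soup = BeautifulSoup(response.text, "html.parser")
    return soup.get_text(separator=" ")
```

With both helpers in place, `find_matches(page_text(url), pattern)` would return the list of matches for one page. Matching against `get_text()` output ignores tags and attributes; a pattern meant to capture something like `href` values would need the raw HTML instead.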
- Clone the repository:
  git clone https://github.com/your-username/web-data-extractor.git
  cd web-data-extractor
- Create and activate a virtual environment (optional but recommended):
  python3 -m venv venv
  source venv/bin/activate
- Install the required dependencies:
  pip install -r requirements.txt
Run the script from the command line with the URL and regular expression as arguments:
  python web_data_extractor.py <URL> <REGEX_PATTERN>

Requirements:
- Python 3.6 or higher
- BeautifulSoup4
- Requests
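
For reference, the two positional arguments in the usage line (`<URL>` and `<REGEX_PATTERN>`) could be parsed along these lines. This is only a sketch using the standard-library `argparse` module; the actual script may read its arguments differently, and the names `build_parser`, `url`, and `pattern` are assumptions.

```python
import argparse


def build_parser():
    """Build a parser for the two positional arguments (hypothetical)."""
    parser = argparse.ArgumentParser(
        prog="web_data_extractor.py",
        description="Fetch a web page and print regex matches.",
    )
    parser.add_argument("url", help="address of the page to fetch")
    parser.add_argument("pattern", help="regular expression to search for")
    return parser


# Parse an explicit argument list instead of sys.argv for illustration.
args = build_parser().parse_args(["https://example.com", r"\d{3}-\d{4}"])
print(args.url, args.pattern)
```

When running the tool from a shell, wrap the pattern in single quotes so that characters such as `\`, `*`, and `|` reach Python unchanged.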
Contributions are welcome! Please fork the repository and submit a pull request with your changes.
- Fork the repository
- Create a new branch (git checkout -b feature-branch)
- Commit your changes (git commit -m 'Add some feature')
- Push to the branch (git push origin feature-branch)
- Open a pull request

If you have any questions or suggestions, feel free to open an issue or contact the repository owner.