This Python script scrapes data from the Supremo Tribunal Federal (STF) jurisprudence website, jurisprudencia.stf.jus.br. It uses Playwright for browser automation and Selectolax for HTML parsing.
Before running the script, make sure you have the following installed:
- Python 3.x
- Playwright Python library (playwright)
- BeautifulSoup Python library (BeautifulSoup)
- Browser
You can install the dependencies using pip:
pip install playwright beautifulsoup4
You can install Chromium using apt-get:
sudo apt-get install chromium
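To confirm the prerequisites above are in place before running the script, a small check along these lines may help. This is only a sketch and not part of the repository: the helper names `missing_modules` and `find_browser` are hypothetical, and the executable names `chromium` and `chromium-browser` are assumptions that may differ on your system. Note that the beautifulsoup4 package is imported under the name `bs4`.

```python
import importlib.util
import shutil


def missing_modules(names):
    """Return the top-level module names in `names` that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def find_browser(candidates=("chromium", "chromium-browser")):
    """Return the path of the first candidate executable found on PATH, or None.

    The candidate names are assumptions; adjust them for your distribution.
    """
    for candidate in candidates:
        path = shutil.which(candidate)
        if path:
            return path
    return None
```

For example, `missing_modules(["playwright", "bs4"])` returns an empty list when both libraries are installed, and `find_browser()` returns `None` when no matching browser executable is on the PATH.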
- Clone this repository to your local machine:
git clone https://github.com/mvdiogo/stf-web-scraper.git
- Navigate to the project directory:
cd stf-web-scraper
- Run the script:
python app.py
- The script will launch a browser window, navigate to the STF website, scrape the data based on the specified base and subject, and print the results to the console.
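The flow described in the last step (build a search from a base and a subject, open it in a browser, and collect the rendered HTML) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in app.py: the search path `/pages/search`, the query parameter names `base`, `queryString` and `page`, and the function names `build_search_url` and `fetch_html` are all hypothetical. The Playwright calls used are part of its synchronous Python API.

```python
from urllib.parse import urlencode

# Assumed search endpoint; the real script may use a different path.
BASE_URL = "https://jurisprudencia.stf.jus.br/pages/search"


def build_search_url(base, subject, page=1):
    """Build a search URL from a base and a subject (parameter names are assumed)."""
    params = {"base": base, "queryString": subject, "page": page}
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_html(url, headless=False):
    """Open `url` in Chromium via Playwright and return the rendered HTML."""
    # Imported here so the URL helper works even without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=headless)
        try:
            page = browser.new_page()
            page.goto(url)
            # Wait for network activity to settle so dynamic content is rendered.
            page.wait_for_load_state("networkidle")
            return page.content()
        finally:
            browser.close()
```

A call such as `fetch_html(build_search_url("acordaos", "habeas corpus"))` would open a browser window and return the page source, which could then be handed to an HTML parser to extract the results.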
Contributions are welcome! If you find any issues or want to add new features, feel free to open an issue or submit a pull request.
This project is licensed under the MIT License - see the LICENSE file for details.