This project is a Python scraper that uses BeautifulSoup to extract book information from an Arabic Library website. For each book it collects the author, title, language, number of pages, publishing house, size, format, category, and URL. The data is stored in a clean CSV file, ready for further analysis or for use in machine learning models.
The goal of this project is to demonstrate the ability to scrape data from a website and store it in a structured format for analysis. The Arabic Library website was chosen as the target site for this project.
- Extracts detailed information about books from the Arabic Library website.
- Stores the scraped data in a clean CSV file.
- Produces output that is ready for further analysis or for use in machine learning models.
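The extract-and-store flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in scraper.py: the HTML snippet, the CSS class names (`book`, `title`, `author`, `pages`), and the helper names `parse_books` and `books_to_csv` are all assumptions made for the example, since the real site's markup is not described here. It parses an inline HTML string rather than fetching a live page.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical markup: the real site's HTML structure is an assumption here.
SAMPLE_HTML = """
<div class="book">
  <h2 class="title"><a href="/books/1">كتاب تجريبي</a></h2>
  <span class="author">مؤلف تجريبي</span>
  <span class="pages">120</span>
</div>
<div class="book">
  <h2 class="title"><a href="/books/2">Sample Book</a></h2>
  <span class="author">Sample Author</span>
  <span class="pages">85</span>
</div>
"""

# Only a subset of the fields listed in this README, to keep the sketch short.
FIELDS = ["title", "author", "pages", "url"]


def parse_books(html):
    """Return one dict per book card found in the HTML (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    books = []
    for card in soup.select("div.book"):
        link = card.select_one("h2.title a")
        books.append({
            "title": link.get_text(strip=True),
            "author": card.select_one("span.author").get_text(strip=True),
            "pages": int(card.select_one("span.pages").get_text(strip=True)),
            "url": link["href"],
        })
    return books


def books_to_csv(books):
    """Serialize the book dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(books)
    return buffer.getvalue()


books = parse_books(SAMPLE_HTML)
csv_text = books_to_csv(books)
```

In a real run, the HTML would come from a Requests response instead of a string, and the CSV text would be written to a file opened with a UTF-8 encoding so that Arabic titles are preserved.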
- Python
- BeautifulSoup
- Requests
- Pandas
- Clone the repository:
git clone https://github.com/husseini2000/Web_Scraping.git
- Navigate to the project directory:
cd Web_Scraping
- Install the required libraries:
pip install -r requirements.txt
- Run the scraper script:
python scraper.py
- The script will generate a CSV file named arabic_library_books.csv containing the scraped data.
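Once the CSV exists, a quick sanity check with Pandas might look like the sketch below. The helper names `load_books` and `summarize` are hypothetical, and UTF-8 encoding is an assumption about how the file is written. The summary deliberately avoids relying on specific column names, since the exact headers in the generated file are not listed here.

```python
import pandas as pd


def load_books(source="arabic_library_books.csv"):
    """Load the scraped data from a path or file-like object (UTF-8 assumed)."""
    return pd.read_csv(source, encoding="utf-8")


def summarize(df):
    """Return the row count, column names, and missing-value count per column."""
    return {
        "rows": len(df),
        "columns": list(df.columns),
        "missing": {col: int(n) for col, n in df.isna().sum().items()},
    }
```

After running the scraper, calling `summarize(load_books())` from the project directory gives a first look at how many books were collected and which fields have gaps.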
Contributions are welcome! If you have any suggestions or improvements, please create a pull request or open an issue.
This project is licensed under the MIT License. See the LICENSE file for details.
- Al-Husseini Abdelaleem
- Email: husseiniahmed2015@gmail.com
- LinkedIn: linkedin.com/in/al-husseiniabdelaleem