Holiday_Crawler_Python

holiday_crawler.py is a Python script that scrapes Taiwan's holiday information from the Office Holidays website. It retrieves data for multiple years, enriches it with compensated holidays and extended weekends, and exports the result to a CSV file.

Features

Scraping Holidays by Year
- The script fetches holiday data for specified years from the Office Holidays website.
- It uses the requests library to make HTTP requests.
- The returned HTML is parsed with BeautifulSoup.
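
The fetch-and-parse step above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the URL pattern in BASE_URL, the function names parse_holiday_table and fetch_holidays, and the idea that the holidays sit in the first table on the page are all assumptions that should be checked against the live site.

```python
import time

import requests
from bs4 import BeautifulSoup

# Assumed URL pattern; verify it against the live site before relying on it.
BASE_URL = "https://www.officeholidays.com/countries/taiwan/{year}"


def parse_holiday_table(html):
    """Return one dict per data row of the first HTML table, keyed by header text."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    headers = [th.get_text(strip=True) for th in table.find_all("th")]
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        # The header row has no <td> cells, so it is skipped here,
        # as is any row whose cell count does not match the headers.
        if cells and len(cells) == len(headers):
            rows.append(dict(zip(headers, cells)))
    return rows


def fetch_holidays(years, delay_seconds=1.0):
    """Download and parse each year's page, pausing between requests."""
    results = []
    for year in years:
        response = requests.get(BASE_URL.format(year=year), timeout=10)
        response.raise_for_status()
        for row in parse_holiday_table(response.text):
            row["Year"] = year
            results.append(row)
        time.sleep(delay_seconds)  # be polite to the remote server
    return results
```

Keeping the parser separate from the download makes it possible to test the parsing logic on a saved HTML snippet without touching the network.
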
Compensated Holidays Handling
- Compensated holidays are identified based on the presence of the "Compensated by" string in the Comments column.
- The script extracts and adds compensated holidays, considering the original holiday's date.
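
One way the compensated-holiday step could look is sketched below. The README only states that the "Compensated by" string in the Comments column is the trigger; everything else here is an assumption made for illustration: the function name add_compensated_rows, the column names Date and Holiday Name, the comment shape "Compensated by <Month> <day>", and the choice to reuse the original holiday's year for the extracted date.

```python
import pandas as pd


def add_compensated_rows(df):
    """Append one row per 'Compensated by' comment, dated from the comment text.

    Expects a datetime64 'Date' column plus 'Holiday Name' and 'Comments'.
    """
    mask = df["Comments"].str.contains("Compensated by", na=False)
    comp = df.loc[mask].copy()

    # Assumed comment shape: "Compensated by <Month> <day>", e.g. "... March 9".
    month_day = comp["Comments"].str.extract(
        r"Compensated by\s+([A-Za-z]+ \d{1,2})", expand=False
    )
    # The comment carries no year, so borrow it from the original holiday.
    year = comp["Date"].dt.year.astype(str)
    comp["Date"] = pd.to_datetime(
        month_day + " " + year, format="%B %d %Y", errors="coerce"
    )
    comp["Holiday Name"] = comp["Holiday Name"] + " (compensated)"

    # Rows whose comment did not match the assumed shape are dropped.
    comp = comp.dropna(subset=["Date"])
    combined = pd.concat([df, comp], ignore_index=True)
    return combined.sort_values("Date").reset_index(drop=True)
```

If the site words its comments differently, only the regular expression and the date format string should need to change.
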
Extended Weekends Handling
- The script checks the original dataset for holidays that fall on a Friday or a Monday.
- For each such holiday it adds the corresponding extended-weekend entries, giving additional context about long weekends.
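
The extended-weekend logic might be sketched as below. This is one plausible reading of the description, not the script's confirmed behavior: a Friday holiday is assumed to pull in the following Saturday and Sunday, and a Monday holiday the preceding Saturday and Sunday. The function name add_extended_weekends and the column names are hypothetical.

```python
import pandas as pd


def add_extended_weekends(df):
    """Add Saturday/Sunday rows next to holidays that fall on a Friday or Monday.

    Expects a datetime64 'Date' column plus 'Holiday Name' and 'Comments'.
    """
    extra = []
    for _, row in df.iterrows():
        weekday = row["Date"].weekday()  # Monday == 0 ... Sunday == 6
        if weekday == 4:       # Friday: the weekend follows the holiday
            offsets = (1, 2)
        elif weekday == 0:     # Monday: the weekend precedes the holiday
            offsets = (-2, -1)
        else:
            continue
        for offset in offsets:
            extra.append({
                "Date": row["Date"] + pd.Timedelta(days=offset),
                "Holiday Name": "Extended weekend",
                "Comments": f"Long weekend around {row['Holiday Name']}",
            })

    if not extra:
        return df.copy()

    combined = pd.concat([df, pd.DataFrame(extra)], ignore_index=True)
    # Original rows come first, so they win if a date appears twice.
    combined = combined.drop_duplicates(subset="Date")
    return combined.sort_values("Date").reset_index(drop=True)
```
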
Export to CSV
- The final processed data is exported to a CSV file for further analysis or integration with other applications.
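
A minimal sketch of the export step, assuming the processed data is held in a pandas DataFrame. The README names an export_to_csv function but does not show its signature, so the filename and directory parameters below are assumptions.

```python
from pathlib import Path

import pandas as pd


def export_to_csv(df, filename="out_python.csv", directory="."):
    """Write the DataFrame to directory/filename and return the resulting path."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    path = out_dir / filename
    df.to_csv(path, index=False)  # index=False keeps the row index out of the file
    return path
```
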

Dependencies

requests: For making HTTP requests.
BeautifulSoup: For parsing HTML content (installed as the beautifulsoup4 package).
pandas: For data manipulation and analysis.
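datetime: For handling date-related operations (Python standard library, no installation needed).
time: For introducing delays between requests during web scraping (Python standard library, no installation needed).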

Usage

Clone the Repository:

git clone https://github.com/house40105/Holiday_Crawler_Python.git
cd Holiday_Crawler_Python

Install Dependencies:

pip install requests beautifulsoup4 pandas

Run the Script:
```
python holiday_crawler.py
```
Output:
- The script generates a CSV file named out_python.csv in the directory set in the export_to_csv function.

Configuration

You can customize the number of years to scrape by modifying the generate_year_list function parameters.
Adjust the output filename and directory in the export_to_csv function.
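
For illustration, a year-list helper could look like the sketch below. The README names generate_year_list but does not document its parameters, so num_years and start_year are hypothetical and may not match the script.

```python
from datetime import date


def generate_year_list(num_years=3, start_year=None):
    """Return num_years consecutive years, starting at start_year (default: this year)."""
    if num_years < 1:
        raise ValueError("num_years must be at least 1")
    if start_year is None:
        start_year = date.today().year
    return [start_year + offset for offset in range(num_years)]
```

Under these assumptions, generate_year_list(3, 2023) yields the years 2023, 2024, and 2025.
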

Notes

Ensure proper internet connectivity during execution as the script fetches data from an external website.
Respect the website's terms of service and scraping policies.
