# Assignment: Extracting Data from a Static Web Page

Extract information about “วันพระ” (Wan Phra, the Buddhist holy days) for three years from the following pages:
- https://www.myhora.com/ปฏิทิน/วันพระ-พ.ศ.2565.aspx
- https://www.myhora.com/ปฏิทิน/วันพระ-พ.ศ.2566.aspx
- https://www.myhora.com/ปฏิทิน/วันพระ-พ.ศ.2567.aspx


Note that you can use the dateparser package to parse Thai dates. The cell below installs it for Google Colab users; elsewhere, installing it from the command line (pip or conda) is recommended.

In [1]:
import sys
IN_COLAB = 'google.colab' in sys.modules
if IN_COLAB:
    %pip install dateparser

Collecting dateparser
  Downloading dateparser-1.2.0-py2.py3-none-any.whl (294 kB)
Installing collected packages: dateparser
Successfully installed dateparser-1.2.0


In [2]:
import dateparser

To convert a Thai date string, we use the parse method. Note that parse treats the year as a Gregorian (CE) year, not a Buddhist Era (BE) year, so we have to subtract 543 from the year ourselves. In addition, weekday() returns the day of the week with 0 = Monday, ..., 6 = Sunday.
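If you would rather not depend on dateparser, or want to see exactly what the conversion involves, the same idea can be sketched with the standard library alone. This is a swapped-in alternative, not how dateparser works internally. The names parse_thai_date, THAI_MONTHS and BE_OFFSET are hypothetical, and the helper assumes the string contains a day number, a full Thai month name and a four-digit BE year, in that order. It subtracts 543 before building the date, and weekday() follows the same 0 = Monday convention.

```python
import re
from datetime import date

# Full Thai month names -> month number (assumed spelling, as in the examples)
THAI_MONTHS = {
    'มกราคม': 1, 'กุมภาพันธ์': 2, 'มีนาคม': 3, 'เมษายน': 4,
    'พฤษภาคม': 5, 'มิถุนายน': 6, 'กรกฎาคม': 7, 'สิงหาคม': 8,
    'กันยายน': 9, 'ตุลาคม': 10, 'พฤศจิกายน': 11, 'ธันวาคม': 12,
}

BE_OFFSET = 543  # Buddhist Era year = Common Era year + 543

def parse_thai_date(text):
    """Hypothetical helper: parse '<day> <Thai month> <BE year>' into a CE date."""
    pattern = r'(\d{1,2})\s+(' + '|'.join(THAI_MONTHS) + r')\s+(\d{4})'
    m = re.search(pattern, text)
    if m is None:
        raise ValueError(f'no Thai date found in {text!r}')
    day, month_name, be_year = m.groups()
    # Convert the year before building the date object
    return date(int(be_year) - BE_OFFSET, THAI_MONTHS[month_name], int(day))

d = parse_thai_date('วันศุกร์ที่ 17 มกราคม 2563')
print(d, d.weekday())  # 2020-01-17 4
```

Converting the year first also means a date such as 29 February 2567 BE (2024 CE) can be built, since the date object only ever sees the CE year.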

In [3]:
dt = dateparser.parse('วันศุกร์ที่ 17 มกราคม 2563')

# dt still holds 2563 (the BE year read as a CE year), so weekday() gives 0 (Monday), which is wrong
print(dt)
print(dt.weekday())

# after converting to the CE year 2020, weekday() gives 4 (Friday), which is correct
dt = dt.replace(year=dt.year-543)
print(dt)
print(dt.weekday())

2563-01-17 00:00:00
0
2020-01-17 00:00:00
4
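One caveat with parsing first and subtracting 543 afterwards: the intermediate datetime carries the BE year, and a BE year number is not always a leap year when the matching CE year is. For example, 2567 BE is 2024 CE (a leap year), but 2567 is not divisible by 4, so 29 February 2567 cannot exist as an intermediate Gregorian date. The standard-library check below illustrates the problem; feb29_is_risky is a hypothetical helper, and how dateparser itself reacts to such input is not shown here.

```python
import calendar
from datetime import date

BE_OFFSET = 543  # Buddhist Era year = Common Era year + 543

def feb29_is_risky(be_year):
    """True if 29 Feb exists in the CE year but not in the raw BE year number."""
    return calendar.isleap(be_year - BE_OFFSET) and not calendar.isleap(be_year)

for be_year in (2565, 2566, 2567):
    print(be_year, be_year - BE_OFFSET, feb29_is_risky(be_year))

# Building the intermediate date with the BE year fails for 2567 ...
try:
    date(2567, 2, 29)
except ValueError as exc:
    print('ValueError:', exc)

# ... while converting the year first works
print(date(2567 - BE_OFFSET, 2, 29))
```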


In [4]:
dt = dateparser.parse('วันเสาร์ที่ 21 กันยายน 2564')
dt = dt.replace(year=dt.year-543)
print(dt)
print(dt.weekday())

2021-09-21 00:00:00
1
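This second example also shows that the weekday name in the string plays no part in the result: the text says วันเสาร์ (Saturday), yet 21 September 2021 was a Tuesday, so weekday() returns 1. To catch such inconsistencies in scraped text, a minimal check could compare the Thai weekday name with the computed weekday. THAI_WEEKDAYS, stated_weekday and weekday_matches are hypothetical names, and the mapping assumes full weekday names of the form วันจันทร์.

```python
from datetime import date

# Thai weekday names, indexed like date.weekday(): 0=Monday ... 6=Sunday
THAI_WEEKDAYS = ['วันจันทร์', 'วันอังคาร', 'วันพุธ', 'วันพฤหัสบดี',
                 'วันศุกร์', 'วันเสาร์', 'วันอาทิตย์']

def stated_weekday(text):
    """Return the weekday index named in text, or None if no name is found."""
    for index, name in enumerate(THAI_WEEKDAYS):
        if name in text:
            return index
    return None

def weekday_matches(text, d):
    """True if the weekday named in text agrees with the actual date d."""
    return stated_weekday(text) == d.weekday()

print(weekday_matches('วันศุกร์ที่ 17 มกราคม 2563', date(2020, 1, 17)))   # True
print(weekday_matches('วันเสาร์ที่ 21 กันยายน 2564', date(2021, 9, 21)))  # False
```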


Count how many “วันพระ” fall on each day of the week across all three years, and answer the following questions:
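One compact way to get the whole distribution at once is a collections.Counter keyed on weekday(). The sketch below uses a few made-up dates rather than the scraped list; weekday_distribution and DAY_NAMES are hypothetical names. Because datetime.datetime objects also have weekday(), the same function should accept the list of scraped datetimes as well.

```python
from collections import Counter
from datetime import date

DAY_NAMES = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
             'Friday', 'Saturday', 'Sunday']

def weekday_distribution(dates):
    """Count how many dates fall on each weekday, keyed by English day name."""
    counts = Counter(d.weekday() for d in dates)
    # counts[i] is 0 for weekdays that never occur, so all seven keys appear
    return {DAY_NAMES[i]: counts[i] for i in range(7)}

# Made-up sample dates, NOT the real วันพระ list
sample = [date(2020, 1, 17), date(2021, 9, 21), date(2021, 9, 28)]
print(weekday_distribution(sample))
# {'Monday': 0, 'Tuesday': 2, 'Wednesday': 0, 'Thursday': 0, 'Friday': 1, 'Saturday': 0, 'Sunday': 0}
```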

## How many วันพระ are there in total (over the 3 years)?

In [16]:
import dateparser
import requests
from bs4 import BeautifulSoup

def get_holy_day(year):
    """Return the วันพระ dates (as CE datetimes) for a Buddhist Era year."""
    url = f'https://www.myhora.com/ปฏิทิน/วันพระ-พ.ศ.{year}.aspx'
    res = requests.get(url, timeout=30)
    res.raise_for_status()  # fail early on HTTP errors instead of parsing an error page
    soup = BeautifulSoup(res.text, 'html.parser')
    holy_day = []
    # find all elements with class "bud-day"
    for bud_day in soup.find_all(class_='bud-day'):
        # select the first element with class "bud-day-col" inside it
        bud_day_col = bud_day.find(class_='bud-day-col')
        # parse the Thai date, then set the CE year (BE year - 543)
        holy_day.append(dateparser.parse(bud_day_col.text).replace(year=year-543))
    return holy_day

holy_day_2565 = get_holy_day(2565)
# print(holy_day_2565)

holy_day_2566 = get_holy_day(2566)
# print(holy_day_2566)

holy_day_2567 = get_holy_day(2567)
# print(holy_day_2567)

# all holy days
holy_day_all = holy_day_2565 + holy_day_2566 + holy_day_2567
print("Number of holy days:", len(holy_day_all))

Number of holy days: 152
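To test the extraction logic without downloading the pages, you can run it against a small hand-written HTML string. The sketch below swaps BeautifulSoup for the standard library's html.parser.HTMLParser so it has no dependencies. The class name bud-day-col comes from the code above, but the surrounding markup is made up and may not match the real myhora.com pages; ClassTextCollector is a hypothetical name. Unlike get_holy_day, it collects every bud-day-col element rather than only the first one inside each bud-day.

```python
from html.parser import HTMLParser

class ClassTextCollector(HTMLParser):
    """Collect the text inside every element that has a given CSS class.

    Simplified sketch: assumes matching elements are not nested inside each
    other and contain no unclosed void tags such as a bare <br>.
    """

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0       # > 0 while inside a matching element
        self.current = []    # text pieces of the element being read
        self.results = []    # finished texts, one per matching element

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get('class') or '').split()
        if self.depth:
            self.depth += 1  # nested tag inside a matching element
        elif self.css_class in classes:
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.results.append(''.join(self.current).strip())

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)

# Hypothetical markup, NOT copied from myhora.com
html = '''
<div class="bud-day"><div class="bud-day-col">วันศุกร์ที่ 17 มกราคม 2563</div></div>
<div class="bud-day"><div class="bud-day-col">วันอังคารที่ 21 กันยายน 2564</div></div>
'''

collector = ClassTextCollector('bud-day-col')
collector.feed(html)
print(collector.results)
```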


## How many วันพระ fell on a Monday in total (over the 3 years)?

In [21]:
# day_holy_day[i] collects the holy days whose weekday() == i (0=Monday ... 6=Sunday)
day_holy_day = [[] for _ in range(7)]
for day in holy_day_all:
    day_holy_day[day.weekday()].append(day)

# number of holy days that fell on a Monday
print(len(day_holy_day[0]))

21


## Which day of the week has the fewest วันพระ?

In [32]:
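# day names indexed like weekday(): 0=Monday ... 6=Sunday
days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

# use min_count rather than min, so the built-in min() is not shadowed
min_count = float('inf')
for i in range(7):
    if len(day_holy_day[i]) < min_count:
        min_count = len(day_holy_day[i])
        min_day = i

print(f"Day {days[min_day]} has the minimum number of holy days: {min_count} days")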

Day Tuesday has the minimum number of holy days: 20 days


## Which day of the week has the most วันพระ?

In [33]:
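# day names indexed like weekday(): 0=Monday ... 6=Sunday
days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

# use max_count rather than max, so the built-in max() is not shadowed
max_count = 0
for i in range(7):
    if len(day_holy_day[i]) > max_count:
        max_count = len(day_holy_day[i])
        max_day = i

print(f"Day {days[max_day]} has the maximum number of holy days: {max_count} days")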

Day Sunday has the maximum number of holy days: 24 days
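The two loops above use strict comparisons (< and >), so if two weekdays tie for the minimum or maximum, only the first one is reported and the tie is silently hidden. A minimal sketch that reports every tied weekday is shown below; extreme_days and DAY_NAMES are hypothetical names, and the counts are made up. With the notebook's data it could be called as extreme_days([len(lst) for lst in day_holy_day]).

```python
DAY_NAMES = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
             'Friday', 'Saturday', 'Sunday']

def extreme_days(counts):
    """Return (min_days, max_days): every weekday tied for the lowest / highest count.

    counts[i] is the number of holy days on weekday i (0=Monday ... 6=Sunday).
    """
    lowest, highest = min(counts), max(counts)
    min_days = [DAY_NAMES[i] for i, c in enumerate(counts) if c == lowest]
    max_days = [DAY_NAMES[i] for i, c in enumerate(counts) if c == highest]
    return min_days, max_days

# Hypothetical counts, NOT the real results
print(extreme_days([3, 1, 4, 1, 5, 2, 5]))
# (['Tuesday', 'Thursday'], ['Friday', 'Sunday'])
```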
