# Scrape Province Names of Thailand
Use web scraping to collect the Thai and English names of Thailand's provinces from Wikipedia.

In [1]:
import pandas as pd
from bs4 import BeautifulSoup
import requests
import warnings

In [2]:
print(f"Pandas version: {pd.__version__}")

Pandas version: 0.25.1


In [3]:
pd.set_option('display.max_rows', 10)
warnings.filterwarnings('ignore')

In [4]:
# Note: despite its name, website_url holds the page's HTML text, not a URL
website_url = requests.get('https://en.wikipedia.org/wiki/Provinces_of_Thailand').text
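
The request above does not set a timeout or check the HTTP status, so a failed request would only show up later as a parsing problem. A minimal sketch of a more defensive fetch follows. The helper name `fetch_html`, its `session` parameter, and the `User-Agent` string are hypothetical and not part of the original notebook.

```python
import requests


def fetch_html(url, session=None, timeout=10):
    """Return the body of `url` as text, raising on HTTP errors.

    `session` is optional; any object with a requests-style `get` method works,
    which also makes the function easy to exercise without network access.
    """
    http = requests if session is None else session
    response = http.get(
        url,
        headers={"User-Agent": "province-scraper-example/0.1"},  # hypothetical identifier
        timeout=timeout,  # seconds to wait before giving up
    )
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses
    return response.text
```

With this helper, the cell above could be written as `website_url = fetch_html('https://en.wikipedia.org/wiki/Provinces_of_Thailand')`.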

In [5]:
soup = BeautifulSoup(website_url, 'lxml')
# print(soup.prettify())  # Uncomment to inspect the parsed HTML

Find the `table` tag with the class `wikitable sortable`, then find every `td` tag inside it to get the table cells. In the HTML structure, the Thai province names are in `span` tags and the English province names are in `a` tags, so we search the cells for `span` and `a` tags respectively and read the text of each tag with `.get_text()`.
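
The lookup described above can be tried on a small inline snippet before running it against the live page. This is a sketch under stated assumptions: the markup below is hypothetical and much simpler than the real Wikipedia table, and it uses the built-in `html.parser` instead of `lxml` so it needs no extra parser. The names `html`, `soup_demo`, `names_en`, and `names_th` are illustrative only.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the real table has more columns and rows.
html = """
<table class="wikitable sortable">
  <tr><th>Name</th><th>Name (Thai)</th></tr>
  <tr>
    <td><a href="/wiki/Bangkok">Bangkok</a></td>
    <td><span lang="th">กรุงเทพมหานคร</span></td>
  </tr>
  <tr>
    <td><a href="/wiki/Yala_province">Yala</a></td>
    <td><span lang="th">ยะลา</span></td>
  </tr>
</table>
"""

soup_demo = BeautifulSoup(html, "html.parser")

# Locate the table by its exact class string, then collect all of its cells.
table = soup_demo.find("table", {"class": "wikitable sortable"})
cells = table.find_all("td")

# English names come from <a> tags, Thai names from <span> tags.
names_en = [a.get_text() for cell in cells for a in cell.find_all("a")]
names_th = [s.get_text() for cell in cells for s in cell.find_all("span")]

print(names_en)
print(names_th)
```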

In [6]:
my_table = soup.find('table',{'class': 'wikitable sortable'})
# my_table  # Uncomment to inspect the table's HTML

In [7]:
links = my_table.find_all('td')
# links  # Uncomment to inspect the HTML of the cells

In [8]:
province_th = []
for cell in links:  # each item in links is a <td> element
    for span in cell.find_all('span'):
        text = span.get_text()
        if text != '\xa0':  # skip spans that hold only a non-breaking space
            province_th.append(text)

In [9]:
print(f"Length of province TH: {len(province_th)}")

Length of province TH: 77


In [10]:
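province_en = []
skip_names = ('Nong Bua Lam Phu', 'Sukhothai Thani')  # link texts that refer to a province already listed
for cell in links:  # each item in links is a <td> element
    for link in cell.find_all('a'):
        text = link.get_text()
        # keep non-empty link text, drop duplicates, and skip the excluded names
        if text and text not in province_en and text not in skip_names:
            province_en.append(text)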
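
The filtering in the loop above (skip empty text, drop duplicates, skip a few known names) can also be written as a small reusable function. The following is a sketch of an alternative, not the notebook's own method: the helper `unique_in_order` and the sample list `link_texts` are hypothetical, and the approach relies on `dict` preserving insertion order (Python 3.7+).

```python
def unique_in_order(items, exclude=()):
    """Return non-empty items without duplicates or excluded values, in first-seen order."""
    excluded = set(exclude)
    # dict keys are unique and keep insertion order, so this deduplicates stably
    return list(dict.fromkeys(item for item in items if item and item not in excluded))


# Illustrative input only; real link texts come from the scraped cells.
link_texts = ["Bangkok", "Bangkok", "", "Sukhothai", "Sukhothai Thani", "Yala"]
result = unique_in_order(link_texts, exclude=("Nong Bua Lam Phu", "Sukhothai Thani"))
print(result)
```

For 77 provinces the difference is cosmetic, but the set and dict lookups avoid rescanning the result list for every link.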

In [11]:
print(f"Length of province EN: {len(province_en)}")

Length of province EN: 77


In [12]:
# Build the DataFrame in one step; both lists must have the same length
province_df = pd.DataFrame({'en': province_en, 'th': province_th})
province_df

| | en | th |
|---|---|---|
| 0 | Bangkok | กรุงเทพมหานคร |
| 1 | Amnat Charoen | อำนาจเจริญ |
| 2 | Ang Thong | อ่างทอง |
| 3 | Bueng Kan | บึงกาฬ |
| 4 | Buriram | บุรีรัมย์ |
| ... | ... | ... |
| 72 | Udon Thani | อุดรธานี |
| 73 | Uthai Thani | อุทัยธานี |
| 74 | Uttaradit | อุตรดิตถ์ |
| 75 | Yala | ยะลา |
