# **HiBlu**: Web Scraping

By Mavericks Team - Hacktiv8 | Data Source: [FAQ Blu](https://blubybcadigital.id/info/faq)

---

# Introduction

This notebook scrapes the FAQ page of the [Blu website](https://blubybcadigital.id/info/faq) and extracts each question-and-answer pair. The resulting data is intended for building a chatbot that can answer frequently asked questions (FAQs).

# Import Libraries

In [1]:
# Import Libraries
import pandas as pd
import time
import json

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
from bs4 import BeautifulSoup

In [20]:
pd.set_option('display.max_columns', 100)
pd.set_option('display.max_colwidth', None)

# Web Scraping
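
The scraping cell in this section scrolls inside a `while True` loop until `document.body.scrollHeight` stops changing. If a page kept growing, that loop would never exit. A minimal sketch of a bounded variant is shown here; `scroll_until_stable`, its parameters, and the `FakePage` stand-in are hypothetical names introduced for illustration and are not part of the original notebook.

```python
import time
from typing import Callable


def scroll_until_stable(
    get_height: Callable[[], int],
    scroll_to_bottom: Callable[[], None],
    pause: float = 2.0,
    max_rounds: int = 30,
) -> int:
    """Scroll until the height stops changing or max_rounds is reached.

    Returns the number of scroll rounds performed.
    """
    last_height = get_height()
    for round_number in range(1, max_rounds + 1):
        scroll_to_bottom()
        time.sleep(pause)  # give newly requested content time to render
        new_height = get_height()
        if new_height == last_height:
            return round_number  # height is stable, nothing more to load
        last_height = new_height
    return max_rounds  # gave up: the page was still growing


class FakePage:
    """Hypothetical stand-in for a browser page that grows twice, then stops."""

    def __init__(self) -> None:
        self.heights = [1000, 2000, 3000, 3000]
        self.index = 0

    def height(self) -> int:
        return self.heights[self.index]

    def scroll(self) -> None:
        self.index = min(self.index + 1, len(self.heights) - 1)


page = FakePage()
rounds = scroll_until_stable(page.height, page.scroll, pause=0)
print(rounds)
```

With Selenium, the two callables could be `lambda: driver.execute_script("return document.body.scrollHeight")` and `lambda: driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")`. The default of 30 rounds is an arbitrary choice and should be adjusted to the page being scraped.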

In [2]:
# Set up the Chrome WebDriver.
driver = webdriver.Chrome()

driver.get("https://blubybcadigital.id/info/faq")

# Wait up to 10 seconds for the first "card-faq" element to be present in the DOM.
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME, "card-faq")))

# Scroll to the bottom repeatedly until the page height stops changing.
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)  # Pause so any newly loaded content can render.
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

# Each answer is hidden inside a collapsed dropdown, so click every question to expand it.
faq_items = driver.find_elements(By.CLASS_NAME, "question-box")
for item in faq_items:
    try:
        ActionChains(driver).move_to_element(item).click().perform()  # Click to expand the dropdown.
        time.sleep(1)  # Pause so the answer can render.
    except Exception:
        # Skip items that cannot be clicked and continue with the rest.
        continue


page_source = driver.page_source

driver.quit()

# Parse the page source with BeautifulSoup.
soup = BeautifulSoup(page_source, 'html.parser')

# Extract every question-and-answer pair.
faqs = []
faq_sections = soup.find_all('div', class_='card-faq')

for faq_section in faq_sections:
    questions = faq_section.find_all('div', class_='question-box')
    for question_box in questions:
        question_element = question_box.find('div', class_='question')
        # The fallback text is Indonesian for "No question".
        question = question_element.get_text(strip=True) if question_element else 'Tidak ada pertanyaan'
        # The answer sits in the "answer-box" sibling that follows each question.
        answer_box = question_box.find_next_sibling('div', class_='answer-box')
        if answer_box:
            answer_element = answer_box.find('div', class_='answer')
            # The fallback text is Indonesian for "No answer".
            # Note: get_text(strip=True) joins nested text with no separator, so words
            # around inline tags can fuse (e.g. "aplikasimobile bankingdari" in the output).
            answer = answer_element.get_text(strip=True) if answer_element else 'Tidak ada jawaban'
            faqs.append({'Question': question, 'Answer': answer})

# Convert to a DataFrame.
df = pd.DataFrame(faqs)
print(df)
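
Several answers in the printed DataFrame show words fused together, for example `aplikasimobile bankingdari`. This is consistent with how `get_text(strip=True)` behaves when part of the text sits inside a nested inline tag: each text piece is stripped and then joined with an empty separator. The sketch below reproduces the effect on a made-up snippet; the HTML string is a hypothetical example, and the assumption that the real page wraps those words in an inline tag such as `<i>` has not been verified against the site.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: the middle words sit inside an inline <i> tag.
html = '<div class="answer">aplikasi <i>mobile banking</i> dari bank</div>'
answer = BeautifulSoup(html, "html.parser").find("div", class_="answer")

# Default separator is an empty string, so stripped pieces are glued together.
joined = answer.get_text(strip=True)

# A single-space separator keeps the words apart.
spaced = answer.get_text(separator=" ", strip=True)

print(joined)
print(spaced)
```

One trade-off of `separator=" "` is that punctuation directly after an inline tag also gets a leading space, so a light cleanup pass may still be needed afterwards.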


                                              Question  \
0                                         Apa itu blu?   
1                Apa perbedaan blu dengan BCA Digital?   
2                Apa perbedaan BCA Digital dengan BCA?   
3                   Apa keuntungan pakai aplikasi blu?   
4              Apakah blu punya kantor cabang offline?   
..                                                 ...   
495  Saya gagal input password dan PIN transaksi ak...   
496  Saya lupa password akun blu, apa yang harus sa...   
497  Di mana saya dapat melihat riwayat transaksi B...   
498  Di mana saya dapat melihat riwayat transaksi B...   
499  Saya ingin memutus koneksi blu dengan aplikasi...   

                                                Answer  
0    blu merupakan aplikasimobile bankingdari BCA D...  
1    blu adalah aplikasimobile bankingmilik BCA Dig...  
2    BCA Digital merupakan anak perusahaan BCA, bag...  
3    Gak terbatas ruang dan waktu, aplikasi blu bis...  
4    blu gak punya