## **Xpath Introduction to Scraping**

### **Xpath Syntax**

Example of the XPath syntax you need to know:

**//TagName[contains(@AttributeName, 'Value')]**

### **Xpath Operation**

Example of an XPath operator:

**//TagName[(Expression 1) and (Expression 2)]**
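
As a minimal sketch of how attribute predicates behave, the snippet below runs two expressions against a small HTML fragment with Python's standard-library `xml.etree.ElementTree`. That module supports only a subset of XPath: it has no `contains()` and no `and`, so the sketch uses exact attribute matches and chains two predicates, which selects the same elements as `and` would in this case. The fragment and the `find_cues` helper are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment (an assumption, modeled on a subtitle page)
HTML = """
<div id="cue-app">
    <p class="cue-line" data-cue-idx="0" data-line-idx="0">(dark music)</p>
    <p class="cue-line" data-cue-idx="2" data-line-idx="0">Yeah, it's so quiet out here,</p>
    <p class="cue-line" data-cue-idx="2" data-line-idx="1">I feel like I'm all alone.</p>
    <p class="note">not a cue</p>
</div>
"""

root = ET.fromstring(HTML.strip())


def find_cues(cue_idx, line_idx):
    """Return the text of <p> elements that match both attribute values."""
    # Two chained predicates: both conditions must hold, like `and`
    path = f".//p[@data-cue-idx='{cue_idx}'][@data-line-idx='{line_idx}']"
    return [p.text for p in root.findall(path)]


# Single predicate: every <p> whose class is exactly "cue-line"
cue_lines = [p.text for p in root.findall(".//p[@class='cue-line']")]

print(cue_lines)
print(find_cues("2", "1"))
```

A full XPath 1.0 engine, such as the one a browser exposes to Selenium, accepts the `contains()` and `and` forms directly.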

### **HTML Code Example** 

```HTML
<body>
	<article class="main-article" style="height: auto !important;">
			<h1>Vacation Home Nightmare (2023) - full transcript</h1>
						<p class="plot">When a woman is attacked in her short term rental, the company's Clean-Up Team steps in to help her pick up the pieces. But she soon finds that they might not be who they say they are.</p>
	</article>

	<div id="cue-app" class="full-script" data-imdb-id="26458228">
				<div class="subtitle-cue">
						<p class="cue-line" data-cue-idx="0" data-line-idx="0">(dark music)</p>
					</div>
				<div class="subtitle-cue">
						<p class="cue-line" data-cue-idx="1" data-line-idx="0">♪</p>
					</div>
				<div class="subtitle-cue">
						<p class="cue-line" data-cue-idx="2" data-line-idx="0">Yeah, it's so quiet out here,</p>
						<p class="cue-line" data-cue-idx="2" data-line-idx="1">I feel like I'm all alone.</p>
					</div>
	</div>
	<footer>
			<span class="contactus">Have any questions? Contact us: subslikescript(doggysign)gmail.com | <a href="https://subslikescript.com/dmca">DMCA</a></span>
	</footer>
</body>
```

**Examples of XPath syntax for the code above:**

- To select the entire `article` element

    we can use either of these expressions:

    > //article
     
    > //article[@class="main-article"]
    

- To get the text of the `h1` tag
    
    we can use the expression:

    > //article[@class="main-article"]/h1/text()

    which returns: Vacation Home Nightmare (2023) - full transcript

- To get the script text stored in the `p` tags
     
    we can use the expression:

    > //p[contains(@data-cue-idx , "0")]

    which matches only the `p` elements whose `data-cue-idx` value contains "0" (here, the first cue), or 

    > //p[contains(@class , "cue")]

    which matches every cue line. The result of the second expression looks like this: 
    
    ```html
    <p class="cue-line" data-cue-idx="0" data-line-idx="0">(dark music)</p>
    <p class="cue-line" data-cue-idx="1" data-line-idx="0">♪</p>
    <p class="cue-line" data-cue-idx="2" data-line-idx="0">Yeah, it's so quiet out here,</p>
    <p class="cue-line" data-cue-idx="2" data-line-idx="1">I feel like I'm all alone.</p>
    ```

### **Special Characters Xpath Syntax**

* `.` or `..` -> selects the current node (`.`) or its parent node (`..`)
* `/*` -> selects all child elements, whatever their tag name
* `./*` -> selects all child elements of the current node
* `@` -> selects an attribute
* `()` -> groups an XPath expression
* `[n]` -> selects the node at position `n` (XPath positions start at 1)
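
The special characters above can be tried out with a short sketch. It uses the standard-library `xml.etree.ElementTree`, which covers `.`, `..`, `*`, `@` and `[n]` but not parenthesized grouping, so `()` is left out. The HTML fragment is a shortened, illustrative assumption.

```python
import xml.etree.ElementTree as ET

# Shortened, illustrative fragment (an assumption)
HTML = """
<body>
    <article class="main-article">
        <h1>Vacation Home Nightmare (2023) - full transcript</h1>
        <p class="plot">Plot summary.</p>
    </article>
    <div class="subtitle-cue">
        <p class="cue-line">(dark music)</p>
        <p class="cue-line">second line</p>
    </div>
</body>
"""
root = ET.fromstring(HTML.strip())

# ./* -> all direct children of the current node (here: <body>)
children = [el.tag for el in root.findall("./*")]

# [n] -> the n-th matching child; positions start at 1, not 0
second_cue = root.find(".//div/p[2]").text

# .. -> step up to the parent: the parent of <h1> is <article>
h1_parent = root.find(".//h1/..").tag

# @ -> test an attribute; without a value it only checks presence
with_class = [el.tag for el in root.findall(".//*[@class]")]

print(children, second_cue, h1_parent, with_class)
```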

## **Introduction to Selenium Scraping**

Selenium is a library that, like Bs4 (BeautifulSoup), can be used for scraping. The difference is that Selenium scrapes through a WebDriver, which drives a real browser. The two target different kinds of websites: Bs4 only parses the HTML it is given and does not run JavaScript, so it cannot scrape content that a dynamic website renders with JavaScript, whereas Selenium can.

### **Set Up Package**

In [None]:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
import csv

In [1]:
import pandas as pd
import time

### **Running Webdriver to Get Website**

In [None]:
# Path to the chromedriver.exe file
path = "../chromedriver-win64/chromedriver.exe"

# Set up the service
service = Service(executable_path=path)

# Extra options (optional, but useful)
options = Options()
options.add_argument("--start-maximized")

# Initialize the driver
# driver = webdriver.Chrome(service=service, options=options)

# Open the website
# driver.get("https://www.vlr.gg/matches")

### **How to Find Element with Selenium**

Traditionally, elements were located with code like this:

```Python
driver.find_element_by_id("id")
driver.find_element_by_class_name("class_name")
driver.find_element_by_tag_name("tag_name")
driver.find_element_by_xpath("//@xpath syntax")
driver.find_elements_by_tag_name("tag_name") # use the plural "elements" to get multiple matches
```
However, the code above is obsolete: these methods were deprecated in Selenium 4.0 and removed in version 4.3.0. They are replaced by `find_element()` and `find_elements()`. The code then looks like this:
```python
# requires the By import
from selenium.webdriver.common.by import By

driver.find_element(By.XPATH, '//tag_name[@class="class_name"]')
driver.find_element(By.ID, 'id')
driver.find_element(By.CLASS_NAME, 'class_name')
driver.find_element(By.TAG_NAME, 'tag_name')
```

The target website for this example is https://vlr.gg/matches

![valorant matches](../media/vlrgg_website.png)

This time we will use Selenium to locate the **Results** button and click it through the driver.

In [None]:
# Initialize the driver
driver = webdriver.Chrome(service=service, options=options)

# Open the page
driver.get("https://www.vlr.gg/matches")

# Locate the element to click on the page
result_matches = driver.find_element(By.XPATH, '//div[@class="wf-nav"]/a[@href="/matches/results"]/div[@class="wf-nav-item-title"]')
# Click the element we located
result_matches.click()

### **Scraping Table with Selenium and Export Data to CSV**

This website contains Valorant player statistics for the Pacific region. On the page, the data is presented as a table.

![valorant-stats-table](../media/vlrgg_stats_table.png)

The data is stored in HTML tags like these:
``` html
<table>
    <thead>
        <tr>
            <td></td>
        </tr>
    </thead>
</table>
```    
![valorant-stats-table-zoom](../media/vlrgg_stats_table_zoom.png)

In [None]:
# Initialize the driver
driver = webdriver.Chrome(service=service, options=options)

# Open the page
driver.get("https://www.vlr.gg/event/stats/2379/champions-tour-2025-pacific-stage-1")


time.sleep(7)  # Wait a few seconds so the page loads completely

rows = driver.find_elements(By.XPATH, '//table/tbody/tr')

# Write the data to a CSV file
# newline='' keeps the csv module from adding blank lines between rows on Windows
with open('../data export/val_stats.csv', 'w', encoding='utf-8', newline='') as file:
    writer = csv.writer(file)
    # Write the column headers to match the table structure
    writer.writerow(["Player", "Agents", "RND", "R", "ACS", 
                     "K:D", "KAST", "ADR", "KPR", "APR", "FKPR", "FDPR", "HS%", 
                     "CL%", "CL", "KMAX", "K", "D", "A", "FK", "FD"])
    
    for row in rows:
        cells = row.find_elements(By.TAG_NAME, "td")
        # Extract the text of each cell
        data = [cell.text.strip() for cell in cells]
        writer.writerow(data)

driver.quit()  # Close the driver when finished
# Print a completion message
print('Valorant statistics have been saved to val_stats.csv!')

Valorant statistics have been saved to val_stats.csv!


In [None]:
# Check that the CSV file was created
df = pd.read_csv('../data export/val_stats.csv')
df

Unnamed: 0,Player,Agents,RND,R,ACS,K:D,KAST,ADR,KPR,APR,...,FDPR,HS%,CL%,CL,KMAX,K,D,A,FK,FD
0,invy\r\nTS,(+2),271,1.26,237.2,1.27,73%,158.4,0.87,0.35,...,0.07,29%,10%,3/31,27,236,186,95,23,18
1,Jemkin\r\nRRQ,(+1),573,1.22,253.9,1.30,73%,161.3,0.91,0.15,...,0.15,38%,18%,9/50,32,524,404,87,137,84
2,dos9\r\nBME,(+2),344,1.19,212.2,1.18,75%,136.3,0.76,0.47,...,0.07,26%,29%,14/48,35,261,222,160,30,24
3,Jinggg\r\nPRX,(+2),536,1.14,220.9,1.18,76%,145.4,0.79,0.32,...,0.08,26%,11%,7/62,31,422,359,171,53,43
4,Foxy9\r\nGEN,(+2),360,1.12,207.8,1.25,78%,137.3,0.77,0.18,...,0.08,34%,29%,9/31,23,277,222,65,35,30
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
60,Jinboong\r\nDFM,,225,0.80,157.4,0.72,62%,111.1,0.54,0.21,...,0.13,33%,12%,4/33,16,121,169,48,14,29
61,SyouTa\r\nZETA,(+1),255,0.79,162.8,0.77,61%,108.1,0.58,0.19,...,0.07,35%,9%,3/32,18,147,190,48,26,18
62,Suggest\r\nGEN,,112,0.66,152.2,0.77,65%,110.8,0.57,0.12,...,0.12,30%,,0/8,21,64,83,13,10,13
63,Art\r\nDFM,(+1),225,0.61,135.7,0.52,61%,92.8,0.42,0.31,...,0.12,21%,,0/18,16,94,182,69,9,28
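
In the output above, the `Player` column holds the player name and the team abbreviation separated by a line break (for example `invy\r\nTS`), because both sit in the same table cell. Below is a minimal sketch of one way to split them before writing the CSV. The `split_player_team` helper and the extra `Team` column are assumptions, not part of the original notebook, and the sample rows reuse only the first few values shown above.

```python
import csv
import io


def split_player_team(cell_text):
    """Split 'name<line break>team' into (name, team); team is '' if absent."""
    lines = [line.strip() for line in cell_text.splitlines() if line.strip()]
    name = lines[0] if lines else ""
    team = lines[1] if len(lines) > 1 else ""
    return name, team


# First three cells of two scraped rows, as they appear in the output above
rows = [["invy\r\nTS", "(+2)", "271"], ["Jemkin\r\nRRQ", "(+1)", "573"]]

# Write to an in-memory buffer so the sketch does not touch the file system
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["Player", "Team", "Agents", "RND"])
for row in rows:
    name, team = split_player_team(row[0])
    writer.writerow([name, team, *row[1:]])

print(buffer.getvalue())
```

The same helper could be applied to the first cell of each row inside the scraping loop, with `Team` added to the header list.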
