<a id="ref0"></a>

<h2 id="http">Overview of HTTP</h2>

When you, the **client**, open a web page, your browser sends an **HTTP request** to the **server** where the page is hosted. The server tries to find the requested **resource**, such as the home page (index.html).

If your request is successful, the server sends the resource to the client in an **HTTP response**, which includes information such as the type and length of the **resource**.

<p>
The figure below represents the process: the circle on the left represents the client, and the circle on the right represents the web server. The table under the web server represents a list of resources stored on the web server, in this case an <code>HTML</code> file, a <code>png</code> image, and a <code>txt</code> file.
</p>
<p>
The <b>HTTP</b> protocol allows you to send and receive information through the web including webpages, images, and other web resources.
</p>
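
To make the structure of an HTTP response more concrete, the sketch below splits a hand-written response message into its status line and headers using base R only. The message text, header values, and variable names are assumptions made up for illustration; they are not taken from a real server.

```r
# A hypothetical raw HTTP response, written out by hand for illustration.
raw_response <- paste(
  "HTTP/1.1 200 OK",
  "Content-Type: text/html; charset=UTF-8",
  "Content-Length: 1256",
  "",
  "<html>...</html>",
  sep = "\r\n"
)

# Split the message into lines; the first blank line ends the headers.
msg_lines <- strsplit(raw_response, "\r\n", fixed = TRUE)[[1]]
blank <- which(msg_lines == "")[1]

status_line  <- msg_lines[1]
header_lines <- msg_lines[2:(blank - 1)]

# Turn "Name: value" lines into a named character vector.
headers <- setNames(
  sub("^[^:]+:[[:space:]]*", "", header_lines),
  sub(":.*$", "", header_lines)
)

# The status code is the second field of the status line.
status_code <- as.integer(strsplit(status_line, " ", fixed = TRUE)[[1]][2])

print(status_code)
print(headers[["Content-Type"]])
print(as.integer(headers[["Content-Length"]]))
```

In practice a library such as `httr` does this parsing for you; the sketch only illustrates where the resource type and length live in the message.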



<center>
    <img src="https://fpt.edu.vn/Resources/brand/uploads/749540_132829686029858301_o.jpg" width="500" alt="cognitiveclass.ai logo"  />
</center>

# Lab 1: WebScraping

<br>

#### Class name: AI1803

#### Student code: HE181499

#### Student name: Tran Huy Hoang

<br>

## Objectives

After completing this lab you will be able to:

* Understand HTML via coding practice
* Handle HTTP requests and responses using R
* Perform basic web scraping using rvest


Estimated time needed: **60** minutes
<h4 style='color:red; font-weight:bold'>DO NOT CHEAT! Anyone who copies or shares code will receive 1 point.</h4>

<h2 id="#httr">The httr library</h2>

`httr` is an R library that allows you to build and send <code>HTTP</code> requests, as well as process <code>HTTP</code> responses easily. We can import the package as follows (this may take less than a minute):

In [1]:
# This lab requires some library packages. If an error occurs when running, uncomment the lines below to install them:
# install.packages("httr", type = "binary")
# install.packages("rvest", type = "binary")


In [2]:
library(httr)
library(rvest)

Loading required package: xml2
"package 'xml2' was built under R version 3.6.3"
Registered S3 method overwritten by 'rvest':
  method            from
  read_xml.response xml2
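
Before sending any request, it can help to see how `httr` assembles a request URL. The following is a minimal sketch that uses `modify_url()` and `parse_url()`, neither of which contacts the network. The `example.com` endpoint and the query parameters are placeholders chosen for illustration.

```r
library(httr)

# Hypothetical endpoint and query parameters, used only for illustration.
base_url <- "https://example.com/search"

# modify_url() rebuilds the URL with the given components; no request is sent.
request_url <- modify_url(base_url, query = list(q = "gold price", page = 1))
print(request_url)

# parse_url() splits a URL back into its components.
parts <- parse_url(request_url)
print(parts$hostname)
print(parts$query$q)
```

A URL built this way can then be passed to `GET()` to actually send the request.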


## 1. Example code

In [9]:
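# Page to request
url <- 'https://fap.fpt.edu.vn/'
# Read the raw HTML source, one element per line
webpage <- readLines(url, encoding = 'UTF-8')
# Send an HTTP GET request and keep the response object
response <- GET(url)

# Inspect the response metadata
print(sprintf("Time: %s", response$date))
print(sprintf("URL link: %s", response$url))
print(sprintf("Status code: %d", response$status_code))

# Display the raw HTML lines
webpage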

[1] "Time: 2024-05-15 13:46:02"
[1] "URL link: https://fap.fpt.edu.vn/"
[1] "Status code: 200"


In [4]:
# Parse the HTML body of the response
root <- read_html(response)
# Select every <option> element on the page
options_node <- html_nodes(root, "option")
values <- c()
print("List of FPT University campuses: ")
for(node in options_node){
    # Convert the value attribute to an integer (NA if it is not numeric)
    v <- as.integer(html_attr(node, "value"))
    # Skip non-numeric values and values already seen
    if(!is.na(v) && !(v %in% values)){
        values <- c(values, v)
        print(html_text(node))
    }
}

[1] "List of FPT University campuses: "
[1] "FU-Hòa Lạc"
[1] "FU-Hồ Chí Minh"
[1] "FU-Đà Nẵng"
[1] "FU-Cần Thơ"
[1] "FU-Quy Nhơn"
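
The same extraction can be written without an explicit loop, because `html_attr()` and `html_text()` work on a whole node set at once. The sketch below runs on a small hand-written HTML fragment rather than the live page; the fragment, its option values, and the unaccented labels are assumptions for illustration.

```r
library(rvest)

# A hypothetical HTML fragment standing in for a downloaded page.
html <- '<html><body><select>
  <option value="">Select campus</option>
  <option value="3">FU-Hoa Lac</option>
  <option value="4">FU-Ho Chi Minh</option>
  <option value="3">FU-Hoa Lac</option>
</select></body></html>'

root <- read_html(html)
options_node <- html_nodes(root, "option")

# html_attr() and html_text() are vectorised over a node set.
values <- suppressWarnings(as.integer(html_attr(options_node, "value")))
texts  <- html_text(options_node)

# Keep options that have a numeric value and drop repeated values.
keep <- !is.na(values) & !duplicated(values)
campuses <- texts[keep]
print(campuses)
```

The vectorised form avoids growing `values` inside a loop, which is usually the more idiomatic choice in R.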


## 2. Data source
Adapt the example code above by changing the URL to one of the following:

* https://webtygia.com/

* https://giavang.org/

* https://tygiadola.net/giavang/gia-vang-hom-nay

* https://nongnghiep.vn/bang-gia-vang-sjc-9999-24k-18k-14k-10k-hom-nay-24-10-2022-d335344.html

or any other URL that you can find!


## 3. Tasks

#### 3.1 Getting the data

Use web scraping to collect SJC gold prices for major cities and provinces in Vietnam. The data should contain more than 10 records. Display the data in a table.

In [19]:
# Enter code here
url <- 'https://tygiausd.org/giavang/gia-vang-hom-nay'
response <- GET(url)

print(sprintf("Time: %s", response$date))
print(sprintf("URL link: %s", response$url))
print(sprintf("Status code: %d", response$status_code))

[1] "Time: 2024-05-15 13:53:08"
[1] "URL link: https://tygiausd.org/giavang/gia-vang-hom-nay"
[1] "Status code: 200"


In [20]:
# Parse the HTML body of the response
root <- read_html(response)
# Select every <table> element on the page
tables <- html_nodes(root, 'table')
# html_table() returns a list with one data frame per table
sjc_dataframe <- html_table(tables)
sjc_dataframe

"Giá vàng hôm nay  (ĐVT: 1,000/Lượng)","Giá vàng hôm nay  (ĐVT: 1,000/Lượng).1","Giá vàng hôm nay  (ĐVT: 1,000/Lượng).2"
Vàng miếng SJC,Mua vào,Bán ra
SJC HCM 1-10L,877001700,902001200
SJC Hà Nội,877001700,902001200
DOJI HCM,875001000,89200700
DOJI HN,877001200,89400900
PNJ HCM,877001700,901001100
PNJ Hà Nội,877001700,901001100
,,
Phú Qúy SJC,87500500,89500700
Bảo Tín Minh Châu,87800700,900001000

Giá vàng SJC,Giá vàng SJC.1,Giá vàng SJC.2
"ĐVT: 1,000/Lượng",Mua vào,Bán ra
Giá vàng SJC Chi Nhánh Khác,Giá vàng SJC Chi Nhánh Khác,Giá vàng SJC Chi Nhánh Khác
SJC Đà Nẵng,877001700,902001200
SJC Nha Trang,877001700,902001200
SJC Cà Mau,877001700,902001200
SJC Huế,877001700,902001200
SJC Miền Tây,877001700,902001200
SJC Quãng Ngãi,877001700,902001200
SJC Biên Hòa,877001700,902001200
SJC Bạc Liêu,877001700,902001200

Unnamed: 0,Mua vào,Bán ra
USD chợ đen,"25,750 0","25,820 0"
Giá đô hôm nay,Giá đô hôm nay,Giá đô hôm nay

X1,X2
1 Đô la Mỹ =,"24,245 -1"

Tỷ giá hôm nay,Tỷ giá hôm nay.1,Tỷ giá hôm nay.2
Ngoại Tệ,Mua vào,Bán Ra
USD,251520,254820
AUD,1644356,1712845
CAD,1817831,1893618
JPY,1580,1670
EUR,2683080,2827862
CHF,2737059,2851139
GBP,3123189,3253267
CNY,34481,"3,592-2"


In [21]:
sjc_dataframe[1]

"Giá vàng hôm nay  (ĐVT: 1,000/Lượng)","Giá vàng hôm nay  (ĐVT: 1,000/Lượng).1","Giá vàng hôm nay  (ĐVT: 1,000/Lượng).2"
Vàng miếng SJC,Mua vào,Bán ra
SJC HCM 1-10L,877001700,902001200
SJC Hà Nội,877001700,902001200
DOJI HCM,875001000,89200700
DOJI HN,877001200,89400900
PNJ HCM,877001700,901001100
PNJ Hà Nội,877001700,901001100
,,
Phú Qúy SJC,87500500,89500700
Bảo Tín Minh Châu,87800700,900001000


In [22]:
sjc_dataframe[2]

Giá vàng SJC,Giá vàng SJC.1,Giá vàng SJC.2
"ĐVT: 1,000/Lượng",Mua vào,Bán ra
Giá vàng SJC Chi Nhánh Khác,Giá vàng SJC Chi Nhánh Khác,Giá vàng SJC Chi Nhánh Khác
SJC Đà Nẵng,877001700,902001200
SJC Nha Trang,877001700,902001200
SJC Cà Mau,877001700,902001200
SJC Huế,877001700,902001200
SJC Miền Tây,877001700,902001200
SJC Quãng Ngãi,877001700,902001200
SJC Biên Hòa,877001700,902001200
SJC Bạc Liêu,877001700,902001200
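
Scraped price columns usually arrive as text, so they need cleaning before any comparison or arithmetic. The sketch below shows one possible approach on a small hand-written table. The branch names, prices, and the helper `to_number()` are hypothetical, and the real page layout may need different handling.

```r
library(rvest)

# Hypothetical table with made-up prices; the real page layout may differ.
html <- '<table>
  <tr><th>Branch</th><th>Buy</th><th>Sell</th></tr>
  <tr><td>Branch A</td><td>87,700</td><td>90,200</td></tr>
  <tr><td>Branch B</td><td>87,500</td><td>89,500</td></tr>
</table>'

tables <- html_table(html_nodes(read_html(html), "table"))

# html_table() on a node set returns a list; [[1]] extracts the first
# data frame itself, whereas [1] would return a list of length one.
prices <- as.data.frame(tables[[1]])

# Hypothetical helper: strip thousands separators, then convert to numbers.
to_number <- function(x) as.numeric(gsub(",", "", as.character(x), fixed = TRUE))
prices$Buy  <- to_number(prices$Buy)
prices$Sell <- to_number(prices$Sell)

str(prices)
```

Once the columns are numeric, base R functions such as `max()`, `mean()`, and `which.max()` can be applied to them directly.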


#### 3.2 Which province has the highest gold selling price?

In [None]:
# Enter code here


#### 3.3 Which provinces have the biggest difference between selling and buying prices?

In [None]:
# Enter code here


#### 3.4 Find all provinces with a below-average selling price

In [None]:
# Enter code here


#### 3.5 Find the difference between the highest buying price and the lowest selling price across all provinces

In [None]:
# Enter code here



## Change Log

| Date (YYYY-MM-DD) | Version | Changed By | Change Description                 |
| ----------------- | ------- | ---------- | ---------------------------------- |
| 2024-01-10        | 2.1     |            | Created version 2.1                |

<hr>

## <h3 align="center"> © FPT University. All rights reserved. </h3>
