# Web Scraping Gold: Python-Selenium Techniques for Analyzing Historical and Current Gold Prices

# ![Imgur](https://i.imgur.com/90rxpMY.jpg)![](https://imgur.com/a/oDqra1X)

Gold is one of the most widely traded and valuable metals in the world. As a precious metal, it has been used for various purposes, including currency, investment, and jewelry. The price of gold is highly volatile and is influenced by various factors such as global economic conditions, political events, and supply and demand.

To make informed decisions about buying, selling, or investing in gold, it is essential to stay up-to-date with the latest gold prices. Web scraping provides a useful tool for automatically extracting data from websites, including real-time gold prices from financial news and market data websites.

In this project, we will use Python and Selenium for web scraping gold prices from Yahoo Finance. Python is a popular programming language for data science, and Selenium is a web automation tool that allows us to control web browsers and automate web scraping tasks. By automating the process of collecting gold price data from Yahoo Finance, we can save time and effort while ensuring that we have the most accurate and up-to-date information.

We will analyze the historical and current gold price data collected through web scraping to identify trends and make predictions about future price movements. This project aims to demonstrate the use of web scraping and data analysis techniques for monitoring gold prices, providing real-time market insights, and making informed investment decisions.

##  What is Web Scraping?

Web scraping is the process of extracting data from websites using automated software tools or scripts. This technique is used to collect large amounts of data from websites quickly and efficiently.

Web scraping involves accessing the HTML code of a website and using a programming language like Python to extract the desired data. The data can then be analyzed, stored, or used for other purposes.

Web scraping can be used for various applications, including:

1. Market research: Gathering data on competitors, pricing, and product features.

2. Lead generation: Collecting contact information from websites of potential customers.

3. Sentiment analysis: Scraping social media and review websites to gauge public opinion about a particular product or brand.

4. Content aggregation: Collecting news articles or blog posts from multiple sources to create a single source of information.

5. Machine learning: Collecting training data for machine learning models.

While web scraping can be a powerful tool for gathering data, it is important to be aware of legal and ethical considerations. Some websites have terms of service or copyright laws that restrict or prohibit web scraping. Additionally, web scraping can potentially violate user privacy or security if sensitive information is collected without consent.

# ![Imgur](https://i.imgur.com/1KSyXMu.png)

## Steps Followed 

1. Importing the required libraries such as webdriver from selenium, Keys from selenium.webdriver.common.keys, and so on.
2. Defining the chrome options by adding arguments to disable GPU, start the browser in maximized mode, and disable notifications.
3. Setting the path for the chromedriver.
4. Creating an instance of the webdriver with the defined chrome options and the path.
5. Navigating to the URL of the website using the get() method.
6. Maximizing the window and clicking on the date range dropdown to expand it.
7. Entering the start and end date using the clear() and send_keys() methods respectively.
8. Clicking on the 'Done' and 'Apply' buttons to apply the date range filters and then scrolling down to load all the data in the table using the execute_script() method and finding the required data using the find_elements() method.
9. Define a helper function to extract the data for each column (date, open, high, low, price, volume) using Selenium's find_elements() method and store them in dictonary.
10. Define a helper function which converts dictonary into pandas tabular and return a csv file.

## **Outline of the Project**:

1. Introduction

2. Web Scraping Process
    
3. Results and Discussion
   
4. Conclusion

5. Future Work

6. Reference
    


## Introduction

    Scraping gold prices from websites such as Yahoo Finance can provide valuable data for investors and researchers interested in tracking the price of this precious metal. Using Python and the Selenium library, it is possible to automate the process of accessing the website, navigating to the relevant pages, and extracting the gold price data. This data can then be analyzed and used for a range of applications, such as identifying trends, forecasting prices, or creating trading strategies. However, it is important to be aware of the legal and ethical considerations of web scraping, such as copyright laws and user privacy, and to ensure that the scraped data is used responsibly.

In [1]:
website='https://finance.yahoo.com/quote/GC%3DF/history'

## Web Scraping Process

    The web scraping process typically involves several steps, starting with accessing the website and navigating to the relevant pages. This can be achieved using Python and the Selenium library, which allows you to automate the process of interacting with the website. Once you have accessed the website and navigated to the desired pages, the next step is to extract the data you are interested in. This can be done using various techniques, such as parsing the HTML code of the website or using XPath or CSS selectors to locate specific elements. Finally, once you have extracted the data, you may want to save it to a file for further analysis. This can be done using Python's built-in file I/O functions or third-party libraries such as Pandas or CSV. By automating the web scraping process, you can save time and effort while gathering valuable data for your analysis.
    

In [2]:
!pip install selenium --upgrade --quiet
!pip install pandas --upgrade --quiet


[notice] A new release of pip available: 22.3.1 -> 23.0.1
[notice] To update, run: python.exe -m pip install --upgrade pip

[notice] A new release of pip available: 22.3.1 -> 23.0.1
[notice] To update, run: python.exe -m pip install --upgrade pip


In [3]:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

import pandas as pd

import time

In [11]:
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--start-maximized")
chrome_options.add_argument("--disable-gpu")
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-notifications")

#chrome_options.add_argument("start-maximized");
#chrome_options.add_argument('--headless')

chrome_options.add_experimental_option('excludeSwitches', ['enable-logging'])
chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
chrome_options.add_experimental_option('useAutomationExtension', False)

PATH = "C:/Users/User/Desktop/chromechromedriver"
driver = webdriver.Chrome(options=chrome_options,executable_path=PATH)

#driver.manage().window().maximize();

driver=webdriver.Chrome(PATH)

driver.get(website)

driver.maximize_window()

  driver = webdriver.Chrome(options=chrome_options,executable_path=PATH)
  driver=webdriver.Chrome(PATH)


In [12]:
driver.find_element(By.XPATH,"/html/body/div[1]/div/div/div[1]/div/div[3]/div[1]/div/div[2]/div/div/section/div[1]/div[1]/div[1]/div/div/div").click()

# Wait for the page to load
driver.implicitly_wait(20)

date_start = driver.find_element(By.NAME,"startDate")
date_start.clear()
date_start.send_keys("09-01-2000")

date_end = driver.find_element(By.NAME,"endDate")
date_end.clear()
date_end.send_keys("03-30-2023")


#done
driver.find_element(By.XPATH,'/html/body/div[1]/div/div/div[1]/div/div[3]/div[1]/div/div[2]/div/div/section/div[1]/div[1]/div[1]/div/div/div[2]/div/div[3]/button[1]').click()
#apply
driver.find_element(By.XPATH,'/html/body/div[1]/div/div/div[1]/div/div[3]/div[1]/div/div[2]/div/div/section/div[1]/div[1]/button').click()

In [13]:
for i in range(200):
    driver.execute_script("window.scrollBy(0, 1500)")
    time.sleep(1)

In [14]:
inner_tag=driver.find_elements(By.TAG_NAME,'tbody')

In [15]:
inner_tag

[<selenium.webdriver.remote.webelement.WebElement (session="31e1aed471cf8dd46dea5ea9e63bbc1f", element="dbe1e1fc-c338-4185-af70-9a053797a66a")>]

In [16]:
z=inner_tag[0].find_elements(By.TAG_NAME,'tr')

In [17]:
z

[<selenium.webdriver.remote.webelement.WebElement (session="31e1aed471cf8dd46dea5ea9e63bbc1f", element="fb3cfddf-2988-4ab0-8f80-d03ebec8586a")>,
 <selenium.webdriver.remote.webelement.WebElement (session="31e1aed471cf8dd46dea5ea9e63bbc1f", element="f03e9736-e6d6-4198-b480-38ac7e5ad45f")>,
 <selenium.webdriver.remote.webelement.WebElement (session="31e1aed471cf8dd46dea5ea9e63bbc1f", element="8716d3e2-53b4-4fcc-8ff8-b4cb830691cc")>,
 <selenium.webdriver.remote.webelement.WebElement (session="31e1aed471cf8dd46dea5ea9e63bbc1f", element="6d929102-e268-4cec-93d2-92dddb00b962")>,
 <selenium.webdriver.remote.webelement.WebElement (session="31e1aed471cf8dd46dea5ea9e63bbc1f", element="c2a2ebd7-eca6-4c2d-b0d9-59e2f87dd986")>,
 <selenium.webdriver.remote.webelement.WebElement (session="31e1aed471cf8dd46dea5ea9e63bbc1f", element="f7029ed3-13a4-4b80-858e-949b84c619c4")>,
 <selenium.webdriver.remote.webelement.WebElement (session="31e1aed471cf8dd46dea5ea9e63bbc1f", element="1e226a75-614a-4262-bfd5-13

In [18]:
print("Number of Rows od data found: ",len(z))

5751

In [19]:
def extract_data(table_rows):
    data = {'date': [], 'open': [], 'high': [], 'low': [], 'price': [], 'volume': []}
    for row in table_rows:
        columns = row.find_elements(By.TAG_NAME, 'td')
        data['date'].append(columns[0].text.strip())
        data['open'].append(columns[1].text.strip())
        data['high'].append(columns[2].text.strip())
        data['low'].append(columns[3].text.strip())
        data['price'].append(columns[4].text.strip())
        data['volume'].append(columns[5].text.strip())
    return data

In [20]:
# Call the extract_data function to get the data as a dictionary
data_dict = extract_data(z)

In [21]:
def get_csv(dict):
    # Convert the dictionary to a pandas DataFrame
    df = pd.DataFrame(dict)
    
    return (df , df.to_csv('data.csv', index=False) )


In [22]:
get_csv(data_dict)

(              date      open      high       low     price    volume
 0     Mar 29, 2023  1,966.10  1,966.10  1,966.10  1,966.10  1,966.10
 1     Mar 28, 2023  1,972.40  1,972.40  1,972.40  1,972.40  1,972.40
 2     Mar 27, 2023  1,957.20  1,957.20  1,952.40  1,952.40  1,952.40
 3     Mar 24, 2023  1,991.70  1,995.40  1,982.10  1,982.10  1,982.10
 4     Mar 23, 2023  1,990.50  1,994.60  1,990.50  1,993.80  1,993.80
 ...            ...       ...       ...       ...       ...       ...
 5746  Sep 07, 2000    274.00    274.00    274.00    274.00    274.00
 5747  Sep 06, 2000    274.20    274.20    274.20    274.20    274.20
 5748  Sep 05, 2000    275.80    275.80    275.80    275.80    275.80
 5749  Sep 04, 2000         -         -         -         -         -
 5750  Sep 01, 2000    277.00    277.00    277.00    277.00    277.00
 
 [5751 rows x 6 columns],
 None)

## Results and Discussion

   The results of the web scraping resulted in `5751 rows` and `6 columns` and data analysis can provide valuable insights into the gold price trends and patterns over time. 
   
   For example, the analysis may reveal a steady increase or decrease in the price of gold, or fluctuations based on market events or economic indicators. The data can be visualized using charts or graphs to make it easier to interpret and identify trends. However, there may be challenges or limitations encountered during the project, such as websites with complex layouts or dynamic content that requires additional scripting. These challenges can be addressed by using more advanced web scraping techniques or by manually inspecting the website structure to identify the relevant data. In addition, future improvements or extensions to the project could include incorporating additional sources of data, such as news articles or social media sentiment, to provide a more comprehensive analysis of gold prices and related factors.

## Conclusion

    In conclusion, this project demonstrates the value of web scraping for data analysis, particularly from 01-09-2000 to 30-03-2023 for this period gold prices was tracked . By automating the process of accessing and extracting data from websites using Python and Selenium, we can gather large amounts of data quickly and efficiently for further analysis. The project highlights the importance of understanding the legal and ethical considerations of web scraping, as well as the technical challenges that can arise, such as navigating complex website layouts. However, with the right tools and techniques, web scraping can provide valuable insights into market trends and patterns that can inform investment strategies or research projects. The use of Selenium and Python for web scraping offers several benefits, including ease of use, flexibility, and compatibility with a range of data analysis tools and libraries. Overall, web scraping using Python and Selenium is a powerful tool for data analysis that can provide valuable insights into a range of domains, from finance to social media.

## Future Work

1. Real-time data streaming: Instead of manually running the web scraper to extract data periodically, consider implementing a real-time data streaming pipeline to automatically fetch new data as it becomes available. This can be achieved using technologies such as Apache Kafka or Amazon Kinesis.


2. Data visualization: Once data has been scraped and analyzed, consider creating a web application or dashboard to visualize the data in an interactive and user-friendly way. This can help users to better understand and interpret the data.


3. Natural language processing: In addition to scraping numerical data, consider using natural language processing techniques to extract insights from textual data on the web, such as news articles or social media posts. This can provide a more comprehensive view of market sentiment and trends.


4. Machine learning: Consider training a machine learning model on the scraped data to make predictions or identify patterns in market trends. This can enable more accurate forecasting and decision-making, and can help to automate certain aspects of the data analysis process.

## Reference
    
   This project drew on a variety of sources to support the development and implementation of the web scraping process. These included documentation for the Selenium library and the Python programming language, as well as tutorials and forums related to web scraping and data analysis. In addition, research articles and publications related to gold prices and market trends were consulted to inform the analysis and interpretation of the scraped data. The following list provides a selection of the sources used in the project:

- Selenium documentation: https://www.selenium.dev/documentation/en/
- Python documentation: https://docs.python.org/
- Stack Overflow: https://stackoverflow.com/

These sources were instrumental in providing guidance and insights throughout the project, and can serve as valuable resources for others looking to develop their skills in web scraping and data analysis.