<small><small>Project by Robin Rai. Full project list @ **[GitHub](https://github.com/users/robinrai1349/projects/)**</small></small>

# Web Scraping a Dynamic Website displaying World Population Statistics

***[Worldometer](https://www.worldometers.info/world-population/)*** is a website that displays a live estimate of the world population, along with related statistics such as the number of births and deaths today and so far this year.

WebDriver, from the free, open-source Selenium library, is used to fetch the website's dynamic content.

In [1]:
# Import the required modules
from selenium import webdriver
import pandas as pd

In [2]:
# URL of Worldometer for world population, births, deaths, etc...
url = "https://www.worldometers.info/world-population/"

# Labels for the data entries to be extracted:
labels = ["Current World Population", "Births today",  "Deaths today", "Population Growth today",
         "Births this year", "Deaths this year", "Population Growth this year"]

Because the website is dynamic, an element can be replaced between the moment it is located and the moment it is read, which raises a Stale Element Reference error. My fix is to allow a fixed number of retry attempts per entry. This was the most robust idea that came to mind at the time.

In [3]:
# Number of retry attempts given (more retries = higher certainty)
retries = 100
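
The retry idea can also be written as a small reusable helper. The following is a minimal sketch, not the code used in this notebook: `retry_call`, `flaky_read`, and `StaleError` are hypothetical names, and `StaleError` only stands in for Selenium's `StaleElementReferenceException` so that the example runs without a browser.

```python
import time


class StaleError(Exception):
    """Hypothetical stand-in for Selenium's StaleElementReferenceException."""


def retry_call(func, retries, exceptions=(Exception,), delay=0.0):
    """Call func() up to `retries` times and return its first successful result.

    Only the exception types listed in `exceptions` trigger a retry; if every
    attempt fails, the last exception is re-raised.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(1, retries + 1):
        try:
            return func()
        except exceptions:
            if attempt == retries:
                raise
            time.sleep(delay)  # optional pause before the next attempt


# Demo: a reader that fails twice before succeeding (assumed behaviour)
attempts = {"count": 0}


def flaky_read():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise StaleError("element went stale")
    return "302,494"


result = retry_call(flaky_read, retries=100, exceptions=(StaleError,))
print(result, "after", attempts["count"], "attempts")
```

Passing the exception types explicitly keeps unrelated errors (for example, a typo in a selector) from being silently retried.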

In [4]:
# Initialise WebDriver (Will be using the Chrome browser)
driver = webdriver.Chrome()

# Navigate to the specified website
driver.get(url)

## Extracting and Storing the data

In [5]:
# Python list (will hold the data)
data = []

In [6]:
try:
    # Implicit wait: give dynamically loaded elements up to 10 seconds to appear
    driver.implicitly_wait(10)

    # Iterate through the labels, with a retry mechanism for each entry
    for i in range(len(labels)):

        for retry in range(retries): # Loop based on number of retries specified at the start

            try:
                # Re-locate the stat counters on every attempt so a stale reference is never reused
                entries = driver.find_elements("css selector", "span.rts-counter")

                # Find the elements that are fragments of the whole entry
                entry_fragments = entries[i].find_elements("css selector", "span.rts-nr-int")

                # Join the fragment texts with commas (thousands separators)
                complete_num = ','.join(fragment.text for fragment in entry_fragments)

                # Add data to Python list
                data.append(complete_num)

                print("{}: {}".format(labels[i], complete_num))
                break # Successful extraction, exit the retry loop

            except Exception as e:
                # Handle exceptions (e.g., StaleElementReferenceException) and retry
                print("Retrying ({}/{})...".format(retry + 1, retries))

        else:
            # Every retry failed: store a placeholder so data stays aligned with labels
            data.append(None)

except Exception as e:
    # Handle any exceptions that occurred during the web scraping process
    print("Error occurred:", str(e))

finally:
    # Ensure the WebDriver is closed properly, regardless of success or failure
    driver.quit()

Current World Population: 8,067,021,077
Births today: 302,494
Retrying (1/100)...
Deaths today: 136,876
Population Growth today: 165,618
Births this year: 106,239,831
Deaths this year: 48,072,401
Population Growth this year: 58,167,430
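
The scraped values are strings with thousands separators, so they need converting before any arithmetic. Once converted, a quick sanity check is possible: growth should equal births minus deaths. This is a minimal sketch using the figures printed above; `parse_count` and `sample` are hypothetical names, and because the counters update live, values read at slightly different moments on another run may not match exactly.

```python
def parse_count(text):
    """Convert a string such as '8,067,021,077' to an int."""
    return int(text.replace(",", ""))


# Values copied from the output above
sample = {
    "Births today": "302,494",
    "Deaths today": "136,876",
    "Population Growth today": "165,618",
    "Births this year": "106,239,831",
    "Deaths this year": "48,072,401",
    "Population Growth this year": "58,167,430",
}

counts = {label: parse_count(value) for label, value in sample.items()}

# Growth should be the difference between births and deaths
growth_today = counts["Births today"] - counts["Deaths today"]
growth_year = counts["Births this year"] - counts["Deaths this year"]

print(growth_today == counts["Population Growth today"])
print(growth_year == counts["Population Growth this year"])
```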


## Exporting the data

In [7]:
# Create dataframe
df = pd.DataFrame({"World Population Statistics": data})

# Label rows accordingly (using label list from earlier)
df.index = labels

# Preview
df

| | World Population Statistics |
|---|---|
| Current World Population | 8067021077 |
| Births today | 302494 |
| Deaths today | 136876 |
| Population Growth today | 165618 |
| Births this year | 106239831 |
| Deaths this year | 48072401 |
| Population Growth this year | 58167430 |


In [8]:
# Name file and export
df.to_csv('World_Population_data.csv')
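
To confirm the export, the file can be read back and the counts converted to integers. This is a minimal sketch that uses the standard-library `csv` module rather than pandas; `load_stats` is a hypothetical helper, and it assumes the exported layout is a header row followed by one label and one count per row, with or without thousands separators. It is demonstrated on an in-memory sample so that it runs without the file.

```python
import csv
import io


def load_stats(file_obj):
    """Read label/count rows from an open CSV file into a dict of ints."""
    reader = csv.reader(file_obj)
    next(reader)  # skip the header row
    return {row[0]: int(row[1].replace(",", "")) for row in reader if row}


# In-memory sample mirroring the assumed layout of the exported file
sample_csv = io.StringIO(
    ',World Population Statistics\n'
    'Current World Population,"8,067,021,077"\n'
    'Births today,302494\n'
)

stats = load_stats(sample_csv)
print(stats["Current World Population"])
```

For the real file, the same helper could be called as `load_stats(open('World_Population_data.csv', newline=''))`, ideally inside a `with` block so the file is closed afterwards.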