## Scraping a Dynamic Website Using Selenium

In this tutorial, we'll walk through using Python and Selenium to scrape company data from the Affiliate World Conferences Europe exhibitors page (https://affiliateworldconferences.com/europe/exhibitors/). We'll cover how to automate the browser, extract the relevant information, and organize the data into a structured format.  
For each company we'll scrape the name, type, location, description, and website URL, skipping links that point to Facebook or Twitter.  
The picture below shows the information we need to scrape:  
![company data](companyData3.jpg)
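
The rule for the website URL (skip links that point to Facebook or Twitter) can be sketched in plain Python before any browser automation is involved. The helper name `pick_website_link`, the `BLOCKED_WORDS` tuple, and the sample URLs are hypothetical, chosen only for illustration.

```python
from typing import Iterable, Optional

# Hypothetical helper: the name and the blocked-word list are assumptions
# made for this sketch, mirroring the rule described above.
BLOCKED_WORDS = ("facebook", "twitter")


def pick_website_link(hrefs: Iterable[Optional[str]]) -> Optional[str]:
    """Return the first non-empty href that mentions no blocked word."""
    for href in hrefs:
        if href and not any(word in href.lower() for word in BLOCKED_WORDS):
            return href
    return None


# Made-up example links, not taken from the exhibitor site
sample_links = [
    "https://www.facebook.com/example",
    None,
    "https://twitter.com/example",
    "https://www.example.com/",
]
print(pick_website_link(sample_links))  # https://www.example.com/
```

The check is a simple substring match on the lowercased URL, so it would also skip any unrelated address that happens to contain one of those words.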

### Prerequisites  
Before you start, make sure you have the following installed:  
- Python (3.x recommended)  
- Selenium library  
- Pandas library  
- ChromeDriver

### Web Driver  
The Chrome WebDriver lets you open web pages, interact with elements on the page (clicking buttons, filling out forms, etc.), navigate between pages, and retrieve information from the page's source code.

In [27]:
# import the required libraries
from selenium import webdriver
import os
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd

In [28]:
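from selenium.webdriver.chrome.service import Service

# Path to the ChromeDriver executable (adjust for your machine)
chromedriverPath = r'C:\Users\driver\chromedriver.exe'
# Create a Service object with the path to chromedriver.
# webdriver.Chrome starts the service itself, so no explicit start() call is needed.
chromeService = Service(chromedriverPath)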

In [29]:
# Create the WebDriver using the Service object
driver = webdriver.Chrome(service=chromeService)
driver.get('https://affiliateworldconferences.com/europe/exhibitors/')
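
The scraping loop addresses each exhibitor through an `li:nth-child(n)` selector under a common parent list. As a rough sketch, the selectors for a range of positions can be generated up front. The function name `build_child_selectors` and the short `ul.exhibitors` parent are hypothetical placeholders, not the page's real selector.

```python
from typing import Iterable, List


def build_child_selectors(parent: str, positions: Iterable[int],
                          skip: Iterable[int] = ()) -> List[str]:
    """Build one 'parent > li:nth-child(n)' selector per wanted position."""
    skipped = set(skip)
    return [f"{parent} > li:nth-child({n})" for n in positions if n not in skipped]


# 'ul.exhibitors' is a placeholder parent selector for illustration only
selectors = build_child_selectors("ul.exhibitors", range(1, 5), skip=[3])
for selector in selectors:
    print(selector)
```

Keeping the skipped positions in one argument avoids scattering `continue` checks through the loop body.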

In [30]:
# CSS selector of the parent <ul> that holds one <li> per exhibitor
parentSelector = '#__next > main > section > div > section > ul.styles__List-sc-izf6gv-19.bsFaZO'
# Each exhibitor is a child <li> of that list, for example:
#   <parentSelector> > li:nth-child(1)
#   <parentSelector> > li:nth-child(2)
#   <parentSelector> > li:nth-child(202)
companiesList = []
for i in range(155, 163):
        if i == 160:
            continue  # Skip iteration for child number 160
        try:
            childSelector = f"{parentSelector} > li:nth-child({i})"
            # Click on the child element to open the pop-up
            childElement = driver.find_element(By.CSS_SELECTOR, childSelector)
            driver.execute_script("arguments[0].click();", childElement)

            # Wait for the pop-up to load (if needed)
            wait = WebDriverWait(driver, 10)
            # Select the required elements within the pop-up
            h3Elements = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, '.body h3')))
            pElements = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, '.body p')))
            aElements = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, '.body a')))

            h3Texts = [h3Element.text for h3Element in h3Elements]
            
            if pElements:

                # The first <p> holds "Type | Location"; split it on the first '|'
                companyType, location = map(str.strip, pElements[0].text.split('|', 1))

                # Collect the description paragraphs for the current child
                uniqueTexts = []
                specificTexts = []

                # With exactly two <p> elements, the description is in pElements[1]
                if len(pElements) == 2:
                    specificText1 = pElements[1].text
                    if specificText1:
                        specificTexts.append(specificText1)

                if len(pElements) >= 3:
                    specificText1 = pElements[2].text
                    if specificText1:
                        specificTexts.append(specificText1)

                # Scrape pElements[3] if available for the current child
                if len(pElements) >= 4:
                    specificText2 = pElements[3].text
                    if specificText2:
                        specificTexts.append(specificText2)

                # Join the collected paragraphs into a single description string
                combinedText = " AND ".join(specificTexts)

                # Append the combined text to the uniqueTexts list if it's not empty
                if combinedText and combinedText not in uniqueTexts:
                    uniqueTexts.append(combinedText)
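            else:
                # Handle the case where pElements is empty (no elements found)
                print(f"No 'p' elements found in child {i}")
                # Reset the fields so values from the previous child are not reused
                companyType, location, uniqueTexts = None, None, []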

            websiteLink = None
            for aElement in aElements:
                hrefValue = aElement.get_attribute('href')
                if hrefValue and 'facebook' not in hrefValue.lower() and 'twitter' not in hrefValue.lower():
                    websiteLink = hrefValue
                    break
            # Create a dictionary with the scraped fields and store it in companiesList
            container = {
                'name': h3Texts,
                'type': companyType,
                'location': location,
                'description': uniqueTexts,
                'url': websiteLink
                }
            companiesList.append(container)
            # print('saving: ', container['name'])

        except Exception as e:
            print(f"Error while scraping child {i}: {str(e)}")
            continue


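The text handling inside the loop can be tried out without a browser. The sketch below applies the same rules to plain strings: the first paragraph is split on `|` into type and location, and the description paragraphs are joined with `" AND "`. Unlike the loop, it tolerates a first paragraph with no `|`. The function name `parse_popup_paragraphs` and the sample strings are hypothetical.

```python
from typing import List, Optional, Tuple


def parse_popup_paragraphs(paragraphs: List[str]) -> Tuple[Optional[str], Optional[str], str]:
    """Return (type, location, description) from the pop-up's paragraph texts."""
    if not paragraphs:
        return None, None, ""

    # First paragraph looks like "Type | Location"; tolerate a missing '|'
    head, _, tail = paragraphs[0].partition("|")
    company_type = head.strip() or None
    location = tail.strip() or None

    # Same index rules as the scraping loop: with two paragraphs the description
    # is paragraphs[1]; with three or more it is paragraphs[2] (plus [3] if present)
    if len(paragraphs) == 2:
        candidates = paragraphs[1:2]
    else:
        candidates = paragraphs[2:4]
    description = " AND ".join(text for text in candidates if text)
    return company_type, location, description


# Made-up sample text for illustration
print(parse_popup_paragraphs(["Network | Netherlands", "", "First part.", "Second part."]))
# ('Network', 'Netherlands', 'First part. AND Second part.')
```

Feeding it `[p.text for p in pElements]` would be one way to keep the Selenium calls and the string logic separate.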


In [31]:
companiesList

[{'name': ['PureVPN - White Label Solution'],
  'type': 'Service Provider',
  'location': 'Virgin Islands, British',
  'description': ['PureWL enables businesses, enterprises, & super affiliates with a revolutionary white label VPN solution. Our solution is built from the ground up with extensive knowledge and experience in a constantly evolving cybersecurity landscape.'],
  'url': 'https://www.purewl.com/'},
 {'name': ['Quantox Technology'],
  'type': 'Service Provider',
  'location': 'Serbia',
  'description': ['Quantox Technology is an international company specializing in software development, IT business solutions, and consulting. It has over 500 tech experts across 7 countries and is known as the fastest-growing company in Eastern Europe.'],
  'url': 'https://www.quantox.com/'},
 {'name': ['REBLL Network'],
  'type': 'Network',
  'location': 'Netherlands',
  'description': ['REBLL has a solid portfolio of in-house dating offers. Next to that, we partner up with the best in the af

In [32]:
# Create a DataFrame from companiesList (one row per company)
df = pd.DataFrame(companiesList)

In [33]:
df

Unnamed: 0,name,type,location,description,url
0,[PureVPN - White Label Solution],Service Provider,"Virgin Islands, British","[PureWL enables businesses, enterprises, & super affiliates with a revolutionary white label VPN solution. Our solution is built from the ground up with extensive knowledge and experience in a constantly evolving cybersecurity landscape.]",https://www.purewl.com/
1,[Quantox Technology],Service Provider,Serbia,"[Quantox Technology is an international company specializing in software development, IT business solutions, and consulting. It has over 500 tech experts across 7 countries and is known as the fastest-growing company in Eastern Europe.]",https://www.quantox.com/
2,[REBLL Network],Network,Netherlands,"[REBLL has a solid portfolio of in-house dating offers. Next to that, we partner up with the best in the affiliate industry, to bring you offers that outperform their competition.]",https://rebll.com/
3,[Revolution Force],Network,United States,[Revolution Force is a performance affiliate network specializing in the dating vertical.],https://revolutionforce.com/
4,[RichAds],Traffic Source,Cyprus,"[RichAds is a self-serve advertising platform where scale meets performance.Our advertising formats: push, pops, in-page, calendar, direct click. RichAds offers 5B impressions daily in 200+ countries worldwide.]",https://richads.com/?utm_source=awe_23
5,[RollerAds],Network,United States,[RollerAds is a high-performance advertising network that exclusively monetizes all the SendPulse push notifications and tons of various direct traffic.],https://rollerads.com/
6,[RoundSky],Advertiser,United States,"[Round Sky offers direct USA loan offers. We own all our offers direct and they convert over 80% and pay up to $265 with EPCs averaging over $1. We also have API/XML posting, payday loans, installment loans, and more.]",https://www.roundsky.com/


In [None]:
# Show the full contents of each column without truncation
pd.set_option('display.max_colwidth', None)
df.head()

In [34]:
# 'name' and 'description' hold lists of strings, which is why they display with
# square brackets. Joining each list into a single string removes them.
df['name'] = df['name'].apply(', '.join)
df['description'] = df['description'].apply(', '.join)
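
An alternative is to flatten the list-valued fields before the DataFrame is built, so no clean-up pass is needed afterwards. This is only a sketch in plain Python; `flatten_record` is a hypothetical helper and the sample record is made up.

```python
from typing import Any, Dict


def flatten_record(record: Dict[str, Any], separator: str = ", ") -> Dict[str, Any]:
    """Return a copy of record with every list value joined into one string."""
    return {
        key: separator.join(str(item) for item in value) if isinstance(value, list) else value
        for key, value in record.items()
    }


# Made-up record in the same shape as the entries of companiesList
raw = {"name": ["Example Co"], "type": "Network", "description": ["Line one", "Line two"], "url": None}
print(flatten_record(raw))
# {'name': 'Example Co', 'type': 'Network', 'description': 'Line one, Line two', 'url': None}
```

Passing `[flatten_record(r) for r in companiesList]` to `pd.DataFrame` would then give plain string columns directly.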

In [35]:
df.head()

Unnamed: 0,name,type,location,description,url
0,PureVPN - White Label Solution,Service Provider,"Virgin Islands, British","PureWL enables businesses, enterprises, & super affiliates with a revolutionary white label VPN solution. Our solution is built from the ground up with extensive knowledge and experience in a constantly evolving cybersecurity landscape.",https://www.purewl.com/
1,Quantox Technology,Service Provider,Serbia,"Quantox Technology is an international company specializing in software development, IT business solutions, and consulting. It has over 500 tech experts across 7 countries and is known as the fastest-growing company in Eastern Europe.",https://www.quantox.com/
2,REBLL Network,Network,Netherlands,"REBLL has a solid portfolio of in-house dating offers. Next to that, we partner up with the best in the affiliate industry, to bring you offers that outperform their competition.",https://rebll.com/
3,Revolution Force,Network,United States,Revolution Force is a performance affiliate network specializing in the dating vertical.,https://revolutionforce.com/
4,RichAds,Traffic Source,Cyprus,"RichAds is a self-serve advertising platform where scale meets performance.Our advertising formats: push, pops, in-page, calendar, direct click. RichAds offers 5B impressions daily in 200+ countries worldwide.",https://richads.com/?utm_source=awe_23


In [25]:
# pandas needs the 'openpyxl' package to write .xlsx files.
# Install it once before saving the DataFrame to Excel:
# pip install openpyxl

In [None]:
# Save the DataFrame to an Excel file
df.to_excel('companiesList.xlsx', index=True)