### Test URLs
https://sunstonepartners.com \
https://www.appliedlearning.com \
https://www.forsalebyowner.com login \
https://ecmins.com/ \
http://www.iconnect-corp.com \
https://cessco.ca/ ROBOT \
http://www.ticss.net \
https://www.tyremarket.com/Car-Tyres \
https://www.dentalxchange.com/ \

**ParseHub**
- Can handle JavaScript, AJAX, cookies, sessions and redirects.
- Fails to meet the requirements for:
  - Collecting data from websites that use anti-scraping technologies, such as CAPTCHA-protected websites.
  - Obtaining data at a large scale.
  - Bypassing geographical restrictions on content.
- Better suited to scheduled runs of a fixed scrape against a specific website.

**JsonLink**\
(API key: pk_d22cfd76694dc6566ccbf7ed6f50cdc66215091c)\
30 requests per minute\
300 requests per hour\
30,000 requests per month

*Incorporate rate limiting based on these rates*
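
One way to respect all three JsonLink limits at once is a sliding-window limiter. The sketch below is only an illustration, not part of the scraper: `SlidingWindowLimiter`, `JSONLINK_LIMITS` and the injectable `clock`/`sleep` parameters are hypothetical names, and the monthly limit is approximated as a rolling 30-day window (an assumption; the provider may reset on calendar months).

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls in any `window` seconds, for several windows at once."""

    def __init__(self, limits, clock=time.monotonic, sleep=time.sleep):
        self.limits = list(limits)   # (max_calls, window_seconds) pairs
        self.clock = clock           # injectable so the limiter can be tested without waiting
        self.sleep = sleep
        self.calls = deque()         # timestamps of past calls, oldest first

    def wait_time(self):
        """Seconds to wait before the next call would satisfy every limit."""
        now = self.clock()
        wait = 0.0
        for max_calls, window in self.limits:
            recent = [t for t in self.calls if now - t < window]
            if len(recent) >= max_calls:
                # A slot frees up once the max_calls-th most recent call leaves the window
                wait = max(wait, recent[-max_calls] + window - now)
        return wait

    def acquire(self):
        """Block until a call is allowed, then record it."""
        wait = self.wait_time()
        if wait > 0:
            self.sleep(wait)
        now = self.clock()
        self.calls.append(now)
        # Forget timestamps older than the longest window
        longest = max(window for _, window in self.limits)
        while self.calls and now - self.calls[0] >= longest:
            self.calls.popleft()


# Limits quoted above; the 30-day month is an assumption
JSONLINK_LIMITS = [(30, 60), (300, 3600), (30_000, 30 * 24 * 3600)]
```

Calling `limiter.acquire()` before each JsonLink request would then replace a fixed `time.sleep`, and only waits when a limit is actually about to be exceeded.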

**Site Metadata API**\
Free plan:\
100 requests / month, then $0.0016 per additional request\
Rate limit: 30 requests per second

**GPT usage** \
Purely token-based; scraping is a tool that GPT uses to help answer questions \
Using the API, the price is per 1000 tokens \
Example: a query (100 tokens) plus the response (900 tokens) given by ChatGPT comes to around 1000 tokens combined. \
GPT4: I(\\$0.03), O(\\$0.06) => 0.003 + 0.054 => \\$0.057/url => all URLs = \\$14,000 \
GPT4-turbo: I(\\$0.01), O(\\$0.03)... \
ChatGPT itself adds no cost beyond the subscription, though
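
The per-URL arithmetic can be checked with a few lines of Python. This is a sketch using only the prices and the 100/900 token split quoted above; `cost_per_url` is a hypothetical helper, and the URL count it prints is inferred from the quoted total rather than stated anywhere in these notes.

```python
def cost_per_url(input_tokens, output_tokens, input_price, output_price):
    """Dollar cost of one request; prices are per 1000 tokens."""
    return input_tokens / 1000 * input_price + output_tokens / 1000 * output_price


# 100 input tokens and 900 output tokens per URL, as in the example above
gpt4 = cost_per_url(100, 900, 0.03, 0.06)        # 0.003 + 0.054
gpt4_turbo = cost_per_url(100, 900, 0.01, 0.03)  # 0.001 + 0.027

print(f"GPT4:       ${gpt4:.3f}/url")
print(f"GPT4-turbo: ${gpt4_turbo:.3f}/url")

# Number of URLs implied by the 14,000 total at the GPT4 rate (inferred, not given)
print("Implied URL count:", round(14_000 / gpt4))
```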

**Ideas** \
Contact Us page \
Testimonials \
Products \
Add types to function defs

### Important meta tags

In [6]:
import requests
import pandas as pd
import validators
import time
import typing

def get_meta_tags(url: str) -> dict:

    api_key = 'pk_d22cfd76694dc6566ccbf7ed6f50cdc66215091c'    

#     url = 'https://cessco.ca/'
    # url = 'https://www.appliedlearning.com'

    params = {'url': url, 'api_key': api_key}
    
    # Free JsonLink limit is 30 req/minute so wait 3 seconds just in case
    time.sleep(3)
    
    response = requests.get('https://jsonlink.io/api/extract', params=params)

    if response.status_code == 200:
        data = response.json()
        # Use .get so a missing field yields None instead of raising KeyError
        meta = {k: data.get(k) for k in ('title', 'description', 'domain')}
        print('Title: ', meta['title'], '\nDescription: ', meta['description'], '\nDomain: ', meta['domain'])
        return meta
    else:
        print(f'JSONLink Error for {url}: {response.status_code} - {response.text}')
        return {}

In [7]:
get_meta_tags('https://ecmins.com/')

Title:  ECM Solutions - Personal & Business - North Carolina & South Carolina 
Description:  ECM Solutions provides personal insurance (home and auto), business insurance, employee benefits and captive insurance in North Carolina and South Carolina. 
Domain:  ecmins.com


{'title': 'ECM Solutions - Personal & Business - North Carolina & South Carolina',
 'description': 'ECM Solutions provides personal insurance (home and auto), business insurance, employee benefits and captive insurance in North Carolina and South Carolina.',
 'domain': 'ecmins.com'}

## Scrape Page

```
for string in soup.nav.stripped_strings:
    print(string.strip('-'))
```
But `soup.nav` is `None` when the fetched HTML has no `<nav>` tag (e.g. https://www.forsalebyowner.com), so this loop raises `AttributeError` on such pages.
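
A guard for the missing-`<nav>` case could look like the sketch below. `nav_strings` is a hypothetical helper; it uses `find_all('nav')`, which returns an empty list rather than `None` when no `<nav>` exists, and it also covers pages with more than one `<nav>`.

```python
from bs4 import BeautifulSoup


def nav_strings(html):
    """Return the cleaned text of every <nav> in html, or [] if there is none."""
    soup = BeautifulSoup(html, 'html.parser')
    strings = []
    for nav in soup.find_all('nav'):  # empty list when the page has no <nav>
        strings.extend(s.strip('-') for s in nav.stripped_strings)
    return strings


print(nav_strings('<nav><a>Home</a><a>-About-</a></nav>'))
print(nav_strings('<div>No navigation here</div>'))
```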

#### Exploration regarding text selection from page

Which text to keep: the two longest pieces, the pieces that make up the highest percentage of the page's length, or something else?

In [132]:
a = ['basdf','sdf','zcbccb']
sorted(a,key=len)[-2:]
# a.sort(key=len)
# a

['basdf', 'zcbccb']

In [54]:
from bs4 import BeautifulSoup
import requests
import urllib.parse


texter = requests.get('https://ecmins.com/').text
souper = BeautifulSoup(texter,'html.parser')
az = souper.find_all('a', href=True)  # skip <a> tags that have no href attribute
hrefs = [x['href'] for x in az]
azs = list({i for i in hrefs if validators.url(i)})

# parsed = urllib.parse.urlsplit(address)
# print("{}?{}".format(parsed.path.split("/")[-1], parsed.query))
urllib.parse.urlsplit(azs[0]).netloc
# for a in az:
#     print(a.attrs['href'])

'ecmins.com'

### Determining Relevance of href

In [72]:
import urllib.parse as up

sp = up.urlsplit('https://ecmins.com/')
print(sp.netloc)
sp

ecmins.com


SplitResult(scheme='https', netloc='ecmins.com', path='/', query='', fragment='')
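
Since `netloc` is the piece that identifies the site, one possible relevance test is to compare hosts. The sketch below is an assumption about what "relevant" should mean, not a settled rule: `same_site` is a hypothetical helper that ignores a leading `www.` and treats other subdomains (such as `register.dentalxchange.com`) as a different site.

```python
import urllib.parse as up


def same_site(base_url, href):
    """True if href points at the same host as base_url, ignoring a leading 'www.'."""
    def host(u):
        hostname = up.urlsplit(u).hostname or ''  # lowercased, without port
        return hostname[4:] if hostname.startswith('www.') else hostname

    base_host = host(base_url)
    return base_host != '' and base_host == host(href)


print(same_site('https://ecmins.com/', 'https://www.ecmins.com/about'))
print(same_site('https://www.dentalxchange.com/', 'https://register.dentalxchange.com/reg/login?0'))
```

Relative hrefs have no host, so they need to be resolved against the base URL before this check is meaningful.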

### Getting different pages with same driver - Selenium

In [23]:
driver = webdriver.Chrome()
driver.get('https://www.dentalxchange.com/')
page_text = driver.find_element("xpath","/html/body").text
page_array = re.split(r'[\n\t]+',page_text)
print(page_array[10])
driver.get('https://www.dentalxchange.com/product/credentialconnect-for-providers')
page_text = driver.find_element("xpath","/html/body").text
page_array = re.split(r'[\n\t]+',page_text)
print(page_array[10])
driver.close()

The dental payments platform where
CREDENTIALCONNECT™ FOR PROVIDERS


### Compare bs4.nav and selenium nav tags

In [69]:
import requests
from bs4 import BeautifulSoup

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By 
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

from typing import Dict
import re
import urllib.parse as up

url = 'https://www.appliedlearning.com'

#bs4
response = requests.get(url)
texter = response.text
souper = BeautifulSoup(texter, 'html.parser')
print('bs4',souper.nav)

#selenium
driver = webdriver.Chrome()
driver.get(url)
navs = driver.find_elements("xpath","//nav")
navs_text = list(filter(None,[re.split(r'[\n\t]+',nav.text) for nav in navs if nav]))
driver.close()
print(navs_text)

bs4 None
<filter object at 0x000001CC58EFBD60>


**Report**: bs4 exposes `.nav` on the BeautifulSoup object, but it only returns the first `<nav>` in the fetched HTML and was less reliable than Selenium for finding nav tags here (it returned `None`). However, finding nav tags manually with bs4 is better.

### Getting different pages with same driver - Selenium

In [21]:
driver = webdriver.Chrome()
driver.get('https://www.dentalxchange.com/')
page_text = driver.find_element("xpath","/html/body").text
page_array = re.split(r'[\n\t]+',page_text)
print(page_array[0])
driver.get('https://www.dentalxchange.com/product/credentialconnect-for-providers')
page_text = driver.find_element("xpath","/html/body").text
page_array = re.split(r'[\n\t]+',page_text)
print(page_array[0])
driver.close()

Skip to content
Skip to content


### All a tags within nav tags

#### Get all a tags.
1. Test if the href is a valid URL.
2. Test if url + (optional /) + href is a valid URL.
3. Otherwise, the href is likely a bust.

####  Selenium

In [34]:
from selenium.webdriver.common.by import By 

driver = webdriver.Chrome()
driver.get('https://www.scorpion.co/')
anavs = driver.find_elements("xpath","//nav//ul")
tag_navs = driver.find_elements(By.TAG_NAME, 'nav')
tag_anavs = [i.find_elements(By.TAG_NAME,'a') for i in tag_navs]
children = [(i.text+':',i.find_elements("xpath",".//*")) for i in tag_navs]
    # Get all the elements available with tag name 'p'
# elements = element.find_elements(By.TAG_NAME, 'p')
text_anavs = [i.id for i in anavs]
# print(tag_anavs[1][0].text)
print(children[0][1][0].text)
driver.close()
print(text_anavs[0])
print(len(tag_anavs))


0347B016149A493B9B03B398C9D74967_element_23
1


#### bs4

In [125]:
from bs4 import BeautifulSoup
import requests
import validators
import urllib.parse as up

url = 'https://www.scorpion.co/'
# url = 'https://www.dentalxchange.com/'

html = requests.get(url).content
soupt = BeautifulSoup(html, 'html.parser')

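def url_status(base_url, href) -> str:
    """Label href 'relevant' if it is on the same host as base_url."""
    return 'relevant' if up.urlparse(href).netloc == up.urlparse(base_url).netloc else 'irrelevant'

def assess_href(base_url, href) -> list:
    """Return [url, status], trying href as an absolute URL first, then relative to base_url."""
    if not href:  # <a> tag without an href, or an empty href
        return ['Bad href', 'irrelevant']
    if validators.url(href):
        return [href,url_status(base_url,href)]
    if base_url[-1] != '/': base_url+='/'
    if href[0] == '/': href = href[1:]
    combined_url = base_url+href
    if validators.url(combined_url):
        return [combined_url,url_status(base_url, combined_url)]
    return ['Bad href', 'irrelevant']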

# First, find the <nav> tag, then find all <a> tags within it
navs = soupt.find_all('nav')
if navs:  # Check if a <nav> tag was found
    # Find the groupings of a tags within each nav of navs
    for nav in navs:
#         print(nav.get_text(strip=True))
        print(nav.get('id'))  # nav.id would look for a child <id> tag, not the id attribute
        a_group = nav.find_all('a')
#         print(a_group)
        for a in a_group:
            print(a.get_text(strip=True),':',assess_href(url,a.get('href'))[0])
    
    
    
#     a_groups = [nav.find_all('a') for nav in navs]
#     for a_group in a_groups:
#         print(a)
#         for a in a_group:
#             print(a.get_text(strip=True),':',a.get('href'))

None
How We Help : https://www.scorpion.co/javascript:void(0);
Get More Customers : https://www.scorpion.co/how-we-help/get-more-customers/
Scorpion ConnectConvert visitors into customers : https://www.scorpion.co/how-we-help/scorpion-connect/
AI ChatBetter Client Engagement : https://www.scorpion.co/how-we-help/ai-chat/
Content MarketingAttract attention with quality content : https://www.scorpion.co/how-we-help/content-marketing/
Digital AdvertisingGet more and better leads : https://www.scorpion.co/how-we-help/digital-advertising/
Search Engine RankingShow up in search results : https://www.scorpion.co/how-we-help/search-engine-ranking/
Video MarketingTell your story through video : https://www.scorpion.co/how-we-help/video-marketing/
WebsiteConvert more of your traffic : https://www.scorpion.co/how-we-help/website/
Fill Your Schedule : https://www.scorpion.co/how-we-help/fill-your-schedule/
ChatChat with customers : https://www.scorpion.co/how-we-help/chat/
CommunicationsConnect wi

#### Determine if a URL is broken/undefined (e.g. https://www.scorpion.co/javascript:void(0);) while still differentiating these from pages that block robots

In [123]:
driver = webdriver.Chrome()
driver.get('https://www.scorpion.co/javascript:void(0);')
ready_state = driver.execute_script('return document.readyState;')
driver.close()
print(ready_state)

complete


In [108]:
assess_href('https://www.scorpion.co/','/home-services/restoration/')
validators.url('https://www.scorpion.co/javascript:void(0);')

True
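
Because `validators.url` accepts the concatenated `javascript:` URL, string concatenation cannot catch this case. A possible alternative, sketched below, is to resolve the href with `urllib.parse.urljoin` and keep only `http`/`https` results. `resolve_href` is a hypothetical helper, and it only filters non-navigable hrefs; telling a genuinely broken page from a robot-blocked one would still need a look at the response itself.

```python
import urllib.parse as up


def resolve_href(base_url, href):
    """Return an absolute http(s) URL for href, or None if it is not a navigable link."""
    href = (href or '').strip()
    if not href:
        return None
    # urljoin resolves relative paths and leaves hrefs with their own scheme untouched
    absolute = up.urljoin(base_url, href)
    parts = up.urlsplit(absolute)
    if parts.scheme not in ('http', 'https') or not parts.netloc:
        return None  # javascript:, mailto:, tel: and similar
    return absolute


base = 'https://www.scorpion.co/'
print(resolve_href(base, '/how-we-help/website/'))
print(resolve_href(base, 'javascript:void(0);'))
```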

### Scraping Methods

In [2]:
from bs4 import BeautifulSoup
import requests

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By 
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

from typing import Dict, List
import re
import urllib.parse as up

def word_count(seg):
    count = 0
    for i in seg:
        if i == ' ':
            count += 1
    return count+1

def bs4_scrape(response_text):
    soup = BeautifulSoup(response_text,'html.parser')
    
    if soup.nav:
        # Loop through the nav links and visit the About Us page
        for link in soup.nav.find_all('a', href=True):
            if 'About Us' in link.get_text():
                # Assumes an absolute href; a relative one must be joined to the base URL first
                html_about = requests.get(link['href']).text
                soup_about = BeautifulSoup(html_about,'html.parser')
                about_text = soup_about.get_text("|",strip=True).split("|")
                two_longest = sorted(about_text,key=len)[-2:]
                print(about_text)
    else:
        # Visit every seemingly relevant href on the page
        print('1')
        
def bs4_page_scrape(url):
    page_text = requests.get(url).text
    text_soup = BeautifulSoup(page_text, 'html.parser')
    text_array = text_soup.get_text("|",strip=True).split("|")
    return sorted(text_array,key=len)[-2:]

def selenium_page_scrape(url):
    driver = webdriver.Chrome()
    driver.get(url)    
    page_text = driver.find_element("xpath","/html/body").text
    # TODO: split on any whitespace (\n and \t)
    page_array = re.split(r'[\n\t]+',page_text)
    # Extract the first two pieces of text with more than (7) words -> to be tested
    first_relevant = [i for i in page_array if word_count(i) > 7][:2]
    h1s = driver.find_elements("xpath","//h1")
    h2s = driver.find_elements("xpath","//h2")
    h1_texts = [h1.text for h1 in h1s if h1]
    h2_texts = [h2.text for h2 in h2s if h2]
    driver.close()
    return {'first_relevant': first_relevant, 'text':sorted(page_array,key=len)[-2],'headers':list(filter(None,h1_texts+h2_texts))}

def selenium_nav_scrape(url):
    driver = webdriver.Chrome()
    driver.get(url)
    # find_elements returns a list of WebElement objects, not strings
    anavs = driver.find_elements("xpath","//nav//a")
        
    # 1. Find the <a> tags inside the navs
    # 2. Clean out empties
    # 3. Get the text and href from each anav
    # 4. Split each one by [\n\t]+ then spread onto array
    anavs_clean = [(anav.text,anav.get_attribute('href')) for anav in anavs if anav.text]
    # Might not need to split since getting //nav//a
#     navs_wf = [ re.split(r'[\n\t]+',anav[0]) for anav in anavs_clean]
    nav_list = []
    # For loop because can't use splat in list comprehension
    for nav_inner in anavs_clean:
        if type(nav_inner) == list:
            nav_list += [*nav_inner]
        else:
            nav_list += [nav_inner]
            
    driver.close()
    return nav_list
    
def selenium_scrape(url) -> Dict[str, object]:
    url_results = {}
    
    # Scrape home page
    home_page_obj = selenium_page_scrape(url)
    url_results['home_page'] = home_page_obj
    
    #Scrape nav
    nav_list = selenium_nav_scrape(url)
    url_results['nav'] = nav_list
    
    if False:
        # XPath - typically unreliable but not with relative path
        atag_elems = driver.find_elements("xpath", "//a[@href]")
        for elem in atag_elems:
            print(elem.get_attribute("href"))
            if (elem.get_attribute("href") == 'https://cessco.ca/pressure-vessel-fabrication/'):
        #         and elem.is_displayed() and elem.is_enabled()):
                print(elem.id)
                inner_driver = webdriver.Chrome()
                inner_driver.get(elem.get_attribute("href"))
                h1 = inner_driver.find_element(By.TAG_NAME,"h1")
                inner_driver.close()
                break

        page_text = driver.find_element("xpath", "/html/body").text
        print('closing')
        driver.close()
        page_text
    
    return url_results

# https://www.appliedlearning.com
# https://sunstonepartners.com
# https://ecmins.com
# https://www.dentalxchange.com/
selenium_nav_scrape('https://www.dentalxchange.com/')

[('Sign up', 'https://register.dentalxchange.com/reg/step/LoginInfoPage?1'),
 ('Log in', 'https://register.dentalxchange.com/reg/login?0'),
 ('Marketplace',
  'https://www.dentalxchange.com/resources/solutions-marketplace/vendor-partners'),
 ('Tell Me More', 'https://www.dentalxchange.com/solutions/for-providers')]

### Loop through CSV

In [22]:
import time

start_time = time.time()
index_range = slice(0,2)

file_path = './Website_Redirects_230919.csv'
df = pd.read_csv(file_path, low_memory=False)

if 'Website' not in df.columns:
    print("The CSV file must have a 'Website' column containing the URLs.")
else:
    backup_urls = df['Website'][index_range].tolist() 
    redirect_urls = df.get('Website Redirect', pd.Series(dtype=str)).tolist()[index_range]
    
    final_urls = []

    # Check if 'Website Redirect' column is already populated (with valid URL)
    for i, redirect_url in enumerate(redirect_urls):
        current_url, current_data = '', {}
        if redirect_url and validators.url(redirect_url):
            current_url = redirect_url
        elif backup_urls[i] and validators.url(backup_urls[i]):
            current_url = backup_urls[i]
        else:
            # Both the redirect_url and the backup_url are either nonexistent or invalid
            continue
        
        # Do scraping here

        # minimum layer is JsonLink
        current_data = get_meta_tags(current_url)
        
        # try requests, if don't receive a 2xx status, use selenium
        current_response = requests.get(current_url)
        if current_response.status_code//100 == 2:
            #BeautifulSoup scraping
            bs4_scrape(current_response.text)
        else:
            selenium_scrape(current_url)
                
        
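    # Run the asynchronous function using asyncio.run()
    # NOTE: asyncio, process_urls, valid_urls and update_redirect_urls are not imported or defined in the cells shown here
    loop = asyncio.get_event_loop()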
    final_urls = loop.run_until_complete(process_urls(valid_urls))

    # Update 'Website Redirect' column in the CSV file with final URLs
    update_redirect_urls(file_path, valid_urls, final_urls)

    print(f"'Website Redirect' column updated in {time.time()-start_time} seconds.")

Title:  Home - Sunstone Partners 
Description:  A partner for growth LEARN MORE Who We Are We are a leading growth oriented private equity firm that makes majority and minority investments in technology-enabled services and software businesses. We seek to partner with exceptional entrepreneurs, often as their first institutional capital partner, to help accelerate organic growth and fund acquisitions. We have over […] 
Domain:  sunstonepartners.com
Title:  Applied Learning creates powerful learning experiences to help you drive business performance. 
Description:  Applied Learning enables authentic employee engagement and better strategy execution. Our approach produces fresh solutions tailored to each client’s goals. 
Domain:  www.appliedlearning.com
Title:  FSBO Real Estate Listings: Buy or Sell a House | ForSaleByOwner 
Description:  Browse exclusive homes for sale by owner or sell your home FSBO.  ForSaleByOwner.com helps you sell your home fast and save money. See why so many are 

KeyboardInterrupt: 

Title:  Home - Sunstone Partners 
Description:  A partner for growth LEARN MORE Who We Are We are a leading growth oriented private equity firm that makes majority and minority investments in technology-enabled services and software businesses. We seek to partner with exceptional entrepreneurs, often as their first institutional capital partner, to help accelerate organic growth and fund acquisitions. We have over […] 
Domain:  sunstonepartners.com \

Title:  Applied Learning creates powerful learning experiences to help you drive business performance. 
Description:  Applied Learning enables authentic employee engagement and better strategy execution. Our approach produces fresh solutions tailored to each client’s goals. 
Domain:  www.appliedlearning.com \

Title:  FSBO Real Estate Listings: Buy or Sell a House | ForSaleByOwner 
Description:  Browse exclusive homes for sale by owner or sell your home FSBO.  ForSaleByOwner.com helps you sell your home fast and save money. See why so many are selling FSBO! 
Domain:  www.forsalebyowner.com \

Title:  Not found. 
Description:  
                    Whatever you were looking for doesn't currently exist at this address. Unless you were looking for this error page, in which case: Congrats! You totally found it.                 
Domain:  comsatmedia-en.tumblr.com \

Title:  Kdan Mobile | The Best Creative & Productivity Apps & Services 
Description:  Kdan Mobile provides mobile solutions including e-signature services, document and content creation apps that empower your creativity and productivity. 
Domain:  www.kdanmobile.com \
Title:  Buy Chlorella Online | Sun Chlorella USA 
Description:  Sun Chlorella provides the most digestible chlorella in the world using our proprietary process. Shop for & buy chlorella online in a variety of forms today! 
Domain:  www.sunchlorellausa.com \

Title:  Papé Material Handling | Hyster & Yale Dealer 
Description:  Papé Material Handling offers superior products and customer service at every one of our locations throughout the West.  
Domain:  www.papemh.com \

Title:  Wacker Brewing Company, brewed in Lancaster PA | 
Description:   
Domain:  wackerbrewing.com \

Title:  Local SEO Web Design Company Denver | 21stsoft.com 
Description:  21st maximizes your online presence with local SEO marketing. Founded in Denver, Colorado 1993 with an A+ BBB rating we elevate sales and lead generation. 
Domain:  21stsoft.com \

Title:  Orthodontic & Dental Marketing - Marketing by SOS 
Description:  Orthodontic & Pediatric Dental marketing: referrals, branding, websites, social media, tshirts & more! In a nutshell: we make great practices even better. 
Domain:  marketingbysos.com \

Title:  WITTMANN Group – US | WITTMANN Group 
Description:  Follow Us 
Domain:  www.wittmann-group.com \

Title:  Xela Pack | Environmentally Conscious Sample Packaging 
Description:  Xela Pack Inc. specializes in producing environmentally conscious sample and trial packaging. Our main goal is to provide product companies with an attractive, affordable and functional sample package while paying utmost respect to the earth. 
Domain:  www.xelapack.com \

Title:  Home - EPIC iO 
Description:  Unlock AIoT & wireless connectivity with EPIC iO and generate predictions to inform action, saving time & money! Discover the possibilities now. 
Domain:  epicio.com \

Title:  Buy Organic Chia Seeds, Products & More | Mamma Chia Online Store 
Description:  Buy Mamma Chia products online! Shop our online store to find chia seeds, snacks & more! 
Domain:  mammachia.com \

Title:  Fabricant de camions blindés Amérique du Nord – Cambli 
Description:  Canada, États-Unis, Europe. Groupe Cambli est le plus important concepteur et fabricant de camions blindés en Amérique du Nord. 
Domain:  cambli.com \

Title:  River Run | Milwaukee IT Firm & Managed IT Service Provider 
Description:  River Run is a full-service IT firm in the Greater Milwaukee Area. We partner with small to medium sized businesses on a full-suite of managed network services and IT support. 
Domain:  www.river-run.com \

Title:  Workspace Optimization | Desk Booking | ReSoft International LLC 
Description:  ReSoft International's focus is on employee self-service workplace management technology. We add highly scalable meeting room, team space and flexible workplace signage/management to your existing email technology investment. 
Domain:  www.re-soft.com \

Title:  {{title}} 
Description:  {{description}} 
Domain:  www.pavestone.com \

In [17]:
# arr = selenium_page_scrape('https://www.dentalxchange.com/')
arr = ['Skip to content', 'Sign up', 'Log in', 'Independent Practices', 'DSOs', 'Partners', 'Payers', 'Marketplace', 'Contact Us', 'Credential Now', 'The dental payments platform where', 'everyone', 'wins.', 'Accelerate your business and unlock new potential with the dental payments platform that connects your data, workflows, teams, and patients like never before.', 'Streamline claims processing', 'Speed up payments', 'Connect where it counts', 'Sign UpContact Sales', 'CREDENTIALING', 'ELIGIBILITY', 'CLAIMS', 'ATTACHMENTS', 'CLAIM STATUS', 'ERAs', 'PAYMENTS', 'BILLING', 'Accelerate', 'Dental', 'Payments', 'Payments the right way.', 'Every step of the way.', 'Say goodbye to complexity. We address the full lifecycle of dental payments. From verifying eligibility to filing accurate claims and beyond, our powerful platform, intelligent data, and massive network ensure payers get exactly what they need so dentists get exactly what they’ve earned. Quickly, reliably, and easily.', 'Get connected. Works for', 'everyone.', '107k+', 'Providers in Network', '178+', 'Partners in Network', '289MM', 'EDI Transaction Annually', '71MM', 'Dental Claims Annually', '30+', 'Years in the Industry', 'Connected to All Payers', 'All things payments. All in one place.', 'PaymentsXChange', 'The fastest, smartest way to', 'manage patient payments.', 'Manage your merchant account and', 'collect payments with ease. From credit', 'card processing to POS and financing', 'options, we’ve got you covered. Ready', 'to go paperless now? Easily connect', 'patients to recurring online payments—', 'and issue e-statements and email', 'receipts with quick clicks.', 'Learn More', 'PaymentsXChange', 'ClaimsXChange', 'NetworkXChange', '99%', 'Clean Claim Rate', 'I Want That', 'Connecting providers, partners, and payers.', "Here's what our clients are saying.", '"Once you get the swing of submitting claims with DentalXChange, it makes your life and work in the office so much better."', 'Mary Lawson, Office Manager', 'Bowden & Associates', '"I have been impressed by their', 'consistent dedication to providing first-rate customer service and technical support, and their on-going commitment to EDI in general. We are happy to work with them and appreciate their efforts, I give them two thumbs up!"', 'Barbara Castelli, EDI Support', 'Coordinator Denti-Cal (CA Medicaid)', '"We have been working with EHG for over 6 years now and I am consistently impressed with their Customer, Technical and Vendor support. Our company has a unique structure and EHG has been able to accommodate our needs with excellent results everytime!"', 'Andy Andersen, IS EDI Specialist', 'American Dental Partners, Inc.', 'Bowden & Associates', 'What you need. When you need it.', 'Providers', 'You’re constantly seeking ways to improve your practice so you can focus on what you do best—dentistry. You need to get paid on time. And you want the payment process to be as smooth, seamless, and simple as possible. That’s where we come in. We help reduce your costs and stress—by streamlining your workflows and eliminating manual processes and paperwork. Submit claims correctly the first time. Know where you stand at any time. And create better experiences across the board for your patients.', 'Tell Me More', 'Channel Partners', 'Payers', 'Trading Partners', 'One powerful platform.', 'Streamline workflows', 'Minimize risk', 'Modernize processes', 'We streamlined the payment process by making the connections that matter between providers, partners, and payers. Within our nationwide network of over 950+ payers, you can submit claims—and collect payments—faster than ever before.', 'Make your product even', 'more powerful.', 'By integrating DentalXChange into your software, you can offer a more comprehensive product and give your users a seamless interface for submitting claims and tracking payments. And with our easy-to-use APIs, you’ve got quick access to trusted data and a vast payments network that can scale—with added value just around the corner.', 'Start streamlining your payment process.', "Let's Go", 'Contact Us', '(949) 755-8065', 'Connect With Us:', 'Company', 'About Us', 'Careers', 'Contact Us', 'Partner Opportunities', 'Who We Serve', 'Independent Practices', 'DSOs', 'Channel Partners', 'Payers', 'Trading Partners', 'Products', 'Credentialing for Providers', 'Credentialing for Payers', 'Eligibility', 'Eligibility AI', 'Claims', 'Attachments', 'Attachments for Payers', 'Claim Status', 'ERA', 'Merchant Services', 'Patient Statements', 'APIs', 'Dental Billing Services', 'Resources', 'Client Support', 'Solutions Marketplace', 'Blog', 'Press Releases', 'FAQ', 'Developers', 'API', '© 2024 EDI Health Group, dba DentalXChange. All rights reserved.', 'Privacy Policy', 'User Agreement', 'Compliance', 'Security Paper']
def word_count(seg):
    count = 0
    for i in seg:
        if i == ' ':
            count += 1
    return count+1

arr2 = []
for i in arr:
#     print(i, word_count(i))
    if word_count(i) > 10:
        arr2 += [i]
# arr2 = [i for i in arr if word_count(i) > 5]
arr2[:2]

['Accelerate your business and unlock new potential with the dental payments platform that connects your data, workflows, teams, and patients like never before.',
 'Say goodbye to complexity. We address the full lifecycle of dental payments. From verifying eligibility to filing accurate claims and beyond, our powerful platform, intelligent data, and massive network ensure payers get exactly what they need so dentists get exactly what they’ve earned. Quickly, reliably, and easily.']
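
The two selection heuristics explored in this notebook (the two longest pieces versus the first two pieces above a word-count threshold) can be compared side by side. The sketch below uses a small made-up `sample` list rather than real scraped text, and `two_longest` and `first_with_min_words` are hypothetical helper names; words are counted with `str.split()`, which differs slightly from counting spaces when a string contains repeated whitespace.

```python
def two_longest(segments):
    """The two longest segments by character count, shortest first."""
    return sorted(segments, key=len)[-2:]


def first_with_min_words(segments, min_words=10, n=2):
    """The first n segments, in page order, with more than min_words words."""
    return [s for s in segments if len(s.split()) > min_words][:n]


# Made-up page segments for illustration only
sample = [
    'Skip to content',
    'We help dental practices get paid faster with less paperwork and fewer rejected claims every month.',
    'Sign up',
    'Our platform connects providers, partners, and payers across one nationwide network so that claims and payments move quickly and reliably.',
    '"The support team answered every question we had during onboarding and kept checking in for weeks afterwards, which made the switch painless for our whole office."',
]

for label, picked in (('two longest', two_longest(sample)),
                      ('first with > 10 words', first_with_min_words(sample))):
    print(label)
    for text in picked:
        print('  ', text)
```

On this sample the length-based pick includes the long testimonial near the bottom of the page, while the threshold-based pick stays with the earlier descriptive copy, which may be closer to what a company summary needs.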