# Purpose

Find a way to get data from Plugshare.com since they're not responding to my API access request. The comments and metadata from stations across different networks should be extremely useful in diagnosing electrical and non-electrical customer experience issues.

# Imports

In [3]:
%load_ext autoreload
%autoreload 2

import numpy as np
from rich import print
import os
import pandas as pd
from bs4 import BeautifulSoup
import requests

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


# Constants

In [4]:
from dotenv import load_dotenv

load_dotenv(override=True)

True

# Testing Selenium

The PlugShare website seems very tricky to scrape: it's JavaScript on top of JavaScript on top of... you get the idea. [The most recent work](https://inldigitallibrary.inl.gov/sites/sti/sti/Sort_66982.pdf) I know of that uses PlugShare for research purposes uses Selenium, so that's the approach we'll take for now. Ultimately I'll want API access once we have a budget; for the moment I just need a proof of concept's worth of data. But first, I need to learn how to use Selenium effectively!

We'll be running through [this tutorial](https://towardsdatascience.com/how-to-use-selenium-to-web-scrape-with-example-80f9b23a843a). *Note that I used a very helpful comment on this article to translate the original code from Selenium 3.X (presumably) to modern Selenium 4.*

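Grab the driver matching your current Chrome browser version from [here](https://googlechromelabs.github.io/chrome-for-testing/). (On Selenium 4.6 and newer, Selenium Manager can also download a matching driver automatically if you don't pass an `executable_path`.)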

In [5]:
%%time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

options = webdriver.ChromeOptions()
# options.add_experimental_option("excludeSwitches", ["enable-automation"])
# options.add_experimental_option("useAutomationExtension", False)
service = ChromeService(executable_path="../data/raw/chromedriver") # had to explicitly allow OSX to open this as it is unsigned (but comes from Chrome GitHub so I'm comfortable)
driver = webdriver.Chrome(service=service, options=options)

In [6]:
# Point it at the core URL you care about

driver.get('https://hoopshype.com/salaries/players/')

When trying to grab player name elements using Chrome Dev Tools and the Inspect command, we find each is of the form

```html
<td class="name">
    <a href="https://hoopshype.com/player/stephen-curry/salary/">
        Stephen Curry
    </a>
</td>
```

so we translate that into an XPath for Selenium of the form `//td[@class="name"]`.

> Breaking that down: `//` selects matching nodes anywhere in the document (at any depth), `td` limits the match to `td` tags, and `[@class="name"]` keeps only the `td` tags whose `class` attribute is exactly `"name"` (a cell with `class="name sorted"` would *not* match).
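To see the exact-match behaviour of `[@class="name"]` without opening a browser, here is a small offline sketch using Python's built-in `xml.etree.ElementTree`. The HTML snippet is made up to mimic the hoopshype markup (it is not copied from the live page). Note that ElementTree only supports a subset of XPath and needs a leading `.` for searches relative to an element, whereas Selenium hands the full XPath 1.0 expression to the browser.

```python
import xml.etree.ElementTree as ET

# Made-up table shaped like the hoopshype name cells (not the live page)
snippet = """
<table>
  <tr><td class="name"><a href="https://hoopshype.com/player/stephen-curry/salary/">Stephen Curry</a></td></tr>
  <tr><td class="name"><a href="#">Kevin Durant</a></td></tr>
  <tr><td class="name highlighted"><a href="#">Not Matched</a></td></tr>
</table>
"""
root = ET.fromstring(snippet.strip())

# [@class='name'] compares the whole attribute string, so "name highlighted" is skipped
cells = root.findall(".//td[@class='name']")

# The visible text sits inside the nested <a>, so join all text under each <td>
names = ["".join(td.itertext()).strip() for td in cells]
print(names)  # ['Stephen Curry', 'Kevin Durant']
```

If the cells ever carry several classes, the usual XPath 1.0 idiom in Selenium is `//td[contains(concat(' ', normalize-space(@class), ' '), ' name ')]`.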

In [11]:
players = driver.find_elements(By.XPATH, '//td[@class="name"]')

In [10]:
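from tqdm import tqdm

# Keep the text of every non-empty name cell (the first is the "PLAYER" column header);
# `:=` reads each element's .text only once, since every read is a round trip to the browser
players_list = [name for p in tqdm(players, desc='Getting player names') if (name := p.text) != '']

players_list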

Getting player names: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 576/576 [00:09<00:00, 59.50it/s]


['PLAYER',
 'Stephen Curry',
 'Kevin Durant',
 'Nikola Jokic',
 'LeBron James',
 'Joel Embiid',
 'Bradley Beal',
 'Kawhi Leonard',
 'Paul George',
 'Giannis Antetokounmpo',
 'Damian Lillard',
 'Jimmy Butler',
 'Klay Thompson',
 'Rudy Gobert',
 'Fred VanVleet',
 'Anthony Davis',
 'Trae Young',
 'Zach LaVine',
 'Luka Doncic',
 'Tobias Harris',
 'Ben Simmons',
 'Pascal Siakam',
 'Kyrie Irving',
 'Jrue Holiday',
 'Karl-Anthony Towns',
 'Devin Booker',
 'Kristaps Porzingis',
 'CJ McCollum',
 'James Harden',
 'Darius Garland',
 'Ja Morant',
 'Zion Williamson',
 'Jamal Murray',
 'Brandon Ingram',
 'Michael Porter',
 'Shai Gilgeous-Alexander',
 'Donovan Mitchell',
 'Jayson Tatum',
 'Bam Adebayo',
 "De'Aaron Fox",
 'Deandre Ayton',
 'Jaylen Brown',
 'Gordon Hayward',
 'Chris Paul',
 'Domantas Sabonis',
 'Kyle Lowry',
 'Khris Middleton',
 'DeMar DeRozan',
 'Julius Randle',
 'Jordan Poole',
 'Jerami Grant',
 'Jaren Jackson Jr',
 'Tyler Herro',
 'Jalen Brunson',
 'Cameron Johnson',
 'Kyle Kuzma',


In [None]:
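# Now for salaries; their `td` tags carry more attributes than the name cells, but matching on `class` alone is enough
salaries = driver.find_elements(By.XPATH, '//td[@class="hh-salaries-sorted"]')

def parse_salary(text):
    """Strip "$" and thousands separators, e.g. "$51,915,615" -> 51915615; leave non-numeric cells (like a header) as text."""
    digits = text.replace('$', '').replace(',', '')
    return int(digits) if digits.isdigit() else text

salaries_list = [parse_salary(t) for s in tqdm(salaries, desc='Getting salaries') if (t := s.text) != '']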

In [18]:
# Pull it all together into a DataFrame
df = pd.DataFrame({'Player': players_list[1:], 'Salary': salaries_list[1:]})
df

|     | Player         | Salary   |
|----:|:---------------|---------:|
| 0   | Stephen Curry  | 51915615 |
| 1   | Kevin Durant   | 47649433 |
| 2   | Nikola Jokic   | 47607350 |
| 3   | LeBron James   | 47607350 |
| 4   | Joel Embiid    | 47607350 |
| ... | ...            | ...      |
| 570 | Trey Jemison   | 64343    |
| 571 | Edmond Sumner  | 40459    |
| 572 | Kaiser Gates   | 35389    |
| 573 | Jamaree Bouyea | 29247    |


I'm not sure why that errored out, but whatevs! It works.
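The notebook doesn't record what the error was. One common culprit when zipping scraped columns into a DataFrame is a length mismatch: `pd.DataFrame` raises `ValueError: All arrays must be of the same length` when the lists differ. A small sketch of a guard for that (`pair_columns` is a hypothetical helper, not a pandas function):

```python
def pair_columns(players, salaries, skip_header=True):
    """Pair scraped names with salaries row by row, refusing to silently misalign them."""
    if skip_header:
        # Mirrors the [1:] slices used above, which drop each list's header entry
        players, salaries = players[1:], salaries[1:]
    if len(players) != len(salaries):
        raise ValueError(
            f"{len(players)} names vs {len(salaries)} salaries; "
            "check both XPaths before building the DataFrame"
        )
    return list(zip(players, salaries))
```

Usage would be `df = pd.DataFrame(pair_columns(players_list, salaries_list), columns=['Player', 'Salary'])`, which fails with a message pointing at the scraping step rather than at pandas.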

# Running It on Plugshare

It *may* be as simple as running [this](https://github.com/14notout/plugshare/tree/main) with a Chrome driver instead of the Firefox one, but we shall see.

## With `requests`

In [9]:
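single_station = 'https://www.plugshare.com/location/343961'

# (requests and BeautifulSoup are already imported above)
r = requests.get(single_station)
r.raise_for_status()  # fail loudly on an error status (e.g. a bot-check page) instead of parsing it
soup = BeautifulSoup(r.content, "html.parser")

# Look at the second <div>, which wraps the embedded JSON-LD metadata (station name, rating, review count)
soup.find_all('div')[1]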

<div id="noscroll"><div id="plugshare"><!--[if lt IE 10]>
          <p class="browser">You are using an <strong>outdated</strong> browser. Please <a href="https://browsehappy.com/">upgrade your browser</a> to improve your experience.</p>
        <![endif]--><div ui-view=""></div><script src="scripts/vendor-2cdcca5031.js"></script><script src="scripts/app-7df3521c1b.js"></script></div><script type="application/ld+json">
      {
        "@context": "http://schema.org",
        "@type": "LocalBusiness",
        "aggregateRating": {
          "@type": "AggregateRating",
          "bestRating": 10,
          "worstRating": 1,
          
          "ratingValue": "10.0",
          
          "reviewCount": "237"
        },
        
        "name": "Wards Crossing - Target",
        
        
        "description": "Find EV charging stations with PlugShare, the most complete map of electric vehicle charging stations in the world!Charging tips reviews and photos from the EV community.",
       

## With `selenium`

In [14]:
ints = []  # `is` is a reserved keyword in Python, hence the earlier SyntaxError
for i in range(0, 15):
    ints.append(i)

In [30]:
options = webdriver.ChromeOptions()
service = ChromeService(executable_path="../data/raw/chromedriver")

It looks like plain `requests` can at least see the station metadata if I parse the soup well enough (which is awesome: no Selenium needed!), but how do I get the check-ins/reviews?
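Since the station metadata arrives as a `<script type="application/ld+json">` block in the raw HTML, it can be decoded as JSON instead of scraped tag by tag. Below is a minimal, stdlib-only sketch; `JsonLdExtractor` and `station_metadata` are hypothetical names, and the sample string is a trimmed copy of fields visible in the `requests` output above. `json.loads` will raise if the live block turns out not to be valid JSON.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect and decode every <script type="application/ld+json"> block in a page."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script contents can arrive in several chunks, so buffer them
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append(json.loads("".join(self._buffer)))


def station_metadata(html):
    """Return the decoded JSON-LD blocks found in `html` (a list of dicts)."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


sample = """<div id="noscroll"><script type="application/ld+json">
{"@context": "http://schema.org", "@type": "LocalBusiness",
 "aggregateRating": {"@type": "AggregateRating", "ratingValue": "10.0", "reviewCount": "237"},
 "name": "Wards Crossing - Target"}
</script></div>"""

meta = station_metadata(sample)[0]
print(meta["name"], meta["aggregateRating"]["reviewCount"])  # Wards Crossing - Target 237
```

On the real page this would be `station_metadata(r.text)`. It only covers what the page embeds as JSON-LD (name, rating, review count), not the individual check-ins.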

In [None]:
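# TODO: locate the element that holds check-ins/reviews; the locator is still unknown, e.g.
# driver.find_element(By.XPATH, "<reviews XPath>")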

In [81]:
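# single_station = 'https://www.plugshare.com/location/343961'
base_url = "https://www.plugshare.com/"

driver = webdriver.Chrome(service=service, options=options)
driver.get(base_url)  # I had to do the Cloudflare "confirm you're human" crap manually

# Keep credentials out of the notebook: read them from the .env loaded above
# (PLUGSHARE_USERNAME / PLUGSHARE_PASSWORD are placeholder names; use whatever keys your .env defines)
username = os.environ["PLUGSHARE_USERNAME"]
password = os.environ["PLUGSHARE_PASSWORD"]

# Run login process: first locate the login/register dialog
login_dialogs = driver.find_elements(By.XPATH, "//md-dialog-content[@id='dialogContent_authenticate']")
login_dialogs  # find_elements returns a list, so there's nothing to .click() on directly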

[]
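The same XPath comes back empty here but returns elements in the next cell, which suggests (I haven't confirmed it) that the Angular dialog simply hadn't rendered yet. Selenium's built-in answer is an explicit wait, e.g. `WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, ...)))` using `selenium.webdriver.support.ui` and `selenium.webdriver.support.expected_conditions` (imported as `EC`). The sketch below is a plain-Python stand-in for the same polling idea, so it can be tested without a browser; `wait_for` is a hypothetical helper, not a Selenium API.

```python
import time


def wait_for(condition, timeout=10.0, poll=0.5):
    """Call `condition()` until it returns something truthy, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)
```

In this notebook it would look like `dialogs = wait_for(lambda: driver.find_elements(By.XPATH, "//md-dialog-content[@id='dialogContent_authenticate']"))`. An empty list is falsy, so it keeps polling until at least one element shows up or the timeout expires.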

In [85]:
%%time

buttons = driver.find_elements(By.XPATH, "//md-dialog-content[@id='dialogContent_authenticate']")
login_button = buttons[1]
login_button

CPU times: user 963 µs, sys: 646 µs, total: 1.61 ms
Wall time: 10.1 ms


<selenium.webdriver.remote.webelement.WebElement (session="7427eb43a11f172feb59234f2c35225d", element="F8DB4342D7D0326A551DDEDBCA8F8D2C_element_116")>

In [88]:
accept_cookies_buttons = driver.find_elements(By.ID, "global-consent-tool-wrapper")
accept_cookies_buttons
# <span _ngcontent-ng-c1348393638="">Accept All</span>

[<selenium.webdriver.remote.webelement.WebElement (session="7427eb43a11f172feb59234f2c35225d", element="F8DB4342D7D0326A551DDEDBCA8F8D2C_element_117")>]

In [94]:
cookies_footer = accept_cookies_buttons[0]

In [92]:
# a leading "." keeps the search inside cookies_footer; a bare "//" would search the whole page
cookies_footer.find_elements(By.XPATH, './/button[@id="save"]')

[]

In [93]:
cookies_footer.find_elements(By.ID, 'save')

[]

In [95]:
cookies_footer.accessible_name

'Privacy Manager window.'
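The wrapper's accessible name ('Privacy Manager window.') hints that the actual buttons may live in a separate browsing context rather than in the main DOM; I haven't verified that. Consent managers are often rendered inside an `iframe`, and Selenium can only see an iframe's contents after `driver.switch_to.frame(...)`. A hypothetical helper (`find_in_frames` is my name, not a Selenium API) that checks the top-level page first and then each top-level iframe:

```python
def find_in_frames(driver, by, value):
    """Return elements matching (by, value) from the page or, failing that, the first iframe that has any.

    If the match comes from an iframe, the driver is left switched into that frame so the
    returned elements stay usable; call driver.switch_to.default_content() when done.
    Nested iframes are not searched.
    """
    found = driver.find_elements(by, value)
    if found:
        return found
    # "tag name" is the string value of Selenium's By.TAG_NAME locator
    for frame in driver.find_elements("tag name", "iframe"):
        driver.switch_to.frame(frame)
        found = driver.find_elements(by, value)
        if found:
            return found
        driver.switch_to.default_content()
    return []
```

Usage would be `find_in_frames(driver, By.ID, "save")`. If nothing turns up in any iframe, the other likely hiding place is a shadow DOM: Selenium 4 exposes it as `element.shadow_root`, which supports `find_elements(By.CSS_SELECTOR, ...)`.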

In [96]:
driver.close()

In [41]:
# Button to close the "login or register" dialog (copied from Chrome Dev Tools; commented out so the cell is valid Python)
# <button class="close md-icon-button md-button md-ink-ripple" type="button" ng-transclude="" ng-click="maps.closeAuth()" aria-label="cancel" ng-show="maps.visits <= 5 || maps.gmaps" ng-disabled="maps.auth.processing" aria-hidden="false"><md-icon aria-hidden="Close dialog" class="ng-scope material-icons" role="img">close</md-icon><div class="md-ripple-container"></div></button>

In [49]:
html_source = driver.page_source
len(html_source)

MaxRetryError: HTTPConnectionPool(host='localhost', port=63318): Max retries exceeded with url: /session/70b6223ccec1cf0765b85c710bb05441/source (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x28e961290>: Failed to establish a new connection: [Errno 61] Connection refused'))
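That `MaxRetryError` means nothing was listening on the local chromedriver port anymore (`Connection refused`), so most likely that driver session had already been closed or had crashed before this cell ran; its session id also differs from the one in the cookie cells above. A defensive pattern is to pull what you need out of the page first and tear the session down in a `finally`, so it happens even when something fails. A sketch (`grab_page_source` is a hypothetical helper):

```python
def grab_page_source(driver, url):
    """Load `url`, return its rendered HTML, and always shut the WebDriver session down."""
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # quit() ends the whole session (all windows, and the local driver service it started);
        # close() only closes the current window
        driver.quit()
```

Something like `html_source = grab_page_source(webdriver.Chrome(service=service, options=options), single_station)` would then hand the HTML to BeautifulSoup after the browser is gone, though the manual Cloudflare check noted above may still get in the way.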

In [28]:
from bs4 import BeautifulSoup

In [45]:
soup = BeautifulSoup(html_source,"html.parser")
# name = soup.find("button",{"property" : "v:name"}).text
soup.title

<title class="ng-binding" ng-bind="(ngMeta.title ? ngMeta.title : 'PlugShare - Find Electric Vehicle Charging Locations Near You')">Wards Crossing - Target | Lynchburg, VA | EV Station</title>

In [48]:
soup.find("button",{"property" : "v:name"}).text

AttributeError: 'NoneType' object has no attribute 'text'

In [47]:
soup.find("display-title")

In [40]:
html_source



In [None]:
# driver.close()