# Introduction

The goal of this application is to retrieve option contract quotes from the optionseducation.org website for a list of tickers.

## Selenium installation
Follow the installation instructions in the Selenium tutorial.




# Getting the ticker quotes page

We want to open the quotes page directly, without navigating through the form that leads to the data. So, instead of starting at the home page https://www.optionseducation.org/en.html, we load the quotes data page itself.

For example, for the HPQ ticker, the quotes data page is:
https://www.optionseducation.org/quotes.html?quote=HPQ



In [22]:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select

result = {}
ticker = 'HPQ'
driver = webdriver.Chrome()
driver.get("https://www.optionseducation.org/quotes.html?quote=" + ticker)
try:
    assert "Quotes" in driver.title
except AssertionError:
    print("Unknown page \"{page}\", please check opened browser".format(page=driver.title))
result['ticker'] = ticker

We can see that the page is loaded:
<img src="img\oichome.png" alt="Drawing" style="width: 700px;"/>


# Getting the underlying price
Selecting the price in the page and inspecting it, we find that there is no name or any other attribute that we can use to identify either the row or the table:

<img src="img\oicunderprice.png" alt="Drawing" style="width: 700px;"/>

So we must address it indirectly: we look for the "Price" text, take its table as the parent, and use the fact that the underlying price is in the row below.

The XPath of the "Price" text is:
```
/html/body/table/tbody/tr/td/table[5]/tbody/tr[1]/td[1]
```
However, we will search with the XPath:
```
//*[td="Price"]/../tr[2]/td[1]

//*[td="Price"]   ---->   look for any tag with a child "td" of value "Price"
/..               ---->   get the parent
/tr[2]            ---->   get the second row
/td[1]            ---->   get the first column
```
However, this search does not work either: the element is not found.

This is because the table actually belongs to another web page that optionseducation.org loads when the quotes are requested.

A closer look at the inspected document reveals that the page actually loaded is

https://oic.ivolatility.com/oic_adv_options.j?cnt=70d5ac98f2c2b1a16efa84de5c8fe10a2721bde5ac49612e&ticker=HPQ

and this is the value of the "src" attribute of an iframe tag whose parent is:
```
<div class="iframe iframeReference">
    <iframe src="https://oic.ivolatility.com/oic_adv_options.j?cnt=70d5ac98f2c2b1a16efa84de5c8fe10a2721bde5ac49612e&amp;ticker=HPQ" width="100%" height="100%" style="height:800px"></iframe>
</div>
```

So let's write the XPath that locates this iframe:
```
'//div[@class="iframe iframeReference"]/iframe'
```

and then read its "src" attribute, which we will use to load the new page.

To get the address:

In [29]:
xpath_str = '//div[@class="iframe iframeReference"]/iframe'
element = driver.find_element_by_xpath(xpath_str)
print(element.get_attribute("src"))

https://oic.ivolatility.com/oic_adv_options.j?cnt=70d5ac98f2c2b1a16efa84de5c8fe10a16f61b24933a7a98&ticker=HPQ
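Before loading that address, it can be worth confirming that it really refers to the requested ticker. The sketch below is a minimal check that uses only the standard library. The helper name `ticker_from_src` is hypothetical (it is not part of the notebook), and the URL is the one printed above.

```python
from urllib.parse import urlparse, parse_qs

def ticker_from_src(src):
    """Return the value of the 'ticker' query parameter, or None if it is absent."""
    query = parse_qs(urlparse(src).query)
    return query.get("ticker", [None])[0]

# Address printed by the previous cell
src = ("https://oic.ivolatility.com/oic_adv_options.j"
       "?cnt=70d5ac98f2c2b1a16efa84de5c8fe10a16f61b24933a7a98&ticker=HPQ")
print(ticker_from_src(src))
```

If the returned value differs from the ticker that was requested, the iframe probably has not finished loading the expected page.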


And to load this page:

In [31]:
driver.get(element.get_attribute("src"))

Now we can try to look for the "Price" text:

In [49]:
xpath_str = '//*[td="Price"]/../tr[2]/td[1]'
element = driver.find_element_by_xpath(xpath_str)
underlying = float(element.text)
print(element.text)
result['Underlying'] = underlying

23.71
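The indirect lookup can also be tried offline on a small fragment, which makes it easier to see what each step of the expression selects. This is only a sketch: the HTML below is made up to mirror the structure described above (the "Change" column and its value are invented), and it uses `xml.etree.ElementTree`, which supports just a limited subset of XPath, instead of the browser.

```python
import xml.etree.ElementTree as ET

# Made-up fragment: a header row containing "Price" and the value in the row below
html = """
<html><body>
  <table>
    <tr><td>Price</td><td>Change</td></tr>
    <tr><td>23.71</td><td>0.05</td></tr>
  </table>
</body></html>
"""
root = ET.fromstring(html)

# Row with a td equal to "Price" -> its parent table -> second row -> first cell
cell = root.find('.//*[td="Price"]/../tr[2]/td[1]')
underlying = float(cell.text)
print(underlying)
```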


# Getting the closest expiration date
The process is now straightforward: we only have to define the right XPath.

In this case, our keyword is "Expiry". We have to look for a tag whose text contains that word:
```
<tr bgcolor="#FFFFFF">
    <td colspan="16">
        <span class="s4"><b>Expiry: Jun 08, 2018
					&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
					Days: 0</b></span>
    </td>
</tr>
```

So the Xpath is:
```
xpath_str = '//*[contains(text(),"Expiry")]'
```

In [51]:
xpath_str = '//*[contains(text(),"Expiry")]'
element = driver.find_element_by_xpath(xpath_str)
expiration = element.text.split()
result['Expiration'] = "{} {} {}".format(expiration[1], expiration[2], expiration[3])  # expiration[2] already ends with a comma
print(result['Expiration'])

['Expiry:', 'Jun', '08,', '2018', 'Days:', '0']
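If the expiration is needed as a date rather than as a string, the same tokens can be passed to `datetime.strptime`. The following is a sketch under the assumption that the text always has the "Expiry: Mon DD, YYYY ... Days: N" layout shown above. The helper name `parse_expiry` is hypothetical.

```python
from datetime import datetime

def parse_expiry(text):
    """Parse text such as 'Expiry: Jun 08, 2018 ... Days: 0' into a date object."""
    tokens = text.split()
    # tokens[1:4] is ['Jun', '08,', '2018']; the day token keeps its comma
    return datetime.strptime(" ".join(tokens[1:4]), "%b %d, %Y").date()

print(parse_expiry("Expiry: Jun 08, 2018      Days: 0"))
```

Note that `%b` depends on the locale, so this assumes English month abbreviations.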


# The ATM strike
This is the closest strike at or below the current underlying price.

Depending on the security, it can be calculated:

1. as a multiple of 0.5, for example if the price is 23.78, the ATM strike would be 23.5
2. as an integer, for example if the price is 23.78, the ATM strike would be 23
3. as a multiple of 5, for example if the price is 23.78, the ATM strike would be 20

So we have to implement a script that checks these three strikes in order and selects the first one found. For example, for an underlying price of 23.78, it first checks whether the 23.5 strike exists, then the 23 strike, and finally the 20 strike.
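The three candidates can be computed in one place before any page lookup is attempted. This is a minimal sketch, not the code used later in the notebook. The helper name `candidate_strikes` is hypothetical, and it rounds down with `math.floor` so that every candidate is an exact multiple of its step.

```python
import math

def candidate_strikes(underlying):
    """Return the candidate ATM strikes in the order they should be checked."""
    steps = (0.5, 1.0, 5.0)  # multiple of 0.5, integer, multiple of 5
    return [math.floor(underlying / step) * step for step in steps]

print(candidate_strikes(23.78))  # [23.5, 23.0, 20.0]
```

Each candidate can then be looked up in the strike table in turn, stopping at the first one that exists.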

First we have to select the table with the closest expiration date. We can look for the first occurrence of the word "Strike":
```
xpath_str = '//*[contains(text(),"Strike")]'
```

Then we move three parents up and look for a td whose value equals the strike we are looking for:
```
xpath_str = '//*[contains(text(),"Strike")]/../../..//tr[td[1]="31.0"]'
```

But this doesn't work, since the reported strike could be the one in the second table (the table below, which has a later expiration date). This is because the [] predicate has higher precedence than the // abbreviation; in other words, assuming left-to-right evaluation would be a mistake.

Therefore, when we look for any tag that contains the word "Strike", we must state that we want only the first match. This is done with the expression:

```
xpath_str = '(//*[contains(text(),"Strike")])[1]/../../..//tr[td[1]="31.0"]'
```

The `(//*[contains(text(),"Strike")])[1]` part forces the search to take only the first occurrence of the word "Strike".

In the example below, the 31.0 strike is not present in the first table, so the node is not found there; it is present in the second table, which is reported:

In [257]:
xpath_str = '(//*[contains(text(),"Strike")])[1]/../../..//tr[td[1]="31.0"]'
try:
    element = driver.find_element_by_xpath(xpath_str)
    print(element.text)
except:
    print("Error, tag not found")
    
xpath_str = '(//*[contains(text(),"Strike")])[2]/../../..//tr[td[1]="31.0"]'
try:
    element = driver.find_element_by_xpath(xpath_str)
    print(element.text)
except:
    print("Error, tag not found")


Error, tag not found
31.0 C HPQ 0.000 0.00 0.01 0.00
(0.00) 0 0 97.43% 0.0249 0.0183 -0.0133 -1.3810 0.0019 0.0001


So the procedure will be:

In [267]:
underlying = 26.45


def check_strike(strike):
    strike_str = "{:01f}".format(strike) 
    xpath_str = '(//*[contains(text(),"Strike")])[1]/../../../tr[td[1]=' + strike_str + ']'
    try:
        element = driver.find_element_by_xpath(xpath_str)
        return element.text
    except:
        return None
    
# Strike multiple of 0.5
strike = int(underlying * 2)/2
strike_txt = check_strike(strike)
if not strike_txt:
    strike=int(underlying)
    strike_txt = check_strike(strike)
    if not strike_txt:
        strike=underlying-underlying%5
        strike_txt = check_strike(strike)
        if not strike_txt:
            raise ValueError("ATM strike not found")
print("ATM strike text= {}".format(strike_txt))
strike = float(strike_txt.split()[0])
print("\nStrike = {}".format(strike))
result['strike'] = strike
        
        


ATM strike text= 26.0 C HPQ 0.000 0.00 0.02 0.01
(0.00) 0 48 38.48% 0.0000 0.0000 0.0000 0.0000 0.0000 -0.0000

Strike = 26.0


# Getting contract information: IV, bid and ask
Now it is simple: when we got the strike text, we also got the data for the call option at that strike, so extracting it is just a matter of processing the text. But first we have to get the put option data, which is in the next row of the table.

We will use the XPath axis "following-sibling::tr":

In [272]:
xpath_str = '(//*[contains(text(),"Strike")])[1]/../../..//tr[td[1]="23.5"]/following-sibling::tr'
element = driver.find_element_by_xpath(xpath_str)
print(element.text)


P HPQ 0.000 0.00 0.01 -0.07
(-93.33) 35 799 17.88% 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000


So the final code is:

In [276]:
underlying = 23.71


def check_strike(strike):
    strike_str = "{:01f}".format(strike) 
    xpath_str = '(//*[contains(text(),"Strike")])[1]/../../../tr[td[1]=' + strike_str + ']'
    try:
        element = driver.find_element_by_xpath(xpath_str) # to locate the strike
        xpath_str = '(//*[contains(text(),"Strike")])[1]/../../../tr[td[1]=' + strike_str + ']/following-sibling::tr'
        element = driver.find_element_by_xpath(xpath_str) # to get the put contract data
        return element.text
    except:
        return None
    
# Strike multiple of 0.5
strike = int(underlying * 2)/2
strike_txt = check_strike(strike)
if not strike_txt:
    strike=int(underlying)
    strike_txt = check_strike(strike)
    if not strike_txt:
        strike=underlying-underlying%5
        strike_txt = check_strike(strike)
        if not strike_txt:
            raise ValueError("ATM strike not found")
print("ATM strike text= {}".format(strike_txt.split()))
atm_text = strike_txt.split()
print("\nStrike = {}".format(strike))
result['strike'] = strike
result['bid'] = atm_text[3]
result['ask'] = atm_text[4]
result['volume'] = atm_text[7]
result['open_interest'] = atm_text[8]
result['iv'] = atm_text[9]
result

ATM strike text= ['P', 'HPQ', '0.000', '0.00', '0.01', '-0.07', '(-93.33)', '35', '799', '17.88%', '0.0000', '0.0000', '0.0000', '0.0000', '0.0000', '0.0000']

Strike = 23.5
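The values above are stored as strings. If numbers are needed downstream, the row text can be converted while it is split. This is a sketch that assumes the token positions shown in the output above (bid at index 3, ask at 4, volume at 7, open interest at 8, IV at 9). The helper name `parse_contract_row` is hypothetical, and stripping thousands separators is a precaution rather than something observed in the data.

```python
def parse_contract_row(text):
    """Convert the text of a contract row into typed values."""
    tokens = text.split()
    return {
        "bid": float(tokens[3]),
        "ask": float(tokens[4]),
        "volume": int(tokens[7].replace(",", "")),
        "open_interest": int(tokens[8].replace(",", "")),
        "iv": float(tokens[9].rstrip("%")),  # implied volatility, in percent
    }

# Put row text as printed earlier for the 23.5 strike
row = ("P HPQ 0.000 0.00 0.01 -0.07\n"
       "(-93.33) 35 799 17.88% 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000")
print(parse_contract_row(row))
```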


# Final code so far

In [277]:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select

def check_strike(strike):
    strike_str = "{:01f}".format(strike) 
    xpath_str = '(//*[contains(text(),"Strike")])[1]/../../../tr[td[1]=' + strike_str + ']'
    try:
        element = driver.find_element_by_xpath(xpath_str) # to locate the strike
        xpath_str = '(//*[contains(text(),"Strike")])[1]/../../../tr[td[1]=' + strike_str + ']/following-sibling::tr'
        element = driver.find_element_by_xpath(xpath_str) # to get the put contract data
        return element.text
    except:
        return None
    


# OPEN THE QUOTES PAGE
result = {}
ticker = 'HPQ'
driver = webdriver.Chrome()
driver.get("https://www.optionseducation.org/quotes.html?quote=" + ticker)
try:
    assert "Quotes" in driver.title
except AssertionError:
    print("Unknown page \"{page}\", please check opened browser".format(page=driver.title))
result['ticker'] = ticker

# LOOK FOR TABLE PAGE
xpath_str = '//div[@class="iframe iframeReference"]/iframe'
element = driver.find_element_by_xpath(xpath_str)
#print(element. get_attribute("src"))
driver.get(element. get_attribute("src"))

# GET UNDERLYING PRICE
xpath_str = '//*[td="Price"]/../tr[2]/td[1]'
element = driver.find_element_by_xpath(xpath_str)
underlying = float(element.text)
result['Underlying'] = underlying

# GET THE EXPIRATION DATE
xpath_str = '//*[contains(text(),"Expiry")]'
element = driver.find_element_by_xpath(xpath_str)
expiration = element.text.split()
result['Expiration'] = "{} {} {}".format(expiration[1], expiration[2], expiration[3])

# GET THE ATM STRIKE
strike = int(underlying * 2)/2
strike_txt = check_strike(strike)
if not strike_txt:
    strike=int(underlying)
    strike_txt = check_strike(strike)
    if not strike_txt:
        strike=underlying-underlying%5
        strike_txt = check_strike(strike)
        if not strike_txt:
            raise ValueError("ATM strike not found")
atm_text = strike_txt.split()
result['strike'] = strike
result['bid'] = atm_text[3]
result['ask'] = atm_text[4]
result['volume'] = atm_text[7]
result['open_interest'] = atm_text[8]
result['iv'] = atm_text[9]
result

{'Expiration': 'Jun 08, 2018',
 'Underlying': 23.71,
 'ask': '0.01',
 'bid': '0.00',
 'iv': '17.88%',
 'open_interest': '799',
 'strike': 23.5,
 'ticker': 'HPQ',
 'volume': '35'}

# Selecting the ticker
The information for a ticker is shown once a search for that ticker has been made.

So first we have to find the input field and then type in the ticker we want.

Before that, because the Oath page may have been opened earlier, we need to refresh the driver and check that we really are on the main page of finance.yahoo.com:

```
driver.refresh()
try:
    assert "Yahoo Finance" in driver.title
except AssertionError:
    print("Unknown page {page}, please check opened browser".format(page=driver.title))
```

The input element can be examined in the Chrome inspector; in this case it is:
```
<input type="text" aria-label="Search" autocomplete="off" autocorrect="off" autocapitalize="off" class="Pos(r) W(100%) M(0) O(0) O(0):f Bgc(#fff) Z(2) Bxsh(n) Bxsh(n):f Fz(15px) Px(15px) Py(8px) Bdrs(0) Pstart(10px) Pend(10px) Va(t)" name="p" placeholder="Search for news, symbols or companies" style="-webkit-appearance:none;" value="" data-reactid="55">
```

Again, an XPath expression that looks for the name "p" should be enough:
```
element = driver.find_element_by_xpath("//input[@name='p']")
```
Once it is selected, we first send the keys for the ticker (for example "HPQ") and then the Keys.RETURN code:


In [1]:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select

ticker = "ilg"
driver = webdriver.Chrome()
driver.get("https://finance.yahoo.com")
# Dismiss the Oath screen if it is shown first
if "Oath" in driver.title:
    print("Closing Oath screen")
    element = driver.find_element_by_xpath("//input[@value='OK']")
    element.send_keys(Keys.RETURN)
    driver.refresh()

if "Yahoo Finance" in driver.title:
    # Type the ticker in the search box and submit it
    element = driver.find_element_by_xpath("//input[@name='p']")
    element.send_keys(ticker)
    element.send_keys(Keys.RETURN)
else:
    print("Unknown page {page}, please check opened browser".format(page=driver.title))

Closing Oath screen


This can take several seconds because, for example, the element search does not start until the page is fully loaded.

The page loads with the summary tab shown by default. We want to extract some information from this page.

# Getting basic data
## Security price
The price is placed in the tag:
```
<span class="Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)">22.03</span>
```

However, this information is not enough. To get a more precise locator, go to the opened page, open the inspector on the price, then right-click and choose Copy > Copy XPath.

We obtain something like:
```
//*[@id="quote-header-info"]/div[3]/div[1]/div/span[1]
```

Let's try it out:

In [2]:
element = driver.find_element_by_xpath("//*[@id='quote-header-info']/div[3]/div[1]/div/span[1]")
print(element.text)

34.53


## Information from summary table

In the summary tab there is a table full of useful information; let's scrape some of it.

<img src="img\table.png" alt="Drawing" style="width: 400px;"/>

We will follow the same procedure as before: in Chrome inspector we will generate the Xpath information of the specific element.

### Dividend Yield
The Xpath returned by Chrome is:
```
//*[@id="quote-summary"]/div[2]/table/tbody/tr[6]/td[2]
```


In [13]:
# element.clear()
element = driver.find_element_by_xpath("//*[@id='quote-summary']/div[2]/table/tbody/tr[6]/td[2]")
print(element.text)

0.70 (2.17%)


However, this XPath relies too much on absolute positions, which is fragile if the page layout changes, for example if a new element is inserted. Let's look for a more relative address.

To do that, we will get the whole table using its attributes. The HTML that builds the table is:
<img src="img\tablehtml.png" alt="Drawing" style="width: 600px;"/>

We can look for the "div" tag whose "data-test" attribute equals "right-summary-table"; the table below it is the one we want.

The XPath will be:
```
"//*[@id='quote-summary']//div[@data-test='right-summary-table']/table/tbody//tr"
```
We must capture all the rows of the table, so the code will be:

In [14]:
xpath_table = "//*[@id='quote-summary']//div[@data-test='right-summary-table']/table/tbody//tr"
elements = driver.find_elements_by_xpath(xpath_table)
for element in elements:
    print(element.text)
    

Market Cap 4.293B
Beta 1.34
PE Ratio (TTM) 25.77
EPS (TTM) 1.34
Earnings Date Aug 1, 2018 - Aug 6, 2018
Forward Dividend & Yield 0.70 (2.17%)
Ex-Dividend Date 2018-03-15
1y Target Est 36.00


Because we are interested in just the yield, we have to split the text of element 5:

In [18]:
dividend_yield = elements[5].text
print("Element 5: {}".format(dividend_yield))
dividend_yield = dividend_yield.replace('%','(').split('(')[1]
print("Dividend Yield: {}".format(dividend_yield))

Element 5: Forward Dividend & Yield 0.70 (2.17%)
Dividend Yield: 2.17
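The `replace`/`split` chain raises an `IndexError` when the row has no parentheses at all. As an alternative sketch, a regular expression can extract the percentage and return `None` when there is none. The helper name `parse_dividend_yield` is hypothetical, and the "N/A" row used in the second call is an assumed example of a security without a dividend, not text captured from the page.

```python
import re

def parse_dividend_yield(row_text):
    """Return the yield in percent as a float, or None if the row has no '(x.xx%)' part."""
    match = re.search(r"\((\d+(?:\.\d+)?)%\)", row_text)
    return float(match.group(1)) if match else None

print(parse_dividend_yield("Forward Dividend & Yield 0.70 (2.17%)"))
print(parse_dividend_yield("Forward Dividend & Yield N/A (N/A)"))
```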



### Ex Dividend Date (as string) and Market Cap

Any value in that table can now be extracted by row with a little processing, for example the market capitalization and the ex-dividend date:

In [21]:
ex_dividend_date = elements[6].text.split()[2]
print(ex_dividend_date)
market_cap = elements[0].text.split()[2]
print(market_cap)

2018-03-15
4.293B
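The market capitalization comes back as an abbreviated string. To compare it across tickers it can be expanded into a plain number. This is a sketch only: the helper name `parse_market_cap` is hypothetical, and the set of suffixes (K, M, B, T) is an assumption, since only the "B" suffix appears in the output above.

```python
def parse_market_cap(text):
    """Expand an abbreviated amount such as '4.293B' into an integer."""
    multipliers = {"K": 10**3, "M": 10**6, "B": 10**9, "T": 10**12}  # assumed suffixes
    suffix = text[-1].upper()
    if suffix in multipliers:
        return round(float(text[:-1]) * multipliers[suffix])
    return round(float(text))

print(parse_market_cap("4.293B"))
```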


# Historical data
## Switching tab

Let's say we want to get the closing price of the security on a given date. First we have to switch to the "Historical Data" tab:
```
//*[@id="quote-nav"]/ul/li[9]/a/span
```

Instead of the click() method, the code below activates the link by sending it the ENTER key:

In [8]:
element = driver.find_element_by_xpath("//*[@id='quote-nav']/ul/li[9]/a")
element.send_keys(Keys.ENTER)

## Selecting the timeframe
Now the issue is that the quotes information is limited to a number of days per page. To choose a specific date we have to use a drop-down menu with two fields for the start and end dates:
<img src="img\timeframe.png" alt="Drawing" style="width: 600px;"/>


```
<input type="text" name="startDate" maxlength="10" placeholder="mm/dd/yyyy" class="Bdrs(0) Bxsh(n)! Fz(s) Bxz(bb) D(ib) Bg(n) Pend(5px) Px(8px) Py(0) H(34px) Lh(34px) Bd O(n):f O(n):h Bdc($c-fuji-grey-c) Bdc($c-fuji-blue-1-b):f M(0) Pstart(10px) Bgc(white) W(90px) Mt(5px)" value="6/1/2018">

//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section/div[1]/div[1]/div[1]/span[2]/div/input[1]
```

Let's suppose we want to get the closing price of the security 5 years ago. First we have to enter a valid timeframe that contains our date, so let's take a start date 15 days before that day (1840 days ago, 5\*365 + 15) and an end date 15 days after it (1810 days ago, 5\*365 - 15).

What the page expects is a value for the "value" attribute of this tag, as a string in "mm/dd/yyyy" format:


In [9]:
from datetime import datetime, date, timedelta

today = date.today()
start_lapse = timedelta(days=365*5+15)
end_lapse = timedelta(days=365*5-15)
start_date = (today-start_lapse).strftime("%m/%d/%Y")
end_date = (today-end_lapse).strftime("%m/%d/%Y")
print("Start date: {start}\nEnd date: {end}".format(start=start_date, end=end_date))

Start date: 05/18/2013
End date: 06/17/2013
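The same calculation can be wrapped in a function that takes the reference day as a parameter, which makes it repeatable and easy to check. This is a minimal sketch; the helper name `date_window` and its parameters are hypothetical. Like the cell above, it approximates a year as 365 days.

```python
from datetime import date, timedelta

def date_window(today, years=5, margin_days=15):
    """Return (start, end) strings in mm/dd/yyyy format around the day `years` ago."""
    center = today - timedelta(days=365 * years)
    start = center - timedelta(days=margin_days)
    end = center + timedelta(days=margin_days)
    return start.strftime("%m/%d/%Y"), end.strftime("%m/%d/%Y")

print(date_window(date(2018, 6, 1)))  # ('05/18/2013', '06/17/2013')
```

With June 1, 2018 as the reference day, the result matches the dates printed above.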


Now we have to open the drop-down menu, again using an XPath taken from Chrome:


In [10]:
#element = driver.find_element_by_xpath('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section/div[1]/div[1]/div[1]/span[2]/span/input')
element = driver.find_element_by_xpath('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section//input[@data-test="date-picker-full-range"]')
element.click()

Notice that we have chosen an XPath that avoids absolute positions, since they could change from ticker to ticker. The XPath reads as follows:
```
<input type="text" data-test="date-picker-full-range" class="C(t) O(n):f Tsh($actionBlueTextShadow) Bd(n) Bgc(t) Fz(14px) Pos(r) T(-1px) Bd(n):f Bxsh(n):f Cur(p) W(190px)" value="Jun 01, 2017 - Jun 01, 2018">

//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section//input[@data-test="date-picker-full-range"]

//*[@id="Col1-1-HistoricalDataTable-Proxy"]  = Select all nodes whose id attribute equals "Col1-1-HistoricalDataTable-Proxy"
/section                                     = Select the section children of those nodes
//input[@data-test="date-picker-full-range"] = Select all input descendants of those sections whose data-test attribute equals "date-picker-full-range"
```

Once it is open, we must enter both dates:

In [11]:
element = driver.find_element_by_xpath('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section//input[@name="startDate"]')
for i in range(10):
    element.send_keys(Keys.DELETE)
element.send_keys(start_date)
element = driver.find_element_by_xpath('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section//input[@name="endDate"]')
for i in range(10):
    element.send_keys(Keys.BACKSPACE)
element.send_keys(end_date)

#driver.execute_script("arguments[0].value = arguments[1]", element, start_date)
#element.send_keys(Keys.ENTER)
#print(element.get_attribute('value'))
#element = driver.find_element_by_xpath('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section/div[1]/div[1]/div[1]/span[2]/div/input[2]')
#driver.execute_script("arguments[0].value = arguments[1]", element, end_date)
#element.send_keys(Keys.ENTER)
#print(element.get_attribute('value'))

Notice that we have used a sequence of DELETE and BACKSPACE keys to erase the old dates in the fields.

Finally, click the Done button and then the Apply button:

In [12]:
element = driver.find_element_by_xpath('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section/div[1]/div[1]/div[1]/span[2]/div/div[3]/button[1]')
element.click()
element = driver.find_element_by_xpath('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section/div[1]/div[1]/button')
element.click()

## Validating the timeframe
It works. However, the timeframe entered may not exist in Yahoo's historical data. If that happens, the "Done" button is not enabled, as shown:
<img src="img\notenabled.png" alt="Drawing" style="width: 600px;"/>

In this case a notification appears in red text, and the window cannot be closed with that button because it is disabled.

The xpath position of the red line is:
```
//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section/div[1]/div[1]/div[1]/span[2]/div/span[3]
```

It is immediately below the date input fields, which are located under span[2], as can be seen in their XPaths and in the Chrome inspector:
```
element = driver.find_element_by_xpath('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section/div[1]/div[1]/div[1]/span[2]/div/input[1]')
element = driver.find_element_by_xpath('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section/div[1]/div[1]/div[1]/span[2]/div/input[2]')
```

Moreover, if the dates entered are valid, this red line (the error report) does not exist, and looking for it raises an error:

In [13]:
start_date = "05/19/2000"
end_date = "05/19/2001"
element = driver.find_element_by_xpath('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section//input[@data-test="date-picker-full-range"]')
element.click()
element = driver.find_element_by_xpath('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section//input[@name="startDate"]')
for i in range(10):
    element.send_keys(Keys.DELETE)
element.send_keys(start_date)
element = driver.find_element_by_xpath('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section//input[@name="endDate"]')
for i in range(10):
    element.send_keys(Keys.BACKSPACE)
element.send_keys(end_date)
element = driver.find_element_by_xpath('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section/div[1]/div[1]/div[1]/span[2]/div/span[3]')
print(element.text)

'Start' date must be prior to 'End' date.


In [15]:
start_date = "05/18/2016"
end_date = "05/19/2017"
element = driver.find_element_by_xpath('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section//input[@data-test="date-picker-full-range"]')
element.click()
element = driver.find_element_by_xpath('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section//input[@name="startDate"]')
for i in range(10):
    element.send_keys(Keys.DELETE)
element.send_keys(start_date)
element = driver.find_element_by_xpath('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section//input[@name="endDate"]')
for i in range(10):
    element.send_keys(Keys.BACKSPACE)
element.send_keys(end_date)
try:
    element = driver.find_element_by_xpath('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section/div[1]/div[1]/div[1]/span[2]/div/span[3]')
except Exception as e:
    print("Error detected:\n{}".format(e))

Error detected:
Message: no such element: Unable to locate element: {"method":"xpath","selector":"//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section/div[1]/div[1]/div[1]/span[2]/div/span[3]"}
  (Session info: chrome=66.0.3359.181)
  (Driver info: chromedriver=2.38.552522 (437e6fbedfa8762dec75e2c5b3ddb86763dc9dcb),platform=Windows NT 6.3.9600 x86_64)



In [16]:
start_date = "05/19/2013"
end_date = "06/19/2013"
# Open the timeframe window
element = driver.find_element_by_xpath('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section//input[@data-test="date-picker-full-range"]')
element.click()
# Select start date field
element = driver.find_element_by_xpath('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section//input[@name="startDate"]')
for i in range(10):
    element.send_keys(Keys.DELETE) # Deleting default content of datafield
element.send_keys(start_date) # Writing start date in the field
# Select end date field
element = driver.find_element_by_xpath('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section//input[@name="endDate"]')
for i in range(10):
    element.send_keys(Keys.BACKSPACE)
element.send_keys(end_date)
# Check if error message is present
try:
    element = driver.find_element_by_xpath('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section/div[1]/div[1]/div[1]/span[2]/div/span[3]')
except:
    # Dates validated
    # Applying introduced timeframes
    element = driver.find_element_by_xpath('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section/div[1]/div[1]/div[1]/span[2]/div/div[3]/button[1]')
    element.click()
    element = driver.find_element_by_xpath('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section/div[1]/div[1]/button')
    element.click()
else:
    # Dates not validated, error reported
    # Closing timeframe window
    element = driver.find_element_by_xpath('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section/div[1]/div[1]/div[1]/span[2]/div/div[3]/button[2]')
    element.click()

## Validating that the historical date exists
Now the quotes table for the timeframe is shown. It could happen that the requested date does not exist in the database (for example, it is a Sunday and markets are closed). Therefore another try-except is needed to check that the information is available:

In [17]:
current_date = date.today()
lapse = timedelta(days=365*5)
end_date = (current_date-lapse).strftime("%b %d, %Y")
print(end_date)

Jun 02, 2013


In [18]:
xpath_date = '//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section/div[2]/table/tbody//td[span="' + end_date +'"]'
try:
    element = driver.find_element_by_xpath(xpath_date)
except:
    print("error")
else:
    print(element.text)

error


In case of error, the closest earlier day has to be chosen, so we need a loop that steps the date back until a valid quote is found.

One way of doing that is to repeat the search in the web page, like this:


In [19]:
current_date = date.today()
end_date = current_date - timedelta(days=365*5)
end_date_str = end_date.strftime("%b %d, %Y")
print(end_date_str)
while True:
    end_date_str = end_date.strftime("%b %d, %Y")
    xpath_date = '//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section/div[2]/table/tbody//td[span="' + end_date_str +'"]'
    try:
        element = driver.find_element_by_xpath(xpath_date)
    except:
        print("error date {}".format(end_date_str))
        end_date = end_date - timedelta(days=1)
    else:
        print(element.text)
        break

Jun 02, 2013
error date Jun 02, 2013
error date Jun 01, 2013
May 31, 2013


Now we have to select the parent of this node and split its text; the value we are looking for is the adjusted close, which is the 8th item of the list:

In [20]:
element = driver.find_element_by_xpath(xpath_date + '/..')
adj_close = element.text.split()[7]
print("Adjusted close = {}".format(adj_close))

Adjusted close = 19.21


However, it makes more sense to get the whole quotes table and just look for the date string in the list of rows:

In [21]:
timeframeIsValid = True
# Get the quotes
if not timeframeIsValid:
    adj_close = None
else:
    current_date = date.today()
    end_date = current_date - timedelta(days=365*5)
    end_date_str = end_date.strftime("%b %d, %Y")
    print(end_date_str)
    xpath_date = '//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section/div[2]/table/tbody//tr' # to get all the table
    elements = driver.find_elements_by_xpath(xpath_date)
    adj_close = None
    while True:
        for element in elements:
            if end_date_str in element.text:
                adj_close = element.text.split()[7]
                break
        if adj_close:
            break
        else:
            end_date = end_date - timedelta(days=1)
            end_date_str = end_date.strftime("%b %d, %Y")
    print(adj_close)

Jun 02, 2013
19.21
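The `while True` loop above never ends if no row in the table matches any earlier date. A bounded variant that works on plain row strings can be sketched as follows. The helper name `find_adj_close` and the `max_days_back` limit are hypothetical. The sample rows are made up (only the 19.21 adjusted close comes from the output above) and assume the "date open high low close adj_close volume" layout implied by taking the 8th item.

```python
from datetime import date, timedelta

def find_adj_close(rows, target, max_days_back=10):
    """Return (date label, adjusted close) for the closest date at or before `target`."""
    day = target
    for _ in range(max_days_back + 1):
        label = day.strftime("%b %d, %Y")
        for row in rows:
            if row.startswith(label):
                return label, row.split()[7]  # 8th token is the adjusted close
        day -= timedelta(days=1)
    return None, None  # nothing found within the allowed range

# Made-up rows standing in for the text of each table row
rows = [
    "Jun 03, 2013 19.40 19.55 19.20 19.35 19.30 1,200,000",
    "May 31, 2013 19.50 19.60 19.10 19.26 19.21 1,500,000",
]
print(find_adj_close(rows, date(2013, 6, 2)))
```

In the notebook, `rows` would be built from the page, for example with `[element.text for element in elements]`.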


## Final Code

We needed to add a block of code that waits when switching from the summary tab to the historical data, because of all the videos and other content that the page has to load before it allows user interaction:

In [33]:
from datetime import datetime, date, timedelta
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import time


# Select historical Data Tab
element = driver.find_element_by_xpath("//*[@id='quote-nav']/ul/li[9]/a")
element.send_keys(Keys.ENTER)

# Calculate start and end dates for the timeframe
today = date.today()
start_lapse = timedelta(days=365*5+15)
end_lapse = timedelta(days=365*5-15)
start_date = (today-start_lapse).strftime("%m/%d/%Y")
end_date = (today-end_lapse).strftime("%m/%d/%Y")
#print("Start date: {start}\nEnd date: {end}".format(start=start_date, end=end_date))

# Open the timeframe window
try: # this block waits for the table element to load; after 5 seconds a timeout error is raised
    element = WebDriverWait(driver, 5).until(EC.presence_of_element_located((By.ID, "Col1-1-HistoricalDataTable-Proxy")))
except:
    raise
time.sleep(1)
element = driver.find_element_by_xpath('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section//input[@data-test="date-picker-full-range"]')
element.click()

# Select start date field and introduce start_date
element = driver.find_element_by_xpath('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section//input[@name="startDate"]')
for i in range(10):
    element.send_keys(Keys.DELETE) # Deleting default content of datafield
element.send_keys(start_date) # Writing start date in the field

# Select end date field and introduce end_date
element = driver.find_element_by_xpath('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section//input[@name="endDate"]')
for i in range(10):
    element.send_keys(Keys.BACKSPACE)
element.send_keys(end_date)

# Validate timeframe
# Check if error message is present
try:
    element = driver.find_element_by_xpath('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section/div[1]/div[1]/div[1]/span[2]/div/span[3]')
except:
    # Dates validated
    # Applying introduced timeframes
    element = driver.find_element_by_xpath('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section/div[1]/div[1]/div[1]/span[2]/div/div[3]/button[1]')
    element.click()
    element = driver.find_element_by_xpath('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section/div[1]/div[1]/button')
    element.click()
    timeframeIsValid = True
else:
    # Dates not validated, error reported
    # Closing timeframe window
    element = driver.find_element_by_xpath('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section/div[1]/div[1]/div[1]/span[2]/div/div[3]/button[2]')
    element.click()
    timeframeIsValid = False

# Get the quotes
if not timeframeIsValid:
    adj_close = None
else:
    current_date = date.today()
    end_date = current_date - timedelta(days=365*5)
    end_date_str = end_date.strftime("%b %d, %Y")
    print(end_date_str)
    xpath_date = '//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section/div[2]/table/tbody//tr' # to get all the table
    elements = driver.find_elements_by_xpath(xpath_date)
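    adj_close = None
    # Days without trading (weekends, holidays) have no row, so step back one day
    # at a time. The search is limited to the 15-day margin of the requested
    # timeframe so that it cannot loop forever when no row matches.
    for _ in range(16):
        for element in elements:
            try:
                if end_date_str in element.text:
                    adj_close = element.text.split()[7]
                    break
            except Exception:
                pass
        if adj_close:
            break
        end_date = end_date - timedelta(days=1)
        end_date_str = end_date.strftime("%b %d, %Y")
    print("Adjusted Close = {}".format(adj_close))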

Jun 02, 2013
Adjusted Close = 19.21
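
The date search above can be separated from Selenium and tried on plain strings. The following is a minimal sketch, not part of the original notebook: `find_adj_close` and its `max_back` parameter are hypothetical names, and the sample rows are invented values that only assume the row layout implied by `split()[7]` (date, open, high, low, close, adjusted close, volume).

```python
from datetime import date, timedelta

def find_adj_close(rows, target, max_back=15):
    """Return the adjusted close of the row dated `target`, or of the
    closest earlier day within `max_back` days; None if nothing matches."""
    for offset in range(max_back + 1):
        day_str = (target - timedelta(days=offset)).strftime("%b %d, %Y")
        for row in rows:
            fields = row.split()
            # A quote row has 9 fields; shorter rows (e.g. dividends) are skipped
            if row.startswith(day_str) and len(fields) > 7:
                return fields[7]
    return None

# Invented sample rows following the assumed layout
rows = [
    "Jun 03, 2013 24.50 24.90 24.30 24.60 19.30 1,000",
    "May 31, 2013 24.40 24.80 24.20 24.50 19.21 2,000",
]
print(find_adj_close(rows, date(2013, 6, 2)))
```

Bounding the search with `max_back` means a date that is missing from the table yields `None` instead of an endless loop.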


## Refactoring the code
So far the code is just a collection of scripts. Let's organize it into classes.

In [8]:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select, WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import time
from datetime import datetime, date, timedelta

class Selenium_Driver:
    """Creation and management of a Selenium chrome driver"""
    def __init__(self, web_page, chrome_options=None):
        if chrome_options:
            self.driver = webdriver.Chrome(chrome_options=chrome_options)
        else:
            self.driver = webdriver.Chrome()
        self.driver.get(web_page)
        
    def get_cookies(self):
        return self.driver.get_cookies()
    
    def delete_all_cookies(self):
        self.driver.delete_all_cookies()
                
    def check_page_title(self, title):
        if not self.driver:
            raise ValueError("webdriver is not initialized")
        try:
            assert title in self.driver.title
            return 0
        except AssertionError:
            return self.driver.title

    def send_keys(self, xpath_str, keys, elementToBeClosed=False):
        if not self.driver:
            raise ValueError("webdriver is not initialized")
        try:
            element = self.driver.find_element_by_xpath(xpath_str)
            element.send_keys(keys)
            if elementToBeClosed:
                element.clear()
        except Exception as e:
            print('Error while sending keys "{keys}". Error reported:\n{err}'.format(keys=keys, err=e))
            return 1
        else:
            return 0
    def send_n_keys(self, xpath_str, keys, n, elementToBeClosed=False):
        if not self.driver:
            raise ValueError("webdriver is not initialized")
        try:
            element = self.driver.find_element_by_xpath(xpath_str)
            for count in range(n):
                element.send_keys(keys)
            if elementToBeClosed:
                element.clear()
        except Exception as e:
            print('Error while sending keys "{keys}". Error reported:\n{err}'.format(keys=keys, err=e))
            return 1
        else:
            return 0
    def send_click(self, xpath_str, elementToBeClosed=False):
        if not self.driver:
            raise ValueError("webdriver is not initialized")
        try:
            element = self.driver.find_element_by_xpath(xpath_str)
            element.click()
            if elementToBeClosed:
                element.clear()
        except Exception as e:
            # `keys` is not defined in this method, so only the error is reported
            print('Error while sending click. Error reported:\n{err}'.format(err=e))
            return 1
        else:
            return 0
        
    def get_element(self, xpath_str):
        return self.driver.find_element_by_xpath(xpath_str)
    
    def get_elements(self, xpath_str):
        return self.driver.find_elements_by_xpath(xpath_str)
    
    def close_driver(self):
        self.driver.close()
    
    def wait_for_element_ID(self, id_str, wait_time):
        # Wait up to wait_time seconds for the element to be present in the DOM;
        # WebDriverWait raises TimeoutException if it does not appear in time
        WebDriverWait(self.driver,
                      wait_time).until(EC.presence_of_element_located((By.ID, id_str)))
        time.sleep(1)
        
    def wait_for_element_XPath(self, xpath_str, wait_time):
        # Wait up to wait_time seconds for the element to be present in the DOM;
        # WebDriverWait raises TimeoutException if it does not appear in time
        WebDriverWait(self.driver,
                      wait_time).until(EC.presence_of_element_located((By.XPATH, xpath_str)))
        time.sleep(1)
            
class Finance_Yahoo_Navigation(Selenium_Driver):
    """Manage navigation in yahoo.finance.page using Chrome"""
    
    def __init__(self, delete_cookies=True, chrome_options=None):
        super().__init__("https://finance.yahoo.com", chrome_options)
        if type(self.check_page_title("Yahoo Finance")) is str:
            if "Oath" in self.driver.title:
                self.send_keys("//input[@value='OK']", Keys.RETURN)
            else:
                raise ValueError("Unknown page {page},"
                                 " please check opened browser".format(page=self.driver.title))
            if type(self.check_page_title("Yahoo Finance")) is str: # do another check after closing oath page
                self.driver.refresh()
                raise ValueError("Unknown page {page},"
                                 " please check opened browser".format(page=self.driver.title))
        if delete_cookies:
            self.delete_all_cookies()
        
    def select_ticker(self, ticker):
        self.send_keys("//input[@name='p']", ticker + Keys.RETURN)
        self.summ_table_elements = None
    
    def select_tab(self, tab_id):
        tab_dic = {"Summary":"1", "Historical Data":"9"}
        tab_str = "//*[@id='quote-nav']/ul/li[" + tab_dic[tab_id] + "]/a"
        self.wait_for_element_XPath(tab_str,60)
        self.send_keys(tab_str, Keys.ENTER)
        
    def get_current_price(self, click_summary_tab_first=False):
        if click_summary_tab_first:
            self.select_tab("Summary")
        xpath_str = "//*[@id='Lead-2-QuoteHeader-Proxy']/div[@id='quote-header-info']/div[3]/div[1]/div/span[1]"
        self.wait_for_element_XPath(xpath_str, 60)
        element = self.get_element(xpath_str)
        return element.text
    
    def _get_summary_table(self, click_summary_tab_first=False):
        if click_summary_tab_first:
            self.select_tab("Summary")
        xpath_str = "//*[@id='quote-summary']//div[@data-test='right-summary-table']/table/tbody//tr"
        self.wait_for_element_XPath(xpath_str, 60)
        self.summ_table_elements = self.get_elements(xpath_str)
        
    def get_dividend_yield(self, click_summary_tab_first=False):
        if not self.summ_table_elements:
            self._get_summary_table(click_summary_tab_first)
        return self.summ_table_elements[5].text.replace('%','(').split('(')[1]
    
    def get_ex_dividend_date(self, click_summary_tab_first=False):
        if not self.summ_table_elements:
            self._get_summary_table(click_summary_tab_first)
        return self.summ_table_elements[6].text.split()[2]
    
    def get_market_cap(self, click_summary_tab_first=False):
        if not self.summ_table_elements:
            self._get_summary_table(click_summary_tab_first)
        return self.summ_table_elements[0].text.split()[2]
    
    def _calculate_timeframe_dates(self, current_day, days):
        start_lapse = timedelta(days=days+15)
        end_lapse = timedelta(days=days-15)
        self.start_date = current_day-start_lapse
        self.end_date = current_day-end_lapse
    
    def _get_quotes_table(self):
        xpath_date = '//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section/div[2]/table/tbody//tr' # to get all the table
        return self.driver.find_elements_by_xpath(xpath_date)
    
    def get_ndays_quotes(self, ndays, click_historical_data_first=False):
        if click_historical_data_first:
            self.select_tab("Historical Data")
        self._calculate_timeframe_dates(date.today(), ndays)
        
        # Open the timeframe window
        time_frame_xpath = '//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section//input[@data-test="date-picker-full-range"]'
        self.wait_for_element_XPath(time_frame_xpath, 60)
        self.send_click(time_frame_xpath)
        
        # Select start date field and introduce start_date
        self.send_n_keys('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section//input[@name="startDate"]',
                        Keys.DELETE, 10)
        self.send_keys('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section//input[@name="startDate"]',
                        self.start_date.strftime("%m/%d/%Y"))
        
        # Select end date field and introduce end_date
        self.send_n_keys('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section//input[@name="endDate"]',
                        Keys.BACKSPACE, 10)
        self.send_keys('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section//input[@name="endDate"]',
                        self.end_date.strftime("%m/%d/%Y"))
        
        # Validate timeframe
        try:
            # Check if error message is present
            element = self.driver.find_element_by_xpath('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section/div[1]/div[1]/div[1]/span[2]/div/span[3]')
        except:
            # Dates validated, error message is not present
            # Applying introduced timeframes
            self.send_click('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section/div[1]/div[1]/div[1]/span[2]/div/div[3]/button[1]')
            self.send_click('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section/div[1]/div[1]/button')
        else:
            # Dates not validated, error reported
            # Closing timeframe window
            self.send_click('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section/div[1]/div[1]/div[1]/span[2]/div/div[3]/button[2]')
            return None
            
        # Get the quotes: look for the row dated ndays ago. Days without trading
        # have no row, so step back one day at a time, limited to the 15-day
        # margin of the timeframe so the search always terminates.
        target_date = date.today()-timedelta(days=ndays)
        quotes_table = self._get_quotes_table()
        for _ in range(16):
            target_date_str = target_date.strftime("%b %d, %Y")
            for element in quotes_table:
                try:
                    if target_date_str in element.text:
                        return element.text.split()[7]
                except Exception:
                    pass
            target_date = target_date - timedelta(days=1)
        return None
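
Note that the `find_element_by_xpath` and `find_elements_by_xpath` methods used throughout this class were removed in Selenium 4.3 in favour of `find_element(By.XPATH, ...)`, and the `chrome_options=` keyword has been superseded by `options=`. A minimal compatibility sketch follows; `find_by_xpath` and the `FakeDriver` stand-in are hypothetical names, used only so the example runs without a browser.

```python
def find_by_xpath(driver, xpath_str):
    """Locate one element by XPath on both old and new Selenium versions."""
    legacy = getattr(driver, "find_element_by_xpath", None)
    if legacy is not None:
        return legacy(xpath_str)  # older Selenium releases
    # "xpath" is the value of selenium.webdriver.common.by.By.XPATH
    return driver.find_element("xpath", xpath_str)


class FakeDriver:
    """Stand-in exposing only the Selenium 4 style method (no browser needed)."""
    def find_element(self, by, value):
        return (by, value)


print(find_by_xpath(FakeDriver(), "//input[@name='p']"))
```

With a real driver, `get_element` in `Selenium_Driver` could delegate to such a helper so the rest of the class stays unchanged.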


In [84]:
results ={}

# OPEN finance.yahoo.com ACCEPTING Oath PAGE IF NEEDED
fyn = Finance_Yahoo_Navigation(delete_cookies=False)

# INTRODUCE TICKER
ticker = "HPQ"
results['ticker'] = ticker
fyn.select_ticker(ticker)

# GET CURRENT PRICE
results['Price'] = fyn.get_current_price(click_summary_tab_first=False)
print("Current price = {}".format(results['Price']))

# COLLECT SUMMARY DATA
results['dividend_yield'] = fyn.get_dividend_yield(click_summary_tab_first=False)
results['ex_dividend_date'] = fyn.get_ex_dividend_date(click_summary_tab_first=False)
results['market_cap'] = fyn.get_market_cap(click_summary_tab_first=False)
print("Dividend Yield: {}".format(results['dividend_yield']))
print("ex_dividend_date: {}".format(results['ex_dividend_date']))
print("market_cap: {}".format(results['market_cap']))

# GET HISTORICAL DATA
results['five_years_close'] = fyn.get_ndays_quotes(ndays=365*5, click_historical_data_first=True)
results['one_year_close'] = fyn.get_ndays_quotes(ndays=365, click_historical_data_first=False)
results['one_month_close'] = fyn.get_ndays_quotes(ndays=30, click_historical_data_first=False)
print("Five years ago close: {}".format(results['five_years_close']))
print("One year ago close: {}".format(results['one_year_close']))
print("One month ago close: {}".format(results['one_month_close']))


Current price = 22.68
Dividend Yield: 2.52
ex_dividend_date: 2018-06-12
market_cap: 37.226B
Five years ago close: 9.20
One year ago close: 18.45
One month ago close: 21.55
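
The string handling in `get_dividend_yield` is easier to follow on a plain string. This sketch assumes the summary row text looks like `Forward Dividend & Yield 0.57 (2.52%)`; that layout is inferred from the parsing code and the output above, not confirmed, and `parse_dividend_yield` is a hypothetical helper.

```python
import re

def parse_dividend_yield(row_text):
    """Return the percentage between parentheses as a string, or None."""
    match = re.search(r"\(([\d.]+)%\)", row_text)
    return match.group(1) if match else None

row = "Forward Dividend & Yield 0.57 (2.52%)"  # assumed row text
# Expression used in the class: turn '%' into '(' and take the middle piece
legacy = row.replace('%', '(').split('(')[1]
print(parse_dividend_yield(row), legacy)
```

For a row without a percentage, such as a hypothetical `N/A (N/A)`, the regular expression returns `None`, whereas the split-based expression would return the fragment `N/A)`.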


## Download data
The previous code works well, but it takes a long time to get the quote information because of the delay Yahoo introduces to load ads, videos and all kinds of scripts.

On the other hand, the idea is to access the historical data periodically to observe how the predictions evolve, so a good option may be to download all the data once and go back to the web only when the data needed is not available in the local downloaded files.
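
The "download once, reuse locally" idea can be sketched as a freshness check on the local file. `cached_csv_is_fresh`, the `max_age_days` threshold and the `csv/HPQ.csv` path are assumptions for illustration; the notebook itself does not define a caching policy.

```python
import os
import time

def cached_csv_is_fresh(path, max_age_days=1.0, now=None):
    """True if `path` exists and was modified at most `max_age_days` ago."""
    if not os.path.isfile(path):
        return False
    now = time.time() if now is None else now
    age_days = (now - os.path.getmtime(path)) / 86400  # seconds per day
    return age_days <= max_age_days

# Only open the browser when the local copy is missing or too old
if not cached_csv_is_fresh(os.path.join("csv", "HPQ.csv")):
    print("local data missing or stale: download needed")
```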

### Selecting the timeframe
Let's assume we want access to the last 10 years of quotes.

I added a new method (get_historical_data) to the Finance_Yahoo_Navigation class. The final class code is:

In [15]:

class Finance_Yahoo_Navigation(Selenium_Driver):
    """Manage navigation in yahoo.finance.page using Chrome"""
    
    def __init__(self, delete_cookies=True, chrome_options=None):
        super().__init__("https://finance.yahoo.com", chrome_options)
        if type(self.check_page_title("Yahoo Finance")) is str:
            if "Oath" in self.driver.title:
                self.send_keys("//input[@value='OK']", Keys.RETURN)
            else:
                raise ValueError("Unknown page {page},"
                                 " please check opened browser".format(page=self.driver.title))
            if type(self.check_page_title("Yahoo Finance")) is str: # do another check after closing oath page
                self.driver.refresh()
                raise ValueError("Unknown page {page},"
                                 " please check opened browser".format(page=self.driver.title))
        if delete_cookies:
            self.delete_all_cookies()
        
    def select_ticker(self, ticker):
        self.send_keys("//input[@name='p']", ticker + Keys.RETURN)
        self.summ_table_elements = None
    
    def select_tab(self, tab_id):
        tab_dic = {"Summary":"1", "Historical Data":"9"}
        tab_str = "//*[@id='quote-nav']/ul/li[" + tab_dic[tab_id] + "]/a"
        self.wait_for_element_XPath(tab_str,60)
        self.send_keys(tab_str, Keys.ENTER)
        
    def get_current_price(self, click_summary_tab_first=False):
        if click_summary_tab_first:
            self.select_tab("Summary")
        xpath_str = "//*[@id='Lead-2-QuoteHeader-Proxy']/div[@id='quote-header-info']/div[3]/div[1]/div/span[1]"
        self.wait_for_element_XPath(xpath_str, 60)
        element = self.get_element(xpath_str)
        return element.text
    
    def _get_summary_table(self, click_summary_tab_first=False):
        if click_summary_tab_first:
            self.select_tab("Summary")
        xpath_str = "//*[@id='quote-summary']//div[@data-test='right-summary-table']/table/tbody//tr"
        self.wait_for_element_XPath(xpath_str, 60)
        self.summ_table_elements = self.get_elements(xpath_str)
        
    def get_dividend_yield(self, click_summary_tab_first=False):
        if not self.summ_table_elements:
            self._get_summary_table(click_summary_tab_first)
        return self.summ_table_elements[5].text.replace('%','(').split('(')[1]
    
    def get_ex_dividend_date(self, click_summary_tab_first=False):
        if not self.summ_table_elements:
            self._get_summary_table(click_summary_tab_first)
        return self.summ_table_elements[6].text.split()[2]
    
    def get_market_cap(self, click_summary_tab_first=False):
        if not self.summ_table_elements:
            self._get_summary_table(click_summary_tab_first)
        return self.summ_table_elements[0].text.split()[2]
    
    def _calculate_timeframe_dates(self, current_day, days):
        start_lapse = timedelta(days=days+15)
        end_lapse = timedelta(days=days-15)
        self.start_date = current_day-start_lapse
        self.end_date = current_day-end_lapse
    
    def _get_quotes_table(self):
        xpath_date = '//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section/div[2]/table/tbody//tr' # to get all the table
        return self.driver.find_elements_by_xpath(xpath_date)
    
    def get_ndays_quotes(self, ndays, click_historical_data_first=False):
        if click_historical_data_first:
            self.select_tab("Historical Data")
        self._calculate_timeframe_dates(date.today(), ndays)
        
        # Open the timeframe window
        time_frame_xpath = '//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section//input[@data-test="date-picker-full-range"]'
        self.wait_for_element_XPath(time_frame_xpath, 60)
        self.send_click(time_frame_xpath)
        
        # Select start date field and introduce start_date
        self.send_n_keys('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section//input[@name="startDate"]',
                        Keys.DELETE, 10)
        self.send_keys('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section//input[@name="startDate"]',
                        self.start_date.strftime("%m/%d/%Y"))
        
        # Select end date field and introduce end_date
        self.send_n_keys('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section//input[@name="endDate"]',
                        Keys.BACKSPACE, 10)
        self.send_keys('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section//input[@name="endDate"]',
                        self.end_date.strftime("%m/%d/%Y"))
        
        # Validate timeframe
        try:
            # Check if error message is present
            element = self.driver.find_element_by_xpath('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section/div[1]/div[1]/div[1]/span[2]/div/span[3]')
        except:
            # Dates validated, error message is not present
            # Applying introduced timeframes
            self.send_click('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section/div[1]/div[1]/div[1]/span[2]/div/div[3]/button[1]')
            self.send_click('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section/div[1]/div[1]/button')
        else:
            # Dates not validated, error reported
            # Closing timeframe window
            self.send_click('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section/div[1]/div[1]/div[1]/span[2]/div/div[3]/button[2]')
            return None
            
        # Get the quotes: look for the row dated ndays ago. Days without trading
        # have no row, so step back one day at a time, limited to the 15-day
        # margin of the timeframe so the search always terminates.
        target_date = date.today()-timedelta(days=ndays)
        quotes_table = self._get_quotes_table()
        for _ in range(16):
            target_date_str = target_date.strftime("%b %d, %Y")
            for element in quotes_table:
                try:
                    if target_date_str in element.text:
                        return element.text.split()[7]
                except Exception:
                    pass
            target_date = target_date - timedelta(days=1)
        return None

    def get_historical_data(self, start_date, end_date, click_historical_data_first=False):
            if click_historical_data_first:
                self.select_tab("Historical Data")

            # Open the timeframe window
            time_frame_xpath = '//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section//input[@data-test="date-picker-full-range"]'
            self.wait_for_element_XPath(time_frame_xpath, 60)
            self.send_click(time_frame_xpath)

            # Select start date field and introduce start_date
            self.send_n_keys('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section//input[@name="startDate"]',
                            Keys.DELETE, 10)
            self.send_keys('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section//input[@name="startDate"]',
                            start_date.strftime("%m/%d/%Y"))

            # Select end date field and introduce end_date
            self.send_n_keys('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section//input[@name="endDate"]',
                            Keys.BACKSPACE, 10)
            self.send_keys('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section//input[@name="endDate"]',
                            end_date.strftime("%m/%d/%Y"))

            # Validate timeframe
            try:
                # Check if error message is present
                element = self.driver.find_element_by_xpath('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section/div[1]/div[1]/div[1]/span[2]/div/span[3]')
            except:
                # Dates validated, error message is not present
                # Applying introduced timeframes
                self.send_click('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section/div[1]/div[1]/div[1]/span[2]/div/div[3]/button[1]')
                self.send_click('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section/div[1]/div[1]/button')
            else:
                # Dates not validated, error reported
                # Closing timeframe window
                self.send_click('//*[@id="Col1-1-HistoricalDataTable-Proxy"]/section/div[1]/div[1]/div[1]/span[2]/div/div[3]/button[2]')
                return None

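            # Get the quotes by clicking the download link
            # NOTE: `ticker` is not a parameter of this method; it is read from the
            # global variable set in the calling cell and must match the selected ticker
            time.sleep(5)
            self.send_click('//*[@id="Col1-1-HistoricalDataTable-Proxy"]//a[@download="' + ticker + '.csv"]')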



### Setting the download folder
To set Chrome's default download folder, we have to pass the chrome_options argument when creating the webdriver object.

The code used is shown below. In the final code, a new method should be added to the Selenium_Driver class to assign the default download folder.

In [20]:
import os

cwd = os.path.join(os.getcwd(), "csv")  # download folder, built with the OS path separator

# Change download location
options = webdriver.ChromeOptions()
options.add_experimental_option("prefs", {
  "download.default_directory": cwd,
  "download.prompt_for_download": False,
  "download.directory_upgrade": True,
  "safebrowsing.enabled": True
})

# OPEN finance.yahoo.com ACCEPTING Oath PAGE IF NEEDED
fyn = Finance_Yahoo_Navigation(delete_cookies=False, chrome_options=options)

# INTRODUCE TICKER
ticker = "HPQ"
fyn.select_ticker(ticker)

fyn.get_historical_data(datetime(2013,1,1), datetime(2018,1,1), click_historical_data_first=True)
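
The method mentioned above for assigning the download folder could start from a helper like the one below, together with a polling wait that replaces a fixed `time.sleep` after clicking the download link. This is a sketch under assumptions: `build_download_prefs` and `wait_for_download` are hypothetical names, and the wait relies on Chrome keeping in-progress downloads under a temporary `.crdownload` name, so the final file name only appears once the download has finished.

```python
import os
import time

def build_download_prefs(folder):
    """Create `folder` if needed and return the Chrome "prefs" dictionary."""
    folder = os.path.abspath(folder)
    os.makedirs(folder, exist_ok=True)
    return {
        "download.default_directory": folder,
        "download.prompt_for_download": False,
        "download.directory_upgrade": True,
        "safebrowsing.enabled": True,
    }

def wait_for_download(folder, filename, timeout=30.0, poll=0.5):
    """Poll until `filename` exists in `folder`; return its path or None on timeout."""
    target = os.path.join(folder, filename)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.isfile(target):
            return target
        time.sleep(poll)
    return None
```

The dictionary would be passed as `options.add_experimental_option("prefs", build_download_prefs("csv"))`, and `wait_for_download(cwd, ticker + ".csv")` would be called after the click on the download link.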

# Using XPath
XPath is a powerful language for accessing any XML or HTML source; however, it can sometimes be a little tricky.

In this example we want to access the Revenue Estimate data in the Analysis tab. Looking at the Inspector, we can see that:

<img src="img\revenuehtml.png" alt="Drawing" style="width: 600px;"/>

The name of the table is found in a header cell (th). One of its ancestors (thead) is a sibling of the body tag (tbody), whose descendants are the rows with the information we need. Clear, isn't it?  ;D (nice family).

To access this information we use the XPath:
```
"//*[th='Revenue Estimate']/../..//tr"
```

that means:
```
//*[th='Revenue Estimate'] = select any tag that has a child th whose value is "Revenue Estimate"
<tr class="Ta(start)">
    <th class="Fw(b) Fw(s) W(20%) Py(10px) C($finDarkLink)"><span>Revenue Estimate</span></th>
    <th class="Fw(400) W(20%) Fz(xs) C($c-fuji-grey-j) Ta(end)">...</th>
    ...
</tr>
The selected tag will be <tr class="Ta(start)">.

/../..  = access the parent of the parent of the tr tag, which is the table tag
<table class="W(100%) M(0) BdB Bdc($c-fuji-grey-c) Mb(25px)">
    <thead>
        <tr class="Ta(start)">
            <th class="Fw(b) Fw(s) W(20%) Py(10px) C($finDarkLink)"><span>Revenue Estimate</span></th>
            ...
        </tr>
    </thead>
    <tbody>
        <tr class="BdT Bdc($c-fuji-grey-c)">
            <td class="Py(10px) Ta(start)">
            ...
        ...
    </tbody>
</table>

//tr = select any row of the table
```

This captures all the rows of the head and body parts. If we only want the rows of the body part, the XPath would be:
```
"//*[th='Revenue Estimate']/../../tbody//tr"
```
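
Both expressions can be tried without a browser. The sketch below uses Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath (enough for these two expressions when they are written relative to the root with a leading `.`). The markup is a simplified stand-in for the Yahoo page, and the cell values are invented.

```python
import xml.etree.ElementTree as ET

PAGE = """
<section>
  <table>
    <thead>
      <tr><th><span>Revenue Estimate</span></th><th>Current Qtr.</th></tr>
    </thead>
    <tbody>
      <tr><td>No. of Analysts</td><td>10</td></tr>
      <tr><td>Avg. Estimate</td><td>1.5B</td></tr>
    </tbody>
  </table>
  <table>
    <thead>
      <tr><th><span>Earnings Estimate</span></th><th>Current Qtr.</th></tr>
    </thead>
    <tbody>
      <tr><td>No. of Analysts</td><td>12</td></tr>
    </tbody>
  </table>
</section>
"""

root = ET.fromstring(PAGE)

# Head and body rows of the table whose header cell reads "Revenue Estimate"
all_rows = root.findall(".//*[th='Revenue Estimate']/../..//tr")

# Only the body rows of that same table
body_rows = root.findall(".//*[th='Revenue Estimate']/../../tbody//tr")
body_texts = [[cell.text for cell in row] for row in body_rows]

print(len(all_rows), len(body_rows))
print(body_texts)
```

The second table is there to show that the predicate on the header text keeps the rows of the "Earnings Estimate" table out of the result.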



In [1]:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select, WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import time
from datetime import datetime, date, timedelta

def wait_for_element_XPath(driver, xpath_str, wait_time):
    # Wait up to wait_time seconds for the element to be present in the DOM;
    # WebDriverWait raises TimeoutException if it does not appear in time.
    # This is a plain function, so it uses the driver argument (there is no self).
    WebDriverWait(driver,
                  wait_time).until(EC.presence_of_element_located((By.XPATH, xpath_str)))
    time.sleep(1)

ticker = "BAC"
driver = webdriver.Chrome()
driver.get("https://finance.yahoo.com")
try:
    assert "Yahoo Finance" in driver.title
except AssertionError:
    try:
        assert "Oath" in driver.title
    except AssertionError:
        print("Unknown page {page}, please check opened browser".format(page=driver.title))
    else:
        element = driver.find_element_by_xpath("//input[@value='OK']")
        element.send_keys(Keys.RETURN)
        driver.refresh()
        try:
            assert "Yahoo Finance" in driver.title
        except AssertionError:
            print("Unknown page {page}, please check opened browser".format(page=driver.title))
            raise
# Select ticker
element = driver.find_element_by_xpath("//input[@name='p']")
element.send_keys(ticker)
element.send_keys(Keys.RETURN)
# select statistics tab
xpath_str = "//*[@id='quote-nav']/ul/li[4]/a"
element = WebDriverWait(driver,10).until(EC.presence_of_element_located((By.XPATH,
                                                                         xpath_str)))
element = driver.find_element_by_xpath(xpath_str)
element.send_keys(Keys.ENTER)
# get table
xpath_str = "//*[h2='Valuation Measures']//table//tr"
element = WebDriverWait(driver,10).until(EC.presence_of_element_located((By.XPATH,
                                                                         xpath_str)))
elements = driver.find_elements_by_xpath(xpath_str)
for element in elements:
    print(element.text)
print("---end---")

Market Cap (intraday) 5 305.2B
Enterprise Value 3 248.89B
Trailing P/E 17.46
Forward P/E 1 10.42
PEG Ratio (5 yr expected) 1 0.54
Price/Sales (ttm) 3.60
Price/Book (mrq) 1.27
Enterprise Value/Revenue 3 2.93
Enterprise Value/EBITDA 6 N/A
Fiscal Year Ends Dec 31, 2017
Most Recent Quarter (mrq) Mar 31, 2018
Profit Margin 23.35%
Operating Margin (ttm) 35.51%
Return on Assets (ttm) 0.87%
Return on Equity (ttm) 7.41%
Revenue (ttm) 84.83B
Revenue Per Share (ttm) 8.27
Quarterly Revenue Growth (yoy) 4.10%
Gross Profit (ttm) N/A
EBITDA N/A
Net Income Avi to Common (ttm) 18.27B
Diluted EPS (ttm) 1.72
Quarterly Earnings Growth (yoy) 29.60%
Total Cash (mrq) 579.08B
Total Cash Per Share (mrq) 55.06
Total Debt (mrq) 502.81B
Total Debt/Equity (mrq) N/A
Current Ratio (mrq) N/A
Book Value Per Share (mrq) 23.74
Operating Cash Flow (ttm) 63.75B
Levered Free Cash Flow (ttm) N/A
---end---


In [22]:
xpath_str = '//section[@data-test="price-targets"]/div'

elements = driver.find_element_by_xpath(xpath_str)
print(elements.text)
#for element in elements:
#    print(element.text)

Current 30.10
Average 34.76
Low 28.00
High 37.00


In [24]:
elements.text.split()[3]

'34.76'
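
Instead of picking values by position, the price targets text can be turned into a dictionary. A small sketch, assuming every label is a single word followed by one number, as in the output shown above; `parse_price_targets` is a hypothetical helper.

```python
def parse_price_targets(text):
    """Map each label to its value, e.g. {"Current": 30.1, ...}."""
    tokens = text.split()
    # Even positions hold labels, odd positions hold the matching numbers
    return {label: float(value) for label, value in zip(tokens[0::2], tokens[1::2])}

text = "Current 30.10\nAverage 34.76\nLow 28.00\nHigh 37.00"  # text printed above
targets = parse_price_targets(text)
print(targets["Average"])
```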

In [1]:
import pandas as pd

In [66]:
tickers_df = pd.read_csv("tickers.csv", sep=";")

In [67]:
print(tickers_df.loc[0,'Tickers'])
tickers_df

HPQ


  Tickers
0     HPQ
1     ILG
2     BAC


In [76]:
result = {"Price":14.53, "EPS":17.8, "Ticker":"HPE"}
result
results = pd.DataFrame(data=result,index=[0])
results

    EPS  Price Ticker
0  17.8  14.53    HPE


In [77]:
result2 = {"Price":0.53, "EPS":0.8, "Ticker":"HPQ"}

In [83]:
# DataFrame.append was removed in pandas 2.0; pd.concat adds the same row
results = pd.concat([results, pd.DataFrame([result2])], ignore_index=True)


In [84]:
results

    EPS  Price Ticker
0  17.8  14.53    HPE
1   0.8   0.53    HPQ


In [85]:
for ticker in tickers_df['Tickers']:
    print (ticker)

ILG
BAC


In [69]:
tickers_df=tickers_df.drop(tickers_df.index[0])

In [70]:
tickers_df


  Tickers
1     ILG
2     BAC


In [72]:
for ticker in tickers_df['Tickers']:
    print (type(ticker))

<class 'str'>
<class 'str'>


In [74]:
tickers_df.loc[tickers_df.index[1],'Tickers']

'BAC'

In [86]:
results


    EPS  Price Ticker
0  17.8  14.53    HPE
1   0.8   0.53    HPQ


In [91]:
print(results)

    EPS  Price Ticker
0  17.8  14.53    HPE
1   0.8   0.53    HPQ
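
The experiments above suggest how the final loop over tickers could look: collect one dictionary per ticker and build the DataFrame once at the end, rather than growing it row by row. This is a sketch under assumptions: `collect_results` and the `fetch` callable are hypothetical, and `fake_fetch` returns placeholder numbers in place of the Selenium calls.

```python
import pandas as pd

def collect_results(tickers, fetch):
    """Call `fetch(ticker)` for every ticker and gather the dictionaries."""
    rows = [fetch(ticker) for ticker in tickers]
    return pd.DataFrame(rows)

def fake_fetch(ticker):
    # Placeholder values; a real version would query the browser here
    return {"Ticker": ticker, "Price": 0.0, "EPS": 0.0}

tickers_df = pd.DataFrame({"Tickers": ["HPQ", "ILG", "BAC"]})
results = collect_results(tickers_df["Tickers"], fake_fetch)
print(results)
```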
