# Processing the price information

In this notebook we extract additional information from the scraped web pages using the BeautifulSoup package.

## Reading the soup

In [49]:
from bs4 import BeautifulSoup

soup=None
with open("price_table_HPQ.html", encoding="UTF-8") as fp:
    text = fp.read()
soup = BeautifulSoup(text, "lxml")
print(soup.prettify())

<html>
 <head>
  <meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
  <link href="/oic/css/main.css" rel="stylesheet" type="text/css"/>
  <script language="JavaScript" src="/jsc/help_old.js">
  </script>
  <!-- SiteCatalyst code version: H.9.

Copyright 1997-2007 Omniture, Inc. More info available at

http://www.omniture.com -->
  <!-- script language="JavaScript" src="https://www.OptionsEducation.org/content/oic/en/_jcr_content/analytics.sitecatalyst.js" type="text/javascript"></script -->
  <script language="JavaScript" src="https://www.OptionsEducation.org/etc/designs/oic/s_code.js" type="text/javascript">
  </script>
  <script language="JavaScript">
   <!--

/* You may give each page an identifying name, server, and channel on

the next lines. */

s.pageName="OIC:Quotes:Detailed Options Chains";

s.channel="OIC:Quotes";

s.prop1="iVolatility.com";

s.prop3="Tool";

s.events="event16";

/************* DO NOT ALTER ANYTHING BELOW THIS LINE ! **************/

var s_c

## Getting the closest expiration date

This date is by default the date shown as Expiry:

![Expiration date](img/expdate.png)

Inspecting the HTML element, we can see that it sits inside a table row:
```
<tr bgcolor="#FFFFFF">

<td colspan="16"><span class="s4"><b>Expiry: May 25, 2018

					&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;

					Days: 0</b></span></td>

</tr>
```

It is also the first occurrence of the word "Expiry" in the soup, so the whole string is easy to get with a regular expression:

In [50]:
import re

print(soup.find(text=re.compile("Expiry")))


Expiry: May 25, 2018

					      

					Days: 0


In [52]:
from datetime import datetime
str_date = soup.find_all(text=re.compile("^Expiry:"))[0].split()
str_fmt = "{0} {1} {2}".format(str_date[1], str_date[2], str_date[3])
expiration_date = datetime.strptime(str_fmt,"%b %d, %Y")
print(expiration_date.strftime("%c"))

05/25/18 00:00:00
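
The split-based parsing above depends on the exact whitespace layout of the string. As a minimal sketch of an alternative, the date can be captured with a regular expression. The helper name `parse_expiry` and the pattern are illustrative assumptions, not part of the original notebook, and the month is assumed to be an English name or abbreviation such as "May".

```python
import re
from datetime import datetime

# Hypothetical helper, not part of the original notebook.
# Captures the month name, the day and the year that follow "Expiry:".
EXPIRY_RE = re.compile(r"Expiry:\s*([A-Za-z]+)\s+(\d{1,2}),\s*(\d{4})")

def parse_expiry(text):
    """Return the expiry date found in `text` as a datetime, or None."""
    match = EXPIRY_RE.search(text)
    if match is None:
        return None
    month, day, year = match.groups()
    # Assumption: English month names, so the first three letters match %b.
    return datetime.strptime("{} {} {}".format(month[:3], day, year), "%b %d %Y")

# Sample string shaped like the one found in the soup (with non-breaking spaces).
sample = "Expiry: May 25, 2018\n\n\t\t\t\t\t\xa0\xa0\xa0\n\n\t\t\t\t\tDays: 0"
print(parse_expiry(sample).strftime("%Y-%m-%d"))  # 2018-05-25
```

Returning `None` when nothing matches lets the caller decide how to handle a page without an expiry row, instead of failing with an `IndexError`.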


## Extracting the ATM strike

The ATM (at-the-money) strike is taken here as the closest listed strike at or below the current underlying price.

Depending on the security, it is computed in one of three ways:
- rounded down to a multiple of 0.5: for a price of 23.78, the ATM strike would be 23.5
- rounded down to an integer: for a price of 23.78, the ATM strike would be 23
- rounded down to a multiple of 5: for a price of 23.78, the ATM strike would be 20

So we have to implement a script that checks these three strikes in order and selects the first one that exists. For example, with an underlying price of 23.78, it first checks whether the 23.5 strike exists, then the 23 strike, and finally the 20 strike.
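
The three candidates can be computed with a small function. This is a minimal sketch: the name `candidate_strikes` is a hypothetical helper, not part of the original notebook, and it only produces the candidates without checking whether they exist in the table.

```python
import math

def candidate_strikes(price):
    """Return the candidate ATM strikes for `price`, in the order they are tried."""
    half = math.floor(price * 2) / 2          # nearest lower multiple of 0.5
    whole = float(math.floor(price))          # nearest lower integer
    five = float(math.floor(price / 5) * 5)   # nearest lower multiple of 5
    return [half, whole, five]

print(candidate_strikes(23.78))  # [23.5, 23.0, 20.0]
```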

To check whether a strike exists, we take advantage of a tooltip present in the table (the one attached to the ticker shown in blue). The HTML code for this element is:

```
<a target="_blank" href="#" onclick="showttip('HPQ   180525C00014000','yellow',150);return false;" onmouseover="showttip('HPQ   180525C00014000','yellow',150);" onmouseout="hidettip()">HPQ</a>
```

Here HPQ 180525C00014000 is the official nomenclature of the option contract: 180525 is the expiration date as yymmdd, C marks a call option, 00014 is the integer part of the strike and the final 000 is the decimal part. So, to check whether a put option contract with a 21.5 strike exists, we have to search for the string 180525P00021500.
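
As an illustration of this layout, the sketch below builds and parses the date, option type and strike part of the code. The function names `contract_code` and `parse_contract_code` are hypothetical helpers, and the strike field is assumed to be five integer digits followed by three decimal digits, as in the tooltip string above.

```python
import re
from datetime import datetime

def contract_code(expiry, right, strike):
    """Build the yymmdd + C/P + 8-digit strike part of the contract name."""
    thousandths = int(round(strike * 1000))  # 21.5 -> 21500
    return "{}{}{:08d}".format(expiry.strftime("%y%m%d"), right, thousandths)

# Date (6 digits), option type, integer part (5 digits), decimal part (3 digits).
CODE_RE = re.compile(r"(\d{6})([CP])(\d{5})(\d{3})")

def parse_contract_code(code):
    """Split a code such as 180525C00014000 into (expiry, right, strike)."""
    match = CODE_RE.fullmatch(code)
    if match is None:
        raise ValueError("unexpected contract code: {!r}".format(code))
    date, right, whole, frac = match.groups()
    return datetime.strptime(date, "%y%m%d"), right, int(whole) + int(frac) / 1000

print(contract_code(datetime(2018, 5, 25), "P", 21.5))  # 180525P00021500
print(parse_contract_code("180525C00014000")[2])        # 14.0
```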


In [66]:
price = 24.7
expiry_code = expiration_date.strftime("%y%m%d")

# Candidate strikes, tried in order: multiple of 0.5, integer, multiple of 5
candidates = [int(price * 2) / 2, int(price), int(price // 5) * 5]

tag = []
for strike in candidates:
    # Put contract code: yymmdd + "P" + 5 integer digits + 3 decimal digits
    contract = "{0}P{1:08d}".format(expiry_code, int(round(strike * 1000)))
    # find_all always returns a list, so tag[0] is valid whenever it is non-empty
    tag = soup.find_all(onclick=re.compile(contract))
    if tag:
        break
    print("{} not found".format(contract))
else:
    # Reached only when none of the candidates was found
    print("ATM strike of {} price not found".format(price))

if tag:
    print(tag[0].parent)
    print("contract    : {}".format(contract))
    print("strike      : {}".format(strike))
    print("expiration  : {}".format(expiration_date.strftime("%d %b, %Y")))

<span class="s1"><a href="#" onclick="showttip('HPQ   180525P00024500','yellow',150);return false;" onmouseout="hidettip()" onmouseover="showttip('HPQ   180525P00024500','yellow',150);" target="_blank">HPQ</a></span>
contract    : 180525P00024500
strike      : 24.5
expiration  : 25 May, 2018


## Extracting the bid, ask, volume and implied volatility of the contract
Once we have the tag associated with the contract, the next step is to walk up through its parents to capture the whole table row.

Looking at the inspector, it seems we need the parent.parent.parent tag, and then all of its tags named 'td'.

In [78]:
#print(tag[0].parent.parent.parent)
columns = tag[0].parent.parent.parent.find_all("td")
for i, column in enumerate(columns):
    print("{}: {}".format(i,column))

0: <td><span class="s1">P</span></td>
1: <td bgcolor=""><span class="s1"><a href="#" onclick="showttip('HPQ   180525P00024500','yellow',150);return false;" onmouseout="hidettip()" onmouseover="showttip('HPQ   180525P00024500','yellow',150);" target="_blank">HPQ</a></span></td>
2: <td><span class="s1">2.180</span></td>
3: <td><span class="s1">1.70</span></td>
4: <td><span class="s1">2.66</span></td>
5: <td nowrap=""><span class="s1">-0.39<br/>
<nobr>(-15.18)</nobr></span></td>
6: <td><span class="s1">0</span></td>
7: <td><span class="s1">0</span></td>
8: <td><span class="s1">42.95%</span></td>
9: <td><span class="s1"><nobr>0.0000</nobr></span></td>
10: <td><span class="s1"><nobr>0.0000</nobr></span></td>
11: <td><span class="s1"><nobr>0.0000</nobr></span></td>
12: <td><span class="s1"><nobr>0.0000</nobr></span></td>
13: <td><span class="s1"><nobr>0.0000</nobr></span></td>
14: <td><span class="s1"><nobr>0.0000</nobr></span></td>
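
Chaining three `.parent` calls depends on the exact nesting of the page. As a hedged alternative, BeautifulSoup's `find_parent` can look up the enclosing `tr` by name. The HTML below is a shortened, hand-made copy of the row shown above, used only to keep the sketch self-contained, and it is parsed with the built-in "html.parser" so that lxml is not required.

```python
import re
from bs4 import BeautifulSoup

# Shortened copy of the table row, for illustration only.
html = """
<table>
<tr>
<td><span class="s1">P</span></td>
<td bgcolor=""><span class="s1"><a href="#" onclick="showttip('HPQ   180525P00024500','yellow',150);return false;">HPQ</a></span></td>
<td><span class="s1">2.180</span></td>
<td><span class="s1">1.70</span></td>
<td><span class="s1">2.66</span></td>
</tr>
</table>
"""

demo_soup = BeautifulSoup(html, "html.parser")
link = demo_soup.find("a", onclick=re.compile("180525P00024500"))

# Walk up to the nearest enclosing <tr>, however deep the link is nested.
row = link.find_parent("tr")
cells = [td.get_text(strip=True) for td in row.find_all("td")]
print(cells)  # ['P', 'HPQ', '2.180', '1.70', '2.66']
```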


The bid and ask are items 3 and 4, the volume is item 6 and the implied volatility (IV) is item 8, therefore:

In [82]:
print("contract    : {}".format(contract))
print("strike      : {}".format(strike))
print("expiration  : {}".format(expiration_date.strftime("%d %b, %Y")))
print("bid         : {}".format(columns[3].get_text()))
print("ask         : {}".format(columns[4].get_text()))
print("volume      : {}".format(columns[6].get_text()))
print("IV          : {}".format(columns[8].get_text()))

contract    : 180525P00024500
strike      : 24.5
expiration  : 25 May, 2018
bid         : 1.70
ask         : 2.66
volume      : 0
IV          : 42.95%
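
The values above are still strings. A minimal sketch of converting them to numbers follows. The helper `to_float` and the `cells` list (a copy of the cell texts printed earlier) are assumptions for illustration, and thousands separators in the volume are only assumed to be possible, since the sample volume is 0.

```python
def to_float(text):
    """Convert a cell text such as '1.70' or '42.95%' to a float."""
    cleaned = text.strip().replace(",", "")
    if cleaned.endswith("%"):
        # Percentages are returned as fractions: '42.95%' -> about 0.4295
        return float(cleaned[:-1]) / 100
    return float(cleaned)

# Cell texts copied from the row printed above (items 0 to 8).
cells = ["P", "HPQ", "2.180", "1.70", "2.66", "-0.39\n(-15.18)", "0", "0", "42.95%"]

quote = {
    "bid": to_float(cells[3]),
    "ask": to_float(cells[4]),
    "volume": int(cells[6].strip().replace(",", "")),
    "iv": to_float(cells[8]),
}
print(quote["bid"], quote["ask"], quote["volume"])  # 1.7 2.66 0
```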
