# National Institute of Standards and Technology (NIST) Fundamental Constants and official unit conversions  

> Build the most current and accurate copy of the NIST tables of fundamental constants and unit conversions by scraping the NIST website directly, and have the values ready to use in Python.   

In [1]:
import os, sys
import decimal
from decimal import Decimal

In [2]:
import pandas as pd

In [3]:
# get the current directory and files inside 
print(os.getcwd()); print(os.listdir( os.getcwd() ));

/home/topolo/PropD/servetheloop/NISTFund
['NISTFund.ipynb', '.ipynb_checkpoints', 'NISTFund.pyc', 'NISTFund.py', 'rawdata']


In [4]:
from NISTFund import retrieve_file, scraping_allascii, init_FundConst, make_pd_alphabeticalconv_lst, make_pd_conv_lst

`retrieve_file` directly downloads the ASCII file that NIST uses on its website for the fundamental constants.  By default, it checks for the subdirectory `./rawdata/`, creates it if necessary, and puts the ASCII text file there.  
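As a rough sketch of the check-and-create behaviour described above (this is not the actual `retrieve_file` source; the helper names `ensure_subdir` and `save_text` are hypothetical, and the download step itself is left out so the snippet runs offline):

```python
import os


def ensure_subdir(path):
    """Create `path` if it does not exist yet, then return it."""
    if not os.path.isdir(path):
        os.makedirs(path)
    return path


def save_text(text, subdir, filename):
    """Write `text` to subdir/filename, creating subdir first if needed."""
    target = os.path.join(ensure_subdir(subdir), filename)
    with open(target, "w") as handle:
        handle.write(text)
    return target
```

With the downloaded text in hand, a call such as `save_text(downloaded_text, "./rawdata/", "allascii.txt")` would leave the file where the later cells expect it.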

In [5]:
urladdr = retrieve_file(); print(urladdr);

<addinfourl at 139898223266200 whose fp = <socket._fileobject object at 0x7f3c9a18c3d0>>


`scraping_allascii` parses this ASCII table.  The following step then puts it into a pandas `DataFrame` with the correct header.  

In [5]:
os.path.isfile('./rawdata/allascii.txt') 

True

In [7]:
lines,title,src,header,rawtbl,tbl=scraping_allascii()

In [8]:
FundConst = pd.DataFrame(tbl, columns=header)

The above 3 lines are what `init_FundConst` essentially does.  

In [5]:
FundConst = init_FundConst()
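
The parsing step can be pictured with a small stand-alone sketch. It is not the code inside `scraping_allascii`. The sample text and its column layout (a dashed rule under the header, columns separated by two or more spaces) are assumptions made for illustration, and `parse_ascii_table` and `to_decimal` are hypothetical names.

```python
import re
from decimal import Decimal

# Illustrative sample only: the layout is an assumption about how the ASCII
# file is arranged, not a copy of it.
SAMPLE = """\
  Quantity                         Value                 Uncertainty           Unit
-----------------------------------------------------------------------------------
Planck constant                    6.626 070 15 e-34     (exact)               J Hz^-1
speed of light in vacuum           299 792 458           (exact)               m s^-1
"""


def parse_ascii_table(text):
    """Split a fixed-width table into a header list and a list of rows."""
    lines = text.splitlines()
    # The dashed rule separates the header line from the data lines.
    rule = next(i for i, line in enumerate(lines) if line.startswith("-----"))
    header = re.split(r"\s{2,}", lines[rule - 1].strip())
    rows = [re.split(r"\s{2,}", line.strip())
            for line in lines[rule + 1:] if line.strip()]
    return header, rows


def to_decimal(value):
    """Drop the digit-grouping spaces so Decimal can read the number."""
    return Decimal(value.replace(" ", ""))


header, rows = parse_ascii_table(SAMPLE)
```

Splitting on runs of two or more spaces keeps the single spaces inside grouped digits and multi-word names intact. A dimensionless constant with an empty unit column would yield a shorter row, so a real parser would need to pad such rows before building the `DataFrame`.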

## NIST Official conversions  

To start creating the table (when it doesn't exist yet), run the following:  

In [10]:
DF_conv=make_pd_conv_lst()



 BeautifulSoup([your markup])

to this:

 BeautifulSoup([your markup], "lxml")

  markup_type=markup_type))


IndexError: list index out of range

In [12]:
from NISTFund import scraped_BS, NISTCONValpha
convBS = scraped_BS(NISTCONValpha)

In [15]:
convBS.soup.find_all("table",{"class":'texttable'})

[]

### Lengthy, detailed explanation of how to create or modify the webscraping functions `make_conv_lst`, `make_pd_conv_lst` when changes to the NIST website break previous links; skip below this section if the steps above worked

If you get errors like these, it means that the webmasters of the NIST site made changes, and you'll have to go in manually and write another webscraping procedure.  The Python functions I created at least give a guideline or outline of how to proceed.  Basically, it's a combination of using the developer inspection tool of your web browser (of choice) and using BeautifulSoup to traverse the HTML code of the webpage.  

EY : 20170424 note: I had previously imagined a general Python function/webscraper that could search for key words or terms, so that even if the webmasters change the webpage, it would dynamically and **automatically** scrape the table of unit conversions.  I don't know how to do that; please let me know if you do (email, twitter, github comment, etc.).
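
One possible starting point for that idea, offered only as a sketch: instead of relying on a CSS class such as `texttable`, look for any table containing a row whose cells read "To convert from", "to", "Multiply by". The example below uses Python 3's standard-library `html.parser` rather than BeautifulSoup so that it runs without extra packages. `TableCollector`, `find_conversion_tables` and the sample markup are hypothetical, and nested tables are not handled.

```python
from html.parser import HTMLParser  # Python 3 standard library

EXPECTED = ["toconvertfrom", "to", "multiplyby"]


class TableCollector(HTMLParser):
    """Collect every <table> as a list of rows, each row a list of cell texts."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.tables = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside <strong>, <sup>, etc. still belongs to the open cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def normalise(text):
    """Lower-case the text and remove all whitespace."""
    return "".join(text.split()).lower()


def find_conversion_tables(html):
    """Return the tables that contain a row matching the expected header."""
    collector = TableCollector()
    collector.feed(html)
    return [table for table in collector.tables
            if any([normalise(c) for c in row] == EXPECTED for row in table)]


SAMPLE_HTML = (
    '<table class="anything"><tr><td colspan="4"><strong>ACCELERATION</strong></td></tr>'
    '<tr><th>To convert from</th><th>to</th><th colspan="2">Multiply by</th></tr>'
    '<tr><td>gal (Gal)</td><td>meter per second squared (m/s<sup>2</sup>)</td>'
    '<td><strong>1.0</strong></td><td><strong>E-02</strong></td></tr></table>'
    '<table><tr><td>navigation</td></tr></table>'
)
tables = find_conversion_tables(SAMPLE_HTML)
```

Matching on the header text survives a renamed class or wrapper `div`, but it would still break if NIST reworded the column titles, so it only narrows the problem.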

Otherwise, I go to the webpage manually and look directly at the links I want to scrape.  I see  

[NIST Guide to the SI, Appendix B.8: Factors for Units Listed Alphabetically ](https://www.nist.gov/physical-measurement-laboratory/nist-guide-si-appendix-b8)

[NIST Guide to the SI, Appendix B.9: Factors for units listed by kind of quantity or field of science](https://www.nist.gov/pml/nist-guide-si-appendix-b9-factors-units-listed-kind-quantity-or-field-science)



In [16]:
convBS = scraped_BS("https://www.nist.gov/physical-measurement-laboratory/nist-guide-si-appendix-b8")

In [20]:
convBS.soup.find_all("div",{"class":"table-inner"})

[]

In [22]:
convBS.soup.find_all("table"); 
print(len( convBS.soup.find_all("table")) )  # there are 26 letters in the alphabet; NIST has entries for 21 of them

21


In [23]:
convBS.convtbls = convBS.soup.find_all("table")
convdata=[]
convdata2=[]
headers = convBS.convtbls[0].find_all('tr')[1].find_all('th')
headers = [ele.text.replace(' ','') for ele in headers]

In [24]:
headers

[u'Toconvertfrom', u'to', u'Multiplyby']

In [25]:
for tbl in convBS.convtbls:
    for row in tbl.find_all('tr'):
        if row.find_all('td') != []:
            if row.text != '':
                rowsplit = row.text.replace("\n",'',1).split('\n')
                try:
                    # replace non-breaking spaces and trim each cell
                    rowsplit = [pt.replace(u'\xa0',u' ').strip() for pt in rowsplit]
                except UnicodeDecodeError as err:
                    # show the offending row, then re-raise
                    print(rowsplit)
                    raise err
                convdata.append( rowsplit )
                # keep rows whose mantissa and exponent are in separate cells
                if len(row.find_all('td')) == (len(headers)+1):
                    convdata2.append( row.find_all('td'))


In [27]:
print(len(convdata));
print(len(convdata2))

452
445


In [30]:
convdata3 = []
for row in convdata2:
    rowout = []
    rowout.append( row[0].text.strip())
    rowout.append( row[1].text.strip())
    value = (row[2].text+row[3].text).strip().replace(u'\xa0',' ').replace(u'\n',' ').replace(' ','')
        
    rowout.append(Decimal( value ))
    convdata3.append(rowout)

In [31]:
print(len(convdata3))

445


In [36]:
pd.DataFrame(convdata3,columns=headers).head()

Unnamed: 0,Toconvertfrom,to,Multiplyby
0,abampere,ampere (A),10.0
1,abcoulomb,coulomb (C),10.0
2,abfarad,farad (F),1000000000.0
3,abhenry,henry (H),1e-09
4,abmho,siemens (S),1000000000.0
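
The joining step used above, where the mantissa cell and the exponent cell are concatenated and passed to `Decimal`, can be tried in isolation. This is a minimal sketch; `to_decimal` is a hypothetical helper name, and the inputs are cell texts like those NIST publishes.

```python
from decimal import Decimal


def to_decimal(mantissa, exponent):
    """Join a mantissa cell and an exponent cell into one Decimal."""
    text = (mantissa + exponent).replace(u"\xa0", " ")
    # Remove all whitespace, including the digit-grouping spaces.
    return Decimal("".join(text.split()))


g_n = to_decimal("9.806 65 ", "E+00")
gal = to_decimal("1.0", "E-02")
print(g_n, gal)
```

Unlike `float`, `Decimal` keeps the digits exactly as tabulated, so `1.0` with `E-02` stays `0.010` rather than collapsing to `0.01`.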


In [35]:
print(len(convdata))

452


#### dealing with Appendix B.9: Factors for units listed by kind of quantity or field of science 

In [38]:
convBS = scraped_BS("https://www.nist.gov/pml/nist-guide-si-appendix-b9-factors-units-listed-kind-quantity-or-field-science")

convBS.convtbls = convBS.soup.find_all("table")
print(len(convBS.convtbls))

46


In [50]:
headers = convBS.convtbls[1].find_all('tr')[1].find_all('th')
headers = [ele.text.replace(' ','') for ele in headers]

In [51]:
headers

[u'Toconvertfrom', u'to', u'Multiplyby']

In [52]:
convBS.convtbls[1].find_all('tr')

[<tr><td colspan="4"><strong>ACCELERATION</strong></td>\n</tr>,
 <tr><th>To convert from</th>\n<th>to</th>\n<th colspan="2">Multiply by</th>\n</tr>,
 <tr><td>acceleration of free fall, standard (<em>g</em><sub>n</sub>)</td>\n<td>meter per second squared (m/s<sup>2</sup>)</td>\n<td><strong>9.806 65 </strong></td>\n<td><strong>E+00</strong></td>\n</tr>,
 <tr><td>foot per second squared (ft/s<sup>2</sup>)</td>\n<td>meter per second squared (m/s<sup>2</sup>)</td>\n<td><strong>3.048</strong></td>\n<td><strong>E-01</strong></td>\n</tr>,
 <tr><td>gal (Gal)</td>\n<td>meter per second squared (m/s<sup>2</sup>)</td>\n<td><strong>1.0</strong></td>\n<td><strong>E-02</strong></td>\n</tr>,
 <tr><td>inch per second squared (in/s<sup>2</sup>)</td>\n<td>meter per second squared (m/s<sup>2</sup>)</td>\n<td><strong>2.54</strong></td>\n<td><strong>E-02</strong></td>\n</tr>]

In [55]:
for rows in convBS.convtbls[1].find_all('tr'):
    print(rows.find_all('td'))

[<td colspan="4"><strong>ACCELERATION</strong></td>]
[]
[<td>acceleration of free fall, standard (<em>g</em><sub>n</sub>)</td>, <td>meter per second squared (m/s<sup>2</sup>)</td>, <td><strong>9.806 65 </strong></td>, <td><strong>E+00</strong></td>]
[<td>foot per second squared (ft/s<sup>2</sup>)</td>, <td>meter per second squared (m/s<sup>2</sup>)</td>, <td><strong>3.048</strong></td>, <td><strong>E-01</strong></td>]
[<td>gal (Gal)</td>, <td>meter per second squared (m/s<sup>2</sup>)</td>, <td><strong>1.0</strong></td>, <td><strong>E-02</strong></td>]
[<td>inch per second squared (in/s<sup>2</sup>)</td>, <td>meter per second squared (m/s<sup>2</sup>)</td>, <td><strong>2.54</strong></td>, <td><strong>E-02</strong></td>]


In [73]:
for row in convBS.convtbls[1].find_all('tr'):
    if row.find_all('td') != []:
        if row.text != '':
            rowsplit = row.text.split('\n')
            # drop the empty string left by the trailing newline
            if u'' in rowsplit:
                rowsplit.remove(u'')
            print(rowsplit)

[u'ACCELERATION']
[u'acceleration of free fall, standard (gn)', u'meter per second squared (m/s2)', u'9.806 65 ', u'E+00']
[u'foot per second squared (ft/s2)', u'meter per second squared (m/s2)', u'3.048', u'E-01']
[u'gal (Gal)', u'meter per second squared (m/s2)', u'1.0', u'E-02']
[u'inch per second squared (in/s2)', u'meter per second squared (m/s2)', u'2.54', u'E-02']


In [88]:
test_convdata=[]
field_of_science = ""
for row in convBS.convtbls[1].find_all('tr'):
    if row.find_all('td') != []:
        if row.text != '':
            rowsplit = row.text.split('\n')
            if u'' in rowsplit:
                rowsplit.remove(u'')
            if len(rowsplit) == 1:
                # a single-cell row is a section heading such as ACCELERATION
                field_of_science = rowsplit[0]
                print(field_of_science)
            elif field_of_science != "":
                rowsplit.append(field_of_science)
                print(rowsplit)
                # keep rows whose mantissa and exponent are in separate cells
                if len(row.find_all('td')) == (len(headers)+1):
                    test_convdata.append( rowsplit )

ACCELERATION
[u'acceleration of free fall, standard (gn)', u'meter per second squared (m/s2)', u'9.806 65 ', u'E+00', u'ACCELERATION']
[u'foot per second squared (ft/s2)', u'meter per second squared (m/s2)', u'3.048', u'E-01', u'ACCELERATION']
[u'gal (Gal)', u'meter per second squared (m/s2)', u'1.0', u'E-02', u'ACCELERATION']
[u'inch per second squared (in/s2)', u'meter per second squared (m/s2)', u'2.54', u'E-02', u'ACCELERATION']


In [89]:
test_convdata

[[u'acceleration of free fall, standard (gn)',
  u'meter per second squared (m/s2)',
  u'9.806 65 ',
  u'E+00',
  u'ACCELERATION'],
 [u'foot per second squared (ft/s2)',
  u'meter per second squared (m/s2)',
  u'3.048',
  u'E-01',
  u'ACCELERATION'],
 [u'gal (Gal)',
  u'meter per second squared (m/s2)',
  u'1.0',
  u'E-02',
  u'ACCELERATION'],
 [u'inch per second squared (in/s2)',
  u'meter per second squared (m/s2)',
  u'2.54',
  u'E-02',
  u'ACCELERATION']]
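
The same carry-the-heading-forward logic can be written against plain lists, which makes it easy to test without the live page. This is a sketch only: `tag_rows_with_kind` is a hypothetical name, and it assumes that a one-cell row is always a section heading and that data rows have exactly four cells.

```python
def tag_rows_with_kind(rows, n_columns=4):
    """Append the most recent section heading to every data row."""
    tagged = []
    kind = None
    for row in rows:
        if len(row) == 1:
            # A single-cell row starts a new section, e.g. ACCELERATION.
            kind = row[0]
        elif kind is not None and len(row) == n_columns:
            tagged.append(row + [kind])
    return tagged


rows = [
    [u'ACCELERATION'],
    [u'gal (Gal)', u'meter per second squared (m/s2)', u'1.0', u'E-02'],
    [u'inch per second squared (in/s2)', u'meter per second squared (m/s2)', u'2.54', u'E-02'],
]
tagged = tag_rows_with_kind(rows)
```

Building a new list with `row + [kind]` leaves the input rows untouched, which helps when the same scraped rows are inspected more than once in a notebook.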

In [10]:
from NISTFund import make_conv_lst

In [11]:
test_conv=make_conv_lst()

AttributeError: 'unicode' object has no attribute 'text'

In [16]:
test_conv[2]

[]

Finally

In [6]:
DF_alphaconv=make_pd_alphabeticalconv_lst()



 BeautifulSoup([your markup])

to this:

 BeautifulSoup([your markup], "lxml")

  markup_type=markup_type))


In [7]:
DF_alphaconv.head()

Unnamed: 0,Toconvertfrom,to,Multiplyby
0,abampere,ampere (A),10.0
1,abcoulomb,coulomb (C),10.0
2,abfarad,farad (F),1000000000.0
3,abhenry,henry (H),1e-09
4,abmho,siemens (S),1000000000.0


In [6]:
DF_conv= make_pd_conv_lst()



 BeautifulSoup([your markup])

to this:

 BeautifulSoup([your markup], "lxml")

  markup_type=markup_type))


This is what happens when I test the length of convdata and convdata2: 
True
180
180


In [8]:
print( DF_conv.head() )
DF_conv.describe()


                              Toconvertfrom                               to  \
0  acceleration of free fall, standard (gn)  meter per second squared (m/s2)   
1           foot per second squared (ft/s2)  meter per second squared (m/s2)   
2                                 gal (Gal)  meter per second squared (m/s2)   
3           inch per second squared (in/s2)  meter per second squared (m/s2)   
4  acceleration of free fall, standard (gn)  meter per second squared (m/s2)   

  Multiplyby          kind  
0    9.80665  ACCELERATION  
1     0.3048  ACCELERATION  
2      0.010  ACCELERATION  
3     0.0254  ACCELERATION  
4    9.80665  ACCELERATION  


Unnamed: 0,Toconvertfrom,to,Multiplyby,kind
count,180,180,180.0,180
unique,4,1,4.0,1
top,foot per second squared (ft/s2),meter per second squared (m/s2),0.3048,ACCELERATION
freq,45,180,45.0,180
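
The summary above reports 180 rows but only 4 unique entries in `Toconvertfrom`, which suggests the same ACCELERATION rows were collected repeatedly. A quick check along these lines can flag that. It is a plain-Python sketch with a hypothetical helper name, `duplicate_rows`; with pandas, `DF_conv.duplicated().sum()` should give a comparable count.

```python
from collections import Counter


def duplicate_rows(rows):
    """Return a dict mapping each repeated row (as a tuple) to its count."""
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}


sample = [
    ["gal (Gal)", "meter per second squared (m/s2)", "0.010", "ACCELERATION"],
    ["gal (Gal)", "meter per second squared (m/s2)", "0.010", "ACCELERATION"],
    ["inch per second squared (in/s2)", "meter per second squared (m/s2)", "0.0254", "ACCELERATION"],
]
repeats = duplicate_rows(sample)
```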


## Once all the data has been built, you only need to run the following 3 commands to start using it 

In [9]:
FundConst = init_FundConst()

In [10]:
conv = pd.read_pickle('./rawdata/DF_conv')
alphaconv = pd.read_pickle('./rawdata/DF_alphabeticalconv')
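
To show what "using the data" can look like, here is a small sketch that applies a tabulated factor. It does not read the pickled tables: the two factors are copied by hand from the Appendix B.9 rows shown earlier, and `FACTORS` and `convert` are hypothetical names.

```python
from decimal import Decimal

# A few rows copied from the scraped table; a real session would look the
# factor up in the loaded DataFrame instead.
FACTORS = {
    ("inch per second squared (in/s2)",
     "meter per second squared (m/s2)"): Decimal("2.54E-02"),
    ("foot per second squared (ft/s2)",
     "meter per second squared (m/s2)"): Decimal("3.048E-01"),
}


def convert(value, from_unit, to_unit, factors=FACTORS):
    """Multiply `value` by the tabulated factor for (from_unit, to_unit)."""
    return Decimal(str(value)) * factors[(from_unit, to_unit)]


result = convert(12, "inch per second squared (in/s2)",
                 "meter per second squared (m/s2)")
print(result)
```

With the DataFrames loaded, the equivalent lookup would be a boolean selection such as `alphaconv[alphaconv['Toconvertfrom'] == 'abampere']`, followed by reading the `Multiplyby` column.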


In [12]:
alphaconv

Unnamed: 0,Toconvertfrom,to,Multiplyby
0,abampere,ampere (A),10
1,abcoulomb,coulomb (C),10
2,abfarad,farad (F),1.0E+9
3,abhenry,henry (H),1.0E-9
4,abmho,siemens (S),1.0E+9
5,abohm,ohm (Ω),1.0E-9
6,abvolt,volt (V),1.0E-8
7,"acceleration of free fall, standard (gn)",meter per second squared (m / s2),9.80665
8,acre (based on U.S. survey foot),square meter (m2),4046.873
9,acre foot (based on U.S. survey foot) 7,cubic meter (m3),1233.489
