# extract-pages-from-mongo-v5
SanjayKAroraPhD@gmail.com <br>
December 2018

## Description
This version of the notebook extracts groups of pages from MongoDB by firm_name to create firm-centric <b>about</b> page output files that can later be topic modeled. In doing so, it removes repetitive content (e.g., repeated menu items) and garbage content (e.g., improperly parsed HTML code).
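
One way to picture the removal of repetitive content: a line that shows up on most of a firm's pages is probably navigation or footer text. The sketch below is an illustration under that assumption, not the notebook's implementation (which relies on boilerpipe and the filters defined later); `drop_repeated_lines` and its `max_share` threshold are hypothetical names.

```python
from collections import Counter

def drop_repeated_lines(pages, max_share=0.5):
    """Drop lines that occur on more than max_share of the pages.

    pages: list of page texts (one string per page) for a single firm.
    Hypothetical helper, for illustration only.
    """
    if len(pages) < 2:
        return list(pages)  # nothing to compare against
    # count each distinct non-empty line once per page
    seen_on = Counter()
    for page in pages:
        for line in set(l.strip() for l in page.split('\n') if l.strip()):
            seen_on[line] += 1
    limit = max_share * len(pages)
    cleaned = []
    for page in pages:
        kept = [l for l in page.split('\n')
                if l.strip() and seen_on[l.strip()] <= limit]
        cleaned.append('\n'.join(kept))
    return cleaned

pages = [
    "Home | About | Contact\nWe make solar cells.",
    "Home | About | Contact\nFounded in 1998.",
    "Home | About | Contact\nOur team has 40 engineers.",
]
for text in drop_repeated_lines(pages):
    print(text)
```

The menu line appears on all three pages and is dropped; the page-specific sentences survive.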

## Change log
v4 focuses on about pages

## TODO:
* Whole process: get data, topic model and see if it looks sufficiently interesting/different
* Enhance data collection, per the following: 
    * Select a region or country — WAIT 
        * http://www.ivoclarvivadent.com: Please select your region
        * https://www.enersys.com/: PLEASE SELECT A REGION
        * https://www.m-petfilm.com/: ENGLISH
    * Crawl from focal about page only following links that look like part of the about story, maintaining ordering.  Check to see if the other links identified above are also there? 
        * http://xtalsolar.com/investors_partners.html
* Order known about us pages in the same way the links are found on a home page or about us landing page

In [275]:
# import data processing and other libraries
import csv
import sys
import requests
import os
import re
import pprint
import pymongo
import traceback
from time import sleep
import pandas as pd
import io
from IPython.display import display
import time
import numpy as np
from bs4 import BeautifulSoup
import string
import random
from urllib.parse import urlparse, urljoin
from collections import defaultdict, OrderedDict  # OrderedDict is needed by get_ordered_about_urls

In [276]:
from boilerpipe.extract import Extractor

In [274]:
# import sklearn
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_moons, make_circles, make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.model_selection import GroupKFold
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import cross_val_predict
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import confusion_matrix

In [10]:
# first figure out what is an about page. need to label training data
# identify features -- for right now, unigrams if just one word in the header, otherwise bi- or trigrams 
# predict about pages

In [277]:
MONGODB_DB = "FirmDB_20181226"
MONGODB_COLLECTION = "pages_ABOUT"
CONNECTION_STRING = "mongodb://localhost"

client = pymongo.MongoClient(CONNECTION_STRING)
db = client[MONGODB_DB]
col = db[MONGODB_COLLECTION]

ABOUT_DIR = '/Users/sarora/dev/EAGER/data/orgs/about/'
DATA_DIR = '/Users/sarora/dev/EAGER/data/orgs/depth0_boilerpipe/'
TRAINING_PERCENT = .10
pp = pprint.PrettyPrinter()

In [278]:
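def get_domain (url):
    o = urlparse(url.lower())
    domain = o.netloc
    # remove a leading 'www.' only; str.strip('www.') would strip any leading/trailing 'w' or '.' characters
    if domain.startswith('www.'):
        domain = domain[len('www.'):]
    return domain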

# output urls for labeling of training data
results = col.find({},{"url": 1, "firm_name": 1})
df = pd.DataFrame(columns = ('firm_name', 'url', 'label'))
domain_count = defaultdict(int)
# iterate over the cursor directly (Cursor.count() is deprecated in newer pymongo releases)
for i, result in enumerate(results):
    url = result['url'][0]
    domain_count[get_domain(url)] += 1
    firm_name = result['firm_name'][0] if 'firm_name' in result else ''
    df.loc[i] = [firm_name, url, '']
    
df['gid'] = df.groupby(['firm_name']).ngroup()
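
A note on removing the `www.` prefix: `str.strip` treats its argument as a set of characters rather than a prefix, so it can also remove leading or trailing `w` and `.` characters that belong to the domain itself. The check below uses one URL from this dataset; `domain_without_www` is an illustrative name, not part of the notebook.

```python
from urllib.parse import urlparse

def domain_without_www(url):
    """Return the host of url with a single leading 'www.' removed."""
    netloc = urlparse(url.lower()).netloc
    return netloc[4:] if netloc.startswith('www.') else netloc

url = 'https://www.wwsensdevices.com/'
netloc = urlparse(url).netloc
print(netloc.strip('www.'))      # character-set strip
print(domain_without_www(url))   # prefix removal
```

Since `domain_count` is keyed on the domain, two different sites can collide (or one site can split) when the domain is mangled, which in turn affects `pages_in_domain` and `is_sole_page`.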

In [279]:
df.gid.nunique()
label_ids = random.sample(range(df.gid.nunique()), 200) # gid values from ngroup() start at 0
df_label = df[df['gid'].isin(label_ids)]
with open(ABOUT_DIR + 'about_pages_to_label.csv', mode='w') as to_label:
    df_label.to_csv(to_label, index=False)

In [280]:
# read back labeled data (note that about, management/team and partners, are dichotomous)
df_about_labeled = pd.read_csv(ABOUT_DIR + 'about_pages_labeled.csv')
df_about_labeled = df_about_labeled.fillna(0)

# count pages per domain
for index, row in df_about_labeled.iterrows():
    pages_in_domain = domain_count[get_domain(row['url'])]
    df_about_labeled.loc[index,'pages_in_domain'] = pages_in_domain
    is_sole_page = 0 if pages_in_domain > 1 else 1
    df_about_labeled.loc[index,'is_sole_page'] = is_sole_page
    
labeled_urls = list(df_about_labeled['url']) # for training models on labeled urls below
df_about_labeled = df_about_labeled.set_index(['firm_name', 'url'])
print (df_about_labeled.columns.tolist())

# final test set is the rows of the original data frame without the urls in df_about_labeled 

['about', 'mgmt', 'partners', 'gid', 'pages_in_domain', 'is_sole_page']


## Create features to predict about pages
Create features:
1. title and url path fragment unigrams (also tried n-grams, as well as content from headers, with worse results) 
2. is home page and doesn't have any other pages
3. other ideas here: https://towardsdatascience.com/understanding-feature-engineering-part-3-traditional-methods-for-text-data-f6f7d70acd41
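
To make features 1 and 2 concrete, the following sketch tokenizes a URL path into unigrams, drops a small stop list, and attaches the sole-page flag. It is a simplified stand-in for the notebook's own helpers; `path_unigrams` and `page_features` are hypothetical names, and the stop list is only an example.

```python
import re
from urllib.parse import urlparse

STOP = {'the', 'a'}  # example stop list; 'about' must be kept

def path_unigrams(url):
    """Split the URL path on '/', '-' and '_' and drop file extensions."""
    path = urlparse(url.lower()).path
    tokens = []
    for part in path.split('/'):
        stem = part.split('.')[0]  # 'page.html' -> 'page'
        tokens.extend(t for t in re.split(r'[-_]', stem) if t)
    return [t for t in tokens if t not in STOP]

def page_features(url, pages_in_domain):
    """Combine path unigrams with the sole-page indicator."""
    return {'tokens': path_unigrams(url),
            'is_sole_page': 1 if pages_in_domain <= 1 else 0}

print(page_features('https://www.kansai.com/about-us/corporate_data.html', 6))
print(page_features('https://www.wwsensdevices.com/', 1))
```

A home page with no path contributes no tokens, which is why the sole-page flag is kept as a separate feature.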

In [206]:
# load page data and create features

# remove simple article words and punctuation (need to keep 'about')
stop_words = ['the','a'] + list(string.punctuation) 
# remove known company names for model training and evaluation in the labeled data 
remove_regex = re.compile(r'^(3m|united|states|menu|en_us|algeternal|s\d+|sarepta|skygen|nexgen|abbott|adlens|errorpage|\d{1,3}|\d{5,}|\w+\d+|\d+\w+|asten|johnson|baker|hughes|ge|bhge|biocon|egfr|gcsf|pegfilgrastim|bostik|canon|chevron|phillips|coloplast|cyberonics|microsoft|evoqua|ford|hitachi|glucanbio|hunter|douglas|kimberly|clark|lextar|fisher|lockheed|martin|lux|nec|nanocopoeia|cisco|schlumberger|weccamerica|inanobio|nanocomposix|zoetis|zygo)$', re.IGNORECASE)
# used to filter top-level header content
header_in = re.compile(r'(about|company|corporate|who.we.are|(^|/)vision|profile|management|team|history|values|strategy|our |technology|research|commercialization)', flags=re.IGNORECASE)
header_regex = re.compile(r'h[1-9]+')

def clean_string(in_string):
    if not in_string:
        return in_string
    split_words = in_string.lower().split()
    result_words  = [word for word in split_words if word not in stop_words]
    result_words  = [word for word in result_words if not remove_regex.search(word)]
    result = ' '.join(result_words)
    return ' ' + result

def get_page_path_text (url):
    o = urlparse(url.lower())
    path = o.path
    path_parts = path.split ('/')
    path_parts = [part.split('.')[0] for part in path_parts] # remove page names
    path_parts = [split for part in path_parts for split in part.split('-') ] # split on hyphens
    path_parts = [split for part in path_parts for split in part.split('_') ] # split on underscores
    clnd_string = clean_string(' '.join(path_parts))
    return clnd_string

# recurse through the header text to add into feature grams
def get_header_text (headers, names, index):
    texts = [clean_string(header.text) for header in headers if header.name == names[index]]
    texts = list(filter(header_in.search, texts))
    if texts and len(texts[0].split()) > 4:
        if(len(names) > (index + 1)):
            return get_header_text (headers, names, index + 1)
        else:
            return ''
    else: 
        return ' '.join (texts)
    
def process_firms (urls): 
    firm_page_features = {}
    for url in urls: 
        result = col.find_one({"url": url})
        domain = get_domain(url)
        html = result['html'][0]
        
        # print (url)

        soup = BeautifulSoup(html, 'lxml')
        running_text = ''
        path_text = get_page_path_text(url)
        
        if path_text:
            # print (path_text)
            running_text += path_text
            
        if soup.title and soup.title.string:
            # print (soup.title.string)
            running_text += clean_string(soup.title.string)
            
        headers = soup.find_all(header_regex, text=True)
        names = sorted(set ([header.name for header in headers]))
        running_text += get_header_text (headers, names, 0)

        firm_page_features[url] = running_text
        
    return firm_page_features

In [282]:
# for testing regex
print (get_page_path_text('http://www.google.com/path-en/path_to/page.html'))
print (re.split(r"\W+|_", "Testing this_thing"))
print (clean_string('3m 01	08	100	10m ford 235 1990 s129 188209 0913lk the ? about us'))
pp.pprint (list(filter(header_in.search, ['about us', 'not found', 'company'])))

 path en path to page
['Testing', 'this', 'thing']
 about us
['about us', 'company']


In [291]:
# get firm website data for n-gram processing 
labeled_firm_page_features = process_firms (labeled_urls)

urls = labeled_firm_page_features.keys()
print (len(urls))
corpus = []
for url in urls:
    corpus.append (labeled_firm_page_features[url])
    
# unigram
ubv = TfidfVectorizer(min_df=0., max_df=1.)
# you can set the n-gram range to (1,2) to get unigrams as well as bigrams (performs worse than just unigrams)
# ubv = TfidfVectorizer(ngram_range=(1,2)) 

ubv_matrix = ubv.fit_transform(corpus)

ubv_matrix = ubv_matrix.toarray()
vocab = ubv.get_feature_names()
ubv_df = pd.DataFrame(ubv_matrix, columns=vocab)
ubv_df.index = urls
ubv_df.index.name='url'

1031


In [297]:
# merge two datasets (features and labeled data)
print(ubv_df.shape)
print(df_about_labeled.shape)

print (ubv_df.index.name)
print (df_about_labeled.index.name)

all_merged = ubv_df.join(df_about_labeled, how='inner', rsuffix='_lbl')
print(all_merged.shape)

(1031, 1472)
(1031, 6)
url
None
(1031, 1478)


In [298]:
# split labeled and predict datasets 
labeled = all_merged[all_merged['gid'].notnull()]
print(labeled.shape)
print (df_about_labeled.columns.tolist())

(1031, 1478)
['about', 'mgmt', 'partners', 'gid', 'pages_in_domain', 'is_sole_page']


In [353]:
# labeled train/test split
X = labeled.iloc[:,1:len(ubv_df.columns)]
X['pages_in_domain'] = labeled['pages_in_domain']
X['is_sole_page'] = labeled['is_sole_page']
X.to_csv(ABOUT_DIR + 'X.csv', index = True) # for manual inspection

y = labeled.loc[:,'about_lbl']

## Train and evaluate the model
On just the labeled data
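
The evaluation that follows scores each classifier with `cross_val_score` on the labeled pages. Because several labeled pages come from the same firm (`gid`), one refinement worth considering, though not what this notebook does, is to keep each firm's pages inside a single fold, which is the idea behind the `GroupKFold` class imported earlier. The sketch below illustrates that grouping with the standard library only; `grouped_folds` is a hypothetical helper, not a scikit-learn function.

```python
from collections import defaultdict

def grouped_folds(groups, n_folds=3):
    """Assign sample indices to folds so that no group spans two folds.

    Groups are dealt to the currently smallest fold, largest group first.
    """
    members = defaultdict(list)
    for i, g in enumerate(groups):
        members[g].append(i)
    folds = [[] for _ in range(n_folds)]
    for g in sorted(members, key=lambda g: (-len(members[g]), g)):
        smallest = min(range(n_folds), key=lambda k: len(folds[k]))
        folds[smallest].extend(members[g])
    return folds

gids = [0, 0, 0, 1, 1, 2, 3, 3, 4]  # made-up firm ids, one per labeled page
folds = grouped_folds(gids, n_folds=3)
for k, fold in enumerate(folds):
    print(k, fold, sorted(set(gids[i] for i in fold)))
```

With a split like this, a held-out page is never scored by a model that saw other pages from the same firm.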

In [300]:
# specify a few models

names = ["Nearest Neighbors", "Linear SVM", "RBF SVM", 
         "Decision Tree", "Random Forest", "Neural Net", "AdaBoost",
         "Naive Bayes", "SVC", "QDA"]

classifiers = [
    KNeighborsClassifier(3),
    SVC(kernel="linear", C=0.025),
    SVC(gamma=2, C=1),
    DecisionTreeClassifier(max_depth=5),
    RandomForestClassifier(max_depth=5, n_estimators=10, max_features=1),
    MLPClassifier(alpha=1),
    AdaBoostClassifier(),
    GaussianNB(),
    SVC(gamma=0.001, C=100.), 
    QuadraticDiscriminantAnalysis()]

In [301]:
# build dataframe for output metrics 
eval_df = pd.DataFrame (names,index=(range(len(names))), columns=["Name"])
eval_df['Accuracy'] = np.float64(0)
display (eval_df)

Unnamed: 0,Name,Accuracy
0,Nearest Neighbors,0.0
1,Linear SVM,0.0
2,RBF SVM,0.0
3,Decision Tree,0.0
4,Random Forest,0.0
5,Neural Net,0.0
6,AdaBoost,0.0
7,Naive Bayes,0.0
8,SVC,0.0
9,QDA,0.0


In [302]:
# build evaluation outputs (currently limited to accuracy)
i = np.int64(0)
for name, clf in zip(names, classifiers):
    display (name)
    scores = cross_val_score(clf, X, y)
    avg_score = np.mean(scores)
    eval_df.at[i, 'Accuracy'] = avg_score  # DataFrame.set_value is deprecated; .at is the replacement
    i = i + 1
    
display(eval_df)
eval_df.to_clipboard()

'Nearest Neighbors'
'Linear SVM'
'RBF SVM'
'Decision Tree'
'Random Forest'
'Neural Net'
'AdaBoost'
'Naive Bayes'
'SVC'
'QDA'

Unnamed: 0,Name,Accuracy
0,Nearest Neighbors,0.681875
1,Linear SVM,0.647918
2,RBF SVM,0.700328
3,Decision Tree,0.758491
4,Random Forest,0.646946
5,Neural Net,0.839966
6,AdaBoost,0.816721
7,Naive Bayes,0.659557
8,SVC,0.798251
9,QDA,0.4899


## Grid search using MLPClassifier to tune hyperparameters

In [175]:
hls = []
hls.append([100,])
hls.append([70,70,70])
hls.append([40,40,40])
hls.append([10,10,10])
pp.pprint(hls)

[[100], [70, 70, 70], [40, 40, 40], [10, 10, 10]]


In [176]:
parameters = {'solver': ['lbfgs'], 'max_iter': [300,500,700], 'alpha': 10.0 ** -np.arange(5, 10), 'hidden_layer_sizes': hls, 'random_state':[5,10,15]}
clf_grid = GridSearchCV(MLPClassifier(), parameters, n_jobs=-1)
clf_grid.fit(X,y)

print("Best score: %0.4f" % clf_grid.best_score_)
print("Using the following parameters:")
print(clf_grid.best_params_)



Best score: 0.8555
Using the following parameters:
{'alpha': 1e-09, 'hidden_layer_sizes': [70, 70, 70], 'max_iter': 300, 'random_state': 5, 'solver': 'lbfgs'}
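
For scale, the grid above multiplies out to 180 candidate configurations, each of which `GridSearchCV` fits once per cross-validation fold. The count can be checked with the standard library alone; the dictionary below mirrors the `parameters` cell, with `alpha` written out as plain floats instead of a NumPy array.

```python
from itertools import product

param_grid = {
    'solver': ['lbfgs'],
    'max_iter': [300, 500, 700],
    'alpha': [10.0 ** -k for k in range(5, 10)],  # 1e-05 down to 1e-09
    'hidden_layer_sizes': [[100], [70, 70, 70], [40, 40, 40], [10, 10, 10]],
    'random_state': [5, 10, 15],
}

# enumerate every combination of parameter values
keys = sorted(param_grid)
combos = [dict(zip(keys, values))
          for values in product(*(param_grid[k] for k in keys))]
print(len(combos))  # 1 * 3 * 5 * 4 * 3 = 180
```

Note that `random_state` is searched as if it were a hyperparameter, so a third of the grid differs only in the random seed.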


In [337]:
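# train neural net model with the hyperparameter configuration chosen after the grid search
# (note: alpha is set to 1e-7 here, whereas the grid search output above reports 1e-09 as best)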
clf = MLPClassifier(alpha=0.0000001, hidden_layer_sizes=(70,70,70), max_iter=300, random_state=5, solver='lbfgs')
clf.fit(X, y)

y_hat = clf.predict(X)

In [339]:
# print all instances where predictions don't match labels (for inspection)
# note: y_hat is predicted on the training data itself, so these are training errors
print(confusion_matrix(y, y_hat))

for key, y_i, y_hat_i in zip(list(X.index), y, y_hat):
    if y_i != y_hat_i:
        print(key[1], 'has been classified as ', y_hat_i, 'but should be ', y_i) 

http://acaciaresearch.com/history/ has been classified as  1.0 but should be  0.0
https://www.asml.com/company/company-calendar/en/s32775 has been classified as  1.0 but should be  0.0
https://allisontransmission.com/company/history-heritage has been classified as  1.0 but should be  0.0
https://www.shire.com/who-we-are/areas-of-focus has been classified as  0.0 but should be  1.0
https://www.shire.com/who-we-are/responsibility has been classified as  0.0 but should be  1.0
http://www.brightleafpower.com/innovation/ has been classified as  0.0 but should be  1.0
http://www.agcbio.com/about/environmental_policy has been classified as  0.0 but should be  1.0
https://www.cadence.com/content/cadence-www/global/en_US/home/company/customers.html has been classified as  1.0 but should be  0.0
https://global.canon/en/corporate/ has been classified as  0.0 but should be  1.0
https://www.christiedigital.com/en-us/about-christie has been classified as  0.0 but should be  1.0
https://www.coloplast

## Predict about pages for unlabeled data

In [332]:
# prepare domain level features 
df_predict = df[~df['url'].isin(labeled_urls)].copy() # copy to avoid writing to a slice of df; index becomes (firm_name, url) below
# count pages per domain
for index, row in df_predict.iterrows():
    pages_in_domain = domain_count[get_domain(row['url'])]
    df_predict.loc[index,'pages_in_domain'] = pages_in_domain
    is_sole_page = 0 if pages_in_domain > 1 else 1
    df_predict.loc[index,'is_sole_page'] = is_sole_page
    
# set index 
df_predict = df_predict.set_index(['firm_name', 'url'])
print (df_predict.columns.tolist())

['label', 'gid', 'pages_in_domain', 'is_sole_page']


In [335]:
# check to see whether there are duplicate urls
# note: there should be because different assignees may map to the same domain (see error above)
import collections
counter=collections.Counter(df_predict.index)
most_common = counter.most_common(10)
pp.pprint (most_common)

[(('Mitsubishi Metal Corporation',
   'https://www.mitsubishicorp.com/jp/en/about/plan/'),
  1),
 (('22nd Century Limited', 'http://www.xxiicentury.com/history/'), 1),
 (('Mitsubishi Metal Corporation',
   'https://www.mitsubishicorp.com/jp/en/about/global/'),
  1),
 (('Mitsubishi Metal Corporation',
   'https://www.mitsubishicorp.com/jp/en/about/message/'),
  1),
 (('Forest Concepts', 'http://forestconcepts.com/index.php?page=01002'), 1),
 (('ZENA TECHNOLOGIES', 'http://xena-technologies.com/program-management/'), 1),
 (('Kansai Paint Co.', 'https://www.kansai.com/about-us/corporate_data.html'),
  1),
 (('W&Wsens Devices', 'https://www.wwsensdevices.com/'), 1),
 (('Kansai Paint Co.', 'https://www.kansai.com/about-us/brand.html'), 1),
 (('Kansai Paint Co.', 'https://www.kansai.com/about-us/'), 1)]


In [323]:
# prepare n-gram features
unlabeled_firm_page_features = process_firms (set(df_predict.index.get_level_values('url')))

prediction_urls = unlabeled_firm_page_features.keys()

pred_corpus = []
for url in prediction_urls:
    pred_corpus.append (unlabeled_firm_page_features[url])

ubv_prediction_matrix = ubv.transform(pred_corpus)

ubv_prediction_matrix = ubv_prediction_matrix.toarray()
vocab = ubv.get_feature_names()
ubv_prediction_df = pd.DataFrame(ubv_prediction_matrix, columns=vocab)
ubv_prediction_df.index = prediction_urls
ubv_prediction_df.index.name='url'

In [333]:
print(ubv_prediction_df.shape)
print(df_predict.shape)

predict_merged = ubv_prediction_df.join(df_predict, how='right', rsuffix='_lbl')
print(predict_merged.shape)

# merge
X_test = predict_merged.iloc[:,1:len(ubv_prediction_df.columns)]
X_test['pages_in_domain'] = predict_merged['pages_in_domain']
X_test['is_sole_page'] = predict_merged['is_sole_page']
print (X.shape)
print (X_test.shape) # should be the same number of cols

X_test

(4147, 1472)
(4516, 4)
(4516, 1476)
(1031, 1473)
(4516, 1473)


firm_name,url,10m,13485,14001,1870s,1910s,1920s,2016,2020,20pages,3d,...,z18038e,zegage,zeno,zero,zoetis,zonne,公司简介,隆達電子,pages_in_domain,is_sole_page
Mitsubishi Metal Corporation,https://www.mitsubishicorp.com/jp/en/about/plan/,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,11.0,0.0
22nd Century Limited,http://www.xxiicentury.com/history/,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,0.0
Mitsubishi Metal Corporation,https://www.mitsubishicorp.com/jp/en/about/global/,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,11.0,0.0
Mitsubishi Metal Corporation,https://www.mitsubishicorp.com/jp/en/about/message/,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,11.0,0.0
Forest Concepts,http://forestconcepts.com/index.php?page=01002,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,0.0
ZENA TECHNOLOGIES,http://xena-technologies.com/program-management/,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,0.0
Kansai Paint Co.,https://www.kansai.com/about-us/corporate_data.html,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,6.0,0.0
W&Wsens Devices,https://www.wwsensdevices.com/,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0
Kansai Paint Co.,https://www.kansai.com/about-us/brand.html,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,6.0,0.0
Kansai Paint Co.,https://www.kansai.com/about-us/,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,6.0,0.0


In [341]:
# predict with newly constructed X
y_predicted = clf.predict(X_test)

In [355]:
# write to file
with open(ABOUT_DIR + 'about_predicted_and_labels.csv', mode='w') as about_file:
    about_writer = csv.writer(about_file, delimiter=',', quotechar='"')
    about_writer.writerow(['firm_name', 'url', 'is_about'])
    # output predicted values to file
    for fn, u, predicted_value in zip(X_test.index.get_level_values('firm_name'), X_test.index.get_level_values('url'), y_predicted):
        # print (fn + ' with url ' + u + ' has predicted value ' + str(predicted_value))
        about_writer.writerow([fn, u, predicted_value])
    # and the labeled ones too...
    for fn, u, labeled_value in zip(X.index.get_level_values('firm_name'), X.index.get_level_values('url'), y):
        # print (fn + ' with url ' + u + ' has predicted value ' + str(labeled_value))
        about_writer.writerow([fn, u, labeled_value])

## Extract data from mongodb
* Now that we know which pages are about pages, extract from mongodb and output for topic modeling
* For now, construct paragraphs from different pages by ordering urls by their length. In the future, might want to construct paragraphs in their 'natural' sequential order as they would appear on a home page or landing page
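
The length-based ordering can be sketched as follows: sort the about urls shortest first, then emit each landing page followed by the pages nested under it. This is a simplified illustration rather than the notebook's `get_ordered_about_urls`; `order_by_length_and_parent` is a hypothetical name, and the urls are taken from the output shown earlier.

```python
def order_by_length_and_parent(urls):
    """Shortest urls first; children follow the parent url they extend."""
    by_length = sorted(set(urls), key=lambda u: (len(u), u))
    ordered, seen = [], set()
    for parent in by_length:
        if parent in seen:
            continue
        ordered.append(parent)
        seen.add(parent)
        prefix = parent.rstrip('/') + '/'
        for child in by_length:
            if child not in seen and child.startswith(prefix):
                ordered.append(child)
                seen.add(child)
    return ordered

urls = [
    'https://www.kansai.com/about-us/corporate_data.html',
    'https://www.kansai.com/about-us/',
    'https://www.kansai.com/about-us/brand.html',
]
for u in order_by_length_and_parent(urls):
    print(u)
```

Length is only a proxy for reading order: it puts the landing page first, but siblings end up sorted by how long their names are, not by where they appear in the site menu.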

In [379]:
# combine both labeled and predicted frames
print (X_test.shape)
print(X.shape)

combined = pd.concat([X_test, X]) # DataFrame.append is deprecated in newer pandas releases
print (combined.shape)
print (len(y_predicted))
print (len(y))
abouts = pd.DataFrame(index=combined.index)

abouts['is_about'] = list(y_predicted) + list(y)
abouts = abouts.reset_index()
abouts

(4516, 1474)
(1031, 1474)
(5547, 1474)
4516
1031


Unnamed: 0,firm_name,url,is_about
0,Mitsubishi Metal Corporation,https://www.mitsubishicorp.com/jp/en/about/plan/,1.0
1,22nd Century Limited,http://www.xxiicentury.com/history/,1.0
2,Mitsubishi Metal Corporation,https://www.mitsubishicorp.com/jp/en/about/glo...,1.0
3,Mitsubishi Metal Corporation,https://www.mitsubishicorp.com/jp/en/about/mes...,0.0
4,Forest Concepts,http://forestconcepts.com/index.php?page=01002,0.0
5,ZENA TECHNOLOGIES,http://xena-technologies.com/program-management/,0.0
6,Kansai Paint Co.,https://www.kansai.com/about-us/corporate_data...,1.0
7,W&Wsens Devices,https://www.wwsensdevices.com/,0.0
8,Kansai Paint Co.,https://www.kansai.com/about-us/brand.html,1.0
9,Kansai Paint Co.,https://www.kansai.com/about-us/,1.0


In [383]:
# gather unique firm_names from mongodb
firm_names = set(abouts['firm_name'])
print (len(firm_names))
pp = pprint.PrettyPrinter()
pp.pprint(firm_names)

1050
{'22nd Century Limited',
 '3M Innovative Properties Company',
 'ABB AB',
 'AC International Inc.',
 'ACACIA RESEARCH GROUP LLC',
 'ACUCELA INC.',
 'ACell',
 'ADASA INC.',
 'ADVANCED INNOVATION CENTER LLC',
 'AFMODEL',
 'AGC Flat Glass North America',
 'AGFA-GEVAERT N.V.',
 'ALGETERNAL TECHNOLOGIES',
 'ALSTOM Technology Ltd',
 'AMPT',
 'APPLIED STEMCELL',
 'ARBOR THERAPEUTICS',
 'ASCENT SOLAR TECHNOLOGIES',
 'ASM America',
 'ASML Netherlands B.V.',
 'ASTUTE MEDICAL',
 'AT&T Corporation',
 'ATC Technologies',
 'ATOMERA INCORPORATED',
 'ATTOSTAT',
 'AVI BioPharma',
 'AVOGY',
 'AbbVie Inc.',
 'Abbott Molecular Inc.',
 'Abbott Point of Care Inc.',
 'Abengoa Bioenergy New Technologies',
 'Ablexis',
 'Access Business Group International LLC',
 'AccuRay Corporation',
 'Acorn Technologies',
 'Adaptive Biotechnologies Corp.',
 'Adeka Corporation',
 'Adhesives Research',
 'Adlens Beacon',
 'Adobe Systems Incorporated',
 'Adtran',
 'Advanced Analogic Technologies',
 'Advanced Aqua Group',
 'A

In [648]:
def get_ordered_about_urls (firm_name):
    urls = list (abouts.loc[(abouts['firm_name'] == firm_name) & (abouts['is_about'] == 1), 'url'])
    urls.sort(key = len)
    # print ('Original urls')
    # pp.pprint(urls)

    index = {}
    for url in urls:
        path_fragments = len(url.split('/'))
        added = False
        for i in range(1, path_fragments):
            key_phrase = url.rsplit('/', maxsplit=i)[0]
            if key_phrase in urls or (key_phrase + '/') in urls: 
                od = index.setdefault(key_phrase, OrderedDict())
                od[url] = 1
                added = True
                continue
        if not added:
            od = index.setdefault(url, OrderedDict())
            od[url] = 1
 
    
    # pp.pprint (index)
    return_urls = [] # OrderedDict((item, 1)for sublist in list(index.values()) for item in sublist)
    seen = set ()
    for key in index.keys():
        tree_urls = index[key]
        for fu in tree_urls:
            if fu not in seen:
                return_urls.append(fu)
                seen.add(fu)
    
    return return_urls

test_urls = get_ordered_about_urls ('Apple Inc.')
print ('Ordered urls')
pp.pprint (test_urls)

Ordered urls
[]


In [643]:
# remove html content
def is_javascript (x):
    match_string = r"(CDATA|return\s+true|return\s+false|getelementbyid|function|\w+\(.*?\);|\w{2,}[\\.|:]+\w{2,}|'\w+':\s+'\w+|\\|{|}|\r|\n|\/\/')"
    # capture CDATA; function declarations; function calls; word sequences separated by a period (e.g., denoting paths)
    matches = re.findall(match_string, x)
    words = x.split()
    if not words:
        return False # empty or whitespace-only input; avoids division by zero
    # flag the text when javascript-like matches exceed 10% of its word count
    return len(matches) / float(len(words)) > .10

def clean_page_content (text_list):
    # remove whatever we think is html
    removed_html = filter(lambda x: not( bool(BeautifulSoup(x, "html.parser").find()) ), text_list)
    # remove content that looks like javascript 
    removed_js = filter(lambda x: not (is_javascript(x)), removed_html)
    # add other checks here as needed

    return removed_js
    

# iterate through firm urls and return concatenated string
def get_url_content (urls): 
    running_text = ''
    for url in urls:
        print ('\tWorking on ' + url)
        result = col.find_one( {"url": url} )
        if result:
            clnd_text = clean_page_content(result['full_text'])
            clnd_text = '\n'.join(clnd_text)
            boilerpipe = None
            
            if 'body' in result:
                extractor = Extractor(extractor='DefaultExtractor', html = result['body'][0])
                lines = extractor.getText().replace(u'\xa0', u' ').split('\n')
                filtered = filter(lambda x: not re.match(r'^\s*$', x), lines)
                boilerpipe = '\n'.join(filtered)

            # TODO fix to split().  Counting characters currently 
            if boilerpipe and (len(boilerpipe) > .5 * len(clnd_text)):
                print ('\t\tUsing boilerplate')
                running_text += boilerpipe
            else:
                print ('\t\tUsing clnd_text')
                running_text += clnd_text
        else:
            print ('Cannot find url: ' + url)

    return running_text
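
The TODO in `get_url_content` points toward comparing word counts rather than character counts when choosing between the two extractions. A minimal sketch of that comparison, assuming both texts are already plain strings; `pick_text` and its `min_ratio` parameter are hypothetical and mirror the 0.5 threshold used in the cell.

```python
def pick_text(boilerpipe_text, cleaned_text, min_ratio=0.5):
    """Prefer the boilerpipe extraction unless it kept too few words."""
    if not boilerpipe_text:
        return cleaned_text
    n_bp = len(boilerpipe_text.split())
    n_cl = len(cleaned_text.split())
    return boilerpipe_text if n_bp > min_ratio * n_cl else cleaned_text

full = "Home About Contact We design lithium batteries for electric buses"
print(pick_text("We design lithium batteries for electric buses", full))
print(pick_text("Home", full))
```

Counting words makes the comparison less sensitive to long urls or unbroken strings that inflate character counts.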

In [644]:
# regex test 
regex = re.findall(r"(CDATA|return\s+true|return\s+false|getelementbyid|function|\w+\(.*?\);|\w{2,}[\\.|:]+\w{2,}|'\w+':\s+'\w+|\\|{|}|\r|\n|\/\/')", 
                   "CDATA function contact-us getelementbyid javascript.function linker:autoLink www.littlekidsinc.com fxnCall(param.param); email@dextr.us 'type': 'image' return true return false rev7bynlh\\u00252bvcgrjg\\ {height}") # last part is words sequences separated by punct
print (regex)

['CDATA', 'function', 'getelementbyid', 'javascript.function', 'linker:autoLink', 'www.littlekidsinc', 'fxnCall(param.param);', 'dextr.us', "'type': 'image", 'return true', 'return false', 'rev7bynlh\\u00252bvcgrjg', '\\', '{', '}']
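
These matches feed a simple ratio test: a line is treated as script when the number of pattern hits exceeds 10% of its whitespace-separated tokens. The sketch below reproduces that test with a shortened, case-insensitive pattern (an assumption made for readability, not the full expression used above) and returns `False` for empty input; `looks_like_script` is an illustrative name.

```python
import re

# shortened pattern for illustration; the notebook's expression covers more cases
SCRIPT_HINTS = re.compile(r"(CDATA|function|getelementbyid|\w+\(.*?\);|{|})",
                          re.IGNORECASE)

def looks_like_script(line, threshold=0.10):
    """True when script-like matches exceed threshold * number of tokens."""
    tokens = line.split()
    if not tokens:
        return False
    hits = SCRIPT_HINTS.findall(line)
    return len(hits) / len(tokens) > threshold

print(looks_like_script("function init() { return document.getElementById('x'); }"))
print(looks_like_script("Terra Tech Corp is a vertically integrated agriculture company."))
print(looks_like_script("   "))
```

Because the threshold is relative to line length, a single stray brace in a long paragraph does not trigger the filter, while short script fragments do.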


In [645]:
test_site_text = get_url_content (test_urls)
print (test_site_text)

	Working on https://www.terratechcorp.com/about
		Using boilerplate
	Working on https://www.terratechcorp.com/about/mission-vision-and-values
		Using boilerplate
A Greener World
Committed to cultivating and providing the highest quality cannabis and other agricultural products
Medical Cannabis
& Urban Agriculture
Terra Tech Corp is a vertically integrated cannabis-focused agriculture company. We’re pioneering the future by integrating the best of the natural world with technology to create sustainable solutions for medical cannabis production, extraction and distribution, plant science research and development, food production and Closed Environment Agriculture (CEA). Through this development, we have created relevant brands in both the cannabis and agriculture industries.
Committed to Cultivating and Providing
The Highest Quality Medical Cannabis
Through multiple subsidiaries in this space, we are committed to cultivating and providing the highest quality medical cannabis consistently

In [646]:
# standard firm cleaning regex
def clean_firm_name (firm):
    firm_clnd = re.sub(r'(\.|,| corporation| incorporated| llc| inc| international| gmbh| ltd)', '', firm, flags=re.IGNORECASE).rstrip()
    return firm_clnd
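
A quick check of the cleaning rule on firm names that appear in this notebook. The function is repeated so the snippet runs on its own; `to_filename` is a hypothetical helper that mirrors the slash replacement applied when the output files are written.

```python
import re

def clean_firm_name(firm):
    return re.sub(r'(\.|,| corporation| incorporated| llc| inc| international| gmbh| ltd)',
                  '', firm, flags=re.IGNORECASE).rstrip()

def to_filename(firm):
    # '/' cannot appear in a file name, so swap it for '|'
    return clean_firm_name(firm).replace('/', '|') + '.txt'

for name in ['Optodot Corporation', 'ACUCELA INC.', 'ASML Netherlands B.V.',
             'COVERIS FLEXIBLES US LLC']:
    print(name, '->', to_filename(name))
```

The suffix patterns are not anchored to the end of the name or to word boundaries, so a suffix-like fragment in the middle of a name (for example a word beginning with " inc") is removed as well.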

In [647]:
# run process_firm and write to file
pp = pprint.PrettyPrinter()
for firm_name in firm_names: 
    print ("Working on " + firm_name)
    about_urls = get_ordered_about_urls(firm_name)
    about_text = get_url_content (about_urls)
    
    if about_text: 
        firm_clnd = clean_firm_name(firm_name) # standard cleaning code throughout project
        file = firm_clnd.replace('/', '|') + '.txt' # '/' is not allowed in file names
        with io.open(DATA_DIR + file,'w',encoding='utf8') as f:
            f.write (about_text)
    else:
        print ("Couldn't find any text for firm!")

Working on E. Tech Incorporation
	Working on https://www.terratechcorp.com/about
		Using boilerplate
	Working on https://www.terratechcorp.com/about/mission-vision-and-values
		Using boilerplate
Working on COVERIS FLEXIBLES US LLC
	Working on http://www.coveris.com/company/
		Using boilerplate
	Working on http://www.coveris.com/company/about-us/
		Using boilerplate
Working on Little Kids
	Working on http://www.littlekidsinc.com/about-us
		Using boilerplate
Working on Neutronic Perpetual Innovations
	Working on http://npimobile.com/about-us/
		Using boilerplate
Working on Englewood Lab
	Working on http://englewoodlab.com/about-us/
		Using boilerplate
	Working on http://englewoodlab.com/about-us/history/
		Using boilerplate
Working on Optodot Corporation
	Working on http://optodot.com/about-us
		Using boilerplate
Working on Nippon Chemi-Con Corporation
	Working on http://www.chemi-con.co.jp/e/company/index.html
		Using boilerplate
Working on Cedar Ridge Research
	Working on http://cr-res

  ' Beautiful Soup.' % markup)


		Using boilerplate
	Working on http://cr-res.com/crr-history/
		Using clnd_text
Working on BROADCOM CORPORATION
	Working on https://www.broadcom.com/
		Using clnd_text
	Working on https://www.broadcom.com/company/about-us/
		Using clnd_text
Working on Boston Scientific Scimed
	Working on http://www.bostonscientific.com/en-US/about-us.html
		Using clnd_text
	Working on http://www.bostonscientific.com/en-US/about-us/awards.html
		Using boilerplate
	Working on http://www.bostonscientific.com/en-US/about-us/history.html
		Using clnd_text
	Working on http://www.bostonscientific.com/en-US/about-us/who-we-are.html
		Using clnd_text
Working on MonoSol
	Working on https://www.monosol.com/about/
		Using boilerplate
	Working on https://www.monosol.com/about/awards/
		Using boilerplate
	Working on https://www.monosol.com/about/monosol-af-ltd-tax-strategy/
		Using clnd_text
Working on ACell
	Working on https://acell.com/about-us/
		Using boilerplate
Working on Carver Scientific
''
Working on SunCu

  ' that document to Beautiful Soup.' % decoded_markup


Working on Marine Polymer Technologies
''
Working on Proterra Inc.
''
Working on Alcotek
	Working on https://alcotek.com/about-us/
		Using boilerplate
Working on ASML Netherlands B.V.
	Working on https://www.asml.com/company/who-we-are/en/s277?rid=51980
		Using clnd_text
	Working on https://www.asml.com/company/what-we-do/en/s277?rid=51981
		Using clnd_text
	Working on https://www.asml.com/company/our-history/en/s277?rid=51985
		Using clnd_text
	Working on https://www.asml.com/company/organization/en/s277?rid=51984
		Using clnd_text
	Working on https://www.asml.com/company/why-we-exist/en/s277?rid=51983
		Using clnd_text
	Working on https://www.asml.com/company/how-we-do-it/en/s277?rid=51979
		Using clnd_text
	Working on https://www.asml.com/company/in-a-nutshell/en/s277?rid=51978
		Using clnd_text
Working on mVerify Corporation
	Working on https://numverify.com/about
		Using boilerplate
Working on ACUCELA INC.
	Working on https://www.acucela.com/company/index.html
		Using clnd_text
	W

  ' that document to Beautiful Soup.' % decoded_markup


		Using boilerplate
	Working on http://www.innolux.com/Pages/EN/AboutUs/Milestones_EN.html
		Using clnd_text
	Working on http://www.innolux.com/Pages/EN/AboutUs/Company_Overview_EN.html
		Using clnd_text
	Working on http://www.innolux.com/Pages/EN/AboutUs/Honors_and_Awards_EN.html
		Using boilerplate
Working on Immunolight
	Working on http://www.immunolight.com/company/
		Using clnd_text
	Working on http://www.immunolight.com/company/blog/
		Using boilerplate
Working on Dana Corporation
	Working on http://www.dana.com/corporate-pages/history
		Using boilerplate
Working on GlaxoSmithKline Biologicals
	Working on https://www.gsk.com/en-gb/about-us/
		Using clnd_text
	Working on https://www.gsk.com/en-gb/about-us/vaccines/
		Using clnd_text
	Working on https://www.gsk.com/en-gb/about-us/our-history/
		Using clnd_text
	Working on https://www.gsk.com/en-gb/about-us/pharmaceuticals/
		Using clnd_text
	Working on https://www.gsk.com/en-gb/investors/about-gsk/
		Using clnd_text
	Working on htt

  ' Beautiful Soup.' % markup)


		Using clnd_text
	Working on https://www.thecableco.com/vibrapod_company.html
		Using clnd_text
	Working on https://www.thecableco.com/the_cable_company.html
		Using clnd_text
Working on Carestream Health
	Working on https://www.carestream.com/en/us/corporate
		Using clnd_text
	Working on https://www.carestream.com/en/us/corporate/company-history
		Using clnd_text
Working on Humanetics Corporation
	Working on http://www.humaneticsatd.com/about-us
		Using clnd_text
Working on Seiko Epson Corporation
	Working on https://global.epson.com/company/
		Using clnd_text
	Working on https://global.epson.com/company/glance/
		Using clnd_text
	Working on https://global.epson.com/innovation/vision/
		Using boilerplate
Working on Analog Devices
	Working on https://www.analog.com/en/about-adi.html
		Using boilerplate
	Working on https://www.analog.com:443/en/myhistory.html
		Using boilerplate
Working on Samsung SDI Co.
	Working on http://samsungsdi.com/about-sdi.html
		Using boilerplate
	Working on 

  ' that document to Beautiful Soup.' % decoded_markup


		Using boilerplate
	Working on http://colinsgrp.com/site/our-purpose
		Using boilerplate
Working on Total Marketing Services
	Working on https://www.total.com/en/info/company-websites
		Using clnd_text
	Working on https://www.total.com/en/group/identity/history
		Using clnd_text
	Working on https://www.total.com/en/group/identity/five-strong-values-embedded-our-dna
		Using clnd_text
Working on Bostik
	Working on https://www.bostik.com/our-company/
		Using boilerplate
	Working on https://www.bostik.com/our-company/about-arkema/
		Using clnd_text
Working on WOVN
	Working on http://www.wovns.com/about
		Using boilerplate
Working on Roche Diagnostics GmbH
	Working on https://www.accu-chek.com/about-us
		Using clnd_text
Working on U S MICROPOWER INC
	Working on https://micropower-group.com/about/about/
		Using boilerplate
Working on Cisco Technology
	Working on https://www.cisco.com/c/en/us/about.html
		Using clnd_text
Working on MAXLINEAR
	Working on http://www.maxlinear.com/company/about

  ' that document to Beautiful Soup.' % decoded_markup


		Using clnd_text
	Working on https://www.salon.com/2018/12/02/cnn-keeps-letting-guests-and-paid-commentators-lie-about-climate-scientists_partner/
		Using boilerplate
Working on Janssen Biotech
	Working on https://www.janssen.com/about
		Using boilerplate
	Working on https://www.janssen.com/company
		Using clnd_text
Working on Amtech Systems
	Working on http://amtechsystems.com/profile.htm
		Using boilerplate
Working on SolarLego Inc.
	Working on https://education.lego.com/en-us/about-us
		Using boilerplate
Working on Dresser-Rand Company
	Working on http://dresser-rand.com/company/
		Using clnd_text
	Working on http://dresser-rand.com/company/about-us/
		Using clnd_text
	Working on http://dresser-rand.com/company/ethics-hotline/
		Using clnd_text
Working on Johns Manville
	Working on https://www.jm.com/en/our-company/
		Using clnd_text
	Working on https://www.jm.com/en/our-company/CoreValues/
		Using clnd_text
	Working on https://www.jm.com/en/our-company/HistoryandHeritage/
		Using 

  ' that document to Beautiful Soup.' % decoded_markup


		Using boilerplate
	Working on http://pardev.com/company/our-philosophy.html
		Using boilerplate
	Working on http://pardev.com/company/industries-served.html
		Using boilerplate
Working on Voxtel
	Working on http://voxtel-inc.com/about-voxtel/
		Using boilerplate
Working on Semprius
	Working on https://www.semprius.com/about-us/
		Using clnd_text
Working on Toda Kogyo Corporation
	Working on http://www.todakogyo.co.jp/english/about/csr.html
		Using clnd_text
	Working on http://www.todakogyo.co.jp/english/about/index.html
		Using boilerplate
	Working on http://www.todakogyo.co.jp/english/about/history.html
		Using boilerplate
Working on System Biosciences
	Working on https://www.systembio.com/company/about/
		Using clnd_text
	Working on https://www.systembio.com/company/mission/
		Using clnd_text
Working on GENCO SCIENCES LLC
	Working on https://www.gpen.com/pages/about
		Using clnd_text
Working on BASF
	Working on https://www.basf.com/us/en/who-we-are.html
		Using boilerplate
	Working

  ' that document to Beautiful Soup.' % decoded_markup


		Using boilerplate
Working on Advanced Silicon Group
''
Working on ABB AB
	Working on https://new.abb.com/about
		Using clnd_text
	Working on https://new.abb.com/about/integrity
		Using clnd_text
	Working on https://new.abb.com/about/supplying
		Using clnd_text
	Working on https://new.abb.com/about/abb-in-brief
		Using clnd_text
Working on Cetac Technologies Inc.
	Working on http://www.teledynecetac.com/about-us/about-us
		Using boilerplate
	Working on http://www.teledynecetac.com/about-us/workshops
		Using clnd_text
Working on THYSSENKRUPP INDUSTRIAL SOLUTIONS GMBH
	Working on https://www.thyssenkrupp.com/en/company/
		Using boilerplate
	Working on https://www.thyssenkrupp.com/en/company/history/
		Using boilerplate
	Working on https://www.thyssenkrupp.com/en/company/innovation/
		Using clnd_text
	Working on https://www.thyssenkrupp.com/en/company/history/index/
		Using boilerplate
	Working on https://www.thyssenkrupp.com/en/company/history/the-logos/
		Using boilerplate
	Working on 

  ' that document to Beautiful Soup.' % decoded_markup


		Using boilerplate
Working on HM3 Energy
	Working on http://hm3energy.com/about/
		Using clnd_text
	Working on http://hm3energy.com/about/hm3-energy-achievements/
		Using boilerplate
Working on Integrated DNA Technologies
	Working on https://www.idtdna.com/pages/about/core-values
		Using boilerplate
Working on Sick AG
	Working on https://www.sick.com/de/en/sick-ags-company-profile/w/about/
		Using clnd_text
	Working on https://www.sick.com/de/en/about-sick/our-philosophy/w/our-philosophy/
		Using clnd_text
	Working on https://www.sick.com/de/en/about-sick/sicks-company-history/w/the-history-of-sick/
		Using clnd_text
	Working on https://www.sick.com/de/en/about-sick/research-and-development/w/research-and-development/
		Using clnd_text
Working on Performance Plants
	Working on http://performanceplants.com/corporate
		Using boilerplate
Working on Bruin Biometrics
	Working on http://bruinbiometrics.com/us/about-us
		Using clnd_text
	Working on http://bruinbiometrics.com/us/about-us/abou

  ' that document to Beautiful Soup.' % decoded_markup


		Using clnd_text
Working on J. E. WHITE
''
Working on iBio
	Working on https://ibioinc.com/index.php/company-background/disclaimer
		Using boilerplate
	Working on https://ibioinc.com/index.php/company-background/company-background-2
		Using clnd_text
Working on Canon Nanotechnologies
	Working on http://cnt.canon.com/overview/
		Using boilerplate
Working on Agilent Technologies
	Working on https://www.agilent.com/about/workingwa/index.html
		Using clnd_text
	Working on https://www.agilent.com/about/companyinfo/index.html
		Using boilerplate
	Working on https://www.agilent.com/about/features/raman-spectroscopy.html
		Using boilerplate
Working on PluroGen Therapeutics
	Working on http://plurogen.com/about-us/
		Using boilerplate
	Working on http://plurogen.com/about-us/about-dr-rodeheaver/
		Using boilerplate
Working on Bisco
''
Working on Antaya Technologies Corporation
	Working on https://www.antaya.com/company/
		Using boilerplate
Working on Life Technologies Corporation
''
Working on

  ' Beautiful Soup.' % markup)


		Using boilerplate
	Working on https://www.polystar.com/company/whistleblower/
		Using clnd_text
Working on SAINT-GOBAIN ABRASIFS
	Working on https://www.saint-gobain-abrasives.com/en-us/about-us
		Using clnd_text
	Working on https://www.saint-gobain-abrasives.com/en-us/proud-our-history
		Using clnd_text
Working on Green Extraction Technologies
	Working on https://www.greenextractiontechnologiesllc.com/about/company-background.php
		Using boilerplate
Working on Atonometrics
	Working on http://www.atonometrics.com/about/
		Using clnd_text
Working on Corporation for National Research Initiatives
	Working on http://cnri.reston.va.us/about_cnri.html
		Using boilerplate
Working on Nordic Technologies
	Working on http://nordic-technologies.com/about/
		Using boilerplate
	Working on http://nordic-technologies.com/drinking-water/about-drinking-water/
		Using boilerplate
Working on Alcon Research
	Working on https://www.alcon.com/about-us
		Using clnd_text
	Working on https://www.alcon.com/ab

  ' that document to Beautiful Soup.' % decoded_markup


		Using boilerplate
	Working on http://www.conversantip.com/about/history/
		Using boilerplate
Working on Ceramatec
	Working on https://www.ceramtec.com/about-us/
		Using boilerplate
Working on Federal Signal Corporation
	Working on https://www.federalsignal.com/who-we-are
		Using clnd_text
Working on ElectraTherm
	Working on https://electratherm.com/about/
		Using clnd_text
	Working on https://electratherm.com/about/bitzer-group/
		Using clnd_text
	Working on https://electratherm.com/about/mission-vision/
		Using boilerplate
Working on Nthdegree Technologies Worldwide Inc.
	Working on https://www.ndeg.com/about
		Using boilerplate
Working on Bigelow Aerospace
	Working on http://bigelowaerospace.com/pages/whoweare/
		Using boilerplate
Working on Envisionit LLC
	Working on https://www.envisionit.com/about-us
		Using boilerplate
	Working on https://www.envisionit.com/about-us/our-process
		Using clnd_text
	Working on https://www.envisionit.com/about-us/our-customers
		Using clnd_text
Wor

  ' that document to Beautiful Soup.' % decoded_markup


		Using clnd_text
	Working on https://www.shire.com/who-we-are/our-story/our-history
		Using boilerplate
	Working on https://www.shire.com/who-we-are/our-story/our-culture
		Using clnd_text
Working on Nanoridge Materials
	Working on http://nanoridge.com/about/
		Using boilerplate
Working on Brookhaven Science Associates LLC 
	Working on http://bsa-hq.org/about/about.php
		Using boilerplate
Working on Praxair S.T. Technology
	Working on https://www.praxair.com/our-company
		Using clnd_text
	Working on https://www.praxair.com/our-company/vision-and-values
		Using boilerplate
Working on Rockwell Collins
	Working on https://www.rockwellcollins.com:443/Our-Company.aspx
		Using clnd_text
Working on Tapwave
''
Working on OmniVision Technologies
	Working on https://www.ovt.com/company
		Using clnd_text
	Working on https://www.ovt.com/company/company-profile
		Using clnd_text
Working on SAMSUNG DISPLAY CO.
	Working on https://displaysolutions.samsung.com/about-us/our-vision
		Using boilerplate


		Using boilerplate
	Working on https://www.teradata.com.au/About-Us/Corporate-Governance
		Using clnd_text
Working on Dialight Corporation
	Working on https://www.dialight.com/about/
		Using boilerplate
Working on Toray Industries Inc. 
	Working on https://www.toray.com/aboutus/
		Using clnd_text
	Working on https://www.toray.com/aboutus/index.html
		Using clnd_text
	Working on https://www.toray.com/aboutus/outline.html
		Using clnd_text
	Working on https://www.toray.com/aboutus/philosophy.html
		Using boilerplate
	Working on https://www.toray.com/aboutus/vision/index.html
		Using boilerplate
Working on Sanken Electric Co.
	Working on https://www.sanken-ele.co.jp/en/corp/index.htm
		Using clnd_text
	Working on https://www.sanken-ele.co.jp/en/corp/corp08.htm
		Using boilerplate
Working on Sandia Corporation
	Working on https://www.sandia.gov/about/60-ways/
		Using boilerplate
	Working on https://www.sandia.gov/about/index.html
		Using clnd_text
	Working on https://www.sandia.gov/about/

  ' that document to Beautiful Soup.' % decoded_markup


		Using boilerplate
Working on KT Corporation
	Working on https://www.broadcom.com/
		Using clnd_text
	Working on https://www.broadcom.com/company/about-us/
		Using clnd_text
	Working on http://esolar.solar/about
		Using boilerplate
	Working on http://taurx.com/about-us/
		Using boilerplate
	Working on https://kinetechpower.com/
		Using boilerplate
	Working on https://elenion.com/summary/
		Using boilerplate
	Working on http://atomera.com/overview/
		Using boilerplate
	Working on https://sionpower.com/about/
		Using boilerplate
	Working on https://www.noble.org/about/
		Using boilerplate
	Working on https://www.atum.bio/company
		Using boilerplate
	Working on http://www.gliknik.com/about/
		Using clnd_text
	Working on http://www.gliknik.com/about/how-we-work/
		Using boilerplate
	Working on https://biocare.net/about-us/


  ' that document to Beautiful Soup.' % decoded_markup


		Using clnd_text
	Working on https://biocare.net/about-us/awards/
		Using clnd_text
	Working on https://biocare.net/about-us/company-history/
		Using clnd_text
	Working on http://gswps.com/AboutUs.aspx
		Using boilerplate
	Working on http://acorntech.com/company/
		Using boilerplate
	Working on http://acorntech.com/company/innovation/
		Using boilerplate
	Working on http://livetv.sx/enx/myteams/
		Using clnd_text
	Working on http://www.glycon.com/history
		Using boilerplate
	Working on https://www.angaza.com/about/
		Using boilerplate
	Working on http://www.nanogram.com/home/
		Using boilerplate
	Working on http://acinter.com/aboutus.htm
		Using boilerplate
	Working on https://www.asurion.com/about/
		Using clnd_text
	Working on https://www.asurion.com/about/smb/
		Using clnd_text
	Working on https://www.asurion.com/about/awards/
		Using clnd_text
	Working on https://eink.com/about-us.html
		Using boilerplate
	Working on http://www.starlightenergy.us/
		Using boilerplate
	Working on h

  ' that document to Beautiful Soup.' % decoded_markup


		Using boilerplate
	Working on https://www.ibm.com/ibm/us/en/?lnk=fab
		Using boilerplate
	Working on http://aadvancedaqua.com/About-Us.html
		Using boilerplate
	Working on http://bruinbiometrics.com/us/about-us
		Using clnd_text
	Working on http://bruinbiometrics.com/us/about-us/about-us-2
		Using clnd_text
	Working on http://bruinbiometrics.com/us/about-us/mission-values
		Using boilerplate
	Working on http://www.bestwaycorp.us/AboutUs/Page
		Using boilerplate
	Working on http://www.adhesivesresearch.com/about/
		Using clnd_text
	Working on http://nanotekinstruments.com/about-us/
		Using boilerplate
	Working on https://www.capsugel.com/about-capsugel
		Using clnd_text
	Working on https://usa.healthcare.siemens.com/about
		Using boilerplate
	Working on https://www.smartplanettech.com/about-us/
		Using clnd_text
	Working on https://www.armageddonenergy.com/company/
		Using boilerplate
	Working on https://global-sei.com/company/vision.html
		Using boilerplate
	Working on https://www.ba

  ' that document to Beautiful Soup.' % decoded_markup


		Using clnd_text
	Working on http://daylightsolutions.com/about-us/awards/
		Using clnd_text
	Working on https://www.brewerscience.com/about-us/awards/
		Using boilerplate
	Working on https://www.accessbusinessgroup.com/about-abg/
		Using boilerplate
	Working on https://www.gsk.com/en-gb/investors/about-gsk/
		Using clnd_text
	Working on https://www.bosch.com/research/about-research/
		Using boilerplate
	Working on https://www.bosch.com/research/about-research/roots/
		Using boilerplate
	Working on http://pidc.com/our-company/corporate-overview
		Using clnd_text
	Working on https://www.brewerscience.com/about-us/company/
		Using boilerplate
	Working on http://jsr.vccs.edu/who_we_are/about/access.aspx
		Using clnd_text
	Working on https://king-electric.com/about/king-difference/
		Using clnd_text
	Working on https://cellularresearchinstitute.com/about.html
		Using boilerplate
	Working on http://jsr.vccs.edu/who_we_are/about/default.aspx
		Using clnd_text
	Working on http://jsr.vccs.edu

  ' that document to Beautiful Soup.' % decoded_markup


		Using boilerplate
	Working on https://www.saint-gobain-abrasives.com/en-us/proud-our-history
		Using clnd_text
	Working on https://www.tdk-electronics.tdk.com/en/1192046/company/tdk-group
		Using clnd_text
	Working on https://www.hamamatsu.com/jp/en/our-company/about-crl/index.html
		Using clnd_text
	Working on https://www.tdk-electronics.tdk.com/en/1190708/company/tdk-europe
		Using clnd_text
	Working on https://www.exxonmobilchemical.com/en/exxonmobil-chemical/about-us
		Using clnd_text
	Working on https://www.hamamatsu.com/jp/en/our-company/at-a-glance/index.html
		Using boilerplate
	Working on https://www.hamamatsu.com/jp/en/our-company/our-philosophy/index.html
		Using boilerplate
Working on Magnachip Semiconductor
	Working on http://magnachip.com/aboutus/aboutus_sub01.html
		Using boilerplate
	Working on http://magnachip.com/aboutus/aboutus_sub08.html
		Using boilerplate
	Working on http://magnachip.com/aboutus/aboutus_sub06.html
		Using boilerplate
	Working on http://magnachip

  ' that document to Beautiful Soup.' % decoded_markup


		Using clnd_text
	Working on http://www.hitachi.com/information/180420/index.html
		Using boilerplate
Working on Smith & Nephew
	Working on http://www.smith-nephew.com/about-us/
		Using clnd_text
	Working on http://www.smith-nephew.com/about-us/what-we-do/trauma/
		Using clnd_text
	Working on http://www.smith-nephew.com/about-us/where-we-operate/asia/nepal-/
		Using clnd_text
	Working on http://www.smith-nephew.com/about-us/where-we-operate/europe/malta/
		Using clnd_text
	Working on http://www.smith-nephew.com/about-us/where-we-operate/asia/vietnam/
		Using clnd_text
	Working on http://www.smith-nephew.com/about-us/where-we-operate/europe/spain/
		Using clnd_text
	Working on http://www.smith-nephew.com/about-us/where-we-operate/africa/sudan/
		Using clnd_text
	Working on http://www.smith-nephew.com/about-us/where-we-operate/africa/kenya/
		Using clnd_text
	Working on http://www.smith-nephew.com/about-us/where-we-operate/americas/peru/
		Using clnd_text
	Working on http://www.smith-ne

  ' that document to Beautiful Soup.' % decoded_markup


		Using boilerplate
Working on LG Display Co.
	Working on http://lgdisplay.com/eng/company/vision
		Using clnd_text
	Working on http://lgdisplay.com/eng/company/history
		Using boilerplate
	Working on http://lgdisplay.com/eng/company/overview
		Using clnd_text
	Working on http://lgdisplay.com/eng/recruit/coreValues
		Using clnd_text
Working on FLOW CONTROL LLC.
	Working on https://www.flowcontrol.com/about.html
		Using boilerplate
Working on Access Business Group International LLC
	Working on https://www.accessbusinessgroup.com/about-abg/
		Using boilerplate
Working on LUCERA LABS
''
Working on Nanosys
	Working on http://www.nanosysinc.com/who-we-are
		Using boilerplate
	Working on http://www.nanosysinc.com/who-we-are/
		Using boilerplate
Working on Verliant Energy
	Working on http://www.verliantenergy.com/
		Using boilerplate
Working on Bird-B-Gone
	Working on https://www.birdbgone.com/about/
		Using boilerplate
Working on HARMAN INTERNATIONAL INDUSTRIES
	Working on https://www.harman

  ' that document to Beautiful Soup.' % decoded_markup


		Using clnd_text
	Working on https://www.bayer.com/en/profile-and-organization.aspx
		Using clnd_text
Working on HGST NETHERLANDS B.V.
	Working on https://www.westerndigital.com/company
		Using clnd_text
	Working on https://www.westerndigital.com/company/innovations/history
		Using boilerplate
Working on Biological Dynamics
''
Working on United Technologies Corporation
	Working on http://www.utc.com/Pages/Home.aspx
		Using clnd_text
	Working on http://www.utc.com/Who-We-Are/Pages/Key-Facts.aspx
		Using boilerplate
	Working on http://www.utc.com/Who-We-Are/Pages/Our-People.aspx
		Using boilerplate
	Working on http://www.utc.com/Who-We-Are/Pages/At-A-Glance.aspx
		Using boilerplate
	Working on http://www.utc.com/Who-We-Are/Research-Center/Pages/default.aspx
		Using clnd_text
Working on Renesas Electronics Corporation
	Working on https://www.renesas.com/us/en/about/company.html
		Using clnd_text
	Working on https://www.renesas.com/us/en/about/ir/financial.html
		Using clnd_text
	Working 

		Using clnd_text
Working on HTC Corporation
	Working on https://www.htc.com/us/about/
		Using boilerplate
Working on SunRun
''
Working on Siemens Medical Solutions USA
	Working on https://usa.healthcare.siemens.com/about
		Using boilerplate
Working on Nationwide Children's Hospital
	Working on https://www.nationwidechildrens.org/about-us
		Using boilerplate
	Working on https://www.nationwidechildrens.org/about-us/our-story
		Using clnd_text
Working on Texas Research International
	Working on https://tri-intl.com/about-tri/
		Using clnd_text
	Working on https://tri-intl.com/about-tri/history/
		Using clnd_text
	Working on https://tri-intl.com/about-tri/global-presence/
		Using clnd_text
	Working on https://tri-intl.com/services/technology-teams/
		Using clnd_text
Working on PQ CORPORATION
	Working on https://www.pqcorp.com/about-us
		Using clnd_text
Working on FUJIFILM Corporation
	Working on http://www.fujifilm.com/about/
		Using clnd_text
	Working on http://www.fujifilm.com/about/pro

  ' that document to Beautiful Soup.' % decoded_markup


		Using boilerplate
Working on COOK MEDICAL TECHNOLOGIES LLC
	Working on https://www.cookmedical.com/about/
		Using clnd_text
	Working on https://www.cookmedical.com/about/history/
		Using boilerplate
	Working on https://www.cookmedical.com/about/ethics-compliance/
		Using boilerplate
	Working on https://www.cookmedical.com/about/mission-and-values/
		Using clnd_text
Working on Banpil Photonics
	Working on http://banpil.com/vmv.htm
		Using clnd_text
Working on GE Healthcare Dharmacon
	Working on https://dharmacon.horizondiscovery.com/about-us/
		Using clnd_text
	Working on https://dharmacon.horizondiscovery.com/about-us/about-open-biosystems/
		Using clnd_text
	Working on https://dharmacon.horizondiscovery.com/about-us/genomics-discovery-initiative/
		Using boilerplate
Working on Solaire Generation LLC
''
Working on KLA-Tencor Corporation
	Working on https://www.kla-tencor.com/Company/fact-sheet.html
		Using clnd_text
	Working on https://www.kla-tencor.com/Company/procurement.html
		Us

  ' that document to Beautiful Soup.' % decoded_markup


		Using clnd_text
Working on Pendar Technologies
	Working on http://www.pendar.com/about.html
		Using boilerplate
Working on JFE STEEL CORPORATION
	Working on http://www.jfe-steel.co.jp/en/movie/
		Using clnd_text
	Working on http://www.jfe-steel.co.jp/en/company/about.html
		Using clnd_text
	Working on http://www.jfe-steel.co.jp/en/company/index.html
		Using clnd_text
	Working on http://www.jfe-steel.co.jp/en/company/steel.html
		Using clnd_text
	Working on http://www.jfe-steel.co.jp/en/company/philosophy.html
		Using boilerplate
Working on TT Technologies
	Working on http://www.tttechnologies.com/about/
		Using clnd_text
	Working on http://www.tttechnologies.com/underground-construction-company-inc-doubles-down-with-trenchless-tools-and-methods-to-install-gas-services-in-arizona/
		Using boilerplate
Working on Mitsubishi Metal Corporation
	Working on https://www.mitsubishicorp.com/jp/en/about/
		Using clnd_text
	Working on https://www.mitsubishicorp.com/jp/en/about/sub/


  ' that document to Beautiful Soup.' % decoded_markup


		Using clnd_text
	Working on https://www.mitsubishicorp.com/jp/en/about/plan/
		Using clnd_text
	Working on https://www.mitsubishicorp.com/jp/en/about/movie/
		Using clnd_text
	Working on https://www.mitsubishicorp.com/jp/en/about/global/
		Using clnd_text
	Working on https://www.mitsubishicorp.com/jp/en/about/profile/
		Using clnd_text
	Working on https://www.mitsubishicorp.com/jp/en/about/history/
		Using boilerplate
Working on Genzyme Corporation
	Working on https://www.sanofigenzyme.com/en/about-us
		Using boilerplate
	Working on https://www.sanofigenzyme.com/en/about-us/our-history
		Using boilerplate
	Working on https://www.sanofigenzyme.com/en/about-us/our-stories/paulas-MPS-I-Story
		Using boilerplate
Working on Kala Pharmaceuticals
	Working on https://kalarx.com/about-us/overview/
		Using clnd_text
Working on CELLTRION
	Working on http://celltrion.com/en/aboutus/ci.do
		Using clnd_text
	Working on http://celltrion.com/en/aboutus/historyAll.do
		Using boilerplate
	Working on h

  ' that document to Beautiful Soup.' % decoded_markup


		Using clnd_text
	Working on https://www.ecolab.com/about/industries-we-serve
		Using clnd_text
	Working on https://www.ecolab.com/about/industries-we-serve/retail
		Using clnd_text
	Working on https://www.ecolab.com/about/industries-we-serve/foodservice
		Using clnd_text
	Working on https://www.ecolab.com/about/industries-we-serve/hospitality
		Using clnd_text
Working on fybr
	Working on https://fybr.com/about-us/
		Using boilerplate
Working on Swagelok Company
	Working on https://www.swagelok.com:443/en/About
		Using clnd_text
	Working on https://www.swagelok.com:443/en/About/Our-Values
		Using clnd_text
Working on Chemtreat
	Working on http://www.chemtreat.com/about/
		Using boilerplate
Working on UOP LLC
	Working on https://www.phoenix.edu/about_us/title-ix.html
		Using clnd_text
	Working on https://www.phoenix.edu/about_us/trademark.html


  ' that document to Beautiful Soup.' % decoded_markup


		Using clnd_text
	Working on https://www.phoenix.edu/about_us/about_university_of_phoenix.html
		Using clnd_text
Working on Nutech Ventures
	Working on http://www.nutechventures.org/about-us/
		Using clnd_text
Working on Soraa
	Working on https://www.soraa.com/about
		Using boilerplate
	Working on https://www.soraa.com/learn/technology/lighting-revives-2-million-years-history
		Using clnd_text
Working on Innovation Hammer
	Working on http://ihammerllc.com/aboutus
		Using boilerplate
	Working on http://ihammerllc.com/AboutUs/tabid/88/Default.aspx
		Using boilerplate
Working on Arkema Inc.
	Working on https://www.arkema.com/en/arkema-group/profile/
		Using clnd_text
	Working on https://www.arkema.com/en/arkema-group/history/
		Using boilerplate
	Working on https://www.arkema.com/en/social-responsibility/vision-and-strategy/
		Using clnd_text
	Working on https://www.arkema.com/en/social-responsibility/vision-and-strategy/materiality-analysis/
		Using boilerplate
Working on NanoTech Lubri

  ' that document to Beautiful Soup.' % decoded_markup


		Using boilerplate
	Working on https://www.domtar.com/en/who-we-are/about-domtar/recognition
		Using boilerplate
Working on iNanoBio LLC
	Working on http://inanobio.com/aboutus.php
		Using boilerplate
Working on Suncore Photovoltaics
	Working on http://suncoreus.com/about-us/
		Using boilerplate
Working on Basell Polyolefine GmbH
	Working on https://www.lyondellbasell.com/en/about-us/
		Using clnd_text
	Working on https://www.lyondellbasell.com/en/about-us/history/
		Using boilerplate
	Working on https://www.lyondellbasell.com/en/about-us/fortune/
		Using clnd_text
	Working on https://www.lyondellbasell.com/en/about-us/who-we-are/
		Using boilerplate
	Working on https://www.lyondellbasell.com/en/about-us/company-investments/
		Using clnd_text
	Working on https://www.lyondellbasell.com/en/investors/company-earnings/
		Using clnd_text
Working on Andritz Inc.
	Working on https://www.andritz.com/group-en/about-us
		Using clnd_text
	Working on https://www.andritz.com/group-en/about-us/gr-q

  ' that document to Beautiful Soup.' % decoded_markup

		Using boilerplate
	Working on https://www.eisai.com/index.html?redirect_ir=management/index.html
		Using clnd_text
Working on Avery Dennison Corporation
	Working on http://www.averydennison.com/en/home/about-us.html
		Using clnd_text
	Working on http://www.averydennison.com/en/home/about-us/values.html
		Using clnd_text
	Working on http://www.averydennison.com/en/home/about-us/our-company.html
		Using clnd_text
	Working on http://www.averydennison.com/en/home/about-us/averydennisonfoundation.html
		Using boilerplate
Working on Cree
	Working on https://www.cree.com/about/
		Using clnd_text
	Working on https://www.cree.com/about/mission
		Using clnd_text
	Working on https://www.cree.com/about/history-and-milestones
		Using clnd_text
Working on Honeywell International Inc.
	Working on https://www.honeywell.com/who-we-are/overview
		Using boilerplate
	Working on https://www.honeywell.com/who-we-are/our-history
		Using boilerplate
Working on Northrop Grumman Systems Corporation
	Working o

  ' that document to Beautiful Soup.' % decoded_markup


		Using clnd_text
	Working on https://en.lumenco.ca/outdoor-lighting/floodlights/led-low-profile/
		Using clnd_text
Working on Harris Corporation
	Working on https://www.harris.com/about
		Using clnd_text
	Working on https://www.harris.com/about/the-harris-story
		Using clnd_text
	Working on https://www.harris.com/about/our-mission-and-values
		Using clnd_text
	Working on https://www.harris.com/timeline/the-history-of-harris-corporation
		Using clnd_text
Working on Kraton Polymers U.S. LLC
	Working on http://kraton.com/company/about.php
		Using boilerplate
	Working on http://kraton.com/company/values.php
		Using boilerplate
	Working on http://kraton.com/company/history.php
		Using boilerplate
Working on TEKNOR APEX COMPANY
	Working on https://www.teknorapex.com/about
		Using clnd_text
	Working on https://www.teknorapex.com/history
		Using boilerplate
Working on Ostendo Technologies
''
Working on Sanyo Electric Co.
	Working on https://www.panasonic.com/jp/corporate/profile/group-compani

  ' that document to Beautiful Soup.' % decoded_markup


		Using clnd_text
	Working on https://www.qorvo.com/about-us/awards
		Using clnd_text
	Working on https://www.qorvo.com/about-us/quality
		Using clnd_text
	Working on https://www.qorvo.com/about-us/our-history
		Using boilerplate
Working on Newdoll Enterprises LLC
	Working on http://newholdllc.com/key-values/
		Using boilerplate
	Working on http://newholdllc.com/who-we-are/
		Using clnd_text
Working on Locus Energy
	Working on https://locusenergy.com:443/about-locus
		Using boilerplate
	Working on https://locusenergy.com:443/about-genscape
		Using boilerplate
Working on Saint-Gobain Ceramics & Plastics
	Working on https://www.ceramicmaterials.saint-gobain.com/history
		Using boilerplate
Working on Inaeris Technologies
	Working on http://www.inaeristech.com/about-us.html
		Using boilerplate
Working on GE-Hitachi Nuclear Energy Americas LLC
	Working on https://nuclear.gepower.com/company-info/about-ge-hitachi
		Using boilerplate
Working on Intermolecular
	Working on https://intermolecula

  ' that document to Beautiful Soup.' % decoded_markup


		Using boilerplate
	Working on https://paymaster.com/p/company/network/
		Using clnd_text
Working on Dr. Reddy's Laboratories Ltd.
	Working on http://www.drreddys.com/investors/shares/equity-dividend-history/
		Using clnd_text
Working on Robert Bosch GmbH
	Working on https://www.bosch.com/our-company/
		Using clnd_text
	Working on https://www.bosch.com/our-company/our-history/
		Using clnd_text
	Working on https://www.bosch.com/research/about-research/
		Using boilerplate
	Working on https://www.bosch.com/research/about-research/roots/
		Using boilerplate
Working on W&Wsens Devices
''
Working on QUALCOMM Incorporated
	Working on https://www.qualcomm.com/company
		Using boilerplate
	Working on https://www.qualcomm.com/company/about
		Using clnd_text
Working on Baxter Healthcare SA
	Working on https://www.baxter.com/our-story
		Using clnd_text
	Working on https://www.baxter.com/our-story/our-history
		Using boilerplate
Working on Cristal USA Inc.
	Working on http://www.cristal.com/chine

  ' that document to Beautiful Soup.' % decoded_markup


		Using clnd_text
	Working on https://www.mitsubishielectric.com/en/about/rd/index.html
		Using clnd_text
	Working on https://www.mitsubishielectric.com/en/about/rd/ip/index.html
		Using clnd_text
	Working on https://www.mitsubishielectric.com/en/about/history/index.html
		Using clnd_text
	Working on https://www.mitsubishielectric.com/en/about/mission/index.html
		Using clnd_text
	Working on https://www.mitsubishielectric.com/en/about/strategy/index.html
		Using boilerplate
	Working on https://www.mitsubishielectric.com/en/about/businesses/index.html
		Using clnd_text
	Working on https://www.mitsubishielectric.com/en/about/rd/advance/index.html
		Using clnd_text
Working on Johnson Matthey PLC
	Working on https://matthey.com/about-us
		Using clnd_text
	Working on https://matthey.com/about-us/our-values
		Using clnd_text
	Working on https://matthey.com/about-us/what-we-do
		Using clnd_text
	Working on https://matthey.com/about-us/our-strategy
		Using boilerplate
	Working on https://matth

  ' that document to Beautiful Soup.' % decoded_markup


		Using clnd_text
	Working on http://taiyo-hd.co.jp/en/about/group/
		Using clnd_text
	Working on http://taiyo-hd.co.jp/en/about/history/
		Using clnd_text
	Working on http://taiyo-hd.co.jp/en/about/overview/
		Using clnd_text
	Working on http://taiyo-hd.co.jp/en/about/philosophy/
		Using clnd_text
	Working on http://taiyo-hd.co.jp/en/group/thou/
		Using clnd_text
	Working on http://taiyo-hd.co.jp/en/group/ink/history/
		Using clnd_text
Working on ACACIA RESEARCH GROUP LLC
	Working on http://acaciaresearch.com/corporate-profile/
		Using clnd_text
Working on Tessera
''
Working on SolarReserve Technology
	Working on https://solarreserve.com/en/about
		Using clnd_text
Working on Alexion Pharmaceuticals
	Working on http://alexion.com/research-development
		Using boilerplate
	Working on http://alexion.com/about-alexion-pharmaceuticals


  ' that document to Beautiful Soup.' % decoded_markup


		Using boilerplate
	Working on http://alexion.com/about-alexion-pharmaceuticals/history
		Using boilerplate
	Working on http://alexion.com/en/about-alexion-pharmaceuticals
		Using boilerplate
Working on Milliken & Company
	Working on http://www.milliken.com/en-us/ourcompany/Pages/company.aspx
		Using clnd_text
	Working on http://www.milliken.com/en-us/ourcompany/about-us/Pages/about-us.aspx
		Using clnd_text
	Working on http://www.milliken.com/en-us/ourcompany/about-us/Pages/our-values.aspx
		Using boilerplate
Working on Zephyr Energy Systems LLC
	Working on http://zephyrenergy.com/?page_id=1057
		Using boilerplate
Working on SanDisk Technologies LLC
	Working on https://www.sandisk.com/about
		Using clnd_text
Working on Rio Grande Valley Sugar Growers
	Working on https://www.rgvsugar.com/history


  ' that document to Beautiful Soup.' % decoded_markup


		Using boilerplate
	Working on https://www.rgvsugar.com/company
		Using boilerplate
Working on Clearside Biomedical
	Working on http://www.clearsidebio.com/aboutus.htm
		Using clnd_text
Working on SD Technologies
	Working on http://www.sd-techs.com/about-us.html
		Using clnd_text
Working on O.B.I. Inc.
	Working on https://obi.org/about-us/
		Using boilerplate
Working on Magnolia Optical Technologies
	Working on http://magnoliaoptical.com/index.php?rt=about/index
		Using boilerplate
Working on Newlans
	Working on http://newlans.com/about.html
		Using boilerplate
Working on Bio-Rad Laboratories
	Working on http://www.bio-rad.com/en-us/corporate/about-bio-rad?ID=1003
		Using clnd_text
Working on Hynix Semiconductor Inc.
	Working on http://skhynix.com/eng/about/global.jsp
		Using boilerplate
	Working on http://skhynix.com/eng/about/history1980.jsp
		Using clnd_text
	Working on http://skhynix.com/eng/about/history2010.jsp
		Using clnd_text
	Working on http://skhynix.com/eng/about/history20

  ' that document to Beautiful Soup.' % decoded_markup


		Using boilerplate
	Working on https://monsanto.com/company/history/
		Using boilerplate
Working on Interface Performance Materials
	Working on http://interfacematerials.com/about/
		Using clnd_text
	Working on http://interfacematerials.com/about/our-company/
		Using clnd_text
	Working on http://interfacematerials.com/about/corporate-strategy-brand-promise/
		Using boilerplate
Working on Battelle Memorial Institute
	Working on https://www.battelle.org/about-us
		Using clnd_text
Working on Pentair Thermal Management LLC
	Working on https://www.nventthermal.com/about-us/index.aspx
		Using boilerplate
Working on Everspin Technologies
	Working on https://www.everspin.com/our-company
		Using clnd_text
Working on Nikon Corporation
	Working on https://www.nikon.com/about/
		Using clnd_text
	Working on https://www.nikon.com/about/corporate/
		Using clnd_text
	Working on https://www.nikon.com/about/sp/universcale/
		Using clnd_text
Working on Entegris
	Working on https://www.entegris.com/conte

  ' that document to Beautiful Soup.' % decoded_markup


		Using clnd_text
Working on FPInnovations
	Working on https://fpinnovations.ca/about-us/Pages/our-network.aspx
		Using clnd_text
	Working on https://fpinnovations.ca/about-us/pages/our-network.aspx
		Using clnd_text
Working on The Eastern Co.
	Working on http://easterncompany.com/corporate-profile.php
		Using boilerplate
Working on Synaptic Research
''
Working on United Microelectronics Corp.
	Working on http://www.umc.com/English/about/a.asp
		Using boilerplate
	Working on http://www.umc.com/English/about/b.asp
		Using clnd_text
	Working on http://www.umc.com/English/about/c.asp
		Using clnd_text
	Working on http://www.umc.com/English/about/index.asp
		Using boilerplate
Working on Suganit Systems
''
Working on AGFA-GEVAERT N.V.
	Working on http://agfa.com/corporate/about-us/history/
		Using boilerplate
	Working on http://agfa.com/corporate/about-us/technology/
		Using clnd_text
Working on Stora Enso Oyj
	Working on https://www.storaenso.com/en/about-stora-enso
		Using boilerplate
Wor

  ' that document to Beautiful Soup.' % decoded_markup


		Using clnd_text
	Working on https://www.csem.ch/About/History
		Using boilerplate
	Working on https://www.csem.ch/About/Start-ups
		Using boilerplate
	Working on https://www.csem.ch/About/Certifications
		Using boilerplate
	Working on https://www.csem.ch/About/Socialresponsibility
		Using boilerplate
	Working on https://www.csem.ch/About/MissionVisionStrategy
		Using boilerplate
	Working on https://www.csem.ch/Vision
		Using boilerplate
Working on NOVA Chemicals (International) S.A.
	Working on http://www.novachem.com/Pages/company/about-us.aspx
		Using boilerplate
Working on NeoPhotonics Corporation
	Working on https://www.neophotonics.com/company/
		Using boilerplate
	Working on https://www.neophotonics.com/company-history/
		Using boilerplate
Working on Tufts Medical Center
	Working on https://www.tuftsmedicalcenter.org/About-Us
		Using clnd_text
	Working on https://www.tuftsmedicalcenter.org/About-Us/History
		Using boilerplate
	Working on https://www.tuftsmedicalcenter.org/About

  ' that document to Beautiful Soup.' % decoded_markup


		Using boilerplate
Working on Smart Planet Technologies
	Working on https://www.smartplanettech.com/about-us/
		Using clnd_text
Working on NOK Corporation
	Working on http://www.nok.co.jp/en/company/profile.html
		Using clnd_text
	Working on http://www.nok.co.jp/en/company/history.html
		Using boilerplate
Working on Terra Caloric
	Working on https://wellconnectgeo.com/about-us/
		Using boilerplate
Working on Integrated Solar Technology
	Working on http://suntegrasolar.com/about-suntegra
		Using boilerplate
Working on Furukawa Electric Co.
	Working on https://www.furukawa.co.jp/en/company/
		Using clnd_text
	Working on https://www.furukawa.co.jp/en/company/outline.html
		Using clnd_text
	Working on https://www.furukawa.co.jp/en/company/history.html
		Using boilerplate
	Working on https://www.furukawa.co.jp/en/company/hereandthere/
		Using clnd_text
	Working on https://www.furukawa.co.jp/en/rd/profile/
		Using clnd_text
	Working on https://www.furukawa.co.jp/en/rd/vision.html
		Using cl

		Using clnd_text
	Working on https://www.celgene.com/about/our-story/
		Using clnd_text
	Working on https://www.celgene.com/about/our-culture/
		Using clnd_text
	Working on https://www.celgene.com/about/our-employees/
		Using clnd_text
	Working on https://www.celgene.com/about/awards-recognition/
		Using clnd_text
Working on Hitachi High-Technologies Corporation
	Working on https://www.hitachi-hightech.com/global/about/
		Using clnd_text
	Working on https://www.hitachi-hightech.com/global/about/data/
		Using boilerplate
	Working on https://www.hitachi-hightech.com/global/about/corporate/
		Using clnd_text
Working on Heat Seal LLC
	Working on https://heatsealco.com/about-ampak
		Using clnd_text
	Working on https://heatsealco.com/about-heatseal
		Using boilerplate
Working on MICROMIDAS
	Working on https://www.originmaterials.com/about-us
		Using clnd_text
Working on DuPont Teijin Films U.S. Limited Partnership
	Working on https://usa.dupontteijinfilms.com/about/
		Using boilerplate
	Wor

  ' that document to Beautiful Soup.' % decoded_markup


		Using boilerplate
	Working on https://www.sabic.com/en/about/ehss
		Using clnd_text
	Working on https://www.sabic.com/en/about/our-brand
		Using clnd_text
	Working on https://www.sabic.com/en/about/our-vision
		Using clnd_text
	Working on https://www.sabic.com/en/about/ehss/environment
		Using clnd_text
Working on INFINEUM INTERNATIONAL LIMITED
	Working on https://www.infineum.com/en/about-us/
		Using clnd_text
	Working on https://www.infineum.com/en/about-us/safety/
		Using clnd_text
	Working on https://www.infineum.com/en/about-us/history/
		Using boilerplate
	Working on https://www.infineum.com/en/about-us/overview/
		Using boilerplate
	Working on https://www.infineum.com/en/about-us/our-brand/
		Using clnd_text
	Working on https://www.infineum.com/en/careers/our-teams/
		Using boilerplate
Working on 22nd Century Limited
	Working on http://www.xxiicentury.com/
		Using clnd_text
	Working on http://www.xxiicentury.com/history/
		Using boilerplate
	Working on http://www.xxiicentury.c

  ' that document to Beautiful Soup.' % decoded_markup


		Using clnd_text
Working on Zeno Semiconductor Inc
	Working on https://www.zenosemi.com
		Using boilerplate
Working on Allertein Therapeutics
	Working on https://www.allergytherapeutics.com/about-us/overview/
		Using clnd_text
	Working on https://www.allergytherapeutics.com/about-us/our-markets/
		Using clnd_text
	Working on https://www.allergytherapeutics.com/about-us/mission-and-vision/
		Using clnd_text
	Working on https://www.allergytherapeutics.com/about-us/how-we-create-value/
		Using clnd_text
Working on DECA Technologies Inc.
	Working on http://www.decatechnologies.com/about-us/
		Using boilerplate
	Working on http://www.decatechnologies.com/deca-technologies-transforming-electronic-interconnect/decas-wlcsp-its-about-time/
		Using boilerplate
Working on Epizyme
	Working on http://www.epizyme.com/about-us/overview-history/
		Using boilerplate
Working on Pacific Industrial Development Corporation
	Working on http://pidc.com/our-company/corporate-overview
		Using clnd_text
Workin

  ' that document to Beautiful Soup.' % decoded_markup


		Using boilerplate
	Working on http://pdf.com/investors-corporate-overview
		Using boilerplate
Working on Seetron Inc.
	Working on https://www.seetron.com/about.html
		Using boilerplate
Working on Pinnacle Technology
	Working on http://pinnacle-technology.com/about.php
		Using boilerplate
	Working on http://pinnacle-technology.com/software-development.php
		Using boilerplate
Working on The Charles Stark Draper Laboratory
	Working on https://www.draper.com/about
		Using boilerplate
Working on Disney Enterprises
	Working on https://www.thewaltdisneycompany.com/about/
		Using boilerplate
Working on Intuitive Surgical Operations
	Working on https://www.intuitive.com/en/about-us/company
		Using boilerplate
Working on Gestion Ultra International Inc.
	Working on https://www.luminultra.com/about/
		Using clnd_text
Working on ATOMERA INCORPORATED
	Working on http://atomera.com/overview/
		Using boilerplate
Working on Tokyo Ohka Kogyo Co.
	Working on https://www.tok.co.jp/speng/company
		Using

  ' that document to Beautiful Soup.' % decoded_markup


		Using clnd_text
Working on GEN-PROBE INCORPORATED
	Working on https://www.hologic.com/about
		Using clnd_text
	Working on https://www.hologic.com/about/medical-aesthetics
		Using clnd_text
	Working on https://www.hologic.com/about/hologic-highlights
		Using clnd_text
	Working on https://www.hologic.com/about/diagnostic-solutions
		Using clnd_text
	Working on https://www.hologic.com/about/gyn-surgical-solutions
		Using boilerplate
	Working on https://www.hologic.com/about/breast-skeletal-health-solutions
		Using clnd_text
Working on Delta Electronics
	Working on http://www.deltaww.com/about/company.aspx?secID=5&pid=0&tid=0&hl=en-US
		Using boilerplate
	Working on http://www.deltaww.com/about/milestone.aspx?secID=5&pid=5&tid=0&hl=en-US
		Using clnd_text
	Working on http://www.deltaww.com/about/innovation2.aspx?secID=5&pid=4&tid=0&hl=en-US
		Using boilerplate
Working on Canon
	Working on https://www.usa.canon.com/internet/portal/us/home/about/
		Using clnd_text
	Working on https://www.u

		Using boilerplate
	Working on https://www.mcelroy.com/aboutus_corp.htm
		Using boilerplate
	Working on https://www.mcelroy.com/en/aboutus_corp.htm
		Using boilerplate
Working on Stanley Electric Co.
	Working on http://www.stanley.co.jp/e/company/
		Using clnd_text
Working on Combined Energies
	Working on http://combined-energies.com/about-us
		Using boilerplate
Working on INVISTA North America S.a.r.l.
	Working on https://www.invista.com/about/who-we-are
		Using boilerplate
	Working on https://www.invista.com/About/Who-We-Are
		Using boilerplate
Working on Transposagen Biopharmaceuticals
	Working on http://www.transposagenbio.com/about-us
		Using boilerplate
	Working on http://www.transposagenbio.com/about-us/our-technology
		Using boilerplate
Working on NetApp
	Working on https://www.netapp.com/us/company/about-netapp/index.aspx
		Using clnd_text
Working on GROW ENERGY
	Working on http://www.growenergy.org/company/
		Using boilerplate
Working on Sangamo BioSciences
	Working on https

  ' that document to Beautiful Soup.' % decoded_markup


		Using clnd_text
	Working on https://www.accuray.com/who-we-are/locations/
		Using clnd_text
	Working on https://www.accuray.com/software/idms-data-management/
		Using boilerplate
Working on Synthetic Genomics
	Working on https://www.syntheticgenomics.com/company/
		Using boilerplate
Working on VINDICO NANOBIO TECHNOLOGY INC.
''
Working on Eaton Corporation
	Working on http://www.eaton.com/us/en-us/company.html
		Using clnd_text
	Working on http://www.eaton.com/us/en-us/company/about-us.html
		Using clnd_text
Working on Phillips 66 Company
	Working on https://www.phillips66.com/about
		Using boilerplate
	Working on https://www.phillips66.com/customers
		Using clnd_text
Working on MICROSOFT TECHNOLOGY LICENSING
	Working on https://www.microsoft.com/en-us/about
		Using clnd_text
Working on SEIKO NPC Corporation
	Working on http://www.npc.co.jp/en/corp/
		Using clnd_text
	Working on http://www.npc.co.jp/en/corp/vision/
		Using clnd_text
	Working on http://www.npc.co.jp/en/corp/profile/
	

  ' that document to Beautiful Soup.' % decoded_markup


		Using clnd_text
	Working on http://starsource.com/about/?39.html
		Using clnd_text
	Working on http://starsource.com/about/?43.html
		Using clnd_text
	Working on http://starsource.com/about/?45.html
		Using clnd_text
	Working on http://starsource.com/about/?41.html
		Using clnd_text
	Working on http://starsource.com/about/?33.html
		Using clnd_text
Working on Vascular BioSciences
	Working on http://vascularbiosciences.com/about-vbs/
		Using clnd_text
Working on Electrix
	Working on https://www.electrixillumination.com/architectural-lighting-company
		Using clnd_text
	Working on https://www.electrixillumination.com/architectural-lighting-company/philosophy
		Using clnd_text
	Working on https://www.electrixillumination.com/architectural-lighting-company/our-history
		Using clnd_text
	Working on https://www.electrixillumination.com/architectural-lighting-company/news/2016-lumen-awards
		Using clnd_text
Working on JSR Corporation
	Working on http://jsr.vccs.edu/who_we_are/about/access.as

  ' that document to Beautiful Soup.' % decoded_markup


		Using boilerplate
	Working on https://www.gene.com/media/company-information
		Using clnd_text
Working on Two Blades Foundation
	Working on http://2blades.org/about-2blades/
		Using boilerplate
Working on Molecular Rebar Design
''
Working on OPTERRA ENERGY SERVICES
	Working on https://engieservices.us/about-us/
		Using boilerplate
Working on Weatherford Canada Partnership
	Working on https://www.weatherford.com/en/about-us/
		Using boilerplate
	Working on https://www.weatherford.com/en/about-us/companies/
		Using boilerplate
	Working on https://www.weatherford.com/en/about-us/resource-hub/
		Using boilerplate
Working on Instron Corporation 
	Working on http://www.instron.us/en-us/our-company?region=North%20America
		Using clnd_text
Working on Sumitomo Electric Industries
	Working on https://global-sei.com/company/120th/
		Using boilerplate
	Working on https://global-sei.com/company/vision.html
		Using boilerplate
Working on GlassPoint Solar
	Working on https://www.glasspoint.com/abou

  ' that document to Beautiful Soup.' % decoded_markup


		Using boilerplate
	Working on https://www.heraeus.com/us/group/about_heraeus/facts_and_figures/facts_and_figures.aspx
		Using clnd_text
	Working on https://www.heraeus.com/us/group/about_heraeus/about_heraeus_at_a_glance/about_heraeus.aspx
		Using boilerplate
	Working on https://www.heraeus.com/us/group/careers/your_profile/students/students_overview/students.aspx
		Using boilerplate
Working on Pacific Biosciences of California
	Working on https://www.pacb.com/company/about-us/
		Using boilerplate
Working on Adynxx
	Working on http://www.adynxx.com/company/
		Using boilerplate
Working on Amkor Technology
	Working on https://amkor.com/company-history/
		Using boilerplate
Working on AVI BioPharma
	Working on https://www.sarepta.com/our-company
		Using clnd_text
Working on FutureWei Technologies
	Working on https://www.huawei.com/us/about-huawei
		Using clnd_text
	Working on https://www.huawei.com/us/about-huawei/corporate-information
		Using boilerplate
Working on Microchips Biotech
	W

  ' that document to Beautiful Soup.' % decoded_markup


		Using boilerplate
Working on Transtron Solutions LLC
	Working on http://www.transtron.com/en/company/naibu.html
		Using boilerplate
	Working on http://www.transtron.com/en/company/rinen.html
		Using boilerplate
	Working on http://www.transtron.com/en/company/branch.html
		Using boilerplate
	Working on http://www.transtron.com/en/company/history.html
		Using boilerplate
	Working on http://www.transtron.com/en/company/outline.html
		Using boilerplate
	Working on http://www.transtron.com/en/company/compliance.html
		Using boilerplate
Working on DePuy Synthes Products
	Working on https://www.depuysynthes.com/about
		Using clnd_text
	Working on https://www.depuysynthes.com/about/corporate-information/california-compliance
		Using clnd_text
Working on Kao Corporation
''
Working on Abbott Point of Care Inc.
''
Working on Canon Kabushiki Kaisha
	Working on https://global.canon/en/about/
		Using clnd_text
	Working on https://global.canon/en/about/index.html
		Using clnd_text
	Working on https

  ' that document to Beautiful Soup.' % decoded_markup


		Using clnd_text
Working on Universal Leaf Tobacco Co.
	Working on http://www.universalcorp.com/AboutUs
		Using clnd_text
	Working on http://www.universalcorp.com/AboutUs/History
		Using boilerplate
	Working on http://www.universalcorp.com/AboutUs/NextHundred
		Using boilerplate
	Working on http://www.universalcorp.com/AboutUs/CoreBeliefs
		Using boilerplate
	Working on http://www.universalcorp.com/AboutUs/CorpLeadership
		Using boilerplate
	Working on http://www.universalcorp.com/OurCompany
		Using boilerplate
	Working on http://www.universalcorp.com/OurCompany/UniversalIngredients
		Using boilerplate
Working on HTS
	Working on http://www.htstechnologies.com/about
		Using boilerplate
	Working on http://www.htstechnologies.com/about/
		Using boilerplate
Working on Hunter Douglas Inc.
''
Working on GOJO Industries
	Working on http://gojo.com/en/About-GOJO?sc_lang=en
		Using clnd_text
	Working on http://gojo.com/en/About-GOJO/History?sc_lang=en


  ' that document to Beautiful Soup.' % decoded_markup


		Using boilerplate
Working on Biocon Limited
	Working on http://biocon.com/biocon_aboutus.asp
		Using boilerplate
	Working on http://biocon.com/biocon_aboutus_factsheet.asp
		Using boilerplate
	Working on http://biocon.com/biocon_aboutus_businesses.asp
		Using clnd_text
Working on Xintec Inc.
	Working on http://www.xintec.com.tw/eng/AX_Introduction.aspx
		Using boilerplate
	Working on http://www.xintec.com.tw/eng/IR_Company_Profile.aspx
		Using clnd_text
	Working on http://www.xintec.com.tw/eng/AX_Core-Value_Vision.aspx
		Using clnd_text
Working on Dermazone Solutions
''
Working on SPICE SOLAR
	Working on http://www.spicesolar.com/about-spice-solar/
		Using boilerplate
Working on Incyte Holdings Corporation
	Working on https://www.incyte.com/who-we-are/biopharmaceutical-research
		Using boilerplate
Working on Agena Bioscience
''
Working on Iogen Corporation
	Working on http://iogen.ca/about-iogen/index.html
		Using boilerplate
Working on Silicon Genesis Corporation
	Working on http://

  ' that document to Beautiful Soup.' % decoded_markup


		Using boilerplate
Working on WAFERTECH
	Working on http://www.wafertech.com/en/csr/
		Using clnd_text
	Working on http://www.wafertech.com/en/csr/supply.html
		Using boilerplate
	Working on http://www.wafertech.com/en/about/
		Using clnd_text
	Working on http://www.wafertech.com/en/about/values.html
		Using boilerplate
	Working on http://www.wafertech.com/en/about/quality.html
		Using clnd_text
	Working on http://www.wafertech.com/en/about/mission.html
		Using clnd_text
	Working on http://www.wafertech.com/en/about/supplier.html
		Using boilerplate
	Working on http://www.wafertech.com/en/foundry/company.html
		Using clnd_text
Working on Synopsys
	Working on https://www.synopsys.com/company.html
		Using clnd_text
	Working on https://www.synopsys.com/community/snug/about-snug.html
		Using clnd_text
Working on UT-Battelle
	Working on https://ut-battelle.org/director.shtml
		Using boilerplate
Working on Fluidigm Corporation
	Working on https://www.fluidigm.com/about/aboutfluidigm
		Using

  ' that document to Beautiful Soup.' % decoded_markup


		Using boilerplate
Working on Autonomic Materials
	Working on https://www.autonomicmaterials.com/Our-Company
		Using clnd_text
Working on Daikin Industries
	Working on https://www.daikin.com/corporate/index.html
		Using clnd_text
	Working on https://www.daikin.com/csr/environment/vision.html
		Using clnd_text
	Working on https://www.daikin.com/corporate/overview/index.html
		Using clnd_text
	Working on https://www.daikin.com/about/corporate/tic/index.html
		Using clnd_text
Working on Red Hat
	Working on https://www.redhat.com/en/about
		Using clnd_text
	Working on https://www.redhat.com/en/about/company
		Using boilerplate
	Working on https://www.redhat.com/en/about/feedback
		Using clnd_text
	Working on https://www.redhat.com/en/about/newsroom
		Using clnd_text
	Working on https://www.redhat.com/en/about/privacy-policy
		Using boilerplate
Working on Rebellion Photonics
	Working on https://rebellionphotonics.com/about-us.html
		Using clnd_text
Working on RADIANCE SOLAR
	Working on htt

  ' that document to Beautiful Soup.' % decoded_markup


		Using clnd_text
Working on Solvay Specialty Polymers USA
	Working on https://www.solvay.com/en/our-company
		Using clnd_text
	Working on https://www.solvay.com/en/company-information/history
		Using clnd_text
	Working on https://www.solvay.com/en/company-information/procurement
		Using clnd_text
	Working on https://www.solvay.com/en/company-information/our-strategy
		Using clnd_text
	Working on https://www.solvay.com/en/company-information/our-businesses
		Using clnd_text
Working on Soraa Laser Diode
	Working on https://www.sldlaser.com/about
		Using clnd_text
Working on Olympus NDT
	Working on https://www.olympus-ims.com/en/about-us/
		Using boilerplate
	Working on https://www.olympus-ims.com/en/training-academy/about/
		Using boilerplate
	Working on https://www.olympus-ims.com/en/support/ethics-corporate-compliance/
		Using boilerplate
Working on St. Jude Medical
	Working on https://www.sjm.com/en/about?clset=af584191-45c9-4201-8740-5409f4cf8bdd%3ab20716c1-c2a6-4e4c-844b-d0dd6899eb

  ' that document to Beautiful Soup.' % decoded_markup


		Using clnd_text
	Working on http://www.ma-tek.com/about_us.php?act=history
		Using clnd_text
	Working on http://www.ma-tek.com/about_us.php?act=quality
		Using clnd_text
	Working on http://www.ma-tek.com/about_us.php?act=customer
		Using clnd_text
	Working on http://www.ma-tek.com/about_us.php?act=business
		Using clnd_text
	Working on http://www.ma-tek.com/about_us.php?act=glorious
		Using clnd_text
Working on Tarveda Therapeutics
	Working on http://www.tarvedatx.com/about.html
		Using boilerplate
Working on Sumco Corporation
	Working on http://sumco.com/CompanyPolicy.aspx
		Using boilerplate
Working on TRIDONIC GMBH & CO KG
	Working on https://www.tridonic.com/com/en/about-us.asp
		Using boilerplate
Working on Star Technology and Research
	Working on http://star-tech-inc.com/id4.html
		Using clnd_text
Working on Auterra
	Working on http://auterrainc.com/about.php
		Using boilerplate
Working on BASF Coatings GmbH
	Working on http://basf-coatings.com/global/ecweb/en/content/about_us/

  ' that document to Beautiful Soup.' % decoded_markup


		Using boilerplate
Working on Owens-Brockway Glass Container Inc.
	Working on http://www.o-i.com/About-O-I/
		Using clnd_text
Working on SolAero Technologies Corp.
	Working on https://solaerotech.com/about-us/
		Using boilerplate
Working on Adobe Systems Incorporated
	Working on https://www.adobe.com/about-adobe.html?promoid=2NVQCDBQ&mv=other
		Using clnd_text
Working on SunEdison Semiconductor Limited
''
Working on Zoetis Services LLC
	Working on https://www.zoetis.com/about-us/index.aspx
		Using clnd_text
	Working on https://www.zoetis.com/about-us/history.aspx
		Using boilerplate
	Working on https://www.zoetis.com/about-us/our-story.aspx
		Using clnd_text
	Working on https://www.zoetis.com/about-us/core-beliefs.aspx
		Using clnd_text
	Working on https://www.zoetis.com/about-us/awards/index.aspx
		Using clnd_text
	Working on https://www.zoetis.com/about-us/vision-mission.aspx
		Using clnd_text
	Working on https://www.zoetis.com/about-us/zoetis-at-a-glance.aspx
		Using clnd_text
Work