# extract-pages-from-mongo
SanjayKAroraPhD@gmail.com <br>
November 2018

## Description
This notebook extracts groups of pages from MongoDB by firm_name and writes one firm-centric page output file per firm, which can later be topic modeled. In doing so, it removes repetitive content (e.g., repeated menu items) and garbage content (e.g., improperly parsed HTML code).

## Change log
v2 fixes some bugs.  

## TODO:
* Why are there so few firms with home pages being output (857?)
* Need to make better use of all pages in the site, e.g., to improve data quality and use additional paragraph data found on non-homepages 

In [2]:
# import data processing and other libraries
import csv
import sys
import requests
import os
import re
import pprint
import pymongo
import traceback
from time import sleep
import pandas as pd
import io
from IPython.display import display
import time
import numpy as np
from bs4 import BeautifulSoup
import string

In [3]:
MONGODB_DB = "FirmDB_20181116"
MONGODB_COLLECTION = "pages_COMBINED"
CONNECTION_STRING = "mongodb://localhost"

client = pymongo.MongoClient(CONNECTION_STRING)
db = client[MONGODB_DB]
col = db[MONGODB_COLLECTION]

DATA_DIR = '/Users/sarora/dev/EAGER/data/orgs/depth0_pages/'

In [4]:
# gather unique firm_names from mongodb

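def get_firm_aggregates ():
    # count the pages stored for each firm_name with a single $group stage
    query = [ { "$group": {"_id":"$firm_name" , "number":{"$sum":1}} } ]
    results = col.aggregate(query)

    mongo_dict = {}
    for result in results:
        key = (result['_id'])
        if key:
            # firm_name comes back as a list; use its first element as the key
            mongo_dict[key[0]] = result['number']
        else:
            # pages without a firm_name are counted under 'NA'
            mongo_dict['NA'] = result['number']

    return mongo_dict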

results_dict = get_firm_aggregates()
firm_names = results_dict.keys()
pp = pprint.PrettyPrinter()
pp.pprint(firm_names)

[u'Little Kids',
 u'Daylight Solutions',
 u'Biocon Limited',
 u'Sony Corporation',
 u'Magna International Inc.',
 u'ENI Technology',
 u'Honeywell International Inc.',
 u'Aurrion',
 u'The Jackson Laboratory',
 u'Hewlett Packard Enterprise Development LP',
 u'CUMMINS FILTRATION IP',
 u'Custom Electronics Inc.',
 u'H R D CORPORATION',
 u'GENERAL MOTORS LLC',
 u'FLIR Systems',
 u'Sola U.S.A. Inc.',
 u'Chromalox',
 u'Empire Technology Development LLC',
 u'Alcon Research',
 u'Wyatt Technology Corporation',
 u'Southwest Research Institute',
 u'Tokyo Ohka Kogyo Co.',
 u'Samsung Electronics',
 u'Fairchild Semiconductor Corporation',
 u'Supremex Inc.',
 u'Calient Technologies',
 u'Infineon Technologies Americas Corp.',
 u'Ideal Power Inc.',
 u'NuOrtho Surgical',
 u'Easel Biotechnologies',
 u'L-3 Communications Cincinnati Electronics Corporation',
 u'SunEdison Semiconductor Limited',
 u'IDEA TREE',
 u'Solarmer Energy',
 u'Braun Intertec Geothermal',
 u'Fuji Electric Co.',
 u'The Babcock & Wilcox 

 u'TRI ALPHA ENERGY',
 u'Renmatix',
 u'Zygo Corporation',
 u'PACCAR Inc',
 u'Evernote Corporation',
 u'Entegris',
 u'IDEX Health & Science LLC',
 u'Echogen Power Systems',
 u'Lion Copolymer Geismar',
 u'Nanoridge Materials',
 u'Rebellion Photonics',
 u'Big Belly Solar',
 u'Nokomis',
 u'SUMITOMO WIRING SYSTEMS',
 u'Zoetis Services LLC',
 u'ExxonMobil Research and Engineering Company',
 u'VINDICO NANOBIO TECHNOLOGY INC.',
 u'HNO Greenfuels',
 u'Esolar',
 u'Bayer Cropscience AG',
 u'EMD Technologies Inc.',
 u'Newlans',
 u'HIQ SOLAR',
 u'Nederlandse Organisatie voor toegepast-natuurwetenschappelijk onderzoek TNO',
 u'U S MICROPOWER INC',
 u'Hewlett-Packard Development Company',
 u'Arkema Inc.',
 u'Floadia Corporation',
 u'Entech Solar',
 u'Global Solar Water Power Systems',
 u'PortaFire',
 u'Ormat Technologies Inc.',
 u'Calysta',
 u'MCI',
 u'Singular Bio',
 u'Xyleco',
 u'Bigelow Aerospace',
 u'Pacific Light Technologies',
 u'Tesla Nanocoatings',
 u'Integrated Solar Technology',
 u'FULL CIR
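
For readers without a MongoDB instance at hand, the per-firm count produced by the `$group` stage can be imitated in plain Python. This is only a sketch: `count_pages_by_firm` and `sample_docs` are hypothetical names and data, and it assumes, as the `key[0]` indexing in the cell above suggests, that `firm_name` is stored as a one-element list.

```python
from collections import Counter

def count_pages_by_firm(docs):
    """Count documents per firm, mirroring the notebook's $group / $sum aggregation."""
    counts = Counter()
    for doc in docs:
        firm = doc.get("firm_name")
        # Assumption: firm_name is a one-element list; missing values go under 'NA'.
        key = firm[0] if firm else "NA"
        counts[key] += 1
    return dict(counts)

# Hypothetical documents shaped like the crawled pages.
sample_docs = [
    {"firm_name": ["Little Kids"], "url": ["http://a.example/"]},
    {"firm_name": ["Little Kids"], "url": ["http://a.example/about"]},
    {"firm_name": ["Zygo Corporation"], "url": ["http://b.example/"]},
    {"url": ["http://c.example/"]},  # no firm_name, counted under 'NA'
]
page_counts = count_pages_by_firm(sample_docs)
print(page_counts)
```

Comparing such counts against the number of output files written later is one way to see how many firms drop out between the database and the output directory.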

In [14]:
# remove html content
def is_javascript (x):
    match_string = "(CDATA|return\s+true|return\s+false|function|\w+\(.*?\);|\w{2,}[\\.|:]+\w{2,}|'\w+':\s+'\w+')|{|}|\r|\n|\/\/"
    # capture CDATA; function declarations; function calls; word sequences separated by a period (e.g., denoting paths)
    regex = re.findall(match_string, x)
    words = x.split()
    # guard against empty or whitespace-only strings, which would otherwise divide by zero
    if not words:
        return False
    # flag the text when matches exceed 10% of the word count and there are more than three matches
    return (len(regex) / float(len(words)) > .10) and len(regex) > 3

def clean_page_content (text_list):
    # remove whatever we think is html (build lists so the result can be tested for emptiness and reused)
    removed_html = [x for x in text_list if BeautifulSoup(x, "html.parser").find() is None]
    # remove content that looks like javascript
    removed_js = [x for x in removed_html if not is_javascript(x)]
    # add other checks here as needed

    return removed_js
    

# iterate through each firm, get all pages associated with a firm, and produce data structure
# url --> depth
#     --> content (list)
# return data structure
def process_firm (firm_name): 
    regex = '^' + re.escape(firm_name) + '$'
    results = col.find( {"$and":[ {"firm_name": re.compile(regex, re.IGNORECASE) }, {"depth":0}]} )
    
    firm_pages_dict = {}
    depth0_page_text = [] # home page
    for result in results:
        key = (result['url'])
        if key:
            page_dict = {}
            depth = result['depth'][0]
            page_dict['depth'] = depth
            page_dict['domain'] = result['domain'][0]
            page_dict['firm_name'] = firm_name
            clnd_text = clean_page_content(result['full_text'])
            page_dict['clnd_text'] = clnd_text
            firm_pages_dict[key[0]] = page_dict
            
            if depth == 0:
                depth0_page_text = clnd_text
        else:
            continue
            
    return firm_pages_dict, depth0_page_text
# TODO: identify which pieces of content are common across all sites, and remove those
# def clean_content(firm_dict): 

In [22]:
# regex test 
regex = re.findall(r"(CDATA|return\s+true|return\s+false|function|\w+\(.*?\);|\w{2,}[\\.|:]+\w{2,}|'\w+':\s+'\w+|\\')", 
                   "CDATA function contact-us javascript.function linker:autoLink www.littlekidsinc.com fxnCall(param.param); email@dextr.us 'type': 'image' return true return false rev7bynlh\\u00252bvcgrjg\\") # last part is words sequences separated by punct
print (regex)

['CDATA', 'function', 'javascript.function', 'linker:autoLink', 'www.littlekidsinc', 'fxnCall(param.param);', 'dextr.us', "'type': 'image", 'return true', 'return false', 'rev7bynlh\\u00252bvcgrjg']
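
The thresholding idea behind `is_javascript` can be tried out in isolation. The sketch below is not the notebook's function: `looks_like_javascript`, `JS_PATTERN`, and the sample strings are hypothetical, and the pattern is a deliberately simplified subset of the alternatives tested above. It keeps the same two conditions (match-to-word ratio above 10% and more than three matches) and adds a guard for strings that contain no words.

```python
import re

# Hypothetical, simplified pattern: only a few of the notebook's alternatives.
JS_PATTERN = re.compile(r"CDATA|return\s+(?:true|false)|function|\w+\(.*?\);|[{}]")

def looks_like_javascript(text, min_ratio=0.10, min_hits=3):
    """Return True when enough of the text matches JavaScript-like patterns."""
    words = text.split()
    if not words:
        # Empty or whitespace-only strings have no words to score.
        return False
    hits = JS_PATTERN.findall(text)
    return len(hits) > min_hits and len(hits) / float(len(words)) > min_ratio

snippet = "function init() { track(page); return true; } function done() { return false; }"
menu_item = "For Business"
print(looks_like_javascript(snippet))
print(looks_like_javascript(menu_item))
print(looks_like_javascript("   "))
```

Requiring a minimum number of matches as well as a minimum ratio keeps short navigation labels from being flagged on the strength of a single accidental match.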


In [15]:
firm_pages_dict, depth0_page_text = process_firm ('Brother International Corporation')
print (depth0_page_text)

[u'For Home', u'For Business', u'For Home', u'For Business', u'Login', u' Products', u'Products ', u'Products', u'For Home', u'Products', u'For Business', u'U.S.A. | Global Network', u'\xa9 2018 Brother International Corporation ', u'Global Site']
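
The output above still contains repeated menu items that differ only in surrounding whitespace. A minimal sketch of one way to drop such repeats follows; `drop_repeated_lines` is a hypothetical helper that is not part of the notebook, and it only removes repeats within a single page, not content shared across sites as the TODO envisions.

```python
def drop_repeated_lines(lines):
    """Keep the first occurrence of each line, comparing on whitespace-stripped text."""
    seen = set()
    unique = []
    for line in lines:
        normalized = line.strip()
        # Skip blank entries and anything already emitted.
        if normalized and normalized not in seen:
            seen.add(normalized)
            unique.append(normalized)
    return unique

# The home-page text printed above for Brother International Corporation.
home_page = [
    u'For Home', u'For Business', u'For Home', u'For Business', u'Login',
    u' Products', u'Products ', u'Products', u'For Home', u'Products',
    u'For Business', u'U.S.A. | Global Network',
    u'\xa9 2018 Brother International Corporation ', u'Global Site',
]
deduped = drop_repeated_lines(home_page)
print(deduped)
```

Preserving the order of first occurrences keeps the page readable, which may matter if the text is later inspected by hand before topic modeling.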


In [16]:
# run
pp = pprint.PrettyPrinter()
for firm_name in firm_names: 
    print ("Working on " + firm_name)
    firm_pages_dict, depth0_page_text = process_firm (firm_name)
    # pp.pprint(depth0_page_text)
    if depth0_page_text: 
        # replace '.' and '/' so the firm name is safe to use as a file name
        out_file = re.sub(r'[./]', '_', firm_name) + '.txt'
        with io.open(os.path.join(DATA_DIR, out_file), 'w', encoding='utf8') as f:
            f.write ('\n'.join (depth0_page_text))

Working on Little Kids
Working on Daylight Solutions
Working on Biocon Limited
Working on Sony Corporation
Working on Magna International Inc.
Working on ENI Technology
Working on Honeywell International Inc.
Working on Aurrion
Working on The Jackson Laboratory
Working on Hewlett Packard Enterprise Development LP
Working on CUMMINS FILTRATION IP
Working on Custom Electronics Inc.
Working on H R D CORPORATION
Working on GENERAL MOTORS LLC
Working on FLIR Systems
Working on Sola U.S.A. Inc.
Working on Chromalox
Working on Empire Technology Development LLC
Working on Alcon Research
Working on Wyatt Technology Corporation
Working on Southwest Research Institute
Working on Tokyo Ohka Kogyo Co.
Working on Samsung Electronics
Working on Fairchild Semiconductor Corporation


  ' Beautiful Soup.' % markup)
  ' that document to Beautiful Soup.' % decoded_markup


Working on Supremex Inc.
Working on Calient Technologies
Working on Infineon Technologies Americas Corp.
Working on Ideal Power Inc.
Working on NuOrtho Surgical
Working on Easel Biotechnologies
Working on L-3 Communications Cincinnati Electronics Corporation
Working on SunEdison Semiconductor Limited
Working on IDEA TREE
Working on Solarmer Energy
Working on Braun Intertec Geothermal
Working on Fuji Electric Co.
Working on The Babcock & Wilcox Company
Working on Rima Enterprises
Working on Saint-Gobain Adfors Canada
Working on Agilent Technologies
Working on Toray Industries Inc. 
Working on Christie Digital Systems
Working on SYNTHOMER USA LLC
Working on Nissan North America




Working on Shin-Etsu Chemical Co.
Working on FUJIFILM Dimatix
Working on SUNSALUTER
Working on SII Semiconductor Corporation
Working on Prosoft International
Working on Energen
Working on KR Design House
Working on Akron Polymer Systems
Working on Revivicor
Working on Caribou Biosciences
Working on Syngenta Participations AG
Working on BERKEN ENERGY LLC
Working on ADASA INC.
Working on Abbott Molecular Inc.
Working on PLEX LLC
Working on Carver Scientific
Working on E. Tech Incorporation
Working on Canon
Working on Renewable Power Conversion
Working on Thermo Fisher Scientific Inc.
Working on Toray Plastics (America)
Working on Algenol Biotech LLC
Working on Cadence Design Systems
Working on ACACIA RESEARCH GROUP LLC
Working on Cabot Corporation
Working on Genesco Inc.
Working on CARBO Ceramics Inc.
Working on Metabolix
Working on TEKNOR APEX COMPANY
Working on One Earth Designs Inc.
Working on Commonwealth Scientific & Industrial Research Organization
Working on Siemens Industry
Worki



Working on Alliance for Sustainable Energy
Working on Adobe Systems Incorporated
Working on GOAL ZERO LLC
Working on Handstand Innovations
Working on FRONT EDGE TECHNOLOGY INC.
Working on FUJIFILM Corporation
Working on Bruin Biometrics
Working on Lockheed Martin Corporation
Working on MonoSol
Working on nLIGHT
Working on Acorn Technologies
Working on Energysolutions
Working on Aerogen
Working on Zyvex Labs
Working on HGST NETHERLANDS B.V.
Working on BioNano Genomics
Working on Revera
Working on FEI Company
Working on Moxtek
Working on DNA Twopointo
Working on VMware
Working on Analog Devices
Working on SOL-ELECTRICA
Working on Toyota Motor Engineering & Manufacturing North America
Working on Deep Science
Working on Saint-Gobain Performance Plastics Corporation
Working on Macronix International Co.
Working on DIC Corporation
Working on CoolEarth Solar
Working on Express Imaging Systems
Working on Proterra Inc.
Working on Thorn Bioscience LLC
Working on Ortho-Clinical Diagnostics
Workin



Working on ASML Netherlands B.V.
Working on Hysitron Incorporated
Working on Eaton Corporation
Working on Medical Diagnostic Laboratories
Working on Bausch & Lomb Incorporated
Working on Wentworth Laboratories
Working on Gilead Sciences
Working on Heidelberger Druckmaschinen AG
Working on SD Technologies
Working on Abbott Point of Care Inc.
Working on Antaya Technologies Corporation
Working on Coleman Cable
Working on CMC ICOS BIOLOGICS
Working on Centre de Recherche Industrielle du Quebec
Working on Anelva Corporation
Working on Stablcor Technology
Working on Claret Medical
Working on Angstron Materials
Working on Veracyte
Working on Magnachip Semiconductor
Working on Reynolds Technologies
Working on Danisco US Inc.
Working on Bristol-Myers Squibb Company
Working on Polaris Products LLC
Working on Global Filtration Systems
Working on GE-Hitachi Nuclear Energy Americas LLC
Working on Butamax(TM) Advanced Biofuels LLC
Working on Fike Corporation
Working on Waters Technologies Corporatio




Working on Transilwrap Company
Working on Quantum Devices
Working on Vascular BioSciences
Working on Longhorn Vaccines and Diagnostics
Working on AGFA-GEVAERT N.V.
Working on Nordic Technologies
Working on Optodot Corporation
Working on Morpho
Working on MERCK SHARP & DOHME CORP.
Working on Ajinomoto Althea
Working on Combined Energies
Working on Michigan Biotechnology Institute




Working on ProNAi Therapeutics
Working on Cbrite Inc.
Working on PetraSolar
Working on WHATSAPP INC.
Working on Virent
Working on Ethicon Endo-Surgery
Working on Iogen Corporation
Working on McElroy Manufacturing
Working on NeoVision LLC
Working on Astech
Working on DEKA Products Limited Partnership
Working on Taiwan Semiconductor Manufacturing Company
Working on InView Technology Corporation
Working on Hunt Energy Enterprises LLC
Working on Rolls-Royce PLC
Working on BASF
Working on SRG Global
Working on Theraclone Sciences
Working on St. Jude Medical
Working on SEIKO NPC Corporation
Working on Areesys Technologies
Working on OLYMPUS CORPORATION
Working on Two Blades Foundation
Working on Senaya
Working on RES USA
Working on EXOS LLC
Working on BRIGHTLEAF TECHNOLOGIES INC.
Working on Ibis Biosciences
Working on Archer Daniels Midland Company
Working on Avertech
Working on Oculus VR
Working on Silicon Space Technology Corp.
Working on CP Kelco U.S.
Working on Cleanvantage LLC
Working o



Working on Intermolecular
Working on Siemens Corporation
Working on Momentive Performance Materials Inc.
Working on Baxter International Inc.
Working on Materia
Working on Pfizer Inc.
Working on Maxout Renewables
Working on Ube Industries
Working on EPCOS AG




Working on Sick AG
Working on Siluria Technologies
Working on Adynxx
Working on Dialogic Corporation
Working on Marine Polymer Technologies
Working on NANOBIO CORPORATION
Working on Seiko Epson Corporation
Working on UTC FIRE & SECURITY CORPORATION
Working on NANO CELL SYSTEMS
Working on Illinois Tool Works
Working on SunRun
Working on ABB Technology Ltd.
Working on KUBIX INC.
Working on Gemex Systems
Working on King Electric Vehicles Inc.
Working on Global OLED Technology LLC
Working on Gracenote
Working on LG NANOH2O
Working on Battelle Memorial Institute
Working on American Piledriving Equipment
Working on Wizard Labs
Working on Genentech
Working on Sysmex Corporation
Working on GCP Applied Technologies Inc.




Working on KINO LLC
Working on TECNIUM




Working on Silicon Storage Technology
Working on Molecular Rebar Design
Working on Uniseal Solutions Inc.
Working on MicroLink Devices
Working on Bioneer Corporation
Working on D-Wave Systems Inc.
Working on Nantero
Working on Amberwave Inc.
Working on Thoratec Corporation
Working on Selecta Biosciences
Working on Child Laboratories Inc.
Working on AgroFresh Inc.
Working on Arpin Renewable Energy




Working on Xerox Corporation
Working on Dialight Corporation
Working on SPC International
Working on Nationwide Children's Hospital
Working on Perkinelmer Holdings
Working on American Air Liquide
Working on Brewer Science Inc.
Working on Solvay Specialty Polymers USA
Working on UWM Research Foundation
Working on Grandis Inc.
Working on BASF Coatings GmbH
Working on Pacific Biosciences of California
Working on Genisphere
Working on Achushnet Company
Working on Midrex Technologies
Working on TT Technologies




Working on Newdoll Enterprises LLC
Working on Carbon3D
Working on AG ENERGY SOLUTIONS
Working on Terra Caloric
Working on Applied Membrane Technologies
Working on Lintec Corporation
Working on Kansai Paint Co.
Working on Nanotech Biomachines
Working on Egenera
Working on 3M Innovative Properties Company




Working on MATERIALS ANALYSIS TECHNOLOGY (US) CORP.
Working on SunCulture Solar Inc.
Working on KYOCERA DOCUMENT SOLUTIONS INC.
Working on Olympus NDT
Working on Omron Corporation
Working on Red Hat
Working on Takara Bio Inc.
Working on MirTech
Working on Momentive Performance Materials GmbH
Working on Pinnacle Technology
Working on Elwha LLC
Working on Inaeris Technologies
Working on Genomatica
Working on Henkel AG & Co. KGaA
Working on Dentsply International
Working on Owens-Brockway Glass Container Inc.
Working on SolarWorld Americas Inc.
Working on Applied Nanostructures
Working on GENERAL ELECTRIC COMPANY
Working on KT Corporation
Working on Cisco Technology
Working on Avantor Performance Materials
Working on Kaneka Corporation
Working on Dexerials Corporation
Working on Angiotech Pharmaceuticals (US)
Working on Dolby Laboratories Licensing Corporation




Working on Authenex
Working on Canon U.S. Life Sciences
Working on Garland Industries
Working on Andritz Inc.
Working on AGC Flat Glass North America
Working on Babcock Power Services
Working on Vizio Inc.
Working on Arkival Technology Corp.
Working on Newport Corporation
Working on iNanoBio LLC
Working on AltaRock Energy
Working on Unifrax I LLC
Working on Liquidia Technologies
Working on Lawrence Livermore National Security
Working on Applied Genetic Technologies Corporation
Working on Research Triangle Institute
Working on SolarLego Inc.
Working on HARMAN INTERNATIONAL INDUSTRIES
Working on Brookhaven Science Associates LLC 
Working on nanoComposix
Working on Nordson Corporation




Working on Federal Signal Corporation
Working on Imperial Innovations Limited
Working on Crestovo LLC
Working on OrbusNeich Medical
Working on Minebea Co.
Working on Adhesives Research
Working on KLA-Tencor Corporation
Working on Confluence Energy
Working on CP KELCO APS
Working on Roche Molecular Systems
Working on HAMAMATSU PHOTONICS K.K.




Working on Sanyo Electric Co.
Working on Hitachi High-Technologies Corporation
Working on Cedar Ridge Research
Working on Invensas Corporation
Working on Cambridge Electronics
Working on HRL Laboratories
Working on Verliant Energy
Working on Northwest Biotherapeutics
Working on KJ BIOSCIENCES LLC
Working on Johnson & Johnson Consumer Companies
Working on SNS NANO FIBER TECHNOLOGY
Working on CNH Industrial America LLC
Working on Osram Sylvania Inc.
Working on Envisionit LLC
Working on Sun Chemical Corporation
Working on Kajima Corporation
Working on Mitsubishi Metal Corporation
Working on TP Solar
Working on ATTOSTAT
Working on Wikipad
Working on Sunlight Photonics Inc.
Working on Pulse Therapeutics
Working on Sensor Electronic Technology
Working on Hon Hai Precision Industry Co.
Working on Extech/Exterior Technologies
Working on NanoTech Lubricants
Working on The Samuel Roberts Noble Foundation
Working on WiSys Technology Foundation
Working on ITN Energy Systems
Working on Reliance Con



Working on Renmatix
Working on Zygo Corporation
Working on PACCAR Inc
Working on Evernote Corporation
Working on Entegris
Working on IDEX Health & Science LLC
Working on Echogen Power Systems
Working on Lion Copolymer Geismar
Working on Nanoridge Materials
Working on Rebellion Photonics
Working on Big Belly Solar
Working on Nokomis
Working on SUMITOMO WIRING SYSTEMS
Working on Zoetis Services LLC
Working on ExxonMobil Research and Engineering Company
Working on VINDICO NANOBIO TECHNOLOGY INC.
Working on HNO Greenfuels
Working on Esolar
Working on Bayer Cropscience AG
Working on EMD Technologies Inc.
Working on Newlans
Working on HIQ SOLAR
Working on Nederlandse Organisatie voor toegepast-natuurwetenschappelijk onderzoek TNO
Working on U S MICROPOWER INC
Working on Hewlett-Packard Development Company
Working on Arkema Inc.
Working on Floadia Corporation
Working on Entech Solar
Working on Global Solar Water Power Systems
Working on PortaFire
Working on Ormat Technologies Inc.
Working on 



Working on Integrated Solar Technology
Working on FULL CIRCLE BIOCHAR
Working on bioTheranostics
Working on UChicago Argonne
Working on Cetac Technologies Inc.
Working on fybr
Working on Da Yu Enterprises
Working on Adtran
Working on HOWARD INDUSTRIES
Working on Bitrode Corporation
Working on Sundrop Fuels
Working on AT&T Corporation
Working on Quallion LLC
Working on Ignis Innovation
Working on MONTEREY RESEARCH
Working on Stora Enso Oyj
Working on Proton Power
Working on Ambature
Working on Dermazone Solutions
Working on Industrial Technology Research Institute
Working on First Solar
Working on Inphenix
Working on RF Micro Devices
Working on Fujitsu Semiconductor Limited
Working on Meso Scale Technologies
Working on DSP Group LTD.
Working on Evri
Working on Abbott Cardiovascular Systems Inc.
Working on Solan
Working on LG Display Co.
Working on Bostik
Working on George Mason Research Foundation
Working on Soliton Lasers
Working on LIQUID X PRINTED METALS
Working on GOLBA LLC
Working 



Working on Microsurge
Working on GLIKNIK INC.
Working on Luna Innovations Incorporated
Working on Eastman Kodak Company
Working on Mainstream Engineering Corp.
Working on Deployable Space Systems
Working on OSI Optoelectronics
Working on Thorlabs
Working on Microsemi SoC Corporation
Working on Transtron Solutions LLC
Working on NOK Corporation
Working on Bi-Modal Corporation
Working on Wenger Corporation
Working on Tufts Medical Center
Working on Chevron Oronitz Company LLC
Working on Hitachi Metals
Working on JNC Corporation
Working on Synaptic Research
Working on k-Space Associates
Working on Intuitive Surgical Operations
Working on FutureWei Technologies
Working on GLYCON LLC
Working on S. C. Johnson & Son
Working on Solar-Tectic LLC
Working on Cree
Working on Arcturus Therapeutics
Working on T.H.E.M.
Working on SolAero Technologies Corp.
Working on Banpil Photonics
Working on Selkermetrics
Working on Atleisure LLC
Working on Kiverdi
Working on STANDARD ALCOHOL COMPANY OF AMERICA
Wo



Working on KOLON INDUSTRIES
Working on Lam Research Corporation
Working on Heraeus Precious Metals North America Conshohocken LLC
Working on Medicis Pharmaceutical Corporation
Working on Robert Bosch GmbH
Working on Monolithe Semiconductor Inc.
Working on Senga Advisors
Working on MICROSOFT TECHNOLOGY LICENSING
Working on United Microelectronics Corp.
Working on Kureha Corporation
Working on Polysar Corporation
Working on Ethox Chemicals
Working on Mumetel
Working on Universal Display Corporation
Working on Biotta LLC
Working on SolaBlock LLC
Working on Redwood Systems
Working on Astex Pharmaceuticals
Working on Novus Energy LLC
Working on DECA Technologies Inc.
Working on Pioneer Energy
Working on Performance Plants
Working on Sinewatts
Working on ESCAPE THERAPEUTICS
Working on OmniVision Technologies
Working on X DEVELOPMENT LLC
Working on Monsanto Technology LLC
Working on Kinetech Power Company LLC
Working on Formula Plastics
Working on Green Solar Transportation
Working on eNow
Wo



Working on OTSUKA PHARMACEUTICAL CO.
Working on ACUCELA INC.
Working on Smith & Nephew
Working on Eastman Chemical Company
Working on UNISANTIS ELECTRONICS SINGAPORE PTE. LTD.
Working on Neural Signals
Working on Vorbeck Materials Corporation
Working on Mattson Technology
Working on S&S X-Ray Products
Working on Nanoquantum Sciences
Working on Armageddon Energy Inc.
Working on PAX Scientific
Working on UCB Pharma S.A.
Working on INFINEUM INTERNATIONAL LIMITED
Working on ExxonMobil Chemical Patents Inc.
Working on Koch Biological Solutions
Working on INTEVAC
Working on Texas Research International
Working on Blue Sea Systems
Working on Electronic Warfare Associates
Working on Ricoh Company Limited
Working on SunPower Corporation
Working on Nanoco Technologies
Working on SAMSUNG DISPLAY CO.
Working on E Ink Corporation
Working on Nanocopocia
Working on GENCO SCIENCES LLC
Working on Amtech Systems
Working on mVerify Corporation
Working on Dana Corporation
Working on Sanofi
Working on Elec



Working on LG Electronics Inc.
Working on Roll-N-Lock Corporation
Working on SunEdison
Working on Synthetic Genomics
Working on TRANE INTERNATIONAL INC.
Working on Propagation Research Associates
Working on Industrial Science & Technology Network
Working on SOLENA FUELS CORPORATION
Working on II-VI Incorporated
Working on SELMAN AND ASSOCIATES
Working on ATC Technologies
Working on Allison Transmission
Working on Kerr Corporation
Working on Cura Vac
Working on Siemens Aktiengesellschaft 
Working on CSEM Centre Suisse d'Electronique et de Microtechnique SA-Recherche et Developpement
Working on Sinton Consulting
Working on Alnylam Pharmaceuticals
Working on PeterBrod Corp.
Working on Daikin Industries
Working on NeoPhotonics Corporation
Working on ECOSYNTHETIX LTD.
Working on ConocoPhillips Company
Working on AC International Inc.
Working on PolyOne Corporation
Working on Nanospectra Biosciences
Working on Skidmore
Working on SABIC GLOBAL TECHNOLOGIES B.V.
Working on GOJO Industries
Work



Working on Quantapore
Working on Alcotek
Working on Becton
Working on EMC Corporation
Working on Microchips Biotech
Working on Interface Performance Materials
Working on Paratek Pharmaceuticals
Working on NLT TECHNOLOGIES
Working on Sima Therapeutics
Working on Tyco Electronics Corporation
Working on MILLENIUM SYNTHFUELS CORPORATION
Working on Manta Instruments
Working on Plasma-Therm
Working on GEN-PROBE INCORPORATED
Working on Agrivida
Working on Elenion Technologies
Working on Alcatel Lucent
Working on Galectin Therapeutics
Working on APPLIED STEMCELL
Working on MIETAMARK GENETICS
Working on AIR PRODUCTS AND CHEMICALS
Working on Interez
Working on Uni-Charm Corporation
Working on Kemira OY
Working on Gestion Ultra International Inc.
Working on Nova Technologies
Working on Cook Biotech Incorporated
Working on Helios Focus LLC
Working on Acer Incorporated
Working on B.G. Negev Technologies and Applications Ltd.
Working on The Goodyear Tire & Rubber Company
Working on Basell Polyolefin



Working on VIASAT INC.
Working on Bell Helicopter Textron Inc.
Working on Starsource Scientific LLC
Working on AVOGY
Working on Schneider Electric USA
Working on WAFERTECH
Working on Hunter Douglas Inc.
Working on Dymax Corporation
Working on ASCENT SOLAR TECHNOLOGIES
Working on Conoco Inc.
Working on Cardiva Medical
Working on Columbia Insurance Company
Working on OPTERRA ENERGY SERVICES
Working on Altivera
Working on AROG PHARMACEUTICALS
Working on GTherm
Working on GE Healthcare Limited
Working on Buckman Laboratories International
Working on Abengoa Bioenergy New Technologies
Working on Ford Global Technologies
Working on Bio-Rad Laboratories
Working on Smart Planet Technologies
Working on Arrowhead Center
Working on Wostec
Working on Piksel
Working on Forest Concepts
Working on Delta Electronics
Working on Humanetics Corporation
Working on Fianium Ltd.
Working on Ivoclar Vivadent AG
Working on Parion Sciences
Working on Biosphere Medical
Working on GOSOLARLIFE
Working on Biosense 



Working on Wine.com
Working on SEEK THERMAL
Working on FLOW CONTROL LLC.
Working on Infineon Technologies AG
Working on Meyer Tool
Working on Magnolia Solar
Working on Micro Cooling Concepts
Working on Toda Kogyo Corporation
Working on Dell Software Inc.
Working on Atom Nanoelectronics
Working on Landauer
Working on International Business Machines Corporation
Working on Advanced Silicon Group
Working on Lake Lite
Working on CyboEnergy
Working on Marine Biotech Inc.
Working on ADVANCED INNOVATION CENTER LLC
Working on Maxim Integrated Products
Working on Midori USA
Working on Eisai Co.
Working on NICHIA CORPORATION
Working on Takeda Pharmaceutical Company Limited
Working on Siliconware Precision Industries Co.
Working on Nexcom Technology
Working on Denso Corporation
Working on Kraton Polymers U.S. LLC
Working on SPICE SOLAR
Working on Mediatek Inc.
Working on Continental Manufacturing
Working on TMC Corporation
Working on AMPT
Working on Narsys
Working on IMRA America
Working on Sun Dr



Working on Johnson Matthey PLC
Working on Ferro Corporation
Working on BTU International
Working on Ziptronix
Working on LifeNet Health
Working on New Technology Ventures
Working on Boston Scientific Scimed
Working on Greyrock Energy
Working on Courtagen Life Sciences
Working on ATOMERA INCORPORATED
Working on BASF Enzymes LLC
Working on Alpha and Omega Semiconductor Incorporated
Working on Quest Diagnostics Investments Incorporated
Working on Murata Manufacturing Co.
Working on O.B.I. Inc.
Working on Rhodia Operations
Working on Foret Plasma Labs
Working on Tigo Energy
Working on United Technologies Corporation
Working on Sila Nanotechnologies
Working on Lof Solar Corporation
Working on Neumedicines
Working on OAS Design Group
Working on Sagacious Investment Group L.L.C.
Working on System Biosciences
Working on Renewable Algal Energy
Working on SanDisk Technologies LLC
Working on Seetron Inc.
Working on Baxter Healthcare SA
Working on Complete Genomics
Working on Schneider Electric So



Working on Strategic Solar Energy
Working on GlaxoSmithKline Biologicals
Working on iBio
Working on Voxtel
Working on ENI S.p.A.
Working on Sigma-Aldrich Co. LLC
Working on Bridgestone Corporation
Working on Turtle Beach Corporation
Working on Palo Alto Research Center Incorporated
Working on Scientific Design Company
Working on SAE Magnetics (HK) Ltd.
Working on JFE STEEL CORPORATION
Working on Silicon Genesis Corporation
Working on JAC Products Inc.
Working on Vaxiion Therapeutics
Working on Johns Manville
Working on Watlow Electric Manufacturing Company
Working on Rockwell Collins
Working on Transposagen Biopharmaceuticals
Working on Conexant Systems
Working on Wave Energy Conversion Corporation of America
Working on Hadasit Medical Research Services & Development Company Ltd.
Working on Nippon Chemi-Con Corporation
Working on STMicroelectronics Limited
Working on Coriant Advanced Technology
Working on Dresser-Rand Company
Working on Bluestar Silicones France
Working on Finisar Corp



Working on Nanomix
Working on InVisage Technologies
Working on Atonometrics
Working on ARBOR THERAPEUTICS
Working on Carl Zeiss Microscopy GmbH
Working on Applied Biosystems
Working on THE BOARD INSTITUTE
Working on Relypsa
Working on Ticona LLC
Working on Kao Corporation
Working on Xintec Inc.
Working on MAHLE International GmbH
Working on POLAR LIGHT TECHNOLOGIES AB
Working on Epizyme
Working on Sanken Electric Co.
Working on Alexion Pharmaceuticals
Working on Yissum
Working on WestPoint Home
Working on Georgia-Pacific Gypsum LLC
Working on Korea Kumho Petrochemical Co.
Working on Richtek Technology Corporation
Working on Avery Dennison Corporation
Working on Gtech Corporation




Working on AccuRay Corporation
Working on Gram Power
Working on Teradata US
Working on Rockwell Automation Technologies
Working on NanoOncology
Working on OFS Fitel
Working on ZIH Corp.
Working on DRS Network & Imaging Systems
Working on Travis Industries
Working on Lumentum Operations LLC
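
One thing that might be worth checking when the number of output files looks low is whether several firm names map to the same file name, since a later write would then overwrite an earlier one. The sketch below is a guess at a diagnostic, not a confirmed explanation: `output_filename` mirrors the notebook's substitution rule, while `find_filename_collisions`, the `case_insensitive` flag, and the example name `CANON` are hypothetical. It assumes the output directory may sit on a case-insensitive file system, as is the macOS default.

```python
import re
from collections import defaultdict

def output_filename(firm_name):
    """Mirror the notebook's rule: replace '.' and '/' with '_' and append '.txt'."""
    return re.sub(r"[./]", "_", firm_name) + ".txt"

def find_filename_collisions(firm_names, case_insensitive=True):
    """Group firm names that would be written to the same output file."""
    groups = defaultdict(list)
    for name in firm_names:
        filename = output_filename(name)
        # Assumption: names differing only in case collide on a case-insensitive file system.
        key = filename.lower() if case_insensitive else filename
        groups[key].append(name)
    return {key: names for key, names in groups.items() if len(names) > 1}

# 'CANON' is a made-up variant used only to demonstrate a collision.
names = ["Canon", "CANON", "Sola U.S.A. Inc.", "Wine.com"]
collisions = find_filename_collisions(names)
print(collisions)
```

Running such a check over `firm_names` before the export loop would show whether collisions account for any of the missing files.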
