[WebSearcher](https://github.com/gitronald/WebSearcher) is a Python package for obtaining and parsing search results from Google text search. It parses a wider range of result types than `webbotparser` (ads, knowledge boxes, etc.), but for now it handles only Google text results. Its parser can be applied to search result pages downloaded with [WebBot](https://github.com/gesiscss/WebBot) as follows:

## Installation

In [1]:
%pip install git+https://github.com/gitronald/WebSearcher@dev # more up-to-date than the PyPI package

Collecting git+https://github.com/gitronald/WebSearcher@dev
  Cloning https://github.com/gitronald/WebSearcher (to revision dev) to /private/var/folders/xh/r84_yqtx7pj262lm0s8w56640000gn/T/pip-req-build-683u2ft9
  Running command git clone --filter=blob:none --quiet https://github.com/gitronald/WebSearcher /private/var/folders/xh/r84_yqtx7pj262lm0s8w56640000gn/T/pip-req-build-683u2ft9
  Running command git checkout -b dev --track origin/dev
  Switched to a new branch 'dev'
  branch 'dev' set up to track 'origin/dev'.
  Resolved https://github.com/gitronald/WebSearcher to commit 1efdc35df7598724c74afa5e5671d7262555b589
  Preparing metadata (setup.py) ... done
Note: you may need to restart the kernel to use updated packages.


## Usage

Initialize the WebSearcher search engine object:

In [2]:
import WebSearcher as ws
se = ws.SearchEngine()

WebSearcher does not (yet) provide a function for loading external HTML for parsing, so we assign the page source and an identifier manually:

In [3]:
import os

filename = 'www.google.com_climate change_text_2023-01-30_14_18_37'
directory = 'testdata'
fp = os.path.join(directory, f'{filename}.html')
with open(fp, 'r', encoding='utf-8') as file:  # explicit encoding, assuming the page was saved as UTF-8
    se.html = file.read()
    se.serp_id = filename
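
The manual loading step can be wrapped in a small helper so it is easy to reuse. The sketch below is not part of WebSearcher: `load_saved_serp` is a hypothetical name, and it only assumes what the cell above shows, namely that the engine reads the page source from `html` and an identifier from `serp_id`. It also assumes the saved page is UTF-8 encoded.

```python
from pathlib import Path


def load_saved_serp(engine, path, encoding="utf-8"):
    """Attach a saved search result page to an engine-like object.

    Hypothetical helper: it mirrors the manual steps by storing the raw
    HTML in `engine.html` and the file name without extension in
    `engine.serp_id`.
    """
    path = Path(path)
    engine.html = path.read_text(encoding=encoding)
    engine.serp_id = path.stem  # file name without the final ".html"
    return engine
```

With WebSearcher installed, this could be called as `se = load_saved_serp(ws.SearchEngine(), fp)` before parsing.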

Parse the results using the parser provided by WebSearcher:

In [4]:
se.parse_results()
se.results[0]

{'type': 'general',
 'sub_rank': 0,
 'title': 'What Is Climate Change? | NRDC',
 'url': 'https://www.nrdc.org/stories/what-climate-change',
 'cite': 'https://www.nrdc.org › stories › w...',
 'details': '',
 'cmpt_rank': 0,
 'qry': 'climate change',
 'lang': 'de',
 'serp_id': 'www.google.com_climate change_text_2023-01-30_14_18_37',
 'serp_rank': 0,
 'lhs_bar': False}
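
Because each parsed result is a dictionary with a `type` key, a quick tally of result types is a convenient sanity check before further analysis. The following is a minimal sketch: `count_result_types` is a hypothetical helper, and the sample records are abbreviated from this notebook's output rather than produced by running the parser.

```python
from collections import Counter


def count_result_types(results):
    """Count how many parsed results fall under each `type` value."""
    # Records without a `type` key are grouped under "unknown".
    return Counter(r.get("type", "unknown") for r in results)


# Two abbreviated records, for illustration only.
sample = [
    {"type": "general", "title": "What Is Climate Change? | NRDC", "serp_rank": 0},
    {"type": "general", "title": "OECD Climate Change - OECD", "serp_rank": 1},
]
print(count_result_types(sample))  # Counter({'general': 2})
```

In the notebook, the same function could be applied directly to `se.results`.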

We can also convert the results to a pandas DataFrame:

In [5]:
import pandas as pd

pd.DataFrame(se.results)

|  | type | sub_rank | title | url | cite | details | cmpt_rank | qry | lang | serp_id | serp_rank | lhs_bar |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | general | 0 | What Is Climate Change? \| NRDC | https://www.nrdc.org/stories/what-climate-change | https://www.nrdc.org › stories › w... |  | 0 | climate change | de | www.google.com_climate change_text_2023-01-30_... | 0 | False |
| 1 | general | 0 | OECD Climate Change - OECD | https://www.oecd.org/climate-change/ | https://www.oecd.org › climate-cha... |  | 1 | climate change | de | www.google.com_climate change_text_2023-01-30_... | 1 | False |
| 2 | general | 0 | Environment and climate change - UNICEF | https://www.unicef.org/environment-and-climate... | https://www.unicef.org › environm... |  | 2 | climate change | de | www.google.com_climate change_text_2023-01-30_... | 2 | False |
| 3 | general | 0 | Climate change - United Nations Population Fund | https://www.unfpa.org/climate-change | https://www.unfpa.org › climate-ch... |  | 3 | climate change | de | www.google.com_climate change_text_2023-01-30_... | 3 | False |
| 4 | general | 0 | Department of Climate Change, Energy, the Envi... | https://www.dcceew.gov.au/ | https://www.dcceew.gov.au |  | 4 | climate change | de | www.google.com_climate change_text_2023-01-30_... | 4 | False |
| 5 | general | 0 | Climate Change - GlobalChange.gov | https://www.globalchange.gov/climate-change | https://www.globalchange.gov › cli... |  | 5 | climate change | de | www.google.com_climate change_text_2023-01-30_... | 5 | False |
| 6 | general | 0 | Global warming and climate change effects | https://www.nationalgeographic.com/environment... | https://www.nationalgeographic.com › ... |  | 6 | climate change | de | www.google.com_climate change_text_2023-01-30_... | 6 | False |
| 7 | general | 0 | CARE Climate Change | https://careclimatechange.org/ | https://careclimatechange.org |  | 7 | climate change | de | www.google.com_climate change_text_2023-01-30_... | 7 | False |
| 8 | general | 0 | Rising to the Climate Change Challenge | https://www.wri.org/climate | https://www.wri.org › climate |  | 8 | climate change | de | www.google.com_climate change_text_2023-01-30_... | 8 | False |
| 9 | general | 0 | Climate change - DW | https://www.dw.com/en/climate-change/t-18614374 | https://www.dw.com › climate-cha... |  | 9 | climate change | de | www.google.com_climate change_text_2023-01-30_... | 9 | False |
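
When a whole directory of saved pages needs parsing, the same steps can be repeated per file and the results collected into one list. This is a sketch under stated assumptions, not a WebSearcher feature: `parse_saved_serps` and its `make_engine` parameter are hypothetical, pages are assumed to be UTF-8 `.html` files, and a fresh engine is created for each page on the assumption that this avoids carrying state over between pages.

```python
from pathlib import Path


def parse_saved_serps(directory, make_engine, pattern="*.html"):
    """Parse every saved page in `directory` and return all results in one list.

    `make_engine` is any zero-argument callable that returns an object
    exposing `html`, `serp_id`, `parse_results()` and `results`, as the
    engine in this notebook does.
    """
    all_results = []
    for path in sorted(Path(directory).glob(pattern)):
        engine = make_engine()  # fresh engine per page (assumption, see above)
        engine.html = path.read_text(encoding="utf-8")
        engine.serp_id = path.stem  # file name without ".html"
        engine.parse_results()
        all_results.extend(engine.results)
    return all_results
```

With WebSearcher and pandas installed, `pd.DataFrame(parse_saved_serps('testdata', ws.SearchEngine))` would give a single table in which `serp_id` identifies the source page of each row.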
