Not another Google searching library. Just kidding - it is.
Tested on Kali Linux v2024.2 (64-bit).
Made for educational purposes. I hope it will help!
Future plans:
- ability to set (rotate) search parameters, user agents, and proxies without the need for reinitialization.
pip3 install nagooglesearch
pip3 install --upgrade nagooglesearch
Run the following commands:
git clone https://github.com/ivan-sincek/nagooglesearch && cd nagooglesearch
python3 -m pip install --upgrade build
python3 -m build
python3 -m pip install dist/nagooglesearch-7.4-py3-none-any.whl
Default values:
nagooglesearch.SearchClient(
tld = "com",
homepage_parameters = {
"btnK": "Google+Search",
"source": "hp"
},
search_parameters = {},
user_agent = "",
proxy = "",
max_results = 100,
min_sleep = 8,
max_sleep = 18,
debug = False
)
Only domains that do not contain the keyword google
and do not end with the keyword goo.gl
are accepted as valid results. The final output is a unique and sorted list of URLs.
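As an illustration, the filtering rule above can be sketched as follows. This is a simplified approximation, not the library's actual implementation, and the helper name is hypothetical:

```python
from urllib.parse import urlsplit

def is_valid_result(url):
    # approximate the rule: reject any domain that contains 'google'
    # or ends with 'goo.gl'
    domain = urlsplit(url).netloc.lower()
    return "google" not in domain and not domain.endswith("goo.gl")

raw = [
    "https://www.example.com/login",
    "https://www.google.com/search",
    "https://goo.gl/abc",
    "https://www.example.com/login"
]

# unique and sorted, matching the described final output
urls = sorted(set(filter(is_valid_result, raw)))
print(urls)  # ['https://www.example.com/login']
```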
Example:
from nagooglesearch import nagooglesearch
# the following query string parameters are set only if the 'start' query string parameter is not set or is equal to zero
# simulate a homepage search
homepage_parameters = {
"btnK": "Google+Search",
"source": "hp"
}
# search the internet for additional query string parameters
search_parameters = {
"q": "site:*.example.com intext:password", # search query
"tbs": "li:1", # specify 'li:1' for verbatim search, i.e., do not search alternate spellings, etc.
"hl": "en",
"lr": "lang_en",
"cr": "countryUS",
"filter": "0", # specify '0' to display hidden results
"safe": "images", # specify 'images' to turn off safe search, or specify 'active' to turn on safe search
"num": "80" # number of results per page
}
client = nagooglesearch.SearchClient(
tld = "com", # top level domain, e.g., www.google.com or www.google.hr
homepage_parameters = homepage_parameters, # 'search_parameters' will override 'homepage_parameters'
search_parameters = search_parameters,
user_agent = "curl/3.30.1", # a random user agent will be set if not specified or empty
proxy = "socks5://127.0.0.1:9050", # supported URL schemes are 'http[s]', 'socks4[h]', and 'socks5[h]'
max_results = 200, # maximum number of unique URLs to return
min_sleep = 15, # minimum sleep between page requests
max_sleep = 30, # maximum sleep between page requests
debug = True # show debug output
)
urls = client.search()
if client.get_error() == "INIT_ERROR":
    print("[ Initialization Error ]")
    # do something
elif client.get_error() == "REQUESTS_EXCEPTION":
    print("[ Requests Exception ]")
    # do something
elif client.get_error() == "429_TOO_MANY_REQUESTS":
    print("[ HTTP 429 Too Many Requests ]")
    # do something
for url in urls:
    print(url)
    # do something
If max_results is set to, e.g., 200, and num is set to, e.g., 80, then the maximum number of unique URLs that could be returned could actually reach 240, because results are collected in full pages of num.
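To illustrate the arithmetic: the effective ceiling is max_results rounded up to the next multiple of num. A quick sketch (max_possible_urls is a hypothetical helper, not part of the library):

```python
import math

def max_possible_urls(max_results, num):
    # results arrive in full pages of 'num', so the last page may
    # overshoot 'max_results'
    return math.ceil(max_results / num) * num

print(max_possible_urls(200, 80))  # 3 full pages of 80 -> 240
```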
Check the list of user agents here. For more user agents, check scrapeops.io.
Example:
from nagooglesearch import nagooglesearch
urls = nagooglesearch.SearchClient(search_parameters = {"q": "site:*.example.com intext:password"}).search()
# do something
Example (e.g., do not show results older than 6 months):
from nagooglesearch import nagooglesearch
import datetime
import dateutil.relativedelta as relativedelta
def get_tbs(months):
    today = datetime.datetime.today()
    return nagooglesearch.get_tbs(today, today - relativedelta.relativedelta(months = months))
search_parameters = {
"tbs": get_tbs(6)
}
# do something
Example (get a random user agent):
from nagooglesearch import nagooglesearch
user_agent = nagooglesearch.get_random_user_agent()
print(user_agent)
# do something
Example (get all user agents):
from nagooglesearch import nagooglesearch
user_agents = nagooglesearch.get_all_user_agents()
print(user_agents)
# do something