
loop on multiple Analyze website write on same variable object #49

Open

mayouf opened this issue Jan 1, 2020 · 5 comments
mayouf commented Jan 1, 2020

Hi,
I want to analyze multiple websites by looping over a list and writing the results to a JSON file.

I noticed that when we crawl 2 different websites and store the outputs in two different variables (say A and B), the second variable, B, gets A's results added to it... and so on for subsequent crawls.

It is as if analyze() writes to the same object!

And it gets even weirder: when I delete A and B with del A, B, the analyze() function does not re-run, it recovers the old results from nowhere!

I tried the %reset function to erase the memory... but it still recovers the results from a local cache!

here is an example:

from seoanalyzer import analyze
A = analyze("https://krugerwildlifesafaris.com/")

# the length is 90
print(len(A['pages']))

B = analyze("http://www.vintage.co.bw/")

# the length is 90
print(len(A['pages']))
# the length is 100 but it should be 10
print(len(B['pages']))

A has 90 pages and B should have only 10 pages, but it has 100: the 90 from A plus its own 10.

How do I avoid this?
Why this erratic behavior?

regards,

karim.m
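
A toy illustration of the behavior described above: if analyze() kept its results in a module-level list, B would inherit A's pages, and del A, B would only drop the names while the library's internal state survived (even through IPython's %reset of the user namespace). This is a sketch of the pattern, not seoanalyzer's actual code:

_page_cache = []  # module-level state, survives del of caller variables

def analyze(url):
    _page_cache.append(url)                  # state accumulates across calls
    return {'pages': list(_page_cache)}

A = analyze("https://site-a.example/")
B = analyze("https://site-b.example/")
print(len(A['pages']), len(B['pages']))      # 1 2 -- B already includes A's entry

del A, B                                     # removes the names only...
C = analyze("https://site-c.example/")
print(len(C['pages']))                       # 3 -- the module-level cache survived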

@mayouf changed the title from "Analyze funtion write on same variable object" to "loop on multiple Analyze website write on same variable object" on Jan 1, 2020

ghost commented Jan 4, 2020

Same problem here, guys!


ghost commented Jan 4, 2020

I fixed the issue by doing this:
Go to the Manifest class in the implementation and look for the Analyze method.

At the end of the method, just before return output, add:
Manifest.clear_cache()

Everything will be cool!
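
A minimal sketch of where that call would sit, going by the description above (the Manifest class and clear_cache() method are the commenter's names, not verified against the library source):

class Manifest:
    _cache = {}                     # shared state that leaks between runs

    @classmethod
    def clear_cache(cls):
        cls._cache.clear()

    def analyze(self, url):
        output = ...                # existing crawling logic builds the results here
        Manifest.clear_cache()      # reset the shared cache before returning
        return output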


mayouf commented Jan 4, 2020

Hi Ghezaielm,

Thanks for your quick feedback. In the meantime, I used another workaround, see below:

import os

for website in list_of_website:
    file_name = ...  # whatever file name you want
    command = 'seoanalyze {} -f json > "{}"'.format(website, file_name)
    returned_value = os.system(command)
    print(str(returned_value) + ' name= ' + file_name + ' ' + website)
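
For what it's worth, subprocess.run is the more idiomatic equivalent of os.system here. A sketch assuming the same seoanalyze CLI and -f json flag as above, with a hypothetical file-naming scheme:

import subprocess

for website in list_of_website:
    # hypothetical naming scheme; use whatever file name you want
    file_name = website.replace('://', '_').replace('/', '_') + '.json'
    with open(file_name, 'w') as out:
        result = subprocess.run(['seoanalyze', website, '-f', 'json'], stdout=out)
    print(result.returncode, 'name=', file_name, website)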

And it is convenient if you want to parallelize your crawl using ThreadPoolExecutor.

I have an 8-core / 20-thread CPU and it is damn fast... I crawled 20k websites in a few hours!

import concurrent.futures

with concurrent.futures.ThreadPoolExecutor(max_workers=80) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(analyze_SEO, url): url for url in list_website}
    # print(future_to_url)

    for future in concurrent.futures.as_completed(future_to_url):
        url_completed = future_to_url[future]
        try:
            data = future.result()
            if data is not None:
                print(data)
        except Exception as exc:
            print('%r generated an exception: %s' % (url_completed, exc))
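
The analyze_SEO function used above is not shown in the thread; presumably it wraps the per-site CLI call from the earlier loop. A hypothetical version consistent with that workaround (the name, file naming, and return value are illustrative):

import os

def analyze_SEO(url):
    # hypothetical helper: run the seoanalyze CLI for one URL and redirect
    # the JSON output to a per-site file, as in the loop above
    file_name = url.replace('://', '_').replace('/', '_') + '.json'
    returned_value = os.system('seoanalyze {} -f json > "{}"'.format(url, file_name))
    return '{} -> {} (exit {})'.format(url, file_name, returned_value)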

(PS: sorry, I did not know how to format the spaces in GitHub code quotes.)


mayouf commented Jan 4, 2020

Did you submit the correction on GitHub?

sethblack (Owner) commented

Ah, right. I'm putting this on my roadmap for v4.1. 👍

@sethblack self-assigned this on Feb 11, 2020