<a href="https://colab.research.google.com/github/akash-yede/Twittorials/blob/master/Generate_SEO_reports.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<center><b><h1>Generate SEO reports</h1></b></center>

Marketers can use this notebook to collect the PageSpeed Insights reports for all their sites at once and get them in a single document. We will use Python's pyppeteer library, which drives a headless Chromium browser, to take a screenshot of the report for each site. The python-docx library then combines these images into one Word report.

In [None]:
# Installing libraries
!pip install -U git+https://github.com/pyppeteer/pyppeteer@dev
!pip install nest_asyncio
!pip install python-docx

In [None]:
# Install Chromium for pyppeteer to drive (the launch call below expects /usr/lib/chromium-browser/chromium-browser)
!apt-get update
!apt-get install -y chromium-chromedriver

With the libraries installed, enter the URLs of all the sites you want to include in the report.

To add more URLs, click the three-dot menu on the code cell, select Form, and click Add form field. Give the new field a variable name (for example, url6) and enter the site's URL as its value. To remove a URL, delete its line directly from the code cell. Whenever you add or remove a field, update list_url in the next cell to match.
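Rather than editing list_url by hand every time a form field changes, the list can be built from whatever urlN variables exist. The sketch below uses a hypothetical helper, collect_form_urls, which assumes every field follows the url1, url2, ... naming used in this notebook; in the notebook you would call it with globals().

```python
import re

def collect_form_urls(namespace, prefix="url"):
    """Return the values of variables named <prefix><number>, ordered by number.

    Hypothetical helper: pass globals() in the notebook so that any urlN form
    field you add is picked up without editing list_url by hand.
    """
    pattern = re.compile(r"^" + re.escape(prefix) + r"(\d+)$")
    found = []
    for name, value in namespace.items():
        match = pattern.match(name)
        # Keep only string variables such as url1, url2, ... (not url, urls, url_a).
        if match and isinstance(value, str):
            found.append((int(match.group(1)), value))
    # Sort numerically so url10 comes after url9 rather than after url1.
    return [value for _, value in sorted(found)]

fields = {"url2": "www.example2.com", "url10": "www.example10.com",
          "url1": "www.example1.com", "other": 42}
print(collect_form_urls(fields))
# ['www.example1.com', 'www.example2.com', 'www.example10.com']
```

In the notebook, `list_url = collect_form_urls(globals())` would then replace the hand-written list.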

In [7]:
#@title Add additional URLs.
url1 = "www.example1.com" #@param {type:"string"}
url2 = "www.example2.com" #@param {type:"string"}
url3 = "www.example3.com" #@param {type:"string"}
url4 = "www.example4.com" #@param {type:"string"}
url5 = "www.example5.com" #@param {type:"string"}


Next, combine the URLs into a list.

In [8]:
list_url = [url1, url2, url3, url4, url5]
print(list_url)

['www.example1.com', 'www.example2.com', 'www.example3.com', 'www.example4.com', 'www.example5.com']
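If a form field is left empty or the same site is entered twice, list_url will contain a blank entry or a duplicate, and the screenshot step will still visit it. A minimal clean-up sketch follows; normalize_urls is a hypothetical helper, and prefixing https:// to bare domains is an assumption made only so that every entry is an explicit URL.

```python
def normalize_urls(urls):
    """Strip whitespace, drop blank and duplicate entries, and add https:// when no scheme is given.

    Hypothetical helper; adding the scheme is an assumption, not a documented
    PageSpeed Insights requirement.
    """
    cleaned = []
    seen = set()
    for url in urls:
        url = url.strip()
        if not url:
            continue  # an empty form field would otherwise produce a useless report
        if "://" not in url:
            url = "https://" + url
        if url not in seen:
            seen.add(url)
            cleaned.append(url)
    return cleaned

print(normalize_urls(["www.example1.com", " ", "https://www.example1.com", "www.example2.com "]))
# ['https://www.example1.com', 'https://www.example2.com']
```

Running `list_url = normalize_urls(list_url)` before the screenshot step keeps one report per distinct site.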


Now we will use pyppeteer to open the PageSpeed Insights site in headless Chromium. The code below loads the desktop report for each of the URLs listed above, waits for the report to be generated, and saves a screenshot of it as a PNG file.
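The screenshot step builds each report address from the PageSpeed Insights base page, the site URL, and tab=desktop. The sketch below shows that construction with urllib.parse.urlencode, which percent-encodes characters such as `:`, `/`, `?` and `&` in a full site URL so they are not read as part of the report page's own query string. psi_report_url is a hypothetical helper, and passing tab="mobile" assumes the page accepts that value alongside the tab=desktop used in this notebook.

```python
from urllib.parse import urlencode

# Base address of the PageSpeed Insights page used in this notebook.
PSI_BASE = "https://developers.google.com/speed/pagespeed/insights/"

def psi_report_url(site, tab="desktop"):
    """Build the PageSpeed Insights report address for one site (hypothetical helper)."""
    # urlencode escapes reserved characters in the site URL (':' -> %3A, '/' -> %2F, ...).
    return PSI_BASE + "?" + urlencode({"url": site, "tab": tab})

print(psi_report_url("www.example1.com"))
# https://developers.google.com/speed/pagespeed/insights/?url=www.example1.com&tab=desktop
print(psi_report_url("https://www.example.com/shop?page=2", tab="mobile"))
# https://developers.google.com/speed/pagespeed/insights/?url=https%3A%2F%2Fwww.example.com%2Fshop%3Fpage%3D2&tab=mobile
```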


In [5]:
from pyppeteer import launch
import asyncio
from urllib.parse import quote
import nest_asyncio

# Colab already runs an event loop; nest_asyncio lets us run another one inside it.
nest_asyncio.apply()

async def main():
    # Launch headless Chromium in the background. --no-sandbox is needed when running as root on Colab.
    browser = await launch(executablePath="/usr/lib/chromium-browser/chromium-browser", args=['--no-sandbox'])
    try:
        # Open a new tab and set the viewport size used for the screenshots.
        page = await browser.newPage()
        await page.setViewport({'width': 800, 'height': 1100})

        # enumerate gives each URL its own index, even if the same URL appears twice.
        for item, url in enumerate(list_url):
            # Load the desktop report for this site; quote() escapes the site URL for use in the query string.
            await page.goto('https://developers.google.com/speed/pagespeed/insights/?url=' + quote(url, safe='') + '&tab=desktop')

            # Wait 60 seconds, roughly the time the site needs to generate the report.
            # asyncio.sleep, unlike time.sleep, does not block the event loop that drives the browser.
            await asyncio.sleep(60)

            # Save a screenshot of the report locally on Colab as screenshot<N>.png.
            await page.screenshot({'path': 'screenshot' + str(item) + '.png'})
    finally:
        # Close the browser even if one of the pages fails to load.
        await browser.close()

asyncio.get_event_loop().run_until_complete(main())
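The cell above waits a fixed 60 seconds per site, which is too long for fast reports and may be too short for slow ones. One alternative is to poll a readiness check with a timeout. The sketch below is a hypothetical, pure-asyncio helper, wait_until, demonstrated with a stand-in condition; it does not know anything about the PageSpeed Insights page itself.

```python
import asyncio
import time

async def wait_until(check, timeout=120.0, interval=5.0):
    """Await check() every `interval` seconds until it returns True or `timeout` seconds pass.

    Hypothetical helper: `check` is any coroutine function that returns a bool.
    Returns True if the condition was met and False if the timeout was reached.
    """
    deadline = time.monotonic() + timeout
    while True:
        if await check():
            return True
        if time.monotonic() >= deadline:
            return False
        # asyncio.sleep hands control back to the event loop; time.sleep would block it.
        await asyncio.sleep(interval)

async def demo():
    calls = 0

    async def ready():
        # Stand-in condition that becomes true on the third check.
        nonlocal calls
        calls += 1
        return calls >= 3

    met = await wait_until(ready, timeout=5, interval=0.01)
    return met, calls

print(asyncio.run(demo()))  # (True, 3)
```

Inside main(), the fixed wait could become `await wait_until(report_ready)`, where report_ready is a coroutine you would write yourself, for example one that awaits pyppeteer's `page.content()` and looks for text that only appears in a finished report. Which text marks a finished report is an assumption to verify against the live page. In Colab, the nest_asyncio.apply() call above is what allows asyncio.run to work inside the notebook's running loop.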

Once all the screenshots are saved, the code below combines them into a single Word document, generatedfile.docx, which you can then download from the Colab file browser.
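The next cell adds screenshot<N>.png for every URL, so a single failed page load makes add_picture stop with an error. A small pre-check can list the sites whose screenshot is missing before the document is built. missing_screenshots is a hypothetical helper that assumes the screenshot<N>.png naming used in this notebook, where N is the URL's position in list_url.

```python
import os
import tempfile

def missing_screenshots(urls, directory="."):
    """Return the URLs whose screenshot<N>.png file does not exist in `directory` (hypothetical helper)."""
    missing = []
    for item, url in enumerate(urls):
        path = os.path.join(directory, "screenshot" + str(item) + ".png")
        if not os.path.isfile(path):
            missing.append(url)
    return missing

# Demo in a temporary folder: only the first site has a screenshot.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "screenshot0.png"), "wb").close()
    print(missing_screenshots(["www.example1.com", "www.example2.com"], tmp))
    # ['www.example2.com']
```

In the notebook, `missing_screenshots(list_url)` checks the current working directory, where the screenshots were saved.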

In [6]:
import docx
from docx.shared import Inches

# Create a blank Word document.
mydoc = docx.Document()

for item, url in enumerate(list_url):
    # Add each URL's screenshot to the Word file.
    # Giving only the width lets python-docx scale the height and keep the image's aspect ratio.
    mydoc.add_picture("screenshot" + str(item) + ".png", width=Inches(6))

# Save the Word file with a new name.
mydoc.save("generatedfile.docx")