# 9. The Comprehensive Guide To Automating Screaming Frog

------------------------------------------------

## Learning Outcomes

- To learn how to run Screaming Frog from the command line on Mac & Windows.
- To learn how to automatically run the Screaming Frog commands on Mac/Linux.
- To learn how to extract and wrangle the data returned directly from Screaming Frog reports in Python with Pandas.
- To learn how to upload your Screaming Frog data to BigQuery.
- To learn how to automatically run your Screaming Frog crawls daily with crontab.

------------------------------------------------------

[Screaming Frog (SF)](https://www.screamingfrog.co.uk/seo-spider/) is a fantastic desktop crawler that's available for Windows, Mac and Linux. 

This tutorial is separated into multiple parts. You'll learn not only how to easily automate SF crawls but also how to automatically wrangle the data using Python.

Then we'll <strong>create a data pipeline which pushes all of the data into BigQuery, where it can be viewed via a Google Data Studio report template.</strong>

---

Finally, we'll step up the automation and upload our scripts to a Google Cloud virtual machine:
- The virtual machine will turn on every day at a specific time.
- Several Python scripts will automatically execute and perform the following:
    - A list of domains from a .txt file / via environment variables will be sequentially crawled.
    - The data will be wrangled.
    - The data will be saved to BigQuery.
- Then the virtual machine will shut down after all of the domains have either completed or failed.
- The daily data will then be available via Google Data Studio.
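
The daily loop described above can be sketched roughly as follows. This is a minimal outline under stated assumptions, not the final scripts: the file name `domains.txt`, the environment variable `SF_DOMAINS` and the function names are all hypothetical, and the crawl, wrangle and upload steps are passed in as plain functions so that each one can be built in a later part.

```python
import os
from pathlib import Path
from typing import Callable, Dict, List


def load_domains(txt_path: str = "domains.txt", env_var: str = "SF_DOMAINS") -> List[str]:
    """Return domains from a comma-separated environment variable, else from a text file."""
    raw = os.environ.get(env_var)
    if raw:
        return [domain.strip() for domain in raw.split(",") if domain.strip()]
    path = Path(txt_path)
    if not path.exists():
        return []
    # One domain per line, blank lines ignored.
    return [line.strip() for line in path.read_text().splitlines() if line.strip()]


def run_pipeline(
    domains: List[str],
    crawl: Callable,
    wrangle: Callable,
    upload: Callable,
) -> Dict[str, List[str]]:
    """Crawl each domain in turn and record whether it completed or failed."""
    results: Dict[str, List[str]] = {"completed": [], "failed": []}
    for domain in domains:
        try:
            upload(wrangle(crawl(domain)))
            results["completed"].append(domain)
        except Exception:
            # A failing domain should not stop the remaining crawls.
            results["failed"].append(domain)
    return results
```

Once every domain is in either the completed or the failed list, the virtual machine can safely be shut down.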

---


## 1. Screaming Frog On The Command Line (CLI)
- [Part 1 - Learning How To Use Screaming Frog On The Command Line - Mac/Linux](#) 
- [Part 1 - Learning How To Use Screaming Frog On The Command Line - Windows](#) 

## 2. Automating Screaming Frog Data Analysis
- [Part 2 - Data Wrangling Using Pandas & Python](#)

## 3. Pushing Your Data Into BigQuery
- [Part 3 - Uploading Data into BigQuery with Python](#)

## 4. Viewing Your Data In A Google Data Studio Template
- [Part 4 - Connecting a Google Data Studio Template To The BigQuery Data](#)

## 5. Automating The Automation + Scheduling
- [Part 5 - How To Run Cron Jobs And Start Up And Shut Down A Google Cloud Virtual Machine](#)

--------------------------------------------------------------------------------------------------------------------------------

Let's get to it and take Screaming Frog to the next level!

![screaming frog image](https://sempioneer.com/wp-content/uploads/2020/06/screaming-frog-template.png)

------------------------------------------------------------------------------

## The Command Line

Before I give you any commands to copy and paste, the next part of this guide will show you how to use the [command line](https://www.codecademy.com/articles/command-line-commands). Lots of activities that you regularly do on your computer, such as opening/closing programs or requesting a web page, can be performed on the command line.

If you'd like a detailed overview of the different types of commands you can use on your computer, I'd recommend viewing these guides:

---

- [Mac/Linux Udemy Course](https://www.udemy.com/course/mac-os-x-command-line-beyond-the-basics-d/)
- [Mac/Linux Cheatsheet](https://cheatography.com/davechild/cheat-sheets/linux-command-line/)
- [Windows Udemy Course](https://www.udemy.com/topic/windows-command-line/)
- [Windows Command Prompt Cheatsheet](https://www.sans.org/security-resources/sec560/windows_command_line_sheet_v1.pdf)


-------------------------------------------------------------------------------------------------------------------

## Part 1 - Screaming Frog CLI

### Mac

This part of the tutorial is for macOS users only. If you're using Windows, skip to the Windows section instead.

#### Opening Terminal

First you will need to open Terminal, which can be done with the following steps:
    
- ⌘ Cmd + Space
- Type terminal
- Press enter

----------------

![terminal loading here](https://sempioneer.com/wp-content/uploads/2020/06/loading-macosx-terminal-1.png)

----------------------------

![mac os x terminal loaded](https://sempioneer.com/wp-content/uploads/2020/06/macosx-terminal-start-1.png)

#### Useful Linux Commands:

Several useful commands that you can use are:
- cd ~ (cd allows you to change directory)
- pwd (pwd prints your current working directory)
- mkdir folder (mkdir allows you to create folders)
- clear (clear removes any previous text from your terminal)

--------------------------------------------------------------------------------------------------------------

#### How To Open Screaming Frog With The Terminal

Assuming that Screaming Frog is installed in the default location, you can run Screaming Frog with:
    
~~~

/Applications/Screaming\ Frog\ SEO\ Spider.app/Contents/MacOS/ScreamingFrogSEOSpiderLauncher

~~~

------------------------------------------------------------------------------------------------------------

#### How To Create An Alias In Terminal

Now let's create a shortcut for the command that we just ran. This is called an alias. All of your aliases need to be created inside of:
    
~~~
 
~/.bash_profile (Older Mac Terminals)
~/.zshrc (Newer Mac Terminals)

~~~

NB: You can easily find out whether you're on a new Mac terminal with:

~~~

echo $SHELL
    
~~~

If it says /bin/zsh, then you will need to update the .zshrc file instead.
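
The same check can be made from a script. Here is a small sketch that maps the shell path to the file that should hold your aliases; the function name `rc_file_for_shell` is hypothetical, and only the two shells mentioned above are handled.

```python
import os


def rc_file_for_shell(shell_path: str) -> str:
    """Return the startup file to edit for the given shell path."""
    shell_name = os.path.basename(shell_path)
    if shell_name == "zsh":
        return "~/.zshrc"  # Newer Mac terminals
    if shell_name == "bash":
        return "~/.bash_profile"  # Older Mac terminals
    raise ValueError(f"Unrecognised shell: {shell_path}")


# The SHELL environment variable holds the path of your login shell.
current_shell = os.environ.get("SHELL", "")
if current_shell:
    print(current_shell)
```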

You can edit this file with either:
~~~
nano ~/.bash_profile   # Older Mac terminals (bash)
nano ~/.zshrc          # Newer Mac terminals (zsh)
~~~

------------------------------------------------------

The alias that we will create will be called sf and will automatically run the Screaming Frog application. Add the following to either your .bash_profile or .zshrc file:

~~~

alias sf="/Applications/Screaming\ Frog\ SEO\ Spider.app/Contents/MacOS/ScreamingFrogSEOSpiderLauncher"

~~~

![](https://sempioneer.com/wp-content/uploads/2020/06/creating-alias-sf.png)

Then hit:
    
~~~

CTRL + X (to exit nano)
Y (to confirm saving the file)
Enter

~~~

![](https://sempioneer.com/wp-content/uploads/2020/06/confirming-alias-sf.png)

------------------------------------------------------------------

Now close your terminal and reload it using:
- ⌘ Cmd + Space
- Type terminal
- Press enter

---------------------------------------------------------------------------------

Now type:
    
~~~

sf

~~~

![](https://sempioneer.com/wp-content/uploads/2020/06/screaming-frog-alias.png)

As you can see, we've now successfully created a shortcut for loading Screaming Frog.

--------------------------------------------------------

#### How To See All Of The Commands:

You can easily get a list of all of the available commands with:
    
~~~

sf --help

~~~

-------------------------------------------------------------------------------------------------------------------

#### How To Run A Crawl

If you want to open Screaming Frog and crawl a website use this:

~~~

sf --crawl <url_name>

~~~

For example if you wanted to crawl https://sempioneer.com:
    
~~~

sf --crawl https://sempioneer.com

~~~

You can use any URL or domain name that you'd like and the above commands will both:

- Open your Screaming Frog Application.
- Crawl the desired domain.

------------------------------------------------------------------------------------------------------------

#### How To Run Screaming Frog Headless (Without A Graphical User Interface)

It's possible for us to execute Screaming Frog without a graphical user interface, by adding <strong>--headless</strong>:

~~~

sf --headless --crawl <url>


~~~

Additionally we can save the crawl by adding <strong> --save-crawl </strong>:


~~~

sf --headless --save-crawl --crawl <url> 

~~~

---

<strong> NB: You will need to [purchase a license](https://www.screamingfrog.co.uk/seo-spider/licence/) for executing Screaming Frog with the --save-crawl functionality. </strong>

An example would be:
    
~~~

sf --headless --save-crawl --crawl https://phoenixandpartners.co.uk/

~~~
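
Shell aliases such as sf only exist inside an interactive terminal, so a script needs to call the launcher by its full path. The sketch below builds the same headless command as a list of arguments. It assumes the default install location used earlier, and the function name `build_crawl_command` is hypothetical.

```python
import shlex
from typing import List

# Default macOS install location (an assumption; adjust if installed elsewhere).
SF_LAUNCHER = (
    "/Applications/Screaming Frog SEO Spider.app"
    "/Contents/MacOS/ScreamingFrogSEOSpiderLauncher"
)


def build_crawl_command(
    url: str,
    headless: bool = True,
    save_crawl: bool = False,
    launcher: str = SF_LAUNCHER,
) -> List[str]:
    """Build the argument list for a Screaming Frog crawl without running it."""
    command = [launcher]
    if headless:
        command.append("--headless")
    if save_crawl:
        command.append("--save-crawl")  # Requires a paid licence
    command += ["--crawl", url]
    return command


command = build_crawl_command("https://phoenixandpartners.co.uk/", save_crawl=True)
# shlex.join shows the command as you would type it in Terminal.
print(shlex.join(command))
```

Passing the list to `subprocess.run(command, check=True)` would start the crawl. Because the arguments are a list rather than one string, the spaces in the application path don't need escaping.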

------------------------------------------------------------------------------------------------------------

#### How To Export Data

As well as saving a crawl, we can control where it is saved by adding two extra arguments:

~~~

--output-folder (This argument lets you specify the folder where the crawl data will be exported)
--timestamped-output (This argument saves the output inside a time-stamped folder. As every crawl is saved as crawl.seospider, the timestamp prevents a new crawl from overwriting an existing file.)

~~~

1. Locate your username by typing <strong>pwd</strong> in a new Terminal window. Your username is the last part of the path that is printed (excluding the $ from the prompt). For example, my username is: jamesaphoenix
    


![](https://sempioneer.com/wp-content/uploads/2020/06/finding-user-name.png)

2. Go back to either your .bash_profile file or .zshrc file and let's create a new alias:


~~~
nano ~/.bash_profile   # Older Mac terminals (bash)
nano ~/.zshrc          # Newer Mac terminals (zsh)
~~~

---

Then add the following alias to the bottom of the file:

~~~
alias sf-headless="/Applications/Screaming\ Frog\ SEO\ Spider.app/Contents/MacOS/ScreamingFrogSEOSpiderLauncher --headless --save-crawl --output-folder /users/{username}/desktop --timestamped-output --crawl"
~~~

---

Please remember to replace <strong>{username}</strong> with your own username!

------------------------------------------------------------------------

Save your file and load up a new Terminal window and enter:
    
~~~

sf-headless example.org


~~~

You should now have a time-stamped folder on your desktop, and inside that folder you will see a file called crawl.seospider.

------------------------------------------------------

![](https://sempioneer.com/wp-content/uploads/2020/06/folder-.png)

------------------------------------------------------

![](https://sempioneer.com/wp-content/uploads/2020/06/seo-spider.png)
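
When a later script needs to pick up the most recent crawl, it has to find the newest time-stamped folder. Here is a minimal sketch that relies on each folder's modification time rather than on the exact format of the timestamp in its name; the function name `latest_output_folder` is hypothetical.

```python
from pathlib import Path
from typing import Optional


def latest_output_folder(output_root: str) -> Optional[Path]:
    """Return the most recently modified sub-folder of output_root, if any."""
    subfolders = [path for path in Path(output_root).iterdir() if path.is_dir()]
    if not subfolders:
        return None
    # The newest crawl is assumed to be the folder that was modified last.
    return max(subfolders, key=lambda path: path.stat().st_mtime)
```

For example, `latest_output_folder("/users/{username}/desktop")` would return the folder created by the most recent crawl, provided nothing else on the desktop has been modified since.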

--------------------------------------------------

#### How To Export CSV Files:

As well as doing a crawl, it's possible to automatically export the .csv files.

You can export tabs, which are the following:
    
![](https://sempioneer.com/wp-content/uploads/2020/06/tabs.png)


For example if we wanted to crawl the website and export a .csv file with all of the images without alt text, we would do the following:

~~~
sf --crawl <url> --output-folder /users/{username}/desktop/sf --export-tabs "Images:Missing Alt Text" --headless
~~~

---------------------------------------------------------------------------------------------------------

The syntax for exporting from tabs follows a generic structure:

~~~

--export-tabs "tab-parent:tab-child"

~~~

--------------------------------------------------------------------------------

#### How To Export Multiple Tabs

You can also export multiple files at once by simply separating them with a comma:
    
~~~

"parent1:child1,parent2:child2,parent3:child3"

~~~
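
If the list of tabs lives in a script, the comma-separated value can be assembled from parent and child pairs. This is only a sketch and the helper name `export_tabs_argument` is hypothetical.

```python
from typing import Iterable, Tuple


def export_tabs_argument(tabs: Iterable[Tuple[str, str]]) -> str:
    """Join (parent, child) pairs into the value passed to --export-tabs."""
    parts = []
    for parent, child in tabs:
        # Commas separate exports, so they can't appear inside a name.
        if "," in parent or "," in child:
            raise ValueError(f"Tab names can't contain commas: {parent}:{child}")
        parts.append(f"{parent}:{child}")
    return ",".join(parts)


tabs = [
    ("Page Titles", "Duplicate"),
    ("Page Titles", "Missing"),
    ("Meta Description", "Missing"),
]
print(export_tabs_argument(tabs))
```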

In order to see the parent:child relationships for the tabs, simply look at how they are nested inside of the right panel of Screaming Frog:
        

![](https://sempioneer.com/wp-content/uploads/2020/06/meta-description-tab.png)

-------------------------------------------------

![](https://sempioneer.com/wp-content/uploads/2020/06/page-titles-tab.png)

------------------------

Let's simultaneously export duplicate title tags, missing title tags and missing meta descriptions:

~~~
sf --crawl phoenixandpartners.co.uk --timestamped-output --output-folder /users/{username}/Desktop --export-tabs "Page Titles:Duplicate,Page Titles:Missing,Meta Description:Missing" --headless

~~~

--- 

For example, my username is jamesaphoenix, so the command becomes:

~~~
sf --crawl phoenixandpartners.co.uk --timestamped-output --output-folder /users/jamesaphoenix/Desktop --export-tabs "Page Titles:Duplicate,Page Titles:Missing,Meta Description:Missing" --headless
~~~

---

![](https://sempioneer.com/wp-content/uploads/2020/06/exporting-multiple-tabs-with-screaming-frog-cli.png)
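
A quick way to confirm that the exports worked is to count the rows in each .csv file. The sketch below makes no assumptions about the file names or column headers that Screaming Frog writes; it simply reads every .csv file in a folder. The function name `summarise_csv_exports` is hypothetical.

```python
import csv
from pathlib import Path
from typing import Dict


def summarise_csv_exports(folder: str) -> Dict[str, int]:
    """Map each .csv file name in folder to its number of rows (header included)."""
    summary: Dict[str, int] = {}
    for csv_path in sorted(Path(folder).glob("*.csv")):
        # utf-8-sig also copes with files that start with a byte order mark.
        with csv_path.open(newline="", encoding="utf-8-sig") as handle:
            summary[csv_path.name] = sum(1 for _ in csv.reader(handle))
    return summary
```

A file with very few rows usually means the crawl found nothing for that filter, which is worth knowing before the data is wrangled in Part 2.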

------------------------------------------------------------------------

#### How To Export Reports

You can also export reports:

![](https://sempioneer.com/wp-content/uploads/2020/06/exporting-screaming-frog-reports.png)

The syntax is similar and uses the <strong>parent:child</strong> structure, however if there is no child then only the parent name is required.

Here's an example where only the parent level is required:

~~~
sf --crawl <url> --timestamped-output --output-folder /users/{username}/desktop --save-report "Redirect & Canonical Chains" --headless
~~~

---

Here's an example where the parent:child structure is required:

~~~
sf --crawl <url> --timestamped-output --output-folder /users/{username}/desktop --save-report "Redirects:All Redirects" --headless
~~~

--- 


![](https://sempioneer.com/wp-content/uploads/2020/06/all-redirects.png)

#### How To Perform Bulk Exports

We can also perform bulk exports!

![](https://sempioneer.com/wp-content/uploads/2020/06/bulk-export.png)

An example where only a parent level is required:

~~~

sf --crawl <url> --timestamped-output --output-folder /users/{username}/Desktop --bulk-export "All Images" --headless

~~~

---

An example where the parent:child structure is required:

~~~

sf --crawl <url> --timestamped-output --output-folder /users/{username}/Desktop --bulk-export "AMP:All Inlinks" --headless

~~~
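
Tabs, reports and bulk exports all follow the same pattern, so a script can hold them together and turn them into flags in one go. This is a sketch under assumptions: the class name `ExportOptions` is hypothetical, and it assumes that each flag accepts a comma-separated list in the same way as --export-tabs, which is worth confirming with sf --help on your version.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ExportOptions:
    """The exports to request from a single crawl."""

    tabs: List[str] = field(default_factory=list)
    reports: List[str] = field(default_factory=list)
    bulk_exports: List[str] = field(default_factory=list)

    def to_args(self) -> List[str]:
        """Return the command line flags for the non-empty export lists."""
        args: List[str] = []
        if self.tabs:
            args += ["--export-tabs", ",".join(self.tabs)]
        if self.reports:
            args += ["--save-report", ",".join(self.reports)]
        if self.bulk_exports:
            args += ["--bulk-export", ",".join(self.bulk_exports)]
        return args


options = ExportOptions(
    tabs=["Images:Missing Alt Text"],
    reports=["Redirects:All Redirects"],
    bulk_exports=["All Images"],
)
print(options.to_args())
```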

------------------------------------------------------------------------------------------------------------------------------

#### How To Create A Sitemap

If you're using a content management system such as WordPress, then I'd recommend using a plugin such as [Yoast](https://yoast.com/wordpress/plugins/seo/), [TheSEOFramework](https://theseoframework.com/) or [RankMath](https://rankmath.com/) to automatically build your sitemap.xml files.

However if you're working with a headless CMS or a static website, you can automatically create sitemaps with Screaming Frog:

~~~

sf --crawl <url> --create-sitemap --output-folder /users/{username}/desktop --headless

~~~

![](https://sempioneer.com/wp-content/uploads/2020/06/sitemap_xml.png)
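
To sanity check the generated file, the URLs can be pulled back out of the XML. Here is a minimal sketch that assumes the file follows the standard sitemaps.org format; the function name `sitemap_urls` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace used by the standard sitemap protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> List[str]:
    """Return every <loc> value found inside the <url> entries of a sitemap."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]
```

Reading the exported sitemap.xml from the output folder and passing its contents to `sitemap_urls` gives a list that can be compared against the pages you expected to be crawled.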

------------------------------------------------------------------------------------------------------------------------------

#### How To Create Configuration Files

Configuration files allow you to tune the crawl speed, choose specific user agents, crawl or not crawl specific pages and tons more!

After changing the configuration inside of Screaming Frog, you can save it as a configuration file.

We can then apply that config file to a headless Screaming Frog crawl via the terminal.
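
As a rough sketch of that last step, the snippet below appends a saved configuration file to an existing command. It assumes the --config option listed by sf --help and the .seospiderconfig extension that Screaming Frog uses when saving a configuration. The file path and the helper name `with_config` are hypothetical.

```python
from pathlib import Path
from typing import List


def with_config(command: List[str], config_path: str) -> List[str]:
    """Return a copy of command with a Screaming Frog config file attached."""
    if Path(config_path).suffix != ".seospiderconfig":
        raise ValueError(f"Expected a .seospiderconfig file, got: {config_path}")
    # A new list is returned so the original command is left untouched.
    return command + ["--config", config_path]


base_command = ["sf", "--crawl", "https://example.org", "--headless"]
print(with_config(base_command, "/users/{username}/desktop/structured-data.seospiderconfig"))
```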

---


Create Your Config File:

First open up Screaming Frog and go to Configuration >> Extraction >> Structured Data:

Then tick the following checkboxes:

- JSON-LD
- Microdata
- RDFa




------------------------------------------------------------------------------------------------------------------------------

#### How To Crawl Text Files

------------------------------------------------------------------------------------------------------------------------------

### Windows

------------------------------------------------------------------------------------------------------------

Resources:

- https://seobutler.com/badass-seo-automate-screaming-frog/

-----------------------------------------------------------------------------