Backlink checker is a simple tool that checks backlink quality, identifies problematic backlinks, and reports them to a specified Slack channel.
The tool requests each backlink page, which is supposed to contain a referent link, and checks whether it actually does. If the referent link is present, the tool inspects the page's HTML for certain elements that indicate the quality of the backlink.
The first step is to prepare the environment. The backlink checker is written in Python. Two of the most common Python packages for building web crawling tools are Requests and Beautiful Soup 4, a library for pulling data out of HTML. Also make sure the pandas package is installed, as it is used for some simple data wrangling.
These packages can be installed using the pip install command.
    pip install beautifulsoup4 requests pandas
This installs all three packages.
Important: Note that version 4 of BeautifulSoup is being installed here. Earlier versions are now obsolete.
The script scrapes backlink pages and checks several backlink quality signals:
- whether the backlink is reachable
- whether the backlink contains a noindex element
- whether the backlink contains a link to the referent page
- whether the link to the referent page is marked as nofollow
The first step is to try to reach the backlink. This can be done using the Requests library's get() method.
    try:
        resp = requests.get(
            backlink,
            allow_redirects=True
        )
    except requests.exceptions.RequestException:
        # Network-level failure (DNS error, refused connection, and so on)
        return ("Backlink not reachable", "None")
    response_code = resp.status_code
    if response_code != 200:
        return ("Backlink not reachable", response_code)
If the request returns an error (such as 404 Not Found) or the backlink cannot be reached, the backlink is assigned the Backlink not reachable status.
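Requests does not apply a timeout unless one is passed, so an unresponsive server could stall the whole run. A minimal sketch of the same step with a timeout follows. The helper name check_reachable, the fetch parameter, and the 10-second default are assumptions made for illustration and are not part of the original script. The fetch callable is expected to accept the same keyword arguments as requests.get, which also makes the helper easy to exercise without network access.

```python
def check_reachable(backlink, fetch, timeout=10.0):
    """Return (response, None) on success or (None, (status, code)) on failure.

    `fetch` is a hypothetical injection point: any callable that accepts the
    same keyword arguments as requests.get (for example, requests.get itself).
    """
    try:
        resp = fetch(backlink, allow_redirects=True, timeout=timeout)
    except Exception:
        # Connection errors, timeouts, invalid URLs and similar failures
        return None, ("Backlink not reachable", "None")
    if resp.status_code != 200:
        return None, ("Backlink not reachable", resp.status_code)
    return resp, None
```

In the script, this could be called as check_reachable(backlink, requests.get).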
To navigate the HTML of a backlink, a Beautiful Soup object needs to be created.
    # encoding is the page's character encoding; it is not defined in this snippet (see the complete code)
    bsObj = BeautifulSoup(resp.content, 'lxml', from_encoding=encoding)
Note that if you do not have lxml installed already, you can install it by running pip install lxml.
Beautiful Soup's find_all() method can be used to check whether the HTML contains <meta> tags whose content attribute includes noindex. If it does, the backlink is assigned the Noindex status.
    if len(bsObj.find_all('meta', content=re.compile("noindex"))) > 0:
        return ('Noindex', response_code)
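The regular expression above matches any <meta> tag whose content attribute contains noindex, regardless of the tag's name. A stricter check would look only at robots directives. The sketch below is dependency-free and uses the standard library's html.parser in place of Beautiful Soup. The class and function names, and the choice to treat name="robots" and name="googlebot" as the relevant tags, are assumptions for illustration only.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Sets self.noindex when a robots <meta> tag carries a noindex directive."""

    # Assumption: these are the meta names worth checking
    ROBOTS_NAMES = {"robots", "googlebot"}

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (bare attributes), so normalise to ""
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() not in self.ROBOTS_NAMES:
            return
        directives = [d.strip().lower() for d in attr.get("content", "").split(",")]
        if "noindex" in directives:
            self.noindex = True


def has_noindex(html):
    """Return True if the HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.noindex
```

With this approach, a page whose description merely mentions the word noindex is not flagged.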
Next, check whether the HTML contains an anchor tag (marked as a) with the referent link. If no referent link is found, the backlink is assigned the Link was not found status.
    elements = bsObj.find_all('a', href=re.compile(our_link))
    if not elements:
        return ('Link was not found', response_code)
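Because our_link is passed to re.compile() unescaped, characters such as . and ? in the URL are interpreted as regular expression syntax. The short sketch below shows the effect using only the standard library's re module, and how re.escape() avoids it. The example URLs are made up for illustration.

```python
import re

our_link = "https://example.com/page?id=1"  # hypothetical referent link

raw_pattern = re.compile(our_link)
safe_pattern = re.compile(re.escape(our_link))

# Unescaped, "?" makes the preceding "e" optional, so the literal URL is missed
print(bool(raw_pattern.search(our_link)))   # False
# Escaped, every character is matched literally
print(bool(safe_pattern.search(our_link)))  # True
```

In the script, the equivalent change would be href=re.compile(re.escape(our_link)).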
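Finally, let's check whether the anchor element that links to the referent page is marked as nofollow. This value is found in the element's rel attribute.
    # Inspect the first anchor that matched the referent link
    element = elements[0]
    # Beautiful Soup returns rel as a list of tokens; a missing rel means dofollow
    if 'nofollow' in element.get('rel', []):
        return ('Link found, nofollow', response_code)
    return ('Link found, dofollow', response_code)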
Based on the result, let's assign either Link found, nofollow or Link found, dofollow status.
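Beautiful Soup exposes rel as a list of tokens, and rel keywords are case-insensitive in HTML, so a link marked rel="NoFollow noopener" should still count as nofollow. A small sketch of that classification as a pure function follows. The function name classify_rel is an assumption; the status strings are the ones used above.

```python
def classify_rel(rel_tokens):
    """Map an anchor's rel tokens to a status string.

    `rel_tokens` is a list of strings, or None when the attribute is missing.
    """
    tokens = {token.lower() for token in (rel_tokens or [])}
    if "nofollow" in tokens:
        return "Link found, nofollow"
    return "Link found, dofollow"
```

With a Beautiful Soup element, this could be called as classify_rel(element.get('rel')).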
After getting the status for each backlink and referent link pair, let's collect this information (along with the response code from the backlink) in a pandas DataFrame.
    records = []
    for backlink, referent_link in zip(backlinks_list, referent_links_list):
        (status, response_code) = get_page(backlink, referent_link)
        records.append([backlink, status, response_code])
    # DataFrame.append() was removed in pandas 2.0, so build the frame once from all records
    df = pd.DataFrame(records, columns=['Backlink', 'Status', 'Response code'])
The get_page() function refers to the four-step process described above (please see the complete code for a better understanding).
To report backlinks and their statuses automatically and conveniently, a Slack app can be used. You will need to create an app in Slack and set up an incoming webhook that connects it to the Slack channel you would like to post notifications to. More on Slack apps and webhooks: https://api.slack.com/messaging/webhooks
    SLACK_WEBHOOK = "YOUR_SLACK_CHANNEL_WEBHOOK"
Although the following piece of code may look a bit complicated, all it does is format the data into a readable form and push it to the Slack channel via a POST request to the Slack webhook.
    rows = []
    # Iterate over row values directly so the result does not depend on the DataFrame index
    for values in df.values.tolist():
        row = ''
        for value in values:
            row += "`" + str(value) + "` "
        row = ':black_small_square:' + row
        rows.append(row)
    data = ["*Backlinks*\n"] + rows
    slack_data = {
        "text": '\n'.join(data)
    }
    # requests.post() takes the URL as its first positional argument
    requests.post(SLACK_WEBHOOK, json=slack_data)
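The code above posts every backlink, whatever its status. If only problematic backlinks should reach the channel, the rows could be filtered first. The sketch below is one way to do that under a stated assumption: any status other than Link found, dofollow counts as problematic. The function names are hypothetical and the functions work on plain lists, so they can be tried without pandas or network access.

```python
# Assumption: every status other than this one counts as problematic
HEALTHY_STATUS = "Link found, dofollow"


def problematic_rows(rows):
    """Keep only [backlink, status, response_code] rows that need attention."""
    return [row for row in rows if row[1] != HEALTHY_STATUS]


def build_slack_payload(rows):
    """Format rows as a Slack message payload, one bullet per backlink."""
    lines = ["*Backlinks*\n"]
    for row in rows:
        cells = " ".join("`" + str(value) + "`" for value in row)
        lines.append(":black_small_square:" + cells)
    return {"text": "\n".join(lines)}
```

The payload could then be sent with requests.post(SLACK_WEBHOOK, json=build_slack_payload(problematic_rows(df.values.tolist()))).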
That's it! In this example, Slack was used for reporting purposes, but the code can be adjusted so that backlinks and their statuses are exported to a .csv file, Google Sheets, or a database.
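As an illustration of the .csv option, here is a minimal sketch using only the standard library's csv module. The function name export_to_csv and the output path are assumptions. With the pandas DataFrame built earlier, df.to_csv(path, index=False) would produce an equivalent file.

```python
import csv


def export_to_csv(rows, path):
    """Write [backlink, status, response_code] rows to a CSV file with a header."""
    # newline="" lets the csv module control line endings on every platform
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Backlink", "Status", "Response code"])
        writer.writerows(rows)
```

For example, export_to_csv(df.values.tolist(), "backlinks.csv") would write one line per checked backlink.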
Please see backlink_monitoring_oxylabs.py for the complete code.