# What it is

A script that takes an excel xlsx file containing the url to check along with expected meta tags that a user wants to check. The tool outputs whether the tags have been set at all, incorrectly, or correctly. It also outputs an SEO check which makes sure that the meta tags follow general SEO guidelines.

The current meta tags that can be checked are:
* Meta Title
* Meta Description

# Before Running All Cells

Check the the xlsx file containing meta tags that you want to check follows the correct format. To see the expected format, view the xlsx file "Example" under the Inputs folder. Then, place the xlsx file in the Check-Meta-Tags > Inputs folder.

Finally, enter file name and sheet below under [User Input](#user_input)

# How to Run
On top select Cell -> Run All

<a id='user_input'></a>

## User Input

Enter information below before running the cells.

In [33]:
# User input data
# Currently the fields are set to be 

META_TAGS_WORKBOOK = 'Example.xlsx'
META_TAGS_WORKBOOK_SHEET = 'MetaTagsSheet'

IS_SSL = True


## Import and Constant

Cells in this section import libraries, define where the ouputted file will go, and load the file the user wants to use to check canonicals.

In [53]:
# Imports and constants
import pandas as pd
import xlrd
from xlutils.copy import copy
from datetime import datetime
import requests
import pandas as pd

import sys
sys.path.append('../')

import automationtesting

META_TAGS_INPUT_FOLDER = 'Inputs/'
META_TAGS_OUTPUT_FOLDER = 'Results/'

META_TAGS_INPUT_WORKBOOK_PATH = META_TAGS_INPUT_FOLDER + META_TAGS_WORKBOOK

to_check = xlrd.open_workbook(META_TAGS_INPUT_WORKBOOK_PATH)
to_check_sheet = to_check.sheet_by_name(META_TAGS_WORKBOOK_SHEET)

check_wb = copy(to_check) 
check_sheet = check_wb.get_sheet(META_TAGS_WORKBOOK_SHEET)

# These are the currently available tag checkers
META_TAGS = {"title":{'column':1, 'min_char':0, 'max_char':50,
                      'results':pd.DataFrame(columns=["Match Result","Expected Title", "Actual Title", "SEO Check"])},
             "description":{'column':2, 'min_char':50, 'max_char':300,
                    'results':pd.DataFrame(
                        columns=["Match Result","Expected Description", "Actual Description", "SEO Check"])}}

URL_RESULTS = pd.DataFrame(columns=["URL Status", "URL Status Code", "URL"])

# Testing

The cells below are a check to make sure that the tool is working correctly. If one of these fails, and the canonical checker still runs, outputted file may be incorrect. Reach out or trouble shoot based on the outputted error.

When selecting 'Run All Cells', if one of these tests fails, the code will stop running at this cell. If you want to continue, you can select the 'Actual Check' cell and continue by running that, but it's highly advised against.

In [9]:
def test_return_tag(url, tag, actual):
    '''Test for checking that the canonical parser is working correctly.
    If this returns a warning, first check that the passed in url actually has the redirect.'''
    result = automationtesting.return_tag(url, tag)
    if result == actual:
        return True
    else:
        print("Error when parsing")
        return sys.exit(result)

print(test_return_tag("https://www.masterlock.com/business-use/product/A1266NBLK",
                      "title", 'Model No. A1266NBLK | Master Lock'))
print(test_return_tag("https://www.masterlock.com/business-use/product/A1266NBLK",
                "description", 
                "The American Lock A1266NBLK Solid Aluminum Padlock offers customization options to help fit your security needs, including keying, laser engraving and shackle options. Learn more.")) 

True
True


## Actual Check

Now on to applying the logic.

In [19]:
mapping_of_results = {}
# Loop through entire row first
# Make module out of basic functions

for tag in META_TAGS:
    # These are what the headers of the outputted xlsx will be, along with the output printed after running this cell.
    cols = ["Match Result", "Status Code","Url", "Expected "+tag, "actual "+tag, "SEO Check"]

    # This will be the ouputted table that will hold all of the results for that tag.
    list_of_results = pd.DataFrame(columns=cols)
    
    min_seo = META_TAGS[tag]['min_char']
    max_seo = META_TAGS[tag]['max_char']

    for i in range(1, len(check_sheet.rows)):
        '''For every row in the input data, check to see that the meta tag 1) exists 2) is what was desired'''
        
        # Clear the results to be sure they don't carry over row to row
        seo_result = "n/a"
        match_result = "n/a"
        actual_tag = "n/a"

        # Get data from the inputted file and clean url if not full path
        url_containing_tag = automationtesting.return_full_clean_path(to_check_sheet.cell(i, 0).value, IS_SSL)
        expected_tag = to_check_sheet.cell(i, META_TAGS[tag]['column']).value.strip()

        if expected_tag != "":
            try:
                url_status_code = requests.get(url_containing_tag).status_code
                print(url_status_code)

                if url_status_code == 200:
                    actual_tag = automationtesting.return_tag(url_containing_tag, tag)
                    print(actual_tag)
                    seo_result = automationtesting.check_seo_length(expected_tag, min_seo, max_seo)
                    print(seo_result)
                    match_result = automationtesting.check_matching(expected_tag, actual_tag)
                else: 
                    match_result = "Error when accessing url containing meta tag "+str(tag)+". See status code."
            except:
                url_status_code = "Error"
                match_result = "Url could not be accessed"


        # Append the result to a dataframe for output later
        list_of_results.loc[i] = [match_result, url_status_code, url_containing_tag,
                                  expected_tag, actual_tag, seo_result]

    print(list_of_results)
    mapping_of_results[tag] = list_of_results

200
Locks, Padlocks and Security Products | Master Lock
Too long, should be <= 50
200
About Us | Master Lock
OK
200
Service & Support | Master Lock
OK
200
Warranty Information | FAQS | Master Lock
OK
                       Match Result Status Code  \
1                                OK         200   
2                                OK         200   
3                               n/a         200   
4                                OK         200   
5  Expected and actual do not match         200   
6                               n/a         200   
7         Url could not be accessed       Error   

                                                 Url  \
1                         https://www.masterlock.com   
2                https://www.masterlock.com/about-us   
3  https://www.masterlock.com/personal-use/school...   
4     https://www.masterlock.com/service-and-support   
5  https://www.masterlock.com/service-and-support...   
6  https://www.masterlock.com/service-and-support...   

In [54]:
# Loop through entire row first
for i in range(1, len(check_sheet.rows)):
    # Get data from the inputted file and clean url if not full path
    url_containing_tag = automationtesting.return_full_clean_path(to_check_sheet.cell(i, 0).value, IS_SSL)
    try:
        url_status_code = requests.get(url_containing_tag).status_code

        if url_status_code == 200:
            url_status = "OK"
            for tag in META_TAGS:
                '''For every row in the input data, check to see that the meta tag 1) exists 2) is what was desired'''

                # Clear the results to be sure they don't carry over row to row
                actual_tag = "n/a"
                seo_result = "n/a"
                match_result = "n/a"

                expected_tag = to_check_sheet.cell(i, META_TAGS[tag]['column']).value.strip()
                if expected_tag != "":
                    actual_tag = automationtesting.return_tag(url_containing_tag, tag) # Actual tag
                    seo_result = automationtesting.check_seo_length(
                        expected_tag, META_TAGS[tag]['min_char'], META_TAGS[tag]['max_char']) # Seo Result
                    match_result = automationtesting.check_matching(expected_tag, actual_tag) # Match Result
                META_TAGS[tag]['results'].loc[i] = [actual_tag, expected_tag, seo_result, match_result]
            
        else: 
            url_status = "Error when accessing URL. See status code."
    except:
        url_status_code = "Error"
        url_status = "URL could not be accessed"
    URL_RESULTS.loc[i] = [url_status, url_status_code, url_containing_tag]

for tag in META_TAGS:
    META_TAGS[tag]['results'] = pd.concat([URL_RESULTS, META_TAGS[tag]['results']], axis=1).fillna("n/a")
    print(META_TAGS[tag]['results'])

                  URL Status URL Status Code  \
1                         OK             200   
2                         OK             200   
3                         OK             200   
4                         OK             200   
5                         OK             200   
6                         OK             200   
7  URL could not be accessed           Error   

                                                 URL  \
1                         https://www.masterlock.com   
2                https://www.masterlock.com/about-us   
3  https://www.masterlock.com/personal-use/school...   
4     https://www.masterlock.com/service-and-support   
5  https://www.masterlock.com/service-and-support...   
6  https://www.masterlock.com/service-and-support...   
7                                 https://www.badurl   

                                        Match Result  \
1  Locks, Padlocks and Security Products | Master...   
2                             About Us | Master Lock  

## Create Result Output File

After running the cell below, the results gotten from checking canonicals will be placed in an xlsx with the current timestamp in the title and then outputted to the __Results__ folder.

In [5]:
# Run to output the dataframe as an xlsx file in the 'Results' folder

# Time stamped file. In form yyyy-mm-dd_hour-minute
OUTPUT_FILE = META_TAGS_OUTPUT_FOLDER + 'meta-tag-results_'+ datetime.now().strftime("%Y-%m-%d_%H-%M") + '.xlsx'

writer = pd.ExcelWriter(OUTPUT_FILE, engine='xlsxwriter',)
for tag in META_TAGS:
    '''For each of the tags, create a new sheet with its results and add it to the outputted file.'''
    mapping_of_results[tag].to_excel(writer, sheet_name=tag, index=False)
writer.save()