# What it is

A script that takes an Excel (.xlsx) file listing the URLs to check along with the meta tags expected on each page. For each tag, the tool reports whether it is missing, set incorrectly, or set correctly. It also runs an SEO check (currently a character-length check) to confirm that the meta tags follow general SEO guidelines.

The current meta tags that can be checked are:
* Meta Title
* Meta Description
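
The three outcomes described above can be sketched roughly as follows. This is only an illustration, not the notebook's actual logic: `classify_tag` and its return strings are hypothetical, and ignoring leading and trailing whitespace is an assumption.

```python
def classify_tag(expected, actual):
    """Hypothetical helper: classify a meta tag as missing, mismatched, or matching."""
    # A tag that is absent or blank on the page counts as not set at all.
    if actual is None or actual.strip() == "":
        return "Not set"
    # Assumption: leading/trailing whitespace is ignored when comparing.
    if actual.strip() == expected.strip():
        return "Match"
    return "Mismatch"


print(classify_tag("Padlocks | Example", None))
print(classify_tag("Padlocks | Example", "Locks | Example"))
print(classify_tag("Padlocks | Example", " Padlocks | Example "))
```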

# Before Running All Cells

Check that the xlsx file containing the meta tags you want to check follows the correct format. To see the expected format, open the xlsx file "Example" in the Inputs folder. Then place your xlsx file in the Check-Meta-Tags > Inputs folder.
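
Judging from how the later cells read the sheet (column 0 for the URL, columns 1 and 2 for the expected title and description, first row skipped), the layout could look like the rows below. The header labels, sample values, and the `validate_rows` helper are assumptions made for illustration; the "Example" workbook remains the authoritative reference.

```python
# Hypothetical sheet contents, written as plain rows for illustration.
# Header labels are assumptions; only the column positions matter to the checker.
rows = [
    ["URL", "Meta Title", "Meta Description"],  # header row (skipped)
    ["www.example.com/a", "Page A | Example", "Description of page A."],
    ["https://www.example.com/b", "Page B | Example", ""],  # blank cell: tag not checked
]


def validate_rows(rows):
    """Return a list of problems found in the data rows (empty list if none)."""
    problems = []
    for number, row in enumerate(rows[1:], start=2):  # spreadsheet row numbers, header is row 1
        if len(row) < 3:
            problems.append(f"row {number}: expected 3 columns, found {len(row)}")
        elif not str(row[0]).strip():
            problems.append(f"row {number}: URL is empty")
    return problems


print(validate_rows(rows))
```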

Finally, enter the file name and sheet name below under [User Input](#user_input).

# How to Run
In the top menu, select Cell -> Run All.

<a id='user_input'></a>

## User Input

Enter information below before running the cells.

In [None]:
# User input data
# Currently the fields are set to the example workbook and sheet

META_TAGS_WORKBOOK = 'Example.xlsx'
META_TAGS_WORKBOOK_SHEET = 'MetaTagsSheet'

# If a scheme (http or https) needs to be prepended to the url, which should it be?
# https = True
# http = False
# This only has an effect if the url does not already start with a scheme
IS_SSL = True

# Whether looking to test the review site or production site
# Used for Master Lock and SentrySafe
# Not supported for other sites; leave False
IS_ML_REVIEW = False
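
The effect of `IS_SSL` can be pictured with a small sketch. `with_scheme` is a hypothetical stand-in, not the code of `automatedtesting.return_full_clean_path` (which is not shown in this notebook); it only illustrates prepending a scheme when none is present.

```python
def with_scheme(url, is_ssl=True):
    """Hypothetical stand-in: prepend a scheme only when the URL lacks one."""
    url = url.strip()
    if url.startswith(("http://", "https://")):
        return url  # an explicit scheme is left untouched
    scheme = "https://" if is_ssl else "http://"
    # Assumption: stray leading slashes (e.g. '//example.com') are dropped.
    return scheme + url.lstrip("/")


print(with_scheme("www.example.com"))
print(with_scheme("www.example.com", is_ssl=False))
print(with_scheme("http://www.example.com"))
```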


## Imports and Constants

Cells in this section import libraries, define where the output file will go, and load the file the user wants to use to check meta tags.

In [None]:
# Imports and constants
import pandas as pd
import xlrd
from xlutils.copy import copy
from datetime import datetime

import sys
sys.path.append('../')
import automatedtesting

META_TAGS_INPUT_FOLDER = 'Inputs/'
META_TAGS_OUTPUT_FOLDER = 'Results/'

META_TAGS_INPUT_WORKBOOK_PATH = META_TAGS_INPUT_FOLDER + META_TAGS_WORKBOOK

# Note: xlrd 2.0 and later read only .xls files; opening .xlsx requires xlrd < 2.0
to_check = xlrd.open_workbook(META_TAGS_INPUT_WORKBOOK_PATH)
to_check_sheet = to_check.sheet_by_name(META_TAGS_WORKBOOK_SHEET)

# Writable copy of the workbook; its rows are used below to count the rows to check
check_wb = copy(to_check) 
check_sheet = check_wb.get_sheet(META_TAGS_WORKBOOK_SHEET)

# These are the currently available tag checkers
META_TAGS = {"title":{'column':1, 'min_char':0, 'max_char':50,
                      'results':pd.DataFrame(columns=["Match Result","Expected Title", "Actual Title", "SEO Check"])},
             "description":{'column':2, 'min_char':50, 'max_char':300,
                    'results':pd.DataFrame(
                        columns=["Match Result","Expected Description", "Actual Description", "SEO Check"])}}

URL_RESULTS = pd.DataFrame(columns=["URL Status", "URL Status Code", "URL"])

RESULTS_STRING = "results"

## Functions

In this section, functions are defined to make the code easier to read and write tests for.

In [None]:
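def get_tag_results(tag, url, min_char, max_char):
    '''Return [match result, expected tag, actual tag, SEO result] for one tag.

    Note: expected_tag is not a parameter; it is read from the notebook's
    global scope, where the loop in the 'Actual Check' cell sets it.'''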
    actual_tag = automatedtesting.return_tag(url, tag)
    seo_result = automatedtesting.check_seo_length(
        expected_tag, min_char, max_char)
    match_result = automatedtesting.check_matching(expected_tag, actual_tag)
    
    return [match_result, expected_tag, actual_tag, seo_result]
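
The SEO check used above takes a minimum and maximum character count per tag. A minimal sketch of such a length check is shown below; `seo_length_status`, its messages, and the inclusive bounds are assumptions, since the implementation of `automatedtesting.check_seo_length` is not part of this notebook.

```python
def seo_length_status(text, min_char, max_char):
    """Hypothetical check: is the tag length within [min_char, max_char]?"""
    length = len(text)
    if length < min_char:
        return f"Too short ({length} < {min_char})"
    if length > max_char:
        return f"Too long ({length} > {max_char})"
    return "OK"


# Limits taken from the META_TAGS constant: title 0-50, description 50-300.
print(seo_length_status("Model No. A1266NBLK | Master Lock", 0, 50))
print(seo_length_status("Too brief.", 50, 300))
```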

# Testing

The cells below check that the tool is working correctly. If one of these fails and the meta tag checker still runs, the output file may be incorrect. Reach out or troubleshoot based on the reported error.

When selecting 'Run All Cells', if one of these tests fails, the code will stop running at this cell. If you want to continue, you can select the 'Actual Check' cell and run from there, but this is strongly discouraged.

In [None]:
def test_return_tag(url, tag, actual):
    '''Test that the meta tag parser is working correctly.
    If this fails, first check that the passed-in url actually has the expected tag.'''
    result = automatedtesting.return_tag(url, tag)
    if result == actual:
        return True
    print("Error when parsing")
    # sys.exit raises SystemExit, which stops 'Run All' at this cell
    sys.exit(result)

print(test_return_tag("https://www.masterlock.com/business-use/product/A1266NBLK",
                      "title", 'Model No. A1266NBLK | Master Lock'))
print(test_return_tag("https://www.masterlock.com/business-use/product/A1266NBLK",
                "description", 
                "The American Lock A1266NBLK Solid Aluminum Padlock offers customization options to help fit your security needs, including keying, laser engraving and shackle options. Learn more.")) 
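
The tests above depend on `automatedtesting.return_tag`, whose implementation is not shown here. As a rough idea of what such a parser might do, the sketch below extracts the title and meta description from an HTML string using only the standard library. `MetaTagParser`, `extract_tag`, and the sample HTML are hypothetical, and no network request is made.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collects the <title> text and the <meta name="description"> content."""

    def __init__(self):
        super().__init__()
        self.title = None
        self.description = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
            self.title = ""
        elif tag == "meta":
            attributes = dict(attrs)
            if (attributes.get("name") or "").lower() == "description":
                self.description = attributes.get("content")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self.title += data


def extract_tag(html, tag):
    """Hypothetical stand-in for return_tag that works on an HTML string."""
    parser = MetaTagParser()
    parser.feed(html)
    value = parser.title if tag == "title" else parser.description
    return value.strip() if value else None


sample = """<html><head>
<title> Example Page | Example </title>
<meta name="description" content="A short example description.">
</head><body></body></html>"""

print(extract_tag(sample, "title"))
print(extract_tag(sample, "description"))
```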

## Actual Check

Now on to applying the logic.

In [None]:
# Loop through every data row (row 0 is the header). For every row, check all meta tags
for i in range(1, to_check_sheet.nrows):
    
    # Get data from the inputted file and clean url if not full path
    url_containing_tag = automatedtesting.return_full_clean_path(to_check_sheet.cell(i, 0).value, IS_SSL)

    url_status_code = automatedtesting.get_status_code(url_containing_tag)
    
    # If the URL status code came back successfully, set to be OK and then check the meta tags
    # If not, the URL is having issues and the status code is returned. Tags are not checked.
    if url_status_code == 200:
        url_status = "OK"

        for tag in META_TAGS:
            expected_tag = to_check_sheet.cell(i, META_TAGS[tag]['column']).value.strip()
    
            # If expected tag is empty, then it doesn't need to be checked, all returned will be "n/a"
            # If not empty, check all required
            if expected_tag != "":
                # Get tag's results
                # Appends these results as a new row in the tag's results dataframe indexed by the current row
                META_TAGS[tag][RESULTS_STRING].loc[i] = get_tag_results(
                    tag, url_containing_tag, META_TAGS[tag]['min_char'], META_TAGS[tag]['max_char'])
            else: 
                META_TAGS[tag][RESULTS_STRING].loc[i] = ["n/a", "", "n/a", "n/a"]

    else: 
        url_status = "See status code."

    # For the row, set the URL results to the dataframe indexed by the row
    URL_RESULTS.loc[i] = [url_status, url_status_code, url_containing_tag]

# Combine the URL results with the meta tag results for each meta tag
# Because tags are on different sheets, this makes it easier to view all needed info
for tag in META_TAGS:
    META_TAGS[tag][RESULTS_STRING] = pd.concat([URL_RESULTS, META_TAGS[tag][RESULTS_STRING]], axis=1).fillna("n/a")
    print(META_TAGS[tag][RESULTS_STRING])
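
The combining step above lines up URL results and tag results by row number and fills any gaps with "n/a" (for example, rows whose URL did not return 200 and so were never checked). A plain-Python analogue of that alignment is sketched below; `join_results` and the sample values are hypothetical, and it iterates over the URL rows only, which matches the notebook because every row gets a URL result.

```python
def join_results(url_rows, tag_rows, tag_columns, fill="n/a"):
    """Align two row-number-keyed dicts side by side, filling gaps with `fill`."""
    combined = {}
    for row_number, url_values in sorted(url_rows.items()):
        # Rows without tag results get one filler value per tag column.
        tag_values = tag_rows.get(row_number, [fill] * tag_columns)
        combined[row_number] = list(url_values) + list(tag_values)
    return combined


url_rows = {
    1: ["OK", 200, "https://www.example.com/a"],
    2: ["See status code.", 404, "https://www.example.com/missing"],
}
tag_rows = {1: ["Match", "Page A", "Page A", "OK"]}  # row 2 was never checked

for row_number, values in join_results(url_rows, tag_rows, tag_columns=4).items():
    print(row_number, values)
```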

## Create Result Output File

After running the cell below, the meta tag results are written to an xlsx file with the current timestamp in its name and saved to the __Results__ folder.

In [None]:
# Run to output the dataframe as an xlsx file in the 'Results' folder

# Time stamped file. In form yyyy-mm-dd_hour-minute
OUTPUT_FILE = META_TAGS_OUTPUT_FOLDER + 'meta-tag-results_'+ datetime.now().strftime("%Y-%m-%d_%H-%M") + '.xlsx'
writer = pd.ExcelWriter(OUTPUT_FILE, engine='xlsxwriter')

# For each of the tags, create a new sheet with its results and add it to the outputted file. Then save.
for tag in META_TAGS:
    META_TAGS[tag][RESULTS_STRING].to_excel(writer, sheet_name=tag, index=False)
# close() writes the file to disk; writer.save() was removed in pandas 2.0
writer.close()
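
The timestamped file name can be checked in isolation with a fixed date. `results_filename` is a hypothetical helper that mirrors the format string used above; note that the timestamp is only minute-precise, so two runs within the same minute produce the same name.

```python
from datetime import datetime


def results_filename(now, folder="Results/"):
    """Build the results path from a datetime (hypothetical helper)."""
    return folder + "meta-tag-results_" + now.strftime("%Y-%m-%d_%H-%M") + ".xlsx"


print(results_filename(datetime(2024, 3, 5, 14, 7)))
# Results/meta-tag-results_2024-03-05_14-07.xlsx
```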