# What it is

A script that takes an Excel .xlsx file of original URL and expected redirect URL pairs and reports whether each redirect is valid, along with an SEO check.

# How to Use it

1. Create an input file containing the URLs to check, modelled on the example sheet, and place it in the 'Inputs' folder. If you just want a feel for how to run the notebook, you can use the default example. Enter the input file name in quotes, as shown in the User Input cell, and include the .xlsx file extension.
2. If you are checking a Master Lock or SentrySafe site, set the variable __IS_ML_REVIEW__ to True to check the review site or False to check the production site. If your site is neither, leave this value as False.
3. Set __IS_SSL__ to indicate whether the site you are checking uses SSL. This is not needed if the input file contains full URL paths.
4. Run the check by going to 'Cell' in the top navigation and selecting 'Run All'.
5. View which URLs passed or failed the test by reading the output below or by going to the 'Results' folder and opening the file with the timestamp of your last run.

## User Input

Enter information below before running the cells.

In [None]:
# User input data

REDIRECTS_WORKBOOK = 'Example.xlsx'
REDIRECTS_WORKBOOK_SHEET = 'Redirects'

# If an http(s) needs to be appended to the url, what should it be?
# https = True
# http = False
# This will only change things if the beginning of url path is not specified
IS_SSL = True

# Whether looking to test the review site or production site
# Used for Master Lock and SentrySafe
# Not supported for other sites, leave False
IS_ML_REVIEW = False


## Imports and Constants

Cells in this section import libraries, define where the output file will go, and load the file the user wants to use to check redirects.

In [None]:
# Imports and constants

import pandas as pd
# Note: xlrd 2.0 and later read only .xls files, so reading .xlsx here needs xlrd < 2.0
import xlrd
from xlutils.copy import copy
from datetime import datetime
import requests

import sys
sys.path.append('../')
import automatedtesting

REDIRECTS_INPUT_FOLDER = 'Inputs/'
REDIRECTS_OUTPUT_FOLDER = 'Results/'

REDIRECTS_INPUT_WORKBOOK_PATH = REDIRECTS_INPUT_FOLDER + REDIRECTS_WORKBOOK

to_check = xlrd.open_workbook(REDIRECTS_INPUT_WORKBOOK_PATH)
to_check_sheet = to_check.sheet_by_name(REDIRECTS_WORKBOOK_SHEET)

check_wb = copy(to_check) 
check_sheet = check_wb.get_sheet(REDIRECTS_WORKBOOK_SHEET)

## Functions

In this section, functions are defined to make the code easier to read and write tests for.

In [None]:
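def get_request(url):
    try:
        # requests follows redirects by default, so the returned response is the
        # final one; the intermediate redirect responses are in response.history
        return requests.get(url)
    except requests.exceptions.RequestException:
        # Connection errors, timeouts, invalid URLs, and similar failures
        return False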

## Testing

The cells below check that the tool is working correctly. If one of these fails and the redirect checker still runs, the output file may be incorrect. Reach out or troubleshoot based on the error that is printed.

When selecting 'Run All Cells', if one of these tests fails, the code will stop running at this cell. If you want to continue, you can select the 'Actual Check' cell and run from there, but this is strongly discouraged.

In [None]:
def test_change_to_env(url, env_url, env, is_ssl):
    test_url = automatedtesting.change_env(url, env, is_ssl)
    if test_url == env_url:
        print("Pass")
    else:
        print("An error occurred. Test url: " + test_url)
        print("Expected url: " + env_url)
        print("Env: "+ str(env))
        sys.exit()

test_change_to_env("https://www.masterlock.com/service-and-support/faqs/lost-combinations",
                   "https://www.masterlock.com/service-and-support/faqs/lost-combinations", False, True)
test_change_to_env("www.sentrysafe.com", "http://review.sentrysafe.com", True, False)
test_change_to_env("review.sentrysafe.com", "https://www.sentrysafe.com", False, True)
test_change_to_env("nm.org", "http://nm.org", False, False)
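
The four test calls above pin down what `change_env` is expected to return, even though its source lives in the separate `automatedtesting` module. As a rough illustration only, the sketch below is a hypothetical stand-in (`change_env_sketch` is not the real function) that satisfies those same four cases. It assumes that a scheme is added only when the URL lacks one, and that switching environments means swapping a leading `www.` for `review.` or the reverse.

```python
def change_env_sketch(url, is_review, is_ssl):
    """Hypothetical stand-in for automatedtesting.change_env (not the real code)."""
    # Keep an existing scheme; otherwise pick one based on is_ssl.
    if url.startswith(("http://", "https://")):
        scheme, rest = url.split("://", 1)
    else:
        scheme, rest = ("https" if is_ssl else "http"), url

    # Separate the host from the rest of the path.
    host, sep, path = rest.partition("/")

    # Assumption: only a leading "www." or "review." label is swapped.
    if is_review and host.startswith("www."):
        host = "review." + host[len("www."):]
    elif not is_review and host.startswith("review."):
        host = "www." + host[len("review."):]

    return scheme + "://" + host + sep + path


print(change_env_sketch("www.sentrysafe.com", True, False))
print(change_env_sketch("nm.org", False, False))
```

Hosts without a `www.` or `review.` prefix, such as `nm.org`, pass through with only the scheme added, which matches the last test case.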

## Actual Check

Now on to applying the logic.

In [None]:
# Checking the redirects

cols = ["URL Status", "Status Code", "URL", "Redirect Result", "Expected Redirect",
        "Actual Redirect", "Hops", "SEO Results"]
list_of_results = pd.DataFrame(columns=cols)

# For every row in the input data, check to see that the actual redirect is the same as the desired
# Row 0 is the header row, so start at row 1
for i in range(1, to_check_sheet.nrows):
    
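    # Begin by clearing variables between rows
    # url_status is reset too, so a row that sets no status cannot reuse the previous row's value
    url_status = "n/a"
    seo_check = "n/a"
    matched_result = "n/a"
    hops = "n/a"
    actual_redirect = "n/a"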
    
    url_to_redirect = automatedtesting.change_env(to_check_sheet.cell(i, 0).value, IS_ML_REVIEW, IS_SSL)
    expected_redirect = automatedtesting.change_env(to_check_sheet.cell(i, 1).value, IS_ML_REVIEW, IS_SSL)
    
    req = get_request(url_to_redirect)
    
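    # A Response is falsy for 4xx/5xx codes, so test the type instead of truthiness
    if isinstance(req, requests.Response):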
        status_code = req.status_code
        if status_code == 200:

            # To check redirects, the history of the response must be parsed
            # If there is no history, then a redirect did not occur
            if req.history:
                status_code = req.history[-1].status_code

                # The last hop must be a 301 (permanent) redirect to pass
                # Anything else is probably a 302, which is incorrect by SEO standards
                if status_code == 301:
                    url_status = "OK"
                    hops = len(req.history)
                    seo_check = automatedtesting.check_seo_hops(hops)
                    actual_redirect = automatedtesting.return_full_clean_path(req.url, IS_SSL)
                    matched_result = automatedtesting.check_matching(expected_redirect, actual_redirect)
                else:
                    url_status = "Wrong redirect response"
            else:
                matched_result = "No redirect occurred"
        else:
            url_status = "See Status Code"
    else:
        status_code = "Error"
        url_status = "Redirected URL could not be reached"
    
    
   
    
    # Append the result to a dataframe for output later
    list_of_results.loc[i] = [url_status, status_code, url_to_redirect, matched_result, 
                              expected_redirect, actual_redirect, hops, seo_check]

print(list_of_results)
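
The branching in the loop above can be easier to follow, and to test without network access, when it is written as a pure function over status codes. The sketch below is an illustration under stated assumptions, not part of the notebook: `classify_redirect` is a hypothetical name, it takes the final status code plus the list of redirect hop codes instead of a `requests` response, and it reports "n/a" as the URL status when no redirect occurred.

```python
def classify_redirect(final_status, hop_statuses):
    """Return (url_status, status_code, hops) for one checked URL.

    final_status: status code of the last response, or None if the request failed.
    hop_statuses: status codes of the intermediate redirect responses, in order.
    """
    if final_status is None:
        return ("Redirected URL could not be reached", "Error", "n/a")
    if final_status != 200:
        return ("See Status Code", final_status, "n/a")
    if not hop_statuses:
        # A 200 with no history means the URL was served directly (assumed "n/a").
        return ("n/a", 200, "n/a")
    last_hop = hop_statuses[-1]
    if last_hop == 301:
        # Only the last hop is inspected, mirroring the loop above.
        return ("OK", 301, len(hop_statuses))
    return ("Wrong redirect response", last_hop, "n/a")


print(classify_redirect(200, [301]))
print(classify_redirect(200, [302]))
print(classify_redirect(404, []))
```

Because only the last hop is inspected, a chain such as 302 followed by 301 is still reported as "OK" with two hops; whether that chain is acceptable is left to the SEO hop check.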

## Create Result Output File

After running the cell below, the results from checking the redirects will be written to an .xlsx file with the current timestamp in its name and saved to the __Results__ folder.

In [None]:
# Run to output the dataframe as an xlsx file in the 'Results' folder

OUTPUT_FILE = REDIRECTS_OUTPUT_FOLDER + 'redirect-results_' + datetime.now().strftime("%Y-%m-%d_%H-%M") + '.xlsx'

# The context manager saves and closes the file on exit
with pd.ExcelWriter(OUTPUT_FILE, engine='xlsxwriter') as writer:
    list_of_results.to_excel(writer, sheet_name='Redirects', index=False)
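
To make the naming scheme concrete, the following sketch builds the same kind of file name from a fixed timestamp. `build_output_path` is a hypothetical helper introduced only for illustration; the notebook itself builds the name inline.

```python
from datetime import datetime


def build_output_path(folder, now):
    """Build a timestamped results path using the notebook's naming pattern."""
    return folder + "redirect-results_" + now.strftime("%Y-%m-%d_%H-%M") + ".xlsx"


example = build_output_path("Results/", datetime(2024, 1, 31, 9, 5))
print(example)  # Results/redirect-results_2024-01-31_09-05.xlsx
```

The timestamp has minute resolution, so two runs within the same minute write to the same file name and the later run overwrites the earlier one.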