Sources: https://www.holisticseo.digital/python-seo/internet-archive/,
https://pypi.org/project/waybackpy/

This code:
1. Installs the `waybackpy` library for accessing the Wayback Machine's archiving services.
2. Imports necessary libraries, including pandas for data manipulation and time for timing operations.
3. Defines a user agent string to mimic a web browser for requests.
4. Connects to Google Drive to access files stored there.
5. Prompts the user to enter the full path of an Excel file containing URLs.
6. Loads the specified Excel file into a pandas DataFrame.
7. Adds a new column called 'Archived URL' to the DataFrame to store archived links.
8. Iterates through each URL in the DataFrame, checking if it is already archived; if not, it attempts to archive it.
9. Limits new archiving requests to 15 per minute to avoid exceeding the Wayback Machine's request limits. Only URLs that actually had to be saved count toward this limit; URLs that already had a snapshot do not.
10. Saves the updated DataFrame with archived URLs as a new Excel file named "Updated_Archive_Links.xlsx" in the same Google Drive folder as the original file.
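Steps 8 and 9 above boil down to a small decision per URL: reuse the newest snapshot if one exists, otherwise archive the page now, and record a failure message if anything goes wrong. The sketch below isolates that logic with the Wayback Machine calls passed in as plain functions, so it can be tried without network access. `archive_link`, `NotArchived` and the fake lookups are hypothetical names used only for illustration; they are not part of waybackpy.

```python
from typing import Callable, Tuple, Type

FAILED = "Failed to save. Try to do it manually or with another archiving tool"


class NotArchived(Exception):
    """Hypothetical 'no snapshot found' error (waybackpy uses NoCDXRecordFound)."""


def archive_link(
    url: str,
    find_newest: Callable[[str], str],
    save: Callable[[str], str],
    not_found: Type[Exception] = NotArchived,
) -> Tuple[str, bool]:
    """Return (archived_url, saved_now) for a single URL."""
    try:
        try:
            # Reuse the newest existing snapshot if there is one
            return find_newest(url), False
        except not_found:
            # No snapshot yet: archive the page now
            return save(url), True
    except Exception as exc:
        # Any other failure is recorded instead of stopping the whole run
        print(f"Error processing URL '{url}': {exc}")
        return FAILED, False


# Fake lookups standing in for the Wayback Machine (no network needed)
snapshots = {"https://example.com": "https://web.archive.org/web/2020/https://example.com"}


def fake_newest(url: str) -> str:
    if url in snapshots:
        return snapshots[url]
    raise NotArchived(url)


def fake_save(url: str) -> str:
    snapshots[url] = f"https://web.archive.org/web/2024/{url}"
    return snapshots[url]


print(archive_link("https://example.com", fake_newest, fake_save))
# ('https://web.archive.org/web/2020/https://example.com', False)
print(archive_link("https://example.org", fake_newest, fake_save))
# ('https://web.archive.org/web/2024/https://example.org', True)
```

With waybackpy 3.x the real calls would presumably be `lambda u: WaybackMachineCDXServerAPI(u, user_agent).newest().archive_url` for `find_newest`, `lambda u: WaybackMachineSaveAPI(u, user_agent).save()` for `save`, and `NoCDXRecordFound` (from `waybackpy.exceptions`) for `not_found`. The second return value tells the caller whether to count the URL toward the per-minute limit.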

# Connecting Google Drive and opening your Excel file. The file must contain a column named "URL" with the hyperlinks

This code is written for Google Colab: it mounts Google Drive with `google.colab.drive`, which is only available in Colab.
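Everything in the loop reads `row['URL']`, so a missing "URL" column or blank cells only show up as errors partway through a run. A minimal sketch of an up-front check, assuming the sheet has already been loaded with pandas; `clean_url_column` is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd


def clean_url_column(df: pd.DataFrame, column: str = "URL") -> pd.DataFrame:
    """Fail early if `column` is missing, then drop rows with no URL."""
    if column not in df.columns:
        raise KeyError(f"Expected a column named {column!r}, found {list(df.columns)}")
    cleaned = df.copy()
    # Turn empty cells (NaN/None) into "" and trim stray spaces around links
    urls = cleaned[column].where(cleaned[column].notna(), "").astype(str).str.strip()
    cleaned[column] = urls
    # Keep only rows that actually contain a URL
    return cleaned[urls != ""].reset_index(drop=True)
```

In the notebook it could be applied right after loading, e.g. `df = clean_url_column(pd.read_excel(file_path))`. Note that blank rows are dropped, so they will not appear in "Updated_Archive_Links.xlsx".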

In [None]:
# Downloading and importing libraries
!pip install waybackpy

import pandas as pd
import time
from waybackpy import WaybackMachineCDXServerAPI, WaybackMachineSaveAPI
import os

# Define the user agent string (this can be customized)
user_agent = "Mozilla/5.0 (Windows NT 5.1; rv:40.0) Gecko/20100101 Firefox/40.0"

# Connect to Google Drive
from google.colab import drive
drive.mount('/content/drive')

# Ask for the full path to the Excel file on Google Drive
file_path = input("Insert below this code full path to the file on Google Drive: ").strip()

# Split the path into folder and filename
folder, filename = os.path.split(file_path)

print("Folder:", folder)
print("Filename:", filename)


# Load the dataframe
df = pd.read_excel(os.path.join(folder, filename))

# Create a new column 'Archived URL' in the dataframe
df['Archived URL'] = None

# Track the number of links saved in the current minute and overall progress
links_saved_in_minute = 0
total_links = len(df)
processed_links = 0
start_time = time.time()

# newest() raises this exception when a URL has no snapshots yet
from waybackpy.exceptions import NoCDXRecordFound

# Iterate over each URL in the dataframe
for i, row in df.iterrows():
    url = row['URL']

    try:
        # Check if the URL has already been archived.
        # newest() raises NoCDXRecordFound instead of returning None,
        # so the "not archived" case is handled in the inner except branch.
        cdx_api = WaybackMachineCDXServerAPI(url, user_agent)
        try:
            newest_snapshot = cdx_api.newest()
            df.at[i, 'Archived URL'] = newest_snapshot.archive_url
        except NoCDXRecordFound:
            # If not archived, attempt to archive it now
            save_api = WaybackMachineSaveAPI(url, user_agent)
            saved_url = save_api.save()
            df.at[i, 'Archived URL'] = saved_url

            # Update the count of links saved in the current minute
            links_saved_in_minute += 1

    except Exception as e:
        print(f"Error processing URL '{url}': {e}")
        df.at[i, 'Archived URL'] = "Failed to save. Try to do it manually or with another archiving tool"

    # Increment the processed links count
    processed_links += 1

    # Print status every 5 iterations
    if processed_links % 5 == 0 or processed_links == total_links:
        print(f"{processed_links} links processed, {total_links - processed_links} remaining.")

    # Check if the limit of 15 links per minute has been reached
    if links_saved_in_minute >= 15:
        elapsed_time = time.time() - start_time
        if elapsed_time < 60:
            sleep_time = 61 - elapsed_time
            print(f"Reached limit of 15 links in a minute. Sleeping for {sleep_time:.2f} seconds...")
            time.sleep(sleep_time)

        # Reset the count and start time for the next minute
        links_saved_in_minute = 0
        start_time = time.time()

# Save the updated dataframe to a new Excel file
df.to_excel(os.path.join(folder, "Updated_Archive_Links.xlsx"), index=False)

print("Archiving process completed. Dataframe saved to 'Updated_Archive_Links.xlsx' in the initial folder.")

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Insert below this code full path to the file on Google Drive: /content/drive/MyDrive/Analytical Centre DM/DM and CIR/Religion DeepDive/Тест архівування посилань.xlsx
Folder: /content/drive/MyDrive/Analytical Centre DM/DM and CIR/Religion DeepDive
Filename: Тест архівування посилань.xlsx
5 links processed, 1 remaining.
6 links processed, 0 remaining.
Archiving process completed. Dataframe saved to 'Updated_Archive_Links.xlsx' in the initial folder.
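The notebook's counter only resets its timer after 15 saves. If those 15 saves are spread over more than a minute, the last several can arrive in a quick burst, followed by up to 15 more, without any pause. A sliding window enforces "at most 15 saves in any 60 seconds" more strictly: remember when each save happened and wait until the oldest one is a full period old. This is a minimal sketch; `SlidingWindowLimiter` is a hypothetical class, the 15-per-60-seconds figures are taken from the notebook, and whether they match the Wayback Machine's actual limits is not confirmed here.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls in any rolling `period`-second window."""

    def __init__(
        self,
        max_calls: int = 15,
        period: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.max_calls = max_calls
        self.period = period
        self.clock = clock  # injectable so the limiter can be tested without waiting
        self.sleep = sleep
        self.calls: Deque[float] = deque()  # timestamps of recent calls, oldest first

    def wait(self) -> float:
        """Block until one more call is allowed, record it, return seconds slept."""
        slept = 0.0
        while True:
            now = self.clock()
            # Forget calls that are at least `period` seconds old
            while self.calls and now - self.calls[0] >= self.period:
                self.calls.popleft()
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return slept
            # Window is full: sleep until the oldest call drops out of it
            delay = self.period - (now - self.calls[0])
            self.sleep(delay)
            slept += delay
```

To mirror the notebook, which counts only new saves, one would create `limiter = SlidingWindowLimiter(15, 60)` before the loop and call `limiter.wait()` just before `save_api.save()`, replacing the `links_saved_in_minute` bookkeeping. `time.monotonic` is used instead of `time.time` so that system clock adjustments cannot shorten or lengthen a pause.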
