<a href="https://colab.research.google.com/github/pkaiser8/info-664-final/blob/main/PK_final_project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Three (or Fewer) Randomized Records from the Collection of the Cooper Hewitt, Smithsonian Design Museum

## 1. Import libraries, establish the file path for the .CSV data sheet, and request user input
In this first section we import the libraries the code requires (pandas and random). We then load the refined dataset, which is linked from the GitHub repository that hosts this program (https://github.com/pkaiser8/info-664-final), into a pandas DataFrame.

Next, the program prints the columns present in the DataFrame and asks the user to enter two of them as keys. The groupby() method then groups the records by these two keys, which lets the user drive their own means of discovery from the data.
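As a minimal sketch of what grouping on two keys produces, the example below uses a small invented DataFrame (`toy_df` and its rows are assumptions for illustration, not rows from the museum dataset). Each group is identified by a tuple holding one value from each key.

```python
import pandas as pd

# Toy stand-in for the museum data; these rows are invented for illustration.
toy_df = pd.DataFrame({
    "type": ["Teacup", "Teacup", "Vase", "Teacup"],
    "date": ["1925", "1925", "1980", "1930"],
    "title": ["A", "B", "C", "D"],
})

# Grouping on a list of two columns keys each group by a (type, date) tuple.
toy_grouped = toy_df.groupby(["type", "date"])

# Count how many records fall into each group.
group_sizes = toy_grouped.size().to_dict()
print(group_sizes)
```

Key pairs that leave most groups with a single record will rarely return three records, which is one reason some key combinations work better than others.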



In [93]:
import pandas as pd
import random

# Establish file path for .CSV data sheet:
data_filepath = '/content/objects-refined.csv'

def load_data(data_filepath):
    """
    Loads data from a CSV file.

    Inputs:
        data_filepath: The file or URL path to the CSV file.

    Returns:
        The loaded DataFrame.
    """

    # Read the data from the .CSV defined in data_filepath.
    # Use low_memory=False to process entire file at once.
    objects_df = pd.read_csv(data_filepath, low_memory=False)
    return objects_df

def get_user_input(objects_df):
  """Pulls in user input to establish key selection"""

  # Allow the user to define the desired output by selecting from the key columns outlined in the .CSV file used:
  print(f'Welcome to the Cooper Hewitt, Smithsonian Design Museum collections object randomizer. \nPlease see below a list of keys used to define objects in the collection .CSV file:\n')
  # Prints the .CSV columns so the user can decide which to input:
  #print(objects_df.columns)
  column_list = objects_df.columns.to_list()
  print("Available keys:\n" + "\n".join(f'{column}' for column in column_list))
  print()
  print(f'Select which keys you would like to pair together to randomly find three or less records which share common values from these elements.\nNote that some key combinations work better than others. See which ones yield the best results.\n')

  # User inputted information for each desired key:
  user_selected_key_1 = input(f'Please enter one of the keys listed above. It is best to copy and paste everything between the quotes:\n')
  print()
  user_selected_key_2 = input(f'Please enter a second key:\n')
  print()
  print(f'You have selected "{user_selected_key_1}" and "{user_selected_key_2}" as your grouped keys.')
  # Store the two user-inputted key selections in a list to group by:
  user_input = [user_selected_key_1, user_selected_key_2]
  return user_input

def group_data(data_filepath, user_input):
    """Loads data from a CSV file and groups it by the specified columns.

    Inputs:
        data_filepath (string): The file or URL path to the CSV file.
        user_input (list of strings): The key columns to group the data by.

    Returns:
        The grouped DataFrame (a pandas DataFrameGroupBy object).
    """

    # Load the data here rather than relying on a global objects_df variable:
    objects_df = load_data(data_filepath)
    # Use the groupby() method to group the rows in the DataFrame
    # based on specific values in the two user defined key columns:
    grouped_df = objects_df.groupby(user_input)
    return grouped_df

# Run the functions to get user input for key selection

# Load the data:
objects_df = load_data(data_filepath)

# Get user input:
user_input = get_user_input(objects_df)

# Apply the grouping:
grouped_df = group_data(data_filepath, user_input)

Welcome to the Cooper Hewitt, Smithsonian Design Museum collections object randomizer. 
Please see below a list of keys used to define objects in the collection .CSV file:

Available keys:
accession_number
creditline
date
decade
department_id
description
dimensions
dimensions_raw
gallery_text
id
inscribed
is_active
is_loan_object
justification
label_text
markings
media_id
medium
on_display
period_id
primary_image
primary_image2
provenance
signed
title
title_raw
tms:id
tombstone
type
type_id
url
videos
woe:country
woe:country_id
woe:country_name
year_acquired
year_end
year_start

Select which keys you would like to pair together to randomly find three or less records which share common values from these elements.
Note that Some key combinations work better than others. See which ones yield the best results

Please enter one of the keys listed above. It is best to copy and paste everything between the quotes:
woe:country

Please enter a second key:
year_acquired

You have selected "woe:c

## 2. Selection of random values from grouping and extracting up to three full records

With the keys grouped, the program picks one random combination of values from the two user-selected keys and then pulls at most three records (full rows) that share both values.

_For example, if the selected keys are "type" and "date" and the randomly chosen values are "Teacup" and "1925", the program pulls up to three records that are all Teacups made in 1925._
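The Teacup example can be sketched on invented data as follows. The rows in `toy_df` are assumptions for illustration, and the fixed seed is there only to make the sketch repeatable; the notebook itself does not set a seed.

```python
import random

import pandas as pd

# Invented rows for illustration; not taken from the museum dataset.
toy_df = pd.DataFrame({
    "type": ["Teacup", "Teacup", "Vase", "Teacup"],
    "date": ["1925", "1925", "1980", "1930"],
    "title": ["A", "B", "C", "D"],
})
toy_grouped = toy_df.groupby(["type", "date"])

random.seed(0)  # assumption: fixed seed, used here only for repeatability

# Each key is a (type, date) tuple; pick one at random.
group_keys = list(toy_grouped.groups.keys())
chosen_key = random.choice(group_keys)

# get_group() returns every full row that shares both chosen values.
matching = toy_grouped.get_group(chosen_key)
print(chosen_key)
print(matching)
```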

If one of the grouped values is "nan" (a null entry), a message asks the user either to rerun the cell or to go back to the previous cell and select two new keys. If only one or two records are found, the program still returns those records and the user can continue with them.
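One possible alternative to the rerun prompt, shown here only as a sketch and not as what the notebook does, is to drop rows with a missing key before grouping and to cap the sample size at the number of rows available. The helper name `sample_up_to` and the toy rows are hypothetical.

```python
import pandas as pd

# Invented rows for illustration; two of them are missing one of the keys.
toy_df = pd.DataFrame({
    "type": ["Teacup", "Teacup", "Vase", None],
    "date": ["1925", "1925", None, "1930"],
})

# Drop rows where either grouping key is missing, so no group key contains NaN.
clean_df = toy_df.dropna(subset=["type", "date"])

def sample_up_to(records, n=3):
    """Return n random rows, or every row if fewer than n exist."""
    return records.sample(n=min(n, len(records)))

picked = sample_up_to(clean_df, n=3)
print(picked)
```

Capping `n` with `min()` avoids the error that `sample()` raises when asked for more rows than exist, so the one-or-two-record case needs no separate branch.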

In [94]:
# This is the ideal number of records the function below should aim to return:
num_records = 3

def select_random_group_and_records(grouped_df, num_records=3):
    """Selects a random group and a specified number of random records from that group.

    Inputs:
        grouped_df: The grouped DataFrame.
        num_records (int, optional): The number of records to select. Defaults to 3.

    Returns:
        tuple: A tuple containing the selected group key and the random records.
        selected_group_value (tuple): The pair of values that identifies the selected group.
        random_records (DataFrame): The randomly selected records (empty if none are found).
        A message is printed if no records are found for the selected group.
    """

    # Creates a list of all group keys (pairs of values) found in grouped_df:
    group_key_value = list(grouped_df.groups.keys())
    # random.choice() picks one key at random from the group_key_value list:
    selected_group_value = random.choice(group_key_value)

    try:
        # Using the pandas get_group() method, extracts the whole records
        # that belong to the randomly selected selected_group_value:
        selected_group_records = grouped_df.get_group(selected_group_value)
    except KeyError:
        print(f"Could not find records since one group value is blank, please run this cell again.\nYou may also reselect your two keys in the cell above and rerun both cells.\n")
        # Returns an empty DataFrame if a KeyError occurs,
        # and asks the user to try again or change the parameters in the cell before
        return selected_group_value, pd.DataFrame()

    # Checks if the number of records in selected_group_records
    # is at least the desired number of records (num_records):
    if len(selected_group_records) >= num_records:
        # Selects num_records random records from the selected group:
        random_records = selected_group_records.sample(n=num_records)
    else:
        # If fewer than num_records records are found, it will still return what was found:
        random_records = selected_group_records
    # Returns the selected group value based on the user inputted keys and the
    # random records that share those randomly selected values in those keys:
    return selected_group_value, random_records

# Call the functions directly to execute the code
grouped_df = group_data(data_filepath, user_input)
selected_group_value, random_records = select_random_group_and_records(grouped_df, num_records)

print("Randomly selected group values based on user selected keys:\n")
print(f'{user_input[0]} = {selected_group_value[0]}')
print(f'{user_input[1]} = {selected_group_value[1]}\n')
if not random_records.empty:
  print("Randomly Selected Records:")
  print()
  print(random_records)

Randomly selected group values based on user selected keys:

woe:country = 23424775.0
year_acquired = 1981.0

Randomly Selected Records:

      accession_number                          creditline            date  \
81309       1981-28-29  Bequest of Gertrude M. Oppenheimer  September 1795   
79466       1981-28-24  Bequest of Gertrude M. Oppenheimer     Ca. 1880–90   
61387        1981-54-1                  Gift of Toan Klein         1980–81   

       decade  department_id  \
81309     NaN       35347501   
79466     NaN       35347501   
61387  1980.0       35347497   

                                             description  \
81309  Alphabet at top.  "THIS WORK IN MY HAND MY FRI...   
79466  Verse of "The Dying Christian to his Soul."  L...   
61387  Body of thick greenish glass with air bubble i...   

                                         dimensions  dimensions_raw  \
81309      H x W: 26.4 x 21 cm (10 3/8 x 8 1/4 in.)             NaN   
79466    H x W: 43.4 x 43.5cm (17 1/1

## 3. Data Extraction and HTML Table Generation and Display

In this final section, we take the variable random_records, which contains the 1-3 full records returned by the functions above, and select specific columns of metadata to feed into an HTML table. The table is styled with CSS to control the look and feel of the presented information.
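Selecting the display columns by name rather than by position can make the mapping easier to audit. The sketch below is one way to do that; the function name `extract_by_name` and the `display_columns` mapping are hypothetical, the column names come from the key list printed in section 1, and the sample row reuses values from the output shown in this notebook (the image value is left empty because it is not shown there).

```python
import pandas as pd

# Display label -> source column, using column names from the printed key list.
display_columns = {
    "Image": "primary_image",
    "Title": "title",
    "Date": "date",
    "Medium": "medium",
    "Dimensions": "dimensions",
    "Type": "type",
    "Country": "woe:country_name",
    "Accession Number": "accession_number",
}

def extract_by_name(records):
    """Build one dict per record using column names rather than positions."""
    selected = records[list(display_columns.values())]
    renamed = selected.rename(
        columns={source: label for label, source in display_columns.items()}
    )
    return renamed.to_dict(orient="records")

# One-row stand-in for random_records, with an extra column that is ignored.
toy_records = pd.DataFrame([{
    "accession_number": "1981-28-29",
    "creditline": "Bequest of Gertrude M. Oppenheimer",
    "date": "September 1795",
    "dimensions": "H x W: 26.4 x 21 cm (10 3/8 x 8 1/4 in.)",
    "medium": "Silk embroidery on linen foundation",
    "primary_image": None,
    "title": "Sampler (Canada), September 1795",
    "type": "Sampler",
    "woe:country_name": "Canada",
}])

records_as_dicts = extract_by_name(toy_records)
print(records_as_dicts[0]["Title"])
```

A name-based lookup keeps working if the column order of the .CSV file changes, whereas positional indices would silently point at different fields.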

The following functions format the rows of the table and check whether the 'Image' field contains an image link. If it does not, the Cooper Hewitt logo is used instead. The row function also wraps the image in an \<a> tag so the user may click the thumbnail to view a larger version of the image in a new browser tab.
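The fallback logic can be sketched in isolation as below. The helper names `image_cell` and `text_cell` and the placeholder `FALLBACK_IMAGE` URL are hypothetical, and the escaping step is an addition the notebook's own row function does not perform; it is included here as a suggestion for metadata that may contain characters such as `<` or `&`.

```python
from html import escape

import pandas as pd

# Placeholder URL for illustration only; the notebook uses a Cooper Hewitt logo.
FALLBACK_IMAGE = "https://example.org/logo.png"

def image_cell(image_url):
    """Return a <td> whose thumbnail links to the full image in a new tab."""
    # Missing values (None or NaN) fall back to the placeholder image.
    link = image_url if pd.notna(image_url) else FALLBACK_IMAGE
    safe_link = escape(str(link), quote=True)
    return (
        f"<td><a href='{safe_link}' target='_blank'>"
        f"<img src='{safe_link}' width='100'></a></td>"
    )

def text_cell(value):
    """Return a <td> with the value escaped; missing values give an empty cell."""
    text = "" if pd.isna(value) else escape(str(value))
    return f"<td>{text}</td>"

print(image_cell(None))
print(text_cell("Glass, blown; photo transferred images"))
```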

The final result is a neatly formatted HTML table presenting the image and details of 1-3 randomly selected records based on the user input provided in the first section. The HTML class is imported from the IPython.display module, which lets the notebook render HTML generated by the Python code.

In [95]:
def extract_record_data(random_records):
  """Extracts desired metadata from the records and makes a list of dictionaries.
  Input:
      random_records: The 1-3 randomly selected records from the grouped DataFrame.

  Return:
      all_records_data: A list of dictionaries containing the desired metadata.
  """
  # Create a list of the desired metadata to be displayed in the HTML table:
  all_records_data = []
  # Loop over each record (row) and pick out the fields for the HTML table.
  # The indices are the positions of the columns in the .CSV file:
  for row in random_records.values:
    record_data = {
        'Image': row[20],            # primary_image
        'Title': row[24],            # title
        'Date': row[2],              # date
        'Medium': row[17],           # medium
        'Dimensions': row[6],        # dimensions
        'Type': row[28],             # type
        'Country': row[34],          # woe:country_name
        'Accession Number': row[0]   # accession_number
        }
    all_records_data.append(record_data)
  return all_records_data

def create_table_CSS_header():
  """Creates the HTML table header with some CSS styling.
  Input:
      None

  Return:
      table_css_header: CSS styling for the html_table.
  """
  table_css_header = """
  <table style='border-collapse: separate; border-spacing: 10px; border: 2px solid #ddd;'>
    <tr>
        <th style='border: 2px dotted #fff; padding: 8px;'>Image</th>
        <th style='border: 2px dotted #fff; padding: 8px;'>Title</th>
        <th style='border: 2px dotted #fff; padding: 8px;'>Date</th>
        <th style='border: 2px dotted #fff; padding: 8px;'>Medium</th>
        <th style='border: 2px dotted #fff; padding: 8px;'>Dimensions</th>
        <th style='border: 2px dotted #fff; padding: 8px;'>Type</th>
        <th style='border: 2px dotted #fff; padding: 8px;'>Country</th>
        <th style='border: 2px dotted #fff; padding: 8px;'>Accession Number</th>
    </tr>
  """
  return table_css_header

def create_html_table_rows(all_records_data):
  """Creates the HTML table rows with record data.
  Input:
      all_records_data: A list of dictionaries containing the desired metadata.

  Return:
      html_rows: HTML table rows with record data.
  """
  html_rows = ""
  # Builds one HTML table row for each of the 1-3 records
  # stored in the all_records_data list:
  for record in all_records_data:
    html_rows += "<tr>"
    # Check to see if the record contains a valid image link:
    image_link = record['Image'] if 'Image' in record and pd.notna(record['Image']) else 'https://upload.wikimedia.org/wikipedia/commons/thumb/f/fa/Cooper_Hewitt%2C_Smithsonian_Design_Museum_logo.svg/320px-Cooper_Hewitt%2C_Smithsonian_Design_Museum_logo.svg.png'
    # Wrap the image in an <a> tag to create a link in the
    # image thumbnail to view the full size picture in a separate tab:
    html_rows += f"<td style='border: 1px solid #ddd; padding: 8px;'><a href='{image_link}' target='_blank'><img src='{image_link}' width='100'></a></td>"
    for key, value in record.items():
        # Skip 'Image' as it's already handled:
        if key != 'Image':
            html_rows += f"<td style='border: 1px solid #ddd; padding: 8px;'>{value}</td>"
    html_rows += "</tr>"
  return html_rows

def generate_html_table(random_records):
  """Generates the complete HTML table.
  Input:
      random_records: The 1-3 randomly selected records from the grouped DataFrame.

  Return:
      html_table: The complete HTML table.
  """
  all_records_data = extract_record_data(random_records)
  html_table = create_table_CSS_header()
  html_table += create_html_table_rows(all_records_data)
  html_table += "</table>"
  return html_table

print("Three or less Cooper Hewitt collection objects selected from the database with\nthe following shared group values under the user selected keys:\n")
print(f'{user_input[0]} = {selected_group_value[0]}')
print(f'{user_input[1]} = {selected_group_value[1]}')
print()

# Import the HTML from IPython.display
# Source: https://ipython.readthedocs.io/en/8.26.0/api/generated/IPython.display.html
from IPython.display import HTML
html_table = generate_html_table(random_records)
display(HTML(html_table))

Three or less Cooper Hewitt collection objects selected from the database with
the following shared group values under the user selected keys:

woe:country = 23424775.0
year_acquired = 1981.0



| Image | Title | Date | Medium | Dimensions | Type | Country | Accession Number |
| --- | --- | --- | --- | --- | --- | --- | --- |
| | Sampler (Canada), September 1795 | September 1795 | Silk embroidery on linen foundation | H x W: 26.4 x 21 cm (10 3/8 x 8 1/4 in.) | Sampler | Canada | 1981-28-29 |
| | Sampler (Canada), ca. 1880–90 | Ca. 1880–90 | Wool and cotton embroidery on linen foundation | H x W: 43.4 x 43.5cm (17 1/16 x 17 1/8in.) | Sampler | Canada | 1981-28-24 |
| | Vase (Canada), 1980–81 | 1980–81 | Glass, blown; photo transferred images | H x diam.: 17.4 x 9.5 cm (6 7/8 x 3 3/4 in.) | Vase | Canada | 1981-54-1 |
