**Coding Script**

The following script cleans and transforms the solar farm design log data and assigns each logged action to one of these coding categories:

(1) "Explore Relevant Features" (abbreviated as ERF)

(2) "Explore Irrelevant Features" (abbreviated as EIF)

(3) "Run Relevant Simulation" (abbreviated as RRS)

(4) "Run Irrelevant Simulation" (abbreviated as RIS)

(5) "Change Relevant Parameters" (abbreviated as CRP)

(6) "Change Irrelevant Parameters" (abbreviated as CIP)

(7) "House Keeping Actions" (abbreviated as HKA)

It also discards standalone "Copy" actions (those not immediately followed by a "Paste..." action) and all actions that fall under the "notification" category.
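The copy/paste rule above can be illustrated on a small in-memory example. This is a minimal sketch, assuming a hypothetical five-row `toy` log rather than the real Excel file; the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical toy log; the real data comes from the Excel log file
toy = pd.DataFrame({"Action": ["copy", "paste by key", "move", "copy", "rotate"]})

paste_actions = ["paste by key", "paste to point"]
next_action = toy["Action"].shift(-1)

# A "copy" is kept only when the very next action is a paste
is_copy = toy["Action"] == "copy"
copy_then_paste = is_copy & next_action.isin(paste_actions)

# Merge each copy/paste pair into a single combined action on the copy row
toy.loc[copy_then_paste, "Action"] = "copy - " + next_action[copy_then_paste]

# Drop standalone copies and the paste rows that were merged into the preceding copy
merged_paste = copy_then_paste.shift(fill_value=False)
result = toy.loc[~(is_copy & ~copy_then_paste) & ~merged_paste].reset_index(drop=True)

print(result["Action"].tolist())
# ['copy - paste by key', 'move', 'rotate']
```

The first "copy" survives as a combined action because a paste follows it; the second "copy" is followed by "rotate" and is dropped.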

*Link References:*

(1) Coding Scheme: https://www.dropbox.com/home/Epistemic%20Network%20Analysis?preview=Action+Coding+scheme+%28Final%29.xlsx

(2) Log Data File: https://www.dropbox.com/home/Epistemic%20Network%20Analysis?preview=Solar+farm_log+data_three+schools.xlsx

In [1]:
#pip install pandas openpyxl numpy
import pandas as pd
import numpy as np

# Load the two excel files into pandas dataframes
df_log = pd.read_excel("solar_farm_log_data.xlsx")  # your log data
df_scheme = pd.read_excel("action_coding_scheme.xlsx")  # your coding scheme

# Transform 'Code' and 'Action' columns in the coding scheme to lowercase
# (spaces in the category names are kept; category_abbrev_dict below is keyed on them)
df_scheme['Code'] = df_scheme['Code'].str.lower()
df_scheme['Action'] = df_scheme['Action'].str.lower()

# Create a dictionary of Actions and their corresponding Codes
action_code_dict = pd.Series(df_scheme.Code.values, index=df_scheme.Action).to_dict()

# Transform 'Action' column in the log data to lowercase before mapping
df_log['Action'] = df_log['Action'].str.lower()

# Use the dictionary to assign the corresponding "Coded Action" to each record in the log data dataframe
df_log['Coded Action'] = df_log['Action'].map(action_code_dict)

# Label actions that have no match in the coding scheme as 'Undefined'
df_log['Coded Action'] = df_log['Coded Action'].fillna('Undefined')

# List the actions to simplify
actions_to_simplify = ["paste by key", "add", "move", "move polygon", "delete", "rotate", "resize", "resize polygon", "resize wall", "cut", "paste to point"]

# Filter out repeated consecutive actions
df_log = df_log.loc[(df_log['Action'] != df_log['Action'].shift()) | (~df_log['Action'].isin(actions_to_simplify))]

# For copy-paste sequences, merge each pair into a single "copy - <paste action>" row
paste_actions = ['paste by key', 'paste to point']
next_action = df_log['Action'].shift(-1)
copy_paste_mask = (df_log['Action'] == 'copy') & next_action.isin(paste_actions)
df_log.loc[copy_paste_mask, 'Action'] = 'copy - ' + next_action

# Discard standalone "Copy" actions and the "Paste by Key" / "Paste to Point" rows
# that were merged into the preceding copy
paste_after_copy_mask = copy_paste_mask.shift(fill_value=False)
df_log = df_log.loc[(df_log['Action'] != 'copy') & ~paste_after_copy_mask]

# Create a dictionary of categories and their abbreviations
# (the "expore" key is kept on purpose: the coding scheme spells the category that way)
category_abbrev_dict = {"explore task relevant features": "ERF",
                        "expore task relevant features": "ERF",
                        "explore task irrelevant features": "EIF",
                        "run relevant simulation": "RRS",
                        "run irrelevant simulation": "RIS",
                        "change relevant parameters": "CRP",
                        "change irrelevant parameters": "CIP",
                        "house keeping actions": "HKA",
                        "(notification)": "NTF"}

# Use the dictionary to abbreviate the "Coded Action" categories
df_log['Coded Action'] = df_log['Coded Action'].map(category_abbrev_dict).fillna(df_log['Coded Action'])

# Discard entries that are categorized as "notification"
df_log = df_log[df_log['Coded Action'] != 'NTF']

# Save the cleaned dataframe to a CSV file
df_log.to_csv("solar_farm_log_data.csv", index=False)
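The consecutive-repeat filter used in the cell above can be checked on a small example. This is a minimal sketch, assuming a hypothetical `toy` log and a shortened `collapsible` list in place of `actions_to_simplify`; only actions in that list are collapsed, so repeated occurrences of other actions are left alone.

```python
import pandas as pd

# Hypothetical toy log; "show heliodon" is deliberately not in the collapsible list
toy = pd.DataFrame(
    {"Action": ["move", "move", "move", "show heliodon", "show heliodon", "move"]}
)
collapsible = ["move", "rotate"]  # shortened stand-in for actions_to_simplify

# True where a row repeats the action of the row directly above it
same_as_previous = toy["Action"] == toy["Action"].shift()

# Keep a row unless it is a consecutive repeat of a collapsible action
keep = ~same_as_previous | ~toy["Action"].isin(collapsible)
collapsed = toy.loc[keep, "Action"].tolist()

print(collapsed)
# ['move', 'show heliodon', 'show heliodon', 'move']
```

The three leading "move" rows collapse to one, both "show heliodon" rows remain, and the final "move" is kept because it does not directly follow another "move".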

In [2]:
action_code_dict

{'solar panel array layout': 'change relevant parameters',
 'run yearly simulation for solar panels': 'run relevant simulation',
 'close solar panel yearly yield graph': 'house keeping actions',
 'copy': 'change relevant parameters',
 'save cloud file': 'house keeping actions',
 'save as cloud file': 'house keeping actions',
 'list cloud files': 'house keeping actions',
 'create new file': 'house keeping actions',
 'show sun and time settings panel': 'expore task relevant features',
 'show heliodon': 'expore task relevant features',
 'close sun and time settings panel': 'house keeping actions',
 'generate daily solar radiation heatmap (static)': 'expore task relevant features',
 'set tilt angle for all solar panel arrays': 'change relevant parameters',
 'set pole spacing for all solar panel arrays': 'explore task irrelevant features',
 'run yearly simulation for solar panels: individual': 'run relevant simulation',
 'run yearly simulation for solar panels: total': 'run relevant simulat

**Testing Coding Script**

In [37]:
# Add the manually created actions to the coding scheme
new_actions = ['copy - paste by key', 'copy - paste to point']
new_codes = [action_code_dict['copy']]*len(new_actions)  # use the same code as 'copy' action

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
df_scheme = pd.concat([df_scheme, pd.DataFrame({'Action': new_actions, 'Code': new_codes})], ignore_index=True)

# Update the dictionary
action_code_dict = pd.Series(df_scheme.Code.values, index=df_scheme.Action).to_dict()

In [38]:
# Check if all the actions in your log data exist in the coding scheme:
missing_actions = df_log.loc[~df_log['Action'].isin(df_scheme['Action']), 'Action'].unique()
assert len(missing_actions) == 0, f"These actions are missing in the coding scheme: {missing_actions}"

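# Check if all actions in the log data have a corresponding coded action
# (unmatched actions are labelled 'Undefined' by the coding script, so check for that label as well):
uncoded = df_log['Coded Action'].isnull() | (df_log['Coded Action'] == 'Undefined')
assert not uncoded.any(), "There are actions in the log data that were not coded"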

# Check if there are no consecutive repeated actions left:
assert not ((df_log['Action'] == df_log['Action'].shift()) & df_log['Action'].isin(actions_to_simplify)).any(), "There are still consecutive repeated actions in the data"

# Check if the 'notification' category was correctly removed:
assert 'NTF' not in df_log['Coded Action'].unique(), "The 'notification' category was not correctly removed"
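Because an `assert` stops at the first failure, it can be convenient to collect every problem in one pass. The helper below is a hypothetical sketch, not part of the original notebook: `find_coding_problems` and its parameters are assumed names. It mirrors the four checks above on a toy frame.

```python
import pandas as pd

def find_coding_problems(log, known_actions, collapsible):
    """Return a list of problem descriptions; an empty list means every check passed."""
    problems = []

    # Actions in the log that the coding scheme does not know about
    unknown = sorted(set(log["Action"]) - set(known_actions))
    if unknown:
        problems.append(f"actions missing from the coding scheme: {unknown}")

    # Rows without a usable coded action (missing or labelled 'Undefined')
    uncoded = log["Coded Action"].isna() | (log["Coded Action"] == "Undefined")
    if uncoded.any():
        problems.append(f"{int(uncoded.sum())} rows have no coded action")

    # Consecutive repeats of actions that should have been collapsed
    repeated = (log["Action"] == log["Action"].shift()) & log["Action"].isin(collapsible)
    if repeated.any():
        problems.append("consecutive repeated actions remain")

    # Notification rows that should have been discarded
    if (log["Coded Action"] == "NTF").any():
        problems.append("notification rows remain")

    return problems

# Hypothetical toy data that violates three of the four checks
toy = pd.DataFrame(
    {"Action": ["move", "move", "zoom"], "Coded Action": ["CRP", "CRP", "Undefined"]}
)
problems = find_coding_problems(toy, known_actions=["move"], collapsible=["move"])
for problem in problems:
    print(problem)
```

With the real data, the call would presumably take `df_log`, `df_scheme['Action']`, and `actions_to_simplify` as arguments.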

-----