## Tool for making Sample Manager outputs more user-friendly

### Leila Barker, October 2024

#### Summary:
* Save the Sample Manager output file to this folder: W:\TD&R\Python Tools\Sample Manager data cleanup tool\Tool inputs
    * Ensure that the file is saved as an Excel Workbook (.xlsx), not an older format such as Excel 97-2003 Workbook (.xls)
    * If the folder contains more than one file, the tool imports the most recently modified one
* Run the following cells in order
* The output file will be saved in the folder W:\TD&R\Python Tools\Sample Manager data cleanup tool\Tool outputs as "[input file name]_[today's date].xlsx" (for example, tmp884_2024-10-15.xlsx)

#### Step 1: Import data, delete unauthorized data, clean up DateTimes

In [93]:
# Import the data from the most recent Excel file in the "Tool inputs" folder

import os
import pandas as pd
from datetime import datetime

folder_path = r'W:\TD&R\Python Tools\Sample Manager data cleanup tool\Tool inputs' # Raw string so the backslashes are not treated as escape characters

files = [f for f in os.listdir(folder_path) if f.endswith('.xlsx') or f.endswith('.xls')]
files_full_path = [os.path.join(folder_path, f) for f in files]

# Find the latest file by modification time
latest_file = max(files_full_path, key=os.path.getmtime)
latest_file_name = os.path.basename(latest_file) # Name of the latest file

# Load the latest file into a dataframe
raw_df = pd.read_excel(latest_file)

print(f"Input file path: {latest_file}")
print(f"Input file name: {latest_file_name}")

# Delete the first column (standard SM export; not needed)
raw_df.drop(raw_df.columns[0], axis=1, inplace=True)

# Retain only data with the status "Authorised" (has passed final QA/QC).
# .copy() avoids a SettingWithCopyWarning when columns are added below.
raw_df = raw_df.loc[raw_df['Result Status'] == 'Authorised'].copy()

# Convert datetimes to standard format: keep the full timestamp in "DateTime"
# and overwrite "Date" with the date only (YYYY-MM-DD)
raw_df['DateTime'] = pd.to_datetime(raw_df['Date'], errors='coerce', format='%m/%d/%Y %I:%M %p')
raw_df['Date'] = raw_df['DateTime'].dt.strftime('%Y-%m-%d')

# Move DateTime next to Date
columns = raw_df.columns.tolist()
date_index = columns.index('Date')  # Get the index of the 'Date' column
columns.remove('DateTime')
columns.insert(date_index + 1, 'DateTime') # Insert 'DateTime' right after 'Date'
raw_df = raw_df[columns] # Reorder the DataFrame based on the new column order

raw_df.head()

Input file path: W:\TD&R\Python Tools\Sample Manager data cleanup tool\Tool inputs\tmp884.xlsx
Input file name: tmp884.xlsx


Unnamed: 0,Sample ID,Date,DateTime,Location,Sample Name,Collection Type,Analysis,Component,Qualifiers,Result,Units,Result Status
0,290612,2021-07-06,2021-07-06 09:54:00,NTS Effluent,,GR01,CHLOROPHYL,Chloroph-A-C (Report),,77.7,ppb_wt_v,Authorised
1,290612,2021-07-06,2021-07-06 09:54:00,NTS Effluent,,GR01,CHLOROPHYL,Chloroph-A-U (Report),,98.0,ppb_wt_v,Authorised
2,290612,2021-07-06,2021-07-06 09:54:00,NTS Effluent,,GR01,CHLOROPHYL,Pheoph-A (Report),,20.4,ppb_wt_v,Authorised
3,290612,2021-07-06,2021-07-06 09:54:00,NTS Effluent,,GR01,HG-PT-T,Hg (Report),,1.55,ppt_wt_v,Authorised
4,290612,2021-07-06,2021-07-06 09:54:00,NTS Effluent,,GR01,NPOC-S,NPOC (Report),,9.31,ppm_wt_v,Authorised
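
The DateTime conversion in Step 1 uses `errors='coerce'`, which turns any value that does not match the format into `NaT` rather than raising an error. The sketch below illustrates this on a few made-up strings (not taken from a real export), assuming the Date column arrives as text in the form described by the format string:

```python
import pandas as pd

# Made-up example values; the last one deliberately does not match the format
dates = pd.Series(['07/06/2021 09:54 AM', '07/06/2021 01:30 PM', 'not a date'])

# Same call as in Step 1: unparseable values become NaT instead of raising
parsed = pd.to_datetime(dates, errors='coerce', format='%m/%d/%Y %I:%M %p')

# Date-only text version, as used for the "Date" column
just_date = parsed.dt.strftime('%Y-%m-%d')

print(parsed)
print(just_date)
print(f"Values that could not be parsed: {parsed.isna().sum()}")
```

Counting the `NaT` values after conversion is a quick way to check whether any rows in the export had an unexpected date format.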


#### Step 2 (Optional): delete qualified data and/or select qualifiers (this is a placeholder for any modifications the user wants to make prior to exporting the data)

In [96]:
# Optional: delete qualified data (note: this deletes all data with something in the "Qualifiers" field;
# if desired, can modify to just delete data with certain qualifiers [e.g., U])

# We could also use this section to create a new field that indicates which parameters were flagged, and with what, prior to the creation of the final table.

raw_df = raw_df[raw_df['Qualifiers'].isna()]
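
The comments above mention two possible variations: deleting only certain qualifiers, and recording which results were flagged. A minimal sketch of both, assuming qualifiers are stored as short text codes such as "U". The sample values, the `qualifiers_to_drop` list, and the `Flagged` column name are hypothetical and shown on a small stand-in dataframe rather than on `raw_df`:

```python
import pandas as pd

# Hypothetical stand-in for raw_df; values are illustrative only
demo_df = pd.DataFrame({
    'Sample ID': [1, 1, 2, 2],
    'Component': ['Hg (Report)', 'NPOC (Report)', 'Hg (Report)', 'NPOC (Report)'],
    'Qualifiers': [None, 'U', 'J', None],
    'Result': [1.55, 9.31, 2.10, 8.70],
})

# Assumption: these are the qualifier codes the user wants to exclude
qualifiers_to_drop = ['U']

# Record which rows carry any qualifier before filtering
demo_df['Flagged'] = demo_df['Qualifiers'].notna()

# Keep unqualified rows plus rows whose qualifier is not in the drop list
filtered_df = demo_df[~demo_df['Qualifiers'].isin(qualifiers_to_drop)]

print(filtered_df)
```

Unlike the `isna()` filter, this keeps rows with qualifiers that are not in the drop list (here, "J"), and the `Flagged` column shows which of the remaining results were qualified.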

#### Step 3: Create a pivot table with each row representing a single sample
* Note: as is, the parameter headings are ugly. For example, orthophosphate is "NH3OPO4-S o-PO4 (Report)". It would be easy to clean them up by assigning more user-friendly names such as "OPO4".

In [99]:
# Create a pivot table of the data (exported to Excel in Step 4)

pivot_df = raw_df.copy()

# Create a new field, "Parameter", combining the "Analysis" and "Component" fields
pivot_df['Parameter'] = pivot_df['Analysis'] + ' ' + pivot_df['Component']

# Create a pivot table
pivot_df = pivot_df.pivot_table(index=['Sample ID', 'Date', 'Location', 'DateTime', 'Collection Type'], columns = 'Parameter', values='Result', aggfunc='first').reset_index()
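
The note at the top of this step suggests giving the parameter columns friendlier names. One possible approach is a dictionary passed to `rename()`. In the sketch below, only the "OPO4" name comes from the note above; the other short names and the numeric values are hypothetical, and a small stand-in dataframe is used in place of `pivot_df`:

```python
import pandas as pd

# Hypothetical stand-in for pivot_df with a few parameter columns
demo_pivot = pd.DataFrame({
    'Sample ID': [290612],
    'NH3OPO4-S o-PO4 (Report)': [0.12],  # illustrative value
    'HG-PT-T Hg (Report)': [1.55],
    'NPOC-S NPOC (Report)': [9.31],
})

# Assumption: the short names are the user's choice; extend as needed
friendly_names = {
    'NH3OPO4-S o-PO4 (Report)': 'OPO4',
    'HG-PT-T Hg (Report)': 'Hg',
    'NPOC-S NPOC (Report)': 'NPOC',
}

# rename() leaves any column not listed in the mapping unchanged
demo_pivot = demo_pivot.rename(columns=friendly_names)

print(demo_pivot.columns.tolist())
```

Because columns missing from the mapping are left as they are, the dictionary can be built up gradually without breaking the export.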



#### Step 4: Export the data as an Excel workbook

In [102]:
# Get the current date in order to export the cleaned file with a date stamp
today = datetime.today().strftime('%Y-%m-%d')

# Export the file
output_directory = r'W:\TD&R\Python Tools\Sample Manager data cleanup tool\Tool outputs'
latest_file_name_no_ext = os.path.splitext(latest_file_name)[0]
filename = f'{latest_file_name_no_ext}_{today}.xlsx'
output_filepath = os.path.join(output_directory, filename)
pivot_df.to_excel(output_filepath, index=False) # index=False omits the meaningless 0, 1, 2... row index column
print(f'File saved at: {output_filepath}')

File saved at: W:\TD&R\Python Tools\Sample Manager data cleanup tool\Tool outputs\tmp884_2024-10-15.xlsx
