# 3.0_SynAnalyzer_CompileImagingMD_v2.ipynb
Created by: JFranco | 6 AUG 2024 | Env: SynAnalyzer | Last run: 25 SEP 2024

This Python notebook is essentially a utility for the SynAnalyzer pipeline. Its main task is to compile imaging metadata from multiple prep-level workbooks into a single metadata (MD) CSV sheet that contains only the information needed to plot SynAnalyzer or ImarisStatsFiles results by tonotopic frequency, normalized to the number of hair cells.

v2 of the notebook is set up to run in situations where the prep-level MD imaging Excel workbooks have been aggregated into a batch-level folder.

REQUIRES: 
   
    - Prep-level MD imaging Excel workbooks are expected to be named in this format: WSS_###.Metadata.Imaging.xlsx
    - Workbooks must have a sheet called "63xImages" that contains the following information for each image:
        - SlideID
        - ImageName (without the extension)
        - Turn
        - RegionID
        - Freq
        - NoHCRecon (the number of hair cells that were used for reconstructing surfaces in Imaris)
    - All workbooks are stored within a dedicated folder named "PrepLevelMD"
    - The user must enter the specific prep IDs to include in the cell that follows the package import cell
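
The requirements above can be checked before compiling. The following is a minimal sketch and not part of the original notebook: the helper names `is_valid_workbook_name` and `missing_fields` are hypothetical, and the pattern assumes that `###` stands for exactly three digits.

```python
import re
import pandas as pd

# Fields that the "63xImages" sheet is expected to contain
REQUIRED_FIELDS = ['SlideID', 'ImageName', 'Turn', 'RegionID', 'Freq', 'NoHCRecon']

# Assumption: "###" in WSS_###.Metadata.Imaging.xlsx means exactly three digits
WORKBOOK_PATTERN = re.compile(r'WSS_\d{3}\.Metadata\.Imaging\.xlsx')

def is_valid_workbook_name(filename):
    """Return True if the file name follows the expected naming format."""
    return WORKBOOK_PATTERN.fullmatch(filename) is not None

def missing_fields(df, required=REQUIRED_FIELDS):
    """Return the required column names that are absent from df."""
    return [field for field in required if field not in df.columns]

# Illustrative sheet in which one column name has the wrong capitalization
example = pd.DataFrame({
    'SlideID': ['WSS_038.01'],
    'Imagename': ['WSS_038.01.T1.01.Zs.4C'],
    'Turn': [1],
    'RegionID': [1],
    'Freq': [8],
    'NoHCRecon': [11.0],
})

name_ok = is_valid_workbook_name('WSS_038.Metadata.Imaging.xlsx')
absent = missing_fields(example)
```

Column names are matched case-sensitively, so a sheet with `Imagename` instead of `ImageName` would be reported as missing that field.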

In [6]:
#     *** IMPORT PACKAGES ***
import pandas as pd
import os
# openpyxl is the engine pandas uses to read .xlsx files; importing it here
# makes a missing installation fail early
import openpyxl

In [17]:
#     *** WHAT TO ANALYZE // WHERE TO GET/STORE ***
# Batch ID (names the batch analysis directory where the new metadata sheet will be stored)
batchID = 'SynAnalysis_BclwSNHL_NeonatalAAV'
# Preps that are included in the analysis
preps = ['WSS_038','WSS_039','WSS_040','WSS_041']
# Metadata fields to grab
mdFields = ['SlideID','ImageName','Turn','RegionID','Freq','NoHCRecon']

# Directories 
#   existing ones 
dirMain = '/Users/joyfranco/Dropbox (Partners HealthCare)/JF_Shared/Data/WSS/'
dirBA = dirMain+'BatchAnalysis/'+batchID+'/'
dirBAMD = dirBA+'Metadata/'
dirPLMD = dirBAMD+ 'PrepLevelMD/'   
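
The directory layout defined in the cell above can also be expressed with `pathlib`, which avoids depending on trailing slashes. This is only a sketch under stated assumptions: `build_dirs` is a hypothetical helper, and the demonstration runs in a temporary folder rather than the real data directory.

```python
from pathlib import Path
import tempfile

def build_dirs(dir_main, batch_id):
    """Return the batch, metadata, and prep-level metadata folders as Paths."""
    dir_ba = Path(dir_main) / 'BatchAnalysis' / batch_id
    dir_bamd = dir_ba / 'Metadata'
    dir_plmd = dir_bamd / 'PrepLevelMD'
    return dir_ba, dir_bamd, dir_plmd

# Demonstration in a throwaway folder so that no real data is touched
with tempfile.TemporaryDirectory() as tmp:
    dir_ba, dir_bamd, dir_plmd = build_dirs(tmp, 'SynAnalysis_BclwSNHL_NeonatalAAV')
    exists_before = dir_plmd.is_dir()   # folder has not been created yet
    dir_plmd.mkdir(parents=True)        # create the full folder chain
    exists_after = dir_plmd.is_dir()    # folder is now present
```

Checking `dir_plmd.is_dir()` before compiling would surface a misplaced "PrepLevelMD" folder up front instead of as one missing-file message per prep.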

In [19]:
#      *** BEGIN COMPILATION PROCESS ***
# Collect one DataFrame per prep, then concatenate once at the end
dfList = []
# Iterate through the preps and build the new metadata sheet
for prep in preps:
    # Set up the file to look for
    fnEx = prep+'.Metadata.Imaging.xlsx'
    fpEx = os.path.join(dirPLMD, fnEx)

    # Make sure the file exists
    if os.path.isfile(fpEx):
        # Load the sheet
        dfEF = pd.read_excel(fpEx, sheet_name='63xImages')

        # Keep only the requested metadata fields
        dfList.append(dfEF[mdFields])
    else:
        print(fnEx+" does not exist and could not be loaded.")

# Row labels restart at 0 for each prep because each workbook's own index is kept
dfMDAll = pd.concat(dfList) if dfList else pd.DataFrame(columns=mdFields)
dfMDAll.to_csv(dirBAMD+batchID+'.Metadata.Imaging.csv')
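
Because `to_csv` is called with its default `index=True`, the saved file starts with an unnamed column that holds the row labels. The sketch below uses a two-row frame and an in-memory buffer rather than the real file (the variable names are illustrative) to show how that column comes back as `Unnamed: 0` unless `index_col=0` is passed to `pd.read_csv`.

```python
import io
import pandas as pd

# Two-row stand-in for the compiled metadata
dfSmall = pd.DataFrame({
    'SlideID': ['WSS_038.01', 'WSS_038.01'],
    'Freq': [8, 16],
})

# Write with the default index=True, as the compilation cell does
csvText = dfSmall.to_csv()

# Reading back without index_col turns the row labels into a data column
dfDefault = pd.read_csv(io.StringIO(csvText))

# Reading back with index_col=0 restores the row labels as the index
dfIndexed = pd.read_csv(io.StringIO(csvText), index_col=0)

print(list(dfDefault.columns))
print(list(dfIndexed.columns))
```

Passing `index=False` to `to_csv` would avoid the extra column altogether, although that changes the file layout that downstream notebooks may expect.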

In [20]:
dfMDAll

       SlideID               ImageName  Turn  RegionID  Freq  NoHCRecon
0   WSS_038.01  WSS_038.01.T1.01.Zs.4C     1         1     8       11.0
1   WSS_038.01  WSS_038.01.T1.02.Zs.4C     1         2    16       12.0
2   WSS_038.01  WSS_038.01.T2.01.Zs.4C     2         1    32       13.0
3   WSS_038.01  WSS_038.01.T2.02.Zs.4C     2         2    45        NaN
4   WSS_038.01  WSS_038.01.T2.03.Zs.4C     2         3    64        NaN
..         ...                     ...   ...       ...   ...        ...
23  WSS_041.06  WSS_041.06.T2.01.Zs.4C     2         1     8       12.0
24  WSS_041.06  WSS_041.06.T2.02.Zs.4C     2         2    16       14.0
25  WSS_041.06  WSS_041.06.T1.01.Zs.4C     1         1    32       12.0
26  WSS_041.06  WSS_041.06.T3.01.Zs.4C     3         1    45        NaN
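
Several rows in the output above have no NoHCRecon value, which matters for any step that normalizes to the number of hair cells. A minimal sketch of how such rows could be flagged follows, using three rows copied from the output; the variable names are illustrative and not part of the pipeline.

```python
import numpy as np
import pandas as pd

# Three rows copied from the output above; NaN marks a missing hair cell count
dfCheck = pd.DataFrame({
    'ImageName': ['WSS_038.01.T1.01.Zs.4C',
                  'WSS_038.01.T1.02.Zs.4C',
                  'WSS_038.01.T2.02.Zs.4C'],
    'Freq': [8, 16, 45],
    'NoHCRecon': [11.0, 12.0, np.nan],
})

# Images that cannot be normalized because the hair cell count is missing
dfMissing = dfCheck[dfCheck['NoHCRecon'].isna()]

# Images that have everything needed for normalization
dfUsable = dfCheck.dropna(subset=['NoHCRecon'])

print(dfMissing['ImageName'].tolist())
```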
