# Organize ribbit scores

This notebook combines ribbit scores from multiple csv files and lets you access subsets of the scores, for example only the top 10 scores for each pond. The example uses the ribbit scores created from all of the data in Dec 2022. 

## Setup

In [1]:
# run the file setup_functions.ipynb to define settings, import packages, and define functions 
%run ../ribbit_functions/setup_functions.ipynb

### Combine multiple ribbit score csv files 

In [448]:
# Only need to run if you need to combine ribbit scores from multiple csv files  
# Useful if you broke up a model run into sections to run it faster 
# WARNING: if delete_files = True this will delete individual files after combining them

if input("Are you sure you want to run this? (type 'yes' to continue)")=='yes':
    folder_path = "./ribbit_scores_Dec2022/" #*# path to folder containing the csv files you want to combine 
    rs_flshe = combine_csvs(folder_path, new_csv_name = "ribbit_scores_combined.csv", delete_files = True)
else:
    print('aborted')
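
`combine_csvs` is defined in `setup_functions.ipynb`, which is not shown in this notebook. The sketch below is only an assumption about what such a helper might do: stack every csv in a folder into one file and, optionally, delete the individual files. The name `combine_csvs_sketch` and its body are hypothetical; the demonstration runs on throwaway files in a temporary folder, so no real data is touched.

```python
from pathlib import Path
import tempfile

import pandas as pd


def combine_csvs_sketch(folder_path, new_csv_name="ribbit_scores_combined.csv", delete_files=False):
    """Hypothetical stand-in for combine_csvs: stack every csv in a folder into one file."""
    folder = Path(folder_path)
    out_path = folder / new_csv_name
    # Skip a combined file left over from an earlier run so it is not stacked onto itself
    csv_paths = sorted(p for p in folder.glob("*.csv") if p != out_path)
    combined = pd.concat([pd.read_csv(p, index_col=0) for p in csv_paths])
    combined.to_csv(out_path)
    if delete_files:
        for p in csv_paths:
            p.unlink()  # irreversible: the individual files are removed
    return combined


# Demonstration on throwaway files in a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({"score": [1.5, 2.0]}, index=["a.wav", "b.wav"]).to_csv(Path(tmp) / "part1.csv")
    pd.DataFrame({"score": [3.25]}, index=["c.wav"]).to_csv(Path(tmp) / "part2.csv")
    demo_combined = combine_csvs_sketch(tmp, delete_files=True)
    remaining_files = sorted(p.name for p in Path(tmp).iterdir())
```

Because deletion cannot be undone, it may be safer to run once with `delete_files = False`, check the combined file, and only then remove the individual files.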


# Import and clean data

### Define file and folder paths for data import and cleaning 

In [130]:
# file path to csv file with ribbit scores 
ribbit_scores_fp = "./ribbit_scores_flshe_20221206/ribbit_scores_combined.csv" #*#

# file path to csv file with manually verified data 
verified_data_fp = "../manually_verified_data/FLSHE_pond400.csv" #*#

# path to folder containing audio files 
audio_files_fp = '/Volumes/Expansion/Frog Call Project/Calling Data/FLSHE/' #*#
# Note: if the folders within this folder are structured differently, you may need to edit the full file paths in the 
#       data cleaning section below (indicated with #*#)



### Import and clean RIBBIT score data

In [131]:
# Import ribbit scores based on ribbit_scores_fp
rs_flshe = pd.read_csv(ribbit_scores_fp, index_col = 0)

# extract date from file path 
rs_flshe['date'] = pd.to_datetime(rs_flshe.index.str[-19:-4], format='%Y%m%d_%H%M%S', errors='coerce') 
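
The date extraction above relies on every file name ending in a 15-character `YYYYMMDD_HHMMSS` stamp followed by `.wav`. A small illustration of the slice, using made-up paths (the second one is deliberately malformed), shows what `errors='coerce'` does when a name does not follow that pattern:

```python
import pandas as pd

# Illustrative index values; the second name is deliberately malformed
paths = pd.Index([
    "/data/FLSHE_2017_398/TEST_0+1_20161105_234300.wav",
    "/data/FLSHE_2017_398/notes_about_recordings.wav",
])

# Characters -19 to -4 cover "YYYYMMDD_HHMMSS", just before ".wav"
stamps = paths.str[-19:-4]

# Names that do not match the format become NaT instead of raising an error
dates = pd.to_datetime(stamps, format="%Y%m%d_%H%M%S", errors="coerce")
```

Checking `rs_flshe['date'].isna().sum()` after the import is a quick way to see how many file names failed to parse.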


### Import and clean manually verified data 

In [132]:
# import manually verified data 
verified_flshe = pd.read_csv(verified_data_fp)[["File name", "Pond #", "L. capito", "gopher call time", "Date"]] # keeps only listed columns 

# rename columns for convenience
verified_flshe = verified_flshe.rename(columns = {"File name":"file_name", "Pond #":"logger", "L. capito":"Lcapito", "gopher call time":"call_time", "Date":"date"})

# make Lcapito categorical
verified_flshe.Lcapito = verified_flshe.Lcapito.astype("category")

# create year column based on date string
verified_flshe['year'] = verified_flshe.date.str[0:4]
verified_flshe = verified_flshe.astype({"year":"int"}) # astype returns a new data frame, so assign the result

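# add .wav to file name if it is not included with the file name 
# (uses .loc with a boolean mask instead of chained indexing, which can fail to modify the data frame)
missing_wav = ~verified_flshe["file_name"].str.endswith(".wav")
verified_flshe.loc[missing_wav, "file_name"] = verified_flshe.loc[missing_wav, "file_name"] + ".wav"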
    
#*# create full file path from file names, year, and logger numbers #*# 
verified_flshe['file_path'] = audio_files_fp + 'FLSHE_' + \
    verified_flshe['year'].astype('string') + \
    '/FLSHE_' + verified_flshe['year'].astype('string') + '_' + verified_flshe['logger'].astype('string') + '/' + \
    verified_flshe['file_name'] #*#

# set file path as index 
verified_flshe = verified_flshe.set_index('file_path')


### Merge ribbit scores to manually verified data 

In [133]:
# merge with ribbit scores data file 
verified_flshe = verified_flshe.drop(columns = ["year", "date", "logger"]).merge(rs_flshe, left_index = True, right_index = True)
verified_flshe = verified_flshe.dropna(subset=['Lcapito']) # drop any rows with "NaN" for Lcapito - if left empty, etc. 
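
By default `merge` does an inner join, so any verified file whose path does not exactly match a path in the ribbit score index is dropped without warning. The toy example below (made-up file names and scores) is one way to check for that: an outer merge with `indicator=True` labels the rows that appear on only one side.

```python
import pandas as pd

# Made-up data: "b.wav" has no score, "d.wav" was never verified, "c.wav" has no Lcapito label
verified = pd.DataFrame(
    {"Lcapito": ["1", "0", None]},
    index=["a.wav", "b.wav", "c.wav"],
)
scores = pd.DataFrame({"score": [12.5, 3.0, 8.0]}, index=["a.wav", "c.wav", "d.wav"])

# how="outer" with indicator=True adds a "_merge" column saying where each row came from
check = verified.merge(scores, left_index=True, right_index=True, how="outer", indicator=True)
unmatched = check[check["_merge"] != "both"]

# The default inner merge keeps only paths present in both data frames
merged = verified.merge(scores, left_index=True, right_index=True)
merged = merged.dropna(subset=["Lcapito"])  # then drop rows without a label
```

If `unmatched` contains many verified files, the `#*#` file path construction above probably does not match the folder structure used when the scores were generated.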


# Using `get_top_rs`

## Function definition 

`def get_top_rs(df, n = 5, min_score = 0.0, t_unit = "Y", \
               group_col = 'no_groups', groups = ["0"], \
               score_col = "score", time_stamp_col = "time_stamp", \
               save_csv = False):`

**Purpose:** get a list of the audio files with the top ribbit scores that meet the given criteria

**Input:** 
* `df` - data frame with ribbit scores 
* `n` - number of files per group (e.g. n = 5 gets top 5 ribbit scores per group)
* `min_score` - minimum ribbit score needed for file to be included 
      (e.g. if you want all files above a ribbit score of 50, you could have min_score = 50 and n = 999999999999)
* `t_unit` - unit for how often we want the top scores (options: D, W, M, Y, Q - day, week, month, year, quarter year)
* `group_col` - the name of the column with the labels grouping our files 
      (e.g. "pond" for sandhills or "site" for ichaway wetlands)
* `groups` - list of the groupings 
      (e.g. for sandhills the pond numbers [398, 399, 400, 401, 402, 403]; for ichaway would be the wetlands' names)
* `score_col` - column name where ribbit score is stored 
* `time_stamp_col` - column name where time stamp for ribbit score is stored
* `save_csv` - False if we do not want to save our output to a csv. Otherwise string of the file path where we want to save the csv file 
      (e.g. "./ribbit_scores/top_ribbit_scores_per_year.csv")

**Out:**
dataframe with the top `n` files with a ribbit score over `min_score` for each group in `groups` for every `t_unit` 
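
The real `get_top_rs` lives in `setup_functions.ipynb` and is not reproduced here. As a rough illustration of the selection logic described above, the sketch below keeps the top `n` scores per group per calendar year. The function name `top_scores_sketch`, the use of a `date` column for the year, the strict `>` comparison against `min_score`, and the toy data are all assumptions; it handles only yearly grouping and does not filter by `groups` or save a csv.

```python
import pandas as pd


def top_scores_sketch(df, n=5, min_score=0.0, group_col="logger",
                      score_col="score", date_col="date"):
    """Hypothetical stand-in: top n scores above min_score per group per calendar year."""
    kept = df[df[score_col] > min_score].copy()  # assumption: strictly greater than min_score
    kept["date_group"] = kept[date_col].dt.year
    # Sort by score, then keep the first n rows within each (group, year) pair
    kept = kept.sort_values(score_col, ascending=False)
    top = kept.groupby([group_col, "date_group"]).head(n)
    return top.sort_values([group_col, "date_group", score_col], ascending=[True, True, False])


# Toy data: two loggers, two years
demo = pd.DataFrame({
    "logger": [398, 398, 398, 400, 400],
    "score": [10.0, 50.0, 30.0, 5.0, 70.0],
    "date": pd.to_datetime(["2017-01-01", "2017-02-01", "2018-01-01", "2017-03-01", "2017-04-01"]),
}, index=["a.wav", "b.wav", "c.wav", "d.wav", "e.wav"])

top1 = top_scores_sketch(demo, n=1)                       # best file per logger per year
above_20 = top_scores_sketch(demo, n=5, min_score=20.0)   # every file scoring above 20
```

Sorting first and then calling `groupby(...).head(n)` keeps the original index (the file paths), which is what you need to locate the audio files afterwards.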
       

## Example

In [144]:
# Get top 3 audio files for each pond (per year, since t_unit defaults to "Y") and save it to a csv file
ponds = range(398, 404) # create variable of all ponds that had audio loggers 
temp = get_top_rs(rs_flshe, n = 3, group_col = 'logger', groups = ponds, save_csv = "./example_top_rs_flshe.csv")

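# Get the top audio file for each of 3 ponds for each year and save it to a csv file 
# Note: this reuses the same csv path and the same variable (temp), so it overwrites the results of the call above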
ponds = [398, 400, 401]
temp = get_top_rs(rs_flshe, n = 1, t_unit = 'Y', group_col = 'logger', groups = ponds, save_csv = "./example_top_rs_flshe.csv")
#temp



In [145]:
temp


Unnamed: 0,logger,date_group,score,time_stamp,year,date
/Volumes/Expansion/Frog Call Project/Calling Data/FLSHE/FLSHE_2017/FLSHE_2017_398/TEST_0+1_20161105_234300.wav,398,2016,135.52,220.0,2017,2016-11-05 23:43:00
/Volumes/Expansion/Frog Call Project/Calling Data/FLSHE/FLSHE_2017/FLSHE_2017_398/TEST_0+1_20171113_173700.wav,398,2017,99.32,77.5,2017,2017-11-13 17:37:00
/Volumes/Expansion/Frog Call Project/Calling Data/FLSHE/FLSHE_2018/FLSHE_2018_398/TEST_0+1_20180114_235300.wav,398,2018,82.1,231.5,2018,2018-01-14 23:53:00
/Volumes/Expansion/Frog Call Project/Calling Data/FLSHE/FLSHE_2019/FLSHE_2019_398/TEST_0+1_20190115_205400.wav,398,2019,73.05,164.0,2019,2019-01-15 20:54:00
/Volumes/Expansion/Frog Call Project/Calling Data/FLSHE/FLSHE_2020/FLSHE_2020_398/TEST_0+1_20200224_203000.wav,398,2020,197.17,68.5,2020,2020-02-24 20:30:00
/Volumes/Expansion/Frog Call Project/Calling Data/FLSHE/FLSHE_2021/FLSHE_2021_398/TEST_0+1_20210208_181700.wav,398,2021,118.05,27.5,2021,2021-02-08 18:17:00
/Volumes/Expansion/Frog Call Project/Calling Data/FLSHE/FLSHE_2022/FLSHE_2022_398/TEST_0+1_20220408_010100.wav,398,2022,177.07,145.0,2022,2022-04-08 01:01:00
/Volumes/Expansion/Frog Call Project/Calling Data/FLSHE/FLSHE_2017/FLSHE_2017_400/TEST_0+1_20170213_212100.wav,400,2017,546.87,286.0,2017,2017-02-13 21:21:00
/Volumes/Expansion/Frog Call Project/Calling Data/FLSHE/FLSHE_2018/FLSHE_2018_400/TEST_0+1_20180212_202000.wav,400,2018,216.16,158.0,2018,2018-02-12 20:20:00
/Volumes/Expansion/Frog Call Project/Calling Data/FLSHE/FLSHE_2019/FLSHE_2019_400/TEST_0+1_20190303_223600.wav,400,2019,509.78,191.0,2019,2019-03-03 22:36:00
