# Organize ribbit scores

This notebook organizes ribbit scores after the model has been run. You can use it to:
1. Combine ribbit scores from multiple csv files.
1. Extract subsets of the scores (for example, only the top 10 scores for each wetland) and save them to a new csv file.

The examples use the ribbit scores generated in Dec 2022 from all of the data.

Note: `#*#` marks places you may want to edit (e.g. file paths).

## Setup

In [1]:
# run the file setup_functions.ipynb to define settings, import packages, and define functions
%run ../ribbit_functions/setup_functions.ipynb

### Combine multiple ribbit score csv files 

In [2]:
# Only run this cell if you need to combine ribbit scores from multiple csv files
# Useful if you split a model run into sections to make it run faster
# WARNING: if delete_files = True, the individual files are deleted after they are combined

if input("Are you sure you want to combine csv files? (type 'yes' to continue)")=='yes':
    folder_path = "./results_Dec2022/" #*# path to folder containing the csv files you want to combine 
    rs_ich = combine_csvs(folder_path, new_csv_name = "ribbit_scores_combined.csv", delete_files = True)
else:
    print('aborted')


Are you sure you want to run this? (type 'yes' to continue)n
aborted
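
`combine_csvs()` is defined in `setup_functions.ipynb`, which is not shown in this notebook. For orientation, here is a minimal sketch of what a function with that signature might do. The name `combine_csvs_sketch` and all of its internals are assumptions made for illustration, not the project's implementation. It also assumes that the first column of every csv file is the index (the audio file path).

```python
from pathlib import Path

import pandas as pd


def combine_csvs_sketch(folder_path, new_csv_name="ribbit_scores_combined.csv",
                        delete_files=False):
    """Hypothetical stand-in for combine_csvs(); not the project's implementation."""
    folder = Path(folder_path)
    out_path = folder / new_csv_name
    # Collect every csv in the folder except a previously written combined file
    csv_paths = sorted(p for p in folder.glob("*.csv") if p != out_path)
    if not csv_paths:
        raise FileNotFoundError(f"no csv files found in {folder}")
    # Assumption: the first column of each file is the index (audio file path)
    combined = pd.concat([pd.read_csv(p, index_col=0) for p in csv_paths])
    combined.to_csv(out_path)
    # Delete the inputs only after the combined file has been written
    if delete_files:
        for p in csv_paths:
            p.unlink()
    return combined
```

Writing the combined file before deleting anything means a failure while reading or writing leaves the original files untouched.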


## Import data


In [5]:
ribbit_scores_fp = "./results_Dec2022/ribbit_scores_combined.csv" #*# change this to the file path for ribbit scores
wetland_location_fp = "./ichaway_verified_data/fake_wetland_location_datasheet.csv" #*# change this to the file path for wetland location datasheet

# import scores
rs_ich = pd.read_csv(ribbit_scores_fp, index_col = 0)
rs_ich['date']=pd.to_datetime(rs_ich['date']) # convert column to date-time format

wetland_location = pd.read_csv(wetland_location_fp)


## Extracting top ribbit scores 

The `get_top_rs()` function lets you extract a subset of the ribbit scores. For example, you can get the top 10 scores for each logger. 

### Function definition

`def get_top_rs(df, n = 5, min_score = 0.0, t_unit = "Y", \
               group_col = 'no_groups', groups = ["0"], \
               score_col = "score", time_stamp_col = "time_stamp", \
               save_csv = False):`

**Purpose:** get the list of audio files with the top ribbit scores that meet the given criteria

**Input:** 
* `df` - data frame with ribbit scores 
* `n` - number of files per group (e.g. n = 5 gets top 5 ribbit scores per group)
* `min_score` - minimum ribbit score needed for file to be included 
      (e.g. if you want all files above a ribbit score of 50, you could have min_score = 50 and n = 999999999999)
* `t_unit` - time period within which the top scores are selected (options: D, W, M, Q, Y - day, week, month, quarter year, year)
* `group_col` - the name of the column with the labels grouping our files 
      (e.g. "pond" for sandhills or "site" for ichaway wetlands)
* `groups` - list of the groupings 
      (e.g. for sandhills the pond numbers [398, 399, 400, 401, 402, 403]; for ichaway would be the wetlands' names)
* `score_col` - column name where ribbit score is stored 
* `time_stamp_col` - column name where time stamp for ribbit score is stored
* `save_csv` - False if we do not want to save our output to a csv. Otherwise string of the file path where we want to save the csv file 
      (e.g. "./ribbit_scores/top_ribbit_scores_per_year.csv")

**Out:**
dataframe with the top `n` files that have a ribbit score over `min_score`, for each group in `groups` and each `t_unit` period
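
The selection described above can be sketched with a pandas `groupby`. This is a hypothetical illustration of the logic, not the body of `get_top_rs()`: the name `top_scores_sketch` is invented here, and it assumes that periods are derived from a datetime `date` column and that `min_score` is an inclusive threshold.

```python
import pandas as pd


def top_scores_sketch(df, n=5, min_score=0.0, t_unit="Y",
                      group_col="logger", groups=None,
                      score_col="score", date_col="date"):
    """Hypothetical illustration of the selection logic; not get_top_rs() itself."""
    # Assumption: a score equal to min_score is kept (inclusive threshold)
    out = df[df[score_col] >= min_score].copy()
    if groups is not None:
        out = out[out[group_col].isin(groups)]
    # Label each row with the period (day, week, month, quarter, year) it falls in
    out["date_group"] = out[date_col].dt.to_period(t_unit)
    # Rank rows by score, then keep the first n rows of each (group, period) pair
    out = out.sort_values(score_col, ascending=False)
    out = out.groupby([group_col, "date_group"]).head(n)
    return out.sort_values([group_col, "date_group", score_col],
                           ascending=[True, True, False])
```

Sorting once and then taking `head(n)` per group avoids a per-group Python loop, which matters when there is one row per audio file.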
       

### Example

In [4]:
# Get the top audio file per year for each of 3 loggers and save the result to a csv file - using data that has not been manually verified
loggers = ['1a', '5a', '7a']
temp = get_top_rs(rs_ich, n = 1, t_unit = 'Y', group_col = 'logger', groups = loggers, save_csv = "./example_top_rs_ichaway.csv")
temp


Unnamed: 0,logger,date_group,score,time_stamp,year,date
/Volumes/Expansion/Frog Call Project/Calling Data/ichaway/ichaway_2014/1a/4-21-14/20140423_224200.wav,1a,2014,188.58,285.0,2014,2014-04-23 22:42:00
/Volumes/Expansion/Frog Call Project/Calling Data/ichaway/ichaway_2015/1a/2-23-15/20150223_230100.wav,1a,2015,108.08,85.5,2015,2015-02-23 23:01:00
/Volumes/Expansion/Frog Call Project/Calling Data/ichaway/ichaway_2014/5a/4-21-14/20140423_004100.wav,5a,2014,182.29,70.0,2014,2014-04-23 00:41:00
/Volumes/Expansion/Frog Call Project/Calling Data/ichaway/ichaway_2015/5a/2-2-15/20150203_194500.wav,5a,2015,576.51,195.5,2015,2015-02-03 19:45:00
/Volumes/Expansion/Frog Call Project/Calling Data/ichaway/ichaway_2014/7a/4-21-14/20140424_234200.wav,7a,2014,184.17,205.0,2014,2014-04-24 23:42:00
/Volumes/Expansion/Frog Call Project/Calling Data/ichaway/ichaway_2015/7a/3-12-15/20150314_231500.wav,7a,2015,260.08,286.5,2015,2015-03-14 23:15:00
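
In the output above, the `date` column matches the timestamp embedded in each file name (for example, `20140423_224200.wav` and `2014-04-23 22:42:00`). The sketch below shows one way to recover that date from a file path. It assumes every file follows the `YYYYMMDD_HHMMSS.wav` naming pattern seen here, and `date_from_path_sketch` is a hypothetical helper, not one of the project's functions.

```python
from pathlib import Path

import pandas as pd


def date_from_path_sketch(audio_path):
    """Hypothetical helper: parse the recording start time from a file name."""
    # The stem is the file name without its extension, e.g. "20140423_224200"
    stem = Path(audio_path).stem
    # Assumption: file names follow the YYYYMMDD_HHMMSS pattern
    return pd.to_datetime(stem, format="%Y%m%d_%H%M%S")


example_path = "ichaway_2014/1a/4-21-14/20140423_224200.wav"
example_date = date_from_path_sketch(example_path)
```

A file name that does not match the pattern raises a `ValueError`, which makes misnamed files easy to spot.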


In [None]:
# this currently doesn't work for data that has not been verified because we don't know what wetland it was in 

# create variable of all ichaway wetlands that had audio loggers 
#ichaway_wetlands = verified_ich['site'].unique()

# Get top 3 audio files for each wetland and save it to a csv file
#temp = get_top_rs(verified_ich, n = 3, group_col = 'site', groups = ichaway_wetlands, save_csv = "./example_top_rs_ichaway.csv")
