# Run WhisperSeg as a Web Service

Running WhisperSeg as a web service decouples the environment in which WhisperSeg runs from the environment that calls the segmenting function. For example, we can set up a WhisperSeg segmenting service on one machine and call it from different working environments (MATLAB, a web page frontend, a Jupyter Notebook) at different physical locations.

This makes it easy to call WhisperSeg from MATLAB and is essential for setting up a web page for automatic vocal segmentation.

## Step 1: Starting the segmenting service
In a terminal, go to the main folder of this repository, and run the following command:
```
python segment_service.py --flask_port 8050 --model_path nccratliri/whisperseg-large-ms-ct2 --device cuda
```

Illustration of the parameters:
* flask_port: the port that the service listens on. Requests sent to this port are handled by the service.
* model_path: the path to the WhisperSeg model. This can be either the original Hugging Face model, e.g., nccratliri/whisperseg-large-ms, or a CTranslate2-converted model, e.g., nccratliri/whisperseg-large-ms-ct2. If you use the CTranslate2-converted model, make sure the converted model exists. If you have your own trained WhisperSeg checkpoint, replace "nccratliri/whisperseg-large-ms-ct2" with the path to that checkpoint.
* device: the device on which to run WhisperSeg, either cuda or cpu. The default is cuda.

**Note**:
The terminal that runs this service must be kept open. On Linux, you can first create a new screen session and run the service inside it, so that the service keeps running in the background.
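
Before moving on to Step 2, it can help to confirm that the service is actually accepting connections on the chosen port. The following is a minimal sketch using only the Python standard library. The helper name `is_port_open` and the 2-second timeout are illustrative assumptions and are not part of this repository. It only checks that something is listening on the port, not that the listener is WhisperSeg.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for illustration only. It does not verify that
    the listener is the WhisperSeg service, only that the port accepts
    connections.
    """
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError or
        # a timeout) when nothing is reachable on the port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example usage (8050 is the port passed to --flask_port in Step 1):
# print(is_port_open("localhost", 8050))
```

If the check returns `False` on the server machine itself, the service is probably not running, or it was started with a different `--flask_port`.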

## Step 2: Calling the segmenting service

### Call the segmenting service in Python

As an example, we segment a zebra finch recording. First, define a function that sends a request to the service:

```python
import requests, json, base64
import pandas as pd

# Define a function that sends one audio file to the segmenting service
def call_segment_service( service_address, 
                          audio_file_path,
                          channel_id,
                          sr,
                          min_frequency,
                          spec_time_step,
                          min_segment_length,
                          eps,
                          num_trials,
                          adobe_audition_compatible
                        ):
    # Read the audio file as raw bytes and encode it as a base64 string
    with open(audio_file_path, 'rb') as f:
        audio_file_base64_string = base64.b64encode( f.read() ).decode('ASCII')
    response = requests.post( service_address,
                              data = json.dumps( {
                                  "audio_file_base64_string":audio_file_base64_string,
                                  "channel_id":channel_id,
                                  "sr":sr,
                                  "min_frequency":min_frequency,
                                  "spec_time_step":spec_time_step,
                                  "min_segment_length":min_segment_length,
                                  "eps":eps,
                                  "num_trials":num_trials,
                                  "adobe_audition_compatible":adobe_audition_compatible
                              } ),
                              headers = {"Content-Type": "application/json"}
                            )
    # The service replies with the segmentation result as JSON
    return response.json()
```

**Note (Important):** 
1. Running the above code requires no further dependencies and does not load any models.
2. The **service_address** is composed of **SEGMENTING_SERVER_IP_ADDRESS** + **":"** + **FLASK_PORT_NUMBER** + **"/segment"**. If the server is running on the local machine, the SEGMENTING_SERVER_IP_ADDRESS is "http://localhost"; otherwise, you need to know the IP address of the server machine.
3. **channel_id** is useful when the input audio file has multiple channels. In that case, channel_id specifies which channel to segment. By default channel_id = 0, which means the first channel is used for segmentation.
4. The right values for **sr, min_frequency, spec_time_step, min_segment_length, eps and num_trials** vary from dataset to dataset. For a detailed explanation of these parameters, please refer to [README.md#Illustration-of-segmentation-parameters](../README.md#Illustration-of-segmentation-parameters). Recommended parameter settings for some common species are available in the config file: [config/segment_config.json](../config/segment_config.json)
5. The parameter **adobe_audition_compatible** controls the format of the returned segmentation results. If adobe_audition_compatible=1, the returned segmentation result is a dictionary that is compatible with Adobe Audition: after converting the dictionary to a DataFrame and then to a csv file, the csv file can be loaded directly into Adobe Audition. If adobe_audition_compatible=0, the segmentation result is a simple dictionary containing only "onset", "offset" and "cluster".
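
The address composition in note 2 and the request body can also be sketched with the Python standard library alone, which may be convenient where `requests` is not installed. This is a minimal sketch, not the repository's client: the helper names `build_service_address`, `build_payload` and `post_json`, the separate `scheme` argument and the 60-second timeout are all assumptions made for illustration. The JSON keys are the ones used by `call_segment_service` above.

```python
import base64
import json
import urllib.request


def build_service_address(host, port, scheme="http"):
    """Compose the service URL as scheme://host:port/segment (hypothetical helper)."""
    return f"{scheme}://{host}:{port}/segment"


def build_payload(audio_bytes, **segment_params):
    """Build the JSON request body: base64-encoded audio plus the segmentation parameters."""
    payload = {
        "audio_file_base64_string": base64.b64encode(audio_bytes).decode("ascii")
    }
    payload.update(segment_params)
    return json.dumps(payload).encode("utf-8")


def post_json(service_address, body, timeout=60.0):
    """POST a JSON body and decode the JSON reply. The timeout value is an assumption."""
    request = urllib.request.Request(
        service_address,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses and
    # urllib.error.URLError when the server cannot be reached.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (requires a running service, so it is left commented out):
# with open("recording.wav", "rb") as f:   # "recording.wav" is a placeholder path
#     body = build_payload(f.read(), channel_id=0, sr=32000, min_frequency=0,
#                          spec_time_step=0.0025, min_segment_length=0.01,
#                          eps=0.02, num_trials=3, adobe_audition_compatible=0)
# prediction = post_json(build_service_address("localhost", 8050), body)
```

Keeping the payload construction separate from the network call makes it possible to inspect the request body without a running server.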

#### Get the Adobe Audition-compatible segmentation results

```python
prediction = call_segment_service( "http://localhost:8050/segment", 
                          "../data/example_subset/Zebra_finch/test_adults/zebra_finch_g17y2U-f00007.wav",  
                          channel_id = 0,
                          sr = 32000,
                          min_frequency = 0,
                          spec_time_step = 0.0025,
                          min_segment_length = 0.01,
                          eps = 0.02,
                          num_trials = 3,
                          adobe_audition_compatible = 1
                        )
# Convert the returned dictionary into a pandas DataFrame
df = pd.DataFrame(prediction)
df
```

| | Description | Duration | Start | Time Format | Type | Name |
|---|---|---|---|---|---|---|
| 0 | | 0:00.063 | 0:00.010 | decimal | Cue | |
| 1 | | 0:00.067 | 0:00.380 | decimal | Cue | |
| 2 | | 0:00.070 | 0:00.603 | decimal | Cue | |
| 3 | | 0:00.072 | 0:00.758 | decimal | Cue | |
| 4 | | 0:00.571 | 0:00.912 | decimal | Cue | |
| 5 | | 0:00.069 | 0:01.813 | decimal | Cue | |
| 6 | | 0:00.070 | 0:01.967 | decimal | Cue | |
| 7 | | 0:00.570 | 0:02.073 | decimal | Cue | |
| 8 | | 0:00.055 | 0:02.838 | decimal | Cue | |
| 9 | | 0:00.081 | 0:02.982 | decimal | Cue | |


We can save `df` as an Adobe Audition-compatible csv file (note: both `index = False` and `sep="\t"` are required):

```python
df.to_csv( "prediction_result.csv", index = False, sep="\t")
```

#### Get the simple segmentation results

```python
prediction = call_segment_service( "http://localhost:8050/segment", 
                          "../data/example_subset/Zebra_finch/test_adults/zebra_finch_g17y2U-f00007.wav",  
                          channel_id = 0,
                          sr = 32000,
                          min_frequency = 0,
                          spec_time_step = 0.0025,
                          min_segment_length = 0.01,
                          eps = 0.02,
                          num_trials = 3,
                          adobe_audition_compatible = 0
                        )
# Convert the returned dictionary into a pandas DataFrame
pd.DataFrame(prediction)
```

| | cluster | offset | onset |
|---|---|---|---|
| 0 | zebra_finch_0 | 0.073 | 0.01 |
| 1 | zebra_finch_0 | 0.447 | 0.38 |
| 2 | zebra_finch_0 | 0.673 | 0.603 |
| 3 | zebra_finch_0 | 0.83 | 0.758 |
| 4 | zebra_finch_0 | 1.483 | 0.912 |
| 5 | zebra_finch_0 | 1.882 | 1.813 |
| 6 | zebra_finch_0 | 2.037 | 1.967 |
| 7 | zebra_finch_0 | 2.643 | 2.073 |
| 8 | zebra_finch_0 | 2.893 | 2.838 |
| 9 | zebra_finch_0 | 3.063 | 2.982 |
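
The simple result can be post-processed without pandas. As a small sketch, the following computes the duration of each segment as offset minus onset. It assumes that the returned dictionary maps "onset", "offset" and "cluster" to equal-length lists, which is how the output above appears but is not confirmed here. The helper name `segment_durations` is hypothetical, and the sample values are the first three rows of the output above.

```python
def segment_durations(prediction):
    """Return the duration (in seconds) of each segment, rounded to milliseconds.

    Assumes prediction["onset"] and prediction["offset"] are equal-length
    lists of times in seconds (an assumption about the response layout).
    """
    onsets = prediction["onset"]
    offsets = prediction["offset"]
    if len(onsets) != len(offsets):
        raise ValueError("onset and offset lists must have the same length")
    return [round(offset - onset, 3) for onset, offset in zip(onsets, offsets)]


# First three segments from the example output above
example = {
    "onset": [0.01, 0.38, 0.603],
    "offset": [0.073, 0.447, 0.673],
    "cluster": ["zebra_finch_0"] * 3,
}
print(segment_durations(example))  # [0.063, 0.067, 0.07]
```

These values agree with the Duration column of the Adobe Audition-compatible output for the same recording.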


### Call the segmenting service in MATLAB

First, define a MATLAB function:

```matlab
function response = call_segment_service(service_address, audio_file_path, channel_id, sr, min_frequency, spec_time_step, min_segment_length, eps, num_trials, adobe_audition_compatible)

    % Read the audio file as raw bytes, then release the file handle
    fileID = fopen(audio_file_path, 'r');
    fileData = fread(fileID, inf, 'uint8=>uint8');
    fclose(fileID);

    audio_file_base64_string = matlab.net.base64encode( fileData );
    data = struct('audio_file_base64_string', audio_file_base64_string, ...
                  "channel_id", channel_id, ...
                  "sr", sr, ...
                  "min_frequency", min_frequency, ...
                  "spec_time_step", spec_time_step, ...
                  "min_segment_length", min_segment_length, ...
                  "eps", eps, ...
                  "num_trials", num_trials, ... 
                  "adobe_audition_compatible", adobe_audition_compatible );
    jsonData = jsonencode(data);

    options = weboptions( 'RequestMethod', 'POST', 'MediaType', 'application/json'  );
    response = webwrite(service_address, jsonData, options);

end
```

Then call the function from the MATLAB console:

```matlab
prediction = call_segment_service( 'http://localhost:8050/segment', '/Users/meilong/Downloads/zebra_finch_g17y2U-f00007.wav', 0, 32000, 0, 0.0025, 0.01, 0.02, 3, 0 )
disp(prediction)

```