# [Sensifai](https://sensifai.com) Audio Event Classification
Sensifai offers one of the most accurate deep learning training platforms for training on your audio event data and incorporating the result into your application. This product gives you access to Sensifai's advanced audio event classification algorithm, which you can train with your own data and validate with any dataset you like. Our multi-label/single-label tagging system lets you find general tags in any audio file.

## Step 1: Preparing your data
Before starting the training job, you have to prepare your data and copy the files to S3. Generally, you should have two separate folders for the training and validation files, and two CSV files named train_samples.csv and validation_samples.csv, respectively.

- All of the audio data should be in the same format, for example, wav or mp3.
- The algorithm supports two audio formats: wav and mp3. If the algorithm cannot detect the audio format of a file, or if the file is corrupted, it skips the file. If the number of skipped files exceeds 50% of all files in the folder, the training procedure exits with a failure code.

- The standard audio duration is 10 seconds. For audio shorter than 10 seconds, the last frames are repeated to reach 10 seconds; for audio longer than 10 seconds, only the first 10 seconds are kept. Audio shorter than 5 seconds is omitted. The prediction is calculated over 1-second frames and averaged over the 10 seconds.
- The input CSV contains one row per audio file. The first field is the audio file name without its extension, and the second field is its tags. For example, if one of the sample audio files is "dogbark.wav", use "dogbark" instead of the original file name. The second field is a list of class names in square brackets, wrapped in double quotes. For example, in "[Croak,Frog]", Croak is the first class name and Frog is the second. For single-label classification, you don't need the double quotes. Some example rows:

570RD5v5HyE,"[Croak,Frog]" 

69C9cBKgsBI,[Lawn mower]

bRuJev7JLxA,[Chainsaw]

cYqfSDt2B_A,[Frog]

- The second field should match the examples exactly: class names are separated by commas and enclosed in square brackets. If a row has more than one class, wrap the field in double quotes.
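The row format described above can be produced with Python's standard `csv` module. This is a minimal sketch, not part of the Sensifai tooling: the helper names `make_row` and `write_samples_csv` are hypothetical, and it relies on the fact that `csv.QUOTE_MINIMAL` only quotes a field when it contains a comma, which happens to match the multi-label versus single-label rule.

```python
import csv
import io
import os


def make_row(file_name, labels):
    """Build one CSV row: file name without extension, then the bracketed label list."""
    stem = os.path.splitext(os.path.basename(file_name))[0]
    return [stem, "[" + ",".join(labels) + "]"]


def write_samples_csv(samples, stream):
    """Write (file_name, labels) pairs to an open text stream."""
    # QUOTE_MINIMAL quotes "[Croak,Frog]" (it contains a comma) but not "[Frog]".
    writer = csv.writer(stream, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    for file_name, labels in samples:
        writer.writerow(make_row(file_name, labels))


# Example rows taken from the list above; the .wav extension is an assumption.
samples = [
    ("570RD5v5HyE.wav", ["Croak", "Frog"]),
    ("69C9cBKgsBI.wav", ["Lawn mower"]),
]
buffer = io.StringIO()
write_samples_csv(samples, buffer)
print(buffer.getvalue())
```

To create the real file, pass a handle such as `open("train_samples.csv", "w", newline="")` instead of the in-memory buffer.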



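The 10-second duration rule from the list above can be illustrated on a plain list of samples. This is only a sketch of the stated behavior, not the algorithm's actual preprocessing code: the function name `fit_clip` is hypothetical, and it assumes that "repeating the last frames" means repeating the final one-second frame until the clip is long enough.

```python
TARGET_SECONDS = 10  # standard clip length stated above
MIN_SECONDS = 5      # clips shorter than this are omitted


def fit_clip(samples, sample_rate):
    """Return the samples fitted to 10 seconds, or None if the clip is too short."""
    target_len = TARGET_SECONDS * sample_rate
    if len(samples) < MIN_SECONDS * sample_rate:
        return None  # omitted from training
    if len(samples) >= target_len:
        return list(samples[:target_len])  # keep only the first 10 seconds
    # Assumption: pad by repeating the final one-second frame.
    last_frame = list(samples[-sample_rate:])
    padded = list(samples)
    while len(padded) < target_len:
        padded.extend(last_frame)
    return padded[:target_len]
```

Checking your own clips against these thresholds before uploading makes it easier to predict which files will be padded, truncated, or dropped.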
## Uploading files to S3

In [1]:
import sagemaker as sage
import boto3
import time
from sagemaker import get_execution_role

role = get_execution_role()

bucket = "your bucket here"
prefix = "prefix on s3 that the test files are stored"
sess = sage.Session()
s3_train="s3://{}/{}/train/".format(bucket,prefix)
s3_validation="s3://{}/{}/validation/".format(bucket,prefix)

# The data has already been transferred to S3. To upload local files instead, uncomment the lines below.
#s3_train = sess.upload_data(train_data_dir, bucket, "{}/train".format(prefix))
#s3_validation = sess.upload_data(validation_data_dir, bucket, "{}/validation".format(prefix))

print("uploaded training data file to {}".format(s3_train))
print("uploaded validation data file to {}".format(s3_validation))

uploaded training data file to s3://sensifai-sagemaker-artifacts/algorithm-validation/audio-event-recognition/train/
uploaded validation data file to s3://sensifai-sagemaker-artifacts/algorithm-validation/audio-event-recognition/validation/



### Important notes:
- Currently, only "File" mode is supported as the input mode. To ensure that there is enough space for transferring and preprocessing the files, set the *VolumeSizeInGB* parameter in the *ResourceConfig* section to at least twice the size of the dataset in GB.

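The volume sizing rule above can be computed from a local copy of the dataset. This is a minimal sketch: the helper names are hypothetical, it assumes the dataset is available in a local directory, and it treats 1 GB as 1024³ bytes.

```python
import math
import os

GIB = 1024 ** 3  # bytes per GB (assumed to mean GiB here)


def dataset_size_bytes(data_dir):
    """Sum the sizes of all files under data_dir."""
    total = 0
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def recommended_volume_size_gb(total_bytes):
    """Return twice the dataset size, rounded up to a whole GB (at least 1)."""
    return max(1, math.ceil(2 * total_bytes / GIB))
```

For example, a 3 GB dataset gives `recommended_volume_size_gb(3 * GIB) == 6`, which is the value to pass as `VolumeSizeInGB`.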
## Step 2: Create a model 
__Training Parameters__


| Name                  | Description                                                                                            | Type       | IsTunable | IsRequired | DefaultValue | Range         |
|-----------------------|--------------------------------------------------------------------------------------------------------|------------|-----------|------------|--------------|---------------|
| data_type             | Audio format: 0 is mp3, 1 is wav                                                                       | Integer    | false     | false      | 1            | [0,1]         |
| num_gpus              | Number of GPUs to use for training                                                                     | Integer    | false     | false      | 2            | [1,8]         |
| num_classes           | Total number of classes                                                                                | Integer    | false     | true       | 3            | [2,527]       |
| initial_learning_rate | Initial learning rate                                                                                  | Continuous | false     | false      | 0.0001       | [0.00001,0.1] |
| multilabel_flag       | 1 is multi-label, 0 is single-label                                                                    | Integer    | false     | false      | 1            | [0,1]         |
| lr_patience           | Patience of the learning rate scheduler                                                                | Integer    | false     | false      | 5            | [1,100]       |
| max_patience          | Terminate training after the validation loss stays greater than the training loss for this many epochs | Integer    | false     | false      | 10           | [1,100]       |
| num_epochs            | Total number of training epochs                                                                        | Integer    | false     | false      | 10           | [1,100]       |
| weigghted_loss_flag   | 1 enables the weighted loss, 0 disables it                                                             | Integer    | false     | false      | 1            | [0,1]         |
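Before submitting a job, it can help to check hyperparameter values against the ranges in the table. This is a sketch only: `PARAMETER_RANGES` and `build_hyperparameters` are hypothetical names, the ranges are copied from the table, and names that are not in the table are passed through unchecked. Values are converted to strings because the SageMaker `HyperParameters` field maps strings to strings.

```python
# name -> (type, minimum, maximum), copied from the Training Parameters table
PARAMETER_RANGES = {
    "data_type": (int, 0, 1),
    "num_gpus": (int, 1, 8),
    "num_classes": (int, 2, 527),
    "initial_learning_rate": (float, 0.00001, 0.1),
    "multilabel_flag": (int, 0, 1),
    "lr_patience": (int, 1, 100),
    "max_patience": (int, 1, 100),
    "num_epochs": (int, 1, 100),
    "weigghted_loss_flag": (int, 0, 1),  # spelled exactly as in the table
}


def build_hyperparameters(values):
    """Validate values against the table and return a str -> str dictionary."""
    if "num_classes" not in values:
        raise ValueError("num_classes is required")
    hyperparameters = {}
    for name, value in values.items():
        if name in PARAMETER_RANGES:
            kind, low, high = PARAMETER_RANGES[name]
            value = kind(value)
            if not low <= value <= high:
                raise ValueError(
                    "{} = {} is outside [{}, {}]".format(name, value, low, high)
                )
        # Names missing from the table are passed through without a range check.
        hyperparameters[name] = str(value)
    return hyperparameters
```

The returned dictionary can be used directly as the `"HyperParameters"` entry of the training request.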

__Run a SageMaker training job__

This code will start a training job, wait for it to be done, and report its status.

In [2]:
%%time

alg_arn="Paste the algorithm ARN"
job_name_prefix = 'train-sensifai-audio-tagging'
timestamp = time.strftime('-%Y-%m-%d-%H-%M-%S', time.gmtime())
job_name = job_name_prefix + timestamp

create_training_params = \
{
    "AlgorithmSpecification": {
        "AlgorithmName": alg_arn,  # an algorithm ARN goes in AlgorithmName, not TrainingImage
        "TrainingInputMode": "File"
    },
    "RoleArn": role,
    "EnableNetworkIsolation":True,
    "OutputDataConfig": {
        "S3OutputPath": 's3://{}/{}/output/{}'.format(bucket,prefix, job_name_prefix)
    },
    "ResourceConfig": {
        "InstanceCount": 1,
        "InstanceType": "ml.p2.8xlarge",
        "VolumeSizeInGB": 20
    },
    "TrainingJobName": job_name,
    "StoppingCondition": {
        "MaxRuntimeInSeconds": 14400
    },
    "HyperParameters": {
        "data_type": "1",
        "multilabel_flag": "1",
        "max_patience": "10",
        "weigghted_loss_flag": "1",
        "batch_size": "360",
        "initial_learning_rate": "0.0001",
        "num_epochs": "5",
        "num_classes": "4"
    },
    "InputDataConfig": [
        {
            "ChannelName": "train",
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": s3_train,
                    "S3DataDistributionType": "FullyReplicated"
                }
            },
            "ContentType": "",
            "CompressionType": "None"
        },
        {
            "ChannelName": "validation",
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": s3_validation,
                    "S3DataDistributionType": "FullyReplicated"
                }
            },
            "ContentType": "",
            "CompressionType": "None"
        }
    ]
}

sagemaker = boto3.client(service_name='sagemaker')
sagemaker.create_training_job(**create_training_params)
status = sagemaker.describe_training_job(TrainingJobName=job_name)['TrainingJobStatus']
print('Training job current status: {}'.format(status))

try:
    sagemaker.get_waiter('training_job_completed_or_stopped').wait(TrainingJobName=job_name)
    job_info = sagemaker.describe_training_job(TrainingJobName=job_name)
    status = job_info['TrainingJobStatus']
    print("Training job ended with status: " + status)
except Exception:  # the waiter raises if the job fails or the wait times out
    print('Training failed to start')
    message = sagemaker.describe_training_job(TrainingJobName=job_name)['FailureReason']
    print('Training failed with the following error: {}'.format(message))

Training job current status: InProgress
Training failed to start
Training failed with the following error: AlgorithmError: Exit Code: 255
CPU times: user 121 ms, sys: 17.4 ms, total: 138 ms
Wall time: 8min


## Step 3: Create a SageMaker model
This will set up the model created during training within SageMaker to be used later for recognition.


In [19]:
timestamp = time.strftime('-%Y-%m-%d-%H-%M-%S', time.gmtime())
model_name="sensifai-audio-tagging" + timestamp
job_info = sagemaker.describe_training_job(TrainingJobName=job_name)
model_data = job_info['ModelArtifacts']['S3ModelArtifacts']
model_package_arn =  "Paste the model ARN"
model_creation = {
    "ModelName": model_name,
    "PrimaryContainer": {
        "ModelPackageName": model_package_arn
    },
    "ExecutionRoleArn":role
} 

## For Marketplace products, Network isolation flag must be set to true
model_creation['EnableNetworkIsolation'] = True

model = sagemaker.create_model(**model_creation)

{'ModelArn': 'arn:aws:sagemaker:us-east-2:320478615219:model/sensifai-audio-tagging-2019-02-24-22-00-42', 'ResponseMetadata': {'RequestId': '56855123-f452-4939-857b-4341c484e12d', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '56855123-f452-4939-857b-4341c484e12d', 'content-type': 'application/x-amz-json-1.1', 'content-length': '104', 'date': 'Sun, 24 Feb 2019 22:00:42 GMT'}, 'RetryAttempts': 0}}


## Step 4: Inference with the trained model (Batch Transform)
Finally, the model is ready to serve: you can feed audio files to the model and save the results in the output folder.


In [20]:
%%time

s3_batch_input="s3://{}/{}/test/".format(bucket,prefix)
# The data has already been transferred to S3. To upload local files instead, uncomment the line below.
# s3_batch_input = sess.upload_data(batch_input_dir, bucket, "{}/test".format(prefix))
print("uploaded batch data files to {}".format(s3_batch_input))

timestamp = time.strftime('-%Y-%m-%d-%H-%M-%S', time.gmtime())
batch_job_name = "sensifai-audio-tagging-bt" + timestamp
batch_output = 's3://{}/{}/output/{}'.format(bucket,prefix, batch_job_name)

request = \
{
  "TransformJobName": batch_job_name,
  "MaxConcurrentTransforms": 0,
  "MaxPayloadInMB": 0,
  "ModelName": model_name,
  "TransformInput": {
    "DataSource": {
      "S3DataSource": {
        "S3DataType": "S3Prefix",
        "S3Uri": s3_batch_input
      }
    },
    "ContentType": "audio/*",
    "CompressionType": "None",
    "SplitType": "None"
  },
  "TransformOutput": {
    "S3OutputPath": batch_output,
    "Accept": "application/json",
    "AssembleWith": "Line"
  },
  "TransformResources": {
    "InstanceType": "ml.p2.xlarge",
    "InstanceCount": 1
  }
}

sagemaker.create_transform_job(**request)

print("Created Transform job with name: ", batch_job_name)

while True:
    job_info = sagemaker.describe_transform_job(TransformJobName=batch_job_name)
    status = job_info['TransformJobStatus']
    if status == 'Completed':
        print("Transform job ended with status: " + status)
        break
    # Also exit on 'Stopped'; otherwise a stopped job would be polled forever
    if status in ('Failed', 'Stopped'):
        message = job_info.get('FailureReason', 'no failure reason reported')
        print('Transform job ended with status {}: {}'.format(status, message))
        raise Exception('Transform job did not complete')
    time.sleep(30)

uploaded batch data files to s3://sensifai-sagemaker-artifacts/algorithm-validation/audio-event-recognition/test/
Created Transform job with name:  sensifai-audio-tagging-bt-2019-02-24-22-02-03
Transform job ended with status: Completed
CPU times: user 185 ms, sys: 3.73 ms, total: 189 ms
Wall time: 6min 1s



### Download the results

import os
import json
from pprint import pprint

output_path="./output"

if not os.path.exists(output_path):
    os.makedirs(output_path)
    
!aws s3 cp $batch_output $output_path --recursive

# Process the downloaded JSON files as needed
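One possible way to process the downloaded results is sketched below. It assumes that each output file carries the `.out` suffix that SageMaker batch transform appends to the input file name and that its content is a single JSON document; the schema of that document is not described here, so the sketch only parses it. The function name `load_results` is hypothetical.

```python
import json
import os


def load_results(output_dir):
    """Parse every *.out file under output_dir and return {relative_path: content}."""
    results = {}
    for root, _dirs, files in os.walk(output_dir):
        for name in sorted(files):
            if not name.endswith(".out"):
                continue
            path = os.path.join(root, name)
            with open(path) as handle:
                text = handle.read().strip()
            key = os.path.relpath(path, output_dir)
            try:
                results[key] = json.loads(text)
            except ValueError:
                results[key] = text  # keep the raw text if it is not valid JSON
    return results
```

Calling `pprint(load_results(output_path))` then prints the parsed result for each audio file.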

## Step 5: Cleaning up

In [None]:
# optionally uncomment and run the code to clean everything up
#sagemaker.delete_model(ModelName= model_name)