# [Sensifai](https://sensifai.com) Music Genre Recognition
Sensifai offers one of the most accurate deep learning training platforms for building a music genre recognition system and incorporating it into your application. This product gives you access to Sensifai's advanced music genre recognition algorithm, which you can train with your own data and validate with any dataset you like. The resulting system recognizes the genre of music in audio files.

## Step 1: Preparing your data
Before starting the training job, you have to prepare your data and copy the files to S3.
- The *VALIDATION_SPILITER* parameter controls how validation data is obtained. If it is set to "1", the algorithm splits the training folder according to the *VALIDATION_PERCENT* parameter. If it is set to "0", you must provide a separate validation folder.

In the training folder (and the validation folder, if you have one), create a subfolder for each class and copy the audio files there. We recommend using audio files that satisfy the following conditions:
- The algorithm supports most common audio formats (mp3, wav, ...) and places no restriction on the audio type. However, if the format and sample rate of the test and training files do not match, the algorithm converts them. If the algorithm cannot detect a file's format, or if the file is corrupted, it skips the file. If more than 50% of the files in a folder are skipped, the training procedure exits with the failure code (0).
- The default sample rate is 22050 Hz and the default format is "au". There are no strict requirements on sample rate or file format, but a low sample rate may reduce training accuracy.
- The default shape limits the audio file length: files shorter than the default shape are padded to that length, and extra frames are removed from files that are longer.
- The model trains on a single GPU.
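
When *VALIDATION_SPILITER* is "0", the validation folder has to be built by hand. The following is a minimal sketch of one way to do that locally before uploading. `split_dataset` and its arguments are hypothetical names introduced for illustration; they are not part of Sensifai's algorithm, and the algorithm's own internal split may work differently.

```python
import random
import shutil
from pathlib import Path


def split_dataset(source_dir, train_dir, validation_dir, validation_percent=30, seed=42):
    """Copy a class-per-subfolder dataset into training and validation folders.

    Hypothetical helper: it performs a per-class random split, which may differ
    from what the algorithm does when VALIDATION_SPILITER is "1".
    """
    rng = random.Random(seed)
    counts = {}
    for class_dir in sorted(p for p in Path(source_dir).iterdir() if p.is_dir()):
        files = sorted(f for f in class_dir.iterdir() if f.is_file())
        rng.shuffle(files)
        # The first n_validation shuffled files go to validation, the rest to training.
        n_validation = len(files) * validation_percent // 100
        for index, audio_file in enumerate(files):
            target_root = validation_dir if index < n_validation else train_dir
            target = Path(target_root) / class_dir.name
            target.mkdir(parents=True, exist_ok=True)
            shutil.copy2(audio_file, target / audio_file.name)
        counts[class_dir.name] = (len(files) - n_validation, n_validation)
    return counts
```

The returned dictionary maps each class name to a `(training, validation)` file count, which makes it easy to confirm that every class ended up in both folders.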

## Uploading files to S3

In [1]:
import sagemaker as sage
import boto3
import time
from sagemaker import get_execution_role

role = get_execution_role()

bucket = "your bucket here"
prefix = "S3 prefix under which your data is stored"
sess = sage.Session()
s3_train="s3://{}/{}/training/".format(bucket,prefix)
s3_validation="s3://{}/{}/validation/".format(bucket,prefix)

# The data has already been transferred to S3. To upload local files instead, uncomment the code below.
# s3_train = sess.upload_data(train_data_dir, bucket, "{}/training".format(prefix))
# s3_validation = sess.upload_data(validation_data_dir, bucket, "{}/validation".format(prefix))

print("uploaded training data file to {}".format(s3_train))
print("uploaded validation data file to {}".format(s3_validation))

uploaded training data file to s3://sensifai-sagemaker-artifacts/algorithm-validation/MusicGenreTagging/training/
uploaded validation data file to s3://sensifai-sagemaker-artifacts/algorithm-validation/MusicGenreTagging/validation/




### Important note
- Currently, only the "File" input mode is supported. To ensure that there is enough space for transferring and preprocessing files, set the *VolumeSizeInGB* parameter in the *ResourceConfig* section to twice the size of your dataset in GB.
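
The sizing rule above can be computed from a local copy of the dataset. This is a rough sketch: `recommended_volume_size_gb` is a hypothetical helper, it assumes the dataset is available on local disk, it treats 1 GB as 1024³ bytes, and the 1 GB floor is chosen here for illustration only.

```python
import math
import os


def recommended_volume_size_gb(dataset_dir, factor=2, minimum_gb=1):
    """Return `factor` times the dataset size in GB, rounded up to a whole number."""
    total_bytes = 0
    for folder, _subfolders, files in os.walk(dataset_dir):
        for name in files:
            total_bytes += os.path.getsize(os.path.join(folder, name))
    size_gb = total_bytes / (1024 ** 3)
    # Round up so the volume is never smaller than the rule requires.
    return max(minimum_gb, math.ceil(size_gb * factor))
```

The result can be passed as the `VolumeSizeInGB` value when the training request is assembled.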

## Step 2: Create a model 
__Training Parameters__

| Name                | Description                                | Type    | Min Value | Max Value | IsTunable | IsRequired | DefaultValue |
|---------------------|--------------------------------------------|---------|-----------|-----------|-----------|------------|--------------|
| WINDOW_SIZE         | Size of window                             | Integer | 256       | 4096      | FALSE     | FALSE      | 2048         |
| N_MELS              | Number of Mel frequency bands              | Integer | 32        | 512       | FALSE     | FALSE      | 128          |
| SEED                | Random seed for splitting the data         | Integer | 1         | 100       | FALSE     | FALSE      | 42           |
| N_LAYERS            | Number of layers                           | Integer | 1         | 5         | FALSE     | FALSE      | 3            |
| FILTER_LENGTH       | Length of the filter                       | Integer | 1         | 10        | FALSE     | FALSE      | 5            |
| CONV_FILTER_COUNT   | Number of convolutional filters            | Integer | 32        | 1024      | FALSE     | FALSE      | 256          |
| LSTM_COUNT          | Number of LSTM                             | Integer | 128       | 512       | FALSE     | FALSE      | 256          |
| BATCH_SIZE          | Number of samples per batch                | Integer | 1         | 128       | FALSE     | FALSE      | 32           |
| EPOCH_COUNT         | Number of training epochs                  | Integer | 1         | 500       | FALSE     | FALSE      | 100          |
| DEFAULT_SHAPE       | Input frame length to pad or truncate to   | Integer | 500       | 1000      | FALSE     | FALSE      | 700          |
| VALIDATION_SPILITER | Whether to split the training data         | Boolean | 0         | 1         | FALSE     | FALSE      | 1            |
| VALIDATION_PERCENT  | Percentage of data used for validation     | Integer | 10        | 50        | FALSE     | FALSE      | 30           |

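
SageMaker takes hyperparameter values as strings, so it can help to check them against the ranges in the table before submitting a job. A minimal sketch follows. `HYPERPARAMETER_RANGES` and `build_hyperparameters` are hypothetical names, only the integer parameters from the table are covered, and the ranges are simply copied from the table rather than read from the algorithm itself.

```python
# (min, max, default) for the integer parameters, copied from the table above.
HYPERPARAMETER_RANGES = {
    "WINDOW_SIZE": (256, 4096, 2048),
    "N_MELS": (32, 512, 128),
    "SEED": (1, 100, 42),
    "N_LAYERS": (1, 5, 3),
    "FILTER_LENGTH": (1, 10, 5),
    "CONV_FILTER_COUNT": (32, 1024, 256),
    "LSTM_COUNT": (128, 512, 256),
    "BATCH_SIZE": (1, 128, 32),
    "EPOCH_COUNT": (1, 500, 100),
    "DEFAULT_SHAPE": (500, 1000, 700),
    "VALIDATION_PERCENT": (10, 50, 30),
}


def build_hyperparameters(**overrides):
    """Merge overrides into the defaults, range-check them, and stringify the values."""
    unknown = set(overrides) - set(HYPERPARAMETER_RANGES)
    if unknown:
        raise KeyError("Unknown hyperparameters: {}".format(sorted(unknown)))
    result = {}
    for name, (low, high, default) in HYPERPARAMETER_RANGES.items():
        value = int(overrides.get(name, default))
        if not low <= value <= high:
            raise ValueError("{}={} is outside [{}, {}]".format(name, value, low, high))
        result[name] = str(value)
    return result
```

For instance, `build_hyperparameters(EPOCH_COUNT=2)` returns the defaults with only the epoch count changed, while an out-of-range value raises a `ValueError` before any job is created.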
__Run a SageMaker training job__

This code will start a training job, wait for it to be done, and report its status.

In [None]:
%%time
alg_arn = "COPY ALGORITHM ARN HERE"
job_name_prefix = 'train-sensifai-music-tagging'
timestamp = time.strftime('-%Y-%m-%d-%H-%M-%S', time.gmtime())
job_name = job_name_prefix + timestamp

create_training_params = \
{
    "AlgorithmSpecification": {
        "AlgorithmName": alg_arn,  # an algorithm resource is referenced by name or ARN, not as a training image
        "TrainingInputMode": "File"
    },
    "RoleArn": role,
    "OutputDataConfig": {
        "S3OutputPath": 's3://{}/{}/output/{}'.format(bucket,prefix, job_name_prefix)
    },
    "ResourceConfig": {
        "InstanceCount": 1,
        "InstanceType": "ml.p2.xlarge",
        "VolumeSizeInGB": 40
    },
    "TrainingJobName": job_name,
    "StoppingCondition": {
        "MaxRuntimeInSeconds": 14400
    },
    "HyperParameters": {
        "WINDOW_SIZE": "2048",
        "N_MELS": "128",
        "SEED": "42",
        "N_LAYERS": "3",
        "FILTER_LENGTH": "5",
        "CONV_FILTER_COUNT": "256",
        "LSTM_COUNT": "256",
        "BATCH_SIZE": "32",
        "EPOCH_COUNT": "2",
        "DEFAULT_SHAPE": "700",
        "VALIDATION_SPILITER": "1",
        "VALIDATION_PERCENT": "30"  # integer percentage (10 to 50), per the parameter table
    },
    "InputDataConfig": [
        {
            "ChannelName": "training",
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": s3_train,
                    "S3DataDistributionType": "FullyReplicated"
                }
            },
            "ContentType": "",
            "CompressionType": "None"
        }
    ]
}

sagemaker = boto3.client(service_name='sagemaker')
sagemaker.create_training_job(**create_training_params)
status = sagemaker.describe_training_job(TrainingJobName=job_name)['TrainingJobStatus']
print('Training job current status: {}'.format(status))

try:
    sagemaker.get_waiter('training_job_completed_or_stopped').wait(TrainingJobName=job_name)
    job_info = sagemaker.describe_training_job(TrainingJobName=job_name)
    status = job_info['TrainingJobStatus']
    print("Training job ended with status: " + status)
except Exception:
    # The waiter raises if the job fails or the wait times out.
    print('Training did not complete successfully')
    job_info = sagemaker.describe_training_job(TrainingJobName=job_name)
    message = job_info.get('FailureReason', 'no reason reported')
    print('Training failed with the following error: {}'.format(message))

Training job current status: InProgress
Training job ended with status: Completed
CPU times: user 159 ms, sys: 17.3 ms, total: 176 ms
Wall time: 16min 1s


## Step 3: Create a SageMaker model
This will set up the model created during training within SageMaker to be used later for recognition.


In [4]:
timestamp = time.strftime('-%Y-%m-%d-%H-%M-%S', time.gmtime())
model_name="sensifai-music-genre" + timestamp
job_info = sagemaker.describe_training_job(TrainingJobName=job_name)
model_data = job_info['ModelArtifacts']['S3ModelArtifacts']

model_package_arn =  "Paste the model ARN"
model_creation = {
    "ModelName": model_name,
    "PrimaryContainer": {
        "ModelPackageName": model_package_arn
    },
    "ExecutionRoleArn":role,
    "EnableNetworkIsolation": True,
}

model = sagemaker.create_model(**model_creation)
sagemaker.describe_model(ModelName = model_name)


## Step 4: Inference with the trained model (batch transform)
Finally, the model is ready to serve: you can feed audio files to the model and save the results in the output folder.


In [18]:
%%time

s3_batch_input="s3://{}/{}/test/".format(bucket,prefix)
# The data has already been transferred to S3. To upload local files instead, uncomment the code below.
# s3_batch_input = sess.upload_data(batch_input_dir, bucket, "{}/test".format(prefix))
print("uploaded batch data files to {}".format(s3_batch_input))

timestamp = time.strftime('-%Y-%m-%d-%H-%M-%S', time.gmtime())
batch_job_name = "sensifai-music-tagging-bt" + timestamp
batch_output = 's3://{}/{}/output/{}'.format(bucket,prefix, batch_job_name)

request = \
{
  "TransformJobName": batch_job_name,
  "MaxConcurrentTransforms": 0,
  "MaxPayloadInMB": 0,
  "ModelName": model_name,
  "TransformInput": {
    "DataSource": {
      "S3DataSource": {
        "S3DataType": "S3Prefix",
        "S3Uri": s3_batch_input
      }
    },
    "ContentType": "audio/*",
    "CompressionType": "None",
    "SplitType": "None"
  },
  "TransformOutput": {
    "S3OutputPath": batch_output,
    "Accept": "application/json",
    "AssembleWith": "Line"
  },
  "TransformResources": {
    "InstanceType": "ml.p2.xlarge",
    "InstanceCount": 1
  }
}

sagemaker.create_transform_job(**request)

print("Created Transform job with name: ", batch_job_name)

while True:
    job_info = sagemaker.describe_transform_job(TransformJobName=batch_job_name)
    status = job_info['TransformJobStatus']
    if status == 'Completed':
        print("Transform job ended with status: " + status)
        break
    if status in ('Failed', 'Stopped'):
        # 'Stopped' is also a terminal state; without this check the loop would never exit.
        message = job_info.get('FailureReason', 'no reason reported')
        print('Transform job ended with status {}: {}'.format(status, message))
        raise Exception('Transform job did not complete')
    time.sleep(30)

uploaded batch data files to s3://sensifai-sagemaker-artifacts/algorithm-validation/MusicGenreTagging/test/
Created Transform job with name:  sensifai-music-tagging-bt-2018-11-19-13-41-08
Transform job ended with status: Completed
CPU times: user 146 ms, sys: 1.65 ms, total: 148 ms
Wall time: 5min 31s
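
The polling pattern used for the transform job can be wrapped in a reusable function with a timeout. The sketch below is one possible shape for it. `wait_for_job` is a hypothetical helper, and it takes any zero-argument callable that returns the job description, so it is not tied to a specific boto3 call.

```python
import time


def wait_for_job(describe, status_key, poll_seconds=30, timeout_seconds=3600):
    """Poll `describe()` until the job reaches a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        job_info = describe()
        status = job_info[status_key]
        if status == "Completed":
            return job_info
        if status in ("Failed", "Stopped"):
            reason = job_info.get("FailureReason", "no reason reported")
            raise RuntimeError("Job ended with status {}: {}".format(status, reason))
        if time.monotonic() >= deadline:
            raise TimeoutError("Job still {} after {} seconds".format(status, timeout_seconds))
        time.sleep(poll_seconds)
```

With the client from the earlier cells, this could be called as `wait_for_job(lambda: sagemaker.describe_transform_job(TransformJobName=batch_job_name), "TransformJobStatus")`.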



### Download the results

import os
import json
from pprint import pprint

output_path="./output"

if not os.path.exists(output_path):
    os.makedirs(output_path)
    
!aws s3 cp $batch_output $output_path --recursive

# Process the downloaded JSON result files as needed
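
Batch transform writes one output object per input file, named after the input with an `.out` suffix. As a starting point for processing them, the sketch below loads every downloaded `.out` file as JSON. `load_results` is a hypothetical helper; it assumes each file holds a single JSON document, and because the response schema is not described here, the parsed objects are returned unchanged.

```python
import json
from pathlib import Path


def load_results(output_dir):
    """Return {relative file name: parsed JSON} for every .out file under output_dir."""
    results = {}
    for path in sorted(Path(output_dir).rglob("*.out")):
        with path.open(encoding="utf-8") as handle:
            results[str(path.relative_to(output_dir))] = json.load(handle)
    return results
```

For example, `pprint(load_results(output_path))` prints whatever the model returned for each input file.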

## Step 5: Cleaning up

In [None]:
# Optionally, uncomment and run the line below to clean everything up
# sagemaker.delete_model(ModelName=model_name)