# [Sensifai](https://sensifai.com) Sport Recognition
Sensifai offers one of the most accurate deep learning training platforms for building a sport recognition system and incorporating it into your application. This product gives you access to Sensifai's advanced sport recognition algorithm, which you can train with your own data and validate with any dataset you like. The sport recognition system lets you recognize actions and activities in both short and long videos.

## Step 1: Preparing your data
Before starting the training job, you have to prepare your data and copy the files to S3. Generally, it is better to have two separate folders for the training and validation files. If you do not provide a separate validation folder, the algorithm splits the data based on the *train_percent* and *val_percent* parameters.
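
The following is a minimal sketch of how a fraction-based split like the one described above could work. It is not Sensifai's implementation: the function name `split_files`, the fixed seed, and the rounding rule are assumptions made for illustration only.

```python
import random


def split_files(files, train_percent=0.8, val_percent=0.2, seed=0):
    """Shuffle the files and split them into training and validation lists.

    Hypothetical helper: the real algorithm may shuffle and round differently.
    """
    if not 0 <= train_percent <= 1 or not 0 <= val_percent <= 1:
        raise ValueError("fractions must be between 0 and 1")
    if train_percent + val_percent > 1:
        raise ValueError("train_percent + val_percent must not exceed 1")
    shuffled = sorted(files)
    random.Random(seed).shuffle(shuffled)  # deterministic shuffle for a given seed
    n_train = int(round(len(shuffled) * train_percent))
    n_val = int(round(len(shuffled) * val_percent))
    n_val = min(n_val, len(shuffled) - n_train)  # never take more files than remain
    return shuffled[:n_train], shuffled[n_train:n_train + n_val]


train_files, val_files = split_files(["clip_{:02d}.mp4".format(i) for i in range(10)])
print(len(train_files), len(val_files))
```

Providing your own validation folder avoids relying on a random split and keeps the validation set fixed between runs.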

In the training folder (and also the validation folder), create a subfolder for each class and copy the video files there. We recommend using videos that satisfy the following conditions:
- The algorithm supports most common video formats (mp4, avi, ...) and we do not restrict the video type. However, if the algorithm cannot detect a file's video format, or if the file is corrupted, it skips that file. If the number of skipped files exceeds 50% of all files in the folder, the training procedure exits with a failure code.
- We do not impose strict conditions on resolution. However, very low-resolution videos (lower than 280p) may hurt training accuracy, and very high-resolution videos take longer to transfer and preprocess. 360p and 480p videos are ideal.
- The appropriate video duration depends on the type of action you want to train. The algorithm chooses *num_samples_per_video* random samples (configurable in the training parameters) from each video file. Keep in mind that if a video is too long, irrelevant data may be fed to the algorithm and mislead it. Videos between 10 seconds and 1 minute are ideal.
- It is better to have a folder named **other** that contains all videos that do not belong to any class.
- A balanced dataset is important for reaching the best results.
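
Before uploading, it can help to check the folder layout and class balance locally. The sketch below counts the video files in each class subfolder. The helper names and the extension list in `VIDEO_EXTENSIONS` are assumptions for illustration; the algorithm itself detects formats on its own and may accept more types.

```python
import os
from collections import Counter

# Assumed list of extensions, used only for this local check
VIDEO_EXTENSIONS = {".mp4", ".avi", ".mov", ".mkv"}


def count_videos_per_class(data_dir):
    """Return a Counter mapping each class subfolder to its number of video files."""
    counts = Counter()
    for class_name in sorted(os.listdir(data_dir)):
        class_dir = os.path.join(data_dir, class_name)
        if not os.path.isdir(class_dir):
            continue  # ignore stray files next to the class folders
        counts[class_name] = sum(
            1 for name in os.listdir(class_dir)
            if os.path.splitext(name)[1].lower() in VIDEO_EXTENSIONS
        )
    return counts


def imbalance_ratio(counts):
    """Largest class size divided by the smallest (1.0 means perfectly balanced)."""
    sizes = [n for n in counts.values() if n > 0]
    return max(sizes) / min(sizes) if sizes else float("nan")
```

For example, `imbalance_ratio(count_videos_per_class("train"))` on a local `train` folder gives a rough indication of how far the dataset is from balanced.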

## Uploading files to S3

In [None]:
import sagemaker as sage
import boto3
import time
from sagemaker import get_execution_role

role = get_execution_role()

bucket = "your bucket here"
prefix = "prefix on s3 that the test files are stored"

sess = sage.Session()
s3_train="s3://{}/{}/train/".format(bucket,prefix)
s3_validation="s3://{}/{}/validation/".format(bucket,prefix)

# The data has already been transferred to S3. To upload the files from local folders instead,
# set train_data_dir and validation_data_dir to those folders and uncomment the code below.
#s3_train = sess.upload_data(train_data_dir, bucket, "{}/train".format(prefix))
#s3_validation = sess.upload_data(validation_data_dir, bucket, "{}/validation".format(prefix))

print("uploaded training data file to {}".format(s3_train))
print("uploaded validation data file to {}".format(s3_validation))


### Important note:
- Currently, only the "File" input mode is supported. To ensure there is enough space for transferring and preprocessing the files, set the *VolumeSizeInGB* parameter in the *ResourceConfig* section to 2 × *size_of_dataset_in_GB*.
- Generally, the algorithm needs to train for at least 20 epochs to achieve sufficient accuracy.
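
As a rough sketch, the volume size can be derived from the dataset size as follows. The function names are hypothetical, and the floor of 1 GB is an assumption added so that the result is always a valid positive integer. The S3 helper uses the boto3 `list_objects_v2` paginator and needs AWS credentials to run.

```python
import math


def recommended_volume_size_gb(dataset_size_bytes, factor=2, minimum_gb=1):
    """Return factor x dataset size, rounded up to whole gigabytes."""
    dataset_gb = dataset_size_bytes / (1024 ** 3)
    return max(minimum_gb, int(math.ceil(dataset_gb * factor)))


def s3_prefix_size_bytes(bucket, prefix):
    """Sum the sizes of all objects under an S3 prefix (requires AWS credentials)."""
    import boto3  # imported here so the sizing helper above works without boto3

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    total = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            total += obj["Size"]
    return total


print(recommended_volume_size_gb(15 * 1024 ** 3))  # 30
```

The result can be passed as the `VolumeSizeInGB` value in the `ResourceConfig` section of the training request.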


## Step 2: Create a model 
__Training Parameters__



| Name | Description | Type | Min Value | Max Value | IsTunable | IsRequired | DefaultValue |
|------|-------------|------|-----------|-----------|-----------|------------|--------------|
| train_percent | Fraction of the data used for training | Continuous | 0 | 1 | False | False | 0.8 |
| val_percent | Fraction of the data used for validation | Continuous | 0 | 1 | False | False | 0.2 |
| learning_rate | Initial learning rate | Continuous | 1E-06 | 0.1 | False | False | 0.001 |
| momentum | Momentum | Continuous | 0 | 0.9 | False | False | 0.9 |
| batch_size | Batch size (if set to 0, the batch size is chosen automatically based on GPU memory) | Integer | 0 | 500 | False | True | 0 |
| lr_patience | Patience of the learning-rate scheduler | Integer | 1 | 100 | False | False | 5 |
| max_patience | Terminate training after the validation loss has been greater than the training loss for this number of epochs | Integer | 1 | 500 | False | False | 10 |
| num_epochs | Total number of training epochs | Integer | 1 | 1000 | False | False | 30 |
| num_samples_per_video | Number of samples taken from each video for training | Integer | 1 | 100 | False | False | 3 |
| score_result_threshold | Only results whose score exceeds this threshold are shown for each timestamp in the inference JSON file | Continuous | 0 | 1 | False | False | 0.5 |
| num_result_tags | Number of tags (top n tags) shown for each timestamp in the inference JSON file | Integer | 1 | #classes | False | False | 5 |
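
The parameters in the table are passed through the `HyperParameters` field of the training request, which SageMaker expects as a map of string keys to string values. The sketch below builds such a map and checks each value against the ranges listed in the table. The helper `build_hyperparameters` and the `PARAM_RANGES` dictionary are hypothetical names introduced for this example.

```python
# Ranges copied from the training-parameter table.
# num_result_tags has no fixed maximum (it is bounded by the number of classes).
PARAM_RANGES = {
    "train_percent": (0, 1),
    "val_percent": (0, 1),
    "learning_rate": (1e-06, 0.1),
    "momentum": (0, 0.9),
    "batch_size": (0, 500),
    "lr_patience": (1, 100),
    "max_patience": (1, 500),
    "num_epochs": (1, 1000),
    "num_samples_per_video": (1, 100),
    "score_result_threshold": (0, 1),
    "num_result_tags": (1, None),
}


def build_hyperparameters(**overrides):
    """Validate the given values and convert them to the strings SageMaker expects."""
    hyperparameters = {}
    for name, value in overrides.items():
        if name not in PARAM_RANGES:
            raise KeyError("unknown training parameter: {}".format(name))
        low, high = PARAM_RANGES[name]
        if value < low or (high is not None and value > high):
            raise ValueError("{}={} is outside the allowed range".format(name, value))
        hyperparameters[name] = str(value)
    return hyperparameters


hyperparameters = build_hyperparameters(batch_size=0, num_epochs=30, learning_rate=0.001)
print(hyperparameters)
```

Because the table marks *batch_size* as required, it is included explicitly here, even though 0 is also its default value.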


__Run a SageMaker training job__

This code will start a training job, wait for it to be done, and report its status.

In [None]:
%%time

alg_arn = "COPY ALGORITHM ARN HERE"
job_name_prefix = 'sensifai-sport-rec-train'
timestamp = time.strftime('-%Y-%m-%d-%H-%M-%S', time.gmtime())
job_name = job_name_prefix + timestamp

create_training_params = \
{
    "AlgorithmSpecification": {
        # An algorithm ARN is passed as AlgorithmName; TrainingImage expects a Docker image path
        "AlgorithmName": alg_arn,
        "TrainingInputMode": "File"
    },
    "RoleArn": role,
    "OutputDataConfig": {
        "S3OutputPath": 's3://{}/{}/{}/output'.format(bucket,prefix, job_name_prefix)
    },
    "ResourceConfig": {
        "InstanceCount": 1,
        "InstanceType": "ml.p2.8xlarge",
        "VolumeSizeInGB": 40
    },
    "TrainingJobName": job_name,
    "StoppingCondition": {
        "MaxRuntimeInSeconds": 14400
    },
    "HyperParameters": {
   
    },
    "InputDataConfig": [
        {
            "ChannelName": "train",
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": s3_train,
                    "S3DataDistributionType": "FullyReplicated"
                }
            },
            "ContentType": "",
            "CompressionType": "None"
        },
         {
            "ChannelName": "validation",
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": s3_validation,
                    "S3DataDistributionType": "FullyReplicated"
                }
            },
            "ContentType": "",
            "CompressionType": "None"
        }
    ]
}

sagemaker = boto3.client(service_name='sagemaker')
sagemaker.create_training_job(**create_training_params)
status = sagemaker.describe_training_job(TrainingJobName=job_name)['TrainingJobStatus']
print('Training job current status: {}'.format(status))

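from botocore.exceptions import WaiterError

try:
    # The waiter raises WaiterError if the job fails or the wait times out
    sagemaker.get_waiter('training_job_completed_or_stopped').wait(TrainingJobName=job_name)
    job_info = sagemaker.describe_training_job(TrainingJobName=job_name)
    status = job_info['TrainingJobStatus']
    print("Training job ended with status: " + status)
except WaiterError:
    job_info = sagemaker.describe_training_job(TrainingJobName=job_name)
    print('Training job current status: {}'.format(job_info['TrainingJobStatus']))
    # FailureReason is only present when the job has failed
    message = job_info.get('FailureReason', 'no failure reason reported')
    print('Training failed with the following error: {}'.format(message))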

## Step 3: Create a SageMaker model
This will set up the model created during training within SageMaker to be used later for recognition.


In [None]:
timestamp = time.strftime('-%Y-%m-%d-%H-%M-%S', time.gmtime())
model_name="sensifai-sport-recognition" + timestamp
job_info = sagemaker.describe_training_job(TrainingJobName=job_name)
model_data = job_info['ModelArtifacts']['S3ModelArtifacts']

model_package_arn =  "Paste the model ARN"
model_creation = {
    "ModelName": model_name,
    "PrimaryContainer": {
        "ModelPackageName": model_package_arn
    },
    "ExecutionRoleArn":role,
    "EnableNetworkIsolation": True,
}

model = sagemaker.create_model(**model_creation)
sagemaker.describe_model(ModelName = model_name)

## Step 4: Inference with the trained model (batch transform)
Finally, the model is ready to serve. You can feed videos to the model and save the results in the output folder.


In [None]:
%%time

s3_batch_input="s3://{}/{}/test/".format(bucket,prefix)
# The data has already been transferred to S3. To upload the files from a local folder instead,
# set batch_input_dir to that folder and uncomment the code below.
# s3_batch_input = sess.upload_data(batch_input_dir, bucket, "{}/test".format(prefix))
print("uploaded batch data files to {}".format(s3_batch_input))

timestamp = time.strftime('-%Y-%m-%d-%H-%M-%S', time.gmtime())
batch_job_name = "sensifai-sport-rec-bt" + timestamp
batch_output = 's3://{}/{}/{}/output'.format(bucket,prefix, batch_job_name)

request = \
{
  "TransformJobName": batch_job_name,
  "ModelName": model_name,
  "TransformInput": {
    "DataSource": {
      "S3DataSource": {
        "S3DataType": "S3Prefix",
        "S3Uri": s3_batch_input
      }
    },
    "ContentType": "video/mp4",
    "CompressionType": "None",
    "SplitType": "None"
  },
  "TransformOutput": {
    "S3OutputPath": batch_output,
    "Accept": "application/json",
    "AssembleWith": "Line"
  },
  "TransformResources": {
    "InstanceType": "ml.p2.xlarge",
    "InstanceCount": 1
  }
}

sagemaker.create_transform_job(**request)

print("Created Transform job with name: ", batch_job_name)

while True:
    job_info = sagemaker.describe_transform_job(TransformJobName=batch_job_name)
    status = job_info['TransformJobStatus']
    # Also stop polling if the job was stopped, otherwise the loop would never end
    if status in ('Completed', 'Stopped'):
        print("Transform job ended with status: " + status)
        break
    if status == 'Failed':
        message = job_info.get('FailureReason', 'no failure reason reported')
        print('Transform failed with the following error: {}'.format(message))
        raise Exception('Transform job failed')
    time.sleep(30)


### Download the results

In [None]:
import os
import json
from pprint import pprint

output_path="./output"

if not os.path.exists(output_path):
    os.makedirs(output_path)
    
!aws s3 cp $batch_output $output_path --recursive

# Process the downloaded JSON result files as needed
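
As a starting point for processing the results, the sketch below loads every downloaded result file into a dictionary. Batch transform names each output file after its input file with an `.out` suffix. The helper name `load_results` is hypothetical, and the sketch assumes that each file holds a single JSON document; it makes no assumption about the fields inside.

```python
import glob
import json
import os


def load_results(output_dir="./output"):
    """Load every *.out file under output_dir, keyed by its path relative to output_dir."""
    results = {}
    pattern = os.path.join(output_dir, "**", "*.out")
    for path in sorted(glob.glob(pattern, recursive=True)):
        with open(path) as handle:
            # Assumption: one JSON document per result file
            results[os.path.relpath(path, output_dir)] = json.load(handle)
    return results


results = load_results("./output")
print("loaded {} result files".format(len(results)))
```

From here, `pprint(results)` is a quick way to inspect the structure of the inference output before writing any further processing.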

## Step 5: Cleaning up

In [None]:
# optionally uncomment and run the code to clean everything up
#sagemaker.delete_model(ModelName= model_name)