# Sample submission

Assuming you have looked at our other notebooks, you should be familiar with extracting training samples from focal recordings, training a model, and making predictions on soundscape data. The only missing step is a valid submission to enter the leaderboard. In this tutorial, we will introduce the very basics of running inference on a hidden test set.

The first thing we need to do is look for test soundscapes. **The hidden test set will only appear if you submit the notebook.** Until then, you can use the training soundscapes to test your workflow.

In [None]:
import os
import pandas as pd

# First, get a list of soundscape files to process.
# We'll use the test_soundscapes directory if it contains "ogg" files
# (which it only does when submitting the notebook),
# otherwise we'll use the train_soundscapes folder to make predictions.
def list_files(path):
    return [os.path.join(path, f) for f in os.listdir(path) if f.rsplit('.', 1)[-1] in ['ogg']]
test_audio = list_files('../input/birdclef-2021/test_soundscapes')
if len(test_audio) == 0:
    test_audio = list_files('../input/birdclef-2021/train_soundscapes')
    
print('{} FILES IN TEST SET.'.format(len(test_audio)))

Filenames already include all the data that we need to make a valid submission (test and train soundscapes have the same naming scheme). 

Take a look at this example:

In [None]:
path = test_audio[0]
data = path.split(os.sep)[-1].rsplit('.', 1)[0].split('_')

print('FILEPATH:', path)
print('ID: {}, SITE: {}, DATE: {}'.format(data[0], data[1], data[2]))
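
The same parsing could be wrapped in a small helper so it can be reused later. This is only a sketch: the function name `parse_soundscape_name` and the example filename are hypothetical, and it assumes the `ID_SITE_DATE` naming scheme shown above.

```python
import os

def parse_soundscape_name(path):
    """Split a soundscape path into (file_id, site, date).

    Assumes the file is named '<id>_<site>_<date>.<ext>'.
    """
    stem = os.path.splitext(os.path.basename(path))[0]
    parts = stem.split('_')
    if len(parts) < 3:
        raise ValueError('Unexpected filename: {}'.format(path))
    return parts[0], parts[1], parts[2]

# Hypothetical filename, used only for illustration
file_id, site, date = parse_soundscape_name('some/folder/12345_ABC_20210101.ogg')
print('ID: {}, SITE: {}, DATE: {}'.format(file_id, site, date))
```

Raising an error on unexpected names is a design choice here; it makes a malformed file visible early instead of producing a broken row_id later.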

We could load the *test.csv* as well, but for now, there's no benefit in doing so. However, the *test.csv* might come in handy when confirming that your generated row_id actually exists in the test data.
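
If you do want that check, one possible approach is to compare your generated ids against the ids listed in *test.csv*. The sketch below assumes that *test.csv* has a `row_id` column; the helper name `find_unknown_row_ids` is hypothetical, and the demo uses small made-up lists instead of the real file.

```python
def find_unknown_row_ids(generated_ids, valid_ids):
    """Return the generated row_ids that do not occur in valid_ids, in order."""
    valid = set(valid_ids)
    return [row_id for row_id in generated_ids if row_id not in valid]

# Made-up ids for illustration; in the notebook, valid_ids could come from
# the row_id column of test.csv (for example, loaded with pd.read_csv).
valid_ids = ['12345_ABC_5', '12345_ABC_10', '12345_ABC_15']
generated_ids = ['12345_ABC_5', '12345_ABC_10', '12345_ABC_20']

unknown = find_unknown_row_ids(generated_ids, valid_ids)
print('{} UNKNOWN ROW IDS: {}'.format(len(unknown), unknown))
```

An empty result would mean that every generated row_id was found in the reference list.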

Let's make some predictions. In this tutorial, we'll simply use mock predictions, so that we can focus on the submission process.

In [None]:
# This is where we will store our results
pred = {'row_id': [], 'birds': []}

# Analyze each soundscape recording
for path in test_audio:
    
    # Open file with Librosa
    
    # Split file into 5-second chunks
    
    # Extract spectrogram for each chunk
    
    # Predict on spectrogram
    
    # Get row_id and birds and store result
    # (maybe using a post-filter based on location)
    
    # The above steps are just placeholders; we will use mock predictions.
    # Our "model" will predict "nocall" for each spectrogram.
    fileinfo = path.split(os.sep)[-1].rsplit('.', 1)[0].split('_')
    # Chunk end times 5, 10, ..., 600 (this assumes 10-minute recordings)
    for second in range(5, 605, 5):
        row_id = '_'.join([fileinfo[0], fileinfo[1], str(second)])
        pred['row_id'].append(row_id)
        pred['birds'].append('nocall')
        
# Make a new data frame and look at a few "results"
results = pd.DataFrame(pred, columns = ['row_id', 'birds'])
results.head()
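
When you replace the mock predictions with a real model, the "split file into 5-second chunks" step needs the sample range of each chunk. The following is a minimal sketch of that index arithmetic, not a definitive implementation: the helper name `chunk_bounds` is hypothetical, the 32 kHz sample rate and 10-minute duration are assumptions for the demo, and a trailing chunk shorter than 5 seconds is simply dropped.

```python
def chunk_bounds(num_samples, sample_rate, chunk_seconds=5):
    """Yield (end_second, start_sample, end_sample) for each full chunk.

    A trailing chunk shorter than chunk_seconds is dropped in this sketch.
    """
    chunk_len = sample_rate * chunk_seconds
    num_chunks = num_samples // chunk_len
    for i in range(num_chunks):
        start = i * chunk_len
        yield (i + 1) * chunk_seconds, start, start + chunk_len

# Assumed values for illustration: a 10-minute recording at 32 kHz
sample_rate = 32000
num_samples = sample_rate * 600

bounds = list(chunk_bounds(num_samples, sample_rate))
print('{} CHUNKS, FIRST: {}, LAST: {}'.format(len(bounds), bounds[0], bounds[-1]))
```

With a waveform loaded as an array, slicing it from `start_sample` to `end_sample` would give the samples of one chunk, and `end_second` corresponds to the last part of the row_id.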

Finally, we have to save our result data frame as a CSV file named "submission.csv". This way, it will be scored when the notebook is submitted.

In [None]:
# Convert our results to csv
results.to_csv("submission.csv", index=False)

Ok, but how do we actually submit? These are the steps you need to take:

1. Save the notebook by clicking "Save Version" in the upper right corner (only visible in edit mode).
2. The notebook will then show up under "Code" on the main competition page (it will also show up under "Your work").
3. Click and view the notebook.
4. Now click on the three dots in the upper right corner and select "Submit to Competition" (see screenshot below).
5. Follow the on-screen instructions.
6. Wait for the notebook to finish; results will show up under "My Submissions".

![How to submit to the competition](https://tuc.cloud/index.php/s/z9eWEA8ZtbHki3i/preview)

That's it. Please make sure to check out our other notebooks, leave a comment if you have any remarks, and don't hesitate to start a new forum thread if you have any questions.