## Benchmark 3: leaderboard scoring test
We are working under the assumption that the month of November is being scored on the public leaderboard. Before we do much else, I want to confirm that assumption with an experiment. We are going to burn a day's worth of submissions in the process, but I think it's worth it to be sure in the long run. The plan is to use dummy submission files to test which month is actually being scored. The setup will be as follows:
1. [Positive control](#positive_control): predictions for all timepoints - this will give us the expected score when we guess the right month.
2. [Negative control](#negative_control): zeros for all timepoints - this will give us the expected score when we guess the wrong month.
3. [November test](#november_test): predictions for November timepoints, zeros everywhere else.
4. [December test](#december_test): predictions for December timepoints, zeros everywhere else.
5. [January test](#january_test): predictions for January timepoints, zeros everywhere else.
6. [Sanity check](#sanity_check): make sure all submission files have the expected number of zero and non-zero rows
7. [Results](#results)

If more than one month is being scored, the positive control score will be better (lower SMAPE) than the individual test scores, and more than one month test will do better than the negative control. I don't think this is the case, but we will be able to spot it with this experimental design.
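The decision rule described above can be written down as a small sketch. The function name `scored_months`, the dictionary keys and the tolerance are hypothetical choices made for illustration, and the scores in the example are made-up placeholders, not leaderboard results.

```python
def scored_months(scores, tolerance=1e-4):
    """Return the month tests that beat the negative control (lower SMAPE is better)."""
    negative = scores['negative_control']
    return [
        name for name, score in scores.items()
        if name.endswith('_test') and score < negative - tolerance
    ]


# Made-up scores for illustration only
example_scores = {
    'positive_control': 1.0,
    'negative_control': 200.0,
    'november_test': 1.0,
    'december_test': 200.0,
    'january_test': 200.0,
}

months = scored_months(example_scores)

# Exactly one month is scored if a single test beats the negative control
# and that test reproduces the positive control score
single_month = (
    len(months) == 1
    and abs(example_scores[months[0]] - example_scores['positive_control']) < 1e-4
)

print(months)
print(single_month)
```

If two or more names come back from `scored_months`, or the single winner is worse than the positive control, more than one month is contributing to the score.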

In [1]:
# Add parent directory to path to allow import of config.py
import sys
sys.path.append('..')
import config as conf

import pandas as pd

print(f'Python: {sys.version}')
print()
print(f'Pandas {pd.__version__}')

Python: 3.10.0 | packaged by conda-forge | (default, Nov 20 2021, 02:24:10) [GCC 9.4.0]

Pandas 1.4.3


<a name="positive_control"></a>
### 1. Positive control

In [2]:
# We already have predictions for all counties/timepoints from the oneshot naive model.
# let's use those to fill in our dummy submission files.
prediction_file = f'{conf.BENCHMARKING_PATH}/2023-01-31_oneshot_naive_model_submission.csv'
positive_control_df = pd.read_csv(prediction_file)
positive_control_df.head()

Unnamed: 0,row_id,microbusiness_density
0,1001_2022-11-01,3.463856
1,1001_2022-12-01,3.463856
2,1001_2023-01-01,3.463856
3,1001_2023-02-01,3.463856
4,1001_2023-03-01,3.463856


In [3]:
# Right off the bat, this is our positive control file, so let's
# write it into our data dir for this experiment
output_file = f'{conf.LEADERBOARD_TEST_PATH}/positive_control.csv'
positive_control_df.to_csv(output_file, sep=',', index=False)

<a name="negative_control"></a>
### 2. Negative control

In [4]:
# Next, let's make our negative control, i.e. fill all of the predictions
# with just zeros
negative_control_df = positive_control_df.drop(['microbusiness_density'], axis=1)
negative_control_df['microbusiness_density'] = 0
negative_control_df.head()

Unnamed: 0,row_id,microbusiness_density
0,1001_2022-11-01,0
1,1001_2022-12-01,0
2,1001_2023-01-01,0
3,1001_2023-02-01,0
4,1001_2023-03-01,0


In [5]:
# Write negative control to csv
output_file = f'{conf.LEADERBOARD_TEST_PATH}/negative_control.csv'
negative_control_df.to_csv(output_file, sep=',', index=False)

<a name="november_test"></a>
### 3. November test

In [6]:
# Make test submission file for November

# Get rows from positive control dataframe where the row id contains November of 2022
november_test_positive_df = positive_control_df[positive_control_df['row_id'].str.contains('2022-11-01')]

# Get rows from negative control dataframe where the row id does not contain November of 2022
november_test_negative_df = negative_control_df[~negative_control_df['row_id'].str.contains('2022-11-01')]

# Combine negative and positive
november_test_df = pd.concat([november_test_positive_df, november_test_negative_df], axis=0)

# Clean up and inspect
november_test_df.reset_index(inplace=True, drop=True)
print(f'Num rows: {len(november_test_df)}')
november_test_df.head()


Num rows: 25080


Unnamed: 0,row_id,microbusiness_density
0,1001_2022-11-01,3.463856
1,1003_2022-11-01,8.359798
2,1005_2022-11-01,1.232074
3,1007_2022-11-01,1.28724
4,1009_2022-11-01,1.831783


In [7]:
# Write to csv
output_file = f'{conf.LEADERBOARD_TEST_PATH}/november_test.csv'
november_test_df.to_csv(output_file, sep=',', index=False)

<a name="december_test"></a>
### 4. December test

In [8]:
# Make test submission file for December

# Get rows from positive control dataframe where the row id contains December of 2022
december_test_positive_df = positive_control_df[positive_control_df['row_id'].str.contains('2022-12-01')]

# Get rows from negative control dataframe where the row id does not contain December of 2022
december_test_negative_df = negative_control_df[~negative_control_df['row_id'].str.contains('2022-12-01')]

# Combine negative and positive
december_test_df = pd.concat([december_test_positive_df, december_test_negative_df], axis=0)

# Clean up and inspect
december_test_df.reset_index(inplace=True, drop=True)
print(f'Num rows: {len(december_test_df)}')
december_test_df.head()

Num rows: 25080


Unnamed: 0,row_id,microbusiness_density
0,1001_2022-12-01,3.463856
1,1003_2022-12-01,8.359798
2,1005_2022-12-01,1.232074
3,1007_2022-12-01,1.28724
4,1009_2022-12-01,1.831783


In [9]:
# Write to csv
output_file = f'{conf.LEADERBOARD_TEST_PATH}/december_test.csv'
december_test_df.to_csv(output_file, sep=',', index=False)

<a name="january_test"></a>
### 5. January test

In [10]:
# Make test submission file for January

# Get rows from positive control dataframe where the row id contains January of 2023
january_test_positive_df = positive_control_df[positive_control_df['row_id'].str.contains('2023-01-01')]

# Get rows from negative control dataframe where the row id does not contain January of 2023
january_test_negative_df = negative_control_df[~negative_control_df['row_id'].str.contains('2023-01-01')]

# Combine negative and positive
january_test_df = pd.concat([january_test_positive_df, january_test_negative_df], axis=0)

# Clean up and inspect
january_test_df.reset_index(inplace=True, drop=True)
print(f'Num rows: {len(january_test_df)}')
january_test_df.head()

Num rows: 25080


Unnamed: 0,row_id,microbusiness_density
0,1001_2023-01-01,3.463856
1,1003_2023-01-01,8.359798
2,1005_2023-01-01,1.232074
3,1007_2023-01-01,1.28724
4,1009_2023-01-01,1.831783


In [11]:
# Write to csv
output_file = f'{conf.LEADERBOARD_TEST_PATH}/january_test.csv'
january_test_df.to_csv(output_file, sep=',', index=False)
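
The November, December and January cells above repeat the same steps with a different date string. As a minimal sketch, the recipe could be folded into one helper. The name `make_month_test` and the toy dataframe are hypothetical, and unlike the cells above this version keeps the original row order instead of moving the target month to the top.

```python
import pandas as pd


def make_month_test(predictions_df, month):
    """Keep predictions for rows whose row_id ends with `month`, zero out the rest."""
    month_test_df = predictions_df.copy()
    is_target_month = month_test_df['row_id'].str.endswith(month)
    month_test_df.loc[~is_target_month, 'microbusiness_density'] = 0.0
    return month_test_df


# Toy stand-in for the real prediction file: two counties, three months each
toy_df = pd.DataFrame({
    'row_id': [
        '1001_2022-11-01', '1001_2022-12-01', '1001_2023-01-01',
        '1003_2022-11-01', '1003_2022-12-01', '1003_2023-01-01',
    ],
    'microbusiness_density': [3.46, 3.46, 3.46, 8.36, 8.36, 8.36],
})

november_toy_df = make_month_test(toy_df, '2022-11-01')
print(november_toy_df)
```

Working on a copy leaves the input untouched, so the same predictions can be reused for each month.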

<a name="sanity_check"></a>
### 6. Sanity check
Before we burn a day's worth of submissions with this test, let's load up each file and double check that we have the correct number of non-zero rows in the right places.
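
Counting non-zero rows tells us how many there are, but not which month they belong to. A rough sketch of a stricter check follows, assuming every `row_id` has the form `<cfips>_<date>` as in the outputs above. The helper name `nonzero_months` and the toy data are hypothetical.

```python
import pandas as pd


def nonzero_months(submission_df):
    """Count non-zero predictions per month, reading the month from the row_id suffix."""
    nonzero_df = submission_df[submission_df['microbusiness_density'] != 0]
    months = nonzero_df['row_id'].str.split('_').str[-1]
    return months.value_counts().sort_index()


# Toy November test file: only the November rows carry predictions
toy_november_df = pd.DataFrame({
    'row_id': [
        '1001_2022-11-01', '1003_2022-11-01',
        '1001_2022-12-01', '1003_2022-12-01',
    ],
    'microbusiness_density': [3.46, 8.36, 0.0, 0.0],
})

counts = nonzero_months(toy_november_df)
print(counts)
```

For a correctly built month test, the result should contain a single date: the month under test.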

In [12]:
positive_control_file = f'{conf.LEADERBOARD_TEST_PATH}/positive_control.csv'
positive_control_df = pd.read_csv(positive_control_file)

positive_control_rows = positive_control_df[positive_control_df['microbusiness_density'] != 0]
print(f'Total rows: {len(positive_control_df)}')
print(f'Num non-zero rows: {len(positive_control_rows)}')
positive_control_rows.head()

Total rows: 25080
Num non-zero rows: 25072


Unnamed: 0,row_id,microbusiness_density
0,1001_2022-11-01,3.463856
1,1001_2022-12-01,3.463856
2,1001_2023-01-01,3.463856
3,1001_2023-02-01,3.463856
4,1001_2023-03-01,3.463856


OK, glad I checked - we have a mismatch: 25072 non-zero rows out of 25080. We must have predicted zero for at least one of the counties. As I remember it, there was a county (Issaquena County, Mississippi, cfips: 28055) which lost all of its microbusinesses during the time range. That must be where these 8 zero rows are coming from, with the naive model propagating the final zero forward in that county.
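
Rather than relying on memory, the county behind the zero rows could be looked up directly. This is a small sketch, again assuming the `<cfips>_<date>` layout of `row_id`. The helper name `zero_prediction_counties` is hypothetical and the toy values are invented for illustration.

```python
import pandas as pd


def zero_prediction_counties(submission_df):
    """Count zero-valued predictions per county, reading cfips from the row_id prefix."""
    zero_df = submission_df[submission_df['microbusiness_density'] == 0]
    cfips = zero_df['row_id'].str.split('_').str[0]
    return cfips.value_counts()


# Toy positive control: one county with real predictions, one predicted as all zeros
toy_positive_df = pd.DataFrame({
    'row_id': [
        '1001_2022-11-01', '1001_2022-12-01',
        '28055_2022-11-01', '28055_2022-12-01',
    ],
    'microbusiness_density': [3.46, 3.46, 0.0, 0.0],
})

zero_counts = zero_prediction_counties(toy_positive_df)
print(zero_counts)
```

Run against the real positive control, a single cfips with a count of 8 would support the explanation above.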

In [13]:
negative_control_file = f'{conf.LEADERBOARD_TEST_PATH}/negative_control.csv'
negative_control_df = pd.read_csv(negative_control_file)

negative_control_rows = negative_control_df[negative_control_df['microbusiness_density'] != 0]
print(f'Total rows: {len(negative_control_df)}')
print(f'Num non-zero rows: {len(negative_control_rows)}')
negative_control_rows.head()

Total rows: 25080
Num non-zero rows: 0


Unnamed: 0,row_id,microbusiness_density


In [14]:

november_test_file = f'{conf.LEADERBOARD_TEST_PATH}/november_test.csv'
november_test_df = pd.read_csv(november_test_file)

november_test_rows = november_test_df[november_test_df['microbusiness_density'] != 0]
print(f'Total rows: {len(november_test_df)}')
print(f'Num non-zero rows: {len(november_test_rows)}')
november_test_rows.head()

Total rows: 25080
Num non-zero rows: 3134


Unnamed: 0,row_id,microbusiness_density
0,1001_2022-11-01,3.463856
1,1003_2022-11-01,8.359798
2,1005_2022-11-01,1.232074
3,1007_2022-11-01,1.28724
4,1009_2022-11-01,1.831783


In [15]:

december_test_file = f'{conf.LEADERBOARD_TEST_PATH}/december_test.csv'
december_test_df = pd.read_csv(december_test_file)

december_test_rows = december_test_df[december_test_df['microbusiness_density'] != 0]
print(f'Total rows: {len(december_test_df)}')
print(f'Num non-zero rows: {len(december_test_rows)}')
december_test_rows.head()

Total rows: 25080
Num non-zero rows: 3134


Unnamed: 0,row_id,microbusiness_density
0,1001_2022-12-01,3.463856
1,1003_2022-12-01,8.359798
2,1005_2022-12-01,1.232074
3,1007_2022-12-01,1.28724
4,1009_2022-12-01,1.831783


In [16]:

january_test_file = f'{conf.LEADERBOARD_TEST_PATH}/january_test.csv'
january_test_df = pd.read_csv(january_test_file)

january_test_rows = january_test_df[january_test_df['microbusiness_density'] != 0]
print(f'Total rows: {len(january_test_df)}')
print(f'Num non-zero rows: {len(january_test_rows)}')
january_test_rows.head()

Total rows: 25080
Num non-zero rows: 3134


Unnamed: 0,row_id,microbusiness_density
0,1001_2023-01-01,3.463856
1,1003_2023-01-01,8.359798
2,1005_2023-01-01,1.232074
3,1007_2023-01-01,1.28724
4,1009_2023-01-01,1.831783


<a name="results"></a>
### 7. Results
OK, so the results are conclusive. Today is February 1<sup>st</sup>, and the public leaderboard is scoring predictions for November 2022 only. My reading of the contest description says this will not change until at least the middle of this month, when some of the test data is revealed. For reference, here are the results:
1. **Positive control**: SMAPE = 1.0936
2. **Negative control**: SMAPE = 199.9362
3. **November test**: SMAPE = 1.0936
4. **December test**: SMAPE = 199.9362
5. **January test**: SMAPE = 199.9362

The positive control score matches the November test exactly, and the negative control score matches the December and January tests. Only the November rows affect the score.
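
To see why an all-zero file lands just under 200, here is a minimal sketch of SMAPE. It assumes the usual definition on a 0 to 200 scale, with rows where both the true and predicted values are zero counted as zero error; the leaderboard's exact implementation is not shown in this notebook. The true values below are synthetic. Under that assumption, every row with a non-zero true value scores 200 against a zero prediction, so 199.9362 would be consistent with roughly one scored county in 3135 having a true value of zero. That is a back-of-envelope reading, not something the leaderboard confirms.

```python
import numpy as np


def smape(actual, predicted):
    """Symmetric mean absolute percentage error on a 0-200 scale (assumed definition)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    difference = np.abs(actual - predicted)
    denominator = (np.abs(actual) + np.abs(predicted)) / 2
    # Rows where both values are zero contribute zero error instead of dividing by zero
    ratio = np.divide(
        difference, denominator,
        out=np.zeros_like(difference), where=denominator != 0,
    )
    return 100 * ratio.mean()


# Synthetic truth: 3134 counties with a non-zero density and one county at zero
synthetic_actual = np.concatenate([np.ones(3134), np.zeros(1)])
all_zero_prediction = np.zeros(3135)

print(round(smape(synthetic_actual, all_zero_prediction), 4))  # 199.9362
```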