# Create the filelist
We begin by creating a filelist for each of three data sets: the PSL data, the data collected in 2008 and the data collected in 2014.

## The code

In [2]:
import os
import pandas as pd

In [9]:
def create_filelist(data_directory, filelist_name):
    '''
    Create a filelist .csv file which can be used
    for labelling each file.

    A relative filelist_name is resolved against the current
    working directory, NOT the data_directory.

    Parameters
    ----------
    data_directory : str
      Directory containing the data files to be labelled.

    filelist_name : str
      Name of the filelist e.g. filelist_2008.csv
    '''

    # open the filelist file in write mode
    with open(filelist_name, 'w') as f:

        # write header
        f.write('Filename,Label,Short Name\n')

        # write one line per file; sorted so the order is reproducible
        # (os.listdir returns entries in arbitrary order)
        for filename in sorted(os.listdir(data_directory)):
            f.write(filename + '\n')

In [10]:
# for each of the data directories create a filelist
for sub_directory in ['PSL', '2008', '2014']:
    data_directory = os.path.join(os.curdir, sub_directory)
    filelist_name = 'filelist_{}.csv'.format(sub_directory)
    create_filelist(data_directory, filelist_name)
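
To see what an unedited filelist looks like before any labels are added, the sketch below runs a compact restatement of `create_filelist` (with the directory listing sorted) on a throwaway directory. The temporary directory and the two empty files in it are hypothetical stand-ins for a real data directory.

```python
import os
import tempfile


def create_filelist(data_directory, filelist_name):
    """Write a header row, then one filename per line in sorted order."""
    with open(filelist_name, 'w') as f:
        f.write('Filename,Label,Short Name\n')
        for filename in sorted(os.listdir(data_directory)):
            f.write(filename + '\n')


# Hypothetical stand-in for a data directory, built in a temporary location.
with tempfile.TemporaryDirectory() as tmp:
    data_directory = os.path.join(tmp, 'PSL')
    os.mkdir(data_directory)
    for name in ['FT_0000.csv', '1_green_0000.csv']:
        # create an empty placeholder file
        open(os.path.join(data_directory, name), 'w').close()

    filelist_path = os.path.join(tmp, 'filelist_PSL.csv')
    create_filelist(data_directory, filelist_path)

    # read the filelist back before the temporary directory is removed
    with open(filelist_path) as f:
        lines = f.read().splitlines()

for line in lines:
    print(line)
```

Each data row contains only the filename at this stage; the `Label` and `Short Name` fields are left to be filled in afterwards.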

## Editing the filelists
Running the code above creates three filelists: filelist_2008.csv, filelist_2014.csv and filelist_PSL.csv. Each file is then edited in a standard spreadsheet package, e.g. Excel. The following notation is used to label each of the laboratory files:

| Label | Description         |
|-------|---------------------|
| F     | Forced trigger file |
| 1     | Bacteria            | 
| 2     | Fungal              | 
| 3     | Pollen              |
| 4     | Non-biological      |

The following notation is used to label the PSL data:

| Label | Description         |
|-------|---------------------|
| F     | Forced trigger file |
| 1     | $1\mu$m Green       |
| 2     | $2.1\mu$m Blue      |
| 3     | $2.2\mu$m Red       |
| 4     | $3.1\mu$m Green     |
| 5     | $4.17\mu$m          |


In addition, we provide a shorthand name for each file which will be useful later on when we plot figures and create tables.
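
The label codes in the two tables above can also be kept as lookup dictionaries, so that a description can be attached to each row of a filelist later on. This is only a sketch: the names `LAB_LABELS`, `PSL_LABELS` and `describe_labels`, and the added `Description` column, are illustrative assumptions and not part of the original workflow.

```python
import pandas as pd

# Label codes copied from the tables above. Keys are strings because
# the code 'F' is mixed with numeric codes in the same column.
LAB_LABELS = {
    'F': 'Forced trigger file',
    '1': 'Bacteria',
    '2': 'Fungal',
    '3': 'Pollen',
    '4': 'Non-biological',
}

PSL_LABELS = {
    'F': 'Forced trigger file',
    '1': r'$1\mu$m Green',
    '2': r'$2.1\mu$m Blue',
    '3': r'$2.2\mu$m Red',
    '4': r'$3.1\mu$m Green',
    '5': r'$4.17\mu$m',
}


def describe_labels(filelist, label_map):
    """Return a copy of filelist with a 'Description' column added."""
    described = filelist.copy()
    described['Description'] = described['Label'].astype(str).map(label_map)
    return described


# Small hand-written example in the shape of an edited PSL filelist.
example = pd.DataFrame({
    'Filename': ['1_green_0000.csv', '4.17_0000_0001.csv', 'FT_0000.csv'],
    'Label': ['1', '5', 'F'],
})
described = describe_labels(example, PSL_LABELS)
print(described)
```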

## The output
Once the filelist has been edited in a spreadsheet package, the following .csv file is produced for the PSL data set.

In [7]:
pd.read_csv('filelist_PSL.csv')

|   | Filename           | Label | Short Name       |
|---|--------------------|-------|------------------|
| 0 | 1_green_0000.csv   | 1     | $1\mu$m Green I  |
| 1 | 1_green_0001.csv   | 1     | $1\mu$m Green I  |
| 2 | 2.1_blue_0000.csv  | 2     | $2.1\mu$m Blue   |
| 3 | 2.2_red_0000.csv   | 3     | $2.2\mu$m Red    |
| 4 | 3.1_green_0000.csv | 4     | $3.1\mu$m Green  |
| 5 | 4.17_0000_0001.csv | 5     | $4.17\mu$m I     |
| 6 | 4.17_0000_0002.csv | 5     | $4.17\mu$m II    |
| 7 | 4.17_0000_0003.csv | 5     | $4.17\mu$m III   |
| 8 | FT2_0000.csv       | F     |                  |
| 9 | FT_0000.csv        | F     |                  |
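
Because the `Label` column mixes the letter `F` with digits, it can help to read it explicitly as text and then separate the forced trigger files from the particle files. The sketch below shows one way to do this with `pd.read_csv` and its `dtype` argument. To stay self-contained it reads a few rows copied from the output above out of an in-memory string (an assumption made for illustration); in practice the filename `'filelist_PSL.csv'` would be passed instead.

```python
import io

import pandas as pd

# A few rows copied from the PSL filelist shown above.
sample_csv = r"""Filename,Label,Short Name
1_green_0000.csv,1,$1\mu$m Green I
2.1_blue_0000.csv,2,$2.1\mu$m Blue
FT_0000.csv,F,
"""

# Read Label as text so that '1' and 'F' are treated the same way.
filelist = pd.read_csv(io.StringIO(sample_csv), dtype={'Label': str})

# Split the filelist into forced trigger files and particle files.
is_forced = filelist['Label'] == 'F'
forced = filelist[is_forced]
particles = filelist[~is_forced]

print(forced['Filename'].tolist())
print(particles['Filename'].tolist())
```

Rows without a shorthand name, such as the forced trigger files, are read with a missing value in `Short Name`.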
