## Data extraction: daily, plankton

In bash, I copied the history files from the server to my local machine with `scp`:
```
cd /home/lindsay/hioekg-2013
scp lindsayv@frinkiac.soest.hawaii.edu:/share/frinkraid3/lindsayv/hioekg-2013/output_semi_daily/hioekg_his_* .
```

I ran the following loop to extract the surface layer for each half-day average:

```
printf -v PATH2 "/home/lindsay/hioekg-2013/"
cd $PATH2
# define local variable tn0; here, the integer that corresponds to the ROMS time stamp on the file (without zero padding)  -- this is Jan 2, 2013
tn0=4750
# for loop: loop over 13 files
for ((i=0; i<13; i++ ));
do
## NOW INSIDE FOR LOOP

# in this printf command, %s means to copy a string
# and %05d means a 5 digit integer padded with 0s on the left
printf -v FNin "%shioekg_his_%05d.nc" $PATH2 $tn0
printf -v FNout "%shioekg_his_surface_%05d.nc" $PATH2 $tn0
# this command extracts the surface layer (s_rho) and puts the output in the new file $FNout
ncks -O -d s_rho,-0.975 $FNin $FNout
# output to screen
echo $tn0
echo $FNin
echo $FNout
# increase tn0 by 30 days
tn0=$((tn0+30))

## done CLOSES FOR LOOP
done
cd $PATH2

```
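For reference, the same file names and their start dates can be reproduced in Python. This is a minimal sketch, assuming the stamp counts whole days and that 4750 corresponds to Jan 2, 2013, as stated in the loop comment; `STAMP0`, `DATE0`, and `chunk_info` are illustrative names, not part of the original workflow.

```python
from datetime import date, timedelta

STAMP0 = 4750              # first ROMS stamp (from the bash loop above)
DATE0 = date(2013, 1, 2)   # assumed calendar date of STAMP0 (per the loop comment)
STEP = 30                  # days between consecutive history files
N_FILES = 13

def chunk_info(i):
    """Return (stamp, input name, output name, start date) for file i."""
    stamp = STAMP0 + i * STEP
    fn_in = f"hioekg_his_{stamp:05d}.nc"            # same zero padding as %05d
    fn_out = f"hioekg_his_surface_{stamp:05d}.nc"
    return stamp, fn_in, fn_out, DATE0 + timedelta(days=i * STEP)

chunks = [chunk_info(i) for i in range(N_FILES)]
for stamp, fn_in, fn_out, start in chunks:
    print(stamp, fn_in, fn_out, start.isoformat())
```

Under these assumptions the files start on Jan 2, Feb 1, Mar 3, and so on, so the 30-day chunks only approximate calendar months, and the 13th file (05110) starts on Dec 28.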

#### Import modules

In [2]:
import netCDF4
from netCDF4 import Dataset
import numpy as np
import pandas as pd
from pathlib import Path
import os
import matplotlib.pyplot as plt
import xarray as xr
import scipy.stats as stats

#### Define lists called in the forthcoming for loops

In [10]:
day_2013 = ['4750','4780','4810','4840','4870','4900','4930','4960','4990','5020','5050', '5080'] 

In [3]:
%cd /home/lindsay/hioekg-2013/
%ls

/home/lindsay/hioekg-2013
hioekg_his_04750.nc  hioekg_his_05020.nc          hioekg_his_surface_04930.nc
hioekg_his_04780.nc  hioekg_his_05050.nc          hioekg_his_surface_04960.nc
hioekg_his_04810.nc  hioekg_his_05110.nc          hioekg_his_surface_04990.nc
hioekg_his_04840.nc  hioekg_his_surface_04750.nc  hioekg_his_surface_05020.nc
hioekg_his_04870.nc  hioekg_his_surface_04780.nc  hioekg_his_surface_05050.nc
hioekg_his_04900.nc  hioekg_his_surface_04810.nc  hioekg_his_surface_05080.nc
hioekg_his_04930.nc  hioekg_his_surface_04840.nc  hioekg_his_surface_05110.nc
hioekg_his_04960.nc  hioekg_his_surface_04870.nc  Seasonal/
hioekg_his_04990.nc  hioekg_his_surface_04900.nc  test.nc


## Loop through all history files for 2013

In [4]:
# Reset the working directory before running the loops
%pwd
%cd /home/lindsay/hioekg-2013/

# Master list for the opened files, plus one list per plankton group
file_list = []
nsmz=[]
nmdz=[]
nlgz=[]
nsm=[]
nlg=[]

for i in range(0,12):
        folder = '/home/lindsay/hioekg-2013/'
        os.chdir(folder)
        file = xr.open_dataset('hioekg_his_surface_0' + str(day_2013[i]) + '.nc')
        file_list.append(file)      

for item in file_list: 
# nsmz: Small zooplankton
    dat = item.nsmz
    dat_numpy = dat.values
    dat_numpy=dat_numpy[~np.isnan(dat_numpy)]
    nsmz.append(dat_numpy)
# nmdz: Medium zooplankton
    dat = item.nmdz
    dat_numpy = dat.values
    dat_numpy=dat_numpy[~np.isnan(dat_numpy)]
    nmdz.append(dat_numpy)
# nlgz: Large zooplankton
    dat = item.nlgz
    dat_numpy = dat.values
    dat_numpy=dat_numpy[~np.isnan(dat_numpy)]
    nlgz.append(dat_numpy)
# nsm: Small phytoplankton
    dat = item.nsm
    dat_numpy = dat.values
    dat_numpy=dat_numpy[~np.isnan(dat_numpy)]
    nsm.append(dat_numpy)
# nlg: Large phytoplankton
    dat = item.nlg
    dat_numpy = dat.values
    dat_numpy=dat_numpy[~np.isnan(dat_numpy)]
    nlg.append(dat_numpy)

/home/lindsay/hioekg-2013
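
The five near-identical blocks above can be collapsed into one loop over variable names. A minimal sketch, assuming each dataset supports `ds[name]` lookup (as `xarray.Dataset` does); the `GROUPS` tuple, the `valid_values` helper, and the small dictionaries standing in for datasets are illustrative, not part of the original notebook.

```python
import numpy as np

GROUPS = ('nsmz', 'nmdz', 'nlgz', 'nsm', 'nlg')

def valid_values(arr):
    """Flatten an array in storage order and drop the NaN entries."""
    arr = np.asarray(arr, dtype=float)
    return arr[~np.isnan(arr)]

# Stand-in for file_list: two tiny "datasets" with shape (time, eta, xi)
fake = np.array([[[1.0, np.nan], [2.0, 3.0]],
                 [[4.0, np.nan], [5.0, 6.0]]])
file_list_demo = [{name: fake for name in GROUPS} for _ in range(2)]

values = {name: [] for name in GROUPS}
for ds in file_list_demo:
    for name in GROUPS:
        values[name].append(valid_values(ds[name]))

print(values['nsm'][0])   # [1. 2. 3. 4. 5. 6.]
```

Keeping the results in one dictionary keyed by group name also avoids resetting five separate lists before each rerun.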


In [5]:
len(nsm)

12

In [6]:
len(nsm[4])

614636

In [78]:
len(nsm[12]) # Short final chunk at the end of December: ignore for now, or handle separately.
# The loop above has since been rerun without this 13th file.

90684

In [7]:
614636/61

10076.0

Each array has len=614636 and contains 61 frames, so each frame (one 12-hour, half-day average) has len=10076.

For file 4750, the data start at Jan 2 00:00 and end at Feb 1 00:00, which gives 61 frames. The 61st frame is repeated as the first frame of the next file, so I want to remove it from each array: 614636-10076=604560.
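
The frame arithmetic can be checked on a small synthetic array. A minimal sketch, assuming the flattened values are ordered frame by frame (which holds when the same cells are NaN in every frame); the array below is made-up stand-in data, and `n_frames` and `n_cells` are illustrative names.

```python
import numpy as np

n_frames, n_cells = 61, 10076          # values from the notebook
assert n_frames * n_cells == 614636

# Stand-in for one flattened, NaN-free array (real data would be nsm[i], etc.)
flat = np.arange(n_frames * n_cells, dtype=float)

frames = flat.reshape(n_frames, n_cells)   # one row per half-day average
trimmed = frames[:-1]                      # drop the 61st frame
print(trimmed.shape, trimmed.size)         # (60, 10076) 604560

# Equivalent to the flat slice [0:604560]
assert np.array_equal(trimmed.ravel(), flat[0:604560])
```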

I want to label each of the 12 arrays of len=604560 in the list with a month and date.

I want to label the frames (half-day averages, len=10076) alternately with timestamp 00:00 or 12:00.

My path forward:

- Use a for loop to slice each array within the lists to indices 0:604560 (this slices off the frame that is repeated at the start of the next file).
- Inside that loop, label every value in array i with entry i of the month and date lists.
- Once my df is assembled, assign labels from the timestamp list in alternation (every other frame/row), starting at 00:00.

Then I'll apply this framework to the 2014 data.
For now, I am ignoring that tiny chunk at the end of December; may add it in once I have the other data nicely labeled.
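
One way to build per-frame labels is with `np.repeat` and `np.tile`, so that all 10076 values in a frame share the same timestamp. This is an alternative sketch under the assumption that frames alternate 00:00 and 12:00 starting at 00:00, as described above; the variable names are illustrative.

```python
import numpy as np

n_cells = 10076        # values per frame
n_frames = 60          # frames kept per file after dropping the 61st
n_files = 12

timestamp = np.array(['00:00', '12:00'])

# Frame k gets timestamp[k % 2]; each label is repeated for every cell in the frame
per_file = np.repeat(timestamp[np.arange(n_frames) % 2], n_cells)
labels = np.tile(per_file, n_files)

print(len(labels))     # 7254720
```

Because each file keeps an even number of frames (60), the pattern restarts at 00:00 at the beginning of every file.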

#### Classifying values by date, month, time

In [6]:
# Creating 12-unit date list and month list, 2-unit timestamp list, and empty list to store organized values:
date = ['2013-01-01','2013-02-01','2013-03-01','2013-04-01','2013-05-01',
        '2013-06-01','2013-07-01','2013-08-01','2013-09-01','2013-10-01',
        '2013-11-01','2013-12-01']
month=['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']
timestamp=['00:00','12:00']

# Keep a separate list per group so that errors are easier to isolate
list_for_dataframe_nsmz_2013=[]
list_for_dataframe_nmdz_2013=[]
list_for_dataframe_nlgz_2013=[]
list_for_dataframe_nsm_2013=[]
list_for_dataframe_nlg_2013=[]

#### nsmz (2013)

In [9]:
%cd /home/lindsay/hioekg-2013

# Iterating sequentially (i) through each array (element) in the list of arrays (selection)
for i,element in enumerate(nsmz):
    element = element[0:604560]
    this_date = date[i] # each array i is assigned date i
    this_month = month[i] # each array i is assigned month i
    for sub_element in element:
        list_for_dataframe_nsmz_2013.append(
            {'month': this_month, 'date': this_date, 'concentration': sub_element})

nsmz_df_2013 = pd.DataFrame(list_for_dataframe_nsmz_2013)

# Add group identifier and year column
nsmz_df_2013['group']='nsmz'
nsmz_df_2013['year']=2013

# Add timestamp:
timestamp = ['00:00','12:00'] * 3627360 # one half the length of the df
nsmz_df_2013['timestamp']= timestamp
print(len(nsmz_df_2013))

%cd /home/lindsay/hioekg-compare-years/
pd.DataFrame.to_csv(nsmz_df_2013,'nsmz_semidaily_df_2013.csv')
nsmz_df_2013.head()

/home/lindsay/hioekg-2013
7254720
/home/lindsay/hioekg-compare-years


Unnamed: 0,month,date,concentration,group,year,timestamp
0,Jan,2013-01-01,1.406497e-07,nsmz,2013,00:00
1,Jan,2013-01-01,1.406054e-07,nsmz,2013,12:00
2,Jan,2013-01-01,1.40561e-07,nsmz,2013,00:00
3,Jan,2013-01-01,1.405165e-07,nsmz,2013,12:00
4,Jan,2013-01-01,1.407283e-07,nsmz,2013,00:00
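
Appending one dictionary per value creates millions of Python objects before the dataframe exists. A lighter alternative is to repeat the labels with NumPy and pass whole columns to pandas. This is a sketch with small made-up arrays, assuming the same column names as above; `build_group_df` and `keep` are hypothetical names.

```python
import numpy as np
import pandas as pd

def build_group_df(arrays, dates, months, group, year, keep):
    """Stack per-file arrays into one long dataframe with repeated labels."""
    trimmed = [np.asarray(a)[:keep] for a in arrays]
    lengths = [len(a) for a in trimmed]
    return pd.DataFrame({
        'month': np.repeat(months[:len(trimmed)], lengths),
        'date': np.repeat(dates[:len(trimmed)], lengths),
        'concentration': np.concatenate(trimmed),
        'group': group,
        'year': year,
    })

# Made-up example: two files of 5 values each, keeping the first 4 of each
demo = build_group_df(
    arrays=[np.arange(5.0), np.arange(5.0) + 10],
    dates=['2013-01-01', '2013-02-01'],
    months=['Jan', 'Feb'],
    group='nsmz', year=2013, keep=4,
)
print(demo)
```

With the real data, `keep` would be 604560 and `arrays` would be the `nsmz` list.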


#### nmdz (2013)

In [103]:
%cd /home/lindsay/hioekg-2013

# Iterating sequentially (i) through each array (element) in the list of arrays (selection)
for i,element in enumerate(nmdz):
    element = element[0:604560]
    this_date = date[i] # each array i is assigned date i
    this_month = month[i] # each array i is assigned month i
    for sub_element in element:
        list_for_dataframe_nmdz_2013.append(
            {'month': this_month, 'date': this_date, 'concentration': sub_element})

nmdz_df_2013 = pd.DataFrame(list_for_dataframe_nmdz_2013)

# Add group identifier and year column
nmdz_df_2013['group']='nmdz'
nmdz_df_2013['year']=2013

# Add timestamp:
timestamp = ['00:00','12:00'] * 3627360 # one half the length of the df
nmdz_df_2013['timestamp']= timestamp
print(len(nmdz_df_2013))

%cd /home/lindsay/hioekg-compare-years/
pd.DataFrame.to_csv(nmdz_df_2013,'nmdz_semidaily_df_2013.csv')
nmdz_df_2013.head()

/home/lindsay/hioekg-2013
7254720
/home/lindsay/hioekg-compare-years


Unnamed: 0,month,date,concentration,group,year,timestamp
0,Jan,2013-01-01,5.923502e-08,nmdz,2013,00:00
1,Jan,2013-01-01,5.925435e-08,nmdz,2013,12:00
2,Jan,2013-01-01,5.927377e-08,nmdz,2013,00:00
3,Jan,2013-01-01,5.929326e-08,nmdz,2013,12:00
4,Jan,2013-01-01,5.917489e-08,nmdz,2013,00:00


#### nlgz (2013)

In [109]:
%cd /home/lindsay/hioekg-2013

# Iterating sequentially (i) through each array (element) in the list of arrays (selection)
for i,element in enumerate(nlgz):
    element = element[0:604560]
    this_date = date[i] # each array i is assigned date i
    this_month = month[i] # each array i is assigned month i
    for sub_element in element:
        list_for_dataframe_nlgz_2013.append(
            {'month': this_month, 'date': this_date, 'concentration': sub_element})

nlgz_df_2013 = pd.DataFrame(list_for_dataframe_nlgz_2013)

# Add group identifier and year column
nlgz_df_2013['group']='nlgz'
nlgz_df_2013['year']=2013

# Add timestamp:
timestamp = ['00:00','12:00'] * 3627360 # one half the length of the df
nlgz_df_2013['timestamp']= timestamp
print(len(nlgz_df_2013))

%cd /home/lindsay/hioekg-compare-years/
pd.DataFrame.to_csv(nlgz_df_2013,'nlgz_semidaily_df_2013.csv')
nlgz_df_2013.head()

/home/lindsay/hioekg-2013
7254720
/home/lindsay/hioekg-compare-years


Unnamed: 0,month,date,concentration,group,year,timestamp
0,Jan,2013-01-01,1.366578e-08,nlgz,2013,00:00
1,Jan,2013-01-01,1.367985e-08,nlgz,2013,12:00
2,Jan,2013-01-01,1.369398e-08,nlgz,2013,00:00
3,Jan,2013-01-01,1.370815e-08,nlgz,2013,12:00
4,Jan,2013-01-01,1.363337e-08,nlgz,2013,00:00


#### nsm (2013)

In [110]:
%cd /home/lindsay/hioekg-2013

# Iterating sequentially (i) through each array (element) in the list of arrays (selection)
for i,element in enumerate(nsm):
    element = element[0:604560]
    this_date = date[i] # each array i is assigned date i
    this_month = month[i] # each array i is assigned month i
    for sub_element in element:
        list_for_dataframe_nsm_2013.append(
            {'month': this_month, 'date': this_date, 'concentration': sub_element})

nsm_df_2013 = pd.DataFrame(list_for_dataframe_nsm_2013)

# Add group identifier and year column
nsm_df_2013['group']='nsm'
nsm_df_2013['year']=2013

# Add timestamp:
timestamp = ['00:00','12:00'] * 3627360 # one half the length of the df
nsm_df_2013['timestamp']= timestamp
print(len(nsm_df_2013))

%cd /home/lindsay/hioekg-compare-years/
pd.DataFrame.to_csv(nsm_df_2013,'nsm_semidaily_df_2013.csv')
nsm_df_2013.head()

/home/lindsay/hioekg-2013
7254720
/home/lindsay/hioekg-compare-years


Unnamed: 0,month,date,concentration,group,year,timestamp
0,Jan,2013-01-01,9.019494e-08,nsm,2013,00:00
1,Jan,2013-01-01,9.02188e-08,nsm,2013,12:00
2,Jan,2013-01-01,9.024291e-08,nsm,2013,00:00
3,Jan,2013-01-01,9.026729e-08,nsm,2013,12:00
4,Jan,2013-01-01,9.017967e-08,nsm,2013,00:00


#### nlg (2013)

In [13]:
%cd /home/lindsay/hioekg-2013

# Iterating sequentially (i) through each array (element) in the list of arrays (selection)
for i,element in enumerate(nlg):
    element = element[0:604560]
    this_date = date[i] # each array i is assigned date i
    this_month = month[i] # each array i is assigned month i
    for sub_element in element:
        list_for_dataframe_nlg_2013.append(
            {'month': this_month, 'date': this_date, 'concentration': sub_element})

nlg_df_2013 = pd.DataFrame(list_for_dataframe_nlg_2013)

# Add group identifier and year column
nlg_df_2013['group']='nlg'
nlg_df_2013['year']=2013

# Add timestamp:
timestamp = ['00:00','12:00'] * 3627360 # one half the length of the df
nlg_df_2013['timestamp']= timestamp
print(len(nlg_df_2013))

%cd /home/lindsay/hioekg-compare-years/
pd.DataFrame.to_csv(nlg_df_2013,'nlg_semidaily_df_2013.csv')
nlg_df_2013.head()

/home/lindsay/hioekg-2013
7254720
/home/lindsay/hioekg-compare-years


Unnamed: 0,month,date,concentration,group,year,timestamp
0,Jan,2013-01-01,7.828189e-08,nlg,2013,00:00
1,Jan,2013-01-01,7.821094e-08,nlg,2013,12:00
2,Jan,2013-01-01,7.813979e-08,nlg,2013,00:00
3,Jan,2013-01-01,7.806845e-08,nlg,2013,12:00
4,Jan,2013-01-01,7.844851e-08,nlg,2013,00:00


Concatenating all five dataframes (5 × 7,254,720 rows) would crash this machine, so I'm not going to do that yet.
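
One way to combine the groups without holding them all in memory at once is to append each dataframe to a single CSV file in turn. A sketch with tiny made-up frames, assuming the per-group dataframes share the same columns; the file name `all_groups_demo.csv` is hypothetical.

```python
import os
import tempfile
import pandas as pd

# Tiny stand-ins for the per-group dataframes
frames = [
    pd.DataFrame({'concentration': [1.0, 2.0], 'group': 'nsmz', 'year': 2013}),
    pd.DataFrame({'concentration': [3.0, 4.0], 'group': 'nmdz', 'year': 2013}),
]

out_path = os.path.join(tempfile.mkdtemp(), 'all_groups_demo.csv')
for k, df in enumerate(frames):
    # Write the header once, then append the rows of each later frame
    df.to_csv(out_path, mode='w' if k == 0 else 'a', header=(k == 0), index=False)

combined = pd.read_csv(out_path)
print(combined)
```

Storing repeated labels such as `group`, `month`, and `date` as pandas `category` columns is another option that usually reduces memory use.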

## Loop through all history files for 2014

Note: I ran the loop below on the server (frinkraid). I'm now copying the resulting surface files to my local machine with `scp`, and I'm going to explore storing the data in a PostgreSQL database, because there is a *lot* of it.
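
As a rough idea of what a database-backed layout could look like, the sketch below loads a few made-up rows into a table and aggregates them with SQL. It uses Python's built-in `sqlite3` as a stand-in, not PostgreSQL itself, and the table and column names are assumptions.

```python
import sqlite3

rows = [  # made-up values, one tuple per row of a long-format dataframe
    ('Jan', '2013-01-01', 1.0e-07, 'nsmz', 2013, '00:00'),
    ('Jan', '2013-01-01', 3.0e-07, 'nsmz', 2013, '12:00'),
    ('Feb', '2013-02-01', 2.0e-07, 'nsmz', 2013, '00:00'),
]

conn = sqlite3.connect(':memory:')
# "grp" is used instead of "group" because GROUP is a reserved SQL keyword
conn.execute("""CREATE TABLE plankton (
    month TEXT, date TEXT, concentration REAL,
    grp TEXT, year INTEGER, timestamp TEXT)""")
conn.executemany("INSERT INTO plankton VALUES (?, ?, ?, ?, ?, ?)", rows)

# Row count and mean concentration per month, in calendar order
monthly = conn.execute(
    "SELECT month, COUNT(*), AVG(concentration) FROM plankton "
    "GROUP BY month ORDER BY MIN(date)").fetchall()
print(monthly)
conn.close()
```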

I ran the following loop to extract the surface layer for each half-day average:

```
printf -v PATH2 "/share/frinkraid3/lindsayv/hioekg-2014/output_semi_daily/"
cd $PATH2
# define local variable tn0; here, the integer that corresponds to the ROMS time stamp on the file (without zero padding)  -- this is Jan 2, 2014
tn0=5115
# for loop: loop over 13 files
for ((i=0; i<13; i++ ));
do
## NOW INSIDE FOR LOOP

# in this printf command, %s means to copy a string
# and %05d means a 5 digit integer padded with 0s on the left
printf -v FNin "%shioekg_his_%05d.nc" $PATH2 $tn0
printf -v FNout "%shioekg_his_surface_%05d.nc" $PATH2 $tn0
# this command extracts the surface layer (s_rho) and puts the output in the new file $FNout
ncks -O -d s_rho,-0.975 $FNin $FNout
# output to screen
echo $tn0
echo $FNin
echo $FNout
# increase tn0 by 30 days
tn0=$((tn0+30))

## done CLOSES FOR LOOP
done
cd $PATH2

```

...and now, scp to my local machine:

```
cd /home/lindsay/hioekg-2014
scp lindsayv@frinkiac.soest.hawaii.edu:/share/frinkraid3/lindsayv/hioekg-2014/output_semi_daily/hioekg_his_surface* .

cd /home/lindsay/Documents/biogeochem_bonanza
scp lindsayv@frinkiac.soest.hawaii.edu:/share/frinkraid3/lindsayv/Manuscript/netCDF_extraction_daily_plankton_IP .
```

In [1]:
day_2014 = ['5115','5145','5175','5205','5235','5265','5295','5325','5355','5385','5415','5445']

In [4]:
# Reset the working directory before running the loops
%pwd
%cd /home/lindsay/hioekg-2014/

# Master list for the opened files, plus one list per plankton group
file_list = []
nsmz=[]
nmdz=[]
nlgz=[]
nsm=[]
nlg=[]

for i in range(0,12):
        folder = '/home/lindsay/hioekg-2014/'
        os.chdir(folder)
        file = xr.open_dataset('hioekg_his_surface_0' + str(day_2014[i]) + '.nc')
        file_list.append(file)      

for item in file_list: 
# nsmz: Small zooplankton
    dat = item.nsmz
    dat_numpy = dat.values
    dat_numpy=dat_numpy[~np.isnan(dat_numpy)]
    nsmz.append(dat_numpy)
# nmdz: Medium zooplankton
    dat = item.nmdz
    dat_numpy = dat.values
    dat_numpy=dat_numpy[~np.isnan(dat_numpy)]
    nmdz.append(dat_numpy)
# nlgz: Large zooplankton
    dat = item.nlgz
    dat_numpy = dat.values
    dat_numpy=dat_numpy[~np.isnan(dat_numpy)]
    nlgz.append(dat_numpy)
# nsm: Small phytoplankton
    dat = item.nsm
    dat_numpy = dat.values
    dat_numpy=dat_numpy[~np.isnan(dat_numpy)]
    nsm.append(dat_numpy)
# nlg: Large phytoplankton
    dat = item.nlg
    dat_numpy = dat.values
    dat_numpy=dat_numpy[~np.isnan(dat_numpy)]
    nlg.append(dat_numpy)

/home/lindsay/hioekg-2014


In [7]:
# Creating 12-unit date list and month list, 2-unit timestamp list, and empty list to store organized values:
date = ['2014-01-01','2014-02-01','2014-03-01','2014-04-01','2014-05-01',
        '2014-06-01','2014-07-01','2014-08-01','2014-09-01','2014-10-01',
        '2014-11-01','2014-12-01']
month=['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']
timestamp=['00:00','12:00']

# Keep a separate list per group so that errors are easier to isolate
list_for_dataframe_nsmz_2014=[]
list_for_dataframe_nmdz_2014=[]
list_for_dataframe_nlgz_2014=[]
list_for_dataframe_nsm_2014=[]
list_for_dataframe_nlg_2014=[]

#### nsmz (2014)

In [5]:
%cd /home/lindsay/hioekg-2014

# Iterating sequentially (i) through each array (element) in the list of arrays (selection)
for i,element in enumerate(nsmz):
    element = element[0:604560]
    this_date = date[i] # each array i is assigned date i
    this_month = month[i] # each array i is assigned month i
    for sub_element in element:
        list_for_dataframe_nsmz_2014.append(
            {'month': this_month, 'date': this_date, 'concentration': sub_element})

nsmz_df_2014 = pd.DataFrame(list_for_dataframe_nsmz_2014)

# Add group identifier and year column
nsmz_df_2014['group']='nsmz'
nsmz_df_2014['year']=2014

# Add timestamp:
timestamp = ['00:00','12:00'] * 3627360 # one half the length of the df
nsmz_df_2014['timestamp']= timestamp
print(len(nsmz_df_2014))

%cd /home/lindsay/hioekg-compare-years/
pd.DataFrame.to_csv(nsmz_df_2014,'nsmz_semidaily_df_2014.csv')
nsmz_df_2014.head()

/home/lindsay/hioekg-2014
7254720
/home/lindsay/hioekg-compare-years


Unnamed: 0,month,date,concentration,group,year,timestamp
0,Jan,2014-01-01,8.479923e-08,nsmz,2014,00:00
1,Jan,2014-01-01,8.476973e-08,nsmz,2014,12:00
2,Jan,2014-01-01,8.474002e-08,nsmz,2014,00:00
3,Jan,2014-01-01,8.471011e-08,nsmz,2014,12:00
4,Jan,2014-01-01,8.493387e-08,nsmz,2014,00:00


#### nmdz (2014)

In [6]:
%cd /home/lindsay/hioekg-2014

# Iterating sequentially (i) through each array (element) in the list of arrays (selection)
for i,element in enumerate(nmdz):
    element = element[0:604560]
    this_date = date[i] # each array i is assigned date i
    this_month = month[i] # each array i is assigned month i
    for sub_element in element:
        list_for_dataframe_nmdz_2014.append(
            {'month': this_month, 'date': this_date, 'concentration': sub_element})

nmdz_df_2014 = pd.DataFrame(list_for_dataframe_nmdz_2014)

# Add group identifier and year column
nmdz_df_2014['group']='nmdz'
nmdz_df_2014['year']=2014

# Add timestamp:
timestamp = ['00:00','12:00'] * 3627360 # one half the length of the df
nmdz_df_2014['timestamp']= timestamp
print(len(nmdz_df_2014))

%cd /home/lindsay/hioekg-compare-years/
pd.DataFrame.to_csv(nmdz_df_2014,'nmdz_semidaily_df_2014.csv')
nmdz_df_2014.head()

/home/lindsay/hioekg-2014
7254720
/home/lindsay/hioekg-compare-years


Unnamed: 0,month,date,concentration,group,year,timestamp
0,Jan,2014-01-01,5.034725e-08,nmdz,2014,00:00
1,Jan,2014-01-01,5.034684e-08,nmdz,2014,12:00
2,Jan,2014-01-01,5.034665e-08,nmdz,2014,00:00
3,Jan,2014-01-01,5.034669e-08,nmdz,2014,12:00
4,Jan,2014-01-01,5.034224e-08,nmdz,2014,00:00


#### nlgz (2014)

In [7]:
%cd /home/lindsay/hioekg-2014

# Iterating sequentially (i) through each array (element) in the list of arrays (selection)
for i,element in enumerate(nlgz):
    element = element[0:604560]
    this_date = date[i] # each array i is assigned date i
    this_month = month[i] # each array i is assigned month i
    for sub_element in element:
        list_for_dataframe_nlgz_2014.append(
            {'month': this_month, 'date': this_date, 'concentration': sub_element})

nlgz_df_2014 = pd.DataFrame(list_for_dataframe_nlgz_2014)

# Add group identifier and year column
nlgz_df_2014['group']='nlgz'
nlgz_df_2014['year']=2014

# Add timestamp:
timestamp = ['00:00','12:00'] * 3627360 # one half the length of the df
nlgz_df_2014['timestamp']= timestamp
print(len(nlgz_df_2014))

%cd /home/lindsay/hioekg-compare-years/
pd.DataFrame.to_csv(nlgz_df_2014,'nlgz_semidaily_df_2014.csv')
nlgz_df_2014.head()

/home/lindsay/hioekg-2014
7254720
/home/lindsay/hioekg-compare-years


Unnamed: 0,month,date,concentration,group,year,timestamp
0,Jan,2014-01-01,1.142207e-08,nlgz,2014,00:00
1,Jan,2014-01-01,1.142196e-08,nlgz,2014,12:00
2,Jan,2014-01-01,1.142195e-08,nlgz,2014,00:00
3,Jan,2014-01-01,1.142204e-08,nlgz,2014,12:00
4,Jan,2014-01-01,1.141875e-08,nlgz,2014,00:00


#### nsm (2014)

In [8]:
%cd /home/lindsay/hioekg-2014

# Iterating sequentially (i) through each array (element) in the list of arrays (selection)
for i,element in enumerate(nsm):
    element = element[0:604560]
    this_date = date[i] # each array i is assigned date i
    this_month = month[i] # each array i is assigned month i
    for sub_element in element:
        list_for_dataframe_nsm_2014.append(
            {'month': this_month, 'date': this_date, 'concentration': sub_element})

nsm_df_2014 = pd.DataFrame(list_for_dataframe_nsm_2014)

# Add group identifier and year column
nsm_df_2014['group']='nsm'
nsm_df_2014['year']=2014

# Add timestamp:
timestamp = ['00:00','12:00'] * 3627360 # one half the length of the df
nsm_df_2014['timestamp']= timestamp
print(len(nsm_df_2014))

%cd /home/lindsay/hioekg-compare-years/
pd.DataFrame.to_csv(nsm_df_2014,'nsm_semidaily_df_2014.csv')
nsm_df_2014.head()

/home/lindsay/hioekg-2014
7254720
/home/lindsay/hioekg-compare-years


Unnamed: 0,month,date,concentration,group,year,timestamp
0,Jan,2014-01-01,9.472166e-08,nsm,2014,00:00
1,Jan,2014-01-01,9.472244e-08,nsm,2014,12:00
2,Jan,2014-01-01,9.472346e-08,nsm,2014,00:00
3,Jan,2014-01-01,9.472469e-08,nsm,2014,12:00
4,Jan,2014-01-01,9.473372e-08,nsm,2014,00:00


#### nlg (2014)

In [9]:
%cd /home/lindsay/hioekg-2014

# Iterating sequentially (i) through each array (element) in the list of arrays (selection)
for i,element in enumerate(nlg):
    element = element[0:604560]
    this_date = date[i] # each array i is assigned date i
    this_month = month[i] # each array i is assigned month i
    for sub_element in element:
        list_for_dataframe_nlg_2014.append(
            {'month': this_month, 'date': this_date, 'concentration': sub_element})

nlg_df_2014 = pd.DataFrame(list_for_dataframe_nlg_2014)

# Add group identifier and year column
nlg_df_2014['group']='nlg'
nlg_df_2014['year']=2014

# Add timestamp:
timestamp = ['00:00','12:00'] * 3627360 # one half the length of the df
nlg_df_2014['timestamp']= timestamp
print(len(nlg_df_2014))

%cd /home/lindsay/hioekg-compare-years/
pd.DataFrame.to_csv(nlg_df_2014,'nlg_semidaily_df_2014.csv')
nlg_df_2014.head()

/home/lindsay/hioekg-2014
7254720
/home/lindsay/hioekg-compare-years


Unnamed: 0,month,date,concentration,group,year,timestamp
0,Jan,2014-01-01,3.601164e-08,nlg,2014,00:00
1,Jan,2014-01-01,3.59681e-08,nlg,2014,12:00
2,Jan,2014-01-01,3.592446e-08,nlg,2014,00:00
3,Jan,2014-01-01,3.588073e-08,nlg,2014,12:00
4,Jan,2014-01-01,3.6198e-08,nlg,2014,00:00
