
# 1.2. Fixing invalid files

Walkthrough URL = [https://zzz.bwh.harvard.edu/luna-walkthrough/p1/filefix/](https://zzz.bwh.harvard.edu/luna-walkthrough/p1/filefix/)

Initialize lunapi, read the previously saved `s1.lst` sample list, and display it.

In [1]:
import lunapi as lp
proj = lp.proj()
proj.sample_list( 's1.lst' )
proj.sample_list() 

initiated lunapi v0.1.1 <lunapi.lunapi0.luna object at 0x285b98af0> 

read 20 individuals from s1.lst


| | ID | EDF | Annotations |
|---|---|---|---|
| 1 | F01 | ../work/data//edfs/F01.edf | {../work/data//annots/F01.annot} |
| 2 | F02 | ../work/data//edfs/F02.edf | {../work/data//annots/F02.annot} |
| 3 | F03 | ../work/data//edfs/F03.edf | {../work/data//annots/F03.annot} |
| 4 | F04 | ../work/data//edfs/F04.edf | {../work/data//annots/F04.annot} |
| 5 | F05 | ../work/data//edfs/F05.edf | {../work/data//annots/F05.annot} |
| 6 | F06 | ../work/data//edfs/F06.edf | {../work/data//annots/F06.annot} |
| 7 | F07 | ../work/data//edfs/F07.edf | {../work/data//annots/F07.annot} |
| 8 | F08 | ../work/data//edfs/F08.edf | {../work/data//annots/F08.annot} |
| 9 | F09 | ../work/data//edfs/F09.edf | {../work/data//annots/F09.annot} |
| 10 | F10 | ../work/data//edfs/F10.edf | {../work/data//annots/F10.annot} |


As an example of a valid fileset, we can attach `F01` and view its summary:

In [2]:
p = proj.inst( 'F01' )

___________________________________________________________________
Processing: F01 | ../work/data//edfs/F01.edf
 duration 03.00.30, 10830s | time 22.00.00 - 04.58.00 | date 01.01.85

 signals: 60 (of 60) selected in an EDF+D file
  Fp1 | Fp2 | AF3 | AF4 | F7 | F5 | F3 | F1
  F2 | F4 | F6 | F8 | FT7 | FC5 | FC3 | FC1
  FC2 | FC4 | FC6 | FT8 | T7 | C5 | C3 | C1
  C2 | C4 | C6 | T8 | TP7 | CP5 | CP3 | CP1
  CP2 | CP4 | CP6 | TP8 | P7 | P5 | P3 | P1
  P2 | P4 | P6 | P8 | PO3 | PO4 | O1 | O2
  AFZ | FZ | FCZ | CZ | CPZ | PZ | POz | OZ
  A1 | A2 | FPZ | EDF Annotations
  extracting 'EDF Annotations' track from EDF+


In [3]:
p.stat()

| | Value |
|---|---|
| annotation_files | ../work/data//annots/F01.annot |
| duration | 03.00.30.000 |
| edf_file | ../work/data//edfs/F01.edf |
| id | F01 |
| na | 7 |
| ns | 59 |
| nt | 60 |
| state | 1 |


## Truncated EDFs

In contrast, we expect `F06` and `F08` to give errors when we try to attach them, as both were flagged as _invalid_.

### F08

Attaching `F08` gives the same error as reported previously, `corrupt EDF, file < header size (256 bytes): ../work/data//edfs/F08.edf`, because `F08.edf` is smaller than the 256-byte fixed EDF header and so its header cannot be read.

In [4]:
p = proj.inst( 'F08' )

RuntimeError: corrupt EDF, file < header size (256 bytes): ../work/data//edfs/F08.edf
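
The message above implies a simple size test: a file shorter than 256 bytes cannot hold even the fixed part of an EDF header. As a minimal sketch of that check outside of Luna (the function and constant names here are hypothetical, and this is not how Luna itself is implemented):

```python
import os

# Size of the fixed part of an EDF header, as quoted in the error message above
EDF_FIXED_HEADER_BYTES = 256

def shorter_than_edf_header(path):
    """Return True if the file is too small to hold the fixed EDF header."""
    return os.path.getsize(path) < EDF_FIXED_HEADER_BYTES
```

For example, `shorter_than_edf_header('../work/data/edfs/F08.edf')` would be expected to return `True` for the truncated file used in this walkthrough.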

### F06

Similarly, we expect an error here, although this is one that we can potentially "fix": the truncation is very small, with only the final EDF record cut short (a loss of less than 1 second).

In [5]:
p = proj.inst( 'F06' )

RuntimeError: corrupt EDF: expecting 370214400 but observed 370210000 bytes: ../work/data//edfs/F06.edf
details:
  header size ( = 256 + # signals * 256 ) = 15360
  num signals = 59
  record size = 15104
  number of records = 24510
  implied EDF size from header = 15360 + 15104 * 24510 = 370214400

  assuming header correct, implies the file has -0.291314 records too many
  (where one record is 1 seconds)

IF you're confident about the remaining data you can add the option:

    luna s.lst fix-edf=T ... 

  to attempt to fix this.  This may be appropriate under some circumstances, e.g.
  if just the last one or two records were clipped.  However, if other EDF header
  information is incorrect (e.g. number of signals, sample rates), then you'll be
  dealing with GIGO... so be sure to carefully check all signals for expected properties;
  really you should try to determine why the EDF was invalid in the first instance, though


To attempt the "fix", set the `fix-edf` option so that Luna ignores the final, partial EDF record:

In [6]:
proj.var( 'fix-edf' , 'T' )

Check that the `fix-edf` variable was set (the listing also includes internal variables, such as `sleep`).

In [7]:
proj.vars()

{'fix-edf': 'T', 'sleep': 'N1,N2,N3,R'}

Attach `F06` as above, but now with `fix-edf` set to `T`.

In [8]:
p = proj.inst( 'F06' )


details:
  header size ( = 256 + # signals * 256 ) = 15360
  num signals = 59
  record size = 15104
  number of records = 24510
  implied EDF size from header = 15360 + 15104 * 24510 = 370214400

  assuming header correct, implies the file has -0.291314 records too many
  (where one record is 1 seconds)

  attempting to fix this, changing the header number of records from 24510 to 24509 ... good luck!
___________________________________________________________________
Processing: F06 | ../work/data//edfs/F06.edf
 duration 06.48.29, 24509s | time 22.00.00 - 04.48.29 | date 01.01.85

 signals: 59 (of 59) selected in a standard EDF file
  Fp1 | Fp2 | AF3 | AF4 | F7 | F5 | F3 | F1
  F2 | F4 | F6 | F8 | FT7 | FC5 | FC3 | FC1
  FC2 | FC4 | FC6 | FT8 | T7 | C5 | C3 | C1
  C2 | C4 | C6 | T8 | TP7 | CP5 | CP3 | CP1
  CP2 | CP4 | CP6 | TP8 | P7 | P5 | P3 | P1
  P2 | P4 | P6 | P8 | PO3 | PO4 | O1 | O2
  AFZ | FZ | FCZ | CZ | CPZ | PZ | POz | OZ
  A1 | A2 | FPZ


The attach now succeeds, and `stat()` reports the shortened duration:

In [9]:
p.stat()

| | Value |
|---|---|
| annotation_files | ../work/data//annots/F06.annot |
| duration | 06.48.29.000 |
| edf_file | ../work/data//edfs/F06.edf |
| id | F06 |
| na | 5 |
| ns | 59 |
| nt | 59 |
| state | 1 |


## Fixing EDFs

Here, the "fix" is simply to replace both files with their `v1` versions:

In [10]:
%%sh
cp ../orig/v1/edfs/F06.edf ../orig/v1/edfs/F08.edf ../work/data/edfs/

The validate step also flagged some bad annotation files.

## Invalid annotation file formats

### CSV files

Luna does not accept .csv annotation files, so these need to be converted to tab-delimited text, here with a command-line tool (`sed`).

In [11]:
%%sh
head ../work/data/annots/M01.csv

class,start,stop
W,22:00:00,22:00:30
W,22:00:30,22:01:00
W,22:01:00,22:01:30
W,22:01:30,22:02:00
W,22:02:00,22:02:30
W,22:02:30,22:03:00
N1,22:03:00,22:03:30
N1,22:03:30,22:04:00
N1,22:04:00,22:04:30


In [12]:
%%sh
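# Replace every comma with a tab. Note: GNU sed reads \t as a tab; other sed versions may need a literal tab instead.
sed 's/,/\t/g' < ../work/data/annots/M01.csv > ../work/data/annots/M01.tsv
sed 's/,/\t/g' < ../work/data/annots/M02.csv > ../work/data/annots/M02.tsv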
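
For readers who would rather stay in Python, the same comma-to-tab conversion can be sketched with the standard `csv` module. The function name `csv_to_tsv` is hypothetical, and the sketch assumes simple files like `M01.csv` above, with no tabs inside fields.

```python
import csv

def csv_to_tsv(src, dst):
    """Copy a comma-delimited file to a tab-delimited file, row by row."""
    with open(src, newline='') as fin, open(dst, 'w', newline='') as fout:
        # Write tab-separated rows with plain '\n' line endings
        writer = csv.writer(fout, delimiter='\t', lineterminator='\n')
        writer.writerows(csv.reader(fin))
```

Usage would mirror the shell cell, e.g. `csv_to_tsv('../work/data/annots/M01.csv', '../work/data/annots/M01.tsv')`.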


### TSV files

The `M03.tsv` and `M04.tsv` files had their first two columns in the wrong order.  We'll swap them using `awk`.

In [13]:
%%sh

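# Swap the first two columns, writing tab-separated output to a temporary file (xx),
# which then replaces the original file.
awk ' { print $2 , $1 , $3 } ' OFS="\t" ../work/data/annots/M03.tsv > xx
mv xx ../work/data/annots/M03.tsv

awk ' { print $2 , $1 , $3 } ' OFS="\t" ../work/data/annots/M04.tsv > xx
mv xx ../work/data/annots/M04.tsv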
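
The same reordering can be sketched in Python. Unlike the `awk` command, which splits on any whitespace and prints only three columns, this version splits on tabs and keeps any further columns. The function names are hypothetical, and the example row in the usage note is made up, since the contents of `M03.tsv` are not shown here.

```python
def swap_first_two_columns(line):
    """Return a tab-delimited line with its first two fields exchanged."""
    fields = line.rstrip('\n').split('\t')
    fields[0], fields[1] = fields[1], fields[0]
    return '\t'.join(fields)

def reorder_tsv(path):
    """Rewrite a tab-delimited file in place with columns 1 and 2 swapped."""
    with open(path) as f:
        # Skip blank lines, which have no columns to swap
        lines = [swap_first_two_columns(line) for line in f if line.strip()]
    with open(path, 'w') as f:
        f.write('\n'.join(lines) + '\n')
```

For instance, a hypothetical row `22:00:00<TAB>W<TAB>22:00:30` would become `W<TAB>22:00:00<TAB>22:00:30`.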


## Validating the final set

Having replaced or changed the offending EDFs and annotation files, we can rebuild the sample list. This time we exclude the incorrectly included .csv files, as we've now made .tsv files for those two individuals.

In [14]:
proj.build( [ '../work/data/edfs/', '../work/data/annots', '-ext=annot,eannot,xml,tsv' ] )

20

In [15]:
%%sh
ls ../work/data/annots

F01.annot
F02.annot
F03.annot
F04.annot
F05.annot
F06.annot
F07.annot
F08.annot
F09.annot
F10.annot
M01.csv
M01.tsv
M02.csv
M02.tsv
M03.tsv
M04.tsv
M05.eannot
M06.eannot
M07.eannot
M08.eannot
M09.xml
M10.xml
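
The listing shows that `M01.csv` and `M02.csv` are still in the folder, so the `-ext=annot,eannot,xml,tsv` option is what keeps them out of the rebuilt sample list. The sketch below only illustrates the expected effect of such an extension filter on a few of the file names above. It does not reproduce Luna's own matching logic, and the function name is hypothetical.

```python
from pathlib import PurePosixPath

# Extensions passed to -ext= in the build step above
ACCEPTED = {'annot', 'eannot', 'xml', 'tsv'}

def split_by_extension(filenames, accepted=ACCEPTED):
    """Partition file names into those with an accepted extension and the rest."""
    kept, skipped = [], []
    for name in filenames:
        ext = PurePosixPath(name).suffix.lstrip('.')
        (kept if ext in accepted else skipped).append(name)
    return kept, skipped

kept, skipped = split_by_extension(
    ['F01.annot', 'M01.csv', 'M01.tsv', 'M02.csv', 'M02.tsv', 'M05.eannot', 'M09.xml'])
print(skipped)
# ['M01.csv', 'M02.csv']
```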


In [16]:
proj.sample_list()

| | ID | EDF | Annotations |
|---|---|---|---|
| 1 | F01 | ../work/data/edfs//F01.edf | {../work/data/annots/F01.annot} |
| 2 | F02 | ../work/data/edfs//F02.edf | {../work/data/annots/F02.annot} |
| 3 | F03 | ../work/data/edfs//F03.edf | {../work/data/annots/F03.annot} |
| 4 | F04 | ../work/data/edfs//F04.edf | {../work/data/annots/F04.annot} |
| 5 | F05 | ../work/data/edfs//F05.edf | {../work/data/annots/F05.annot} |
| 6 | F06 | ../work/data/edfs//F06.edf | {../work/data/annots/F06.annot} |
| 7 | F07 | ../work/data/edfs//F07.edf | {../work/data/annots/F07.annot} |
| 8 | F08 | ../work/data/edfs//F08.edf | {../work/data/annots/F08.annot} |
| 9 | F09 | ../work/data/edfs//F09.edf | {../work/data/annots/F09.annot} |
| 10 | F10 | ../work/data/edfs//F10.edf | {../work/data/annots/F10.annot} |


## Revalidate the sample list

In [17]:
tbl = proj.validate()

In [18]:
tbl

| | ID | Filename | Valid |
|---|---|---|---|
| 1 | F01 | ../work/data/edfs//F01.edf | True |
| 2 | F01 | ../work/data/annots/F01.annot | True |
| 3 | F02 | ../work/data/edfs//F02.edf | True |
| 4 | F02 | ../work/data/annots/F02.annot | True |
| 5 | F03 | ../work/data/edfs//F03.edf | True |
| 6 | F03 | ../work/data/annots/F03.annot | True |
| 7 | F04 | ../work/data/edfs//F04.edf | True |
| 8 | F04 | ../work/data/annots/F04.annot | True |
| 9 | F05 | ../work/data/edfs//F05.edf | True |
| 10 | F05 | ../work/data/annots/F05.annot | True |


## Resave the sample list

In [19]:
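sl = proj.sample_list()
# Each Annotations entry displays as {path}; strip the braces (and any quotes) to leave the bare path
sl['Annotations'] = sl['Annotations'].map( lambda x: str(x).strip('{}\'') )
# Write the ID, EDF and Annotations columns as tab-delimited text, with no header or index column
sl.to_csv( 's1.lst', sep="\t", index=False, header=False)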
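
The `strip` call in the last cell is easy to misread, so here is a minimal sketch of what it does to a single value. It assumes each `Annotations` entry is a Python set of paths, as the `{...}` display suggests. The helper names and the two-file example are hypothetical, and the comma-separated form in `join_annots` is an assumption about what the sample list accepts for multiple annotation files.

```python
def annots_to_field(annots):
    """Mimic str(x).strip("{}'") from the cell above on one value."""
    return str(annots).strip("{}'")

def join_annots(annots):
    """Alternative for several files: a sorted, comma-separated list (assumed format)."""
    return ','.join(sorted(annots))

# One annotation file per individual, as in this walkthrough: the bare path is recovered
print(annots_to_field({'../work/data/annots/F01.annot'}))
# ../work/data/annots/F01.annot

# With two files, strip() only trims the ends, so inner quotes would remain
print("'" in annots_to_field({'a.annot', 'b.xml'}))
# True
```

With exactly one annotation file per individual, as here, the original one-liner is sufficient. An explicit join like `join_annots` would only matter if an entry held several files.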