# ePSdata interface demo
30/07/20

This notebook provides a short demo of getting data from [ePSdata repositories (via Zenodo)](https://phockett.github.io/ePSdata/about.html). Each record contains a set of ePolyScat computations, including all source files, processed outputs and, in some cases, wavefunctions. See the [ePSdata webpages](https://phockett.github.io/ePSdata/about.html) for more details.

## Setup

All that is required is the `ePSdata` class from `epsproc.util.epsdata`.

In [1]:
import sys
# ePSproc test codebase (local)
if sys.platform == "win32":
    modPath = r'D:\code\github\ePSproc'  # Win test machine
else:
    modPath = r'/home/femtolab/github/ePSproc/'  # Linux test machine
    
sys.path.append(modPath)
# import epsproc as ep

from epsproc.util.epsdata import ePSdata

* plotly not found, plotly plots not available. 
* pyevtk not found, VTK export not available. 


## Select a dataset for download

Currently, the `ePSdata` class accepts a full URL, a DOI or a Zenodo ID for the record. All three can be found on the ePSdata pages or on Zenodo.

As an example, let's grab the data for CH3I (orb 20) ionization. The corresponding web pages are:

- ePSdata page: https://phockett.github.io/ePSdata/CH3I_1-60eV/CH3I_1-60eV_orb20_A1.html
- Zenodo record page: https://zenodo.org/record/3660708
- DOI URL: http://dx.doi.org/10.5281/zenodo.3660708
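
Since the same record can be referenced in several forms, it can help to reduce them all to the numeric Zenodo ID. The following is a minimal sketch of one way to do that, not the method used inside `ePSdata`; the helper name `zenodo_id_from` and its regular expression are assumptions made for illustration only.

```python
import re


def zenodo_id_from(identifier):
    """Return the numeric Zenodo record ID from a URL, DOI or bare ID.

    Hypothetical helper for illustration; not part of ePSproc.
    """
    text = str(identifier).strip()
    if text.isdigit():
        return int(text)
    # Matches "zenodo.<digits>" (DOI forms) and "record/<digits>" or
    # "records/<digits>" (Zenodo page and API URLs).
    match = re.search(r"(?:zenodo\.|records?/)(\d+)", text)
    if match is None:
        raise ValueError(f"No Zenodo record ID found in {identifier!r}")
    return int(match.group(1))


for item in ("https://zenodo.org/record/3660708",
             "http://dx.doi.org/10.5281/zenodo.3660708",
             "10.5281/zenodo.3660708",
             3660708):
    print(item, "->", zenodo_id_from(item))
```

Each of the four inputs above resolves to the same ID, which is the value stored under `zenID` in the record details shown later.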


In [2]:
# Create data object.
# This checks for the Zenodo record, pulls some details and creates a download directory.
# By default the directory is created in the current working directory;
# pass downloadDir to use a different base directory (this must already exist).
CH3Idata = ePSdata(doi='10.5281/zenodo.3660708', downloadDir='~/Downloads')

*** Download dir set to: /home/femtolab/Downloads/3660708

*** Found Zenodo record 3660708: ePSproc: CH3I wavefn run, orb 20 ionization (Iodine 4d, A1), 1 - 60 eV
Zenodo URL: http://dx.doi.org/10.5281/zenodo.3660708
Record 3660708: 5 files, 81.1 MiB


Citation details: https://phockett.github.io/ePSdata/CH3I_1-60eV/CH3I_1-60eV_orb20_A1.html#Cite-this-dataset

*** Created /home/femtolab/Downloads/3660708


In [3]:
# All relevant IDs are stored in the recordID dict
CH3Idata.recordID

{'doi': '10.5281/zenodo.3660708',
 'url': {'doi': 'http://dx.doi.org/10.5281/zenodo.3660708',
  'get': 'https://zenodo.org/api/records/3660708',
  'epsdata': 'https://phockett.github.io/ePSdata/CH3I_1-60eV/CH3I_1-60eV_orb20_A1.html'},
 'zenID': 3660708,
 'downloadBase': PosixPath('/home/femtolab/Downloads'),
 'downloadDir': PosixPath('/home/femtolab/Downloads/3660708')}

## Download record

If all looks good, let's pull the files with the `downloadFiles()` method.

In [4]:
CH3Idata.downloadFiles()


***Getting item https://zenodo.org/api/files/92450a50-9274-4f44-b823-d429bf140984/readme.txt
Pulled to file: /home/femtolab/Downloads/3660708/readme.txt

***Getting item https://zenodo.org/api/files/92450a50-9274-4f44-b823-d429bf140984/CH3I_1-60eV_orb20_A1.ipynb
Pulled to file: /home/femtolab/Downloads/3660708/CH3I_1-60eV_orb20_A1.ipynb

***Getting item https://zenodo.org/api/files/92450a50-9274-4f44-b823-d429bf140984/CH3I_1-60eV_orb20_A1.md
Pulled to file: /home/femtolab/Downloads/3660708/CH3I_1-60eV_orb20_A1.md

***Getting item https://zenodo.org/api/files/92450a50-9274-4f44-b823-d429bf140984/CH3I_1-60eV_orb20_A1.json
Pulled to file: /home/femtolab/Downloads/3660708/CH3I_1-60eV_orb20_A1.json

***Getting item https://zenodo.org/api/files/92450a50-9274-4f44-b823-d429bf140984/CH3I_1-60eV_orb20_A1.zip
Pulled to file: /home/femtolab/Downloads/3660708/CH3I_1-60eV_orb20_A1.zip


In [5]:
# Note: files that already exist locally are not re-downloaded,
# unless the file sizes don't match or overwriteFlag=True is passed.

CH3Idata.downloadFiles()

# TODO: also add hash checking here.


***Getting item https://zenodo.org/api/files/92450a50-9274-4f44-b823-d429bf140984/readme.txt
Local file already exists, file size OK.
Skipping download.
Existing file OK: /home/femtolab/Downloads/3660708/readme.txt

***Getting item https://zenodo.org/api/files/92450a50-9274-4f44-b823-d429bf140984/CH3I_1-60eV_orb20_A1.ipynb
Local file already exists, file size OK.
Skipping download.
Existing file OK: /home/femtolab/Downloads/3660708/CH3I_1-60eV_orb20_A1.ipynb

***Getting item https://zenodo.org/api/files/92450a50-9274-4f44-b823-d429bf140984/CH3I_1-60eV_orb20_A1.md
Local file already exists, file size OK.
Skipping download.
Existing file OK: /home/femtolab/Downloads/3660708/CH3I_1-60eV_orb20_A1.md

***Getting item https://zenodo.org/api/files/92450a50-9274-4f44-b823-d429bf140984/CH3I_1-60eV_orb20_A1.json
Local file already exists, file size OK.
Skipping download.
Existing file OK: /home/femtolab/Downloads/3660708/CH3I_1-60eV_orb20_A1.json

***Getting item https://zenodo.org/api/files/92
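
The skip logic described in the cell comments (download only when the file is missing, the size differs, or an overwrite is requested), together with the hash check noted as a TODO, could look something like the following. This is a minimal sketch under stated assumptions, not the `downloadFiles()` implementation: `needs_download` and all of its parameters are hypothetical names, and the reference MD5 digest is assumed to be available from elsewhere.

```python
import hashlib
from pathlib import Path


def needs_download(local_path, expected_size, expected_md5=None, overwrite=False):
    """Decide whether a file should be (re)downloaded.

    Hypothetical helper for illustration; not part of ePSproc.
    """
    path = Path(local_path)
    # Always download if forced, or if there is no local copy yet.
    if overwrite or not path.is_file():
        return True
    # A size mismatch suggests an incomplete or stale local copy.
    if path.stat().st_size != expected_size:
        return True
    # Optional stronger check: compare an MD5 digest, reading in 1 MiB chunks.
    if expected_md5 is not None:
        digest = hashlib.md5()
        with path.open("rb") as handle:
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
        return digest.hexdigest() != expected_md5
    return False
```

The size comparison is cheap and catches truncated downloads; the digest comparison is slower for large archives but also catches corrupted files of the correct length.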

## Unzipping items

In addition to a few key files at the top level, ePSdata records contain archives of source files. The class provides basic handling for unzipping these, although this is currently limited and may not be robust.
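
Because the archives can expand to many times their download size, it is worth checking the uncompressed size before extracting. The sketch below shows one way to do this with the standard-library `zipfile` module. It is an illustration only, not the `unzipFiles()` implementation: the function names are hypothetical, and a size limit (`max_bytes`) is used here in place of an interactive prompt.

```python
import zipfile


def archive_summary(zip_path):
    """Return (number of entries, total uncompressed size in bytes)."""
    with zipfile.ZipFile(zip_path) as archive:
        infos = archive.infolist()
        return len(infos), sum(info.file_size for info in infos)


def unzip_if_small(zip_path, target_dir, max_bytes):
    """Extract the archive only if its uncompressed size is within max_bytes.

    Returns True if the archive was extracted, False if it was skipped.
    Hypothetical helper for illustration; not part of ePSproc.
    """
    _, total = archive_summary(zip_path)
    if total > max_bytes:
        return False
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
    return True
```

For example, `unzip_if_small(path_to_zip, target_dir, max_bytes=1024**3)` would extract only archives that expand to 1 GiB or less.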


In [6]:
CH3Idata.unzipFiles()


*** Found 1 archive(s).

*** Unzipping archive: /home/femtolab/Downloads/3660708/CH3I_1-60eV_orb20_A1.zip
Unzipped archive size will be 714.8 MiB.
Unzip? (y/n): y
Unzipped file /home/femtolab/Downloads/3660708/CH3I_1-60eV_orb20_A1.zip to directory /home/femtolab/Downloads/3660708

***Record summary
Record 10.5281/zenodo.3660708, title: ePSproc: CH3I wavefn run, orb 20 ionization (Iodine 4d, A1), 1 - 60 eV
Base dir: /home/femtolab/Downloads/3660708
Found 7 subDirs:
.
generators
generators/CH3I_1-60eV
CH3I_1-60eV
CH3I_1-60eV/orb20_A1_idy
CH3I_1-60eV/orb20_A1_waveFn
electronic_structure
Found 256 items, with file types:
Counter({'.dat': 240,
         '.nc': 4,
         '': 2,
         '.inp': 2,
         '.idy': 2,
         '.err': 1,
         '.md': 1,
         '.json': 1,
         '.out': 1,
         '.molden': 1,
         '.log': 1})
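
A similar file-type summary can be produced for any directory with `pathlib` and `collections.Counter`. This is a minimal sketch, not the code that generated the summary above; the function name is hypothetical, it counts regular files only, and it uses only the final suffix of each name, so its totals may differ from those reported by `unzipFiles()`.

```python
from collections import Counter
from pathlib import Path


def count_file_types(base_dir):
    """Count regular files below base_dir by final suffix ('' means no suffix).

    Hypothetical helper for illustration; not part of ePSproc.
    """
    return Counter(p.suffix for p in Path(base_dir).rglob("*") if p.is_file())
```

Calling `count_file_types(CH3Idata.recordID['downloadDir'])` after unzipping would give a breakdown of the downloaded record by file type.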


In [7]:
# The full file list is now attached to the object as a list of dicts, one per zip file.
# Each dict also contains the zip archive info.
CH3Idata.zip

[{'path': PosixPath('/home/femtolab/Downloads/3660708'),
  'zipfile': PosixPath('/home/femtolab/Downloads/3660708/CH3I_1-60eV_orb20_A1.zip'),
  'files': ['CH3I_1-60eV/CH3I_1-60eV_orb20_A1.inp.err',
   'CH3I_1-60eV/CH3I_1-60eV_orb20_A1.inp.out_BLM-L_2020-01-28_09-39-23.nc',
   'CH3I_1-60eV/CH3I_1-60eV_orb20_A1.inp.out_BLM-L_2020-02-10_09-08-01.nc',
   'CH3I_1-60eV/CH3I_1-60eV_orb20_A1.md',
   'CH3I_1-60eV/CH3I_1-60eV_orb20_A1.inp.out_BLM-V_2020-02-10_09-08-01.nc',
   'CH3I_1-60eV/CH3I_1-60eV_orb20_A1.inp.out_BLM-V_2020-01-28_09-39-23.nc',
   'CH3I_1-60eV/orb20_A1_idy/',
   'CH3I_1-60eV/CH3I_1-60eV_orb20_A1.json',
   'CH3I_1-60eV/CH3I_1-60eV_orb20_A1.inp.out',
   'CH3I_1-60eV/CH3I_1-60eV_orb20_A1.inp',
   'CH3I_1-60eV/orb20_A1_waveFn/',
   'CH3I_1-60eV/orb20_A1_idy/CH3ISECE.idy',
   'CH3I_1-60eV/orb20_A1_idy/CH3ISA1CA1.idy',
   'CH3I_1-60eV/orb20_A1_waveFn/CH3ISECE_26.0eV_Awave.dat',
   'CH3I_1-60eV/orb20_A1_waveFn/CH3ISA1CA1_41.0eV_Swave.dat',
   'CH3I_1-60eV/orb20_A1_waveFn/CH3ISECE_31

In [8]:
# There's also a full dir + file listing, along with other useful info, in .record
CH3Idata.record['files']

[('/home/femtolab/Downloads/3660708',
  ['generators', 'CH3I_1-60eV', 'electronic_structure'],
  ['CH3I_1-60eV_orb20_A1.json',
   'CH3I_1-60eV_orb20_A1.md',
   'readme.txt',
   'CH3I_1-60eV_orb20_A1.zip',
   'CH3I_1-60eV_orb20_A1.ipynb']),
 ('/home/femtolab/Downloads/3660708/generators', ['CH3I_1-60eV'], []),
 ('/home/femtolab/Downloads/3660708/generators/CH3I_1-60eV',
  [],
  ['CH3I_1-60eV_orb20_A1.inp']),
 ('/home/femtolab/Downloads/3660708/CH3I_1-60eV',
  ['orb20_A1_idy', 'orb20_A1_waveFn'],
  ['CH3I_1-60eV_orb20_A1.json',
   'CH3I_1-60eV_orb20_A1.md',
   'CH3I_1-60eV_orb20_A1.inp.out_BLM-V_2020-01-28_09-39-23.nc',
   'CH3I_1-60eV_orb20_A1.inp.out_BLM-L_2020-01-28_09-39-23.nc',
   'CH3I_1-60eV_orb20_A1.inp',
   'CH3I_1-60eV_orb20_A1.inp.out_BLM-V_2020-02-10_09-08-01.nc',
   'CH3I_1-60eV_orb20_A1.inp.err',
   'CH3I_1-60eV_orb20_A1.inp.out',
   'CH3I_1-60eV_orb20_A1.inp.out_BLM-L_2020-02-10_09-08-01.nc']),
 ('/home/femtolab/Downloads/3660708/CH3I_1-60eV/orb20_A1_idy',
  [],
  ['CH3ISE

## Working with the ePS output data

Any ePS output files found in the record are listed in `.ePSout`, and can be used as normal for further computations.

In [9]:
CH3Idata.ePSout

[PosixPath('/home/femtolab/Downloads/3660708/CH3I_1-60eV/CH3I_1-60eV_orb20_A1.inp.out')]

In [10]:
import epsproc as ep

dataSet = ep.readMatEle(fileIn=CH3Idata.ePSout)

*** ePSproc readMatEle(): scanning files for DumpIdy segments.

*** Scanning file(s)
[PosixPath('/home/femtolab/Downloads/3660708/CH3I_1-60eV/CH3I_1-60eV_orb20_A1.inp.out')]

*** Reading ePS output file:  /home/femtolab/Downloads/3660708/CH3I_1-60eV/CH3I_1-60eV_orb20_A1.inp.out
Expecting 24 energy points.
Expecting 2 symmetries.
Scanning CrossSection segments.
Expecting 48 DumpIdy segments.
Found 48 dumpIdy segments (sets of matrix elements).

Processing segments to Xarrays...
Processed 48 sets of DumpIdy file segments, (0 blank)
