First, I will generate example data by running a script that we keep in the dataportal repo.

```
python dataportal/dataportal/examples/sample_data/image_and_scalar.py
```

In [4]:
import dataportal
from dataportal.broker import DataBroker as db, get_events, get_table, Images

In [2]:
dataportal.__version__  # we need v0.2.0 or greater

'0.2.0'

In [5]:
header = db[-1]  # get the most recent scan, which should be our example data generated above

In [7]:
header

0,1
descriptors,"[{'_name': 'EventDescriptor', 'uid': 'f7e4cce4-fee0-4071-817d-a0bcba85b479', 'time': 0.0, 'data_keys': {'img_y_max': {'shape': [], 'dtype': 'number', 'source': 'CCD:ymax'}, 'img_x_max': {'shape': [], 'dtype': 'number', 'source': 'CCD:xmax'}, 'linear_motor': {'shape': [], 'dtype': 'number', 'source': 'PV:ES:sam_x'}, 'total_img_sum': {'shape': [], 'dtype': 'number', 'source': 'CCD:sum'}, 'img_sum_x': {'shape': [5], 'external': 'FILESTORE:', 'dtype': 'array', 'source': 'CCD:xsum'}, 'img': {'shape': [5, 5], 'external': 'FILESTORE:', 'dtype': 'array', 'source': 'CCD'}, 'img_sum_y': {'shape': [5], 'external': 'FILESTORE:', 'dtype': 'array', 'source': 'CCD:ysum'}}, 'run_start': '1cea8e4c-08dc-4058-8c9d-413df7bf8a08'}, {'_name': 'EventDescriptor', 'uid': 'c3a2d1ad-51fe-4daa-a52f-ac8c1515ea66', 'time': 0.0, 'data_keys': {'Tsam': {'shape': [], 'dtype': 'number', 'source': 'PV:ES:Tsam'}}, 'run_start': '1cea8e4c-08dc-4058-8c9d-413df7bf8a08'}]"
start,beamline_id: csx; group: (empty); moon: full; owner: (empty); plotx: linear_motor; ploty: ['total_img_sum']; project: (empty); sample: (empty); scan_id: 3; time: 6 minutes ago (2015-09-17T12:20:26.998729); uid: 1cea8e4c-08dc-4058-8c9d-413df7bf8a08
stop,exit_status: success; reason: run completed; run_start: {beamline_id: csx; group: (empty); moon: full; owner: (empty); plotx: linear_motor; ploty: ['total_img_sum']; project: (empty); sample: (empty); scan_id: 3; time: 6 minutes ago (2015-09-17T12:20:26.998729); uid: 1cea8e4c-08dc-4058-8c9d-413df7bf8a08}; time: 6 minutes ago (2015-09-17T12:20:28.700854); uid: d5388ff2-2df1-41ee-af64-72c160ecfde4

0,1
beamline_id,csx
group,
moon,full
owner,
plotx,linear_motor
ploty,['total_img_sum']
project,
sample,
scan_id,3
time,6 minutes ago (2015-09-17T12:20:26.998729)

0,1
exit_status,success
reason,run completed
run_start,beamline_id: csx; group: (empty); moon: full; owner: (empty); plotx: linear_motor; ploty: ['total_img_sum']; project: (empty); sample: (empty); scan_id: 3; time: 6 minutes ago (2015-09-17T12:20:26.998729); uid: 1cea8e4c-08dc-4058-8c9d-413df7bf8a08
time,6 minutes ago (2015-09-17T12:20:28.700854)
uid,d5388ff2-2df1-41ee-af64-72c160ecfde4


## How else can we look for headers?

* `db[-5:]`: the last five scans
* `db(start_time='2015-10')`: scans from the given time onward
* `db['234owfweoi-234dwflkwej']`: look up by uid ("unique ID"), the only way to be *sure* you have the right data; it corresponds to `header.start['uid']`. Note that you can use the first few characters; you don't need the entire string.

In [37]:
header = db['1cea8']
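
To make the partial-uid lookup concrete, here is a minimal sketch of how a prefix can resolve to exactly one uid. It is not DataBroker's implementation: `find_by_uid_prefix` and `known_uids` are hypothetical names, and the second uid in the list is made up for illustration.

```python
# Hypothetical stand-in for a header store; this is NOT DataBroker's code.
def find_by_uid_prefix(uids, prefix):
    """Return the single uid starting with `prefix`, or raise ValueError."""
    matches = [uid for uid in uids if uid.startswith(prefix)]
    if not matches:
        raise ValueError(f"no uid starts with {prefix!r}")
    if len(matches) > 1:
        raise ValueError(f"{prefix!r} is ambiguous: {len(matches)} matches")
    return matches[0]


known_uids = [
    "1cea8e4c-08dc-4058-8c9d-413df7bf8a08",  # run start uid from the header above
    "1cfb0000-0000-0000-0000-000000000000",  # made-up uid, for illustration only
]

print(find_by_uid_prefix(known_uids, "1cea8"))
```

In this sketch a prefix such as `"1c"` matches both entries and raises an error, which is why a few more characters are safer than one or two.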

## The sophisticated way: get a Python generator of Event Documents

In [14]:
events = get_events(header)

In [15]:
events  # this is "lazy" -- the data is not yet loaded

<generator object get_events at 0x109dc7b88>

In [16]:
events = list(events)  # making it a list forces Python to actually go get the data

In [18]:
len(events)

220
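
The laziness shown above is ordinary Python generator behavior. The sketch below illustrates it with a hypothetical stand-in, `fake_get_events`, that records each "fetch" in a list; the event contents are made up and the real `get_events` may work differently internally.

```python
loaded = []  # records which events have actually been "fetched"


def fake_get_events(n_events):
    """Hypothetical stand-in for get_events: yields one event dict at a time."""
    for seq_num in range(1, n_events + 1):
        loaded.append(seq_num)  # pretend this is the expensive lookup
        yield {"seq_num": seq_num, "data": {"total_img_sum": float(seq_num)}}


events = fake_get_events(3)
print(len(loaded))  # 0: creating the generator fetches nothing

first = next(events)
print(len(loaded))  # 1: only the requested event was fetched

rest = list(events)
print(len(loaded))  # 3: list() drains the generator and fetches the rest
```

Note that a generator can be consumed only once, which is one reason to store the result of `list(events)` if the events are needed again.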

In [20]:
events[0]['data']  # this is a dict mapping field names to the actual data

{'img': array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        ..., 
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.]]),
 'img_sum_x': array([  0.00000000e+000,   0.00000000e+000,   0.00000000e+000,
          0.00000000e+000,   0.00000000e+000,   0.00000000e+000,
          0.00000000e+000,   0.00000000e+000,   0.00000000e+000,
          0.00000000e+000,   0.00000000e+000,   0.00000000e+000,
          0.00000000e+000,   0.00000000e+000,   0.00000000e+000,
          0.00000000e+000,   0.00000000e+000,   0.00000000e+000,
          0.00000000e+000,   0.00000000e+000,   0.00000000e+000,
          0.00000000e+000,   0.00000000e+000,   0.00000000e+000,
          0.00000000e+000,   0.00000000e+000,   0.00000000e+000,
          0.00000000e+000,   0.00000000e+000,   0.00000000e+000,
          0.00000000e+000,   0.00000000e+000

### If we don't want the image data, we can save time by not looking it up!

In [26]:
events = get_events(header, ['img_y_max', 'img_x_max'])  # we specify which fields we want

In [27]:
list(events)[0]['data']

{'img_x_max': 249.0, 'img_y_max': 250.0}
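
As a rough picture of what restricting the fields means for each Event, the sketch below keeps only the requested keys of the `data` dict. `select_fields` is a hypothetical helper, and the example event is made up apart from its two scalar values. Unlike this sketch, which filters after the fact, the point of passing field names to `get_events` is that the unwanted data is never looked up.

```python
def select_fields(events, fields):
    """Yield copies of events whose 'data' dict keeps only the requested fields."""
    wanted = set(fields)
    for event in events:
        data = {key: value for key, value in event["data"].items() if key in wanted}
        yield {**event, "data": data}


# Made-up event; the two scalar values match the output shown above.
example_events = [
    {
        "seq_num": 1,
        "data": {"img": "<large array>", "img_x_max": 249.0, "img_y_max": 250.0},
    },
]

filtered = list(select_fields(example_events, ["img_y_max", "img_x_max"]))
print(filtered[0]["data"])
```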

## The simpler way: get a pandas DataFrame directly

This is great for scalars. It technically works on *images* too, but pandas will be slow, so it doesn't really make sense for them.

In [30]:
table = get_table(header, ['img_y_max', 'img_x_max'])
table

,time,img_x_max,img_y_max
0,0.014693,249,250
1,0.988204,248,249
2,1.993141,246,249
3,3.002155,245,248
4,3.99294,245,247
5,5.017639,245,246
6,5.997496,245,245
7,7.005259,245,244
8,8.002993,245,244
9,9.010165,245,244
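
Conceptually, a table like this is the Events flattened into columns. The sketch below shows that idea with plain Python; `events_to_columns` is a hypothetical helper, not how `get_table` is implemented, and the event-like dicts are hand-written from the first three rows above.

```python
def events_to_columns(events, fields):
    """Collect the event time and the requested scalar fields into a dict of lists."""
    columns = {"time": []}
    columns.update({field: [] for field in fields})
    for event in events:
        columns["time"].append(event["time"])
        for field in fields:
            columns[field].append(event["data"][field])
    return columns


# First three rows of the table above, rewritten as event-like dicts.
rows = [
    {"time": 0.014693, "data": {"img_x_max": 249, "img_y_max": 250}},
    {"time": 0.988204, "data": {"img_x_max": 248, "img_y_max": 249}},
    {"time": 1.993141, "data": {"img_x_max": 246, "img_y_max": 249}},
]

columns = events_to_columns(rows, ["img_x_max", "img_y_max"])
print(columns["img_x_max"])
```

A dict of equal-length lists like `columns` is one of the inputs the `pandas.DataFrame` constructor accepts, so the step from here to a DataFrame is short.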


## The simpler way, Part 2: get Images as PIMS objects

In [31]:
imgs = Images(header, 'img')



In [32]:
imgs

<Frames>
Length: 20 frames
Frame Shape: 500 x 500
Pixel Datatype: float64

In [33]:
imgs[0]

In [35]:
for img in imgs[5:10]:
    print(img.sum())

94.2511454645
94.2569143278
94.2713332989
94.3054810717
94.382080473
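
The slicing loop above reads like ordinary list slicing, but an image sequence does not have to hold every frame in memory. The sketch below imitates that with a hypothetical `LazyFrames` class that loads a frame only when it is accessed; it is an illustration of the idea under that assumption, not the PIMS implementation, and `make_frame` produces made-up 2 x 2 frames.

```python
class LazyFrames:
    """Hypothetical stand-in for a PIMS-style sequence; frames load on access."""

    def __init__(self, n_frames, loader):
        self._n_frames = n_frames
        self._loader = loader  # callable: frame index -> frame

    def __len__(self):
        return self._n_frames

    def __getitem__(self, key):
        if isinstance(key, slice):
            # Resolve the slice to concrete indices, then load frames one by one.
            return (self._loader(i) for i in range(*key.indices(self._n_frames)))
        if key < 0:
            key += self._n_frames
        if not 0 <= key < self._n_frames:
            raise IndexError("frame index out of range")
        return self._loader(key)


def make_frame(i):
    """Made-up 2 x 2 frame whose pixels all equal the frame index."""
    return [[float(i)] * 2 for _ in range(2)]


frames = LazyFrames(20, make_frame)
sums = [sum(map(sum, frame)) for frame in frames[5:10]]
print(sums)  # [20.0, 24.0, 28.0, 32.0, 36.0]
```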
