# Tutorial about the LocData class

In [1]:
import numpy as np
import pandas as pd

import locan as sp

In [2]:
sp.show_versions(system=False, dependencies=False, verbose=False)


Locan:
   version: 0.7.dev3+gb9aca40

Python:
   version: 3.8.8


## Sample data

A localization has certain properties such as 'position_x'. A list of localizations can be assembled into a dataframe:

In [3]:
df = pd.DataFrame(
    {
        'position_x': np.arange(0,10),
        'position_y': np.random.random(10),
        'frame': np.arange(0,10),
    })

## Instantiate LocData from a dataframe

A LocData object carries localization data together with metadata and aggregated properties for the whole set of localizations.

We first instantiate a LocData object from the dataframe:

In [4]:
dat = sp.LocData.from_dataframe(dataframe=df)

In [5]:
attributes = [x for x in dir(dat) if not x.startswith('_')]
attributes

['alpha_shape',
 'bounding_box',
 'centroid',
 'concat',
 'convex_hull',
 'coordinate_labels',
 'coordinates',
 'count',
 'data',
 'dataframe',
 'dimension',
 'from_chunks',
 'from_collection',
 'from_coordinates',
 'from_dataframe',
 'from_selection',
 'indices',
 'inertia_moments',
 'meta',
 'oriented_bounding_box',
 'print_meta',
 'print_summary',
 'properties',
 'reduce',
 'references',
 'region',
 'reset',
 'update',
 'update_alpha_shape',
 'update_alpha_shape_in_references',
 'update_convex_hulls_in_references',
 'update_inertia_moments_in_references',
 'update_oriented_bounding_box_in_references']

## LocData attributes

The class variable LocData.count represents the number of LocData instances that currently exist.

In [6]:
print('LocData count: ', sp.LocData.count)

LocData count:  1
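
As an illustration of how such a counter can work, here is a minimal sketch in plain Python. The class `CountedData` is hypothetical and is not locan's implementation; it only assumes that the counter is raised when an instance is created and lowered when it is destroyed, which matches the counts printed throughout this tutorial.

```python
import gc


class CountedData:
    """Hypothetical stand-in for a class that tracks its live instances."""

    count = 0  # class variable shared by all instances

    def __init__(self):
        # Register the new instance with the class-level counter.
        type(self).count += 1

    def __del__(self):
        # Unregister the instance when it is destroyed.
        type(self).count -= 1


first = CountedData()
second = CountedData()
count_with_two = CountedData.count

del second
gc.collect()  # make sure the deleted instance has been finalized
count_after_delete = CountedData.count

print(count_with_two, count_after_delete)
```

The exact moment at which `__del__` runs depends on the Python implementation, so a counter of this kind is best read as a bookkeeping aid rather than a guarantee.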


The localization dataset is provided by the data attribute:

In [7]:
print(dat.data.head())

   position_x  position_y  frame
0           0    0.769188      0
1           1    0.418971      1
2           2    0.227971      2
3           3    0.133487      3
4           4    0.432302      4


Aggregated properties are provided by the attribute properties:

In [8]:
dat.properties

{'localization_count': 10,
 'position_x': 4.5,
 'position_y': 0.5108396139999967,
 'region_measure_bb': 6.650181060524167,
 'localization_density_bb': 1.5037184565335429,
 'subregion_measure_bb': 19.477818013449816}

Since spatial coordinates are central to most analyses, the attributes *coordinate_labels* and *dimension* report which columns hold them and how many there are:

In [9]:
dat.coordinate_labels

['position_x', 'position_y']

In [10]:
dat.dimension

2
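
One way to picture the relation between the two attributes is the following sketch. The helper `coordinate_labels` and the tuple `SPATIAL_LABELS` are assumptions made for illustration, not locan code: spatial columns are picked by name, and the dimension is the number of labels found.

```python
# Hypothetical list of column names that count as spatial coordinates.
SPATIAL_LABELS = ("position_x", "position_y", "position_z")


def coordinate_labels(columns):
    """Return the spatial column names present in `columns`, in a fixed order."""
    return [label for label in SPATIAL_LABELS if label in columns]


columns = ["position_x", "position_y", "frame"]
labels = coordinate_labels(columns)
dimension = len(labels)  # number of spatial coordinates per localization

print(labels, dimension)
```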

A numpy array of spatial coordinates is returned by:

In [11]:
dat.coordinates

array([[0.        , 0.76918841],
       [1.        , 0.41897125],
       [2.        , 0.22797103],
       [3.        , 0.13348687],
       [4.        , 0.43230177],
       [5.        , 0.84072741],
       [6.        , 0.17349019],
       [7.        , 0.74926937],
       [8.        , 0.87239587],
       [9.        , 0.49059396]])
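
The aggregated values reported by dat.properties above can be checked by hand against these coordinates. The sketch below uses plain Python and assumes that the `_bb` suffix refers to the axis-aligned bounding box, with region_measure_bb as its area and subregion_measure_bb as its perimeter. The printed numbers are consistent with that reading, but it is an interpretation and not something this tutorial states.

```python
# Coordinates as printed by dat.coordinates above (rounded to 8 decimals).
coordinates = [
    (0.0, 0.76918841),
    (1.0, 0.41897125),
    (2.0, 0.22797103),
    (3.0, 0.13348687),
    (4.0, 0.43230177),
    (5.0, 0.84072741),
    (6.0, 0.17349019),
    (7.0, 0.74926937),
    (8.0, 0.87239587),
    (9.0, 0.49059396),
]

xs = [x for x, _ in coordinates]
ys = [y for _, y in coordinates]

count = len(coordinates)
centroid = (sum(xs) / count, sum(ys) / count)

# Axis-aligned bounding box: side lengths, area and perimeter.
width = max(xs) - min(xs)
height = max(ys) - min(ys)
area = width * height
perimeter = 2 * (width + height)
density = count / area

print(centroid, area, density, perimeter)
```

Because the input is rounded, the results agree with the values in dat.properties only to about six decimal places.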

## Metadata 

Metadata is provided by the attribute meta and can be printed with:

In [12]:
dat.print_meta()

identifier: "1"
creation_date: "2021-03-04 13:47:07 +0100"
source: DESIGN
state: RAW
history {
  name: "LocData.from_dataframe"
}
element_count: 10
frame_count: 10



A summary of the most important metadata is printed as:

In [13]:
dat.print_summary()

identifier: "1"
comment: ""
creation_date: "2021-03-04 13:47:07 +0100"
modification_date: ""
source: DESIGN
state: RAW
element_count: 10
frame_count: 10
file_type: UNKNOWN_FILE_TYPE
file_path: ""



Metadata fields can be printed and changed individually:

In [14]:
print(dat.meta.comment)
dat.meta.comment = 'user comment'
print(dat.meta.comment)


user comment


LocData.meta.map represents a dictionary structure that can be filled by the user. Both key and value have to be strings; otherwise a TypeError is raised.

In [15]:
print(dat.meta.map)
dat.meta.map['user field'] = 'more information'
print(dat.meta.map)

{}
{'user field': 'more information'}
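
The string-only constraint can be pictured with a small plain-Python analog. The class `StringMap` is hypothetical and only mimics the behaviour described above; it says nothing about how locan stores its metadata.

```python
class StringMap(dict):
    """Hypothetical dict that accepts only string keys and string values.

    Only item assignment is checked in this sketch.
    """

    def __setitem__(self, key, value):
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError("both key and value must be strings")
        super().__setitem__(key, value)


user_map = StringMap()
user_map["user field"] = "more information"

try:
    user_map["run"] = 3  # the value is an int, so the assignment is rejected
    rejected = False
except TypeError:
    rejected = True

print(dict(user_map), rejected)
```

In practice this means that numbers or other objects have to be converted with `str()` before they are stored in the map.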


Metadata can also be added at instantiation:

In [16]:
dat_2 = sp.LocData.from_dataframe(dataframe=df, meta={'identifier': 'myID_1', 
                                                   'comment': 'my own user comment'})
dat_2.print_summary()

identifier: "myID_1"
comment: "my own user comment"
creation_date: "2021-03-04 13:47:07 +0100"
modification_date: ""
source: DESIGN
state: RAW
element_count: 10
frame_count: 10
file_type: UNKNOWN_FILE_TYPE
file_path: ""



## Instantiate LocData from a selection

A LocData object can also be instantiated from a selection of localizations. In this case the LocData object keeps a reference to the original locdata together with a list of indices (or a slice object). The new dataset is assembled when the data attribute is requested.

*Typically a selection is derived using a selection method, so calling LocData.from_selection() directly is rarely necessary.*

In [17]:
dat_2 = sp.LocData.from_selection(dat, indices=[1,2,3,4])
dat_3 = sp.LocData.from_selection(dat, indices=[5,6,7,8])

print('count: ', sp.LocData.count)
print('')
print(dat_2.data)

count:  3

   position_x  position_y  frame
1           1    0.418971      1
2           2    0.227971      2
3           3    0.133487      3
4           4    0.432302      4


In [18]:
dat_2.print_summary()

identifier: "3"
comment: "user comment"
creation_date: "2021-03-04 13:47:07 +0100"
modification_date: "2021-03-04 13:47:07 +0100"
source: DESIGN
state: MODIFIED
element_count: 4
frame_count: 4
file_type: UNKNOWN_FILE_TYPE
file_path: ""



The reference is kept with the selection, as are the indices; both can be read through the attributes references and indices.

In [19]:
print(dat_2.references)
print(dat_2.indices)

<locan.data.locdata.LocData object at 0x000002404856D340>
[1, 2, 3, 4]


The reference is the same for both selections.

In [20]:
print(dat_2.references is dat_3.references)

True
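
The mechanism described in this section can be sketched in a few lines of plain Python. The class `Selection` is a hypothetical illustration, not locan's implementation: it stores a reference and the indices, and assembles the selected rows only when `data` is requested.

```python
class Selection:
    """Hypothetical selection: a reference plus indices, no copied rows."""

    def __init__(self, reference, indices):
        self.reference = reference  # the original rows, shared and not copied
        self.indices = indices      # a list of positions or a slice object

    @property
    def data(self):
        # Assemble the selected rows only when they are requested.
        if isinstance(self.indices, slice):
            return self.reference[self.indices]
        return [self.reference[i] for i in self.indices]


# Stand-in for ten localizations, each given as (position_x, frame).
rows = [(x, x) for x in range(10)]

sel_a = Selection(rows, [1, 2, 3, 4])
sel_b = Selection(rows, slice(5, 9))

print(sel_a.data)
print(sel_b.data)
print(sel_a.reference is sel_b.reference)
```

Keeping only indices makes many selections of a large dataset cheap, at the cost of keeping the original data alive for as long as any selection refers to it.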


## Instantiate LocData from a collection

A LocData object can further be instantiated from a collection of other LocData objects.

In [21]:
del(dat_2, dat_3)

dat_1 = sp.LocData.from_selection(dat, indices=[0,1,2])
dat_2 = sp.LocData.from_selection(dat, indices=[3,4,5])
dat_3 = sp.LocData.from_selection(dat, indices=[6,7,8])
dat_c = sp.LocData.from_collection(locdatas=[dat_1, dat_2, dat_3], meta={'identifier': 'my_collection'})

print('count: ', sp.LocData.count, '\n')
print(dat_c.data, '\n')
print(dat_c.properties, '\n')
dat_c.print_summary()

count:  5 

   localization_count  position_x  position_y  region_measure_bb  \
0                   3         1.0    0.472044           1.082435   
1                   3         4.0    0.468839           1.414481   
2                   3         7.0    0.598385           1.397811   

   localization_density_bb  subregion_measure_bb  
0                 2.771530              5.082435  
1                 2.120919              5.414481  
2                 2.146212              5.397811   

{'localization_count': 3, 'position_x': 4.0, 'position_y': 0.5130891310274004, 'region_measure_bb': 0.7772787650400116, 'localization_density_bb': 3.8596191417188277, 'subregion_measure_bb': 12.259092921680004} 

identifier: "my_collection"
comment: ""
creation_date: "2021-03-04 13:47:07 +0100"
modification_date: ""
source: DESIGN
state: RAW
element_count: 3
frame_count: 0
file_type: UNKNOWN_FILE_TYPE
file_path: ""
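
The numbers printed above suggest that each collected object contributes one row holding its own aggregated properties, and that the properties of the collection are then computed from those rows. The following plain-Python sketch checks this reading against the first dataset. It is an interpretation of the output, with the coordinates copied from the dat.coordinates printout and rounded to 8 decimals.

```python
# First nine localizations of the first dataset (rounded to 8 decimals).
coordinates = [
    (0.0, 0.76918841),
    (1.0, 0.41897125),
    (2.0, 0.22797103),
    (3.0, 0.13348687),
    (4.0, 0.43230177),
    (5.0, 0.84072741),
    (6.0, 0.17349019),
    (7.0, 0.74926937),
    (8.0, 0.87239587),
]
groups = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]  # indices used for dat_1, dat_2, dat_3


def centroid(points):
    """Return the mean x and mean y of a list of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


# One centroid per collected object: these are the rows of the collection.
element_centroids = [centroid([coordinates[i] for i in idx]) for idx in groups]

# Collection properties are computed from the element centroids.
collection_centroid = centroid(element_centroids)
cx = [x for x, _ in element_centroids]
cy = [y for _, y in element_centroids]
collection_bb_area = (max(cx) - min(cx)) * (max(cy) - min(cy))

print(element_centroids)
print(collection_centroid, collection_bb_area)
```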



In this case the references are also kept, so that the original localizations of the collected LocData objects remain available on request.

In [22]:
print(dat_c.references)

[<locan.data.locdata.LocData object at 0x0000024048449F10>, <locan.data.locdata.LocData object at 0x0000024048449FD0>, <locan.data.locdata.LocData object at 0x000002404856D550>]


If the collected LocData objects are no longer needed and should be released for garbage collection, the references can be deleted with a dedicated LocData method:

In [23]:
dat_c.reduce()
print(dat_c.references)

None
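
Why dropping the references matters can be shown with a small, self-contained sketch. The classes `Element` and `Collection` are hypothetical stand-ins, and `weakref` is used only to observe whether an object is still alive.

```python
import gc
import weakref


class Element:
    """Hypothetical stand-in for a collected object."""


class Collection:
    """Hypothetical collection that can drop its references."""

    def __init__(self, elements):
        self.references = list(elements)

    def reduce(self):
        # Drop the references so the elements can be garbage collected.
        self.references = None


element = Element()
probe = weakref.ref(element)  # does not keep the element alive
collection = Collection([element])

del element
gc.collect()
alive_before = probe() is not None  # still held by the collection

collection.reduce()
gc.collect()
alive_after = probe() is not None  # nothing refers to the element any more

print(alive_before, alive_after)
```

As long as the collection holds a reference, deleting the name `element` alone does not free the object.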


## Concatenating LocData objects 

Let's create a second dataset with localization data:

In [24]:
del(dat_2)

df_2 = pd.DataFrame(
    {
        'position_x': np.arange(0,10),
        'position_y': np.random.random(10),
        'frame': np.arange(0,10),
    })

dat_2 = sp.LocData.from_dataframe(dataframe=df_2)

print('First locdata:')
print(dat.data.head())
print('')
print('Second locdata:')
print(dat_2.data.head())

First locdata:
   position_x  position_y  frame
0           0    0.769188      0
1           1    0.418971      1
2           2    0.227971      2
3           3    0.133487      3
4           4    0.432302      4

Second locdata:
   position_x  position_y  frame
0           0    0.740224      0
1           1    0.400304      1
2           2    0.719866      2
3           3    0.774066      3
4           4    0.030463      4


In order to combine two sets of localization data into a single LocData object use the class method *LocData.concat*:

In [25]:
dat_new = sp.LocData.concat([dat, dat_2])
print('Number of localizations in dat_new: ', len(dat_new))
dat_new.data.head()

Number of localizations in dat_new:  20


   position_x  position_y  frame
0           0    0.769188      0
1           1    0.418971      1
2           2    0.227971      2
3           3    0.133487      3
4           4    0.432302      4


## Modifying data in place

If localization data has been modified in place, i.e. the `dataframe` attribute has been changed, all properties and hulls must be recomputed. This is best done by instantiating a new LocData object with `LocData.from_dataframe()`, but it can also be done with the `LocData.reset()` method.

In [26]:
del(df, dat)

df = pd.DataFrame(
    {
        'position_x': np.arange(0,10),
        'position_y': np.random.random(10),
        'frame': np.arange(0,10),
    })

dat = sp.LocData.from_dataframe(dataframe=df)

print(dat.data.head())

   position_x  position_y  frame
0           0    0.160544      0
1           1    0.246683      1
2           2    0.801481      2
3           3    0.257928      3
4           4    0.307097      4


In [27]:
dat.centroid

array([4.5       , 0.31648899])

Now if localization data is changed in place (which you should not do unless you have a good reason), properties and bounding box are not automatically adjusted.

In [28]:
dat.dataframe = pd.DataFrame(
    {
        'position_x': np.arange(0,8),
        'position_y': np.random.random(8),
        'frame': np.arange(0,8),
    })

print(dat.data.head())

   position_x  position_y  frame
0           0    0.448632      0
1           1    0.496115      1
2           2    0.180043      2
3           3    0.382727      3
4           4    0.214865      4


In [29]:
dat.centroid  # so this returns incorrect values here

array([4.5       , 0.31648899])

Update them by instantiating a new LocData object from the changed data:

In [30]:
new_dat = sp.LocData.from_dataframe(dataframe=dat.data)

In [31]:
new_dat.centroid

array([3.5       , 0.42199954])

In [32]:
new_dat.meta

identifier: "12"
creation_date: "2021-03-04 13:47:07 +0100"
source: DESIGN
state: RAW
history {
  name: "LocData.from_dataframe"
}
element_count: 8
frame_count: 8

Alternatively you can use `reset()`. In this case, however, the metadata is not updated and will provide wrong information: below, element_count still reports 10 although the data now holds 8 localizations.

In [33]:
dat.reset()

<locan.data.locdata.LocData at 0x240485f2ee0>

In [34]:
dat.centroid

array([3.5       , 0.42199954])

In [35]:
dat.meta

identifier: "11"
creation_date: "2021-03-04 13:47:07 +0100"
source: DESIGN
state: RAW
history {
  name: "LocData.from_dataframe"
}
element_count: 10
frame_count: 10
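
The behaviour shown in this section follows from computing derived values once and storing them. The sketch below reproduces the pattern in plain Python; the class `CachedData` is hypothetical and tracks only a single coordinate, so it is an analogy and not locan's implementation.

```python
class CachedData:
    """Hypothetical container that computes derived values once."""

    def __init__(self, xs):
        self.xs = list(xs)
        self.meta = {"element_count": len(self.xs)}  # written at creation only
        self._compute()

    def _compute(self):
        # Derived value stored on the instance, not recomputed on access.
        self.centroid_x = sum(self.xs) / len(self.xs)

    def reset(self):
        # Recompute derived values; the metadata is left untouched.
        self._compute()
        return self


data = CachedData(range(10))
data.xs = list(range(8))      # in-place change of the underlying data
stale = data.centroid_x       # still the value computed from ten points

data.reset()
fresh = data.centroid_x       # recomputed from eight points

print(stale, fresh, data.meta["element_count"])
```

Creating a new object from the changed data avoids the mismatch, because both the derived values and the metadata are then written from the same input.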

## Copy LocData

Shallow and deep copies can be made from LocData instances. In either case the class variable count and the metadata are not just copied but adjusted accordingly.

In [36]:
print('count: ', sp.LocData.count)
print('')
print(dat_2.meta)

count:  8

identifier: "9"
creation_date: "2021-03-04 13:47:07 +0100"
source: DESIGN
state: RAW
history {
  name: "LocData.from_dataframe"
}
element_count: 10
frame_count: 10



In [37]:
from copy import copy, deepcopy

print('count before: ', sp.LocData.count)
dat_copy = copy(dat_2)
dat_deepcopy = deepcopy(dat_2)
print('count after: ', sp.LocData.count)

count before:  8
count after:  10


In [38]:
print(dat_copy.meta)

identifier: "13"
creation_date: "2021-03-04 13:47:07 +0100"
modification_date: "2021-03-04 13:47:07 +0100"
source: DESIGN
state: MODIFIED
history {
  name: "LocData.from_dataframe"
}
history {
  name: "LocData.copy"
  parameter: "None"
}
ancestor_identifiers: "9"
element_count: 10
frame_count: 10



In [39]:
print(dat_deepcopy.meta)

identifier: "14"
creation_date: "2021-03-04 13:47:07 +0100"
modification_date: "2021-03-04 13:47:07 +0100"
source: DESIGN
state: MODIFIED
history {
  name: "LocData.from_dataframe"
}
history {
  name: "LocData.deepcopy"
  parameter: "None"
}
ancestor_identifiers: "9"
element_count: 10
frame_count: 10
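
Python lets a class customize copying through the `__copy__` and `__deepcopy__` hooks, which is one way to obtain copies with adjusted metadata. The sketch below illustrates the idea with a hypothetical class `Tracked`; the field names are chosen to resemble the output above and are assumptions, not locan's implementation.

```python
import itertools
from copy import copy, deepcopy


class Tracked:
    """Hypothetical class whose copies receive fresh metadata."""

    _ids = itertools.count(1)  # source of new identifiers

    def __init__(self, rows):
        self.rows = rows
        self.meta = {
            "identifier": str(next(self._ids)),
            "history": ["Tracked.__init__"],
            "ancestor_identifiers": [],
        }

    def _derive(self, rows, step):
        # Build a new object, then record where it came from.
        new = Tracked(rows)
        new.meta["history"] = self.meta["history"] + [step]
        new.meta["ancestor_identifiers"] = [self.meta["identifier"]]
        return new

    def __copy__(self):
        # Shallow copy: the rows object is shared with the original.
        return self._derive(self.rows, "Tracked.copy")

    def __deepcopy__(self, memo):
        # Deep copy: the rows are duplicated as well.
        return self._derive(deepcopy(self.rows, memo), "Tracked.deepcopy")


original = Tracked([[0, 0.74], [1, 0.40]])
shallow = copy(original)
deep = deepcopy(original)

print(shallow.meta)
print(deep.meta)
```

The shallow copy shares its rows with the original while the deep copy owns an independent duplicate, but both carry a new identifier and name the original as their ancestor.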

