# Tutorial about the LocData class

In [None]:
import numpy as np
import pandas as pd

import locan as lc

In [None]:
lc.show_versions(system=False, dependencies=False, verbose=False)

## Sample data

A localization has certain properties such as 'position_x'. A list of localizations can be assembled into a dataframe:

In [None]:
df = pd.DataFrame(
    {
        'position_x': np.arange(0,10),
        'position_y': np.random.random(10),
        'frame': np.arange(0,10),
    })

## Instantiate LocData from a dataframe

A LocData object carries localization data together with metadata and aggregated properties for the whole set of localizations.

We first instantiate a LocData object from the dataframe:

In [None]:
locdata = lc.LocData.from_dataframe(dataframe=df)

In [None]:
attributes = [x for x in dir(locdata) if not x.startswith('_')]
attributes

## LocData attributes

The class attribute `LocData.count` holds the number of LocData instances that currently exist.

In [None]:
print('LocData count: ', lc.LocData.count)
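
The idea of a class-level instance counter can be sketched in plain Python. The `Tracked` class below is hypothetical and is not locan's implementation; it only illustrates, under that assumption, how a class attribute can follow the number of live instances.

```python
class Tracked:
    """Hypothetical stand-in for a class that counts its live instances."""

    count = 0  # class attribute shared by all instances

    def __init__(self):
        type(self).count += 1

    def __del__(self):
        # Runs when the instance is garbage collected.
        type(self).count -= 1


a = Tracked()
b = Tracked()
count_with_two = Tracked.count

del b  # in CPython the instance is collected once its last reference is gone
count_after_del = Tracked.count

print(count_with_two, count_after_del)
```

Because the counter lives on the class, it is read as `Tracked.count` rather than from a single instance.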

The localization dataset is provided by the data attribute:

In [None]:
print(locdata.data.head())

Aggregated properties are provided by the attribute `properties`. For example, the property `position_x` is the mean of `position_x` over all localizations. The name is kept because the aggregated dataset can be treated as a single localization event with its own `position_x`. This is used when dealing with data clusters.

In [None]:
locdata.properties
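
As a rough illustration of what aggregation means here, the mean of a coordinate column can be computed directly with pandas. This sketch uses only pandas and makes no claim about how locan computes `properties` internally.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"position_x": np.arange(0, 10), "frame": np.arange(0, 10)})

# The aggregated `position_x` of the whole set is the mean over all localizations.
aggregated_position_x = df["position_x"].mean()
print(aggregated_position_x)  # 4.5
```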

Since spatial coordinates are central to most analyses, the attributes `coordinate_labels` and `dimension` report which coordinates are present and how many:

In [None]:
locdata.coordinate_labels

In [None]:
locdata.dimension

A numpy array of spatial coordinates is returned by:

In [None]:
locdata.coordinates
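
The relation between coordinate labels, dimension, and the coordinate array can be sketched with plain pandas. The rule that coordinate columns are those whose names start with `position_` is an assumption made for this example, not a statement about locan's internals.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "position_x": np.arange(0, 10),
        "position_y": np.random.random(10),
        "frame": np.arange(0, 10),
    }
)

# Assumption for this sketch: spatial columns are those named "position_*".
coordinate_labels = [name for name in df.columns if name.startswith("position_")]
dimension = len(coordinate_labels)

# One row per localization, one column per spatial coordinate.
coordinates = df[coordinate_labels].to_numpy()
print(coordinate_labels, dimension, coordinates.shape)
```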

## Metadata 

For detailed information see the `Tutorial about metadata`.

Metadata is provided by the attribute `meta` and can be printed with:

In [None]:
locdata.print_meta()

A summary of the most important metadata fields is printed as:

In [None]:
locdata.print_summary()

Metadata fields can be printed and changed individually:

In [None]:
print(locdata.meta.comment)
locdata.meta.comment = 'user comment'
print(locdata.meta.comment)

`LocData.meta.map` is a dictionary-like structure that the user can fill. Both key and value must be strings; otherwise a TypeError is raised.

In [None]:
print(locdata.meta.map)
locdata.meta.map['user field'] = 'more information'
print(locdata.meta.map)
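
Since only string pairs are accepted, other values have to be converted before they are stored. A minimal sketch of that guard, using a plain dict in place of `locdata.meta.map` and a hypothetical helper `set_user_field`:

```python
def set_user_field(mapping, key, value):
    """Hypothetical helper: store a user field only if key and value are strings."""
    if not isinstance(key, str) or not isinstance(value, str):
        raise TypeError("key and value must both be strings")
    mapping[key] = value


user_map = {}  # plain dict standing in for locdata.meta.map
set_user_field(user_map, "user field", "more information")

rejected = None
try:
    set_user_field(user_map, "n_repeats", 3)  # integer value is rejected
except TypeError as error:
    rejected = str(error)

# Non-string values can be converted explicitly before storing them.
set_user_field(user_map, "n_repeats", str(3))
print(user_map)
```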

Metadata can also be added at instantiation:

In [None]:
locdata_2 = lc.LocData.from_dataframe(dataframe=df, meta={'identifier': 'myID_1', 
                                                   'comment': 'my own user comment'})
locdata_2.print_summary()

## Instantiate locdata from selection

A LocData object can also be instantiated from a selection of localizations. In this case the LocData object keeps a reference to the original locdata together with a list of indices (or a slice object). The new dataset is assembled when the `data` attribute is requested.

*A selection is typically derived with a selection method, so calling LocData.from_selection() directly is rarely necessary.*

In [None]:
locdata_2 = lc.LocData.from_selection(locdata, indices=[1,2,3,4])
locdata_3 = lc.LocData.from_selection(locdata, indices=[5,6,7,8])

print('count: ', lc.LocData.count)
print('')
print(locdata_2.data)

In [None]:
locdata_2.print_summary()

The reference is kept in the attribute `references` and the indices in the attribute `indices`.

In [None]:
print(locdata_2.references)
print(locdata_2.indices)

The reference is the same for both selections.

In [None]:
print(locdata_2.references is locdata_3.references)
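
The reference-plus-indices pattern can be sketched with a small class on top of pandas. `SelectionView` is a hypothetical name and this is not locan's implementation; it only shows, under that assumption, how data can be assembled on request while several selections share one original.

```python
import pandas as pd


class SelectionView:
    """Hypothetical sketch: a selection stores a reference and indices, not a copy."""

    def __init__(self, reference, indices):
        self.reference = reference  # the original dataframe, not a copy
        self.indices = indices      # list of index labels

    @property
    def data(self):
        # The selected rows are assembled only when they are requested.
        return self.reference.loc[self.indices]


original = pd.DataFrame({"position_x": range(10), "frame": range(10)})
view_a = SelectionView(original, [1, 2, 3, 4])
view_b = SelectionView(original, [5, 6, 7, 8])

print(view_a.data)
print(view_a.reference is view_b.reference)  # True: both point to the same object
```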

## Instantiate locdata from collection

A LocData object can also be instantiated from a collection of other LocData objects.

In [None]:
del locdata_2, locdata_3

locdata_1 = lc.LocData.from_selection(locdata, indices=[0,1,2])
locdata_2 = lc.LocData.from_selection(locdata, indices=[3,4,5])
locdata_3 = lc.LocData.from_selection(locdata, indices=[6,7,8])
locdata_c = lc.LocData.from_collection(locdatas=[locdata_1, locdata_2, locdata_3], meta={'identifier': 'my_collection'})

print('count: ', lc.LocData.count, '\n')
print(locdata_c.data, '\n')
print(locdata_c.properties, '\n')
locdata_c.print_summary()

In this case the references are also kept, so that the original localizations of the collected LocData objects remain available on request.

In [None]:
print(locdata_c.references)

If the collected LocData objects are no longer needed and should be freed for garbage collection, the references can be deleted with a dedicated LocData method:

In [None]:
locdata_c.reduce()
print(locdata_c.references)

## Concatenating LocData objects 

Let's create a second dataset with localization data:

In [None]:
del locdata_2

df_2 = pd.DataFrame(
    {
        'position_x': np.arange(0,10),
        'position_y': np.random.random(10),
        'frame': np.arange(0,10),
    })

locdata_2 = lc.LocData.from_dataframe(dataframe=df_2)

print('First locdata:')
print(locdata.data.head())
print('')
print('Second locdata:')
print(locdata_2.data.head())

In order to combine two sets of localization data from two LocData objects into a single LocData object use the class method *LocData.concat*:

In [None]:
locdata_new = lc.LocData.concat([locdata, locdata_2])
print('Number of localizations in locdata_new: ', len(locdata_new))
locdata_new.data.head()
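
At the level of the underlying tables, concatenation amounts to stacking rows. The sketch below uses `pandas.concat` on two plain dataframes; renumbering the rows with `ignore_index=True` is a choice made for this example, and no claim is made that `LocData.concat` handles the index the same way.

```python
import numpy as np
import pandas as pd

df_a = pd.DataFrame({"position_x": np.arange(0, 10), "frame": np.arange(0, 10)})
df_b = pd.DataFrame({"position_x": np.arange(10, 20), "frame": np.arange(0, 10)})

# Stack the rows of both tables; ignore_index=True renumbers the rows 0..19.
combined = pd.concat([df_a, df_b], ignore_index=True)
print(len(combined))  # 20
```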

## Modifying data in place

If localization data has been modified in place, i.e. the `dataframe` attribute has been changed, all properties and hulls must be recomputed. This is best done by instantiating a new LocData object with `LocData.from_dataframe()`, but it can also be done with the `LocData.reset()` method.

In [None]:
del df, locdata

df = pd.DataFrame(
    {
        'position_x': np.arange(0,10),
        'position_y': np.random.random(10),
        'frame': np.arange(0,10),
    })

locdata = lc.LocData.from_dataframe(dataframe=df)

print(locdata.data.head())

In [None]:
locdata.centroid

Now if localization data is changed in place (which you should not do unless you have a good reason), properties and bounding box are not automatically adjusted.

In [None]:
locdata.dataframe = pd.DataFrame(
    {
        'position_x': np.arange(0,8),
        'position_y': np.random.random(8),
        'frame': np.arange(0,8),
    })

print(locdata.data.head())

In [None]:
locdata.centroid  # so this returns incorrect values here

Update them by instantiating a new LocData object from the modified data:

In [None]:
locdata_new = lc.LocData.from_dataframe(dataframe=locdata.data)

In [None]:
locdata_new.centroid

In [None]:
locdata_new.meta

Alternatively, you can use `reset()`. In this case, however, the metadata is not updated and will contain outdated information.

In [None]:
locdata.reset()

In [None]:
locdata.centroid

In [None]:
locdata.meta
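
Why an in-place change leaves aggregated values stale can be shown with a toy container that computes its aggregate once. `CachedCentroid` is hypothetical and far simpler than LocData; it only mirrors the pattern of a cached value that needs an explicit `reset()`.

```python
class CachedCentroid:
    """Hypothetical container that computes an aggregate once, at construction."""

    def __init__(self, values):
        self.values = values
        self.reset()

    def reset(self):
        # Recompute the cached aggregate from the current values.
        self.centroid = sum(self.values) / len(self.values)
        return self


box = CachedCentroid([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
before = box.centroid  # 4.5

box.values = [0, 1, 2, 3, 4, 5, 6, 7]  # swap the data in place
stale = box.centroid  # still 4.5: the cache was not refreshed

box.reset()
fresh = box.centroid  # 3.5
print(before, stale, fresh)
```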

## Copy LocData

Shallow and deep copies can be made from LocData instances. In either case the class variable `count` and the metadata are not just copied but adjusted accordingly.

In [None]:
print('count: ', lc.LocData.count)
print('')
print(locdata_2.meta)

In [None]:
from copy import copy, deepcopy

print('count before: ', lc.LocData.count)
locdata_copy = copy(locdata_2)
locdata_deepcopy = deepcopy(locdata_2)
print('count after: ', lc.LocData.count)

In [None]:
print(locdata_copy.meta)

In [None]:
print(locdata_deepcopy.meta)
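
Python lets a class customize copying through the `__copy__` and `__deepcopy__` hooks, which is one way to adjust a counter and per-instance metadata instead of duplicating them. The `Record` class and its `identifier` field below are hypothetical and only sketch that mechanism; locan's own handling may differ.

```python
from copy import copy, deepcopy


class Record:
    """Hypothetical class whose copies get a fresh identifier."""

    count = 0

    def __init__(self, payload):
        type(self).count += 1
        self.identifier = str(type(self).count)
        self.payload = payload

    def __copy__(self):
        # Shallow copy: the payload object is shared with the original.
        return type(self)(self.payload)

    def __deepcopy__(self, memo):
        # Deep copy: the payload is duplicated as well.
        return type(self)(deepcopy(self.payload, memo))


original = Record({"position_x": [0, 1, 2]})
shallow = copy(original)
deep = deepcopy(original)

print(Record.count)  # 3
print(original.identifier, shallow.identifier, deep.identifier)  # 1 2 3
print(shallow.payload is original.payload)  # True
print(deep.payload is original.payload)  # False
```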

## Adding a property

Any property that is computed for a set of localizations (and represented as a Python dictionary) can be added to the LocData object. As an example, we compute the maximum distance between any two localizations and add `max_distance` as a new property to `locdata`.

In [None]:
max_distance = lc.max_distance(locdata)
max_distance

In [None]:
locdata.properties.update(max_distance)
locdata.properties
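
The same pattern, a function that returns a one-entry dictionary which is then merged into a properties dictionary, can be sketched with NumPy alone. `max_pairwise_distance` is a hypothetical brute-force helper written for this example; it is not `lc.max_distance`, which may use a different algorithm.

```python
import numpy as np


def max_pairwise_distance(points):
    """Hypothetical helper: largest Euclidean distance between any two points."""
    # Differences between every pair of points, shape (n, n, dimension).
    diffs = points[:, np.newaxis, :] - points[np.newaxis, :, :]
    distances = np.sqrt((diffs**2).sum(axis=-1))
    return {"max_distance": float(distances.max())}


points = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]])
properties = {"localization_count": len(points)}
properties.update(max_pairwise_distance(points))
print(properties)  # {'localization_count': 3, 'max_distance': 5.0}
```

The brute-force approach needs memory quadratic in the number of points, so it is only suitable for small sets.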

## Adding a property to each localization in LocData.data

If you have processed your data and derived a new property for each localization in the LocData object, this property can be added to `data`. As an example, we compute the nearest-neighbor distance for each localization and add `nn_distance` as a new property.

In [None]:
locdata.data

In [None]:
nn = lc.NearestNeighborDistances().compute(locdata)
nn.results

To add `nn_distance` as a new property to each localization in the LocData object, use the `pandas.DataFrame.assign` method on `locdata.dataframe`.

In [None]:
locdata.dataframe = locdata.dataframe.assign(nn_distance=nn.results['nn_distance'])
locdata.data

### Adding nn_distance as new property to each localization in LocData object with dataframe=None

If the LocData object was created with `LocData.from_selection()`, the `LocData.dataframe` attribute is None and `LocData.data` is generated from the referenced locdata and the index list.

In this case LocData.dataframe can still be filled with additional data that is merged upon returning LocData.data.

In [None]:
locdata_selection = lc.LocData.from_selection(locdata, indices=[1, 3, 4, 5])
locdata_selection.data

In [None]:
locdata_selection.dataframe

In [None]:
nn_selection = lc.NearestNeighborDistances().compute(locdata_selection)
nn_selection.results

Make sure the indices in `nn_selection.results` match those in `locdata_selection.data`:

In [None]:
locdata_selection.data.index

In [None]:
nn_selection.results.index = locdata_selection.data.index
nn_selection.results
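
The index has to match because `pandas.DataFrame.assign` aligns a Series on index labels rather than on row position. A small sketch with made-up values, using plain dataframes in place of the locan objects:

```python
import pandas as pd

selection = pd.DataFrame({"position_x": [1, 3, 4, 5]}, index=[1, 3, 4, 5])
results = pd.DataFrame({"nn_distance": [2.0, 1.0, 1.0, 1.0]})  # default index 0..3

# assign() aligns on index labels, so only labels 1 and 3 find a partner.
misaligned = selection.assign(nn_distance=results["nn_distance"])
print(misaligned["nn_distance"].isna().sum())  # 2

# After copying the index over, every row receives its value.
results.index = selection.index
aligned = selection.assign(nn_distance=results["nn_distance"])
print(aligned["nn_distance"].tolist())  # [2.0, 1.0, 1.0, 1.0]
```

Without the index fix, pandas raises no error; the unmatched rows are silently filled with NaN, and the matched rows can receive values that belong to other localizations.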

Then assign the corresponding result to dataframe:

In [None]:
locdata_selection.dataframe = locdata_selection.dataframe.assign(nn_distance=nn_selection.results['nn_distance'])
locdata_selection.dataframe

Calling `data` will return the complete dataset.

In [None]:
locdata_selection.data