# Indexing, Selecting and Assigning


Import the LArray library:


In [None]:
from larray import *

Import the test array ``pop``:


In [None]:
# let's start by loading the 'pop' array from the demography example dataset
pop = load_example_data('demography').pop
pop

## Selecting (Subsets)

LArray lets you select a subset of an array either by labels or by indices (positions).


### Selecting by Labels

To take a subset of an array using labels, use brackets [ ].

Let's start by selecting a single element:


In [None]:
# here we select the value associated with Belgian women
# of age 50 from Brussels region for the year 2015
pop[2015, 'BruCap', 50, 'F', 'BE']

Next, select a subset using slices and lists of labels:


In [None]:
# here we select the subset associated with Belgian women of age 50, 51 and 52
# from Brussels region for the years 2010 to 2016
pop[2010:2016, 'BruCap', 50:52, 'F', 'BE']

In [None]:
# slice bounds are optional:
# if not given, start is assumed to be the first label and stop the last one.
# Here we select all years starting from 2010
pop[2010:, 'BruCap', 50:52, 'F', 'BE']

In [None]:
# Slices can also have a step (defaults to 1), to take every Nth label
# Here we select all even years starting from 2010
pop[2010::2, 'BruCap', 50:52, 'F', 'BE']

In [None]:
# one can also use a list of labels to take non-contiguous labels.
# Here we select years 2008, 2010, 2013 and 2015
pop[[2008, 2010, 2013, 2015], 'BruCap', 50:52, 'F', 'BE']

The order in which the keys are given does not matter, so you usually do not need to remember axis positions during computation. The axis order only matters for the output.


In [None]:
# the order of the keys doesn't matter
pop['F', 'BE', 'BruCap', [2008, 2010, 2013, 2015], 50:52]

<div class="alert alert-warning">
**Warning:** Selecting by labels as above works well as long as there is no ambiguity.
   When two or more axes have labels in common, the selection may fail with an error.
   The solution is then to specify which axis the labels belong to.
</div>


In [None]:
# let us now create an array with the same labels on several axes
age, weight, size = Axis('age=0..80'), Axis('weight=0..120'), Axis('size=0..200')

arr_ws = ndtest([age, weight, size])

In [None]:
# let's try to select teenagers with size between 1 m 60 and 1 m 65 and weight up to 80 kg.
# In this case the subset is ambiguous and this results in an error:
arr_ws[10:18, :80, 160:165]

In [None]:
# the solution is simple: specify the axes on which you make the selection
arr_ws[age[10:18], weight[:80], size[160:165]]
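
Why a bare label can be ambiguous is easy to see with a small pure-Python model. The sketch below is only an illustration, not LArray's actual implementation: the `find_axis` helper and the `axes` dictionary are hypothetical names introduced here, and the label ranges simply mirror the three axes defined above.

```python
# Hypothetical model of label lookup; NOT LArray's actual implementation.
axes = {
    'age': range(0, 81),
    'weight': range(0, 121),
    'size': range(0, 201),
}

def find_axis(label, axes):
    """Return the name of the only axis containing `label`.

    Raises ValueError when the label is missing or present on several axes.
    """
    matches = [name for name, labels in axes.items() if label in labels]
    if not matches:
        raise ValueError(f"{label!r} is not a label of any axis")
    if len(matches) > 1:
        raise ValueError(f"{label!r} is ambiguous: found on axes {matches}")
    return matches[0]

# 150 only exists on the 'size' axis, so the lookup is unambiguous
unique_axis = find_axis(150, axes)

# 10 exists on all three axes, so the axis has to be given explicitly
try:
    find_axis(10, axes)
    ambiguous = False
except ValueError:
    ambiguous = True
```

Attaching the key to an axis, as in ``age[10:18]``, removes the need for this search altogether.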

### Ambiguous Cases - Specifying Axes Using The Special Variable X

When selecting, assigning or using aggregate functions, an axis can be
referred to via the special variable ``X``:

-  pop[X.age[:20]]
-  pop.sum(X.age)

This gives you access to the axes of the array you are manipulating. The main
drawback of using ``X`` is that you lose the autocompletion available in
many editors. It only works with non-anonymous axes whose names contain no whitespace or special characters.


In [None]:
# the previous example could also have been written as
arr_ws[X.age[10:18], X.weight[:80], X.size[160:165]]

### Selecting by Indices

Sometimes it is more practical to use indices (positions) along the axis, instead of labels.
You need to add the character ``i`` before the brackets: ``.i[indices]``.
As for selection with labels, you can use a single index, a slice or a list of indices.
Indices can also be negative (-1 represents the last element of an axis).


<div class="alert alert-info">
**Note:** Remember that indices (positions) are always **0-based** in Python.
So the first element is at index 0, the second is at index 1, etc.
</div>


In [None]:
# here we select the subset associated with Belgian women of age 50, 51 and 52
# from Brussels region for the first 3 years
pop[X.time.i[:3], 'BruCap', 50:52, 'F', 'BE']

In [None]:
# same but for the last 3 years
pop[X.time.i[-3:], 'BruCap', 50:52, 'F', 'BE']

In [None]:
# using several non-contiguous indices
pop[X.time.i[-9,-7,-4,-2], 'BruCap', 50:52, 'F', 'BE']

<div class="alert alert-warning">
**Warning:** The end *index* (position) is EXCLUSIVE while the end label is INCLUSIVE.
</div>


In [None]:
# with labels (3 is included)
pop[2015, 'BruCap', X.age[:3], 'F', 'BE']

In [None]:
# with indices (3 is out)
pop[2015, 'BruCap', X.age.i[:3], 'F', 'BE']
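
The difference between the two conventions can be sketched in plain Python. The helper `label_slice_to_positions` below is hypothetical (it is not part of the LArray API), and the label lists are assumed values chosen for illustration only.

```python
# Hypothetical helper for illustration; not part of the LArray API.
def label_slice_to_positions(labels, start=None, stop=None):
    """Translate an inclusive label slice into a Python positional slice."""
    first = 0 if start is None else labels.index(start)
    # +1 because the stop label itself belongs to the selection
    last = len(labels) if stop is None else labels.index(stop) + 1
    return slice(first, last)

ages = list(range(0, 121))      # assumed labels 0..120: label == position
years = list(range(2010, 2017)) # assumed labels: label != position

by_label = ages[label_slice_to_positions(ages, stop=3)]  # like X.age[:3]
by_index = ages[:3]                                      # like X.age.i[:3]
year_subset = years[label_slice_to_positions(years, 2010, 2012)]

print(by_label)     # [0, 1, 2, 3]
print(by_index)     # [0, 1, 2]
print(year_subset)  # [2010, 2011, 2012]
```

When labels happen to coincide with positions, as for ages starting at 0, the two notations look alike, which is exactly when the inclusive/exclusive difference is easiest to overlook.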

You can use ``.i[]`` selection directly on an array instead of on its axes.
In this context, if you want to select a subset of the first and third axes for example, you must use a full slice ``:`` for the second one.


In [None]:
# here we select the last year and first 3 ages
# equivalent to: pop.i[-1, :, :3, :, :]
pop.i[-1, :, :3]

### Using Groups In Selections


In [None]:
teens = pop.age[10:20]

pop[2015, 'BruCap', teens, 'F', 'BE']

## Assigning subsets

### Assigning A Value

Assign a value to a subset:


In [None]:
# let's take a smaller array
pop = load_example_data('demography').pop[2016, 'BruCap', 100:105]
pop2 = pop
pop2

In [None]:
# set all data corresponding to age >= 102 to 0
pop2[102:] = 0
pop2

One very important gotcha though...

<div class="alert alert-warning">
**Warning:** Modifying a slice of an array in-place as we did above should be done with care, otherwise you could get **unexpected effects**. The reason is that taking a **slice** subset of an array does not return a copy of that array, but rather a view on that array. To avoid this behavior, use the ``.copy()`` method.
</div>

Remember:

-  taking a slice subset of an array is extremely fast (no data is
   copied)
-  if one modifies that subset in-place, one also **modifies the
   original array**
-  **.copy()** returns a copy of the subset (costs time and memory) but
   allows you to change the subset without modifying the original array
   at the same time


In [None]:
# indeed, data from the original array have also changed
pop

In [None]:
# the right way
pop = load_example_data('demography').pop[2016, 'BruCap', 100:105]

pop2 = pop.copy()
pop2[102:] = 0
pop2

In [None]:
# this time, data from the original array have not changed
pop
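
The view-versus-copy distinction is not specific to LArray. As a rough analogy using only the Python standard library, a sliced `memoryview` behaves like a view while slicing an `array.array` returns an independent copy. The variable names below are made up for the illustration, and the sketch does not use LArray at all.

```python
from array import array

# Stand-in for the original data
original = array('i', [10, 20, 30, 40, 50, 60])

# Slicing a memoryview gives a view: no data is copied
view = memoryview(original)[2:]
for i in range(len(view)):
    view[i] = 0          # writes through to `original`

after_view = original.tolist()

# Slicing the array itself returns an independent copy
fresh = array('i', [10, 20, 30, 40, 50, 60])
subset = fresh[2:]
for i in range(len(subset)):
    subset[i] = 0        # only the copy changes

after_copy = fresh.tolist()

print(after_view)  # [10, 20, 0, 0, 0, 0]
print(after_copy)  # [10, 20, 30, 40, 50, 60]
```

In this analogy the `memoryview` slice plays the role of a plain slice of an array, and the `array.array` slice plays the role of a slice followed by ``.copy()``.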

### Assigning Arrays And Broadcasting

Instead of a value, we can also assign an array to a subset. In that
case, that array can have fewer axes than the target, but those which are
present must be compatible with the subset being targeted.


In [None]:
sex, nat = Axis('sex=M,F'), Axis('nat=BE,FO')
new_value = LArray([[1, -1], [2, -2]],[sex, nat])
new_value

In [None]:
# this assigns 1, -1 to Belgian, Foreigner men
# and 2, -2 to Belgian, Foreigner women for all
# people aged 102 and over
pop[102:] = new_value
pop
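
What happens along the missing axis can be modelled with plain dictionaries. This is only a conceptual sketch under assumed data: the placeholder value 99 and the dictionary layout are invented for the illustration and say nothing about how LArray stores its data.

```python
from itertools import product

ages = [100, 101, 102, 103, 104, 105]
sexes = ['M', 'F']
nats = ['BE', 'FO']

# Target with axes (age, sex, nat), filled with a placeholder value
target = {key: 99 for key in product(ages, sexes, nats)}

# Value with axes (sex, nat) only: it has no 'age' axis
new_value = {('M', 'BE'): 1, ('M', 'FO'): -1,
             ('F', 'BE'): 2, ('F', 'FO'): -2}

# Assigning to ages >= 102 repeats new_value along the missing 'age' axis
for age, sex, nat in list(target):
    if age >= 102:
        target[age, sex, nat] = new_value[sex, nat]

print(target[102, 'M', 'BE'], target[105, 'F', 'FO'])  # 1 -2
print(target[100, 'M', 'BE'])                          # 99
```

Each (sex, nat) combination receives the same value for every selected age, which is what broadcasting over the age axis means here.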

<div class="alert alert-warning">
**Warning:** The array being assigned must have axes compatible with the target subset (i.e. the same axis names and the same labels).
</div>


In [None]:
# assume we define the following array with shape 3 x 2 x 2
new_value = zeros(['age=100..102', sex, nat])
new_value

In [None]:
# now let's try to assign the previous array to the subset from age 103 to 105:
# this fails because the age labels (100..102) do not match those of the target (103..105)
pop[103:105] = new_value

In [None]:
# but this works
pop[100:102] = new_value
pop

## Boolean Filtering

Boolean filtering can be used to extract subsets.


In [None]:
# let's focus on the population living in Brussels during the year 2016
pop = load_example_data('demography').pop[2016, 'BruCap']

# here we select all males aged 5 or less and all females aged 10 or less
subset = pop[((X.sex == 'M') & (X.age <= 5)) | ((X.sex == 'F') & (X.age <= 10))]
subset

<div class="alert alert-info">
**Note:** Be aware that after boolean filtering, several axes may have merged.
</div>


In [None]:
# 'age' and 'sex' axes have been merged together
subset.info
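
One way to see why the axes are merged is that the selected cells do not form a rectangular block. The pure-Python sketch below models this with an assumed small age range; the `"{age}_{sex}"` label format is an assumption made for the illustration and is not guaranteed to match what LArray displays.

```python
from itertools import product

ages = range(0, 13)   # assumed small age range for the illustration
sexes = ['M', 'F']

# Mask over the (age, sex) grid: males up to age 5, females up to age 10
mask = {(age, sex): (sex == 'M' and age <= 5) or (sex == 'F' and age <= 10)
        for age, sex in product(ages, sexes)}

selected = [key for key, keep in mask.items() if keep]

# The number of ages kept differs per sex, so no rectangular age x sex block fits
kept_per_sex = {sex: sum(1 for _, s in selected if s == sex) for sex in sexes}
print(kept_per_sex)  # {'M': 6, 'F': 11}

# The result is therefore a single combined axis of (age, sex) pairs
combined_labels = [f"{age}_{sex}" for age, sex in selected]
print(len(combined_labels))  # 17
```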

This may not be what you want, because previous selections on the merged axes are no longer valid:


In [None]:
# now let's try to calculate the proportion of females aged 10 or less:
# this fails because 'sex' is no longer a separate axis of subset
subset['F'].sum() / pop['F'].sum()

Therefore, it is sometimes more useful not to select, but rather to set non-matching elements to 0 (or another value):


In [None]:
subset = pop.copy()
subset[((X.sex == 'F') & (X.age > 10))] = 0
subset['F', :20]

In [None]:
# now we can calculate the proportion of females aged 10 or less
subset['F'].sum() / pop['F'].sum()

Boolean filtering can also mix axes and arrays. The example above could also have been written as:


In [None]:
age_limit = sequence('sex=M,F', initial=5, inc=5)
age_limit

In [None]:
age = pop.axes['age']
(age <= age_limit)[:20]

In [None]:
subset = pop.copy()
subset[X.age > age_limit] = 0
subset['F'].sum() / pop['F'].sum()

Finally, you can choose to filter on data instead of axes:


In [None]:
# let's focus on females aged 90 to 110
subset = pop['F', 90:110].copy()
subset

In [None]:
# here we set to 0 all data < 10
subset[subset < 10] = 0
subset