# Selecting values from an array

In [1]:
import numpy as np

We use arrays all the time in data science.

One of the most common tasks we do with arrays is to select values.

We do this with *array slicing*.

We do array slicing when we follow the array variable name with
`[` (open square bracket), followed by something to specify
which elements we want, followed by `]` (close square bracket).

The simplest case is when we want a single element from a one-dimensional array.  In that case the thing between the `[` and the `]` is the *index* of the value that we want.

The *index* of the first value is 0, the index of the second value is 1, and so on.

In [2]:
import pandas as pd

In [3]:
courses = pd.read_csv('rate_my_course.csv').sort_values(
    'Easiness', ascending=False)
disciplines = courses['Discipline'].values[:10]
easiness = courses['Easiness'].values[:10]
disciplines

array(['Reading', 'Physical Education', 'Speech', 'Child Development',
       'Theater', 'Music', 'Kinesiology', 'Hospitality',
       'Criminal Justice', 'Nutrition'], dtype=object)

Here we get the first value.  This value is at index 0.

In [4]:
# Get the first value (at index position 0)
disciplines[0]

'Reading'

In [5]:
# Get the second value (at index position 1)
disciplines[1]

'Physical Education'

In [6]:
# Get the third value (at index position 2)
disciplines[2]

'Speech'

At first it will take some time to get used to the idea that the first
value is at index position 0.  There are good reasons for this,
and many programming languages have this convention, but it does take a while to get this habit of mind.

## Index with negative numbers

If we know how many elements the array has, then we can get the last element by using the number of elements, minus one (why?).

Here the number of elements is 10:

In [7]:
len(disciplines)

10

So, the last element of the array is at index position 9:

In [8]:
disciplines[9]

'Nutrition'
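
One way to answer the "why?" above: because the first index is 0, an array with `n` elements has valid index positions 0 through `n - 1`.  The sketch below rebuilds the `disciplines` array by hand from the output shown earlier (an assumption, so that the example runs without the CSV file), and checks that index position `n` itself is out of range.

```python
import numpy as np

# The ten discipline names, copied by hand from the output shown earlier.
disciplines = np.array(['Reading', 'Physical Education', 'Speech',
                        'Child Development', 'Theater', 'Music',
                        'Kinesiology', 'Hospitality', 'Criminal Justice',
                        'Nutrition'], dtype=object)

n = len(disciplines)
# Valid index positions run from 0 up to n - 1.
last_index = n - 1
last_value = disciplines[last_index]

# Index position n is one past the end, so NumPy raises an IndexError.
try:
    disciplines[n]
    went_past_end = False
except IndexError:
    went_past_end = True

print(last_index, last_value, went_past_end)
```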

In fact, there is a shortcut for getting elements at the end of the array, and that is to use an offset with a minus sign in front.  The number is then the offset counting back from one past the last item, so `-1` means the last element.  For example, here is another way to get the last element:

In [9]:
disciplines[-1]

'Nutrition'

The last but one element:

In [10]:
disciplines[-2]

'Criminal Justice'
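
To make the rule for negative offsets concrete: for an array with `n` elements, index `-k` should select the same element as index `n - k`.  The sketch below checks this for every valid `k`, again using a hand-built copy of `disciplines` (an assumption, to keep the example self-contained).

```python
import numpy as np

# The ten discipline names, copied by hand from the output shown earlier.
disciplines = np.array(['Reading', 'Physical Education', 'Speech',
                        'Child Development', 'Theater', 'Music',
                        'Kinesiology', 'Hospitality', 'Criminal Justice',
                        'Nutrition'], dtype=object)

n = len(disciplines)

# Compare the element at offset -k with the element at offset n - k,
# for k = 1 (the last element) up to k = n (the first element).
matches = [disciplines[-k] == disciplines[n - k] for k in range(1, n + 1)]

print(all(matches))
# -n is the furthest back we can go; it is the first element.
print(disciplines[-n])
```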

## Index with slices

Sometimes we want more than one element from the array.  For example, we might want the first 5 elements from the array.  We can get these using an array *slice*.  It looks like this:

In [11]:
# All the elements from offset zero up to (not including) 5
disciplines[0:5]

array(['Reading', 'Physical Education', 'Speech', 'Child Development',
       'Theater'], dtype=object)

In [12]:
# All the elements from offset 5 up to (not including) 10
disciplines[5:10]

array(['Music', 'Kinesiology', 'Hospitality', 'Criminal Justice',
       'Nutrition'], dtype=object)
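
Because the stop offset is not included, a slice `start:stop` has `stop - start` elements, and two slices that share a boundary (such as `0:5` and `5:10`) split the array with no overlap.  Here is a small sketch of that idea, using a hand-typed copy of the first ten `Easiness` values shown later on this page; `np.concatenate` is used here only to glue the two pieces back together for the check.

```python
import numpy as np

# First ten easiness values, copied by hand from the output on this page.
easiness = np.array([3.88263514, 3.83225025, 3.67470085, 3.60608187,
                     3.58450835, 3.54227291, 3.54143939, 3.47149813,
                     3.46944009, 3.46833333])

first_half = easiness[0:5]    # offsets 0, 1, 2, 3, 4
second_half = easiness[5:10]  # offsets 5, 6, 7, 8, 9

# The number of elements in a slice is stop minus start.
print(len(first_half), len(second_half))

# Joining the two pieces gives back the original array.
rejoined = np.concatenate([first_half, second_half])
print(np.array_equal(rejoined, easiness))
```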

You can omit the first number, if you mean to start at offset 0:

In [13]:
disciplines[:5]

array(['Reading', 'Physical Education', 'Speech', 'Child Development',
       'Theater'], dtype=object)

You can omit the last number if you mean to end at the last element of the array:

In [14]:
disciplines[5:]

array(['Music', 'Kinesiology', 'Hospitality', 'Criminal Justice',
       'Nutrition'], dtype=object)
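
Negative offsets work inside slices too, so we can combine the two ideas above.  The following is a minimal sketch on a hand-built copy of `disciplines` (an assumption, so the example does not need the CSV file).

```python
import numpy as np

# The ten discipline names, copied by hand from the output shown earlier.
disciplines = np.array(['Reading', 'Physical Education', 'Speech',
                        'Child Development', 'Theater', 'Music',
                        'Kinesiology', 'Hospitality', 'Criminal Justice',
                        'Nutrition'], dtype=object)

# From three before the end, up to the end of the array.
last_three = disciplines[-3:]

# From the start, up to (not including) the last element.
all_but_last = disciplines[:-1]

# Omitting both numbers selects every element.
everything = disciplines[:]

print(last_three)
print(len(all_but_last), len(everything))
```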

## Indexing with Boolean arrays

We often want to select several elements from an array according to some criterion.

The most common way to do this is to do array slicing, using
a Boolean array between the square brackets.

It may be easier to understand this by example.

You have already come across Boolean arrays.

These are arrays of `True` and `False` values.

In [15]:
easiness

array([3.88263514, 3.83225025, 3.67470085, 3.60608187, 3.58450835,
       3.54227291, 3.54143939, 3.47149813, 3.46944009, 3.46833333])

Here is a Boolean array, created by applying a comparison to an array:

In [16]:
greater_than_3p5 = easiness > 3.5
greater_than_3p5

array([ True,  True,  True,  True,  True,  True,  True, False, False,
       False])

As you have already seen, we can do things like count the number
of `True` values in the Boolean array:

In [17]:
np.count_nonzero(greater_than_3p5)

7
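
Counting works because NumPy treats `True` as 1 and `False` as 0.  As a sketch, the example below counts the `True` values in two ways, with `np.count_nonzero` and with `np.sum`; the `easiness` values are typed in by hand from the output above, so that the example is self-contained.

```python
import numpy as np

# First ten easiness values, copied by hand from the output above.
easiness = np.array([3.88263514, 3.83225025, 3.67470085, 3.60608187,
                     3.58450835, 3.54227291, 3.54143939, 3.47149813,
                     3.46944009, 3.46833333])

# The comparison gives one True or False value per element.
greater_than_3p5 = easiness > 3.5

# Two ways to count the True values.
n_true = np.count_nonzero(greater_than_3p5)
n_true_by_sum = np.sum(greater_than_3p5)

print(n_true, n_true_by_sum)
```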

Now let us say that we wanted to get the elements from `easiness`
that are greater than 3.5.   That is, we want to get the elements
in `easiness` for which the corresponding element in
`greater_than_3p5` is `True`.

We can do this with *Boolean array indexing*.  The Boolean array goes between the square brackets, like this:

In [18]:
easiness[greater_than_3p5]

array([3.88263514, 3.83225025, 3.67470085, 3.60608187, 3.58450835,
       3.54227291, 3.54143939])

We have selected the numbers in `easiness` that are greater than 3.5.
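
We do not have to store the Boolean array in a variable first.  As a sketch, the example below shows the two-step version next to the one-line version, where the comparison goes directly between the square brackets; the `easiness` values are copied by hand from the output above.

```python
import numpy as np

# First ten easiness values, copied by hand from the output above.
easiness = np.array([3.88263514, 3.83225025, 3.67470085, 3.60608187,
                     3.58450835, 3.54227291, 3.54143939, 3.47149813,
                     3.46944009, 3.46833333])

# Two steps: make the Boolean array, then index with it.
greater_than_3p5 = easiness > 3.5
selected = easiness[greater_than_3p5]

# One step: put the comparison between the square brackets.
selected_inline = easiness[easiness > 3.5]

# Both give the same values, one for each True in the Boolean array.
print(np.array_equal(selected, selected_inline))
print(len(selected))
```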

We can use this same Boolean array to index into another array.  For example, here we show the discipline *names* corresponding to the courses with an easiness rating greater than 3.5:

In [19]:
disciplines[greater_than_3p5]

array(['Reading', 'Physical Education', 'Speech', 'Child Development',
       'Theater', 'Music', 'Kinesiology'], dtype=object)
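
This works because the two arrays are in the same order and have the same length, so position `i` in the Boolean array refers to the same course in both.  The sketch below uses hand-built copies of both arrays (an assumption, to avoid needing the CSV file), and also shows what happens with a Boolean array of the wrong length; `short_mask` is a made-up array for illustration.

```python
import numpy as np

# Values copied by hand from the outputs shown on this page.
disciplines = np.array(['Reading', 'Physical Education', 'Speech',
                        'Child Development', 'Theater', 'Music',
                        'Kinesiology', 'Hospitality', 'Criminal Justice',
                        'Nutrition'], dtype=object)
easiness = np.array([3.88263514, 3.83225025, 3.67470085, 3.60608187,
                     3.58450835, 3.54227291, 3.54143939, 3.47149813,
                     3.46944009, 3.46833333])

# Build the Boolean array from one array, use it to index the other.
greater_than_3p5 = easiness > 3.5
easy_names = disciplines[greater_than_3p5]
print(easy_names)

# A Boolean array of the wrong length does not line up with the elements,
# and recent versions of NumPy raise an IndexError.
short_mask = np.array([True, False, True])  # hypothetical, for illustration
try:
    disciplines[short_mask]
    length_mismatch_error = False
except IndexError:
    length_mismatch_error = True
print(length_mismatch_error)
```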

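For reference, here are the first 10 rows of the sorted `courses` data frame, from which we took `disciplines` and `easiness`:

In [20]: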
courses.head(10)

| Unnamed: 0 | Discipline | Number of Professors | Clarity | Helpfulness | Overall Quality | Easiness |
| --- | --- | --- | --- | --- | --- | --- |
| 64 | Reading | 148 | 4.159392 | 4.188919 | 4.1775 | 3.882635 |
| 36 | Physical Education | 991 | 4.078698 | 4.030797 | 4.057719 | 3.83225 |
| 49 | Speech | 351 | 4.133191 | 4.101197 | 4.119345 | 3.674701 |
| 61 | Child Development | 171 | 3.950585 | 4.00807 | 3.979766 | 3.606082 |
| 31 | Theater | 1078 | 3.876633 | 3.821503 | 3.851837 | 3.584508 |
| 19 | Music | 2455 | 3.844509 | 3.787804 | 3.818114 | 3.542273 |
| 67 | Kinesiology | 132 | 3.995 | 3.972879 | 3.988712 | 3.541439 |
| 54 | Hospitality | 267 | 3.697228 | 3.744607 | 3.719476 | 3.471498 |
| 24 | Criminal Justice | 1786 | 4.056685 | 4.033779 | 4.046702 | 3.46944 |
| 68 | Nutrition | 120 | 3.815167 | 3.844333 | 3.8315 | 3.468333 |


## Exercises

See the [array indexing exercises](../exercises/array_indexing).