###  Islands: Python Foundations - Chapter 3

[Back to Main Page](0_main_page.ipynb)

[How to use this book interactively on Deepnote](99_how_to_use_this_book.ipynb)

[Download this book](99_how_to_use_this_book_local.ipynb)

<br>

<h1> <center> Arrays & Boolean Indexing </center> </h1> 

## Importing Libraries

As before, the cell below imports the libraries we need. 

Once again, <b> it is very important you run each cell in this notebook in the order in which they appear. </b> Later cells depend on the activity of earlier cells. 

<br>
<br>
<center> ↓↓↓ <b> Before reading on, please run the cell below</b>. Click on the cell and press `shift` and `Enter` together.↓↓↓ </center>

In [1]:
# run this cell (by pressing 'Control' and 'Enter' together) to import the libraries needed for 
# this page

# 'import' tells python to get a set of functions (which is called a library), in the first case this is 
# the numpy library. The 'as' tells python to name the library something (to save us typing out 'numpy'); in this case
# we name the library 'np'
import numpy as np

# in this case we import the pandas library and name it 'pd'
import pandas as pd

# here we import the matplotlib.pyplot library and name it 'plt'
import matplotlib.pyplot as plt

# this imports the machinery for marking answers to questions
from client.api.notebook import Notebook
ok = Notebook('ok_tests/3_arrays_booleans.ok')

# generating the data for this page
import py_found

psychosis_status_observations, observations_sex, psychosis_scores, names = py_found.arrays_booleans_page_setup()

Assignment: 3_arrays_booleans
OK, version v1.18.1



## Comparisons and Booleans

We mentioned earlier that Python is useful for representing things in the world, and the relations between those things. We can use ```comparison operators``` to ask questions about the relations between different collections of data.

Consider the following list, which contains the names of the 5 people who were in the first sample of observations that your research group made on the island:

In [2]:
names

['roy', 'david', 'lucy', 'aiesha', 'amelia']

We can see that the third person we observed is called `lucy`. We know that to access this element of the ```names``` list, we would use ```names[2]``` (remember that Python begins counting at 0, so the index of the third element in a list is 2):

In [3]:
names[2]

'lucy'

The ```==``` operator in python asks the question 'is this value equal to another value?'. 

Our psychiatrist friend asks us *'what was the name of the third person we observed? Was it Lucy?'*. 

The code in the cell below asks the question 'is the third element of the ```names``` list equal to the value ```'lucy'```?':

In [4]:
names[2] == 'lucy'

True

Look at the ```names``` list above. We can see that ```'roy'``` is the first name in the list. Therefore, we, as humans, can see that it is <b>not</b> the case that ```'roy'``` is the third element in the ```names``` list. Let's ask Python if ```'roy'``` is the third element of the ```names``` list:

In [5]:
names[2] == 'roy'

False

### Question 1

In the cell below, use the `==` comparison operator to ask *'is the 5th element of the `names` list equal to 'david'?'*. Store your answer in a variable called `is_it_david`:

In [6]:
is_it_david = names[4] == 'david' #!!! replace with ...

# show the value
is_it_david

False

In [7]:
_ = ok.grade('q1')

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 2
    Failed: 0
[ooooooooook] 100.0% passed



The values ```True``` and ```False``` are a specific data type: they are Booleans. If you use a comparison operator to ask python a question about the relations between two values, it will always return a Boolean - a ```True``` or ```False```.

In [8]:
type(True)

bool

In [9]:
type(False)

bool

Notice that the Boolean values ```True``` and ```False``` do not have quotation marks around them. What type do you think they would be if they had quotation marks around them?

In [10]:
type('True') 

str

There are other comparison operators, which ask different questions to ```==```. For instance:

```!=```  - asks *'are these two values NOT equal?'*

```names[3]``` has the value ```'aiesha'```, which is not the same value as ```'lucy'```. 

Our psychiatrist friend asks us *'the fourth person we observed wasn't called Lucy, is that correct?'*. So we ask Python:

In [11]:
# this code asks 'is the fourth element in the names list NOT EQUAL to 'lucy'?'
names[3] != 'lucy' 

True
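The other comparison operators follow the same pattern as `==` and `!=`: each one asks a question about two values and returns a Boolean. The sketch below uses two made-up numbers (`a` and `b` are illustrative names, not part of the island data) to show several of them side by side:

```python
# 'a' and 'b' are made-up values, used only for illustration
a = 13
b = 20

print(a == b)   # is a equal to b?
print(a != b)   # is a NOT equal to b?
print(a > b)    # is a greater than b?
print(a >= b)   # is a greater than or equal to b?
print(a < b)    # is a less than b?
print(a <= b)   # is a less than or equal to b?
```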

### Question 2

Find out what value ```names[2]``` has. Ask python 'is ```names[2]``` NOT EQUAL to ```'lucy'```? Store the result in a variable called `ans_lucy`:

In [12]:
names

['roy', 'david', 'lucy', 'aiesha', 'amelia']

In [13]:
# your code here
ans_lucy =  names[2] != 'lucy' #!!! replace with ...

# this line just makes the cell output whatever value you have saved in the ans_lucy variable
ans_lucy  

False

In [14]:
_ = ok.grade('q2')

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 2
    Failed: 0
[ooooooooook] 100.0% passed



Recall the `psychosis_scores` list, which contains the psychosis questionnaire scores for the first five individuals sampled on the island:

In [15]:
# run this cell to see the contents of the list
psychosis_scores

[80, 20, 14, 13, 91]

The psychiatrist tells you that, for the questionnaire being used, the cutoff score for identifying that an individual is experiencing a psychotic episode is 70.

The comparison operator `>=` asks 'is the value on the left hand side of the operator greater than or equal to the value on the right hand side?'.

So if we wanted to ask 'is 50 greater than or equal to 100?', we would write:

In [16]:
# run this cell
50 >= 100

False

You can see that this operator returns a Boolean, indicating the answer to the question we asked it.
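The difference between `>` and `>=` only shows up when the two values are exactly equal. Here is a minimal sketch, using a made-up score that sits exactly on a made-up cutoff (`example_score` and `example_cutoff` are illustrative names, not variables from this page):

```python
# made-up values, used only for illustration
example_score = 70
example_cutoff = 70

# '>' asks 'is the score strictly greater than the cutoff?'
strictly_above = example_score > example_cutoff

# '>=' asks 'is the score greater than OR equal to the cutoff?'
at_or_above = example_score >= example_cutoff

print(strictly_above)
print(at_or_above)
```

Which operator is appropriate depends on whether a score that lands exactly on the cutoff should count.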

### Question 3

Using python, how would you ask:
> *'did the fifth person we sampled have a score above the cutoff of 70, indicating they were experiencing a psychotic episode?'*. 

Store your answer in a variable called `fifth_psychosis`:

In [17]:
fifth_psychosis = psychosis_scores[4] > 70 #!!! replace with ...

# this line just makes the cell output whatever value you have saved in the fifth_psychosis variable
fifth_psychosis 

True

In [18]:
_ = ok.grade('q3')

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 2
    Failed: 0
[ooooooooook] 100.0% passed



## Numpy arrays and Boolean Indexing

What if we want to see all of the elements of the `psychosis_scores` list which are greater than 70? 

Could we use `psychosis_scores > 70`? 

In [19]:
psychosis_scores > 70

TypeError: '>' not supported between instances of 'list' and 'int'

If you look at the error message that `psychosis_scores > 70` generates, it is clear that `>` will not work to compare all the elements of a list with a given value.

This is annoying because a lot of questions we might want to investigate will need that sort of operation. For instance, calculating the prevalence of psychotic disorders on the island will require calculating how many islanders have a psychotic disorder. If the `psychosis_scores` list contained the scores of all 10,000 islanders, then working out how many of those scores were over 70 would be essential to calculating the prevalence.

Fortunately, there is a way of using comparison operators like `>` on multiple elements. We need to convert the list to a *numpy array*. This is very much like a list, but has useful properties that let us do things we cannot do with lists.
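As a minimal sketch of the idea, the cell below converts a short, made-up list (`example_scores`, which is not the island data) to a numpy array and compares every element to 70 in a single step:

```python
import numpy as np

# made-up scores, used only for illustration
example_scores = [65, 88, 70, 42]

# np.array converts the list to a numpy array
example_scores_array = np.array(example_scores)

# the comparison is applied to every element, giving one Boolean per score
example_comparison = example_scores_array > 70
print(example_comparison)
```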

[Quick aside on functions, mention next page]

In [None]:
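import numpy as np

# convert the list to a numpy array (pass the list itself, without extra square brackets)
np.array(psychosis_scores)

In [None]:
# store the array in a variable
psychosis_scores_array = np.array(psychosis_scores)

type(psychosis_scores_array)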

In [None]:
psychosis_scores_array > 70

In [None]:
greater_than_70 = psychosis_scores_array > 70

# show the array

greater_than_70
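A Boolean array like this can also be used for counting, which is the kind of operation a prevalence calculation needs. The sketch below assumes some made-up scores (`example_scores_array` is an illustrative name). `np.count_nonzero` counts the `True` values, and dividing by the number of scores gives the proportion over the cutoff:

```python
import numpy as np

# made-up scores, used only for illustration
example_scores_array = np.array([55, 72, 70, 12, 90, 33, 81, 8])

# one Boolean per score: True where the score is over 70
example_over_70 = example_scores_array > 70

# count the True values
n_over_70 = np.count_nonzero(example_over_70)

# proportion of scores over 70
proportion_over_70 = n_over_70 / len(example_scores_array)

print(n_over_70)
print(proportion_over_70)
```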

The `True` and `False` values act as 'switches': each `False` 'turns off' the value at the corresponding position in the array, and each `True` leaves it 'on':

In [None]:
print(psychosis_scores_array)
print(psychosis_scores_array >= 70)

[Explain the False's turning off values]

In [None]:
psychosis_scores_array[psychosis_scores_array >= 70]
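To make the 'switch' idea concrete, here is a sketch with made-up data (all of the `example_` names are illustrative, not variables from this page). Putting a Boolean array inside the square brackets keeps only the elements at positions where the Boolean array is `True`. The same approach works for arrays of strings:

```python
import numpy as np

# made-up scores, used only for illustration
example_scores_array = np.array([65, 88, 70, 42, 95])

# one 'switch' per score: True where the score is 70 or more
example_switches = example_scores_array >= 70
print(example_switches)

# Boolean indexing: keep only the scores whose switch is True
kept_scores = example_scores_array[example_switches]
print(kept_scores)

# the same pattern with an array of strings
example_colours = np.array(['red', 'blue', 'red', 'green'])
is_red = example_colours == 'red'
kept_colours = example_colours[is_red]
print(kept_colours)
```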

### Question 4

In [None]:
less_than_70_boolean_array = psychosis_scores_array < 70 #!!! replace with...

# show the array
less_than_70_boolean_array

In [None]:
_ = ok.grade('q4')

### Question 5

In [None]:
# compare Boolean array to psychosis scores array 
psychosis_scores_array

# student visually identifies which scores will be 

In [None]:
# student writes scores here

In [None]:
scores_less_than_70 = psychosis_scores_array[less_than_70_boolean_array]

# show the array
scores_less_than_70 

In [None]:
_ = ok.grade('q5')

### Question 5

In [None]:
psychosis_status_observations_array = np.array(psychosis_status_observations)

# show the array
psychosis_status_observations_array

In [None]:
is_psychotic_boolean_array = psychosis_status_observations_array == 'psychotic'

# show the array
is_psychotic_boolean_array

In [None]:
_ = ok.grade('q5')

### Question 6

[SAME STRUCTURE AS QUESTION ABOVE, MANUALLY IDENTIFY ELEMENTS THEN GET THEM WITH BOOLEAN INDEXING]

psychosis_status_observations_array

In [None]:
_ = ok.grade('q6')

### Question 7

In [None]:
observations_sex_array = np.array(observations_sex)

# show the array 
observations_sex_array

In [None]:
is_male_boolean_array = observations_sex_array == 'male' #!!! replace with ...

males = observations_sex_array[is_male_boolean_array]

# show the males array
males

In [None]:
_ = ok.grade('q7')

In [None]:
# For your convenience, you can run this cell to run all the tests at once!
import os
_ = [ok.grade(q[:-3]) for q in os.listdir("ok_tests/3_arrays_booleans") if q.startswith('q')]

Or, you can [return to the main page](0_main_page.ipynb).

To navigate to any other page, the table of contents is below:

## Other Chapters

1. [Populations, Samples & Questions: Why Learn Python?](1_populations_samples_questions.ipynb)
2. [Lists & Indexing](2_lists_indexing.ipynb)
3. [Arrays & Boolean Indexing](3_arrays_booleans.ipynb)
4. [Functions & Plotting](4_functions_plotting.ipynb)
5. [For Loops - doing things over (and over and over...)](5_for_loops.ipynb)
6. [Testing via Simulation: Psychosis Prevalence](6_simulation_psychosis_prevalence.ipynb)

***
By [pxr687](99_about_the_author.ipynb) 