# Lab 3: Tables

Welcome to lab 3!  

This week, we will focus on manipulating tables. We will import our data sets into tables and complete most of our analysis using these tables. Tables are described in [Chapter 6](https://inferentialthinking.com/chapters/06/Tables.html) of the Inferential Thinking text. A related approach in Python is the [pandas DataFrame](https://pythonbasics.org/pandas-dataframe/), which we will occasionally need to use. Pandas is a mainstay data science tool.

First, set up the tests and imports by running the cell below.

In [None]:
# These lines load the tests/checks and commonly used modules
# THIS TAKES A MINUTE...

import glob
import json
import numpy as np
import os
from datascience import *  # Brings into Python the datascience Table object

import nbformat as nbf
from gofer.ok import check
from EDS_mod.EDS import *
notebook = os.path.basename(globals()['__session__'])
notebooks = glob.glob('*.ipynb')
notebook = max(notebooks, key=os.path.getmtime)

user = os.getenv('JUPYTERHUB_USER') # Get username 
print("Done!")

In [None]:
# Example: assign a string to a name
dogname = "Fido"
# Enter your name as a string
name = ...

## 1. Introduction

For a collection of things in the world, an array is useful for describing a single attribute of each thing. For example, among the collection of US States, an array could describe the land area of each. Tables extend this idea by describing multiple attributes for each element of a collection.

In most data science applications, we have data about many entities, but we also have several kinds of data about each entity.

For example, in the cell below we have two arrays. The first one contains the world population in each year (estimated by the US Census Bureau), and the second contains the years themselves. These elements are in order, so the year and the world population for that year have the same index in their corresponding arrays.

In [None]:
population_amounts = Table.read_table("world_population.csv").column("Population")
years = np.arange(1950, 2016,1)
print("Population column:", population_amounts)
print("Years column:", years)

In [None]:
len(population_amounts)

In [None]:
# Arrays have no num_rows property (Tables do); use len() or .size instead
population_amounts.size

Suppose we want to answer this question:

> When did world population cross 6 billion?

You could technically answer this question by staring at the arrays, but it's a bit convoluted: you would have to find the position where the population first crossed 6 billion, then look up the element at that same position in the years array. In cases like these, it might be easier to put the data into a *`Table`*, a two-dimensional type of dataset. 

The expression below:

- creates an empty table using the expression `Table()`,
- adds two columns to the table by calling `with_columns` with four arguments (column label and data array for each),
- assigns the resulting table to the name `population`, and finally
- evaluates `population` so that we can see the table.

The strings `"Year"` and `"Population"` are column labels that we have chosen. The names `population_amounts` and `years` were assigned above to two arrays of the same length. The function `with_columns` (see the [documentation](http://data8.org/datascience/tables.html)) takes alternating strings (the column labels) and arrays (the data in those columns), all separated by commas. Tip: `population_amounts` and `years` must have the same number of data points; otherwise, constructing the table raises an error.

In [None]:
# Create a table with two columns.
# Each column has a label (header) and an array of data.
# Note: the columns must be the same length.
population = Table().with_columns(
    "Population", population_amounts,
    "Year", years
)
population

In [None]:
# Note: len() of a Table counts its columns, not its rows
len(population)

In [None]:
population.sort("Population", descending=True)

In [None]:
population.num_rows

Now the data are all together in a single table! It's much easier to parse this data--if you need to know what the population was in 1959, for example, you can tell from a single glance. We'll revisit this table later.

<font color=blue> **Question 1.** </font><br />
Based on the examples above, identify which variable contains the table and which variable contains an array. Write the correct variable name to the right of each equals sign. Note that we defined one table built from two arrays, so you can use either array variable for your answer.

In [None]:
table_var = ... 
array_var = ...

In [None]:
check('tests/q1.py')

## 2. Creating Tables

<font color=blue> **Question 2.** </font><br />In the cell below, we've created two arrays. In these examples, we're looking at the Environmental Performance Index (EPI), which describes the state of sustainability in each country. More information can be found at [Yale EPI](https://epi.yale.edu/). Using the steps above, assign `top_10_epi` to a table that has two columns called "Country" and "Score", which hold `top_10_epi_countries` and `top_10_epi_scores` respectively.

In [None]:
# Create an array of Environmental Performance Index scores
top_10_epi_scores = make_array(82.5, 82.3, 81.5, 81.3, 80., 79.6, 78.9, 78.7, 77.7, 77.2)

# Create a corresponding array of country names
top_10_epi_countries = make_array(
        'Denmark',
        'Luxembourg', 
        'Switzerland', 
        'United Kingdom', 
        'France', 
        'Austria', 
        'Finland', 
        'Sweden', 
        'Norway',
        'Germany'
        )

# Use the same approach as in section 1 to build the table.
top_10_epi = ...

# We've put this next line here so your table will get printed out when you run this cell.
top_10_epi

In [None]:
check('tests/q2.py')

#### Loading a table from a file
In most cases, we aren't going to go through the trouble of typing in all the data manually. Instead, we can load the data from a file with a `Table` function.

`Table.read_table` takes one argument, a path to a data file (a string) and returns a table.  There are many formats for data files, but CSV ("comma-separated values") is the most common.

<font color=blue> **Question 3.** </font><br />The file `yale_epi.csv` in the current directory contains a table of information about 180 countries with their corresponding Environmental Performance Index (EPI) based on 32 indicators of sustainability.  Load it as a table called `epi` using the `Table.read_table` function.

Note: the first few rows of this file look like this:

Country,Score,Decade Change,Rank\
Afghanistan,25.5,5,178\
Angola,29.7,5.3,158\
Albania,49,10.2,62\
United Arab Emirates,55.6,11.3,42\

The first row has the column labels, and the remaining rows have the values. In each row, values are separated by commas, hence the name "comma-separated values." 

In [None]:
epi = ...
epi 

In [None]:
check('tests/q3.py')

Notice the part about "... (170 rows omitted)." This table is big enough that only a few of its rows are displayed, but the others are still there. The first 10 are shown, so there are 180 countries in total.

Where did `yale_epi.csv` come from? Look at this lab's folder in the sidebar on the left. You should see a file called `yale_epi.csv`.

Double-click to open up the `yale_epi.csv` file in that folder and look at the format. What do you notice? The `.csv` filename ending says that this file is in the [CSV (comma-separated value) format](http://edoceo.com/utilitas/csv-file-format).

**Quick Tip:** Don't skip past links such as the one above. Reading the referenced material is an important part of the lab assignments.

## 3. Using lists

A *list* is another Python sequence type, similar to an array. Unlike an array, a list can hold values of different types: a single list can contain `int` values, `float` values, and strings. Elements of a list can even be other lists! A list is created by enclosing comma-separated values in square brackets, and it can be given a name like any other value. For example, `values_with_different_types = ['data', 8, 8.1]`.

Lists can be useful when working with tables because they can describe the contents of one row in a table, which often  corresponds to a sequence of values with different types. A list of lists can be used to describe multiple rows.

Each column in a table is a collection of values with the same type (an array). If you create a table column from a list, it will automatically be converted to an array. A row, on the other hand, mixes types.

Here's a table from Chapter 5. (Run the cell below.)

In [None]:
# Run this cell to recreate the table
flowers = Table().with_columns(
    'Number of petals', make_array(8, 34, 5),
    'Name', make_array('lotus', 'sunflower', 'rose')
)
flowers

Notice that the column 'Number of petals' contains only integers, which share the same data type, so the column can be stored in an array. Similarly, the column 'Name' contains only strings. The first row of the table, however, contains the number 8 and the string 'lotus', so we cannot store the row as an array, but we can store it as a list.

<font color=blue> **Question 4.** </font><br />Create a list that describes a new fourth row of this table. The details can be whatever you want, but the list must contain two values: the number of petals (an `int` value) and the name of the flower (a string). For example, your flower could be "pondweed" (a flower with zero petals)!

In [None]:
my_flower = ...
my_flower

In [None]:
check('tests/q4.py')

<font color=blue> **Question 5.** </font><br />`my_flower` fits right into the table from Chapter 5. Complete the cells below to create a table of seven flowers that includes your flower as the fourth row, followed by `other_flowers`. You can use `with_row` to create a new table with one extra row by passing a list of values, and `with_rows` to create a table with multiple extra rows by passing a list of lists of values.

In [None]:
# The with_row method adds a single row to a table.
# Here is an example:
example_table = Table().with_columns("Number", [1, 2, 3], "Name", ["One", "Two", "Three"])
new_row = [4, "Four"]
example_table = example_table.with_row(new_row)
example_table

In [None]:
# Use the method .with_row(...) to create a new table that includes my_flower 
# In this case you are adding a single row
four_flowers = ...
four_flowers

In [None]:
# The with_rows method adds multiple rows to a table.
# Here is an example:
additional_rows = [[5, "Five"], [6, "Six"]]
example_table = example_table.with_rows(additional_rows)
example_table

In [None]:
# Use the method .with_rows(...) to create a table that 
# includes four_flowers followed by other_flowers
# Now you are adding multiple rows, so
# notice that other_flowers is a list of lists.
other_flowers = [[10, 'lavender'], [3, 'birds of paradise'], [6, 'tulip']]

seven_flowers = ...
seven_flowers

In [None]:
check('tests/q5.py')

## 4. Analyzing datasets
With just a few table methods, we can answer some interesting questions about the EPI dataset.

If we want just the scores of each country, we can get an array that contains the data in that column:

In [None]:
# We have a table named epi. We use the column() method on the
# table and pass in the name of the column we wish to extract.
epi.column("Score")

The value of that expression is an array, exactly the same kind of thing you'd get if you typed in `make_array(25.5, 29.7, 49.0, [etc])`.

The `.column()` table method accepts either the label of a column or its index. Python always starts counting elements in a list, array, or table at zero, so the first column is column 0, the second is column 1, and so on.

Therefore, an equivalent way of obtaining the scores (the second column) is:

In [None]:
epi.column(1)

So you can index a column in a table with either the column name or the column number.  You'll see later that there are times when this flexibility comes in handy.


<font color=blue> **Question 6.** </font><br />Find the EPI score of the highest-ranked country in the dataset.

*Hint:* Think back to the functions you've learned about for working with arrays of numbers.  We used one to find the maximum of an array of numbers. Ask for help if you can't remember one that's useful for this. 

So, to solve Question 6, extract the array of scores from the table using either of the two methods shown above, then find the maximum of that array.

In [None]:
highest_rating = ...
highest_rating

In [None]:
check('tests/q6.py')

That's not very useful, though. You'd probably want to know the *name* of the country whose score you found! To do that, we can sort the entire table by EPI score, which keeps each score together with its country. Note that calling `sort()` creates a sorted copy of the table and leaves the original table unsorted.

In [None]:
epi.sort("Score")

Well, that doesn't help much, either: we sorted the countries from lowest to highest score. To look at the highest-ranked countries, sort in reverse order:

In [None]:
epi.sort("Score", descending=True)

(The `descending=True` bit is called an *optional argument*. It has a default value of `False`, so `sort` uses ascending order unless you explicitly pass `descending=True`.)

So the country with the highest Environmental Performance Index is Denmark, with a score of 82.5.

Some details about sort:

1. The first argument to `sort` is the name of a column to sort by.
2. If the column has strings in it, `sort` will sort alphabetically; if the column has numbers, it will sort numerically.
3. The value of `epi.sort("Score")` is a *copy of `epi`*; the `epi` table doesn't get modified. For example, if we called `epi.sort("Score")`, then running `epi` by itself would still return the unsorted table.
4. Rows always stick together when a table is sorted.  It wouldn't make sense to sort just one column and leave the other columns alone.  For example, in this case, if we sorted just the "Score" column, the countries would all end up with the wrong scores.

<font color=blue> **Question 7.** </font><br />We also have information about the changes in sustainability scores from 2010 to 2020 (the "Decade Change" column). Create a version of `epi` that's sorted by change, with the largest positive changes first. Call it `epi_changes`.

In [None]:
epi_changes = ...
epi_changes

In [None]:
check('tests/q7.py')

<font color=blue> **Question 8.** </font><br />What's the name of the country with the largest positive change in the dataset? You could just look this up from the output of the previous cell. Instead, write Python code to find out.

*Hint:* Starting with `epi_changes`, extract the country column to get an array, then use `item` to get its first item. Recall that Python always starts indexing at zero, so `.item(0)` retrieves the first value from an array, `.item(1)` the second, and so on.

In [None]:
largest_positive_change = ...
largest_positive_change

In [None]:
check('tests/q8.py')

## 5. Finding pieces of a dataset
Let's take a look at another dataset. In the cell below, we're reading in a movie dataset that contains columns for the number of IMDb votes, the IMDb rating, the movie title, the year of release, and the decade of release. 

In [None]:
imdb = Table.read_table('imdb.csv')
imdb

Suppose you're interested in movies from the 1940s. Sorting the table by year doesn't help you, because the 1940s are in the middle of the dataset.

Instead, we use the table method `where()`.

In [None]:
forties = imdb.where('Decade', are.equal_to(1940))
forties

Ignore the syntax for the moment.  Instead, try to read that line like this:

> Assign the name **`forties`** to a table whose rows are the rows in the **`imdb`** table **`where`** the values in the **`Decade`**  column **`are` `equal` `to` `1940`**.

<font color=blue> **Question 9.** </font><br />Compute the average rating of movies from the 1980s.

*Hint:* The function `np.average` computes the average of an array of numbers.

In [None]:
# Use the where() method
eighties = ...

# Extract the ratings column of your new table, then take the average
average_rating_in_eighties = ...
average_rating_in_eighties

In [None]:
check('tests/q9.py')

Now let's dive into the details a bit more.  `where` takes two arguments:

1. The first argument is the name of a column.  `where` finds rows where that column's values meet some criterion.
2. The second argument is something called a predicate that describes the criterion that the column needs to meet.

To create our predicate, we called the function `are.equal_to` with the value we wanted, 1940. We'll see other predicates soon.

`where` returns a table that's a copy of the original table, but with only the rows that meet the given predicate.

<font color=blue> **Question 10.** </font><br /> Create a table called `ninety_nine` containing the movies that came out in the year 1999.  Use `where`.

In [None]:
ninety_nine = ...
ninety_nine

In [None]:
check('tests/q10.py')

So far we've only been finding where a column is *exactly* equal to a certain value. However, there are many other predicates that can be used to return only the rows of a data table that meet our condition.  Here are a few other predicates:

|Predicate|Example|Result|
|-|-|-|
|`are.equal_to`|`are.equal_to(50)`|Find rows with values equal to 50|
|`are.not_equal_to`|`are.not_equal_to(50)`|Find rows with values not equal to 50|
|`are.above`|`are.above(50)`|Find rows with values above (and not equal to) 50|
|`are.above_or_equal_to`|`are.above_or_equal_to(50)`|Find rows with values above 50 or equal to 50|
|`are.below`|`are.below(50)`|Find rows with values below 50|
|`are.between`|`are.between(2, 10)`|Find rows with values above or equal to 2 and below 10|

The textbook section on selecting rows has more examples.


<font color=blue> **Question 10 Discussion.** </font><br />
Testing for equality with `are.equal_to` works great for integers; for floating point numbers (numbers with decimals), not so much. 

Consider the fraction 1/3. As a decimal it is 0.333..., with an infinite number of 3's. Computers cannot store numbers to infinite precision, so the value must be cut off at some point, leaving a teeny, tiny round-off error. You know, and I know, that 1/3 + 1/3 + 1/3 = 1, but if each 1/3 were stored as, say, 0.33333333, the sum would be 0.99999999, not **exactly** 1. Computers store numbers in binary rather than decimal, but the same kind of round-off happens there, too.

Floating point round-off errors are everywhere in computer calculations. Mostly, we don't care because they occur in the umpteenth decimal place, but they can still cause tests for equality to fail, leading to bugs that can be extremely hard to track down.
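You can see a round-off error for yourself with plain Python; no libraries are needed. A standard example is 0.1 + 0.2, whose stored result is not exactly equal to the stored value of 0.3:

```python
# Pure Python: no imports needed
total = 0.1 + 0.2

print(total)          # prints slightly more than 0.3
print(total == 0.3)   # the equality test fails
print(total - 0.3)    # the error is tiny, but it is not zero
```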

**In the markdown cell below,** describe a way to test values in a table using one of the predicates above to return rows that are close enough to a value that they are equal for all practical purposes. Use words, not code, to describe your solution.

...

In [None]:
# Remember to save your notebook before running this check.
check('tests/q10_open_ended.py')

<font color=blue> **Question 11.** </font><br />Using `where` and one of the predicates from the table above, find all the movies with a rating higher than 8.8.  Put their data in a table called `really_highly_rated`.

In [None]:
really_highly_rated = ...
really_highly_rated

In [None]:
check('tests/q11.py')

### A slightly more elaborate example
Suppose you wanted to find the lowest rating of all the movies from the 20th century. What are the steps?

1. Filter the table to have just the movies that came out in the 20th century.
2. Extract the ratings column into an array.
3. Find the minimum value of that array.

In [None]:
# Filter the table to have just the movies that came out in the 20th century.
# Note: this assumes the dataset has no movies from before the 20th century.
imdb_20th_century = imdb.where('Year', are.below(2000))

# Extract the ratings column into an array using .column()
imdb_20th_century_ratings = imdb_20th_century.column('Rating')

# Find the minimum value of that array.
lowest_rating = min(imdb_20th_century_ratings)
                    
print("The lowest rating of the 20th century films in the IMDB database is:", lowest_rating)

<font color=blue> **Question 12.** </font><br />**Your turn.** Find the average rating for movies released in the 20th century and the average rating for movies released in the 21st century for the movies in `imdb`.

*Hint*: Think of the steps you need to do (take the average, find the ratings, find movies released in 20th/21st centuries), and try to put them in an order that makes sense.

*Hint*: You can always add lines of code before the lines with the ...

In [None]:
average_20th_century_rating = ...
average_21st_century_rating = ...
print("Average 20th century rating:", average_20th_century_rating)
print("Average 21st century rating:", average_21st_century_rating)

In [None]:
check('tests/q12.py')

The property `num_rows` tells you how many rows are in a table. (A "property" is a value attached to an object that you access without parentheses; unlike a method, you don't call it with arguments.)

In [None]:
num_movies_in_dataset = imdb.num_rows
num_movies_in_dataset

<font color=blue> **Question 13.** </font><br />Use `num_rows` (and arithmetic) to find the *proportion* of movies in the dataset that were released in the 20th century, and the proportion from the 21st century.

*Hint:* The *proportion* of movies released in the 20th century is the *number* of movies released in the 20th century, divided by the *total number* of movies.

*Hint:* Again, you can **always** add lines of code before the lines with the ...

In [None]:
proportion_in_20th_century = ...
proportion_in_21st_century = ...
print("Proportion in 20th century:", proportion_in_20th_century)
print("Proportion in 21st century:", proportion_in_21st_century)

In [None]:
check('tests/q13.py')

### Digression: Column calculations
Often you want to create a new column in a table that is the result of some calculation using one or more of the existing columns. This typically involves three steps:

1. Extract the data from the columns into array(s).
2. Do math with the array(s).
3. Add the results as a new column to your table using the `.with_column()` method.

Let's examine this process with a simple table.

In [None]:
# Notice that a table is made up of columns, each with a label and a list or array
quiz_scores = Table().with_columns(
    "Name", ["Lynda", "Jerome", "Ali"],
    "Quiz 1", make_array(8, 9, 10),
    "Quiz 2", make_array(7, 8, 9)
)
quiz_scores

Now suppose we wanted a table with just the students whose total quiz score exceeds 15. How would we do this? We can use the `where()` method to filter, but first we need a column with the total scores.

To add a column to this table with the total of Quiz 1 and Quiz 2, we will follow the three steps.

In [None]:
# Extract the data from the columns into array(s).
q1 = quiz_scores.column('Quiz 1')
q2 = quiz_scores.column('Quiz 2')

# Do math with the array(s).
total = q1 + q2

# Add the results as a new column to your table using the .with_column() method.
quiz_scores = quiz_scores.with_column('Quiz Total', total)

quiz_scores

Now we are ready to filter our table. Since we want scores exceeding 15, we use the `are.above()` predicate.

In [None]:
quiz_scores.where('Quiz Total', are.above(15))

<font color=blue> **Question 14.** </font><br />Back to movies! **Here's a challenge: Find the number of movies that came out in *odd* years.**

*Hint:* The math operator `%` computes the remainder when dividing by a number.  So `5 % 2` is 1 and `6 % 2` is 0.  A number is odd if the remainder is 1 when you divide by 2.

*Hint 2:* `%` can be used on arrays, operating elementwise like `+` or `*`.  So `make_array(5, 6, 7) % 2` is `array([1, 0, 1])`.

*Hint 3:* Create a column called "Year Remainder" that's the remainder when each movie's release year is divided by 2.  Make a copy of `imdb` that includes that column.  (You may need to add more variables.)  Then use `where` to find rows where that new column is equal to 1.  Then use `num_rows` to count the number of such rows.

*Hint 4:* Break this calculation into steps by adding code cells before the one below. Check the results of each step.

In [None]:
...
num_odd_year_movies = ...
num_odd_year_movies

In [None]:
check('tests/q14.py')

## 6. Miscellanea
There are a few more table methods you'll need to fill out your toolbox.  The first three have to do with manipulating the columns in a table.

The file `farmers_markets.csv` contains data on farmers' markets in the United States (data collected [by the USDA](https://apps.ams.usda.gov/FarmersMarketsExport/ExcelExport.aspx)). Each row represents one such market.

<font color=blue> **Question 15.** </font><br />Load the dataset into a table.  Call it `farmers_markets`.

In [None]:
farmers_markets = ...
farmers_markets

In [None]:
check('tests/q15.py')

Notice that the table has `nan` entries in many of the columns. `nan` stands for "not a number," and here it marks places where the file being read into the table had missing values. We won't discuss the problem of missing values in this lab, but it comes up A LOT in data science, so we will return to it later in the course.

You'll also notice that the table has a large number of columns!

### `num_columns`

<font color=blue> **Question 16.** </font><br /> The table property `num_columns` (example call: `tbl.num_columns`) produces the number of columns in a table.  Use it to find the number of columns in our farmers' markets dataset.

In [None]:
num_farmers_markets_columns = ...
print("The table has", num_farmers_markets_columns, "columns in it!")

In [None]:
check('tests/q16.py')

Most of the columns are about particular products -- whether the market sells tofu, pet food, etc.  If we're not interested in that stuff, it just makes the table difficult to read.  This comes up more than you might think.

### `select`

In such situations, we can use the table method `select` to pare down the columns of a table.  It takes any number of arguments.  Each should be the name or index number of a column in the table.  It returns a new table with only those columns in it.

For example, the value of `imdb.select("Year", "Decade")` is a table with only the years and decades of each movie in `imdb`.

### Digression: `select` and `column` -- Know the difference!

Students often confuse these two, so pay attention.

* Use `select` when you want a **new table** that contains only some of the columns from the original table.
* Use `column` when you want to **extract the data** from a particular column of a table.

Let's illustrate with our simple quiz scores table.

In [None]:
# Create the table
quiz_scores = Table().with_columns(
    "Name", ["Lynda", "Jerome", "Ali"],
    "Quiz 1", make_array(8, 9, 10),
    "Quiz 2", make_array(7, 8, 9)
)
quiz_scores

In [None]:
# select returns a new table with just the selected columns.
# Column labels can be passed as separate arguments or as one list.
quiz_scores.select(["Name", "Quiz 1"])

In [None]:
# column returns the data from one column of the table as an array.
quiz_scores.column("Quiz 1")

Make sure you understand the difference between `column` and `select`. You will use both of them a lot!

<font color=blue> **Question 17.** </font><br />Use `select` to create a table with only the name, city, state, latitude ('y'), and longitude ('x') of each market.  Call that new table `farmers_markets_locations`.

In [None]:
farmers_markets_locations = ...
farmers_markets_locations

In [None]:
check('tests/q17.py')

### `select` is not `column`!

The method `select` is **definitely not** the same as the method `column`.

`farmers_markets.column('y')` is an *array* of the latitudes of all the markets.  `farmers_markets.select('y')` is a *table* that happens to contain only one column, the latitudes of all the markets.

<font color=blue> **Question 18.** </font><br />Below, we tried using the function `np.average` to find the average latitude ('y') and average longitude ('x') of the farmers' markets in the table, but we screwed something up.  Run the cell to see the (somewhat inscrutable) error message that results from calling `np.average` on a table.  Then, fix our code.

In [None]:
average_latitude = np.average(farmers_markets.select('y'))
average_longitude = np.average(farmers_markets.select('x'))
print("The average of US farmers' markets' coordinates is located at (", average_latitude, ",", average_longitude, ")")

In [None]:
check('tests/q18.py')

### `drop`

`drop` is the counterpart of `select`: instead of keeping the columns you list, it removes them and keeps all the rest.

<font color=blue> **Question 19.** </font><br />Suppose you just didn't want the "Website" or "Location" columns in `farmers_markets`.  Create a table that's a copy of `farmers_markets` but doesn't include those columns.  Call that table `farmers_markets_without_website`.

In [None]:
farmers_markets_without_website = ...
farmers_markets_without_website

In [None]:
check('tests/q19.py')

### `take`
Let's find the five easternmost farmers' markets in the US.  You already know how to sort by longitude ('x'), but we haven't seen how to get the first five rows of a table.  That's what `take` is for.

Table columns have labels, but rows do not. When you used `column`, you could use either the column label or the column number; with rows, you have to use the row numbers. The table method `take` takes as its argument a row index or an array of row indices, and returns a new table with only those rows. As always, indexing in Python starts at zero.

Let's see how `take` works by creating a simple table and selecting some of the rows.

In [None]:
# Create a table
cookies = Table()
cookies = cookies.with_columns(
    "Cookie", make_array("Sugar cookies", "Chocolate chip", "Red velvet", "Oatmeal raisin", "Peanut butter"),
    "Quantity", make_array(10, 15, 15, 10, 5)
)
cookies

In [None]:
# Use take() to get the Chocolate chip row
# Remember: the first row is row zero
cookies.take(1)

Most often you'll want to use `take` in conjunction with `np.arange` to take the first few rows of a table.

In [None]:
# Generate numbers 0, 1, 2
np.arange(3)

In [None]:
# Use np.arange to "take" the first three rows
cookies.take(np.arange(3))

<font color=blue> **Question 20.** </font><br />Make a table of the five easternmost farmers' markets in `farmers_markets_locations`. Call it `eastern_markets`. (It should include the same columns as `farmers_markets_locations`.)

*Hint:* Longitudes west of the prime meridian are negative, so the easternmost markets have the largest longitude.

In [None]:
eastern_markets = ...
eastern_markets

In [None]:
check('tests/q20.py')

<font color=blue> **Question 21.** </font><br />
At the end of each lab, please include a reflection. 
* How did this lab go? 
* What aspects of Tables do you find confusing?
* Were there questions you found especially challenging you would like your instructor to review in class? 
* How long did the lab take you to complete?

Share your feedback so we can continue to improve this class!

**Write your reflection on this lab in the markdown cell below. Add as many lines as you wish.**

...

In [None]:
# Save your notebook before running this test.
check('tests/q21_open_ended.py')

## 7. Summary

For your reference, here's a table of all the functions and methods we saw in this lab.

|Name|Example|Purpose|
|-|-|-|
|`Table`|`Table()`|Create an empty table, usually to extend with data|
|`Table.read_table`|`Table.read_table("my_data.csv")`|Create a table from a data file|
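|`with_columns`|`tbl = Table().with_columns("N", np.arange(5), "2*N", np.arange(0, 10, 2))`|Create a copy of a table with more columns|
|`with_column`|`tbl.with_column("N^2", np.arange(5) ** 2)`|Create a copy of a table with one more column|
|`with_row`|`tbl.with_row([5, 10])`|Create a copy of a table with one more row|
|`with_rows`|`tbl.with_rows([[5, 10], [6, 12]])`|Create a copy of a table with more rows|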
|`column`|`tbl.column("N")`|Create an array containing the elements of a column|
|`sort`|`tbl.sort("N")`|Create a copy of a table sorted by the values in a column|
|`where`|`tbl.where("N", are.above(2))`|Create a copy of a table with only the rows that match some *predicate*|
|`num_rows`|`tbl.num_rows`|Compute the number of rows in a table|
|`num_columns`|`tbl.num_columns`|Compute the number of columns in a table|
|`select`|`tbl.select("N")`|Create a copy of a table with only some of the columns|
|`drop`|`tbl.drop("2*N")`|Create a copy of a table without some of the columns|
|`take`|`tbl.take(np.arange(0, 6, 2))`|Create a copy of the table with only the rows whose indices are in the given array|

<br/>

Congratulations, you're done with lab 3! Be sure to
- **Run all the tests and verify that they all pass** (the next cell has a shortcut for that),
- **Save and Export as HTML** from the `File` menu,
- **Right-click on the file name to download the notebook**, and
- **Upload and submit your files to Canvas** under the corresponding assignment.

In [None]:
import glob
from gofer.ok import check

for x in [
    "1",
    "2",
    "3",
    "4",
    "5",
    "6",
    "7",
    "8",
    "9",
    "10",
    "10_open_ended",
    "11",
    "12",
    "13",
    "14",
    "15",
    "16",
    "17",
    "18",
    "19",
    "20",
    "21_open_ended",
]:
    print("Testing question {}: ".format(x))
    display(check("tests/q{}.py".format(x)))

In [None]:
import time

print("Nice work", name, user)
localtime = time.asctime(time.localtime(time.time()))
print("Submitted @", localtime)

In [None]:
from datascience import Table, are

# Create the data table
phones = Table().with_columns(
    'Student Name', ['Jada', 'Noah', 'Derrick', 'Eva', 'Aisha'],
    'Model', ['iPhone', 'Samsung Galaxy', 'Google Pixel', 'iPhone', 'iPhone'],
    'Screentime', [180, 375, 710, 508, 197],
    'Data (MB/day)', [22, 10.4, 26, 57, 33.6],
    'Cell phone provider', ['Verizon', 'AT&T', 'T-Mobile', 'Sprint', 'T-Mobile']
)

# Display the table
phones

In [None]:
heavy_users = phones.where("Screentime", are.above(500)).where("Data (MB/day)", are.above(50))

In [None]:
heavy_users