# Lab 02: Assignment Statements and Table Operations

Welcome to Lab 02! Each week you will complete a lab assignment like this one. You can't learn technical subjects without hands-on practice, so labs are an important part of the course.

Collaborating on labs is more than okay -- it's encouraged! You should rarely remain stuck for more than a few minutes on questions in labs, so ask a neighbor or an instructor for help. Explaining things is beneficial, too -- the best way to solidify your knowledge of a subject is to explain it. You should not just copy/paste someone else's code, but rather work together to gain understanding of the task you need to complete.

To receive credit for a lab, answer all questions correctly and submit before the deadline.

**Due Date:** 

**Collaboration Policy:** Data science is a collaborative activity. While you may talk with others about the labs, we ask that you **write your solutions individually**. If you do discuss the assignments with others **please include their names below** (it's a good way to learn your classmates' names).

**Collaborators:** 

List collaborators here.

## Today's Lab 

This week, we'll learn how to import a module and practice table operations. Please complete this notebook by filling in the cells provided. Before you begin, be sure you've run the very first cell in this notebook to load `otter`, the auto-grading library that will provide feedback while you complete this assignment. Run this cell each time you load this notebook, because everything stored in memory is reset when you close the notebook for more than a few minutes.

Recommended reading:

* [Names](https://inferentialthinking.com/chapters/03/2/Names.html)

* [Call expressions](https://inferentialthinking.com/chapters/03/3/Calls.html)

* [Introduction to tables](https://www.inferentialthinking.com/chapters/03/4/Introduction_to_Tables)

Run the cell below to set up the imports.

In [43]:
from datascience import *

# 1. Assignment Statement

The two building blocks of Python code are *expressions* and *statements*.  An **expression** is a piece of code that

* is self-contained, meaning it would make sense to write it on a line by itself, and
* usually evaluates to a value.


Here are two expressions that both evaluate to 3:

* 3
* 5 - 2
    
One important type of expression is the **call expression**. A call expression begins with the name of a function and is followed by the argument(s) of that function in parentheses. The function returns some value, based on its arguments. Some important mathematical functions are listed below.

| Function | Description                                                   |
|----------|---------------------------------------------------------------|
| `abs`      | Returns the absolute value of its argument                    |
| `max`      | Returns the maximum of all its arguments                      |
| `min`      | Returns the minimum of all its arguments                      |
| `pow`      | Raises its first argument to the power of its second argument |
| `round`    | Rounds its argument to the nearest integer                     |

Here are two call expressions that both evaluate to 3:

* `abs(2 - 5)`

* `max(round(2.8), min(pow(2, 10), -1 * pow(2, 10)))`

The expression `5 - 2` and the two call expressions given above are examples of **compound expressions**, meaning that they are actually combinations of several smaller expressions.  `5 - 2` combines the expressions `5` and `2` by subtraction.  In this case, `5` and `2` are called **subexpressions** because they're expressions that are part of a larger expression.
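
To see how Python works through a compound expression, it can help to evaluate the subexpressions from the inside out. The sketch below does this for the second call expression above; the intermediate names (`two_to_the_tenth`, `smaller`, `rounded`, `result`) are introduced here only for illustration.

```python
# Evaluate max(round(2.8), min(pow(2, 10), -1 * pow(2, 10))) from the inside out.
two_to_the_tenth = pow(2, 10)                            # 1024
smaller = min(two_to_the_tenth, -1 * two_to_the_tenth)   # -1024
rounded = round(2.8)                                     # 3
result = max(rounded, smaller)                           # 3

# The step-by-step value matches the single compound expression.
print(result == max(round(2.8), min(pow(2, 10), -1 * pow(2, 10))))  # True
```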

A **statement** is a whole line of code.  Some statements are just expressions.  The expressions listed above are examples.

Other statements *make something happen* rather than *having a value*. For example, an **assignment statement** assigns a value to a name. 

A good way to think about this is that we're **evaluating the right-hand side** of the equals sign and **assigning it to the left-hand side**. Here are some assignment statements:
    
* `height = 1.3`

* `the_number_five = abs(-5)`

* `absolute_height_difference = abs(height - 1.688)`

An important idea in programming is that large, interesting things can be built by combining many simple, uninteresting things.  The key to understanding a complicated piece of code is breaking it down into its simple components.

For example, a lot is going on in the last statement above, but it's really just a combination of a few things.  This picture describes what's going on.

<img src="images/statement.png"/>
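
The same breakdown can also be written out in code. This is only a sketch of the order of evaluation: the name `difference` is introduced here for illustration and does not appear in the statement itself.

```python
height = 1.3

# Step 1: evaluate the subexpression inside the parentheses.
difference = height - 1.688      # a small negative number, about -0.388

# Step 2: call abs on that value, then assign the result to the name on the left.
absolute_height_difference = abs(difference)

print(absolute_height_difference)  # about 0.388
```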

**Question 1.** In the next cell, assign the name `new_year` to the larger number among the following two numbers:

1. the **absolute value** of $2^{5}-2^{11}-2^3 + 1$, and

2. $5 \times 13 \times 31+1$.

Try to use just one statement (one line of code). Be sure to check your work by executing the test cell afterward.


In [44]:
new_year = max(abs(2**5-2**11-2**3+1), 5*13*31+1) # SOLUTION
new_year

2023

In [45]:
new_year

2023

We've asked you to use one line of code in the question above because it only involves mathematical operations. However, more complicated programming questions will require more steps. It isn’t always a good idea to jam these steps into a single line because it can make the code harder to read and harder to debug.

Good programming practice involves splitting up your code into smaller steps and using appropriate names. You'll have plenty of opportunities to practice this throughout this course.
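
As a rough illustration, the one-line computation from Question 1 could also be written in smaller named steps. The names `first_number` and `second_number` are just one possible choice.

```python
# Each intermediate value gets a descriptive name.
first_number = abs(2**5 - 2**11 - 2**3 + 1)   # 2023
second_number = 5 * 13 * 31 + 1               # 2016

# The final answer is the larger of the two.
new_year = max(first_number, second_number)
print(new_year)                               # 2023
```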

# 2. Importing code

> What has been will be again,  
> what has been done will be done again;  
> there is nothing new under the sun.

Most programming involves work that is very similar to work that has been done before.  Since writing code is time-consuming, it's good to rely on others' published code when you can.  Rather than copy-pasting, Python allows us to **import modules**. A module is a file with Python code that has defined variables and functions. By importing a module, we are able to use its code in our own notebook.

Python includes many useful modules that are just an import away.  We'll look at the `math` module as a first example. The `math` module is extremely useful in computing mathematical expressions in Python. 

Suppose we want to very accurately compute the area of a circle with a radius of 5 meters.  For that, we need the constant $\pi$, which is roughly 3.14.  Conveniently, the `math` module has `pi` defined for us.

Run the cell below.

In [46]:
import math

radius = 5
area_of_circle = radius**2 * math.pi
area_of_circle

78.53981633974483

In the code above, the line `import math` imports the math module. This statement creates a module and then assigns the name `math` to that module. We are now able to access any variables or functions defined within `math` by typing the name of the module followed by a dot, then followed by the name of the variable or function we want.

**Question 2.** The module `math` also provides the name `e` for the base of the natural logarithm, which is roughly 2.71.  Compute $e^{\pi}-\pi$, giving it the name `near_twenty`.


In [47]:
near_twenty = math.e**math.pi-math.pi # SOLUTION
near_twenty

19.99909997918947

In [48]:
near_twenty != math.e**math.pi

True

In [49]:
near_twenty

19.99909997918947

# 3. Accessing Functions

In the question above, you accessed variables within the `math` module. 

Modules also define functions.  For example, `math` provides the name `sin` for the sine function.  Having imported `math` already, we can write `math.sin(3)` to compute the sine of 3.

**Note:** This sine function considers its argument to be in [radians](https://en.wikipedia.org/wiki/Radian), not degrees. 180 degrees are equivalent to $\pi$ radians.
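
If an angle is given in degrees, it needs to be converted before being passed to `math.sin`. A minimal sketch, using `math.radians` from the standard library for the conversion (the name `angle_in_degrees` is made up for this example):

```python
import math

# 180 degrees is the same angle as pi radians.
print(math.isclose(math.radians(180), math.pi))   # True

# Convert 30 degrees to radians before taking the sine.
angle_in_degrees = 30
sine_of_angle = math.sin(math.radians(angle_in_degrees))
print(round(sine_of_angle, 10))                   # 0.5
```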

**Question 3.** A $\frac{\pi}{4}$ radian (45-degree) angle forms a right triangle with equal base and height, pictured below.  If the hypotenuse (the radius of the circle in the picture) is 1, then the height is $\sin\left(\frac{\pi}{4}\right)$.  Compute that value using `sin` and `pi` from the `math` module.  Give the result the name `sine_of_pi_over_four`.

<img src="http://mathworld.wolfram.com/images/eps-gif/TrigonometryAnglesPi4_1000.gif">

**Source:** [Wolfram MathWorld](http://mathworld.wolfram.com/images/eps-gif/TrigonometryAnglesPi4_1000.gif)



In [50]:
sine_of_pi_over_four = math.sin(math.pi/4) # SOLUTION
sine_of_pi_over_four

0.7071067811865475

In [51]:
sine_of_pi_over_four

0.7071067811865475

For your reference, the cells below demonstrate some more examples of functions from the `math` module.

Notice how different functions take in different numbers of arguments. Often, the [documentation](https://docs.python.org/3/library/math.html) of the module will provide information on how many arguments are required for each function.

Run the cell below.

In [52]:
# Calculating logarithms (the logarithm of 8 in base 2)
# The result is 3 because 2 to the power of 3 is 8
math.log(8, 2)

3.0

Run the cell below.

In [53]:
# Calculating square roots
math.sqrt(5)

2.23606797749979

There are various ways to import and access code from outside sources. The method we used above — `import <module_name>` — imports the entire module and requires that we use `<module_name>.<name>` to access its code. 

We can also import a specific constant or function instead of the entire module. Notice that you don't have to use the module name beforehand to reference that particular value. However, you do have to be careful about reassigning the names of the constants or functions to other values.

Run the cell below.

In [54]:
# Importing just cos and pi from math
# We don't have to use `math.` in front of cos or pi

from math import cos, pi
print(cos(pi))

# We do have to use it in front of other functions from math, though
math.log(pi)

-1.0


1.1447298858494002

Or we can import every function and value from the entire module.

Run the cell below.

In [55]:
# Lastly, we can import everything from math using the *
# Once again, we don't have to use 'math.' beforehand 

from math import *
log(pi)

1.1447298858494002

Don't worry too much about which type of import to use. It's often a coding style choice left up to each programmer. In this course, you'll always import the necessary modules when you run the setup cell (like the first code cell in this lab).
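
The earlier caution about reassigning imported names can be made concrete with a short sketch. Here the name `pi` is deliberately reassigned, which is something to avoid in practice.

```python
import math
from math import pi

print(pi == math.pi)   # True: both names refer to the same value

# Reassigning the bare name changes what `pi` means from here on...
pi = 3
print(pi)              # 3

# ...but the module's own value is untouched and still reachable as math.pi.
print(math.pi)         # 3.141592653589793
```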

# 4. Table Operations

The table `farmers_markets.csv` contains data on farmers' markets in the United States  (data collected [by the USDA](https://apps.ams.usda.gov/FarmersMarketsExport/ExcelExport.aspx)).  Each row represents one such market.

Run the cell below.

In [56]:
farmers_markets = Table.read_table('data/farmers_markets.csv')
farmers_markets

FMID,MarketName,street,city,County,State,zip,x,y,Website,Facebook,Twitter,Youtube,OtherMedia,Organic,Tofu,Bakedgoods,Cheese,Crafts,Flowers,Eggs,Seafood,Herbs,Vegetables,Honey,Jams,Maple,Meat,Nursery,Nuts,Plants,Poultry,Prepared,Soap,Trees,Wine,Coffee,Beans,Fruits,Grains,Juices,Mushrooms,PetFood,WildHarvested,updateTime,Location,Credit,WIC,WICcash,SFMNP,SNAP,Season1Date,Season1Time,Season2Date,Season2Time,Season3Date,Season3Time,Season4Date,Season4Time
1012063,Caledonia Farmers Market Association - Danville,,Danville,Caledonia,Vermont,5828,-72.1403,44.411,https://sites.google.com/site/caledoniafarmersmarket/,https://www.facebook.com/Danville.VT.Farmers.Market/,,,,Y,N,Y,Y,Y,Y,Y,N,Y,Y,Y,Y,Y,Y,N,N,Y,Y,Y,Y,Y,N,Y,Y,Y,N,Y,N,Y,N,6/28/2016 12:10:09 PM,,Y,Y,N,Y,N,06/08/2016 to 10/12/2016,Wed: 9:00 AM-1:00 PM;,,,,,,
1011871,Stearns Homestead Farmers' Market,6975 Ridge Road,Parma,Cuyahoga,Ohio,44130,-81.7286,41.3751,http://Stearnshomestead.com,,,,,-,N,Y,N,N,Y,Y,N,Y,Y,Y,Y,Y,Y,N,N,Y,N,N,N,N,N,N,N,Y,N,N,N,Y,N,4/9/2016 8:05:17 PM,,Y,Y,N,Y,Y,06/25/2016 to 10/01/2016,Sat: 9:00 AM-1:00 PM;,,,,,,
1011878,100 Mile Market,507 Harrison St,Kalamazoo,Kalamazoo,Michigan,49007,-85.5749,42.296,http://www.pfcmarkets.com,https://www.facebook.com/100MileMarket/?fref=ts,,,https://www.instagram.com/100milemarket/,N,N,Y,Y,N,Y,Y,N,Y,Y,Y,Y,Y,Y,N,N,N,Y,Y,Y,N,Y,N,N,Y,Y,N,N,N,N,4/16/2016 12:37:56 PM,,Y,Y,N,Y,Y,05/04/2016 to 10/12/2016,Wed: 3:00 PM-7:00 PM;,,,,,,
1009364,106 S. Main Street Farmers Market,106 S. Main Street,Six Mile,,South Carolina,29682,-82.8187,34.8042,http://thetownofsixmile.wordpress.com/,,,,,-,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,2013,,Y,N,N,N,N,,,,,,,,
1010691,10th Steet Community Farmers Market,10th Street and Poplar,Lamar,Barton,Missouri,64759,-94.2746,37.4956,,,,,http://agrimissouri.com/mo-grown/grodetail.php?type=mo-g ...,-,N,Y,N,Y,N,Y,N,Y,Y,Y,Y,N,Y,N,N,Y,Y,Y,Y,N,N,N,N,Y,N,N,N,N,N,10/28/2014 9:49:46 AM,,Y,N,N,N,N,04/02/2014 to 11/30/2014,Wed: 3:00 PM-6:00 PM;Sat: 8:00 AM-1:00 PM;,,,,,,
1002454,112st Madison Avenue,112th Madison Avenue,New York,New York,New York,10029,-73.9493,40.7939,,,,,,-,N,Y,N,Y,Y,N,N,Y,Y,Y,Y,N,N,N,Y,N,N,Y,Y,N,N,N,N,N,N,N,N,N,N,3/1/2012 10:38:22 AM,Private business parking lot,N,N,Y,Y,N,July to November,Tue:8:00 am - 5:00 pm;Sat:8:00 am - 8:00 pm;,,,,,,
1011100,12 South Farmers Market,3000 Granny White Pike,Nashville,Davidson,Tennessee,37204,-86.7907,36.1184,http://www.12southfarmersmarket.com,12_South_Farmers_Market,@12southfrmsmkt,,@12southfrmsmkt,Y,N,Y,Y,N,Y,Y,N,Y,Y,Y,Y,Y,Y,N,N,N,Y,Y,Y,N,N,Y,N,Y,N,Y,Y,Y,N,5/1/2015 10:40:56 AM,,Y,N,N,N,Y,05/05/2015 to 10/27/2015,Tue: 3:30 PM-6:30 PM;,,,,,,
1009845,125th Street Fresh Connect Farmers' Market,"163 West 125th Street and Adam Clayton Powell, Jr. Blvd.",New York,New York,New York,10027,-73.9482,40.809,http://www.125thStreetFarmersMarket.com,https://www.facebook.com/125thStreetFarmersMarket,https://twitter.com/FarmMarket125th,,Instagram--> 125thStreetFarmersMarket,Y,N,Y,Y,Y,Y,Y,N,Y,Y,Y,Y,Y,Y,N,Y,N,Y,Y,Y,N,Y,Y,N,Y,N,Y,N,N,N,4/7/2014 4:32:01 PM,Federal/State government building grounds,Y,Y,N,Y,Y,06/10/2014 to 11/25/2014,Tue: 10:00 AM-7:00 PM;,,,,,,
1005586,12th & Brandywine Urban Farm Market,12th & Brandywine Streets,Wilmington,New Castle,Delaware,19801,-75.5345,39.7421,,https://www.facebook.com/pages/12th-Brandywine-Urban-Far ...,,,https://www.facebook.com/delawareurbanfarmcoalition,N,N,N,N,N,N,N,N,Y,Y,N,N,N,N,N,N,N,N,N,N,N,N,N,N,Y,N,N,N,N,N,4/3/2014 3:43:31 PM,"On a farm from: a barn, a greenhouse, a tent, a stand, etc",N,N,N,N,Y,05/16/2014 to 10/17/2014,Fri: 8:00 AM-11:00 AM;,,,,,,
1008071,14&U Farmers' Market,1400 U Street NW,Washington,District of Columbia,District of Columbia,20009,-77.0321,38.917,,https://www.facebook.com/14UFarmersMarket,https://twitter.com/14UFarmersMkt,,,Y,N,Y,Y,N,Y,Y,N,Y,Y,Y,Y,N,Y,N,Y,Y,Y,N,N,N,N,N,Y,Y,Y,Y,N,N,N,4/5/2014 1:49:04 PM,Other,Y,Y,Y,Y,Y,05/03/2014 to 11/22/2014,Sat: 9:00 AM-1:00 PM;,,,,,,


Let's examine our table to see what data it contains.

**Question 4.** Use the method `show` to display the first 5 rows of `farmers_markets`.

**Note:** The terms "method" and "function" are technically not the same thing, but for the purposes of this course, we will use them interchangeably.

**Warning:** Make sure not to call `.show()` without an argument, as this will **crash your kernel**.


In [57]:
farmers_markets.show(5) # SOLUTION

FMID,MarketName,street,city,County,State,zip,x,y,Website,Facebook,Twitter,Youtube,OtherMedia,Organic,Tofu,Bakedgoods,Cheese,Crafts,Flowers,Eggs,Seafood,Herbs,Vegetables,Honey,Jams,Maple,Meat,Nursery,Nuts,Plants,Poultry,Prepared,Soap,Trees,Wine,Coffee,Beans,Fruits,Grains,Juices,Mushrooms,PetFood,WildHarvested,updateTime,Location,Credit,WIC,WICcash,SFMNP,SNAP,Season1Date,Season1Time,Season2Date,Season2Time,Season3Date,Season3Time,Season4Date,Season4Time
1012063,Caledonia Farmers Market Association - Danville,,Danville,Caledonia,Vermont,5828,-72.1403,44.411,https://sites.google.com/site/caledoniafarmersmarket/,https://www.facebook.com/Danville.VT.Farmers.Market/,,,,Y,N,Y,Y,Y,Y,Y,N,Y,Y,Y,Y,Y,Y,N,N,Y,Y,Y,Y,Y,N,Y,Y,Y,N,Y,N,Y,N,6/28/2016 12:10:09 PM,,Y,Y,N,Y,N,06/08/2016 to 10/12/2016,Wed: 9:00 AM-1:00 PM;,,,,,,
1011871,Stearns Homestead Farmers' Market,6975 Ridge Road,Parma,Cuyahoga,Ohio,44130,-81.7286,41.3751,http://Stearnshomestead.com,,,,,-,N,Y,N,N,Y,Y,N,Y,Y,Y,Y,Y,Y,N,N,Y,N,N,N,N,N,N,N,Y,N,N,N,Y,N,4/9/2016 8:05:17 PM,,Y,Y,N,Y,Y,06/25/2016 to 10/01/2016,Sat: 9:00 AM-1:00 PM;,,,,,,
1011878,100 Mile Market,507 Harrison St,Kalamazoo,Kalamazoo,Michigan,49007,-85.5749,42.296,http://www.pfcmarkets.com,https://www.facebook.com/100MileMarket/?fref=ts,,,https://www.instagram.com/100milemarket/,N,N,Y,Y,N,Y,Y,N,Y,Y,Y,Y,Y,Y,N,N,N,Y,Y,Y,N,Y,N,N,Y,Y,N,N,N,N,4/16/2016 12:37:56 PM,,Y,Y,N,Y,Y,05/04/2016 to 10/12/2016,Wed: 3:00 PM-7:00 PM;,,,,,,
1009364,106 S. Main Street Farmers Market,106 S. Main Street,Six Mile,,South Carolina,29682,-82.8187,34.8042,http://thetownofsixmile.wordpress.com/,,,,,-,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,2013,,Y,N,N,N,N,,,,,,,,
1010691,10th Steet Community Farmers Market,10th Street and Poplar,Lamar,Barton,Missouri,64759,-94.2746,37.4956,,,,,http://agrimissouri.com/mo-grown/grodetail.php?type=mo-g ...,-,N,Y,N,Y,N,Y,N,Y,Y,Y,Y,N,Y,N,N,Y,Y,Y,Y,N,N,N,N,Y,N,N,N,N,N,10/28/2014 9:49:46 AM,,Y,N,N,N,N,04/02/2014 to 11/30/2014,Wed: 3:00 PM-6:00 PM;Sat: 8:00 AM-1:00 PM;,,,,,,


Notice that some of the values in this table are missing, as denoted by "`nan`." This means either that the value is not available (e.g. if we don’t know the market’s street address) or not applicable (e.g. if the market doesn’t have a street address). You'll also notice that the table has a large number of columns in it.

The table property `num_columns` returns the number of columns in a table. A *"property"* is just a method that doesn't need to be called by adding parentheses.

Example call: `<tbl>.num_columns`
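
For the curious, the sketch below shows how a property differs from a method in plain Python. `TinyTable` is a made-up class for illustration only; it is not how the `datascience` library defines `Table`.

```python
class TinyTable:
    """A made-up, minimal stand-in for a table (illustration only)."""

    def __init__(self, labels):
        self.labels = labels

    @property
    def num_columns(self):
        # Accessed as tbl.num_columns, with no parentheses.
        return len(self.labels)

    def count_columns(self):
        # An ordinary method must be called: tbl.count_columns()
        return len(self.labels)


tbl = TinyTable(['MarketName', 'city', 'State'])
print(tbl.num_columns)      # 3
print(tbl.count_columns())  # 3
```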

**Question 5.** Use `num_columns` to find the number of columns in our farmers' markets dataset.

Assign the number of columns to `num_farmers_markets_columns`.


In [58]:
num_farmers_markets_columns = farmers_markets.num_columns # SOLUTION
print('The table has', num_farmers_markets_columns, 'columns in it.')

The table has 59 columns in it.


In [59]:
num_farmers_markets_columns == farmers_markets.num_columns

True

Similarly, the property `num_rows` tells you how many rows are in a table.

**Question 6.** Assign the number of rows to `num_farmers_markets_rows`.


In [60]:
num_farmers_markets_rows = farmers_markets.num_rows # SOLUTION
print("The table has", num_farmers_markets_rows, "rows in it.")

The table has 8546 rows in it.


In [61]:
num_farmers_markets_rows == farmers_markets.num_rows

True

Most of the columns are about particular products -- whether the market sells tofu, pet food, etc.  If we're not interested in that information, it just makes the table difficult to read.  This comes up more than you might think, because people who collect and publish data may not know ahead of time what people will want to do with it.

In such situations, we can use the table method `select` to choose only the columns that we want in a particular table. It takes any number of arguments. Each should be the name of a column in the table. It returns a new table with only those columns in it. The columns are in the order in which they were listed as arguments.

For example, the value of `farmers_markets.select("MarketName", "State")` is a table with only the name and the state of each farmers' market in `farmers_markets`.



**Question 7.** Use `select` to create a table with only the name, city, state, latitude (`y`), and longitude (`x`) of each market.  Call that new table `farmers_markets_locations`.

**Hint:** Make sure to be exact when using column names with `select`; double-check capitalization.


In [62]:
farmers_markets_locations = farmers_markets.select('MarketName', 'city', 'State', 'x', 'y') # SOLUTION
farmers_markets_locations

MarketName,city,State,x,y
Caledonia Farmers Market Association - Danville,Danville,Vermont,-72.1403,44.411
Stearns Homestead Farmers' Market,Parma,Ohio,-81.7286,41.3751
100 Mile Market,Kalamazoo,Michigan,-85.5749,42.296
106 S. Main Street Farmers Market,Six Mile,South Carolina,-82.8187,34.8042
10th Steet Community Farmers Market,Lamar,Missouri,-94.2746,37.4956
112st Madison Avenue,New York,New York,-73.9493,40.7939
12 South Farmers Market,Nashville,Tennessee,-86.7907,36.1184
125th Street Fresh Connect Farmers' Market,New York,New York,-73.9482,40.809
12th & Brandywine Urban Farm Market,Wilmington,Delaware,-75.5345,39.7421
14&U Farmers' Market,Washington,District of Columbia,-77.0321,38.917


In [63]:
farmers_markets_locations.num_rows

8546

In [64]:
farmers_markets_locations.num_columns

5

In [65]:
sorted(farmers_markets_locations.labels) == ['MarketName', 'State', 'city', 'x', 'y']

True

`drop` serves the same purpose as `select`, but it takes away the columns that you provide rather than the ones that you don't provide. Like `select`, `drop` returns a new table.
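
One way to picture the difference is to treat a table as a collection of named columns. The plain-Python sketch below is an analogy, not the `datascience` implementation: `tiny_table` is a dictionary holding three rows copied from the table above, and the helper functions `select_columns` and `drop_columns` are hypothetical names.

```python
# A tiny stand-in for a table: column label -> list of values.
tiny_table = {
    'FMID': [1012063, 1011871, 1011878],
    'MarketName': ['Caledonia Farmers Market Association - Danville',
                   "Stearns Homestead Farmers' Market",
                   '100 Mile Market'],
    'State': ['Vermont', 'Ohio', 'Michigan'],
}

def select_columns(table, *labels):
    """Keep only the listed columns, in the order they were listed."""
    return {label: table[label] for label in labels}

def drop_columns(table, *labels):
    """Keep every column except the listed ones."""
    return {label: values for label, values in table.items()
            if label not in labels}

print(list(select_columns(tiny_table, 'State', 'MarketName')))  # ['State', 'MarketName']
print(list(drop_columns(tiny_table, 'FMID')))                   # ['MarketName', 'State']
print(list(tiny_table))  # ['FMID', 'MarketName', 'State'] (original unchanged)
```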

**Question 8.** Suppose you just didn't want the `FMID` or `updateTime` columns in `farmers_markets`.  Create a table that's a copy of `farmers_markets` but doesn't include those columns.  Call that table `farmers_markets_without_fmid`.


In [66]:
farmers_markets_without_fmid = farmers_markets.drop('FMID', 'updateTime') # SOLUTION

In [67]:
farmers_markets_without_fmid.num_columns

57

In [68]:
sorted(farmers_markets_without_fmid.labels) == ['Bakedgoods', 'Beans', 'Cheese', 'Coffee', 'County', 'Crafts', 'Credit', 'Eggs', 'Facebook', 'Flowers', 'Fruits', 'Grains', 'Herbs', 'Honey', 'Jams', 'Juices', 'Location', 'Maple', 'MarketName', 'Meat', 'Mushrooms', 'Nursery', 'Nuts', 'Organic', 'OtherMedia', 'PetFood', 'Plants', 'Poultry', 'Prepared', 'SFMNP', 'SNAP', 'Seafood', 'Season1Date', 'Season1Time', 'Season2Date', 'Season2Time', 'Season3Date', 'Season3Time', 'Season4Date', 'Season4Time', 'Soap', 'State', 'Tofu', 'Trees', 'Twitter', 'Vegetables', 'WIC', 'WICcash', 'Website', 'WildHarvested', 'Wine', 'Youtube', 'city', 'street', 'x', 'y', 'zip']

True

Now, suppose we want to answer some questions about farmers' markets in the US. For example, which market(s) have the largest longitude (given by the `x` column)? 

To answer this, we'll sort `farmers_markets_locations` by longitude.

Run the cell below.

In [69]:
farmers_markets_locations.sort('x')

MarketName,city,State,x,y
Trapper Creek Farmers Market,Trapper Creek,Alaska,-166.54,53.8748
Kekaha Neighborhood Center (Sunshine Markets),Kekaha,Hawaii,-159.718,21.9704
Hanapepe Park (Sunshine Markets),Hanapepe,Hawaii,-159.588,21.9101
Kalaheo Neighborhood Center (Sunshine Markets),Kalaheo,Hawaii,-159.527,21.9251
Hawaiian Farmers of Hanalei,Hanalei,Hawaii,-159.514,22.2033
Hanalei Saturday Farmers Market,Hanalei,Hawaii,-159.492,22.2042
Kauai Culinary Market,Koloa,Hawaii,-159.469,21.9067
Koloa Ball Park (Knudsen) (Sunshine Markets),Koloa,Hawaii,-159.465,21.9081
West Kauai Agricultural Association,Poipu,Hawaii,-159.435,21.8815
Kilauea Neighborhood Center (Sunshine Markets),Kilauea,Hawaii,-159.406,22.2112


That didn't answer our question because we sorted from smallest to largest longitude. To look at the largest longitudes, we'll have to sort in reverse order.

**Question 9.** Sort the `farmers_markets_locations` table from largest to smallest longitude. Call it `farmers_markets_locations_by_longitude`. Click [here](http://data8.org/datascience/_autosummary/datascience.tables.Table.sort.html?highlight=sort#datascience.tables.Table.sort) to read the documentation on how to use the `sort` method.


In [70]:
farmers_markets_locations_by_longitude = farmers_markets_locations.sort('x', descending = True) # SOLUTION
farmers_markets_locations_by_longitude

MarketName,city,State,x,y
"Christian ""Shan"" Hendricks Vegetable Market",Saint Croix,Virgin Islands,-64.7043,17.7449
La Reine Farmers Market,Saint Croix,Virgin Islands,-64.7789,17.7322
Anne Heyliger Vegetable Market,Saint Croix,Virgin Islands,-64.8799,17.7099
Rothschild Francis Vegetable Market,St. Thomas,Virgin Islands,-64.9326,18.3428
Feria Agrícola de Luquillo,Luquillo,Puerto Rico,-65.7207,18.3782
El Mercado Familiar,San Lorenzo,Puerto Rico,-65.9674,18.1871
El Mercado Familiar,Gurabo,Puerto Rico,-65.9786,18.2526
El Mercado Familiar,Patillas,Puerto Rico,-66.0135,18.0069
El Mercado Familiar,Caguas zona urbana,Puerto Rico,-66.039,18.2324
El Maercado Familiar,Arroyo zona urbana,Puerto Rico,-66.0617,17.9686


In [71]:
isinstance(farmers_markets_locations_by_longitude, tables.Table)

True

In [72]:
list(farmers_markets_locations_by_longitude.column('x').take(range(3))) == [-64.7043, -64.7789, -64.8799]

True

Some details about `sort`:

1. The first argument to `sort` is the name of a column to sort by.

2. If the column has text in it, `sort` will sort alphabetically; if the column has numbers, it will sort numerically.

3. The value of `farmers_markets_locations.sort("x")` is a *copy* of `farmers_markets_locations`; the `farmers_markets_locations` table doesn't get modified. For example, if we called `farmers_markets_locations.sort("x")`, then running `farmers_markets_locations` by itself would still return the unsorted table.

4. Rows always stick together when a table is sorted.  It wouldn't make sense to sort just one column and leave the other columns alone.  For example, in this case, if we sorted just the `x` column, the farmers' markets would all end up with the wrong longitudes.
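
Points 3 and 4 can be illustrated with a plain-Python analogy. This sketch uses the built-in `sorted` on a list of rows rather than the `datascience` library, and the three rows are copied from the outputs above.

```python
# Each row is a (MarketName, x) pair.
rows = [
    ('100 Mile Market', -85.5749),
    ('La Reine Farmers Market', -64.7789),
    ('Trapper Creek Farmers Market', -166.54),
]

# Sort whole rows by the longitude in position 1, largest first.
by_longitude = sorted(rows, key=lambda row: row[1], reverse=True)

# Rows stick together: each name keeps its own longitude.
print(by_longitude[0])   # ('La Reine Farmers Market', -64.7789)

# The original list is unchanged, just as `sort` leaves the original table alone.
print(rows[0])           # ('100 Mile Market', -85.5749)
```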

**Question 10.** Create a version of `farmers_markets_locations` that's sorted by **latitude (`y`)**, with the largest latitudes first.  Call it `farmers_markets_locations_by_latitude`.


In [73]:
farmers_markets_locations_by_latitude = farmers_markets_locations.sort('y', descending = True) # SOLUTION
farmers_markets_locations_by_latitude

MarketName,city,State,x,y
Tanana Valley Farmers Market,Fairbanks,Alaska,-147.781,64.8628
Ester Community Market,Ester,Alaska,-148.01,64.8459
Fairbanks Downtown Market,Fairbanks,Alaska,-147.72,64.8444
Nenana Open Air Market,Nenana,Alaska,-149.096,64.5566
Highway's End Farmers' Market,Delta Junction,Alaska,-145.733,64.0385
MountainTraders,Talkeetna,Alaska,-150.118,62.3231
Talkeetna Farmers Market,Talkeetna,Alaska,-150.118,62.3228
Denali Farmers Market,Anchorage,Alaska,-150.234,62.3163
Kenny Lake Harvest II,Valdez,Alaska,-145.476,62.1079
Copper Valley Community Market,Copper Valley,Alaska,-145.444,62.0879


In [74]:
farmers_markets_locations_by_latitude.first('y') == 64.86275

True

Now let's say we want a table of all farmers' markets in North Carolina. Sorting won't help us much here because North Carolina is closer to the middle of the dataset.

Instead, we use the table method `where`. Look at the [documentation](http://data8.org/datascience/_autosummary/datascience.tables.Table.where.html?highlight=where) to see how to use `where`.

Run the following cell:

In [75]:
nc_farmers_markets = farmers_markets_locations.where('State', are.equal_to('North Carolina'))
nc_farmers_markets

MarketName,city,State,x,y
Afton Village Farmers Market,Concord,North Carolina,-80.6702,35.414
Alamance County Farmers Market,Burlington,North Carolina,-79.4357,36.0943
Alexander County Farmers Market,Taylorsville,North Carolina,-81.1781,35.9197
Alleghany Farmers Market,Sparta,North Carolina,-81.1226,36.503
Andrews Farmers Market,Andrews,North Carolina,-83.8232,35.2027
Anson County Farmers Market,Wadesboro,North Carolina,-80.0526,34.9408
Apex Farmers Market,Apex,North Carolina,-78.8499,35.732
Ashboro Downtown Farmers Market,Asheboro,North Carolina,-79.8175,35.7049
Ashe County Farmers Market,West Jefferson,North Carolina,-81.4935,36.4025
Asheville City Market,Asheville,North Carolina,-82.5489,35.5935


Ignore the syntax for the moment.  Instead, try to read that line like this:

> Assign the name **`nc_farmers_markets`** to a table whose rows are the rows in the **`farmers_markets_locations`** table **`where`** the `'State'`s **`are` `equal` `to` `North Carolina`**.

Now let's dive into the details a bit more.

`where` takes 2 arguments.

1. The name of a column.  `where` finds rows where that column's values meet some criterion.

2. A predicate that describes the criterion that the column needs to meet.

The predicate in the example above called the function `are.equal_to` with the value we wanted, 'North Carolina'.  We'll see other predicates soon.

`where` returns a table that's a copy of the original table, but **with only the rows that meet the given predicate**.
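
Conceptually, a predicate is a yes-or-no test applied to each value in the chosen column. The plain-Python sketch below imitates that idea; `equal_to` and `rows_where` are hypothetical helpers written for this illustration and are not the library's own code.

```python
def equal_to(target):
    """Build a test that answers True when a value equals target."""
    def test(value):
        return value == target
    return test

def rows_where(rows, column, predicate):
    """Return a new list holding only the rows whose column value passes the test."""
    return [row for row in rows if predicate(row[column])]

markets = [
    {'MarketName': 'Apex Farmers Market', 'State': 'North Carolina'},
    {'MarketName': '100 Mile Market', 'State': 'Michigan'},
    {'MarketName': 'Asheville City Market', 'State': 'North Carolina'},
]

nc_only = rows_where(markets, 'State', equal_to('North Carolina'))
print(len(nc_only))   # 2
print(len(markets))   # 3 (the original list is unchanged)
```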

**Question 11.** Use `nc_farmers_markets` to create a table called `durham_farmers_markets` containing farmers' markets in Durham, North Carolina.


In [76]:
durham_farmers_markets = nc_farmers_markets.where('city', are.equal_to('Durham')) # SOLUTION
durham_farmers_markets

MarketName,city,State,x,y
Duke Farmers Market,Durham,North Carolina,-78.9392,36.0054
Duke Mobile Farmers Market,Durham,North Carolina,-78.9321,35.9947
Durham Farmers' Market,Durham,North Carolina,-78.9019,36.0006
Durham Roots Farmers Market,Durham,North Carolina,-78.9084,36.0185
South Durham Farmers' Market,Durham,North Carolina,-78.8966,35.8893


In [77]:
durham_farmers_markets.num_rows

5

In [78]:
all('Durham' == durham_farmers_markets['city'])

True

So far we've only used `where` with a predicate that requires the values in a column to be *exactly* equal to a certain value. However, there are many other predicates. Here are a few:

|Predicate|Example|Result|
|-|-|-|
|`are.equal_to`|`are.equal_to(50)`|Find rows with values equal to 50|
|`are.not_equal_to`|`are.not_equal_to(50)`|Find rows with values not equal to 50|
|`are.above`|`are.above(50)`|Find rows with values above (and not equal to) 50|
|`are.above_or_equal_to`|`are.above_or_equal_to(50)`|Find rows with values above 50 or equal to 50|
|`are.below`|`are.below(50)`|Find rows with values below 50|
|`are.between`|`are.between(2, 10)`|Find rows with values above or equal to 2 and below 10|
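
The boundaries of `are.between` are easy to get wrong: per the table above, the lower bound is included and the upper bound is not. The plain-Python sketch below mirrors that rule with an ordinary comparison; `in_between` is a hypothetical helper, not part of the library.

```python
def in_between(value, low, high):
    """True when low <= value < high (upper bound excluded)."""
    return low <= value < high

values = [1, 2, 5, 9, 10]
kept = [v for v in values if in_between(v, 2, 10)]
print(kept)   # [2, 5, 9]
```

So to keep whole-number values from 2 through 10 inclusive, the upper bound would need to be 11.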

# 5. Analyzing a dataset
Now that you're familiar with table operations, let’s answer some interesting questions about a dataset. Run the cell below to load the `imdb` table. It contains information about the 250 highest-rated movies on IMDb.

In [79]:
imdb = Table.read_table('data/imdb.csv')
imdb

Title,Year,Rating,Votes,Decade
Avengers: Endgame,2019,8.7,394632,2010
Spider-Man: Into the Spider-Verse,2018,8.4,199435,2010
Avengers: Infinity War,2018,8.4,657004,2010
Green Book,2018,8.2,193829,2010
Andhadhun,2018,8.1,41901,2010
Coco,2017,8.3,272499,2010
"Three Billboards Outside Ebbing, Missouri",2017,8.1,339946,2010
Logan,2017,8.1,553983,2010
Your Name.,2016,8.3,131814,2010
Dangal,2016,8.3,121027,2010


Often, we want to perform multiple operations - sorting, filtering, or others - in order to turn a table we have into something more useful. You can do these operations one by one, e.g.

```
first_step = original_tbl.where("col1", are.equal_to(12))
second_step = first_step.sort("col2", descending=True)
```

However, since the value of the expression `original_tbl.where("col1", are.equal_to(12))` is itself a table, you can just call a table method on it:

```
original_tbl.where("col1", are.equal_to(12)).sort("col2", descending=True)
```
You should organize your work in the way that makes the most sense to you, using informative names for any intermediate tables you create.
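
Chaining works for any value that has methods, not only tables. As a small analogy in plain Python, the sketch below does the same two-step clean-up of a string both ways; the string itself is made up for illustration.

```python
raw_city = '  durham '

# One step at a time, naming each intermediate value.
without_spaces = raw_city.strip()
step_by_step = without_spaces.title()

# Chained: call .title() directly on the value of raw_city.strip().
chained = raw_city.strip().title()

print(step_by_step)             # Durham
print(chained == step_by_step)  # True
```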

**Question 12.** Create a table of movies released between 2010 and 2016 (inclusive) with ratings above 8. The table should only contain the columns `Title` and `Rating`, **in that order**.

Assign the table to the name `above_eight`.

**Hint:** Think about the steps you need to take, and try to put them in an order that makes sense. Feel free to create intermediate tables for each step, but please make sure you assign your final table the name `above_eight`.


In [80]:
above_eight = imdb.where('Year', are.between(2010, 2017)).where('Rating', are.above(8)).select('Title', 'Rating') # SOLUTION
above_eight

Title,Rating
Your Name.,8.3
Dangal,8.3
Hacksaw Ridge,8.1
Inside Out,8.1
Room,8.1
Mad Max: Fury Road,8.1
Spotlight,8.1
Interstellar,8.5
Whiplash,8.5
Gone Girl,8.1


In [81]:
above_eight.num_rows

28

In [82]:
# Make sure your columns are in the correct order!
above_eight.sort(0).take([17])[0][0] == 'Shutter Island'

True

**Question 13.** Use `num_rows` (and arithmetic) to find the *proportion* of movies in the dataset that were released 1900-1999, and the *proportion* of movies in the dataset that were released in the year 2000 or later.

Assign `proportion_in_20th_century` to the proportion of movies in the dataset that were released 1900-1999, and `proportion_in_21st_century` to the proportion of movies in the dataset that were released in the year 2000 or later.

**Hint:** The *proportion* of movies released in the 1900's is the *number* of movies released in the 1900's, divided by the *total number* of movies.
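
As a sketch of the arithmetic only, the cell below computes both proportions for a short, made-up list of release years rather than the `imdb` table. With real tables the counts would come from `num_rows` instead of `len`.

```python
# Made-up release years, for illustration only.
years = [1931, 1957, 1972, 1994, 1999, 2000, 2010, 2019]

total = len(years)
count_20th = len([y for y in years if 1900 <= y < 2000])   # 5
count_21st = len([y for y in years if y >= 2000])          # 3

proportion_20th = count_20th / total
proportion_21st = count_21st / total
print(proportion_20th, proportion_21st)   # 0.625 0.375
```

Because every year in the list falls into exactly one of the two groups, the two proportions add up to 1.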


In [83]:
proportion_in_20th_century = imdb.where('Year', are.between(1900,2000)).num_rows/imdb.num_rows # SOLUTION
proportion_in_21st_century = imdb.where('Year', are.above(1999)).num_rows/imdb.num_rows # SOLUTION
print('20th Century: ', proportion_in_20th_century)
print('21st Century: ', proportion_in_21st_century)

20th Century:  0.64
21st Century:  0.36


In [84]:
proportion_in_20th_century

0.64

In [85]:
proportion_in_21st_century

0.36

# 6. Submitting your work
You're done with Lab 02! All assignments in the course will be distributed as notebooks like this one, and you will submit your work by doing the following:

* Save your notebook

* Restart the kernel and run up to this cell.

* Run all the tests by running the cell containing `grader.check_all()`. Make sure they pass the way you expect them to.

* Run the cell below with the code `grader.export("lab02.ipynb")`.

* Download the file named `lab02.zip`, found in the explorer pane on the left side of the screen.

* Upload `lab02.zip` to the Lab 02 assignment on Gradescope for grading.