In [1]:
from datascience import *
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')

from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets

# Tables Review
We will start by reviewing tables.

## Tables Structure
A table is a sequence of labeled columns.
* The column labels are strings.
* Each column consists of an array of data. In a table, all columns must have the same length.

<img src='table_example.jpg' width='500'/>

## Table Methods
1. Creating tables
    * Use `Table().with_columns` to create a table from scratch
    * Use `Table.read_table` to create a table from a source file (e.g. a CSV file)
2. Finding the size
    * `num_rows` gives the number of rows
    * `num_columns` gives the number of columns
3. Referring to columns
    * `labels` gives the column labels (names)
    * Use `relabeled` to change a column's label
    * Column indices start at 0
4. Accessing the data in a column
    * The `column` method takes a column label or index and returns an `array`
5. Since the data in a column is an `array`, we can use array methods and functions to work with it
    * e.g. `item`, `sum`, `min`, `max`, etc.
6. Creating a new table containing columns from another table
    * Use `select` to keep some columns of a table
    * Use `drop` to remove some columns of a table
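
To make the "sequence of labeled columns" picture concrete, here is a minimal sketch that models a table as a Python dict mapping string labels to equal-length NumPy arrays. The helpers `make_table`, `num_rows`, `select`, and `drop` are hypothetical stand-ins that mirror the methods listed above; they are not the `datascience` implementation.

```python
import numpy as np

# Hypothetical model: a table as a dict of label -> equal-length NumPy array.
# This is NOT the datascience Table class, just an illustration of the idea.
def make_table(**columns):
    """Build a dict-of-arrays 'table', checking that all columns have the same length."""
    arrays = {label: np.array(values) for label, values in columns.items()}
    lengths = {len(a) for a in arrays.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same length")
    return arrays

def num_rows(table):
    # All columns have the same length, so any one column gives the row count.
    return len(next(iter(table.values()))) if table else 0

def select(table, *labels):
    # Return a NEW dict containing only the requested columns.
    return {label: table[label] for label in labels}

def drop(table, *labels):
    # Return a NEW dict without the listed columns; the original is unchanged.
    return {label: col for label, col in table.items() if label not in labels}

students = make_table(Name=['John', 'Jane'], ID=[56452, 21245], Score=[72, 84])
print(num_rows(students), len(students))    # 2 3  (rows, columns)
print(list(select(students, 'Name')))       # ['Name']
print(list(drop(students, 'ID', 'Score')))  # ['Name']
print(students['Score'].max())              # 84
```

Because `select` and `drop` build new dicts, `students` keeps all three columns, just as the table methods return new tables.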

## Quick Check
A table called `students` has columns `Name`, `ID`, and `Score`. Write one line of code that evaluates to
1. A table consisting of only the column labeled `Name`
2. The largest `Score`.

In [2]:
students = Table().with_columns(
    'Name', make_array('John', 'Jane'),
    'ID', make_array(56452, 21245),
    'Score', make_array(72, 84)
)

Here are a few ways to answer the first question:

In [3]:
students.select('Name')

Name
John
Jane


In [4]:
students.drop('ID', 'Score')

Name
John
Jane


In [5]:
students.select(0)

Name
John
Jane


And here are a few ways to answer the second question:

In [6]:
max(students.column('Score'))

84

In [7]:
students.column('Score').max()

84

# Sort
Now we will look at tables from a different perspective.

## Sorting Tables
Previously, we viewed tables as sequences of columns. Now, we view tables as ordered collections of rows.

1. The `sort` method creates a new table with the same rows in a different order. Again, the original table is unchanged; to change it, we need to reassign the result to the original name.
2. The `show` method displays a given number of rows of a table.
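
Conceptually, sorting a table by one column means computing the row order that sorts that column, then reordering every column with that same order so each row stays intact. A minimal sketch with plain NumPy arrays (the names and salaries are made up; this illustrates the idea and is not necessarily how `datascience` implements `sort`):

```python
import numpy as np

# Made-up two-column "table" stored as aligned arrays.
players = np.array(['A', 'B', 'C', 'D'])
salaries = np.array([12.0, 3.5, 25.0, 8.0])

# np.argsort returns the row indices that would put the salary column
# in increasing order; indexing every column with it keeps rows together.
order = np.argsort(salaries, kind='stable')
sorted_players = players[order]
sorted_salaries = salaries[order]

print(order)           # [1 3 0 2]
print(sorted_players)  # ['B' 'D' 'A' 'C']

# Indexing creates new arrays, so the original order is unchanged.
print(salaries.tolist())  # [12.0, 3.5, 25.0, 8.0]
```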

## Demo
Below is the `nba` table, which contains NBA players with their positions, teams, and 2015-2016 salaries in millions of US dollars.

In [8]:
nba = Table.read_table('nba_salaries.csv')
nba

PLAYER,POSITION,TEAM,2015-2016 SALARY
Paul Millsap,PF,Atlanta Hawks,18.6717
Al Horford,C,Atlanta Hawks,12.0
Tiago Splitter,C,Atlanta Hawks,9.75625
Jeff Teague,PG,Atlanta Hawks,8.0
Kyle Korver,SG,Atlanta Hawks,5.74648
Thabo Sefolosha,SF,Atlanta Hawks,4.0
Mike Scott,PF,Atlanta Hawks,3.33333
Kent Bazemore,SF,Atlanta Hawks,2.0
Dennis Schroder,PG,Atlanta Hawks,1.7634
Tim Hardaway Jr.,SG,Atlanta Hawks,1.30452


Above is a table with a total of 417 rows! Since this is a long table, we can use the `show` method to show only 3 rows of the table.

In [9]:
nba.show(3)

PLAYER,POSITION,TEAM,2015-2016 SALARY
Paul Millsap,PF,Atlanta Hawks,18.6717
Al Horford,C,Atlanta Hawks,12.0
Tiago Splitter,C,Atlanta Hawks,9.75625


We can compute the total amount paid to NBA players (in millions of dollars) by applying the `sum` function to the salary column.

In [10]:
sum(nba.column('2015-2016 SALARY'))

2116.1976390000013

We can also find the highest salary paid to an NBA player using the `max` function.

In [11]:
max(nba.column('2015-2016 SALARY'))

25.0

It seems that one lucky player was paid 25 million dollars! To find out who that player is, one way is to `sort` the table by salary.

In [12]:
nba.sort("2015-2016 SALARY").show(5)

PLAYER,POSITION,TEAM,2015-2016 SALARY
Thanasis Antetokounmpo,SF,New York Knicks,0.030888
Jordan McRae,SG,Phoenix Suns,0.049709
Cory Jefferson,PF,Phoenix Suns,0.049709
Elliot Williams,SG,Memphis Grizzlies,0.055722
Orlando Johnson,SG,Phoenix Suns,0.055722


However, this shows the lowest salaries first: by default, `sort` sorts the table in increasing order. To sort in decreasing order instead, we pass the additional argument `descending=True`.

In [13]:
nba.sort("2015-2016 SALARY", descending=True).show(5)

PLAYER,POSITION,TEAM,2015-2016 SALARY
Kobe Bryant,SF,Los Angeles Lakers,25.0
Joe Johnson,SF,Brooklyn Nets,24.8949
LeBron James,SF,Cleveland Cavaliers,22.9705
Carmelo Anthony,SF,New York Knicks,22.875
Dwight Howard,C,Houston Rockets,22.3594


Typing `"2015-2016 SALARY"` over and over is a hassle! We can store this string in a variable and use the variable instead.

In [14]:
salary = "2015-2016 SALARY"
nba.sort(salary, descending=True).show(5)

PLAYER,POSITION,TEAM,2015-2016 SALARY
Kobe Bryant,SF,Los Angeles Lakers,25.0
Joe Johnson,SF,Brooklyn Nets,24.8949
LeBron James,SF,Cleveland Cavaliers,22.9705
Carmelo Anthony,SF,New York Knicks,22.875
Dwight Howard,C,Houston Rockets,22.3594


The argument `descending=True` is called a **named argument**. We don't actually need to write `descending=`; we can simply pass `True` as the second argument:

In [15]:
nba.sort(salary,True).show(5)

PLAYER,POSITION,TEAM,2015-2016 SALARY
Kobe Bryant,SF,Los Angeles Lakers,25.0
Joe Johnson,SF,Brooklyn Nets,24.8949
LeBron James,SF,Cleveland Cavaliers,22.9705
Carmelo Anthony,SF,New York Knicks,22.875
Dwight Howard,C,Houston Rockets,22.3594


However, it is good practice to write the named argument so that whoever reads the code understands what each argument is for. Recall that if we forget what `sort` does, we can add a `?` after it to view its documentation.

In [16]:
nba.sort?
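
Named arguments work the same way in any Python function. Here is a minimal sketch with a hypothetical `order_values` function whose second parameter has a default value, much like `descending` in `sort`: passing `True` positionally and writing `descending=True` give the same result, but only the named form tells the reader what the `True` means.

```python
# Hypothetical function (not part of datascience) with a default parameter.
def order_values(values, descending=False):
    """Return a new sorted list; the input list is left unchanged."""
    return sorted(values, reverse=descending)

data = [3, 1, 2]
print(order_values(data))                   # [1, 2, 3]  default: increasing
print(order_values(data, True))             # [3, 2, 1]  positional argument
print(order_values(data, descending=True))  # [3, 2, 1]  named argument, clearer
print(data)                                 # [3, 1, 2]  original unchanged
```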

To find the highest-paid center (position `C`), we can do the following:

In [17]:
nba.sort(salary, descending=True).sort(1)

PLAYER,POSITION,TEAM,2015-2016 SALARY
Dwight Howard,C,Houston Rockets,22.3594
Marc Gasol,C,Memphis Grizzlies,19.688
Enes Kanter,C,Oklahoma City Thunder,16.4075
DeMarcus Cousins,C,Sacramento Kings,15.852
Roy Hibbert,C,Los Angeles Lakers,15.5922
Tristan Thompson,C,Cleveland Cavaliers,14.2609
Andrew Bogut,C,Golden State Warriors,13.8
Al Jefferson,C,Charlotte Hornets,13.5
Joakim Noah,C,Chicago Bulls,13.4
Nene Hilario,C,Washington Wizards,13.0


Above,
1. First, we sort the table by salary in descending order.
2. Then, we sort the result by column index 1, which is `POSITION`. Since the values in `POSITION` are strings, the rows are sorted alphabetically, and among C, PF, PG, SF, and SG, C comes first. Within each position, the rows keep the salary order from step 1, so the first row is the highest-paid center.
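
The output above suggests that the second `sort` keeps rows with the same position in their previous order, which is called a *stable* sort. The sketch below reproduces the two-step idea on made-up data with NumPy's `kind='stable'` option; it illustrates the concept and is not the `datascience` source.

```python
import numpy as np

# Made-up rows (player, position, salary) stored as aligned arrays.
player = np.array(['P1', 'P2', 'P3', 'P4', 'P5'])
position = np.array(['SF', 'C', 'C', 'PG', 'C'])
salary = np.array([25.0, 13.8, 22.4, 21.5, 19.7])

# Step 1: row order by salary, highest first (ascending order reversed).
by_salary = np.argsort(salary, kind='stable')[::-1]

# Step 2: stable sort of that order by position. Rows that tie on
# position keep their step-1 order, i.e. highest salary first.
by_position = by_salary[np.argsort(position[by_salary], kind='stable')]

print(player[by_position])    # ['P3' 'P5' 'P2' 'P4' 'P1']
print(position[by_position])  # ['C' 'C' 'C' 'PG' 'SF']
```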

What if we want to find the highest-paid player at each position?

In [18]:
nba.sort(salary, descending=True).sort('POSITION', distinct=True)

PLAYER,POSITION,TEAM,2015-2016 SALARY
Dwight Howard,C,Houston Rockets,22.3594
Chris Bosh,PF,Miami Heat,22.1927
Chris Paul,PG,Los Angeles Clippers,21.4687
Kobe Bryant,SF,Los Angeles Lakers,25.0
Dwyane Wade,SG,Miami Heat,20.0


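With `distinct=True`, `sort` keeps only the first row for each distinct value in the column. Since the table was already sorted by salary in descending order, the first row for each position belongs to that position's highest-paid player.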
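
One way to get "the first row for each distinct value" with plain NumPy is `np.unique` with `return_index=True`. A minimal sketch on made-up rows that are already sorted by salary, highest first (this mirrors the idea behind `distinct=True`; treat it as an illustration rather than the library's code):

```python
import numpy as np

# Made-up rows, already sorted by salary from highest to lowest.
player = np.array(['P1', 'P3', 'P4', 'P5', 'P2'])
position = np.array(['SF', 'C', 'PG', 'C', 'C'])

# np.unique returns the sorted distinct values and, with return_index=True,
# the index of the FIRST row where each value appears.
values, first_rows = np.unique(position, return_index=True)

print(values)              # ['C' 'PG' 'SF']
print(first_rows)          # [1 2 0]
print(player[first_rows])  # ['P3' 'P4' 'P1']  top earner per position
```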

What's important here is not the syntax or what each argument does, but what we can do with these methods and why that is useful.

For example, why would we want to sort a table? To find who makes the most money and who makes the least.

We can always look up how a method works and what it does, but we can't look up "how to find the player with the highest salary". Knowing what is possible lets us look up what we need to solve a problem.

#### Code is too long
If a line of code gets too long, we can wrap the whole expression in parentheses so that Python treats it as a single expression. Inside the parentheses, we can then break it across several lines. Below is an example:

In [19]:
(nba.sort(salary, descending=True)
 .select("PLAYER")
 .sort("PLAYER", descending=True))

PLAYER
Zoran Dragic
Zaza Pachulia
Zach Randolph
Zach LaVine
Wilson Chandler
Willie Cauley-Stein
Will Barton
Wesley Johnson
Wayne Ellington
Walter Tavares


# Lists
As previously mentioned, an array can only hold one type of data (e.g. only strings or only integers). Lists, on the other hand, do not have this limitation: a list can contain values of any type.

## Lists are Generic Sequences
A list is a sequence of values (similar to an array), but the values can have different types. Below is an example of a list:

In [20]:
[2 + 3, 'four', Table().with_column('K', [3, 4])]

[5, 'four', K
 3
 4]

If we try to make an array consisting of integers and floats, all the elements will be converted to floats.

In [21]:
make_array(2, 5, 6.4, 6.2)

array([2. , 5. , 6.4, 6.2])

And if we mix strings with values of any other type in an array, all the elements are converted to strings.

In [22]:
make_array(3, 7.2, 'three', True)

array(['3', '7.2', 'three', 'True'], dtype='<U32')

We can always double check the type of data we are working with.

In [23]:
type([2, 'three'])

list
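
To see the difference element by element, here is a minimal sketch comparing a list with an array built from the same values (`make_array` returns a NumPy array, so `np.array` is used directly here):

```python
import numpy as np

mixed = [2 + 3, 'four', 6.5, True]

# A list keeps each element's own type.
print([type(x).__name__ for x in mixed])  # ['int', 'str', 'float', 'bool']

# An array needs a single element type, so NumPy converts everything to strings.
arr = np.array(mixed)
print(arr.dtype.kind)   # U  (Unicode string)
print(arr.tolist())     # ['5', 'four', '6.5', 'True']
```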

#### Adding a row to a table
We can use a list to add a row to a table with the `with_row` method.

In [24]:
row = ['Sam Lau', 'PG', 'Berkeley Data Scientists', 0.0]
nba.with_row(row).sort(salary)  # Again, this does not change the nba table!

PLAYER,POSITION,TEAM,2015-2016 SALARY
Sam Lau,PG,Berkeley Data Scientists,0.0
Thanasis Antetokounmpo,SF,New York Knicks,0.030888
Jordan McRae,SG,Phoenix Suns,0.049709
Cory Jefferson,PF,Phoenix Suns,0.049709
Elliot Williams,SG,Memphis Grizzlies,0.055722
Orlando Johnson,SG,Phoenix Suns,0.055722
Phil Pressey,PG,Phoenix Suns,0.055722
Keith Appling,PG,Orlando Magic,0.061776
Sean Kilpatrick,SG,Denver Nuggets,0.099418
Erick Green,PG,Utah Jazz,0.099418


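If we create a table column from a list, the list is automatically converted to an array. Following the rule above, since `row` mixes strings and a float, the values in this column are all strings, including `0.0`.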

In [25]:
Table().with_column('List', row)

List
Sam Lau
PG
Berkeley Data Scientists
0.0


# Take
This section covers taking rows from a table.

## Take Rows, Select Columns
Recall the `select` method returns a table with only some columns. Similar to that, the `take` method returns a table with only some rows.

Rows have indices starting at 0. Taking a single index returns a table with just that row:

In [26]:
# Here we will display the nba table again
nba.show(3)

PLAYER,POSITION,TEAM,2015-2016 SALARY
Paul Millsap,PF,Atlanta Hawks,18.6717
Al Horford,C,Atlanta Hawks,12.0
Tiago Splitter,C,Atlanta Hawks,9.75625


In [27]:
# We will take the first row within the nba table
nba.take(0)

PLAYER,POSITION,TEAM,2015-2016 SALARY
Paul Millsap,PF,Atlanta Hawks,18.6717


Taking an array of indices returns a table with those rows:

In [28]:
nba.take(make_array(0, 1, 2))

PLAYER,POSITION,TEAM,2015-2016 SALARY
Paul Millsap,PF,Atlanta Hawks,18.6717
Al Horford,C,Atlanta Hawks,12.0
Tiago Splitter,C,Atlanta Hawks,9.75625


In [30]:
# We can do the same as above using np.arange
nba.take(np.arange(3))

PLAYER,POSITION,TEAM,2015-2016 SALARY
Paul Millsap,PF,Atlanta Hawks,18.6717
Al Horford,C,Atlanta Hawks,12.0
Tiago Splitter,C,Atlanta Hawks,9.75625


Using `np.arange` makes the intent easy to read: we can think of the code above as "take the first 3 rows".
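
Taking rows works like indexing every column with the same array of positions. A minimal NumPy sketch on a made-up table; `take` here is a hypothetical helper, not the `datascience` method. Note that `np.arange(start, stop)` includes `start` but stops just before `stop`.

```python
import numpy as np

# Made-up table stored as aligned arrays, already sorted by salary (highest first).
players = np.array(['P1', 'P2', 'P3', 'P4', 'P5'])
salaries = np.array([25.0, 24.9, 23.0, 22.9, 22.4])

def take(rows):
    """Hypothetical helper: keep the given row indices in every column."""
    return players[rows], salaries[rows]

print(np.arange(3))                # [0 1 2]
print(take(np.arange(3))[0])       # ['P1' 'P2' 'P3']  the first 3 rows
print(np.arange(1, 4))             # [1 2 3]  the stop value 4 is excluded
print(take(np.arange(1, 4))[0])    # ['P2' 'P3' 'P4']
```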

Below, we take the 3 highest-paid NBA players:

In [31]:
rich = nba.sort(salary, descending=True).take(np.arange(3))
rich

PLAYER,POSITION,TEAM,2015-2016 SALARY
Kobe Bryant,SF,Los Angeles Lakers,25.0
Joe Johnson,SF,Brooklyn Nets,24.8949
LeBron James,SF,Cleveland Cavaliers,22.9705


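Below, we take the 11th through 20th highest-paid NBA players. Since row indices start at 0 and `np.arange(10, 20)` stops just before 20, this takes the rows at indices 10 through 19: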

In [32]:
nba.sort(salary, descending=True).take(np.arange(10, 20))

PLAYER,POSITION,TEAM,2015-2016 SALARY
LaMarcus Aldridge,PF,San Antonio Spurs,19.689
Kevin Love,PF,Cleveland Cavaliers,19.689
Marc Gasol,C,Memphis Grizzlies,19.688
Blake Griffin,PF,Los Angeles Clippers,18.9077
Paul Millsap,PF,Atlanta Hawks,18.6717
Paul George,SF,Indiana Pacers,17.1201
Russell Westbrook,PG,Oklahoma City Thunder,16.7442
Kyrie Irving,PG,Cleveland Cavaliers,16.4075
Kawhi Leonard,SF,San Antonio Spurs,16.4075
Enes Kanter,C,Oklahoma City Thunder,16.4075


# Where
`where` is another method for taking rows from a table.

## The `where` Method
The `where` method takes a column and a condition, and returns a new table containing only the rows that satisfy the condition. Below, we take all the NBA players who were paid more than 10 million dollars.

In [33]:
nba.where(salary, are.above(10))

PLAYER,POSITION,TEAM,2015-2016 SALARY
Paul Millsap,PF,Atlanta Hawks,18.6717
Al Horford,C,Atlanta Hawks,12.0
Joe Johnson,SF,Brooklyn Nets,24.8949
Thaddeus Young,PF,Brooklyn Nets,11.236
Al Jefferson,C,Charlotte Hornets,13.5
Nicolas Batum,SG,Charlotte Hornets,13.1253
Kemba Walker,PG,Charlotte Hornets,12.0
Derrick Rose,PG,Chicago Bulls,20.0931
Jimmy Butler,SG,Chicago Bulls,16.4075
Joakim Noah,C,Chicago Bulls,13.4


Above, the first argument is the column whose values are checked, and the second argument is the condition itself (here, above 10). `are` is provided by the `datascience` library (it is imported by `from datascience import *`), and `are.above` is one of the conditions it offers.
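
It can help to think of `are.above(10)` as building a test that `where` applies to each value in the column. A minimal sketch of that idea with a hypothetical `above` function and a NumPy boolean mask (an illustration only, not the `datascience` implementation):

```python
import numpy as np

# Hypothetical stand-in for a condition like are.above(10):
# a function that returns another function, which tests one value.
def above(threshold):
    return lambda value: value > threshold

salaries = np.array([18.67, 12.0, 9.76, 8.0, 24.89])  # made-up column
condition = above(10)

# Apply the test to every value to get a True/False mask, then keep
# only the rows where the mask is True.
mask = np.array([condition(s) for s in salaries])
print(mask.tolist())            # [True, True, False, False, True]
print(salaries[mask].tolist())  # [18.67, 12.0, 24.89]
```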

Below is the same result, sorted by salary in increasing order.

In [34]:
nba.where(salary, are.above(10)).sort(salary)

PLAYER,POSITION,TEAM,2015-2016 SALARY
DeMar DeRozan,SG,Toronto Raptors,10.05
Gerald Wallace,SF,Philadelphia 76ers,10.1059
Luol Deng,SF,Miami Heat,10.1516
Monta Ellis,SG,Indiana Pacers,10.3
Wilson Chandler,SF,Denver Nuggets,10.4494
Brendan Haywood,C,Cleveland Cavaliers,10.5225
Jrue Holiday,PG,New Orleans Pelicans,10.5955
Tyreke Evans,SG,New Orleans Pelicans,10.7346
Marcin Gortat,C,Washington Wizards,11.2174
Thaddeus Young,PF,Brooklyn Nets,11.236


Below, we take the NBA players on the **Golden State Warriors**.

In [35]:
nba.where("TEAM", are.equal_to('Golden State Warriors'))

PLAYER,POSITION,TEAM,2015-2016 SALARY
Klay Thompson,SG,Golden State Warriors,15.501
Draymond Green,PF,Golden State Warriors,14.2609
Andrew Bogut,C,Golden State Warriors,13.8
Andre Iguodala,SF,Golden State Warriors,11.7105
Stephen Curry,PG,Golden State Warriors,11.3708
Jason Thompson,PF,Golden State Warriors,7.00847
Shaun Livingston,PG,Golden State Warriors,5.54373
Harrison Barnes,SF,Golden State Warriors,3.8734
Marreese Speights,C,Golden State Warriors,3.815
Leandro Barbosa,SG,Golden State Warriors,2.5


Below, we instead take the NBA players whose `TEAM` value contains the string "Warriors":

In [36]:
nba.where("TEAM", are.containing('Warriors'))

PLAYER,POSITION,TEAM,2015-2016 SALARY
Klay Thompson,SG,Golden State Warriors,15.501
Draymond Green,PF,Golden State Warriors,14.2609
Andrew Bogut,C,Golden State Warriors,13.8
Andre Iguodala,SF,Golden State Warriors,11.7105
Stephen Curry,PG,Golden State Warriors,11.3708
Jason Thompson,PF,Golden State Warriors,7.00847
Shaun Livingston,PG,Golden State Warriors,5.54373
Harrison Barnes,SF,Golden State Warriors,3.8734
Marreese Speights,C,Golden State Warriors,3.815
Leandro Barbosa,SG,Golden State Warriors,2.5


To see what other conditions `are` provides, recall that we can press the Tab key after the dot:

In [None]:
# Press "tab" after the dot!
nba.where("TEAM", are.

Below, we take the NBA players whose salary is between 11 and 12 million dollars, including 11 but not including 12:

In [37]:
nba.where(salary, are.between(11, 12))

PLAYER,POSITION,TEAM,2015-2016 SALARY
Thaddeus Young,PF,Brooklyn Nets,11.236
Kenneth Faried,PF,Denver Nuggets,11.236
Andre Iguodala,SF,Golden State Warriors,11.7105
Stephen Curry,PG,Golden State Warriors,11.3708
Nikola Vucevic,C,Orlando Magic,11.25
Marcin Gortat,C,Washington Wizards,11.2174
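
The condition described above is a half-open interval: a salary `s` is kept when `11 <= s < 12`. A minimal NumPy sketch of that test on made-up values, showing that 11 is kept and 12 is not:

```python
import numpy as np

salaries = np.array([10.99, 11.0, 11.5, 11.99, 12.0])  # made-up values

# Lower bound included, upper bound excluded.
mask = (salaries >= 11) & (salaries < 12)
print(salaries[mask].tolist())  # [11.0, 11.5, 11.99]
```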


Below, we take the NBA players whose names contain the string "Curry":

In [38]:
nba.where('PLAYER', are.containing('Curry'))

PLAYER,POSITION,TEAM,2015-2016 SALARY
Stephen Curry,PG,Golden State Warriors,11.3708


## Recap - Manipulating Rows
1. `t.sort(column)` sorts the rows in increasing order of `column`
2. `t.take(row_numbers)` keeps only the rows at the given indices
    * Each row has an index, starting at 0
3. `t.where(column, are.condition)` keeps all rows for which the value in `column` satisfies the condition
4. `t.where(column, value)` keeps all rows whose value in `column` is equal to `value`

In [41]:
nba.where(salary, 11.25)

PLAYER,POSITION,TEAM,2015-2016 SALARY
Nikola Vucevic,C,Orlando Magic,11.25


In [43]:
nba.where('POSITION', 'PG')

PLAYER,POSITION,TEAM,2015-2016 SALARY
Jeff Teague,PG,Atlanta Hawks,8.0
Dennis Schroder,PG,Atlanta Hawks,1.7634
Avery Bradley,PG,Boston Celtics,7.73034
Isaiah Thomas,PG,Boston Celtics,6.91287
Marcus Smart,PG,Boston Celtics,3.43104
Terry Rozier,PG,Boston Celtics,1.82436
Jarrett Jack,PG,Brooklyn Nets,6.3
Shane Larkin,PG,Brooklyn Nets,1.5
Kemba Walker,PG,Charlotte Hornets,12.0
Brian Roberts,PG,Charlotte Hornets,2.85494


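Note that passing a plain value to `where` does not work like `are.containing`; it works like `are.equal_to`, so only exact matches are kept. For example, no team is named exactly "Atlanta" (the team is "Atlanta Hawks"), so the following returns an empty table:

In [44]:
nba.where('TEAM', 'Atlanta')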

PLAYER,POSITION,TEAM,2015-2016 SALARY
