# Tables Review
* Create and extend tables
    * **Table().with_columns** and **Table.read_table**
* Finding size of tables
    * **num_rows** and **num_columns**
* Referring to columns - labels, relabeling, and indices
    * **labels** and **relabeled**
    * Column indices start at 0
* Accessing data in a column
    * **column** takes a label or index and returns an array
* Using array methods to work with data in columns
    * **item, sum, min, max**, etc.
* Create new tables containing some of the original columns
    * **select, drop**

## Review Quiz

The table **students** has columns **Name**, **ID**, and **Score**. Write one line of code that evaluates to:
* A table consisting of only the column labeled **Name**
* The largest score

In [1]:
from datascience import *
students = Table().with_columns(
    'Name', make_array('John', 'Amy'), 
    'ID', make_array(24663, 57447),
    'Score', make_array(65, 94))
students

Name,ID,Score
John,24663,65
Amy,57447,94


In [2]:
# A table consisting of only the column labeled 'Name'
students.select('Name')

Name
John
Amy


In [3]:
# Another way: select the column labeled 'Name' by its index, 0
students.select(0)

Name
John
Amy


In [4]:
# The largest score
max(students.column('Score'))

94

# Sorting Tables
* Tables are also ordered collections of rows
* The **sort** method creates a new table with the same rows in a different order (the original table is unaffected)
* The **show** method displays the first rows of a table

Here we read a table of the salaries (in millions of dollars) of all NBA players in the 2015-2016 season.
The player positions are coded as follows:
* PG = Point Guard
* SG = Shooting Guard
* PF = Power Forward
* SF = Small Forward
* C = Center

In [5]:
nba_salaries = Table.read_table('nba_salaries.csv')
nba_salaries

PLAYER,POSITION,TEAM,2015-2016 SALARY
Paul Millsap,PF,Atlanta Hawks,18.6717
Al Horford,C,Atlanta Hawks,12.0
Tiago Splitter,C,Atlanta Hawks,9.75625
Jeff Teague,PG,Atlanta Hawks,8.0
Kyle Korver,SG,Atlanta Hawks,5.74648
Thabo Sefolosha,SF,Atlanta Hawks,4.0
Mike Scott,PF,Atlanta Hawks,3.33333
Kent Bazemore,SF,Atlanta Hawks,2.0
Dennis Schroder,PG,Atlanta Hawks,1.7634
Tim Hardaway Jr.,SG,Atlanta Hawks,1.30452


The **show** method lets us choose how many rows to display. Without an argument, **show** displays every row; simply evaluating a table, as in the cell above, displays only its first 10 rows.

In [6]:
nba_salaries.show(3)

PLAYER,POSITION,TEAM,2015-2016 SALARY
Paul Millsap,PF,Atlanta Hawks,18.6717
Al Horford,C,Atlanta Hawks,12.0
Tiago Splitter,C,Atlanta Hawks,9.75625


Below, we sort the table by salary using the **sort** method.

In [7]:
nba_salaries.sort('2015-2016 SALARY').show(5)

PLAYER,POSITION,TEAM,2015-2016 SALARY
Thanasis Antetokounmpo,SF,New York Knicks,0.030888
Jordan McRae,SG,Phoenix Suns,0.049709
Cory Jefferson,PF,Phoenix Suns,0.049709
Elliot Williams,SG,Memphis Grizzlies,0.055722
Orlando Johnson,SG,Phoenix Suns,0.055722


Notice that the table above is sorted in ascending order, which is the default. To sort in descending order, pass the additional argument **descending=True** to the **sort** method.

In [8]:
nba_salaries.sort('2015-2016 SALARY', descending = True).show(5)

PLAYER,POSITION,TEAM,2015-2016 SALARY
Kobe Bryant,SF,Los Angeles Lakers,25.0
Joe Johnson,SF,Brooklyn Nets,24.8949
LeBron James,SF,Cleveland Cavaliers,22.9705
Carmelo Anthony,SF,New York Knicks,22.875
Dwight Howard,C,Houston Rockets,22.3594
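
To make the ascending/descending behavior concrete without the datascience library, here is a plain-Python sketch. It stores a few rows copied from the outputs above as tuples and uses the built-in **sorted**; it is only an analogy for **sort**, not how the library implements it.

```python
# A few rows copied from the nba_salaries outputs above: (player, position, team, salary)
rows = [
    ("Paul Millsap", "PF", "Atlanta Hawks", 18.6717),
    ("Kobe Bryant", "SF", "Los Angeles Lakers", 25.0),
    ("Al Horford", "C", "Atlanta Hawks", 12.0),
    ("Dwight Howard", "C", "Houston Rockets", 22.3594),
]

SALARY = 3  # position of the salary field in each tuple

# Ascending order, analogous to nba_salaries.sort('2015-2016 SALARY')
ascending = sorted(rows, key=lambda row: row[SALARY])

# Descending order, analogous to nba_salaries.sort('2015-2016 SALARY', descending=True)
descending = sorted(rows, key=lambda row: row[SALARY], reverse=True)

# sorted() builds a new list, so `rows` keeps its original order,
# much as sort leaves the original table unaffected
print([row[0] for row in ascending])
print([row[0] for row in descending])
```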


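We can also stack different **sort**s. Below is an example of a table of NBA players sorted by salary and then by position.
**Note**: the later **sort** determines the primary order. In the example below, the rows are first sorted by salary (highest first) and then by position, so the result is grouped by position, and players who share a position keep the descending-salary order from the first sort.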

In [9]:
# Sort by salary (highest first), then by column index 1 (POSITION)
nba_salaries.sort('2015-2016 SALARY', descending=True).sort(1)

PLAYER,POSITION,TEAM,2015-2016 SALARY
Dwight Howard,C,Houston Rockets,22.3594
Marc Gasol,C,Memphis Grizzlies,19.688
Enes Kanter,C,Oklahoma City Thunder,16.4075
DeMarcus Cousins,C,Sacramento Kings,15.852
Roy Hibbert,C,Los Angeles Lakers,15.5922
Tristan Thompson,C,Cleveland Cavaliers,14.2609
Andrew Bogut,C,Golden State Warriors,13.8
Al Jefferson,C,Charlotte Hornets,13.5
Joakim Noah,C,Chicago Bulls,13.4
Nene Hilario,C,Washington Wizards,13.0
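
The stacked sort can be imitated in plain Python, assuming that rows tied on the second sort key keep their earlier order. Python's built-in **sorted** is guaranteed to be stable, and the output above (the centers appear in descending-salary order) suggests **sort** behaves the same way here. The tuples are rows copied from earlier outputs.

```python
# Rows copied from the outputs above: (player, position, salary)
rows = [
    ("Chris Bosh", "PF", 22.1927),
    ("Marc Gasol", "C", 19.688),
    ("Kobe Bryant", "SF", 25.0),
    ("Kevin Love", "PF", 19.689),
    ("Dwight Howard", "C", 22.3594),
    ("LeBron James", "SF", 22.9705),
]

# First sort: salary, highest first
by_salary = sorted(rows, key=lambda row: row[2], reverse=True)

# Second sort: position. sorted() is stable, so players who share a
# position keep the salary order produced by the first sort.
by_position = sorted(by_salary, key=lambda row: row[1])

# The same order from one sort on a compound key: position, then salary descending
compound = sorted(rows, key=lambda row: (row[1], -row[2]))

for player, position, salary in by_position:
    print(position, player, salary)
```

Stacking two sorts is one way to express a multi-key ordering; the compound key is another, and for this data both give the same result.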


In [10]:
# If you're curious about what else the sort method can do, look at its documentation
nba_salaries.sort?

In the documentation, notice the optional argument **distinct**. With **distinct=True**, **sort** keeps only one row for each unique value in the sorted column: the first one it encounters. Below, because the table is first sorted by salary in descending order, the result shows the highest-paid player at each position.

In [11]:
nba_salaries.sort('2015-2016 SALARY', descending = True).sort('POSITION', distinct = True)

PLAYER,POSITION,TEAM,2015-2016 SALARY
Dwight Howard,C,Houston Rockets,22.3594
Chris Bosh,PF,Miami Heat,22.1927
Chris Paul,PG,Los Angeles Clippers,21.4687
Kobe Bryant,SF,Los Angeles Lakers,25.0
Dwyane Wade,SG,Miami Heat,20.0
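
A rough plain-Python sketch of **sort(..., distinct=True)**, assuming (as the output above suggests) that it keeps the first row for each unique value. The dictionary **first_per_position** is a hypothetical helper, and the tuples are rows copied from earlier outputs.

```python
# Rows copied from the outputs above: (player, position, salary)
rows = [
    ("Russell Westbrook", "PG", 16.7442),
    ("Dwight Howard", "C", 22.3594),
    ("Kevin Love", "PF", 19.689),
    ("Chris Paul", "PG", 21.4687),
    ("Marc Gasol", "C", 19.688),
    ("Chris Bosh", "PF", 22.1927),
]

# Step 1: sort by salary, highest first
by_salary = sorted(rows, key=lambda row: row[2], reverse=True)

# Step 2: keep only the first row seen for each position. Because the rows
# are already sorted by salary, that row is the position's highest-paid player.
first_per_position = {}
for row in by_salary:
    first_per_position.setdefault(row[1], row)

# Step 3: order the surviving rows by position
distinct = sorted(first_per_position.values(), key=lambda row: row[1])

for player, position, salary in distinct:
    print(position, player, salary)
```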


# Lists

## Lists Are Generic Sequences
A list is a sequence of values (similar to an array), but the values can be of different types.
If you create a table column from a list, it is converted to an array automatically.

In [12]:
[2, 'three']

[2, 'three']

In [13]:
x = [2, 'three']
type(x)

list

One useful application of lists is adding a row to a table with the **with_row** method. Here's an example that adds an entry to the **nba_salaries** table. **with_row** returns a new table, so **nba_salaries** itself is unchanged.

In [14]:
row = ['Snoop Dogg', 'PG', 'NBA All Stars', 999]
nba_salaries.with_row(row).sort('2015-2016 SALARY', descending=True).show(5)

PLAYER,POSITION,TEAM,2015-2016 SALARY
Snoop Dogg,PG,NBA All Stars,999.0
Kobe Bryant,SF,Los Angeles Lakers,25.0
Joe Johnson,SF,Brooklyn Nets,24.8949
LeBron James,SF,Cleveland Cavaliers,22.9705
Carmelo Anthony,SF,New York Knicks,22.875
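
The same idea in plain Python, with each row stored as a list. Lists can mix strings and numbers, which is what makes them a natural fit for a row. Concatenating with **+** builds a new list, mirroring the fact that **with_row** leaves **nba_salaries** unchanged (later cells still show Kobe Bryant with the top salary). The variable names are hypothetical.

```python
# Each row is a plain list mixing strings and numbers
rows = [
    ["Kobe Bryant", "SF", "Los Angeles Lakers", 25.0],
    ["Joe Johnson", "SF", "Brooklyn Nets", 24.8949],
]

new_row = ["Snoop Dogg", "PG", "NBA All Stars", 999]

# Build a new list instead of appending in place, so `rows` is unchanged
with_new_row = rows + [new_row]

# Sort by salary (index 3), highest first
top = sorted(with_new_row, key=lambda row: row[3], reverse=True)
print(top[0])
```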


## Take Rows, Select Columns
* **select** returns a table with only some columns
* **take** returns a table with only some rows
    * Rows are numbered, starting at 0
    * Taking a single row number returns a one-row table
    * Taking a list or array of row numbers returns a table with those rows

In [15]:
nba_salaries.take(0)

PLAYER,POSITION,TEAM,2015-2016 SALARY
Paul Millsap,PF,Atlanta Hawks,18.6717


In [16]:
nba_salaries.take(1)

PLAYER,POSITION,TEAM,2015-2016 SALARY
Al Horford,C,Atlanta Hawks,12


In [17]:
nba_salaries.take(make_array(0, 1, 2))

PLAYER,POSITION,TEAM,2015-2016 SALARY
Paul Millsap,PF,Atlanta Hawks,18.6717
Al Horford,C,Atlanta Hawks,12.0
Tiago Splitter,C,Atlanta Hawks,9.75625


In [20]:
import numpy as np
# Assign 'nba_salaries' to the variable 'nba' so it's shorter
nba = nba_salaries
# Assign the '2015-2016 SALARY' string to the variable 'salary' so it's also shorter
salary = "2015-2016 SALARY"
# Sort by salary (highest first), take the top 3 rows, and assign the result to 'rich'
rich = nba.sort(salary, descending = True).take(np.arange(3))
rich

PLAYER,POSITION,TEAM,2015-2016 SALARY
Kobe Bryant,SF,Los Angeles Lakers,25.0
Joe Johnson,SF,Brooklyn Nets,24.8949
LeBron James,SF,Cleveland Cavaliers,22.9705


In [21]:
# Take rows 10 through 19: the 11th to 20th highest-paid players
nba.sort(salary, descending = True).take(np.arange(10, 20))

PLAYER,POSITION,TEAM,2015-2016 SALARY
LaMarcus Aldridge,PF,San Antonio Spurs,19.689
Kevin Love,PF,Cleveland Cavaliers,19.689
Marc Gasol,C,Memphis Grizzlies,19.688
Blake Griffin,PF,Los Angeles Clippers,18.9077
Paul Millsap,PF,Atlanta Hawks,18.6717
Paul George,SF,Indiana Pacers,17.1201
Russell Westbrook,PG,Oklahoma City Thunder,16.7442
Kyrie Irving,PG,Cleveland Cavaliers,16.4075
Kawhi Leonard,SF,San Antonio Spurs,16.4075
Enes Kanter,C,Oklahoma City Thunder,16.4075
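
The row-number rules for **take** can be sketched in plain Python. The **take** function below is a hypothetical helper, not the datascience method: it accepts one row number or a sequence of row numbers and always returns a list of rows. The rows are the first few from the **nba_salaries** output above.

```python
# The first rows of nba_salaries, in table order: (player, salary)
rows = [
    ("Paul Millsap", 18.6717),
    ("Al Horford", 12.0),
    ("Tiago Splitter", 9.75625),
    ("Jeff Teague", 8.0),
    ("Kyle Korver", 5.74648),
]

def take(rows, row_numbers):
    """Hypothetical helper mirroring Table.take: one row number or a
    sequence of row numbers in, a list of rows out."""
    if isinstance(row_numbers, int):
        row_numbers = [row_numbers]  # a single number still yields a list of one row
    return [rows[i] for i in row_numbers]

print(take(rows, 0))            # one row
print(take(rows, [0, 1, 2]))    # three rows
print(take(rows, range(1, 4)))  # rows 1, 2, 3
```

Like **np.arange(10, 20)** in the cell above, **range(1, 4)** stops before its end value, so it selects rows 1 through 3.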


## Where Method
* The **where** method takes a column and a condition
* Returns a new table with all rows satisfying the condition

In [23]:
# List all players whose salary is above 10
above_10 = nba.where(salary, are.above(10))
above_10

PLAYER,POSITION,TEAM,2015-2016 SALARY
Paul Millsap,PF,Atlanta Hawks,18.6717
Al Horford,C,Atlanta Hawks,12.0
Joe Johnson,SF,Brooklyn Nets,24.8949
Thaddeus Young,PF,Brooklyn Nets,11.236
Al Jefferson,C,Charlotte Hornets,13.5
Nicolas Batum,SG,Charlotte Hornets,13.1253
Kemba Walker,PG,Charlotte Hornets,12.0
Derrick Rose,PG,Chicago Bulls,20.0931
Jimmy Butler,SG,Chicago Bulls,16.4075
Joakim Noah,C,Chicago Bulls,13.4


In [24]:
above_10.sort(salary)

PLAYER,POSITION,TEAM,2015-2016 SALARY
DeMar DeRozan,SG,Toronto Raptors,10.05
Gerald Wallace,SF,Philadelphia 76ers,10.1059
Luol Deng,SF,Miami Heat,10.1516
Monta Ellis,SG,Indiana Pacers,10.3
Wilson Chandler,SF,Denver Nuggets,10.4494
Brendan Haywood,C,Cleveland Cavaliers,10.5225
Jrue Holiday,PG,New Orleans Pelicans,10.5955
Tyreke Evans,SG,New Orleans Pelicans,10.7346
Marcin Gortat,C,Washington Wizards,11.2174
Thaddeus Young,PF,Brooklyn Nets,11.236


In [25]:
# Pick all the players on a certain team
team = "TEAM"
choice = "Golden State Warriors"
nba.where(team, are.equal_to(choice))

PLAYER,POSITION,TEAM,2015-2016 SALARY
Klay Thompson,SG,Golden State Warriors,15.501
Draymond Green,PF,Golden State Warriors,14.2609
Andrew Bogut,C,Golden State Warriors,13.8
Andre Iguodala,SF,Golden State Warriors,11.7105
Stephen Curry,PG,Golden State Warriors,11.3708
Jason Thompson,PF,Golden State Warriors,7.00847
Shaun Livingston,PG,Golden State Warriors,5.54373
Harrison Barnes,SF,Golden State Warriors,3.8734
Marreese Speights,C,Golden State Warriors,3.815
Leandro Barbosa,SG,Golden State Warriors,2.5


In [26]:
# Find players whose names contain the word 'Curry'
contain = "Curry"
nba.where("PLAYER", are.containing(contain))

PLAYER,POSITION,TEAM,2015-2016 SALARY
Stephen Curry,PG,Golden State Warriors,11.3708


In [27]:
# Find players whose salary is at least 11 and less than 12
nba.where(salary, are.between(11,12))

PLAYER,POSITION,TEAM,2015-2016 SALARY
Thaddeus Young,PF,Brooklyn Nets,11.236
Kenneth Faried,PF,Denver Nuggets,11.236
Andre Iguodala,SF,Golden State Warriors,11.7105
Stephen Curry,PG,Golden State Warriors,11.3708
Nikola Vucevic,C,Orlando Magic,11.25
Marcin Gortat,C,Washington Wizards,11.2174
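
Each **are** condition behaves like a function that returns True or False for a single value. The sketch below imitates four of them with hypothetical plain-Python helpers and a hypothetical **where** that filters a list of dictionaries. It is not the datascience implementation, and treating **between** as "at least y and less than z" is an assumption about **are.between**.

```python
SALARY = "2015-2016 SALARY"

# Hypothetical stand-ins for are.above, are.between, are.containing, are.equal_to
def above(y):
    return lambda x: x > y

def between(y, z):
    # Assumed to match are.between: lower bound included, upper bound excluded
    return lambda x: y <= x < z

def containing(s):
    return lambda x: s in x

def equal_to(y):
    return lambda x: x == y

def where(rows, field, predicate):
    """Keep the rows whose value in `field` satisfies `predicate`."""
    return [row for row in rows if predicate(row[field])]

# Rows copied from the outputs above
rows = [
    {"PLAYER": "Stephen Curry", "TEAM": "Golden State Warriors", SALARY: 11.3708},
    {"PLAYER": "Klay Thompson", "TEAM": "Golden State Warriors", SALARY: 15.501},
    {"PLAYER": "Thaddeus Young", "TEAM": "Brooklyn Nets", SALARY: 11.236},
    {"PLAYER": "Jeff Teague", "TEAM": "Atlanta Hawks", SALARY: 8.0},
]

above_10 = where(rows, SALARY, above(10))
warriors = where(rows, "TEAM", equal_to("Golden State Warriors"))
curry = where(rows, "PLAYER", containing("Curry"))
eleven_to_twelve = where(rows, SALARY, between(11, 12))

print([row["PLAYER"] for row in eleven_to_twelve])
```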


See the [documentation](https://www.inferentialthinking.com/chapters/06/2/Selecting_Rows) for more conditions available in **are**.

## Recap: Manipulating Rows
* **t.sort(column)** sorts the rows in increasing order (pass **descending=True** for decreasing order)
* **t.take(row_numbers)** keeps the numbered rows
    * Each **row** has an index, starting at 0
* **t.where(column, are.condition)** keeps all rows for which a column's value satisfies a condition
* **t.where(column, value)** keeps all rows whose value in that column equals the given value (the same as **are.equal_to(value)**)
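
The last recap item, passing a plain value instead of a condition, can be sketched as a check on whether the argument is callable. The **where** helper below is hypothetical and only mirrors the behavior described in the recap; it is not the datascience source.

```python
def where(rows, field, value_or_predicate):
    """Hypothetical helper: a callable is used as the condition; any other
    value is treated as 'equal to that value'."""
    if callable(value_or_predicate):
        predicate = value_or_predicate
    else:
        def predicate(x):
            return x == value_or_predicate
    return [row for row in rows if predicate(row[field])]

rows = [
    {"PLAYER": "Klay Thompson", "TEAM": "Golden State Warriors"},
    {"PLAYER": "Paul Millsap", "TEAM": "Atlanta Hawks"},
    {"PLAYER": "Stephen Curry", "TEAM": "Golden State Warriors"},
]

# Passing a value and passing an equivalent condition select the same rows
by_value = where(rows, "TEAM", "Golden State Warriors")
by_predicate = where(rows, "TEAM", lambda team: team == "Golden State Warriors")

print([row["PLAYER"] for row in by_value])
```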