# Tables

The next cell contains boilerplate code that appears at the top of every notebook in CS 104. Always run this cell: it sets up the notebook environment so the rest of the code has access to the libraries and resources it uses.

In [None]:
from datascience import *
import numpy as np

%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')

## Python Review

### Expressions

In [None]:
24

In [None]:
24 * 7

In [None]:
24 * 60 * (60 + 5 - 3 * 2)

In [None]:
# two to the power of four: 2 * 2 * 2 * 2
2 ** 4

In [None]:
'hello'

### Variables

Terminology:  *expressions*, *values*, *names*, *variables*, *statements*

Naming rules:  letters, digits, underscores; case sensitive; cannot start with a digit (usually start with a letter)
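A minimal sketch of the naming rules in plain Python. The names used here (`trees_in_plot_11`, `count`, `Count`, `2nd_plot`) are made up for illustration and do not appear elsewhere in the notebook.

```python
# Names may contain letters, digits, and underscores.
trees_in_plot_11 = 42

# Names are case sensitive: count and Count are two different variables.
count = 3
Count = 30

# An assignment statement binds a name to the value of an expression.
total = count + Count

# A name cannot start with a digit. compile() only checks the syntax of the
# text, so we can see the rejection without stopping the cell with an error.
try:
    compile("2nd_plot = 5", "<example>", "exec")
    starts_with_digit_ok = True
except SyntaxError:
    starts_with_digit_ok = False

print(total, starts_with_digit_ok)
```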

Calculate the number of seconds in a year.

In [None]:
60 * 60 * 24 * 365

In [None]:
seconds_per_year = 60 * 60 * 24 * 365

In [None]:
seconds_per_year

In [None]:
seconds_per_hour = 60 * 60
hours_per_year = 24 * 365
seconds_per_year = seconds_per_hour * hours_per_year
seconds_per_year

### Functions

Terminology: *function*, *call*, *argument*

In [None]:
abs(-5)

In [None]:
day_temp = 52
night_temp = 47
abs(night_temp - day_temp)

In [None]:
max(3, 4, 6, -2, 1, 0)

In [None]:
y = max(3, 4)

In [None]:
y

In [None]:
round(123.456, 1)
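The calls above can be read piece by piece using the terminology. The sketch below reuses the temperature example; the names `temp_change` and `largest_rounded` are introduced only for illustration.

```python
# A call expression: the function name, then its arguments in parentheses.
day_temp = 52
night_temp = 47

# The argument is itself an expression. Python evaluates it first
# (47 - 52 is -5) and then calls abs on that value.
temp_change = abs(night_temp - day_temp)

# Calls can be nested: the value returned by the inner call (max)
# becomes the first argument of the outer call (round).
largest_rounded = round(max(3.14159, 2.71828), 2)

print(temp_change, largest_rounded)
```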

## Tables

Tables are stored as CSV (comma-separated values) files.  Have a look!
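To see what "comma-separated" means, here is a rough sketch that reads a tiny piece of CSV text with Python's standard `csv` module. The rows are hypothetical, not taken from the course data files; the notebook itself reads CSV files with `Table.read_table`.

```python
import csv
import io

# Hypothetical CSV text: the first line holds the column labels, and each
# later line is one row whose values are separated by commas.
csv_text = """genus,species,count
Acer,rubrum,12
Quercus,rubra,5
"""

rows = list(csv.reader(io.StringIO(csv_text)))
header = rows[0]
data_rows = rows[1:]

print(header)
for row in data_rows:
    print(row)
```

Note that the `csv` module returns every value as a string, including the counts.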

### Hopkins Forest Tree Surveys

![](https://hmf.williams.edu/files/foliage.jpg) 

Trees and more here: <https://hmf.williams.edu/researchacademics/data/>

In [None]:
Table.read_table('data/hopkins-plot-0011.csv')

In [None]:
trees = Table.read_table('data/hopkins-plot-0011.csv')
trees

In [None]:
trees.show(3)

In [None]:
trees.show()

### Table Operations: Selecting Columns

Terminology: *method*

In [None]:
trees.select('genus')

In [None]:
trees.select('common name', 'count')

In [None]:
trees.select(genus, 'count')  # Error! Without quotes, genus is read as a name, and no such name is defined

In [None]:
trees.drop('species')

In [None]:
trees  # Still the same...

In [None]:
trees_without_species = trees.drop('species')
trees_without_species

### Table Operations: Sorting

Terminology:  *named argument*

In [None]:
trees_without_species.sort('count')

In [None]:
trees_without_species.sort('count', descending=True)

In [None]:
trees_without_species.sort('genus', descending=True)
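Named arguments are not specific to tables. As a rough illustration in plain Python, the built-in `round` and `sorted` functions accept them too; the list `counts` is made up for this example.

```python
# Positional arguments are matched by order; named arguments are matched by name.
rounded_positional = round(123.456, 1)
rounded_named = round(123.456, ndigits=1)

# sorted takes an optional named argument, reverse, which plays a role
# similar to descending=True in the table sort method.
counts = [12, 5, 30]
ascending = sorted(counts)
descending = sorted(counts, reverse=True)

print(rounded_positional, rounded_named)
print(ascending, descending)
```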

## Tying It All Together

![](https://gnnguidepost.org/wp-content/uploads/2022/02/0b44ff30dbe13528285e97eb23b3e185.png)

In [None]:
# Data as of 2020 in dollars
movies = Table.read_table('data/top_movies_2020.csv')

In [None]:
movies

Terminology:  *named argument*

In [None]:
movies.sort('Year', descending=True)

In [None]:
movies.show(15)

In [None]:
movies.sort('Gross')

In [None]:
movies.sort('Gross', descending=True)

In [None]:
sorted_by_gross = movies.sort('Gross', descending=True)
sorted_by_gross
# what about adjusted gross?

In [None]:
sorted_by_gross.sort('Studio')

In [None]:
sorted_by_gross.sort('Studio', distinct=True)

## Visualizing Categorical Data

In [None]:
top_per_studio = sorted_by_gross.sort('Studio', distinct=True)
top_per_studio

In [None]:
# Which studio has the highest grossing movie?
# Bar chart: visualize categorical data
top_per_studio.barh('Studio', 'Gross')

In [None]:
# Sort first: make it easier for the eye to compare
top_studios = top_per_studio.sort('Gross', descending=True)
top_studios.barh('Studio', 'Gross')

In [None]:
# Compare unadjusted vs. adjusted gross
just_revenues = top_studios.select('Studio', 'Gross', 'Gross (Adjusted)')

In [None]:
just_revenues

In [None]:
just_revenues.barh('Studio')

### Table Operations: Selecting Rows

In [None]:
majors = Table.read_table("data/majors.csv")
majors

In [None]:
majors.sort("2018-2021", descending=True)

In [None]:
majors.where("Division", are.equal_to(3))
majors.where("Division", 3)  # same: a plain value is shorthand for are.equal_to

Terminology: *method chaining*

In [None]:
majors.where("Division", are.equal_to(3)).sort("2018-2021", descending=True)

See the complete list of `are` conditions in our [Python Reference](https://www.cs.williams.edu/~cs104/auto/python-library-ref.html#sec-where).

In [None]:
majors.where("Division", are.not_equal_to(3)).sort("2018-2021", descending=True)

In [None]:
majors.where("2018-2021", are.above(30))

In [None]:
majors.where("2018-2021", are.between(10, 20))