# Lesson 03: Intro to the `datascience` Package

The [`datascience`](http://data8.org/datascience/index.html) package was written for UC Berkeley’s Foundations of Data Science course (Data 8) and provides useful functionality for investigating and graphically displaying data.

## Tables

The most important functionality in the package is the [`Table`](https://inferentialthinking.com/chapters/06/Tables.html) class, which is the structure used to store labeled columns of data. First, load the package:

In [None]:
from datascience import *

### Creating a Table

- A `Table` is a sequence of labeled columns of data.

- A `Table` can be constructed from scratch by using the `.with_columns()` method.

In [None]:
t = Table().with_columns(
    'letter', ['a', 'b', 'c', 'z'],
    'count',  [  9,   3,   3,   1],
    'points', [  1,   2,   2,  10],
)
t
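The arguments to `with_columns()` alternate between a label and the values in that column. As a rough mental model only, the sketch below stores a table as a plain Python dict from labels to equal-length lists; `make_table` and `t_model` are hypothetical names, and this is not how `datascience` actually implements `Table`:

```python
def make_table(*labels_and_values):
    """Hypothetical helper: build a label -> column dict from alternating
    label, values arguments, mimicking the call shape of with_columns()."""
    if len(labels_and_values) % 2 != 0:
        raise ValueError("expected alternating label, values arguments")
    table = {}
    for i in range(0, len(labels_and_values), 2):
        label = labels_and_values[i]
        table[label] = list(labels_and_values[i + 1])
    # Every column in a table must have the same number of rows.
    if len({len(col) for col in table.values()}) > 1:
        raise ValueError("all columns must have the same length")
    return table

t_model = make_table(
    'letter', ['a', 'b', 'c', 'z'],
    'count',  [  9,   3,   3,   1],
    'points', [  1,   2,   2,  10],
)
print(list(t_model))  # ['letter', 'count', 'points']
```

Python dicts keep insertion order, so the labels stay in the order they were given, just as the columns of `t` do.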

More often, a table is read from a `.csv` file (or an Excel spreadsheet). Here’s the content of an example file:

In [None]:
# The leading ! runs a shell command from the notebook; cat prints the file.
!cat data/cones.csv

And this is how we load it in as a `Table` using `read_table()`:

In [None]:
cones = Table.read_table("data/cones.csv")
cones
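`read_table()` treats the first line of the file as the column labels and each later line as one row. A rough standard-library sketch of that idea, using Python's `csv` module on an in-memory string, is shown below. The `read_columns` helper is hypothetical, the CSV text simply reuses the letter/count/points data from `t`, and the real `read_table()` does its own type inference rather than this simple integer check:

```python
import csv
import io

def read_columns(file_obj):
    """Hypothetical helper: read CSV text into a label -> column dict."""
    reader = csv.reader(file_obj)
    labels = next(reader)              # first line holds the column labels
    columns = {label: [] for label in labels}
    for row in reader:                 # every later line is one row
        for label, value in zip(labels, row):
            # Convert whole numbers to int; leave everything else as text.
            try:
                value = int(value)
            except ValueError:
                pass
            columns[label].append(value)
    return columns

csv_text = "letter,count,points\na,9,1\nb,3,2\nc,3,2\nz,1,10\n"
table = read_columns(io.StringIO(csv_text))
print(table['count'])  # [9, 3, 3, 1]
```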

### Accessing Values

To access values of columns in the table, use `column()`, which takes a column label or index and returns an array. Alternatively, the `columns` attribute (no parentheses) holds a tuple containing every column as an array.

In [None]:
t.column('letter')

In [None]:
t.column(1)

You can use bracket notation as a shorthand for this method:

In [None]:
t['letter'] # This is a shorthand for t.column('letter')

In [None]:
t[1] # This is a shorthand for t.column(1)
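Both `column()` and bracket notation accept either a label or a position. A plain-Python model of that lookup is sketched below; the `column` function and the dict-based `t_model` are hypothetical stand-ins, not the library's code:

```python
def column(table, key):
    """Hypothetical model of Table.column: look up by label (str) or index (int)."""
    if isinstance(key, int):
        key = list(table)[key]   # translate a position into its label
    return table[key]

t_model = {
    'letter': ['a', 'b', 'c', 'z'],
    'count':  [9, 3, 3, 1],
    'points': [1, 2, 2, 10],
}
print(column(t_model, 'letter'))  # ['a', 'b', 'c', 'z']
print(column(t_model, 1))         # [9, 3, 3, 1]
```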

To access values by row, `row()` returns a row by index. Alternatively, the `rows` attribute (no parentheses) is a list-like `Rows` object that contains tuple-like `Row` objects.

In [None]:
t.rows

In [None]:
t.rows[0]

In [None]:
t.row(0)

In [None]:
second = t.rows[1]
second

In [None]:
second[0]

In [None]:
second[1]
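Each row takes one value from every column at the same position, in label order. In plain Python that is `zip` across the columns. This is a model only: `datascience` returns its own `Row` objects rather than plain tuples, and `t_model` is a hypothetical dict-based stand-in for `t`:

```python
t_model = {
    'letter': ['a', 'b', 'c', 'z'],
    'count':  [9, 3, 3, 1],
    'points': [1, 2, 2, 10],
}

# zip(*columns) yields one tuple per row: (letter, count, points).
rows = list(zip(*t_model.values()))

second = rows[1]
print(second)     # ('b', 3, 2)
print(second[0])  # b
print(len(rows))  # 4, this model's equivalent of num_rows
```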

To get the number of rows, use `num_rows`.

In [None]:
t.num_rows

### Manipulating Data

Here are some of the most common operations on data. For the rest, see the reference ([Tables (datascience.tables)](http://data8.org/datascience/tables.html#tables-overview)).

Adding a column with `with_column()`:

In [None]:
t

In [None]:
t.with_column('vowel?', ['yes', 'no', 'no', 'no'])

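`.with_column()` returns a new table without modifying the original, so displaying `t` again shows no `vowel?` column. To keep the new column, assign the result to a name, for example `t2 = t.with_column(...)`.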

In [None]:
t
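The copy-then-extend behavior of `with_column()` can be modeled in plain Python as below. The `with_column` function and `t_model` are hypothetical illustrations of the idea, not the library's implementation:

```python
def with_column(table, label, values):
    """Hypothetical model: return a NEW table with one extra column."""
    new_table = dict(table)        # shallow copy; adding a key leaves the original dict alone
    new_table[label] = list(values)
    return new_table

t_model = {'letter': ['a', 'b', 'c', 'z'], 'count': [9, 3, 3, 1]}
t_vowel = with_column(t_model, 'vowel?', ['yes', 'no', 'no', 'no'])

print(list(t_model))  # ['letter', 'count']
print(list(t_vowel))  # ['letter', 'count', 'vowel?']
```

Because the original is never changed, forgetting to assign the result simply means the new column is lost.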

Selecting columns with `select()`:

In [None]:
t.select('letter')

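Notice that `.select` returns a table even for a single column, whereas `.column` returns an array. To select several columns, pass several labels or a list of labels: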

In [None]:
t.select(['letter', 'points'])

Renaming columns with `relabeled()`:

In [None]:
t

In [None]:
t.relabeled('points', 'other name')

In [None]:
t

In [None]:
t.relabeled(['letter', 'count', 'points'], ['x', 'y', 'z'])
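As the cells above show, `relabeled()` accepts either one old/new label pair or two equal-length lists, and `t` itself keeps its original labels. A plain-Python model that preserves column order is sketched below; the `relabeled` function and `t_model` are hypothetical stand-ins:

```python
def relabeled(table, old, new):
    """Hypothetical model: return a new table with columns renamed."""
    if isinstance(old, str):
        old, new = [old], [new]     # a single pair behaves like one-item lists
    mapping = dict(zip(old, new))
    # Rebuild the dict in the original column order, renaming as we go.
    return {mapping.get(label, label): values for label, values in table.items()}

t_model = {
    'letter': ['a', 'b', 'c', 'z'],
    'count':  [9, 3, 3, 1],
    'points': [1, 2, 2, 10],
}
print(list(relabeled(t_model, 'points', 'other name')))
# ['letter', 'count', 'other name']
print(list(relabeled(t_model, ['letter', 'count', 'points'], ['x', 'y', 'z'])))
# ['x', 'y', 'z']
```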

Selecting out rows by index with `take()` and conditionally with `where()`:

In [None]:
t

Return the third row.

In [None]:
t.take(2)

Return the first and second rows.

In [None]:
t.take[0:2] 
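`take` selects rows by position: a single index as in `t.take(2)`, or a slice as in `t.take[0:2]`, which follows normal Python slicing and stops before index 2. A plain-Python model of positional row selection is sketched below; `take` here is a hypothetical function that takes an explicit `slice` object instead of bracket syntax:

```python
def take(table, rows):
    """Hypothetical model: keep only the rows at the given positions."""
    n = len(next(iter(table.values())))     # number of rows
    if isinstance(rows, int):
        positions = [rows]
    elif isinstance(rows, slice):
        positions = list(range(n))[rows]    # slices follow normal Python rules
    else:
        positions = list(rows)              # any iterable of indices
    return {label: [col[i] for i in positions] for label, col in table.items()}

t_model = {
    'letter': ['a', 'b', 'c', 'z'],
    'count':  [9, 3, 3, 1],
    'points': [1, 2, 2, 10],
}
print(take(t_model, 2))            # {'letter': ['c'], 'count': [3], 'points': [2]}
print(take(t_model, slice(0, 2)))  # {'letter': ['a', 'b'], 'count': [9, 3], 'points': [1, 2]}
```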

Return the rows where `points == 2`.

In [None]:
t.where('points', 2) 

Return the rows where `count < 8`.

In [None]:
t.where('count', are.below(8))

In [None]:
t.where(t['count'] < 8)

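`.where` can also take an array of booleans directly and keeps the rows where the array is `True`. The comparison `t['count'] < 8` produces exactly such an array: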

In [None]:
t['count'] < 8 
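Both forms of `where` above keep the rows for which a condition holds: a predicate such as `are.below(8)` is checked against each value of `count`, while `t['count'] < 8` builds the boolean array up front. A plain-Python model of both forms is sketched below; `where_mask`, `where_predicate`, and `t_model` are hypothetical names, not the library's code:

```python
def where_mask(table, mask):
    """Hypothetical model: keep the rows where mask is True."""
    return {label: [v for v, keep in zip(col, mask) if keep]
            for label, col in table.items()}

def where_predicate(table, label, predicate):
    """Hypothetical model: build the mask by applying predicate to one column."""
    mask = [predicate(v) for v in table[label]]
    return where_mask(table, mask)

t_model = {
    'letter': ['a', 'b', 'c', 'z'],
    'count':  [9, 3, 3, 1],
    'points': [1, 2, 2, 10],
}
# Model of t['count'] < 8: one True/False per row.
mask = [c < 8 for c in t_model['count']]
print(mask)                                 # [False, True, True, True]
print(where_mask(t_model, mask)['letter'])  # ['b', 'c', 'z']
```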

Sorting rows by the values in a column with `sort()`:

In [None]:
t

In [None]:
t.sort('count')

In [None]:
t.sort('letter', descending=True)
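`sort()` reorders whole rows by the values in one column, ascending unless `descending=True` is passed. A plain-Python model is sketched below: it sorts the row positions by the chosen column, then reorders every column the same way. `sort_by` and `t_model` are hypothetical, and tie ordering here comes from Python's stable sort, which is not a claim about how `datascience` breaks ties:

```python
def sort_by(table, label, descending=False):
    """Hypothetical model: return a new table with rows ordered by one column."""
    column = table[label]
    # Sort row positions by their value in the chosen column.
    order = sorted(range(len(column)), key=lambda i: column[i], reverse=descending)
    # Apply the same row order to every column so rows stay intact.
    return {lab: [col[i] for i in order] for lab, col in table.items()}

t_model = {
    'letter': ['a', 'b', 'c', 'z'],
    'count':  [9, 3, 3, 1],
    'points': [1, 2, 2, 10],
}
print(sort_by(t_model, 'count')['letter'])                    # ['z', 'b', 'c', 'a']
print(sort_by(t_model, 'letter', descending=True)['letter'])  # ['z', 'c', 'b', 'a']
```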

### More Table Operations

Show a specific number of rows.

In [None]:
cones.show(2)

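#### Never Ever Ever Ever do this

Calling `show()` with no argument displays every row of the table, which can flood the notebook (and slow it down) when the table is large: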

```
cones.show()
```

#### Selecting Columns

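Column labels must be enclosed in single (`' '`) or double (`" "`) quotes. Without quotes, as with `Flavor` below, Python looks for a variable named `Flavor` and raises a `NameError`: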

In [None]:
cones.select(Flavor, 'Price')  # NameError: Flavor is not in quotes; use 'Flavor'

**Question 1.** Select the `Flavor` column from the `cones` `Table`.

In [None]:
cones.select(...)

**Question 2.** Select the `Flavor` and `Price` columns from the `cones` `Table`.

In [None]:
cones.select(..., ...)

**Question 3.** Drop the `Price` column from the `cones` `Table` without modifying the original `Table`.

In [None]:
cones.drop(...)

The `.drop()` method does not modify the original table.

In [None]:
cones

**Question 4.** Drop the `Price` column from the `cones` `Table` and save the new table to `cones_without_price`.

In [None]:
cones_without_price = cones.drop(...)
cones_without_price

**Question 5.** Find the rows from the `cones` `Table` where the `Flavor` is chocolate.

In [None]:
cones.where(..., ...)

**Question 6.** Sort the `cones` `Table` in ascending order by `Price`.

In [None]:
cones.sort(...)

**Question 7.** Sort the `cones` `Table` in descending order by `Price`.

In [None]:
cones.sort(..., descending=...)

**Question 8.** Sort the `cones` `Table` in descending order by `Flavor`.

In [None]:
cones.sort(..., descending=...)