# Week 7 Lecture Notebook

## Python Basics

### Built-in Functions

- A function that ships with the programming language or application, so it can be used without defining or importing it.
- Returns some value based on its arguments.
- `print`, `abs`, `max`, `min`, `pow`, `round`, etc.

In [None]:
abs(-3)

In [None]:
abs(2-5)

In [None]:
round(2.8)

In [None]:
pie = 22/7
pie

In [None]:
round(pie, 4)

In [None]:
pow(3, 2)

In [None]:
max(3, 10**2, 100.1)

### Nested Functions

In [None]:
round(abs(1.6002-1.688), 4)

In [None]:
1.6002-1.688

In [None]:
abs(1.6002-1.688)

### Importing Libraries

- The [`math`](https://www.w3schools.com/PYTHON/module_math.asp) module has a set of methods and constants.

**Entire Module**
```
import math 
```

**Specific Constant**
```
from math import pi
```

- A method is like a function. The difference is a method is associated with an object, but a function is not [(stackoverflow)](https://stackoverflow.com/a/155655).
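As a minimal sketch of that difference, the cell below calls a built-in function on its own and then calls methods through an object with dot notation. The variable names are made up for illustration.

```python
# A function is called on its own, with the value passed as an argument.
length = len("skyscraper")      # built-in function

# A method is called on an object, using dot notation after the object.
shout = "skyscraper".upper()    # str method; returns a new string
numbers = [3, 1, 2]
numbers.sort()                  # list method; sorts the list in place

print(length, shout, numbers)
```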

In [None]:
import math 

In [None]:
math.pi

### `NumPy`

[`NumPy` is a library for the Python programming language](https://en.wikipedia.org/wiki/NumPy), adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.

In [None]:
import numpy as np

NumPy arrays can be thought of as fancy, homogeneous lists.

- `np.array([1, 2, 3])`
- `np.array(['one', 'two', 'three'])`
- `np.array([])`

In [None]:
np.array([1, 2, 3])

In [None]:
np.array(["one", "two", "three"])

In [None]:
np.array([])

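- Functions that operate on an entire array at once
  - `np.log`: natural logarithm of every element
  - `np.sqrt`: square root of every element
  - `np.max`: largest element in the array
  - `np.sort`: sorted copy of the array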
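A small sketch of these four functions on one array may help. The array `values` is an illustrative example, not data from the lecture.

```python
import numpy as np

values = np.array([4.0, 1.0, 9.0])

# Element-wise: the result has one output per input element.
print(np.sqrt(values))   # square root of each element
print(np.log(values))    # natural logarithm of each element

# Whole-array: the result depends on all elements together.
print(np.max(values))    # the single largest value
print(np.sort(values))   # a sorted copy; `values` itself is unchanged
```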

### Lists vs. Arrays

- Operators such as `*` act on every element of an array, but on a list `*` repeats the whole list instead.

In [None]:
# Print both results; a notebook cell only displays its last expression
print([1, 2, 3, 4, 5] * 2)
print(np.array([1, 2, 3, 4, 5]) * 2)

In [None]:
even = np.array([2, 4, 6, 8])
even ** 2

In [None]:
np.arange(5)

In [None]:
np.arange(5, 200, 5)

## `pandas`

- `pandas` is a fast, powerful, flexible, and easy-to-use open-source data analysis and manipulation tool.

- [Tidy Data](https://www.jstatsoft.org/index.php/jss/article/view/v059i10/772)

- Data Wrangling with `pandas` [Cheat Sheet](https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf)

In [None]:
import pandas as pd

In [None]:
skyscrapers = pd.read_csv('../../data/skyscrapers.csv')
skyscrapers

### Common `pandas` `DataFrame` Methods and Attributes

- `.head()`
- `.shape`
- `.info()`
- `.describe()`
- `.dtypes`
- `.columns`
- `.sample()`
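Some entries in this list are attributes (no parentheses) and others are methods (parentheses required). The sketch below shows both on a tiny, made-up `DataFrame` named `towers`; it is a hypothetical stand-in and does not reflect the contents of `skyscrapers.csv`.

```python
import pandas as pd

# Hypothetical stand-in data; not the contents of skyscrapers.csv.
towers = pd.DataFrame({
    "name": ["A", "B", "C"],
    "height": [310.0, 250.5, 420.0],
})

# Attributes: no parentheses, they report a stored property.
print(towers.shape)       # (number of rows, number of columns)
print(towers.columns)     # column labels
print(towers.dtypes)      # data type of each column

# Methods: parentheses required, and they can take arguments.
print(towers.head(2))     # first 2 rows
print(towers.describe())  # summary statistics for numeric columns
```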

In [None]:
# Returns the first 5 rows by default
# Can specify the number of rows by
# head(<number of rows to return>)
skyscrapers.head()

In [None]:
skyscrapers.head(15)

In [None]:
# Returns the number of rows and 
# columns as a tuple
skyscrapers.shape

In [None]:
# Prints a summary of the dataframe: column names,
# non-null counts, data types, and memory usage
skyscrapers.info()

In [None]:
# Returns basic statistical details
# from numerical columns 
skyscrapers.describe()

In [None]:
# Returns the names of the columns
skyscrapers.columns

In [None]:
# Returns a random sample of rows (one row by default)
# By default the sample is drawn without replacement
# Can specify the number of rows by
# sample(<number of rows to return>)
skyscrapers.sample()

In [None]:
skyscrapers.sample(10)

In [None]:
# Returns the values from a column
# as a Series
skyscrapers["city"]

In [None]:
# Returns the values from a column
# as a Series
skyscrapers["height"]

In [None]:
# Since a series is like an array with special
# indices you can use numpy functions on it
np.round(skyscrapers['height'])

In [None]:
# Since a series is like an array with special
# indices you can use numpy functions on it
np.mean(skyscrapers.height)

In [None]:
# Returns the values from a column
# as a dataframe
skyscrapers[['city']]

In [None]:
# Returns the values from a column
# as a dataframe
skyscrapers[['height']]
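To make the single-bracket versus double-bracket distinction concrete, here is a short sketch on a made-up `DataFrame` (`towers` and its values are hypothetical, not taken from `skyscrapers.csv`).

```python
import pandas as pd

# Hypothetical stand-in data; not the contents of skyscrapers.csv.
towers = pd.DataFrame({
    "city": ["X", "Y", "X"],
    "height": [310.0, 250.5, 420.0],
})

one_column = towers["height"]        # single brackets: a Series
one_col_frame = towers[["height"]]   # double brackets: a one-column DataFrame

print(type(one_column).__name__)
print(type(one_col_frame).__name__)
print(one_column.shape, one_col_frame.shape)
```

The `Series` is one-dimensional, while the `DataFrame` keeps a column axis even with a single column.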

### `Series` Methods

In [None]:
# Returns the frequency of labels
# from a categorical column (Series)
# as a Series
skyscrapers["material"].value_counts()

In [None]:
# Returns the unique labels
# from a categorical column (Series)
# as an array
skyscrapers["city"].unique()

### Subsetting by Row Value

What if we wanted to know which skyscrapers are taller than 300?

In [None]:
# Select the values from the height column
# This returns a Series
skyscrapers['height']

### Comparison Operators

Python supports six comparison operators:

- Equals: `a == b`

- Not Equals: `a != b`

- Less than: `a < b`

- Less than or equal: `a <= b`

- Greater than: `a > b`

- Greater than or equal: `a >= b`
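Each operator evaluates to `True` or `False`. A quick sketch with two plain numbers (the values of `a` and `b` are arbitrary):

```python
a = 300
b = 450

# Collect the result of each comparison, in the order listed above.
results = {
    "a == b": a == b,
    "a != b": a != b,
    "a < b": a < b,
    "a <= b": a <= b,
    "a > b": a > b,
    "a >= b": a >= b,
}

for expression, value in results.items():
    print(expression, "is", value)
```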

In [None]:
# Use a comparison operator to
# determine which values from the
# Series are greater than 300
skyscrapers['height'] > 300

In [None]:
# Use the results from the comparison
# to "mask" rows that are False
# This is known as a boolean mask
skyscrapers[skyscrapers['height'] > 300]
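The two steps above can be sketched on a tiny, made-up `DataFrame` so that every row is visible. The name `towers` and its values are hypothetical, not the contents of `skyscrapers.csv`.

```python
import pandas as pd

# Hypothetical stand-in data; not the contents of skyscrapers.csv.
towers = pd.DataFrame({
    "name": ["A", "B", "C"],
    "height": [310.0, 250.5, 420.0],
})

mask = towers["height"] > 300   # Series of True/False, one per row
print(mask.tolist())            # [True, False, True]

tall = towers[mask]             # keeps only the rows where mask is True
print(tall["name"].tolist())    # ['A', 'C']
print(mask.sum())               # 2, because True counts as 1
```

Storing the mask in a variable first is optional; it just makes it easier to inspect the `True`/`False` values before filtering.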