# Week 7 Lecture Notebook

## Python Basics

### Built-in Functions

- A function that is already available in a programming language or application and can be called directly by end users.

- Usually returns a value computed from its arguments.

- `print`, `abs`, `max`, `min`, `pow`, `round`, etc.
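Here is a small sketch of a few more of the built-ins listed above, called on literal values. Note that `print` is the exception to "returns a value": it displays its argument and returns `None`.

```python
# A few more built-in functions, called on literal values
power = pow(2, 10)             # 2 raised to the 10th power
rounded = round(3.14159, 2)    # round to 2 decimal places
smallest = min(4, -1, 7.5)     # smallest of the arguments
largest = max([3, 8, 1])       # max also accepts a single iterable

# print() displays its argument but returns None
returned = print("hello")

print(power, rounded, smallest, largest, returned)
```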

In [None]:
abs(-3)

In [None]:
abs(2-5)

In [None]:
max(3, 10**2, 100.1)

### Nesting Functions

In [None]:
round(abs(1.6002-1.688), 4)

In [None]:
1.6002-1.688

In [None]:
abs(1.6002-1.688)

In [None]:
round(abs(1.6002-1.688), 4)

# Pandas

## `pandas`

Pandas is an open source Python package widely used for data science, data analysis, and machine learning tasks. It is built on top of another library, `NumPy`, which provides support for arrays. Because we already know how to perform operations on `NumPy` arrays, we can apply the same kinds of operations to columns in a `pandas` DataFrame.

Pandas is a fast, powerful, flexible, and (sometimes) easy-to-use open source data analysis and manipulation tool. For a quick reference, see the Data Wrangling with `pandas` [Cheat Sheet](https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf).
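To see the `NumPy` connection described above, here is a minimal sketch on a tiny hypothetical dataframe (the column names and values are made up, not taken from the skyscrapers data). `.to_numpy()` returns a column's values as a `NumPy` array, and arithmetic on a column is applied element-wise, just as it would be on an array.

```python
import numpy as np
import pandas as pd

# A tiny hypothetical dataframe (not the skyscrapers data)
df = pd.DataFrame({"height_m": [100.0, 250.5, 320.0],
                   "floors": [25, 60, 80]})

# .to_numpy() hands back the column's values as a NumPy array
heights = df["height_m"].to_numpy()
print(type(heights))

# Arithmetic on a column is element-wise, like on a NumPy array
# (1 meter is about 3.28084 feet)
df["height_ft"] = df["height_m"] * 3.28084
print(df)
```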

In [None]:
# Import the pandas module as pd
import pandas as pd

# Read the .csv file
skyscrapers = pd.read_csv('data/skyscrapers.csv')
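If `data/skyscrapers.csv` is not at hand, this sketch shows how `read_csv` parses comma-separated text: the first line becomes the column names and each later line becomes a row. The CSV contents below are hypothetical, and `io.StringIO` stands in for a file on disk.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for a file on disk
csv_text = """name,city,height
Tower A,Chicago,300.5
Tower B,New York,410.0
"""

# read_csv accepts a file path or any file-like object such as StringIO
df = pd.read_csv(io.StringIO(csv_text))
print(df)
print(df.dtypes)  # pandas infers a data type for each column
```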

**Example 1.** Display the first 10 rows of the `skyscrapers` dataframe.

In [None]:
skyscrapers.head(10)

### Common `pandas` `DataFrame` Methods and Attributes

- `.head()`
- `.shape`
- `.info()`
- `.describe()`
- `.columns`
- `.sample()`
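As a rough sketch, here is each item from the list applied to a small hypothetical dataframe (made-up names and values). `.shape` and `.columns` are attributes, so they take no parentheses; the others are methods and are called with parentheses.

```python
import pandas as pd

# Small hypothetical dataframe used only to illustrate the list above
df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "height": [300.0, 410.0, 250.0, 520.0],
    "floors": [70, 95, 60, 110],
})

print(df.head(2))      # first 2 rows (the default is 5)
print(df.shape)        # attribute: (rows, columns) -> (4, 3)
print(df.columns)      # attribute: Index of column names
df.info()              # prints dtypes and non-null counts
print(df.describe())   # summary statistics for the numeric columns
print(df.sample(2, random_state=0))  # 2 random rows; random_state makes it repeatable
```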

Apply each method or attribute in the following examples to the `skyscrapers` dataframe.

**Example 2.** `.info()`

In [None]:
# Returns information about the dataframe
...

**Example 3.** `.shape`

In [None]:
# Returns the number of rows and 
# columns as a tuple
...

**Example 4.** `.describe()`

In [None]:
# Returns basic statistical details
# from the numerical columns 
...

**Example 5.** `.columns`

In [None]:
# Returns the names of the columns
...

**Example 6.** `.sample()`

In [None]:
# Returns one randomly selected row by default
# By default the sample is drawn without replacement
# To return more rows, pass the number of rows:
# .sample(<number of rows to return>)
...
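A minimal sketch of the sampling options mentioned in the comments above, on a hypothetical three-row dataframe. `n`, `frac`, `replace`, and `random_state` are real `DataFrame.sample` parameters; the data is made up.

```python
import pandas as pd

df = pd.DataFrame({"name": ["A", "B", "C"], "height": [300.0, 410.0, 250.0]})

one_row = df.sample(random_state=1)          # default n=1
two_rows = df.sample(n=2, random_state=1)    # 2 distinct rows
part = df.sample(frac=0.5, random_state=1)   # a fraction of the rows instead of a count

# With replace=True the same row can be drawn more than once,
# so n may exceed the number of rows
boot = df.sample(n=10, replace=True, random_state=1)

# Without replacement, asking for more rows than exist raises ValueError
try:
    df.sample(n=10)
    raised = False
except ValueError:
    raised = True
print(raised)
```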

### Rename columns in a `pandas` `DataFrame`

In [None]:
# Map each <old column name> to its <new column name>
# inplace=True modifies skyscrapers directly instead of returning a copy
skyscrapers.rename(columns={'location.city': 'city',
                            'statistics.height': 'height',
                            'statistics.floors above': 'floors',
                            'status.completed.year': 'year_completed', 
                            'status.started.year': 'year_started'},
                   inplace=True)

In [None]:
skyscrapers.info()
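The cell above uses `inplace=True`. A sketch of the alternative, on a hypothetical one-row dataframe with dotted column names like the ones renamed above: without `inplace=True`, `rename` returns a new dataframe and leaves the original unchanged. Names without dots also allow dot notation later (for example `skyscrapers.height`).

```python
import pandas as pd

# Hypothetical dataframe with dotted column names
df = pd.DataFrame({"location.city": ["Chicago"], "statistics.height": [300.0]})

# Without inplace=True, rename returns a NEW dataframe; df is unchanged
renamed = df.rename(columns={"location.city": "city",
                             "statistics.height": "height"})

print(list(df.columns))
print(list(renamed.columns))

# A common alternative to inplace=True is reassignment:
# df = df.rename(columns={...})
```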

### Accessing columns from a `pandas` `DataFrame`

**Example 7.** Access the `name` column from the `skyscrapers` dataframe and return it as a `Series` object.

In [None]:
# Returns the values from a 
# column as a Series
...
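A quick sketch of the difference between single and double brackets, on a hypothetical dataframe: one column name in single brackets gives a `Series`, while a list of names (double brackets) gives a `DataFrame`, even when the list holds only one name.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Tower A", "Tower B"], "height": [300.0, 410.0]})

col = df["name"]                # single brackets -> Series
sub = df[["name"]]              # list with one name -> DataFrame with one column
both = df[["name", "height"]]   # list with two names -> DataFrame with two columns

print(type(col))
print(type(sub))
```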

## Series

A pandas `Series` is a one-dimensional labeled array capable of holding data of any type (integer, string, float, Python objects, etc.). 

**Source:** [Geeks for Geeks](https://www.geeksforgeeks.org/python-pandas-series/)
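To make "labeled" concrete, here is a sketch with hypothetical values: each value in a `Series` is paired with an index label that can be used to look it up, and a `Series` holding mixed Python objects gets the `object` dtype.

```python
import pandas as pd

# A Series pairs values with index labels (hypothetical data)
heights = pd.Series([300.0, 410.0, 250.0],
                    index=["Tower A", "Tower B", "Tower C"],
                    name="height")
print(heights["Tower B"])  # look up a value by its label -> 410.0

# Mixed Python objects are allowed; the dtype becomes object
mixed = pd.Series([1, "two", 3.0])
print(mixed.dtype)
```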

**Example 8.** Access a numerical column from the `skyscrapers` dataframe and return it as a `Series` object.

In [None]:
# Returns the values from a 
# column as a Series
...

Since a `Series` is a one-dimensional `ndarray` with axis labels (including time series), we can pass it as an argument to `NumPy` functions.

Let's import `NumPy` and see!

In [None]:
import numpy as np

In [None]:
# Returns the values from a column
# as a Series
skyscrapers["height"]

In [None]:
# Dot (attribute) notation also returns the
# column as a Series; it only works when the
# column name is a valid Python identifier
skyscrapers.height
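A minimal sketch of passing a `Series` to `NumPy` functions, using hypothetical heights: a reduction such as `np.mean` returns a single number, while an element-wise function such as `np.sqrt` returns a new `Series` with the same index.

```python
import numpy as np
import pandas as pd

heights = pd.Series([300.0, 410.0, 250.0], name="height")  # hypothetical values

avg = np.mean(heights)      # reduction -> a single number
roots = np.sqrt(heights)    # element-wise function -> a new Series
print(avg)
print(type(roots))
```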

**Example 9.** What is the average height of all skyscrapers in the dataset? Use a `NumPy` function.

In [None]:
...

### `Series` Attributes and Methods

**Attribute**
- An attribute of a `Series` is a property or characteristic that provides information about the `Series` itself.

- Attributes are accessed without parentheses, simply by referencing the attribute name.

- They provide metadata, statistics, or information about the Series but do not perform operations or transformations on the data within the `Series`.

- Examples of `Series` attributes include `dtype` (data type of the `Series`), `name` (name of the `Series`), `index` (index labels), and `shape` (a tuple giving the dimensions of the `Series`).

- Accessing an attribute doesn't require invoking it as a function/method; you access it directly.

**Method**
- A method of a `Series` is a **function** that performs an operation or computation on the data within the `Series`.

- Methods are called with parentheses and often accept arguments or parameters to control their behavior.

- Methods manipulate or transform the data and return a result based on the operation performed.

- Examples of `Series` methods include `.sum()` (calculates the sum of elements), `.mean()` (calculates the mean), `.unique()` (returns unique values), and `.apply()` (applies a custom function to each element).

- Accessing a method requires invoking it as a function with parentheses.

**Source:** [ChatGPT generated response](https://docs.google.com/document/d/10Jm9vNpG5_JPxzpfiLSJhiDVS2XQabSOIsfXMH8_Tvg/edit?usp=sharing)
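Here is a sketch of the attribute/method distinction described above, using the same names from the lists (`dtype`, `name`, `shape`, `index`, `.sum()`, `.mean()`, `.unique()`, `.apply()`) on a hypothetical `Series`. The last line shows that leaving off the parentheses gives back the method itself rather than its result.

```python
import pandas as pd

s = pd.Series([3, 1, 3, 2], name="floors")  # hypothetical data

# Attributes: no parentheses, describe the Series itself
print(s.dtype)
print(s.name)     # floors
print(s.shape)    # (4,)
print(s.index)

# Methods: called with parentheses, compute something from the data
print(s.sum())                     # 9
print(s.mean())                    # 2.25
print(s.unique())                  # distinct values in order of appearance
print(s.apply(lambda x: x * 10))   # each element times 10

# Without parentheses you get the method object, not its result
print(callable(s.mean))  # True
```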

**Example 10.** What is the average height of all skyscrapers in the dataset? Use a `Series` method.

In [None]:
# Without parentheses: returns the method object itself, not the mean
skyscrapers.height.mean

In [None]:
# With parentheses: calls the method and returns the average height
skyscrapers.height.mean()

**Example 11.** What are the unique cities where the skyscrapers in the dataset are located?

In [None]:
skyscrapers.city.unique()
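A sketch of `.unique()` alongside two related `Series` methods, on a hypothetical list of cities: `.unique()` lists the distinct values in order of first appearance, `.nunique()` counts them, and `.value_counts()` reports how often each occurs, most frequent first.

```python
import pandas as pd

# Hypothetical city values
cities = pd.Series(["Chicago", "Dubai", "Chicago", "New York", "Dubai", "Chicago"])

print(cities.unique())        # distinct values in order of first appearance
print(cities.nunique())       # number of distinct values -> 3
print(cities.value_counts())  # count of each value, most frequent first
```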