# Lab 3

# Introduction to NumPy

In this lab, you'll be working through Chapter 2 to get an introduction to the numerical computing package for Python, NumPy. This notebook is made up of two sections.

- Section 1: Work through the code samples in Chapter 2
- Section 2: Exercises

# Section 1: Code Practice

In this section, you will be reading through the various chapter sections and typing out/running the code samples given in the sections. The purpose of this is for you to practice using Jupyter to run Python code as well as learn about the functionality available to you in both IPython and Jupyter.

##### Executing code in Jupyter

When typing and executing code in Jupyter, it is helpful to know the various keyboard shortcuts. You can find the full list of these by clicking **Help &rarr; Keyboard Shortcuts** in the menu. However, the three most useful keyboard shortcuts are:

- `Shift-Enter`: Execute the current cell and advance to the next cell. This will create one if none exists, but if a cell exists below your current cell, a new cell will **not** be created.
- `Alt-Enter`: Execute the current cell and **create** a new cell below.
- `Control-Enter`: Execute the current cell without advancing to the next cell.

When writing your code, you will be using these shortcuts to make sure input/output (`In`/`Out`) is consistent with what is found in the chapter. If you create a cell by mistake, you can always go to **Edit &rarr; Delete Cells** to remove it.

#### Purpose of Section 1

Your purpose in this section is to:

- **Type out** the code examples from the chapter (do not copy and paste)
- **Run** them
- **Check** that your results match those shown in the chapter

---




## Computation on Arrays: Broadcasting

[Chapter/section link](https://nbviewer.jupyter.org/urls/bitbucket.org/dogwynn/pythondatasciencehandbook/raw/master/notebooks/02.05-Computation-on-arrays-broadcasting.ipynb)

### Introducing Broadcasting

### Rules of Broadcasting

#### Broadcasting example 1

#### Broadcasting example 2

#### Broadcasting example 3

### Broadcasting in Practice

#### Centering an array

#### Plotting a two-dimensional function

## Comparisons, Masks, and Boolean Logic

[Chapter/section link](https://nbviewer.jupyter.org/urls/bitbucket.org/dogwynn/pythondatasciencehandbook/raw/master/notebooks/02.06-Boolean-Arrays-and-Masks.ipynb)

### Example: Counting Rainy Days

```python
def array_from_url(url, column):
    import pandas as pd
    import numpy as np
    data = pd.read_csv(url)
    return np.array(data[column])

rainfall = array_from_url('https://raw.githubusercontent.com/jakevdp/PythonDataScienceHandbook/master/notebooks/data/Seattle2014.csv','PRCP')
```

Start the next cell at `inches = rainfall / 254.0`
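
As a rough illustration of what that division does, the sketch below assumes (as in the handbook chapter) that the `PRCP` values are recorded in tenths of a millimeter, so 254 of those units make one inch. The array `sample_rainfall` is made up for this example and is not the Seattle data.

```python
import numpy as np

# Hypothetical readings in tenths of a millimeter (not the real Seattle data)
sample_rainfall = np.array([0, 254, 127, 508])

# 25.4 mm per inch, so 254 tenths of a millimeter per inch
sample_inches = sample_rainfall / 254.0
print(sample_inches)
```

Because the division is applied element by element, the result has the same shape as the input.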

### Comparison Operators as UFuncs

### Working with Boolean Arrays

#### Counting entries

#### Boolean operators

### Boolean Arrays as Masks

### Aside: Using the Keywords `and`/`or` Versus the Operators `&`/`|`

## Fancy Indexing

[Chapter/section link](https://nbviewer.jupyter.org/urls/bitbucket.org/dogwynn/pythondatasciencehandbook/raw/master/notebooks/02.07-Fancy-Indexing.ipynb)

### Exploring Fancy Indexing

### Combined Indexing

### Example: Selecting Random Points

### Modifying Values with Fancy Indexing

### Example: Binning Data

## Sorting Arrays

[Chapter/section link](https://nbviewer.jupyter.org/urls/bitbucket.org/dogwynn/pythondatasciencehandbook/raw/master/notebooks/02.08-Sorting.ipynb)

### Fast Sorting in NumPy: `np.sort` and `np.argsort`

#### Sorting along rows or columns

### Partial Sorts: Partitioning

### Example: k-Nearest Neighbors

## Structured Data: NumPy's Structured Arrays

[Chapter/section link](https://nbviewer.jupyter.org/urls/bitbucket.org/dogwynn/pythondatasciencehandbook/raw/master/notebooks/02.09-Structured-Data-NumPy.ipynb)

### Creating Structured Arrays

### More Advanced Compound Types

### RecordArrays: Structured Arrays with a Twist

---

# Section 2: Exercises

In this section, you will be provided a few exercises to demonstrate your understanding of the chapter contents. Each exercise will have a Markdown section describing the problem, and you will provide cells below the description with code, comments and visual demonstrations of your solution.

---

### Problem 1

Make sure you have the `array_from_url` function defined:

```python
def array_from_url(url, column):
    import pandas as pd
    import numpy as np
    data = pd.read_csv(url)
    return np.array(data[column])
```

Using the `array_from_url` function, load the following two data sets into memory using the variable names provided:

- variable: `areas`
    - URL: `"https://raw.githubusercontent.com/jakevdp/data-USstates/master/state-areas.csv"`
    - column: `"area (sq. mi)"`
- variable: `populations`
    - URL: `"https://raw.githubusercontent.com/jakevdp/PythonDataScienceHandbook/master/notebooks/data/state-population.csv"`
    - column: `"population"`

Compute a new variable, `pop_density`, containing the population density of each of the states (plus D.C. and Puerto Rico). Population density is defined as the population divided by the area.

Use this NumPy array to answer the following questions.

- Which state has the highest population density and what is it?
- Which territory has the highest population density and what is it?
- What is the mean population density of just the United States in 2012?
- What is the mean population density of the United States and territories in 2012?
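
The density definition and the kind of questions above can be sketched on made-up numbers. Everything in this block (`names`, `toy_population`, `toy_area`, `is_territory`) is hypothetical, and the sketch assumes the population and area arrays line up one-to-one, with the same regions in the same order. It is one possible approach, not the expected solution.

```python
import numpy as np

# Hypothetical regions and values, for illustration only
names = np.array(["A", "B", "C"])
toy_population = np.array([1000.0, 500.0, 300.0])
toy_area = np.array([10.0, 2.0, 30.0])          # square miles
is_territory = np.array([False, False, True])   # assumed flag, not in the data

# Population density: element-wise population / area
toy_density = toy_population / toy_area

# Index of the densest non-territory, mapped back to the full array
state_idx = np.flatnonzero(~is_territory)
densest = state_idx[np.argmax(toy_density[~is_territory])]
print(names[densest], toy_density[densest])

# Mean density with and without territories
mean_states = toy_density[~is_territory].mean()
mean_all = toy_density.mean()
print(mean_states, mean_all)
```

If the real arrays differ in length or ordering, they would need to be filtered and aligned before the division makes sense.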

---

### Problem 2

Using the `array_from_url` function, load the following data set into memory using the variable name provided:

- variable: `titanic`
    - URL: `"https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv"`
    - column: `"Age"`



Answer the following questions:

- What are the minimum, maximum, and mean ages of the following types of passengers on the Titanic?
    - All passengers
    - Survivors 
    - Those that died
- What is the percentage of male passengers that died?
- What is the percentage of female passengers that died?
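
A minimal sketch of how Boolean masks can answer questions of this shape, using made-up data. The arrays `toy_age`, `toy_survived`, and `toy_sex` are hypothetical stand-ins for columns you would need to load yourself, and the sketch assumes missing ages appear as `np.nan`.

```python
import numpy as np

# Hypothetical passenger data; np.nan marks a missing age
toy_age = np.array([22.0, 38.0, np.nan, 35.0, 4.0])
toy_survived = np.array([0, 1, 1, 1, 0])
toy_sex = np.array(["male", "female", "female", "male", "male"])

# Boolean mask selecting survivors
survivor_mask = toy_survived == 1
survivor_ages = toy_age[survivor_mask]

# The nan* functions skip missing values instead of returning nan
age_min = np.nanmin(survivor_ages)
age_max = np.nanmax(survivor_ages)
age_mean = np.nanmean(survivor_ages)
print(age_min, age_max, age_mean)

# The mean of a Boolean array is the fraction of True values
male_mask = toy_sex == "male"
pct_male_died = 100 * np.mean(toy_survived[male_mask] == 0)
print(pct_male_died)
```

Using plain `min`, `max`, or `mean` on an array that contains `np.nan` returns `nan`, which is why the `nan`-aware variants are used here.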


---

### Problem 3

Define the following function:

```python
def titanic_structured():
    import numpy as np
    import pandas as pd

    data = pd.read_csv("https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv")
    cols = ['survived', 'pclass', 'sex', 'age', 'fare']
    # One record per passenger: two 4-byte ints, a 10-character string, two 8-byte floats
    sarray = np.zeros(len(data), dtype={'names': cols, 'formats': ('i4', 'i4', 'U10', 'f8', 'f8')})
    # Copy each capitalized CSV column into the matching lowercase field
    sarray['survived'] = data.Survived
    sarray['pclass'] = data.Pclass
    sarray['sex'] = data.Sex
    sarray['age'] = data.Age
    sarray['fare'] = data.Fare
    return sarray
```

Assign the output of this function to a new variable `titanic_new`, and answer the following questions:

- What is the average age of men that survived?
- What is the average age of women that survived?
- What is the [mode](https://www.mathsisfun.com/definitions/mode.html) of the class of survivors?
- What is the mode of the class of those that died?
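
One way to approach questions like these is sketched below on a small made-up structured array. The array `toy` is hypothetical and only borrows a subset of the field names from `titanic_structured()`. The mode is computed here with `np.unique(..., return_counts=True)`, which is one option among several.

```python
import numpy as np

# Hypothetical structured array reusing some field names from titanic_structured()
toy = np.array(
    [(1, 1, "female", 30.0),
     (1, 3, "male", 20.0),
     (0, 3, "male", 40.0),
     (1, 1, "male", 50.0),
     (0, 3, "female", np.nan)],
    dtype=[("survived", "i4"), ("pclass", "i4"), ("sex", "U10"), ("age", "f8")],
)

# Boolean mask on one field selects whole records
survivors = toy[toy["survived"] == 1]

# Mode: the unique value with the largest count
values, counts = np.unique(survivors["pclass"], return_counts=True)
mode_class = values[np.argmax(counts)]
print(mode_class)

# Average age of male survivors, skipping missing ages
men = survivors[survivors["sex"] == "male"]
mean_age_men = np.nanmean(men["age"])
print(mean_age_men)
```

If two classes tie for the largest count, `np.argmax` returns the first of them, which is the smaller class number because `np.unique` returns sorted values.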
   