# Lecture 7: Array-Oriented Programming with NumPy

## Objectives

In this chapter you’ll:

- Learn what **arrays** are and how they differ from **lists**.  
- Use the **NumPy** module’s high-performance **ndarrays**.  
- Compare **list** and **ndarray** performance using the **IPython `%timeit` magic**.   
- Create and initialize **ndarrays**.  
- Refer to individual **ndarray elements**.  
- Create and manipulate **multidimensional ndarrays**.  
- Create and manipulate **pandas** one-dimensional **Series** and two-dimensional **DataFrames**.  
- Calculate basic **descriptive statistics** for data in a **Series** and a **DataFrame**.


## 7.1 Introduction

The **NumPy (Numerical Python)** library, introduced in **2006**, is the preferred Python array implementation.  
It provides a high-performance, feature-rich **n-dimensional array type** called **ndarray**, commonly referred to simply as an **array**.

NumPy is part of the **Anaconda Python distribution** and serves as the foundation for many popular libraries.

### Key Features:
- Offers **high-performance arrays** (up to **100x faster** than lists)
- Supports **n-dimensional data** (1D, 2D, 3D, etc.)
- Enables **array-oriented programming**, reducing the need for explicit loops
- Provides **vectorized operations** for concise and efficient computation

### Importance in Data Science:
Over **450 Python libraries** depend on NumPy, including:
- **Pandas** (data analysis)
- **SciPy** (scientific computing)
- **Keras** (deep learning)

### Concept:
Unlike lists, which often require **nested loops** for multidimensional data processing,  
NumPy enables **functional-style programming** with **internal iteration**, improving:
- Readability  
- Performance  
- Reliability (fewer bugs due to manual loops)
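
As a small illustration of the difference, the sketch below doubles every value in a two-row table, first with explicit nested loops over a list of lists and then with a single array expression. The names `table`, `doubled_list`, and `doubled_array` are made up for this example.

```python
import numpy as np

table = [[1, 2, 3], [4, 5, 6]]

# External iteration: nested loops over a list of lists
doubled_list = []
for row in table:
    new_row = []
    for value in row:
        new_row.append(value * 2)
    doubled_list.append(new_row)

# Internal iteration: NumPy applies the operation to every element
doubled_array = np.array(table) * 2

print(doubled_list)
print(doubled_array)
```

Both versions produce the same values; the array version simply has no loop for you to get wrong.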

### Beyond NumPy:
The chapter also introduces **pandas**, a library built on NumPy that provides:
- **Series** → one-dimensional labeled data
- **DataFrame** → two-dimensional labeled data (tabular form)

These structures handle:
- **Mixed data types**
- **Custom indices**
- **Missing/inconsistent data**
- **Flexible data manipulation** for analytics and database operations

### Summary:
By the end of this chapter, you’ll be familiar with four array-like data structures:
1. **Lists**
2. **NumPy arrays**
3. **pandas Series**
4. **pandas DataFrames**

A fifth, **tensors**, will be introduced later in the *Deep Learning* chapter.

## 7.2 Creating Arrays from Existing Data


In [7]:
# Importing NumPy with a standard alias
import numpy as np

NumPy provides multiple ways to create arrays.  
The most common method is using the **array()** function,  
which takes a list or another collection and converts it into a NumPy array.


In [8]:
# Creating a NumPy array from a Python list
numbers = np.array([2, 3, 5, 7, 11])

In [9]:
# Checking the type of the object
type(numbers)

numpy.ndarray

In [10]:
# Displaying the array
numbers

array([ 2,  3,  5,  7, 11])

**Explanation:**
- The `array()` function copies the input data into a new NumPy array.
- Output values are **comma-separated** and **right-aligned**.
- The **field width** (spacing) depends on the widest value (here, `11`).

NumPy prints all values in aligned columns for neat formatting.


### Multidimensional Arguments
The `array()` function preserves the dimensional structure of its input.  
For example, passing a *list of lists* creates a **2D array**.


In [11]:
# Creating a 2D array (two rows, three columns)
np.array([[1, 2, 3],
          [4, 5, 6]])

array([[1, 2, 3],
       [4, 5, 6]])

NumPy **automatically formats multidimensional arrays**  
so that columns are aligned and easy to read.


## 7.3 Array Attributes


In [12]:
# Import NumPy
import numpy as np

# Note Down
# Creating integer and float arrays
integers = np.array([[1, 2, 3], [4, 5, 6]])
floats = np.array([0.0, 0.1, 0.2, 0.3, 0.4])

In [13]:
integers

array([[1, 2, 3],
       [4, 5, 6]])

In [15]:
floats

array([0. , 0.1, 0.2, 0.3, 0.4])

NumPy omits trailing zeros in floating-point output for compactness.

### Determining an Array’s Element Type


In [16]:
integers.dtype

dtype('int64')

In [17]:
floats.dtype

dtype('float64')

### Determining an Array’s Dimensions


- `integers` → 2D array (2 rows × 3 columns)  
- `floats` → 1D array with 5 elements  
- A one-element tuple (e.g., `(5,)`) indicates a one-dimensional array.


In [18]:
integers.ndim

2

In [19]:
floats.ndim

1

In [20]:
integers.shape

(2, 3)

In [21]:
floats.shape

(5,)

### Determining Number of Elements and Element Size


- `size`: Total number of elements.  
- `itemsize`: Memory (in bytes) per element.  
- Example: `integers.shape = (2, 3)` → 2 × 3 = 6 elements total.


In [22]:
integers.size

6

In [24]:
integers.itemsize

8

In [26]:
floats.size

5

In [25]:
floats.itemsize

8
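
The relationship between these attributes can be checked directly. The following is a minimal sketch; `element_count` and `total_bytes` are illustrative names, and `nbytes` is an additional ndarray attribute not used elsewhere in this lecture.

```python
import numpy as np

integers = np.array([[1, 2, 3], [4, 5, 6]])

rows, columns = integers.shape
element_count = rows * columns                    # same value as integers.size
total_bytes = integers.size * integers.itemsize   # same value as integers.nbytes

print(element_count, integers.size)
print(total_bytes, integers.nbytes)
```

The `itemsize` value depends on the element type, so the byte count can differ between platforms and dtypes.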

### Iterating Through a Multidimensional Array


In [27]:
integers = np.array([[1, 2, 3], [4, 5, 6]])

# Nested iteration over rows and columns
for row in integers:
    for column in row:
        print(column, end=' ')
    print()

1 2 3 
4 5 6 


### Iterating as a Flat Sequence


The `.flat` attribute allows you to iterate through a multidimensional array as if it were one-dimensional.


In [28]:
integers = np.array([[1, 2, 3], [4, 5, 6]])

for i in integers.flat:
    print(i, end=' ')

1 2 3 4 5 6 

## 7.4 Filling Arrays with Specific Values


In [29]:
import numpy as np

# Array of zeros (default dtype=float64)
np.zeros(5)

array([0., 0., 0., 0., 0.])

In [30]:
# 2D array of ones (integer type)
np.ones((2, 4), dtype=int)

array([[1, 1, 1, 1],
       [1, 1, 1, 1]])

In [31]:
# Array filled with a specific value
np.full((3, 5), 13)

array([[13, 13, 13, 13, 13],
       [13, 13, 13, 13, 13],
       [13, 13, 13, 13, 13]])

## 7.5 Creating Arrays from Ranges


`np.arange` creates an array directly from a range of values. For array creation, this is generally faster than first building a Python `range` or list and then converting it to an array.


In [32]:
# Creating integer ranges with np.arange
np.arange(5)

array([0, 1, 2, 3, 4])

In [33]:
np.arange(5, 10)

array([5, 6, 7, 8, 9])

In [34]:
np.arange(10, 1, -2)

array([10,  8,  6,  4,  2])

### Creating Floating-Point Ranges with linspace


`np.linspace(start, stop, num)` returns `num` evenly spaced floating-point values:  
- The `stop` value is included by default.  
- `num` defaults to 50.


In [35]:
np.linspace(0.0, 1.0, num=5)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])
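
`linspace` also accepts the keyword arguments `endpoint` and `retstep`. The sketch below shows their effect on the same interval; the variable names are illustrative only.

```python
import numpy as np

# Default behaviour: the stop value 1.0 is the last element
with_endpoint = np.linspace(0.0, 1.0, num=5)

# endpoint=False leaves the stop value out, which changes the spacing
without_endpoint = np.linspace(0.0, 1.0, num=5, endpoint=False)

# retstep=True also returns the spacing between consecutive values
values, step = np.linspace(0.0, 1.0, num=5, retstep=True)

print(with_endpoint)
print(without_endpoint)
print(step)
```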

### Reshaping an Array


- `arange(1, 21)` creates values 1–20  
- `.reshape(4, 5)` converts 1D array → 4×5 matrix  
- The new shape must contain the same number of total elements.


In [36]:
np.arange(1, 21)

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 20])

In [37]:
np.arange(1, 21).reshape(4, 5)

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20]])
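
The element-count rule can be seen in a short sketch. Passing `-1` for one dimension asks NumPy to infer it, and a shape with the wrong total raises `ValueError`. The names `inferred` and `mismatch_rejected` are illustrative.

```python
import numpy as np

values = np.arange(1, 21)          # 20 elements

# -1 asks NumPy to infer that dimension from the total element count
inferred = values.reshape(4, -1)
print(inferred.shape)

# 3 x 7 = 21 does not match 20 elements, so reshape raises ValueError
try:
    values.reshape(3, 7)
    mismatch_rejected = False
except ValueError as error:
    mismatch_rejected = True
    print('Cannot reshape:', error)
```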

### Displaying Large Arrays


NumPy abbreviates large arrays using `...` to indicate omitted middle elements, showing only the first and last few values for readability.


In [38]:
np.arange(1, 100001).reshape(4, 25000)

array([[     1,      2,      3, ...,  24998,  24999,  25000],
       [ 25001,  25002,  25003, ...,  49998,  49999,  50000],
       [ 50001,  50002,  50003, ...,  74998,  74999,  75000],
       [ 75001,  75002,  75003, ...,  99998,  99999, 100000]],
      shape=(4, 25000))

In [39]:
np.arange(1, 100001).reshape(100, 1000)

array([[     1,      2,      3, ...,    998,    999,   1000],
       [  1001,   1002,   1003, ...,   1998,   1999,   2000],
       [  2001,   2002,   2003, ...,   2998,   2999,   3000],
       ...,
       [ 97001,  97002,  97003, ...,  97998,  97999,  98000],
       [ 98001,  98002,  98003, ...,  98998,  98999,  99000],
       [ 99001,  99002,  99003, ...,  99998,  99999, 100000]],
      shape=(100, 1000))

## 7.6 List vs. Array Performance: Introducing `%timeit`


**Explanation:**
- `%timeit` runs the statement repeatedly and reports the **mean execution time** together with its standard deviation.
- By default it chooses the number of loops per run automatically: a slow statement may be executed only once per run, while a fast one is executed many times to obtain a reliable measurement.
- The example below shows that creating a list of 6 million die rolls takes around **1.51 seconds on average** on the machine used here.


In [40]:
# Timing List Creation with Random Rolls
import random

%timeit rolls_list = [random.randrange(1, 7) for i in range(0, 6_000_000)]


1.51 s ± 30.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [41]:
# Timing NumPy Array Creation (60,000,000 Die Rolls)
import numpy as np

%timeit rolls_array = np.random.randint(1, 7, 60_000_000)


311 ms ± 5.72 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [42]:
# Timing NumPy Array Creation (600,000,000 Die Rolls)
%timeit rolls_array = np.random.randint(1, 7, 600_000_000)


3.13 s ± 39 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


**Observation:**
- 6 million elements (list) ≈ **1.51 seconds**
- 60 million elements (NumPy array) ≈ **311 milliseconds**
- 600 million elements (NumPy array) ≈ **3.13 seconds**
- The array version produces 10× more elements in about one-fifth of the list's time, and even 100× more elements take only about twice as long.
- Per element, that makes the array roughly **50 times faster** than the list in these measurements.
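
Outside IPython, a similar comparison can be sketched with the standard library's `timeit` module. This is only an approximation of what `%timeit` does; the helper functions `roll_with_list` and `roll_with_array` and the constant `ROLLS` are invented for this example, and the sample size is kept small so the script finishes quickly.

```python
import random
import timeit

import numpy as np

ROLLS = 50_000  # deliberately small so the sketch runs fast

def roll_with_list():
    """Build a list of die rolls one element at a time."""
    return [random.randrange(1, 7) for _ in range(ROLLS)]

def roll_with_array():
    """Build an array of die rolls in a single NumPy call."""
    return np.random.randint(1, 7, ROLLS)

# 3 runs of 5 executions each; keep the best run and divide by 5
list_time = min(timeit.repeat(roll_with_list, number=5, repeat=3)) / 5
array_time = min(timeit.repeat(roll_with_array, number=5, repeat=3)) / 5

print(f'list:  {list_time * 1000:.2f} ms per call')
print(f'array: {array_time * 1000:.2f} ms per call')
```

The absolute numbers depend on the machine, so no expected output is shown here.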


## Customizing `%timeit` Iterations


In [43]:
# Custom loop and repeat settings for %timeit
%timeit -n3 -r2 rolls_array = np.random.randint(1, 7, 6_000_000)


32.8 ms ± 1.19 ms per loop (mean ± std. dev. of 2 runs, 3 loops each)


**Explanation of Options:**
- `-n3`: Execute the statement **3 times in each run** (the "3 loops each" in the output).
- `-r2`: Perform **2 runs** (the "2 runs" in the output).
- These options let you tune the measurement for longer or shorter operations.

**Other Useful IPython Magics:**
| Magic | Description |
|--------|--------------|
| `%load` | Load code into IPython from a file or URL. |
| `%save` | Save session snippets to a file. |
| `%run` | Execute a Python file directly in IPython. |
| `%precision` | Set the display precision for floating-point numbers. |
| `%cd` | Change the current working directory. |
| `%edit` | Open an external editor for code modification. |
| `%history` | Display a list of executed commands/snippets. |

## 7.7 Array Operators
NumPy provides many operators that enable you to perform operations on entire arrays easily.


In [44]:
import numpy as np

# Note Down
numbers = np.arange(1, 6)
numbers

array([1, 2, 3, 4, 5])

### Arithmetic Operations with Arrays and Individual Numeric Values
Element-wise arithmetic operations apply to every element of the array.


In [45]:
numbers * 2

array([ 2,  4,  6,  8, 10])

In [46]:
numbers ** 3

array([  1,   8,  27,  64, 125])

In [47]:
# The original array remains unchanged
numbers

array([1, 2, 3, 4, 5])

### Augmented Assignment
Augmented assignments modify every element in the left operand.


In [48]:
numbers += 10
numbers

array([11, 12, 13, 14, 15])

### Broadcasting
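When one operand is a scalar, NumPy performs the operation as if the scalar were an array with the same shape as the other operand. This is called **broadcasting**. Multiplying `numbers` by `2` is therefore equivalent to: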


In [49]:
numbers * [2, 2, 2, 2, 2]

array([22, 24, 26, 28, 30])
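
The equivalence can be verified with a short sketch. The names `with_scalar` and `with_full_array` are illustrative; `np.full` builds the explicit array that the scalar stands in for.

```python
import numpy as np

numbers = np.arange(11, 16)            # array([11, 12, 13, 14, 15])

with_scalar = numbers * 2
with_full_array = numbers * np.full(numbers.shape, 2)

print(np.array_equal(with_scalar, with_full_array))
```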

### Arithmetic Operations Between Arrays
You may perform arithmetic operations and augmented assignments between arrays of the same shape.


In [50]:
numbers2 = np.linspace(1.1, 5.5, 5)
numbers2

array([1.1, 2.2, 3.3, 4.4, 5.5])

In [51]:
numbers * numbers2

array([12.1, 26.4, 42.9, 61.6, 82.5])

### Comparing Arrays
Comparisons are performed element-wise, producing Boolean arrays indicating the comparison results.


In [52]:
numbers

array([11, 12, 13, 14, 15])

In [53]:
numbers >= 13

array([False, False,  True,  True,  True])

In [54]:
numbers2

array([1.1, 2.2, 3.3, 4.4, 5.5])

In [55]:
numbers2 < numbers

array([ True,  True,  True,  True,  True])

In [56]:
numbers == numbers2

array([False, False, False, False, False])

In [57]:
numbers == numbers

array([ True,  True,  True,  True,  True])
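
Boolean arrays like these are often used as masks. The sketch below, with illustrative names, counts and selects the elements that satisfy a condition, and uses `np.array_equal` when a single yes-or-no answer for two whole arrays is wanted.

```python
import numpy as np

numbers = np.array([11, 12, 13, 14, 15])

mask = numbers >= 13                  # element-wise comparison
count = int(mask.sum())               # True counts as 1
selected = numbers[mask]              # keeps elements where mask is True

# One Boolean for the whole comparison instead of an array of Booleans
same_contents = np.array_equal(numbers, numbers.copy())

print(count, selected, same_contents)
```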

## 7.8 NumPy Calculation Methods
NumPy arrays have built-in methods for performing calculations on their contents.


In [58]:
import numpy as np

grades = np.array([
    [87, 96, 70],
    [100, 87, 90],
    [94, 77, 90],
    [100, 81, 82]
])
grades

array([[ 87,  96,  70],
       [100,  87,  90],
       [ 94,  77,  90],
       [100,  81,  82]])

## Basic Calculations
By default, these methods operate on all elements regardless of the array’s shape.


In [59]:
grades.sum()

np.int64(1054)

In [60]:
grades.min()

np.int64(70)

In [61]:
grades.max()

np.int64(100)

In [62]:
grades.mean()

np.float64(87.83333333333333)

In [63]:
grades.std()

np.float64(8.792357792739987)

In [64]:
grades.var()

np.float64(77.30555555555556)

## Calculations by Row or Column
You can specify the `axis` parameter to perform calculations along a specific dimension.


In [65]:
# Average grade for each exam (by column)
grades.mean(axis=0)

array([95.25, 85.25, 83.  ])

In [66]:
# Average grade for each student (by row)
grades.mean(axis=1)

array([84.33333333, 92.33333333, 87.        , 87.66666667])
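
The `axis` parameter works the same way for the other calculation methods. As a sketch using the same grades (the result names are illustrative), `axis=0` gives one result per column and `axis=1` gives one result per row.

```python
import numpy as np

grades = np.array([[87, 96, 70],
                   [100, 87, 90],
                   [94, 77, 90],
                   [100, 81, 82]])

# axis=0 collapses the rows, leaving one result per column (exam)
highest_per_exam = grades.max(axis=0)

# axis=1 collapses the columns, leaving one result per row (student)
lowest_per_student = grades.min(axis=1)

# Row index of the first top score on each exam
top_student_per_exam = grades.argmax(axis=0)

print(highest_per_exam, lowest_per_student, top_student_per_exam)
```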

## 7.9 Universal Functions
NumPy offers dozens of standalone **universal functions (ufuncs)** that perform element-wise operations on arrays or array-like objects.  
Each returns a new array containing the results.


## Example: Square Roots with `sqrt()`


In [67]:
numbers = np.array([1, 4, 9, 16, 25, 36])
np.sqrt(numbers)

array([1., 2., 3., 4., 5., 6.])

## Example: Adding Arrays with `add()`


In [68]:
numbers2 = np.arange(1, 7) * 10
numbers2

array([10, 20, 30, 40, 50, 60])

In [69]:
np.add(numbers, numbers2)

array([11, 24, 39, 56, 75, 96])

The expression above is equivalent to:

In [70]:
numbers + numbers2

array([11, 24, 39, 56, 75, 96])

## Broadcasting with Universal Functions
NumPy automatically broadcasts operations across arrays of compatible shapes.

In [71]:
np.multiply(numbers2, 5)

array([ 50, 100, 150, 200, 250, 300])

This is equivalent to:

In [72]:
numbers2 * 5

array([ 50, 100, 150, 200, 250, 300])

### Example: Broadcasting Across Dimensions

In [73]:
numbers2

array([10, 20, 30, 40, 50, 60])

In [74]:
numbers3 = numbers2.reshape(2, 3)
numbers3

array([[10, 20, 30],
       [40, 50, 60]])

In [75]:
numbers4 = np.array([2, 4, 6])
np.multiply(numbers3, numbers4)

array([[ 20,  80, 180],
       [ 80, 200, 360]])

NumPy treats `numbers4` as if it were:

```python
import numpy as np

np.array([
    [2, 4, 6],
    [2, 4, 6]
])
```

If the arrays have incompatible shapes, a **ValueError** occurs.
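
The failure case can be sketched as follows. A two-element array cannot be broadcast across rows of three elements, but reshaping it into a column makes the shapes compatible. The names `two_values`, `shapes_compatible`, and `as_column` are illustrative.

```python
import numpy as np

numbers3 = np.array([[10, 20, 30], [40, 50, 60]])   # shape (2, 3)
two_values = np.array([2, 4])                        # shape (2,)

try:
    np.multiply(numbers3, two_values)
    shapes_compatible = True
except ValueError as error:
    shapes_compatible = False
    print('Broadcast failed:', error)

# As a column of shape (2, 1), each value is applied to one whole row
as_column = np.multiply(numbers3, two_values.reshape(2, 1))
print(as_column)
```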

## Categories of Universal Functions
Universal functions fall into five main categories:


**Math Functions** — `add`, `subtract`, `multiply`, `divide`, `remainder`, `exp`, `log`, `sqrt`, `power`, etc.

**Trigonometric Functions** — `sin`, `cos`, `tan`, `hypot`, `arcsin`, `arccos`, `arctan`, etc.

**Bit Manipulation Functions** — `bitwise_and`, `bitwise_or`, `bitwise_xor`, `invert`, `left_shift`, `right_shift`.

**Comparison Functions** — `greater`, `greater_equal`, `less`, `less_equal`, `equal`, `not_equal`, `logical_and`, `logical_or`, `logical_xor`, `logical_not`, `minimum`, `maximum`, etc.

**Floating-Point Functions** — `floor`, `ceil`, `isinf`, `isnan`, `fabs`, `trunc`, etc.
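
To make the categories more concrete, here is a brief sketch that calls one function from several of them. The input values are arbitrary and the variable names are illustrative.

```python
import numpy as np

a = np.array([3.0, 6.0, 9.0])
b = np.array([4.0, 8.0, 12.0])

hypotenuse = np.hypot(a, b)                    # sqrt(a**2 + b**2), element-wise
larger = np.maximum(a, [5.0, 5.0, 5.0])        # element-wise maximum
rounded_down = np.floor(np.array([1.7, -1.2, 3.0]))
bits = np.bitwise_and(np.array([12, 10]), np.array([10, 6]))

print(hypotenuse, larger, rounded_down, bits)
```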


## 7.10 Indexing and Slicing
One-dimensional arrays can be indexed and sliced just like Python lists.  
Here, we’ll focus on array-specific capabilities for **two-dimensional arrays**.


In [76]:
import numpy as np

grades = np.array([
    [87, 96, 70],
    [100, 87, 90],
    [94, 77, 90],
    [100, 81, 82]
])

grades

# Note Down

array([[ 87,  96,  70],
       [100,  87,  90],
       [ 94,  77,  90],
       [100,  81,  82]])

## Accessing a specific element  
To select an element in a two-dimensional array, specify its **row** and **column** indices.


In [77]:
grades[0, 1]  # Row 0, Column 1

np.int64(96)

## Selecting a single row  
Specifying a single index selects that row as a one-dimensional array.


In [78]:
grades[1]

array([100,  87,  90])

## Selecting multiple sequential rows  
You can use **slice notation** to select a range of rows.


In [79]:
grades[0:2]

array([[ 87,  96,  70],
       [100,  87,  90]])

## Selecting multiple non-sequential rows  
Use a list of indices to pick specific rows.


In [80]:
grades[[1, 3]]

array([[100,  87,  90],
       [100,  81,  82]])

## Selecting a subset of columns  
Use `:` before the comma to include all rows, and specify column indices after the comma.


In [81]:
grades[:, 0]  # All rows, first column

array([ 87, 100,  94, 100])

## Selecting consecutive columns  
Slice notation can also be applied to columns.


In [82]:
grades[:, 1:3]

array([[96, 70],
       [87, 90],
       [77, 90],
       [81, 82]])

## Selecting specific columns  
You can use a list of column indices to select non-sequential columns.


In [83]:
grades[:, [0, 2]]

array([[ 87,  70],
       [100,  90],
       [ 94,  90],
       [100,  82]])
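
Row and column selections can also be combined in one expression, and a Boolean array can serve as the row selector. The following sketch uses the same grades; `block` and `perfect_first_exam` are illustrative names.

```python
import numpy as np

grades = np.array([[87, 96, 70],
                   [100, 87, 90],
                   [94, 77, 90],
                   [100, 81, 82]])

# Row slice and column slice together: rows 1-2, columns 0-1
block = grades[1:3, 0:2]

# A Boolean mask on the first column selects rows whose first grade is 100
perfect_first_exam = grades[grades[:, 0] == 100]

print(block)
print(perfect_first_exam)
```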

## 7.11 Views: Shallow Copies
Views (also known as **shallow copies**) are array objects that **share the same data** as the original array.  
They do not create independent copies of the data — changes in one affect the other.


In [84]:
import numpy as np

numbers = np.arange(1, 6)
numbers

array([1, 2, 3, 4, 5])

#### Creating a View
The `.view()` method creates a new array object that views the same data as the original.

In [85]:
numbers2 = numbers.view()
numbers2

array([1, 2, 3, 4, 5])

#### Checking Object Identity
The arrays are **different objects**, but they **share the same underlying data**.

In [86]:
id(numbers), id(numbers2)

(2829914256176, 2829914251088)

#### Modifying the Original Array
Changing `numbers` also updates `numbers2` since both view the same data.

In [87]:
numbers[1] *= 10
numbers2

array([ 1, 20,  3,  4,  5])

In [88]:
numbers

array([ 1, 20,  3,  4,  5])

#### Modifying the View
Changes in the view are reflected in the original array.

In [89]:
numbers2[1] /= 10
numbers

array([1, 2, 3, 4, 5])

In [90]:
numbers2

array([1, 2, 3, 4, 5])

### Slice Views
Array **slices** also create views of the original data, not copies.

In [91]:
numbers2 = numbers[0:3]
numbers2

array([1, 2, 3])

#### Confirming Different Objects
Sliced arrays are different objects but share data with the original.

In [92]:
id(numbers), id(numbers2)

(2829914256176, 2829914256080)

#### Accessing Out-of-Range Index
Since `numbers2` is a slice of only 3 elements, accessing index `3` raises an error.

In [93]:
numbers2[3]

IndexError: index 3 is out of bounds for axis 0 with size 3

#### Shared Data Behavior
Modifying shared elements in either array updates both.

In [94]:
numbers[1] *= 20
numbers

array([ 1, 40,  3,  4,  5])

In [95]:
numbers2

array([ 1, 40,  3])
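
Besides comparing `id` values, you can ask NumPy whether two arrays share data. The sketch below uses the `base` attribute and `np.shares_memory`, neither of which appears elsewhere in this lecture; the variable names are illustrative.

```python
import numpy as np

numbers = np.arange(1, 6)
view = numbers.view()
slice_view = numbers[0:3]
independent = numbers.copy()   # an independent copy, for contrast

# A view records the array it was created from in its base attribute
print(view.base is numbers)
print(slice_view.base is numbers)

# shares_memory reports whether two arrays overlap in memory
print(np.shares_memory(numbers, slice_view))
print(np.shares_memory(numbers, independent))
```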

## 7.12 Deep Copies
While **views** share data to save memory, **deep copies** create **independent copies** of the data.  
This is useful when you want to prevent changes in one array from affecting another, especially in **multi-core or parallel programming**.

In [96]:
import numpy as np

numbers = np.arange(1, 6)
numbers

array([1, 2, 3, 4, 5])

#### Creating a Deep Copy
Use the `.copy()` method to create a new array with its **own copy of the data**.

In [97]:
numbers2 = numbers.copy()
numbers2

array([1, 2, 3, 4, 5])

#### Verifying Independence
Changes to the original array do **not** affect the copied array.

In [98]:
numbers[1] *= 10
numbers

array([ 1, 20,  3,  4,  5])

In [99]:
numbers2

array([1, 2, 3, 4, 5])

### Shallow vs. Deep Copies in Python
For other Python objects, you can use the **`copy` module**:
- `copy.copy(obj)` → shallow copy  
- `copy.deepcopy(obj)` → deep copy  
Example:

In [100]:
import copy

original = [[1, 2, 3], [4, 5, 6]]
deep_copied = copy.deepcopy(original)

original[0][0] = 99

print("Original:", original)
print("Deep Copy:", deep_copied)

Original: [[99, 2, 3], [4, 5, 6]]
Deep Copy: [[1, 2, 3], [4, 5, 6]]
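
For contrast, a sketch of `copy.copy` on the same nested list shows what "shallow" means: the outer list is new, but the inner lists are shared. The name `shallow_copied` is illustrative.

```python
import copy

original = [[1, 2, 3], [4, 5, 6]]
shallow_copied = copy.copy(original)

original[0][0] = 99          # inner list is shared, so the copy sees this
original.append([7, 8, 9])   # outer list is not shared, so the copy does not

print("Original:", original)
print("Shallow Copy:", shallow_copied)
```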


## 7.13 Reshaping and Transposing
NumPy provides several ways to reshape, flatten, and combine arrays efficiently.


#### reshape vs. resize
- **`reshape`** returns a new array object with the requested dimensions, as a **view (shallow copy)** of the original data whenever possible; it does not modify the original.  
- **`resize`** modifies the **original array’s shape** in place.


In [101]:
import numpy as np

grades = np.array([[87, 96, 70], [100, 87, 90]])
grades

array([[ 87,  96,  70],
       [100,  87,  90]])

In [102]:
# reshape creates a view (does not modify the original)
grades.reshape(1, 6)

array([[ 87,  96,  70, 100,  87,  90]])

In [103]:
# original remains unchanged
grades

array([[ 87,  96,  70],
       [100,  87,  90]])

In [104]:
# resize modifies the original array
grades.resize(1, 6)
grades

array([[ 87,  96,  70, 100,  87,  90]])

#### flatten vs. ravel
- **`flatten`** creates a **deep copy** that is independent of the original.  
- **`ravel`** creates a **view** whenever possible, so changes are reflected in the original.

In [105]:
grades = np.array([[87, 96, 70], [100, 87, 90]])
flattened = grades.flatten()
flattened

array([ 87,  96,  70, 100,  87,  90])

In [106]:
# Modify flattened and observe that grades remains unchanged
flattened[0] = 100
print("Flattened:", flattened)
print("Grades:", grades)

Flattened: [100  96  70 100  87  90]
Grades: [[ 87  96  70]
 [100  87  90]]


In [107]:
# ravel creates a view (shallow copy)
raveled = grades.ravel()
raveled

array([ 87,  96,  70, 100,  87,  90])

In [108]:
# Modify raveled and observe that grades changes too
raveled[0] = 100
print("Raveled:", raveled)
print("Grades:", grades)

Raveled: [100  96  70 100  87  90]
Grades: [[100  96  70]
 [100  87  90]]
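
One caveat worth knowing: `ravel` can only return a view when the elements are already laid out in the requested order. The sketch below, with illustrative names, uses `np.shares_memory` to show that raveling a transposed array produces a copy instead.

```python
import numpy as np

grades = np.array([[87, 96, 70], [100, 87, 90]])

raveled = grades.ravel()              # contiguous data: a view
raveled_transposed = grades.T.ravel() # transposed data: NumPy must copy

print(np.shares_memory(grades, raveled))
print(np.shares_memory(grades, raveled_transposed))
print(raveled_transposed)
```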


#### Transposing Rows and Columns
Use the `.T` attribute to **swap rows and columns** (creates a view).

In [109]:
grades.T

array([[100, 100],
       [ 96,  87],
       [ 70,  90]])

In [110]:
# Original remains unchanged
grades

array([[100,  96,  70],
       [100,  87,  90]])

#### Horizontal and Vertical Stacking
You can **combine arrays** either by adding **columns** (horizontal stack) or **rows** (vertical stack).

In [111]:
grades = np.array([[100, 96, 70], [100, 87, 90]])
grades2 = np.array([[94, 77, 90], [100, 81, 82]])

# Horizontal stacking (adds columns)
np.hstack((grades, grades2))

array([[100,  96,  70,  94,  77,  90],
       [100,  87,  90, 100,  81,  82]])

In [112]:
# Vertical stacking (adds rows)
np.vstack((grades, grades2))

array([[100,  96,  70],
       [100,  87,  90],
       [ 94,  77,  90],
       [100,  81,  82]])
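
For two-dimensional arrays such as these, the same results can be obtained with `np.concatenate` and an explicit `axis`. This is a sketch of the equivalence, not a replacement for the stacking functions; the result names are illustrative.

```python
import numpy as np

grades = np.array([[100, 96, 70], [100, 87, 90]])
grades2 = np.array([[94, 77, 90], [100, 81, 82]])

# axis=1 joins along columns, like hstack for 2D arrays
side_by_side = np.concatenate((grades, grades2), axis=1)

# axis=0 joins along rows, like vstack for 2D arrays
one_above_other = np.concatenate((grades, grades2), axis=0)

print(side_by_side.shape, one_above_other.shape)
```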

## 7.14 Intro to Data Science: pandas Series and DataFrames
NumPy’s arrays are great for homogeneous numeric data, but data science often involves **mixed types**, **custom indexing**, and **missing values**.  
Pandas provides two powerful data structures for this:

- **Series** → 1D labeled array  
- **DataFrame** → 2D labeled collection (like a spreadsheet)

## 7.14.1 pandas Series
A **Series** is an enhanced one-dimensional array that supports:
- Custom (non-integer) indices  
- Missing data handling  
- Vectorized operations  

Let’s create a simple Series.

In [113]:
import pandas as pd

# Creating a Series with default integer indices
grades = pd.Series([87, 100, 94])
grades

0     87
1    100
2     94
dtype: int64

### Creating a Series with all elements having the same value
You can specify one value and a range of indices.

In [114]:
pd.Series(98.6, range(3))

0    98.6
1    98.6
2    98.6
dtype: float64

### Accessing Elements
You can access elements by index.

In [115]:
grades[0]

np.int64(87)

### Descriptive Statistics
Pandas makes it easy to compute summary statistics.

In [116]:
grades.count(), grades.mean(), grades.min(), grades.max(), grades.std()

(np.int64(3),
 np.float64(93.66666666666667),
 np.int64(87),
 np.int64(100),
 np.float64(6.506407098647712))

In [117]:
# Using describe() for a full summary
grades.describe()

count      3.00
mean      93.67
std        6.51
min       87.00
25%       90.50
50%       94.00
75%       97.00
max      100.00
dtype: float64
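
The missing-data handling mentioned at the start of this section can be sketched with a Series that contains `NaN`. The Series `grades_with_gap` is a made-up example; by default the statistics methods skip missing values.

```python
import numpy as np
import pandas as pd

grades_with_gap = pd.Series([87, np.nan, 94])

valid_count = grades_with_gap.count()   # counts only non-missing values
average = grades_with_gap.mean()        # NaN is skipped by default
missing = grades_with_gap.isna()        # Boolean Series marking the gaps

print(valid_count, average)
print(missing)
```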

### Creating a Series with Custom Indices
You can use strings or other immutable types as indices.

In [118]:
grades = pd.Series([87, 100, 94], index=['Wally', 'Eva', 'Sam'])
grades

Wally     87
Eva      100
Sam       94
dtype: int64

### Dictionary Initializer
When using a dictionary, keys become indices.

In [119]:
grades = pd.Series({'Wally': 87, 'Eva': 100, 'Sam': 94})
grades

Wally     87
Eva      100
Sam       94
dtype: int64

### Accessing Elements by Custom Indices
You can use either:
- Bracket notation (`grades['Eva']`), or  
- Attribute access (`grades.Wally`) if the name is a valid identifier.

In [120]:
grades['Eva'], grades.Wally

(np.int64(100), np.int64(87))

### Series Attributes
You can access the Series’ metadata.

In [121]:
grades.dtype, grades.values

(dtype('int64'), array([ 87, 100,  94]))

### Working with String Data
If a Series contains strings, you can use `.str` to apply string methods elementwise.

In [122]:
hardware = pd.Series(['Hammer', 'Saw', 'Wrench'])
hardware

0    Hammer
1       Saw
2    Wrench
dtype: object

### Example: Checking for lowercase 'a'
The `.str.contains()` method returns a boolean Series.

In [123]:
hardware.str.contains('a')

0     True
1     True
2    False
dtype: bool

### Example: Converting to Uppercase
Use `.str.upper()` to convert all strings in the Series to uppercase.

In [124]:
hardware.str.upper()

0    HAMMER
1       SAW
2    WRENCH
dtype: object

## 7.14.2 DataFrames
A **DataFrame** is an enhanced two-dimensional array that supports:
- Custom row and column indices  
- Missing data handling  
- Mixed data types across columns  

Each column in a DataFrame is a **Series**.  
We can easily load, view, and manipulate tabular data using pandas.

## Creating a DataFrame from a Dictionary
Let's create a DataFrame representing student grades on three tests.

In [125]:
import pandas as pd

grades_dict = {
    'Wally': [87, 96, 70],
    'Eva': [100, 87, 90],
    'Sam': [94, 77, 90],
    'Katie': [100, 81, 82],
    'Bob': [83, 65, 85]
}

grades = pd.DataFrame(grades_dict)
grades

|   | Wally | Eva | Sam | Katie | Bob |
|---|-------|-----|-----|-------|-----|
| 0 | 87 | 100 | 94 | 100 | 83 |
| 1 | 96 | 87 | 77 | 81 | 65 |
| 2 | 70 | 90 | 90 | 82 | 85 |


By default:
- Row indices are auto-generated integers (0, 1, 2, …)
- Dictionary keys become column names.

## Customizing a DataFrame’s Indices
We can specify custom row labels using the `index` keyword or the `.index` attribute.

In [126]:
grades.index = ['Test1', 'Test2', 'Test3']
grades

|       | Wally | Eva | Sam | Katie | Bob |
|-------|-------|-----|-----|-------|-----|
| Test1 | 87 | 100 | 94 | 100 | 83 |
| Test2 | 96 | 87 | 77 | 81 | 65 |
| Test3 | 70 | 90 | 90 | 82 | 85 |
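
The `index` keyword mentioned above sets the row labels at construction time. The following sketch uses a shortened version of the dictionary, with only two students, to keep the example small.

```python
import pandas as pd

# Shortened dictionary for illustration
grades_dict = {'Wally': [87, 96, 70], 'Eva': [100, 87, 90]}

# Row labels supplied when the DataFrame is created
grades = pd.DataFrame(grades_dict, index=['Test1', 'Test2', 'Test3'])

print(grades)
```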


## Accessing a DataFrame’s Columns
Access columns by:
- Dictionary-style syntax (`grades['Eva']`), or  
- Attribute-style access (`grades.Sam`) if column names are valid identifiers.

In [127]:
grades['Eva']

Test1    100
Test2     87
Test3     90
Name: Eva, dtype: int64

In [128]:
grades.Sam

Test1    94
Test2    77
Test3    90
Name: Sam, dtype: int64

## Selecting Rows
We can select rows using `.loc` (by label) or `.iloc` (by integer index).

In [129]:
grades.loc['Test1']

Wally     87
Eva      100
Sam       94
Katie    100
Bob       83
Name: Test1, dtype: int64

In [130]:
grades.iloc[1]

Wally    96
Eva      87
Sam      77
Katie    81
Bob      65
Name: Test2, dtype: int64

## Selecting Multiple Rows
Use slicing or lists with `.loc` and `.iloc`.

In [131]:
grades.loc['Test1':'Test3']

|       | Wally | Eva | Sam | Katie | Bob |
|-------|-------|-----|-----|-------|-----|
| Test1 | 87 | 100 | 94 | 100 | 83 |
| Test2 | 96 | 87 | 77 | 81 | 65 |
| Test3 | 70 | 90 | 90 | 82 | 85 |


In [132]:
grades.iloc[0:2]

|       | Wally | Eva | Sam | Katie | Bob |
|-------|-------|-----|-----|-------|-----|
| Test1 | 87 | 100 | 94 | 100 | 83 |
| Test2 | 96 | 87 | 77 | 81 | 65 |


In [133]:
grades.loc[['Test1', 'Test3']]

|       | Wally | Eva | Sam | Katie | Bob |
|-------|-------|-----|-----|-------|-----|
| Test1 | 87 | 100 | 94 | 100 | 83 |
| Test3 | 70 | 90 | 90 | 82 | 85 |


In [134]:
grades.iloc[[0, 2]]

|       | Wally | Eva | Sam | Katie | Bob |
|-------|-------|-----|-----|-------|-----|
| Test1 | 87 | 100 | 94 | 100 | 83 |
| Test3 | 70 | 90 | 90 | 82 | 85 |


## Selecting Subsets of Rows and Columns
You can specify both rows and columns in a single `.loc` call.

In [135]:
grades.loc['Test1':'Test2', ['Eva', 'Katie']]

|       | Eva | Katie |
|-------|-----|-------|
| Test1 | 100 | 100 |
| Test2 | 87 | 81 |


In [136]:
grades.iloc[[0, 2], 0:3]

|       | Wally | Eva | Sam |
|-------|-------|-----|-----|
| Test1 | 87 | 100 | 94 |
| Test3 | 70 | 90 | 90 |


## Boolean Indexing
Select elements based on conditions.

In [137]:
grades[grades >= 90]

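|       | Wally | Eva | Sam | Katie | Bob |
|-------|-------|-----|-----|-------|-----|
| Test1 | NaN | 100.0 | 94.0 | 100.0 | NaN |
| Test2 | 96.0 | NaN | NaN | NaN | NaN |
| Test3 | NaN | 90.0 | 90.0 | NaN | NaN |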


In [138]:
grades[(grades >= 80) & (grades < 90)]

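|       | Wally | Eva | Sam | Katie | Bob |
|-------|-------|-----|-----|-------|-----|
| Test1 | 87.0 | NaN | NaN | NaN | 83.0 |
| Test2 | NaN | 87.0 | NaN | 81.0 | NaN |
| Test3 | NaN | NaN | NaN | 82.0 | 85.0 |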
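
Boolean results can also be counted or used to keep whole rows. The sketch below uses a three-student subset of the grades to stay short; the names `a_grades_per_student` and `tests_where_eva_got_a` are illustrative.

```python
import pandas as pd

grades = pd.DataFrame(
    {'Wally': [87, 96, 70], 'Eva': [100, 87, 90], 'Sam': [94, 77, 90]},
    index=['Test1', 'Test2', 'Test3'],
)

# True counts as 1, so summing the Boolean DataFrame counts grades >= 90 per student
a_grades_per_student = (grades >= 90).sum()

# A Boolean Series built from one column keeps or drops whole rows
tests_where_eva_got_a = grades[grades['Eva'] >= 90]

print(a_grades_per_student)
print(tests_where_eva_got_a)
```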


## Accessing Specific DataFrame Cells
Use `.at` and `.iat` for fast access to single elements.

In [139]:
grades.at['Test2', 'Eva']

np.int64(87)

In [140]:
grades.iat[2, 0]

np.int64(70)

### Updating Specific Cells

In [141]:
grades.at['Test2', 'Eva'] = 100
grades.at['Test2', 'Eva']

np.int64(100)

In [142]:
grades.iat[1, 2] = 87
grades.iat[1, 2]

np.int64(87)

## Descriptive Statistics
Use `.describe()` to quickly compute summary statistics.

In [143]:
grades.describe()

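|       | Wally | Eva | Sam | Katie | Bob |
|-------|-------|-----|-----|-------|-----|
| count | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 |
| mean | 84.33 | 96.67 | 90.33 | 87.67 | 77.67 |
| std | 13.2 | 5.77 | 3.51 | 10.69 | 11.02 |
| min | 70.0 | 90.0 | 87.0 | 81.0 | 65.0 |
| 25% | 78.5 | 95.0 | 88.5 | 81.5 | 74.0 |
| 50% | 87.0 | 100.0 | 90.0 | 82.0 | 83.0 |
| 75% | 91.5 | 100.0 | 92.0 | 91.0 | 84.0 |
| max | 96.0 | 100.0 | 94.0 | 100.0 | 85.0 |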


### Controlling Display Precision
We can control numerical precision using `pd.set_option`.

In [144]:
pd.set_option('display.precision', 2)
grades.describe()

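|       | Wally | Eva | Sam | Katie | Bob |
|-------|-------|-----|-----|-------|-----|
| count | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 |
| mean | 84.33 | 96.67 | 90.33 | 87.67 | 77.67 |
| std | 13.2 | 5.77 | 3.51 | 10.69 | 11.02 |
| min | 70.0 | 90.0 | 87.0 | 81.0 | 65.0 |
| 25% | 78.5 | 95.0 | 88.5 | 81.5 | 74.0 |
| 50% | 87.0 | 100.0 | 90.0 | 82.0 | 83.0 |
| 75% | 91.5 | 100.0 | 92.0 | 91.0 | 84.0 |
| max | 96.0 | 100.0 | 94.0 | 100.0 | 85.0 |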


### Mean of Each Student’s Grades


In [145]:
grades.mean()

Wally    84.33
Eva      96.67
Sam      90.33
Katie    87.67
Bob      77.67
dtype: float64

## Transposing the DataFrame
Use `.T` to flip rows and columns.


In [146]:
grades.T

|       | Test1 | Test2 | Test3 |
|-------|-------|-------|-------|
| Wally | 87 | 96 | 70 |
| Eva | 100 | 100 | 90 |
| Sam | 94 | 87 | 90 |
| Katie | 100 | 81 | 82 |
| Bob | 83 | 65 | 85 |


### Descriptive Statistics by Test


In [147]:
grades.T.describe()

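|       | Test1 | Test2 | Test3 |
|-------|-------|-------|-------|
| count | 5.0 | 5.0 | 5.0 |
| mean | 92.8 | 85.8 | 83.4 |
| std | 7.66 | 13.81 | 8.23 |
| min | 83.0 | 65.0 | 70.0 |
| 25% | 87.0 | 81.0 | 82.0 |
| 50% | 94.0 | 87.0 | 85.0 |
| 75% | 100.0 | 96.0 | 90.0 |
| max | 100.0 | 100.0 | 90.0 |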


### Mean Grades by Test


In [148]:
grades.T.mean()

Test1    92.8
Test2    85.8
Test3    83.4
dtype: float64
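
Transposing is not the only way to get per-test means. As a sketch on a three-student subset of the grades, passing `axis=1` to `mean` averages across the columns of each row and gives the same values.

```python
import numpy as np
import pandas as pd

grades = pd.DataFrame(
    {'Wally': [87, 96, 70], 'Eva': [100, 87, 90], 'Sam': [94, 77, 90]},
    index=['Test1', 'Test2', 'Test3'],
)

by_transpose = grades.T.mean()     # transpose, then average each column
by_axis = grades.mean(axis=1)      # average across the columns of each row

print(by_axis)
```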

## Sorting Data
You can sort by row or column indices, or by values.

In [149]:
grades.sort_index(ascending=False)

|       | Wally | Eva | Sam | Katie | Bob |
|-------|-------|-----|-----|-------|-----|
| Test3 | 70 | 90 | 90 | 82 | 85 |
| Test2 | 96 | 100 | 87 | 81 | 65 |
| Test1 | 87 | 100 | 94 | 100 | 83 |


In [150]:
grades.sort_index(axis=1)

|       | Bob | Eva | Katie | Sam | Wally |
|-------|-----|-----|-------|-----|-------|
| Test1 | 83 | 100 | 100 | 94 | 87 |
| Test2 | 65 | 100 | 81 | 87 | 96 |
| Test3 | 85 | 90 | 82 | 90 | 70 |


### Sorting Columns by the Values in One Row


In [151]:
grades.sort_values(by='Test1', axis=1, ascending=False)

|       | Eva | Katie | Sam | Wally | Bob |
|-------|-----|-------|-----|-------|-----|
| Test1 | 100 | 100 | 94 | 87 | 83 |
| Test2 | 100 | 81 | 87 | 96 | 65 |
| Test3 | 90 | 82 | 90 | 70 | 85 |


### Sorting Transposed Data


In [152]:
grades.T.sort_values(by='Test1', ascending=False)

|       | Test1 | Test2 | Test3 |
|-------|-------|-------|-------|
| Eva | 100 | 100 | 90 |
| Katie | 100 | 81 | 82 |
| Sam | 94 | 87 | 90 |
| Wally | 87 | 96 | 70 |
| Bob | 83 | 65 | 85 |


### Sorting Only One Row


In [153]:
grades.loc['Test1'].sort_values(ascending=False)

Eva      100
Katie    100
Sam       94
Wally     87
Bob       83
Name: Test1, dtype: int64

## 7.15 Wrap-Up

This chapter explored the use of NumPy’s high-performance **ndarrays** for storing and retrieving data, and for performing common data manipulations concisely and safely using functional-style programming. Throughout, we referred to *ndarrays* simply as **arrays**.

We demonstrated how to:
- Create, initialize, and reference individual elements of one- and two-dimensional arrays.  
- Use attributes to determine an array’s **size**, **shape**, and **element type**.  
- Create arrays filled with **zeros**, **ones**, specific values, or **ranges**.  
- Compare list and array performance using the IPython `%timeit` magic, showing that arrays can be up to **two orders of magnitude faster**.

We also:
- Used array operators and NumPy **universal functions (ufuncs)** for element-wise computations on arrays with matching shapes.  
- Explored **broadcasting**, which allows operations between arrays of different shapes and between arrays and scalars.  
- Performed **aggregate calculations** on entire arrays, as well as along specific rows or columns.  
- Demonstrated **slicing** and **indexing** methods that go beyond Python’s built-in collection capabilities.  
- Showed how to **reshape arrays** using `reshape`, `resize`, `flatten`, and `ravel`.  
- Discussed **shallow** and **deep copies** in NumPy and general Python objects.

In the **Intro to Data Science** section, we began exploring the **pandas** library — a core tool for handling large, mixed-type, and irregularly structured data. You learned that:
- **NumPy arrays** are efficient but hold elements of a single type, typically numeric.  
- **pandas Series** (1D) and **DataFrames** (2D) support mixed data types, custom indexing, and missing data.  
- Both are **built upon NumPy arrays**, combining speed with flexibility.

We covered:
- Creating and manipulating **Series** and **DataFrames**.  
- Customizing **indices** for readability and context.  
- pandas’ **formatted display** and **precision control** for floats.  
- Accessing and selecting subsets of data.  
- Using `.describe()` to generate summary **descriptive statistics**.  
- **Transposing** rows and columns with the `.T` attribute.  
- Sorting **DataFrames** by indices, column names, or specific data values.

You are now familiar with four key array-like data structures in Python:
1. **Lists** – General-purpose, heterogeneous, built-in.  
2. **NumPy arrays** – Homogeneous, high-performance numeric computation.  
3. **pandas Series** – Labeled, one-dimensional, flexible.  
4. **pandas DataFrames** – Two-dimensional, labeled, powerful data manipulation.

In later chapters, we will add a fifth: **tensors**, in the *Deep Learning* section.


Next, we’ll move into **Chapter 8**, which dives deeper into:
- Strings and their built-in methods  
- String formatting  
- **Regular expressions (regex)** for pattern matching in text  

These capabilities are foundational for **Natural Language Processing (NLP)** and for cleaning and transforming data in preparation for analysis.
