# Why Indexing Matters


**Indexing** is like pointing at specific rows or columns in your data. In real engineering datasets, you rarely use all the data at once — you often need to select a portion (e.g., a certain time window, or specific sensor signals).

Today, we will practice how to do this with both `NumPy arrays` and `Pandas DataFrames`.

## Import Packages

In [None]:
import numpy as np
import pandas as pd

# **1. Array (NumPy)**

A NumPy array is a powerful data structure for numerical computing in Python.

### **1-1. Creating Array Data**

#### **1-1-1. Defining Array Data by Assigning Values**

**One-dimensional data (vector)**

In [None]:
array_1d = np.array([0, 1, 2])
array_1d

Unlike a regular Python `list`, NumPy `arrays` are optimized for mathematical operations.


- The biggest advantage is vectorization: an operation is applied to every element at once, without an explicit Python loop, which keeps numerical code both short and fast.

In [None]:
# List (basic data structures in Python)
a = [1,2,3]
b = [3,3,3]

# List addition means concatenation (joining two lists).
a+b

In [None]:
# Array (NumPy)
a = np.array([1,2,3]) # convert 'list' data into 'array' data
b = np.array([3,3,3])

# Array addition means element-wise arithmetic.
a+b
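To make the contrast concrete, the sketch below scales every value by 2 in both data structures. The variable names (`values_list`, `values_array`) are illustrative only and are not used elsewhere in this lab.

```python
import numpy as np

values_list = [1, 2, 3]
values_array = np.array(values_list)

# List: multiplying by an integer repeats the list; it does not scale the values.
repeated = values_list * 2

# List: element-wise math needs an explicit loop or comprehension.
doubled_list = [v * 2 for v in values_list]

# Array: the operation is written once and applied to every element.
doubled_array = values_array * 2

print(repeated)        # [1, 2, 3, 1, 2, 3]
print(doubled_list)    # [2, 4, 6]
print(doubled_array)   # [2 4 6]
```

The array version reads like the math it represents, which is the main reason arrays are preferred for numerical work.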

**Two-dimensional data (matrix)**

A 2D array is like a table with rows and columns.

- Each row is a list of values.

- Arrays can be indexed using [row, column].

In [None]:
array_2d = np.array([[10,11,12],[20,21,22]])
array_2d

**Three-dimensional data (tensor)**

A 3D array is like stacking multiple 2D tables on top of each other.

- Often used to represent data with height × width × depth (e.g., images or sensor channels).

In [None]:
array_3d = np.array([ [[10,11,12],[20,21,22]] , [[100,111,112],[120,121,122]] ])
array_3d
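A quick way to check how many dimensions an array has is to inspect its `ndim` and `shape` attributes. The sketch below re-creates the three arrays defined above so that it runs on its own.

```python
import numpy as np

array_1d = np.array([0, 1, 2])
array_2d = np.array([[10, 11, 12], [20, 21, 22]])
array_3d = np.array([[[10, 11, 12], [20, 21, 22]],
                     [[100, 111, 112], [120, 121, 122]]])

for name, arr in [("1D", array_1d), ("2D", array_2d), ("3D", array_3d)]:
    # ndim = number of axes, shape = number of elements along each axis
    print(name, "ndim:", arr.ndim, "shape:", arr.shape)
```

Here the 3D array has shape `(2, 2, 3)`: two stacked tables, each with two rows and three columns.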

.

.

#### **1-1-2. Generating Array Data Filled with 0 or 1**

Sometimes, instead of manually defining values, we want to quickly create arrays filled with a single number.

- `np.zeros(shape)` → creates an array filled with 0

- `np.ones(shape)` → creates an array filled with 1

In [None]:
ary_01 = np.zeros(5)
ary_01

In [None]:
ary_02 = np.zeros((2,3))
ary_02

In [None]:
ary_11 = np.ones(5)
ary_11

In [None]:
ary_12 = np.ones((2,3))
ary_12

💡 **Why is this useful?**

- In Machine Learning / Deep Learning, labels are often represented as arrays of 0s and 1s (e.g., binary classification).

- In data preprocessing, you might want to “reserve space” like an empty house and later fill it with actual values.

- It is also useful for initializing arrays before running simulations or numerical computations.
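As a minimal sketch of the "reserve space, then fill" idea, the example below pre-allocates an array with `np.zeros()` and writes values into it one by one. The formula used to produce the values is only a stand-in for a real measurement, and `n_samples`, `readings`, and `labels` are hypothetical names.

```python
import numpy as np

n_samples = 5                      # hypothetical number of readings
readings = np.zeros(n_samples)     # reserve space; every value starts at 0.0

for i in range(n_samples):
    readings[i] = 20.0 + 0.5 * i   # placeholder for a real measurement

labels = np.ones(n_samples)        # start by marking every sample as class 1
labels[:2] = 0                     # then relabel the first two as class 0

print(readings)
print(labels)
```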

.

.

#### **1-1-3. Generating Array Data by Designating a Range**

Often, we need arrays that contain numbers within a specific range, for example, time steps or sensor readings sampled at regular intervals.

`np.arange(a)` : numbers from `0` up to (but not including) `a`

In [None]:
ary_range_1 = np.arange(5)
ary_range_1

`np.arange(a,b)` : numbers from `a` up to (but not including) `b`

In [None]:
ary_range_2 = np.arange(1,10)
ary_range_2

`np.arange(a,b,c)` : numbers from `a` up to (but not including) `b`, with step size `c`

In [None]:
ary_range_3 = np.arange(1,10,0.5)
ary_range_3

**Ranges in Built-in Python**

- Python also provides a built-in function `range()`.

- However, it **does not create an array**. It produces a *range object* that is mainly used for iteration in loops.

`range(a)` : **range** from `0` up to (but not including) `a`

In [None]:
range_1 = range(5)
range_1

`range(a,b)` : **range** from `a` up to (but not including) `b`

In [None]:
range_2 = range(1,10)
range_2

💡 Key difference:

- `np.arange()` → creates a **NumPy array** (ready for math operations).

- `range()` → creates a lazy `range` object (not an array); it is mainly used to drive loops and can be converted with `list()` when the values themselves are needed.

👉 In data analysis and ML tasks, we prefer `np.arange()`, because arrays can be used directly in computations.
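The difference shows up as soon as you try to do arithmetic. In this small sketch, the flag `range_supports_math` is a hypothetical name used only to record what happens.

```python
import numpy as np

r = range(5)
a = np.arange(5)

as_list = list(r)   # materialize the range as a list
scaled = a * 2      # element-wise multiplication on the array

# A range object does not support arithmetic with a number.
try:
    r * 2
    range_supports_math = True
except TypeError:
    range_supports_math = False

print(as_list)               # [0, 1, 2, 3, 4]
print(scaled)                # [0 2 4 6 8]
print(range_supports_math)   # False
```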

.

.

.

### **1-2. Array Data Indexing**

Indexing allows us to select specific elements or subsets of data from an array.

This is important because in real datasets (e.g., sensor data), we rarely use the entire dataset at once — we often need only a specific row, column, or time window.

#### **1-2-1. Extract a value from a specific position**

- Arrays use **zero-based indexing** (the first element has index 0).

- For 1D arrays (vectors), we can access elements by their index.

In [None]:
ary_1d = np.arange(1,10)
ary_1d

In [None]:
ary_1d[0]      # first element → 1

- For 2D arrays (matrices), we use `[row][column]` or `[row, column]`.

In [None]:
ary_2d = np.array([[1,2,3] , [4,5,6] , [7,8,9]])
ary_2d

In [None]:
ary_2d[0][0]   # first row, first column → 1

In [None]:
ary_2d[2,1]   # third row, second column → 8
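Both notations return the same value, but they work slightly differently: `[row][column]` first extracts a whole row and then indexes into it, while `[row, column]` looks up the element in a single step. The sketch below spells out both paths; the comma form is the one usually seen in NumPy code.

```python
import numpy as np

ary_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

row = ary_2d[2]          # step 1 of ary_2d[2][1]: the whole third row
chained = ary_2d[2][1]   # step 2: index into that row
direct = ary_2d[2, 1]    # single step: row 2, column 1

print(row)               # [7 8 9]
print(chained, direct)   # 8 8
```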

#### **1-2-2. Extract values from specific range**

- We can use slicing: `[start:stop]`

- Remember that the `stop` index is **exclusive** (not included).

- In 2D arrays, we can slice rows or columns.

In [None]:
ary_1d[:]       # all elements

In [None]:
ary_1d[2:5]     # elements at positions 2,3,4

In [None]:
ary_2d[1][0:2]  # from 2nd row, take columns 0 and 1 → [4,5]

In [None]:
ary_2d[0:2, 1:3] # sub-matrix → [[2,3],[5,6]]
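A bare `:` in one position means "everything along this axis", which is how a whole row or a whole column is selected. A short sketch using the same 3 × 3 matrix (the result names are illustrative):

```python
import numpy as np

ary_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

first_column = ary_2d[:, 0]    # all rows, column 0
second_row = ary_2d[1, :]      # row 1, all columns
last_two_cols = ary_2d[:, 1:]  # all rows, columns 1 to the end

print(first_column)    # [1 4 7]
print(second_row)      # [4 5 6]
print(last_two_cols)
```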

**Additional Techniques**

- Step slicing

In [None]:
ary_1d[::2]      # every second element → [1,3,5,7,9]

- Negative indexing

In [None]:
print(ary_1d[-1])       # last element → 9
print(ary_2d[-1,-1])    # last row, last column → 9

- Boolean indexing

In [None]:
mask = ary_1d > 5
ary_1d[mask]     # extract only elements greater than 5
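Boolean masks are a natural way to pick a time window, as mentioned at the start of this lab. The example below is a sketch with made-up data: `time_s` and `signal` are hypothetical arrays, and two conditions are combined with `&`.

```python
import numpy as np

time_s = np.arange(0, 10)   # hypothetical time stamps in seconds
signal = np.array([0.1, 0.4, 0.9, 1.6, 2.5, 3.6, 4.9, 6.4, 8.1, 10.0])

# Parentheses are required: & binds more tightly than the comparisons.
window = (time_s >= 3) & (time_s < 6)

time_in_window = time_s[window]
signal_in_window = signal[window]   # the same mask works on any array of equal length

print(time_in_window)     # [3 4 5]
print(signal_in_window)   # [1.6 2.5 3.6]
```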

- Multiple indexing

In [None]:
ary_1d[[0,2,4]]  # select elements at indices 0,2,4 → [1,3,5]
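The same list-of-indices idea extends to 2D arrays, where it can pick non-adjacent rows or columns. A brief sketch (the result names are illustrative):

```python
import numpy as np

ary_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

rows_0_and_2 = ary_2d[[0, 2]]      # first and third rows
cols_0_and_2 = ary_2d[:, [0, 2]]   # first and third columns of every row

print(rows_0_and_2)
print(cols_0_and_2)
```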

.

.

.

# **Mini Quiz: NumPy Arrays**

Try to write the code by yourself first before checking with your neighbor or asking for help.

**Q1. Create and access elements**

- Create a 1D array containing numbers from 1 to 10.

- Print the first element and the last element.

👉 Hint: Use `np.arange()` and remember Python indexing starts at 0.

In [None]:
# Complete the code
arr =
FirstElement =
LastElement  =

arr, FirstElement, LastElement

</details> <details> <summary>Click to see Answer Q1</summary>

```python
arr = np.arange(1, 11)   # stop is exclusive, so 11 gives the values 1..10
FirstElement = arr[0]
LastElement  = arr[-1]

arr, FirstElement, LastElement
```

</details>

**Q2. Extract a range**

- From the same array, extract the elements from index 2 up to (but not including) index 6.

- What numbers do you get?

👉 Hint: Use slicing `[start:stop]` (the stop index is exclusive).

In [None]:
# Complete the code
part =

part

</details> <details> <summary>Click to see Answer Q2</summary>

```python
part = arr[2:6]
part
```

</details>

**Q3. 2D array indexing**

- Print the element in the second row, third column.

- Extract the first two rows and last two columns as a sub-matrix.

In [None]:
# Complete the code
arr_2d = np.array([[1,2,3] , [4,5,6] , [7,8,9]])

val     =
sub_mat =

val, sub_mat

</details> <details> <summary>Click to see Answer Q3</summary>

```python
arr_2d = np.array([[1,2,3] , [4,5,6] , [7,8,9]])

val = arr_2d[1, 2]            # second row, third column
sub_mat = arr_2d[0:2, 1:3]    # first two rows, last two columns

val, sub_mat
```

</details>

.

.

.

# **2. Data Frame (Pandas)**

`DataFrame` is a powerful 2D data structure provided by Pandas, designed for working with tabular data (like an Excel spreadsheet).

- `NumPy arrays` are great for numerical operations, but they only have positions (e.g. row/column index).

- `DataFrames` add **labels** (row index and column names), making the data much easier to read and manipulate.

- `DataFrames` also provide many convenient functions for data saving/loading (as .csv format), analysis, and visualization.

- While `NumPy arrays` can freely handle multi-dimensional tensors (3D or higher), Pandas `DataFrames` are limited to 2D tabular data structures and are therefore mainly used for statistical analysis and general data handling tasks.

👉 Think of it this way:

- NumPy Array → like raw sensor signals stored in memory.

- Pandas DataFrame → like a well-organized spreadsheet with row/column names.

### **2-1. Creation of Data Frame**
- In practice, a DataFrame is usually created by converting data that already exists in another format.

**1)  Defining manually with values**

In [None]:
df_1 = pd.DataFrame([11,12,13])
df_1

**2) Defining with multiple columns and labels**

In [None]:
df_2 = pd.DataFrame({
    "A":['a','b','c'] ,
    "B":[21,22,23] ,
    "C":[31,32,33]})

df_2

**3) Converting from a NumPy Array**

In [None]:
ary_2d

In [None]:
df_3 = pd.DataFrame(ary_2d)
df_3

Add column names when converting the NumPy array:

In [None]:
df_3 = pd.DataFrame(ary_2d, columns=["X","Y","Z"])
df_3
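Row labels can be set in the same way with the `index` argument. The sketch below assumes hypothetical labels `t0`, `t1`, `t2` (for example, time steps) and shows that the underlying values are unchanged by the labels.

```python
import numpy as np
import pandas as pd

ary_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

df_labeled = pd.DataFrame(
    ary_2d,
    columns=["X", "Y", "Z"],
    index=["t0", "t1", "t2"],   # hypothetical row labels
)

print(df_labeled)
print(list(df_labeled.columns))   # ['X', 'Y', 'Z']
print(list(df_labeled.index))     # ['t0', 't1', 't2']

# Convert back to a plain NumPy array (the labels are dropped).
back_to_array = df_labeled.to_numpy()
```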

### **2-2. Data Frame Indexing**

- With **NumPy**: elements are addressed by position only; there are no row or column labels.

- With **Pandas**: you can use either position (`.iloc`) or label (`.loc`).

Example (position-based):

In [None]:
df_3.iloc[:2, 0:3]   # first two rows, first three columns

Because `df_3` has column names, we can also use label-based indexing:

In [None]:
df_3.loc[:, ["X","Z"]]   # select columns X and Z

In [None]:
df_2["B"]        # returns one column
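One detail that often surprises newcomers: `.iloc` slices exclude the stop position (like NumPy), while `.loc` slices include the stop label. The sketch below uses a small hypothetical DataFrame with row labels `a`, `b`, `c` so that the two selections can be compared directly.

```python
import pandas as pd

df = pd.DataFrame(
    {"X": [1, 4, 7], "Y": [2, 5, 8], "Z": [3, 6, 9]},
    index=["a", "b", "c"],   # hypothetical row labels
)

by_position = df.iloc[0:2, 0:2]       # rows 0-1, columns 0-1 (stop excluded)
by_label = df.loc["a":"b", "X":"Y"]   # rows a-b, columns X-Y (stop included)

print(by_position)
print(by_label)
print(by_position.equals(by_label))   # True
```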

Adding a new column

In [None]:
df_2["D"] = df_2["B"] + df_2["C"]
df_2

💡 Key takeaway for students:

- DataFrames are more expressive than arrays, since they carry labels.

- You can still use positions like arrays, but labels make your code more readable.

- Most real-world datasets (CSV, Excel, SQL) are loaded into DataFrames, not arrays.
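As an illustration of the last point, the sketch below loads CSV text into a DataFrame with `pd.read_csv()`. To keep it self-contained, the "file" is an in-memory string wrapped in `io.StringIO`; the column names and values are made up for this example. With a real file you would pass its path instead.

```python
import io
import pandas as pd

# Hypothetical sensor log; in practice this would be a .csv file on disk.
csv_text = """time_s,temp_C,pressure_kPa
0,20.1,101.3
1,20.4,101.2
2,20.9,101.1
"""

df_sensor = pd.read_csv(io.StringIO(csv_text))

temps = df_sensor["temp_C"]       # one column, selected by label
first_two = df_sensor.iloc[:2]    # first two rows, selected by position
as_array = df_sensor.to_numpy()   # plain NumPy array when only the numbers are needed

print(df_sensor)
```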

.

.

.

# **Mini Quiz: Pandas DataFrame**

Try these short exercises to get familiar with DataFrame basics.

**Q1. Create and access columns**

- Create a DataFrame with the following data:

```
Name:   ["Alice","Bob","Charlie"]
Age:    [24, 30, 22]
Score:  [85, 90, 95]
```
- Print only the Age column.

👉 Hint: Use `pd.DataFrame()` and column selection with `df["column"]`.

In [None]:
# Complete the code

df =

print(  )

df[ ]

</details> <details> <summary>Click to see Answer Q1</summary>

```python
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [24, 30, 22],
    "Score": [85, 90, 95]
})
print(df)

df["Age"]
```

</details>

**Q2. Indexing by position**

- From the DataFrame `df_2` (defined earlier with columns A, B, C),
extract the first two rows and columns A and B.

👉 Hint: Use `.iloc[row_slice, col_slice]`.

In [None]:
# Write your code




</details> <details> <summary>Click to see Answer Q2</summary>

```python
# From df_2 created earlier
df_2 = pd.DataFrame({
    "A": ['a','b','c'],
    "B": [21,22,23],
    "C": [31,32,33]
})

# First two rows and columns A and B
df_2.iloc[:2, 0:2]
```

</details>

.

.

.

# Summary of Data Analysis DA1_Code2

In this lab, you practiced the fundamentals of working with **NumPy arrays** and **Pandas DataFrames** in Python.  
You learned how to create, manipulate, and index these data structures for data analysis tasks.

---

🔹 **What you learned:**

1. **NumPy Arrays**  
   - Create arrays from lists and generate arrays using functions like `zeros()`, `ones()`, and `arange()`  
   - Understand differences between Python lists and NumPy arrays in mathematical operations  
   - Perform indexing and slicing (single values, ranges, steps, negative indexing, boolean masks)  

2. **Array Data Indexing**  
   - Extract specific elements by position  
   - Select sub-ranges and sub-matrices from multidimensional arrays  

3. **Pandas DataFrames**  
   - Create DataFrames manually with labeled columns  
   - Convert NumPy arrays into DataFrames for tabular representation  
   - Index rows and columns using `.iloc` (position-based) and `.loc` (label-based)  

4. **Comparing Arrays and DataFrames**  
   - Arrays can handle higher-dimensional tensors (3D, 4D, …)  
   - DataFrames are limited to 2D tabular data, but provide labels and functions useful for statistics and real-world datasets  

---

💡 **Key Takeaway**  
By completing this lab, you now know how to:  
- Efficiently use NumPy arrays for numerical operations  
- Apply indexing and slicing to extract meaningful subsets of data  
- Work with Pandas DataFrames for labeled tabular data analysis  
- Choose between arrays and DataFrames depending on the data structure and task