# List Comprehension and Array Slicing Guide for Data Extraction

## Overview

When working with datasets (lists, NumPy arrays, or pandas DataFrames), you often need to extract specific columns or elements. This guide covers the most common approaches and when to use each.

## Method 1: NumPy Array Slicing (Recommended for 2D Arrays)

### What It Is
NumPy slicing uses the colon notation `[rows, columns]` to extract data from multi-dimensional arrays efficiently.

### Syntax
```python
array[:, column_index]  # Extract entire column
array[row_index, :]     # Extract entire row
array[start:end, col]   # Extract range of rows from a column
```

### Example: 2D Array (like your ballistics data)

In [1]:
import numpy as np

# 2D array: rows are data points, columns are [mach_number, Kd_value]
G1_data = np.array([
    [0.00, 0.2629],
    [0.05, 0.2558],
    [0.10, 0.2487],
    # ... more rows
])

# Extract first column (Mach numbers)
mach_numbers = G1_data[:, 0]  # Returns [0.00, 0.05, 0.10, ...]
print("Mach numbers:", mach_numbers)

# Extract second column (Kd values)
kd_values = G1_data[:, 1]     # Returns [0.2629, 0.2558, 0.2487, ...]
print("Kd values:", kd_values)

Mach numbers: [0.   0.05 0.1 ]
Kd values: [0.2629 0.2558 0.2487]


### Use with scipy interpolation (your use case)

In [2]:
from scipy.interpolate import interp1d

# Use with scipy interpolation
G1 = interp1d(G1_data[:, 0], G1_data[:, 1])
print("Interpolation function created successfully")

Interpolation function created successfully
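
To see what a linear interpolant does with the two extracted columns, here is a minimal sketch that needs only NumPy. It uses `np.interp`, which is a different function from `interp1d` and is swapped in here purely for illustration; for 1D data it performs the same linear interpolation as the `interp1d` default. The query value `0.025` is an arbitrary choice.

```python
import numpy as np

G1_data = np.array([
    [0.00, 0.2629],
    [0.05, 0.2558],
    [0.10, 0.2487],
])

mach = G1_data[:, 0]  # x values; np.interp expects these to be increasing
kd = G1_data[:, 1]    # y values

# Query halfway between the first two tabulated Mach numbers
kd_mid = np.interp(0.025, mach, kd)
print(kd_mid)  # approximately 0.25935

# Queries outside the table are clamped to the end values by np.interp
kd_beyond = np.interp(0.2, mach, kd)
print(kd_beyond)  # 0.2487
```

One difference to keep in mind: with its default settings `interp1d` raises a `ValueError` for queries outside the tabulated range, while `np.interp` clamps to the first or last value.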


### Why Use This?
- **Fast**: Basic slicing returns a view of the existing data instead of copying it
- **Clean**: One-line syntax
- **Consistent types**: Every element of the result shares the array's single dtype
- **Perfect for**: Scientific computing, interpolation, mathematical operations
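
Part of the speed comes from the fact that basic slicing does not copy data. The sketch below illustrates this on a small array in the same `[mach_number, Kd_value]` layout; the variable names `table`, `mach_view`, and `mach_copy` are illustrative only.

```python
import numpy as np

table = np.array([
    [0.00, 0.2629],
    [0.05, 0.2558],
    [0.10, 0.2487],
])

mach_view = table[:, 0]         # basic slicing: a view onto table's memory
mach_copy = table[:, 0].copy()  # explicit, independent copy

print(np.shares_memory(table, mach_view))  # True
print(np.shares_memory(table, mach_copy))  # False

# Writing through the view also changes the original array
mach_view[0] = 9.99
print(table[0, 0])   # 9.99
print(mach_copy[0])  # 0.0
```

If the extracted column will be modified and the original must stay intact, calling `.copy()` as shown is the safer choice.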

## Method 2: List Comprehension (For Lists or Custom Processing)

### What It Is
A Pythonic way to create a new list by extracting or transforming elements from an existing list.

### Syntax
```python
[element for element in iterable]                    # Basic extraction
[row[index] for row in data]                         # Extract column from list of lists
[transform(element) for element in data]             # Extract and transform
[element for element in data if condition]           # Extract with filtering
```

### Example: Extracting Columns with List Comprehension

In [3]:
# If your data is a list of lists
G1_data_list = [
    [0.00, 0.2629],
    [0.05, 0.2558],
    [0.10, 0.2487],
]

# Extract Mach numbers
mach_numbers = [row[0] for row in G1_data_list]
print("Mach numbers:", mach_numbers)

# Extract Kd values
kd_values = [row[1] for row in G1_data_list]
print("Kd values:", kd_values)

Mach numbers: [0.0, 0.05, 0.1]
Kd values: [0.2629, 0.2558, 0.2487]
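
When every column of a list of lists is needed, one possible alternative to writing a comprehension per column is to transpose the rows with `zip(*rows)`. This is a sketch using only built-in Python; note that it yields tuples rather than lists.

```python
G1_data_list = [
    [0.00, 0.2629],
    [0.05, 0.2558],
    [0.10, 0.2487],
]

# zip(*rows) pairs up the first items of every row, then the second items
mach_numbers, kd_values = zip(*G1_data_list)
print(mach_numbers)  # (0.0, 0.05, 0.1)
print(kd_values)     # (0.2629, 0.2558, 0.2487)

# Convert to a list if a mutable sequence is needed
mach_list = list(mach_numbers)
print(mach_list)     # [0.0, 0.05, 0.1]
```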


### Example: Extract with Filtering

In [4]:
# Only extract Mach numbers greater than 0.05
high_mach = [row[0] for row in G1_data_list if row[0] > 0.05]
print("High Mach:", high_mach)

# Only extract rows where Kd > 0.25
filtered_kd = [row[1] for row in G1_data_list if row[1] > 0.25]
print("Filtered Kd:", filtered_kd)

High Mach: [0.1]
Filtered Kd: [0.2629, 0.2558]
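
If the same data is held in a NumPy array, the filtering above can also be written with a boolean mask instead of a comprehension. A minimal sketch, assuming the array version of the table from Method 1:

```python
import numpy as np

G1_data = np.array([
    [0.00, 0.2629],
    [0.05, 0.2558],
    [0.10, 0.2487],
])

# One True/False entry per row
mask = G1_data[:, 0] > 0.05
print(mask)  # [False False  True]

# Select column 0 of the rows where the mask is True
high_mach = G1_data[mask, 0]
print(high_mach)  # [0.1]

# Mask built from column 1, applied to column 1
filtered_kd = G1_data[G1_data[:, 1] > 0.25, 1]
print(filtered_kd)  # [0.2629 0.2558]
```

Unlike basic slicing, boolean-mask indexing returns a new array rather than a view.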


### Example: Extract and Transform

In [5]:
# Extract Mach numbers and convert to percentage
mach_percentages = [row[0] * 100 for row in G1_data_list]
print("Mach percentages:", mach_percentages)

# Apply the Kd to Cd conversion
cd_values = [np.pi/4 * row[1] for row in G1_data_list]
print("Cd values:", cd_values)

Mach percentages: [0.0, 5.0, 10.0]
Cd values: [0.20648117715718917, 0.2009048501970673, 0.19532852323694538]
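
For array data, the same transformation can be applied to a whole column at once. The sketch below reuses the `np.pi/4` factor from the cell above and checks that the vectorized result agrees with the list comprehension.

```python
import numpy as np

G1_data = np.array([
    [0.00, 0.2629],
    [0.05, 0.2558],
    [0.10, 0.2487],
])

# Vectorized: multiply the entire Kd column in one expression
cd_values = np.pi / 4 * G1_data[:, 1]
print(cd_values)

# Same computation with a list comprehension, for comparison
cd_list = [np.pi / 4 * row[1] for row in G1_data.tolist()]
print(np.allclose(cd_values, cd_list))  # True
```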


### Why Use This?
- **Flexible**: Can apply transformations while extracting
- **Readable**: More explicit about what you're doing
- **Pythonic**: Idiomatic Python approach
- **Filtering**: Can add conditions easily
- **Perfect for**: When you need to transform data during extraction, or working with Python lists

## Method 3: Pandas DataFrame Extraction

### What It Is
If your data is in a pandas DataFrame, you can extract columns by name.

### Syntax
```python
df['column_name']           # Get single column
df[['col1', 'col2']]        # Get multiple columns
df.iloc[:, 0]               # Get by integer position (like NumPy)
df.loc[:, 'column_name']    # Get by name
```

### Example

In [6]:
import pandas as pd

# Create a sample DataFrame (or load from CSV)
G1_df = pd.DataFrame({
    'mach': [0.00, 0.05, 0.10],
    'Kd': [0.2629, 0.2558, 0.2487]
})

# Extract columns by name
mach_numbers = G1_df['mach'].values  # Convert to NumPy array
kd_values = G1_df['Kd'].values
print("Mach numbers:", mach_numbers)
print("Kd values:", kd_values)

# Or keep as pandas Series
mach_series = G1_df['mach']  # Returns pandas Series
print("\nMach series:", mach_series.values)

Mach numbers: [0.   0.05 0.1 ]
Kd values: [0.2629 0.2558 0.2487]

Mach series: [0.   0.05 0.1 ]


### Why Use This?
- **Labeled data**: Column names are more meaningful
- **Built-in methods**: Pandas has many built-in operations
- **Perfect for**: CSV data, labeled datasets, data analysis
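
The four access patterns listed under Syntax can be compared side by side. This is a small sketch on the same sample DataFrame; it also uses `to_numpy()`, which the pandas documentation recommends over `.values` for getting a NumPy array.

```python
import pandas as pd

G1_df = pd.DataFrame({
    'mach': [0.00, 0.05, 0.10],
    'Kd': [0.2629, 0.2558, 0.2487],
})

by_name = G1_df['mach']         # label-based, returns a Series
by_position = G1_df.iloc[:, 0]  # position-based, same column here
by_loc = G1_df.loc[:, 'mach']   # label-based via .loc

print(by_name.equals(by_position))  # True
print(by_name.equals(by_loc))       # True

# Selecting with a list of names returns a DataFrame, not a Series
both = G1_df[['mach', 'Kd']]
print(type(both).__name__, both.shape)  # DataFrame (3, 2)

mach_array = G1_df['mach'].to_numpy()
print(mach_array)
```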

## Comparison Table

| Method | Best For | Speed | Readability | Flexibility |
|--------|----------|-------|-------------|-------------|
| NumPy Slicing | 2D/3D numerical arrays | ⚡ Fast | ✓ Simple | Limited |
| List Comprehension | Python lists, transformations | 🐢 Slower | ✓ Clear | ✓✓ High |
| Pandas | CSV/labeled data | ⚡ Fast | ✓ Best | ✓✓ High |
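
The Speed column can be checked on your own machine with `timeit`. The sketch below compares NumPy slicing with a list comprehension on a synthetic table; the table size and repeat count are arbitrary assumptions, and the measured times will vary by machine and library version.

```python
import timeit
import numpy as np

# Synthetic two-column table, purely for illustration
rng = np.random.default_rng(0)
arr = rng.random((10_000, 2))
rows = arr.tolist()  # the same data as a plain list of lists

col_slice = arr[:, 0]
col_comp = [row[0] for row in rows]

# Both approaches extract the same values
print(np.array_equal(col_slice, np.array(col_comp)))  # True

t_slice = timeit.timeit(lambda: arr[:, 0], number=200)
t_comp = timeit.timeit(lambda: [row[0] for row in rows], number=200)
print(f"slicing: {t_slice:.5f} s, list comprehension: {t_comp:.5f} s")
```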

## Working with Different Dimensional Data

### 1D Data (Single List)

In [7]:
# NumPy array
data_1d = np.array([1, 2, 3, 4, 5])
first = data_1d[0]
subset = data_1d[0:3]  # [1, 2, 3]
print("First element:", first)
print("Subset:", subset)

# List comprehension
squared = [x**2 for x in data_1d]  # [1, 4, 9, 16, 25]
print("Squared:", squared)

First element: 1
Subset: [1 2 3]
Squared: [np.int64(1), np.int64(4), np.int64(9), np.int64(16), np.int64(25)]
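
The `np.int64(...)` wrappers in the output appear because iterating over a NumPy array yields NumPy scalars, which newer NumPy versions print with their type. A minimal sketch of two ways to avoid this: keep the computation vectorized, or convert to plain Python numbers with `tolist()`.

```python
import numpy as np

data_1d = np.array([1, 2, 3, 4, 5])

# Vectorized: the result stays a NumPy array
squared_array = data_1d ** 2
print(squared_array)  # [ 1  4  9 16 25]

# tolist() converts the elements to plain Python ints
squared_list = squared_array.tolist()
print(squared_list)                    # [1, 4, 9, 16, 25]
print(type(squared_list[0]).__name__)  # int
```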


### 2D Data (Rows × Columns) - Your Ballistics Case

In [8]:
# NumPy array
data_2d = np.array([
    [0.00, 0.2629],
    [0.05, 0.2558],
    [0.10, 0.2487],
])

column_0 = data_2d[:, 0]    # All rows, column 0
column_1 = data_2d[:, 1]    # All rows, column 1
row_0 = data_2d[0, :]       # Row 0, all columns
subset = data_2d[0:2, :]    # First 2 rows, all columns

print("Column 0:", column_0)
print("Column 1:", column_1)
print("Row 0:", row_0)

# List comprehension
col_0 = [row[0] for row in data_2d]  # Extract column 0
col_1 = [row[1] for row in data_2d]  # Extract column 1
print("\nList comp column 0:", col_0)
print("List comp column 1:", col_1)

Column 0: [0.   0.05 0.1 ]
Column 1: [0.2629 0.2558 0.2487]
Row 0: [0.     0.2629]

List comp column 0: [np.float64(0.0), np.float64(0.05), np.float64(0.1)]
List comp column 1: [np.float64(0.2629), np.float64(0.2558), np.float64(0.2487)]


### 3D Data (Depth × Rows × Columns)

In [9]:
# NumPy array: 2 datasets, 3 rows each, 2 columns
data_3d = np.array([
    [[0.00, 0.2629], [0.05, 0.2558], [0.10, 0.2487]],
    [[0.00, 0.1198], [0.05, 0.1197], [0.10, 0.1196]],
])

first_dataset = data_3d[0, :, :]    # First 2D array
second_col_first_dataset = data_3d[0, :, 1]  # Column 1 from first dataset

print("First dataset shape:", first_dataset.shape)
print("Second column of first dataset:", second_col_first_dataset)

# List comprehension for 3D
all_first_cols = [dataset[:, 0] for dataset in data_3d]
print("\nAll first columns:", len(all_first_cols), "datasets")
for i, col in enumerate(all_first_cols):
    print(f"  Dataset {i}:", col)

First dataset shape: (3, 2)
Second column of first dataset: [0.2629 0.2558 0.2487]

All first columns: 2 datasets
  Dataset 0: [0.   0.05 0.1 ]
  Dataset 1: [0.   0.05 0.1 ]
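
The per-dataset comprehension above can also be written as a single slice across all three axes, which returns one 2D array instead of a list of 1D arrays. A sketch on the same `data_3d`:

```python
import numpy as np

data_3d = np.array([
    [[0.00, 0.2629], [0.05, 0.2558], [0.10, 0.2487]],
    [[0.00, 0.1198], [0.05, 0.1197], [0.10, 0.1196]],
])

# Every dataset, every row, column 0
all_first_cols = data_3d[:, :, 0]
print(all_first_cols.shape)  # (2, 3)

# Ellipsis stands in for "all leading axes"
all_kd = data_3d[..., 1]
print(all_kd.shape)  # (2, 3)
print(all_kd[1])     # column 1 of the second dataset
```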


## Your Ballistics Code Breakdown

### Current implementation (NumPy slicing - BEST for this use case)

In [10]:
# Your current implementation
G1_df = pd.DataFrame({
    'mach': [0.00, 0.05, 0.10],
    'Kd': [0.2629, 0.2558, 0.2487]
})

G1_data = G1_df.values  # Convert DataFrame to NumPy array (2D)
print("G1_data shape:", G1_data.shape)
print("G1_data type:", type(G1_data))

# This extracts the entire first column (all Mach numbers)
mach_column = G1_data[:, 0]
print("\nMach column:", mach_column)

# This extracts the entire second column (all Kd values)
kd_column = G1_data[:, 1]
print("Kd column:", kd_column)

# Pass to interpolation function
from scipy.interpolate import interp1d
G1 = interp1d(G1_data[:, 0], G1_data[:, 1])
print("\n✓ Interpolation function created successfully")
print("✓ Optimal because:")
print("  - Fast execution (NumPy operations)")
print("  - Clean, one-line syntax")
print("  - Perfect for scipy functions")
print("  - Handles large datasets efficiently")

G1_data shape: (3, 2)
G1_data type: <class 'numpy.ndarray'>

Mach column: [0.   0.05 0.1 ]
Kd column: [0.2629 0.2558 0.2487]

✓ Interpolation function created successfully
✓ Optimal because:
  - Fast execution (NumPy operations)
  - Clean, one-line syntax
  - Perfect for scipy functions
  - Handles large datasets efficiently


### Alternative with List Comprehension (if needed)

In [11]:
# Using list comprehension instead
mach_values = [row[0] for row in G1_data]
kd_values = [row[1] for row in G1_data]

print("Mach values (list):", mach_values)
print("Kd values (list):", kd_values)

# Convert back to arrays for scipy
G1_alt = interp1d(np.array(mach_values), np.array(kd_values))
print("\n✗ Less efficient but more explicit")
print("  - Extra conversion steps")
print("  - Slower for large datasets")
print("  - But more readable for beginners")

Mach values (list): [np.float64(0.0), np.float64(0.05), np.float64(0.1)]
Kd values (list): [np.float64(0.2629), np.float64(0.2558), np.float64(0.2487)]

✗ Less efficient but more explicit
  - Extra conversion steps
  - Slower for large datasets
  - But more readable for beginners


## Quick Reference Cheat Sheet

In [12]:
print("=== SCENARIO 1: NumPy array (2D) - FASTEST ===")
data = np.array([[0.0, 0.26], [0.05, 0.25], [0.1, 0.25]])
col0 = data[:, 0]      # NumPy slicing
col1 = data[:, 1]
print("Column 0:", col0)
print("Column 1:", col1)

print("\n=== SCENARIO 2: Python list of lists - FLEXIBLE ===")
data = [[0.0, 0.26], [0.05, 0.25], [0.1, 0.25]]
col0 = [row[0] for row in data]  # List comprehension
col1 = [row[1] for row in data]
print("Column 0:", col0)
print("Column 1:", col1)

print("\n=== SCENARIO 3: Pandas DataFrame - LABELED ===")
data = pd.DataFrame({'mach': [0.0, 0.05, 0.1], 'kd': [0.26, 0.25, 0.25]})
col0 = data['mach'].values
col1 = data['kd'].values
print("Column 0:", col0)
print("Column 1:", col1)

print("\n=== SCENARIO 4: With filtering ===")
data = np.array([[0.0, 0.26], [0.5, 0.25], [0.1, 0.25]])
filtered = [row[0] for row in data if row[0] > 0.1]  # [0.5]
print("Filtered values (> 0.1):", filtered)

=== SCENARIO 1: NumPy array (2D) - FASTEST ===
Column 0: [0.   0.05 0.1 ]
Column 1: [0.26 0.25 0.25]

=== SCENARIO 2: Python list of lists - FLEXIBLE ===
Column 0: [0.0, 0.05, 0.1]
Column 1: [0.26, 0.25, 0.25]

=== SCENARIO 3: Pandas DataFrame - LABELED ===
Column 0: [0.   0.05 0.1 ]
Column 1: [0.26 0.25 0.25]

=== SCENARIO 4: With filtering ===
Filtered values (> 0.1): [np.float64(0.5)]


## Practice Problems

### Problem 1: Extract and Convert
Given: 2D array of velocity (m/s) and acceleration (m/s²)
Goal: Extract velocities and convert to km/h

In [13]:
# Your code here
data = np.array([[10, 2], [20, 3], [30, 4]])
# Expected result: [36, 72, 108]

# Solution:
velocities_kmh = [row[0] * 3.6 for row in data]
print("Velocities in km/h:", velocities_kmh)

Velocities in km/h: [np.float64(36.0), np.float64(72.0), np.float64(108.0)]


### Problem 2: Filter Then Extract
Given: 2D array of temperature (°C) and pressure (Pa)
Goal: Extract the pressures recorded at temperatures above 25°C

In [14]:
# Your code here
data = np.array([[20, 101325], [25, 102000], [30, 103000]])
# Expected result: [103000]

# Solution:
pressures_above_25 = [row[1] for row in data if row[0] > 25]
print("Pressures at temperatures > 25C:", pressures_above_25)

Pressures at temperatures > 25C: [np.int64(103000)]


### Problem 3: 3D Dataset
Given: Multiple experiment runs with (time, position, velocity)
Goal: Extract all positions from first experiment

In [15]:
# Your code here
data = np.array([
    [[0, 0, 0], [1, 5, 10], [2, 10, 15]],  # Experiment 1
    [[0, 0, 0], [1, 6, 12], [2, 12, 18]],  # Experiment 2
])
# Expected result: [0, 5, 10]

# Solution:
positions = data[0, :, 1]  # First experiment, all rows, column 1 (positions)
print("Positions from experiment 1:", positions)

Positions from experiment 1: [ 0  5 10]


## Key Takeaways

1. **NumPy Slicing** (`array[:, 0]`) is fastest for scientific data
2. **List Comprehension** (`[row[0] for row in data]`) is more flexible and Pythonic
3. **Pandas** is best when data has labeled columns
4. **Choose based on**: your data type, need for transformation, and performance requirements
5. **For ballistics/interpolation**: NumPy slicing is optimal ✓