# Array Attributes and Data Types

**Module 01 | Notebook 03**

---

## Objective
By the end of this notebook, you will understand:
- All essential array attributes
- NumPy data types (dtypes) in depth
- Type conversion and casting
- Memory layout and strides
- Structured arrays basics

In [64]:
import numpy as np
np.set_printoptions(precision=3)

---
## 1. Essential Array Attributes

In [65]:
# Create a sample 3D array
arr = np.arange(24).reshape(2, 3, 4)
print(f"Array:\n{arr}")

Array:
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]


In [66]:
# Shape - dimensions as tuple
print(f"shape: {arr.shape}")  # (2, 3, 4)

# Number of dimensions (axes)
print(f"ndim: {arr.ndim}")  # 3

# Total number of elements
print(f"size: {arr.size}")  # 24

# Data type
print(f"dtype: {arr.dtype}")  # default integer: int64 on most platforms (int32 on Windows with NumPy < 2)

# Bytes per element
print(f"itemsize: {arr.itemsize}")  # 8 or 4

# Total bytes consumed
print(f"nbytes: {arr.nbytes}")  # 192 or 96

shape: (2, 3, 4)
ndim: 3
size: 24
dtype: int64
itemsize: 8
nbytes: 192


In [67]:
# Strides - bytes to step in each dimension
print(f"strides: {arr.strides}")
# For shape (2, 3, 4) with 8-byte elements:
# strides = (96, 32, 8) meaning:
# - Move 96 bytes to next "block" (axis 0)
# - Move 32 bytes to next "row" (axis 1)
# - Move 8 bytes to next element (axis 2)

strides: (96, 32, 8)
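
The stride arithmetic described in the comments above can be reproduced by hand. The sketch below assumes a C-contiguous array with a fixed `int64` dtype; `c_order_strides` is a hypothetical helper written for this illustration, not a NumPy function.

```python
import numpy as np

# Fixed dtype so the numbers do not depend on the platform's default integer.
arr = np.arange(24, dtype=np.int64).reshape(2, 3, 4)

def c_order_strides(shape, itemsize):
    """Strides (in bytes) of a C-contiguous array with this shape."""
    strides = []
    step = itemsize
    # The last axis moves one element at a time; each earlier axis
    # must skip over everything contained in the axes after it.
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))

expected_strides = c_order_strides(arr.shape, arr.itemsize)

# Byte offset of element [1, 2, 3] = sum(index * stride) over all axes.
index = (1, 2, 3)
byte_offset = sum(i * s for i, s in zip(index, arr.strides))
flat_position = byte_offset // arr.itemsize

print(expected_strides)            # (96, 32, 8)
print(byte_offset, flat_position)  # 184 23
```

Element `[1, 2, 3]` sits 184 bytes into the buffer, which is flat position 23, matching the value stored there by `np.arange(24)`.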


In [68]:
# Buffer object (a memoryview) exposing the array's raw data
print(f"data: {arr.data}")

# Flags - memory layout information
print(f"flags:\n{arr.flags}")

data: <memory at 0x0000023F62C35120>
flags:
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False



### Understanding Flags

In [69]:
arr = np.arange(12).reshape(3, 4)

print("C_CONTIGUOUS:", arr.flags['C_CONTIGUOUS'])  # Row-major (C-style)
print("F_CONTIGUOUS:", arr.flags['F_CONTIGUOUS'])  # Column-major (Fortran-style)
print("OWNDATA:", arr.flags['OWNDATA'])  # Owns the memory (False here: reshape() returned a view of the arange buffer)
print("WRITEABLE:", arr.flags['WRITEABLE'])  # Can modify
print("ALIGNED:", arr.flags['ALIGNED'])  # Properly aligned in memory

C_CONTIGUOUS: True
F_CONTIGUOUS: False
OWNDATA: False
WRITEABLE: True
ALIGNED: True


In [70]:
# View does not own data
view = arr[::2]
print(f"View OWNDATA: {view.flags['OWNDATA']}")

# Copy owns data
copy = arr.copy()
print(f"Copy OWNDATA: {copy.flags['OWNDATA']}")

View OWNDATA: False
Copy OWNDATA: True
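
The `OWNDATA: False` results in the flag outputs above come from `reshape()`, which returns a view of the buffer allocated by `np.arange`. A minimal sketch of how ownership and sharing can be inspected, using illustrative variable names:

```python
import numpy as np

base_arr = np.arange(12)            # freshly allocated: owns its buffer
reshaped = base_arr.reshape(3, 4)   # view onto base_arr's buffer
sliced = reshaped[::2]              # view of a view
copied = reshaped.copy()            # independent buffer

print(base_arr.flags['OWNDATA'])            # True
print(reshaped.flags['OWNDATA'])            # False
print(reshaped.base is base_arr)            # True
print(np.shares_memory(sliced, base_arr))   # True
print(np.shares_memory(copied, base_arr))   # False

# Writing through a view changes the original array.
sliced[0, 0] = 99
print(base_arr[0])  # 99
```

`np.shares_memory` is the more general check; `.base` only identifies the array a view was derived from.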


---
## 2. NumPy Data Types (dtypes)

### Numeric Types Overview

| Category | Types | Description |
|----------|-------|-------------|
| Boolean | `bool_` | True/False (1 byte) |
| Integer | `int8`, `int16`, `int32`, `int64` | Signed integers |
| Unsigned | `uint8`, `uint16`, `uint32`, `uint64` | Unsigned integers |
| Float | `float16`, `float32`, `float64` | Floating point |
| Complex | `complex64`, `complex128` | Complex numbers |
| String | `str_`, `bytes_` | Fixed-length strings |

In [71]:
# Integer types and their ranges
for dtype in [np.int8, np.int16, np.int32, np.int64]:
    info = np.iinfo(dtype)
    print(f"{dtype.__name__:8} : {info.min:25} to {info.max}")

int8     :                      -128 to 127
int16    :                    -32768 to 32767
int32    :               -2147483648 to 2147483647
int64    :      -9223372036854775808 to 9223372036854775807


In [72]:
# Unsigned integer types
for dtype in [np.uint8, np.uint16, np.uint32, np.uint64]:
    info = np.iinfo(dtype)
    print(f"{dtype.__name__:8} : {info.min} to {info.max}")

uint8    : 0 to 255
uint16   : 0 to 65535
uint32   : 0 to 4294967295
uint64   : 0 to 18446744073709551615


In [73]:
# Float types
for dtype in [np.float16, np.float32, np.float64]:
    info = np.finfo(dtype)
    print(f"{dtype.__name__:10} : precision={info.precision}, range=[{info.min:.2e}, {info.max:.2e}]")

float16    : precision=3, range=[-6.55e+04, 6.55e+04]
float32    : precision=6, range=[-3.40e+38, 3.40e+38]
float64    : precision=15, range=[-1.80e+308, 1.80e+308]
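
The `precision` values above have practical consequences. The following sketch, assuming IEEE 754 round-to-nearest behavior, shows two places where `float32` silently drops information that `float64` keeps:

```python
import numpy as np

eps32 = np.finfo(np.float32).eps   # gap between 1.0 and the next float32
eps64 = np.finfo(np.float64).eps   # same gap for float64

# Adding half an epsilon to 1.0 rounds back to 1.0 in float32.
lost32 = np.float32(1.0) + np.float32(eps32 / 2) == np.float32(1.0)

# float32 has a 24-bit significand, so 2**24 + 1 is not representable.
big32 = np.float32(2**24) + np.float32(1)
big64 = np.float64(2**24) + np.float64(1)

print(eps32, eps64)
print(lost32)        # True
print(big32, big64)  # 16777216.0 16777217.0
```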


### Specifying dtypes

In [74]:
# Method 1: Using numpy type objects
arr1 = np.array([1, 2, 3], dtype=np.float32)
print(f"Using np.float32: {arr1.dtype}")

# Method 2: Using string specifiers
arr2 = np.array([1, 2, 3], dtype='float32')
print(f"Using 'float32': {arr2.dtype}")

# Method 3: Using single-character codes
arr3 = np.array([1, 2, 3], dtype='f')  # 'f' = float32
print(f"Using 'f': {arr3.dtype}")

Using np.float32: float32
Using 'float32': float32
Using 'f': float32


In [75]:
# Common single-character codes:
codes = {
    'b': 'int8',
    'B': 'uint8',
    'i': 'int32',
    'l': 'C long: int32 on Windows, int64 on most Linux/macOS builds',
    'f': 'float32',
    'd': 'float64',
    'F': 'complex64',  # note: 'c' is a 1-byte string (S1), not a complex type
    '?': 'bool'
}

for code, name in codes.items():
    arr = np.array([1], dtype=code)
    print(f"'{code}' -> {arr.dtype}")

'b' -> int8
'B' -> uint8
'i' -> int32
'l' -> int32
'f' -> float32
'd' -> float64
'F' -> complex64
'?' -> bool
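
Single-character codes map to C types, so some of them change meaning between platforms. A short sketch for inspecting what a code resolves to on the current machine, using only `np.dtype` attributes:

```python
import numpy as np

for code in ['i', 'l', 'q', 'f', 'd', 'F', 'D', 'c']:
    dt = np.dtype(code)
    print(f"{code!r:4} -> name={dt.name:<10} kind={dt.kind} itemsize={dt.itemsize}")

# 'l' is the platform's C long, so its width is not fixed.
long_bytes = np.dtype('l').itemsize        # 4 or 8
# Sized names are the same everywhere.
fixed_bytes = np.dtype('int64').itemsize   # always 8
print(long_bytes, fixed_bytes)
```

For portable code, sized names such as `'int64'` or `np.complex64` are usually the safer choice.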


---
## 3. Type Conversion (Casting)

In [76]:
# Using astype() - creates a copy with new dtype
int_arr = np.array([1, 2, 3, 4, 5])
print(f"Original: {int_arr}, dtype: {int_arr.dtype}")

float_arr = int_arr.astype(np.float64)
print(f"Float: {float_arr}, dtype: {float_arr.dtype}")

Original: [1 2 3 4 5], dtype: int64
Float: [1. 2. 3. 4. 5.], dtype: float64


In [77]:
# Float to int - truncates toward zero (no rounding)
float_arr = np.array([1.7, 2.3, 3.9, 4.1])
int_arr = float_arr.astype(np.int32)
print(f"Float: {float_arr}")
print(f"Int (truncated): {int_arr}")  # [1, 2, 3, 4]

Float: [1.7 2.3 3.9 4.1]
Int (truncated): [1 2 3 4]
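
Because `astype` truncates toward zero, negative values and `.5` cases behave differently from rounding. A small comparison of the common alternatives, applied before the cast:

```python
import numpy as np

values = np.array([-1.7, -0.5, 0.5, 1.5, 2.5, 3.9])

truncated = values.astype(np.int64)            # toward zero
floored = np.floor(values).astype(np.int64)    # toward -infinity
ceiled = np.ceil(values).astype(np.int64)      # toward +infinity
rounded = np.rint(values).astype(np.int64)     # nearest, ties go to the even integer

print(truncated)  # [-1  0  0  1  2  3]
print(floored)    # [-2 -1  0  1  2  3]
print(ceiled)     # [-1  0  1  2  3  4]
print(rounded)    # [-2  0  0  2  2  4]
```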


In [78]:
# String to numeric
str_arr = np.array(['1.5', '2.7', '3.9'])
num_arr = str_arr.astype(np.float64)
print(f"String: {str_arr}")
print(f"Numeric: {num_arr}")

String: ['1.5' '2.7' '3.9']
Numeric: [1.5 2.7 3.9]


In [79]:
# Numeric to string
num_arr = np.array([1, 2, 3])
str_arr = num_arr.astype(str)
print(f"Numeric: {num_arr}")
print(f"String: {str_arr}, dtype: {str_arr.dtype}")

Numeric: [1 2 3]
String: ['1' '2' '3'], dtype: <U21


### Type Promotion (Upcasting)

In [80]:
# NumPy automatically upcasts to a common type that can hold both operands
int_arr = np.array([1, 2, 3], dtype=np.int32)
float_arr = np.array([0.5, 0.5, 0.5], dtype=np.float64)

result = int_arr + float_arr
print(f"int32 + float64 = {result.dtype}")  # float64

int32 + float64 = float64


In [81]:
# Find common type
common_type = np.result_type(np.int32, np.float64)
print(f"Common type of int32 and float64: {common_type}")

# Promote types
promoted = np.promote_types(np.int16, np.float32)
print(f"Promoted type: {promoted}")

Common type of int32 and float64: float64
Promoted type: float32
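
A few more promotion pairs help show the pattern. This sketch uses only `np.promote_types`; the last part illustrates that promotion to `float64` does not always preserve large integers exactly.

```python
import numpy as np

pairs = [
    (np.bool_, np.int8),         # int8
    (np.int8, np.uint8),         # int16: needs a signed type covering 0..255
    (np.int32, np.float32),      # float64: float32 cannot hold every int32
    (np.int64, np.uint64),       # float64: no integer type covers both ranges
    (np.float32, np.complex64),  # complex64
]

promoted_names = []
for a, b in pairs:
    result = np.promote_types(a, b)
    promoted_names.append(result.name)
    print(f"{np.dtype(a).name:8} + {np.dtype(b).name:10} -> {result}")

# float64 has a 53-bit significand, so 2**53 + 1 does not survive the cast.
big = np.array([2**53 + 1], dtype=np.int64)
as_float = big.astype(np.float64)
print(int(as_float[0]) == 2**53 + 1)  # False
```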


### Overflow Behavior

In [82]:
# Integer overflow wraps around (no error by default!)
arr = np.array([127], dtype=np.int8)
print(f"int8 max: {arr}")
arr_overflow = arr + 1
print(f"int8 max + 1: {arr_overflow}")  # -128 (wraps around!)

int8 max: [127]
int8 max + 1: [-128]
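
Two common ways to avoid the silent wraparound are to widen the dtype before the operation or to saturate the result at the dtype's limits. A minimal sketch of both, with illustrative values:

```python
import numpy as np

a = np.array([100, 120, 127], dtype=np.int8)
b = np.array([50, 10, 1], dtype=np.int8)

wrapped = a + b                       # stays int8 and wraps around
widened = a.astype(np.int16) + b      # int16 + int8 -> int16, no overflow

# Saturate: clamp to the int8 range, then cast back.
info = np.iinfo(np.int8)
saturated = np.clip(widened, info.min, info.max).astype(np.int8)

print(wrapped)    # [-106 -126 -128]
print(widened)    # [150 130 128]
print(saturated)  # [127 127 127]
```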


In [83]:
# Method 1: Manual Range Check (Safest)
val = 300
target_dtype = np.uint8
info = np.iinfo(target_dtype)

# Standard range check
is_safe = info.min <= val <= info.max

print(f"Can cast {val} to {target_dtype.__name__}? {is_safe}")
print(f"uint8 max: {info.max}")

Can cast 300 to uint8? False
uint8 max: 255


In [84]:
# Method 2: np.can_cast() compares dtypes, not values
# In NumPy 2.x the scalar's value is ignored; only its dtype (int64) is considered
val_np = np.int64(300)
print(f"Can cast? {np.can_cast(val_np, np.uint8)}")  # False: int64 -> uint8 is not a safe cast


Can cast? False
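
Since `np.can_cast` answers a dtype-to-dtype question, a value-level check needs the range comparison from Method 1. The sketch below shows both; `fits_in` is a hypothetical helper written for this notebook, not part of NumPy.

```python
import numpy as np

# dtype-to-dtype checks under different casting rules
safe_widen = np.can_cast(np.int32, np.int64)                           # True
safe_narrow = np.can_cast(np.int64, np.uint8)                          # False
unsafe_narrow = np.can_cast(np.int64, np.uint8, casting='unsafe')      # True
float_to_int = np.can_cast(np.float64, np.int32, casting='same_kind')  # False

def fits_in(values, dtype):
    """Hypothetical helper: True if every value lies within dtype's integer range."""
    info = np.iinfo(dtype)
    arr = np.asarray(values)
    return info.min <= int(arr.min()) and int(arr.max()) <= info.max

print(safe_widen, safe_narrow, unsafe_narrow, float_to_int)
print(fits_in([100, 200], np.uint8))   # True
print(fits_in([100, 300], np.uint8))   # False
print(fits_in([-1, 5], np.uint8))      # False
```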


---
## 4. Memory Layout: C vs Fortran Order

In [85]:
# C-order (row-major) - default
c_arr = np.array([[1, 2, 3], [4, 5, 6]], order='C')
print(f"C-order array:\n{c_arr}")
print(f"Strides: {c_arr.strides}")  # (24, 8) - rows are contiguous
print(f"Is C-contiguous: {c_arr.flags['C_CONTIGUOUS']}")
print(f"Is F-contiguous: {c_arr.flags['F_CONTIGUOUS']}")

C-order array:
[[1 2 3]
 [4 5 6]]
Strides: (24, 8)
Is C-contiguous: True
Is F-contiguous: False


In [86]:
# Fortran-order (column-major)
f_arr = np.array([[1, 2, 3], [4, 5, 6]], order='F')
print(f"F-order array:\n{f_arr}")
print(f"Strides: {f_arr.strides}")  # (8, 16) - columns are contiguous
print(f"Is C-contiguous: {f_arr.flags['C_CONTIGUOUS']}")
print(f"Is F-contiguous: {f_arr.flags['F_CONTIGUOUS']}")

F-order array:
[[1 2 3]
 [4 5 6]]
Strides: (8, 16)
Is C-contiguous: False
Is F-contiguous: True


In [87]:
# flatten() reads in C (row-major) order by default; pass 'F' to read column by column
c_arr = np.array([[1, 2, 3], [4, 5, 6]], order='C')

print(f"Flatten C-order: {c_arr.flatten('C')}")  # [1, 2, 3, 4, 5, 6]
print(f"Flatten F-order: {c_arr.flatten('F')}")  # [1, 4, 2, 5, 3, 6]

Flatten C-order: [1 2 3 4 5 6]
Flatten F-order: [1 4 2 5 3 6]
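
Memory layout also determines when NumPy can return a view instead of a copy. A short sketch, with a fixed `int64` dtype so the strides are the same on every platform:

```python
import numpy as np

c_arr = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)  # C-contiguous
t = c_arr.T   # transpose: same buffer, strides swapped

print(t.strides)                    # (8, 24)
print(t.flags['F_CONTIGUOUS'])      # True
print(np.shares_memory(t, c_arr))   # True

flat_copy = c_arr.flatten()   # always a copy
flat_view = c_arr.ravel()     # a view when the layout allows it
print(np.shares_memory(flat_copy, c_arr))  # False
print(np.shares_memory(flat_view, c_arr))  # True

# Force a C-contiguous copy of the transposed data.
t_c = np.ascontiguousarray(t)
print(t_c.flags['C_CONTIGUOUS'], t_c.strides)  # True (16, 8)
```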


In [88]:
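# Performance impact - memory layout can affect reduction speed.
# Which axis wins depends on the NumPy build and hardware, so measure rather than assume:
# in the run recorded below, the column-wise sum (axis=0) was the faster one.
import time

large = np.random.rand(1000, 1000)

# Row-wise sum (reduces along the contiguous axis of a C-order array)
start = time.perf_counter()
for _ in range(100):
    row_sum = large.sum(axis=1)
row_time = time.perf_counter() - start

# Column-wise sum (reduces across rows of a C-order array)
start = time.perf_counter()
for _ in range(100):
    col_sum = large.sum(axis=0)
col_time = time.perf_counter() - start

print(f"Row sum time: {row_time:.4f}s")
print(f"Col sum time: {col_time:.4f}s")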

Row sum time: 0.1169s
Col sum time: 0.0524s
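
A single timing loop is sensitive to caching and background load. One possible way to get steadier numbers is `timeit.repeat`, taking the best of several runs and comparing both layouts. This is a sketch; `best_time` is a hypothetical helper, and no particular ordering of the results is assumed.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
large_c = rng.random((1000, 1000))      # C-order (row-major)
large_f = np.asfortranarray(large_c)    # same values, F-order (column-major)

def best_time(func, repeat=5, number=20):
    """Hypothetical helper: best per-call time in seconds over several repeats."""
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number

results = {
    "C-order, axis=1": best_time(lambda: large_c.sum(axis=1)),
    "C-order, axis=0": best_time(lambda: large_c.sum(axis=0)),
    "F-order, axis=1": best_time(lambda: large_f.sum(axis=1)),
    "F-order, axis=0": best_time(lambda: large_f.sum(axis=0)),
}

for label, seconds in results.items():
    print(f"{label}: {seconds * 1e3:.3f} ms per call")
```

Taking the minimum filters out runs that were slowed by unrelated system activity.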


---
## 5. Structured Arrays

In [89]:
# Define a structured dtype (like a database record)
dt = np.dtype([
    ('name', 'U10'),      # Unicode string, max 10 chars
    ('age', 'i4'),        # 32-bit integer
    ('weight', 'f8')      # 64-bit float
])

# Create structured array
people = np.array([
    ('Alice', 25, 55.5),
    ('Bob', 30, 75.2),
    ('Charlie', 35, 82.1)
], dtype=dt)

print(f"Structured array:\n{people}")
print(f"Dtype: {people.dtype}")

Structured array:
[('Alice', 25, 55.5) ('Bob', 30, 75.2) ('Charlie', 35, 82.1)]
Dtype: [('name', '<U10'), ('age', '<i4'), ('weight', '<f8')]
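
Each record in a structured array is a fixed-size block of bytes, and the dtype records where every field starts. A small sketch for inspecting that layout, assuming the default packed (unaligned) dtype defined above:

```python
import numpy as np

dt = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])

print(dt.names)      # ('name', 'age', 'weight')
print(dt.itemsize)   # 52 = 10 chars * 4 bytes + 4 + 8

offsets = {}
for field in dt.names:
    # dt.fields maps each name to (dtype, byte offset within the record)
    field_dtype, offset = dt.fields[field][:2]
    offsets[field] = offset
    print(f"{field:7} dtype={field_dtype} offset={offset}")

records = np.zeros(3, dtype=dt)
print(records.nbytes)  # 156 = 3 records * 52 bytes
```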


In [90]:
# Access by field name
print(f"Names: {people['name']}")
print(f"Ages: {people['age']}")
print(f"Weights: {people['weight']}")

Names: ['Alice' 'Bob' 'Charlie']
Ages: [25 30 35]
Weights: [55.5 75.2 82.1]


In [91]:
# Filter structured arrays with a boolean mask on one field
over_25 = people[people['age'] > 25]
print(f"People over 25:\n{over_25}")

People over 25:
[('Bob', 30, 75.2) ('Charlie', 35, 82.1)]


In [92]:
# Modify fields
people['age'] += 1
print(f"After birthday:\n{people}")

After birthday:
[('Alice', 26, 55.5) ('Bob', 31, 75.2) ('Charlie', 36, 82.1)]
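
Fields can also drive sorting and aggregation. The sketch below rebuilds the same records so it runs on its own, then sorts by one field and reduces over another:

```python
import numpy as np

dt = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])
people = np.array([
    ('Bob', 30, 75.2),
    ('Charlie', 35, 82.1),
    ('Alice', 25, 55.5),
], dtype=dt)

# Sort records by a named field.
by_name = np.sort(people, order='name')

# Sort descending by weight using argsort on that field.
by_weight_desc = people[np.argsort(people['weight'])[::-1]]

# Aggregate a single field like any ordinary array.
mean_age = people['age'].mean()

print(by_name['name'])          # ['Alice' 'Bob' 'Charlie']
print(by_weight_desc['name'])   # ['Charlie' 'Bob' 'Alice']
print(mean_age)                 # 30.0
```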


---
## 6. Special Values

In [93]:
# NaN (Not a Number)
arr = np.array([1, 2, np.nan, 4])
print(f"Array with NaN: {arr}")
print(f"Is NaN: {np.isnan(arr)}")

# NaN propagates
print(f"Sum with NaN: {arr.sum()}")  # nan
print(f"nansum (ignores NaN): {np.nansum(arr)}")  # 7.0

Array with NaN: [ 1.  2. nan  4.]
Is NaN: [False False  True False]
Sum with NaN: nan
nansum (ignores NaN): 7.0
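
NaN never compares equal to anything, including itself, so it has to be detected with `np.isnan` rather than `==`. A brief sketch of common ways to handle it:

```python
import numpy as np

arr = np.array([1.0, 2.0, np.nan, 4.0])

nan_equals_itself = np.nan == np.nan     # False
found_by_eq = (arr == np.nan).any()      # False: == never matches NaN

clean = arr[~np.isnan(arr)]                   # drop NaN entries
filled = np.where(np.isnan(arr), 0.0, arr)    # replace NaN with 0.0

print(nan_equals_itself, found_by_eq)
print(clean)            # [1. 2. 4.]
print(filled)           # [1. 2. 0. 4.]
print(np.nanmax(arr))   # 4.0
print(np.nanmean(arr))  # mean of 1, 2 and 4
```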


In [94]:
# Infinity
arr = np.array([1, np.inf, -np.inf, 0])
print(f"Array with inf: {arr}")
print(f"Is infinite: {np.isinf(arr)}")
print(f"Is finite: {np.isfinite(arr)}")

Array with inf: [  1.  inf -inf   0.]
Is infinite: [False  True  True False]
Is finite: [ True False False  True]


In [95]:
# Operations producing inf/nan
print(f"1/0: {np.array([1.0]) / 0}")  # inf
print(f"0/0: {np.array([0.0]) / 0}")  # nan
print(f"inf - inf: {np.inf - np.inf}")  # nan

1/0: [inf]
0/0: [nan]
inf - inf: nan
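
Array divisions like the ones above emit a `RuntimeWarning` by default. `np.errstate` is a context manager that can silence such warnings or turn them into exceptions for a block of code. A minimal sketch:

```python
import numpy as np

# Silence the warnings where inf/nan results are expected.
with np.errstate(divide='ignore', invalid='ignore'):
    quiet = np.array([1.0, 0.0]) / 0    # [inf nan]

# Turn division by zero into an exception instead.
raised = False
try:
    with np.errstate(divide='raise'):
        np.array([1.0]) / 0
except FloatingPointError as exc:
    raised = True
    print(f"Caught: {exc}")

print(quiet)
print(raised)  # True
```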



---
## Key Points Summary

**Essential Attributes:**
- `shape`: Tuple of dimensions
- `ndim`: Number of axes
- `size`: Total elements
- `dtype`: Data type
- `itemsize`: Bytes per element
- `strides`: Bytes to step in each dimension

**Data Types:**
- Integer: int8, int16, int32, int64 (signed)
- Unsigned: uint8, uint16, uint32, uint64
- Float: float16, float32, float64
- Complex: complex64, complex128

**Casting:**
- Use `astype()` for explicit conversion
- NumPy auto-promotes to prevent data loss
- Integer overflow wraps silently!

**Memory:**
- C-order (row-major) is default
- F-order (column-major) for Fortran compatibility
- Contiguous access is often faster, but measure: in the timing run above, the column-wise sum finished first

---
## Interview Tips

**Q1: What is the difference between float32 and float64?**
> - float32: 4 bytes, about 6-7 significant decimal digits, faster, less memory
> - float64: 8 bytes, about 15 significant decimal digits, more accurate
> - Use float32 for ML/GPU, float64 for scientific computing

**Q2: How do you check if two arrays share memory?**
> Use `np.shares_memory(a, b)`. Checking `a.base is b` also works, but only when `a` is a view derived from `b`.

**Q3: What happens during integer overflow in NumPy?**
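> NumPy array arithmetic wraps around silently (no exception). int8 max (127) + 1 = -128.
> To guard against it, compare values with `np.iinfo(dtype).min`/`.max` or use a larger dtype; `np.can_cast()` only tells you whether one dtype can be safely cast to another.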

**Q4: Why use structured arrays instead of pandas?**
> - Lower memory overhead
> - Faster for simple operations
> - Better for binary file I/O
> - No pandas dependency
> - Use pandas when you need advanced data manipulation

---
## Practice Exercises

### Exercise 1: Find memory usage of an array
Create a 1000x1000 float64 array and calculate its memory in MB.

In [96]:
# Your code here


In [97]:
# Solution
arr = np.zeros((1000, 1000), dtype=np.float64)
memory_mb = arr.nbytes / (1024 * 1024)
print(f"Memory: {memory_mb:.2f} MB")
# By hand: 1000 * 1000 * 8 bytes = 8,000,000 bytes = 8 MB (decimal) = 7.63 MiB (dividing by 1024 * 1024)

Memory: 7.63 MB


### Exercise 2: Safe casting check
Write code to check if values [100, 200, 300] can safely be stored in int8, uint8, and int16.

In [98]:
# Your code here


In [99]:
# Solution
values = [100, 200, 300]
dtypes = [np.int8, np.uint8, np.int16]

for v in values:
    print(f"Value {v}:")
    for dt in dtypes:
        # Get range info for the current integer type
        info = np.iinfo(dt)
        # Check if value is within the min and max allowed for that type
        can_fit = info.min <= v <= info.max
        print(f"  {dt.__name__ + ':':<7} {can_fit}")

Value 100:
  int8:   True
  uint8:  True
  int16:  True
Value 200:
  int8:   False
  uint8:  True
  int16:  True
Value 300:
  int8:   False
  uint8:  False
  int16:  True


### Exercise 3: Create a structured array for student records
Create a structured array with fields: id (int), name (string 20 chars), gpa (float).

In [100]:
# Your code here


In [101]:
# Solution
student_dt = np.dtype([
    ('id', 'i4'),
    ('name', 'U20'),
    ('gpa', 'f8')
])

students = np.array([
    (1, 'Alice Smith', 3.8),
    (2, 'Bob Johnson', 3.5),
    (3, 'Carol Williams', 3.9)
], dtype=student_dt)

print(students)
print(f"\nHigh achievers (GPA > 3.7):\n{students[students['gpa'] > 3.7]}")

[(1, 'Alice Smith', 3.8) (2, 'Bob Johnson', 3.5)
 (3, 'Carol Williams', 3.9)]

High achievers (GPA > 3.7):
[(1, 'Alice Smith', 3.8) (3, 'Carol Williams', 3.9)]


---
## Next Notebook
**04_indexing_and_slicing.ipynb** - Master array indexing, slicing, and element selection.