# NumPy File Handling

In [1]:
import numpy as np
import sys

NumPy mainly supports **two types of file operations:**

1. **Binary files** (faster, preferred for intermediate storage)
2. **Text files** (like `.csv` or `.txt`, easier to read/edit)


## 1. Binary File Handling (Efficient Storage)

### save()

* Saves **one NumPy array** to a binary `.npy` file.
* **Syntax:**

```python
np.save('filename.npy', array)
```

In [2]:
arr = np.array([1, 2, 3, 4, 5])

np.save('array.npy', arr)
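If the file name has no extension, `np.save()` appends `.npy`, so the file has to be loaded back under that full name. Here is a minimal sketch. It uses a temporary directory and a hypothetical file name `scores`, both chosen only to keep the example clean.

```python
import os
import tempfile

import numpy as np

arr = np.array([1, 2, 3, 4, 5])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "scores")      # hypothetical name, no extension
    np.save(path, arr)                       # NumPy actually writes "scores.npy"
    saved_files = sorted(os.listdir(tmp))
    restored = np.load(path + ".npy")        # load needs the full file name

print(saved_files)   # ['scores.npy']
print(restored)      # [1 2 3 4 5]
```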

### load()

* Loads arrays from `.npy` files (a single array) and `.npz` files (an archive of named arrays).
* **Syntax:**

```python
array = np.load('filename.npy')
```

In [3]:
# Loads arrays from .npy files.
a = np.load("array.npy")
print(a)

[1 2 3 4 5]
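The `.npy` header records the array's shape and dtype, so `np.load()` returns exactly the array that was saved. The text functions later in this notebook do not work this way. A small sketch follows; the array and the file name `matrix.npy` are illustrative.

```python
import os
import tempfile

import numpy as np

# A 2-D float32 array: both shape and dtype are stored in the .npy header
matrix = np.arange(6, dtype=np.float32).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "matrix.npy")
    np.save(path, matrix)
    loaded = np.load(path)

print(loaded.shape, loaded.dtype)   # (2, 3) float32
```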


### savez()

* Saves **multiple arrays** in one **uncompressed** `.npz` archive.
* **Syntax:**

```python
np.savez('filename.npz', arr1=array1, arr2=array2, ...)
```

In [4]:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
np.savez('my_arrays.npz', first=a, second=b)

In [5]:
data = np.load('my_arrays.npz')
print(data)
print(data['first'])
print(data['second'])

NpzFile 'my_arrays.npz' with keys: first, second
[1 2 3]
[4 5 6]
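When arrays are passed to `np.savez()` positionally instead of as keywords, NumPy names them `arr_0`, `arr_1`, and so on. The object that `np.load()` returns for an `.npz` file keeps the archive open, and a `with` block is a tidy way to close it. Below is a minimal sketch that uses a hypothetical file name.

```python
import os
import tempfile

import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "positional.npz")   # hypothetical file name
    np.savez(path, a, b)                          # no keywords: stored as arr_0, arr_1
    # The NpzFile is closed automatically when the with-block ends
    with np.load(path) as data:
        keys = data.files
        second = data["arr_1"]

print(keys)     # ['arr_0', 'arr_1']
print(second)   # [4 5 6]
```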


### savez_compressed()

* Same as `savez()`, but compresses the arrays inside the `.npz` archive to save disk space, at the cost of some extra time when saving and loading.

```python
np.savez_compressed('compressed_arrays.npz', arr1=a, arr2=b)
```


In [6]:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
np.savez_compressed('my_arrays_compressed.npz', first=a, second=b)

In [7]:
data = np.load('my_arrays_compressed.npz')
print(data)
print(data['first'])
print(data['second'])

NpzFile 'my_arrays_compressed.npz' with keys: first, second
[1 2 3]
[4 5 6]
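How much `savez_compressed()` saves depends on the data. Repetitive arrays shrink a lot, while random floats barely shrink at all. The quick sketch below compares both file sizes for an all-zeros array; the array size is an arbitrary choice.

```python
import os
import tempfile

import numpy as np

# Highly repetitive data compresses very well
zeros = np.zeros(100_000)

with tempfile.TemporaryDirectory() as tmp:
    plain_path = os.path.join(tmp, "plain.npz")
    packed_path = os.path.join(tmp, "packed.npz")
    np.savez(plain_path, zeros=zeros)              # stored as-is
    np.savez_compressed(packed_path, zeros=zeros)  # deflate-compressed
    plain_size = os.path.getsize(plain_path)
    packed_size = os.path.getsize(packed_path)

print(plain_size, packed_size)
```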


## 2. Text File Handling (Human Readable)

### savetxt()

* Saves an array to a **text file** (CSV, TXT, etc.).
* You can specify the **delimiter** (like comma, space, etc.).
* **Syntax:**

```python
np.savetxt('filename.txt', array, delimiter=',', fmt='%d')
```


In [8]:
a = np.array([[1, 2, 3], [4, 5, 6]])
np.savetxt('array.txt', a)
# saved file as follows
# 1.000000000000000000e+00 2.000000000000000000e+00 3.000000000000000000e+00
# 4.000000000000000000e+00 5.000000000000000000e+00 6.000000000000000000e+00

In [9]:
np.savetxt('array.txt', a, fmt='%d')
# saved file as follows
# 1 2 3
# 4 5 6

In [10]:
np.savetxt('array.txt', a, delimiter=',', fmt='%d')
# saved file as follows
# 1,2,3
# 4,5,6

In [11]:
np.savetxt('array.txt', a, delimiter=',', fmt='%.2f')
# saved file as follows
# 1.00,2.00,3.00
# 4.00,5.00,6.00

* `fmt='%d'` → write each value as an integer
* `fmt='%.2f'` → write floats with two decimal places (the default `'%.18e'` produces the scientific notation seen in `In [8]`)
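`fmt` can also be a list with one format per column. `header` writes a first line, and `comments=''` stops that line from getting a `# ` prefix. The sketch below writes to an in-memory `io.StringIO` buffer instead of a file, which is only an illustrative choice. The data is made up.

```python
import io

import numpy as np

# Column 0 holds IDs, column 1 holds scores (illustrative data)
table = np.array([[1, 2.5], [2, 4.25]])

buffer = io.StringIO()   # savetxt also accepts an open text file object
np.savetxt(buffer, table, delimiter=',', fmt=['%d', '%.2f'],
           header='id,score', comments='')

csv_text = buffer.getvalue()
print(csv_text)
# id,score
# 1,2.50
# 2,4.25
```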

### loadtxt()

* Loads data from a **text file**.
* **Syntax:**

```python
array = np.loadtxt('filename.txt', delimiter=',')
```

In [16]:
data = np.loadtxt('array.txt', delimiter=',')
print(data)

[[1. 2. 3.]
 [4. 5. 6.]]
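`loadtxt()` returns `float64` by default, which is why the output above shows `1.` instead of `1`. Passing `dtype=int` keeps whole numbers as integers. The sketch below reads from an in-memory string. It assumes integer content like the file written with `fmt='%d'`.

```python
import io

import numpy as np

# Integer, comma-separated content (as written with fmt='%d')
text = io.StringIO("1,2,3\n4,5,6\n")

ints = np.loadtxt(text, delimiter=',', dtype=int)
print(ints)
# [[1 2 3]
#  [4 5 6]]
```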


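### genfromtxt()


* More flexible than `loadtxt`.
* Can handle **missing values** (`filling_values` chooses what replaces them).
* Can **skip header lines** (`skip_header`) or use the header as column names (`names=True`).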


In [18]:
data = np.genfromtxt('array.csv', delimiter=',', skip_header=1, filling_values=0)
print(data)

[4. 5. 6.]
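The `array.csv` used in `In [18]` has no gaps, so `filling_values` never comes into play there. The sketch below uses an in-memory CSV with empty fields, and the data is made up. It shows both results: by default a missing float becomes `nan`, and `filling_values=0` puts zero in its place.

```python
import io

import numpy as np

# Two empty fields: row 1 column 2 and row 2 column 3 (illustrative data)
text = "a,b,c\n1,,3\n4,5,\n"

raw = np.genfromtxt(io.StringIO(text), delimiter=',', skip_header=1)
print(raw)
# [[ 1. nan  3.]
#  [ 4.  5. nan]]

filled = np.genfromtxt(io.StringIO(text), delimiter=',', skip_header=1,
                       filling_values=0)
print(filled)
# [[1. 0. 3.]
#  [4. 5. 0.]]
```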


## 3. Key Differences Between Binary and Text Files

| Feature                     | Binary (`.npy`, `.npz`) | Text (`.txt`, `.csv`) |
| --------------------------- | ----------------------- | --------------------- |
| Human Readable              | ❌ No                    | ✅ Yes                 |
| File Size                   | Compact (fixed bytes per element) | Depends on `fmt`; about 25 characters per float with the default `'%.18e'` |
| Loading Speed               | Very fast               | Slower                |
| Multiple Arrays in One File | ✅ Yes (`.npz`)          | ❌ No                  |
| Use Case                    | Intermediate storage    | Data sharing          |
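There is another practical difference. A binary file stores the exact bits of each number, while a text file stores only the digits that `fmt` writes. The minimal sketch below uses in-memory buffers (`io.BytesIO` and `io.StringIO`) in place of real files, which is only an illustrative choice.

```python
import io

import numpy as np

values = np.array([1 / 3, 2 / 3])

# Binary round trip: the exact float64 bits are preserved
binary = io.BytesIO()
np.save(binary, values)
binary.seek(0)
from_binary = np.load(binary)

# Text round trip with a short format: digits beyond 2 decimals are lost
text = io.StringIO()
np.savetxt(text, values, fmt='%.2f')
text.seek(0)
from_text = np.loadtxt(text)

print(np.array_equal(from_binary, values))  # True
print(from_text)                             # [0.33 0.67]
```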


## 4. Common Parameters You Should Know

| Parameter                  | Description                                 |
| -------------------------- | ------------------------------------------- |
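| `delimiter`                | Separator for text files (default: any whitespace when reading, a single space when writing) |
| `fmt`                      | Format string for `savetxt()` (e.g. `'%d'`, `'%.2f'`, `'%s'`), either one for all columns or a list with one per column |
| `skiprows` / `skip_header` | Skip initial rows (`skiprows` in `loadtxt()`, `skip_header` in `genfromtxt()`) |
| `usecols`                  | Load only the listed columns (e.g. `usecols=(0, 2)`) |
| `filling_values`           | Value used in place of missing entries (`genfromtxt()` only) |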
| `dtype`                    | Force data type during loading              |
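`skiprows`, `usecols`, and `dtype` are often combined. The sketch below applies `np.loadtxt()` to a made-up CSV held in memory. The column names and values are illustrative.

```python
import io

import numpy as np

# Illustrative CSV with a header row and three columns
csv_text = "id,height,weight\n1,170.5,65.0\n2,182.0,80.5\n"

# Skip the header and keep only the height and weight columns
measurements = np.loadtxt(io.StringIO(csv_text), delimiter=',',
                          skiprows=1, usecols=(1, 2))
print(measurements)
# [[170.5  65. ]
#  [182.   80.5]]
```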


### ✅ Working with `.csv` Files Using NumPy

When using **NumPy** to handle `.csv` files, you primarily use:

* `numpy.savetxt()` 👉 to **write** (save) `.csv` files
* `numpy.loadtxt()` 👉 to **read** simple `.csv` files
* `numpy.genfromtxt()` 👉 to **read** complex `.csv` files (with missing values, headers, etc.)

Let’s go through **both writing and reading in detail with examples.**

---

## 📝 Writing to a CSV File Using NumPy

You use `numpy.savetxt()` to write a NumPy array to a CSV file.

### 📌 Syntax:

```python
numpy.savetxt(fname, X, delimiter=',', fmt='%s')
```

* `fname` → Filename (example: `'data.csv'`)
* `X` → NumPy array to save
* `delimiter` → Separator (`,` for CSV; `savetxt()` defaults to a single space)
* `fmt` → Format (example: `%d` for integers, `%.2f` for floats, `%s` for strings; the default `'%.18e'` writes scientific notation)

---

### ✅ Example 1: Writing an Integer Array to CSV

```python
import numpy as np

# Create a NumPy array
a = np.array([[1, 2, 3], [4, 5, 6]])

# Save to CSV
np.savetxt('array.csv', a, delimiter=',', fmt='%d')
```

✔️ This will create a file named `array.csv`:

```
1,2,3
4,5,6
```

---

### ✅ Example 2: Writing a Float Array to CSV

```python
b = np.array([[1.23, 4.56, 7.89], [0.12, 3.45, 6.78]])

np.savetxt('float_array.csv', b, delimiter=',', fmt='%.2f')
```

✔️ This will create:

```
1.23,4.56,7.89
0.12,3.45,6.78
```

---

### ✅ Example 3: Writing String Data to CSV

```python
c = np.array([['Apple', 'Banana'], ['Cat', 'Dog']])

np.savetxt('string_array.csv', c, delimiter=',', fmt='%s')
```

✔️ This will create:

```
Apple,Banana
Cat,Dog
```

---

## 📥 Reading from a CSV File Using NumPy

### 📌 1. Using `numpy.loadtxt()` for Simple CSV Files

This is best when:

* All data is **numeric**
* There are **no missing values**
* The file has no header row (or you skip it with `skiprows=1`)

#### Example:

```python
import numpy as np

# Read from CSV
data = np.loadtxt('array.csv', delimiter=',')

print(data)
```

✔️ Output:

```text
[[1. 2. 3.]
 [4. 5. 6.]]
```

---

### 📌 2. Using `numpy.genfromtxt()` for Complex CSV Files

Use when:

* CSV contains **headers**
* There are **missing values**
* You want to handle string data

#### Example with Header and Missing Values:

Suppose you have a CSV file `sample.csv`:

```text
Name,Age,Salary
John,25,50000
Doe,,45000
Anna,30,
```

#### Reading the file:

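```python
data = np.genfromtxt('sample.csv', delimiter=',', names=True, encoding='utf-8',
                     dtype=[('Name', 'U10'), ('Age', 'f8'), ('Salary', 'f8')])

print(data)
```

✔️ Explanation:

* `names=True` → Reads the header as field names, so columns can be accessed as `data['Age']`, `data['Salary']`.
* `dtype=[...]` → Gives each column its own type: text for `Name`, floats for `Age` and `Salary`.
* Missing numeric values are filled with `nan`, the default fill value for float columns; pass `filling_values` (e.g. `filling_values=0`) to use another number.
* With `dtype=None` instead, NumPy would infer integer columns here and fill the gaps with `-1`, not `nan`.

✔️ Output:

```text
[('John', 25., 50000.) ('Doe', nan, 45000.) ('Anna', 30.,    nan)]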
```
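With `names=True`, the result is a structured array. Each column can be pulled out by its header name, and NumPy's `nan`-aware functions skip the missing entries. The sketch below is self-contained. It puts the same `sample.csv` content in an `io.StringIO` buffer, which is an assumption made so it runs without the file. It also passes an explicit per-column `dtype` so the numeric columns are floats.

```python
import io

import numpy as np

# Same content as sample.csv, held in memory for a self-contained run
sample = io.StringIO("Name,Age,Salary\nJohn,25,50000\nDoe,,45000\nAnna,30,\n")

people = np.genfromtxt(sample, delimiter=',', names=True, encoding='utf-8',
                       dtype=[('Name', 'U10'), ('Age', 'f8'), ('Salary', 'f8')])

print(people['Name'])              # ['John' 'Doe' 'Anna']
print(np.nanmean(people['Age']))   # 27.5 (the missing age is ignored)
```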

---

## ✅ Summary: When to Use What

| Scenario                             | Method               |
| ------------------------------------ | -------------------- |
| Write array to CSV                   | `numpy.savetxt()`    |
| Read simple numeric CSV              | `numpy.loadtxt()`    |
| Read CSV with headers/missing values | `numpy.genfromtxt()` |

---

## ✅ Quick Tips:

* Always specify the correct **delimiter** (`,` for CSV, `\t` for TSV).
* Use `fmt='%s'` for string arrays when writing.
* NumPy handles CSV files that mix strings and numbers only through structured arrays, which are awkward to work with; for such files, **pandas is usually the better tool**.
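Here is the delimiter tip in practice. The same calls handle TSV data once `delimiter='\t'` is passed on both the writing and reading side. The sketch uses an in-memory buffer rather than a real `.tsv` file.

```python
import io

import numpy as np

grid = np.array([[1, 2], [3, 4]])

# Write tab-separated values instead of comma-separated ones
tsv = io.StringIO()
np.savetxt(tsv, grid, delimiter='\t', fmt='%d')
print(repr(tsv.getvalue()))   # '1\t2\n3\t4\n'

# Read it back with the same delimiter
tsv.seek(0)
back = np.loadtxt(tsv, delimiter='\t', dtype=int)
```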

---



## Practice Writing to a CSV

In [12]:
a = np.array([[1, 2, 3], [4, 5, 6]])
np.savetxt('array.csv', a, delimiter=',', fmt='%s')

# 1,2,3
# 4,5,6

In [13]:
students = np.array([(1, 'Mahesh'), (2, 'Manju')], 
                    dtype=[('id', 'int8'), ('Name', 'U10')])
np.savetxt('students.csv', students, delimiter=',', fmt='%s', header='id,Name', comments='')

## Reading from a CSV 

In [14]:
# s = np.loadtxt('students.csv', delimiter=',')
# print(s)

- `numpy.loadtxt()` expects every field to be numeric by default (it converts each value to `float`).

- `students.csv` contains a text header (`id,Name`) and a string column, so its values do not share one numeric type.

- As a result, `loadtxt` raises a `ValueError` at the first field it cannot convert (the header's `id`) instead of returning an array.
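The sketch below shows the failure concretely without relying on `students.csv` existing. It feeds the same text from an in-memory buffer. The full file raises `ValueError`. Skipping the header and selecting only the numeric `id` column with `usecols` loads without error.

```python
import io

import numpy as np

# Same text that In [13] writes to students.csv
students_text = "id,Name\n1,Mahesh\n2,Manju\n"

loadtxt_failed = False
try:
    np.loadtxt(io.StringIO(students_text), delimiter=',')
except ValueError as err:
    loadtxt_failed = True
    print("loadtxt failed:", err)

# Skipping the header and reading only the numeric column works
ids = np.loadtxt(io.StringIO(students_text), delimiter=',',
                 skiprows=1, usecols=0, dtype=int)
print(ids)   # [1 2]
```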

In [15]:
s = np.genfromtxt('students.csv', delimiter=',', dtype=None, encoding='utf-8')

print(s)
print(s.dtype)

[['id' 'Name']
 ['1' 'Mahesh']
 ['2' 'Manju']]
<U6


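- `dtype=None` → Lets `genfromtxt` infer a type for each column. Because the header row is read as data here, both columns contain text, so everything comes back as strings (`<U6`).

- `encoding='utf-8'` → Decodes the file as UTF-8 so string fields come back as `str` (recommended in Python 3).

- `delimiter=','` → Splits each line on commas into CSV columns.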
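The string-only result above happens because the header row is treated as data. Adding `names=True` uses that row as field names instead, so `dtype=None` can infer an integer `id` column and a string `Name` column. The sketch keeps the same CSV text in memory so it is self-contained.

```python
import io

import numpy as np

students_text = "id,Name\n1,Mahesh\n2,Manju\n"

# names=True turns the header into field names; each column keeps its own type
s = np.genfromtxt(io.StringIO(students_text), delimiter=',', dtype=None,
                  names=True, encoding='utf-8')

print(s)
print(s.dtype.names)   # ('id', 'Name')
print(s['id'])         # [1 2]
print(s['Name'])       # ['Mahesh' 'Manju']
```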
