***
## 3.6 Numpy - File I/O and Data Processing

***
### Python3.1 Numpy Introduction
### Python3.2 Numpy DataTypes, Functions, and Random Module
### Python3.3 Numpy Iterating Over Arrays
### Python3.4 Numpy Manipulating Arrays
### Python3.5 Numpy Operations
### Python3.6 Numpy File Input and Output and Data Processing
### Python3.7 Numpy-Sort, Argsort, Nonzero, and Extract Functions
### Python3.8 Numpy BreakoutGroupExercises
### Python3.8 Numpy BreakoutGroupExercises - Solutions
***

### ndarray objects can be saved to and loaded from disk files. The following I/O functions are available:

- 1) `load()` and `save()` functions handle NumPy binary files (with the `.npy` extension)

- 2) `loadtxt()` and `savetxt()` functions handle plain text files

NumPy introduces a simple file format for ndarray objects. This `.npy` file stores data, shape, dtype and other information required to reconstruct the ndarray in a disk file such that the array is correctly retrieved even if the file is on another machine with different architecture.
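As a small illustration of the point above, the following sketch saves an array with a non-default dtype and a 2-D shape, then checks that both survive the round trip. The file name `example.npy` and the temporary directory are assumptions made so the example leaves no files behind; they are not part of the course data.

```python
import os
import tempfile

import numpy as np

# A small 2-D integer array with an explicit, non-default dtype.
original = np.arange(6, dtype=np.int16).reshape(2, 3)

# Write into a temporary directory so nothing is left on disk afterwards.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.npy")  # hypothetical file name
    np.save(path, original)
    restored = np.load(path)

print(restored.shape, restored.dtype)       # (2, 3) int16
print(np.array_equal(original, restored))   # True
```

A text file written with `savetxt()` would not record the `int16` dtype, which is one reason to prefer `.npy` when the data will be read back by NumPy.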

### I/O 1): Saving & Loading on Disk - `save()` Function

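- The numpy `save()` function stores the input array in a disk file with the `.npy` extension; the extension is appended automatically if the file name does not already end in it.

- The `save()` and `load()` functions accept an additional Boolean parameter, `allow_pickle`. Pickling is Python's mechanism for serializing and de-serializing objects, and NumPy needs it only for arrays of dtype `object`. In recent NumPy versions `load()` defaults to `allow_pickle=False`, because unpickling a file from an untrusted source can execute arbitrary code.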
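The effect of `allow_pickle` can be sketched with an object array. This is a minimal example that assumes a recent NumPy release in which `load()` refuses pickled data by default; the file name `objects.npy` is hypothetical.

```python
import os
import tempfile

import numpy as np

# An object array holds arbitrary Python objects, so NumPy has to pickle it.
mixed = np.empty(2, dtype=object)
mixed[0] = {"a": 1}
mixed[1] = [1, 2, 3]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "objects.npy")  # hypothetical file name
    np.save(path, mixed, allow_pickle=True)

    # Assumption: load() defaults to allow_pickle=False and raises ValueError here.
    try:
        np.load(path)
        refused = False
    except ValueError:
        refused = True

    # Opting in explicitly works; only do this for files you trust.
    loaded = np.load(path, allow_pickle=True)

print(refused)
print(loaded[0], loaded[1])
```

Plain numeric arrays, such as the ones used in the rest of this section, never need pickling, so the default settings are sufficient for them.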

![image.png](attachment:image.png)

In [15]:
import numpy as np 
a = np.array([1,2,3,4,5]) 
np.save('outfile',a)

In [16]:
# To reconstruct array from outfile.npy, use load() function.
b = np.load('outfile.npy') 
b 

array([1, 2, 3, 4, 5])

In [17]:
%pwd

'C:\\Users\\yumei\\MSCA37014PythonForAnalyticsSummer2022\\Data'

### I/O 2): Saving & Loading Text Files - `savetxt()` Function

- Array data is stored in and retrieved from plain text files with the `savetxt()` and `loadtxt()` functions.

- Both functions accept a `delimiter` parameter. `savetxt()` additionally accepts `fmt`, `header`, and `footer`; `loadtxt()` accepts `skiprows`, `usecols`, and `dtype` (which defaults to `float`).
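A short sketch of these optional parameters follows. The array values, column names, and file name `table.csv` are made up for illustration only.

```python
import os
import tempfile

import numpy as np

# Illustrative data: three rows, two columns.
table = np.array([[1, 2.5], [2, 3.75], [3, 5.0]])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "table.csv")  # hypothetical file name

    # Write comma-separated values with two decimals, plus a header and a footer line.
    np.savetxt(path, table, delimiter=",", fmt="%.2f",
               header="id,value", footer="end of data")

    # Read the raw text back to see what was written.
    with open(path) as f:
        text = f.read()

    # loadtxt skips lines that start with the comment character '#'.
    loaded = np.loadtxt(path, delimiter=",")
    values = np.loadtxt(path, delimiter=",", usecols=1)  # second column only

print(text)
print(loaded)
print(values)
```

`savetxt()` writes the header and footer as comment lines prefixed with `# `, which is why `loadtxt()` can read the file back without `skiprows`.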

![image.png](attachment:image.png)

In [18]:
a = np.array([1,2,3,4,5]) 
np.savetxt('out.txt',a) 
b = np.loadtxt('out.txt') 
b 

array([1., 2., 3., 4., 5.])

#### Using numpy.savetxt we can store a Numpy array to a file in CSV format:

In [19]:
M = np.random.rand(3,3)
M

array([[0.754208  , 0.72479078, 0.38382575],
       [0.7204977 , 0.08497196, 0.6306744 ],
       [0.0695307 , 0.8611572 , 0.57430917]])

In [20]:
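# savetxt separates values with a space by default; pass delimiter="," to get real CSV.
np.savetxt("random-matrix.csv", M, delimiter=",")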

NumPy's native file format is `.npy`.

It is useful for storing and reading back NumPy array data, because the dtype and shape are preserved exactly. Use the functions `numpy.save` and `numpy.load`:

In [21]:
np.save("random-matrix.npy", M)
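To compare the two formats, the sketch below writes the same matrix as comma-separated text and as `.npy`, then reads both back. It uses a seeded generator and a temporary directory so the run is repeatable and leaves no files behind; both choices, and the explicit comma delimiter, are assumptions for illustration.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)  # seeded so the run is repeatable
M = rng.random((3, 3))

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "random-matrix.csv")
    npy_path = os.path.join(tmp_dir, "random-matrix.npy")

    # Text format: human-readable, values written as formatted decimals.
    np.savetxt(csv_path, M, delimiter=",")
    # Binary format: stores the raw array bytes plus dtype and shape.
    np.save(npy_path, M)

    from_csv = np.loadtxt(csv_path, delimiter=",")
    from_npy = np.load(npy_path)

print(np.allclose(M, from_csv))     # True
print(np.array_equal(M, from_npy))  # True
```

The CSV file can be opened in a spreadsheet or text editor, while the `.npy` file is intended to be read back by NumPy.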

#### Import WCFrequency data

In [22]:
import numpy as np
import csv

In [23]:
%pwd

'C:\\Users\\yumei\\MSCA37014PythonForAnalyticsSummer2022\\Data'

In [25]:
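import os
# Change to the folder that holds the data file (adjust this path for your machine).
os.chdir(r'C:\Users\yumei\CSP Workshop 2023\Data')
# newline='' is what the csv module documentation recommends when opening files for csv.reader.
with open('WCFrequency.csv', 'r', newline='') as f:
    data = list(csv.reader(f, delimiter=','))
# Show the header row and the first four records.
data[:5]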

[['class', 'year', 'payroll', 'claimcount'],
 ['1', '1', '32.321999', '1'],
 ['1', '2', '33.778999', '4'],
 ['1', '3', '43.548', '3'],
 ['1', '4', '46.686001', '5']]
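`csv.reader` returns every field as a string, so the rows usually need converting before numerical work. The sketch below shows one possible way to do that. It uses an in-memory stand-in containing only the first four records shown above, because the full `WCFrequency.csv` file is not reproduced here; the variable names are illustrative.

```python
import csv
import io

import numpy as np

# Stand-in for WCFrequency.csv: only the header and the first four records shown above.
sample = io.StringIO(
    "class,year,payroll,claimcount\n"
    "1,1,32.321999,1\n"
    "1,2,33.778999,4\n"
    "1,3,43.548,3\n"
    "1,4,46.686001,5\n"
)

rows = list(csv.reader(sample, delimiter=","))
header, records = rows[0], rows[1:]

# Convert the string fields to floats in one step.
values = np.array(records, dtype=float)

# Pick columns by looking up their position in the header row.
payroll = values[:, header.index("payroll")]
claimcount = values[:, header.index("claimcount")].astype(int)

print(values.shape)   # (4, 4)
print(claimcount)     # [1 4 3 5]
```

Separating the header from the records first keeps the column names available for lookups without mixing text into the numeric array.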

In [26]:
np.genfromtxt('WCFrequency.csv', delimiter=',', dtype="<U16", autostrip=True)[:5]

array([['class', 'year', 'payroll', 'claimcount'],
       ['1', '1', '32.321999', '1'],
       ['1', '2', '33.778999', '4'],
       ['1', '3', '43.548', '3'],
       ['1', '4', '46.686001', '5']], dtype='<U16')

In [27]:
data[-5:]

[['133', '3', '69.913002', '0'],
 ['133', '4', '46.289001', '0'],
 ['133', '5', '91.119003', '0'],
 ['133', '6', '78.456001', '1'],
 ['133', '7', '84.362', '2']]

In [28]:
np.genfromtxt('WCFrequency.csv', delimiter=',', dtype="<U16", autostrip=True)[-10:]

array([['132', '5', '35.412998', '2'],
       ['132', '6', '16.983999', '1'],
       ['132', '7', '18.797001', '0'],
       ['133', '1', '66.212997', '2'],
       ['133', '2', '65.695999', '0'],
       ['133', '3', '69.913002', '0'],
       ['133', '4', '46.289001', '0'],
       ['133', '5', '91.119003', '0'],
       ['133', '6', '78.456001', '1'],
       ['133', '7', '84.362', '2']], dtype='<U16')

In [29]:
np.genfromtxt('WCFrequency.csv', delimiter=',', dtype="<U16", autostrip=True)

array([['class', 'year', 'payroll', 'claimcount'],
       ['1', '1', '32.321999', '1'],
       ['1', '2', '33.778999', '4'],
       ...,
       ['133', '5', '91.119003', '0'],
       ['133', '6', '78.456001', '1'],
       ['133', '7', '84.362', '2']], dtype='<U16')
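Reading everything as `<U16` strings keeps the header row but leaves the numbers as text. As an alternative sketch, `genfromtxt` can take the column names from the first row (`names=True`) and infer a type per column (`dtype=None`). The example again uses an in-memory stand-in with only the first four records shown earlier, not the real file.

```python
import io

import numpy as np

# Stand-in for WCFrequency.csv: only the header and the first four records shown earlier.
sample = io.StringIO(
    "class,year,payroll,claimcount\n"
    "1,1,32.321999,1\n"
    "1,2,33.778999,4\n"
    "1,3,43.548,3\n"
    "1,4,46.686001,5\n"
)

# names=True reads field names from the first row; dtype=None infers each column's type.
records = np.genfromtxt(sample, delimiter=",", names=True,
                        dtype=None, encoding="utf-8")

print(records.dtype.names)
print(records["payroll"])
print(records["claimcount"].sum())   # 13
```

The result is a structured array, so columns are accessed by name (for example `records["payroll"]`) rather than by position.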

## Further reading

- http://numpy.scipy.org
- http://scipy.org/Tentative_NumPy_Tutorial
- http://scipy.org/NumPy_for_Matlab_Users - A Numpy guide for MATLAB users.

#### Note: The course materials are based mainly on personal experience and contributions from the Python learning community
Reference books:
- Learning Python, 5th Edition, by Mark Lutz
- Python Data Science Handbook, by Jake VanderPlas
- Python for Data Analysis, by Wes McKinney

Copyright ©2023 Mei Najim. All rights reserved. 