###### File Input/Output with NumPy

- NumPy is a Python package for numerical data manipulation and analysis.
  <br>
- Sometimes we need to store array data in an external file on disk for long-term storage. 
  NumPy's binary formats for this are .npy (a single array) and .npz (an archive of several arrays). 
  We create them with np.save() and np.savez().
  <br>
- Later, to work with the stored arrays again, 
  we load them from the .npy or .npz files back into our working environment.
  <br>
- We do this with np.load().
  <br>
  
- In short, np.save() puts array data into long-term storage, 
  and np.load() brings it back into the working environment.
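
As a minimal sketch of the save-then-load round trip described above (the file name `scores.npy` and the temporary directory are illustrative assumptions, not part of the notebook):

```python
import os
import tempfile

import numpy as np

# Work inside a temporary directory so nothing is left behind on disk.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "scores.npy")  # hypothetical file name

    original = np.array([[1.5, 2.5], [3.5, 4.5]])
    np.save(path, original)    # put the array into long-term storage
    restored = np.load(path)   # bring it back into the working environment

# The round trip keeps the values, the shape, and the dtype.
print(np.array_equal(original, restored))
print(restored.shape, restored.dtype)
```

The loaded array is an independent copy in memory, so it remains usable after the file is gone.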

###### NumPy File Operations

The NumPy objects can be saved to a disk file and loaded from it. 

The most commonly used functions for this are listed below:-

![image.png](attachment:image.png)

###### numpy.save() function

**The NumPy save() function is used to save an array to a binary file in NumPy .npy format.**

**Syntax:-** numpy.save(file, arr)
    
**Parameters:-**
    
**file:-** Required. Specify file or filename to which the data is saved. 
           If file is a file-object, then the filename is unchanged. 
           If file is a string or Path, a .npy extension will be appended to the filename if it does not already have one.
        
**arr:-** Required. Specify array data to be saved.

**Note:-** The .npy extension is automatically appended to the file name if it's not already present.
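
The extension rule above can be checked with a short sketch. The file names are made up for illustration, and an in-memory `io.BytesIO` buffer stands in for a real file object:

```python
import io
import os
import tempfile
from pathlib import Path

import numpy as np

arr = np.arange(4)

with tempfile.TemporaryDirectory() as tmp_dir:
    # A string or Path without ".npy" gets the extension appended.
    np.save(Path(tmp_dir) / "no_extension", arr)
    # A name that already ends in ".npy" is left unchanged.
    np.save(os.path.join(tmp_dir, "with_extension.npy"), arr)
    saved_names = sorted(os.listdir(tmp_dir))

print(saved_names)

# A file object is written to as-is; no name is involved at all.
buffer = io.BytesIO()
np.save(buffer, arr)
buffer.seek(0)                  # rewind before reading
from_buffer = np.load(buffer)
print(from_buffer)
```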

In [16]:
# Example 1:-

import numpy as np

print("NumPy version:- ", np.__version__,"\n")

a = np.arange(6, dtype=np.int8).reshape(1, 2, 3)

print("Array Elements:-\n\n", a)

print("\nShape of the Array:-", a.shape)

print("\nType of the data:-", a.dtype)

print()

np.save('x',a)

print("Binary File with extension .npy has been saved \n")

print("--------------------------------------------------")

# Specify the file path, either as a string or as a pathlib.Path object, 
# as the first argument and the ndarray to be saved as the second argument.

# Example 2:-

import numpy as np

arr = np.array([10, 20, 30, 40, 50, 60])

print("Array Elements:-", arr)

print("\nShape of the Array:-", arr.shape)

print("\nType of the data:-", arr.dtype)

print()

#saving arr in binary file - test.npy
np.save("test", arr)

print("Binary File with extension .npy has been saved")

NumPy version:-  1.25.2 

Array Elements:-

 [[[0 1 2]
  [3 4 5]]]

Shape of the Array:- (1, 2, 3)

Type of the data:- int8

Binary File with extension .npy has been saved 

Array Elements:- [10 20 30 40 50 60]

Shape of the Array:- (6,)

Type of the data:- int32

Binary File with extension .npy has been saved


###### Savez()-Save Multiple NumPy Array Files

- To save multiple NumPy arrays into a single file, we use the savez() function.

- The savez() function is similar to the save() function, 
  but it can save multiple arrays at once in the .npz format.

  **Syntax:-** np.savez(file, *args, **kwds)

  **Parameters:-**
  - `file` - name of the file (or a file object) where the arrays will be saved
  - `*args` - arrays to be saved, comma-separated; they are stored under the names arr_0, arr_1, and so on
  - `**kwds` - keyword arguments that give custom names to the arrays being saved

In [None]:
import numpy as np

# create two NumPy arrays

array1 = np.array([2, 6, 10])

array2 = np.array([4, 8, 12])

# save the two arrays into a single file

np.savez('file2.npz', file1 = array1, file2 = array2)

# In this example, we have used the np.savez() function to save 
# the two NumPy arrays array1 and array2 into a single file named file2.npz.

# The file1 and file2 are the names given to the arrays using keyword arguments.
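
The difference between positional and keyword arguments can be seen in a small sketch. It writes to an in-memory buffer instead of a file, and the keyword name `second` is chosen only for illustration:

```python
import io

import numpy as np

array1 = np.array([2, 6, 10])
array2 = np.array([4, 8, 12])

# array1 is passed positionally, array2 by keyword.
buffer = io.BytesIO()
np.savez(buffer, array1, second=array2)
buffer.seek(0)

with np.load(buffer) as archive:
    names = sorted(archive.files)   # names stored in the archive
    first = archive["arr_0"]        # positional arrays get arr_0, arr_1, ...
    second = archive["second"]      # keyword arrays keep their keyword name

print(names)
print(first, second)
```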

###### numpy.load() function

In [None]:
**The NumPy load() function is used to load arrays from .npy and .npz files.**

**Syntax:-** numpy.load(file, mmap_mode=None)
    
**Parameters:-**
    
**file:-** Required. Specify the file to read. File-like objects must support the seek() and read() methods.

**mmap_mode:-** Optional. It can take value from {None, 'r+', 'r', 'w+', 'c'}. Default is None. 
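
A minimal sketch of what mmap_mode changes, assuming a .npy file on disk (the name `big.npy` is hypothetical). With mmap_mode='r' the array is memory-mapped read-only rather than read into memory in full:

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "big.npy")  # hypothetical file name
    np.save(path, np.arange(1_000_000, dtype=np.int64))

    # Map the file instead of loading it all; data is read on access.
    mapped = np.load(path, mmap_mode="r")
    is_memmap = isinstance(mapped, np.memmap)

    # Copy a small slice into an ordinary in-memory array.
    first_five = np.array(mapped[:5])

    # Drop the mapping before the temporary directory is removed.
    del mapped

print(is_memmap)
print(first_five)
```

Memory mapping is mainly useful for arrays that are too large to load comfortably, or when only a small part of the file is needed.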

###### Example 1:- Load Single NumPy Array File

In [20]:
import numpy as np

array1 = np.array([[2, 4, 6], 
                  [8, 10, 12]])

# save the array to a file
np.save('file1.npy', array1)

# load the saved NumPy array
x = np.load('file1.npy')

# display the loaded array
print("Displaying the content of file1.npy file:-\n", x)

print("------------------------------------------------")

import numpy as np

arr = np.array([10, 20, 30, 40, 50, 60])

#saving arr in binary file - test.npy
np.save("test", arr)

#loading array from test.npy
x = np.load("test.npy")

#displaying the content of x
print("Displaying the content of test.npy file:-\n", x)

Displaying the content of file1.npy file:-
 [[ 2  4  6]
 [ 8 10 12]]
------------------------------------------------
Displaying the content of test.npy file:-
 [10 20 30 40 50 60]


###### Example 2:- Load Multiple NumPy Array File

In [24]:
import numpy as np

# create two NumPy arrays

array1 = np.array([2, 6, 10])

array2 = np.array([4, 8, 12])

# save the two arrays into a single file

np.savez('file2.npz', file1 = array1, file2 = array2)

# load the saved arrays 
load_data = np.load('file2.npz')

print("File Elements:-",load_data)
print()
# retrieve the arrays using their names

a1 = load_data['file1']
a2 = load_data['file2']

# display the loaded arrays

print("First Array elements of the file:-", a1)
print()
print("Second Array elements of the file:-", a2)

# In the above example, first we loaded the saved arrays from the file using the np.load() function.

# Then, we retrieved the arrays using their names file1 and file2 as: load_data['file1'] and load_data['file2'], respectively.

# Finally, we displayed the loaded arrays.

File Elements:- NpzFile 'file2.npz' with keys: file1, file2

First Array elements of the file:- [ 2  6 10]

Second Array elements of the file:- [ 4  8 12]


###### Delimiters

• Text files (txt, csv, etc.) use some character to mark the boundary between columns of data; rows are separated by newlines.

• Common delimiters include:-
    
   - Single white space
   - Tab
   - Comma
   - Colon
    
• You need to know which delimiter a file uses whenever you read or write your own files.

• A delimiter is a character or a string of characters that separates individual values on a line.
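
To make the list above concrete, here is a small sketch that writes the same array with each of the four delimiters. The helper `as_text` is a hypothetical name, and the text is written to an in-memory buffer rather than a file:

```python
import io

import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6]])


def as_text(delimiter):
    """Return `data` written as text with the given delimiter."""
    buffer = io.StringIO()
    np.savetxt(buffer, data, fmt="%d", delimiter=delimiter)
    return buffer.getvalue()


for name, delimiter in [("space", " "), ("tab", "\t"), ("comma", ","), ("colon", ":")]:
    print(name, repr(as_text(delimiter)))

# Reading back works as long as the same delimiter is given.
round_trip = np.loadtxt(io.StringIO(as_text(":")), delimiter=":", dtype=int)
```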

### numpy.savetxt() function

- The NumPy savetxt() function is used to save an array to a text file.

  **Syntax:-** numpy.savetxt(fname, X)
     
  **Parameters:-**

   **fname** - Required. Specify the filename to which the data is saved. 
                  If the filename ends in .gz, the file is automatically saved in compressed gzip format. 
                  
   **X** -     Required. Specify data to be saved in the file (1D or 2D array_like).

In [26]:
# Example 1:-
    
import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Write data to a text file
np.savetxt('output.txt', data, fmt='%d', delimiter='\t')

# In this example, the NumPy array data is written to output.txt. 
# The fmt parameter specifies the format of the data (integer %d in this case),
# and the delimiter parameter sets the delimiter between values (tab \t in this case).
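
savetxt() also accepts `header` and `comments` arguments, which the syntax line above does not list. The sketch below writes a header row; the column names are invented for illustration, and `comments=""` suppresses the "# " prefix that would otherwise be put in front of the header:

```python
import io

import numpy as np

data = np.array([[1.5, 2.25], [3.0, 4.75]])

buffer = io.StringIO()
np.savetxt(
    buffer,
    data,
    fmt="%.2f",             # two decimal places per value
    delimiter=",",          # comma between values
    header="width,height",  # hypothetical column names
    comments="",            # no "# " prefix in front of the header
)
text = buffer.getvalue()
print(text)
```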

###### .CSV files

In [None]:
CSV (comma-separated values) files are text files in which values are separated by commas. 

A CSV file stores tabular data as plain text. 

CSV files can be loaded into NumPy arrays with the functions described in this section, and their data can then be analyzed.

In [34]:
# Example 2:-

import numpy as np

x=np.array([[8,2,3],[4,5,6],[7,8,9]])

print("Given array values:- \n",x)

print()

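# Without fmt, values are written in the default '%.18e' scientific notation
np.savetxt("Test1.csv", x,delimiter=",")

# Pass fmt='%d' to write plain integers instead:
# np.savetxt("Test1.csv", x, fmt='%d', delimiter=",")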

print("Array values has been saved in Test1.csv file")

Given array values:- 
 [[8 2 3]
 [4 5 6]
 [7 8 9]]

Array values has been saved in Test1.csv file


In [84]:
# Example 3

import numpy as np

y=np.array([['Jacob','Pete','Messi'],['Scott','John','Finn'],['Bob','Morely','Lincon']])

print()

print("Given Array:- \n\n",y)

# No delimiter is passed, so the values on each line are separated by a single space (the default)
np.savetxt("Names.csv", y,fmt='%s')

print()

print("Array values has been saved in Names.csv file")


Given Array:- 

 [['Jacob' 'Pete' 'Messi']
 ['Scott' 'John' 'Finn']
 ['Bob' 'Morely' 'Lincon']]

Array values has been saved in Names.csv file


#### numpy.loadtxt() function

- The NumPy loadtxt() function is used to load data from a text file. 

- Each row in the text file must have the same number of values.

  **Syntax:-** numpy.loadtxt(fname, dtype=<class 'float'>)
    
  **Parameters:-** 
  
     1. **fname** - Required. Specify File, filename, or generator to read.
                 If the filename extension is .gz or .bz2, the file is first decompressed. 
                 Note that generators should return byte strings.
          <br>     
     2. **dtype** - Optional. Specify data-type of the resulting array. Default: float.

**To import Text files into Numpy Arrays, we have two functions in Numpy:**

  **1. numpy.loadtxt()** – Used to load text file data

  **2. numpy.genfromtxt()** – Used to load data from a text file, with missing values handled as defined.

  **Note:** numpy.loadtxt() gives the same result as numpy.genfromtxt() when no data is missing.
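
The note above can be checked on a small, clean input. This sketch uses made-up inline text in place of a file:

```python
import io

import numpy as np

clean_text = "1.0 2.0\n3.0 4.0\n"  # no missing values

from_loadtxt = np.loadtxt(io.StringIO(clean_text))
from_genfromtxt = np.genfromtxt(io.StringIO(clean_text))

# On clean data both readers return the same float array.
same = np.array_equal(from_loadtxt, from_genfromtxt)
print(same)
print(from_loadtxt)
```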

In [41]:
# Example 1:- 

import numpy as np

arr = np.array([10, 20, 30, 40, 50, 60])

#saving arr in text file - test.out

np.savetxt("test.out", arr)

#loading array from test.out

y = np.loadtxt("test.out")

#displaying the content of y

print("Displaying the file contents:-\n", y)

Displaying the file contents:-
 [10. 20. 30. 40. 50. 60.]


In [None]:
Scenario:- 
    
We have 100 lines (or rows) of data in our text file,
each of which comprises two floating-point numbers separated by a space.

The first number on each row represents the weight, and the second number represents the height 
of an individual.

Here’s a little glimpse from the file:

110.90 146.03
44.83 211.82
97.13 209.30
105.64 164.21

This file is stored as `weight_height_1.txt`.
Our task is to read the file and parse the data in a way that we can represent in a NumPy array.

In [43]:
# Example 2:- Loading & Displaying contents from the text file

import numpy as np

x=np.loadtxt("weight_height_1.txt")

print("Shape of the file data:- ", x.shape,"\n")

print("Type of the file data:- ", x.dtype,"\n")

print("Load data from the Text File:- \n\n", x)

# The function returns an n-dimensional NumPy array of values found in the text.

# Here our text had 100 rows with each row having two float values, 
# so the returned object data will be a NumPy array of shape (100, 2) with the float data type.

Shape of the file data:-  (100, 2) 

Type of the file data:-  float64 

Load data from the Text File:- 

 [[110.9  146.03]
 [ 44.83 211.82]
 [ 97.13 209.3 ]
 [ 69.87 207.73]
 [ 48.73 158.87]
 [ 99.25 195.41]
 [ 50.15 184.07]
 [ 64.86 220.5 ]
 [108.96 192.7 ]
 [ 88.13 220.3 ]
 [101.98 137.2 ]
 [ 45.99 158.41]
 [ 74.95 182.4 ]
 [ 61.29 173.22]
 [ 93.91 162.94]
 [ 59.19 208.28]
 [115.93 145.7 ]
 [ 66.32 189.89]
 [ 97.96 216.69]
 [ 54.05 137.74]
 [ 62.62 205.15]
 [ 84.12 177.03]
 [ 81.61 181.1 ]
 [111.97 160.98]
 [119.25 173.58]
 [ 93.2  183.02]
 [105.3  157.19]
 [114.57 185.62]
 [ 95.58 189.92]
 [ 68.19 221.31]
 [100.91 155.55]
 [ 72.93 150.38]
 [116.68 137.15]
 [ 86.51 172.15]
 [ 59.85 155.53]
 [ 56.46 164.25]
 [ 65.47 204.84]
 [ 56.09 205.31]
 [ 98.38 142.48]
 [ 64.46 167.56]
 [113.99 173.27]
 [ 52.1  219.39]
 [ 84.48 176.72]
 [ 95.34 155.51]
 [ 99.88 181.25]
 [ 63.8  159.63]
 [ 41.92 176.16]
 [ 63.07 157.79]
 [ 77.09 180.87]
 [ 91.26 153.32]
 [ 87.44 209.17]
 [ 53.87 219.3 ]
 [ 54.66 1

In [45]:
# Example 3:- Loading & Displaying contents with delimiter from the text file

import numpy as np

x=np.loadtxt("weight_height_2.txt", delimiter=",")

print("Load data from the Text File:- \n\n", x)

Load data from the Text File:- 

 [[110.9  146.03]
 [ 44.83 211.82]
 [ 97.13 209.3 ]
 [ 69.87 207.73]
 [ 48.73 158.87]
 [ 99.25 195.41]
 [ 50.15 184.07]
 [ 64.86 220.5 ]
 [108.96 192.7 ]
 [ 88.13 220.3 ]
 [101.98 137.2 ]
 [ 45.99 158.41]
 [ 74.95 182.4 ]
 [ 61.29 173.22]
 [ 93.91 162.94]
 [ 59.19 208.28]
 [115.93 145.7 ]
 [ 66.32 189.89]
 [ 97.96 216.69]
 [ 54.05 137.74]
 [ 62.62 205.15]
 [ 84.12 177.03]
 [ 81.61 181.1 ]
 [111.97 160.98]
 [119.25 173.58]
 [ 93.2  183.02]
 [105.3  157.19]
 [114.57 185.62]
 [ 95.58 189.92]
 [ 68.19 221.31]
 [100.91 155.55]
 [ 72.93 150.38]
 [116.68 137.15]
 [ 86.51 172.15]
 [ 59.85 155.53]
 [ 56.46 164.25]
 [ 65.47 204.84]
 [ 56.09 205.31]
 [ 98.38 142.48]
 [ 64.46 167.56]
 [113.99 173.27]
 [ 52.1  219.39]
 [ 84.48 176.72]
 [ 95.34 155.51]
 [ 99.88 181.25]
 [ 63.8  159.63]
 [ 41.92 176.16]
 [ 63.07 157.79]
 [ 77.09 180.87]
 [ 91.26 153.32]
 [ 87.44 209.17]
 [ 53.87 219.3 ]
 [ 54.66 173.83]
 [ 47.32 202.23]
 [115.15 201.19]
 [ 91.6  221.95]
 [ 42.11 185.0

In [None]:
Unless specified otherwise, np.loadtxt assumes that 
the values in the text file are floating-point values.

So if you pass a text file that contains non-numeric characters, 
the function raises a ValueError saying that it could not convert a string to a float.

We can overcome this by specifying the data type of the values in the
text file using the dtype parameter.
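
A minimal sketch of that failure and the dtype fix, using made-up inline text in place of a file:

```python
import io

import numpy as np

text = "alice 30\nbob 25\n"  # first column is not numeric

# With the default float dtype, the names cannot be converted.
try:
    np.loadtxt(io.StringIO(text))
    failed = False
except ValueError as error:
    failed = True
    print("loadtxt raised:", error)

# Reading everything as strings succeeds; numeric columns can be converted afterwards.
as_strings = np.loadtxt(io.StringIO(text), dtype=str)
ages = as_strings[:, 1].astype(int)
print(as_strings)
print(ages)
```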

In [None]:
Let’s look at a new file ‘./weight_height_4.txt’, 

which has only 1 column for the date of birth of individuals in the dd-mm-yyyy format:

13-2-1991
17-12-1990
18-12-1986
…

So we’ll call the loadtxt method with “-” as the delimiter:

In [54]:
# Example 4:- Loading & Displaying contents with delimiter from the text file

import numpy as np

a = np.loadtxt("weight_height_4.txt", delimiter="-")

print("Data type of the values in the file:-", a.dtype,"\n")

print("Displaying the file contents:-\n\n", a[:5])

Data type of the values in the file:- float64 

Displaying the file contents:-

 [[  13.    2. 1991.]
 [  17.   12. 1990.]
 [  18.   12. 1986.]
 [   3.   13. 1998.]
 [  28.   10. 1983.]]


In [None]:
We can alter this behavior by passing the value ‘int’ to the ‘dtype’ parameter. 
This will ask the function to store the extracted values as integers, 
and hence the data type of the array will also be int.

In [55]:
import numpy as np

x=np.loadtxt("weight_height_4.txt", delimiter="-",dtype="int")

print("Data type of the values in the file:-", x.dtype,"\n")

print("Displaying the file contents:-\n\n", x[:5])

Data type of the values in the file:- int32 

Displaying the file contents:-

 [[  13    2 1991]
 [  17   12 1990]
 [  18   12 1986]
 [   3   13 1998]
 [  28   10 1983]]


###### Ignoring Headers

In [None]:
In some cases (especially CSV files), the first line of the text file may have ‘headers’ 
describing what each column in the following rows represents.

While reading data from such text files, 
we may want to ignore the first line, because we cannot (and should not) store them in our NumPy array.

In such a case, we can use the ‘skiprows’ parameter and pass the value 1, 
asking the function to ignore the first 1 line(s) of the text file.

Let’s try this on a CSV file – ‘weight_height.csv’:

Weight (in Kg),      height (in cm)
73.847017017515,     241.893563180437
68.7819040458903,    162.310472521300
74.1101053917849,    212.7408555565
…
Now we want to ignore the header line, i.e., the first line of the file:

In [57]:
import numpy as np

x=np.loadtxt("weight_height.csv", delimiter=",", skiprows=1)

print("Data type of the values in the file:-", x.dtype,"\n")

print("Displaying the file contents:-\n\n", x[:5])

Data type of the values in the file:- float64 

Displaying the file contents:-

 [[110.9  146.03]
 [ 44.83 211.82]
 [ 97.13 209.3 ]
 [ 69.87 207.73]
 [ 48.73 158.87]]
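
The effect of skiprows can also be reproduced without an external file. In this sketch the header text and the values are made up for illustration:

```python
import io

import numpy as np

csv_text = "weight_kg,height_cm\n73.8,241.9\n68.8,162.3\n"

# Without skiprows the header line cannot be parsed as floats.
try:
    np.loadtxt(io.StringIO(csv_text), delimiter=",")
    header_failed = False
except ValueError:
    header_failed = True

# skiprows=1 ignores the first line, so only numeric rows remain.
values = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)
print(header_failed)
print(values)
```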


In [None]:
# To display selected column data from the file

In [75]:
import numpy as np
  
# only column 1 data is imported into numpy array from text file

data = np.loadtxt("Columns.txt", usecols=1, skiprows=1, dtype='str')
  
for each in data:
    print(each)

Ankit
Bunty
Rinku
Chaitu
Pandu


In [79]:
# To display multiple columns data from the file

import numpy as np
  
# columns 0 and 1 are imported into a NumPy array from the text file

data = np.loadtxt("Columns.txt", usecols=[0,1], skiprows=1, dtype='str')
  
for each in data:
    print(each)

['1' 'Ankit']
['2' 'Bunty']
['3' 'Rinku']
['4' 'Chaitu']
['5' 'Pandu']
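
usecols can be combined with the `unpack` argument of loadtxt to get each selected column as its own array. The sketch below uses made-up inline data in the same layout as the cells above:

```python
import io

import numpy as np

text = "id name score\n1 Ankit 81.5\n2 Bunty 77.0\n"

# A single column, read as strings.
names = np.loadtxt(io.StringIO(text), usecols=1, skiprows=1, dtype=str)

# Two numeric columns; unpack=True returns one array per column.
ids, scores = np.loadtxt(io.StringIO(text), usecols=(0, 2), skiprows=1, unpack=True)

print(names)
print(ids, scores)
```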


###### numpy genfromtxt()

In [None]:
We use NumPy genfromtxt() to load data from text files.

We can also specify how to handle missing values, if there are any.

numpy.genfromtxt reads input data or a file, 
which may contain various data types, into an array.

**Syntax:-** numpy.genfromtxt(fname, dtype= , comments= , delimiter= , skip_header= , autostrip= )
    
**Parameters:-**

   1. **fname** is the input passed to the genfromtxt function. 
       It can be a filename, a path, a file object, or a list of strings to read.

   2. **dtype** is the data type of the output array. 
      If we pass dtype as ‘None’, the type of each column is determined 
      from the values in that column.

   3. **comments** is the character that marks the start of a comment. 
      Everything after it on a line is discarded.

   4. **delimiter** is the string used to split the input values. 
      For CSV files, we give delimiter as ‘,’ (comma) to separate the values in the file.

   5. **skiprows** is not accepted by genfromtxt in current NumPy releases (it was removed in NumPy 1.10); use skip_header instead.

   6. **skip_header** is the number of lines to skip at the beginning of the file.

   7. **autostrip** automatically strips leading and trailing white space from each value.
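
A short sketch that combines several of these parameters. The inline text, including its first line and the trailing note, is made up for illustration:

```python
import io

import numpy as np

text = (
    "generated by a hypothetical export script\n"
    "Jacob ,  31   # trailing note\n"
    "Scott ,  27\n"
)

rows = np.genfromtxt(
    io.StringIO(text),
    delimiter=",",    # split values on commas
    dtype=str,        # keep every value as a string
    skip_header=1,    # ignore the first line
    comments="#",     # drop everything after '#'
    autostrip=True,   # strip white space around each value
)
print(rows)
```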

In [91]:
# Example 1:-

import numpy as np

arr = np.genfromtxt("data.csv",delimiter=",", dtype=str)

print(arr)

['Jacob Pete Messi' 'Scott John Finn' 'Bob Morely Lincon'
 'Bob1 Morely1 Lincon1' 'Bob2 Morely2 Lincon2' 'Bob3 Morely3 Lincon3'
 'Bob4 Morely4 Lincon4' 'Bob5 Morely5 Lincon5' 'Bob6 Morely6 Lincon6'
 'Bob7 Morely7 Lincon7' 'Bob8 Morely8 Lincon8' 'Bob9 Morely9 Lincon9'
 'Bob10 Morely10 Lincon10' 'Bob11 Morely11 Lincon11'
 'Bob12 Morely12 Lincon12' 'Bob13 Morely13 Lincon13'
 'Bob14 Morely14 Lincon14' 'Bob15 Morely15 Lincon15']


In [92]:
# Example 2:- Space as a delimiter

import numpy as np

arr = np.genfromtxt("data.csv",delimiter=" ", dtype=str)

print(arr)

[['Jacob' 'Pete' 'Messi']
 ['Scott' 'John' 'Finn']
 ['Bob' 'Morely' 'Lincon']
 ['Bob1' 'Morely1' 'Lincon1']
 ['Bob2' 'Morely2' 'Lincon2']
 ['Bob3' 'Morely3' 'Lincon3']
 ['Bob4' 'Morely4' 'Lincon4']
 ['Bob5' 'Morely5' 'Lincon5']
 ['Bob6' 'Morely6' 'Lincon6']
 ['Bob7' 'Morely7' 'Lincon7']
 ['Bob8' 'Morely8' 'Lincon8']
 ['Bob9' 'Morely9' 'Lincon9']
 ['Bob10' 'Morely10' 'Lincon10']
 ['Bob11' 'Morely11' 'Lincon11']
 ['Bob12' 'Morely12' 'Lincon12']
 ['Bob13' 'Morely13' 'Lincon13']
 ['Bob14' 'Morely14' 'Lincon14']
 ['Bob15' 'Morely15' 'Lincon15']]


### Handling Missing Values

In [None]:
One of the strengths of genfromtxt is its ability to handle missing data. 

By default, the genfromtxt() function converts missing values and character strings in numeric columns to nan. 

If we specify dtype as int, it converts missing or other non-numeric values to -1 instead. 

We can also replace these missing values and character strings 
with a specific value of our choice using the filling_values parameter. 

In [172]:
import numpy as np

data1 = np.genfromtxt('Strings.csv', delimiter=',', dtype='str',missing_values = "",filling_values=999)

print(data1)

[['Jacob ' 'Pete ' 'Messi']
 ['Scott ' 'John ' '']
 ['Bob ' 'Morely' ' Lincon']
 ['Bob1 ' 'Morely1 ' 'Lincon1']
 ['Bob2' 'Morely2' 'Lincon2']
 ['Bob3' 'Morely3' '']
 ['Bob4' 'Morely4' 'Lincon4']
 ['Bob5' 'Morely5' 'Lincon5']
 ['Bob6' 'Morely6' 'Lincon6']]


In [167]:
import numpy as np

data = np.genfromtxt('numbers.csv', delimiter=',', dtype='int', missing_values="", filling_values=-1)

print(data)

[[10 20 30]
 [45 55 -1]
 [75 85 95]
 [75 85 -1]
 [75 85 95]
 [75 85 95]]
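
The three behaviours described in this section (nan for floats, -1 for ints, and a custom fill value) can be compared side by side. This sketch uses made-up inline data with two empty fields:

```python
import io

import numpy as np

text = "10,20,30\n45,55,\n75,,95\n"  # two fields are empty

# Default float dtype: missing fields become nan.
as_float = np.genfromtxt(io.StringIO(text), delimiter=",")

# Integer dtype: missing fields become -1.
as_int = np.genfromtxt(io.StringIO(text), delimiter=",", dtype=int)

# filling_values replaces missing fields with a value of our choice.
filled = np.genfromtxt(io.StringIO(text), delimiter=",", filling_values=0)

print(as_float)
print(as_int)
print(filled)
```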


###### Comparing genfromtxt and loadtxt

In [None]:
While both genfromtxt and loadtxt can be used to read CSV files, they have different strengths. 

genfromtxt is more flexible and can handle missing data,
while loadtxt is simpler and faster, but less flexible.

Choose the one that best fits your needs.

If your data is clean and well-structured, loadtxt might be the better choice for its simplicity and speed. 

If your data has missing values or requires more complex handling, genfromtxt would be more suitable.
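
As a rough illustration of this trade-off, the sketch below feeds the same made-up inputs to both readers. It shows only the difference in missing-value handling and makes no claim about speed:

```python
import io

import numpy as np

clean = "1,2\n3,4\n"
gappy = "1,2\n3,\n"  # the last field is missing

# Clean data: loadtxt is enough.
clean_array = np.loadtxt(io.StringIO(clean), delimiter=",")

# Missing data: loadtxt cannot convert the empty field.
try:
    np.loadtxt(io.StringIO(gappy), delimiter=",")
    loadtxt_failed = False
except ValueError:
    loadtxt_failed = True

# genfromtxt fills the gap with nan instead of failing.
gappy_array = np.genfromtxt(io.StringIO(gappy), delimiter=",")

print(loadtxt_failed)
print(gappy_array)
```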