---   
 <img align="left" width="75" height="75"  src="../University_of_the_Punjab_logo.png"> 

<h1 align="center">Department of Data Science</h1>
<h1 align="center">Data Science Journey from Beginners to Expert</h1>

---
<h3><div align="right">Instructor: Engr. Muhammad Nadeem Majeed, Ph.D.</div></h3>     

<h1 align="center">Lecture 3.8 (NumPy-08)</h1>

# _IO Operations with NumPy Arrays.ipynb_

<img align="center" width="600" height="150"  src="images/fileformats.png" >

# Learning agenda of this notebook
1. Reading numeric data from text/CSV files
2. Writing data to files
3. Bonus 1: Downloading a CSV file and processing its data
4. Bonus 2: Loading an image into a NumPy array

In [None]:
# To install NumPy from within a Jupyter notebook, uncomment the two lines below
#import sys
#!{sys.executable} -m pip install numpy

In [None]:
import numpy as np
np.__version__ , np.__path__

## 1. Reading Data from Files
The `np.loadtxt()` function loads data from a text file. Each row in the text file must have the same number of values.
```
np.loadtxt(fname, dtype=float, delimiter=None, skiprows=0, comments='#')
```
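As a quick illustration of these parameters, the sketch below reads a small in-memory CSV. It is an assumption-laden example: `io.StringIO` stands in for a real file path, and the file contents are hypothetical. It also shows `usecols`, an `np.loadtxt()` parameter not listed in the signature above that selects which columns to keep.

```python
from io import StringIO
import numpy as np

# Hypothetical file contents: a header line followed by comma-separated rows
text = StringIO("a,b,c\n1,2,3\n4,5,6\n")

# skiprows=1 skips the header line; usecols=(0, 2) keeps only columns 0 and 2
arr = np.loadtxt(text, delimiter=',', skiprows=1, usecols=(0, 2))
print(arr)        # 2x2 array holding 1, 3 and 4, 6
print(arr.dtype)  # float64, because dtype defaults to float
```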
<br><br>
The `np.genfromtxt()` function is more flexible (for example, it can handle missing values), so we will use it to read data into NumPy arrays.

```
np.genfromtxt(fname, dtype=float, delimiter=None, skip_header=0, comments='#', missing_values=None, filling_values=None)
```
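The main practical difference is how missing entries are treated. The sketch below, using a hypothetical in-memory CSV (`io.StringIO` in place of a file) with one empty field, shows that `np.genfromtxt()` turns the empty field into `nan` by default and into a chosen value when `filling_values` is given. On the same input, `np.loadtxt()` would typically raise a `ValueError`.

```python
from io import StringIO
import numpy as np

# Hypothetical CSV with an empty field in row 2, column 2
csv_text = "1,2,3\n4,,6\n"

# Default behaviour: the missing entry becomes nan (floats only)
arr = np.genfromtxt(StringIO(csv_text), delimiter=',')
print(arr)

# filling_values replaces missing entries with the given value instead
filled = np.genfromtxt(StringIO(csv_text), delimiter=',', filling_values=0)
print(filled)
```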

**Example 1:** Read data from a simple text file containing space-separated numbers on a single line.

In [None]:
!cat datasets/data0.txt

In [None]:
import numpy as np
# The only required argument is the file name; by default, the numbers are cast to the float data type
arr = np.genfromtxt("datasets/data0.txt")
print("data:\n", arr)
print("shape: ", arr.shape)

**Example 2:** Read data from a simple text file containing space-separated numbers on multiple lines. Each line must contain the same number of values.

In [None]:
!cat datasets/data1.txt

In [None]:
# The only required argument is the file name; by default, the numbers are cast to the float data type
arr = np.genfromtxt("datasets/data1.txt")
print("data:\n", arr)
print("shape: ", arr.shape)

In [None]:
# You can read the numbers as integers by passing the dtype argument (np.uint8 holds unsigned integers from 0 to 255)
arr = np.genfromtxt("datasets/data1.txt", dtype=np.uint8)
print("data:\n", arr)
print("shape: ", arr.shape)

**Example 3:** Read data from a CSV text file containing comma-separated numbers. By default, `genfromtxt()` treats whitespace as the separator, so here we need to pass `','` as the `delimiter` argument.

In [None]:
!cat datasets/icecreamsales_simple.csv

In [None]:
arr = np.genfromtxt("datasets/icecreamsales_simple.csv", dtype=np.int16, delimiter=',')
print("data:\n", arr)
print(arr.shape)

**Example 4:** By default, `genfromtxt()` assumes that the first line of the file contains data, not column labels. If the first row of the file contains column labels, use the `skip_header` argument to skip it.

In [None]:
!cat datasets/icecreamsales_withheader.csv

In [None]:
import numpy as np
arr = np.genfromtxt("datasets/icecreamsales_withheader.csv", dtype=np.int16, delimiter=',', skip_header=1)
print("data:\n", arr)
print(arr.shape)

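**Example 5:** A file may contain comments at the beginning, in between, or at the end. The `comments` argument tells `genfromtxt()` which character marks the start of a comment; everything from that character to the end of the line is ignored. Its default is `'#'`, so passing it here mainly documents the intent. If a file uses a different comment character, you must pass that character, otherwise the comment text is parsed as data and the read fails.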

In [None]:
! cat datasets/icecreamsales_withcomments.csv

In [None]:
arr = np.genfromtxt("datasets/icecreamsales_withcomments.csv", dtype=np.int16, delimiter=',', comments='#')
print("data:\n", arr)
print(arr.shape)
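
To see why the `comments` argument matters, here is a minimal sketch with a hypothetical file (held in an `io.StringIO` instead of on disk) that marks comments with `%` rather than `#`. Passing `comments='%'` makes `genfromtxt()` drop both the full-line comment and the trailing inline comment.

```python
from io import StringIO
import numpy as np

# Hypothetical file that uses '%' as its comment character
text = StringIO("% sales data\n10,20\n30,40 % inline note\n")

# Everything after '%' on each line is ignored; the empty comment line is skipped
arr = np.genfromtxt(text, delimiter=',', comments='%', dtype=np.int16)
print(arr)
```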

## 2. Writing Data into Files
The `np.savetxt()` function saves a NumPy array to a text file.
```
np.savetxt(fname, X, fmt='%.18e', delimiter=' ', newline='\n', header='', footer='', comments='# ')
```

   - `fname`: File name or open file handle. If the file name ends in `.gz`, the file is automatically saved in compressed gzip format.
   - `X`: 1-D or 2-D array to be saved to a text file.
   - `fmt`: A single format string applied to every value (e.g. `'%.2f'`), or a sequence of format strings, one per column (default is `'%.18e'`).
   - `delimiter`: String or character separating columns (default is a space).
   - `newline`: String or character separating lines (default is `'\n'`).
   - `header`: String written at the beginning of the file (default is an empty string).
   - `footer`: String written at the end of the file (default is an empty string).
   - `comments`: String prepended to the `header` and `footer` strings to mark them as comments (default is `'# '`).
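
The effect of `header`, `footer`, and the default `comments='# '` prefix is easiest to see by writing to an in-memory text buffer. In this sketch `io.StringIO` stands in for a file name and the array values are arbitrary.

```python
from io import StringIO
import numpy as np

arr = np.array([[1.5, 2.0], [3.25, 4.0]])

# Write to an in-memory buffer instead of a file on disk
buf = StringIO()
np.savetxt(buf, arr, fmt='%.2f', delimiter=',', header='x,y', footer='end of data')

# header and footer are prefixed with the default comments string '# '
print(buf.getvalue())
```

Reading such a file back with `np.genfromtxt()` works without `skip_header`, because the `# ` prefix turns the header and footer into comment lines.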
   
>- `np.save()` saves a single array to a binary file in NumPy `.npy` format.
>- `np.savez()` saves several arrays into a single uncompressed `.npz` archive.
>- `np.savez_compressed()` saves several arrays into a single compressed `.npz` archive.
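A minimal round-trip sketch of the binary formats, with `np.load()` reading them back. To keep it self-contained, `io.BytesIO` buffers stand in for file names such as `'datasets/arr.npy'` (a hypothetical path); with real files you would pass the path instead and skip the `seek(0)` calls.

```python
from io import BytesIO
import numpy as np

a = np.arange(6).reshape(2, 3)
b = np.linspace(0.0, 1.0, 5)

# np.save / np.load: one array in .npy format; dtype and shape are preserved
buf = BytesIO()
np.save(buf, a)
buf.seek(0)
a2 = np.load(buf)

# np.savez_compressed: several arrays stored under keyword names in one .npz archive
zbuf = BytesIO()
np.savez_compressed(zbuf, first=a, second=b)
zbuf.seek(0)
with np.load(zbuf) as archive:
    print(sorted(archive.files))   # the keyword names used when saving
    b2 = archive['second']         # arrays are looked up by name
```

Unlike `np.savetxt()`, these binary formats keep the exact dtype and values, which makes them a better fit for intermediate results that only NumPy needs to read.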

**Example 1:** Create a NumPy array, save it as a text file (space-separated numbers on each row), and then verify the result by reading the file back into a NumPy array.

In [None]:
arr1 = np.array([[1.5, 2.3, 3.7], [4.0, 5.2, 6.8],[7.1, 8.4, 9.3]])
np.savetxt('datasets/myarr.txt', arr1, fmt='%.2f')

In [None]:
!cat datasets/myarr.txt

In [None]:
arr2 = np.genfromtxt("datasets/myarr.txt")
arr2
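
Note that `fmt='%.2f'` rounds every value to two decimals when writing, so a text round trip can lose precision. The sketch below, using arbitrary values and an `io.StringIO` buffer in place of a file, compares that with the default `fmt='%.18e'`, which writes enough digits for `float64` values to be read back exactly.

```python
from io import StringIO
import numpy as np

arr = np.array([[1.2345, 2.5], [3.14159, 4.0]])

# fmt='%.2f' keeps only two decimals
buf = StringIO()
np.savetxt(buf, arr, fmt='%.2f')
buf.seek(0)
rounded = np.genfromtxt(buf)
print(rounded)   # 1.2345 and 3.14159 come back as 1.23 and 3.14

# The default fmt='%.18e' preserves the float64 values
buf = StringIO()
np.savetxt(buf, arr)
buf.seek(0)
exact = np.genfromtxt(buf)
print(np.array_equal(exact, arr))
```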

**Example 2:** Create a NumPy array, save it as a CSV file, and then verify the result by reading the file back into a NumPy array.

In [None]:
arr1 = np.array([[1.5, 2.3, 3.7], [4.0, 5.2, 6.8],[7.1, 8.4, 9.3]])
np.savetxt('datasets/myarr.csv', arr1, fmt='%.2f', delimiter=',')

In [None]:
! cat datasets/myarr.csv

In [None]:
# usecols=[0, 1] reads only the first two columns; omit it to read all three
arr2 = np.genfromtxt("datasets/myarr.csv", usecols=[0, 1], delimiter=',')
arr2

## 3.  Bonus # 1
Visit `https://gist.github.com/arifpucit` and get the URL of the public `climate.csv` file in this GitHub gist. The file contains 10,000 climate measurements (temperature, rainfall, and humidity) in the following format:

```
temperature,rainfall,humidity
25.00,76.00,99.00
39.00,65.00,70.00
59.00,45.00,77.00
84.00,63.00,38.00
66.00,50.00,52.00
41.00,94.00,77.00
91.00,57.00,96.00
49.00,96.00,99.00
67.00,20.00,28.00
...
```

Download the file, read its data, and compute the average temperature, rainfall, and humidity.

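- The `urllib.request.urlretrieve(url, filename=None)` function copies the resource at `url` to a local file. If `filename` is given, the file is saved at that path; otherwise it is saved to a temporary location. It returns a tuple `(filename, headers)`.
- Let us download `climate.csv` from the GitHub gist mentioned above.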

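>**Downloading with `urllib.request` may raise a `URLError` saying `SSL: CERTIFICATE_VERIFY_FAILED`. One workaround is to set the `_create_default_https_context` attribute of `ssl` to `_create_unverified_context`. Note that this disables certificate verification for subsequent HTTPS requests made through `urllib` in this session.**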

In [None]:
import ssl
ssl._create_default_https_context = ssl._create_unverified_context

In [None]:
import urllib.request
# Raw-data URL of the climate.csv file in the GitHub gist
myurl = 'https://gist.githubusercontent.com/arifpucit/6e2d95002460db296506ec6f0cfb7008/raw/dae54a4e20d34e4b9622333fcccf04c441a250b7/climate.csv'

# Pass the URL and the local path where the file should be saved
urllib.request.urlretrieve(myurl, 'datasets/climate.csv')


In [None]:
! cat datasets/climate.csv

In [None]:
import numpy as np
climate_data = np.genfromtxt("datasets/climate.csv", delimiter=',', skip_header=1)
print("Climate Data:\n", climate_data)
print(climate_data.shape)

In [None]:
# Slice data of the temperature column
climate_data[:,0]

In [None]:
# Slice data of the rainfall column
climate_data[:,1]

In [None]:
# Slice data of the humidity column
climate_data[:,2]

In [None]:
# Calculate the Mean of every column
print("Mean Temperature = ", climate_data[:,0].mean())
print("Mean Rainfall = ", climate_data[:,1].mean())
print("Mean Humidity = ", climate_data[:,2].mean())
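
The three per-column means can also be computed in one call with `mean(axis=0)`, which averages down the rows and returns one value per column. The sketch below uses only the first three sample rows shown in the Bonus 1 description, so its numbers are not the means of the full dataset; on the real data the call would be `climate_data.mean(axis=0)`.

```python
import numpy as np

# First three sample rows from the description (temperature, rainfall, humidity)
sample = np.array([[25.0, 76.0, 99.0],
                   [39.0, 65.0, 70.0],
                   [59.0, 45.0, 77.0]])

# axis=0 collapses the rows, leaving one mean per column
col_means = sample.mean(axis=0)
print(col_means)  # [41. 62. 82.]
```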

>- Let us now create a fourth column containing a weighted sum of each row, computed by matrix-multiplying `climate_data` (shape 10000x3) with a vector of three hypothetical weights.

In [None]:
weights = np.array([0.3, 0.2, 0.5])
new_col = np.matmul(climate_data, weights)
new_col
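
To make clear what `np.matmul` computes here, the sketch below applies the same weights to the first four sample rows from the Bonus 1 description and checks the result against the explicit formula `0.3*temperature + 0.2*rainfall + 0.5*humidity` for each row. The resulting values match the `col4` column in the expected `climate_results.csv` output.

```python
import numpy as np

sample = np.array([[25.0, 76.0, 99.0],
                   [39.0, 65.0, 70.0],
                   [59.0, 45.0, 77.0],
                   [84.0, 63.0, 38.0]])
weights = np.array([0.3, 0.2, 0.5])

# Matrix-vector product: (4, 3) @ (3,) -> (4,)
via_matmul = sample @ weights

# Same computation spelled out: scale each column by its weight, then sum each row
via_sum = (sample * weights).sum(axis=1)

print(np.round(via_matmul, 2))  # [72.2 59.7 65.2 56.8]
```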

In [None]:
new_col.shape

Let's add `new_col` to `climate_data` as a fourth column using `np.concatenate`. Since we want to add a new column (i.e., concatenate horizontally), we pass `axis=1` to `np.concatenate`. The `axis` argument specifies the dimension along which the arrays are joined.

In [None]:
# First reshape the 1-D new_col into a 10000x1 column matrix for concatenation
# (-1 lets NumPy infer the number of rows, which is 10000 here)
result_data = new_col.reshape(-1, 1)
result_data, result_data.shape

In [None]:
climate_results = np.concatenate((climate_data, result_data), axis=1)

In [None]:
climate_results

In [None]:
climate_results.shape
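
As an alternative to the reshape-and-concatenate steps above, `np.column_stack` treats a 1-D array as a column, so no reshape is needed. This is a swapped-in technique, not the one used in the cells above; the sketch below checks on a tiny made-up array that both approaches give the same result.

```python
import numpy as np

data = np.array([[25.0, 76.0, 99.0],
                 [39.0, 65.0, 70.0]])
new_col = np.array([72.2, 59.7])   # 1-D, shape (2,)

# column_stack turns the 1-D array into a column automatically
stacked = np.column_stack((data, new_col))

# Equivalent reshape + concatenate approach
concatenated = np.concatenate((data, new_col.reshape(-1, 1)), axis=1)
print(stacked.shape)  # (2, 4)
```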

The results will be written back, in CSV format, to the file `climate_results.csv`, whose contents should look like this:

```
temperature,rainfall,humidity,col4
25.00,76.00,99.00,72.20
39.00,65.00,70.00,59.70
59.00,45.00,77.00,65.20
84.00,63.00,38.00,56.80
...
```



>- Let's write the resulting NumPy array `climate_results` to a new file, `climate_results.csv`, using the `np.savetxt` function.

In [None]:
# comments='' stops savetxt from prefixing the header line with '# '
np.savetxt('datasets/climate_results.csv', 
           climate_results, 
           fmt='%.2f', 
           delimiter=',',
           header='temperature,rainfall,humidity,col4', 
           comments='')

In [None]:
! cat datasets/climate_results.csv
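
Because the file now has a header line, it can be read back with `names=True`, which turns the first line into field names and returns a structured array whose columns are accessed by name. A minimal sketch, assuming a file shaped like `climate_results.csv`; here an `io.StringIO` holding the first two sample rows stands in for the real file.

```python
from io import StringIO
import numpy as np

# Stand-in for the first lines of climate_results.csv
text = StringIO("temperature,rainfall,humidity,col4\n"
                "25.00,76.00,99.00,72.20\n"
                "39.00,65.00,70.00,59.70\n")

# names=True reads field names from the first line
results = np.genfromtxt(text, delimiter=',', names=True)
print(results.dtype.names)  # ('temperature', 'rainfall', 'humidity', 'col4')
print(results['col4'])      # [72.2 59.7]
```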

## 4.  Bonus # 2
Now let us read an image file from disk and load it into a NumPy array for image-processing tasks. (This uses the Pillow library, which provides the `PIL` module.)

In [None]:
from PIL import Image

In [None]:
rgb_img = Image.open("datasets/speech.jpg")

In [None]:
rgb_img.mode

In [None]:
rgb_img.size

When converting a color image to greyscale (mode "L"), Pillow uses the ITU-R 601-2 luma transform:
```
    L = R * 299/1000 + G * 587/1000 + B * 114/1000
```
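The same transform can be written directly in NumPy as a weighted sum over the last (channel) axis of an `(height, width, 3)` array. This is a sketch on a hypothetical 1x4 image of pure red, green, blue, and white pixels; it keeps the floating-point result, whereas Pillow stores integer values in mode "L", so its output can differ by a fraction.

```python
import numpy as np

# Hypothetical 1x4 RGB image (uint8), laid out like np.array(rgb_img)
pixels = np.array([[[255, 0, 0], [0, 255, 0], [0, 0, 255], [255, 255, 255]]],
                  dtype=np.uint8)

# ITU-R 601-2 luma weights for R, G, B
luma_weights = np.array([299, 587, 114]) / 1000

# (1, 4, 3) @ (3,) -> (1, 4): one grey value per pixel
grey = pixels @ luma_weights
print(grey.shape)  # (1, 4)
```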

In [None]:
grey_img = rgb_img.convert('L')

In [None]:
grey_img.mode

In [None]:
grey_img.size

In [None]:
rgb_img

In [None]:
grey_img

#### Let us convert the two images to NumPy arrays

In [None]:
rgb_img_array = np.array(rgb_img)

In [None]:
rgb_img_array.shape

In [None]:
rgb_img_array
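
Note the axis order: Pillow's `size` is `(width, height)`, while the NumPy array's `shape` is `(height, width, channels)` for an RGB image. The sketch below uses a hypothetical 2x3 image built with `np.zeros` to show this and how a single color channel is sliced out.

```python
import numpy as np

# Hypothetical RGB image: height=2, width=3, 3 channels, as np.array(rgb_img) would produce
img = np.zeros((2, 3, 3), dtype=np.uint8)
img[:, :, 0] = 200            # fill the red channel

height, width, channels = img.shape
red = img[:, :, 0]            # 2-D array of red intensities, shape (height, width)
print((width, height))        # (3, 2): the order Pillow's Image.size reports
```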

In [None]:
grey_img_array = np.array(grey_img)

In [None]:
grey_img_array.shape

In [None]:
grey_img_array