## Importing Data
Data can come from a text file (e.g., CSV), a compressed file (`.gz` or `.bz2`), or a text string.  
Two functions can read it: `genfromtxt` and `loadtxt`. `genfromtxt` can take ***missing data*** into account, but `loadtxt` cannot.
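Here is a minimal sketch of that difference. It assumes a reasonably recent NumPy, where both readers accept a `StringIO`, and uses a made-up two-row input. An empty field becomes `nan` with `genfromtxt` but makes `loadtxt` raise a `ValueError`:

```python
import numpy as np
from io import StringIO

# Made-up data: the second row has an empty (missing) middle field
data = "1,2,3\n4,,6"

# genfromtxt treats the empty field as missing and fills it with nan (default float dtype)
filled = np.genfromtxt(StringIO(data), delimiter=",")
print(filled)

# loadtxt has no notion of missing data, so the empty field fails to convert
try:
    np.loadtxt(StringIO(data), delimiter=",")
    loadtxt_failed = False
except ValueError as err:
    loadtxt_failed = True
    print("loadtxt raised ValueError:", err)
```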

[Importing data with `genfromtxt`](https://docs.scipy.org/doc/numpy/user/basics.io.genfromtxt.html#defining-the-input)

```python
import numpy as np
from io import BytesIO, StringIO
```

## Simple CSV data
`BytesIO()` takes bytes, so in Python 3 the string must be converted with `encode()` first. (Recent NumPy versions also accept a `StringIO` object directly.)

```python
data = "1, 2, 3\n4, 5, 6"
x = np.genfromtxt(BytesIO(data.encode()), delimiter=",")
print(x)
```

```
[[ 1.  2.  3.]
 [ 4.  5.  6.]]
```
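Compressed input works the same way when it comes from a file path: `genfromtxt` looks at the `.gz`/`.bz2` extension and decompresses before parsing. Below is a small sketch that writes a throwaway file, `sample.csv.gz` (a hypothetical name), into a temporary directory:

```python
import gzip
import os
import tempfile

import numpy as np

data = "1, 2, 3\n4, 5, 6"

with tempfile.TemporaryDirectory() as tmpdir:
    # Write the CSV text to a hypothetical gzip-compressed file
    path = os.path.join(tmpdir, "sample.csv.gz")
    with gzip.open(path, "wt") as fh:
        fh.write(data)

    # genfromtxt sees the .gz extension and decompresses the file transparently
    x_gz = np.genfromtxt(path, delimiter=",")

print(x_gz)
```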


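## Fixed-width columns

An integer `delimiter` gives every column the same width in characters. A tuple gives the width of each column in turn.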

```python
data = "  1  2  3\n  4  5 67\n890123  4"
x = np.genfromtxt(BytesIO(data.encode()), delimiter=3)  # every column is 3 characters wide
print(x, '\n')

data = "123456789\n   4  7 9\n   4567 9"
x = np.genfromtxt(BytesIO(data.encode()), delimiter=(4, 3, 2))  # column widths: 4, 3 and 2 characters
print(x)
```

```
[[   1.    2.    3.]
 [   4.    5.   67.]
 [ 890.  123.    4.]]

[[ 1234.   567.    89.]
 [    4.     7.     9.]
 [    4.   567.     9.]]
```
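To make the width rule concrete, the plain-Python sketch below cuts each line at the cumulative offsets of `(4, 3, 2)`. It only illustrates the idea. The helper `split_fixed` is hypothetical and is not NumPy's internal code. Its result is compared with `genfromtxt` on the same string:

```python
from io import BytesIO
from itertools import accumulate

import numpy as np


def split_fixed(line, widths):
    """Hypothetical helper: cut `line` into float fields of the given character widths."""
    ends = list(accumulate(widths))  # (4, 3, 2) -> [4, 7, 9]
    starts = [0] + ends[:-1]         # -> [0, 4, 7]
    return [float(line[s:e]) for s, e in zip(starts, ends)]


data = "123456789\n   4  7 9\n   4567 9"
manual = np.array([split_fixed(line, (4, 3, 2)) for line in data.splitlines()])
print(manual)

# The same widths handed to genfromtxt
x = np.genfromtxt(BytesIO(data.encode()), delimiter=(4, 3, 2))
print(np.array_equal(manual, x))  # expected: True
```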


## Comments

```python
data = """#
    # Skip me !
    # Skip me too !
    1, 2
    3, 4
    5, 6 #This is the third line of the data
    7, 8
    # And here comes the last line
    9, 0
    """

x = np.genfromtxt(BytesIO(data.encode()), comments="#", delimiter=",")
print(x)
```

```
[[ 1.  2.]
 [ 3.  4.]
 [ 5.  6.]
 [ 7.  8.]
 [ 9.  0.]]
```
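The comment marker can be changed through `comments`. The short sketch below uses made-up data with `%` in place of `#`. Text from the marker to the end of the line is dropped, whether the comment fills the whole line or trails the values:

```python
from io import StringIO

import numpy as np

# Made-up data using '%' as the comment marker
data = "% header comment\n1, 2\n3, 4 % trailing comment\n% last comment\n5, 6"

x = np.genfromtxt(StringIO(data), comments="%", delimiter=",")
print(x)
```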


## Skip header/footer

```python
data = """Header Line 1
    Header Line 2
    1, 2, 3
    4, 5, 6
    Footer Line 
    """
x = np.genfromtxt(BytesIO(data.encode()), skip_header=2, skip_footer=1, delimiter=",")
print(x)
```

```
[[ 1.  2.  3.]
 [ 4.  5.  6.]]
```
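If you know the number of data rows in advance, `max_rows` is an alternative to `skip_footer`. NumPy rejects passing both in the same call. The sketch below runs on similar input and assumes a NumPy version that supports `max_rows`:

```python
from io import StringIO

import numpy as np

data = "Header Line 1\nHeader Line 2\n1, 2, 3\n4, 5, 6\nFooter Line\n"

# Skip two header lines, then read exactly two rows; the footer line is never parsed
x = np.genfromtxt(StringIO(data), skip_header=2, max_rows=2, delimiter=",")
print(x)
```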


## Selecting columns

```python
data = "1 2 3\n4 5 6"
x = np.genfromtxt(BytesIO(data.encode()), usecols=(0, 2))
print(x)
print(x.shape)
```

```
[[ 1.  3.]
 [ 4.  6.]]
(2, 2)
```
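`usecols` also accepts negative indices, counted from the last column as in ordinary Python indexing. This sketch selects the first and last columns of the same data:

```python
from io import StringIO

import numpy as np

data = "1 2 3\n4 5 6"

# -1 refers to the last column, so this picks columns 0 and 2
x_neg = np.genfromtxt(StringIO(data), usecols=(0, -1))
print(x_neg)
```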

```python
# Assign names to the columns and select columns by name
# With names given, the result is a structured array whose fields are accessed by name
data = "1 2 3\n4 5 6"
names = ["a", "b", "c"]
x = np.genfromtxt(BytesIO(data.encode()), names=names, usecols=("a", "c"))
print('returned=', x)
print('shape=', x.shape)
print('x["a"] =', x["a"])
```

```
returned= [( 1.,  3.) ( 4.,  6.)]
shape= (2,)
x["a"] = [ 1.  4.]
```
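If the file already stores the column names, `names=True` reads them from the first line (after any `skip_header` lines), so no list needs to be passed. Here is a sketch with a made-up header row:

```python
from io import StringIO

import numpy as np

# Made-up data whose first line holds the column names
data = "a b c\n1 2 3\n4 5 6"

x = np.genfromtxt(StringIO(data), names=True)
print(x.dtype.names)
print(x["c"])
```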


## Column names and types via `dtype`

```python
data = "1, 1.2, 4\n2, 2.4, 6"
dtype = [('a', '<i8'), ('b', '<f8'), ('c', '<i8')]
x = np.genfromtxt(BytesIO(data.encode()), dtype=dtype, delimiter=',')
print('returned=', x)
print('x["a"]=', x["a"])
print('x["b"]=', x["b"])
```

```
returned= [(1,  1.2, 4) (2,  2.4, 6)]
x["a"]= [1 2]
x["b"]= [ 1.2  2.4]
```
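Instead of writing out the `dtype`, you can pass `dtype=None` to let `genfromtxt` guess each column's type. The fields then get the default names `f0`, `f1`, `f2`. Here is a sketch on the same data:

```python
from io import StringIO

import numpy as np

data = "1, 1.2, 4\n2, 2.4, 6"

# dtype=None: the type of each column is inferred from its contents
x = np.genfromtxt(StringIO(data), dtype=None, delimiter=",")
print(x.dtype)
print(x["f1"])
```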


## Custom Converter

```python
# Converter: turn a percentage field such as " 2.3%" into a ratio (0.023).
# With genfromtxt's default encoding, each field arrives as bytes, hence decode().
convertfunc = lambda x: float(x.decode().strip("%")) / 100.

data = "1, 2.3%, 45.\n6, 78.9%, 0"
names = ("i", "p", "n")

# Apply convertfunc to the column named "p" only
x = np.genfromtxt(BytesIO(data.encode()), delimiter=",", names=names, converters={"p": convertfunc})
print(x)
```

```
[( 1.,  0.023,  45.) ( 6.,  0.789,   0.)]
```
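The sketch below is a slightly more defensive converter. It accepts either bytes or `str` (converters receive `str` when `encoding` is set explicitly), and the named function also sidesteps assigning a lambda. The helper name `percent_to_ratio` is made up for illustration:

```python
from io import StringIO

import numpy as np


def percent_to_ratio(field):
    """Hypothetical converter: ' 2.3%' -> 0.023, accepting bytes or str."""
    if isinstance(field, bytes):
        field = field.decode()
    return float(field.strip().rstrip("%")) / 100.0


data = "1, 2.3%, 45.\n6, 78.9%, 0"
x = np.genfromtxt(StringIO(data), delimiter=",", names=("i", "p", "n"),
                  converters={"p": percent_to_ratio})
print(x["p"])
```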


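## Handling missing values, with parameters passed as a dict

`missing_values` and `filling_values` accept dicts keyed by column index or by column name.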

```python
data = "N/A, 2, 3\n4, ,???"

# Column 0 ('a'): "N/A" -> 0
# Column 'b': " " (blank) -> 0
# Column 2 ('c'): "???" -> -999
kwargs = dict(delimiter=",",
              dtype=int,
              names="a,b,c",
              missing_values={0: "N/A", 'b': " ", 2: "???"},
              filling_values={0: 0, 'b': 0, 2: -999})
x = np.genfromtxt(BytesIO(data.encode()), **kwargs)
print(x)
```

```
[(0, 2,    3) (4, 0, -999)]
```
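`usemask=True` is an alternative to substituting fill values. It returns a `numpy.ma.MaskedArray` whose mask marks the missing entries, so later computations can skip them. Here is a minimal sketch with a made-up blank field:

```python
from io import StringIO

import numpy as np

data = "1, 2, 3\n4, , 6"

# The blank field is flagged in the mask instead of being replaced
x = np.genfromtxt(StringIO(data), delimiter=",", usemask=True)
print(x)
print(x.mask)
print(x.sum())  # masked entries are ignored
```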


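## Sister functions of `genfromtxt`
- `ndfromtxt`: `genfromtxt` with `usemask=False` (the output is always a standard `numpy.ndarray`)
- `mafromtxt`: `genfromtxt` with `usemask=True` (the output is always a `MaskedArray`)
- `recfromtxt`: defaults to `dtype=None`, i.e., tries to determine the type of each column automatically, and returns a record array
- `recfromcsv`: like `recfromtxt`, but with `delimiter=","`; it also reads field names from the header and lower-cases them

`ndfromtxt` and `mafromtxt` were deprecated in NumPy 1.17, and `recfromtxt` and `recfromcsv` were removed from the main namespace in NumPy 2.0. Calling `genfromtxt` with the matching arguments works across versions.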
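You can get the same results by calling `genfromtxt` directly. For example, the sketch below approximates a `recfromcsv`-style read. It assumes NumPy 1.14 or later for the `encoding` argument, which is set here so the text column comes back as `str`. The made-up CSV has a capitalized header to show the lower-casing:

```python
from io import StringIO

import numpy as np

data = "Gender,Age\nM,21\nF,35"

# Comma delimiter, names from the header (lower-cased), types guessed per column
x = np.genfromtxt(StringIO(data), delimiter=",", names=True, dtype=None,
                  case_sensitive="lower", encoding="utf-8")

# View as a record array so fields are also reachable as attributes
rec = x.view(np.recarray)
print(rec.gender, rec.age)
```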

___
## Simple loadtxt

```python
data = "0 1\n2 3"
x = np.loadtxt(StringIO(data))
print(x)
```

```
[[ 0.  1.]
 [ 2.  3.]]
```
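A common companion option is `unpack=True`. It transposes the result, so each column can be assigned to its own variable. Here is a sketch on the same input:

```python
from io import StringIO

import numpy as np

data = "0 1\n2 3"

# unpack=True returns the columns, not the rows
first_col, second_col = np.loadtxt(StringIO(data), unpack=True)
print(first_col, second_col)
```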


## Mixed column types in `loadtxt`

```python
data = "M 21 72\nF 35 58"
dtype = {'names':   ('gender', 'age', 'weight'),
         'formats': ('S1',     'i4',  'f4')}
x = np.loadtxt(StringIO(data), dtype=dtype)
print(x)
print(x["age"])
```

```
[(b'M', 21,  72.) (b'F', 35,  58.)]
[21 35]
```
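The `S1` format stores a single byte, which is why `gender` prints as `b'M'`. The sketch below does the same read with `U1` (a one-character unicode string) and returns plain `str` values. It assumes a reasonably recent NumPy:

```python
from io import StringIO

import numpy as np

data = "M 21 72\nF 35 58"

# 'U1' instead of 'S1' yields str rather than bytes for the gender column
dtype = {"names": ("gender", "age", "weight"),
         "formats": ("U1", "i4", "f4")}
x = np.loadtxt(StringIO(data), dtype=dtype)
print(x["gender"])
```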
