# Reading data into Python scripts

## Data input
Getting data into and out of our programs is a key component of our scripts.  There are several ways to do this (yet another good/bad aspect of Python).  For this class we will rely on a few key Python packages:
<ol>
    <li>numpy: provides many numerical functions; together with matplotlib and scipy, it offers many of the same features as MATLAB
    <li>scipy: scientific analysis packages
    <li>pandas: package for loading data and creating data frames
    <li>matplotlib: provides many of the MATLAB-style plotting functions
</ol>
Let's start with a simple set of examples.  Suppose we have an ASCII dataset, for example the Honolulu tide gauge data from past classes; here are a few ways to read data like this (and other common formats) into a script.

<ol>
  <li> ASCII
  <ol>
      <li> open and readline (each field is read as a string and must be converted, e.g. with float; nothing is stored in an array)
      <pre>
       # Open file (the with block closes it automatically)
       with open('sample.dat', 'r') as f:
           # Read and ignore header lines
           header1 = f.readline()
           header2 = f.readline()
           # Loop over lines and extract variables of interest
           for line in f:
               line = line.strip()
               columns = line.split()
               month = columns[0]
               temp = float(columns[1])
               print(month, temp)
      </pre>
   <li> numpy loadtxt (this makes an m by n numpy array)
      <pre>
       import numpy as np
       data = np.loadtxt('sample.dat', delimiter=',', comments='#')
      </pre>
   <li> numpy fromfile (this makes a flat 1-D numpy array, so it can be used when the number of columns is not consistent)
      <pre>
       data = np.fromfile('sample2.dat', dtype=float, sep='\t', count=-1)
      </pre>
   <li> numpy fromregex (this makes an m by n numpy array)
      <pre>
       # np.float was removed from recent NumPy releases; use the built-in float
       data = np.fromregex('sample.dat', r'(\d+),\s(\d+)', float)
      </pre>
   <li> numpy genfromtxt (this makes an m by n numpy array)
      <pre>
       data = np.genfromtxt('sample.dat', delimiter=',', skip_header=2)
       # or, if the columns have different types, e.g. a file containing:
       #   1   2.0000  buckle_my_shoe
       #   3   4.0000  margery_door
       data = np.genfromtxt('filename', dtype=None)
       # data is a structured array: [(1, 2.0, 'buckle_my_shoe'), (3, 4.0, 'margery_door')]
      </pre>
   <li> pandas read_table (this makes a pandas.core.frame.DataFrame; by default the first row is used as the column headers, so you may need to set the header argument)
      <pre>
       import pandas as pd
       data = pd.read_table('sample.dat', sep=',')
      </pre>
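   <li> pandas read_csv (this makes a pandas.core.frame.DataFrame; by default the first row is used as the column headers, so you may need to set the header argument)
      <pre>
       # header=1 takes the column names from the second line and skips the line above it
       data = pd.read_csv('sample.dat', header=1)
      </pre>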
   </ol>
<li>	sound (wav) files
  <ol>
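  <li>librosa (returns the samples as a float array and the sampling rate; by default the audio is resampled to 22050 Hz)
     <pre>
      import librosa
      x, sr = librosa.load('sample.wav')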
     </pre>
  <li>wave
     <pre>
       import wave
       wf = wave.open('sample.wav', 'rb')
     </pre>
  </ol>
<li> NetCDF
  <ol>
    <li> netCDF4
       <pre>
          from netCDF4 import Dataset
          fh = Dataset('sample.nc', mode='r')
          time = fh.variables['time'][:]
          lon = fh.variables['lon'][:,:]
          lat = fh.variables['lat'][:,:]
          temp = fh.variables['temp'][:,:]
          # Close the file once the variables have been read
          fh.close()
       </pre>
    <li> xarray
       <pre>
          import xarray as xr
          ds = xr.open_dataset('sample.nc')
          df = ds.to_dataframe()
       </pre>
    </ol>
<li>OPeNDAP
  <ol>
     <li> netCDF4: see above (same as for a local file, but pass the OPeNDAP URL instead of a file name)
     <li> pydap
         <pre>
            from pydap.client import open_url
            import numpy as np
            # set URL from PO.DAAC
            dataset = open_url("http://opendap-uat.jpl.nasa.gov/thredds/dodsC/ncml_aggregation/OceanTemperature/ghrsst/aggregate__ghrsst_DMI_OI-DMI-L4-GLOB-v1.0.ncml")
            lat = dataset.lat[:]
            lon = dataset.lon[:]
            time = dataset.time[:]
            sst = dataset.analysed_sst.array[0]
         </pre>
    </ol>
<li> MATLAB binary (.mat) files
  <ol>
      <li> scipy loadmat
        <pre>
           from scipy.io import loadmat
           fin1 = loadmat('sample.mat',squeeze_me=True)
           mtime = fin1['mday']
           Tair = fin1['ta_h']
           Press = fin1['bpr']
        </pre>
    </ol>
<li>shapefile
  <ol>
    <li> geopandas
       <pre>
          import geopandas as gpd
          shape_gpd = gpd.read_file('sample.shp')
       </pre>
    <li> salem
       <pre>
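          import salem
          # get_demo_file returns the path to one of salem's demo files; for your own file, pass its path to read_shapefile
          shpf = salem.get_demo_file('sample.shp')
          gdf = salem.read_shapefile(shpf)
       </pre>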
   </ol>
</ol>
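
Returning to the first ASCII option (open and readline): the loop there handles one line at a time, so to keep the whole dataset you need to collect the values as you go. The following is a minimal sketch using only the standard library. The file contents, the `SAMPLE_TEXT` variable, and the `read_columns` helper are hypothetical and made up for illustration; it assumes a whitespace-delimited file with two header lines, a text column, and a numeric column.

```python
import os
import tempfile

# Hypothetical sample contents: two header lines, then "month temperature" rows.
SAMPLE_TEXT = """Monthly mean temperature (made-up values)
month temp_C
Jan 23.1
Feb 23.4
Mar 24.0
"""


def read_columns(path, n_header=2):
    """Return (months, temps) lists parsed from a whitespace-delimited file."""
    months, temps = [], []
    with open(path, "r") as f:
        # Read and ignore the header lines
        for _ in range(n_header):
            f.readline()
        # Loop over the remaining lines and collect the columns of interest
        for line in f:
            columns = line.split()
            if not columns:  # skip blank lines
                continue
            months.append(columns[0])         # kept as a string
            temps.append(float(columns[1]))   # converted from string to float
    return months, temps


# Write the sample to a temporary file so the sketch is self-contained
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "sample.dat")
    with open(path, "w") as f:
        f.write(SAMPLE_TEXT)
    months, temps = read_columns(path)

print(months)
print(temps)
```

The resulting lists can be passed to `np.array` if an array is needed; for purely numeric files, `np.loadtxt` or `np.genfromtxt` will usually be the shorter route.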