# Introduction to Scientific Computing Lecture 7.1

## File I/O (input/output)


In [None]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import xarray as xr

In [None]:
# if I want to change all of my plots:
import matplotlib as mpl
mpl.rcParams['font.size'] = 14
# note this method changes tick mark font sizes as well

## Warm up exercise: 

E1. Plot the height function $H = -100(\mathrm{lat}-40)^2 - 400(\mathrm{lon}+106)^2 + 8000$ over 38°N to 42°N and 108°W to 104°W (that is, lon from $-108$ to $-104$). What mountain range does this roughly approximate?

# File I/O

You have learned a number of ways to visualize data and a few analysis tools. In previous lectures we loaded a few data files, including text files and NetCDF files. To review:

### Text files

In [None]:
# to load a simple text file as a numpy array:
tdata = np.loadtxt('Data/populations.txt')

In [None]:
tdata

If we want to save an array as a text file:

In [None]:
np.savetxt('test.txt', tdata)

## Exercises: 

E2. Go look at the file that was just created. Is it the same as "populations.txt"? If not, how is it different?

E3. Add a header with the names of the variables to this file (see the `np.savetxt` documentation).

For a more complicated example of loading a text file, see: https://scipython.com/book/chapter-6-numpy/examples/using-numpys-loadtxt-method/
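As a smaller illustration of the same idea, here is a minimal sketch of `np.loadtxt` reading a comma-separated file with a header row. The file contents and column names are made up for this example, and `io.StringIO` stands in for a file on disk so the cell runs without any data files.

```python
import io
import numpy as np

# Made-up file contents: one header row, then comma-separated values
text = """year,temp,precip
2001,14.2,310
2002,14.8,295
2003,15.1,342
"""

# delimiter=","  : columns are separated by commas instead of whitespace
# skiprows=1     : skip the header row
# usecols=(0, 1) : keep only the first two columns
# unpack=True    : return one array per column instead of one 2D array
year, temp = np.loadtxt(io.StringIO(text), delimiter=",", skiprows=1,
                        usecols=(0, 1), unpack=True)

print(year)
print(temp)
```

With a real file you would pass the filename in place of `io.StringIO(text)`.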

### NetCDF files

In [None]:
# to load a NetCDF file as an xarray dataset (variables are converted to numpy arrays further down):
file = '/Users/chha5666/Documents/Teaching/Intro_computing/Data/CESM.003.SST.1980.nc'
data = xr.open_dataset(file)

In [None]:
# note data is an xarray dataset, which gives you a bunch of "metadata" about what is in the netcdf file
data

In [None]:
# extract variables from the above netcdf file
lat = np.array(data.lat)
#lat
lon = np.array(data.lon)
# monthly sea surface temperature anomaly
sst = np.array(data.SST)

Interrogating the dataset gives us more detailed information about each variable or coordinate:

In [None]:
data.lon
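If you do not have the SST file at hand, the same kind of interrogation can be tried on a small synthetic dataset. This is only a sketch: the dataset `demo`, its values, and its attributes are invented for illustration and are not taken from the CESM file.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for a gridded dataset (values and attributes are made up)
lat = np.linspace(-10.0, 10.0, 5)
lon = np.linspace(0.0, 30.0, 4)
sst_values = np.zeros((lat.size, lon.size))

demo = xr.Dataset(
    data_vars={"SST": (("lat", "lon"), sst_values, {"units": "degC"})},
    coords={
        "lat": ("lat", lat, {"units": "degrees_north"}),
        "lon": ("lon", lon, {"units": "degrees_east"}),
    },
)

print(demo.lon.attrs)   # metadata attached to the coordinate
print(demo.SST.dims)    # dimension names, in order
print(demo.SST.shape)   # size along each dimension

# .values returns the underlying numpy array, like np.array(demo.lon)
lon_values = demo.lon.values
```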

There are many other types of data, and often several packages can open the same type of data file. For example, xarray can read NetCDF files through the netCDF4 package, which you can also use directly.

In [None]:
# need a package to deal with netcdf files
import netCDF4 as nc

In [None]:
# open the same file as above with netCDF4
data2 = nc.Dataset(file, 'r')
data2
# this is a netcdf4 dataset

In [None]:
data2.variables

In [None]:
data2.dimensions

To get a variable out, we have to do a little more work than with xarray:

In [None]:
sst2 = data2.variables['SST'][:]

## Exercises:

E4. What type of object is "sst2"? Is it different from "sst" we got with xarray and numpy? How?

E5. Make sst2 like sst. Hint: this involves masking.

# Spreadsheets

Many of your data sets will be in Excel files. We have been using xarray for NetCDF files, and you have just seen that it gives you concise and useful information about the contents of a NetCDF file. Xarray is built on pandas, which works similarly for spreadsheets.

In [None]:
import pandas as pd
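As a quick preview, here is a minimal sketch of how a pandas `DataFrame` summarizes a table. The table is built by hand with made-up column names and values so the cell runs without a spreadsheet; the commented-out `pd.read_excel` line uses a hypothetical filename.

```python
import pandas as pd

# Made-up table standing in for one sheet of a spreadsheet
df = pd.DataFrame({
    "site": ["A", "B", "C"],
    "depth_m": [1.0, 2.0, 3.0],
    "temp_C": [12.5, 11.0, 9.5],
})

print(df.head())    # first rows, with column labels
print(df.dtypes)    # data type of each column

# Columns are selected by name, much like variables in an xarray dataset
mean_depth = df["depth_m"].mean()
print(mean_depth)

# For a real spreadsheet (hypothetical filename; requires an Excel engine
# such as openpyxl to be installed):
# df = pd.read_excel("my_data.xlsx", sheet_name=0)
```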

## Exercises:

E6 (rest of lab). Go through this tutorial on Excel and pandas:

https://www.dataquest.io/blog/excel-and-pandas/
    
