<center><img src="http://www.nasa.gov/sites/all/themes/custom/nasatwo/images/nasa-logo.svg"></center>

<center>
<h1><font size="+3">GSFC Python Bootcamp</font></h1>
</center>

---

<center><h3>File Manipulation and Usage within Science and Engineering Applications</h3></center>

In [0]:
# data downloads for this lesson

import urllib.request

# obtain jpg file from online
url = 'https://blog.lipsumarium.com/assets/img/posts/2017-07-22-caption-memes-in-python/one-does-not-simply-make-a-good-meme-generator-in-python.jpg'
urllib.request.urlretrieve(url, "meme.jpg")

# this file contains a list of Winter Olympic Medals and details
urllib.request.urlretrieve('http://winterolympicsmedals.com/medals.csv', "medals.csv")

# obtain JSON file from online
url = 'https://services.swpc.noaa.gov/json/solar_probabilities.json'
urllib.request.urlretrieve(url, 'probabilities.json')

# 1. The Basics

---

Normally, in an introductory course, one would learn the following operations to read and write plain text/string data:

## 1a. Reading ASCII/Text files

```python
f = open('filename.txt', 'r')
data = f.read() # readline(s) as well
f.close()

with open('filename.txt', 'r') as f:
    data = f.read() # we can also use other modes
```

## 1b. Writing ASCII/Text files

```python
f = open('filename.txt', 'w')
f.write(data) # type(data) == str
f.close()

with open('filename.txt', 'w') as f:
    f.write(data) # writelines() as well
```
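
As a quick check of the patterns above, the following sketch writes a few lines to a temporary file and reads them back. The file name `notes.txt` and the sample lines are made up for illustration.

```python
import os
import tempfile

# hypothetical sample data; any list of strings works
lines = ['alpha\n', 'beta\n', 'gamma\n']

# write into a temporary directory so no existing file is overwritten
path = os.path.join(tempfile.mkdtemp(), 'notes.txt')

with open(path, 'w') as f:
    f.writelines(lines)  # writelines() does not add newlines for you

with open(path, 'r') as f:
    first = f.readline()   # reads one line, including the trailing '\n'
    rest = f.readlines()   # reads the remaining lines into a list

print(repr(first))
print(rest)
```

Note that `readline()` and `readlines()` keep the newline characters, so you may want to call `strip()` on each line before using it.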

## 1c. Binary

```python
with open('filename.bin', 'rb') as f:
    data = f.read() # read without decoding
```

If you have a binary file with mixed data types in a known format, you can use the [`struct`](https://docs.python.org/3/library/struct.html) module to decode the binary data. Image files such as JPG and PNG can be read directly in Python's binary mode, but decoding them by hand is tedious and not a practical way to read image data.

_Warning:_ Be careful about the endianness of your files! (Big or little)
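
To make the endianness warning concrete, here is a minimal sketch using `struct`. The record layout (an unsigned 16-bit integer, a 32-bit float, and a 3-byte tag) is an assumption chosen for illustration, not a real file format.

```python
import struct

# hypothetical record layout: a 2-byte unsigned int, a 4-byte float, a 3-byte tag
record = struct.pack('<Hf3s', 513, 1.5, b'abc')  # '<' = little-endian, no padding

# decode with the matching format string
count, value, tag = struct.unpack('<Hf3s', record)
print(count, value, tag)

# the same two leading bytes interpreted as big-endian give a different number
(wrong_count,) = struct.unpack('>H', record[:2])
print(wrong_count)
```

Reading with the wrong byte order does not raise an error; it silently returns a different value, which is why the byte order should always be stated explicitly in the format string.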

Take for example, the following image:

![meme](https://blog.lipsumarium.com/assets/img/posts/2017-07-22-caption-memes-in-python/one-does-not-simply-make-a-good-meme-generator-in-python.jpg)

In [0]:
with open('meme.jpg', 'rb') as f:
  data = f.read()
  
print(data[:40])

Here, we have read an image file in binary mode, but have not decoded this byte string into numbers, text, or anything else. Packages such as PIL (Python Imaging Library) exist for that purpose; use its maintained fork, [Pillow](https://python-pillow.org/), instead. The more advanced and popular [OpenCV](https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_tutorials.html) is another option for [image manipulation](https://docs.python-guide.org/scenarios/imaging/).

# 2. Other File Types (Standard Packages)

---

Beyond document-based reading and writing of file data, what other types of data are there?

## 2a. CSV

Comma-separated value (CSV) files hold tabular data, much like a spreadsheet, and are widely used in the engineering and financial disciplines. In Python, we can read them directly using the `csv` module:

In [0]:
# this file contains a list of Winter Olympic Medals and details
!head medals.csv

In [0]:
import csv

with open('medals.csv', newline='') as f: # there is also a writer to write csv files
  cr = csv.reader(f)

  for records, row in enumerate(cr): # counter to limit output
    if records == 10:
      break
    print(row)
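
The `csv` module can also write files and map each row to a dictionary keyed by the header. The following is a minimal sketch that uses made-up rows and an in-memory buffer rather than `medals.csv`, so the column names here are assumptions.

```python
import csv
import io

# made-up rows for illustration; they are not taken from medals.csv
rows = [
    {'year': '1924', 'city': 'Chamonix', 'medal': 'Gold'},
    {'year': '1928', 'city': 'St. Moritz', 'medal': 'Silver'},
]

# write to an in-memory buffer; a real file needs open(path, 'w', newline='')
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=['year', 'city', 'medal'])
writer.writeheader()
writer.writerows(rows)

# read it back; DictReader uses the header row as dictionary keys
buffer.seek(0)
records = list(csv.DictReader(buffer))
print(records[0]['city'])
```

Every value comes back as a string, so numeric columns such as `year` need an explicit conversion like `int(row['year'])`.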

We can also use [NumPy](https://github.com/pytrain/numpy/blob/master/IntroNumPy.ipynb) or [Pandas](https://github.com/pytrain/pandas/blob/master/Intro_Pandas.ipynb) or other packages to read this file type.

In [0]:
# NumPy
import numpy as np
year = np.loadtxt('medals.csv', delimiter=',', usecols=(0), unpack=True, skiprows=1)
print(year)

################################################################################

import pandas as pd
# only 10 rows of data will be displayed
pd.set_option("display.max_rows", 10)
# print floating point numbers using fixed point notation
np.set_printoptions(suppress=True)

# Pandas
data = pd.read_csv('medals.csv')
print(data)

## 2b. JSON

JavaScript Object Notation (JSON) is essentially a dictionary, or a list of dictionaries, stored in an ASCII/text file or streamed directly. It is mainly used in web programming and with JavaScript for passing data between websites and the user. As with CSV, Python includes a standard module to read this type of data.

The following data are the solar event probabilities from the Space Weather Prediction Center. These forecasts help scientists determine whether a solar event could damage space-based instruments or affect Earth-based systems such as GPS.

In [0]:
!cat probabilities.json

In [0]:
import json
with open('probabilities.json') as f:
  data = json.load(f) # parse directly from the file object
  
print(data[0])
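
Going the other way, from Python objects to JSON text, works with `json.dumps`. A minimal round-trip sketch follows; the record and its keys are hypothetical and do not follow the SWPC schema.

```python
import json

# hypothetical record; the keys are illustrative, not the SWPC schema
record = {'date': '2020-01-01', 'probability': 5, 'tags': ['solar', 'flare']}

text = json.dumps(record, indent=2)   # Python object -> JSON string
restored = json.loads(text)           # JSON string -> Python object

print(text)
print(restored == record)
```

For files, `json.dump(obj, f)` and `json.load(f)` do the same thing with an open file object instead of a string.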

Or, if we wanted to have some fun, we could continually find the location of the International Space Station:

In [0]:
import json
import urllib.request
import time
import datetime as dt

i = 0
while i < 10:
  response = urllib.request.urlopen("http://api.open-notify.org/iss-now.json")
  obj = json.loads(response.read())
  
  # convert the Unix timestamp to a UTC date string
  t = dt.datetime.fromtimestamp(obj['timestamp'], dt.timezone.utc).strftime('%Y-%m-%d %H:%M:%S')
  
  print('time: ', t, ', position: (',
        obj['iss_position']['latitude'], ' ,', obj['iss_position']['longitude'],
        ')', end='')
  
  time.sleep(5)
  i += 1
  print('\r', end='')

We can also use Pandas to read this file type.

In [0]:
pd.read_json('probabilities.json')

## Exercise

---

I'd like to find out the list of Airbus flights and the properties of each aircraft. Download and read the file below, then plot the points of the aircraft track.

CSV File: https://opensky-network.org/datasets/states/airbus_tree.csv

It's hard to test your knowledge of these packages as they are so simple and reading data is usually a preliminary step for data analysis.

In [0]:
#@title
import pandas as pd

data = pd.read_csv('https://opensky-network.org/datasets/states/airbus_tree.csv')

# plotting
import matplotlib.pyplot as plt

# plot every track point once: longitude on x, latitude on y
plt.plot(data['lon'], data['lat'], '.')
plt.xlabel('lon')
plt.ylabel('lat')

# 3. Other File Types (Non-Standard)

---

Beyond these multi-disciplinary file types, there are other file types that are specific to a research area or data source.


## 3a. Earth Science (HDF-5 / netCDF4)

Due to the nature of the data produced by Earth Science models, one often needs to store time-dependent data in files that can be grouped or organized into a particular hierarchy. HDF-5 is the base file type for this hierarchical data, and netCDF4 is built on top of HDF-5 with a simpler data model.


In [0]:
import netCDF4 as nc

f = nc.Dataset('filename.nc4') # works with netCDF3 file types
print(f.variables)
f.close()

In [0]:
import h5py as h5

f = h5.File('filename.h5', 'r') # should work with netCDF4 files
print(list(f.keys())) # names of the top-level groups and datasets
print(f.attrs)
f.close()

In addition to this hierarchical raw data format for Earth Science data, there are also GIS application data types, such as shapefiles and GeoTIFFs.

In [0]:
# insert to read/manipulate shapefiles & GEOtiffs

## 3b. Space Science (Astronomy, Heliophysics, etc.) - FITS Files

FITS (Flexible Image Transport System) files contain imagery and the metadata associated with that imagery. FITS is a standard data format used within astronomy and is endorsed by [GSFC NASA](http://fits.gsfc.nasa.gov/) and the IAU (International Astronomical Union).

Most FITS files, when opened in a web browser, show an ASCII (human-readable) header giving the details or descriptions of the data contained within the file.

> Sample Files:  
>  
> There are samples within the package AstroPy and some distributed online through GSFC. [Here](http://fits.gsfc.nasa.gov/fits_samples.html) is a link to those samples provided by GSFC.

### Reading a FITS File: Crab Nebula and Pulsar

In [0]:
from astropy.io import fits

# FITS sample file used from Chandra X-Ray Observatory:
# http://chandra.harvard.edu/photo/2009/crab/fits/crab.fits
image_file = fits.open('http://chandra.harvard.edu/photo/2009/crab/fits/crab.fits')

Our image file contains headers and data combined. Let's look at the header information first.

### FITS Headers

In [0]:
image_file[0].header

In [0]:
image_file.info()

In [0]:
image_data = image_file[0].data
print(image_data.shape)

### Plotting with AstroPy

Here, we will use matplotlib in conjunction with AstroPy to visualize this Nebula.

In [0]:
import matplotlib.pyplot as plt
%matplotlib inline
from astropy.visualization import astropy_mpl_style
plt.style.use(astropy_mpl_style)

plt.figure(figsize=(20,10))
plt.imshow(image_data, cmap='gray')
plt.colorbar()

In [0]:
plt.figure(figsize=(20,10))
plt.imshow(image_data, cmap='plasma')
plt.colorbar()

You can also create a FITS file from a NumPy array using the following template:

```python
hdu = fits.PrimaryHDU(new_data)
hdu.writeto('filename.fits')
```

The metadata can be added later, but the `PrimaryHDU` class goes ahead and fills in some of the header for you.

## 3c. Engineering Applications (Signal Processing, Streamed Data, etc.)

Signal processing is one example of an engineering application that takes a specific data format and requires one to manipulate or modify the data in order to produce desired physical quantities. Take, for example, a sample sine wave for audio.

In [0]:
# Generate a sound
import numpy as np
from IPython.display import Audio
import matplotlib.pyplot as plt
%matplotlib inline

framerate = 44100
t = np.linspace(0, 5, framerate*5, endpoint=False) # sample spacing of exactly 1/framerate
data = np.sin(2*np.pi*220*t) # one tone
plt.plot(data)
data = data + np.sin(2*np.pi*224*t) # two tones (two sine waves)
plt.plot(data)
plt.xlim(0,1000)
Audio(data,rate=framerate)
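
In keeping with the file theme of this lesson, a tone like the one above can also be saved to disk as a WAV file. The sketch below uses only the standard library (`wave`, `array`, `math`) instead of NumPy; the half-second duration and the file name `tone.wav` are assumptions made for illustration.

```python
import array
import math
import os
import tempfile
import wave

framerate = 44100      # samples per second, as in the cell above
duration = 0.5         # seconds; kept short for illustration
n_samples = int(framerate * duration)

# 220 Hz sine wave scaled to the signed 16-bit range
samples = array.array('h', (
    int(32767 * math.sin(2 * math.pi * 220 * n / framerate))
    for n in range(n_samples)
))

# hypothetical output file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'tone.wav')
with wave.open(path, 'wb') as w:
    w.setnchannels(1)          # mono
    w.setsampwidth(2)          # 2 bytes = 16 bits per sample
    w.setframerate(framerate)
    w.writeframes(samples.tobytes())

# read the header back to confirm what was written
with wave.open(path, 'rb') as w:
    params = (w.getnchannels(), w.getsampwidth(), w.getframerate(), w.getnframes())

print(params)
```

The resulting file can be played back in the notebook by passing its path to `Audio`.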

In [0]:
# Can also do stereo or more channels
dataleft = np.sin(2*np.pi*220*t)
dataright = np.sin(2*np.pi*224*t)
plt.plot(dataleft)
plt.plot(dataright)
plt.xlim(0,1000)
Audio([dataleft, dataright],rate=framerate)

In [0]:
Audio("http://www.nasa.gov/mp3/574928main_houston_problem.mp3")  # From URL