# ARROW Python Activity 5.2 and 5.3 Hints, Tips, and Code Snippets


This notebook contains hints, tips and code snippets that you might find useful in completing Activities 5.2 and 5.3.

Generally, many activities like this will require the same workflow:

1. Read in some data.
2. Process the data.
3. Optionally, display the data.
4. Write out the data - usually to a new file.

OPTIONAL: For those of you more confident in Python coding, you could put all the activities in a loop, provide the program with a list of all your spectrum file names and process all of them at the same time. If you do attempt this, don't use Matplotlib to display anything; use Bokeh instead. For a number of technical reasons, Matplotlib doesn't display multiple, sequential plots well in Jupyter notebooks.

GENERAL HINT: You'll need to read in the spectrum header lines (there are 12 of them, all starting with '#') using ordinary Python file IO. Later you'll use ordinary file IO to write these to a new file, then APPEND the modified pandas DataFrame to it as comma-delimited data (use pandas .to_csv() with mode='a'). You'll want this information preserved for future use in the Topic.

HINT: Read in, process and write out the main spectral data using Pandas.

HINT: You can display the spectrum using either Matplotlib or Bokeh. Matplotlib is superficially easier but Bokeh will be more useful later on.

HINT: Review the 'UsingPandas', 'UsingBokeh' and 'FileIO' notebooks before you start.



Here are some tips and bare-bones snippets you might find helpful.

Don't forget the spectrum files will either have to be in the same directory/folder as this notebook or you'll have to specify the path to them.

Don't forget to 'import' all the packages/modules you'll need first.


### Step 0 - Imports and Functions

Do your imports - you should be pretty confident with this now. We'll do the first obvious one:


In [1]:
import pandas as pd
# Any others you might need

Now write any functions you might find useful later. This isn't strictly necessary, but will be useful - especially if you later put the simple code into a loop.

First we'll provide you with a function that performs the slightly awkward process of reading in the data file - initially reading (and saving for later) the header lines, and then the actual data. You could, of course, just do this in a block at the start of the code. Do at least try to understand what is going on, though.


In [2]:
def read_ARROW_data(filename):
    """Reads in and partially processes an  ARROW spectrum
    
    The spectrum file contains a number of header lines indicated by '#' or blanks.
    This function splits these from the main data and returns both.
            
    Parameters
    ----------
    filename : str
        Name of the spectrum file
    
    Returns
    -------
    dat : pandas.DataFrame
        Spectrum data
    header_list : list of str
        List of header lines
    """
    
    # Read lines until the first line that does not start with '#', ',' or whitespace.
    # Store these as a list
    header_list=[]
    number_header_lines=0
    dat=None
    with open(filename) as f:
        line = f.readline()
        # 'line and' stops the loop cleanly at end of file (readline() returns '')
        while line and (line[0] == '#' or line[0] == ',' or line[0].isspace()):
            header_list.append(line)
            number_header_lines += 1
            line = f.readline()
        dat = pd.read_csv(filename, header=number_header_lines)

    return dat, header_list
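Here's a quick way to convince yourself the function does what you expect. This is only a sketch: the two-line header and the three data rows are made up (a real ARROW file has 12 header lines), and the function is repeated in condensed form purely so the cell runs on its own.

```python
import os
import tempfile

import pandas as pd


def read_ARROW_data(filename):
    """Condensed copy of the function above, repeated so this cell runs alone."""
    header_list = []
    with open(filename) as f:
        line = f.readline()
        # Collect lines until one does not start with '#', ',' or whitespace
        while line and (line[0] in '#,' or line[0].isspace()):
            header_list.append(line)
            line = f.readline()
    # The first non-header line holds the column names
    dat = pd.read_csv(filename, header=len(header_list))
    return dat, header_list


# Made-up spectrum file: two header lines, column names, three data rows
sample_text = (
    "# Made-up header line 1\n"
    "# Made-up header line 2\n"
    "frequency,intensity\n"
    "-800000,2.322\n"
    "-795000,2.363\n"
    "-790000,2.439\n"
)
sample_path = os.path.join(tempfile.mkdtemp(), "sample_spectrum.csv")
with open(sample_path, "w") as f:
    f.write(sample_text)

sample_df, sample_header = read_ARROW_data(sample_path)
print(len(sample_header))        # number of header lines found
print(list(sample_df.columns))   # column names picked up by pandas
print(sample_df.shape)           # (rows, columns) of the data
```

The header lines keep their trailing newline characters, which is convenient when you come to write them back out later.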

Next, let's write one to convert from frequency to radial velocity. You just need to fill in the actual calculation line.

NOTE: If you pass a pandas Series (or NumPy array) to the function, it will operate over the whole data structure.
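As a quick illustration of that note, here is a sketch using a deliberately different, much simpler function. `hz_to_mhz` is a hypothetical helper invented for this example (it is not part of the activity), and the frequencies are made up.

```python
import pandas as pd


def hz_to_mhz(freq_hz):
    """Hypothetical helper: convert a frequency in Hz to MHz."""
    return freq_hz / 1.0e6


# A single number in, a single number out
single = hz_to_mhz(1420.4e6)

# A whole Series in, a whole Series out - no loop needed
freq_series = pd.Series([1420.0e6, 1420.4e6, 1421.0e6])
mhz_series = hz_to_mhz(freq_series)

print(single)
print(type(mhz_series))
print(len(mhz_series))
```

The same function body handles both cases because pandas (like NumPy) applies the arithmetic to every element.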

In [None]:
# Function to convert frequency to radial velocity. Normally you would expect this to be placed at the
# start of the program

def freq_to_vel(freq, f0=1420.4e6):
    ''' Takes a frequency value (or pandas DataFrame column or Series) and returns
    a velocity value (or new DataFrame column of values). f0 is the rest
    frequency and defaults to 1420.4 MHz'''
    
    # We need a value for 'c' - speed of light. Either just do it here or, neatly, use the 
    # astropy 'constants'
    c = 299792458.0  #m/s
    
    # DO YOUR CALCULATION HERE - probably use km/s for convenience
    v = ...  # placeholder: replace '...' with your calculation
    return v  #(km/s)                          

### Step 1 - Get the data

Now we can start the actual code.

First we'll use our function to get the header lines and the main data

In [3]:
# Prompt the user for a file name (we'll call it file_name)
# You should know how to do this by now
file_name = #?????????

spectrum_df, header_lines = read_ARROW_data(# What goes in here?)

# Display the first few lines - does it look reasonable?
spectrum_df.head(4)


| | frequency | intensity |
|---|---|---|
| 0 | -800000 | 2.322 |
| 1 | -795000 | 2.363 |
| 2 | -790000 | 2.439 |
| 3 | -785000 | 2.446 |


### Step 2 - Baseline Removal

This activity uses the spectra you should have collected 'off-source'.

You'll need to read these in and find the average of the 'intensity' column. As the frequency column will be the same as in the main spectra, you only need to concern yourself with the 'intensity' column.

The steps will be something like this:

1. Read in the separate background files using Pandas. This will be as above, but you can ignore the header lines completely.
2. Average the 'intensity' columns. Take advantage of the fact that, as with a NumPy 1D array, you can sum a number of pandas Series (or DataFrame columns) by just using '+', and you can divide a whole column by a number by just using '/'. See Section 5.3, 'NumPy Arrays', in the "Python Everything You Wanted To Know" resource.
3. Subtract this from the main spectrum 'intensity' column.

You could use our data reading function or just skip the header lines using pandas.

Below is a very clumsy way of doing this using 'hard wired' file names. You should be able to make this more flexible by providing a list of file names (best by reading from a text file you supply) and iterating or looping over this list.


In [None]:
# Read in the background spectra, average and subtract from the spectrum
# Here we use 'hard-wired' file names but you could use a file list, or manually enter them
number_header_lines = 12

bg1 = pd.read_csv('bg1.csv', header=number_header_lines)
bg2 = pd.read_csv('bg2.csv', header=number_header_lines)
bg3 = pd.read_csv('bg3.csv', header=number_header_lines)

# Compute average 'intensity' values
bg_av = (bg1['intensity']+bg2['intensity']+bg3['intensity'])/3
print(type(bg_av))
# Subtract from spectrum 'intensity'
spectrum_df['intensity'] = spectrum_df['intensity']-bg_av.values

spectrum_df.head(4)
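You may wonder why the subtraction above uses `.values`. pandas matches two Series by their index labels rather than by position, so if the two indexes ever differ you get NaN values instead of a subtraction. The sketch below uses made-up numbers and a deliberately mismatched index to show the difference. Whether your own files would ever hit this is an assumption, but `.values` makes the subtraction purely positional either way.

```python
import pandas as pd

# Made-up 'spectrum' whose index labels are 10, 11, 12
spectrum = pd.DataFrame({"intensity": [5.0, 6.0, 7.0]}, index=[10, 11, 12])

# Made-up 'background' with the default index labels 0, 1, 2
background = pd.Series([1.0, 1.0, 1.0])

# Series minus Series: matched by index label, and no labels are shared here
aligned = spectrum["intensity"] - background

# Series minus plain array: matched by position
positional = spectrum["intensity"] - background.values

print(aligned.isna().all())   # are all the label-matched results missing?
print(positional.tolist())    # position-matched results
```

The two operands must have the same length for the positional version to work.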


Here, in outline, is a much better and more flexible way of doing this:

1. Prepare a text file (with a simple text editor - NOT a word processor) containing a list of background files - one file name per line. 
2. Using the .read().splitlines() method demonstrated in the FileIO notebook, section 2.2, produce a Python list of these file names by reading the file - call it, say, 'bg_files'
3. Now produce a Python list of the actual data from each of these files. Here's the sort of code you'll need:


`li = []
for f in bg_files:
    df = pd.read_csv(f, header=12)
    li.append(df)`

4. Now that you have a list containing the data, you can produce an average set of data by using the 'sum' function and then dividing by the number of files - which, of course, is the length of the file list you've produced.

`bg_av = sum(li)/len(li)`

5. Finally you can subtract the 'intensity' values of this from the spectrum_df 'intensity' data as above.

`spectrum_df['intensity'] = spectrum_df['intensity']-bg_av['intensity'].values`
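The five steps above can be sketched end to end as follows. Everything the sketch reads is made up so that it runs on its own: the three background files, their two-line headers (hence `n_header = 2` rather than 12), the list file name `bg_list.txt` and the tiny `spectrum_df` are all assumptions for illustration.

```python
import os
import tempfile

import pandas as pd

tmp_dir = tempfile.mkdtemp()
n_header = 2  # the real ARROW files have 12 header lines

# Create three made-up background files to work with
bg_paths = []
for i, offset in enumerate([0.0, 0.2, 0.4], start=1):
    path = os.path.join(tmp_dir, f"bg{i}.csv")
    with open(path, "w") as f:
        f.write("# made-up header 1\n# made-up header 2\n")
        f.write("frequency,intensity\n")
        f.write(f"-800000,{1.0 + offset}\n")
        f.write(f"-795000,{2.0 + offset}\n")
    bg_paths.append(path)

# Step 1: a text file with one background file name per line
list_path = os.path.join(tmp_dir, "bg_list.txt")
with open(list_path, "w") as f:
    f.write("\n".join(bg_paths) + "\n")

# Step 2: read the file names into a Python list
with open(list_path) as f:
    bg_files = f.read().splitlines()

# Step 3: read each background file into a DataFrame
li = [pd.read_csv(name, header=n_header) for name in bg_files]

# Step 4: average the DataFrames element by element
bg_av = sum(li) / len(li)

# Step 5: subtract the averaged background from a made-up spectrum
spectrum_df = pd.DataFrame({"frequency": [-800000, -795000],
                            "intensity": [3.0, 5.0]})
spectrum_df["intensity"] = spectrum_df["intensity"] - bg_av["intensity"].values

print(bg_av)
print(spectrum_df)
```

Note that `sum(li)` averages every column, including 'frequency'. That's harmless here because the frequency values are identical in every file.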

    

### Step 2 - Process it

Actually, now we've defined our 'freq_to_vel()' function, this is pretty trivial.

There are a couple of steps you'll then need to take to modify the existing DataFrame. This isn't necessary if you're going to use straightforward Python file IO, but it makes writing the file later using pandas pretty easy.



In [None]:
# Convert frequency to radial velocity values using this function
spectrum_v = freq_to_vel(spectrum_df['frequency'])

# Add a new 'velocity' column with these values
spectrum_df['velocity'] = spectrum_v


### Step 3 - Display it

Use Matplotlib or Bokeh. There are UsingMatplotlib and UsingBokeh notebooks to assist.

Don't forget you need to identify 'x' values and 'y' values from your data to pass to Matplotlib or Bokeh. UsingBokeh section 2 should help here.
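If you go the Matplotlib route, a minimal sketch of picking out 'x' and 'y' and plotting them might look like this. The DataFrame `demo_df` and its values are made up, and the axis labels are assumptions, so adjust them to match your own data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this also runs outside a notebook; omit in Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data standing in for your processed spectrum
demo_df = pd.DataFrame({"velocity": [-100.0, -50.0, 0.0, 50.0, 100.0],
                        "intensity": [0.1, 0.4, 1.0, 0.5, 0.1]})

# Identify the 'x' and 'y' values: each is just one DataFrame column
x = demo_df["velocity"]
y = demo_df["intensity"]

# Draw a simple line plot with labelled axes
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_xlabel("Radial velocity (km/s)")
ax.set_ylabel("Intensity")
ax.set_title("Made-up spectrum")
```

The same 'x' and 'y' columns are what you would hand to Bokeh instead if you prefer that.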



In [None]:
# Display code goes here


### Step 4 - Finally, write the modified data out to a file.

1. Prompt for a new file name
2. Use the .writelines() function from FileIO section 2.1 to write the saved header lines to this file.
3. Now APPEND the modified pandas data using the pandas .to_csv() method that is illustrated in UsingPandas section 3. Don't forget to APPEND it or you will overwrite the header lines.

In [None]:
# Prompt for a new file name
new_file_name = #########?

# First write the header lines that we read in earlier to the file.
# Use the .writelines() function from FileIO section 2.1

    
# Now APPEND the modified csv data using the pandas .to_csv() method 
# UsingPandas section 3 should help
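The write-then-append pattern described above can be sketched like this. The header lines, the data values (including the 'velocity' numbers) and the output file name are all made up, and `index=False` is an assumption: it leaves the row index out of the file, which you may or may not want.

```python
import os
import tempfile

import pandas as pd

# Made-up stand-ins for the header lines and data you read in earlier
header_lines = ["# made-up header 1\n", "# made-up header 2\n"]
spectrum_df = pd.DataFrame({"frequency": [-800000, -795000],
                            "intensity": [2.322, 2.363],
                            "velocity": [10.0, 9.0]})

new_file_name = os.path.join(tempfile.mkdtemp(), "processed_spectrum.csv")

# First write the header lines; mode 'w' creates (or overwrites) the file.
# The lines already end in '\n', and writelines() adds none of its own.
with open(new_file_name, "w") as f:
    f.writelines(header_lines)

# Now APPEND the data; mode='a' keeps the header lines already in the file
spectrum_df.to_csv(new_file_name, mode="a", index=False)

# Read the file back to check what was written
with open(new_file_name) as f:
    written = f.read().splitlines()
print(written)
```

Reading the file back, as in the last few lines, is a cheap check that the header survived and the data followed it.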