# ARROW Python Activity 4.2 Hints, Tips, and Code Snippets


This notebook contains hints, tips and code snippets that you might find useful in completing Activity 4.2.

Generally, many activities like this will require the same workflow:

1. Read in some data.
2. Process the data.
3. Optionally, display the data.
4. Write out the data - usually to a new file.

OPTIONAL: For those of you more confident in Python coding, you could put all of the steps above in a loop, provide the program with a list of all your spectrum file names and process all of them in one go. If you do attempt this, use Bokeh rather than Matplotlib to display anything. For a number of technical reasons, Matplotlib doesn't display multiple, sequential plots well in Jupyter notebooks.
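
If you try the optional loop, it might look something like the sketch below. This is only an outline: `process_all()` and `fake_process()` are made-up names, `fake_process()` stands in for your own read/convert/write code, and "spectra_090.csv" and "notes.txt" are illustrative file names.

```python
def process_all(file_names, process_one):
    """Apply process_one to each file name, collecting outputs and failures."""
    done, failed = [], []
    for name in file_names:
        try:
            done.append(process_one(name))
        except (OSError, ValueError) as err:
            # Record the problem and carry on with the remaining spectra
            failed.append((name, str(err)))
    return done, failed


def fake_process(name):
    """Stand-in (hypothetical) for the real read, convert and write steps."""
    if not name.endswith(".csv"):
        raise ValueError("not a CSV file")
    return name.replace(".csv", "-vel.csv")


done, failed = process_all(
    ["spectra_080.csv", "spectra_090.csv", "notes.txt"], fake_process
)
print("Processed:", done)
print("Skipped:", failed)
```

Collecting the failures in a list, rather than stopping at the first one, means a single bad file name doesn't prevent the rest of your spectra from being processed.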

GENERAL HINT: You'll need to read in the spectrum header lines (there are 12 of them, all starting with '#') using ordinary Python file IO. Later you'll use ordinary file IO to write these to a new file, then APPEND the modified pandas DataFrame to this as comma-delimited data (use pandas .to_csv() with mode='a'). You'll want this information preserved for future use in the Topic.
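
As an alternative to counting a fixed number of lines, here is a minimal sketch that collects the leading lines starting with '#' using plain file IO. The function name `read_header()` and the tiny demo file are made up for illustration; your real spectra have 12 header lines.

```python
import os
import tempfile


def read_header(path, marker="#"):
    """Return the leading lines of the file that start with marker."""
    header_lines = []
    with open(path) as f:
        for line in f:
            if not line.startswith(marker):
                break  # first non-header line reached
            header_lines.append(line)
    return header_lines


# Build a tiny made-up file to demonstrate (not a real spectrum)
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo_spectrum.csv")
with open(demo_path, "w") as f:
    f.write("# telescope: demo\n# date: unknown\nfrequency,intensity\n1420.4e6,1.0\n")

header_lines = read_header(demo_path)
print(len(header_lines), "header lines found")

# Write the header lines to a new file, ready for the data to be appended
out_path = os.path.join(tmp_dir, "demo_spectrum-vel.csv")
with open(out_path, "w") as f:
    f.writelines(header_lines)
```

The lines keep their trailing newline characters, so writelines() reproduces them exactly in the new file.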

HINT: Read in, process and write out the main spectral data using Pandas.

HINT: You can display the spectrum using either Matplotlib or Bokeh. Matplotlib is superficially easier but Bokeh will be more useful later on.

HINT: Review the 'Pandas' and 'Bokeh' notebooks which you first used in Week 1.


## ARROW Data reduction


Here are some tips and bare-bones snippets you might find helpful.

Don't forget the spectrum files will either have to be in the same directory/folder as this notebook or you'll have to specify the path to them.
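
If your spectra live in a different folder, one way to specify the path is with the standard pathlib module. In this sketch the folder name "data" is purely an assumption; replace it with wherever your files actually are.

```python
from pathlib import Path

data_dir = Path("data")  # hypothetical folder name, change to suit
spectrum_path = data_dir / "spectra_080.csv"

# Check the file is where you think it is before trying to read it
if not spectrum_path.is_file():
    print(f"Cannot find {spectrum_path} - check the folder and file name")
```

Both open() and pd.read_csv() accept a Path object in place of a file name string.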

Don't forget to 'import' all the packages/modules you'll need first.


### Step 0 - Functions

Write any functions you might find useful later. This isn't strictly necessary, but will be useful - especially if you later put the simple code into a loop.

In this case let's write one to convert from frequency to radial velocity.

NOTE: the value of passing in a pandas Series (or NumPy array) is that the function will operate over the whole vector at once.
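
For reference, here is one possible form of the conversion, assuming the 'radio' Doppler convention, v = c (f0 - f) / f0, and frequencies in Hz. Both are assumptions, so check them against the Activity text. The name `radio_velocity_kms()` is made up so it isn't confused with the function you are asked to write.

```python
C_M_PER_S = 299792458.0  # speed of light in m/s


def radio_velocity_kms(freq, f0=1420.4e6):
    """Radial velocity in km/s from observed frequency in Hz (radio convention, assumed)."""
    # Dividing by 1000 converts m/s to km/s
    return C_M_PER_S * (f0 - freq) / f0 / 1000.0


print(radio_velocity_kms(1420.4e6))  # at the rest frequency the velocity is zero
print(radio_velocity_kms(1420.0e6))  # lower frequency gives a positive (receding) velocity
```

Because the function only uses ordinary arithmetic operators, passing it a pandas Series or NumPy array instead of a single number returns a whole Series or array of velocities.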

In [None]:
# Function to convert frequency to radial velocity. Normally you would expect this to be placed at the
# start of the program

def freq_to_vel(freq, f0=1420.4e6):
    ''' Takes a frequency value (or pandas DataFrame column or Series) and returns
    a velocity value (or new DataFrame column of values). f0 is the rest
    frequency and defaults to 1420.4 MHz'''

    # We need a value for 'c' - the speed of light. Either just define it here or, more neatly,
    # use the astropy 'constants'
    c = 299792458.0  # m/s

    v = # DO YOUR CALCULATION HERE (remember c is in m/s but v is returned in km/s)
    return v  # (km/s)

This next one is a simple function that takes a file name and returns a new file name that just has "-vel" added into the original name just before the ".".

So if we passed it "spectra_080.csv" it will return "spectra_080-vel.csv"

This just allows you to generate a new name for spectra that have been converted to radial velocities from frequencies. 

You could, of course, just prompt the user for a new name!

In [None]:
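# OPTIONAL A simple function to generate a new name for the velocity/intensity file.
# Could just prompt user for a new name.
import os

def gen_file_name(f_name):
    ''' Just adds "-vel" to the file name before the extension, e.g. ".csv"'''
    # splitext() only splits at the final ".", so dots elsewhere in a path are left alone
    root, ext = os.path.splitext(f_name)
    return root + "-vel" + ext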

### Step 1 - Get the data

In [None]:
import pandas as pd

number_header_lines = 12
file_name = input('Enter spectrum file name ')

# First get the 'header' lines - there are 12 of them.
# You'll add these back on to the modified CSV later.
with open(file_name) as f:
    # Here's a simple, single line to get the 12 header lines - you could use an ordinary loop
    header_lines = [f.readline() for _ in range(number_header_lines)]

# Read in the rest of the data. header=12 skips the 12 header lines and
# uses the line that follows them as the column names.
spectrum_df = pd.read_csv(file_name, header=number_header_lines)

### Step 2 - Process it

Actually, now we've defined our 'freq_to_vel()' function, this is pretty trivial.

There are a couple of steps you'll then need to take to modify the existing DataFrame. This isn't necessary if you're going to use straightforward Python file IO, but it makes writing the file later using Pandas pretty easy.



In [None]:
# Convert frequency to radial velocity values using this function
spectrum_v = freq_to_vel(spectrum_df['frequency'])

# New df with column name changed to reflect unit change (rename() returns a copy)
spectrum_df_v = spectrum_df.rename(columns={'frequency': 'velocity'})
# Replace frequency values with velocity values
spectrum_df_v['velocity'] = spectrum_v


### Optional Activity - Baseline Removal

This activity uses the spectra you might have collected 'off-source'.

You'll need to read these in and find an average of the 'intensity' column. As the frequency column will be the same as in the main spectra, you only need to concern yourself with the 'intensity' column.

The steps will be something like this:

1. Read in the separate background files using Pandas. This will be as above, but you can ignore the header lines completely.
2. Average the 'intensity' columns. Take advantage of the fact that, as with a NumPy 1D array, you can sum a number of Pandas Series (or DataFrame columns) by just using '+'. And you can divide a whole column by a number by just using '/'. See Section 5.3, 'NumPy Arrays', in the "Python Everything You Wanted To Know" resource.
3. Subtract this from the main spectrum 'intensity' column.
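
The three steps above can be sketched as follows. The numbers are made up and simply stand in for two off-source spectra and one on-source spectrum. In practice each DataFrame would come from pd.read_csv(), and the variable names here are only illustrative.

```python
import pandas as pd

# Made-up intensities standing in for spectra read from file
off_1 = pd.DataFrame({"intensity": [1.0, 2.0, 3.0]})
off_2 = pd.DataFrame({"intensity": [3.0, 2.0, 1.0]})
on_source = pd.DataFrame({"intensity": [5.0, 6.0, 7.0]})

# '+' and '/' act element by element on whole columns
baseline = (off_1["intensity"] + off_2["intensity"]) / 2

# Work on a copy so the original spectrum is left untouched
corrected = on_source.copy()
corrected["intensity"] = on_source["intensity"] - baseline
print(corrected["intensity"].tolist())
```

Pandas lines the rows up by index when adding or subtracting Series, so this relies on every spectrum having the same number of rows in the same order.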


### Step 3 - Display it

In [None]:
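from bokeh.io import output_notebook
from bokeh.plotting import figure, show

output_notebook()
p = figure(title="Put your Title here",
           x_axis_label='Velocity (km/s)',
           y_axis_label='Intensity')
# x values are the velocity column, y values are the intensity column
p.line(spectrum_df_v['velocity'], spectrum_df_v['intensity'])
show(p)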


### Step 4 - Finally, write the modified data out to a file.


In [None]:
# The 'file_name' was user input in Step 1
# Here, gen_file_name() is a function we defined earlier which is used to generate a new file
# name. You could just as easily prompt the user for a new name.
new_file_name = gen_file_name(file_name) 

# First write the header lines that we read in earlier to the file
with open(new_file_name, "w") as f:
    f.writelines(header_lines)
    
# Now append the modified csv data
# Note we just 'append' the data - otherwise we would overwrite the header lines
spectrum_df_v.to_csv(new_file_name, index=False, mode='a')