# Reading Data from Files, Writing Data to Files

We have learned about the different data types in Python (integers, floats, booleans, strings) and the different containers that can hold data, whether these are native Python containers (lists, tuples, dictionaries) or `numpy` arrays. We will often need to read data from a file into these containers, or write the data in these containers to a file. This is a very common task in scientific computing, and it is the subject of this section.

**Text files** store data as text, meaning sequences of characters (letters, digits, punctuation) that make sense to a human. In Python, text is represented by strings; we learned about the string (`str`) data type in the Week 2 lecture. In contrast, **binary files**, which are used for large data outputs or executable programs, store data in a form that is meant to be read by a computer program rather than by a human.
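From Python's side, the distinction shows up in the mode used to open a file. The minimal sketch below (the file name `example.txt` is an arbitrary assumption, created in a temporary directory) writes one line of text, then reads it back in text mode, which gives a `str`, and in binary mode (`'rb'`), which gives the raw `bytes` stored on disk.

```python
import os
import tempfile

# Hypothetical example file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'example.txt')

# Write one line of text in text mode ('w')
with open(path, 'w') as f:
    f.write("Jmag = 9.453\n")

# Text mode ('r'): the contents come back as a str
with open(path, 'r') as f:
    as_text = f.read()

# Binary mode ('rb'): the same contents come back as raw bytes
# (on Windows the newline is stored on disk as b'\r\n')
with open(path, 'rb') as f:
    as_bytes = f.read()

print(type(as_text), repr(as_text))
print(type(as_bytes), repr(as_bytes))
```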

There are several ways to read and write text files in Python. We will cover the most common ones in turn:
- Reading and writing files line-by-line
- Reading and writing files directly into arrays using the `numpy` library
- Reading and writing files using the `pandas` library

There are of course many other ways to read and write files in Python, but these are arguably the most common and useful ones. 

We have already talked about several built-in Python data types (lists, tuples, dictionaries), but there are others that we have not discussed. One of these is the file object, returned by the built-in `open()` function, which is used to read from or write to files.

## The Data

The file `data/data_2MASS.txt` contains the magnitudes of some stars from the 2MASS astronomical sky survey.

``` text 
# Magnitudes of some stars from the 2MASS astronomical sky survey. 
# Coordinates are in decimal degrees in the J2000 equinox.
#  RA          DEC          Name          Jmag   e_Jmag
# (deg)       (deg)                       (mag)   (mag)
010.684737 +41.269035 J00424433+4116085   9.453  0.052
010.683469 +41.268585 J00424403+4116069   9.321  0.022
010.685657 +41.269550 J00424455+4116103  10.773  0.069
010.686026 +41.269226 J00424464+4116092   9.299  0.063
010.683465 +41.269676 J00424403+4116108  11.507  0.056
010.686015 +41.269630 J00424464+4116106   9.399  0.045
010.685270 +41.267124 J00424446+4116016  12.070  0.035
```

The lines that start with `#` are comments that make up the file **header**. A header provides descriptive information or metadata (like units) about what is stored in the file and how it is arranged, but does not constitute the data itself. Below the header is a table of data with 5 columns. The first two columns contain floating point numbers (floats): the coordinates of each star (right ascension and declination) in decimal degrees. The third column is a string, the name of the star. The fourth and fifth columns also contain floats, which represent the $J$-band magnitude of the star (`Jmag`) and the error on this measurement (`e_Jmag`). In astronomy, the magnitude of a star is a logarithmic measure of its brightness, $m = -2.5\log_{10}(F/F_0)$, where $F$ is the measured flux and $F_0$ a reference flux, so brighter stars have *smaller* magnitudes. $J$ denotes the astronomical filter (a near-infrared band) that was used for the measurements.
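Because magnitudes are defined through $m_1 - m_2 = -2.5\log_{10}(F_1/F_2)$, a difference in magnitude corresponds to a ratio of fluxes. As a small worked example, the sketch below takes the brightest (`Jmag` = 9.299) and faintest (`Jmag` = 12.070) stars in the table and inverts this relation to get their flux ratio.

```python
# J-band magnitudes of the brightest and faintest stars in the table above
m_bright = 9.299
m_faint = 12.070

# Inverting m1 - m2 = -2.5 * log10(F1 / F2) gives the flux ratio F_bright / F_faint
flux_ratio = 10 ** (0.4 * (m_faint - m_bright))
print(f"The brightest star has about {flux_ratio:.1f} times the J-band flux of the faintest")
```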

## Reading Files Line-by-Line

Let's access the contents of the data file in Python. We start by creating a file object with the built-in `open()` function:

```python
f2MASS = open('data/data_2MASS.txt', 'r')
```

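The `open()` function takes the path of the file, `data/data_2MASS.txt`, and a mode (`'r'` for reading), opens the file, and returns a file object (which we call `f2MASS`) that can then be used to access the data. A file opened this way should be closed with `f2MASS.close()` once we are done with it; the `with` statement avoids having to remember this, because it closes the file automatically at the end of the block.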
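A minimal sketch of the most common file-object methods. To keep it self-contained, it first writes a small stand-in for `data/data_2MASS.txt` (the name `sample_2MASS.txt` and its three lines are assumptions for illustration) to a temporary directory, then reads it back with `readline()`, `readlines()` and `read()`.

```python
import os
import tempfile

# A small stand-in for data/data_2MASS.txt, written to a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'sample_2MASS.txt')
with open(path, 'w') as f:
    f.write("# RA DEC Name Jmag e_Jmag\n")
    f.write("010.684737 +41.269035 J00424433+4116085   9.453  0.052\n")
    f.write("010.683469 +41.268585 J00424403+4116069   9.321  0.022\n")

f = open(path, 'r')
first_line = f.readline()   # one line, including its trailing '\n'
remaining = f.readlines()   # all remaining lines, as a list of strings
f.close()                   # release the file when done

# A 'with' block closes the file automatically, even if an error occurs
with open(path, 'r') as f:
    contents = f.read()     # the whole file as a single string

print(repr(first_line))
print(len(remaining), f.closed)
```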



In practice, it is convenient to loop over the lines of the file inside a `with` block. Each line is split on whitespace into its columns, and each column is converted to the appropriate type and appended to a list:

```python
import numpy as np

# Initialize empty lists, one per column
raj_list = []
dej_list = []
name_list = []
jmag_list = []
e_jmag_list = []

# Open the file; the 'with' block closes it automatically when done
with open('data/data_2MASS.txt', 'r') as file:
    # Read and parse the file one line at a time
    for line in file:
        # Skip comment (header) lines and blank lines
        if line.startswith('#') or not line.strip():
            continue

        # Split the line into columns on whitespace
        columns = line.split()

        # Convert each column to the right type and append it to its list
        raj_list.append(float(columns[0]))
        dej_list.append(float(columns[1]))
        name_list.append(columns[2])
        jmag_list.append(float(columns[3]))
        e_jmag_list.append(float(columns[4]))

# Convert the lists to numpy arrays
raj = np.array(raj_list)
dej = np.array(dej_list)
two_mass = np.array(name_list)
jmag = np.array(jmag_list)
e_jmag = np.array(e_jmag_list)

# Print the arrays
print('RAJ:', raj)
print('DEJ:', dej)
print('2MASS:', two_mass)
print('Jmag:', jmag)
print('e_Jmag:', e_jmag)
```

```text
RAJ: [10.684737 10.683469 10.685657 10.686026 10.683465 10.686015 10.68527 ]
DEJ: [41.269035 41.268585 41.26955  41.269226 41.269676 41.26963  41.267124]
2MASS: ['00424433+4116085' '00424403+4116069' '00424455+4116103'
 '00424464+4116092' '00424403+4116108' '00424464+4116106'
 '00424446+4116016']
Jmag: [ 9.453  9.321 10.773  9.299 11.507  9.399 12.07 ]
e_Jmag: [0.052 0.022 0.069 0.063 0.056 0.045 0.035]
```
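The loop above can also be written with one list of rows instead of five separate lists, transposing the rows into columns at the end with `zip(*rows)`. A sketch of this alternative, where `io.StringIO` stands in for the open file (the two data lines are copied from the file shown earlier, with an extra blank line to show it being skipped):

```python
import io
import numpy as np

# io.StringIO stands in for an open file: it can be iterated line by line in the same way
sample_file = io.StringIO(
    "# RA DEC Name Jmag e_Jmag\n"
    "010.684737 +41.269035 J00424433+4116085   9.453  0.052\n"
    "\n"
    "010.683469 +41.268585 J00424403+4116069   9.321  0.022\n"
)

rows = []
for line in sample_file:
    # Skip header lines and blank lines
    if line.startswith('#') or not line.strip():
        continue
    # Unpack the five whitespace-separated columns and convert the numeric ones
    ra, dec, name, j, e_j = line.split()
    rows.append((float(ra), float(dec), name, float(j), float(e_j)))

# zip(*rows) transposes the list of rows into one tuple per column
ra_col, dec_col, name_col, j_col, ej_col = map(np.array, zip(*rows))
print(ra_col, dec_col, name_col)
```

Unpacking each line into named variables also fails loudly (with a `ValueError`) if a line does not have exactly five columns, which helps catch malformed files.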


## Reading Files with `numpy`

`np.loadtxt` can read the whole table in a single call. Because the columns have different types, we pass a structured `dtype` that gives each column a name and a format (`'f8'` is a 64-bit float, `'U20'` a Unicode string of up to 20 characters; an `'S20'` format would instead give byte strings, printed with a `b` prefix). Lines starting with `#` are treated as comments and skipped by default.

```python
import numpy as np

# Read the whole table at once; lines starting with '#' are skipped by default
data = np.loadtxt('data/data_2MASS.txt',
                  dtype={'names': ('RAJ', 'DEJ', '2MASS', 'Jmag', 'e_Jmag'),
                         'formats': ('f8', 'f8', 'U20', 'f8', 'f8')})

# Extract the columns by name
raj = data['RAJ']
dej = data['DEJ']
two_mass = data['2MASS']
jmag = data['Jmag']
e_jmag = data['e_Jmag']

# Print the arrays
print('RAJ:', raj)
print('DEJ:', dej)
print('2MASS:', two_mass)
print('Jmag:', jmag)
print('e_Jmag:', e_jmag)
```

```text
RAJ: [10.684737 10.683469 10.685657 10.686026 10.683465 10.686015 10.68527 ]
DEJ: [41.269035 41.268585 41.26955  41.269226 41.269676 41.26963  41.267124]
2MASS: ['00424433+4116085' '00424403+4116069' '00424455+4116103'
 '00424464+4116092' '00424403+4116108' '00424464+4116106'
 '00424446+4116016']
Jmag: [ 9.453  9.321 10.773  9.299 11.507  9.399 12.07 ]
e_Jmag: [0.052 0.022 0.069 0.063 0.056 0.045 0.035]
```
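When only some columns are needed, `np.loadtxt` can select them with `usecols`, and `unpack=True` returns one array per column instead of a single 2D array. A sketch on two lines of the data (held in an `io.StringIO`, which `np.loadtxt` accepts in place of a file name):

```python
import io
import numpy as np

# io.StringIO stands in for the data file; np.loadtxt accepts file-like objects
sample_text = (
    "# RA DEC Name Jmag e_Jmag\n"
    "010.684737 +41.269035 J00424433+4116085   9.453  0.052\n"
    "010.683469 +41.268585 J00424403+4116069   9.321  0.022\n"
)

# usecols selects the numeric columns; unpack=True returns one array per column
ra_u, dec_u, j_u, ej_u = np.loadtxt(io.StringIO(sample_text),
                                    usecols=(0, 1, 3, 4), unpack=True)

# The string column can be read separately with dtype=str
names_u = np.loadtxt(io.StringIO(sample_text), usecols=2, dtype=str)
print(ra_u, j_u, names_u)
```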


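## Reading Files with `pandas`

The `pandas` function `pd.read_csv` reads a file into a `DataFrame`. Despite its name, it is not limited to comma-separated files: `sep=r'\s+'` splits columns on any run of whitespace, `comment='#'` skips the header lines, and `names` supplies the column names. Unlike `np.loadtxt`, `pandas` infers the type of each column automatically.

```python
import pandas as pd

# Read the whitespace-separated file; r'\s+' is a raw string, so the backslash
# reaches the regular expression unchanged
data = pd.read_csv('data/data_2MASS.txt', sep=r'\s+', comment='#',
                   names=['RAJ', 'DEJ', '2MASS', 'Jmag', 'e_Jmag'])

# Extract the columns and convert to numpy arrays
raj = data['RAJ'].values
dej = data['DEJ'].values
two_mass = data['2MASS'].values
jmag = data['Jmag'].values
e_jmag = data['e_Jmag'].values

# Print the arrays
print('RAJ:', raj)
print('DEJ:', dej)
print('2MASS:', two_mass)
print('Jmag:', jmag)
print('e_Jmag:', e_jmag)
```

```text
RAJ: [10.684737 10.683469 10.685657 10.686026 10.683465 10.686015 10.68527 ]
DEJ: [41.269035 41.268585 41.26955  41.269226 41.269676 41.26963  41.267124]
2MASS: ['00424433+4116085' '00424403+4116069' '00424455+4116103'
 '00424464+4116092' '00424403+4116108' '00424464+4116106'
 '00424446+4116016']
Jmag: [ 9.453  9.321 10.773  9.299 11.507  9.399 12.07 ]
e_Jmag: [0.052 0.022 0.069 0.063 0.056 0.045 0.035]
```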
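`pd.read_csv` can also load just a subset of the columns with `usecols`, and `dtype` can force a column's type instead of relying on inference (useful for names that could look like numbers). A sketch on two lines of the data held in an `io.StringIO`; it also uses `.to_numpy()`, which the `pandas` documentation recommends over `.values`.

```python
import io
import pandas as pd

# io.StringIO stands in for the data file
sample_text = (
    "# RA DEC Name Jmag e_Jmag\n"
    "010.684737 +41.269035 J00424433+4116085   9.453  0.052\n"
    "010.683469 +41.268585 J00424403+4116069   9.321  0.022\n"
)

# usecols keeps only the named columns; dtype forces the names to be read as strings
subset = pd.read_csv(io.StringIO(sample_text), sep=r'\s+', comment='#',
                     names=['RAJ', 'DEJ', '2MASS', 'Jmag', 'e_Jmag'],
                     usecols=['2MASS', 'Jmag'], dtype={'2MASS': str})

# Convert a column to a numpy array
jmag_np = subset['Jmag'].to_numpy()
print(subset)
```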


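If the data are stored in a comma-separated file, the default separator (`,`) already works, and the first non-comment line is used as the column names:

```python
import pandas as pd

# Read a comma-separated version of the same data
data = pd.read_csv('data_2MASS.csv', comment='#')
data
```

| Unnamed: 0 | RA | DEC | Name | Jmag | e_Jmag |
|---|---|---|---|---|---|
| 0 | 10.684737 | 41.269035 | 00424433+4116085 | 9.453 | 0.052 |
| 1 | 10.683469 | 41.268585 | 00424403+4116069 | 9.321 | 0.022 |
| 2 | 10.685657 | 41.26955 | 00424455+4116103 | 10.773 | 0.069 |
| 3 | 10.686026 | 41.269226 | 00424464+4116092 | 9.299 | 0.063 |
| 4 | 10.683465 | 41.269676 | 00424403+4116108 | 11.507 | 0.056 |
| 5 | 10.686015 | 41.26963 | 00424464+4116106 | 9.399 | 0.045 |
| 6 | 10.68527 | 41.267124 | 00424446+4116016 | 12.07 | 0.035 |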
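The `Unnamed: 0` column in the output above most likely comes from the CSV file having been written together with its row index, which `to_csv` does by default. A minimal sketch (on a hypothetical two-row `DataFrame`) showing where such a column comes from, and how `index_col=0` turns it back into the index instead:

```python
import io
import pandas as pd

# A small DataFrame written *with* its index (the to_csv default)
frame = pd.DataFrame({'RA': [10.684737, 10.683469], 'Jmag': [9.453, 9.321]})
csv_text = frame.to_csv()   # with no path, to_csv returns the CSV as a string

without_index_col = pd.read_csv(io.StringIO(csv_text))
with_index_col = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(without_index_col.columns))
print(list(with_index_col.columns))
```

Writing with `to_csv(..., index=False)` avoids the extra column in the first place.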


The columns of this `DataFrame` are named after the header row of the CSV file (`RA`, `DEC`, `Name`, ...), and each can be converted to a `numpy` array with `.values`:

```python
# Assign the pandas columns to numpy arrays, using the CSV column names
raj = data['RA'].values
dej = data['DEC'].values
two_mass = data['Name'].values
jmag = data['Jmag'].values
e_jmag = data['e_Jmag'].values

# Print the arrays
print('RAJ:', raj)
print('DEJ:', dej)
print('2MASS:', two_mass)
print('Jmag:', jmag)
print('e_Jmag:', e_jmag)
```

```text
RAJ: [10.684737 10.683469 10.685657 10.686026 10.683465 10.686015 10.68527 ]
DEJ: [41.269035 41.268585 41.26955  41.269226 41.269676 41.26963  41.267124]
2MASS: ['00424433+4116085' '00424403+4116069' '00424455+4116103'
 '00424464+4116092' '00424403+4116108' '00424464+4116106'
 '00424446+4116016']
Jmag: [ 9.453  9.321 10.773  9.299 11.507  9.399 12.07 ]
e_Jmag: [0.052 0.022 0.069 0.063 0.056 0.045 0.035]
```


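## Writing Files Line-by-Line

To write data to a text file, we open it in write mode (`'w'`), which creates the file or overwrites it if it already exists, and write each line as a string with `file.write()`. Note that `write()` does not add a newline, so each string must end with `\n`. Format specifiers in the f-string control the width and precision of each column:

```python
# Open the file in write mode ('w' creates the file, or overwrites it if it exists)
with open('output.txt', 'w') as file:
    # Write the header lines; write() does not add newlines, so each ends with '\n'
    file.write("# Magnitudes of some stars from the 2MASS astronomical sky survey.\n")
    file.write("# Coordinates are in decimal degrees in the J2000 equinox.\n")
    file.write("#  RA         DEC         Name               Jmag   e_Jmag\n")
    file.write("# (deg)      (deg)                           (mag)   (mag)\n")

    # Write the data one row at a time, formatting each column
    for ra, dec, name, j, e_j in zip(raj, dej, two_mass, jmag, e_jmag):
        line = f"{ra:10.6f} {dec:+10.6f} {name:<20} {j:7.3f}  {e_j:.3f}\n"
        file.write(line)
```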
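To see what the format specifiers in the f-string do, here they are applied to the values from the first row of the data file. `10.6f` means a width of 10 characters with 6 decimal places (right-aligned), `+` always prints the sign, and `<20` left-aligns a string in 20 characters. The older `%`-style formats, which `np.savetxt` uses, follow the same width and precision rules.

```python
# Example values taken from the first row of the data file
ra_val, dec_val, name_val = 10.684737, 41.269035, 'J00424433+4116085'

print(repr(f"{ra_val:10.6f}"))   # width 10, 6 decimal places, right-aligned
print(repr(f"{dec_val:+10.6f}"))  # '+' always prints the sign
print(repr(f"{name_val:<20}"))   # left-aligned, padded to 20 characters

# The equivalent %-style format gives the same string
print(repr("%10.6f" % ra_val))
```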


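## Writing Files with `numpy`

`np.savetxt` writes an array to a text file in one call. Since our columns have different types, we first pack them into a structured array. The `fmt` string gives one `%`-style format per column, `header` is written before the data, and `comments=''` stops `np.savetxt` from prefixing each header line with its default `'# '`, since our header lines already start with `#`.

```python
import numpy as np

# Create a structured array with one named field per column
data = np.zeros(len(raj), dtype={'names': ('RAJ', 'DEJ', '2MASS', 'Jmag', 'e_Jmag'),
                                 'formats': ('f8', 'f8', 'U20', 'f8', 'f8')})

data['RAJ'] = raj
data['DEJ'] = dej
data['2MASS'] = two_mass
data['Jmag'] = jmag
data['e_Jmag'] = e_jmag

# Define the header
header = ("# Magnitudes of some stars from the 2MASS astronomical sky survey.\n"
          "# Coordinates are in decimal degrees in the J2000 equinox.\n"
          "#  RA         DEC         Name               Jmag   e_Jmag\n"
          "# (deg)      (deg)                           (mag)   (mag)")

# Define the format for each field
formats = "%10.6f %+10.6f %-20s %7.3f %.3f"

# Write the data to the file
np.savetxt('output2.txt', data, fmt=formats, header=header, comments='')
```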
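A quick way to check a file written with `np.savetxt` is to read it back with `np.loadtxt` and compare. A minimal sketch on two example rows, written to a temporary file (the name `roundtrip.txt` and the shortened header are assumptions for illustration):

```python
import os
import tempfile
import numpy as np

# Two example rows from the data file, packed into a structured array
star_dtype = {'names': ('RAJ', 'DEJ', '2MASS', 'Jmag', 'e_Jmag'),
              'formats': ('f8', 'f8', 'U20', 'f8', 'f8')}
stars = np.array([(10.684737, 41.269035, 'J00424433+4116085', 9.453, 0.052),
                  (10.683469, 41.268585, 'J00424403+4116069', 9.321, 0.022)],
                 dtype=star_dtype)

# Write with the same format string as above to a temporary file
path = os.path.join(tempfile.mkdtemp(), 'roundtrip.txt')
np.savetxt(path, stars, fmt="%10.6f %+10.6f %-20s %7.3f %.3f",
           header="#  RA  DEC  Name  Jmag  e_Jmag", comments='')

# Read it back: np.loadtxt skips the '#' header line by default
stars_back = np.loadtxt(path, dtype=star_dtype)
print(stars_back)
```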

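A shortcut is to stack the arrays into a single 2D array with `np.column_stack`. Because all elements of one array share a single data type, mixing numbers and strings means the result can only be written with the generic `%s` format, which gives no control over column widths or decimal places:

```python
import numpy as np

# Stack the columns into a 2D array (numbers and strings end up in one common type)
data = np.column_stack((raj, dej, two_mass, jmag, e_jmag))

# Define the header
header = ("# Magnitudes of some stars from the 2MASS astronomical sky survey.\n"
          "# Coordinates are in decimal degrees in the J2000 equinox.\n"
          "#  RA         DEC         Name               Jmag   e_Jmag\n"
          "# (deg)      (deg)                           (mag)   (mag)")

# Write every element with the generic '%s' format
np.savetxt('output3.txt', data, fmt="%s", header=header, comments='')
```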

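## Writing Files with `pandas`

A `DataFrame` can be written with its `to_csv` method. `to_csv` does not provide a way to add free-text header lines, so here we first write the header ourselves and then append the table to the same file with `mode='a'`:

```python
import pandas as pd

# Create a DataFrame from the numpy arrays
df = pd.DataFrame({
    'RA': raj,
    'DEC': dej,
    'Name': two_mass,
    'Jmag': jmag,
    'e_Jmag': e_jmag
})

# Define the header
header = ("# Magnitudes of some stars from the 2MASS astronomical sky survey.\n"
          "# Coordinates are in decimal degrees in the J2000 equinox.\n"
          "# Units: RA (deg), DEC (deg), Name, Jmag (mag), e_Jmag (mag)")

# Write the header lines first ('w' creates or overwrites the file)
with open('output.csv', 'w') as f:
    f.write(header + '\n')

# Append the table below the header ('a'), without the row index
df.to_csv('output.csv', sep=',', mode='a', index=False, float_format="%.6f")
```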
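A file written this way can be read back with `pd.read_csv(..., comment='#')`, which skips the header lines. A sketch of the round trip on a hypothetical two-row `DataFrame`, written to an in-memory `io.StringIO` buffer instead of a file:

```python
import io
import pandas as pd

# A small example DataFrame and header
frame_out = pd.DataFrame({'RA': [10.684737, 10.683469],
                          'Name': ['J00424433+4116085', 'J00424403+4116069'],
                          'Jmag': [9.453, 9.321]})
sample_header = ("# Magnitudes of some stars from the 2MASS astronomical sky survey.\n"
                 "# Units: RA (deg), Name, Jmag (mag)")

# Write the header, then the table, into the buffer
buffer = io.StringIO()
buffer.write(sample_header + '\n')
frame_out.to_csv(buffer, index=False, float_format="%.6f")

# Rewind the buffer and read it back; comment='#' skips the header lines
buffer.seek(0)
frame_in = pd.read_csv(buffer, comment='#')
print(frame_in)
```

Note that `comment='#'` also cuts off any line *at* a `#`, so this only works if the data values themselves never contain that character.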

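If no header lines are needed, a single call to `to_csv` is enough; the first line of the file then contains the column names:

```python
import pandas as pd

# Create a DataFrame from the numpy arrays
df = pd.DataFrame({
    'RA': raj,
    'DEC': dej,
    'Name': two_mass,
    'Jmag': jmag,
    'e_Jmag': e_jmag
})

# Write the DataFrame to a file, without the row index
df.to_csv('output_noheader.csv', sep=',', index=False, float_format="%.6f")
```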