# Reading Data

<img src="https://uwashington-astro300.github.io/A300_images/DataReading.png" width="120"/>

* Astropy can read data from external files in many different ways.
* Astropy supports a wide range of file types, from simple text files to complex binary formats.
* Most of our data files will be `csv` files (comma-separated values).

In [None]:
import numpy as np
from astropy.table import QTable

### Let us read in the file `StarData.csv` via a URL
- https://uwashington-astro300.github.io/A300_Data/StarData.csv

```
StarID,Parallax,GMag
A5853,768.07,8.98
B4472,546.98,8.19
C3864,415.18,11.03
D7628,392.75,6.55
```

### `ascii.csv` assumes the first row is a list of the column names

In [None]:
star_table = QTable.read('https://uwashington-astro300.github.io/A300_Data/StarData.csv', 
                         format='ascii.csv')

In [None]:
star_table

----

# Reading (bad) Data

<img src="https://uwashington-astro300.github.io/A300_images/Lore.jpg" width="200"/>

## Different Delimiters

Some people just want to watch the world burn, so they create datasets where the columns are separated by something other than a comma.

#### Bad - Using another delimiter like `:`

##### `StarData_Ver2.dat`

```
StarID:Parallax:GMag
A5853:768.07:8.98
B4472:546.98:8.19
C3864:415.18:11.03
D7628:392.75:6.55
```

In [None]:
star_table_2 = QTable.read('https://uwashington-astro300.github.io/A300_Data/StarData_Ver2.dat', 
                           format='ascii.csv',
                           delimiter = ":")

In [None]:
star_table_2

#### Worse - Using whitespace as a delimiter

##### `StarData_Ver3.dat`

```
StarID Parallax GMag 
A5853 768.07 8.98
B4472 546.98 8.19
C3864 415.18 11.03
D7628 392.75 6.55
```

In [None]:
star_table_3 = QTable.read('https://uwashington-astro300.github.io/A300_Data/StarData_Ver3.dat', 
                           format='ascii.csv',
                           delimiter = " ")

In [None]:
star_table_3

---
# Messy Data

<img src="https://uwashington-astro300.github.io/A300_images/MessyData.jpg" width="230"/>

* In the "real world" all data is messy.

##### Let us read in the file: `Messy.dat`

```
#######################################################
#
# Col 1 - Name
# Col 2 - Distance
#
#######################################################
"A2344",10
"",23
,
# A random comment row just because
"E5333",
```

### This is not going to end well ... (errors galore!)

In [None]:
messy_table = QTable.read('https://uwashington-astro300.github.io/A300_Data/Messy.dat', 
                          format='ascii.csv')

### Deal with the comment lines `#`

In [None]:
messy_table = QTable.read('https://uwashington-astro300.github.io/A300_Data/Messy.dat', 
                          format='ascii.csv',
                          comment = '#')

messy_table

## Not quite correct ...

### Turn off the header

- Since the first row is not a header, we cannot use `ascii.csv`
- Switch to `ascii.no_header`
- Add a delimiter (`,`), since `ascii.no_header` splits on whitespace by default

In [None]:
messy_table = QTable.read('https://uwashington-astro300.github.io/A300_Data/Messy.dat', 
                          format='ascii.no_header',
                          delimiter = ',',
                          comment = '#')

messy_table

### Add the column names

In [None]:
my_column_names = ['Name', 'Distance']

In [None]:
messy_table = QTable.read('https://uwashington-astro300.github.io/A300_Data/Messy.dat', 
                          format='ascii.no_header',
                          delimiter = ',',
                          comment = '#',
                          names = my_column_names)

messy_table

### Deal with the missing data

In [None]:
messy_table['Name'].fill_value = 'XXXXX'
messy_table['Distance'].fill_value = -999

messy_table.filled()

----

# Fixed-Width Data Tables

* These types of data tables are **VERY** common in astronomy
* The columns have fixed widths
* Whitespace is used to separate columns **AND** is used within columns

`StdStars.dat`

In [None]:
my_column_start = [0, 12, 15, 18]

In [None]:
my_column_names = ['Star', 'RAh', 'RAm', 'RAs']

In [None]:
standard_table = QTable.read('https://uwashington-astro300.github.io/A300_Data/StdStars.dat', 
                            format='ascii.fixed_width_no_header',
                            names = my_column_names,
                            col_starts = my_column_start
                            )

In [None]:
standard_table

----

# Lots of Data

<img  src="https://uwashington-astro300.github.io/A300_images/LotsData.jpg" width="230"/>

In [None]:
temp_table = QTable.read('https://uwashington-astro300.github.io/A300_Data/SeattleTemp_2022.csv', 
                         format='ascii.csv')

In [None]:
temp_table

In [None]:
temp_table.show_in_notebook()