# Other File Types

In this notebook, we will learn how to import different file types:
- MATLAB Files
- Excel
- SAS files
- Stata files
- HDF5 files
- Pickled files

## Importing an Excel File

When importing an Excel file, remember that a workbook is organized into **sheets**:

```python
# Load an Excel workbook
import pandas as pd
file = 'file.xlsx'

data = pd.ExcelFile(file)

print(data.sheet_names)

# Getting individual sheet
df1 = data.parse('sheet_name')
df2 = data.parse(0) # integer as index
```



Sometimes we need to transform a sheet while parsing it, and `.parse()` provides arguments for that:
- `skiprows` - skip rows
- `names` - rename the columns
- `usecols` - identifies which columns to parse

| Argument         | Use Case                                                                 | Syntax Example                          | Sample Input                        |
|------------------|--------------------------------------------------------------------------|------------------------------------------|-------------------------------------|
| `sheet_name`     | Specify which sheet(s) to read                                           | `sheet_name='Sheet1'`                    | `'Sheet1'`, `0`, `[0, 1]`, `None`   |
| `header`         | Define row(s) to use as column names                                     | `header=0`                               | `0`, `None`, `[0, 1]`               |
| `names`          | Set custom column names                                                  | `names=['A', 'B', 'C']`                  | `['A', 'B', 'C']`                   |
| `index_col`      | Specify which column(s) to use as index                                  | `index_col=0`                            | `0`, `[0, 1]`, `None`               |
| `usecols`        | Select specific columns to read                                          | `usecols='A:C'`                          | `'A:C'`, `[0, 2]`, `['A', 'B']`     |
| `skiprows`       | Skip a number of rows at the beginning                                   | `skiprows=1`                             | `1`, `[0, 2]`                       |
| `nrows`          | Limit number of rows to read                                             | `nrows=10`                               | `10`                                |
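
The arguments above can be combined in a single `.parse()` call. The following is a minimal sketch, assuming `openpyxl` is installed as the Excel engine. The workbook is built in memory so the example runs on its own; the sheet name `survey` and all column labels are made up for illustration.

```python
import io

import pandas as pd

# Build a tiny workbook in memory (hypothetical data, for illustration only).
raw = pd.DataFrame(
    [
        ["exported report", None, None],  # junk title row we want to skip
        ["id", "age", "city"],            # the sheet's own header row
        [1, 34, "Oslo"],
        [2, 51, "Lima"],
    ]
)
buffer = io.BytesIO()
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    raw.to_excel(writer, sheet_name="survey", header=False, index=False)
buffer.seek(0)

xls = pd.ExcelFile(buffer)
print(xls.sheet_names)

df = xls.parse(
    "survey",
    skiprows=1,           # skip the junk title row
    usecols=[0, 1],       # keep only the first two columns
    names=["ID", "Age"],  # replace the sheet's header with our own labels
)
print(df)
```

Here the row after the skipped one is still treated as the header (the default `header=0`), and `names` then overrides its labels.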

---


## Importing SAS/Stata files

- SAS - Statistical Analysis System
- Stata - a contraction of "statistics" and "data"

```python
# Sample of importing SAS
import pandas as pd
from sas7bdat import SAS7BDAT

with SAS7BDAT('file.sas7bdat') as file:
    df_sas = file.to_data_frame()

# Importing Stata (Stata data files use the .dta extension)
data = pd.read_stata('file.dta')
```
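
If no `.dta` file is at hand, a round trip through pandas is one way to try `pd.read_stata()`. This is only a sketch: the column names and values are placeholders, and the file is written to a temporary directory.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data; the column names are placeholders.
original = pd.DataFrame({"year": [2001, 2002], "score": [1.5, 2.5]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.dta")
    original.to_stata(path, write_index=False)  # write a Stata file
    loaded = pd.read_stata(path)                # read it back as a DataFrame

print(loaded)
```

pandas also ships `pd.read_sas()`, which can be an alternative to the `sas7bdat` package when that package is not installed.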

---



## Importing HDF5

- HDF5 stands for Hierarchical Data Format version 5
- It is designed to store large quantities of numerical data

For example, we can open an HDF5 file from LIGO like this:
```python
import h5py
filename = 'H-H1_LOSCS_4_V1-....hdf5'
data = h5py.File(filename, 'r')
```

But what is the **structure of this file**?
- It is hierarchical
- You can explore its hierarchy by iterating over its keys:

```python
for key in data.keys():
    print(key)
```

> - meta
> - quality
> - realdata

Each key is an HDF5 group, which you can think of as a **directory**.

Now, to see what metadata the file contains, you can do:

```python
for key in data['meta'].keys():
    print(key)
```
> - Description
> - Detector

If you are interested in the contents of these keys, you can convert them into NumPy arrays:

```python
import numpy as np
print(np.array(data['meta']['Description']), np.array(data['meta']['Detector']))
```
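
The group/dataset hierarchy can also be walked in one pass. The sketch below assumes `h5py` is installed and creates its own small file so it can run without the LIGO data; the names `signal`, `samples`, and `Duration` are invented for illustration and are not taken from the real file.

```python
import os
import tempfile

import h5py
import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.hdf5")

    # Create a small file with two groups (hypothetical layout).
    with h5py.File(path, "w") as f:
        meta = f.create_group("meta")
        meta.create_dataset("Duration", data=4096)
        signal = f.create_group("signal")
        signal.create_dataset("samples", data=np.arange(5, dtype="float64"))

    with h5py.File(path, "r") as f:
        top_level = sorted(f.keys())  # groups at the root of the file

        found = []

        def record(name, obj):
            # Called once for every group and dataset below the root.
            kind = "group" if isinstance(obj, h5py.Group) else "dataset"
            found.append((name, kind))

        f.visititems(record)

        # Read dataset contents into memory before the file is closed.
        values = f["signal"]["samples"][:]
        duration = f["meta"]["Duration"][()]

print(top_level)
for name, kind in found:
    print(kind, name)
print(values, duration)
```

Opening the file in a `with` block closes it automatically, so dataset contents should be copied out (as `values` and `duration` are here) before leaving the block.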

---



## Importing MATLAB

To read MATLAB files, we can use SciPy:
- `scipy.io.loadmat()`: read .mat files
- `scipy.io.savemat()`: write .mat files

A `.mat` file is simply a collection of objects from a MATLAB workspace (arrays, matrices, etc.).

For example, you can do:

```python
import scipy.io
filename = 'workspace.mat'
mat = scipy.io.loadmat(filename)
```
The return value is a dictionary:
- keys - MATLAB variable names
- values - objects assigned to variables
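
To see what that dictionary looks like without a MATLAB installation, one option is to write a small `.mat` file with `scipy.io.savemat()` and load it back. This is a sketch with made-up variable names (`x`, `m`); besides the saved variables, `loadmat()` also returns a few bookkeeping keys such as `__header__`, which are filtered out below.

```python
import os
import tempfile

import numpy as np
import scipy.io

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "workspace.mat")

    # Save two hypothetical variables, then load the file back.
    scipy.io.savemat(path, {"x": np.array([1.0, 2.0, 3.0]), "m": np.eye(2)})
    mat = scipy.io.loadmat(path)

# Keep only the MATLAB variables, dropping keys like '__header__'.
user_vars = {k: v for k, v in mat.items() if not k.startswith("__")}

for name, value in user_vars.items():
    print(name, type(value).__name__, value.shape)
```

Note that the 1-D array `x` comes back as a 2-D array, since MATLAB has no 1-D arrays.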