# Lecture 4: Reading Data Files in Python

## Introduction
This lecture covers various methods for reading data files in Python, including text files, CSV files, and working with files in different directories.

## 1. Import Required Libraries
We'll use NumPy for numerical arrays, Pandas for tabular data, and the standard-library `csv` and `os` modules for CSV parsing and file-path handling.

In [1]:
import numpy as np
import pandas as pd
import csv
import os

## 2. Loading Text Files with NumPy
NumPy's `loadtxt()` function reads whitespace-delimited numerical data from a text file into an array. The `skiprows` parameter skips the given number of lines at the top of the file, here the first three.

In [2]:
mydata = np.loadtxt('frame_07_00000.shell', skiprows=3)
for row in mydata:
    print(row)

[ -0.6959679   0.6029675 -10.08763  ]
[  0.2439858   1.01116   -10.06818  ]
[  0.06051586   0.1582956  -10.24191   ]
[  0.9235653   0.2442797 -10.05873  ]
[  0.4069332  -0.6476358 -10.06638  ]
[ -0.5964802  -0.428977  -10.08768  ]
[-0.5455569  1.560081  -9.886611 ]
[ 1.191757  1.171054 -9.861659]
[ 1.366334  -0.5939452 -9.823559 ]
[-1.456797    0.01799601 -9.896421  ]
[-0.2661551 -1.323102  -9.835986 ]
[ 1.814858   0.3449763 -9.768678 ]
[ 0.4418391  1.888412  -9.801923 ]
[-1.465041  1.066661 -9.818983]
[-1.262096 -1.005563 -9.76525 ]
[ 0.7652211 -1.43836   -9.706944 ]
[ 2.301517  -0.5110367 -9.532539 ]
[ 2.135316  1.289309 -9.616412]
[-0.3563513  2.488546  -9.632153 ]
[ 1.413257  2.101829 -9.629649]
[-2.277633   0.4856141 -9.618547 ]
[-1.352699  2.057779 -9.636264]
[-0.9544027 -1.933216  -9.48681  ]
[-2.164492  -0.5833598 -9.558676 ]
[ 0.08801756 -2.158316   -9.461086  ]
[ 1.745929 -1.435509 -9.467551]
[ 1.110432 -2.255449 -9.296237]
[-1.938223 -1.58265  -9.359112]
[-2.269188  1.530688
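
To make the effect of `skiprows` concrete, here is a minimal sketch that runs without the lecture's data file. The three numeric rows are copied from the output above; the header lines are made up, since the real contents of the first three lines of `frame_07_00000.shell` are not shown here.

```python
import io
import numpy as np

# Stand-in for a data file: three made-up header lines, then x, y, z columns.
text = """made-up header line 1
made-up header line 2
made-up header line 3
-0.6959679 0.6029675 -10.08763
0.2439858 1.01116 -10.06818
0.06051586 0.1582956 -10.24191
"""

# np.loadtxt accepts a file-like object as well as a filename.
data = np.loadtxt(io.StringIO(text), skiprows=3)
print(data.shape)  # (3, 3)

# Transposing lets each column unpack into its own 1-D array.
x, y, z = data.T
print(z)
```

Without `skiprows=3`, `loadtxt` would try to parse the header text as numbers and raise an error.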

## 3. Reading CSV Files with Pandas
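Pandas' `read_csv()` is the most convenient way to read CSV files. It returns a DataFrame whose column labels come from the file's header row, and it provides many parsing options. In the output below, the `Unnamed: 0` column appears because the first field of the header row is empty.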

In [3]:
mydata = pd.read_csv('average_actin.csv')
print(mydata)

    Unnamed: 0  time     cable   leading
0            0     0  1.615584  1.250251
1            1     1  1.645311  1.299164
2            2     2  1.675044  1.265786
3            3     3  1.681775  1.286977
4            4     4  1.717248  1.311742
5            5     5  1.710590  1.303908
6            6     6  1.729195  1.334243
7            7     7  1.723315  1.349088
8            8     8  1.741532  1.341120
9            9     9  1.779844  1.348152
10          10    10  1.781008  1.390733
11          11    11  1.794161  1.428721
12          12    12  1.826443  1.372055
13          13    13  1.813017  1.398246
14          14    14  1.822511  1.444728
15          15    15  1.805277  1.413252
16          16    16  1.878621  1.466935
17          17    17  1.854621  1.467896
18          18    18  1.929291  1.471897
19          19    19  1.931475  1.479017
20          20    20  1.914829  1.497291
21          21    21  1.969095  1.438711
22          22    22  1.991238  1.544351
23          23  
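
If the unnamed first column is just a saved row index, one option is to tell `read_csv()` to use it as the index via `index_col=0`. The sketch below assumes that interpretation and uses an in-memory copy of the first three rows (values as printed above) so that it runs without the file.

```python
import io
import pandas as pd

# First rows of the CSV, with the empty first header field seen in the lecture.
csv_text = """,time,cable,leading
0,0,1.615584,1.250251
1,1,1.645311,1.299164
2,2,1.675044,1.265786
"""

# Default behaviour: the unnamed column is kept and labeled automatically.
df_default = pd.read_csv(io.StringIO(csv_text))
print(df_default.columns.tolist())  # ['Unnamed: 0', 'time', 'cable', 'leading']

# index_col=0 uses the first column as the row index instead.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df.columns.tolist())  # ['time', 'cable', 'leading']

# Columns can then be selected by label.
print(df["cable"].mean())
```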

## 4. Reading CSV Files with the CSV Module
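The built-in `csv` module provides lower-level access to CSV files. Each row is returned as a list of strings, so numeric fields must be converted explicitly. This is useful when you need more control over parsing or when a full DataFrame is unnecessary.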

In [4]:
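# newline='' lets the csv module handle line endings itself, as its documentation recommends
with open('average_actin.csv', newline='') as csv_file: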
    csv_reader = csv.reader(csv_file)
    for row in csv_reader:
        print(row)

['', 'time', 'cable', 'leading']
['0', '0', '1.6155842763781092', '1.2502512862097157']
['1', '1', '1.645310929367047', '1.2991642183636103']
['2', '2', '1.6750444578524526', '1.2657860601529953']
['3', '3', '1.6817753970039995', '1.2869771029551575']
['4', '4', '1.7172479564003464', '1.3117421639029123']
['5', '5', '1.7105898989461543', '1.3039079794904018']
['6', '6', '1.7291945670305786', '1.3342429998076344']
['7', '7', '1.7233150570310616', '1.3490883633107331']
['8', '8', '1.7415317151641296', '1.3411203043225095']
['9', '9', '1.779843814223063', '1.3481520545256533']
['10', '10', '1.7810079232502445', '1.3907332697254433']
['11', '11', '1.7941605481654352', '1.42872105289098']
['12', '12', '1.8264430594564427', '1.3720547291753866']
['13', '13', '1.813017453485486', '1.398245557991278']
['14', '14', '1.8225112281374933', '1.4447278709440516']
['15', '15', '1.8052766513299154', '1.4132515271766102']
['16', '16', '1.8786209559260822', '1.4669349062332662']
['17', '17', '1.85462127
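
Because `csv.reader` yields strings, a common next step is to separate the header row and convert the remaining fields to numbers. The following is a minimal sketch using an in-memory copy of the first three rows shown above; dropping the unnamed first column is an assumption about what is wanted, not something the lecture prescribes.

```python
import csv
import io

csv_text = """,time,cable,leading
0,0,1.6155842763781092,1.2502512862097157
1,1,1.645310929367047,1.2991642183636103
2,2,1.6750444578524526,1.2657860601529953
"""

reader = csv.reader(io.StringIO(csv_text))

# The first row holds the column names.
header = next(reader)

rows = []
for record in reader:
    # Skip the unnamed first column and convert the rest from str to float.
    rows.append([float(value) for value in record[1:]])

print(header)
print(rows[0])
```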

## 5. Working with Files in Different Directories
Data files are often stored in a separate directory. `os.getcwd()` returns the current working directory, and `os.path.join()` builds a path using the correct separator for the operating system.

In [5]:
path = os.getcwd()  # current working directory (already a str)
examplePath = os.path.join(path, "Example_Data_Files")  # platform-independent join
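
As a quick sanity check, it can help to print the constructed path and confirm that the folder exists before reading from it. The sketch below also shows the equivalent construction with the standard-library `pathlib` module, which is an alternative not used elsewhere in this lecture.

```python
import os
from pathlib import Path

# "Example_Data_Files" is the folder name used in this lecture.
example_path = os.path.join(os.getcwd(), "Example_Data_Files")

# Equivalent construction with pathlib; the "/" operator joins path components.
example_path_alt = Path.cwd() / "Example_Data_Files"

print(example_path)
print(example_path_alt)

# False here simply means the folder is not in the current working directory.
print(os.path.isdir(example_path))
```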

## 6. Listing and Filtering Files
We can list a directory's contents with `os.listdir()` and keep only the files with a given extension.

In [6]:
datafilenames = []
# os.listdir() returns str names in arbitrary order; sorted() makes the order repeatable
for filename in sorted(os.listdir(examplePath)):
    if filename.endswith('.txt'):
        datafilenames.append(filename)
# print(os.path.join(examplePath, datafilenames[0]))
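
The same filtering idea can be tried on a throwaway folder. This is a minimal sketch: the file names are made up, and a temporary directory stands in for `Example_Data_Files` so the code runs anywhere.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Create a few empty files with made-up names.
    for name in ["b_run.txt", "a_run.txt", "notes.csv", "image.png"]:
        with open(os.path.join(folder, name), "w"):
            pass

    # Keep only the .txt files; sorting gives a repeatable order.
    txt_files = sorted(
        name for name in os.listdir(folder) if name.endswith(".txt")
    )

    # Join each name with the folder to get a path that can be opened.
    full_paths = [os.path.join(folder, name) for name in txt_files]

print(txt_files)  # ['a_run.txt', 'b_run.txt']
```

Note that `os.listdir()` returns bare file names, not full paths, which is why each name is joined with the folder before use.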

## 7. Loading Data from Subdirectories
Once we have the full file path, we can load the data just as before. Here `skiprows=1` skips a single header line.

In [7]:
mydata = np.loadtxt(os.path.join(examplePath, datafilenames[0]), skiprows=1)
for row in mydata:
    print(row)

[23.     55.47    1.13   35.      0.6875]
[24.    79.22   1.17  40.     0.625]
[25.     97.62    1.22   45.      0.5625]
[ 26.   112.93   1.28  50.     0.5 ]
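
The same pattern extends to loading every matching file in a folder. The sketch below is self-contained: it writes two small files into a temporary directory and loads each into a dictionary keyed by file name. The numbers are taken from the output above, while the file names and the header line are hypothetical.

```python
import os
import tempfile
import numpy as np

with tempfile.TemporaryDirectory() as folder:
    # Two made-up data files, each with a one-line (hypothetical) header.
    contents = {
        "run_a.txt": "c1 c2 c3 c4 c5\n23 55.47 1.13 35 0.6875\n24 79.22 1.17 40 0.625\n",
        "run_b.txt": "c1 c2 c3 c4 c5\n25 97.62 1.22 45 0.5625\n26 112.93 1.28 50 0.5\n",
    }
    for name, text in contents.items():
        with open(os.path.join(folder, name), "w") as handle:
            handle.write(text)

    # Load every .txt file, skipping its header line.
    datasets = {}
    for name in sorted(os.listdir(folder)):
        if name.endswith(".txt"):
            datasets[name] = np.loadtxt(os.path.join(folder, name), skiprows=1)

for name, array in datasets.items():
    print(name, array.shape)
```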


## Key Takeaways
- **NumPy** (`np.loadtxt`) loads purely numerical text files into arrays; `skiprows` skips header lines
- **Pandas** (`pd.read_csv`) reads CSV files into a DataFrame with labeled columns
- **csv module** gives row-by-row control over CSV parsing and returns every field as a string
- **os module** helps list directories and construct platform-independent paths
- Always check that file paths are correct and that the working directory is the one you expect
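
One way to act on the last point is to check that a file exists before loading it and to report the working directory when it does not. The helper below, `load_checked`, is a hypothetical name introduced for this sketch and is not part of NumPy.

```python
import os
import numpy as np

def load_checked(path, **loadtxt_kwargs):
    """Hypothetical helper: fail with a readable message if the file is missing."""
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"{path!r} not found; current working directory is {os.getcwd()!r}"
        )
    return np.loadtxt(path, **loadtxt_kwargs)

message = ""
try:
    # A deliberately missing file name, chosen for illustration only.
    load_checked("definitely_missing_file_12345.txt", skiprows=1)
except FileNotFoundError as err:
    message = str(err)
    print(message)
```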