# Reading CSV and TXT Files
Rather than creating `Series` or `DataFrame` structures from scratch, or from core Python sequences or `ndarray`s, the most typical use of **pandas** is to load data from files or other sources for further exploration, transformation and analysis.

In this lecture we'll learn how to read comma-separated values files (.csv) and raw text files (.txt) into pandas `DataFrame`s.

In [1]:
import pandas as pd

## Reading data with Python

As we saw in previous courses, we can read data using plain Python.

When you want to work with a file, the first thing to do is to open it. This is done by invoking the `open()` built-in function.

`open()` has a single required argument, the path to the file, and returns a file object.

The `with` statement automatically takes care of closing the file once it leaves the `with` block, even in cases of error.
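
As a minimal sketch of this behavior, the snippet below writes a small throwaway file and reads it back, checking `fp.closed` inside and outside the `with` block. The temporary file name `sample.csv` is a hypothetical stand-in for the real data file, and passing `encoding='utf-8'` explicitly is a suggestion rather than something this lecture requires: without it, Python falls back to a platform-dependent default.

```python
import os
import tempfile

# Hypothetical sample file so the sketch runs without btc-market-price.csv
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample.csv")

with open(sample_path, "w", encoding="utf-8") as fp:
    fp.write("2/4/17 0:00,1099.169125\n")

with open(sample_path, "r", encoding="utf-8") as fp:
    first_line = fp.readline()
    print(fp.closed)  # the file is still open inside the block

print(fp.closed)  # the file has been closed on leaving the block
print(first_line.strip())
```

The file object still exists after the block, but it is closed, so any further read on it would raise a `ValueError`.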

In [2]:
with open('btc-market-price.csv', 'r') as fp:
    print(fp)

<_io.TextIOWrapper name='btc-market-price.csv' mode='r' encoding='cp1252'>


Once the file is opened, we can read its content as follows:

In [3]:
with open('btc-market-price.csv', 'r') as fp:
    for index, line in enumerate(fp):
        # read just the first 10 lines, then stop
        if index >= 10:
            break
        print(index, line)

0 2/4/17 0:00,1099.169125

1 3/4/17 0:00,1141.813

2 4/4/17 0:00,?

3 5/4/17 0:00,1133.079314

4 6/4/17 0:00,-

5 7/4/17 0:00,-

6 8/4/17 0:00,1181.149838

7 9/4/17 0:00,1208.8005

8 10/4/17 0:00,1207.744875

9 11/4/17 0:00,1226.617038



How can we process the data from the file using pure Python? It involves a lot of manual work, for example, splitting the values by the correct separator:

In [4]:
with open('btc-market-price.csv', 'r') as fp:
    for index, line in enumerate(fp):
        # read just the first 10 lines, then stop
        if index >= 10:
            break
        timestamp, price = line.split(',')
        print(f"{timestamp}: ${price}")

2/4/17 0:00: $1099.169125

3/4/17 0:00: $1141.813

4/4/17 0:00: $?

5/4/17 0:00: $1133.079314

6/4/17 0:00: $-

7/4/17 0:00: $-

8/4/17 0:00: $1181.149838

9/4/17 0:00: $1208.8005

10/4/17 0:00: $1207.744875

11/4/17 0:00: $1226.617038
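
Splitting is only part of the manual work: the prices are still strings, and some rows hold `?` or `-` instead of a number. The sketch below shows one way this could be handled in pure Python. It assumes that both markers mean "no value"; the `MISSING_MARKERS` set and the `parse_line` helper are hypothetical names introduced for illustration, and an `io.StringIO` buffer with rows copied from the output above stands in for the real file.

```python
import io

# Rows copied from the output above; io.StringIO stands in for the real file
sample = io.StringIO(
    "2/4/17 0:00,1099.169125\n"
    "4/4/17 0:00,?\n"
    "6/4/17 0:00,-\n"
)

# Assumption: these placeholders mean that the price is missing
MISSING_MARKERS = {"?", "-"}


def parse_line(line):
    """Split one 'timestamp,price' line into (timestamp, float or None)."""
    timestamp, raw_price = line.rstrip("\n").split(",")
    price = None if raw_price in MISSING_MARKERS else float(raw_price)
    return timestamp, price


rows = [parse_line(line) for line in sample]
for timestamp, price in rows:
    print(timestamp, price)
```

Every new placeholder or extra column would need more code of this kind, which is the sort of bookkeeping that pandas is meant to take over.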



But what happens if we don't know the separator, as in the file `exam_review.csv`? Let's take a look at it:

In [11]:
!head exam_review.csv

'head' is not recognized as an internal or external command,
operable program or batch file.
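
The error above suggests the notebook was run on Windows, where the `head` shell command is not available. A portable alternative is to peek at the file from Python itself. The sketch below defines a hypothetical `head()` helper built on `itertools.islice`; the throwaway file it reads is a stand-in for `exam_review.csv`, filled with two sample lines in the same `>`-separated style.

```python
import os
import tempfile
from itertools import islice


def head(path, n=10, encoding="utf-8"):
    """Return the first n lines of a text file, without trailing newlines."""
    with open(path, "r", encoding=encoding) as fp:
        # islice stops after n lines, so the rest of the file is never read
        return [line.rstrip("\n") for line in islice(fp, n)]


# Hypothetical stand-in file so the sketch runs on its own
sample_path = os.path.join(tempfile.mkdtemp(), "sample_review.csv")
with open(sample_path, "w", encoding="utf-8") as fp:
    fp.write("first_name>last_name>age>math_score>french_score\n")
    fp.write("Melvin>Scott>24>77>83\n")

preview = head(sample_path, n=5)
for line in preview:
    print(line)
```

Looking at a few raw lines this way is usually enough to spot which character separates the fields.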


In [13]:
pd.read_csv('exam_review.csv').head()

Unnamed: 0,Unnamed: 1,first_name>last_name>age>math_score>french_score
"Ray>Morley>18>""68","000"">""75","000"""
Melvin>Scott>24>77>83,,
Amirah>Haley>22>92>67,,
"Gerard>Mills>19>""78","000"">72",
Amy>Grimes>23>91>81,,
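
The result above is garbled because `pd.read_csv` splits on commas by default, while the fields in this file appear to be separated by `>`. As a minimal sketch, assuming `>` really is the separator, the `sep` parameter can be set explicitly. The rows below are copied from the output above (only those without quoted values), and an `io.StringIO` buffer stands in for the real file.

```python
import io

import pandas as pd

# A few rows taken from the output above; StringIO stands in for the file
sample = io.StringIO(
    "first_name>last_name>age>math_score>french_score\n"
    "Melvin>Scott>24>77>83\n"
    "Amirah>Haley>22>92>67\n"
    "Amy>Grimes>23>91>81\n"
)

# Assumption: '>' is the field separator in this file
df = pd.read_csv(sample, sep=">")
print(df)
```

With the separator stated, each field lands in its own column and the numeric columns are parsed as integers rather than text.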
