## Accessing Raw Data

This section shows how to load data from files stored in well-known text formats.

### Reading and Writing Files

In [None]:
%cat some_file.txt

In [None]:
fname = 'some_file.txt'

f = open(fname, 'r')
content = f.read()
f.close()

print(content)

In [None]:
fname = 'some_file.txt'
with open(fname, 'r') as f:
    content = f.read()

print(content)

In [None]:
fname = 'some_file.txt'
with open(fname, 'r') as f:
    content = f.readlines()

print(content)

In [None]:
fname = 'some_file.txt'
with open(fname, 'r') as f:
    for line in f:
        print(line, end='')  # each line already ends with '\n'

In [None]:
fname = 'some_file.txt'
with open(fname, 'r') as f:
    for i, line in enumerate(f, start=1):  # number lines from 1
        print("Line {}: {}".format(i, line.strip()))
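The cells above only read from a file. As a complement, here is a minimal sketch of writing one. The file name `output.txt` and the sample lines are hypothetical, and the file is created in a temporary directory so that nothing existing is overwritten.

```python
import os
import tempfile

# Hypothetical output path inside a fresh temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), 'output.txt')

lines = ['first line', 'second line', 'third line']

# Mode 'w' creates the file, or truncates it if it already exists.
with open(out_path, 'w', encoding='utf-8') as f:
    for line in lines:
        f.write(line + '\n')  # write() does not append a newline by itself

# Mode 'a' appends to the end of the file instead of truncating it.
with open(out_path, 'a', encoding='utf-8') as f:
    f.write('appended line\n')

# Read the file back to check what was written.
with open(out_path, 'r', encoding='utf-8') as f:
    written = f.read().splitlines()

print(written)
```

As with reading, the `with` statement closes the file even if an error occurs inside the block.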

### CSV Files

Comma Separated Values

CSV is mainly used to import and export data from relational databases and Excel spreadsheets.

In [None]:
%cat data.csv

In [None]:
import csv

fname = 'data.csv'

# newline='' lets the csv module handle line endings inside quoted fields.
with open(fname, 'r', newline='') as f:
    data_reader = csv.reader(f, delimiter=',')
    headers = next(data_reader)
    print("Headers = {}".format(headers))
    for line in data_reader:
        print(line)

In [None]:
fname = 'data_no_header.csv'

with open(fname, 'r', newline='') as f:
    data_reader = csv.reader(f, delimiter=',')
    for line in data_reader:
        print(line)

In [None]:
fname = 'data.csv'

with open(fname, 'r', newline='') as f:
    data_reader = csv.reader(f, delimiter=',')
    headers = next(data_reader)
    data = []
    for line in data_reader:
        # Pair each header with the value in the same column.
        item = dict(zip(headers, line))
        data.append(item)

data
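The header-to-value mapping built by hand above is also available from the standard library as `csv.DictReader`. The sketch below uses a small in-memory sample instead of `data.csv`; the column names and rows are hypothetical stand-ins, since the real file contents are not shown here.

```python
import csv
import io

# Hypothetical sample standing in for the contents of data.csv.
sample = "name,year\nToy Story,1995\nThe Matrix,1999\n"

# io.StringIO behaves like an open text file, so csv can read from it.
with io.StringIO(sample) as f:
    reader = csv.DictReader(f)  # the first row is used as the field names
    rows = [dict(row) for row in reader]

print(rows)
```

Note that every value comes back as a string (`'1995'`, not `1995`), so numeric columns still need an explicit conversion.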

### JSON

JavaScript Object Notation

The de facto standard format for data exchange.

In [None]:
%cat movie.json

In [None]:
import json

fname = 'movie.json'
with open(fname, 'r') as f:
    content = f.read()
    movie = json.loads(content)

movie

In [None]:
import json

fname = 'movie.json'
with open(fname, 'r') as f:
    movie_alt = json.load(f)

In [None]:
movie == movie_alt

In [None]:
print(json.dumps(movie, indent=4))

In [None]:
%cat movies-90s.jsonl

In [None]:
import json

fname = 'movies-90s.jsonl'

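with open(fname, 'r') as f:
    for line in f:
        try:
            movie = json.loads(line)
            print(movie['title'])
        except (json.JSONDecodeError, KeyError):
            # Skip lines that are not valid JSON or have no 'title' field.
            continue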
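For completeness, a JSON Lines file can be produced by writing one `json.dumps` result per line. The following is a rough sketch under stated assumptions: the records are made up, and an in-memory buffer stands in for a real `.jsonl` file. It also counts malformed lines rather than ignoring them silently.

```python
import io
import json

# Hypothetical records; a real file would hold one movie per line.
records = [
    {'title': 'Movie A', 'year': 1990},
    {'title': 'Movie B', 'year': 1995},
]

# Write one JSON document per line (an in-memory buffer replaces the file).
buffer = io.StringIO()
for record in records:
    buffer.write(json.dumps(record, ensure_ascii=False) + '\n')

# Add a deliberately broken line to exercise the error handling.
buffer.write('{not valid json}\n')
buffer.seek(0)

titles = []
skipped = 0
for line in buffer:
    try:
        titles.append(json.loads(line)['title'])
    except (json.JSONDecodeError, KeyError):
        skipped += 1  # count bad lines instead of hiding them

print(titles, skipped)
```

Catching only `json.JSONDecodeError` and `KeyError` keeps unrelated problems, such as a `KeyboardInterrupt`, from being swallowed.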


### Pickles: Python Object Serialization

In [None]:
with open('movie.json', 'r') as f:
    content = f.read()
    data = json.loads(content)

data

In [None]:
type(data)

In [None]:
import pickle

with open('data.pickle', 'wb') as f:
    pickle.dump(data, f)

In [None]:
%cat data.pickle

In [None]:
with open('data.pickle', 'rb') as f:
    data = pickle.load(f)

data

In [None]:
type(data)
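One reason to use pickle rather than JSON is that it preserves Python-specific types. The sketch below uses a made-up record containing a `set` and a `tuple` to illustrate the difference. Keep in mind that `pickle.load` can execute arbitrary code, so it should only be used on data from a trusted source.

```python
import json
import pickle

# Hypothetical record with types that JSON cannot represent directly.
record = {'title': 'Example', 'tags': {'drama', 'crime'}, 'runtime': (2, 22)}

# Pickle round trip: the set and the tuple come back unchanged.
restored = pickle.loads(pickle.dumps(record))
print(type(restored['tags']), type(restored['runtime']))

# JSON refuses to serialize a set.
try:
    json.dumps(record)
    json_failed = False
except TypeError:
    json_failed = True

# JSON accepts a tuple but returns it as a list.
as_json = json.loads(json.dumps({'runtime': (2, 22)}))
print(json_failed, as_json)
```

The trade-off is portability: JSON can be read from almost any language, whereas a pickle file is specific to Python.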