# Downloading and Parsing CSV Files with Python


* Understand the CSV file format and what a CSV file contains
* Be able to read CSV files with a library



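## Assignment Goals

* Compare what "File I/O" and "CSV Reader" read from the example file, and describe how the results differ

* Based on the output from the example file:
    1. Extract every time for trip 1 (the `班次1` column)
    2. Store every time for trip 1 in a single data structure
    3. Store trips 1 through 5 and all of their times in a data structure that keeps each trip separate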


### Compare what "File I/O" and "CSV Reader" read from the example file

In [1]:
import csv

# file I/O
file_name = './Data/example.csv'

with open(file_name, 'r') as rf:
    file_content = rf.read()

# CSV Reader
with open(file_name, newline='') as csv_file:
    rows = csv.reader(csv_file)
#     for row in rows:
#         print(row)

print('file I/O read content type: {}'.format(type(file_content)))
print('CSV Reader content type: {}'.format(type(rows)))

file I/O read content type: <class 'str'>
CSV Reader content type: <class '_csv.reader'>
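
The type names alone hide the practical difference: `rf.read()` returns the whole file as one string, while `csv.reader` is a lazy iterator that yields one list of fields per row and has to be consumed while the file is still open. The following is a minimal sketch of that difference on an in-memory sample rather than on `example.csv`. The `stop` column and the stop names are hypothetical, since the real file's first column is not shown here; only the `班次` column names come from the notebook.

```python
import csv
import io

# Hypothetical stand-in for example.csv (the real file contents are not shown).
# The first data row has a quoted field that itself contains a comma.
sample = 'stop,班次1,班次2\r\n"Main St, North Exit",06:30,07:50\r\nSecond St,06:32,07:52\r\n'

# File I/O style: the content is one big string. Splitting it by hand
# treats the comma inside the quoted field as a separator.
text = io.StringIO(sample, newline='').read()
naive_rows = [line.split(',') for line in text.splitlines()]

# csv.reader style: each row arrives as a list of strings, with quoting handled.
parsed_rows = list(csv.reader(io.StringIO(sample, newline='')))

print(type(text).__name__)   # str
print(len(naive_rows[1]))    # 4 (the quoted field was split in two)
print(parsed_rows[1])        # ['Main St, North Exit', '06:30', '07:50']
```

Wrapping the reader in `list(...)` inside the `with` block is one way to keep the rows available after the file has been closed.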


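### Based on the output from the example file:

1. Extract every time for trip 1 (the `班次1` column)
2. Store every time for trip 1 in a single data structure
3. Store trips 1 through 5 and all of their times in a data structure that keeps each trip separate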

In [2]:
import csv

# Open the CSV file (same path as file_name in the first cell)
with open('./Data/example.csv', newline='') as csvfile:
    # Read the CSV file contents
    rows = csv.reader(csvfile)
#     # Print the 班次1 column of each row in a loop
#     for idx, row in enumerate(rows):
#         if idx == 0:
#             target_idx = row.index('班次1')
#         print(row[target_idx])

In [3]:
# 1. Extract every time for trip 1 (the 班次1 column)
# 2. Store every time for trip 1 in a single data structure (a list)

prefix = '班次'
number = 1
query_number = prefix + str(number)
all_times = list()
with open(file_name, newline='') as csvfile:
    rows = csv.reader(csvfile)
    for idx, row in enumerate(rows):
        if idx == 0:
            target_idx = row.index(query_number)
        else:
            all_times.append(row[target_idx])

print('{} times:\n{}'.format(query_number, all_times))

班次1 times:
['06:30', '06:32', '06:33', '06:34', '06:36', '06:38', '06:39', '06:41', '06:42', '06:43', '06:44', '06:45', '06:46', '06:47', '06:48', '06:49', '06:50', '06:51', '06:53', '06:55', '06:57', '06:58', '07:00', '07:01', '07:03', '07:05', '07:07', '07:09', '07:10', '07:12', '07:14', '07:16', '07:18', '07:20', '07:21', '07:23', '07:24', '07:25', '07:00', '07:02', '07:03', '07:04', '07:06', '07:08', '07:09', '07:11', '07:12', '07:13', '07:14', '07:15', '07:16', '07:17', '07:18', '07:19', '07:20', '07:21', '07:23', '07:25', '07:27', '07:28', '07:30', '07:31', '07:33', '07:35', '07:37', '07:39', '07:40', '07:42', '07:44', '07:46', '07:48', '07:49', '07:51', '07:53', '07:54', '07:55']
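
As an alternative to looking up the column position with `row.index`, `csv.DictReader` maps each row to its header names, so a column can be selected by name directly. This is a sketch only: the sample data and the helper name `column_values` are hypothetical, and it assumes the first row of the file is the header, as in the cells above.

```python
import csv
import io

# Hypothetical stand-in for example.csv; only the 班次 column names
# are taken from the notebook.
sample = 'stop,班次1,班次2\nA,06:30,07:50\nB,06:32,07:52\nC,06:33,07:53\n'


def column_values(csv_file, column):
    """Return every value of one named column as a list of strings."""
    reader = csv.DictReader(csv_file)  # the first row supplies the field names
    return [row[column] for row in reader]


trip1_times = column_values(io.StringIO(sample, newline=''), '班次1')
print(trip1_times)  # ['06:30', '06:32', '06:33']
```

With the real file, the same helper could be called as `column_values(csvfile, '班次1')` inside a `with open(file_name, newline='') as csvfile:` block.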


In [4]:
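# 3. Store trips 1 through 5 and all of their times in a data structure that keeps each trip separate

prefix = '班次'
all_numbers = {prefix + str(i): list() for i in range(1, 6)}
with open(file_name, newline='') as csvfile:
    rows = csv.reader(csvfile)
    for idx, row in enumerate(rows):
        if idx == 0:
            # Header row: look up the column index of each trip
            all_target_idx = {number: row.index(number) for number in all_numbers.keys()}
        else:
            # Data row: append each trip's time to its own list
            # (col_idx avoids shadowing the row counter idx)
            for k, col_idx in all_target_idx.items():
                all_numbers[k].append(row[col_idx])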

In [5]:
print(all_numbers)

{'班次1': ['06:30', '06:32', '06:33', '06:34', '06:36', '06:38', '06:39', '06:41', '06:42', '06:43', '06:44', '06:45', '06:46', '06:47', '06:48', '06:49', '06:50', '06:51', '06:53', '06:55', '06:57', '06:58', '07:00', '07:01', '07:03', '07:05', '07:07', '07:09', '07:10', '07:12', '07:14', '07:16', '07:18', '07:20', '07:21', '07:23', '07:24', '07:25', '07:00', '07:02', '07:03', '07:04', '07:06', '07:08', '07:09', '07:11', '07:12', '07:13', '07:14', '07:15', '07:16', '07:17', '07:18', '07:19', '07:20', '07:21', '07:23', '07:25', '07:27', '07:28', '07:30', '07:31', '07:33', '07:35', '07:37', '07:39', '07:40', '07:42', '07:44', '07:46', '07:48', '07:49', '07:51', '07:53', '07:54', '07:55'], '班次2': ['07:50', '07:52', '07:53', '07:55', '07:57', '07:59', '08:01', '08:03', '08:04', '08:05', '08:06', '08:07', '08:08', '08:09', '08:10', '08:11', '08:12', '08:13', '08:15', '08:17', '08:19', '08:21', '08:23', '08:24', '08:26', '08:28', '08:30', '08:32', '08:33', '08:35', '08:37', '08:39', '08:41', '
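
The same dictionary of lists can also be built by transposing the rows with `zip(*rows)`, which turns a list of rows into a sequence of columns. The sketch below runs on a small hypothetical sample with two trips instead of five. It assumes every row has the same number of fields, because `zip` stops at the shortest row.

```python
import csv
import io

# Hypothetical stand-in for example.csv with only two trips.
sample = 'stop,班次1,班次2\nA,06:30,07:50\nB,06:32,07:52\n'

rows = list(csv.reader(io.StringIO(sample, newline='')))
header, body = rows[0], rows[1:]

# zip(*body) transposes rows into columns; pair each column with its header.
columns = {name: list(values) for name, values in zip(header, zip(*body))}

# Keep only the trip columns, in the same 班次N naming used above.
wanted = ['班次' + str(i) for i in range(1, 3)]
trips = {name: columns[name] for name in wanted}
print(trips)  # {'班次1': ['06:30', '06:32'], '班次2': ['07:50', '07:52']}
```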

In [6]:
import pandas as pd

pd.DataFrame(all_numbers)

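|     | 班次1 | 班次2 | 班次3 | 班次4 | 班次5 |
|-----|-------|-------|-------|-------|-------|
| 0   | 06:30 | 07:50 | 09:10 | 10:30 | 12:00 |
| 1   | 06:32 | 07:52 | 09:12 | 10:32 | 12:02 |
| 2   | 06:33 | 07:53 | 09:13 | 10:33 | 12:03 |
| 3   | 06:34 | 07:55 | 09:15 | 10:35 | 12:05 |
| 4   | 06:36 | 07:57 | 09:17 | 10:37 | 12:07 |
| ... | ...   | ...   | ...   | ...   | ...   |
| 71  | 07:49 | 09:43 | 11:03 | 12:23 | 13:53 |
| 72  | 07:51 | 09:45 | 11:05 | 12:25 | 13:55 |
| 73  | 07:53 | 09:47 | 11:07 | 12:27 | 13:57 |
| 74  | 07:54 | 09:49 | 11:09 | 12:29 | 13:59 |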
