# Python Essentials by Paul Chao #
## Python I/O Practices ##

## Image File ##

### Store a File from the Web ###

In [45]:
import requests

In [3]:
help(requests.get)

Help on function get in module requests.api:

get(url, params=None, **kwargs)
    Sends a GET request.
    
    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary or bytes to be sent in the query string for the :class:`Request`.
    :param \*\*kwargs: Optional arguments that ``request`` takes.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response



In [3]:
import requests
resp = requests.get(url="https://www.nasa.gov/sites/default/files/thumbnails/image/pia21618b-16.jpg")

if resp.status_code == 200:
    with open("data.jpg", "wb") as f:
        f.write(resp.content)
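
The status check above does nothing at all when the request fails, and `resp.content` holds the whole body in memory. Below is a minimal sketch of a more defensive variant. The helper name `save_url` and its `get` parameter are assumptions made for illustration (they are not part of requests); `get` exists only so the function can be tried without network access. The calls it relies on are `raise_for_status()`, the `timeout` and `stream` arguments, and `iter_content()`.

```python
def save_url(url, dest, get=None, chunk_size=8192):
    """Stream `url` into the file `dest` and return `dest`.

    `save_url` and `get` are illustrative names; `get` defaults to requests.get.
    """
    if get is None:
        import requests  # imported lazily so a stand-in `get` can be passed
        get = requests.get

    resp = get(url, stream=True, timeout=30)
    resp.raise_for_status()  # raises on 4xx/5xx instead of failing silently

    with open(dest, "wb") as f:
        # iter_content yields the body in pieces rather than all at once
        for chunk in resp.iter_content(chunk_size=chunk_size):
            f.write(chunk)
    return dest
```

With requests installed, `save_url(image_url, "data.jpg")` would be the equivalent of the cell above, where `image_url` stands for the NASA address used there.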


### Open an Image File ###

In [6]:
from PIL import Image    # package pillow
with Image.open("data.jpg") as myimage:
    myimage.show()

### ZIP Format: Build Your Own Decompressor ###

[file sample](https://sheethub.com/data.gov.tw/政府資料開放平臺資料集清單/i/44/台灣電力股份有限公司)

In [71]:
zip_resp = requests.get(url="http://data.taipower.com.tw/opendata/apply/file/d003001/001.zip")
if zip_resp.status_code == 200:
    with open("twe-001.zip", "wb") as f:
        f.write(zip_resp.content)

In [1]:
import zipfile

In [72]:
decompressed = zipfile.ZipFile('twe-001.zip','r')
decompressed

<zipfile.ZipFile filename='twe-001.zip' mode='r'>

In [4]:
dir(decompressed)

['NameToInfo',
 '_RealGetContents',
 '__class__',
 '__del__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__enter__',
 '__eq__',
 '__exit__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_allowZip64',
 '_comment',
 '_didModify',
 '_extract_member',
 '_filePassed',
 '_fileRefCnt',
 '_fpclose',
 '_lock',
 '_open_to_write',
 '_sanitize_windows_name',
 '_seekable',
 '_windows_illegal_name_trans_table',
 '_write_end_record',
 '_writecheck',
 '_writing',
 'close',
 'comment',
 'compression',
 'debug',
 'extract',
 'extractall',
 'filelist',
 'filename',
 'fp',
 'getinfo',
 'infolist',
 'mode',
 'namelist',
 'open',
 'printdir',
 'pwd',
 'read',
 'setpassword',
 'start_dir',
 'testzip',
 'write',
 'writestr']

In [73]:
decompressed.printdir()

File Name                                             Modified             Size
╛·ª~Ñ¡ºí│µ╗∙.txt                               2017-05-03 13:15:16         4214
╛·ª~Ñ╬ñß╝╞.txt                                 2017-05-03 13:15:28         6004
╛·ª~ªµ╖~ºOí]Ñ╗└╔ñúªAº≤╖sí^.txt                 2017-05-03 13:15:42       296040


In [52]:
memberlist = decompressed.namelist()
memberlist

['╛·ª~Ñ¡ºí│µ╗∙.txt', '╛·ª~Ñ╬ñß╝╞.txt', '╛·ª~ªµ╖~ºOí]Ñ╗└╔ñúªAº≤╖sí^.txt']
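
The member names look garbled because `zipfile` decodes a name as CP437 when the archive does not flag it as UTF-8. Below is a minimal sketch of undoing that, assuming the archive was created on a Traditional Chinese Windows system and the names are really Big5. That encoding is a guess, not something the archive states, and `fix_zip_name` is a hypothetical helper name.

```python
def fix_zip_name(name, encoding="big5"):
    """Re-decode a member name that zipfile decoded as CP437.

    `encoding` is an assumption about how the archive was created.
    """
    try:
        return name.encode("cp437").decode(encoding)
    except (UnicodeEncodeError, UnicodeDecodeError):
        return name  # the guess does not fit, so keep the name unchanged


# Simulate a garbled name: Big5 bytes read back as CP437
garbled = "電力.txt".encode("big5").decode("cp437")
print(fix_zip_name(garbled))  # prints 電力.txt
```

Applied to this notebook, `[fix_zip_name(n) for n in decompressed.namelist()]` would give the readable names if the Big5 assumption holds. Python 3.11 and later also accept a `metadata_encoding` argument on `zipfile.ZipFile` for the same purpose.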

In [53]:
decompressed.extract(path='./twe/', member=memberlist[1])

'twe\\╛·ª~Ñ╬ñß╝╞.txt'

In [74]:
for info in decompressed.infolist():
    print(info.filename)
    print('\tComment:\t', info.comment)
    print('\tSystem:\t\t', info.create_system, '(0 = Windows, 3 = Unix)')
    print('\tZIP version:\t', info.create_version)
    print('\tCompressed:\t', info.compress_size, 'bytes')
    print('\tUncompressed:\t', info.file_size, 'bytes')

╛·ª~Ñ¡ºí│µ╗∙.txt
	Comment:	 b''
	System:		 0 (0 = Windows, 3 = Unix)
	ZIP version:	 63
	Compressed:	 1139 bytes
	Uncompressed:	 4214 bytes
╛·ª~Ñ╬ñß╝╞.txt
	Comment:	 b''
	System:		 0 (0 = Windows, 3 = Unix)
	ZIP version:	 63
	Compressed:	 1863 bytes
	Uncompressed:	 6004 bytes
╛·ª~ªµ╖~ºOí]Ñ╗└╔ñúªAº≤╖sí^.txt
	Comment:	 b''
	System:		 0 (0 = Windows, 3 = Unix)
	ZIP version:	 63
	Compressed:	 48941 bytes
	Uncompressed:	 296040 bytes


The zipfile module does not currently handle multi-disk ZIP files. It can handle ZIP files that use the ZIP64 extensions, that is, archives larger than 4 GiB.
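
The cells above keep the archive open in the `decompressed` variable. Below is a small sketch of the same operations with a `with` block, which closes the archive automatically. It builds a throwaway archive in memory (the member name `hello.txt` is made up for the example) so that it runs without the Taipower download.

```python
import io
import zipfile

# Build a small archive in memory so the example needs no download
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("hello.txt", "hello zip\n")

# Reopen it for reading; the with block closes the archive on exit
with zipfile.ZipFile(buffer, "r") as zf:
    names = zf.namelist()
    text = zf.read("hello.txt").decode("utf-8")  # read one member without extracting it
    bad_member = zf.testzip()  # None when every member passes its CRC check

print(names, repr(text), bad_member)
```

`read()` returns the member's bytes directly, which is handy when the extracted file is not needed on disk.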

### Comma-Separated Values (CSV) Files ###

[file sample](http://www.taipower.com.tw/content/announcement/ann01.aspx?BType=31)

In [79]:
import pandas as pd
help(pd)

Help on package pandas:

NAME
    pandas

DESCRIPTION
    pandas - a powerful data analysis and manipulation library for Python
    
    **pandas** is a Python package providing fast, flexible, and expressive data
    structures designed to make working with "relational" or "labeled" data both
    easy and intuitive. It aims to be the fundamental high-level building block for
    doing practical, **real world** data analysis in Python. Additionally, it has
    the broader goal of becoming **the most powerful and flexible open source data
    analysis / manipulation tool available in any language**. It is already well on
    its way toward this goal.
    
    Main Features
    -------------
    Here are just a few of the things that pandas does well:
    
      - Easy handling of missing data in floating point as well as non-floating
        point data
      - Size mutability: columns can be inserted and deleted from DataFrame and
        higher dimensional objects
      - Automatic and

In [81]:
df = pd.read_csv("105NewTaipeiCity.csv")

In [82]:
df

Unnamed: 0,Ym,code1,gen,cunli code,cunli,area code,area,city code,city
0,10501,A6517-02-005,8011,11,下福村,65000170,林口區,65000,新北市
1,10503,A6517-02-005,9029,11,下福村,65000170,林口區,65000,新北市
2,10505,A6517-02-005,8966,11,下福村,65000170,林口區,65000,新北市
3,10502,A6517-02-005,77131,11,下福村,65000170,林口區,65000,新北市
4,10504,A6517-02-005,85069,11,下福村,65000170,林口區,65000,新北市
5,10506,A6517-02-005,94433,11,下福村,65000170,林口區,65000,新北市
6,10501,A6509-31-006,165055,8,鳶山里,65000090,三峽區,65000,新北市
7,10503,A6509-31-006,181601,8,鳶山里,65000090,三峽區,65000,新北市
8,10505,A6509-31-006,175888,8,鳶山里,65000090,三峽區,65000,新北市
9,10501,A6508-05-009,48145,16,鳳福里,65000080,鶯歌區,65000,新北市


In [101]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 38968 entries, 0 to 38967
Data columns (total 9 columns):
Ym            38968 non-null int64
code1         38968 non-null object
gen           38968 non-null int64
cunli code    38968 non-null int64
cunli         38968 non-null object
area code     38968 non-null int64
area          38968 non-null object
city code     38968 non-null int64
city          38968 non-null object
dtypes: int64(5), object(4)
memory usage: 2.7+ MB


In [103]:
df.head(5)

Unnamed: 0,Ym,code1,gen,cunli code,cunli,area code,area,city code,city
0,10501,A6517-02-005,8011,11,下福村,65000170,林口區,65000,新北市
1,10503,A6517-02-005,9029,11,下福村,65000170,林口區,65000,新北市
2,10505,A6517-02-005,8966,11,下福村,65000170,林口區,65000,新北市
3,10502,A6517-02-005,77131,11,下福村,65000170,林口區,65000,新北市
4,10504,A6517-02-005,85069,11,下福村,65000170,林口區,65000,新北市


In [113]:
new_df = df.loc[1:100, ['gen', 'cunli', 'city']]  # .ix was removed in pandas 1.0; .loc selects by label
new_df

Unnamed: 0,gen,cunli,city
1,9029,下福村,新北市
2,8966,下福村,新北市
3,77131,下福村,新北市
4,85069,下福村,新北市
5,94433,下福村,新北市
6,165055,鳶山里,新北市
7,181601,鳶山里,新北市
8,175888,鳶山里,新北市
9,48145,鳳福里,新北市
10,56645,鳳福里,新北市


In [117]:
new_df[new_df['cunli']=='鳶山里']

Unnamed: 0,gen,cunli,city
6,165055,鳶山里,新北市
7,181601,鳶山里,新北市
8,175888,鳶山里,新北市
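
For comparison, the same kind of filter can be written with only the standard library `csv` module. This is a sketch, not a replacement for the pandas cells: it uses three rows and a subset of the columns copied from the output above, held in a string instead of the real `105NewTaipeiCity.csv` file.

```python
import csv
import io

# A few rows (subset of columns) copied from the output above
sample = io.StringIO(
    "Ym,code1,gen,cunli\n"
    "10505,A6517-02-005,8966,下福村\n"
    "10501,A6509-31-006,165055,鳶山里\n"
    "10503,A6509-31-006,181601,鳶山里\n"
)

reader = csv.DictReader(sample)  # each row becomes a dict keyed by the header
matches = [row for row in reader if row["cunli"] == "鳶山里"]

# csv yields strings only, so numeric columns need an explicit conversion
total_gen = sum(int(row["gen"]) for row in matches)
print(len(matches), total_gen)
```

Unlike `pd.read_csv`, `csv.DictReader` does no type inference, which is why `gen` has to be converted with `int()` before summing.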


In [118]:
pure_dict = dict(df.max())

In [119]:
pure_dict

{'Ym': 10506,
 'area': '鶯歌區',
 'area code': 65000290,
 'city': '新北市',
 'city code': 65000,
 'code1': 'A6529-02-008',
 'cunli': '龜山里',
 'cunli code': 126,
 'gen': 2061583}

## Advanced : Something More ##

### XLSX (Microsoft Excel Open XML) Files ###

In [54]:
df = pd.read_excel("data.xls", sheet_name="Data")  # the keyword is sheet_name; the older sheetname spelling was removed
df.head()

Unnamed: 0,"Stock Market Data Used in ""Irrational Exuberance"" Princeton University Press, 2000, 2005, 2015, updated",Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,Cyclically,Unnamed: 11,Unnamed: 12,Unnamed: 13,Unnamed: 14,Unnamed: 15,Unnamed: 16
0,Robert J. Shiller,,,,,,,,,,Adjusted,,,,,,
1,,,,,,,,,,,Price,,,,,,
2,,,,,Consumer,,,,,,Earnings,,,,,,
3,,S&P,,,Price,,Long,,,,Ratio,,,,,,
4,,Comp.,Dividend,Earnings,Index,Date,Interest,Real,Real,Real,P/E10 or,,,,,,


In [56]:
df["Unnamed: 1"].head()

0      NaN
1      NaN
2      NaN
3      S&P
4    Comp.
Name: Unnamed: 1, dtype: object

In [58]:
type(df["Unnamed: 1"])

pandas.core.series.Series

### Convert to a Dictionary ###

In [59]:
col_dict = df["Unnamed: 1"].to_dict()  # avoid naming the variable `dict`, which would shadow the built-in
col_dict

{0: nan,
 1: nan,
 2: nan,
 3: 'S&P',
 4: 'Comp.',
 5: 'P',
 6: 4.44,
 7: 4.5,
 8: 4.61,
 9: 4.74,
 10: 4.86,
 11: 4.82,
 12: 4.73,
 13: 4.79,
 14: 4.84,
 15: 4.59,
 16: 4.64,
 17: 4.74,
 18: 4.86,
 19: 4.88,
 20: 5.04,
 21: 5.18,
 22: 5.18,
 23: 5.13,
 24: 5.1,
 25: 5.04,
 26: 4.95,
 27: 4.97,
 28: 4.95,
 29: 5.07,
 30: 5.11,
 31: 5.15,
 32: 5.11,
 33: 5.04,
 34: 5.05,
 35: 4.98,
 36: 4.97,
 37: 4.97,
 38: 4.59,
 39: 4.19,
 40: 4.04,
 41: 4.42,
 42: 4.66,
 43: 4.8,
 44: 4.73,
 45: 4.6,
 46: 4.48,
 47: 4.46,
 48: 4.46,
 49: 4.47,
 50: 4.54,
 51: 4.53,
 52: 4.57,
 53: 4.54,
 54: 4.54,
 55: 4.53,
 56: 4.59,
 57: 4.65,
 58: 4.47,
 59: 4.38,
 60: 4.39,
 61: 4.41,
 62: 4.37,
 63: 4.3,
 64: 4.37,
 65: 4.37,
 66: 4.46,
 67: 4.52,
 68: 4.51,
 69: 4.34,
 70: 4.18,
 71: 4.15,
 72: 4.1,
 73: 3.93,
 74: 3.69,
 75: 3.67,
 76: 3.6,
 77: 3.58,
 78: 3.55,
 79: 3.34,
 80: 3.17,
 81: 2.94,
 82: 2.94,
 83: 2.73,
 84: 2.85,
 85: 3.05,
 86: 3.24,
 87: 3.31,
 88: 3.26,
 89: 3.25,
 90: 3.25,
 91: 3.18,
 92: 

In [60]:
type(col_dict)

dict

## JavaScript Object Notation (JSON) Format ##

In [7]:
json_content = {
   "Employee": [
      {
         "id":"1",
         "Name": "Paul",
         "Sal": "1000",
      },
      {

         "id":"2",
         "Name": "Joni",
         "Sal": "2000",
      }
   ]
}

In [20]:
type(json_content)

dict

In [21]:
import json
with open('data.json', 'w') as f:
    json.dump(json_content, f)

In [23]:
import json

with open('data.json') as json_data:
    d = json.load(json_data)
    print(d)

{'Employee': [{'id': '1', 'Name': 'Paul', 'Sal': '1000'}, {'id': '2', 'Name': 'Joni', 'Sal': '2000'}]}


In [37]:
json_a = pd.read_json("data.json", typ="series")
json_a

Employee    [{'id': '1', 'Name': 'Paul', 'Sal': '1000'}, {...
dtype: object

In [44]:
type(json_a)

pandas.core.series.Series

In [45]:
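list_a = json_a.iloc[0]  # first element by position; json_a[0] on a string-labelled Series is deprecated in recent pandas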

In [46]:
list_a

[{'Name': 'Paul', 'Sal': '1000', 'id': '1'},
 {'Name': 'Joni', 'Sal': '2000', 'id': '2'}]

In [47]:
list_a[0]['Name']

'Paul'
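
The nested structure can also be handled with the `json` module alone. The sketch below parses the same employee data from a string rather than from `data.json`, so it runs on its own. The names `by_id`, `total_sal` and `pretty` are chosen for the example.

```python
import json

text = (
    '{"Employee": ['
    '{"id": "1", "Name": "Paul", "Sal": "1000"}, '
    '{"id": "2", "Name": "Joni", "Sal": "2000"}]}'
)
data = json.loads(text)  # JSON object -> dict, JSON array -> list

# Index the employees by id for direct lookup
by_id = {emp["id"]: emp for emp in data["Employee"]}

# The salaries are stored as strings, so convert before doing arithmetic
total_sal = sum(int(emp["Sal"]) for emp in data["Employee"])

# indent makes the output readable; ensure_ascii=False keeps non-ASCII text as-is
pretty = json.dumps(data, indent=2, ensure_ascii=False)
print(by_id["2"]["Name"], total_sal)
```

`ensure_ascii=False` matters for data such as the Chinese place names used earlier, which would otherwise be written as `\uXXXX` escapes.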

## Reading PDF Files ##

In [4]:
!pip install pyPDF2

Collecting pyPDF2
  Downloading PyPDF2-1.26.0.tar.gz (77kB)
Building wheels for collected packages: pyPDF2
  Running setup.py bdist_wheel for pyPDF2: started
  Running setup.py bdist_wheel for pyPDF2: finished with status 'done'
  Stored in directory: C:\Users\User\AppData\Local\pip\Cache\wheels\86\6a\6a\1ce004a5996894d33d93e1fb1b67c30973dc945cc5875a1dd0
Successfully built pyPDF2
Installing collected packages: pyPDF2
Successfully installed pyPDF2-1.26.0


In [9]:
import PyPDF2
# https://docs.python.org/3/download.html
pdf_file = open('installing.pdf', 'rb')
read_pdf = PyPDF2.PdfFileReader(pdf_file)

In [10]:
number_of_pages = read_pdf.getNumPages()

In [39]:
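# Note: without parentheses this binds the method object itself, not the outline;
# call read_pdf.getOutlines() to retrieve the document outline.
outline = read_pdf.getOutlines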

In [43]:
outline

<bound method PdfFileReader.getOutlines of <PyPDF2.pdf.PdfFileReader object at 0x000001AFAF39F5C0>>

In [11]:
number_of_pages

50

In [13]:
page = read_pdf.getPage(0)
page

{'/Contents': {'/Filter': '/FlateDecode'},
 '/MediaBox': [0, 0, 612, 792],
 '/Parent': {'/Count': 6,
  '/Kids': [IndirectObject(84, 0),
   IndirectObject(100, 0),
   IndirectObject(124, 0),
   IndirectObject(130, 0),
   IndirectObject(135, 0),
   IndirectObject(140, 0)],
  '/Parent': {'/Count': 36,
   '/Kids': [IndirectObject(98, 0),
    IndirectObject(154, 0),
    IndirectObject(193, 0),
    IndirectObject(335, 0),
    IndirectObject(490, 0),
    IndirectObject(533, 0)],
   '/Parent': {'/Count': 50,
    '/Kids': [IndirectObject(806, 0), IndirectObject(807, 0)],
    '/Type': '/Pages'},
   '/Type': '/Pages'},
  '/Type': '/Pages'},
 '/Resources': {'/Font': {'/F38': {'/BaseFont': '/ZBSDVM+NimbusSanL-Bold',
    '/Encoding': {'/Differences': [2,
      '/fi',
      '/fl',
      30,
      '/grave',
      '/quotesingle',
      33,
      '/exclam',
      '/quotedbl',
      '/numbersign',
      37,
      '/percent',
      39,
      '/quoteright',
      '/parenleft',
      '/parenright',
      '/