# Loading and saving data

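Data usually comes from one of a handful of sources:

+ scraped directly from the web (that means writing a crawler, which is outside the scope of this article)
+ native Python data structures
+ JSON
+ a database
+ Excel files
+ CSV text files
+ pickle-serialized data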

Likewise, data is usually stored in one of these ways:

+ serialized with pickle
+ saved as JSON
+ saved to a database
+ saved as Excel
+ saved as CSV


pandas provides a convenient way to load data from each of the sources above.

```python
import pandas as pd
```
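
The list above also mentions native Python data structures, which the sections below do not cover separately. As a minimal sketch, assuming the same three people as in `people.json` typed in by hand, the `pd.DataFrame` constructor accepts either a list of dicts (one dict per row) or a dict of lists (one list per column):

```python
import pandas as pd

# One dict per row; a key missing from a row becomes NaN in that column
rows = [{"name": "Michael"}, {"name": "Andy", "age": 30}, {"name": "Justin", "age": 19}]
people_from_rows = pd.DataFrame(rows)

# One list per column; None in a numeric column is stored as NaN
columns = {"name": ["Michael", "Andy", "Justin"], "age": [None, 30, 19]}
people_from_columns = pd.DataFrame(columns)

print(people_from_rows)
# Both layouts describe the same table
print(people_from_rows.equals(people_from_columns))
```

Row-wise dicts are handy when records arrive one at a time (for example from a JSON API); column-wise lists fit data that is already grouped by field.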

## Reading and saving CSV files

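A CSV file is a text file in which a delimiter character separates the fields of each record and newlines separate the records.
Most of the data used in this article's examples comes from such a file.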
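
To make the definition concrete: the comma is only the most common delimiter. As a small sketch, with an inline semicolon-separated string standing in for a file (the data is made up for illustration), `read_csv` takes the delimiter through its `sep` parameter, and `io.StringIO` lets it read text that is already in memory:

```python
import io

import pandas as pd

# Hypothetical semicolon-delimited text: ';' separates fields, '\n' separates records
raw = "sepal_length;sepal_width;class\n5.1;3.5;Iris-setosa\n4.9;3.0;Iris-setosa\n"

# sep tells read_csv which character separates the fields
flowers = pd.read_csv(io.StringIO(raw), sep=";")
print(flowers)

# With the default sep=",", each whole line lands in a single column
one_column = pd.read_csv(io.StringIO(raw))
print(one_column.shape)
```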

As usual, we use the [iris](https://archive.ics.uci.edu/ml/machine-learning-databases/iris/) dataset. A copy is also available [in our project](https://github.com/TutorialForPython/python-datascience/blob/master/%E5%9F%BA%E6%9C%AC%E5%B7%A5%E5%85%B7/%E5%9F%BA%E6%9C%AC%E5%B7%A5%E5%85%B7pandas/source/iris.csv).

Let's first see what the data looks like:

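```python
# Print the header line and the first data row.
# readline() keeps the trailing newline and print() adds another, hence the blank lines.
with open("./source/iris.csv") as f:
    print(f.readline())
    print(f.readline())
```

```
sepal_length,sepal_width,petal_length,petal_width,class

5.1,3.5,1.4,0.2,Iris-setosa
```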

```python
iris_data = pd.read_csv("./source/iris.csv", encoding="utf-8")
iris_data[:5]  # first 5 rows
```

|   | sepal_length | sepal_width | petal_length | petal_width | class |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |


### Saving to CSV

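```python
# index=False: don't write the row index into the file.
# Note: this overwrites the original source/iris.csv.
iris_data.to_csv("source/iris.csv", index=False)
```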

```python
new_iris_data = pd.read_csv("source/iris.csv")
new_iris_data[:5]
```

|   | sepal_length | sepal_width | petal_length | petal_width | class |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
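
The `index=False` used when saving matters on the way back in. A minimal in-memory sketch, with a two-row stand-in for the iris data and `io.StringIO` instead of a file: with the default `index=True`, `to_csv` writes the row labels as an extra first column with an empty header, and `read_csv` then names that column `Unnamed: 0`.

```python
import io

import pandas as pd

df = pd.DataFrame({"sepal_length": [5.1, 4.9], "class": ["Iris-setosa", "Iris-setosa"]})

# to_csv() without a path returns the CSV text as a string
with_index = pd.read_csv(io.StringIO(df.to_csv()))
without_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))

print(list(with_index.columns))     # the written index comes back as a data column
print(list(without_index.columns))  # only the original columns

# If a file already contains the index column, index_col=0 turns it back into the index
restored = pd.read_csv(io.StringIO(df.to_csv()), index_col=0)
print(restored.equals(df))
```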


## Reading and saving JSON

First we need a JSON file, so we wrote a small example, [people.json](https://github.com/TutorialForPython/python-datascience/blob/master/%E5%9F%BA%E6%9C%AC%E5%B7%A5%E5%85%B7/%E5%9F%BA%E6%9C%AC%E5%B7%A5%E5%85%B7pandas/source/people.json). Here is its content:

```python
with open("./source/people.json") as f:
    print(f.readline())
```

```
[{"name":"Michael"},{"name":"Andy", "age":30},{"name":"Justin", "age":19}]
```



```python
people_from_jsonfile = pd.read_json("./source/people.json")
people_from_jsonfile
```

|   | age | name |
|---|---|---|
| 0 | NaN | Michael |
| 1 | 30.0 | Andy |
| 2 | 19.0 | Justin |


### Saving as JSON

To save as JSON, just use the `to_json` method. Called without a path, it returns the JSON text as a string:

```python
people_from_jsonfile.to_json()
```

```
'{"age":{"0":null,"1":30.0,"2":19.0},"name":{"0":"Michael","1":"Andy","2":"Justin"}}'
```

```python
# With a path, to_json writes the file instead of returning a string
people_from_jsonfile.to_json("./source/people_cp.json")

new_people_from_jsonfile = pd.read_json("./source/people_cp.json")
new_people_from_jsonfile
```

|   | age | name |
|---|---|---|
| 0 | NaN | Michael |
| 1 | 30.0 | Andy |
| 2 | 19.0 | Justin |
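
By default `to_json` writes a DataFrame column by column (`{column: {index: value}}`), which differs from the list-of-records layout of `people.json`. As a sketch, assuming you want the output to match that input layout, both `to_json` and `read_json` accept `orient="records"`. The string is wrapped in `io.StringIO` because recent pandas versions deprecate passing literal JSON text to `read_json`.

```python
import io
import json

import pandas as pd

people = pd.DataFrame([{"name": "Michael"}, {"name": "Andy", "age": 30}, {"name": "Justin", "age": 19}])

# One JSON object per row, like people.json; NaN is written as null
records_json = people.to_json(orient="records")
parsed = json.loads(records_json)  # parse it back to plain Python objects to inspect
print(parsed)

# Read the records back into a DataFrame
round_trip = pd.read_json(io.StringIO(records_json), orient="records")
print(round_trip)
```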


## Reading from and saving to a database (requires SQLAlchemy)

We test with sqlite3, which ships with Python.

First, create a database, again using the data from our `people.json`. For details on how to build it, see

The finished `people.db` also goes in the `./source` folder.

```python
from sqlalchemy import create_engine

conn = create_engine("sqlite:///source/people.db")
conn
```

```
Engine(sqlite:///source/people.db)
```

### Saving to a database

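```python
# Write the DataFrame to a table named "people"; the index is stored as a column named "index".
# if_exists="replace" lets the cell be re-run (the default, "fail", raises if the table exists).
people_from_jsonfile.to_sql("people", conn, if_exists="replace")
```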

```python
people_from_db = pd.read_sql("people", conn)
people_from_db
```

|   | index | age | name |
|---|---|---|---|
| 0 | 0 | NaN | Michael |
| 1 | 1 | 30.0 | Andy |
| 2 | 2 | 19.0 | Justin |
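
The saved table has an extra `index` column because `to_sql` writes the DataFrame index by default, and calling `to_sql` again on an existing table raises an error unless `if_exists` says otherwise. A minimal sketch, using a throwaway in-memory database through Python's built-in `sqlite3` module (pandas accepts a `sqlite3` connection directly, so SQLAlchemy is not needed for this particular case):

```python
import sqlite3

import pandas as pd

people = pd.DataFrame({"name": ["Michael", "Andy", "Justin"], "age": [None, 30, 19]})

con = sqlite3.connect(":memory:")  # throwaway in-memory database
people.to_sql("people", con, index=False)  # index=False: no extra "index" column

# The default if_exists="fail" refuses to touch an existing table
try:
    people.to_sql("people", con, index=False)
except ValueError as err:
    print("refused:", err)

# if_exists="append" adds rows; "replace" would drop and recreate the table
people.to_sql("people", con, index=False, if_exists="append")

saved = pd.read_sql_query("SELECT * FROM people", con)
con.close()
print(list(saved.columns), len(saved))
```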


### Reading a table from the database

```python
people_from_db = pd.read_sql("people", conn)
people_from_db
```

|   | index | age | name |
|---|---|---|---|
| 0 | 0 | NaN | Michael |
| 1 | 1 | 30.0 | Andy |
| 2 | 2 | 19.0 | Justin |


### Fetching data with a SQL query

```python
people_from_db_query = pd.read_sql_query("SELECT * FROM people", conn)
people_from_db_query
```

|   | index | age | name |
|---|---|---|---|
| 0 | 0 | NaN | Michael |
| 1 | 1 | 30.0 | Andy |
| 2 | 2 | 19.0 | Justin |
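
When only part of a table is needed, the filter can go into the SQL itself. A sketch assuming the same three rows, again in an in-memory `sqlite3` database; the value is passed through `params` instead of being pasted into the SQL string, and `sqlite3` uses `?` as its placeholder (other database drivers may use a different placeholder style).

```python
import sqlite3

import pandas as pd

people = pd.DataFrame({"name": ["Michael", "Andy", "Justin"], "age": [None, 30, 19]})

con = sqlite3.connect(":memory:")
people.to_sql("people", con, index=False)  # NaN is stored as NULL

# '?' is sqlite3's placeholder; params supplies its value safely
older = pd.read_sql_query(
    "SELECT name, age FROM people WHERE age > ? ORDER BY age", con, params=(20,)
)
con.close()
print(older)
```

Michael's missing age is stored as NULL, and `NULL > 20` is not true in SQL, so that row is filtered out as well.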


## Reading from and saving to Excel (requires openpyxl for `.xlsx`; older setups used xlrd)

Again we use the `people` data, this time in an Excel file placed in `./source`.

```python
# Using the ExcelFile class; '工作表1' is the sheet name ("Sheet 1" in a Chinese-locale Excel)
xls = pd.ExcelFile("./source/people.xlsx")
data_fromExcel = xls.parse("工作表1", index_col=None, na_values=["NA"])
data_fromExcel
```

|   | name | age |
|---|---|---|
| 0 | Michael | NaN |
| 1 | Andy | 30.0 |
| 2 | Justin | 19.0 |


```python
# The same in one step with read_excel
people_fromExcel = pd.read_excel("./source/people.xlsx", sheet_name="工作表1", index_col=None, na_values=["NA"])
people_fromExcel
```

|   | name | age |
|---|---|---|
| 0 | Michael | NaN |
| 1 | Andy | 30.0 |
| 2 | Justin | 19.0 |


### Saving to Excel

```python
iris_data.to_excel("source/iris.xlsx", sheet_name="Sheet1", index=False)
```

## Serializing data with pickle

To serialize a DataFrame, just call its `to_pickle` method:

```python
iris_data.to_pickle("source/iris.pickle")
```

Reading it back only takes `pd.read_pickle(path)`:

```python
iris_data_copy = pd.read_pickle("source/iris.pickle")
iris_data_copy[:5]
```

|   | sepal_length | sepal_width | petal_length | petal_width | class |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
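
One convenience worth knowing: `to_pickle` and `read_pickle` infer compression from the file extension by default, so a name ending in `.gz` produces a gzip-compressed pickle. A sketch with a hypothetical file name in a temporary directory. Also, only unpickle files from sources you trust, because loading a pickle can execute arbitrary code.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"sepal_length": [5.1, 4.9], "class": ["Iris-setosa", "Iris-setosa"]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "iris.pkl.gz")  # hypothetical name; ".gz" selects gzip
    df.to_pickle(path)               # compression="infer" is the default
    restored = pd.read_pickle(path)  # decompression is inferred the same way
    with open(path, "rb") as f:
        header = f.read(2)           # gzip files start with the bytes 1f 8b

print(restored.equals(df))
print(header == b"\x1f\x8b")
```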
