Data Analysis with Python——03 #83

hsipeng · 2019-08-23T09:08:38Z

Data Analysis with Python——03

pandas

pandas data structures

Series
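A Series consists of an array of data (of any NumPy data type) and an associated array of data labels, its index. The array representation and the index object are available through the Series's values and index attributes.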

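from pandas import Series, DataFrame

obj = Series([4, 7, -5, 3])
obj
# 0    4
# 1    7
# 2   -5
# 3    3
# dtype: int64

obj.values
# array([ 4,  7, -5,  3])

obj.index
# RangeIndex(start=0, stop=4, step=1)
# (older pandas versions displayed Int64Index([0, 1, 2, 3]))

obj2 = Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])  # specify the index explicitly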
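To make the link between values and labels concrete, here is a minimal sketch of label-based access on the same data as obj2. The variable names value_a, subset, and positive are illustrative only and do not come from the notes.

```python
import pandas as pd

# Same values and labels as obj2 in the notes.
obj2 = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])

# Look up a single value by its label.
value_a = obj2['a']

# Select several labels at once; the result is again a Series,
# ordered by the labels requested.
subset = obj2[['c', 'a', 'd']]

# Boolean filtering keeps each value attached to its label.
positive = obj2[obj2 > 0]

# The two parts of a Series: its labels and its values.
labels = list(obj2.index)
values = obj2.values.tolist()
```

Filtering and selection never detach a value from its label, which is what distinguishes a Series from a plain NumPy array.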

DataFrame

A DataFrame is a tabular data structure. It can be viewed as a dict of Series that all share the same index.

# A dict of equal-length lists
data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}

frame = DataFrame(data)


# A dict of dicts
# The outer keys become the columns; the inner keys become the row index
pop = {'Nevada': {2001: 2.4, 2002: 2.9},
       'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}

frame3 = DataFrame(pop)
frame3
#       Nevada  Ohio
# 2000     NaN   1.5
# 2001     2.4   1.7
# 2002     2.9   3.6
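The "dict of Series sharing one index" view can be checked directly. The sketch below rebuilds frame3 from the same nested dict and inspects its columns. The explicit sort_index() call is an added assumption so that the row order does not depend on the pandas version, and the names ohio, shared_index, and missing are illustrative.

```python
import pandas as pd

pop = {'Nevada': {2001: 2.4, 2002: 2.9},
       'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}

# Outer keys become columns; inner keys are combined into the row index.
# Sorting is added here so the row order is explicit.
frame3 = pd.DataFrame(pop).sort_index()

# Each column is a Series, and all columns share the same index.
ohio = frame3['Ohio']
shared_index = ohio.index.equals(frame3['Nevada'].index)

# Nevada has no entry for 2000, so that cell is NaN.
missing = frame3['Nevada'].isna()
```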

Index objects

When a Series or DataFrame is constructed, any array or other sequence of labels passed in is converted to an Index. Index objects are immutable.

obj = Series(range(3), index = ['a', 'b', 'c'])

index = obj.index

index
# Index(['a', 'b', 'c'], dtype='object')
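A short sketch of what immutability means in practice: assigning to a single position of an Index raises a TypeError, while replacing the whole index of a Series is allowed. The flag name mutated and the replacement labels are made up for this illustration.

```python
import pandas as pd

obj = pd.Series(range(3), index=['a', 'b', 'c'])
index = obj.index

# In-place modification of an Index element is rejected.
try:
    index[1] = 'd'
    mutated = True
except TypeError:
    mutated = False

# Assigning a complete new set of labels creates a new Index instead.
obj.index = ['x', 'y', 'z']
```

Because an Index cannot change underneath its users, it can be shared safely between several Series and DataFrame objects.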

Basic functionality

reindex
drop
Indexing and slicing
Arithmetic and data alignment

df1.add(df2, fill_value=0)

# Where a label exists in only one of the two frames, the missing side is treated as 0
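The effect of fill_value is easiest to see next to plain addition. The sketch below uses two small, made-up frames (df1 and df2 are not defined in the notes) whose rows and columns only partly overlap.

```python
import numpy as np
import pandas as pd

# Hypothetical frames: they share column 'a' and row label 1 only.
df1 = pd.DataFrame({'a': [1.0, 2.0], 'b': [3.0, 4.0]}, index=[0, 1])
df2 = pd.DataFrame({'a': [10.0, 20.0], 'c': [30.0, 40.0]}, index=[1, 2])

# Plain addition aligns on both axes; any cell missing from either
# operand becomes NaN.
plain = df1 + df2

# With fill_value=0, a cell present in only one frame is added to 0.
# A cell missing from both frames is still NaN.
filled = df1.add(df2, fill_value=0)
```

Only the cell at row 1, column 'a' exists in both frames, so it is the only non-NaN cell in plain.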

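Broadcasting
Function application and mapping
Sorting
- sort_index
- rank
- order (removed in later pandas versions; use sort_values instead)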
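A minimal sketch of the three sorting tools on a made-up Series. It uses sort_values, since Series.order is no longer available in current pandas; the data and variable names are illustrative.

```python
import pandas as pd

s = pd.Series([7, -5, 7, 4], index=['d', 'a', 'b', 'c'])

# Sort by label.
by_label = s.sort_index()

# Sort by value (the role that order played in old pandas versions).
by_value = s.sort_values()

# rank assigns positions starting at 1; tied values share the mean
# of the positions they occupy (the default method is 'average').
ranks = s.rank()
```

The two 7s occupy positions 3 and 4 in sorted order, so each receives rank 3.5.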

Summarizing and computing descriptive statistics

Reductions: sum, mean, ...

df.sum()

df.sum(axis=1)
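The axis argument decides which direction gets collapsed. The sketch below uses a small, made-up frame with some missing values; df and the result names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'one': [1.0, 2.0, np.nan],
                   'two': [4.0, np.nan, 6.0]},
                  index=['a', 'b', 'c'])

# Default (axis=0): reduce down the rows, one result per column.
col_totals = df.sum()

# axis=1: reduce across the columns, one result per row.
row_totals = df.sum(axis=1)

# NaN values are skipped by default (skipna=True).
col_means = df.mean()
```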

Cumulative methods, e.g. cumsum
idxmin, idxmax
describe
Produces several summary statistics in one call (count, mean, std, min, ...)
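The three kinds of method listed here return differently shaped results, which a small example makes visible. The frame and variable names below are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'x': [3.0, 1.0, 2.0],
                   'y': [10.0, 30.0, 20.0]},
                  index=['a', 'b', 'c'])

# Cumulative: same shape as the input, running total down each column.
running = df.cumsum()

# idxmin / idxmax: the index label where each column's extreme occurs.
lowest = df.idxmin()
highest = df.idxmax()

# describe: several summary statistics per column in one call.
summary = df.describe()
```

For numeric columns, the rows of summary are count, mean, std, min, the 25%, 50% and 75% quantiles, and max.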
