pandas
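pandas data structures

    from pandas import Series

    obj = Series([4, 7, -5, 3])
    obj
    # 0    4
    # 1    7
    # 2   -5
    # 3    3
    obj.values  # array([4, 7, -5, 3])
    obj.index   # RangeIndex(start=0, stop=4, step=1) in recent pandas; older versions printed Int64Index([0, 1, 2, 3])
    obj2 = Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])  # specify the index explicitly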
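A DataFrame is a tabular data structure. It can be thought of as a dict of Series that all share the same index.

    from pandas import DataFrame

    # dict of equal-length lists
    data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
            'year': [2000, 2001, 2002, 2001, 2002],
            'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
    frame = DataFrame(data)

    # dict of dicts: outer keys become columns, inner keys become the row index
    pop = {'Nevada': {2001: 2.4, 2002: 2.9},
           'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}
    frame3 = DataFrame(pop)
    frame3  # row order may differ between pandas versions
    #       Nevada  Ohio
    # 2000     NaN   1.5
    # 2001     2.4   1.7
    # 2002     2.9   3.6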
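To make the "dict of Series" picture concrete, here is a minimal sketch. The three-row frame is illustrative data, and the names `year`, `is_series`, and `same_index` are made up for this example.

```python
import pandas as pd

frame = pd.DataFrame({
    "state": ["Ohio", "Ohio", "Nevada"],
    "year": [2000, 2001, 2001],
    "pop": [1.5, 1.7, 2.4],
})

# Selecting a column by key returns a Series, much like a dict lookup
year = frame["year"]
is_series = isinstance(year, pd.Series)

# Every column carries the same row index as the frame itself
same_index = all(frame[col].index.equals(frame.index) for col in frame.columns)
```

Both `is_series` and `same_index` should come out `True`, which is what "sharing the same index" means in practice.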
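Index objects
When a Series or DataFrame is constructed, any array or other sequence of labels is converted to an Index. Index objects are immutable.

    obj = Series(range(3), index=['a', 'b', 'c'])
    index = obj.index
    index  # Index(['a', 'b', 'c'], dtype='object'); the exact repr varies by pandas version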
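A short sketch of what immutability looks like in practice. The `mutated` flag is a name introduced only for this illustration.

```python
import pandas as pd

obj = pd.Series(range(3), index=["a", "b", "c"])
index = obj.index

try:
    index[1] = "d"  # item assignment on an Index is not supported
    mutated = True
except TypeError:
    mutated = False

# Replacing the whole index is allowed: this binds a new Index object
# to the Series and leaves the old one untouched.
obj.index = ["x", "y", "z"]
```

Because an Index cannot be changed in place, it is safe to share one Index object between several data structures.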
Essential functionality

    df1.add(df2, fill_value=0)  # a value missing from one operand is treated as 0
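The difference between `+` and `add(..., fill_value=0)` is easiest to see on two frames whose labels only partly overlap. The frames below are made-up sample data.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]}, index=[0, 1])
df2 = pd.DataFrame({"a": [10.0, 20.0], "c": [30.0, 40.0]}, index=[1, 2])

# Plain addition aligns on labels; any position missing on either side is NaN
plain = df1 + df2

# With fill_value=0, a value missing on ONE side is treated as 0.
# Positions missing on BOTH sides (e.g. row 2, column "b") stay NaN.
filled = df1.add(df2, fill_value=0)
```

Note that `fill_value` does not remove every NaN: where neither frame has a value, the result is still missing.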
    df.sum()        # sum down each column
    df.sum(axis=1)  # sum across each row
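As a small illustration of the two axes, assuming a sample frame with made-up values:

```python
import pandas as pd

df = pd.DataFrame({"one": [1.0, 2.0, 3.0], "two": [10.0, 20.0, 30.0]},
                  index=["a", "b", "c"])

col_totals = df.sum()        # indexed by column label: one=6.0, two=60.0
row_totals = df.sum(axis=1)  # indexed by row label: a=11.0, b=22.0, c=33.0
```

The result of each call is a Series whose index is the set of labels that was not summed over.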
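Cumulative (accumulation) methods; idxmin and idxmax, which return the index label of the minimum and maximum value
describe produces several summary statistics in one call (count, mean, std, min, …)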
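The sketch below puts these methods side by side on a small Series. It uses `cumsum` as one example of a cumulative method; the data is illustrative.

```python
import pandas as pd

s = pd.Series([3.0, 1.0, 4.0, 2.0], index=["a", "b", "c", "d"])

running = s.cumsum()    # running total: 3.0, 4.0, 8.0, 10.0
low = s.idxmin()        # label of the smallest value: "b"
high = s.idxmax()       # label of the largest value: "c"
summary = s.describe()  # Series with count, mean, std, min, quartiles, max
```

`idxmin` and `idxmax` return labels rather than values, so `s[low]` gives back the minimum itself.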
Correlation and covariance
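The notes give no code for this heading, so here is a minimal sketch using `Series.corr`, `Series.cov`, and `DataFrame.corr`. The two Series are made-up data, chosen so that `y` is exactly twice `x`.

```python
import pandas as pd

x = pd.Series([1.0, 2.0, 3.0, 4.0])
y = pd.Series([2.0, 4.0, 6.0, 8.0])  # exactly 2 * x

r = x.corr(y)  # Pearson correlation; 1.0 (up to rounding) for a perfect linear relation
c = x.cov(y)   # sample covariance (divides by n - 1)

df = pd.DataFrame({"x": x, "y": y})
corr_matrix = df.corr()  # pairwise correlations between all columns
```

On a DataFrame, `corr` and `cov` return a full matrix indexed by column labels on both axes.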
Unique values, value counts, and membership

    mask = obj.isin(['b', 'c'])  # boolean mask: True where the value is 'b' or 'c'
    obj[mask]
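The three operations named in the heading can be sketched together as follows. The contents of `obj` are sample data, since the notes do not show how it was built.

```python
import pandas as pd

obj = pd.Series(["c", "a", "d", "a", "a", "b", "b", "c", "c"])

uniques = obj.unique()       # distinct values, in order of first appearance
counts = obj.value_counts()  # number of occurrences of each value
mask = obj.isin(["b", "c"])  # boolean Series: True where the value is "b" or "c"
subset = obj[mask]           # keeps only the rows where mask is True
```

`unique` does not sort its result; call `sorted(uniques)` if an ordered list is needed.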
Handling missing data
Filtering out missing data

    data.dropna()
    data[data.notnull()]  # equivalent when data is a Series
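A brief sketch of filtering on both a Series and a DataFrame. The data is illustrative, and the `how="all"` variant is included only to show that DataFrame filtering has more than one mode.

```python
import numpy as np
import pandas as pd

data = pd.Series([1.0, np.nan, 3.5, np.nan, 7.0])
a = data.dropna()
b = data[data.notnull()]
same = a.equals(b)  # the two forms give the same result for a Series

df = pd.DataFrame([[1.0, 6.5, 3.0],
                   [1.0, np.nan, np.nan],
                   [np.nan, np.nan, np.nan]])
any_na = df.dropna()           # drops every row containing at least one NaN
all_na = df.dropna(how="all")  # drops only rows that are entirely NaN
```

For a DataFrame the default is strict, so a single NaN is enough to discard the whole row.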
Filling in missing data

    df.fillna(0)  # replace every missing value with 0
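As a sketch, `fillna` accepts either a single value or a dict mapping column names to fill values. The frame and the fill values below are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 5.0, np.nan]})

zeros = df.fillna(0)                           # same fill value everywhere
per_column = df.fillna({"a": -1.0, "b": 0.5})  # a different value per column

# fillna returns a new object by default; df itself still has its NaNs
remaining = int(df.isna().sum().sum())
```

Here `remaining` counts the NaNs left in the original frame, which shows that the call did not modify `df`.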
Data Analysis with Python (03)
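A Series consists of a set of data (of any NumPy data type) together with an associated set of data labels (its index). Its values and index attributes return the array representation and the index object, respectively.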
Summarizing and computing descriptive statistics