pandas: filtering and selecting data #75

Open
myyyy opened this issue Aug 28, 2018 · 0 comments
@myyyy myyyy commented Aug 28, 2018

Select the rows of df whose deviceid appears in A

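import pandas as pd

# device() is the author's own helper (not shown here); it is assumed to return the device IDs read from the file
A = device('../input/2000emmc.txt', '\t', 0)
df = pd.read_csv("../input/alldevice", sep="\t")
df2 = df[df['deviceid'].isin(A)]  # keep only rows whose deviceid is in A
df2.to_csv('result.csv', index=False)  # index=False omits the index column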
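
The filtering step can be tried without the input files. A minimal sketch, assuming small made-up data: the frame below stands in for ../input/alldevice, and `wanted_ids` is a hypothetical stand-in for A.

```python
import pandas as pd

# Hypothetical sample data standing in for ../input/alldevice
df = pd.DataFrame({
    "deviceid": ["d1", "d2", "d3", "d4"],
    "model": ["a", "b", "c", "d"],
})

# Hypothetical stand-in for A: the IDs to keep
wanted_ids = ["d2", "d4", "d9"]

# isin() builds a boolean mask that is True where deviceid is in wanted_ids
mask = df["deviceid"].isin(wanted_ids)
selected = df[mask]

# ~mask inverts the selection (rows whose deviceid is NOT in wanted_ids)
rejected = df[~mask]
```

IDs in `wanted_ids` that never occur in the frame (here "d9") are simply ignored.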

Add a new date column

chunk['date'] = pd.date_range(start='20190525', end='20190525', periods=len(chunk))
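
Because start and end are the same day, every row receives the same timestamp. Assigning a scalar pd.Timestamp should give the same column, since pandas broadcasts a scalar to every row. A small sketch, assuming a made-up 3-row frame in place of a real chunk:

```python
import pandas as pd

chunk = pd.DataFrame({"1": ["a", "b", "c"]})  # hypothetical stand-in for a chunk

# A scalar value is broadcast to every row of the new column
chunk["date"] = pd.Timestamp("2019-05-25")

# The date_range form from the note, for comparison
same = pd.date_range(start="20190525", end="20190525", periods=len(chunk))
```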

Concatenate the string columns of each row and store the result in a new column

chunk['key'] = chunk['1'] + ',' + chunk['2'] + ',' + chunk['3']
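
The `+` form assumes all three columns hold strings: a numeric column raises a TypeError, and a missing value turns the whole key into NaN. One possible alternative, sketched with made-up data, is Series.str.cat with `na_rep`, converting non-string columns with astype(str) first:

```python
import pandas as pd

# Hypothetical chunk with one missing value and one numeric column
chunk = pd.DataFrame({
    "1": ["a", "b"],
    "2": ["x", None],
    "3": [10, 20],
})

# str.cat joins the columns with a separator; na_rep replaces missing values
chunk["key"] = chunk["1"].str.cat(
    [chunk["2"], chunk["3"].astype(str)], sep=",", na_rep=""
)
```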

Drop columns

chunk = chunk.drop(columns=['1', '2', '3'])
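
drop() returns a new DataFrame, which is why the result is assigned back to chunk. A short sketch with a made-up one-row frame, also showing `errors="ignore"` for labels that may not exist:

```python
import pandas as pd

chunk = pd.DataFrame({"1": ["a"], "2": ["b"], "3": ["c"], "key": ["a,b,c"]})

# drop() returns a new DataFrame; chunk itself is unchanged until reassigned
trimmed = chunk.drop(columns=["1", "2", "3"])

# errors="ignore" skips labels that do not exist instead of raising KeyError
safe = chunk.drop(columns=["1", "missing"], errors="ignore")
```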

Set the header (column names)

reader = pd.read_csv(input, encoding='gb2312', chunksize=400000, skip_blank_lines=True, names=['1', '2', '3'])
Key parameter: names=['1', '2', '3']
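
When `names` is given and `header` is not, read_csv treats the first line of the file as data rather than as a header. A minimal sketch using made-up in-memory CSV text instead of a file:

```python
import io
import pandas as pd

raw = io.StringIO("a,b,c\nd,e,f\n")  # hypothetical headerless CSV content

# With names=..., the first line is read as a data row, not as a header
df = pd.read_csv(raw, names=["1", "2", "3"])
```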

Skip blank lines

reader = pd.read_csv(input, encoding='gb2312', chunksize=400000, skip_blank_lines=True, names=['1', '2', '3'])
Key parameter: skip_blank_lines=True
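
skip_blank_lines=True is already the default in read_csv, so passing it mainly documents the intent. The difference shows up when it is turned off: a blank line then becomes a row of NaN values. A sketch on made-up text:

```python
import io
import pandas as pd

text = "a,b,c\n\nd,e,f\n"  # hypothetical content with one blank line

# Blank line dropped
skipped = pd.read_csv(io.StringIO(text), names=["1", "2", "3"],
                      skip_blank_lines=True)

# Blank line kept as a row of missing values
kept = pd.read_csv(io.StringIO(text), names=["1", "2", "3"],
                   skip_blank_lines=False)
```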

Write a large amount of data to MongoDB

Key parameter: chunksize=400000

import pandas as pd
from pymongo import MongoClient

client = MongoClient()  # connects to localhost:27017 by default
col = client['test']['test']


def read_and_tomongo(input):
    # chunksize makes read_csv return an iterator of DataFrames instead of loading the whole file
    reader = pd.read_csv(input, encoding='gb2312', chunksize=400000, skip_blank_lines=True, names=['1', '2', '3'])
    for chunk in reader:
        chunk['date'] = pd.date_range(start='20190525', end='20190525', periods=len(chunk))
        chunk['key'] = chunk['1'] + ',' + chunk['2'] + ',' + chunk['3']
        chunk = chunk.drop(columns=['1', '2', '3'])
        data = chunk.to_dict(orient='records')  # one dict per row, the shape insert_many expects
        col.insert_many(data)


if __name__ == '__main__':
    read_and_tomongo('text.csv')
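
The chunked loop can be exercised without a MongoDB server. A minimal sketch, assuming made-up in-memory CSV text; `iter_record_batches` is a hypothetical helper name, and each batch it yields has the list-of-dicts shape that would be passed to insert_many:

```python
import io
import pandas as pd


def iter_record_batches(source, chunksize):
    """Yield one list of record dicts per chunk of a headerless 3-column CSV."""
    reader = pd.read_csv(source, chunksize=chunksize, names=["1", "2", "3"])
    for chunk in reader:
        chunk["key"] = chunk["1"] + "," + chunk["2"] + "," + chunk["3"]
        chunk = chunk.drop(columns=["1", "2", "3"])
        yield chunk.to_dict(orient="records")


# Hypothetical CSV with 5 rows, read 2 rows at a time
csv_text = "".join(f"a{i},b{i},c{i}\n" for i in range(5))
batches = list(iter_record_batches(io.StringIO(csv_text), chunksize=2))
```

Only one chunk is held in memory at a time, so peak memory is bounded by chunksize rather than by the file size.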

to_csv

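1. First, check the current working directory:

import os
os.getcwd()  # returns the current working directory
2. to_csv() is a method of the DataFrame class; read_csv() is a function of the pandas module.
dt.to_csv()  # dt is assumed to be a DataFrame instance; the parameters are explained below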

Path path_or_buf: A string path to the file to write or a StringIO
dt.to_csv('Result.csv')  # relative path; saved under the directory returned by os.getcwd()
dt.to_csv('C:/Users/think/Desktop/Result.csv')  # absolute path

Separator sep: Field delimiter for the output file (default ',')
dt.to_csv('C:/Users/think/Desktop/Result.csv', sep='?')  # separate fields with '?'; the default is ','

Missing-value replacement na_rep: A string representation of a missing value (default '')
dt.to_csv('C:/Users/think/Desktop/Result1.csv', na_rep='NA')  # missing values are written as NA; the default is an empty string

Format float_format: Format string for floating point numbers
dt.to_csv('C:/Users/think/Desktop/Result1.csv', float_format='%.2f')  # keep two decimal places

Columns to keep columns: Columns to write (default None, meaning all columns)
dt.to_csv('C:/Users/think/Desktop/Result.csv', columns=['name'])  # writes the index column and the name column

Keep column names header: Whether to write out the column names (default True)
dt.to_csv('C:/Users/think/Desktop/Result.csv', header=False)  # do not write the column names

Keep row index index: Whether to write row (index) names (default True)
dt.to_csv('C:/Users/think/Desktop/Result1.csv', index=False)  # do not write the row index
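
The effect of these parameters can be checked without writing files: when no path is given, to_csv() returns the CSV text as a string. A small sketch, assuming a made-up two-row frame for dt:

```python
import pandas as pd

# Hypothetical data with one missing value
dt = pd.DataFrame({"name": ["x", "y"], "score": [1.234, None]})

# No path: the CSV text is returned instead of written to disk
default_text = dt.to_csv()

# Custom separator, missing-value marker, float format, no index column
custom_text = dt.to_csv(sep=";", na_rep="NA", float_format="%.2f", index=False)

# Only the name column, without header row or index column
names_only = dt.to_csv(columns=["name"], header=False, index=False)
```

Comparing `default_text` with `custom_text` line by line shows which part of the output each parameter controls.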

pandas cheat sheet (Chinese edition)

@myyyy myyyy added the 数据分析 (data analysis) label Aug 31, 2018