Data Analysis with Python——01 #75

hsipeng · 2019-03-18T12:02:34Z

billy-gov

Requirements

numpy
pandas
matplotlib

Time zone analysis

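```python
import json
import numpy as np
import pandas as pd
from pandas import DataFrame, Series
import matplotlib.pyplot as plt
# IPython/Jupyter magic: draw figures inline (remove this line in a plain .py script)
%matplotlib inline


# Load the data: each line of the file is a separate JSON object
path = 'datasets/example.txt'
with open(path) as f:
    records = [json.loads(line) for line in f]

# Build a pandas DataFrame: one row per record, one column per JSON key
frame = DataFrame(records)
frame
```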
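The loading step assumes a JSON Lines file, with one JSON object per line. The sketch below runs the same parse on a small in-memory sample so it works without the dataset. The records are made up and only reuse the column names `tz` and `a`. It also shows `pd.read_json(..., lines=True)`, which pandas provides for this format. A key that is absent from a record becomes NaN in the DataFrame.

```python
import io
import json

import pandas as pd

# Hypothetical JSON Lines sample (values are made up; keys mirror 'tz' and 'a')
sample = io.StringIO(
    '{"tz": "America/New_York", "a": "Mozilla/5.0 (Windows NT 6.1)"}\n'
    '{"tz": "", "a": "GoogleMaps/RochesterNY"}\n'
    '{"a": "Mozilla/5.0 (Macintosh)"}\n'
)

# Parse line by line, then build a DataFrame from the list of dicts
records = [json.loads(line) for line in sample]
frame = pd.DataFrame(records)

# pandas can also read JSON Lines directly
sample.seek(0)
frame2 = pd.read_json(sample, lines=True)

print(frame.shape)           # (3, 2)
print(frame['tz'].tolist())  # ['America/New_York', '', nan]
```

The third record has no `tz` key, so it shows up as NaN. An empty string `''` is a different case, and the munging step handles it separately.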

```python
# Munge (data wrangling): label missing time zones 'Missing' and empty ones 'Unknown'
clean_tz = frame['tz'].fillna('Missing')
clean_tz[clean_tz == ''] = 'Unknown'
tz_counts2 = clean_tz.value_counts()
tz_counts2[:10]
```
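`fillna` only replaces true missing values (NaN). That is why the empty strings need a second, mask-based assignment. The sketch below uses a small hypothetical `tz` Series to show the distinction. It also shows that `Series.replace` gives the same result in one chained expression.

```python
import numpy as np
import pandas as pd

# Hypothetical time-zone column: real zones, empty strings, and NaN
tz = pd.Series(['America/New_York', '', np.nan, 'America/New_York', '', 'Europe/London'])

# Step 1: NaN -> 'Missing'; step 2: '' -> 'Unknown' via a boolean mask
clean_tz = tz.fillna('Missing')
clean_tz[clean_tz == ''] = 'Unknown'

# Equivalent chained form: replace() swaps exact matches of '' for 'Unknown'
clean_tz2 = tz.fillna('Missing').replace('', 'Unknown')

tz_counts = clean_tz.value_counts()
print(tz_counts['Unknown'], tz_counts['Missing'])  # 2 1
```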

```python
# Plot the 10 most common time zones as a horizontal bar chart
tz_counts2[:10].plot(kind='barh', rot=0)
```
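`%matplotlib inline` only has an effect in IPython/Jupyter. The sketch below is one way to produce the same bar chart from a plain script, assuming a file on disk is enough. It selects matplotlib's non-interactive `Agg` backend and saves the figure. The counts and the output filename `tz_counts.png` are hypothetical stand-ins for `tz_counts2[:10]`.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend: renders to files, no window needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical counts standing in for tz_counts2[:10] (already sorted descending)
counts = pd.Series({'America/New_York': 30, 'Unknown': 20, 'America/Chicago': 10})

# Series.plot returns the matplotlib Axes it drew on
ax = counts.plot(kind='barh', rot=0)
ax.invert_yaxis()  # barh draws the first entry at the bottom; put the largest on top
ax.set_xlabel('count')
ax.figure.tight_layout()
ax.figure.savefig('tz_counts.png')  # hypothetical output path
plt.close(ax.figure)
```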

User-agent analysis

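```python
# Continued from above: take the first whitespace-separated token of each
# user-agent string (column 'a'), skipping missing values
results = Series([x.split()[0] for x in frame.a.dropna()])
results[:5]

# The 8 most frequent tokens
results.value_counts()[:8]
```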
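The list comprehension above can also be written with pandas' vectorized `.str` accessor. The sketch below compares the two on a few hypothetical user-agent strings. `.str.split()` produces lists, and `.str[0]` picks the first element of each one. `reset_index` is needed because `dropna()` keeps the original row labels, while the list-comprehension Series is numbered from 0.

```python
import pandas as pd

# Hypothetical user-agent strings; None plays the role of a missing value
agents = pd.Series([
    'Mozilla/5.0 (Windows NT 6.1; WOW64)',
    'GoogleMaps/RochesterNY',
    None,
    'Mozilla/5.0 (Macintosh; Intel Mac OS X)',
    'Opera/9.80 (X11; Linux x86_64)',
])

# List-comprehension version, as in the notes
results = pd.Series([x.split()[0] for x in agents.dropna()])

# Vectorized version: split each string, take element 0, renumber from 0
results_vec = agents.dropna().str.split().str[0].reset_index(drop=True)

print(results.value_counts()['Mozilla/5.0'])  # 2
```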

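```python
# Drop records that have no user-agent string
cframe = frame[frame.a.notnull()]

# Label each record by whether its user-agent string contains 'Windows'
operating_system = np.where(cframe['a'].str.contains('Windows'), 'Windows', 'Not Windows')
operating_system[:5]
```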
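`np.where(cond, x, y)` picks `x` where `cond` is true and `y` elsewhere, element by element. The rows with a missing agent are dropped first because `str.contains` returns NaN for them rather than `False`. The sketch below uses a small hypothetical frame to show both routes. The alternative passes `na=False`, which counts missing agents as non-matches instead of dropping the rows.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with time zone 'tz' and user-agent 'a'; one agent is missing
frame = pd.DataFrame({
    'tz': ['America/New_York', 'Europe/London', 'America/New_York', 'Asia/Tokyo'],
    'a': ['Mozilla/5.0 (Windows NT 6.1)', 'Mozilla/5.0 (Macintosh)', None,
          'Opera/9.80 (Windows NT 5.1)'],
})

# Route 1 (as in the notes): drop missing agents, then classify
cframe = frame[frame['a'].notnull()]
operating_system = np.where(cframe['a'].str.contains('Windows'), 'Windows', 'Not Windows')
print(operating_system)  # ['Windows' 'Not Windows' 'Windows']

# Route 2: keep every row; na=False turns a missing agent into a non-match
os_all = np.where(frame['a'].str.contains('Windows', na=False), 'Windows', 'Not Windows')
```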

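```python
# Count records per (time zone, OS) pair, then unstack the OS level into columns;
# pairs that never occur become NaN, so fill them with 0
by_tz_os = cframe.groupby(['tz', operating_system])
agg_counts = by_tz_os.size().unstack().fillna(0)
agg_counts[:10]
```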
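`groupby` accepts a column name and an array of the same length as the frame in one key list. `size()` counts the rows in each group. The sketch below runs that on hypothetical data. It uses `unstack(fill_value=0)`, which fills absent pairs during the reshape. This keeps the counts as integers, whereas `unstack()` followed by `fillna(0)` leaves float columns, because the intermediate NaN forces a float dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical cleaned frame: time zone plus a simplified user-agent string
cframe = pd.DataFrame({
    'tz': ['America/New_York', 'America/New_York', 'Europe/London', 'Asia/Tokyo', 'Asia/Tokyo'],
    'a': ['Windows NT', 'Macintosh', 'Windows NT', 'Windows NT', 'X11; Linux'],
})
operating_system = np.where(cframe['a'].str.contains('Windows'), 'Windows', 'Not Windows')

# Group by a column name and an aligned array at the same time
by_tz_os = cframe.groupby(['tz', operating_system])

# size() -> Series indexed by (tz, os); unstack() moves the os level into columns
agg_counts = by_tz_os.size().unstack(fill_value=0)
print(agg_counts)
```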

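```python
# Row positions that sort the time zones by total count (ascending)
indexer = agg_counts.sum(axis=1).argsort()

indexer[:10]

# Reorder the rows by those positions and keep the last 10, i.e. the 10 largest time zones
count_subset = agg_counts.take(indexer)[-10:]
count_subset
```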
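`argsort` returns positions, not labels. `take` consumes positions, so the pair reorders rows by their totals. The sketch below checks this on a hypothetical four-row count table. It also compares the result with a label-based alternative using `nlargest`, reversed so the rows stay in ascending order as the plot expects.

```python
import pandas as pd

# Hypothetical (time zone x OS) count table shaped like agg_counts
agg_counts = pd.DataFrame(
    {'Not Windows': [5, 0, 12, 3], 'Windows': [10, 2, 4, 1]},
    index=['A', 'B', 'C', 'D'],  # hypothetical time-zone labels
)

totals = agg_counts.sum(axis=1)   # A=15, B=2, C=16, D=4
indexer = totals.argsort()        # positions in ascending order of total: 1, 3, 0, 2

# Position-based: reorder, then keep the last 2 rows (the 2 largest totals)
top_by_take = agg_counts.take(indexer).iloc[-2:]

# Label-based alternative: 2 largest totals, reversed back to ascending order
top_by_label = agg_counts.loc[totals.nlargest(2).index[::-1]]

print(list(top_by_take.index))  # ['A', 'C']
```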
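```python
# Stacked bar chart of Windows vs. non-Windows counts per time zone
count_subset.plot(kind='barh', stacked=True)
# Totals differ widely between time zones, so normalize each row to sum to 1
# to compare the relative Windows share
normed_subset = count_subset.div(count_subset.sum(axis=1), axis=0)
normed_subset.plot(kind='barh', stacked=True)
```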
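The `axis=0` argument in `div` matters. It aligns the divisor Series (one total per row) with the row index. Plain `/` between a DataFrame and a Series aligns the Series with the columns instead. The sketch below uses a hypothetical two-row table to show both. With plain `/`, the row labels match no column, so every cell comes out NaN.

```python
import pandas as pd

# Hypothetical counts: rows are time zones, columns are OS labels
count_subset = pd.DataFrame(
    {'Not Windows': [1, 3], 'Windows': [3, 1]},
    index=['tz_a', 'tz_b'],  # hypothetical labels
)

row_totals = count_subset.sum(axis=1)

# Correct: divide each row by its own total
normed_subset = count_subset.div(row_totals, axis=0)
print(normed_subset.loc['tz_a', 'Windows'])  # 0.75

# Misaligned: '/' matches row_totals' labels against the columns, giving all NaN
wrong = count_subset / row_totals
```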

