```python
import json

import numpy as np
import pandas as pd
from pandas import DataFrame, Series
import matplotlib.pyplot as plt
%matplotlib inline

# Load the JSON-lines file: one JSON object per line
path = 'datasets/example.txt'
with open(path) as f:
    records = [json.loads(line) for line in f]

# Build a pandas DataFrame from the list of dicts
frame = DataFrame(records)
frame

# Munge (data wrangling): absent time zones become 'Missing', empty strings 'Unknown'
clean_tz = frame['tz'].fillna('Missing')
clean_tz[clean_tz == ''] = 'Unknown'
tz_counts2 = clean_tz.value_counts()
tz_counts2[:10]

# Plot the 10 most common time zones as a horizontal bar chart
tz_counts2[:10].plot(kind='barh', rot=0)
```
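A minimal, self-contained sketch of the time-zone counting step, assuming a few hypothetical inline records in the same one-object-per-line layout instead of `datasets/example.txt`. The names `SAMPLE` and `count_time_zones` are made up for illustration, and the boolean-mask assignment is swapped for `Series.replace('', 'Unknown')`, which should yield the same counts.

```python
import json
from io import StringIO

import pandas as pd

# Hypothetical sample data (not from the real dataset); only 'tz' is used.
SAMPLE = """\
{"tz": "America/New_York"}
{"tz": "America/New_York"}
{"tz": ""}
{"tz": "Europe/London"}
{"a": "record without a tz field"}
"""


def count_time_zones(lines):
    """Count time zones; absent values -> 'Missing', empty strings -> 'Unknown'."""
    records = [json.loads(line) for line in lines if line.strip()]
    frame = pd.DataFrame(records)
    clean_tz = frame['tz'].fillna('Missing').replace('', 'Unknown')
    return clean_tz.value_counts()


tz_counts = count_time_zones(StringIO(SAMPLE))
print(tz_counts.head(10))
```

Passing any iterable of lines (an open file object works the same way as `StringIO`) keeps the helper easy to test without touching the filesystem.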
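```python
# Continued from above: user-agent analysis
# First token of each user-agent string (the browser identifier)
results = Series([x.split()[0] for x in frame.a.dropna()])
results[:5]
results.value_counts()[:8]

# Remove rows with no user agent, then label each row by OS
cframe = frame[frame.a.notnull()]
operating_system = np.where(cframe['a'].str.contains('Windows'), 'Windows', 'Not Windows')
operating_system[:5]

# Count rows per (time zone, OS) pair and move the OS labels into columns
by_tz_os = cframe.groupby(['tz', operating_system])
agg_counts = by_tz_os.size().unstack().fillna(0)
agg_counts[:10]

# Positions that sort time zones by total count, ascending
indexer = agg_counts.sum(axis=1).argsort()
indexer[:10]
count_subset = agg_counts.take(indexer).iloc[-10:]
count_subset

# Stacked bar chart of the 10 busiest time zones
count_subset.plot(kind='barh', stacked=True)

# Normalize each row to sum to 1 so the Windows share is easier to compare
normed_subset = count_subset.div(count_subset.sum(axis=1), axis=0)
normed_subset.plot(kind='barh', stacked=True)
```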
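The OS breakdown can be sketched on a tiny hypothetical frame (made-up `tz` and `a` values, not from the real dataset). As an assumption, this swaps `argsort()`/`take()` for `sort_values()` on the row totals followed by `.loc`, which selects the same rows in the same ascending order; `N` is 2 rather than 10 because the sample has only three time zones.

```python
import numpy as np
import pandas as pd

# Hypothetical records: 'tz' is the time zone, 'a' the user-agent string.
frame = pd.DataFrame({
    'tz': ['America/New_York', 'America/New_York', 'America/New_York',
           'Europe/London', 'Europe/London', 'Europe/London',
           'Asia/Tokyo', None],
    'a': ['Mozilla/5.0 (Windows NT 6.1)', 'Mozilla/5.0 (Macintosh)',
          'Mozilla/5.0 (Windows NT 5.1)', 'Mozilla/5.0 (Windows NT 6.1)',
          'Mozilla/5.0 (X11; Linux x86_64)', None,
          'Mozilla/5.0 (X11; Linux x86_64)', 'Mozilla/5.0 (Windows NT 6.1)'],
})

# Drop rows without a user agent, then label each remaining row by OS.
cframe = frame[frame['a'].notnull()]
operating_system = np.where(cframe['a'].str.contains('Windows'),
                            'Windows', 'Not Windows')

# Rows: time zone; columns: OS label; values: row counts.
# groupby drops the row whose 'tz' is None (dropna=True by default).
agg_counts = (cframe.groupby(['tz', operating_system])
              .size().unstack().fillna(0))

# Keep the N time zones with the largest totals, in ascending order.
N = 2
totals = agg_counts.sum(axis=1)
count_subset = agg_counts.loc[totals.sort_values().index[-N:]]

# Scale each row to sum to 1 so OS shares are comparable across zones.
normed_subset = count_subset.div(count_subset.sum(axis=1), axis=0)
print(normed_subset)
```

Selecting by label with `.loc` avoids mixing up positions and index labels, which is the main pitfall when combining `argsort()` with `take()`.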
Data Analysis with Python——01
billy-gov
requirement
Time zone analysis
User-agent analysis