# G7 Python Introductory Course
## Project 1: Getting Started with Python

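This introductory project reads each person's pre-tax salary and a guessed tax amount, recomputes the tax with the correct formula, and checks whether the guess is right.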

**Tip**: Text like this will guide you through completing the project in the iPython Notebook.

In [11]:
# Check your Python version
from sys import version_info
if version_info.major != 2 or version_info.minor != 7:
    raise Exception('Please use Python 2.7 to complete this project')

In [1]:
import numpy as np
import pandas as pd

# Notebook display helpers (the unused titanic_visualizations import from another project is dropped)
from IPython.display import display
%matplotlib inline

# Load the dataset
in_file = 'data.csv'
out_file = 'export.csv'
full_data = pd.read_csv(in_file)
print(full_data)

# Display the first few rows of the data
display(full_data.head())

    name  salary  tax_maybe
0   wang    2500          0
1  zhang    7000        105
2     li    8000        205
3   song    9000        405
4   tang   50000        800


|   | name | salary | tax_maybe |
|---|------|--------|-----------|
| 0 | wang | 2500 | 0 |
| 1 | zhang | 7000 | 105 |
| 2 | li | 8000 | 205 |
| 3 | song | 9000 | 405 |
| 4 | tang | 50000 | 800 |


The data sample contains the following features:

- **name**: the person's name
- **salary**: pre-tax salary
- **tax_maybe**: the guessed tax amount
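You can also read the same three columns without pandas. The minimal sketch below uses only the standard-library `csv` module on the five sample rows shown above. It assumes `data.csv` has a header row `name,salary,tax_maybe`. The names `SAMPLE_LINES` and `load_records` are illustrative and not part of the project.

```python
import csv

# The five sample rows shown above, as CSV text with a header row
# (assumption: data.csv is laid out the same way).
SAMPLE_LINES = [
    "name,salary,tax_maybe",
    "wang,2500,0",
    "zhang,7000,105",
    "li,8000,205",
    "song,9000,405",
    "tang,50000,800",
]


def load_records(lines):
    """Parse CSV lines into dicts: name as str, salary and tax_maybe as float."""
    records = []
    for row in csv.DictReader(lines):
        records.append({
            "name": row["name"],                   # the person's name
            "salary": float(row["salary"]),        # pre-tax salary
            "tax_maybe": float(row["tax_maybe"]),  # guessed tax amount
        })
    return records


records = load_records(SAMPLE_LINES)
print("{0} earns {1:.0f} and guessed a tax of {2:.0f}".format(
    records[1]["name"], records[1]["salary"], records[1]["tax_maybe"]))
# prints: zhang earns 7000 and guessed a tax of 105
```

`csv.DictReader` accepts any iterable of lines, so you can pass a file object opened on `data.csv` in place of the list.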



How individual income tax is calculated:
![Individual income tax calculation](https://img-blog.csdn.net/20171017113915227?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQvVG9nZXRoZXJfQ1o=/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/Center)
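The image's content is not available as text. The summary below is reconstructed from the `calculator`, `calculate_base` and `calculate_tax` code in this notebook, not from an official source. The contribution rates, base limits, the 3500 threshold and the brackets all come from that code. Here $S$ is the pre-tax salary, $\operatorname{clamp}(x, a, b)$ limits $x$ to $[a, b]$ (what `calculate_base` does), and $C$ is the total social insurance and provident fund contribution:

$$
\begin{aligned}
C &= 0.08\,\operatorname{clamp}(S,\ 2193,\ 16445) + 0.12\,\operatorname{clamp}(S,\ 1500,\ 20972) + 0.02\,S + 0.01\,S \\
T &= S - C - 3500 \\
\text{tax} &= \begin{cases} 0, & T \le 0 \\ r_k\,T - d_k, & B_{k-1} < T \le B_k \end{cases} \\
\text{after-tax salary} &= S - C - \text{tax}
\end{aligned}
$$

The brackets use $B_0 = 0$, and $(B_k, r_k, d_k)$ for $k = 1, \dots, 7$ are (1500, 3%, 0), (4500, 10%, 105), (9000, 20%, 555), (35000, 25%, 1005), (55000, 30%, 2755), (80000, 35%, 5505), and (∞, 45%, 13505). For example, with $S = 7000$:

$$
\begin{aligned}
C &= 560 + 840 + 140 + 70 = 1610 \\
T &= 7000 - 1610 - 3500 = 1890 \quad (1500 < T \le 4500,\ k = 2) \\
\text{tax} &= 0.10 \times 1890 - 105 = 84 \\
\text{after-tax salary} &= 7000 - 1610 - 84 = 5306
\end{aligned}
$$

This matches the tax of 84.0 and the after-tax salary of 5306.00 that the notebook computes for zhang.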

In [6]:
from sys import maxint

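def calculator(salary):
    """ Print the after-tax salary and return the income tax as a string with two decimals """
    
    point = 3500  # tax-free threshold
    endowment_insurance_rate = 0.08  # pension insurance rate
    hospital_rate = 0.02  # medical insurance rate
    losejob_rate = 0.01  # unemployment insurance rate
    provident_rate = 0.12  # housing provident fund rate
    provident_max = 20972  # maximum housing provident fund base
    provident_min = 1500  # minimum housing provident fund base
    
    endowment_insurance_min = 2193  # minimum pension insurance base
    endowment_insurance_max = 16445  # maximum pension insurance base
    
    # Pension insurance contribution (the base is clamped to [min, max])
    insurance_base = calculate_base(endowment_insurance_min, endowment_insurance_max, salary)
    insurance = insurance_base * endowment_insurance_rate
    
    # Housing provident fund contribution (the base is clamped to [min, max])
    provident_base = calculate_base(provident_min, provident_max, salary)
    provident = provident_base * provident_rate
    
    # Medical insurance contribution (charged on the full salary here)
    hospital = salary * hospital_rate
    
    # Unemployment insurance contribution (charged on the full salary here)
    losejob = salary * losejob_rate
    
    # Taxable base = pre-tax salary - pension - provident fund - medical - unemployment
    tax_base = salary - insurance - provident - hospital - losejob
    
    # Income tax on the amount above the tax-free threshold
    actual_tax = calculate_tax(tax_base - point)
    
    # After-tax salary = pre-tax salary - pension - provident fund - medical - unemployment - income tax
    res_money = salary - insurance - provident - hospital - losejob - actual_tax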
    
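    print('Pre-tax salary: {0}, after-tax salary: {1:.2f}'.format(salary, res_money))
    # abs() turns -0.0 (taxable amount below the threshold) into 0.0 so it prints as "0.00"
    return "%.2f" % abs(actual_tax)

def calculate_base(lower, upper, number):
    """ Clamp number into the range [lower, upper] """
    if number > upper:
        return upper
    elif number < lower:
        return lower
    else:
        return number
    
def calculate_tax(tax_base):
    """ Return the income tax for a taxable amount as rate * amount - quick deduction """
    # Each row: [upper bound of the bracket, tax rate, quick deduction]
    rate_table = [
        [0, 0, 0],
        [1500, 0.03, 0],
        [4500, 0.10, 105],
        [9000, 0.20, 555],
        [35000, 0.25, 1005],
        [55000, 0.30, 2755],
        [80000, 0.35, 5505],
        [maxint, 0.45, 13505]
    ]
        
    for level in rate_table:
        if tax_base <= level[0]:
            return tax_base * level[1] - level[2]


In [7]:
def is_money_equals(f1, f2):
    """ Treat two amounts as equal if they differ by less than 0.001 """
    return abs(float(f1) - float(f2)) < 0.001

taxs = {}  # name -> computed tax (a string with two decimals)
for index, row in full_data.iterrows():
    tax = calculator(row['salary'])
    taxs[row['name']] = tax
    if is_money_equals(tax, row['tax_maybe']):
        print("so cool")
        


Pre-tax salary: 2500, after-tax salary: 1925.00
so cool
Pre-tax salary: 7000, after-tax salary: 5306.00
Pre-tax salary: 8000, after-tax salary: 5999.00
Pre-tax salary: 9000, after-tax salary: 6692.00
Pre-tax salary: 50000, after-tax salary: 35072.43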
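The quick deductions in `rate_table` (105, 555, ...) follow from the brackets. They are the values that make `rate * amount - deduction` equal to taxing each slice of income at its own bracket's rate. The sketch below checks this using the bounds and rates copied from `calculate_tax`. The names `BOUNDS`, `RATES`, `quick_deductions`, `progressive_tax` and `quick_tax` are illustrative. It also uses `float('inf')` for the open-ended top bracket in place of `maxint`.

```python
# Bracket upper bounds and rates copied from rate_table in calculate_tax;
# the top bracket is open-ended, represented here by float('inf').
BOUNDS = [1500, 4500, 9000, 35000, 55000, 80000, float("inf")]
RATES = [0.03, 0.10, 0.20, 0.25, 0.30, 0.35, 0.45]


def quick_deductions(bounds, rates):
    """Derive each bracket's quick deduction so adjacent formulas meet at the bounds."""
    deductions = [0.0]
    for k in range(1, len(rates)):
        # d_k = d_(k-1) + B_(k-1) * (r_k - r_(k-1))
        deductions.append(deductions[-1] + bounds[k - 1] * (rates[k] - rates[k - 1]))
    return deductions


def progressive_tax(taxable, bounds, rates):
    """Tax each slice of the taxable amount at its own bracket's rate."""
    tax = 0.0
    lower = 0.0
    for upper, rate in zip(bounds, rates):
        if taxable <= lower:
            break
        tax += (min(taxable, upper) - lower) * rate
        lower = upper
    return tax


def quick_tax(taxable, bounds, rates):
    """The shortcut calculate_tax uses: rate * taxable - quick deduction."""
    if taxable <= 0:
        return 0.0
    for upper, rate, deduction in zip(bounds, rates, quick_deductions(bounds, rates)):
        if taxable <= upper:
            return taxable * rate - deduction


print([int(round(d)) for d in quick_deductions(BOUNDS, RATES)])
# prints: [0, 105, 555, 1005, 2755, 5505, 13505]
for taxable in [1890, 2660, 3430, 41167.76]:
    print("{0}: {1:.2f} vs {2:.2f}".format(
        taxable, progressive_tax(taxable, BOUNDS, RATES), quick_tax(taxable, BOUNDS, RATES)))
```

1890, 2660, 3430 and 41167.76 are the taxable amounts for zhang, li, song and tang. For each one, both methods give the same tax: 84, 161, 238 and 9595.33.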


## Extension
Export the correct tax amounts to export.csv and compute the accuracy of the guessed amounts (`tax_maybe`).


In [8]:
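def export():
    header = ["name", "tax_actual"]
    # list() keeps this working on Python 3 as well, where dict.items() returns a view
    df = pd.DataFrame(list(taxs.items()), columns=header)
    
    # Attach the computed tax to each original row, matching on name
    with_tax_data = full_data.merge(df, on='name')
    with_tax_data.to_csv(out_file, index=False)
    print("export done")
export()

# Load the exported dataset
export_data = pd.read_csv(out_file)

# Display the first few rows of the exported data
display(export_data.head())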


export done


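|   | name | salary | tax_maybe | tax_actual |
|---|------|--------|-----------|------------|
| 0 | wang | 2500 | 0 | 0.0 |
| 1 | zhang | 7000 | 105 | 84.0 |
| 2 | li | 8000 | 205 | 161.0 |
| 3 | song | 9000 | 405 | 238.0 |
| 4 | tang | 50000 | 800 | 9595.33 |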


In [14]:
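def accuracy_score():
    """ Percentage of rows whose guessed tax matches the computed tax """
    # Look the columns up by name (not position) and allow a 0.001 tolerance, like is_money_equals
    matches = np.isclose(export_data['tax_maybe'], export_data['tax_actual'], rtol=0, atol=0.001)
    score = 100.0 * np.sum(matches) / export_data.shape[0]
    return "accuracy of {:.2f}%.".format(score)
    
accuracy_score()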

'accuracy of 20.00%.'
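The metric is the share of rows whose guess matches the computed tax. Here is a minimal pandas-free sketch that uses the same 0.001 tolerance as `is_money_equals`. The `accuracy` function is illustrative, and the two lists are copied from the `tax_maybe` and `tax_actual` columns shown above.

```python
def accuracy(guesses, actuals, tolerance=0.001):
    """Return the percentage of guesses within `tolerance` of the computed tax."""
    if not guesses:
        return 0.0
    matches = sum(1 for guess, actual in zip(guesses, actuals)
                  if abs(float(guess) - float(actual)) < tolerance)
    return 100.0 * matches / len(guesses)


# tax_maybe and tax_actual columns of export.csv, copied from the output above
guesses = [0, 105, 205, 405, 800]
actuals = [0.0, 84.0, 161.0, 238.0, 9595.33]
print("accuracy of {:.2f}%.".format(accuracy(guesses, actuals)))  # prints: accuracy of 20.00%.
```

Only wang's guess of 0 matches, so 1 of 5 rows gives 20%.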

> **Note**: Once you have finished all **4 TODOs**, you can export your iPython Notebook as an HTML file from the menu bar via **File -> Download as -> HTML (.html)**. Submit that HTML file together with this iPython notebook as your assignment.