# Sorting and Renaming

In [1]:
import pandas as pd
import numpy as np

exam_scores = pd.read_csv('data/exam_scores.csv')

## 1. Sorting
- The rows of the DataFrame 'exam_scores' are stored in default index order, not sorted by value.
- Pandas provides the sort_values() method, which returns the data sorted by value.

In [3]:
# By default, sorting is done in ascending order:
exam_scores.sort_values(by = 'math score').head()

Unnamed: 0,gender,race/ethnicity,parental level of education,lunch,test preparation course,math score,reading score,writing score
949,male,group C,high school,free/reduced,none,15,18,10
854,male,group C,high school,free/reduced,none,18,30,18
891,female,group C,high school,free/reduced,none,23,31,27
739,female,group E,some high school,free/reduced,none,25,35,38
349,male,group B,some high school,free/reduced,none,25,31,30


In [4]:
# Pass ascending=False to sort_values() to get the result in descending order:
exam_scores.sort_values(by = 'math score', ascending = False).head()

Unnamed: 0,gender,race/ethnicity,parental level of education,lunch,test preparation course,math score,reading score,writing score
911,male,group E,some college,standard,none,100,95,87
776,male,group E,associate's degree,standard,completed,100,100,100
495,male,group C,master's degree,standard,completed,100,96,100
565,male,group E,bachelor's degree,standard,completed,100,100,100
588,male,group D,some college,standard,completed,100,85,91


In [5]:
# We can also sort a Series using the sort_values() method:
exam_scores['math score'].sort_values(ascending = False)[:10]

911    100
776    100
495    100
565    100
588    100
128    100
91     100
620     99
844     99
744     98
Name: math score, dtype: int64
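
The cells above show sort_values() returning sorted output; the sketch below illustrates that the call produces a new object and leaves the original row order untouched. The small `scores` frame is a hypothetical stand-in for 'exam_scores', built in memory so the example runs without the CSV file.

```python
import pandas as pd

# Hypothetical miniature stand-in for exam_scores (values are illustrative only)
scores = pd.DataFrame({
    'gender': ['male', 'female', 'male', 'female'],
    'math score': [74, 58, 66, 90],
})

# sort_values() returns a new, sorted DataFrame
sorted_scores = scores.sort_values(by='math score', ascending=False)

# The original keeps its default index order; the sorted copy carries
# the original index labels along with each row
print(scores.index.tolist())         # [0, 1, 2, 3]
print(sorted_scores.index.tolist())  # [3, 0, 2, 1]
```

To keep the sorted order, assign the result to a variable, as the exercise at the end of this notebook does.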

## 2. Renaming
- Datasets often come with column names that are inconvenient to work with.
- For example, in this dataset 'math score', 'reading score' and 'writing score' contain spaces, so we cannot use attribute (dot) access to select those columns.
- Pandas provides the rename() method to rename the columns or index labels of a DataFrame.

In [6]:
# Passing inplace=True makes rename() modify the original DataFrame instead of returning a new one (the default).
exam_scores.rename(columns={
    'race/ethnicity': 'race',
    'parental level of education': 'parental_education_level',
    'test preparation course': 'test_preparation_course',
    'math score': 'math_score',
    'reading score': 'reading_score',
    'writing score': 'writing_score'
}, inplace=True)

In [7]:
# We can list all column names using the columns attribute.
exam_scores.columns

Index(['gender', 'race', 'parental_education_level', 'lunch',
       'test_preparation_course', 'math_score', 'reading_score',
       'writing_score'],
      dtype='object')
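
As a minimal sketch of why the rename matters, the example below uses a hypothetical one-column `scores` frame (not the real dataset) to show that dot access fails for a name containing a space and works once the column is renamed. It also omits inplace=True, so rename() returns a new DataFrame and the original is left unchanged.

```python
import pandas as pd

# Hypothetical one-column frame; values are illustrative only
scores = pd.DataFrame({'math score': [74, 58, 66]})

# Without inplace=True, rename() returns a new DataFrame
renamed = scores.rename(columns={'math score': 'math_score'})

# The original still has the old name, so there is no 'math_score' attribute
print(hasattr(scores, 'math_score'))   # False

# Bracket selection works regardless of spaces in the name
print(scores['math score'].tolist())   # [74, 58, 66]

# After renaming, attribute (dot) access is available
print(renamed.math_score.tolist())     # [74, 58, 66]
```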

In [9]:
# We can also rename the index labels of the DataFrame using the rename() method, as shown below:
exam_scores.rename(index={
    0: 'first_student',
    1: 'second_student',
    2: 'third_student'
}, inplace=True)
exam_scores.head()

Unnamed: 0,gender,race,parental_education_level,lunch,test_preparation_course,math_score,reading_score,writing_score
first_student,male,group B,bachelor's degree,standard,none,74,68,67
second_student,female,group C,some college,standard,completed,58,68,66
third_student,male,group C,some college,free/reduced,none,66,65,65
3,female,group D,bachelor's degree,free/reduced,none,74,75,73
4,male,group D,some college,standard,none,78,77,71
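
Once index labels are renamed, rows can be selected by the new labels with .loc. The sketch below assumes a hypothetical `students` frame with made-up values; only the labels passed in the mapping change, and the remaining rows keep their integer labels.

```python
import pandas as pd

# Hypothetical frame standing in for exam_scores; values are illustrative only
students = pd.DataFrame({'math_score': [74, 58, 66, 74]})

# Rename only the first two index labels; unmapped labels are left as they are
students = students.rename(index={0: 'first_student', 1: 'second_student'})

print(students.index.tolist())
# ['first_student', 'second_student', 2, 3]

# Label-based selection with the new index label
print(students.loc['first_student', 'math_score'])  # 74
```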


## 3. Exercise

In [10]:
sma_data = pd.read_csv('data/Standard_Metropolitan_Areas_Data-data.csv')
sma_data.head(3)

Unnamed: 0,land_area,percent_city,percent_senior,physicians,hospital_beds,graduates,work_force,income,region,crime_rate
0,1384,78.1,12.3,25627,69678,50.1,4083.9,72100,1,75.55
1,3719,43.9,9.4,13326,43292,53.9,3305.9,54542,2,56.03
2,3553,37.4,10.7,9724,33731,50.6,2066.3,33216,1,41.32


In [11]:
sorted_data1 = sma_data.sort_values(by='crime_rate', ascending=False)
sorted_data1.head()

Unnamed: 0,land_area,percent_city,percent_senior,physicians,hospital_beds,graduates,work_force,income,region,crime_rate
20,9155,53.8,11.1,2280,6450,60.1,575.2,7766,4,85.62
74,1412,39.2,11.3,436,1837,49.4,154.2,2098,4,82.68
53,5966,39.5,9.6,737,1907,52.7,246.6,3007,4,80.94
4,2480,31.5,10.5,8502,16751,66.1,1514.5,26573,4,80.19
67,8152,22.3,9.1,405,1254,51.7,165.6,2257,4,78.1
