# Data Structures
 - Series
 - DataFrame

## Series
A Series is a one-dimensional, labeled, array-like object that can hold any Python data type. It is made up of two arrays: one holds the index (the labels), and the other holds the actual data.

In [1]:
import pandas as pd
S = pd.Series([11, 28, 72, 3, 5, 8])
print(S)

0    11
1    28
2    72
3     3
4     5
5     8
dtype: int64


In [2]:
print(S.index)
print(S.values)

RangeIndex(start=0, stop=6, step=1)
[11 28 72  3  5  8]


In [3]:
import numpy as np
X = np.array([11,28,72,3,5,8])
print(X)
print(S.values)
# Both have the same type: numpy.ndarray
print(type(S.values), type(X))

[11 28 72  3  5  8]
[11 28 72  3  5  8]
<class 'numpy.ndarray'> <class 'numpy.ndarray'>


As shown below, a Series can be given a custom index, so that its labels are strings rather than integers.

In [4]:
fruits = ['apples', 'oranges', 'cherries', 'pears']
quantities = [20, 33, 52, 10]
S = pd.Series(quantities, index=fruits)
print(S)

apples      20
oranges     33
cherries    52
pears       10
dtype: int64


## Basic Series operations
Series that share the same index can be combined directly with arithmetic operators.

In [5]:
fruits = ['apples', 'oranges', 'cherries', 'pears']
S = pd.Series([20, 33, 52, 10], index=fruits)
S1 = pd.Series([1,2,3,4], index=fruits)
print(S+S1)
# print(S-S1)
# print(S*S1)
# print(S/S1)
# print(S%10)
# print(S**2)
print('sum of S: ', sum(S))

apples      21
oranges     35
cherries    55
pears       14
dtype: int64
sum of S:  115


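If the indexes differ, arithmetic still works: the index of the result is the union of both indexes. A label that is present in only one of the two Series gets the value NaN. The dtype of the result also changes from int64 to float64. The reason is that NaN is a floating-point value that an int64 array cannot represent, so pandas upcasts the result to float64.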

In [6]:
fruits = ['apples', 'oranges', 'cherries', 'pears']
fruits1 = ['raspberries', 'oranges', 'cherries', 'pears']
S = pd.Series([20, 33, 52, 10], index=fruits)
S1 = pd.Series([17,0,31,32], index=fruits1)
print(S + S1)
# print(S - S1)
# print(S * S1)
# print(S / S1)
# print(S % S1)

apples          NaN
cherries       83.0
oranges        33.0
pears          42.0
raspberries     NaN
dtype: float64
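
The NaN values produced by the index union can be avoided when a missing label should simply count as zero. The following is a minimal sketch using `Series.add` with its `fill_value` parameter; the variable names `plain` and `filled` are illustrative and not part of the original notebook.

```python
import pandas as pd

S = pd.Series([20, 33, 52, 10],
              index=['apples', 'oranges', 'cherries', 'pears'])
S1 = pd.Series([17, 0, 31, 32],
               index=['raspberries', 'oranges', 'cherries', 'pears'])

# The "+" operator yields NaN for labels that exist on only one side.
plain = S + S1

# Series.add with fill_value substitutes 0 for a label missing on one side,
# so every label in the union gets a number.
filled = S.add(S1, fill_value=0)
print(filled)
```

The result is still float64, because the alignment step introduces missing values before they are filled.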


In [7]:
fruits = ['apples', 'oranges', 'cherries', 'pears']
fruits_gr = ['μήλα', 'πορτοκάλια', 'κεράσια', 'αχλάδια']
S = pd.Series([20, 33, 52, 10], index=fruits)
S2 = pd.Series([17, 13, 31, 32], index=fruits_gr)
print(S+S2)

apples       NaN
cherries     NaN
oranges      NaN
pears        NaN
αχλάδια      NaN
κεράσια      NaN
μήλα         NaN
πορτοκάλια   NaN
dtype: float64


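## Accessing Series data
Indexing accepts either a single label or a list of labels. Note that the two return different types: a single label returns the value itself, while a list of labels returns a Series.
### Single-label access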

In [8]:
print(S['apples'])
print(type(S['apples']))

20
<class 'numpy.int64'>


### Multi-label access

In [9]:
print(S[['apples', 'oranges', 'cherries']])
print(type(S[['apples', 'oranges', 'cherries']]))

apples      20
oranges     33
cherries    52
dtype: int64
<class 'pandas.core.series.Series'>


### Boolean indexing for selection

In [10]:
print(S[S>30])

oranges     33
cherries    52
dtype: int64
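
Several conditions can be combined in one boolean selection. The sketch below assumes the same fruit quantities as above; the names `between` and `extremes` are illustrative. Each condition needs its own parentheses, and the element-wise operators `&`, `|` and `~` are used instead of `and`, `or` and `not`.

```python
import pandas as pd

S = pd.Series([20, 33, 52, 10],
              index=['apples', 'oranges', 'cherries', 'pears'])

# Keep values strictly between 15 and 40.
between = S[(S > 15) & (S < 40)]

# Keep values that are either very small or very large.
extremes = S[(S < 15) | (S > 50)]

print(between)
print(extremes)
```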


# Applying functions to a Series
## NumPy functions
As with NumPy arrays, NumPy functions can be applied directly to S, because S.values is a numpy.ndarray.

In [11]:
import numpy as np
print((S + 3) * 4)
print('========================')
print(np.sin(S))

apples       92
oranges     144
cherries    220
pears        52
dtype: int64
========================
apples      0.912945
oranges     0.999912
cherries    0.986628
pears      -0.544021
dtype: float64


## The more flexible pandas.Series.apply

Series.apply(func, convert_dtype=True, args=(), **kwds)

The function "func" will be applied to the Series and **it returns either a Series or a DataFrame, depending on "func".**

**Parameter: Meaning**
 - `func`
   - a function, which can be a NumPy function that will be applied to the entire Series or a Python function that will be applied to every single value of the series
 - `convert_dtype`
   - A boolean value. If it is set to True (default), apply will try to find a better dtype for the elementwise function results. If False, the result is left as dtype=object
 - `args`
   - Positional arguments which will be passed to the function "func" in addition to the values from the series.
 - `**kwds`
   - Additional keyword arguments will be passed as keywords to the function
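
The `args` and keyword parameters described above can be sketched as follows. The helper `restock` and its parameters `minimum` and `bonus` are hypothetical names made up for this illustration; only `Series.apply` itself comes from pandas.

```python
import pandas as pd

S = pd.Series([20, 33, 52, 10],
              index=['apples', 'oranges', 'cherries', 'pears'])

def restock(quantity, minimum, bonus=0):
    """Hypothetical helper: raise low quantities to `minimum`, then add `bonus`."""
    return max(quantity, minimum) + bonus

# Each value of S is passed as `quantity`; args supplies `minimum`,
# and the extra keyword argument is forwarded as `bonus`.
result = S.apply(restock, args=(30,), bonus=5)
print(result)
```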


In [12]:
print(S.apply(np.sin))

apples      0.912945
oranges     0.999912
cherries    0.986628
pears      -0.544021
dtype: float64


In [13]:
print(S.apply(lambda x: x if x > 50 else x+10))

apples      30
oranges     43
cherries    52
pears       20
dtype: int64


In [14]:
print(S[S>30])

oranges     33
cherries    52
dtype: int64


A Series behaves much like a key-value store. In fact, it can be treated as an **ordered, fixed-length dictionary** in Python.

In [15]:
print('apples' in S)

True


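Since a Series behaves like a dictionary, **a Series can of course be constructed from a dictionary.** In the pandas version that produced the output below, the keys are sorted automatically; pandas 0.23 and later on Python 3.6+ keep the insertion order of the dictionary instead.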

In [16]:
cities = {"London":   8615246, 
          "Berlin":   3562166, 
          "Madrid":   3165235, 
          "Rome":     2874038, 
          "Paris":    2273305, 
          "Vienna":   1805681, 
          "Bucharest":1803425, 
          "Hamburg":  1760433,
          "Budapest": 1754000,
          "Warsaw":   1740119,
          "Barcelona":1602386,
          "Munich":   1493900,
          "Milan":    1350680}
city_series = pd.Series(cities)
print(city_series)

Barcelona    1602386
Berlin       3562166
Bucharest    1803425
Budapest     1754000
Hamburg      1760433
London       8615246
Madrid       3165235
Milan        1350680
Munich       1493900
Paris        2273305
Rome         2874038
Vienna       1805681
Warsaw       1740119
dtype: int64
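
Because the key order of a Series built from a dictionary depends on the pandas and Python versions in use, it can be safer to sort explicitly. This is a small sketch on a shortened version of the `cities` dictionary; `by_name` and `by_population` are illustrative names.

```python
import pandas as pd

cities = {"London": 8615246,
          "Berlin": 3562166,
          "Madrid": 3165235}
city_series = pd.Series(cities)

# Sort by label, independent of how the constructor ordered the keys.
by_name = city_series.sort_index()

# Sort by value, largest city first.
by_population = city_series.sort_values(ascending=False)

print(by_name)
print(by_population)
```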


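# Handling missing data (NaN)
The index parameter of the pandas.Series() constructor acts like a filter. **If you pass both a dictionary and a list of index labels, each label is looked up among the dictionary keys; labels that are not found get the value NaN, and because of the NaN the remaining values are converted to float.** This is the same effect seen in the basic Series operations above: a label that is not present in both Series produces NaN, and NaN forces the dtype to float.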

In [17]:
my_cities = ["London", "Paris", "Zurich", "Berlin", 
             "Stuttgart", "Hamburg"]
my_cities_series = pd.Series(cities, index=my_cities)
print(my_cities_series)

London       8615246.0
Paris        2273305.0
Zurich             NaN
Berlin       3562166.0
Stuttgart          NaN
Hamburg      1760433.0
dtype: float64


In [18]:
my_cities = ["London", "Paris", "Berlin", "Hamburg"]
my_city_series = pd.Series(cities, 
                           index=my_cities)
print(my_city_series)

London     8615246
Paris      2273305
Berlin     3562166
Hamburg    1760433
dtype: int64


## isnull() & notnull()
NaN stands for "not a number". The isnull and notnull methods can be used to check for missing data.

In [19]:
my_cities = ["London", "Paris", "Zurich", "Berlin", 
             "Stuttgart", "Hamburg"]
my_city_series = pd.Series(cities, 
                           index=my_cities)
print(my_city_series.isnull())
print(my_city_series.notnull())

London       False
Paris        False
Zurich        True
Berlin       False
Stuttgart     True
Hamburg      False
dtype: bool
London        True
Paris         True
Zurich       False
Berlin        True
Stuttgart    False
Hamburg       True
dtype: bool


pandas also treats Python's None keyword as NaN.

In [20]:
d = {"a":23, "b":45, "c":None, "d":0}
S = pd.Series(d)
print(S)

a    23.0
b    45.0
c     NaN
d     0.0
dtype: float64
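
If the integers should stay integers despite a missing value, one option is the nullable integer dtype `"Int64"` (capital I). This sketch assumes pandas 0.24 or later, where that dtype was introduced; the names `as_float` and `as_nullable_int` are illustrative.

```python
import pandas as pd

values = [23, 45, None, 0]
labels = ["a", "b", "c", "d"]

# Default behaviour: None becomes NaN and the dtype is upcast to float64.
as_float = pd.Series(values, index=labels)

# Nullable integer dtype: the missing entry is stored as <NA>,
# and the other values remain integers.
as_nullable_int = pd.Series(values, index=labels, dtype="Int64")

print(as_float)
print(as_nullable_int)
```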


## Filtering and filling missing data

In [21]:
print(my_city_series.dropna())
print(my_city_series.fillna(0))
missing_cities = {"Stuttgart":597939, "Zurich":378884}
print(my_city_series.fillna(missing_cities))

London     8615246.0
Paris      2273305.0
Berlin     3562166.0
Hamburg    1760433.0
dtype: float64
London       8615246.0
Paris        2273305.0
Zurich             0.0
Berlin       3562166.0
Stuttgart          0.0
Hamburg      1760433.0
dtype: float64
London       8615246.0
Paris        2273305.0
Zurich        378884.0
Berlin       3562166.0
Stuttgart     597939.0
Hamburg      1760433.0
dtype: float64
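
After `fillna` the Series is still float64, even though no NaN remains. A possible follow-up step, sketched here on a shortened `cities` dictionary, is to cast back to an integer dtype with `astype`; the name `as_int` is illustrative.

```python
import pandas as pd

cities = {"London": 8615246, "Paris": 2273305, "Berlin": 3562166}
my_cities = ["London", "Paris", "Zurich", "Berlin", "Stuttgart"]
my_city_series = pd.Series(cities, index=my_cities)

# Fill each missing label with the value stored under the same key.
missing_cities = {"Stuttgart": 597939, "Zurich": 378884}
filled = my_city_series.fillna(missing_cities)

# With no NaN left, the values can be converted back to integers.
as_int = filled.astype("int64")
print(as_int)
```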


# DataFrame
A table with rows and columns.

In [22]:
cities = {"name": ["London", "Berlin", "Madrid", "Rome", 
                   "Paris", "Vienna", "Bucharest", "Hamburg", 
                   "Budapest", "Warsaw", "Barcelona", 
                   "Munich", "Milan"],
          "population": [8615246, 3562166, 3165235, 2874038,
                         2273305, 1805681, 1803425, 1760433,
                         1754000, 1740119, 1602386, 1493900,
                         1350680],
          "country": ["England", "Germany", "Spain", "Italy",
                      "France", "Austria", "Romania", 
                      "Germany", "Hungary", "Poland", "Spain",
                      "Germany", "Italy"]}
city_frame = pd.DataFrame(cities)
print(city_frame)

    country       name  population
0   England     London     8615246
1   Germany     Berlin     3562166
2     Spain     Madrid     3165235
3     Italy       Rome     2874038
4    France      Paris     2273305
5   Austria     Vienna     1805681
6   Romania  Bucharest     1803425
7   Germany    Hamburg     1760433
8   Hungary   Budapest     1754000
9    Poland     Warsaw     1740119
10    Spain  Barcelona     1602386
11  Germany     Munich     1493900
12    Italy      Milan     1350680


## Custom indexes are supported as well

In [23]:
ordinals = ["first", "second", "third", "fourth",
            "fifth", "sixth", "seventh", "eigth",
            "ninth", "tenth", "eleventh", "twelvth",
            "thirteenth"]
city_frame = pd.DataFrame(cities, index=ordinals)
print(city_frame)

            country       name  population
first       England     London     8615246
second      Germany     Berlin     3562166
third         Spain     Madrid     3165235
fourth        Italy       Rome     2874038
fifth        France      Paris     2273305
sixth       Austria     Vienna     1805681
seventh     Romania  Bucharest     1803425
eigth       Germany    Hamburg     1760433
ninth       Hungary   Budapest     1754000
tenth        Poland     Warsaw     1740119
eleventh      Spain  Barcelona     1602386
twelvth     Germany     Munich     1493900
thirteenth    Italy      Milan     1350680


## Reordering columns
At construction time:

In [24]:
city_frame = pd.DataFrame(cities, columns=['name', 'country', 'population'], index=ordinals)
print(city_frame)

                 name  country  population
first          London  England     8615246
second         Berlin  Germany     3562166
third          Madrid    Spain     3165235
fourth           Rome    Italy     2874038
fifth           Paris   France     2273305
sixth          Vienna  Austria     1805681
seventh     Bucharest  Romania     1803425
eigth         Hamburg  Germany     1760433
ninth        Budapest  Hungary     1754000
tenth          Warsaw   Poland     1740119
eleventh    Barcelona    Spain     1602386
twelvth        Munich  Germany     1493900
thirteenth      Milan    Italy     1350680


After construction:

In [25]:
city_frame = city_frame.reindex(columns=['name', 'country', 'population'])
print(city_frame)

                 name  country  population
first          London  England     8615246
second         Berlin  Germany     3562166
third          Madrid    Spain     3165235
fourth           Rome    Italy     2874038
fifth           Paris   France     2273305
sixth          Vienna  Austria     1805681
seventh     Bucharest  Romania     1803425
eigth         Hamburg  Germany     1760433
ninth        Budapest  Hungary     1754000
tenth          Warsaw   Poland     1740119
eleventh    Barcelona    Spain     1602386
twelvth        Munich  Germany     1493900
thirteenth      Milan    Italy     1350680
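
Columns of an existing DataFrame can also be reordered by selecting them with a list of names. The sketch below uses a two-row stand-in for the city data and contrasts list selection with `reindex(columns=...)`; the variable names are illustrative.

```python
import pandas as pd

frame = pd.DataFrame({"name": ["London", "Berlin"],
                      "population": [8615246, 3562166],
                      "country": ["England", "Germany"]})

# Selecting with a list returns the columns in the requested order.
reordered = frame[["name", "country", "population"]]

# reindex tolerates unknown column names and fills them with NaN,
# whereas list selection would raise a KeyError for "area".
with_missing = frame.reindex(columns=["name", "area"])

print(reordered)
print(with_missing)
```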


## Using an existing column as the row index

In [26]:
city_frame = pd.DataFrame(cities, columns=["name", "population"], index=cities['country'])
print(city_frame)

              name  population
England     London     8615246
Germany     Berlin     3562166
Spain       Madrid     3165235
Italy         Rome     2874038
France       Paris     2273305
Austria     Vienna     1805681
Romania  Bucharest     1803425
Germany    Hamburg     1760433
Hungary   Budapest     1754000
Poland      Warsaw     1740119
Spain    Barcelona     1602386
Germany     Munich     1493900
Italy        Milan     1350680


We can also use the DataFrame.set_index() method to set the index. Note that set_index() does not work in place; it returns a new DataFrame.

In [27]:
city_frame = pd.DataFrame(cities)
city_frame1 = city_frame.set_index("country")
print(city_frame1)

              name  population
country                       
England     London     8615246
Germany     Berlin     3562166
Spain       Madrid     3165235
Italy         Rome     2874038
France       Paris     2273305
Austria     Vienna     1805681
Romania  Bucharest     1803425
Germany    Hamburg     1760433
Hungary   Budapest     1754000
Poland      Warsaw     1740119
Spain    Barcelona     1602386
Germany     Munich     1493900
Italy        Milan     1350680
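
That `set_index()` leaves the original untouched can be checked directly, and `reset_index()` reverses the operation. A minimal sketch on a two-row stand-in for the city data, with illustrative variable names:

```python
import pandas as pd

frame = pd.DataFrame({"name": ["London", "Berlin"],
                      "country": ["England", "Germany"]})

# set_index returns a new DataFrame; "country" moves from the columns to the index.
indexed = frame.set_index("country")

# The original frame still has "country" as an ordinary column.
print("country" in frame.columns)

# reset_index turns the index back into a regular column.
restored = indexed.reset_index()
print(restored)
```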


## Sum & Cumulative Sum

In [28]:
print(city_frame.sum())

country       EnglandGermanySpainItalyFranceAustriaRomaniaGe...
name          LondonBerlinMadridRomeParisViennaBucharestHamb...
population                                             33800614
dtype: object
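
In the output above, the string columns are concatenated, which is rarely what is wanted. One way to restrict the sum to numeric columns is the `numeric_only` parameter of `DataFrame.sum`, sketched here on a two-row stand-in for the city data:

```python
import pandas as pd

frame = pd.DataFrame({"name": ["London", "Berlin"],
                      "population": [8615246, 3562166]})

# Only numeric columns take part in the sum; "name" is skipped.
totals = frame.sum(numeric_only=True)
print(totals)
```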


In [29]:
print(city_frame["population"].sum())

33800614


In [30]:
x = city_frame["population"].cumsum()
print(x)

0      8615246
1     12177412
2     15342647
3     18216685
4     20489990
5     22295671
6     24099096
7     25859529
8     27613529
9     29353648
10    30956034
11    32449934
12    33800614
Name: population, dtype: int64


## Assigning new values to a column

In [31]:
city_frame['population'] = x
print(city_frame)

    country       name  population
0   England     London     8615246
1   Germany     Berlin    12177412
2     Spain     Madrid    15342647
3     Italy       Rome    18216685
4    France      Paris    20489990
5   Austria     Vienna    22295671
6   Romania  Bucharest    24099096
7   Germany    Hamburg    25859529
8   Hungary   Budapest    27613529
9    Poland     Warsaw    29353648
10    Spain  Barcelona    30956034
11  Germany     Munich    32449934
12    Italy      Milan    33800614


## Adding a new column

In [32]:
city_frame['cum_population'] = city_frame['population'].cumsum()
print(city_frame)

    country       name  population  cum_population
0   England     London     8615246         8615246
1   Germany     Berlin    12177412        20792658
2     Spain     Madrid    15342647        36135305
3     Italy       Rome    18216685        54351990
4    France      Paris    20489990        74841980
5   Austria     Vienna    22295671        97137651
6   Romania  Bucharest    24099096       121236747
7   Germany    Hamburg    25859529       147096276
8   Hungary   Budapest    27613529       174709805
9    Poland     Warsaw    29353648       204063453
10    Spain  Barcelona    30956034       235019487
11  Germany     Munich    32449934       267469421
12    Italy      Milan    33800614       301270035


## Accessing DataFrame columns

### dict-like

In [33]:
print(city_frame["population"])

0      8615246
1     12177412
2     15342647
3     18216685
4     20489990
5     22295671
6     24099096
7     25859529
8     27613529
9     29353648
10    30956034
11    32449934
12    33800614
Name: population, dtype: int64


### object-attributes

In [34]:
print(city_frame.population)

0      8615246
1     12177412
2     15342647
3     18216685
4     20489990
5     22295671
6     24099096
7     25859529
8     27613529
9     29353648
10    30956034
11    32449934
12    33800614
Name: population, dtype: int64


In [35]:
print(type(city_frame.population))

<class 'pandas.core.series.Series'>


## Accessing rows

In [36]:
city_frame = pd.DataFrame(cities,
                          columns=["country", 
                                   "area",
                                   "population"],
                          index=cities["name"])
print(city_frame)
# This only works because the DataFrame was given an explicit index (the city names)
city_frame.loc['Vienna']

           country area  population
London     England  NaN     8615246
Berlin     Germany  NaN     3562166
Madrid       Spain  NaN     3165235
Rome         Italy  NaN     2874038
Paris       France  NaN     2273305
Vienna     Austria  NaN     1805681
Bucharest  Romania  NaN     1803425
Hamburg    Germany  NaN     1760433
Budapest   Hungary  NaN     1754000
Warsaw      Poland  NaN     1740119
Barcelona    Spain  NaN     1602386
Munich     Germany  NaN     1493900
Milan        Italy  NaN     1350680


country       Austria
area              NaN
population    1805681
Name: Vienna, dtype: object
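
Rows can be looked up by label or by position. The following sketch, on a three-row stand-in for the city data, contrasts `.loc` (label-based) with `.iloc` (position-based); the variable names are illustrative.

```python
import pandas as pd

frame = pd.DataFrame({"country": ["England", "Germany", "Austria"],
                      "population": [8615246, 3562166, 1805681]},
                     index=["London", "Berlin", "Vienna"])

# Label-based: the row whose index label is "Vienna".
by_label = frame.loc["Vienna"]

# Position-based: the third row, whatever its label is.
by_position = frame.iloc[2]

# A row label and a column label together select a single cell.
cell = frame.loc["Vienna", "population"]

print(by_label)
print(cell)
```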

In [37]:
city_frame['area'] = 1572
# print(city_frame)
areas = [1572, 891.85, 605.77, 1285,
        105.4, 414.6, 228, 755,
        525.2, 517, 101.9, 310.4,
        181.8]
city_frame["area"] = areas
print(city_frame)

           country     area  population
London     England  1572.00     8615246
Berlin     Germany   891.85     3562166
Madrid       Spain   605.77     3165235
Rome         Italy  1285.00     2874038
Paris       France   105.40     2273305
Vienna     Austria   414.60     1805681
Bucharest  Romania   228.00     1803425
Hamburg    Germany   755.00     1760433
Budapest   Hungary   525.20     1754000
Warsaw      Poland   517.00     1740119
Barcelona    Spain   101.90     1602386
Munich     Germany   310.40     1493900
Milan        Italy   181.80     1350680


## Sorting a DataFrame

In [38]:
city_frame = city_frame.sort_values(by="area", ascending=False)
print(city_frame)

           country     area  population
London     England  1572.00     8615246
Rome         Italy  1285.00     2874038
Berlin     Germany   891.85     3562166
Hamburg    Germany   755.00     1760433
Madrid       Spain   605.77     3165235
Budapest   Hungary   525.20     1754000
Warsaw      Poland   517.00     1740119
Vienna     Austria   414.60     1805681
Munich     Germany   310.40     1493900
Bucharest  Romania   228.00     1803425
Milan        Italy   181.80     1350680
Paris       France   105.40     2273305
Barcelona    Spain   101.90     1602386


## Filling selected rows from a Series

In [39]:
city_frame = pd.DataFrame(cities,
                          columns=["name",
                                   "country",
                                   "area",
                                   "population"],
                          index=ordinals)
some_areas = pd.Series([1572, 755, 181.8], index=['first', 'eigth', 'thirteenth'])
city_frame['area'] = some_areas
print(city_frame)

                 name  country    area  population
first          London  England  1572.0     8615246
second         Berlin  Germany     NaN     3562166
third          Madrid    Spain     NaN     3165235
fourth           Rome    Italy     NaN     2874038
fifth           Paris   France     NaN     2273305
sixth          Vienna  Austria     NaN     1805681
seventh     Bucharest  Romania     NaN     1803425
eigth         Hamburg  Germany   755.0     1760433
ninth        Budapest  Hungary     NaN     1754000
tenth          Warsaw   Poland     NaN     1740119
eleventh    Barcelona    Spain     NaN     1602386
twelvth        Munich  Germany     NaN     1493900
thirteenth      Milan    Italy   181.8     1350680


## A DataFrame can also be built from a nested dictionary
The outer keys become the column labels, and the inner keys become the row labels.

In [40]:
growth = {"Switzerland": {"2010": 3.0, "2011": 1.8, "2012": 1.1, "2013": 1.9},
          "Germany": {"2010": 4.1, "2011": 3.6, "2012": 0.4, "2013": 0.1},
          "France": {"2010": 2.0, "2011": 2.1, "2012": 0.3, "2013": 0.3},
          "Greece": {"2010": -5.4, "2011": -8.9, "2012": -6.6, "2013": -3.3},
          "Italy": {"2010": 1.7, "2011": 0.6, "2012": -2.3, "2013": -1.9}
          }
growth_frame = pd.DataFrame(growth)
print(growth_frame)

      France  Germany  Greece  Italy  Switzerland
2010     2.0      4.1    -5.4    1.7          3.0
2011     2.1      3.6    -8.9    0.6          1.8
2012     0.3      0.4    -6.6   -2.3          1.1
2013     0.3      0.1    -3.3   -1.9          1.9


In [41]:
print(growth_frame.T)

growth_frame = growth_frame.T
growth_frame2 = growth_frame.reindex(["Switzerland", 
                                      "Italy", 
                                      "Germany", 
                                      "Greece"])
print(growth_frame2)

             2010  2011  2012  2013
France        2.0   2.1   0.3   0.3
Germany       4.1   3.6   0.4   0.1
Greece       -5.4  -8.9  -6.6  -3.3
Italy         1.7   0.6  -2.3  -1.9
Switzerland   3.0   1.8   1.1   1.9
             2010  2011  2012  2013
Switzerland   3.0   1.8   1.1   1.9
Italy         1.7   0.6  -2.3  -1.9
Germany       4.1   3.6   0.4   0.1
Greece       -5.4  -8.9  -6.6  -3.3
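
If the outer keys should become rows from the start, `pd.DataFrame.from_dict` with `orient="index"` is an alternative to transposing afterwards. A sketch on a shortened version of the `growth` dictionary; `by_country` is an illustrative name.

```python
import pandas as pd

growth = {"Switzerland": {"2010": 3.0, "2011": 1.8},
          "Germany": {"2010": 4.1, "2011": 3.6}}

# orient="index" makes the outer keys the row labels
# and the inner keys the column labels.
by_country = pd.DataFrame.from_dict(growth, orient="index")
print(by_country)
```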


## Filling a DataFrame with random values

In [42]:
import numpy as np
names = ['Frank', 'Eve', 'Stella', 'Guido', 'Lara']
index = ["January", "February", "March",
         "April", "May", "June",
         "July", "August", "September",
         "October", "November", "December"]
df = pd.DataFrame(np.random.randn(12, 5)*1000,
                columns=names,
                index=index)
print(df)

                 Frank          Eve       Stella        Guido         Lara
January     -41.342964   249.009406  -647.466035   751.314453   441.868274
February    956.824398  -167.318696 -1083.490258  -995.829287  -877.601788
March      1466.044230   632.017203  -525.223070   672.343872   460.942635
April     -1233.838912   -46.299182  -450.327146  -634.493488 -1676.538130
May        1434.765215 -1272.686558  -627.864916  -530.043167 -1668.450592
June        473.843358  -487.866527  1328.077715  1029.582500   697.617177
July       -537.167445  1571.812813   110.560442  1162.429808  -940.173846
August      650.028187   950.377605  1563.134332 -1364.742179   576.063982
September -1095.773170   428.511886 -1562.632470  -115.067326  1107.632149
October    -944.859200   598.465845   306.620540  1125.161608   421.237729
November   1644.473375 -1557.357165  -638.299452  1467.625968  -884.354502
December   -384.852600  -479.660812  -125.894638  -883.094395   167.841563
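
The numbers above change on every run. To get repeatable values, a seeded generator can be used. This sketch assumes NumPy 1.17 or later for `np.random.default_rng`; the seed 42 and the shortened month list are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

names = ['Frank', 'Eve', 'Stella', 'Guido', 'Lara']
index = ["January", "February", "March"]

# A generator created with a fixed seed produces the same numbers each run.
rng = np.random.default_rng(seed=42)
df = pd.DataFrame(rng.standard_normal((3, 5)) * 1000,
                  columns=names, index=index)

# A second generator with the same seed reproduces the same table.
rng_again = np.random.default_rng(seed=42)
df_again = pd.DataFrame(rng_again.standard_normal((3, 5)) * 1000,
                        columns=names, index=index)

print(df.equals(df_again))
```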


## Hierarchical (multi-level) indexing
This is similar to a composite key in a database: each index entry is a combination of keys. Here the index is [("Vienna", "country"), ("Vienna", "area"), ("Vienna", "population"), ...]

In [43]:
import pandas as pd
cities = ["Vienna", "Vienna", "Vienna",
          "Hamburg", "Hamburg", "Hamburg",
          "Berlin", "Berlin", "Berlin",
          "Zürich", "Zürich", "Zürich"]
index = [cities, ["country", "area", "population",
                  "country", "area", "population",
                  "country", "area", "population",
                  "country", "area", "population"]]
# c = ["Vienna", "Vienna", "Vienna",
#           "Hamburg", "Hamburg", "Hamburg",
#           "Berlin", "Berlin", "Berlin",
#           "Zürich", "Zürich", "Zürich"]
# o = ["country", "area", "population",
#      "country", "area", "population",
#      "country", "area", "population",
#      "country", "area", "population"]
# index = zip(c,o)
print(index)


[['Vienna', 'Vienna', 'Vienna', 'Hamburg', 'Hamburg', 'Hamburg', 'Berlin', 'Berlin', 'Berlin', 'Zürich', 'Zürich', 'Zürich'], ['country', 'area', 'population', 'country', 'area', 'population', 'country', 'area', 'population', 'country', 'area', 'population']]


In [44]:
data = ["Austria", 414.60,     1805681,
        "Germany",   755.00,     1760433,
        "Germany",   891.85,     3562166,
        "Switzerland", 87.88, 378884]
city_series = pd.Series(data, index=index)
print(city_series)

Vienna   country           Austria
         area                414.6
         population        1805681
Hamburg  country           Germany
         area                  755
         population        1760433
Berlin   country           Germany
         area               891.85
         population        3562166
Zürich   country       Switzerland
         area                87.88
         population         378884
dtype: object


In [45]:
print(city_series["Vienna"])

country       Austria
area            414.6
population    1805681
dtype: object


In [46]:
print(city_series["Vienna"]["area"])
print(city_series["Vienna", "area"])

414.6
414.6
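
The tuple-style access above works because each index entry is a pair of keys. The index can also be built explicitly with `pd.MultiIndex.from_arrays`, which allows the levels to be named. This is a sketch on a reduced, numeric-only version of the data; the level names `"city"` and `"field"` are assumptions made for illustration.

```python
import pandas as pd

cities = ["Vienna", "Vienna", "Hamburg", "Hamburg"]
fields = ["area", "population", "area", "population"]

# Build the two-level index explicitly and give each level a name.
index = pd.MultiIndex.from_arrays([cities, fields], names=["city", "field"])
series = pd.Series([414.60, 1805681, 755.00, 1760433], index=index)

# Every index entry is a (city, field) tuple.
print(index[0])

# A full tuple selects one value; the outer key alone selects a sub-Series.
print(series["Hamburg", "population"])
print(series["Vienna"])
```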


In [47]:
print(city_series[["Hamburg", "Berlin"]])

Hamburg  country       Germany
         area              755
         population    1760433
Berlin   country       Germany
         area           891.85
         population    3562166
dtype: object


### Slicing

In [48]:
city_series = city_series.sort_index()
print("city_series with sorted index:")
print(city_series)
print("\n\nSlicing the city_series:")
print(city_series["Berlin":"Vienna"])

city_series with sorted index:
Berlin   area               891.85
         country           Germany
         population        3562166
Hamburg  area                  755
         country           Germany
         population        1760433
Vienna   area                414.6
         country           Austria
         population        1805681
Zürich   area                87.88
         country       Switzerland
         population         378884
dtype: object


Slicing the city_series:
Berlin   area           891.85
         country       Germany
         population    3562166
Hamburg  area              755
         country       Germany
         population    1760433
Vienna   area            414.6
         country       Austria
         population    1805681
dtype: object
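
A Series with a two-level index can be pivoted into a table with `unstack()`, which moves the inner level to the columns. A minimal sketch on a reduced, numeric-only version of the data, with illustrative variable names:

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [["Berlin", "Berlin", "Vienna", "Vienna"],
     ["area", "population", "area", "population"]])
series = pd.Series([891.85, 3562166, 414.60, 1805681], index=index)

# The outer level (city) stays as the row index;
# the inner level (area / population) becomes the columns.
table = series.unstack()
print(table)
```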
