## Pandas

The name pandas derives from "panel data" and is also a play on "Python data analysis"

The main tool for processing tabular data in Python

In [1]:
import numpy as np
import pandas as pd

### Series

A one-dimensional array of values backed by NumPy, with extra properties

Properties:
- Name
- Labeled index
- ...

The underlying values can be retrieved as a NumPy array at any time

In [3]:
pd.Series([8.6, 83.7, 0.38, 9])  # These numbers alone do not say much

0     8.60
1    83.70
2     0.38
3     9.00
dtype: float64

In [4]:
e = pd.Series([8.6, 83.7, 0.38, 9])

#### Name

In [9]:
e.name = "Einwohnerzahlen (Millionen)"

In [10]:
e

0     8.60
1    83.70
2     0.38
3     9.00
Name: Einwohnerzahlen (Millionen), dtype: float64

#### Index

In [11]:
e.index = ["CH", "DE", "LI", "AT"]

In [12]:
e

CH     8.60
DE    83.70
LI     0.38
AT     9.00
Name: Einwohnerzahlen (Millionen), dtype: float64

In [13]:
e["DE"]

np.float64(83.7)

In [16]:
e["CH":"DE"]  # Label slice: upper bound included

CH     8.6
DE    83.7
Name: Einwohnerzahlen (Millionen), dtype: float64

In [19]:
e[0:2]  # Positional slice: upper bound excluded

CH     8.6
DE    83.7
Name: Einwohnerzahlen (Millionen), dtype: float64
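The two slices above differ because one works on labels and the other on positions. A minimal sketch, assuming a small stand-in series `s` that is not part of the original notebook, makes that choice explicit with loc and iloc:

```python
import pandas as pd

# Hypothetical stand-in for the series used above
s = pd.Series([8.6, 83.7, 0.38, 9.0], index=["CH", "DE", "LI", "AT"])

by_label = s.loc["CH":"LI"]    # label slice: "LI" is included
by_position = s.iloc[0:2]      # positional slice: position 2 is excluded

print(list(by_label.index))     # ['CH', 'DE', 'LI']
print(list(by_position.index))  # ['CH', 'DE']
```

Spelling out loc or iloc avoids any ambiguity about which kind of slice is meant.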

In [22]:
e.values  # The underlying NumPy array

array([ 8.6 , 83.7 ,  0.38,  9.  ])

#### Vectorization

In [23]:
e > 5

CH     True
DE     True
LI    False
AT     True
Name: Einwohnerzahlen (Millionen), dtype: bool

In [24]:
e[e > 5]

CH     8.6
DE    83.7
AT     9.0
Name: Einwohnerzahlen (Millionen), dtype: float64

In [25]:
e.mean()

np.float64(25.419999999999998)

In [27]:
e > e.mean()

CH    False
DE     True
LI    False
AT    False
Name: Einwohnerzahlen (Millionen), dtype: bool

In [26]:
e[e > e.mean()]

DE    83.7
Name: Einwohnerzahlen (Millionen), dtype: float64
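Boolean masks like the ones above can also be combined. The following sketch, which again assumes a small stand-in series `s`, shows the element-wise operators `&`, `|` and `~`:

```python
import pandas as pd

# Hypothetical stand-in for the series used above
s = pd.Series([8.6, 83.7, 0.38, 9.0], index=["CH", "DE", "LI", "AT"])

# Each comparison yields a boolean Series. The parentheses are required
# because & and | bind more tightly than the comparison operators.
mid_sized = s[(s > 1) & (s < 50)]   # both conditions must hold
extremes = s[(s < 1) | (s > 50)]    # at least one condition holds
not_small = s[~(s < 1)]             # negated condition
```

Python's `and`, `or` and `not` do not work here, because they expect a single truth value rather than one per element.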

### DataFrame

Effectively a table

Usually created from a data source

In [55]:
pd.DataFrame({"Spalte1": [1, 2, 3], "Spalte2": [4, 5, 6], "Spalte3": [7, 8, 9]})  # Built from a dictionary

,Spalte1,Spalte2,Spalte3
0,1,4,7
1,2,5,8
2,3,6,9


#### read_csv

Reads data from a CSV source

Configured via parameters

Examples:
- delimiter
- thousands
- decimal
- parse_dates
- index_col
- ...

https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html
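To see what a few of these parameters do without needing a file on disk, here is a minimal sketch. The inline CSV text and the variable names `raw` and `df` are assumptions made for illustration:

```python
import io
import pandas as pd

# Hypothetical inline CSV standing in for a file on disk
raw = io.StringIO(
    "Country;Population;Density\n"
    "China;1,439,323,776;153\n"
    "India;1,380,004,385;464\n"
)

# delimiter: the fields are separated by semicolons
# thousands: commas inside numbers are thousands separators
# index_col: use the Country column as the row index
df = pd.read_csv(raw, delimiter=";", thousands=",", index_col="Country")

print(df.loc["China", "Population"])  # 1439323776
```

Without thousands=",", the Population column would be read as text rather than as integers.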

In [56]:
data = pd.read_csv("Data/PopulationData.csv", delimiter=";", thousands=",", decimal=".")

In [57]:
data

,#,Country (or dependency),Population(2020),YearlyChange,NetChange,Density(P/Km²),Land Area(Km²),Migrants(net),Fert.Rate,Med.Age,UrbanPop %,WorldShare
0,1,China,1439323776,0.39 %,5540090,153,9388211,-348399.0,1.7,38,61 %,18.47 %
1,2,India,1380004385,0.99 %,13586631,464,2973190,-532687.0,2.2,28,35 %,17.70 %
2,3,United States,331002651,0.59 %,1937734,36,9147420,954806.0,1.8,38,83 %,4.25 %
3,4,Indonesia,273523615,1.07 %,2898047,151,1811570,-98955.0,2.3,30,56 %,3.51 %
4,5,Pakistan,220892340,2.00 %,4327022,287,770880,-233379.0,3.6,23,35 %,2.83 %
...,...,...,...,...,...,...,...,...,...,...,...,...
230,231,Montserrat,4992,0.06 %,3,50,100,,N.A.,N.A.,10 %,0.00 %
231,232,Falkland Islands,3480,3.05 %,103,0,12170,,N.A.,N.A.,66 %,0.00 %
232,233,Niue,1626,0.68 %,11,6,260,,N.A.,N.A.,46 %,0.00 %
233,234,Tokelau,1357,1.27 %,17,136,10,,N.A.,N.A.,0 %,0.00 %


In [58]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 235 entries, 0 to 234
Data columns (total 12 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   #                        235 non-null    int64  
 1   Country (or dependency)  235 non-null    object 
 2   Population(2020)         235 non-null    int64  
 3   YearlyChange             235 non-null    object 
 4   NetChange                235 non-null    int64  
 5   Density(P/Km²)           235 non-null    int64  
 6   Land Area(Km²)           235 non-null    int64  
 7   Migrants(net)            201 non-null    float64
 8   Fert.Rate                235 non-null    object 
 9   Med.Age                  235 non-null    object 
 10  UrbanPop %               235 non-null    object 
 11  WorldShare               235 non-null    object 
dtypes: float64(1), int64(5), object(6)
memory usage: 22.2+ KB


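#### Problems with the dataset

DataFrame:
- Poor column names
- Duplicate index (the # column repeats the row index)

The data itself:
- Percent signs
- N.A. (text placeholder for missing values)
- NaN
- Data types (numeric columns stored as object)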
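The notebook does not show how the data problems are fixed. One possible approach is sketched below on a hypothetical three-row miniature of the affected columns; the cleaning steps are a suggestion, not the original author's method:

```python
import pandas as pd

# Hypothetical miniature of two problem columns
df = pd.DataFrame(
    {
        "YearlyChange": ["0.39 %", "3.05 %", "-0.11 %"],
        "Fert.Rate": ["1.7", "N.A.", "1.6"],
    },
    index=["China", "Falkland Islands", "Albania"],
)

# Strip trailing spaces and percent signs, then convert to float
df["YearlyChange"] = df["YearlyChange"].str.rstrip(" %").astype(float)

# Values that cannot be parsed as numbers (here "N.A.") become NaN
df["Fert.Rate"] = pd.to_numeric(df["Fert.Rate"], errors="coerce")

print(df.dtypes)
```

Alternatively, passing na_values="N.A." to read_csv marks the placeholder as missing while the file is being read.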

#### Setting the index

set_index("Name") OR index_col in read_csv(...)

IMPORTANT: By default, these methods return a modified copy of the dataset and leave the original unchanged

Two options:
- Overwrite the existing variable
- inplace=True
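The first option can be sketched as follows. The small frame and the names `indexed` and `unchanged` are assumptions for illustration:

```python
import pandas as pd

df = pd.DataFrame({"Country": ["CH", "DE"], "Population": [8.6, 83.7]})

# set_index returns a new DataFrame; df itself keeps its default index
indexed = df.set_index("Country")
unchanged = df.index.name is None   # True: the original was not modified

# Overwrite the existing variable with the result
df = df.set_index("Country")
print(df.index.name)  # Country
```

Reassigning the result works with every method, whereas inplace=True is only available on methods that offer the parameter.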

In [59]:
data.set_index("Country (or dependency)", inplace=True)

In [60]:
data

Country (or dependency),#,Population(2020),YearlyChange,NetChange,Density(P/Km²),Land Area(Km²),Migrants(net),Fert.Rate,Med.Age,UrbanPop %,WorldShare
China,1,1439323776,0.39 %,5540090,153,9388211,-348399.0,1.7,38,61 %,18.47 %
India,2,1380004385,0.99 %,13586631,464,2973190,-532687.0,2.2,28,35 %,17.70 %
United States,3,331002651,0.59 %,1937734,36,9147420,954806.0,1.8,38,83 %,4.25 %
Indonesia,4,273523615,1.07 %,2898047,151,1811570,-98955.0,2.3,30,56 %,3.51 %
Pakistan,5,220892340,2.00 %,4327022,287,770880,-233379.0,3.6,23,35 %,2.83 %
...,...,...,...,...,...,...,...,...,...,...,...
Montserrat,231,4992,0.06 %,3,50,100,,N.A.,N.A.,10 %,0.00 %
Falkland Islands,232,3480,3.05 %,103,0,12170,,N.A.,N.A.,66 %,0.00 %
Niue,233,1626,0.68 %,11,6,260,,N.A.,N.A.,46 %,0.00 %
Tokelau,234,1357,1.27 %,17,136,10,,N.A.,N.A.,0 %,0.00 %


In [61]:
data.index.name = "Country"

In [62]:
data

Country,#,Population(2020),YearlyChange,NetChange,Density(P/Km²),Land Area(Km²),Migrants(net),Fert.Rate,Med.Age,UrbanPop %,WorldShare
China,1,1439323776,0.39 %,5540090,153,9388211,-348399.0,1.7,38,61 %,18.47 %
India,2,1380004385,0.99 %,13586631,464,2973190,-532687.0,2.2,28,35 %,17.70 %
United States,3,331002651,0.59 %,1937734,36,9147420,954806.0,1.8,38,83 %,4.25 %
Indonesia,4,273523615,1.07 %,2898047,151,1811570,-98955.0,2.3,30,56 %,3.51 %
Pakistan,5,220892340,2.00 %,4327022,287,770880,-233379.0,3.6,23,35 %,2.83 %
...,...,...,...,...,...,...,...,...,...,...,...
Montserrat,231,4992,0.06 %,3,50,100,,N.A.,N.A.,10 %,0.00 %
Falkland Islands,232,3480,3.05 %,103,0,12170,,N.A.,N.A.,66 %,0.00 %
Niue,233,1626,0.68 %,11,6,260,,N.A.,N.A.,46 %,0.00 %
Tokelau,234,1357,1.27 %,17,136,10,,N.A.,N.A.,0 %,0.00 %


#### Adjusting columns

Remove columns: drop(...)

Rename columns: rename(...)

rename takes a dictionary mapping old names to new names via the columns= parameter

In [64]:
data.drop(columns=["#"], inplace=True)

In [65]:
data

Country,Population(2020),YearlyChange,NetChange,Density(P/Km²),Land Area(Km²),Migrants(net),Fert.Rate,Med.Age,UrbanPop %,WorldShare
China,1439323776,0.39 %,5540090,153,9388211,-348399.0,1.7,38,61 %,18.47 %
India,1380004385,0.99 %,13586631,464,2973190,-532687.0,2.2,28,35 %,17.70 %
United States,331002651,0.59 %,1937734,36,9147420,954806.0,1.8,38,83 %,4.25 %
Indonesia,273523615,1.07 %,2898047,151,1811570,-98955.0,2.3,30,56 %,3.51 %
Pakistan,220892340,2.00 %,4327022,287,770880,-233379.0,3.6,23,35 %,2.83 %
...,...,...,...,...,...,...,...,...,...,...
Montserrat,4992,0.06 %,3,50,100,,N.A.,N.A.,10 %,0.00 %
Falkland Islands,3480,3.05 %,103,0,12170,,N.A.,N.A.,66 %,0.00 %
Niue,1626,0.68 %,11,6,260,,N.A.,N.A.,46 %,0.00 %
Tokelau,1357,1.27 %,17,136,10,,N.A.,N.A.,0 %,0.00 %


In [68]:
data.rename(columns={
    "Population(2020)": "Population",
    "Density(P/Km²)": "Density",
    "Land Area(Km²)": "LandArea",
    "Migrants(net)": "Migrants",
    "Fert.Rate": "FertRate",
    "Med.Age": "MedAge",
    "UrbanPop %": "UrbanPopPct",
    "WorldShare": "WorldSharePct"
}, inplace=True)

In [69]:
data

Country,Population,YearlyChange,NetChange,Density,LandArea,Migrants,FertRate,MedAge,UrbanPopPct,WorldSharePct
China,1439323776,0.39 %,5540090,153,9388211,-348399.0,1.7,38,61 %,18.47 %
India,1380004385,0.99 %,13586631,464,2973190,-532687.0,2.2,28,35 %,17.70 %
United States,331002651,0.59 %,1937734,36,9147420,954806.0,1.8,38,83 %,4.25 %
Indonesia,273523615,1.07 %,2898047,151,1811570,-98955.0,2.3,30,56 %,3.51 %
Pakistan,220892340,2.00 %,4327022,287,770880,-233379.0,3.6,23,35 %,2.83 %
...,...,...,...,...,...,...,...,...,...,...
Montserrat,4992,0.06 %,3,50,100,,N.A.,N.A.,10 %,0.00 %
Falkland Islands,3480,3.05 %,103,0,12170,,N.A.,N.A.,66 %,0.00 %
Niue,1626,0.68 %,11,6,260,,N.A.,N.A.,46 %,0.00 %
Tokelau,1357,1.27 %,17,136,10,,N.A.,N.A.,0 %,0.00 %


#### Saving data

In [70]:
data.to_csv("Data/PopulationDataFastFertig.csv")
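to_csv writes the index as the first column, so reading the file back needs index_col to restore it. The sketch below shows the round trip. To stay self-contained it writes to an in-memory buffer instead of a file, and the two-row frame is a hypothetical stand-in:

```python
import io
import pandas as pd

df = pd.DataFrame(
    {"Population": [83783942, 69799978]},
    index=pd.Index(["Germany", "Thailand"], name="Country"),
)

buffer = io.StringIO()   # in-memory stand-in for a file path
df.to_csv(buffer)        # the index is written as the first column
buffer.seek(0)           # rewind before reading

# index_col turns the first column back into the index
restored = pd.read_csv(buffer, index_col="Country")
print(restored)
```

If the index carries no information, to_csv(..., index=False) omits it from the output.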

#### Analyzing data

Various functions for drawing insights from the data

In [75]:
data.head(3)  # First 3 records

Country,Population,YearlyChange,NetChange,Density,LandArea,Migrants,FertRate,MedAge,UrbanPopPct,WorldSharePct
China,1439323776,0.39 %,5540090,153,9388211,-348399.0,1.7,38,61 %,18.47 %
India,1380004385,0.99 %,13586631,464,2973190,-532687.0,2.2,28,35 %,17.70 %
United States,331002651,0.59 %,1937734,36,9147420,954806.0,1.8,38,83 %,4.25 %


In [76]:
data.tail(3)  # Last 3 records

Country,Population,YearlyChange,NetChange,Density,LandArea,Migrants,FertRate,MedAge,UrbanPopPct,WorldSharePct
Niue,1626,0.68 %,11,6,260,,N.A.,N.A.,46 %,0.00 %
Tokelau,1357,1.27 %,17,136,10,,N.A.,N.A.,0 %,0.00 %
Vatican State,801,0.25 %,2,2003,0,,N.A.,N.A.,N.A.,0.00 %


#### Sorting

sort_index()

sort_values(["S1", "S2"])

Every sort also accepts ascending=True/False to set the direction

In [82]:
data.sort_index()

Country,Population,YearlyChange,NetChange,Density,LandArea,Migrants,FertRate,MedAge,UrbanPopPct,WorldSharePct
Afghanistan,38928346,2.33 %,886592,60,652860,-62920.0,4.6,18,25 %,0.50 %
Albania,2877797,-0.11 %,-3120,105,27400,-14000.0,1.6,36,63 %,0.04 %
Algeria,43851044,1.85 %,797990,18,2381740,-10000.0,3.1,29,73 %,0.56 %
American Samoa,55191,-0.22 %,-121,276,200,,N.A.,N.A.,88 %,0.00 %
Andorra,77265,0.16 %,123,164,470,,N.A.,N.A.,88 %,0.00 %
...,...,...,...,...,...,...,...,...,...,...
Wallis & Futuna,11239,-1.69 %,-193,80,140,,N.A.,N.A.,0 %,0.00 %
Western Sahara,597339,2.55 %,14876,2,266000,5582.0,2.4,28,87 %,0.01 %
Yemen,29825964,2.28 %,664042,56,527970,-30000.0,3.8,20,38 %,0.38 %
Zambia,18383955,2.93 %,522925,25,743390,-8000.0,4.7,18,45 %,0.24 %


In [83]:
data.sort_index(ascending=False)  # Sort in descending order

Country,Population,YearlyChange,NetChange,Density,LandArea,Migrants,FertRate,MedAge,UrbanPopPct,WorldSharePct
Zimbabwe,14862924,1.48 %,217456,38,386850,-116858.0,3.6,19,38 %,0.19 %
Zambia,18383955,2.93 %,522925,25,743390,-8000.0,4.7,18,45 %,0.24 %
Yemen,29825964,2.28 %,664042,56,527970,-30000.0,3.8,20,38 %,0.38 %
Western Sahara,597339,2.55 %,14876,2,266000,5582.0,2.4,28,87 %,0.01 %
Wallis & Futuna,11239,-1.69 %,-193,80,140,,N.A.,N.A.,0 %,0.00 %
...,...,...,...,...,...,...,...,...,...,...
Andorra,77265,0.16 %,123,164,470,,N.A.,N.A.,88 %,0.00 %
American Samoa,55191,-0.22 %,-121,276,200,,N.A.,N.A.,88 %,0.00 %
Algeria,43851044,1.85 %,797990,18,2381740,-10000.0,3.1,29,73 %,0.56 %
Albania,2877797,-0.11 %,-3120,105,27400,-14000.0,1.6,36,63 %,0.04 %


In [89]:
data.sort_values("LandArea", ascending=False)

Country,Population,YearlyChange,NetChange,Density,LandArea,Migrants,FertRate,MedAge,UrbanPopPct,WorldSharePct
Russia,145934462,0.04 %,62206,9,16376870,182456.0,1.8,40,74 %,1.87 %
China,1439323776,0.39 %,5540090,153,9388211,-348399.0,1.7,38,61 %,18.47 %
United States,331002651,0.59 %,1937734,36,9147420,954806.0,1.8,38,83 %,4.25 %
Canada,37742154,0.89 %,331107,4,9093510,242032.0,1.5,41,81 %,0.48 %
Brazil,212559417,0.72 %,1509890,25,8358140,21200.0,1.7,33,88 %,2.73 %
...,...,...,...,...,...,...,...,...,...,...
Nauru,10824,0.63 %,68,541,20,,N.A.,N.A.,N.A.,0.00 %
Tokelau,1357,1.27 %,17,136,10,,N.A.,N.A.,0 %,0.00 %
Gibraltar,33691,-0.03 %,-10,3369,10,,N.A.,N.A.,N.A.,0.00 %
Monaco,39242,0.71 %,278,26337,1,,N.A.,N.A.,N.A.,0.00 %


In [90]:
data.sort_values(["LandArea", "Population"], ascending=False)

Country,Population,YearlyChange,NetChange,Density,LandArea,Migrants,FertRate,MedAge,UrbanPopPct,WorldSharePct
Russia,145934462,0.04 %,62206,9,16376870,182456.0,1.8,40,74 %,1.87 %
China,1439323776,0.39 %,5540090,153,9388211,-348399.0,1.7,38,61 %,18.47 %
United States,331002651,0.59 %,1937734,36,9147420,954806.0,1.8,38,83 %,4.25 %
Canada,37742154,0.89 %,331107,4,9093510,242032.0,1.5,41,81 %,0.48 %
Brazil,212559417,0.72 %,1509890,25,8358140,21200.0,1.7,33,88 %,2.73 %
...,...,...,...,...,...,...,...,...,...,...
Nauru,10824,0.63 %,68,541,20,,N.A.,N.A.,N.A.,0.00 %
Gibraltar,33691,-0.03 %,-10,3369,10,,N.A.,N.A.,N.A.,0.00 %
Tokelau,1357,1.27 %,17,136,10,,N.A.,N.A.,0 %,0.00 %
Monaco,39242,0.71 %,278,26337,1,,N.A.,N.A.,N.A.,0.00 %
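When sorting by several columns, ascending can also be a list with one entry per column. A minimal sketch on a four-row excerpt of the table above, with values copied from the output:

```python
import pandas as pd

df = pd.DataFrame(
    {"LandArea": [10, 10, 1, 20], "Population": [1357, 33691, 39242, 10824]},
    index=["Tokelau", "Gibraltar", "Monaco", "Nauru"],
)

# LandArea descending; ties are broken by Population ascending
result = df.sort_values(["LandArea", "Population"], ascending=[False, True])

print(list(result.index))  # ['Nauru', 'Tokelau', 'Gibraltar', 'Monaco']
```

With a single ascending=False, as in the cell above, Gibraltar comes before Tokelau because both columns are sorted in descending order.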


#### Accessing data

Three options:
- Index operator []: a column by name, or rows by slice
- loc: rows and columns by label
- iloc: like loc, but by integer position only

In [100]:
data["Population"]  # Fetch a single column

Country
China               1439323776
India               1380004385
United States        331002651
Indonesia            273523615
Pakistan             220892340
                       ...    
Montserrat                4992
Falkland Islands          3480
Niue                      1626
Tokelau                   1357
Vatican State              801
Name: Population, Length: 235, dtype: int64

In [97]:
data["China":"Russia"]  # Fetch several rows

Country,Population,YearlyChange,NetChange,Density,LandArea,Migrants,FertRate,MedAge,UrbanPopPct,WorldSharePct
China,1439323776,0.39 %,5540090,153,9388211,-348399.0,1.7,38,61 %,18.47 %
India,1380004385,0.99 %,13586631,464,2973190,-532687.0,2.2,28,35 %,17.70 %
United States,331002651,0.59 %,1937734,36,9147420,954806.0,1.8,38,83 %,4.25 %
Indonesia,273523615,1.07 %,2898047,151,1811570,-98955.0,2.3,30,56 %,3.51 %
Pakistan,220892340,2.00 %,4327022,287,770880,-233379.0,3.6,23,35 %,2.83 %
Brazil,212559417,0.72 %,1509890,25,8358140,21200.0,1.7,33,88 %,2.73 %
Nigeria,206139589,2.58 %,5175990,226,910770,-60000.0,5.4,18,52 %,2.64 %
Bangladesh,164689383,1.01 %,1643222,1265,130170,-369501.0,2.1,28,39 %,2.11 %
Russia,145934462,0.04 %,62206,9,16376870,182456.0,1.8,40,74 %,1.87 %


In [123]:
data["Population"]["China"]  # Fetch a single value

np.int64(1439323776)

In [122]:
data["Population"]["China":"Russia"]  # Several values from one column

Country
China            1439323776
India            1380004385
United States     331002651
Indonesia         273523615
Pakistan          220892340
Brazil            212559417
Nigeria           206139589
Bangladesh        164689383
Russia            145934462
Name: Population, dtype: int64

In [127]:
data.loc["China":"Russia", "Population":"Density"]

Country,Population,YearlyChange,NetChange,Density
China,1439323776,0.39 %,5540090,153
India,1380004385,0.99 %,13586631,464
United States,331002651,0.59 %,1937734,36
Indonesia,273523615,1.07 %,2898047,151
Pakistan,220892340,2.00 %,4327022,287
Brazil,212559417,0.72 %,1509890,25
Nigeria,206139589,2.58 %,5175990,226
Bangladesh,164689383,1.01 %,1643222,1265
Russia,145934462,0.04 %,62206,9


In [130]:
data.loc["China":"Russia", ["Population", "Density"]]  # Select individual columns

Country,Population,Density
China,1439323776,153
India,1380004385,464
United States,331002651,36
Indonesia,273523615,151
Pakistan,220892340,287
Brazil,212559417,25
Nigeria,206139589,226
Bangladesh,164689383,1265
Russia,145934462,9


In [135]:
data.loc[["China", "Russia"], :]  # Select individual rows

Country,Population,YearlyChange,NetChange,Density,LandArea,Migrants,FertRate,MedAge,UrbanPopPct,WorldSharePct
China,1439323776,0.39 %,5540090,153,9388211,-348399.0,1.7,38,61 %,18.47 %
Russia,145934462,0.04 %,62206,9,16376870,182456.0,1.8,40,74 %,1.87 %


In [142]:
data.iloc[10:20, 0:3]

Country,Population,YearlyChange,NetChange
Japan,126476461,-0.30 %,-383840
Ethiopia,114963588,2.57 %,2884858
Philippines,109581078,1.35 %,1464463
Egypt,102334404,1.94 %,1946331
Vietnam,97338579,0.91 %,876473
DR Congo,89561403,3.19 %,2770836
Turkey,84339067,1.09 %,909452
Iran,83992949,1.30 %,1079043
Germany,83783942,0.32 %,266897
Thailand,69799978,0.25 %,174396
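Single values can also be read in one step rather than by chaining two index operations. The sketch below uses a hypothetical two-row frame with values taken from the table above. The at accessor and the assignment at the end go beyond what the notebook shows:

```python
import pandas as pd

df = pd.DataFrame(
    {"Population": [1439323776, 145934462], "Density": [153, 9]},
    index=pd.Index(["China", "Russia"], name="Country"),
)

value = df.loc["China", "Population"]   # row label, column label
same = df.at["China", "Population"]     # scalar-only access by label
by_pos = df.iloc[0, 0]                  # row position, column position

# Assigning through loc modifies df itself (10 is an illustrative value)
df.loc["Russia", "Density"] = 10
```

Assigning through a chained expression such as df["Density"]["Russia"] = 10 may act on a temporary object instead, which is why loc is the safer choice when writing values.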


#### Filtering data

Works as in NumPy (vectorization)

In [145]:
data["Population"] > 50_000_000

Country
China                True
India                True
United States        True
Indonesia            True
Pakistan             True
                    ...  
Montserrat          False
Falkland Islands    False
Niue                False
Tokelau             False
Vatican State       False
Name: Population, Length: 235, dtype: bool

In [146]:
data[data["Population"] > 50_000_000]

Country,Population,YearlyChange,NetChange,Density,LandArea,Migrants,FertRate,MedAge,UrbanPopPct,WorldSharePct
China,1439323776,0.39 %,5540090,153,9388211,-348399.0,1.7,38,61 %,18.47 %
India,1380004385,0.99 %,13586631,464,2973190,-532687.0,2.2,28,35 %,17.70 %
United States,331002651,0.59 %,1937734,36,9147420,954806.0,1.8,38,83 %,4.25 %
Indonesia,273523615,1.07 %,2898047,151,1811570,-98955.0,2.3,30,56 %,3.51 %
Pakistan,220892340,2.00 %,4327022,287,770880,-233379.0,3.6,23,35 %,2.83 %
Brazil,212559417,0.72 %,1509890,25,8358140,21200.0,1.7,33,88 %,2.73 %
Nigeria,206139589,2.58 %,5175990,226,910770,-60000.0,5.4,18,52 %,2.64 %
Bangladesh,164689383,1.01 %,1643222,1265,130170,-369501.0,2.1,28,39 %,2.11 %
Russia,145934462,0.04 %,62206,9,16376870,182456.0,1.8,40,74 %,1.87 %
Mexico,128932753,1.06 %,1357224,66,1943950,-60000.0,2.1,29,84 %,1.65 %
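Several conditions can be combined into one filter. The sketch below uses a hypothetical three-row excerpt with values from the tables above, and shows the same filter written as a boolean mask and as a query string:

```python
import pandas as pd

df = pd.DataFrame(
    {"Population": [1439323776, 145934462, 37742154], "Density": [153, 9, 4]},
    index=["China", "Russia", "Canada"],
)

# Element-wise AND of two conditions; each needs its own parentheses
mask = (df["Population"] > 50_000_000) & (df["Density"] < 100)
filtered = df[mask]

# The same filter expressed as a query string
queried = df.query("Population > 50000000 and Density < 100")

print(list(filtered.index))  # ['Russia']
```

query is convenient for column names that are valid Python identifiers, which is one more reason to rename the columns first.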


#### Grouping

groupby(column)

In [149]:
g = data.groupby("MedAge")

In [151]:
g.get_group("20")  # MedAge is stored as text, so the key is a string

Country,Population,YearlyChange,NetChange,Density,LandArea,Migrants,FertRate,MedAge,UrbanPopPct,WorldSharePct
Kenya,53771296,2.28 %,1197323,94,569140,-10000.0,3.5,20,28 %,0.69 %
Sudan,43849260,2.42 %,1036022,25,1765048,-50000.0,4.4,20,35 %,0.56 %
Yemen,29825964,2.28 %,664042,56,527970,-30000.0,3.8,20,38 %,0.38 %
Madagascar,27691018,2.68 %,721711,48,581795,-1500.0,4.1,20,39 %,0.36 %
Rwanda,12952218,2.58 %,325268,525,24670,-9000.0,4.1,20,18 %,0.17 %
Mauritania,4649658,2.74 %,123962,5,1030700,5000.0,4.6,20,57 %,0.06 %
Comoros,869601,2.20 %,18715,467,1861,-2000.0,4.2,20,29 %,0.01 %
Solomon Islands,686884,2.55 %,17061,25,27990,-1600.0,4.4,20,23 %,0.01 %
Mayotte,272815,2.50 %,6665,728,375,0.0,3.7,20,46 %,0.00 %


In [153]:
g["Population"].count()

MedAge
15       1
16       1
17       6
18      10
19      14
20       9
21       5
22       7
23       4
24       6
25       4
26       7
27       4
28      12
29       5
30       8
31       6
32      11
33       5
34       6
35       3
36       4
37       4
38       7
39       3
40       7
41       5
42      10
43      11
44       5
45       5
46       3
47       2
48       1
N.A.    34
Name: Population, dtype: int64

In [159]:
g["Population"].mean().astype(np.int32).sort_values()

MedAge
N.A.        33331
39        1120258
37        1772735
27        2088666
36        2161581
34        3726593
24        7213284
43        7631298
22        7686515
21        9621932
44       12104325
45       13200286
35       15952753
42       17172367
19       18241182
31       18478003
20       19396523
41       19428317
16       20250833
15       24206644
26       25107858
47       30418545
32       32849571
25       33041880
46       34801235
17       35396425
18       41485399
40       42772052
29       46930304
33       47894699
30       48946550
23       60288272
48      126476461
28      139879041
38      259087065
Name: Population, dtype: int32
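Because MedAge is stored as text, the groups above are keyed by strings and include an N.A. group. One possible refinement, sketched on a hypothetical four-row excerpt with values from the tables above, converts the column first and computes several aggregates at once:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Population": [53771296, 43849260, 38928346, 1357],
        "MedAge": ["20", "20", "18", "N.A."],
    },
    index=["Kenya", "Sudan", "Afghanistan", "Tokelau"],
)

# Convert the text column to numbers; "N.A." becomes NaN
df["MedAge"] = pd.to_numeric(df["MedAge"], errors="coerce")

# groupby drops NaN keys by default, so Tokelau is left out
summary = df.groupby("MedAge")["Population"].agg(["count", "mean"])

print(summary)
```

The result has one row per median age and one column per aggregate, and the group keys now sort numerically.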