# Pandas - data types and column manipulation

In the previous lesson we introduced the pandas library and its basic classes: `Series`, `DataFrame` and `Index`. However, we treated them as static objects that we only looked at.

In this lesson we start modifying existing tables. We will show:

* how to add or remove columns and rows
* how to change the value of a specific cell
* which data types suit which purpose
* arithmetic and other operations that can be performed on columns
* filtering and sorting rows

And since you surely do not want to lose the results of your work, saving them to external files will come in handy at the end.

In [1]:
import pandas as pd

## Manipulating DataFrames

To warm up, we will work with a small table containing some basic information about the planets, which you can easily find e.g. on [Wikipedia](https://en.wikipedia.org/wiki/Planet).

In [2]:
planety = pd.DataFrame({
    "jmeno": ["Merkur", "Venuše", "Země", "Mars", "Jupiter", "Saturn", "Uran", "Neptun"],
    "symbol": ["☿", "♀", "⊕", "♂", "♃", "♄", "♅", "♆"],
    "obezna_poloosa": [0.39, 0.72, 1.00, 1.52, 5.20, 9.54, 19.22, 30.06],
    "obezna_doba": [0.24, 0.62, 1, 1.88, 11.86, 29.46, 84.01, 164.8]
})
planety = planety.set_index("jmeno")    # A name-based index will be easier to work with
planety

| jmeno | symbol | obezna_poloosa | obezna_doba |
|---|---|---|---|
| Merkur | ☿ | 0.39 | 0.24 |
| Venuše | ♀ | 0.72 | 0.62 |
| Země | ⊕ | 1.0 | 1.0 |
| Mars | ♂ | 1.52 | 1.88 |
| Jupiter | ♃ | 5.2 | 11.86 |
| Saturn | ♄ | 9.54 | 29.46 |
| Uran | ♅ | 19.22 | 84.01 |
| Neptun | ♆ | 30.06 | 164.8 |


### Adding a new column

To add a new column (a `Series`), we assign it to the `DataFrame` like a value into a dictionary, that is, in square brackets with the column name. The good news is that, just as in the constructor, `pandas` can "cope" with both a `Series` and a plain list. One difference to keep in mind: a list is matched to rows by position, whereas a `Series` is matched by its index labels.

In our particular case we look up and add the number of known moons (large and small).

In [3]:
mesice = [0, 0, 1, 2, 79, 82, 27, 14]      # Alternatively mesice = pd.Series([...], index=planety.index)
planety["mesice"] = mesice
planety

| jmeno | symbol | obezna_poloosa | obezna_doba | mesice |
|---|---|---|---|---|
| Merkur | ☿ | 0.39 | 0.24 | 0 |
| Venuše | ♀ | 0.72 | 0.62 | 0 |
| Země | ⊕ | 1.0 | 1.0 | 1 |
| Mars | ♂ | 1.52 | 1.88 | 2 |
| Jupiter | ♃ | 5.2 | 11.86 | 79 |
| Saturn | ♄ | 9.54 | 29.46 | 82 |
| Uran | ♅ | 19.22 | 84.01 | 27 |
| Neptun | ♆ | 30.06 | 164.8 | 14 |


💡 In this case we modified the existing `DataFrame` directly. Most methods and operations (you already know e.g. `set_index`) return a new object by default; this is a good habit that we will follow. Column assignment is one of the exceptions to this otherwise respected rule, made for the sake of convenience.

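
Returning to the choice between a list and a `Series` when assigning a column, the following minimal sketch shows how the two are matched to rows. The three-row table and the column names `from_list`, `from_plain_series` and `from_labeled_series` are made up for illustration; the moon counts are the ones used above.

```python
import pandas as pd

# A tiny stand-in for the planets table (hypothetical, three rows only)
df = pd.DataFrame({"symbol": ["☿", "♂", "♃"]}, index=["Merkur", "Mars", "Jupiter"])

# A list is matched to the rows by position
df["from_list"] = [0, 2, 79]

# A Series with the default 0, 1, 2 index shares no labels with the table,
# so every row ends up with a missing value
df["from_plain_series"] = pd.Series([0, 2, 79])

# A Series that carries the table's labels is matched by label,
# regardless of the order in which the values are written
df["from_labeled_series"] = pd.Series({"Jupiter": 79, "Merkur": 0, "Mars": 2})

print(df)
```

If a `Series` is what you have at hand, giving it the same index as the table (for example `index=df.index`) is one way to avoid the all-missing column.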
   
`DataFrame` also offers the `assign` method, which does not modify the table but creates a copy of it with added (or replaced) columns:

In [4]:
# A new temporary DataFrame
planety.assign(
    je_stavebnice=[True, False, False, False, False, False, False, False],
    ma_vztah_k_vestonicim=[False, True, False, False, False, False, False, False],
)

# The `planety` object stays unchanged.

| jmeno | symbol | obezna_poloosa | obezna_doba | mesice | je_stavebnice | ma_vztah_k_vestonicim |
|---|---|---|---|---|---|---|
| Merkur | ☿ | 0.39 | 0.24 | 0 | True | False |
| Venuše | ♀ | 0.72 | 0.62 | 0 | False | True |
| Země | ⊕ | 1.0 | 1.0 | 1 | False | False |
| Mars | ♂ | 1.52 | 1.88 | 2 | False | False |
| Jupiter | ♃ | 5.2 | 11.86 | 79 | False | False |
| Saturn | ♄ | 9.54 | 29.46 | 82 | False | False |
| Uran | ♅ | 19.22 | 84.01 | 27 | False | False |
| Neptun | ♆ | 30.06 | 164.8 | 14 | False | False |
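
As a small sketch of what `assign` can do beyond fixed lists: it also accepts callables, which receive the table being built, so a later column can use an earlier one. The miniature table and the column names `obezna_doba_dny` and `dlouhy_rok` are hypothetical, and the conversion assumes that `obezna_doba` is given in Earth years.

```python
import pandas as pd

# Hypothetical miniature of the planets table
small = pd.DataFrame(
    {"obezna_doba": [0.24, 1.88, 11.86]},
    index=["Merkur", "Mars", "Jupiter"],
)

# assign() returns a new DataFrame; each callable gets the table built so far
extended = small.assign(
    obezna_doba_dny=lambda df: df["obezna_doba"] * 365.25,
    dlouhy_rok=lambda df: df["obezna_doba_dny"] > 1000,
)

print(list(small.columns))     # the original still has a single column
print(extended)
```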


**Exercise**: Try (one way or the other) to add a column with the year of discovery (`"objeveno"`). You can find the data e.g. here: https://cs.wikipedia.org/wiki/Slune%C4%8Dn%C3%AD_soustava.

It is not practical all that often, but a single scalar value can also be used for the values of a new column:

In [5]:
planety["je_planeta"] = True     # Held until 2006

### Adding a new row

If we take a time machine back to the childhood (or early adulthood) of the authors of these materials, that is, before 2006, when the astronomical congress that defined the term "planet" was held in Prague (but not before 1930!), we gain a new planet: Pluto.

We insert it into our table using the `loc` indexer, which we used earlier for "peeking" into the table:

In [6]:
planety.loc["Pluto"] = ["♇", 39.48, 247.94, 5, True]   # List of values in the row
planety

| jmeno | symbol | obezna_poloosa | obezna_doba | mesice | je_planeta |
|---|---|---|---|---|---|
| Merkur | ☿ | 0.39 | 0.24 | 0 | True |
| Venuše | ♀ | 0.72 | 0.62 | 0 | True |
| Země | ⊕ | 1.0 | 1.0 | 1 | True |
| Mars | ♂ | 1.52 | 1.88 | 2 | True |
| Jupiter | ♃ | 5.2 | 11.86 | 79 | True |
| Saturn | ♄ | 9.54 | 29.46 | 82 | True |
| Uran | ♅ | 19.22 | 84.01 | 27 | True |
| Neptun | ♆ | 30.06 | 164.8 | 14 | True |
| Pluto | ♇ | 39.48 | 247.94 | 5 | True |
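
Assigning through `loc` adds one row at a time. When several rows arrive at once, one possible alternative (not used in this lesson) is `pd.concat`, sketched below on two made-up fragments of the planets table; the variable names are hypothetical.

```python
import pandas as pd

inner = pd.DataFrame(
    {"obezna_poloosa": [0.39, 1.52], "mesice": [0, 2]},
    index=["Merkur", "Mars"],
)

# The new rows are prepared as their own DataFrame with the same columns
outer = pd.DataFrame(
    {"obezna_poloosa": [5.20, 9.54], "mesice": [79, 82]},
    index=["Jupiter", "Saturn"],
)

# concat() stacks the rows and returns a new table; the inputs stay untouched
combined = pd.concat([inner, outer])
print(combined)
```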


### Changing the value of a cell

The `.loc` and `.iloc` "indexers" with two arguments in square brackets refer directly to a specific cell, and assigning to them (again, as with a dictionary) writes the value to that place. You only need to keep the order (row, column).

Let's return to the present and strip Pluto of its privileges:

In [7]:
planety.loc["Pluto", "je_planeta"] = False
planety

| jmeno | symbol | obezna_poloosa | obezna_doba | mesice | je_planeta |
|---|---|---|---|---|---|
| Merkur | ☿ | 0.39 | 0.24 | 0 | True |
| Venuše | ♀ | 0.72 | 0.62 | 0 | True |
| Země | ⊕ | 1.0 | 1.0 | 1 | True |
| Mars | ♂ | 1.52 | 1.88 | 2 | True |
| Jupiter | ♃ | 5.2 | 11.86 | 79 | True |
| Saturn | ♄ | 9.54 | 29.46 | 82 | True |
| Uran | ♅ | 19.22 | 84.01 | 27 | True |
| Neptun | ♆ | 30.06 | 164.8 | 14 | True |
| Pluto | ♇ | 39.48 | 247.94 | 5 | False |


**⚠ Warning:** As with a dictionary, but perhaps somewhat unintuitively, `.loc` lets you write a value into a row and a column that do not exist!

In [8]:
planety_bad = planety.copy()     # Make a copy to be safe

planety_bad.loc["Zeme", "planeta"] = True
planety_bad

| jmeno | symbol | obezna_poloosa | obezna_doba | mesice | je_planeta | planeta |
|---|---|---|---|---|---|---|
| Merkur | ☿ | 0.39 | 0.24 | 0.0 | True | NaN |
| Venuše | ♀ | 0.72 | 0.62 | 0.0 | True | NaN |
| Země | ⊕ | 1.0 | 1.0 | 1.0 | True | NaN |
| Mars | ♂ | 1.52 | 1.88 | 2.0 | True | NaN |
| Jupiter | ♃ | 5.2 | 11.86 | 79.0 | True | NaN |
| Saturn | ♄ | 9.54 | 29.46 | 82.0 | True | NaN |
| Uran | ♅ | 19.22 | 84.01 | 27.0 | True | NaN |
| Neptun | ♆ | 30.06 | 164.8 | 14.0 | True | NaN |
| Pluto | ♇ | 39.48 | 247.94 | 5.0 | False | NaN |
| Zeme | NaN | NaN | NaN | NaN | NaN | True |
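
One way to protect yourself from such typos is to check that both labels exist before writing. The helper `set_existing` below is a hypothetical convenience function, not part of pandas, and the two-row table is made up for the sketch.

```python
import pandas as pd

df = pd.DataFrame({"je_planeta": [True, True]}, index=["Mars", "Jupiter"])


def set_existing(table, row, column, value):
    """Write a cell only if both labels already exist (hypothetical helper)."""
    if row not in table.index:
        raise KeyError(f"unknown row label: {row!r}")
    if column not in table.columns:
        raise KeyError(f"unknown column label: {column!r}")
    table.loc[row, column] = value


set_existing(df, "Mars", "je_planeta", False)      # fine, the cell exists

try:
    set_existing(df, "Marz", "je_planeta", False)  # typo in the row label
except KeyError as error:
    print("refused:", error)

print(df)   # still two rows and one column
```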


You can also assign to ranges of index labels; just make sure that the assigned value or values are either a scalar or have the same shape as the region you assign to:

In [9]:
planety.loc["Merkur":"Mars", "je_obr"] = False
planety.loc["Jupiter":"Neptun", "je_obr"] = [True, True, True, True]
planety

| jmeno | symbol | obezna_poloosa | obezna_doba | mesice | je_planeta | je_obr |
|---|---|---|---|---|---|---|
| Merkur | ☿ | 0.39 | 0.24 | 0 | True | False |
| Venuše | ♀ | 0.72 | 0.62 | 0 | True | False |
| Země | ⊕ | 1.0 | 1.0 | 1 | True | False |
| Mars | ♂ | 1.52 | 1.88 | 2 | True | False |
| Jupiter | ♃ | 5.2 | 11.86 | 79 | True | True |
| Saturn | ♄ | 9.54 | 29.46 | 82 | True | True |
| Uran | ♅ | 19.22 | 84.01 | 27 | True | True |
| Neptun | ♆ | 30.06 | 164.8 | 14 | True | True |
| Pluto | ♇ | 39.48 | 247.94 | 5 | False | NaN |
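
A detail worth seeing in isolation: a label range in `.loc` includes both endpoints, while a positional range in `.iloc` excludes the upper bound, as ordinary Python slices do. The sketch below uses a made-up four-row table.

```python
import pandas as pd

df = pd.DataFrame(
    {"mesice": [2, 79, 82, 27]},
    index=["Mars", "Jupiter", "Saturn", "Uran"],
)

by_label = df.loc["Jupiter":"Uran"]   # Jupiter, Saturn and Uran
by_position = df.iloc[1:3]            # Jupiter and Saturn only

# A scalar is broadcast over the whole selected range
df["je_obr"] = False
df.loc["Jupiter":"Uran", "je_obr"] = True
print(df)
```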




**Exercise:** By coincidence (or is it an astronomical inevitability?) all planetary giants have at least some ring. Can you create a `"ma_prstenec"` column in a simple way?

### Removing a column

The `drop` method removes a column or row from a `DataFrame`. Its first argument expects the label (index) of one or more rows or columns that you want to remove. The `axis` argument states along which dimension the operation should be applied (0 or 1). The number matches the order in which keys are given when referring to cells.

Axis (`axis`):
- 0 = rows
- 1 = columns

(Numerous other methods and functions use this argument as well, so make sure you understand it.)

In [10]:
# Remove a useless column whose information value is on the level of "the wipers wipe, the horn honks"
planety = planety.drop("je_planeta", axis=1)
planety

| jmeno | symbol | obezna_poloosa | obezna_doba | mesice | je_obr |
|---|---|---|---|---|---|
| Merkur | ☿ | 0.39 | 0.24 | 0 | False |
| Venuše | ♀ | 0.72 | 0.62 | 0 | False |
| Země | ⊕ | 1.0 | 1.0 | 1 | False |
| Mars | ♂ | 1.52 | 1.88 | 2 | False |
| Jupiter | ♃ | 5.2 | 11.86 | 79 | True |
| Saturn | ♄ | 9.54 | 29.46 | 82 | True |
| Uran | ♅ | 19.22 | 84.01 | 27 | True |
| Neptun | ♆ | 30.06 | 164.8 | 14 | True |
| Pluto | ♇ | 39.48 | 247.94 | 5 | NaN |
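
To get a feel for `axis`, it may help to try it on a throwaway table. The sketch below (with made-up data) applies it to `sum` as well, and shows the `columns=` and `index=` keywords of `drop`, which say the same thing without a number.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [10, 20]}, index=["x", "y"])

print(df.sum(axis=0))   # works down the rows: one total per column
print(df.sum(axis=1))   # works across the columns: one total per row

without_b = df.drop("b", axis=1)
same_thing = df.drop(columns="b")   # keyword form of the line above
without_x = df.drop(index="x")      # keyword form of drop("x", axis=0)
```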


<span style="color: red; left: 50%; top: 0.5em; font-weight: bold; position: absolute; opacity: 0.3; width: 0px; height: 0px; font-size: 6em">⛧</span> The `drop` method, in line with the convention mentioned above, returns a new `DataFrame` (which is why we have to assign the result of the operation back to `planety`). If you want to operate directly on the table, you can use the `del` statement (it works the same as with a dictionary) or beg the panda gods (and the authors of these materials) for forgiveness and add the `inplace=True` argument:

In [11]:
# Alternative 1)
# del planety["je_planeta"]

# Alternative 2)
# planety.drop("je_planeta", axis=1, inplace=True)

### Removing a row

Let's go back to the future (or rather the present) and deal mercilessly with Pluto.

Again we use the `drop` method with the right value of the `axis` argument, that is 0. Luckily for us, this value is the default, so we can leave the argument out entirely:

In [12]:
planety = planety.drop("Pluto")   # Add axis=0 if you want to be explicit
planety

| jmeno | symbol | obezna_poloosa | obezna_doba | mesice | je_obr |
|---|---|---|---|---|---|
| Merkur | ☿ | 0.39 | 0.24 | 0 | False |
| Venuše | ♀ | 0.72 | 0.62 | 0 | False |
| Země | ⊕ | 1.0 | 1.0 | 1 | False |
| Mars | ♂ | 1.52 | 1.88 | 2 | False |
| Jupiter | ♃ | 5.2 | 11.86 | 79 | True |
| Saturn | ♄ | 9.54 | 29.46 | 82 | True |
| Uran | ♅ | 19.22 | 84.01 | 27 | True |
| Neptun | ♆ | 30.06 | 164.8 | 14 | True |


## Data types

As we mentioned earlier, data types in pandas differ a bit from the types in Python, but fortunately the conversion between them is mostly automatic and "behaves as expected".

#### Preparing the data

In the data course we will use various data sets (usually larger ones, where it is not practical to write them out in full in a constructor). We now leave the planets and look at some interesting characteristics of countries around the world (since the definition of a country is rather vague, we consider UN members), captured for one particular year of the past decade (because not all figures are always available, we take the last year for which enough indicators are known). The data come mostly from the [Gapminder](https://www.gapminder.org/) project; we only supplemented them with a few more pieces of information from Wikipedia.


In [13]:
url = "https://raw.githubusercontent.com/janpipek/data-pro-pyladies/master/data/countries.csv"
countries = pd.read_csv(url, index_col="name")   # Instead of `set_index`
countries

| name | iso | world_6region | world_4region | income_groups | is_eu | is_oecd | eu_accession | year | area | population | alcohol_adults | bmi_men | bmi_women | car_deaths_per_100000_people | calories_per_day | infant_mortality | life_expectancy | life_expectancy_female | life_expectancy_male | un_accession |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Afghanistan | AFG | south_asia | asia | low_income | False | False | NaN | 2018 | 652860.0 | 34500000.0 | 0.03 | 20.62 | 21.07 | NaN | 2090.0 | 66.3 | 58.69 | 65.812 | 63.101 | 1946-11-19 |
| Albania | ALB | europe_central_asia | europe | upper_middle_income | False | False | NaN | 2018 | 28750.0 | 3238000.0 | 7.29 | 26.45 | 25.66 | 5.978 | 3193.0 | 12.5 | 78.01 | 80.737 | 76.693 | 1955-12-14 |
| Algeria | DZA | middle_east_north_africa | africa | upper_middle_income | False | False | NaN | 2018 | 2381740.0 | 36980000.0 | 0.69 | 24.60 | 26.37 | NaN | 3296.0 | 21.9 | 77.86 | 77.784 | 75.279 | 1962-10-08 |
| Andorra | AND | europe_central_asia | europe | high_income | False | False | NaN | 2017 | 470.0 | 88910.0 | 10.17 | 27.63 | 26.43 | NaN | NaN | 2.1 | 82.55 | NaN | NaN | 1993-07-28 |
| Angola | AGO | sub_saharan_africa | africa | upper_middle_income | False | False | NaN | 2018 | 1246700.0 | 20710000.0 | 5.57 | 22.25 | 23.48 | NaN | 2473.0 | 96.0 | 65.19 | 64.939 | 59.213 | 1976-12-01 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Venezuela | VEN | america | americas | upper_middle_income | False | False | NaN | 2018 | 912050.0 | 30340000.0 | 7.60 | 27.45 | 28.13 | 7.332 | 2631.0 | 12.9 | 75.91 | 79.079 | 70.950 | 1945-11-15 |
| Vietnam | VNM | east_asia_pacific | asia | lower_middle_income | False | False | NaN | 2018 | 330967.0 | 90660000.0 | 3.91 | 20.92 | 21.07 | NaN | 2745.0 | 17.3 | 74.88 | 81.203 | 72.003 | 1977-09-20 |
| Yemen | YEM | middle_east_north_africa | asia | lower_middle_income | False | False | NaN | 2018 | 527970.0 | 26360000.0 | 0.20 | 24.44 | 26.11 | NaN | 2223.0 | 33.8 | 67.14 | 66.871 | 63.875 | 1947-09-30 |
| Zambia | ZMB | sub_saharan_africa | africa | lower_middle_income | False | False | NaN | 2018 | 752610.0 | 14310000.0 | 3.56 | 20.68 | 23.05 | 11.260 | 1930.0 | 43.3 | 59.45 | 65.362 | 59.845 | 1964-12-01 |


Let's pick a random\* country and look at what data the table holds about it.

\**We bend luck a little, but admit that 42 is a number as good as or better than any other.*

In [16]:
import numpy; numpy.random.seed(42)  # "Taming" of randomness

row = countries.sample(1).iloc[0]
row

# Alternative for the less trusting:
#  countries.loc["Czechia"]

iso                                             PHL
world_6region                     east_asia_pacific
world_4region                                  asia
income_groups                   lower_middle_income
is_eu                                         False
is_oecd                                       False
eu_accession                                    NaN
year                                           2018
area                                         300000
population                                9.811e+07
alcohol_adults                                 6.08
bmi_men                                       22.87
bmi_women                                     23.47
car_deaths_per_100000_people                  2.507
calories_per_day                               2570
infant_mortality                               22.2
life_expectancy                               70.55
life_expectancy_female                       72.973
life_expectancy_male                         66.068
un_accession

Even at first glance the fields clearly differ in type. But which types are they? The `dtypes` attribute of our table answers that (remember: for a `Series` we used `dtype`).

In [None]:
countries.dtypes

When pandas reads from tables, it tries to recognize numbers (including their kind) and logical values automatically. For all remaining columns it leaves the decision to you and somewhat defeatistically treats them as "objects". However, these are not all the types (not even all the common ones). Besides, you know `float` and `int` from Python, but why is the number `64` part of the name? So let's take it from the top.
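
As a quick look at the `64`: it is the number of bits used to store each value, which determines both the range of representable numbers and the memory taken. The following sketch uses NumPy's `iinfo` and a made-up three-element `Series`; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

numbers = pd.Series([1, 2, 3], dtype="int64")
compact = numbers.astype("int8")    # 8 bits per value instead of 64

print(np.iinfo("int64").max)                        # 9223372036854775807
print(np.iinfo("int8").min, np.iinfo("int8").max)   # -128 127

# Bytes used by the values alone (index excluded)
print(numbers.memory_usage(index=False), compact.memory_usage(index=False))
```

Narrower types save memory, but a value outside their range no longer fits, so the default 64-bit types are usually the safe choice.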

## Column types



**integer types**

**floating-point numbers**

**objects**

**categorical**

**date / time**

**logical**

In [None]:
countries.dtypes

In [None]:
countries.world_6region.astype("category")
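
A categorical column stores each distinct value once and keeps only small integer codes per row. The sketch below shows this on a short made-up `Series` of region names rather than on the real `countries` table.

```python
import pandas as pd

regions = pd.Series(["asia", "europe", "asia", "africa", "europe", "asia"])
as_category = regions.astype("category")

# The distinct values, sorted alphabetically for strings
print(as_category.cat.categories.tolist())   # ['africa', 'asia', 'europe']

# Each row is stored as the position of its value in the list above
print(as_category.cat.codes.tolist())        # [1, 2, 1, 0, 2, 1]
```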

In [None]:
countries["population"] = countries["population"].astype("int")
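
A caveat when converting to integers: the plain `int` type cannot hold missing values, so the conversion fails if the column contains any. Whether that applies to `population` depends on the data; the sketch below uses a made-up `Series` with one missing entry and shows the nullable `"Int64"` type (note the capital I) as one possible way around it.

```python
import numpy as np
import pandas as pd

population = pd.Series([34500000.0, 3238000.0, np.nan])

try:
    population.astype("int")
except (ValueError, TypeError) as error:
    print("cannot convert:", error)

# The nullable integer type keeps the gap as a missing value
nullable = population.astype("Int64")
print(nullable)
```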

## Mathematics

In [None]:
countries["population"] / countries["area"]
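
Arithmetic between two columns works element by element, and the elements are paired by index label, not by position. The sketch below illustrates this with two small `Series` written in a different order; the figures for Albania and Andorra are taken from the table above.

```python
import pandas as pd

population = pd.Series({"Andorra": 88910.0, "Albania": 3238000.0})
area = pd.Series({"Albania": 28750.0, "Andorra": 470.0})   # different order

# Each country is divided by its own area, whatever the order of the rows
density = population / area
print(density.round(2))
```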

In [None]:
countries["population"].sum(), countries["area"].sum()

In [None]:
from datetime import datetime
datetime.now() - pd.to_datetime(countries["eu_accession"]).dropna()
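
Subtracting dates yields time differences, and the `.dt` accessor exposes parts of a date. The sketch below uses a made-up three-country `Series` and a fixed reference date instead of `datetime.now()`, so that the result does not depend on when it is run.

```python
import pandas as pd

accession = pd.to_datetime(pd.Series(
    ["2004-05-01", None, "1995-01-01"],
    index=["Czechia", "Norway", "Austria"],
))

reference = pd.Timestamp("2020-01-01")

# Missing dates are dropped first, the rest become Timedelta values
elapsed = reference - accession.dropna()
print(elapsed)

# The year alone; rows without a date stay missing
years = accession.dt.year
print(years)
```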

## Filtering

So far we have 

In [None]:
countries["is_eu"].value_counts()

In [None]:
countries[countries["is_eu"]]

In [None]:
countries.query("is_oecd")
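
Both forms of filtering can combine several conditions. In the sketch below, on a made-up table with rounded populations, a boolean mask uses `&` and `~` (with parentheses around each comparison), while `query` takes the same condition as a string.

```python
import pandas as pd

df = pd.DataFrame(
    {"is_eu": [True, False, True], "population": [10600000, 5300000, 500000]},
    index=["Czechia", "Norway", "Malta"],
)

# Boolean mask: element-wise "and" of two conditions
mask = df["is_eu"] & (df["population"] > 1000000)
large_eu = df[mask]

# The same filter written as a query string
large_eu_query = df.query("is_eu and population > 1000000")

# Negation of a condition
not_eu = df[~df["is_eu"]]
print(large_eu, not_eu, sep="\n\n")
```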

## Sorting

In the introductory `pandas` lesson we already showed how to sort rows by the index using the `sort_index` method.

In [None]:
countries["population"].sort_values()

In [None]:
countries["area"].sort_values(ascending=False)

In [None]:
countries.sort_values("alcohol_adults", ascending=False).head(10)

In [None]:
countries[countries["is_eu"]].sort_values(["eu_accession", "population"])

In [None]:
with_density = countries.assign(density=countries["population"] / countries["area"])
with_density.sort_values("density", ascending=False)[["population", "area", "density"]]

In [None]:
countries.sort_index(axis=1)
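
When sorting by several columns, `ascending` may also be a list with one direction per column. The sketch below, on a made-up table, shows this together with `nlargest`, a shortcut for "sort descending and take the first few".

```python
import pandas as pd

df = pd.DataFrame(
    {"region": ["europe", "asia", "europe", "asia"], "value": [3, 1, 2, 4]},
    index=["a", "b", "c", "d"],
)

# Regions alphabetically, and within each region the largest value first
ordered = df.sort_values(["region", "value"], ascending=[True, False])
print(ordered)

# The two rows with the largest value
top2 = df.nlargest(2, "value")
print(top2)
```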

**Exercise:** Which countries have a problem with overweight (the average BMI of men and women is over 25)?

In [None]:
bmi = (countries["bmi_men"] + countries["bmi_women"]) / 2
bmi[bmi > 25].sort_values(ascending=False)

**Exercise:** In which 20 countries in the world do the most people die in car accidents?

In [None]:
car_deaths = countries["population"] * countries["car_deaths_per_100000_people"] / 100000
car_deaths.dropna().astype("int").sort_values(ascending=False).head(20)