# Introduction to Pandas

Below is an overview of the most basic methods offered by the Pandas library. Each of these methods has many more options, which are described in detail in the [official documentation](http://pandas.pydata.org/pandas-docs/stable/). The best place to start reading the documentation is the [tutorials](http://pandas.pydata.org/pandas-docs/stable/tutorials.html).

### Setup

In [1]:
# load the package
import pandas as pd

# load the table we will work with, using the id column as the index
filmi = pd.read_csv('filmi.csv', index_col='id')

# since we will work with large tables, always display at most 20 rows
pd.options.display.max_rows = 20
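
For readers who do not have `filmi.csv` at hand, the loading step can be tried on a small stand-in. The sketch below is only an illustration: the name `sample` is made up, the four rows are copied from the outputs shown in this document, and the `opis` column is left out for brevity.

```python
import io
import pandas as pd

# A few rows copied from the outputs in this document; the real file is much larger
# and also has an opis column, which is omitted here.
csv_text = """id,naslov,leto,reziser,dolzina,ocena
111161,The Shawshank Redemption,1994,Frank Darabont,142,9.3
468569,The Dark Knight,2008,Christopher Nolan,152,9.0
1375666,Inception,2010,Christopher Nolan,148,8.8
137523,Fight Club,1999,David Fincher,139,8.8
"""

# read_csv accepts any file-like object, so a StringIO stands in for the file on disk
sample = pd.read_csv(io.StringIO(csv_text), index_col='id')

print(sample.shape)       # (rows, columns); the index is not counted as a column
print(sample.index.name)  # the id column became the index
print(sample.dtypes)      # column types inferred from the data
```

Because of `index_col='id'`, the `id` values label the rows instead of appearing as an ordinary column.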

### Basic selection of table elements

The `.head(n=5)` method shows the first `n` rows of a table and the `.tail(n=5)` method shows the last `n`. Both default to `n=5`.

In [2]:
filmi.head(10)

id,naslov,leto,reziser,dolzina,ocena,opis
111161,The Shawshank Redemption,1994,Frank Darabont,142,9.3,Two imprisoned men bond over a number of years...
468569,The Dark Knight,2008,Christopher Nolan,152,9.0,When the menace known as the Joker wreaks havo...
1375666,Inception,2010,Christopher Nolan,148,8.8,"A thief, who steals corporate secrets through ..."
137523,Fight Club,1999,David Fincher,139,8.8,"An insomniac office worker, looking for a way ..."
110912,Pulp Fiction,1994,Quentin Tarantino,154,8.9,"The lives of two mob hit men, a boxer, a gangs..."
109830,Forrest Gump,1994,Robert Zemeckis,142,8.8,"Forrest Gump, while not intelligent, has accid..."
120737,The Lord of the Rings: The Fellowship of the Ring,2001,Peter Jackson,178,8.8,A meek Hobbit from the Shire and eight compani...
133093,The Matrix,1999,Lana Wachowski,136,8.7,A computer hacker learns from mysterious rebel...
167260,The Lord of the Rings: The Return of the King,2003,Peter Jackson,201,8.9,Gandalf and Aragorn lead the World of Men agai...
68646,The Godfather,1972,Francis Ford Coppola,175,9.2,The aging patriarch of an organized crime dyna...


In [3]:
filmi.tail()

id,naslov,leto,reziser,dolzina,ocena,opis
190865,Vertical Limit,2000,Martin Campbell,124,5.9,"A climber must rescue his sister on top of K2,..."
297284,Mindhunters,2004,Renny Harlin,106,6.4,Trainees in the FBI's psychological profiling ...
2390361,Enough Said,2013,Nicole Holofcener,93,7.1,A divorced woman who decides to pursue the man...
267626,K-19: The Widowmaker,2002,Kathryn Bigelow,138,6.7,When Russia's first nuclear submarine malfunct...
53779,La Dolce Vita,1960,Federico Fellini,174,8.1,A series of stories following a week in the li...
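
The behaviour of `.head` and `.tail` is easiest to check on a table whose contents are obvious. The sketch below uses a hypothetical ten-row table named `numbers`. It also shows that a negative `n` drops rows from the opposite end.

```python
import pandas as pd

# a hypothetical table with the values 0..9 in a single column
numbers = pd.DataFrame({'n': range(10)})

first_three = numbers.head(3)        # rows 0, 1, 2
last_two = numbers.tail(2)           # rows 8, 9
all_but_last_two = numbers.head(-2)  # a negative n keeps everything except the last 2 rows

print(first_three)
print(last_two)
print(len(all_but_last_two))
```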


Indexing a table with a column label gives that column. To get several columns, we pass a list of labels as the index. Slices, in contrast, select rows.

In [4]:
filmi['naslov']

id
111161                              The Shawshank Redemption
468569                                       The Dark Knight
1375666                                            Inception
137523                                            Fight Club
110912                                          Pulp Fiction
109830                                          Forrest Gump
120737     The Lord of the Rings: The Fellowship of the Ring
133093                                            The Matrix
167260         The Lord of the Rings: The Return of the King
68646                                          The Godfather
                                 ...                        
327162                                    The Stepford Wives
317303                                        Daddy Day Care
61811                               In the Heat of the Night
91419                                 Little Shop of Horrors
1583420                                         Larry Crowne
190865               

In [5]:
filmi[['naslov', 'ocena']]

id,naslov,ocena
111161,The Shawshank Redemption,9.3
468569,The Dark Knight,9.0
1375666,Inception,8.8
137523,Fight Club,8.8
110912,Pulp Fiction,8.9
109830,Forrest Gump,8.8
120737,The Lord of the Rings: The Fellowship of the Ring,8.8
133093,The Matrix,8.7
167260,The Lord of the Rings: The Return of the King,8.9
68646,The Godfather,9.2


In [6]:
filmi[120:125]

id,naslov,leto,reziser,dolzina,ocena,opis
892769,How to Train Your Dragon,2010,Dean DeBlois,98,8.2,A hapless young Viking who aspires to hunt dra...
93058,Full Metal Jacket,1987,Stanley Kubrick,116,8.3,A pragmatic U.S. Marine observes the dehumaniz...
1170358,The Hobbit: The Desolation of Smaug,2013,Peter Jackson,161,7.9,"The dwarves, along with Bilbo Baggins and Gand..."
405159,Million Dollar Baby,2004,Clint Eastwood,132,8.1,A determined woman works with a hardened boxin...
936501,Ugrabljena,2008,Pierre Morel,93,7.8,A retired CIA agent travels across Europe and ...
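
The three kinds of indexing return different things, which the following sketch makes explicit. It assumes a small stand-in table named `sample`, built from rows shown in this document. The variable names are illustrative only.

```python
import pandas as pd

sample = pd.DataFrame(
    {'naslov': ['Inception', 'Fight Club', 'Pulp Fiction'],
     'ocena': [8.8, 8.8, 8.9]},
    index=pd.Index([1375666, 137523, 110912], name='id'),
)

one_column = sample['naslov']          # a single label gives a column (Series)
one_column_table = sample[['naslov']]  # a list of labels gives a table (DataFrame)
first_two_rows = sample[0:2]           # a slice selects rows by position, not by id

print(type(one_column).__name__)
print(type(one_column_table).__name__)
print(first_two_rows)
```

Note that the slice `0:2` counts positions, so it works even though no row has the id `0` or `1`.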


We access the row at position `i` with `.iloc[i]` and the row with key `k` with `.loc[k]`.

In [7]:
filmi.iloc[120]

naslov                              How to Train Your Dragon
leto                                                    2010
reziser                                         Dean DeBlois
dolzina                                                   98
ocena                                                    8.2
opis       A hapless young Viking who aspires to hunt dra...
Name: 892769, dtype: object

In [8]:
filmi.loc[379786]

naslov                                              Serenity
leto                                                    2005
reziser                                          Joss Whedon
dolzina                                                  119
ocena                                                      8
opis       The crew of the ship Serenity try to evade an ...
Name: 379786, dtype: object
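
The difference between position and key matters most when the index consists of integers, as it does here. A minimal sketch, again on a hypothetical stand-in table `sample` with rows taken from this document:

```python
import pandas as pd

sample = pd.DataFrame(
    {'naslov': ['Inception', 'Fight Club', 'Pulp Fiction'],
     'ocena': [8.8, 8.8, 8.9]},
    index=pd.Index([1375666, 137523, 110912], name='id'),
)

first_row = sample.iloc[0]            # row at position 0
by_key = sample.loc[110912]           # row whose id is 110912
rating = sample.loc[110912, 'ocena']  # row key and column label together give one value

# .loc looks up keys only, so a position that is not an id raises KeyError
try:
    sample.loc[0]
    zero_is_a_key = True
except KeyError:
    zero_is_a_key = False

print(first_row['naslov'], by_key['naslov'], rating, zero_is_a_key)
```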

### Filtering and sorting

To select particular rows, we index the table with a column of logical values, which we get from ordinary comparison operations. The returned table keeps the rows for which the column holds `True`.

In [9]:
filmi.ocena >= 8

id
111161      True
468569      True
1375666     True
137523      True
110912      True
109830      True
120737      True
133093      True
167260      True
68646       True
           ...  
327162     False
317303     False
61811       True
91419      False
1583420    False
190865     False
297284     False
2390361    False
267626     False
53779       True
Name: ocena, dtype: bool

In [10]:
filmi[filmi.ocena >= 8]

id,naslov,leto,reziser,dolzina,ocena,opis
111161,The Shawshank Redemption,1994,Frank Darabont,142,9.3,Two imprisoned men bond over a number of years...
468569,The Dark Knight,2008,Christopher Nolan,152,9.0,When the menace known as the Joker wreaks havo...
1375666,Inception,2010,Christopher Nolan,148,8.8,"A thief, who steals corporate secrets through ..."
137523,Fight Club,1999,David Fincher,139,8.8,"An insomniac office worker, looking for a way ..."
110912,Pulp Fiction,1994,Quentin Tarantino,154,8.9,"The lives of two mob hit men, a boxer, a gangs..."
109830,Forrest Gump,1994,Robert Zemeckis,142,8.8,"Forrest Gump, while not intelligent, has accid..."
120737,The Lord of the Rings: The Fellowship of the Ring,2001,Peter Jackson,178,8.8,A meek Hobbit from the Shire and eight compani...
133093,The Matrix,1999,Lana Wachowski,136,8.7,A computer hacker learns from mysterious rebel...
167260,The Lord of the Rings: The Return of the King,2003,Peter Jackson,201,8.9,Gandalf and Aragorn lead the World of Men agai...
68646,The Godfather,1972,Francis Ford Coppola,175,9.2,The aging patriarch of an organized crime dyna...


In [11]:
filmi[(filmi.leto > 2010) & (filmi.ocena >= 8.5)]

id,naslov,leto,reziser,dolzina,ocena,opis
1345836,The Dark Knight Rises,2012,Christopher Nolan,164,8.5,Eight years after the Joker's reign of anarchy...
1853728,Django Unchained,2012,Quentin Tarantino,165,8.5,"With the help of a German bounty hunter, a fre..."
816692,Interstellar,2014,Christopher Nolan,169,8.6,A team of explorers travel through a wormhole ...
1675434,The Intouchables,2011,Olivier Nakache,112,8.6,After he becomes a quadriplegic from a paragli...
2582802,Whiplash,2014,Damien Chazelle,107,8.5,A promising young drummer enrolls at a cut-thr...
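
Logical columns can be stored in variables and combined with `&` (and), `|` (or) and `~` (not). The sketch below assumes a small stand-in table `sample` with four rows taken from this document. The names `recent` and `top` are illustrative.

```python
import pandas as pd

sample = pd.DataFrame(
    {'naslov': ['Inception', 'Interstellar', 'Whiplash', 'Fight Club'],
     'leto': [2010, 2014, 2014, 1999],
     'ocena': [8.8, 8.6, 8.5, 8.8]},
)

recent = sample.leto > 2010   # False, True, True, False
top = sample.ocena >= 8.6     # True, True, False, True

recent_and_top = sample[recent & top]  # both conditions hold
recent_or_top = sample[recent | top]   # at least one condition holds
not_recent = sample[~recent]           # the condition does not hold

print(recent_and_top)
print(not_recent)
```

When the comparisons are written inline, each one needs its own parentheses, as in `(filmi.leto > 2010) & (filmi.ocena >= 8.5)`, because `&` binds more tightly than the comparison operators.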


We sort a table with the `.sort_values` method, passing the name or a list of names of the columns to sort by. Optionally, the `ascending` argument states which columns are sorted in ascending and which in descending order.

In [12]:
filmi.sort_values('leto')

id,naslov,leto,reziser,dolzina,ocena,opis
12349,The Kid,1921,Charles Chaplin,68,8.3,"The Tramp cares for an abandoned child, but ev..."
13442,Nosferatu,1922,F.W. Murnau,81,8.0,Vampire Count Orlok expresses interest in a ne...
15864,The Gold Rush,1925,Charles Chaplin,95,8.2,A prospector goes to the Klondike in search of...
17925,The General,1926,Clyde Bruckman,67,8.2,When Union spies steal an engineer's beloved l...
17136,Metropolis,1927,Fritz Lang,153,8.3,In a futuristic city sharply divided between t...
22100,M,1931,Fritz Lang,99,8.4,When the police in a German city are unable to...
21749,City Lights,1931,Charles Chaplin,87,8.6,"With the aid of a wealthy erratic tippler, a d..."
24216,King Kong,1933,Merian C. Cooper,100,8.0,A film crew goes to a tropical island for an e...
25316,It Happened One Night,1934,Frank Capra,105,8.2,"A spoiled heiress, running away from her famil..."
27977,Modern Times,1936,Charles Chaplin,87,8.6,The Tramp struggles to live in modern industri...


In [13]:
# sort by rating in descending order first, then by year in ascending order within each rating
filmi.sort_values(['ocena', 'leto'], ascending=[False, True])

id,naslov,leto,reziser,dolzina,ocena,opis
111161,The Shawshank Redemption,1994,Frank Darabont,142,9.3,Two imprisoned men bond over a number of years...
68646,The Godfather,1972,Francis Ford Coppola,175,9.2,The aging patriarch of an organized crime dyna...
71562,The Godfather: Part II,1974,Francis Ford Coppola,202,9.0,The early life and career of Vito Corleone in ...
468569,The Dark Knight,2008,Christopher Nolan,152,9.0,When the menace known as the Joker wreaks havo...
50083,12 Angry Men,1957,Sidney Lumet,96,8.9,A jury holdout attempts to prevent a miscarria...
60196,"The Good, the Bad and the Ugly",1966,Sergio Leone,161,8.9,A bounty hunting scam joins two men in an unea...
108052,Schindler's List,1993,Steven Spielberg,195,8.9,"In German-occupied Poland during World War II,..."
110912,Pulp Fiction,1994,Quentin Tarantino,154,8.9,"The lives of two mob hit men, a boxer, a gangs..."
167260,The Lord of the Rings: The Return of the King,2003,Peter Jackson,201,8.9,Gandalf and Aragorn lead the World of Men agai...
80684,Star Wars: Episode V - The Empire Strikes Back,1980,Irvin Kershner,124,8.8,After the rebels have been brutally overpowere...
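
The two-column sort can be traced by hand on four rows. The sketch below assumes a stand-in table `sample` with rows taken from this document. It also shows that `.sort_values` returns a new table and leaves the original order untouched.

```python
import pandas as pd

sample = pd.DataFrame(
    {'naslov': ['Pulp Fiction', '12 Angry Men', 'The Dark Knight', 'The Godfather: Part II'],
     'leto': [1994, 1957, 2008, 1974],
     'ocena': [8.9, 8.9, 9.0, 9.0]},
)

# rating descending; ties on rating are broken by year ascending
ordered = sample.sort_values(['ocena', 'leto'], ascending=[False, True])

print(ordered)
print(sample)  # the original table keeps its order
```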


### Groups

The `.groupby` method creates a special kind of table in which rows are grouped by a shared property.

In [14]:
filmi_po_letih = filmi.groupby('leto')

In [15]:
# mean rating for each year
filmi_po_letih['ocena'].mean()

leto
1921    8.300000
1922    8.000000
1925    8.200000
1926    8.200000
1927    8.300000
1931    8.500000
1933    8.000000
1934    8.200000
1936    8.600000
1937    7.700000
          ...   
2007    6.844628
2008    6.635897
2009    6.727200
2010    6.683471
2011    6.676515
2012    6.732743
2013    6.774815
2014    6.819048
2015    6.885185
2016    6.687097
Name: ocena, dtype: float64
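
To see what the grouped mean computes, it helps to compare it with a mean taken by filtering a single year by hand. A minimal sketch on a hypothetical five-row table `sample`, with years and ratings taken from films shown in this document:

```python
import pandas as pd

sample = pd.DataFrame(
    {'leto': [1994, 1994, 1999, 1999, 2008],
     'ocena': [9.3, 8.9, 8.8, 8.7, 9.0]},
)

# one mean per distinct year; the years become the index of the result
by_year = sample.groupby('leto')['ocena'].mean()

# the same number for a single year, computed by filtering first
manual_1994 = sample[sample.leto == 1994]['ocena'].mean()

print(by_year)
print(manual_1994)
```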

In [16]:
# if we want, we can also group by computed properties
filmi['desetletje'] = 10 * (filmi.leto // 10)
filmi

id,naslov,leto,reziser,dolzina,ocena,opis,desetletje
111161,The Shawshank Redemption,1994,Frank Darabont,142,9.3,Two imprisoned men bond over a number of years...,1990
468569,The Dark Knight,2008,Christopher Nolan,152,9.0,When the menace known as the Joker wreaks havo...,2000
1375666,Inception,2010,Christopher Nolan,148,8.8,"A thief, who steals corporate secrets through ...",2010
137523,Fight Club,1999,David Fincher,139,8.8,"An insomniac office worker, looking for a way ...",1990
110912,Pulp Fiction,1994,Quentin Tarantino,154,8.9,"The lives of two mob hit men, a boxer, a gangs...",1990
109830,Forrest Gump,1994,Robert Zemeckis,142,8.8,"Forrest Gump, while not intelligent, has accid...",1990
120737,The Lord of the Rings: The Fellowship of the Ring,2001,Peter Jackson,178,8.8,A meek Hobbit from the Shire and eight compani...,2000
133093,The Matrix,1999,Lana Wachowski,136,8.7,A computer hacker learns from mysterious rebel...,1990
167260,The Lord of the Rings: The Return of the King,2003,Peter Jackson,201,8.9,Gandalf and Aragorn lead the World of Men agai...,2000
68646,The Godfather,1972,Francis Ford Coppola,175,9.2,The aging patriarch of an organized crime dyna...,1970
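
The decade is computed with integer division: `leto // 10` drops the last digit of the year, and multiplying by 10 puts a zero back in its place. A short sketch on a hypothetical column `years`:

```python
import pandas as pd

years = pd.Series([1921, 1999, 2000, 2016])

# 1921 // 10 == 192, and 10 * 192 == 1920
decades = 10 * (years // 10)

print(list(decades))  # [1920, 1990, 2000, 2010]
```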


In [17]:
filmi_po_desetletjih = filmi.groupby('desetletje')

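We count how many films there are in each decade. Most columns show the same number, because `.count()` counts the entries present in each column and most columns have no gaps. Where a value is missing, the number is smaller: the `opis` column has one entry fewer than the others in the 1990s and in the 2010s.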

In [18]:
filmi_po_desetletjih.count()

desetletje,naslov,leto,reziser,dolzina,ocena,opis
1920,5,5,5,5,5,5
1930,9,9,9,9,9,9
1940,20,20,20,20,20,20
1950,35,35,35,35,35,35
1960,52,52,52,52,52,52
1970,68,68,68,68,68,68
1980,183,183,183,183,183,183
1990,435,435,435,435,435,434
2000,975,975,975,975,975,975
2010,718,718,718,718,718,717


If we only want the number of members in each group, we use the `.size()` method. In this case we get a single column rather than a table.

In [19]:
filmi_po_desetletjih.size()

desetletje
1920      5
1930      9
1940     20
1950     35
1960     52
1970     68
1980    183
1990    435
2000    975
2010    718
dtype: int64
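
The difference between `.count()` and `.size()` shows up only when values are missing. The sketch below uses a made-up three-row table `sample` with placeholder titles and descriptions, one of which is missing.

```python
import pandas as pd

# placeholder data; None marks a missing description
sample = pd.DataFrame(
    {'desetletje': [1990, 1990, 2010],
     'naslov': ['a', 'b', 'c'],
     'opis': ['x', None, 'y']},
)

grouped = sample.groupby('desetletje')

counts = grouped.count()  # a table: entries present per column, missing values skipped
sizes = grouped.size()    # a column: rows per group, missing values included

print(counts)
print(sizes)
```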

We look at the means for each decade. We get the mean rating, the mean length and the mean year, that is, the mean release year of the films in that decade. The mean year shows where the films of a decade cluster: 1943.3 for the 1940s, for instance, lies below the midpoint 1944.5, so the films in the table from that decade lean toward its first half. There is no mean title, because it cannot be computed, so the result has no such column.

In [20]:
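# numeric_only=True restricts the mean to numeric columns; pandas 2.0 and later raise an error on text columns without it
filmi_po_desetletjih.mean(numeric_only=True)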

desetletje,leto,dolzina,ocena
1920,1924.2,92.8,8.2
1930,1935.444444,114.444444,8.222222
1940,1943.3,106.1,8.1
1950,1954.6,109.028571,8.091429
1960,1964.365385,128.769231,7.921154
1970,1975.323529,120.382353,7.75
1980,1985.404372,111.136612,7.340984
1990,1995.503448,115.349425,6.991724
2000,2005.066667,111.705641,6.784821
2010,2012.470752,112.168524,6.749861
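
Text columns have no mean, and recent pandas versions raise an error rather than silently dropping them. Two ways of limiting the computation to numeric columns are sketched below on a hypothetical stand-in table `sample`, with three rows taken from this document.

```python
import pandas as pd

sample = pd.DataFrame(
    {'naslov': ['The Shawshank Redemption', 'Fight Club', 'The Dark Knight'],
     'desetletje': [1990, 1990, 2000],
     'dolzina': [142, 139, 152],
     'ocena': [9.3, 8.8, 9.0]},
)

grouped = sample.groupby('desetletje')

# option 1: name the columns to average
selected_means = grouped[['dolzina', 'ocena']].mean()

# option 2: let pandas skip every non-numeric column
numeric_means = grouped.mean(numeric_only=True)

print(selected_means)
print(numeric_means)
```

Naming the columns is more explicit, while `numeric_only=True` keeps working if numeric columns are added later.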
