# Data analysis with pandas

[Pandas quick-start guide](http://pandas.pydata.org/pandas-docs/stable/10min.html)  
[Pandas documentation](http://pandas.pydata.org/pandas-docs/stable/)  
[Lecture notes on pandas](../predavanja/Analiza podatkov s knjižnico Pandas.ipynb)


### Loading pandas and the data

```python
# load the package
import pandas as pd
import os.path

# we will work with large tables, so display at most 10 rows
pd.options.display.max_rows = 10

# use the interactive "notebook" plotting style
%matplotlib notebook

# load the table we will work with
pot_do_filmov = os.path.join("../../", "02-zajem-podatkov", "predavanja", "obdelani-podatki", "filmi.csv")

filmi = pd.read_csv(pot_do_filmov)
```

Let's look at the data.

```python
filmi
```

|  | id | naslov | dolzina | leto | ocena | metascore | glasovi | zasluzek | oznaka | opis |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4972 | The Birth of a Nation | 195 | 1915 | 6.3 |  | 22363 | 10000000.0 |  | The Stoneman family finds its friendship with ... |
| 1 | 6864 | Intolerance: Love's Struggle Throughout the Ages | 163 | 1916 | 7.7 | 99.0 | 13970 | 2180000.0 |  | The story of a poor young woman, separated by ... |
| 2 | 9968 | Broken Blossoms or The Yellow Man and the Girl | 90 | 1919 | 7.3 |  | 9296 |  |  | A frail waif, abused by her brutal boxer fathe... |
| 3 | 10323 | Das Cabinet des Dr. Caligari | 76 | 1920 | 8.1 |  | 56089 |  |  | Hypnotist Dr. Caligari uses a somnambulist, Ce... |
| 4 | 12349 | The Kid | 68 | 1921 | 8.3 |  | 110278 | 5450000.0 |  | The Tramp cares for an abandoned child, but ev... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9995 | 11390036 | A Fall from Grace | 115 | 2020 | 5.8 | 34.0 | 10414 |  |  | Disheartened since her ex-husband's affair, Gr... |
| 9996 | 11905962 | Sputnik | 113 | 2020 | 6.3 | 61.0 | 8285 |  |  | The lone survivor of an enigmatic spaceship in... |
| 9997 | 12393526 | Bulbbul | 94 | 2020 | 6.6 |  | 8381 |  |  | A man returns home after years to find his bro... |
| 9998 | 12567088 | Raat Akeli Hai | 149 | 2020 | 7.3 |  | 12232 |  |  | The film follows a small town cop who is summo... |


## Exploring the data

Sort the data by rating.

```python
filmi.sort_values("ocena", ascending=True)
```

|  | id | naslov | dolzina | leto | ocena | metascore | glasovi | zasluzek | oznaka | opis |
|---|---|---|---|---|---|---|---|---|---|---|
| 9739 | 7221896 | Cumali Ceber: Allah Seni Alsin | 100 | 2017 | 1.0 |  | 37659 |  |  | Cumali Ceber goes to a vacation with his child... |
| 9825 | 7886848 | Sadak 2 | 133 | 2020 | 1.1 |  | 57957 |  |  | The film picks up where Sadak left off, revolv... |
| 8983 | 4009460 | Saving Christmas | 79 | 2014 | 1.4 | 18.0 | 14855 | 2783970.0 | PG | His annual Christmas party faltering thanks to... |
| 9505 | 5988370 | Reis | 108 | 2017 | 1.4 |  | 72207 |  |  | A drama about the early life of Recep Tayyip E... |
| 9513 | 6038600 | Smolensk | 120 | 2016 | 1.4 |  | 7630 |  |  | Inspired by true events of 2010 Polish Air For... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4059 | 252488 | Hababam Sinifi Sinifta Kaldi | 91 | 1976 | 9.0 |  | 21288 |  |  | A young and beautiful female teacher starts wo... |
| 9355 | 5354160 | Aynabaji | 147 | 2016 | 9.1 |  | 21429 |  |  | Ayna is an actor and the prison is his stage. ... |
| 908 | 68646 | Boter | 175 | 1972 | 9.2 | 100.0 | 1582906 | 134966411.0 |  | The aging patriarch of an organized crime dyna... |
| 4058 | 252487 | Hababam Sinifi | 87 | 1975 | 9.3 |  | 36468 |  |  | Lazy, uneducated students share a very close b... |


Extract the column of ratings.

```python
ocene = filmi["ocena"]
ocene
```

```
0       6.3
1       7.7
2       7.3
3       8.1
4       8.3
       ... 
9995    5.8
9996    6.3
9997    6.6
9998    7.3
9999    6.6
Name: ocena, Length: 10000, dtype: float64
```

The expressions `filmi['ocena']` and `filmi[['ocena']]` are different:

```python
print(type(filmi['ocena']))
print(type(filmi[['ocena']]))
```

```
<class 'pandas.core.series.Series'>
<class 'pandas.core.frame.DataFrame'>
```


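The columns of a `DataFrame` are of type `Series`. Single brackets return a `Series`. Double brackets, which pass a list of column names, return a `DataFrame` subtable. Many operations, such as grouping, plotting and filtering, are available on both types. Combining tables with `.merge()` or `.join()`, however, is a `DataFrame` method.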

You typically work with a `Series` when, for example, you compute the values for a new column.
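Here is a minimal sketch of the difference. It assumes a tiny stand-in table built from three rows shown above. Apart from `filmi`, the variable names are illustrative.

```python
import pandas as pd

# Tiny stand-in for `filmi`, built from three rows shown above
filmi = pd.DataFrame({
    "naslov": ["The Birth of a Nation", "The Kid", "Sputnik"],
    "leto": [1915, 1921, 2020],
    "ocena": [6.3, 8.3, 6.3],
})

column = filmi["ocena"]                 # one name -> Series (1-D)
subtable = filmi[["ocena"]]             # list with one name -> DataFrame (2-D)
two_columns = filmi[["naslov", "leto"]] # list of several names -> DataFrame

print(column.shape)       # (3,)
print(subtable.shape)     # (3, 1)
print(two_columns.shape)  # (3, 2)

# A Series can be turned into a one-column DataFrame, and a column taken back out
as_table = column.to_frame()
back = subtable["ocena"]
```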

Round the rating column with the built-in `round()` function.

```python
zaokrozene = round(ocene)
zaokrozene
```

```
0       6.0
1       8.0
2       7.0
3       8.0
4       8.0
       ... 
9995    6.0
9996    6.0
9997    7.0
9998    7.0
9999    7.0
Name: ocena, Length: 10000, dtype: float64
```

We see that the rounded ratings correspond to ....
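One detail matters when you interpret the rounded values. `round()` on a `Series` delegates to `Series.round()`, which uses NumPy rounding, so exact halves go to the nearest even number. The sketch below shows this with hypothetical ratings. `np.floor(x + 0.5)` is only one possible workaround for rounding halves up, and it is valid for non-negative values only.

```python
import numpy as np
import pandas as pd

# Hypothetical ratings, including exact halves
ocene = pd.Series([6.3, 6.5, 7.5, 8.5], name="ocena")

# Halves go to the nearest even number
print(round(ocene).tolist())    # [6.0, 6.0, 8.0, 8.0]
print(ocene.round().tolist())   # same result via the method

# One way to round halves up, for non-negative values only
print(np.floor(ocene + 0.5).tolist())  # [6.0, 7.0, 8.0, 9.0]
```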

Add the rounded values to the data.

```python
filmi["zaokrozeno"] = zaokrozene
filmi
```

|  | id | naslov | dolzina | leto | ocena | metascore | glasovi | zasluzek | oznaka | opis | zaokrozeno |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4972 | The Birth of a Nation | 195 | 1915 | 6.3 |  | 22363 | 10000000.0 |  | The Stoneman family finds its friendship with ... | 6.0 |
| 1 | 6864 | Intolerance: Love's Struggle Throughout the Ages | 163 | 1916 | 7.7 | 99.0 | 13970 | 2180000.0 |  | The story of a poor young woman, separated by ... | 8.0 |
| 2 | 9968 | Broken Blossoms or The Yellow Man and the Girl | 90 | 1919 | 7.3 |  | 9296 |  |  | A frail waif, abused by her brutal boxer fathe... | 7.0 |
| 3 | 10323 | Das Cabinet des Dr. Caligari | 76 | 1920 | 8.1 |  | 56089 |  |  | Hypnotist Dr. Caligari uses a somnambulist, Ce... | 8.0 |
| 4 | 12349 | The Kid | 68 | 1921 | 8.3 |  | 110278 | 5450000.0 |  | The Tramp cares for an abandoned child, but ev... | 8.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9995 | 11390036 | A Fall from Grace | 115 | 2020 | 5.8 | 34.0 | 10414 |  |  | Disheartened since her ex-husband's affair, Gr... | 6.0 |
| 9996 | 11905962 | Sputnik | 113 | 2020 | 6.3 | 61.0 | 8285 |  |  | The lone survivor of an enigmatic spaceship in... | 6.0 |
| 9997 | 12393526 | Bulbbul | 94 | 2020 | 6.6 |  | 8381 |  |  | A man returns home after years to find his bro... | 7.0 |
| 9998 | 12567088 | Raat Akeli Hai | 149 | 2020 | 7.3 |  | 12232 |  |  | The film follows a small town cop who is summo... | 7.0 |


Remove the newly added column with the `.drop()` method, passing the `columns=` argument.

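```python
# remove the "zaokrozeno" column, then add the rounded ratings under a new name
filmi = filmi.drop(columns="zaokrozeno")
filmi["zaokrozena ocena"] = zaokrozene
filmi
```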

|  | id | naslov | dolzina | leto | ocena | metascore | glasovi | zasluzek | oznaka | opis | zaokrozena ocena |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4972 | The Birth of a Nation | 195 | 1915 | 6.3 |  | 22363 | 10000000.0 |  | The Stoneman family finds its friendship with ... | 6.0 |
| 1 | 6864 | Intolerance: Love's Struggle Throughout the Ages | 163 | 1916 | 7.7 | 99.0 | 13970 | 2180000.0 |  | The story of a poor young woman, separated by ... | 8.0 |
| 2 | 9968 | Broken Blossoms or The Yellow Man and the Girl | 90 | 1919 | 7.3 |  | 9296 |  |  | A frail waif, abused by her brutal boxer fathe... | 7.0 |
| 3 | 10323 | Das Cabinet des Dr. Caligari | 76 | 1920 | 8.1 |  | 56089 |  |  | Hypnotist Dr. Caligari uses a somnambulist, Ce... | 8.0 |
| 4 | 12349 | The Kid | 68 | 1921 | 8.3 |  | 110278 | 5450000.0 |  | The Tramp cares for an abandoned child, but ev... | 8.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9995 | 11390036 | A Fall from Grace | 115 | 2020 | 5.8 | 34.0 | 10414 |  |  | Disheartened since her ex-husband's affair, Gr... | 6.0 |
| 9996 | 11905962 | Sputnik | 113 | 2020 | 6.3 | 61.0 | 8285 |  |  | The lone survivor of an enigmatic spaceship in... | 6.0 |
| 9997 | 12393526 | Bulbbul | 94 | 2020 | 6.6 |  | 8381 |  |  | A man returns home after years to find his bro... | 7.0 |
| 9998 | 12567088 | Raat Akeli Hai | 149 | 2020 | 7.3 |  | 12232 |  |  | The film follows a small town cop who is summo... | 7.0 |
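`.drop()` returns a new table and does not change the existing one, so you have to assign the result back. Here is a small sketch on a hypothetical two-row table:

```python
import pandas as pd

# Hypothetical two-row table with an extra rounded column
filmi = pd.DataFrame({
    "naslov": ["The Kid", "Sputnik"],
    "ocena": [8.3, 6.3],
})
filmi["zaokrozeno"] = round(filmi["ocena"])

# drop() returns a new table; the original still has the column
without = filmi.drop(columns="zaokrozeno")
print(list(filmi.columns))    # ['naslov', 'ocena', 'zaokrozeno']
print(list(without.columns))  # ['naslov', 'ocena']

# Assign the result back to keep it; a list drops several columns at once
filmi = filmi.drop(columns=["zaokrozeno"])
```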


```python
# rows 700 to 749: the last 50 of the first 750 rows
filmi.head(750).tail(50)
```

|  | id | naslov | dolzina | leto | ocena | metascore | glasovi | zasluzek | oznaka | opis | zaokrozena ocena |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 700 | 60666 | Manos: The Hands of Fate | 70 | 1966 | 1.9 |  | 34568 |  |  | A family gets lost on the road and stumbles up... | 2.0 |
| 701 | 60675 | Masculin féminin | 110 | 1966 | 7.6 | 93.0 | 12341 | 200087.0 |  | A romance between young Parisians, shown throu... | 8.0 |
| 702 | 60782 | One Million Years B.C. | 100 | 1966 | 5.7 | 58.0 | 7634 |  |  | Prehistoric man Tumak is banished from his sav... | 6.0 |
| 703 | 60802 | Ostre sledované vlaky | 92 | 1966 | 7.7 |  | 11207 | 3270000.0 |  | An apprentice train dispatcher at a village st... | 8.0 |
| 704 | 60827 | Persona | 85 | 1966 | 8.1 | 86.0 | 100017 |  |  | A nurse is put in charge of a mute actress and... | 8.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 745 | 62480 | Week End | 105 | 1967 | 7.2 |  | 12918 |  |  | A surreal tale of a married couple going on a ... | 7.0 |
| 746 | 62512 | Le dvakrat živiš | 117 | 1967 | 6.9 | 61.0 | 96053 | 43084787.0 | PG | Secret Agent James Bond and the Japanese Secre... | 7.0 |
| 747 | 62622 | 2001: A Space Odyssey | 149 | 1968 | 8.3 | 84.0 | 591757 | 56954992.0 |  | After discovering a mysterious artifact buried... | 8.0 |
| 748 | 62687 | Astérix et Cléopâtre | 72 | 1968 | 7.2 |  | 11630 |  |  | Provoked, Cleopatra bets Caesar, that she can ... | 7.0 |
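`head(750).tail(50)` picks rows by position. With the default integer index, you can select the same rows directly with `.iloc`, where the end of the range is excluded. Here is a sketch on a hypothetical 1000-row table:

```python
import pandas as pd

# Hypothetical table with 1000 rows and the default 0..999 index
table = pd.DataFrame({"x": range(1000)})

a = table.head(750).tail(50)
b = table.iloc[700:750]   # positions 700 up to, but not including, 750

print(a.equals(b))              # True
print(a.index[0], a.index[-1])  # 700 749
```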


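### Note: slices
Selecting a subtable creates a so-called "slice". A slice is not guaranteed to be an independent copy of the table. Depending on the operation, pandas may return a view of the original table, so changing the slice directly is unreliable. Pandas versions without Copy-on-Write warn about this with `SettingWithCopyWarning`.
If you want a table you can modify, call `.copy()` on the slice and work with the copy.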
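Here is a minimal sketch of both options. It assumes a small stand-in table with rows taken from the output above. Use `.copy()` for an independent subtable, and `.loc` with a boolean mask when you want to change the original table itself.

```python
import pandas as pd

# Small stand-in for `filmi`
filmi = pd.DataFrame({
    "naslov": ["The Kid", "Sputnik", "Boter"],
    "leto": [1921, 2020, 1972],
    "ocena": [8.3, 6.3, 9.2],
})

# .copy() gives an independent table; changing it does not touch `filmi`
old_films = filmi[filmi["leto"] < 1980].copy()
old_films["ocena"] = 0.0
print(filmi["ocena"].tolist())  # [8.3, 6.3, 9.2]

# To change the original table, select rows and column with .loc on `filmi` itself
filmi.loc[filmi["leto"] < 1980, "ocena"] = 10.0
print(filmi["ocena"].tolist())  # [10.0, 6.3, 10.0]
```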


Select a subtable with the columns `naslov`, `leto` and `glasovi`, then add a column with the rounded ratings to it.
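Here is one possible solution, sketched on a stand-in table with three rows from the output above. Calling `.copy()` keeps the new column out of `filmi`. Because the subtable keeps the original index labels, pandas aligns each rounded rating with the right row.

```python
import pandas as pd

# Stand-in for `filmi` with three rows from the output above
filmi = pd.DataFrame({
    "naslov": ["The Birth of a Nation", "The Kid", "Boter"],
    "leto": [1915, 1921, 1972],
    "ocena": [6.3, 8.3, 9.2],
    "glasovi": [22363, 110278, 1582906],
})

# .copy(), so that adding a column changes only the subtable
subtable = filmi[["naslov", "leto", "glasovi"]].copy()
# index labels match, so each rounded rating lands in the right row
subtable["zaokrozena ocena"] = round(filmi["ocena"])
print(subtable)
```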

### Filtering

Create a filter that selects films released before 1930, and another filter for films released after 2017.
Combine them to select the films released before 1930 or after 2017.
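The sketch below assumes that "after 2017" means strictly later than 2017. The stand-in rows come from the outputs above. Conditions are combined with `|` (or), `&` (and) and `~` (not). When you write them inline, each comparison needs its own parentheses, because these operators bind more tightly than `<` and `>`.

```python
import pandas as pd

# Stand-in rows from the outputs above
filmi = pd.DataFrame({
    "naslov": ["The Birth of a Nation", "Boter", "Sputnik", "Reis"],
    "leto": [1915, 1972, 2020, 2017],
})

before_1930 = filmi["leto"] < 1930   # boolean Series, one value per row
after_2017 = filmi["leto"] > 2017

selection = filmi[before_1930 | after_2017]
print(selection["naslov"].tolist())  # ['The Birth of a Nation', 'Sputnik']

# The same filter written inline: note the parentheses around each comparison
same = filmi[(filmi["leto"] < 1930) | (filmi["leto"] > 2017)]
```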

Define a function that checks whether a string contains at most two words. Then use `.apply()` to select all films whose titles have at most two words and whose rating is above 8.
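Here is a minimal sketch. It assumes that words are separated by whitespace, so `2001: A Space Odyssey` counts as four words. The function name `at_most_two_words` is illustrative.

```python
import pandas as pd

# Stand-in rows from the outputs above
filmi = pd.DataFrame({
    "naslov": ["The Kid", "Boter", "2001: A Space Odyssey", "Sputnik", "Persona"],
    "ocena": [8.3, 9.2, 8.3, 6.3, 8.1],
})

def at_most_two_words(text):
    """Return True if the string has at most two whitespace-separated words."""
    return len(text.split()) <= 2

# apply() calls the function on every title and returns a boolean Series
short_title = filmi["naslov"].apply(at_most_two_words)
selection = filmi[short_title & (filmi["ocena"] > 8)]
print(selection["naslov"].tolist())  # ['The Kid', 'Boter', 'Persona']
```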

### Histograms

Group the films by rating and count them.
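This sketch groups by the rounded ratings, which is an assumption. Grouping by `ocena` works the same way but yields many more groups. `.size()` counts the rows in each group, and `value_counts().sort_index()` on the column gives the same counts.

```python
import pandas as pd

# Stand-in for `filmi` with ratings taken from the outputs above
filmi = pd.DataFrame({
    "naslov": ["The Birth of a Nation", "The Kid", "Boter", "Sputnik", "Persona"],
    "ocena": [6.3, 8.3, 9.2, 6.3, 8.1],
})
filmi["zaokrozena ocena"] = round(filmi["ocena"])

# number of films for each rounded rating
counts = filmi.groupby("zaokrozena ocena").size()
print(counts)

# the same counts, computed from the column alone
same_counts = filmi["zaokrozena ocena"].value_counts().sort_index()
```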

Make a bar chart of this data.
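Given counts like those from the previous step, you can draw a `Series` as a bar chart with `.plot.bar()`, which requires Matplotlib. The stand-in ratings below are taken from the outputs above, and the axis labels are illustrative.

```python
import pandas as pd

# Stand-in ratings from the outputs above
filmi = pd.DataFrame({"ocena": [6.3, 8.3, 9.2, 6.3, 8.1]})
counts = round(filmi["ocena"]).value_counts().sort_index()

ax = counts.plot.bar()   # one bar per rounded rating
ax.set_xlabel("rounded rating")
ax.set_ylabel("number of films")
```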

DataFrames have a `.hist()` method that draws histograms of their columns. Use it to display the simplified data.
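Here is a minimal sketch on a handful of ratings from the outputs above. `column` selects which column to draw and `bins` sets the number of bins. Extra keyword arguments such as `range` are passed on to Matplotlib's `hist`. Passing `column="zaokrozena ocena"` would draw the rounded values instead.

```python
import pandas as pd

# Stand-in ratings from the outputs above
filmi = pd.DataFrame({"ocena": [6.3, 7.7, 7.3, 8.1, 8.3, 1.1, 9.2]})

# one histogram per selected column; returns an array of Matplotlib axes
axes = filmi.hist(column="ocena", bins=10, range=(0, 10))
```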

### Plotting the average film length by year
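One way to do this is to group by `leto`, average the `dolzina` column and plot the resulting `Series`. The stand-in rows below come from the outputs above. The axis labels assume lengths are in minutes.

```python
import pandas as pd

# Stand-in rows from the outputs above
filmi = pd.DataFrame({
    "leto": [1966, 1966, 1968, 2020, 2020],
    "dolzina": [70, 110, 149, 113, 94],
})

# one value per year: the mean length of that year's films
avg_length = filmi.groupby("leto")["dolzina"].mean()
print(avg_length)

ax = avg_length.plot()
ax.set_xlabel("year")
ax.set_ylabel("average length (min)")
```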

### Plotting the total earnings for each year
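This sketch follows the same approach, summing `zasluzek` per year. The stand-in rows come from the outputs above and include one film with unknown earnings. `sum()` skips `NaN`, so a year with no known earnings would come out as 0 unless you pass `min_count=1`.

```python
import numpy as np
import pandas as pd

# Stand-in rows from the outputs above (Persona has no known earnings)
filmi = pd.DataFrame({
    "leto": [1966, 1966, 1966, 1967, 1968],
    "zasluzek": [200087.0, 3270000.0, np.nan, 43084787.0, 56954992.0],
})

# total earnings per year; missing values are skipped
total = filmi.groupby("leto")["zasluzek"].sum()
print(total)

ax = total.plot()
ax.set_xlabel("year")
ax.set_ylabel("total earnings")
```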