### Data operations

In [1]:
import pandas as pd

# Load the movie titles; index_col=None (the default) keeps the automatic integer index
titles = pd.read_csv('titles.csv', index_col=None)
titles.head(3)

Unnamed: 0,title,year
0,The Rising Son,1990
1,The Thousand Plane Raid,1969
2,Crucea de piatra,1993


In [2]:
# Load the cast list the same way
casts = pd.read_csv('cast.csv', index_col=None)
casts.head(3)

Unnamed: 0,title,year,name,type,character,n
0,Closet Monster,2015,Buffy #1,actor,Buffy 4,31.0
1,Suuri illusioni,1985,Homo $,actor,Guests,22.0
2,Battle of the Sexes,2017,$hutter,actor,Bobby Riggs Fan,10.0


##### Row and column selection

In [3]:
# Selecting a single column by name returns a Series
t = titles['title']
t.head(10)

0                    The Rising Son
1           The Thousand Plane Raid
2                  Crucea de piatra
3                           Country
4                        Gaiking II
5                       Medusa (IV)
6    The Fresh Air Will Do You Good
7                Alex in Wonderland
8                        L'outsider
9            Do Outro Lado do Mundo
Name: title, dtype: object

In [4]:
# Selecting a row by its index label returns that row as a Series
titles.loc[0]

title    The Rising Son
year               1990
Name: 0, dtype: object
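
The cells above select by column name and by index label. As a minimal sketch of how label-based ‘loc’ differs from position-based ‘iloc’, the example below uses a small hand-built frame (`sample` is a hypothetical stand-in that copies a few rows shown above, not the full titles.csv):

```python
import pandas as pd

# Hypothetical sample mirroring the first rows of titles.csv shown above
sample = pd.DataFrame({
    'title': ['The Rising Son', 'The Thousand Plane Raid',
              'Crucea de piatra', 'Country'],
    'year': [1990, 1969, 1993, 2000],
})

row_by_label = sample.loc[0]           # row whose index label is 0
subset = sample.loc[1:2, ['title']]    # label slices include both endpoints
cell = sample.loc[2, 'year']           # single value: row label 2, column 'year'

# After filtering, index labels are kept, so labels and positions diverge
late = sample[sample['year'] > 1985]   # remaining index labels: 0, 2, 3
second_by_position = late.iloc[1]      # second remaining row (label 2)
same_row_by_label = late.loc[2]        # the same row, addressed by label
```

Here `late.iloc[1]` and `late.loc[2]` refer to the same row, because filtering dropped label 1 without renumbering the index.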

##### Filtering Data

In [5]:
titles.head()

Unnamed: 0,title,year
0,The Rising Son,1990
1,The Thousand Plane Raid,1969
2,Crucea de piatra,1993
3,Country,2000
4,Gaiking II,2011


In [6]:
after85 = titles[titles['year'] > 1985]   # movies after 1985 
after85.head()

Unnamed: 0,title,year
0,The Rising Son,1990
2,Crucea de piatra,1993
3,Country,2000
4,Gaiking II,2011
5,Medusa (IV),2015


##### Display movies in years 1990 - 1992

In [7]:
t = titles
movies_90 = t[(t['year'] >= 1990) & (t['year'] <= 1992)]
movies_90.head(10)

Unnamed: 0,title,year
0,The Rising Son,1990
27,Arrive Alive,1990
32,Der Brocken,1992
36,The Neverending Story II: The Next Chapter,1990
47,Torn Apart,1990
92,Thieves of Fortune,1990
100,Luchadores de las estrellas,1992
140,Nyaya Anyaya,1990
142,Vacaciones de terror 2,1991
150,Ahankari,1992
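
The range filter above combines two comparisons with ‘&’. As a sketch of two equivalent spellings, the example below uses ‘between’ (inclusive on both ends by default) and ‘query’ on a small hypothetical frame (`sample` is made up for illustration and reuses a few titles from the outputs above):

```python
import pandas as pd

# Hypothetical sample; not the full titles.csv
sample = pd.DataFrame({
    'title': ['The Rising Son', 'The Thousand Plane Raid', 'Crucea de piatra',
              'Der Brocken', 'Vacaciones de terror 2'],
    'year': [1990, 1969, 1993, 1992, 1991],
})

# Series.between includes both endpoints by default
in_range = sample[sample['year'].between(1990, 1992)]

# The same condition written as a query string
same_rows = sample.query('year >= 1990 and year <= 1992')
```

Both forms keep the original index labels, just like the boolean-mask version.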


###  Sorting
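Sorting can be performed using the ‘sort_index’ or ‘sort_values’ methods: ‘sort_index’ orders rows by their index labels, while ‘sort_values’ orders them by the values in one or more columns.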

In [8]:
# Find all the movies named Macbeth
t = titles
macbeth = t[t['title'] == 'Macbeth']
macbeth

Unnamed: 0,title,year
4226,Macbeth,1913
9322,Macbeth,2006
11722,Macbeth,2013
17166,Macbeth,1997
25847,Macbeth,1998


In [9]:
# sorting by a column name 
t = titles
macbeth = t[t['title'] == 'Macbeth'].sort_values('year')
macbeth

Unnamed: 0,title,year
4226,Macbeth,1913
17166,Macbeth,1997
25847,Macbeth,1998
9322,Macbeth,2006
11722,Macbeth,2013
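
The cell above only exercises ‘sort_values’. As a small sketch of ‘sort_index’ and of descending order, the frame below is rebuilt by hand from the Macbeth output above (the variable names are illustrative):

```python
import pandas as pd

# Hand-built copy of the year-sorted Macbeth rows shown above
macbeth = pd.DataFrame(
    {'title': ['Macbeth'] * 5, 'year': [1913, 1997, 1998, 2006, 2013]},
    index=[4226, 17166, 25847, 9322, 11722],
)

by_index = macbeth.sort_index()                              # order by index label
newest_first = macbeth.sort_values('year', ascending=False)  # latest year first
```

Both methods return a new DataFrame and leave `macbeth` itself unchanged.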


In [10]:
t.head(10)

Unnamed: 0,title,year
0,The Rising Son,1990
1,The Thousand Plane Raid,1969
2,Crucea de piatra,1993
3,Country,2000
4,Gaiking II,2011
5,Medusa (IV),2015
6,The Fresh Air Will Do You Good,2008
7,Alex in Wonderland,1970
8,L'outsider,2016
9,Do Outro Lado do Mundo,2008


In [11]:
t.loc[3:8]

Unnamed: 0,title,year
3,Country,2000
4,Gaiking II,2011
5,Medusa (IV),2015
6,The Fresh Air Will Do You Good,2008
7,Alex in Wonderland,1970
8,L'outsider,2016


##### Null values 

To display the rows with null values, pass the ‘isnull’ condition to the DataFrame as a boolean mask:

In [12]:
c = casts
c_null = c[c['n'].isnull()]
c_null.head(10)

Unnamed: 0,title,year,name,type,character,n
3,Secret in Their Eyes,2015,$hutter,actor,2002 Dodger Fan,
4,Steve Jobs,2015,$hutter,actor,1988 Opera House Patron,
5,Straight Outta Compton,2015,$hutter,actor,Club Patron,
6,Straight Outta Compton,2015,$hutter,actor,Dopeman,
7,For Thy Love 2,2009,Bee Moe $lim,actor,Thug 1,
9,Desire (III),2014,Syaiful 'Ariffin,actor,Actor Playing Eteocles from 'Antigone',
12,Mixing Nia,1998,Michael 'babeepower' Viera,actor,Rapper,
13,The Replacements,2000,Steven 'Bear'Boyd,actor,Defensive Tackle - Washington Sentinels,
14,All Out Dysfunktion!,2016,Kirlew 'bliss' Vilbon,actor,Bliss,
15,Gook,2017,Kirlew 'bliss' Vilbon,actor,Bliss,


NaN values can be filled using ‘fillna’, ‘ffill’ (forward fill), ‘bfill’ (backward fill), etc. In the code below,
NaN values are replaced by the string ‘NA’.

In [13]:
c_fill = c[c['n'].isnull()].fillna('NA')
c_fill.head(10)

Unnamed: 0,title,year,name,type,character,n
3,Secret in Their Eyes,2015,$hutter,actor,2002 Dodger Fan,NA
4,Steve Jobs,2015,$hutter,actor,1988 Opera House Patron,NA
5,Straight Outta Compton,2015,$hutter,actor,Club Patron,NA
6,Straight Outta Compton,2015,$hutter,actor,Dopeman,NA
7,For Thy Love 2,2009,Bee Moe $lim,actor,Thug 1,NA
9,Desire (III),2014,Syaiful 'Ariffin,actor,Actor Playing Eteocles from 'Antigone',NA
12,Mixing Nia,1998,Michael 'babeepower' Viera,actor,Rapper,NA
13,The Replacements,2000,Steven 'Bear'Boyd,actor,Defensive Tackle - Washington Sentinels,NA
14,All Out Dysfunktion!,2016,Kirlew 'bliss' Vilbon,actor,Bliss,NA
15,Gook,2017,Kirlew 'bliss' Vilbon,actor,Bliss,NA
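
Forward and backward fill are mentioned above but not demonstrated. The sketch below shows them on a small made-up Series (`n` is hypothetical and only loosely modeled on the ‘n’ column of the cast data):

```python
import numpy as np
import pandas as pd

# Hypothetical values with gaps
n = pd.Series([31.0, np.nan, np.nan, 10.0, np.nan])

forward = n.ffill()     # carry the last seen value forward into each gap
backward = n.bfill()    # pull the next seen value backward into each gap
constant = n.fillna(0)  # replace every gap with a fixed value
```

A backward fill leaves a trailing NaN in place because no later value exists to copy from. Note also that filling a numeric column with a string such as 'NA' typically turns it into an object column, so a numeric placeholder may be preferable if further arithmetic is planned.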


### String Operations

Various string operations can be applied element-wise to a text column through its ‘str’ accessor, for example ‘str.startswith’.

In [14]:
t = titles
t.head()

Unnamed: 0,title,year
0,The Rising Son,1990
1,The Thousand Plane Raid,1969
2,Crucea de piatra,1993
3,Country,2000
4,Gaiking II,2011


In [15]:
t = t[t['title'] == 'Maa']
t

Unnamed: 0,title,year
38880,Maa,1968


In [20]:
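# Search the full titles table; t was narrowed to the single title 'Maa' in an earlier cell
x = titles[titles['title'].str.startswith("Maa ")].head(10)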
x

Unnamed: 0,title,year
19,Maa Durga Shakti,1999
3046,Maa Aur Mamta,1970
7470,Maa Vaibhav Laxmi,1989
7933,Maa Kande Aaji Puate Pain,2002
17197,Maa al-Khatar,2016
23807,Maa O Mamata,1990
32698,Maa Beti,1986
33290,Maa Mate Shakti De,1990
42463,Maa Dasha Maa,1987
43448,Maa Balaji,1999
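
Beyond ‘startswith’, the ‘str’ accessor exposes many other vectorized string methods. A minimal sketch on a hypothetical frame (`sample` reuses a few titles from the outputs above) might look like this:

```python
import pandas as pd

# Hypothetical sample; not the full titles.csv
sample = pd.DataFrame({
    'title': ['Maa', 'Maa Beti', 'Maa Balaji', 'Macbeth', 'Country'],
    'year': [1968, 1986, 1999, 2006, 2000],
})

starts = sample[sample['title'].str.startswith('Maa ')]           # prefix match
contains = sample[sample['title'].str.contains('maa', case=False)]  # substring, ignoring case
lengths = sample['title'].str.len()                               # characters per title
upper = sample['title'].str.upper()                               # upper-cased copy
```

The trailing space in `'Maa '` is what excludes the bare title 'Maa' and titles such as 'Macbeth' from the prefix match.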


##### Count Values

In [21]:
t = titles
t['year'].value_counts().head(10)

2016    2363
2017    2138
2015    1849
2014    1701
2013    1609
2012    1500
2011    1457
2010    1377
2009    1305
2008    1070
Name: year, dtype: int64
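
‘value_counts’ returns the counts ordered from most to least frequent, as in the output above. As a sketch under the assumption that a chronological view or relative shares are wanted instead, the example below uses a small made-up Series (`years` is hypothetical):

```python
import pandas as pd

# Hypothetical release years
years = pd.Series([2016, 2016, 2017, 2015, 2016, 2017], name='year')

counts = years.value_counts()                # most frequent year first
chronological = counts.sort_index()          # reorder the counts by year
shares = years.value_counts(normalize=True)  # fractions instead of raw counts
```

Chaining ‘sort_index’ after ‘value_counts’ is a common way to prepare the counts for a year-by-year plot.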