# Reading Data from a CSV File

To read data from a CSV file, we first import the pandas library. By convention, it is imported under the alias `pd`.

In [2]:
import pandas as pd

The `read_csv` function reads a CSV file into a pandas DataFrame. The dataset used here was downloaded from Kaggle: https://www.kaggle.com/jrobischon/wikipedia-movie-plots/

Tip: Make sure that the notebook and the CSV file are in the same folder. If they are not, pass the full or relative path to the file instead of just its name.

In [3]:
df = pd.read_csv('wiki_movie_plots_deduped.csv')
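The tip about paths can be sketched as follows. This is a minimal example under stated assumptions: the `data` subfolder and the file name `sample_movies.csv` are hypothetical stand-ins, and a two-row sample file is written to a temporary directory so the snippet runs without the Kaggle download.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical layout: the CSV sits in a "data" subfolder rather than next to the notebook.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "data"
    data_dir.mkdir()
    csv_path = data_dir / "sample_movies.csv"  # stand-in name, not the Kaggle file

    # Write a two-row sample so the example is self-contained.
    csv_path.write_text(
        "Release Year,Title,Director\n"
        "1901,Kansas Saloon Smashers,Unknown\n"
        '1902,Jack and the Beanstalk,"George S. Fleming, Edwin S. Porter"\n'
    )

    # read_csv accepts either a path string or a pathlib.Path object.
    sample = pd.read_csv(csv_path)

print(sample.shape)
```

Building the path with `pathlib` keeps the code portable across operating systems, since it avoids hard-coding `/` or `\` separators.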

The `head()` method shows the first five rows of the pandas DataFrame.

In [4]:
df.head()

| | Release Year | Title | Origin/Ethnicity | Director | Cast | Genre | Wiki Page | Plot |
|---|---|---|---|---|---|---|---|---|
| 0 | 1901 | Kansas Saloon Smashers | American | Unknown | NaN | unknown | https://en.wikipedia.org/wiki/Kansas_Saloon_Sm... | A bartender is working at a saloon, serving dr... |
| 1 | 1901 | Love by the Light of the Moon | American | Unknown | NaN | unknown | https://en.wikipedia.org/wiki/Love_by_the_Ligh... | The moon, painted with a smiling face hangs ov... |
| 2 | 1901 | The Martyred Presidents | American | Unknown | NaN | unknown | https://en.wikipedia.org/wiki/The_Martyred_Pre... | The film, just over a minute long, is composed... |
| 3 | 1901 | Terrible Teddy, the Grizzly King | American | Unknown | NaN | unknown | https://en.wikipedia.org/wiki/Terrible_Teddy,_... | Lasting just 61 seconds and consisting of two ... |
| 4 | 1902 | Jack and the Beanstalk | American | George S. Fleming, Edwin S. Porter | NaN | unknown | https://en.wikipedia.org/wiki/Jack_and_the_Bea... | The earliest known adaptation of the classic f... |
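`head()` also accepts the number of rows to return, and `tail()` is its counterpart for the end of the DataFrame. The sketch below uses a small made-up DataFrame (`toy` and its values are illustrative, not part of the Kaggle dataset).

```python
import pandas as pd

# Illustrative stand-in for the movie DataFrame.
toy = pd.DataFrame({
    "Title": ["A", "B", "C", "D", "E", "F", "G"],
    "Release Year": [1901, 1901, 1902, 1903, 1903, 1904, 1905],
})

first_five = toy.head()   # default: the first 5 rows
first_two = toy.head(2)   # explicit row count
last_two = toy.tail(2)    # the last 2 rows

print(first_two)
print(last_two)
```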


# Selecting a Column

There are multiple ways of selecting a column. The simplest is to index the DataFrame with the column name, which returns the column as a pandas Series.

In [5]:
df['Title']

0                                   Kansas Saloon Smashers
1                            Love by the Light of the Moon
2                                  The Martyred Presidents
3                         Terrible Teddy, the Grizzly King
4                                   Jack and the Beanstalk
5                                      Alice in Wonderland
6                                  The Great Train Robbery
7                                          The Suburbanite
8                                 The Little Train Robbery
9                               The Night Before Christmas
10                                Dream of a Rarebit Fiend
11       From Leadville to Aspen: A Hold-Up in the Rockies
12                                     Kathleen Mavourneen
13                                            Daniel Boone
14                         How Brown Saw the Baseball Game
15                                            Laughing Gas
16                                The Adventures of Doll

Again, the `head()` method limits the output to the first five entries.

In [6]:
df['Title'].head()

0              Kansas Saloon Smashers
1       Love by the Light of the Moon
2             The Martyred Presidents
3    Terrible Teddy, the Grizzly King
4              Jack and the Beanstalk
Name: Title, dtype: object

The following snippet selects a single column and then slices its first eight rows.

In [7]:
df['Title'][:8]

0              Kansas Saloon Smashers
1       Love by the Light of the Moon
2             The Martyred Presidents
3    Terrible Teddy, the Grizzly King
4              Jack and the Beanstalk
5                 Alice in Wonderland
6             The Great Train Robbery
7                     The Suburbanite
Name: Title, dtype: object
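For a DataFrame with the default integer index, the slice above can be written in several equivalent ways. The sketch below uses a made-up `toy` DataFrame to compare them. Note that `loc` slices by label and includes the end label, while `iloc` and plain slicing exclude the end position.

```python
import pandas as pd

# Illustrative stand-in with the default 0..11 index.
toy = pd.DataFrame({"Title": [f"Film {i}" for i in range(12)]})

by_slice = toy["Title"][:8]      # positional slice on the Series
by_head = toy["Title"].head(8)   # first 8 entries
by_iloc = toy["Title"].iloc[:8]  # positions 0..7 (end excluded)
by_loc = toy.loc[:7, "Title"]    # labels 0..7 (end included)

print(by_slice.equals(by_head), by_slice.equals(by_iloc), by_slice.equals(by_loc))
```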

# Selecting Multiple Columns

To select multiple columns, pass a list of column names inside the indexing brackets. The result is a new DataFrame containing only those columns.

In [8]:
df[['Title', 'Director', 'Cast', 'Genre']].head()

| | Title | Director | Cast | Genre |
|---|---|---|---|---|
| 0 | Kansas Saloon Smashers | Unknown | NaN | unknown |
| 1 | Love by the Light of the Moon | Unknown | NaN | unknown |
| 2 | The Martyred Presidents | Unknown | NaN | unknown |
| 3 | Terrible Teddy, the Grizzly King | Unknown | NaN | unknown |
| 4 | Jack and the Beanstalk | George S. Fleming, Edwin S. Porter | NaN | unknown |
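The difference between single and double brackets is easy to miss, so here is a short sketch on a made-up `toy` DataFrame (values are illustrative). A single name returns a Series, while a list of names returns a DataFrame whose columns follow the order of the list.

```python
import pandas as pd

# Illustrative stand-in for the movie DataFrame.
toy = pd.DataFrame({
    "Title": ["Kansas Saloon Smashers", "Jack and the Beanstalk"],
    "Director": ["Unknown", "George S. Fleming, Edwin S. Porter"],
    "Genre": ["unknown", "unknown"],
})

as_series = toy["Title"]            # single name -> Series
as_frame = toy[["Title"]]           # list with one name -> one-column DataFrame
subset = toy[["Genre", "Title"]]    # column order follows the list

print(type(as_series).__name__, type(as_frame).__name__)
print(list(subset.columns))
```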


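### Selecting Columns with `iloc`

Another way to select columns is the `iloc` indexer, which selects by integer position (counting from 0) rather than by name. It takes a row selection followed by a column selection:

`df.iloc[row selection, column selection]`

Each selection can take one of the following forms:

Integer list selection = `[1, 2, 3]`

Slice or range selection = `4:7` (the end position is excluded)

Single value selection = `4`

Select all rows or columns = `:`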

In [9]:
df.iloc[:, [1,2,3]].head()

| | Title | Origin/Ethnicity | Director |
|---|---|---|---|
| 0 | Kansas Saloon Smashers | American | Unknown |
| 1 | Love by the Light of the Moon | American | Unknown |
| 2 | The Martyred Presidents | American | Unknown |
| 3 | Terrible Teddy, the Grizzly King | American | Unknown |
| 4 | Jack and the Beanstalk | American | George S. Fleming, Edwin S. Porter |


In [10]:
df.iloc[:, 3:6].head()

| | Director | Cast | Genre |
|---|---|---|---|
| 0 | Unknown | NaN | unknown |
| 1 | Unknown | NaN | unknown |
| 2 | Unknown | NaN | unknown |
| 3 | Unknown | NaN | unknown |
| 4 | George S. Fleming, Edwin S. Porter | NaN | unknown |


In [11]:
df.iloc[:, 7].head()

0    A bartender is working at a saloon, serving dr...
1    The moon, painted with a smiling face hangs ov...
2    The film, just over a minute long, is composed...
3    Lasting just 61 seconds and consisting of two ...
4    The earliest known adaptation of the classic f...
Name: Plot, dtype: object

In [12]:
df.iloc[:,:].head()

| | Release Year | Title | Origin/Ethnicity | Director | Cast | Genre | Wiki Page | Plot |
|---|---|---|---|---|---|---|---|---|
| 0 | 1901 | Kansas Saloon Smashers | American | Unknown | NaN | unknown | https://en.wikipedia.org/wiki/Kansas_Saloon_Sm... | A bartender is working at a saloon, serving dr... |
| 1 | 1901 | Love by the Light of the Moon | American | Unknown | NaN | unknown | https://en.wikipedia.org/wiki/Love_by_the_Ligh... | The moon, painted with a smiling face hangs ov... |
| 2 | 1901 | The Martyred Presidents | American | Unknown | NaN | unknown | https://en.wikipedia.org/wiki/The_Martyred_Pre... | The film, just over a minute long, is composed... |
| 3 | 1901 | Terrible Teddy, the Grizzly King | American | Unknown | NaN | unknown | https://en.wikipedia.org/wiki/Terrible_Teddy,_... | Lasting just 61 seconds and consisting of two ... |
| 4 | 1902 | Jack and the Beanstalk | American | George S. Fleming, Edwin S. Porter | NaN | unknown | https://en.wikipedia.org/wiki/Jack_and_the_Bea... | The earliest known adaptation of the classic f... |
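The `iloc` calls above can be tried on a small DataFrame to see which columns each form returns. This is a sketch on a made-up `toy` DataFrame that assumes the same order for the first six columns as the movie dataset. The rows are abbreviated for illustration.

```python
import pandas as pd

# Illustrative stand-in; column positions are 0..5 from left to right.
toy = pd.DataFrame({
    "Release Year": [1901, 1901, 1902],
    "Title": ["Kansas Saloon Smashers", "The Martyred Presidents", "Jack and the Beanstalk"],
    "Origin/Ethnicity": ["American", "American", "American"],
    "Director": ["Unknown", "Unknown", "George S. Fleming, Edwin S. Porter"],
    "Cast": [None, None, None],
    "Genre": ["unknown", "unknown", "unknown"],
})

cols_list = toy.iloc[:, [1, 2, 3]]  # integer list: positions 1, 2 and 3
cols_slice = toy.iloc[:, 3:6]       # slice: positions 3, 4 and 5 (6 is excluded)
one_col = toy.iloc[:, 1]            # single value: returns a Series
one_cell = toy.iloc[0, 1]           # single row and column: returns a scalar

print(list(cols_list.columns))
print(list(cols_slice.columns))
print(one_cell)
```

Because `iloc` relies on positions, these selections change meaning if the column order changes. Selecting by name is usually more robust when the names are known.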


# Selecting a Row

The same tools can be used to select rows: slicing by range, selecting by index label with `loc`, or selecting by integer position with `iloc`.

In [13]:
df[:3]

| | Release Year | Title | Origin/Ethnicity | Director | Cast | Genre | Wiki Page | Plot |
|---|---|---|---|---|---|---|---|---|
| 0 | 1901 | Kansas Saloon Smashers | American | Unknown | NaN | unknown | https://en.wikipedia.org/wiki/Kansas_Saloon_Sm... | A bartender is working at a saloon, serving dr... |
| 1 | 1901 | Love by the Light of the Moon | American | Unknown | NaN | unknown | https://en.wikipedia.org/wiki/Love_by_the_Ligh... | The moon, painted with a smiling face hangs ov... |
| 2 | 1901 | The Martyred Presidents | American | Unknown | NaN | unknown | https://en.wikipedia.org/wiki/The_Martyred_Pre... | The film, just over a minute long, is composed... |


In [14]:
df.loc[4]

Release Year                                                     1902
Title                                          Jack and the Beanstalk
Origin/Ethnicity                                             American
Director                           George S. Fleming, Edwin S. Porter
Cast                                                              NaN
Genre                                                         unknown
Wiki Page           https://en.wikipedia.org/wiki/Jack_and_the_Bea...
Plot                The earliest known adaptation of the classic f...
Name: 4, dtype: object

In [15]:
df.iloc[5]

Release Year                                                     1903
Title                                             Alice in Wonderland
Origin/Ethnicity                                             American
Director                                               Cecil Hepworth
Cast                                                        May Clark
Genre                                                         unknown
Wiki Page           https://en.wikipedia.org/wiki/Alice_in_Wonderl...
Plot                Alice follows a large white rabbit down a "Rab...
Name: 5, dtype: object
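With the default index, `loc` and `iloc` often return the same row, which can hide the difference between them. The sketch below uses a made-up `toy` DataFrame with a non-default index (labels 10 to 13, chosen for illustration) to show that `loc` looks up labels while `iloc` looks up positions.

```python
import pandas as pd

# Illustrative stand-in whose index labels differ from the row positions.
toy = pd.DataFrame(
    {"Title": ["A", "B", "C", "D"], "Release Year": [1901, 1902, 1903, 1904]},
    index=[10, 11, 12, 13],
)

by_label = toy.loc[11]        # row whose index label is 11
by_position = toy.iloc[1]     # second row, whatever its label

label_slice = toy.loc[10:12]  # label slice: end label included (3 rows)
pos_slice = toy.iloc[0:2]     # position slice: end excluded (2 rows)

print(by_label["Title"], by_position["Title"])
print(len(label_slice), len(pos_slice))
```

Here `toy.loc[1]` would raise a `KeyError`, because no row carries the label 1.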